
IndyPy: The Risks of Code-Writing ML Models
The August 2022 edition of IndyPy featured an analysis of the risks involved in the emerging field of code-writing machine learning models. IndyPy, Indiana’s largest Python meetup, was founded in 2007 by Six Feet Up CTO and Amazon Web Services (AWS) Community Hero Calvin Hendryx-Parker.
In his presentation, Nick Doiron, a software engineer at Determined AI, discusses Large Language Models (LLMs), which process large amounts of text and data to build a model of a given language. Such models have led to products like GitHub Copilot, which GitHub claims suggests 40% of the code in projects where it’s used.
While code-writing ML models can help programmers solve problems, Doiron says there are some factors to keep in mind. His presentation covers:
- some of the pitfalls of text-based LLMs, such as deepfake text and biases;
- previously reported issues with code-writing models;
- how LLMs can be trained to understand a given language;
- tasks that LLMs are typically trained for; and
- broader questions about code-writing ML, such as whether it could constitute plagiarism.
Watch the Presentation
Did you miss the presentation? Watch the recording and explore tidbits via @IndyPy’s live Twitter thread.
Links and Resources
You can find Doiron on:
- LinkedIn: https://www.linkedin.com/in/nickdoiron/
- Twitter: https://twitter.com/mapmeld
- GitHub: https://github.com/mapmeld
Tools to get started with ML models:
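The presentation’s full tool list isn’t reproduced here, but a common entry point is Hugging Face’s transformers library. The sketch below is illustrative only: the model choice (Salesforce/codegen-350M-mono, a small open code-generation model) and the prompt are assumptions for demonstration, not tools Doiron specifically endorsed.

```python
# Minimal sketch: prompting a small open code-generation model to
# complete a Python function. Requires `pip install transformers torch`.
from transformers import pipeline

# Model choice is an assumption for illustration; any causal
# code LLM on the Hugging Face Hub could be swapped in.
generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

prompt = 'def fibonacci(n):\n    """Return the nth Fibonacci number."""\n'
completions = generator(prompt, max_new_tokens=64, num_return_sequences=1)

# The pipeline returns a list of dicts; "generated_text" holds the
# prompt plus the model's suggested continuation.
print(completions[0]["generated_text"])
```

As Doiron’s talk cautions, suggestions like these should be reviewed before use; generated code can carry the biases and bugs of its training data.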