I Can’t Believe It’s Not Real Data at July 2022 IndyPy

The July 2022 edition of IndyPy (Indiana’s largest Python meetup, founded in 2007 by Six Feet Up CTO and Amazon Web Services (AWS) Community Hero Calvin Hendryx-Parker) featured an introduction to synthetic data. Mason Egger, lead developer advocate at Gretel, walked through use cases for synthetic data as well as some of the major benefits for software developers and data scientists alike.

According to Egger’s presentation, 35% of a data scientist’s time is spent gathering data, and there are a number of hurdles to getting an ideal set of data for machine learning models or software applications. For example, access to some data can be limited by data privacy regulations, and data sets that are too small or too skewed can result in models that don’t reflect reality.

Egger says synthetic data gets around this issue by producing an extended data set that is based on real data so that, although the data set is not real and doesn’t contain personally identifiable information, it is at least representative of a real set. In his presentation, Egger:

discusses the three major benefits of using synthetic data;
gives examples of fields where synthetic data is used, such as cybersecurity, healthcare, and life sciences;
discusses how synthetic data is different from data created by systems like Faker;
shares a number of open-source synthetic data projects; and
demonstrates how synthetic data is used in a Gretel application.
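
To make the contrast with Faker-style data concrete, here is a minimal sketch in plain Python. It is not taken from Egger’s talk and does not reflect how Gretel’s models work: a simple Gaussian fit stands in for a real synthetic data model, and the "real" ages are made-up values used only for illustration. The function names (fit_gaussian, sample_synthetic, sample_mock) are hypothetical. The point is that mock data is generated without looking at the real data, while synthetic data is sampled from something learned from it.

```python
import random
import statistics


def fit_gaussian(values):
    """Learn a (very) simple model of the real data: its mean and spread."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(mean, stdev, n, seed=0):
    """Draw new values from the fitted model; no real record is copied."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n)]


def sample_mock(n, low, high, seed=0):
    """Mock-style data: plausible-looking values that ignore the real data."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


# Hypothetical "real" measurements, invented for this example.
real_ages = [34, 36, 35, 38, 33, 37, 35, 36, 34, 35]

mean, stdev = fit_gaussian(real_ages)
synthetic_ages = sample_synthetic(mean, stdev, n=1000)
mock_ages = sample_mock(n=1000, low=0, high=100)

# The synthetic set tracks the real distribution; the mock set does not.
print("real mean:     ", round(statistics.mean(real_ages), 1))
print("synthetic mean:", round(statistics.mean(synthetic_ages), 1))
print("mock mean:     ", round(statistics.mean(mock_ages), 1))
```

A model trained on the mock values would learn an age range that has little to do with the original population, which is the gap synthetic data is meant to close.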

Watch the Presentation:

Did you miss the presentation? Watch the recording and explore tidbits via @IndyPy’s live Twitter thread.

Links and Resources:

You can find Mason on:

LinkedIn: https://www.linkedin.com/in/mason-egger/
GitHub: https://github.com/MasonEgger
Twitter: https://twitter.com/masonegger
His website: https://mason.dev/about/

Gretel Synthetics documentation: https://synthetics.docs.gretel.ai/en/stable/

Gretel synthetic data generators: https://github.com/gretelai/gretel-synthetics

Notebooks containing fun things you can do with Gretel synthetic data: https://github.com/gretelai/fun-with-synthetic-data
