
News


I Can’t Believe It’s Not Real Data at July 2022 IndyPy

July 20, 2022

The July 2022 edition of IndyPy — Indiana’s largest Python meetup founded in 2007 by Six Feet Up CTO and Amazon Web Services (AWS) Community Hero, Calvin Hendryx-Parker — featured an introduction to synthetic data. Mason Egger, lead developer advocate at Gretel, walked through use cases for synthetic data as well as some of the major benefits for software developers and data scientists alike.

According to Egger’s presentation, 35% of a data scientist’s time is spent gathering data, and several hurdles stand in the way of assembling an ideal data set for machine learning models or software applications. For example, data privacy regulations can limit access to some data, and data sets that are too small or too skewed can produce models that don’t reflect reality.

Egger says synthetic data gets around these issues by producing an extended data set modeled on real data. Although the generated records are not real and contain no personally identifiable information, they are still representative of a real data set.
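To make the idea concrete, here is a minimal sketch in plain Python. It is not Gretel's method and does not use the Gretel libraries: it simply fits basic per-column statistics on a tiny, made-up "real" data set and samples new records from them. The records, field names, and the `fit` and `sample` helpers are all hypothetical and exist only for illustration.

```python
import random
import statistics
from collections import Counter

# Hypothetical "real" records; the names stand in for personally
# identifiable information that must not appear in the output.
real_records = [
    {"name": "Alice", "age": 34, "plan": "basic"},
    {"name": "Bob", "age": 45, "plan": "premium"},
    {"name": "Carol", "age": 29, "plan": "basic"},
    {"name": "Dave", "age": 52, "plan": "basic"},
    {"name": "Erin", "age": 38, "plan": "premium"},
]


def fit(records):
    """Summarize each non-identifying column with simple statistics."""
    ages = [r["age"] for r in records]
    plans = Counter(r["plan"] for r in records)
    return {
        "age_mean": statistics.mean(ages),
        "age_stdev": statistics.stdev(ages),
        "plans": list(plans.keys()),
        "plan_weights": list(plans.values()),
    }


def sample(model, n, seed=0):
    """Draw n synthetic records from the fitted column statistics."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    synthetic = []
    for _ in range(n):
        # Numeric column: draw from a normal distribution, clamped at zero.
        age = max(0, round(rng.gauss(model["age_mean"], model["age_stdev"])))
        # Categorical column: draw according to the observed frequencies.
        plan = rng.choices(model["plans"], weights=model["plan_weights"], k=1)[0]
        # The "name" column is never modeled, so no real identity is copied.
        synthetic.append({"age": age, "plan": plan})
    return synthetic


model = fit(real_records)
synthetic_records = sample(model, 20)  # 20 records from only 5 real ones
for record in synthetic_records[:3]:
    print(record)
```

This sketch samples each column independently, so any relationship between age and plan in the real data is lost. Dedicated synthetic data tools aim to learn the joint structure of the data, which is what makes their output more representative.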

Watch the Presentation:

Did you miss the presentation? Watch the recording and explore tidbits via @IndyPy’s live Twitter thread.

Links and Resources:

You can find Mason on:

Best 49 open source synthetic data projects: https://www.opensourceagenda.com/tags/synthetic-data

Gretel Synthetics documentation: https://synthetics.docs.gretel.ai/en/stable/

Gretel synthetic data generators: https://github.com/gretelai/gretel-synthetics

Notebooks containing fun things you can do with Gretel synthetic data: https://github.com/gretelai/fun-with-synthetic-data
