Big data pipelines aggregate petabytes of data daily, and without the right infrastructure, managing them quickly becomes complex and overwhelming. While numerous cloud solutions are advertised for managing big data pipelines, there is a point at which out-of-the-box solutions can’t scale up to meet demand. On the other hand, proprietary tools can be hard to maintain and nearly impossible to scale.
Health systems and big data go hand-in-hand. Every day, hospitals need to aggregate data — including patient data, financial information, employee timesheets and much, much more — to help make life-changing health and business decisions. With 16 hospitals and roughly 36,000 employees under its umbrella, a statewide health system was in need of a bespoke, developer-first infrastructure that could handle this much data, while also being easy to manage and update with new data sources.
Six Feet Up helped the health system rebuild its infrastructure and implement a new world of cloud native and open source tooling, including Airflow, Spark, Delta Lake and Terraform. The developer-first infrastructure:
In addition to making life easier for the health system's developers, the new infrastructure allows employees who would otherwise be tasked with data entry to be reallocated to roles focused on patient care and outcomes. Furthermore, with this automation and capacity, the health system will be able to sequence genomes in-house — recouping the cost of third-party fees and allowing genomic sequencing to be used more widely.
Due to the impressive, purposeful and transformative nature of the mission this technology supports, this project has been designated as one of Six Feet Up’s 10 IMPACTFUL Projects. Six Feet Up’s 10-year goal is to complete 10 IMPACTFUL Projects by 2025.
The health system — which knew Python was the go-to programming language for big data — reached out to Six Feet Up to review the code used to manage its data pipeline. Six Feet Up’s seasoned developers quickly spotted broader issues with the existing cloud infrastructure.
After helping the in-house team use Python to optimize the existing pipeline infrastructure, Six Feet Up developers identified an all-new stack that would make the process of analyzing big data more reliable. Plus, the new stack would make onboarding new developers easier.
Specifically, Six Feet Up:
Six Feet Up developed a proof of concept for the new design. While the health system’s data pipeline infrastructure was already in the cloud, Six Feet Up experts saw areas where best practices could be implemented.
In designing the new stack, Six Feet Up’s experts:
Developer experience and usability remained top-of-mind during the design process. As such, Six Feet Up recommended a solution that would allow developers to deploy the entire infrastructure without touching the console. Plus, the infrastructure-as-code (IaC) design — which defines how you configure and build your architecture — made making updates less complex for developers.
The new structure generates infrastructure-as-code from configuration files, which is then applied to the live environment. This ensures that when a change is made, all associated cloud objects — roughly half a million of them — are updated accordingly up and down the pipeline. You can read more about how Six Feet Up experts were able to manage workflows up and down the pipeline using configuration files in the blog "Too Big for DAG Factories?"
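The configuration-driven pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the health system's actual code: the config entries and field names are hypothetical, and in the real system each entry would fan out into an Airflow DAG rather than a plain task object.

```python
# Minimal sketch of configuration-driven pipeline generation.
# Config shape and field names are hypothetical; in the real system
# each entry would become an Airflow DAG via a DAG factory.
from dataclasses import dataclass

# Each data feed is described by data, not code. Adding a new source
# means adding a config entry, not hand-writing a new pipeline.
PIPELINE_CONFIG = [
    {"source": "patient_admissions", "schedule": "@hourly",
     "target": "warehouse.admissions"},
    {"source": "employee_timesheets", "schedule": "@daily",
     "target": "warehouse.timesheets"},
]

@dataclass
class PipelineTask:
    name: str
    schedule: str
    target: str

def build_pipelines(config):
    """Expand declarative config entries into concrete task definitions."""
    return [
        PipelineTask(name=f"load_{entry['source']}",
                     schedule=entry["schedule"],
                     target=entry["target"])
        for entry in config
    ]

for task in build_pipelines(PIPELINE_CONFIG):
    print(task.name, task.schedule, task.target)
```

Because every pipeline is derived from the same configuration, a single change to the generator propagates consistently across all of them — the property that makes half a million cloud objects manageable.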
The IaC design also builds in an inherent safety net by allowing the code to be easily and quickly deployed and rolled back in the event of a failure. All of this means that the health system’s developers can now:
A fully automated CI/CD pipeline that lets in-house developers build, test, and deploy software on a massive system at the push of a button was critical to the developer experience. Six Feet Up experts designed and executed a fully automated infrastructure and release process that allows developers to deploy repeatably with confidence and manage any configuration drift with source control and Terraform.
While other companies have used these tools, the scope and sophistication of this self-deploying infrastructure is more comprehensive than many solutions out there today.
To test the new stack, Six Feet Up’s developers needed a way to run the same data loads coming from numerous sources in a local environment, but due to patient privacy laws, much of the health system’s data was off limits.
Using the Faker library, the Six Feet Up team generated synthetic versions of the health system’s third-party data sources, targets, and everything in between to test the new stack and ensure that it would hold up under those loads in the live environment. The team also used this data to test against a simulated version of the health system’s existing stack to compare solutions and root out integration issues.
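The synthetic-data approach can be illustrated with a short sketch. The project used Faker; this stdlib-only version shows the same idea with entirely hypothetical field names — realistic-looking but fabricated records that exercise the pipeline without touching protected patient data.

```python
# Sketch of generating synthetic records for pipeline testing.
# The actual project used the Faker library; this stdlib-only version
# illustrates the idea. All field names here are hypothetical.
import random
import string
from datetime import date, timedelta

random.seed(42)  # reproducible synthetic data for repeatable test runs

def fake_patient_record():
    """Return one synthetic record shaped like a real feed row."""
    mrn = "".join(random.choices(string.digits, k=8))  # fake record number
    admitted = date(2023, 1, 1) + timedelta(days=random.randrange(365))
    return {
        "mrn": mrn,
        "admitted": admitted.isoformat(),
        "charge_cents": random.randrange(1_000, 500_000),
    }

# Generate a batch sized like a real load to exercise the pipeline.
batch = [fake_patient_record() for _ in range(1000)]
print(len(batch))
```

Running the same generators against both the new stack and a simulated copy of the old one lets the two be compared on identical inputs, which is how integration issues surface before go-live.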
With help from the experts at Astronomer — a Six Feet Up partner and a leading commercial contributor to Apache Airflow — the Six Feet Up team used Terraform to deploy the new orchestrator and data pipeline stack to production. Astronomer’s platform streamlines deploying and revising Airflow infrastructure without relying on legacy tooling.
Because of the testing done using the simulated data, the new stack integrated with all of the health system’s data sources, third-party platforms, and SQL data warehouse.
Due to the number of configuration files needed to manage every table, data source, and file, Six Feet Up’s experts also devised an SQL-like query language that allows the health system’s developers to quickly find the configuration files they wish to update.
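The shape of such a lookup tool can be sketched briefly. The real query language is Six Feet Up's own and is not published here; the `field = value` syntax and the config fields below are hypothetical stand-ins to show why an SQL-like filter beats grepping through thousands of files.

```python
# Toy illustration of filtering configuration metadata with an
# SQL-like expression. The actual query language is Six Feet Up's
# own; this syntax and these config fields are hypothetical.
CONFIGS = [
    {"file": "admissions.yml", "source": "epic", "table": "admissions"},
    {"file": "timesheets.yml", "source": "kronos", "table": "timesheets"},
    {"file": "billing.yml", "source": "epic", "table": "billing"},
]

def query(configs, where):
    """Return config filenames matching a "field = value" expression."""
    field, _, value = (part.strip() for part in where.partition("="))
    return [c["file"] for c in configs if c.get(field) == value]

print(query(CONFIGS, "source = epic"))
```

A developer can then jump straight to the configuration files that feed a given source or table instead of searching every file that manages a table, data source, or feed by hand.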
Since implementation, the health system’s developers have given the new stack positive reviews.
The new stack has made data pipeline management much more user-friendly for the health system’s development staff by:
Six Feet Up will continue to build tools that help the health system’s developers optimize and navigate the new system, making the management and scaling of such a comprehensive data pipeline sustainable.