Data Platform and MLOps Architecture

I recently helped a company build out and mature its data and MLOps platform. The company’s team of data scientists and engineers had found product-market fit with a deep-learning-based product. As with most lean, customer-driven approaches, they hadn’t overinvested in infrastructure while they tested and validated the product with customers (a good thing). Given the product’s success with high-profile, paying customers, the goal was to productize the solution and evolve the infrastructure into a new B2B SaaS platform before going to market.

Below is a view of the end-to-end data and MLOps pipeline, including the manual days spent on each stage of getting a new customer up and running (typically 2-4 weeks, plus incremental ongoing management).

Some additional context on the environment: the company has a strong culture of building capabilities internally and incrementally. The solution below is therefore heavy on native services from AWS (their cloud provider of choice), which lets the internal team build out a foundation platform that, initially, supports onboarding multiple customers simultaneously in 1-2 days. The broader goal was to develop onboarding tools that let less-tenured engineers onboard clients quickly.

Here is the future-state view I put together to support the next phase. Beyond this, the company may well introduce an all-encompassing solution for the data science pipeline, such as Databricks.