MLOps: Simplify and Operationalize AI/ML Workflows on Kubernetes — ROBIN’s Cloud-Native Platform

Sricharan Mahavadi
4 min read · Jun 30, 2021

There is no doubt that companies are investing a great deal of time and money to develop analytics that drive faster, better business decisions and quicker insights from critical data. But deployment remains a nightmare — the infamous “last mile” of analytics, from data science to business ROI.

In this article, I will cover some of the complexities involved in operationalizing ML pipelines and how they can be addressed by choosing the right platform for deploying your AI/ML pipelines.

Let’s walk through the diagram below to understand the life cycle of a typical machine learning model pipeline.

Fig 1.

As the diagram illustrates, managing an ML ecosystem involves several phases, each with its own application requirements: compute engines such as Spark and TensorFlow, datastore engines such as Cassandra, MongoDB, Redis, and HDFS, streaming and workflow tools such as Kafka and Airflow, and the broader big data ecosystem. This demands a solution that supports a wide variety of modern databases and big data applications.

While data scientists want to quickly spin up Jupyter Notebook, TensorFlow, and PyTorch applications to build models that run well, operations and dev teams want to focus on testing and promoting newly validated models to higher environments for further evaluation. This process involves a lot of Continuous Integration and Continuous Delivery (CI/CD) automation, which grows more and more complex as the number of heterogeneous applications in play increases.

Machine learning models built by data scientists represent only a small fraction of the components that comprise an enterprise production deployment workflow, as illustrated in Fig 2 below.

Fig 2.

MLOps (ML + DevOps = MLOps)

To operationalize the ML deployment workflow, data scientists must work closely with other teams such as business, engineering, and operations; quickly provision and manage the lifecycle of the various ML workflow components; and have an end-to-end framework for CI/CD of model development and feature engineering. The goal of MLOps is to streamline these challenges with well-established practices that leverage modern technologies like Kubernetes and cloud-native platforms.

Fig 3.

It is very important to choose a platform for hosting your AI/ML pipeline that can address and simplify the key challenges of operationalizing ML pipeline deployments.

Key Challenges in Managing AI/ML Workloads

End-to-End Orchestration — The platform should support end-to-end orchestration of the AI/ML workflow.

Self-Service Provisioning & Management — Self-service provisioning and management of the various AI/ML apps: automate deployment, reduce service delivery timeframes, and increase resiliency.

Support for Heterogeneous Applications — The platform should support easy deployment of any combination of complex applications, such as the following (a deployment sketch follows this list):

— Data science/model frameworks: Jupyter Notebooks

— Compute engines: Spark, TensorFlow, PyTorch, MapReduce, Kafka, etc.

— Database engines: Cassandra, MongoDB, Hadoop, Redis, ScyllaDB, etc.

— Workflow engines: Airflow; CI/CD: Jenkins, GitLab, GitOps, etc.

— Datastores: Persistent Volumes, NFS, MinIO, Docker Registry, Artifactory, etc.

— Monitoring frameworks: Prometheus, Grafana, Datadog, etc.
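
To make this concrete, here is a minimal sketch of what deploying one such component on Kubernetes looks like, using the official Kubernetes Python client to launch a Jupyter Notebook server as a Deployment. The namespace, image, and port are illustrative assumptions, not ROBIN-specific settings:

```python
from kubernetes import client, config

def deploy_jupyter(namespace: str = "ml-dev") -> None:
    # Load credentials from ~/.kube/config (assumes kubectl access).
    config.load_kube_config()

    container = client.V1Container(
        name="jupyter",
        image="jupyter/scipy-notebook:latest",  # illustrative image
        ports=[client.V1ContainerPort(container_port=8888)],
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "jupyter"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="jupyter"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={"app": "jupyter"}),
            template=template,
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(
        namespace=namespace, body=deployment
    )

if __name__ == "__main__":
    deploy_jupyter()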

Scalability — The platform must be highly scalable to handle large experimentation and training datasets and their computational needs, for example by scaling workloads out as demand grows (sketched below).
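
One common scaling lever on Kubernetes is simply adjusting replica counts. A minimal sketch with the Python client, assuming a hypothetical "training-workers" Deployment already exists:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl access to the cluster

# Scale the hypothetical "training-workers" Deployment out to 5 replicas
# to absorb a larger training workload.
client.AppsV1Api().patch_namespaced_deployment_scale(
    name="training-workers",
    namespace="ml-dev",
    body={"spec": {"replicas": 5}},
)
```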

Reproducibility — Model training, testing, and experimentation often need to be reproducible for a given combination of model, dataset, and parameters. The platform should support consistent data snapshot and rollback capabilities, and each run should record exactly what it was trained on (see the sketch below).
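
Independently of the platform's snapshot mechanism, reproducibility improves when each run records exactly which data and parameters it used. A small, platform-agnostic sketch (the file names and manifest fields are illustrative):

```python
import hashlib
import json
import time
from pathlib import Path

def fingerprint(path: str) -> str:
    """SHA-256 of a dataset file, tying a run to the exact bytes it saw."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_run(dataset: str, params: dict, out: str = "run_manifest.json") -> None:
    # Persist everything needed to reproduce (or audit) this run later.
    manifest = {
        "timestamp": time.time(),
        "dataset": dataset,
        "dataset_sha256": fingerprint(dataset),
        "params": params,
    }
    Path(out).write_text(json.dumps(manifest, indent=2))

# Example: record a run against a local training file (path is illustrative).
record_run("train.csv", {"lr": 0.01, "epochs": 20})
```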

Portability — The platform should support portability of models, applications, and datasets (for models, see the sketch below).
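
For model portability specifically, a common approach is exporting to a standard interchange format such as ONNX. A minimal PyTorch sketch, where the toy model and tensor shapes are placeholders:

```python
import torch
import torch.nn as nn

# A toy model standing in for a real trained one.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
model.eval()

# Export to ONNX so the same model can be served by other runtimes
# (ONNX Runtime, Triton, etc.) regardless of where it was trained.
dummy_input = torch.randn(1, 4)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["features"],
    output_names=["score"],
)
```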

CI/CD — To support ongoing model and feature engineering, the platform should enable easy setup and integration of CI/CD practices through REST API integrations, as sketched below.
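
For example, a CI job might promote a model by calling the platform's REST API. The endpoint, authentication, and payload below are hypothetical placeholders for whatever API your platform actually exposes:

```python
import requests

# Hypothetical endpoint and payload: substitute the actual REST API your
# platform exposes for triggering a model-promotion pipeline.
resp = requests.post(
    "https://mlops.example.com/api/v1/pipelines/model-promotion/runs",
    headers={"Authorization": "Bearer <token>"},
    json={"model": "churn-predictor", "version": "1.4.2", "target": "staging"},
    timeout=30,
)
resp.raise_for_status()
print("Pipeline run started:", resp.json())
```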

Support for GPU Computing — The chosen platform should support GPU processing for high-performance computing and for training complex models (see the sketch below).
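
On Kubernetes, GPU access is typically requested as an extended resource on the container spec. A sketch assuming the NVIDIA device plugin is installed on the cluster; the image and command are illustrative:

```python
from kubernetes import client

# Container spec for a training job that requests one NVIDIA GPU.
# Scheduling onto a GPU node requires the NVIDIA device plugin.
trainer = client.V1Container(
    name="trainer",
    image="pytorch/pytorch:latest",  # illustrative image
    command=["python", "train.py"],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"}
    ),
)
```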

Stateful Databases — Containerize and manage stateful databases at scale.

Observability — Monitoring and metrics across the entire stack, among many more challenges.

Choosing the Right Platform...

All of the above challenges prompt the need for an end-to-end solution that abstracts and addresses the complexities involved in building a robust AI/ML platform.

ROBIN is a software platform that enables enterprises to deliver “any AI/ML application as a service” anywhere, on-premises or in the public cloud. It provides the essential platform features that are vital for running critical data-heavy workloads such as databases, big data applications, and ML pipelines in containers orchestrated by Robin’s Kubernetes.

In a Nutshell,

“Robin’s Cloud Native Platform is a software-defined application orchestration platform that combines containerized storage, networking, compute (Kubernetes), and the application management layer into a single system, allowing enterprises to automate provisioning and lifecycle management of complex data-heavy workloads such as databases, big data applications, and data/ML pipelines.”

For more information about advanced features of ROBIN, please visit http://www.docs.robin.io

https://robin.io/platform/architecture/

Deploying Cassandra on Robin CNP


Sricharan Mahavadi

Sr. Solution Architect; Database Technology, Big Data, and AI/ML Enthusiast.