LLMOps — Part 1 : The Introduction

LLMOps is an extension of MLOps handling processes such as data preparation, model tuning, deployment, maintenance, monitoring etc.

Tulsipatro
5 min read · Apr 24, 2024

Let’s say your application makes multiple LLM calls, for example by chaining several steps together with the LangChain framework. You also have to handle prompt management: designing prompts, then evaluating and comparing them. Handled ad hoc, these tasks quickly become too complex to manage. Hence, an end-to-end workflow is necessary for LLM-based applications.

What is MLOps ?

MLOps or Machine Learning Operations is an ML engineering culture and practice that aims at unifying ML Development (Dev) and ML Operations (Ops).

It involves automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment and infrastructure management.

For example, say you want to deploy an LLM and automate the whole process: data engineering, training or tuning the model, and deploying it as an API in production. Once it is in production, you want to monitor how the model performs. An MLOps framework makes all of this much easier to manage.

MLOps FRAMEWORK

When building an ML use case, such as bringing an LLM into production, one has to go through the following steps:

MLOps framework

1. Data Ingestion
2. Data Validation
3. Data Transformation
4. Model
5. Model Analysis
6. Serving
7. Logging

Data Ingestion is all about “How do I take the data as input?”;
Data Validation answers “How do I validate this data?”, for example by checking for missing data;
Data Transformation fixes anything that is wrong in the data; it can also mean converting text data into a format suitable for an LLM;
Model Analysis covers training or tuning the model, evaluating it and understanding how it performs;
Logging is keeping track of all the metrics related to the model in production.

Earlier, building such a workflow and executing all the steps in an organized manner was done manually.
Managing it manually is time-consuming and not very efficient.
This is where ‘Automation and Orchestration’ come into play.
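To make the idea concrete, here is a minimal sketch of orchestrating the seven steps above in plain Python. Every function name and the toy data are assumptions made for illustration; a real setup would replace each stub with actual ingestion, training and serving code, typically run by a dedicated orchestration tool.

```python
# Hypothetical stand-ins for the seven framework steps.
# Each step receives a shared state dict and returns it updated.

def ingest(state):
    # Data Ingestion: toy in-memory data instead of a real source.
    state["rows"] = ["  Hello world ", "", "LLMOps is MLOps for LLMs"]
    return state

def validate(state):
    # Data Validation: count missing (empty) records.
    state["missing"] = sum(1 for row in state["rows"] if not row.strip())
    return state

def transform(state):
    # Data Transformation: drop empty rows and normalize the text.
    state["rows"] = [row.strip().lower() for row in state["rows"] if row.strip()]
    return state

def train(state):
    # Model: placeholder for training or tuning; here we only build a vocabulary.
    words = {word for row in state["rows"] for word in row.split()}
    state["model"] = {"vocab": sorted(words)}
    return state

def analyze(state):
    # Model Analysis: compute a toy metric about the "model".
    state["metrics"] = {"vocab_size": len(state["model"]["vocab"])}
    return state

def serve(state):
    # Serving: placeholder for deploying the model behind an API.
    state["served"] = True
    return state

def log_metrics(state):
    # Logging: keep a copy of the metrics we would track in production.
    state["logged_metrics"] = dict(state["metrics"])
    return state

STEPS = [ingest, validate, transform, train, analyze, serve, log_metrics]

def run_pipeline(steps):
    """Run every step in order and record which steps were executed."""
    state = {"executed": []}
    for step in steps:
        state = step(state)
        state["executed"].append(step.__name__)
    return state

result = run_pipeline(STEPS)
print(result["executed"])
print(result["logged_metrics"])
```

The point of the sketch is only that the order of the steps lives in one place (`STEPS`), so running the whole workflow is a single call rather than a manual sequence.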

How is LLMOps different from LLM System Design?

LLMOps : MLOps for LLMs

LLMOps is focused on LLM development & managing the model in production.

Examples :
- experiment on foundation models
There are a lot of models available now, like PaLM or Llama. One has to experiment with several of them to find out which one best suits the use case.
- prompt design and management,
- supervised tuning,
- monitoring,
- evaluate generative output

LLM System design

LLM system design looks at the broader design of the entire end-to-end application (front-end, back-end, data engineering).

Examples :
- Chain multiple LLMs together
- Grounding
- Track history

Let’s say we have a lot of documents to summarize, too many for the LLM to process at once, so we need to do the summarization in batches. We may need to split the work into steps and then chain the steps together.
We might also want to do Grounding, to make sure our LLM has the additional information it needs to produce relevant output. We also want to keep track of the history of previous model performance.
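The batching and chaining idea described above can be sketched as follows. This is a toy illustration, not a LangChain example: `fake_llm_summarize` is a hypothetical stand-in for a real model call (it simply keeps the first few words), and the batch size is an arbitrary assumption.

```python
def make_batches(docs, batch_size):
    """Split the documents into consecutive batches of at most batch_size."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]

def fake_llm_summarize(text, max_words=8):
    # Hypothetical stand-in for an LLM call: keeps only the first max_words words.
    return " ".join(text.split()[:max_words])

def summarize_in_batches(docs, batch_size=2):
    """Step 1: summarize each batch. Step 2: summarize the partial summaries."""
    partial = [
        fake_llm_summarize(" ".join(batch))
        for batch in make_batches(docs, batch_size)
    ]
    final = fake_llm_summarize(" ".join(partial))
    return final, partial

docs = ["a b c", "d e f", "g h i"]
final_summary, partial_summaries = summarize_in_batches(docs)
print(partial_summaries)
print(final_summary)
```

The second call consumes the output of the first, which is the chaining step: each batch stays small enough for the model, and only the short partial summaries are combined at the end.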

LLMOps vs LLM system design

LLM Driven application

Let’s look at a high-level example of an LLM-driven application.
Everything starts with a User Interface, i.e.
“How does a user interact with the application?”
The user provides an input, and the input goes to the backend.

How does the backend process it? “What happens behind the scenes?”

The user input is first sent for Pre-Processing.
Let’s take the same summarization example: there is a huge amount of data to summarize, so pre-processing splits it into smaller chunks that the model can handle.
After pre-processing comes Grounding, i.e. grounding your LLM with facts. These facts can be included in your prompts, and the prompts then go into the model.
At first, any Foundation model can be used. When you get the response from the model, grounding can be done again to check the response against the facts that we have. This step can be repeated several times to get more relevant responses.
Once all the Grounding is done, we might want to do some Post-processing & Responsible AI, where you clean up the response and give it a user-friendly structure. You can also check the LLM’s response for toxicity or bias.
Once we are satisfied with the output, we can send it to the user as the Final output.
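The flow above can be sketched end to end as a chain of small functions. Everything here is an assumption made for illustration: the fact store `FACTS`, the word list `BLOCKLIST`, the chunk size, and `fake_model`, which stands in for a real foundation model by echoing the text it was given. The sketch also does a single grounding pass only, without the repeated check of the response against the facts.

```python
# Hypothetical fact store and blocklist, invented for this sketch.
FACTS = {"llmops": "LLMOps is MLOps applied to large language models."}
BLOCKLIST = {"stupid"}

def preprocess(user_input, chunk_words=50):
    """Pre-processing: split the input into chunks of at most chunk_words words."""
    words = user_input.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]

def ground(chunk):
    """Grounding: prepend any stored facts whose key appears in the chunk."""
    facts = [fact for key, fact in FACTS.items() if key in chunk.lower()]
    return "Facts: " + " ".join(facts) + "\nText: " + chunk

def fake_model(prompt):
    # Hypothetical stand-in for a foundation model: returns the text part of the prompt.
    return prompt.split("Text: ", 1)[1].strip()

def postprocess(response):
    """Post-processing & Responsible AI: tidy whitespace and flag blocklisted words."""
    cleaned = " ".join(response.split())
    flagged = any(word.strip(".,!?").lower() in BLOCKLIST for word in cleaned.split())
    return {"text": cleaned, "flagged": flagged}

def handle_request(user_input):
    """User input -> pre-processing -> grounding -> model -> post-processing."""
    return [postprocess(fake_model(ground(chunk))) for chunk in preprocess(user_input)]

print(handle_request("What   is LLMOps?"))
print(handle_request("That is stupid!"))
```

A real application would replace the blocklist with a proper toxicity or bias check, but the shape of the backend stays the same: each stage takes the previous stage's output.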

LLM Driven Application

Model Customization
In some cases, we can also do Model Customization. Model customization has three steps: data preparation, tuning, and evaluating and monitoring the performance. This is an iterative process. Once the final fine-tuned model is ready, it can be deployed alongside the foundation models and used in your LLM-driven application.

LLMOps Pipeline

  1. Data preparation & versioning
    Everything starts with data. So, we start by preparing our datasets and recording the version of each one, so that we can keep track of the datasets created.
  2. Pipeline design
    Next, we will design a pipeline that will do supervised tuning on LLMs for us.
  3. Artifact
    Generate an artifact that holds the Configuration and the Workflow. The Workflow describes all the steps of our pipeline. The Configuration holds the parameters we want to use to execute the workflow.
    For example: which dataset was used for the supervised fine-tuning?
  4. Pipeline execution
  5. Deploy LLM
  6. Prompting & Prediction
  7. Responsible AI

Once we have generated the artifact, we can execute the pipeline. The pipeline first deploys an LLM and then runs the prompting and prediction steps. Once we have the LLM’s response, we can use Responsible AI to check its safety using safety scores.
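As a rough illustration of steps 1 and 3, the sketch below versions a dataset by hashing its contents and stores that version in an artifact next to the workflow and configuration. The artifact layout, the step names, the model name and the learning rate are all assumptions made for this example, not the format of any particular pipeline tool.

```python
import hashlib
import json

def dataset_version(records):
    """Derive a short, reproducible version id from the dataset contents."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]

def build_artifact(records, base_model, learning_rate):
    """Bundle the workflow (the steps) with the configuration (the parameters)."""
    return {
        # Hypothetical step names mirroring the pipeline list above.
        "workflow": [
            "prepare_data",
            "supervised_tuning",
            "deploy_llm",
            "prompt_and_predict",
            "responsible_ai_check",
        ],
        "config": {
            "dataset_version": dataset_version(records),
            "base_model": base_model,
            "learning_rate": learning_rate,
        },
    }

# Toy tuning data; the model name below is a placeholder, not a real model id.
records = [{"prompt": "Summarize: LLMOps extends MLOps.", "response": "LLMOps extends MLOps."}]
artifact = build_artifact(records, base_model="hypothetical-foundation-model", learning_rate=1e-4)
print(json.dumps(artifact, indent=2))
```

Because the version id is computed from the data itself, the artifact can answer the question “which dataset was used for this tuning run?” even after the dataset has changed.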

LLMOps pipeline

The two important topics for LLMOps are Orchestration and Automation. Orchestration provides a structured way to oversee and synchronize the functions of LLMs, so that they integrate smoothly into a larger AI system.
Automation is all about how we can automate our pipeline, to make our lives easier as developers.

This LLMOps pipeline is deliberately simple; depending on your use case, the diagram might change.
Other important LLMOps topics include prompt design and management, model evaluation and monitoring, and testing.
