Building a Chat Application with Chainlit and Mistral 7B on CPU

This guide delves into the nuances of Mistral 7B and Chainlit, exploring their capabilities and demonstrating how they can be harnessed to build an interactive chat application.


As AI continues to evolve, so does the potential to create more intuitive and intelligent chat applications that mimic human-like interactions.

One avenue to achieving such sophisticated chat applications is through the integration of advanced language models and interactive development libraries. Mistral 7B, a high-performance language model, coupled with Chainlit, a library designed for building chat applications, exemplifies a powerful combination of technologies capable of creating compelling chat apps.


Through a step-by-step approach, we'll uncover the process of designing the user interface (UI), setting up backend processing, and ensuring seamless integration for a fully functional chat app.

Whether you are a seasoned developer or a curious enthusiast, this guide aims to provide a comprehensive understanding and a practical pathway to building your own chat app with Mistral 7B and Chainlit.

This will be your completed chat application:


Mistral 7B running on a CPU

What is Mistral 7B?

Mistral 7B is a notable name in the realm of language models, built to provide high-performance results for a variety of tasks. It's a product of significant advancements in machine learning and natural language processing technologies, embodying the evolution of conversational AI.

At its core, Mistral 7B is engineered to understand and generate human-like text, making it a powerful tool for numerous applications. One of its notable applications is in the creation of chat applications, which can be fine-tuned to provide nuanced and coherent responses in real-time conversations.

Furthermore, Mistral 7B stands out with its fine-tuning ability, allowing developers to tailor its capabilities to specific domains or tasks. This fine-tuning ability makes it a flexible choice for developers leveraging cutting-edge language model technology for various projects.

The application of Mistral 7B extends beyond chat applications. It can be utilized for a range of NLP tasks including but not limited to sentiment analysis, text summarization, and question-answering. The model's versatility and high-performance nature make it a robust choice for developers and organizations aiming to integrate AI-powered solutions into their operations or offerings.

In a comparative light, Mistral 7B showcases commendable performance, often being compared to other high-caliber models like Llama 2–13B. Its capability to handle complex conversational scenarios makes it a preferred choice for those looking to build sophisticated chat applications, ensuring users have an engaging and coherent interaction experience.

Best of all, it can run on modern CPUs with very acceptable performance.

What is Chainlit?

Chainlit is an innovative library tailored for developers aiming to build chat applications swiftly and efficiently.

This library is particularly designed to work seamlessly with large language models (LLMs), providing a conducive environment for developing interactive chat applications.

Here's a more detailed breakdown of Chainlit:

Ease of Use

Chainlit is a Python library whose user interface is built on top of React, so developers write plain Python while still getting a polished web UI. It abstracts much of the complexity associated with handling LLMs, allowing developers to focus on creating interactive and engaging chatbot experiences.

Interactive UI Components

The library provides a range of UI components such as text boxes, buttons, and dropdown menus which are essential in building interactive chat interfaces. It supports custom styling, enabling developers to design and brand their chat applications according to their preferences.

Integration with LLMs

Chainlit is engineered to work harmoniously with large language models, providing a streamlined process for integrating these models into chat applications. This feature is crucial for developers looking to harness the power of modern language models like Mistral 7B in their chat app projects.

Community and Resources

There's a growing community around Chainlit, with a repository of example projects and tutorials available for developers. The Chainlit Cookbook and community tutorials provide a great starting point for those looking to explore the capabilities of Chainlit in creating LLM apps.

Comparison with Similar Tools

Compared to other libraries like Streamlit, Chainlit is specifically tailored for chat applications, offering a more specialized set of tools and features for developers in this domain.

In summary, Chainlit is a robust and developer-friendly library that significantly eases the process of creating, designing, and deploying chat applications, especially those powered by large language models.

Its growing community and wealth of resources further support developers in exploring and realizing the potential of conversational AI in modern-day communication solutions.

Building the Chat App

Building a chat application, especially one that is powered by advanced language models and provides a rich user interface, involves a combination of selecting the right technologies, designing a user-friendly interface, and ensuring the backend processing is efficient and reliable.

Here is a step-by-step breakdown of how one might go about building a chat app using Mistral 7B and Chainlit.

Install Dependencies

You will start by installing the necessary dependencies:

pip install chainlit ctransformers

Creating the App

Typically a Chainlit app is contained in a single Python file, commonly named app.py, so now you can create that file and start with the imports:

import os
import chainlit as cl
from ctransformers import AutoModelForCausalLM

There are two important functions in this file: one that loads the model and one that responds to user messages.

Chainlit uses decorators to define and run the necessary actions. If you need a refresher on Python decorators, see my previous article on the topic.
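As a quick, Chainlit-independent illustration: a decorator is just a function that wraps another function, which is how Chainlit registers your handlers. A minimal sketch (the names logged and greet are my own):

```python
# A minimal decorator: wraps a function and runs extra code around each call
def logged(func):
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper

@logged
def greet(name):
    return f"Hello, {name}!"

result = greet("world")  # prints "calling greet" before returning
```

Chainlit's decorators work the same way, except that instead of wrapping the call they register your function so the framework can invoke it at the right moment.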

The first decorator that you will use is called @cl.on_chat_start. The method that is decorated will be run when a Chainlit chat starts and allows for setting up models and all necessary data.

In this case, you will define and initialize the model here:


# Runs when the chat starts
@cl.on_chat_start
def main():
    # Create the llm (the exact model_file and max_new_tokens value
    # shown here are illustrative -- adjust to the quantization you use)
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        temperature=0.7,
        gpu_layers=0,                     # run fully on CPU
        stream=True,
        threads=int(os.cpu_count() / 2),  # half of the CPU cores
        max_new_tokens=4096,
    )

    # Store the llm in the user session
    cl.user_session.set("llm", llm)

Let's break down the code:

  • First, the LLM is created. We use CTransformers so the model can run on the CPU, loading a model from 'TheBloke' on Hugging Face.
  • We specify a medium-sized quantization (Q4_K_M), a temperature of 0.7, no GPU layers (running on CPU), streaming, half of the CPU cores for the threads, and the maximum number of returned tokens. The model is also downloaded the first time this runs.
  • Then we store that LLM in the user session (provided by Chainlit) so that the cached model can be used in the response method next.
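You can check what thread count this yields on your own machine with a quick snippet; the max(1, ...) guard is my addition, in case os.cpu_count() returns None or a very small number:

```python
import os

# Half of the logical CPU cores, mirroring threads=int(os.cpu_count() / 2)
# in the from_pretrained call above; guard against tiny or unknown core counts
threads = max(1, int((os.cpu_count() or 1) / 2))
print(threads)
```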

Next, you will define the method for handling a user message. This is done with the decorator @cl.on_message, which runs when a message is sent:


# Runs when a message is sent
@cl.on_message
async def main(message: cl.Message):
    # Retrieve the llm from the user session
    llm = cl.user_session.get("llm")

    # Create an empty message that will receive the streamed output
    msg = cl.Message(content="")

    # Mistral 7B Instruct expects the prompt wrapped in [INST] tags
    prompt = f"[INST]{message.content}[/INST]"
    for text in llm(prompt=prompt):
        await msg.stream_token(text)

    await msg.send()

Let's break down this code:

  • First, we retrieve the llm variable from the user session.
  • Then, we create a return message (msg) with empty content. This will receive the streamed output of the model.
  • Specifically for Mistral 7B, the prompt needs to start and end with [INST] tags, so the prompt is formatted accordingly.
  • Then the LLM is invoked in the for loop, which places each returned token in the text variable. Each token is streamed to Chainlit with the stream_token method.
  • Lastly, the message is closed with the send method.
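The [INST] wrapping from the third bullet can be isolated into a tiny helper for clarity; the function name is my own, but the tag format is what Mistral 7B Instruct expects:

```python
def format_mistral_prompt(user_message: str) -> str:
    # Mistral 7B Instruct expects the user turn wrapped in [INST] ... [/INST]
    return f"[INST]{user_message}[/INST]"

print(format_mistral_prompt("What is Chainlit?"))
# [INST]What is Chainlit?[/INST]
```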

And that is all the code that is required to build the chat app. You can now run the Chainlit app and test it with your CPU.

Running the Chat app

Running a Chainlit app is as easy as a simple run command:

chainlit run app.py

This will start a webserver and open the app automatically in your browser, by default the URL is http://localhost:8000:

Chainlit app default home page

Let's see the Mistral 7B model in action running in (almost) real-time on a CPU:


Mistral 7B running on a CPU

Depending on your CPU it might take a bit longer or less time to run.

For reference, I was running this code on a Windows 11 virtual machine with 8 cores and 16 GB of RAM. The processor is an AMD Ryzen 7 3700X (which has 8 cores and 16 threads, of which 8 logical cores were assigned to this VM).


Conclusion

The realm of digital communication has witnessed a significant transformation with the advent of AI-powered chat applications.

The synergy of Mistral 7B and Chainlit paves the way for creating intuitive and engaging chat interfaces that not only cater to real-time communication needs but also elevate the user experience to a level that mirrors human-like interactions.

Throughout this guide, we've navigated the key aspects of Mistral 7B and Chainlit and delineated a step-by-step approach to building a chat app that embodies modern-day communication standards.

Whether you are a developer aiming to create a sophisticated chat app or an organization looking to enhance customer engagement, the marriage of Mistral 7B and Chainlit provides a robust foundation to build upon.

As you explore and leverage these technologies, you are not just creating a chat app, but contributing to the broader narrative of enhancing digital communication in a rapidly evolving digital landscape.

Full source code available at:

Thank you for reading and I will see you on the Internet.


If you like my free articles and would want to support my work, consider buying me a coffee: