How to summarize articles with Streamlit and LangChain with Mistral 7B on CPU

In this article, we will guide you through the steps of combining Streamlit's simplicity with the summarization prowess of Mistral 7B to create a powerful tool for distilling long articles into their core messages.

Image by Author

In the age of information overload, the ability to quickly understand the essence of a text can be an invaluable skill.

This is where the art of summarization comes into play, and it has been revolutionized by AI tools like Mistral 7B and LangChain.

But how can one harness this technology in a user-friendly way? Enter Streamlit, a game-changer for building data applications.

In this article, we will guide you through the steps of combining Streamlit's simplicity with the summarization prowess of Mistral 7B powered by LangChain orchestration to create a powerful tool for distilling long articles into their core messages.

Whether you're a busy professional, a student, or just someone with a voracious appetite for knowledge, this tutorial will empower you to stay informed and efficient.

Prepare to unlock the potential of AI-assisted reading without the need for deep technical know-how.

What is Streamlit?

Streamlit was born out of the need to simplify the transition from data scripts to interactive web applications.

It was developed by a group of AI researchers who recognized that the tools for building custom web apps were too complex and time-consuming for data scientists.

Streamlit's development began with the idea that turning data scripts into web apps should not be more difficult than writing the scripts themselves.

Since its inception, Streamlit has rapidly gained popularity in the data science community for its ease of use and speed of development.

Key features of Streamlit include:

  • No Front-end Experience Required: Streamlit allows developers to create applications with only Python knowledge, eliminating the need for HTML, CSS, or JavaScript expertise.
  • Rapid Prototyping: Changes to the code are automatically reflected in the app, enabling real-time app updates without the need for a refresh.
  • Component Library: A rich set of widgets and components, including sliders, buttons, and text inputs, that can be easily integrated with minimal code.
  • Data Caching: Streamlit's caching mechanism speeds up data loading and processing, making apps more efficient.
  • Custom Components: Developers can create custom components or use community-built components to extend functionality.
  • Deployment Ease: Streamlit apps can be quickly deployed on various platforms, allowing for easy sharing and collaboration.

What is Mistral 7B?

Mistral 7B is a Large Language Model (LLM) with 7 billion parameters, designed for generating text and performing various natural language processing tasks.

It is notable for its performance and efficiency: it outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code generation.

The architecture of Mistral 7B leverages grouped-query attention for faster inference and sliding window attention to handle long sequences at a reduced inference cost, making it suitable for real-time applications that require quick responses.

This allows Mistral 7B to extract key information from texts and understand the context and nuances of the language used.
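To make the sliding window idea more concrete, here is a minimal sketch in plain Python of the attention mask it implies. The function name and the tiny window size are illustrative assumptions, not Mistral's actual implementation (the real model uses a much larger window and optimized kernels). The only point is that each token attends to itself and to a fixed number of the most recent tokens before it.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    The mask is causal (j <= i) and limited to the last `window`
    positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print one row per token: "x" marks positions the token can attend to.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Because every row contains at most `window` marked positions, the cost of attention per token stays bounded no matter how long the text grows.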

Advantages of using Mistral 7B for summarization include:

  • Contextual Understanding: Mistral 7B can grasp the context of the content, ensuring that summaries are coherent and capture the essence of the original text.
  • Brevity and Relevance: The instruction-tuned model can be prompted to identify and condense the most relevant information, producing succinct summaries that save time for the reader.
  • Scalability: Mistral 7B can handle a variety of documents, from short news articles to lengthy research papers, making it versatile.
  • Language Comprehension: The AI has a broad understanding of language and can summarize content in a way that is accessible to non-expert readers.
  • Adaptability: Mistral 7B does not learn from its interactions on its own, but its openly released weights can be fine-tuned on domain-specific data to provide better summaries.

Streamlit and Mistral 7B are powerful on their own. Still, when combined, they offer a compelling solution for anyone looking to streamline the process of summarizing and digesting large volumes of text.

What is LangChain?

LangChain is a versatile open-source framework designed to facilitate the development of applications that utilize large language models (LLMs).

It's tailored to simplify the creation of generative AI application interfaces, making it easier for developers to create advanced natural language processing (NLP) applications.

LangChain supports a range of uses, from chatbots and Generative Question-Answering (GQA) to document analysis and summarization.

The core idea is to "chain" together different components to create advanced use cases for LLMs, allowing for a modular approach to building applications.

These chains can consist of multiple components from several modules, creating end-to-end solutions for frequently encountered applications. It's designed to be simple to use, supporting a wide array of LLM-powered applications and encouraging contributions from a large user and contributor community.
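As a rough illustration of the chaining idea, the sketch below composes three plain Python functions into a single pipeline. Nothing here is LangChain's API: `make_chain`, `build_prompt`, `fake_llm`, and `clean_output` are hypothetical stand-ins, and the "model" simply echoes part of its prompt so the example runs without any dependencies.

```python
from functools import reduce
from typing import Callable

Step = Callable[[str], str]


def make_chain(*steps: Step) -> Step:
    """Compose steps left to right: each step receives the previous output."""
    return lambda text: reduce(lambda value, step: step(value), steps, text)


def build_prompt(article: str) -> str:
    # Hypothetical prompt template.
    return f"Summarize the following text:\n{article}"


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call: returns the last prompt line in upper case.
    return prompt.splitlines()[-1].upper()


def clean_output(raw: str) -> str:
    # Post-processing step: trim surrounding whitespace.
    return raw.strip()


summarize = make_chain(build_prompt, fake_llm, clean_output)

if __name__ == "__main__":
    print(summarize("  streamlit makes data apps easy  "))
```

Each step only knows about its own input and output, which is what makes it easy to swap a component (for example, a different prompt or model) without touching the rest of the pipeline.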

In summary, LangChain is a powerful tool for developers looking to harness the capabilities of large language models in creating context-aware, intelligent applications that can analyze, summarize, and interact using natural language.

Building the Application

To build your application, only two files are required: one contains the summarization logic with LangChain and Mistral 7B, and the other contains the user interface built with Streamlit.

First, install the necessary requirements. Assuming you have created a new Python project and set up a virtual environment, run the command:

pip install streamlit langchain beautifulsoup4 ctransformers transformers newspaper3k

Here's what each package is used for:

  • streamlit: This is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.
  • langchain: LangChain is a Python framework for developing applications powered by language models, which simplifies the integration of Large Language Models (LLMs) into various applications.
  • beautifulsoup4: BeautifulSoup is a Python library for parsing HTML and XML documents. It's commonly used for web scraping, which is the process of extracting information from websites.
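  • ctransformers and transformers: ctransformers provides Python bindings for transformer models implemented in C/C++ with the GGML library, which is what lets quantized models such as Mistral 7B run on the CPU. transformers is the Hugging Face library that provides general-purpose architectures for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with thousands of pre-trained models in 100+ languages.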
  • newspaper3k: This package is used for extracting and parsing newspaper articles. It's useful for web scraping purposes, allowing for easy article retrieval and content curation.

For the model, you will use a quantized GGUF build of Mistral 7B Instruct published by TheBloke, which is optimized to run on the CPU, hence the use of ctransformers and transformers.


Let's start with the summarization logic file by creating a new file called summarizer.py (the module name the UI file imports from):

import os
import time
from langchain.chains import MapReduceDocumentsChain, LLMChain, ReduceDocumentsChain, StuffDocumentsChain
from langchain.document_loaders import NewsURLLoader
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

def summarize_article(article_url):
    # Load article
    loader = NewsURLLoader([article_url])
    docs = loader.load()

    # Load LLM; 'threads' is set to the CPU count of the system
    config = {'max_new_tokens': 4096, 'temperature': 0.7, 'context_length': 4096,
              'threads': os.cpu_count() or 1}
    # Assumption: no `model_file` is given here; pass one to select a
    # specific GGUF file from the repository.
    llm = CTransformers(model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
                        config=config)

    # Map template and chain; {docs} receives one chunk of the article
    map_template = """<s>[INST] The following is a part of an article:
    {docs}
    Based on this, please identify the main points. 
    Answer:  [/INST] </s>"""
    map_prompt = PromptTemplate.from_template(map_template)
    map_chain = LLMChain(llm=llm, prompt=map_prompt)

    # Reduce template and chain; {doc_summaries} receives the mapped summaries
    reduce_template = """<s>[INST] The following is a set of summaries from the article:
    {doc_summaries}
    Take these and distill them into a final, consolidated summary of the main points. 
    Construct it as a well organized summary of the main points and should be between 3 and 5 paragraphs.
    Answer:  [/INST] </s>"""
    reduce_prompt = PromptTemplate.from_template(reduce_template)
    reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

    # Takes a list of documents, combines them into a single string, and passes this to an LLMChain
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="doc_summaries"
    )

    # Combines and iteratively reduces the mapped documents
    reduce_documents_chain = ReduceDocumentsChain(
        # This is final chain that is called.
        combine_documents_chain=combine_documents_chain,
        # If documents exceed context for `StuffDocumentsChain`
        collapse_documents_chain=combine_documents_chain,
        # The maximum number of tokens to group documents into.
        # Assumed value; keep it below context_length.
        token_max=4000,
    )

    # Combining documents by mapping a chain over them, then combining results
    map_reduce_chain = MapReduceDocumentsChain(
        # Map chain
        llm_chain=map_chain,
        # Reduce chain
        reduce_documents_chain=reduce_documents_chain,
        # The variable name in the llm_chain to put the documents in
        document_variable_name="docs",
        # Return the results of the map steps in the output
        return_intermediate_steps=False,
    )

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=4000, chunk_overlap=0
    )
    split_docs = text_splitter.split_documents(docs)

    # Run the chain
    start_time = time.time()
    result = map_reduce_chain(split_docs, return_only_outputs=True)
    time_taken = time.time() - start_time
    return result['output_text'], time_taken

This code defines a function summarize_article that takes an article_url as input and uses the LangChain framework to generate a summary of the article.

Here's a step-by-step description of the process:

  1. Load the article: The NewsURLLoader is instantiated with the article_url (as a list), and it loads the document(s) from the web.
  2. Configure and load the language model: It sets up a configuration for the LLM, specifying parameters like max_new_tokens, temperature, and context_length. Then, it loads the CTransformers language model with the specified model and configuration, setting the number of threads to the CPU count of the system for parallel processing.
  3. Map phase with template and chain: It defines a map_template to instruct the LLM to identify the main points from the article parts. This template is turned into a PromptTemplate, and then an LLMChain is set up using the LLM and the prompt template.
  4. Reduce phase with template and chain: Similarly, a reduce_template is defined to instruct the LLM to distill the set of summaries into a final, consolidated summary. A PromptTemplate is created from this template, and a new LLMChain is set up for the reduction phase.
  5. Combine documents chain: The StuffDocumentsChain is used to take a list of document summaries and combine them into a single string for the final reduction phase.
  6. Reduce documents chain: The ReduceDocumentsChain is set up to iteratively reduce the mapped documents into a single, concise summary. It uses combine_documents_chain for this process and specifies a token_max for grouping documents.
  7. Map-reduce chain: The MapReduceDocumentsChain is configured with the map chain and the reduce chain to process the documents. It maps a chain over the documents and then combines the results.
  8. Split the documents: The code uses a RecursiveCharacterTextSplitter to split the documents into chunks of a specified size without overlap.
  9. Run the chain: It then runs the map-reduce chain on the split documents and measures the time taken for the operation.
  10. Return the result: The function finally returns the consolidated summary produced by the reduce chain and the execution time.
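The grouping mentioned in step 6 can be pictured with a small sketch: partial summaries are packed into groups so that each group stays within a token budget before it is reduced. This is not LangChain's implementation; `group_by_budget` is a hypothetical helper, and counting one token per whitespace-separated word is a deliberately crude assumption (a real tokenizer would count differently).

```python
def group_by_budget(texts: list[str], token_max: int) -> list[list[str]]:
    """Greedily pack texts into groups that stay within an approximate token budget.

    A single text longer than the budget still gets a group of its own.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    current_tokens = 0
    for text in texts:
        n_tokens = len(text.split())  # crude assumption: one token per word
        # Start a new group when adding this text would exceed the budget.
        if current and current_tokens + n_tokens > token_max:
            groups.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n_tokens
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    summaries = ["a b c", "d e", "f g h i", "j"]
    print(group_by_budget(summaries, token_max=5))
```

Each resulting group would then be summarized on its own, and the process repeats until everything fits into one final prompt.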

This function essentially breaks down the summarization task into smaller parts using the map-reduce paradigm, processes each part with the LLM, and then combines the results into a final summary.
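If the map-reduce paradigm is new to you, the following dependency-free sketch shows its shape. The two helper functions are hypothetical stand-ins for the LLM calls: the "map" step keeps the first sentence of each chunk and the "reduce" step joins the partial results, so the example runs without a model.

```python
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[list[str]], str],
) -> str:
    """Summarize each chunk independently (map), then merge the results (reduce)."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    return reduce_fn(partial_summaries)


def first_sentence(chunk: str) -> str:
    # Stand-in for the map LLM call: keep only the first sentence.
    return chunk.split(".")[0].strip() + "."


def join_summaries(summaries: list[str]) -> str:
    # Stand-in for the reduce LLM call: concatenate the partial summaries.
    return " ".join(summaries)


if __name__ == "__main__":
    chunks = [
        "Streamlit builds web apps. It needs only Python.",
        "Mistral 7B is a language model. It has 7 billion parameters.",
    ]
    print(map_reduce_summarize(chunks, first_sentence, join_summaries))
```

In the real application, both stand-ins are replaced by prompts sent to Mistral 7B, but the flow of data is the same.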

This implementation can handle large documents by splitting them into smaller chunks and processing each chunk separately, which keeps every prompt within the model's context window and helps with memory management.
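To see what splitting into chunks means in practice, here is a simplified sketch. Unlike RecursiveCharacterTextSplitter, which tries to break on paragraph and sentence boundaries, this hypothetical `split_into_chunks` helper cuts at fixed character offsets, so treat it as an illustration of the idea rather than a replacement.

```python
def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into pieces of at most chunk_size characters.

    Consecutive pieces share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", chunk_size=4))
```

With no overlap, joining the chunks gives back the original text, and a call such as `split_into_chunks(article_text, chunk_size=4000)` mirrors the chunk settings used in this article.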

User Interface

Let's now focus on creating the UI of your application with Streamlit by creating a second file. This will be the main file, the one you run to start the application:

import streamlit as st
from summarizer import summarize_article

# Set page title
st.set_page_config(page_title="Article Summarizer", page_icon="📜", layout="wide")

# Set title
st.title("Article Summarizer", anchor=False)
st.header("Summarize Articles with AI", anchor=False)

# Input URL
st.divider()
url = st.text_input("Enter Article URL", value="")

# Summarize the article
if url:
    with st.status("Processing...", state="running", expanded=True) as status:
        st.write("Summarizing Article...")
        summary, time_taken = summarize_article(url)
        status.update(label=f"Finished - Time Taken: {time_taken:.0f} seconds", state="complete")

    # Show Summary
    st.subheader("Summary:", anchor=False)
    st.write(summary)

Here's a breakdown of what each part of the code is doing:

  1. Streamlit App Configuration:
    • st.set_page_config sets the configuration for the Streamlit page, specifying the page title ("Article Summarizer"), an emoji icon ("📜") for the browser tab, and sets the page layout to "wide".
  2. App Title and Header:
    • st.title and st.header are used to set the title and header of the web app, which are "Article Summarizer" and "Summarize Articles with AI" respectively.
  3. User Input for Article URL:
    • st.divider() adds a visual divider in the UI.
    • st.text_input creates a text input field where users can enter the URL of the article they want to summarize. The field is initialized with an empty default value.
  4. Processing and Summarization:
    • If a URL is provided, the app displays a status message "Processing..." using st.status to inform the user that the article is being processed.
    • Inside the with block, it writes the message "Summarizing Article..." to the app.
    • The provided URL is then passed to the summarize_article function, which returns a summary and the time taken to process the article.
    • Once the summarization is complete, the status message is updated to "Finished" along with the time taken to generate the summary.
  5. Displaying the Summary:
    • After processing, a subheader "Summary:" is added to the UI.
    • The summary returned from the summarize_article function is then written to the page so that the user can read it.

The app is straightforward, user-friendly, interactive, and provides real-time feedback to users during the summarization process.
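One thing you may want to add is a quick check of the user's input before starting a summarization that can take a minute or more. The sketch below uses only the standard library; `is_valid_article_url` is a hypothetical helper, and it only checks the shape of the URL, not whether the page exists or contains an article.

```python
from urllib.parse import urlparse


def is_valid_article_url(url: str) -> bool:
    """Return True for http(s) URLs that include a host name."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, such as unbalanced brackets.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ["https://example.com/news/story", "not a url", "ftp://example.com/file"]:
        print(candidate, is_valid_article_url(candidate))
```

In the Streamlit script, the condition could become `if url and is_valid_article_url(url):`, with `st.error` showing a message when the check fails.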

Testing the Application

To test the application, all you need to do is run the Streamlit application by passing the name of your UI file to the command:

streamlit run <your_ui_file>.py

This will start the Streamlit application server and automatically open the browser with the following page open:


Streamlit application running

You can input an article URL and the application will take care of the rest. Keep in mind that on the first run, the Mistral 7B model needs to be downloaded, so the first summary will take noticeably longer to process.

Depending on your CPU and the article, the summarization can take 1 to 2 minutes. In this example, it took 109 seconds (1 minute and 49 seconds).
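The application reports the elapsed time in seconds. If you prefer the minutes-and-seconds form used above, a small helper can do the conversion. `format_duration` is a hypothetical name, not part of the original code.

```python
def format_duration(seconds: float) -> str:
    """Format a duration in seconds as 'M minute(s) and S second(s)'."""
    minutes, remaining = divmod(round(seconds), 60)
    minute_label = "minute" if minutes == 1 else "minutes"
    second_label = "second" if remaining == 1 else "seconds"
    return f"{minutes} {minute_label} and {remaining} {second_label}"


if __name__ == "__main__":
    print(format_duration(109))
```

For the run in this article, `format_duration(109)` gives "1 minute and 49 seconds", since 109 = 1 × 60 + 49.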


In conclusion, the combination of Streamlit, LangChain, Mistral 7B, and supporting Python libraries such as BeautifulSoup4 and Newspaper3k provides a practical stack for AI-driven content summarization.

Streamlit offers an intuitive platform for deploying interactive web applications, while LangChain provides the necessary framework to leverage large language models effectively.

The inclusion of BeautifulSoup4 and Newspaper3k for data extraction demonstrates a powerful stack for web scraping and content processing.

This technological synergy offers developers and data scientists the tools to create sophisticated applications capable of transforming vast amounts of information into succinct, digestible summaries.

The potential applications are vast, from aiding research to enhancing content curation, making this combination a valuable asset in the information-heavy digital landscape.

Full source code available at:

Thank you for reading and I will see you on the Internet.
