Building a Gradio Voice-to-Voice Translator App Without API Keys

In a globalized world, overcoming language barriers matters more than ever.

A voice translation app can help with this. The one described here is built with Gradio, a Python library that makes it easier to create interactive web interfaces.

The app can change speech into text, translate it, and then turn it back into speech in different languages. This makes it easier for people from different language backgrounds to communicate with each other.

How It Works

The Voice-to-Voice Translator app chains speech recognition, translation, and text-to-speech models: it converts spoken words into text, translates that text, and speaks the result in another language.

Here's a simple explanation of how it works:

Voice Input: The user speaks into the app using a microphone.
Speech Recognition: The app uses speech recognition technology to convert spoken words into written text.
Translation: The app then uses a translation model to convert the text into the desired language.
Voice Output: Finally, the app uses text-to-speech technology to convert the translated text into spoken words, so the user can hear the translation.
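
The four steps above can be sketched as one function that chains three pluggable stages. This is a minimal sketch, not the app's actual wiring: `run_pipeline` and its stage parameters are hypothetical names, and the lambdas in the usage example are stand-ins for the real speech recognition, translation, and text-to-speech calls.

```python
from typing import Callable, Dict, Iterable


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    target_langs: Iterable[str],
) -> Dict[str, dict]:
    """Chain the stages: audio -> text -> translated text -> audio per language."""
    # Voice input and speech recognition: recorded audio becomes text
    text = transcribe(audio_path)
    results = {}
    for lang in target_langs:
        # Translation: text becomes text in the target language
        translated = translate(text, lang)
        # Voice output: translated text becomes an audio file path
        results[lang] = {"text": translated, "audio": synthesize(translated, lang)}
    return results


# Usage with stand-in stages (hypothetical stubs, no models involved)
demo = run_pipeline(
    "hello.wav",
    transcribe=lambda path: "hello",
    translate=lambda text, lang: f"[{lang}] {text}",
    synthesize=lambda text, lang: f"{lang}.mp3",
    target_langs=["pt", "es"],
)
print(demo["pt"])
```

Keeping the stages as plain callables makes each one replaceable and testable on its own, which is the same separation the sections below follow.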

Core Features Implementation

Let's look at the code behind each core feature.

Transcription

The transcription functionality leverages the WhisperModel from the faster_whisper library. This model is configured to operate on a CPU:

from faster_whisper import WhisperModel

# Transcribe the audio and return a (text, error) pair
def transcribe_audio(audio_file):
    output_text = ""
    try:
        # Load the model
        model_size = "small"
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        # Transcribe the audio
        segments, info = model.transcribe(audio_file, beam_size=5)
        # Combine the segments into a single string
        for segment in segments:
            output_text += segment.text + " "
        # Return the transcribed text
        return output_text, None
    except Exception as e:
        return None, str(e)

Here's a breakdown of what the code does:

It loads a speech recognition model using the WhisperModel class from the faster_whisper library. The model runs on the CPU with the int8 compute type, and the model size is set to "small".
The transcribe method of the model is called with the input audio file and a beam size of 5. It returns a generator of segments along with some additional information about the transcription; the audio is actually transcribed as the segments are iterated.
The function loops through the segments and appends each segment's text to the output_text string, separated by a space.
If the transcription is successful, the function returns the output_text string and None. If an exception is raised, it returns None and the error message instead.
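
Because transcribe_audio builds a new WhisperModel on every call, each request pays the model-loading cost again. One possible refinement, sketched here under assumptions, is to cache the loaded model and join the segment texts in one pass. `get_model`, `join_segments`, and `transcribe_audio_cached` are hypothetical helpers, not part of the original app; the faster_whisper import is deferred so it only runs when a model is first requested.

```python
from functools import lru_cache
from types import SimpleNamespace


@lru_cache(maxsize=1)
def get_model(model_size: str = "small"):
    """Load the Whisper model once and reuse it across calls."""
    # Deferred import: faster_whisper is only needed when a model is requested
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device="cpu", compute_type="int8")


def join_segments(segments) -> str:
    """Join segment texts with single spaces, trimming stray whitespace."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_audio_cached(audio_file):
    """Same (text, error) contract as transcribe_audio, with a cached model."""
    try:
        segments, _info = get_model().transcribe(audio_file, beam_size=5)
        # Iterating the generator happens inside the try block, so errors
        # raised during the actual transcription are caught as well
        return join_segments(segments), None
    except Exception as e:
        return None, str(e)


# join_segments works on any iterable of objects with a .text attribute
fake_segments = (SimpleNamespace(text=t) for t in [" Hello there.", " How are you?"])
print(join_segments(fake_segments))
```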

Translation

Once the audio is transcribed, the app translates the text into four languages: Portuguese, Spanish, French, and German. This is achieved using the Translator class from the translate library:

from translate import Translator

# Translate the text and return the four translations plus an error slot
def translate_transcription(text):
    try:
        # Translate the text to Portuguese
        pt_translator = Translator(from_lang="en", to_lang="pt")
        pt_translation = pt_translator.translate(text)

        # Translate the text to Spanish
        sp_translator = Translator(from_lang="en", to_lang="es")
        sp_translation = sp_translator.translate(text)

        # Translate the text to French
        fr_translator = Translator(from_lang="en", to_lang="fr")
        fr_translation = fr_translator.translate(text)

        # Translate the text to German
        de_translator = Translator(from_lang="en", to_lang="de")
        de_translation = de_translator.translate(text)

        # Return the translations
        return pt_translation, sp_translation, fr_translation, de_translation, None
    except Exception as e:
        return None, None, None, None, str(e)

Here's a breakdown of what the code does:

The function translates the input text into Portuguese using the Translator class from the translate library. It creates a pt_translator object with the source language set to English ("en") and the target language set to Portuguese ("pt"). Then it calls the translate method of the pt_translator object and stores the result in the pt_translation variable.
The function repeats the same process for Spanish, French, and German, creating a separate Translator object for each target language and storing each translation in its own variable.
If the translations are successful, the function returns the translated texts as a tuple, with None as the last element. If an exception is raised, it returns None for each translation and the error message as the last element.
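
The four near-identical blocks in translate_transcription could also be written as a loop over language codes. The sketch below is one possible variant, not the author's code: `translate_all` and its `translator_factory` parameter are hypothetical, added so the function can be exercised with a stand-in translator. When no factory is given, it falls back to Translator from the translate library.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

TARGET_LANGS = ("pt", "es", "fr", "de")


def translate_all(
    text: str,
    target_langs: Sequence[str] = TARGET_LANGS,
    from_lang: str = "en",
    translator_factory: Optional[Callable] = None,
) -> Tuple[Optional[Dict[str, str]], Optional[str]]:
    """Return ({lang: translation}, None) on success or (None, error) on failure."""
    try:
        if translator_factory is None:
            # Deferred import: only needed when the real library is used
            from translate import Translator

            translator_factory = Translator
        translations = {
            lang: translator_factory(from_lang=from_lang, to_lang=lang).translate(text)
            for lang in target_langs
        }
        return translations, None
    except Exception as e:
        return None, str(e)


class FakeTranslator:
    """Stand-in with the same constructor and translate() shape, for demonstration."""

    def __init__(self, from_lang: str, to_lang: str):
        self.to_lang = to_lang

    def translate(self, text: str) -> str:
        return f"{text} ({self.to_lang})"


translations, error = translate_all("Good morning", translator_factory=FakeTranslator)
print(translations, error)
```

Returning a dictionary keyed by language code means adding a fifth language is a one-line change to TARGET_LANGS, at the cost of a different return shape than the original five-element tuple.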

Speech Synthesis

The translated text is then converted into speech using gTTS (Google Text-to-Speech), a Python library that interfaces with Google Translate's text-to-speech API. This lets users hear the translations instead of only reading them.
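
As a minimal sketch of this step (an illustration of common gTTS usage, not the author's implementation), the function below turns one translated string into an MP3 file. `text_to_speech`, its `tts_factory` parameter, and the output file naming are assumptions; `gTTS(text=..., lang=...)` and its `save()` method are the library calls it relies on. The real gTTS class sends the text to Google's service, so it needs network access.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple


def text_to_speech(
    text: str,
    lang: str,
    out_dir: Optional[str] = None,
    tts_factory: Optional[Callable] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (path_to_mp3, None) on success or (None, error) on failure."""
    try:
        if tts_factory is None:
            # Deferred import: only needed when the real library is used
            from gtts import gTTS

            tts_factory = gTTS
        out_dir = out_dir or tempfile.mkdtemp()
        # Hypothetical naming scheme: one file per target language
        path = os.path.join(out_dir, f"translation_{lang}.mp3")
        tts_factory(text=text, lang=lang).save(path)
        return path, None
    except Exception as e:
        return None, str(e)


class FakeTTS:
    """Stand-in that mimics the gTTS(text=..., lang=...).save(path) shape."""

    def __init__(self, text: str, lang: str):
        self.text = text

    def save(self, path: str) -> None:
        with open(path, "wb") as f:
            f.write(self.text.encode("utf-8"))


audio_path, error = text_to_speech("Bom dia", "pt", tts_factory=FakeTTS)
print(audio_path, error)
```

The function keeps the same (result, error) return convention as the transcription and translation functions, so a caller can handle all three stages the same way.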

About the Author

Developer Service Netherlands