In a globalized world, language barriers still get in the way of everyday communication.
A voice translation app can help with this. The one in this post is built with Gradio, a Python library for creating interactive web interfaces.
The app transcribes speech into text, translates the text, and converts the translation back into speech in several languages, making it easier for people who speak different languages to communicate.
How It Works
The Voice-to-Voice Translator app chains speech recognition, translation, and text-to-speech models: it converts spoken words into text, translates that text into another language, and speaks the result.
Here are the steps:
- Voice Input: The user speaks into the app using a microphone.
- Speech Recognition: The app uses speech recognition technology to convert spoken words into written text.
- Translation: The app then uses a translation model to convert the text into the desired language.
- Voice Output: Finally, the app uses text-to-speech technology to convert the translated text into spoken words, so the user can hear the translation.
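The four steps above can be sketched as a single function that chains the stages together. This is a minimal illustration, not the app's actual code: `run_pipeline` and its three callable parameters are hypothetical names, and each stage is passed in as a plain function so the flow can be read (and tried out) without any speech or translation library installed.

```python
from typing import Callable, Dict, Iterable


def run_pipeline(
    audio_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], str],
    target_langs: Iterable[str] = ("pt", "es", "fr", "de"),
) -> Dict[str, Dict[str, str]]:
    """Run speech -> text -> translation -> speech for each target language.

    All three stage functions are hypothetical stand-ins:
      transcribe(audio_path)      -> transcribed text
      translate(text, lang)       -> translated text
      synthesize(text, lang)      -> path of the generated audio file
    """
    # Steps 1 and 2: voice input and speech recognition
    text = transcribe(audio_path)

    results = {}
    for lang in target_langs:
        # Step 3: translation into the target language
        translated = translate(text, lang)
        # Step 4: voice output for the translated text
        audio_out = synthesize(translated, lang)
        results[lang] = {"text": translated, "audio": audio_out}
    return results
```

Keeping the stages as interchangeable functions means any one of them (for example, the translation backend) could be swapped without touching the rest of the flow.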
Core Features Implementation
Let's look at the code behind each of the app's core features.
Transcription
The transcription functionality leverages the WhisperModel from the faster_whisper library. This model is configured to operate on a CPU:
# Requires the third-party faster-whisper package
from faster_whisper import WhisperModel

# Transcribe the audio
def transcribe_audio(audio_file):
    output_text = ""
    try:
        # Load the model
        model_size = "small"
        model = WhisperModel(model_size, device="cpu", compute_type="int8")
        # Transcribe the audio
        segments, info = model.transcribe(audio_file, beam_size=5)
        # Combine the segments into a single string
        for segment in segments:
            output_text += segment.text + " "
        # Return the transcribed text
        return output_text, None
    except Exception as e:
        # Return the error message instead of raising
        return None, str(e)
Here's a breakdown of what the code does:
- It loads a speech recognition model using the WhisperModel class from the faster_whisper library. The model is loaded on the CPU with the int8 compute type, and the model size is set to "small".
- The transcribe method of the model is called with the input audio file and a beam size of 5. This method returns a generator of segments and some additional information about the transcription. Because the segments are generated lazily, the actual transcription work happens while the function iterates over them.
- The function loops through the segments and appends each segment's text to the output_text string, separated by a space.
- If the transcription is successful, the function returns the output_text string and None. If an exception is raised, it returns None and the error message instead.
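One thing to note about the function above is that it reloads the model on every call. A possible variation, sketched below under stated assumptions, loads the model once and reuses it, and also trims the whitespace around each segment before joining. The names `get_model`, `join_segments`, and `transcribe_audio_cached` are hypothetical and not part of the original app; the `WhisperModel` arguments are the same ones used above.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model(model_size: str = "small"):
    """Load the Whisper model once and reuse it on later calls (hypothetical helper)."""
    # Imported lazily so this module can be imported even when
    # faster_whisper is not installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_size, device="cpu", compute_type="int8")


def join_segments(segments) -> str:
    """Join segment texts with single spaces, skipping empty segments."""
    cleaned = (segment.text.strip() for segment in segments)
    return " ".join(text for text in cleaned if text)


def transcribe_audio_cached(audio_file: str):
    """Same (text, error) return convention as transcribe_audio above."""
    try:
        segments, _info = get_model().transcribe(audio_file, beam_size=5)
        return join_segments(segments), None
    except Exception as e:
        return None, str(e)
```

Caching the model this way trades a little memory for faster repeated requests, which tends to matter in an interactive web app where every click would otherwise trigger a fresh model load.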
Translation
Once the audio is transcribed, the app translates the text into four languages: Portuguese, Spanish, French, and German. This is achieved using the Translator class from the translate library:
# Requires the third-party translate package
from translate import Translator

# Translate the text
def translate_transcription(text):
    try:
        # Translate the text to Portuguese
        pt_translator = Translator(from_lang="en", to_lang="pt")
        pt_translation = pt_translator.translate(text)
        # Translate the text to Spanish
        sp_translator = Translator(from_lang="en", to_lang="es")
        sp_translation = sp_translator.translate(text)
        # Translate the text to French
        fr_translator = Translator(from_lang="en", to_lang="fr")
        fr_translation = fr_translator.translate(text)
        # Translate the text to German
        de_translator = Translator(from_lang="en", to_lang="de")
        de_translation = de_translator.translate(text)
        # Return the translations
        return pt_translation, sp_translation, fr_translation, de_translation, None
    except Exception as e:
        # Return the error message instead of raising
        return None, None, None, None, str(e)
Here's a breakdown of what the code does:
- The function translates the input text into Portuguese using the Translator class from the translate library. It creates a pt_translator object with the source language set to English ("en") and the target language set to Portuguese ("pt"). Then, it calls the translate method of the pt_translator object to translate the input text and stores the result in the pt_translation variable.
- The function repeats the same process to translate the input text into Spanish, French, and German, creating a separate Translator object for each target language and storing the translated text in separate variables.
- If the translations are successful, the function returns the translated texts as a tuple, with None as the last element. If an exception is raised, it returns None for each translation and the error message as the last element.
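Since the four translations follow the same pattern, they could also be written as a loop over the language codes. The sketch below is one possible refactoring, not the app's code: `TARGET_LANGS`, `translate_all`, and the `make_translator` parameter are hypothetical names, and the function returns a dictionary keyed by language code rather than the five-element tuple used above.

```python
# Language codes taken from the function above; the dict itself is a hypothetical helper.
TARGET_LANGS = {"pt": "Portuguese", "es": "Spanish", "fr": "French", "de": "German"}


def _default_translator(to_lang):
    """Build a Translator from the translate library for one target language."""
    # Imported lazily so the module can be imported without the library installed.
    from translate import Translator

    return Translator(from_lang="en", to_lang=to_lang)


def translate_all(text, make_translator=_default_translator):
    """Return ({lang_code: translation}, None) on success or (None, error) on failure."""
    try:
        translations = {
            code: make_translator(code).translate(text) for code in TARGET_LANGS
        }
        return translations, None
    except Exception as e:
        return None, str(e)
```

With this shape, adding a fifth language would only mean adding one entry to `TARGET_LANGS`, and passing a different `make_translator` makes it possible to try the function without network access.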
Speech Synthesis
The translated text is then converted into speech using gTTS (Google Text-to-Speech), a Python library that interfaces with Google Translate's text-to-speech API. This lets users hear each translation rather than only read it.
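The article's own synthesis code is not shown here, so the following is only a rough sketch of how this step could look with gTTS, not the author's implementation. It assumes the translations are held in a dictionary keyed by language code; `synthesize_all`, `make_tts`, and the `translation_<lang>.mp3` file names are hypothetical. The only gTTS calls used are the `gTTS(text=..., lang=...)` constructor and its `save` method, and gTTS needs network access to work.

```python
import os


def _default_tts(text, lang):
    """Create a gTTS object for the given text and language code."""
    # Imported lazily so the module can be imported without gTTS installed.
    from gtts import gTTS

    return gTTS(text=text, lang=lang)


def synthesize_all(translations, out_dir=".", make_tts=_default_tts):
    """Save one MP3 per language.

    Returns ({lang_code: file_path}, None) on success or (None, error) on failure,
    mirroring the error convention of the earlier functions.
    """
    try:
        os.makedirs(out_dir, exist_ok=True)
        paths = {}
        for lang, text in translations.items():
            # Hypothetical naming scheme for the generated audio files
            path = os.path.join(out_dir, f"translation_{lang}.mp3")
            make_tts(text, lang).save(path)
            paths[lang] = path
        return paths, None
    except Exception as e:
        return None, str(e)
```

Returning file paths rather than raw audio is a design choice that fits a web interface, since an audio component can typically be pointed at a file on disk.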

