In an increasingly globalized world, overcoming language barriers is more important than ever.
A voice translation app can help with this. This one is built with Gradio, a Python library that makes it easy to create interactive web interfaces.
The app converts speech into text, translates that text, and then turns the translation back into speech in several languages, making it easier for people from different language backgrounds to communicate.
How It Works
The Voice-to-Voice Translator app works by using advanced speech recognition and translation models to convert spoken words into text and then into another language.
Here's a simple explanation of how it works:
- Voice Input: The user speaks into the app using a microphone.
- Speech Recognition: The app uses speech recognition technology to convert spoken words into written text.
- Translation: The app then uses a translation model to convert the text into the desired language.
- Voice Output: Finally, the app uses text-to-speech technology to convert the translated text into spoken words, so the user can hear the translation.
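The four stages above can be sketched as a small pipeline in which each stage is a plain function. This is a hypothetical structure, not the app's actual code: VoiceTranslationPipeline and its field names are illustrative, and the voice-input stage is assumed to have already saved the microphone recording to a file (Gradio's audio component can pass the handler a file path).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class VoiceTranslationPipeline:
    """Hypothetical wiring of the stages; each stage is an injectable callable."""
    speech_to_text: Callable[[str], str]        # audio file path -> transcript
    translate: Callable[[str, str], str]        # (text, target language) -> translation
    text_to_speech: Callable[[str, str], str]   # (text, language) -> synthesized audio path
    target_langs: List[str]

    def run(self, audio_path: str) -> Dict[str, Tuple[str, str]]:
        """Return {language: (translated text, audio path)} for each target language."""
        transcript = self.speech_to_text(audio_path)           # speech recognition
        results: Dict[str, Tuple[str, str]] = {}
        for lang in self.target_langs:
            translated = self.translate(transcript, lang)      # translation
            audio = self.text_to_speech(translated, lang)      # voice output
            results[lang] = (translated, audio)
        return results


if __name__ == "__main__":
    # Stand-in stages so the wiring can be exercised without any models installed
    demo = VoiceTranslationPipeline(
        speech_to_text=lambda path: "good morning",
        translate=lambda text, lang: f"<{lang}> {text}",
        text_to_speech=lambda text, lang: f"/tmp/{lang}.mp3",
        target_langs=["pt", "de"],
    )
    print(demo.run("recording.wav"))
```

Keeping each stage behind a callable makes it straightforward to swap one engine for another, or to stub the stages out in tests.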
Core Features Implementation
Let's look at each core feature in more detail, along with the code that implements it.
Transcription
The transcription step uses the WhisperModel class from the faster_whisper library, configured here to run on the CPU:
    from faster_whisper import WhisperModel

    # Transcribe the audio
    def transcribe_audio(audio_file):
        output_text = ""
        try:
            # Load the "small" model on the CPU with 8-bit integer compute
            model_size = "small"
            model = WhisperModel(model_size, device="cpu", compute_type="int8")
            # Transcribe the audio; segments is a lazy generator
            segments, info = model.transcribe(audio_file, beam_size=5)
            # Combine the segments into a single string
            for segment in segments:
                output_text += segment.text.strip() + " "
            # Return the transcribed text (without the trailing space) and no error
            return output_text.strip(), None
        except Exception as e:
            # Return no text and the error message
            return None, str(e)
Here's a breakdown of what the code does:
- It loads a speech recognition model using the WhisperModel class from the faster_whisper library. The model size is set to "small", and the model runs on the CPU with the int8 compute type.
- It calls the model's transcribe method with the input audio file and a beam size of 5. This returns the transcribed segments (a generator, so decoding actually happens as the loop iterates over it) together with an info object describing the audio, such as the detected language.
- The function loops through the segments and appends each segment's text to the output_text string, separated by a space.
- If the transcription succeeds, the function returns the output_text string and None; if any step raises an exception, it returns None and the error message.
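Because transcribe_audio constructs a new WhisperModel on every call, each request pays the model-loading cost again. One way to avoid that, sketched here as an assumption rather than the app's actual approach, is to memoize the loader with functools.lru_cache. The names make_cached_loader, load_whisper, and get_whisper_model are hypothetical; the loader is passed in so the caching logic can be exercised without faster_whisper installed.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(load: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap `load` so each model size is loaded at most once per process."""

    @lru_cache(maxsize=None)
    def get_model(model_size: str) -> Any:
        return load(model_size)

    return get_model


def load_whisper(model_size: str) -> Any:
    # Imported lazily so this module can be imported without faster_whisper installed
    from faster_whisper import WhisperModel
    return WhisperModel(model_size, device="cpu", compute_type="int8")


# Hypothetical replacement for the WhisperModel(...) line inside transcribe_audio:
#     model = get_whisper_model("small")
get_whisper_model = make_cached_loader(load_whisper)
```

The first call to get_whisper_model("small") loads the weights, and later calls return the same object. If two requests arrive before the first load finishes, lru_cache may call the loader more than once, but its cache stays consistent.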
Translation
Once the audio is transcribed, the app translates the text into four languages (Portuguese, Spanish, French, and German) using the Translator class from the translate library:
    from translate import Translator

    # Translate the text
    def translate_transcription(text):
        try:
            # Translate the text to Portuguese
            pt_translator = Translator(from_lang="en", to_lang="pt")
            pt_translation = pt_translator.translate(text)
            # Translate the text to Spanish
            sp_translator = Translator(from_lang="en", to_lang="es")
            sp_translation = sp_translator.translate(text)
            # Translate the text to French
            fr_translator = Translator(from_lang="en", to_lang="fr")
            fr_translation = fr_translator.translate(text)
            # Translate the text to German
            de_translator = Translator(from_lang="en", to_lang="de")
            de_translation = de_translator.translate(text)
            # Return the four translations and no error
            return pt_translation, sp_translation, fr_translation, de_translation, None
        except Exception as e:
            # Return no translations and the error message
            return None, None, None, None, str(e)
Here's a breakdown of what the code does:
- The function translates the input text into Portuguese using the Translator class from the translate library. It creates a pt_translator object with the source language set to English ("en") and the target language set to Portuguese ("pt"), then calls its translate method on the input text and stores the result in pt_translation. Because the source language is fixed to English, the app expects English speech as input.
- It repeats the same process for Spanish ("es"), French ("fr"), and German ("de"), creating a separate Translator object for each target language and storing each translation in its own variable.
- If the translations succeed, the function returns the four translated texts as a tuple, with None as the last element; if any call raises an exception, it returns four None values followed by the error message.
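The four near-identical blocks in translate_transcription can be collapsed into a loop over language codes. The sketch below is an assumption, not the post's code: translate_to_many and SupportsTranslate are hypothetical names, and the translator factory is injected so that in the app you would pass the translate library's Translator class (called with from_lang and to_lang keywords, as in the code above), while a test can pass a stand-in.

```python
from typing import Callable, Dict, Iterable, Protocol


class SupportsTranslate(Protocol):
    """Anything with a translate(text) method, such as translate.Translator."""

    def translate(self, text: str) -> str: ...


def translate_to_many(
    text: str,
    make_translator: Callable[..., SupportsTranslate],
    source: str = "en",
    targets: Iterable[str] = ("pt", "es", "fr", "de"),
) -> Dict[str, str]:
    """Translate `text` into each target language, keyed by language code."""
    results: Dict[str, str] = {}
    for code in targets:
        # One translator per target language, mirroring the original function
        translator = make_translator(from_lang=source, to_lang=code)
        results[code] = translator.translate(text)
    return results


# In the app (requires the translate package):
#     from translate import Translator
#     translations = translate_to_many(transcript, Translator)
```

Returning a dictionary keyed by language code keeps the target list in one place, so supporting another language means adding one code rather than another translator block.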
Speech Synthesis
Finally, each translation is converted into speech with gTTS (Google Text-to-Speech), a Python library that interfaces with Google Translate's text-to-speech API, so users can hear the translations as well as read them.
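A minimal sketch of this step, assuming the gTTS package's gTTS class (constructed with the text and a lang keyword) and its save method. The function name synthesize_translations and the one-MP3-per-language file layout are hypothetical; the TTS factory is injectable so the file handling can be tested without network access.

```python
import os
from typing import Any, Callable, Dict, Optional


def synthesize_translations(
    translations: Dict[str, str],
    out_dir: str,
    tts_factory: Optional[Callable[..., Any]] = None,
) -> Dict[str, str]:
    """Save one MP3 per language and return {language code: file path}."""
    if tts_factory is None:
        # Default to gTTS; this needs the gTTS package and network access
        from gtts import gTTS
        tts_factory = gTTS
    os.makedirs(out_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for lang, text in translations.items():
        path = os.path.join(out_dir, f"{lang}.mp3")
        # Build the speech object for this language and write it to disk
        tts_factory(text, lang=lang).save(path)
        paths[lang] = path
    return paths
```

The returned paths can then be handed to audio output components so each translation can be played back.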