With the rapid growth of artificial intelligence technology, converting spoken language into text has become an incredibly useful skill. OpenAI’s Whisper API is a powerful tool for doing just this—it can accurately turn your spoken words into written text.
Practical Use Case
Financial analysts and investment firms often rely on earnings calls to gather insights about a company’s performance, future outlook, and management commentary. These calls can contain crucial information affecting investment decisions. However, listening to each earnings call and noting down important details can be time-consuming. To streamline this process, firms can utilize OpenAI’s Whisper API to transcribe these audio files, allowing for easier analysis and information retrieval.
In this tutorial, I’ll show you how to build a simple Python application that records audio from a microphone, saves it as an MP3 file, and then uses the Whisper API to transcribe the speech into text. Let’s dive in!
What is Whisper API?
![](https://i0.wp.com/www.marketcalls.in/wp-content/uploads/2024/05/image-13.png?resize=1024%2C418&ssl=1)
OpenAI’s Whisper API is a tool that allows developers to convert spoken language into written text. It’s built on the Whisper model, which is a type of deep learning model specifically designed for automatic speech recognition (ASR). The Whisper model is known for its robust performance across a wide variety of languages and accents, and it’s capable of handling different audio conditions and contexts.
To generate real-time speech-to-text, an OpenAI API token is necessary. Follow the straightforward steps outlined below to create your API_TOKEN with OpenAI.
- Go to https://platform.openai.com/apps and sign up with your email address or connect your Google account.
- Go to View API Keys on the left side of your Personal Account Settings
- Select Create New Secret key
![](https://i0.wp.com/www.marketcalls.in/wp-content/uploads/2024/01/image-20-1024x589.png?resize=1024%2C589&ssl=1)
API access to OpenAI is a paid service, so you have to set up billing. Read the pricing information before experimenting.
In order to store the OpenAI API key securely, we use a .env file and load the key as an environment variable.
Prerequisites
Before you begin, ensure you have the following installed:
- Python 3.8 or later
- sounddevice: For recording audio from the microphone
- numpy: For handling the audio data
- pydub: For processing audio files (MP3 export requires FFmpeg to be installed on your system)
- python-dotenv: For loading environment variables
- The OpenAI Python library: For accessing the Whisper API
You can install the necessary libraries using pip:
pip install sounddevice numpy pydub python-dotenv openai
Step 1: Setting Up Environment Variables
To use the OpenAI API, you need to secure your API key. Store your API key in a .env file in your project’s root directory:
OPENAI_API_KEY='Your-OpenAI-API-Key-Here'
Load this API key in your script with python-dotenv:
from dotenv import load_dotenv
import os
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
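A common stumbling block is a missing or misnamed .env file, which silently leaves the key set to None. A small guard (a sketch; `require_api_key` is a hypothetical helper, not part of the OpenAI library) fails fast with a clear message:

```python
import os

def require_api_key(env_var='OPENAI_API_KEY'):
    """Return the key from the environment, failing loudly if it's absent."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} not found - check your .env file")
    return key
```

Call `require_api_key()` right after `load_dotenv()` so a misconfigured environment is caught before any audio is recorded.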
Step 2: Recording Audio
Use the sounddevice library to capture audio from the system’s default microphone in 5-second chunks. Here’s a simple function to record audio for a specified duration:
import sounddevice as sd
def record_audio(duration=5, sample_rate=44100):
    print("Recording...")
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2, dtype='int16')
    sd.wait()  # Block until the recording is finished
    return recording
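`sd.rec` returns a NumPy array of shape `(frames, channels)`; for five seconds of stereo at 44.1 kHz that is `(220500, 2)` of int16 samples. A quick sanity check, using a synthetic silent array in place of a real recording so it runs without a microphone:

```python
import numpy as np

sample_rate = 44100
duration = 5

# Synthetic stand-in for a real recording: silence with the same shape/dtype
recording = np.zeros((int(duration * sample_rate), 2), dtype='int16')

assert recording.shape == (220500, 2)  # frames x channels
assert recording.dtype == np.int16     # matches sample_width=2 used later
```

The int16 dtype matters: the MP3 export in the next step assumes 2 bytes per sample.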
Step 3: Saving the Audio
After recording the audio, save it as an MP3 file using pydub:
from pydub import AudioSegment
import numpy as np
import os
def save_as_mp3(audio_data, sample_rate=44100, file_name='output.mp3', folder='audio'):
    if not os.path.exists(folder):
        os.makedirs(folder)
    full_path = os.path.join(folder, file_name)
    audio_segment = AudioSegment(
        data=np.array(audio_data).tobytes(),
        sample_width=2,  # 2 bytes (16 bits) per sample
        frame_rate=sample_rate,
        channels=2
    )
    audio_segment.export(full_path, format='mp3')
    return full_path
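Whisper does not need stereo audio, so downmixing to mono before export roughly halves the upload size. A sketch in pure NumPy (independent of the recording hardware; if you feed mono data to `save_as_mp3`, set `channels=1` in the `AudioSegment` constructor):

```python
import numpy as np

def to_mono(audio_data):
    """Average the left/right channels of an int16 stereo array."""
    stereo = np.asarray(audio_data, dtype=np.int16)
    # Mean across the channel axis, cast back to the int16 sample range
    return stereo.mean(axis=1).astype(np.int16)

stereo = np.array([[100, 200], [-50, 50]], dtype=np.int16)
mono = to_mono(stereo)
# mean of [100, 200] -> 150, mean of [-50, 50] -> 0
```

For short 5-second chunks the saving is modest, but for hour-long earnings calls it adds up.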
Step 4: Transcribing Audio with Whisper API
Now, use the OpenAI library to transcribe the saved audio file:
from openai import OpenAI
def transcribe_audio(file_path):
    client = OpenAI(api_key=OPENAI_API_KEY)
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language='en'
        )
    print(f'Transcription: {transcription.text}')
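Printing each chunk works for a demo, but for the earnings-call use case you will likely want to accumulate the text somewhere. A small helper that appends each transcribed chunk to a running transcript file (a sketch; `transcript_log.txt` is a hypothetical filename you can change):

```python
import datetime

def append_transcript(text, log_path='transcript_log.txt'):
    """Append a timestamped transcription chunk to a log file."""
    timestamp = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(log_path, 'a', encoding='utf-8') as f:
        f.write(f'[{timestamp}] {text}\n')

append_transcript('Revenue grew 12% year over year.', 'demo_log.txt')
```

You could call this instead of `print` inside `transcribe_audio`, giving you a single searchable transcript per call.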
Step 5: Putting It All Together
Combine the above functions into a script that continuously records and transcribes audio until the ESC key is pressed:
import sounddevice as sd
import numpy as np
import os
import keyboard  # To detect shortcut key press
from pydub import AudioSegment
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

def record_audio(duration=5, sample_rate=44100):
    """Record audio from the microphone."""
    print("Recording...")
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2, dtype='int16')
    sd.wait()  # Wait until recording is finished
    return recording

def save_as_mp3(audio_data, sample_rate=44100, file_name='output.mp3', folder='audio'):
    """Save recorded audio as MP3 in a specified folder."""
    if not os.path.exists(folder):
        os.makedirs(folder)
    full_path = os.path.join(folder, file_name)
    audio_segment = AudioSegment(
        data=np.array(audio_data).tobytes(),
        sample_width=2,  # 2 bytes (16 bits) per sample
        frame_rate=sample_rate,
        channels=2
    )
    audio_segment.export(full_path, format='mp3')
    return full_path

def transcribe_audio(file_path):
    """Transcribe the audio file using OpenAI's API."""
    client = OpenAI(api_key=OPENAI_API_KEY)
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language='en'
        )
    print(f'Transcription: {transcription.text}')

if __name__ == "__main__":
    sample_rate = 44100  # Sample rate in Hz
    duration = 5  # Duration of each recording chunk in seconds
    try:
        while True:
            if keyboard.is_pressed('esc'):  # Exit when the ESC key is pressed
                print("Exiting...")
                break
            audio_data = record_audio(duration, sample_rate)
            file_path = save_as_mp3(audio_data, sample_rate)
            transcribe_audio(file_path)
    except KeyboardInterrupt:
        print("Program terminated.")
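Note that the loop above reuses output.mp3 for every chunk, overwriting the previous recording. If you want to keep each chunk on disk (for example, to re-transcribe a noisy segment later), a unique filename per recording helps; a minimal sketch:

```python
import time

def chunk_filename(prefix='chunk', ext='mp3'):
    """Return a filename tagged with the current epoch time in milliseconds."""
    return f"{prefix}_{int(time.time() * 1000)}.{ext}"

name = chunk_filename()
# e.g. 'chunk_1716890000123.mp3'
```

You would then pass `file_name=chunk_filename()` into `save_as_mp3` inside the loop.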
This simple application showcases the power of OpenAI’s Whisper API in creating accessible tools for speech-to-text conversion. By integrating such technologies, developers can build more inclusive and efficient communication tools that bridge the gap between spoken and written language.