Rajandran R Creator of OpenAlgo - OpenSource Algo Trading framework for Indian Traders. Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Building Algo Platforms, Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

How to Use OpenAI’s Whisper API for Speech-to-Text Conversion – Python Tutorial

With the rapid growth of artificial intelligence technology, converting spoken language into text has become a widely useful capability. OpenAI’s Whisper API is built for exactly this: it turns recorded speech into written text.

Practical Use Case

Financial analysts and investment firms often rely on earnings calls to gather insights about a company’s performance, future outlook, and management commentary. These calls can contain crucial information affecting investment decisions. However, listening to each earnings call and noting down important details can be time-consuming. To streamline this process, firms can utilize OpenAI’s Whisper API to transcribe these audio files, allowing for easier analysis and information retrieval.

In this tutorial, I’ll show you how to build a simple Python application that records audio from a microphone, saves it as an MP3 file, and then uses the Whisper API to transcribe the speech into text. Let’s dive in!

What is Whisper API?

OpenAI’s Whisper API is a tool that allows developers to convert spoken language into written text. It’s built on the Whisper model, which is a type of deep learning model specifically designed for automatic speech recognition (ASR). The Whisper model is known for its robust performance across a wide variety of languages and accents, and it’s capable of handling different audio conditions and contexts.

Generating near-real-time speech-to-text output with the Whisper API requires an OpenAI API key. Follow the steps below to create one.

  1. Go to https://platform.openai.com/apps and sign up with your email address or connect your Google account.
  2. Go to View API Keys on the left side of your Personal Account Settings.
  3. Select Create New Secret Key.

API access to OpenAI is a paid service, so you have to set up billing. Read the pricing information before experimenting.

To keep the OpenAI API key secure, this tutorial stores it in a .env file and reads it from an environment variable.

Prerequisites

Before you begin, ensure you have the following installed:

  • Python 3.8 or later
  • sounddevice: For recording audio from the microphone
  • numpy: For handling the audio data
  • pydub: For processing audio files (MP3 export also requires ffmpeg or libav installed on the system)
  • python-dotenv: For loading environment variables
  • keyboard: For detecting the ESC key press in the final script
  • The OpenAI Python library: For accessing the Whisper API

You can install the necessary libraries using pip:

pip install sounddevice numpy pydub python-dotenv keyboard openai

Step 1: Setting Up Environment Variables

To use the OpenAI API, you need to secure your API key. Store your API key in a .env file in your project’s root directory:

OPENAI_API_KEY='Your-OpenAI-API-Key-Here'

Load this API key in your script with python-dotenv:

from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
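
If the .env file is missing or the variable name is misspelled, os.getenv returns None and the problem only shows up later, when the API call fails. A minimal sketch of an early check follows; require_env is a hypothetical helper written for this tutorial, not part of python-dotenv or the OpenAI library:

```python
import os

def require_env(name):
    """Return the value of environment variable `name`.

    Hypothetical helper for this tutorial: raises RuntimeError when the
    variable is unset or empty, so a missing key fails fast.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value
```

After load_dotenv(), the key could then be read with OPENAI_API_KEY = require_env('OPENAI_API_KEY').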

Step 2: Recording Audio

Use the sounddevice library to capture audio from the system’s default microphone in 5-second chunks. Here’s a simple function that records audio for a specified duration:

import sounddevice as sd

def record_audio(duration=5, sample_rate=44100):
    print("Recording...")
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2, dtype='int16')
    sd.wait()
    return recording
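
To make the int(duration * sample_rate) argument concrete, the sketch below works out how many frames and raw bytes one chunk contains. recording_size is a hypothetical helper used only for illustration; it assumes 16-bit samples, matching dtype='int16' above:

```python
def recording_size(duration, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Return (frames, total_bytes) for an uncompressed recording.

    bytes_per_sample=2 corresponds to dtype='int16'.
    """
    frames = int(duration * sample_rate)          # one frame = one sample per channel
    total_bytes = frames * channels * bytes_per_sample
    return frames, total_bytes

frames, total_bytes = recording_size(5)
print(frames, total_bytes)  # 220500 882000
```

So each 5-second stereo chunk holds 220,500 frames, or roughly 0.88 MB of raw audio before MP3 compression.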

Step 3: Saving the Audio

After recording the audio, save it as an MP3 file using pydub:

from pydub import AudioSegment
import numpy as np
import os

def save_as_mp3(audio_data, sample_rate=44100, file_name='output.mp3', folder='audio'):
    if not os.path.exists(folder):
        os.makedirs(folder)
    full_path = os.path.join(folder, file_name)
    audio_segment = AudioSegment(
        data=np.array(audio_data).tobytes(),
        sample_width=2,
        frame_rate=sample_rate,
        channels=2
    )
    audio_segment.export(full_path, format='mp3')
    return full_path
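
pydub relies on ffmpeg (or libav) for MP3 export. If that is not installed, one alternative is to write a WAV file with Python's built-in wave module, since the Whisper API also accepts WAV uploads. The sketch below is an assumption-laden variant, not the tutorial's main path; save_as_wav is a hypothetical name:

```python
import os
import wave

def save_as_wav(raw_bytes, sample_rate=44100, channels=2, sample_width=2,
                file_name='output.wav', folder='audio'):
    """Write raw interleaved PCM bytes to a WAV file and return its path."""
    os.makedirs(folder, exist_ok=True)
    full_path = os.path.join(folder, file_name)
    with wave.open(full_path, 'wb') as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(raw_bytes)
    return full_path
```

With the array returned by record_audio, the call would be save_as_wav(audio_data.tobytes(), sample_rate). WAV is uncompressed, so the files are larger than their MP3 equivalents.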

Step 4: Transcribing Audio with Whisper API

Now, use the OpenAI library to transcribe the saved audio file:

from openai import OpenAI

def transcribe_audio(file_path):
    client = OpenAI(api_key=OPENAI_API_KEY)
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language='en'
        )
    print(f'Transcription: {transcription.text}')
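
The 5-second chunks used here are small, but a full earnings call recording can be large. At the time of writing, the API limits uploads to about 25 MB per file; treat that figure as an assumption and confirm it against the current OpenAI documentation. A minimal pre-upload check, with is_within_upload_limit as a hypothetical helper:

```python
import os

# Assumed limit; confirm against the current OpenAI API documentation.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def is_within_upload_limit(file_path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `file_path` is no larger than `limit` bytes."""
    return os.path.getsize(file_path) <= limit
```

Files that fail the check would need to be split into shorter segments or compressed before calling transcribe_audio.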

Step 5: Putting It All Together

Combine the above functions into a script that continuously records and transcribes audio until the ESC key is pressed:

import os

import keyboard  # To detect shortcut key press
import numpy as np
import sounddevice as sd
from dotenv import load_dotenv
from openai import OpenAI
from pydub import AudioSegment


load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

def record_audio(duration=5, sample_rate=44100):
    """Record audio from the microphone."""
    print("Recording...")
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=2, dtype='int16')
    sd.wait()  # Wait until recording is finished
    #print("Recording finished")
    return recording

def save_as_mp3(audio_data, sample_rate=44100, file_name='output.mp3', folder='audio'):
    """Save recorded audio as MP3 in a specified folder."""
    if not os.path.exists(folder):
        os.makedirs(folder)
    full_path = os.path.join(folder, file_name)
    audio_segment = AudioSegment(
        data=np.array(audio_data).tobytes(),
        sample_width=2,  # 2 bytes (16 bits) per sample
        frame_rate=sample_rate,
        channels=2
    )
    audio_segment.export(full_path, format='mp3')
    return full_path

def transcribe_audio(file_path):
    """Transcribe the audio file using OpenAI's API."""
    client = OpenAI(api_key=OPENAI_API_KEY)
    with open(file_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            language='en'
        )
    print(f'Transcription: {transcription.text}')

if __name__ == "__main__":
    sample_rate = 44100  # Sample rate in Hz
    duration = 5  # Duration of recording in seconds
    try:
        while True:
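            # is_pressed is polled only between chunks, so hold ESC until the
            # current recording and transcription finish
            if keyboard.is_pressed('esc'):  # Check if ESC key is pressed to exit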
                print("Exiting...")
                break
            audio_data = record_audio(duration, sample_rate)
            file_path = save_as_mp3(audio_data, sample_rate)
            transcribe_audio(file_path)
    except KeyboardInterrupt:
        print("Program terminated.")
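
As written, the loop overwrites output.mp3 on every pass and only prints each transcription. A minimal sketch of keeping each chunk and building a running transcript follows; chunk_file_name, append_transcript, and the transcript.txt path are hypothetical names chosen for illustration:

```python
def chunk_file_name(index, prefix='chunk', extension='mp3'):
    """Build a zero-padded file name such as 'chunk_0003.mp3'."""
    return f"{prefix}_{index:04d}.{extension}"

def append_transcript(text, log_path='transcript.txt'):
    """Append one transcribed chunk to a plain-text log, one line per chunk."""
    with open(log_path, 'a', encoding='utf-8') as log_file:
        log_file.write(text.strip() + '\n')
```

To use these in the loop, pass file_name=chunk_file_name(i) to save_as_mp3 and have transcribe_audio return transcription.text instead of printing it, so the result can be handed to append_transcript.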

This simple application showcases the power of OpenAI’s Whisper API in creating accessible tools for speech-to-text conversion. By integrating such technologies, developers can build more inclusive and efficient communication tools that bridge the gap between spoken and written language.
