Rajandran R Creator of OpenAlgo - OpenSource Algo Trading framework for Indian Traders. Building GenAI Applications. Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Building Algo Platforms, Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

How to Download Last 10 Years of Intraday Data in CSV Format – Python Tutorial


Intraday data is a treasure trove for traders, analysts, and researchers. This blog will guide you through creating a Python script to download historical intraday data at 1-minute intervals for multiple symbols. You’ll also learn how to customize the time range, resume downloads, and optimize the script for efficiency.

Why Historical Intraday Data Matters

Intraday data is crucial for backtesting strategies, analyzing market trends, and developing automated trading algorithms. With the right tools and a bit of programming, you can build a reliable pipeline to fetch and store this data.

What You’ll Need

1. OpenAlgo Installation

  • Install and configure OpenAlgo. Ensure you are running OpenAlgo with Python 3.9 or higher. Refer to the official OpenAlgo documentation for detailed installation instructions.
  • If you have not updated to the latest OpenAlgo, update before proceeding.

2. OpenAlgo Supported Brokers API

  • With Zerodha, subscribe to both the Zerodha API and the Historical Chart Data add-on (Rs 4000 per month in total). This makes 10+ years of data available for download.
  • With the Fyers API, 7+ years of data can be downloaded.
  • Each broker has its own data download limits, so check your broker’s data policy before starting.
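If your broker caps how much history a single request can return, one option is to split a long date range into smaller windows and request them one at a time. The sketch below only shows the date arithmetic. `split_date_range` and the 30-day window are hypothetical choices for illustration, not part of ieod.py or OpenAlgo, and whether you need this at all depends on your broker.

```python
from datetime import date, timedelta


def split_date_range(start, end, max_days=30):
    """Yield (chunk_start, chunk_end) pairs covering start..end inclusive."""
    chunk_start = start
    while chunk_start <= end:
        # Each window spans at most max_days calendar days
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


if __name__ == "__main__":
    for chunk_start, chunk_end in split_date_range(date(2024, 1, 1), date(2024, 3, 1)):
        print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each pair can then be formatted with strftime("%Y-%m-%d") and passed as the start and end dates of one request.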

3. Editor and Tools

  • Use any reliable Python editor, such as VS Code, Cursor, or Windsurf.

4. Ensure OpenAlgo is Running

  • Before running the script, verify that OpenAlgo is running and accessible locally at http://127.0.0.1:5000.
  • The Python script ieod.py is available in the download folder inside OpenAlgo. Simply open a new terminal and run ieod.py from OpenAlgo itself.

5. Symbols List

Prepare a file named symbols.csv with a list of symbols, one per line. You can find a sample file in the /openalgo/downloads folder inside OpenAlgo.

RELIANCE
ICICIBANK
HDFCBANK
SBIN
TCS
INFY
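Before starting a long download, it can help to sanity-check the symbols file. Here is a minimal sketch, assuming the one-symbol-per-line layout shown above with no header row. `load_symbols` is a hypothetical helper, not part of ieod.py; it trims whitespace and drops blank lines and duplicates while keeping the original order.

```python
from pathlib import Path


def load_symbols(path):
    """Read one symbol per line, skipping blanks and duplicates."""
    seen = set()
    symbols = []
    for line in Path(path).read_text().splitlines():
        symbol = line.strip()
        if symbol and symbol not in seen:
            seen.add(symbol)
            symbols.append(symbol)
    return symbols


if __name__ == "__main__":
    # Assumes symbols.csv sits in the current working directory
    print(load_symbols("symbols.csv"))
```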

Features of the Script

Know Your Broker’s Data Download Limit

This script downloads only 1-minute intraday data. To download a different timeframe, change the interval argument in the client.history call to one your broker supports, and keep the date range within your broker’s available history.

  1. User-Friendly Options: Choose between different time ranges like today, last 5 days, 30 days, or even 10 years.
  2. Resume Downloads: If the script stops, it resumes from the last processed symbol using a checkpoint.
  3. Batch Processing: Downloads are handled in smaller batches to optimize memory usage.
  4. Retry Mechanism: Automatically retries failed downloads up to three times.
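The retry behaviour in the last item can be sketched as a small standalone helper. This is an illustration rather than the script's exact code: `call_with_retries` is a hypothetical name, and the defaults of three attempts and a 5-second pause simply mirror the values used in the listing further down.

```python
import time


def call_with_retries(func, attempts=3, delay_seconds=5, sleep=time.sleep):
    """Call func() up to `attempts` times, pausing between failed tries.

    Returns func()'s result, or re-raises the last exception if every
    attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error
```

In the download script, func could be a lambda that wraps the client.history call for one symbol. The sleep parameter exists only so the pause can be swapped out when testing.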

Running the Script

How to Start the Download Process

  1. Open a terminal.
  2. Navigate to the OpenAlgo download folder.
  3. Run the script using python ieod.py
  4. If you have not updated OpenAlgo, ensure you are on the latest version before running the script.

How It Works

  1. Interactive Mode: Users can choose between a fresh download or resuming a previous session.
  2. Time Period Options: Select a time range from today’s data to the last 10 years.
  3. Checkpointing: If interrupted, the script resumes from the last saved symbol.
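The checkpointing step can be sketched in isolation as follows. The two helper names are hypothetical, introduced here for illustration; the logic follows the description above: save the last completed symbol, then on resume skip everything up to and including it.

```python
import os


def save_checkpoint(path, symbol):
    """Record the most recently completed symbol."""
    with open(path, "w") as f:
        f.write(symbol)


def symbols_to_process(symbols, path):
    """Return the symbols that come after the checkpointed one.

    If there is no checkpoint file, or the saved symbol is not in the
    list, the full list is returned.
    """
    if not os.path.exists(path):
        return list(symbols)
    with open(path) as f:
        last_processed = f.read().strip()
    if last_processed in symbols:
        return symbols[symbols.index(last_processed) + 1:]
    return list(symbols)
```

Because the checkpoint stores a symbol name rather than a position, reordering symbols.csv between runs changes which symbols are skipped.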

Complete Python Code

import pandas as pd
import logging
import os
import time
import gc
from openalgo import api
from datetime import datetime, timedelta

# Initialize the API client
client = api(api_key='your_api_key_here', host='http://127.0.0.1:5000')

# Path to the CSV file
symbols_file = "symbols.csv"
output_folder = "symbols"
checkpoint_file = "checkpoint.txt"

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Set up logging
logging.basicConfig(
    filename="data_download.log",
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# Function to get the start and end dates based on user selection
def get_date_range(option):
    # Map each menu option to the number of days to look back
    lookback_days = {
        1: 0,
        2: 5,
        3: 30,
        4: 90,
        5: 365,
        6: 365 * 2,
        7: 365 * 5,
        8: 365 * 10,
    }
    if option not in lookback_days:
        raise ValueError("Invalid selection")
    today = datetime.now()
    start = today - timedelta(days=lookback_days[option])
    return start.strftime("%Y-%m-%d"), today.strftime("%Y-%m-%d")

# Prompt user for fresh download or continuation
print("Select download mode:")
print("1) Fresh download")
print("2) Continue from the last checkpoint")

try:
    mode_choice = int(input("Enter your choice (1-2): "))
    if mode_choice not in [1, 2]:
        raise ValueError("Invalid selection")
except ValueError:
    print("Invalid input. Please restart the script and select a valid option.")
    raise SystemExit(1)  # Stop with a non-zero exit code

# Prompt user for time period
print("Select the time period for data download:")
print("1) Download Today's Data")
print("2) Download Last 5 Days Data")
print("3) Download Last 30 Days Data")
print("4) Download Last 90 Days Data")
print("5) Download Last 1 Year Data")
print("6) Download Last 2 Years Data")
print("7) Download Last 5 Years Data")
print("8) Download Last 10 Years Data")

try:
    user_choice = int(input("Enter your choice (1-8): "))
    start_date, end_date = get_date_range(user_choice)
except ValueError:
    print("Invalid input. Please restart the script and select a valid option.")
    raise SystemExit(1)  # Stop with a non-zero exit code

# Read symbols from CSV
symbols = pd.read_csv(symbols_file, header=None)[0].tolist()

# Handle checkpoint logic
if mode_choice == 2 and os.path.exists(checkpoint_file):
    with open(checkpoint_file, "r") as f:
        last_processed = f.read().strip()
    # Skip symbols up to the last processed one
    if last_processed in symbols:
        symbols = symbols[symbols.index(last_processed) + 1:]
elif mode_choice == 1:
    # Remove existing checkpoint for fresh download
    if os.path.exists(checkpoint_file):
        os.remove(checkpoint_file)

# Process symbols in batches
batch_size = 10  # Adjust this value based on your memory availability
for i in range(0, len(symbols), batch_size):
    batch = symbols[i:i + batch_size]
    for symbol in batch:
        logging.info(f"Starting download for {symbol}")
        try:
            # Skip already downloaded symbols
            output_file = os.path.join(output_folder, f"{symbol}.csv")
            if os.path.exists(output_file):
                logging.info(f"Skipping {symbol}, already downloaded")
                continue

            # Fetch historical data for the symbol
            for attempt in range(3):  # Retry up to 3 times
                try:
                    response = client.history(
                        symbol=symbol,
                        exchange="NSE",
                        interval="1m",
                        start_date=start_date,
                        end_date=end_date
                    )
                    break
                except Exception as e:
                    logging.warning(f"Retry {attempt + 1} for {symbol} due to error: {e}")
                    time.sleep(5)  # Wait before retrying
            else:
                logging.error(f"Failed to download data for {symbol} after 3 attempts")
                continue

            # Convert the response to a DataFrame if it's a dictionary
            if isinstance(response, dict):
                if "timestamp" in response:
                    df = pd.DataFrame(response)
                else:
                    logging.error(f"Response for {symbol} missing 'timestamp' key: {response}")
                    continue
            else:
                df = response

            # Ensure the DataFrame is not empty
            if df.empty:
                logging.warning(f"No data available for {symbol}")
                continue

            # Reset the index to extract the timestamp
            df.reset_index(inplace=True)

            # Parse the timestamp once, then split it into DATE and TIME columns
            timestamps = pd.to_datetime(df['timestamp'])
            df['DATE'] = timestamps.dt.date
            df['TIME'] = timestamps.dt.time

            # Add SYMBOL column and rearrange columns
            df['SYMBOL'] = symbol
            df = df[['SYMBOL', 'DATE', 'TIME', 'open', 'high', 'low', 'close', 'volume']]
            df.columns = ['SYMBOL', 'DATE', 'TIME', 'OPEN', 'HIGH', 'LOW', 'CLOSE', 'VOLUME']

            # Save to CSV file
            df.to_csv(output_file, index=False)
            logging.info(f"Data for {symbol} saved to {output_file}")

            # Save checkpoint after successfully processing the symbol
            with open(checkpoint_file, "w") as f:
                f.write(symbol)

            # Clear DataFrame and force garbage collection
            del df
            gc.collect()

        except Exception as e:
            logging.error(f"Failed to download data for {symbol}: {e}")

        # Delay to avoid rate limiting
        time.sleep(3)

    # Log the actual batch length; the final batch may be smaller than batch_size
    logging.info(f"Batch of {len(batch)} symbols completed.")

logging.info("All data downloaded.")
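Once the 1-minute files are saved, a larger timeframe can be derived locally instead of downloading it again. The sketch below assumes the CSV layout written by the script above (SYMBOL, DATE, TIME, OPEN, HIGH, LOW, CLOSE, VOLUME). `resample_ohlcv` is a hypothetical helper, and it uses pandas resampling, which is a different technique from requesting another interval from the broker.

```python
import pandas as pd


def resample_ohlcv(df, rule="5min"):
    """Aggregate 1-minute bars in the script's CSV layout into larger bars."""
    # Combine the separate DATE and TIME columns into one timestamp index
    stamped = df.copy()
    stamped["DATETIME"] = pd.to_datetime(
        stamped["DATE"].astype(str) + " " + stamped["TIME"].astype(str)
    )
    stamped = stamped.set_index("DATETIME")
    bars = stamped.resample(rule).agg(
        {"OPEN": "first", "HIGH": "max", "LOW": "min", "CLOSE": "last", "VOLUME": "sum"}
    )
    # Drop windows with no trades (overnight gaps, weekends, holidays)
    return bars.dropna(subset=["OPEN"])
```

For example, resample_ohlcv(pd.read_csv("symbols/SBIN.csv")) would return 5-minute bars for SBIN, assuming that file was produced by the script.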

Key Tips

  1. Optimize Batch Size: Adjust batch_size based on your system’s memory.
  2. Retry Logic: Ensure you have a stable internet connection to avoid retries.
  3. Monitor Resources: Use tools like Task Manager or htop to track memory usage.
  4. Update OpenAlgo: Always ensure you are using the latest version of OpenAlgo for the best performance and compatibility.

Happy coding, and may your data-driven strategies thrive!
