Rajandran R Creator of OpenAlgo - OpenSource Algo Trading framework for Indian Traders. Building GenAI Applications. Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Building Algo Platforms, Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

DuckDB Tutorial for Traders – A Python Guide


For traders dealing with large datasets, the ability to efficiently analyze and manipulate data is crucial. DuckDB is an in-process, columnar database designed for analytical workloads (it can run fully in memory or against a file on disk), making it an excellent choice for traders who need fast queries on financial data. In this tutorial, we’ll explore how traders can leverage DuckDB with Python for quick and efficient market data analysis.

Who Should Use DuckDB

DuckDB is particularly well-suited to certain types of traders based on their needs and workflows. Here’s who stands to benefit the most:

Traders Handling Large Volumes of Data: Traders who regularly work with large financial datasets—such as algorithmic traders backtesting strategies or analysts studying historical market trends—can leverage DuckDB’s speed and efficiency to gain quick insights.

Individual Traders or Small Trading Firms: For those without the budget for expensive database subscriptions, DuckDB offers a powerful, free alternative. Its lightweight design also makes it practical for smaller setups without dedicated IT resources.

Traders Proficient in SQL: Since DuckDB uses SQL as its query language, traders comfortable with SQL will find it easy to adopt. Those unfamiliar with SQL might face a learning curve, but its widespread use makes it a valuable skill to acquire.

Traders Using Python for Analysis: Given its strong integration with Python, DuckDB is a great match for traders who rely on Python for data manipulation, visualization, or machine learning. It fits seamlessly into their existing toolkits.

Installing DuckDB

To get started, install DuckDB using pip:

pip install duckdb yfinance pandas matplotlib seaborn

1. Loading Market Data into DuckDB

Importing Required Libraries

import duckdb
import pandas as pd

Loading CSV Data into DuckDB

Let’s say you have a CSV file containing stock market data:

Date,Symbol,Open,High,Low,Close,Volume
2024-02-26,AAPL,180.2,182.5,179.8,181.3,50213000
2024-02-26,MSFT,410.3,412.8,409.1,411.5,32054000
2024-02-26,GOOG,143.2,145.5,142.8,144.3,18150000

You can load this data into DuckDB directly:

# Create a DuckDB connection
con = duckdb.connect()

# Load CSV file into a table
con.execute("""
    CREATE TABLE market_data AS
    SELECT * FROM read_csv_auto('market_data.csv');
""")

Integrating DuckDB with Pandas

If you already have market data as a Pandas DataFrame, you can query it using DuckDB:

data = pd.read_csv("market_data.csv")
result = duckdb.query("SELECT Symbol, AVG(Close) AS AvgClose FROM data GROUP BY Symbol").df()
print(result)

Getting Started with DuckDB and yfinance

Let’s start by importing the required libraries and retrieving some Indian stock data:

import duckdb
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

# Create a connection to a new DuckDB database (in-memory)
con = duckdb.connect(database=':memory:')

# Alternatively, to create a persistent database
# con = duckdb.connect('indian_stocks.duckdb')

2. Fetching Stock Market Data

Let’s download historical data for major Indian stocks. We’ll include some of the largest companies on the NSE (National Stock Exchange):

# Define a list of Indian stock symbols (with .NS suffix for NSE listings)
indian_stocks = [
    'RELIANCE.NS',  # Reliance Industries
    'TCS.NS',       # Tata Consultancy Services
    'HDFCBANK.NS',  # HDFC Bank
    'INFY.NS',      # Infosys
    'HINDUNILVR.NS', # Hindustan Unilever
    'ICICIBANK.NS', # ICICI Bank
    'SBIN.NS',      # State Bank of India
    'BHARTIARTL.NS', # Bharti Airtel
    'ITC.NS',       # ITC
    'KOTAKBANK.NS'  # Kotak Mahindra Bank
]

# Set the time period (last 3 years)
end_date = datetime.datetime.now()
start_date = end_date - datetime.timedelta(days=3*365)

# Download the data
stock_data = yf.download(
    indian_stocks,
    start=start_date,
    end=end_date,
    group_by='ticker'
)

# Restructure the data for easier use
all_stocks_df = pd.DataFrame()

for ticker in indian_stocks:
    df = stock_data[ticker].copy()
    df['Symbol'] = ticker
    all_stocks_df = pd.concat([all_stocks_df, df])

# Reset index to make Date a column
all_stocks_df.reset_index(inplace=True)

# Display the first few rows
all_stocks_df

3. Loading Data into DuckDB

Now let’s load this data into DuckDB for analysis:

# Register the DataFrame as a view in DuckDB
con.register('indian_stocks', all_stocks_df)

# Create a table from the view (optional, for persistent storage)
con.execute("CREATE TABLE IF NOT EXISTS indian_stocks_table AS SELECT * FROM indian_stocks")

# Verify the data was loaded correctly
result = con.execute("SELECT Symbol, COUNT(*) as count FROM indian_stocks GROUP BY Symbol").fetchall()
print("Number of records per stock:")
for row in result:
    print(f"{row[0]}: {row[1]} records")

Output

Number of records per stock:
SBIN.NS: 741 records
KOTAKBANK.NS: 741 records
HDFCBANK.NS: 741 records
ICICIBANK.NS: 741 records
BHARTIARTL.NS: 741 records
INFY.NS: 741 records
RELIANCE.NS: 741 records
TCS.NS: 741 records
ITC.NS: 741 records
HINDUNILVR.NS: 741 records

4. Basic Stock Analysis with DuckDB

4.1. Calculate Daily Returns

Let’s calculate daily returns for each stock:

# Create a view with daily returns
con.execute("""
CREATE OR REPLACE VIEW daily_returns AS
SELECT 
    Symbol,
    Date,
    Close,
    (Close - LAG(Close) OVER (PARTITION BY Symbol ORDER BY Date)) / LAG(Close) OVER (PARTITION BY Symbol ORDER BY Date) AS daily_return
FROM 
    indian_stocks
ORDER BY 
    Symbol, Date
""")

# Query to see some results
result = con.execute("""
SELECT Symbol, Date, Close, daily_return
FROM daily_returns
WHERE daily_return IS NOT NULL
LIMIT 10
""").fetchdf()

print(result)

Output

          Symbol       Date       Close  daily_return
0  BHARTIARTL.NS 2022-03-04  644.241333     -0.027304
1  BHARTIARTL.NS 2022-03-07  665.627319      0.033196
2  BHARTIARTL.NS 2022-03-08  673.117310      0.011253
3  BHARTIARTL.NS 2022-03-09  681.494324      0.012445
4  BHARTIARTL.NS 2022-03-10  687.653931      0.009038
5  BHARTIARTL.NS 2022-03-11  684.795837     -0.004156
6  BHARTIARTL.NS 2022-03-14  690.265564      0.007987
7  BHARTIARTL.NS 2022-03-15  692.926453      0.003855
8  BHARTIARTL.NS 2022-03-16  701.697693      0.012658
9  BHARTIARTL.NS 2022-03-17  712.440063      0.015309

4.2. Calculate Cumulative Returns

Now let’s compute cumulative returns to see how each stock has performed over the entire period:

# Create a view with cumulative returns
con.execute("""
CREATE OR REPLACE VIEW cumulative_returns AS
WITH first_prices AS (
    SELECT 
        Symbol, 
        MIN(Date) as first_date,
        arg_min(Close, Date) as first_price  -- Close on the earliest Date
    FROM 
        indian_stocks
    GROUP BY 
        Symbol
)
SELECT 
    s.Symbol,
    s.Date,
    s.Close,
    (s.Close / f.first_price - 1) * 100 as cumulative_return_pct
FROM 
    indian_stocks s
JOIN 
    first_prices f ON s.Symbol = f.Symbol
ORDER BY 
    s.Symbol, s.Date
""")

# Check the results (take the return on the most recent date, not the peak)
latest_returns = con.execute("""
SELECT 
    Symbol, 
    MAX(Date) as latest_date,
    arg_max(cumulative_return_pct, Date) as total_return_pct
FROM 
    cumulative_returns
GROUP BY 
    Symbol
ORDER BY 
    total_return_pct DESC
""").fetchdf()

print("Total returns over the period (%):")
print(latest_returns)

Output

Total returns over the period (%):
          Symbol latest_date  total_return_pct
0  BHARTIARTL.NS  2025-02-28        167.406180
1         ITC.NS  2025-02-28        157.369812
2        SBIN.NS  2025-02-28        103.975539
3   ICICIBANK.NS  2025-02-28         98.834787
4  HINDUNILVR.NS  2025-02-28         50.840993
5    RELIANCE.NS  2025-02-28         46.849382
6    HDFCBANK.NS  2025-02-28         41.603261
7         TCS.NS  2025-02-28         35.938475
8        INFY.NS  2025-02-28         25.010394
9   KOTAKBANK.NS  2025-02-28         12.459901

4.3. Visualize Cumulative Returns

Let’s visualize the cumulative returns for all stocks:

# Get the data for plotting
plot_data = con.execute("""
SELECT Date, Symbol, cumulative_return_pct
FROM cumulative_returns
ORDER BY Date
""").fetchdf()

# Plot setup
plt.figure(figsize=(14, 8))
sns.set_style('whitegrid')

# Create a line plot for each stock
for ticker in indian_stocks:
    stock_data = plot_data[plot_data['Symbol'] == ticker]
    plt.plot(stock_data['Date'], stock_data['cumulative_return_pct'], label=ticker.replace('.NS', ''))

plt.title('Cumulative Returns of Indian Stocks (%)', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return (%)', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True)
plt.savefig('indian_stocks_cumulative_returns.png')
plt.show()

5. Technical Analysis Indicators with DuckDB

5.1. Calculating Moving Averages

Let’s calculate and visualize 20-day and 50-day moving averages:

# Calculate moving averages
con.execute("""
CREATE OR REPLACE VIEW moving_averages AS
SELECT 
    Symbol,
    Date,
    Close,
    AVG(Close) OVER (PARTITION BY Symbol ORDER BY Date ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) AS MA20,
    AVG(Close) OVER (PARTITION BY Symbol ORDER BY Date ROWS BETWEEN 49 PRECEDING AND CURRENT ROW) AS MA50
FROM 
    indian_stocks
ORDER BY 
    Symbol, Date
""")

# Function to plot moving averages for a specific stock
def plot_moving_averages(ticker):
    # Get the data for plotting
    ma_data = con.execute(f"""
    SELECT Date, Close, MA20, MA50
    FROM moving_averages
    WHERE Symbol = '{ticker}'
    AND MA50 IS NOT NULL
    ORDER BY Date
    """).fetchdf()
    
    # Plot
    plt.figure(figsize=(14, 7))
    plt.plot(ma_data['Date'], ma_data['Close'], label='Close Price', alpha=0.6)
    plt.plot(ma_data['Date'], ma_data['MA20'], label='20-Day MA', linewidth=2)
    plt.plot(ma_data['Date'], ma_data['MA50'], label='50-Day MA', linewidth=2)
    
    plt.title(f'Moving Averages for {ticker.replace(".NS", "")}', fontsize=16)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Price (INR)', fontsize=12)
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.savefig(f'{ticker.replace(".NS", "")}_moving_averages.png')
    plt.show()

# Example: Plot moving averages for Reliance
plot_moving_averages('RELIANCE.NS')

5.2. Volatility Analysis

Let’s analyze the volatility of Indian stocks using DuckDB:

# Calculate daily volatility (standard deviation of returns over 30 days)
con.execute("""
CREATE OR REPLACE VIEW volatility AS
SELECT 
    Symbol,
    Date,
    STDDEV(daily_return) OVER (PARTITION BY Symbol ORDER BY Date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) * SQRT(252) * 100 AS annualized_volatility
FROM 
    daily_returns
WHERE 
    daily_return IS NOT NULL
ORDER BY 
    Symbol, Date
""")

# Get average volatility for each stock
avg_volatility = con.execute("""
SELECT 
    Symbol, 
    AVG(annualized_volatility) AS avg_annualized_volatility
FROM 
    volatility
WHERE 
    annualized_volatility IS NOT NULL
GROUP BY 
    Symbol
ORDER BY 
    avg_annualized_volatility DESC
""").fetchdf()

print("Average Annualized Volatility (%):")
print(avg_volatility)

# Plot volatility over time for all stocks
vol_data = con.execute("""
SELECT Date, Symbol, annualized_volatility
FROM volatility
WHERE annualized_volatility IS NOT NULL
ORDER BY Date
""").fetchdf()

plt.figure(figsize=(14, 8))

for ticker in indian_stocks:
    stock_data = vol_data[vol_data['Symbol'] == ticker]
    plt.plot(stock_data['Date'], stock_data['annualized_volatility'], label=ticker.replace('.NS', ''))

plt.title('30-Day Rolling Annualized Volatility of Indian Stocks (%)', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Annualized Volatility (%)', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True)
plt.savefig('indian_stocks_volatility.png')
plt.show()

Output

Average Annualized Volatility (%):
          Symbol  avg_annualized_volatility
0        SBIN.NS                  24.291845
1        INFY.NS                  23.877182
2    RELIANCE.NS                  21.816325
3   KOTAKBANK.NS                  21.188126
4  BHARTIARTL.NS                  20.795398
5    HDFCBANK.NS                  20.566145
6         TCS.NS                  20.497967
7  HINDUNILVR.NS                  20.129418
8   ICICIBANK.NS                  18.977353
9         ITC.NS                  18.940518

5.3. Correlation Analysis

Let’s analyze the correlation between different Indian stocks:

# Pivot the data to have one column per stock
pivot_query = """
WITH daily_pivot AS (
    SELECT 
        Date,
        Symbol,
        daily_return
    FROM 
        daily_returns
    WHERE 
        daily_return IS NOT NULL
)
SELECT 
    Date,
    """

# Create dynamic SQL for pivoting
pivot_columns = []
for ticker in indian_stocks:
    ticker_clean = ticker.replace('.', '_')
    pivot_columns.append(f"MAX(CASE WHEN Symbol = '{ticker}' THEN daily_return END) AS {ticker_clean}")

pivot_query += ", ".join(pivot_columns)
pivot_query += """
FROM daily_pivot
GROUP BY Date
ORDER BY Date
"""

# Execute the pivot query
pivot_data = con.execute(pivot_query).fetchdf()

# Calculate correlation matrix
correlation_matrix = pivot_data.iloc[:, 1:].corr()

# Plot correlation heatmap
plt.figure(figsize=(14, 12))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5,
            xticklabels=[s.replace('.NS', '') for s in indian_stocks],
            yticklabels=[s.replace('.NS', '') for s in indian_stocks])
plt.title('Correlation Matrix of Indian Stocks', fontsize=16)
plt.tight_layout()
plt.savefig('indian_stocks_correlation.png')
plt.show()
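Building the pivot with dynamic SQL works, but since the result comes back to pandas anyway, `DataFrame.pivot` does the same long-to-wide reshape in one line (recent DuckDB versions also ship a SQL `PIVOT` statement). A sketch with hypothetical returns for two symbols:

```python
import pandas as pd

# Hypothetical daily returns for two symbols in long format
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"]),
    "Symbol": ["AAA", "BBB", "AAA", "BBB"],
    "daily_return": [0.010, -0.020, 0.005, 0.015],
})

# Long-to-wide: one column per symbol, then corr() as before
wide = long_df.pivot(index="Date", columns="Symbol", values="daily_return")
corr = wide.corr()
print(corr)
```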

5.4. Sector Performance Comparison (using index data)

Let’s add major Indian indices to compare sector performance:

# Major Indian indices
indices = [
    '^NSEI',      # NIFTY 50
    '^BSESN',     # BSE SENSEX
    '^CNXBANK',   # NIFTY Bank
    '^CNXIT',     # NIFTY IT
    '^CNXPHARMA'  # NIFTY Pharma
]

# Download index data
indices_data = yf.download(
    indices,
    start=start_date,
    end=end_date,
    group_by='ticker'
)

# Prepare indices data
all_indices_df = pd.DataFrame()

for idx in indices:
    df = indices_data[idx].copy()
    df['Symbol'] = idx
    all_indices_df = pd.concat([all_indices_df, df])

all_indices_df.reset_index(inplace=True)

# Register the indices DataFrame
con.register('indian_indices', all_indices_df)

# Calculate index returns
con.execute("""
CREATE OR REPLACE VIEW index_returns AS
WITH first_prices AS (
    SELECT 
        Symbol, 
        MIN(Date) as first_date,
        arg_min(Close, Date) as first_price  -- Close on the earliest Date
    FROM 
        indian_indices
    GROUP BY 
        Symbol
)
SELECT 
    i.Symbol,
    i.Date,
    i.Close,
    (i.Close / f.first_price - 1) * 100 as cumulative_return_pct
FROM 
    indian_indices i
JOIN 
    first_prices f ON i.Symbol = f.Symbol
ORDER BY 
    i.Symbol, i.Date
""")

# Plot index performance
index_data = con.execute("""
SELECT Date, Symbol, cumulative_return_pct
FROM index_returns
ORDER BY Date
""").fetchdf()

plt.figure(figsize=(14, 8))

for idx in indices:
    idx_data = index_data[index_data['Symbol'] == idx]
    plt.plot(idx_data['Date'], idx_data['cumulative_return_pct'], label=idx)

plt.title('Cumulative Returns of Indian Indices (%)', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Cumulative Return (%)', fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.grid(True)
plt.savefig('indian_indices_returns.png')
plt.show()

6. Trading Strategy Backtest Example

Let’s implement a simple moving average crossover strategy and backtest it on selected Indian stocks:

# Create a view for the moving average crossover strategy
con.execute("""
CREATE OR REPLACE VIEW ma_crossover_strategy AS
SELECT 
    Symbol,
    Date,
    Close,
    MA20,
    MA50,
    CASE 
        WHEN MA20 > MA50 AND LAG(MA20) OVER (PARTITION BY Symbol ORDER BY Date) <= LAG(MA50) OVER (PARTITION BY Symbol ORDER BY Date) 
        THEN 'BUY'
        WHEN MA20 < MA50 AND LAG(MA20) OVER (PARTITION BY Symbol ORDER BY Date) >= LAG(MA50) OVER (PARTITION BY Symbol ORDER BY Date) 
        THEN 'SELL'
        ELSE NULL
    END AS signal
FROM 
    moving_averages
WHERE 
    MA20 IS NOT NULL AND MA50 IS NOT NULL
ORDER BY 
    Symbol, Date
""")

# Get all signals
signals = con.execute("""
SELECT Symbol, Date, Close, signal
FROM ma_crossover_strategy
WHERE signal IS NOT NULL
ORDER BY Symbol, Date
""").fetchdf()

print("Trading Signals:")
print(signals.head(20))

# Backtest the strategy for a specific stock
def backtest_strategy(ticker):
    # Get all relevant data for the stock
    backtest_data = con.execute(f"""
    WITH signals_with_positions AS (
        SELECT 
            Symbol,
            Date,
            Close,
            signal,
            SUM(CASE WHEN signal = 'BUY' THEN 1 WHEN signal = 'SELL' THEN -1 ELSE 0 END) 
                OVER (PARTITION BY Symbol ORDER BY Date) AS position
        FROM 
            ma_crossover_strategy
        WHERE 
            Symbol = '{ticker}'
        ORDER BY 
            Date
    )
    SELECT 
        Date,
        Close,
        signal,
        position,
        CASE 
            WHEN position > 0 THEN 'In Trade'
            ELSE 'Out of Market'
        END AS market_position
    FROM 
        signals_with_positions
    ORDER BY 
        Date
    """).fetchdf()
    
    # Calculate strategy returns
    backtest_data['previous_close'] = backtest_data['Close'].shift(1)
    backtest_data['daily_return'] = backtest_data['Close'] / backtest_data['previous_close'] - 1
    backtest_data['strategy_return'] = backtest_data['daily_return'] * backtest_data['position'].shift(1).fillna(0)
    
    # Calculate cumulative returns
    backtest_data['cumulative_market_return'] = (1 + backtest_data['daily_return']).cumprod() - 1
    backtest_data['cumulative_strategy_return'] = (1 + backtest_data['strategy_return']).cumprod() - 1
    
    # Plot the results
    plt.figure(figsize=(14, 10))
    
    # Plot prices and positions
    ax1 = plt.subplot(2, 1, 1)
    ax1.plot(backtest_data['Date'], backtest_data['Close'])
    
    # Mark buy signals
    buy_signals = backtest_data[backtest_data['signal'] == 'BUY']
    ax1.scatter(buy_signals['Date'], buy_signals['Close'], marker='^', color='g', s=100, label='Buy')
    
    # Mark sell signals
    sell_signals = backtest_data[backtest_data['signal'] == 'SELL']
    ax1.scatter(sell_signals['Date'], sell_signals['Close'], marker='v', color='r', s=100, label='Sell')
    
    ax1.set_title(f'Moving Average Crossover Strategy Signals for {ticker.replace(".NS", "")}', fontsize=15)
    ax1.set_ylabel('Stock Price (INR)', fontsize=12)
    ax1.legend()
    ax1.grid(True)
    
    # Plot returns
    ax2 = plt.subplot(2, 1, 2)
    ax2.plot(backtest_data['Date'], backtest_data['cumulative_market_return'] * 100, label='Buy & Hold')
    ax2.plot(backtest_data['Date'], backtest_data['cumulative_strategy_return'] * 100, label='Strategy')
    ax2.set_title('Cumulative Returns (%)', fontsize=15)
    ax2.set_xlabel('Date', fontsize=12)
    ax2.set_ylabel('Return (%)', fontsize=12)
    ax2.legend()
    ax2.grid(True)
    
    plt.tight_layout()
    plt.savefig(f'{ticker.replace(".NS", "")}_strategy_backtest.png')
    plt.show()
    
    # Print performance metrics
    final_market_return = backtest_data['cumulative_market_return'].iloc[-1] * 100
    final_strategy_return = backtest_data['cumulative_strategy_return'].iloc[-1] * 100
    
    print(f"\nPerformance Summary for {ticker}:")
    print(f"Buy & Hold Return: {final_market_return:.2f}%")
    print(f"Strategy Return: {final_strategy_return:.2f}%")
    print(f"Outperformance: {final_strategy_return - final_market_return:.2f}%")
    
    # Calculate other metrics
    strategy_returns = backtest_data['strategy_return']
    market_returns = backtest_data['daily_return']
    
    strategy_sharpe = strategy_returns.mean() / strategy_returns.std() * (252 ** 0.5)
    market_sharpe = market_returns.mean() / market_returns.std() * (252 ** 0.5)
    
    print(f"Strategy Sharpe Ratio: {strategy_sharpe:.2f}")
    print(f"Market Sharpe Ratio: {market_sharpe:.2f}")
    
    # Count trades
    trades = backtest_data[backtest_data['signal'].notna()].shape[0]
    print(f"Number of Trades: {trades}")

# Run backtest for a sample stock
backtest_strategy('RELIANCE.NS')

Output

Trading Signals:
           Symbol       Date        Close signal
0   BHARTIARTL.NS 2022-04-01   747.524841    BUY
1   BHARTIARTL.NS 2022-05-18   686.077087   SELL
2   BHARTIARTL.NS 2022-08-08   697.243774    BUY
3   BHARTIARTL.NS 2022-12-29   812.469421   SELL
4   BHARTIARTL.NS 2023-04-28   791.235779    BUY
5   BHARTIARTL.NS 2023-08-30   852.996399   SELL
6   BHARTIARTL.NS 2023-09-15   930.961975    BUY
7   BHARTIARTL.NS 2024-11-11  1560.550049   SELL
8   BHARTIARTL.NS 2024-12-30  1586.900024    BUY
9   BHARTIARTL.NS 2024-12-31  1587.750000   SELL
10  BHARTIARTL.NS 2025-01-01  1595.699951    BUY
11    HDFCBANK.NS 2022-04-01  1451.984863    BUY
12    HDFCBANK.NS 2022-05-05  1304.424194   SELL
13    HDFCBANK.NS 2022-06-21  1303.595215    BUY
14    HDFCBANK.NS 2022-06-30  1314.762939   SELL
15    HDFCBANK.NS 2022-07-11  1371.479126    BUY
16    HDFCBANK.NS 2022-10-10  1380.110840   SELL
17    HDFCBANK.NS 2022-11-10  1487.154785    BUY
18    HDFCBANK.NS 2023-01-17  1569.229980   SELL
19    HDFCBANK.NS 2023-01-24  1653.694702    BUY

Output

Performance Summary for RELIANCE.NS:
Buy & Hold Return: 10.83%
Strategy Return: 7.20%
Outperformance: -3.63%
Strategy Sharpe Ratio: 0.23
Market Sharpe Ratio: 0.27
Number of Trades: 18
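Total return and Sharpe ratio only tell part of the story; maximum drawdown is an easy metric to add on top of a cumulative-return series like the one computed in the backtest. A minimal sketch on a hypothetical equity curve (1.0 = starting capital):

```python
import pandas as pd

# Hypothetical equity curve of a strategy (1.0 = starting capital)
equity = pd.Series([1.00, 1.05, 1.10, 0.99, 1.02, 1.15, 1.08])

running_peak = equity.cummax()        # best equity value seen so far
drawdown = equity / running_peak - 1  # fraction below that peak
max_drawdown = drawdown.min()
print(f"Max Drawdown: {max_drawdown:.2%}")  # Max Drawdown: -10.00%
```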

Additional Considerations

  • Types of Traders: Algorithmic traders, who often backtest strategies on large datasets, are prime candidates for DuckDB. Day traders needing quick analysis of real-time data might also benefit, though for streaming data, specialized tools could complement DuckDB. Long-term investors might find it less critical but could still use it for efficient historical analysis.
  • Learning Curve and Ecosystem: DuckDB’s SQL-based interface is accessible to those with database experience, and its growing popularity means good documentation and community support. Its compatibility with data science tools further enhances its appeal.
  • Security and Storage: As an embedded database, DuckDB gives traders control over their data locally, which can be an advantage for security (though it lacks built-in encryption—sensitive data requires additional safeguards). Its ability to query external files also reduces storage duplication.

Conclusion

In this tutorial, we explored how DuckDB can be used for efficient analysis of Indian stock market data. DuckDB’s SQL capabilities make it easy to perform complex analyses, while its integration with the Python ecosystem allows for seamless visualization and further processing.

Key advantages of using DuckDB for stock analysis include:

  1. Fast analytical queries directly on Pandas DataFrames
  2. Efficient handling of time series operations (moving averages, lag functions, etc.)
  3. SQL-based approach that is familiar to many analysts
  4. Ability to handle large datasets with minimal memory overhead
  5. Easy integration with visualization libraries like Matplotlib and Seaborn

The techniques demonstrated in this tutorial can be extended to more sophisticated analyses, such as:

  • Portfolio optimization and asset allocation
  • Risk modeling and VaR (Value at Risk) calculations
  • Machine learning-based prediction models
  • Integration with other data sources like economic indicators
  • Real-time analytics with streaming data

DuckDB’s combination of performance and ease of use makes it an excellent choice for financial analysts working with Indian stock market data.

If you’re dealing with large financial datasets and need a lightweight yet powerful analytical tool, DuckDB is definitely worth adding to your trading workflow.
