ChatGPT's Advanced Data Analysis tool (Data Analyst), a custom GPT formerly known as Code Interpreter, is a specialized version of the ChatGPT model tailored to data analysis tasks. It is designed and trained to assist with a wide range of data-related inquiries and tasks, such as data exploration, statistical analysis, data visualization, data cleaning, predictive modeling (machine learning and forecasting), natural language processing, and problem-solving, all without the user writing a single line of code.

**How Does the ChatGPT-4 Data Analyst Perform Advanced Data Analysis?**

It can write and execute Python scripts to perform data analysis tasks, from simple data manipulation to complex statistical modeling.

**Importing the Data to Data-Analyst**

Upload the CSV file that you would like to analyze, along with the following prompt.

**Prompt 1: Computing and Plotting Technical Indicators**

Plot EMA10 and EMA20 in blue and red.

Plot RSI14 in red with thick lines, with the overbought (70) and oversold (30) levels shown as dashed lines.

Plot the data from 2023 onwards.

**ChatGPT4 Output**

The top plot shows the Close data of NIFTY starting from 2023 in green color, along with EMA10 in blue and EMA20 in red.

The bottom plot displays RSI14 in red with thick lines. The overbought (70) and oversold (30) levels are indicated with dashed lines.
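Behind the scenes, the Data Analyst generates Python along these lines. The sketch below is illustrative rather than the tool's actual code: a synthetic close series stands in for the NIFTY CSV, and the RSI uses Wilder's smoothing, one common convention.

```python
import numpy as np
import pandas as pd

# Synthetic daily close series standing in for the uploaded NIFTY CSV (hypothetical data).
rng = np.random.default_rng(0)
close = pd.Series(10000 + np.cumsum(rng.normal(5, 50, 500)),
                  index=pd.bdate_range("2022-01-03", periods=500), name="close")

# Exponential moving averages.
ema10 = close.ewm(span=10, adjust=False).mean()
ema20 = close.ewm(span=20, adjust=False).mean()

# 14-day RSI with Wilder's smoothing (alpha = 1/14).
delta = close.diff()
gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
rsi14 = 100 - 100 / (1 + gain / loss)
```

Plotting is then a matter of two stacked Matplotlib axes: close/EMA10/EMA20 on top, RSI14 with dashed horizontal lines at 70 and 30 below.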

**Prompt 2: Calculate and Plot the Rolling Sharpe Ratio**

**ChatGPT4 Output**

Here’s the plan:

Calculate the daily returns of NIFTY.

Define the risk-free rate as 6% per annum.

Calculate the rolling mean and standard deviation of the returns over a certain window (e.g., 252 days, representing a trading year).

Compute the Sharpe Ratio using these rolling statistics.

Plot the rolling Sharpe Ratio.

Let’s proceed with this calculation and plot the result.

Here’s the plot of the Rolling Sharpe Ratio of Nifty over a 252-day window. The Sharpe Ratio is a measure of risk-adjusted return, and this rolling calculation provides a dynamic view of how the Nifty’s risk-adjusted performance has evolved over time.
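The plan above can be sketched in a few lines of pandas. This is a minimal illustration, assuming daily close data (synthesized here) and annualizing the 6% risk-free rate to a daily figure over 252 trading days:

```python
import numpy as np
import pandas as pd

# Synthetic close series in place of the NIFTY data (hypothetical).
rng = np.random.default_rng(1)
close = pd.Series(10000 * np.cumprod(1 + rng.normal(0.0005, 0.01, 1500)),
                  index=pd.bdate_range("2015-01-01", periods=1500))

rf_daily = 0.06 / 252                      # 6% p.a. risk-free rate, per trading day
returns = close.pct_change().dropna()      # daily returns
excess = returns - rf_daily                # excess returns over the risk-free rate

window = 252                               # one trading year
rolling_sharpe = (excess.rolling(window).mean()
                  / excess.rolling(window).std()) * np.sqrt(252)
```

The `sqrt(252)` factor annualizes the daily Sharpe figure; `rolling_sharpe` can then be plotted directly.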

**Prompt 3: Calculating January Month Seasonality**

Also, provide the following seasonality metrics for January returns.

Mean Return

Median Return

Standard Deviation

No. of Positive Months

No. of Negative Months

Percentage Positive Months

Percentage Negative Months

**ChatGPT4 Response**

## Seasonality Metrics for January Returns:

**Mean Return:** 0.37%

**Median Return:** -0.29%

**Standard Deviation:** 5.75%

**No. of Positive Months:** 5

**No. of Negative Months:** 8

**Percentage Positive Months:** 38.46%

**Percentage Negative Months:** 61.54%

This table and the metrics provide an insight into the January effect on Nifty returns, showcasing variability and trends over the specified years.
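A minimal sketch of how these seasonality metrics can be computed with pandas, using a synthetic close series in place of the NIFTY data:

```python
import numpy as np
import pandas as pd

# Synthetic daily close series (hypothetical stand-in for the NIFTY CSV).
rng = np.random.default_rng(2)
close = pd.Series(5000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 3300)),
                  index=pd.bdate_range("2011-01-03", periods=3300))

# Month-end closes -> monthly returns, then keep only January.
month_end = close.groupby(close.index.to_period("M")).last()
monthly = month_end.pct_change().dropna()
jan = monthly[monthly.index.month == 1]

metrics = {
    "Mean Return": jan.mean(),
    "Median Return": jan.median(),
    "Standard Deviation": jan.std(),
    "No. of Positive Months": int((jan > 0).sum()),
    "No. of Negative Months": int((jan < 0).sum()),
    "Percentage Positive Months": (jan > 0).mean() * 100,
    "Percentage Negative Months": (jan < 0).mean() * 100,
}
```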

## Prompt 4: Generating a Monthly Returns Heatmap

**ChatGPT4 Output**

1. Load the data from the provided CSV file.

2. Parse and format the date information correctly.

3. Calculate the monthly returns from the daily data.

4. Create the heatmap using the calculated monthly returns.

First, let’s load the data and examine the first few rows to understand its structure.

The data contains the following columns: date, symbol, open, high, low, close, and volume. To create the monthly returns heatmap, I will focus on the ‘date’ and ‘close’ columns. Here are the steps I’ll follow:

1. Convert the ‘date’ column to a datetime object.

2. Filter the data to start from December 31, 2009.

3. Calculate the monthly returns.

4. Create the heatmap.

Let’s start by converting the ‘date’ column and filtering the data.

The data has been successfully filtered to start from December 31, 2009. Now, I will proceed with the following steps:

1. Calculate the monthly returns.

2. Pivot the data to create a matrix suitable for a heatmap, with months as columns and years as rows.

3. Create the heatmap using Matplotlib and Seaborn.

Let’s calculate the monthly returns first.

The monthly returns have been successfully calculated and organized into a pivot table suitable for creating a heatmap. Each cell in the table represents the return for a specific month and year.

Now, I’ll create the heatmap using Matplotlib and Seaborn to visualize the monthly returns from December 31, 2009, onwards.

Here’s the Monthly Returns Heatmap starting from December 31, 2009, onwards. Each cell represents the return for a given month and year, with the color indicating the magnitude of the return. Red tones signify negative returns, while blue tones signify positive returns. You can use this heatmap to quickly visualize and analyze the performance and trends over time. If you have any specific questions or need further analysis, please let me know!
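The monthly-returns pivot at the heart of this heatmap can be sketched as follows. This is an illustration on synthetic data, not the tool's actual code; the Seaborn call is shown as a comment since the heatmap itself is just one call on the pivot table:

```python
import numpy as np
import pandas as pd

# Synthetic close series from end-2009 onwards (hypothetical data).
rng = np.random.default_rng(3)
idx = pd.bdate_range("2009-12-31", periods=2500)
close = pd.Series(5000 * np.cumprod(1 + rng.normal(0.0004, 0.01, len(idx))), index=idx)

# Month-end closes -> monthly returns in percent.
month_end = close.groupby(close.index.to_period("M")).last()
monthly_returns = month_end.pct_change().dropna() * 100

# Years as rows, months as columns.
table = pd.DataFrame({"year": monthly_returns.index.year,
                      "month": monthly_returns.index.month,
                      "ret": monthly_returns.values})
pivot = table.pivot(index="year", columns="month", values="ret")

# Heatmap, e.g.:
# import seaborn as sns
# sns.heatmap(pivot, annot=True, fmt=".1f", cmap="RdBu", center=0)
```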

## Prompt 5: Building a Simple Trading Strategy

**ChatGPT4 Output**

1. Calculate EMA20 and EMA50 for the closing prices.

2. Determine the points where EMA20 crosses above EMA50 (positive crossover) and where EMA20 crosses below EMA50 (negative crossover).

3. Plot the closing prices, EMA20, and EMA50 from 2019 onwards.

4. Add triangles to indicate buy (upward triangle for positive crossover) and sell (downward triangle for negative crossover) signals.

First, let’s calculate the EMA20 and EMA50 for the closing prices.

The EMA20 and EMA50 have been calculated, and the points of positive and negative crossovers have been identified. Now, let’s proceed to plot the closing prices, EMA20, EMA50, and the buy/sell signals from 2019 onwards.

Buy signals (positive crossover) will be indicated with an upward triangle (▲).

Sell signals (negative crossover) will be indicated with a downward triangle (▼).

Here’s the chart showing the NIFTY closing prices along with the 20-day and 50-day Exponential Moving Averages (EMA20 and EMA50) from 2019 onwards. Additionally:

Green upward triangles (▲) indicate buy signals where EMA20 crosses above EMA50 (positive crossover).

Red downward triangles (▼) indicate sell signals where EMA20 crosses below EMA50 (negative crossover).

This visualization helps in identifying potential entry (buy) and exit (sell) points based on the EMA crossover strategy. If you have any further questions or need additional assistance, feel free to ask.
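Crossover detection is the core of this step. A minimal sketch, assuming a pandas close series (synthetic here): a buy fires on the first bar where EMA20 moves above EMA50, a sell on the first bar where it moves back below.

```python
import numpy as np
import pandas as pd

# Synthetic close series (hypothetical stand-in for NIFTY from 2019).
rng = np.random.default_rng(4)
close = pd.Series(10000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 1000)),
                  index=pd.bdate_range("2019-01-01", periods=1000))

ema20 = close.ewm(span=20, adjust=False).mean()
ema50 = close.ewm(span=50, adjust=False).mean()

above = ema20 > ema50
buy = above & ~above.shift(fill_value=False)    # EMA20 crosses above EMA50
sell = ~above & above.shift(fill_value=False)   # EMA20 crosses below EMA50
```

On the chart, `close[buy]` and `close[sell]` give the y-coordinates for the green `▲` and red `▼` markers.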

**Prompt 6: Backtesting a Trading Strategy**

Strategy: Buy on a positive EMA crossover, sell on a negative EMA crossover

Initial Capital: Rs 20,00,000

Trade Size: 50 shares

Strategy Type: Long Only

Generate all trade results with Entry Date, Entry Price, Exit Date, Exit Price, and Profit/Loss from 2019 onwards, and include currently open trades as well. Use the close price of the day as the Entry/Exit price. Present the trade results in table format.

Generate the Basic Backtesting Metrics From 2019 onwards.

Plot the Cumulative Profit/Loss with an area plot

**ChatGPT4 Output**

**1. Load and Inspect the Data:** I’ll load the data from the provided CSV file to understand its structure and the available information.

**2. Calculate EMAs:** I’ll calculate the 20-day and 50-day Exponential Moving Averages (EMAs) starting from 2018.

**3. Strategy Implementation:** I’ll implement the strategy where we buy 50 shares on a positive EMA crossover (EMA20 > EMA50) and sell on a negative crossover (EMA20 < EMA50). This will be a long-only strategy.

**4. Generate Trade Results:** I’ll generate a list of all trades from 2019 onwards, including entry date, entry price, exit date, exit price, and profit/loss.

**5. Backtesting Metrics:** I’ll calculate basic backtesting metrics such as total profit/loss, win rate, and others from 2019 onwards.

**6. Visualizations:**

**Trade Results Table:** I’ll display the trade results in a tabular format.

**Cumulative Profit/Loss:** I’ll plot the cumulative profit/loss over time with an area plot.

Let’s start by loading and inspecting the data.

The dataset contains daily price data for the NIFTY index, including the date, open, high, low, close, and volume. The next steps are to:

1) Filter the data from 2018 onwards.

2) Calculate the 20-day and 50-day Exponential Moving Averages (EMAs).

3) Implement the trading strategy based on EMA crossovers.

I’ll start by performing these tasks now.

I’ve calculated the trade results based on the EMA crossover strategy from 2019 onwards. Here are the details for the first few trades:

Next, I’ll calculate the basic backtesting metrics from 2019 onwards and plot the cumulative profit/loss over time with an area plot.

Here are the basic backtesting metrics for the EMA crossover strategy from 2019 onwards:

**Total Trades:** 10

**Total Profit/Loss:** Rs 496,580.10

**Average Profit/Loss per Trade:** Rs 49,658.01

**Win Rate:** 80%

**Max Profit:** Rs 211,935.00 (Maximum profit from a single trade)

**Max Loss:** Rs -30,067.45 (Maximum loss from a single trade)

The cumulative profit/loss over time has also been plotted, showing the progression of the strategy’s returns.
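The backtest itself reduces to pairing each buy signal with the next sell signal. A minimal long-only sketch on synthetic data (illustrative only; the actual figures above come from the real NIFTY file):

```python
import numpy as np
import pandas as pd

# Synthetic close series (hypothetical stand-in for NIFTY from 2018).
rng = np.random.default_rng(5)
close = pd.Series(10000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 1200)),
                  index=pd.bdate_range("2018-01-01", periods=1200))

ema20 = close.ewm(span=20, adjust=False).mean()
ema50 = close.ewm(span=50, adjust=False).mean()
above = ema20 > ema50
buy = above & ~above.shift(fill_value=False)
sell = ~above & above.shift(fill_value=False)

shares = 50
trades, entry = [], None
for date in close.index:
    if entry is None and buy[date]:            # open a long position
        entry = (date, close[date])
    elif entry is not None and sell[date]:     # close it on the opposite crossover
        trades.append({"entry_date": entry[0], "entry_price": entry[1],
                       "exit_date": date, "exit_price": close[date],
                       "pnl": (close[date] - entry[1]) * shares})
        entry = None

results = pd.DataFrame(trades)
total_pnl = results["pnl"].sum()
win_rate = (results["pnl"] > 0).mean() * 100
```

`results["pnl"].cumsum()` gives the series behind the cumulative profit/loss area plot; a position still open at the end of the data would be reported separately as an open trade.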

## Prompt 7: Perform Machine Learning Using Linear Regression to Predict the Next Day’s Close

Features – EMA10, EMA20, RSI14, Yesterday’s Close

Target – Next Day Close

Train and Evaluate the Model

Predict the Next day’s close based on the trained model

Print the Performance of the ML Model

Plot the feature importance

**ChatGPT4 Output**

**1. Load and Inspect the Data:** Load the CSV file and understand its structure, checking for missing values and understanding the type of data we have.

**2. Preprocess the Data:** Handle any missing values, calculate the required features (EMA10, EMA20, RSI14, Yesterday’s Close), and shift the ‘Close’ column to create the target ‘Next Day Close’.

**3. Split the Data:** Divide the data into a training set and a testing set.

**4. Train the Model:** Use the training data to train a linear regression model.

**5. Evaluate the Model:** Evaluate the model’s performance on the test data.

**6. Predict the Next Day’s Close:** Use the model to predict the next day’s close.

**7. Print the Performance:** Output the performance metrics of the model.

**8. Plot Feature Importance:** Although traditional linear regression doesn’t give a direct measure of feature importance, we can infer it from the coefficients.

Let’s start by loading and inspecting the data.

The dataset contains the following columns:

date: The date of the trading session.

symbol: The symbol for the traded entity, in this case, ‘NIFTY’.

open: The opening price for the day.

high: The highest price during the day.

low: The lowest price during the day.

close: The closing price for the day.

volume: The number of shares or contracts traded in a security or an entire market during a given period.

Next steps involve preprocessing the data by calculating the required features (EMA10, EMA20, RSI14, and Yesterday’s Close) and preparing the target variable (Next Day Close). I’ll also handle any potential missing values in the process.

The data is now preprocessed with the following features:

EMA10: 10-day Exponential Moving Average.

EMA20: 20-day Exponential Moving Average.

RSI14: 14-day Relative Strength Index.

Yesterday Close: The closing price of the previous day.

And the target variable:

Next Day Close: The closing price of the next day.

Next, I’ll split the data into a training set and a testing set. I’ll use the first 80% of the data for training and the remaining 20% for testing. Then, I’ll train a Linear Regression model on the training set and evaluate its performance on the test set.
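The preprocessing, chronological split, and training steps described above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the tool's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic close series (hypothetical stand-in for the NIFTY CSV).
rng = np.random.default_rng(6)
close = pd.Series(10000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 800)),
                  index=pd.bdate_range("2020-01-01", periods=800), name="close")

# Features: EMA10, EMA20, RSI14 (Wilder's smoothing), yesterday's close.
df = pd.DataFrame({"close": close})
df["EMA10"] = close.ewm(span=10, adjust=False).mean()
df["EMA20"] = close.ewm(span=20, adjust=False).mean()
delta = close.diff()
gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
df["RSI14"] = 100 - 100 / (1 + gain / loss)
df["YesterdayClose"] = close.shift(1)
df["NextDayClose"] = close.shift(-1)        # target: tomorrow's close
df = df.dropna()

features = ["EMA10", "EMA20", "RSI14", "YesterdayClose"]
split = int(len(df) * 0.8)                  # chronological 80/20 split
X_train, X_test = df[features][:split], df[features][split:]
y_train, y_test = df["NextDayClose"][:split], df["NextDayClose"][split:]

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)
mse, r2 = mean_squared_error(y_test, pred), r2_score(y_test, pred)
```

The split is kept chronological (no shuffling) so the model is always evaluated on data that comes after its training window.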

The Linear Regression model has been trained and evaluated on the test data. The performance metrics are as follows:

**Mean Squared Error (MSE):** 37974.61

**R-Squared (R²):** 0.997

The R² value is very close to 1, indicating that the model explains a high proportion of the variance in the target variable. However, it’s essential to remember that stock prices are influenced by numerous factors, not all of which are captured by historical prices and technical indicators. Therefore, while the model might perform well on historical data, it’s crucial to be cautious when using it for actual trading decisions.

Next, I’ll use the model to predict the next day’s close based on the most recent data available and then infer the feature importance from the model’s coefficients.

**The model predicts that the next day’s close will be approximately 21605.83. This prediction is based on the most recent data available in the dataset.**

The feature importance, inferred from the model’s coefficients, is as follows:

EMA10: 0.660

EMA20: -0.209

RSI14: 1.128

Yesterday Close: 0.549

A positive coefficient indicates that as the feature increases, the predicted value of the next day’s close also increases, and vice versa. For example, the RSI14 has the highest positive coefficient, suggesting it has a strong positive impact on predicting the next day’s close. In contrast, EMA20 has a negative coefficient, indicating an inverse relationship.

Let’s visualize these coefficients to better understand the feature importance.

The bar plot above illustrates the importance of each feature in predicting the next day’s close, as inferred from the coefficients of the Linear Regression model:

EMA10: Positive impact, but less than RSI14 and Yesterday Close.

EMA20: Negative impact, suggesting that as EMA20 increases, the next day’s close tends to decrease.

RSI14: Strongest positive impact, indicating a strong relationship with the next day’s close.

Yesterday Close: Positive impact, slightly less than RSI14.

These insights can help in understanding how each feature influences the prediction.

## Prompt 8: Calculate 5-Year Rolling CAGR Returns

Plot the rolling CAGR without explanation.

**ChatGPT4 Output**

Here’s the plot showing the 5-Year Rolling CAGR Returns on the NIFTY Close Price, with the y-axis in percentage format.
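The rolling CAGR is simply the price ratio over a 5-year window raised to the power 1/5, minus one. A minimal sketch on synthetic data, approximating 5 years as 252 × 5 trading days:

```python
import numpy as np
import pandas as pd

# Synthetic close series covering ~10 years (hypothetical data).
rng = np.random.default_rng(7)
close = pd.Series(5000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 2520)),
                  index=pd.bdate_range("2012-01-02", periods=2520))

window = 252 * 5                                    # ~5 trading years
rolling_cagr = (close / close.shift(window)) ** (1 / 5) - 1
rolling_cagr_pct = rolling_cagr * 100               # y-axis in percent
```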

**Prompt 9: Drawdown Plot**

Plot the Drawdown without explanation.

**ChatGPT4 Output**

Here’s the Drawdown area plot on the NIFTY Close Price, with the y-axis in percentage format.
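Drawdown measures the percentage decline from the running all-time high. A minimal sketch on a synthetic close series:

```python
import numpy as np
import pandas as pd

# Synthetic close series (hypothetical stand-in for NIFTY).
rng = np.random.default_rng(8)
close = pd.Series(5000 * np.cumprod(1 + rng.normal(0.0004, 0.01, 1500)),
                  index=pd.bdate_range("2016-01-01", periods=1500))

running_max = close.cummax()                     # all-time high so far
drawdown_pct = (close / running_max - 1) * 100   # 0 at new highs, negative otherwise
max_drawdown = drawdown_pct.min()                # worst peak-to-trough decline
```

`drawdown_pct` is what gets rendered as the area plot, with the y-axis in percent.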

## Prompt 10: Building Simple Scanners

1) Date Range: 2015-01-01 to 2023-12-31.

2) Data Points: Close Price

3) Frequency: Weekly

4) Indicators: RSI, EMA5, EMA10, Close

5) Filter Criteria: EMA5 and EMA10 positive EMA crossover

6) Output Preferences: Table Format

Without explanations, provide the final output (full). Round the table values to 2 decimal places.
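This scanner can be sketched by resampling daily closes to weekly bars, computing the indicators on the weekly series, and filtering on the crossover condition. An illustrative version on synthetic data (the RSI again uses Wilder's smoothing):

```python
import numpy as np
import pandas as pd

# Synthetic daily close series over the requested date range (hypothetical data).
rng = np.random.default_rng(9)
idx = pd.bdate_range("2015-01-01", "2023-12-29")
close = pd.Series(8000 * np.cumprod(1 + rng.normal(0.0003, 0.01, len(idx))), index=idx)

# Weekly close: last bar of each week.
weekly = close.groupby(close.index.to_period("W")).last()
weekly.index = weekly.index.to_timestamp()

ema5 = weekly.ewm(span=5, adjust=False).mean()
ema10 = weekly.ewm(span=10, adjust=False).mean()
delta = weekly.diff()
gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False).mean()
loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, adjust=False).mean()
rsi = 100 - 100 / (1 + gain / loss)

# Positive EMA5/EMA10 crossover: EMA5 moves above EMA10 this week.
above = ema5 > ema10
crossover = above & ~above.shift(fill_value=False)

scan = pd.DataFrame({"Close": weekly, "EMA5": ema5,
                     "EMA10": ema10, "RSI": rsi})[crossover].round(2)
```

`scan` is the final table: one row per week in which the positive crossover occurred, rounded to 2 decimal places.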