To predict the next day’s close with a linear regression model, I’ll follow these steps:
1. Load and Inspect the Data: Load the CSV file, examine its structure and data types, and check for missing values.
2. Preprocess the Data: Handle any missing values, calculate the required features (EMA10, EMA20, RSI14, Yesterday’s Close), and shift the ‘Close’ column to create the target ‘Next Day Close’.
3. Split the Data: Divide the data into a training set and a testing set.
4. Train the Model: Use the training data to train a linear regression model.
5. Evaluate the Model: Evaluate the model’s performance on the test data.
6. Predict the Next Day’s Close: Use the model to predict the next day’s close.
7. Print the Performance: Output the performance metrics of the model.
8. Plot Feature Importance: Linear regression has no built-in measure of feature importance, but we can infer a rough one from the sign and size of the coefficients.
Let’s start by loading and inspecting the data.
The dataset contains the following columns:
date: The date of the trading session.
symbol: The symbol for the traded entity, in this case, ‘NIFTY’.
open: The opening price for the day.
high: The highest price during the day.
low: The lowest price during the day.
close: The closing price for the day.
volume: The number of shares or contracts traded during the session.
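Here is a minimal sketch of the loading and inspection step. The real CSV is not shown here, so the rows in SAMPLE_CSV are made up and only mirror the column names listed above; load_prices is a hypothetical helper name.

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV (assumption: same column names).
SAMPLE_CSV = """date,symbol,open,high,low,close,volume
2024-01-02,NIFTY,101.0,103.5,100.5,103.0,1200
2024-01-01,NIFTY,100.0,102.0,99.0,101.0,1000
2024-01-03,NIFTY,103.0,104.0,101.0,102.0,900
"""


def load_prices(source):
    """Read the CSV, parse the date column, and sort rows chronologically."""
    df = pd.read_csv(source, parse_dates=["date"])
    return df.sort_values("date").reset_index(drop=True)


df = load_prices(io.StringIO(SAMPLE_CSV))

print(df.dtypes)        # data type of each column
print(df.isna().sum())  # number of missing values per column
print(df.head())        # first few rows
```

With a real file, the same function would be called with the file path instead of the io.StringIO object.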
The next step is to preprocess the data: calculate the required features (EMA10, EMA20, RSI14, and Yesterday’s Close) and prepare the target variable (Next Day Close). I’ll also handle any missing values along the way.
The data is now preprocessed with the following features:
EMA10: 10-day Exponential Moving Average.
EMA20: 20-day Exponential Moving Average.
RSI14: 14-day Relative Strength Index.
Yesterday Close: The closing price of the previous day.
And the target variable:
Next Day Close: The closing price of the next day.
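The features and target above could be computed roughly as follows. This is a sketch under assumptions: the RSI variant is not specified here, so Wilder's smoothing is used as one common choice, the closes are a synthetic random walk rather than the NIFTY data, and wilder_rsi and add_features are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close, period=14):
    """RSI using Wilder's smoothing (an assumed variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)


def add_features(df):
    """Return a copy of df with the four features and the target added."""
    out = df.copy()
    close = out["close"]
    out["EMA10"] = close.ewm(span=10, adjust=False).mean()
    out["EMA20"] = close.ewm(span=20, adjust=False).mean()
    out["RSI14"] = wilder_rsi(close, 14)
    out["Yesterday Close"] = close.shift(1)   # previous session's close
    out["Next Day Close"] = close.shift(-1)   # target: following session's close
    return out


# Synthetic random-walk closes, used only so the sketch runs on its own.
rng = np.random.default_rng(0)
prices = pd.DataFrame({"close": 20000 + rng.normal(0, 100, 300).cumsum()})

# Rows without a full RSI window, and the last row (no next day), are dropped.
features = add_features(prices).dropna()
print(features.tail())
```

Dropping the rows with missing values removes the RSI warm-up period at the start and the final row, whose next-day close is not yet known.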
Next, I’ll split the data into a training set and a testing set. I’ll use the first 80% of the data for training and the remaining 20% for testing. Then, I’ll train a Linear Regression model on the training set and evaluate its performance on the test set.
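A chronological 80/20 split and the model fit might look like the sketch below. It assumes scikit-learn's LinearRegression and uses a synthetic feature table in place of the real data; make_table is a hypothetical helper, and the RSI variant (Wilder's smoothing) is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["EMA10", "EMA20", "RSI14", "Yesterday Close"]
TARGET = "Next Day Close"


def make_table(n=500, seed=0):
    """Synthetic stand-in for the preprocessed data (random-walk closes)."""
    rng = np.random.default_rng(seed)
    close = pd.Series(20000 + rng.normal(0, 100, n).cumsum())
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / 14, min_periods=14, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / 14, min_periods=14, adjust=False).mean()
    table = pd.DataFrame({
        "EMA10": close.ewm(span=10, adjust=False).mean(),
        "EMA20": close.ewm(span=20, adjust=False).mean(),
        "RSI14": 100 - 100 / (1 + gain / loss),
        "Yesterday Close": close.shift(1),
        "Next Day Close": close.shift(-1),
    })
    return table.dropna()


data = make_table()

# First 80% of rows for training, last 20% for testing; no shuffling,
# so the test period lies strictly after the training period.
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

model = LinearRegression().fit(train[FEATURES], train[TARGET])
test_predictions = model.predict(test[FEATURES])
print(len(train), len(test))
```

Splitting by position rather than at random keeps future rows out of the training set, which matters for time-ordered data.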
The Linear Regression model has been trained and evaluated on the test data. The performance metrics are as follows:
Mean Squared Error (MSE): 37974.61
R-Squared (R²): 0.997
The R² value is very close to 1, indicating that the model explains a high proportion of the variance in the target variable. However, it’s essential to remember that stock prices are influenced by numerous factors, not all of which are captured by historical prices and technical indicators. Therefore, while the model might perform well on historical data, it’s crucial to be cautious when using it for actual trading decisions.
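For reference, the two metrics can be computed as in the sketch below, assuming scikit-learn's metric functions. The y values are made up for illustration; only the MSE of 37974.61 is taken from the results above, and its square root gives the error in the same units as the price. regression_report is a hypothetical helper name.

```python
import math

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MSE, RMSE and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = float(mean_squared_error(y_true, y_pred))
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": float(r2_score(y_true, y_pred))}


# Made-up targets and predictions, used only to exercise the function.
report = regression_report([100.0, 102.0, 101.0, 105.0],
                           [101.0, 101.0, 102.0, 104.0])
print(report)

# RMSE implied by the MSE reported above (about 194.9 index points).
reported_rmse = math.sqrt(37974.61)
print(round(reported_rmse, 1))
```

Expressed as an RMSE, the reported error is roughly 195 points, which is easier to compare with typical daily moves than the squared figure.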
Next, I’ll use the model to predict the next day’s close based on the most recent data available and then infer the feature importance from the model’s coefficients.
The model predicts that the next day’s close will be approximately 21605.83. This prediction is based on the most recent data available in the dataset.
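The prediction step can be sketched as follows. The key detail is that the most recent row has features but no known target, so it is excluded from fitting and used only as input. The data is again a synthetic random walk, the RSI variant is an assumption, and the model here is fit on all rows with a known target, which may differ from the 80% training split used above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["EMA10", "EMA20", "RSI14", "Yesterday Close"]


def wilder_rsi(close, period=14):
    """RSI using Wilder's smoothing (an assumed variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    loss = (-delta.clip(upper=0)).ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    return 100 - 100 / (1 + gain / loss)


# Synthetic closes standing in for the real series.
rng = np.random.default_rng(1)
close = pd.Series(20000 + rng.normal(0, 100, 400).cumsum())

table = pd.DataFrame({
    "EMA10": close.ewm(span=10, adjust=False).mean(),
    "EMA20": close.ewm(span=20, adjust=False).mean(),
    "RSI14": wilder_rsi(close),
    "Yesterday Close": close.shift(1),
    "Next Day Close": close.shift(-1),  # NaN on the last row: not known yet
})

history = table.dropna()               # rows with a known target
model = LinearRegression().fit(history[FEATURES], history["Next Day Close"])

latest = table[FEATURES].iloc[[-1]]    # most recent session, kept as a 1-row frame
prediction = float(model.predict(latest)[0])
print(f"Predicted next close: {prediction:.2f}")
```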
The feature importance, inferred from the model’s coefficients, is as follows:
Yesterday Close: 0.549
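A positive coefficient indicates that, with the other features held fixed, the predicted next day’s close increases as the feature increases; a negative coefficient indicates the opposite. For example, RSI14 has the largest positive coefficient, while EMA20 has a negative coefficient, indicating an inverse relationship once the other features are accounted for. Each coefficient is expressed per unit of its own feature: RSI14 ranges from 0 to 100, whereas the price-based features are in index points, so the raw magnitudes are not directly comparable.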
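To illustrate how coefficients are read off a fitted model, and why their raw sizes depend on each feature's units, here is a small sketch on synthetic data. The true coefficients (0.5 and 3.0) are chosen for the example and are not the results above; multiplying each coefficient by its feature's standard deviation is one simple way, assumed here, to put them on a common scale.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two synthetic features on very different scales.
X = pd.DataFrame({
    "Yesterday Close": rng.normal(20000, 1000, 500),  # price-like scale
    "RSI14": rng.uniform(20, 80, 500),                # bounded oscillator scale
})
# Target built with known coefficients plus a little noise.
y = 0.5 * X["Yesterday Close"] + 3.0 * X["RSI14"] + rng.normal(0, 1, 500)

model = LinearRegression().fit(X, y)

# Raw coefficients: change in prediction per one unit of each feature.
raw = pd.Series(model.coef_, index=X.columns)

# Scaled coefficients: change in prediction per one standard deviation.
scaled = raw * X.std()

print(raw.round(3))
print(scaled.round(1))
```

In this example RSI14 has the larger raw coefficient, yet Yesterday Close moves the prediction far more per typical change in the feature, so ranking features by raw coefficient alone can be misleading.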
Let’s visualize these coefficients to better understand the feature importance.
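A bar plot of this kind could be drawn with matplotlib along the following lines. The feature names and values in the example dictionary are placeholders, not the model's coefficients, and plot_coefficients is a hypothetical helper name.

```python
import matplotlib.pyplot as plt


def plot_coefficients(coefficients, ax=None):
    """Draw one horizontal bar per feature, coloring negative values differently."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 3))
    names = list(coefficients)
    values = [coefficients[name] for name in names]
    colors = ["tab:red" if value < 0 else "tab:blue" for value in values]
    ax.barh(names, values, color=colors)
    ax.axvline(0, color="black", linewidth=0.8)  # zero line separates the signs
    ax.set_xlabel("Coefficient")
    ax.set_title("Linear regression coefficients")
    return ax


# Placeholder values for illustration only.
example = {"feature_a": 0.8, "feature_b": -0.3, "feature_c": 0.1}
ax = plot_coefficients(example)
```

Calling plt.show() afterwards displays the figure; with a fitted scikit-learn model, the dictionary could be built with dict(zip(feature_names, model.coef_)).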
The bar plot above illustrates the importance of each feature in predicting the next day’s close, as inferred from the coefficients of the Linear Regression model:
EMA10: Positive impact, but less than RSI14 and Yesterday Close.
EMA20: Negative coefficient, meaning that with the other features held fixed, a higher EMA20 lowers the predicted next day’s close.
RSI14: Largest positive coefficient, meaning that with the other features held fixed, a higher RSI14 raises the predicted next day’s close.
Yesterday Close: Positive impact, slightly less than RSI14.
These insights can help in understanding how each feature influences the prediction.