Rajandran R Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, USDINR and High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

Learning Linear Regression – Simple Machine Learning Based Prediction Algorithm for Forecasting Stock Price


Machine learning has become an integral part of stock market analysis and prediction. Linear Regression is a widely used algorithm for predicting stock prices. In this blog, we will discuss the Linear Regression model for predicting stock prices using the Python programming language.

What is Linear Regression

Linear regression is a type of supervised learning algorithm that makes predictions based on a linear relationship between the input variables (also known as features) and the output variable (also known as the target variable).

In the case of stock price prediction, the linear regression model is trained on historical price data, which includes features such as the open, high, low, close, and volume for a given day, along with indicators such as RSI, EMA, HMA, ADX, and ATR. The target variable is the close_forecast column, since the closing price is the one investors are most interested in predicting.

The linear regression model uses the training data to learn the relationship between the input variables (features) and the target variable (the prediction) by estimating the coefficients of the linear equation that best fits the data. Once the model has been trained, it can make predictions on new, unseen data by plugging the values of the input variables into that equation and computing the output with the learned coefficients.
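As a minimal sketch of what "estimating the coefficients" means, the snippet below fits scikit-learn's LinearRegression to synthetic data generated from a known linear equation, then reproduces a prediction by hand from the learned values. The data, the coefficients (3.0 and -2.0), and the intercept (5.0) are made up for illustration and are not taken from NIFTY_EOD.csv.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (illustrative only): y = 3*x1 - 2*x2 + 5 plus small noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0
          + rng.normal(scale=0.01, size=200))

demo_model = LinearRegression().fit(X_demo, y_demo)
print("Coefficients:", demo_model.coef_)
print("Intercept:", demo_model.intercept_)

# A prediction is the linear equation evaluated with the learned values
new_point = np.array([[1.0, 2.0]])
manual_pred = new_point @ demo_model.coef_ + demo_model.intercept_
sklearn_pred = demo_model.predict(new_point)
print("Manual:", manual_pred[0], "predict():", sklearn_pred[0])
```

The learned coefficients should land very close to the values used to generate the data, and the manual calculation should agree with predict().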

Preparing the Feature Dataset

The dataset used for the implementation of the model is the NIFTY_EOD.csv file, which contains end-of-day open, high, low, volume, previous close, RSI, EMA, HMA, ADX, PDI, MDI, and ATR values for the NIFTY index.

Download the Linear Regression Features Dataset prepared using Amibroker

Download NIFTY EOD csv data set

The dataset is split into two parts, training data, and testing data. The training data is used to train the model, and the testing data is used to evaluate the performance of the model.
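Because price data is ordered in time, it is worth knowing that train_test_split shuffles rows by default, so later rows can end up in the training set. One alternative, shown here as a sketch rather than as the method used in this post, is a chronological split in which the test set is the most recent 20% of rows. The small DataFrame is placeholder data, and defining close_forecast as the next row's close is an assumption made only for this example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data standing in for a time-ordered feature table
demo = pd.DataFrame({
    "close": [100.0, 101.0, 102.0, 103.0, 104.0,
              105.0, 106.0, 107.0, 108.0, 109.0],
})

# Hypothetical target for this sketch: the next row's close
demo["close_forecast"] = demo["close"].shift(-1)
demo = demo.dropna()  # the last row has no next-day close

X_demo = demo[["close"]]
y_demo = demo["close_forecast"]

# shuffle=False keeps row order: earliest rows train, latest rows test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, shuffle=False
)
print("Train index:", list(X_tr.index))
print("Test index:", list(X_te.index))
```

With this split, every test row comes after every training row, which is closer to how the model would be used in practice.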

The Linear Regression model is created using the LinearRegression() function from the scikit-learn library. The model is trained using the fit() method of the LinearRegression class on the training data.

Python Source code for Linear Regression Based Machine Learning Prediction

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score
import numpy as np

# Load the data
stock = pd.read_csv('NIFTY_EOD.csv')

df = stock

# Prepare the data
y = df['close_forecast']  # target
X = df.drop(columns=['Ticker', 'Date/Time', 'close_forecast'])  # feature inputs

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Compute accuracy metrics
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: ", mse)

# Save predicted vs actual values to CSV
df_pred = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df_pred.to_csv('NIFTY_EOD_pred.csv', index=False)



# Compute accuracy metrics
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
r2 = r2_score(y_test, y_pred)
ev = explained_variance_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_test - y_pred))

# Print accuracy metrics
print("Mean absolute percentage error (MAPE): ", mape)
print("R-squared: ", r2)
print("Explained variance: ", ev)
print("Mean squared error: ", mse)
print("Root mean squared error (RMSE): ", rmse)
print("Mean absolute error (MAE): ", mae)

# Make a prediction for the next day's close price

last_row = X.tail(1)
last_row_scaled = scaler.transform(last_row)
next_day_pred = model.predict(last_row_scaled)[0]
print("Predicted close price for the next day: ", next_day_pred)

Python Output

Mean squared error:  5131.098941109195
Mean absolute percentage error (MAPE):  1.0399683400963986
R-squared:  0.9997643613546477
Explained variance:  0.9997647348779873
Mean squared error:  5131.098941109195
Root mean squared error (RMSE):  71.63168950338387
Mean absolute error (MAE):  42.91492045843861
Predicted close price for the next day:  17400.060390626983

Here are the common steps involved in linear regression prediction:

  1. Data Collection: Collect relevant data related to the problem statement. In the case of stock price prediction, data such as historical stock prices, volumes, and market trends are collected.
  2. Data Preprocessing: This step involves cleaning and preparing the data for analysis. It includes removing any missing values, handling outliers, scaling/normalizing the data, etc.
  3. Feature Selection: Identify the features that are most relevant to the problem statement. In the case of stock price prediction, features such as open price, close price, and volume may be considered.
  4. Training the Model: This involves selecting a machine learning algorithm, such as linear regression, and training the model on the prepared data. During this step, the model learns to make predictions based on the patterns found in the data.
  5. Model Evaluation: This step involves evaluating the performance of the model on a separate set of data (testing data) that was not used during training. Common evaluation metrics include mean squared error, mean absolute error, and R-squared.
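  6. Hyperparameter Tuning: The performance of a model can sometimes be improved by tuning the hyperparameters of the algorithm. Ordinary least squares, as implemented by scikit-learn's LinearRegression, has no learning rate, regularization strength, or iteration count to tune; those parameters apply to related models such as Ridge and Lasso (regularization strength) or the gradient-based SGDRegressor (learning rate and number of iterations).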
  7. Prediction: Once the model is trained and evaluated, it can be used to make predictions on new, unseen data.
  8. Model Deployment: The final step involves deploying the trained model into a production environment for use in real-world applications.
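The preprocessing, training, evaluation, and tuning steps above can be sketched in one place. This example swaps in Ridge regression, a regularized relative of LinearRegression, because plain least squares has nothing to tune; it is not the model used in this post. The data is synthetic, and the alpha grid is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features and target (illustrative only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo @ np.array([1.5, -0.5, 0.0, 2.0])
          + rng.normal(scale=0.1, size=300))

# Scaling lives inside the pipeline, so it is refit on each training fold
pipe = make_pipeline(StandardScaler(), Ridge())

# make_pipeline names the step "ridge", so its parameter is "ridge__alpha"
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    pipe,
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # each fold validates on later rows only
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)

print("Best alpha:", search.best_params_["ridge__alpha"])
print("Best cross-validated MSE:", -search.best_score_)
```

Keeping the scaler inside the pipeline means it only ever sees the training portion of each fold, and TimeSeriesSplit keeps the validation rows later than the training rows.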

After training the model, we can make predictions on the testing data using the predict() method of the LinearRegression class. The accuracy of the model is evaluated using different metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), R-squared, and Explained Variance.

In the above Python code, the metrics of the Linear Regression model are calculated as follows:

  1. Mean squared error (MSE): This metric measures the average of the squared differences between the predicted and actual values. In this case, the MSE is 5131.10. Because the errors are squared, this figure is in squared index points and is not a typical distance between a prediction and the actual value; the RMSE below expresses the same error in the original units.
  2. Mean absolute percentage error (MAPE): This metric measures the average percentage difference between the predicted and actual values. A value of 1.0399683400963986 indicates that, on average, the model’s predictions are about 1.04% away from the actual values.
  3. R-squared (R2): This metric measures the proportion of the variance in the dependent variable (stock price) that is explained by the independent variables (predictors) in the model. An R2 value of 0.9997643613546477 indicates that 99.98% of the variance in the stock price can be explained by the predictors in the model.
  4. Explained variance: This metric is similar to R2 in that it measures the proportion of variance in the dependent variable that is explained by the model. An explained variance value of 0.9997647348779873 indicates that the model explains 99.98% of the variance in the stock price.
  5. Root mean squared error (RMSE): This metric is the square root of the MSE, which brings the error back to the same units as the price. In this case, the RMSE is 71.63, so a typical prediction error is about 72 points, with larger errors weighted more heavily than in the MAE.
  6. Mean absolute error (MAE): This metric measures the average absolute difference between the predicted and actual values. In this case, the MAE is 42.91, which indicates that the model’s predictions are, on average, about 43 points away from the actual values.

In addition, the code outputs the predicted close price for the next day, which is 17400.06. This figure is a forecast rather than an accuracy metric: it is obtained by passing the last row of features through the fitted scaler and model.
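To make the units of these metrics concrete, here is a small worked example that computes MSE, RMSE, MAE, and MAPE by hand with NumPy. The four actual and predicted values are made up for illustration and have nothing to do with the NIFTY results above.

```python
import numpy as np

# Made-up actual and predicted prices, for illustration only
actual = np.array([100.0, 200.0, 300.0, 400.0])
predicted = np.array([110.0, 190.0, 300.0, 360.0])

errors = actual - predicted                     # [-10, 10, 0, 40]
mse = np.mean(errors ** 2)                      # squared price units
rmse = np.sqrt(mse)                             # back in price units
mae = np.mean(np.abs(errors))                   # price units
mape = np.mean(np.abs(errors / actual)) * 100   # percent

print("MSE:", mse)
print("RMSE:", rmse)
print("MAE:", mae)
print("MAPE:", mape)
```

Here the MSE is 450 while the MAE is 15, which shows why the MSE should not be read as a distance in price units. The RMSE (about 21.2) is larger than the MAE because the single error of 40 is weighted more heavily once it is squared.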

One disadvantage of the linear regression model is that it assumes a linear relationship between the input variables and the target variable. In reality, that relationship may be nonlinear, which can lead to poor performance. Additionally, linear regression is sensitive to outliers in the data, which can skew the learned coefficients and lead to poor predictions. Finally, linear regression works best when the input variables are not strongly correlated with one another; price-derived features such as open, high, low, and EMA tend to move together, which can make the individual coefficients unstable and hard to interpret.
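The sensitivity to outliers can be seen in a tiny example. The sketch below fits a least-squares line with np.polyfit (degree 1) to ten points that lie exactly on a line, then refits after corrupting a single observation. All values are made up for illustration.

```python
import numpy as np

x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0        # points exactly on a line: slope 2, intercept 1

y_outlier = y_clean.copy()
y_outlier[-1] += 50.0          # corrupt one observation at the end of the series

# np.polyfit with degree 1 returns the least-squares slope and intercept
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_out, intercept_out = np.polyfit(x, y_outlier, 1)

print("Clean fit:   slope =", slope_clean, "intercept =", intercept_clean)
print("Outlier fit: slope =", slope_out, "intercept =", intercept_out)
```

One bad point out of ten is enough to more than double the fitted slope, because least squares penalizes large errors quadratically and bends the line toward the outlier.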

In conclusion, the Linear Regression model is a simple yet useful machine-learning algorithm for predicting stock prices. The above Python code shows how to implement the Linear Regression model for predicting stock prices and how to evaluate its performance using various metrics. The accuracy of the model can be improved by using more relevant features and, for regularized variants such as Ridge or Lasso, by tuning the regularization strength.
