Learning Linear Regression - Simple Machine Learning Based Prediction Algorithm for Forecasting Stock Price

Machine learning has become an integral part of stock market analysis and prediction. Linear Regression is a widely used algorithm for predicting stock prices. In this blog, we will discuss the Linear Regression model for predicting stock prices using the Python programming language.

What is Linear Regression

Linear regression is a type of supervised learning algorithm that makes predictions based on a linear relationship between the input variables (also known as features) and the output variable (also known as the target variable).

In the case of stock price prediction, the linear regression model is trained on historical stock price data, which includes features such as the open, high, low, close, and volume for a given day, along with technical indicators such as RSI, EMA, HMA, ADX, and ATR. The target variable is the close_forecast column of the dataset, since the closing price is the one that investors are most interested in predicting.

The linear regression model uses the training data to learn the relationship between the input variables (features) and the target variable (prediction) by estimating the coefficients of a linear equation that best fits the data. Once the model has been trained, it can make predictions on new, unseen data by plugging the values of the input variables into that equation and computing the output with the learned coefficients.
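To make the idea of "plugging in the values" concrete, here is a minimal sketch on synthetic data. The data, the true coefficients (3, -2, and an intercept of 5), and the variable names are assumptions made up for illustration and are not taken from the NIFTY dataset. It shows that the prediction is the intercept plus the sum of each coefficient times its feature value.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (hypothetical): y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Estimate the coefficients and intercept of the best-fitting linear equation
model = LinearRegression().fit(X, y)

# A new, unseen observation
x_new = np.array([[1.0, 2.0]])

# Plug the values into the learned equation: intercept + sum(coef_i * x_i)
manual_pred = model.intercept_ + x_new @ model.coef_

# The same prediction through the scikit-learn API
sklearn_pred = model.predict(x_new)

print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)
print("Manual prediction:", manual_pred[0])
print("predict() result:", sklearn_pred[0])
```

Because the synthetic target has no noise, the fitted coefficients should match the ones used to generate the data, and both predictions should agree.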

Preparing the Feature Dataset

The dataset used for the implementation of the model is the NIFTY_EOD.csv file, which consists of open, high, low, volume, previous close, RSI, EMA, HMA, ADX, PDI, MDI, and ATR values for a particular stock.

Download the Linear Regression Features Dataset prepared using Amibroker

Download NIFTY EOD csv data set

The dataset is split into two parts: training data and testing data. In the code below, 80% of the rows are used to train the model and the remaining 20% are held out to evaluate its performance.
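The split itself can be sketched on a small stand-in table. The 10-row DataFrame and its values below are hypothetical; only the column name close_forecast and the train_test_split settings (test_size=0.2, random_state=42) mirror the code in this post.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row stand-in for the feature dataset
df = pd.DataFrame({
    "close": [100.0 + i for i in range(10)],
    "volume": [1000 + 10 * i for i in range(10)],
    "close_forecast": [101.0 + i for i in range(10)],
})

X = df.drop(columns=["close_forecast"])  # feature inputs
y = df["close_forecast"]                 # target

# 80% training rows, 20% testing rows; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))  # prints: 8 2
```

By default train_test_split shuffles the rows before splitting. Passing shuffle=False keeps the rows in file order, so the test set becomes the last 20% of the data, which may be a closer match to how a forecast is used in practice.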

The Linear Regression model is created by instantiating the LinearRegression class from the scikit-learn library. The model is then trained on the training data by calling its fit() method.

Python Source code for Linear Regression Based Machine Learning Prediction

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score
import numpy as np

# Load the data
stock = pd.read_csv('NIFTY_EOD.csv')

df = stock

# Prepare the data
y = df['close_forecast']  # target
X = df.drop(columns=['Ticker', 'Date/Time', 'close_forecast'])  # feature inputs

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Make predictions
y_pred = model.predict(X_test_scaled)

# Compute accuracy metrics
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: ", mse)

# Save predicted vs actual values to CSV
df_pred = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred})
df_pred.to_csv('NIFTY_EOD_pred.csv', index=False)



# Compute the full set of accuracy metrics
mape = np.mean(np.abs((y_test - y_pred) / y_test)) * 100
r2 = r2_score(y_test, y_pred)
ev = explained_variance_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(y_test - y_pred))

# Print accuracy metrics
print("Mean absolute percentage error (MAPE): ", mape)
print("R-squared: ", r2)
print("Explained variance: ", ev)
print("Mean squared error: ", mse)
print("Root mean squared error (RMSE): ", rmse)
print("Mean absolute error (MAE): ", mae)

# Make a prediction for the next day's close price

last_row = X.tail(1)
last_row_scaled = scaler.transform(last_row)
next_day_pred = model.predict(last_row_scaled)[0]
print("Predicted close price for the next day: ", next_day_pred)

Python Output

Mean squared error:  5131.098941109195
Mean absolute percentage error (MAPE):  1.0399683400963986
R-squared:  0.9997643613546477
Explained variance:  0.9997647348779873
Mean squared error:  5131.098941109195
Root mean squared error (RMSE):  71.63168950338387
Mean absolute error (MAE):  42.91492045843861
Predicted close price for the next day:  17400.060390626983

Here are the common steps involved in linear regression prediction:

Data Collection: Collect relevant data related to the problem statement. In the case of stock price prediction, this includes historical stock prices, volumes, market trends, and similar data.
Data Preprocessing: This step involves cleaning and preparing the data for analysis. It includes removing any missing values, handling outliers, scaling/normalizing the data, etc.
Feature Selection: Identify the features that are most relevant to the problem statement. In the case of stock price prediction, features such as open price, close price, and volume may be considered.
Training the Model: This involves selecting a machine learning algorithm, such as linear regression, and training the model on the prepared data. During this step, the model learns to make predictions based on the patterns found in the data.
Model Evaluation: This step involves evaluating the performance of the model on a separate set of data (testing data) that was not used during training. Common evaluation metrics include mean squared error, mean absolute error, and R-squared.
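Hyperparameter Tuning: The performance of the model can sometimes be improved by tuning the hyperparameters of the algorithm. Ordinary least squares, as implemented by scikit-learn's LinearRegression, has no learning rate or number of iterations to tune. Regularized variants such as Ridge and Lasso expose a regularization strength (alpha), and models fitted with iterative solvers add settings such as learning rate and number of iterations.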
Prediction: Once the model is trained and evaluated, it can be used to make predictions on new, unseen data.
Model Deployment: The final step involves deploying the trained model into a production environment for use in real-world applications.
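As an illustration of the preprocessing, training, evaluation, and tuning steps above, here is a minimal sketch that swaps in Ridge regression (a regularized relative of linear regression) so that there is a hyperparameter to tune. The synthetic data, the candidate alpha values, and the choice of 5-fold cross-validation are assumptions for illustration, not settings used elsewhere in this post.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): 200 rows, 5 features, linear target plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_coefs = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
y = X @ true_coefs + rng.normal(scale=0.1, size=200)

# Scaling and the model live in one pipeline, so the scaler is refit
# on the training part of every cross-validation fold
pipeline = make_pipeline(StandardScaler(), Ridge())

# Candidate regularization strengths (illustrative values)
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0]}

# Pick the alpha with the lowest cross-validated mean squared error
search = GridSearchCV(
    pipeline, param_grid, cv=5, scoring="neg_mean_squared_error"
)
search.fit(X, y)

best_alpha = search.best_params_["ridge__alpha"]
print("Best alpha:", best_alpha)
print("Cross-validated MSE:", -search.best_score_)
```

After fitting, search.best_estimator_ is a pipeline refit on all the data with the selected alpha, and it can be used for prediction like any other scikit-learn model.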

After training the model, we can make predictions on the testing data using the predict() method of the LinearRegression class. The accuracy of the model is evaluated using different metrics, such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), R-squared, and Explained Variance.

In the above Python code, the metrics of the Linear Regression model are calculated as follows:

Mean squared error (MSE): This metric measures the average of the squared differences between the predicted and actual values. In this case, the MSE is 5131.10. Because the errors are squared, this value is expressed in squared price units and cannot be read directly as a distance from the actual values.
Mean absolute percentage error (MAPE): This metric measures the average absolute percentage difference between the predicted and actual values. A value of 1.04 indicates that, on average, the predictions are about 1.04% away from the actual values.
R-squared (R2): This metric measures the proportion of the variance in the dependent variable (stock price) that is explained by the independent variables (predictors) in the model. An R2 value of 0.99976 indicates that about 99.98% of the variance in the stock price can be explained by the predictors in the model.
Explained variance: This metric is similar to R2, but it does not penalize a systematic offset (bias) in the predictions. An explained variance value of 0.99976 indicates that the model explains about 99.98% of the variance in the stock price, and its closeness to R2 indicates that the average prediction error is near zero.
Root mean squared error (RMSE): This metric is the square root of the MSE, which brings the error back to the same units as the price. In this case, the RMSE is 71.63, meaning a typical prediction error is on the order of 71.63 units, with large errors weighted more heavily than small ones.
Mean absolute error (MAE): This metric measures the average absolute difference between the predicted and actual values. In this case, the MAE is 42.91, which indicates that the predictions are, on average, 42.91 units away from the actual values.
Predicted close price for the next day: This is the prediction the model makes from the most recent row of the feature data, using the same fitted scaler and model that produced the above metrics. In this case, the predicted close price is 17400.06.
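To see how these metrics relate to one another, here is a small worked sketch that computes each of them by hand with NumPy. The four actual and predicted prices are made-up values chosen so that the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Hypothetical actual and predicted closing prices
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 310.0, 370.0])

errors = y_true - y_hat  # [-10, 10, -10, 30]

mse = np.mean(errors ** 2)                     # average squared error
rmse = np.sqrt(mse)                            # back in price units
mae = np.mean(np.abs(errors))                  # average absolute error
mape = np.mean(np.abs(errors / y_true)) * 100  # average absolute error in percent

# R-squared: 1 - (residual sum of squares / total sum of squares)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Explained variance: like R-squared, but uses the variance of the errors,
# so a constant offset in the predictions is not penalized
ev = 1 - np.var(errors) / np.var(y_true)

print("MSE:", mse)
print("RMSE:", rmse)
print("MAE:", mae)
print("MAPE:", mape)
print("R-squared:", r2)
print("Explained variance:", ev)
```

In this example the single large error of 30 pulls the RMSE above the MAE, and the explained variance comes out slightly higher than R-squared because the predictions are, on average, offset from the actual values.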

One disadvantage of the linear regression model is that it assumes a linear relationship between the input variables and the target variable. In reality, the relationship may be nonlinear, which can lead to poor performance of the linear regression model. Additionally, linear regression is sensitive to outliers in the data, which can skew the learned coefficients and lead to poor predictions. Finally, the coefficients become unstable and hard to interpret when the input variables are strongly correlated with each other (multicollinearity), which is likely for features such as open, high, low, and moving averages that are all derived from the same price series.
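The sensitivity to outliers can be seen in a minimal sketch. The ten points on the line y = 2x and the single corrupted value of 100 are hypothetical and chosen only to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical clean data lying exactly on the line y = 2x
x = np.arange(10, dtype=float).reshape(-1, 1)
y_clean = 2.0 * x.ravel()

# Same data with one corrupted observation at the last point
y_outlier = y_clean.copy()
y_outlier[-1] = 100.0  # the value on the line would be 18

# Fit one model to each version and compare the learned slopes
slope_clean = LinearRegression().fit(x, y_clean).coef_[0]
slope_outlier = LinearRegression().fit(x, y_outlier).coef_[0]

print("Slope without outlier:", slope_clean)
print("Slope with one outlier:", slope_outlier)
```

A single bad point out of ten is enough to more than triple the fitted slope here, because least squares penalizes large errors quadratically and so bends the line toward the outlier.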

In conclusion, the Linear Regression model is a simple and widely used machine learning algorithm for predicting stock prices. The above Python code shows how to implement the Linear Regression model for predicting stock prices and how to evaluate its performance using various metrics. The accuracy of the model may be improved by using more relevant features and, for regularized variants of the model, by tuning the hyperparameters.
