PandasAI is a notable new tool in the fast-moving field of data analysis, and a substantial step forward for students, novice programmers, fledgling data analysts, fund managers, and OpenAI/LLM enthusiasts in particular.
The core innovation of PandasAI lies in its ability to make data conversational. Traditional methods require familiarity with specific coding syntax and data analysis concepts; PandasAI removes much of that burden. Users ask questions in natural language, and the system translates those queries into operations on the data.
PandasAI stands out with two key value propositions: ease of use and power. It’s designed for those who might not have deep knowledge of generative AI or pandas, making it an ideal learning tool. However, its capabilities are robust enough to cater to more complex tasks such as data exploration, visualization, cleaning, imputation, and feature engineering.
Installing PandasAI Library
pip install pandasai
pip install "pandasai[connectors]"
The magic of PandasAI is in its backend, where a generative AI model generates Python code based on the user’s natural language queries. This process involves understanding the query, creating the appropriate code, and executing it to produce results. This seamless process hides the complexities of coding and data manipulation, presenting users with an intuitive and efficient way to interact with data.
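To make the three steps concrete, here is a toy sketch of a generate-then-execute loop. It is not PandasAI's actual implementation: `fake_generate_code` is a hypothetical stub standing in for the LLM call, and `toy_chat` is an illustrative name. A real system also needs to validate or sandbox the generated code before running it.

```python
import pandas as pd


def fake_generate_code(question: str) -> str:
    """Hypothetical stand-in for the LLM: map a question to a line of pandas code."""
    q = question.lower()
    if "how many rows" in q:
        return "result = len(df)"
    if "highest" in q:
        return "result = df['high'].max()"
    raise ValueError(f"stub cannot answer: {question!r}")


def toy_chat(df: pd.DataFrame, question: str):
    # Steps 1 and 2: interpret the query and produce code (stubbed here).
    code = fake_generate_code(question)
    # Step 3: execute the generated code with the dataframe in scope.
    namespace = {"df": df}
    exec(code, {}, namespace)
    return namespace["result"]


sample = pd.DataFrame({"high": [279.02, 284.04, 289.69]})
rows = toy_chat(sample, "How many rows are there?")
highest = toy_chat(sample, "Get the highest value from the high column")
print(rows, highest)
```

Because the final step runs generated code, the answer is only as reliable as that code, which is why it is worth cross-checking important results by hand.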
Load the Pandas DataFrame
from pandasai import SmartDataframe
import pandas as pd
# URL of the CSV file
csv_url = 'https://raw.githubusercontent.com/marketcalls/data/main/NIFTY_daily_data.csv'
# Load the CSV file from the URL
data = pd.read_csv(csv_url)
# Convert the date column to datetime for easier calculations
data['date'] = pd.to_datetime(data['date'])
data
Output
            date  symbol         open         high          low        close       volume
0     1990-07-03   NIFTY    279.01999    279.01999    279.01999    279.01999          0.0
1     1990-07-05   NIFTY    284.04001    284.04001    284.04001    284.04001          0.0
2     1990-07-06   NIFTY    289.04001    289.04001    289.04001    289.04001          0.0
3     1990-07-09   NIFTY    289.69000    289.69000    289.69000    289.69000          0.0
4     1990-07-10   NIFTY    288.69000    288.69000    288.69000    288.69000          0.0
...          ...     ...          ...          ...          ...          ...          ...
8100  2024-01-09   NIFTY  21653.60000  21724.44900  21517.85000  21544.85000  228573407.0
8101  2024-01-10   NIFTY  21529.30100  21641.85000  21448.65000  21618.69900  216991926.0
8102  2024-01-11   NIFTY  21688.00000  21726.50000  21593.75000  21647.19900  212453866.0
8103  2024-01-12   NIFTY  21773.55100  21928.25000  21715.15000  21894.55100  294678459.0
8104  2024-01-15   NIFTY  22053.15000  22081.95000  22021.10000  22059.90000   53802412.0
8105 rows × 7 columns
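Before handing the frame to an LLM, it can help to confirm a few basics about it yourself. The sketch below uses a hypothetical `sanity_report` helper on a three-row stand-in built from the first rows shown above; in the article's session you would pass `data` instead.

```python
import pandas as pd

# Three-row stand-in for the NIFTY CSV (values copied from the output above).
sample = pd.DataFrame({
    "date": ["1990-07-03", "1990-07-05", "1990-07-06"],
    "symbol": ["NIFTY"] * 3,
    "open": [279.01999, 284.04001, 289.04001],
    "high": [279.01999, 284.04001, 289.04001],
    "low": [279.01999, 284.04001, 289.04001],
    "close": [279.01999, 284.04001, 289.04001],
    "volume": [0.0, 0.0, 0.0],
})
sample["date"] = pd.to_datetime(sample["date"])


def sanity_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect basic facts worth knowing before prompting."""
    return {
        "rows": len(df),
        "date_is_datetime": pd.api.types.is_datetime64_any_dtype(df["date"]),
        "dates_sorted": bool(df["date"].is_monotonic_increasing),
        "missing_values": int(df.isna().sum().sum()),
        "zero_volume_rows": int((df["volume"] == 0).sum()),
    }


report = sanity_report(sample)
print(report)
```

On the full dataset a non-zero `zero_volume_rows` count is expected, since the earliest rows shown above carry a volume of 0.0 and identical open, high, low, and close values.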
Set the OpenAI API Key
PandasAI leverages a Large Language Model (LLM) for its functionality, so you’ll need to select and import the LLM that best suits your needs. For this example, we’ll be utilizing OpenAI’s capabilities.
To integrate OpenAI with PandasAI, an API token is necessary. Follow the straightforward steps outlined below to create your API_TOKEN with OpenAI.
- Go to https://platform.openai.com/apps and sign up with your email address or connect your Google account.
- Go to View API Keys on the left side of your Personal Account Settings.
- Select Create New Secret Key.
API access to OpenAI is a paid service, so you have to set up billing. Read the pricing information before experimenting.
To keep the OpenAI API key out of the source code, store it in a .env file under the environment variable OPENAI_API_KEY and load it at runtime.
# Set the OpenAI API key
import os
from dotenv import load_dotenv
from pandasai.llm import OpenAI
load_dotenv() # loads the configs from .env
openai_api_key = os.getenv("OPENAI_API_KEY")
llm = OpenAI(api_token=openai_api_key)
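`os.getenv` returns `None` when the variable is missing, so it can help to check the value explicitly and fail with a clear message. The guard below is a minimal sketch; `require_env` is a hypothetical helper, and `DEMO_API_KEY` is a throwaway variable used only so the example runs without a real key.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Demonstration with a placeholder variable instead of a real OpenAI key.
os.environ["DEMO_API_KEY"] = "sk-demo"
demo_key = require_env("DEMO_API_KEY")
print("key loaded:", bool(demo_key))
```

In the article's session this would become `openai_api_key = require_env("OPENAI_API_KEY")`, called after `load_dotenv()`.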
Querying the pandas dataframe using Prompts to perform Data Analysis
Prompt 1
#Set the SmartDataframe
sdf = SmartDataframe(data,config={'llm':llm})
#Prompt 1
sdf.chat("How many rows are there in data ?")
Output Response
8105
Prompt 2
#Prompt 2
sdf.chat("Get me the Highest value of Nifty from the high column")
Output Response
22081.95
Prompt 3
#Prompt 3
sdf.chat("Get the Last 20 days max high and min low value")
Output Response
'The last 20 days max high value is 22081.95 and min low value is 20976.801.'
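Since the answers come from generated code, it is worth cross-checking them with plain pandas. The sketch below shows one possible equivalent for each of the three prompts on a small illustrative frame (values taken from the last rows of the output shown earlier). It assumes that "last 20 days" means the last 20 rows, which is an interpretation the LLM may or may not share, and it uses 3 rows instead of 20 to keep the toy example small.

```python
import pandas as pd

# Small illustrative frame; in the article's session you would use `data`.
df = pd.DataFrame({
    "high": [21724.449, 21641.85, 21726.5, 21928.25, 22081.95],
    "low": [21517.85, 21448.65, 21593.75, 21715.15, 22021.1],
})

# Prompt 1: number of rows.
row_count = len(df)

# Prompt 2: highest value in the high column.
highest_high = df["high"].max()

# Prompt 3: max high and min low over the most recent rows.
last_n = df.tail(3)  # the prompt asks for 20; 3 is used for this toy frame
max_high = last_n["high"].max()
min_low = last_n["low"].min()

print(row_count, highest_high, max_high, min_low)
```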
Visualizing the Pandas Dataframe using Prompts
#data visualization prompt 1
sdf.chat("Plot the Line Chart of Close in red color")
Output Response
#data visualization prompt 2
sdf.chat("Plot the close chart with volume as subplot for the last 200 days")
Output Response
#data visualization prompt 3
sdf.chat("For the last 100 bars Plot the line chart of close in green color with ema10 and ema20 value")
Output Response
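The third chart prompt depends on two derived series, ema10 and ema20, which the generated code has to compute before plotting. The sketch below shows one common way to compute them with pandas. `add_emas` is a hypothetical helper, and the choice of `adjust=False` (the recursive form of the exponential moving average) is an assumption; the code PandasAI generates may use a different definition.

```python
import pandas as pd


def add_emas(df: pd.DataFrame, spans=(10, 20)) -> pd.DataFrame:
    """Hypothetical helper: return a copy with ema<span> columns built from close."""
    out = df.copy()
    for span in spans:
        # adjust=False gives ema_t = a * close_t + (1 - a) * ema_(t-1),
        # with a = 2 / (span + 1) and the first close as the starting value.
        out[f"ema{span}"] = out["close"].ewm(span=span, adjust=False).mean()
    return out


# Tiny demonstration with span=3 (a = 0.5) so the values are easy to follow.
prices = pd.DataFrame({"close": [1.0, 2.0, 3.0]})
demo = add_emas(prices, spans=(3,))
print(demo)
```

For the article's data, computing the averages on the full history and then taking the last 100 rows, as in `add_emas(data).tail(100)`, avoids the warm-up distortion that appears when the EMA is started from the first of those 100 bars.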
PandasAI Connectors
PandasAI provides several connectors for pulling data from external sources such as Yahoo Finance. These connectors are designed to be easy to use, even if you are not familiar with the data source or with PandasAI.
from pandasai.connectors.yahoo_finance import YahooFinanceConnector
yahoo_connector = YahooFinanceConnector("WIPRO.NS")
df = SmartDataframe(yahoo_connector, config={"llm": llm})
response = df.chat("What is the closing price for yesterday? Provide the output adjusted to 2 decimals")
print(response)
Output Response
The closing price for yesterday is 465.45.
yahoo_connector = YahooFinanceConnector("TATASTEEL.NS")
df_connector = SmartDataframe(yahoo_connector, config={"llm": llm})
response = df_connector.chat("Plot the line chart of Tata Steel over time for the last 200 days")
Output Response
PandasAI’s user-friendly nature makes it an excellent choice for a wide range of users. It’s particularly beneficial for those new to pandas or those seeking to streamline their data analysis workflow. Whether you are a student grappling with data analysis concepts, a beginner in programming looking to delve into data science, or an entry-level data analyst aiming to enhance efficiency, PandasAI is tailored for you.
The introduction of PandasAI signifies a paradigm shift in financial data analysis. Its ability to simplify complex data interactions, coupled with its powerful analytical capabilities, makes it a tool that not only enhances efficiency but also makes data analysis more accessible. As we move forward, tools like PandasAI are set to play a pivotal role in shaping the future of data analysis, making it more inclusive, efficient, and versatile.