Rajandran R Creator of OpenAlgo - OpenSource Algo Trading framework for Indian Traders. Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Building Algo Platforms, Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

Beautiful Soup: Understanding Web Scraping Capabilities to Extract Data from Screener – Python Tutorial

Note:
Blog updated in May 2024 with modern examples. This tutorial is for educational purposes only, to illustrate the capabilities of Beautiful Soup.

Web scraping is an invaluable technique for extracting data from websites, and Beautiful Soup is one of the most popular libraries in Python for this purpose. Whether you’re looking to gather financial data, product prices, or any other type of information, Beautiful Soup offers a robust and flexible solution. In this blog, we’ll explore the capabilities of Beautiful Soup and demonstrate how to extract financial data from screener.in, a comprehensive stock analysis platform.

What is Beautiful Soup?

Beautiful Soup is a Python library used for parsing HTML and XML documents. It creates a parse tree from page source code, enabling data extraction from HTML tags. Beautiful Soup works with a parser to provide Pythonic idioms for iterating, searching, and modifying the parse tree.
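To make the idea of a parse tree concrete, here is a minimal sketch that parses a small, made-up HTML snippet (it is not taken from screener.in or any real site) and searches it with find() and find_all(). The tag and class names in the snippet are assumptions chosen purely for illustration.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet used only for illustration.
html = """
<html>
  <body>
    <h1 class="title">Sample Company Ltd</h1>
    <ul>
      <li class="ratio"><span class="name">Market Cap</span> <span class="number">1,234</span></li>
      <li class="ratio"><span class="name">Current Price</span> <span class="number">567</span></li>
    </ul>
  </body>
</html>
"""

# Build the parse tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text(strip=True) trims whitespace.
title = soup.find("h1", class_="title").get_text(strip=True)

# find_all() returns every matching tag, so we can loop over the list items.
ratios = {}
for li in soup.find_all("li", class_="ratio"):
    name = li.find("span", class_="name").get_text(strip=True)
    value = li.find("span", class_="number").get_text(strip=True)
    ratios[name] = value

print(title)
print(ratios)
```

The same two calls, find() and find_all(), are the core of most scraping scripts; only the tag names and classes change from site to site.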

Key Features of Beautiful Soup

  1. Ease of Use: Beautiful Soup abstracts many of the complexities involved in parsing HTML, making it simple to use even for beginners.
  2. Integration with Parsers: It supports several parsers, including Python’s built-in HTML parser, lxml, and html5lib, giving you flexibility in how you process your documents.
  3. Navigating the Parse Tree: Beautiful Soup provides powerful methods for navigating the parse tree and finding elements using tags, attributes, text, and more.
  4. Modifying the Parse Tree: You can modify the document by adding, removing, or altering tags and content.
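The navigation and modification features listed above can be sketched on a tiny, made-up document. The HTML, the id "profile" and the inserted text are all assumptions for illustration. The example uses the built-in "html.parser"; lxml and html5lib are separate packages that must be installed before they can be passed as the parser name.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration.
html = (
    '<div id="profile">'
    '<p class="about">Old text</p>'
    '<script>track()</script>'
    '<a href="/company/X/">X</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Navigating: select_one() takes a CSS selector and returns the first match.
link = soup.select_one("#profile a")
href = link["href"]  # tag attributes are read like dictionary keys

# Modifying: remove every <script> tag from the tree.
for tag in soup.find_all("script"):
    tag.decompose()

# Modifying: replace the text inside an existing tag.
about = soup.find("p", class_="about")
about.string = "New text"

# Modifying: create a new tag and append it to the <div>.
new_tag = soup.new_tag("span")
new_tag.string = "added"
soup.find("div", id="profile").append(new_tag)

result = str(soup)
print(result)
```

Removing script or style tags before extracting text is a common clean-up step, since their contents would otherwise end up mixed in with the visible text.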

Extracting Data from Screener.in

To demonstrate Beautiful Soup’s capabilities, we’ll extract key financial insights from a company’s page on screener.in. We’ll use the Python requests library to fetch the webpage and Beautiful Soup to parse and extract the desired data.

Here’s the step-by-step process:

Install Necessary Libraries: Ensure you have Beautiful Soup and requests installed.

pip install beautifulsoup4 requests

Fetch the Webpage: Use requests to download the webpage content.

Parse the Webpage: Use Beautiful Soup to parse the HTML content.

Extract Data: Navigate the parse tree to find and extract relevant data.

Python Code

Below is a complete example to fetch and extract data from screener.in:

import requests
from bs4 import BeautifulSoup

stock_ticker = "RELIANCE"
url = f"https://www.screener.in/company/{stock_ticker}/consolidated/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response = requests.get(url, headers=headers, timeout=10)

if response.status_code == 200:
    print("Successfully fetched the webpage")
else:
    # Stop here: parsing an error page would make the lookups below fail
    raise SystemExit(f"Failed to fetch the webpage. Status code: {response.status_code}")

soup = BeautifulSoup(response.content, 'html.parser')

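def extract_key_insights(soup):
    # The class names below mirror the page markup used in this tutorial.
    # If the markup changes, find() returns None and .text raises AttributeError.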
    company_name = soup.find('h1', class_='h2 shrink-text').text.strip()
    current_price = soup.find('div', class_='font-size-18 strong line-height-14').find('span').text.strip()
    market_cap = soup.find('li', {'data-source': 'default'}).find('span', class_='number').text.strip()
    about_section = soup.find('div', class_='company-profile').find('div', class_='sub show-more-box about').text.strip()

    return {
        "Company Name": company_name,
        "Current Price": current_price,
        "Market Cap": market_cap,
        "About": about_section
    }

# Extract data
key_insights = extract_key_insights(soup)

# Print extracted data
print("Company Name:", key_insights['Company Name'])
print("Current Price:", key_insights['Current Price'])
print("Market Cap:", key_insights['Market Cap'])
print("About:", key_insights['About'])

Output

    Successfully fetched the webpage
    Company Name: Reliance Industries Ltd
    Current Price: ₹ 2,870
    Market Cap: 19,41,516
    About: Reliance was founded by Dhirubhai Ambani and is now promoted and managed by his elder son, Mukesh Dhirubhai Ambani. Ambani's family has about 50% shareholding in the conglomerate.
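The extracted price and market cap are strings with a currency symbol and comma separators, so they cannot be used in calculations directly. Below is a minimal sketch of a converter; to_number is a hypothetical helper that is not part of the original script. The output above does not state the unit of the market cap figure, so the function only converts the digits and leaves the interpretation of units to the caller.

```python
def to_number(text):
    """Convert a scraped figure such as '₹ 2,870' or '19,41,516' to a float.

    Hypothetical helper for illustration. Returns None when the text
    contains nothing that can be read as a number.
    """
    # Keep only ASCII digits, the decimal point and a minus sign;
    # this drops the currency symbol, spaces and comma separators.
    cleaned = "".join(ch for ch in text if ch in "0123456789.-")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(to_number("₹ 2,870"))    # 2870.0
print(to_number("19,41,516"))  # 1941516.0
print(to_number("N/A"))        # None
```

Because the commas are simply discarded, the helper works with both the Indian grouping (19,41,516) and the Western grouping (1,941,516).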

Explanation of the Code

  1. Import Libraries: We import the necessary libraries: requests for making HTTP requests and BeautifulSoup from bs4 for parsing HTML.
  2. Define Variables: Set the stock ticker and construct the URL for the company’s page on screener.in.
  3. Set Headers: Define the headers to mimic a browser request, which helps in avoiding potential blocks by the website.
  4. Make the Request: Fetch the webpage content using requests.get(). Check if the request was successful.
  5. Parse the Content: Create a Beautiful Soup object to parse the HTML content.
  6. Extract Insights: Define a function extract_key_insights to navigate the parse tree and extract the company name, current price, market cap, and about section.
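One practical caveat about the extraction step: find() returns None when an element is missing, so chaining .text onto the result raises an AttributeError whenever the site changes its markup. The sketch below shows one possible way to guard against that. safe_text is a hypothetical helper, and the sample HTML is made up so the example runs without a network request. It uses CSS selectors, which match classes in any order, whereas passing a multi-word string to class_ matches only that exact attribute value.

```python
from bs4 import BeautifulSoup


def safe_text(root, selector):
    """Return the stripped text of the first CSS match, or None if absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    if root is None:
        return None
    node = root.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Made-up HTML that deliberately omits the market cap element.
sample_html = """
<h1 class="h2 shrink-text">Sample Company Ltd</h1>
<div class="company-profile"><div class="about">Makes sample products.</div></div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

insights = {
    "Company Name": safe_text(sample_soup, "h1.h2.shrink-text"),
    "About": safe_text(sample_soup, "div.company-profile div.about"),
    # No matching <li> exists in the sample, so this becomes None
    # instead of raising an exception.
    "Market Cap": safe_text(sample_soup, "li[data-source='default'] span.number"),
}
print(insights)
```

With this pattern a missing field shows up as None in the result, which is easier to log and handle than a script that stops at the first changed class name.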

Beautiful Soup simplifies the process of web scraping by providing an intuitive way to navigate and manipulate HTML documents. By following the steps outlined above, you can efficiently extract valuable financial data from screener.in or any other website. As you become more familiar with Beautiful Soup, you’ll discover even more ways to harness its power for your data extraction needs. Happy scraping!


3 Replies to “Beautiful Soup: Understanding Web Scraping Capabilities to Extract Data…”

  1. Ever heard of WebQL? Try it; it might help. You could then take the output from WebQL, do a bit of cleanup, and put it into MySQL. I guess iMacro is also something similar.

    1. @Daemonkane

      Thanks for the input, Sujith. It looks like WebQL can also aggregate data from the web, PDF and Word documents, spreadsheets, email repositories, and corporate data stores.
