I just want to parse an HTML page, specifically one from the nseindia website containing a table, and import it into a MySQL table in an automated way, so I can build a realtime/EOD charting web application. After nearly a year of searching, the search finally ends today @ Prasanna’s Blog.
Beautiful Soup is a Python HTML/XML parser: it picks data out of a table, which we can then keep updating every minute/hour/day with a little programming. Web parsing is no longer difficult with Beautiful Soup. I am very new to the Python language and just need to explore more to create a wonderful, decent web charting application.
Here is a sample Python script:
<pre>from bs4 import BeautifulSoup  # on older installs: from BeautifulSoup import BeautifulSoup

f = open("input_file.html", "r")
g = open("outfile_file.csv", "w")
soup = BeautifulSoup(f)
t = soup.findAll('table')
for table in t:
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('td')
        for td in cols:
            g.write(td.find(text=True))
            g.write(",")
        g.write("\n")  # one CSV line per table row
f.close()
g.close()</pre>
This script parses a simple HTML table without looking for any special tags or anything. That’s the beauty of the Soup.
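If you don’t have Beautiful Soup installed, the same simple table walk can be sketched with only Python’s standard library `html.parser` module. This is just an illustrative stand-in, not part of the original script, and the sample HTML below is made-up data:

```python
# Minimal sketch: extract <td> cell text from a table using only the
# standard library, as an alternative to Beautiful Soup for simple pages.
from html.parser import HTMLParser

class TableToRows(HTMLParser):
    """Collect the text of every <td> into one list per <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])   # start a new row
        elif tag == "td":
            self.in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_td = False

    def handle_data(self, data):
        if self.in_td and data.strip():
            self.rows[-1].append(data.strip())

parser = TableToRows()
parser.feed("<table><tr><td>INFY</td><td>1450.50</td></tr>"
            "<tr><td>TCS</td><td>1120.00</td></tr></table>")
print(parser.rows)  # [['INFY', '1450.50'], ['TCS', '1120.00']]
```

Each inner list can then be joined with commas and written out, exactly like the CSV lines in the script above.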
Ok how do we use it ???
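Once the CSV exists, getting it into MySQL is just a loop of parameterized INSERTs. Here is a rough sketch of that step; it uses the standard library `sqlite3` (in memory) as a stand-in so it runs anywhere, and the `quotes` table and its columns are made-up names. With MySQLdb the cursor calls have the same shape, except the placeholder is `%s` instead of `?`:

```python
import csv
import io
import sqlite3

# Sketch: load the generated CSV into a SQL table. sqlite3 stands in
# for MySQL here; table/column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE quotes (symbol TEXT, price REAL)")

# In the real script this would be open("outfile_file.csv"); a literal
# string keeps the example self-contained.
csv_data = io.StringIO("INFY,1450.50\nTCS,1120.00\n")
for symbol, price in csv.reader(csv_data):
    cur.execute("INSERT INTO quotes VALUES (?, ?)", (symbol, float(price)))
conn.commit()

cur.execute("SELECT COUNT(*) FROM quotes")
print(cur.fetchone()[0])  # 2
```

Run this from cron (or a scheduled task) against the freshly downloaded page and the table stays current for the charting front end.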
Ever heard of WebQL? Try it, it might help; you could then take the output from WebQL, do a bit of cleanup, and put it into MySQL. I guess iMacro is also something similar.
@Daemonkane
Thanks for the input, sujith. It looks like WebQL can also aggregate data from the web, PDF and Word documents, spreadsheets, email repositories, and corporate data stores.