DK Kim's data analysis (using Python,R): Data Handling (4/4) Web Data handling using Python library

Today I am going to introduce another way to get web data, this time using a Python library.

I think it is much easier than the previous one.
Python provides a useful library called "urllib". It lets you fetch a page's HTML source directly from your program, without downloading or copying the page by hand.

To use this library, import it at the top of your Python program.
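The script in this post is written for Python 2. As a rough sketch of the same fetch on Python 3, where urlopen lives in urllib.request: fetch_html is a hypothetical helper name, and the UTF-8 default is an assumption about the page's encoding.

```python
# Python 3 sketch; fetch_html is a hypothetical helper, not part of the post
from urllib.request import urlopen


def fetch_html(url, encoding="utf-8"):
    """Return the body served at url as text."""
    # urlopen returns a file-like response; read() yields bytes on Python 3
    with urlopen(url) as response:
        # The encoding is an assumption; undecodable bytes are replaced
        return response.read().decode(encoding, errors="replace")
```

Calling fetch_html with the stats page URL used later in this post should return the page source as one string, provided the site is still reachable.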

I am going to do the same thing we did in the previous posts.
All you have to do is call this library's urlopen method to get the HTML source. The rest of the Python code is quite similar to the previous one.

==================
import urllib  # Python 2; on Python 3 urlopen lives in urllib.request
import re

def main(filename):
    # Fetch the page source directly, with no manual download step
    htmlfile = urllib.urlopen("http://www.nbastuffer.com/2013-2014_NBA_Regular_Season_Player_Stats.html")
    htmltext = htmlfile.read()
    htmlfile.close()

    # Capture name, team, position and age from each table row
    pattern = re.compile(r'<td></td><td>([\w]+[\s][\w]+)</td><td>(\w\w\w)</td><td>(\w+)</td><td>(\d+)</td><td>')
    nba_contents = re.findall(pattern, htmltext)

    # Write one tab-separated line per player; "with" closes the file
    with open(filename, "w+") as fileopen:
        for filerows in nba_contents:
            (name, team, position, age) = filerows
            data = '%s\t%s\t%s\t%s' % (name, team, position, age)
            fileopen.write(data + '\n')

if __name__ == '__main__':
    filename = raw_input('Enter Filename : ')
    main(filename)

=======================
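To see what the regular expression captures without hitting the site, here is a small sketch that runs the same pattern over a single table row. SAMPLE_ROW is a made-up row built from the output shown below, since the real page markup is not reproduced in this post, and extract_players is a hypothetical helper name.

```python
import re

# Same pattern as in the script, written as a raw string
PATTERN = re.compile(
    r'<td></td><td>([\w]+[\s][\w]+)</td><td>(\w\w\w)</td>'
    r'<td>(\w+)</td><td>(\d+)</td><td>'
)

# Hypothetical row: the real page markup is not shown in the post
SAMPLE_ROW = (
    '<tr><td></td><td>Quincy Acy</td><td>Tor</td>'
    '<td>SF</td><td>23</td><td>7</td></tr>'
)


def extract_players(html):
    """Return (name, team, position, age) tuples found in html."""
    return PATTERN.findall(html)


rows = extract_players(SAMPLE_ROW)
print(rows)  # [('Quincy Acy', 'Tor', 'SF', '23')]
```

Note that the name group only accepts two words made of letters, digits or underscores, so a hyphenated name such as "Al-Farouq Aminu" would be skipped if the page contains one.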

Now save this source in an appropriate Linux directory and execute it.

[hadoop15:52:36@NBA]$python WEB_NBA.py
Enter Filename : NBAFILE_Through_web.txt
[hadoop15:53:49@NBA]$cat NBAFILE_Through_web.txt |more
Quincy Acy    Tor    SF    23
Quincy Acy    Sac    SF    23
Steven Adams    Okc    C    20
Jeff Adrien    Cha    PF    27
Jeff Adrien    Mil    PF    27
Arron Afflalo    Orl    SG    28
......

You can see the file named 'NBAFILE_Through_web.txt' in the directory.
That's it.
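If you would rather check the result from Python than with cat, the sketch below reads lines in the tab-separated format the script writes. parse_players is a hypothetical helper name, and the two sample lines are copied from the output above.

```python
import csv


def parse_players(lines):
    """Parse tab-separated name/team/position/age lines into dicts."""
    players = []
    for name, team, position, age in csv.reader(lines, delimiter="\t"):
        players.append(
            {"name": name, "team": team, "position": position, "age": int(age)}
        )
    return players


# Two lines in the format the script writes
sample = ["Quincy Acy\tTor\tSF\t23\n", "Steven Adams\tOkc\tC\t20\n"]
players = parse_players(sample)
print(players[1]["name"], players[1]["age"])  # Steven Adams 20
```

For the real file, pass an open file object instead of the sample list, for example one opened with open('NBAFILE_Through_web.txt', newline='').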

Wednesday, 27 August 2014
