Today I am going to introduce another way to get web data, this time using a Python library.
I think it's much easier than the previous one.
Python provides a useful library called "urllib". It lets you fetch a page's HTML source directly from your program, without downloading or copying the page by hand.
To use this library, import it at the top of your Python program.
I am going to do the same thing we did in the previous posts.
All you have to do is use this library's urlopen method to get the HTML source. The rest of the Python code is quite similar to the previous one.
==================
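# Python 2 script: in Python 3, urlopen lives in urllib.request and raw_input is named input.
import urllib
import re

def main(filename):
    # Download the page and read its HTML source into a string.
    htmlfile = urllib.urlopen("http://www.nbastuffer.com/2013-2014_NBA_Regular_Season_Player_Stats.html")
    htmltext = htmlfile.read()
    htmlfile.close()
    # Capture name, team, position and age from consecutive <td> cells.
    pattern = re.compile(r'<td></td><td>([\w]+[\s][\w]+)</td><td>(\w\w\w)</td><td>(\w+)</td><td>(\d+)</td><td>')
    nba_contents = re.findall(pattern, htmltext)
    # Write one tab-separated line per player.
    fileopen = open(filename, "w")
    for filerows in nba_contents:
        (name, team, position, age) = filerows
        data = '%s\t%s\t%s\t%s' % (name, team, position, age)
        fileopen.write(data + '\n')
    fileopen.close()

if __name__ == '__main__':
    filename = raw_input('Enter Filename : ')
    main(filename)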
=======================
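If you are on Python 3, the script above will not run as it is, because urlopen moved to urllib.request and read() returns bytes instead of a string. Here is a rough sketch of how the same idea might look there. The function names (fetch_html, extract_players, save_players) are my own, the UTF-8 decoding is an assumption about the page, and the sample row is a made-up string shaped the way the pattern expects, not a copy of the real page.

```python
import re
from urllib.request import urlopen

# Same pattern as in the script above, written as a raw string.
ROW_PATTERN = re.compile(
    r'<td></td><td>([\w]+[\s][\w]+)</td><td>(\w\w\w)</td>'
    r'<td>(\w+)</td><td>(\d+)</td><td>'
)

def fetch_html(url):
    """Download a page and return its source as text.

    Assumption: the page can be decoded as UTF-8; undecodable bytes are replaced.
    """
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def extract_players(html):
    """Return a list of (name, team, position, age) tuples found in html."""
    return ROW_PATTERN.findall(html)

def save_players(rows, filename):
    """Write one tab-separated line per player."""
    with open(filename, "w") as out:
        for name, team, position, age in rows:
            out.write("%s\t%s\t%s\t%s\n" % (name, team, position, age))

# Hypothetical sample row, only here to try the pattern without going online.
sample = ("<tr><td></td><td>Steven Adams</td><td>Okc</td>"
          "<td>C</td><td>20</td><td></td></tr>")
print(extract_players(sample))
```

To work on the real page you would call save_players(extract_players(fetch_html(url)), filename) with the same URL as above. Splitting the download from the pattern matching also makes it easy to try the pattern on a small string first.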
Now save this source as WEB_NBA.py in a suitable Linux directory and run it.
[hadoop15:52:36@NBA]$python WEB_NBA.py
Enter Filename : NBAFILE_Through_web.txt
[hadoop15:53:49@NBA]$cat NBAFILE_Through_web.txt |more
Quincy Acy Tor SF 23
Quincy Acy Sac SF 23
Steven Adams Okc C 20
Jeff Adrien Cha PF 27
Jeff Adrien Mil PF 27
Arron Afflalo Orl SG 28
......
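If you would rather check the result from Python than with cat, one possible way is to read the tab-separated lines back with the csv module. This is only a sketch: it writes two of the rows shown above to a temporary file (the name NBAFILE_sample.txt is made up) so that it does not depend on the real output file.

```python
import csv
import os
import tempfile

# Two rows taken from the output above.
rows = [("Quincy Acy", "Tor", "SF", "23"), ("Steven Adams", "Okc", "C", "20")]

# Hypothetical file in a temporary directory, written the same way as the script does.
path = os.path.join(tempfile.mkdtemp(), "NBAFILE_sample.txt")
with open(path, "w") as out:
    for row in rows:
        out.write("\t".join(row) + "\n")

# Read the file back, splitting each line on tabs.
with open(path, newline="") as f:
    players = [tuple(record) for record in csv.reader(f, delimiter="\t")]

for name, team, position, age in players:
    print(name, team, position, age)
```

Point path at your own file (for example NBAFILE_Through_web.txt) and skip the writing step to inspect the real data.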
You can see the file named 'NBAFILE_Through_web.txt' in the directory.
That's it.