Monday, 8 September 2014

A simple data classification - part 1/4 (Using Python in a Hadoop filesystem)

In the previous two series, we learned two simple ways to visualize the ages of individual NBA players. Of course that is a meaningful task,
but people may be more interested in questions such as: which age group has the most players, and which has the fewest?

To answer this, we have to count the players at each age one by one. If the total number of players is small, that makes sense, but if there are thousands of them, counting by hand is a really tedious job.
Classification usually helps us understand complicated things more easily.

There are many classification methods.
This time I am going to write code that calculates the frequency of each age, so we can figure out which age has the most players.

Before moving on to the Python code, we have to understand the dictionary structure. The dictionary is one of the most common data structures in Python programming. It works like a real dictionary: it is made up of (key, value) pairs.

For example, suppose we have the following ages.

22, 23, 23, 23, 40, 22, 22, 22

You can see that the most common age is 22.
The data can be described as follows.
  AGE   COUNTS
   22    :    4
   23    :    3
   40    :    1



As you can see above, the age is the key and the count is the value.
A dictionary provides a structure that can hold such a set of numbers.
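
A table like the one above does not have to be counted by hand. The following is only a minimal sketch of one way to build it, assuming the ages are held in a plain Python list; the names `ages` and `age_counts` are made up for illustration and are not taken from the program in the later parts of this series.

```python
# Hypothetical example data: the same eight ages listed in the text.
ages = [22, 23, 23, 23, 40, 22, 22, 22]

# Start with an empty dictionary; keys will be ages, values will be counts.
age_counts = {}

for age in ages:
    # get() returns the current count, or 0 if this age has not been seen yet.
    age_counts[age] = age_counts.get(age, 0) + 1

# Print the table sorted by age so the output order is predictable.
for age in sorted(age_counts):
    print("%d : %d" % (age, age_counts[age]))
```

Each age in the list is visited once, so the same loop works whether there are eight players or thousands.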

Take a look at the code below.
A dictionary is declared using '{}'.
It stores (key, value) pairs and also provides a variety of functions to manipulate the data.


>>> Age = [22,23,23,23,40,22,22,22]
>>> Age
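[22, 23, 23, 23, 40, 22, 22, 22]

>>> # '{}' creates an empty dictionary
>>> AgeDictionary = {}
>>> # key is the age, value is how many times it appears in Age
>>> AgeDictionary = {22: 4, 23: 3, 40: 1}
>>> AgeDictionary
{40: 1, 22: 4, 23: 3}
>>> # keys() and values() return lists in Python 2; the order is arbitrary
>>> AgeDictionary.keys()
[40, 22, 23]
>>> AgeDictionary.values()
[1, 4, 3]
>>> # look up the count stored under the key 22
>>> AgeDictionary[22]
4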
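
Once the counts are in a dictionary, the original question (which age has the most players?) can be answered with a few built-in calls. This is a hedged sketch rather than the program from the next part; the dictionary holds the counts for the list `Age`, and the names `age_counts`, `most_common_age`, `players_aged_30` and `ranking` are assumptions made for this example.

```python
# Hypothetical dictionary of age -> number of players for the list Age.
age_counts = {22: 4, 23: 3, 40: 1}

# max() with key=age_counts.get compares the counts and
# returns the age (the key) whose count is the largest.
most_common_age = max(age_counts, key=age_counts.get)

# get() with a default avoids a KeyError for an age that is not a key.
players_aged_30 = age_counts.get(30, 0)

# items() yields (key, value) pairs; sort them by count, largest first.
ranking = sorted(age_counts.items(), key=lambda pair: pair[1], reverse=True)

print(most_common_age)
print(players_aged_30)
print(ranking)
```

Sorting the items by count gives the full ranking of age groups, so the least common age is simply the last entry.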

Next time I am going to write a Python program that uses the players' age data with a dictionary structure.
