DK Kim's data analysis (using Python,R): Data Handling (2/4)

Last post, I introduced about how to handle unstructured data using python programming. This time, I am going to reuse the result file which was generated by python programming.

Last time, I've uploaded test file data into R platform by using read,csv command. However, in this case, I am going to use MYSQL database as a data source for analysis.
Most company stores invaluable data into the specific database. So I think, data manipulation using database should be regarded as an common thing for data mining.

Now, let me start it one by one.
First , create a specific table on the MYSQL database.
In my case, I've installed MySQL server version (5.5.3) in my linux system and create the specific table schema for storing datafile.

=============================================
[hadoop15:32:21@NBA]$mysql -u dkkim -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 39
Server version: 5.5.32-0ubuntu0.12.04.1 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

mysql> use dkkim
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> CREATE TABLE NBA_LOAD
(name varchar(200) ,
team varchar(10),
position varchar(10),
age numeric);
Query OK, 0 rows affected (0.03 sec)

mysql> show columns from NBA_LOAD ;
+----------+---------------+------+-----+---------+-------+
| Field    | Type          | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+-------+
| name     | varchar(200) | YES |     | NULL    |       |
| team     | varchar(10)   | YES |     | NULL    |       |
| position | varchar(10)   | YES |     | NULL    |       |
| age      | decimal(10,0) | YES |     | NULL    |       |
+----------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

=================================================

Second, load datafile into the table.

=================================================

mysql> LOAD DATA LOCAL INFILE 'NEWFILE.txt' INTO TABLE NBA_LOAD ;
Query OK, 409 rows affected (0.01 sec)
Records: 409 Deleted: 0 Skipped: 0 Warnings: 0

* In my case, connect MySQL in the same path of the datafile, so just use relative path 'NEWFILE.txt'
==================================================
Third, once your text file data is stored in the database. You can connect the any table via R command with a minor preparation.

There are several ways to connect database. In this case I am going to use "RMySQL" library. If you haven't installed this package, then just install it by install.packages("RMySQL").
Once you've installed this pacakges then load this library first, and
connect the database as follows.

===============================================

> library("RMySQL")
Loading required package: DBI
> mydb = dbConnect (MySQL() , user='dkkim',password='dkkim' ,dbname ='dkkim')
> rs = dbSendQuery(mydb, "select * from NBA_LOAD ")

>nba_bydb = fetch(rs, n=-1)\
                     name team position age
1              Quincy Acy Sac       SF 23
2            Steven Adams Okc        C 20
3             Jeff Adrien Cha       PF 27
4           Arron Afflalo Orl       SG 28
5           Alexis Ajinca Nor        C 25
6       LaMarcus Aldridge Por       PF 28
7             Lavoy Allen Phi       PF 24
8               Ray Allen Mia       SG 38
9              Tony Allen Mem       SG 32
10         Louis Amundson Nor       PF 31
11         Chris Andersen Mia        C 35
12          Alan Anderson Bro       SF 31
13         James Anderson Phi       SG 24
14          Ryan Anderson Nor       PF 25
15 Giannis Antetokounmpo Mil       SF 19
16        Carmelo Anthony Nyk       PF 29
17             Pero Antic Atl       PF 31
18           Trevor Ariza Was       SF 28
19         Darrell Arthur Den       SF 25
20              Omer Asik Hou        C 27
21           Gustavo Ayon Atl        C 28
22             Jeff Ayres San       PF 26
23        Leandro Barbosa Pho        G 31
24             Jose Barea Min       PG 29
25        Andrea Bargnani Nyk        C 28
26        Harrison Barnes Gol       SF 21
27            Matt Barnes Lac       SF

Now, it's done.

DK Kim's data analysis (using Python,R)

Tuesday, 4 February 2014

Data Handling (2/4) - Connect Mysql data via R command

No comments: