Tuesday, 4 February 2014

Data Handling (2/4) - Connect Mysql data via R command

Last post, I introduced about how to handle unstructured data using python programming. This time, I am going to reuse the result file which was generated by python programming.

Last time, I've uploaded  test file data into R platform by using read,csv command. However, in this case, I am going to use MYSQL database as a data source for analysis.
Most company stores invaluable data into the specific database. So I think, data manipulation using database should be regarded as an common thing for data mining.


Now, let me start it one by one.
First , create a specific table on the MYSQL database.
In my case, I've installed MySQL server version (5.5.3) in my linux system and create the specific table schema  for storing datafile.



=============================================
[hadoop15:32:21@NBA]$mysql -u dkkim -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 39
Server version: 5.5.32-0ubuntu0.12.04.1 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

mysql> use dkkim
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> CREATE TABLE NBA_LOAD
(name varchar(200) ,
team varchar(10),
position varchar(10),
 age numeric);
Query OK, 0 rows affected (0.03 sec)


mysql> show columns from NBA_LOAD ;
+----------+---------------+------+-----+---------+-------+
| Field    | Type          | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+-------+
| name     | varchar(200)  | YES  |     | NULL    |       |
| team     | varchar(10)   | YES  |     | NULL    |       |
| position | varchar(10)   | YES  |     | NULL    |       |
| age      | decimal(10,0) | YES  |     | NULL    |       |
+----------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

=================================================


 Second, load datafile into the table.

=================================================

mysql> LOAD DATA LOCAL INFILE 'NEWFILE.txt' INTO TABLE  NBA_LOAD ;
Query OK, 409 rows affected (0.01 sec)
Records: 409  Deleted: 0  Skipped: 0  Warnings: 0

* In my case, connect MySQL in the same path of the datafile, so just use relative path 'NEWFILE.txt'
==================================================
Third, once your text file data is stored in the database. You can connect the any table via R command with a minor preparation.

There are several ways to connect database. In this case I am going to use "RMySQL" library. If you haven't installed this package, then just install it by install.packages("RMySQL").
Once you've installed this pacakges then load this library first, and
connect the database as follows.


===============================================

> library("RMySQL")
Loading required package: DBI
> mydb = dbConnect (MySQL() , user='dkkim',password='dkkim' ,dbname ='dkkim')
> rs = dbSendQuery(mydb, "select  * from NBA_LOAD ")

>nba_bydb  = fetch(rs, n=-1)\
                     name team position age
1              Quincy Acy  Sac       SF  23
2            Steven Adams  Okc        C  20
3             Jeff Adrien  Cha       PF  27
4           Arron Afflalo  Orl       SG  28
5           Alexis Ajinca  Nor        C  25
6       LaMarcus Aldridge  Por       PF  28
7             Lavoy Allen  Phi       PF  24
8               Ray Allen  Mia       SG  38
9              Tony Allen  Mem       SG  32
10         Louis Amundson  Nor       PF  31
11         Chris Andersen  Mia        C  35
12          Alan Anderson  Bro       SF  31
13         James Anderson  Phi       SG  24
14          Ryan Anderson  Nor       PF  25
15  Giannis Antetokounmpo  Mil       SF  19
16        Carmelo Anthony  Nyk       PF  29
17             Pero Antic  Atl       PF  31
18           Trevor Ariza  Was       SF  28
19         Darrell Arthur  Den       SF  25
20              Omer Asik  Hou        C  27
21           Gustavo Ayon  Atl        C  28
22             Jeff Ayres  San       PF  26
23        Leandro Barbosa  Pho        G  31
24             Jose Barea  Min       PG  29
25        Andrea Bargnani  Nyk        C  28
26        Harrison Barnes  Gol       SF  21
27            Matt Barnes  Lac       SF


Now, it's done.

No comments: