Last post, I introduced about how to handle unstructured data using python programming. This time, I am going to reuse the result file which was generated by python programming.
Last time, I've uploaded test file data into R platform by using read,csv command. However, in this case, I am going to use MYSQL database as a data source for analysis.
Most company stores invaluable data into the specific database. So I think, data manipulation using database should be regarded as an common thing for data mining.
Now, let me start it one by one.
First , create a specific table on the MYSQL database.
In my case, I've installed MySQL server version (5.5.3) in my linux system and create the specific table schema for storing datafile.
=============================================
[hadoop15:32:21@NBA]$mysql -u dkkim -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 39
Server version: 5.5.32-0ubuntu0.12.04.1 (Ubuntu)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
mysql> use dkkim
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> CREATE TABLE NBA_LOAD
(name varchar(200) ,
team varchar(10),
position varchar(10),
age numeric);
Query OK, 0 rows affected (0.03 sec)
mysql> show columns from NBA_LOAD ;
+----------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------------+------+-----+---------+-------+
| name | varchar(200) | YES | | NULL | |
| team | varchar(10) | YES | | NULL | |
| position | varchar(10) | YES | | NULL | |
| age | decimal(10,0) | YES | | NULL | |
+----------+---------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
=================================================
Second, load datafile into the table.
=================================================
mysql> LOAD DATA LOCAL INFILE 'NEWFILE.txt' INTO TABLE NBA_LOAD ;
Query OK, 409 rows affected (0.01 sec)
Records: 409 Deleted: 0 Skipped: 0 Warnings: 0
* In my case, connect MySQL in the same path of the datafile, so just use relative path 'NEWFILE.txt'
==================================================
Third, once your text file data is stored in the database. You can connect the any table via R command with a minor preparation.
There are several ways to connect database. In this case I am going to use "RMySQL" library. If you haven't installed this package, then just install it by install.packages("RMySQL").
Once you've installed this pacakges then load this library first, and
connect the database as follows.
===============================================
> library("RMySQL")
Loading required package: DBI
> mydb = dbConnect (MySQL() , user='dkkim',password='dkkim' ,dbname ='dkkim')
> rs = dbSendQuery(mydb, "select * from NBA_LOAD ")
>nba_bydb = fetch(rs, n=-1)\
name team position age
1 Quincy Acy Sac SF 23
2 Steven Adams Okc C 20
3 Jeff Adrien Cha PF 27
4 Arron Afflalo Orl SG 28
5 Alexis Ajinca Nor C 25
6 LaMarcus Aldridge Por PF 28
7 Lavoy Allen Phi PF 24
8 Ray Allen Mia SG 38
9 Tony Allen Mem SG 32
10 Louis Amundson Nor PF 31
11 Chris Andersen Mia C 35
12 Alan Anderson Bro SF 31
13 James Anderson Phi SG 24
14 Ryan Anderson Nor PF 25
15 Giannis Antetokounmpo Mil SF 19
16 Carmelo Anthony Nyk PF 29
17 Pero Antic Atl PF 31
18 Trevor Ariza Was SF 28
19 Darrell Arthur Den SF 25
20 Omer Asik Hou C 27
21 Gustavo Ayon Atl C 28
22 Jeff Ayres San PF 26
23 Leandro Barbosa Pho G 31
24 Jose Barea Min PG 29
25 Andrea Bargnani Nyk C 28
26 Harrison Barnes Gol SF 21
27 Matt Barnes Lac SF
Now, it's done.
No comments:
Post a Comment