[IBM]Databases and SQL for Data Science with Python - Creating Tables, loading data and querying data

2021. 5. 2. 16:08Data science/Database

반응형

How to create tables in DB2 from Python application

ibm_db.exec_immediate() 

The parameters for the functions are: connection, statement, options 

stmt = ibm_db.exec_immediate (conn, 
"CREATE TABLE Trucks(
serial_no varchar(20) PRIMARY KEY NOT NULL,
model VARCHAR(20) NOT NULL, 
manufacturer VARCHAR(20) NOT NULL, 
Engine_size VARCHAR(20) NOT NULL, 
Truck_Class VARCHAR(20) NOT NULL) " 
)

How to load the data into the table 

stmt = ibm_db.exec_immediate(conn, 
"INSERT INTO Trucks(serial_no, 
model, manufacturer, Engine_size, Truck_Class)
VALUES('A1234', 'Lonestar', 'International Trucks',
'Cummins ISX15', 'Class 8');")

Using pandas : popular python library that contains high-level data structures and manipulation tools designed to make data analysis fast and easy in python

: data frame called DF, this represents a tabular spreadsheet like data structure containing an ordered collection of columns 

 

https://gist.github.com/37afcf56260316ac02198357be2eed25

 

Created on Skills Network Labs

Created on Skills Network Labs. GitHub Gist: instantly share code, notes, and snippets.

gist.github.com

 

 

Accessing Databases with SQL Magic

1. %load_ext sql: to load the ipython-sql extension

2. i.e. from the uri field in the Service Credentials copy everything after db2:// 

-> # Note the ibm_db_sa:// prefix instead of db2://

For convenience, we can use %%sql (two %'s instead of one) at the top of a cell to indicate we want the entire cell to be treated as SQL. 

Using Python Variables in your SQL Statements :by adding a ":" prefix to your python variable names.

country = "Canada"
%sql select * from INTERNATIONAL_STUDENT_TEST_SCORES where country = :country

 

** I was struggling for a long time to solve the error I had. 

AttributeError: 'DB2ExecutionContext_ibm_db' object has no attribute 'compiled_parameters'

The solution was

Install the following libraries in a separate cell before all the code.

Next restart kernel and run all cells.

 

You can easily convert a SQL query result to a pandas dataframe using the DataFrame() method

dataframe = test_score_distribution.DataFrame()

%matplotlib inline
# uncomment the following line if you get an module error saying seaborn not found
# !pip install seaborn
import seaborn

plot = seaborn.barplot(x='Test Score',y='Frequency', data=dataframe)

https://gist.github.com/05fe3e1ae5582e581d626c896a668f1f

 

Created on Skills Network Labs

Created on Skills Network Labs. GitHub Gist: instantly share code, notes, and snippets.

gist.github.com

 

반응형