[IBM]Databases and SQL for Data Science with Python

[IBM]Databases and SQL for Data Science with Python - Analyzing data with Python

2021. 5. 2. 19:32ㆍData science/Database

Load CSV file into DB2 on cloud : Source-Target-Define-Finalize

data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2?utm_email=Email&utm_source=Nurture&utm_content=000026UJ&utm_term=10006555&utm_campaign=PLACEHOLDER&utm_id=SkillsNetwork-Courses-IBMDeveloperSkillsNetwork-DB0201EN-SkillsNetwork-20127838

Census Data - Selected socioeconomic indicators in Chicago, 2008 – 2012 | City of Chicago | Data Portal

data.cityofchicago.org

Community Area Number (ca): Used to uniquely identify each row of the dataset
Community Area Name (community_area_name): The name of the region in the city of Chicago
Percent of Housing Crowded (percent_of_housing_crowded): Percent of occupied housing units with more than one person per room
Percent Households Below Poverty (percent_households_below_poverty): Percent of households living below the federal poverty line
Percent Aged 16+ Unemployed (percent_aged_16_unemployed): Percent of persons over the age of 16 years that are unemployed
Percent Aged 25+ without High School Diploma (percent_aged_25_without_high_school_diploma): Percent of persons over the age of 25 years without a high school education
Percent Aged Under 18 or Over 64:Percent of population under 18 or over 64 years of age (percent_aged_under_18_or_over_64): (ie. dependents)
Per Capita Income (per_capita_income_): Community Area per capita income is estimated as the sum of tract-level aggragate incomes divided by the total population
Hardship Index (hardship_index): Score that incorporates each of the six selected socioeconomic indicators

We will first read the dataset source .CSV from the internet into pandas dataframe.

Then we need to create a table in our Db2 database to store the dataset. The PERSIST command in SQL "magic" simplifies the process of table creation and writing the data from a pandas dataframe into the table

import pandas
chicago_socioeconomic_data = pandas.read_csv('https://data.cityofchicago.org/resource/jcxq-k9xf.csv')
%sql PERSIST chicago_socioeconomic_data

Create a scatter plot using the variables per_capita_income_ and hardship_index. Explain the correlation between the two variables.

import matplotlib.pyplot as plt 
%matplotlib inline
import seaborn as sns

income_vs_hardship = %sql select per_capita_income_, hardship_index from chicago_socioeconomic_data; 

plot = sns.jointplot(x= 'per_capita_income_', y='hardship_index', data=income_vs_hardship.DataFrame())

https://gist.github.com/3eb707f353307c7d173af087134f8f64

Created on Skills Network Labs

Created on Skills Network Labs. GitHub Gist: instantly share code, notes, and snippets.

gist.github.com

You can access a database from a language like Python by using the appropriate API. Examples include ibm_db API for IBM DB2, psycopg2 for ProstgreSQL, and dblib API for SQL Server.
DB-API is Python's standard API for accessing relational databases. It allows you to write a single program that works with multiple kinds of relational databases instead of writing a separate program for each one.
The DB_API connect constructor creates a connection to the database and returns a Connection Object, which is then used by the various connection methods.
The connection methods are:
- The cursor() method, which returns a new cursor object using the connection.
- The commit() method, which is used to commit any pending transaction to the database.
- The rollback() method, which causes the database to roll back to the start of any pending transaction.
- The close() method, which is used to close a database connection.
You can use SQL Magic commands to execute queries more easily from Jupyter Notebooks. Magic commands have the general format %sql select * from tablename.
Cell magics start with a double %% (percent) sign and apply to the entire cell. Line magics start with a single % (percent) sign and apply to a particular line in a cell.

An ibm_db.exec_immediate function that includes connection information and a SQL statement to create the table.

connect('database name', 'username', 'password')

질문 1

A database cursor is a control structure that enables traversal over the records in a database

The ibm_db API provides a variety of useful Python functions for accessing and manipulating data in an IBM data server like Db2

A Dataframe represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type. Indicate whether the following statement is True or False:

A pandas dataframe in Python can be used for storing the result set of a SQL query.

저작자표시 비영리 변경금지 (새창열림)

'Data science > Database' 카테고리의 다른 글

[IBM]Databases and SQL for Data Science with Python - Views, Stored Procedures (0)	2021.05.09
[IBM]Databases and SQL for Data Science with Python - Working with real-world datasets, Getting table and columns Details (DB2) (0)	2021.05.07
[IBM]Databases and SQL for Data Science with Python - Creating Tables, loading data and querying data (0)	2021.05.02
[IBM]Databases and SQL for Data Science with Python - How to access DB using Python (0)	2021.05.02
[IBM]Databases and SQL for Data Science with Python- Week 2 & Week 3 (0)	2021.05.02

Piccole Gioie🪴

Piccole Gioie🪴

태그

최근글

댓글

공지사항

아카이브

'Data science > Database' 카테고리의 다른 글

관련글

티스토리툴바