[IBM]Python for Data Science, AI & Development - Loading Data with Pandas & One Dimensional Numpy

2021. 5. 7. 17:22 | Data science/Python

import pandas as pd

csv_path ='file1.csv'

df=pd.read_csv(csv_path)

*DataFrames (df): made up of rows and columns

: df.head() returns the first 5 rows

: can be created from a dictionary (keys become column names, values become each column's values)

: single or multiple columns can be extracted with double brackets -> y = df[['Length']] or y = df[['Length', 'Genre']] -> a new DataFrame is created
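A minimal sketch of the points above, assuming a small hypothetical dictionary of album data (not the course dataset): the dictionary keys become column names, double brackets return a new DataFrame, and single brackets return a Series.

```python
import pandas as pd

# Hypothetical album data, for illustration only (not the course dataset)
songs = {
    "Artist": ["Artist A", "Artist B", "Artist C"],
    "Length": ["00:42:19", "00:43:08", "00:40:01"],
    "Genre": ["pop", "rock", "pop"],
}

# Dictionary keys become column names; each list becomes that column's values
df = pd.DataFrame(songs)
print(df.head())  # first (up to) 5 rows

# Double brackets -> a new DataFrame; single brackets -> a Series
y = df[["Length", "Genre"]]
s = df["Length"]
print(type(y).__name__, y.shape)  # DataFrame (3, 2)
print(type(s).__name__, s.shape)  # Series (3,)
```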

 

List unique values: the unique() method returns the distinct values of a column: df['Released'].unique()

Save as CSV: df1.to_csv('new_songs.csv')
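A short sketch of unique() and to_csv(), assuming hypothetical data and a hypothetical filter that builds df1 (the notes do not show how df1 was created). The file is written to a temporary directory so the example does not litter the working folder.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "Album": ["Album 1", "Album 2", "Album 3", "Album 4"],
    "Released": [1982, 1980, 1982, 1977],
})

# unique() returns the distinct values in order of first appearance
released = df["Released"].unique()
print(released)  # [1982 1980 1977]

# Hypothetical filter: keep rows released after 1979, then save them as CSV
df1 = df[df["Released"] > 1979]
path = os.path.join(tempfile.mkdtemp(), "new_songs.csv")
df1.to_csv(path)

# to_csv writes the index as the first column by default,
# so index_col=0 restores it when reading the file back
round_trip = pd.read_csv(path, index_col=0)
print(round_trip)
```

Pass index=False to to_csv if the row index should not be written to the file.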

 

Exercise using Watson Studio 

dataplatform.cloud.ibm.com/analytics/notebooks/v2/d4a44160-79fa-40fb-b756-c2f356da71da/view?access_token=6aea532da56514c08cd1953d3358408360a401c5118824f86baac2e7a16b732a

 

Final Assignment - IBM Cloud Pak for Data


# Dependency needed to read .xlsx files (recent pandas versions use openpyxl; xlrd 2.0+ only reads legacy .xls files)
!pip install openpyxl

# Import required library
import pandas as pd

# Read data from CSV file
csv_path = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/TopSellingAlbums.csv'
df = pd.read_csv(csv_path)

df.head()  # examine the first five rows of the DataFrame

# Read data from Excel File and print the first five rows
xlsx_path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Datasets/TopSellingAlbums.xlsx'

df = pd.read_excel(xlsx_path)
df.head()

# Access to the column Length
x = df[['Length']]
x
# Get the column as a series
x = df['Length']
x
# Double brackets return a DataFrame (check the type)
x = type(df[['Artist']])
x
# Access to multiple columns
y = df[['Artist','Length','Genre']]
y

# Access the value on the first row and the first column : iloc[ ]
df.iloc[0, 0]
# Access the value in the row labeled 1 and the column named 'Artist' : loc[ ]
df.loc[1, 'Artist']

# Slicing by position: rows 0 to 1, columns 0 to 2 (the end of an iloc slice is excluded)
df.iloc[0:2, 0:3]
# Slicing by name: rows labeled 0 to 2, columns 'Artist' to 'Released' (the end of a loc slice is included)
df.loc[0:2, 'Artist':'Released']

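loc : selects by index label and column name (slice ends are included)

iloc : selects by integer row and column position (slice ends are excluded, as with Python lists)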
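A small sketch of the loc/iloc difference, assuming a hypothetical four-row DataFrame shaped like the album table. Note that the two slices from the notes select a different number of rows because loc includes the end label and iloc excludes the end position.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "Artist": ["Artist A", "Artist B", "Artist C", "Artist D"],
    "Album": ["Album 1", "Album 2", "Album 3", "Album 4"],
    "Released": [1982, 1980, 1973, 1992],
    "Length": ["00:42:19", "00:43:08", "00:42:49", "00:57:44"],
})

# iloc uses integer positions; the slice end is excluded
by_position = df.iloc[0:2, 0:3]              # rows 0-1, columns 0-2
# loc uses labels; the slice end is included
by_label = df.loc[0:2, "Artist":"Released"]  # rows 0-2, Artist..Released

print(by_position.shape)  # (2, 3)
print(by_label.shape)     # (3, 3)

# Single values
print(df.iloc[0, 0])        # Artist A
print(df.loc[1, "Artist"])  # Artist B
```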

 

gist.github.com/IreneJeong/c8d63bc33bcdeba6658ff52879fbb5a3

 

Final Assignment.ipynb


One Dimensional Numpy

NumPy is a library for scientific computing; its arrays are faster and use less memory than Python lists.

 

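A NumPy array is similar to a list, but it has a fixed size and all elements have the same type.

Elements are accessed by index, just like a list.



np.array(list) : creates an array from a list

type(a) : numpy.ndarray

a.size : number of elements

a.ndim : number of array dimensions, or the rank of the array

a.shape : tuple with the size of the array in each dimension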
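A minimal sketch of the attributes listed above, using an illustrative list of values. The last step shows the "same type" rule: assigning a float into an integer array converts it to the array's integer type.

```python
import numpy as np

# Create a one-dimensional array from a list (values are illustrative)
a = np.array([0, 1, 2, 3, 4])

print(type(a))   # <class 'numpy.ndarray'>
print(a.dtype)   # one element type shared by every element (e.g. int64)
print(a.size)    # 5    -> number of elements
print(a.ndim)    # 1    -> number of dimensions (rank)
print(a.shape)   # (5,) -> size along each dimension

# Index and slice like a list
print(a[0])      # 0
print(a[1:4])    # [1 2 3]

# The array keeps its type: the float 9.7 is truncated to the integer 9
a[0] = 9.7
print(a)         # [9 1 2 3 4]
```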

 

np.dot(a, b) behaves differently depending on the dimensions of its inputs:

: Zero-dimensional (scalars): multiplication of the two scalars a and b
: One-dimensional arrays (vectors): inner product of the vectors
: Two-dimensional arrays (matrices): matrix multiplication
: a is an N-dimensional array, b is a 1-D array: sum product over the last axis of a and b
: a is an N-dimensional array, b is an M-dimensional array (M >= 2): sum product over the last axis of a and the second-to-last axis of b
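One sketch per case above, with small illustrative arrays (the values are chosen only so the results are easy to check by hand).

```python
import numpy as np

# Scalars: plain multiplication
print(np.dot(3, 4))  # 12

# 1-D . 1-D: inner product 1*4 + 2*5 + 3*6
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))  # 32

# 2-D . 2-D: matrix multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(np.dot(A, B))  # [[19 22]
                     #  [43 50]]

# N-D . 1-D: sum product over the last axis of A and of w
w = np.array([1, 1])
print(np.dot(A, w))  # [3 7]

# N-D . M-D (M >= 2): last axis of T with the second-to-last axis of M
T = np.arange(12).reshape(2, 2, 3)  # shape (2, 2, 3)
M = np.ones((3, 4))                 # shape (3, 4)
print(np.dot(T, M).shape)           # (2, 2, 4)
```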

 

 

 

 

Two Dimensional Numpy

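: NumPy can also create arrays with more than two dimensions

*For the matrix product of A and B, the number of columns in A must equal the number of rows in B!

: use np.dot(A, B) to multiply matrices; A * B multiplies element by element and needs matching (broadcastable) shapes, while np.dot only needs the rule above, so it works when A and B have different shapes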
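A small sketch of the shape rule, assuming illustrative matrices: A is 2×3 and B is 3×2, so A has as many columns as B has rows and np.dot(A, B) is defined, while the element-wise product is shown on A with itself because that pair has matching shapes.

```python
import numpy as np

# A has 2 rows and 3 columns; B has 3 rows and 2 columns (illustrative values)
A = np.array([[1, 0, 1],
              [0, 1, 1]])
B = np.array([[1, 2],
              [3, 4],
              [5, 6]])

print(A.ndim, A.shape)  # 2 (2, 3)
print(A[1, 2])          # 1 -> row 1, column 2

# Columns of A (3) == rows of B (3), so the product is defined; result is 2x2
C = np.dot(A, B)
print(C)

# '*' multiplies element-wise and needs matching shapes (here: A with itself)
print(A * A)

# Arrays with more than two dimensions are created the same way
X = np.zeros((2, 3, 4))
print(X.ndim, X.shape)  # 3 (2, 3, 4)
```

Trying A * B here would raise a ValueError, because shapes (2, 3) and (3, 2) cannot be broadcast together.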

 

https://gist.github.com/05b79d12e1f4f14abab7006a3394e73d
