[IBM]Python for Data Science, AI & Development - Data Analysis

2021. 5. 11. 21:28 | Data science/Python


*Data Analysis: acquiring data in various ways and extracting the necessary insights from a dataset.

 

*Binary File Format: a file that is not human-readable as plain text, because it stores its data together with formatting information.

: To read such a file, it must first be opened with the appropriate software or decoder.

: Examples: images (JPEG, GIF), audio (MP3), and document formats such as Word or PDF.
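To make the idea concrete, here is a minimal sketch that opens a file in binary mode and looks at its first bytes. The file name `sample.gif` and its six-byte content are made up for illustration (it is not a complete image); real GIF files start with the same `GIF87a` or `GIF89a` signature.

```python
import os
import tempfile

# Hypothetical file: only a GIF signature is written so the example is self-contained.
path = os.path.join(tempfile.gettempdir(), "sample.gif")
with open(path, "wb") as f:
    f.write(b"GIF89a")

# Mode "rb" returns raw bytes instead of decoded text.
with open(path, "rb") as f:
    header = f.read(6)

print(header)  # a bytes object, not a str
is_gif = header in (b"GIF87a", b"GIF89a")
print("Looks like a GIF:", is_gif)
```

Software that "understands" a binary format does essentially this first: it checks the leading bytes, then decodes the rest according to the format's rules.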

 

*Reading the Image file 

: Python has the PIL library (Python Imaging Library), which gives the Python interpreter image-editing capabilities.

#importing PIL 
from PIL import Image

import urllib.request

#downloading the image and saving it locally as dog.jpg
urllib.request.urlretrieve("http://hips.hearstapps.com/...", "dog.jpg")
#result 
('dog.jpg', <http.client.HTTPMessage at 0x7fb8548e0518>)

#read image
img = Image.open('dog.jpg')

#output the image (display() is available in Jupyter/IPython notebooks)
display(img) 
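Once an image is open, its basic properties can be inspected without displaying it. The sketch below assumes the Pillow package is installed and builds a small in-memory JPEG as a stand-in for `dog.jpg`, so no download is needed; the size and colour values are arbitrary.

```python
from io import BytesIO

from PIL import Image

# Build a small in-memory JPEG (a stand-in for dog.jpg, values are arbitrary).
buffer = BytesIO()
Image.new("RGB", (64, 48), color=(200, 120, 40)).save(buffer, format="JPEG")
buffer.seek(0)

img = Image.open(buffer)
print(img.format, img.size, img.mode)  # file format, (width, height), colour mode

# resize() returns a new image; the original is left unchanged.
thumb = img.resize((32, 24))
print(thumb.size)
```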

Exercise 

import pandas as pd 

#read the dataset and save it into df 
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%205/data/diabetes.csv"
df = pd.read_csv(path)

#show the first 5 rows using the dataframe.head() method 
#dataframe.head(n) or dataframe.tail(n) can be used to show the first or last n rows 
print("The first 5 rows of the dataFrame")
df.head(5) 

#To view the dimensions of the dataframe, use the .shape attribute 

    df.shape 

    result -> (768, 9) : 768 rows and 9 columns 

#To print information about the DataFrame, including the index dtype and columns, non-null counts and memory usage 

    df.info()

#To view basic statistical details such as percentiles, mean, std, etc. of a DataFrame or a Series of numeric values 

    df.describe()
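The inspection calls above can be tried without downloading anything. This is a small sketch on a toy DataFrame whose column names mimic the diabetes dataset; the values are invented for illustration.

```python
import pandas as pd

# Toy data standing in for the diabetes dataset; the values are invented.
toy = pd.DataFrame({
    "Glucose": [120, 95, 160, 110],
    "BMI": [30.5, 24.0, 35.2, 27.8],
    "Outcome": [1, 0, 1, 0],
})

print(toy.head(2))        # first 2 rows
print(toy.tail(2))        # last 2 rows
print(toy.shape)          # (rows, columns)
toy.info()                # prints index, dtypes, non-null counts, memory usage
summary = toy.describe()  # count, mean, std, min, quartiles, max per numeric column
print(summary)
```

Note that `shape` is an attribute (no parentheses), while `head()`, `info()` and `describe()` are methods.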

#To identify and handle missing values: .isnull(), .notnull()

#Count missing values in each column (True = missing, False = present; value_counts() counts how many times each of True and False occurs) 

-> There are no missing values in this dataset (no True) 
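The counting step described above could look like the following sketch. The toy DataFrame and its single missing value are invented for illustration; on the diabetes dataset the same calls would report only `False`.

```python
import numpy as np
import pandas as pd

# Toy data with one deliberately missing Glucose value.
toy = pd.DataFrame({
    "Glucose": [120.0, np.nan, 160.0, 110.0],
    "BMI": [30.5, 24.0, 35.2, 27.8],
})

missing = toy.isnull()  # DataFrame of booleans, True where a value is missing

# value_counts() on each boolean column tallies True and False separately.
for column in missing.columns:
    print(column)
    print(missing[column].value_counts())

# Shorter alternative: True counts as 1, so sum() gives missing values per column.
missing_per_column = missing.sum()
print(missing_per_column)
```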

 

#To check the data type: .dtypes (an attribute, not a method; .dtype for a single column) - check data type / .astype() - change the data type 
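A short sketch of checking and converting types, again on an invented toy DataFrame. The target types (`float64`, `bool`) are chosen only to show the mechanics.

```python
import pandas as pd

toy = pd.DataFrame({"Glucose": [120, 95, 160], "Outcome": [1, 0, 1]})

print(toy.dtypes)  # dtypes is an attribute, so no parentheses

# astype() returns a converted copy; assign it back to keep the change.
toy["Glucose"] = toy["Glucose"].astype("float64")
toy["Outcome"] = toy["Outcome"].astype("bool")
print(toy.dtypes)
```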

 

 

*Visualization: Seaborn and Matplotlib are two of Python's most powerful visualization libraries. 

import matplotlib.pyplot as plt
import seaborn as sns

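#value_counts() sorts by frequency, so derive each label from its index
#(assumes Outcome 1 = diabetic, 0 = not diabetic)
counts = df['Outcome'].value_counts()
labels = ['Diabetic' if value == 1 else 'Not Diabetic' for value in counts.index]
plt.pie(counts, labels=labels, autopct='%0.02f%%')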
plt.legend()
plt.show()
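The percentages that `autopct` prints on the pie slices can also be computed directly, which is a quick way to sanity-check the chart. A minimal sketch with an invented `Outcome` column:

```python
import pandas as pd

outcome = pd.Series([0, 0, 1, 0, 1], name="Outcome")  # toy values

counts = outcome.value_counts()                      # most frequent value first
shares = outcome.value_counts(normalize=True) * 100  # same order, as percentages

for value, share in shares.items():
    print(f"Outcome={value}: {share:0.2f}%")
```

Because `value_counts()` orders the result by frequency rather than by value, any labels passed to `plt.pie` need to follow that same order.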
