[IBM]Python for Data Science, AI & Development

[IBM]Python for Data Science, AI & Development - Data Analysis

2021. 5. 11. 21:28ㆍData science/Python

*Data Analysis: acquiring data in various ways and extracting the necessary insights from a dataset

*Binary File Format: a file that is not human-readable as plain text; it stores the data together with formatting information

: To read such a file, it must first be opened with the appropriate software or processed by a suitable library.

: Examples: images (JPEG, GIF), audio (MP3), and document formats such as Word or PDF.
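: A minimal sketch of what "binary" means in practice, using only the standard library. The sample bytes and the temporary file are made up for illustration; only the 6-byte GIF signature (GIF87a / GIF89a) is a real convention. Opening a file in "rb" mode returns raw bytes, which a program must interpret itself.

```python
import os
import tempfile

# Hypothetical sample: the GIF signature followed by a few arbitrary bytes.
sample = b"GIF89a" + bytes([0, 1, 2, 255])

# Write the bytes to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".gif") as tmp:
    tmp.write(sample)
    path = tmp.name

# "rb" = read in binary mode: read() returns bytes, not str.
with open(path, "rb") as f:
    data = f.read()

os.remove(path)

# Many binary formats start with a fixed signature that identifies them.
is_gif = data[:6] in (b"GIF87a", b"GIF89a")
print(type(data).__name__, len(data), is_gif)
```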

*Reading the Image file

: Python has the PIL library (available today as the Pillow package), which provides the Python interpreter with image editing capabilities.

#importing PIL
from PIL import Image

import urllib.request

#downloading the image and saving it locally as dog.jpg
urllib.request.urlretrieve("http://hips.hearstapps.com/...", "dog.jpg")
#result: a (filename, headers) tuple
#('dog.jpg', <http.client.HTTPMessage at 0x7fb8548e0518>)

#read image
img = Image.open('dog.jpg')

#output image (display() is available in Jupyter/IPython notebooks)
display(img)

Exercise

import pandas as pd 

#read the dataset and save it into df
path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%205/data/diabetes.csv"
df = pd.read_csv(path)

#show the first 5 rows using the dataframe.head() method
#dataframe.head(n) or dataframe.tail(n) can be used to show the first or last n rows
print("The first 5 rows of the dataFrame")
df.head(5)

#To view the dimensions of the dataframe, the .shape attribute can be used

df.shape

result -> (768, 9) : 768 rows and 9 columns

#To print information about the DataFrame, including the index dtype and columns, non-null counts and memory usage

df.info()

#To view basic statistical details such as percentiles, mean, std etc. of a DataFrame or a Series of numeric values

df.describe()
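The inspection calls above can be tried without downloading anything. This is a small sketch on a made-up four-row DataFrame; the column names and values are hypothetical stand-ins, not rows from the course dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the course dataset.
toy = pd.DataFrame({
    "Glucose": [148, 85, 183, 89],
    "BMI": [33.6, 26.6, 23.3, 28.1],
    "Outcome": [1, 0, 1, 0],
})

print(toy.shape)      # (number of rows, number of columns)
print(toy.head(2))    # first 2 rows
print(toy.tail(2))    # last 2 rows

toy.info()            # index dtype, columns, non-null counts, memory usage

stats = toy.describe()  # count, mean, std, min, percentiles, max per numeric column
print(stats.loc[["mean", "min", "max"]])
```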

#To identify and handle missing values: .isnull(), .notnull()

#Count missing values in each column (True - missing, False - present; value_counts() counts how many times True and False each occur)

-> There are no missing values in this dataset (no True)
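The notes do not include the code for this step, so here is a minimal sketch of one way to do it. The toy DataFrame is hypothetical and deliberately contains gaps (the course dataset has none). Filling with the column mean is just one possible way to handle missing values, shown as an assumption rather than the course's method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing entries (NaN).
toy = pd.DataFrame({
    "Glucose": [148, np.nan, 183, 89],
    "BMI": [33.6, 26.6, np.nan, np.nan],
})

# isnull() returns a boolean DataFrame: True where a value is missing.
missing = toy.isnull()

# value_counts() on each column counts the True and False entries.
for column in missing.columns:
    print(column)
    print(missing[column].value_counts())
    print()

# A shorter summary: True counts as 1, so sum() gives missing values per column.
missing_per_column = missing.sum()
print(missing_per_column)

# One possible way to handle the gaps (an assumption): fill with the column mean.
toy_filled = toy.fillna(toy.mean())
```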

#To check the data type: .dtypes (an attribute of the DataFrame, not a method; a single column uses .dtype) / to change the data type: .astype()
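A short sketch of checking and changing data types, again on a hypothetical toy DataFrame. The column names and the conversion to float64 are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"Age": [50, 31, 32], "BMI": [33.6, 26.6, 23.3]})

# dtypes is an attribute of a DataFrame: one dtype per column.
print(toy.dtypes)

# A single column (a Series) exposes dtype instead.
age_dtype_before = toy["Age"].dtype
print(age_dtype_before)

# astype() returns a converted copy, so assign it back to keep the change.
toy["Age"] = toy["Age"].astype("float64")
print(toy["Age"].dtype)
```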

*Visualization: Seaborn and Matplotlib are two of Python's most powerful visualization libraries.

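import matplotlib.pyplot as plt
import seaborn as sns

#labels are assigned in the order returned by value_counts(), which sorts by count (largest first),
#so check that this order matches the counts before trusting the chart
labels = 'Diabetic', 'Not Diabetic'
#autopct prints each slice's share as a percentage with two decimals
plt.pie(df['Outcome'].value_counts(), labels=labels, autopct='%0.02f%%')
plt.legend()
plt.show()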
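Because value_counts() sorts by count, a fixed label tuple can end up on the wrong slice. This sketch shows one way to derive the labels from the counts themselves and to compute the percentages the pie chart would print. The toy Series is hypothetical, and the encoding 1 = diabetic, 0 = not diabetic is an assumption, not something stated in these notes.

```python
import pandas as pd

# Hypothetical toy outcome column (assumed encoding: 1 = diabetic, 0 = not diabetic).
outcome = pd.Series([0, 0, 0, 1, 1, 0, 1, 0], name="Outcome")

# value_counts() returns counts sorted from most to least frequent.
counts = outcome.value_counts()

# Build the labels from the index so each label follows its own count.
names = {0: "Not Diabetic", 1: "Diabetic"}  # assumed mapping
labels = [names[value] for value in counts.index]

# The same shares that autopct would print on the pie slices.
percentages = (counts / counts.sum() * 100).round(2)

for label, pct in zip(labels, percentages):
    print(f"{label}: {pct:.2f}%")
```

The resulting labels list can then be passed as labels=labels to plt.pie together with counts.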

License: Attribution, NonCommercial, NoDerivatives
