[IBM] What is Data Science? - Big Data, Data Mining

2021. 5. 15. 09:25 · Data science


Foundations of Big Data

- Big data: dynamic, large, and disparate volumes of data created by people, tools, and machines

- 5 Vs: velocity, volume, variety, veracity, value

- velocity: the speed at which data accumulates and has to be processed

- volume: the scale of the data

- variety: the diversity of the data, which comes from different sources

- veracity: the quality and origin of the data, i.e. whether it is reliable and accurate

- value: what can be gained from the data, which is not only profit

- Alternative tools: Apache Spark, Hadoop and its ecosystem

 

What is Hadoop?

- In a big data cluster, the data is sliced into pieces and the pieces are sent to different machines. Each machine extracts the useful data from its piece and returns it, and the returned pieces are combined into the final outcome.

- This is the map and reduce process

- It scales linearly

- Most of the components of data science, such as probability, statistics, linear algebra, and programming, have been around for many decades, but now we have the computational capabilities to apply and combine them and come up with new techniques and learning algorithms.
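The slice, map, and reduce steps noted above can be sketched in plain Python. This is only a minimal single-machine illustration of the idea, not Hadoop's actual API: the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) and the sample lines are made up for this note, and each "machine" is just a chunk processed in a loop.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Slice the records into roughly equal chunks, one per simulated machine."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: each simulated machine counts the words in its own slice."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, n_chunks=3):
    chunks = split_into_chunks(records, n_chunks)
    # On a real cluster the map step would run in parallel on separate machines.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical input, invented for illustration.
lines = ["big data is big", "data is everywhere", "hadoop handles big data"]
totals = word_count(lines)
print(totals.most_common(3))
```

The result does not depend on how many chunks the data is sliced into, which is what lets the work be spread over more machines.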

 

 

How Big Data is Driving Digital Transformation

- Digital Transformation: Many industries have been influenced by it. Big data came out of it and, as a result, has itself impacted many industries. It has to be handled by personnel across the whole organization, such as chief executives, data managers, information managers, and procurement.

 

 

Data Science Skills & Big Data

- Need to learn many tools, such as Python, UNIX commands, pandas, and Jupyter Notebook.

 

Data Scientists at New York University

- Big data is data that is large enough, and has enough volume and velocity, that you cannot handle it with traditional database systems.

- Big data was started by Google when Google tried to figure out how to solve their PageRank algorithm.

 

 

“Formal evaluation could include testing the predictive capabilities of the models on observed data to see how effective and efficient the algorithms have been in reproducing data.” This is known as: In-sample forecast.
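As a rough illustration of an in-sample forecast, the sketch below fits a straight line to some observed data and then measures how well the fitted line reproduces that same data. The data values and the helper names (`fit_line`, `in_sample_rmse`) are assumptions invented for this note, and ordinary least squares stands in for whatever model is actually being evaluated.

```python
from math import sqrt


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def in_sample_rmse(xs, ys):
    """Predict the same points the model was fitted on and return the RMSE."""
    slope, intercept = fit_line(xs, ys)
    predictions = [slope * x + intercept for x in xs]
    squared_errors = [(y - p) ** 2 for y, p in zip(ys, predictions)]
    return sqrt(sum(squared_errors) / len(ys))


# Hypothetical observed data, invented for illustration.
observed_x = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"in-sample RMSE: {in_sample_rmse(observed_x, observed_y):.3f}")
```

A low in-sample error only shows that the model reproduces data it has already seen; it says nothing by itself about data it has not seen.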

 

After the data are appropriately processed, transformed, and stored, what is a good starting point for data mining? Data visualization.

 

 

 

 

 

 

 
