2021. 5. 15. 09:25ㆍData science
Foundations of Big Data
- Big Data: dynamic, large, and disparate volumes of data created by people, tools, and machines
- The 5 V's: velocity, volume, variety, veracity, value
- velocity: the data arrives and must be processed very quickly
- volume: the scale of the data
- variety: the diversity of the data, which comes from different sources
- veracity: the quality and origin of the data; whether it is reliable and accurate
- value: what can be gained from the data, not only profit
- Alternative tools: Apache Spark, Hadoop and its ecosystem
What is Hadoop?
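- In a big data cluster, the data is sliced into pieces that are sent to different machines. Each machine extracts the useful information from its piece and returns the result.
- These two steps are called map and reduce
- It scales linearly: adding machines adds processing capacity in proportion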
- Most of the components of data science, such as probability, statistics, linear algebra, and programming, have been around for many decades, but now we have the computational capabilities to apply and combine them and come up with new techniques and learning algorithms.
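The slice, map, and reduce steps noted above can be sketched in plain Python. This is only a single-machine illustration of the idea, not Hadoop's actual API: the function names (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are assumptions made up for this example, and on a real cluster each chunk would be processed on a different machine.

```python
from collections import defaultdict


def split_into_chunks(records, n_chunks):
    """Slice the input into n_chunks pieces, one per (simulated) machine."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in this chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, n_chunks=3):
    chunks = split_into_chunks(lines, n_chunks)
    # On a cluster these map calls would run in parallel on separate machines.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample input for illustration only.
lines = ["big data is big", "data moves fast", "big clusters scale"]
counts = word_count(lines)
print(counts)
```

The result does not depend on how many chunks the data is split into, which is what allows the work to be spread over more machines as the data grows.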
How Big Data is Driving Digital Transformation
- Digital transformation: it has influenced many industries. Big data emerged as a result of it and has in turn impacted many industries. It needs to be handled by personnel across the whole organization, such as chief executives, data managers, information managers, and procurement.
Data Science Skills & Big Data
- Need to learn many tools, such as Python, UNIX commands, pandas, and Jupyter Notebook.
Data Scientists at New York University
- Big data is data that is large enough, and has enough volume and velocity, that you cannot handle it with traditional database systems.
- Big data was started by Google, when Google tried to figure out how to solve their PageRank algorithm.
“Formal evaluation could include testing the predictive capabilities of the models on observed data to see how effective and efficient the algorithms have been in reproducing data.” This is known as: In-sample forecast.
After the data are appropriately processed, transformed, and stored, what is a good starting point for data mining? Data visualization
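The in-sample forecast from the first quiz answer above can be illustrated with a small sketch: fit a model on the observed data, then check how well it reproduces that same data. The straight-line model, the RMSE metric, the helper names (`fit_line`, `in_sample_rmse`), and the sample numbers are all assumptions chosen for this example, not something taken from the course.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def in_sample_rmse(xs, ys):
    """Fit on the observed data, then score predictions on that same data."""
    slope, intercept = fit_line(xs, ys)
    predictions = [slope * x + intercept for x in xs]
    mse = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)
    return mse ** 0.5


# Hypothetical observations for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
rmse = in_sample_rmse(xs, ys)
print(f"in-sample RMSE: {rmse:.3f}")
```

A low in-sample error only shows that the model reproduces the data it has already seen; it says nothing yet about data it has not seen.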