2022. 10. 2. 23:25ㆍData science/Database
A database is a structured collection of meaningful data. Working with one means knowing precisely what the data means and what it is worth.
1. Timeline of databases
1960 - 70s (in-company applications)
- COBOL, SQL (1976)
- Numeric/textual data, accounting
- Usually per-company, no standardisation
- Slow data access (think of old storage media such as magnetic tape)
- Simple indexing
- CODASYL (Conference/Committee on Data Systems Languages, a consortium)
1980 - 2000 (wider applications)
- SQL standard (1986)
- MS Access (1992)
- Maturation of the relational model
- Media (Netflix (1997), iTunes (2001))
- NoSQL (1998)
2000+ (social)
- Online banking and e-commerce
- Text, photos, music, videos
- Views, likes, comments
- Sharing, messaging
- Real-time streaming
- "Friend" networks, suggestions
2. Some definitions
Data: meaningful information
DBMS (Database Management System): the software system that defines, stores and manages databases
Database system: DBMS + data + interface (app, front-end)
3. SQL (Structured Query Language)
: ANSI/ISO standard language for manipulating relational databases
: developed at IBM in the early 1970s, originally called SEQUEL (Structured English Query Language)
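To make this concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module. It assumes SQLite as the DBMS, though any relational DBMS would work. The `employee` table and its rows are hypothetical. In terms of the "database system" definition, the Python script is the interface, SQLite is the DBMS, and the rows are the data.

```python
import sqlite3

# In-memory SQLite database: SQLite acts as the DBMS,
# this script acts as the interface (app).
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (emp_id, name, dept) VALUES (?, ?, ?)",
    [(1, "Alice", "Sales"), (2, "Bob", "IT"), (3, "Chloe", "IT")],
)

# SQL is declarative: we state WHICH rows we want, not HOW to find them.
rows = conn.execute(
    "SELECT name FROM employee WHERE dept = ? ORDER BY name", ("IT",)
).fetchall()
print(rows)  # [('Bob',), ('Chloe',)]
conn.close()
```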
4. NoSQL
: databases that do not follow the traditional relational (table-based) model
: typically used for very large databases where performance is crucial
5. Top-ranked DBMSs: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, MongoDB
6. Typical DBMS functionality
- Define a DB
- Construct a DB
- Manipulate a DB
- Share a DB
* CRUD (Create, Read, Update, Delete)
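The four CRUD operations map onto SQL's `INSERT`, `SELECT`, `UPDATE` and `DELETE` statements. The sketch below shows them with Python's `sqlite3` module on a hypothetical `book` table in an in-memory SQLite database. The `CREATE TABLE` line also shows "define a DB" in action.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Define: declare the (hypothetical) table structure.
conn.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL)")

# Create: insert new rows.
conn.execute("INSERT INTO book (book_id, title, price) VALUES (?, ?, ?)", (1, "SQL Basics", 20.0))
conn.execute("INSERT INTO book (book_id, title, price) VALUES (?, ?, ?)", (2, "NoSQL Intro", 25.0))

# Read: query the rows back.
before = conn.execute("SELECT book_id, title, price FROM book ORDER BY book_id").fetchall()

# Update: change an existing row.
conn.execute("UPDATE book SET price = ? WHERE book_id = ?", (18.0, 1))

# Delete: remove a row.
conn.execute("DELETE FROM book WHERE book_id = ?", (2,))
conn.commit()

after = conn.execute("SELECT book_id, title, price FROM book ORDER BY book_id").fetchall()
print(before)  # [(1, 'SQL Basics', 20.0), (2, 'NoSQL Intro', 25.0)]
print(after)   # [(1, 'SQL Basics', 18.0)]
```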
7. Relational model
- a database is a collection of relations
- a relation is a table of values with rows and columns
- tables are accessed and linked together with keys.
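Here is a minimal sketch of two relations linked by a key. It again assumes SQLite through Python's `sqlite3`, and the `student` and `enrolment` tables are hypothetical. The `JOIN` uses the shared `student_id` value to link rows across the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two relations: each 'student' row is identified by student_id,
# and 'enrolment' refers back to it through the same key value.
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrolment (student_id INTEGER, course TEXT);
INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO enrolment VALUES (1, 'Databases'), (1, 'Statistics'), (2, 'Databases');
""")

# The key links the two tables back together.
linked = conn.execute("""
    SELECT s.name, e.course
    FROM student AS s
    JOIN enrolment AS e ON e.student_id = s.student_id
    ORDER BY s.name, e.course
""").fetchall()
print(linked)
# [('Alice', 'Databases'), ('Alice', 'Statistics'), ('Bob', 'Databases')]
```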
8. Table (aka relation) schema: contains no actual data
: lists the attributes and their data types, i.e., it defines the structure of the data
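In SQL, the schema is what `CREATE TABLE` declares. The sketch below assumes SQLite and a hypothetical `customer` table. It creates the table, reads back the attribute names and declared types with SQLite's `PRAGMA table_info`, and confirms that the table holds no rows yet. SQLite has no dedicated date type, so `joined_on` is declared as `TEXT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: attribute names and data types only, no rows.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT,
        joined_on   TEXT
    )
""")

# PRAGMA table_info yields one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = [
    (name, col_type)
    for _, name, col_type, _, _, _ in conn.execute("PRAGMA table_info(customer)")
]
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(schema)     # [('customer_id', 'INTEGER'), ('name', 'TEXT'), ('city', 'TEXT'), ('joined_on', 'TEXT')]
print(row_count)  # 0
```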
9. Anatomy of a table
10. Keys - a fundamental idea
: Keys uniquely identify a row in a table and create relationships between tables. Types: PK, candidate key, FK.
10.1 Primary key (PK): uniquely identifies a row in a table; underlined in schema diagrams
- How to choose a PK: 1. identify the set of candidate keys -> 2. select the PK from them.
- Among the candidate keys, look for the minimal (simplest) key, if possible a single attribute rather than a combination.
- Rules for choosing a primary key:
  - Must be unique
  - No NULLs
  - Obviousness: keep it simple
  - If possible, one attribute rather than a set
  - Numbers are faster (numeric keys are quicker to compare and index)
  - Once chosen, try not to change it
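The two-step procedure (find the candidate keys, then pick the PK) can be sketched in plain Python. This is an illustration under a strong assumption: it tests uniqueness only on a hypothetical sample of rows. A real candidate key depends on what the data means, not only on the rows that happen to exist. For example, `(name, dob)` passes here only because the sample is small. The scoring function `choose_primary_key` (a hypothetical helper, like `is_unique` and `candidate_keys`) applies two of the rules above: prefer a single attribute, and prefer a numeric one.

```python
from itertools import combinations

# Hypothetical sample rows of a 'student' relation.
rows = [
    {"student_id": 101, "email": "a@uni.ac", "name": "Alice", "dob": "2001-02-03"},
    {"student_id": 102, "email": "b@uni.ac", "name": "Bob",   "dob": "2000-07-15"},
    {"student_id": 103, "email": "c@uni.ac", "name": "Alice", "dob": "2000-07-15"},
]

def is_unique(cols, rows):
    """True if no value on `cols` is None and no two rows share the same values."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if None in key or key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows):
    """Minimal attribute sets that are unique in the sample, smallest sets first."""
    attrs = list(rows[0])
    found = []
    for size in range(1, len(attrs) + 1):
        for cols in combinations(attrs, size):
            # A superset of an existing key is not minimal, so skip it.
            if any(set(k) <= set(cols) for k in found):
                continue
            if is_unique(cols, rows):
                found.append(cols)
    return found

def choose_primary_key(keys, rows):
    """Prefer fewer attributes first, then all-numeric attributes."""
    def score(cols):
        numeric = all(isinstance(rows[0][c], int) for c in cols)
        return (len(cols), not numeric)
    return min(keys, key=score)

keys = candidate_keys(rows)
print(keys)                            # [('student_id',), ('email',), ('name', 'dob')]
print(choose_primary_key(keys, rows))  # ('student_id',)
```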
10.2 Candidate key: a minimal set of attributes that can uniquely identify a row
- Should be unique, never NULL, stable (not changed over time) and precise (no unnecessary attributes)
10.3 Foreign key (FK): an attribute in one table (the child, or referencing table) that refers to the PK of another table (the parent, or referenced table)
- Every non-NULL FK value must have a matching PK value in the parent table.
- Careless deletion or insertion might destroy the relationship between the two tables (e.g., deleting a parent row that child rows still refer to).
10.4 Integrity constraints: the DBMS enforces key integrity constraints
1. Prevent a PK from being set to NULL.
2. Prevent two rows in the same table from having the same PK value.
3. Prevent an FK in the child table from holding a value that does not occur as a PK in the parent table.
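The three constraints, along with the careless-deletion case, can be observed in SQLite through Python's `sqlite3` module. The sketch below uses a hypothetical `department` table (referenced) and a hypothetical `employee` table (referencing, holds the FK). It relies on two SQLite-specific behaviours. First, foreign keys are enforced only after `PRAGMA foreign_keys = ON`. Second, for historical reasons SQLite allows NULL in a non-INTEGER primary key unless `NOT NULL` is declared, so the constraint is spelled out explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_code TEXT NOT NULL PRIMARY KEY,  -- explicit NOT NULL (SQLite legacy quirk)
    dept_name TEXT
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    name      TEXT,
    dept_code TEXT REFERENCES department(dept_code)  -- the FK
);
INSERT INTO department VALUES ('IT', 'Information Technology');
INSERT INTO employee VALUES (1, 'Alice', 'IT');
""")

def try_sql(sql, params=()):
    """Run one statement; return None on success, else the integrity error text."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "1. NULL PK":      try_sql("INSERT INTO department VALUES (NULL, 'Nowhere')"),
    "2. duplicate PK": try_sql("INSERT INTO department VALUES ('IT', 'Again')"),
    "3. orphan FK":    try_sql("INSERT INTO employee VALUES (2, 'Bob', 'HR')"),
    "careless delete": try_sql("DELETE FROM department WHERE dept_code = 'IT'"),
}
for rule, error in results.items():
    print(f"{rule:16} -> {error}")
```

Each violating statement raises `sqlite3.IntegrityError`. Because `with conn:` rolls the transaction back, the original department and employee rows are left unchanged.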