2023. 1. 15. 06:58ใData science/Database
SQL์ ๊ธฐ๋ณธ ํ ์ด๋ธ ํ์์ผ๋ก ๋์ด ์์ด ์ดํดํ๊ธฐ๊ฐ ์ฝ๋ค.
์ถํ์ NoSQL๋ฅผ ์ ๋ฆฌํ ํ ์ง๋ง, NoSQL์ ์ ํํ๋์ด ์์ง ์์ ์ฒ์ ๋ฐ์ดํฐ๋ฒ ์ด์ค ์ค๊ณ์ ์ ๋ฅผ ๋ง์ด ๋จน์๋ค.
์ด ์๋ฃ๋ ๋ํ์ ์์ฌ๊ณผ์ ์์ ๋ด์ฉ์ ์์ฝํ ๋ด์ฉ์ ๋๋ค.
What is a database?: Precisely know what the database means and is worth.
: A structured collection of meaningful data.
Timeline of Databases
1970-1960: in-company applications
- SQL, COBOL, usually per-company, no standardisation, data access slow(Data was stored in the tape), sequential/simple indexing
1980-2000: Wider applications
- SQL standard[1986], MS Access, maturation of relational model, commerce(Amazon), media (Netflix, iTunes), NoSQL
2000~: Social(Figures are monthly active users)
- online banking and eCommerce, test, photos, music, videos, likes, comments, sharing, messaging, real-time streaming etc.
Key Words
Data: meaningful information
Database Management System (DBMS): Software for creating and maintaining DB
Database system : DBMS + Data + Interface(app; front-end)
SQL (Structured Query Language)
:ANSI/ISO standard language for relational DB manipulation, SEQUEL
The relational model
: A database is a collection of relations, a relation is a table of values, and tables are accessed and linked with keys.
Table(aka relation) schema
: Schema is not actual data; it is just data structure/definition. It consists of attributes (name of each column) and the data type of each column.
student_id | name | age |
integer | text | integer |
Anatomy of table
Table name , row(tuple), attribute(column/field), header(list of attributes), body(data parts), domain(data type of attribute)
Keys
A key is an attribute(or set of them) used to uniquely identify a row in a table, create a relationship between tables
i.e. Primary Key(PK), Candidate Key, Foreign Key(FK)
Primary Key(PK)
Uniquely identifies a row in a table, underlined in the table.
How to: Identify a set of candidate keys -> select PK from that set(usually pick the simplest one)
โป Must be unique, No Null, Keep it simple (if possible, no compound key), numbers are faster(than strings), Try not to change it (PK linked to other tables or used in various queries, so if this changes in the middle of the process, it is a disaster!)
Candidate Keys
A set of attributes that can uniquely identify a row
i.e. Student_id (O), ISBN (O), Name (X), Age (X), ID_Number (O), Name+ID_number(O)
Foreign Key(FK)
When linking two tables, the one(Parent) that refers to another table uses FK, and the one(Child) that is being referred from the other one uses PK to link.
Key Integrity Constraints
: to prevent PK from having a NULL value, prevent too have two duplicated PKs, prevent FK(parents) from having a value which is not in PK(child)
* if the Parents table FK has a value, not in the Child table, joining is impossible.
E.F. Codd is the one who conceived the current relational database.
: Decoupled the form of data, eliminated any duplication of data, effectively optimizing the cost of storage
RDBMS(Relational database management system)
- Database: enough data in some organized format
- Relational Database: Organize the database as "relational"
- RDMBS: Add functionality for managing and optimizing access to relational data
Values are atomic(no more breakdown), the sequence of columns and row is insignificant, Data storage is vertically scalable, and integrity constraints maintain data consistency across multiple tables.
Limitation of RDBMS
To scale the database, it needs to be distributed on multiple scales.
Users have to scale the database on expensive servers that are hard to handle.
If the database doesn't fit the table, it is needed to be re-designed with a complex structure.
Why RDBMS?
ACID compliance (Atomicity, Consistency, Isolation, Durability)
Data remains structured and unchanging
A bit of NoSQL, which I scribed before. (NoSQL will be summarised later)
NoSQL is a schema-less alternative to SQL and RDBMS.
Store, process, and analyze vast amounts of unstructured data.
Feature of NoSQL
Collection of Key-Values pair, documents and wide-columns stores without any standard schema definition
integrated caching supportive
Distributed computing,
Flexible schema
powerful, efficient architecture
easily scalable
no complex relationships
low-cost servers allow for the scaling of more data
Limitation of NoSQL
Lack of reporting tools(They provide some tools these days, more options there!!!)
Not appropriate for complex queries as there is no standard interface to perform queries
No defined standards for database
Reasons to use a NoSQL
Elastic Scaling(databases are flexible, scalable, and have superior performance), Sizable data(capability of handling vast volumes), Flexible Data Models(fewer restrictions, easily used for the creation of new columns), Eventuality Consistent(Developers to choose to fine-tune; trade-off latency, consistency or availability), Simpler data models and automatic repair(lower down the maintenance costs), Suit for the real-time analytics
'Data science > Database' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
Understanding Database 3 - Relational Algebra (๊ด๊ณ๋์) (0) | 2023.01.15 |
---|---|
Understanding Database 2 - SQL (0) | 2023.01.15 |
Week 1. Data Quality (0) | 2022.10.09 |
Week 1. MetaData & ParaData (0) | 2022.10.09 |
Week 1. Data Sources & Data Resolution (0) | 2022.10.09 |