Understanding Database 1 - Database

2023. 1. 15. 06:58ใ†Data science/Database

๋ฐ˜์‘ํ˜•

SQL์€ ๊ธฐ๋ณธ ํ…Œ์ด๋ธ” ํ˜•์‹์œผ๋กœ ๋˜์–ด ์žˆ์–ด ์ดํ•ดํ•˜๊ธฐ๊ฐ€ ์‰ฝ๋‹ค.

์ถ”ํ›„์— NoSQL๋ฅผ ์ •๋ฆฌํ• ํ…Œ์ง€๋งŒ, NoSQL์€ ์ •ํ˜•ํ™”๋˜์–ด ์žˆ์ง€ ์•Š์•„ ์ฒ˜์Œ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค ์„ค๊ณ„์‹œ ์• ๋ฅผ ๋งŽ์ด ๋จน์—ˆ๋‹ค. 

 

์ด ์ž๋ฃŒ๋Š” ๋Œ€ํ•™์› ์„์‚ฌ๊ณผ์ • ์ˆ˜์—… ๋‚ด์šฉ์„ ์š”์•ฝํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.


What is a database?: Precisely know what the database means and is worth. 

: A structured collection of meaningful data.

 

Timeline of Databases

1970-1960: in-company applications

- SQL, COBOL, usually per-company, no standardisation, data access slow(Data was stored in the tape), sequential/simple indexing

1980-2000: Wider applications

- SQL standard[1986], MS Access, maturation of relational model, commerce(Amazon), media (Netflix, iTunes), NoSQL

2000~: Social(Figures are monthly active users) 

- online banking and eCommerce, test, photos, music, videos, likes, comments, sharing, messaging, real-time streaming etc. 

 

Key Words 

Data: meaningful information

Database Management System (DBMS): Software for creating and maintaining DB 

Database system : DBMS + Data + Interface(app; front-end) 

Figure 1. Database System Structure

SQL (Structured Query Language) 

:ANSI/ISO standard language for relational DB manipulation, SEQUEL

 

The relational model 

A database is a collection of relations, a relation is a table of values, and tables are accessed and linked with keys. 

 

Table(aka relation) schema 

: Schema is not actual data; it is just data structure/definition. It consists of attributes (name of each column) and the data type of each column. 

student_id name age
integer text integer

Anatomy of table

Table name , row(tuple), attribute(column/field), header(list of attributes), body(data parts), domain(data type of attribute)

 

Keys 

A key is an attribute(or set of them) used to uniquely identify a row in a table, create a relationship between tables 

i.e. Primary Key(PK), Candidate Key, Foreign Key(FK)

 

Primary Key(PK) 

Uniquely identifies a row in a table, underlined in the table. 

How to: Identify a set of candidate keys -> select PK from that set(usually pick the simplest one) 

โ€ป Must be unique, No Null, Keep it simple (if possible, no compound key), numbers are faster(than strings), Try not to change it (PK linked to other tables or used in various queries, so if this changes in the middle of the process, it is a disaster!)

 

Candidate Keys

A set of attributes that can uniquely identify a row

i.e. Student_id (O), ISBN (O), Name (X), Age (X), ID_Number (O), Name+ID_number(O) 

 

Foreign Key(FK) 

When linking two tables, the one(Parent) that refers to another table uses FK, and the one(Child) that is being referred from the other one uses PK to link.

Figure 2. PK and FK

Key Integrity Constraints

: to prevent PK from having a NULL value, prevent too have two duplicated PKs, prevent FK(parents) from having a value which is not in PK(child)

* if the Parents table FK has a value, not in the Child table, joining is impossible. 

 

E.F. Codd is the one who conceived the current relational database. 

: Decoupled the form of data, eliminated any duplication of data, effectively optimizing the cost of storage  

 

RDBMS(Relational database management system) 

  • Database: enough data in some organized format 
  • Relational Database: Organize the database as "relational"
  • RDMBS: Add functionality for managing and optimizing access to relational data

Values are atomic(no more breakdown), the sequence of columns and row is insignificant, Data storage is vertically scalable, and integrity constraints maintain data consistency across multiple tables.

 

Limitation of RDBMS

To scale the database, it needs to be distributed on multiple scales. 

Users have to scale the database on expensive servers that are hard to handle.

If the database doesn't fit the table, it is needed to be re-designed with a complex structure.

 

Why RDBMS?

ACID compliance (Atomicity, Consistency, Isolation, Durability) 

Data remains structured and unchanging

 

 


A bit of NoSQL, which I scribed before. (NoSQL will be summarised later) 

NoSQL is a schema-less alternative to SQL and RDBMS.

Store, process, and analyze vast amounts of unstructured data.

 

Feature of NoSQL 

Collection of Key-Values pair, documents and wide-columns stores without any standard schema definition

integrated caching supportive

Distributed computing, 

Flexible schema

powerful, efficient architecture

easily scalable

no complex relationships

low-cost servers allow for the scaling of more data

 

Limitation of NoSQL 

Lack of reporting tools(They provide some tools these days, more options there!!!) 

Not appropriate for complex queries as there is no standard interface to perform queries

No defined standards for database

 

Reasons to use a NoSQL 

Elastic Scaling(databases are flexible, scalable, and have superior performance), Sizable data(capability of handling vast volumes), Flexible Data Models(fewer restrictions, easily used for the creation of new columns),  Eventuality Consistent(Developers to choose to fine-tune; trade-off latency, consistency or availability), Simpler data models and automatic repair(lower down the maintenance costs), Suit for the real-time analytics

 

 

๋ฐ˜์‘ํ˜•

'Data science > Database' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Understanding Database 3 - Relational Algebra (๊ด€๊ณ„๋Œ€์ˆ˜)  (0) 2023.01.15
Understanding Database 2 - SQL  (0) 2023.01.15
Week 1. Data Quality  (0) 2022.10.09
Week 1. MetaData & ParaData  (0) 2022.10.09
Week 1. Data Sources & Data Resolution  (0) 2022.10.09