Understanding Database 1

Understanding Database 1 - Database

2023. 1. 15. 06:58ㆍData science/Database

SQL은 기본 테이블 형식으로 되어 있어 이해하기가 쉽다.

추후에 NoSQL를 정리할테지만, NoSQL은 정형화되어 있지 않아 처음 데이터베이스 설계시 애를 많이 먹었다.

이 자료는 대학원 석사과정 수업 내용을 요약한 내용입니다.

What is a database?: Precisely know what the database means and is worth.

: A structured collection of meaningful data.

Timeline of Databases

1970-1960: in-company applications

- SQL, COBOL, usually per-company, no standardisation, data access slow(Data was stored in the tape), sequential/simple indexing

1980-2000: Wider applications

- SQL standard[1986], MS Access, maturation of relational model, commerce(Amazon), media (Netflix, iTunes), NoSQL

2000~: Social(Figures are monthly active users)

- online banking and eCommerce, test, photos, music, videos, likes, comments, sharing, messaging, real-time streaming etc.

Key Words

Data: meaningful information

Database Management System (DBMS): Software for creating and maintaining DB

Database system : DBMS + Data + Interface(app; front-end)

SQL (Structured Query Language)

:ANSI/ISO standard language for relational DB manipulation, SEQUEL

The relational model

: A database is a collection of relations, a relation is a table of values, and tables are accessed and linked with keys.

Table(aka relation) schema

: Schema is not actual data; it is just data structure/definition. It consists of attributes (name of each column) and the data type of each column.

student_id	name	age
integer	text	integer

Anatomy of table

Table name , row(tuple), attribute(column/field), header(list of attributes), body(data parts), domain(data type of attribute)

Keys

A key is an attribute(or set of them) used to uniquely identify a row in a table, create a relationship between tables

i.e. Primary Key(PK), Candidate Key, Foreign Key(FK)

Primary Key(PK)

Uniquely identifies a row in a table, underlined in the table.

How to: Identify a set of candidate keys -> select PK from that set(usually pick the simplest one)

※ Must be unique, No Null, Keep it simple (if possible, no compound key), numbers are faster(than strings), Try not to change it (PK linked to other tables or used in various queries, so if this changes in the middle of the process, it is a disaster!)

Candidate Keys

A set of attributes that can uniquely identify a row

i.e. Student_id (O), ISBN (O), Name (X), Age (X), ID_Number (O), Name+ID_number(O)

Foreign Key(FK)

When linking two tables, the one(Parent) that refers to another table uses FK, and the one(Child) that is being referred from the other one uses PK to link.

Key Integrity Constraints

: to prevent PK from having a NULL value, prevent too have two duplicated PKs, prevent FK(parents) from having a value which is not in PK(child)

* if the Parents table FK has a value, not in the Child table, joining is impossible.

E.F. Codd is the one who conceived the current relational database.

: Decoupled the form of data, eliminated any duplication of data, effectively optimizing the cost of storage

RDBMS(Relational database management system)

Database: enough data in some organized format
Relational Database: Organize the database as "relational"
RDMBS: Add functionality for managing and optimizing access to relational data

Values are atomic(no more breakdown), the sequence of columns and row is insignificant, Data storage is vertically scalable, and integrity constraints maintain data consistency across multiple tables.

Limitation of RDBMS

To scale the database, it needs to be distributed on multiple scales.

Users have to scale the database on expensive servers that are hard to handle.

If the database doesn't fit the table, it is needed to be re-designed with a complex structure.

Why RDBMS?

ACID compliance (Atomicity, Consistency, Isolation, Durability)

Data remains structured and unchanging

A bit of NoSQL, which I scribed before. (NoSQL will be summarised later)

NoSQL is a schema-less alternative to SQL and RDBMS.

Store, process, and analyze vast amounts of unstructured data.

Feature of NoSQL

Collection of Key-Values pair, documents and wide-columns stores without any standard schema definition

integrated caching supportive

Distributed computing,

Flexible schema

powerful, efficient architecture

easily scalable

no complex relationships

low-cost servers allow for the scaling of more data

Limitation of NoSQL

Lack of reporting tools(They provide some tools these days, more options there!!!)

Not appropriate for complex queries as there is no standard interface to perform queries

No defined standards for database

Reasons to use a NoSQL

Elastic Scaling(databases are flexible, scalable, and have superior performance), Sizable data(capability of handling vast volumes), Flexible Data Models(fewer restrictions, easily used for the creation of new columns), Eventuality Consistent(Developers to choose to fine-tune; trade-off latency, consistency or availability), Simpler data models and automatic repair(lower down the maintenance costs), Suit for the real-time analytics

저작자표시 비영리 변경금지 (새창열림)

'Data science > Database' 카테고리의 다른 글

Understanding Database 3 - Relational Algebra (관계대수) (0)	2023.01.15
Understanding Database 2 - SQL (0)	2023.01.15
Week 1. Data Quality (0)	2022.10.09
Week 1. MetaData & ParaData (0)	2022.10.09
Week 1. Data Sources & Data Resolution (0)	2022.10.09

Piccole Gioie🪴

Piccole Gioie🪴

태그

최근글

댓글

공지사항

아카이브

'Data science > Database' 카테고리의 다른 글

관련글

티스토리툴바