Data science/Database(28)
-
Understanding Database 4 - Entity Relationship(ER) Model
ER Design phases 1. Requirements analysis: what problem is being solved? What does the DB need to do? 2. Logical design: identification of entities and their relationships (ER model) 3. Normalization: systematic simplification and clarification of the logical design 4. Physical design: implementation in a DBMS ER model concepts 1. Entity: a thing, a noun 2. Attribute: a property, an adjective (PK mar..
2023.01.15 -
Understanding Database 3 - Relational Algebra
Relational Algebra: a formalisation of operations (SELECT, JOIN, etc.) that works on relations (sets of tuples) and allows us to understand and create more efficient queries Unary Operations SELECT Operation [ σ (r) ]: select the rows of the relation where the condition is true σ salary > 10000 (Employee): it is commutative, which means selections can be applied in any order. Plus, it can be cascaded. σ ..
2023.01.15 -
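The commutative and cascading behaviour of the SELECT operation mentioned in the excerpt above can be sketched in a few lines of Python. This is only an illustration: the `employee` rows, the `dept` attribute, and the `select` helper are hypothetical and do not come from the original post.

```python
# Hypothetical Employee relation, modelled as a list of dict "tuples".
employee = [
    {"name": "Ann", "salary": 12000, "dept": "IT"},
    {"name": "Bob", "salary": 9000, "dept": "IT"},
    {"name": "Cho", "salary": 15000, "dept": "HR"},
]


def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples where the predicate is true."""
    return [row for row in relation if predicate(row)]


def high_paid(row):
    return row["salary"] > 10000


def in_it(row):
    return row["dept"] == "IT"


# Commutative: the order of the two selections does not matter.
a = select(select(employee, in_it), high_paid)
b = select(select(employee, high_paid), in_it)

# Cascade: two selections are equivalent to one selection with a conjunction.
c = select(employee, lambda row: high_paid(row) and in_it(row))

print(a == b == c)
```

All three expressions keep the same single row (Ann), which is why a query optimiser is free to reorder or merge selections.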
Understanding Database 2 - SQL
Structured Query Language: used for manipulating relational databases, an international standard since 1986 (ANSI, ISO). SQL is a language, and MySQL is a DBMS. Web interfaces (e.g., phpMyAdmin) Several notes VARCHAR(20): specifies the maximum number of characters, for memory management purposes. Schema Management: definition of the table, key management, modifying the table - CREATE table, ALTER table, DROP table(NO UND..
2023.01.15 -
Understanding Database 1 - Database
SQL is built on a basic table format, so it is easy to understand. I will write up NoSQL later, but because NoSQL is not rigidly structured, I struggled a lot when I first designed a database with it. This material is a summary of a master's-level graduate course. What is a database?: Precisely know what the database means and is worth. : A structured collection of meaningful data. Timeline of Databases 1960-1970: in-company applications - SQL, COBOL, usually per-company, no standardisation, data access slow(Data was stored in t..
2023.01.15 -
Week 1. Data Quality
1. Data Quality Dimensions “Data quality is the capability of data to be used effectively, economically and rapidly to inform and evaluate decisions.” Karr and Sanil (2002) : A dimension is a data item, a record of a dataset or a database that can be used as a parameter of data quality : Measuring data parameters against data standards to evaluate the level of quality of the data or dataset ** For..
2022.10.09 -
Week 1. MetaData & ParaData
1. Metadata: data about data with the attribute - Data that provides information about a specific dataset, focusing on different aspects such as source, resolution, production or accuracy - Multiple purposes: to manage datasets, to report information - It's stored in metadata repositories(which are databases in themselves) 1) Metadata General properties - Sufficiency: Can an object describe itse..
2022.10.09 -
Week 1. Data Sources & Data Resolution
1. Most common data types : Data can be classified according to its main production source 1) Primary Data: Generated/developed and implemented by the user 2) Secondary Data: Collected from databases which were processed and made available by third parties. Primary Data Secondary Data - Own controlled data - generated by the researcher through surveys, interviews, observations, data mining methods etc. - ..
2022.10.09 -
Week 1. What is a database?
Database means knowing precisely what the data means and its worth. In other words, it is a structured collection of meaningful data. 1. Timeline of databases 1960 - 70s (In-company applications) 1980 - 2000 (Wider Applications) 2000 + social (Figures are monthly active users) COBOL, SQL(1976) numeric/textual data accounting usually per-company no standardisation data access is slow (think about..
2022.10.02 -
[IBM]Databases and SQL for Data Science with Python - JOIN Statements
* Join Operator: to work with the relationships among entities, you need to use the JOIN operator. * Primary Key: uniquely identifies each row in a table * Foreign Key: refers to a primary key of another table * Inner Join: displays matches only ( 1 primary key and 1 foreign key ) select b.borrower_id, b.lastname, b.country, l.borrower_id, l.loan_date from borrower b inner join loan l on b.borrower_id ..
2021.05.09 -
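The inner join described in the excerpt above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the column types, the sample rows, and the join condition `b.borrower_id = l.borrower_id` are illustrative guesses, since the excerpt is cut off before the `ON` clause is complete.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE borrower (
        borrower_id TEXT PRIMARY KEY,
        lastname    TEXT,
        country     TEXT
    );
    CREATE TABLE loan (
        borrower_id TEXT REFERENCES borrower(borrower_id),
        loan_date   TEXT
    );
    INSERT INTO borrower VALUES
        ('D1', 'Smith', 'CA'), ('D2', 'Sandler', 'CA'), ('D3', 'Sommers', 'CA');
    INSERT INTO loan VALUES
        ('D1', '2010-11-24'), ('D2', '2010-11-29');
""")

# Inner join: only borrowers that have a matching loan row are returned.
rows = conn.execute("""
    SELECT b.borrower_id, b.lastname, l.loan_date
    FROM borrower b
    INNER JOIN loan l ON b.borrower_id = l.borrower_id  -- assumed join condition
    ORDER BY b.borrower_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Borrower `D3` has no loan in this sample data, so it does not appear in the result; that is the "matches only" behaviour of an inner join.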
[IBM]Databases and SQL for Data Science with Python - ACID TRANSACTIONS
ACID Commands: A transaction starts with "BEGIN" and ends with "COMMIT" -> After the transaction, save the new database state -> If any of the statements fail, you can undo changes by issuing ROLLBACK -> Can be issued from some languages: Java, C, R, Python -> to execute SQL statements from code, use the EXEC SQL command : EXEC SQL COMMIT WORK; EXEC SQL ROLLBACK WORK; --#SET TERMINATOR @ CREATE PROCEDURE TRANSACTION_..
2021.05.09
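The commit-or-rollback pattern from the excerpt above can be sketched from Python with the built-in `sqlite3` module. This is an illustration only, not the stored procedure from the original post: the `account` table, its `CHECK` constraint, and the names and balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"  # hypothetical business rule
)
conn.execute("INSERT INTO account VALUES ('rose', 100), ('james', 50)")
conn.commit()

try:
    # Both updates belong to one transaction: either both apply or neither.
    conn.execute("UPDATE account SET balance = balance + 200 WHERE name = 'james'")
    # This statement violates the CHECK constraint (100 - 200 < 0) and fails.
    conn.execute("UPDATE account SET balance = balance - 200 WHERE name = 'rose'")
    conn.commit()
except sqlite3.IntegrityError:
    # Undo the first update as well, restoring the last committed state.
    conn.rollback()

balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
print(balances)
```

Because the second update fails, the rollback also discards the first one, so both balances stay at their committed values.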