Understanding database 7

Understanding database 7 - NoSQL

2023. 1. 16. 05:33ㆍData science/Database

NoSQL (Not only SQL, Non-SQL)

Categories of NoSQL Systems
Key-Value based NoSQL: Riak, Redis, Scalaris, Amazon DynamoDB

Document-based NoSQL: MongoDB, CouchDB, OrientDB

Column-base NoSQL: Google BigTable, Cassandra, Hbase, Amazon DynamoDB,

Graph-based NoSQL: Neo4J, InfoGrid, FlockDB

ACID Principle: RDBMS Gold Standard!

Atomicity: Transactions are completely done or not done at all.
Consistency: Only valid data to be stored(before and after the transaction, data consistent)
Isolation: Transactions do not affect each other(without interference)
Durability: Written data will not be lost

The CAP Theorem

Any distributed data store can provide only two of the following three guarantees:

TRADITIONAL RDBMS TO NOSQL DATABASE: NEW ERA OF DATABASES FOR BIG DATA, December 2016 Authors: Mahmud Shertil (Nottingham Trent University)

The BASE Property

Basically Available: DDB is available following CAP Theorem,

Soft state: system's state may change even without transaction execution

Eventual constancy: the system will become consistent eventually.

-> Distributed database, weaker consistency is often acceptable while guaranteeing availability and partition tolerance, where the database will eventually be consistent.

-> Hence, BASE property(NoSQL) is used instead of ACID policy(RDBMS) in a centralized database.

MongoDB

: general purpose, document-based, distributed database for modern application developers for the cloud era

: licensed under the server-side public license(SSPL)

: instead of table joining, $lookup can be used.

: Database Name(case-sensitive), Size(64 bites)

: All documents must be smaller than 16MB

: JSON documents in binary-encoded format(BSON) behind - BSON uses field-value pairs

: type-sensitive and case-sensitive

: each document required _id (unique) - primary key of the collection, objectID(by default), automatically generated

MongoDB CRUD Operations(Create, Read, Update, Delete)

Basic Comparison operators

: $eq, $gt, $gte, $in, $lt(less than), $lte(less than or equal), $ne(not equal), $nin(Non in an array)

Create Operations

: insertOne, insertMany, insert (<document>)

db.product.insertOne({"category": "fruit", "name":"Apple from Uleumgol"});
db.product.insertMany({"category": "fruit", "name":"Apple from Uleumgol"},{"category": "fruit", "name":"Banana from Ulsan"} );

Read Operations

: find( <cond> ), pyears greater than or equal to 3 and age less than or equal to 50.

db.product.find({"pyears":{$gte:3}, {"age": {$lte:50}});

Update Operations

: updateMany(<filter>, <updateAction> , <options>);

IF pyear is greater than or equal to 3<condition-filter>,

THEN set status.finished as 1 and status.ongoing as 0<Update action>.

*$set: sets the value of a field; if the field does not exist, it will be created, and the type can also be changed.

*$unset: remove the field

*$inc: increase the number; if it does not exist, create a field and assign 1.

*$push: add elements(array) if it does not exist(array)

*$addToSet: add elements to the array, but prevent the duplication of array elements.

*$pull: removing element from array

*$pop: removing by order, {$pop:{"key":1}} -> element at the end of array to be removed

*{"upsert": true}: if no document is found, a new document is added. Otherwise, updated.

db.product.updateMany({"pyear": {$gte:3}}, 
					  {$set: "status.finished":1, "status.ongoing":0}}
                     );

Delete Operations

: deleteMany(<filter>, <options>);

db.product.deleteMany(
{"finished": 1, "ongoing":0}
);

Aggregation Framework

$project: reshaping operations

$unwind: split the array into the separated documents with each element

$max, $sum, $avg, etc.

$group: similar to GROUP BY in SQL

Aggregations allow you the combine various commands to query and get various outcomes.

Index

: shortcut and looks at an ordered list with references to the content.

: Without the index, the query does a "collection scan", which means looking through entire documents in the collection

-> Reading with an index is much faster

: BUT, writing operations (insert, updates and deletes) takes much longer - maintaining and recalculating are not free, it takes memories as well -> more details https://www.mongodb.com/developer/products/mongodb/schema-design-anti-pattern-unnecessary-indexes/

-> Thus, better not to set every field as an index. Only for the key values you needed as an index.

: Compound index can be used, and each pair will have index key value

db.users.createIndex({"age":1, "username":1});
db.users.find( {"age":21 }).sort({"username":1}): find age 21, and sorted by username(already sorted due to index, regardless of sort, so no need additional computing by using index)
db.users.find({"age": {"$gte":21, "$lte":30}}): find age between 21 and 30, the username will be sorted already.
db.users.find({"age": {"$gte":21, "$lte":30}}).sort({"username":1}): In this case, each group of age from 21 ~ 30 will have a sorted username, but the overall order is not sorted by username. Therefore, if you want to sort by the username, you need to set the index order by username and age like {"username":1, "age":1}. Then it will ignore the age group, and the username will sort it.

Schema Design: What to Consider

Constraints: any database or hardware limitations?
Access patterns: identify and quantify workload
Relationship types: which data is related in terms of the application needs
Cardinality: Cardinalities of the relationship

Schema design patterns

Bucket Pattern: suitable for time series data - if a certain document reaches the limit, create another document.
Outlier Pattern: rare instances where a few queries of documents fall outside the normal pattern
Computed pattern: instead of computing frequently, instead using total count and qty to calculate per document. When the data access pattern is read-intensive
Extended referencing pattern: many different logical entities, and want to gather them together for a specific function
Approximation pattern: useful when resource-expensive calculations( time, memory, CPU cycle), but not exact precision required.

Normalization vs Denormalization

In MongoDB, Embedding(Denormalization) and Referencing(Normalization).

Embedding: storing data in a single document structure, storing in a field or array within a document
Referencing: the relationship between data by including links or references

Embedding	Referencing
Fast reads (One query, but slow writing) small sub-documents not change regularly eventual consistency is acceptable (difficult to keep in sync) grow by a small amount (each document takes more space) often need to perform a second query to fetch	Fast writes (More queries, but fast) large subdocuments volatile data(Change frequently) immediate consistency is necessary grow by a large amount often exclude from the results

Extended Reference pattern: Hybrid of embedding and referencing

- create an array of subdocuments with the frequently used info, for more info referencing to the actual document

저작자표시 비영리 변경금지 (새창열림)

'Data science > Database' 카테고리의 다른 글

Understanding Database 9 - Distributed Database Systems(DDB) (0)	2023.01.16
Understanding Database 8 - NoSQL + Python Basic (0)	2023.01.16
Understanding database 6 - Normalization (0)	2023.01.15
Understanding Database 5 - Advanced SQL Queries (Stored Procedure, Trigger, Views) (0)	2023.01.15
Understanding Database 4 - Entity Relationship(ER) Model (0)	2023.01.15

Piccole Gioie🪴

Piccole Gioie🪴

태그

최근글

댓글

공지사항

아카이브

MongoDB CRUD Operations(Create, Read, Update, Delete)

Aggregation Framework

'Data science > Database' 카테고리의 다른 글

관련글

티스토리툴바