Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration model.
Impedance Mismatch: Application developers have long been frustrated by the impedance mismatch between the relational model and in-memory data structures.
Web Services: Web services (applications communicating over HTTP), which emerged in the 2000s, allow more flexibility in the structure of the data exchanged per request/response.
Big Data:
There are two approaches to system design to address this: scale up, or scale out.
Scaling up (vertical scaling) means bigger machines: more processors, more disk storage, more memory. Scaling out (horizontal scaling) means using lots of machines in a cluster.
Clustered Computing:
The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to be run on clusters.
To run large clusters and capture huge amounts of data, Google produced BigTable (published 2006) and Amazon produced Dynamo (published 2007).
Four categories are widely used in the NoSQL ecosystem: key-value, document, column-family, and graph.
The most important result of the rise of NoSQL is polyglot persistence - using different data stores in different circumstances.
The primary driver of interest in NoSQL has been its ability to run databases on a large cluster. As data volumes increase, it becomes more difficult and expensive to scale up. A more appealing option is to scale out. There are usually two approaches: replication and sharding.
Sharding puts different data on separate nodes, each of which handles its own reads and writes. It can improve both read and write performance.
In relational databases, querying, referential integrity, transactions, and consistency controls are lost across shards.
Some NoSQL databases provide auto-sharding.
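A common way to implement sharding is hash-based key routing. The sketch below is illustrative only (the node names and the modulo-hashing scheme are assumptions, not any particular database's implementation):

```python
import hashlib

# Hypothetical cluster nodes; real systems discover these dynamically.
NODES = ["node-a", "node-b", "node-c"]

def shard_for(key: str) -> str:
    """Map a key to the node responsible for it by hashing the key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]
```

The same key always hashes to the same node, so both reads and writes for that key go to one place. Note that naive modulo hashing reshuffles most keys when a node is added; production systems typically use consistent hashing instead.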
Replication puts multiple copies of data on separate nodes. There are two main types of replication:
Master-slave: The master services all writes; reads can come from the master or the slaves. Data is replicated from the master to the slaves. Master-slave replication provides read resilience: should the master fail, the slaves can still handle read requests.
Peer-to-peer: All nodes read and write all data. The nodes coordinate to synchronize their copies of the data.
Replication can be combined with sharding or implemented on its own. The dark side of replication is inconsistency.
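The master-slave scheme described above can be sketched as a toy in-memory store. Synchronous replication and random read routing are simplifying assumptions here; real systems usually replicate asynchronously, which is exactly where the inconsistency comes from:

```python
import random

class MasterSlaveStore:
    """Toy master-slave replication: writes go to the master and are
    copied to the slaves; reads may be served by any node."""

    def __init__(self, num_slaves: int = 2):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]

    def write(self, key, value):
        self.master[key] = value
        # Synchronous replication for simplicity; an asynchronous copy
        # would open a window where slaves return stale data.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        node = random.choice([self.master] + self.slaves)
        return node.get(key)
```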
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability, and partition tolerance.
Consistency includes update consistency, read consistency, logical consistency, and replication consistency.
Sometimes we have to sacrifice consistency for availability, and there are several ways to relax consistency.
To get good consistency, you need to involve many nodes in data operations, but this increases latency. So you often have to trade off consistency versus latency. Eventual consistency means that at some point the system will become consistent once all the writes have propagated to all the nodes.
Availability means that if you can talk to a node in a cluster, it can read and write data.
Partition tolerance means that the cluster can survive communication breakages in the cluster that separate the cluster into multiple partitions unable to communicate with each other.
The CAP theorem states that if you get a network partition, you have to trade off availability of data versus consistency. CAP is not a binary decision. There are varying degrees of relaxing consistency and availability.
Quorum is a strategy for maintaining consistency in peer-to-peer replicated databases: it ensures that no two transactions concurrently read or write inconsistent copies of a record.
To perform a read or write operation, a read quorum or write quorum must be obtained.
Write quorum: W > N/2, where W is the number of nodes participating in a write and N is the number of nodes the data record is replicated to (the replication factor).
Read quorum: R + W > N, where R is the number of nodes that need to be contacted to confirm a read.
Some sample quorum configurations:
N = 3, W = 3, R = 1: Slower writes, faster reads
N = 3, W = 2, R = 2: Reads and writes perform about the same
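The two quorum rules can be checked with a pair of one-line predicates. This is a minimal sketch of the inequalities above, not any database's API:

```python
def valid_write_quorum(w: int, n: int) -> bool:
    """A write quorum needs a strict majority of the N replicas."""
    return w > n / 2

def valid_read_quorum(r: int, w: int, n: int) -> bool:
    """A read quorum must overlap every write quorum: R + W > N."""
    return r + w > n

# Both sample configurations above satisfy the rules:
assert valid_write_quorum(3, 3) and valid_read_quorum(1, 3, 3)  # N=3, W=3, R=1
assert valid_write_quorum(2, 3) and valid_read_quorum(2, 2, 3)  # N=3, W=2, R=2
```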
Version stamps are used to determine which record is the correct (most recent) record.
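A minimal illustration of this idea, assuming the version stamp is a simple monotonically increasing counter (real systems may instead use timestamps, GUIDs, content hashes, or vector clocks):

```python
def newer(record_a: dict, record_b: dict) -> dict:
    """Return whichever record carries the higher version stamp."""
    return record_a if record_a["version"] >= record_b["version"] else record_b

old = {"value": "draft", "version": 1}
new = {"value": "final", "version": 2}
# newer(old, new) returns the record with version 2
```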
MapReduce is a programming pattern for processing (and generating) large data sets over a distributed system using parallel computing algorithms.
The map function generates a list of (key, value) pairs from an input dataset. Map function calls are independent of one another, so they can be run in parallel.
The reduce function performs some series of operations on several (key, value) pairs with the same key and returns a single output. Calls to the reduce function operate on values from only one key, so they can be run in parallel.
Input processor: divides the input dataset into chunks that can be sent to parallel calls to the map function;
Shuffler: groups (k, v) pairs from different map function calls by key;
Output writer: combines the outputs from different reduce function calls into one final output.
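The pipeline above can be sketched with the classic word-count example. The function names and the in-process shuffle are illustrative assumptions, not a real distributed framework; in practice the map and reduce calls would run on different nodes:

```python
from collections import defaultdict

def map_fn(document: str):
    """Map step: emit a (word, 1) pair for every word in a document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle step: group values from different map calls by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)

docs = ["big data big cluster", "data cluster"]
mapped = [pair for doc in docs for pair in map_fn(doc)]        # parallelizable
counts = dict(reduce_fn(k, vs) for k, vs in shuffle(mapped).items())
# counts == {"big": 2, "data": 2, "cluster": 2}
```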
MapReduce cannot be used on problems where the computation of the next value depends on previously computed values. It also cannot be used with algorithms that depend on a shared global state during processing.