July 29, 2017

Replication and Sharding in MongoDB

1. Introduction

1.1 Intro to SQL and NoSQL

  1. Two type of database solutions:
  1. Differences between SQL and NoSQL:

1.2 Introduction to MongoDB and its advantages

  1. Intro of MongoDB:
    • Open source DB
    • Document-oriented (JSON like documents, schemaless)
    • One of the most popular databases in the world
  2. Advantages:
    • Easy scalability:
      • highly scalable horizontally, include sharding and partitioning
      • fault tolerance and auto-sharding
      • Cloud deployment, unlimited growth, higher throughput, lower latency
      • flexible schema
      • include support for MapReduce
    • Developer agility:
      • Easy to setup
      • deployquickly/inavarietyofways/onmultipleservers
      • synchronizedataacrossservers
    • Oriented toward programmers, MongoDB drivers support most programming languages
    • Rich query language: MongoDB is not a key-value DB
    • Mongo might be the closest to a RDBMS, refer to: NoSQL: If Only It Was That Easy
      • MongoDB does like most NoSQL databases that sacrifice capacities
      • MongoDB maintains features of relational DBs, built for CRUD.
      • MongoDB is ACID compliant at the document level.

1.3 Why companies use MongoDB/ Why companies shift from DBMS to MongoDB

  1. Who use MongoDB? see:
  2. Some third party tools that enhance interaction:

2. Scalability

What scalability options are available, and what trade-offs are associated with choosing a particular scaling method?

3. Replication

4. Sharding

MongoDB achieves sharding through auto-sharding - it handles splitting up. Sharding is already built into the database.

  1. MongoDB sharded cluster with three components working together:
    • A shard / Or Replica Sets : A shard is a container holding a collection of data sets. A shard can be deployed as a single mongod server for development, or a replica set for production.
    • Mongos Server / Query router : a routing process; routes requests and aggregate responses.
    • Config Server : store metadata and configuration of a cluster
  2. Multiple sharding methods are available at MongoDB:
    • Range-based Sharding : partitioned by the shard key value.
    • Hash-based Sharding : partitioned by a hash of the key’s value.
    • Location-aware Sharding : partitioned by specified configuration.
  3. Shard Key: To shard a collection, we need to choose a key from collection and use it to split up the data. Shard keys affect operations.
    • Range based partitioning → supports more efficient range queries/ result in uneven distribution of data
    • Hash based partitioning : compute a hash value → ensure an even distribution of data/ at the expense of efficient range queries.
      • Compare performance distinctions: see above
      • Tag aware sharding: tag with ranges of the shard key
  4. Sharding provides: - Automatic balancing in load/data distribution - Adding new machines easily - Scaling out to many nodes - Automatic failover/ No single points of failure

    1. What are the tradeoffs ?
      • Additional costs: duplicate replica set; setup configuration server and router server
      • Forecasting: To make the right decision, we can collect performance metrics over time and to see which choice is cheaper. E.g: forecast resource requirements: RAM, storage requirements, processor speeds, disk I/O speed

5. CAP Theorem

Many people think that MongoDB is CP in CAP theorem. MongoDB’s CAP performance in regards of consistency and availability:

  1. Consistency : MongoDB is strongly consistent by default . → because it’s a single-master system. MongoDB selects consistency over availability.
    • How do MongoDB persist strong consistency?
      • One piece of data in one shard only
  2. Availability : MongoDB has a single master, so it’s not CAP-available. At sharding level, MongoDB has a weak availability. However, MongoDB can also support high availability.
    • How to deploy MongoDB for high availability?
      • Master-slave replication with replica sets
      • Automatic failover with replica set elections: Secondary will become primary if primary is not available.
  3. Partition Tolerance: strong partition tolerance for a distributed system

In mongoDB CAP has different options (there’s tradeoff between availability and consistency):

References

  1. MongoDB documentations, available at: https://docs.mongodb.com/manual/core/replica-set-elections
  2. Sharded Clusters in Mongodb - the Key Considerations: http://blog.scottlogic.com/2014/08/08/sharded-clusters-mongodb-considerations.html
  3. MongoDB Architecture Guide [book]
  4. MongoDB - The Definitive Guide [book]
  5. Scaling MongoDB [book]
Share with your friends: Twitter Facebook
comments powered by Disqus