1 Introduction
This article provides an introduction to Azure Cosmos DB and its main features, including data partitioning, global distribution, elastic scaling, low latency, throughput and the different service level agreements (SLAs).
It describes how you can design your Cosmos DB database with different collections and relationships, run queries on top of that design and analyse the throughput of each query. If a query does not perform as expected, you can remodel your database until it reaches the expected level.
2 What is Azure Cosmos DB
Azure Cosmos DB is a database service that offers a variety of data models and data APIs.
It supports key-value, document and graph data models.
It is a globally distributed database service: with a single click you can replicate your data into different regions. As the number of users grows, you can scale out the databases running in the Azure Cosmos DB service while still achieving low response times.
Azure Cosmos DB offers a number of service level agreements, which cover the availability and consistency of your data, so you can choose according to your business requirements.
If you need to distribute your data across multiple regions close to your users and scale out your application as it grows, while still achieving low latency, high availability and consistency, Azure Cosmos DB is a good fit.
3 Why Azure Cosmos DB
Let's look at the differences between Azure SQL Database and Azure Cosmos DB. Both are managed database services.
Azure Cosmos DB supports document, graph, key-value and column-family data models, while Azure SQL Database is a relational database system.
Both services can be replicated to many data centres to achieve high data availability, and both can be scaled out to improve latency.
Azure SQL Database and Azure Cosmos DB also offer different service level agreements.
4 Data Partitioning & Global Distribution
Data in Azure Cosmos DB can be partitioned into different sections. Say we have products and product categories: you can define the product category as the partition key and split the products into partitions, as in the picture above. Within the East Asia region, products are distributed locally across partitions by product category id (categories A, B and C), and the same partitioned data is distributed globally to other regions such as Central US and West US.
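The following minimal Python sketch illustrates the idea with the product example: products are grouped into logical partitions by their category id, and each logical partition is mapped to a physical partition with a stable hash. The field name `categoryId`, the sample products, the count of three physical partitions and the MD5-based hash are all assumptions made for illustration; Cosmos DB uses its own internal hashing and manages physical partitions for you, so this is not how the service is implemented.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample data: each product carries the partition key value "categoryId".
PRODUCTS = [
    {"id": "p1", "name": "Laptop", "categoryId": "A"},
    {"id": "p2", "name": "Phone", "categoryId": "B"},
    {"id": "p3", "name": "Tablet", "categoryId": "A"},
    {"id": "p4", "name": "Camera", "categoryId": "C"},
]

PHYSICAL_PARTITIONS = 3  # assumption: the real service decides this count itself


def physical_partition_for(key: str) -> int:
    """Map a partition key value to a physical partition using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def partition_products(products):
    """Group product ids into logical partitions keyed by categoryId."""
    logical = defaultdict(list)
    for product in products:
        logical[product["categoryId"]].append(product["id"])
    return dict(logical)


if __name__ == "__main__":
    for category, ids in partition_products(PRODUCTS).items():
        print(category, ids, "-> physical partition", physical_partition_for(category))
```

Because all products of one category share a logical partition, a query filtered on the category id only needs to touch one partition. The flip side is that a very popular category can become a hot partition, so a partition key with many evenly used values usually works best.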
You can host a copy of your data, together with its partitions, in another region and keep the copies synchronized with each other.
You can configure multiple read and write regions; for example, one region can act as the write region while all the other regions act as read regions. You can also set the failover order of the regions: if East Asia goes down, read from Central US; if East Asia and Central US both go down, read from West US.
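The failover order described above amounts to walking a priority list and picking the first region that is still reachable. The short Python sketch below models only that selection logic; the region list and the `pick_read_region` helper are hypothetical, and in practice the Cosmos DB service and SDKs handle this routing for you based on the regions and priorities you configure.

```python
# Hypothetical read priority matching the failover order described above.
READ_PRIORITY = ["East Asia", "Central US", "West US"]


def pick_read_region(priority, unavailable):
    """Return the first region in priority order that is not marked unavailable."""
    for region in priority:
        if region not in unavailable:
            return region
    raise RuntimeError("no read region is available")


if __name__ == "__main__":
    print(pick_read_region(READ_PRIORITY, unavailable={"East Asia"}))  # Central US
```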
In the picture above, I have hosted the Cosmos DB account in the East Asia region. Since it is not yet hosted in multiple regions, East Asia is both the read and the write location. Let's see how to host the database service in multiple regions and how to configure the read and write regions.
Go to the region configuration of your Cosmos DB account and you will see the 'Replicate data globally' screen. Click on a region to replicate your data into it; here I have replicated the data to the Central US and West US regions. In the write and read region configuration, I selected East Asia as the write region and Central US and West US as read regions, with Central US as the primary read region in this scenario. You can order the read regions as you want.
5 Data Consistency & Availability
What happens to data consistency and availability when you distribute your data across multiple regions? You can't achieve the highest level of both data availability and consistency at the same time.
Your application requirements should decide whether to provide a highly available system or to focus on data consistency. In a shopping cart application you may occasionally see duplicate items, or an item you added may be missing from the cart. That is not a great user experience, but it is acceptable. In a mission-critical application, however, your data should be highly consistent.
Once your data is replicated and the network between the replicas can become partitioned, you have to choose either consistency or availability. With a single database instance this is not a problem.
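To see why the trade-off appears only once data is replicated, consider this toy Python model of a writer and a remote replica. When the network between them is partitioned, a replica that favours consistency refuses to answer because it cannot confirm it has the latest value, while one that favours availability answers with possibly stale data, like the shopping cart above. The `Replica` class, the `replicate`/`read` helpers and the `prefer` flag are hypothetical and do not correspond to any Cosmos DB API.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: str


def replicate(source: Replica, target: Replica, partitioned: bool) -> None:
    """Copy the latest value to the remote replica unless the network is partitioned."""
    if not partitioned:
        target.value = source.value


def read(replica: Replica, partitioned: bool, prefer: str) -> str:
    """Read from the remote replica.

    prefer="consistency": while partitioned, the replica cannot confirm it is
    up to date, so it refuses to answer.
    prefer="availability": it always answers, possibly with stale data.
    """
    if partitioned and prefer == "consistency":
        raise TimeoutError("cannot confirm the latest value while partitioned")
    return replica.value


if __name__ == "__main__":
    writer, remote = Replica("cart: 1 item"), Replica("cart: 1 item")
    writer.value = "cart: 2 items"
    replicate(writer, remote, partitioned=True)  # the update does not reach the remote replica
    print(read(remote, partitioned=True, prefer="availability"))  # stale: cart: 1 item
    try:
        read(remote, partitioned=True, prefer="consistency")
    except TimeoutError as err:
        print("refused:", err)
```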
You have to decide whether to focus on consistency or availability, but it is not a binary choice: you can combine a certain amount of consistency with a certain amount of availability, as your requirements dictate. In Cosmos DB you can select the level of data consistency, depending on the operations you need to perform.
Cosmos DB offers a 99.99% availability SLA with five consistency levels, from strongest to weakest: strong, bounded staleness, session, consistent prefix and eventual. Under strong consistency the system is less available, response times are higher and it is harder to scale. Under eventual consistency, data is not immediately consistent and reads can appear out of order.
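The out-of-order reads under eventual consistency can be shown with a small Python simulation in which two replicas have applied different amounts of the same write log. A client that happens to read from the up-to-date replica and then from the lagging one sees the value move backwards, whereas a strongly consistent read always reflects the latest write. The replica names and the log are hypothetical, and the strong read is simplified to "return the last committed write", leaving out the coordination that makes it slower in a real system.

```python
# Hypothetical write log and how many of its entries each replica has applied.
WRITE_LOG = ["v1", "v2", "v3"]
REPLICA_PROGRESS = {"replica_a": 3, "replica_b": 1}


def read_eventual(replica: str) -> str:
    """Eventual: whichever replica answers returns its own latest applied write."""
    return WRITE_LOG[REPLICA_PROGRESS[replica] - 1]


def read_strong(replica: str) -> str:
    """Strong: every read reflects the latest committed write, whichever replica is asked.

    Simplified here to returning the last log entry; a real system needs extra
    coordination (and therefore latency) to guarantee this.
    """
    return WRITE_LOG[-1]


if __name__ == "__main__":
    order = ["replica_a", "replica_b"]
    print([read_eventual(r) for r in order])  # ['v3', 'v1']: the value appears to go backwards
    print([read_strong(r) for r in order])    # ['v3', 'v3']
```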
In Cosmos DB the default consistency level is session consistency, which is useful for user-centric applications. Within your own session you get a consistent view of your data, including your own writes, but other sessions, for example users in other regions, may not see the latest data as soon as you commit your changes.
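A simplified way to picture session consistency is a session token that records the most recent write sequence number a client has seen; a read is served only by a replica that has caught up to that token. The Python sketch below is a toy model under that assumption: Cosmos DB does use session tokens, but their format and the routing logic here are illustrative, not the SDK's. It shows that the writing session reads its own update, while another session may still read an older value from a lagging replica.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    applied_lsn: int = 0  # highest write sequence number this replica has applied
    data: dict = field(default_factory=dict)


@dataclass
class Session:
    token: int = 0  # highest sequence number this session has written or observed


def write(session: Session, primary: Replica, key: str, value: str) -> None:
    """Apply a write on the primary and advance the session token."""
    primary.applied_lsn += 1
    primary.data[key] = value
    session.token = max(session.token, primary.applied_lsn)


def session_read(session: Session, replicas: list, key: str):
    """Read from the first replica that is at least as fresh as the session token."""
    for replica in replicas:
        if replica.applied_lsn >= session.token:
            session.token = max(session.token, replica.applied_lsn)
            return replica.name, replica.data.get(key)
    raise RuntimeError("no replica has caught up with this session yet")


if __name__ == "__main__":
    primary, lagging = Replica("East Asia"), Replica("Central US")
    alice, bob = Session(), Session()
    write(alice, primary, "cart", "2 items")
    print(session_read(alice, [lagging, primary], "cart"))  # ('East Asia', '2 items')
    print(session_read(bob, [lagging, primary], "cart"))    # ('Central US', None)
```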
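You can also change the consistency level programmatically. A client or an individual request can override the account's default level, but only to relax it (for example, from session to eventual). If some reads need strongly consistent data, set a stronger default consistency level on the account instead.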
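As we understand the Cosmos DB documentation, a per-client or per-request override can only relax the account's default consistency level, not strengthen it. The Python sketch below encodes that rule with the five levels ordered from weakest to strongest; the `Consistency` enum and the `effective_level` function are hypothetical helpers, not part of any Azure SDK.

```python
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    """The five Cosmos DB consistency levels, ordered weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def effective_level(account_default: Consistency,
                    requested: Optional[Consistency] = None) -> Consistency:
    """Resolve the consistency level used for one request.

    A request may keep the account default or relax it; asking for a stronger
    level than the default is rejected in this sketch.
    """
    if requested is None:
        return account_default
    if requested > account_default:
        raise ValueError(
            f"cannot strengthen {account_default.name} to {requested.name} per request"
        )
    return requested


if __name__ == "__main__":
    print(effective_level(Consistency.SESSION, Consistency.EVENTUAL).name)  # EVENTUAL
```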
Go to 'Default consistency' in your Cosmos DB account to see the available consistency levels and select one of them.