Cosmos DB is a powerful and flexible database service that supports globally distributed applications with multiple consistency models, seamless scaling, and multi-region replication. However, as with any sophisticated cloud service, effective management at scale requires careful planning.

In large development teams, particularly those building microservices or extensible architectures, it’s important to establish clear scopes of management. Without this, maintaining governance, access control, and network security across independently deployable components can become overwhelming.

This article outlines an approach to managing Cosmos DB at scale using Terraform, ensuring that development teams can efficiently coordinate its use while maintaining separation of concerns.

Structuring the Cosmos DB Deployment

A common practice when setting up Cosmos DB for a microservices architecture is to provision a single Cosmos DB account at the shared infrastructure level. This account serves as the foundation for all databases and collections created by individual services. Since each microservice requires its own data store, separating concerns between shared infrastructure and service-level data management is essential.

Figure: How the Cosmos DB configuration spans deployments.

To maintain clarity and control, Terraform modules should be structured to align with different scopes of responsibility. In this example, three distinct Terraform modules manage different environments:

  1. Core-network environment — Handles network connectivity and security.
  2. Core-workload environment — Houses shared services, including the Cosmos DB account.
  3. Service-level environment — Manages individual microservices and their databases within Cosmos DB.

First is the core-network environment. This is where connectivity to the broader organization’s network lives. In Azure, this can be either a Virtual Network or an Azure Virtual WAN.

This approach lets us centralize configuration for the shared aspects of Cosmos DB, such as replication and network settings, while allowing each microservice to attach itself and set up its own database in which to store and retrieve its data.

Figure: Typical infrastructure “stack” with layers of shared infrastructure.

Depending on how our team or organization is structured, we might have different teams take ownership and responsibility for change management for a given deployment.

Managing Cosmos DB Infrastructure Across Teams

In a large organization, managing Cosmos DB infrastructure effectively requires a structured approach to Infrastructure-as-Code (IaC) that aligns with team responsibilities. The deployment model follows a layered module approach, ensuring separation of concerns between enterprise-wide infrastructure, shared workload resources, and individual services.

The Enterprise Platform Team manages the core-network module, which integrates with enterprise networking and security. The Product Platform Team is responsible for the core-workload module, which provisions and governs the Cosmos DB account. Finally, Service/Application Teams use service-specific modules to create and manage their own Cosmos DB databases and collections while adhering to the policies established by the platform teams.

Deploying the Core-Workload Environment

The core-workload environment is where the Cosmos DB account is provisioned. This environment ensures that all shared services, including the database, adhere to common governance policies.

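resource "azurerm_cosmosdb_account" "main" {

  # Account names must be globally unique, hence the random suffix.
  name                          = "cosmos-${var.application_name}-${var.environment_name}-${random_string.cosmosdb_suffix.result}"
  location                      = azurerm_resource_group.main.location
  resource_group_name           = azurerm_resource_group.main.name
  offer_type                    = "Standard"
  kind                          = "GlobalDocumentDB"
  # Key-based authentication is off, so data plane access goes through role assignments.
  local_authentication_disabled = true
  analytical_storage_enabled    = true
  # No public endpoint; the account is reached through private endpoints only.
  public_network_access_enabled = false

  consistency_policy {
    consistency_level = "Session"
  }

  # Single region at priority 0; additional geo_location blocks would add replica regions.
  geo_location {
    location          = azurerm_resource_group.main.location
    failover_priority = 0
  }

}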
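The account name above references `random_string.cosmosdb_suffix`, which is not shown. A minimal sketch of what that resource might look like, assuming the `hashicorp/random` provider is available in the module; the length of 8 is an arbitrary choice for illustration, not something the original configuration specifies.

```hcl
# Hypothetical definition of the suffix referenced in the account name.
# Cosmos DB account names allow only lowercase letters, digits, and hyphens,
# so uppercase and special characters are switched off.
resource "random_string" "cosmosdb_suffix" {
  length  = 8
  upper   = false
  special = false
}
```

Cosmos DB account names are limited to 44 characters, so long application or environment names may need to be shortened to leave room for the suffix.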

This module is responsible for critical configurations such as consistency level, failover priority, and network access settings. It also establishes connectivity with the broader organization’s network by configuring private DNS and private endpoints. The Cosmos DB account is where we configure cross-region replication and operator access. Access is granted through two types of role assignments: one at the control plane level (ARM) and one at the data plane level (the Cosmos DB account itself). Network connectivity is also managed here, by setting up Private Endpoints or a Network Security Perimeter.

We need to make sure that the private DNS zone is linked to the workload’s network. Remember that the workload has its own Virtual Network, which attaches to the broader organization through the Virtual WAN via a Virtual Hub connection.

data "azurerm_private_dns_zone" "cosmosdb" {
  name                = "privatelink.documents.azure.com"
  resource_group_name = var.virtual_hub.resource_group
}

resource "azurerm_private_dns_zone_virtual_network_link" "cosmosdb" {

  name                  = "${var.application_name}-${var.environment_name}-cosmosdb"
  resource_group_name   = var.virtual_hub.resource_group
  private_dns_zone_name = data.azurerm_private_dns_zone.cosmosdb.name
  virtual_network_id    = azurerm_virtual_network.main.id
  registration_enabled  = false

}
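Both blocks above read `var.virtual_hub.resource_group`, but the variable itself is not shown. A minimal sketch of a declaration that would satisfy those references, assuming the hub details are passed into the module as an object; the description text and the single-attribute shape are assumptions, and the real variable may well carry more fields.

```hcl
# Hypothetical input describing where the hub's shared resources live.
variable "virtual_hub" {
  description = "Details of the hub that hosts the shared private DNS zones."
  type = object({
    # Resource group that contains the privatelink.documents.azure.com zone.
    resource_group = string
  })
}
```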

Private endpoints further secure the Cosmos DB account, restricting access to internal networks.

resource "azurerm_private_endpoint" "cosmosdb" {

  name                = "pe-${var.application_name}-${var.environment_name}-cosmosdb"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  subnet_id           = azurerm_subnet.shared.id

  private_dns_zone_group {
    name                 = "cosmosdb"
    private_dns_zone_ids = [data.azurerm_private_dns_zone.cosmosdb.id]
  }

  private_service_connection {
    name                           = "pec-${var.application_name}-${var.environment_name}-cosmosdb"
    private_connection_resource_id = azurerm_cosmosdb_account.main.id
    is_manual_connection           = false
    subresource_names              = ["SQL"]
  }

}

Managing access is crucial to prevent unauthorized modifications. Administrator access to the Cosmos DB environment is managed via role assignments.

data "azurerm_cosmosdb_sql_role_definition" "writer" {
  resource_group_name = azurerm_cosmosdb_account.main.resource_group_name
  account_name        = azurerm_cosmosdb_account.main.name
  role_definition_id  = "00000000-0000-0000-0000-000000000002"
}

data "azuread_group" "dev_team" {
  object_id = var.admin_group_object_id
}

resource "azurerm_cosmosdb_sql_role_assignment" "admin" {
  resource_group_name = azurerm_resource_group.main.name
  account_name        = azurerm_cosmosdb_account.main.name
  role_definition_id  = data.azurerm_cosmosdb_sql_role_definition.writer.id
  principal_id        = data.azuread_group.dev_team.object_id
  scope               = azurerm_cosmosdb_account.main.id
}

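With this setup, only principals that hold a data plane role assignment can read or write data in the account. Creating or modifying the databases and containers themselves remains a control plane operation, governed by Azure RBAC.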
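The role assignment above covers the data plane. For the control plane (ARM) side of operator access mentioned earlier, a sketch using the generic `azurerm_role_assignment` resource might look like the following. The choice of the built-in "Cosmos DB Operator" role and the resource label `cosmosdb_operator` are assumptions for illustration; an organization may prefer a different built-in or custom role.

```hcl
# Control plane (ARM) access for the same admin group, scoped to the account.
# "Cosmos DB Operator" allows managing the account without access to its data or keys.
resource "azurerm_role_assignment" "cosmosdb_operator" {
  scope                = azurerm_cosmosdb_account.main.id
  role_definition_name = "Cosmos DB Operator"
  principal_id         = data.azuread_group.dev_team.object_id
}
```

Keeping the two assignments as separate resources makes it easy to grant data access without management rights, or the other way around.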

Service-Level Configuration

Each microservice should be able to create its own database and collections without interfering with others. Terraform’s module system enables this by structuring service-level configurations separately.

Each component or service has its own Terraform root module that provisions only the infrastructure that service needs. For a service that uses Cosmos DB, that includes a Cosmos DB database and a container for each distinct collection of data it stores. It also needs Cosmos DB role assignments: the data plane role assignments that are specific to Cosmos DB.

The database and containers attach to the Cosmos DB account itself through a data source.

data "azurerm_cosmosdb_account" "main" {
  name                = var.cosmosdb_account.name
  resource_group_name = var.cosmosdb_account.resource_group
}

resource "azurerm_cosmosdb_sql_database" "assessment" {
  name                = var.application_name
  resource_group_name = data.azurerm_cosmosdb_account.main.resource_group_name
  account_name        = data.azurerm_cosmosdb_account.main.name
  throughput          = 400
}

resource "azurerm_cosmosdb_sql_container" "tasks" {
  name                   = "tasks"
  resource_group_name    = data.azurerm_cosmosdb_account.main.resource_group_name
  account_name           = data.azurerm_cosmosdb_account.main.name
  database_name          = azurerm_cosmosdb_sql_database.assessment.name
  partition_key_paths    = ["/tenantId"]
  partition_key_version  = 1
  analytical_storage_ttl = -1
}
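The data source above depends on `var.cosmosdb_account`, which carries the account’s name and resource group from the core-workload environment into the service. One way that handoff could be wired is sketched below, assuming the core-workload module publishes an output and the service module declares a matching input. The file names in the comments and the output itself are hypothetical; how the value travels between the two deployments (pipeline variables, remote state, or something else) is left open.

```hcl
# core-workload module (hypothetical outputs.tf): publish the account coordinates.
output "cosmosdb_account" {
  value = {
    name           = azurerm_cosmosdb_account.main.name
    resource_group = azurerm_cosmosdb_account.main.resource_group_name
  }
}

# service-level root module (hypothetical variables.tf): accept the same shape.
variable "cosmosdb_account" {
  type = object({
    name           = string
    resource_group = string
  })
}
```

Passing only the name and resource group keeps the service module decoupled from the account’s internals; everything else is resolved through the data source.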

Similarly, role assignments at the data plane level grant individual services the necessary permissions to interact with their respective databases.

data "azurerm_cosmosdb_sql_role_definition" "writer" {
  resource_group_name = var.cosmosdb_account.resource_group
  account_name        = var.cosmosdb_account.name
  role_definition_id  = "00000000-0000-0000-0000-000000000002"
}

resource "azurerm_cosmosdb_sql_role_assignment" "dev_team_writer" {

  resource_group_name = data.azurerm_cosmosdb_sql_role_definition.writer.resource_group_name
  account_name        = data.azurerm_cosmosdb_sql_role_definition.writer.account_name
  role_definition_id  = data.azurerm_cosmosdb_sql_role_definition.writer.id
  scope               = data.azurerm_cosmosdb_account.main.id
  principal_id        = azurerm_user_assigned_identity.function.principal_id

}
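The assignment above is scoped to the whole account. If a service should only reach its own database, the scope can be narrowed by appending `/dbs/<database name>` to the account ID. The following is a sketch of that alternative, not part of the original configuration; the resource label `service_writer_db` is hypothetical, and it would replace the account-wide assignment rather than sit alongside it.

```hcl
# Alternative to the account-wide assignment: same role and identity,
# but scoped to this service's database only.
resource "azurerm_cosmosdb_sql_role_assignment" "service_writer_db" {
  resource_group_name = data.azurerm_cosmosdb_account.main.resource_group_name
  account_name        = data.azurerm_cosmosdb_account.main.name
  role_definition_id  = data.azurerm_cosmosdb_sql_role_definition.writer.id
  principal_id        = azurerm_user_assigned_identity.function.principal_id
  scope               = "${data.azurerm_cosmosdb_account.main.id}/dbs/${azurerm_cosmosdb_sql_database.assessment.name}"
}
```

Scoping per database means one service’s identity cannot read or write another service’s data, even though they share an account.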

Conclusion

Cosmos DB is an incredibly flexible and scalable database service, but managing it effectively across a large team requires thoughtful structuring. By breaking down Terraform configurations into separate modules for enterprise platform infrastructure, shared workload infrastructure, and individual services, teams can establish clear boundaries and responsibilities.

This approach ensures that the underlying infrastructure remains secure and maintainable while allowing development teams to operate independently. With a structured approach, organizations can harness the full potential of Infrastructure-as-Code to manage powerful cloud-native services like Cosmos DB without sacrificing governance and control.