Provisioning Azure Data Explorer (ADX), commonly referred to as Kusto, can feel deceptively straightforward — until it isn’t. The deployment process is often tripped up by one particularly thorny constraint: the cluster’s enforced cool-down period. After initiating certain operations, the cluster enters a temporary maintenance state during which no additional administrative actions are allowed. Any attempt to act prematurely results in cryptic error messages like:

Failed: ServiceIsInMaintenance — [Conflict] Cluster 'foo-test' is in process of maintenance for a short period. You may retry to invoke the operation in a few minutes.

When using Terraform, this creates a frustrating pattern. If your resource provisioning steps aren’t strictly sequential, you’re essentially whacking away at a provisioning piñata — sometimes a resource is created, other times it fails due to the cluster still being locked down. The only way forward is to re-run terraform apply until everything finally sticks. It’s not sustainable.

The Sequential Dependency Strategy

To work around this, we need to introduce explicit control over execution order in Terraform using the depends_on attribute. For example, in one common automation pattern, we create a Kusto database and set up data ingestion from Cosmos DB, but that can’t happen until all required tables and mappings are in place. This dependency chain becomes crucial for stable provisioning. Currently, I avoid using the ADX Terraform Provider because it lacks Managed Identity support, which is a must-have in today’s security-conscious environments. Instead, I use the azurerm_kusto_script resource. It’s not an ideal experience, since Terraform has no insight into what the script actually does, reminiscent of the old frustrations with azurerm_template_deployment for unsupported ARM resources. But it’s what we have until the official provider supports Managed Identity.

Building the Infrastructure

First, I reference my Kusto cluster. Yes, it is a shared cluster: these things are what I consider “heavy metal,” so they are very often shared across workloads.

data "azurerm_kusto_cluster" "main" {
  name                = var.kusto_cluster.name
  resource_group_name = var.kusto_cluster.resource_group
}
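
For reference, here is a minimal sketch of the input variables these snippets assume. The names come straight from the configuration, but the exact types and descriptions are my assumptions:

variable "kusto_cluster" {
  description = "Name and resource group of the shared ADX cluster"
  type = object({
    name           = string
    resource_group = string
  })
}

variable "application_name" {
  description = "Workload name, used for the database and script names"
  type        = string
}

variable "environment_name" {
  description = "Environment suffix used in resource names, e.g. dev or prod"
  type        = string
}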

Then, I provision a dedicated database for my workload:

resource "azurerm_kusto_database" "foo" {
  name                = var.application_name
  resource_group_name = data.azurerm_kusto_cluster.main.resource_group_name
  location            = data.azurerm_kusto_cluster.main.location
  cluster_name        = data.azurerm_kusto_cluster.main.name
  hot_cache_period    = "P7D"
  soft_delete_period  = "P31D"

  # prevent the possibility of accidental data loss
  lifecycle {
    prevent_destroy = true
  }
}

The database inherently depends on the cluster, so Terraform handles that link automatically.

Next, I use a script to create the required table and JSON ingestion mapping:

resource "azurerm_kusto_script" "foo_table" {
  name           = "${var.application_name}-${var.environment_name}-foo-table-script"
  database_id    = azurerm_kusto_database.foo.id
  script_content = file("${path.module}/files/foo_table.kql")
}

Again, the ordering takes care of itself: the script references the database through database_id, so Terraform won’t run it until the database exists. It simply executes the KQL in my foo_table.kql file:

.create-merge table Foo (id:string, createdts:datetime, lastupdatedts:datetime, tenantId:string, status: string, source:string, rawdata:string) with (folder = "", docstring = "")  
 
.create table Foo ingestion json mapping "FooJsonMapping" '[{"Column": "id", "Properties": {"Path": "$.id"}},{"Column": "createdts", "Properties": {"Path": "$.createdOn"}},{"Column": "lastupdatedts", "Properties": {"Path": "$.lastUpdatedOn"}},{"Column": "tenantId", "Properties": {"Path": "$.tenantId"}},{"Column": "status", "Properties": {"Path": "$.status"}},{"Column": "source", "Properties": {"Path": "$.source"}},{"Column": "rawdata", "Properties": {"Path": "$"}}]'

My script actually creates two things: a table called Foo and a JSON ingestion mapping called FooJsonMapping. The Cosmos DB ingestion resource needs both of them to exist before it can work.

resource "azurerm_kusto_cosmosdb_data_connection" "foo" {
  name                  = "kusto-cosmos-ingestion-foo"
  location              = azurerm_resource_group.main.location
  cosmosdb_container_id = azurerm_cosmosdb_sql_container.tasks.id
  kusto_database_id     = azurerm_kusto_database.foo.id
  managed_identity_id   = data.azurerm_user_assigned_identity.kusto_identity.id
  table_name            = "Foo"
  mapping_rule_name     = "FooJsonMapping"
  retrieval_start_date  = "2025-02-01T00:00:00Z"

  depends_on = [azurerm_kusto_script.foo_table]
}

When I create the Cosmos DB Data Connection, notice that I need to declare an explicit dependency on the script that provisions the table Foo and the mapping rule FooJsonMapping.

Chaining Multiple Table Actions with Explicit Dependencies

Let’s explore what this looks like in practice when dealing with multiple tables and their corresponding ingestion configurations. Suppose you have four tables: Foo, Bar, Fizz, and Buzz. Each table must be provisioned with a Kusto script that creates the table itself and its JSON ingestion mapping. Only after both the table and mapping exist can we provision the Cosmos DB data connection.

Now, to provision the Bar table, you must explicitly wait until the Foo Cosmos DB ingestion connection has completed. Why? Because that ensures not just the script but the ingestion configuration itself has settled, which means the cluster is likely out of any transient maintenance state:

resource "azurerm_kusto_script" "bar_table" {
  name           = "${var.application_name}-${var.environment_name}-bar-table-script"
  database_id    = azurerm_kusto_database.foo.id
  script_content = file("${path.module}/files/bar_table.kql")

  depends_on = [azurerm_kusto_cosmosdb_data_connection.foo]
}

Then, define the Cosmos DB data connection for Bar, again pointing its dependency back to the bar_table script:

resource "azurerm_kusto_cosmosdb_data_connection" "bar" {
  name                  = "kusto-cosmos-ingestion-bar"
  location              = azurerm_resource_group.main.location
  cosmosdb_container_id = azurerm_cosmosdb_sql_container.bar.id
  kusto_database_id     = azurerm_kusto_database.foo.id
  managed_identity_id   = data.azurerm_user_assigned_identity.kusto_identity.id
  table_name            = "Bar"
  mapping_rule_name     = "BarJsonMapping"
  retrieval_start_date  = "2025-02-01T00:00:00Z"

  depends_on = [azurerm_kusto_script.bar_table]
}

This chaining pattern repeats for each subsequent table: the Fizz table script must depend on the Bar ingestion connection, and the Fizz Cosmos DB connection in turn depends on the Fizz script. Repeat this one more time for Buzz, referencing the Fizz ingestion connection in buzz_table’s depends_on.
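
To make that concrete, here is a sketch of the Fizz link in the chain. The fizz container reference and the fizz_table.kql file are assumptions that simply mirror the Foo and Bar examples above:

resource "azurerm_kusto_script" "fizz_table" {
  name           = "${var.application_name}-${var.environment_name}-fizz-table-script"
  database_id    = azurerm_kusto_database.foo.id
  script_content = file("${path.module}/files/fizz_table.kql") # assumed to create Fizz and FizzJsonMapping

  # wait for the Bar ingestion connection so the cluster has settled
  depends_on = [azurerm_kusto_cosmosdb_data_connection.bar]
}

resource "azurerm_kusto_cosmosdb_data_connection" "fizz" {
  name                  = "kusto-cosmos-ingestion-fizz"
  location              = azurerm_resource_group.main.location
  cosmosdb_container_id = azurerm_cosmosdb_sql_container.fizz.id # assumed container
  kusto_database_id     = azurerm_kusto_database.foo.id
  managed_identity_id   = data.azurerm_user_assigned_identity.kusto_identity.id
  table_name            = "Fizz"
  mapping_rule_name     = "FizzJsonMapping"
  retrieval_start_date  = "2025-02-01T00:00:00Z"

  depends_on = [azurerm_kusto_script.fizz_table]
}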

Why This Pattern Works

The key benefit of this cascading dependency chain is that it enforces temporal isolation between operations that can otherwise trip over each other due to Kusto’s opaque maintenance state. Each table and ingestion configuration has a guaranteed window in which the cluster is not being modified, minimizing the risk of conflict errors.

This also mimics a step-by-step deployment flow you’d manually execute: provision a table, wait for the system to settle, apply the ingestion config, and only then move on to the next. Terraform doesn’t know that the Kusto cluster needs this breathing room — but with depends_on, you give it the guidance it needs.

Conclusion

This method may seem verbose, and even a bit imperative, but it’s the most deterministic way to avoid retry storms when provisioning Azure Data Explorer resources via Terraform (or any tool, really). Perhaps some retry intelligence could be built into the azurerm resources I am using here, or maybe a custom provider for managing the ADX data plane would be the longer-term solution.

However, as long as the ADX Terraform Provider lacks Managed Identity support and script result visibility, using azurerm_kusto_script resources and chaining downstream resources with depends_on is your best safeguard against transient failures caused by the administrative freeze that takes place on ADX clusters.