Rookie Mistake: Misusing Data Sources for Intra-Module References in Terraform
In a recent code review, I came across a subtle but critical mistake involving the provisioning of a new Azure Data Explorer (Kusto) database using Terraform. A developer was attempting to create a new Kusto database and, rather than directly referencing the Kusto cluster resource that was already being provisioned within the same root module, they opted to use a data source to “look it up.” On the surface, this might seem harmless, or even stylistically consistent, but this misstep introduces hidden complexities and potential failure points.
The Setup
Here’s what the developer did:
data "azurerm_kusto_cluster" "main" {
  name                = azurerm_kusto_cluster.main.name
  resource_group_name = azurerm_resource_group.main.name
}

resource "azurerm_kusto_database" "operations" {
  name                = var.appinsights_logs_kusto_database_name
  resource_group_name = data.azurerm_kusto_cluster.main.resource_group_name
  location            = data.azurerm_kusto_cluster.main.location
  cluster_name        = data.azurerm_kusto_cluster.main.name
  hot_cache_period    = "P7D" # Adjust as needed
  # soft_delete_period = "P31D" # Adjust as needed

  lifecycle {
    prevent_destroy = true # Prevent accidental deletion
  }
}
This would be a logical approach if the Kusto cluster were managed independently or provisioned externally. But in this case, the same root module was already provisioning the Kusto cluster, and the Azure Resource Group it lives in too!
resource "azurerm_kusto_cluster" "main" {
  name                = "${var.application_name}-${var.environment_name}"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  sku {
    name     = "Standard_E16d_v5"
    capacity = 2
  }

  identity {
    type = "UserAssigned"
    identity_ids = [
      azurerm_user_assigned_identity.kusto.id
    ]
  }
}
What is the problem? It turns what should be a direct dependency into an indirect one. The worst part is the mixed signals this sends to Terraform.
Terraform now thinks that the data source is a totally different object from the resource, even though both describe the same cluster. Remember, the below expression is NOT true:
data.azurerm_kusto_cluster.main != azurerm_kusto_cluster.main
What Went Wrong?
The core issue lies in how Terraform interprets data sources versus resources. Data sources are intended for referencing already existing infrastructure, that is, resources that are not being managed in the current Terraform execution context. Using a data block to reference the azurerm_kusto_cluster tells Terraform that this cluster exists independently of the current execution plan.
This creates two main problems:
- False Independence and Misleading Dependency Graph: Terraform now treats data.azurerm_kusto_cluster.main as a separate entity from azurerm_kusto_cluster.main, despite both pointing to the same conceptual object. This breaks Terraform’s internal understanding of resource relationships, leading to a flawed dependency graph and incorrect apply order.
- Deployment Failures on First Run: On the initial deployment, Terraform will try to read the Kusto cluster via the data source before it’s created. This will almost certainly fail with an error because the cluster doesn’t yet exist, or is mid-provisioning. Even in the best-case scenario, this introduces unnecessary risk and complexity in the deployment pipeline.
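For contrast, here is a minimal sketch of the case where a data source is the right tool: the cluster is created and owned somewhere else, and this root module only reads it. The two existing_* variables are hypothetical names introduced for illustration, and the sketch assumes the azurerm provider is already configured.

```hcl
# Hypothetical inputs: the cluster is assumed to be managed OUTSIDE this root module.
variable "existing_kusto_cluster_name" {
  type = string
}

variable "existing_kusto_resource_group_name" {
  type = string
}

variable "appinsights_logs_kusto_database_name" {
  type = string
}

# Nothing in this configuration creates the cluster, so looking it up is appropriate.
data "azurerm_kusto_cluster" "shared" {
  name                = var.existing_kusto_cluster_name
  resource_group_name = var.existing_kusto_resource_group_name
}

resource "azurerm_kusto_database" "operations" {
  name                = var.appinsights_logs_kusto_database_name
  resource_group_name = data.azurerm_kusto_cluster.shared.resource_group_name
  location            = data.azurerm_kusto_cluster.shared.location
  cluster_name        = data.azurerm_kusto_cluster.shared.name
  hot_cache_period    = "P7D"
}
```

The difference from the setup above is that no resource block for the cluster exists in this configuration, so there is only one address for Terraform to reason about.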
The Correct Approach
When a resource is defined in the same root module, it should always be referenced directly. In this case, the correct implementation should simply use:
resource "azurerm_kusto_database" "operations" {
  name                = var.appinsights_logs_kusto_database_name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  cluster_name        = azurerm_kusto_cluster.main.name
  hot_cache_period    = "P7D"

  lifecycle {
    prevent_destroy = true
  }
}
This makes the dependency explicit and guarantees Terraform can construct an accurate and efficient execution plan. It allows Terraform to understand that the database resource depends on the cluster resource, ensuring proper creation order and eliminating ambiguous or misleading indirection.
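As a variation, and purely as a sketch, all three values can be read from the cluster resource itself so that the database visibly follows wherever the cluster is placed. This fragment assumes the azurerm_kusto_cluster.main resource and the variable shown earlier are defined in the same root module; it is one possible style, not a required one.

```hcl
resource "azurerm_kusto_database" "operations" {
  name = var.appinsights_logs_kusto_database_name

  # Each reference to azurerm_kusto_cluster.main is an implicit dependency:
  # Terraform orders the cluster before the database without any depends_on.
  resource_group_name = azurerm_kusto_cluster.main.resource_group_name
  location            = azurerm_kusto_cluster.main.location
  cluster_name        = azurerm_kusto_cluster.main.name

  hot_cache_period = "P7D"

  lifecycle {
    prevent_destroy = true
  }
}
```

Either form gives Terraform a single node for the cluster; the choice between reading the location from the resource group or from the cluster is a matter of taste.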
Conclusion
Avoid using data sources to reference resources managed within the same Terraform root module. This requires you, as a developer, to BE AWARE of what your root module is provisioning! This anti-pattern introduces false dependencies, leads to fragile first-run behaviors, and distorts Terraform’s execution plan. The right way is to use direct references to resources defined in the same context. Doing so aligns with Terraform’s design principles and leads to more reliable, predictable infrastructure as code.