Using Data Sources
Data sources provide read-only access to information from your infrastructure or external systems. This guide covers how to use data sources effectively in your Terraform configurations.
🤖 AI-Generated Content
This documentation was generated with AI assistance and is still being audited. Some, or potentially a lot, of this information may be inaccurate.
What are Data Sources?
Data sources query existing infrastructure without managing it. They're used to:
- Fetch configuration from existing resources
- Query external APIs for information
- Retrieve metadata for use in resource configuration
- Look up values dynamically, during planning (or during apply, if their arguments depend on values not yet known)
Basic Usage
Declare a `data` block, then reference its attributes as `data.<TYPE>.<NAME>.<ATTRIBUTE>`:
```hcl
data "mycloud_image" "ubuntu" {
  name   = "ubuntu-22.04"
  region = "us-west-2"
}

resource "mycloud_server" "web" {
  name     = "web-server"
  image_id = data.mycloud_image.ubuntu.id # Use data source output
  size     = "small"
}
```
Common Patterns
Lookup by Filter
Find resources matching specific criteria:
```hcl
data "mycloud_images" "available" {
  name_filter = "ubuntu-*"
  region      = "us-west-2"
  active_only = true
}

output "image_count" {
  value = length(data.mycloud_images.available.ids)
}
```
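A filtered lookup can return zero matches, and indexing an empty list with `[0]` fails. The following minimal sketch reuses the `data.mycloud_images.available` lookup above and falls back to `null` using Terraform's built-in `try()` function. The local and output names are illustrative.

```hcl
locals {
  # First matching image ID, or null when the filter matched nothing
  first_image_id = try(data.mycloud_images.available.ids[0], null)
}

output "first_image_id" {
  value = local.first_image_id
}
```

The "first" match is only meaningful if the provider returns `ids` in a stable, documented order.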
Query Configuration
Fetch settings from external systems:
```hcl
data "mycloud_config" "settings" {
  environment = var.environment
}

resource "mycloud_server" "app" {
  name    = "app-server"
  timeout = data.mycloud_config.settings.default_timeout
}
```
Retrieve Metadata
Data sources can also return information about the provider or account.
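The following is a sketch only. It assumes the provider offers a hypothetical `mycloud_account` data source that takes no arguments and exposes `id` and `region` attributes; none of these names are defined elsewhere in this guide.

```hcl
# Hypothetical data source: details of the account the provider is configured for
data "mycloud_account" "current" {}

resource "mycloud_network" "app" {
  # Include the (hypothetical) account region in the network name
  name = "app-network-${data.mycloud_account.current.region}"
  cidr = "10.0.0.0/16"
}

output "account_id" {
  value = data.mycloud_account.current.id
}
```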
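Data Source Dependencies
Data Sources in Resources
Resources can depend on data source outputs. Referencing a data source attribute creates an implicit dependency, so Terraform reads the data source before it creates or updates the resource:
```hcl
data "mycloud_network" "main" {
  name = "main-network"
}

resource "mycloud_server" "web" {
  network_id = data.mycloud_network.main.id # Implicit dependency
}
```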
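Data Sources Referencing Resources
Data sources can also take resource outputs as arguments. Sometimes such an argument is not known until apply, for example the ID of a network that has not been created yet. In that case Terraform defers reading the data source until the apply phase:
```hcl
resource "mycloud_network" "app" {
  name = "app-network"
  cidr = "10.0.0.0/16"
}

data "mycloud_network_details" "app" {
  network_id = mycloud_network.app.id # Unknown until the network exists, so read during apply
}
```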
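A data source may depend on a resource that none of its arguments reference. In that case, `depends_on` makes the ordering explicit.

This sketch assumes a hypothetical `mycloud_network_policy` resource (with a hypothetical `allow_icmp` argument) whose effect shows up in the network details. Terraform defers reading a data source that uses `depends_on` until apply whenever one of those dependencies has pending changes. Its values then appear as "known after apply" in the plan.

```hcl
# Hypothetical resource: attaches a policy to the network created above
resource "mycloud_network_policy" "app" {
  network_id = mycloud_network.app.id
  allow_icmp = true # hypothetical argument
}

data "mycloud_network_details" "app_with_policy" {
  network_id = mycloud_network.app.id

  # The details should reflect the policy, which no argument above references
  depends_on = [mycloud_network_policy.app]
}
```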
Real-World Examples
Environment-Specific Configuration
```hcl
variable "environment" {
  type = string
}

data "mycloud_config" "env" {
  environment = var.environment
}

resource "mycloud_server" "app" {
  name     = "${var.environment}-app"
  size     = data.mycloud_config.env.instance_size
  replicas = data.mycloud_config.env.replica_count
}
```
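Because the lookup is keyed on `var.environment`, a typo only surfaces later as a failed or surprising lookup. This sketch rejects unexpected values up front with a `validation` block. The list of environment names is an assumption.

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment used to look up configuration"

  validation {
    # Assumed set of environments; adjust to match what the config service knows about
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}
```

This block replaces the plain `variable "environment"` declaration in the example above. Terraform does not allow declaring the same variable twice in one module.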
Service Discovery
```hcl
data "mycloud_services" "databases" {
  type   = "database"
  status = "running"
}

resource "mycloud_server" "app" {
  name        = "app-server"
  environment = {
    DB_HOST = data.mycloud_services.databases.endpoints[0]
  }
}
```
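If no running database is found, `endpoints[0]` fails with a generic index error. This sketch adds a `lifecycle` `precondition` (Terraform 1.2 or later), which Terraform checks before evaluating the resource, so it reports a clearer message first. It reuses the attribute names from the example above.

```hcl
resource "mycloud_server" "app" {
  name        = "app-server"
  environment = {
    DB_HOST = data.mycloud_services.databases.endpoints[0]
  }

  lifecycle {
    # Stop with a readable message when discovery returned no endpoints
    precondition {
      condition     = length(data.mycloud_services.databases.endpoints) > 0
      error_message = "No running database service was found."
    }
  }
}
```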
Certificate Management
```hcl
data "mycloud_certificate" "tls" {
  domain = "example.com"
  latest = true
}

resource "mycloud_load_balancer" "web" {
  name           = "web-lb"
  certificate_id = data.mycloud_certificate.tls.id
}
```
Performance Considerations
Caching
Terraform reads each data source at most once per plan or apply run, and every reference shares that result. Nothing is cached between runs: the next plan reads the data source again.
```hcl
# Read once per run, not once per reference
data "mycloud_images" "all" {
  region = "us-west-2"
}

# Both servers reuse the same result
resource "mycloud_server" "web1" {
  image_id = data.mycloud_images.all.ids[0]
}

resource "mycloud_server" "web2" {
  image_id = data.mycloud_images.all.ids[0]
}
```
Filtering
Pass filters to the data source so the provider returns only matching records, instead of fetching everything and filtering in Terraform:
```hcl
# Good: Specific filter
data "mycloud_images" "ubuntu" {
  name_filter = "ubuntu-22.04-*"
  region      = "us-west-2"
}

# Avoid: Fetching all data then filtering in Terraform
data "mycloud_images" "all" {
  region = "us-west-2"
}

locals {
  ubuntu_images = [
    for img in data.mycloud_images.all.images :
    img if can(regex("ubuntu-22.04", img.name))
  ]
}
```
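Error Handling
Missing Data
A lookup for a single object that finds no match typically fails the plan:
```hcl
data "mycloud_image" "ubuntu" {
  name = "ubuntu-22.04"
}

# If no image matches, the provider returns an error and the plan fails.
# The provider should make that error message say clearly what was not found.
```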
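Sometimes a missing image should produce a warning rather than block the run. Terraform 1.5 and later support `check` blocks for this. A data source declared inside a `check` block is scoped to it, and a failed lookup or assertion is reported as a warning.

This sketch repeats the image lookup under a separate, illustrative name.

```hcl
check "ubuntu_image_available" {
  # Scoped data source: only visible inside this check block
  data "mycloud_image" "ubuntu_check" {
    name = "ubuntu-22.04"
  }

  assert {
    condition     = data.mycloud_image.ubuntu_check.id != ""
    error_message = "Image ubuntu-22.04 was not found."
  }
}
```

Resources cannot reference a scoped data source. Keep the regular `data` block for anything that consumes the image ID; that block still stops the run if the lookup fails.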
Optional Data
Use `count` to make a data source optional. When `count` is 0, no lookup is performed:
```hcl
data "mycloud_certificate" "tls" {
  count  = var.enable_tls ? 1 : 0
  domain = "example.com"
}

resource "mycloud_load_balancer" "web" {
  certificate_id = var.enable_tls ? data.mycloud_certificate.tls[0].id : null
}
```
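Terraform 0.15 and later provide the built-in `one()` function, which expresses the same conditional more compactly. Applied to a splat over a list with zero or one elements, it returns that element's value or `null`.

```hcl
resource "mycloud_load_balancer" "web" {
  # null when enable_tls is false (count = 0), otherwise the certificate ID
  certificate_id = one(data.mycloud_certificate.tls[*].id)
}
```

This avoids repeating `var.enable_tls`, so the resource cannot drift out of sync with the data source's `count` expression.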
Best Practices
1. Use Specific Filters
Query only what you need:
```hcl
# Good: exact lookup with the single-image data source
data "mycloud_image" "ubuntu" {
  name   = "ubuntu-22.04-amd64"
  region = "us-west-2"
}

# Avoid: unfiltered query that returns every image
data "mycloud_images" "all" {}
```
2. Minimize Data Source Calls
Reuse data sources across multiple resources:
```hcl
data "mycloud_network" "main" {
  name = "main-network"
}

resource "mycloud_server" "web1" {
  network_id = data.mycloud_network.main.id
}

resource "mycloud_server" "web2" {
  network_id = data.mycloud_network.main.id # Reuses the same lookup result
}
```
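The same idea applies across modules. Instead of repeating the lookup inside each child module, read it once in the root module and pass the result in as an input variable.

This is a sketch; the `./modules/web` and `./modules/worker` paths and their `network_id` variable are hypothetical.

```hcl
data "mycloud_network" "main" {
  name = "main-network"
}

# Hypothetical child modules that each declare a `network_id` input variable
module "web" {
  source     = "./modules/web"
  network_id = data.mycloud_network.main.id
}

module "worker" {
  source     = "./modules/worker"
  network_id = data.mycloud_network.main.id
}
```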
3. Document Data Sources
Terraform `data` blocks have no `description` argument, so use comments to explain what a lookup returns and why it is needed:
```hcl
# Fetches environment-specific configuration, including:
# - Instance sizes
# - Replica counts
# - Timeout values
data "mycloud_config" "app" {
  environment = var.environment
}
```
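4. Handle Missing Data Gracefully
Validate data source results where they are read. On Terraform 1.2 or later, a `postcondition` in the data source's `lifecycle` block fails the run with a clear message as soon as the data source is read, including during plan. A `null_resource` with a `local-exec` provisioner would only run during apply. It would also fail at plan time if its `count` depended on a value that is not yet known.
```hcl
data "mycloud_network" "app" {
  name = var.network_name

  lifecycle {
    # Fail with a readable message if the lookup returned no network
    postcondition {
      condition     = self.id != ""
      error_message = "Network ${var.network_name} not found."
    }
  }
}
```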
Common Use Cases
Configuration Management
Fetch external configuration:
```hcl
data "mycloud_config" "app" {
  application = "web-app"
  environment = var.environment
}

resource "mycloud_server" "app" {
  name        = "app-server"
  environment = merge(
    data.mycloud_config.app.variables,
    var.additional_env_vars
  )
}
```
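`merge()` gives later arguments precedence. Any key in `var.additional_env_vars` therefore overrides the same key from the fetched configuration. Here is a self-contained illustration with literal maps; the keys and values are made up.

```hcl
locals {
  fetched   = { LOG_LEVEL = "info", REGION = "us-west-2" }
  overrides = { LOG_LEVEL = "debug" }

  # Later maps win on key conflicts: { LOG_LEVEL = "debug", REGION = "us-west-2" }
  effective = merge(local.fetched, local.overrides)
}
```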
Resource Discovery
Find existing infrastructure:
```hcl
data "mycloud_vpc" "main" {
  default = true
}

data "mycloud_subnets" "public" {
  vpc_id = data.mycloud_vpc.main.id
  public = true
}

resource "mycloud_server" "web" {
  subnet_id = data.mycloud_subnets.public.ids[0]
}
```
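The `ids[0]` example above places the server in a single subnet. This sketch instead places one server in each public subnet using `for_each`, assuming `ids` is a list of strings.

```hcl
resource "mycloud_server" "web" {
  # One server per public subnet, keyed by subnet ID
  for_each  = toset(data.mycloud_subnets.public.ids)
  name      = "web-${each.key}"
  subnet_id = each.key
}
```

`for_each` keys must be known at plan time. That holds here because neither lookup depends on resources that are still to be created.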
Dynamic Values
Compute values at plan time:
```hcl
data "mycloud_availability_zones" "available" {
  region = var.region
}

resource "mycloud_server" "distributed" {
  count = length(data.mycloud_availability_zones.available.names)
  name  = "server-${count.index}"
  zone  = data.mycloud_availability_zones.available.names[count.index]
}
```
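With `count`, removing a zone from the middle of the list shifts every later index. Terraform may then update or replace those servers. Keying instances by zone name with `for_each` ties each server to its zone instead.

```hcl
resource "mycloud_server" "distributed" {
  # One server per zone, addressed as mycloud_server.distributed["<zone name>"]
  for_each = toset(data.mycloud_availability_zones.available.names)
  name     = "server-${each.key}"
  zone     = each.key
}
```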
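Comparison: Data Sources vs Resources
| Aspect | Data Sources | Resources |
|---|---|---|
| Purpose | Read-only queries | Create, update, and delete infrastructure |
| State | Last result recorded in state, but never managed | Tracked and managed in state |
| Lifecycle | Read during plan (or during apply, if arguments are unknown at plan time) | Full CRUD lifecycle |
| Updates | Re-read on every run | Changed only when configuration or detected drift requires it |
| Dependencies | Can depend on resources and other data sources | Can depend on data sources and other resources |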
Examples from pyvider-components
The pyvider-components repository includes many data source examples:
- `env_variables` - Read environment variables
- `file_info` - Get file metadata
- `http_api` - Query HTTP endpoints
- `lens_jq` - Transform JSON with jq
See Also
- Creating Data Sources - How to implement data sources
- Managing Resources - Resource lifecycle management
- Best Practices - Provider development patterns
- Pyvider Components - Working examples