Resources vs Data Sources

Understanding the fundamental difference between resources and data sources in Terraform providers.

🤖 AI-Generated Content

This documentation was generated with AI assistance and is still being audited. Some, or potentially a lot, of this information may be inaccurate.

Conceptual Difference

Resources: Infrastructure Management

Resources represent managed infrastructure: things your Terraform configuration creates, updates, and destroys.

Think of resources as:

- Active management
- Write operations
- State tracking
- Lifecycle management

Examples:

- Creating a server
- Managing a database
- Configuring a firewall rule
- Creating a DNS record

User intent: "Make this exist and keep it that way"

Data Sources: Information Retrieval

Data sources are read-only queries: they fetch information without creating or modifying anything.

Think of data sources as:

- Passive querying
- Read operations
- No managed state (re-read on every run)
- Information lookup

Examples:

- Querying available server images
- Looking up an existing VPC
- Fetching the current IP address
- Reading file contents

User intent: "Tell me what exists"

Technical Differences

Aspect	Resource	Data Source
Purpose	Manage infrastructure	Query information
Operations	Create, Read, Update, Delete	Read only
State	Tracked and compared on every plan	Re-read every run (result saved in state, never diffed)
Lifecycle	Full CRUD lifecycle	Single read operation
Changes	Can modify remote systems	Never modifies anything
Methods	`read()`, `_create_apply()`, `_update_apply()`, `_delete_apply()`	`read()` only
Schema	Inputs + computed outputs	Inputs + computed outputs
Context	ResourceContext with state	Just config parameter

When to Use Each

Use a Resource When:

✅ You need to create something

resource "mycloud_server" "web" {
  name = "web-server"
  size = "large"
}

✅ You need to manage lifecycle (create, update, delete)

resource "mycloud_database" "main" {
  name = "production-db"
  size = 100  # Can be updated
}

✅ Terraform should track state and detect drift

resource "mycloud_file" "config" {
  path    = "/etc/app/config.yaml"
  content = templatefile("config.yaml.tpl", {})
}
# Terraform will recreate if deleted outside Terraform
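A minimal plain-Python sketch of why deletion outside Terraform leads to recreation: if refreshing the resource (the read() step) reports that the object no longer exists, there is nothing left to compare, so the plan proposes a create. The `FakeCloud` backend, `refresh`, and `plan_action` are hypothetical illustrations, not pyvider or Terraform APIs.

```python
from typing import Optional


class FakeCloud:
    """Hypothetical in-memory backend standing in for a real cloud API."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def get(self, path: str) -> Optional[str]:
        return self.files.get(path)


def refresh(cloud: FakeCloud, path: str) -> Optional[dict]:
    """Analogue of a resource read(): None means the object is gone remotely."""
    content = cloud.get(path)
    if content is None:
        return None
    return {"path": path, "content": content}


def plan_action(refreshed: Optional[dict], desired: dict) -> str:
    if refreshed is None:
        return "create"  # missing remotely, so plan to create it again
    if refreshed != desired:
        return "update"
    return "no-op"


cloud = FakeCloud()
desired = {"path": "/etc/app/config.yaml", "content": "port: 8080\n"}
cloud.files[desired["path"]] = desired["content"]  # applied earlier
print(plan_action(refresh(cloud, desired["path"]), desired))  # no-op

del cloud.files[desired["path"]]  # deleted outside Terraform
print(plan_action(refresh(cloud, desired["path"]), desired))  # create
```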

✅ You need update-in-place behavior

resource "mycloud_server" "app" {
  name = "app-server"
  tags = var.tags  # Update tags without recreating the server
}
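Whether a change can be applied in place is decided per attribute. The sketch below assumes a hypothetical rule in which changing `zone` forces replacement while other attributes such as `tags` can be updated in place; real providers declare this per attribute in their schema, and the names here are illustrative only.

```python
# Hypothetical: attributes whose change forces destroy-and-recreate.
REQUIRES_REPLACE = {"zone"}


def changed_attributes(current: dict, desired: dict) -> set[str]:
    """Names of attributes whose values differ between current and desired."""
    keys = current.keys() | desired.keys()
    return {k for k in keys if current.get(k) != desired.get(k)}


def plan_change(current: dict, desired: dict) -> str:
    changed = changed_attributes(current, desired)
    if not changed:
        return "no-op"
    if changed & REQUIRES_REPLACE:
        return "replace"
    return "update-in-place"


current = {"name": "app-server", "zone": "a", "tags": {"env": "dev"}}
print(plan_change(current, {**current, "tags": {"env": "prod"}}))  # update-in-place
print(plan_change(current, {**current, "zone": "b"}))  # replace
```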

Use a Data Source When:

✅ You need to query existing infrastructure

data "mycloud_image" "ubuntu" {
  name = "ubuntu-22.04"
}

resource "mycloud_server" "web" {
  image_id = data.mycloud_image.ubuntu.id
}

✅ You need information from external systems

data "mycloud_vpc" "main" {
  filter = "name=production"
}

resource "mycloud_server" "app" {
  vpc_id = data.mycloud_vpc.main.id
}

✅ You need dynamic lookups

data "mycloud_available_zones" "current" {
  region = var.region
}

# Use in resource
resource "mycloud_server" "regional" {
  for_each = toset(data.mycloud_available_zones.current.zones)
  zone     = each.key
}

✅ You need data that changes independently of Terraform

data "mycloud_current_ip" "me" {}

# Allow access from the machine running Terraform
resource "mycloud_security_rule" "allow_me" {
  source_ip = data.mycloud_current_ip.me.ip
}
# The IP is re-read on every terraform plan; if it changes, the plan shows a change to this rule

Design Philosophy

Resource Philosophy

"Declarative Infrastructure"

User declares desired state
Terraform makes it so
Continuous reconciliation
Provider owns the lifecycle

Resource contract:

User: "I want a server named 'web' with size 'large'"
Provider: "I will create it and keep it that way"
Terraform: "I will track its state and detect any drift"

Data Source Philosophy

"Information Bridge"

User requests information
Provider fetches it
No ownership or management
Fresh data on every run

Data source contract:

User: "What is the latest Ubuntu image ID?"
Provider: "Here it is: ami-12345"
Terraform: "I will ask again next time you run plan"

Common Anti-Patterns

❌ Using Resource for Read-Only Queries

# BAD: Resource for something that shouldn't be managed
resource "mycloud_latest_image" "ubuntu" {
  name = "ubuntu"  # This queries, doesn't create!
}

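Why bad: A resource tells Terraform it owns the object's lifecycle, so the provider would have to fake create and delete for something the configuration never actually creates. If the operation only reads, it belongs in a data source.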

Fix:

# GOOD: Data source for queries
data "mycloud_latest_image" "ubuntu" {
  name = "ubuntu"
}

❌ Using Data Source for Mutable State

# BAD: Data source that tries to modify
data "mycloud_server_config" "web" {
  server_id = mycloud_server.web.id
  config = {...}  # Trying to configure server via data source!
}

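Why bad: Data sources are read-only and are re-read on every plan, so they can neither apply configuration nor track it. Manage the setting as an attribute of the resource it belongs to.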

Fix:

# GOOD: Resource manages configuration
resource "mycloud_server" "web" {
  name   = "web-server"
  config = {...}  # Part of resource
}

❌ Storing Non-Deterministic Data

# BAD: Data source with different results each time
data "mycloud_random_server" "any" {
  # Returns different server each query!
}

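Why bad: Data sources should be deterministic: the same inputs against the same remote data should produce the same outputs. A result that changes on every read makes each plan show spurious changes in everything that depends on it.

Fix: If you need randomness, use a resource so the value is generated once and kept in state: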

# GOOD: Resource for persistent random value
resource "random_integer" "server_index" {
  min = 1
  max = 100
}
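Determinism does not require the remote data to be static; it means the lookup itself should not depend on incidental factors such as API response order. A minimal sketch, assuming a hypothetical image record with `id`, `name`, and `created` fields: select with a total ordering instead of taking whichever item the API happened to return first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    # Hypothetical fields for a "latest image" lookup.
    id: str
    name: str
    created: str  # ISO 8601 date string, so lexicographic order is date order


def latest_image(images: list[Image], name_prefix: str) -> Image:
    """Pick the newest matching image; ties are broken by id so the result is stable."""
    matches = [img for img in images if img.name.startswith(name_prefix)]
    if not matches:
        raise LookupError(f"no image matches {name_prefix!r}")
    return max(matches, key=lambda img: (img.created, img.id))


images = [
    Image("img-2", "ubuntu-22.04", "2024-03-01"),
    Image("img-1", "ubuntu-22.04", "2024-01-15"),
    Image("img-9", "debian-12", "2024-05-01"),
]
print(latest_image(images, "ubuntu").id)  # img-2
```

Reversing or shuffling `images` does not change the answer, which is the property the anti-pattern above violates.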

Implementation Comparison

Resource Implementation

from pyvider.resources import BaseResource

class Server(BaseResource):
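    # Multiple lifecycle methods. `State` is a placeholder for this
    # resource's state class (defined elsewhere); bodies are elided.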
    async def read(self, ctx) -> State | None:
        """Check current state."""
        pass

    async def _create_apply(self, ctx) -> tuple[State, None]:
        """Create server."""
        pass

    async def _update_apply(self, ctx) -> tuple[State, None]:
        """Update server."""
        pass

    async def _delete_apply(self, ctx) -> None:
        """Delete server."""
        pass

Complexity: Multiple methods, state management, lifecycle coordination
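To make the lifecycle concrete without depending on pyvider's actual types, here is a plain-Python toy that mirrors the four method names above against a hypothetical in-memory backend. It does not subclass `BaseResource`, and its signatures are simplified (plain dicts instead of `ctx`, `State`, and private state), so treat it as an illustration of the call sequence, not the framework's API.

```python
import asyncio
import itertools
from typing import Optional

# Hypothetical in-memory "cloud": server id -> attributes.
BACKEND: dict[str, dict] = {}
_ids = itertools.count(1)


class ToyServer:
    """Mirrors the resource lifecycle method names; not a pyvider BaseResource."""

    async def read(self, state: dict) -> Optional[dict]:
        # Refresh: return the remote object, or None if it no longer exists.
        remote = BACKEND.get(state["id"])
        return None if remote is None else {"id": state["id"], **remote}

    async def _create_apply(self, config: dict) -> dict:
        server_id = f"server-{next(_ids)}"
        BACKEND[server_id] = dict(config)
        return {"id": server_id, **config}

    async def _update_apply(self, state: dict, config: dict) -> dict:
        BACKEND[state["id"]] = dict(config)
        return {"id": state["id"], **config}

    async def _delete_apply(self, state: dict) -> None:
        BACKEND.pop(state["id"], None)


async def main() -> None:
    res = ToyServer()
    state = await res._create_apply({"name": "web", "size": "large"})
    state = await res._update_apply(state, {"name": "web", "size": "xlarge"})
    print(await res.read(state))  # {'id': 'server-1', 'name': 'web', 'size': 'xlarge'}
    await res._delete_apply(state)
    print(await res.read(state))  # None


asyncio.run(main())
```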

Data Source Implementation

from pyvider.data_sources import BaseDataSource

class ServerQuery(BaseDataSource):
    # Single read method
    async def read(self, config) -> Data:
        """Query servers."""
        servers = await api.list_servers(config.filter)
        return Data(id=config.filter, servers=servers)

Simplicity: One method, no state, just query and return
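For comparison, a data source needs only the read path. The sketch below is plain Python, with a hypothetical in-memory server list standing in for `api.list_servers`; like the resource toy, it does not use pyvider's `BaseDataSource`, and its signature is simplified to a single `status` argument.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical remote inventory standing in for a real API.
SERVERS = [
    {"id": "server-1", "status": "running"},
    {"id": "server-2", "status": "stopped"},
    {"id": "server-3", "status": "running"},
]


@dataclass(frozen=True)
class ServerListData:
    id: str
    servers: tuple[str, ...]


class ToyServerQuery:
    """Read-only lookup: there are no create/update/delete methods to call."""

    async def read(self, status: str) -> ServerListData:
        # Sort so identical inputs always yield identical output.
        matches = tuple(sorted(s["id"] for s in SERVERS if s["status"] == status))
        return ServerListData(id=status, servers=matches)


print(asyncio.run(ToyServerQuery().read("running")).servers)  # ('server-1', 'server-3')
```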

State Management

Resources Have State

resource "mycloud_server" "web" {
  name = "web-server"
}

Terraform tracks:

{
  "id": "server-123",
  "name": "web-server",
  "size": "large",
  "status": "running"
}

Changes are detected by refreshing this state from the remote system (via read()) and comparing the result with the configuration.

Data Sources Have No Managed State

data "mycloud_server_list" "running" {
  status = "running"
}

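Terraform does not manage this data. It records the latest result in the state file, but every terraform plan re-fetches it instead of comparing against the stored copy. If the data source's arguments depend on values that are not known until apply, the read is deferred to the apply phase.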

Refresh Behavior

Resources: Drift Detection

terraform plan calls read() to refresh the recorded state
Compares the refreshed state with the configuration
Shows what needs to change

Example:

Server exists, but its tags differ from the configuration:
  ~ tags: {env: "dev"} → {env: "prod"}

Terraform will update it in place.
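The refresh-then-compare sequence above can be sketched in a few lines. This is an illustrative model, not Terraform's planner: it assumes flat attribute dicts, takes the refreshed object (what read() returned) and the desired configuration, and emits one `~` line per configured attribute that differs, in the same shape as the example. Computed attributes such as `id` are ignored because they are not in the configuration.

```python
def plan_diff(refreshed: dict, desired: dict) -> list[str]:
    """Return '~ attr: old → new' for each configured attribute that differs."""
    lines = []
    for key in sorted(desired):
        old, new = refreshed.get(key), desired[key]
        if old != new:
            lines.append(f"~ {key}: {old!r} → {new!r}")
    return lines


refreshed = {"id": "server-123", "name": "web-server", "tags": {"env": "dev"}}
desired = {"name": "web-server", "tags": {"env": "prod"}}
for line in plan_diff(refreshed, desired):
    print(line)  # ~ tags: {'env': 'dev'} → {'env': 'prod'}
```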

Data Sources: Always Refresh

terraform plan calls read()
Uses the fresh result directly
Does not compare it with the previous run's result

Example:

Query returned 5 servers (was 3 last time)

This is expected: data sources always refresh.
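By contrast with the resource diff, a data source result is simply replaced on each run. A minimal sketch, using a hypothetical mutable server list as the remote system: each simulated plan calls the read function again and uses its answer directly, with no comparison against what the previous run saw.

```python
# Hypothetical remote inventory that changes between runs.
remote_servers = ["server-1", "server-2", "server-3"]


def read_server_list() -> list[str]:
    """Data source read: return whatever exists right now."""
    return list(remote_servers)


def plan() -> int:
    servers = read_server_list()  # re-queried on every plan, never diffed
    return len(servers)


print(plan())  # 3
remote_servers += ["server-4", "server-5"]  # changed outside Terraform
print(plan())  # 5
```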

When Something Could Be Either

Sometimes the same concept could be implemented as either:

Example: DNS Record

As a Resource:

resource "mycloud_dns_record" "www" {
  domain = "example.com"
  name   = "www"
  value  = mycloud_server.web.ip
}
# Terraform creates and manages the DNS record

As a Data Source:

data "mycloud_dns_record" "www" {
  domain = "example.com"
  name   = "www"
}
# Terraform queries existing DNS record

Decision criteria:

- Resource if you want Terraform to create and manage it
- Data source if it exists outside this configuration and you only need its values

Summary

Resources = Active Management

- Create, update, and delete infrastructure
- Terraform owns the lifecycle
- State tracked and drift detected
- Use for things Terraform should manage

Data Sources = Passive Queries

- Read information only
- No lifecycle management
- No managed state (re-read on every run)
- Use for lookups and external data
