Building Your First Data Source
Welcome! In this tutorial, you'll build your first Terraform data source using pyvider. Data sources are read-only queries that fetch information without creating infrastructure.
🤖 AI-Generated Content
This documentation was generated with AI assistance and is still being audited. Some, or potentially a lot, of this information may be inaccurate. Learn more.
What You'll Learn:
- How data sources differ from resources
- Creating a data source class with pyvider
- Defining input/output schemas
- Implementing read operations
- Using data sources in Terraform
Time to Complete: 10-15 minutes
Prerequisites:
- Python 3.11+ installed
- pyvider installed (installation guide)
- Basic Python knowledge
- Basic Terraform knowledge
What is a Data Source?
A data source is a read-only query that fetches information from external systems. Unlike resources (which manage infrastructure), data sources just read data.
Examples:
- Query file information
- Look up cloud resources
- Fetch API data
- Read database records
Key Differences from Resources:
| Data Source | Resource |
|---|---|
| Read-only | Read-write |
| No lifecycle (just query) | Full CRUD lifecycle |
| No state management | Terraform tracks state |
| Quick queries | Manages infrastructure |
Step 1: Create Your Package Structure
mkdir -p my_provider/data_sources
touch my_provider/__init__.py
touch my_provider/data_sources/__init__.py
touch my_provider/data_sources/file_info.py
Your structure:
my_provider/
├── __init__.py
└── data_sources/
    ├── __init__.py
    └── file_info.py  # We'll work here
Step 2: Define Runtime Types
Data sources have two types:
- Config - Input from user (what to query)
- Data - Output to user (query results)
Open my_provider/data_sources/file_info.py:
import attrs


# Configuration: User inputs
@attrs.define
class FileInfoConfig:
    """What the user wants to query."""
    path: str  # Which file to query


# Data: Query results
@attrs.define
class FileInfoData:
    """Information we return about the file."""
    id: str       # Unique identifier
    path: str     # File path
    size: int     # File size in bytes
    exists: bool  # Whether file exists
    content: str  # File content
Why two types?
- Config = what to query
- Data = query results
Simple and clean separation!
Step 3: Create the Data Source Class
Now let's create the data source:
from pyvider.data_sources import register_data_source, BaseDataSource
from pyvider.schema import s_data_source, a_str, a_num, a_bool, PvsSchema


@register_data_source("file_info")
class FileInfo(BaseDataSource):
    """Reads information about a local file."""

    # Link our runtime types
    config_class = FileInfoConfig
    state_class = FileInfoData

    @classmethod
    def get_schema(cls) -> PvsSchema:
        """Define what Terraform users see."""
        return s_data_source({
            # Input (from user)
            "path": a_str(required=True, description="File path to query"),
            # Outputs (we compute all of these)
            "id": a_str(computed=True, description="File path as ID"),
            "size": a_num(computed=True, description="File size in bytes"),
            "exists": a_bool(computed=True, description="Whether file exists"),
            "content": a_str(computed=True, description="File content"),
        })
What's happening?
- @register_data_source("file_info") - Registers the class as a Terraform data source
- config_class / state_class - Link our attrs classes (inputs and query results)
- All outputs are computed=True - We calculate them
Step 4: Implement the Read Method
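Besides get_schema(), a data source implements a single operation: read(). It takes a ResourceContext (imported from pyvider.resources.context) and returns data. Add it to the FileInfo class: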
    async def read(self, ctx: ResourceContext) -> FileInfoData | None:
        """Read file information."""
        if not ctx.config:
            return None

        from pathlib import Path
        file_path = Path(ctx.config.path)

        # Check if file exists
        if file_path.exists():
            # File exists - read information
            content = file_path.read_text()
            size = file_path.stat().st_size
            return FileInfoData(
                id=str(file_path.absolute()),
                path=str(file_path),
                size=size,
                exists=True,
                content=content,
            )
        else:
            # File doesn't exist - return empty data
            return FileInfoData(
                id=str(file_path.absolute()),
                path=str(file_path),
                size=0,
                exists=False,
                content="",
            )
Key points:
- Takes a ctx: ResourceContext parameter (same as resources)
- Access configuration via ctx.config
- Return None if config is unavailable
- Always return data (even if file doesn't exist)
- Generate a stable, deterministic ID
- Handle missing data gracefully
Complete Code
Here's your complete file_info.py:
from pathlib import Path

import attrs
from pyvider.data_sources import register_data_source, BaseDataSource
from pyvider.resources.context import ResourceContext
from pyvider.schema import s_data_source, a_str, a_num, a_bool, PvsSchema


# Configuration (input)
@attrs.define
class FileInfoConfig:
    path: str


# Data (output)
@attrs.define
class FileInfoData:
    id: str
    path: str
    size: int
    exists: bool
    content: str


@register_data_source("file_info")
class FileInfo(BaseDataSource):
    """Reads information about a local file."""

    config_class = FileInfoConfig
    state_class = FileInfoData

    @classmethod
    def get_schema(cls) -> PvsSchema:
        """Define Terraform schema."""
        return s_data_source({
            # Input
            "path": a_str(required=True, description="File path to query"),
            # Outputs (all computed)
            "id": a_str(computed=True, description="File path as ID"),
            "size": a_num(computed=True, description="File size in bytes"),
            "exists": a_bool(computed=True, description="Whether file exists"),
            "content": a_str(computed=True, description="File content"),
        })

    async def read(self, ctx: ResourceContext) -> FileInfoData | None:
        """Read file information."""
        if not ctx.config:
            return None

        file_path = Path(ctx.config.path)

        if file_path.exists():
            content = file_path.read_text()
            size = file_path.stat().st_size
            return FileInfoData(
                id=str(file_path.absolute()),
                path=str(file_path),
                size=size,
                exists=True,
                content=content,
            )
        else:
            return FileInfoData(
                id=str(file_path.absolute()),
                path=str(file_path),
                size=0,
                exists=False,
                content="",
            )
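One edge case worth knowing about: Path.exists() is also true for directories, and read_text() raises on a directory or on a file that is not valid text. The sketch below shows one possible way to harden the file-reading step using only the standard library. The `safe_read` helper is hypothetical, and treating a directory as "does not exist" is a design assumption rather than pyvider behavior.

```python
import tempfile
from pathlib import Path


def safe_read(path: Path) -> tuple[bool, int, str]:
    """Return (exists, size, content) without raising for common edge cases."""
    # is_file() is False for directories and for missing paths alike.
    if not path.is_file():
        return False, 0, ""
    size = path.stat().st_size
    try:
        content = path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        # Binary file: report it as present but leave the content empty.
        content = ""
    return True, size, content


def demo() -> list[tuple[bool, int, str]]:
    """Run safe_read against a directory, a text file, and a binary file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        text = root / "notes.txt"
        text.write_text("hi", encoding="utf-8")
        binary = root / "blob.bin"
        binary.write_bytes(b"\xff\xfe\x00")  # not valid UTF-8
        return [safe_read(root), safe_read(text), safe_read(binary)]
```

The three values returned by `safe_read` map directly onto the `exists`, `size`, and `content` fields of FileInfoData, so the helper could be dropped into read() in place of the exists()/read_text() pair.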
Step 5: Test with Terraform
Create a Terraform configuration test.tf:
terraform {
  required_providers {
    local = {
      source = "mycompany/local"
    }
  }
}

# Query file information
data "local_file_info" "readme" {
  path = "../README.md"
}

# Use the data in outputs
output "readme_exists" {
  value = data.local_file_info.readme.exists
}

output "readme_size" {
  value = data.local_file_info.readme.size
}

# Use data in a resource
resource "local_file" "summary" {
  path    = "summary.txt"
  content = <<EOT
README Information:
- Exists: ${data.local_file_info.readme.exists}
- Size: ${data.local_file_info.readme.size} bytes
EOT
}
Run it with the usual Terraform workflow (for example, terraform plan followed by terraform apply).
You should see:
- Data source queries the file
- Outputs show file existence and size
- Resource uses the data
Advanced Example: API Data Source
Here's a more realistic example that queries an API:
import attrs
import httpx
from pyvider.data_sources import register_data_source, BaseDataSource
from pyvider.resources.context import ResourceContext
from pyvider.schema import s_data_source, a_str, a_num, a_list, PvsSchema


@attrs.define
class APIQueryConfig:
    endpoint: str
    limit: int = 10


@attrs.define
class APIQueryData:
    id: str
    endpoint: str
    results: list[str]
    count: int


@register_data_source("api_query")
class APIQuery(BaseDataSource):
    """Queries an external API."""

    config_class = APIQueryConfig
    state_class = APIQueryData

    @classmethod
    def get_schema(cls) -> PvsSchema:
        return s_data_source({
            # Inputs
            "endpoint": a_str(required=True, description="API endpoint"),
            "limit": a_num(default=10, description="Max results"),
            # Outputs
            "id": a_str(computed=True, description="Query ID"),
            "results": a_list(a_str(), computed=True, description="Results"),
            "count": a_num(computed=True, description="Result count"),
        })

    async def read(self, ctx: ResourceContext) -> APIQueryData | None:
        """Execute API query."""
        if not ctx.config:
            return None

        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"https://api.example.com{ctx.config.endpoint}",
                params={"limit": ctx.config.limit}
            )
            data = response.json()
            items = data.get("items", [])

        return APIQueryData(
            id=f"{ctx.config.endpoint}:{ctx.config.limit}",
            endpoint=ctx.config.endpoint,
            results=items,
            count=len(items),
        )
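The example above trusts the API to return a JSON object whose "items" key holds a list of strings. A real API may not. As a rough sketch, the payload could be normalized before it is stored in `results`, which is declared as list[str]. The `extract_items` helper is hypothetical, and the payload shape is the same assumption the example already makes.

```python
from typing import Any


def extract_items(payload: Any, limit: int) -> list[str]:
    """Pull at most `limit` string items out of a decoded JSON payload."""
    # Anything other than a JSON object yields an empty result set.
    if not isinstance(payload, dict):
        return []
    items = payload.get("items", [])
    if not isinstance(items, list):
        return []
    # Enforce the limit locally in case the API ignores the query parameter,
    # and coerce every item to str to match the declared output type.
    return [str(item) for item in items[: max(limit, 0)]]
```

In the read() method above, this would replace the `data.get("items", [])` line, for example `items = extract_items(response.json(), ctx.config.limit)`.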
Best Practices
- Generate Stable IDs - Use deterministic ID generation so repeated queries return the same ID
- Handle Missing Data - Return empty values instead of raising errors
- Make Reads Idempotent - Multiple reads should return the same result
- Use Computed Outputs - All outputs should be computed=True
- Add Error Handling - Handle API failures gracefully
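For the first practice, one common approach is to hash the query inputs so that the same inputs always produce the same ID. The sketch below is an illustration only: the `stable_id` helper is hypothetical, and the 16-character truncation is an arbitrary choice.

```python
import hashlib
import json


def stable_id(**inputs: object) -> str:
    """Derive a deterministic ID from the query inputs."""
    # sort_keys makes the result independent of keyword argument order;
    # default=str keeps non-JSON values (such as paths) serializable.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Usage would look like `stable_id(endpoint="/users", limit=10)`. Because the ID depends only on the inputs and never on timestamps or random values, repeated reads with the same configuration yield the same ID.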
What You've Learned
Congratulations! You've built your first pyvider data source. You now understand:
✅ Data Sources vs Resources - Read-only queries vs managed infrastructure
✅ Simple Read Pattern - One method that returns data
✅ Input/Output Separation - Config for inputs, Data for outputs
✅ Deterministic IDs - Stable identification for query results
✅ Error Handling - Graceful handling of missing data
Next Steps
Now that you understand data sources, explore:
- Building Your First Resource - For comparison
- How to Create a Data Source - Quick reference
- Handle Pagination - For large result sets
- Data Source API Reference - Complete API documentation
- Intermediate Provider Tutorial - Build a complete HTTP API provider
Troubleshooting
Q: My data source isn't being registered
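Make sure you apply @register_data_source("...") as a decorator on the class and that the module defining it is actually imported. The decorator only runs when Python imports the module.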
Q: Terraform says "computed values can't be configured"
All outputs in data sources must be computed=True. Inputs should not be computed.
Q: Data isn't refreshing
Data sources are refreshed on every terraform plan. Make sure your read() method is actually querying fresh data.
Q: How do I handle errors?
Return data with error fields instead of raising exceptions:
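@attrs.define
class QueryData:
    id: str
    results: list[str]
    error: str | None = None  # Add error field


# Inside your data source class. `query` and `endpoint` are placeholders
# for your own query coroutine and config field.
async def read(self, ctx: ResourceContext) -> QueryData | None:
    if not ctx.config:
        return None
    query_id = ctx.config.endpoint  # derive a stable ID from the inputs
    try:
        results = await query()
        return QueryData(id=query_id, results=results, error=None)
    except Exception as e:
        return QueryData(id=query_id, results=[], error=str(e))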
For more help, see Troubleshooting Guide.