CI/CD Improvements - Detailed Specifications¶
This document provides complete technical specifications for all 15 proposed improvements to soup stir for CI/CD environments.
Table of Contents¶
- High Priority Improvements
- #1 Auto-Detect CI/CD Environments
- #2 JSON Output for Standard Mode
- #3 JUnit XML Output
- #4 Format Flag with Multiple Output Modes
- #5 Timeout Controls
- #6 Parallelism Control
- Medium Priority Improvements
- #7 Timestamps in CI Mode
- #8 Populate Error Fields
- #9 Log Aggregation & Streaming
- #10 Summary File Output
- #11 Per-Phase Timing Breakdown
- #12 Progress Percentage Indicator
- #13 Configurable Refresh Rate
- Low Priority Improvements
- #14 Colored Output Control
- #15 Failure-Only Mode
High Priority Improvements¶
#1: Auto-Detect CI/CD Environments¶
Priority: 🔥 High
Effort: Medium
Files to Modify: cli.py, display.py
Description¶
Automatically detect when soup stir is running in a CI/CD environment and adapt the output format to be more suitable for non-interactive contexts.
Problem Statement¶
The current Rich Live display generates ANSI control codes and frequent updates that clutter CI logs and don't render properly in non-TTY environments. CI build logs become difficult to read and parse.
Solution¶
Detect CI environment and automatically switch to line-by-line output mode instead of live table updates.
CI Environment Detection¶
Detect CI by checking, in order (see the sketch after this list):
1. TTY detection: not sys.stdout.isatty()
2. Environment variables (any of):
- CI=true (generic)
- GITHUB_ACTIONS=true (GitHub Actions)
- GITLAB_CI=true (GitLab CI)
- JENKINS_URL (Jenkins)
- CIRCLECI=true (CircleCI)
- TRAVIS=true (Travis CI)
- BUILDKITE=true (Buildkite)
- TEAMCITY_VERSION (TeamCity)
- TF_BUILD=true (Azure Pipelines)
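A minimal detection sketch; the helper name and constant are illustrative, not the existing cli.py API:
import os
import sys

# Environment variables that identify the CI systems listed above.
_CI_ENV_VARS = (
    "CI", "GITHUB_ACTIONS", "GITLAB_CI", "JENKINS_URL", "CIRCLECI",
    "TRAVIS", "BUILDKITE", "TEAMCITY_VERSION", "TF_BUILD",
)

def is_ci_environment() -> bool:
    """Return True when line-by-line CI output should be used."""
    if not sys.stdout.isatty():
        return True
    return any(os.environ.get(name) for name in _CI_ENV_VARS)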
Behavior Changes in CI Mode¶
When CI is detected:
- Disable live table updates - Don't use rich.Live()
- Use line-by-line output - Each status change prints a new line
- Reduce refresh rate - Only output on actual status changes
- Add timestamps - Prefix each line with timestamp (see #7)
- Simplify formatting - Reduce visual complexity
Output Format in CI Mode¶
[2025-11-02T10:30:00.123Z] 💤 PENDING 1/5 test-auth
[2025-11-02T10:30:01.234Z] 🧹 CLEANING 1/5 test-auth
[2025-11-02T10:30:02.345Z] 🔄 INIT 1/5 test-auth
[2025-11-02T10:30:05.456Z] 🚀 APPLYING 1/5 test-auth - Creating aws_instance.example
[2025-11-02T10:30:12.567Z] 🔬 ANALYZING 1/5 test-auth
[2025-11-02T10:30:13.678Z] 💥 DESTROYING 1/5 test-auth
[2025-11-02T10:30:15.789Z] ✅ PASS 1/5 test-auth (15.7s) - 2 providers, 5 resources
[2025-11-02T10:30:15.890Z] 🧹 CLEANING 2/5 test-network
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--ci` | boolean | auto-detect | Force CI mode even in TTY |
| `--no-ci` | boolean | auto-detect | Force interactive mode even in CI |
Environment Variables¶
| Variable | Values | Description |
|---|---|---|
| `SOUP_STIR_CI_MODE` | `true`/`false`/`auto` | Override CI detection |
Acceptance Criteria¶
- CI environment is detected correctly in all major CI systems
- Non-TTY environments automatically use line-by-line output
- `--ci` flag forces CI mode regardless of environment
- `--no-ci` flag forces interactive mode regardless of environment
- Line-by-line output is clean and parseable
- Status changes are printed immediately (not buffered)
- All emoji and color are preserved (unless `--no-color` is used)
#2: JSON Output for Standard Mode¶
Priority: 🔥 High
Effort: Low
Files to Modify: cli.py, models.py, reporting.py
Description¶
Add --json flag to output test results in JSON format for standard (non-matrix) mode.
Problem Statement¶
Currently --json only works with --matrix mode. Standard test runs have no machine-readable output format, making it difficult to parse results programmatically or integrate with custom tooling.
Solution¶
Add --json flag that outputs structured JSON to stdout after tests complete.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--json` | boolean | false | Output results as JSON |
| `--json-pretty` | boolean | false | Pretty-print JSON output |
JSON Output Schema¶
{
"summary": {
"total": 5,
"passed": 4,
"failed": 1,
"skipped": 0,
"duration_seconds": 45.23,
"start_time": "2025-11-02T10:30:00.123456Z",
"end_time": "2025-11-02T10:30:45.356789Z",
"terraform_version": "1.5.7",
"command": "soup stir /path/to/tests"
},
"tests": [
{
"name": "test-auth",
"directory": "/full/path/to/test-auth",
"status": "passed",
"duration_seconds": 12.5,
"start_time": "2025-11-02T10:30:00.123456Z",
"end_time": "2025-11-02T10:30:12.623456Z",
"providers": 2,
"resources": 5,
"data_sources": 1,
"functions": 0,
"ephemeral_functions": 0,
"outputs": 3,
"warnings": false,
"failed_stage": null,
"error_message": null,
"logs": {
"stdout": "/path/to/stdout.log",
"stderr": "/path/to/stderr.log",
"terraform": "/path/to/terraform.log"
}
},
{
"name": "test-network",
"directory": "/full/path/to/test-network",
"status": "failed",
"duration_seconds": 5.2,
"start_time": "2025-11-02T10:30:12.623456Z",
"end_time": "2025-11-02T10:30:17.823456Z",
"providers": 1,
"resources": 0,
"data_sources": 0,
"functions": 0,
"ephemeral_functions": 0,
"outputs": 0,
"warnings": true,
"failed_stage": "APPLY",
"error_message": "Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist",
"logs": {
"stdout": "/path/to/stdout.log",
"stderr": "/path/to/stderr.log",
"terraform": "/path/to/terraform.log"
}
}
],
"provider_cache": {
"status": "success",
"duration_seconds": 2.3,
"providers_downloaded": 3
}
}
Behavior¶
- When `--json` is used:
  - Suppress all Rich display output (no live table, no summary panel)
  - Suppress all console output except JSON
  - Write JSON to stdout after all tests complete
  - Write any errors to stderr
  - Exit with appropriate code (0 for success, non-zero for failure)
- When `--json-pretty` is used:
  - Same as `--json` but with indentation (2 spaces)
  - Useful for human review
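A minimal sketch of the emission step, assuming the results have already been collected into a dict matching the schema above (the function name is illustrative):
import json
import sys

def emit_json(report: dict, pretty: bool = False) -> None:
    """Write the report to stdout; all other output goes to stderr."""
    indent = 2 if pretty else None        # --json-pretty uses 2-space indentation
    json.dump(report, sys.stdout, indent=indent)
    sys.stdout.write("\n")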
Compatibility¶
- Mutually exclusive with: `--format` (if format is not `json`)
- Compatible with: all other flags (timeouts, parallelism, etc.)
- `--json` implies `--no-ci` for display purposes (no live updates)
Acceptance Criteria¶
- `--json` outputs valid JSON to stdout
- JSON schema matches specification
- All test result fields are populated correctly
- Failed tests include `failed_stage` and `error_message`
- No other output appears on stdout when using `--json`
- Errors and warnings go to stderr, not stdout
- JSON is parseable by `jq` and other tools
- Exit code reflects test results (0=all passed, non-zero=failures)
#3: JUnit XML Output¶
Priority: 🔥 High
Effort: Medium
Files to Modify: cli.py, reporting.py
Description¶
Generate JUnit XML test reports for integration with CI/CD systems.
Problem Statement¶
Most CI/CD systems (Jenkins, GitHub Actions, GitLab CI, CircleCI, etc.) have native support for displaying JUnit XML test results. This enables:
- Visual test result dashboards
- Historical trending
- Flaky test detection
- Test failure notifications
Solution¶
Add --junit-xml flag to generate JUnit XML compatible test reports.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--junit-xml=FILE` | path | none | Write JUnit XML to FILE |
| `--junit-suite-name=NAME` | string | `"soup-stir"` | Test suite name in XML |
JUnit XML Format¶
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="soup-stir" tests="5" failures="1" errors="0" skipped="0" time="45.23" timestamp="2025-11-02T10:30:00.123456Z">
<testsuite name="terraform-tests" tests="5" failures="1" errors="0" skipped="0" time="45.23" timestamp="2025-11-02T10:30:00.123456Z">
<!-- Passed test -->
<testcase name="test-auth" classname="terraform.test-auth" time="12.5" timestamp="2025-11-02T10:30:00.123456Z">
<system-out><![CDATA[
Providers: 2
Resources: 5
Data Sources: 1
Outputs: 3
Warnings: false
]]></system-out>
</testcase>
<!-- Failed test -->
<testcase name="test-network" classname="terraform.test-network" time="5.2" timestamp="2025-11-02T10:30:12.623456Z">
<failure message="Terraform apply failed" type="TerraformApplyError"><![CDATA[
Stage: APPLY
Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist
Terraform Log: /path/to/terraform.log
Stdout Log: /path/to/stdout.log
Stderr Log: /path/to/stderr.log
]]></failure>
<system-out><![CDATA[
Providers: 1
Resources: 0
]]></system-out>
<system-err><![CDATA[
Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist
]]></system-err>
</testcase>
<!-- Skipped test -->
<testcase name="test-empty" classname="terraform.test-empty" time="0.1" timestamp="2025-11-02T10:30:17.823456Z">
<skipped message="No .tf files found" />
</testcase>
</testsuite>
</testsuites>
Field Mapping¶
| JUnit Field | Source | Notes |
|---|---|---|
| `testsuite/@name` | `--junit-suite-name` or `"terraform-tests"` | Suite name |
| `testsuite/@tests` | Count of all tests | Total tests run |
| `testsuite/@failures` | Count of failed tests | Tests with `status="failed"` |
| `testsuite/@errors` | Count of error tests | Tests with ERROR state (exception in harness) |
| `testsuite/@skipped` | Count of skipped tests | Tests with `status="skipped"` |
| `testsuite/@time` | Total duration in seconds | Sum of all test times |
| `testcase/@name` | Test directory name | e.g., `test-auth` |
| `testcase/@classname` | `"terraform." + test name` | Namespacing for CI systems |
| `testcase/@time` | Test duration in seconds | From `TestResult.duration` |
| `testcase/failure/@message` | `error_message` field | First line of error |
| `testcase/failure/@type` | Derived from `failed_stage` | e.g., `TerraformInitError`, `TerraformApplyError` |
| `testcase/failure/text()` | Full error with context | Error message + log paths |
| `testcase/system-out` | Test metadata | Providers, resources, outputs counts |
| `testcase/system-err` | Error logs | Terraform error output |
Behavior¶
- Write XML to specified file path
- Create parent directories if they don't exist
- Overwrite file if it already exists
- Continue to display results to terminal (unless `--quiet`)
- Compatible with `--json` (both files can be generated)
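A sketch of generating the report with the standard library; the result dicts and helper name are assumptions, not the actual reporting.py interface:
import xml.etree.ElementTree as ET
from pathlib import Path

def write_junit_xml(results: list[dict], path: str, suite_name: str = "soup-stir") -> None:
    """Build a <testsuites>/<testsuite> tree from result dicts and write it to path."""
    failed = sum(1 for r in results if r["status"] == "failed")
    skipped = sum(1 for r in results if r["status"] == "skipped")
    attrs = {"tests": str(len(results)), "failures": str(failed), "skipped": str(skipped)}
    suites = ET.Element("testsuites", name=suite_name, **attrs)
    suite = ET.SubElement(suites, "testsuite", name="terraform-tests", **attrs)
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r["name"],
                             classname=f"terraform.{r['name']}",
                             time=f"{r['duration_seconds']:.1f}")
        if r["status"] == "failed":
            stage = (r.get("failed_stage") or "harness").title()
            failure = ET.SubElement(case, "failure",
                                    message=r.get("error_message") or "Test failed",
                                    type=f"Terraform{stage}Error")
            failure.text = r.get("error_message") or ""
        elif r["status"] == "skipped":
            ET.SubElement(case, "skipped", message=r.get("skip_reason") or "")
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)   # create parent directories
    ET.ElementTree(suites).write(out, encoding="utf-8", xml_declaration=True)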
Acceptance Criteria¶
- `--junit-xml` creates valid JUnit XML file
- XML validates against JUnit XSD schema
- All test results are represented correctly
- Failed tests include failure message and details
- Skipped tests are marked with `<skipped>` element
- Timestamps use ISO 8601 format
- File is created even if tests fail
- XML is parseable by Jenkins, GitHub Actions, GitLab CI
- Parent directories are created automatically
- Existing file is overwritten
#4: Format Flag with Multiple Output Modes¶
Priority: 🔥 High
Effort: Medium
Files to Modify: cli.py, display.py, reporting.py
Description¶
Unified --format flag to control output style for different use cases.
Problem Statement¶
Different contexts require different output styles. A single flag to control output format is more intuitive than multiple boolean flags.
Solution¶
Add --format flag with multiple predefined output modes.
CLI Flags¶
| Flag | Type | Values | Default | Description |
|---|---|---|---|---|
| `--format` | choice | `table`, `plain`, `json`, `github`, `quiet` | `table` (or `plain` if CI detected) | Output format |
Format Modes¶
1. table - Rich Live Table (Default Interactive)¶
- Current behavior: Rich live table with colors and emoji
- Auto-refresh at configured rate
- Full visual display with all columns
- Best for: Interactive terminal use
2. plain - Plain Text Line-by-Line (Default CI)¶
- Line-by-line status updates
- Timestamps on each line
- Emoji preserved, colors preserved
- No live updates (only print on change)
- Best for: CI/CD logs, file output
Example:
[2025-11-02T10:30:00Z] 💤 PENDING 1/5 test-auth
[2025-11-02T10:30:01Z] 🔄 INIT 1/5 test-auth
[2025-11-02T10:30:05Z] 🚀 APPLYING 1/5 test-auth
[2025-11-02T10:30:12Z] ✅ PASS 1/5 test-auth (12.5s)
3. json - JSON Output¶
- Same as `--json` flag
- Outputs structured JSON to stdout
- Suppresses all other output
- Best for: Programmatic consumption, custom tooling
4. github - GitHub Actions Annotations¶
- Outputs GitHub Actions workflow commands
- Groups tests with `::group::`/`::endgroup::`
- Errors use `::error::` annotations
- Warnings use `::warning::` annotations
- Best for: GitHub Actions workflows
Example:
::group::Test: test-auth
🔄 Running test-auth...
✅ test-auth passed in 12.5s
Providers: 2, Resources: 5, Outputs: 3
::endgroup::
::group::Test: test-network
🔄 Running test-network...
::error file=test-network/main.tf,line=15::Terraform apply failed: InvalidAMI: The image id '[ami-12345]' does not exist
❌ test-network failed in 5.2s
::endgroup::
5. quiet - Minimal Output¶
- Only print summary at end
- No progress updates
- No live display
- Errors still printed
- Best for: Scripts, when you only care about final result
Example:
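Only the end-of-run summary is printed; the exact wording below is illustrative:
Tests: 5 total, 4 passed, 1 failed, 0 skipped (45.2s)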
Behavior¶
- `--format` is mutually exclusive with `--json` (or `--json` implies `--format=json`)
- Auto-detection: If CI detected and no `--format` specified, use `plain`
- Can be overridden: `--format=table` forces table even in CI
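A sketch of the selection rule above (names are illustrative):
def resolve_format(cli_format: str | None, ci_detected: bool) -> str:
    """An explicit --format always wins; otherwise fall back on CI auto-detection."""
    if cli_format is not None:
        return cli_format          # e.g. --format=table forces the live table even in CI
    return "plain" if ci_detected else "table"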
Environment Variables¶
| Variable | Values | Description |
|---|---|---|
| `SOUP_STIR_FORMAT` | `table`/`plain`/`json`/`github`/`quiet` | Default format |
Acceptance Criteria¶
- `--format=table` uses Rich live table
- `--format=plain` uses line-by-line output
- `--format=json` outputs valid JSON
- `--format=github` outputs valid GitHub Actions annotations
- `--format=quiet` shows only summary
- Auto-detection works (CI → plain, interactive → table)
- `SOUP_STIR_FORMAT` environment variable works
- Each format is properly tested
#5: Timeout Controls¶
Priority: 🔥 High
Effort: Medium
Files to Modify: cli.py, executor.py, runtime.py
Description¶
Add timeout controls for tests to prevent hanging in CI/CD environments.
Problem Statement¶
Currently, standard tests have no timeout mechanism. A misbehaving test can run indefinitely, blocking CI pipelines and wasting resources. Matrix testing has timeouts, but standard mode does not.
Solution¶
Add global and per-test timeout controls.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--timeout=SECONDS` | int | unlimited | Global timeout for entire test suite (seconds) |
| `--test-timeout=SECONDS` | int | unlimited | Timeout per individual test (seconds) |
Environment Variables¶
| Variable | Type | Description |
|---|---|---|
| `SOUP_STIR_TIMEOUT` | int | Default global timeout (seconds) |
| `SOUP_STIR_TEST_TIMEOUT` | int | Default per-test timeout (seconds) |
Timeout Behavior¶
Per-Test Timeout (--test-timeout)¶
- Applies to each individual test
- Timer starts when test begins execution (CLEANING phase)
- Timer stops when test completes (PASS/FAIL/SKIP)
- If timeout is exceeded:
- Test is terminated (SIGTERM, then SIGKILL after grace period)
- Test is marked with special status: `TIMEOUT`
- Test counts as failure in summary
- Remaining tests continue
Global Timeout (--timeout)¶
- Applies to entire test suite
- Timer starts when first test begins
- Timer stops when all tests complete or timeout is hit
- If timeout is exceeded:
- All running tests are terminated
- Pending tests are marked as `SKIPPED`
- Summary shows incomplete status
- Exit with timeout-specific exit code (124)
Timeout Grace Period¶
- When timeout is hit, send SIGTERM to subprocess
- Wait 5 seconds for graceful termination
- If still running, send SIGKILL
- Mark test as `TIMEOUT` with a message about forceful termination
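A sketch of that sequence using subprocess; the helper name and the 5-second constant mirror the description above:
import subprocess

GRACE_PERIOD_SECONDS = 5

def terminate_with_grace(proc: subprocess.Popen) -> None:
    """SIGTERM first, then SIGKILL if the process is still alive after the grace period."""
    proc.terminate()                                  # SIGTERM
    try:
        proc.wait(timeout=GRACE_PERIOD_SECONDS)
    except subprocess.TimeoutExpired:
        proc.kill()                                   # SIGKILL
        proc.wait()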
Output Examples¶
Test Timeout:
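(Illustrative; the format mirrors the global-timeout example below.)
⏱️ test-slow timed out after 300s (--test-timeout=300)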
Global Timeout:
⏱️ Suite timed out after 600s (--timeout=600)
Completed: 3/5 tests
Running: 2 tests (terminated)
Pending: 0 tests
Exit Codes¶
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | One or more tests failed |
| 124 | Global timeout exceeded (follows GNU timeout convention) |
| 125 | Test timeout exceeded |
JSON Output with Timeouts¶
{
"summary": {
"total": 5,
"passed": 2,
"failed": 1,
"timeout": 1,
"skipped": 1,
"duration_seconds": 600.0,
"timeout_exceeded": true,
"timeout_type": "global"
},
"tests": [
{
"name": "test-slow",
"status": "timeout",
"duration_seconds": 300.0,
"timeout_seconds": 300,
"error_message": "Test exceeded timeout of 300 seconds"
}
]
}
Acceptance Criteria¶
- `--timeout` enforces global timeout
- `--test-timeout` enforces per-test timeout
- Timeout tests are marked distinctly (TIMEOUT status)
- Timeout tests show in summary as separate category
- Exit code is 124 for global timeout, 125 for test timeout
- Graceful termination is attempted (SIGTERM before SIGKILL)
- JSON output includes timeout information
- JUnit XML marks timeout tests appropriately
- Environment variables work as defaults
#6: Parallelism Control¶
Priority: 🔥 High
Effort: Low
Files to Modify: cli.py, executor.py, config.py
Description¶
Add control over test parallelism to allow serial execution or custom concurrency limits.
Problem Statement¶
Currently, parallelism is hardcoded to os.cpu_count(). This is not always optimal:
- In CI with limited resources, may want to reduce parallelism
- For debugging, serial execution (-j 1) is often necessary
- Some CI environments have quotas that limit concurrent operations
Solution¶
Add --jobs flag to control parallelism, similar to make -j.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--jobs=N`, `-j N` | int | `auto` | Number of tests to run in parallel |
| `-j` (no value) | flag | `auto` | Use auto-detection (all CPUs) |
Special values:
- --jobs=1 or -j 1: Serial execution (one test at a time)
- --jobs=0 or --jobs=auto: Auto-detect (current behavior: os.cpu_count())
- --jobs=N: Run up to N tests in parallel
Environment Variables¶
| Variable | Type | Description |
|---|---|---|
| `SOUP_STIR_JOBS` | int/`auto` | Default parallelism level |
Examples¶
# Serial execution (debugging)
soup stir --jobs=1
soup stir -j 1
# Limit to 2 parallel tests
soup stir --jobs=2
soup stir -j 2
# Use all CPUs (default)
soup stir --jobs=auto
soup stir -j
# Environment variable
export SOUP_STIR_JOBS=4
soup stir
Output Changes¶
Display effective parallelism at start:
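For example (illustrative wording):
Running 5 tests with parallelism: 2 (--jobs=2)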
Behavior¶
- Parallelism controls the semaphore in `execute_tests()`
- Serial mode (`-j 1`) is deterministic (tests run in sorted order)
- Progress is easier to follow in serial mode
- Useful for:
- Debugging test failures
- CI environments with resource constraints
- Avoiding rate limits on cloud providers
- Ensuring deterministic test order
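A sketch of bounding concurrency with a semaphore; the coroutine names are assumptions, not the real executor.py API:
import asyncio
import os

async def execute_tests(test_dirs: list[str], jobs: int = 0) -> None:
    """Run tests with at most `jobs` in flight; jobs=0 means auto (CPU count)."""
    limit = jobs or os.cpu_count() or 1
    semaphore = asyncio.Semaphore(limit)

    async def run_bounded(test_dir: str) -> None:
        async with semaphore:
            await run_single_test(test_dir)   # hypothetical per-test coroutine

    # sorted() keeps -j 1 runs deterministic
    await asyncio.gather(*(run_bounded(d) for d in sorted(test_dirs)))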
Acceptance Criteria¶
- `--jobs=N` limits concurrent tests to N
- `-j 1` runs tests serially (one at a time)
- `-j` (no value) or `--jobs=auto` uses all CPUs
- Effective parallelism is displayed at start
- `SOUP_STIR_JOBS` environment variable works
- Serial mode runs tests in deterministic order
- Parallelism is reflected in live display (number of active tests)
Medium Priority Improvements¶
#7: Timestamps in CI Mode¶
Priority: 🟡 Medium
Effort: Low
Files to Modify: display.py
Description¶
Add timestamps to each output line when running in CI/CD mode.
Problem Statement¶
In CI logs, it's difficult to correlate events or understand when things happened without timestamps. This is especially problematic for long-running tests or when debugging timing issues.
Solution¶
Automatically add timestamps when in CI mode or when explicitly requested.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--timestamps` | boolean | auto (true in CI) | Show timestamps on each line |
| `--no-timestamps` | boolean | false | Disable timestamps even in CI |
| `--timestamp-format` | choice | `iso8601` | Timestamp format |
Timestamp Formats¶
| Format | Example | Description |
|---|---|---|
| `iso8601` | `2025-11-02T10:30:15.123456Z` | ISO 8601 with microseconds (default) |
| `iso8601-simple` | `2025-11-02T10:30:15Z` | ISO 8601 without microseconds |
| `relative` | `[+00:15.2s]` | Relative to test suite start |
| `elapsed` | `[00:15.2]` | Same as relative but different format |
| `unix` | `1699012215.123456` | Unix timestamp with microseconds |
Output Examples¶
ISO 8601 (default):
[2025-11-02T10:30:00.123456Z] 💤 PENDING 1/5 test-auth
[2025-11-02T10:30:01.234567Z] 🔄 INIT 1/5 test-auth
Relative:
[+00:00.0s] 💤 PENDING 1/5 test-auth
[+00:01.1s] 🔄 INIT 1/5 test-auth
[+00:15.2s] ✅ PASS 1/5 test-auth
Behavior¶
- Auto-enabled in CI mode
- Can be forced with `--timestamps`
- Can be disabled with `--no-timestamps`
- Format controlled by `--timestamp-format`
- Timestamps use UTC for consistency across CI environments
Environment Variables¶
| Variable | Values | Description |
|---|---|---|
| `SOUP_STIR_TIMESTAMPS` | `true`/`false`/`auto` | Enable timestamps |
| `SOUP_STIR_TIMESTAMP_FORMAT` | Format name | Timestamp format |
Acceptance Criteria¶
- Timestamps are auto-enabled in CI mode
- `--timestamps` forces timestamps in interactive mode
- `--no-timestamps` disables in CI mode
- All timestamp formats work correctly
- Timestamps are aligned and don't break formatting
- Timestamps use UTC timezone
#8: Populate failed_stage and error_message Fields¶
Priority: 🟡 Medium
Effort: Low
Files to Modify: executor.py, models.py
Description¶
Populate the currently-empty failed_stage and error_message fields in TestResult.
Problem Statement¶
The TestResult data structure has failed_stage and error_message fields, but they are never populated. This makes it harder to analyze failures programmatically.
Solution¶
Track which stage failed and extract the error message.
Failed Stage Values¶
| Stage | When Set | Description |
|---|---|---|
| `null` | Test passed | No failure |
| `"INIT"` | `terraform init` failed | Initialization failure |
| `"APPLY"` | `terraform apply` failed | Apply failure |
| `"DESTROY"` | `terraform destroy` failed | Destroy failure (rare) |
| `"ANALYZING"` | JSON parsing failed | State analysis failure |
| `"HARNESS"` | Python exception | Test harness error (not terraform) |
Error Message Extraction¶
Extract the error message from the parsed Terraform logs (see the sketch after these steps):
1. Find first log entry with @level == "error"
2. Extract @message field
3. If error is structured, extract relevant fields
4. Truncate to reasonable length (e.g., 500 chars)
5. Store in error_message field
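A sketch of these steps against Terraform's JSON log stream (constant and function names are assumptions):
import json

MAX_ERROR_LENGTH = 500  # "reasonable length" from step 4

def extract_error_message(log_lines: list[str]) -> str | None:
    """Return the first @level == "error" message, truncated, or None if none is found."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("@level") == "error":
            return entry.get("@message", "")[:MAX_ERROR_LENGTH]
    return None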
Example¶
Before (current):
TestResult(
directory="test-network",
success=False,
failed_stage=None, # Not populated!
error_message=None, # Not populated!
...
)
After (improved):
TestResult(
directory="test-network",
success=False,
failed_stage="APPLY",
error_message="Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist",
...
)
JSON Output¶
{
"name": "test-network",
"status": "failed",
"failed_stage": "APPLY",
"error_message": "Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist"
}
JUnit XML Output¶
<testcase name="test-network" classname="terraform.test-network" time="5.2">
<failure message="Error: aws_instance.example: InvalidAMI" type="TerraformApplyError">
Stage: APPLY
Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist
</failure>
</testcase>
Acceptance Criteria¶
- `failed_stage` is populated for all failures
- `error_message` contains first error from logs
- Harness exceptions set `failed_stage="HARNESS"`
- Error messages are truncated if too long
- JSON output includes these fields
- JUnit XML uses these fields
- Passed tests have `null` for both fields
#9: Log Aggregation & Streaming¶
Priority: 🟡 Medium
Effort: High
Files to Modify: terraform.py, cli.py, executor.py
Description¶
Provide options to aggregate logs or stream them to stdout for better CI integration.
Problem Statement¶
Logs are scattered across multiple directories, making them hard to access in CI environments. Developers need to download artifacts and navigate directory structures to find relevant logs.
Solution¶
Add options to stream logs in real-time or aggregate them into a single file.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--stream-logs` | boolean | false | Stream all terraform logs to stdout |
| `--aggregate-logs=FILE` | path | none | Aggregate all logs into single file |
| `--logs-dir=DIR` | path | auto | Custom directory for log files |
Stream Logs Mode (--stream-logs)¶
Stream all Terraform output to stdout in real-time:
[test-auth:init] Initializing the backend...
[test-auth:init] Initializing provider plugins...
[test-auth:apply] Terraform will perform the following actions:
[test-auth:apply] # aws_instance.example will be created
[test-network:init] Initializing the backend...
Features:
- Prefix each line with test name and phase
- Color-code by test (if colors enabled)
- Interleave logs from parallel tests
- Include timestamps
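A sketch of the per-line prefixing (the helper name is illustrative):
import sys

def stream_log_line(test_name: str, phase: str, line: str) -> None:
    """Prefix a raw Terraform output line with [test:phase] and flush immediately."""
    sys.stdout.write(f"[{test_name}:{phase}] {line.rstrip()}\n")
    sys.stdout.flush()   # unbuffered output keeps interleaved CI logs readable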
Aggregate Logs Mode (--aggregate-logs)¶
Write all logs to a single file:
File format:
========================================
Test: test-auth
Phase: init
Start: 2025-11-02T10:30:00Z
========================================
Initializing the backend...
Initializing provider plugins...
...
========================================
Test: test-auth
Phase: apply
Start: 2025-11-02T10:30:05Z
========================================
Terraform will perform the following actions:
...
Custom Logs Directory (--logs-dir)¶
Override default log location:
- All log files written to specified directory
- Useful for CI artifact collection
- Can be ephemeral or persistent
Environment Variables¶
| Variable | Description |
|---|---|
| `SOUP_STIR_STREAM_LOGS` | true/false - Enable log streaming |
| `SOUP_STIR_LOGS_DIR` | Path - Default logs directory |
Acceptance Criteria¶
- `--stream-logs` streams all logs to stdout
- Logs are prefixed with test name and phase
- Parallel test logs are interleaved correctly
- `--aggregate-logs` creates single log file
- Aggregated logs are properly sectioned by test
- `--logs-dir` changes log output directory
- Directory is created if it doesn't exist
- Compatible with other output formats
#10: Summary File Output¶
Priority: 🟡 Medium
Effort: Low
Files to Modify: cli.py, reporting.py
Description¶
Save test summary to a file for later analysis or CI artifact collection.
Problem Statement¶
Test summary is only printed to terminal and is not persisted. In CI, it's useful to have a summary file that can be:
- Uploaded as an artifact
- Parsed by other tools
- Used for notifications or reports
Solution¶
Add --summary-file flag to save summary in various formats.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--summary-file=FILE` | path | none | Save summary to file |
| `--summary-format` | choice | `json` | Summary file format (json/text/markdown) |
Summary Formats¶
JSON Format¶
{
"summary": {
"total": 5,
"passed": 4,
"failed": 1,
"skipped": 0,
"timeout": 0,
"duration_seconds": 45.23,
"start_time": "2025-11-02T10:30:00Z",
"end_time": "2025-11-02T10:30:45Z"
},
"passed": ["test-auth", "test-network", "test-storage", "test-compute"],
"failed": ["test-database"],
"skipped": [],
"timeout": []
}
Text Format¶
TofuSoup Test Summary
=====================
Total: 5
Passed: 4
Failed: 1
Skipped: 0
Duration: 45.2s
Passed Tests:
- test-auth
- test-network
- test-storage
- test-compute
Failed Tests:
- test-database
Generated: 2025-11-02T10:30:45Z
Markdown Format¶
# TofuSoup Test Summary
**Duration**: 45.2s
**Generated**: 2025-11-02T10:30:45Z
## Results
| Metric | Count |
|--------|-------|
| Total | 5 |
| ✅ Passed | 4 |
| ❌ Failed | 1 |
| ⏭️ Skipped | 0 |
## Passed Tests
- ✅ test-auth
- ✅ test-network
- ✅ test-storage
- ✅ test-compute
## Failed Tests
- ❌ test-database
Behavior¶
- Write summary file after all tests complete
- Create parent directories if needed
- Overwrite existing file
- Continue to show summary on terminal (unless `--quiet`)
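A sketch of that file-writing behavior (the helper name is illustrative):
from pathlib import Path

def write_summary_file(rendered: str, path: str) -> None:
    """Create parent directories, then overwrite the summary file."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(rendered, encoding="utf-8")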
Acceptance Criteria¶
- `--summary-file` creates summary file
- JSON format is valid and parseable
- Text format is human-readable
- Markdown format renders properly
- `--summary-format` controls format
- Parent directories are created automatically
- File is created even if tests fail
#11: Per-Phase Timing Breakdown¶
Priority: 🟡 Medium
Effort: Medium
Files to Modify: executor.py, display.py, models.py
Description¶
Track and display timing for each phase of test execution.
Problem Statement¶
Currently only total test time is shown. When optimizing tests or diagnosing slow CI builds, it's helpful to know which phase is slow (INIT, APPLY, DESTROY, etc.).
Solution¶
Track timestamp at each phase transition and calculate phase durations.
Implementation¶
Add phase timing to TestResult:
class TestResult(NamedTuple):
    # ... existing fields ...
    phase_timings: dict[str, float]  # Phase name -> duration in seconds
Example:
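(Values mirror the JSON sample below; illustrative only.)
phase_timings = {
    "CLEANING": 0.5,
    "INIT": 2.0,
    "APPLYING": 8.0,
    "ANALYZING": 0.5,
    "DESTROYING": 1.5,
}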
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--show-phase-timing` | boolean | false | Show per-phase timing in output |
Output Examples¶
Terminal Output (with --show-phase-timing):
✅ test-auth - PASS (12.5s total)
CLEANING: 0.5s ( 4%)
INIT: 2.0s ( 16%)
APPLYING: 8.0s ( 64%)
ANALYZING: 0.5s ( 4%)
DESTROYING: 1.5s ( 12%)
JSON Output:
{
"name": "test-auth",
"duration_seconds": 12.5,
"phase_timings": {
"CLEANING": 0.5,
"INIT": 2.0,
"APPLYING": 8.0,
"ANALYZING": 0.5,
"DESTROYING": 1.5
}
}
Acceptance Criteria¶
- Phase timings are tracked for all tests
- `--show-phase-timing` displays breakdown
- Percentages are calculated correctly
- JSON output includes phase timings
- Works with all output formats
- Timing is accurate (uses monotonic clock)
#12: Progress Percentage Indicator¶
Priority: 🟡 Medium
Effort: Low
Files to Modify: display.py
Description¶
Show overall progress as a percentage.
Problem Statement¶
In CI logs or when running many tests, it's hard to gauge overall progress. A simple percentage helps set expectations.
Solution¶
Calculate and display progress percentage.
Formula¶
progress_percent = (completed_tests / total_tests) × 100
Where completed_tests = passed + failed + skipped + timeout
Output Examples¶
Plain Format:
[20%] (1/5) ✅ test-auth - PASS
[40%] (2/5) ✅ test-network - PASS
[60%] (3/5) ❌ test-database - FAIL
[80%] (4/5) ✅ test-storage - PASS
[100%] (5/5) ✅ test-compute - PASS
With Estimated Time Remaining:
[20%] (1/5) ✅ test-auth - PASS - est. 60s remaining
[40%] (2/5) ✅ test-network - PASS - est. 45s remaining
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--show-progress` | boolean | auto (true in CI) | Show progress percentage |
| `--show-eta` | boolean | false | Show estimated time remaining |
Estimation Algorithm¶
For time remaining estimation (see the sketch after this list):
1. Calculate average time per completed test
2. Multiply by remaining tests
3. Display as est. Xs remaining
4. Update after each test completes
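A sketch of the estimate (the helper name is illustrative):
def estimate_remaining(elapsed_seconds: float, completed: int, total: int) -> float | None:
    """Average time per completed test multiplied by the tests still to run."""
    if completed == 0:
        return None                      # nothing finished yet; no estimate
    return (elapsed_seconds / completed) * (total - completed)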
Acceptance Criteria¶
- Progress percentage is calculated correctly
- Progress shown in plain/CI format
- `--show-progress` controls display
- `--show-eta` shows time estimate
- Estimation becomes more accurate as tests complete
#13: Configurable Refresh Rate¶
Priority: 🟡 Medium
Effort: Low
Files to Modify: cli.py, display.py
Description¶
Allow customization of live display refresh rate.
Problem Statement¶
Current refresh rate (0.77 Hz ≈ 1.3 seconds) is hardcoded. Different scenarios benefit from different rates:
- Fast refresh for local development (smoother UX)
- Slow refresh for CI (less log spam)
- No refresh for file output or very long tests
Solution¶
Add --refresh-rate flag and --no-refresh mode.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--refresh-rate=RATE` | float | `0.77` | Refresh rate in Hz (updates/second) |
| `--no-refresh` | boolean | false | Disable periodic refresh, update only on changes |
Examples¶
# Fast refresh (2x per second)
soup stir --refresh-rate=2.0
# Slow refresh (every 5 seconds)
soup stir --refresh-rate=0.2
# Only update on actual changes
soup stir --no-refresh
Auto-Adjustment¶
In CI mode:
- Default to --no-refresh (only output on changes)
- If refresh rate is specified, honor it
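A sketch of mapping these flags onto Rich's Live display (auto_refresh and refresh_per_second are Rich parameters; the helper name is illustrative):
from rich.live import Live
from rich.table import Table

def make_live(table: Table, refresh_rate: float = 0.77, no_refresh: bool = False) -> Live:
    """--no-refresh repaints only on explicit update(); otherwise honor --refresh-rate."""
    if no_refresh:
        return Live(table, auto_refresh=False)
    return Live(table, refresh_per_second=refresh_rate)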
Environment Variables¶
| Variable | Type | Description |
|---|---|---|
| `SOUP_STIR_REFRESH_RATE` | float | Default refresh rate |
Acceptance Criteria¶
- `--refresh-rate` controls update frequency
- `--no-refresh` only outputs on changes
- CI mode defaults to `--no-refresh`
- Refresh rate is accurate (not drifting)
- Very high refresh rates don't cause performance issues
Low Priority Improvements¶
#14: Colored Output Control¶
Priority: 🟢 Low
Effort: Low
Files to Modify: cli.py, display.py
Description¶
Add control over ANSI color output.
Problem Statement¶
Some CI systems don't render ANSI colors well. Users should be able to disable colors or force them on.
Solution¶
Add --color flag and respect NO_COLOR environment variable.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--color=WHEN` | choice | `auto` | When to use colors: auto, always, never |
| `--no-color` | boolean | false | Shorthand for `--color=never` |
Color Detection (auto mode)¶
# Sketch of the detection rules; the helper name is illustrative.
import os
import sys

def should_use_color(color_flag: str) -> bool:
    """Resolve --color=auto/always/never, honoring the NO_COLOR convention."""
    if color_flag == "always":
        return True
    if color_flag == "never" or os.environ.get("NO_COLOR"):
        return False
    return sys.stdout.isatty() and os.environ.get("TERM") != "dumb"
Environment Variables¶
| Variable | Effect |
|---|---|
| `NO_COLOR` | Disable colors (standard convention) |
| `FORCE_COLOR` | Force colors even in non-TTY |
| `SOUP_STIR_COLOR` | `auto`/`always`/`never` |
Examples¶
# Disable colors
soup stir --no-color
soup stir --color=never
NO_COLOR=1 soup stir
# Force colors (e.g., when piping to less -R)
soup stir --color=always
FORCE_COLOR=1 soup stir
Acceptance Criteria¶
- `--color=auto` auto-detects TTY
- `--color=always` forces colors
- `--color=never` disables colors
- `NO_COLOR` environment variable works
- `FORCE_COLOR` environment variable works
- Emoji are preserved even when colors are disabled
#15: Failure-Only Mode¶
Priority: 🟢 Low
Effort: Low
Files to Modify: executor.py, cli.py
Description¶
Stop test execution after first failure or after N failures.
Problem Statement¶
In CI, sometimes you want fast feedback and don't need to run all tests if one fails. Stopping early saves time and resources.
Solution¶
Add --fail-fast and --fail-threshold flags.
CLI Flags¶
| Flag | Type | Default | Description |
|---|---|---|---|
| `--fail-fast` | boolean | false | Stop after first failure |
| `--fail-threshold=N` | int | unlimited | Stop after N failures |
Behavior¶
Fail Fast (--fail-fast)¶
- Stop immediately when any test fails
- Running tests are allowed to complete
- Pending tests are marked as `SKIPPED`
- Summary shows incomplete status
Fail Threshold (--fail-threshold=N)¶
- Stop after N tests have failed
- Useful for "stop after a few failures" scenarios
- More flexible than `--fail-fast`
Examples¶
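Typical invocations (illustrative):
# Stop after the first failure
soup stir --fail-fast

# Stop after 3 failures
soup stir --fail-threshold=3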
Output Example¶
✅ test-auth - PASS
❌ test-network - FAIL
⏭️ test-database - SKIPPED (fail-fast mode)
⏭️ test-storage - SKIPPED (fail-fast mode)
Stopped early: --fail-fast triggered after 1 failure
Completed: 2/5 tests
Exit Codes¶
| Code | Meaning |
|---|---|
| 1 | Tests failed (normal failure exit code) |
| 1 | Stopped early (fail-fast / fail-threshold); fail-fast doesn't change the exit code |
Acceptance Criteria¶
- `--fail-fast` stops after first failure
- `--fail-threshold=N` stops after N failures
- Running tests complete before stopping
- Pending tests are marked as skipped
- Summary indicates early stop
- Exit code is still 1 (failure)
- JSON output shows which tests were skipped due to fail-fast
Summary Table¶
| # | Improvement | Priority | Effort | Files | Key Features |
|---|---|---|---|---|---|
| 1 | CI Auto-Detection | 🔥 High | Medium | cli.py, display.py | Auto-detect CI, line-by-line output |
| 2 | JSON Output | 🔥 High | Low | cli.py, models.py, reporting.py | --json flag, structured output |
| 3 | JUnit XML | 🔥 High | Medium | cli.py, reporting.py | --junit-xml, CI integration |
| 4 | Format Flag | 🔥 High | Medium | cli.py, display.py, reporting.py | table/plain/json/github/quiet |
| 5 | Timeouts | 🔥 High | Medium | cli.py, executor.py, runtime.py | --timeout, --test-timeout |
| 6 | Parallelism | 🔥 High | Low | cli.py, executor.py, config.py | --jobs=N, -j 1 for serial |
| 7 | Timestamps | 🟡 Medium | Low | display.py | Auto in CI, ISO 8601 / relative |
| 8 | Error Fields | 🟡 Medium | Low | executor.py, models.py | Populate failed_stage, error_message |
| 9 | Log Aggregation | 🟡 Medium | High | terraform.py, cli.py, executor.py | --stream-logs, --aggregate-logs |
| 10 | Summary File | 🟡 Medium | Low | cli.py, reporting.py | --summary-file, json/text/markdown |
| 11 | Phase Timing | 🟡 Medium | Medium | executor.py, display.py, models.py | Per-phase duration tracking |
| 12 | Progress % | 🟡 Medium | Low | display.py | Percentage complete, ETA |
| 13 | Refresh Rate | 🟡 Medium | Low | cli.py, display.py | --refresh-rate, --no-refresh |
| 14 | Color Control | 🟢 Low | Low | cli.py, display.py | --color, respect NO_COLOR |
| 15 | Fail Fast | 🟢 Low | Low | executor.py, cli.py | --fail-fast, --fail-threshold |
Document Version: 1.0.0 · Last Updated: 2025-11-02 · Status: Draft Specification