CI/CD Improvements - Detailed Specifications

This document provides complete technical specifications for all 15 proposed improvements to soup stir for CI/CD environments.

High Priority Improvements

#1: Auto-Detect CI/CD Environments

Priority: 🔥 High | Effort: Medium | Files to Modify: cli.py, display.py

Description

Automatically detect when soup stir is running in a CI/CD environment and adapt the output format to be more suitable for non-interactive contexts.

Problem Statement

The current Rich Live display generates ANSI control codes and frequent updates that clutter CI logs and don't render properly in non-TTY environments. CI build logs become difficult to read and parse.

Solution

Detect CI environment and automatically switch to line-by-line output mode instead of live table updates.

CI Environment Detection

Detect CI by checking (in order):

1. TTY detection: not sys.stdout.isatty()
2. Environment variables (any of):
   • CI=true (generic)
   • GITHUB_ACTIONS=true (GitHub Actions)
   • GITLAB_CI=true (GitLab CI)
   • JENKINS_URL (Jenkins)
   • CIRCLECI=true (CircleCI)
   • TRAVIS=true (Travis CI)
   • BUILDKITE=true (Buildkite)
   • TEAMCITY_VERSION (TeamCity)
   • TF_BUILD=true (Azure Pipelines)
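
A minimal sketch of this detection order (illustrative, not the final cli.py API):

```python
import os
import sys

# The env-var list mirrors the CI systems named above.
_CI_ENV_VARS = (
    "CI", "GITHUB_ACTIONS", "GITLAB_CI", "JENKINS_URL", "CIRCLECI",
    "TRAVIS", "BUILDKITE", "TEAMCITY_VERSION", "TF_BUILD",
)

def detect_ci() -> bool:
    if not sys.stdout.isatty():
        return True
    # JENKINS_URL and TEAMCITY_VERSION are presence-based; the others are
    # truthy strings like "true", so one truthiness check covers both.
    return any(os.environ.get(var) for var in _CI_ENV_VARS)
```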

Behavior Changes in CI Mode

When CI is detected:

  • Disable live table updates - don't use rich.Live()
  • Use line-by-line output - each status change prints a new line
  • Reduce refresh rate - only output on actual status changes
  • Add timestamps - prefix each line with a timestamp (see #7)
  • Simplify formatting - reduce visual complexity

Output Format in CI Mode

[2025-11-02T10:30:00.123Z] 💤 PENDING    1/5  test-auth
[2025-11-02T10:30:01.234Z] 🧹 CLEANING   1/5  test-auth
[2025-11-02T10:30:02.345Z] 🔄 INIT       1/5  test-auth
[2025-11-02T10:30:05.456Z] 🚀 APPLYING   1/5  test-auth - Creating aws_instance.example
[2025-11-02T10:30:12.567Z] 🔬 ANALYZING  1/5  test-auth
[2025-11-02T10:30:13.678Z] 💥 DESTROYING 1/5  test-auth
[2025-11-02T10:30:15.789Z] ✅ PASS       1/5  test-auth (15.7s) - 2 providers, 5 resources
[2025-11-02T10:30:15.890Z] 🧹 CLEANING   2/5  test-network

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --ci | boolean | auto-detect | Force CI mode even in a TTY |
| --no-ci | boolean | auto-detect | Force interactive mode even in CI |

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| SOUP_STIR_CI_MODE | true/false/auto | Override CI detection |

Acceptance Criteria

  • CI environment is detected correctly in all major CI systems
  • Non-TTY environments automatically use line-by-line output
  • --ci flag forces CI mode regardless of environment
  • --no-ci flag forces interactive mode regardless of environment
  • Line-by-line output is clean and parseable
  • Status changes are printed immediately (not buffered)
  • All emoji and color are preserved (unless --no-color is used)

#2: JSON Output for Standard Mode

Priority: 🔥 High | Effort: Low | Files to Modify: cli.py, models.py, reporting.py

Description

Add --json flag to output test results in JSON format for standard (non-matrix) mode.

Problem Statement

Currently --json only works with --matrix mode. Standard test runs have no machine-readable output format, making it difficult to parse results programmatically or integrate with custom tooling.

Solution

Add --json flag that outputs structured JSON to stdout after tests complete.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --json | boolean | false | Output results as JSON |
| --json-pretty | boolean | false | Pretty-print JSON output |

JSON Output Schema

{
  "summary": {
    "total": 5,
    "passed": 4,
    "failed": 1,
    "skipped": 0,
    "duration_seconds": 45.23,
    "start_time": "2025-11-02T10:30:00.123456Z",
    "end_time": "2025-11-02T10:30:45.356789Z",
    "terraform_version": "1.5.7",
    "command": "soup stir /path/to/tests"
  },
  "tests": [
    {
      "name": "test-auth",
      "directory": "/full/path/to/test-auth",
      "status": "passed",
      "duration_seconds": 12.5,
      "start_time": "2025-11-02T10:30:00.123456Z",
      "end_time": "2025-11-02T10:30:12.623456Z",
      "providers": 2,
      "resources": 5,
      "data_sources": 1,
      "functions": 0,
      "ephemeral_functions": 0,
      "outputs": 3,
      "warnings": false,
      "failed_stage": null,
      "error_message": null,
      "logs": {
        "stdout": "/path/to/stdout.log",
        "stderr": "/path/to/stderr.log",
        "terraform": "/path/to/terraform.log"
      }
    },
    {
      "name": "test-network",
      "directory": "/full/path/to/test-network",
      "status": "failed",
      "duration_seconds": 5.2,
      "start_time": "2025-11-02T10:30:12.623456Z",
      "end_time": "2025-11-02T10:30:17.823456Z",
      "providers": 1,
      "resources": 0,
      "data_sources": 0,
      "functions": 0,
      "ephemeral_functions": 0,
      "outputs": 0,
      "warnings": true,
      "failed_stage": "APPLY",
      "error_message": "Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist",
      "logs": {
        "stdout": "/path/to/stdout.log",
        "stderr": "/path/to/stderr.log",
        "terraform": "/path/to/terraform.log"
      }
    }
  ],
  "provider_cache": {
    "status": "success",
    "duration_seconds": 2.3,
    "providers_downloaded": 3
  }
}

Behavior

  • When --json is used:
    • Suppress all Rich display output (no live table, no summary panel)
    • Suppress all console output except the JSON itself
    • Write JSON to stdout after all tests complete
    • Write any errors to stderr
    • Exit with the appropriate code (0 for success, non-zero for failure)
  • When --json-pretty is used:
    • Same as --json but with indentation (2 spaces)
    • Useful for human review
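
The core of this behavior is small; a sketch, assuming `results` is a dict matching the schema above:

```python
import json
import sys

def emit_json(results: dict, pretty: bool = False) -> None:
    """Write results as JSON to stdout; errors belong on stderr."""
    json.dump(results, sys.stdout, indent=2 if pretty else None)
    sys.stdout.write("\n")
```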

Compatibility

  • Mutually exclusive with: --format (if format is not json)
  • Compatible with: all other flags (timeouts, parallelism, etc.)
  • --json implies --no-ci for display purposes (no live updates)

Acceptance Criteria

  • --json outputs valid JSON to stdout
  • JSON schema matches specification
  • All test result fields are populated correctly
  • Failed tests include failed_stage and error_message
  • No other output appears on stdout when using --json
  • Errors and warnings go to stderr, not stdout
  • JSON is parseable by jq and other tools
  • Exit code reflects test results (0=all passed, non-zero=failures)

#3: JUnit XML Output

Priority: 🔥 High | Effort: Medium | Files to Modify: cli.py, reporting.py

Description

Generate JUnit XML test reports for integration with CI/CD systems.

Problem Statement

Most CI/CD systems (Jenkins, GitHub Actions, GitLab CI, CircleCI, etc.) have native support for displaying JUnit XML test results. This enables:

  • Visual test result dashboards
  • Historical trending
  • Flaky test detection
  • Test failure notifications

Solution

Add --junit-xml flag to generate JUnit XML compatible test reports.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --junit-xml=FILE | path | none | Write JUnit XML to FILE |
| --junit-suite-name=NAME | string | "soup-stir" | Test suite name in XML |

JUnit XML Format

<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="soup-stir" tests="5" failures="1" errors="0" skipped="0" time="45.23" timestamp="2025-11-02T10:30:00.123456Z">
  <testsuite name="terraform-tests" tests="5" failures="1" errors="0" skipped="0" time="45.23" timestamp="2025-11-02T10:30:00.123456Z">

    <!-- Passed test -->
    <testcase name="test-auth" classname="terraform.test-auth" time="12.5" timestamp="2025-11-02T10:30:00.123456Z">
      <system-out><![CDATA[
Providers: 2
Resources: 5
Data Sources: 1
Outputs: 3
Warnings: false
      ]]></system-out>
    </testcase>

    <!-- Failed test -->
    <testcase name="test-network" classname="terraform.test-network" time="5.2" timestamp="2025-11-02T10:30:12.623456Z">
      <failure message="Terraform apply failed" type="TerraformApplyError"><![CDATA[
Stage: APPLY
Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist

Terraform Log: /path/to/terraform.log
Stdout Log: /path/to/stdout.log
Stderr Log: /path/to/stderr.log
      ]]></failure>
      <system-out><![CDATA[
Providers: 1
Resources: 0
      ]]></system-out>
      <system-err><![CDATA[
Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist
      ]]></system-err>
    </testcase>

    <!-- Skipped test -->
    <testcase name="test-empty" classname="terraform.test-empty" time="0.1" timestamp="2025-11-02T10:30:17.823456Z">
      <skipped message="No .tf files found" />
    </testcase>

  </testsuite>
</testsuites>

Field Mapping

| JUnit Field | Source | Notes |
|-------------|--------|-------|
| testsuite/@name | --junit-suite-name or "terraform-tests" | Suite name |
| testsuite/@tests | Count of all tests | Total tests run |
| testsuite/@failures | Count of failed tests | Tests with status="failed" |
| testsuite/@errors | Count of error tests | Tests with ERROR state (exception in harness) |
| testsuite/@skipped | Count of skipped tests | Tests with status="skipped" |
| testsuite/@time | Total duration in seconds | Sum of all test times |
| testcase/@name | Test directory name | e.g., test-auth |
| testcase/@classname | "terraform." + test name | Namespacing for CI systems |
| testcase/@time | Test duration in seconds | From TestResult.duration |
| testcase/failure/@message | error_message field | First line of error |
| testcase/failure/@type | Derived from failed_stage | e.g., TerraformInitError, TerraformApplyError |
| testcase/failure/text() | Full error with context | Error message + log paths |
| testcase/system-out | Test metadata | Providers, resources, outputs counts |
| testcase/system-err | Error logs | Terraform error output |

Behavior

  • Write XML to specified file path
  • Create parent directories if they don't exist
  • Overwrite file if it already exists
  • Continue to display results to terminal (unless --quiet)
  • Compatible with --json (both files can be generated)
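
A sketch of generating this structure with the standard library's xml.etree.ElementTree; the (name, status, seconds, message) tuple is an illustrative stand-in for TestResult, not the final reporting.py API:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def write_junit_xml(path: str, suite_name: str,
                    cases: list[tuple[str, str, float, str]]) -> None:
    """Write (name, status, seconds, message) tuples as JUnit XML."""
    failures = sum(1 for _, status, _, _ in cases if status == "failed")
    suites = ET.Element("testsuites", name=suite_name,
                        tests=str(len(cases)), failures=str(failures))
    suite = ET.SubElement(suites, "testsuite", name="terraform-tests",
                          tests=str(len(cases)), failures=str(failures))
    for name, status, seconds, message in cases:
        case = ET.SubElement(suite, "testcase", name=name,
                             classname=f"terraform.{name}", time=f"{seconds:.1f}")
        if status == "failed":
            ET.SubElement(case, "failure", message=message)
        elif status == "skipped":
            ET.SubElement(case, "skipped", message=message)
    # Create parent directories and overwrite any existing file.
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(suites).write(path, encoding="UTF-8", xml_declaration=True)
```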

Acceptance Criteria

  • --junit-xml creates valid JUnit XML file
  • XML validates against JUnit XSD schema
  • All test results are represented correctly
  • Failed tests include failure message and details
  • Skipped tests are marked with <skipped> element
  • Timestamps use ISO 8601 format
  • File is created even if tests fail
  • XML is parseable by Jenkins, GitHub Actions, GitLab CI
  • Parent directories are created automatically
  • Existing file is overwritten

#4: Format Flag with Multiple Output Modes

Priority: 🔥 High | Effort: Medium | Files to Modify: cli.py, display.py, reporting.py

Description

Unified --format flag to control output style for different use cases.

Problem Statement

Different contexts require different output styles. A single flag to control output format is more intuitive than multiple boolean flags.

Solution

Add --format flag with multiple predefined output modes.

CLI Flags

| Flag | Type | Values | Default | Description |
|------|------|--------|---------|-------------|
| --format | choice | table, plain, json, github, quiet | table (or plain if CI detected) | Output format |

Format Modes

1. table - Rich Live Table (Default Interactive)
  • Current behavior: Rich live table with colors and emoji
  • Auto-refresh at configured rate
  • Full visual display with all columns
  • Best for: Interactive terminal use
2. plain - Plain Text Line-by-Line (Default CI)
  • Line-by-line status updates
  • Timestamps on each line
  • Emoji preserved, colors preserved
  • No live updates (only print on change)
  • Best for: CI/CD logs, file output

Example:

[2025-11-02T10:30:00Z] 💤 PENDING    1/5  test-auth
[2025-11-02T10:30:01Z] 🔄 INIT       1/5  test-auth
[2025-11-02T10:30:05Z] 🚀 APPLYING   1/5  test-auth
[2025-11-02T10:30:12Z] ✅ PASS       1/5  test-auth (12.5s)

3. json - JSON Output
  • Same as --json flag
  • Outputs structured JSON to stdout
  • Suppresses all other output
  • Best for: Programmatic consumption, custom tooling
4. github - GitHub Actions Annotations
  • Outputs GitHub Actions workflow commands
  • Groups tests with ::group:: / ::endgroup::
  • Errors use ::error:: annotations
  • Warnings use ::warning:: annotations
  • Best for: GitHub Actions workflows

Example:

::group::Test: test-auth
🔄 Running test-auth...
✅ test-auth passed in 12.5s
  Providers: 2, Resources: 5, Outputs: 3
::endgroup::

::group::Test: test-network
🔄 Running test-network...
::error file=test-network/main.tf,line=15::Terraform apply failed: InvalidAMI: The image id '[ami-12345]' does not exist
❌ test-network failed in 5.2s
::endgroup::

5. quiet - Minimal Output
  • Only print summary at end
  • No progress updates
  • No live display
  • Errors still printed
  • Best for: Scripts, when you only care about final result

Example:

Running 5 tests...
Done. 4 passed, 1 failed in 45.2s

Behavior

  • --format is mutually exclusive with --json (or --json implies --format=json)
  • Auto-detection: If CI detected and no --format specified, use plain
  • Can be overridden: --format=table forces table even in CI
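
A sketch of one plausible resolution order (explicit flag, then SOUP_STIR_FORMAT, then auto-detection; the flag-over-env precedence is an assumption, since the spec only pins down the auto-detection step):

```python
import os
import sys

def resolve_format(cli_format: str | None) -> str:
    if cli_format is not None:
        return cli_format                       # explicit --format wins
    env_format = os.environ.get("SOUP_STIR_FORMAT")
    if env_format:
        return env_format
    # Fall back to auto-detection (see detect_ci() in #1 for the full check).
    ci = (not sys.stdout.isatty()) or bool(os.environ.get("CI"))
    return "plain" if ci else "table"
```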

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| SOUP_STIR_FORMAT | table/plain/json/github/quiet | Default format |

Acceptance Criteria

  • --format=table uses Rich live table
  • --format=plain uses line-by-line output
  • --format=json outputs valid JSON
  • --format=github outputs valid GitHub Actions annotations
  • --format=quiet shows only summary
  • Auto-detection works (CI → plain, interactive → table)
  • SOUP_STIR_FORMAT environment variable works
  • Each format is properly tested

#5: Timeout Controls

Priority: 🔥 High | Effort: Medium | Files to Modify: cli.py, executor.py, runtime.py

Description

Add timeout controls for tests to prevent hanging in CI/CD environments.

Problem Statement

Currently, standard tests have no timeout mechanism. A misbehaving test can run indefinitely, blocking CI pipelines and wasting resources. Matrix testing has timeouts, but standard mode does not.

Solution

Add global and per-test timeout controls.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --timeout=SECONDS | int | unlimited | Global timeout for the entire test suite (seconds) |
| --test-timeout=SECONDS | int | unlimited | Timeout per individual test (seconds) |

Environment Variables

| Variable | Type | Description |
|----------|------|-------------|
| SOUP_STIR_TIMEOUT | int | Default global timeout (seconds) |
| SOUP_STIR_TEST_TIMEOUT | int | Default per-test timeout (seconds) |

Timeout Behavior

Per-Test Timeout (--test-timeout)

  • Applies to each individual test
  • Timer starts when the test begins execution (CLEANING phase)
  • Timer stops when the test completes (PASS/FAIL/SKIP)
  • If the timeout is exceeded:
    • The test is terminated (SIGTERM, then SIGKILL after a grace period)
    • The test is marked with a special status: TIMEOUT
    • The test counts as a failure in the summary
    • Remaining tests continue

Global Timeout (--timeout)

  • Applies to the entire test suite
  • Timer starts when the first test begins
  • Timer stops when all tests complete or the timeout is hit
  • If the timeout is exceeded:
    • All running tests are terminated
    • Pending tests are marked as SKIPPED
    • The summary shows incomplete status
    • Exit with the timeout-specific exit code (124)

Timeout Grace Period

  • When a timeout is hit, send SIGTERM to the subprocess
  • Wait 5 seconds for graceful termination
  • If still running, send SIGKILL
  • Mark the test as TIMEOUT with a message about the forceful termination (see the sketch below)
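
A sketch of the grace-period escalation, assuming each test's Terraform process is a subprocess.Popen (the helper name is illustrative):

```python
import signal
import subprocess

def terminate_with_grace(proc: subprocess.Popen,
                         grace_seconds: float = 5.0) -> None:
    """SIGTERM first; escalate to SIGKILL if the grace period expires."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL: forceful termination
        proc.wait()  # reap the process so it doesn't linger as a zombie
```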

Output Examples

Test Timeout:

⏱️  TIMEOUT   3/5  test-slow (300.0s) - Exceeded --test-timeout=300

Global Timeout:

⏱️  Suite timed out after 600s (--timeout=600)
   Completed: 3/5 tests
   Running: 2 tests (terminated)
   Pending: 0 tests

Exit Codes

| Code | Meaning |
|------|---------|
| 0 | All tests passed |
| 1 | One or more tests failed |
| 124 | Global timeout exceeded (follows the GNU timeout convention) |
| 125 | Test timeout exceeded |

JSON Output with Timeouts

{
  "summary": {
    "total": 5,
    "passed": 2,
    "failed": 1,
    "timeout": 1,
    "skipped": 1,
    "duration_seconds": 600.0,
    "timeout_exceeded": true,
    "timeout_type": "global"
  },
  "tests": [
    {
      "name": "test-slow",
      "status": "timeout",
      "duration_seconds": 300.0,
      "timeout_seconds": 300,
      "error_message": "Test exceeded timeout of 300 seconds"
    }
  ]
}

Acceptance Criteria

  • --timeout enforces global timeout
  • --test-timeout enforces per-test timeout
  • Timeout tests are marked distinctly (TIMEOUT status)
  • Timeout tests show in summary as separate category
  • Exit code is 124 for global timeout, 125 for test timeout
  • Graceful termination is attempted (SIGTERM before SIGKILL)
  • JSON output includes timeout information
  • JUnit XML marks timeout tests appropriately
  • Environment variables work as defaults

#6: Parallelism Control

Priority: 🔥 High | Effort: Low | Files to Modify: cli.py, executor.py, config.py

Description

Add control over test parallelism to allow serial execution or custom concurrency limits.

Problem Statement

Currently, parallelism is hardcoded to os.cpu_count(). This is not always optimal:

  • In CI with limited resources, you may want to reduce parallelism
  • For debugging, serial execution (-j 1) is often necessary
  • Some CI environments have quotas that limit concurrent operations

Solution

Add --jobs flag to control parallelism, similar to make -j.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --jobs=N, -j N | int | auto | Number of tests to run in parallel |
| -j (no value) | flag | auto | Use auto-detection (all CPUs) |

Special values:

  • --jobs=1 or -j 1: Serial execution (one test at a time)
  • --jobs=0 or --jobs=auto: Auto-detect (current behavior: os.cpu_count())
  • --jobs=N: Run up to N tests in parallel

Environment Variables

| Variable | Type | Description |
|----------|------|-------------|
| SOUP_STIR_JOBS | int/auto | Default parallelism level |

Examples

# Serial execution (debugging)
soup stir --jobs=1
soup stir -j 1

# Limit to 2 parallel tests
soup stir --jobs=2
soup stir -j 2

# Use all CPUs (default)
soup stir --jobs=auto
soup stir -j

# Environment variable
export SOUP_STIR_JOBS=4
soup stir

Output Changes

Display effective parallelism at start:

Running 5 tests with parallelism=1 (serial mode)...
Running 5 tests with parallelism=4 (4 concurrent)...
Running 5 tests with parallelism=12 (auto-detected CPUs)...

Behavior

  • Parallelism controls the semaphore in execute_tests() (sketched below)
  • Serial mode (-j 1) is deterministic (tests run in sorted order)
  • Progress is easier to follow in serial mode
  • Useful for:
    • Debugging test failures
    • CI environments with resource constraints
    • Avoiding rate limits on cloud providers
    • Ensuring deterministic test order
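
A minimal sketch of the semaphore-bounded execution; run_test here is a placeholder for the real clean/init/apply pipeline:

```python
import asyncio

async def run_test(test_dir: str) -> str:
    """Placeholder for the real per-test pipeline (clean/init/apply/...)."""
    await asyncio.sleep(0)
    return f"{test_dir}: PASS"

async def execute_tests(test_dirs: list[str], jobs: int) -> list[str]:
    # A semaphore caps how many tests run at once; --jobs sets its size.
    semaphore = asyncio.Semaphore(jobs)

    async def run_bounded(test_dir: str) -> str:
        async with semaphore:
            return await run_test(test_dir)

    # sorted() keeps serial (-j 1) runs in a deterministic order.
    return await asyncio.gather(*(run_bounded(d) for d in sorted(test_dirs)))
```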

Acceptance Criteria

  • --jobs=N limits concurrent tests to N
  • -j 1 runs tests serially (one at a time)
  • -j (no value) or --jobs=auto uses all CPUs
  • Effective parallelism is displayed at start
  • SOUP_STIR_JOBS environment variable works
  • Serial mode runs tests in deterministic order
  • Parallelism is reflected in live display (number of active tests)

Medium Priority Improvements

#7: Timestamps in CI Mode

Priority: 🟡 Medium | Effort: Low | Files to Modify: display.py

Description

Add timestamps to each output line when running in CI/CD mode.

Problem Statement

In CI logs, it's difficult to correlate events or understand when things happened without timestamps. This is especially problematic for long-running tests or when debugging timing issues.

Solution

Automatically add timestamps when in CI mode or when explicitly requested.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --timestamps | boolean | auto (true in CI) | Show timestamps on each line |
| --no-timestamps | boolean | false | Disable timestamps even in CI |
| --timestamp-format | choice | iso8601 | Timestamp format |

Timestamp Formats

| Format | Example | Description |
|--------|---------|-------------|
| iso8601 | 2025-11-02T10:30:15.123456Z | ISO 8601 with microseconds (default) |
| iso8601-simple | 2025-11-02T10:30:15Z | ISO 8601 without microseconds |
| relative | [+00:15.2s] | Relative to test suite start |
| elapsed | [00:15.2] | Same as relative, different format |
| unix | 1762079415.123456 | Unix timestamp with microseconds |

Output Examples

ISO 8601 (default):

[2025-11-02T10:30:00.123456Z] 💤 PENDING    1/5  test-auth
[2025-11-02T10:30:01.234567Z] 🔄 INIT       1/5  test-auth

Relative:

[+00:00.0s] 💤 PENDING    1/5  test-auth
[+00:01.1s] 🔄 INIT       1/5  test-auth
[+00:15.2s] ✅ PASS       1/5  test-auth
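
Sketches of the two most common formats (helper names are illustrative, not the display.py API; suite_start is a time.monotonic() reading taken when the run began):

```python
import time
from datetime import datetime, timezone

def iso8601_now() -> str:
    """UTC wall-clock timestamp with microseconds, ISO 8601 style."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")

def relative_since(suite_start: float) -> str:
    """Elapsed time since suite start, e.g. '+00:15.2s'."""
    minutes, seconds = divmod(time.monotonic() - suite_start, 60)
    return f"+{int(minutes):02d}:{seconds:04.1f}s"
```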

Behavior

  • Auto-enabled in CI mode
  • Can be forced with --timestamps
  • Can be disabled with --no-timestamps
  • Format controlled by --timestamp-format
  • Timestamps use UTC for consistency across CI environments

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| SOUP_STIR_TIMESTAMPS | true/false/auto | Enable timestamps |
| SOUP_STIR_TIMESTAMP_FORMAT | format name | Timestamp format |

Acceptance Criteria

  • Timestamps are auto-enabled in CI mode
  • --timestamps forces timestamps in interactive mode
  • --no-timestamps disables in CI mode
  • All timestamp formats work correctly
  • Timestamps are aligned and don't break formatting
  • Timestamps use UTC timezone

#8: Populate failed_stage and error_message Fields

Priority: 🟡 Medium | Effort: Low | Files to Modify: executor.py, models.py

Description

Populate the currently-empty failed_stage and error_message fields in TestResult.

Problem Statement

The TestResult data structure has failed_stage and error_message fields, but they are never populated. This makes it harder to analyze failures programmatically.

Solution

Track which stage failed and extract the error message.

Failed Stage Values

| Stage | When Set | Description |
|-------|----------|-------------|
| null | Test passed | No failure |
| "INIT" | terraform init failed | Initialization failure |
| "APPLY" | terraform apply failed | Apply failure |
| "DESTROY" | terraform destroy failed | Destroy failure (rare) |
| "ANALYZING" | JSON parsing failed | State analysis failure |
| "HARNESS" | Python exception | Test harness error (not Terraform) |

Error Message Extraction

Extract the error message from parsed Terraform logs:

1. Find the first log entry with @level == "error"
2. Extract the @message field
3. If the error is structured, extract the relevant fields
4. Truncate to a reasonable length (e.g., 500 chars)
5. Store in the error_message field
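
A sketch of these steps against Terraform's JSON log stream (one JSON object per line; the helper name is illustrative):

```python
import json

def extract_error_message(log_path: str, max_len: int = 500) -> str | None:
    """Return the first @level == "error" message from a JSON log file."""
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate non-JSON lines in the stream
            if entry.get("@level") == "error":
                return str(entry.get("@message", ""))[:max_len]
    return None  # no error entry found
```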

Example

Before (current):

TestResult(
    directory="test-network",
    success=False,
    failed_stage=None,  # Not populated!
    error_message=None,  # Not populated!
    ...
)

After (improved):

TestResult(
    directory="test-network",
    success=False,
    failed_stage="APPLY",
    error_message="Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist",
    ...
)

JSON Output

{
  "name": "test-network",
  "status": "failed",
  "failed_stage": "APPLY",
  "error_message": "Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist"
}

JUnit XML Output

<testcase name="test-network" classname="terraform.test-network" time="5.2">
  <failure message="Error: aws_instance.example: InvalidAMI" type="TerraformApplyError">
    Stage: APPLY
    Error: aws_instance.example: InvalidAMI: The image id '[ami-12345]' does not exist
  </failure>
</testcase>

Acceptance Criteria

  • failed_stage is populated for all failures
  • error_message contains first error from logs
  • Harness exceptions set failed_stage="HARNESS"
  • Error messages are truncated if too long
  • JSON output includes these fields
  • JUnit XML uses these fields
  • Passed tests have null for both fields

#9: Log Aggregation & Streaming

Priority: 🟡 Medium | Effort: High | Files to Modify: terraform.py, cli.py, executor.py

Description

Provide options to aggregate logs or stream them to stdout for better CI integration.

Problem Statement

Logs are scattered across multiple directories, making them hard to access in CI environments. Developers need to download artifacts and navigate directory structures to find relevant logs.

Solution

Add options to stream logs in real-time or aggregate them into a single file.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --stream-logs | boolean | false | Stream all Terraform logs to stdout |
| --aggregate-logs=FILE | path | none | Aggregate all logs into a single file |
| --logs-dir=DIR | path | auto | Custom directory for log files |

Stream Logs Mode (--stream-logs)

Stream all Terraform output to stdout in real-time:

[test-auth:init] Initializing the backend...
[test-auth:init] Initializing provider plugins...
[test-auth:apply] Terraform will perform the following actions:
[test-auth:apply]   # aws_instance.example will be created
[test-network:init] Initializing the backend...

Features:

  • Prefix each line with test name and phase
  • Color-code by test (if colors are enabled)
  • Interleave logs from parallel tests
  • Include timestamps
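
A sketch of the line prefixing, assuming each test phase runs as an asyncio subprocess whose stdout is relayed as it arrives (names are illustrative):

```python
import asyncio

async def stream_prefixed(proc: asyncio.subprocess.Process,
                          test: str, phase: str) -> None:
    """Relay a subprocess's stdout with a [test:phase] prefix."""
    assert proc.stdout is not None  # requires stdout=asyncio.subprocess.PIPE
    async for raw in proc.stdout:
        print(f"[{test}:{phase}] {raw.decode(errors='replace').rstrip()}",
              flush=True)  # flush so CI log viewers see lines immediately
```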

Aggregate Logs Mode (--aggregate-logs)

Write all logs to a single file:

soup stir --aggregate-logs=all-tests.log

File format:

========================================
Test: test-auth
Phase: init
Start: 2025-11-02T10:30:00Z
========================================
Initializing the backend...
Initializing provider plugins...
...

========================================
Test: test-auth
Phase: apply
Start: 2025-11-02T10:30:05Z
========================================
Terraform will perform the following actions:
...

Custom Logs Directory (--logs-dir)

Override default log location:

soup stir --logs-dir=/tmp/stir-logs

  • All log files are written to the specified directory
  • Useful for CI artifact collection
  • Can be ephemeral or persistent

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| SOUP_STIR_STREAM_LOGS | true/false | Enable log streaming |
| SOUP_STIR_LOGS_DIR | path | Default logs directory |

Acceptance Criteria

  • --stream-logs streams all logs to stdout
  • Logs are prefixed with test name and phase
  • Parallel test logs are interleaved correctly
  • --aggregate-logs creates single log file
  • Aggregated logs are properly sectioned by test
  • --logs-dir changes log output directory
  • Directory is created if it doesn't exist
  • Compatible with other output formats

#10: Summary File Output

Priority: 🟡 Medium | Effort: Low | Files to Modify: cli.py, reporting.py

Description

Save test summary to a file for later analysis or CI artifact collection.

Problem Statement

The test summary is only printed to the terminal and is not persisted. In CI, it's useful to have a summary file that can be:

  • Uploaded as an artifact
  • Parsed by other tools
  • Used for notifications or reports

Solution

Add --summary-file flag to save summary in various formats.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --summary-file=FILE | path | none | Save summary to file |
| --summary-format | choice | json | Summary file format (json/text/markdown) |

Summary Formats

JSON Format

{
  "summary": {
    "total": 5,
    "passed": 4,
    "failed": 1,
    "skipped": 0,
    "timeout": 0,
    "duration_seconds": 45.23,
    "start_time": "2025-11-02T10:30:00Z",
    "end_time": "2025-11-02T10:30:45Z"
  },
  "passed": ["test-auth", "test-network", "test-storage", "test-compute"],
  "failed": ["test-database"],
  "skipped": [],
  "timeout": []
}

Text Format

TofuSoup Test Summary
=====================
Total:    5
Passed:   4
Failed:   1
Skipped:  0
Duration: 45.2s

Passed Tests:
  - test-auth
  - test-network
  - test-storage
  - test-compute

Failed Tests:
  - test-database

Generated: 2025-11-02T10:30:45Z

Markdown Format

# TofuSoup Test Summary

**Duration**: 45.2s
**Generated**: 2025-11-02T10:30:45Z

## Results

| Metric | Count |
|--------|-------|
| Total | 5 |
| ✅ Passed | 4 |
| ❌ Failed | 1 |
| ⏭️ Skipped | 0 |

## Passed Tests

- ✅ test-auth
- ✅ test-network
- ✅ test-storage
- ✅ test-compute

## Failed Tests

- ❌ test-database

Behavior

  • Write summary file after all tests complete
  • Create parent directories if needed
  • Overwrite existing file
  • Continue to show summary on terminal (unless --quiet)

Acceptance Criteria

  • --summary-file creates summary file
  • JSON format is valid and parseable
  • Text format is human-readable
  • Markdown format renders properly
  • --summary-format controls format
  • Parent directories are created automatically
  • File is created even if tests fail

#11: Per-Phase Timing Breakdown

Priority: 🟡 Medium | Effort: Medium | Files to Modify: executor.py, display.py, models.py

Description

Track and display timing for each phase of test execution.

Problem Statement

Currently only total test time is shown. When optimizing tests or diagnosing slow CI builds, it's helpful to know which phase is slow (INIT, APPLY, DESTROY, etc.).

Solution

Track timestamp at each phase transition and calculate phase durations.

Implementation

Add phase timing to TestResult:

from typing import NamedTuple

class TestResult(NamedTuple):
    # ... existing fields ...
    phase_timings: dict[str, float]  # Phase name -> duration in seconds

Example:

{
    "CLEANING": 0.5,
    "INIT": 2.0,
    "APPLYING": 8.0,
    "ANALYZING": 0.5,
    "DESTROYING": 1.5
}
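
A sketch of capturing these timings with a monotonic clock (the context-manager name is illustrative):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_phase(timings: dict[str, float], phase: str):
    """Record the wall time of a phase using the monotonic clock."""
    start = time.monotonic()
    try:
        yield
    finally:
        timings[phase] = time.monotonic() - start

# Usage (run_init is a placeholder for the real phase function):
#   timings: dict[str, float] = {}
#   with timed_phase(timings, "INIT"):
#       run_init()
```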

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --show-phase-timing | boolean | false | Show per-phase timing in output |

Output Examples

Terminal Output (with --show-phase-timing):

✅ test-auth - PASS (12.5s total)
   CLEANING:   0.5s (  4%)
   INIT:       2.0s ( 16%)
   APPLYING:   8.0s ( 64%)
   ANALYZING:  0.5s (  4%)
   DESTROYING: 1.5s ( 12%)

JSON Output:

{
  "name": "test-auth",
  "duration_seconds": 12.5,
  "phase_timings": {
    "CLEANING": 0.5,
    "INIT": 2.0,
    "APPLYING": 8.0,
    "ANALYZING": 0.5,
    "DESTROYING": 1.5
  }
}

Acceptance Criteria

  • Phase timings are tracked for all tests
  • --show-phase-timing displays breakdown
  • Percentages are calculated correctly
  • JSON output includes phase timings
  • Works with all output formats
  • Timing is accurate (uses monotonic clock)

#12: Progress Percentage Indicator

Priority: 🟡 Medium | Effort: Low | Files to Modify: display.py

Description

Show overall progress as a percentage.

Problem Statement

In CI logs or when running many tests, it's hard to gauge overall progress. A simple percentage helps set expectations.

Solution

Calculate and display progress percentage.

Formula

Progress % = (completed_tests / total_tests) * 100

Where completed_tests = passed + failed + skipped + timeout

Output Examples

Plain Format:

[20%] (1/5) ✅ test-auth - PASS
[40%] (2/5) ✅ test-network - PASS
[60%] (3/5) ❌ test-database - FAIL
[80%] (4/5) ✅ test-storage - PASS
[100%] (5/5) ✅ test-compute - PASS

With Estimated Time Remaining:

[20%] (1/5) ✅ test-auth - PASS - est. 60s remaining
[40%] (2/5) ✅ test-network - PASS - est. 45s remaining

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --show-progress | boolean | auto (true in CI) | Show progress percentage |
| --show-eta | boolean | false | Show estimated time remaining |

Estimation Algorithm

For time-remaining estimation:

1. Calculate the average time per completed test
2. Multiply by the number of remaining tests
3. Display as est. Xs remaining
4. Update after each test completes
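
The arithmetic is a one-liner; a sketch with illustrative names:

```python
def estimate_remaining(completed: int, total: int,
                       elapsed_seconds: float) -> float | None:
    """Average time per finished test, multiplied by what's left."""
    if completed == 0:
        return None  # nothing finished yet, no basis for an estimate
    return (elapsed_seconds / completed) * (total - completed)
```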

Acceptance Criteria

  • Progress percentage is calculated correctly
  • Progress shown in plain/CI format
  • --show-progress controls display
  • --show-eta shows time estimate
  • Estimation becomes more accurate as tests complete

#13: Configurable Refresh Rate

Priority: 🟡 Medium | Effort: Low | Files to Modify: cli.py, display.py

Description

Allow customization of live display refresh rate.

Problem Statement

The current refresh rate (0.77 Hz ≈ 1.3 seconds) is hardcoded. Different scenarios benefit from different rates:

  • Fast refresh for local development (smoother UX)
  • Slow refresh for CI (less log spam)
  • No refresh for file output or very long tests

Solution

Add --refresh-rate flag and --no-refresh mode.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --refresh-rate=RATE | float | 0.77 | Refresh rate in Hz (updates/second) |
| --no-refresh | boolean | false | Disable periodic refresh; update only on changes |

Examples

# Fast refresh (2x per second)
soup stir --refresh-rate=2.0

# Slow refresh (every 5 seconds)
soup stir --refresh-rate=0.2

# Only update on actual changes
soup stir --no-refresh

Auto-Adjustment

In CI mode:

  • Default to --no-refresh (only output on changes)
  • If a refresh rate is explicitly specified, honor it

Environment Variables

| Variable | Type | Description |
|----------|------|-------------|
| SOUP_STIR_REFRESH_RATE | float | Default refresh rate |

Acceptance Criteria

  • --refresh-rate controls update frequency
  • --no-refresh only outputs on changes
  • CI mode defaults to --no-refresh
  • Refresh rate is accurate (not drifting)
  • Very high refresh rates don't cause performance issues

Low Priority Improvements

#14: Colored Output Control

Priority: 🟢 Low | Effort: Low | Files to Modify: cli.py, display.py

Description

Add control over ANSI color output.

Problem Statement

Some CI systems don't render ANSI colors well. Users should be able to disable colors or force them on.

Solution

Add --color flag and respect NO_COLOR environment variable.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --color=WHEN | choice | auto | When to use colors: auto, always, never |
| --no-color | boolean | false | Shorthand for --color=never |

Color Detection (auto mode)

import os
import sys

def should_use_colors(when: str) -> bool:
    """Resolve --color=WHEN (auto/always/never) against the environment."""
    if when == "always":
        return True
    if when == "never" or "NO_COLOR" in os.environ:
        return False
    if "FORCE_COLOR" in os.environ:  # see Environment Variables below
        return True
    return sys.stdout.isatty() and os.environ.get("TERM") != "dumb"

Environment Variables

| Variable | Effect |
|----------|--------|
| NO_COLOR | Disable colors (standard convention) |
| FORCE_COLOR | Force colors even in non-TTY |
| SOUP_STIR_COLOR | auto/always/never |

Examples

# Disable colors
soup stir --no-color
soup stir --color=never
NO_COLOR=1 soup stir

# Force colors (e.g., when piping to less -R)
soup stir --color=always
FORCE_COLOR=1 soup stir

Acceptance Criteria

  • --color=auto auto-detects TTY
  • --color=always forces colors
  • --color=never disables colors
  • NO_COLOR environment variable works
  • FORCE_COLOR environment variable works
  • Emoji are preserved even when colors are disabled

#15: Failure-Only Mode

Priority: 🟢 Low | Effort: Low | Files to Modify: executor.py, cli.py

Description

Stop test execution after first failure or after N failures.

Problem Statement

In CI, sometimes you want fast feedback and don't need to run all tests if one fails. Stopping early saves time and resources.

Solution

Add --fail-fast and --fail-threshold flags.

CLI Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| --fail-fast | boolean | false | Stop after first failure |
| --fail-threshold=N | int | unlimited | Stop after N failures |

Behavior

Fail Fast (--fail-fast)

  • Stop immediately when any test fails
  • Running tests are allowed to complete
  • Pending tests are marked as SKIPPED
  • Summary shows incomplete status

Fail Threshold (--fail-threshold=N)

  • Stop after N tests have failed
  • Useful for "stop after a few failures" scenarios
  • More flexible than --fail-fast (the bookkeeping is sketched below)
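
A sketch of that bookkeeping, assuming tests run as asyncio tasks; a shared counter trips an event once the threshold is reached (names are illustrative):

```python
import asyncio

class FailureGate:
    """Trip an event once the failure threshold is reached."""

    def __init__(self, threshold: int | None) -> None:
        self.threshold = threshold      # None = never stop early
        self.failures = 0
        self.stop = asyncio.Event()

    def record_failure(self) -> None:
        self.failures += 1
        if self.threshold is not None and self.failures >= self.threshold:
            self.stop.set()             # pending tests see this and skip

# --fail-fast behaves like FailureGate(threshold=1);
# --fail-threshold=N like FailureGate(threshold=N).
```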

Examples

# Stop at first failure
soup stir --fail-fast

# Stop after 3 failures
soup stir --fail-threshold=3

Output Example

✅ test-auth - PASS
❌ test-network - FAIL
⏭️ test-database - SKIPPED (fail-fast mode)
⏭️ test-storage - SKIPPED (fail-fast mode)

Stopped early: --fail-fast triggered after 1 failure
Completed: 2/5 tests

Exit Codes

| Code | Meaning |
|------|---------|
| 1 | Tests failed (normal failure exit code) |

Stopping early does not change the exit code: a fail-fast run that hits a failure still exits with code 1.

Acceptance Criteria

  • --fail-fast stops after first failure
  • --fail-threshold=N stops after N failures
  • Running tests complete before stopping
  • Pending tests are marked as skipped
  • Summary indicates early stop
  • Exit code is still 1 (failure)
  • JSON output shows which tests were skipped due to fail-fast

Summary Table

| # | Improvement | Priority | Effort | Files | Key Features |
|---|-------------|----------|--------|-------|--------------|
| 1 | CI Auto-Detection | 🔥 High | Medium | cli.py, display.py | Auto-detect CI, line-by-line output |
| 2 | JSON Output | 🔥 High | Low | cli.py, models.py, reporting.py | --json flag, structured output |
| 3 | JUnit XML | 🔥 High | Medium | cli.py, reporting.py | --junit-xml, CI integration |
| 4 | Format Flag | 🔥 High | Medium | cli.py, display.py, reporting.py | table/plain/json/github/quiet |
| 5 | Timeouts | 🔥 High | Medium | cli.py, executor.py, runtime.py | --timeout, --test-timeout |
| 6 | Parallelism | 🔥 High | Low | cli.py, executor.py, config.py | --jobs=N, -j 1 for serial |
| 7 | Timestamps | 🟡 Medium | Low | display.py | Auto in CI, ISO 8601 / relative |
| 8 | Error Fields | 🟡 Medium | Low | executor.py, models.py | Populate failed_stage, error_message |
| 9 | Log Aggregation | 🟡 Medium | High | terraform.py, cli.py, executor.py | --stream-logs, --aggregate-logs |
| 10 | Summary File | 🟡 Medium | Low | cli.py, reporting.py | --summary-file, json/text/markdown |
| 11 | Phase Timing | 🟡 Medium | Medium | executor.py, display.py, models.py | Per-phase duration tracking |
| 12 | Progress % | 🟡 Medium | Low | display.py | Percentage complete, ETA |
| 13 | Refresh Rate | 🟡 Medium | Low | cli.py, display.py | --refresh-rate, --no-refresh |
| 14 | Color Control | 🟢 Low | Low | cli.py, display.py | --color, respect NO_COLOR |
| 15 | Fail Fast | 🟢 Low | Low | executor.py, cli.py | --fail-fast, --fail-threshold |

Document Version: 1.0.0 | Last Updated: 2025-11-02 | Status: Draft Specification