Skip to content

Exploration on Workflow Evolution #14646

@mnkiefer

Description

@mnkiefer

title: Workflow Evolution
description: Understanding the equilibrium between agentic and deterministic workflows, and strategies for evolving between them
sidebar:
order: 7

GitHub Agentic Workflows enables a natural evolution between deterministic CI/CD workflows and agentic AI-powered automation. This guide explores when to use each approach and how to transition between them as patterns mature.

The Workflow Spectrum

Workflows exist on a spectrum from fully deterministic to fully agentic:

Deterministic                 Hybrid                    Agentic
     ↓                          ↓                         ↓
┌─────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Classic   │    │  Deterministic   │    │   Pure AI       │
│  CI/CD      │ →  │   + Agentic      │ ←  │  Workflows      │
│  Workflows  │    │   Patterns       │    │                 │
└─────────────┘    └──────────────────┘    └─────────────────┘
  • Build         • Data prep         • Issue triage
  • Test          • Filter triggers   • Code review
  • Deploy        • Post-processing   • Research
  • Lint          • Custom analysis   • Documentation

The equilibrium: Use the right approach for each task based on requirements for reproducibility, flexibility, and human oversight.

When to Use Each Approach

Deterministic Workflows (Classic GitHub Actions)

Use deterministic workflows when you need:

  • Reproducibility: Exact same output every time (builds, releases)
  • Speed: Sub-second execution for simple tasks
  • Compliance: Audit trails with predictable behavior
  • Cost control: No AI inference costs
  • Critical path: Production deployments, security scanning

Examples:

  • Build and test pipelines
  • Deployment automation
  • Security scanning (actionlint, CodeQL)
  • Dependency updates (Dependabot)
  • Package publishing

Agentic Workflows

Use agentic workflows when you need:

  • Context understanding: Interpreting unstructured data (issues, PRs, discussions)
  • Adaptive behavior: Responses varying based on content and context
  • Content generation: Writing summaries, documentation, responses
  • Complex reasoning: Multi-step analysis requiring judgment
  • Research tasks: Investigating patterns, gathering information

Examples:

  • Issue and PR triage
  • Release notes generation
  • Code review suggestions
  • Documentation maintenance
  • Refactoring recommendations

Hybrid Workflows

Combine both approaches when you need:

  • Data preprocessing: Deterministic data fetching → AI analysis
  • Conditional execution: Deterministic filtering → AI action
  • Post-processing: AI output → Deterministic formatting
  • Multi-stage pipelines: Computation → Reasoning → Action

Examples:

  • Static analysis detection → AI-generated fix recommendations
  • Security scanning → Contextual triage and prioritization
  • Metrics collection → Trend analysis and reporting
  • Code analysis → Refactoring suggestions → Automated PRs

Elevating Static → Agentic Workflows

Transform deterministic workflows into agentic ones by adding AI capabilities to existing automation.

Strategy 1: Add AI Post-Processing

Enhance deterministic workflows by adding AI interpretation of results.

Before (Deterministic only):

name: Static Analysis
on: push
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint > results.txt
      - uses: actions/upload-artifact@v4
        with:
          name: lint-results
          path: results.txt

After (Hybrid with AI):

---
on: push
engine: copilot
safe-outputs:
  add-comment:
    max: 1

steps:
  - run: |
      npm run lint > /tmp/gh-aw/agent/lint-results.txt
      npm run test 2>&1 | tee /tmp/gh-aw/agent/test-results.txt
---

# Smart Analysis Report

Analyze the lint results in `/tmp/gh-aw/agent/lint-results.txt` and test output in `/tmp/gh-aw/agent/test-results.txt`.

If there are failures:
1. Categorize issues by severity and type
2. Identify patterns in the failures
3. Suggest specific fixes with file/line references
4. Use add-comment to post a summary on the PR

If everything passes, exit without commenting.

Value added: AI interprets results, provides context-aware recommendations, and only comments when actionable.

Strategy 2: Replace Hard-Coded Logic with AI Reasoning

Convert complex conditional logic into natural language instructions.

Before (Complex conditionals):

- name: Label issue
  run: |
    if echo "$BODY" | grep -qi "bug"; then
      gh issue edit $NUMBER --add-label bug
    elif echo "$BODY" | grep -qi "feature"; then
      gh issue edit $NUMBER --add-label enhancement
    elif echo "$BODY" | grep -qi "documentation\|docs"; then
      gh issue edit $NUMBER --add-label documentation
    fi

After (AI reasoning):

---
on:
  issues:
    types: [opened]
engine: copilot
safe-outputs:
  add-labels:
    max: 3
---

# Issue Triage

Analyze the issue "${{ github.event.issue.title }}" and body content to:

1. Determine the issue type (bug, feature, question, documentation)
2. Assess priority based on impact and urgency
3. Identify the relevant component or area
4. Use add-labels to apply appropriate labels

Value added: Handles nuanced cases, understands context beyond keywords, adapts to new patterns without code changes.

Strategy 3: Add Research and Context

Enhance workflows with AI research capabilities.

Before (Fixed response):

- name: Welcome new contributors
  run: |
    gh issue comment $NUMBER --body "Thanks for your contribution!"

After (Contextual welcome):

---
on:
  issues:
    types: [opened]
  pull_request:
    types: [opened]
engine: copilot
safe-outputs:
  add-comment:
    max: 1
tools:
  github:
    toolsets: [default]
---

# Contextual Welcome

For new contributor @${{ github.event.sender.login }}:

1. Check their contribution history in this repository
2. Review the issue/PR content for complexity
3. Identify related issues or documentation
4. Use add-comment to post a personalized welcome that:
   - Acknowledges their specific contribution
   - Links to relevant documentation
   - Offers specific guidance based on the issue/PR content
   - Sets appropriate expectations

Value added: Personalized responses, relevant guidance, better contributor experience.

Optimizing Agentic → Static Workflows

As agentic workflows mature and patterns stabilize, codify proven behaviors into deterministic workflows for improved reliability and cost.

Strategy 1: Extract Patterns into Rules

When AI consistently makes the same decisions, codify them as deterministic rules.

Observation: After months of running an agentic triage workflow, you notice it always labels issues with "good first issue" when they have specific characteristics.

Before (Agentic):

---
on:
  issues:
    types: [opened]
engine: copilot
safe-outputs:
  add-labels:
---

# Issue Triage

Review the issue and apply appropriate labels including "good first issue" if suitable.

After (Hybrid - deterministic for common cases):

---
on:
  issues:
    types: [opened]
engine: copilot
safe-outputs:
  add-labels:

steps:
  - id: check-beginner
    run: |
      BODY="${{ github.event.issue.body }}"
      LABELS="${{ join(github.event.issue.labels.*.name, ',') }}"
      
      # Deterministic rules for clear cases
      if [[ "$BODY" =~ "typo"|"documentation"|"README" ]] && \
         [[ ! "$BODY" =~ "breaking"|"architecture"|"refactor" ]]; then
        gh issue edit ${{ github.event.issue.number }} --add-label "good first issue"
        echo "handled=true" >> "$GITHUB_OUTPUT"
      else
        echo "handled=false" >> "$GITHUB_OUTPUT"
      fi
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
---

# Advanced Triage

${{ steps.check-beginner.outputs.handled == 'true' && '✓ Already labeled as good first issue by deterministic rules.' || '' }}

${{ steps.check-beginner.outputs.handled == 'false' && 'Review this issue and apply appropriate labels. Consider complexity, scope, and clarity.' || 'No further action needed.' }}

Value saved: Common cases handled instantly without AI costs, AI focuses on edge cases.

Strategy 2: Codify Formatting and Structure

When AI consistently generates the same format, use templates and deterministic formatting.

Before (AI formatting):

---
on:
  release:
    types: [published]
engine: copilot
safe-outputs:
  update-release:
---

# Release Notes Generator

Generate release notes for release ${{ github.event.release.tag_name }} by:
1. Listing merged PRs since last release
2. Categorizing them (features, fixes, chores)
3. Writing a summary
4. Using update-release to prepend notes

After (Deterministic structure + AI summaries):

---
on:
  release:
    types: [published]
engine: copilot
safe-outputs:
  update-release:

steps:
  - run: |
      TAG="${{ github.event.release.tag_name }}"
      
      # Deterministic PR categorization
      gh pr list --state merged --json number,title,labels --limit 100 | \
        jq -r '.[] | 
          if (.labels | map(.name) | any(. == "feature")) then
            "### Features\n- #\(.number): \(.title)"
          elif (.labels | map(.name) | any(. == "bug")) then
            "### Bug Fixes\n- #\(.number): \(.title)"
          else
            "### Other Changes\n- #\(.number): \(.title)"
          end' > /tmp/gh-aw/agent/pr-list.txt
      
      echo "Categorized PRs:" >> /tmp/gh-aw/agent/pr-list.txt
---

# Release Summary

PRs for release ${{ github.event.release.tag_name }} are categorized in `/tmp/gh-aw/agent/pr-list.txt`.

Write a concise 2-3 sentence release summary highlighting the most important changes. Use update-release to prepend:

Release ${{ github.event.release.tag_name }}

[Your 2-3 sentence summary here]

[Paste the categorized PR list from the file]

Value saved: Consistent structure, faster execution, lower costs. AI focuses on high-value summaries only.

Strategy 3: Convert to Scheduled Static Analysis

When agentic research consistently finds the same types of issues, schedule static analysis tools instead.

Before (Agentic weekly audit):

---
on:
  schedule: weekly
engine: claude
safe-outputs:
  create-discussion:
---

# Security Audit

Scan the codebase for common security issues:
- Hardcoded secrets
- SQL injection patterns
- XSS vulnerabilities
- Unsafe dependencies

Create a discussion with findings.

After (Deterministic scanning + AI only for prioritization):

name: Security Scan
on:
  schedule:
    - cron: '0 9 * * 1' # Weekly Monday 9am
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
      - uses: github/codeql-action/analyze@v3
      - run: npm audit --json > audit.json
      - run: gh secret-scanning
      # Only call AI if critical issues found
      - if: contains(steps.scan.outputs.severity, 'critical')
        run: gh workflow run ai-triage.yml

Value saved: Fast, cheap scanning. AI only invoked for critical findings requiring context.

Evolution Decision Framework

Use this framework to determine the right approach:

Criteria Deterministic Hybrid Agentic
Input predictability Structured, consistent Mixed format Unstructured, variable
Output requirements Exact reproducibility Structured + context Flexible, adaptive
Frequency High (> 10/day) Medium (1-10/day) Low (< 1/day)
Cost sensitivity Critical Moderate Low priority
Human review Minimal Review exceptions Review all
Pattern maturity Stable (6+ months) Emerging (2-6 months) Experimental (< 2 months)
Business criticality Production critical Important but not critical Nice to have

Example Decision Process

Scenario: Automatically triaging security vulnerability reports

  1. Input: Unstructured vulnerability reports from multiple sources
    Score: Agentic (unstructured)

  2. Output: Need categorization and recommended action
    Score: Hybrid (structured categories + reasoning)

  3. Frequency: 5-10 reports per week
    Score: Hybrid (medium frequency)

  4. Cost: Budget exists for AI operations
    Score: Agentic (cost acceptable)

  5. Review: Security team reviews all recommendations
    Score: Agentic (human review mandatory)

  6. Pattern maturity: New vulnerability types emerging constantly
    Score: Agentic (patterns unstable)

Decision: Start with full agentic workflow for maximum flexibility. After 3-6 months, analyze logs to identify common patterns, then migrate to hybrid with deterministic rules for common CVEs and AI for novel vulnerabilities.

Migration Strategies

Gradual Conversion

Don't replace entire workflows at once. Identify highest-value components to make agentic:

Workflow Evolution Path:

1. All Deterministic (Week 0)
   ├─ Scan for issues
   ├─ Post raw results
   └─ Manual triage

2. Hybrid - AI Summary (Week 2)
   ├─ Scan for issues
   ├─ AI summarizes and categorizes
   └─ Manual triage

3. Hybrid - AI Triage (Week 6)
   ├─ Scan for issues
   ├─ AI categorizes and triages
   └─ Human reviews AI decisions

4. Mostly Agentic (Week 12)
   ├─ Deterministic rules for obvious cases
   ├─ AI handles complex cases
   └─ AI suggests actions (human approves)

A/B Testing

Run both deterministic and agentic workflows in parallel to compare:

---
on:
  issues:
    types: [opened]
engine: copilot
safe-outputs:
  add-labels:
    max: 3

steps:
  - id: deterministic-triage
    run: |
      # Existing deterministic logic
      ./scripts/triage.sh "${{ github.event.issue.number }}"
      echo "labels=$(cat labels.txt)" >> "$GITHUB_OUTPUT"
---

# AI Triage Experiment

Existing deterministic workflow labeled this issue: ${{ steps.deterministic-triage.outputs.labels }}

Now you independently analyze and label the issue. After labeling, create an internal tracking issue noting if your labels match the deterministic ones.

Issue to triage: "${{ github.event.issue.title }}"

Metrics-Driven Evolution

Track key metrics to guide evolution decisions:

  • Accuracy: Agreement between AI and human review
  • Coverage: % of cases handled without human intervention
  • Cost: AI inference costs vs. developer time saved
  • Speed: Time to complete workflow
  • Satisfaction: User feedback on AI-generated content

Evolution triggers:

  • Accuracy > 95% for 3 months → Consider converting frequent patterns to deterministic rules
  • Cost > $100/month → Identify optimization opportunities or add deterministic filters
  • Speed > 60 seconds → Extract preprocessing to deterministic steps
  • Satisfaction < 70% → Refine prompts or add more context

Best Practices

Starting New Workflows

  1. Default to deterministic for well-understood, stable tasks
  2. Start agentic for new, experimental, or complex reasoning tasks
  3. Design for evolution - structure workflows to easily add/remove AI components

Evolving Existing Workflows

  1. Monitor patterns - Track AI decisions to identify common paths
  2. Codify gradually - Convert high-frequency patterns to deterministic rules
  3. Keep AI for edges - Retain AI for complex, rare, or novel cases
  4. Measure impact - Track cost, speed, and accuracy before/after changes

Maintaining Equilibrium

  1. Regular review - Quarterly assessment of workflow portfolio
  2. Cost awareness - Monitor AI costs per workflow
  3. Pattern extraction - Document repeated AI behaviors for potential codification
  4. Feedback loops - Use workflow results to improve future iterations

Related Documentation

Summary

The equilibrium between agentic and deterministic workflows is not static - it evolves as:

  • New patterns emerge from agentic experimentation
  • Proven patterns stabilize into deterministic rules
  • Requirements change based on cost, speed, and accuracy needs
  • AI capabilities improve enabling new use cases

Success comes from intentional evolution: starting agentic for flexibility, codifying proven patterns for efficiency, and maintaining the right balance for your organization's needs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions