42 commits
53d4a06
test: add granularity marker taxonomy infrastructure (#727)
planetf1 Mar 25, 2026
4ea0c50
test: add audit-markers skill for test classification (#728)
planetf1 Mar 25, 2026
4f248dc
chore: add CLAUDE.md and agent skills infrastructure
planetf1 Mar 25, 2026
9c82f82
test: improve audit-markers skill quality and add resource predicates
planetf1 Mar 25, 2026
4f1db52
chore: remove issue references from audit-markers skill
planetf1 Mar 25, 2026
fc72f3f
docs: align MARKERS_GUIDE.md with predicate factory pattern
planetf1 Mar 25, 2026
845c6ad
fix: validate_skill.py schema mismatch and brittle YAML parsing
planetf1 Mar 25, 2026
62d4bbd
fix: migrate deprecated llm markers to e2e, add backend registry, upd…
planetf1 Mar 25, 2026
c3a5651
feat: add estimate-vram skill and fix MPS VRAM detection
planetf1 Mar 25, 2026
6a3d6f3
refactor: fold estimate-vram into audit-markers skill
planetf1 Mar 25, 2026
3288541
docs: drop isolation refs and fix RAM guidance in markers docs
planetf1 Mar 25, 2026
914502d
docs: add legacy marker guidance for example files in audit-markers s…
planetf1 Mar 25, 2026
ab8ad75
refactor: remove require_ollama() predicate — redundant with backend …
planetf1 Mar 25, 2026
be39488
refactor: replace requires_heavy_ram gate with huggingface backend ma…
planetf1 Mar 25, 2026
ab8a20f
refactor: replace ad-hoc bedrock skipif with require_api_key predicate
planetf1 Mar 25, 2026
c0c004e
refactor: migrate legacy resource markers to predicates
planetf1 Mar 26, 2026
01fdc1e
test: skip collection gracefully when optional backend deps are missing
planetf1 Mar 26, 2026
c6d565e
test: refine integration marker definition and apply audit fixes
planetf1 Mar 26, 2026
7ccf182
test: add importorskip guards and optional-dep skip logic for examples
planetf1 Mar 26, 2026
32f1f9b
fix: convert example import errors to skips; add cpex importorskip gu…
planetf1 Mar 26, 2026
3e6ec88
test: skip OTel-dependent tests when opentelemetry not installed
planetf1 Mar 26, 2026
c6fbfb6
fix: use conservative heuristic for Apple Silicon GPU memory detection
planetf1 Mar 26, 2026
8cee781
test: add training memory signals to audit-markers skill; bump alora …
planetf1 Mar 26, 2026
28808ff
fix: cache system capabilities result in examples conftest
planetf1 Mar 26, 2026
7f05eb8
fix: cache get_system_capabilities() result in test/conftest.py
planetf1 Mar 26, 2026
66d35f0
fix: flush MPS memory pool in intrinsic test fixture teardown
planetf1 Mar 27, 2026
355154f
fix: load LocalHFBackend model in config dtype to prevent float32 upc…
planetf1 Mar 27, 2026
601162c
test: remove --isolate-heavy process isolation and bump intrinsic VRA…
planetf1 Mar 27, 2026
58d2692
test: migrate legacy markers in test_intrinsics_formatters.py
planetf1 Mar 27, 2026
8ec3756
test: add integration marker to test_dependency_isolation.py
planetf1 Mar 27, 2026
f6f49fc
docs: document OLLAMA_KEEP_ALIVE=1m as memory optimisation for unorde…
planetf1 Mar 27, 2026
7119f78
fix: suppress mypy name-defined for torch.Tensor after importorskip c…
planetf1 Mar 27, 2026
dbb5f11
fix: ruff format huggingface.py from_pretrained args
planetf1 Mar 27, 2026
9dfea0d
fix: ruff format test_watsonx.py and test_huggingface_tools.py
planetf1 Mar 27, 2026
d445f0c
refactor: remove requires_gpu, requires_heavy_ram, requires_gpu_isola…
planetf1 Mar 27, 2026
6148d8d
refactor: remove --ignore-*-check override flags from conftest
planetf1 Mar 27, 2026
22b29bb
refactor: remove requires_api_key marker; fix api backend group to ma…
planetf1 Mar 27, 2026
e1d79fb
fix: address review
ajbozarth Mar 27, 2026
b772cc4
test: mark test_image_block_in_instruction as qualitative
planetf1 Mar 28, 2026
c9b996d
chore: commit .claude/settings.json with skillLocations for skill dis…
planetf1 Mar 28, 2026
3d80a81
docs: broaden audit-markers skill description to cover diagnostic use…
planetf1 Mar 28, 2026
ec0254d
docs: add diagnostic mode to audit-markers skill for troubleshooting …
planetf1 Mar 28, 2026
902 changes: 902 additions & 0 deletions .agents/skills/audit-markers/SKILL.md

Large diffs are not rendered by default.

159 changes: 159 additions & 0 deletions .agents/skills/skill-author/SKILL.md
Contributor


I'm on the fence about whether this skill belongs in mellea proper or in a "useful skills" repo instead. I'm ok adding it here for now though

@@ -0,0 +1,159 @@
---
name: skill-author
description: >
  Draft, validate, and install new agent skills. Use when asked to create a new
  skill, automate a workflow, or add a capability. Produces cross-compatible
  SKILL.md files that work in both Claude Code and IBM Bob.
argument-hint: "[skill-name]"
compatibility: "Claude Code, IBM Bob"
metadata:
  version: "2026-03-25"
  capabilities: [bash, read_file, write_file]
---

# Skill Authoring Meta-Skill

Create new agent skills that work across Claude Code (CLI/IDE) and IBM Bob.

## Skill Location

Skills live under `.agents/skills/<name>/SKILL.md`.

Discovery configuration varies by tool:
- **Claude Code:** Add `"skillLocations": [".agents/skills"]` to `.claude/settings.json`.
Without this, Claude Code looks in `.claude/skills/` by default.
- **IBM Bob:** Discovers `.agents/skills/` natively per agentskills.io convention.

Both tools read the same `SKILL.md` format. Use the frontmatter schema below
to maximise compatibility.

## Workflow

1. **Name the skill** — kebab-case, max 64 chars (e.g. `api-tester`, `audit-markers`).

2. **Scaffold the directory:**
```
.agents/skills/<name>/
├── SKILL.md # Required — frontmatter + instructions
├── scripts/ # Optional — helper scripts
└── templates/ # Optional — output templates
```

3. **Write SKILL.md** — YAML frontmatter + markdown body (see schema below).

4. **Dry-run review** — mentally execute the skill against a realistic scenario
before finalising. Walk through the procedure on a concrete example (a real
file in the repo, not a hypothetical) and check for:
- **Scaling gaps:** Does the procedure work for 1 file AND 100 files? If the
skill accepts a directory or glob, it needs a triage strategy (e.g., "grep
first to find candidates, then deep-read only files with issues") — not
just "read every file fully."
- **Boundary ambiguity:** If the skill defines categories or classifications,
test the boundaries between adjacent categories with a real example. The
edges are where agents will disagree or ask the user. Sharpen definitions
until two agents reading the same test would classify it the same way.
- **Stale references:** If the skill describes project state ("this hook needs
to be added", "this marker is not yet registered"), verify those statements
are still true. Embed checks ("read conftest.py to confirm") rather than
assertions that rot.
- **Output format at scale:** Run the report template mentally against the
largest expected input. A per-function report for 5 files is fine; for 165
files it's unusable. Design output for the largest scope — summary table
first, per-item detail only where issues exist.
- **Format coverage:** If the skill operates on multiple input formats (e.g.,
`pytestmark` lists AND `# pytest:` comments), verify each format is
explicitly addressed in the procedure. Implicit coverage causes agents to
skip or guess.
- **Rigid rules:** If you wrote "always X" or "never Y", find the edge case
where the rule is wrong. Add the escape hatch. E.g., "per-function only"
should say "module-level is acceptable when every function qualifies."

5. **Validate:**
- Check the skill is discoverable: list files in `.agents/skills/`.
- Confirm no frontmatter warnings from the IDE.
- Verify the skill does not conflict with existing skills or `AGENTS.md`.

## SKILL.md Frontmatter Schema

Use only fields from the **cross-compatible** set to avoid IDE warnings.

### Cross-compatible fields (use these)

| Field | Type | Purpose |
|-------|------|---------|
| `name` | string | Kebab-case identifier. Becomes the `/slash-command`. Max 64 chars. |
| `description` | string | What the skill does and when to trigger it. Be specific — agents use this to decide whether to invoke the skill automatically. |
| `argument-hint` | string | Autocomplete hint. E.g. `"[file] [--dry-run]"`, `"[issue-number]"`. |
| `compatibility` | string | Which tools support this skill. E.g. `"Claude Code, IBM Bob"`. |
| `disable-model-invocation` | boolean | `true` = manual `/name` only, no auto-invocation. |
| `user-invocable` | boolean | `false` = hidden from `/` menu. Use for background knowledge skills. |
| `license` | string | SPDX identifier if publishing. E.g. `"Apache-2.0"`. |
| `metadata` | object | Free-form key-value pairs for tool-specific or custom fields. |

### Tool-specific fields (put under `metadata`)

These are useful but not universally supported — nest them under `metadata`:

```yaml
metadata:
  version: "2026-03-25"
  capabilities: [bash, read_file, write_file]  # Bob/agentskills.io
```

Claude Code's `allowed-tools` and `context`/`agent` fields are recognised by
Claude Code but may trigger warnings in Bob's validator. If needed, add them
to `metadata` or accept the warnings.

### Example frontmatter

```yaml
---
name: my-skill
description: >
  Does X when Y. Use when asked to Z.
argument-hint: "[target] [--flag]"
compatibility: "Claude Code, IBM Bob"
metadata:
  version: "2026-03-25"
  capabilities: [bash, read_file, write_file]
---
```

## SKILL.md Body Structure

After frontmatter, write clear markdown instructions the agent follows:

1. **Context section** — what the skill operates on, key reference files.
2. **Procedure** — numbered steps the agent follows. Be explicit about decisions and edge cases.
3. **Rules / constraints** — hard rules the agent must not break.
4. **Output format** — what the agent should produce (report, edits, summary).

### Guidelines

- **Be specific.** Vague instructions produce inconsistent results across models.
"Check if markers are correct" is worse than "Compare the test's assertions
to the qualitative decision rule in section 3."
- **Reference project files.** Point to docs, configs, and examples by relative
path so the agent can read them. E.g. "See `test/MARKERS_GUIDE.md` for the
full marker taxonomy."
- **Declare scope boundaries.** State what the skill does NOT do. E.g. "This
skill does not modify conftest.py — flag infrastructure issues as notes."
- **Use `$ARGUMENTS`** for user input. `$ARGUMENTS` is the full argument string;
`$1`, `$2` etc. are positional.
- **Keep SKILL.md under 500 lines.** Use supporting files for large reference
material (link to them from the body).
- **Portability:** use relative paths from the repo root, never absolute paths.
- **Formatting:** use YYYY-MM-DD for dates, 24-hour clock for times, metric units.
- **Design for variable scope.** If the skill can operate on a single file or an
entire directory, provide a triage strategy for the large case. Agents given
"audit everything" with no prioritisation will either read every file (slow)
or skip files (incomplete).
- **Sharpen category boundaries.** When defining classifications, the boundary
between adjacent categories causes the most disagreement. Add a "key
distinction from X" sentence for each pair of adjacent tiers.
- **Avoid temporal assertions.** Don't write "this conftest hook needs to be
added" — write "check whether conftest.py already has the hook." State that
goes stale silently is worse than no guidance at all.
- **Qualify absolutes.** "Always X" and "never Y" rules need escape hatches for
the common exception. E.g., "per-function only — unless every function in the
file qualifies, in which case module-level is acceptable."
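As a sketch of the `$ARGUMENTS` convention described above (the skill purpose, paths, and flags here are hypothetical placeholders, not real skills in this repo), a skill body might begin:

```markdown
Audit the test markers in $1 (default to `test/` if no argument is given).
If `--dry-run` appears anywhere in $ARGUMENTS, report proposed changes
without editing any files.
```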
52 changes: 52 additions & 0 deletions .agents/skills/skill-author/scripts/validate_skill.py
@@ -0,0 +1,52 @@
"""Validate SKILL.md frontmatter for agent skills."""

import json
import os
import sys

import yaml


def validate_skill(skill_path: str) -> dict:
    """Check that a skill directory has valid SKILL.md with required frontmatter keys."""
    skill_file = os.path.join(skill_path, "SKILL.md")

    if not os.path.exists(skill_file):
        return {"status": "error", "message": "Missing SKILL.md"}

    try:
        with open(skill_file) as f:
            # safe_load_all handles the --- delimiters correctly and won't
            # break on markdown horizontal rules later in the file.
            frontmatter = next(yaml.safe_load_all(f))

        if not isinstance(frontmatter, dict):
            return {"status": "error", "message": "Frontmatter is not a YAML mapping"}

        # Root-level required keys
        for key in ("name", "description"):
            if key not in frontmatter:
                return {"status": "error", "message": f"Missing root key: {key}"}

        # version lives under metadata (per skill-author guide)
        meta = frontmatter.get("metadata")
        if not isinstance(meta, dict) or "version" not in meta:
            return {
                "status": "error",
                "message": "Missing nested key: metadata.version",
            }

        return {"status": "success", "data": frontmatter}

    except yaml.YAMLError as e:
        return {"status": "error", "message": f"Invalid YAML: {e}"}
    except StopIteration:
        return {"status": "error", "message": "No YAML frontmatter found"}


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 validate_skill.py <skill-directory>", file=sys.stderr)
        sys.exit(1)
    result = validate_skill(sys.argv[1])
    print(json.dumps(result))
3 changes: 3 additions & 0 deletions .claude/settings.json
@@ -0,0 +1,3 @@
{
  "skillLocations": [".agents/skills"]
}
3 changes: 2 additions & 1 deletion .gitignore
@@ -451,7 +451,8 @@ pyrightconfig.json

# AI agent configs
.bob/
.claude/
.claude/*
!.claude/settings.json

# Generated API documentation (built by tooling/docs-autogen/)
docs/docs/api/
77 changes: 36 additions & 41 deletions AGENTS.md
@@ -25,7 +25,6 @@ uv run pytest # Default: qualitative tests, skip slow te
uv run pytest -m "not qualitative" # Fast tests only (~2 min)
uv run pytest -m slow # Run only slow tests (>5 min)
uv run pytest --co -q # Run ALL tests including slow (bypass config)
uv run pytest --isolate-heavy # Enable GPU process isolation (opt-in)
uv run ruff format . # Format code
uv run ruff check . # Lint code
uv run mypy . # Type check
@@ -44,49 +43,44 @@ uv run mypy . # Type check
| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
| `test/` | All tests (run from repo root) |
| `docs/examples/` | Example code (run as tests via pytest) |
| `.agents/skills/` | Agent skills ([agentskills.io](https://agentskills.io) standard) |
| `scratchpad/` | Experiments (git-ignored) |

## 3. Test Markers
All tests and examples use markers to indicate requirements. The test infrastructure automatically skips tests based on system capabilities.

**Backend Markers:**
- `@pytest.mark.ollama` — Requires Ollama running (local, lightweight)
- `@pytest.mark.huggingface` — Requires HuggingFace backend (local, heavy)
- `@pytest.mark.vllm` — Requires vLLM backend (local, GPU required)
- `@pytest.mark.openai` — Requires OpenAI API (requires API key)
- `@pytest.mark.watsonx` — Requires Watsonx API (requires API key)
- `@pytest.mark.litellm` — Requires LiteLLM backend

**Capability Markers:**
- `@pytest.mark.requires_gpu` — Requires GPU
- `@pytest.mark.requires_heavy_ram` — Requires 48GB+ RAM
- `@pytest.mark.requires_api_key` — Requires external API keys
- `@pytest.mark.qualitative` — LLM output quality tests (skipped in CI via `CICD=1`)
- `@pytest.mark.llm` — Makes LLM calls (needs at least Ollama)
- `@pytest.mark.slow` — Tests taking >5 minutes (skipped via `SKIP_SLOW=1`)

**Execution Strategy Markers:**
- `@pytest.mark.requires_gpu_isolation` — Requires OS-level process isolation to clear CUDA memory (use with `--isolate-heavy` or `CICD=1`)

**Examples in `docs/examples/`** use comment-based markers for clean code:
Tests use a four-tier granularity system (`unit`, `integration`, `e2e`, `qualitative`) plus backend and resource markers. The `unit` marker is auto-applied by conftest — never write it explicitly. The `llm` marker is deprecated; use `e2e` instead.

See **[test/MARKERS_GUIDE.md](test/MARKERS_GUIDE.md)** for the full marker reference (tier definitions, backend markers, resource gates, auto-skip logic, common patterns).

**Examples in `docs/examples/`** use comment-based markers:
```python
# pytest: ollama, llm, requires_heavy_ram
# pytest: e2e, ollama, qualitative
"""Example description..."""

# Your clean example code here
```

Tests/examples automatically skip if system lacks required resources. Heavy examples (e.g., HuggingFace) are skipped during collection to prevent memory issues.
⚠️ Don't add `qualitative` to trivial tests — keep the fast loop fast.
⚠️ Mark tests taking >1 minute with `slow`.

## 4. Agent Skills

Skills live in `.agents/skills/` following the [agentskills.io](https://agentskills.io) open standard. Each skill is a directory with a `SKILL.md` file (YAML frontmatter + markdown instructions).

**Tool discovery:**

**Default behavior:**
- `uv run pytest` skips slow tests (>5 min) but runs qualitative tests
- Use `pytest -m "not qualitative"` for fast tests only (~2 min)
- Use `pytest -m slow` or `pytest` (without config) to include slow tests
| Tool | Project skills | Global skills | Config needed |
| ----------------- | ----------------- | ------------------- | ------------------------------------------------------------------ |
| Claude Code | `.agents/skills/` | `~/.claude/skills/` | `"skillLocations": [".agents/skills"]` in `.claude/settings.json` |
| IBM Bob | `.bob/skills/` | `~/.bob/skills/` | Symlink: `.bob/skills` → `.agents/skills` |
| VS Code / Copilot | `.agents/skills/` | — | None (auto-discovered) |

⚠️ Don't add `qualitative` to trivial tests—keep the fast loop fast.
⚠️ Mark tests taking >5 minutes with `slow` (e.g., dataset loading, extensive evaluations).
**Bob users:** create the symlink once per clone:

## 4. Coding Standards
```bash
mkdir -p .bob && ln -s ../.agents/skills .bob/skills
```

**Available skills:** `/audit-markers`, `/skill-author`

## 5. Coding Standards
- **Types required** on all core functions
- **Docstrings are prompts** — be specific, the LLM reads them
- **Google-style docstrings** — `Args:` on the **class docstring only**; `__init__` gets a single summary sentence. Add `Attributes:` only when a stored value differs in type/behaviour from its constructor input (type transforms, computed values, class constants). See CONTRIBUTING.md for a full example.
Expand All @@ -96,37 +90,38 @@ Tests/examples automatically skip if system lacks required resources. Heavy exam
- **Friendly Dependency Errors**: Wraps optional backend imports in `try/except ImportError` with a helpful message (e.g., "Please pip install mellea[hf]"). See `mellea/stdlib/session.py` for examples.
- **Backend telemetry fields**: All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. Metrics are automatically recorded by `TokenMetricsPlugin` — don't add manual `record_token_usage_metrics()` calls.

## 5. Commits & Hooks
## 6. Commits & Hooks
[Angular format](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit): `feat:`, `fix:`, `docs:`, `test:`, `refactor:`, `release:`

Pre-commit runs: ruff, mypy, uv-lock, codespell

## 6. Timing
## 7. Timing
> **Don't cancel**: `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.

## 7. Common Issues
## 8. Common Issues
| Problem | Fix |
|---------|-----|
| `ComponentParseError` | Add examples to docstring |
| `uv.lock` out of sync | Run `uv sync` |
| Ollama refused | Run `ollama serve` |
| Telemetry import errors | Run `uv sync` to install OpenTelemetry deps |

## 8. Self-Review (before notifying user)
## 9. Self-Review (before notifying user)
1. `uv run pytest test/ -m "not qualitative"` passes?
2. `ruff format` and `ruff check` clean?
3. New functions typed with concise docstrings?
4. Unit tests added for new functionality?
5. Avoided over-engineering?

## 9. Writing Tests
## 10. Writing Tests

- Place tests in `test/` mirroring source structure
- Name files `test_*.py` (required for pydocstyle)
- Use `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
- Mark tests checking LLM output quality with `@pytest.mark.qualitative`
- If a test fails, fix the **code**, not the test (unless the test was wrong)

## 10. Writing Docs
## 11. Writing Docs

If you are modifying or creating pages under `docs/docs/`, follow the writing
conventions in [`docs/docs/guide/CONTRIBUTING.md`](docs/docs/guide/CONTRIBUTING.md).
@@ -144,7 +139,7 @@ Key rules that differ from typical Markdown habits:
mellea source; mark forward-looking content with `> **Coming soon:**`
- **No visible TODOs** — if content is missing, open a GitHub issue instead

## 11. Feedback Loop
## 12. Feedback Loop

Found a bug, workaround, or pattern? Update the docs:

5 changes: 5 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,5 @@
# Claude Code Directives
@AGENTS.md

## Execution
- If instructed to create a new capability, always invoke the `skill-author` meta-skill to ensure cross-compatibility.