887 detectors · ML-scored confidence · decode-through scanning · live verification
Finds base64-encoded, hex-wrapped, and nested secrets that regex-only scanners miss entirely.
$ keyhog scan --path .
██ ██ ████████ ██ ██ ██ ██ ██████ ██████
██ ██ ██ ██ ██ ██ ██ ██ ██ ██
█████ █████ ████ ███████ ██ ██ ██ ███
██ ██ ██ ██ ██ ██ ██ ██ ██ ██
██ ██ ████████ ██ ██ ██ ██████ ██████
v0.2.0 · Secret Scanner · 887 detectors
by SanthSecurity
critical 82% ██████░░ GitHub Classic PAT
ghp_...7890 src/config.py:42
critical 78% █████░░░ Stripe Secret Key
sk_l...ab12 .env:7
critical 78% █████░░░ GitHub PAT (decoded from base64)
ghp_...7890 k8s/secret.yaml:12
3 secrets found · 2 unique credentials · 0 false positives
Most secret scanners run regex against plaintext. They miss anything encoded, embedded, or obfuscated. KeyHog doesn't.
Decode-through scanning recursively unwraps base64, hex, URL encoding, quoted-printable, and Unicode escapes before pattern matching — catching secrets buried in Kubernetes manifests, CI configs, Docker layers, and compiled artifacts that other tools never see.
ML confidence scoring uses a 3,969-parameter neural network trained on 200K real credentials to separate secrets from hashes, test fixtures, and documentation strings. Every finding comes with a 0–100% score. Zero false positives at the default 70% threshold.
Live verification hits real APIs (AWS, GitHub, Stripe, Slack, OpenAI, and more) to confirm whether a leaked credential is actually active.
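To make the threshold and the summary line in the sample output concrete, here is a minimal sketch of filtering findings by confidence and then counting unique credentials. It is not KeyHog's implementation: the `Finding` struct and the `summarize` and `sample_findings` functions are hypothetical, and the data is copied from the redacted sample output above plus the 25% test-fixture case.

```rust
use std::collections::HashSet;

/// Hypothetical finding record; `credential` is the redacted form shown in the sample output.
struct Finding {
    credential: &'static str,
    confidence: f64,
}

/// Returns (findings kept at or above the threshold, unique credentials among them).
fn summarize(findings: &[Finding], min_confidence: f64) -> (usize, usize) {
    let kept: Vec<&Finding> = findings
        .iter()
        .filter(|finding| finding.confidence >= min_confidence)
        .collect();
    // Deduplicate by credential value, so the same token in two files counts once.
    let unique: HashSet<&str> = kept.iter().map(|finding| finding.credential).collect();
    (kept.len(), unique.len())
}

/// The three findings from the sample output, plus one low-confidence test fixture.
fn sample_findings() -> Vec<Finding> {
    vec![
        Finding { credential: "ghp_...7890", confidence: 0.82 }, // src/config.py:42
        Finding { credential: "sk_l...ab12", confidence: 0.78 }, // .env:7
        Finding { credential: "ghp_...7890", confidence: 0.78 }, // k8s/secret.yaml:12
        Finding { credential: "ghp_...test", confidence: 0.25 }, // hypothetical test fixture
    ]
}
```

With the default threshold, `summarize(&sample_findings(), 0.70)` keeps three findings covering two unique credentials, which matches the summary line of the sample run. The low-confidence fixture is dropped.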
| | KeyHog | TruffleHog | Gitleaks | Semgrep |
|---|---|---|---|---|
| Detectors | 887+ | 800+ | 150+ | Rules |
| Recall (blind test) | 98% | 32% | ~30% | ~40% |
| False positives | Zero | Moderate | Low | High |
| Base64 decode | ✓ | ✗ | ✗ | ✗ |
| Hex decode | ✓ | ✗ | ✗ | ✗ |
| ML scoring | ✓ (99.5%) | Partial | ✗ | ✗ |
| Live verify | ✓ | ✓ | ✗ | ✗ |
| Throughput | ~50 MB/s | ~10–30 MB/s | ~5–15 MB/s | ~20 MB/s |
| License | MIT | AGPL | MIT | LGPL |
KeyHog finds 74 credentials that TruffleHog misses. TruffleHog finds 0 that KeyHog misses.
- Use KeyHog when you need high recall on encoded secrets, embeddable Rust crates, and optional live verification.
- Use TruffleHog when you prioritize its existing verification workflows over a lightweight Rust-native integration story.
- Use Gitleaks when plaintext regex scanning is enough and you want a simpler rule engine.
- Use Semgrep when your main goal is broad static analysis rather than secret-specific recall.
# Install
cargo install keyhog
# Scan a directory
keyhog scan --path .
# Scan with verification
keyhog scan --path . --verify
# Scan a git repo's full history
keyhog scan --git ./repo
# CI mode: only changed files, SARIF output
keyhog scan --git-diff origin/main --format sarif --fail-on-findings

# Install the published CLI
cargo install keyhog
# Or build from source
git clone https://github.com/santhsecurity/keyhog.git
cd keyhog
cargo install --path crates/cli

[dependencies]
keyhog-core = "0.2"
keyhog-scanner = "0.2"
keyhog-sources = "0.2"
keyhog-verifier = "0.2"

- keyhog-core provides detector specs, findings, reporting, and allowlists.
- keyhog-scanner compiles detectors and scans Chunk values.
- keyhog-sources provides filesystem, stdin, git, Docker, S3, and binary inputs.
- keyhog-verifier verifies deduplicated findings asynchronously.
- keyhog is the end-user binary package.
use keyhog_core::{Chunk, ChunkMetadata, DetectorSpec, PatternSpec, Severity};
use keyhog_scanner::CompiledScanner;

let scanner = CompiledScanner::compile(vec![DetectorSpec {
    id: "demo-token".into(),
    name: "Demo Token".into(),
    service: "demo".into(),
    severity: Severity::High,
    patterns: vec![PatternSpec {
        regex: "demo_[A-Z0-9]{8}".into(),
        description: None,
        group: None,
    }],
    companion: None,
    verify: None,
    keywords: vec!["demo_".into()],
}])?;

let findings = scanner.scan(&Chunk {
    data: "TOKEN=demo_ABC12345".into(),
    metadata: ChunkMetadata {
        source_type: "filesystem".into(),
        path: Some(".env".into()),
        commit: None,
        author: None,
        date: None,
    },
});
assert_eq!(findings.len(), 1);
# Ok::<(), keyhog_scanner::ScanError>(())

docker run --rm -v $(pwd):/scan ghcr.io/keyhog/keyhog:latest scan --path /scan

- uses: keyhog/keyhog-action@v1
  with:
    path: .
    min-confidence: 0.7
    format: sarif

repos:
  - repo: https://github.com/santhsecurity/keyhog
    rev: v0.2.0
    hooks:
      - id: keyhog

# Scan directory
keyhog scan --path ./src
# JSON output
keyhog scan --path . --format json
# Only high-severity findings
keyhog scan --path . --severity high
# Scan last 5 commits
keyhog scan --git-diff HEAD~5
# Staged files only (for pre-commit)
keyhog scan --git-diff --staged
# Custom confidence threshold
keyhog scan --path . --min-confidence 0.8
# Fail CI on any finding
keyhog scan --path . --fail-on-findings

| Format | Flag | Use for |
|---|---|---|
| Text | --format text | Human reading (default) |
| JSON | --format json | Programmatic use |
| JSONL | --format jsonl | Streaming / log ingestion |
| SARIF | --format sarif | GitHub code scanning |
KeyHog uses a two-phase architecture built on Aho-Corasick automata:
Input       Phase 1: Prefilter        Phase 2: Confirm         Score & Verify
─────       ──────────────────        ────────────────         ──────────────
            ┌───────────────────┐     ┌──────────────────┐     ┌────────────────┐
file        │  Decode-Through   │     │  Regex Confirm   │     │ ML Classifier  │
stdin ────▶ │  Aho-Corasick     │────▶│  Match regions   │────▶│ 3,969 params   │
git         │  O(n) single-pass │     │  per candidate   │     │ 99.5% acc      │
            └───────────────────┘     └──────────────────┘     └───────┬────────┘
                                                                       │
                                                                       ▼
                                                               ┌────────────────┐
                                                               │  Live Verify   │
                                                               │  (optional)    │
                                                               │  async tokio   │
                                                               └────────────────┘
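The prefilter-then-confirm flow in the diagram can be sketched with the standard library alone. This is an illustration under stated assumptions, not KeyHog's scanner: the `Detector` struct, `confirm_demo_token`, and `scan` are hypothetical names, a plain substring search stands in for the Aho-Corasick automaton, and a hand-written check stands in for the regex `demo_[A-Z0-9]{8}`.

```rust
/// Hypothetical, simplified detector: a literal keyword for the prefilter and
/// a stricter confirmation function standing in for the detector's regex.
struct Detector {
    id: &'static str,
    keyword: &'static str,
    /// Returns the match length when the text starting at the keyword is valid.
    confirm: fn(&str) -> Option<usize>,
}

/// Stand-in for the regex `demo_[A-Z0-9]{8}`, anchored at the start of `candidate`.
fn confirm_demo_token(candidate: &str) -> Option<usize> {
    let body = candidate.strip_prefix("demo_")?;
    let valid = body
        .bytes()
        .take(8)
        .filter(|b| b.is_ascii_uppercase() || b.is_ascii_digit())
        .count()
        == 8;
    if valid {
        Some("demo_".len() + 8)
    } else {
        None
    }
}

/// Runs both phases and returns (detector id, byte offset, matched text).
fn scan<'a>(text: &'a str, detectors: &[Detector]) -> Vec<(&'static str, usize, &'a str)> {
    let mut findings = Vec::new();
    for detector in detectors {
        // Phase 1: cheap literal search. A real Aho-Corasick automaton would
        // match the keywords of all detectors in a single pass over the input.
        for (offset, _) in text.match_indices(detector.keyword) {
            // Phase 2: run the expensive check only on the candidate region.
            if let Some(len) = (detector.confirm)(&text[offset..]) {
                findings.push((detector.id, offset, &text[offset..offset + len]));
            }
        }
    }
    findings
}
```

The point of the split is that most input never reaches phase 2. Only regions containing a detector keyword pay for the stricter check, which is why the prefilter keywords (`demo_` here) matter as much as the pattern itself.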
Before pattern matching, KeyHog recursively decodes:
- Base64 (standard + URL-safe)
- Hexadecimal
- URL encoding
- Quoted-printable
- Unicode escapes
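As a rough illustration of the recursive decoding idea, the sketch below handles only two of the encodings listed above (base64 and hex) using the standard library. Everything in it is an assumption made for the example rather than KeyHog's actual code: the function names, the `MAX_DEPTH` limit, the 16-character minimum token length, and the `looks_like_secret` check, which stands in for real detectors.

```rust
const MAX_DEPTH: usize = 3; // hypothetical recursion limit

/// Decodes base64 (standard or URL-safe alphabet, padding optional) into UTF-8 text.
fn decode_base64(token: &str) -> Option<String> {
    let trimmed = token.trim_end_matches('=');
    if trimmed.is_empty() {
        return None;
    }
    let mut bytes = Vec::with_capacity(trimmed.len() * 3 / 4);
    let (mut buffer, mut bits) = (0u32, 0u32);
    for c in trimmed.bytes() {
        let value = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' | b'-' => 62,
            b'/' | b'_' => 63,
            _ => return None,
        };
        buffer = (buffer << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            bytes.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    String::from_utf8(bytes).ok()
}

/// Decodes an even-length hexadecimal string into UTF-8 text.
fn decode_hex(token: &str) -> Option<String> {
    if token.is_empty() || token.len() % 2 != 0 {
        return None;
    }
    let mut bytes = Vec::with_capacity(token.len() / 2);
    for pair in token.as_bytes().chunks(2) {
        let hi = (pair[0] as char).to_digit(16)?;
        let lo = (pair[1] as char).to_digit(16)?;
        bytes.push((hi * 16 + lo) as u8);
    }
    String::from_utf8(bytes).ok()
}

/// Stand-in for real detectors: "ghp_" followed by at least 20 alphanumerics.
fn looks_like_secret(text: &str) -> bool {
    text.match_indices("ghp_").any(|(i, _)| {
        text[i + 4..].bytes().take_while(|b| b.is_ascii_alphanumeric()).count() >= 20
    })
}

/// Checks the text as-is, then decodes each long token and recurses on the result.
fn scan_decode_through(text: &str, depth: usize) -> bool {
    if looks_like_secret(text) {
        return true;
    }
    if depth >= MAX_DEPTH {
        return false;
    }
    text.split(|c: char| !(c.is_ascii_alphanumeric() || "+/=-_".contains(c)))
        .filter(|token| token.len() >= 16)
        .any(|token| {
            decode_base64(token)
                .into_iter()
                .chain(decode_hex(token))
                .any(|decoded| scan_decode_through(&decoded, depth + 1))
        })
}
```

Calling `scan_decode_through(text, 0)` on a line containing a base64-wrapped token returns `true` even though the literal prefix `ghp_` never appears in the input, and the same holds when that base64 string is itself hex-encoded. The depth limit bounds the work on inputs that keep decoding to something decodable.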
# KeyHog catches this. Other scanners don't.
encoded = "Z2hwX3h4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4" # base64(ghp_...)

Same credential, different context, different confidence:
# 82% — production config
production_config = "ghp_xxxxxxxxxxxxxxxxxxxx"
# 25% — test fixture (auto-detected via AST context)
def test_auth():
    token = "ghp_xxxxxxxxxxxxxxxxxxxx"

Detectors are defined in TOML, so no code changes are needed:
# detectors/my-service.toml
[detector]
id = "my-service-api-key"
name = "My Service API Key"
severity = "critical"
keywords = ["ms_live_", "ms_test_"]
[[detector.patterns]]
regex = 'ms_(live|test)_[a-zA-Z0-9]{32}'
[detector.verify]
method = "GET"
url = "https://api.myservice.com/v1/status"
[detector.verify.auth]
type = "bearer"
field = "match"

detectors = "detectors" # Path to detector TOML files
severity = "medium" # Minimum: info | low | medium | high | critical
format = "text" # Output: text | json | jsonl | sarif
min_confidence = 0.7 # ML confidence threshold (0.0–1.0)
threads = 8 # Parallel scan threads
dedup = "credential" # Dedup: credential | file | none
deep = true # Enable decode-through + entropy + multiline
timeout = 10 # Verification timeout (seconds)
show_secrets = false # Redact credentials in output

# Paths
path:tests/**
path:**/*.md
# Detectors
detector:entropy
detector:generic-api-key
# Specific findings by hash
hash:abc123def456

# keyhog:ignore
GITHUB_TOKEN = "ghp_xxxxxxxxxxxxxxxxxxxx"
# keyhog:ignore detector=github-token
api_key = "ghp_yyyyyyyyyyyyyyyyyyyy"
# keyhog:ignore reason="public CI token"
TOKEN = "ghp_zzzzzzzzzzzzzzzzzzzz"

# Full build (default)
cargo build --release
# Fast mode: regex-only, no ML/decode/multiline — for pre-commit hooks
cargo build --release --no-default-features --features fast
# With live verification
cargo build --release --features verify

All benchmarks: AMD Ryzen 9 5900X, 32 GB RAM, NVMe SSD.
| Detectors | 1 MB | 10 MB | 100 MB |
|---|---|---|---|
| 100 | 55 MB/s | 58 MB/s | 62 MB/s |
| 500 | 48 MB/s | 52 MB/s | 56 MB/s |
| 887 | 42 MB/s | 46 MB/s | 50 MB/s |
| Repository | Size | KeyHog | TruffleHog | Gitleaks |
|---|---|---|---|---|
| facebook/react | 350 MB | 8s | 25s | 45s |
| denoland/deno | 900 MB | 18s | 55s | 95s |
| rust-lang/rust | 2.1 GB | 42s | 120s | 200s |
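The repository timings can be cross-checked against the throughput figures with simple arithmetic. The helpers below are only a worked calculation over the numbers in the table above. The function names are made up for the example, and 2.1 GB is assumed to mean 2100 MB.

```rust
/// Effective throughput in MB/s for `size_mb` megabytes scanned in `seconds`.
fn throughput_mb_per_s(size_mb: f64, seconds: f64) -> f64 {
    size_mb / seconds
}

/// How many times faster a scan is compared with a baseline scan of the same input.
fn speedup(baseline_seconds: f64, scan_seconds: f64) -> f64 {
    baseline_seconds / scan_seconds
}

/// (repository, size in MB, KeyHog s, TruffleHog s, Gitleaks s), copied from the table.
fn repository_timings() -> Vec<(&'static str, f64, f64, f64, f64)> {
    vec![
        ("facebook/react", 350.0, 8.0, 25.0, 45.0),
        ("denoland/deno", 900.0, 18.0, 55.0, 95.0),
        ("rust-lang/rust", 2100.0, 42.0, 120.0, 200.0), // assumes 2.1 GB = 2100 MB
    ]
}
```

Applied to the table, the effective KeyHog throughput is 43.75 MB/s for facebook/react and 50 MB/s for the other two repositories, in line with the 42–50 MB/s measured with 887 detectors. The implied speedup over TruffleHog is roughly 2.9x to 3.1x.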
| Service | Status | Latency |
|---|---|---|
| AWS | ✓ | ~200ms |
| GitHub | ✓ | ~150ms |
| Slack | ✓ | ~180ms |
| Stripe | ✓ | ~220ms |
| OpenAI | ✓ | ~250ms |
MIT — see LICENSE.
KeyHog by Santh
Built with Rust · Zero dependencies in core · keyhog.santh.io