PawnAI

A Python monolith for speaker diarization, audio transcription, and LLM-powered conversation analysis. Provides two CLI applications: pawn-diarize for audio processing and embedding management, and pawn-agent for conversational analysis of recorded sessions.

Features

Speaker Diarization: Identify and separate multiple speakers using pyannote.audio
Audio Transcription: Convert speech to text via NeMo Parakeet or faster-whisper backends
Combined Transcription & Diarization: Transcribe and label speakers in one command
Multi-file Processing: Process multiple files as one conversation with cross-file speaker alignment
Session Accumulation: Process long conversations in parts across multiple invocations
Speaker Embeddings: Extract and store 512-dim speaker vectors in PostgreSQL with pgvector
Conversation Analysis: Generate titles, summaries, key topics, sentiment, and per-speaker highlights
Knowledge Graph Extraction: Extract semantic triples (subject, relation, object) from transcripts
RAG Index: Embed transcripts and SiYuan notes for semantic similarity search
S3 Storage Management: List, filter, and delete objects in S3-compatible storage
SiYuan Notes Integration: Push analysis documents back to a SiYuan Notes instance
Background Queue Worker: S3-backed job queue with lease-based concurrency
GPU Support: Accelerated processing on CUDA-enabled devices

Installation

From Source

git clone <repository-url>
cd parakeet
pip install -e ".[dev]"

System Requirements

Python 3.10+
PostgreSQL with the pgvector extension
CUDA 11.0+ (optional, for GPU acceleration)
8 GB+ RAM recommended for ML models

Database Setup

# Apply all migrations
alembic upgrade head

Configuration

All settings live in pawnai.yaml (auto-discovered in the current directory, or passed via --config). Precedence: CLI flags → pawnai.yaml → environment variables → defaults.

models:
  hf_token: hf_your_token_here        # HuggingFace token for gated models
  # asr_model: nvidia/parakeet-tdt-0.6b-v3
  # diarization_model: pyannote/speaker-diarization-3.1

db_dsn: postgresql+psycopg://user:pass@localhost:5433/pawnai

paths:
  db_path: speakers_db
  audio_dir: audio/

device: auto    # auto | cuda | cpu

s3:
  bucket: my-audio-bucket
  endpoint_url: https://s3.amazonaws.com
  access_key: AKIAIOSFODNN7EXAMPLE
  secret_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  region: us-east-1
  prefix: ""
  verify_ssl: true
  path_style: true

siyuan:
  url: http://localhost:6806
  token: your-siyuan-token
  notebook: Meetings

agent:
  name: Bob
  anima: anima.md                     # optional personality/system-prompt file
  openai:
    model: qwen3:8b
    base_url: http://localhost:11434/v1
    api_key: ollama
  copilot:
    model: gpt-4o

rag:
  embed_model: Qwen/Qwen3-Embedding-0.6B
  embed_dim: 1024
  embed_device: cpu

queue:
  topic: pawn-jobs
  consumer_name: worker-1
  bucket_name: my-audio-bucket

pawn-diarize

Quick Start

# Check system status
pawn-diarize status

# Transcribe and diarize a meeting
pawn-diarize transcribe-diarize meeting.wav -o transcript.txt

# Label a speaker for future recognition
pawn-diarize label -f audio.wav -s SPEAKER_00 -n "Alice Smith"

Commands Reference

`diarize`

Perform speaker diarization on one or more audio files. Automatically recognizes known speakers from the database and replaces generic labels with names.

pawn-diarize diarize <audio_path>... [OPTIONS]

Options:
  --output, -o TEXT        Output file (.txt or .json)
  --config TEXT            Path to pawnai.yaml
  --db-dsn TEXT            PostgreSQL DSN
  --threshold, -t FLOAT    Similarity threshold for speaker matching (default: 0.7)
  --store-new/--no-store   Store embeddings for unknown speakers (default: enabled)

pawn-diarize diarize meeting.wav -o speakers.json
pawn-diarize diarize part1.wav part2.wav part3.wav -o full_diarization.json
pawn-diarize diarize meeting.wav --no-store

`transcribe`

Transcribe audio to text. Supports two backends: nemo (Nvidia Parakeet) and whisper (faster-whisper).

pawn-diarize transcribe <audio_path> [OPTIONS]

Options:
  --output, -o TEXT          Output file (.txt or .json)
  --config TEXT              Path to pawnai.yaml
  --db-dsn TEXT              PostgreSQL DSN
  --timestamps/--no-timestamps  Include word-level timestamps (default: enabled)
  --device, -d TEXT          cuda or cpu (default: auto)
  --chunk-duration, -c FLOAT Split audio into N-second chunks (helps avoid OOM)
  --backend TEXT             nemo or whisper (default: nemo)

pawn-diarize transcribe speech.wav -o transcript.txt
pawn-diarize transcribe large.mp3 --backend whisper --device cpu -c 300

`transcribe-diarize`

Combined transcription + diarization with speaker labels in a single pass. When multiple files are given they are treated as ordered chunks of the same conversation, with speaker labels aligned across files.

pawn-diarize transcribe-diarize <audio_path>... [OPTIONS]

Options:
  --output, -o TEXT             Output file (.txt or .json)
  --config TEXT                 Path to pawnai.yaml
  --session, -s TEXT            Session JSON file for accumulative processing
  --db-dsn TEXT                 PostgreSQL DSN
  --threshold, -t FLOAT         Speaker matching threshold (default: 0.7)
  --store-new/--no-store        Store unknown speaker embeddings (default: enabled)
  --device, -d TEXT             cuda or cpu
  --chunk-duration, -c FLOAT    Split audio into chunks
  --cross-threshold, -x FLOAT   Cross-file speaker alignment threshold (default: 0.85)
  --no-timestamps               Hide timestamps in output
  --backend TEXT                nemo or whisper (default: nemo)

# Single file
pawn-diarize transcribe-diarize meeting.wav -o transcript.txt

# Multiple files (ordered chunks of same conversation)
pawn-diarize transcribe-diarize part1.wav part2.wav part3.wav -o full.txt

# Accumulate across invocations
pawn-diarize transcribe-diarize part1.wav --session conv.json
pawn-diarize transcribe-diarize part2.wav --session conv.json
pawn-diarize transcribe-diarize part3.wav --session conv.json -o final.txt

`embed`

Extract and store speaker embeddings (512-dim) in PostgreSQL for future recognition.

pawn-diarize embed <audio_path> --speaker-id SPEAKER_ID [OPTIONS]

Options:
  --speaker-id, -s TEXT   Unique speaker identifier (required)
  --config TEXT           Path to pawnai.yaml
  --db-dsn TEXT           PostgreSQL DSN

pawn-diarize embed person.wav --speaker-id alice_smith

`search`

Find speakers with similar voice characteristics via cosine similarity on stored embeddings.

pawn-diarize search <speaker_id> [OPTIONS]

Options:
  --limit INT    Maximum results (default: 5)
  --db-dsn TEXT  PostgreSQL DSN

`label`

Assign a human-readable name to a speaker for automatic recognition in future diarizations.

pawn-diarize label [OPTIONS]

Options:
  --file, -f TEXT      Audio file containing the speaker
  --speaker, -s TEXT   Speaker label (e.g. SPEAKER_00)
  --name, -n TEXT      Human-readable name
  --session TEXT       Session ID to scope the labeling
  --list, -l           List all speaker name mappings
  --config TEXT        Path to pawnai.yaml
  --db-dsn TEXT        PostgreSQL DSN

pawn-diarize label -f audio.wav -s SPEAKER_00 -n "Alice Smith"
pawn-diarize label --list

Typical workflow:

transcribe-diarize → transcript with SPEAKER_00, SPEAKER_01, …
label → assign names to those labels
Future transcribe-diarize runs → names appear automatically

`session-relabel`

Bulk-rename a mis-identified speaker across an entire session.

pawn-diarize session-relabel --session SESSION_ID --from OLD_NAME --to NEW_NAME [OPTIONS]

Options:
  --yes, -y    Skip confirmation prompt
  --db-dsn TEXT
  --config TEXT

`session-info`

Show speakers and their embedding source files for a session.

pawn-diarize session-info <session_id> [OPTIONS]

`sessions`

List all sessions or inspect a specific one.

pawn-diarize sessions [OPTIONS]

Options:
  --session TEXT    Inspect a specific session
  --head INT        Show first N segments
  --tail INT        Show last N segments
  --output TEXT     Save output to file
  --config TEXT     Path to pawnai.yaml

`sync-siyuan`

Push a session's analysis to a SiYuan Notes instance as a new document.

pawn-diarize sync-siyuan [OPTIONS]

Options:
  --session TEXT          Session ID to sync
  --all                   Sync all sessions with stored analysis
  --notebook TEXT         Target SiYuan notebook
  --token TEXT            SiYuan API token
  --url TEXT              SiYuan instance URL
  --path-template TEXT    Document path template
  --daily-note            Insert into today's daily note
  --db-dsn TEXT
  --config TEXT

`s3 ls`

List objects in the configured S3 bucket with POSIX-style output (timestamp and size always shown).

pawn-diarize s3 ls [PREFIX] [OPTIONS]

Options:
  --config TEXT           Path to pawnai.yaml
  --recursive, -r         Flat listing (no directory grouping)
  --time, -t              Sort by modification time, newest first
  --reverse               Reverse sort order
  --older-than INTEGER    Show only objects older than N days

pawn-diarize s3 ls
pawn-diarize s3 ls recordings/ -t
pawn-diarize s3 ls recordings/ -r -t --reverse
pawn-diarize s3 ls --older-than 30

Output format:

2024-03-15 14:32:10    45.2 MB  recordings/2024/meeting.wav
                PRE             recordings/archive/

`s3 rm`

Delete objects from S3 by key or by age. Always prefer --dry-run first.

pawn-diarize s3 rm [PATH] [OPTIONS]

Options:
  --config TEXT           Path to pawnai.yaml
  --older-than INTEGER    Delete all objects older than N days
  --prefix TEXT           Limit --older-than to this key prefix
  --dry-run, -n           Show what would be deleted without making changes
  --yes, -y               Skip confirmation prompt for bulk deletions

# Delete a specific file
pawn-diarize s3 rm recordings/2024/old.wav

# Preview bulk deletion
pawn-diarize s3 rm --older-than 30 --dry-run

# Delete all objects older than 90 days under a prefix
pawn-diarize s3 rm --older-than 90 --prefix recordings/ --yes

`listen`

Background worker that polls a pawn-queue topic and executes jobs (transcribe-diarize, transcribe, diarize, embed, analyze, sync-siyuan).

pawn-diarize listen [OPTIONS]

Options:
  --config TEXT          Path to pawnai.yaml
  --topic TEXT           Queue topic name
  --consumer-name TEXT   Worker identity for lease-based concurrency

`status`

Show system status: device, CUDA availability, GPU info, configured models.

pawn-diarize status [--config TEXT]

S3 Audio Paths

All audio processing commands accept s3:// URIs directly — files are downloaded to a temp location, processed, then cleaned up automatically.

pawn-diarize transcribe s3://my-bucket/recordings/meeting.wav -o transcript.txt
pawn-diarize transcribe-diarize s3://my-bucket/recordings/*.wav

pawn-agent

LLM-powered conversational agent for analyzing pawn-diarize sessions. Backed by PydanticAI and supports multiple providers (OpenAI-compatible, GitHub Copilot, Anthropic, Google, Groq, Mistral, and others).

Tools are auto-discovered from pawn_agent/tools/ — any module that exports NAME, DESCRIPTION, and build(cfg) is registered automatically.

Commands Reference

`chat`

Start an interactive multi-turn conversation session.

pawn-agent chat [OPTIONS]

Options:
  --config TEXT    Path to pawnai.yaml
  --model TEXT     Override the configured LLM model
  --session TEXT   Session hint for context (e.g. a session ID to analyze)

`run`

Execute a single prompt and exit (useful for scripting).

pawn-agent run "<prompt>" [OPTIONS]

Options:
  --config TEXT    Path to pawnai.yaml
  --model TEXT     Override the configured LLM model
  --session TEXT   Session hint

pawn-agent run "Summarize session abc123 and save to SiYuan"

`tools`

List all registered agent tools with their descriptions.

pawn-agent tools [--config TEXT]

`models`

List available Copilot models with billing and reasoning effort details.

pawn-agent models [--config TEXT]

Available Tools

Tool	Description
`query_conversation`	Fetch the full transcript for a session from the database
`analyze_summary`	Run structured analysis (title, summary, topics, sentiment, tags) and persist to DB
`get_analysis`	Retrieve the most recent stored analysis for a session
`extract_graph`	Extract knowledge-graph triples from a transcript and persist to DB
`vectorize`	Embed session transcripts or SiYuan pages into the RAG index
`search_knowledge`	Semantic similarity search over transcript chunks and SiYuan notes
`save_to_siyuan`	Save Markdown content to SiYuan Notes as a new document
`fetch_siyuan_page`	Fetch the text content of a SiYuan page by path
`rag_stats`	Show a summary of the RAG vector index (sources and chunk counts)

Project Structure

parakeet/
├── pawn_diarize/
│   ├── __main__.py           # Entry point → pawn-diarize
│   ├── cli/
│   │   ├── commands.py       # All CLI command definitions
│   │   └── utils.py          # Console, helpers
│   ├── core/
│   │   ├── config.py         # pawnai.yaml loader
│   │   ├── diarization.py    # pyannote.audio + DBSCAN clustering
│   │   ├── transcription.py  # NeMo Parakeet / faster-whisper backends
│   │   ├── embeddings.py     # Speaker embeddings (PostgreSQL + pgvector)
│   │   ├── combined.py       # Multi-file transcription+diarization
│   │   ├── database.py       # SQLAlchemy ORM (embeddings, segments, analysis, RAG, graph)
│   │   ├── analysis.py       # Session analysis via Copilot
│   │   ├── s3.py             # S3/MinIO client and transparent download helpers
│   │   ├── siyuan.py         # SiYuan Notes API client
│   │   └── queue_listener.py # Background job worker
│   └── utils/
├── pawn_agent/
│   ├── __main__.py           # Entry point → pawn-agent
│   ├── cli/commands.py       # run, chat, tools, models
│   ├── core/
│   │   ├── agent.py          # ConversationAgent (Copilot SDK)
│   │   └── pydantic_agent.py # PydanticAI multi-provider agent
│   ├── tools/                # Auto-discovered tool modules
│   └── utils/
│       ├── config.py         # AgentConfig loader
│       ├── db.py             # DB session factory + RAG tables
│       ├── transcript.py     # Fetch/format session transcripts
│       ├── siyuan.py         # SiYuan helpers
│       ├── analysis.py       # Analysis runner
│       └── vectorize.py      # RAG vectorization
├── migrations/               # Alembic schema versions
├── pawnai.yaml               # Project configuration
└── alembic.ini

Technologies

Category	Technology
Speaker diarization	pyannote.audio 4.0+
Transcription	NeMo Parakeet (nvidia/parakeet-tdt-0.6b-v3), faster-whisper
Speaker embeddings	PostgreSQL + pgvector (512-dim cosine similarity)
RAG embeddings	sentence-transformers / Qwen3-Embedding-0.6B (1024-dim)
Database ORM	SQLAlchemy 2.0 + Alembic migrations
LLM agent	PydanticAI (multi-provider) + GitHub Copilot SDK
S3 storage	boto3 / aioboto3
Job queue	pawn-queue (S3-backed)
Notes integration	SiYuan Notes API
CLI	Typer + Rich

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Format
black pawn_diarize pawn_agent tests
isort pawn_diarize pawn_agent tests

# Lint
flake8 pawn_diarize pawn_agent tests
mypy pawn_diarize pawn_agent

# Test
pytest
pytest tests/test_cli.py::test_status
pytest --cov=pawn_diarize --cov-report=html

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.devcontainer		.devcontainer
.github		.github
.vscode		.vscode
docker		docker
docs		docs
migrations		migrations
pawn_agent		pawn_agent
pawn_diarize		pawn_diarize
tests		tests
.gitignore		.gitignore
.pawn-diarize.yml.example		.pawn-diarize.yml.example
CLAUDE.md		CLAUDE.md
MANIFEST.in		MANIFEST.in
README.md		README.md
alembic.ini		alembic.ini
anima.example.md		anima.example.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

PawnAI

Features

Installation

From Source

System Requirements

Database Setup

Configuration

pawn-diarize

Quick Start

Commands Reference

diarize

transcribe

transcribe-diarize

embed

search

label

session-relabel

session-info

sessions

sync-siyuan

s3 ls

s3 rm

listen

status

S3 Audio Paths

pawn-agent

Commands Reference

chat

run

tools

models

Available Tools

Project Structure

Technologies

Development

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`diarize`

`transcribe`

`transcribe-diarize`

`embed`

`search`

`label`

`session-relabel`

`session-info`

`sessions`

`sync-siyuan`

`s3 ls`

`s3 rm`

`listen`

`status`

`chat`

`run`

`tools`

`models`

Packages