ctoc

Like cloc, but counts Claude tokens instead of lines.

$ ctoc src/
---------------------
Ext     files  tokens
---------------------
.rs        17  52,000
.py         5  12,340
.ts         3   4,200
---------------------
SUM        25  68,540
---------------------

Self-contained C++17 binary — no runtime dependencies.
Greedy longest-match tokenizer built from 36,495 reverse-engineered Claude tokens.
~4% error vs the Anthropic count_tokens API across 30 tested files (tiktoken and bytes/4 undercount by 20%+).
Processes ~1M tokens/sec including file I/O.
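
For context on the accuracy figures above, the bytes/4 baseline and the error measurement can be sketched as follows. This is an illustration only: the function names are hypothetical, the round-up in the heuristic is an assumption, and the exact formula used for the reported numbers is not stated here (REPORT.md may define it differently).

```cpp
#include <cstddef>

// bytes/4 heuristic: assume roughly one token per four bytes.
// Rounding up is an assumption made for this sketch.
std::size_t bytes_over_4_estimate(std::size_t num_bytes) {
    return (num_bytes + 3) / 4;
}

// Signed relative error of an estimate against a reference count
// (for example, the count returned by the count_tokens API).
// Negative values mean the estimate undercounts.
// Precondition: reference > 0.
double signed_relative_error(double estimate, double reference) {
    return (estimate - reference) / reference;
}
```

Under this definition, an estimator that reports 80 tokens where the reference says 100 has an error of -0.2, i.e. a 20% undercount.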

Install

bazel build //:ctoc
cp bazel-bin/ctoc /usr/local/bin/

Cross-compile with --config={linux_amd64,linux_arm64,macos_amd64,macos_arm64} via hermetic zig cc.

Usage

ctoc .                                    # tokenize current project
ctoc --by-file src/                       # per-file breakdown
ctoc --include-ext .py --include-ext .js  # only Python and JS
ctoc --exclude-dir vendor .               # skip vendor/

How it works

At build time, gen_vocab.py converts vocab.json into a C++ array.
At runtime, the tokens are inserted into a trie and each file is tokenized via greedy longest-match.
The vocabulary was extracted by probing Anthropic's count_tokens API ~276K times; see REPORT.md.
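
The runtime steps above can be sketched roughly as follows. This is a minimal illustration, not the code in ctoc.cc: the names TrieNode, Trie, build_trie and count_tokens are hypothetical, the vocabulary is passed as a plain vector rather than the generated array, and counting an unmatched byte as one token is an assumption about the fallback behavior.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <vector>

// Hypothetical trie node: one child per possible next byte.
struct TrieNode {
    std::unordered_map<unsigned char, std::unique_ptr<TrieNode>> children;
    bool is_token = false;  // true if the path from the root spells a vocabulary token
};

class Trie {
public:
    // Add one vocabulary token, creating nodes along its byte path as needed.
    void insert(std::string_view token) {
        TrieNode* node = &root_;
        for (unsigned char c : token) {
            auto& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->is_token = true;
    }

    // Length in bytes of the longest vocabulary token starting at text[pos],
    // or 0 if no token matches there.
    std::size_t longest_match(std::string_view text, std::size_t pos) const {
        const TrieNode* node = &root_;
        std::size_t best = 0;
        for (std::size_t i = pos; i < text.size(); ++i) {
            auto it = node->children.find(static_cast<unsigned char>(text[i]));
            if (it == node->children.end()) break;
            node = it->second.get();
            if (node->is_token) best = i - pos + 1;
        }
        return best;
    }

private:
    TrieNode root_;
};

// Build a trie from a vocabulary list (stand-in for the generated C++ array).
Trie build_trie(const std::vector<std::string>& vocab) {
    Trie trie;
    for (const auto& token : vocab) trie.insert(token);
    return trie;
}

// Greedy longest-match count: at each position, consume the longest matching
// token. Assumption: a byte with no match is counted as a single token.
std::size_t count_tokens(const Trie& trie, std::string_view text) {
    std::size_t count = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = trie.longest_match(text, pos);
        pos += (len > 0 ? len : 1);
        ++count;
    }
    return count;
}
```

With a toy vocabulary of "fn", "fn ", "main", "(", ")" and " ", the input "fn main()" is split as "fn ", "main", "(", ")": the matcher prefers "fn " over the shorter "fn" because it keeps walking the trie until no child matches. Greedy matching is fast and simple, but it need not reproduce the real tokenizer's segmentation, which is one plausible source of the residual error.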

