fix: chunk byline IN clauses to stay within D1 SQL variable limit #223

Open
baezor wants to merge 4 commits into emdash-cms:main from baezor:fix/byline-sql-batch-chunking

baezor commented Apr 4, 2026

What does this PR do?

Fixes unbounded IN (?, ?, …) clauses in byline hydration that exceed Cloudflare D1's SQL bound-parameter limit when querying large collections.

Adds a chunks() utility (utils/chunks.ts) and applies it at the repository level as defense in depth, so every caller is protected:

  • BylineRepository.getContentBylinesMany — deduplicates IDs, then chunks content_id IN (…)
  • BylineRepository.findByUserIds — chunks user_id IN (…)
  • getAuthorIds (bylines/index.ts) — chunks id IN (…) raw SQL
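
For readers who have not opened the diff, the helper can be pictured roughly as follows. This is a minimal sketch and not the code in utils/chunks.ts: the signature, the guard on the chunk size, and the error message are assumptions made for illustration. Only the name chunks() and the SQL_BATCH_SIZE value of 50 come from this PR.

```typescript
// Illustrative sketch only; the real utils/chunks.ts may differ.
// SQL_BATCH_SIZE = 50 matches the value asserted in this PR's tests.
const SQL_BATCH_SIZE = 50;

// Splits `items` into consecutive slices of at most `size` elements.
// The size guard is an assumption added here to rule out an infinite loop.
function chunks<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("chunk size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    // slice() clamps at the end of the array, so the last chunk may be shorter.
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

With this sketch, chunks([1, 2, 3, 4, 5], 2) returns [[1, 2], [3, 4], [5]], and an empty input returns an empty array, which lines up with the edge cases named in the test output.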

Each batch is capped at 50 IDs (SQL_BATCH_SIZE), well within D1's limit. Content IDs are also deduplicated before chunking, so an ID that appears more than once in the input cannot land in two chunks and produce duplicate credits.
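
To make the deduplicate-then-chunk order concrete, here is a hedged sketch of how a batched content_id IN (…) query could be assembled. The function name buildBatchedInQueries, the BatchedQuery shape, and the content_bylines table and byline_id column are hypothetical names chosen for illustration; the repository in this PR builds its queries through its own query layer.

```typescript
// Illustrative sketch only; names other than SQL_BATCH_SIZE and content_id
// are hypothetical and do not come from the PR.
const SQL_BATCH_SIZE = 50;

interface BatchedQuery {
  sql: string;
  params: string[];
}

function buildBatchedInQueries(
  contentIds: readonly string[],
  batchSize: number = SQL_BATCH_SIZE,
): BatchedQuery[] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  // Deduplicate first: if the same ID sat in two batches, both queries would
  // return its rows and the merged result would list each credit twice.
  const uniqueIds = Array.from(new Set(contentIds));
  const queries: BatchedQuery[] = [];
  for (let i = 0; i < uniqueIds.length; i += batchSize) {
    const params = uniqueIds.slice(i, i + batchSize);
    // One "?" per ID, so no query binds more than batchSize parameters.
    const placeholders = params.map(() => "?").join(", ");
    queries.push({
      sql: `SELECT content_id, byline_id FROM content_bylines WHERE content_id IN (${placeholders})`,
      params,
    });
  }
  return queries;
}
```

Each returned query is run on its own and the rows are concatenated. Because every ID appears in exactly one batch, the merged rows contain no repeats introduced by batching.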

Closes #219

Type of change

  • Bug fix
  • Feature (requires approved Discussion)
  • Refactor (no behavior change)
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm --silent lint:json | jq '.diagnostics | length' returns 0
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: https://github.com/emdash-cms/emdash/discussions/...

AI-generated code disclosure

  • This PR includes AI-generated code

Screenshots / test output

All 26 tests pass across 3 test files (10 new):

 ✓ chunks > returns empty array for empty input
 ✓ chunks > returns single chunk when array fits within size
 ✓ chunks > splits array into even chunks
 ✓ chunks > handles remainder in last chunk
 ✓ chunks > handles chunk size of 1
 ✓ chunks > handles array exactly equal to chunk size
 ✓ SQL_BATCH_SIZE > is 50
 ✓ BylineRepository > getContentBylinesMany handles more IDs than SQL_BATCH_SIZE
 ✓ BylineRepository > getContentBylinesMany does not duplicate credits for repeated content IDs
 ✓ BylineRepository > findByUserIds handles more IDs than SQL_BATCH_SIZE
 ✓ getBylinesForEntries > handles batches larger than SQL_BATCH_SIZE across explicit and inferred bylines

baezor added 3 commits April 4, 2026 01:26
Fixes emdash-cms#219. hydrateEntryBylines builds unbounded IN (?, ?, …) clauses
that exceed Cloudflare D1's bound-parameter limit on large collections.

Adds a chunks() utility and applies it defense-in-depth at the
repository level: getContentBylinesMany, findByUserIds, and
getAuthorIds now batch IDs in groups of 50.

Deduplicates contentIds in getContentBylinesMany to prevent duplicate
credits when the same ID appears across chunk boundaries. Adds tests
for the duplication edge case and an end-to-end getBylinesForEntries
test spanning both explicit and inferred byline paths.

changeset-bot bot commented Apr 4, 2026

🦋 Changeset detected

Latest commit: fddf1ee

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages

| Name | Type |
| --- | --- |
| emdash | Patch |
| @emdash-cms/cloudflare | Patch |
| @emdash-cms/plugin-ai-moderation | Patch |
| @emdash-cms/plugin-atproto | Patch |
| @emdash-cms/plugin-audit-log | Patch |
| @emdash-cms/plugin-color | Patch |
| @emdash-cms/plugin-embeds | Patch |
| @emdash-cms/plugin-forms | Patch |
| @emdash-cms/plugin-webhook-notifier | Patch |

github-actions bot commented Apr 4, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

baezor commented Apr 4, 2026

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Apr 4, 2026

Development

Successfully merging this pull request may close these issues.

hydrateEntryBylines exceeds D1 SQL variable limit on large collections
