Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch by tricknotes · Pull Request #96 · tricknotes/starseeker

tricknotes · 2026-03-19T18:18:01Z

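Background

Because of a GitHub API change, WatchEvent data can no longer be retrieved,
so this PR switches to fetching star information from the /users/{login}/starred endpoint.
The data is stored in PostgreSQL for caching and archival purposes.

Main changes

Data layer

- Migrate StarEvent / Repository from Mongoid to ActiveRecord (PostgreSQL)
- star_events table: actor_login, repo_name, repo_owner, starred_at, etc.
- repositories table: separates out metadata that is updated over time, such as stargazers_count
- Add a repo_owner column and drop the LIKE query in StarEvent.owner

Fetch strategy

- Remove on-demand fetching from the controllers
- Fetch stars for all users periodically with the rake star_events:fetch task
- Controllers display only data that already exists in the DB

Removed

- mongoid gem, config/mongoid.yml
- lib/tasks/fetch_repositories.rake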

🤖 Generated with Claude Code

…orage

GitHub API changes made WatchEvent data unavailable. Instead of relying on stored events, StarEvent now fetches starred repos from GitHub API (/users/{login}/starred) on demand and persists them in PostgreSQL for caching and archival purposes.

- Rewrite StarEvent and Repository from Mongoid to ActiveRecord
- Add fetch_and_upsert class method to StarEvent for GitHub API integration
- Store repository metadata in a separate repositories table
- Update controllers to fetch on demand before reading from DB
- Remove mongoid gem, config/mongoid.yml, and MongoDB from CI/Docker
- Remove fetch_repositories.rake (no longer needed)
- Update views, helpers, rake tasks, and test support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

alias_method doesn't work with ActiveRecord attribute methods; use a regular method definition instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
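
A rough illustration of why the alias fails: alias_method needs its target to exist when the class body runs, whereas ActiveRecord (as far as I understand) generates attribute readers later, once the schema is known. The sketch below mimics that with plain Ruby. `LazyRecord` is a hypothetical stand-in that resolves readers dynamically; it is not ActiveRecord.

```ruby
# Hypothetical stand-in for a model whose attribute readers do not exist
# as real methods at class-body time (they are resolved dynamically).
class LazyRecord
  def initialize(attributes)
    @attributes = attributes
  end

  def method_missing(name, *args)
    return @attributes[name] if @attributes.key?(name)

    super
  end

  def respond_to_missing?(name, include_private = false)
    @attributes.key?(name) || super
  end
end

class Repository < LazyRecord
  begin
    # Fails: stargazers_count is not a defined method yet.
    alias_method :watchers_count, :stargazers_count
  rescue NameError => e
    ALIAS_ERROR = e
  end

  # A regular method looks up stargazers_count at call time, so it works.
  def watchers_count
    stargazers_count
  end
end

repo = Repository.new(stargazers_count: 42) # sample value
puts repo.watchers_count
puts Repository::ALIAS_ERROR.class
```

In a real Rails model, `alias_attribute` is the usual tool for this; the commit chose a plain method definition instead.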

- Use ::OpenStruct to avoid NameError in Repository model
- Handle Time objects from Octokit in fetch_starred_since
- Add webmock and stub StarEvent.fetch_and_upsert in tests to prevent unintended GitHub API calls during test suite

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ruby 3.4+ requires explicit ostruct gem as it was removed from the default gems. Revert ::OpenStruct to OpenStruct now that the gem is properly declared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Update avatar_image_tag/image_link_to_github_url to handle OpenStruct (repo.owner) using respond_to?(:login) duck typing
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Restore Hash fallback in avatar_image_tag (lost during rebase merge)
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

app/helpers/application_helper.rb

Claude · 2026-03-24T14:55:34Z

@tricknotes I've opened a new pull request, #97, to work on those changes. Once the pull request is ready, I'll request review from you.

- Remove StarEvent.fetch_and_upsert calls from controllers (activities, dashboard, stars); controllers now read from DB only
- Remove User#fetch_star_events (no longer needed)
- Add lib/tasks/fetch_star_events.rake for periodic background fetch
- Remove fetch_and_upsert stub from rails_helper (no longer needed)
- Skip Settings.url_options in test env to avoid BASE_URL port leaking into action_mailer.default_url_options

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The repo_* keys in the intermediate hash were not DB columns but appeared alongside DB columns, causing confusion. Refactor so that fetch_starred_since returns two distinct collections: star_events (only the star_events table fields) and repos (repository fields). upsert_repositories now receives repo data directly without needing the repo_* prefixed keys. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
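
A minimal sketch of the split described above, under assumptions: the input mirrors the `starred_at` / `repo` shape of GitHub's star+json REST response, and `split_starred_item` plus the `name` key on the repository hash are hypothetical names, not taken from the PR.

```ruby
require "time"

# Split one starred item into a star_events row and a repositories row,
# so neither hash carries keys that belong to the other table.
# The helper name and the repositories keys are illustrative assumptions.
def split_starred_item(actor_login, item)
  repo = item.fetch(:repo)
  starred_at = item.fetch(:starred_at)
  # Accept either an ISO 8601 string or an already-parsed Time.
  starred_at = Time.iso8601(starred_at) if starred_at.is_a?(String)

  star_event = {
    actor_login: actor_login,
    repo_name: repo.fetch(:full_name),
    starred_at: starred_at
  }
  repository = {
    name: repo.fetch(:full_name),
    stargazers_count: repo.fetch(:stargazers_count)
  }
  [star_event, repository]
end

# Sample data for illustration only.
item = {
  starred_at: "2026-03-19T18:18:01Z",
  repo: { full_name: "tricknotes/starseeker", stargazers_count: 10 }
}
star_event, repository = split_starred_item("alice", item)
```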

all_by and by were identical — both delegate to where(actor_login:) which accepts a single value or an array. Remove the by class method, rename all_by scope to by, and update the one call site in User. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace LIKE-based owner scope with an equality check on the new repo_owner column (indexed). Populate repo_owner from repo.owner.login during fetch, and derive it from repo_name in stub_star_event! for tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
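
For illustration, deriving the owner from a full repository name and filtering by equality could look like the following. `repo_owner_from` is a hypothetical helper name and the events are sample data; in the Rails model the equivalent would presumably be a `where(repo_owner: ...)` scope.

```ruby
# Derive the owner login from an "owner/name" repository name.
# Hypothetical helper; the PR's stub_star_event! may do this differently.
def repo_owner_from(repo_name)
  repo_name.split("/", 2).first
end

# Sample rows, each gaining a repo_owner value at write time.
events = [
  { repo_name: "rails/rails" },
  { repo_name: "rails-api/active_model_serializers" },
  { repo_name: "tricknotes/starseeker" }
].map { |event| event.merge(repo_owner: repo_owner_from(event[:repo_name])) }

# Equality on the stored column replaces pattern matching on repo_name.
owned_by_rails = events.select { |event| event[:repo_owner] == "rails" }
```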

Data.define provides an immutable value object with an explicit interface, no method_missing overhead, and errors on unknown attributes. Also removes the ostruct gem dependency since Data is built into Ruby 3.2+. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
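
A small sketch of the properties listed above, assuming Ruby 3.2+. The attribute names `login` and `avatar_url` are guesses for illustration; the actual fields of the value object in this PR are not shown on this page.

```ruby
# Hypothetical value object; attribute names are assumptions.
Owner = Data.define(:login, :avatar_url)

owner = Owner.new(login: "tricknotes", avatar_url: "https://example.com/avatar.png")

# Explicit interface: only the declared readers exist, and no setters.
has_setter = owner.respond_to?(:login=)

# Unknown attributes are rejected at construction time.
unknown_attribute_error =
  begin
    Owner.new(login: "tricknotes", avatar_url: nil, name: "extra")
    nil
  rescue ArgumentError => e
    e
  end

puts owner.login
puts owner.frozen?
```

With OpenStruct, by contrast, the extra `name:` key would have been accepted silently.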

- by_name: was used in the old Mongoid-based repository! lookup on StarEvent, which was removed during the PostgreSQL migration
- watchers_count: compatibility alias left over from when repositories were fetched directly from GitHub API responses; views access stargazers_count directly on the ActiveRecord model

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-authored-by: tricknotes <290782+tricknotes@users.noreply.github.com>
Agent-Logs-Url: https://github.com/tricknotes/starseeker/sessions/88dafe1b-34ea-48bd-b7f5-bf70463355de

Include repo_owner column and its index directly in the initial create_star_events migration, removing the separate add_repo_owner_to_star_events migration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Controllers no longer call StarEvent.fetch_and_upsert directly; GitHub API access is only triggered via the rake task, which is never executed during the test suite. There are no stub_request usages, so webmock provides no value. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously fetch_starred_since accumulated all pages in arrays before upserting, causing large memory spikes when users had many starred repos.

- Replace fetch_starred_since with fetch_each_page (block/yield style) so each page's data is upserted immediately and GC'd
- Change User.all.to_a to User.find_each in rake task to avoid loading all user records into memory at once

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
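
The block/yield pattern can be sketched as follows. This is not the PR's implementation: the `source` callable is a hypothetical stand-in for the paged GitHub client, and only the method name `fetch_each_page` comes from the commit message.

```ruby
# Yield one page at a time instead of accumulating every page in an array.
# `source` is a hypothetical callable: source.call(page, per_page) returns
# an array of items, or an empty array once the pages are exhausted.
def fetch_each_page(source, per_page: 30)
  page = 1
  loop do
    items = source.call(page, per_page)
    break if items.empty?

    yield items # caller upserts here; the page can be collected afterwards
    break if items.size < per_page # a short page is the last page

    page += 1
  end
end

# Fake paged source with 70 items, for illustration only.
data = (1..70).to_a
source = ->(page, per_page) { data.each_slice(per_page).to_a[page - 1] || [] }

page_sizes = []
fetch_each_page(source) { |items| page_sizes << items.size }
```

Only one page is referenced at any moment, which is what bounds memory per fetch.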

Per-page upsert (fetch_each_page) caps each thread's memory to one page (~30 items), so FETCH_CONCURRENCY=5 keeps peak memory at ~150 API response objects, well within Heroku eco dyno limits.

- fetch_and_upsert: parallelize fetch_each_page across logins using Concurrent::FixedThreadPool; futures are awaited and rejections logged
- rake task: parallelize user.followings calls using the same pool size; logins are merged under a Mutex; remove each_slice (no longer needed)
- Thread pool is always shut down via ensure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Previously all logins were mapped to futures at once, keeping all Future objects and closures in memory until the entire batch completed.

- fetch_and_upsert: process logins in slices of FETCH_CONCURRENCY; each slice's futures are awaited before the next is started, allowing completed futures and their API response data to be GC'd promptly; GC.compact is called between slices to compact the heap
- rake task: call GC.compact after the followings phase completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
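
As a rough sketch of slice-wise processing: the PR uses concurrent-ruby (Concurrent::FixedThreadPool and futures), while the version below swaps in plain stdlib threads so it stays self-contained. `process_in_slices` is a hypothetical name, and the block stands in for the per-login fetch.

```ruby
FETCH_CONCURRENCY = 5 # value taken from the example in the commit message

# Process logins in slices; every thread in a slice is joined before the
# next slice starts, so at most `concurrency` fetches are alive at once.
# With ~30 items per page, 5 threads hold roughly 5 * 30 = 150 items.
def process_in_slices(logins, concurrency: FETCH_CONCURRENCY)
  results = []
  lock = Mutex.new

  logins.each_slice(concurrency) do |slice|
    threads = slice.map do |login|
      Thread.new do
        value = yield(login)
        lock.synchronize { results << value }
      rescue StandardError => e
        # Log and continue, similar in spirit to logging rejected futures.
        warn "[fetch] #{login} failed: #{e.message}"
      end
    end
    threads.each(&:join)
    # The PR calls GC.compact at this point, between slices.
  end

  results
end

done = process_in_slices(%w[a b c d e f g]) { |login| login.upcase }
```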

Use Concurrent::AtomicFixnum for thread-safe incrementing so each thread can log its position without data races.

[fetch_and_upsert] (3/150) fetching @login
[fetch_and_upsert] (3/150) @login done (1.23s)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
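
The same idea can be sketched without concurrent-ruby by guarding a counter with a Mutex. This is a stdlib substitute for Concurrent::AtomicFixnum, not what the commit does, and `ProgressCounter` is a hypothetical class name.

```ruby
# Thread-safe progress counter; a Mutex-based stand-in for
# Concurrent::AtomicFixnum#increment.
class ProgressCounter
  def initialize(total)
    @total = total
    @count = 0
    @lock = Mutex.new
  end

  # Returns a unique "(position/total)" label for the calling thread.
  def next_label
    position = @lock.synchronize { @count += 1 }
    "(#{position}/#{@total})"
  end
end

counter = ProgressCounter.new(50)
labels = Array.new(50) do
  Thread.new { counter.next_label }
end.map(&:value)

puts "[fetch_and_upsert] #{labels.first} fetching @login"
```

Capturing the label once per login lets both the "fetching" and "done" lines carry the same position, as in the log sample above.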

Introduce `StarEvent.fetch_and_upsert_graphql` which issues a single HTTP request per GRAPHQL_BATCH_SIZE (default 20) users using GitHub's GraphQL API with field aliases. This reduces the number of HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE, making wall-clock time roughly proportional to the batch size ratio.

Key details:

- GRAPHQL_BATCH_SIZE (env) controls users per GraphQL call (default 20)
- GRAPHQL_PAGE_SIZE = 30 matches the REST per_page default, keeping GraphQL point cost low while covering the common case of few new stars per window
- Users with hasNextPage: true and items still within the since window are queued for a REST fallback (upsert_all idempotency makes re-fetch safe)
- Uses Net::HTTP directly – no new gems required
- A companion rake task `star_events:fetch_graphql` allows side-by-side timing comparison with the existing `star_events:fetch` (REST) task

Also fix a pre-existing test failure: Rails merges routes.default_url_options (which includes Puma's default port 3000) into mailer URL generation. Adding `port: nil` to `action_mailer.default_url_options` in the test env explicitly suppresses the port from generated URLs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
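
To make the field-alias batching concrete, here is one way such a query could be assembled. The helper names `build_starred_query` and `batch_count`, the alias scheme (`u0`, `u1`, ...), and the selected fields are assumptions; the PR's actual query is not shown on this page.

```ruby
require "json"

GRAPHQL_PAGE_SIZE = 30 # value stated in the commit message

# Build one GraphQL query covering several users, one aliased field each.
# Helper name, alias scheme, and field selection are illustrative.
def build_starred_query(logins, page_size: GRAPHQL_PAGE_SIZE)
  fields = logins.each_with_index.map do |login, index|
    <<~GRAPHQL.chomp
      u#{index}: user(login: #{login.to_json}) {
        starredRepositories(first: #{page_size}, orderBy: {field: STARRED_AT, direction: DESC}) {
          pageInfo { hasNextPage }
          edges {
            starredAt
            node { nameWithOwner isPrivate stargazerCount }
          }
        }
      }
    GRAPHQL
  end
  "query {\n#{fields.join("\n")}\n}"
end

# Number of HTTP requests needed for a given number of logins.
def batch_count(total_logins, batch_size)
  (total_logins.to_f / batch_size).ceil
end

query = build_starred_query(%w[alice bob])
requests = batch_count(150, 20) # 150 logins in batches of 20
```

With 150 logins and a batch size of 20, this comes to 8 requests instead of 150, which is the round-trip reduction the commit describes.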

Implement option B: each app-user's own GitHub token is used to fetch their followings' starred events, multiplying the effective rate limit by the number of app-users. A new rake task `star_events:fetch_per_user` drives this path.

Design:

- One thread per app-user (bounded by FETCH_CONCURRENCY) in the outer pool
- Within each user thread, logins are processed sequentially via the new `StarEvent.fetch_and_upsert_per_user` method, avoiding nested thread pools
- upsert_all idempotency handles the overlap where multiple users follow the same person (each user fetches them with their own token)

Private repository filtering (applies to all fetch paths):

- REST path (fetch_each_page): checks repo[:private] and skips via `next`
- GraphQL path (parse_starred_edges): adds `isPrivate` to the query and skips private nodes before building star_events / repos hashes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fetch_and_upsert_per_user now delegates to fetch_and_upsert_graphql, reducing HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE while still using each app-user's own token for their independent rate-limit budget. fetch_and_upsert_graphql gains an optional fallback_client: parameter so the REST pagination fallback uses the same token context as the GraphQL phase instead of falling back to the shared Settings.github_client. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ching" This reverts commit b3c0cb4.

Three targeted improvements to address R15 (memory quota exceeded):

1. Reuse a single persistent HTTPS connection in fetch_and_upsert_graphql

   Previously execute_graphql created a new Net::HTTP object per batch, causing repeated TLS handshakes and OpenSSL context allocations that accumulated across N_logins/GRAPHQL_BATCH_SIZE batches before GC could reclaim them. Net::HTTP.start now opens one connection for all batches, reducing connection objects from O(N_batches) to O(1).

2. Slice users in fetch_per_user rake task with GC.compact between slices

   find_in_batches(batch_size: FETCH_CONCURRENCY) replaces find_each with a growing futures array. Each slice of FETCH_CONCURRENCY users is fully processed and GC.compact is called before the next slice loads, preventing all users' logins arrays and Sawyer response objects from coexisting in memory simultaneously.

3. Periodic GC.compact in fetch_and_upsert_per_user (REST path)

   Compact the heap every FETCH_CONCURRENCY iterations to release Sawyer / Faraday response objects that accumulate in a tight sequential loop before the GC gets a chance to collect them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
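
Point 1 can be sketched with the block form of Net::HTTP.start, which keeps one connection open for every request made inside the block. The helper names `build_graphql_request` and `execute_batches` are hypothetical, and error handling is omitted.

```ruby
require "net/http"
require "json"
require "uri"

GRAPHQL_URI = URI("https://api.github.com/graphql")

# Build a POST request for one GraphQL query (hypothetical helper).
def build_graphql_request(query, token)
  request = Net::HTTP::Post.new(GRAPHQL_URI)
  request["Authorization"] = "Bearer #{token}"
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(query: query)
  request
end

# Send every batch over a single HTTPS connection: one TLS handshake,
# one Net::HTTP object, regardless of how many batches there are.
def execute_batches(queries, token)
  Net::HTTP.start(GRAPHQL_URI.host, GRAPHQL_URI.port, use_ssl: true) do |http|
    queries.map do |query|
      response = http.request(build_graphql_request(query, token))
      JSON.parse(response.body)
    end
  end
end

# Building a request does not touch the network.
sample_request = build_graphql_request("query { viewer { login } }", "example-token")
```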

tricknotes · 2026-04-02T11:11:00Z

It's production ready 🎉

heroku run "rails star_events:fetch_per_user[24] FETCH_CONCURRENCY=20" -a
2.98s user 1.20s system 0% cpu 1:12:56.84 total

tricknotes marked this pull request as draft March 19, 2026 18:23

tricknotes and others added 7 commits March 21, 2026 17:56

Add ApplicationRecord base class

956a7b3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix watchers_count alias in Repository model

ef49d07

alias_method doesn't work with ActiveRecord attribute methods; use a regular method definition instead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add ostruct gem and revert to plain OpenStruct

2e03a34

Ruby 3.4+ requires explicit ostruct gem as it was removed from the default gems. Revert ::OpenStruct to OpenStruct now that the gem is properly declared. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix to use OStruct

2022b6e

$ bin/rails db:migrate

b63b05b

tricknotes force-pushed the claude/xenodochial-engelbart branch from 9d5685a to b63b05b Compare March 21, 2026 08:58

tricknotes and others added 2 commits March 23, 2026 12:27

Fix remaining view errors after MongoDB to PostgreSQL migration

301e15d

- Restore Hash fallback in avatar_image_tag (lost during rebase merge)
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tricknotes temporarily deployed to starseeker-dev March 23, 2026 04:32 Inactive

tricknotes temporarily deployed to starseeker-dev March 23, 2026 08:31 Inactive

tricknotes temporarily deployed to starseeker-dev March 24, 2026 10:27 Inactive

tricknotes commented Mar 24, 2026

app/helpers/application_helper.rb (outdated)

Claude AI mentioned this pull request Mar 24, 2026

Simplify helper methods by using Repository::Owner consistently #97

Merged

tricknotes force-pushed the claude/xenodochial-engelbart branch from e90d9b3 to 0d916ef Compare March 24, 2026 15:11

tricknotes changed the title ~~Replace MongoDB-based StarEvent with GitHub API fetch + PostgreSQL storage~~ Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch Mar 24, 2026

tricknotes and others added 11 commits March 25, 2026 01:10

$ bin/rails db:migrate

5c0d9e2

Simplify helper methods by using Repository::Owner consistently

397ae65

Co-authored-by: tricknotes <290782+tricknotes@users.noreply.github.com>
Agent-Logs-Url: https://github.com/tricknotes/starseeker/sessions/88dafe1b-34ea-48bd-b7f5-bf70463355de

Merge add_repo_owner migration into create_star_events

e1deb95

Include repo_owner column and its index directly in the initial create_star_events migration, removing the separate add_repo_owner_to_star_events migration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update README to use star_events:fetch instead of seeds_stub_event

cc38a06

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

tricknotes temporarily deployed to starseeker March 25, 2026 17:24 Inactive

tricknotes temporarily deployed to starseeker March 25, 2026 17:40 Inactive

tricknotes temporarily deployed to starseeker-dev March 26, 2026 08:46 Inactive

tricknotes temporarily deployed to starseeker March 31, 2026 03:22 Inactive

Fetch by chunk

b1cb406

tricknotes force-pushed the claude/xenodochial-engelbart branch from 5c4a542 to b1cb406 Compare March 31, 2026 03:41

tricknotes temporarily deployed to starseeker-dev March 31, 2026 04:08 Inactive

tricknotes temporarily deployed to starseeker-dev March 31, 2026 04:39 Inactive

tricknotes and others added 2 commits March 31, 2026 13:44

Extend DB pool size for concurrency

7b6f62f

tricknotes force-pushed the claude/xenodochial-engelbart branch from c21196f to 7b6f62f Compare March 31, 2026 04:44

tricknotes temporarily deployed to starseeker-dev March 31, 2026 04:45 Inactive

tricknotes temporarily deployed to starseeker March 31, 2026 04:49 Inactive

tricknotes and others added 2 commits March 31, 2026 14:01

tricknotes temporarily deployed to starseeker-dev March 31, 2026 05:04 Inactive

Compact StarEvent#actor_avatar_url

979bd32

tricknotes deployed to starseeker-dev March 31, 2026 05:12 View deployment

tricknotes temporarily deployed to starseeker March 31, 2026 05:16 Inactive

tricknotes and others added 2 commits April 1, 2026 12:08

tricknotes temporarily deployed to starseeker April 2, 2026 05:23 Inactive

tricknotes temporarily deployed to starseeker April 2, 2026 06:18 Inactive

Revert "Speed up fetch_per_user by switching from REST to GraphQL bat…

e8a5043

…ching" This reverts commit b3c0cb4.

tricknotes temporarily deployed to starseeker April 2, 2026 06:21 Inactive

tricknotes deployed to starseeker April 2, 2026 07:47 View deployment

tricknotes wants to merge 45 commits into main from claude/xenodochial-engelbart

2 participants
