Migrate StarEvent from MongoDB to PostgreSQL with periodic GitHub API fetch #96
Draft
tricknotes wants to merge 45 commits into main
…orage
GitHub API changes made WatchEvent data unavailable. Instead of relying on
stored events, StarEvent now fetches starred repos from GitHub API
(/users/{login}/starred) on demand and persists them in PostgreSQL for
caching and archival purposes.
- Rewrite StarEvent and Repository from Mongoid to ActiveRecord
- Add fetch_and_upsert class method to StarEvent for GitHub API integration
- Store repository metadata in a separate repositories table
- Update controllers to fetch on demand before reading from DB
- Remove mongoid gem, config/mongoid.yml, and MongoDB from CI/Docker
- Remove fetch_repositories.rake (no longer needed)
- Update views, helpers, rake tasks, and test support
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
alias_method doesn't work with ActiveRecord attribute methods; use a regular method definition instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use ::OpenStruct to avoid NameError in Repository model
- Handle Time objects from Octokit in fetch_starred_since
- Add webmock and stub StarEvent.fetch_and_upsert in tests to prevent unintended GitHub API calls during test suite
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ruby 3.4+ requires explicit ostruct gem as it was removed from the default gems. Revert ::OpenStruct to OpenStruct now that the gem is properly declared.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
9d5685a to b63b05b
- Update avatar_image_tag/image_link_to_github_url to handle OpenStruct (repo.owner) using respond_to?(:login) duck typing
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Restore Hash fallback in avatar_image_tag (lost during rebase merge)
- Update notify.text.erb to use event.actor_login instead of event['actor']['login']
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tricknotes commented Mar 24, 2026
@tricknotes I've opened a new pull request, #97, to work on those changes. Once the pull request is ready, I'll request review from you.
e90d9b3 to 0d916ef
- Remove StarEvent.fetch_and_upsert calls from controllers (activities, dashboard, stars); controllers now read from DB only
- Remove User#fetch_star_events (no longer needed)
- Add lib/tasks/fetch_star_events.rake for periodic background fetch
- Remove fetch_and_upsert stub from rails_helper (no longer needed)
- Skip Settings.url_options in test env to avoid BASE_URL port leaking into action_mailer.default_url_options
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The repo_* keys in the intermediate hash were not DB columns but appeared alongside DB columns, causing confusion. Refactor so that fetch_starred_since returns two distinct collections: star_events (only the star_events table fields) and repos (repository fields). upsert_repositories now receives repo data directly without needing the repo_* prefixed keys.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
all_by and by were identical: both delegate to where(actor_login:), which accepts a single value or an array. Remove the by class method, rename all_by scope to by, and update the one call site in User.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace LIKE-based owner scope with an equality check on the new repo_owner column (indexed). Populate repo_owner from repo.owner.login during fetch, and derive it from repo_name in stub_star_event! for tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Data.define provides an immutable value object with an explicit interface, no method_missing overhead, and errors on unknown attributes. Also removes the ostruct gem dependency since Data is built into Ruby 3.2+.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
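A minimal sketch of the behaviour this commit message describes, assuming Ruby 3.2 or later. `RepoOwner` and its members are hypothetical stand-ins; the value object actually defined in this PR is not shown on this page.

```ruby
# Hypothetical value object; the PR's real class name and members may differ.
RepoOwner = Data.define(:login, :avatar_url)

owner = RepoOwner.new(login: "tricknotes", avatar_url: "https://example.com/avatar.png")
owner.login   # readers exist only for the declared members
owner.frozen? # instances are immutable

# Unknown attributes fail at construction time instead of being silently accepted,
# which is what OpenStruct would have done.
unknown_attribute_error = nil
begin
  RepoOwner.new(login: "tricknotes", avatar_url: nil, blog: "https://example.com")
rescue ArgumentError => e
  unknown_attribute_error = e.message
end

# `with` returns a modified copy rather than mutating in place.
renamed = owner.with(login: "octocat")
```

Because no setters are generated, view helpers that only read `login` keep working, while accidental writes surface as errors.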
- by_name: was used in the old Mongoid-based repository! lookup on StarEvent, which was removed during the PostgreSQL migration
- watchers_count: compatibility alias left over from when repositories were fetched directly from GitHub API responses; views access stargazers_count directly on the ActiveRecord model
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: tricknotes <290782+tricknotes@users.noreply.github.com>
Agent-Logs-Url: https://github.com/tricknotes/starseeker/sessions/88dafe1b-34ea-48bd-b7f5-bf70463355de
Include repo_owner column and its index directly in the initial create_star_events migration, removing the separate add_repo_owner_to_star_events migration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Controllers no longer call StarEvent.fetch_and_upsert directly; GitHub API access is only triggered via the rake task, which is never executed during the test suite. There are no stub_request usages, so webmock provides no value.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously fetch_starred_since accumulated all pages in arrays before upserting, causing large memory spikes when users had many starred repos.
- Replace fetch_starred_since with fetch_each_page (block/yield style) so each page's data is upserted immediately and GC'd
- Change User.all.to_a to User.find_each in rake task to avoid loading all user records into memory at once
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
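The block/yield approach described in this commit can be sketched roughly as follows. This is not the PR's implementation: `fetch_page` is a hypothetical callable standing in for the GitHub API client, and the fake data source exists only to make the sketch runnable.

```ruby
# Yields one page at a time so the caller can persist it and let it be GC'd,
# instead of accumulating every page in an array first.
# `fetch_page` (hypothetical) takes a 1-based page number and returns an array,
# which is empty once the pages are exhausted.
def fetch_each_page(fetch_page)
  page = 1
  loop do
    items = fetch_page.call(page)
    break if items.empty?

    yield items
    page += 1
  end
end

# Fake data source: 70 starred repos served 30 per page.
ALL_STARS = (1..70).map { |i| { repo_name: "owner/repo#{i}" } }.freeze
fake_api = ->(page) { ALL_STARS.each_slice(30).to_a.fetch(page - 1, []) }

page_sizes = []
fetch_each_page(fake_api) do |items|
  # In the PR this is where the page would be upserted; here we only record its size.
  page_sizes << items.size
end
```

Only one page is referenced at any moment, so peak memory depends on the page size rather than on how many repositories a user has starred.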
5c4a542 to b1cb406
Per-page upsert (fetch_each_page) caps each thread's memory to one page (~30 items), so FETCH_CONCURRENCY=5 keeps peak memory at ~150 API response objects, well within Heroku eco dyno limits.
- fetch_and_upsert: parallelize fetch_each_page across logins using Concurrent::FixedThreadPool; futures are awaited and rejections logged
- rake task: parallelize user.followings calls using the same pool size; logins are merged under a Mutex; remove each_slice (no longer needed)
- Thread pool is always shut down via ensure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
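A rough sketch of the bounded-concurrency pattern described above. To stay dependency-free it uses plain `Thread` and `Queue` from the standard library rather than `Concurrent::FixedThreadPool`, so it illustrates the idea and is not the PR's code. `each_in_pool` and `followings_of` are hypothetical names.

```ruby
FETCH_CONCURRENCY = 5

# Runs `work` for every item using at most `concurrency` threads.
# Failures are collected and returned rather than raised, similar in spirit
# to awaiting futures and logging the rejected ones.
def each_in_pool(items, concurrency: FETCH_CONCURRENCY, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # pop returns nil once the closed queue is drained

  errors = Queue.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (item = queue.pop)
        begin
          work.call(item)
        rescue StandardError => e
          errors << [item, e]
        end
      end
    end
  end
  workers.each(&:join)
  Array.new(errors.size) { errors.pop }
end

# Hypothetical stand-in for user.followings: three logins per user, one shared.
followings_of = ->(user) { ["#{user}-a", "#{user}-b", "shared"] }

logins = []
lock = Mutex.new
failures = each_in_pool(%w[u1 u2 u3 u4 u5 u6 u7]) do |user|
  raise "boom" if user == "u4" # simulate one failing API call

  found = followings_of.call(user)
  lock.synchronize { logins.concat(found) } # merge results under a Mutex
end
logins.uniq!
```

The Mutex matters because `logins` is shared across worker threads; without it, concurrent appends are not guaranteed to be safe on every Ruby implementation.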
c21196f to 7b6f62f
Previously all logins were mapped to futures at once, keeping all Future objects and closures in memory until the entire batch completed.
- fetch_and_upsert: process logins in slices of FETCH_CONCURRENCY; each slice's futures are awaited before the next is started, allowing completed futures and their API response data to be GC'd promptly; GC.compact is called between slices to compact the heap
- rake task: call GC.compact after the followings phase completes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
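The slice-then-compact loop might look roughly like this. Plain threads stand in for the futures mentioned in the commit, and `fetch_in_slices` is a hypothetical name, so treat it as a sketch of the control flow only.

```ruby
FETCH_CONCURRENCY = 5

# Processes logins in slices; a slice must finish completely before the next
# one starts, so at most `concurrency` units of work are alive at once.
def fetch_in_slices(logins, concurrency: FETCH_CONCURRENCY)
  done = 0
  logins.each_slice(concurrency) do |slice|
    threads = slice.map { |login| Thread.new { yield login } }
    threads.each(&:join) # wait for the whole slice before continuing
    done += slice.size

    # Heap compaction is not available on every platform, so guard the call.
    GC.compact if GC.respond_to?(:compact)
  end
  done
end

seen = Queue.new
total = fetch_in_slices(%w[a b c d e f g h i j k l]) { |login| seen << login }
```

The trade-off is that one slow login delays its whole slice, in exchange for a bound on how many response objects can be live at the same time.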
Use Concurrent::AtomicFixnum for thread-safe incrementing so each thread can log its position without data races.
[fetch_and_upsert] (3/150) fetching @login
[fetch_and_upsert] (3/150) @login done (1.23s)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
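To illustrate why the counter has to be thread-safe, here is a small sketch that uses a Mutex-guarded integer as a standard-library stand-in for `Concurrent::AtomicFixnum`. `ProgressCounter` is a hypothetical helper, not part of the PR.

```ruby
# Stand-in for an atomic counter: the increment and the read happen inside one
# critical section, so no two threads can claim the same position.
class ProgressCounter
  def initialize(total)
    @total = total
    @value = 0
    @lock = Mutex.new
  end

  # Atomically claims the next position and returns a "(n/total)" label.
  def next_label
    position = @lock.synchronize { @value += 1 }
    "(#{position}/#{@total})"
  end
end

logins = (1..150).map { |i| "user#{i}" }
counter = ProgressCounter.new(logins.size)
log = Queue.new

logins.each_slice(5) do |slice|
  slice.map do |login|
    Thread.new do
      label = counter.next_label
      log << "[fetch_and_upsert] #{label} fetching @#{login}"
    end
  end.each(&:join)
end
labels = Array.new(log.size) { log.pop }
```

Each position from 1 to 150 appears exactly once in the log, although not necessarily in order.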
Introduce `StarEvent.fetch_and_upsert_graphql` which issues a single HTTP request per GRAPHQL_BATCH_SIZE (default 20) users using GitHub's GraphQL API with field aliases. This reduces the number of HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE, making wall-clock time roughly proportional to the batch size ratio.
Key details:
- GRAPHQL_BATCH_SIZE (env) controls users per GraphQL call (default 20)
- GRAPHQL_PAGE_SIZE = 30 matches the REST per_page default, keeping GraphQL point cost low while covering the common case of few new stars per window
- Users with hasNextPage: true and items still within the since window are queued for a REST fallback (upsert_all idempotency makes re-fetch safe)
- Uses Net::HTTP directly; no new gems required
- A companion rake task `star_events:fetch_graphql` allows side-by-side timing comparison with the existing `star_events:fetch` (REST) task
Also fix a pre-existing test failure: Rails merges routes.default_url_options (which includes Puma's default port 3000) into mailer URL generation. Adding `port: nil` to `action_mailer.default_url_options` in the test env explicitly suppresses the port from generated URLs.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
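The field-alias batching can be sketched as a query builder. The query text used in the PR is not shown on this page, so the alias scheme (`u0`, `u1`, ...), the selected fields, and `build_starred_query` are assumptions. The batch size is 2 here only to keep the example small.

```ruby
require "json"

GRAPHQL_PAGE_SIZE = 30

# Builds one query that asks for several users' stars at once.
# Each user gets an alias (u0, u1, ...) so the `user` field can be repeated
# within a single request.
def build_starred_query(logins, page_size: GRAPHQL_PAGE_SIZE)
  fields = logins.each_with_index.map do |login, index|
    <<~GRAPHQL
      u#{index}: user(login: #{login.to_json}) {
        starredRepositories(first: #{page_size}, orderBy: {field: STARRED_AT, direction: DESC}) {
          pageInfo { hasNextPage }
          edges {
            starredAt
            node { nameWithOwner isPrivate stargazerCount }
          }
        }
      }
    GRAPHQL
  end
  "query {\n#{fields.join}}"
end

logins = %w[tricknotes octocat defunkt]
# One query string (and so one HTTP request) per batch of logins.
batches = logins.each_slice(2).map { |batch| build_starred_query(batch) }
```

Three logins with a batch size of 2 produce two requests instead of three; with the default of 20 the reduction is correspondingly larger.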
Implement option B: each app-user's own GitHub token is used to fetch their followings' starred events, multiplying the effective rate limit by the number of app-users. A new rake task `star_events:fetch_per_user` drives this path.
Design:
- One thread per app-user (bounded by FETCH_CONCURRENCY) in the outer pool
- Within each user thread, logins are processed sequentially via the new `StarEvent.fetch_and_upsert_per_user` method, avoiding nested thread pools
- upsert_all idempotency handles the overlap where multiple users follow the same person (each user fetches them with their own token)
Private repository filtering (applies to all fetch paths):
- REST path (fetch_each_page): checks repo[:private] and skips via `next`
- GraphQL path (parse_starred_edges): adds `isPrivate` to the query and skips private nodes before building star_events / repos hashes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
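The private-repository filter on the REST path can be illustrated with plain hashes. The input shape below imitates one page of starred items with a nested `:repo`, and both `split_page` and the keys of the `repos` rows are hypothetical; only the `repo[:private]` check and the `next` skip come from the commit message.

```ruby
# Splits one page into rows for the star_events table and rows for the
# repositories table, dropping private repositories entirely.
def split_page(actor_login, starred)
  star_events = []
  repos = []
  starred.each do |item|
    repo = item[:repo]
    next if repo[:private] # never persist private repositories

    star_events << {
      actor_login: actor_login,
      repo_name: repo[:full_name],
      repo_owner: repo[:owner][:login],
      starred_at: item[:starred_at]
    }
    # Hypothetical column names for the repositories table.
    repos << { name: repo[:full_name], stargazers_count: repo[:stargazers_count] }
  end
  [star_events, repos]
end

page = [
  { starred_at: "2026-03-20T10:00:00Z",
    repo: { full_name: "rails/rails", private: false, stargazers_count: 10,
            owner: { login: "rails" } } },
  { starred_at: "2026-03-21T10:00:00Z",
    repo: { full_name: "acme/secret", private: true, stargazers_count: 1,
            owner: { login: "acme" } } }
]
star_events, repos = split_page("tricknotes", page)
```

Filtering before either collection is built means a private repository cannot reach one table without the other.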
fetch_and_upsert_per_user now delegates to fetch_and_upsert_graphql, reducing HTTP round-trips from N_logins to N_logins / GRAPHQL_BATCH_SIZE while still using each app-user's own token for their independent rate-limit budget.
fetch_and_upsert_graphql gains an optional fallback_client: parameter so the REST pagination fallback uses the same token context as the GraphQL phase instead of falling back to the shared Settings.github_client.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ching" This reverts commit b3c0cb4.
Three targeted improvements to address R15 (memory quota exceeded):
1. Reuse a single persistent HTTPS connection in fetch_and_upsert_graphql
   Previously execute_graphql created a new Net::HTTP object per batch, causing repeated TLS handshakes and OpenSSL context allocations that accumulated across N_logins/GRAPHQL_BATCH_SIZE batches before GC could reclaim them. Net::HTTP.start now opens one connection for all batches, reducing connection objects from O(N_batches) to O(1).
2. Slice users in fetch_per_user rake task with GC.compact between slices
   find_in_batches(batch_size: FETCH_CONCURRENCY) replaces find_each with a growing futures array. Each slice of FETCH_CONCURRENCY users is fully processed and GC.compact is called before the next slice loads, preventing all users' logins arrays and Sawyer response objects from coexisting in memory simultaneously.
3. Periodic GC.compact in fetch_and_upsert_per_user (REST path)
   Compact the heap every FETCH_CONCURRENCY iterations to release Sawyer / Faraday response objects that accumulate in a tight sequential loop before the GC gets a chance to collect them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
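The first improvement, one connection shared by all batches, can be sketched with `Net::HTTP.start`. The method names `execute_batches` and `fetch_all` are hypothetical, and `fetch_all` is deliberately not called here because it needs network access and a real token.

```ruby
require "net/http"
require "json"
require "uri"

GRAPHQL_URI = URI("https://api.github.com/graphql")

# Sends every query over the connection it is given. `http` only needs to
# respond to #request, which keeps this part testable without a network.
def execute_batches(http, queries, token:)
  queries.map do |query|
    request = Net::HTTP::Post.new(GRAPHQL_URI.path)
    request["Authorization"] = "Bearer #{token}"
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(query: query)
    http.request(request)
  end
end

# Opens the HTTPS connection once and reuses it for all batches, so the TLS
# handshake happens a single time instead of once per batch.
def fetch_all(queries, token:)
  Net::HTTP.start(GRAPHQL_URI.host, GRAPHQL_URI.port, use_ssl: true) do |http|
    execute_batches(http, queries, token: token)
  end
end
```

The block form of `Net::HTTP.start` also closes the connection when the block exits, including when a batch raises.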
Owner, Author
It's production ready 🎉
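Background
GitHub API changes made WatchEvent data unavailable, so this PR switches to fetching star information from the /users/{login}/starred endpoint. The data is stored in PostgreSQL for caching and archival purposes.
Main changes
Data layer
- Migrate StarEvent / Repository from Mongoid to ActiveRecord (PostgreSQL)
- star_events table: actor_login, repo_name, repo_owner, starred_at, etc.
- repositories table: holds separately the metadata that is updated over time, such as stargazers_count
- Add a repo_owner column and drop the LIKE query in StarEvent.owner
Fetch strategy
- Periodically fetch every user's stars with the rake star_events:fetch task
Removed
- mongoid gem, config/mongoid.yml
- lib/tasks/fetch_repositories.rake
🤖 Generated with Claude Code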