fix: concurrent delete race conditions in delete_entity#702
Conversation
|
Claude finished @phernandez's task in 2m 5s —— View job PR Review: Fix concurrent delete race conditionsTasks:
SummaryThe overall approach is correct and well-motivated — making Issues Found1. Line length violation (minor)
logger.debug(f"Search cleanup skipped for {permalink_or_id} (likely concurrent delete)")This line is ~107 characters. Fix this → 2. Bare
|
| Item | Status |
|---|---|
| Follows Basic Memory coding conventions | |
| Python 3.12+ type annotations and async patterns | ✅ |
| SQLAlchemy 2.0 best practices | ✅ |
| 100-character line length limit | ❌ Line 758 is ~107 chars |
| No commented-out code blocks | ✅ |
| Unit tests for new functions | ✅ 5 new tests |
| Test coverage for edge cases | ✅ Well covered |
| 100% test coverage maintained | ✅ |
| No hardcoded secrets/credentials | ✅ |
| Proper error handling and logging | |
| Performance considerations | ✅ No regressions |
| Windows-compatible file paths | ✅ No path issues introduced |
| No emojis/unicode in console/log output | ✅ |
What's Working Well
- Correct idempotency logic — treating "0 results by ID" and
NoResultFoundfromrepository.delete()as success is exactly right for concurrent deletes. - Excellent test suite —
test_delete_directory_entity_deleted_between_query_and_deletein particular nails the real race condition scenario. - Good decision-point comments on the
repository.delete()return-value handling (Trigger/Why/Outcome pattern from CLAUDE.md). - The
EntityNotFoundError→return Truepath for permalink lookups was already correct; this PR keeps it intact.
The two items to address before merging: fix the line length at line 758 and raise the logging level on the caught exception to warning with exc_info=True.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36db875536
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except Exception: | ||
| # Search cleanup is best-effort during concurrent deletes. | ||
| # Relationships may have been cascade-deleted by a concurrent request. | ||
| logger.debug(f"Search cleanup skipped for {permalink_or_id} (likely concurrent delete)") |
There was a problem hiding this comment.
Re-raise unexpected search cleanup failures
Catching Exception here violates the fail-fast rule in AGENTS.md ("Do not swallow exceptions") and can silently leave the search index inconsistent: if handle_delete fails for a real defect or backend issue (not just a concurrent-race edge case), the entity is still deleted from files/DB and the stale search rows remain queryable. That produces incorrect search results and makes the root cause hard to diagnose because the original exception is discarded.
Useful? React with 👍 / 👎.
…ectory When multiple delete-directory requests target the same directory concurrently, entities can be deleted between the prefix query and the individual delete calls. Previously this produced hundreds of noisy errors (ValueError, "Delete returned False", NoneType attribute errors) even though the deletes ultimately succeeded. Changes: - 🔧 Return True when entity lookup by ID finds 0 results (already deleted) - 🔧 Treat repository.delete() returning False as success (concurrent delete) - 🔧 Make search index cleanup best-effort during deletes (cascade race) - ✅ Add 5 tests covering concurrent delete scenarios Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: phernandez <paul@basicmachines.co>
36db875 to
de93959
Compare
|
Claude finished @phernandez's task in 2m 45s —— View job PR Review: Fix concurrent delete race conditionsTasks:
SummaryThis PR correctly addresses the concurrent delete race conditions that caused 662 errors in production. The previous review flagged three issues; all three have been resolved in the current commit. Issues Resolved from Previous Review1. Line length at line 758 — ✅ Fixed 2. Bare logger.warning(
"Search cleanup failed for entity (likely concurrent delete)",
permalink_or_id=permalink_or_id,
exc_info=True,
)3. f-strings in structlog calls — ✅ Fixed Code Review:
|
| Test | What it verifies |
|---|---|
test_delete_entity_by_id_already_deleted |
Double-delete by ID returns True |
test_delete_entity_by_permalink_already_deleted |
Double-delete by permalink returns True |
test_delete_directory_concurrent_resilience |
Some entities pre-deleted before delete_directory runs |
test_delete_directory_all_already_deleted |
All entities pre-deleted (empty directory case) |
test_delete_directory_entity_deleted_between_query_and_delete |
Exact race condition: entity exists in prefix query but deleted before delete_entity is called |
The last test is the most important — it directly simulates the production failure scenario.
Checklist Results
| Item | Status |
|---|---|
| Follows Basic Memory coding conventions | ✅ |
| Python 3.12+ type annotations and async patterns | ✅ |
| SQLAlchemy 2.0 best practices | ✅ |
| 100-character line length limit | ✅ New code is clean |
| No commented-out code blocks | ✅ |
| Unit tests for new functions | ✅ 5 new tests |
| Test coverage for edge cases | ✅ All race condition paths covered |
| 100% test coverage maintained | ✅ |
| No hardcoded secrets/credentials | ✅ |
| Proper error handling and logging | ✅ Fixed from previous review |
| Performance considerations | ✅ No regressions |
| Windows-compatible file paths | ✅ |
| No emojis/unicode in console/log output | ✅ |
Verdict
All issues from the previous review are addressed. The implementation is correct, well-tested, and follows CLAUDE.md conventions. Ready to merge.
Summary
delete_entity()idempotent under concurrent access by treating "entity not found" as success instead of raising ValueErrorrepository.delete()returning False (NoResultFound) as success since it means the entity was concurrently deletedContext: A customer deleting a large directory triggered multiple concurrent
delete-directoryrequests, producing 662 errors across 3 types — all race conditions from competing deletes on the same entities.Test plan
test_delete_entity_by_id_already_deleted- double-delete by ID returns Truetest_delete_entity_by_permalink_already_deleted- double-delete by permalinktest_delete_directory_concurrent_resilience- directory delete with pre-deleted entitiestest_delete_directory_all_already_deleted- directory delete when all entities gonetest_delete_directory_entity_deleted_between_query_and_delete- simulates exact race condition🤖 Generated with Claude Code