Fix HTML entities appearing in copied output for story titles by simonw · Pull Request #232 · simonw/tools

simonw · 2026-02-18T07:23:24Z

story_title from the Algolia API can contain HTML entities (',
&, etc.) which were passed through to the output without decoding.
comment_text was already decoded via htmlToText but story_title was not.

https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3

story_title from the Algolia API can contain HTML entities (', &, etc.) which were passed through to the output without decoding. comment_text was already decoded via htmlToText but story_title was not. https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3

claude · 2026-02-18T07:23:39Z

Claude finished @simonw's task —— View job

PR Review Complete ✅

Todo List:

Read repository guidelines (CLAUDE.md)
Examine the changed file and understand the fix
Analyze code quality and best practices
Check for potential bugs or issues
Assess performance considerations
Review security concerns
Evaluate test coverage needs
Provide comprehensive review feedback

Review Feedback

This is a well-executed bug fix that addresses a legitimate issue with HTML entity encoding in story titles. Here's my detailed analysis:

✅ Code Quality & Best Practices

Consistency: The fix maintains consistency with how comment_text is already handled on line 69, applying the same htmlToText() decoding to story_title
Clean implementation: The change is minimal and focused, following the principle of least modification
Readable: The code remains clear and maintainable

✅ Bug Fix Analysis

Root cause correctly identified: Story titles from the Algolia API can contain HTML entities (', &, etc.) that were being passed through without decoding
Proper solution: Using the existing htmlToText() function (lines 47-52) which creates a DOM element to naturally decode HTML entities
Edge case handling: The fallback logic || '(no title)' is preserved correctly

✅ Performance Considerations

Minimal impact: The htmlToText() function is lightweight (creates a single DOM element per call)
No unnecessary overhead: Only applied when needed and reuses existing function
Acceptable for typical use case: With up to 1,000 comments max, the performance impact is negligible

✅ Security Assessment

Safe DOM manipulation: The htmlToText() function safely creates a temporary DOM element without executing scripts
No XSS risk: Content is being decoded for text output, not rendered as HTML
Input sanitization: The function properly extracts text content using textContent || innerText

⚠️ Test Coverage Considerations

Missing tests: This appears to be a standalone HTML tool without visible test infrastructure
Manual verification recommended: Test with story titles containing common HTML entities:
- ' (apostrophe)
- & (ampersand)
- < and > (angle brackets)
- " (quotes)

📝 Suggestions

Consider adding a test case if you plan to expand this tool - create a simple test that verifies HTML entities in story titles are properly decoded
Documentation: The commit message excellently documents the issue and solution

Overall Assessment: APPROVED ✅

This is a solid, targeted fix that:

Solves the reported problem correctly
Maintains code consistency
Has minimal performance impact
Introduces no security risks
Follows good coding practices

The change is ready to merge. Great work on identifying and fixing this encoding issue!

URLs in HN comment text (from Algolia API) contain percent-encoded characters like %C4%81 (ā) and %3C (<). The htmlToText function decoded HTML entities but left percent-encoding intact, so copied output contained raw sequences like K%C4%81k%C4%81p%C5%8D instead of Kākāpō. Add a post-processing step that decodes consecutive %XX sequences via decodeURIComponent, with a try/catch fallback for malformed sequences. https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3

cloudflare-workers-and-pages · 2026-02-18T07:34:23Z

Deploying tools with Cloudflare Pages

Latest commit:	`094ae47`
Status:	✅ Deploy successful!
Preview URL:	https://ff7a703e.tools-b1q.pages.dev
Branch Preview URL:	https://claude-fix-clipboard-url-enc.tools-b1q.pages.dev

View logs

claude · 2026-02-18T07:34:34Z

Claude finished @simonw's task —— View job

PR Review Complete ✅

Todo List:

Read repository guidelines (CLAUDE.md) - No repo guidelines found
Examine the changed file and understand the fix
Analyze code quality and best practices
Check for potential bugs or issues
Assess performance considerations
Review security concerns
Evaluate test coverage needs
Provide comprehensive review feedback

Review Feedback

This PR addresses HTML entity encoding issues in story titles from the Algolia API. Here's my detailed analysis:

✅ Code Quality & Best Practices

Consistent approach: The fix maintains consistency with existing htmlToText() usage for comment_text on line 72
Single responsibility: The htmlToText() function now handles both HTML entity decoding AND URL percent-decoding in one place
Clean implementation: Minimal, focused change that solves the problem without over-engineering

✅ Bug Fix Analysis

Root cause correctly identified: Story titles from Algolia API contained HTML entities (', &, etc.) that weren't being decoded
Comprehensive solution: The enhanced htmlToText() function now handles:
- HTML entities via DOM parsing (existing functionality)
- URL percent-encoding via decodeURIComponent() (new functionality in lines 52-54)
Proper fallback: The try/catch around decodeURIComponent() prevents crashes on malformed sequences

✅ Performance Considerations

Minimal overhead: Added regex matching and decoding only when percent-encoded sequences are present
Efficient regex: Pattern /(%[0-9A-Fa-f]{2})+/g correctly matches percent-encoded sequences
Acceptable for use case: With up to 1,000 comments max, the performance impact is negligible

✅ Security Assessment

Safe DOM manipulation: Uses temporary DOM element creation without script execution risk
Input sanitization: Properly extracts text content using textContent || innerText
Error handling: The try/catch prevents crashes from malformed percent-encoded sequences
No XSS risk: Content is decoded for text output, not rendered as HTML

⚠️ Potential Edge Cases

Mixed encoding: Handles cases where content might have both HTML entities AND percent-encoding
Malformed sequences: The fallback logic safely handles invalid percent-encoded sequences
Empty inputs: Proper null/empty string handling with early return

📝 Test Coverage Considerations

This appears to be a standalone HTML tool without visible test infrastructure. Consider manual testing with:

Story titles containing ' (apostrophes)
Story titles containing & (ampersands)
Story titles with percent-encoded sequences like %20, %21
Mixed HTML entities and percent-encoding
Malformed percent sequences (should gracefully fallback)

Overall Assessment: APPROVED ✅

This is an excellent, well-engineered fix that:

Solves the reported HTML entity issue completely
Adds bonus URL percent-decoding functionality
Maintains backward compatibility
Follows existing code patterns
Has proper error handling
Introduces no security risks

The two-commit approach (first HTML entities, then percent-encoding) shows good incremental development. Ready to merge!

Files Changed:

hn-comments-for-user.html:47-55 - Enhanced htmlToText() function with dual decoding capability

View job

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix HTML entities appearing in copied output for story titles#232

Fix HTML entities appearing in copied output for story titles#232
simonw wants to merge 2 commits intomainfrom
claude/fix-clipboard-url-encoding-xj1eY

simonw commented Feb 18, 2026

Uh oh!

claude bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

cloudflare-workers-and-pages bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 18, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

simonw commented Feb 18, 2026

Uh oh!

claude bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Feedback

✅ Code Quality & Best Practices

✅ Bug Fix Analysis

✅ Performance Considerations

✅ Security Assessment

⚠️ Test Coverage Considerations

📝 Suggestions

Overall Assessment: APPROVED ✅

Uh oh!

cloudflare-workers-and-pages bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deploying tools with Cloudflare Pages

Uh oh!

claude bot commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Complete ✅

Todo List:

Review Feedback

✅ Code Quality & Best Practices

✅ Bug Fix Analysis

✅ Performance Considerations

✅ Security Assessment

⚠️ Potential Edge Cases

📝 Test Coverage Considerations

Overall Assessment: APPROVED ✅

Files Changed:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Feb 18, 2026 •

edited

Loading

cloudflare-workers-and-pages bot commented Feb 18, 2026 •

edited

Loading

claude bot commented Feb 18, 2026 •

edited

Loading