Skip to content

Fix HTML entities appearing in copied output for story titles#232

Open
simonw wants to merge 2 commits intomainfrom
claude/fix-clipboard-url-encoding-xj1eY
Open

Fix HTML entities appearing in copied output for story titles#232
simonw wants to merge 2 commits intomainfrom
claude/fix-clipboard-url-encoding-xj1eY

Conversation

@simonw
Copy link
Owner

@simonw simonw commented Feb 18, 2026

story_title from the Algolia API can contain HTML entities (',
&, etc.) which were passed through to the output without decoding.
comment_text was already decoded via htmlToText but story_title was not.

https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3

story_title from the Algolia API can contain HTML entities (',
&, etc.) which were passed through to the output without decoding.
comment_text was already decoded via htmlToText but story_title was not.

https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3
@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @simonw's task —— View job


PR Review Complete ✅

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Examine the changed file and understand the fix
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Evaluate test coverage needs
  • Provide comprehensive review feedback

Review Feedback

This is a well-executed bug fix that addresses a legitimate issue with HTML entity encoding in story titles. Here's my detailed analysis:

Code Quality & Best Practices

  • Consistency: The fix maintains consistency with how comment_text is already handled on line 69, applying the same htmlToText() decoding to story_title
  • Clean implementation: The change is minimal and focused, following the principle of least modification
  • Readable: The code remains clear and maintainable

Bug Fix Analysis

  • Root cause correctly identified: Story titles from the Algolia API can contain HTML entities (', &, etc.) that were being passed through without decoding
  • Proper solution: Using the existing htmlToText() function (lines 47-52) which creates a DOM element to naturally decode HTML entities
  • Edge case handling: The fallback logic || '(no title)' is preserved correctly

Performance Considerations

  • Minimal impact: The htmlToText() function is lightweight (creates a single DOM element per call)
  • No unnecessary overhead: Only applied when needed and reuses existing function
  • Acceptable for typical use case: With up to 1,000 comments max, the performance impact is negligible

Security Assessment

  • Safe DOM manipulation: The htmlToText() function safely creates a temporary DOM element without executing scripts
  • No XSS risk: Content is being decoded for text output, not rendered as HTML
  • Input sanitization: The function properly extracts text content using textContent || innerText

⚠️ Test Coverage Considerations

  • Missing tests: This appears to be a standalone HTML tool without visible test infrastructure
  • Manual verification recommended: Test with story titles containing common HTML entities:
    • ' (apostrophe)
    • & (ampersand)
    • < and > (angle brackets)
    • " (quotes)

📝 Suggestions

  1. Consider adding a test case if you plan to expand this tool - create a simple test that verifies HTML entities in story titles are properly decoded
  2. Documentation: The commit message excellently documents the issue and solution

Overall Assessment: APPROVED

This is a solid, targeted fix that:

  • Solves the reported problem correctly
  • Maintains code consistency
  • Has minimal performance impact
  • Introduces no security risks
  • Follows good coding practices

The change is ready to merge. Great work on identifying and fixing this encoding issue!


URLs in HN comment text (from Algolia API) contain percent-encoded
characters like %C4%81 (ā) and %3C (<). The htmlToText function
decoded HTML entities but left percent-encoding intact, so copied
output contained raw sequences like K%C4%81k%C4%81p%C5%8D instead
of Kākāpō.

Add a post-processing step that decodes consecutive %XX sequences
via decodeURIComponent, with a try/catch fallback for malformed
sequences.

https://claude.ai/code/session_01M9HD38yY9BBnzPzdKuGDR3
@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Feb 18, 2026

Deploying tools with  Cloudflare Pages  Cloudflare Pages

Latest commit: 094ae47
Status: ✅  Deploy successful!
Preview URL: https://ff7a703e.tools-b1q.pages.dev
Branch Preview URL: https://claude-fix-clipboard-url-enc.tools-b1q.pages.dev

View logs

@claude
Copy link
Contributor

claude bot commented Feb 18, 2026

Claude finished @simonw's task —— View job


PR Review Complete ✅

Todo List:

  • Read repository guidelines (CLAUDE.md) - No repo guidelines found
  • Examine the changed file and understand the fix
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Review security concerns
  • Evaluate test coverage needs
  • Provide comprehensive review feedback

Review Feedback

This PR addresses HTML entity encoding issues in story titles from the Algolia API. Here's my detailed analysis:

Code Quality & Best Practices

  • Consistent approach: The fix maintains consistency with existing htmlToText() usage for comment_text on line 72
  • Single responsibility: The htmlToText() function now handles both HTML entity decoding AND URL percent-decoding in one place
  • Clean implementation: Minimal, focused change that solves the problem without over-engineering

Bug Fix Analysis

  • Root cause correctly identified: Story titles from Algolia API contained HTML entities (', &amp;, etc.) that weren't being decoded
  • Comprehensive solution: The enhanced htmlToText() function now handles:
    • HTML entities via DOM parsing (existing functionality)
    • URL percent-encoding via decodeURIComponent() (new functionality in lines 52-54)
  • Proper fallback: The try/catch around decodeURIComponent() prevents crashes on malformed sequences

Performance Considerations

  • Minimal overhead: Added regex matching and decoding only when percent-encoded sequences are present
  • Efficient regex: Pattern /(%[0-9A-Fa-f]{2})+/g correctly matches percent-encoded sequences
  • Acceptable for use case: With up to 1,000 comments max, the performance impact is negligible

Security Assessment

  • Safe DOM manipulation: Uses temporary DOM element creation without script execution risk
  • Input sanitization: Properly extracts text content using textContent || innerText
  • Error handling: The try/catch prevents crashes from malformed percent-encoded sequences
  • No XSS risk: Content is decoded for text output, not rendered as HTML

⚠️ Potential Edge Cases

  • Mixed encoding: Handles cases where content might have both HTML entities AND percent-encoding
  • Malformed sequences: The fallback logic safely handles invalid percent-encoded sequences
  • Empty inputs: Proper null/empty string handling with early return

📝 Test Coverage Considerations

This appears to be a standalone HTML tool without visible test infrastructure. Consider manual testing with:

  • Story titles containing ' (apostrophes)
  • Story titles containing &amp; (ampersands)
  • Story titles with percent-encoded sequences like %20, %21
  • Mixed HTML entities and percent-encoding
  • Malformed percent sequences (should gracefully fallback)

Overall Assessment: APPROVED

This is an excellent, well-engineered fix that:

  • Solves the reported HTML entity issue completely
  • Adds bonus URL percent-decoding functionality
  • Maintains backward compatibility
  • Follows existing code patterns
  • Has proper error handling
  • Introduces no security risks

The two-commit approach (first HTML entities, then percent-encoding) shows good incremental development. Ready to merge!

Files Changed:

  • hn-comments-for-user.html:47-55 - Enhanced htmlToText() function with dual decoding capability

View job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants