Parse blog feeds, generate post ideas with LLM (OpenAI + LangFuse)

OutRizz/content-engine

Content Engine

A script that:

  1. Reads a list of sources from a file (one per line, in the form type url, where type is rss or html)
  2. Parses the latest N items: RSS/Atom with feedparser, HTML blog listing pages with BeautifulSoup + readability
  3. Calls an LLM to generate 10 post ideas from the parsed content (structured: source links, source insight, post idea, description, format, how to use)

Uses OpenAI for LLM calls and LangFuse for tracing (via langfuse.openai).
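The "latest N items" step can be pictured with a small sketch. It is illustrative only: the FeedEntry fields and the latest_entries helper below are assumptions made for this example, not the definitions in models.py or fetcher.py, and the sketch assumes every published value is timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FeedEntry:
    # Hypothetical fields; the real FeedEntry in models.py may differ.
    title: str
    link: str
    published: Optional[datetime] = None  # assumed timezone-aware when set


def latest_entries(groups: list[list[FeedEntry]], top_n: int = 10) -> list[FeedEntry]:
    """Merge per-source entry lists and keep the top_n most recent entries."""
    merged = [entry for group in groups for entry in group]
    # Entries without a date sort last instead of raising on comparison.
    oldest = datetime.min.replace(tzinfo=timezone.utc)
    merged.sort(key=lambda e: e.published or oldest, reverse=True)
    return merged[:top_n]
```

Sorting undated entries last is one possible policy; dropping them entirely would be an equally reasonable choice.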

Setup

# Use uv (recommended)
uv sync

# Or pip
pip install -e .

Copy .env.example to .env and set:

  • OPENAI_API_KEY — required
  • LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY — optional; if set, traces are sent to LangFuse

Usage

  1. Put sources in urls.txt. Format: type url, separated by a tab or spaces, one source per line.
    • rss — RSS/Atom feed URL (e.g. rss https://hnrss.org/frontpage).
    • html — blog listing page URL; the script will find article links on the page and fetch each article (e.g. html https://www.forrester.com/blogs/).
  2. Run:
python run.py
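
The urls.txt format from step 1 could be read roughly as follows. This is a sketch under stated assumptions: parse_sources and the Source fields are hypothetical names for illustration, and the real load_sources() in sources.py may handle errors, blank lines, or extra fields differently.

```python
from dataclasses import dataclass

# Source types named in this README.
SOURCE_TYPES = ("rss", "html")


@dataclass
class Source:
    # Hypothetical shape; the real Source in models.py may differ.
    type: str
    url: str


def parse_sources(text: str) -> list[Source]:
    """Parse 'type url' lines separated by a tab or spaces, one source per line."""
    sources = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2 or parts[0] not in SOURCE_TYPES:
            raise ValueError(f"line {lineno}: expected '<rss|html> <url>', got {raw!r}")
        sources.append(Source(type=parts[0], url=parts[1].strip()))
    return sources
```

Failing loudly on a malformed line makes typos in urls.txt visible before any network request is made.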

Options (env vars):

  • URLS_FILE — path to URLs file (default: urls.txt)
  • TOP_N — max number of latest entries to use (default: 10)
  • OPENAI_MODEL — model name (default: gpt-4o)
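
As an illustration of how these options and their defaults fit together, here is a minimal sketch. The read_options helper and its dictionary keys are hypothetical; the project may read these variables elsewhere, for example in config.py or main.py.

```python
import os
from typing import Mapping


def read_options(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the documented env-var options, falling back to the README defaults."""
    return {
        "urls_file": env.get("URLS_FILE", "urls.txt"),
        "top_n": int(env.get("TOP_N", "10")),  # raises ValueError if not an integer
        "model": env.get("OPENAI_MODEL", "gpt-4o"),
    }
```

A one-off override from the shell would then look like TOP_N=5 python run.py.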

Output: 10 post ideas (with source links, insights, format) printed to stdout; LLM calls are logged to LangFuse when configured.

Adding new sources

When adding new URLs (especially html sources), follow docs/ADDING_SOURCES.md so that:

  • You verify how the site is parsed and which links are collected.
  • You add exclusions in parsers/html.py for category/section/landing URLs if the parser picks them up by mistake.
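
To give a feel for what such an exclusion might look like, here is a hedged sketch. The prefix list and the is_article_link function are invented for this example and are not the actual contents of parsers/html.py; docs/ADDING_SOURCES.md remains the authoritative guide.

```python
from urllib.parse import urlparse

# Hypothetical exclusion list; real rules depend on the site being added.
EXCLUDED_PATH_PREFIXES = ("/category/", "/tag/", "/author/")


def is_article_link(url: str, listing_url: str) -> bool:
    """Return True if url looks like an article rather than a section or landing page."""
    link = urlparse(url)
    listing = urlparse(listing_url)
    if link.netloc != listing.netloc:
        return False  # assumption: only same-host links are collected
    if link.path.rstrip("/") == listing.path.rstrip("/"):
        return False  # the listing page itself
    return not any(link.path.startswith(prefix) for prefix in EXCLUDED_PATH_PREFIXES)
```

When a new html source returns category or landing pages, adding a prefix of this kind and re-running the parser is the sort of check the guide asks for.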

Project structure

content-engine/
├── run.py              # Entry point (loads .env, calls main)
├── main.py             # Pipeline: load sources → fetch → LLM → output
├── config.py           # Constants (DEFAULT_*, USER_AGENT, SOURCE_TYPES)
├── models.py           # FeedEntry, Source dataclasses
├── sources.py          # load_sources() from urls file
├── fetcher.py          # fetch_entries() — RSS + HTML, merge by date
├── parsers/
│   ├── __init__.py
│   ├── rss.py          # fetch_entries_rss()
│   └── html.py         # fetch_entries_html()
├── prompt_loader.py    # load_prompt(name, **variables)
├── llm.py              # generate_post_ideas()
├── prompts/            # Prompt templates ({{placeholder}})
│   ├── post_ideas_system.txt
│   └── post_ideas_user.txt     # {{count}}, {{content}}, {{sources}}
├── urls.txt
└── ...

Run from the project root so that imports resolve (python run.py).
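
The {{placeholder}} convention used by the templates in prompts/ can be sketched as below. This is an assumption about how substitution might work, not the code in prompt_loader.py: render_prompt is a hypothetical name, and the real load_prompt(name, **variables) also reads the template from disk.

```python
import re

# Matches {{name}}, allowing optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **variables: object) -> str:
    """Replace each {{name}} in template with the matching keyword argument."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing prompt variable: {name}")
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, template)
```

For example, render_prompt("Generate {{count}} ideas.", count=10) fills in the count, while a template that mentions an unsupplied variable raises KeyError instead of sending a half-filled prompt to the model.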
