
Add AI kernel generation endpoint #449

Open
msaroufim wants to merge 2 commits into main from ai-kernel-generation

Conversation

@msaroufim
Member

@msaroufim msaroufim commented Feb 26, 2026

A vibe-coded prototype to see if we can support prompt-to-kernel submissions directly in the service and pay for everyone's AI credits.

We'd still likely need to update the site to support this new API endpoint, though we probably don't need to change popcorn-cli.

Summary

  • Adds a POST /ai/{leaderboard}/{gpu}/{mode} API endpoint that accepts a natural language prompt, generates kernel code, and submits it through the existing evaluation pipeline
  • New src/libkernelbot/ai_generate.py module with generate_kernel() that builds a context-rich prompt from the leaderboard's description, templates, reference files, and test specs
  • Adds anthropic dependency and ANTHROPIC_API_KEY env var

Test plan

  • Verify uv run ruff check passes on changed files
  • Verify existing tests still pass (uv run pytest tests/ -v)
  • Manual test with local API server:
    curl -X POST "http://localhost:8000/ai/vectoradd_v2/H100/test" \
      -H "X-Popcorn-Cli-Id: test-cli-id-123" \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Write a simple vectoradd kernel using PyTorch"}'
  • Verify response includes generated_code and submission ID
  • Check submission status via GET /user/submissions/{id}

Allow users to submit a natural language prompt instead of code.
A new POST /ai/{leaderboard}/{gpu}/{mode} endpoint generates kernel
code via Claude API, then feeds it through the existing submission
pipeline for evaluation.
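The described flow can be sketched as follows. This is a simplified sketch, not the PR's actual handler: `ai_submit`, `generate_code`, and `submit_for_evaluation` are hypothetical stand-ins for the real endpoint handler, `generate_kernel()`, and the existing submission pipeline.

```python
import asyncio

# Sketch of the prompt-to-submission flow this PR describes.
# The two injected callables are hypothetical stand-ins: `generate_code`
# for generate_kernel() and `submit_for_evaluation` for the existing pipeline.
async def ai_submit(prompt, leaderboard, gpu, mode,
                    generate_code, submit_for_evaluation):
    code, file_name = await generate_code(prompt, leaderboard)
    sub_id = await submit_for_evaluation(code, file_name, leaderboard, gpu, mode)
    return {"generated_code": code, "submission_id": sub_id}

# Example with stub implementations:
async def _demo():
    async def fake_gen(prompt, lb):
        return "import torch  # generated kernel", "submission.py"

    async def fake_submit(code, fname, lb, gpu, mode):
        return 12345  # submission id assigned by the pipeline

    return await ai_submit("vectoradd", "vectoradd_v2", "H100", "test",
                           fake_gen, fake_submit)

result = asyncio.run(_demo())
```
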
Copilot AI review requested due to automatic review settings February 26, 2026 01:45
@github-actions

github-actions bot commented Feb 26, 2026

Coverage report

Files with coverage changes:
  src/libkernelbot/ai_generate.py — lines missing: 30-71
  src/libkernelbot/utils.py
  Project Total

This report was generated by python-coverage-comment-action

Contributor

Copilot AI left a comment


Pull request overview

This pull request adds AI-powered kernel code generation functionality to the KernelBot API. It integrates Claude AI (via the Anthropic API) to generate GPU kernel code from natural language prompts, then automatically submits the generated code through the existing evaluation pipeline.

Changes:

  • Added new POST /ai/{leaderboard}/{gpu}/{mode} endpoint that accepts natural language prompts and generates kernel code via Claude API
  • Created ai_generate.py module with generate_kernel() function that builds context-rich prompts from leaderboard metadata and generates code
  • Added anthropic Python dependency and ANTHROPIC_API_KEY environment variable

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated no comments.

Show a summary per file
File Description
uv.lock Adds anthropic package v0.84.0 and its dependencies (jiter, distro, docstring-parser) to the lock file
src/libkernelbot/ai_generate.py New module implementing kernel code generation using Claude AI with context from task specs, templates, and test cases
src/kernelbot/env.py Adds ANTHROPIC_API_KEY environment variable configuration
src/kernelbot/api/main.py Implements new AI submission endpoint that generates code and submits it through existing pipeline
pyproject.toml Adds anthropic dependency to project dependencies
Comments suppressed due to low confidence (11)

src/kernelbot/api/main.py:474

  • The prompt content should be sanitized or validated before being sent to the Anthropic API. While the length is checked (max 10000 characters), there's no validation of the prompt content itself. Consider adding checks for potentially malicious content, prompt injection attempts, or rate limiting per user to prevent abuse of the AI API (which likely has associated costs).
        prompt = payload.get("prompt")
        if not prompt or not isinstance(prompt, str):
            raise HTTPException(status_code=400, detail="Missing or invalid 'prompt' in request body")
        if len(prompt) > 10000:
            raise HTTPException(status_code=400, detail="Prompt too long (max 10000 characters)")
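One way to extend that validation beyond a length check, along the lines the comment suggests. This is a sketch; `validate_prompt` and its heuristics are hypothetical, not code from the PR:

```python
# Hypothetical helper extending the reviewer's suggestion: validate prompt
# content beyond length, e.g. reject control characters before forwarding
# the prompt to the AI API.
MAX_PROMPT_LEN = 10_000

def validate_prompt(prompt):
    """Return an error string, or None if the prompt looks acceptable."""
    if not prompt or not isinstance(prompt, str):
        return "Missing or invalid 'prompt' in request body"
    if len(prompt) > MAX_PROMPT_LEN:
        return f"Prompt too long (max {MAX_PROMPT_LEN} characters)"
    if any(ord(c) < 32 and c not in "\n\t" for c in prompt):
        return "Prompt contains control characters"
    return None
```
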

src/kernelbot/api/main.py:468

  • The AI submission endpoint should log the request details for observability and debugging, similar to how the regular submission endpoint logs "Received submission request for..." at line 567-568. Add logging at the start of the AI submission handler to track leaderboard name, GPU type, submission mode, and user info for operational visibility.
    try:
        await simple_rate_limit()

src/kernelbot/api/main.py:468

  • The AI generation endpoint uses the existing simple_rate_limit function which allows 10 requests per second globally. However, AI API calls likely have higher costs and potentially different rate limits than regular submissions. Consider implementing a separate, more restrictive rate limiter specifically for AI generation requests to prevent unexpected API costs and to respect Anthropic's rate limits. You may also want to implement per-user rate limiting for AI requests.
        await simple_rate_limit()

src/libkernelbot/ai_generate.py:66

  • If the regex fails to find a code block (match is None), the function returns the raw response text. However, if the AI response contains explanatory text around the code but no proper code fence, this could result in invalid code being submitted. Consider adding validation after extraction to ensure the extracted code is not empty and contains expected patterns (e.g., function definitions), or require the AI to always use code blocks by being more explicit in the system prompt.
    # Extract code from a fenced code block if present
    match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
    code = match.group(1).strip() if match else raw.strip()
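Validation after extraction could look like this sketch. The `extract_code` helper and its pattern heuristics are assumptions for illustration, not code from the PR:

```python
import re

def extract_code(raw):
    """Extract code from a fenced block and sanity-check it, per the
    reviewer's suggestion: raise instead of silently submitting prose
    when no fence or no recognizable definition is found."""
    match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
    code = match.group(1).strip() if match else raw.strip()
    if not code:
        raise ValueError("AI response contained no code")
    # Crude heuristic: expect a Python def/class or a CUDA kernel marker.
    if not re.search(r"^\s*(def |class |__global__)", code, re.MULTILINE):
        raise ValueError("Extracted text does not look like kernel code")
    return code
```
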

src/kernelbot/api/main.py:503

  • The error message logged contains the full exception details which may include sensitive information from the Anthropic API response. Consider sanitizing the error message before including it in the HTTPException detail, especially since this error is returned to the client. Log the full error server-side for debugging, but return a more generic error message to the client.
        except Exception as e:
            logger.error(f"AI generation failed: {e}")
            raise HTTPException(status_code=502, detail=f"AI code generation failed: {e}") from e
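Separating the server-side log from the client-facing detail could be as simple as the following hypothetical helper (names and message wording are my own):

```python
import logging

logger = logging.getLogger(__name__)

def client_safe_error(exc):
    """Log full exception details server-side, but return a generic
    message suitable for the HTTPException detail, as the review suggests."""
    logger.error("AI generation failed: %s", exc, exc_info=True)
    return "AI code generation failed; please retry later"
```
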

src/libkernelbot/ai_generate.py:62

  • The code assumes the response will have a text attribute at response.content[0].text, but this could fail if the response structure is different. The Anthropic API could return different content types or an empty content array. Add validation to check that the response has content and that the first content block is of type 'text' before accessing the text attribute.
    raw = response.content[0].text

src/kernelbot/api/main.py:528

  • The generated code is included in the response body, which could result in very large responses if the AI generates lengthy code. This could cause issues with response size limits or consume significant bandwidth. Consider whether returning the full generated code is necessary, or if it should be optional (e.g., via a query parameter) since the code is already stored in the submission and can be retrieved via the submission ID.
                "generated_code": code,

src/libkernelbot/ai_generate.py:69

  • The new AI kernel generation functionality lacks test coverage. Given that the codebase has comprehensive tests for other API endpoints (as seen in test_admin_api.py, test_submission.py, etc.), tests should be added for the new generate_kernel function and the run_ai_submission endpoint. Consider adding tests for: successful generation, handling of invalid prompts, missing API key scenarios, AI API failures, and various template/task configurations.
async def generate_kernel(
    prompt: str,
    task: LeaderboardTask,
    description: str,
    templates: dict[str, str],
) -> tuple[str, str]:
    """Generate kernel code from a natural language prompt using Claude.

    Args:
        prompt: The user's natural language description of the kernel to generate.
        task: The LeaderboardTask containing file signatures, tests, and config.
        description: The leaderboard's problem description.
        templates: Template/starter code files keyed by language name.

    Returns:
        A tuple of (generated_code, file_name).
    """
    # Build context from the task
    system_parts = [
        "You are an expert GPU kernel programmer. Generate code that solves the given problem.",
        "Return ONLY the code inside a single code block. No explanation outside the code block.",
    ]

    if description:
        system_parts.append(f"## Problem Description\n{description}")

    # Include template code so the AI knows the expected function signatures
    if templates:
        for lang, code in templates.items():
            system_parts.append(f"## Template ({lang})\n```\n{code}\n```")

    # Include reference/test files for additional context (skip submission placeholder)
    for name, content in task.files.items():
        if content != "@SUBMISSION@":
            system_parts.append(f"## Reference file: {name}\n```\n{content}\n```")

    # Include test specs so the AI knows input sizes / shapes
    if task.tests:
        system_parts.append(f"## Test cases\n{task.tests}")

    system_prompt = "\n\n".join(system_parts)

    client = anthropic.AsyncAnthropic()
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=[{"role": "user", "content": prompt}],
    )

    raw = response.content[0].text

    # Extract code from a fenced code block if present
    match = re.search(r"```(?:\w+)?\n(.*?)```", raw, re.DOTALL)
    code = match.group(1).strip() if match else raw.strip()

    file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
    return code, file_name

src/libkernelbot/ai_generate.py:54

  • The AsyncAnthropic client is created without passing the API key. While the Anthropic SDK will automatically read from the ANTHROPIC_API_KEY environment variable, the code should explicitly validate that this environment variable is set before making the API call, or pass it explicitly to provide clearer error messages. If the API key is not set, the user would receive a cryptic error from the Anthropic SDK instead of a clear validation error from the application.
    client = anthropic.AsyncAnthropic()

src/libkernelbot/ai_generate.py:68

  • The file name is hardcoded based on the task language, but this doesn't account for other supported languages. The Language enum in consts.py only has Python and CUDA, but the templates support additional languages like Triton, HIP, and CuteDSL (as seen in task.py line 149). If a task uses one of these other languages, the file name logic will fall back to "submission.cu" which may be incorrect. Consider adding proper handling for all supported languages.
    file_name = "submission.py" if task.lang == Language.Python else "submission.cu"
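A fuller mapping covering the languages the comment names might look like the sketch below. The extension choices for Triton, HIP, and CuteDSL are my assumptions (Triton and CuteDSL kernels are written in Python; HIP sources commonly use `.hip` or `.cpp`), and the lookup is keyed by name rather than the `Language` enum for self-containedness:

```python
# Hypothetical filename mapping for all languages the review mentions.
_FILENAME_BY_LANG = {
    "Python": "submission.py",
    "Triton": "submission.py",   # Triton kernels are Python source
    "CuteDSL": "submission.py",  # CuteDSL is Python-embedded
    "CUDA": "submission.cu",
    "HIP": "submission.hip",
}

def submission_filename(lang_name):
    # Fall back to a Python filename rather than silently assuming CUDA.
    return _FILENAME_BY_LANG.get(lang_name, "submission.py")
```
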

src/kernelbot/env.py:44

  • The ANTHROPIC_API_KEY environment variable is defined but not validated in the init_environment function. Unlike other critical environment variables like GITHUB_TOKEN and DISCORD_TOKEN that are validated on startup, this API key is only used when someone calls the AI endpoint. Consider whether this should be a required environment variable that's validated on startup, or if the endpoint should gracefully handle the case where it's not set with a clear error message.
env.ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
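A fail-fast check with a clear message could look like this; the helper name and its placement are illustrative, not from the PR:

```python
import os

def require_anthropic_key():
    """Raise a clear error if ANTHROPIC_API_KEY is unset, instead of
    surfacing a cryptic failure from the Anthropic SDK at call time."""
    key = os.getenv("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; the /ai endpoint requires it"
        )
    return key
```
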

