The world's first free Computer Use agent powered by the GitHub Copilot API.
Screenshot → Vision Analysis → Action Execution, all through your existing Copilot subscription ($10/mo, or free for students).
Existing Computer Use agents typically pay for every screenshot they send to a Vision API, and an agent loop sends many of them. We discovered that the GitHub Copilot API accepts base64-encoded image input when the request includes the `copilot-vision-request: true` header. This means you can build a fully functional Computer Use agent at no additional cost beyond your existing Copilot subscription.
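The base64 input described above travels as a data URL inside an `image_url` content part of a chat message. The helpers below are a minimal sketch, not the project's code: `sniff_mime`, `to_data_url`, and `image_part` are hypothetical names, and detecting the format from magic bytes (rather than always labeling the data `image/png`) is an assumed design choice.

```python
import base64


def sniff_mime(image: bytes) -> str:
    """Guess the MIME type from the file's magic bytes."""
    if image.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if image.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if image[:4] == b"RIFF" and image[8:12] == b"WEBP":
        return "image/webp"  # accepted here, but WebP may be unreliable upstream
    raise ValueError("unrecognized image format; expected PNG, JPEG, or WebP")


def to_data_url(image: bytes) -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image).decode("ascii")
    return f"data:{sniff_mime(image)};base64,{encoded}"


def image_part(image: bytes) -> dict:
    """Build one image content part for a chat/completions message."""
    return {"type": "image_url", "image_url": {"url": to_data_url(image)}}
```

Failing loudly on unknown formats keeps a corrupted or unexpected capture from being sent to the API under the wrong MIME type.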
```
┌───────────────────────────────────────────────────┐
│  Agent Control Loop                               │
│                                                   │
│  1. 📸 Capture screenshot (mss + Pillow)          │
│  2. 🔄 Encode to base64 PNG                       │
│  3. 👁️ Analyze via Copilot Vision API             │
│        (copilot-vision-request: true header)      │
│  4. 🧠 Plan next action via Copilot Text API      │
│  5. ⚡ Execute action (pyautogui)                 │
│  6. 🔁 Repeat until task complete                 │
└───────────────────────────────────────────────────┘
            ↕ All API calls go through ↕
┌───────────────────────────────────────────────────┐
│  Copilot API Adapter                              │
│  • GitHub OAuth Device Flow authentication        │
│  • Automatic JWT token refresh                    │
│  • VS Code version spoofing                       │
│  • copilot-vision-request header management       │
└───────────────────────────────────────────────────┘
```
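The control loop in the diagram can be sketched as follows. This is a minimal illustration, not the code in `src/agent/loop.py`: `run_agent`, `Step`, and the callable-based signature are hypothetical, and the real screenshot, Copilot, and pyautogui calls are replaced by injected functions so the loop logic stands on its own. The `max_steps` cap is an added assumption that keeps a confused planner from looping forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    description: str        # vision model's reading of the screen
    action: Optional[str]   # planned action, or None once the task is done


def run_agent(
    task: str,
    capture: Callable[[], bytes],                  # 1. screenshot as PNG bytes
    analyze: Callable[[str, bytes], str],          # 2-3. encode + vision call
    plan: Callable[[str, list[Step], str], Optional[str]],  # 4. next action
    execute: Callable[[str], None],                # 5. perform the action
    max_steps: int = 20,
) -> list[Step]:
    """Run capture -> analyze -> plan -> execute until done or max_steps."""
    history: list[Step] = []
    for _ in range(max_steps):                     # 6. repeat, with a hard cap
        screenshot = capture()
        description = analyze(task, screenshot)
        action = plan(task, history, description)
        history.append(Step(description, action))
        if action is None:                         # planner reports completion
            break
        execute(action)
    return history


# Usage with stand-ins for the real screen, Copilot API, and pyautogui calls:
screens = iter([b"home", b"browser", b"results"])
planned = iter(["open chrome", "search weather", None])
executed: list[str] = []
history = run_agent(
    "Open Chrome and search for the weather",
    capture=lambda: next(screens),
    analyze=lambda task, png: f"screen shows {png.decode()}",
    plan=lambda task, hist, desc: next(planned),
    execute=executed.append,
)
print(executed)  # ['open chrome', 'search weather']
```

Injecting the four stages as callables keeps the loop testable without a display or network access; real implementations would wrap mss, the Copilot client, and pyautogui.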
Compared with a traditional pay-per-call agent:

| Aspect | Traditional Agent | copilot-computer-use |
|---|---|---|
| Vision API cost | $0.01-0.05 per screenshot | $0 (included in Copilot) |
| Text reasoning cost | $0.003-0.06 per request | $0 (included in Copilot) |
| Monthly cost | $20-100+ | $10/mo (Copilot), or free for students |
| Models available | 1 provider | GPT-4o, Claude, Gemini |
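To see where a monthly range like the one in the table can come from, here is the arithmetic for a pay-per-screenshot Vision API. The workload (100 tasks a month at 20 steps each, one screenshot per step) is a hypothetical assumption, not a measured figure, and the estimate ignores text-reasoning calls.

```python
def monthly_vision_cost(tasks: int, steps_per_task: int, price_per_screenshot: float) -> float:
    """Monthly spend on a pay-per-call Vision API, one screenshot per agent step."""
    return tasks * steps_per_task * price_per_screenshot


# Hypothetical workload: 100 tasks/month x 20 steps = 2,000 screenshots.
low = monthly_vision_cost(100, 20, 0.01)
high = monthly_vision_cost(100, 20, 0.05)
print(f"${low:.2f} to ${high:.2f} per month")  # $20.00 to $100.00 per month
```

Through Copilot, the same 2,000 screenshots fall under the flat subscription, subject to Copilot's rate limits.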
Requirements:

- Python 3.11+
- GitHub account with Copilot subscription (Individual, Business, or Enterprise)
- macOS or Linux (Windows support planned)
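The local requirements can be checked up front with a small script. `preflight` is a hypothetical helper, not part of the repository; it checks only the Python version and operating system, since a Copilot subscription can only be confirmed against GitHub itself.

```python
import sys


def preflight(version_info: tuple = sys.version_info, platform: str = sys.platform) -> list[str]:
    """Return unmet local requirements (an empty list means the host looks supported)."""
    problems = []
    if tuple(version_info[:2]) < (3, 11):
        problems.append(f"Python 3.11+ required, found {version_info[0]}.{version_info[1]}")
    # sys.platform is "darwin" on macOS and "linux" on Linux.
    if not (platform == "darwin" or platform.startswith("linux")):
        problems.append(f"unsupported platform {platform!r}: macOS or Linux required")
    return problems


for problem in preflight():
    print("preflight:", problem)
```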
Quick start:

```bash
# Clone
git clone https://github.com/Zey413/copilot-computer-use.git
cd copilot-computer-use

# Install
pip install -e .

# Authenticate with GitHub (one-time)
python -m src.copilot.auth

# Run
python -m src.main "Open Chrome and search for the weather"
```

Project structure:

```
src/
├── main.py          # Entry point + CLI
├── copilot/
│   ├── auth.py      # GitHub OAuth Device Flow + JWT refresh
│   ├── client.py    # Copilot chat/completions API client
│   └── config.py    # API endpoints, headers, version spoofing
├── agent/
│   ├── loop.py      # Core agent control loop
│   ├── planner.py   # Task planning via text reasoning
│   └── actions.py   # Action definitions (click, type, scroll...)
├── screen/
│   ├── capture.py   # Screenshot capture (mss + Pillow)
│   └── annotate.py  # Optional: annotate screenshots with markers
└── executor/
    ├── base.py      # Abstract executor interface
    ├── macos.py     # macOS executor (pyautogui + AppleScript)
    └── linux.py     # Linux executor (pyautogui + xdotool)
```
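The key discovery: Copilot's `chat/completions` endpoint accepts base64-encoded images in `image_url` content parts when the request includes the `copilot-vision-request: true` header:

```python
import base64

import httpx

# Assumes copilot_jwt already holds a valid Copilot API token
# (obtaining and refreshing it is the job of src/copilot/auth.py).
with open("screenshot.png", "rb") as f:  # any PNG screenshot on disk
    screenshot_b64 = base64.b64encode(f.read()).decode("ascii")

response = httpx.post(
    "https://api.githubcopilot.com/chat/completions",
    headers={
        "Authorization": f"Bearer {copilot_jwt}",
        "copilot-vision-request": "true",
        "editor-version": "vscode/1.104.3",
        # ... other headers
    },
    json={
        "model": "gpt-4o",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see on this screen?"},
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{screenshot_b64}"
                }},
            ],
        }],
    },
    timeout=60.0,  # httpx defaults to 5 s, which image requests can exceed
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

This project is based on extensive research documented in Reviews #001-#013: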
- Raven (nocoo/raven): showed that the Copilot API can be proxied with full Anthropic↔OpenAI format translation
- Copilot Vision: base64 image support confirmed via 6+ open-source projects (LiteLLM, OpenCode, LobeHub, etc.)
- Market gap: we found no existing project that combines a Copilot proxy with Computer Use
Supported models:

- GPT-4o / GPT-4.1 (Vision ✅)
- Claude Sonnet 4 / Opus 4 (Vision ✅)
- Gemini 2.5 Pro / 3 Flash (Vision ✅)
- o4-mini (Reasoning)
- Others as GitHub adds them
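The model list suggests pairing a vision-capable model for screenshot analysis (step 3 of the control loop) with a text or reasoning model for planning (step 4). A minimal config sketch under that assumption follows; `ModelConfig` is hypothetical, and the model ID strings other than `gpt-4o` (which the request example uses) are guesses at Copilot's identifiers that may not match what your account exposes.

```python
from dataclasses import dataclass

# Assumed Copilot model IDs for the vision-capable entries in the list;
# verify them against the models your Copilot plan actually exposes.
VISION_MODELS = frozenset({
    "gpt-4o", "gpt-4.1", "claude-sonnet-4", "claude-opus-4", "gemini-2.5-pro",
})


@dataclass(frozen=True)
class ModelConfig:
    vision: str = "gpt-4o"    # step 3: analyzes screenshots, must accept images
    planner: str = "o4-mini"  # step 4: text-only planning, no image input needed

    def __post_init__(self) -> None:
        # Fail fast instead of sending screenshots to a text-only model.
        if self.vision not in VISION_MODELS:
            raise ValueError(f"{self.vision!r} is not a known vision-capable model")


config = ModelConfig(planner="gpt-4.1")
```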
Limitations:

- Experimental: this is a research project, not production software
- Rate limits: Copilot enforces rate limits, so aggressive screenshot loops may be throttled
- ToS gray area: using the Copilot API outside IDE integrations may not be explicitly permitted
- Image formats: PNG is the most reliable; JPEG works; WebP may have issues
- macOS/Linux only: the Windows executor is not yet implemented
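Because aggressive screenshot loops may be throttled, it helps to retry throttled calls with exponential backoff rather than fail the whole task. The sketch below is a generic helper, not code from this repository; it assumes throttling surfaces as HTTP 429 (Too Many Requests), which is an assumption about how Copilot signals rate limiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    send: Callable[[], T],
    status_of: Callable[[T], int],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on HTTP 429 with exponential backoff plus jitter.

    Returns the first non-429 response, or the last 429 once retries run out.
    """
    attempt = 0
    while True:
        response = send()
        if status_of(response) != 429 or attempt >= max_retries:
            return response
        # Wait base, 2*base, 4*base, ... plus random jitter so retries spread out.
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
        attempt += 1
```

With httpx it could wrap the vision request as `with_backoff(lambda: httpx.post(url, headers=headers, json=payload, timeout=60.0), lambda r: r.status_code)`, where `url`, `headers`, and `payload` stand for the values in the request example above.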
Acknowledgments:

- Raven: inspiration for Copilot API authentication and format translation
- self-operating-computer: reference for a lightweight Computer Use architecture
- copilot-api: early Copilot reverse-engineering work
License: MIT