Local-first desktop automation agent focused on supporting developers.
The agent runs entirely on your machine, controls the desktop, and uses a tiered
verification strategy to keep actions safe and reliable.
- Tiered visual verification using perceptual hashing and a future pluggable VLM.
- Interactive terminal UI for natural commands (e.g. `click 500 400`, `browser search ...`).
- High-level command router for browser, VS Code, filesystem, and dev workflows.
- Task planner that turns goals into concrete steps (e.g. `plan fix failing tests in module auth.py`).
- Environment doctor that detects and repairs many dependency/config/runtime issues automatically.
The Python package lives under local_computer_agent/ and is managed with uv.
Local_Computer_Agent/
│
├── .gitignore
├── README.md
├── pyproject.toml              # uv project configuration and dependencies
├── uv.lock                     # uv lockfile
├── .python-version             # Python version pin
│
├── examples/
│   └── wait_action.json        # Example AgentAction payload
│
└── src/
    └── local_computer_agent/
        ├── __init__.py
        ├── main.py                  # Entry point and interactive REPL
        ├── command_router.py        # High-level commands: browser, code, file, dev, doctor
        ├── action_orchestrator.py   # Executes UI actions and wires verification
        ├── verification.py          # Tier 1 (pHash) + Tier 2 (VLM hook) verification
        ├── schemas.py               # Pydantic models (AgentAction, Tier1Result, Tier2Result)
        ├── planner.py               # Goal->steps planner (`plan ...`)
        └── diagnostics.py           # Environment doctor for deps/config/runtime issues
- Python 3.13 (managed via uv, created automatically).
- Windows desktop (project currently tested on Windows).
- Ability to install Python wheels for:
  - pyautogui, pyscreeze, pygetwindow, mouseinfo
  - opencv-python, Pillow, ImageHash
  - pydantic, openai (VLM integration hook)

Note: The project uses uv to manage dependencies and virtual environments.
Clone the repository and install dependencies with uv:
git clone https://github.com/MoMo790-m/Local_Computer_Agent.git
cd Local_Computer_Agent/local_computer_agent
uv sync

This will create a .venv inside local_computer_agent/ and install all required packages.
From inside local_computer_agent/:
uv run local-computer-agent

You should see an interactive shell:
========================================================
Local-First Computer Agent (Interactive)
Type 'help' for commands, 'exit' to quit.
========================================================
LCA>
Type commands at the LCA> prompt. The agent will perform desktop actions and log verification
results in the terminal.
- Tier 1 – pHash (fast path):
  - Captures `pre` and `post` screenshots around each action.
  - Computes perceptual hashes (pHash) and their Hamming distance.
  - If the distance is above a configurable threshold, Tier 1 passes.
  - If the distance is zero or unexpectedly low, Tier 1 fails and the system can retry or escalate.
- Tier 2 – VLM visual reasoning (slow path):
  - Crops the relevant region around the click/interaction.
  - Currently implemented as a lightweight hook where you can plug in OpenAI or Ollama.
  - Intended to answer questions like: “Did the login succeed?” or “Is the error banner visible?”.
Critical actions (e.g. critical clicks) always escalate to Tier 2 in addition to Tier 1.
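The Tier 1 decision can be sketched in a few lines. This is a minimal illustration, not the code in verification.py: the function names, the hex-string hash representation, and the threshold of 4 bits are assumptions made for the example. In the real pipeline the hashes would come from a perceptual-hash library applied to the pre and post screenshots.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def tier1_passes(pre_hash: str, post_hash: str, threshold: int = 4) -> bool:
    """Hypothetical Tier 1 rule: pass when more than `threshold` bits changed."""
    return hamming_distance(pre_hash, post_hash) > threshold


# Identical screenshots hash to the same value: distance 0, so Tier 1 fails.
unchanged = tier1_passes("ffd8a1c3e0b47f12", "ffd8a1c3e0b47f12")

# A visibly different screen flips many bits (16 here), so Tier 1 passes.
changed = tier1_passes("ffd8a1c3e0b47f12", "00d8a1c3e0b47fed")

print(unchanged, changed)
```

A higher threshold makes Tier 1 stricter about what counts as a visible change; a suitable value depends on the hash size and on how noisy the screen is, so it is best treated as something to tune.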
The ActionOrchestrator converts high-level AgentAction objects into concrete desktop actions:
- `click` (x, y)
- `type` "some text"
- `drag` relative movements
- `scroll` amounts
- `wait` delays
For each action it:
- Takes a `pre` screenshot.
- Performs the UI operation via `pyautogui`.
- Takes a `post` screenshot and runs Tier 1 verification.
- If there is no visual change for several retries, it:
  - Sends `Esc`.
  - Clicks a neutral region (to reset focus).
- If Tier 1 fails or the action is marked `critical`, it calls Tier 2.
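The per-action flow above can be sketched as a small loop. This is an illustration under stated assumptions, not the actual ActionOrchestrator: `run_action`, `StepResult`, and every parameter name are hypothetical, hashes are plain integers, and the UI operation, screenshot hashing, recovery, and Tier 2 check are passed in as callables so the sketch runs without a desktop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    tier1_passed: bool
    tier2_called: bool
    attempts: int


def run_action(
    perform: Callable[[], None],      # the UI operation, e.g. a pyautogui click
    capture_hash: Callable[[], int],  # screenshot reduced to an integer hash
    recover: Callable[[], None],      # e.g. send Esc and click a neutral region
    tier2: Callable[[], bool],        # slow VLM check
    critical: bool = False,
    threshold: int = 4,
    max_retries: int = 3,
) -> StepResult:
    tier1_passed = False
    attempts = 0
    for attempts in range(1, max_retries + 1):
        pre = capture_hash()
        perform()
        post = capture_hash()
        # Tier 1: the Hamming distance between hashes must exceed the threshold.
        if bin(pre ^ post).count("1") > threshold:
            tier1_passed = True
            break
    if not tier1_passed:
        recover()  # no visual change after all retries: reset focus
    tier2_called = critical or not tier1_passed
    if tier2_called:
        tier2()
    return StepResult(tier1_passed, tier2_called, attempts)


# Simulated desktop: the "click" changes 16 bits of the screen hash.
screen = {"hash": 0x0F}


def fake_click() -> None:
    screen["hash"] = 0xFFFF0F


result = run_action(
    perform=fake_click,
    capture_hash=lambda: screen["hash"],
    recover=lambda: None,
    tier2=lambda: True,
)
print(result)
```

Injecting the callables keeps the control flow testable in isolation; a real implementation would bind them to pyautogui and the verification module.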
Once the REPL is running (uv run local-computer-agent), you can use:
- Low-level UI actions

    click X Y [critical]   -> click at (X, Y)
    type TEXT              -> type TEXT at the current focus
    wait SECONDS           -> wait SECONDS (float)
    scroll AMOUNT          -> scroll by AMOUNT (int; negative = up)

- High-level desktop automation

    browser open             -> open default browser
    browser search QUERY     -> search QUERY in browser
    code open [PATH]         -> open VS Code on the project or a given file
    file new PATH            -> create empty file
    file write PATH TEXT...  -> overwrite PATH with TEXT
    file append PATH TEXT... -> append TEXT as a new line to PATH
    dev test                 -> run tests (uv run pytest)
    dev format               -> format code (placeholder: uv run python -m black src)
    dev doctor               -> run environment diagnostics and auto-fixes
    dev cmd ARGS...          -> run arbitrary shell command in project root

- Planning

    plan GOAL

  Example:

    LCA> plan fix failing tests in module auth.py
    [PLAN] Steps:
      1. browser search fix failing tests in module auth.py   # Look up documentation or prior art for the goal.
      2. code open   # Open the project in VS Code for direct code edits.
      3. file new auth.py   # Ensure the target file exists for editing.
      4. dev test   # Run the test suite to see the current failure state.
      ...

  The planner then executes each step automatically, using the command router and orchestrator.
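To make the low-level command grammar concrete, here is a rough sketch of how such lines could be parsed into action objects. It is not the project's router: `ActionSketch` and `parse_low_level` are hypothetical names, and the field names are borrowed from the AgentAction JSON payload shown elsewhere in this README, with the payload kept as a string as in that example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ActionSketch:
    """Hypothetical stand-in for AgentAction (the real model lives in schemas.py)."""
    action_type: str
    coordinates: Optional[Tuple[int, int]] = None
    payload: Optional[str] = None
    critical: bool = False


def parse_low_level(line: str) -> ActionSketch:
    name, _, rest = line.strip().partition(" ")
    args = rest.split()
    if name == "click":
        # An optional trailing "critical" marks the action for Tier 2.
        critical = bool(args) and args[-1] == "critical"
        if critical:
            args = args[:-1]
        if len(args) != 2:
            raise ValueError("usage: click X Y [critical]")
        return ActionSketch("click", (int(args[0]), int(args[1])), critical=critical)
    if name == "type":
        # Everything after the command name is the text to type.
        return ActionSketch("type", payload=rest.strip())
    if name in ("wait", "scroll"):
        if len(args) != 1:
            raise ValueError(f"usage: {name} VALUE")
        convert = float if name == "wait" else int
        convert(args[0])  # raises ValueError on a malformed number
        return ActionSketch(name, payload=args[0])
    raise ValueError(f"unknown low-level command: {name!r}")


print(parse_low_level("click 500 400 critical"))
print(parse_low_level("type hello world"))
print(parse_low_level("wait 0.5"))
```

Validating numbers at parse time means a typo such as `wait abc` is rejected before any desktop action runs.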
The environment doctor is designed to help developers recover from common local issues without manual intervention:
- Run:

    LCA> dev doctor

- Behavior:
  - Executes `uv run pytest`.
  - If tests pass, it exits.
  - If tests fail, it parses output for patterns like:
    - `ModuleNotFoundError: No module named 'xyz'`
    - Version conflicts in dependency resolution.
    - `FileNotFoundError` for project-local files.
  - Applies fixes such as:
    - `uv add xyz`
    - `uv lock --upgrade`
    - Creating missing project files.
  - Re-runs tests, repeating for several rounds until the issues are resolved or no automatic fix can be found.
This makes the agent particularly useful as a self-healing dev environment assistant.
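The pattern-to-fix step can be illustrated for the missing-module case. This is a sketch rather than the doctor's real logic: `suggest_fixes` and the regular expression are assumptions, only `ModuleNotFoundError` is handled, and the function returns the commands it would run instead of executing them.

```python
import re
from typing import List

# Matches Python's standard message, e.g. "No module named 'requests'".
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([\w.]+)'")


def suggest_fixes(pytest_output: str) -> List[List[str]]:
    """Map missing-module errors in test output to `uv add` commands."""
    fixes = []
    seen = set()
    for dotted_name in MISSING_MODULE.findall(pytest_output):
        # Assumption: the top-level import name equals the package name.
        # That does not always hold (for example, cv2 comes from opencv-python).
        package = dotted_name.split(".")[0]
        if package not in seen:
            seen.add(package)
            fixes.append(["uv", "add", package])
    return fixes


sample_output = """
E   ModuleNotFoundError: No module named 'requests'
E   ModuleNotFoundError: No module named 'yaml.loader'
E   ModuleNotFoundError: No module named 'requests'
"""
print(suggest_fixes(sample_output))
```

Returning argument lists keeps the sketch side-effect free; a repair loop could pass each list to `subprocess.run` and then re-run the tests, stopping after a fixed number of rounds.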
For automation or scripting, you can bypass the REPL and run a single AgentAction from the CLI:
uv run local-computer-agent --action-json '{"action_type":"wait","coordinates":null,"payload":"0.5","expected_outcome":"Short delay completes","critical":false}'

Or using a JSON file:

uv run local-computer-agent --action-file examples/wait_action.json

This calls directly into the orchestrator and verification pipeline.
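When scripting these calls, building the payload in code avoids hand-quoting JSON on the command line. The helper below is a sketch: `build_action_json` is a hypothetical name, the five fields are the ones shown in the example payload, and the `[x, y]` shape used for non-null coordinates is an assumption.

```python
import json
from typing import List, Optional, Sequence


def build_action_json(
    action_type: str,
    payload: Optional[str] = None,
    coordinates: Optional[Sequence[int]] = None,
    expected_outcome: str = "",
    critical: bool = False,
) -> str:
    """Serialize an action using the field names from the example payload."""
    action = {
        "action_type": action_type,
        # Assumed shape for coordinates: [x, y], or null when unused.
        "coordinates": list(coordinates) if coordinates is not None else None,
        "payload": payload,
        "expected_outcome": expected_outcome,
        "critical": critical,
    }
    return json.dumps(action, separators=(",", ":"))


raw = build_action_json("wait", payload="0.5", expected_outcome="Short delay completes")

# Argument list for the CLI; a list sidesteps shell quoting of the JSON string.
argv: List[str] = ["uv", "run", "local-computer-agent", "--action-json", raw]
print(raw)
```

The printed string reproduces the `wait` payload from the command above. Passing `argv` to `subprocess.run` from the project directory would invoke the agent without going through a shell.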
The project is organized so new capabilities can be added with minimal coupling:
- New commands: extend `CommandRouter` to add `dev build`, `browser open-url`, etc.
- New planning patterns: extend `TaskPlanner` to recognize more goal types and generate richer step lists.
- Real VLM verification: implement the Tier 2 hook in `verification.py` using OpenAI or Ollama.
- Additional diagnostics: extend `DiagnosticsEngine` with more error patterns and repair strategies.
Because the core pieces (router, planner, orchestrator, verification, diagnostics) are separated, you can evolve each independently as the agent grows.