This project implements linear function approximation for multi-agent reinforcement learning in an air combat simulation. Inspired by Chapter 1 of Tom Mitchell's *Machine Learning*, the assignment tasks an Aircraft (A) with eliminating 4 Suicide Drones (S1–S4) and reaching a Goal area, while the drones coordinate to surround and destroy the Aircraft.
The system applies Temporal Difference (TD) learning principles with linear value functions to approximate the value of game states for both sides.
The simulation takes place on a 15×10 grid-based airspace.
| Element | Symbol | Description |
|---|---|---|
| Aircraft | A | Player agent. Starts with 2 rockets (max 3). Must eliminate the drones and reach the Goal. |
| Suicide Drones | S | 4 AI agents. Coordinate to surround or swarm the Aircraft. Share weights. |
| Goal | G | Fixed at corner (15, 10). Reached after eliminating all drones for +1000 pts. |
| Reload Zone | R | Randomly placed. Grants +1 rocket. Regenerates after use. |
| Mountains | M | 10 obstacles. Crashing costs -500 pts and respawns the Aircraft. |
- 🏆 Aircraft Wins: All drones destroyed AND Goal reached.
- 💀 Drones Win:
  - Surround the Aircraft (all 4 sides OR 2 adjacent sides).
  - 2+ drones within 1 block of the Aircraft.
  - Aircraft crashes into a drone.
- 🤝 Draw: 30 turns pass without a decisive winner.
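The win/loss/draw rules above can be sketched as a single terminal-state check. This is an illustrative sketch only: the function and field names (`check_outcome`, `aircraft_pos`, `drone_positions`) are assumptions, not the project's actual API, and the full surround rule (4 sides OR 2 adjacent sides) is approximated by the 2-drones-within-1-block condition.

```python
def manhattan(a, b):
    # Manhattan distance between two (x, y) grid cells.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def check_outcome(aircraft_pos, drone_positions, goal_pos, turn, max_turns=30):
    """Return 'aircraft', 'drones', 'draw', or None if the game continues.

    Destroyed drones are represented as None in drone_positions (an assumption).
    """
    alive = [d for d in drone_positions if d is not None]
    # Drones win: 2+ drones within 1 block (swarm / surround condition).
    if sum(1 for d in alive if manhattan(aircraft_pos, d) <= 1) >= 2:
        return 'drones'
    # Drones also win if the Aircraft crashes into a drone.
    if any(d == aircraft_pos for d in alive):
        return 'drones'
    # Aircraft wins: all drones destroyed AND Goal reached.
    if not alive and aircraft_pos == goal_pos:
        return 'aircraft'
    # Draw: 30 turns pass without a decisive winner.
    if turn >= max_turns:
        return 'draw'
    return None
```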
- Method: Linear Value Function Approximation.
- Update Rule: Least Mean Squares (LMS) / Gradient Descent.
- Value Functions:
  - Aircraft: $\hat{V}_A(s) = W_A^T x(s)$
  - Drones: $\hat{V}_D(s) = W_D^T x(s)$
- Weight Update:
  $$W \leftarrow W + \alpha \left(V_{\text{train}}(s) - \hat{V}(s)\right) x(s)$$
The agents evaluate states using normalized feature vectors.

Aircraft feature vector:

- Bias: Constant 1.0.
- Distance to Goal: Normalized Manhattan distance.
- Rockets: Normalized count (0-3).
- Min Drone Distance: Normalized distance to nearest drone.
- Drones within 1 Block: Count (Threat level).
- Drones within 2 Blocks: Count (Shootable range).
- Distance to Reload: Normalized distance.
- Min Mountain Distance: Normalized safety margin.
- Turn Ratio: Current turn / Max turns.
- Avg Drone Distance: Normalized average distance to all drones.
- Need Reload: Binary (1 if rockets < 1).
- All Drones Destroyed: Binary (1 if all dead).
Drone feature vector:

- Bias: Constant 1.0.
- Distance to Aircraft: Normalized Manhattan distance.
- Distance to Goal: Normalized (to block aircraft).
- Distance to Nearest Drone: Normalized (for coordination).
- Aircraft in Sight: Binary (within range).
- Nearby Drones Count: Normalized count (for swarming).
- Distance to Mountain: Normalized safety margin.
- Turn Ratio: Current turn / Max turns.
- Is Surrounding: Binary (1 if currently surrounding aircraft).
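As an illustration, a subset of the Aircraft's feature vector might be computed as below. This is a hedged sketch: the helper names, the feature ordering, and the normalizing constants are assumptions, and the reload-zone, mountain, and average-distance features are omitted for brevity.

```python
import numpy as np

GRID_W, GRID_H = 15, 10
MAX_DIST = (GRID_W - 1) + (GRID_H - 1)  # largest Manhattan distance on the grid

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def aircraft_features(pos, goal, drones, rockets, turn, max_turns=30):
    """Build a normalized feature vector (destroyed drones are None)."""
    alive = [d for d in drones if d is not None]
    dists = [manhattan(pos, d) for d in alive]
    return np.array([
        1.0,                                         # bias
        manhattan(pos, goal) / MAX_DIST,             # distance to goal
        rockets / 3.0,                               # rockets (max 3)
        (min(dists) / MAX_DIST) if dists else 1.0,   # min drone distance
        sum(1 for d in dists if d <= 1),             # drones within 1 block
        sum(1 for d in dists if d <= 2),             # drones within 2 blocks
        turn / max_turns,                            # turn ratio
        1.0 if rockets < 1 else 0.0,                 # need reload
        1.0 if not alive else 0.0,                   # all drones destroyed
    ])
```

Keeping every feature roughly in [0, 1] keeps the LMS updates well-scaled, so no single feature dominates the dot product.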
| Event | Aircraft Reward | Drone Reward |
|---|---|---|
| Destroy Enemy | +250 (per drone) | -250 (per drone) |
| Win Game | +1000 (Goal) | +500 (per drone) |
| Loss/Crash | -1000 (Destroyed) | - |
| Mountain Crash | -500 | - |
| Draw | 0 | 0 |
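The reward table above maps naturally onto a per-agent lookup. The event keys and function name below are illustrative assumptions, not the project's actual identifiers.

```python
# Reward table for each side; unknown events default to 0.
AIRCRAFT_REWARDS = {
    'destroy_drone': 250,     # per drone destroyed
    'win': 1000,              # Goal reached after eliminating all drones
    'destroyed': -1000,       # Aircraft lost to the drones
    'mountain_crash': -500,   # crash penalty (Aircraft respawns)
    'draw': 0,
}

DRONE_REWARDS = {
    'destroy_drone': -250,    # a drone was lost
    'win': 500,               # per drone, when the drones win
    'draw': 0,
}

def reward(event, agent='aircraft'):
    table = AIRCRAFT_REWARDS if agent == 'aircraft' else DRONE_REWARDS
    return table.get(event, 0)
```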
```
air-combat-survival/
│
├── src/
│   ├── __init__.py
│   ├── train_agents.py      # Core training logic, Game class, Matplotlib visualization (Part A)
│   ├── evaluate_agents.py   # Evaluation script (100 games), statistics & plotting (Part B)
│   └── interactive_gui.py   # Interactive GUI (Tkinter) for Human vs. AI (Part C)
│
├── assets/
│   └── trained_weights.pkl  # Generated file containing trained weights (Aircraft & Drones)
│
└── README.md                # Project documentation
```
A line plot showing the convergence of Aircraft and Drone scores over 3000 episodes. Ideally, the Aircraft score should stabilize or increase as it learns to avoid drones and reach the goal.
- Score Trends: Line plot comparing scores over 100 test games.
- Win Distribution: Pie chart showing percentages (Aircraft Win vs. Drone Win vs. Draw).
- Score Distribution: Histogram showing frequency of scores achieved.
Real-time rendering using tkinter:
- Blue Circle: Aircraft
- Red Circles: Drones
- Brown Circles: Mountains
- Gold Circle: Reload Zone
- Green Circle: Goal
Tom Mitchell, *Machine Learning*, McGraw-Hill, 1997, Chapter 1.