# Kaggle Competition: Find the Water

**Task:** Pixel-level segmentation of water regions with precise boundary detection
**Training time:** ~1 hour on a Kaggle T4 GPU
| Metric | Score |
|---|---|
| Val Dice | 0.9814 |
| Val IoU | 0.9641 |
| Composite Score | 0.9710 |
| Best Epoch | 45 / 60 |
| Optimal Threshold | 0.60 |
The model converged rapidly — hitting 0.94+ composite by epoch 5 — then continued refining steadily through epoch 45 before plateauing. No overfitting was observed: val loss tracked train loss closely throughout all 60 epochs.
Key milestones:
| Epoch | Val Dice | Val IoU | Composite |
|---|---|---|---|
| 1 | 0.858 | 0.763 | 0.801 |
| 5 | 0.963 | 0.931 | 0.944 |
| 10 | 0.971 | 0.946 | 0.956 |
| 20 | 0.978 | 0.958 | 0.966 |
| 45 | 0.981 | 0.964 | 0.971 ← best |
| 60 | 0.978 | 0.959 | 0.967 |
Sample predictions (`results/predictions_sample.png`): each row shows input image → ground truth mask → model prediction. The model cleanly delineates water boundaries even in complex scenes with reflections, partial occlusion, and varying lighting.
Rather than using a fixed threshold of 0.5, the optimal binarization threshold was searched across the validation set using the actual competition metric. Higher thresholds produced sharper, more conservative boundaries — improving Boundary IoU and Contour F1 at the cost of slightly lower Region IoU.
| Threshold | Score | B-IoU | C-F1 | R-IoU |
|---|---|---|---|---|
| 0.30 | 0.421 | 0.145 | 0.429 | 0.955 |
| 0.40 | 0.453 | 0.186 | 0.468 | 0.959 |
| 0.50 | 0.478 | 0.221 | 0.494 | 0.962 |
| 0.60 | 0.493 | 0.243 | 0.508 | 0.962 |
Optimal threshold: 0.60 — the scoring heavily weights boundary precision (80% of score), so a higher threshold that tightens the predicted boundary outperforms the default 0.5.
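The sweep in the table above can be reproduced with a small search loop. This is a hedged sketch: plain Dice stands in for the competition's composite scorer, and `search_threshold`/`dice` are illustrative names, not the notebook's actual functions.

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two boolean masks."""
    inter = (pred & gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def search_threshold(probs, gts, thresholds=np.arange(0.30, 0.71, 0.05)):
    """Sweep binarization thresholds over validation pairs and return
    (best_threshold, best_score). Dice stands in for the composite metric."""
    best_t, best_s = 0.5, -1.0
    for t in thresholds:
        score = np.mean([dice(p >= t, g) for p, g in zip(probs, gts)])
        if score > best_s:
            best_t, best_s = float(t), float(score)
    return best_t, best_s
```

In the real search, `dice` would be replaced by the weighted competition metric, which is what pushes the optimum up to 0.60.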
Standard segmentation metrics (IoU, Dice) reward getting the bulk of the region right and are forgiving of sloppy edges. This competition flips that — 80% of the score comes from boundary precision:
| Component | Weight | What it measures |
|---|---|---|
| Boundary IoU | 40% | Overlap of dilated boundary pixels |
| Contour F1 | 40% | Point-to-point contour matching within 3px |
| Region IoU | 20% | Standard mask overlap |
A model that perfectly segments the region but has rough edges will score poorly. Every design decision here was made with boundary sharpness in mind.
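For intuition, the weighting can be sketched as follows. This is a hypothetical reconstruction, not the official scorer: the boundary extraction, the 3 px tolerance, and the contour matching are approximations of the definitions in the table.

```python
import numpy as np
from scipy import ndimage

def boundary(mask, width=1):
    """Boundary pixels: mask minus its erosion."""
    return mask & ~ndimage.binary_erosion(mask, iterations=width)

def composite_score(pred, gt, tol=3):
    """Approximate 0.4·B-IoU + 0.4·C-F1 + 0.2·R-IoU for boolean masks."""
    # Region IoU: standard mask overlap
    inter = (pred & gt).sum()
    r_iou = inter / max((pred | gt).sum(), 1)

    # Boundary IoU: overlap of dilated boundary bands
    pb = ndimage.binary_dilation(boundary(pred), iterations=tol)
    gb = ndimage.binary_dilation(boundary(gt), iterations=tol)
    b_iou = (pb & gb).sum() / max((pb | gb).sum(), 1)

    # Contour F1: boundary pixels matched within `tol` px of the other contour
    pb0, gb0 = boundary(pred), boundary(gt)
    prec = (pb0 & ndimage.binary_dilation(gb0, iterations=tol)).sum() / max(pb0.sum(), 1)
    rec = (gb0 & ndimage.binary_dilation(pb0, iterations=tol)).sum() / max(gb0.sum(), 1)
    c_f1 = 2 * prec * rec / max(prec + rec, 1e-7)

    return 0.4 * b_iou + 0.4 * c_f1 + 0.2 * r_iou
```

Because the boundary band is thin, a few misplaced edge pixels move B-IoU and C-F1 far more than they move R-IoU, which is exactly why rough edges are punished.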
Input (512×640) → EfficientNet-B3 Encoder → U-Net Decoder → Binary Mask
U-Net was chosen for its skip connections, which pass full-resolution spatial features directly from encoder to decoder — preserving the fine edge detail that deep encoders compress away. This is essential when boundary precision accounts for 80% of the score.
EfficientNet-B3 as the encoder provides a strong pretrained feature extractor with a good balance of accuracy and compute efficiency.
Input resolution 512×640 (upscaled from original) gives the model more pixels to work with at boundaries, directly improving contour precision.
Standard BCE or Dice loss only cares about pixel-level region overlap — they give no special weight to boundaries. The custom loss combines four terms:
| Component | Weight | Role |
|---|---|---|
| BCE Loss | 20% | Baseline per-pixel supervision |
| Dice Loss | 30% | Region-level overlap |
| Gradient Loss | 30% | Penalizes differences in spatial gradients — directly targets boundary sharpness |
| Sobel Edge Loss | 20% | MSE between predicted and ground truth edge maps via Sobel filter |
The gradient and Sobel terms together account for 50% of the total loss, meaning the model is explicitly trained to match boundaries rather than just regions.
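A sketch of how the four terms could be combined in PyTorch. The weights follow the table, but the notebook's exact implementation may differ; the finite-difference gradient and Sobel formulations here are illustrative.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for edge-map extraction, shaped for conv2d: (out=1, in=1, 3, 3)
_SX = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
_SY = _SX.transpose(2, 3)

def _sobel(x):
    """Edge magnitude of a (B,1,H,W) map via Sobel filtering."""
    gx = F.conv2d(x, _SX, padding=1)
    gy = F.conv2d(x, _SY, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def dice_loss(prob, target, eps=1.0):
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def gradient_loss(prob, target):
    """L1 difference of spatial finite-difference gradients."""
    dx = (prob[..., :, 1:] - prob[..., :, :-1]) - (target[..., :, 1:] - target[..., :, :-1])
    dy = (prob[..., 1:, :] - prob[..., :-1, :]) - (target[..., 1:, :] - target[..., :-1, :])
    return dx.abs().mean() + dy.abs().mean()

def boundary_aware_loss(logits, target):
    """0.2·BCE + 0.3·Dice + 0.3·Gradient + 0.2·Sobel, per the table above."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    sobel = F.mse_loss(_sobel(prob), _sobel(target))
    return (0.2 * bce + 0.3 * dice_loss(prob, target)
            + 0.3 * gradient_loss(prob, target) + 0.2 * sobel)
```

Both edge terms operate on the sigmoid probabilities, so they stay differentiable and push gradients directly at boundary pixels.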
| Parameter | Value | Reason |
|---|---|---|
| Optimizer | AdamW | Weight decay regularization helps generalization |
| Learning rate | 1e-4 | Conservative LR for stable boundary learning |
| Scheduler | CosineAnnealingWarmRestarts | Periodic LR resets help escape local minima |
| Batch size | 2 (effective 12) | Gradient accumulation × 6 steps |
| Mixed precision | AMP | Faster training, lower memory |
| Early stopping | Patience 20 | Not triggered; training ran the full 60 epochs (counter at 15/20) |
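These settings can be wired together roughly as follows. This is a sketch with a stand-in model and dummy batches, not the notebook's actual loop; the weight-decay value is an assumption.

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Conv2d(3, 1, 3, padding=1).to(device)  # stand-in for the real U-Net
opt = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
sched = CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

ACCUM = 6  # batch size 2 × 6 accumulation steps = effective batch of 12

for step in range(12):  # two effective optimizer updates on dummy data
    x = torch.randn(2, 3, 32, 32, device=device)
    y = torch.rand(2, 1, 32, 32, device=device)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = F.mse_loss(model(x), y) / ACCUM  # divide so accumulated grads average out
    scaler.scale(loss).backward()
    if (step + 1) % ACCUM == 0:
        scaler.step(opt)
        scaler.update()
        opt.zero_grad()
sched.step()  # stepped once per epoch
```

Dividing the loss by `ACCUM` before `backward()` is what makes six small batches behave like one batch of 12.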
Augmentations were kept deliberately conservative — aggressive transforms distort boundaries and degrade the contour metric.
- Geometric: horizontal/vertical flips, mild rotation (±15°), small scale/shift
- Color: brightness/contrast, hue-saturation, gamma
- Noise: Gaussian noise/blur (light)
- Weather: rain, fog, shadow (20% probability — simulates real-world water scenes)
- Dropout: CoarseDropout for occlusion robustness
At inference, each image is predicted 4 times and the results averaged:
- Original
- Horizontal flip → flip back
- Vertical flip → flip back
- Both flips → flip back
Averaging reduces prediction variance at boundaries, which directly improves Contour F1.
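A flip-TTA sketch; `predict_fn` is a hypothetical stand-in for model inference on a single H×W image (for multi-channel inputs, the flips would act on the spatial axes only).

```python
import numpy as np

def predict_tta(predict_fn, image):
    """Average predictions over the 4 flip variants of an H×W image.
    Each prediction is un-flipped before averaging."""
    outs = []
    for flip_h in (False, True):
        for flip_v in (False, True):
            x = image
            if flip_h: x = x[:, ::-1]
            if flip_v: x = x[::-1, :]
            p = predict_fn(np.ascontiguousarray(x))
            if flip_v: p = p[::-1, :]
            if flip_h: p = p[:, ::-1]
            outs.append(p)
    return np.mean(outs, axis=0)
```

With an identity `predict_fn`, the four flip/un-flip round trips cancel exactly, which is the sanity check that the un-flipping is wired correctly.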
- Noise removal — connected components smaller than 100px are discarded as false positives
- Morphological closing — small holes in the mask are filled with a 2×2 elliptical kernel
- Edge refinement — in the boundary region (dilated − eroded mask), a slightly lower threshold (0.50 instead of 0.60) is applied to recover high-confidence edge pixels that the main threshold would discard
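The three steps can be sketched with `scipy.ndimage`; the kernel shape and exact operations are approximations of the description above (e.g. a square 2×2 kernel stands in for the elliptical one).

```python
import numpy as np
from scipy import ndimage

def postprocess(prob, thr=0.60, edge_thr=0.50, min_size=100):
    """Threshold a probability map, then apply the three cleanup steps."""
    mask = prob >= thr

    # 1. Noise removal: drop connected components smaller than min_size pixels
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    for i, s in enumerate(sizes, start=1):
        if s < min_size:
            mask[labels == i] = False

    # 2. Morphological closing to fill small holes
    mask = ndimage.binary_closing(mask, structure=np.ones((2, 2), dtype=bool))

    # 3. Edge refinement: relax the threshold inside the boundary band
    band = ndimage.binary_dilation(mask) & ~ndimage.binary_erosion(mask)
    mask |= band & (prob >= edge_thr)
    return mask
```

Step 3 only ever adds pixels immediately adjacent to the existing boundary, so it recovers thin edge detail without reintroducing the noise removed in step 1.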
├── notebook.ipynb # Full training + inference notebook
├── submission.csv # Final predictions (RLE encoded, 200 images)
├── results/
│ ├── training_curves.png # Loss & Dice/IoU over 60 epochs
│ ├── predictions_sample.png # Image | Ground truth | Prediction
│ └── threshold_analysis.png # Competition score vs threshold
├── data/
│ └── threshold_optimization.csv # Raw threshold search results
└── README.md
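`submission.csv` is RLE encoded; a standard Kaggle-style encoder looks like this (column-major pixel order and 1-indexed run starts are assumed conventions, not confirmed by this README):

```python
import numpy as np

def rle_encode(mask):
    """Run-length encode a binary mask as 'start length start length ...',
    with pixels read in column-major (Fortran) order, 1-indexed."""
    pixels = np.asarray(mask).flatten(order="F").astype(np.uint8)
    padded = np.concatenate([[0], pixels, [0]])          # sentinels for run detection
    runs = np.where(padded[1:] != padded[:-1])[0] + 1    # run boundaries, 1-indexed
    runs[1::2] -= runs[0::2]                             # convert end positions to lengths
    return " ".join(str(x) for x in runs)
```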
- Add the `find-the-water` dataset to your Kaggle notebook
- Run all cells in `notebook.ipynb`
- Outputs are saved to `/kaggle/working/`: `submission.csv`, `checkpoints/best_model.pth`, `training_history.csv`
Training takes approximately 1 hour on a Kaggle T4 GPU (60 epochs × ~63s/epoch).


