Azjob21/water_segmentation
💧 Water Segmentation — Hard Boundary Detection

Kaggle Competition: Find the Water
Task: Pixel-level segmentation of water regions with precise boundary detection
Training time: ~1 hour on Kaggle T4 GPU


🏆 Final Results

| Metric | Score |
| --- | --- |
| Val Dice | 0.9814 |
| Val IoU | 0.9641 |
| Composite Score | 0.9710 |
| Best Epoch | 43 / 60 |
| Optimal Threshold | 0.60 |

📈 Training Curves

![Training Curves](results/training_curves.png)

The model converged rapidly, hitting 0.94+ composite by epoch 5, then continued refining steadily through epoch 43 before plateauing. No overfitting was observed: val loss tracked train loss closely throughout all 60 epochs.

Key milestones:

| Epoch | Val Dice | Val IoU | Composite |
| --- | --- | --- | --- |
| 1 | 0.858 | 0.763 | 0.801 |
| 5 | 0.963 | 0.931 | 0.944 |
| 10 | 0.971 | 0.946 | 0.956 |
| 20 | 0.978 | 0.958 | 0.966 |
| 43 | 0.981 | 0.964 | 0.971 ← best |
| 60 | 0.978 | 0.959 | 0.967 |

🔍 Sample Predictions

![Sample Predictions](results/predictions_sample.png)

Each row shows: input image → ground truth mask → model prediction. The model cleanly delineates water boundaries even in complex scenes with reflections, partial occlusion, and varying lighting.


📉 Threshold Optimization

![Threshold Analysis](results/threshold_analysis.png)

Rather than using a fixed threshold of 0.5, the optimal binarization threshold was searched across the validation set using the actual competition metric. Higher thresholds produced sharper, more conservative boundaries — improving Boundary IoU and Contour F1 at the cost of slightly lower Region IoU.

| Threshold | Score | B-IoU | C-F1 | R-IoU |
| --- | --- | --- | --- | --- |
| 0.30 | 0.421 | 0.145 | 0.429 | 0.955 |
| 0.40 | 0.453 | 0.186 | 0.468 | 0.959 |
| 0.50 | 0.478 | 0.221 | 0.494 | 0.962 |
| 0.60 | 0.493 | 0.243 | 0.508 | 0.962 |

Optimal threshold: 0.60 — the scoring heavily weights boundary precision (80% of score), so a higher threshold that tightens the predicted boundary outperforms the default 0.5.
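The search itself is a simple sweep over candidate thresholds. A minimal sketch, where `score_fn` stands in for the competition metric (the exact metric implementation isn't shown in this README):

```python
import numpy as np

def find_best_threshold(probs, masks, score_fn,
                        thresholds=(0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65)):
    """Sweep binarization thresholds over the validation set and
    return the one that maximizes the mean metric."""
    scores = [np.mean([score_fn(p > t, m) for p, m in zip(probs, masks)])
              for t in thresholds]
    best = int(np.argmax(scores))
    return thresholds[best], scores[best]
```

Because the sweep uses the real scoring function rather than a proxy like pixel accuracy, the chosen threshold directly optimizes what the leaderboard measures.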


🧠 Approach

Why this problem is hard

Standard segmentation metrics (IoU, Dice) reward getting the bulk of the region right and are forgiving of sloppy edges. This competition flips that — 80% of the score comes from boundary precision:

| Component | Weight | What it measures |
| --- | --- | --- |
| Boundary IoU | 40% | Overlap of dilated boundary pixels |
| Contour F1 | 40% | Point-to-point contour matching within 3px |
| Region IoU | 20% | Standard mask overlap |

A model that perfectly segments the region but has rough edges will score poorly. Every design decision here was made with boundary sharpness in mind.
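In code, the composite is just a linear blend with the weights from the table above:

```python
def composite_score(boundary_iou, contour_f1, region_iou):
    """Competition composite: the two boundary terms carry 80% of the weight."""
    return 0.4 * boundary_iou + 0.4 * contour_f1 + 0.2 * region_iou
```

As a sanity check, plugging in the components from the threshold table at 0.60 (B-IoU 0.243, C-F1 0.508, R-IoU 0.962) reproduces its reported score of 0.493.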


Model: U-Net + EfficientNet-B3

Input (512×640) → EfficientNet-B3 Encoder → U-Net Decoder → Binary Mask

U-Net was chosen for its skip connections, which pass full-resolution spatial features directly from encoder to decoder — preserving the fine edge detail that deep encoders compress away. This is essential when boundary precision accounts for 80% of the score.

EfficientNet-B3 as the encoder provides a strong pretrained feature extractor with a good balance of accuracy and compute efficiency.

Input resolution 512×640 (upscaled from original) gives the model more pixels to work with at boundaries, directly improving contour precision.


Loss Function: EnhancedBoundaryLoss

Standard BCE and Dice losses care only about pixel-level region overlap; they give no special weight to boundaries. The custom loss combines four terms:

| Component | Weight | Role |
| --- | --- | --- |
| BCE Loss | 20% | Baseline per-pixel supervision |
| Dice Loss | 30% | Region-level overlap |
| Gradient Loss | 30% | Penalizes differences in spatial gradients — directly targets boundary sharpness |
| Sobel Edge Loss | 20% | MSE between predicted and ground truth edge maps via Sobel filter |

The gradient and Sobel terms together account for 50% of the total loss, meaning the model is explicitly trained to match boundaries rather than just regions.
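A sketch of such a four-term loss (the repo's exact implementation may differ in detail; the weights follow the table above):

```python
import torch
import torch.nn.functional as F

def dice_loss(prob, target, eps=1e-6):
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def gradient_loss(prob, target):
    # L1 difference of finite-difference spatial gradients: boundary sharpness
    dy = (prob[..., 1:, :] - prob[..., :-1, :]) - (target[..., 1:, :] - target[..., :-1, :])
    dx = (prob[..., :, 1:] - prob[..., :, :-1]) - (target[..., :, 1:] - target[..., :, :-1])
    return dy.abs().mean() + dx.abs().mean()

def sobel_edges(x):
    # Sobel gradient magnitude of a (N, 1, H, W) map
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    gx = F.conv2d(x, kx, padding=1)
    gy = F.conv2d(x, kx.transpose(2, 3), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def enhanced_boundary_loss(logits, target):
    prob = torch.sigmoid(logits)
    return (0.2 * F.binary_cross_entropy_with_logits(logits, target)
            + 0.3 * dice_loss(prob, target)
            + 0.3 * gradient_loss(prob, target)
            + 0.2 * F.mse_loss(sobel_edges(prob), sobel_edges(target)))
```

The gradient and Sobel terms are both zero only when predicted edges coincide with ground-truth edges, so sloppy boundaries are penalized even when region overlap is already high.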


Training Setup

| Parameter | Value | Reason |
| --- | --- | --- |
| Optimizer | AdamW | Weight decay regularization helps generalization |
| Learning rate | 1e-4 | Conservative LR for stable boundary learning |
| Scheduler | CosineAnnealingWarmRestarts | Periodic LR resets help escape local minima |
| Batch size | 2 (effective 12) | Gradient accumulation × 6 steps |
| Mixed precision | AMP | Faster training, lower memory |
| Early stopping | Patience 20 | Training ran to epoch 60 (patience counter at 15/20) |
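Under those settings the update loop looks roughly like this. A minimal CPU-safe sketch with a stand-in model and dummy batches; AMP is only enabled when CUDA is present:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the real U-Net
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)

use_cuda = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
accum = 6  # batch 2 × 6 accumulation steps → effective batch 12

batches = [(torch.randn(2, 3, 32, 32), torch.rand(2, 1, 32, 32)) for _ in range(6)]
for step, (x, y) in enumerate(batches):
    with torch.autocast("cuda" if use_cuda else "cpu", enabled=use_cuda):
        loss = F.binary_cross_entropy_with_logits(model(x), y) / accum
    scaler.scale(loss).backward()
    if (step + 1) % accum == 0:   # optimizer steps once per effective batch
        scaler.step(opt)
        scaler.update()
        opt.zero_grad()
sched.step()  # advance the cosine schedule once per epoch
```

Dividing the loss by `accum` keeps gradient magnitudes equivalent to a true batch of 12, so the same learning rate applies.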

Augmentation Strategy

Augmentations were kept deliberately conservative — aggressive transforms distort boundaries and degrade the contour metric.

  • Geometric: horizontal/vertical flips, mild rotation (±15°), small scale/shift
  • Color: brightness/contrast, hue-saturation, gamma
  • Noise: Gaussian noise/blur (light)
  • Weather: rain, fog, shadow (20% probability — simulates real-world water scenes)
  • Dropout: CoarseDropout for occlusion robustness

Inference: Test-Time Augmentation (TTA)

At inference, each image is predicted 4 times and the results averaged:

  1. Original
  2. Horizontal flip → flip back
  3. Vertical flip → flip back
  4. Both flips → flip back

Averaging reduces prediction variance at boundaries, which directly improves Contour F1.
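Because each flip is its own inverse, the scheme reduces to four predict-then-unflip passes. A sketch, where `predict` is any function mapping an H×W image to an H×W probability map:

```python
import numpy as np

def predict_tta(predict, image):
    """Average predictions over the 4 flip variants, un-flipping each output."""
    flips = [
        lambda a: a,             # original
        lambda a: a[:, ::-1],    # horizontal flip
        lambda a: a[::-1, :],    # vertical flip
        lambda a: a[::-1, ::-1], # both flips
    ]
    # each flip is self-inverse, so the same function maps the output back
    preds = [f(predict(f(image))) for f in flips]
    return np.mean(preds, axis=0)
```

A quick sanity check: with an identity predictor the flips cancel exactly and the output equals the input.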


Post-Processing

  1. Noise removal — connected components smaller than 100px are discarded as false positives
  2. Morphological closing — small holes in the mask are filled with a 2×2 elliptical kernel
  3. Edge refinement — in the boundary region (dilated − eroded mask), a slightly lower threshold (0.50 instead of 0.60) is applied to recover high-confidence edge pixels that the main threshold would discard
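The three steps above can be sketched with `scipy.ndimage` (the repo may use OpenCV instead; the logic is the same):

```python
import numpy as np
from scipy import ndimage

def postprocess(prob, thr=0.60, edge_thr=0.50, min_size=100):
    mask = prob > thr
    # 1. noise removal: drop connected components smaller than min_size pixels
    labels, n = ndimage.label(mask)
    for i in range(1, n + 1):
        component = labels == i
        if component.sum() < min_size:
            mask[component] = False
    # 2. morphological closing fills small holes in the mask
    mask = ndimage.binary_closing(mask, structure=np.ones((2, 2), bool))
    # 3. edge refinement: apply the lower threshold only in the boundary band
    band = ndimage.binary_dilation(mask) & ~ndimage.binary_erosion(mask)
    return mask | (band & (prob > edge_thr))
```

Restricting the lower threshold to the dilated-minus-eroded band recovers confident edge pixels without reintroducing noise in the mask interior.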

📁 Repository Structure

├── notebook.ipynb                    # Full training + inference notebook
├── submission.csv                    # Final predictions (RLE encoded, 200 images)
├── results/
│   ├── training_curves.png           # Loss & Dice/IoU over 60 epochs
│   ├── predictions_sample.png        # Image | Ground truth | Prediction
│   └── threshold_analysis.png        # Competition score vs threshold
├── data/
│   └── threshold_optimization.csv    # Raw threshold search results
└── README.md

🔁 Reproducing

  1. Add the find-the-water dataset to your Kaggle notebook
  2. Run all cells in notebook.ipynb
  3. Outputs saved to /kaggle/working/: submission.csv, checkpoints/best_model.pth, training_history.csv

Training takes approximately 1 hour on a Kaggle T4 GPU (60 epochs × ~63s/epoch).
