From c7e12711936ce055428ec97e02faca1d8465fb59 Mon Sep 17 00:00:00 2001
From: Martin Laasmaa
Date: Sat, 13 Dec 2025 09:42:00 +0200
Subject: [PATCH] Adding file

---
 docs/TRAINING_16BIT_TIFF.md | 269 ++++++++++++++++++++++++++++++++++++
 1 file changed, 269 insertions(+)
 create mode 100644 docs/TRAINING_16BIT_TIFF.md

diff --git a/docs/TRAINING_16BIT_TIFF.md b/docs/TRAINING_16BIT_TIFF.md
new file mode 100644
index 0000000..36f5936
--- /dev/null
+++ b/docs/TRAINING_16BIT_TIFF.md
@@ -0,0 +1,269 @@
+# Training YOLO with 16-bit TIFF Datasets
+
+## Quick Start
+
+If your dataset contains 16-bit grayscale TIFF files, the training tab will automatically:
+
+1. Detect 16-bit TIFF images in your dataset
+2. Convert them to float32 [0-1] RGB **on-the-fly** during training
+3. Train without any disk caching (no extra disk space)
+
+**No manual intervention or extra disk space needed!**
+
+## Why Float32 On-The-Fly Conversion?
+
+### The Problem
+
+YOLO's training expects:
+- 3-channel images (RGB)
+- Images loaded from disk by the dataloader
+
+16-bit grayscale TIFFs:
+- Have 1 channel (grayscale)
+- Must be expanded to 3-channel RGB before YOLO can use them
+
+### The Solution
+
+**NEW APPROACH (Current)**: On-the-fly float32 conversion
+- Load 16-bit TIFF with `tifffile` (not PIL/cv2)
+- Convert uint16 [0-65535] → float32 [0-1] in memory
+- Replicate grayscale to 3 channels
+- Pass directly to YOLO training pipeline
+- **No disk caching required!**
+
+**OLD APPROACH (Deprecated)**: Disk caching
+- Created 16-bit RGB PNG cache files on disk
+- Required ~2x dataset size in disk space
+- Slower first training run
+
+## How It Works
+
+### Custom Dataset Loader
+
+The system uses a custom `Float32Dataset` class that extends Ultralytics' `YOLODataset`:
+
+```python
+from src.utils.train_ultralytics_float import Float32Dataset
+
+# This dataset loader:
+# 1. Intercepts image loading
+# 2. Detects 16-bit TIFFs
+# 3. Converts to float32 [0-1] RGB on-the-fly
+# 4. Passes to training pipeline
+```
+
+### Conversion Process
+
+For each 16-bit grayscale TIFF during training:
+
+```
+1. Load with tifffile   → uint16 [0, 65535]
+2. Convert to float32   → img.astype(np.float32) / 65535.0
+3. Replicate to RGB     → np.stack([img] * 3, axis=-1)
+4. Result: float32 [0, 1] RGB array, shape (H, W, 3)
+```
+
+### Memory vs Disk
+
+| Aspect | On-the-fly (NEW) | Disk Cache (OLD) |
+|--------|------------------|------------------|
+| Disk Space | Dataset size only | ~2× dataset size |
+| First Training | Fast | Slow (creates cache) |
+| Subsequent Training | Fast | Fast |
+| Data Loss | None | None |
+| Setup Required | None | Cache creation |
+
+## Data Preservation
+
+### Float32 Precision
+
+- 16-bit TIFF: 65,536 levels (0-65535)
+- Float32: 24-bit significand (~7 decimal digits of precision)
+
+**Conversion accuracy:**
+```
+Original: 32768 (uint16, middle intensity)
+Float32:  32768 / 65535 ≈ 0.50000763 (nearest float32; maps back to 32768)
+```
+
+Every uint16 value maps to its own float32 value, so full 16-bit precision is preserved.
+
+### Comparison to uint8
+
+| Approach | Precision Loss | Recommended |
+|----------|----------------|-------------|
+| **float32 [0-1]** | None | ✓ YES |
+| uint16 RGB | None | ✓ YES (but disk-heavy) |
+| uint8 | ~99.6% of intensity levels lost | ✗ NO |
+
+**Why not uint8:**
+```
+Original values:    32768, 32769, 32770 (distinct)
+Converted to uint8: 128, 128, 128 (collapsed!)
+```
+
+About 256 consecutive 16-bit values collapse into each uint8 value.
+
+## Training Tab Behavior
+
+When you click "Start Training" with a 16-bit TIFF dataset:
+
+```
+[01:23:45] Exported 150 annotations across 50 image(s).
+[01:23:45] Using Float32 on-the-fly loader for 16-bit TIFF support (no disk caching)
+[01:23:45] Starting training run 'my_model_v1' using yolov8s-seg.pt
+[01:23:46] Using Float32Dataset loader for 16-bit TIFF support
+```
+
+Every training run uses the same on-the-fly loader, so no run waits for a cache to be built.
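+
+## Worked Precision Check
+
+The claim under Data Preservation that float32 keeps every 16-bit level can be checked with a short bound. This is a sketch that assumes IEEE 754 single precision with round-to-nearest; the symbols $v$, $x$, and $\hat{x}$ are introduced here for illustration and do not appear in the code.
+
+$$
+\begin{aligned}
+x &= \frac{v}{65535}, \qquad v \in \{0, 1, \dots, 65535\} \\
+|\hat{x} - x| &\le 2^{-25} \quad \text{(float32 rounding error for } x \in [0, 1]\text{)} \\
+|65535\,\hat{x} - v| &\le 65535 \cdot 2^{-25} \approx 0.002 < 0.5
+\end{aligned}
+$$
+
+Here $\hat{x}$ is the float32 value stored for $x$. The error after scaling back is far below half an intensity step, even allowing for a second float32 rounding in the multiplication, so rounding $65535\,\hat{x}$ to the nearest integer returns the original $v$. By contrast, uint8 offers only 256 levels, so about $65536 / 256 = 256$ neighbouring 16-bit values share each 8-bit value.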
+
+## Inference vs Training
+
+| Operation | Input | Processing | Output to YOLO |
+|-----------|-------|------------|----------------|
+| **Inference** | 16-bit TIFF file | Load → float32 [0-1] → 3ch | numpy array (float32) |
+| **Training** | 16-bit TIFF dataset | Load on-the-fly → float32 [0-1] → 3ch | numpy array (float32) |
+
+Both preserve full 16-bit precision using float32 representation.
+
+## Technical Details
+
+### Custom Dataset Class
+
+Located in `src/utils/train_ultralytics_float.py`:
+
+```python
+class Float32Dataset(YOLODataset):
+    """
+    Extends Ultralytics YOLODataset to handle 16-bit TIFFs.
+
+    Key methods:
+    - load_image(): Intercepts image loading
+      - Detects .tif/.tiff with dtype == uint16
+      - Converts: uint16 → float32 [0-1] → RGB (3-channel)
+    """
+```
+
+### Integration with YOLO
+
+The `YOLOWrapper.train()` method automatically uses the custom loader:
+
+```python
+# In src/model/yolo_wrapper.py
+def train(self, data_yaml, use_float32_loader=True, **kwargs):
+    if use_float32_loader:
+        # Use custom Float32Dataset
+        return train_with_float32_loader(...)
+    else:
+        # Standard YOLO training
+        return self.model.train(...)
+```
+
+### No PIL or cv2 for 16-bit
+
+16-bit TIFF loading uses `tifffile` directly:
+- PIL: opens 16-bit grayscale (mode `I;16`) but has only limited support for that mode
+- cv2: `cv2.imread` converts to 8-bit unless `cv2.IMREAD_UNCHANGED` is passed
+- tifffile: Native 16-bit support, numpy output
+
+## Advantages Over Disk Caching
+
+### 1. No Extra Disk Space Required
+```
+Dataset:      1000 images × 12 MB = 12 GB
+Old cache:    Additional 24 GB (16-bit RGB PNGs)
+New approach: 0 GB additional (on-the-fly)
+```
+
+### 2. Faster Setup
+```
+Old: First training requires cache creation (minutes)
+New: Start training immediately (seconds)
+```
+
+### 3. Always In Sync
+```
+Old: Cache could become stale if images change
+New: Always loads current version from disk
+```
+
+### 4. Simpler Workflow
+```
+Old: Manage cache directory, cleanup, etc.
+New: Just point to dataset and train
+```
+
+## Troubleshooting
+
+### Error: "expected input to have 3 channels, but got 1"
+
+This shouldn't happen with the new Float32Dataset, but if it does:
+
+1. Check that `use_float32_loader=True` is set in the training call
+2. Verify that `Float32Dataset` is being used (check the logs)
+3. Ensure `tifffile` is installed: `pip install tifffile`
+
+### Memory Usage
+
+On-the-fly conversion uses memory during training:
+- Raw image: ~8 MB (2048×2048 uint16, 2 bytes per pixel)
+- Converted float32 RGB: ~48 MB (3 channels × 4 bytes per pixel, temporary)
+- Released after the augmentation pipeline
+
+**Mitigation:**
+- Reduce batch size if OOM errors occur
+- Images are processed one at a time during loading
+- Only the active batch is kept in memory
+
+### Slow Training
+
+If training seems slow:
+- Check disk I/O (slow disk can bottleneck loading)
+- Check whether images are re-converted every epoch; with no disk cache this is expected unless the loader caches images in RAM
+- Monitor CPU usage during loading
+
+## Migration from Old Approach
+
+If you have existing cached datasets:
+
+```bash
+# Old cache location (safe to delete)
+rm -rf data/datasets/_float32_cache/
+
+# The new approach doesn't use this directory
+```
+
+Your original dataset structure remains unchanged:
+```
+data/my_dataset/
+├── train/
+│   ├── images/  (original 16-bit TIFFs)
+│   └── labels/
+├── val/
+│   ├── images/
+│   └── labels/
+└── data.yaml
+```
+
+Just point to the same `data.yaml` and train!
+
+## Performance Comparison
+
+| Metric | Old (Disk Cache) | New (On-the-fly) |
+|--------|------------------|------------------|
+| First training setup | 5-10 min | 0 sec |
+| Disk space overhead | 100% | 0% |
+| Training speed | Fast | Fast |
+| Subsequent runs | Fast | Fast |
+| Data accuracy | 16-bit preserved | 16-bit preserved |
+
+## Summary
+
+- ✓ **On-the-fly conversion**: Load and convert during training
+- ✓ **No disk caching**: Zero additional disk space
+- ✓ **Full precision**: Float32 preserves 16-bit dynamic range
+- ✓ **No PIL/cv2**: Direct tifffile loading
+- ✓ **Automatic**: Works transparently with training tab
+- ✓ **Fast**: Efficient memory-based conversion
+
+The new approach is simpler, faster to set up, and requires no disk space overhead!
\ No newline at end of file