# Training YOLO with 16-bit TIFF Datasets
## Quick Start
If your dataset contains 16-bit grayscale TIFF files, the training tab will automatically:
1. Detect 16-bit TIFF images in your dataset
2. Convert them to float32 [0-1] RGB **on-the-fly** during training
3. Train without any disk caching (memory-efficient)
**No manual intervention or disk space needed!**
## Why Float32 On-The-Fly Conversion?
### The Problem
YOLO's training expects:
- 3-channel images (RGB)
- Images loaded from disk by the dataloader
16-bit grayscale TIFFs are:
- 1-channel (grayscale)
- Not RGB, so they must be converted to 3-channel format before training
### The Solution
**NEW APPROACH (Current)**: On-the-fly float32 conversion
- Load 16-bit TIFF with `tifffile` (not PIL/cv2)
- Convert uint16 [0-65535] → float32 [0-1] in memory
- Replicate grayscale to 3 channels
- Pass directly to YOLO training pipeline
- **No disk caching required!**
**OLD APPROACH (Deprecated)**: Disk caching
- Created 16-bit RGB PNG cache files on disk
- Required ~2x dataset size in disk space
- Slower first training run
## How It Works
### Custom Dataset Loader
The system uses a custom `Float32Dataset` class that extends Ultralytics' `YOLODataset`:
```python
from src.utils.train_ultralytics_float import Float32Dataset
# This dataset loader:
# 1. Intercepts image loading
# 2. Detects 16-bit TIFFs
# 3. Converts to float32 [0-1] RGB on-the-fly
# 4. Passes to training pipeline
```
### Conversion Process
For each 16-bit grayscale TIFF during training:
```
1. Load with tifffile → uint16 [0, 65535]
2. Convert to float32 → img.astype(float32) / 65535.0
3. Replicate to RGB → np.stack([img] * 3, axis=-1)
4. Result: float32 [0, 1] RGB array, shape (H, W, 3)
```
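A minimal sketch of these four steps, for illustration only (the project's actual loader lives in `src/utils/train_ultralytics_float.py`):
```python
import numpy as np
import tifffile

def tiff16_to_float_rgb(path: str) -> np.ndarray:
    """Load a 16-bit grayscale TIFF and return a float32 [0, 1] RGB array."""
    img = tifffile.imread(path)                # 1. uint16, shape (H, W)
    img = img.astype(np.float32) / 65535.0     # 2. float32 in [0, 1]
    rgb = np.stack([img] * 3, axis=-1)         # 3. replicate to 3 channels
    return rgb                                 # 4. float32 [0, 1] RGB, shape (H, W, 3)
```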
### Memory vs Disk
| Aspect | On-the-fly (NEW) | Disk Cache (OLD) |
|--------|------------------|------------------|
| Disk Space | Dataset size only | ~2× dataset size |
| First Training | Fast | Slow (creates cache) |
| Subsequent Training | Fast | Fast |
| Data Loss | None | None |
| Setup Required | None | Cache creation |
## Data Preservation
### Float32 Precision
- 16-bit TIFF: 65,536 intensity levels (0-65535)
- Float32: ~7 significant decimal digits (24-bit mantissa), enough to keep every 16-bit level distinct
**Conversion accuracy:**
```python
Original: 32768 (uint16, middle intensity)
Float32: 32768 / 65535 ≈ 0.50000763 (still a distinct value)
```
Full 16-bit precision is preserved in float32 representation.
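A quick way to check this claim (illustrative snippet, not project code):
```python
import numpy as np

# Every possible uint16 level, scaled the same way as during training.
levels = np.arange(65536, dtype=np.uint16).astype(np.float32) / np.float32(65535)

# float32 has a 24-bit mantissa, so no two of the 65,536 levels collapse together.
assert np.unique(levels).size == 65536
```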
### Comparison to uint8
| Approach | Precision Loss | Recommended |
|----------|----------------|-------------|
| **float32 [0-1]** | None | ✓ YES |
| uint16 RGB | None | ✓ YES (but disk-heavy) |
| uint8 | 99.6% data loss | ✗ NO |
**Why NO uint8:**
```
Original values: 32768, 32769, 32770 (distinct)
Converted to uint8: 128, 128, 128 (collapsed!)
```
Multiple 16-bit values collapse to the same uint8 value.
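The collapse is easy to reproduce with a few lines of NumPy (illustrative only):
```python
import numpy as np

values = np.array([32768, 32769, 32770], dtype=np.uint16)

as_uint8   = (values.astype(np.float32) / 65535.0 * 255.0).round().astype(np.uint8)
as_float32 = values.astype(np.float32) / 65535.0

print(as_uint8)     # [128 128 128]  -> three distinct inputs become one value
print(as_float32)   # three distinct values near 0.5 -> nothing lost
```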
## Training Tab Behavior
When you click "Start Training" with a 16-bit TIFF dataset:
```
[01:23:45] Exported 150 annotations across 50 image(s).
[01:23:45] Using Float32 on-the-fly loader for 16-bit TIFF support (no disk caching)
[01:23:45] Starting training run 'my_model_v1' using yolov8s-seg.pt
[01:23:46] Using Float32Dataset loader for 16-bit TIFF support
```
Every training run uses the same approach - fast and efficient!
## Inference vs Training
| Operation | Input | Processing | Output to YOLO |
|-----------|-------|------------|----------------|
| **Inference** | 16-bit TIFF file | Load → float32 [0-1] → 3ch | numpy array (float32) |
| **Training** | 16-bit TIFF dataset | Load on-the-fly → float32 [0-1] → 3ch | numpy array (float32) |
Both preserve full 16-bit precision using float32 representation.
## Technical Details
### Custom Dataset Class
Located in `src/utils/train_ultralytics_float.py`:
```python
class Float32Dataset(YOLODataset):
    """
    Extends Ultralytics YOLODataset to handle 16-bit TIFFs.

    Key methods:
    - load_image(): Intercepts image loading
      - Detects .tif/.tiff with dtype == uint16
      - Converts: uint16 → float32 [0-1] → RGB (3-channel)
    """
```
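For illustration only, an override along these lines could perform the interception. The sketch assumes a recent Ultralytics release where `BaseDataset.load_image(i)` returns `(image, original_hw, resized_hw)`, and it omits the resize-to-`imgsz` step that the stock loader performs; the real implementation is the `Float32Dataset` in `src/utils/train_ultralytics_float.py`:
```python
import numpy as np
import tifffile
from ultralytics.data.dataset import YOLODataset

class Float32DatasetSketch(YOLODataset):
    """Illustrative sketch only -- not the project's actual Float32Dataset."""

    def load_image(self, i, rect_mode=True):
        path = self.im_files[i]
        if path.lower().endswith((".tif", ".tiff")):
            img = tifffile.imread(path)
            if img.dtype == np.uint16:
                img = img.astype(np.float32) / 65535.0      # uint16 -> float32 [0, 1]
                if img.ndim == 2:
                    img = np.stack([img] * 3, axis=-1)      # grayscale -> 3 channels
                h0, w0 = img.shape[:2]
                # NOTE: the stock loader also resizes to imgsz here.
                return img, (h0, w0), img.shape[:2]
        # Anything else goes through the standard Ultralytics path.
        return super().load_image(i, rect_mode)
```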
### Integration with YOLO
The `YOLOWrapper.train()` method automatically uses the custom loader:
```python
# In src/model/yolo_wrapper.py
def train(self, data_yaml, use_float32_loader=True, **kwargs):
    if use_float32_loader:
        # Use custom Float32Dataset
        return train_with_float32_loader(...)
    else:
        # Standard YOLO training
        return self.model.train(...)
```
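A possible call site might look like this; the wrapper's constructor argument and the forwarded keyword arguments are assumptions for illustration, not the project's documented API:
```python
# Hypothetical usage -- constructor signature assumed, extra kwargs forwarded to Ultralytics.
from src.model.yolo_wrapper import YOLOWrapper

wrapper = YOLOWrapper("yolov8s-seg.pt")
wrapper.train(
    data_yaml="data/my_dataset/data.yaml",
    use_float32_loader=True,   # default; routes through Float32Dataset
    epochs=100,
    imgsz=1024,
)
```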
### No PIL or cv2 for 16-bit
16-bit TIFF loading uses `tifffile` directly:
- PIL: Can load 16-bit but converts during processing
- cv2: Limited 16-bit TIFF support
- tifffile: Native 16-bit support, numpy output
## Advantages Over Disk Caching
### 1. No Disk Space Required
```
Dataset: 1000 images × 12 MB = 12 GB
Old cache: Additional 24 GB (16-bit RGB PNGs)
New approach: 0 GB additional (on-the-fly)
```
### 2. Faster Setup
```
Old: First training requires cache creation (minutes)
New: Start training immediately (seconds)
```
### 3. Always In Sync
```
Old: Cache could become stale if images change
New: Always loads current version from disk
```
### 4. Simpler Workflow
```
Old: Manage cache directory, cleanup, etc.
New: Just point to dataset and train
```
## Troubleshooting
### Error: "expected input to have 3 channels, but got 1"
This shouldn't happen with the new Float32Dataset, but if it does:
1. Check that `use_float32_loader=True` in training call
2. Verify `Float32Dataset` is being used (check logs)
3. Ensure `tifffile` is installed: `pip install tifffile`
### Memory Usage
On-the-fly conversion uses memory during training:
- Image loaded: ~8 MB (2048×2048 single-channel uint16)
- Converted float32 RGB: ~48 MB (temporary)
- Released after augmentation pipeline
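These figures follow from simple size arithmetic:
```python
# Per-image memory for a 2048×2048 frame, in MiB.
h = w = 2048
uint16_gray = h * w * 2 / 2**20       # ≈ 8 MiB  (single channel, 2 bytes/pixel)
float32_rgb = h * w * 3 * 4 / 2**20   # ≈ 48 MiB (3 channels, 4 bytes/pixel)
print(uint16_gray, float32_rgb)       # 8.0 48.0
```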
**Mitigation:**
- Reduce batch size if OOM errors occur
- Images are processed one at a time during loading
- Only active batch kept in memory
### Slow Training
If training seems slow:
- Check disk I/O (slow disk can bottleneck loading)
- Verify images aren't being re-converted each epoch (should cache after first load)
- Monitor CPU usage during loading
## Migration from Old Approach
If you have existing cached datasets:
```bash
# Old cache location (safe to delete)
rm -rf data/datasets/_float32_cache/
# The new approach doesn't use this directory
```
Your original dataset structure remains unchanged:
```
data/my_dataset/
├── train/
│   ├── images/   (original 16-bit TIFFs)
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── data.yaml
```
Just point to the same `data.yaml` and train!
## Performance Comparison
| Metric | Old (Disk Cache) | New (On-the-fly) |
|--------|------------------|------------------|
| First training setup | 5-10 min | 0 sec |
| Disk space overhead | 100% | 0% |
| Training speed | Fast | Fast |
| Subsequent runs | Fast | Fast |
| Data accuracy | 16-bit preserved | 16-bit preserved |
## Summary
- **On-the-fly conversion**: Load and convert during training
- **No disk caching**: Zero additional disk space
- **Full precision**: Float32 preserves 16-bit dynamic range
- **No PIL/cv2**: Direct tifffile loading
- **Automatic**: Works transparently with training tab
- **Fast**: Efficient memory-based conversion
The new approach is simpler, faster to set up, and requires no disk space overhead!