# Training YOLO with 16-bit TIFF Datasets

## Quick Start

If your dataset contains 16-bit grayscale TIFF files, the training tab will automatically:

1. Detect 16-bit TIFF images in your dataset
2. Convert them to float32 [0-1] RGB **on-the-fly** during training
3. Train without any disk caching (memory-efficient)

**No manual intervention or disk space needed!**

## Why Float32 On-The-Fly Conversion?

### The Problem

YOLO's training pipeline expects:

- 3-channel images (RGB)
- Images loaded from disk by the dataloader

16-bit grayscale TIFFs, by contrast:

- Have only 1 channel (grayscale)
- Need to be converted to RGB format before training

### The Solution

**NEW APPROACH (Current)**: On-the-fly float32 conversion

- Load the 16-bit TIFF with `tifffile` (not PIL/cv2)
- Convert uint16 [0-65535] → float32 [0-1] in memory
- Replicate the grayscale channel to 3 channels
- Pass the result directly to the YOLO training pipeline
- **No disk caching required!**

**OLD APPROACH (Deprecated)**: Disk caching

- Created 16-bit RGB PNG cache files on disk
- Required ~2× the dataset size in extra disk space
- Slower first training run

## How It Works

### Custom Dataset Loader

The system uses a custom `Float32Dataset` class that extends Ultralytics' `YOLODataset`:

```python
from src.utils.train_ultralytics_float import Float32Dataset

# This dataset loader:
# 1. Intercepts image loading
# 2. Detects 16-bit TIFFs
# 3. Converts to float32 [0-1] RGB on-the-fly
# 4. Passes the result to the training pipeline
```

### Conversion Process

For each 16-bit grayscale TIFF encountered during training:

```
1. Load with tifffile  → uint16 [0, 65535]
2. Convert to float32  → img.astype(float32) / 65535.0
3. Replicate to RGB    → np.stack([img] * 3, axis=-1)
4. Result: float32 [0, 1] RGB array, shape (H, W, 3)
```
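
The same steps can be written as a small standalone helper (a minimal sketch; the function name is illustrative and not part of the codebase, and the array would come from `tifffile.imread()` in practice):

```python
import numpy as np

def uint16_gray_to_float32_rgb(img: np.ndarray) -> np.ndarray:
    """Convert a uint16 grayscale image to a float32 [0, 1] RGB array."""
    assert img.dtype == np.uint16 and img.ndim == 2
    scaled = img.astype(np.float32) / 65535.0   # uint16 [0, 65535] -> float32 [0, 1]
    return np.stack([scaled] * 3, axis=-1)      # replicate to 3 channels -> (H, W, 3)

# In the real loader the array comes from tifffile, e.g.:
#   img = tifffile.imread(path)   # dtype=uint16, shape (H, W)
```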

### Memory vs Disk

| Aspect | On-the-fly (NEW) | Disk Cache (OLD) |
|--------|------------------|------------------|
| Disk Space | Dataset size only | ~2× dataset size |
| First Training | Fast | Slow (creates cache) |
| Subsequent Training | Fast | Fast |
| Data Loss | None | None |
| Setup Required | None | Cache creation |

## Data Preservation

### Float32 Precision

- 16-bit TIFF: 65,536 intensity levels (0-65535)
- Float32: ~7 significant decimal digits (24-bit mantissa)

**Conversion accuracy:**

```python
Original: 32768 (uint16, middle intensity)
Float32:  32768 / 65535 = 0.50000763 (distinct, exactly recoverable)
```

Full 16-bit precision is preserved in the float32 representation: the 24-bit mantissa can represent all 65,536 levels distinctly.
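
A quick check in plain Python confirms the round-trip is lossless (Python floats are double precision; float32's 24-bit mantissa likewise covers all 65,536 levels):

```python
# Scale three adjacent 16-bit intensities to [0, 1] and back.
values = [32768, 32769, 32770]
as_float = [v / 65535.0 for v in values]
recovered = [round(f * 65535.0) for f in as_float]

assert len(set(as_float)) == 3      # still distinct after scaling
assert recovered == values          # exact round-trip, no precision loss
```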

### Comparison to uint8

| Approach | Precision Loss | Recommended |
|----------|----------------|-------------|
| **float32 [0-1]** | None | ✓ YES |
| uint16 RGB | None | ✓ YES (but disk-heavy) |
| uint8 | 99.6% data loss | ✗ NO |

**Why NO uint8:**

```
Original values:    32768, 32769, 32770 (distinct)
Converted to uint8: 128, 128, 128 (collapsed!)
```

Multiple 16-bit values collapse to the same uint8 value: 256 distinct 16-bit intensities map onto each 8-bit level.
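
The collapse is easy to demonstrate, taking uint8 conversion as the usual divide-by-256 (right shift by 8):

```python
values_16bit = [32768, 32769, 32770]       # distinct 16-bit intensities
as_uint8 = [v >> 8 for v in values_16bit]  # 16-bit -> 8-bit (divide by 256)

assert as_uint8 == [128, 128, 128]         # all three collapse to 128
```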

## Training Tab Behavior

When you click "Start Training" with a 16-bit TIFF dataset:

```
[01:23:45] Exported 150 annotations across 50 image(s).
[01:23:45] Using Float32 on-the-fly loader for 16-bit TIFF support (no disk caching)
[01:23:45] Starting training run 'my_model_v1' using yolov8s-seg.pt
[01:23:46] Using Float32Dataset loader for 16-bit TIFF support
```

Every training run uses the same approach - fast and efficient!

## Inference vs Training

| Operation | Input | Processing | Output to YOLO |
|-----------|-------|------------|----------------|
| **Inference** | 16-bit TIFF file | Load → float32 [0-1] → 3ch | numpy array (float32) |
| **Training** | 16-bit TIFF dataset | Load on-the-fly → float32 [0-1] → 3ch | numpy array (float32) |

Both preserve full 16-bit precision using the float32 representation.

## Technical Details

### Custom Dataset Class

Located in `src/utils/train_ultralytics_float.py`:

```python
class Float32Dataset(YOLODataset):
    """
    Extends Ultralytics YOLODataset to handle 16-bit TIFFs.

    Key behavior of load_image():
    - Intercepts image loading
    - Detects .tif/.tiff files with dtype == uint16
    - Converts: uint16 → float32 [0-1] → RGB (3-channel)
    """
```

### Integration with YOLO

The `YOLOWrapper.train()` method automatically uses the custom loader:

```python
# In src/model/yolo_wrapper.py
def train(self, data_yaml, use_float32_loader=True, **kwargs):
    if use_float32_loader:
        # Use the custom Float32Dataset
        return train_with_float32_loader(...)
    else:
        # Standard YOLO training
        return self.model.train(...)
```

### No PIL or cv2 for 16-bit

16-bit TIFF loading uses `tifffile` directly:

- PIL: Can load 16-bit images but may convert them during processing
- cv2: Limited 16-bit TIFF support
- tifffile: Native 16-bit support with numpy output
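
As a quick sanity check, `tifffile` round-trips uint16 data without altering it; a minimal sketch with a synthetic image (filename and pixel values are arbitrary):

```python
import os
import tempfile

import numpy as np
import tifffile

# Round-trip a synthetic 16-bit image to show that tifffile
# preserves the uint16 dtype and pixel values exactly.
img = np.arange(16, dtype=np.uint16).reshape(4, 4) * 4096

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.tif")
    tifffile.imwrite(path, img)
    loaded = tifffile.imread(path)

assert loaded.dtype == np.uint16
assert np.array_equal(loaded, img)
```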

## Advantages Over Disk Caching

### 1. No Disk Space Required

```
Dataset:      1000 images × 12 MB = 12 GB
Old cache:    Additional 24 GB (16-bit RGB PNGs)
New approach: 0 GB additional (on-the-fly)
```

### 2. Faster Setup

```
Old: First training requires cache creation (minutes)
New: Start training immediately (seconds)
```

### 3. Always In Sync

```
Old: Cache could become stale if images changed
New: Always loads the current version from disk
```

### 4. Simpler Workflow

```
Old: Manage cache directory, cleanup, etc.
New: Just point to the dataset and train
```

## Troubleshooting

### Error: "expected input to have 3 channels, but got 1"

This shouldn't happen with the Float32Dataset loader, but if it does:

1. Check that `use_float32_loader=True` is set in the training call
2. Verify `Float32Dataset` is being used (check the logs)
3. Ensure `tifffile` is installed: `pip install tifffile`

### Memory Usage

On-the-fly conversion uses extra memory during training:

- Image loaded: ~8 MB (2048×2048 uint16, single channel)
- Converted float32 RGB: ~48 MB (temporary)
- Released after the augmentation pipeline

**Mitigation:**

- Reduce the batch size if OOM errors occur
- Images are processed one at a time during loading
- Only the active batch is kept in memory
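
These per-image figures follow directly from the image geometry; a quick arithmetic check:

```python
# Per-image memory footprint for a 2048x2048 16-bit grayscale frame.
h = w = 2048
uint16_gray = h * w * 2        # 2 bytes/pixel, 1 channel
float32_rgb = h * w * 3 * 4    # 4 bytes/pixel, 3 channels

assert uint16_gray == 8 * 1024**2    # 8 MiB loaded from disk
assert float32_rgb == 48 * 1024**2   # 48 MiB after conversion
```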

### Slow Training

If training seems slow:

- Check disk I/O (a slow disk can bottleneck loading)
- Verify images aren't being re-converted each epoch (they should be cached in memory after the first load)
- Monitor CPU usage during loading

## Migration from Old Approach

If you have existing cached datasets:

```bash
# Old cache location (safe to delete)
rm -rf data/datasets/_float32_cache/

# The new approach doesn't use this directory
```

Your original dataset structure remains unchanged:

```
data/my_dataset/
├── train/
│   ├── images/   (original 16-bit TIFFs)
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── data.yaml
```
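
For reference, a minimal `data.yaml` for this layout might look like the following (paths and class names are illustrative placeholders, not taken from the codebase):

```yaml
path: data/my_dataset
train: train/images
val: val/images

names:
  0: cell      # placeholder class
  1: debris    # placeholder class
```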

Just point to the same `data.yaml` and train!

## Performance Comparison

| Metric | Old (Disk Cache) | New (On-the-fly) |
|--------|------------------|------------------|
| First training setup | 5-10 min | 0 sec |
| Disk space overhead | 100% | 0% |
| Training speed | Fast | Fast |
| Subsequent runs | Fast | Fast |
| Data accuracy | 16-bit preserved | 16-bit preserved |

## Summary

✓ **On-the-fly conversion**: Load and convert during training

✓ **No disk caching**: Zero additional disk space

✓ **Full precision**: Float32 preserves the 16-bit dynamic range

✓ **No PIL/cv2**: Direct tifffile loading

✓ **Automatic**: Works transparently with the training tab

✓ **Fast**: Efficient in-memory conversion

The new approach is simpler, faster to set up, and requires no disk space overhead!