# Training YOLO with 16-bit TIFF Datasets

## Quick Start

If your dataset contains 16-bit grayscale TIFF files, the training tab will automatically:

1. Detect 16-bit TIFF images in your dataset
2. Convert them to float32 [0-1] RGB **on-the-fly** during training
3. Train without any disk caching (memory-efficient)

**No manual intervention or extra disk space needed!**

## Why Float32 On-The-Fly Conversion?

### The Problem

YOLO's training pipeline expects:

- 3-channel images (RGB)
- Images loaded from disk by the dataloader

16-bit grayscale TIFFs, by contrast:

- Have only 1 channel (grayscale)
- Must be converted to RGB before training

### The Solution

**NEW APPROACH (Current)**: on-the-fly float32 conversion

- Load the 16-bit TIFF with `tifffile` (not PIL/cv2)
- Convert uint16 [0-65535] → float32 [0-1] in memory
- Replicate grayscale to 3 channels
- Pass directly to the YOLO training pipeline
- **No disk caching required!**

**OLD APPROACH (Deprecated)**: disk caching

- Created 16-bit RGB PNG cache files on disk
- Required ~2× the dataset size in extra disk space
- Slower first training run

## How It Works

### Custom Dataset Loader

The system uses a custom `Float32Dataset` class that extends Ultralytics' `YOLODataset`:

```python
from src.utils.train_ultralytics_float import Float32Dataset

# This dataset loader:
# 1. Intercepts image loading
# 2. Detects 16-bit TIFFs
# 3. Converts to float32 [0-1] RGB on-the-fly
# 4. Passes the result to the training pipeline
```

### Conversion Process

For each 16-bit grayscale TIFF during training:

```
1. Load with tifffile  → uint16 [0, 65535]
2. Convert to float32  → img.astype(float32) / 65535.0
3. Replicate to RGB    → np.stack([img] * 3, axis=-1)
4. Result: float32 [0, 1] RGB array, shape (H, W, 3)
```

### Memory vs Disk

| Aspect | On-the-fly (NEW) | Disk Cache (OLD) |
|--------|------------------|------------------|
| Disk space | Dataset size only | ~2× dataset size |
| First training | Fast | Slow (creates cache) |
| Subsequent training | Fast | Fast |
| Data loss | None | None |
| Setup required | None | Cache creation |

## Data Preservation

### Float32 Precision

- 16-bit TIFF: 65,536 levels (0-65535)
- float32: ~7 significant decimal digits (24-bit mantissa)

**Conversion accuracy:**

```
Original: 32768 (uint16, mid-scale intensity)
Float32:  32768 / 65535 ≈ 0.5000076
```

Adjacent 16-bit levels differ by 1/65535 ≈ 1.5e-5, far larger than the float32 spacing in [0, 1], so all 65,536 levels map to distinct float32 values and the full 16-bit dynamic range is preserved.

### Comparison to uint8

| Approach | Precision loss | Recommended |
|----------|----------------|-------------|
| **float32 [0-1]** | None | ✓ YES |
| uint16 RGB | None | ✓ YES (but disk-heavy) |
| uint8 | 99.6% of levels lost | ✗ NO |

**Why not uint8:**

```
Original values:    32768, 32769, 32770 (distinct)
Converted to uint8: 128, 128, 128 (collapsed!)
```

Multiple 16-bit values collapse to the same uint8 value.

## Training Tab Behavior

When you click "Start Training" with a 16-bit TIFF dataset:

```
[01:23:45] Exported 150 annotations across 50 image(s).
[01:23:45] Using Float32 on-the-fly loader for 16-bit TIFF support (no disk caching)
[01:23:45] Starting training run 'my_model_v1' using yolov8s-seg.pt
[01:23:46] Using Float32Dataset loader for 16-bit TIFF support
```

Every training run uses the same approach: fast and efficient.

## Inference vs Training

| Operation | Input | Processing | Output to YOLO |
|-----------|-------|------------|----------------|
| **Inference** | 16-bit TIFF file | Load → float32 [0-1] → 3ch | numpy array (float32) |
| **Training** | 16-bit TIFF dataset | Load on-the-fly → float32 [0-1] → 3ch | numpy array (float32) |

Both preserve full 16-bit precision using the float32 representation.
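The precision claims above are easy to verify with plain NumPy, independent of any project code. The snippet below shows three adjacent 16-bit intensities staying distinct after the float32 [0-1] conversion, then collapsing to a single value under an 8-bit downscale:

```python
import numpy as np

# Three adjacent 16-bit intensities around mid-scale
vals = np.array([32768, 32769, 32770], dtype=np.uint16)

# float32 [0-1] conversion (as used for training and inference):
# adjacent levels differ by 1/65535 ≈ 1.5e-5, well above the float32
# spacing near 0.5 (~6e-8), so all three values remain distinct.
as_float32 = vals.astype(np.float32) / 65535.0
print(as_float32)  # approx. [0.5000076 0.5000229 0.5000381]

# 8-bit downscale (keep the high byte): all three collapse to one value
as_uint8 = (vals >> 8).astype(np.uint8)
print(as_uint8)    # [128 128 128]
```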
## Technical Details

### Custom Dataset Class

Located in `src/utils/train_ultralytics_float.py`:

```python
class Float32Dataset(YOLODataset):
    """
    Extends Ultralytics YOLODataset to handle 16-bit TIFFs.

    Key methods:
    - load_image(): intercepts image loading
      - Detects .tif/.tiff with dtype == uint16
      - Converts: uint16 → float32 [0-1] → RGB (3-channel)
    """
```

A simplified sketch of this override appears at the end of this page.

### Integration with YOLO

The `YOLOWrapper.train()` method automatically uses the custom loader:

```python
# In src/model/yolo_wrapper.py
def train(self, data_yaml, use_float32_loader=True, **kwargs):
    if use_float32_loader:
        # Use the custom Float32Dataset
        return train_with_float32_loader(...)
    else:
        # Standard YOLO training
        return self.model.train(...)
```

### No PIL or cv2 for 16-bit

16-bit TIFF loading uses `tifffile` directly:

- PIL: can load 16-bit images but converts them during processing
- cv2: limited 16-bit TIFF support
- tifffile: native 16-bit support, numpy output

## Advantages Over Disk Caching

### 1. No Disk Space Required

```
Dataset:      1000 images × 12 MB = 12 GB
Old cache:    additional 24 GB (16-bit RGB PNGs)
New approach: 0 GB additional (on-the-fly)
```

### 2. Faster Setup

```
Old: first training requires cache creation (minutes)
New: start training immediately (seconds)
```

### 3. Always In Sync

```
Old: cache could become stale if images changed
New: always loads the current version from disk
```

### 4. Simpler Workflow

```
Old: manage the cache directory, clean it up, etc.
New: just point at the dataset and train
```

## Troubleshooting

### Error: "expected input to have 3 channels, but got 1"

This shouldn't happen with the new Float32Dataset, but if it does:

1. Check that `use_float32_loader=True` in the training call
2. Verify that `Float32Dataset` is being used (check the logs)
3. Ensure `tifffile` is installed: `pip install tifffile`

### Memory Usage

On-the-fly conversion uses memory during training:

- Image loaded: ~8 MB (2048×2048 grayscale uint16)
- Converted float32 RGB: ~48 MB (temporary)
- Released after the augmentation pipeline

**Mitigation:**

- Reduce the batch size if OOM errors occur
- Images are processed one at a time during loading
- Only the active batch is kept in memory

### Slow Training

If training seems slow:

- Check disk I/O (a slow disk can bottleneck loading)
- Verify images aren't being re-converted every epoch (they should be cached after the first load)
- Monitor CPU usage during loading

## Migration from the Old Approach

If you have existing cached datasets:

```bash
# Old cache location (safe to delete)
rm -rf data/datasets/_float32_cache/

# The new approach doesn't use this directory
```

Your original dataset structure remains unchanged:

```
data/my_dataset/
├── train/
│   ├── images/   (original 16-bit TIFFs)
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── data.yaml
```

Just point to the same `data.yaml` and train.

## Performance Comparison

| Metric | Old (Disk Cache) | New (On-the-fly) |
|--------|------------------|------------------|
| First training setup | 5-10 min | 0 sec |
| Disk space overhead | 100% | 0% |
| Training speed | Fast | Fast |
| Subsequent runs | Fast | Fast |
| Data accuracy | 16-bit preserved | 16-bit preserved |

## Summary

- ✓ **On-the-fly conversion**: load and convert during training
- ✓ **No disk caching**: zero additional disk space
- ✓ **Full precision**: float32 preserves the 16-bit dynamic range
- ✓ **No PIL/cv2**: direct tifffile loading
- ✓ **Automatic**: works transparently with the training tab
- ✓ **Fast**: efficient in-memory conversion

The new approach is simpler, faster to set up, and adds no disk-space overhead.
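As promised in the Technical Details section, here is a minimal, hypothetical sketch of the kind of `load_image()` override `Float32Dataset` implements. The real code lives in `src/utils/train_ultralytics_float.py`; this sketch assumes the `load_image(self, i, rect_mode=True)` contract of recent Ultralytics releases (return the image plus its original and current height/width) and omits the resizing to `imgsz` that the stock loader performs:

```python
import numpy as np
import tifffile
from ultralytics.data.dataset import YOLODataset


class Float32DatasetSketch(YOLODataset):
    """Illustrative only: intercept loading of 16-bit grayscale TIFFs."""

    def load_image(self, i, rect_mode=True):
        path = self.im_files[i]  # image paths kept by the Ultralytics base dataset
        if path.lower().endswith((".tif", ".tiff")):
            img = tifffile.imread(path)  # native 16-bit support, numpy output
            if img.dtype == np.uint16:
                # uint16 [0, 65535] -> float32 [0, 1]
                img = img.astype(np.float32) / 65535.0
                if img.ndim == 2:
                    # Replicate grayscale to 3 channels -> (H, W, 3)
                    img = np.stack([img] * 3, axis=-1)
                h0, w0 = img.shape[:2]
                return img, (h0, w0), img.shape[:2]
        # Any other format: defer to the stock Ultralytics loader
        return super().load_image(i, rect_mode)
```

Because the conversion happens inside `load_image()`, the rest of the pipeline (augmentation, batching) receives an ordinary numpy array, and nothing is ever written to disk.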