2025-12-13 01:18:16 +02:00
parent edcd448a61
commit 908e9a5b82
4 changed files with 2102 additions and 21 deletions


@@ -2,17 +2,18 @@
## Overview
This document describes the implementation of 16-bit grayscale TIFF support for YOLO object detection. The system properly loads 16-bit TIFF images, normalizes them to float32 [0-1], and passes them directly to YOLO **without uint8 conversion** to preserve the full dynamic range and avoid data loss.
This document describes the implementation of 16-bit grayscale TIFF support for YOLO object detection. The system loads 16-bit TIFF images, normalizes them to float32 [0-1], and handles both **inference** and **training** **without uint8 conversion**, preserving the full dynamic range and avoiding data loss.
## Key Features
✅ Reads 16-bit or float32 images using tifffile
✅ Converts to float32 [0-1] (NO uint8 conversion)
✅ Replicates grayscale → RGB (3 channels)
✅ Passes numpy arrays directly to YOLO (no file I/O)
✅ Uses Ultralytics YOLOv8/v11 models
✅ Works with segmentation models
✅ No data loss, no double normalization, no silent clipping
✅ Reads 16-bit or float32 images using tifffile
✅ Converts to float32 [0-1] (NO uint8 conversion)
✅ Replicates grayscale → RGB (3 channels)
✅ **Inference**: Passes numpy arrays directly to YOLO (no file I/O)
✅ **Training**: Creates float32 3-channel TIFF dataset cache
✅ Uses Ultralytics YOLOv8/v11 models
✅ Works with segmentation models
✅ No data loss, no double normalization, no silent clipping
## Changes Made
@@ -46,7 +47,9 @@ Enhanced [`YOLOWrapper._prepare_source()`](../src/model/yolo_wrapper.py:231) to:
## Processing Pipeline
For 16-bit TIFF files:
### For Inference (predict)
For 16-bit TIFF files during inference:
1. **Load**: File loaded using `tifffile` → preserves 16-bit uint16 data
2. **Normalize**: Convert to float32 and scale to [0, 1]
@@ -60,12 +63,28 @@ For 16-bit TIFF files:
4. **Pass to YOLO**: Return float32 array directly (no uint8, no file I/O)
5. **Inference**: YOLO processes the float32 [0-1] RGB array
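
The load, normalize, and replicate steps can be sketched roughly as follows. This is a minimal illustration, not the code in `YOLOWrapper._prepare_source()`: the function names are hypothetical, and dividing by the dtype maximum (65535 for uint16) and passing float inputs through unchanged are both assumptions.

```python
import numpy as np


def to_float32_rgb(image: np.ndarray) -> np.ndarray:
    """Normalize a 2-D grayscale image to float32 [0, 1] and replicate to 3 channels."""
    if image.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {image.shape}")
    if np.issubdtype(image.dtype, np.integer):
        # Assumption: scale by the dtype maximum (65535 for uint16).
        scaled = image.astype(np.float32) / np.float32(np.iinfo(image.dtype).max)
    else:
        # Assumption: float inputs are already in [0, 1]; no rescaling, no clipping.
        scaled = image.astype(np.float32)
    # Replicate the single channel three times -> shape (H, W, 3).
    return np.repeat(scaled[..., np.newaxis], 3, axis=-1)


def load_tiff_as_float32_rgb(path: str) -> np.ndarray:
    """Read a TIFF with tifffile and convert it without any uint8 step."""
    import tifffile  # imported lazily so to_float32_rgb needs only NumPy

    return to_float32_rgb(tifffile.imread(path))
```

The array returned by `load_tiff_as_float32_rgb` is what step 4 would hand to the model; no intermediate file is written.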
### For Training (train)
During training, YOLO's internal dataloader loads images from disk, so we create a cached 3-channel dataset:
1. **Detect**: Check if dataset contains 16-bit TIFF files
2. **Create Cache**: Build float32 3-channel TIFF dataset in `data/datasets/_float32_cache/`
3. **Convert Each Image**:
- Load 16-bit TIFF using `tifffile`
- Normalize to float32 [0-1]
- Replicate to 3 channels
- Save as float32 TIFF (preserves precision)
4. **Copy Labels**: Copy label files unchanged
5. **Generate data.yaml**: Points to cached 3-channel dataset
6. **Train**: YOLO trains on float32 3-channel TIFFs
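
A rough sketch of the conversion and label-copy steps is shown below. It is not the project's implementation: `convert_tiff` and `build_cache` are hypothetical names, the cache is assumed to mirror the source directory layout, labels are assumed to be `.txt` files, and the cache root is assumed to lie outside the source tree. Generating `data.yaml` is omitted.

```python
import shutil
from pathlib import Path

import numpy as np

TIFF_SUFFIXES = {".tif", ".tiff"}


def convert_tiff(src: Path, dst: Path) -> None:
    """Write a float32 3-channel copy of a 16-bit grayscale TIFF."""
    import tifffile  # imported lazily; only needed when a TIFF is present

    image = tifffile.imread(str(src))
    if image.dtype == np.uint16 and image.ndim == 2:
        # Assumption: normalize by the uint16 maximum.
        scaled = image.astype(np.float32) / np.float32(65535)
        rgb = np.repeat(scaled[..., np.newaxis], 3, axis=-1)
        tifffile.imwrite(str(dst), rgb, photometric="rgb")
    else:
        # Anything that is not 16-bit grayscale is copied unchanged.
        shutil.copy2(src, dst)


def build_cache(src_root: Path, cache_root: Path) -> int:
    """Mirror src_root into cache_root; return the number of TIFFs processed."""
    processed = 0
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = cache_root / src.relative_to(src_root)
        suffix = src.suffix.lower()
        if suffix in TIFF_SUFFIXES:
            dst.parent.mkdir(parents=True, exist_ok=True)
            convert_tiff(src, dst)
            processed += 1
        elif suffix == ".txt":
            # Label files are copied byte for byte.
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
    return processed
```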
### No Data Loss!
Unlike the previous approach that converted to uint8 (256 levels), the new implementation:
Unlike approaches that convert to uint8 (256 levels), this implementation:
- Preserves full 16-bit dynamic range (65536 levels)
- Maintains precision with float32 representation
- Passes data directly without intermediate file conversions
- For inference: passes data directly without file conversions
- For training: uses float32 TIFFs (not uint8 PNGs)
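
The difference in level count can be checked with a few lines of NumPy. This sketch assumes normalization by 65535 and models the uint8 path as keeping the top 8 bits; the actual uint8 conversion used previously may have differed.

```python
import numpy as np

# Every possible uint16 intensity, once each.
levels = np.arange(65536, dtype=np.uint16)

# float32 path: normalize to [0, 1].
as_float32 = levels.astype(np.float32) / np.float32(65535)

# uint8 path (assumed form): keep only the top 8 bits.
as_uint8 = (levels >> 8).astype(np.uint8)

# Map the float32 values back to uint16 to confirm nothing was lost.
restored = np.round(as_float32 * np.float32(65535)).astype(np.uint16)

float_levels = np.unique(as_float32).size
uint8_levels = np.unique(as_uint8).size
round_trip_ok = np.array_equal(restored, levels)

print(float_levels, uint8_levels, round_trip_ok)  # prints: 65536 256 True
```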
## Usage
@@ -188,14 +207,16 @@ This test shows the old behavior (uint8 conversion) - kept for comparison.
For a 2048×2048 single-channel image:
| Format | Memory | Notes |
|--------|--------|-------|
| Original 16-bit | 8 MB | uint16 grayscale |
| Float32 grayscale | 16 MB | Intermediate |
| Float32 RGB | 48 MB | Final (3 channels) |
| uint8 RGB (old) | 12 MB | OLD approach with data loss |
| Format | Memory | Disk Space | Notes |
|--------|--------|------------|-------|
| Original 16-bit | 8 MB | ~8 MB | uint16 grayscale TIFF |
| Float32 grayscale | 16 MB | - | Intermediate |
| Float32 3-channel | 48 MB | ~48 MB | Training cache |
| uint8 RGB (old) | 12 MB | ~12 MB | OLD approach with data loss |
The float32 approach uses ~4× more memory than uint8 but preserves **all information**.
The float32 approach uses about 4× the memory and disk space of uint8 (48 MB vs. 12 MB per image) but preserves **all information**.
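
The memory figures in the table follow from pixel count × channels × bytes per sample. The short check below reproduces them; `image_mib` is a hypothetical helper, and the values are in units of 2^20 bytes, which is what the table's "MB" corresponds to.

```python
import numpy as np


def image_mib(height: int, width: int, channels: int, dtype) -> float:
    """Uncompressed in-memory size of an image in MiB (2**20 bytes)."""
    return height * width * channels * np.dtype(dtype).itemsize / 2**20


sizes = {
    "Original 16-bit": image_mib(2048, 2048, 1, np.uint16),
    "Float32 grayscale": image_mib(2048, 2048, 1, np.float32),
    "Float32 3-channel": image_mib(2048, 2048, 3, np.float32),
    "uint8 RGB (old)": image_mib(2048, 2048, 3, np.uint8),
}

for name, mib in sizes.items():
    print(f"{name}: {mib:.0f} MB")
```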
**Cache Directory**: Training creates cached datasets in `data/datasets/_float32_cache/<dataset>_<hash>/`
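
One way such a `<dataset>_<hash>` directory name could be derived is sketched below. The document does not say what the hash covers, so hashing the resolved dataset path with SHA-256 and keeping 8 hex characters is purely an assumption, and `cache_dir_for` is a hypothetical name.

```python
import hashlib
from pathlib import Path

CACHE_ROOT = Path("data/datasets/_float32_cache")


def cache_dir_for(dataset_dir: Path) -> Path:
    """Return a cache directory of the form <dataset>_<hash> under CACHE_ROOT."""
    resolved = dataset_dir.resolve()
    # Assumption: the hash is taken over the absolute dataset path.
    digest = hashlib.sha256(str(resolved).encode("utf-8")).hexdigest()[:8]
    return CACHE_ROOT / f"{resolved.name}_{digest}"
```

With this scheme, two datasets that share a folder name but live in different locations get separate cache directories, while repeated runs on the same dataset reuse the same one.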
### Why Direct Numpy Array?