Bug fix
@@ -2,17 +2,18 @@

 ## Overview

-This document describes the implementation of 16-bit grayscale TIFF support for YOLO object detection. The system properly loads 16-bit TIFF images, normalizes them to float32 [0-1], and passes them directly to YOLO **without uint8 conversion** to preserve the full dynamic range and avoid data loss.
+This document describes the implementation of 16-bit grayscale TIFF support for YOLO object detection. The system properly loads 16-bit TIFF images, normalizes them to float32 [0-1], and handles them appropriately for both **inference** and **training** **without uint8 conversion** to preserve the full dynamic range and avoid data loss.

 ## Key Features

-✅ Reads 16-bit or float32 images using tifffile
-✅ Converts to float32 [0-1] (NO uint8 conversion)
-✅ Replicates grayscale → RGB (3 channels)
-✅ Passes numpy arrays directly to YOLO (no file I/O)
-✅ Uses Ultralytics YOLOv8/v11 models
-✅ Works with segmentation models
-✅ No data loss, no double normalization, no silent clipping
+✅ Reads 16-bit or float32 images using tifffile
+✅ Converts to float32 [0-1] (NO uint8 conversion)
+✅ Replicates grayscale → RGB (3 channels)
+✅ **Inference**: Passes numpy arrays directly to YOLO (no file I/O)
+✅ **Training**: Creates float32 3-channel TIFF dataset cache
+✅ Uses Ultralytics YOLOv8/v11 models
+✅ Works with segmentation models
+✅ No data loss, no double normalization, no silent clipping

 ## Changes Made

@@ -46,7 +47,9 @@ Enhanced [`YOLOWrapper._prepare_source()`](../src/model/yolo_wrapper.py:231) to:

 ## Processing Pipeline

-For 16-bit TIFF files:
+### For Inference (predict)
+
+For 16-bit TIFF files during inference:

 1. **Load**: File loaded using `tifffile` → preserves 16-bit uint16 data
 2. **Normalize**: Convert to float32 and scale to [0, 1]
@@ -60,12 +63,28 @@ For 16-bit TIFF files:
 4. **Pass to YOLO**: Return float32 array directly (no uint8, no file I/O)
5. **Inference**: YOLO processes the float32 [0-1] RGB array
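
The inference steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `YOLOWrapper._prepare_source()`: the helper names `to_float32_rgb` and `load_tiff_as_float32_rgb` are hypothetical, and scaling integer data by its dtype maximum (65535 for uint16) is an assumption, since the exact normalization is not visible in this diff.

```python
import numpy as np


def to_float32_rgb(image: np.ndarray) -> np.ndarray:
    """Normalize a 2-D grayscale image to float32 and replicate it to 3 channels.

    Hypothetical helper. Assumption: integer input is scaled by its dtype
    maximum (65535 for uint16); float input is taken to be in [0, 1] already.
    """
    if image.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {image.shape}")

    if np.issubdtype(image.dtype, np.integer):
        # Scale straight to float32; there is no intermediate uint8 step.
        scale = np.float32(np.iinfo(image.dtype).max)
        normalized = image.astype(np.float32) / scale
    else:
        # Float input is only cast, never rescaled or clipped.
        normalized = image.astype(np.float32)

    # Replicate the single channel into an H x W x 3 array.
    return np.repeat(normalized[:, :, np.newaxis], 3, axis=2)


def load_tiff_as_float32_rgb(path: str) -> np.ndarray:
    """Read a TIFF from disk and convert it with to_float32_rgb."""
    import tifffile  # third-party; imported lazily so the helper above needs only NumPy

    return to_float32_rgb(tifffile.imread(path))
```

The conversion is kept separate from the file read so it can be checked on synthetic arrays without touching disk.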

+### For Training (train)
+
+During training, YOLO's internal dataloader loads images from disk, so we create a cached 3-channel dataset:
+
+1. **Detect**: Check if dataset contains 16-bit TIFF files
+2. **Create Cache**: Build float32 3-channel TIFF dataset in `data/datasets/_float32_cache/`
+3. **Convert Each Image**:
+   - Load 16-bit TIFF using `tifffile`
+   - Normalize to float32 [0-1]
+   - Replicate to 3 channels
+   - Save as float32 TIFF (preserves precision)
+4. **Copy Labels**: Copy label files unchanged
+5. **Generate data.yaml**: Points to cached 3-channel dataset
+6. **Train**: YOLO trains on float32 3-channel TIFFs
+
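
A rough sketch of the cache-building steps above, under several assumptions: the function names `cache_dir_for` and `build_float32_cache` are hypothetical, deriving the `<hash>` suffix from the resolved dataset path is a guess (the diff does not show how the hash is computed), and `data.yaml` generation is left out. It is not the project's actual implementation.

```python
import hashlib
import shutil
from pathlib import Path

import numpy as np


def cache_dir_for(dataset_dir: Path, cache_root: Path) -> Path:
    """Return <cache_root>/<dataset>_<hash>.

    Assumption: the hash is taken from the resolved dataset path.
    """
    digest = hashlib.sha256(str(dataset_dir.resolve()).encode("utf-8")).hexdigest()[:8]
    return cache_root / f"{dataset_dir.name}_{digest}"


def build_float32_cache(dataset_dir: Path, cache_root: Path) -> Path:
    """Mirror dataset_dir into the cache, converting 16-bit grayscale TIFFs."""
    import tifffile  # third-party; imported lazily

    out_dir = cache_dir_for(dataset_dir, cache_root)
    for src in sorted(dataset_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = out_dir / src.relative_to(dataset_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)

        if src.suffix.lower() in {".tif", ".tiff"}:
            image = tifffile.imread(src)
            if image.dtype == np.uint16 and image.ndim == 2:
                # Normalize to float32 [0, 1] and replicate to 3 channels.
                scaled = image.astype(np.float32) / np.float32(65535.0)
                rgb = np.repeat(scaled[:, :, np.newaxis], 3, axis=2)
                tifffile.imwrite(dst, rgb, photometric="rgb")
                continue

        # Labels and any other files are copied unchanged.
        shutil.copy2(src, dst)
    return out_dir
```

Keeping the directory naming in its own function makes it easy to check whether a cache already exists for a dataset before converting anything.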
 ### No Data Loss!

-Unlike the previous approach that converted to uint8 (256 levels), the new implementation:
+Unlike approaches that convert to uint8 (256 levels), this implementation:
 - Preserves full 16-bit dynamic range (65536 levels)
 - Maintains precision with float32 representation
-- Passes data directly without intermediate file conversions
+- For inference: passes data directly without file conversions
+- For training: uses float32 TIFFs (not uint8 PNGs)

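
The 256-versus-65536 claim can be checked on a synthetic ramp. The snippet below is an illustrative check only; dividing by 65535 is the assumed normalization, and dropping the low byte stands in for a typical uint8 conversion.

```python
import numpy as np

# A synthetic ramp containing every possible 16-bit value once.
ramp = np.arange(65536, dtype=np.uint16)

# uint8 path: dropping the low byte collapses the data into 256 buckets.
as_uint8 = (ramp >> 8).astype(np.uint8)

# float32 path: divide by the uint16 maximum (assumed normalization).
as_float32 = ramp.astype(np.float32) / np.float32(65535.0)

distinct_uint8 = int(np.unique(as_uint8).size)
distinct_float32 = int(np.unique(as_float32).size)

# Scaling back and rounding recovers the original 16-bit values.
recovered = np.rint(as_float32 * np.float32(65535.0)).astype(np.uint16)
round_trip_ok = bool(np.array_equal(recovered, ramp))

print(distinct_uint8, distinct_float32, round_trip_ok)
```

Running it prints `256 65536 True`: every 16-bit level stays distinct and recoverable in float32, while the uint8 path keeps only 256.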
## Usage
@@ -188,14 +207,16 @@ This test shows the old behavior (uint8 conversion) - kept for comparison.

 For a 2048×2048 single-channel image:

-| Format | Memory | Notes |
-|--------|--------|-------|
-| Original 16-bit | 8 MB | uint16 grayscale |
-| Float32 grayscale | 16 MB | Intermediate |
-| Float32 RGB | 48 MB | Final (3 channels) |
-| uint8 RGB (old) | 12 MB | OLD approach with data loss |
+| Format | Memory | Disk Space | Notes |
+|--------|--------|------------|-------|
+| Original 16-bit | 8 MB | ~8 MB | uint16 grayscale TIFF |
+| Float32 grayscale | 16 MB | - | Intermediate |
+| Float32 3-channel | 48 MB | ~48 MB | Training cache |
+| uint8 RGB (old) | 12 MB | ~12 MB | OLD approach with data loss |

-The float32 approach uses ~4× more memory than uint8 but preserves **all information**.
+The float32 approach uses ~4× more memory and disk space than uint8 but preserves **all information**.
+
+**Cache Directory**: Training creates cached datasets in `data/datasets/_float32_cache/<dataset>_<hash>/`

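
The in-memory figures in the table follow directly from shape times bytes per element. A small check, where the helper name `megabytes` is hypothetical and "MB" is read as 2^20 bytes:

```python
import math

import numpy as np


def megabytes(shape: tuple, dtype) -> float:
    """Uncompressed array size in MB (2**20 bytes) for a given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize / 2**20


h = w = 2048
sizes = {
    "Original 16-bit": megabytes((h, w), np.uint16),
    "Float32 grayscale": megabytes((h, w), np.float32),
    "Float32 3-channel": megabytes((h, w, 3), np.float32),
    "uint8 RGB (old)": megabytes((h, w, 3), np.uint8),
}

for name, size in sizes.items():
    print(f"{name}: {size:.0f} MB")

# Ratio between the float32 3-channel array and the old uint8 RGB array.
ratio = sizes["Float32 3-channel"] / sizes["uint8 RGB (old)"]
```

The ratio comes out at 4, matching the "~4×" statement. Disk usage can differ from these numbers if the TIFFs are written with compression.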
### Why Direct Numpy Array?