Martin Laasmaa aec0fbf83c Adding standalone training script and update

2025-12-13 09:28:24 +02:00

16-bit TIFF Support for YOLO Object Detection

Overview

This document describes the implementation of 16-bit grayscale TIFF support for YOLO object detection. The system loads 16-bit TIFF images, normalizes them to float32 in [0, 1], and uses them for both inference and training. It never converts to uint8, so the full dynamic range is preserved.

Key Features

✅ Reads 16-bit or float32 images using tifffile
✅ Converts to float32 [0-1] (NO uint8 conversion)
✅ Replicates grayscale → RGB (3 channels)
✅ Inference: Passes numpy arrays directly to YOLO (no file I/O)
✅ Training: On-the-fly float32 conversion (NO disk caching)
✅ Uses Ultralytics YOLOv8/v11 models
✅ Works with segmentation models
✅ No data loss, no double normalization, no silent clipping

Changes Made

1. Dependencies (`requirements.txt`)

Added tifffile>=2023.0.0 for reliable 16-bit TIFF loading

2. Image Loading (`src/utils/image.py`)

Enhanced TIFF Loading

Modified Image._load() to use tifffile for .tif and .tiff files
Preserves original 16-bit data type during loading
Properly handles both grayscale and multi-channel TIFF files

New Normalization Method

Added Image.to_normalized_float32() method that:

Converts image data to float32
Properly scales values to [0, 1] range:
- 16-bit images: divides by 65535 (full dynamic range)
- 8-bit images: divides by 255
- Float images: clips to [0, 1]
Handles various data types automatically
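
The scaling rules above can be sketched as a standalone function. This is a minimal illustration of the per-dtype logic, not the actual body of Image.to_normalized_float32(); the free-function form and the TypeError for unsupported dtypes are assumptions.

```python
import numpy as np


def to_normalized_float32(data: np.ndarray) -> np.ndarray:
    """Scale image data to float32 in [0, 1] based on its dtype (sketch)."""
    if data.dtype == np.uint16:
        # 16-bit: divide by the largest representable value.
        return data.astype(np.float32) / 65535.0
    if data.dtype == np.uint8:
        # 8-bit: divide by 255.
        return data.astype(np.float32) / 255.0
    if np.issubdtype(data.dtype, np.floating):
        # Float input is assumed to be in [0, 1] already; clip any outliers.
        return np.clip(data, 0.0, 1.0).astype(np.float32)
    # Assumption: other dtypes are rejected rather than guessed at.
    raise TypeError(f"unsupported dtype: {data.dtype}")
```

Dividing by 65535 rather than 65536 maps the largest uint16 value to exactly 1.0.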

3. YOLO Preprocessing (`src/model/yolo_wrapper.py`)

Enhanced YOLOWrapper._prepare_source() to:

Detect 16-bit TIFF files automatically
Load and normalize to float32 [0-1] using the new method
Replicate grayscale to RGB (3 channels)
Return numpy array directly (NO file saving, NO uint8 conversion)
Pass float32 array directly to YOLO for inference
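
As a rough sketch of that decision flow (not the real YOLOWrapper._prepare_source()), the function below returns the path unchanged for ordinary images and an in-memory array for 16-bit TIFFs. The name prepare_source, the injected read_tiff callable, and the fallback for TIFFs that are not uint16 are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Union

import numpy as np

TIFF_SUFFIXES = {".tif", ".tiff"}


def prepare_source(
    source: str,
    read_tiff: Callable[[str], np.ndarray],
) -> Union[str, np.ndarray]:
    """Return an HxWx3 float32 array for 16-bit TIFFs, else the path unchanged."""
    if Path(source).suffix.lower() not in TIFF_SUFFIXES:
        return source  # regular images keep the normal path-based route

    data = read_tiff(source)
    if data.dtype != np.uint16:
        return source  # assumption: only uint16 data takes the array route

    float_data = data.astype(np.float32) / 65535.0
    if float_data.ndim == 2:
        # Replicate the single gray channel into three identical channels.
        float_data = np.stack([float_data] * 3, axis=-1)
    return float_data
```

In real use, tifffile.imread could be passed as read_tiff; injecting the reader keeps the sketch free of file access.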

Processing Pipeline

For Inference (predict)

For 16-bit TIFF files during inference:

Load: The file is read with tifffile, which preserves the original uint16 data

Normalize: Convert to float32 and scale to [0, 1]

float_data = uint16_data.astype(np.float32) / 65535.0

RGB Conversion: Replicate grayscale to 3 channels

rgb_float = np.stack([float_data] * 3, axis=-1)

Pass to YOLO: Return float32 array directly (no uint8, no file I/O)
Inference: YOLO processes the float32 [0-1] RGB array

For Training (train)

Training now uses a custom dataset loader with on-the-fly conversion (NO disk caching):

Custom Dataset: Uses Float32Dataset class that extends Ultralytics' YOLODataset
Load On-The-Fly: Each image is loaded and converted during training:
- Detect 16-bit TIFF files automatically
- Load with tifffile (preserves uint16)
- Convert to float32 [0-1] in memory
- Replicate to 3 channels (RGB)
No Disk Cache: Conversion happens in memory, no files written
Train: YOLO trains on float32 [0-1] RGB arrays directly

See src/utils/train_ultralytics_float.py for implementation.
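
Independent of that file, the on-the-fly idea can be illustrated with a small container that converts each image only when it is requested. LazyFloat32Images is a hypothetical stand-in, not the Float32Dataset class, and it does not subclass Ultralytics' YOLODataset; the reader callable is also an assumption (tifffile.imread would be a natural choice).

```python
from typing import Callable, Sequence

import numpy as np


class LazyFloat32Images:
    """Hypothetical container: converts an image only when it is indexed."""

    def __init__(self, paths: Sequence[str], reader: Callable[[str], np.ndarray]):
        self.paths = list(paths)
        self.reader = reader  # e.g. tifffile.imread

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> np.ndarray:
        raw = self.reader(self.paths[index])        # uint16, shape (H, W)
        scaled = raw.astype(np.float32) / 65535.0   # float32 in [0, 1]
        # Replicate the gray channel to get shape (H, W, 3).
        return np.repeat(scaled[..., None], 3, axis=-1)
```

Nothing is read or converted at construction time, and nothing is written back to disk, so the only extra cost is the per-image conversion during each epoch.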

No Data Loss!

Unlike approaches that convert to uint8 (256 levels), this implementation:

Preserves full 16-bit dynamic range (65536 levels)
Maintains precision with float32 representation
For inference: passes data directly without file conversions
For training: converts to float32 in memory on the fly (no uint8 PNGs, no files written to disk)

Usage

Basic Image Loading

from src.utils.image import Image

# Load a 16-bit TIFF file
img = Image("path/to/16bit_image.tif")

# Get normalized float32 data [0-1]
normalized = img.to_normalized_float32()  # Shape: (H, W), dtype: float32

# Original data is preserved
original = img.data  # Still uint16

YOLO Inference

The preprocessing is automatic - just use YOLO as normal:

from src.model.yolo_wrapper import YOLOWrapper

# Initialize model
yolo = YOLOWrapper("yolov8s-seg.pt")
yolo.load_model()

# Perform inference on 16-bit TIFF
# The image will be automatically normalized and passed as float32 [0-1]
detections = yolo.predict("path/to/16bit_image.tif", conf=0.25)

With InferenceEngine

from src.model.inference import InferenceEngine
from src.database.db_manager import DatabaseManager

# Setup
db = DatabaseManager("database.db")
engine = InferenceEngine("model.pt", db, model_id=1)

# Detect objects in 16-bit TIFF
result = engine.detect_single(
    image_path="path/to/16bit_image.tif",
    relative_path="images/16bit_image.tif",
    conf=0.25
)

Testing

Three test scripts are provided:

1. Image Loading Test

./venv/bin/python tests/test_16bit_tiff_loading.py

Tests:

Loading 16-bit TIFF files with tifffile
Normalization to float32 [0-1]
Data type and value range verification

2. Float32 Passthrough Test (Most Important!)

./venv/bin/python tests/test_yolo_16bit_float32.py

Tests:

YOLO preprocessing returns numpy array (not file path)
Data is float32 [0-1] (not uint8)
No quantization to 256 levels (proves no uint8 conversion)

Sample output:

✓ SUCCESS: Prepared source is a numpy array (float32 passthrough)
  Shape: (200, 200, 3)
  Dtype: float32
  Min value: 0.000000
  Max value: 1.000000
  Unique values: 399

✓ SUCCESS: Data has 399 unique values (> 256)
  This confirms NO uint8 quantization occurred!
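
The unique-value check can be reproduced without YOLO or any file I/O. The snippet below is a sketch of the idea, not the contents of tests/test_yolo_16bit_float32.py: the diagonal-gradient test image is an assumption, chosen because it spans the full uint16 range with more than 256 distinct levels.

```python
import numpy as np

# Hypothetical 200x200 diagonal gradient spanning the full uint16 range.
rows, cols = np.indices((200, 200))
grid = (rows + cols).astype(np.float64)
test_image = np.round(grid / grid.max() * 65535).astype(np.uint16)

# Normalize to float32 [0, 1] and replicate the gray channel three times.
prepared = np.stack([test_image.astype(np.float32) / 65535.0] * 3, axis=-1)

unique_count = np.unique(prepared).size
assert prepared.dtype == np.float32
assert prepared.min() == 0.0 and prepared.max() == 1.0
# A uint8 round trip could never leave more than 256 distinct values.
assert unique_count > 256, "data looks quantized to uint8"
```

Any array that had passed through uint8 would fail the last assertion, which is why the count of unique values works as a proxy for "no quantization happened".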

3. Legacy Test (Shows Old Behavior)

./venv/bin/python tests/test_yolo_16bit_preprocessing.py

This test shows the old behavior (uint8 conversion) - kept for comparison.

Benefits

No Data Loss: Preserves full 16-bit dynamic range (65536 levels vs 256)
High Precision: Float32 maintains fine-grained intensity differences
Automatic Processing: No manual preprocessing needed
YOLO Compatible: Ultralytics YOLO accepts float32 [0-1] arrays
Performance: No intermediate file I/O for 16-bit TIFFs
Backwards Compatible: Regular images (8-bit PNG, JPEG, etc.) still work as before

Technical Notes

Float32 vs uint8

With uint8 conversion (OLD - BAD):

16-bit (65536 levels) → uint8 (256 levels): 99.6% of the distinct intensity levels are lost
Fine intensity differences are lost
Quantization artifacts

With float32 [0-1] (NEW - GOOD):

16-bit (65536 levels) → float32 (24-bit significand): every 16-bit level stays distinct, so no data is lost
Full dynamic range preserved
Smooth gradients maintained
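
The difference can be checked numerically over every possible 16-bit value. This is an illustrative sketch; the uint8 path shown here (keeping the top 8 bits) is one common conversion and is assumed, not taken from the old implementation.

```python
import numpy as np

levels = np.arange(65536, dtype=np.uint16)       # every possible 16-bit value

as_float = levels.astype(np.float32) / 65535.0   # float32 path
as_uint8 = (levels >> 8).astype(np.uint8)        # uint8 path: keep the top 8 bits

# Map the float32 values back to integers to test for a lossless round trip.
restored = np.rint(as_float.astype(np.float64) * 65535.0).astype(np.uint16)

float_levels = np.unique(as_float).size
uint8_levels = np.unique(as_uint8).size
lossless = bool(np.array_equal(restored, levels))

print(float_levels, uint8_levels, lossless)
```

All 65536 levels remain distinct after the float32 conversion and map back to their original integers, while the uint8 path collapses them to 256.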

Memory Considerations

For a 2048×2048 single-channel image:

Format	Memory	Disk Space	Notes
Original 16-bit	8 MB	~8 MB	uint16 grayscale TIFF
Float32 grayscale	16 MB	-	Intermediate
Float32 3-channel	48 MB	-	In memory only (passed to YOLO, no disk cache)
uint8 RGB (old)	12 MB	~12 MB	OLD approach with data loss

The float32 approach uses 4× more memory per image than uint8 RGB during training (48 MB vs 12 MB) but preserves all information.

No Disk Cache: The new on-the-fly approach eliminates the need for cached datasets on disk.
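
The in-memory figures follow directly from height × width × channels × bytes per element. A short sketch of that arithmetic (the helper name image_bytes is illustrative):

```python
import numpy as np


def image_bytes(height: int, width: int, channels: int, dtype) -> int:
    """Size in bytes of an uncompressed image array."""
    return height * width * channels * np.dtype(dtype).itemsize


MIB = 1024 * 1024
sizes_mib = {
    "uint16 grayscale": image_bytes(2048, 2048, 1, np.uint16) / MIB,
    "float32 grayscale": image_bytes(2048, 2048, 1, np.float32) / MIB,
    "float32 3-channel": image_bytes(2048, 2048, 3, np.float32) / MIB,
    "uint8 RGB": image_bytes(2048, 2048, 3, np.uint8) / MIB,
}
float_vs_uint8 = sizes_mib["float32 3-channel"] / sizes_mib["uint8 RGB"]

for name, size in sizes_mib.items():
    print(f"{name}: {size:.0f} MiB")
```

These are per-image sizes, so the memory held by a training batch scales with the batch size.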

Why Direct Numpy Array?

Passing numpy arrays directly to YOLO (instead of saving to file):

Faster: No disk I/O overhead
No Quantization: Avoids PNG/JPEG quantization
Memory Efficient: Single copy in memory
Cleaner: No temp file management

Ultralytics YOLO supports various input types:

File paths (str): "image.jpg"
Numpy arrays: np.ndarray ← we use this
PIL Images: PIL.Image
Torch tensors: torch.Tensor

Training with Float32 Dataset Loader

The system now includes a custom dataset loader for 16-bit TIFF training:

from src.utils.train_ultralytics_float import train_with_float32_loader

# Train with on-the-fly float32 conversion
results = train_with_float32_loader(
    model_path="yolov8s-seg.pt",
    data_yaml="data/my_dataset/data.yaml",
    epochs=100,
    batch=16,
    imgsz=640,
)

The Float32Dataset class automatically:

Detects 16-bit TIFF files
Loads with tifffile (not PIL/cv2)
Converts to float32 [0-1] on-the-fly
Replicates to 3 channels
Integrates seamlessly with Ultralytics training pipeline

This is used automatically by the training tab in the GUI.

Installation

Install the updated dependencies:

./venv/bin/pip install -r requirements.txt

Or install tifffile directly:

./venv/bin/pip install "tifffile>=2023.0.0"

Example Test Output

=== Testing Float32 Passthrough (NO uint8) ===
Created test 16-bit TIFF: /tmp/tmpdt5hm0ab.tif
  Shape: (200, 200)
  Dtype: uint16
  Min value: 0
  Max value: 65535

Preprocessing result:
  Prepared source type: <class 'numpy.ndarray'>

✓ SUCCESS: Prepared source is a numpy array (float32 passthrough)
  Shape: (200, 200, 3)
  Dtype: float32
  Min value: 0.000000
  Max value: 1.000000
  Mean value: 0.499992
  Unique values: 399

✓ SUCCESS: Data has 399 unique values (> 256)
  This confirms NO uint8 quantization occurred!

✓ All float32 passthrough tests passed!
