
# Training YOLO with 16-bit TIFF Datasets

## Quick Start

If your dataset contains 16-bit grayscale TIFF files, the training tab will automatically:

  1. Detect 16-bit TIFF images in your dataset
  2. Convert them to float32 [0-1] RGB on-the-fly during training
  3. Train without any disk caching (memory-efficient)

No manual intervention or disk space needed!

## Why Float32 On-The-Fly Conversion?

### The Problem

YOLO's training pipeline expects:

- 3-channel (RGB) images
- Images loaded from disk by the dataloader

16-bit grayscale TIFFs are:

- 1-channel (grayscale)
- Unusable as-is; they must first be converted to RGB format

### The Solution

**New approach (current): on-the-fly float32 conversion**

- Load the 16-bit TIFF with tifffile (not PIL/cv2)
- Convert uint16 [0-65535] → float32 [0-1] in memory
- Replicate the grayscale channel to 3 channels
- Pass directly to the YOLO training pipeline
- No disk caching required!

**Old approach (deprecated): disk caching**

- Created 16-bit RGB PNG cache files on disk
- Required ~2× the dataset size in disk space
- Slower first training run

## How It Works

### Custom Dataset Loader

The system uses a custom `Float32Dataset` class that extends Ultralytics' `YOLODataset`:

```python
from src.utils.train_ultralytics_float import Float32Dataset

# This dataset loader:
# 1. Intercepts image loading
# 2. Detects 16-bit TIFFs
# 3. Converts to float32 [0-1] RGB on-the-fly
# 4. Passes the result to the training pipeline
```

### Conversion Process

For each 16-bit grayscale TIFF during training:

1. Load with tifffile → `uint16` in [0, 65535]
2. Convert to float32 → `img.astype(np.float32) / 65535.0`
3. Replicate to RGB → `np.stack([img] * 3, axis=-1)`
4. Result: float32 [0, 1] RGB array, shape (H, W, 3)
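
As a minimal sketch, those four steps fit in one function (the helper name is illustrative, not a function from the codebase):

```python
import numpy as np
import tifffile

def load_tiff_as_float32_rgb(path: str) -> np.ndarray:
    """Illustrative helper: 16-bit grayscale TIFF -> float32 [0, 1] RGB."""
    img = tifffile.imread(path)              # step 1: uint16 array, shape (H, W)
    img = img.astype(np.float32) / 65535.0   # step 2: float32 in [0, 1]
    return np.stack([img] * 3, axis=-1)      # steps 3-4: float32 RGB, shape (H, W, 3)
```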

### Memory vs Disk

| Aspect              | On-the-fly (new)  | Disk cache (old)     |
| ------------------- | ----------------- | -------------------- |
| Disk space          | Dataset size only | ~2× dataset size     |
| First training      | Fast              | Slow (creates cache) |
| Subsequent training | Fast              | Fast                 |
| Data loss           | None              | None                 |
| Setup required      | None              | Cache creation       |

## Data Preservation

### Float32 Precision

- 16-bit TIFF: 65,536 intensity levels (0-65535)
- Float32: ~7 significant decimal digits of precision

Conversion accuracy:

```text
Original: 32768 (uint16, middle intensity)
Float32:  32768 / 65535 ≈ 0.50000763 (a distinct float32 value for every uint16 input)
```

Full 16-bit precision is preserved in float32 representation.
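
A quick self-contained check of that claim (illustrative snippet, not from the codebase):

```python
import numpy as np

levels = np.arange(65536, dtype=np.uint16)
scaled = levels.astype(np.float32) / 65535.0
# Every one of the 65,536 input levels maps to a distinct float32 value.
assert len(np.unique(scaled)) == 65536
```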

### Comparison to uint8

| Approach      | Precision loss  | Recommended            |
| ------------- | --------------- | ---------------------- |
| float32 [0-1] | None            | ✓ Yes                  |
| uint16 RGB    | None            | ✓ Yes (but disk-heavy) |
| uint8         | 99.6% data loss | ✗ No                   |

Why not uint8:

```text
Original values:     32768, 32769, 32770 (distinct)
Converted to uint8:  128,   128,   128   (collapsed!)
```

Multiple 16-bit values collapse to the same uint8 value: 65,536 input levels map onto only 256 output levels, losing 99.6% of the distinct values.
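
To reproduce the collapse (illustrative snippet):

```python
import numpy as np

vals = np.array([32768, 32769, 32770], dtype=np.uint16)
as_uint8 = (vals >> 8).astype(np.uint8)  # conventional 16-bit -> 8-bit downscaling
print(as_uint8)                          # [128 128 128]: three distinct inputs, one output
```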

## Training Tab Behavior

When you click "Start Training" with a 16-bit TIFF dataset:

```text
[01:23:45] Exported 150 annotations across 50 image(s).
[01:23:45] Using Float32 on-the-fly loader for 16-bit TIFF support (no disk caching)
[01:23:45] Starting training run 'my_model_v1' using yolov8s-seg.pt
[01:23:46] Using Float32Dataset loader for 16-bit TIFF support
```

Every training run uses the same approach: fast and efficient!

## Inference vs Training

| Operation | Input               | Processing                                  | Output to YOLO        |
| --------- | ------------------- | ------------------------------------------- | --------------------- |
| Inference | 16-bit TIFF file    | Load → float32 [0-1] → 3-channel            | numpy array (float32) |
| Training  | 16-bit TIFF dataset | Load on-the-fly → float32 [0-1] → 3-channel | numpy array (float32) |

Both preserve full 16-bit precision using the float32 representation.

## Technical Details

### Custom Dataset Class

Located in `src/utils/train_ultralytics_float.py`:

```python
from ultralytics.data.dataset import YOLODataset  # import path in current Ultralytics releases


class Float32Dataset(YOLODataset):
    """
    Extends Ultralytics YOLODataset to handle 16-bit TIFFs.

    Key behavior:
    - load_image(): intercepts image loading
    - Detects .tif/.tiff files with dtype == uint16
    - Converts: uint16 -> float32 [0-1] -> RGB (3-channel)
    """
```

### Integration with YOLO

The `YOLOWrapper.train()` method automatically uses the custom loader:

```python
# In src/model/yolo_wrapper.py
def train(self, data_yaml, use_float32_loader=True, **kwargs):
    if use_float32_loader:
        # Use the custom Float32Dataset
        return train_with_float32_loader(...)
    else:
        # Standard YOLO training
        return self.model.train(...)
```
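
A hypothetical call from user code; the `YOLOWrapper` constructor argument is an assumption for illustration:

```python
from src.model.yolo_wrapper import YOLOWrapper

wrapper = YOLOWrapper("yolov8s-seg.pt")  # constructor signature assumed
wrapper.train(data_yaml="data/my_dataset/data.yaml", use_float32_loader=True)
```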

### No PIL or cv2 for 16-bit

16-bit TIFF loading uses tifffile directly:

- PIL: can load 16-bit images, but may convert them during processing
- cv2: limited 16-bit TIFF support
- tifffile: native 16-bit support with numpy output
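
A quick way to confirm that tifffile preserves bit depth (the path is illustrative):

```python
import tifffile

img = tifffile.imread("data/my_dataset/train/images/example.tif")  # illustrative path
print(img.dtype, img.shape)  # expect uint16 (H, W) for a 16-bit grayscale TIFF
```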

## Advantages Over Disk Caching

### 1. No Disk Space Required

```text
Dataset:      1000 images × 12 MB = 12 GB
Old cache:    additional 24 GB (16-bit RGB PNGs)
New approach: 0 GB additional (on-the-fly)
```

### 2. Faster Setup

- Old: first training requires cache creation (minutes)
- New: start training immediately (seconds)

### 3. Always In Sync

- Old: the cache could become stale if images changed
- New: always loads the current version from disk

### 4. Simpler Workflow

- Old: manage the cache directory, cleanup, etc.
- New: just point at the dataset and train

## Troubleshooting

### Error: "expected input to have 3 channels, but got 1"

This shouldn't happen with the new Float32Dataset, but if it does:

1. Check that `use_float32_loader=True` is set in the training call
2. Verify that `Float32Dataset` is being used (check the logs)
3. Ensure tifffile is installed: `pip install tifffile`

### Memory Usage

On-the-fly conversion uses memory during training:

- Image loaded: ~8 MB (2048×2048, single-channel uint16)
- Converted float32 RGB: ~48 MB (temporary)
- Released after the augmentation pipeline
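
The arithmetic behind those per-image figures:

```python
h = w = 2048
mib = 2**20
print(h * w * 2 / mib)      # 8.0  MiB: single-channel uint16 (2 bytes per pixel)
print(h * w * 3 * 4 / mib)  # 48.0 MiB: 3-channel float32 (4 bytes per channel)
```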

Mitigation:

- Reduce the batch size if OOM errors occur
- Images are processed one at a time during loading
- Only the active batch is kept in memory

### Slow Training

If training seems slow:

- Check disk I/O (a slow disk can bottleneck loading)
- Verify images aren't being re-converted each epoch (they should cache after the first load)
- Monitor CPU usage during loading

## Migration from Old Approach

If you have existing cached datasets:

```bash
# Old cache location (safe to delete)
rm -rf data/datasets/_float32_cache/

# The new approach doesn't use this directory
```

Your original dataset structure remains unchanged:

```text
data/my_dataset/
├── train/
│   ├── images/  (original 16-bit TIFFs)
│   └── labels/
├── val/
│   ├── images/
│   └── labels/
└── data.yaml
```

Just point to the same `data.yaml` and train!

## Performance Comparison

| Metric               | Old (disk cache) | New (on-the-fly) |
| -------------------- | ---------------- | ---------------- |
| First training setup | 5-10 min         | 0 sec            |
| Disk space overhead  | 100%             | 0%               |
| Training speed       | Fast             | Fast             |
| Subsequent runs      | Fast             | Fast             |
| Data accuracy        | 16-bit preserved | 16-bit preserved |

## Summary

- **On-the-fly conversion**: load and convert during training
- **No disk caching**: zero additional disk space
- **Full precision**: float32 preserves the 16-bit dynamic range
- **No PIL/cv2**: direct tifffile loading
- **Automatic**: works transparently with the training tab
- **Fast**: efficient in-memory conversion

The new approach is simpler, faster to set up, and requires no disk space overhead!