Chapter 5: File formats, tiling, and storage

This chapter explains how whole slide images (WSI) are actually stored as files, why formats differ between vendors, and what that means for performance, interoperability, and long‑term archiving.


5.1 Why so many whole slide image formats exist

What you need to know

  • All WSI formats solve the same basic problem: how to store a very large 2D image (often several gigapixels) in a way that is navigable and reasonably fast to load.
  • Most formats use three core ideas:
    • Tiles: the slide is broken into many small image blocks that can be read independently.
    • Pyramids: the same slide is stored at multiple resolutions, from a low‑resolution overview up to full resolution.
    • Compression: tiles are compressed (usually JPEG or JPEG2000) to keep file sizes manageable.
  • Different vendors designed their own container formats for historical reasons:
    • They needed something that worked with their hardware and servers before standards existed.
    • They optimized for their own viewers and workflows.
  • In practice, labs therefore inherit multiple coexisting formats (for example SVS from Aperio, NDPI from Hamamatsu, MRXS from 3DHISTECH, and SCN from Leica), especially during mergers or scanner upgrades.
  • For daily work, the key questions are:
    • Can your viewer and image management system open all the formats you need?
    • How easily can you export or convert images if you change vendor?
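To make the tile and pyramid ideas above concrete, the sketch below lists the levels of a simple tiled pyramid and counts the tiles at each level. It assumes each level is exactly half the size of the previous one and that the pyramid stops once a level fits in a single tile; real vendor files may store fewer levels or use larger steps between them (the `factor` parameter models that). The function name and the 100,000 × 80,000 pixel example slide are hypothetical.

```python
import math


def pyramid_levels(width, height, tile_size=256, factor=2):
    """List the levels of a simple tiled pyramid (hypothetical helper).

    Assumes every level is `factor` times smaller than the previous one and
    that the pyramid stops once a level fits inside a single tile.
    Returns (level, downsample, level_width, level_height, tile_count) tuples.
    """
    levels = []
    level, downsample = 0, 1
    while True:
        w = math.ceil(width / downsample)
        h = math.ceil(height / downsample)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((level, downsample, w, h, tiles))
        if w <= tile_size and h <= tile_size:
            break
        level += 1
        downsample *= factor
    return levels


if __name__ == "__main__":
    # Hypothetical 100,000 x 80,000 pixel scan with 256 x 256 tiles
    levels = pyramid_levels(100_000, 80_000)
    for level, ds, w, h, n in levels:
        print(f"level {level}: downsample {ds:>3}, {w} x {h} px, {n} tiles")
    full_res_tiles = levels[0][4]
    all_tiles = sum(entry[4] for entry in levels)
    print(f"pyramid overhead: {all_tiles / full_res_tiles - 1:.0%} extra tiles")
```

Because each halving step has a quarter of the tiles of the level above it, all the lower‑resolution levels together add only about a third more tiles than full resolution alone, which is why pyramids are an affordable way to make zoomed‑out viewing fast.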

Reference
Zarella MD, Bowman D, Aeffner F, et al. A practical guide to whole slide imaging: A white paper from the Digital Pathology Association. Arch Pathol Lab Med. 2019;143(2):222–234. doi:10.5858/arpa.2018-0343-RA. Available at: https://doi.org/10.5858/arpa.2018-0343-RA


5.2 Proprietary vendor formats versus open standards

What you need to know

  • Proprietary formats (such as SVS, NDPI, MRXS, SCN) are essentially containers that wrap:
    • Pyramid tile data (often JPEG or JPEG2000 compressed).
    • Headers and metadata in vendor‑specific layouts.
  • Advantages of proprietary formats:
    • Deep integration with the vendor’s scanners and viewers.
    • Often tuned for performance on the supplier’s hardware and network stack.
  • Disadvantages:
    • Interoperability problems: not all viewers can read every vendor format.
    • Migration pain: large archives may need conversion when you change vendors or viewers.
    • Risk of vendor lock‑in: it can be harder to move data if licensing or support changes.
  • Open or standard‑oriented approaches (e.g., DICOM WSI, “generic” pyramidal TIFF) aim to:
    • Make it easier to use multiple viewers and tools on the same data.
    • Allow easier long‑term archiving independent of a single vendor.
    • Provide a predictable way to store metadata (patient ID, stain, magnification, etc.); this is mainly a strength of DICOM, because generic pyramidal TIFF standardizes far less metadata.
  • In practice, many labs run a mixed model:
    • Store slides initially in each scanner’s native format for performance.
    • Export or convert selected subsets (e.g., research cohorts, shared datasets) into more open formats.
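Several of the proprietary "containers" above share a common base: SVS, NDPI, and SCN files begin with a standard TIFF or BigTIFF header even though their metadata layouts are vendor‑specific, whereas MRXS stores a slide as an index file plus a folder of data files. The sketch below, with hypothetical function names, guesses the container family from a file's first bytes using the TIFF (magic number 42), BigTIFF (magic number 43), and DICOM Part 10 (128‑byte preamble followed by "DICM") layouts. It only tells you the family, not the vendor; identifying the vendor still requires reading the metadata, typically with a library such as OpenSlide.

```python
import sys


def classify_header(head: bytes) -> str:
    """Guess the container family from the first bytes of a file."""
    if head[:4] in (b"II*\x00", b"MM\x00*"):   # TIFF, magic number 42 (little/big endian)
        return "TIFF"
    if head[:4] in (b"II+\x00", b"MM\x00+"):   # BigTIFF, magic number 43
        return "BigTIFF"
    if head[128:132] == b"DICM":              # DICOM Part 10: 128-byte preamble + "DICM"
        return "DICOM"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file to classify it (hypothetical triage helper)."""
    with open(path, "rb") as f:
        return classify_header(f.read(132))


if __name__ == "__main__":
    # Usage: python sniff.py slide1.svs slide2.dcm ...
    for p in sys.argv[1:]:
        print(p, sniff_file(p))
```

A check like this can be a cheap first step when auditing a mixed archive before planning conversions.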

Reference
Jahn SW, Plass M, Moinfar F. Digital pathology: Advantages, limitations and emerging perspectives. J Clin Med. 2020;9(11):3697. doi:10.3390/jcm9113697. Available at: https://doi.org/10.3390/jcm9113697


5.3 DICOM for digital pathology: what it is and why it matters

What you need to know

  • DICOM is the same standard used for radiology images; it was extended for digital pathology (the VL Whole Slide Microscopy Image object, introduced by DICOM Supplement 145) so that WSI can live in the wider enterprise imaging ecosystem.
  • A DICOM WSI object stores:
    • The tiled pyramid of the slide.
    • Rich metadata about the specimen, slide, stain, and acquisition device.
    • Spatial coordinates so that annotations and measurements can be linked precisely to image regions.
  • Benefits of using DICOM for WSI:
    • Interoperability with existing PACS, VNAs, and enterprise viewers.
    • Standardized way to encode AI outputs and annotations, which helps with reproducibility and regulatory submissions.
    • Clearer separation between the storage format and the viewer implementation.
  • Challenges and limitations in real labs:
    • Converters and native DICOM scanners are still evolving; performance and feature support can vary.
    • Some pathology‑specific concepts (e.g., multiple blocks on one slide, complex stains) need careful mapping into DICOM objects.
    • Moving from “proprietary + pathology‑only server” to “enterprise imaging in DICOM” is a multi‑year infrastructure project.
  • Even if your lab is not ready for full DICOM adoption, it is worth understanding the basics, because many vendors and national infrastructures are moving in this direction.
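The "spatial coordinates" point above is what lets an annotation drawn at one zoom level land on the same tissue at another. The sketch below shows the basic arithmetic in a deliberately simplified form: a pixel inside a tile is mapped to a position in the level's total pixel matrix, then to millimetres using Pixel Spacing and an origin offset (standing in for Total Pixel Matrix Origin), and back again. It ignores Image Orientation (Slide), i.e. it assumes no rotation with columns along X and rows along Y; the `TiledLevel` structure, the function names, and the spacing values are hypothetical and are not a DICOM toolkit API.

```python
from dataclasses import dataclass


@dataclass
class TiledLevel:
    """Simplified geometry of one resolution level (hypothetical structure)."""
    tile_rows: int             # tile height in pixels (DICOM Rows)
    tile_cols: int             # tile width in pixels (DICOM Columns)
    row_spacing_mm: float      # Pixel Spacing, first value (row spacing)
    col_spacing_mm: float      # Pixel Spacing, second value (column spacing)
    origin_x_mm: float = 0.0   # simplified stand-in for Total Pixel Matrix Origin
    origin_y_mm: float = 0.0


def tile_pixel_to_mm(level, tile_row, tile_col, y_in_tile, x_in_tile):
    """Map a pixel inside a tile to slide coordinates in millimetres.

    Simplification: ignores Image Orientation (Slide) and assumes no rotation.
    """
    matrix_row = tile_row * level.tile_rows + y_in_tile
    matrix_col = tile_col * level.tile_cols + x_in_tile
    x_mm = level.origin_x_mm + matrix_col * level.col_spacing_mm
    y_mm = level.origin_y_mm + matrix_row * level.row_spacing_mm
    return x_mm, y_mm


def mm_to_tile_pixel(level, x_mm, y_mm):
    """Inverse mapping: which tile (and pixel within it) covers a slide point.

    Rounds to the nearest pixel index to absorb floating-point error.
    Returns (tile_row, tile_col, y_in_tile, x_in_tile).
    """
    matrix_row = round((y_mm - level.origin_y_mm) / level.row_spacing_mm)
    matrix_col = round((x_mm - level.origin_x_mm) / level.col_spacing_mm)
    tile_row, y_in_tile = divmod(matrix_row, level.tile_rows)
    tile_col, x_in_tile = divmod(matrix_col, level.tile_cols)
    return tile_row, tile_col, y_in_tile, x_in_tile


if __name__ == "__main__":
    # Hypothetical full-resolution level: 256 x 256 tiles at 0.25 um per pixel
    full = TiledLevel(256, 256, 0.00025, 0.00025)
    # Hypothetical overview level: same tile size at 4 um per pixel
    overview = TiledLevel(256, 256, 0.004, 0.004)
    point = tile_pixel_to_mm(full, 2, 3, 10, 20)
    print("annotation point (mm):", point)
    print("same point at full resolution:", mm_to_tile_pixel(full, *point))
    print("same point in overview level:", mm_to_tile_pixel(overview, *point))
```

Storing annotations in physical slide coordinates rather than in pixels of one particular level is what keeps them valid across levels, viewers, and converted copies of the same slide.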

Reference
Herrmann MD, Clunie DA, Fedorov A, et al. Implementing the DICOM standard for digital pathology. J Pathol Inform. 2018;9:37. doi:10.4103/jpi.jpi_42_18. Available at: https://doi.org/10.4103/jpi.jpi_42_18


5.4 Tiling, pyramids, and why they matter for performance

What you need to know

  • WSI files are not opened like a simple JPEG; the viewer asks the server or local file for specific tiles at specific resolutions.
  • Key ideas:
    • Tile size (for example, 256 × 256 or 512 × 512 pixels) affects:
      • How many requests are needed to fill your screen.
      • How much data is wasted when you zoom into a small region.
    • Pyramid depth (number of resolution levels) affects:
      • How smooth zooming feels.
      • Whether the viewer can avoid loading full‑resolution data when you are zoomed out.
    • Compression type and level affect:
      • File size and storage costs.
      • CPU load and latency when decompressing tiles.
  • For clinicians, the practical implications are:
    • Perceived “snappiness” when you pan and zoom is determined mostly by tile layout, compression, and network speed, not by the scanner’s megapixel count.
    • Poorly chosen tile sizes can cause lag when you move quickly across a slide, and heavy compression can introduce visual artifacts such as blockiness (typical of JPEG) or ringing and blurring (typical of JPEG2000).
  • For local or remote reporting, you care about:
    • How fast a typical case loads.
    • How much lag appears when reviewing large resection slides.
    • Whether cache and prefetch strategies in the viewer are optimized for your workflow (for example, prefetching tiles in the direction you usually scan).
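The two viewer decisions described above, which pyramid level to read and how many tiles to request, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular viewer works: level 0 is full resolution (downsample 1), the viewport position is given in pixels of the chosen level, and the function names, pyramid, and screen size are hypothetical.

```python
def choose_level(downsamples, wanted_downsample):
    """Pick the coarsest stored level that still has at least the requested detail.

    `downsamples` lists each level's downsample relative to level 0,
    e.g. [1, 4, 16, 32]; level 0 is assumed to be full resolution.
    """
    best = 0
    for index, ds in enumerate(downsamples):
        if ds <= wanted_downsample and ds > downsamples[best]:
            best = index
    return best


def tiles_in_view(x, y, view_w, view_h, tile_size):
    """Count the tiles a viewer must fetch to cover one viewport.

    (x, y) is the viewport's top-left corner in pixels of the chosen level.
    """
    first_col, last_col = x // tile_size, (x + view_w - 1) // tile_size
    first_row, last_row = y // tile_size, (y + view_h - 1) // tile_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


if __name__ == "__main__":
    # Hypothetical pyramid and a zoom that needs an 8x downsample
    print("read from level", choose_level([1, 4, 16, 32], 8))
    # A 1920 x 1080 screen at two tile sizes and two pan positions
    for tile in (256, 512):
        for offset in (0, 200):
            n = tiles_in_view(offset, offset, 1920, 1080, tile)
            wasted = n * tile * tile - 1920 * 1080
            print(f"tile {tile}, offset {offset}: {n} requests, "
                  f"{wasted} pixels decoded but not shown")
```

Running it shows the trade‑off in the tile‑size bullet: larger tiles mean fewer requests per screen, but more decoded pixels that never reach the screen, and the exact counts shift as you pan.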

Reference
Herrmann MD, Clunie DA, Fedorov A, et al. Implementing the DICOM standard for digital pathology. J Pathol Inform. 2018;9:37. doi:10.4103/jpi.jpi_42_18. Available at: https://doi.org/10.4103/jpi.jpi_42_18


5.5 Storage planning, archiving, and vendor lock‑in

What you need to know

  • WSI data volumes are large but predictable; planning ahead avoids painful “storage crises” later:
    • Typical H&E slides range from hundreds of megabytes to a few gigabytes, depending on tissue area, scan magnification, and compression; multi‑plane (z‑stack) and multiplexed images can be much larger.
    • Multiply by your annual slide volume to estimate growth; then add extra for rescans, external consults, and research projects.
  • Storage design should consider:
    • Short‑term high‑performance storage for active cases.
    • Medium‑term storage for recent cases and quality assurance.
    • Long‑term archive (on‑premise, cloud, or hybrid) with appropriate backup and disaster recovery.
  • Regulatory expectations increasingly view digital slides as part of the permanent patient record; your retention policies should align with local rules for glass slides and reports.
  • Vendor lock‑in risks:
    • Archives stored only in proprietary formats on proprietary systems are harder to migrate.
    • Negotiate up front for export options, documented formats, and access to your own data if contracts change.
  • Pathologists should participate in storage and archive discussions because choices can affect:
    • How easily you can retrieve old cases for comparison or quality review.
    • How quickly you can build research cohorts or teaching sets.
    • Whether future tools (including AI) can be applied to historical data without re‑scanning.
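The "multiply by your annual slide volume" rule above can be turned into a small planning calculator. Every number in the sketch (100,000 slides per year, a 0.5 GB average per slide, a 20 % allowance for rescans, consults, and research, the growth rate, and the number of copies) is a placeholder assumption; replace them with figures measured from your own scanners and archive. The function name is hypothetical, and results are in decimal terabytes (1 TB = 1000 GB).

```python
def storage_tb(slides_per_year, avg_gb_per_slide, years=1,
               extra_fraction=0.2, volume_growth=0.0, copies=1):
    """Rough cumulative storage estimate in decimal terabytes (hypothetical helper).

    extra_fraction: assumed allowance for rescans, consults, and research.
    volume_growth: assumed yearly growth in slide volume (0.1 = 10 %).
    copies: number of full copies kept, e.g. 2 for a disaster-recovery copy.
    """
    total_gb = 0.0
    volume = slides_per_year
    for _ in range(years):
        total_gb += volume * avg_gb_per_slide * (1 + extra_fraction)
        volume *= 1 + volume_growth
    return total_gb * copies / 1000


if __name__ == "__main__":
    # Hypothetical lab: 100,000 slides per year, 0.5 GB average per slide
    print(f"first year: {storage_tb(100_000, 0.5):.0f} TB")
    print("five years, 10 % growth, 2 copies: "
          f"{storage_tb(100_000, 0.5, years=5, volume_growth=0.1, copies=2):.0f} TB")
```

Separating the per‑year volume, the allowance, and the number of copies makes it easier to discuss each assumption with IT and finance rather than debating a single opaque total.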

Reference
Jahn SW, Plass M, Moinfar F. Digital pathology: Advantages, limitations and emerging perspectives. J Clin Med. 2020;9(11):3697. doi:10.3390/jcm9113697. Available at: https://doi.org/10.3390/jcm9113697