Chapter 2: Pixels, colour, and compression

2.1 Pixels, magnification, and microns per pixel

What you need to know

  • A digital pathology image is a grid of pixels
    • Each pixel is the smallest addressable element in the image.
    • Think of it as a tiny square on the tissue. The scanner measures the light from that square and stores it as numbers.
  • The most important physical quantity is microns per pixel (µm/px)
    • This tells you how much real tissue a single pixel represents.
    • Example:
      • 0.25 µm/px means each pixel covers a 0.25 by 0.25 micron square of tissue.
      • 0.5 µm/px means each pixel covers a 0.5 by 0.5 micron square.
    • Smaller µm/px means finer sampling and more detail, but also more pixels and bigger files.
  • Magnification labels (20x, 40x) are only shorthand
    • Manufacturers often say things like “scanned at 40x”, but what really matters is the µm/px that results from the combination of optics and sensor.
    • Two scanners both labelled “40x” can have slightly different µm/px values.
    • For QA, validation, and publications you should always quote µm/px, not just “20x vs 40x”.
  • Why µm/px matters clinically
    • At coarse sampling (larger µm/px) you may lose fine detail such as mitotic figures, tiny microorganisms, or subtle nuclear membrane irregularities.
    • At extremely fine sampling (very small µm/px) you add data volume and scan time but may not gain meaningful clinical information if the tissue quality or optics are the limiting factors.
    • You should be comfortable asking:
      • For this specimen type and stain, can I safely judge nuclear detail at this µm/px?
      • Is there a group of cases where we routinely need finer sampling, such as small biopsies for lymphoma or high-grade dysplasia?
  • Linking back to the glass microscope
    • On glass, you instinctively know what you trust at 10x, 20x, or 40x.
    • In WSI, µm/px is the digital equivalent of that mental model.
    • During validation, you are essentially checking whether a given µm/px behaves like your usual “sign-out magnification” for your typical tasks.
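
The µm/px arithmetic is simple enough to check by hand. The Python sketch below converts a tissue region into pixel dimensions and an uncompressed size. The 15 mm by 15 mm region, 3 colour channels, and 1 byte (8 bits) per channel are illustrative assumptions, and the function names are hypothetical, not taken from any scanner software.

```python
def pixel_dimensions(width_mm, height_mm, mpp):
    """Return (width_px, height_px, total_px) for a tissue region sampled at mpp µm/px."""
    width_px = round(width_mm * 1000 / mpp)   # 1 mm = 1000 µm
    height_px = round(height_mm * 1000 / mpp)
    return width_px, height_px, width_px * height_px


def field_of_view_um(n_pixels, mpp):
    """Physical length (µm) of tissue covered by n_pixels along one axis."""
    return n_pixels * mpp


def uncompressed_gb(total_px, channels=3, bytes_per_channel=1):
    """Uncompressed size in gigabytes (10**9 bytes), assuming 8-bit RGB by default."""
    return total_px * channels * bytes_per_channel / 1e9


# Hypothetical 15 mm x 15 mm tissue region at two common sampling levels
for mpp in (0.5, 0.25):
    w, h, total = pixel_dimensions(15, 15, mpp)
    print(f"{mpp} µm/px: {w} x {h} px, {uncompressed_gb(total):.1f} GB uncompressed")
```

Halving µm/px from 0.5 to 0.25 takes this region from 30000 x 30000 to 60000 x 60000 pixels. Read the other way, a 1000-pixel-wide viewer window spans 250 µm of tissue at 0.25 µm/px and 500 µm at 0.5 µm/px.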

Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.


2.2 Colour, channels, and colour spaces in WSI (RGB, stains, calibration)

What you need to know

  • Most brightfield WSI data is stored as RGB colour images
    • Each pixel has three values: red, green, and blue.
    • What you see as purple nuclei or pink cytoplasm is your brain interpreting combinations of these three numbers.
  • There is no direct “H channel” and “E channel”
    • H and E are physical stains on the slide, not separate digital channels.
    • The scanner measures the light that has passed through both stains together and records it as red, green, and blue intensities.
  • Channel depth and representation
    • For routine pathology, scanners usually store 8 bits per channel (0 to 255).
    • That gives 256 possible values for each of the red, green, and blue signals at every pixel.
    • This is sufficient for smooth colour gradients and stable viewing on standard monitors.
  • Why colour varies between labs and scanners
    • Staining: reagent batches, timing, temperature, and protocols.
    • Scanner hardware: lamp or LED spectrum, sensor characteristics, optics.
    • Viewing: monitor calibration, brightness settings, and ambient light in the room.
    • This is why the same H and E in two hospitals often looks slightly “warmer”, “cooler”, “more purple”, or “more red”.
  • Clinical implications of colour variation
    • Slight differences rarely destroy diagnostic value, but can change the feel of the slide, especially:
      • prominence of nucleoli
      • contrast between chromatin and nucleolus
      • intensity of eosinophilic material or red blood cells
      • apparent strength of IHC staining.
    • When you validate a digital system, you validate the end-to-end colour pipeline (stain, scanner, viewer, monitor) rather than any single element.
  • Colour standardisation and AI
    • Algorithms are sensitive to stain and scanner variation. The same tumour may look quite different numerically across institutions.
    • Many image analysis pipelines introduce stain or colour normalisation steps to reduce this variation.
    • As a pathologist you do not need to implement these methods, but you should know that:
      • unstable staining protocols or sudden scanner setting changes can break both human visual expectations and AI performance
      • for any algorithm, its training data, staining conditions, and scanners matter just as much as its network architecture.
  • Practical questions to ask in your lab
    • Are our stain runs reasonably consistent day to day and between sites?
    • Are scanners kept in a reasonably stable configuration, or does someone frequently change brightness and contrast settings?
    • Do we have any routine process for monitor calibration for sign-out workstations?
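
Because H and E are not stored as separate channels, software that needs per-stain information has to estimate it from RGB. One well-known approach, colour deconvolution as described by Ruifrok and Johnston, converts each RGB value to optical density using the Beer-Lambert relationship and then unmixes it against reference stain vectors. The sketch below assumes 8-bit RGB with a white background of 255 and uses the published default haematoxylin, eosin, and DAB vectors (the same values scikit-image uses for its rgb_from_hed matrix). Real stains in any given lab differ from these defaults, which is one reason normalisation steps exist. Treat it as an illustration, not a validated tool.

```python
import numpy as np

# Reference stain vectors in optical-density (OD) space, one row per stain.
# These are published defaults; the stains in a real lab will differ.
STAIN_OD = np.array([
    [0.65, 0.70, 0.29],  # haematoxylin
    [0.07, 0.99, 0.11],  # eosin
    [0.27, 0.57, 0.78],  # DAB, included so the 3 x 3 matrix is invertible
])
STAIN_OD = STAIN_OD / np.linalg.norm(STAIN_OD, axis=1, keepdims=True)
UNMIX = np.linalg.inv(STAIN_OD)


def rgb_to_od(rgb, background=255.0):
    """Beer-Lambert: OD = -log10(I / I0). Values are clipped to avoid log(0)."""
    rgb = np.clip(np.asarray(rgb, dtype=float), 1.0, background)
    return -np.log10(rgb / background)


def od_to_rgb(od, background=255.0):
    """Inverse of rgb_to_od (values are not rounded to integers)."""
    return background * 10.0 ** (-np.asarray(od, dtype=float))


def unmix(rgb):
    """Estimate stain amounts per pixel by solving OD = amounts @ STAIN_OD.

    Works on a single pixel of shape (3,) or an image of shape (H, W, 3).
    """
    return rgb_to_od(rgb) @ UNMIX


# Synthetic pixel built from a known mix of haematoxylin and eosin, no DAB
amounts = np.array([0.5, 0.2, 0.0])
pixel = od_to_rgb(amounts @ STAIN_OD)
print(np.round(pixel))             # the RGB values a scanner might record
print(np.round(unmix(pixel), 3))   # approximately the original amounts
```

The round trip only recovers the original amounts exactly because the synthetic pixel was built from the same stain vectors. On real slides, stain vectors that do not match the actual staining lead to mixed or misleading H and E estimates.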

Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.


2.3 Sampling and resolution trade-offs, including 20x versus 40x

What you need to know

  • Sampling is how finely you measure the tissue
    • In digital terms this is µm/px.
    • In microscope terms this is like deciding whether you mostly work at 10x, 20x, or 40x.
  • What changes when you go from 20x to 40x scanning
    • Pixels become smaller on the tissue, so you capture more detail.
    • Halving the µm/px doubles the pixel count along each axis, so the same field of view now needs roughly four times as many pixels.
    • File sizes grow, scan times increase, and network and storage load increase.
  • Many services successfully use 20x scanning (around 0.5 µm/px) for routine surgical cases
    • For a broad range of H and E tasks, 20x sampling has been shown to be diagnostically safe when properly validated.
    • 40x scans or rescans are often reserved for:
      • small biopsies where every nucleus counts
      • cases where mitotic figure counting, microorganisms, or very fine architectural detail drive the decision
      • particular subspecialties that have agreed they genuinely need that level of detail.
  • You should think in terms of tasks, not magnification labels
    • “Is 20x enough?” has no universal answer. It becomes a set of task-specific questions:
      • For this tumour type and grading system, are the relevant features adequately sampled at our chosen µm/px?
      • For this IHC assessment, can I reliably see what I need to see at this sampling?
    • Your local validation studies should include the kinds of cases where you suspect resolution might be critical.
  • Practical trade-offs you should be able to discuss
    • Scanning everything at 40x may:
      • stress storage and archives
      • increase scan failures in some scanners
      • slow down viewing and AI processing
      • offer minimal benefit for many bread-and-butter cases.
    • A mixed approach is common:
      • baseline scanning at a carefully chosen µm/px
      • selective higher resolution capture where there is a clear justification in terms of patient safety or specific diagnostic needs.
  • Talking to IT and administration
    • When you say “we need 40x”, you should be ready to explain:
      • which cases specifically
      • what risk is mitigated
      • what the cost is in terms of scan time and storage
      • whether this might be handled through a protocol such as “selective high-resolution rescanning” rather than default 40x for all slides.
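
The “roughly four times” figure and the storage cost of a mixed approach can be estimated with simple arithmetic. The sketch below assumes, hypothetically, that every slide is scanned at a baseline of 0.5 µm/px, that a fraction is additionally rescanned at 0.25 µm/px with both images kept, and that storage scales with pixel count. In practice compressed file sizes do not scale exactly with pixel count, so treat the output as a first approximation for discussions with IT, not a storage plan. The function names are illustrative only.

```python
def pixel_ratio(mpp_coarse, mpp_fine):
    """How many times more pixels the finer sampling needs for the same tissue area."""
    return (mpp_coarse / mpp_fine) ** 2


def relative_storage(fraction_rescanned, mpp_baseline=0.5, mpp_high=0.25):
    """Storage relative to scanning every slide once at the baseline µm/px.

    Hypothetical model: all slides scanned at baseline, plus a fraction rescanned
    at mpp_high, both images retained, storage proportional to pixel count.
    """
    return 1.0 + fraction_rescanned * pixel_ratio(mpp_baseline, mpp_high)


print(pixel_ratio(0.5, 0.25))   # 4.0: halving µm/px quadruples the pixel count
for fraction in (0.05, 0.10, 0.25):
    print(f"{fraction:.0%} rescanned at 40x -> {relative_storage(fraction):.2f}x baseline storage")
print(f"everything at 40x only -> {pixel_ratio(0.5, 0.25):.2f}x baseline storage")
```

Under these assumptions, rescanning a quarter of slides at 40x still uses half the storage of scanning everything at 40x by default.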

Reference
Kumar N, Gupta R, Gupta S. Whole slide imaging (WSI) in pathology: current perspectives and future directions. J Digit Imaging. 2020;33(4):1034–1040. doi:10.1007/s10278-020-00351-z.


2.4 Bit depth, dynamic range, and why 8 versus 12 bits often matters less than you think

What you need to know

  • Bit depth is the number of bits used to store each channel value, which sets how many distinct intensity levels a scanner can encode
    • 8 bits per channel gives 256 possible values (0 to 255).
    • 12 bits gives 4096 values.
    • 16 bits gives 65536 values.
    • In principle, higher bit depth allows finer intensity distinctions and can represent a wider dynamic range, provided the sensor actually captures that range.
  • In routine H and E and many brightfield IHC applications
    • The limiting factors are often the biology and optics, not the bit depth.
    • Tissue thickness, stain variability, and sensor noise mean that beyond a certain point you do not gain clinically obvious improvement by going from 8 to 12 or 16 bits.
    • Your monitor and graphics chain also limit what you can actually see.
  • Why many systems use 8 bits per channel for viewing
    • It is enough to avoid visible banding or posterisation for normal pathology slides.
    • Files are smaller and faster to move around.
    • Most monitors and operating systems are optimised for 8-bit-per-channel colour.
  • When higher bit depth might matter
    • Certain quantitative workflows, such as detailed densitometry or some fluorescence applications, can benefit from higher bit depths, especially at acquisition time.
    • Research projects that work with raw sensor data sometimes use the full native bit depth before mapping down to 8 bits for display.
    • Even in those cases, good staining and stable scanner settings usually have more impact than bit depth alone.
  • How to think about bit depth in service design
    • For routine brightfield work it is usually safer to assume that the following matter far more than chasing extra bits:
      • consistent staining
      • stable exposure
      • good focus
      • robust validation.
    • Before insisting on higher bit depth, ask:
      • What specific task would be unsafe or impossible at 8 bits per channel?
      • Is there evidence that clinicians or algorithms perform better with higher bit depth on the slides we actually produce?
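
The level counts above are powers of two, and mapping a higher bit depth down to 8 bits for display simply merges neighbouring levels. The sketch below uses a plain linear rescaling as an assumption. Real scanners and viewers may apply gamma or other tone curves instead, and the function names are illustrative.

```python
import numpy as np


def levels(bits):
    """Number of distinct values a channel of the given bit depth can hold."""
    return 2 ** bits


def to_display_8bit(values, source_bits=12):
    """Linearly rescale integer values of source_bits depth to 0-255 (an assumed mapping)."""
    max_in = levels(source_bits) - 1
    scaled = np.round(np.asarray(values, dtype=float) * 255 / max_in)
    return scaled.astype(np.uint8)


for bits in (8, 12, 16):
    print(bits, "bits ->", levels(bits), "levels per channel")

all_12bit = np.arange(levels(12))        # every possible 12-bit value
display = to_display_8bit(all_12bit)
print(len(np.unique(display)))           # 256 distinct display values
print(levels(12) // levels(8))           # on average 16 source levels per display level
```

Every 12-bit value still lands on one of the 256 display levels. The extra precision only helps if a downstream task, such as quantitative measurement, uses the values before they are reduced for display.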

Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.


2.5 Compression in WSI: lossless versus lossy, diagnostic impact, and file size

What you need to know

  • Whole slide images are huge, so compression is unavoidable
    • A single uncompressed slide can be multiple gigabytes.
    • Compression makes storage and network use manageable.
  • Two main types of compression
    • Lossless
      • Reduces file size without changing any pixel values.
      • The original image can be reconstructed exactly.
      • Gives moderate savings, but because no information is lost it cannot affect diagnosis.
    • Lossy
      • Discards some information to achieve much higher compression.
      • The decompressed image is not identical to the original.
      • The key question is whether the loss is diagnostically acceptable.
  • The idea of diagnostically acceptable irreversible compression
    • There is a range of lossy compression where the differences are invisible or irrelevant for diagnosis.
    • Studies with JPEG2000 and other methods have shown that pathology slides can be compressed several-fold before observers begin to lose accuracy.
    • Past that range, artefacts appear:
      • blocky areas
      • halos around edges
      • blur that softens fine detail.
  • Clinical responsibilities around compression
    • Compression settings are not “IT-only” decisions. Pathologists should:
      • participate in choosing compression levels
      • inspect slides at the proposed settings, including challenging cases
      • be involved in validation studies that compare diagnostic performance at different compression ratios.
    • If your lab changes compression settings, this is effectively a change to the imaging system and can require revalidation, especially if you use AI or quantitative tools.
  • Compression and AI or quantitative analysis
    • Moderate lossy compression is often tolerated by algorithms, especially if both training and deployment use the same settings.
    • Very high compression or mismatched compression between training and deployment can degrade model performance or introduce subtle biases.
    • For any algorithm, you should treat the compression configuration as part of the model’s environment:
      • if you change it, you may need to retest the model.
  • Questions you should be able to ask and answer
    • Do we know what compression scheme and ratio our scanners or archives use for WSI storage?
    • Has this been tested in our validation, including a realistic mix of easy and hard cases?
    • If we adopt AI tools, are we sure they were developed and tested with compression settings similar to ours?
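
The difference between lossless and lossy compression can be demonstrated on a synthetic tile. The sketch below uses Python's zlib as a stand-in lossless codec. As a deliberately crude stand-in for lossy coding, it also quantises pixel values into coarser bins before compressing. This is not JPEG or JPEG2000, and the tile is a smooth gradient with noise rather than real tissue, so the ratios it prints say nothing about any real scanner. The point is only that the lossless round trip is exact, while the lossy one is smaller but not identical. The function names are illustrative.

```python
import zlib

import numpy as np


def make_test_tile(size=256, seed=0):
    """Synthetic 8-bit greyscale tile: a smooth gradient plus mild noise (not real tissue)."""
    rng = np.random.default_rng(seed)
    ramp = np.linspace(0, 200, size)
    tile = np.add.outer(ramp, ramp) / 2 + rng.normal(0, 3, (size, size))
    return np.clip(np.round(tile), 0, 255).astype(np.uint8)


def lossless_roundtrip(tile):
    """Compress with zlib and decompress; pixel values come back unchanged."""
    packed = zlib.compress(tile.tobytes(), 9)
    restored = np.frombuffer(zlib.decompress(packed), dtype=np.uint8).reshape(tile.shape)
    return packed, restored


def toy_lossy_roundtrip(tile, step=8):
    """Crude lossy stand-in: keep only which step-sized bin each value falls in."""
    bins = (tile // step).astype(np.uint8)
    packed = zlib.compress(bins.tobytes(), 9)
    decoded = np.frombuffer(zlib.decompress(packed), dtype=np.uint8).reshape(tile.shape)
    restored = decoded.astype(np.int32) * step + step // 2   # reconstruct at bin centre
    return packed, np.clip(restored, 0, 255).astype(np.uint8)


tile = make_test_tile()
for name, (packed, restored) in {
    "lossless": lossless_roundtrip(tile),
    "toy lossy": toy_lossy_roundtrip(tile),
}.items():
    ratio = tile.nbytes / len(packed)
    max_err = np.abs(restored.astype(int) - tile.astype(int)).max()
    print(f"{name}: ratio {ratio:.1f}:1, max pixel error {max_err}")
```

Raising step in this toy shrinks the output further but increases the pixel error. That is the same trade-off a validation study explores with real codecs and compression ratios.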

Reference
Krupinski EA, Johnson J, Roehrig H, Nafziger J, Lubner M. Compressing pathology whole-slide images using JPEG2000: effects on diagnostic accuracy. J Digit Imaging. 2012;25(3):347–353. doi:10.1007/s10278-011-9423-7.