Chapter 2: Pixels, colour, and compression
2.1 Pixels, magnification, and microns per pixel
What you need to know
- A digital pathology image is a grid of pixels
- Each pixel is the smallest addressable element in the image.
- Think of it as a tiny square on the tissue. The scanner measures the light from that square and stores it as numbers.
- The most important physical quantity is microns per pixel (µm/px)
- This tells you how much real tissue a single pixel represents.
- Example:
- 0.25 µm/px means each pixel covers a 0.25 by 0.25 micron square of tissue.
- 0.5 µm/px means each pixel covers a 0.5 by 0.5 micron square.
- Smaller µm/px means finer sampling and more detail, but also more pixels and bigger files.
- Magnification labels (20x, 40x) are only shorthand
- Manufacturers often say things like “scanned at 40x”, but what really matters is the µm/px that falls out of the optics and sensor.
- Two scanners both labelled “40x” can have slightly different µm/px values.
- For QA, validation, and publications you should always quote µm/px, not just “20x vs 40x”.
- Why µm/px matters clinically
- At coarse sampling (larger µm/px) you may lose fine detail such as mitotic figures, tiny microorganisms, or subtle nuclear membrane irregularities.
- At extremely fine sampling (very small µm/px) you add data volume and scan time but may not gain meaningful clinical information if the tissue quality or optics are the limiting factors.
- You should be comfortable asking:
- For this specimen type and stain, can I safely judge nuclear detail at this µm/px?
- Is there a group of cases where we routinely need finer sampling, such as small biopsies for lymphoma or high-grade dysplasia?
- Linking back to the glass microscope
- On glass, you instinctively know what you trust at 10x, 20x, or 40x.
- In WSI, µm/px is the digital equivalent of that mental model.
- During validation, you are essentially checking whether a given µm/px behaves like your usual “sign-out magnification” for your typical tasks.
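The relationship between µm/px and what a pixel grid can capture can be sketched in a few lines of Python. This is a minimal illustration, not part of any scanner's software: the function names and the 8 micron nucleus are assumptions chosen for the example.

```python
def pixels_across(feature_um: float, mpp: float) -> int:
    """Number of pixels spanning a feature of the given size (in microns)."""
    if mpp <= 0:
        raise ValueError("mpp (microns per pixel) must be positive")
    return round(feature_um / mpp)


def region_in_pixels(width_mm: float, height_mm: float, mpp: float) -> tuple[int, int]:
    """Pixel dimensions of a scanned region; 1 mm = 1000 microns."""
    return (
        pixels_across(width_mm * 1000, mpp),
        pixels_across(height_mm * 1000, mpp),
    )


# Hypothetical example: an 8 micron nucleus at two common sampling values.
for mpp in (0.25, 0.5):
    print(f"{mpp} um/px: nucleus spans about {pixels_across(8.0, mpp)} pixels")

# Hypothetical example: a 15 mm x 15 mm scanned area at 0.25 um/px.
print(region_in_pixels(15, 15, 0.25))
```

Halving the µm/px doubles the number of pixels across any given structure, which is the sense in which smaller µm/px gives finer sampling.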
Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.
2.2 Colour, channels, and colour spaces in WSI (RGB, stains, calibration)
What you need to know
- Most brightfield WSI data is stored as RGB colour images
- Each pixel has three values: red, green, and blue.
- What you see as purple nuclei or pink cytoplasm is your brain interpreting combinations of these three numbers.
- There is no direct “H channel” and “E channel”
- H and E are physical stains on the slide, not separate digital channels.
- The scanner measures the light that has passed through both stains together and records the result as red, green, and blue values.
- Channel depth and representation
- For routine pathology, scanners usually store 8 bits per channel (0 to 255).
- That gives 256 possible values for each of the red, green, and blue signals at every pixel.
- This is sufficient for smooth colour gradients and stable viewing on standard monitors.
- Why colour varies between labs and scanners
- Staining: reagent batches, timing, temperature, and protocols.
- Scanner hardware: lamp or LED spectrum, sensor characteristics, optics.
- Viewing: monitor calibration, brightness settings, and ambient light in the room.
- This is why the same H and E stain often looks slightly “warmer”, “cooler”, “more purple”, or “more red” from one hospital to the next.
- Clinical implications of colour variation
- Slight differences rarely destroy diagnostic value, but can change the feel of the slide, especially:
- prominence of nucleoli
- contrast between chromatin and nucleolus
- intensity of eosinophilic material or red blood cells
- apparent strength of IHC staining.
- When you validate a digital system, you validate the end-to-end colour pipeline (stain, scanner, viewer, monitor) rather than any single element.
- Colour standardisation and AI
- Algorithms are sensitive to stain and scanner variation. The same tumour may look quite different numerically across institutions.
- Many image analysis pipelines introduce stain or colour normalisation steps to reduce this variation.
- As a pathologist you do not need to implement these methods, but you should know that:
- unstable staining protocols or sudden scanner setting changes can break both human visual expectations and AI performance
- for any algorithm, its training data, staining conditions, and scanners matter just as much as its network architecture.
- Practical questions to ask in your lab
- Are our stain runs reasonably consistent day to day and between sites?
- Are scanners kept in a reasonably stable configuration, or does someone frequently change brightness and contrast settings?
- Do we have a routine process for calibrating the monitors on sign-out workstations?
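To make the point about RGB values concrete, the sketch below converts a pixel's three numbers into optical density, the quantity that many stain normalisation and colour deconvolution methods start from. It is a minimal illustration under stated assumptions: the pixel values are invented, the function name is hypothetical, and clear glass is assumed to read 255 in every channel, whereas a real pipeline would use a measured white reference.

```python
import math


def rgb_to_optical_density(rgb, background=255):
    """Convert one 8-bit RGB pixel to per-channel optical density.

    Optical density is log10(background / value): 0 for clear glass,
    larger where more light was absorbed. `background` = 255 is an
    assumption, not a measured white reference.
    """
    # Clamp to 1 so a fully black channel does not produce a division by zero.
    return tuple(math.log10(background / max(v, 1)) for v in rgb)


# Hypothetical pixel values, chosen only for illustration.
nucleus = (90, 60, 150)      # bluish purple
cytoplasm = (235, 150, 190)  # pink

for name, pixel in (("nucleus", nucleus), ("cytoplasm", cytoplasm)):
    od = rgb_to_optical_density(pixel)
    print(name, [round(x, 2) for x in od])
```

Each of the three numbers reflects absorption by whatever mixture of stains was present at that spot. Separate haematoxylin and eosin estimates only appear after a further computational step, which is why shifts in staining or scanner response change the numbers an algorithm sees.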
Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.
2.3 Sampling and resolution trade-offs, including 20x versus 40x
What you need to know
- Sampling is how finely you measure the tissue
- In digital terms this is µm/px.
- In microscope terms this is like deciding whether you mostly work at 10x, 20x, or 40x.
- What changes when you go from 20x to 40x scanning
- Pixels become smaller on the tissue, so you capture more detail.
- For a given field of view, you now have roughly four times as many pixels, because the pixel count doubles in both width and height.
- File sizes grow, scan times increase, and network and storage load increase.
- Many services successfully use 20x scanning (around 0.5 µm/px) for routine surgical cases
- For a broad range of H and E tasks, 20x sampling has been shown to be diagnostically safe when properly validated.
- 40x scans or rescans are often reserved for:
- small biopsies where every nucleus counts
- cases where mitotic figure counting, microorganisms, or very fine architectural detail drive the decision
- particular subspecialties that have agreed they genuinely need that level of detail.
- You should think in terms of tasks, not magnification labels
- “Is 20x enough” is not a universal question. It becomes:
- For this tumour type and grading system, are the relevant features adequately sampled at our chosen µm/px?
- For this IHC assessment, can I reliably see what I need to see at this sampling?
- Your local validation studies should include the kinds of cases where you suspect resolution might be critical.
- Practical trade-offs you should be able to discuss
- Scanning everything at 40x may:
- stress storage and archives
- increase scan failures in some scanners
- slow down viewing and AI processing
- offer minimal benefit for many bread-and-butter cases.
- A mixed approach is common:
- baseline scanning at a carefully chosen µm/px
- selective higher resolution capture where there is a clear justification in terms of patient safety or specific diagnostic needs.
- Talking to IT and administration
- When you say “we need 40x”, you should be ready to explain:
- which cases specifically
- what risk is mitigated
- what the cost is in terms of scan time and storage
- whether this might be handled through a protocol such as “selective high resolution rescanning” rather than default 40x for all slides.
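The arithmetic behind these trade-offs can be sketched as follows. This is a rough illustration, assuming 20x corresponds to about 0.5 µm/px and 40x to about 0.25 µm/px, 8-bit RGB pixels, and a hypothetical 15 mm by 15 mm scanned area. The function names and the 10% rescanning rate are invented for the example, and the sizes are uncompressed, so stored files will be much smaller.

```python
def pixel_count_ratio(mpp_coarse: float, mpp_fine: float) -> float:
    """How many times more pixels the finer scan needs for the same tissue area."""
    return (mpp_coarse / mpp_fine) ** 2


def uncompressed_gb(width_mm: float, height_mm: float, mpp: float,
                    bytes_per_pixel: int = 3) -> float:
    """Uncompressed size in gigabytes (10**9 bytes) of an 8-bit RGB scan."""
    width_px = round(width_mm * 1000 / mpp)
    height_px = round(height_mm * 1000 / mpp)
    return width_px * height_px * bytes_per_pixel / 1e9


def mixed_policy_load(fraction_rescanned: float, ratio: float = 4.0) -> float:
    """Storage relative to an all-baseline service when a fraction of
    slides is additionally rescanned at the finer sampling."""
    return 1.0 + fraction_rescanned * ratio


ratio = pixel_count_ratio(0.5, 0.25)
print("pixel ratio, 40x versus 20x:", ratio)
print("20x uncompressed GB:", uncompressed_gb(15, 15, 0.5))
print("40x uncompressed GB:", uncompressed_gb(15, 15, 0.25))

# Hypothetical policy: every slide at 20x, 10% also rescanned at 40x.
print("mixed policy, relative storage:", mixed_policy_load(0.10, ratio))
print("all slides at 40x, relative storage:", ratio)
```

Numbers like these, replaced with your own scan areas and rescanning rates, are the kind of estimate that makes a conversation with IT and administration concrete.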
Reference
Kumar N, Gupta R, Gupta S. Whole slide imaging (WSI) in pathology: current perspectives and future directions. J Digit Imaging. 2020;33(4):1034–1040. doi:10.1007/s10278-020-00351-z.
2.4 Bit depth, dynamic range, and why 8 versus 12 bits often matters less than you think
What you need to know
- Bit depth is how many distinct intensity levels a scanner can encode for each channel
- 8 bits per channel gives 256 possible values (0 to 255).
- 12 bits gives 4096 values.
- 16 bits gives 65536 values.
- Higher bit depth allows a larger dynamic range and finer distinctions in principle.
- In routine H and E and many brightfield IHC applications
- The limiting factors are often the biology and optics, not the bit depth.
- Tissue thickness, stain variability, and sensor noise mean that beyond a certain point you do not gain clinically obvious improvement by going from 8 to 12 or 16 bits.
- Your monitor and graphics chain also limit what you can actually see.
- Why many systems use 8 bits per channel for viewing
- It is enough to avoid visible banding or posterisation for normal pathology slides.
- Files are smaller and faster to move around.
- Most monitors and operating systems are optimised for colour at 8 bits per channel.
- When higher bit depth might matter
- Certain quantitative workflows, such as detailed densitometry or some fluorescence applications, can benefit from higher bit depths, especially at acquisition time.
- Research projects that work with raw sensor data sometimes use the full native bit depth before mapping down to 8 bits for display.
- Even in those cases, good staining and stable scanner settings usually have more impact than bit depth alone.
- How to think about bit depth in service design
- For routine brightfield work it is usually safer to assume that:
- consistent staining
- stable exposure
- good focus
- and robust validation
are far more important than chasing extra bits.
- Before insisting on higher bit depth, ask:
- What specific task would be unsafe or impossible at 8 bits per channel?
- Is there evidence that clinicians or algorithms perform better with higher bit depth on the slides we actually produce?
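The level counts above, and what happens when higher bit depth data is mapped down for display, can be illustrated with a short sketch. The mapping used here simply drops the low-order bits. That is an assumption made for clarity: real acquisition pipelines may apply tone curves or other processing, and the function names are hypothetical.

```python
def levels(bits: int) -> int:
    """Number of distinct intensity values available at a given bit depth."""
    return 2 ** bits


def to_8bit(value: int, source_bits: int = 12) -> int:
    """Map a higher bit depth sample to 8 bits by dropping low-order bits."""
    if source_bits < 8:
        raise ValueError("source_bits must be at least 8")
    if not 0 <= value < levels(source_bits):
        raise ValueError("value is out of range for the source bit depth")
    return value >> (source_bits - 8)


for bits in (8, 12, 16):
    print(f"{bits} bits per channel: {levels(bits)} levels")

# Sixteen neighbouring 12-bit values (16 to 31) all land on one 8-bit value.
print({to_8bit(v) for v in range(16, 32)})
```

The distinctions lost in this mapping are finer than one display level, which is one reason the extra bits rarely change what is visible on a routine brightfield slide.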
Reference
Aeffner F, Zarella MD, Buchbinder N, et al. Introduction to digital image analysis in whole-slide imaging: a white paper from the Digital Pathology Association. J Pathol Inform. 2019;10:9. doi:10.4103/jpi.jpi_69_18.
2.5 Compression in WSI: lossless versus lossy, diagnostic impact, and file size
What you need to know
- Whole slide images are huge, so compression is unavoidable
- A single uncompressed slide can be several gigabytes, and often tens of gigabytes at fine sampling.
- Compression makes storage and network use manageable.
- Two main types of compression
- Lossless
- Reduces file size without changing any pixel values.
- The original image can be reconstructed exactly.
- Gives only moderate savings but is conceptually safe.
- Lossy
- Discards some information to achieve much higher compression.
- The decompressed image is not identical to the original.
- The key question is whether the loss is diagnostically acceptable.
- The idea of diagnostically acceptable irreversible compression
- There is a range of lossy compression where the differences are invisible or irrelevant for diagnosis.
- Studies with JPEG2000 and other methods have shown that pathology slides can be compressed several-fold before observers begin to lose accuracy.
- Past that range, artefacts appear:
- blocky areas
- halos around edges
- blur that softens fine detail.
- Clinical responsibilities around compression
- Compression settings are not “IT only” decisions. Pathologists should:
- participate in choosing compression levels
- inspect slides at the proposed settings, including challenging cases
- be involved in validation studies that compare diagnostic performance at different compression ratios.
- If your lab changes compression settings, this is effectively a change to the imaging system and can require revalidation, especially if you use AI or quantitative tools.
- Compression and AI or quantitative analysis
- Moderate lossy compression is often tolerated by algorithms, especially if both training and deployment use the same settings.
- Very high compression or mismatched compression between training and deployment can degrade model performance or introduce subtle biases.
- For any algorithm, you should treat the compression configuration as part of the model’s environment:
- if you change it, you may need to retest the model.
- Questions you should be able to ask and answer
- Do we know what compression scheme and ratio our scanners or archives use for WSI storage?
- Has this been tested in our validation, including a realistic mix of easy and hard cases?
- If we adopt AI tools, are we sure they were developed and tested with similar compression settings to ours?
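The difference between lossless and lossy compression can be demonstrated on a toy example. The sketch below is not how whole slide images are actually compressed: it uses Python's general purpose zlib module as a stand-in for a lossless codec, and a crude rounding step as a stand-in for the information a lossy codec such as JPEG or JPEG2000 discards. The synthetic data are invented purely for illustration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the example is repeatable

# Synthetic stand-in for image data: flat runs of 64 samples (multiples
# of 8, at most 240) plus 0 to 7 levels of fine "texture" on each sample.
tile = bytes(8 * ((i // 64) % 31) + rng.randint(0, 7) for i in range(4096))

# Lossless: compress, decompress, and recover every value exactly.
lossless = zlib.compress(tile)
restored = zlib.decompress(lossless)

# Crude lossy step: round each sample down to a multiple of 8, which
# discards the fine texture, then compress what is left.
quantised = bytes((v // 8) * 8 for v in tile)
lossy = zlib.compress(quantised)

print("lossless round trip is exact:", restored == tile)
print("lossy version is exact:", quantised == tile)
print("original bytes:", len(tile))
print("lossless compressed bytes:", len(lossless))
print("lossy compressed bytes:", len(lossy))
```

The lossy version is far smaller precisely because the fine texture has gone. Whether the discarded detail matters for diagnosis is the question that validation at your chosen compression settings has to answer.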
Reference
Krupinski EA, Johnson J, Roehrig H, Nafziger J, Lubner M. Compressing pathology whole-slide images using JPEG2000: effects on diagnostic accuracy. J Digit Imaging. 2012;25(3):347–353. doi:10.1007/s10278-011-9423-7.