Chapter 2: Naming, security, and networks
Chapter 3 – Storage 101: where your cases actually live
3.3 Storage tiers – hot, warm, and cold data
What you need to know
- “Hot” storage holds data that is accessed frequently and needs low latency (e.g., current cases, teaching sets in active use). It is typically built on fast SSDs or high-performance SAN.
- “Warm” storage is for data accessed occasionally (e.g., cases from recent months) where a modest delay is acceptable; mid-range NAS or slower disks are common.
- “Cold” storage (deep archive) is for rarely accessed data that must be kept for legal or research reasons; it may use very slow media (tape, cold cloud tiers) with retrieval times of minutes to hours.
- Matching data to the right tier is primarily about clinical access patterns and cost, not technology buzzwords. Misplacing data (e.g., everything hot forever) leads to runaway costs; misjudging “cold” can make re-review and audit painful.
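The tiering idea above can be sketched as a simple rule based on how long ago a case was last opened. This is a minimal illustration only: the function name and the 30-day and 365-day thresholds are hypothetical, and real cut-offs depend on local clinical access patterns, retention rules, and cost.

```python
from datetime import date

# Hypothetical thresholds (days since last access); real values are a local policy decision.
HOT_MAX_DAYS = 30
WARM_MAX_DAYS = 365


def suggest_tier(last_accessed: date, today: date) -> str:
    """Suggest a storage tier from the number of days since the data was last accessed."""
    idle_days = (today - last_accessed).days
    if idle_days <= HOT_MAX_DAYS:
        return "hot"
    if idle_days <= WARM_MAX_DAYS:
        return "warm"
    return "cold"


if __name__ == "__main__":
    today = date(2025, 11, 20)
    for last in (date(2025, 11, 1), date(2025, 3, 1), date(2019, 6, 1)):
        print(last.isoformat(), suggest_tier(last, today))
```

A rule like this is only a starting point; cases flagged for audit, teaching, or re-review may need to stay in a faster tier regardless of age.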
Key reference
- Hart EM, Barmby P, LeBauer D, et al. Ten Simple Rules for Digital Data Storage. PLoS Comput Biol. 2016;12(10):e1005097. doi:10.1371/journal.pcbi.1005097.
3.4 Capacity planning for pathology – from slides to terabytes
What you need to know
- A realistic plan starts from clinical volumes: slides per case, cases per day, expected scanning percentage, and retention requirements. Even conservative assumptions often lead to multi-terabyte annual growth.
- Compression ratios, multiple focal planes, z-stacks, and multi-channel immunofluorescence all increase storage consumption beyond simple “H&E at 40×” estimates.
- Capacity must be planned not just for raw images but also for derived data: thumbnails, label images, AI features, annotations, and exports for research.
- Good planning considers growth over 3–5 years, with staged investments and a clear strategy for when and how to expand or migrate storage without disrupting clinical work.
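The arithmetic behind such a plan can be sketched in a few lines. Every number below is a placeholder rather than a benchmark: the average file size per slide, the 250 working days, the overhead for derived data, and the growth rate are all assumptions to be replaced with local measurements.

```python
def annual_storage_tb(
    cases_per_day: float,
    slides_per_case: float,
    scan_fraction: float,
    gb_per_slide: float,
    working_days: int = 250,        # assumed number of scanning days per year
    derived_overhead: float = 0.0,  # assumed extra fraction for thumbnails, annotations, exports
) -> float:
    """Estimate one year of storage growth in decimal terabytes."""
    slides_per_year = cases_per_day * slides_per_case * scan_fraction * working_days
    raw_gb = slides_per_year * gb_per_slide
    return raw_gb * (1 + derived_overhead) / 1000


def cumulative_storage_tb(first_year_tb: float, annual_growth: float, years: int) -> float:
    """Sum storage over several years when each year's volume grows by a fixed fraction."""
    total = 0.0
    yearly = first_year_tb
    for _ in range(years):
        total += yearly
        yearly *= 1 + annual_growth
    return total


if __name__ == "__main__":
    # Placeholder inputs: 100 cases/day, 5 slides/case, half scanned, 1 GB per slide.
    first_year = annual_storage_tb(100, 5, 0.5, 1.0)
    print(f"Year 1: {first_year:.1f} TB")
    print(f"Years 1-5 at 10% growth: {cumulative_storage_tb(first_year, 0.10, 5):.1f} TB")
```

With these placeholder inputs the first year comes to 62.5 TB before any backup copies, which shows how quickly modest-looking clinical volumes add up.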
Key reference
- Sinard JH. Practical Pathology Informatics: Demystifying Informatics for the Pathologist. 2nd ed. Springer; 2014. Chapters on laboratory information systems and data storage planning.
Chapter 4 – Naming and versioning: making files findable and safe
4.1 Human- and machine-friendly file naming
What you need to know
- Good names usually combine a stable identifier (e.g., case or specimen ID), a clear description, and a timestamp in a sortable format (e.g., 2025-11-20), separated by simple delimiters (- or _).
- Using safe characters (letters, numbers, hyphen, underscore) avoids cross-platform problems and weird behaviour in scripts or URLs. Avoid spaces, special symbols, and accented characters in filenames when possible.
- File names should be descriptive enough that a person can guess what is inside without opening them, but not so long that they become unwieldy; the “five-second rule” (can you tell in five seconds?) is helpful.
- Consistent patterns across a group or department matter more than any particular choice; they enable simple sorting, searching, and bulk operations across many cases.
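One way to apply these naming rules consistently is to generate names with a small helper instead of typing them by hand. The sketch below is illustrative: the function names, the identifier S25-01234, and the underscore-between-parts layout are assumptions, not a standard.

```python
import re
from datetime import date


def safe_token(text: str) -> str:
    """Replace every run of characters other than ASCII letters and digits with one hyphen."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")


def build_filename(identifier: str, description: str, when: date, extension: str) -> str:
    """Combine identifier, description, and ISO date (YYYY-MM-DD), separated by underscores."""
    return f"{safe_token(identifier)}_{safe_token(description)}_{when.isoformat()}.{extension}"


if __name__ == "__main__":
    # Hypothetical identifier and description, for illustration only.
    print(build_filename("S25-01234", "HE overview 40x", date(2025, 11, 20), "svs"))
```

Because the date is written year-first, a plain alphabetical sort of names that share an identifier and description also sorts them chronologically. Note that this helper drops accented characters rather than transliterating them.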
Key reference
- Wilson G, Bryan J, Cranston K, et al. Good enough practices in scientific computing. PLoS Comput Biol. 2017;13(6):e1005510. doi:10.1371/journal.pcbi.1005510.
4.2 Keeping PHI out of filenames and folder names
What you need to know
- Filenames and folder names are often logged, backed up, and exposed in contexts far beyond their original use (e.g., error logs, support tickets, cloud storage URLs). Treat them as potentially “semi-public.”
- Embedding patient names, full dates of birth, or other direct identifiers in filenames increases the risk of privacy breaches, and can conflict with legal requirements in HIPAA, GDPR, PHIPA, and similar frameworks.
- A better pattern is to use internal identifiers (accession numbers, study IDs) and keep mappings to patient identity inside controlled, audited systems (LIS, EHR) rather than in filenames themselves.
- When legacy systems have PHI-laden filenames, mitigation strategies include access controls, renaming on export, and clear policies about where such files are allowed to be stored or shared.
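Renaming on export can be sketched as a lookup from the legacy filename to an internal identifier. In this illustration the lookup table, the function name, and the example filenames are hypothetical; in practice the mapping would live inside a controlled, audited system rather than in a script.

```python
from pathlib import PurePosixPath


def export_name(legacy_name: str, accession_lookup: dict[str, str]) -> str:
    """Return an export filename built from the internal identifier and the original extension.

    Raises KeyError when no mapping exists, so an unmapped file is never exported
    under its legacy name. The error text deliberately omits the legacy name,
    because that name may itself contain PHI and error messages end up in logs.
    """
    accession = accession_lookup.get(legacy_name)
    if accession is None:
        raise KeyError("no identifier mapping for this file; refusing to export")
    return accession + PurePosixPath(legacy_name).suffix


if __name__ == "__main__":
    # Fictional legacy name and identifier, for illustration only.
    lookup = {"DOE_JANE_1970-01-01_slide3.svs": "S25-01234-A3"}
    print(export_name("DOE_JANE_1970-01-01_slide3.svs", lookup))
```

Renaming the file does not remove identifiers embedded inside it (for example, label images or metadata fields), so this step complements rather than replaces de-identification of the content.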
Key reference
- U.S. Department of Health and Human Services, Office for Civil Rights. Summary of the HIPAA Privacy Rule. Updated 2025. Available at: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html
4.4 Versioning and change history – avoiding “Final_v9_really_final”
What you need to know
- Manual version names (e.g., v1, v2, v3) are better than no versioning at all, but they break down quickly when multiple people edit in parallel, or when it is important to know exactly which version produced a specific result.
- Proper version control systems (e.g., Git) record every change with timestamps and authorship, allow you to see differences, and support branching and merging for parallel work – at the cost of a small learning curve.
- For images and binary files, it is often enough to version small metadata files or analysis scripts while keeping large binaries in well-named folders or object stores, linked by stable identifiers.
- In clinical settings, the key takeaway is to be explicit about which files are authoritative, how they are versioned, and how changes are reviewed – even if not everyone uses full software-engineering tools day-to-day.
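One possible way to version large binaries indirectly is to keep a small text manifest under version control that records each file's stable identifier, location, size, and checksum. The sketch below assumes SHA-256 checksums and a JSON manifest; the function names and manifest fields are illustrative choices, not a prescribed format.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a SHA-256 checksum by reading the file in chunks, so large images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest_entry(stable_id: str, path: Path) -> dict:
    """Describe one binary file by identifier, location, size, and checksum."""
    path = Path(path)
    return {
        "id": stable_id,
        "location": str(path),
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
    }


def manifest_text(entries: list) -> str:
    """Serialise entries in a stable order so that diffs between versions stay small."""
    return json.dumps(sorted(entries, key=lambda e: e["id"]), indent=2, sort_keys=True)
```

The manifest is small enough to track in a tool such as Git, and a changed checksum shows that a binary was modified even when its name stayed the same.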
Key reference
- Wilson G, Bryan J, Cranston K, et al. Good enough practices in scientific computing. PLoS Comput Biol. 2017;13(6):e1005510. doi:10.1371/journal.pcbi.1005510.
Chapter 5 – Privacy and security basics for clinicians
5.1 What privacy laws actually care about
What you need to know
- Despite local variations (HIPAA in the U.S., GDPR in Europe, PHIPA in Ontario, Law 25 in Quebec, etc.), most health privacy frameworks focus on protecting identifiable health information and giving patients meaningful control over how it is used.
- Core principles include: limiting collection and use to what is necessary, obtaining appropriate consent, maintaining accuracy and security, and giving individuals rights to access and correct their information.
- From a clinician’s perspective, the main questions are: “Could this information reasonably identify a patient?” and “Do I have a legitimate clinical, operational, or approved research reason to access or share it?”
- De-identification and pseudonymisation reduce, but rarely eliminate, privacy risk; linkage across datasets can re-identify individuals more easily than many people realise.
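Pseudonymisation is often implemented by replacing an identifier with a keyed hash. The sketch below uses HMAC-SHA-256 from the Python standard library; the function name, the 16-character truncation, and the example key are assumptions for illustration, and key management is deliberately left out.

```python
import hashlib
import hmac


def pseudonym(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a repeatable pseudonym from an identifier using a keyed hash (HMAC-SHA-256).

    The same identifier and key always give the same pseudonym, so records can
    still be linked within a study; without the key the mapping cannot be
    recomputed from the identifier alone.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    key = b"example-key-do-not-use"  # placeholder; a real key must be generated and stored securely
    print(pseudonym("S25-01234", key))
```

This only protects the identifier itself. Dates, rare diagnoses, or other fields left in the dataset can still allow re-identification through linkage, which is why pseudonymised data generally remains subject to privacy controls.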
Key reference
- U.S. Department of Health and Human Services, Office for Civil Rights. Summary of the HIPAA Privacy Rule. Updated 2025. Available at: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html
5.2 Authentication, passwords, and multi-factor – why “something else” is needed
What you need to know
- Authentication answers the question “Are you really who you claim to be?”; authorisation answers “Now that we think you are Dr X, what are you allowed to do?” Both matter for protecting patient data.
- Modern good practice is to use longer, memorable passphrases instead of short, complex passwords, and to pair them with some form of multi-factor authentication (MFA) such as a phone app, hardware token, or biometric.
- For clinicians, the key is to minimise password reuse, use password managers where allowed, and embrace MFA as a way to make stolen passwords much less useful to attackers.
- Shared accounts and passwords are almost always a bad idea in clinical systems: they undermine auditing, incident response, and accountability, and are usually prohibited by policy.
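The case for longer passphrases can be illustrated with a rough guessing-space calculation. The sketch below assumes every character or word is chosen uniformly at random; the 94-symbol pool and the 7,776-word list are example values, and human-chosen secrets are usually far more predictable than these figures suggest.

```python
import math


def random_secret_bits(pool_size: int, length: int) -> float:
    """Bits of entropy for a secret of `length` items, each drawn uniformly from `pool_size` options."""
    return length * math.log2(pool_size)


if __name__ == "__main__":
    # 8 random characters from 94 printable symbols: roughly 52 bits.
    print(f"8 random characters: {random_secret_bits(94, 8):.1f} bits")
    # 5 random words from a 7,776-word list: roughly 65 bits.
    print(f"5 random words:      {random_secret_bits(7776, 5):.1f} bits")
```

Under these assumptions the five-word passphrase is harder to guess than the short complex password, and it is usually easier to remember and type.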
Key reference
- Grassi PA, Garcia M, Fenton JL, et al. Digital Identity Guidelines: Authentication and Lifecycle Management (NIST SP 800-63B). National Institute of Standards and Technology; 2020. Available at: https://csrc.nist.gov/publications/detail/sp/800-63b/final
5.3 Protecting devices and data at rest – encryption and basic hygiene
What you need to know
- Full-disk encryption on laptops and portable devices is now considered a basic control: if a device is lost or stolen, properly implemented encryption can keep patient data unreadable to the finder.
- Auto-locking screens, short idle timeouts, and avoiding unattended unlocked workstations are equally important; many real incidents arise from curious bystanders rather than sophisticated hackers.
- Keeping operating systems and applications patched reduces the risk that known vulnerabilities will be exploited. In most hospitals this is handled centrally, but clinicians should understand why “please reboot to apply updates” is not optional.
- External drives, USB sticks, and personal cloud accounts are common sources of leakage. Departments should have clear, simple guidance on when and how (if ever) such media may be used with patient data.
Key reference
- Scarfone K, Souppaya M. Guide to Storage Encryption Technologies for End User Devices (NIST SP 800-111). National Institute of Standards and Technology; 2007.
5.5 When something feels wrong – first steps in incident response
What you need to know
- If you suspect that an account has been compromised, a device infected, or data sent to the wrong person, the worst response is silence. Early reporting often turns a near-miss into a manageable event.
- First steps usually include: stop using the affected system if safe to do so, preserve evidence (don’t wipe or “clean up” yet), and notify the designated IT/security contact or hotline with as much detail as you can provide.
- Most institutions have policies that treat prompt reporting as a positive behaviour; hiding incidents generally leads to worse outcomes for both individuals and organisations.
- Clinicians should know where to find local incident reporting procedures, what information is helpful to include, and what follow-up to expect after a report is filed.
Key reference
- Cichonski P, Millar T, Grance T, Scarfone K. Computer Security Incident Handling Guide (NIST SP 800-61 Rev. 2). National Institute of Standards and Technology; 2012.
Chapter 6 – Network 101: why “the network” matters to pathologists
6.1 Packets, bandwidth, and latency – a mental model for performance
What you need to know
- Data does not flow across the network as a continuous stream; it is broken up into packets that are sent, routed, and reassembled. Each packet faces potential delays (queueing, propagation, processing) and loss.
- Bandwidth is the maximum rate at which data could be sent over a link; throughput is how much actually gets through; latency is the time it takes for a small message to go from A to B, often quoted as round-trip time (A to B and back).
- High bandwidth with very high latency can still feel slow, especially for interactive tasks like slide viewing where many small requests are made in sequence.
- Understanding these concepts helps pathologists interpret statements like “we have 1 Gbps to the data centre” and ask follow-up questions about latency, contention, and real-world performance.
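A simplified model shows why delay matters as much as bandwidth when many small requests are made one after another. The sketch assumes strictly sequential requests and ignores server time, packet loss, and protocol overhead; the tile count, tile size, and round-trip times are illustrative values. Real slide viewers usually fetch several tiles in parallel, which softens the effect.

```python
def sequential_fetch_seconds(
    n_requests: int,
    bytes_per_request: int,
    bandwidth_bits_per_s: float,
    round_trip_s: float,
) -> float:
    """Total time when each request waits one round trip and then transfers its data."""
    transfer_s = bytes_per_request * 8 / bandwidth_bits_per_s
    return n_requests * (round_trip_s + transfer_s)


if __name__ == "__main__":
    # Illustrative values: 200 tiles of 50 kB each over a 1 Gbps link.
    for rtt_ms in (1, 80):
        total = sequential_fetch_seconds(200, 50_000, 1e9, rtt_ms / 1000)
        print(f"round trip {rtt_ms:>2} ms: {total:.2f} s")
```

With these numbers the same 1 Gbps link takes about 0.28 s at a 1 ms round trip and about 16 s at 80 ms, because waiting dominates and the transfer itself takes well under a millisecond per tile.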
Key reference
- Kurose JF, Ross KW. Computer Networking: A Top-Down Approach. 8th ed. Pearson; 2021. Chapters 1–3.
6.2 HTTPS, TLS, and why browsers show padlocks
What you need to know
- When you access a web-based LIS, PACS, or WSI viewer over HTTPS, the connection is typically protected by Transport Layer Security (TLS), which provides confidentiality and integrity against eavesdropping and tampering.
- Certificates and certificate authorities (CAs) underpin the browser’s decision to show a “secure” padlock; misconfigured or expired certificates often explain “Your connection is not private” warnings.
- TLS does not by itself guarantee that the application is trustworthy or that data is appropriately handled once it reaches the server – it only protects the data in transit between your device and that server.
- Clinicians do not need to tune cipher suites, but should understand that insisting on encrypted connections (HTTPS/VPN) is about protecting patient data from being read or modified in transit, especially over untrusted networks.
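For readers curious about what the browser does behind the padlock, the sketch below uses Python's standard `ssl` module to open a verified TLS connection and read the certificate's expiry date. The function names are illustrative, and the host to check is left to the caller; this is something IT staff would normally run, not a clinical task.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """Create a client context that checks the certificate chain and the host name."""
    return ssl.create_default_context()


def peer_certificate_expiry(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect over TLS and return the server certificate's 'notAfter' date string.

    Raises ssl.SSLCertVerificationError if the certificate is expired, untrusted,
    or issued for a different host name, which is the situation behind many
    browser warnings.
    """
    context = make_verifying_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A successful check says only that the connection to that server is encrypted and the certificate is valid; it says nothing about how the application handles data once it arrives.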
Key reference
- McKay KA, Cooper DA. Guidelines for the Selection, Configuration, and Use of Transport Layer Security (TLS) Implementations (NIST SP 800-52 Rev. 2). National Institute of Standards and Technology; 2019.
6.3 VPNs and remote access – safe ways into the hospital from outside
What you need to know
- Virtual private networks (VPNs) create an encrypted tunnel between a remote device and the hospital network, making that device appear (logically) as if it were inside the firewall.
- VPNs are powerful: once connected, remote users may have broad access to internal systems. That is why strong authentication, up-to-date devices, and clear policies are critical for telepathology and home reporting.
- Split tunnelling, always-on VPNs, and client posture checks are examples of design choices that balance usability with security; clinicians should be aware of local policies and avoid ad hoc workarounds (e.g., forwarding cases to personal email).
- When performance over VPN feels poor, it is often due to home network issues, congestion at the VPN gateway, or per-packet overhead; understanding the basics prepares clinicians for productive conversations with IT support.
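Per-packet overhead can be put in perspective with a one-line calculation. The sketch below assumes a 1,500-byte packet size and a 60-byte tunnel overhead purely as example values; actual figures depend on the VPN protocol and configuration.

```python
def payload_fraction(packet_bytes: int, tunnel_overhead_bytes: int) -> float:
    """Fraction of each outer packet left for the tunnelled traffic after VPN headers."""
    if not 0 <= tunnel_overhead_bytes < packet_bytes:
        raise ValueError("overhead must be non-negative and smaller than the packet size")
    return (packet_bytes - tunnel_overhead_bytes) / packet_bytes


if __name__ == "__main__":
    # Example values only: 1,500-byte packets with 60 bytes of tunnel headers.
    print(f"{payload_fraction(1500, 60):.0%} of each packet carries tunnelled data")
```

Under these assumptions the overhead costs only a few percent of capacity, so when a VPN session feels much slower than expected, home network quality or gateway congestion are worth checking as well.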
Key reference
- Souppaya MP, Scarfone KA. Guide to Enterprise Telework, Remote Access, and Bring Your Own Device (BYOD) Security (NIST SP 800-46 Rev. 2). National Institute of Standards and Technology; 2016.
6.4 Practical troubleshooting and talking to IT
What you need to know
- Simple tools like ping and traceroute (usually run by IT, not clinicians) help distinguish local workstation problems from broader network or server issues. Clinicians mainly need to know what symptoms to describe.
- Useful information includes: which system was being used, what you were trying to do, exact error messages, whether the problem affects others, and whether it persists across devices or locations.
- Keeping a few screenshots and approximate times of failure makes it easier for IT to correlate problems with logs and network metrics.
- A shared vocabulary (latency vs bandwidth, local vs remote, application vs network) keeps discussions calm and efficient when systems misbehave during clinical work.
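The details worth collecting for IT can be captured in a small structured report. The sketch below is one hypothetical layout; the class name, field names, and the example system and error text are invented for illustration, and any such report should avoid patient identifiers.

```python
from dataclasses import dataclass


@dataclass
class ProblemReport:
    system: str               # which system was being used
    task: str                 # what you were trying to do
    error_message: str        # exact wording of any error
    observed_at: str          # approximate time of failure
    affects_others: bool      # do colleagues see the same problem?
    persists_elsewhere: bool  # does it persist on another device or location?

    def to_text(self) -> str:
        """Format the report as plain text suitable for a support ticket."""
        def yes_no(flag: bool) -> str:
            return "yes" if flag else "no"

        return "\n".join([
            f"System: {self.system}",
            f"Task: {self.task}",
            f"Error message: {self.error_message}",
            f"Observed at: {self.observed_at}",
            f"Affects others: {yes_no(self.affects_others)}",
            f"Persists on other devices/locations: {yes_no(self.persists_elsewhere)}",
        ])


if __name__ == "__main__":
    report = ProblemReport(
        system="WSI viewer",  # hypothetical example values
        task="Opening a slide for sign-out",
        error_message="Tile request timed out",
        observed_at="2025-11-20 09:40",
        affects_others=True,
        persists_elsewhere=False,
    )
    print(report.to_text())
```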
Key reference
- Kurose JF, Ross KW. Computer Networking: A Top-Down Approach. 8th ed. Pearson; 2021. Chapters on network troubleshooting and performance.