Compliance at scale: immunizations

Written by Aaron Hammond

April 14, 2023

Nearly every student in America has an immunization certificate of one kind or another. The reliability of these records and compliance with legislated requirements are critical to public health.

Unsurprisingly, a lot of effort goes into verifying these documents.

This contemporary certificate from the state of Oregon demonstrates the most common format of immunization records. Each row holds the series for one kind of vaccination, and each column holds the date of a successive dose in that series.

First, there’s data entry. A typical immunization document may contain entries for about 25 separate shot events. Each of these dates may take around 8 seconds to key in. If a school has 1000 students, that multiplies out to about 55 hours of labor for data entry alone.
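The arithmetic behind that estimate is short enough to check directly. The snippet below only restates the figures from the paragraph above; the variable names are ours for illustration.

```python
# Figures quoted in the paragraph above.
shots_per_record = 25   # shot events on a typical document
seconds_per_date = 8    # time to key in one date
students = 1000         # enrollment at one school

total_seconds = shots_per_record * seconds_per_date * students
total_hours = total_seconds / 3600

print(f"{total_seconds:,} seconds is about {total_hours:.1f} hours")
```

That is 200,000 seconds, or a little over 55 hours, before anyone has judged compliance.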

This grid shows a selection of states' requirements for the varicella vaccine, a more recent addition that is now mandatory nationwide.

Clinic staff must quickly become experts on their state’s immunization rules in order to pore through these documents and judge compliance. In some cases, a single shot taken off-schedule is enough to render a student non-compliant, and mistakes could have professional repercussions for practitioners.

The whole process feels a lot like walking a tightrope while trying to file hundreds of complicated tax returns. Distressing to humans but ripe for automation.

Solving problems

This isn’t a problem that could be solved by an off-the-shelf API, though. The full lifecycle of an immunization import is complex in time, in space, and in computation. There’s a reason it takes humans so long!

It was a terrific challenge. The technology we developed could be used to extract any structured data from a PDF, not just immunization records. For example, we observe that most health details are visible on a student’s annual physical.

Tokenization

We use a combination of computer vision (CV) and natural language processing (NLP) to find any writing in the document.

Individual characters are grouped into meaningful chunks called tokens. The location of each token on the page is one way the system matches dates in the grid with their corresponding type of immunization.

Each box we draw on the PDF represents one token. The bounding box fully encloses that token's text on the page.
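As a rough illustration of how token positions can tie dates to a label, here is a minimal sketch. It assumes each token carries a bounding box in page coordinates with the origin at the top left, and it only handles a row-oriented grid. The names `Token`, `same_row`, and `dates_for_label` are hypothetical and are not taken from our production code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge

    @property
    def y_center(self) -> float:
        return (self.y0 + self.y1) / 2


def same_row(a: Token, b: Token) -> bool:
    """Two tokens share a row when each vertical center falls inside the other's box."""
    return a.y0 <= b.y_center <= a.y1 and b.y0 <= a.y_center <= b.y1


def dates_for_label(label: Token, dates: List[Token]) -> List[str]:
    """Return the date texts on the label's row, to its right, ordered left to right."""
    in_row = [d for d in dates if same_row(label, d) and d.x0 >= label.x1]
    return [d.text for d in sorted(in_row, key=lambda d: d.x0)]


label = Token("Varicella", 10, 100, 80, 112)
dates = [
    Token("06/02/16", 200, 101, 250, 111),
    Token("03/15/12", 120, 100, 170, 112),
    Token("01/09/11", 120, 130, 170, 142),  # sits on the next row down
]
print(dates_for_label(label, dates))  # ['03/15/12', '06/02/16']
```

Comparing vertical centers rather than exact edges tolerates the small misalignments that scanned or handwritten forms tend to have.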

Text translation

International students and recent arrivals may submit immunization documentation in a language other than English. Many states today also offer versions of their immunization forms in other languages for parents to complete.

In these cases, we use machine translation to derive an English-language version of the document for further processing.

The system recognizes Spanish-language immunization labels and parses records consistently.

Date detection

Our first round of detection isolates any dates in the document. This includes the dates of specific shots represented in the grid, the date the report was produced, and the student's birthdate.

Shot dates typically use a numeric format such as MM/DD/YY, but the order of the fields may vary. Handwritten documentation may also use natural language like April 7, 2022, so we’ve built our detector for substantial flexibility.
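To give a feel for what that flexibility involves, here is a small sketch using only the Python standard library. It is not our detector. It assumes month-first order unless the first field cannot be a month, and it resolves two-digit years to the most recent matching year that is not in the future. The names `find_dates`, `NUMERIC`, and `WRITTEN` are made up for this example.

```python
import re
from datetime import date, datetime
from typing import List, Optional

# 04/07/15, 4-7-2015, 13.02.2016
NUMERIC = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4}|\d{2})\b")
# April 7, 2022 or Apr. 7 2022
WRITTEN = re.compile(r"\b([A-Z][a-z]+)\.? (\d{1,2}),? (\d{4})\b")


def _numeric(a: str, b: str, y: str, today: date) -> Optional[date]:
    first, second, year = int(a), int(b), int(y)
    if year < 100:
        year += 2000
        if year > today.year:  # a shot cannot be dated in the future
            year -= 100
    # Assume month-first; fall back to day-first when the first field exceeds 12.
    month, day = (first, second) if first <= 12 else (second, first)
    try:
        return date(year, month, day)
    except ValueError:
        return None


def _written(month: str, day: str, year: str) -> Optional[date]:
    for fmt in ("%B %d %Y", "%b %d %Y"):
        try:
            return datetime.strptime(f"{month} {day} {year}", fmt).date()
        except ValueError:
            continue
    return None


def find_dates(text: str, today: date) -> List[date]:
    """Collect numeric dates first, then written-out dates."""
    found = [_numeric(*m.groups(), today) for m in NUMERIC.finditer(text)]
    found += [_written(*m.groups()) for m in WRITTEN.finditer(text)]
    return [d for d in found if d is not None]


sample = "DOB 04/07/15  MMR 13/02/2016  Tdap April 7, 2022"
print(find_dates(sample, today=date(2023, 4, 14)))
```

Passing `today` in explicitly keeps the two-digit-year rule deterministic and easy to test.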

We automatically tag the owner of the document if we detect their name and birthdate uniquely in the document.
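One way to read "uniquely" is that exactly one student on the roster has both their name and their birthdate in the document. The sketch below takes that interpretation as an assumption. `Student`, `tag_owner`, and the sample roster are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Student:
    name: str
    birthdate: date


def tag_owner(text: str, dates: Iterable[date], roster: List[Student]) -> Optional[Student]:
    """Return the one student whose name and birthdate both appear, else None."""
    lowered = text.lower()
    found = set(dates)
    candidates = [
        s for s in roster
        if s.name.lower() in lowered and s.birthdate in found
    ]
    # Tag only when the match is unambiguous.
    return candidates[0] if len(candidates) == 1 else None


roster = [
    Student("Ana Ruiz", date(2015, 4, 7)),
    Student("Ben Ortiz", date(2014, 9, 30)),
]
owner = tag_owner("Name: ANA RUIZ  DOB: 04/07/2015", [date(2015, 4, 7)], roster)
print(owner)
```

Declining to tag on zero or multiple matches leaves the ambiguous cases for a human to resolve.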

Shot detection

A typical immunization record contains rows or columns of dates that each represent an administration event for a specific type of shot. Each shot type names a different formulation recognized by the CDC, but labeling is inconsistent across states and health record systems.

Our shot detector is resilient to these variations.

The system recognizes the words "Haemophilis Infl B" and correctly identifies the unspecified version of the Hib vaccine (CVX 17).

The system also recognizes the plain "Hib" as representing the unspecified version of the Hib vaccine.

The system recognizes the complete term "Hib (PRP-T)" as distinct from simple Hib and assigns the correct shot type (CVX 48).
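The three examples above can be reproduced with a small alias table and a fuzzy fallback. This is a sketch of the idea, not our detector: the alias list is a hypothetical sample, and the fuzzy step uses `difflib.get_close_matches` from the standard library as a stand-in for whatever matching a production system would need. The CVX codes are the ones named in the captions.

```python
import re
from difflib import get_close_matches
from typing import Optional

# Hypothetical alias table: normalized label -> CVX code.
ALIASES = {
    "hib": 17,
    "haemophilus influenzae b": 17,
    "haemophilus influenzae type b": 17,
    "hib prp t": 48,
    "mmr varicella": 94,
    "mmrv": 94,
}


def normalize(label: str) -> str:
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def detect_cvx(label: str) -> Optional[int]:
    key = normalize(label)
    if key in ALIASES:  # exact match wins, so "Hib (PRP-T)" never collapses to "Hib"
        return ALIASES[key]
    # Fall back to the closest alias to absorb misspellings and abbreviations.
    close = get_close_matches(key, list(ALIASES), n=1, cutoff=0.6)
    return ALIASES[close[0]] if close else None


for label in ("Haemophilis Infl B", "Hib", "Hib (PRP-T)", "MMR-Varicella"):
    print(label, "->", detect_cvx(label))
```

Checking exact aliases before fuzzy ones is what keeps a specific formulation from being mistaken for its generic parent.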

Combinations

Some types of shots represent combinations of two or more standalone vaccinations. A student may receive a HepB-IPV shot, which combines two vaccines into one shot that confers immunity from both diseases. These combination shots likewise satisfy requirements for each of their constituent vaccines.

The system correctly recognizes "MMR-Varicella" as the combination of the two (CVX 94). A color gradient is used to indicate the multiple constituent vaccines.

The system correctly decomposes combination shots into the relevant standardized immunization types actually referenced in state requirements. In this case, an administration of the MMRV vaccine counts as an immunization event for both the MMR requirement and the Varicella requirement.

States write their requirements in terms of immunity to each disease in particular, rather than the shots that confer it. We account for combination shots by inserting events for every constituent vaccine in a combination shot. Visually, we indicate these shots with a color gradient over each immunity granted by the specific kind of vaccine in question.
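In code, that decomposition amounts to fanning one administration out to every immunity it confers. The sketch below assumes a lookup table keyed by CVX code; the table only covers the codes mentioned in this post, and `COMPONENTS` and `expand` are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical table: CVX code -> immunities the shot counts toward.
COMPONENTS = {
    17: ("Hib",),
    48: ("Hib",),
    94: ("MMR", "Varicella"),  # MMRV combination shot
}


def expand(events: List[Tuple[int, date]]) -> Dict[str, List[date]]:
    """Turn (cvx, date) administrations into per-immunity lists of dose dates."""
    by_immunity: Dict[str, List[date]] = defaultdict(list)
    for cvx, given in events:
        for immunity in COMPONENTS.get(cvx, ()):
            by_immunity[immunity].append(given)
    return {name: sorted(doses) for name, doses in by_immunity.items()}


events = [(94, date(2016, 6, 2)), (48, date(2012, 3, 15))]
print(expand(events))
```

After this step the compliance rules never need to know that combination shots exist; they only see dose dates per immunity.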

A combination shot may include two, three, or four vaccines in a single administration. The result is a delightful band of color.

Series inference

Our greatest challenge was finding a way to infer sequences of shots for a particular immunization reliably. Although there are a few common patterns, each immunization document is a snowflake.

Some feature columns of dates, and others orient around rows. Documentation produced by physician offices and EMR systems can follow entirely different geometries. Parents often provide documents that are upside-down or rotated.

To handle as many documents as possible, we attempt the compliance evaluation several times, each time with a different parsing strategy that we have refined over time for a different layout. The strategy that produces the best fit of shot types to shot dates wins.

We choose the strategy that minimizes the number of non-compliant records parsed from the document.
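The selection step can be sketched as a minimization over candidate strategies. The example below is a toy under stated assumptions: it considers only two strategies (labels in the first column versus labels in the first row), and it scores them with a stand-in rule that simply counts required doses. `by_rows`, `by_columns`, `pick_strategy`, and `REQUIRED` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Grid = List[List[str]]
Records = Dict[str, List[str]]  # shot label -> date strings


def by_rows(grid: Grid) -> Records:
    """Treat the first cell of each row as the label and the rest as dates."""
    return {row[0]: [cell for cell in row[1:] if cell] for row in grid if row}


def by_columns(grid: Grid) -> Records:
    """Treat the first cell of each column as the label."""
    return by_rows([list(column) for column in zip(*grid)])


# Stand-in scoring rule: number of doses required per immunization.
REQUIRED = {"MMR": 2, "Varicella": 2}


def count_non_compliant(records: Records) -> int:
    return sum(
        1 for label, needed in REQUIRED.items()
        if len(records.get(label, [])) < needed
    )


def pick_strategy(
    grid: Grid,
    strategies: Sequence[Callable[[Grid], Records]],
    score: Callable[[Records], int],
) -> Callable[[Grid], Records]:
    """Return the strategy whose parse leaves the fewest non-compliant records."""
    return min(strategies, key=lambda strategy: score(strategy(grid)))


grid = [
    ["MMR", "Varicella"],
    ["03/15/12", "03/15/12"],
    ["06/02/16", "06/02/16"],
]
best = pick_strategy(grid, [by_rows, by_columns], count_non_compliant)
print(best.__name__)  # by_columns
```

Reading this grid by rows yields nonsense labels and two unmet requirements, while reading it by columns satisfies both, so the column strategy wins.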

Compliance

At the core of our system is our in-house library of state immunization requirements for students as they progress through school. We encode these requirements in a domain-specific language to evaluate whether a given student’s immunization records are compliant.

Our system evaluates compliance down to a single administration in a series for one specific type of immunization.

We encode immunization requirements for pre-K through 12th grade. These requirements may mandate that a dose be received before or after a certain age or, in some cases, that a particular kind of vaccination be used to satisfy an immunization type. We also encode rules around waivers, exemptions, and titers.
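Our domain-specific language is beyond the scope of this post, but a declarative rule can be approximated with plain data. The sketch below is an illustration only: the `Rule` fields, the `evaluate` function, and the sample rule are hypothetical and do not reproduce any state's actual requirement.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Rule:
    immunity: str
    doses: int              # number of valid doses required
    min_age_years: int = 0  # doses given before this age do not count


def birthday(birth: date, years: int) -> date:
    try:
        return birth.replace(year=birth.year + years)
    except ValueError:  # born on Feb 29 and the target year is not a leap year
        return birth.replace(year=birth.year + years, day=28)


def evaluate(
    rule: Rule, birth: date, given: List[date], exempt: bool = False
) -> Tuple[bool, List[date]]:
    """Return (compliant, rejected doses) for one immunity."""
    if exempt:  # a waiver or exemption satisfies the rule outright
        return True, []
    earliest = birthday(birth, rule.min_age_years)
    valid = [d for d in given if d >= earliest]
    rejected = sorted(d for d in given if d < earliest)
    return len(valid) >= rule.doses, rejected


rule = Rule("Varicella", doses=2, min_age_years=1)  # illustrative values only
birth = date(2011, 1, 9)
doses = [date(2011, 6, 1), date(2012, 3, 15), date(2016, 6, 2)]
print(evaluate(rule, birth, doses))
```

Returning the rejected doses alongside the verdict is what lets a result point at the single administration that caused a problem.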

We use the results of our compliance evaluation to find the best matching between dates and shot types. This is the same immunization compliance calculation we apply elsewhere in the platform.

What's next

In this blog series, we'll examine each of these components one by one. We'll explore the computer science principles and modern technologies that combine to make a robo-nurse.

Developing this technology has stoked our imaginations and spread enthusiasm. There are so many opportunities in ed-tech for beautiful technology. We want to raise your expectations!

Let's start with computer vision.