The measles crisis has thrown into sharp relief how publicising targets reached – and targets missed – can affect the healthcare all of us receive, writes Carl Shuker.
The debate around healthcare targets is hot right now for two reasons: 1) the very public success – and, significantly for some, failure – of the target for immunisations, and 2) the 2018 suspension, to media and party-political outcry, of public reporting of our present slate of health targets, pending a new system.
Targets are hot. They are discrete, they are measurable, they are very public, and they sit directly at the nexus of the media, voters, politicians, and what drives change in the healthcare sector. They are the measurable and reportable – read: announceable – face of a massive, complex and mostly opaque system.
The national health target for immunisation in Aotearoa New Zealand was that 95% of babies receive the full schedule of shots at a number of significant age milestones. It was an important intervention because in the early 2000s, famously if perhaps only anecdotally, a World Health Organization bigwig flew out to inform us that our immunisation rate was worse than that of Chad, the landlocked central African republic.
Introduction of the target (and an enormous amount of work) dragged us up from a 67% to a 93% immunisation rate for two-year-olds by 2013. Crucially for Māori, the target eliminated the equity gap, raising the immunisation rate for tamariki from 59% in 2007 to 93% in 2014.
Immunisation was a top-down intervention, driven by a target, where champions were selected at every level from ministry to district health board (DHB), right down to individual GP practices. The target was clear, the logic was coherent, and people mobilised behind it. By the time we hit about 92%, however, the marginal gains were becoming harder and more resource-intensive to make. There was, at the national level, no further progress toward 95%. People talked about “target fatigue”. Remarkable gains had been made, though; it was a success story.
However – and this is a big however – what we have seen since is backsliding.
Recent analysis suggests the current measles outbreak is not an anti-vaxx issue, it’s an eye-on-the-ball issue. And an equity issue. What that bland, “looks pretty good” national level of 92% in 2018 hides is that the backsliding is hitting those who most benefitted from the target. Immunisation rates for non-Māori children have remained relatively stable, while the gains for Māori have been lost. The immunisation rate for Māori children has now dropped three years in a row and the fall seems to be accelerating. Look at the detail on the graph below. Māori immunisation rates reached statistical parity in 2015; by 2018 Māori babies were being immunised at a rate lower than in 2012, and more than six percentage points lower than non-Māori babies.
Hitting the target and missing the point
This is where we get into the machinery of how targets work in health systems. Because immunisation is one of the easy ones. The target was that more than 95% of infants be immunised to the full schedule. Why 95%? Because that’s the proportion required for herd immunity to set in. Shoot for 95% and we hit the target and get what we want – success, transcendence, problem solved, the point of the target.
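Why the mid-90s, specifically? A rough rule of thumb (my illustration here, not a figure from the target documents themselves) is that the herd immunity threshold sits at about 1 − 1/R0, where R0 is the number of people one infectious case goes on to infect in an unprotected population. Measles is commonly cited with an R0 of roughly 12–18, which is how you land on a number in the mid-90s:

```python
# A minimal sketch, assuming the common textbook approximation for the
# herd immunity threshold (1 - 1/R0) and commonly cited R0 values for
# measles (roughly 12-18). Illustrative only.
for r0 in (12, 15, 18):
    threshold = 1 - 1 / r0
    print(f"R0 = {r0}: roughly {threshold:.0%} of the population needs to be immune")
# R0 = 12 -> ~92%, R0 = 18 -> ~94% - hence a target of 95%.
```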
It’s hardly ever so straightforward.
With targets it’s really important to understand that gorgeous, unpronounceable word: synecdoche. “Sin-neck-doh-kee” is the figure of speech in which a part stands in for the whole. A target shines a light on a single thing – a problem, a condition, a setting – and it generates all kinds of incentives around that thing in order to improve some aspect of it. This is done on the presumption that the targeted improvement will both represent some greater aspect of the system and have positive, or at least no negative, effects on all the other elements that interconnect with the particular thing specified in the target.
And that’s where we get two effects. The first is known as the lamp post effect. Everything under a lamp post is illuminated and clear and we navigate at least somewhat confidently. Everything outside that pool of light is dark, obscure, avoided and even ignored. Unmeasured and not incentivised. Not until much later will conventional studies start to unpick the consequences for that undiscovered country.
The second effect is what’s known as “Goodhart’s law”, after British economist Charles Goodhart. Goodhart’s law in the original is challenging to unpick and generally gets plain-Englished as, “When a measure becomes a target, it ceases to be a good measure”. Put simply, when one thing stands in as a target for another thing the relationship between the two breaks down over time.
So we have a thing that we want to change, like immunity to communicable disease, or like clogged emergency departments (EDs) and patients dying while they wait. We call that Q. We need a measure of Q to know that the things we’re doing to improve Q are having an effect. We call that measure Q1. But Q1 is not quite the same thing as Q. Q1 might be the number of patients discharged from ED in six hours when Q is smooth-flowing EDs, or Q1 might be the number of eight-month-old kids who get all their vaccinations, when Q is herd immunity.
Goodhart’s law states that once you start measuring and incentivising behaviour around Q1, the relationship between Q1 and Q begins to unravel. Under pressure from our managers, our leaders, our government, we game the data, play the system, do anything under the sun to improve Q1, despite what it might mean for Q, the actual thing we’re trying to fix. We address Q1, not Q, sometimes even in spite of or actively against Q. It’s often called “management to measure” and it has arisen out of the school of performance management that has defined the behaviour of late 20th-century business and government. We do whatever we can to hit the target, even if that means missing the point.
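To make the Q/Q1 split concrete, here is a deliberately crude toy model (my own illustration, with made-up numbers – not drawn from any real health data). Effort poured into genuinely fixing Q moves both the outcome and the measure; effort poured into gaming moves only the measure. The reported number looks identical either way:

```python
# A toy sketch of Goodhart's law, using hypothetical numbers.
# Q is the real outcome we care about; Q1 is the measured proxy (the target).
# "Genuine" effort improves both; "gaming" effort improves only the proxy.
def simulate(gaming_share, periods=10, effort_per_period=5.0, start=50.0):
    q = q1 = start
    for _ in range(periods):
        genuine = effort_per_period * (1 - gaming_share)
        gamed = effort_per_period * gaming_share
        q += genuine           # the real outcome only moves with genuine work
        q1 += genuine + gamed  # the reported measure absorbs both
    return q, q1

for share in (0.0, 0.5, 0.9):
    q, q1 = simulate(share)
    print(f"gaming share {share:.0%}: reported Q1 = {q1:.0f}, real outcome Q = {q:.0f}")
# The reported Q1 lands on 100 every time; the real Q ranges from 100 down to 55.
```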
When targets lead us astray: gaming and gilding the lily
Gwyn Bevan is Emeritus Professor of Policy Analysis at the London School of Economics. He was also for a time part of the apparatus which, with some reservations, installed the targets system into the English NHS in 2001. Targets arrived in England with Tony Blair’s government in the form of a publicly reported star rating system for NHS trusts, district groupings of hospitals and other services that are (kind of) the equivalent of DHBs.
The new star ratings were in fact a complex measurement system: complex, yet presented with simplicity and brute force. Trusts were awarded marks out of three stars for their performance on a collection of targets they had to hit, and the results were very publicly published. More stars meant more money, more prestige, and leaders and managers got to keep their jobs. No or low stars meant you got the chop, pour encourager les autres.
“Star ratings,” Professor Bevan wrote in a seminal paper (cited 348 times), with typical understatement and what passes in the British Medical Journal for somewhat risqué humour, “have been criticised for their similarities to the target regime of the former Soviet Union, although NHS managers were threatened with loss of their jobs rather than their life or liberty.”
The reference to Stalinist excesses gave rise to the now accepted shorthand for the new regime: “targets and terror”. Bevan tells a story about a meeting of the “dirty dozen”. Circa 2001, the CEOs of the 12 trusts that had performed lowest on the new star system were hauled before the Department of Health to please explain. Bevan saw the 12 terrified CEOs, he says, waiting for the axe in the DoH foyer on Whitehall in attitudes of despair and dismay much akin to Rodin’s Burghers of Calais, emerging from the city walls to offer their lives to Edward III.
Usefully for the rest of the world, while the English NHS embraced targets and terror, in Wales and Scotland there were targets but no ranking system, and no axe for poor performance. So the UK became a giant natural experiment in how targets incentivise – and transform – healthcare.
The first results to emerge concerned the target around ambulance response times. Ambulances were required to respond to at least 75% of “urgent” calls (category A – often cardiac arrests) within eight minutes. Before targets, about 60% of urgent calls for English ambulances were met within eight minutes. Two years after a target of 75% was set and star ratings were publicly reported, this figure hit and stayed around the target.
In Scotland and Wales? Zero movement from around 50-55% between 2001 and 2007.
Check out the graph.
So the incentive for English ambulances was clear. A national target, publicly reported, with clear and brutal consequences for failing to meet it. They succeeded. How did they do it?
Well, they gamed. Over the Christmas holidays of 2002, hunkered down in the basements under their gloomy central London HQ, analysts working for Bevan (including one who was an actual paramedic) dove deep into the databases of ambulance trusts. They broke the response times down by individual trusts to see how they responded to these targets, made some pretty graphs and eventually published. And we saw what happens when you incentivise people the hard and public way.
How it worked was that when a category-A emergency call came in the clock started ticking, and when the ambulance arrived at the patient a signal was sent back to control and the clock stopped. Thus data points were available for the time of the call and the time of arrival, and the difference between the two was the ambulance’s response time. At one ambulance trust the data showed giant spikes at each single-minute mark. Controllers were rounding ambulance arrival times (up to 200 extra calls) to the nearest minute, perhaps naturally enough in the days of slightly shonky 2G coverage.
But at another trust, a “noisy” scatter of arrival times between six and seven minutes was followed by a giant spike in the last ten seconds before the eight-minute mark ticked over – and then a crash down to near-single figures.
Picture those drivers watching the clock, hammering it down the country lanes to get to the old chap having a heart attack on his lawn, slamming their fists down on a huge red button as they spray gravel on the driveway to stop the clock at 7 minutes 57 seconds. Everyone high-fiving. It was clearly nonsense.
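A back-of-the-envelope version of the sort of check Bevan’s analysts could run (purely illustrative: the numbers below are invented, not the actual trust data) is to bin the recorded response times and look for an implausible pile-up in the seconds just under the eight-minute cut-off:

```python
from collections import Counter

# Illustrative only: hypothetical recorded response times, in seconds.
# A genuine distribution should tail off smoothly; a pile-up just below
# 480 seconds (eight minutes) is a red flag for clock-stopping.
recorded_times = [437, 452, 460, 468, 471, 475, 476, 477, 478, 478, 479, 479, 479, 495, 512]

bins = Counter(t // 10 * 10 for t in recorded_times)  # 10-second bins
for lower in sorted(bins):
    flag = "  <-- suspicious spike just under the target" if lower == 470 else ""
    print(f"{lower:3d}-{lower + 9:3d}s: {'#' * bins[lower]}{flag}")
```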
But what Bevan and Hamblin found was that most of these trusts were gilding the lily. Hitting the target wasn’t solely down to gaming. They did improve. Some took extraordinary and inventive measures. Ambulances were stationed in streets known as hot spots for calls at certain times on certain days – the end of a row of pubs on a Friday night, for example. Other trusts “deputised” neighbourhood station wagons to snap up patients in rural areas. Unpicking the gaming from the improvement was incredibly complex, but Bevan and Hamblin concluded that real improvement had happened – and, furthermore, that any system of measurement with any teeth is bound to encounter gaming and should guard against it accordingly.
Aotearoa New Zealand and the future of targets
So where do we sit, armed with this information? New Zealand’s previous target regime was limited and variable but it had teeth too. We had targets for our ED waiting times (perhaps the most studied): that 95% of waiting patients be admitted to a ward, discharged or transferred within six hours. Clogged EDs dominated the news ten years ago and this was a target for which there was public and political appetite. It was developed with a working group made up of actual emergency medicine doctors; it required substantial hospital reorganisation and huge effort; and, the literature suggests, it appears largely to have worked. In the first quarter of 2019 nearly 90% of New Zealand patients were out of ED within six hours, and a study looking at outcomes from 2006 to 2012 suggested decreased crowding and 700 fewer deaths in EDs in 2012 than would otherwise have been the case. That is, at a population level at least, the target improved outcomes and experience of care in our emergency departments.
Efforts were made to counter the gaming seen with the equivalent English target, where parts of English EDs – actual corridors, in some cases – were reclassified as wards, so patients could be moved 20 feet, called “admitted” and the target met. Because, in New Zealand, the local clinicians “owned” the target, it was felt gaming was unlikely to be tolerated on their “patch”. However, a study published this month shows gaming went on in at least four EDs – “short stay units” attached to EDs were used, for example, so patients could be recorded as “transferred” for further observation (or in fact “decanted”) and the clock stopped, even when they were being cared for by the same ED staff. Sometimes, even when real improvements were seen (gilding the lily again), the clock was simply stopped before patients left ED.
Other naysayers point (anecdotally) to unforeseen and unintended consequences in other parts of complex hospital systems, perhaps invisible to the policy makers: the intensive care unit (ICU) becoming crammed with patients it was unable to transfer, and cardiac surgeries being cancelled (cardiac patients all go to the ICU to be woken post-op), because the ICUs were now competing with the emergency department for beds on the wards where all these new ED patients had been sent. Dozens of cardiac cases transferred to the private sector; increased length of stay in an expensive ICU; a huge cost blowout; ICU patients wheeling around the tea trolley for overworked staff.
Yes, it’s complex. A hospital, they argue, is not a linear thing: it is more akin to the MONIAC “financephalograph” device that modelled money in an economy as a complex series of water flows in tubes. But people with healthcare problems aren’t alike and they aren’t like money, let alone water: picture those flows at varying levels of syrupiness, some approaching a solid, to get an idea of the difficulties involved. Touch one part of a complex thing and something else invariably will be affected.
We have targets for dispensing quit advice for smoking – does Q1 (GPs giving advice to smokers) affect Q (the number of smokers)? If Q1 hits the target and Q reduces, is it hitting Q1 that did it? Hard to say. Is it the most valuable use of someone’s 15-minute consultation time, and does it reduce smoking or just alienate people with other problems? Also hard to say in the absence of any large published studies or evaluations.
Recent thinking tends to argue that the way forward might be high-level targets we can all agree on – reduced levels of deaths from things we know are treatable (AKA “amenable mortality”), for example – coupled with local freedom for hospitals or services to pursue agreed means of achieving the target in their own specific contexts.
Tim Tenbensel, associate professor of health policy at Auckland, says that you need to understand the environment the specific target is embedded in. He argues that targets can work well when:
a) The target measure is a good representation of the desirable outcome
b) There is a system of independent verification available
c) Feedback is timely so that successful strategies can be disseminated across the country.
The example of immunisation shows a close match to each of these three criteria. Q1 is Q. An immunisation register exists to track practice independently of clinicians and providers, precluding gaming (no similar independent verification of length of stay in ED was available). Until targets were retired, the results were published in half-page pieces in our newspapers, and there was tremendous national momentum to just help children.
Immunisations were probably the low-hanging fruit for the kinds of extraordinary results targets can show. The slippage we see now is likely part and parcel of the entire equity problem our country faces. The hardest to reach, the most disadvantaged and those most likely to benefit are the easiest for an institutionally racist system to neglect afresh when the grip of a good target loosens.
How closely do our other targets fit Tenbensel’s criteria? Every country’s healthcare system is a complex adaptive system – a sleeping, or at least dozing, giant until awakened by interventions to which it responds in sometimes surprising ways, ones we only discover much later as the studies and evaluations roll in.
What we do know is that targets are powerful in their effects, in the complexity of the clinical response to them, and in the reception they get in public. They must be thought through, be connected intimately with the outcome we want, be independently verifiable and have companion measures of unintended consequences; they must be connected to some kind of guided way to improve against them, and most importantly, most of those affected must agree they are the right thing to do.