Several recent papers have found that algorithms are better than judges at predicting human behavior. In one high-profile example, Kleinberg et al. used an algorithm to re-evaluate pre-trial release decisions made by judges in New York City from 2008 to 2013. They showed that an algorithm given variables about “characteristics of the defendant’s current case, their prior criminal record, and age (but not other demographic features like race, ethnicity, or gender)” could dramatically outperform the judges’ actual decisions. As the authors put it, relying on their algorithm could have produced “crime reductions up to 24.7% with no change in jailing rates, or jailing rate reductions up to 41.9% with no increase in crime rates.”
Their finding is sufficiently thought-provoking that Malcolm Gladwell used it as a motivating example for his new book, Talking to Strangers. The relevance to Gladwell’s argument is that the judges have access to all the information the researchers fed into the algorithm, and they can also look the defendant in the eye when assessing their character. But even with access to more information, the humans are systematically worse at decision-making than the computers.
This pessimistic evidence about the quality of judicial decision-making reminds me of John Roberts’s analogy of judges as umpires. If what we are after is calling balls and strikes, computers can now do it more accurately, quickly, and cheaply than umpires.
But a paper released yesterday suggests that maybe judges are better thought of as referees than umpires. Megan Stevenson and Jennifer Doleac’s paper examines how judges in Virginia who were given algorithmic risk assessment scores changed the way they made sentencing decisions. They found that the information influenced the judges’ decisions: judges gave defendants with higher risk scores longer sentences and defendants with lower risk scores shorter sentences.
However, the judges deviated from the risk scores in an important way. Even though young defendants had high risks of recidivism, the judges systematically gave them more lenient sentences. This deviation leads Stevenson and Doleac to persuasively conclude that the judges have goals other than just predicting recidivism in mind when they are making sentencing decisions.
This makes judges seem more like basketball referees than baseball umpires. When refs are calling a basketball game, most fans are open to the idea that the refs might call the game differently depending on the circumstances. When the stakes are high (at the end of the game, in the playoffs), we’re often cool with the refs giving the players more leeway. If this is the right analogy, maybe it’s a little unfair to say that judges aren’t doing a great job of calling balls and strikes when they are actually playing a different game.