Here is the refutation referred to in the article. The article's author did a poor job of explaining it, and the reason to reject isn't "the effect is too large" -- the effect is real -- but this study explains it:
http://journal.sjdm.org/16/16823/jdm16823.html
1. Prisoners are ordered by whether they have an attorney (so those going last in a session are self-represented)
2. Judges often order prisoners by "complexity of the case", so the ones taking the most time go first (and are therefore most likely to get a favorable decision; a short/easy case is probably a no-parole situation)
3. Statistically, cases that are granted parole take longer than those that aren't, so even if cases are randomly ordered, a long case that falls near the end of a session gets pushed back to the start of the next one. A simulation shows that even with random ordering, you still get the same graph (because the long/complex cases that could be paroled tend to be moved to the next session when the current one is almost out of time).
So after reading this, the graph means:
Ratio of cases with a lawyer that can be finished in the remaining time in the session.
Both values decrease as the session continues, so it produces a steeply down-sloping graph that resets each session to some approximately random value (~.65).
Thank you so much for providing this comment. When I was done reading the original link I was wondering if I had missed the author's explanation as to what the alternative theory was. It seemed they were simply saying, "I don't believe it, so it can't be true." Which is more or less what they did, at least in writing. The author may be fully aware of what you mention, but that certainly did not come across at all in what was written.
The author is arguing that this should have been done by the original researchers: since the implied conclusion is impossible and thus wrong, they shouldn't have been allowed to publish the study with this conclusion without a proper interpretation. And even if he might have some interpretations of his own, what matters is that it was their job to produce them and include them in the paper.
Quoting OP, "It is up to authors to interpret the effect size in their study, and to show the mechanism through which an effect that is impossibly large, becomes plausible. Without such an explanation, the finding should simply be dismissed."
And specifically, that the impossibly large effect has no plausible psychological basis, putting the onus on the researchers to perform additional due diligence in the face of data that is completely outside the bounds of the theoretical model(s) under discussion.
> the reason to reject isn't "effect is too large"
The author argues the effect is too large to be a psychological effect. Your explanation only supports this notion: the effect is real, certainly, but not psychological in nature.
So yes, the effect is too large to be a psychological effect, because the consequences of such a strong psychological effect would not be limited to the courtroom alone.
The author never tries to explain the cited paper, they just offer 'a different approach' for how we can know that the original effect isn't caused by hunger.
I appreciate the perspective that, because psychology varies from human to human, a very large effect size almost certainly invalidates whatever conclusions are drawn from the data. This kind of thinking lets the reader evaluate further psychological studies without relying on specific refutations; instead they now have a "bad smell" sense to fall back on.
Ordered by whether you have an attorney really saved me when I was 19. I sat patiently, and took notes from attorneys defending people with similar charges. I said the same things, and got the same leniency.
I don't remember much from my operating systems class, but I do remember that "shortest job first" is a more efficient scheduling algorithm than "longest job first".
I cannot fathom why anyone would want to schedule 12 cases that will likely take 15-30 minutes each after the one that will almost certainly take more than 4 hours. I understand that type of scheduling in hospitals, because the complex, time-consuming case is probably the guy who will die if not treated immediately, while the simple cases can afford to wait, but no one is going to die or go broke if their court appearance gets pushed back a week because too many "easy" cases suddenly popped up and pre-empted it.
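The shortest-job-first intuition is easy to check with a toy computation; the case durations below are invented for illustration:

```python
# Toy sketch: why "shortest job first" minimizes average waiting time.
# Hypothetical case durations, in minutes.
durations = [240, 20, 15, 30, 25]  # one long case, several short ones

def avg_wait(order):
    """Average time a case spends waiting before it starts."""
    total_wait, elapsed = 0, 0
    for d in order:
        total_wait += elapsed
        elapsed += d
    return total_wait / len(order)

sjf = avg_wait(sorted(durations))                # shortest job first
ljf = avg_wait(sorted(durations, reverse=True))  # longest job first
print(f"SJF average wait: {sjf:.0f} min")
print(f"LJF average wait: {ljf:.0f} min")
```

With these numbers SJF gives a far lower average wait than LJF, matching the OS-class intuition; of course this toy metric ignores costs like predictability and rescheduling.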
The big thing missing from that assessment is case variability.
The clear-cut case with a plea bargain already assessed is short, but also predictable. If you schedule it at 11:50 AM, you'll still go to lunch at 12:00. The case that's going to take 16 different motions with unclear evidence isn't just long, it's unpredictable. Even if such cases take an average of one hour, you don't want to put them at 11:00 AM.
Shorter tasks are also easier to reschedule. Pushing three short cases from before lunch to after is much lower-impact than interrupting one big case midway through.
So I don't think your assessment is wrong, but there are other factors at play.
You impact fewer people by delaying one long case instead of five short ones. But you're likely to forget details, and/or spend lunch thinking about a case that's still open -- which IMO is more motivation.
It's akin to a good algorithm for avoiding memory fragmentation - allocate your largest blocks first.
Assuming you need a contiguous block of time for each case, you should schedule the longest case that you expect to complete in the time available, leaving a bunch of cases you can easily slot into whatever time is left.
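A minimal sketch of that strategy, with a made-up session length and case durations:

```python
# Sketch: greedily start the longest case that still fits the remaining
# session time, vs. always starting the shortest. All numbers hypothetical.
def idle_time(cases, session, longest_fit):
    """Minutes left unused in one session under the given strategy."""
    remaining = session
    pending = list(cases)
    while True:
        fitting = [c for c in pending if c <= remaining]
        if not fitting:
            return remaining  # nothing else fits: this time is wasted
        pick = max(fitting) if longest_fit else min(fitting)
        pending.remove(pick)
        remaining -= pick

cases = [150, 130, 90, 40, 30, 20]  # case lengths in minutes
print(idle_time(cases, 240, longest_fit=True))   # longest-that-fits
print(idle_time(cases, 240, longest_fit=False))  # shortest-first
```

Here longest-that-fits fills the 240-minute session exactly, while shortest-first strands an hour it can't use.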
They aren't directly comparable. OS scheduling assumes 100% availability of the resources for its own purposes, with no external interruptions.
However, instead assume that the processor is only available for a couple of 4-hour time slices a day, and that, for reasons, starting a task that cannot be finished should generally be avoided. Now, if a task needs an estimated 4 hours, it must be scheduled at the beginning of a slice. If you take the alternative and schedule smaller tasks first -- say they each take 2 hours -- well, you can't plug a 4-hour task into a 2-hour slot, so you've just wasted 2 hours of time.
Maybe they're trying to minimize the number of hours needed to go through the cases yet to be decided. So whenever some free time comes up, they're able to go through that backlog easily.
It could also be that, just as in medicine, harder cases are more relevant than easier cases. If the amount of effort each side puts into their case has any positive relationship with the value of what is at stake, solving harder cases first would free up whatever is at stake and make it available to society again.
Unlike computing power, mental energy does not stay constant throughout the day. A sedentary work style, the mental workload of processing the cases, and I guess [lack of] nutrition or hydration all contribute to fatigue.
The bet here is that simpler cases require less mental energy and can be done on autopilot.
First, the original study in question involved Israeli parole board hearings. I'm not sure if that augurs in favor of few easy cases or many easy cases.
Second, at least in the United States and regarding both general criminal and civil matters, there are few easy cases in terms of how judicial resources are spent. The easy cases would have pled out or settled, with the courts only nominally involved as a matter of process. Trials are arduous time sinks, both inside and outside the court room; few cases go to trial.
On the other hand, if we're talking about discrete hearings in front of a judge, most would be easy. That's because most hearings are for motions or other orders (e.g. approval of a plea deal or settlement), and the complex aspects of those would have largely been hashed out between the parties, in the filings, or in the judge's chambers. That applies to all cases. A trial might take one or two extended courtroom proceedings, but be preceded (and even succeeded!) by a dozen quick hearings on motions and other process.
A lot of replies in here seem to think the article is arguing that the data be disregarded. My reading is that the author is saying that because of the data, the conclusion that the cause is purely psychological must be disregarded.
The argument for that being, partially, that if the effect is caused by hunger that would predict widespread issues elsewhere in society that we do not see and, partially, that there are no other psychological effects this strong.
Another hypothesis is needed. For instance, anyone who has spent time in a courtroom knows judges like to get small simple cases done first to get people on about their day - perhaps the effect is because more complex cases scheduled towards the end of a shift at the bench are more likely to get a negative ruling?
This is what the Tom Stafford [1] article suggests:
The main analysis works like this: we know that favourable rulings take longer than unfavourable ones (~7 mins vs ~5 mins), and we assume that judges are able to guess how long a case will take to rule on before they begin it (from clues like the thickness of the file, the types of request made, the representation the prisoner has and so on). Finally, we assume judges have a time limit in mind for each of the three sessions of the day, and will avoid starting cases which they estimate will overrun the time limit for the current session.
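A minimal simulation of that mechanism, assuming random case order. The ~7 vs ~5 minute durations come from the quote above; the 60-minute session and 35% base rate are invented for illustration:

```python
import random

# Sketch of the mechanism quoted above: case order is random, favourable
# rulings take ~7 minutes vs ~5 for unfavourable, and the judge won't start
# a case expected to overrun the session. Session length and base rate are
# assumptions, not figures from the study.
random.seed(42)
SESSION = 60                      # minutes per session (assumed)
DURATION = {True: 7, False: 5}    # favourable: ~7 min, unfavourable: ~5 min
P_FAV = 0.35                      # overall favourable rate (assumed)

def simulate(n_cases=100_000):
    """Favourable-ruling rate by position within a session."""
    counts = {}                   # position -> [favourable, total]
    remaining, pos = SESSION, 0
    for _ in range(n_cases):
        fav = random.random() < P_FAV
        if DURATION[fav] > remaining:   # won't finish: defer to a new session
            remaining, pos = SESSION, 0
        c = counts.setdefault(pos, [0, 0])
        c[0] += fav
        c[1] += 1
        remaining -= DURATION[fav]
        pos += 1
    return {p: f / t for p, (f, t) in sorted(counts.items())}

rates = simulate()
for p in range(4):
    print(f"position {p}: favourable rate {rates[p]:.2f}")
```

With these numbers the first case of each session comes out noticeably more likely to be favourable than later ones, even though no judge in the model is ever hungry.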
From the article: "If hunger had an effect on our mental resources of this magnitude, our society would fall into minor chaos every day at 11:45 a.m."
This looks compelling at first but how many people are really making any considerable decisions on an empty stomach before lunch? Everyone at my company sets up meetings earlier (9 or 10am) or after lunch because they know how they are when they are hungry and/or thinking about their lunch break. How many people are actually subject to this effect on a given day and how many are making important decisions at that time?
There are a lot of aspects to our daily life that could be under this effect without us observing them individually or as a whole as "minor chaos".
Automobile drivers could be more aggressive and dangerous when hungry. "the highest percentage of them (crashes) occurring between 6-9pm -- evening rush hour. Commuters rush home daily to eat, spend time with their families, watch television and/or get to work on a second job." [1] and "Nationwide, 49% of fatal crashes happen at night, with a fatality rate per mile of travel about three times as high as daytime hours." [2]
Perhaps this effect is causing more automobile crashes but as individuals we chalk it up to people just wanting to get home under their own free will (because we all want to get home after work for a variety of reasons). Or we've just become accustomed to how evening rush hour is and we don't observe it as a more chaotic event affected by psychology.
Separately, why is "minor chaos" necessary to prove the data? Those are just the author's parameters. Even if what the judges do when hungry is messed up, it doesn't cause minor chaos, and they're still making substantial decisions. The aggregation of this effect could amount to minor chaos and yet not be readily observable, contrary to what the author suggests.
So, to the author's key argument about minor chaos: if this effect has always been around, then the chaos in our daily life might not be attributable or observable even though it exists. Or the chaos might not exist at all, because the number of people truly affected on a given day is very small.
> How many people are actually subject to this effect on a given day and how many are making important decisions at that time?
That is the point: this effect is so remarkable that even a small number of people making a small number of decisions over a long period of time would add up to a huge effect.
Maybe it does. How many 2pm meetings are held to correct for 11am meetings? The thing is, most systems have a lot of checks and balances built in. Even if a decision is made, it is quite easy to revisit it hours later. Even days after, it is often pretty easy to revise a decision.
It would be better to look in areas where decisions are irreversible and consequences are immediate instead of society in general.
> a small number of people making a small number of decisions over a long period of time is going to add up to a huge effect
That hasn't been shown but I agree it's possible. As I mentioned, if it's true and there is a huge effect as you suggest or minor chaos as the author argues but it's not observable, then the effect is still real.
The key argument from the author against this effect being true is that minor chaos would manifest because it's remarkable in the judges. But we had to do a study to find this in judges... And judges are in a unique and powerful position.
>The argument for that being, partially, that if the effect is caused by hunger that would predict widespread issues elsewhere in society that we do not see and, partially, that there are no other psychological effects this strong.
I like that the author is invoking a consistency check from a separate domain (a very good practice!), but this particular one is over-reaching, for several reasons:
1) Judges don't personally bear a negative cost from overly harsh judgments (within the variation shown on these cases). So you can reasonably believe that judges are influenced by hunger (and likewise for anyone who knows their decision isn't life-or-death), but also believe that when there is more at stake, the general population is more careful and introspective (at these mild hunger levels).
2) We do in fact, anecdotally, see the "hangry" phenomenon.
3) We do see people rioting when resources are short, especially food. It would be surprising if you got the opposite result -- if judgments were hasty when you're very hungry, but not sorta-hasty when you're sorta-hungry.
This feels like half good science (be skeptical of outlandish results) and half bad science (it can't be true! so... dismiss it?). Humans think in narratives, which is why every time the stock market goes down 3 points it's because "investors are nervous about Syria", and when it rebounds it's "investors unfazed by Syria" (or whatever).
The scientific approach would be to come up with a plausible narrative and then do everything possible to discredit it. If the narrative survives, it is likely accurate. This hungry-judges narrative seems inaccurate for a lot of reasons, the double dip being the easiest grounds for dismissal. But it still raises the question: why this pattern? Are cases staggered in a specific way? How many cases were looked at? Does this pattern still hold, and could we sit in on a case and see it happen in real time?
As scientists in a field dismissing an outlandish result out of hand is horribly dangerous. As lay consumers of science who don't have the time to double check every paper we hear about we should absolutely ignore counter intuitive papers until such time as they're replicated and accepted by the relevant scientific community.
It could be argued he is too aggressive in saying we should dismiss it outright, but for this finding to be scientifically accepted it demands replication.
It's also not out of line to make a loose "common sense" appeal with exaggerated findings and demand more rigor. Even if this study was replicated, it would still need more exploration of the mechanism and unknown variables. We don't have enough information about what the judges did during lunch recess, as I doubt what they ate is a matter of record.
The study has big procedural failures (assuming randomness with no evidence), big analysis flaws (inferring causation from correlation), and arrives at a result that runs against nearly all the evidence out there.
A) Cases are not presented in a random order; B) judges attempt to complete all cases from a given prison before a break is taken; C) the study groups together "deferred" and "not granted parole" -- it's possible "deferred" cases are pushed to the back of the prison's docket. Also, D) shared representation is common, and in these cases it's possible and likely for the representative (who chooses what order to present his cases to the judge) to present the most promising case first.
How this at all boils down to "hungry judges" and then gets reported as such is basically everything wrong with science reporting (which is to say, almost everything is wrong with science reporting).
I really appreciate the author's critical analysis of this correlation presented as "fact" by Radiolab and I love how Hacker News and other blogs take these types of scientific findings and dig in for the truth. I think the PNAS paper refutes the original conclusion pretty thoroughly - I wish the Nautilus author would just explain that.
I don't think we should dismiss effects just because they seem really large (as the Nautilus author claims) but I do think that it's incredibly irresponsible of Sapolsky and Radiolab to be uncritically citing a study that looks like it was debunked in 2011.
I also think it's strange that the author cites the SJDM paper which is much, much less convincing, claiming that it refutes the original experiment. It looks to me like that paper just shows that by simulating a non-random order of parole requests they can create data that looks like the original experiment.
I love that Hacker News posts these things and people go through and analyze the papers. No one outside of the specialized field could possibly have time to analyze all of these papers but they clearly have implications that matter for everyone. I wish that popular science shows would do a more thorough analysis of these results on their own.
This author misuses a Gaussian distribution by claiming that a standard deviation is outrageous... all for a phenomenon that is overwhelmingly likely NOT to be Gaussian distributed.
The justification for this procedure is that the judges are assumed to make decisions by dichotomizing a continuous variable with logistic-distributed error (this is one statistical justification for logistic regression; see https://en.wikipedia.org/wiki/Logistic_regression#Latent_var...). The mean difference in the continuous variable is given by the log odds ratio times some constant, and the standard deviation of the continuous variable is pi/sqrt(3) times the same constant. Because the logistic distribution resembles a normal distribution (see https://en.wikipedia.org/wiki/Logistic_distribution and the figure in the paper), the standardized mean difference given by this method will approximately equal the standardized mean difference of the latent continuous variable.
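Concretely, that conversion amounts to dividing the log odds ratio by the logistic standard deviation, pi/sqrt(3). A quick sketch (the odds ratio used here is illustrative, not a figure from the study):

```python
import math

# Under a logistic latent-variable model, a log odds ratio maps to an
# approximate standardized mean difference (Cohen's d) by dividing by the
# logistic SD, pi/sqrt(3).
def log_odds_to_d(odds_ratio):
    """Approximate Cohen's d implied by an odds ratio."""
    return math.log(odds_ratio) / (math.pi / math.sqrt(3))

# An odds ratio around 38 (illustrative) works out to roughly d = 2, the
# effect size quoted for the hungry-judges study.
print(round(log_odds_to_d(38), 2))
```

Seen this way, the quoted d of about 2 corresponds to an enormous odds ratio, which is the basis of the "implausibly large" complaint.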
There are a lot of problems with the conclusions Radiolab and other science journalism draws from the original study. But this article isn't better: the author dismisses the results of the study as impossible, because it seems impossible, because we would already know, etc.
If I ran a study that said whenever a bell rings at 11:45AM students in public schools exit their classrooms (99% within 2 minutes) and quickly file into the cafeteria, and that this was evidence for mass systemic Pavlovian conditioning, what would you say?
Might you say, "that's an impossibly unnatural conclusion", given everything we accept to be true, including our lived experiences--as students, as human beings?
Of course, it's a _logical_ fallacy to say so. But pure, unadulterated Aristotelian logic has surprisingly little application even in science. While as a logical matter such a conclusion on its face might not be impossible, it's not necessarily hyperbolic to label it so even in scientific discourse. Not least because everything we know about Pavlovian responses suggests that there should be far more outliers, such that there must be at least some structural component involved, not a purely psychological response. In other words, the result is too _clean_. The real world is much messier, especially in the world of human psychology in complex settings, and the odds of seeing such a clean and consistent response relationship are extremely slim.
Quantum physics suggests that you could spontaneously teleport a kilometer away. Would I be wrong in believing impossible a paper that concluded that you spontaneously quantum teleported from home to work? What if the paper fails to establish--even nominally--the absence of other, less shocking, explanations?
Logically speaking, the teleportation might not be impossible, but in most other contexts (including statistics and other forms and methods of reasoning) one might fairly call the conclusion impossible, reserving the label "implausible" for scenarios more deserving of critical assessment.
> If I ran a study that said whenever a bell rings at 11:45AM students in public schools exit their classrooms (99% within 2 minutes) and quickly file into the cafeteria, and that this was evidence for mass systemic Pavlovian conditioning, what would you say?
> Might you say, "that's an impossibly unnatural conclusion", given everything we accept to be true, including our lived experiences--as students, as human beings?
No, I would say that there are alternate theories which better explain the data. And it would not be a logical fallacy to say that.
> Of course, it's a _logical_ fallacy to say so. But pure, unadulterated Aristotelian logic has surprisingly little application even in science. While as a logical matter such a conclusion on its face might not be impossible, it's not necessarily hyperbolic to label it so even in scientific discourse. Not least because everything we know about Pavlovian responses suggests that there should be far more outliers, such that there must be at least some structural component involved, not a purely psychological response. In other words, the result is too _clean_. The real world is much messier, especially in the world of human psychology in complex settings, and the odds of seeing such a clean and consistent response relationship are extremely slim.
The problem with the "argument from too-clean" is that clean data occurs all the time. If you drop a ball in my living room it will fall down 100% of the time. Are we to disbelieve these results because they are too clean? Obviously not.
Look at your analogy. Obviously the migration to the lunch rooms isn't caused by Pavlovian response (at least not exclusively) but that's not because the data is wrong: the students do migrate to the lunch room at the described time. If you're claiming this data is too clean to be believable, then you're claiming that the students don't migrate to the lunch room in consistent numbers. This is just as wrong as the Pavlovian response theory.
Exactly. That's the same mentality people have when someone solves a complex problem with a simple solution. E=mc^2...so obvious, duh!
It's like saying that anything we can observe or intuitively know cannot possibly be accurate because it is self-evident. So no one can have an original, creative thought either because someone else would have already thought of it.
Thankfully, other posters here have provided a legitimate argument against the original paper.
>But I want to take a different approach in this blog. I think we should dismiss this finding, simply because it is impossible. When we interpret how impossibly large the effect size is, anyone with even a modest understanding of psychology should be able to conclude that it is impossible that this data pattern is caused by a psychological mechanism.
Isn't it? The answer wasn't that judges were making decisions based on their hunger, it was that judges arrange their schedules for some reason other than random. If a biologist working with samples gets a reading that can't be explained by a natural process but can be explained by a miscalibrated piece of equipment or a test tube that wasn't properly washed, it absolutely is how science works for the problem to be identified and the labwork to be run correctly, rather than publishing a result based on bad data. It's the same here; if the magnitude of the effect you find is at least an order of magnitude higher than any other study of similar effects, before publishing the finding you should have an airtight case that what you found is the effect you're measuring, not a data quality problem. They couldn't have an airtight case because their findings weren't driven by what they thought they were measuring.
> If hunger had an effect on our mental resources of this magnitude, our society would fall into minor chaos every day at 11:45 a.m. Or at the very least, our society would have organized itself around this incredibly strong effect of mental depletion.
Kind of pedantic, but we do indeed organize society around lunch time.
It would be great if this kind of criticism started by just writing down what the author sees in that diagram. What I see, without training, looks totally normal, as if it were strongly validating the claim.
> But I want to take a different approach in this blog. I think we should dismiss this finding, simply because it is impossible.
That is truly begging the question. He goes into details about what the impact to society would be if we're that affected by hunger/fatigue, but the validity of the finding still remains.
> It is up to authors to interpret the effect size in their study, and to show the mechanism through which an effect that is impossibly large, becomes plausible. Without such an explanation, the finding should simply be dismissed.
Again, horrible circular logic here. Simply disregarding things because they're "impossible" is the antithesis of science. The authors of the paper gave a hypothesis that explains the (undisputed) data.
You should not simply dismiss this by saying "it's not true because it can't possibly be true". For instance, try explaining the germ theory of disease to somebody before Antonie van Leeuwenhoek's microscopes revealed microorganisms.
I agree that simply disregarding things because they're impossible is not great science, but in this case this isn't great science from the start, because we're simply measuring two of many, many variables. We're only looking at judgement outcomes, and time of day.
The thing is, this result is about the story... about the narrative the findings tell. The original findings told the story of "hungry judges are grumpy judges", but that was conflating correlation with causation. There are other variables which were NOT controlled for (i.e. not great science) that tell a DIFFERENT story.
This article is about the author pointing out that there are multiple ways to tell this story, and this particular way is about as impossible as it gets, statistically speaking.
> This article is about the author pointing out that there are multiple ways to tell this story, and this particular way is about as impossible as it gets, statistically speaking.
The author doesn't cite any statistical examples for his refutation of the study. He simply says that the effect is too big, gives some examples of how big the effect is in other situations, and cites absolutely zero statistics to support his refutation.
If there was a study where they compared relative conviction rates of judges who had a 10 minute snack recess every two hours, if that rate was the same, that would be some evidence.
In fact, it seems as though the conviction rate goes back to normal every time the judge takes a break, which might suggest that taking a break is the trigger, not the food. It would be interesting to see what would happen if the judges were allowed breaks, but not allowed to eat or drink during them.
As a layman, it felt like his article was addressed at professional psychologists, not laymen like myself (or, presumably, yourself).
It read like it was assumed you'd recognize that such a result should be interpreted as massively overblown.
It would be like if you wrote a couple of tests and your tooling said you had 100% code coverage; you'd be like, "Wait, that just can't be right. I was only writing tests for one function..." Or imagine you go to download a 100GB file from some server and the download completes in 5 seconds; you'd be like, "Wait, that... just can't be right. I don't even have an SSD; allocating the disk space alone should take longer than that..."
Of course, that's just my assumption, and my impression.
This makes much more sense than your previous response.
When you have unexpected results, you need to examine your assumptions and the methodology behind them.
You can't simply dismiss the results out of hand. In your examples of code coverage, or fast file transfers, you need to dig deeper, not dismiss the results out of hand.
For instance, testing the Object.toString() function (although I'm not sure how you'd test that) is something that could result in abnormally high test coverage. Also, when you transfer a file quite quickly, is the OS doing any sort of compression on the data? Is it only for that file? How lossy is the transfer? Etc etc.
Simply dismissing the results out of hand because they disagree with your preconceived notions of reality is wrong.
Japanese scientists were convinced that the beriberi epidemic was due to germs rather than a nutritional deficiency because that was the prevailing notion of the day.
> Based on this data, the difference between the height of 21-year-old men and women in The Netherlands is approximately 13 centimeters. That is a Cohen’s d of 2. That’s the effect size in the hungry judges study.
> If hunger had an effect on our mental resources of this magnitude, our society would fall into minor chaos every day at 11:45 a.m. Or at the very least, our society would have organized itself around this incredibly strong effect of mental depletion. Just like manufacturers take size differences between men and women into account when producing items such as golf clubs or watches, we would stop teaching in the time before lunch, doctors would not schedule surgery, and driving before lunch would be illegal. If a psychological effect is this big, we don’t need to discover it and publish it in a scientific journal—you would already know it exists. Sort of how the “after lunch dip” is a strong and replicable finding that you can feel yourself (and that, as it happens, is directly in conflict with the finding that judges perform better immediately after lunch—surprisingly, the authors don’t discuss the after lunch dip).
It is impossible because we have very strong evidence against it.
The flaw is that the leap from decisions in difficult cases to dangerous driving or other behavior is a wild guess. He sets up straw-man versions of what the impact of the evidence should be, then rebuts them.
The reality is that the same psychological state that makes a judge less lenient might also make us better drivers, not worse.
You're missing the point. The point is that the effect sizes in question are large enough that people should notice them in their day-to-day lives, because the proposed mechanism of action should apply not only to judges but to everyone, everywhere. But we don't see this big, obvious effect in other places.
I'd like to postulate that such chaos may actually exist now; it is just spread throughout the day, as not everyone takes breakfast/elevenses/lunch/dinner at exactly the same time as their peers, and metabolisms vary from person to person.
> It is impossible because we have very strong evidence against it.
We do not have evidence against it.
Everything he has written is the _lack_ of things that he thinks should logically happen, rather than things that are happening that would disprove the hunger hypothesis.
Evidence would be some other experimental data, such as the number of birdies in golf remaining the same before and after lunch.
>Everything he has written is the _lack_ of things that he thinks should logically happen, rather than things that are happening that would disprove the hunger hypothesis.
Actually, arguing from the "lack of things that should logically happen" is exactly the same as arguing from "things that are happening".
In other words, what he says is the same whether he says "It can't happen because X would happen too, and we don't see X", or he says "It can't happen because we see Y" (where Y is not X).
> Actually, arguing from the "lack of things that should logically happen" is exactly the same as arguing from "things that are happening".
Reasoning from first principles is a huge downfall the higher up the science stack you go. The only time you really reason from first principles is in math. When you get to psychology and sociology, you need to try to fit observed data to hypotheses.
The author has suggested interesting follow-up experiments, but has given no backing for the disproof of the hypothesis other than "I don't believe that can possibly be true."
That's not science, or even an informed opinion on science. That's just religion.
>Reasoning from first principles is a huge downfall the higher up the science stack you go. The only time you really reason from first principles is in math. When you get to psychology and sociology, you need to try to fit observed data to hypotheses.
Observed reality/experience -- which is what the author invokes -- is not "first principles" though.
> Observed reality/experience -- which is what the author invokes -- is not "first principles" though.
You're missing a huge point though, which is the difference between "observed reality" and calling out BS.
The author simply says that all these other things should happen if the hypothesis were correct, without any experimental data behind it. There is no evidence behind his assertion that all these other things should happen, so he's reasoning from his life experiences and his personal biases.
I completely agree. The article's author suggests that the study's result is too good (to be true), and casts that as the sole reason to reject the study. I think that such a surprising result definitely deserves scrutiny, but barring problems with the method or stats used, it's merely surprising, not "impossible".
Within the paper itself, the authors clearly considered this:
> A key aspect for interpreting the association between the ordinal position of a case and parole decisions is whether an unobserved factor determines case order in such a way that yields the pattern of results we obtain.
They then interview judges and lawyers and examine the process to rule these out! Which is rare enough for a quantitative study like this (where they often publish the conclusions and just say "it could be X"). Unless the author can cite a more specific grievance, I'll trust the data.
The beginning of the article has several links giving reasons why the correlation could be explained by other effects. This article explains why the correlation can't be explained by the given reason.
> This article explains why the correlation can't be explained by the given reason.
The correlation exists, and the authors of the paper suggest a possible hypothesis that would explain it.
This article explains nothing; it just goes into flights of fancy describing possible outcomes based on the author's logic. If there were actually studies of leniency vs. hunger that contradict the original findings, that would explain why the correlation should not be explained by the given reason.
In fact, the referenced study is a perfect example of a good science paper, and the scientific process in general: "Hey, we have a big anomaly here, here's a possible explanation. This explanation also suggests further studies that could probe the problem."
Well it's probably not a bad paper. And the follow-up from the authors seems pretty solid too. http://m.pnas.org/content/108/42/E834.full But that doesn't mean it's right. And yeah the linked article isn't going to get published in PNAS or anything but it's an interesting thought experiment.
1. Prisoners are ordered by whether they have an attorney (so those going last in a session are self-represented).
2. Judges often order prisoners by "complexity of the case", so the ones taking the most time go first (and are therefore most likely to get a favorable decision; a short/easy case is probably a no-parole situation).
3. Statistically, the cases that gain parole take longer than those that don't. So even if cases are randomly/normally distributed, a long case that falls at the end of a session gets pushed back to the beginning of the next one. A simulation shows that even with random ordering you still get the same graph, because the long/complex cases that could be paroled tend to be moved to the next session when the session is almost out of time.
So after reading this, the graph means: the ratio of cases with a lawyer that can still be finished in the time remaining in the session.
Both values decrease as the session continues, so it produces a heavily down-sloping graph that resets each session to some approximately random value (~.65).
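To see that step 3 alone is enough, here is a minimal sketch of that kind of simulation (not the one from the linked paper; all parameters are invented for illustration). Parole decisions are assigned at random with a fixed 50% rate, favorable cases just take longer on average, and a case that doesn't fit in the remaining session time is deferred to the top of the next session. The parole rate still drops with ordinal position and resets at each session start:

```python
import random
from collections import defaultdict

random.seed(0)

SESSION_MINUTES = 120          # assumed session length
FAV_MEAN, UNFAV_MEAN = 15, 5   # assumed mean hearing durations (minutes)
BASE_RATE = 0.5                # true parole rate, independent of time of day

def make_case():
    """A case is (favorable?, duration); favorable cases run longer on average."""
    favorable = random.random() < BASE_RATE
    mean = FAV_MEAN if favorable else UNFAV_MEAN
    return favorable, random.expovariate(1 / mean)

cases = [make_case() for _ in range(100_000)]
stats = defaultdict(lambda: [0, 0])   # ordinal position -> [favorable, total]
remaining, position, i = SESSION_MINUTES, 0, 0
while i < len(cases):
    favorable, duration = cases[i]
    if duration > remaining and position > 0:
        # Case doesn't fit in the time left: defer it to the next session,
        # where it will be heard first.
        remaining, position = SESSION_MINUTES, 0
        continue
    stats[position][0] += favorable
    stats[position][1] += 1
    remaining -= duration
    position += 1
    i += 1

first_rate = stats[0][0] / stats[0][1]
late = [stats[p] for p in stats if p >= 8]
late_rate = sum(f for f, _ in late) / sum(t for _, t in late)
print(f"parole rate at position 0:      {first_rate:.2f}")
print(f"parole rate at positions >= 8:  {late_rate:.2f}")
```

Even though every decision is a coin flip, position 0 is dominated by deferred (long, therefore likely-favorable) cases, while late positions only have room for short (likely-unfavorable) ones, reproducing the down-sloping, resetting graph with no hunger effect at all.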