Bayes' theorem: the loose barrel problem
This post by Conthe narrates the case and explains the analysis. Conthe refers to the paper Carga de la prueba y responsabilidad objetiva (Burden of proof and strict liability) by Fernando Gómez Pomar, which says the following about the res ipsa loquitur doctrine:
This doctrine, which originated in the Common Law, allows a plaintiff claiming compensation in tort (non-contractual liability) to recover damages without having to prove the defendant's negligence, if the circumstances in which the harm occurred make it impossible, or very difficult, to believe that the harm could have happened had the injurer exercised due care…. At the socially optimal level of care, the probability of an accident and of harm occurring is zero … As the Supreme Court puts it in STS, 1ª, 9.12.1998, the harmful event must have occurred in such a way, or under such circumstances, that it becomes one of those harmful events that do not normally occur except as a result of negligent conduct by the injurer. And this will be all the more likely, ceteris paribus, the less intrinsically dangerous the activity that causes the harm. The Supreme Court's use of the doctrine in medical liability, an activity that the Court itself considers should not, in principle, be governed by risk-based (strict) liability, seems to confirm this conclusion of the theoretical analysis.
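The doctrine's reasoning is essentially Bayesian: if harm (almost) never occurs under due care, then harm is strong evidence of negligence, and the more dangerous the activity is even when performed carefully, the weaker that inference becomes. The sketch below illustrates this under stated assumptions; the function name and all input probabilities are hypothetical and chosen only for illustration, not taken from Gómez Pomar or Conthe.

```python
def posterior_negligence(prior_negligence: float,
                         p_harm_if_negligent: float,
                         p_harm_if_careful: float) -> float:
    """Return P(negligence | harm) by Bayes' theorem (hypothetical helper)."""
    harm_and_negligent = prior_negligence * p_harm_if_negligent
    harm_and_careful = (1 - prior_negligence) * p_harm_if_careful
    total_harm = harm_and_negligent + harm_and_careful
    if total_harm == 0:
        raise ValueError("harm has zero probability under both hypotheses")
    return harm_and_negligent / total_harm


# HYPOTHETICAL inputs for illustration only: negligence in 1% of cases,
# and harm follows negligence half of the time.
PRIOR = 0.01
P_HARM_IF_NEGLIGENT = 0.5

# A more intrinsically dangerous activity means harm is likelier even with due care.
for p_harm_if_careful in (0.0, 0.001, 0.01, 0.1):
    p = posterior_negligence(PRIOR, P_HARM_IF_NEGLIGENT, p_harm_if_careful)
    print(f"P(harm | due care) = {p_harm_if_careful:<5}  P(negligence | harm) = {p:.1%}")
```

With these made-up numbers the posterior falls from 100% (harm impossible under due care, the res ipsa loquitur ideal) to roughly 83%, 34% and 5% as the activity becomes more dangerous.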
BLINKING ON THE BENCH: HOW JUDGES DECIDE CASES, by Chris Guthrie, Jeffrey J. Rachlinski and Andrew J. Wistrich
Next, a summary of an article, also cited by Conthe, that reviews several studies on how (American) judges reason. American judges decide more than 30 million cases a year. First, a short test:
COGNITIVE REFLECTION TEST
(1) A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? _____ cents
(2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? _____ minutes
(3) In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? _____ days
In thirty-five separate studies involving 3428 respondents, Frederick found that subjects on average correctly answered 1.24 of the three items, although results varied across the subject pools. For example, students at the University of Toledo obtained an average score of .57, while students at MIT obtained an average score of 2.18.
Below is the table comparing the scores obtained by the judges with those of students at various American universities, along with the correct answers (at the end). Clearly, if the test means anything and you want your children surrounded by bright classmates, you should send them to MIT.
Among all of the subjects tested, only 17% answered all three questions correctly, while nearly twice that many (33%) answered all three questions incorrectly…. The judges obtained an average CRT score of 1.23 out of a possible 3.00. This score is slightly higher than the average that student subjects at Michigan achieved and slightly lower than the average student subjects at Harvard achieved …the second question is computationally more challenging than the first, yet more judges answered it correctly. Frederick’s discussion of the CRT, however, predicts precisely this pattern because the second question seems more difficult than the first, which suggests to the test taker that reliance on intuition might be unwise.
The CRT assesses a subset of what psychologists include in measures of intelligence—the capability and willingness to deliberate to solve a problem when intuition would lead one astray… By assessing “introspection, verbal reports and scribbles in the margin,” Frederick found that even those subjects who responded correctly often considered the intuitive answer before selecting the correct answer
The second example of intuitive judicial decision making arises from studies of what psychologists call the “representativeness” heuristic. When people rely on the representativeness heuristic, they tend to undervalue statistical information, which can lead to notable decision errors. For example, people tend to discount information about the frequency with which the underlying category occurs, a phenomenon known as “base rate” neglect. In one illustrative study, researchers asked college students to indicate whether a person described as being “of high intelligence, although lacking . . . creativity” who “has a high need for order and clarity” and whose “writing is rather dull” and who seems to have “little sympathy for other people and does not enjoy interacting with others” was a student in either computer science or in humanities and education. Although the participants knew that three times as many graduate students studied humanities and education as studied computer science, they tended to guess that the student was in computer science. Notwithstanding the high relevance of base-rate statistics, people discount their probative value in favor of impressionistic and intuitive reactions to the representativeness of the information.
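Why the base rate matters can be seen with Bayes' rule in odds form: posterior odds equal prior odds times the likelihood ratio of the evidence. In the sketch below the 1 : 3 prior comes from the study as described; the likelihood ratio of 2 is a purely hypothetical assumption (the quoted passage gives no such figure), as are the helper names.

```python
from fractions import Fraction


def posterior_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds: Fraction) -> Fraction:
    return odds / (1 + odds)


# Base rate from the study: three humanities/education students per computer
# science student, i.e. prior odds of 1 : 3 in favor of computer science.
prior_cs = Fraction(1, 3)

# HYPOTHETICAL likelihood ratio (not from the study): assume the description is
# twice as likely to fit a computer science student as a humanities student.
likelihood_ratio = Fraction(2)

post = posterior_odds(prior_cs, likelihood_ratio)
p_cs = odds_to_probability(post)
print(f"P(computer science | description) = {float(p_cs):.0%}")  # 40%

# The description only outweighs the base rate if its likelihood ratio exceeds 3.
break_even = 1 / prior_cs
print(f"Break-even likelihood ratio: {break_even}")
```

Under this assumption the stereotypical description still leaves humanities and education the more probable answer (60%), which is exactly the point base-rate neglect misses.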
The loose barrel case: Byrne v. Boadle
Let us recall the case:
The plaintiff was passing by a warehouse owned by the defendant when he was struck by a barrel, resulting in severe injuries. At the time, the barrel was in the final stages of being hoisted from the ground and loaded into the warehouse. The defendant’s employees are not sure how the barrel broke loose and fell, but they agree that either the barrel was negligently secured or the rope was faulty. Government safety inspectors conducted an investigation of the warehouse and determined that in this warehouse: (1) when barrels are negligently secured, there is a 90% chance that they will break loose; (2) when barrels are safely secured, they break loose only 1% of the time; (3) workers negligently secure barrels only 1 in 1,000 times. We then asked: “‘Given these facts, how likely is it that the barrel that hit the plaintiff fell due to the negligence of one of the workers?’” The materials then asked the judges to answer by choosing one of four probability ranges: 0–25%, 26–50%, 51–75%, or 76–100%.
For an audience of lawyers, it is better to frame the question this way, in terms of ranges, than to ask for an exact probability.
Most of the judges who assessed our problem answered it incorrectly. In fact, only about 40% answered correctly and selected the low range as the actual probability that the accident was the result of negligence. Compared to other people who have evaluated similar statistical problems, the judges we studied performed well. Fewer than 20% of doctors facing a nearly identical problem in a medical context chose the correct answer. Thus, although many of the judges responded intuitively, many others responded deliberatively such that the overall relative performance of judges was admirable.
A review of several months' worth of judgments from the Audiencias Provinciales (Spanish provincial appellate courts) shows that a good share of cases are decided by applying the rules governing proof.
When presented with a problem like this one, most people rely on their intuition—the accident sounds like it was the product of negligence, so intuition would suggest negligence must have caused it. The subjects largely treat the 90% figure as the likelihood that the accident was the product of negligence, thereby converting the true meaning of the 90% statistic (the likelihood of injury given negligence) into its inverse (the likelihood of negligence given injury). A deductive approach reveals that the actual probability that the defendant was negligent is only 8.3%. (“Because the defendant is negligent .1% of the time and is 90% likely to cause an injury under these circumstances, the probability that a victim would be injured by the defendant’s negligence is .09% (and the probability that the defendant is negligent but causes no injury is .01%). Because the defendant is not negligent 99.9% of the time and is 1% likely to cause an injury under these circumstances, the probability that on any given occasion a victim would be injured even though the defendant took reasonable care is 0.999% (and the probability that the defendant is not negligent and causes no injury is 98.901%). As a result, the conditional probability that the defendant is negligent given that the plaintiff is injured equals .090% divided by 1.089%, or 8.3%.”).
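The quoted deduction is Bayes' theorem with H = "the barrel was negligently secured" and E = "the barrel broke loose". The sketch below reproduces it and maps the result onto the four answer ranges offered to the judges; the function names are hypothetical, and boundary values between ranges (e.g. 25.5%) are assigned to the higher range by assumption.

```python
def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]."""
    joint_h = prior * p_e_given_h                # negligent AND barrel falls: 0.09%
    joint_not_h = (1 - prior) * p_e_given_not_h  # careful AND barrel falls: 0.999%
    return joint_h / (joint_h + joint_not_h)     # denominator: 1.089%


def answer_range(prob: float) -> str:
    """Map a probability onto the four answer ranges offered to the judges."""
    pct = prob * 100
    for upper, label in ((25, "0-25%"), (50, "26-50%"), (75, "51-75%"), (100, "76-100%")):
        if pct <= upper:
            return label
    raise ValueError("probability must lie in [0, 1]")


# Facts from the hypothetical posed to the judges.
p = bayes_posterior(prior=0.001, p_e_given_h=0.90, p_e_given_not_h=0.01)
print(f"P(negligence | barrel fell) = {p:.1%}, range {answer_range(p)}")  # 8.3%, 0-25%

# The intuitive error: reading P(fall | negligence) = 90% as P(negligence | fall).
print(f"Inverse-probability answer: range {answer_range(0.90)}")  # 76-100%
```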
In other words, 90% of 1 in 1,000 (0.09%) is smaller than 1% of the remaining 999 in 1,000 (0.999%).
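The same comparison reads more intuitively when restated as counts (what psychologists call natural frequencies). The population of 100,000 barrels below is a hypothetical size chosen only so that every count is a whole number; the rates are those of the inspectors' findings.

```python
from fractions import Fraction

# Hypothetical population of 100,000 hoisted barrels, chosen so every count is whole.
barrels = 100_000
negligent = barrels // 1000             # 1 in 1,000 negligently secured: 100
careful = barrels - negligent           # safely secured: 99,900
fell_negligent = negligent * 90 // 100  # 90% of negligent ones break loose: 90
fell_careful = careful * 1 // 100       # 1% of safe ones break loose: 999

share = Fraction(fell_negligent, fell_negligent + fell_careful)
print(f"{fell_negligent} of {fell_negligent + fell_careful} fallen barrels were "
      f"negligently secured ({float(share):.1%})")
```

Of 1,089 barrels that fall, only 90 were negligently secured, the same 8.3% as the quoted calculation.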
Unlike chess grandmasters, judges are unlikely to obtain accurate and reliable feedback on most of the judgments they make; indeed, they are only likely to receive external validation (or invalidation) of the accuracy of their judgments when their rulings are challenged on appeal. The appeals process, however, does not provide reliable feedback. Many cases settle before appellate courts resolve the appeal; collateral policy concerns influence the outcome of some appeals, clouding the meaning of appellate decisions for the trial judge; and finally, appeals commonly take years to resolve, heavily diluting the value of any feedback. Moreover, the standards of review require appellate courts to give deference to trial judges on many of their discretionary decisions. By the time an appellate court decides an appeal, the trial judge may have forgotten the nuances of the case, the law may have changed, or the judge may have retired or switched assignments. It is thus not surprising that we found no differences in CRT performance based on judges’ experience or length of service. Unlike chess grandmasters, judges operate in an environment that does not allow them to perfect their intuitive decision-making processes.
I am not sure this is true of Spanish first-instance judges, at least in civil matters.
How experts reason, for example, chess grandmasters:
To illustrate, let us consider George, the dermatologist, who is examining a patient who has a growth below the right eye. When he first sees the growth, George has an immediate intuitive reaction. He has seen many growths in the past, although not necessarily below the right eye. However, the similarity between this growth and others of a particular type is striking. He just sees the resemblance without having to expend mental effort. This is George’s tacit system in action. Yet George also knows that errors are made identifying growths. He therefore deliberately checks various features of this particular growth against a mental checklist in order to query his initial diagnosis. This second process is deliberative. It involves recalling details of codified medical knowledge. It involves attention and mental effort. This is the deliberative system at work.
Correct CRT answers: the ball costs 5 cents ($0.05, so the bat costs $1.05); 5 minutes; 47 days.
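The correct CRT answers can be checked with a few lines of exact arithmetic, which is precisely the deliberate step the test rewards over the first intuitive reaction. A minimal sketch; the variable names are illustrative, and the intuitive 10-cent answer is included only to show that it fails the problem's own constraint.

```python
from fractions import Fraction

# (1) bat + ball = 110 cents and bat = ball + 100 cents, so 2 * ball + 100 = 110.
TOTAL, DIFFERENCE = 110, 100
ball = Fraction(TOTAL - DIFFERENCE, 2)   # 5 cents
bat = ball + DIFFERENCE                  # 105 cents
assert ball + bat == TOTAL
# The intuitive answer (10 cents) fails the check: 10 + 110 = 120 cents.
assert 10 + (10 + DIFFERENCE) != TOTAL

# (2) 5 machines make 5 widgets in 5 minutes, so one machine makes one widget in 5 minutes.
rate = Fraction(5, 5 * 5)                # widgets per machine per minute: 1/5
minutes = Fraction(100) / (100 * rate)   # 100 widgets with 100 machines: 5 minutes

# (3) The patch doubles daily and covers the lake on day 48; step back one day at a time.
day, coverage = 48, Fraction(1)
while coverage > Fraction(1, 2):
    coverage /= 2
    day -= 1                             # half the lake on day 47

print(f"ball = {ball} cents, bat = {bat} cents; {minutes} minutes; day {day}")
```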