Thank you for the question. It seems likely that the studies are not reproducible because the mix of cases changes in each new RCT study group. The compromises made to gain a larger cohort size produce a fragile state in which the small baseline group of diseases for which the protocol is not the right treatment is not stable. This baseline (and the associated protocol failure rate) can markedly increase if the disease population mix changes. Therefore one RCT will show benefit for the group and another will not, because it’s a different mix.
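To make the arithmetic concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and comes from no trial: I am assuming a cohort made of just two subgroups, one for which the protocol is the right treatment and one for which it is not, with the same control mortality in both. The names (`pooled_risk_difference`, `misfit_fraction`) are hypothetical.

```python
# All rates below are hypothetical assumptions chosen for illustration only.
CONTROL_MORTALITY = 0.30         # assumed identical in both subgroups
TREATED_MORTALITY_FIT = 0.24     # subgroup for which the protocol is the right treatment
TREATED_MORTALITY_MISFIT = 0.36  # subgroup for which it is not


def pooled_risk_difference(misfit_fraction):
    """Expected treated-minus-control mortality for a cohort with the given mix."""
    if not 0.0 <= misfit_fraction <= 1.0:
        raise ValueError("misfit_fraction must be between 0 and 1")
    fit_fraction = 1.0 - misfit_fraction
    treated = (fit_fraction * TREATED_MORTALITY_FIT
               + misfit_fraction * TREATED_MORTALITY_MISFIT)
    return treated - CONTROL_MORTALITY


# Same protocol, same subgroup effects; only the case mix differs between "trials".
for misfit_fraction in (0.10, 0.30, 0.50, 0.60):
    rd = pooled_risk_difference(misfit_fraction)
    print(f"misfit fraction {misfit_fraction:.2f}: risk difference {rd:+.3f}")
```

With these made-up numbers the pooled benefit shrinks as the misfit fraction grows, vanishes at a 50/50 mix, and reverses beyond that, even though the effect of the protocol within each subgroup never changed.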

No, it is not reasonable. My point is that the hard work of determining the signals that define the different diseases of which sepsis is comprised (and the mortalities of those diseases) has to be done, but this is not possible if we define “sepsis” a priori for RCTs by a set of unifying thresholds.

Very, very good question. We are designing one now, and we don’t have a statistician in the mix yet. Given your question, I will fix that next week.

No. I am simply saying that the math is a continuum from measurement to statistics to output.

Suppose a statistician simply asked: "What was the origin of this measurement (e.g., the SOFA score)? How reproducible is it? Where did these threshold cutoffs come from? Why were these signals chosen?"

These are the questions required to make sure the statistician is not wasting her time embellishing a fake measurement. If nothing else, these questions would result in an enhanced discussion of the limitations of the trial, which would help future researchers avoid repeating the design error. Without the questions, we get what we have already seen in this forum: siloed discussions about the statistics used with the SOFA measurement, with no discussion of the math or limitations of SOFA. That fools the young researcher and statistician into thinking SIRS or SOFA are valid and reproducible measurement tools, so they simply make the same mistake, propagating it for decades.