Lately, I've had the opportunity to work with data from a randomized controlled trial. It's interesting: despite all the effort taken to ensure that treatment assignment is properly randomized, this is still the real world, and things happen. Given these imperfections, it's important to understand exactly what you're doing when you conduct statistical inference. Let me explain.
Scenario
We have a randomized controlled trial in which cancer patients are randomized to either placebo or active treatment. These patients are followed regularly until the occurrence of some end event (e.g. death, disease progression, etc.) to see how the active treatment compares with placebo. In the ideal scientific scenario, both treatment groups would stay on their assigned treatment for their entire follow-up.
In practice, however, the health of many patients in the placebo group declines. For example, we could be interested in overall survival but notice that many patients are experiencing disease progression. As is commonly specified in study protocols, these patients are then allowed to cross over to active treatment. The question for the scientists involved is how to draw inference in the face of this informative crossover.
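To make the crossover concrete: once a placebo patient switches, their follow-up is no longer purely "off treatment" but splits into time before and after the switch. Below is a minimal Python sketch, assuming active-arm patients stay on treatment throughout and placebo patients switch at most once; the `Patient` record and `split_follow_up` helper are hypothetical illustrations, not part of any package.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    arm: str                              # "placebo" or "active" (hypothetical labels)
    observed_time: float                  # time from randomization to event
    switch_time: Optional[float] = None   # time of crossover to active, if any


def split_follow_up(p: Patient) -> tuple[float, float]:
    """Return (time_off_treatment, time_on_treatment) for one patient."""
    if p.arm == "active":
        # Assumption: active-arm patients stay on treatment for all of follow-up.
        return 0.0, p.observed_time
    if p.switch_time is None or p.switch_time >= p.observed_time:
        # Placebo patient who never crossed over.
        return p.observed_time, 0.0
    # Placebo patient who crossed over: off treatment until the switch, on it afterwards.
    return p.switch_time, p.observed_time - p.switch_time


patients = [
    Patient("active", 14.0),
    Patient("placebo", 9.0),
    Patient("placebo", 10.0, switch_time=6.0),
]
for p in patients:
    print(p.arm, split_follow_up(p))
```

The third patient contributes 6 units of time off treatment and 4 on it, which is exactly what a naive placebo-versus-active comparison ignores.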
One well-known approach to accounting for these crossovers when modeling survival is the rank preserving structural failure time model (RPSFTM).
RPSFTM
Before I can really go into this, I first need to review the idea of "counterfactuals." In reality, a person is normally exposed to only one treatment regime, and their outcome is observed under that treatment. For example, in the RCT you randomize each patient to either placebo or active treatment and then follow them to observe their outcome. Now, what if, "contrary to fact," you could observe the same person's outcome under both treatment assignments? We can think of it as duplicating the person into two copies, assigning each copy to a separate treatment arm, and following both to observe their outcomes. This is the idea of counterfactuals. Rather than just accepting the outcome under the realized treatment assignment, you also try to model the counterfactual outcome under the other potential treatment assignments.
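A toy simulation makes the "two copies" picture concrete: each hypothetical subject is given both potential survival times, and randomization then reveals only one of them. This is a minimal Python sketch; the uniform placebo times and the assumption that treatment stretches survival by 50% are arbitrary choices for illustration, not anything estimated from data.

```python
import random

random.seed(1)

# Hypothetical potential outcomes: each subject has a survival time under
# placebo (t_placebo) and under active treatment (t_active).
subjects = []
for i in range(6):
    t_placebo = round(random.uniform(5, 15), 1)
    t_active = round(t_placebo * 1.5, 1)  # assumption: treatment extends survival by 50%
    subjects.append({"id": i, "t_placebo": t_placebo, "t_active": t_active})

# Randomization reveals only one of the two potential outcomes per subject;
# the other one is the counterfactual that a real trial never observes.
for s in subjects:
    s["arm"] = random.choice(["placebo", "active"])
    s["observed"] = s["t_active"] if s["arm"] == "active" else s["t_placebo"]
    s["counterfactual"] = s["t_placebo"] if s["arm"] == "active" else s["t_active"]

for s in subjects:
    print(s["id"], s["arm"], "observed:", s["observed"], "unobserved:", s["counterfactual"])
```

In a real trial only the `observed` column exists; counterfactual methods such as the RPSFTM try to reconstruct the missing one under modeling assumptions.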
The RPSFTM uses this counterfactual concept to model each subject's outcome under no treatment. It helps to start with the simplified scenario of no crossovers. The model tries to estimate the survival time of every subject off treatment. In a typical RCT with 1:1 allocation, half the patients are randomized to placebo. For these subjects there is nothing "contrary to fact," since their observed survival already is their survival off treatment.
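For reference, the standard RPSFTM formulation writes this down explicitly. Each subject's observed time T_i is split into time spent off treatment and time spent on it, and the counterfactual untreated survival time U_i is modeled as follows, with e^\psi the acceleration factor (this is the usual notation for the model rather than something defined earlier in this post):

```latex
U_i(\psi) = T_i^{\mathrm{off}} + e^{\psi}\, T_i^{\mathrm{on}},
\qquad
T_i = T_i^{\mathrm{off}} + T_i^{\mathrm{on}}
```

With no crossover, a placebo subject has T_i^{on} = 0, so U_i(\psi) = T_i for every \psi, which is the point made above. An active-arm subject has T_i^{off} = 0, so U_i(\psi) = e^\psi T_i.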
For the patients on active treatment, we model their survival time off treatment under an accelerated failure time model, making use of the randomization performed at the start of the study. That is, we assume that had these subjects not received active treatment, their survival would have had the same distribution as the survival of subjects off treatment (the placebo group). We therefore rescale the observed survival times of the treated subjects by a candidate acceleration factor (e^\psi) and test how close the resulting survival curve is to that of the placebo group, repeating this over candidate values of \psi. The figure below demonstrates this.
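The same comparison can be sketched numerically. Below is a minimal Python sketch, assuming no crossover and no censoring, so each arm's survival curve is just the fraction of subjects still event-free; the toy times are hypothetical and were built so that the active arm's times are exactly double the placebo times. The curves are compared by their largest vertical gap (a Kolmogorov-Smirnov-style distance), chosen here only because it is easy to read; the post itself uses formal tests for this step.

```python
def counterfactual_times(t_off, t_on, af):
    """U = T_off + af * T_on, where af = e^psi is the acceleration factor."""
    return [a + af * b for a, b in zip(t_off, t_on)]


def empirical_survival(times, t):
    """Fraction of subjects still event-free after time t (assumes no censoring)."""
    return sum(u > t for u in times) / len(times)


def max_curve_gap(u_placebo, u_active):
    """Largest vertical distance between the two step-function survival curves."""
    grid = sorted(set(u_placebo) | set(u_active))
    return max(abs(empirical_survival(u_placebo, t) - empirical_survival(u_active, t))
               for t in grid)


# Hypothetical toy trial: no crossover, no censoring.
placebo_time = [2.0, 4.0, 6.0, 8.0, 10.0]
active_time = [4.0, 8.0, 12.0, 16.0, 20.0]

# Placebo subjects are entirely off treatment, so af has no effect on them.
u_placebo = counterfactual_times(placebo_time, [0.0] * 5, af=1.0)
for af in [1.0, 0.75, 0.5, 0.25]:
    # Active subjects are entirely on treatment, so their times are rescaled by af.
    u_active = counterfactual_times([0.0] * 5, active_time, af)
    print(f"e^psi = {af:.2f}  max gap = {max_curve_gap(u_placebo, u_active):.2f}")
```

The gap shrinks to zero at e^\psi = 0.5, the factor used to build the toy data, and grows again on either side of it.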
To test how close the curves are, we can rely on common statistics such as the log-rank test or the Wilcoxon test; the Cox proportional hazards coefficient can also be used. Our estimate of \psi (and hence of the acceleration factor e^\psi) is then the value that minimizes the test statistic, meaning the two survival curves are as indistinguishable as possible.
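Below is a minimal Python sketch of this grid search with a hand-written log-rank statistic. It is an illustration under stated assumptions, not the implementation in the package mentioned below: the toy data are hypothetical, all events are observed (so no censoring adjustments are made), and the data were constructed so that the true acceleration factor is 0.5, with two placebo patients crossing over partway through follow-up. Minimizing the absolute standardized log-rank statistic is equivalent to minimizing the usual chi-square version.

```python
import math
from itertools import groupby


def logrank_z(times, events, group):
    """Standardized log-rank statistic for group == 1 versus group == 0.

    Positive values mean group 1 has more events than expected under equal hazards.
    """
    data = sorted(zip(times, events, group))
    o_minus_e, variance = 0.0, 0.0
    n_total, n_group1 = len(data), sum(group)
    for t, rows in groupby(data, key=lambda r: r[0]):
        rows = list(rows)
        d = sum(e for _, e, _ in rows)             # events at time t
        d1 = sum(e for _, e, g in rows if g == 1)  # group-1 events at time t
        if d > 0:
            p1 = n_group1 / n_total
            o_minus_e += d1 - d * p1
            if n_total > 1:
                variance += d * p1 * (1 - p1) * (n_total - d) / (n_total - 1)
        # Everyone whose time is t (event or censored) leaves the risk set.
        n_total -= len(rows)
        n_group1 -= sum(g for _, _, g in rows)
    return o_minus_e / math.sqrt(variance) if variance > 0 else 0.0


def counterfactual_times(t_off, t_on, af):
    """U = T_off + af * T_on, where af = e^psi is the acceleration factor."""
    return [a + af * b for a, b in zip(t_off, t_on)]


# Hypothetical toy trial, all events observed; the 4th and 6th placebo
# patients cross over to active treatment partway through follow-up.
t_off = [3, 5, 7, 6, 11, 8] + [0] * 6
t_on = [0, 0, 0, 6, 0, 10] + [6, 10, 14, 18, 22, 26]
arm = [0] * 6 + [1] * 6  # 0 = placebo, 1 = active
event = [1] * 12

grid = [k / 20 for k in range(5, 31)]  # candidate e^psi values 0.25, 0.30, ..., 1.50
z_by_af = {af: logrank_z(counterfactual_times(t_off, t_on, af), event, arm) for af in grid}
best_af = min(grid, key=lambda af: abs(z_by_af[af]))
print(f"estimated e^psi = {best_af:.2f}, psi = {math.log(best_af):.3f}")
```

On this toy data the search lands on e^\psi = 0.5: the counterfactual times of the two arms coincide there, so the log-rank statistic is exactly zero. Swapping `logrank_z` for a Wilcoxon-type statistic or a Cox coefficient would leave the search itself unchanged.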
I've put together an R package that implements this model, available at www.github.com/tranlm/rpsftm. For anyone interested, you can install it directly in R with the following command:
devtools::install_github("tranlm/rpsftm")
As always: Stay hungry. Stay foolish.