Chapter 2 Design Approaches
Parallel-group trials are studies used to develop new therapeutic measures. In these trials, participants who meet the inclusion criteria are randomly allocated to one of two groups (one group receives a treatment, the other does not), and a comparison is then made between the groups to determine the efficacy of the treatment.
In the case of TROPHY, participants were randomised to receive candesartan or placebo. The difference in cumulative diagnoses of hypertension between the treatment and control groups over the 4-year study was the comparison used to determine whether treatment with candesartan suppressed BP after treatment ceased. Our first approach involved simulating parallel-group trials, using varying rules for diagnosis, and analysing differences in cumulative incidence. We explored rates of false positives, false negatives, and power, and found parallel-group designs inadequate. In addition, we lowered the inclusion criteria to allow for people with systolic BP as low as 110 mm Hg; even with this alteration the approach was ineffective.
Our second approach examined whether a crossover design could allow the desired carryover effects to be tested for reliably using simple comparisons of cumulative incidence. In a crossover design, the treatment group would receive the intervention first, and so would be observed during the intervention and carryover periods. The control group would receive the control first, and so would be observed during the pre-treatment and intervention periods. As demonstrated in 2.1, it is plausible that a comparison of cumulative incidence would allow tests and estimates of the difference between the carryover and pre-treatment periods, i.e., of the carryover effect. Although it is unusual to use a crossover design for studying an irreversible outcome, other recent examples exist [29, 30].
In a two-period, two-treatment crossover trial, half the subjects receive treatment A first and then cross over to treatment B, while the other half receive the treatments in the reverse order, as demonstrated by Figure 2.1. For our purposes, treatment A lowers BP while treatment B involves placebo only. Here we see three possibilities: treatment second, treatment first with 1.5 years of carryover, and treatment first with no carryover.
It is plausible that a crossover design allows for detecting the existence of carryover and estimating its cumulative magnitude. When there is no carryover, the treatment and control arms of the study experience equal times with lowered BP, and the numbers of people diagnosed in both arms would be approximately equal at the same time from the start of treatment. Any carryover effect would result in lower cumulative incidence in the treatment-first arm, by an amount that depends on the magnitude and duration of the carryover effect.
Simple comparisons are frequently used to determine treatment effects in crossover trials; we explore this methodology to see whether it correctly identifies carryover. We test an analysis for determining carryover when the effect of treatment is known. This differs from the norm, which is to determine the effect of treatment in the presence of carryover. Is detection of carryover possible using design alone, or will more complex analysis of results be necessary?
Carryover has been an important topic in the literature on crossover trials, but primarily as a nuisance factor. That is, trials have been designed so that the carryover effect need not be known when estimating the effect of an intervention during the treatment period. Even with this goal, it is controversial whether existing crossover designs can usefully handle carryover effects. Stephen Senn cautions that “including carry-over [sic] in the model has a disastrous effect on efficiency.” Two-period crossover designs have been criticised as having low power, which may indicate that this type of design is not suitable [33, 34, 35, 36]. Tests for carryover involving more complex designs are faulted for unrealistic assumptions [33, 34], namely that carryover occurs for at most one period, or that carryover has equal effect throughout all subsequent periods. Stephen Senn laments “that little has appeared … on the subject of modelling for carry-over [sic] that is more grounded in clinical and pharmacological reality.”
This chapter is a systematic simulation study in which we attempt to find more robust designs with which to test a carryover hypothesis, altering various parameters and analysing rates of both false positives and false negatives.
We conducted both parallel-group and crossover simulations to evaluate the tests for the presence of a carryover effect and the estimates of its magnitude. Both sets of simulations began with a baseline systolic BP drawn uniformly between 125 and 140 mm Hg, as the participants in the TROPHY trial were prehypertensive. Because BP increases over time [38, 39], a trend was then added; we used trends of 0, 1, and 2 mm Hg per year, similar to those used in simulations which have replicated TROPHY [21, 22]. BP is variable due to both measurement error and intra-individual variability, which we combined into a single normally distributed error term with standard deviations of 3, 5, and 7 mm Hg [21, 22]. We used treatment effects of −5 and −10 mm Hg. Measurements were taken 3-monthly, 6-monthly, or yearly, and carryovers of length 0, 0.5, 1, 1.5, and 2 years were assumed.
Treatment length was 2 years for the crossover trial, which is the length of time participants received treatment in TROPHY. For the parallel-group trial, the duration of treatment was 1, 1.5, 2, 2.5, or 3 years. Simulations for the parallel-group design also varied the inclusion criteria, by sampling baseline BP from a uniform distribution over 110–140, 120–140, or 130–140 mm Hg. For these simulations we fixed the design at 2 years of treatment, a trend in BP of 1 mm Hg per year, and a standard deviation of 5 mm Hg; the carryover duration was 0, 0.5, 1, 1.5, or 2 years, with measurements 3-monthly, 6-monthly, or yearly.
Cumulative incidence is the proportion of simulated participants diagnosed with hypertension during our 4-year study. The simulations produced an estimate of the cumulative incidence of diagnosis in the two trial arms. The Type I error rate (and the power, not reported) were computed from these two cumulative incidences and the sample size using standard formulae for power calculation. We report the rates of false positives and false negatives.
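The simulation setup described above can be sketched as follows. This is a minimal illustration in Python, not the code used for the thesis. The function names `simulate_participant` and `cumulative_incidence` are hypothetical, and several modelling details are assumptions made only for the sketch: errors are independent at each visit, visits fall at the end of each measurement interval, the full treatment effect persists throughout the carryover period and then stops abruptly, and diagnosis uses the simplest rule (a single measurement above 140 mm Hg).

```python
import random

THRESHOLD = 140.0  # mm Hg, diagnostic threshold for systolic BP


def simulate_participant(rng, treat_start, treat_length=2.0, carryover=0.0,
                         effect=-5.0, trend=1.0, sd=5.0, interval=0.25,
                         study_length=4.0, baseline_range=(125.0, 140.0)):
    """Return the measured systolic BP at each visit for one participant."""
    baseline = rng.uniform(*baseline_range)
    n_visits = int(round(study_length / interval))
    # Assumption: BP stays lowered by the full effect until carryover ends.
    lowered_until = treat_start + treat_length + carryover
    measurements = []
    for k in range(1, n_visits + 1):
        t = k * interval                      # visit time in years
        true_bp = baseline + trend * t        # underlying long-term BP
        if treat_start < t <= lowered_until:
            true_bp += effect
        # Measurement error and intra-individual variability combined.
        measurements.append(true_bp + rng.gauss(0.0, sd))
    return measurements


def cumulative_incidence(rng, n, **kwargs):
    """Proportion of n participants with any measurement above threshold."""
    diagnosed = 0
    for _ in range(n):
        bp = simulate_participant(rng, **kwargs)
        if any(x > THRESHOLD for x in bp):
            diagnosed += 1
    return diagnosed / n


if __name__ == "__main__":
    rng = random.Random(1)
    # Parallel-group trial without carryover: treated arm vs. control arm.
    treated = cumulative_incidence(rng, 10_000, treat_start=0.0)
    control = cumulative_incidence(rng, 10_000, treat_start=0.0, effect=0.0)
    print(treated, control)
```

In this sketch a control arm is obtained by setting `effect=0.0`, and a treatment-second arm of a crossover trial by setting `treat_start=2.0`.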
We define false positives as the probability of rejecting the null hypothesis (there is no carryover) when the model that generated the data has no carryover; we define false negatives as the probability of failing to reject the null hypothesis when the model used to generate the data has carryover.
A three-arm trial combining parallel-group and crossover elements was also simulated, in which one arm received treatment first, another received treatment second, and the third received no treatment. The same values and parameters were used for this simulation as those above.
Rules for diagnosis
In addition to the formula for BP, it was necessary to develop criteria to establish when a person became hypertensive. We analysed five feasible criteria for diagnosing hypertension using a threshold of 140 mm Hg, outlined in Table 2.1: one measurement above the threshold, two consecutive measurements above, three measurements above, the average of two consecutive measurements above, and the average of three consecutive measurements above. To illustrate the importance of measurement error we also considered a rule that diagnosed hypertension when both the measured systolic pressure and the underlying long-term systolic BP were above threshold. This rule is not of practical use, although it could be implemented by averaging a large number of measurements over a period of days for anyone who had a single measurement over 140 mm Hg. Both parallel and crossover designs were analysed using all rules.
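The diagnostic rules can be expressed compactly as functions that return the index of the visit at which diagnosis first occurs. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the "three measurements above" rule means three consecutive measurements, which the text does not state explicitly.

```python
THRESHOLD = 140.0  # mm Hg


def first_run_above(values, run_length, threshold=THRESHOLD):
    """Index of the visit completing the first run of `run_length`
    consecutive measurements above threshold, or None if there is none."""
    run = 0
    for i, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= run_length:
            return i
    return None


def first_mean_above(values, window, threshold=THRESHOLD):
    """Index of the first visit at which the mean of the last `window`
    consecutive measurements is above threshold, or None."""
    for i in range(window - 1, len(values)):
        if sum(values[i - window + 1:i + 1]) / window > threshold:
            return i
    return None


def first_measured_and_true_above(measured, true_bp, threshold=THRESHOLD):
    """Infeasible rule: measured BP and underlying long-term BP both above."""
    for i, (m, t) in enumerate(zip(measured, true_bp)):
        if m > threshold and t > threshold:
            return i
    return None


# The five feasible rules, keyed by a descriptive (hypothetical) label.
RULES = {
    "one above": lambda bp: first_run_above(bp, 1),
    "two consecutive above": lambda bp: first_run_above(bp, 2),
    "three consecutive above": lambda bp: first_run_above(bp, 3),  # assumed consecutive
    "mean of two above": lambda bp: first_mean_above(bp, 2),
    "mean of three above": lambda bp: first_mean_above(bp, 3),
}
```

Applying every rule to the same measurement series makes it easy to see how stricter rules delay diagnosis, which is the mechanism by which the choice of rule interacts with measurement error.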
Results and discussion
Parallel-group design — false positives
Figure 2.2 shows the rates of false positives for four of the rules studied, with values above 25% shown as 25%. These are generally well above the accepted rate of 5%. The x axis of each graph gives the length of time participants received treatment; rows correspond to different measurement standard deviations and columns to different rules. The line types distinguish the frequency of measurements, as indicated in the key. All the results have a trend of 1 mm Hg per year. The Type I error rate is inflated except for the smallest measurement error and shortest period of active treatment. Two rules are omitted from the graph. One, where people are diagnosed when 3 measurements are above the threshold, has been studied previously [22, 23, 24]; the other is the infeasible rule that uses the true long-term average BP. This rule is the only one that achieves close to the nominal Type I error rate, which demonstrates that measurement error impinges upon the effectiveness of parallel designs.
Figure 2.2 used the results of the simulations described in section 2.2 and the R function power.prop.test to compute the power of a two-sample comparison of proportions. We input the proportions of participants diagnosed in two simulated 10,000-person trials in which the treatments in the control and treatment arms differed; the treatment effect in the control arm was always 0 mm Hg. Other inputs included 400 participants and a Type I error rate of 5%.
We used a two-sided alternative hypothesis; the resulting power is displayed in Figure 2.2. None of the trials in this figure has carryover in the model used to simulate the data. Power is the probability of rejecting the null hypothesis that the proportions are equal, which here corresponds to no carryover; because the null hypothesis is true in these simulations, the computed value is the probability of rejecting the null when it is true, i.e., the false positive rate.
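The calculation described here can be sketched in Python using the usual normal approximation for a two-sided, two-sample comparison of proportions. This is an illustration rather than the R code used for the figures; the function name `power_two_proportions` is hypothetical. The formula is intended to mirror the default (non-strict) calculation in power.prop.test, where the sample size is given per group; the text does not say whether the 400 participants are per arm or in total, so the argument is labelled `n_per_arm` as an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, sig_level=0.05):
    """Approximate power of a two-sided test that two proportions are equal.

    p1, p2: cumulative incidences in the two arms (not both 0 or both 1,
    otherwise the standard deviation below is zero).
    n_per_arm: number of participants in each arm (assumed equal).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - sig_level / 2)
    p_bar = (p1 + p2) / 2
    sd_null = sqrt(2 * p_bar * (1 - p_bar))          # under equal proportions
    sd_alt = sqrt(p1 * (1 - p1) + p2 * (1 - p2))     # under the observed ones
    return z.cdf((sqrt(n_per_arm) * abs(p1 - p2) - z_crit * sd_null) / sd_alt)


def false_negative_rate(p1, p2, n_per_arm, sig_level=0.05):
    """Probability of failing to reject the null: one minus the power."""
    return 1 - power_two_proportions(p1, p2, n_per_arm, sig_level)
```

When the two incidences come from simulations without carryover, the value returned by `power_two_proportions` is read as a false positive rate; when they come from simulations with carryover, `false_negative_rate` gives the complementary quantity.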
Parallel-group design — false negatives
Figure 2.3 shows the rates of false negatives when testing a carryover hypothesis when the length of carryover is 2 years. The x axis of each graph indicates the duration of treatment in each simulated trial. This graph demonstrates that we are likely to find carryover when the length of treatment and the carryover period together span the duration of the study. Carryover is 2 years, and lengths of treatment longer than 2 years all have rates of false negatives close to 0 for all rules. Studies with lower standard deviations have higher rates of false negatives with short lengths of treatment; at the onset of the study participants are less likely to be falsely diagnosed, as BP measurements are under the threshold and there is less error. Using the rule which diagnoses by averaging 3 measurements, yearly measurements are not frequent enough for diagnosis to occur when treatment lengths are short. As in section 2.3.1, two rules are not included. We obtained the results for Figure 2.3 by adjusting the false positive calculations described in section 2.3.1 in two ways. Firstly, we included trials where the data were simulated with 2 years of carryover, and secondly, we subtracted the power from 1. With these changes, we have the probability of failing to reject the null hypothesis (that there is no carryover in the model) when there is carryover in the model. This gives us the rates of false negatives.
Parallel-group design — inclusion criteria
Measurement error can lead to false positive diagnosis only when true BP is relatively close to the threshold, so varying the inclusion criteria for baseline long-term-average BP was considered. Figure 2.4 shows differences in cumulative incidence of diagnosis for three inclusion thresholds (110, 120, and 130 mm Hg) in the presence and absence of carryover. Line colour indicates the magnitude of the treatment effect, as shown in the key. The columns indicate differing measurement schedules, while the x axis of each graph indicates the length of carryover. Including participants with lower BP reduces the estimated carryover effect under the null hypothesis, but also under the alternative hypothesis.
Crossover design — false positives
Figure 2.5 shows the Type I error rate for the five rules studied, all of which are higher than the nominal 5% rate, with a maximum rate of 25%. The x axis of each graph denotes the yearly trend in systolic BP; differing line types indicate measurement schedules, per the key. Each rule receives its own column, while standard deviations are in rows. As none of the rules has even a suitable Type I error rate, we do not present results on the power of the tests or the estimates of carryover magnitude.
As in section 2.3.1, the rates of false positives were computed with the R function power.prop.test; the calculations were the same, the only difference being that the control group also received treatment. The rule which involves removing variation is not included; even this rule had a high Type I error rate.
Combination of parallel and crossover
As the parallel-group design has positive bias [21, 22] and the crossover design has negative bias (which will be discussed in section 2.3.6), there is the potential for combining the two and having the biases cancel out. The combination design would be a two-period, three-arm crossover trial in which one arm receives treatment for the first period and placebo for the second, another receives placebo for both periods, and the last receives placebo followed by treatment. Our simulation results show that this combination cannot be made to work reliably. The rules tested demonstrated inconsistent differences in relation to increasing amounts of carryover.
Previous sections in 2.3 used only results from simulations to demonstrate that identifying carryover via design is ineffective. In Figure 1.3 we discussed the reasons for the error in TROPHY’s design. After careful consideration of a crossover design we suggest a heuristic explanation for its failure.
Figure 2.6 illustrates why the crossover design failed and why even slight trends over time are problematic. To demonstrate this we simulate two pairs of systolic BP trajectories with a trend of 1 mm Hg per year, without carryover, with measurements taken every 3 months, and without error. One pair has a baseline systolic pressure of approximately 137.5 mm Hg, the other a baseline systolic pressure of approximately 139 mm Hg. Within each treatment:control pair, the trajectories differ only due to randomisation to treatment first or second. For a crossover design to effectively diagnose carryover using differences in cumulative incidence, diagnosis rates in treatment and control arms should vary based only upon the presence of carryover; no inherent bias can be present. Figure 2.6 shows that the difference in cumulative diagnosis without carryover in this example is biased. The dotted lines indicate that the participants have true incident hypertension and are being treated per trial protocol; their data are no longer available. Of the control-first participants, only the one with high baseline pressure is diagnosed, in the first period, and is not available in the second; of the treatment-first participants, both are diagnosed in the second period. The effect of the trend is to selectively remove higher-risk individuals from the control-first arm of the trial, which results in a bias when testing for carryover.
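The heuristic argument can be reproduced numerically with error-free trajectories. The sketch below is an illustration under stated assumptions, not the code behind Figure 2.6: the function name `first_diagnosis` is hypothetical, the treatment effect is taken to be −5 mm Hg, diagnosis uses a single measurement above 140 mm Hg, and visits fall at the end of each 3-month interval.

```python
THRESHOLD = 140.0  # mm Hg


def first_diagnosis(baseline, treat_start, effect=-5.0, trend=1.0,
                    treat_length=2.0, interval=0.25, study_length=4.0):
    """Time in years of the first error-free measurement above threshold,
    or None if the participant is never diagnosed. No carryover is modelled."""
    n_visits = int(round(study_length / interval))
    for k in range(1, n_visits + 1):
        t = k * interval
        bp = baseline + trend * t
        if treat_start < t <= treat_start + treat_length:
            bp += effect                      # BP lowered while on treatment
        if bp > THRESHOLD:
            return t
    return None


if __name__ == "__main__":
    for baseline in (137.5, 139.0):
        treatment_first = first_diagnosis(baseline, treat_start=0.0)
        control_first = first_diagnosis(baseline, treat_start=2.0)
        print(baseline, treatment_first, control_first)
```

Under these assumptions both treatment-first participants are diagnosed after treatment ends, whereas in the control-first arm only the participant with the higher baseline is diagnosed (before treatment starts) and the other is never diagnosed. The cumulative incidences therefore differ between arms even though no carryover was simulated.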
Although carryover effects are potentially important, especially for intensive lifestyle interventions, they are difficult to assess reliably. Most parallel-group designs examined had low levels of false negatives. However, in a wide range of parallel-group designs with simulated data we have shown that randomisation fails to preserve the Type I error rate even approximately. The bias is smaller when the active treatment period is short and the follow-up is long (relative to the spacing of measurements), when the measurement error in systolic BP is (unrealistically) small, and when the inclusion criteria are broad enough to allow participants who are far from the threshold, and thus not ethical to treat. A short active treatment period and broad inclusion criteria also reduce the estimated carryover effect when carryover is truly present, so they are not a solution to the problem. Rather than modifying the design so that carryover effects can be demonstrated by comparing cumulative incidence of diagnosis, it may be necessary to develop new statistical methodology to extract valid estimates from these designs. Also, a simple comparison of the cumulative incidence of diagnosis in a crossover trial provides a valid test for carryover effects only when there is no trend in the underlying outcome variable (or the trial is too short for the trend to be apparent). Unfortunately, blood pressure in prehypertensives, fasting glucose in prediabetics, and lipid testing all exhibit non-negligible trends. It does not appear possible to design a parallel-group or crossover study where carryover effects of this sort can be estimated by simple comparisons of cumulative incidence of diagnosis.
Statistical Modeling of Carryover Effects After Cessation of Treatments