

Chapter 2 Design Approaches


Introduction

Parallel-group trials are studies used to develop new therapeutic measures. In these trials, participants who meet the inclusion criteria are randomly allocated to one of two groups (one group receives a treatment, the other does not), and the groups are then compared to determine the efficacy of the treatment [28].
In the case of TROPHY, participants were randomised to receive candesartan. The difference in cumulative diagnoses of hypertension between the treatment and control groups over the 4-year study was the comparison used to determine whether treatment with candesartan suppressed BP after treatment ceased. Our first approach involved simulating parallel-group trials, using varying rules for diagnosis, and analysing differences in cumulative incidence. We explored rates of false positives, false negatives, and power, and found parallel-group designs inadequate. In addition, we lowered the inclusion criteria to allow for people with systolic BP as low as 110 mm Hg; even with this alteration the approach was ineffective.
Our second approach examined whether a crossover design could allow the desired carryover effects to be tested for reliably from simple comparisons of cumulative incidence. In a crossover design, the treatment group would receive the intervention first, and so would be observed during the intervention and carryover periods. The control group would receive the control first, and so would be observed during the pre-treatment and intervention periods. As illustrated in Figure 2.1, it is plausible that a comparison of cumulative incidence would allow tests and estimates of the difference between the carryover and pre-treatment periods, i.e., of the carryover effect. Although it is unusual to use a crossover design for studying an irreversible outcome, other recent examples exist [29, 30].
In a two-period, two-treatment crossover trial, half the subjects receive treatment A first and then cross over to treatment B [31], as demonstrated by Figure 2.1. For our purposes, treatment A lowers BP while treatment B involves placebo only. Here we see three possibilities: treatment second, treatment first with 1.5 years of carryover, and treatment first with no carryover.
It is plausible that a crossover design allows for detecting the existence of carryover and estimating its cumulative magnitude. When there is no carryover, the treatment and control arms of the study experience equal times with lowered BP, and the numbers of people diagnosed in the two arms would be approximately equal at the same time from the start of treatment. Any carryover effect would result in lower cumulative incidence in the treatment-first arm, by an amount that depends on the magnitude and duration of the carryover effect.
Simple comparisons are frequently used to determine treatment effects in crossover trials [32]; we explore this methodology to see whether it correctly identifies carryover. We test an analysis for determining carryover when the effect of treatment is known. This differs from the norm, which is to determine the effect of treatment in the presence of carryover. Is detection of carryover possible using design alone, or will more complex analysis of the results be necessary?
Carryover has been an important topic in the literature on crossover trials, but primarily as a nuisance factor. That is, trials have been designed so that the carryover effect need not be known when estimating the effect of an intervention during the treatment period. Even with this goal, it is controversial whether existing crossover designs can usefully handle carryover effects. Stephen Senn cautions that “including carry-over [sic] in the model has a disastrous effect on efficiency” [30]. Two-period crossover designs have been criticised as having low power, which may indicate that this type of design is not suitable [33, 34, 35, 36]. Tests for carryover involving more complex designs are faulted for unrealistic assumptions [33, 34], namely that carryover occurs for at most one period, or that carryover has an equal effect throughout all subsequent periods. Stephen Senn laments “that little has appeared … on the subject of modelling for carry-over [sic] that is more grounded in clinical and pharmacological reality” [37].
This chapter is a systematic simulation study where we attempt to find more robust designs with which to test a carryover hypothesis by focussing on altering various parameters and analysing rates of both false positives and false negatives.


Simulation

We conducted both parallel-group and crossover simulations to evaluate the tests for the presence of a carryover effect and the estimates of its magnitude. In both sets of simulations, each participant's baseline BP was drawn from a uniform distribution between 125 and 140 mm Hg, as the participants in the TROPHY trial were prehypertensive [6]. Because BP increases over time [38, 39], we added a trend of 0, 1, or 2 mm Hg per year, similar to the trends used in simulations that have replicated TROPHY [21, 22]. BP varies because of both measurement error and intra-individual variability; we combined these into a single normally distributed error with a standard deviation of 3, 5, or 7 mm Hg [21, 22]. We used treatment effects of −5 and −10 mm Hg. Measurements were taken 3-monthly, 6-monthly, or yearly, and carryover of length 0, 0.5, 1, 1.5, or 2 years was assumed.

Treatment length was 2 years for the crossover trial, which is the length of time participants received treatment in TROPHY [6]. For the parallel-group trial, the duration of treatment was 1, 1.5, 2, 2.5, or 3 years [21]. Simulations for the parallel-group design also varied the inclusion criteria by sampling baseline BP from a uniform distribution on 110–140, 120–140, or 130–140 mm Hg. For these simulations we fixed the design at 2 years of treatment [6], a trend in BP of 1 mm Hg per year [21], and a standard deviation of 5 mm Hg [21]; the carryover duration was 0, 0.5, 1, 1.5, or 2 years [6], with measurements 3-monthly, 6-monthly, or yearly [21].

Cumulative incidence is the proportion of simulated participants diagnosed with hypertension during our 4-year study. The simulations produced an estimate of the cumulative incidence of diagnosis in each of the two trial arms. The Type I error rate (and the power, not reported) was computed from these two cumulative incidences and the sample size using standard formulae for power calculation [40]. We report the rates of false positives and false negatives.
We define the false positive rate as the probability of rejecting the null hypothesis (that there is no carryover) when the model that generated the data has no carryover; the false negative rate is the probability of failing to reject the null hypothesis when the model used to generate the data has carryover.
We also simulated a three-arm trial combining the parallel and crossover designs, in which one arm received treatment first, another received treatment second, and the third received no treatment. This simulation used the same values and parameters as those above.
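The simulation set-up described in this section can be sketched in Python as follows. This is a minimal illustration rather than the code used for the reported results: the function name `simulate_arm` and all of its parameter names are hypothetical, diagnosis uses only the simplest rule (a single measurement above 140 mm Hg), and the carryover is assumed to retain the full treatment effect for its whole duration, a shape the text does not specify.

```python
import random


def simulate_arm(n, trend, sd, effect, treat_start, treat_len, carryover,
                 interval, study_len=4.0, threshold=140.0, seed=0):
    """Cumulative incidence of diagnosis in one simulated trial arm.

    Assumptions (not taken from the source): diagnosis occurs at the first
    single measurement above `threshold`, visits occur at multiples of
    `interval` years, and carryover keeps the full treatment effect.
    """
    rng = random.Random(seed)
    n_visits = round(study_len / interval)
    diagnosed = 0
    for _ in range(n):
        # Baseline BP for a prehypertensive participant.
        baseline = rng.uniform(125.0, 140.0)
        for k in range(1, n_visits + 1):
            t = k * interval
            true_bp = baseline + trend * t
            # Effect applies during treatment and the assumed carryover window.
            if treat_start < t <= treat_start + treat_len + carryover:
                true_bp += effect
            # Measurement error and intra-individual variability combined.
            measured = true_bp + rng.gauss(0.0, sd)
            if measured > threshold:
                diagnosed += 1
                break
    return diagnosed / n


# Example: parallel-group arms with 2 years of treatment and no carryover.
control = simulate_arm(2000, trend=1.0, sd=5.0, effect=0.0, treat_start=0.0,
                       treat_len=0.0, carryover=0.0, interval=0.25, seed=1)
treated = simulate_arm(2000, trend=1.0, sd=5.0, effect=-5.0, treat_start=0.0,
                       treat_len=2.0, carryover=0.0, interval=0.25, seed=2)
```

Setting `treat_len=0.0` and `carryover=0.0` leaves the effect window empty, which gives an untreated arm; a treatment-second crossover arm would use `treat_start=2.0`.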

Rules for diagnosis

In addition to the formula for BP, it was necessary to develop criteria to establish when a person became hypertensive. We analysed five feasible criteria for diagnosing hypertension using a threshold of 140 mm Hg, outlined in Table 2.1: one measurement above the threshold, two consecutive measurements above, three measurements above, the average of two consecutive measurements above, and the average of three consecutive measurements above. To illustrate the importance of measurement error, we also considered a rule that diagnosed hypertension when both the measured systolic pressure and the underlying long-term systolic BP were above the threshold. This rule is not of practical use, although it could be approximated by averaging a large number of measurements over a period of days for anyone who had a single measurement over 140 mm Hg. Both parallel and crossover designs were analysed using all rules.
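The five feasible rules can be expressed compactly in code. The sketch below is illustrative only: the function name `first_diagnosis` and the rule labels are hypothetical, and the rule based on three measurements above the threshold is read here as three consecutive measurements, which is an assumption the text does not confirm.

```python
def first_diagnosis(measurements, rule, threshold=140.0):
    """Index of the visit at which `rule` first diagnoses hypertension.

    Returns None if the rule never triggers. Rule labels are hypothetical;
    "three_consecutive_above" assumes the three measurements are consecutive.
    """
    # Each rule is a window length plus a flag: compare the window mean
    # (True) or require every value in the window to exceed the threshold.
    window, use_mean = {
        "one_above": (1, False),
        "two_consecutive_above": (2, False),
        "three_consecutive_above": (3, False),
        "mean_of_two_above": (2, True),
        "mean_of_three_above": (3, True),
    }[rule]
    for end in range(window, len(measurements) + 1):
        recent = measurements[end - window:end]
        if use_mean:
            hit = sum(recent) / window > threshold
        else:
            hit = all(m > threshold for m in recent)
        if hit:
            return end - 1  # index of the last visit in the window
    return None


# Example series of measured systolic BP (mm Hg), one value per visit.
bp = [138.0, 142.0, 139.0, 141.0, 143.0, 144.0]
visits = {rule: first_diagnosis(bp, rule)
          for rule in ("one_above", "two_consecutive_above",
                       "three_consecutive_above", "mean_of_two_above",
                       "mean_of_three_above")}
```

Rules with longer windows cannot trigger before enough visits have accumulated, which is one reason sparse measurement schedules delay diagnosis under the averaging rules.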

Results and discussion

Parallel-group design — false positives

Figure 2.2 shows the rates of false positives for four of the rules studied, with values above 25% displayed as 25%. These are well above the accepted rate of 5%. The x axis of each graph gives the length of time participants received treatment; rows correspond to different measurement standard deviations and columns to different rules. Line types distinguish the frequency of measurements, as indicated in the key. All the results have a trend of 1 mm Hg per year. The Type I error rate is inflated except for the smallest measurement error and the shortest period of active treatment. Two rules are omitted from the graph. One, where people are diagnosed when 3 measurements are above the threshold, has been studied previously [22, 23, 24]; the other is the infeasible rule that uses the true long-term average BP. This rule is the only one that achieves close to the nominal Type I error rate, which demonstrates that measurement error impinges upon the effectiveness of parallel designs.

Figure 2.2 used the results of the simulations described in section 2.2 and the R function power.prop.test, which computes the power of a two-sample comparison of proportions [40]. We supplied the proportions of participants diagnosed in two simulated 10000-person trials in which the treatments in the control and treatment arms differed; the treatment effect in the control arm was always 0 mm Hg. Other inputs were 400 participants and a Type I error rate of 5%. We used a two-sided alternative hypothesis, and the resulting power is displayed in Figure 2.2. None of the trials in this figure has carryover in the model used to simulate the data. The null hypothesis, that the proportions are equal and there is no carryover, is therefore true, so the computed power is the probability of rejecting the null hypothesis when it is true, i.e., the false positive rate.
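The power calculation described above can be illustrated with a short Python sketch of the usual normal-approximation formula for comparing two proportions. It is intended to mirror the calculation performed by power.prop.test for a two-sided test, but it is a re-implementation for illustration, not the R function itself; the name `power_two_proportions` is hypothetical, and treating the 400 participants as the number per arm is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_arm, sig_level=0.05):
    """Approximate power of a two-sided, two-sample test of proportions.

    Normal approximation with equal arm sizes; only the rejection region
    in the direction of the observed difference is counted. Undefined
    (division by zero) when both proportions are exactly 0 or 1.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - sig_level / 2.0)
    p_bar = (p1 + p2) / 2.0
    numerator = (sqrt(n_per_arm) * abs(p1 - p2)
                 - z_crit * sqrt(2.0 * p_bar * (1.0 - p_bar)))
    denominator = sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return z.cdf(numerator / denominator)


# Example with made-up cumulative incidences for the two arms; in the
# study these would come from the simulated 10000-person trials.
rejection_prob = power_two_proportions(0.60, 0.50, n_per_arm=400)
false_negative_rate = 1.0 - rejection_prob
```

When the data are simulated without carryover, the returned value plays the role of the false positive rate; when carryover is present, one minus the returned value is the false negative rate.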


Parallel-group design — false negatives

Figure 2.3 shows the rates of false negatives when testing a carryover hypothesis when the length of carryover is 2 years. The x axis of each graph indicates the duration of treatment in each simulated trial. This graph demonstrates that we are likely to find carryover when the treatment and carryover periods together span the duration of the study. Carryover is 2 years, and treatment lengths longer than 2 years all have rates of false negatives close to 0 for all rules. Studies with lower standard deviations have higher rates of false negatives with short lengths of treatment; at the onset of the study participants are less likely to be falsely diagnosed, as BP measurements are under the threshold and there is less error. Under the rule that diagnoses by averaging 3 measurements, yearly measurements are not frequent enough for diagnosis to occur when treatment lengths are short. As in section 2.3.1, two rules are not included. We obtained the results for Figure 2.3 by adjusting the false positive calculation described in section 2.3.1 in two ways. First, we used trials where the data were simulated with 2 years of carryover; second, we subtracted the power from 1. With these changes, we have the probability of failing to reject the null hypothesis (that there is no carryover in the model) when there is carryover in the model. This gives us the rates of false negatives.

 Parallel-group design — inclusion criteria

Measurement error can lead to false positive diagnosis only when true BP is relatively close to the threshold, so we considered varying the inclusion criteria for baseline long-term-average BP. Figure 2.4 shows differences in cumulative incidence of diagnosis for three inclusion thresholds (110, 120, and 130 mm Hg) in the presence and absence of carryover. Line colour indicates the magnitude of the treatment effect, as shown in the key. The columns correspond to different measurement schedules, while the x axis of each graph indicates the length of carryover. Including participants with lower BP reduces the estimated carryover effect under the null hypothesis, but also under the alternative hypothesis.

Crossover design — false positives

Figure 2.5 shows the Type I error rates for the five rules studied, all of which are higher than the nominal 5% rate, with a maximum rate of 25%. The x axis of each graph denotes the yearly trend in systolic BP, and line types indicate measurement schedules, per the key. Each rule has its own column, while standard deviations are in rows. As none of the rules has an acceptable Type I error rate, we do not present results on the power of the tests or the estimates of carryover magnitude.
As in section 2.3.1, the rates of false positives were computed with the R function power.prop.test; the calculations were the same, except that the control group also received treatment. The rule that removes measurement variation is not shown; even this rule had a high Type I error rate.

Combination of parallel and crossover

As the parallel-group design has positive bias [21, 22] and the crossover design has negative bias (discussed in section 2.3.6), there is the potential for combining the two so that the biases cancel out. The combination design would be a two-period, three-arm crossover trial in which one arm receives treatment in the first period and placebo in the second, another receives placebo in both periods, and the last receives placebo followed by treatment. Our simulation results show that this combination cannot be made to work reliably. The rules tested showed inconsistent differences in cumulative incidence as the amount of carryover increased.

Crossover faults

The previous sections in 2.3 used only simulation results to demonstrate that identifying carryover via design is ineffective. In figure 1.3 we discussed the reasons for the error in TROPHY’s design. After careful consideration of a crossover design, we suggest a heuristic explanation for its failure.
Figure 2.6 illustrates why the crossover design failed and why even slight trends over time are problematic. To demonstrate this we simulate two pairs of systolic BP trajectories with a trend of 1 mm Hg per year, without carryover, with measurements taken every 3 months, and without error. One pair has a baseline systolic pressure of approximately 137.5 mm Hg, the other a baseline systolic pressure of approximately 139 mm Hg. Within each treatment:control pair, the two participants differ only in being randomised to treatment first or second. For a crossover design to diagnose carryover effectively using differences in cumulative incidence, diagnosis rates in the treatment and control arms should differ only because of the presence of carryover; no inherent bias can be present. Figure 2.6 shows that, in this example, the difference in cumulative diagnosis without carryover is biased. The dotted lines indicate that the participants have true incident hypertension and are being treated per trial protocol; their data are no longer available. Of the control-first participants, only the one with the high baseline pressure is diagnosed in the first period and so is not available in the second; of the treatment-first participants, both are diagnosed in the second period. The effect of the trend is to selectively remove higher-risk individuals from the control-first arm of the trial, which results in a bias when testing for carryover.
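The two-pair example can be reproduced with a few lines of error-free arithmetic. The sketch below is an illustration under stated assumptions, not the code behind Figure 2.6: the function name `diagnosis_time` is hypothetical, a treatment effect of −5 mm Hg is assumed (the text does not say which value the figure uses), visits fall at multiples of 3 months, and diagnosis occurs at the first measurement strictly above 140 mm Hg.

```python
def diagnosis_time(baseline, treated_from, treated_to, trend=1.0,
                   effect=-5.0, interval=0.25, study_len=4.0,
                   threshold=140.0):
    """Time in years of the first error-free measurement above threshold.

    Returns None if the participant is never diagnosed. Treatment lowers
    BP by `effect` for visits with treated_from < t <= treated_to, and
    there is no carryover.
    """
    n_visits = round(study_len / interval)
    for k in range(1, n_visits + 1):
        t = k * interval
        bp = baseline + trend * t
        if treated_from < t <= treated_to:
            bp += effect
        if bp > threshold:
            return t
    return None


baselines = [137.5, 139.0]
# Treatment-first arm: treated in years 0 to 2, untreated in years 2 to 4.
treatment_first = [diagnosis_time(b, 0.0, 2.0) for b in baselines]
# Control-first arm: untreated in years 0 to 2, treated in years 2 to 4.
control_first = [diagnosis_time(b, 2.0, 4.0) for b in baselines]
```

Under these assumptions both treatment-first participants are diagnosed in the second period, whereas only the higher-baseline control-first participant is diagnosed, and that happens in the first period. The cumulative incidences therefore differ even though no carryover was simulated.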


Summary

Although carryover effects are potentially important, especially for intensive lifestyle interventions, they are difficult to assess reliably. Most parallel-group designs examined had low rates of false negatives. However, across a wide range of parallel-group designs with simulated data, we have shown that randomisation fails to preserve the Type I error rate even approximately. The bias is smaller when the active treatment period is short and the follow-up is long (relative to the spacing of measurements), when the measurement error in systolic BP is (unrealistically) small, and when the inclusion criteria are broad enough to admit participants who are far from the threshold, and thus not ethical to treat. A short active treatment period and broad inclusion criteria also reduce the estimated carryover effect when carryover is truly present, so they are not a solution to the problem. Rather than modifying the design so that carryover effects can be demonstrated by comparing cumulative incidence of diagnosis, it may be necessary to develop new statistical methodology to extract valid estimates from these designs.

In addition, a simple comparison of the cumulative incidence of diagnosis in a crossover trial provides a valid test for carryover effects only when there is no trend in the underlying outcome variable (or the trial is too short for the trend to be apparent). Unfortunately, blood pressure in prehypertensives, fasting glucose in prediabetics, and lipid test results all exhibit non-negligible trends. It does not appear possible to design a parallel-group or crossover study in which carryover effects of this sort can be estimated by simple comparisons of the cumulative incidence of diagnosis.

1 Introduction
1.1 History
1.2 Noisy measurements
1.3 A practical setting: TROPHY
1.4 Possible solutions
2 Design Approaches
2.1 Introduction
2.2 Simulation
2.3 Results and discussion
2.4 Summary
3 Linear Mixed Model
3.1 Exploring the correlation structure
3.2 Methods
3.3 Results and discussion
3.4 Summary
4 Survival Analysis
4.1 Introduction
4.2 Notation
4.3 Models
4.4 Simulations
4.5 Results and discussion
4.6 Summary
5 Discussion, Conclusions, and Future Work

Statistical Modeling of Carryover Effects After Cessation of Treatments
