How to Design a Clinical Trial for Aging
A piece written by Marton Meszaros, Brennan Overhoff & Norn Group
This is an essay where we outline how we believe the first pivotal clinical trial targeting aging needs to be designed. We propose a rough trial design on a conceptual level, but first walk through the thinking that led us to these conclusions. As part of this, we examine different types of trial endpoints that could be used, then discuss the common issues with prevention trials and how an aging drug can help overcome them. We then share a calculator for estimating trial sample size requirements for the suggested trial design, for any chosen therapy. Using this calculator, we try to illustrate how and when aging drugs can make running such a trial an economically sound decision.
Problem: State-of-the-art ways to get aging drugs to market disincentivise the development of therapeutics that directly address aging mechanisms
Notes for Clarity:
For the purposes of this essay, we define aging as:
○ The sum of underlying biological processes that happen in every human gradually over long periods of time.
○ That make people more susceptible to diseases associated with high chronological age (e.g., neurodegeneration, cardiometabolic diseases, cancer).
Consequently, for the purposes of this essay, an aging drug is a drug that:
○ Works against these underlying natural processes.
○ Reduces the risk of multiple diseases with unrelated pathologies associated with high chronological age in virtually every human (vs just patient groups with specific risk factors like high LDL, obesity, etc).
Therapies approved and labelled for reversing or slowing aging could create unprecedented patient benefit and commercial success. Others have thought about quantifying this upside; here we think about how to get there.
To get a therapy approved and labelled for aging, you eventually need to run your pivotal clinical trial of that drug against aging. In other words, the primary endpoint of the treatment’s Phase 3 trial needs to measure aging, and not be limited to any one specific indication.
However, so far nobody has run such a trial…
This is in part because measuring aging is hard and new: it creates uncertainty around the exact endpoints, approval criteria, and reimbursement pathways.
So instead, the majority of biotech companies that are developing aging drugs plan to follow the conventional path of developing/validating their therapy against a specific disease. That is, they pick an age-related disease as a proxy indication, and plan to get their drug to market against that indication. Then, once the drug is on the market, they plan to expand the drug’s labelling for other indications by running additional trials against other diseases. This is a reasonable approach, and we see such label expansion trials happening today with e.g. GLP-1R agonists.
However, this approach creates an incentive misalignment:
If you have to get your drug approved against a specific indication, you want to optimize your drug development programme for making a drug that works best against that one indication.
Chances are that the drug that works best for that one indication won’t be the exact same drug that works best against aging as a whole -> you will have to pick between optimizing your drug development against your target indication or optimizing your drug development against aging.
Considering that making any successful drug is already hard enough, and you need your drug to succeed, you will choose to optimize for making a good drug for your initial indication -> and won’t develop a great aging drug.
We believe that developing aging drugs has a higher expected value than traditional drug development because of the much-much-higher upside. However, because of the much-increased risk of failure, most reasonable investors and entrepreneurs won’t follow this path, because they want to optimize their chances of an already sufficiently large success that comes with putting a traditional drug on the market. That is, for most people, a 5% chance of a $10 billion upside is more attractive than a 1% chance of a $100 billion upside, even though the latter has the higher expected value (numbers made up for illustration, real EV delta likely larger).
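The expected-value comparison can be sketched with the same made-up numbers. The function name is ours and purely illustrative; the probabilities and upsides are the illustrative figures from the paragraph above, not estimates.

```python
def expected_value(probability_of_success: float, upside_usd: float) -> float:
    """Expected value of a binary bet: the upside times the chance of reaching it."""
    return probability_of_success * upside_usd


# Illustrative numbers only (the same made-up figures as in the text).
traditional_drug_ev = expected_value(0.05, 10e9)   # 5% chance of $10 billion
aging_drug_ev = expected_value(0.01, 100e9)        # 1% chance of $100 billion

print(f"Traditional drug EV: ${traditional_drug_ev / 1e9:.1f} billion")
print(f"Aging drug EV:       ${aging_drug_ev / 1e9:.1f} billion")
```

Under these assumptions the aging programme has twice the expected value, yet it fails far more often, which is why a risk-averse investor may still prefer the traditional path.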
Being kind-of-forced to chase local maxima is frustrating. It’s therefore worth thinking about how we could align our incentives with what we, as aging researchers and biotechnologists, would want to do in the first place.
Besides frustration at the individual level, this incentive-misalignment also leaves a lot of societal and patient benefits on the table. As a society, we should be pursuing the opportunities with the highest expected value.
Quick background on drug development process and different phases of clinical trials, for those unfamiliar:
The drug development process can be broken down into four main stages. To begin, a company will conduct preclinical studies in model organisms to find a putatively efficacious compound for their indication of interest. Once they have established sufficient preclinical support, they can submit an Investigational New Drug (IND) application to the FDA which, if approved, allows them to move into human studies. Phase 1 trials primarily assess safety, dosage, and pharmacokinetics in a small group of healthy volunteers or patients. Phase 2 trials expand the population size to explore efficacy in humans and further evaluate safety, often in individuals with the target condition. Phase 3 trials, also known as pivotal trials, are large-scale studies (randomized, double-blinded, and controlled) that provide the definitive evidence that the drug is efficacious (and safe). This evidence is required by regulatory bodies like the FDA to approve the drug, as well as by payors like health insurers to be willing to pay for the approved treatment.
Trials are run on patient populations that are defined by the inclusion/exclusion criteria of the trial (such as patients aged 60 to 70 that are diagnosed with x). Trials measure one primary endpoint and a couple of secondary endpoints. Achieving a predefined effect on the primary endpoint is by far the most important goal of the trial: if you do that, your trial has succeeded, and your drug is likely to get approved. Generally, upon approval your drug will be labelled (prescribed, paid for) for the patient population you included in your Phase 3 trial. Hence the most important parts of a trial design are the patient population and the primary endpoint of the trial.
What would a trial that targets aging directly look like?
More specifically, let’s think about what can be the primary endpoint of our eventual Phase 3 trial down the line. This is the decision that should and will drive the design of the whole trial, and will serve as the north star for all of our earlier-stage drug discovery and development processes.
We know that our primary Phase 3 endpoint needs to be:
a) A generally good proxy for aging (vs for a more specific indication) for the reasons above.
b) Reimbursable.
Reimbursable means that if our Phase 3 trial succeeds and a treatment shows improvement on our primary endpoint, that also means that our treatment will be:
Approved by the FDA (and other regulatory agencies)
Reimbursed by payors, for a very wide group of patients, at a reasonably high price tag. Payors generally pay for, and only pay for, therapeutics that produce a very obvious and very direct benefit to the patients on their insurance plan.
The main relevant payor types we are talking about here are:
Private commercial insurance plans (United, Cigna, etc) (29% US healthcare spend)
Medicare, governments outside of the US (21% US spend + 17% for Medicaid)
Patients themselves (out of pocket) (11% US spend)
Private and government plans generally care about the same treatments and evidence. Most important for aging drugs will be Medicare and other government payors, as most of the cost savings and QALYs an aging drug will generate are going to be in the 65+ year old population.
Good aging drugs will be prescribed earlier in life as well, and provide disease prevention and quality of life improvements for patients on private plans, which is needed for private plans to be incentivised to reimburse. (In this essay we’ll ignore complicated situations that would arise from aging drugs that should be dosed early in life but only provide medical benefits at age 65+.)
Out-of-pocket payments may be a bit of a wildcard for aging drugs, as patients themselves may require less efficacy and health economic evidence as long as the upside and safety are established. We expect that the market size for this is still small compared to a drug reimbursed by the main payors, so going forward we’ll assume the need to get reimbursed by government and private payors.
| Indication | Most common Phase 3 endpoint(s) for the indication | We categorize the endpoint as |
|---|---|---|
| Alzheimer’s Disease | CDR-SB change from baseline (Clinical Dementia Rating Scale, Sum of Boxes). This is a semi-structured questionnaire/interview performed by trained neuropsychologists to evaluate how much the patient is limited in their daily activities and responsibilities by their current cognitive abilities. See here for worksheet. | Functional |
| Parkinson’s Disease | UPDRS change from baseline (Unified Parkinson's Disease Rating Scale). Clinical scale, evaluating how much Parkinson’s affects different aspects of patients' life (behaviour, mood, activities of daily living, motor control). See here for worksheet. | Functional |
| Heart Failure | "Time to first occurrence of cardiovascular (CV) death or HF event." (finerenone, ziltivekimab similar) | Morbidity or mortality related |
| | 6MWTD change from baseline = distance walked in 6 minutes in standardised conditions. | Functional |
| NASH | "Resolution of steatohepatitis and no worsening of liver fibrosis (Yes/No)" = count of patients where things got worse vs not-worse. | Morbidity or mortality related |
| Chronic Obstructive Pulmonary Disease (COPD) | FEV1 change from baseline (forced expiratory volume in one second) = how much air a person can exhale in one second, i.e., how impaired is their breathing. | Functional |
| Colorectal Cancer | Overall Survival (trastuzumab) (encorafenib similar) | Morbidity or mortality related |
| Hepatocellular Carcinoma | Overall Survival | Morbidity or mortality related |
| HIV infection | “HIV-1 plasma viral RNA measurements and CD4 counts during follow-up and after therapy” (HAART) | Morbidity or mortality related |
| Type-2 Diabetes | Time to first 3P-MACE (3-Point Major Adverse Cardiovascular Events) = Time to First Occurrence of Death from Cardiovascular Causes, Myocardial Infarction, or Stroke | Morbidity or mortality related |
| | HbA1c change from baseline | Validated surrogate endpoint |
| Atherosclerosis | LDL-C (percent change from baseline) | Validated surrogate endpoint |
| | Time to first 3P-MACE | Morbidity or mortality related |
We see that the overwhelming majority of Phase 3 primary endpoints are extremely simple and non-fancy.
These endpoints are either:
Functional. I.e. measuring how well a patient is functioning, in the simplest terms possible
Morbidity or mortality related. I.e. measure the presence/absence/frequency of unwanted medical events (incl. death) or diseases.
It’s undeniably obvious that if you change these endpoints, you have created true and unquestionable patient benefit. ( c.f. aging clocks :-) )
The types of these endpoints also outline our options for the endpoints we can have for our trial against aging. The primary endpoint of our aging Phase 3 trial will either have to be something very functional, or something morbidity/mortality related.
The few Phase 3 endpoints in use currently by trials that don’t fit either of the above two categories are very extensively validated surrogate endpoints. We see such endpoints in e.g. atherosclerosis (LDL cholesterol), T2DM (HbA1c). The benefit of these endpoints is that we can expect to see change in them earlier than in functional or morbidity-related endpoints for the same disease.
Currently, surrogate biomarkers for aging are nowhere near the evidence level needed to be used as surrogates in late-stage trials. Long term, we should certainly aspire to develop such surrogate endpoints, and the way such surrogate endpoints should be developed warrants further thinking, but the first aging trials are not going to be against such endpoints.
Option 1: trial against a frailty index
The functional kind of aging Phase 3 endpoints would resemble something like what frailty indices are today: a measure that combines cognition, physical fitness, etc. in a way that makes it clear that a patient has a higher quality of life if they score better on the index. Importantly, a great frailty index may also show responsiveness to a drug earlier than other Phase 3 endpoints would.
Current frailty indexes are not reimbursed Phase 3 endpoints. This is in part because the complexity of including multiple domains (cognition, mobility, blood markers) in a single metric makes it hard to interpret what an improvement in the overall score means, and partly because such indexes' ability to predict medical events that would be costly for payors (intensive care, multi-day in-patient visits) is not extensively validated in patients without specific diagnoses. Other, more simple functional endpoints, validated and used in specific indications show that similar measures are one good way to think about proving benefit to patients. For example the UPDRS for Parkinson’s, or the CDR for Alzheimer’s are quite complex semi-structured and semi-subjective indices, but they still only measure function related to one large domain (motor and cognitive function, respectively). In short, we most likely need to develop, and then validate a frailty index that is better than the ones that exist today.
Trialling against a frailty index style endpoint is certainly worth exploring further, but is not what we focus on in this essay.
It is also not how we would run our first aging trial if we had to decide today. This is because running the first aging trial against a frailty index will still carry regulatory and reimbursement risk, like any previously not used or reimbursed endpoint would. This risk is larger with a frailty index than with a mortality or morbidity-based endpoint, because the components that the index includes are not individually reimbursed in themselves, and are not directly tied to currently reimbursed endpoints. Frailty indexes will either need to be tied to currently reimbursed endpoints, or to other costly-for-payor medical events that they predict. This needs further validation. And since, paradoxically, part of the complete validation of a frailty index likely includes measuring how an aging drug that works affects the index, at the time of our first aging trial the index will be inherently not yet fully validated.
What would a morbidity or mortality-related aging Phase 3 endpoint look like?
Option 2.1: Trial Against All-Cause-Mortality
The trial could simply measure all-cause mortality as its primary endpoint. Meaning: with two trial arms (treatment and control), we see whether significantly fewer patients die in our treatment arm than in our control arm.
This would obviously be reimbursable, but wouldn’t be very responsive -> we would need to run trials with extremely large sample sizes and/or long follow-up periods.
Option 2.2: Morbidity-Based Trial
The next most obvious option is running the trial against age-related morbidities. The endpoint in such trials would be simply counting the frequency of an unwanted medical ‘event’. The medical event can be an acute event, such as a patient having a stroke or MI, or it can be the new diagnosis of a disease. (One can also count the number of desired medical events, such as resolution of a condition; the idea there is the same, just reversed.)
To illustrate how a trial for such an endpoint would look, we can look at one of the many Heart Failure trials as an example, such as this one for finerenone. With two arms in the trial (treatment and control), we see whether significantly fewer patients experience the medical event (cardiovascular death or other Heart Failure event) in the treatment arm than in the control arm over the duration of the trial. (We’re simplifying: the trial actually measures “time to first [event]”, which allows more nuanced analysis, but it’s the same general idea.)
Such an endpoint is still obviously reimbursable today, and is much more responsive than an all-cause-mortality trial. If there are on average 10 years between the diagnosis of an age-related chronic disease (the medical event) and the consequent death, then measuring morbidity as opposed to mortality gets us the same power 10 years earlier than the all-cause-mortality trial would.
Out of these options that we can think of, this is the kind of trial design we would go for today.
Multimorbidity-prevention trial as an aging trial
In our trial that is specifically against aging, we complicate the design of a morbidity-based trial by not just counting the number of one single type of medical event, but counting the number of a variety of medical events.
That is, where a Heart Failure trial would count the number of cardiovascular deaths and HF events only, we will count the number of multiple kinds of medical events. (More specifically, we measure time-to-first-event.)
For example, we will count it as a medical event if a patient in a trial:
has a cardiovascular death or HF event
has any other type of cardiovascular event (myocardial infarction, stroke)
is diagnosed with any type of (age-related, deadly) cancer
is diagnosed with any type of (age-related) neurological or psychiatric disease
is diagnosed with a renal disease (e.g. late stage chronic kidney disease = CKD).
Notably, we just bring these as examples to demonstrate the concept. Depending on your drug, maybe you’d want to include metabolic events, or not measure cancers, or anything else, and you’ll need to define exact diagnostic criteria for each disease or disease group.
In the simplest case, we count all these kinds of medical events as equal, and care only about whether one has happened or not. We count at most one event per patient (the first qualifying event). In more complicated designs, we can weight different types of medical events differently based on a variety of factors, such as assigning a higher multiplier to medical events that cause a larger decrease in quality of life. We will assume the simplest way to sum events going forward in this essay.
This type of endpoint is very likely reimbursable because the components of the endpoint, or endpoints closely analogous to those components, are reimbursed individually today.
Such composite endpoints are also commonly used already in a more limited capacity. The example brought above already groups cardiovascular death and other HF events into a single endpoint. Another example is 3P-MACE (3-Point Major Adverse Cardiovascular Events), which combines death from cardiovascular causes, myocardial infarction, and stroke, and is perhaps the most common endpoint in cardiometabolic Phase 3 trials.
It’s worth pointing out that the study design of TAME also follows this type of endpoint, suggesting that the team behind the TAME study reached conclusions similar to ours. It is reported that the TAME study team has consulted with the FDA on their study design, and they claim that the FDA has agreed that the trial design and endpoints are reasonable.
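The simplest counting rule (all event types equal, at most one event per patient, the first qualifying one) can be sketched as follows. The event category names, the `Event` record, and the function name are hypothetical and chosen only for illustration; a real trial would need exact diagnostic criteria for each category.

```python
from dataclasses import dataclass

# Hypothetical event categories, loosely mirroring the example list above.
QUALIFYING_KINDS = {
    "cv_death_or_hf",
    "mi_or_stroke",
    "cancer",
    "neuro_or_psychiatric",
    "late_stage_ckd",
}


@dataclass
class Event:
    patient_id: str
    kind: str
    day: int  # days since randomization


def first_qualifying_events(events: list[Event]) -> dict[str, Event]:
    """Return, per patient, only the earliest event that counts toward the endpoint."""
    first: dict[str, Event] = {}
    for event in events:
        if event.kind not in QUALIFYING_KINDS:
            continue  # not part of the composite endpoint
        current = first.get(event.patient_id)
        if current is None or event.day < current.day:
            first[event.patient_id] = event
    return first


# Made-up records: p1 has two qualifying events, p2 has none, p3 has one.
records = [
    Event("p1", "cancer", 900),
    Event("p1", "mi_or_stroke", 400),
    Event("p2", "hip_fracture", 200),
    Event("p3", "cancer", 650),
]
first_events = first_qualifying_events(records)
total_events = len(first_events)  # each patient contributes at most one event
```

A weighted design would replace the final count with a sum of per-category multipliers; the unweighted count is the variant assumed in the rest of the essay.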
The problem of prevention trials
It’s important to notice that what we’re describing above is a prevention trial. Most trials are run on patients who already have a disease, and try to slow the progression of that disease. Here we start the trial with ~healthy patients. We then measure their conversion to patients with age-related diseases. And we count each specific conversion from healthy-to-diseased patient as a medical event.
Prevention trials are very rare with any chronic indications, but are well-known in other parts of medicine. Vaccine trials against infectious diseases are prevention trials. Healthy individuals are treated with a vaccine, and are followed over a period of time, to measure how many of the patients will develop notable or serious medical problems going forward (due to an infection).
The problem with prevention trials is that most of the patients in the trial will provide us no information of our treatment’s effectiveness whatsoever, because they weren’t going to develop the disease that we are trying to prevent in the first place.
In the vaccine trial example, if a patient never encounters the pathogen whose effects the vaccine is designed to prevent, involving that patient in the trial was pointless from a statistical power perspective. If only 1 out of 10 patients encounters the pathogen naturally, you need to involve 10x as many patients in the trial as you would need in a non-prevention trial of an equally effective drug to achieve the same statistical power.
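The enrollment inflation described above is simple arithmetic, sketched below. The function and parameter names are ours, and the figure of 1,000 informative patients is a made-up example, not a number from any real trial.

```python
import math


def patients_to_enroll(informative_patients_needed: int, at_risk_fraction: float) -> int:
    """Scale enrollment by the share of participants who would ever face the disease.

    at_risk_fraction is the assumed fraction of enrolled patients who would
    encounter the pathogen (or develop the disease) during follow-up.
    """
    if not 0 < at_risk_fraction <= 1:
        raise ValueError("at_risk_fraction must be in (0, 1]")
    return math.ceil(informative_patients_needed / at_risk_fraction)


# Hypothetical example: 1,000 informative patients needed, 1 in 10 exposed.
enrollment = patients_to_enroll(1000, 0.1)
print(enrollment)
```

With one in ten participants exposed, enrollment is ten times the number of informative patients, which matches the 10x figure in the text.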
Vaccines are more powerful (higher effect size) than most of the drugs against e.g. chronic diseases, and there’s also a way to artificially infect trial participants in challenge trials.
But for non-infectious diseases, where the above is not possible, we have this problem with no obvious solution -> prevention trials targeting healthy individuals (vs at-risk patient groups) for chronic diseases are rare, even though many of us working on aging believe those are the kinds of therapies that could lead to the largest benefits.
A primer on trial stats
To understand what’s going on here in more depth, we need to know some very basic statistics that go into designing a clinical trial. We’ll keep things very simple in this section; if you want to fact check or read more on the biostatistics of clinical trials, you might want to start here.
The goal of every trial is to prove that the drug you have is successful in modulating your primary endpoint.
To be able to show this is very likely (~statistically significantly) the case, the perhaps most important and difficult decision in any trial design process is deciding on the sample sizes of the trial. You want a high sample size to be able to show the drug’s effect confidently, and you want a low sample size because trials are expensive and their cost is driven most by the number of patients in the trial.
The formula to calculate the trial size you want is:
(Small note: the formula above that we use going forward is approximately equal to the also commonly used Schoenfeld formula, as well as other ways to calculate trial size from HRs.)
where:
D is the number of total medical events required in the trial
for a drug that has HR hazard ratio for preventing such events
for beta statistical power
and alpha type I error
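As a rough cross-check, the Schoenfeld approximation mentioned in the note above can be sketched in a few lines. This is not necessarily the exact formula used in this essay. It assumes 1:1 randomization and a two-sided test at level alpha, and it takes `power` as the desired statistical power, in which case D = 4 (z_{1-alpha/2} + z_{power})^2 / (ln HR)^2. The function name and default values are ours, for illustration only.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate total number of events D needed under the Schoenfeld formula.

    Assumes 1:1 allocation between arms and a two-sided significance level alpha.
    """
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    events = 4 * (z_alpha + z_power) ** 2 / math.log(hazard_ratio) ** 2
    return math.ceil(events)


# Example: a drug assumed to cut the event hazard by 20% (HR = 0.8).
events_needed = schoenfeld_events(0.8)
print(events_needed)
```

Because the required number of events scales with 1 / (ln HR)^2, a weaker drug (HR closer to 1) needs disproportionately more events, which is what drives the sample size of a prevention trial.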
A medical event is the same as what we have discussed above. E.g. one patient having a stroke, or receiving a new diagnosis of dementia.
HR is the hazard ratio (or risk ratio) you declare based on your estimates. The hazard ratio is the ratio of