How to Design a Clinical Trial for Aging
A piece written by Marton Meszaros, Brennan Overhoff & Norn Group
This is an essay where we outline how we believe the first pivotal clinical trial targeting aging needs to be designed. We propose a rough trial design on a conceptual level, but first walk through the thinking that led us to these conclusions. As part of this, we examine different types of trial endpoints that could be used, then discuss the common issues with prevention trials and how an aging drug can help overcome them. We then share a calculator for estimating trial sample size requirements for the suggested trial design, for any chosen therapy. Using this calculator, we try to illustrate how and when aging drugs can make running such a trial an economically sound decision.
Problem: State-of-the-art ways to get aging drugs to market disincentivise the development of therapeutics that directly address aging mechanisms
Notes for Clarity:
For the purposes of this essay, we define aging as:
○ The sum of underlying biological processes that happen in every human gradually over long periods of time.
○ That make people more susceptible to diseases associated with high chronological age (e.g., neurodegeneration, cardiometabolic diseases, cancer).
Consequently, for the purposes of this essay, an aging drug is a drug that:
○ Works against these underlying natural processes.
○ Reduces the risk of multiple diseases with unrelated pathologies associated with high chronological age in virtually every human (vs just patient groups with specific risk factors like high LDL, obesity, etc).
Therapies approved and labelled for reversing or slowing aging could create unprecedented patient benefit and commercial success. Others have thought about quantifying this upside; here we think about how to get there.
To get a therapy approved and labelled for aging, you eventually need to run your pivotal clinical trial of that drug against aging. In other words, the primary endpoint of the treatment’s Phase 3 trial needs to measure aging, and not be limited to any one specific indication.
However, so far nobody has run such a trial…
This is in part because measuring aging is hard and new: it creates uncertainty around the exact endpoints, approval criteria, and reimbursement pathways.
So instead, the majority of biotech companies that are developing aging drugs plan to follow the conventional path of developing/validating their therapy against a specific disease. That is, they pick an age-related disease as a proxy indication, and plan to get their drug to market against that indication. Then, once the drug is on the market, they plan to expand the drug’s labelling for other indications by running additional trials against other diseases. This is a reasonable approach, and we see such label expansion trials happening today with e.g. GLP1-R agonists.
However, this approach creates an incentive misalignment:
If you will have to approve your drug against a specific indication, you want to optimize your drug development programme for making a drug that works best against that one indication.
Chances are that the drug that works best for that one indication won’t be the exact same drug that works best against aging as a whole -> you will have to pick between optimizing your drug development against your target indication or optimizing your drug development against aging.
Considering that making any successful drug is already hard enough, and you need your drug to succeed, you will choose to optimize for making a good drug for your initial indication -> and won’t develop a great aging drug.
We believe that developing aging drugs has a higher expected value than traditional drug development because of the much higher upside. However, because of the much increased risk of failure, most reasonable investors and entrepreneurs won’t follow this path: they want to optimize their chances of the already sufficiently large success that comes with putting a traditional drug on the market. That is, for most people, a 5% chance of a $10 billion upside is more attractive than a 1% chance of a $100 billion upside (numbers made up for illustration, real EV delta likely larger).
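To make the expected-value comparison concrete, here is a minimal sketch using the made-up illustrative numbers from the paragraph above. The function name is ours and purely hypothetical; the point is only that the riskier bet has the higher expected value even though most people would prefer the safer one.

```python
def expected_value(probability: float, payoff_usd_bn: float) -> float:
    """Expected value, in billions of USD, of a single success-or-nothing bet."""
    return probability * payoff_usd_bn


# Illustrative numbers from the essay (explicitly made up by the authors).
traditional_ev = expected_value(0.05, 10.0)   # 5% chance of a $10 billion upside
aging_ev = expected_value(0.01, 100.0)        # 1% chance of a $100 billion upside

print(f"traditional drug EV: ${traditional_ev:.2f}bn")
print(f"aging drug EV:       ${aging_ev:.2f}bn")
```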
Being kind-of-forced to chase local maxima is frustrating. It’s therefore worth thinking about how we could align our incentives with what we, as aging researchers and biotechnologists, would want to do in the first place.
Besides frustration at the individual level, this incentive-misalignment also leaves a lot of societal and patient benefits on the table. As a society, we should be pursuing the opportunities with the highest expected value.
Quick background on drug development process and different phases of clinical trials, for those unfamiliar:
The drug development process can be broken down into four main stages. To begin, a company will conduct preclinical studies in model organisms to find a putatively efficacious compound for their indication of interest. Once they have established sufficient preclinical support they can submit an investigational new drug (IND) approval application to the FDA which, if approved, allows them to move into human studies. Phase 1 trials primarily assess safety, dosage, and pharmacokinetics in a small group of healthy volunteers or patients. Phase 2 trials expand the population size to explore efficacy in humans and further evaluate safety, often in individuals with the target condition. Phase 3 trials, also known as pivotal trials, are large-scale studies (randomized, double-blinded, and controlled) that provide the definitive evidence that the drug is efficacious (and safe). This is required by regulatory bodies like the FDA to approve the drug, as well as payors like health insurers to want to pay for the approved treatment.
Trials are run on patient populations that are defined by the inclusion/exclusion criteria of the trial (such as patients aged 60 to 70 who are diagnosed with x). Trials measure one primary endpoint and a couple of secondary endpoints. Achieving a predefined effect on the primary endpoint is by far the most important goal of the trial: if you do that, your trial has succeeded, and your drug is likely to get approved. Generally, upon approval your drug will be labelled (prescribed, paid for) for the patient population you included in your Phase 3 trial. Hence the most important parts of a trial design are the patient population and the primary endpoint of the trial.
What would a trial that targets aging directly look like?
More specifically, let’s think about what can be the primary endpoint of our eventual Phase 3 trial down the line. This is the decision that should and will drive the design of the whole trial, and will serve as the north star for all of our earlier-stage drug discovery and development processes.
We know that our primary Phase 3 endpoint needs to be:
a) A generally good proxy for aging (vs for a more specific indication) for the reasons above.
b) Reimbursable.
Reimbursable means that if our Phase 3 trial succeeds and a treatment shows improvement on our primary endpoint, that also means that our treatment will be:
Approved by the FDA (and other regulatory agencies)
Reimbursed by payors, for a very wide group of patients, at a reasonably high price tag. Payors generally pay for therapeutics that produce a very obvious and very direct benefit to the patients on their insurance plan, and only for those.
The main relevant payor types we are talking about here are:
Private commercial insurance plans (United, Cigna, etc) (29% US healthcare spend)
Medicare, governments outside of the US (21% US spend + 17% for Medicaid)
Patients themselves (out of pocket) (11% US spend)
Private and government plans generally care about the same treatments and evidence. Most important for aging drugs will be Medicare and other government payors, as most of the cost savings and QALYs an aging drug will generate are going to be in the 65+ year old population.
Good aging drugs will be prescribed earlier in life as well, and provide disease prevention and quality of life improvements for patients on private plans, which is needed for private plans to be incentivised to reimburse. (In this essay we’ll ignore complicated situations that would arise from aging drugs that should be dosed early in life but only provide medical benefits at age 65+.)
Out of pocket payments may be a bit of a wildcard for aging drugs, as patients themselves may require less efficacy and health economic evidence as long as the upside and safety is established. We expect that the market-size for this is still small compared to a drug reimbursed by main payors, so going forward we’ll assume the need to get reimbursed by government and private payors.
Indication | Most common Phase 3 endpoint(s) for the indication | We categorize the endpoint as |
---|---|---|
Alzheimer’s Disease | CDR-SB change from baseline (Clinical Dementia Rating Scale, Sum of Boxes). This is a semi-structured questionnaire/interview performed by trained neuropsychologists to evaluate how much the patient is limited in their daily activities and responsibilities by their current cognitive abilities. See here for worksheet. | Functional |
Parkinson’s Disease | UPDRS change from baseline (Unified Parkinson’s Disease Rating Scale). Clinical scale, evaluating how much Parkinson’s affects different aspects of patients’ life (behaviour, mood, activities of daily living, motor control). See here for worksheet. | Functional |
Heart Failure | "Time to first occurrence of cardiovascular (CV) death or HF event." (finerenone; ziltivekimab similar) | Morbidity or mortality related |
Heart Failure | 6MWTD change from baseline (distance walked in 6 minutes in standardised conditions) | Functional |
NASH | "Resolution of steatohepatitis and no worsening of liver fibrosis (Yes/No)" = count of patients where things got worse vs not-worse. | Morbidity or mortality related |
Chronic Obstructive Pulmonary Disease (COPD) | FEV1 change from baseline (forced expiratory volume in one second) = how much air a person can exhale in one second, i.e., how impaired their breathing is. | Functional |
Colorectal Cancer | Overall Survival (trastuzumab; encorafenib similar) | Morbidity or mortality related |
Hepatocellular Carcinoma | Overall Survival | Morbidity or mortality related |
HIV infection | “HIV-1 plasma viral RNA measurements and CD4 counts during follow-up and after therapy” (HAART) | Morbidity or mortality related |
Type-2 Diabetes | Time to first 3P-MACE (3-Point Major Adverse Cardiovascular Events) = Time to First Occurrence of Death from Cardiovascular Causes, Myocardial Infarction, or Stroke | Morbidity or mortality related |
Type-2 Diabetes | HbA1c change from baseline | Validated surrogate endpoint |
Atherosclerosis | LDL-C (percent change from baseline) | Validated surrogate endpoint |
Atherosclerosis | Time to first 3P-MACE | Morbidity or mortality related |
We see that the overwhelming majority of Phase 3 primary endpoints are extremely simple and non-fancy.
These endpoints are either:
Functional. I.e. measuring how well a patient is functioning, in the simplest terms possible
Morbidity or mortality related. I.e. measure the presence/absence/frequency of unwanted medical events (incl. death) or diseases.
It’s undeniably obvious that if you change these endpoints, you have created true and unquestionable patient benefit. ( c.f. aging clocks :-) )
The types of these endpoints also outline our options for the endpoints we can have for our trial against aging. The primary endpoint of our aging Phase 3 trial will either have to be something very functional, or something morbidity/mortality related.
The few Phase 3 endpoints in use currently by trials that don’t fit either of the above two categories are very extensively validated surrogate endpoints. We see such endpoints in e.g. atherosclerosis (LDL cholesterol), T2DM (HbA1c). The benefit of these endpoints is that we can expect to see change in them earlier than in functional or morbidity-related endpoints for the same disease.
Currently, surrogate biomarkers for aging are nowhere near the evidence level needed to be used as surrogates in late-stage trials. Long term, we should certainly aspire to develop such surrogate endpoints, and the way such surrogate endpoints should be developed warrants further thinking - but the first aging trials are not going to be against such endpoints.
Option 1: trial against a frailty index
The functional kind of aging Phase 3 endpoints would resemble something like what frailty indices are today. Measuring a combination of cognition, physical fitness, etc - that make it clear that a patient has a higher quality of life if they score better on the index. Importantly, a great frailty index may also show responsiveness to a drug earlier than other Phase 3 endpoints would.
Current frailty indexes are not reimbursed Phase 3 endpoints. This is in part because the complexity of including multiple domains (cognition, mobility, blood markers) in a single metric makes it hard to interpret what an improvement in the overall score means, and partly because such indexes' ability to predict medical events that would be costly for payors (intensive care, multi-day in-patient visits) is not extensively validated in patients without specific diagnoses. Other, more simple functional endpoints, validated and used in specific indications show that similar measures are one good way to think about proving benefit to patients. For example the UPDRS for Parkinson’s, or the CDR for Alzheimer’s are quite complex semi-structured and semi-subjective indices, but they still only measure function related to one large domain (motor and cognitive function, respectively). In short, we most likely need to develop, and then validate a frailty index that is better than the ones that exist today.
Trialling against a frailty index style endpoint is certainly worth exploring further, but is not what we focus on in this essay.
It is also not how we would run our first aging trial if we had to decide today. This is because running the first aging trial against a frailty index will still carry regulatory and reimbursement risk, like any previously not used or reimbursed endpoint would. This risk is larger with a frailty index than with a mortality or morbidity-based endpoint, because the components that the index includes are not individually reimbursed in themselves, and are not directly tied to currently reimbursed endpoints. Frailty indexes will either need to be tied to currently reimbursed endpoints, or to other costly-for-payor medical events that they predict. This needs further validation. And since, paradoxically, part of the complete validation of a frailty index likely includes measuring how an aging drug that works affects the index, at the time of our first aging trial the index will be inherently not yet fully validated.
What would a morbidity or mortality-related aging Phase 3 endpoint look like?
Option 2.1: Trial Against All-Cause-Mortality
The trial could simply measure all-cause mortality as its primary endpoint. That is, with two trial arms (treatment and control), we see whether significantly fewer patients die in the treatment arm than in the control arm.
This would obviously be reimbursable, but wouldn’t be very responsive -> we would need to run trials with extremely large sample sizes and/or long follow-up periods.
Option 2.2: Morbidity-Based Trial
The next most obvious option is running the trial against age-related morbidities. The endpoint in such trials would be simply counting the frequency of an unwanted medical ‘event’. The medical event can be an acute event, such as a patient having a stroke or MI, or it can be the new diagnosis of a disease. (One can also count the number of desired medical events such as resolution of a condition, the idea there is the same, just reversed.)
To illustrate how a trial for such an endpoint would look, we can look at one of the many Heart Failure trials as an example, such as this one for finerenone. With two arms in the trial (treatment and control), we see whether significantly fewer patients experience the medical event (cardiovascular death or other Heart Failure event) in the treatment arm than in the control arm over the duration of the trial. (We’re simplifying: the trial actually measures “time to first [event]”, which allows more nuanced analysis, but it’s the same general idea.)
Such an endpoint is still obviously reimbursable today, and is much more responsive than an all-cause-mortality trial. If there is on average 10 years between the diagnosis of an age-related chronic disease (the medical event) and the consequent death, then measuring morbidity as opposed to mortality gets us the same power 10 years earlier than the all cause mortality trial would.
Out of these options that we can think of, this is the kind of trial design we would go for today.
Multimorbidity-prevention trial as an aging trial
In our trial that is specifically against aging, we complicate the design of a morbidity-based trial by not just counting the number of one single type of medical event, but counting the number of a variety of medical events.
That is, where a Heart Failure trial would count the number of cardiovascular deaths and HF events only, we will count the number of multiple kinds of medical events. (More specifically, we measure time-to-first-event.)
For example, we will count it as a medical event if a patient in a trial:
has a cardiovascular death or HF event
has any other type of cardiovascular event (Myocardial Infarct, stroke)
is diagnosed with any type of (age-related, deadly) cancer
is diagnosed with any type of (age-related) neurological or psychiatric disease
is diagnosed with a renal disease (e.g. late stage chronic kidney disease = CKD).
Notably, we just bring these as examples to demonstrate the concept. Depending on your drug, maybe you’d want to include metabolic events, or not measure cancers, or anything else, and you’ll need to define exact diagnostic criteria for each disease or disease group.
In the simplest case, we count all these kinds of medical events as equal, and care only about whether one has happened or not. We count at most one event per patient (the first qualifying event). In more complicated designs, we can weight different types of medical events differently based on a variety of factors, such as assigning a higher multiplier to medical events that cause a larger decrease in quality of life. We will assume the simplest way to sum events going forward in this essay.
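A minimal sketch of how the simplest version of this endpoint could be derived from raw event records: each patient contributes at most one event, namely the earliest one that falls into the composite. The event category names and the `Event` record are hypothetical placeholders; a real trial would define exact diagnostic criteria and an adjudication process for each category.

```python
from dataclasses import dataclass

# Hypothetical category labels, mirroring the example list above.
QUALIFYING_EVENTS = {
    "cv_death_or_hf",
    "mi_or_stroke",
    "cancer",
    "neuro_psych",
    "late_stage_ckd",
}


@dataclass
class Event:
    patient_id: str
    kind: str
    day: int  # days since randomisation


def first_qualifying_event(events: list[Event]) -> dict[str, Event]:
    """Return, per patient, the earliest qualifying event (at most one each)."""
    first: dict[str, Event] = {}
    for ev in events:
        if ev.kind not in QUALIFYING_EVENTS:
            continue  # events outside the composite do not count
        current = first.get(ev.patient_id)
        if current is None or ev.day < current.day:
            first[ev.patient_id] = ev
    return first


events = [
    Event("p1", "cancer", 400),
    Event("p1", "mi_or_stroke", 250),   # earlier, so this is p1's endpoint event
    Event("p2", "hip_fracture", 100),   # not part of the composite, ignored
    Event("p3", "late_stage_ckd", 900),
]
endpoint = first_qualifying_event(events)
print({pid: (ev.kind, ev.day) for pid, ev in endpoint.items()})
```

Patients with no qualifying event (p2 here) simply do not appear in the result; in a time-to-event analysis they would be treated as censored at the end of follow-up.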
This type of endpoint is very likely reimbursable because the components of the endpoint, or very analogous endpoints to the components are reimbursed individually today.
Such composite endpoints are also commonly used already in more limited capacity. The example brought above already groups cardiovascular death and other HF events into a single endpoint. Another example is 3P-MACE (3-Point Major Adverse Cardiovascular Events) which combines death from cardiovascular causes, myocardial infarction, and stroke, and is perhaps the most common endpoint in cardiometabolic Phase 3 trials.
It’s worth pointing out that the study design of TAME also follows this type of endpoint, suggesting that the team behind the TAME study reached similar conclusions to ours. It is reported that the TAME study team has consulted with the FDA on their study design, and they claim that the FDA has agreed that the trial design and endpoints are reasonable.
The problem of prevention trials
It’s important to notice that what we’re describing above is a prevention trial. Most trials are run on patients who already have a disease, and try to slow the progression of that disease. Here we start the trial with ~healthy patients. We then measure their conversion to patients with age-related diseases. And we count each specific conversion from healthy-to-diseased patient as a medical event.
Prevention trials are very rare with any chronic indications, but are well-known in other parts of medicine. Vaccine trials against infectious diseases are prevention trials. Healthy individuals are treated with a vaccine, and are followed over a period of time, to measure how many of the patients will develop notable or serious medical problems going forward (due to an infection).
The problem with prevention trials is that most of the patients in the trial will provide us no information of our treatment’s effectiveness whatsoever, because they weren’t going to develop the disease that we are trying to prevent in the first place.
In the vaccine trial example, if a patient never encounters the pathogen the vaccine is designed to protect against, involving that patient in the trial was pointless from a statistical power perspective. If only 1 out of 10 patients encounters the pathogen naturally, you need to involve 10x as many patients in the trial as you would in a non-prevention trial of an equally effective drug to achieve the same statistical power.
Vaccines are more powerful (higher effect size) than most of the drugs against e.g. chronic diseases, and there’s also a way to artificially infect trial participants in challenge trials.
But for non-infectious diseases, where the above is not possible, we have this problem with no obvious solution -> prevention trials targeting healthy individuals (vs at-risk patient groups) for chronic diseases are rare, even though many of us working on aging believe those are the kinds of therapies that could lead to the largest benefits.
A primer on trial stats
To understand what’s going on here more, we need to know some very basic statistics that go into designing a clinical trial. We’ll keep things very simple in this section; if you want to fact check or read more on biostatistics of clinical trials, you might want to start here.
The goal of every trial is to prove that the drug you have is successful in modulating your primary endpoint.
To be able to show this is very likely (~statistically significantly) the case, the perhaps most important and difficult decision in any trial design process is deciding on the sample sizes of the trial. You want a high sample size to be able to show the drug’s effect confidently, and you want a low sample size because trials are expensive and their cost is driven most by the number of patients in the trial.
The formula to calculate the trial size you want is:
(Small note: the formula above that we use going forward is approximately equal to the also commonly used Schoenfeld formula, as well as other ways to calculate trial size from HRs.)
where:
D is the total number of medical events required in the trial
for a drug with hazard ratio HR for preventing such events
for statistical power 1 - beta
and type I error alpha
A medical event is the same as what we have discussed above. E.g. one patient having a stroke, or receiving a new diagnosis of dementia.
HR is the hazard ratio (or risk ratio) you declare based on your estimates. The hazard ratio is the ratio of the event rate in the treatment arm to the event rate in the control arm, so an HR below 1 means the drug reduces the rate of events.
Alpha and beta are basically constants we don’t need to think about here, but for the record:
Beta is the type II error (1 - power), i.e. the chance that you miss a positive result even though the drug works. Usually set to 0.10.
Alpha is the type I error, i.e. the chance that you get a false positive result. Usually set to 0.05.
z is a standardised normal deviate. You don’t need to understand it here.
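For reference, here are two standard ways of writing the required number of events with the symbols defined above: the Schoenfeld approximation mentioned in the note, and the closely related Freedman form. Both assume 1:1 randomisation and a two-sided alpha. Which exact expression the calculator uses is an assumption on our part; the two give similar values of D for the hazard ratios discussed in this essay.

```latex
% Schoenfeld approximation
D \approx \frac{4\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{(\ln HR)^{2}}

% Freedman form
D \approx \frac{\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}\,(1 + HR)^{2}}{(1 - HR)^{2}}
```

With alpha = 0.05 and beta = 0.10, the deviates are roughly 1.96 and 1.28, so the squared sum in the numerator is about 10.5. Everything else is driven by how far HR is from 1.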
Notice that D is still just the number of events you need in the trial, and not the number of patients you need to recruit and enrol into the trial - and it’s really this latter that we ultimately want to know.
In our case of a multimorbidity trial, the way we get to the number of patients needed (n) is:
where:
i is the natural incidence of the event of the patient population in our trial, measured by the number of events per year.
t is the trial length in years.
So if our event is a new dementia diagnosis, which generally happens to 20 patients out of 1,000 per year in our investigated patient population, then our i = 0.02.
This sample size assumes a 1 year long trial because the incidences we input will usually be incidence per year numbers. Trial stats doesn’t care about the length of the trial, just about the number of events (D) that the trial includes, so the sample size (n) can be relatively freely traded off for a longer trial. We would likely want to do that because the drug we have might work better over a longer dosing period than immediately. If we want e.g. a 5 year long trial, a reasonable estimate for the patient numbers to enrol will be n/5.
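Under the simplifying assumptions used here (a constant yearly incidence i, every enrolled patient followed for t years, no dropout, and ignoring that the treatment arm accrues slightly fewer events than the control arm), the relationship between events and patients can be sketched as:

```latex
n \approx \frac{D}{i \cdot t}
```

For instance, if D = 1,000 events are required and i = 0.02, a one-year trial would need about 50,000 patients, while a five-year trial would need about 10,000.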
What we can notice from the above is that the biggest influences on n, i.e. the number of patients we need in our prevention trial, will be
HR which basically signals how good our drug is for preventing an event in question
i meaning how common is the event naturally in our population.
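To see how strongly these two inputs drive enrolment, here is a minimal sketch. It assumes the Freedman form for the number of events, D = (z_(1-alpha/2) + z_(1-beta))^2 (1 + HR)^2 / (1 - HR)^2, and n = D / (i * t) with no dropout. Both are simplifying assumptions rather than a description of any particular calculator, and the function names are ours.

```python
from math import ceil
from statistics import NormalDist


def required_events(hr: float, alpha: float = 0.05, beta: float = 0.10) -> float:
    """Events needed (Freedman form, 1:1 randomisation, two-sided alpha)."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(1 - beta)
    return z_sum**2 * (1 + hr) ** 2 / (1 - hr) ** 2


def required_patients(hr: float, incidence: float, years: float) -> int:
    """Patients to enrol, assuming n = D / (i * t) and no dropout."""
    return ceil(required_events(hr) / (incidence * years))


# Sweep a few hazard ratios and yearly incidences for a 5-year trial.
for hr in (0.7, 0.8, 0.9):
    for incidence in (0.01, 0.02, 0.06):
        n = required_patients(hr, incidence, years=5)
        print(f"HR={hr:.1f}  i={incidence:.2f}/yr  5-year trial: n ~ {n:,}")
```

Because D grows roughly with 1 / (1 - HR)^2, moving from HR = 0.8 to HR = 0.9 costs far more patients than halving the incidence does.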
Aging drugs may unlock prevention trials
Hence, besides not having amazing drugs for most chronic diseases, the problem that one faces when trying to run a trial for the prevention of a chronic disease, is that the incidence rates of chronic diseases in a generally healthy population are quite low, perhaps with the exception of metabolic disorders. This means that a lot of patients need to be enrolled into a trial for the trial to have a moderate amount of medical events.
To give more color here on what ballpark incidence numbers we’re talking about, see some approximate numbers in the table below. More nuance in e.g. Norn’s Age-related diseases overview spreadsheet. You can also see that picking the right age-group for the trial population will be an important consideration that we’re not diving into now.
Incidence (new cases per year per 100,000 population) by age group:
Disease / Medical event | At ages 55-60 | At ages 65-70 | At ages 75-80 |
---|---|---|---|
Cancers (any kind) | 800 | 1,600 | 2,200 |
Dementia or MCI | 200 | 1,100 | 3,000 |
Chronic Heart Failure | 140 | 540 | 1,700 |
Myocardial Infarct | 170 | 600 | 1,200 |
CKD (stage 4 or 5) | 400 | 950 | 2,800 |
Note: the numbers in the table above are just for illustration, often just inferred/eye-balled from publications citing slightly different age groups. Getting precise estimates, including on interdependence between diseases, needs lots of careful work for precise power calculations, but is not important for demonstrating the trial design concept we’re trying to communicate, and hence this is not something we’re focusing on in this essay.
Even for more prevalent age-related diseases, we may only see 1-2% or less yearly incidence rates. With traditional drugs that are developed against a single disease, this means that the sample sizes that would be needed to run the prevention trial may be so large that they are economically prohibitive for running the trial at all.
Enter Aging Drugs…
We know that aging is the main risk factor for many diseases.
This means that true aging drugs will have an effect in preventing not just one, but multiple diseases.
This means that from now on we don't just need to wait around for our healthy enrolled participants to develop one single kind of medical event, but they can develop any of a variety of medical events, because our drug works on preventing each of those events.
Imagine that we have a drug that doesn’t just have an effect for preventing dementia, which has an incidence in our imaginary population of 0.02/year, but it also works and works equally well for preventing cancers that in our population also have an incidence of 0.02/y, as well as cardiovascular disease again with a hypothetical incidence of 0.02/y. Suddenly we have ~tripled our total incidence rate (around 0.06/year for any one event, in reality less because of non-independence) and need to enrol ~3 times fewer patients into our trial - which makes a huge cost difference. Maybe this cost difference is the difference between it being economically worth it to run a trial or not.
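A small sketch of the arithmetic behind "~tripled". It assumes the three events are independent, which is a simplification. Even then the combined yearly rate of a first event is slightly below 0.06, because a patient can only have a first event once, and positive dependence between diseases would lower it further.

```python
def any_event_incidence(incidences: list[float]) -> float:
    """Yearly probability of at least one event, assuming independent events."""
    p_none = 1.0
    for i in incidences:
        p_none *= 1.0 - i
    return 1.0 - p_none


single = 0.02
combined = any_event_incidence([0.02, 0.02, 0.02])

print(f"combined incidence: {combined:.4f} per year")
print(f"enrolment shrinks by a factor of ~{combined / single:.2f}")
```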
Estimating the trial size requirements of a drug’s multimorbidity prevention trial
This sounds great, but in reality, our aging drug will likely have different effect sizes (Hazard Ratios) for preventing different indications.
How do we find out how many patients are needed to be enrolled in our multimorbidity prevention trial then?
That seems like a hard question. We talked to lots of biostats people and worked through a few approaches below.
We made calculators for these to get closer to an answer.
1.0 Treating different aging events as if they are the same underlying trial event measured across different sites: a quick and dirty approach to multimorbidity trial sizing
Our first notion of an “aging” trial, where we look for a treatment’s ability to prevent any one event out of a set of age-related events, was that this may be isomorphic to the problem of combining multiple trial arms where we look to see a drug’s effect on treating any one event occurring across said arms. If a drug worked well in one location and not so well in another (due to whatever heterogeneous factors), we’d expect the global effect size was somewhere in between these two. Similar expectation for an aging drug: if it works well in one related morbidity and not so well in the other, we expect its aging effect is somewhere in between these.
So, we computed an aggregate hazard ratio to represent the effect size of a given treatment for preventing any contributing aging event. We borrowed the stats to compute this from meta-analysis, which is typically used to combine the results (hazard ratios) estimated from independently run trials. You can think of this like a weighted average that takes into account the reliability of each trial arm when coming up with a global effect size estimate.
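For readers unfamiliar with the meta-analysis machinery, here is a minimal sketch of fixed-effect inverse-variance pooling on the log hazard ratio scale, which is the textbook way to form such a weighted average. The standard errors below are hypothetical inputs chosen for illustration, and the weighting the calculator actually uses may differ, so this should not be expected to reproduce its numbers.

```python
from math import exp, log


def pooled_hazard_ratio(hrs: list[float], std_errors: list[float]) -> float:
    """Fixed-effect inverse-variance pooling of hazard ratios on the log scale.

    std_errors are the standard errors of the log hazard ratios.
    """
    weights = [1.0 / se**2 for se in std_errors]
    weighted_sum = sum(w * log(hr) for w, hr in zip(weights, hrs))
    return exp(weighted_sum / sum(weights))


# Hypothetical inputs: two per-disease HRs with equally reliable estimates.
equal = pooled_hazard_ratio([0.6, 0.9], [0.10, 0.10])

# Same HRs, but the first estimate is much more precise than the second.
skewed = pooled_hazard_ratio([0.6, 0.9], [0.05, 0.20])

print(f"equal weights:  {equal:.3f}")
print(f"skewed weights: {skewed:.3f}")
```

With equal standard errors the pooled value is the geometric mean of the two hazard ratios; when one estimate is more precise, the pooled value is pulled towards it.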
There are several reasons this is a contrived approach. For one, the comorbidities have dependencies between them, which we are not accounting for here. Second, this assumes fixed risk of each morbidity across time, which is not what happens in the wild. Third, this approach essentially averages an intervention’s effects on time-to-first-event for a set of disparate events, which should actually underestimate the effect size in most cases (e.g. if cancer typically happens first and the drug is really good at preventing cancer, then the time to first ‘aging’ event is more delayed when measuring this, as opposed to average time to first any one of separately measured aging events).
Anyways, you can access the calculator on the link below, then make a copy of the spreadsheet and play with it:
In case it’s not immediately intuitive to you, read on for instructions on how to use the calculator, and for an explanation on underlying assumptions/how it’s been built.
The most important thing you need to input into the calculator are hazard ratios (HR), which are effect size estimates for the chosen drug preventing the selected diseases or medical events.
Our hope is that you input your effect size (HR) estimates of your favourite aging drug, and then either
See trial size numbers that’d make a trial feasible to run -> in which case please proceed to run the trial.
See trial size numbers that are so large that it’d be economically unwise to run such a trial. -> in which case at least you know and maybe can try to find or develop a drug that will work well enough to warrant running an aging trial.
The calculator assumes:
That the drug works for preventing cancers, cardiovascular diseases, neurodegenerative diseases (dementia), and late stage chronic kidney disease, and not for any other types of diseases.
Incidences for these diseases, which are roughly the population averages for 65-70 year age groups.
Independence of different diseases. This won’t hold up in an actual trial design (a patient developing one age-related disease or medical event is more likely to have a second event as well) but we’re assuming it here nonetheless for simplicity.
1.1 Example: multimorbidity prevention trial for an aging drug
For the sake of demonstrating how we’d think of trial size requirements of a multimorbidity prevention trial, let’s look at an example drug, rapamycin.
To calculate the feasibility of running a multimorbidity prevention trial for rapamycin, we need estimates of how well rapamycin would prevent the diseases or medical events we care about.
Estimating the clinical effect sizes of any treatment well is very hard, and coming up with the best estimates for rapamycin’s effect in preventing different age-related diseases is no exception. How to do that is maybe the content of another essay.
Just to have some numbers to play with, let’s assume that rapamycin reduces the chance of a healthy patient
developing any form of cancer by 40% (HR = 0.6)
developing any form of dementia by 5% (HR = 0.95)
experiencing any form of stroke, or chronic heart failure, or myocardial infarction by 15% (HR = 0.85)
developing stage 4 or 5 chronic kidney disease by 10% (HR = 0.9)
(For context and sanity checking, HRs for well-known drugs often fall in a similar range: statins typically show an HR of 0.7 - 0.8 for major cardiovascular events, while SGLT2 inhibitors demonstrate an HR of 0.8 - 0.9 for heart failure hospitalisation, in the population they are generally trialled in.)
Inputting these HR numbers into the calculator we get the following:
This gives a composite HR of 0.87 and an estimated trial size of ~9100.
We see that the estimated size of a rapamycin multimorbidity trial is substantially less than running each arm independently, offering an improved drug label for a fraction of the cost (assuming large trial size = large trial cost).
Provided these made-up HR numbers were correct, that’d be an interesting result worth thinking more about.
1.2 Aging drugs that have balanced effects on different disease groups provide even better trial economics
That said, our calculation above also highlights how asymmetric the effects of rapamycin are across aging morbidities, i.e. most of its aging effect size will likely be due to anti-cancer efficacy. While still cheaper than running each arm independently, the multimorbidity prevention trial for rapamycin is not cheaper than running its cancer arm alone.
So when and why would a more conservative biotech that cares more about the cheapest possible trial cost than the strength of the label run a multimorbidity prevention trial?
What if our drug candidate was more egalitarian in its approach to aging morbidities, i.e. its anticipated disease-specific hazard ratios were more similar to one another than those of rapamycin? Our calculator indicates that this could provide an even stronger economic opportunity than a rapamycin trial. The reason is that the expected trial size depends on both the treatment hazard ratio and the natural incidence of the given disease event. If the composite hazard ratio stays roughly the same as the contributing disease-specific hazard ratios (as it would for a more ‘egalitarian’ therapy), but the incidence of aging events increases in the composite trial because we are counting more events, then a multimorbidity trial becomes cheaper than the cheapest possible indication-specific arm of that trial.
We can see this in a hypothetical case below:
This should make intuitive sense too for those already bullish on longevity/healthspan interventions—of course the aging drug trial is more fruitful. But what the numbers here indicate is that the drug actually needs to have pleiotropic effects across aging morbidities for the aging trial to really, definitively be better than a single-morbidity trial.
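The contrast between an asymmetric and a balanced drug can be sketched numerically. The example below is a toy comparison under loudly stated assumptions: the yearly incidences are hypothetical placeholders (not the spreadsheet's values), the composite HR is taken as an incidence-weighted average of the disease-specific HRs (which holds for independent, constant hazards and is not the meta-analytic method used by the calculator), and trial sizes come from Schoenfeld's approximation at 90% power.

```python
from math import ceil, exp, log
from statistics import NormalDist

# Hypothetical yearly incidences, for illustration only.
INCIDENCE = {"cancer": 0.02, "dementia": 0.01, "cardiovascular": 0.025, "ckd": 0.005}

# Made-up rapamycin-like HRs from the text, and a balanced hypothetical drug.
ASYMMETRIC = {"cancer": 0.6, "dementia": 0.95, "cardiovascular": 0.85, "ckd": 0.9}
BALANCED = {disease: 0.8 for disease in INCIDENCE}


def sample_size(hr, incidence, years=5, alpha=0.05, power=0.9):
    """Total participants for a 1:1 trial (Schoenfeld, constant hazards)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    events = 4 * z ** 2 / log(hr) ** 2
    p_event = ((1 - exp(-incidence * years)) + (1 - exp(-incidence * hr * years))) / 2
    return ceil(events / p_event)


def compare(hrs):
    """Return (composite HR, composite trial size, per-disease trial sizes)."""
    total_incidence = sum(INCIDENCE.values())
    composite_hr = sum(INCIDENCE[d] * hrs[d] for d in INCIDENCE) / total_incidence
    single_arm_sizes = {d: sample_size(hrs[d], INCIDENCE[d]) for d in INCIDENCE}
    return composite_hr, sample_size(composite_hr, total_incidence), single_arm_sizes


if __name__ == "__main__":
    for name, hrs in (("asymmetric", ASYMMETRIC), ("balanced", BALANCED)):
        composite_hr, composite_n, single = compare(hrs)
        print(name, round(composite_hr, 3), composite_n, min(single.values()))
```

With these placeholder inputs, the asymmetric drug's composite trial is smaller than the sum of its single-disease trials but larger than its cancer trial alone, while the balanced drug's composite trial is smaller than any of its single-disease trials. The absolute numbers are not meaningful; only the direction of the comparison is.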
This approach of repurposing meta-analysis to understand how an aging drug/trial may work seemed reasonable to us initially, and to a few biostatistics people we talked to, but we sought to confirm it with a more first-principles-esque approach.
2.0 A (slightly) less contrived approach to estimating size of a first-event prevention trial
To better understand the mechanics of how an aging trial would more accurately and granularly work, we built some simulations with a friend of ours outside of Norn (credit: Zane). Using our previous estimates of yearly morbidity incidence rates, we simulated outcomes in control and treatment groups (scaling the incidences by the individual HRs we previously estimated) with varying trial sizes to understand when and how this is powered.
The composite HR here converged to around 0.80 as the trial became more powered (achieving 90% power at ~5000 participants for a 5 year trial). This is in line with our expectation that our meta-analytic average would underestimate the effect size of an aging drug, which came out to 0.87 with the same assumptions of incidence (remember, the farther we are from 1.0, the greater the effect size).
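To make the mechanics of such a simulation concrete, here is a minimal sketch of a first-event trial. It is not the simulation described above: it assumes independent diseases with constant (exponential) hazards, uses hypothetical yearly incidences, ignores dropout and death, and estimates the composite HR as a simple ratio of event rates per person-year rather than fitting a Cox model or estimating power.

```python
import random

# Hypothetical yearly incidences (placeholders, not the essay's values).
INCIDENCE = {"cancer": 0.02, "dementia": 0.01, "cardiovascular": 0.025, "ckd": 0.005}
# Made-up disease-specific HRs from the rapamycin example.
HR = {"cancer": 0.6, "dementia": 0.95, "cardiovascular": 0.85, "ckd": 0.9}


def simulate_arm(hazards, n, years, rng):
    """Simulate n participants; return (first events observed, person-years)."""
    events, person_years = 0, 0.0
    for _ in range(n):
        # Time to first event is the earliest of the disease-specific times.
        first_event = min(rng.expovariate(h) for h in hazards)
        if first_event <= years:
            events += 1
            person_years += first_event
        else:
            person_years += years  # censored at the end of the trial
    return events, person_years


def simulated_composite_hr(n_per_arm=50_000, years=5, seed=0):
    """Ratio of first-event rates (treated / control) in one simulated trial."""
    rng = random.Random(seed)
    control = [INCIDENCE[d] for d in INCIDENCE]
    treated = [INCIDENCE[d] * HR[d] for d in INCIDENCE]
    events_c, py_c = simulate_arm(control, n_per_arm, years, rng)
    events_t, py_t = simulate_arm(treated, n_per_arm, years, rng)
    return (events_t / py_t) / (events_c / py_c)


if __name__ == "__main__":
    print(round(simulated_composite_hr(), 3))
```

Estimating power would additionally require repeating the simulated trial many times at each sample size and counting how often a test such as the log-rank test rejects the null, which is omitted here for brevity.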
Another friend of ours (credit: Mica) came up with an analytic approximation of the composite HR using the following equation:
Where P(c) is the natural incidence of each condition, c, and λ(t | c, Z) is the Cox proportional hazards function. Here we substitute our previous HR estimates for the ratios (with treatment vs without) in the sum.
This gave roughly the same composite HR as our simulation (0.81), which makes sense since these should mechanically be working similarly.
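For intuition on why the two should agree: under the simplifying assumptions of independent conditions and constant hazards, the hazard of a first event is the sum of the condition-specific hazards, so the composite HR reduces to an incidence-weighted average of the condition-specific HRs. The following is a sketch of that reasoning in the notation above, not necessarily the exact equation used; coding Z = 1 for the treated arm and Z = 0 for the control arm is an assumed convention.

```latex
\lambda_{\text{first}}(t \mid Z) = \sum_{c} \lambda(t \mid c, Z)
\quad\Longrightarrow\quad
\mathrm{HR}_{\text{composite}}
  = \frac{\sum_{c} \lambda(t \mid c, Z=1)}{\sum_{c} \lambda(t \mid c, Z=0)}
  \approx \frac{\sum_{c} P(c)\,\dfrac{\lambda(t \mid c, Z=1)}{\lambda(t \mid c, Z=0)}}{\sum_{c} P(c)}
```

The approximation in the last step treats the control-arm hazard of each condition as proportional to its natural incidence P(c).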
Conclusion
To summarize:
We believe that understanding the path by which an aging drug can be trialled directly against aging (vs specific proxy-indications) will incentivise the development of true aging drugs that promise a higher upside and a higher expected value for all.
We explore potential primary endpoints for a pivotal trial against aging, and claim that the type of endpoint and trial most likely to lead to wide reimbursement is a multimorbidity-prevention trial, measuring time to first of any age-related event.
We suggest ways to calculate the sample size requirements of a multimorbidity-prevention trial, given a drug’s estimated effect sizes in preventing specific age-related events.
From these calculations, we conclude that:
1. In all cases, the sample size needed to run a multimorbidity-prevention trial is substantially smaller than that needed to run a prevention trial for each disease independently. That is, this trial design offers a stronger drug label for a reduced cost.
2. Aging drugs that have balanced effects across different disease groups provide even better economics: a multimorbidity-prevention trial becomes cheaper than a prevention trial for any single indication.
Acknowledgements
Self-evident disclaimer: This is an obviously imperfect essay to prompt discussion and further work. We tried to signal when we are less confident in the assumptions we’ve taken, but even when we haven’t done a good job in that: please use the ideas and resources presented here thoughtfully, and use them at most as a starting point in your own careful thinking that precedes e.g. committing resources to a project. We will certainly do the same.
Most of the work was done by Marton Meszaros and Brennan Overhoff. Thanks to the many people who contributed and gave us guidance or feedback. Special thanks to Mica Xu Ji, Zane Koch, Frank David, and everyone in Norn Longevity Nexus. All remaining errors in the essay are owned by Marton and Brennan only. Feel free to send feedback.
Published Jan/2025. Last updated Jan/2025.