Data Gaps that Bottleneck Aging Biomarker Discovery
By Marton Meszaros
Developing “aging clocks” or “biomarkers of aging” is a popular pursuit within the aging research community, and increasingly also in popular culture. We now have dozens of unverified consumer tests, hundreds of papers worth of research for research’s sake, and some good work that may lead somewhere. The impact of all the work on aging biomarkers today - on clinical care, or on improving the development of therapeutics - is far below what it promises to be, and what it could be.
In this piece, I make the claim that aging biomarker discovery is bottlenecked in large part by the human cohorts and data that biomarker discovery efforts are based on. I start by thinking through what an aging biomarker needs to be to be actually useful, what the high-level steps of developing and validating one are, and what data those steps require. I then look at the datasets available today, and make a high-level proposal on what additional datasets need to be created to enable biomarker discovery with large real-world impact.
This piece mainly consists of personal opinions and “first pass”-level thinking that can hopefully be built on. My opinions have been informed mainly by our (Norn Group) thinking on how a clinical trial targeting aging needs to be designed; by learnings from my previous startup, where we developed, validated, and deployed screening assessments for very early-stage Alzheimer’s disease; by learnings from my current newco, where we collect new, well-selected primary human cohorts/datasets that fill important data gaps and thereby enable us to discover better targets and make age-related drugs that work; and by conversations with industry stakeholders.
Table of Contents
- Section 1: Why Care About Aging Biomarkers?
- Section 2: Defining and categorizing biomarkers
- Section 3: Requirements that an aging biomarker needs to fulfill
- Section 4: What data does developing aging biomarkers require?
- Section 5: What are the data gaps we need to fill?
- Section 6: Actions to address data gaps
1. Why care about aging biomarkers?
If you’re working on aging or are familiar with this field of research, my hope is that this piece will prompt productive discussions about aging biomarker discovery and development, which can then help steer efforts more directly towards actionable findings and real-world impact. Feel free to skip to the next section.
If you’re new to thinking about aging, or have heard of the field before but have perhaps been turned off by the lack of competent, feasible, drug-development-focused work: I ask you to look at the players and opportunities in this field again after reading this, and consider if/how the expected value of aging-related work could be higher than in other areas of biotech (and beyond), regardless of whether you’re optimising for financial returns or patient/societal benefit.
1.1 Why care about aging?
Aging is the root cause of the world’s largest diseases. The prevalence of cancer, cardiovascular disease, cerebrovascular disease, metabolic disease, and dementia is much higher in old people than in young people. This suggests that there are underlying biological changes that happen as we age that make us more susceptible to, or outright cause, all of these diseases. If we try to treat these diseases once they are already there, we are fighting a losing battle. Preventing these diseases, by targeting their root cause, is an approach that, while perhaps harder, promises much higher upside and expected value - both altruistically and commercially.
1.2 Why care about biomarkers?
There are two reasons to care:
1.2.1 Good biomarkers make drug development cheaper and faster.
In some cases, the cost delta in drug development created by a good biomarker is so large that it enables entirely new or new types of therapeutics to be developed, validated, approved, marketed.
Consider biomarker- or companion diagnostic (CDx)-driven cancer trials and therapies. E.g. Trastuzumab/Herceptin + HercepTest™ (2024 sales $1.5b), or Osimertinib/Tagrisso + EGFR mutation tests ($6.6b), or Olaparib/Lynparza + BRACAnalysis CDx ($3b). These therapeutics could not have been developed, and would be virtually impossible to market to patients, without their CDx, because they are effective only in patients who test positive on the CDx. Notably, in this setup the biomarker is generally developed, owned, and monetised by a for-profit entity - with the biomarker effectively creating a big moat around developing treatments for that particular indication.
Roy Baynes, Chief Medical Officer of Merck (2013-2022) on the use of biomarkers in the Keytruda program:
“In Merck’s pembrolizumab [Keytruda] program, we were well-served by using precision medicine to select patients we thought would do well. This allowed for focused development, enhanced effect sizes and smaller clinical studies.”
Keytruda was 2024’s highest-grossing drug, with sales of $29.5b.
PD-L1 is a biomarker used to predict how well Keytruda will work in treating patients with non-small cell lung cancer (NSCLC).
In other cases, the development of a new biomarker or surrogate endpoint “merely” causes or contributes to a step-change in the number of drugs successfully trialled and approved for the indication that the surrogate endpoint is put in place for. These endpoints are generally not owned by a single entity and are developed primarily through non-profit routes - but they provide a very high return on expended capital.
- LDL first used as a surrogate primary endpoint in a Phase 3 trial leading to an approval: 1987 (lovastatin)
- Drugs approved in the preceding 10 years for hypercholesterolemia: 3 (colestipol, probucol, gemfibrozil)
- Drugs approved in the following 10 years: 6 (5 more statins + niacin)
- HbA1c first used as a primary Phase 3 endpoint: 1994 (metformin)
- Number of diabetes therapies approved 1984–1993: 2 (sulfonylureas)
- Number of diabetes therapies approved 1995–2004: 12 (insulin analogs, TZDs, alpha-glucosidase inhibitors, meglitinides)
1.2.2. Biomarker discovery and target discovery overlap very heavily.
As I will discuss later, a good attribute for a useful biomarker is being causally implicated in a disease process. In such cases, a discovered biomarker may itself serve as, or highlight, a promising drug target closely related to it, which can then serve as the basis of successful drug development efforts. We’ve seen this many times over. Some examples:
Disease | Biomarker | Target related to biomarker | Approved Tx hitting that target (2024 sales; list is not comprehensive) |
---|---|---|---|
Atherosclerotic CVD (hypercholesterolemia) | LDL-C (accepted surrogate for CV risk) | PCSK9 (which regulates LDL receptor recycling and LDL-C levels) | evolocumab ($2.0 B), alirocumab ($0.53 B), inclisiran ($0.75 B) |
Heart failure (HFrEF) | NT-proBNP (N-terminal pro–B-type natriuretic peptide; biomarker of cardiac wall stress, tracked as a secondary endpoint in HF trials) | Neprilysin (degrades natriuretic peptides) | sacubitril ($7.8 B) |
Severe eosinophilic asthma | Blood eosinophil count | IL-5 (needed for eosinophil growth/activation/survival) | mepolizumab ($2.2 B) |
Rheumatoid arthritis | C-reactive protein | IL-6 (induces CRP production by the liver) | tocilizumab ($1.7 B) |
2. Defining and categorizing biomarkers: What are the use-cases in which an aging biomarker needs to perform?
A biomarker, by the FDA’s definition, is any characteristic that you can objectively measure and that tells you about a biological variable of interest in your subject.
You can categorise biomarkers in many different ways. I’m going to talk about biomarkers only in a (clinical) drug development context, as that’s what matters most here. Within that, it’s productive to break down biomarkers based on where they will be used within a drug development pipeline: in preclinical studies, in Phase 1/2 clinical trials, or in Phase 3 clinical trials.
Types of Aging Biomarkers

Category of biomarker | Preclinical aging biomarker | ‘Investigational Aging Biomarker’ | ‘Clinical Aging Surrogate’ | Functional or morbidity-related endpoint |
---|---|---|---|---|
Primary use | Preclinical (animal) studies. | Phase 1 or 2 trial, as a primary endpoint. | Phase 3 trial, primary endpoint. | Phase 3 trial, primary endpoint. |
Primary goal: what does the biomarker need to do? | Changes must predict changes in the human aging biomarkers used in Phase 2 or Phase 3 trials. | Changes must predict change in the eventual Phase 3 endpoint (whether surrogate, functional, or morbidity/mortality-related). If your Phase 2 trial succeeds based on this biomarker, you want a very low chance that your Phase 3 will fail. | Predict approvable patient benefit. The relationship between the ‘Clinical Aging Surrogate’ and the previously used Phase 3 endpoints must be so clear that stakeholders (FDA, payors, clinicians) often won’t even want to measure both, just the more practical one. | Measure the direct benefit to patients (improved feel, function, or survival) that FDA and payors care about. See previous piece re: endpoints. |
Analogous example: most common endpoints for hypercholesterolemia today | LDL-C, atherosclerotic plaque area, triglycerides | LDL-C, ApoB | LDL-C | Phase 3 endpoints before LDL/HDL: rate of major CV events (MI, stroke, or CV death), e.g. MACE |
Analogous example: most common endpoints for Type 2 diabetes today | Blood glucose (fasted or measured in an oral glucose tolerance test), HbA1c, insulin/C-peptide | HbA1c (shorter term than in Phase 3 trials) | HbA1c | Endpoints before HbA1c: rate of diabetes complications (e.g. retinopathy progression, amputations, all-cause mortality) |
Notice how, for hypercholesterolemia and Type 2 diabetes mellitus, the biomarkers used as surrogate endpoints in Phase 2 trials are also the most common (or among the most common) endpoints used in earlier clinical and preclinical development.
The fact that they fulfill the requirements for biomarkers at each of these preclinical/clinical development stages obviously makes them amazingly valuable biomarkers. But this does not need to be definitionally true - it is acceptable and common to use different endpoints at different stages of development: the requirements for a good preclinical vs Phase 2 vs Phase 3 endpoint are different. Let’s break down these requirements with respect to aging.
While we’re at definitions: for the purposes of this essay, we define aging as:
○ The sum of underlying biological processes that happen in every human gradually over long periods of time,
○ and that make people more susceptible to diseases associated with high chronological age (e.g., neurodegeneration, cardiometabolic diseases, cancer).
If you’re not familiar with different standard clinical development phases, or other terms used here, please read this previous essay we’ve written on aging trial design.
3. Requirements that an aging biomarker needs to fulfill for it to be practically useful in drug development or clinical care
I will limit this to aging biomarkers we’d use in (human) clinical trials, because those will be the most game-changing to have developed.
Requirements from Different Aging Biomarker Types

Characteristic that an ideal biomarker would have | ‘Investigational Aging Biomarker’ (Phase 2 endpoint) | ‘Clinical Aging Surrogate’ (Phase 3 endpoint) | ‘Functional or Morbidity-related Aging Endpoint’ (Phase 3 endpoint) |
---|---|---|---|
Clinical relevance | | | |
Directly measures the current health of the patient. “Health” in our case refers to FDA’s “feel, function, survive”, i.e. better numbers on the biomarker definitionally need to mean that the patient has an obviously higher quality of life. | Not crucial | Not crucial | Must have |
Changes in the biomarker precede the functional or morbidity-related outcomes. | Must have | Must have | n/a (directly measures functional/morbidity outcome) |
Changes in the biomarker are strongly predictive of future health and of functional or morbidity-related outcomes. | Important | Must have | n/a (directly measures functional/morbidity outcome) |
Explainability or biological plausibility: there should be a clear biological rationale, explainable to e.g. clinicians, for how the biomarker is causally involved in the pathological process (of aging). | Important | Must have* (for FDA today; could change in the future) | n/a |
Sensitivity: the biomarker should change when the process it represents (aging) is altered, e.g. as the person ages, or receives an effective intervention against aging. | Must have | Must have | Must have |
Quick response to modulation: if a patient receives an intervention that is efficacious on that particular patient, the biomarker should show a positive change as early as possible. The quicker the biomarker picks up on the effect of the intervention, the shorter/cheaper the trial. | Important; should generally be quicker than Phase 3 endpoints | Important; should generally be quicker than functional or morbidity endpoints | Important |
Specificity, robustness, analytical validity | | | |
Resistance to, or well-characterised effects of, confounding factors and physiological states, such as those originating from: sampling and pre-analytical handling (within protocol or with minor protocol deviations); demographic variation across patients; the short-term state of the patient; unrelated conditions and comorbidities of the patient. | Must have | Must have | Must have |
Validity: the method used to measure the biomarker must be accurate, i.e. close to the true value of the biomarker (as measured by gold-standard techniques for the same measurement). | Important | Important | Important |
Feasibility and practicality | | | |
Cheap | Not crucial | Medium | Important |
Low burden (easy, quick) for the patient to perform | Not crucial | Medium | Medium |
Possible to perform on-site in most clinics/hospitals | Not crucial | Important | Important |
Possible to perform at home/remotely, or has a remote equivalent | Not crucial | Not crucial | Not crucial |
Safe: doesn’t pose risk to patients, e.g. non-invasive | Important | Must have | Must have |
Readout (result) of the biomarker available ~immediately | Not crucial | Not crucial | Not crucial |
Let’s look at the above table for the key differences, and expand on a couple of points.
Firstly, we can see the main difference between a functional endpoint and a surrogate endpoint: a functional aging endpoint needs to measure the health of patients directly and super obviously, so that it’s directly relevant to payors (we wrote more about this previously). A surrogate endpoint needs to be able to replace existing endpoints by being extremely predictive of them.
3.1 Explainability
I’m hopeful that this will change over the medium/long term, but it’s important to notice that, as of today, all accepted biomarkers - and especially Phase 3 surrogate endpoints - are very clearly, explainably, and obviously linked to the disease process they measure. We know quite clearly how LDL plays a part in atherosclerosis and CVD events. It’s quite easy to imagine how decreased bone density and lower mineral content of bones lead to more fragile bones, more fractures, and more hospitalizations.
Surrogate markers need to be explainable; they can’t be “black boxes”. This is probably because if we can’t understand how something works, a much higher level of evidence is required for the biomarker to be approved and accepted by regulators, payors, and clinicians.
Notice that this is not necessarily the same as the biomarker needing to be causally implicated within the pathological process: HbA1c, for example, is perhaps better described as a very closely linked side effect of the underlying pathology (which includes elevated blood glucose) than as a crucial causal element of the disease process.
3.2 Predictive performance for future health outcomes
One of the points worth expanding on is aging biomarkers’ ability to pick up on and predict clinically relevant changes and outcomes.
Existing aging biomarkers are generally trained to predict either mortality or multimorbidity/age-related morbidities. This is a good direction and a good first step.
There are different levels at which to validate the predictive performance of a biomarker, though. I’ll somewhat arbitrarily separate these into 3 levels here. These levels aren’t strictly sequential: for example, an interventional dataset would be an amazing asset for discovery, and small analytical studies might be necessary before one can design a study that collects samples/data meaningfully for the purpose of discovery.
3.2.1 Biomarker discovery using retrospective cohorts
The first level is exploratory discovery on associative data. We look at data, generally retrospective cohorts. Then we establish associations between a biomarker, or a set of biomarkers, and the outcome we care about. This is perhaps the step where the most intelligence is needed. This is also the level where aging biomarker development is at today.
How effective we are at this step, however, obviously depends A LOT on what discovery datasets are available for us to research on.
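To make this step concrete, here is a minimal sketch of what such exploratory discovery often looks like in code: an elastic-net-penalized Cox model mapping baseline omics features to mortality risk, in the spirit of (but not identical to) how clocks like GrimAge were built. The file path, column names, and penalty values are hypothetical placeholders, not a description of any particular published pipeline.

```python
# Minimal sketch of level-one biomarker discovery on a retrospective cohort:
# an elastic-net-penalized Cox model predicting time-to-death from omics
# features. Column names and penalty strengths are illustrative placeholders.
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical cohort table: one row per participant, baseline omics
# features plus follow-up time and vital status.
df = pd.read_csv("cohort_baseline_omics.csv")  # placeholder path
feature_cols = [c for c in df.columns if c.startswith("protein_")]

model = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)  # elastic-net penalty
model.fit(
    df[feature_cols + ["followup_years", "died"]],
    duration_col="followup_years",
    event_col="died",
)

# Features with non-zero coefficients are candidate biomarker components;
# the model's risk score itself can serve as a first-pass "aging score".
df["aging_score"] = model.predict_partial_hazard(df[feature_cols])
print(model.summary[["coef", "p"]].sort_values("coef"))
```

The real work, of course, is in what this sketch hides: the size, quality, and time-series depth of the cohort behind that hypothetical CSV, and the out-of-sample validation of any resulting score.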
3.2.2 Analytical Validation, Initial Clinical Correlation
Once we have discovered a biomarker, we need to develop it and validate it. The first steps of the validation process are generally about validating its robustness (see the next section on confounding factors) and its clinical relevance in smaller but very well-characterized datasets, to confirm that it indeed correlates with the outcomes we want to predict.
3.2.3 Prospective validation of the biomarker in interventional trials
Virtually all existing Phase 2 or Phase 3 primary endpoints in use today have been validated in interventional trials before being used on their own as a primary endpoint. The way this is generally done is that newly developed biomarkers, once they are good enough, are added to interventional trials as secondary endpoints - measurements taken over the course of the trial for additional information, on which the success of the trial does not hinge.
The data you can uniquely get here, provided that both the developed biomarker and the intervention in the trial work as hoped, is that your biomarker changes as a result of the intervention, in the same direction and at similar magnitude as your original endpoint, but faster. For example, a great aging biomarker would pick up on a change in ‘biological age’ over the course of a couple of months, while you would need to wait years for the result of a multimorbidity trial, or a decade for the result of a trial measuring all-cause mortality.
Only when this evidence exists can you start talking to regulators about the acceptance of your endpoint as a surrogate or a reasonable clinical trial primary endpoint.
You may notice an apparent chicken-and-egg problem here: you need an intervention that works, or multiple interventions that work, to conclusively validate a surrogate endpoint. But to be able to run a trial that can measure the effect of the intervention well, you need a good biomarker.
There is no magic solution to this. Based on how surrogate endpoints have been developed so far (treatments were developed first, and surrogate biomarkers were validated subsequently), this paradox stands. Hence, it is likely that the very first endpoints used in de-facto aging trials are going to be conservative, functional or multimorbidity-related endpoints.
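To illustrate the kind of evidence such a trial would produce, here is a hedged sketch of the two basic checks on a biomarker carried as a secondary endpoint: did the intervention move the biomarker early, and does early biomarker change track later change in the established endpoint? All file and column names are hypothetical.

```python
# Sketch: evaluating a candidate surrogate carried as a secondary endpoint
# in a hypothetical interventional trial. All names are placeholders.
import pandas as pd
from scipy import stats

trial = pd.read_csv("trial_secondary_endpoints.csv")  # placeholder path
trial["biomarker_delta_6mo"] = trial["biomarker_6mo"] - trial["biomarker_0mo"]
trial["endpoint_delta_3yr"] = trial["endpoint_36mo"] - trial["endpoint_0mo"]

# 1) Did the intervention move the biomarker early?
treated = trial.loc[trial["arm"] == "treatment", "biomarker_delta_6mo"]
control = trial.loc[trial["arm"] == "placebo", "biomarker_delta_6mo"]
t, p = stats.ttest_ind(treated, control)
print(f"6-month biomarker shift, treatment vs placebo: t={t:.2f}, p={p:.3g}")

# 2) Does early biomarker change predict later change in the real endpoint?
r, p = stats.spearmanr(trial["biomarker_delta_6mo"], trial["endpoint_delta_3yr"])
print(f"6-month biomarker delta vs 3-year endpoint delta: rho={r:.2f}, p={p:.3g}")
```

Formal surrogate validation uses stronger designs (e.g. trial-level analyses across multiple interventions), but this is the shape of the first-pass evidence that accumulates from carrying the biomarker as a secondary endpoint.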
3.3 Resistance to confounding factors
Another major limitation of existing aging biomarker development efforts is that they pay too little respect to the robustness of the biomarkers they create. These considerations are absolutely crucial for developing any biomarker with clinical meaning or usefulness in drug development.
What are the factors to consider here?
3.3.1 Effects of differences in sampling and sample handling process pre-analysis.
Environment of sampling and sample processing: temperature, light. (These feed into what some biomarker people would call ‘inter-laboratory reproducibility’)
Tools/devices used, minor differences in sampling tubes, collection device, and in reagents used (‘inter-laboratory reproducibility’, ‘lot-to-lot consistency’)
Time from sampling to assay, and between different processing steps (‘inter-laboratory reproducibility’, ‘inter-operator reproducibility’, ‘time-based consistency’)
3.3.2 Effects of differences in a patient’s ‘acute’ state
Is there a difference on the biomarker readout between:
Different times-of-day? (circadian rhythm, restfulness)
Different times-of-week, or times-of-month? (hormonal cycle, restfulness)
Time since last meal, last sleep, last exercise, quality of those
Time since last illness/infectious disease, other small medical events causing e.g. inflammation
3.3.3 Effects of differences in patients’ demographics and non-relevant diseases
Age, sex, ethnicity, education/socioeconomic background
Diseases and comorbidities (not closely relevant to the pathological process measured)
3.3.4 Effects of concomitant medications, not relevant to the pathological process measured
In an ideal case, the biomarker is proven not to be affected by such factors. In a less ideal but still acceptable case, the biomarker is affected by some of these factors, but those effects are well-characterised and then feed into the protocol (“instructions to administer”) of the biomarker.
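To show how such confounder effects are typically quantified, here is a minimal sketch, assuming a small repeat-sampling study with hypothetical column names: a linear mixed-effects model estimating a time-of-day effect while accounting for repeated measures within subjects.

```python
# Sketch: quantifying a confounder (time of day) on a biomarker using a
# linear mixed-effects model over repeated measures. Data and column
# names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical study: each subject sampled in the morning and the evening.
samples = pd.read_csv("repeat_sampling_study.csv")  # placeholder path

# Fixed effect: time of day. A random intercept per subject accounts for
# the fact that samples from the same person are correlated.
model = smf.mixedlm(
    "biomarker ~ C(time_of_day)", samples, groups=samples["subject_id"]
).fit()
print(model.summary())

# If the time-of-day coefficient is large relative to between-person
# variation, the biomarker's protocol must fix the sampling time.
```

The same template applies to each confounder class above: swap the fixed effect for time-of-month, fasting state, recent illness, and so on.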
4. What data does developing such aging biomarkers require?
I claim that ~all limitations of aging biomarker efforts to date can be explained by:
- Them being developed by players (academic labs, direct-to-consumer companies) whose future does not directly depend on the biomarkers functioning reliably, and who hence lack the incentive to perform the more boring and less innovative, but important, work of developing biomarkers that are robust and actually work.
- The datasets on which aging biomarkers have been developed not being good enough.
So far, I have discussed the requirements an aging biomarker needs to fulfill to serve different purposes.
These requirements directly describe the data needed for developing such biomarkers - but let’s get specific and add some nuance.
4.1 Important aspects of an appropriate discovery dataset
An ideal discovery dataset for aging biomarker development is
Large.
Captures the diversity of the population in which the biomarker is intended to be used:
Demographically. It includes individuals across different sex, ethnicity, geography, genetic background, socioeconomic background, health status.
In age groups, i.e. it includes younger, middle aged and older participants.
Balanced. It includes diverse groups, and overlaps of those groups, in proportions that are not heavily skewed.
Time series. That is, it includes multiple clinical outcome data points AND sampling occasions per patient,
ideally over a long period of time within a patient,
and with many sampling points per patient.
Ideally with consistent times between sampling occasions across patients
Rich in phenotypic and clinical data
including the onset of any acquired diseases or medical events, diagnosed and documented early in their progression.
Including functional measures related to muscle function and cognition
Lifestyle (diet, exercise) and environmental data
Has biomarker data (imaging, omics), covering as much as possible of the modalities we could be interested in investigating
Medications taken
Samples have been collected in a standardised, uniform way. The factors that this standardisation needs to cover are exactly what one learns from small studies investigating the robustness of biomarkers and assays. For factors where such standardisation is not available, metadata on the circumstances of sample collection, the state of the patient, and how and how much the sampling/sample processing deviated from the ideal scenario is especially important.
Easy to access: patients have been appropriately consented, in countries with non-crazy data protection regulations.
This is the kind of discovery dataset that would enable us to discover biomarkers, some of which are going to capture what we want to capture (i.e. biological aging). I won’t go into how we would run the discovery process on this data here; instead, let’s look at what data we need going forward.
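For concreteness, here is a minimal sketch of the shape such a discovery dataset could take - a tidy long format with one row per sampling occasion - together with a check for the time-series property that section 5 will show existing cohorts lack. All table and column names are hypothetical.

```python
# Sketch of a tidy, long-format discovery dataset: one row per
# (participant, sampling occasion), with demographics, phenotypes,
# links to omics assays, and sampling metadata. All names are hypothetical.
import pandas as pd

samples = pd.DataFrame(
    columns=[
        "participant_id", "occasion", "sampled_at",        # time-series keys
        "age", "sex", "ethnicity", "ses",                  # demographics
        "diagnoses", "medications", "lifestyle",           # clinical/lifestyle
        "grip_strength", "gait_speed", "cognition_score",  # functional measures
        "proteomics_file", "methylation_file",             # links to omics data
        "fasting_hours", "time_of_day", "handling_notes",  # sampling metadata
    ]
)

# The property most existing cohorts lack: many assayed occasions per person.
occasions = samples.groupby("participant_id")["occasion"].nunique()
usable = (occasions >= 3).sum()
print(f"{usable} participants with 3+ omics-assayed sampling occasions")
```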
4.2 Data to validate the robustness of the assessment
Any good discovery dataset still only gets us to the starting line. The biomarkers we have discovered now need to go through a number of validation steps before we can believe they can have real value in a drug development process. This is the boring part; it won’t lead to nice publications.
Even if we have a very good discovery dataset, it most likely still won’t capture some factors that could plausibly affect our discovered biomarkers. We need data to prove that those effects on our biomarkers are not actually large.
If they do affect our discovered biomarkers, we have two options to proceed:
Discard those biomarkers, go back a step, select some other biomarkers based on our discovery dataset.
Limit the settings in which our biomarker can be used.
Think of this option as creating an instruction manual for the biomarkers we’re developing. If, for example, we find that our biomarkers are messy when measured on samples collected right after a meal, we can prescribe that users (clinicians, nurses, patients) only measure the biomarker >=2 hours after the patient’s last meal (as with blood glucose tests).
The confounding factors to consider are mentioned in the section above.
What are the potential confounding factors that we further need data on, so we can validate that they don’t confound much? This depends on:
Which factors our discovery dataset did not have a (good enough) grasp on.
What type of biomarkers we have selected, and what could reasonably affect them. This is a soft definition, and we’ll need to use prior data and common sense. FDA will too.
To illustrate: it is not likely that our bone density measure will be affected by time-to-previous-meal. It is likely that blood glucose, or many blood omics markers will be.
There are some things that we can safely assume our discovery dataset won’t have captured, and that a complex aging biomarker could reasonably be affected by, so I can start a preliminary list. These are the datasets that I think we will likely need to validate our discovered aging biomarkers:
Molecular biomarkers (of the modalities we’re interested in, i.e. those our discovered biomarker is based on) derived from samples collected
at different times-of-day (e.g. morning and evening) from the same patient. In this data, capture hours of sleep during the night preceding testing, and average daily hours of sleep.
at different times-of-month (e.g. every 5 days over the course of a month) from the same patient
at different times since the patient’s last meal (e.g. before and after consuming 75 grams of glucose)
during, shortly after, and after full recovery from an illness/infectious disease
We need all of these from a handful (think: 5-40) of patients.
In addition, I would want data on samples collected within the same sampling occasion but handled differently. Most of these data points are often provided already - i.e. the molecular assay has already been validated by the maker of the molecular test - so we may not need to collect new data on them.
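Once such repeat samples and assays exist, the standard summary of robustness is test-retest reliability. Below is a minimal sketch, on hypothetical data, of computing an intraclass correlation (ICC) from variance components: the fraction of biomarker variance that is stable between-person signal rather than within-person noise.

```python
# Sketch: test-retest reliability (ICC) for a biomarker from a small
# repeat-sampling study. Data and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

repeats = pd.read_csv("robustness_study.csv")  # placeholder path

# Random-intercept model: biomarker = person-level mean + occasion noise.
fit = smf.mixedlm("biomarker ~ 1", repeats, groups=repeats["subject_id"]).fit()
between_var = float(fit.cov_re.iloc[0, 0])  # variance between people
within_var = fit.scale                      # occasion-to-occasion variance

icc = between_var / (between_var + within_var)
print(f"ICC = {icc:.2f}")  # near 1.0: stable trait; near 0: mostly noise
```

A low ICC under a given handling or patient-state condition is exactly the trigger for the two options above: discard the biomarker, or constrain its instruction manual.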
4.3 Interventional data
Let’s say we discovered sensitive biomarkers using our discovery dataset, and then did the important work of validating that they likely measure what we think they measure, rather than some random confounder. Next, we need to validate that our biomarkers are clinically relevant, not just robust. For this, our biomarker - if we want to use it as a Phase 2 or Phase 3 trial endpoint - needs to respond to modulation quicker than other options, and needs to predict future health. (See the ‘Clinical relevance’ section in the table above for more nuance.)
To illustrate this point: someone’s body height changes with age (in certain age ranges) and is quite robust to meals etc., but it won’t make a good aging biomarker, because even if we had a treatment that slowed down aging significantly, it probably wouldn’t affect body height.
What data do we need to validate that changes in our biomarker precede changes in functional or morbidity-related outcomes when something (e.g. a therapy in a clinical trial) affects such outcomes? In other words, how do we know that our aging biomarker picks up on the therapeutic modulation of aging?
Unfortunately, to validate this, we need data from:
one or multiple clinical trials (or equivalent)
in which the investigated intervention has affected the gold-standard functional or morbidity/mortality related outcomes. In our case this basically means that the intervention has affected aging.
Given that we do not currently have an intervention that meaningfully modulates aging in humans, this may seem like a chicken-or-egg problem: we can’t validate an aging biomarker before we have an aging therapeutic that works. And it’s much harder, longer, and more expensive to validate an aging intervention before we have an aging biomarker that works and that we have validated.
This is not a paradox I (or I think anyone) will be able to provide a smart solution to in the next sentence. It’s actually a problem.
The way to solve it is to first trial one, or most likely multiple, aging interventions against endpoints that are not innovative, but that measure things we know for sure matter. See our previous work on multimorbidity prevention trial designs; I believe the design proposed there is close to how the first interventions against aging will be trialed. In these first trials, we would then measure the aging biomarkers we have discovered and robustness-validated as secondary endpoints. If the biomarkers work, within a couple of (successful!) trials we will rack up sufficient evidence to start the regulatory conversation about accepting them as endpoints in their own right.
The first companies running aging trials will have a harder job. In exchange, they get a large financial reward if they succeed, which I think makes it worth trying.
This chicken-or-egg problem - needing a viable intervention before you can have better biomarkers - is not unique to aging. Other indications went through the same issues, and had to follow the same roadmap. Drugs were approved for hypercholesterolemia before LDL became a primary endpoint. Diabetes therapies were approved before we started using HbA1c as a primary endpoint. More recently, we’ve seen bone density become a primary endpoint, again only after many osteoporosis therapies were approved.
5. Does this data exist? Is this data being created? What are the data gaps we need to fill to develop aging biomarkers that work?
Maybe we already have all, or some, of this data. Let’s first look at what data some of the more prominent or meaningful current aging biomarkers have been developed on.
5.1 What data were existing aging biomarker development attempts based on?
Aging biomarker | Datasets/cohorts used in the development |
---|---|
DNAm PhenoAge | NHANES III (training), InCHIANTI study (training), Women's Health Initiative, Framingham Heart Study Offspring (FHS), Normative Aging Study (NAS), Jackson Heart Study (JHS) |
GrimAge | Framingham Heart Study Offspring (training), InCHIANTI study, Women's Health Initiative, Normative Aging Study, Jackson Heart Study (JHS), Baltimore Longitudinal Study of Aging (BLSA), Lothian Birth Cohort (LBC) 1921 and 1936 |
DunedinPACE | Dunedin Study 1972–73 birth cohort (training), Understanding Society Study, Normative Aging Study (NAS), Framingham Heart Study Offspring (FHS), Environmental-Risk (E-Risk) Longitudinal Twin Study |
Oh et al. 2023 organ aging plasma proteomics clock | Knight Alzheimer’s Disease Research Center cohort (KADRC) (training), Covance Study of Lifetime Health, LongGenity, Stanford Aging and Memory Study (SAMS), Stanford Alzheimer's Research and Disease Center (SADRC) |
Argentieri et al. 2024 plasma proteomics clock | UK Biobank Pharma Proteomics Project (UKBB PPP), China Kadoorie Biobank (CKB) |
Fried Frailty Phenotype | Cardiovascular Health Study (CHS), Women’s Health and Aging Studies (WHAS) |
Rockwood Accumulation of Deficits Frailty Index | Canadian Study of Health and Aging (CSHA), National Population Health Survey (NPHS) |
FRAIL Scale | African American Health Study (AAH), Healthy Aging in Neighborhoods of Diversity across the Life Span (HANDLS) |
Clinical Frailty Scale (CFS) | Canadian Study of Health and Aging (CSHA), UK Hospital Admissions Cohort |
WHO Intrinsic Capacity Index | English Longitudinal Study of Ageing, China Health and Retirement Longitudinal Study (CHARLS), Integrated Care for Older People (ICOPE) initiatives |
Next, let’s look at some of the main properties of these datasets.
To keep things clear, I’m going to exclude from the table some of the discovery cohorts that these biomarker discovery efforts have used but that are not actually very good datasets for aging biomarker discovery, or that are very similar to others in the table.
To avoid the assumption that existing aging biomarker discovery efforts have been using the optimal discovery cohorts, I’m going to add some additional cohorts from my company’s database that I think would be among the highest-value cohorts for aging biomarker discovery.
To fit here, I’m limiting the columns to an overview of the properties most relevant to aging biomarker development.
Color codes in the table are my subjective categorization of whether a cohort is likely sufficient (green) or almost certainly insufficient (red) in a particular aspect for developing clinically viable aging biomarkers.
Note that the table reflects the current state of the cohorts, which in some cases can change going forward. If, for example, the availability of omics data on a cohort is limited but samples are available, omics data can be generated.
Cohort | n (total) | Study population | Year of initial sampling | Age range | Multiple samples per patient? How many? | Available omics data | Available functional and clinical data | Available samples | Data to verify biomarker robustness (as defined in section 4.2) | Interventional data (as defined in section 4.3) |
---|---|---|---|---|---|---|---|---|---|---|
Cohorts used in previous aging biomarker studies | ||||||||||
FHS Offspring | 5,124 | US, Framingham. Offspring of the initial cohort, which was a random sample | 1971 | 5–70 | Yes. Active follow-up with scheduled in-person exams every 4–8 years; nested studies common. | Limited proteomics (some mass-spec proteome), metabolomics (at multiple time points), microbiome, WGS. DNA methylation data on 2× repeat samples from 1,202 patients. | Very good. Physical exams, imaging, clinical outcomes. | Blood (DNA, serum, plasma), urine, PBMCs archived at NHLBI. | ~no | ~no |
InCHIANTI | 1,453 | Tuscany, Italy | 1998–2000 | 21–102 | Yes, approx. every 3 years. Some dropouts: e.g. the 8th-year follow-up was attended by ~900 patients, later ones by ~600 each. | SNP array; DNA methylation on 2× repeat samples from 499 patients; whole-blood transcriptomics, metabolomics, 93 circulating proteins | Good. Tests of mobility and frailty. Cognitive tests. Clinical records | Blood, urine | ~no | ~no |
Jackson Heart Study | 5,200 | US, Jackson metropolitan area; community-based cohort of African Americans | 2000–2004 | 21–84 | 2 follow-ups after baseline, over 16 years, with good retention. | Genotype array on all, some WGS (on 3,418); panels of cardiovascular-related proteins, some metabolomics, DNA methylation at a single time point | Very good. Physical function/frailty and cognitive assessments | Blood (DNA, serum, plasma), urine. PBMCs in some cases. | ~no | ~no |
LonGenity | 2,140 (650 centenarians and ~1,500 offspring) | US, Ashkenazi Jews | 2008 | 95+, and offspring 50–80 | Yes for some, 60% of cohort provided multiple samples, 5–7 years from baseline | WGS, DNA meth, SomaLogic proteomics on multiple time points. | Very good. Cognitive, physical, ADL, medical history and longitudinal tracking | DNA samples from blood, serum/plasma, and peripheral blood mononuclear cells stored | ~no | ~no |
BLSA | 3,400 | US, Baltimore/DC area, community-dwelling adults | 1958 | 20–96 | Yes, very good. Under-60s are seen every 4 years, 60–79-year-olds every 2 years, and ages 80+ annually | SNP arrays only, targeted metabolomics, DNA methylation | Very good. Physical function assessments, cognitive tests, imaging | Exceptional. Blood, urine, muscle tissue, saliva, CSF, serum, post-mortem tissue | ~no | ~no |
K-ADRC | 5,200 | Alzheimer’s, MCI, and control patients from Missouri, US | 1980 | 45–64 and 65+ | Yes, good. Typically 5–8-year follow-up for patients, and every 1–3 years for the 45–64 population; some missing data points. Most repeat samplings are regularly just in the 100s. | Genomic arrays, WGS, epigenomics; for patient subsets: WES, CytoScan, transcriptomics, SomaScan | Good, because of frequent clinical visits. Advanced cognitive testing; more limited in physical-frailty-related measures. | Exceptional. Brain, whole blood, CSF, etc. from 400+ donors; dermal fibroblasts, iPSCs | ~no | ~no |
Covance | 1,028 | US, general population | ~2010 | 20–90 | No; a single blood sample at baseline only | Plasma proteomics (SomaScan) | Some. Clinical data; no cognitive or frailty measures | Plasma/serum/whole blood | ~no | ~no |
UKBB | 500,000 | UK gen pop. | 2006-2010 | 40–69 | Limited. Once 4–5 years after baseline for 20,000 patients. Imaging cohort 5–15 years after initial visit, 100,000 patients. | WES, WGS, Olink proteomics, metabolomics | Physical measurements at baseline, some follow-up surveys, but generally mainly just clinical and questionnaire data. Limiting for aging biomarker dev. | DNA, plasma, serum, saliva. In quite limited quantities. 6 plasma + 3 serum in 0.5 mL tubes | ~no | ~no |
CKB | 512,000 | China, 10 diverse regions | 2004–2008 | 30–79 | Only from 25,000 patients, once | SNP arrays for 100,000+ | Clinical record linkage ~only. Significant lifestyle + environmental data | Plasma, buffy coat | ~no | ~no |
CSHA | 10,263 | General population, multiple centers across Canada | 1991–92 | 65+ | Yes. 5,586 at 5-year follow-up, and 3,211 at 10-year follow-up | Basically nothing, some genotypes on a handful of loci | Cognitive screens, detailed clinical at visits. Activities of daily living. Physical exams. Frailty data. Lots of questionnaires. | DNA and plasma from 2,129 participants, not longitudinally | ~no | ~no |
WHAS | 1,438 | Female only, Baltimore, US | 1992 | 65+ | No samples collected at follow-ups | Basically nothing; some genotyping data | Extensive. Physical, cognitive, medical history, follow-up tracking | Sparse blood samples | ~no | ~no |
Cohorts not used in previous aging biomarker studies, but maybe should be | ||||||||||
HUNT | 120,000 total; repeat samples from 20k+ patients | General population of a Norwegian county | 1995–1997 (HUNT2) | 20+ | Yes, baseline + 2 follow-ups from a large number of patients. There have been multiple studies with overlapping populations: e.g. 26,000 participants took part in HUNT2 (1995–1997), HUNT3 (2008), and HUNT4 (2018–19) | Limited; mainly genotyping array data, from ~90,000 patients as of 2022. More omics on subsets. | OK. Linked to clinical records. Physical exams only when samples are collected. | Serum, plasma, DNA, urine, some RNA and cell lines | ~no | ~no |
Tromsø | 45,000 | General population of a Norwegian municipality | 1986 | 20+ | Yes, 1–6 studies per patient. Multiple population studies in the same geography; individuals often come back. Repeat-sample data 2–4× from 4,000–8,000 patients, ~7 years between samplings | Limited. Genotype arrays from most patients; WES on 2,000 participants | OK. Linked to clinical records; anamnesis and basic physical exams; behavioural/environmental questionnaires | Plasma/serum, DNA | ~no | ~no |
NSHDS VIP | 114,000 | General population of a Swedish county | 1995 | 40+ | Yes, up to 3 samples, on every 10th birthday; e.g. n = 33,189 with 2× samples | Limited. Genotyping arrays, yes; some DNA methylation; some proteomics (Olink, SomaLogic) on small subsets; some longitudinal proteomic data on T2DM, MI, stroke; lots of metabolomics data | Good. Physical exams, lifestyle/environmental questionnaires, linkage to clinical data. Missing robust frailty measures | Plasma/serum/buffy coat | ~no | ~no |
5.2 What are the limitations in the datasets used for aging biomarker discovery and development?
My hope is that the color coding in the table above highlights the main limitations that are consistent across the cohorts, but let’s list the few most important ones.
5.2.1 No robustness and no interventional data
To get it out of the way, let’s first note that all of these cohorts have very limited data for validating an aging biomarker’s robustness (section 4.2) or interventional relevance (section 4.3), in the sense defined in the sections above. This is not surprising, and is not terrible, as we’re going to rely on these cohorts as discovery datasets (described in section 4.1) - but it does confirm the need for additional data to validate these aspects of any future biomarker.
5.2.2 No time-series omics data on large cohorts
The main limitation I want to highlight is that there are no cohorts that combine large sample sizes with samples available from multiple time points per patient.
Looking at the table, we see that:
Some of the smaller cohorts that have been used for aging clock development have collected data and samples from the same participants at multiple time points. However:
All these cohorts are very limited in sample size.
Even though samples have been collected at multiple (2-6) time points, omics data is generally not available at all of those time points. Assays have generally been run only on samples from a single sampling occasion per patient. In some cases (FHS, InCHIANTI) there is DNA methylation data at 2 time points per patient, which the aging clocks have relied on. These time-series data points still cover just a subset of each cohort, and arguably 2 data points per patient are still limited in their ability to capture longitudinal changes within a person.
The large biobank cohorts, such as UKBB and CKB, have large sample sizes but haven’t consistently collected samples repeatedly from participants. In these 2 particular examples, a single follow-up sample was collected from a subset of participants, after a relatively arbitrary follow-up period.
These 2× repeat samples have generally not been assayed yet.
Importantly: proteomic analyses of all of UK Biobank’s plasma samples (500,000 patients total, 100,000 of whom have samples from 2 time points) are planned, and are currently ongoing for the first 300,000 samples (not including the repeat samples); data will start to be released in 2026. As of the information public in January 2025, funding for the second half of this project, which would include the 100,000 repeat samples, has not yet been committed.
Assuming UKBB PPP finds the funding for the proteomics assays on these repeat samples, the resulting data will enable a step-change in aging biomarker discovery from 2027-2028 onwards. The limitation that remains is that these repeat samples will still come from only 2 time points per patient, approximately 15 years apart. It seems likely that 2 time points per patient won’t capture the longitudinal changes associated with aging precisely.
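A toy simulation, with all parameters invented for illustration, shows one reason why 2 time points are limiting: when each person has only 2 measurements, a straight line fits them perfectly, so assay noise masquerades as a per-person ‘pace of aging’ and the data cannot even reveal how noisy it is. With more occasions, residuals around each person’s trend expose the noise directly.

```python
# Toy illustration: with 2 sampling occasions per person, per-person fits
# are exact, so assay noise is invisible; with 5 occasions, residuals
# around each person's trend line reveal it. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)
n, noise_sd = 1000, 0.5  # invented: participants, assay noise (biomarker units)

def per_person_fit(times):
    t = np.asarray(times, dtype=float)
    # True trajectories are flat: every apparent slope is pure assay noise.
    y = rng.normal(0.0, noise_sd, size=(n, t.size))
    intercepts, slopes = np.polynomial.polynomial.polyfit(t, y.T, 1)
    resid = y - (intercepts[:, None] + slopes[:, None] * t)
    return slopes, resid

for label, times in [("2 occasions", [0, 15]),
                     ("5 occasions", [0, 3.75, 7.5, 11.25, 15])]:
    slopes, resid = per_person_fit(times)
    print(f"{label}: spurious slope s.d. = {slopes.std():.3f}, "
          f"per-person residual s.d. = {resid.std():.3f}")
```

With 2 occasions the residuals are exactly zero, so the spurious slopes look like clean individual signal; with 5 occasions the nonzero residual scatter flags how much of any apparent change is noise.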
5.2.3 Limited longitudinal data on younger individuals
This is perhaps a lower-priority limitation that we will need to deal with longer term, but it is worth a small note that all of the large cohorts currently lack data on healthy younger individuals. (Disease prevalence is much lower in younger people -> data on younger people is much less valuable for most disease-focused (vs aging-focused) drug discovery -> most cohorts skip collecting samples and data from younger individuals.) Many believe that biological aging is ongoing in young and middle-aged people as well. If at some point we want to understand, measure, and modulate the biological aging process at those earlier ages, we’re going to need much better datasets on individuals in their 20s to 40s.
6. The datasets needed to resolve bottlenecks in aging biomarker development
To recap quickly, so far in this piece we have:
Established that developing new aging biomarkers that can actually be useful for drug development is important
Thought through the requirements that an aging biomarker needs to fulfill for different use cases
Discussed the steps and data needed to be able to develop and validate biomarkers that can fulfill these requirements
Explored the currently available data landscape that is used or could be used for aging biomarker development, and the gaps that exist in the available data.
Based on these data gaps, I’ll now take the liberty to propose three separate projects that would fill these gaps and thereby remove an important and likely un-bypassable bottleneck from developing viable aging biomarkers.
These projects/datasets would be feasible to create today (unlike, perhaps, the interventional data needed, given the lack of a good treatment targeting aging).
More work on designing details of these projects, and confirming their utility is necessary. Stay tuned, or message us (Norn Group) to help (please!) or learn more.
Project-A: Create the ultimate discovery dataset for aging biomarker development
- time-series longitudinal data from a large number of patients, to fill the data gap that existing cohorts leave
Objective
Create a discovery dataset that complements and resolves the limitations of the best datasets that are used or could be used for aging biomarker discovery today or in the near future, in particular the UKBB PPP dataset (Olink proteomics on 500,000 patients, 1× repeat samples on 100,000 patients, 5-15y after baseline).
There are many generally important aspects of a good discovery dataset (see section 4.1). Most importantly among those, the dataset created by Project-A needs to have:
at least proteomics assays (Olink, or harmonizable with Olink, so the data can be used in unison with UKB PPP data)
on samples collected in multiple (as many as possible, more than 2 which is current gold standard) sampling occasions per patient longitudinally
over a long period of time
From a large number of participants (as many as possible, current largest gen pop time-series omics datasets are in the low thousands)
From the general population (vs specific to a particular disease area)
With rich phenotypic, environmental and disease-state data, also collected longitudinally
How, next steps to take
Creating a longitudinal dataset from scratch would take a long time - which in the long run would likely also be worthwhile - but it’s more sensible to rely on cohorts/biobanks/sample collections that already exist and would allow for the creation of a dataset that most closely resembles the ideal discovery dataset described above.
That is, the rough steps to take on this project, that we have started on and are continuing with:
Conclusively map out longitudinal cohort studies that have samples from multiple time points per patient, over a long period of time
Select the cohort or cohorts to partner with, negotiate access to stored samples
Sponsor the required assays on the selected samples, structure the resulting data into a usable and accessible dataset
Impact
The project fills the data gap that will remain, and will bottleneck aging biomarker discovery, even after the large ongoing projects planned for the coming years are completed. Together with other datasets, the dataset created in this project would form the best possible resource for discovering aging biomarkers that have clinical relevance and are ready to be progressed through later validation steps.
Project-B: Create the necessary robustness dataset(s) to eliminate confounders inherent in aging biomarkers
Objective
Create the data needed to validate and refine the robustness of aging biomarkers against the potential confounders most likely to affect a biomarker developed on the available discovery datasets.
Even the best discovery datasets won’t capture data on all potential confounders that might affect a complex aging biomarker. It makes sense to collect data on these potential confounders separately, so that their effect can be investigated and controlled for.
How, next steps to take
Our preliminary cohort-mapping work suggests that much of this robustness data - or the samples from which it could be created - does not exist today. This means that to create these datasets, we need to start from scratch, by collecting samples from patients in the right way.
The steps we have started taking and need to continue with to create these robustness datasets:
Complete the search for potential sample collections that could be utilised for creating this robustness data
Confirm and prioritise the exact confounders to be captured, complete power calculations (see the sketch after this list), and write the study protocol
Run small observational clinical studies on healthy volunteers, in which we (preliminary list):
Collect repeat samples from the same participants at different times of day
Collect repeat samples from participants at different times of month
Collect samples before/after certain behavioural confounders (exercise, nourishment, acute illness)
Biobank the collected samples for future analyses; run the selected assays on part of the samples
Make resulting data accessible for aging biomarker research
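For the power-calculation step referenced above, here is a minimal sketch of sizing one of these small paired studies (e.g. morning-vs-evening sampling); the effect size, alpha, and power targets are illustrative assumptions, not a recommended protocol.

```python
# Sketch: sizing a paired morning-vs-evening sampling study. The effect
# size, alpha, and power targets are illustrative assumptions.
from statsmodels.stats.power import TTestPower

# Smallest time-of-day effect we care to detect, expressed in units of
# the s.d. of within-person paired differences (Cohen's d, paired test).
effect_size = 0.75

n = TTestPower().solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
print(f"~{n:.0f} volunteers needed for the paired time-of-day comparison")
```

Under these assumptions the answer lands around 16 volunteers, within the 5-40-patient range suggested in section 4.2; smaller effects of interest push the required n up quickly.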
Impact
The project fills a crucial data gap, creating the bridge between discovery datasets and real-world clinical applications of any potential biomarkers. Without these robustness datasets, it’s likely that discovered biomarkers will be hindered by unknown confounders, effectively prohibiting their application in drug development - despite the large amount of resources going into large discovery cohorts.
Project-C: Recontact participants of an existing longitudinal cohort for further sampling and phenotypic data collection
Objective
Create a discovery cohort with multiple sampling occasions and rich functional and disease data on participants. This is an alternative and/or complementary approach to Project-A, resolving key limitations of existing discovery datasets. Specifically, it fulfils the need for:
Multiple sampling occasions per participant
Deep and precise functional (mobility, cognitive) and clinical data on each participant
Broad and appropriate consent from the patients, so that the molecular data created can be used widely by researchers
How, next steps to take
This project relies on identifying existing cohorts where:
One or multiple samples have been collected and biobanked in the past,
No repeat samples have been collected in the recent past, and/or no good functional data is available on the participants, and/or there is no appropriate consent from participants,
Recontacting participants is feasible, both ethically and practically,
The cohort is sufficiently large to yield a large discovery cohort, even accounting for the fact that not all recontacted participants will take part in the follow-up sampling of Project-C.
The high level steps of this project are to:
Identify the most appropriate cohort, and confirm the feasibility of recontacting participants
Design protocol and secure ethical and other approvals
Recontact participants for further sampling and data collection
Run assays on collected samples and structure into a shareable database for research
Impact
The project fills the data gap that the discovery cohorts that currently exist (or are being created) leave. It does so by creating a dataset with multiple sampling occasions per participant, with appropriate metadata, and with permissions from patients for the wide use of that data. Together with other datasets, the dataset created in this project would form the best possible resource for discovering aging biomarkers that have clinical relevance and are ready to be progressed through later validation steps.
This piece was produced as a Talent Bridge Award Project.
Published April 2025. Last updated April 2025.
© Norn Group 2025