The first generation of wearable-based clinical data was a revelation — and a disappointment. Step counts, heart rate trends, and sleep proxies gave researchers a window into patients' daily lives, but the noise often drowned the signal. Today, the field is moving beyond wrist-worn sensors toward a more deliberate integration of real-world data (RWD) sources that combine digital, clinical, and environmental inputs. This guide is for clinical research teams, data scientists, and regulatory strategists who need to decide which RWD approach to adopt for their next study, and how to avoid the traps that make real-world evidence unreliable.
We will walk through the option landscape, the criteria for choosing among them, the trade-offs you must accept, and the implementation steps that separate useful data from expensive noise. By the end, you should be able to assess whether a given RWD strategy fits your trial's phase, endpoint sensitivity, and regulatory risk tolerance.
Who Must Choose — and Why the Window Is Narrow
The decision about which real-world data strategy to adopt is no longer academic. Multiple forces are compressing the timeline: regulators in the US and Europe have released frameworks that explicitly accept RWD for label expansions and post-market commitments; payers increasingly demand real-world effectiveness data before adding drugs to formularies; and patient advocacy groups push for evidence that reflects lived experience, not just clinic visits. Teams that wait for perfect data infrastructure risk falling behind in both speed and credibility.
This choice is most acute for three groups: sponsors planning a Phase IIIb/IV study where traditional endpoints are impractical; investigators running pragmatic trials in community settings; and biotech startups that cannot afford the overhead of a full traditional RCT but still need convincing evidence for partners. Each group faces a different mix of constraints — budget, timeline, data privacy, and endpoint sensitivity — but all need to act within the next 12 to 18 months to align with upcoming regulatory guidance updates.
The core question is not whether to use real-world data, but which kind, at what granularity, and with what validation. The decision framework we present here is built from patterns observed across dozens of recent trial designs and regulatory submissions, though we caution that each study's specific context will shift the weights of the criteria.
The Option Landscape: Three Distinct Approaches
Traditional Wearables with Clinical-Grade Sensors
The most familiar option remains dedicated study devices — often single-purpose, clinical-grade wearables that measure a specific biomarker such as continuous glucose, blood pressure, or actigraphy. These devices typically have better signal-to-noise ratios than consumer wearables and come with validated algorithms. The trade-off is cost: per-patient device and logistics costs can run into thousands of dollars, and patients may need to wear a secondary device alongside their personal smartwatch, which can affect adherence.
Passive Digital Phenotyping via Smartphone and Smartwatch APIs
An emerging alternative uses the sensors already in patients' personal devices — accelerometers, gyroscopes, GPS, screen interaction patterns — to infer health states without requiring patients to wear a dedicated study device. This approach dramatically reduces logistical burden and cost, but introduces variability across device models, operating systems, and user behavior. Researchers must invest heavily in data cleaning and normalization, and the regulatory acceptance of such inferred endpoints is still evolving. Some teams have succeeded with gait analysis for Parkinson's disease and keystroke dynamics for cognitive decline, but each new endpoint requires bespoke validation.
Structured RWD Integration from Electronic Health Records and Claims
A third path bypasses patient-worn sensors entirely, drawing instead from existing clinical data streams: EHRs, pharmacy claims, laboratory results, and patient registries. This approach offers large sample sizes and long follow-up at relatively low marginal cost, but the data are collected for clinical or billing purposes, not research. Confounding by indication, missing data, and inconsistent coding are persistent challenges. Recent advances in natural language processing and structured data extraction have improved usability, but the gap between real-world and trial-grade data remains significant for many endpoints.
These approaches can be combined (for example, using EHR data for baseline characterization and a wearable for a specific digital endpoint), but hybrid designs multiply complexity in data harmonization and statistical analysis.
Criteria for Choosing Among RWD Approaches
Signal Fidelity vs. Patient Burden
The most fundamental trade-off is between how precisely the approach measures the construct of interest and how much it asks of patients. A clinical-grade continuous glucose monitor provides high-fidelity glycemic data but requires a sensor insertion that some patients find uncomfortable. A smartphone-based typing test for cognition imposes very little burden but yields a noisier measure. Teams should place each candidate endpoint on a fidelity-burden grid and decide where their study can tolerate compromise.
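One way to make the fidelity-burden grid concrete is to score each candidate endpoint on both axes and read off its quadrant. The sketch below is illustrative only: the 1-to-5 scales, the cut-offs, and the scores assigned to the two example endpoints are assumptions, not validated ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    fidelity: int  # hypothetical scale: 1 (noisy) to 5 (close to a reference standard)
    burden: int    # hypothetical scale: 1 (passive) to 5 (invasive or time-consuming)


def grid_cell(candidate, fidelity_cut=3, burden_cut=3):
    """Return the quadrant of the fidelity-burden grid for one candidate endpoint."""
    fidelity = "high-fidelity" if candidate.fidelity >= fidelity_cut else "low-fidelity"
    burden = "high-burden" if candidate.burden >= burden_cut else "low-burden"
    return f"{fidelity} / {burden}"


# Scores below are assumptions chosen to mirror the two examples in the text.
candidates = [
    Candidate("CGM time-in-range", fidelity=5, burden=4),
    Candidate("Smartphone typing test", fidelity=2, burden=1),
]
grid = {c.name: grid_cell(c) for c in candidates}
```

In practice the scores would come from a structured discussion with clinicians and patient representatives rather than from a single analyst.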
Regulatory Acceptance Trajectory
Not all RWD sources are equally accepted by regulators for the same purpose. For primary efficacy endpoints in pivotal trials, FDA and EMA still generally prefer traditional clinical outcomes or well-validated digital endpoints with a clear evidence package. For secondary endpoints, exploratory analyses, or post-market commitments, acceptance is broader. Teams should review recent regulatory guidance documents and precedent decisions for their therapeutic area, and factor in the likelihood that a given RWD approach will require a qualification process or an additional validation study.
Data Quality and Missingness Mechanisms
Every RWD source has characteristic patterns of missing data. Wearables suffer from non-wear periods and device failures. EHR data may be missing for patients who switch providers or whose visits are not captured in the system. Passive phenotyping data can be lost when patients upgrade phones or disable permissions. Understanding whether data are missing completely at random (MCAR), missing at random conditional on observed data (MAR), or missing not at random (MNAR) is critical for choosing appropriate imputation or sensitivity analysis methods. Teams should plan for a missing-data analysis before data collection begins, not as an afterthought.
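A simple first check on the missingness mechanism is to compare the proportion of missing values across levels of an observed baseline variable. The sketch below assumes records are plain dictionaries with hypothetical field names (`severity`, `daily_steps`) and made-up values. A large difference between groups argues against data being missing completely at random; it cannot, on its own, distinguish missing-at-random from missing-not-at-random.

```python
def missing_rate_by_group(records, group_key, value_key):
    """Proportion of records with a missing (None) value, per level of group_key."""
    counts = {}
    for record in records:
        total, missing = counts.get(record[group_key], (0, 0))
        counts[record[group_key]] = (total + 1, missing + (record[value_key] is None))
    return {group: missing / total for group, (total, missing) in counts.items()}


# Illustrative data only; field names and values are assumptions.
records = [
    {"severity": "mild", "daily_steps": 6200},
    {"severity": "mild", "daily_steps": 5800},
    {"severity": "mild", "daily_steps": None},
    {"severity": "mild", "daily_steps": 7100},
    {"severity": "severe", "daily_steps": None},
    {"severity": "severe", "daily_steps": 2400},
    {"severity": "severe", "daily_steps": None},
    {"severity": "severe", "daily_steps": None},
]
rates = missing_rate_by_group(records, "severity", "daily_steps")
```

Here the severe group is missing far more often than the mild group, which is the pattern that should trigger a pre-planned sensitivity analysis.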
Scalability and Cost Per Patient
Scalability matters not just for the current trial but for potential follow-up studies. A smartphone-based approach can be deployed to thousands of patients in days at minimal cost. A clinical-grade wearable program requires device procurement, shipping, training, and return logistics that do not scale linearly. Teams should model total cost per patient including data processing, quality control, and regulatory consulting, not just device unit cost.
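The cost model can be sketched as fixed costs spread over enrolled patients plus per-patient costs. All cost categories and dollar figures below are hypothetical placeholders; the point is only that the total can differ sharply from the device unit cost and that it changes with enrollment.

```python
def total_cost_per_patient(fixed_costs, per_patient_costs, n_patients):
    """Fixed costs are shared across patients; per-patient costs are added in full."""
    return sum(fixed_costs.values()) / n_patients + sum(per_patient_costs.values())


# Hypothetical figures for a wearable study.
fixed = {"pipeline_setup": 60_000, "regulatory_consulting": 40_000}
per_patient = {"device": 200, "logistics": 150, "data_cleaning_and_qc": 600}

cost_at_500 = total_cost_per_patient(fixed, per_patient, n_patients=500)
cost_at_100 = total_cost_per_patient(fixed, per_patient, n_patients=100)
```

With these assumed inputs the all-in figure is several times the device price, and it rises further when the same fixed costs are spread over a smaller study.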
Trade-Offs in Practice: A Structured Comparison
| Criterion | Clinical-Grade Wearable | Passive Phenotyping | EHR/Claims Integration |
|---|---|---|---|
| Signal fidelity | High for targeted biomarker | Moderate to low; device-dependent | Variable; depends on data source and coding |
| Patient burden | Moderate to high | Low (passive collection) | None (retrospective) |
| Regulatory precedent | Strong for certain endpoints | Evolving; few precedents | Established for safety and some effectiveness |
| Data missingness risk | Moderate (non-wear, device failure) | High (permission changes, OS updates) | High (non-random, system-dependent) |
| Scalability | Low to moderate | High | Very high (if data access exists) |
| Cost per patient | $500–$5,000 | $10–$100 (data processing) | $50–$500 (data licensing + cleaning) |
The table above oversimplifies, but it highlights the dominant pattern: no single approach wins on all criteria. The best strategy for most studies is a hybrid that uses one primary RWD source for the primary endpoint and a second source for sensitivity or exploratory analyses. However, hybrids require careful statistical planning to control multiplicity and to ensure that the primary analysis is pre-specified.
One composite scenario: a Phase IV study of a diabetes drug uses continuous glucose monitors (clinical-grade wearables) for the primary endpoint of time-in-range, combined with EHR extraction for the secondary endpoints of HbA1c and hypoglycemia events. The wearable provides high-fidelity glucose data; the EHR provides standard-of-care comparators and long-term safety data. Missingness risk is moderate for the wearable and higher for the EHR, but the two sources are likely to be missing for different reasons (non-wear versus visits that the health system never captures), so a joint model with shared random effects can help address the incompleteness, provided its assumptions are tested in sensitivity analyses.
Implementation Path After the Choice Is Made
Step 1: Pre-Study Validation and Pilot Testing
Before deploying any RWD approach at scale, run a pilot with 10–20 participants to confirm that the data collection pipeline works end-to-end: device pairing, data transfer, de-identification, and quality metrics. For passive phenotyping, test on at least three different device models and operating system versions. For EHR integration, run a feasibility query to assess completeness of key variables in the target population.
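For the EHR feasibility query, the key output is the share of patients with a usable value for each variable the analysis depends on. A minimal sketch, assuming the query result has already been loaded as a list of dictionaries; the variable names and the 70% threshold are assumptions for illustration.

```python
def completeness(rows, variables):
    """Fraction of rows with a non-missing value for each variable."""
    n = len(rows)
    return {
        var: sum(1 for row in rows if row.get(var) not in (None, "")) / n
        for var in variables
    }


def below_threshold(completeness_by_var, minimum=0.7):
    """Variables whose completeness falls below the (assumed) minimum."""
    return sorted(var for var, frac in completeness_by_var.items() if frac < minimum)


# Illustrative extract; a missing key is treated the same as an empty value.
rows = [
    {"patient_id": "A", "hba1c": 7.2, "egfr": 88},
    {"patient_id": "B", "hba1c": None, "egfr": 64},
    {"patient_id": "C", "hba1c": 8.1, "egfr": None},
    {"patient_id": "D", "hba1c": 6.9},
]
report = completeness(rows, ["hba1c", "egfr"])
flagged = below_threshold(report)
```

Running the same report by site or by calendar year can reveal whether gaps are concentrated in a few sources.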
Step 2: Write a Pre-Specified Statistical Analysis Plan
Real-world data tempts researchers to explore and then report the most favorable analysis. To avoid this, pre-specify the primary estimand, the handling of missing data, the covariates, and the sensitivity analyses. Include a section on how device non-adherence or data gaps will be defined and addressed. This plan should be finalized before any outcome data are accessed.
Step 3: Build a Data Quality Dashboard
During the study, monitor data quality in near-real-time. Track device wear time per participant, data completeness by day, and outlier values that may indicate sensor malfunction. For EHR data, monitor the rate of missing visits and the consistency of coding across sites. A dashboard allows the team to intervene early — for example, by calling a participant who has not worn the device for 48 hours, or by querying a site that has not uploaded any records.
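The 48-hour follow-up rule can be expressed as a small check that a dashboard job might run on a schedule. This sketch assumes the pipeline already records the timestamp of each participant's most recent valid wear data; the participant IDs and timestamps are invented.

```python
from datetime import datetime, timedelta


def flag_inactive(last_seen, now, max_gap=timedelta(hours=48)):
    """Return IDs of participants whose latest data is older than max_gap."""
    return sorted(pid for pid, ts in last_seen.items() if now - ts > max_gap)


# Hypothetical snapshot of the most recent valid wear data per participant.
now = datetime(2024, 5, 10, 12, 0)
last_seen = {
    "P001": datetime(2024, 5, 10, 8, 0),   # 4 hours ago
    "P002": datetime(2024, 5, 7, 9, 0),    # 75 hours ago
    "P003": datetime(2024, 5, 8, 12, 0),   # exactly 48 hours ago, not yet flagged
}
to_contact = flag_inactive(last_seen, now)
```

The threshold and the definition of "valid wear" should match whatever the statistical analysis plan uses to define non-adherence.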
Step 4: Plan for Regulatory Interaction
If the RWD approach will be used for a regulatory submission, engage with the relevant agency early. Present your device validation data, your missing-data strategy, and your pre-specified analysis plan. Many regulators offer qualification advice for novel digital endpoints. Even if you do not seek formal qualification, documenting the interaction timeline strengthens your submission.
Risks of Choosing Wrong or Skipping Steps
Data That Cannot Be Interpreted
The most common failure is collecting large volumes of data that cannot be converted into a reliable endpoint. A team that deploys passive phenotyping without validating its algorithm against a gold standard may end up with activity measures that correlate poorly with functional status. The data look clean and the sample size is large, but the regulatory reviewer sees noise and rejects the evidence. This risk is highest when the endpoint is novel and the validation study was underpowered.
Missing Data That Biases Results
Missing data are not just a nuisance; they can systematically bias treatment comparisons if the missingness is related to the outcome. For example, patients who feel worse may wear a device less, so the observed data overrepresent good days. If poor responders in the treated group are the ones who stop contributing data, the apparent improvement in that group is overestimated. Without careful sensitivity analyses (e.g., pattern-mixture models or tipping-point analyses), the trial's conclusions may be invalid.
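A tipping-point analysis asks how much worse the unobserved treated patients would have to be before the conclusion changes. The sketch below is deliberately simplified and rests on stated assumptions: missing treated outcomes are filled with the observed treated mean minus a penalty `delta` (single imputation, which understates uncertainty), and significance is judged with a normal-approximation two-sample test. A real analysis would typically use multiple imputation and the trial's pre-specified model. The data are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def two_sided_p(a, b):
    """Normal-approximation p-value for a difference in means (unequal variances)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def tipping_point(treated_obs, n_treated_missing, control, deltas, alpha=0.05):
    """Smallest delta at which the treated arm is no longer significantly better."""
    base = mean(treated_obs)
    for delta in deltas:
        # Assumption: every missing treated value equals the observed mean minus delta.
        treated = treated_obs + [base - delta] * n_treated_missing
        if mean(treated) <= mean(control) or two_sided_p(treated, control) >= alpha:
            return delta
    return None  # conclusion never tips within the deltas examined


# Hypothetical outcome scores (higher is better); four treated values are missing.
treated_obs = [12, 14, 13, 15, 16, 14, 13, 15]
control = [10, 11, 9, 12, 10, 11, 10, 9, 11, 10]
tp = tipping_point(treated_obs, 4, control, deltas=range(0, 21))
```

The clinical question is then whether a penalty of that size is plausible for the patients who stopped contributing data.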
Regulatory Rejection Due to Lack of Precedent
Even a well-executed RWD study may face rejection if the agency has not previously accepted that endpoint or data source for the specific indication. Sponsors who assume that because RWD is accepted for one disease it will be accepted for another often face delays or additional data requests. The risk is mitigated by early dialogue and by including a traditional secondary endpoint as a backup.
Cost Overruns from Poor Planning
Underestimating the cost of data cleaning, quality control, and regulatory consulting can blow a budget. A wearable study that budgets $200 per patient for devices but nothing for data management may find that cleaning and harmonizing the raw sensor data costs three times that amount. Similarly, EHR data that appears cheap to license may require extensive mapping and de-duplication before analysis.
This guide does not constitute professional advice. Readers should consult qualified experts for decisions specific to their trial and regulatory context.
Frequently Asked Questions
How do I know if my RWD source is acceptable to regulators?
Review recent FDA and EMA guidance documents for your therapeutic area, and search for precedent decisions on digital endpoints. If your endpoint is novel, request a qualification meeting or submit a study proposal under the agency's RWD framework. Many agencies provide written feedback within roughly 60–90 days, but timelines vary by agency and procedure, so confirm the expected turnaround when you submit.
What sample size do I need for an RWD-based endpoint?
Sample size depends on the effect size, variability, and the acceptable missing-data rate. Because RWD often has higher variability and more missingness than traditional trial data, sample sizes typically need to be 20–50% larger. Conduct a simulation study that incorporates realistic missing-data patterns to determine the required N.
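A simulation of this kind can be sketched in a few lines. The version below makes strong simplifying assumptions that should be replaced in a real study: outcomes are normally distributed, values are dropped completely at random with the same probability in both arms, and the analysis is a normal-approximation two-sample test on complete cases. All parameter values are placeholders.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def simulate_power(n_per_arm, effect, sd, p_missing, n_sims=500, alpha=0.05, seed=0):
    """Estimate power by simulating trials with randomly missing outcomes."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        arms = []
        for shift in (0.0, effect):  # control arm, then treated arm
            values = [rng.gauss(shift, sd) for _ in range(n_per_arm)]
            # Assumption: missing completely at random. A realistic simulation
            # would let the drop probability depend on arm, covariates, or outcome.
            arms.append([v for v in values if rng.random() >= p_missing])
        control, treated = arms
        if len(control) < 2 or len(treated) < 2:
            continue  # too few observations to test; counted as a non-rejection
        se = sqrt(stdev(control) ** 2 / len(control) + stdev(treated) ** 2 / len(treated))
        if abs(mean(treated) - mean(control)) / se > z_crit:
            hits += 1
    return hits / n_sims


# Placeholder inputs: increase n_per_arm until the estimate reaches the target power.
power_estimate = simulate_power(n_per_arm=120, effect=0.4, sd=1.0, p_missing=0.25)
```

Repeating the call over a range of `n_per_arm` values, with the missingness mechanism swapped for one that reflects the pilot data, gives the required N under more realistic conditions.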
Can I combine data from different wearables or EHR systems?
Yes, but only after careful harmonization. Each device or system may measure the same construct differently. For wearables, cross-calibration studies can establish conversion formulas. For EHR systems, mapping to a common data model (e.g., OMOP CDM) is standard practice. Document all harmonization steps in a data provenance report.
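For wearables, a conversion formula from a cross-calibration study is often a fitted line mapping one device's readings onto another's. The sketch below assumes paired readings from participants who wore both devices at the same time and assumes a linear relationship is adequate; neither assumption should be taken for granted, and the step counts are invented.

```python
def fit_linear_conversion(source_readings, reference_readings):
    """Ordinary least squares fit: reference ≈ slope * source + intercept."""
    n = len(source_readings)
    mean_x = sum(source_readings) / n
    mean_y = sum(reference_readings) / n
    sxx = sum((x - mean_x) ** 2 for x in source_readings)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(source_readings, reference_readings)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def convert(reading, slope, intercept):
    """Express a source-device reading on the reference device's scale."""
    return slope * reading + intercept


# Hypothetical paired daily step counts from a cross-calibration study.
device_b = [1000, 2000, 3000, 4000]
device_a = [1150, 2250, 3350, 4450]
slope, intercept = fit_linear_conversion(device_b, device_a)
```

The fitted coefficients, the calibration sample, and the range of values over which the fit holds belong in the data provenance report.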
How do I validate a digital endpoint derived from passive sensing?
Conduct a validation study against a gold-standard measure (e.g., clinic-based gait analysis for step count, or polysomnography for sleep). For binary classifications, report sensitivity and specificity; for continuous measures, report agreement metrics such as the intraclass correlation coefficient and Bland-Altman limits of agreement. The sample size for validation should be large enough to estimate these metrics with acceptable precision, typically at least 50–100 participants.
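As an illustration of one agreement metric, the sketch below computes ICC(2,1), the two-way random-effects, absolute-agreement, single-measurement form described by Shrout and Fleiss. Choosing this particular ICC form is an assumption; the appropriate form depends on the validation design. Each row is one participant and each column is one measurement method (for example, the digital measure and the reference standard); the values shown are invented.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a participants-by-methods table of measurements."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between methods
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical nightly sleep hours: [passive-sensing estimate, polysomnography].
paired_sleep_hours = [
    [6.1, 6.4],
    [7.3, 7.0],
    [5.2, 5.9],
    [8.0, 7.6],
    [6.8, 6.9],
]
agreement = icc_2_1(paired_sleep_hours)
```

A point estimate alone is not enough for a validation report; a confidence interval for the ICC should accompany it.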
What if my RWD study finds a result that contradicts the pivotal trial?
This is not uncommon, and it does not necessarily mean the RWD is wrong. Differences in population, setting, adherence, and outcome definitions can all contribute. Pre-specify a plan for reconciling discrepant findings, including a systematic review of potential confounders and a pre-planned meta-analysis if multiple RWD sources are available.
Recommendation Recap: A Decision Framework for Teams
After reviewing the options, criteria, and risks, here is a three-step framework for teams moving beyond wearables toward precision RWD:
First, define your primary estimand — exactly what you want to measure, in whom, and under what conditions. This forces you to be specific about the data source needed. If your estimand requires continuous glucose data, a clinical-grade wearable is likely necessary. If it requires all-cause hospitalization, EHR data may suffice.
Second, assess your regulatory risk appetite. If this study is for internal decision-making or exploratory analysis, you can afford more innovative, less-validated RWD sources. If it will support a label claim, stick with approaches that have a clear regulatory track record or invest in a qualification process.
Third, pilot, pre-specify, and monitor. No matter which approach you choose, validate it in a pilot, pre-specify your analysis plan, and monitor data quality throughout. These three disciplines separate evidence that persuades from data that wastes money.
The era of real-world clinical data is here, but precision does not come from the data source alone — it comes from deliberate design, honest acknowledgment of limitations, and rigorous execution. Teams that embrace that mindset will produce evidence that moves both science and regulation forward.