International Journal of Applied and Basic Medical Research

Year : 2014  |  Volume : 4  |  Issue : 3  |  Page : 6--7

Missing data in clinical trials: Pitfalls and remedies

Sandeep Kaushal 
 Department of Pharmacology, Dayanand Medical College and Hospital, Ludhiana, Punjab, India

Correspondence Address:
Sandeep Kaushal
Department of Pharmacology, Dayanand Medical College and Hospital, Ludhiana - 141 001, Punjab

How to cite this article:
Kaushal S. Missing data in clinical trials: Pitfalls and remedies. Int J App Basic Med Res 2014;4:6-7


Clinical trials are a robust method of generating data to test a hypothesis. An important issue after the enrolment of study subjects in a clinical trial is attrition and the resulting missing data. Data may go missing for many reasons: the duration of the trial (the longer the trial, the greater the risk of missing data), outcomes that are difficult to assess, the type of intervention (surgical or medical), poor adherence to the study protocol (e.g. in psychiatric disorders), poor communication with study subjects (inadequate explanation of the study, the procedures to be followed and the follow-up schedule, or poor response to patient queries), and poor interpersonal relations with study subjects.

The missing data can be categorized into three types [1] as described below:

Missing completely at random (MCAR): There are no systematic differences between the missing values and the observed values, e.g. blood sugar values are missing because the glucometer was out of order for a period of time or because the study subject moved to another city.

Missing at random (MAR): Any systematic difference between the missing values and the observed values can be explained by differences in the observed data, e.g. missing blood sugar values may be lower than the measured values only because younger subjects in the study groups, who tend to have lower values, are more likely to have missing values.

Missing not at random (MNAR): Even after differences in the observed data are taken into account, systematic differences remain between the missing values and the observed values, e.g. study subjects with low blood sugar values are more likely to miss appointments because they are recovering from hypoglycemic episodes.
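The three mechanisms can be illustrated with a small simulation. The sketch below is only an illustration under assumed numbers: the age range, the linear relation between age and blood sugar, the cut-offs, and the missingness probabilities are all hypothetical and are not taken from any study. It compares the mean of the values that remain observed with the mean of the complete data, overall and within the younger stratum.

```python
import random
import statistics


def simulate_mechanisms(n=20000, seed=0):
    """Return the full data and the rows left observed under each mechanism."""
    rng = random.Random(seed)
    full, mcar, mar, mnar = [], [], [], []
    for _ in range(n):
        # Hypothetical data: younger subjects tend to have lower blood sugar.
        age = rng.uniform(20, 70)
        sugar = 80 + 0.8 * age + rng.gauss(0, 10)
        full.append((age, sugar))
        # MCAR: every value has the same 30% chance of being missing.
        if rng.random() > 0.3:
            mcar.append((age, sugar))
        # MAR: missingness depends only on age, which is always observed.
        if rng.random() > (0.5 if age < 40 else 0.1):
            mar.append((age, sugar))
        # MNAR: missingness depends on the blood sugar value itself.
        if rng.random() > (0.5 if sugar < 110 else 0.1):
            mnar.append((age, sugar))
    return {"full": full, "MCAR": mcar, "MAR": mar, "MNAR": mnar}


def mean_sugar(rows, young_only=False):
    """Mean blood sugar, optionally restricted to subjects younger than 40."""
    return statistics.fmean(s for a, s in rows if a < 40 or not young_only)


data = simulate_mechanisms()
for name in ("full", "MCAR", "MAR", "MNAR"):
    rows = data[name]
    print(name, round(mean_sugar(rows), 1), round(mean_sugar(rows, True), 1))
```

In this sketch the overall observed mean is biased under both MAR and MNAR, but under MAR the bias disappears once the comparison is made within an age stratum, whereas under MNAR it persists. This is why MAR data can be handled with methods that use the observed covariates, while MNAR data cannot.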

Missing data compromise the dataset: they introduce bias, can invalidate the results and conclusions, complicate the statistical analysis and may lead to rejection by regulatory authorities because of protocol deviations. However, simply excluding patients with missing data reduces the power of the study. Moreover, patients with missing values are often the ones with extreme values (treatment failures, toxicity, or very good responders), so excluding them leads to underestimation of variability and hence an artificially narrow confidence interval. [2] How missing data will be handled must be decided at the beginning of the study. Data that are MCAR or MAR can be treated as ignorable, but MNAR data must be examined critically, as they may give insight and food for thought for future studies or reveal serious lacunae in the study design.
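The effect of excluding extreme responders can be sketched numerically. The example below assumes standardized responses and a hypothetical dropout rule (90% of subjects more than 1.5 standard deviations from the mean are lost); both are illustrative assumptions, not values from the cited guideline.

```python
import random
import statistics


def ci_half_width(values, z=1.96):
    """Half-width of an approximate 95% confidence interval for the mean."""
    return z * statistics.stdev(values) / len(values) ** 0.5


rng = random.Random(1)
responses = [rng.gauss(0, 1) for _ in range(5000)]

# Assumed mechanism: extreme responders (beyond 1.5 SD) stay in the
# study with probability 0.1 only; everyone else completes it.
completers = [r for r in responses if abs(r) <= 1.5 or rng.random() > 0.9]

print("n   :", len(responses), len(completers))
print("SD  :", round(statistics.stdev(responses), 3),
      round(statistics.stdev(completers), 3))
print("CI +/-:", round(ci_half_width(responses), 4),
      round(ci_half_width(completers), 4))
```

Under these assumptions the complete-case standard deviation shrinks by more than the loss of sample size widens the interval, so the confidence interval looks more precise than the full data would justify.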

There are many methods of handling missing data, but each has its own demerits. Earlier approaches include intention-to-treat analysis (specified a priori), omitting participants with missing values (complete case analysis), the missing indicator method (creating a dummy variable, e.g. zero, as an indicator for missing data), and mean substitution (replacing a missing value with the overall mean or a subgroup mean of the observed values). [3]
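One demerit of mean substitution can be shown with a short sketch. The data and the 30% missingness rate below are hypothetical, and the values are made missing completely at random so that any distortion comes from the method itself rather than from the missingness mechanism.

```python
import random
import statistics

rng = random.Random(2)
values = [rng.gauss(100, 15) for _ in range(5000)]

# Assumed MCAR mechanism: each value is missing with probability 0.3.
observed = [v if rng.random() > 0.3 else None for v in values]
obs_only = [v for v in observed if v is not None]

# Mean substitution: every missing value is replaced by the observed mean.
fill = statistics.fmean(obs_only)
mean_filled = [fill if v is None else v for v in observed]

print("mean:", round(statistics.fmean(obs_only), 2),
      round(statistics.fmean(mean_filled), 2))
print("SD  :", round(statistics.stdev(obs_only), 2),
      round(statistics.stdev(mean_filled), 2))
```

The mean is unchanged, but the standard deviation of the filled-in data is smaller because the imputed values carry no spread. Standard errors computed from such a dataset are therefore too small.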

Some more robust methods are under development for this purpose. These include the maximum likelihood method (a large class of model-based procedures in which a model is defined for the variable with missing values and statistical inferences are based on maximum likelihood; it also yields standard errors), multiple imputation (creating multiple complete datasets by filling in values for the missing data, analysing each dataset separately and then combining the results into a single estimate by averaging the estimates across datasets), the fully Bayesian approach (filling in missing data by specifying distributions for all parameters as well as for the missing data), and weighted estimating equations (weighting the analysis to allow for missing data). [1],[4]
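The multiple imputation workflow described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular package: the data are simulated, the imputation model is a simple linear regression of blood sugar on age, the bootstrap step is one assumed way of reflecting uncertainty in that model, and the function names are hypothetical. The estimates are pooled with Rubin's rules, in which the pooled estimate is the average of the per-dataset estimates and the total variance adds the between-imputation variance to the average within-imputation variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def multiple_imputation_mean(xs, ys, m=20, seed=0):
    """Pooled mean of y and its standard error (None marks a missing y)."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit the imputation model on a bootstrap sample of observed cases.
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Fill each missing value with a prediction plus random noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys)]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Rubin's rules: average the estimates, then combine the variances.
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical MAR data: younger subjects have lower blood sugar and
# are more likely to have a missing measurement.
rng = random.Random(3)
ages = [rng.uniform(20, 70) for _ in range(2000)]
sugar = [80 + 0.8 * a + rng.gauss(0, 10) for a in ages]
sugar_missing = [None if rng.random() < (0.5 if a < 40 else 0.1) else s
                 for a, s in zip(ages, sugar)]

full_mean = statistics.fmean(sugar)
cc_mean = statistics.fmean(s for s in sugar_missing if s is not None)
pooled_mean, pooled_se = multiple_imputation_mean(ages, sugar_missing)
print(round(full_mean, 2), round(cc_mean, 2),
      round(pooled_mean, 2), round(pooled_se, 3))
```

In this simulated MAR setting the complete case mean is biased upwards, while the pooled estimate lies close to the mean of the full data because the imputation model uses the observed ages.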

The advantage of these newer methods is that they help to minimize the loss of precision and power. However, one must remember that even these highly effective methods may fail if the proportion of missing data is very large, if the number of observations is very small, or if the number of variables is large.


1. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, et al. Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. BMJ 2009;338:b2393.
2. European Medicines Agency. Guidelines on missing data in confirmatory clinical trials, 2010, July 2. Available from: [Last cited on 2014 Mar 14].
3. Guan NC, Yusoff MS. Missing values in data analysis: Ignore or impute? Educ Med J 2011;3:e6-11.
4. Ibrahim JG, Chu H, Chen MH. Missing data in clinical studies: Issues and methods. J Clin Oncol 2012;30:3297-303.