How fragile are Mediterranean diet interventions? A research-on-research study of randomised controlled trials
  1. Maria G Grammatikopoulou1,2,3,
  2. Meletios P Nigdelis1,
  3. Xenophon Theodoridis2,4,
  4. Konstantinos Gkiouras2,4,
  5. Antigoni Tranidou4,5,
  6. Theodora Papamitsou6,
  7. Dimitrios P Bogdanos2,7 and
  8. Dimitrios G Goulis1
  1. 1 Unit of Reproductive Endocrinology, 1st Department of Obstetrics and Gynecology, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Central Macedonia, Greece
  2. 2 Rheumatology and Clinical Immunology, Faculty of Health Sciences, University of Thessaly, Larissa, Greece
  3. 3 Nutritional Sciences & Dietetics, Faculty of Health Sciences, International Hellenic University, Thessaloniki, Greece
  4. 4 Medical School, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Central Macedonia, Greece
  5. 5 Department of Endocrinology, Diabetes and Metabolism, Hippokration General Hospital of Thessaloniki, Thessaloniki, Central Macedonia, Greece
  6. 6 Laboratory of Histology and Embryology, Medical School, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Central Macedonia, Greece
  7. 7 Division of Transplantation, Immunology and Mucosal Biology, MRC Centre for Transplantation, School of Medical Education, King's College London, London, UK
  1. Correspondence to Dr Maria G Grammatikopoulou, Nutritional Sciences & Dietetics, International Hellenic University Faculty of Health Sciences, Alexander Campus, Thessaloniki, Greece; mariagram{at}; Professor Dimitrios G Goulis, Unit of Reproductive Endocrinology, 1st Department of Obstetrics and Gynecology, Aristotle University of Thessaloniki Faculty of Health Sciences, Thessaloniki, Greece; dgg{at}


Introduction The Mediterranean diet (MD) is a traditional regional dietary pattern and a healthy diet recommended for the primary and secondary prevention of various diseases and health conditions. Results from the highest level of primary evidence, namely randomised controlled trials (RCTs), are often used to produce dietary recommendations; however, the robustness of RCTs with MD interventions is unknown.

Methods A systematic search was conducted and all MD RCTs with dichotomous primary outcomes were extracted from PubMed. The fragility (FI) and the reverse fragility index (RFI) were calculated for the trials with significant and non-significant comparisons, respectively.

Results Out of 27 RCTs of parallel design, the majority failed to present a significant primary outcome, exhibiting an FI equal to 0. The median FI of the significant comparisons was 5, ranging between 1 and 39. More than half of the comparisons had an FI <5, indicating that the addition of 1–4 events to the treatment arm eliminated the statistical significance. For the comparisons with an FI=0, the RFI ranged between 1 and 29 (Median RFI: 7). When the included RCTs were stratified according to masking, the use of a composite primary endpoint, sample size, outcome category, or dietary adherence assessment method, no differences were exhibited in the FI and RFI between groups, except for the RFI among different compliance assessment methods.

Conclusions In essence, the present study shows that even in the top tiers of the evidence hierarchy, research on the MD may lack robustness, raising concerns for the formulation of nutrition recommendations.

  • nutritional treatment

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

What this paper adds

  • Recommendations for the adoption of the Mediterranean diet (MD) for improved health outcomes are based mainly on randomised controlled trials (RCTs) and their synthesis.

  • The robustness of RCTs with MD interventions appears to be low to moderate. Similar fragility indexes (FIs) and reverse fragility indexes (RFIs) have also been reported among RCTs in other therapeutic domains, including clinical nutrition, anesthesiology and perioperative medicine.

  • The FI and RFI can be used to improve and promote the science of nutrition.


Since Keys first presented a diet-mortality hypothesis explaining the Seven Countries study results in 1986,1 the Mediterranean diet (MD) has become a dietary pattern of particular interest. Research on the MD has spiralled,2 3 and the diet is reputed for health effects spanning from ameliorated cardiovascular disease (CVD) factors4–7 to improved pregnancy outcomes,8 ticking all the boxes in the quest for health attainment. For some, the MD is much more than a traditional regional dietary pattern, being regarded as the ‘unicorn’ of diet paradigms, with many clinical practice guidelines endorsing adherence to the MD.9 10

Despite its many ‘followers’, however, several scientists are also questioning the MD. Some highlight the observational design of the Seven Countries study,11 while others stress the limitations of nutritional epidemiology in general,12 which often involves selective reporting,13 inflated results,14 over-interpretation and skewed perspectives,15 with large flexibility in the performed analyses, which can be based on questionnaires of low reproducibility.16

Subsequently, research designs were improved to minimise bias,14 and the focus shifted to randomised controlled trials (RCTs), situated higher in the pyramid of evidence.17 The worm turned again when the biggest and most promising MD trial to date, the Prevención con Dieta Mediterránea (PREDIMED),18 19 raised concerns over randomisation bias, resulting in its reanalysis.20 Nutrition RCTs were once more in the spotlight, and scepticism was apparent,21 with researchers questioning the suitability of RCTs for nutrition research and the quality of the trials. Most trials tend to report positive findings21; however, statistical significance (P-value) does not ensure the robustness of an analysis, and calls for the use of more specific measures have been made.22–25

Today, clinical research continues to emphasise the P threshold of 0.05 when interpreting RCT results.26 For this reason, it is important to evaluate the robustness of RCTs with MD interventions and obtain an additional measure of their quality. Two indexes have been proposed for the evaluation of an RCT’s robustness,27 namely the fragility index (FI) and the reverse fragility index (RFI), for trials with significant or non-significant findings, respectively. Both indexes can only be calculated for studies with an RCT design and dichotomous primary outcomes.

To assess the robustness of RCTs with MD interventions, the present research-on-research study aimed to identify all RCTs with MD interventions and dichotomous primary outcomes, and calculate their FI or RFI, depending on the significance of the comparisons.


Research question and search strategy

The present study used a systematic search strategy to answer the question “What is the fragility and reverse fragility index of RCTs assessing MD interventions?” The PICO of the study’s hypothesis was P: human population of any age group or health status, I: MD intervention, C: any comparator other than the MD (a sham diet, another diet or no intervention), O: any dichotomous primary outcome (table 1). To answer this research question, the focus was set on all RCTs examining MD interventions, irrespective of their other characteristics. Similar studies examining the FI/RFI in broad research areas are common in the literature.26

Table 1

PICO strategy of the study’s research question

The protocol of the study was published at the Center for Open Science. A systematic search was conducted on PubMed from inception until 31 August 2019, using the keyword (Mediterranean diet) and the PubMed filter for clinical trials.

Inclusion and exclusion criteria

As the concept of fragility is only applicable to RCTs, only studies with an RCT design were considered eligible.28 In parallel, we searched for trials with dichotomous primary outcomes, as the FI and RFI cannot be calculated in trials with continuous outcomes. Secondary outcomes were not of concern as they are not accounted for when estimating the sample size required for an RCT and should not be used to assess a trial’s robustness.29 All RCTs with MD interventions were assessed for eligibility, despite other possible heterogeneities, as the research question focused on the FI and RFI of MD interventions in general and not in MD RCTs with more homogenous outcomes/samples/designs.

The criteria for inclusion in the present analyses involved (1) RCTs performed on humans, (2) of any age group, (3) irrespective of any medical diagnosis or health condition, (4) applying MD interventions, (5) compared with no intervention, a control diet, or dietary patterns other than the MD, (6) assessing any dichotomous primary outcome and (7) published in any language.

On the other hand, criteria for exclusion involved trials (1) lacking randomisation, (2) performed on animals, (3) with continuous primary outcomes or (4) with dichotomous outcomes reported only as secondary outcomes, (5) comparing MD interventions to control diets based on the MD, (6) not including an MD intervention, (7) not reporting the number of events and the sample size in each arm, making it impossible to construct 2×2 frequency tables, (8) failing to report adequate data to calculate person-years and (9) trial protocols without results.

Data extraction

Two researchers (MGG and XT) independently extracted data from the selected RCTs, aided by an additional pair of reviewers (MPN and KG) when deemed necessary. Extracted data involved details regarding the study design, the level of masking (open label/single/double), sample size, protocol registration details, study name/acronym, interventions and comparators, the primary outcomes, the event rates in each arm, the geographical origin of the trial, the randomisation methods used, the level of prevention (primary/secondary) and the methods used to assess intervention adherence. As far as time-to-event outcomes are concerned, extracted data involved the total number of events in each arm over the entire follow-up period of each trial.

Risk of bias

The risk of bias (RoB) of the selected RCTs was evaluated using the Cochrane RoB V.2.0 tool30 by two independent researchers (MPN and KG). Disagreements were resolved via discussion and whenever needed, through the intervention of more experienced researchers (DGG, MGG and DPB).

Calculation of the FI and RFI

The FI was developed as a measure of RCT robustness. It describes the minimum number of patients, within the group with the fewest events, whose status must change from a non-event to an event to turn a significant result into a non-significant one.27 It can be regarded as a measure of the number of events on which the statistical significance depends.22

For the current analysis, two researchers (XT and MGG) calculated the FI of each RCT, according to Walsh et al.27 In further detail, the number of events and non-events for each trial arm was extracted into 2×2 tables, and the number of additional events that had to be added to the group with the smaller number of events for the P-value of Fisher’s exact test to become ≥0.05 was calculated.

An FI equal to zero describes a highly fragile RCT, as zero participants are required to change from a non-event to an event to reverse a significant finding to a non-significant one.22
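The FI procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the spreadsheet workflow used in the study. The function names (`fisher_exact_p`, `fragility_index`) and the example counts are hypothetical. The two-sided Fisher's exact test is implemented here by summing the probabilities of all tables that are no more probable than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are events and non-events.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in arm 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Count the non-events turned into events, in the arm with fewer events,
    until P >= alpha. Returns 0 for a comparison that is already non-significant."""
    # Work on the arm with fewer events; arm sizes stay constant.
    if events1 > events2:
        events1, n1, events2, n2 = events2, n2, events1, n1
    fi = 0
    while fisher_exact_p(events1, n1 - events1, events2, n2 - events2) < alpha:
        if events1 == n1:
            break  # assumption: stop if the arm has no non-events left
        events1 += 1
        fi += 1
    return fi

# Hypothetical counts: 0/10 events versus 5/10 events.
print(fragility_index(0, 10, 5, 10))
```

With these made-up counts a single additional event in the first arm is enough to lose significance, so the sketch returns an FI of 1.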

On the other hand, in non-significant comparisons (with an FI equal to 0), the RFI was calculated. This was performed via the subtraction of events from the arm with the fewer events, while simultaneously adding non-events to the same arm, keeping the number of total participants constant, until the Fisher exact test two-sided P-value became <0.05. Lower RFIs indicate reduced statistical robustness and increased vulnerability to change from statistical non-significance to significance, with only a minimum number of events. At the moment, there is no recognised cut-off for categorising either the FI or the RFI.26
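The RFI calculation can be sketched in the same way. The code below is a hedged illustration only: the names `fisher_exact_p` and `reverse_fragility_index` are hypothetical, the example counts are invented, and returning `None` when the comparison is already significant (or when significance cannot be reached) is an assumption of this sketch rather than part of the published method.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in arm 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

def reverse_fragility_index(events1, n1, events2, n2, alpha=0.05):
    """Count the events turned into non-events, in the arm with fewer events,
    until the two-sided P-value drops below alpha.

    Returns None if the comparison is already significant or if
    significance is never reached (assumption of this sketch).
    """
    # Work on the arm with fewer events; arm sizes stay constant.
    if events1 > events2:
        events1, n1, events2, n2 = events2, n2, events1, n1
    if fisher_exact_p(events1, n1 - events1, events2, n2 - events2) < alpha:
        return None
    rfi = 0
    while events1 > 0:
        events1 -= 1  # one event becomes a non-event
        rfi += 1
        if fisher_exact_p(events1, n1 - events1, events2, n2 - events2) < alpha:
            return rfi
    return None

# Hypothetical counts: 4/10 events versus 5/10 events.
print(reverse_fragility_index(4, 10, 5, 10))
```

For these invented counts, four events must be removed from the first arm before the comparison becomes significant, giving an RFI of 4.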

For the current analyses, 2×2 tables were created in Microsoft Excel and Fisher’s exact test was used to calculate and verify the FIs and RFIs of the included trials. For one trial,31 32 the reported sample and events in each group were used to calculate the FI, and for another,33 the incidence and the total number of participants allocated in each group were applied in the FI calculations. When more than two interventions were included in one trial, as in the PREDIMED, each arm was compared with the control diet independently, and the FI or RFI was calculated accordingly, for each paired comparison. When the primary outcome was not reported, the first result presented in the abstract was considered as the primary outcome. In RCTs reporting more than one dichotomous primary outcome, the FI of all reported endpoints was calculated accordingly.
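The pairwise handling of multi-arm trials can be illustrated with a short helper that builds one 2×2 table per intervention arm against the shared control arm. This is only a sketch: the function name `pairwise_tables`, the arm labels and all counts are hypothetical and are not taken from any included trial.

```python
def pairwise_tables(control, interventions):
    """Build one 2x2 table per intervention arm versus the shared control arm.

    Each arm is given as (events, participants). Every table has the form
    [[events, non-events] for the intervention, [events, non-events] for control].
    """
    c_events, c_n = control
    if not 0 <= c_events <= c_n:
        raise ValueError("control events must lie between 0 and the arm size")
    tables = {}
    for name, (events, n) in interventions.items():
        if not 0 <= events <= n:
            raise ValueError(f"events out of range for arm {name!r}")
        tables[name] = [[events, n - events], [c_events, c_n - c_events]]
    return tables

# Hypothetical three-arm trial: two MD arms and one control arm.
arms = {"MD + EVOO": (12, 100), "MD + nuts": (15, 100)}
tables = pairwise_tables(control=(25, 100), interventions=arms)
for name, table in tables.items():
    print(name, table)
```

Each resulting table can then be passed to Fisher's exact test separately, so a three-arm trial contributes two comparisons, each with its own FI or RFI.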

Statistical analyses

As the research question was ‘broad’, incorporating all RCTs with MD interventions, an effort was also made to assess differences between RCTs with different characteristics. Three researchers (KG, MPN and MGG) stratified the selected trials according to blinding, outcome category, sample size, the use of a composite outcome (yes/no) and the method used to assess compliance with the assigned dietary scheme. These categories were used to detect differences in the FI and the RFI between RCTs with different design characteristics and outcomes. As most data were not normally distributed, results were presented as medians with their respective IQRs. Group differences were assessed with the Mann-Whitney U test (for comparisons involving two groups) and the Kruskal-Wallis test (for comparisons involving more than two groups). For these analyses, the Jamovi project (V. was used. Significance was set at 0.05, unless otherwise specified.
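The descriptive summary and the two-group comparison can be sketched with the Python standard library. This is an illustrative approximation and not a reproduction of the Jamovi output. The function names are hypothetical, the FI values are invented, the IQR uses the "inclusive" quantile method, and the Mann-Whitney P-value comes from a tie-corrected normal approximation without continuity correction. The Kruskal-Wallis test is not sketched here.

```python
from collections import Counter
from math import erfc, sqrt
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, first quartile, third quartile) of a sample."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def mann_whitney_u(x, y):
    """U statistic of sample x and a two-sided P-value (normal approximation)."""
    counts = Counter(list(x) + list(y))
    ranks, start = {}, 0
    for value in sorted(counts):
        t = counts[value]
        ranks[value] = start + (t + 1) / 2  # average rank of the tied values
        start += t
    n1, n2 = len(x), len(y)
    n = n1 + n2
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    # Variance of U with the usual correction for ties.
    tie_term = sum(t**3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / sqrt(var)
    return u1, erfc(abs(z) / sqrt(2))

# Invented FI values for two hypothetical strata of trials.
single_blind = [1, 2, 4, 5, 9]
open_label = [2, 3, 5, 6, 20]
print(median_iqr(single_blind))
print(mann_whitney_u(single_blind, open_label))
```

For small strata such as those in a review of 27 trials, an exact P-value would usually be preferable to the normal approximation used in this sketch.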


Search results and RCT characteristics

The detailed process of the selection of RCTs fulfilling the study’s criteria is illustrated in figure 1. Published protocols of RCTs without reported results, published studies with design issues (cross-sectional, qualitative, or post-hoc analyses), RCTs without dichotomous primary outcomes, trials lacking an MD intervention, and those with a concomitant MD comparator arm were excluded from the records. A total of 35 distinct publications18 19 31–63 of parallel interventions were identified meeting the predefined criteria (table 2), with those having an original publication and an erratum being counted as one record (five cases in total).18 19 37–39 44 46–48 63 Multiple publications deriving from the same trials, using the same sample size and outcomes, were also counted as one record (three cases in total).31–33 35 36 53 This resulted in 27 distinct RCTs in total, fulfilling the study’s criteria and being included in the present analyses.

Figure 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow chart of the selection of the studies. MD, Mediterranean diet; RCT, randomised controlled trial.

Table 2

Characteristics of the included trials

The majority of RCTs were performed in Spain,18 19 33 37–44 46–48 50 52–56 59–61 63 four originated from France,34–36 62 two took place in the UK51 58 and Italy,31 32 and single trials were performed in Australia,45 Israel57 and India.49 Most publications belonged to the PREDIMED or PREDIMED Reus trials,18 19 33 37–40 43 44 46–48 50 52–56 61 63 and few referred to the Lyon Heart Study.34–36 62 Two records involved the St. Carlos gestational diabetes mellitus (GDM) prevention RCT,59 60 and others were produced from the Effect of Simple, Targeted Diet in Pregnant Women With Metabolic Risk Factors on Pregnancy Outcomes (ESTEEM),58 The Heart Institute of Spokane Diet Intervention and Evaluation Trial,51 Pre Frail 80,41 Indo-MD Heart Study49 or other trials31 32 42 45 57 (table 2). The sample size ranged from 56 (reference 45) to 7403 (reference 44) participants.

Given the nature of the intervention (diet), most RCTs were of single-blind masking, and the remaining were open labelled. Regarding the PREDIMED trial, the single-blind masking was disputed by some researchers and further verifications were published by the investigators to address the issue.

Intervention and outcomes

For one trial,41 it was difficult to discern the exact primary outcome. For this specific RCT,41 the first result presented in the abstract (reversion to robustness) was considered as the primary outcome. Additionally, given that the Consolidated Standards of Reporting Trials (CONSORT)64 guidelines were produced fairly recently, a few (mainly older) trials did not have a preregistered protocol, although some had preceding publications detailing the protocol.

The PREDIMED RCTs18 19 33 37–40 43 44 46–48 50 52–56 61 63 evaluated the efficacy of two MD interventions, one with extra-virgin olive oil (EVOO) and one with nuts, in a great variety of health outcomes. In further detail, the included PREDIMED RCTs involved the prevention of diabetic retinopathy and nephropathy,37 38 CVD,18 19 incidence and reversion of the metabolic syndrome,55 61 liver steatosis,52 depression,56 osteoporosis-related fractures,33 53 peripheral artery disease,54 the occurrence of cataract surgery,40 as well as the incidence of type 2 diabetes mellitus (T2DM),46–48 atrial fibrillation,43 breast cancer50 and heart failure.44 Among the remaining trials, the majority34–36 42 49 51 57 62 investigated the effects of the MD on CVD risk factors. The St. Carlos GDM prevention59 60 and the ESTEEM58 trials used an MD with EVOO and pistachios to investigate maternal and fetal outcomes. The Pre Frail 8041 and Properzi45 trials applied the MD to evaluate frailty65 and non-alcoholic fatty liver disease (NAFLD) parameters, respectively.

Compliance with the dietary interventions was assessed by the majority of trials using the MEDAS questionnaire,66 food frequency questionnaires (FFQs)34–36 57 58 including the ESTEEM-Q,67 previous-day 24-hour diet recalls,34–36 49 diet diaries,31 32 51 diet history,45 diet ‘surveys’62 or other screeners.42 58 In addition, biomarkers indicative of increased MD adherence were selectively assessed, including urine hydroxytyrosol concentrations and plasma α-linolenic acid proportions. When the assessment of adherence to the control diet differed from that of the intervention group, either a 9-item dietary screener was used, or compliance assessment was not reported in the procedures at all.

Risk of bias

A summary of the RoB of the included RCTs is presented in figure 2. For some of the PREDIMED RCTs,18 37 44 46 47 the deviations from the randomisation protocol were considered when assessing the domains of random-sequence generation and allocation-sequence concealment. Many of the PREDIMED RCTs18 19 37–39 44 46–48 63 published errata and reanalyses of their datasets, excluding participants who had deviated from the randomisation protocol; for these, the allocation sequence concealment was considered as adequate, without, however, altering the random-sequence generation domain of the RoB tool, which remained biased. Furthermore, the use of different tools to assess compliance between intervention and controls was also accounted for when assessing the RoB, as it undermined the single-blind masking.

Figure 2

Included randomised controlled trials, investigating the effects of the Mediterranean diet interventions, rated against the Cochrane risk of bias 2.0 tool.30 *Publication excluding participants who had deviated from the randomisation protocol. †Concerns regarding randomisation arose post publication. ‡Personnel blinding was reported; however, compliance assessment indicates inadequate blinding of the intervention personnel.

According to the RoB (figure 2), the majority of RCTs exhibited either unclear, or high overall bias.18 19 33 35 36 38–42 46 47 50 52 54–57 59–61 The fewest concerns were raised with regard to missing outcome data. Among all included RCTs, the ESTEEM58 demonstrated the lowest bias throughout the examined RoB domains.

FI and RFI of the included RCTs

Table 3 details the FI and RFI of all included RCTs. The majority of comparisons33 37–40 43–48 50–52 55 57 58 61 failed to provide a significant result between MD intervention and comparator arms, exhibiting an FI equal to 0. On the other hand, the FI of significant comparisons ranged between 1 (reference 52) and 39 (reference 60). The median FI of the RCTs, excluding those with non-significant comparisons, was 5. More than half of the comparisons had an FI <5, indicating that the addition of 1–4 events to the opposite treatment arm eliminated the statistical significance of the RCTs. The most robust results (FI >15) involved publications of the St. Carlos GDM trial,60 the Indo-MD Heart Study,49 the Lyon Heart Study62 and the PREDIMED comparison between MD + EVOO versus control diet, published by Babio et al.61

Table 3

Fragility Index of the included randomised controlled trials

For the comparisons with an FI=0, the RFI was calculated, ranging between 1 (references 37–39, 45, 46) and 29 (reference 40) (median RFI: 7). Six out of 23 comparisons had an RFI <5 (median: 4), indicating that a change in the event status of 1–4 participants turns the respective comparisons into statistically significant ones.

Categorisation of the FI and RFI according to study characteristics

Table 4 details the FI and RFI categorisation according to the RCT design, the number of participants, and the primary outcome. When masking was accounted for, no differences were noted in the FI or RFI between trials of different allocation masking.

Table 4

Categorisation of the Fragility and Reverse Fragility Index according to randomised controlled trial design, sample size, compliance assessment method and primary outcomes (n, median and IQR)

Primary outcomes of the trials were categorised as perinatal, those related to diabetes mellitus or metabolic syndrome, cardiovascular, NAFLD-outcomes, or other (first incidence of breast cancer, cataract surgery, osteoporotic fractures, return to robustness or depression). This allocation failed to induce differences in the FI and RFI between different outcome categories. Similarly, allocation of the trials to those with composite primary endpoints against all others failed to show differences in the FI and RFI between the two groups.

Again, when sample size and methods used to assess dietary compliance between trials were used to allocate the RCTs, no differences were observed in the FI and RFI, with the exception of the RFI among distinct compliance-methods groups (p≤0.035).


The present study revealed that most individual comparisons of RCTs with MD interventions and dichotomous primary outcomes as endpoints fail to demonstrate significant results. In parallel, those with comparisons yielding significant findings appear fragile, with a small number of events needed to change the result from significant to non-significant. Subsequently, the number of robust RCTs investigating MD interventions appears to be limited.

Among the reviewed trials, the St. Carlos GDM60 and the PREDIMED RCT conducted by Babio61 exhibited the highest FIs, indicating that nutrition RCTs can be robust. Both of these trials exhibited high RoB in several RoB domains, suggesting that robustness does not necessarily coincide with low RoB. In the St. Carlos GDM study,60 the reported event rate was high, corresponding to 27.8% and 25.8% of the intervention and control groups, respectively, whereas the Babio61 PREDIMED trial did not exhibit a similarly high rate of events (1.5% of the total participants in the intervention arm and 3.9% of those allocated to the control group). This discrepancy between two RCTs with high FI indicates that the event rate is not the only parameter influencing the FI. According to Gaudino et al 29 the FI, P-values, events and sample size are mathematically related; however, the type of primary outcome might also have an effect on a trial’s robustness. For instance, the St. Carlos GDM study60 used two primary outcomes, the first being the incidence of GDM59 and the second being a composite maternal-fetal score,60 and published the trial’s results in two distinct publications. Although both publications reported significant findings, the first exhibited an FI equal to 4 (reference 59) and the latter an FI of 39 (reference 60). Composite scores are popular in nutrition research; they combine distinct outcomes, often resulting in a greater event rate as compared with the use of the ‘component’ outcomes independently. In the present analyses, the use of composite scores did not ensure statistical robustness in all of the trials, with many exhibiting low FIs and RFIs (<5).18 19 35 36 58

The Esposito et al 31 32 trial also demonstrated a high FI, indicating a robust outcome. However, in this specific RCT the two diets applied by the trialists were not fully comparable. In more detail, the intervention arm adopted a low-calorie MD, whereas the comparator group followed a low-fat diet, without any reported restrictions concerning the energy intake. Thus, the observed effects of the intervention arm, and subsequently the high FI, could well have been the result of the prescribed low-calorie diet, as restricted energy intake leading to weight loss has been shown to delay the development of T2DM and subsequently, improve glycaemic control and various coronary factors.68–72

Additionally, it appears that the majority of evidence on MD interventions with dichotomous outcomes is based on the PREDIMED trial, which had a multiarm design. According to Parmar et al,73 trials concurrently evaluating more than one intervention, like the PREDIMED, have increased chances of finding significant differences even with the use of small sample sizes. Since the FI is based on Fisher’s exact test, it can only be applied to 2×2 tables; thus, in trials with three parallel arms, distinct comparisons of each intervention with the comparator group were performed for the calculation of the FI/RFI. Exclusion of the trials with three arms, however, did not alter the pooled results herein (Median FI: 5, RFI: 7). Accordingly, separation of the PREDIMED comparisons revealed a lack of a significant effect in approximately half of the comparison pairs. When the PREDIMED comparison arms were grouped together and compared against the other trials, the median FIs and RFIs between groups were similar (5 and 7, respectively for both groups), indicating a similar robustness to the rest of MD RCTs. Apart from disputes concerning the randomisation of the PREDIMED sample and the different reported tools used to assess compliance, Correia74 also noted discrepancies in the medical care offered to the participants, resulting in allocation concealment bias. In parallel, the control group received an intervention of lower intensity for the initial 3 years of the RCT, a problem corrected before completion of the recruitment and analysis of the results.75 Inevitably, however, a different intervention frequency unmasks participant allocation. Additionally, compliance with the low-fat control diet appeared to be a difficult task in the long run, with the mean fat intake of participants reaching 37.4% of the total energy intake 5 years post intervention.
Thus, the control diet did not correspond to a low-fat regime but was merely lower in fat content compared with the two MD interventions (42% of the total energy intake).75 Subsequently, more losses to follow-up were recorded in controls, mainly among participants with a worse CVD risk profile at recruitment.75 This induced further bias towards ameliorated results for the control group, leading to mitigated between-groups differences and, by inference, bias in the FI. Despite the issues mentioned above, the PREDIMED is an ambitious milestone trial for nutrition research and reanalysis of the data did not reveal differences in the reported results despite the randomisation issues. Given the prolonged intervention duration and the large number of participants, collaborators and outcomes, it is not surprising that certain aspects of the trial’s design and execution demonstrated issues. Undoubtedly, similar issues might have also been observed in pharmacological trials. On a sidenote, the PREDIMED is probably the only megatrial that has undergone this degree of exhaustive scrutiny, despite the results being unchanged at republication. Moreover, unlike pharmacological trials, the trial aimed at providing evidence for a more traditional and accessible therapy (i.e. diet), without supporting any industry products other than common, ‘healthy’ foods, including olive oil and nuts. According to the authors, these issues should have increased trust in the results. However, for the meticulous methodologists, the majority of nutrition research has limitations, whereas for the sceptics, nutrition research is scrutinised for competing against Big Pharma on a pretence of evidence.

For many of the included trials, the calculated low FIs and RFIs were associated with an overall smaller number of events. This problem can be overcome if greater sample sizes are recruited at baseline, or if we shift the focus towards the execution of pragmatic trials. However, Gaudino29 noted that it is more ethical to power RCTs in order to produce the required level of evidence using the minimum possible number of participants. Enrolling additional participants might result in stronger evidence against the null hypothesis; however, it might violate the equipoise principle.29 On the other hand, findings may produce more contradictory results than similar trials, and may also pose further ethical concerns.29

An important question arising from the present findings is whether we are receiving the reliable data we are craving by performing RCTs, or if we are overlooking important flaws of either nutrition science or the methodology applied in trials examining the MD. However, the present study did not aim to examine the importance or the effectiveness of the MD as a therapeutic dietary regimen. The low robustness calculated herein indicates that even the best level of primary MD evidence proving causality, namely the RCTs, can fail to reach the standards one would expect. Recently, a study76 assessing the FI of clinical nutrition trials revealed a low FI. According to Zeilstra,77 many nutritional RCTs yield ambiguous results, which is why the RCT design is often considered ‘ill-suited’ for nutritional research.77–80 Additionally, given that most trials are based on different analyses of the same landmark protocol (PREDIMED), bias and limitations of the trial are inevitably reproduced in every publication. Subsequently, any synthesis of related RCTs, although it may present low heterogeneity, carries an inherited risk of extrapolated findings. In nutrition’s defence, however, a lower median FI compared with that of MD interventions has been reported in perioperative,81 anesthesiology,82 plastic surgery,83 and critical-care medicine84 RCTs, as well as among paediatric orthopedic85 and appendicitis86 trials. Nevertheless, the synthesis of these trials for the formulation of recommendations is common practice in the fields mentioned above, as in the science of nutrition.

On the flip side, RCTs with MD interventions and continuous primary outcomes demonstrate significant findings while supporting the health benefits of adhering to the MD prototype. However, similarly to the Esposito31 32 trial, control interventions are not always comparable, with a tendency to favour the MD arms. This is why, to verify the health effects of MD adherence and advocate for its prescription, superiority trials with continuous primary outcomes should be performed, comparing the MD to other healthy diet regimens instead of the usual diet of participants or dietary advice only.

Although the current results indicate that, as far as trials with dichotomous outcomes are concerned, the evidence on the MD entails some limitations, several other factors must also be considered before treating the MD with contempt. For instance, assessment of the participants’ adherence to the dietary intervention, an important component of a nutrition RCT, often relies on short dietary indexes instead of more objective measures. Moreover, the Hawthorne effect87 (individuals modify an aspect of their behaviour in response to their awareness of being observed) is apparent in all of nutrition research; thus, compliance and assessment are not always accurate. RCTs are often used to guide clinical practice and are sometimes incorporated in clinical practice guidelines, either intact or after synthesis in systematic reviews and meta-analyses. Given the demand for evidence-based nutrition recommendations,88–91 the results suggest that recommendations promoting the MD based on RCTs should be formulated with caution.76 A thorough examination of the American College of Gastroenterology guidelines revealed that most RCTs used to guide recommendations regarding Crohn’s disease relied on a small number of superior events for ‘securing’ statistical significance.92 Often, the FI coincided with the number of drop-outs reported in these trials. This is why reporting the FI has also been suggested for systematic reviews and meta-analyses, to understand the fragility of the presented associations and identify possible misuse of the P-value.93 The present study aimed to pinpoint another issue requiring the attention of scientists when performing nutrition trials, namely the FI. Meticulous care in trial design, sample size and execution can improve the FI of nutrition trials and aid in upgrading the science of nutrition, as succinctly pointed out by other researchers.94

Another important issue in nutrition research is that detailed definitions of the interventions are often not reported. This is also the case with the MD. Although the label MD is a generic term used to describe the diet of inhabitants around the Mediterranean basin, according to Trichopoulou,95 what constitutes the MD and its key determinants differs even among ‘experts’ worldwide. Martínez-González96 noted that the discrepancies in the MD definition constitute a major problem, especially for intervention studies. As a result, except for the RCTs included herein that stemmed from the same protocol, such as PREDIMED, the remaining trials have most probably used different definitions of the MD. For instance, Singh and associates49 used a National Cholesterol Education Program modification of the MD, whereas Greenberg et al57 reported following Professor Willett’s definition of the MD. This indicates that differences may exist even under the same intervention label, and these may well induce inconsistencies and bias in the reported outcomes.97

Undoubtedly, one important limitation of the study stems from the relatively small number of RCTs with a dichotomous primary outcome included in the analyses. However, one should consider that the total number of RCTs examining MD interventions is rather small; additionally, in the present study, RCTs were selected based on a systematic search strategy; thus, the results reflect the actual number of available MD-RCTs fulfilling the study’s criteria and indexed in the PubMed database. An additional limitation is that the publication of many RCTs predated the CONSORT64 98 guidelines; thus, a few important characteristics have not been reported. In parallel, in the case of MD RCTs, as in the majority of nutritional epidemiology, diet adherence and intake rely on imprecise exposure assessments (mainly self-reported information), with an increased potential for confounding.16 99–101

Moreover, due to the small number of retrieved trials, it was not possible to correlate the FI with individual study characteristics, or to perform additional statistical analyses. As already mentioned, the use of broad research topics for the assessment of the FI/RFI, as seen herein with the MD, is common in the literature.26 76 Although such studies pool a greater number of RCTs, they also tend to mix many studies with non-comparable aspects, including participant age, health status, study question, outcome categories, etc. In an effort to account for the heterogeneity observed in the included trials, we also calculated the FI and the RFI after grouping the RCTs by sample size, masking, or outcome category. However, these analyses failed to reveal differences, with the only significant finding being a different RFI among RCTs using different methods to assess dietary adherence. Therefore, in the pooled sample of RCTs included herein, differences in sample size, outcome category or masking had a minimal effect on the FI and the RFI. Nevertheless, a larger pool of RCTs might have produced different results.

Limitations of the FI include the fact that its calculation is based on Fisher’s exact test, which is considered stricter and more prone to type II errors than the χ² test. Additionally, as already mentioned, it can only be applied to dichotomous outcomes, whereas the majority of nutrition research tends to examine continuous outcomes. Furthermore, standardised cut-offs for categorising RCTs as either robust or fragile are lacking.102 103 According to Andrade,102 the most important limitation of the index concerns the use of the much decried statistical threshold (p<0.05) for determining the significance of a study’s outcome. However, one should consider that the FI uses the same threshold applied in the published RCTs and that, additionally, the FI is highly correlated with the P-value of a trial, with a P-value closer to 0.05 indicating a lower FI.103 104 Moreover, although Walsh27 suggested calculating the index in time-to-event data, as performed in the current analysis, several researchers raised concerns, claiming that it cannot account for the effect of time.102 Nevertheless, as Charilaou105 aptly noted, the FI can offer a measure of the validity of an RCT, especially in trials where the number of participants lost to follow-up exceeded the FI of the trial. More recently, in a collective effort to optimise patient care, the routine use of the FI has been recommended for the development of all clinical practice guidelines,28 with incorporation of the results in the GRADE (Grading of Recommendations Assessment, Development and Evaluation) format.
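To make the dependence of the FI on Fisher’s exact test and on the p<0.05 threshold concrete, the following is a minimal Python sketch of the commonly described procedure: in the arm with fewer events, non-events are converted to events one at a time until the two-sided P-value is no longer below the threshold. It is an illustration under stated assumptions, not the code used in the present study. The function names, the choice of the two-sided test and the example counts (1/20 vs 9/20 events) are all hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are (events, non-events).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in arm 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    # The small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_1, total_1, events_2, total_2, alpha=0.05):
    """Number of non-event-to-event conversions needed to lose significance.

    Returns None when the trial is not significant at `alpha` to begin with,
    or when significance cannot be removed by modifying the chosen arm.
    """
    a, b = events_1, total_1 - events_1
    c, d = events_2, total_2 - events_2
    if fisher_exact_two_sided(a, b, c, d) >= alpha:
        return None  # the FI is defined only for significant results
    if a > c:
        # Always modify the arm with fewer events (assumed convention).
        a, b, c, d = c, d, a, b
    fi = 0
    while fisher_exact_two_sided(a, b, c, d) < alpha:
        if b == 0:
            return None  # no non-events left to convert in this arm
        a, b, fi = a + 1, b - 1, fi + 1
    return fi


# Hypothetical trial: 1/20 events in one arm, 9/20 in the other.
print(fragility_index(1, 20, 9, 20))  # 2
```

In this hypothetical example, changing the outcome of only two participants removes statistical significance. Had three participants been lost to follow-up in such a trial, the losses would exceed the FI, which is the situation flagged above as a concern for the validity of an RCT.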


In summary, the present study reveals that, when adhering to good scientific principles, one discerns that even in the top tiers of the evidence hierarchy, research on the MD may lack robustness, raising concerns for the formulation of nutrition recommendations in a wider context. A collective effort is required to promote the science of nutrition in an evidence-based manner. Despite the mediocre robustness of RCTs with MD interventions, the findings herein do not detract from the importance of the MD for health or as a UNESCO-accredited intangible cultural heritage. Nevertheless, it appears that our quest for an ideal diet for all could prove to be a case of horses for courses, and a more personalised approach may be required for both health attainment and improved disease outcomes. As Correia74 noted, ‘enthusiasm regarding the MD may not be proportional to the level of evidence’, and this might lead to allegiance bias and an imbalance between expectancies and evidence.


The present work has been submitted for presentation at the 10th Scientific Congress of the Medical School, Aristotle University of Thessaloniki (2021).



  • Twitter @melnigdelis, @x_theodoridis, @dbogdanos

  • MPN and XT contributed equally.

  • Contributors MGG and DGG developed the research question and finalised the paper inclusion. MPN, MGG, KG and XT performed the search and extracted the data. AT and TP participated in the search. KG designed the protocol. XT, MGG and DPB calculated the FI and the RFI. MPN and KG assessed the risk of bias of the included trials, with some help from MGG and supervision from DGG and DPB. MPN and MGG performed the statistical analyses. KG, MPN, XT, TP and MGG prepared tables and figures. DGG, DPB and MGG interpreted results, drafted and finalised the manuscript. All authors read the manuscript, contributed comments to its revision, and have approved and agreed to the final version. MGG submitted the manuscript, and DGG is responsible for the overall content as guarantor.

  • Funding The present research was funded by the “MSc in Health and Environmental factors”, Medical School, Aristotle University of Thessaloniki.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement Although most of the data are already presented in the manuscript text, they are also available upon reasonable request. For expression of interest, please contact Prof. Dimitrios G. Goulis.
