Misclassification of coffee consumption data and the development of a standardised coffee unit measure
  1. Robin Poole (1),
  2. Sean Ewings (2),
  3. Julie Parkes (1),
  4. Jonathan A Fallowfield (3) and
  5. Paul Roderick (1)
  1. Primary Care and Population Sciences Academic Unit, University of Southampton Faculty of Medicine, Southampton, UK
  2. School of Health Sciences, University of Southampton, Southampton, UK
  3. University of Edinburgh Centre for Inflammation Research, Queen’s Medical Research Institute, Edinburgh, UK
  1. Correspondence to Robin Poole; r.poole{at}


Background Associations of coffee consumption with multiple health outcomes have been researched extensively. Coffee consumption, usually reported in cups a day, is a heterogeneous measure due to numerous preparation methods and cup sizes, leading to misclassification. This paper develops a new ‘unit’ measure of coffee and uses coffee consumption data from a representative sample of the UK population to assess misclassification when cup volume and preparation type are not taken into account.

Methods A coffee unit measure was created using published estimates of caffeine and chlorogenic acid concentrations, and applied across volumes and preparation types. Four-day food diary data in adults from the UK National Diet and Nutrition Survey (NDNS; 2012–2016) were used to quantify coffee intake. Participant self-reported cups a day were compared with cups a day standardised by (a) 227 mL volume and (b) 227 mL instant coffee equivalents (unit measure), and the degree of misclassification was derived. Sensitivity analyses were conducted to model coffee drinking preferences of different populations and caffeine:chlorogenic acid weighting assumptions of the unit measure.

Results The NDNS sample consisted of 2832 adult participants. Coffee was consumed by 62% of participants. Types varied, with 75% of caffeinated coffee cups being instant, 17% filter, 3% latte, 2% cappuccino, 2% espresso and <1% other types. Comparing reported cups with volume-standardised cups, 84% of participants were correctly classified. Using the coffee unit measure, 73% were correctly classified, 22% underestimated and 5% overestimated, largely by one cup. Misclassification varied by gender, age and income. Sensitivity analysis highlighted the benefits of using the unit measure over volume alone to cater for different populations, and the stability of the unit composition assumption.

Conclusion Cup volume and preparation type should be taken into account, through the application of a standardised coffee unit measure, when coffee consumption is classified in future research studies.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

What this paper adds

  • Cups a day is a heterogeneous measure of coffee consumption due to numerous preparation methods and cup sizes.

  • To overcome this limitation we developed a standardised coffee unit measure that takes preparation type and cup size into account.

  • When applied to a representative coffee drinking population in the UK, the coffee unit measure revealed misclassification in approximately 1 in 4 participants when preparation type and cup size are not taken into account.


Globally, an estimated two billion cups of coffee are consumed every day.1 When consumed on such a large population scale, even relatively small health benefits or harms are important to understand. We recently highlighted that aside from pregnancy, coffee consumption is more likely to be beneficially, rather than harmfully, associated with health outcomes, including lower risk of all-cause mortality, cardiovascular mortality and incident cardiovascular disease.2 For these outcomes, maximal relative risk reduction was seen at intakes of 3–4 cups a day. Other consistent beneficial associations included lower risk of incident cancer (prostate, endometrial and skin cancers), metabolic (including type II diabetes) and neurological conditions (including depression and Parkinson’s disease). The greatest magnitude of benefit was consistently observed between coffee consumption and a range of liver conditions including fibrosis, cirrhosis and hepatocellular carcinoma.

However, the existing evidence may be biased by misclassification of coffee ‘exposure’ due to the use of the coffee cup as a unit of measure. Coffee is a complex mixture of over 1000 bioactive compounds including caffeine, chlorogenic acids and diterpenes,3 and consumption of these compounds can vary between preparation methods in numerous ways.4 Three prominent examples are: espresso coffee, which efficiently extracts caffeine and chlorogenic acid to reach higher concentrations than other methods, but is consumed in smaller volumes; instant coffee, the most common type of coffee consumed in the UK, which has lower concentrations of caffeine than other methods, although these vary depending on brand and ratio of product to water; and filtered coffee, which has lower concentrations of diterpenes because the compounds do not readily pass through filter paper. Quantifying exposure to coffee, by accounting for cup size and preparation methods, would resolve misclassification of consumption both within and between studies and help to generate more robust evidence for coffee’s potential benefits and harms. Furthermore, it would allow for greater generalisability of evidence, given that coffee consumption varies by preparation type across different countries; for example, drip/filtered coffee is commonly consumed in North America5 and Scandinavia,6 whereas espresso dominates consumption patterns in Spain and Italy.6

This study aims to investigate whether a new coffee unit measure could be created, similar to the alcohol unit, based on two coffee compounds (caffeine and chlorogenic acid), by taking into account different coffee preparation methods and cup sizes. It also aims to explore the extent of misclassification in the cups a day measure when compared with a cups a day measure standardised by the coffee unit measure and to see if this varies by age, gender and income—factors that may affect the choice of preparation method.


Creation of a coffee unit measure

Published estimates of caffeine and chlorogenic acid concentrations across different preparation methods were used to produce a standard coffee unit measure. The caffeine and chlorogenic acid concentrations (mg/mL) were extracted from published analyses of coffee shop or home prepared coffees, which are frequently found to have much lower caffeine concentrations than laboratory samples7 (table 1).4 8–15 Where these were not available, published laboratory estimates were used. Diterpenes were not included in the coffee unit measure because they are in the order of 100–1000 times lower in concentration (depending on the preparation method) compared with caffeine and chlorogenic acid. Therefore, chlorogenic acid concentrations were considered as a surrogate measure of all non-caffeine compounds within coffee. Caffeine and chlorogenic acid were summed (with equal weight) to produce a total concentration of active ingredients in mg/mL. The most commonly consumed coffee in the UK, instant coffee at a volume of 227 mL (8 UK fluid ounces), was defined as one unit, and unit measures of other typical coffee drinks were derived as presented in table 1. These were calculated by dividing the summed caffeine and chlorogenic acid content (mg) of the preparation type and volume of interest by the summed caffeine and chlorogenic acid content of 227 mL of instant coffee. For example, 30 mL of espresso delivers 4.75 mg/mL×30 mL=142.5 mg caffeine and chlorogenic acid, which is equivalent to 142.5/(0.84×227) or 0.7 coffee units. Other examples include 1.7 units in a 227 mL mug of filter coffee, 2.0 units in a 354 mL cappuccino and 1.4 units in a 240 mL latte.
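The unit calculation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function and constant names are our own, and only the two summed concentrations quoted in the text (0.84 mg/mL for instant and 4.75 mg/mL for espresso) are used.

```python
# Summed caffeine + chlorogenic acid concentrations (mg/mL) quoted in the text.
INSTANT_MG_PER_ML = 0.84
ESPRESSO_MG_PER_ML = 4.75
REFERENCE_VOLUME_ML = 227  # one unit is defined as 227 mL of instant coffee


def coffee_units(concentration_mg_per_ml: float, volume_ml: float) -> float:
    """Return the coffee units in one drink of the given strength and volume."""
    reference_mg = INSTANT_MG_PER_ML * REFERENCE_VOLUME_ML
    return concentration_mg_per_ml * volume_ml / reference_mg


# The 30 mL espresso example from the text.
espresso_units = coffee_units(ESPRESSO_MG_PER_ML, 30)
print(round(espresso_units, 1))  # 0.7
```

By construction, a 227 mL cup of instant coffee returns exactly one unit.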

Table 1

Preparation type definitions, caffeine (CAF), chlorogenic acids (CGA) and diterpene concentrations, classification within NDNS and derived coffee unit examples

Population sample

Data from years 5–8 (2012–2016) of the UK National Diet and Nutrition Survey (NDNS)16 were used to quantify coffee intake in a representative sample of the UK population. The NDNS is a rolling annual cross-sectional survey of approximately 1000 UK adults and children.17 Participants record all food and drink consumption in a 4-day food diary, which is later coded and classified by researchers. We extracted data from the NDNS for every adult participant (aged ≥18 years) who drank at least one cup of coffee during data capture. We recorded the number of cups and the cup volume for each coffee type consumed.

In the NDNS, coffee preparation methods are broadly classified and recorded as instant, cappuccino, latte, strong infusion, weak infusion and vending machine coffee. Espresso-based drinks, such as cappuccino, latte and mocha, are recorded in their own category but no separate category exists for espresso coffee; we categorised this as strong infusion with volume <65 mL, in keeping with typical volumes of single (30 mL) or double (60 mL) espressos. The remaining strong infusion cups were combined with the weak infusion cups and assumed to represent filtered (regular) coffee. Vending machine coffee was assumed to be equivalent in composition to instant coffee. Cup volumes <15 mL or >1000 mL were excluded.
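The recoding rules in this paragraph can be expressed as a small function. The sketch below is illustrative only: the category strings are hypothetical labels chosen for readability, not the actual NDNS codes, and the function name is our own.

```python
from typing import Optional

ESPRESSO_MAX_ML = 65   # strong infusions below this volume are treated as espresso
MIN_VOLUME_ML = 15     # cups smaller than this are excluded
MAX_VOLUME_ML = 1000   # cups larger than this are excluded


def classify_cup(ndns_category: str, volume_ml: float) -> Optional[str]:
    """Map a recorded category and volume to an analysis preparation type.

    Returns None when the cup falls outside the accepted volume range.
    """
    if volume_ml < MIN_VOLUME_ML or volume_ml > MAX_VOLUME_ML:
        return None
    if ndns_category == "strong infusion":
        return "espresso" if volume_ml < ESPRESSO_MAX_ML else "filter"
    if ndns_category == "weak infusion":
        return "filter"
    if ndns_category == "vending machine":
        return "instant"  # assumed equivalent in composition to instant
    # Instant and espresso-based drinks keep their own category.
    return ndns_category
```

For example, `classify_cup("strong infusion", 30)` yields `"espresso"`, whereas the same category at 227 mL yields `"filter"`.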

The NDNS provides weights to allow adjustment of the survey data to account for sampling and non-responder bias. The complex sample function of SPSS V.2418 was used throughout the analysis to account for stratification, clustering and weighting of the NDNS data.
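To illustrate what applying survey weights means for a reported percentage, the following sketch computes weighted proportions from a list of category labels and per-participant weights. It is a simplified stand-in, not a reproduction of the SPSS complex samples procedure: it gives point estimates only and ignores the stratification and clustering that affect standard errors. The function name and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def weighted_proportions(labels: Sequence[str],
                         weights: Sequence[float]) -> Dict[str, float]:
    """Return the weighted share of participants in each category."""
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    grand_total = sum(totals.values())
    return {label: total / grand_total for label, total in totals.items()}


# Hypothetical data: four participants with made-up survey weights.
example = weighted_proportions(
    ["drinker", "non-drinker", "drinker", "drinker"],
    [1.5, 1.0, 0.5, 1.0],
)
print(example["drinker"])  # 0.75
```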

Ascertainment of misclassification

Misclassification in the use of the cups a day measure was assessed by applying (a) a standard cup volume and (b) a standard cup volume and preparation type (coffee unit measure) to the intake of each participant to investigate the impact of using a cups a day measure when volume and preparation type have not been taken into account.

(a) Standard cup volume

A 227 mL volume-standardised equivalent number of cups a day was calculated for each participant. Misclassification was calculated by subtracting the number of volume-standardised cups from the number of reported cups and rounding the result to the nearest cup. For example, if a participant reported one cup of coffee a day with a volume of 400 mL, this would be equivalent to 400/227 or 1.8 volume-standardised cups a day. In this example, the misclassification would be 1.0 minus 1.8 equals −0.8 cups a day (rounded to −1 cup). This is interpreted as reported cups underestimating actual intake by one cup.
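The volume-standardised calculation can be sketched as follows. The function name is our own, and the sketch uses Python's built-in `round`, which rounds exact halves to the nearest even integer; the text does not say how ties were handled, so that detail is an assumption.

```python
STANDARD_CUP_ML = 227


def volume_misclassification(reported_cups: float, total_volume_ml: float) -> int:
    """Reported cups minus volume-standardised cups, rounded to the nearest cup.

    Negative values mean reported cups underestimate actual intake.
    """
    standardised_cups = total_volume_ml / STANDARD_CUP_ML
    return round(reported_cups - standardised_cups)


# One reported cup a day with a volume of 400 mL, as in the example above.
print(volume_misclassification(1, 400))  # -1
```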

(b) Standard cup volume and preparation method (coffee unit measure)

A unit measure-standardised equivalent number of cups was calculated for each participant. The total coffee unit measure intake for each participant was calculated by summing total caffeine and chlorogenic acid (mg) for each coffee consumed and dividing by the single unit equivalent (ie, instant coffee 0.84 mg/mL×227 mL).

For example, a participant reporting a seven-cup consumption comprising four cups of instant coffee at 250 mL each, two cups of cappuccino at 350 mL each and one cup of espresso at 30 mL, would have consumed:

4×(0.84 mg/mL×250 mL)+2×(1.13 mg/mL×350 mL)+1×(4.75 mg/mL×30 mL)

=840 mg+791 mg+142.5 mg

=1773.5 mg of total caffeine plus chlorogenic acid

To standardise to coffee units:

=1773.5 mg/single coffee unit caffeine plus chlorogenic acid

=1773.5 mg/(0.84 mg/mL×227 mL)

=9.3 coffee units

In this example, reported intake underestimated actual intake by two cups, calculated by 7.0 minus 9.3 equals −2.3 cups and rounded to −2 cups. The misclassification analysis was repeated separately for decaffeinated coffee using, first, 227 mL caffeinated instant coffee, and second, 227 mL decaffeinated instant coffee as the standard unit.
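The seven-cup worked example can be reproduced with a short script. This is a sketch under the concentrations quoted in the example (0.84, 1.13 and 4.75 mg/mL); the variable and function names are illustrative and not taken from the authors' analysis.

```python
# One coffee unit: summed caffeine + chlorogenic acid in 227 mL of instant coffee.
REFERENCE_MG = 0.84 * 227

# (number of cups, concentration in mg/mL, volume per cup in mL)
drinks = [
    (4, 0.84, 250),  # instant
    (2, 1.13, 350),  # cappuccino
    (1, 4.75, 30),   # espresso
]


def total_active_mg(drinks):
    """Total caffeine plus chlorogenic acid (mg) across all cups."""
    return sum(cups * conc * volume for cups, conc, volume in drinks)


reported_cups = sum(cups for cups, _, _ in drinks)
total_mg = total_active_mg(drinks)
units = total_mg / REFERENCE_MG
misclassification = round(reported_cups - units)

print(round(total_mg, 1))   # 1773.5
print(round(units, 1))      # 9.3
print(misclassification)    # -2
```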

Subgroup analysis

Misclassification was also calculated separately for men and women, age group (18–34, 35–54 and ≥55 years) and income tertile (≤£17 500, >£17 500 to ≤£32 383 and >£32 383). Finally, instant coffee as a proportion of all coffee consumed was calculated for all caffeinated coffee drinkers and separately for each subgroup.
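The subgroup definitions can be written as simple binning functions. The sketch below uses the cut points given above; the function names and returned labels are our own, and the income argument is assumed to be expressed in pounds on whatever basis the NDNS income variable uses.

```python
def age_group(age_years: int) -> str:
    """Assign an adult participant to one of the three age groups."""
    if age_years < 18:
        raise ValueError("analysis is restricted to adults aged 18 years or over")
    if age_years <= 34:
        return "18-34"
    if age_years <= 54:
        return "35-54"
    return ">=55"


def income_tertile(income_gbp: float) -> str:
    """Assign a participant to an income tertile using the stated cut points."""
    if income_gbp <= 17500:
        return "lowest"
    if income_gbp <= 32383:
        return "middle"
    return "highest"
```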

Sensitivity analysis

First, because espresso is a small volume of highly concentrated coffee, the misclassification analysis was repeated with espresso excluded. Second, the analysis was repeated by substituting instant coffee of any volume with 30 mL espresso coffee (volume-standardised to 30 mL and a single coffee unit measure redefined as 30 mL espresso) to model settings in which espresso is the most frequently consumed coffee type. Finally, to see how misclassification might change with changing composition assumptions of the unit measure, the analysis was repeated using ratios of caffeine to chlorogenic acid of 0:1, 1:0, 1:2, 1:3, 1:4, 1:5, 2:1, 3:1, 4:1, 5:1, 1:1:1 (diterpenes) and 1:1:1 (higher diterpenes: filter diterpenes replaced with French press diterpenes).
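One way to picture the ratio-based sensitivity analysis is as a weighted sum of the two concentrations in both the drink and the reference cup. The sketch below assumes this is how the weighting was applied, which the text does not state explicitly. The separate caffeine and chlorogenic acid concentrations used in the example are hypothetical placeholders, chosen only so that they sum to the 0.84 and 4.75 mg/mL totals quoted earlier; they are not the table 1 values.

```python
def weighted_units(caf: float, cga: float, volume_ml: float,
                   ref_caf: float, ref_cga: float,
                   ref_volume_ml: float = 227,
                   w_caf: float = 1.0, w_cga: float = 1.0) -> float:
    """Coffee units under a caffeine:chlorogenic acid weighting of w_caf:w_cga.

    caf/cga are the drink's concentrations (mg/mL); ref_caf/ref_cga are those
    of the reference coffee that defines one unit.
    """
    drink = (w_caf * caf + w_cga * cga) * volume_ml
    reference = (w_caf * ref_caf + w_cga * ref_cga) * ref_volume_ml
    return drink / reference


# Hypothetical split of the quoted totals (placeholders, not published values).
INSTANT = {"caf": 0.30, "cga": 0.54}    # sums to 0.84 mg/mL
ESPRESSO = {"caf": 2.00, "cga": 2.75}   # sums to 4.75 mg/mL

# Equal weighting (1:1) recovers the base-case figure for a 30 mL espresso.
base_case = weighted_units(ESPRESSO["caf"], ESPRESSO["cga"], 30,
                           INSTANT["caf"], INSTANT["cga"])
print(round(base_case, 1))  # 0.7
```

Whatever the weighting, the reference cup itself always evaluates to one unit, so only drinks whose caffeine to chlorogenic acid balance differs from the reference shift under alternative ratios.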


There were 2832 adults in the 2012–2016 NDNS sample. After weighting, 62% of participants consumed any coffee over the 4 days, comprising those drinking only caffeinated coffee (54%), only decaffeinated coffee (4%) and a mix of caffeinated and decaffeinated types (4%). The proportion of drinkers and non-coffee drinkers did not differ by gender, but there were fewer coffee drinkers in the 18–34 age group and in the lowest income tertile (table 2).

Table 2

Proportion of coffee and non-coffee drinkers by gender, age and income

Cups a day and mean cup volume, by preparation type, are presented in table 3. A total of 10 681 cups of caffeinated coffee were consumed during the diary period. Mean intake was 1.6 and 1.4 cups a day among caffeinated and decaffeinated coffee drinkers, respectively. Intake of coffee was marginally higher in men with a mean intake of 1.8 cups compared with 1.5 cups a day in women (data not shown). For those drinking coffee at least once daily, the mean intake was 2.2 cups a day. The mean cup volume was 227 mL and did not vary between daily and non-daily coffee drinkers. It also equated with the mean volume of the most frequently consumed coffee type, instant coffee, which was consumed by 78% of caffeinated coffee drinkers and represented 75% of all coffee cups consumed. The next most frequently consumed coffee type was filter coffee with 31% of caffeinated coffee drinkers consuming this at least once, with a mean volume of 224 mL. Drinks, such as latte, cappuccino, mocha and espresso, were consumed by fewer participants and, apart from espresso, typically in larger volumes than instant coffee.

Table 3

Proportion of coffee drinkers, mean cups a day and mean cup volume by preparation type

Among caffeinated coffee drinkers, 69% drank only one preparation type during the diary period and 27% consumed two types, the majority of these drinking instant and one other type. Thus, 4% of coffee drinkers consumed three or more preparation types. For decaffeinated coffee drinkers, one and two preparation types were consumed by 85% and 14%, respectively.

Misclassification of coffee intake

When standardised by volume, 84% of participants had correctly classified reported intakes, 8% underestimated and 8% overestimated (table 4). Most misclassification was for one cup in either direction, with two or more cups of misclassification accounting for only 2% of participants. The proportion of misclassification generally increased as reported cups a day increased. Unrounded, median volume misclassification was 0.0 cups (IQR −0.2 to 0.2). When standardised by the coffee unit measure, 73% of participant intakes were correctly classified, 22% underestimated and 5% overestimated (table 5). Again, most misclassification was for one cup in either direction, but there was a marginal increase in the proportion of participants with two or more cups of misclassification, to 5%. There was also an increase in the proportion of reported cups a day underestimating intake compared with misclassification of volume-standardised cups a day. Unrounded, median coffee unit misclassification was −0.1 cups (IQR −0.4 to 0.1). For decaffeinated coffee, 91% of participants had correctly classified volume-standardised intakes and 58% had correctly classified coffee unit measure-standardised intakes, with the majority of misclassification overestimating intake by one cup; the latter proportion increased to 90% when the coffee unit measure was redefined as 227 mL of decaffeinated coffee (data not shown).

Table 4

Proportion of participants misclassified across reported caffeinated cups compared with 227 mL volume-standardised cups a day

Table 5

Proportion of participants misclassified across reported caffeinated cups compared with coffee unit standardised cups a day (where 1 unit=227 mL instant coffee)

Subgroup analysis

Table 6 presents the proportion of misclassification (coffee unit measure) across different subgroups of caffeinated coffee drinkers. There were some notable differences with misclassification being greater in men compared with women, younger compared with older participants and participants in the highest income tertile. Participants in the oldest age group and middle or lower tertile of income had the least misclassification. Caffeinated coffee drinkers in the lowest tertile of income drank 79% of all coffee cups as instant coffee compared with 56% in the upper tertile. Income rather than age appeared to drive most of the non-instant coffee consumption.

Table 6

Misclassification of reported caffeinated cups a day compared with caffeinated coffee unit standardised cups a day across subgroups

Sensitivity analysis

Similar results were found when espresso coffee was removed from the analysis with 85% and 74% of participants with no misclassification for volume-standardised and coffee unit-standardised cups a day, respectively. When instant coffee was substituted with espresso coffee, 40% of participants had no misclassification when volume-standardised, but 75% when using the coffee unit measure. When the ratio of caffeine to chlorogenic acid used to create the unit measure was varied, proportions of participants with no misclassification were relatively stable with 78% for 0:1, 71% for 1:0, 76% for 1:2, 77% for 1:3, 1:4 and 1:5, 70% for 2:1, 3:1, 4:1 and 5:1, and 73% for 1:1:1 (both diterpenes and higher diterpenes).


A new coffee unit measure was created using published estimates of caffeine and chlorogenic acid across preparation methods and applied to representative coffee consumption data from a UK population. Approximately 84% of caffeinated coffee drinkers in the NDNS were correctly classified by reported cups a day when compared with volume-standardised cups a day, and 73% when compared with coffee unit-standardised cups a day that took preparation type into account. The vast majority of the misclassification was under or over by only one cup, with misclassification of two or more cups in 5% of participants. This is reassuring given that most existing research on coffee and health has used cups a day as the measure of intake. However, our analysis suggests classification of coffee consumption could be improved beyond the simple cups a day measure, since approximately one in four participants had misclassified intake when taking into account volume and preparation type.

Misclassification varied with gender, age and income tertile, with a greater proportion of misclassification in men, younger participants and participants in the highest income tertile. Misclassification is a measure of deviation in size or preparation type from the standard 227 mL cup of instant coffee, and participants in the highest tertile of income had the lowest instant coffee consumption as a proportion of total coffee consumption. Instant coffee represents a relatively inexpensive preparation type, with the price of one jar being similar to that of a single espresso-based coffee bought in a coffee shop. Home prepared non-instant types using ground coffee or coffee pods/capsules, which would be classified as infusions in the NDNS data, are not as expensive as coffee shop cups but represent a significant additional cost per cup compared with instant coffee. Younger participants in the lowest income tertile had a relatively high proportion of underestimated misclassification despite a high proportion of instant coffee consumption. This was due to larger volumes of non-instant coffee compared with other subgroups (data not shown). Despite low income, younger people in the lowest income tertile may be drinking more of their non-instant coffee outside the home environment, where drinks are typically served in much larger volumes.

There was an even proportion of participants with under or overestimation of coffee consumption when reported cups were compared with volume-standardised cups a day, suggesting actual coffee cup size was distributed evenly around the 227 mL standard volume. This pattern was still present when espresso coffee was excluded from the analysis, because relatively few espresso coffees were consumed during the diary period. When instant coffee was switched to espresso and compared with a 30 mL standard volume, the proportion of misclassification increased substantially while the misclassification using the coffee unit measure was relatively stable. Misclassification by volume is clearly affected by the choice of standard volume and this is especially important when intake includes espresso coffee, which is low volume but high concentration compared with other preparation methods, and is the most commonly consumed coffee in some European countries.6 The results highlight the superiority of our coffee unit measure over a volume only comparison across the range of preparation methods. When using a coffee unit measure, there was a greater proportion of participants with underestimated compared with overestimated intakes. This suggests that the coffee unit measure captures the higher concentration of caffeine and chlorogenic acid present in the non-instant types of coffee preparation.

Misclassification of intake among decaffeinated coffee drinkers was much lower than among caffeinated coffee drinkers when standardised to a unit measure of 227 mL decaffeinated instant coffee, because decaffeinated cups deviated less from the standard size and type than caffeinated cups. However, when standardised to a unit measure of caffeinated instant coffee, the misclassification increased substantially. This highlights potential bias where studies have not differentiated between caffeinated and decaffeinated coffee when measuring coffee exposure.

The impact of an approximate 25% misclassification of coffee consumption on the conclusions drawn by existing coffee research is uncertain. Misclassification of exposure in this context is likely to be non-differential meaning that it will affect those with and without a health outcome equally. Such misclassification is generally understood to dilute the strength of effect estimates when the exposures are dichotomous, moving both beneficial and harmful estimates towards the null, but may be less predictable when there are more than two exposure groups.19

Strengths and limitations

The creation of a coffee unit measure represents a unique attempt to improve the classification of coffee consumption in participants of research studies and in the wider healthcare setting. However, there are a number of limitations. First, the coffee unit measure was created using limited data from published estimates of caffeine and chlorogenic acid concentrations. In contrast to a unit of alcohol that is easy to define as 10 mL (8 g) of pure ethanol, the coffee unit measure does not focus on one ingredient. Coffee is a complex mixture of over 1000 bioactive substances, with no scientific consensus that a single component is responsible for health effects. More likely, there is a synergy between ingredients such that caffeine in isolation has different health effects compared with whole coffee. We used two components of coffee to create a unit measure because these were available as a concentration (mg/mL) for a range of coffee preparation types. In the sensitivity analysis, varying the ratio of caffeine to chlorogenic acid, or adding in diterpenes, in the creation of the coffee unit measure made little difference to the proportion of misclassification.

There are many other factors that could not be taken into account in our analysis of the NDNS data. We made assumptions regarding the preparation types, such as vending machine coffee being equivalent to instant coffee. Many modern vending machines emulate the barista prepared espresso-based beverages and may have coffee unit concentrations more similar to non-instant coffee. A further assumption was that strong infusions under 65 mL were espresso and this may have overestimated coffee unit intake if these had actually been small volumes of non-espresso coffee. However, vending machine and espresso coffee were a very small proportion of total coffee consumed.

We assumed larger volumes of strong infusion and all weak infusions as filter coffee but, in reality, these may have been other types including French press (cafetière), Aeropress or coffee pods. Such coffee types would have a similar composition to filter coffee and our assumption is unlikely to have affected the misclassification substantially.

Further misclassification may arise from incomplete consumption of coffee within each cup although studies have suggested that these tiny amounts are unlikely to contribute to significant misclassification.20 Furthermore, we cannot account for strength of coffee due to variation in the quantity of coffee grounds used, extraction by baristas, roast or bean type (Arabica vs Robusta). Concentrations of caffeine and chlorogenic acids in the analysis of home-prepared and shop-prepared coffee beverages varied widely, and even identical preparation methods using the same coffee in the same establishment on consecutive days have been found to produce coffee that varied in composition.21

The standardised coffee unit measure could be applied in an interventional study to classify baseline coffee intake or quantify a target intake across preparation types. It could also be used in observational studies to improve the quantification of coffee intake. One potential drawback is the extra level of information required to generate the coffee unit measure, requiring estimation of volume and preparation method, and a suitable instrument to capture this information. Many studies have found a dose–response relationship between coffee and health benefits, and future health advice may be based on reaching an intake threshold. A threshold based on units rather than cups could reduce the issues associated with coffee cup heterogeneity.

In conclusion, coffee has been beneficially associated with a range of health outcomes, and some harms, especially during pregnancy. A coffee unit measure is easy to construct and can be applied to a range of coffee preparation types. It has the potential to improve the classification of coffee as an exposure and could be considered for use in studies that evaluate the relationship between coffee drinking and health outcomes, and in delivering future health advice.



  • Contributors RP conceptualised and created the coffee unit measure, extracted coffee drinking data, performed the analysis and wrote the first draft of the manuscript; SE advised on aspects of the analysis and revised the manuscript; JP conceptualised the study and revised the manuscript; JAF provided comments and revised the manuscript; and PR conceptualised the study, revised the manuscript and is the guarantor.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests JP reports personal fees from Siemens Healthineers, outside the submitted work; JAF reports personal fees and other from Novartis, personal fees from Merck Sharp & Dohme, grants from GlaxoSmithKline, grants from Intercept Pharmaceuticals, personal fees from Galecto Biotech and personal fees from Gilde Healthcare, outside the submitted work.

  • Patient consent for publication Not required.

  • Data availability statement National Diet and Nutrition Survey data used in this study are available via the UK Data Service.