Original Article
GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence

https://doi.org/10.1016/j.jclinepi.2018.01.012

Abstract

Objective

To provide guidance on how systematic review authors, guideline developers, and health technology assessment practitioners should use the Risk Of Bias In Nonrandomized Studies of Interventions (ROBINS-I) tool as part of GRADE's certainty rating process.

Study Design and Setting

The study design and setting comprised iterative discussions, testing in systematic reviews, and presentation at GRADE working group meetings with feedback from the GRADE working group.

Results

We describe where to start the initial assessment of a body of evidence when using ROBINS-I and where one would anticipate the final rating to end up. GRADE accounts for issues that mitigate concerns about confounding and selection bias through its upgrading domains: large effects, dose-effect relations, and situations in which plausible residual confounders or other biases increase certainty. These domains will need to be considered in an assessment of a body of evidence when using ROBINS-I.

Conclusions

The use of ROBINS-I in GRADE assessments may allow for a better comparison of evidence from randomized controlled trials (RCTs) and nonrandomized studies (NRSs) because they are placed on a common metric for risk of bias. Challenges remain, including appropriate presentation of evidence from RCTs and NRSs for decision-making and how to optimally integrate RCTs and NRSs in an evidence assessment.

GRADE's approach to rate the certainty of the evidence from observational studies

The GRADE working group has developed a widely accepted approach to rate the certainty of a body of evidence (also known as quality of evidence or confidence in evidence) in the contexts of systematic reviews, developing health-care recommendations, and supporting decisions. GRADE's approach to rating the certainty of the evidence is based on a four-level system: high, moderate, low, and very low (Table 1). This is the 18th in the ongoing series of articles describing the GRADE approach in the

Rating risk of bias in individual observational studies

Consider now the assessment of risk of bias in individual observational studies, which in the GRADE approach might lead to further rating down quality from low to very low. Investigators have developed many assessment tools for rating risk of bias in observational studies. Most of the instruments address a specific type of observational or nonrandomized design (e.g., cohort or case control) [18] and seek to determine how well, relative to a perfect observational study of that particular design,

ROBINS-I and GRADE

The arrival of ROBINS-I presents a number of opportunities for the GRADE approach. First, it offers an alternative terminology: establishing NRS rather than observational studies. Although not different in intended meaning in the GRADE approach, substituting NRS for observational studies will lead to a more transparent separation of studies based on their design. For instance, some have struggled with the classification of certain types of studies, such as nonrandomized before-after studies, as

Concerns about GRADE's approach to start an NRS at low certainty

Despite GRADE's broad acceptance in the evidence synthesis community, GRADE's initial rating of outcome data from NRS as low certainty has led to challenges for some GRADE users. First, users of GRADE may inappropriately double count the risk of confounding and selection bias, once by starting a body of evidence from NRS at low certainty and again by rating down for unknown confounders (although rating down additionally for failure to accurately measure known

Certainty of evidence for a body of evidence from NRS when using ROBINS-I for assessing risk of bias in individual NRSs

Here, we provide general guidance for the use of GRADE in the context of ROBINS-I. ROBINS-I compares an assessment of an individual NRS against a target RCT. The initial description of the underlying study design, such as cohort, case-control, case series, or cross-sectional study, is not considered as a risk of bias feature in ROBINS-I. Thus, when using ROBINS-I for assessing risk of bias in NRS, given that assessment of selection bias and confounding is an integral part of the ROBINS-I tool,

What makes us confident in results of NRS and does GRADE already account for this?

At the end of the previous section, we noted how, within current GRADE thinking, a body of evidence from NRSs may emerge from the rating exercise as moderate or high-quality evidence. We now discuss these issues.
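As a rough illustration of how a body of evidence can move along GRADE's four-level scale, the following Python sketch treats the levels as an ordered scale and applies rating-down and upgrading steps to a starting level. The names `Certainty` and `rate_certainty` are hypothetical and are not part of GRADE or ROBINS-I. The starting level is deliberately left as a parameter, and actual GRADE ratings are structured judgments rather than arithmetic, so this is only a simplified sketch.

```python
from enum import IntEnum


class Certainty(IntEnum):
    """GRADE's four certainty levels, ordered from lowest to highest."""
    VERY_LOW = 1
    LOW = 2
    MODERATE = 3
    HIGH = 4


def rate_certainty(start, levels_down=0, levels_up=0):
    """Move from a starting level by whole steps, clamped to the scale.

    Hypothetical helper (an assumption for illustration only):
    `levels_down` is the total number of levels rated down, and
    `levels_up` is the total number of levels rated up through the
    upgrading domains (large effect, dose-effect relation, plausible
    residual confounding or other biases that increase certainty).
    """
    if levels_down < 0 or levels_up < 0:
        raise ValueError("step counts must be non-negative")
    raw = int(start) - levels_down + levels_up
    # Never go below very low or above high.
    clamped = max(int(Certainty.VERY_LOW), min(int(Certainty.HIGH), raw))
    return Certainty(clamped)


if __name__ == "__main__":
    # Starting at low and rating down once for risk of bias.
    print(rate_certainty(Certainty.LOW, levels_down=1).name)
    # Starting at low and upgrading once, e.g. for a large effect.
    print(rate_certainty(Certainty.LOW, levels_up=1).name)
```

In this sketch, a body of evidence that starts at low and is upgraded by one or two levels ends at moderate or high, which mirrors the possibility noted above that NRS evidence may emerge as moderate or high-quality evidence.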

Advantages

Among other features, ROBINS-I allows review authors to assess how failure to use randomization in individual studies has affected risk of bias. For example, ROBINS-I allows categorization of the magnitude of bias from lack of randomization through the selection and confounding bias domains, application of this assessment across risk of bias domains, and evaluation of how this differs across individual studies that address different health-care questions. Furthermore, ROBINS-I will

Unresolved issues

GRADE recognizes that there are a number of unresolved issues related to the arrival of ROBINS-I. The GRADE working group is aiming to address those in the near future. The unresolved issues are as follows:

  1. If systematic review authors use ROBINS-I, should the results from NRSs and RCTs be considered together, including potentially in a meta-analysis (Fig. 4)? If RCTs and NRSs are indeed considered together, when should they be combined? Should NRSs be used to provide more precise estimates in

Summary and next steps

Risk of bias is best mitigated by a well-conducted RCT that balances known and unknown confounders, with risk of bias assessed using the Cochrane RoB 2.0 tool or similar assessment tools for RCTs. For situations in which NRSs are used instead of or in addition to RCTs, the arrival of ROBINS-I poses a number of opportunities and challenges to summarizing RoB in GRADE and raises a need for clarification about how ROBINS-I and GRADE are used together. Given the inherent limitations of studies that do

Acknowledgments

H.J.S., C.C., E.A.A., R.A.M., K.T., R.L.M., J.J.M., J.P.T.H., and G.G. are members of the GRADE working group who contributed to writing this article. The authors would like to acknowledge the GRADE working group for input on the work.

Article history: Slides presented at GRADE meetings in Barcelona (2014), Amsterdam (2015), Philadelphia (2016), and Seoul (2016); Approved at GRADE meeting May 2017.

Authors' contributions: H.J.S. conceived and designed the article and wrote the first draft of

References

  • H.J. Schunemann et al. Letters, numbers, symbols and words: how to communicate grades of evidence and recommendations. CMAJ (2003)
  • H.J. Schunemann et al. GRADE: assessing the quality of evidence for diagnostic recommendations. ACP J Club (2008)

Conflict of interest: H.J.S. has no direct financial conflict of interest and other authors have not declared financial conflicts of interest. Part of the work has been presented at scientific conferences and at GRADE working group meetings. This article has been officially endorsed by the GRADE working group.