Step-by-step tutorial on using GRADE and GRADEpro for evidence appraisal, part I: basic concepts and gameplay

… step by step tutorial for using Gradepro for evidence appraisal of a study question

GRADEpro (?link) is a web-based tool for appraising and grading research evidence. Besides critical appraisal, it lets you write a report and generate tables of data. Before you use GRADEpro, you need to understand how GRADE works.

Note that GRADE is NOT suitable for appraising a SINGLE study. If you have a single research article, you can appraise it either with checklists from the JAMA series, based on your understanding of what constitutes best evidence (see also the CASP checklist below), or with checklists from other sources. For example, to critically appraise observational epidemiological studies, use the STROBE checklist:

Download the STROBE checklist for case-control studies and give it a go:

STROBE checklist for case-control studies

Now, you will see that the checklist is helpful not only when you write up a case-control study, but also when you appraise a report of one. As a student, you could also use the CASP checklist for case-control studies:

CASP checklist for case control study design

I used case-control designs in both examples because they are among the most widely used observational study designs. For intervention trials, I’d use the CASP checklist for randomised controlled trials, which you can obtain from here:

Now, the point that I’d like to highlight here is that in each case:

  • These checklists take a single study and give you a framework for appraising the internal and external validity of that research
  • They also help you set up a series of notes, but
  • They are not meant for scoring the overall evidence that a set of studies presents. What do I mean by this?

Note that an individual study by itself provides valuable insights into the ‘relationship’ between an exposure or an intervention and an outcome. But it is also true that:

  • An outcome can be covered by a number of different studies
  • These studies can all be of the same type (say RCTs), or a mix of studies (some RCTs, some Cohort studies, other study designs)
  • Equally, a study can include many different outcomes (that is, a study can report harmful as well as beneficial outcomes).
  • In clinical practice as well as in public health interventions, we tend to be focused on achieving outcomes as a result of interventions or changed practices.

Let’s explain this with an example. You’d like to find out how strong the evidence is for the effectiveness of grommet insertion in children with otitis media with effusion. Specifically, you want to know whether inserting a grommet in the affected eardrum leads to better hearing and middle-ear function, less discharge and fewer complications, and a better quality of life, compared with myringotomy or non-surgical approaches. You can see that we are discussing three different outcomes:

  1. Hearing improvement (or improvement in deafness)
  2. Less discharge (adverse effect: more discharge)
  3. Quality of life (improved quality of life)

A single article may focus on one outcome or on many; more commonly, many articles each report many outcomes. Likewise, a single outcome will be covered by many articles. This is why we appraise the quality of the available evidence outcome by outcome rather than article by article. This is where GRADE enters the picture.

In the first article of the series introducing GRADE, Guyatt et al. [1] presented the concept of GRADE evidence profiles and summary of findings tables. Here is a schematic diagram from the paper for your review:

The diagram of using GRADE (we will see how we use this later, the shaded boxes are for guideline development)

Note the criss-cross between studies (S1 … S5) and outcomes (OC1 … OC4). This pattern is critical to understanding the process. As I have mentioned, not every study reports every outcome, and not every outcome maps to every study. This is why, when you apply the GRADE process, you must have more than one study and be prepared to summarise the results of those studies. This is also why I noted that GRADE is useful for systematic reviews and meta-analyses, NOT for a single study. If you only have a single study to appraise, do not use GRADE; use one of the tools I described earlier, bearing in mind that you will not be able to pool a ‘score’ for the quality appraisal. The other important point is that you will need to arrive at a summary result across studies, taking a number of different considerations into account.
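To make the criss-cross concrete, here is a minimal Python sketch that turns a study-by-study list of reported outcomes into the outcome-by-outcome view that GRADE works from. The particular pairings of S1 to S5 with OC1 to OC4 are made up for illustration and are not read off the diagram.

```python
from collections import defaultdict

# Hypothetical mapping of which outcomes each study reports.
# The labels mirror the diagram, but the pairings are invented.
study_outcomes = {
    "S1": ["OC1", "OC2"],
    "S2": ["OC1", "OC3"],
    "S3": ["OC2", "OC3", "OC4"],
    "S4": ["OC1"],
    "S5": ["OC4"],
}


def studies_by_outcome(mapping):
    """Invert a study -> outcomes mapping into outcome -> studies."""
    inverted = defaultdict(list)
    for study, outcomes in mapping.items():
        for outcome in outcomes:
            inverted[outcome].append(study)
    return dict(inverted)


by_outcome = studies_by_outcome(study_outcomes)
for outcome in sorted(by_outcome):
    print(outcome, "->", ", ".join(by_outcome[outcome]))
```

Each key of `by_outcome` is one unit that you would later rate, together with the body of studies that contributes evidence to it.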

So here are the steps.

  • Conduct a meta-analysis or systematic review or source one
  • Play a game

Do a meta-analysis or systematic review

Step 1. Ask a focused question based on PICO. — PICO stands for person/patient/population, intervention (it could also be exposure), comparison, and outcomes. For example, we are interested in whether tympanostomy tubes or myringotomy tubes are better than non-surgical approaches for treating secretory otitis media (or otitis media with effusion) in children. We should convert this to a PICO-formatted question as follows (we will run these steps in a section below while demonstrating GRADEpro, so we cover them only lightly and conceptually here):

P. — children with secretory otitis media (or otitis media with effusion)
I. — tympanostomy tubes or myringotomy tubes
C. — myringotomy alone (no tube insertion) or non-surgical approaches (e.g., antibiotics, other medication, and breathing exercises)
O. — hearing, speech, discharge from ear, quality of life

Step 2. Use the search terms to conduct a focused search of the literature. — In our case, we will search the Cochrane register of trials, Medline/PubMed, and Google Scholar for studies from the last five years and see what we find.
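One way to get from the PICO question to search terms is to OR the synonyms within each PICO element and AND the elements together. The sketch below builds such a generic Boolean string. The synonym lists are illustrative, `build_query` is a hypothetical helper, and the output is not tailored to the field syntax of any particular database, so treat it as a starting point to adapt.

```python
# PICO terms taken from the example question; the synonym lists are illustrative.
pico = {
    "population": ["otitis media with effusion", "secretory otitis media"],
    "intervention": ["tympanostomy tube", "grommet", "myringotomy"],
    "outcome": ["hearing", "speech", "quality of life"],
}


def build_query(pico_terms):
    """OR the synonyms within each element, then AND the elements together."""
    groups = []
    for terms in pico_terms.values():
        quoted = " OR ".join(f'"{term}"' for term in terms)
        groups.append(f"({quoted})")
    return " AND ".join(groups)


query = build_query(pico)
print(query)
```

The comparison element is left out of the string here because including it tends to narrow a search; that is a design choice you can reverse by adding a fourth key.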

Step 3. Pool the results of the individual studies you obtained or use systematic reviews and meta-analyses. — Now this is the critical part. You will need to abstract data from the individual studies at this stage. Even without a quality appraisal, you can abstract data from the tables those studies provide. Read the tables of each study, see what data they present and how, and enter what you abstract in a spreadsheet or a program. Then decide whether the data can be pooled at all. If the data are narrative, you cannot pool them statistically. If statistical data are presented, you pool them using meta-analysis. Whether a meta-analysis is appropriate depends on how heterogeneous the studies are. Next, decide whether to pool the results using a fixed-effects or a random-effects model. You should also check whether negative, small, and equivocal studies are represented in the mix (that is, assess publication bias; we have covered this in section xXXX, see that section for details of what I mean by this).
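As a rough illustration of what pooling involves, the sketch below implements inverse-variance pooling under a fixed-effect model and a random-effects model, using the DerSimonian-Laird estimate of between-study variance. This is one common approach, not necessarily the one used in the notebook. The effect sizes and variances are made-up log odds ratios, and `pool` is a hypothetical helper name.

```python
import math


def pool(effects, variances):
    """Inverse-variance pooling: fixed effect and DerSimonian-Laird random effects."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    random_est = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1.0 / sum(re_weights))

    return {
        "fixed": fixed,
        "random": random_est,
        "q": q,
        "df": df,
        "tau2": tau2,
        "ci95": (random_est - 1.96 * se, random_est + 1.96 * se),
    }


# Made-up log odds ratios and their variances, for illustration only.
result = pool([-0.5, -0.2, -0.8], [0.04, 0.09, 0.16])
for key, value in result.items():
    print(key, value)
```

When `tau2` comes out as zero, the two models give the same estimate; the larger it is, the more the random-effects model evens out the weights across studies.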

We will use a Jupyter notebook for this purpose. You can find the annotated notebook here:

You do not have to conduct a meta-analysis of primary studies yourself. You can use a meta-analysis or systematic review that others have conducted. But note that you can move to step 4 only after you have completed steps 1 through 3. In summary, you have three options:

  • You source individual primary studies and synthesise the evidence yourself
  • You source systematic reviews and use the already synthesised evidence for an outcome to guide your GRADE process
  • You combine both. Remember that a single SR may not cover all relevant outcomes, so for a single outcome you may need more than one SR or pool of studies.

Play the game

Step 4. Play the game of rating quality of evidence for EACH OUTCOME. — This is the critical bit, and this is where we start with the study design. If the evidence for that outcome came from randomised controlled trials, we give it the highest rating to begin with and then consider downgrading it. If the evidence came from observational studies, it starts at a lower rating. The following is a guideline table for assessment of quality (taken from this first article):

What factors to consider for quality appraisal (from Guyatt’s first paper)

Consider the above table. It’s like a game. So, once you start with a pool of primary studies or one or more systematic reviews that you have selected on the basis of your query and search, you identify an outcome and your aim will be to arrive at a rating for THAT outcome in one of the four categories:

  • High
  • Moderate
  • Low
  • Very Low

Here are the moves:

Move 1. What is the study design? — This matters because, for each outcome, you have selected ‘like’ study designs; otherwise you could not have done a systematic review or meta-analysis, or even pooled the studies. So consider what kind of study designs you are dealing with. If RCTs, assign “High” (say, 4 points); otherwise assign “Low” (2 points, given four levels scored from 4 for High down to 1 for Very Low), and we move on from there.

Move 2. Check the risk of bias. — If the risk of bias is serious, take away 1 point; if it is very serious, take away 2 points; otherwise keep the points earned (that is, no serious bias, you are good to go).

Move 3. Check the level of inconsistency. — As these are largely pooled data, you will need to check the heterogeneity of the effects. For meta-analyses, check the reported Q statistic and work on that basis; for systematic reviews, read the review carefully to understand this. Then, based on your own judgement, take away 1 or 2 points depending on whether the inconsistency is serious or very serious (more in section xxxx).
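If a meta-analysis reports only Cochran's Q and the number of studies, you can convert these to the I² statistic, which expresses the share of variability attributable to heterogeneity rather than chance. The sketch below uses the standard formula I² = max(0, (Q − df) / Q) × 100 with df = k − 1. The function name and the example numbers are illustrative, and what counts as "serious" remains your judgement.

```python
def i_squared(q, n_studies):
    """Convert Cochran's Q and the number of studies into I-squared (percent)."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


# Illustrative values only: Q = 12.0 reported across 5 studies.
print(round(i_squared(12.0, 5), 1))
```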

Move 4. Indirectness. — Check how the outcome was measured. If it was measured indirectly or through surrogate outcomes, take away 1 or 2 points. Otherwise, leave the points and move on.

Move 5. Imprecision. — After you conduct the meta-analysis or the systematic review, check the pooled results and the associated 95% confidence interval. If the interval is not reported, or cannot be obtained from the type of data you are working with, deduct 1 or 2 points because you cannot assess imprecision. If it is available, check whether it crosses the null value. If it does, deduct 1 or 2 points; if not, leave the points.
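The null value depends on the effect measure: it is 1 for ratio measures such as odds ratios and risk ratios, and 0 for difference measures such as mean differences. A minimal sketch of the check, with a hypothetical helper name and made-up intervals:

```python
def crosses_null(lower, upper, null_value):
    """Return True when the confidence interval contains the null value."""
    return lower <= null_value <= upper


# Made-up intervals for illustration.
odds_ratio_ci = (0.55, 1.10)      # ratio measure, null value is 1
mean_difference_ci = (-2.1, -0.3)  # difference measure, null value is 0

print(crosses_null(*odds_ratio_ci, null_value=1.0))
print(crosses_null(*mean_difference_ci, null_value=0.0))
```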

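Move 6. Publication bias. — Check the publication bias statistics (for example, a funnel plot or an accompanying test, if reported). If publication bias is likely, take away 1 point; if it is very likely, take away 2 points; otherwise leave the points.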

Move 7. Check effect size. — If the effect is large, raise the tally by 1 point; if it is very large, raise it by 2 points; otherwise leave it as is.

Move 8. Dose response effect. — If there is clear evidence of a dose-response gradient, raise the tally by 1 point.

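Move 9. Check for all plausible confounding. — If all plausible residual confounding would have reduced the demonstrated effect (or would have suggested a spurious effect where none was observed), raise the tally by 1 point.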
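The nine moves can be summarised as simple arithmetic. The sketch below assumes the four levels are scored 4 (High), 3 (Moderate), 2 (Low), and 1 (Very Low), so that evidence from RCTs starts at 4 and evidence from observational studies starts at 2. The function and argument names are hypothetical. In real use each deduction or bonus is a judgement call rather than a mechanical tally, so read this as a bookkeeping aid only.

```python
LEVELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very Low"}


def grade_outcome(design, risk_of_bias=0, inconsistency=0, indirectness=0,
                  imprecision=0, publication_bias=0,
                  large_effect=0, dose_response=0, confounding=0):
    """Tally the moves for one outcome and return the resulting level.

    Downgrade arguments take 0, 1 (serious) or 2 (very serious).
    large_effect takes 0, 1 or 2; dose_response and confounding take 0 or 1.
    """
    downgrades = [risk_of_bias, inconsistency, indirectness,
                  imprecision, publication_bias]
    if any(d not in (0, 1, 2) for d in downgrades) or large_effect not in (0, 1, 2):
        raise ValueError("downgrades and large_effect must be 0, 1 or 2")
    if dose_response not in (0, 1) or confounding not in (0, 1):
        raise ValueError("dose_response and confounding must be 0 or 1")

    # Move 1: the starting score depends on study design (assumed scoring).
    start = 4 if design == "rct" else 2
    score = start - sum(downgrades) + large_effect + dose_response + confounding

    # Keep the tally within the four available levels.
    score = max(1, min(4, score))
    return LEVELS[score]


# Illustrative use: RCT evidence with serious risk of bias and serious imprecision.
print(grade_outcome("rct", risk_of_bias=1, imprecision=1))
```

You would call this once per outcome, which is exactly why the outcome-by-outcome view matters.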

If you look at the moves, you will see that the system is geared towards downgrading or penalising the evidence, that is, towards being conservative. At the end of this, you end up with two tables: an evidence profile and a summary of findings table. The evidence profile (EP) is a detailed quality assessment together with the summary of findings, and looks as follows:

Evidence profile (the red part is the quality appraisal, and the right part is the summary of findings; the two parts are flanked by a list of outcomes on the left and the quality ratings on the right)

In the figure, the red box on the left lists the quality appraisal based on the criteria we listed earlier. On the right, you see the summary of findings, which should come from either the systematic review or the meta-analysis you have conducted or sourced. In subsequent parts of this tutorial, I am going to show you how to:

  1. Conduct a meta-analysis so that you can pool the results, or appraise an existing meta-analysis or systematic review and decide what to include
  2. Apply the detailed rules of the game of GRADE
  3. Use the GRADEpro tool itself

[1] Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383–394. doi:10.1016/j.jclinepi.2010.04.026

Associate Professor of Epidemiology and Environmental Health at the University of Canterbury, New Zealand.
