
A systematic review of stroke recognition instruments in hospital and prehospital settings
Matthew Rudd,1,2 Deborah Buck,1 Gary A Ford,3 Christopher I Price1,2

1 Stroke Research Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne, UK
2 Northumbria Healthcare NHS Foundation Trust, Wansbeck General Hospital, Northumberland, UK
3 Division of Medical Sciences, Oxford University, Oxford, UK

Correspondence to Dr Matthew Rudd, Stroke Research Group, Institute of Neuroscience and Newcastle University Institute for Ageing, Newcastle University, 3-4 Claremont Terrace, Newcastle upon Tyne NE2 4AE, UK; matthew.rudd@ncl.ac.uk

Abstract

Background We undertook a systematic review of all published stroke identification instruments to describe their performance characteristics when used prospectively in any clinical setting.

Methods A search strategy was applied to Medline and Embase for material published prior to 10 August 2015. Two authors independently screened titles and, where necessary, abstracts. Data including clinical setting and reported sensitivity, specificity, positive predictive value and negative predictive value were extracted independently by two reviewers.

Results 5622 references were screened by title and/or abstract. 18 papers and 3 conference abstracts were included after full text review. 7 instruments were identified: Face Arm Speech Test (FAST), Recognition of Stroke in the Emergency Room (ROSIER), Los Angeles Prehospital Stroke Screen (LAPSS), Melbourne Ambulance Stroke Scale (MASS), Ontario Prehospital Stroke Screening tool (OPSS), Medic Prehospital Assessment for Code Stroke (MedPACS) and Cincinnati Prehospital Stroke Scale (CPSS). Cohorts varied between 50 and 1225 individuals, with 17.5% to 92% subsequently receiving a stroke diagnosis. Sensitivity and specificity for the same instrument varied across clinical settings. Studies varied in quality, scoring 13–31/36 points using a modified Standards for the Reporting of Diagnostic accuracy studies checklist. There was considerable variation in the detail reported about patient demographics, characteristics of false-negative patients and service context. Prevalence of instrument detectable stroke varied between cohorts and over time. CPSS and the similar FAST test generally report the highest level of sensitivity, with more complex instruments such as LAPSS reporting higher specificity at the cost of lower detection rates.

Conclusions Available data do not allow a strong recommendation to be made about the superiority of a stroke recognition instrument. Choice of instrument depends on intended purpose, and the consequences of a false-negative or false-positive result.


Introduction

Identification of suspected acute stroke and transient ischaemic attack (TIA) in the prehospital setting is challenging, and ambulance personnel applying general clinical assessment are unable to identify up to 39% of patients with a subsequent stroke or TIA diagnosis.1 Particularly since the advent of thrombolysis, numerous stroke recognition instruments have been devised for use by ambulance and emergency department (ED) clinicians2–8 to improve the sensitivity and specificity of identification, but no consensus exists on which is optimal.9,10 A previous review on this topic11 only considered the prehospital ‘urban’ setting and did not distinguish between tools designed to identify all patients or only individuals potentially suitable for urgent therapies. We undertook a systematic review of all published stroke identification instruments to describe their performance characteristics according to the intended purpose when used prospectively in any clinical setting.

Methods

Search strategy and study selection

A search strategy (see online supplementary table S1) was applied to Medline and Embase databases up until 10 August 2015, with no time or study design restrictions. Abstracts of conference proceedings were included when published in peer reviewed medical journals.

Two authors (MR and DB) independently reviewed titles and abstracts to screen for potentially eligible studies, with a third author (CIP) adjudicating as necessary. Articles were included if they described an instrument administered prospectively face to face by any prehospital or hospital clinician to identify adults with suspected stroke. Articles were excluded if the instrument was: applied retrospectively to clinical records; used only by ambulance dispatchers; administered by a telephone or telemedicine system; published in a language other than English or German. Instruments were excluded which assessed severity or clinical subtype within a known stroke population (eg, distinguishing ischaemic from haemorrhagic stroke or symptoms suggestive of persistent large vessel occlusion (LVO)) and when data were not clearly presented about total numbers of patients with suspected and confirmed stroke.

Data extraction and quality appraisal

Two authors (MR and DB) independently extracted the following from eligible studies, in accordance with the review protocol.

  • study author, year and setting

  • study population baseline characteristics

  • instrument design

  • published sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV)

  • data to calculate sensitivity, specificity, PPV and NPV when these were not provided by authors

  • information to assess the risk of bias and confounding.
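
Where studies supplied only raw counts, the four accuracy metrics were recalculated from the 2 × 2 table of instrument result against final diagnosis. A minimal sketch of that calculation follows; the counts are illustrative only, not figures from any included study.

```python
# Recalculating sensitivity, specificity, PPV and NPV from a 2x2 table
# of instrument result vs final stroke diagnosis.

def performance(tp, fp, fn, tn):
    """Return the four accuracy metrics for a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # strokes correctly flagged positive
        "specificity": tn / (tn + fp),  # non-strokes correctly flagged negative
        "ppv": tp / (tp + fp),          # probability a positive result is stroke
        "npv": tn / (tn + fn),          # probability a negative result is not stroke
    }

# Illustrative counts: 80 true positives, 30 false positives,
# 20 false negatives, 70 true negatives
metrics = performance(80, 30, 20, 70)
print(metrics)  # sensitivity 0.8, specificity 0.7, ppv ~0.73, npv ~0.78
```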

The methodological quality of included studies was assessed using a modified version of the Standards for the Reporting of Diagnostic accuracy studies checklist12 (see online supplementary table S2). Conference abstracts were not scored due to limited description of methods. Due to the expected heterogeneity of service settings and lack of trial evidence, there was no prespecified plan for meta-analysis in the review protocol, which was registered with the PROSPERO International Prospective Register of Systematic Reviews at the Centre for Reviews and Dissemination, University of York (CRD42014010687).

Results

The search yielded 5622 references across both databases. Figure 1 illustrates numbers of identified and excluded articles at each stage. After removal of duplicates and abstract screening, 28 full papers and 20 conference proceedings were scrutinised. Eighteen articles3,5–8,13–25 and three conference abstracts26–28 were subsequently included, describing seven instruments (table 1). Three were developed in the USA, two in the UK, one in Canada and one in Australia. The absence of instruments developed in other regions may reflect differing prehospital and ED service configurations and greater use of medical staff in the prehospital setting.

Table 1

Instrument description and clinical application criteria

Figure 1

Numbers of studies identified, included and excluded.

Facial weakness and arm weakness are evaluated by all instruments. Only the Los Angeles Prehospital Stroke Screen (LAPSS) does not include speech. The LAPSS and Melbourne Ambulance Stroke Scale (MASS) appear designed to identify patients who may benefit from emergency treatment rather than all stroke and exclude individuals with high pre-existing disability (requiring wheelchair use) and prolonged symptoms. LAPSS also excludes those <45 years, reflecting the reduced incidence of stroke in younger adults. By using exclusion criteria, instruments such as LAPSS, MASS, Ontario Prehospital Stroke Screening tool (OPSS) and Medic Prehospital Assessment for Code Stroke (MedPACS) would fail to identify some patients with stroke recognised by the Cincinnati Prehospital Stroke Scale (CPSS) and Face Arm Speech Test (FAST) who might still be suitable for emergency intervention and organised stroke care. To improve specificity, Recognition of Stroke in the Emergency Room (ROSIER) includes features which reduce the likelihood of stroke (seizures or syncope) by modifying a numerical score rather than excluding patients when one feature is present. There was no obvious improvement in instrument performance from this increasing sophistication in the prehospital setting.
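
ROSIER's score-modifying approach, as opposed to the hard exclusion criteria of LAPSS or MASS, can be sketched as a simple additive model. The item names and weights below follow our reading of the published ROSIER scale and should be treated as an illustration, not a clinical reference.

```python
# Sketch of ROSIER-style additive scoring: features suggesting a stroke
# mimic (syncope, seizure) subtract from the score rather than excluding
# the patient outright. Weights are as we understand the published scale.
ROSIER_ITEMS = {
    "loss_of_consciousness_or_syncope": -1,
    "seizure_activity": -1,
    "new_facial_weakness": +1,
    "new_arm_weakness": +1,
    "new_leg_weakness": +1,
    "speech_disturbance": +1,
    "visual_field_defect": +1,
}

def rosier_score(findings):
    """Sum the weights of the findings present; stroke is likely if score > 0."""
    return sum(ROSIER_ITEMS[f] for f in findings)

# A patient with facial weakness and speech disturbance after a syncopal
# episode still scores above zero, where an exclusion-based instrument
# might have ruled them out on the syncope alone.
score = rosier_score(["new_facial_weakness", "speech_disturbance",
                      "loss_of_consciousness_or_syncope"])
print(score, "-> stroke likely" if score > 0 else "-> stroke unlikely")
```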

Among the articles included, MASS, MedPACS and OPSS have only been evaluated in a single prehospital setting which limits wider applicability. FAST, CPSS and LAPSS were originally designed for prehospital application but were subsequently evaluated in hospital settings, while ROSIER was developed for ED use before later evaluation in ambulance services.

Online supplementary table S3 describes all included studies. Cohorts varied between 50 and 1225 individuals, with 17.5–92% subsequently receiving a stroke diagnosis. Mean age ranged from 65 to 74 years. Studies varied widely in quality and comprehensiveness, scoring 13–31 points out of a possible 36. There was considerable variation in the detail reported about patient demographics, characteristics of false-negative patients and service context. Stroke prevalence was reported based upon the number of patients screened by a recognition instrument who subsequently received a diagnosis of stroke. Studies inconsistently reported whether the recognition instrument was being used to select patients for diversion to hyperacute stroke care. Failure to identify and count false-negative cases in services reliant upon redirection to acute stroke care has previously been noted as a methodological flaw which artificially boosts instrument performance.29 No study described the prevalence of individual detectable deficits among all patients with confirmed stroke assessed with an instrument. All but five studies described the training provided.

The ‘trigger’ for assessing patients with a stroke recognition instrument differed considerably, leading to cohorts which contained varying proportions of non-stroke cases. Harbison,3 Nor,24 Fothergill,19 Frendl21 and Studnek6 stipulated that the instrument was to be applied in all prehospital suspected stroke admissions according to individual clinician judgment, but Kidwell14 stipulated that LAPSS was undertaken when certain categories of patient complaints were assessed (altered level of consciousness, local neurological signs, seizure, syncope, head pain, weak/dizzy/sick) and Bray5 required MASS to be undertaken when ambulance dispatch already suspected stroke or where the initial examination revealed a focal neurological deficit. These additional instructions vary the frequency of stroke cases within each cohort and impede comparison of instruments.

Table 2 presents summary data for sensitivity, specificity, PPVs and NPVs, thereby highlighting the variability between instruments and between different studies of the same instrument. Meta-analysis was not part of the review protocol due to the anticipated lack of studies comparing intervention and standard care approaches, but would also be invalid because of heterogeneity created by inconsistencies in data capture approaches, stroke diagnosis confirmation and population boundaries. Although FAST and CPSS are very similar and in combination have been evaluated in the largest total number of patients, studies reported widely varying performance metrics, suggesting that other factors such as service setting and training should be carefully considered when choosing an instrument for clinical use. FAST and CPSS specificity estimates were generally lower than for more complex instruments such as LAPSS, but the different triggers and intended purpose (ie, all vs selective identification) prevent direct comparison. Full details from all papers are available as online supplementary table S4.

Table 2

Summary data for sensitivity, specificity, and PPVs and NPVs

Discussion

Our review identified 18 papers and 3 conference abstracts describing the performance of 7 stroke recognition instruments. Heterogeneous study design, clinical setting, data availability and intended instrument purpose prevent a recommendation being made about the superiority of any single instrument.

Sensitivity and specificity of the same instrument varied considerably between studies. Although these measures should not be influenced by disease prevalence, the proportion of patients with stroke who had instrument detectable features will have varied between populations (eg, all ED attenders vs those who immediately call an ambulance) and over time, due to changes in public awareness of stroke symptoms.30,31 A limitation of all studies was the use of the term ‘sensitivity’ without full data capture for the service cohort. We would instead recommend reporting the ‘case detection rate’: the product of true sensitivity (the ability of the instrument, in the hands of a clinician, to detect certain deficits when present) and the prevalence of those deficits in the screened stroke population as a whole. The failure to report the proportion of evaluated patients with stroke whose deficits could have been detected by the instrument undermines direct comparison of sensitivity and specificity between studies, even of the same instrument. Consequently, we disagree with the previous review, which concluded that one instrument has superior operating characteristics across different clinical settings on the basis of the available evidence.11
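
The decomposition argued for above can be made concrete with a small numerical illustration. All figures here are hypothetical, chosen only to show how the same instrument can yield very different apparent "sensitivity" in two cohorts.

```python
# Why reported "sensitivity" conflates two quantities: the instrument's
# true sensitivity for its target deficits, and the prevalence of those
# deficits among the screened stroke patients. Hypothetical figures only.

true_sensitivity = 0.95       # instrument detects its deficits when present

deficit_prevalence_a = 0.90   # cohort A: most strokes show FAST-type deficits
deficit_prevalence_b = 0.70   # cohort B: more posterior or minor strokes

# Case detection rate = true sensitivity x prevalence of detectable deficits
case_detection_a = true_sensitivity * deficit_prevalence_a  # about 0.86
case_detection_b = true_sensitivity * deficit_prevalence_b  # about 0.67

# The same instrument, applied equally well, appears far less "sensitive"
# in cohort B purely because of the case mix.
print(f"cohort A: {case_detection_a:.2f}, cohort B: {case_detection_b:.2f}")
```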

Instrument performance varied according to the study service configuration, particularly when test-negative patients were not transported to the study centre, thereby limiting recognition of false-negative cases and favouring the rate of positive identification.29 For example, Fothergill19 reported an extremely high sensitivity for FAST and ROSIER when used by paramedics, as the validation occurred within an established prehospital redirection model without confirmation of diagnosis for instrument negative cases who were not diverted. The lack of superiority of ROSIER compared with FAST found in this setting may also reflect the challenge of obtaining reliable history and examination, and additional paramedic training requirements.

Some variation between included studies may result from the clinical experience of assessors. Although inter-rater reliability has been reported for FAST and CPSS signs,2,24 no information is available regarding agreement on items used by more complex scores, such as syncope and prestroke mobility, and the impact of clinician background has not been considered. Different reference standards for final stroke diagnosis may also affect instrument performance.32 Studies reliant upon ED,25 admitting physician8,24 or 72 h19 diagnosis apply a different standard from those using entry onto a stroke registry at discharge,6,7,21–23 when all necessary imaging and investigation is likely to be complete. Purrucker and colleagues33 reported that more complex, selective instruments led to a reduction in sensitivity when compared with examination by a neurologist. Kidwell's approach14 of blinded chart review and Chen's approach15 of blinded patient assessment have advantages over registry data, as the latter could lead to confirmation bias.

Instrument performance should not be considered as static, and over time it is likely to be affected by changes in the nature and severity of presenting symptoms resulting from increasing public awareness, greater use of emergency services and a demographic trend towards older populations.34 It is important that studies continue to describe instrument performance in the context of patient characteristics and the underlying operational conditions. As evidence and experience grows from the use of reperfusion therapies, it is also sensible to consider whether the stroke identification system being used remains the most efficient approach to minimise delays in treatment for all eligible patients.

Greater professional awareness and changes to training about stroke treatments may have also impacted upon paramedic and ED clinician performance. The amount of clinical judgement applied during the decision to use a recognition instrument, and the way in which its findings are integrated into the clinical assessment as a whole, will likely depend upon the perceived consequences for the patient and the health professional. Prehospital and ED stroke recognition could now be further optimised by clinical judgement rather than increasingly complicated scoring systems.

If early stroke identification is mainly to facilitate the delivery of emergency treatments or prehospital clinical trials, then a high level of specificity (eg, LAPSS14) may be desirable, even at the cost of lower sensitivity. The MASS was developed in an attempt to improve LAPSS sensitivity by combining it with the CPSS,5 but the age and mobility items in LAPSS exclude patients who may still have CPSS detectable symptoms. Bray5 reported statistical equivalence in sensitivity for CPSS and MASS in two small cohorts, but these were underpowered to detect a significant difference, and prescreening clinical judgement by paramedics may have led to exclusion of patients with more severe comorbidities but milder stroke symptoms. Increasing evidence supporting intra-arterial treatments may lead to an emphasis upon the prehospital recognition of patients with symptoms more likely to reflect LVO. An instrument which combines identification with LVO detection has not yet been evaluated prospectively in the prehospital setting, and future studies should report the impact in the context of overall service cost-effectiveness.

A previous review11 proposed that LAPSS performs best for stroke recognition, based upon low negative likelihood ratios. The largest included study reports a specificity of 99% for LAPSS; however, this relates to paramedic diagnosis of stroke in an unselected cohort of 11 296 mixed ambulance service patients, most of whom were never suspected by the attending clinician to be presenting with stroke.35 This is therefore not a meaningful evaluation of a stroke scale, as paramedics are unlikely to require LAPSS to differentiate stroke from myocardial infarction or trauma, and including such patients augments specificity by elevating the true negative count. In other cohorts, the negative likelihood ratios reported for LAPSS are similar to those of other instruments.

Future studies of stroke recognition instruments should report the prevalence of stroke in the study population and the proportion of those patients who had instrument detectable signs on specialist assessment. This would allow evaluation of whether reported sensitivity reflects the instrument itself or the population assessed, and so facilitate comparison between studies. Studies within a service that redirects patients on the basis of the test result must ensure that all instrument negative cases transported elsewhere are identified, to avoid artificially deflating the false and true negative counts. Determination of inter-rater reliability for the more complex elements of instruments such as ROSIER (visual field assessment) and LAPSS (assessment that someone is ‘wheelchair bound or bedridden’) would also be informative.

Limitations

Our review is limited by the inclusion of papers published only in English or German and indexed in the Medline and Embase databases. It is further limited by the methodological issues identified in the papers themselves, which prevented combination of data in a reliable analysis of clinical effectiveness.

Conclusion

In the absence of data supporting any single stroke identification instrument, clinicians should make a choice by considering the main purpose (all cases or selective emergency treatment), the timing of application along the clinical pathway and the service setting. Future studies should report instrument detectable stroke over time within clearly defined settings and clinical cohort boundaries, including inter-rater reliability. Knowledge of these factors will facilitate clearer recommendations about the choice of stroke recognition instruments under different operational conditions.

References

Supplementary materials

  • Supplementary Data


Footnotes

  • Contributors MR, CIP and GAF had the idea for the project. MR and DB designed and executed the search strategy, before independently reviewing hits and extracting data. CIP adjudicated where differences occurred. MR drafted the initial version of the manuscript, with DB composing tables. All authors contributed to subsequent redrafting of the manuscript for intellectual content.

  • Funding MR was funded by a Teaching and Research Fellowship from Northumbria Healthcare NHS Foundation Trust. GAF is supported by an NIHR Senior Investigator Award.

  • Competing interests GAF has been paid lecture fees for attending and speaking at workshops held by Boehringer Ingelheim. His institution has received research funding for stroke-related activities from Boehringer Ingelheim and grant assistance towards administrative expenses for coordination of Safe Implementation of Treatments for Stroke in the UK. GAF has also received funding from Lundbeck A/S in relation to participation in the steering committee for DIAS 3 and 4.

  • Provenance and peer review Not commissioned; externally peer reviewed.