Are observation selection methods important when comparing early warning score performance?

doi:10.1016/j.resuscitation.2015.01.033

Resuscitation

Volume 90, May 2015, Pages 1-6

https://doi.org/10.1016/j.resuscitation.2015.01.033

Abstract

Introduction

Sicker patients generally have more vital sign assessments, particularly immediately before an adverse outcome, and especially if the vital sign monitoring schedule is driven by an early warning score (EWS) value. This lack of independence could influence the measured discriminatory performance of an EWS.

Methods

We used a population of 1,564,143 consecutive vital signs observation sets collected as a routine part of patients’ care. We compared 35 published EWSs for their discrimination of the risk of death within 24 h of an observation set using (1) all observations in our dataset, (2) one observation per patient care episode, chosen at random and (3) one observation per patient care episode, chosen as the closest to a randomly selected point in time in each episode. We compared the area under the ROC curve (AUROC) as a measure of discrimination for each of the 35 EWSs under each observation selection method and looked for changes in their rank order.

Results

There were no significant changes in rank order of the EWSs based on AUROC between the different observation selection methods, except for one EWS that included age among its components. Whichever method of observation selection was used, the National Early Warning Score (NEWS) showed the highest discrimination of risk of death within 24 h. AUROCs were higher when only one observation set was used per episode of care (significantly higher for many EWSs, including NEWS).

Conclusions

Vital sign measurements can be treated as if they are independent – multiple observations can be used from each episode of care – when comparing the performance and ranking of EWSs, provided no EWS includes age.

Introduction

Several prior publications by our group have assessed the performance of the early warning scores (EWS) used to identify patients’ severity of illness.1, 2, 3 EWS systems allocate points in a weighted manner, based on the derangement of a predetermined set of patient vital sign variables (e.g., blood pressure, heart rate, breathing rate, temperature) from an arbitrarily agreed “normal” range. The points for each variable are summed, and the total is used to inform a change in the patient's vital sign monitoring schedule and/or to trigger a call for expert help at the bedside.
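As a rough illustration of the aggregate weighted scoring described above, the Python sketch below awards points for each vital sign from banded ranges and sums them. The band boundaries and weights are hypothetical and chosen only for illustration; they are not the weights of NEWS or of any other published EWS, and many published scores also include components such as oxygen saturation and level of consciousness.

```python
import math

# Hypothetical bands for illustration only (NOT any published EWS).
# Each entry is (inclusive upper bound, points); bands are in ascending order.
ILLUSTRATIVE_BANDS = {
    "heart_rate":  [(40, 3), (50, 1), (100, 0), (130, 2), (math.inf, 3)],
    "resp_rate":   [(8, 3), (11, 1), (20, 0), (24, 2), (math.inf, 3)],
    "systolic_bp": [(90, 3), (100, 2), (110, 1), (math.inf, 0)],
    "temperature": [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (math.inf, 2)],
}


def band_points(value, bands):
    """Return the points of the first band whose upper bound is >= value."""
    for upper, points in bands:
        if value <= upper:
            return points
    raise ValueError(f"value {value} is not covered by the bands")


def aggregate_ews(observation, bands=ILLUSTRATIVE_BANDS):
    """Sum the per-variable points for one observation set (a dict of vitals)."""
    return sum(band_points(observation[name], b) for name, b in bands.items())


normal = {"heart_rate": 80, "resp_rate": 16, "systolic_bp": 125, "temperature": 37.0}
deranged = {"heart_rate": 125, "resp_rate": 26, "systolic_bp": 95, "temperature": 38.5}
print(aggregate_ews(normal))    # 0
print(aggregate_ews(deranged))  # 2 + 3 + 2 + 1 = 8
```

Listing bands as ordered (upper bound, points) pairs keeps the weighting table separate from the scoring logic, so swapping in a different EWS only means swapping the table.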

Our performance evaluations of EWS systems have often used all the observation sets from a sample of patient episodes and have, therefore, included multiple vital sign observation sets from the same patient episode in the analysis.2, 3 Multiple observations may be within 24 h of death (or another adverse outcome). We have considered an EWS to be better than another if it has a significantly (p < 0.05) higher area under the ROC curve (AUROC,⁴ a measure of discrimination). Sicker patients generally have more vital sign assessments, particularly immediately before an adverse outcome, and especially if the vital sign monitoring schedule is driven by an EWS value. A previous review of our manuscripts has suggested that this lack of independence of the data points in the sample data sets may influence the measured discriminatory performance of an EWS. By extension, it is possible that an EWS that appears significantly better than another when all observations are used may appear significantly worse if only one observation is used from each episode.
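The AUROC can be read as the probability that a randomly chosen observation set followed by the outcome has a higher EWS value than a randomly chosen observation set that was not, with ties counted as one half (the interpretation set out by Hanley and McNeil⁴). Because EWS values are small integers, ties are common, so the half-credit rule matters. A minimal Python sketch of that definition follows; the function name `auroc` is hypothetical, and the pairwise loop is written for clarity only. For datasets of the size used here, a rank-based (Mann–Whitney) formulation would be the practical choice.

```python
def auroc(scores, outcomes):
    """AUROC as P(score of a positive > score of a negative), ties counted 0.5.

    scores:   EWS value for each observation set
    outcomes: 1 if the outcome (e.g. death within 24 h) followed, else 0
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


scores = [1, 2, 5, 5, 8]
outcomes = [0, 0, 0, 1, 1]
print(round(auroc(scores, outcomes), 3))  # 0.917 (5.5 of 6 pairs)
```

Every observation set enters these pairwise comparisons with equal weight, which is why episodes that contribute many observation sets can dominate the result.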

EWS systems are implemented clinically as if vital sign measurements and derived EWS values are independent. EWS escalation decisions are generally binary. For example, an EWS value of 4 might result in no clinical intervention, whereas a value of 5 might require both a change in the frequency of vital sign monitoring and an assessment by a doctor (irrespective of whether the previous EWS value was 0 or 4). Consequently, it is the extent of physiological derangement at any given time, and not the degree of abnormality of any previous measurements, that determines the actions taken on the basis of the EWS value.
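The memoryless escalation logic described above can be sketched as a function of the current EWS value alone. The threshold of 5 follows the example in the text; the function names and action strings are hypothetical.

```python
def escalation_action(current_ews, threshold=5):
    """Hypothetical binary escalation rule: only the current EWS value is used."""
    if current_ews >= threshold:
        return "increase monitoring frequency and request doctor assessment"
    return "no change"


def actions_for_sequence(ews_values, threshold=5):
    """Apply the rule to each value in turn; earlier values are never consulted."""
    return [escalation_action(v, threshold) for v in ews_values]


# Two episodes reach an EWS of 5 after different histories (0 vs 4).
after_zero = actions_for_sequence([0, 5])
after_four = actions_for_sequence([4, 5])
print(after_zero[-1] == after_four[-1])  # True
```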

One study by our group⁵ has suggested that treating vital signs and derived EWS values as independent may be reasonable, as an alternative technique of using one randomly chosen observation set per episode did not significantly affect discrimination of the combined outcome of cardiac arrest, unanticipated ICU admission or death within 24 h. In this study,⁵ as in others,1, 2, 3 the ability of the EWSs to discriminate the risk of a range of adverse outcomes has been compared using the AUROC.⁴ The use of multiple observation sets per episode has the potential to bias the AUROC: because each observation set, rather than each episode, carries equal weight, episodes with more observations may influence the AUROC disproportionately compared with those with fewer observations.

The aim of this study was to determine whether a lack of independence between data points when sampling patient observations might significantly change the ranking of EWS systems by their AUROC (i.e., lead to one EWS having significantly higher AUROC than another under one method of choosing observations, but significantly lower AUROC than the other under another method). We compared the performance of EWSs using three methods of observation selection: (1) all observations, (2) one randomly chosen observation set per episode, and (3) one observation set per episode based on choosing a random point in time within each episode.
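A minimal Python sketch of the three observation selection methods follows. It assumes each observation set is a dict with hypothetical `episode` and `time` keys, and for method (3) it assumes the episode's time span runs from its first to its last observation set; the study's own definition of the span is not given in this excerpt. One consequence of the two single-observation methods is worth noting. Method (2) picks each observation set with equal probability, so densely observed periods (such as the hours before death) are favoured. Method (3) picks a set with probability proportional to the length of time nearer to it than to its neighbours, which down-weights densely observed periods. The toy data and seed are illustrative only.

```python
import random
from collections import defaultdict


def group_by_episode(observations):
    """Group observation sets (dicts with 'episode' and 'time') by episode."""
    episodes = defaultdict(list)
    for obs in observations:
        episodes[obs["episode"]].append(obs)
    return episodes


def select_all(observations):
    """Method 1: use every observation set."""
    return list(observations)


def select_random_observation(observations, rng):
    """Method 2: one observation set per episode, each set equally likely."""
    return [rng.choice(sets) for sets in group_by_episode(observations).values()]


def select_closest_to_random_time(observations, rng):
    """Method 3: per episode, draw a time uniformly between the first and last
    observation times (an assumed episode span) and keep the nearest set."""
    chosen = []
    for sets in group_by_episode(observations).values():
        times = [o["time"] for o in sets]
        t = rng.uniform(min(times), max(times))
        chosen.append(min(sets, key=lambda o: abs(o["time"] - t)))
    return chosen


# Hypothetical toy data: episode "A" is observed densely near its end,
# episode "B" at a steady interval. Times are hours since admission.
obs = [{"episode": "A", "time": t} for t in (0, 12, 23, 23.5, 23.75)] + \
      [{"episode": "B", "time": t} for t in (0, 24, 48)]

rng = random.Random(42)
trials = 10_000
late2 = late3 = 0
for _ in range(trials):
    late2 += select_random_observation(obs, rng)[0]["time"] >= 23
    late3 += select_closest_to_random_time(obs, rng)[0]["time"] >= 23
# Expected proportions: 3/5 = 0.6 for method 2, 6.25/23.75 (about 0.26) for method 3.
print(late2 / trials, late3 / trials)
```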

Section snippets

Method

This research falls within local research ethics committee approval (08/02/1394) from the Isle of Wight, Portsmouth and South East Hampshire Research Ethics Committee.

Results

In the study period, there were 64,285 episodes of care with admission on or after 25/05/2011 and discharge on or before 31/12/2012, where the patient was aged ≥16, the patient was not discharged alive on the day of admission and one or more observations were taken during the last 24 h of the stay. Associated with these episodes of care were 1,395,941 observation sets (mean 21.7 observation sets per episode). Of these episodes, 30,723 (48%) were for male patients and the mean age at admission was

Discussion

For the three observation selection methods studied, there were no significant changes in the rank of EWSs by their AUROCs except for EWSs that included age. Overall, the findings of this research suggest that vital signs and derived EWS values for EWSs that do not include age can be treated as if they were independent (even though the ICCs demonstrate that there is within-episode dependence). Therefore use of multiple observation sets from a single episode in assessing the performance of EWS

Conclusions

Using multiple observations from each episode of care does not significantly change the ranking of EWSs compared to using only one observation from each episode, as long as no EWS includes age. This is in spite of observed dependence between vital signs observations collected during the same episode of care.
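The Discussion cites intraclass correlation coefficients (ICCs) as evidence of within-episode dependence. The excerpt does not say which ICC form was computed, so the Python sketch below assumes the one-way random-effects ICC(1), adjusted for unequal numbers of observation sets per episode. Values near 1 mean that observation sets from the same episode resemble each other far more than sets from different episodes. The function name and the EWS values are hypothetical.

```python
def icc1(groups):
    """One-way random-effects ICC(1) for groups of unequal size.

    groups: list of lists, one list of values (e.g. EWS values) per episode.
    """
    groups = [list(g) for g in groups if len(g) > 0]
    g = len(groups)
    n_total = sum(len(x) for x in groups)
    if g < 2 or n_total <= g:
        raise ValueError("need >= 2 episodes and more observations than episodes")
    grand_mean = sum(sum(x) for x in groups) / n_total
    means = [sum(x) / len(x) for x in groups]
    # Between-episode and within-episode sums of squares (one-way ANOVA).
    ss_between = sum(len(x) * (m - grand_mean) ** 2 for x, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for x, m in zip(groups, means) for v in x)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (n_total - g)
    # Adjusted average group size for an unbalanced design.
    k0 = (n_total - sum(len(x) ** 2 for x in groups) / n_total) / (g - 1)
    return (ms_between - ms_within) / (ms_between + (k0 - 1) * ms_within)


# Hypothetical EWS values from three episodes of care.
episodes = [[0, 1, 0, 1], [3, 4, 5, 4, 6], [1, 1, 2]]
print(round(icc1(episodes), 2))  # 0.86
```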

The method of observation selection can affect the AUROCs recorded—higher AUROCs (significantly higher for many EWSs) are recorded when only one observation is used from each episode. For

Conflict of interest statement

VitalPAC is a collaborative development of The Learning Clinic Ltd (TLC) and Portsmouth Hospitals NHS Trust (PHT). At the time of the study, PHT had a royalty agreement with TLC to pay for the use of PHT intellectual property within the VitalPAC product. PM, DP, PF and PS are employed by PHT. GS was an employee of PHT until 31/03/2011. PS, PF, and the wives of GS and DP are minority shareholders in TLC. GS, DP, and PS are unpaid research advisors to TLC, and have received reimbursement of

Funding

None.

Acknowledgements

The authors would like to acknowledge the efforts of the medical, nursing and administrative staff at Portsmouth Hospitals NHS Trust who collected the data used in this study. Dr Stuart Jarvis had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

References (14)

G.B. Smith et al.
Review and performance evaluation of aggregate weighted ‘track and trigger’ systems
Resuscitation
(2008)
G.B. Smith et al.
The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death
Resuscitation
(2013)
D.R. Prytherch et al.
ViEWS—towards a national Early Warning Score for detecting adult inpatient deterioration
Resuscitation
(2010)
T. Badriyah et al.
Decision-tree early warning score (DTEWS) validates the design of the National Early Warning Score (NEWS)
Resuscitation
(2014)
L. Tarassenko et al.
Centile-based early warning scores derived from statistical distributions of vital signs
Resuscitation
(2011)
A. Murray et al.
Trajectories of the averaged abbreviated VitalPAC early warning score (AbEWS) and clinical course of 44,531 consecutive admissions hospitalized for acute medical illness
Resuscitation
(2014)
J.A. Hanley et al.
The meaning and use of the area under a receiver operating characteristic (ROC) curve
Radiology
(1982)

There are more references available in the full text version of this article.

Cited by (26)

Evaluating the performance of the National Early Warning Score in different diagnostic groups
2023, Resuscitation
The National Early Warning Score (NEWS) is used in hospitals across the UK to detect deterioration of patients within care pathways. It is used for most patients, but there are relatively few studies validating its performance in groups of patients with specific conditions.
The performance of NEWS was evaluated against 36 other Early Warning Scores, in 123 patient groups, through use of the area under the receiver operating characteristic (AUROC) curve technique, to compare the abilities of each Early Warning Score to discriminate an outcome within 24hrs of vital sign recording. Outcomes evaluated were death, ICU admission, or a combined outcome of either death or ICU admission within 24 hours of an observation set.
The National Early Warning Score 2 performs either best or joint best within 120 of the 123 patient groups evaluated and is only outperformed in prediction of unanticipated ICU admission. When outperformed by other Early Warning Scores in the remaining 3 patient groups, the performance difference was marginal.
Consistently high performance indicates that NEWS is a suitable early warning score to use for all diagnostic groups considered by this analysis, and patients are not disadvantaged through use of NEWS in comparison to any of the other evaluated Early Warning Scores.
The performance of the National Early Warning Score and National Early Warning Score 2 in hospitalised patients infected by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)
2021, Resuscitation
Citation Excerpt :
This is a single centre study such that the results are not necessarily transferrable and require external validation. We used repeated observation sets from the same patient episode in the analysis, making the assumption that the observation sets are independent based on previous work by our group.31 The conclusions of our work are also based on the assumption that all patients were treated optimally and equitably.
Since the introduction of the UK’s National Early Warning Score (NEWS) and its modification, NEWS2, coronavirus disease 2019 (COVID-19), has caused a worldwide pandemic. NEWS and NEWS2 have good predictive abilities in patients with other infections and sepsis, however there is little evidence of their performance in COVID-19.
Using receiver-operating characteristics analyses, we used the area under the receiver operating characteristic (AUROC) curve to evaluate the performance of NEWS or NEWS2 to discriminate the combined outcome of either death or intensive care unit (ICU) admission within 24 h of a vital sign set in five cohorts (COVID-19 POSITIVE, n = 405; COVID-19 NOT DETECTED, n = 1716; COVID-19 NOT TESTED, n = 2686; CONTROL 2018, n = 6273; CONTROL 2019, n = 6523).
The AUROC values for NEWS or NEWS2 for the combined outcome were: COVID-19 POSITIVE, 0.882 (0.868−0.895); COVID-19 NOT DETECTED, 0.875 (0.861−0.89); COVID-19 NOT TESTED, 0.876 (0.85−0.902); CONTROL 2018, 0.894 (0.884−0.904); CONTROL 2019, 0.842 (0.829−0.855).
The finding that NEWS or NEWS2 performance was good and similar in all five cohorts (range = 0.842−0.894) suggests that amendments to NEWS or NEWS2, such as the addition of new covariates or the need to change the weighting of existing parameters, are unnecessary when evaluating patients with COVID-19. Our results support the national and international recommendations for the use of NEWS or NEWS2 for the assessment of acute-illness severity in patients with COVID-19.
A comparison of the ability of the National Early Warning Score and the National Early Warning Score 2 to identify patients at risk of in-hospital mortality: A multi-centre database study
2019, Resuscitation
To compare the ability of the National Early Warning Score (NEWS) and the National Early Warning Score 2 (NEWS2) to identify patients at risk of in-hospital mortality and other adverse outcomes.
We undertook a multi-centre retrospective observational study at five acute hospitals from two UK NHS Trusts. Data were obtained from completed adult admissions who were not fit enough to be discharged alive on the day of admission. Diagnostic coding and oxygen prescriptions were used to identify patients with type II respiratory failure (T2RF). The primary outcome was in-hospital mortality within 24 h of a vital signs observation. Secondary outcomes included unanticipated intensive care unit admission or cardiac arrest within 24 h of a vital signs observation. Discrimination was assessed using the c-statistic.
Among 251,266 adult admissions, 48,898 were identified to be at risk of T2RF by diagnostic coding. In this group, NEWS2 showed statistically significant lower discrimination (c-statistic, 95% CI) for identifying in-hospital mortality within 24 h (0.860, 0.857–0.864) than NEWS (0.881, 0.878-0.884). For 1394 admissions with documented T2RF, discrimination was similar for both systems: NEWS2 (0.841, 0.827-0.855), NEWS (0.862, 0.848–0.875). For all secondary endpoints, NEWS2 showed no improvements in discrimination.
NEWS2 modifications to NEWS do not improve discrimination of adverse outcomes in patients with documented T2RF and decrease discrimination in patients at risk of T2RF. Further evaluation of the relationship between SpO₂ values, oxygen therapy and risk should be investigated further before wide-scale adoption of NEWS2.
Evaluation of the efficacy of the National Early Warning Score in predicting in-hospital mortality via the risk stratification
2018, Journal of Critical Care
Citation Excerpt :
The NEWS has a good ability to discriminate acutely ill patients at risk of clinical deterioration within 24 h, as well as at events such as cardiac arrest, unexpected admission to an ICU, and death [12,15-17]. This tool is more effective than 33 other systems for predicting the individual outcomes of unexpected ICU admission or death but not cardiac arrest alone [12,15,20,21]. However, the efficacy of the NEWS as a predictor of in-hospital mortality has not been validated because the tool was developed to predict short-term outcomes occurring within 24 h.
To investigate the efficacy of the National Early Warning Score (NEWS) in predicting in-hospital mortality.
This was a retrospective observational study and the electronic medical records of the patients were reviewed based on NEWS at the time of admission.
The performance of NEWS was effective in predicting hospital mortality (area under the curve: 0.765; 95% confidence interval: 0.659–0.846). Based on the Kaplan Meier survival curves, the survival time of patients who are at high risk according to NEWS was significantly shorter than that of patients who are at low risk (p < 0.001). Results of the multivariate Cox proportional hazards regression analysis showed that the hazard ratios of patients who are at medium and high risk based on NEWS were 2.6 and 4.7, respectively (p < 0.001). In addition, our study showed that the combination model that used other factors, such as age and diagnosis, was more effective than NEWS alone in predicting hospital mortality (NEWS: 0.765; combination model: 0.861; p < 0.005).
NEWS is a simple and useful bedside tool for predicting in-hospital mortality. In addition, the rapid response team must consider other clinical factors as well as screening tools to improve clinical outcomes.
Comparison of the Between the Flags calling criteria to the MEWS, NEWS and the electronic Cardiac Arrest Risk Triage (eCART) score for the identification of deteriorating ward patients
2018, Resuscitation
Citation Excerpt :
Accuracy comparisons were performed using sensitivity, specificity, and false positive rates. The area under the receiver operating curve (AUROC) was used to evaluate score discrimination with vital sign observations treated as if they are independent as per previous studies [20]. A two-tailed p-value of less than 0.05 was considered statistically significant.
Traditionally, paper based observation charts have been used to identify deteriorating patients, with emerging recent electronic medical records allowing electronic algorithms to risk stratify and help direct the response to deterioration.
We sought to compare the Between the Flags (BTF) calling criteria to the Modified Early Warning Score (MEWS), National Early Warning Score (NEWS) and electronic Cardiac Arrest Risk Triage (eCART) score.
Multicenter retrospective analysis of electronic health record data from all patients admitted to five US hospitals from November 2008-August 2013.
Main outcome measures: Cardiac arrest, ICU transfer or death within 24 h of a score
Overall accuracy was highest for eCART, with an AUC of 0.801 (95% CI 0.799–0.802), followed by NEWS, MEWS and BTF respectively (0.718 [0.716–0.720]; 0.698 [0.696–0.700]; 0.663 [0.661–0.664]). BTF criteria had a high risk (Red Zone) specificity of 95.0% and a moderate risk (Yellow Zone) specificity of 27.5%, which corresponded to MEWS thresholds of ≥4 and ≥2, NEWS thresholds of ≥5 and ≥2, and eCART thresholds of ≥12 and ≥4, respectively. At those thresholds, eCART caught 22 more adverse events per 10,000 patients than BTF using the moderate risk criteria and 13 more using high risk criteria, while MEWS and NEWS identified the same or fewer.
An electronically generated eCART score was more accurate than commonly used paper based observation tools for predicting the composite outcome of in-hospital cardiac arrest, ICU transfer and death within 24 h of observation. The outcomes of this analysis lend weight for a move towards an algorithm based electronic risk identification tool for deteriorating patients to ensure earlier detection and prevent adverse events in the hospital.
Scoping review: The use of early warning systems for the identification of in-hospital patients at risk of deterioration
2017, Australian Critical Care
Citation Excerpt :
The AUROC predicts risk rather than outcomes.45 Jarvis et al.25 tested their theory using three vital signs sampling methods—multiple sets of vital signs; one randomly selected set of vital signs; and one set of vital signs taken at a predetermined time preceding the SAE, and applying them to efficacy testing for 35 published EWS. They found that the sampling method had minimal impact on the results and the efficiency rank order of the various EWS remained largely unchanged for all three vital signs sampling methods.
Early warning systems (EWS) were developed as a means of alerting medical staff to patient clinical decline. Since 85% of severe adverse events are preceded by abnormal physiological signs, the patient bed-side vital signs observation chart has emerged as an EWS tool to help staff identify and quantify deteriorating patients. There are three broad categories of patient observation chart EWS: single or multiple parameter systems; aggregated weighted scoring systems; or combinations of single or multiple parameter and aggregated weighted scoring systems.
This scoping review is an overview of quantitative studies and systematic reviews examining the efficiency of the adult EWS charts in the recognition of in-hospital patient deterioration.
A broad search was undertaken of peer-reviewed publications, official government websites and databases housing research theses, using combinations of keywords and phrases.
CINAHL with full text; MedLine, PsycINFO, MasterFILE Premier, GreenFILE and ScienceDirect. Also, the Cochrane Library database, Department of Health government websites and Ethos, ProQuest and Trove databases were searched.
Paediatric, obstetric and intensive care studies, studies undertaken at the point of hospital admission or pre-admission, non-English publications and editorials.
Five hundred and sixty five publications, government documents, reports and theses were located of which 91 were considered and 21 were included in the scoping review. Of the 21 publications eight studies compared the efficacy of various EWS and 13 publications validated specific EWS.
There is low level quantitative evidence that EWS improve patient outcomes and strong anecdotal evidence that they augment the ability of the clinical staff to recognise and respond to patient decline, thus reducing the incidence of severe adverse events. Although aggregated weighted scoring systems are most frequently used, the efficiency of the specific EWS appears to be dependent on the patient cohort, facilities available and staff training and attitude. While the review demonstrates support for EWS, researchers caution that given the contribution of human factors to the EWS decision-making process, patient EWS charts alone cannot replace good clinical judgment.


☆ A Spanish translated version of the summary of this article appears as Appendix in the final online version at http://dx.doi.org/10.1016/j.resuscitation.2015.01.033.

Rapid Response Systems