Rapid Response Systems
Are observation selection methods important when comparing early warning score performance?☆
Introduction
Several prior publications by our group have assessed the performance of the early warning scores (EWS) used to identify patients’ severity of illness.1, 2, 3 EWS systems allocate points in a weighted manner, based on the derangement of a predetermined set of patient vital signs variables (e.g., blood pressure, heart rate, breathing rate, temperature) from an arbitrarily agreed “normal” range. The points for each variable are summed and the total is used to inform a change in the patient's vital sign monitoring schedule and/or trigger a call for expert help at the bedside.
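The aggregate weighted scoring described above can be sketched in a few lines of Python. This is only an illustration: the band thresholds, the variable names and the `HYPOTHETICAL_BANDS` table are assumptions made for the example and are not taken from any published EWS.

```python
from typing import Dict, List, Tuple

# Each band is (exclusive upper bound, points). The thresholds below are
# HYPOTHETICAL and chosen only for illustration; they do not reproduce any
# published early warning score.
HYPOTHETICAL_BANDS: Dict[str, List[Tuple[float, int]]] = {
    "heart_rate": [(40, 3), (50, 1), (100, 0), (120, 1), (140, 2), (float("inf"), 3)],
    "breathing_rate": [(8, 3), (10, 1), (20, 0), (25, 2), (float("inf"), 3)],
    "systolic_bp": [(90, 3), (100, 2), (110, 1), (200, 0), (float("inf"), 2)],
    "temperature": [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (float("inf"), 2)],
}


def points_for(variable: str, value: float) -> int:
    """Return the points for one vital sign: more derangement, more points."""
    for upper_bound, points in HYPOTHETICAL_BANDS[variable]:
        if value < upper_bound:
            return points
    raise ValueError(f"no band matches {variable}={value}")


def ews_total(observation_set: Dict[str, float]) -> int:
    """Sum the points across all vital signs in one observation set."""
    return sum(points_for(name, value) for name, value in observation_set.items())


normal = {"heart_rate": 75, "breathing_rate": 16, "systolic_bp": 120, "temperature": 37.0}
deranged = {"heart_rate": 130, "breathing_rate": 26, "systolic_bp": 85, "temperature": 38.5}
print(ews_total(normal), ews_total(deranged))
```

The total returned by `ews_total` is the value that would be compared against an escalation threshold.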
Our performance evaluations of EWS systems have often used all the observation sets from a sample of patient episodes and, therefore, include multiple vital sign observation sets from the same patient episode in the analysis.2, 3 Multiple observations may be within 24 h of death (or another adverse outcome). We have considered an EWS to be better than another if it has a significantly (p < 0.05) higher area under the ROC curve (AUROC,4 a measure of discrimination). Sicker patients generally have more vital sign assessments, particularly immediately before an adverse outcome, and especially if the vital sign monitoring schedule is driven by an EWS value. A previous review of our manuscripts has suggested that this lack of independence of the data points in the sample data sets may influence the measured discriminatory performance of an EWS. By extension, it is possible that an EWS that appears significantly better than another when all observations are used may appear significantly worse if only one observation were used from each episode.
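For readers unfamiliar with the AUROC, it equals the probability that a randomly chosen observation followed by an adverse outcome has a higher EWS value than a randomly chosen observation without one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy data are assumptions for illustration, and this is not presented as the software used in the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUROC by pairwise comparison of positive and negative observations.

    scores   -- EWS value for each observation set
    outcomes -- 1 if an adverse outcome followed within the window, else 0
    """
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative observation")
    # A pair counts 1 if the positive scores higher, 0.5 if tied, 0 otherwise.
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return concordant / (len(positives) * len(negatives))


# Toy data (hypothetical): four observation sets, two followed by an outcome.
print(auroc([1, 2, 3, 4], [0, 1, 0, 1]))
```

The pairwise form is quadratic in the number of observations, so it suits small illustrations; a rank-based calculation gives the same value more efficiently on large data sets.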
EWS systems are implemented clinically as if vital sign measurements and derived EWS values are independent. EWS escalation decisions are generally binary. For example, an EWS value of 4 might result in no clinical intervention, whereas a value of 5 might require both a change in vital signs frequency and an assessment by a doctor (irrespective of the fact that the previous EWS was 0 or 4). Consequently, it is the extent of derangement of physiology at any given time, and not the degree of abnormality of any previous measurements, that determines actions taken based on the EWS score.
One study by our group5 has suggested that treating vital signs and derived EWS values as independent may be reasonable, as an alternative technique of using one randomly chosen observation set per episode did not significantly affect discrimination of the combined outcome of cardiac arrest, unanticipated ICU admission or death within 24 h. In this study,5 as with others,1, 2, 3 the ability of the EWSs to discriminate the risk of a range of adverse outcomes has been compared using the AUROC.4 The use of multiple observation sets per episode has the potential to bias the AUROC as episodes with more observations may disproportionately influence the AUROC compared to those with fewer observations.
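The potential for disproportionate influence can be made concrete by looking at the share of all observation sets contributed by each episode. The following minimal sketch uses hypothetical episode labels and counts; `observation_share` is an illustrative helper, not part of the study's analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def observation_share(episode_ids: Iterable[str]) -> Dict[str, float]:
    """Fraction of all observation sets contributed by each episode."""
    counts = Counter(episode_ids)
    total = sum(counts.values())
    return {episode: n / total for episode, n in counts.items()}


# Hypothetical sample: episode "A" was observed 9 times, episode "B" 3 times.
ids = ["A"] * 9 + ["B"] * 3
shares = observation_share(ids)
print(shares)
```

When all observations are used, episode "A" supplies three quarters of the data points in this toy sample, whereas selecting one observation per episode would give both episodes equal weight.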
The aim of this study was to determine whether a lack of independence between data points when sampling patient observations might significantly change the ranking of EWS systems by their AUROC (i.e., lead to one EWS having significantly higher AUROC than another under one method of choosing observations, but significantly lower AUROC than the other under another method). We compared the performance of EWSs using three methods of observation selection: (1) all observations, (2) one randomly chosen observation set per episode, and (3) one observation set per episode based on choosing a random point in time within each episode.
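The three selection methods can be sketched as follows. The data layout (a list of `(time, EWS value)` pairs per episode) and the function names are assumptions for the example. For method (3), the text above does not say how an observation set is matched to the random time point, so this sketch assumes the latest observation set taken at or before that time is used.

```python
import random
from typing import Dict, List, Tuple

Observation = Tuple[float, int]  # (hours since admission, EWS value)
Episodes = Dict[str, List[Observation]]


def select_all(episodes: Episodes) -> Episodes:
    """Method 1: keep every observation set from every episode."""
    return {eid: list(obs) for eid, obs in episodes.items()}


def select_random_observation(episodes: Episodes, rng: random.Random) -> Episodes:
    """Method 2: one observation set per episode, each equally likely."""
    return {eid: [rng.choice(obs)] for eid, obs in episodes.items()}


def select_random_time(episodes: Episodes, rng: random.Random) -> Episodes:
    """Method 3: draw a random time within the episode, then take the latest
    observation set at or before it (ASSUMED matching rule)."""
    chosen: Episodes = {}
    for eid, obs in episodes.items():
        ordered = sorted(obs)  # sort by time
        t = rng.uniform(ordered[0][0], ordered[-1][0])
        at_or_before = [o for o in ordered if o[0] <= t]
        chosen[eid] = [at_or_before[-1]]
    return chosen


# Hypothetical episodes: "A" has observations clustered late in the stay.
episodes: Episodes = {
    "A": [(1.0, 0), (20.0, 3), (21.0, 5), (22.0, 7)],
    "B": [(2.0, 1), (14.0, 1)],
}
rng = random.Random(0)
all_obs = select_all(episodes)
by_observation = select_random_observation(episodes, rng)
by_time = select_random_time(episodes, rng)
```

Under the assumed matching rule, method (2) favours periods in which observations are frequent, while method (3) selects each observation with probability proportional to the time until the next one, so a cluster of closely spaced observations carries less weight.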
Method
This research falls within local research ethics committee approval (08/02/1394) from the Isle of Wight, Portsmouth and South East Hampshire Research Ethics Committee.
Results
In the study period, there were 64,285 episodes of care with admission on or after 25/05/2011 and discharge on or before 31/12/2012, where the patient was aged ≥16, the patient was not discharged alive on the day of admission and one or more observations were taken during the last 24 h of the stay. Associated with these episodes of care were 1,395,941 observation sets (mean 21.7 observation sets per episode). Of these episodes, 30,723 (48%) were for male patients and the mean age at admission was
Discussion
For the three observation selection methods studied, there were no significant changes in the rank of EWSs by their AUROCs except for EWSs that included age. Overall, the findings of this research suggest that vital signs and derived EWS values for EWSs that do not include age can be treated as if they were independent (even though the ICCs demonstrate that there is within-episode dependence). Therefore, use of multiple observation sets from a single episode in assessing the performance of EWS
Conclusions
Using multiple observations from each episode of care does not significantly change the ranking of EWSs compared to using only one observation from each episode, as long as no EWS includes age. This is in spite of observed dependence between vital signs observations collected during the same episode of care.
The method of observation selection can affect the AUROCs recorded: higher AUROCs (significantly higher for many EWSs) are recorded when only one observation is used from each episode. For
Conflict of interest statement
VitalPAC is a collaborative development of The Learning Clinic Ltd (TLC) and Portsmouth Hospitals NHS Trust (PHT). At the time of the study, PHT had a royalty agreement with TLC to pay for the use of PHT intellectual property within the VitalPAC product. PM, DP, PF and PS are employed by PHT. GS was an employee of PHT until 31/03/2011. PS, PF, and the wives of GS and DP are minority shareholders in TLC. GS, DP, and PS are unpaid research advisors to TLC, and have received reimbursement of
Funding
None.
Acknowledgements
The authors would like to acknowledge the efforts of the medical, nursing and administrative staff at Portsmouth Hospitals NHS Trust who collected the data used in this study. Dr Stuart Jarvis had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
References (14)
- et al. Review and performance evaluation of aggregate weighted ‘track and trigger’ systems. Resuscitation (2008)
- et al. The ability of the National Early Warning Score (NEWS) to discriminate patients at risk of early cardiac arrest, unanticipated intensive care unit admission, and death. Resuscitation (2013)
- et al. ViEWS—towards a national Early Warning Score for detecting adult inpatient deterioration. Resuscitation (2010)
- et al. Decision-tree early warning score (DTEWS) validates the design of the National Early Warning Score (NEWS). Resuscitation (2014)
- et al. Centile-based early warning scores derived from statistical distributions of vital signs. Resuscitation (2011)
- et al. Trajectories of the averaged abbreviated VitalPAC early warning score (AbEWS) and clinical course of 44,531 consecutive admissions hospitalized for acute medical illness. Resuscitation (2014)
- et al. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology (1982)
☆ A Spanish translated version of the summary of this article appears as an Appendix in the final online version at http://dx.doi.org/10.1016/j.resuscitation.2015.01.033.