A Multicenter Diagnostic Accuracy Study on Prehospital Stroke Screening Scales

1 Prehospital and Hospital Emergency Research Center, Tehran University of Medical Sciences, Tehran, Iran
2 Anesthesiology Department, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran
3 Department of Epidemiology, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
4 Department of Emergency Medicine, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
5 Department of Emergency Medicine, Shohadaye Tajrish Hospital, Shahid Beheshti University of Medical Sciences, Tehran, Iran
6 Department of Emergency Medicine, Sina Hospital, Tehran University of Medical Sciences, Tehran, Iran


Introduction
Acute ischemic stroke (AIS) is one of the most common and debilitating diseases, ranking third among the most common causes of death after heart disease and cancer. On average, every three seconds, one person dies due to stroke. 1 Brain cells are highly susceptible to ischemia, and if a large vessel is blocked by thrombosis, about 1.9 million neurons are lost per minute. Therefore, each hour of delay in stroke treatment results in the loss of as many brain cells as a person would lose in 3.6 years of normal aging. 2 Effective treatments are available for AIS, but only within a limited time window (the "golden time"). It has been reported that only 1%-8% of stroke patients receive proper treatment and the others face poor outcomes due to delayed referral. [3][4][5][6] Therefore, prompt action is vital to identify patients with stroke as soon as possible, even in the pre-hospital stage. 7 Although public education at the community level plays an undeniable role in promptly calling emergency medical services (EMS) after a stroke, effective interventions in the healthcare system begin from the moment that a patient or his/her companion contacts the EMS. Naturally, paraclinical diagnostic tests before hospitalization are almost impossible, and the diagnosis must be made solely on the basis of clinical presentations. Accurate and timely diagnosis allows the patient to be referred to the right place at the right time.
In recent years, several clinical scales have been introduced for this purpose, including Recognition of Stroke in the Emergency Room (ROSIER), Los Angeles Pre-Hospital Stroke Screening (LAPSS), Face-Arm-Speech-Time (FAST), Cincinnati Pre-hospital Stroke Scale (CPSS), Medic Prehospital Assessment for Code Stroke (Med PACS), Ontario Pre-Hospital Stroke Screening (OPSS), Melbourne Ambulance Stroke Screen (MASS) and PreHAST. [8][9][10][11][12][13][14][15] Choosing a scale depends on both its accuracy and ease of use. Confirming the accuracy of a scale by comparing it to other pre-hospital scales can play an important role in the accurate diagnosis of acute stroke and thus increase the chance that the patient benefits from successful treatment. Therefore, we performed the present multicenter study to examine the accuracy of these scales for stroke diagnosis in patients admitted to the emergency department (ED). However, the effectiveness of these scales was not evaluated in the field in our study; such an evaluation is recommended for future studies.

Study Design
This diagnostic accuracy study was conducted in 2019 using a multicenter approach that included four major teaching hospitals in Iran (Sina and Shohaday-e-Tajrish hospitals in Tehran; Al-Zahra hospital in Isfahan; Golestan General hospital in Ahvaz).

Study Population
All patients who were referred to the ED of the mentioned hospitals and underwent brain magnetic resonance imaging (MRI) for suspected stroke after evaluation by the physician in charge were included in the study. Those with a history of head trauma, previous stroke, known neurological disease or previous neurological surgery, and those who had left the ED against medical advice before undergoing brain MRI were excluded. The sample size in this study was calculated based on the sample size formula for estimating the difference in the area under the curve (AUC) between two stroke screening tools. We required 260 positive and negative patients to detect a 5% difference in the AUC between two stroke screening tools based on the following assumptions: a type I error of 0.05, 80% power to detect the difference, a lower AUC of 0.85, and a correlation of 0.50 between the two AUCs. Given that we expected 30% of suspected cases to have a stroke, we needed at least 800 patients with suspected stroke. The required sample size for each hospital was determined based on the proportion of patients suspected of stroke admitted in 2017. Then, in each center, all patients meeting the inclusion criteria entered the study from January 2018 until the intended sample size was achieved.

Data Gathering
For almost three years, a checklist has been included in the patients' hospital files on admission to the ED to document the neurological examination of patients with any neurological complaint (such as focal neurological deficit, headache, or seizure), so all data required for calculating the scales were readily available. All findings of the neurological examination are routinely recorded when the patient arrives in the ED. The data were gathered from the patients' clinical records using a pre-prepared checklist consisting of three sections. The first section of the checklist includes basic characteristics and demographic data such as age, gender, past medical history, drug history, and the time of symptom onset. The second section includes physical examination findings for 19 items related to the 8 scales, along with other manifestations such as vital signs, blood sugar level, and level of consciousness. The third section is dedicated to the final diagnosis based on the interpretation of brain MRI, which was considered the gold standard for the diagnosis of AIS in the current study. All data were gathered under the supervision of an emergency medicine resident and three emergency medicine specialists. The required data were collected from the patients' records as well as the MRI images available in the hospital's picture archiving and communication system (PACS). The brain MRI scans were interpreted by both a radiologist and a neurologist.

Statistical Analysis
We described data using frequency and percentage or mean and standard deviation (SD). We used the chi-square test to assess the distribution difference of demographic characteristics and the history of diseases, as well as risk factors between patients with and without a final diagnosis of stroke. Additionally, the independent t test was used for assessment of the mean difference in numerical variables such as age, between the two groups of patients. The normality of variables was assessed using the Kolmogorov-Smirnov test and graphical approaches. Also, we checked the homogeneity of variance using Levene's test.
We calculated the sensitivity, specificity, and positive and negative likelihood ratios of all eight screening tests with 95% confidence intervals (CIs) based on their originally defined scoring and cut-off points. The positive and negative predictive values with 95% CI for the screening tests were calculated based on the patients' final diagnosis. Also, the prevalence of correct and incorrect diagnoses for each tool is presented [true positive (TP), false positive (FP), true negative (TN), and false negative (FN)]. We used McNemar's chi-square test to compare the performance of each screening test against the final diagnosis, and then calculated McNemar's odds ratio (OR) with 95% CI. McNemar's test assessed the difference between the stroke cases predicted by each screening tool and the final diagnosis based on the gold standard. The sensitivities and specificities of the screening tests were compared using the McNemar's chi-square analysis described in previous articles. 16,17 First, the overall test of difference (sensitivity or specificity) for each pairwise comparison of the eight screening tests was conducted using a 4×4 extension of McNemar's test, and if the difference was significant, then the sensitivity and specificity were compared separately using a 2×2 contingency table of McNemar's test. Finally, we used Youden's J statistic to compare the performance of the eight screening tests. The receiver operating characteristic (ROC) curve and the AUC with 95% CI of the screening tools with a numerical score (ROSIER, LAPSS, FAST, and CPSS) were calculated and their AUCs were compared (as described by DeLong et al). 18 A P value less than 0.05 was considered statistically significant, and all statistical analyses were conducted using Stata version 14 (StataCorp LP, College Station, TX).
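As an illustration of the measures named above, the following minimal Python sketch derives them from the four cells of a 2×2 table (screening result vs. final diagnosis) and applies McNemar's test to the discordant cells (FP and FN). The function names and the example counts are hypothetical and are not taken from the study data; the chi-square statistic is computed without a continuity correction, which is an assumption, since the exact software settings are not stated. Confidence intervals are omitted for brevity.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of screening result vs. final diagnosis.

    Assumes no cell combination leads to a zero denominator
    (e.g. specificity of exactly 1 would break the positive likelihood ratio).
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "youden_j": sens + spec - 1,
    }


def mcnemar(fp, fn):
    """McNemar's chi-square (1 df, no continuity correction) on discordant pairs.

    fp: screening positive but final diagnosis negative
    fn: screening negative but final diagnosis positive
    Returns (chi2, p_value, odds_ratio); odds_ratio = fp / fn requires fn > 0.
    """
    chi2 = (fp - fn) ** 2 / (fp + fn)
    # Upper tail of a chi-square distribution with 1 df: P(|Z| > sqrt(chi2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, fp / fn


if __name__ == "__main__":
    # Hypothetical counts, for illustration only
    metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    chi2, p_value, odds_ratio = mcnemar(fp=20, fn=10)
    print(f"McNemar chi2={chi2:.3f}, p={p_value:.3f}, OR={odds_ratio:.2f}")
```

In this reading, a McNemar's OR above 1 means the tool labels more patients as stroke than the gold standard does (more false positives than false negatives).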

Results
Data from 805 patients suspected of stroke, who were transferred to ED by the EMS, were analyzed. In all, 463 patients (57.5%) were male. The participants' age was 6-95 years with a mean age of 66.9 years (SD = 13.9). Of all the registered patients, 562 (69.8%) had an ischemic stroke based on the gold standard. Table 1 reports the demographic and baseline characteristics of the studied patients. The prevalence of ischemic stroke was higher in males than females (73.9% vs. 64.3%; P = 0.004). The history of ischemic heart disease was higher in patients with stroke (74.9% vs. 67.1%; P = 0.021). Also, patients with stroke were older (P < 0.001).
The prevalence of ischemic stroke based on the screening tools ranged between 55.4% and 81.8%; the lowest and highest prevalence pertained to LAPSS and Med PACS, respectively. The lowest proportion of true results (true positives and true negatives) was seen with OPSS (74.4%) and the highest with ROSIER (84.4%). The difference between the final hospital diagnosis of stroke and the predictions of the screening tests was statistically significant (P < 0.001), except for OPSS (67.8% vs 69.8% positive; P = 0.531) and MASS (72.9% vs 69.8% positive; P = 0.424). Thus, the odds of a positive diagnosis with ROSIER, FAST, CPSS, Med PACS, and PreHAST were about 3.4 to 5.0 times higher than with the actual hospital diagnosis (Table 2).
The accuracy of the screening tests ranged from 63.0% to 84.4%. Their sensitivity and specificity ranged from 50.2% to 95.7% and from 46.5% to 92.2%, respectively. Also, the positive predictive value and negative predictive value of the screening tests ranged from 80.1% to 93.7% and from 44.4% to 83.9%, respectively (Table 3).
Among all the screening tests, LAPSS had the lowest sensitivity (71.9%) and Med PACS had the highest sensitivity (95.7%). In addition, PreHAST had the lowest specificity (46.5%) and LAPSS had the highest specificity (82.8%) (Figure 1). The test of difference (sensitivity or specificity) showed a statistically significant difference between all pairs of tests in pairwise comparison, except for CPSS and FAST (P = 0.368); that is, the sensitivity and specificity were not significantly different between these two tests. However, the test of difference (sensitivity or specificity) showed a marginally significant difference between ROSIER and FAST (P = 0.060).
The Youden index for ROSIER and LAPSS was 55.1% and 54.7%, respectively, which was higher than that of the other tests. Therefore, based on this index and assuming that sensitivity and specificity have equal importance, ROSIER and LAPSS had better performance compared to others (Table 3).
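For reference, Youden's J statistic combines sensitivity and specificity into a single number. The worked value below uses the LAPSS sensitivity (71.9%) and specificity (82.8%) reported above and reproduces its stated index:

```latex
\[
J = \text{sensitivity} + \text{specificity} - 1
\]
\[
J_{\text{LAPSS}} = 0.719 + 0.828 - 1 = 0.547 \quad (54.7\%)
\]
```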
Among the screening tools with a numerical score, the AUC of ROSIER and FAST was higher than that of CPSS and LAPSS. The AUC of both ROSIER and FAST was 0.850, which was significantly higher than the AUC of LAPSS (P = 0.002). The pairwise differences in AUC were not significant for ROSIER vs. CPSS (P = 0.672) or FAST vs. CPSS (P = 0.245) (Figure 2).
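To make the AUC figures easier to interpret, the sketch below computes the AUC of a numerical score as the probability that a randomly chosen stroke patient scores higher than a randomly chosen non-stroke patient, with ties counted as one half. This yields only the point estimate; it does not implement the DeLong comparison used in the study. The function name and the example scores are hypothetical.

```python
def auc_from_scores(stroke_scores, non_stroke_scores):
    """AUC as P(stroke score > non-stroke score) + 0.5 * P(tie).

    Compares every stroke patient with every non-stroke patient,
    which is adequate for small illustrative samples.
    """
    wins = 0.0
    for s in stroke_scores:
        for n in non_stroke_scores:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(stroke_scores) * len(non_stroke_scores))


if __name__ == "__main__":
    # Hypothetical scale scores, for illustration only
    stroke = [1, 2, 3, 3]
    non_stroke = [0, 0, 1, 2]
    print(auc_from_scores(stroke, non_stroke))
```

An AUC of 0.5 corresponds to a score that does not separate the two groups at all, and 1.0 to perfect separation.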

Discussion
According to the results of the analysis, Med PACS has the highest sensitivity among the 8 assessed tools at cut-off = 1; it also has the highest sensitivity at cut-off = 3. FAST, which is currently used by the Iranian EMS to detect stroke, has a sensitivity of almost 95% at cut-off = 1. In pre-hospital settings, the sensitivity of a test is much more important than its specificity, because the priority is to screen correctly and not to miss positive cases. Therefore, based on the findings of the present study, highly sensitive tests that can be used in this regard are CPSS, FAST, and Med PACS, all of which have about 95% sensitivity. On the other hand, in hospital settings, where diagnoses are expected to be more precise and specialized, examinations should be applied in a way that avoids wasting resources, so tests with higher specificity are required. Unfortunately, none of the studied tools were desirable (specificity above 90%) in any of the examined cut-offs; so, in order to define a criterion for ruling out the diagnosis of stroke in the ED with a clinical rule, it may be necessary to perform more analysis and consider designing a new scoring system for this purpose.
Each of these criteria has its strengths and weaknesses. PreHAST, LAPSS, MASS, and OPSS consider more details, and therefore, completing their checklists is time-consuming and also difficult without specific training. [9][10][11]15,19 On the other hand, patient assessment with FAST and CPSS is very easy and feasible for almost everyone and does not require any special training. These two tools do not consider lower limb and eye symptoms. However, it should be mentioned that, given their lack of exclusion criteria, they may classify stroke mimics as stroke (false positives). 11,12 MASS was designed by integrating LAPSS and CPSS. LAPSS and MASS exclude patients with a history of seizures, those younger than 45 years, bedridden patients and those in a wheelchair. LAPSS has tried to increase specificity and sensitivity by examining blood glucose level and unilateral symptoms. The time of symptom onset is taken into account by LAPSS but not by MASS. On the other hand, speech difficulty is assessed by MASS but not by LAPSS. In comparison with MASS and LAPSS, Med PACS considers seizure, the onset of symptoms, and blood glucose level, but does not take age into account. 14,20,21 OPSS does not consider age and eye symptoms but excludes hypoglycemic and terminally ill patients as well as those under palliative care, those with a transient ischemic attack, and those with a Glasgow coma scale < 10. 29 It is well known that hypoglycemia is a stroke mimic that can easily be differentiated using bedside testing of blood glucose, but this is not considered in CPSS, FAST, ROSIER, and PreHAST. This appears to be an important weak point that increases the number of false-positive stroke diagnoses in the prehospital setting when these tools are used. 9,25,27 History of seizure is considered a negative point in LAPSS, MASS, Med PACS, OPSS, and ROSIER, but not in FAST, CPSS, and PreHAST.
It is known that seizure can occur due to stroke; on the other hand, the post-ictal phase of seizure may mimic stroke. It is therefore very challenging to decide whether to ignore seizure or to assign a negative score to it. 9,10,19 PreHAST is a new tool that was designed based on the National Institutes of Health Stroke Scale (NIHSS) and tries to cover everything, so completing its checklist is time-consuming and also difficult without training. Age, blood sugar level, history of seizures, and the time of symptom onset are not taken into account. In this scale, all four limbs are examined, so generalized or symmetric weakness can lead to a false-positive decision. In general, excluding those with a history of seizures and those younger than 45 years can cause adverse events, as stroke can also occur in young people, and seizures can be a symptom of a stroke. 11,15 ROSIER assigns negative scores to seizure and syncope in order to better differentiate stroke from stroke mimics; in addition, by including "new onset of symptoms", it helps differentiate new stroke cases from old ones. 13,27
The key point that should be noted regarding the present study is that the instruments were only compared to a gold standard, namely MRI, and their effectiveness in dealing with patients on the scene may differ from the reported findings for many reasons. For example, the level of knowledge and experience of the emergency medical technicians (EMTs) in this field is very important, and a specific scale may not be useful if it is difficult to apply on the scene. Therefore, in future studies, the efficacy of these tools should be examined at the time of dealing with patients in the pre-hospital setting. It will also be understandable if different results are achieved in different communities, as the level of knowledge and educational backgrounds of EMTs vary between countries.
Overall, the authors of the present article believe that imposing age restrictions might lead to missing young individuals with stroke, who in fact benefit more from treatment than older patients. Dealing with seizure is very challenging, as it can either occur due to stroke or be followed by stroke-like symptoms. Monitoring the patients' blood sugar level is definitely important and should be performed as part of routine vital signs examinations so that hypoglycemia cases can be easily identified and excluded. A previous history of stroke can largely affect the findings of physical examination. It might not lead to false-negative results, but it will probably increase the number of false-positive cases. The knowledge and skill of EMTs affect the findings of physical examinations; although increasing the number of items that should be considered might increase the accuracy of the screening tool, it might make the evaluation more difficult for EMTs and consequently hinder the desired outcome. However, the use of calculators and telemedicine can be helpful in this regard.
This study was conducted retrospectively on registered data and was not conducted in the field. On the other hand, the strengths of the work are the number of registered patients as well as a multicenter approach, which adds to its reproducibility.
In conclusion, based on the findings of the present study, highly sensitive tests that can be used in this regard are CPSS, FAST, and Med PACS, all of which have about 95% sensitivity. On the other hand, none of the studied tools were desirable (specificity above 95%) in any of the examined cut-offs.

Authors' Contribution
AB, SK and PS: The conception and design of the work. SK, SM and FH: Data acquisition. HR and AB: Analysis and interpretation of data. AB, HR and SK: Drafting the work. PS, SM and FH: Revising it critically for important intellectual content. All the authors approved the final version to be published and agree to be accountable for all aspects of the work in ensuring