Obstetric triage systems: a systematic review of measurement properties (Clinimetric)

Background Since labor and delivery units often serve as emergency units for pregnant women, the use of obstetric triage systems with poor or inadequate quality can lead to unintended consequences such as over and under-triage and so a waste of humans and financial resources. Therefore, this systematic review was conducted to evaluate the measurement properties of obstetric triage tools. Methods PubMed, EMBASE, and Medline were searched to identify studies in October 2018 and were updated in May 2019. The risk of bias COSMIN checklist was used to evaluate the quality of the studies. The quality of every measurement property was appraised by the update criteria of COSMIN. Evidence quality was judged using the modified GRADE approach. Results A total of 444 studies were retrieved in initial search. Six studies evaluating 4 tools were included in this study. All the included studies reported only content validity and reliability. The quality of evidence varied from very low to moderate. The quality of content validity and reliability of the included tools was sufficient except for the reliability of the maternal-fetal triage index. The obstetric triage acuity scale (OTAS) was found to have higher reliability than other tools. Conclusions Due to insufficient evidence, the conclusions about the quality of measurement properties of each obstetric triage tool may be uncertain. This review emphasizes the necessity for further studies with robust methodological quality on the measurement properties of obstetric triage tools.


Background
Triage is the process of prioritizing patients based on the acuity of the problem to take the best treatment in the shortest possible time [1]. It includes brief and focused assessment and patient allocation to an acuity level, which determines the length of time a patient can safely wait for therapeutic screening examination and treatment [2]. The first time triage was used to prioritize medical care during the Napoleonic Wars in the late eighteenth century [3]. Later in the 1950s, triage was introduced in the United States as an answer to the problem of overcrowding in the emergency department (ED) of hospitals. Emergency departments needed structured triage guidelines to implement this process, so several countries have designed and presented different triage systems [4,5]. The American College of Emergency Physicians (ACEP) also emphasizes supporting triage systems and believes that the quality, safety, and efficiency of patient care processes will be improved by implementing standardized emergency department triage acuity tools [4,6,7].
From 1986 to 2010, labor and delivery units that serve as emergency units for pregnant women triaged pregnant women based on standardized emergency acuity scales such as the Canadian Triage Acuity Scale (CTAS) or the Emergency Severity Index (ESI) [2,8,9]. Over time, studies have shown the limited applicability of these scales in obstetric triage [9][10][11][12] due to the following reasons. Firstly, obstetric triage is beyond the concise estimate and entails a thorough assessment of the mother and fetus [13]. Secondly, the acuity determinants do not reflect the variation of pregnancy manifestations or the specialized needs of obstetric patients [9]. These reasons led to the development of the first obstetric triage acuity scale in 2010 [14]. Since then, different obstetric triage acuity scales have been developed and validated, including the Florida Hospital System (FHS), Obstetric Triage Acuity Scale (OTAS), Maternal-Fetal Triage Index (MFTI), and Birmingham Symptom-Specific Obstetric Triage System (BSOTS) [6,11,15,16].
However, Angelini and LaFontaine showed that the use of an acuity or risk stratification scale specific to obstetric triage is one of the components of the best obstetric triage model [2]. Nevertheless, the use of obstetric triage systems with poor or inadequate quality can lead to unintended consequences, such as "under or over-triage" and so a waste of human and financial resources. Thus assessing the measurement properties of designed tools seems essential [17,18]. In the meantime, although it is recommended that instruments used in clinical settings be systematically evaluated for measurement properties [19], no evaluation has so far been carried out on the quality of the measurement properties for obstetric triage tools. Therefore, this systematic review was conducted to assess the quality of measurement properties of the existing obstetric triage tools based on the consensus-based standards for the selection of health measurement instruments (COSMIN) checklist. This checklist describes validity [content validity, construct validity (structural validity, hypotheses testing, cross-cultural validity), and criterion validity], reliability (internal consistency, reliability, measurement error), and responsiveness [20]. The COSMIN was developed for use in studies on the quality of patient-reported outcome measures (PROMs), but it can also be used for clinician-reported outcome measures.

Study type
The present study was the first systematic review conducted to evaluate the measurement properties of obstetric triage tools. This study was designed in line with the recommendations of the COSMIN initiative and reported in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) statement [20,21].

Eligibility criteria
Eligibility criteria for studies in this review included articles that have developed an obstetric triage tool or at least had reported one measurement property as defined in the COSMIN taxonomy related to triage in pregnant women [22]. Language and time restrictions were not considered in the search strategies. Studies were excluded if the tools were used for other purposes such as measuring outcomes or validating another tool. Editorials and conference abstracts were excluded.

Literature search strategy
A systematic literature search was conducted between 13-23 th October 2018 by two independent reviewers using the following electronic databases: PubMed, Medline (Ovid), and Embase. Search strategies included both free text words and subject headings and a sensitive search filter for measurement properties, which is available for PubMed, Medline, and EMBASE in the COS-MIN website (See Table 1) [23,24]. To ensure access to all available articles, a medical librarian searched the databases. The search was updated on 26 May 2019 to verify new publications. A total of 617 abstracts were retrieved as follows: PubMed = 141, Medline = 161, Embase = 308, and Grey literature = 7, of which173 duplicate abstracts were deleted.

Study selection
Abstracts and full texts were independently evaluated by two of the authors (AM and MI) and were selected if they had inclusion criteria. Disagreements about choosing an article were discussed, and discordance between the two reviewers was consulted with the third author. The selection process is shown in the flowchart (Fig. 1). All references of the included articles were searched to achieve additional related studies.

Data collection process and data extraction
The general characteristics for each study including design, the purpose, target population, and sample size of the study, and instrument characteristics, measurement properties, and methodological quality were extracted by the same two independent reviewers as previously mentioned (AM and MI).

Methodological quality assessment of the studies
The COSMIN Risk of bias checklist was used to assess information about the measurement properties and the methodological quality of every single study. This checklist contains the 10 boxes including, PROMs development, .ti,ab.) or (generaliza* or generalisa* or concordance).ti,ab. or (intraclass and correlation*).ti, ab. or (discriminative or factor analysis or factor analyses or dimension* or subscale*).ti,ab. or (multitrait and scaling and (analysis or analyses)).ti,ab. or (item discriminant or interscale correlation* or error or errors).ti,ab. or (variability and (analysis or values)).ti,ab. or (uncertainty and (measurement or measuring)).ti,ab. or (sensitiv* or responsive*).ti,ab. or ((minimal or minimally or clinical or clinically) and (important or significant or detectable) and (change or difference)).ti,ab. or (small* and (real or detectable) and (change or difference)).ti,ab.) and (triage or acuity or patient flow).ti,ab. and (obstetric or pregnancy).ti,ab. Fig. 1 The PRISMA flow diagram for an overview of the study selected content validity, structural validity, internal validity, crosscultural validity (measurement invariance), reliability, measurement error, criterion validity, hypotheses testing for construct validity, and responsiveness [20,25]. Each box of this checklist contains 3 to 35 standards referring to design requirements and preferred statistical methods [25]. Each item was rated using a 4-point rating scale (very good, adequate, doubtful, and inadequate). The overall rating of the quality of each study was determined using the lowest grade of any standard in the box was taken (i.e., "the worst score counts" principle). This quality could rate as very good, adequate, doubtful, or inadequate [20,26] (See the attached supplement file).

Quality of single studies on measurement properties
The result of a single study on a measurement property was rated against the updated criteria for good content validity and other measurement properties in the COS-NIM. Each issue was rated as either sufficient (+), insufficient (−), or indeterminate (?) [20,21] (See the attached supplement file). The updated criteria for measurement properties in the COSMIN are listed in Table 2.
Summarizing the evidence and grading the quality of the evidence The results of all available studies were qualitatively summarized to determine overall rating measurement properties as sufficient (+), insufficient (−), or inconsistent (±) based on guidelines for determining measurement properties in the COSMIN manual [20,21]. Then, the quality of the evidence was ranked using a modified GRADE approach in a way, high, moderate, low, and very low [26].

Systematic literature search
Six hundred and seventeen abstracts related to obstetric triage were found in search of 3 databases. After comparing the achieved results, 173 were removed due to duplicate, and the overall 444 abstracts were screened to enter this review. Abstracts were appraised, and 409 were excluded due to a lack of eligibility criteria. Thirtyfive full-texts, who had studied the 14 instruments, were assessed for eligibility for the study criteria. Twenty-nine of these studies did not meet the principles for inclusion and were excluded. The reasons for eliminating were not measuring the obstetric triage (10), no reporting the psychometric properties (15), and not rating the psychometric properties (4). As a result, the measurement properties of 4 obstetric triage tools used in 6 articles were evaluated (Fig. 1). Table 3 displayed the characteristics of the included tools in the review. All of these were developed in the latest 8 years (since 2011). The measured construct in 3 tools is obstetric triage, and the other measures the gynecology triage in addition to the obstetric triage. The target population of all instruments is pregnant women, but the Swiss Emergency Triage Scale (SETS) include gynecology patients in addition to pregnant women [29].

Included obstetric triage instruments
All of the tools are clinician-reported outcome measures and use a dichotomous (i.e., yes or no) rating system. MFTI and OTAS tools have 5 levels, but SETS and BSOTS tools have four. Information on the development and validation of the entered tools is described in Table 4. The development process is reported only for BSOTS and MFTI tools. The development group of BSOTS includes researchers and clinicians (obstetrician and senior midwives) who developed tools using the available evidence and consensus statements with the agreement of the local obstetric consultants. The task force that includes 6 nurses, one statistician, and one project manager, developed the MFTI based on the literature review and scenarios of actual pregnant women. The MFTI was validated by three professional groups such as physicians, certified nursemidwives, and nurses, SETS by nurses and midwives, OTAS by nurses only, and BSOTS by midwives only. All measures demonstrated some evidence of validation through the use of a normative study sample sizes. All studies reported using professionals with work experience except OTAS studies that reported nothing about work experience. The SETS study revealed a median of 15 months of work experience for participants. About half of the participants of BSOTS worked 1-2 times a week in the obstetric triage unit, and all of the participants of MFTI had minimal 3 years of experience in triage. Table 5 provides an overview of the quality ratings of the psychometric studies of all tools, which were evaluated against the standards of COSMIN risk of bias checklist. The development process is reported only in two BSOTS and MFTI tools. The methodological quality of tool development was rated inadequate in both studies. Content validity was done only for MFTI. According to the COSMIN checklist in content validity studies, both participants and experts should assess content relevance. In the MFTI study, the evaluation of the quality of relevance study was rated adequate according to participants, while it was rated doubtful based on the views of experts. (See the attached supplement file).  Reliability property was reported for all the studies. The quality of reliability studies of SETS and Smithson's OTAS was found adequate, those of BSOTS and Gratton's OTAS was doubtful, and that of MFTI was inadequate. The reasons for this rating are as follows. The reliability of BSOTS and Gratton's OTAS were calculated using un-weighted kappa, although SETS and Smithson's OTAS used weighted kappa, the weighting scheme was not described. In the reliability study of MFTI, the reliability test conditions were not similar between raters. A rater conducted it on the patient's bedside at arrival time, and another did it based on the patient's electronic health record retrospectively. The first rater was inexperienced about the tool studied, while the second-rater was from the research team and familiar with the tools. Other psychometric properties were not evaluated in any of the studies. (See the attached supplement file).

Evidence synthesis
Since according to the COSMIN guideline, if the quality of a study is rated indeterminate, it can be used from the view of reviewers in the quality rating. The quality evidence for content validity, comprehensiveness, and comprehensibility of MFTI was rated low. Because of the quality of development and content validity of examined studies were rated indeterminate, the rating was based on the view of reviewers. There was moderate-quality evidence for sufficient relevance of MFTI based on one adequate quality study of asking clinicians and a doubtful of asking experts. The insufficient reliability of MFTI was supported by very low-quality evidence ( Table 6).
The quality of evidence reliability based on a sufficient rate for OTAS was moderate, but it was low for SETS and BSOTS. OTAS proved to be the only tool with sufficient reliability (moderate-quality evidence).

Discussion
This systematic review was conducted to identify and evaluate the quality of the measurement properties of obstetric triage tools. Four tools were identified that evaluated obstetric triage fitting the definition used in this review. Three out of these tools were mainly developed for the pregnant population, and one was for both pregnant women and gynecology patients.
The COSMIN taxonomy was applied to guide a comprehensive summary of the measurement properties tools. This taxonomy consists of measurement properties including, internal consistency, reliability, measurement error, content (face validity), construct (hypotheses testing, structural, and cross-cultural validity), and criterion validity, and responsiveness. Since the obstetric triage tool is a formative model, the two measurement properties of internal consistency and structural validity are not relevant [20], but it is expected that other mentioned properties be tested for it. Each included study

Quality of the studies using the COSMIN taxonomy
The COSMIN Risk of bias checklist provides standards about the quality of the studies that examine the measurement properties [20,25]. The overall quality of the studies was mostly (66.7%) from inadequate to doubtful. To evaluate the inter and intra-rater reliability of SETS and to explore the factors associated with an optimal triage N = 40 trained triage professionals, first phase: n = 22 (

OTAS
Smithson [11] To test the interrater reliability and validity of OTAS and to determine the distribution of patient acuity and flow by OTAS level N = 8 triage nurses to test IRR of 110 clinical Scenarios Not reported Gratton [9] To compare the inter-rater reliability (IRR a ) in tertiary and community hospital settings and measure the intra-rater reliability (ITR b ) of OTAS; to establish the validity of OTAS, and to present the first revision of OTAS from the national obstetrical triage working group.  The common reasons for these COSMIN ratings were insufficient sample size, failure to report details of study methodology, lack of clarity in reporting of statistical analysis, and lack similarity of test conditions for the measurement. Some existing deficiencies may be readily improved via more detailed reporting in future studies. Given that the critical appraisal of studies is done based on what was reported in the articles, the COSMIN checklist must be considered in future studies. Content validity is one of the most fundamental features of measurement, and lack of its test can lead to errors in clinical judgment or inaccurate interpretation of assessment results by practitioners [30]. In this review, MFTI is the only obstetric triage tool whose content validity was assessed. Ruhl et al. used the Delphi method for content validation of this tool, which has been introduced as the best method for content validation of triage tools [6,31]. However, the rating of study quality for content validity of MFTI was inadequate for tool development and doubtful to adequate for study relevance. It seems the MFTI may need a modification of the measures to improve evidence. The OTAS, SETS, and BSOTS did not provide any evidence of content validity, highlighting a need for further research on all of the measurement properties of these tools.
Reliability was reported for all measures with variability in the quality of studies ranging from inadequate (MFTI), doubtful (Gratton's OTAS, BSOTS), to adequate (Smithson's OTAS, SETS). Three assessments (MFTI, BSOTS, and Smithson's OTAS) reported inter-rater reliability but did not examine intra-rater. The other two (i.e., SETS and Gratton's OTAS) conducted both inter and intra-rater reliability. Statistical analysis identified as optimal (ICC or weighted kappa) was used in three studies (MFTI, SETS, and Smithson's OTAS). As previously mentioned, the reliability study of MFTI, test conditions were not similar for the measurements.

Overall quality of measurement properties and evidence
The findings showed that the quality of evidence of measurement properties for obstetric triage tools varied from very low to moderate. All obstetric triage tools showed sufficient measurement properties except for the reliability of MFTI.
To measure obstetric triage, the MFTI displayed moderate-quality evidence for sufficient relevance. The evidence quality of its content validity, comprehensiveness, and comprehensibility was low. The missing data for comprehensiveness and comprehensibility subscales was the reason for low overall quality score for content validity of MFTI. Given that the content validity is the first measurement property which to be considered for selecting a tool, the low evidence quality underpinning the content validity of MFTI as the recommended tool by ACOG to measure obstetric triage is worrisome. The failure to report evidence does not mean that the assessment is not valid. Thus, it is recommended that future studies evaluate measurement properties based on the recently developed COSMIN standards.
Sufficient reliability with moderate-quality evidence was observed for OTAS, while sufficient findings based on low-quality evidence were found for SETS and BSOTS. Insufficient results with very low-quality evidence were reported for the reliability of MFTI. Given the limited number of studies on obstetric triage tools and the overall ratings from very low to moderate on their measurement properties, it is evident that more research is needed in the area of obstetric triage.

Strengths and limitations
There are several strengths in this study. The most important of these is using the most recent version of the COSMIN taxonomy to select and evaluate appropriate measurement properties. Using the checklist cause that the methodology quality of each study is individually accounted for when determining the quality of evidence, and interpreting the results. In addition to the changes made to it, this checklist is appropriate for evaluating clinician-reported outcome measures [32]. The second strength of this study is using highly sensitive validated search filters for finding studies on measurement properties that are available in the COSMIN website for PubMed, EMBASE, and Medline (Ovid). Thirdly, this study is the first systematic review that has been conducted on the measurement properties of obstetric triage tools. Fourthly, language and time restrictions were not considered in the search strategies. Lastly, the multidisciplinary presence of people with relevant expertise in the research team is another strength of this study.
The limitations of this study are; the first, the search was conducted in just 3 databases including EMBASE, PubMed, and Medline (through Ovid) because only these highly sensitive validated other databases. Second, grey literature such as conference abstracts was excluded, which may have contributed to selection bias. Finally, the number of included studies in the systematic review was limited, and the conclusion was unclear accordingly, so more studies are recommended to be conducted about the measurement properties of obstetric triage tools.

Conclusion
This systematic review presents an in-depth insight into the current evidence about the measurement properties of obstetric triage tools. There are many knowledge gaps about the validity (i.e., content, construct, and criterion), reliability (measurement error), and responsiveness of obstetric triage tools. The quality of the majority of studies in terms of reliability ranged from doubtful to adequate. Sufficient reliability with overall ratings ranging from very low to moderate into quality evidence was observed for the majority of scales. According to available evidence, OTAS has higher reliability than other tools. However, due to insufficient evidence, the conclusions about the measurement properties' quality of each of the obstetric triage tools may be uncertain. Therefore, this review emphasizes the necessity for further studies on the measurement properties of obstetric triage tools.