Original research

Scent dogs in detection of COVID-19: triple-blinded randomised trial and operational real-life screening in airport setting

Abstract

Objective To estimate scent dogs’ diagnostic accuracy in identification of people infected with SARS-CoV-2 in comparison with reverse transcriptase polymerase chain reaction (RT-PCR). We conducted a randomised triple-blinded validation trial, and a real-life study at the Helsinki-Vantaa International Airport, Finland.

Methods Four dogs were trained to detect COVID-19 using skin swabs from individuals tested for SARS-CoV-2 by RT-PCR. Our controlled triple-blinded validation study comprised four identical sets of 420 parallel samples (from 114 individuals tested positive and 306 negative by RT-PCR), randomly presented to each dog over seven trial sessions. In a real-life setting the dogs screened skin swabs from 303 incoming passengers all concomitantly examined by nasal swab SARS-CoV-2 RT-PCR. Our main outcomes were variables of diagnostic accuracy (sensitivity, specificity, positive predictive value, negative predictive value) for scent dog identification in comparison with RT-PCR.

Results Our validation experiments had an overall accuracy of 92% (95% CI 90% to 93%), a sensitivity of 92% (95% CI 89% to 94%) and a specificity of 91% (95% CI 89% to 93%) compared with RT-PCR. For our dogs, trained using the wild-type virus, performance was less accurate for the alpha variant (89% for confirmed wild-type vs 36% for alpha variant, OR 14.0, 95% CI 4.5 to 43.4). In the real-life setting, scent detection and RT-PCR matched 98.7% of the negative swabs. Scant airport prevalence (0.47%) did not allow sensitivity testing; our only SARS-CoV-2 positive swab was not identified (alpha variant). However, ad hoc analysis including predefined positive spike samples showed a total accuracy of 98% (95% CI 97% to 99%).

Conclusions This large randomised controlled triple-blinded validation study with a precalculated sample size conducted at an international airport showed that trained scent dogs screen airport passenger samples with high accuracy. One of our findings highlights the importance of continuous retraining as new variants emerge. Using scent dogs may present a valuable approach for high-throughput, rapid screening of large numbers of people.

Key messages

What is already known on this topic

  • Previous data suggest that scent dogs can discriminate between samples from individuals infected with SARS-CoV-2 and controls.

What this study adds

  • Scent dogs showed high diagnostic accuracy in a randomised, controlled, triple-blinded validation test with sample size based on power calculations.

  • Scent dogs trained with wild-type SARS-CoV-2 virus also mastered identification of other variants, although less accurately, revealing their robust discriminatory power and indicating a need for continual training to deal with emerging new variants of concern.

How this study might affect research, practice and/or policy

  • Scent dog detection can serve as a prescreening method to save time and resources or even as the sole testing method when other approaches are not yet available—for example, at the early stages of a pandemic.

  • Scent dogs trained to screen SARS-CoV-2 carriers at a public international airport, and other similar mass gatherings, can provide a valuable tool to contain the pandemic.

Introduction

Containment of the COVID-19 pandemic necessitates rapid large-scale identification of infected individuals. Most patients with SARS-CoV-2 disease are either asymptomatic or have only mild symptoms, but can be contagious.1 The test-and-isolate strategy has largely relied on the modern reverse transcriptase-polymerase chain reaction (RT-PCR) technique. Its practicality is hampered by inadequate availability, restricted testing capacity, high costs, long turnaround time, and prolonged positivity after infection.2 3 Rapid screening methods, such as antigen tests, are already in use.4 A fascinating screening strategy consists of detection by trained scent dogs, an approach not confined to laboratories, enabling large sample numbers with results in real time.5 6

Dogs have an extremely sensitive olfactory system: their limit of detection reaches as low as one part per trillion concentrations,7 exceeding the instruments currently available.8 Dogs are presumed to detect distinct volatile organic compounds (VOCs)9 released by their hosts’ metabolic processes in various conditions.5 Indeed, dogs have been reported to identify distinct VOCs elicited by various bacterial, viral and parasitic infections.10–12

During the current pandemic, scent detection dogs have been trained to identify samples from hospitalised patients with COVID-19 (online supplemental table 1).13–17 The preliminary data suggest that dogs can be trained within weeks to detect samples from SARS-CoV-2-infected individuals with an accuracy comparable to standard RT-PCR. However, stronger evidence is needed with power-calculated sample sizes, better defined control groups, and above all, randomised double/triple-blinded research designs including previously unsniffed samples from the actual target population, outpatients. While proof-of-concept studies have been encouraging, scent dogs need to be taken from laboratory settings to real-life conditions.

The scent dog approach appears particularly appealing for screening SARS-CoV-2-infected individuals in public places and among masses of travellers at airports and harbours. In spring 2020, we started training dogs to see whether they could identify samples from SARS-CoV-2-infected individuals and, in the autumn, started operational scent work at the Helsinki-Vantaa International Airport. Here, we present the results of a three-faceted study comprising (1) the dogs’ training, (2) a prospective, randomised triple-blind validation study using four dogs and (3) a real-life prospective study using the same dogs in daily screening of incoming passengers at the airport, comprising a simultaneous scent detection dog test and nasopharyngeal SARS-CoV-2 RT-PCR.

Methods

Study design

We explored whether scent dogs can be trained to identify humans with SARS-CoV-2 infection. The study was conducted at the scent detection dog training centre Wise Nose, Vantaa, Finland; University of Helsinki, Finland (Veterinary Faculty and Departments of Equine and Small Animal Medicine and Veterinary Biosciences and the DogRisk/Helsinki One Health and Medical Faculty, Department of Virology) and Meilahti Vaccine Research Centre, MeVac and Helsinki University Hospital Laboratory (HUSLAB), Helsinki University Hospital (HUH), Finland.

At the Helsinki-Vantaa International Airport, the study was conducted in a specifically designed cubicle (figure 1A) built at the arrivals terminal. The cubicle setting was used in the dogs’ final training, in the validation and as an area where the dogs screened incoming travellers in the real-life study. The cubicle had three sampling rooms for passengers’ skin swab sampling (figure 1B), a working area for the dogs and sliding hatches on walls for samples and tracks (figure 1C).

Figure 1

The purpose-built cubicle at the Helsinki-Vantaa International Airport. (A) The cubicle from the outside with the doors into the three sampling rooms. (B) Sampling room with a hatch for handing in the sample for the scent detection dog test. (C) A room for scent detection dog testing, showing two of the three hatches to the right. (D) White Shepherd, E.T., inside the test room, indicating the sample in the middle (No 2) as positive. During the validation, only three of the five scent track holes had cans with samples.

Patient and public involvement

The participants were recruited from among the following groups: (1) inpatients in HUH, (2) outpatients and healthy individuals who were contacted by telephone or who contacted the study team in response to advertisements posted at PCR testing stations around Helsinki and (3) incoming flight passengers and personnel at Helsinki-Vantaa International Airport. For all inpatients and outpatients, a recent RT-PCR result was available at the time of recruitment. At the airport, recruited passengers were tested concomitantly by RT-PCR and a scent detection dog. Those with a pretravel negative RT-PCR test less than 72 hours old were not retested. For inclusion of airport employees, an RT-PCR test result within 72 hours was required.

All the volunteers gave written informed consent. They completed questionnaires on demographics, symptoms and PCR test results. If their results were not available at the time of form filling, these were obtained later. In addition, electronic medical records of hospitalised patients and personal interviews were used, when needed.

Individuals with incomplete questionnaires and/or non-availability of RT-PCR results were excluded. There were no restrictions of age, sex, nationality or concurrent diseases. In addition, samples from individuals who were asymptomatic or had typical COVID-19 symptoms were included in all three arms of the study.

Skin swab sampling and specimen handling

All volunteers collected the skin swabs themselves using a sterile package containing five eight-ply gauze swabs (Mölnlycke Healthcare AB, Göteborg, Sweden). They were instructed to separate the layers of the gauze and use a single layer to swab the skin of their neck, throat area, forehead, and wrists. They swabbed 5–20 gauze samples and, to avoid evaporation and cross-contamination during storage, placed them in the smallest of three plastic zip lock bags (volumes 0.5 L, 1 L and 2 L), each of which they placed inside the next, larger one (Minigrip; Suominen Joustopakkaukset Oy, Ikaalinen, Finland).

The outpatient samples to be used in the training, validation and as spike samples in the real-life cohort were collected from the volunteers at their homes shortly after their RT-PCR tests. The courier left the sampling kit at the front door, and after sampling, returned the samples to the dog training facility. All samples, positive and negative samples separated, were stored in plastic boxes (Smart Store; Orthex Sweden AB, Tingsryd, Sweden) in a dark place at 21°C–23°C until used. Samples with an unknown infection status were stored separately until the status was confirmed. The storage time for the validation samples was 0–5 months.

At the airport, in the sampling room (figure 1B), the gauze swab was placed in a 1 L plastic freezer bag (Pirkka; HP Rani Plast Ab, Teerijärvi, Finland) and then placed inside a metallic stainless steel can (85 mm high and 70 mm in diameter). Four extra swabbed gauze samples were placed in a 0.5 L plastic zip lock bag (Minigrip; Suominen Joustopakkaukset Oy, Ikaalinen, Finland), enclosed in a 1 L zip lock bag and stored as previously described.

For each validation session, the scent tracks with samples were prepared in a separate location at the airport on the validation day. Sixty cans were prepared for each dog as follows: the zip lock bag containing the sample was opened, and a single gauze was transferred with sterile metallic tweezers into a can lined with a 1 L plastic freezer bag. To avoid sample odour contamination, the positive and negative samples were prepared on separate tables. The cans were loaded onto the scent tracks according to a computer-generated randomisation list, and a trolley carrying the tracks was transported to the cubicle.

The study team used adequate personal protective equipment, including a mask and powder-free nitrile gloves, when handling the study specimens.

Confirmation of a COVID-19 diagnosis

A COVID-19 diagnosis was based on a positive RT-PCR of a nasopharyngeal swab. Discrepancies in the validation results (at least two dogs giving a response different from that of the RT-PCR test) and in the real-life study were resolved by serum SARS-CoV-2 antibody test analysed by nucleocapsid protein and Spike IgG enzyme immunoassays, as described previously.18 When serum samples could not be obtained, viral load, symptoms and information about SARS-CoV-2 exposure were used to estimate the infection status.

The variant status of SARS-CoV-2 was determined by the SYNLAB for Diagnostics Centre of the Hospital District of Helsinki using data of S gene target failure (SGTF) for the alpha variant, HUSLAB using TaqPath COVID-19 PCR (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and TipMolBiol (Berlin, Germany) using N501Y mutation RT-PCR, which detects alpha or beta variants. The initial RT-PCR samples were subjected to genomic sequencing and bioinformatics analysis as previously described.19 Based on epidemiological data,20 all samples obtained before 6 January 2021 were considered ‘wild-type,’ with reference to the D614G Wuhan-like strain.

Animals

All the dogs included in this study had previous experience of scent work (table 1).

Table 1

The dogs’ characteristics, working history and indication behaviours

Part I: Dog training

The initial training, aiming to provide the dogs with a clear scent picture of COVID-19, was carried out by skilled canine scent detection trainers using operant conditioning, with a clicker and treats used for positive reinforcement. In brief, first, the dogs were exposed to cans containing positive samples and taught to indicate a can with a positive sample. Second, they were introduced to a negative sample in parallel with a positive sample to allow scent discrimination. Third, the number of negative and positive samples was increased in the scent track to reinforce discrimination between positive and negative samples. Finally, confounding samples, including samples obtained from volunteers with other respiratory and viral diseases and samples from children, seniors or individuals with underlying diseases, such as asthma or allergies, cancer or diabetes, were introduced as controls. Once the dog and dog handler pair achieved a success rate of higher than 80% in detecting SARS-CoV-2 positive samples, the dog continued its training at Helsinki-Vantaa International Airport in the purpose-built working cubicle described earlier (figure 1A–D). The training was performed during a period when novel virus variants had not yet emerged in Finland.

The training was conducted using two different types of purpose-built metallic scent tracks each with either five or nine holes for the cans and/or triangular shaped metallic single can-holders. Cans used for positive samples were not mixed with cans used for negative samples. The cans and can-holders were washed in an industrial dishwasher at a temperature of around 85°C between every exercise.

Part II: Triple-blinded validation study at Helsinki-Vantaa International Airport

The validation study was conducted according to the Helsinki University triple-blind validation protocol, as described in detail in the online supplemental notes; for the design and execution see figure 2. In total, six investigators and one external controller were present at all validation sessions. Prior to the first validation day, the validation team and the dogs were familiarised with the study conditions and the protocol in a rehearsal session identical to a validation session, here also introducing 60 novel samples. This was followed by seven validation sessions (VAL1–7). In these sessions, four parallel samples from each of the 420 individuals were randomised (samples: n=1680; dogs: n=4) into tracks of three samples, with 20 tracks in each of the seven sessions. Thus, each dog was exposed to 140 scent tracks. To allow comparisons between dogs, all four dogs received an identical set of samples, and each parallel sample was used only once and for only one dog. The samples were assigned to the sessions (VAL1–7) in chronological order, the ones collected first in the VAL1 session and last in the VAL7 session. The order of the sessions was different for each dog. The dates of the sample collection and the order of the validation sessions per dog are shown in online supplemental table 2.
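The track composition described above can be checked with a short sketch. This is not the computer-generated randomisation list used in the study (the actual protocol is in the online supplemental notes). It assumes that each of the 114 positive samples sits alone in its own track, which is what the counts imply, since the Results report 26 of the 140 tracks as containing no positive. The function name, seed and sample identifiers are illustrative, and the chronological assignment of samples to sessions is ignored for simplicity.

```python
import random

def build_tracks(n_pos=114, n_neg=306, track_size=3, n_sessions=7, seed=2021):
    """Return sessions -> tracks -> (sample_id, is_positive) tuples.

    Assumption (not taken from the protocol): at most one positive per track.
    """
    rng = random.Random(seed)
    n_tracks = (n_pos + n_neg) // track_size          # 420 / 3 = 140
    positives = [(f"P{i:03d}", True) for i in range(n_pos)]
    negatives = [(f"N{i:03d}", False) for i in range(n_neg)]
    rng.shuffle(positives)
    rng.shuffle(negatives)

    # Choose which tracks carry a positive; the rest are all-negative.
    has_positive = [True] * n_pos + [False] * (n_tracks - n_pos)
    rng.shuffle(has_positive)

    tracks = []
    for flag in has_positive:
        track = [positives.pop()] if flag else []
        while len(track) < track_size:
            track.append(negatives.pop())
        rng.shuffle(track)                            # random position within the track
        tracks.append(track)

    per_session = n_tracks // n_sessions              # 140 / 7 = 20
    return [tracks[i:i + per_session] for i in range(0, n_tracks, per_session)]

sessions = build_tracks()
```

Under this assumption the arithmetic closes exactly: 114 tracks with one positive and two negatives plus 26 all-negative tracks use 114 × 2 + 26 × 3 = 306 negatives.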

Figure 2

Triple-blinded study. Assistant A gives the track through a hatch in the wall to assistant B, who places it on the floor and, after the dog and dog handler C have completed their work, gives it to assistant G. The dog handler C announces the result to data recorder D, who instructs whether to reward the dog. The external evaluator E and assistant F follow the setup from a video screen (four cameras inside the cubicle) and verify the triple-blinded study conduct. blinded: the dog, handler C, assistants B, E, G. circles: red, SARS-CoV-2 reverse transcriptase-polymerase chain reaction positive; green, negative sample.

The dogs were rewarded for each positive result immediately after the correct indication. If a dog immediately selected the positive sample and skipped sniffing the other samples, the result was still recorded as successful.

The validation stage of the study was recorded using four cameras set up at different angles. A retired police sergeant from the Finnish K-9 police dog school was present during all the validation sessions as an external controller, confirming that the validations followed the predetermined protocol.

Part III: Real-life cohort

The operational activity at Helsinki-Vantaa International Airport took place between 23 September 2020 and 30 April 2021. In total, 10 119 travellers (83.2%) and airport employees (16.8%) took part in the scent detection dog test, resulting in 48 (0.47%) samples indicated as positive. Some of these participants were recruited to the validation or the real-life study.

For collection of the skin swabs, see description above. The can with the sample was passed through a hatch to an assistant in the dogs’ working space (figure 1C). The dog handler then placed the can in the five-hole scent track, together with a variable number of control samples (figure 1D). The dog handler interpreted the dog’s indications as positive or negative for SARS-CoV-2 and a written test result was given to the participant.

Statistical analysis

The power calculation suggested a minimum of 108 RT-PCR positive and 108 negative samples to achieve sensitivity (Se) and specificity (Sp) of 90%. This sample size was expected to have an 80% probability of obtaining an estimated Se and Sp of which the lower bound of the 95% CI would be greater than the minimal value of 80% (calculated using https://www.stat.ubc.ca/%7Erollin/stats/ssize/b1.html).21
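As a rough plausibility check of this sample size, the probability that the lower 95% confidence bound exceeds 80% can be computed exactly from the binomial distribution. The sketch below assumes a Wilson score interval, which may differ from the method behind the cited online calculator, so it is only expected to land near the stated 80%, not to reproduce it. All function names are illustrative.

```python
from math import comb, sqrt

Z95 = 1.959964  # two-sided 95% normal quantile

def wilson_lower(successes, n, z=Z95):
    """Lower bound of the Wilson score interval for a proportion."""
    p_hat = successes / n
    centre = p_hat + z * z / (2 * n)
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - half) / (1 + z * z / n)

def prob_lower_bound_exceeds(n, true_p, floor):
    """P(lower CI bound > floor) when the true proportion is true_p."""
    return sum(
        comb(n, k) * true_p**k * (1 - true_p) ** (n - k)
        for k in range(n + 1)
        if wilson_lower(k, n) > floor
    )

# 108 samples, assumed true Se (or Sp) of 90%, minimal acceptable value 80%
power = prob_lower_bound_exceeds(n=108, true_p=0.90, floor=0.80)
print(f"P(lower bound > 80%) = {power:.2f}")
```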

Se and Sp were calculated according to Trevethan.22 To cover instances where the dogs directly marked a positive sample and skipped sniffing one or both of the other samples in the same track, Se and Sp were calculated using two separate methods. First, we calculated the Se and Sp only for those samples the dogs truly sniffed (Se_sniff and Sp_sniff). Second, we calculated the Se and Sp for all the samples (Se_all and Sp_all). In this approach, we assumed that the dogs considered all unsniffed samples negative, as they left them untouched. Positive predictive values (PPVs) and negative predictive values (NPVs) were calculated based on our data and on hypothetical prevalence scenarios23: 40% reflecting a high prevalence setting such as a pandemic time hospital, and 1% reflecting a low prevalence setting such as an airport.
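The two calculation methods (sniffed samples only versus all samples) can be illustrated with a minimal sketch. The records below are invented toy data, not study results, and the field and function names are hypothetical. Each record holds the RT-PCR status, whether the dog sniffed the sample, and whether the dog indicated it.

```python
from typing import NamedTuple

class Presentation(NamedTuple):
    pcr_positive: bool   # reference standard (RT-PCR)
    sniffed: bool        # did the dog actually sniff the sample?
    indicated: bool      # did the dog mark the sample as positive?

def se_sp(records, include_unsniffed):
    """Sensitivity and specificity against RT-PCR.

    include_unsniffed=False -> only sniffed samples (Se_sniff, Sp_sniff)
    include_unsniffed=True  -> all samples, with unsniffed ones counted
                               as negative calls (Se_all, Sp_all)
    """
    tp = fn = tn = fp = 0
    for r in records:
        if not r.sniffed and not include_unsniffed:
            continue
        called_positive = r.indicated and r.sniffed
        if r.pcr_positive:
            tp += called_positive
            fn += not called_positive
        else:
            fp += called_positive
            tn += not called_positive
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (hypothetical): 4 positives, 6 negatives
toy = [
    Presentation(True, True, True),
    Presentation(True, True, True),
    Presentation(True, True, False),    # sniffed but missed
    Presentation(True, False, False),   # positive left unsniffed
    Presentation(False, True, False),
    Presentation(False, True, False),
    Presentation(False, True, True),    # false indication
    Presentation(False, True, False),
    Presentation(False, False, False),  # negative left unsniffed
    Presentation(False, False, False),
]

se_sniff, sp_sniff = se_sp(toy, include_unsniffed=False)
se_all, sp_all = se_sp(toy, include_unsniffed=True)
```

In this toy set, counting unsniffed samples as negative lowers sensitivity (an unsniffed positive becomes a miss) and raises specificity (unsniffed negatives become correct negatives).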

We also investigated whether some epidemiological/clinical variables were possibly associated with failure to identify positive samples. To do so, we restricted the calculations to include only positive samples. Positive samples were defined as a true positive (TP) if all four dogs correctly marked them, and as a false negative (FN) if even a single dog did not mark the sample. The candidate variables potentially associated with the outcome (FN) were: age, gender, concurrent chronic diseases (asthma, allergy, cancer, diabetes, migraine), presence of typical COVID-19 symptoms, duration of symptoms (time in days between symptom onset and sample collection), time between RT-PCR test and sample collection and type of virus (wild type vs alpha variant). Univariate logistic regression models were fitted and ORs with their corresponding 95% confidence intervals (95% CI) were provided. For quantitative variables (such as age, duration of symptoms and time between RT-PCR test and sample collection), the linearity of the association was investigated using restricted cubic spline (RCS) functions with three knots located at the 5th, 50th, and 95th centiles of the quantitative variable.24 When the association with the outcome (on the log scale) was not considered as linear, the quantitative variable was included in the model with the RCS functions, and ORs were provided for arbitrary values. Because of the low number of FN samples, multivariate models could not be fitted. However, in order to rule out the possible confounding effects of the candidate variables, bivariate logistic regression models were fitted, exploring one by one each candidate variable as a potential confounder together with the variable of interest. ORs were considered significant (type-I error set at 0.05) when their corresponding 95% CI did not include the number 1. To obtain Se and Sp values for wild-type only samples, we used VAL1–2 (where no alpha variants had yet emerged) for the calculations.

SAS University Edition (SAS Institute Inc., Cary, North Carolina, USA) was used for the statistical data analysis.

Results

Participant and sample characteristics

For research design and number of samples in the three studies, see flowchart in figure 3. The particulars of the volunteers who provided samples for the validation and real-life studies are presented in table 2.

Figure 3

Flow chart of the study conduct.

Table 2

Data of volunteers providing skin swab samples with concomitant reverse transcriptase-polymerase chain reaction (RT-PCR) verification

Part I: Dog training

The initial training relied on a positive reinforcement approach and predefined positive and negative samples. Having completed that phase at a training centre (qualifying with sensitivity and specificity exceeding 80%), and before starting operational work, the dogs were further coached in the purpose-built cubicle at the Helsinki-Vantaa International Airport (figure 1).

Part II: Validation

For the validation study, we selected the four dogs working at the airport during the study period. The conduct followed the detailed Helsinki University Scent detection validation protocol (online supplemental notes). Each dog was presented with an identical set of 420 parallel novel randomised samples, including 114 positives (27%) and 306 negatives (73%; table 2), in 140 fixed three-sample tracks, provided over seven validation trial sessions (VAL1–7). Of the 140 tracks, 26 (18.6%) were randomised as not containing positive samples. The session and track order were not disclosed to personnel and varied by dog.

Overall, the diagnostic accuracy of all samples sniffed was 92% (95% CI 90% to 93%). The combined sensitivity (Se_sniff) and specificity (Sp_sniff) for all four dogs were 92% (95% CI 89% to 94%) and 91% (89% to 93%), respectively (for unsniffed samples and positive and negative predictive values, see table 3). Only minor variation was seen between the dogs: the best performance reached 93% (95% CI 85% to 96%) for Se and 95% (91% to 97%) for Sp, and the lowest 88% (80% to 94%) and 90% (85% to 93%), respectively. To obtain Se and Sp values for detecting wild-type samples only, we analysed separately the data from VAL1–2, where no alpha variants had yet emerged: the overall accuracy was 97% (95% CI 95% to 98%), the Se 99% (96% to 100%) and the Sp 96% (93% to 98%). The figures for each dog’s individual validation session are provided in online supplemental table 2.

Table 3

The diagnostic performance of the scent dogs in the triple-blind validation test

Discrepancies (at least two dogs’ results differing from the RT-PCR) were observed for 19/420 samples (table 4), 79% of them in VAL6–7, with samples gathered in late February–March 2021. Eight of the 19 samples were RT-PCR positive. Our re-evaluation of sample status (based on RT-PCR viral load, symptoms, time since symptom onset and antibody data) confirmed all these eight as SARS-CoV-2 positive, of which six were alpha variants, one not known, and one was wild-type. Of the 11 RT-PCR negative samples detected as positive by the dogs, six were confirmed as SARS-CoV-2 negative, four as uncertain and one as a possible positive.

Table 4

Validation participants with a discrepancy between SARS-CoV-2 RT-PCR and the response from two dogs or more

Based on the prevalence rate of COVID-19 positive samples in our data (27%), the overall PPV and NPV were 83.9% (95% CI 80.8% to 86.7%) and 95.8% (94.4% to 96.9%), respectively (table 3). In a population with 40% prevalence, the PPV and NPV were calculated as 87.8% (95% CI 85.3% to 90.0%) and 94.4% (92.4% to 95.8%), respectively. In a population with a prevalence of 1% the PPV and NPV were 9.8% (8.1% to 12.0%) and 99.91% (99.88% to 99.93%), respectively.
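The dependence of the predictive values on prevalence follows from Bayes' theorem. The sketch below uses the rounded overall Se (92%) and Sp (91%), so its output is expected to differ slightly from the figures above, which are presumably based on unrounded values. The function name is illustrative.

```python
def predictive_values(se, sp, prevalence):
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = se * prevalence              # true positives per person screened
    fp = (1 - sp) * (1 - prevalence)  # false positives
    tn = sp * (1 - prevalence)        # true negatives
    fn = (1 - se) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.40, 0.01):
    ppv, npv = predictive_values(se=0.92, sp=0.91, prevalence=prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

At 1% prevalence most indications are false alarms even with Se and Sp above 90%, whereas a negative result is almost always correct, which is why a low PPV and a very high NPV appear together.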

Of the 114 positive COVID-19 samples, 30 were FN and 84 were TP. Failure to identify a COVID-19 positive sample was associated with the SARS-CoV-2 variant status (alpha vs wild-type; OR=14.0; 95% CI, 4.5 to 43.4; table 5): the dogs indicated correctly 89% of the confirmed wild-type samples but only 36% of the alpha variant samples. Based on the OR values with the 95% CI, gender, concurrent chronic disease, time between start of symptoms and sampling, time since PCR test and increasing age of patients were found not to be associated with failure to identify a COVID-19 positive sample (table 5). None of the ORs presented in table 5 were modified in bivariate analyses (data not shown).
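For a single binary predictor, the OR from a univariate logistic regression equals the cross-product ratio of the 2×2 table. As a sketch, the variant OR can be recomputed from the counts given in the Discussion (55 of 62 confirmed wild-type and 9 of 25 alpha variant samples correctly indicated). The Wald interval used here is an assumption about how the CI was obtained, and the function name is illustrative.

```python
from math import exp, log, sqrt

def odds_ratio_wald(fail_exposed, ok_exposed, fail_ref, ok_ref, z=1.959964):
    """Odds ratio for failure (exposed vs reference) with a Wald 95% CI."""
    or_ = (fail_exposed * ok_ref) / (ok_exposed * fail_ref)
    se_log = sqrt(1 / fail_exposed + 1 / ok_exposed + 1 / fail_ref + 1 / ok_ref)
    lower = exp(log(or_) - z * se_log)
    upper = exp(log(or_) + z * se_log)
    return or_, lower, upper

# Alpha variant: 25 samples, 9 identified  -> 16 failures
# Wild-type:     62 samples, 55 identified -> 7 failures
or_, lower, upper = odds_ratio_wald(fail_exposed=16, ok_exposed=9, fail_ref=7, ok_ref=55)
print(f"OR {or_:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

With these counts the sketch agrees with the reported OR and interval to one decimal place.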

Table 5

Univariate analysis of associations between variables and failure to identify COVID-19 positive samples

Part III: Real-life cohort

The dog identification and the RT-PCR result matched for 296/303 (97.7%) of the real-life samples of incoming passengers. The dogs correctly identified the samples as negative for 296/300 (98.7%) RT-PCR negative swabs. Table 6 provides details of the seven discrepant results. The dogs indicated three RT-PCR positive cases as negative. After re-evaluation with clinical and serological data, one was judged as SARS-CoV-2 negative, one as SARS-CoV-2 positive and one as a likely postinfectious positive RT-PCR result. Similarly, the dogs indicated four RT-PCR negative cases as positive. These were all judged as SARS-CoV-2 negative.

Table 6

Real-life cohort participants with a discrepancy between SARS-CoV-2 RT-PCR and dog response

To maintain the dogs’ screening skills in this low prevalence (0.47%) setting, a total of 155 novel RT-PCR positive ‘spike’ samples were provided to the dogs during working days (online supplemental table 3). They correctly indicated 98.7% of them as positive. Had the spike samples been calculated as part of the real-life study, the dogs’ performance would have reached a sensitivity of 97% (95% CI 92% to 99%) and a specificity of 99% (96% to 100%).

In the operational real-life setup, we also used five non-validated dogs. Their results closely accorded with those of the validated dogs (data not shown).

Discussion

Our study demonstrates that, in comparison with RT-PCR, scent detection dogs can be trained to identify SARS-CoV-2-infected individuals from skin swab samples with high diagnostic accuracy. In our real-life setting with a very low prevalence, the performance in identifying negative samples was very good (98.7%). Unfortunately, because of a low number of confirmed positive cases, accuracy with respect to positive samples could not be reliably assessed. However, ad hoc analysis also calculating the positive spike swabs showed a real-life performance of 98.7% for detecting positive samples. Below we discuss separately each of the three parts of the study.

Part I: Training of dogs

To keep the training short, we used dogs with previous scent work experience. Unlike studies conducted only in laboratory settings,13–17 we included two phases: initial training at a training centre, and—once the dogs were qualified—training in a challenging environment (Helsinki-Vantaa International Airport). Only one of nine dogs did not show high motivation for working in the test cubicle.

Part II: Validation test

Our validation experiment showed a high diagnostic accuracy with 92% sensitivity and 91% specificity. Several previous studies suggest that scent dogs can distinguish between samples from SARS-CoV-2-infected and uninfected individuals (reviewed in online supplemental table 1). However, although they demonstrate the dogs’ diagnostic accuracy, these previous proof-of-concept studies13–17 had some limitations: small sample sizes,13 16 17 repeated use of the same samples,15 use of inactivated samples,13 14 16 use of empty cans or clean gauze swabs as controls15 16 and conducting validation tests only in laboratory settings.13–17 Perhaps even more importantly, in those studies almost all samples were collected from hospitals,13 15–17 failing to cover the actual target population, outpatients. Alternatively, positive samples were from hospitals and controls from outside hospitals, potentially misleading the dogs to identify hospital-associated odours as positive cues. Indeed, scent dog guidelines advise watching out for systematic differences between positive and control samples.25 Apart from these published proof-of-concept studies, some non-peer reviewed preprints provide further data26: Guest et al collected 400 odour samples from patients with asymptomatic or mild COVID-19 and demonstrated in a randomised double-blind trial under laboratory conditions with six dogs a sensitivity range of 82%–94% and a specificity range of 76%–92%.

Based on the overall Se and Sp (92% and 91%, respectively), we calculated PPV and NPV according to two infection probability scenarios reflecting the prevalence of 40% and 1%. For a population with a prevalence of 40%, we estimated a PPV of 87.8% and an NPV of 94.4%. This means that the information provided by the dog (marking or not marking) increases the chances of detection to around 90%. For a population with a prevalence of 1%, by contrast, we estimated a PPV of 9.8% and an NPV of 99.9%. In both scenarios, high NPV supports the use of dogs for screening to exclude individuals not needing RT-PCR. We therefore suggest that dogs could be used both in sites of high SARS-CoV-2 prevalence, such as hospitals (to prescreen patients and personnel), and in low prevalence sites such as airports or ports (to prescreen passengers). Such prescreening could save considerable time and PCR testing resources.

Our study design overcomes some of the major limitations of the previous studies: our sample size was based on a power calculation, our validation experiment was conducted outside laboratory conditions and our samples were collected in a random fashion as four parallel swabs, each used only once. We collected positive and negative swabs from both asymptomatic and symptomatic outpatients, children and seniors, and those with non-communicable diseases, and included samples collected during early and late phases of the disease. Unlike the previous validation studies, we randomly included tracks with no positive samples. This better mimics the real-life situation in low-prevalence settings.

In univariate analysis, the only variable strongly associated with failure to identify COVID-19 positive samples was the alpha variant (table 5). Indeed, in accordance with the epidemiological situation in our country,20 the virus variants started to emerge only at the end of our validation sample collection period, with 59% of the positive samples in VAL6–7 confirmed to represent virus variants (mostly alpha) and only 3% the wild-type virus (the virus type of the others remained unknown). Importantly, the dogs had only been trained to detect samples of patients infected by the wild-type virus. The emergence of the new variants presumably explains the less successful performance by the dogs towards the end of the study period. In the multivariate analyses, after adjustment for all other variables possibly associated with the dogs’ performance, the association between variant type and detection failure remained as strong as in the univariate analysis. Naturally, we cannot rule out confounding effects of variables other than the ones we investigated. Interestingly, Guest et al also had a small number of alpha variant samples in their dataset. Their dogs correctly identified 38/48 (79%) of the alpha variant samples, a rate lower than that for the wild-type virus.26 The difference was not significant, yet as their study was not designed to investigate the variants, it might have lacked statistical power. In our investigation the difference was highly significant according to the OR and its 95% CI, as the dogs correctly indicated 55/62 (89%) of the confirmed wild-type samples, but only 9/25 (36%) of the alpha variant samples. Thus, while the dogs did indicate some of the alpha variant samples, their performance was lower than with the wild-type virus. This observation is remarkable as it demonstrates the scent dogs’ robust discriminatory power. The obvious implication is that training samples should cover all epidemiologically relevant variants. Our preliminary observations suggest that dogs primed with one virus type can in a few hours be retrained to detect its variants (data not shown).
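The OR for the variant comparison can be approximately reproduced from the counts quoted above. The following is a minimal sketch, assuming a 2×2 table built from the 55/62 wild-type and 9/25 alpha indications and a Woolf (log-scale Wald) interval. The function name `odds_ratio_ci` is illustrative, and the interval method is an assumption rather than a statement of the procedure used in the original analysis.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a, b: correct and missed indications in group 1
    c, d: correct and missed indications in group 2
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts from the text: wild-type 55 of 62 indicated, alpha 9 of 25 indicated.
wild_hit, wild_miss = 55, 62 - 55
alpha_hit, alpha_miss = 9, 25 - 9

or_est, or_lo, or_hi = odds_ratio_ci(wild_hit, wild_miss, alpha_hit, alpha_miss)
print(f"OR {or_est:.1f} (95% CI {or_lo:.1f} to {or_hi:.1f})")
```

Under these assumptions the sketch gives values close to the reported OR of 14.0 (95% CI 4.5 to 43.4). The width of the interval reflects the small number of alpha variant samples.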

Another aspect to discuss is the low number of asymptomatic sample donors, which could have hampered the evaluation of the scent dogs’ performance with samples from such individuals. In fact, the performance related to asymptomatic subjects is of particular importance, since in a real-world screening most individuals are asymptomatic. However, as we collected four samples from each of the sample donors, we ended up with 28 tests with samples from asymptomatic individuals. Only one was incorrectly identified as negative and two were left unsniffed. Thus 25/28 (89.3%) were correctly identified as positive. In our analysis lack of symptoms was not associated with poorer performance.
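To illustrate how much uncertainty remains with only 28 tests, the proportion above can be given an approximate confidence interval. This is a sketch only: the Wilson score interval is our assumption for a small-sample binomial proportion, the helper name `wilson_interval` is illustrative, and the interval treats the 28 tests as independent even though four parallel samples came from each donor.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Counts from the text: 28 tests on samples from asymptomatic donors,
# 1 incorrectly identified as negative and 2 left unsniffed.
total_tests = 28
correct = total_tests - 1 - 2  # 25 correctly indicated as positive

proportion = correct / total_tests
low, high = wilson_interval(correct, total_tests)
print(f"{correct}/{total_tests} = {proportion:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Because the parallel samples from one donor are unlikely to be fully independent, the true uncertainty is probably somewhat larger than this interval suggests.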

Finally, since dogs may become tired or unfocused when working long hours, we ran the validation tests randomly in varying order over seven working days for each dog. As in previous studies,14 15 not all dogs performed equally. The differences were surprisingly small, however, particularly considering that the dog with the lowest results, E.T, was diagnosed with parotitis during her validation study, and her less successful days prior to diagnosis were also included in her data. The low inter-dog variability observed in our study most probably originates from the consistent high-quality training performed both in the training centre and at the airport.

Part III: real-life air passenger study

While the validation experiment was successful, the real-life study at the airport met with some adversity. Although the dogs identified 98.7% of the negative samples as negative, they indicated four RT-PCR negatives as positives and did not identify three RT-PCR positive cases. Re-evaluations (the time span between symptom onset and sampling for RT-PCR, clinical symptoms, viral loads as cycle threshold counts and SARS-CoV-2 antibodies) of the RT-PCR positive cases suggested, however, that only one of the three represented the targeted group of early and potentially infectious cases. One of the RT-PCR positive individuals failed to seroconvert, suggesting a false-positive RT-PCR result. The sample of another RT-PCR positive individual had been collected as late as 10 days after symptom onset, presumably indicating a postinfectious positive RT-PCR result. Interestingly, the virus in the single verified case was identified as an alpha variant, possibly reflecting the dogs’ lower sensitivity for it.
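The agreement figure for negative samples follows directly from the counts in this paragraph. The sketch below is a simple arithmetic check, assuming the 303 screened passengers stated in the abstract; the figure of 300 RT-PCR negatives is inferred here as 303 minus the three RT-PCR positives, and the variable names are illustrative.

```python
# Counts taken or inferred from the text. The 300 RT-PCR negatives are an
# inference: 303 screened passengers minus 3 RT-PCR positive cases.
passengers = 303
pcr_positive = 3
false_positive = 4        # RT-PCR negative but indicated by the dogs
missed_positive = 3       # RT-PCR positive but not indicated by the dogs

pcr_negative = passengers - pcr_positive
true_negative = pcr_negative - false_positive
true_positive = pcr_positive - missed_positive

negative_agreement = true_negative / pcr_negative
print(f"Negative agreement: {true_negative}/{pcr_negative} = {negative_agreement:.1%}")
print(f"RT-PCR positives indicated: {true_positive}/{pcr_positive}")
```

With no RT-PCR positive case indicated, sensitivity cannot be estimated meaningfully from the passenger samples alone.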

A major difference between the real-life and validation studies was seen in the rate of positive samples, which was over 50-fold lower in the real-life study than in validation (0.47% vs 27% of all samples); that is, the dogs would have had at most one positive sample to sniff each week. Anticipating a low prevalence, we kept up the dogs’ skills by providing them with a total of 155 novel (not sniffed by any dog), positive ‘spike’ samples over their shifts. Had these spike samples been included in the real-life study, the prevalence would have been 34%, not differing from that in the validation study, which supports the method’s potential for screening SARS-CoV-2 carriers. As with any other diagnostic or screening test, positive controls are needed to validate accuracy. With the dogs, these spike samples serve as controls and also act as rewards, reinforcing the detection. In a low-prevalence setting, the use of spike samples needs to be planned before scent dogs are deployed in operational work. Of note, collection of spike samples from patients may no longer be needed in the future, as preliminary data suggest that spike material can be produced in the laboratory.27
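The prevalence figures in this paragraph can be checked with simple arithmetic. The sketch below uses the sample counts from the abstract (114 positive of 420 validation samples, 303 passenger swabs); how the three RT-PCR positive passengers enter the 34% figure is not stated, so both variants are computed as an assumption, and all variable names are illustrative.

```python
# Figures quoted in the text and abstract.
validation_positive, validation_total = 114, 420
airport_prevalence = 0.0047                 # 0.47%
spike_samples = 155
passenger_swabs, passenger_pcr_positive = 303, 3

validation_prevalence = validation_positive / validation_total
fold_difference = validation_prevalence / airport_prevalence

# Assumption: spike samples are added to both numerator and denominator.
pooled_total = passenger_swabs + spike_samples
spiked_excluding_passengers = spike_samples / pooled_total
spiked_including_passengers = (spike_samples + passenger_pcr_positive) / pooled_total

print(f"Validation prevalence: {validation_prevalence:.1%}")
print(f"Fold difference vs airport prevalence: {fold_difference:.0f}")
print(f"Prevalence with spikes: {spiked_excluding_passengers:.1%} "
      f"to {spiked_including_passengers:.1%}")
```

Either variant rounds to roughly 34%, so the ambiguity about the three passenger positives does not affect the comparison with the validation study.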

Limitations of the study

Some limitations deserve to be discussed. First, scent dogs previously trained to detect other substances such as drugs may also indicate those substances, and the dog handler may falsely record the indication as positive for COVID-19. In this study, samples with false indications were not tested for narcotics or dangerous goods, that is, odours with which the three dogs were previously familiar.

Second, the age of the samples varied. The samples used in training and validation, as well as the ‘spike’ samples, were older than those in the real-life study because they had to be verified before use, whereas in the real-life operational setting the samples were freshly collected and immediately presented to the dogs. We acknowledge that storage might have affected the VOCs.28 Further studies have been started to determine the precise nature of COVID-19-specific VOCs.

Third, the validation test also had some limitations. The low number of positive samples available led to a lack of tracks with multiple positive samples. This should not have had any major effect on the results, as the dogs had practised both with blank tracks and tracks with multiple positive samples.

Finally, since variants had not yet emerged in Finland at the time of training, only wild-type samples were used. Many of the discrepant results were associated with the new variant. In the future, operational skills should be kept up by continuous training with samples of emerging virus variants. Fortunately, once the dogs have received the basic training, retraining to cover new variants is expected to be easy, as discussed above.

Conclusion

Employing a triple-blinded validation study setup, we provided evidence that trained scent dogs can master detection of samples from individuals infected with SARS-CoV-2 with good diagnostic accuracy. Interestingly, as the dogs had been trained using samples only from individuals who had contracted the wild-type virus, their performance declined with samples of the variant era. We also provided some evidence that dogs can be trained to work at an international airport where large-scale rapid screening of crowds in a short period of time is required. In the real-life setting, we verified the results from our validation study for negative samples, but the dogs’ ability to detect positive samples could not be confirmed owing to the low prevalence of positive individuals. Ad hoc analysis that also took into account the positive spike samples, however, yielded convincing accuracy among the real-life cohort.