OO-Do detection

Attacking the Out-of-Domain Problem of a Parasite Egg Detection In-The-Wild

Nutsuda Penpong

Yupaporn Wanna

Cristakan Kamjanlard

Anchalee Techasen

Thanapong Intharah

ITC-CSCC2023

Heliyon (Q1)

Datasets

Video

Parasite egg detection presentation

Abstract

The out-of-domain problem (OO-Do) has hindered machine learning models, especially when the models are deployed in real-world situations. OO-Do occurs at test time, when a trained machine learning model has to make a prediction for input data belonging to a class that was not seen at training time. In this work, we tackle OO-Do in the object detection task, specifically for a parasite egg detection model used in real-world situations. First, we introduce an in-the-wild parasite egg dataset to evaluate the OO-Do-aware model. The in-the-wild parasite egg dataset was constructed by conducting a chatbot test session with 222 Medical Technology students. It contains 1,552 images uploaded through the chatbot, including 1,049 parasite egg images and 503 non-parasite egg (OO-Do) images. Second, to address the issue, we propose a data-driven framework for constructing a parasite egg recognition model for in-the-wild applications. The framework describes how we use publicly available datasets to teach the parasite egg recognition model both in-domain and out-of-domain knowledge. Finally, we compare integration strategies for our proposed two-step parasite egg detection approaches on two test sets: a standard test set and an in-the-wild test set. We also investigate different thresholding strategies for robustness to OO-Do data. In the experiments, we found that cascading a classification model, fine-tuned to be aware of OO-Do, after the object detection model and using Softmax and G-mean thresholding achieved outstanding performance for detecting parasite eggs on both test sets. The framework improved the F1-score over the baselines by 7.37% on the Chula_test + Wild_OO-Do dataset and by 4.09% on the in-the-wild parasite egg dataset.
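The abstract mentions thresholding the classifier's Softmax output with G-mean. The sketch below is a minimal illustration, assuming G-mean refers to the common threshold-selection rule that maximizes the geometric mean of sensitivity and specificity, sqrt(TPR × TNR), over candidate thresholds on a labeled validation set. It also assumes in-domain (parasite egg) images are the positive class. The function names `softmax` and `g_mean_threshold` are hypothetical and are not taken from the authors' code.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one image's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def g_mean_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the threshold that maximizes sqrt(TPR * TNR).

    Assumptions (not from the paper):
      scores: in-domain (parasite egg) probability per validation image,
              e.g. taken from softmax() above.
      labels: 1 for in-domain images, 0 for OO-Do images.
      An image is accepted as in-domain when score >= threshold.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    best_t, best_g = 0.5, -1.0
    # Every distinct observed score is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        tpr = tp / pos if pos else 0.0  # sensitivity
        tnr = tn / neg if neg else 0.0  # specificity
        g = math.sqrt(tpr * tnr)
        if g > best_g:  # strict ">" keeps the lowest threshold on ties
            best_t, best_g = t, g
    return best_t
```

For example, `g_mean_threshold([0.95, 0.9, 0.8, 0.4, 0.3, 0.1], [1, 1, 1, 0, 0, 0])` returns `0.8`, the threshold that accepts all three in-domain images and rejects all three OO-Do images. Because G-mean weighs sensitivity and specificity equally, it tends to avoid thresholds that only look good on the majority class. This property is useful when the in-domain and OO-Do images are imbalanced, as in the 1,049 vs. 503 split above.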

The definition of OO-Do

Differences between out-of-domain test data and out-of-distribution test data, where the training data consists of four classes: English springer, car, parachute, and church.

The out-of-domain problem (OO-Do) differs from the well-known out-of-distribution problem. In this work, we define OO-Do as occurring when test images come from a class outside the training set. The out-of-distribution problem, by contrast, occurs when test images belong to one of the trained classes but come from a different distribution than the training data. For instance, with the four training classes above, a photo of a fish would be OO-Do, whereas a cartoon drawing of a car would be out-of-distribution.

Proposed Data Driven Frameworks

We propose and evaluate two data-driven frameworks. Each framework comprises a data-driven model construction process and a recognition model architecture. The two frameworks differ in the order of the two recognition models: the OO-Do-aware image classification model and the object detection model. This section describes our proposed data-driven steps for both frameworks.

1. Classification-first

For the classification-first framework, we use the OO-Do-aware classification model to screen the input images before passing them to the object detection model.
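The control flow of the classification-first framework can be sketched as follows. This is a minimal illustration, not the authors' implementation. The models are passed in as plain callables. `in_domain_prob` (whole-image in-domain probability), `detect`, `Detection`, and `threshold` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Detection:
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels
    label: str                      # predicted parasite egg class
    score: float                    # detector confidence


def classification_first(
    image: Any,
    in_domain_prob: Callable[[Any], float],    # hypothetical OO-Do-aware classifier
    detect: Callable[[Any], List[Detection]],  # hypothetical object detector
    threshold: float,
) -> List[Detection]:
    """Screen the whole image first; only images judged in-domain reach the detector."""
    if in_domain_prob(image) < threshold:
        return []  # rejected as OO-Do: report no parasite eggs
    return detect(image)
```

One consequence of this ordering is that OO-Do images never reach the detector, which saves computation. However, any in-domain image the classifier wrongly rejects loses all of its detections.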

2. Classification-later

In contrast to the first framework, the classification-later framework places the OO-Do-aware classification model after the object detection model.
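A corresponding sketch of the classification-later framework is shown below. It assumes that the classifier scores each detected region and that rejected regions are dropped. The exact interface between the two models in the paper may differ. `detect`, `in_domain_prob`, `Detection`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    label: str    # predicted parasite egg class
    score: float  # detector confidence


def classification_later(
    image: Any,
    detect: Callable[[Any], List[Detection]],     # hypothetical object detector
    in_domain_prob: Callable[[Any, Box], float],  # hypothetical per-region classifier
    threshold: float,
) -> List[Detection]:
    """Run the detector first, then keep only the detected regions that the
    OO-Do-aware classifier scores as in-domain (assumed per-region interface)."""
    return [
        det for det in detect(image)
        if in_domain_prob(image, det.box) >= threshold
    ]
```

Under this assumption, the classifier acts as a second-stage filter on the detector's output. It can remove individual false-positive boxes without discarding the whole image.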

Experiments and Results

Comparison of different approaches on the Chula_test + Wild_OO-Do dataset for the out-of-domain experiments. All values are percentages. Bold numbers indicate superior results; numbers in the table are marked as top, second, third, or regular.

Comparison of different approaches on the in-the-wild dataset for the out-of-domain experiments. All values are percentages. Bold numbers indicate superior results; numbers in the table are marked as top, second, third, or regular.

Acknowledgements

Thank you to Anchalee Techasen and Thanapong Intharah for their support and advice in completing this research.
The website template was borrowed from Jon Barron.