Attacking the Out-of-Domain Problem of a Parasite Egg Detection In-The-Wild
- Nutsuda Penpong
- Yupaporn Wanna
- Cristakan Kamjanlard
- Anchalee Techasen
- Thanapong Intharah
Video
Parasite egg detection presentation
Abstract
The out-of-domain problem (OO-Do) has hindered machine learning models, especially when they are deployed in real-world situations. OO-Do arises at test time, when a trained model must make a prediction for an input that belongs to a class not seen during training. In this work, we tackle OO-Do in the object detection task, specifically for a parasite egg detection model used in real-world conditions. First, we introduce an in-the-wild parasite egg dataset to evaluate OO-Do-aware models. The dataset was constructed through a chatbot test session with 222 Medical Technology students and contains 1,552 images uploaded through the chatbot: 1,049 parasite egg images and 503 non-parasite egg (OO-Do) images. Moreover, we propose a data-driven framework for constructing a parasite egg recognition model for in-the-wild applications. The framework describes how we use publicly available datasets to teach the parasite egg recognition model both in-domain and out-of-domain knowledge. Finally, we compare integration strategies for our proposed two-step parasite egg detection approaches on two test sets: a standard dataset and the in-the-wild dataset. We also investigate different thresholding strategies for robustness to OO-Do data. In our experiments, we found that appending a classification model fine-tuned to be OO-Do-aware after the object detection model, combined with Softmax thresholding and G-mean threshold selection, achieved outstanding performance for detecting parasite eggs on both test sets. The framework gained 7.37% and 4.09% F1-score improvements over the baselines on the Chulatest+WildOO−Do dataset and the in-the-wild parasite egg dataset, respectively.
The definition of OO-Do
Differences between out-of-domain test data and out-of-distribution test data, illustrated with training data consisting of four classes: English springer, car, parachute, and church. The out-of-domain problem (OO-Do) occurs when a test image comes from a class outside the training data, which differs from the well-known out-of-distribution problem. In this work, we define the OO-Do problem as occurring when the test images come from a class outside the training set, while the out-of-distribution problem occurs when the test images belong to one of the trained classes but come from a different distribution than the training data.
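A common baseline for flagging OO-Do inputs, and the Softmax-thresholding strategy compared later in this work, rejects an image when the classifier's top softmax probability is low. A minimal sketch (the logits and the threshold value here are illustrative, not the paper's actual settings):

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def is_out_of_domain(logits, threshold=0.5):
    """Flag an input as OO-Do when the top softmax probability
    falls below a confidence threshold."""
    return max(softmax(logits)) < threshold

# A confident prediction stays in-domain; a flat one is flagged.
print(is_out_of_domain([8.0, 0.5, 0.1, 0.2]))  # -> False (in-domain)
print(is_out_of_domain([1.1, 1.0, 0.9, 1.0]))  # -> True (OO-Do)
```

The threshold itself is a free parameter; the experiments below compare strategies (such as G-mean) for choosing it.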
Proposed Data Driven Frameworks
We propose and evaluate two data-driven frameworks. Each framework comprises a data-driven model construction process and a recognition model architecture. The distinction between the two frameworks is the order of the recognition models: the OO-Do image classification model and the object detection model. This section describes our proposed data-driven steps for both frameworks.
1. Classification-first
For the classification-first framework, we use the OO-Do-aware classification model to screen input images before passing them to the object detection model.
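A minimal sketch of this gating logic, with `classify` and `detect` as toy stand-ins for the trained models (the labels, confidences, and boxes below are purely hypothetical):

```python
def classification_first(image, classify, detect, threshold=0.5):
    """Classification-first: the OO-Do-aware classifier screens the
    input; only in-domain images reach the object detector."""
    label, confidence = classify(image)
    if label == "non-parasite" or confidence < threshold:
        return []          # rejected as OO-Do: no boxes reported
    return detect(image)   # detector runs only on in-domain images

# Toy stand-ins for the two trained models (purely illustrative).
classify = lambda img: (("parasite-egg", 0.9) if img == "egg"
                        else ("non-parasite", 0.8))
detect = lambda img: [("Ascaris", (10, 10, 50, 50), 0.95)]

print(classification_first("egg", classify, detect))     # boxes returned
print(classification_first("selfie", classify, detect))  # [] (screened out)
```

The advantage of this ordering is that the (comparatively cheap) classifier can discard OO-Do images before the detector runs at all.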
2. Classification-later
In contrast to the first framework, the classification-later framework places the OO-Do-aware classification model after the object detection model.
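One plausible reading of this ordering, sketched below with toy stand-in models: the detector proposes candidate regions first, and the OO-Do-aware classifier then verifies each proposal, dropping those it judges out-of-domain or low-confidence (the labels, boxes, and threshold are hypothetical):

```python
def classification_later(image, detect, classify, threshold=0.5):
    """Classification-later: the detector runs first; the OO-Do-aware
    classifier then verifies each proposed region and filters out
    proposals judged out-of-domain or low-confidence."""
    kept = []
    for label, box in detect(image):
        crop = (image, box)  # stand-in for cropping the proposed region
        cls, confidence = classify(crop)
        if cls != "non-parasite" and confidence >= threshold:
            kept.append((label, box))
    return kept

# Toy stand-ins for the two trained models (purely illustrative).
detect = lambda img: [("Ascaris", (10, 10, 50, 50)),
                      ("Trichuris", (60, 60, 90, 90))]
classify = lambda crop: (("parasite-egg", 0.9) if crop[1][0] == 10
                         else ("non-parasite", 0.7))

print(classification_later("img", detect, classify))  # only the verified box survives
```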
Experiments and Results
Comparison of different approaches on the Chulatest+WildOO−Do dataset for the out-of-domain experiments. All values are percentages. The best results are highlighted, ranked as top, 2nd-top, 3rd-top, and regular.
Comparison of different approaches on the in-the-wild dataset for the out-of-domain experiments. All values are percentages. The best results are highlighted, ranked as top, 2nd-top, 3rd-top, and regular.
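The G-mean thresholding strategy used in the best-performing configuration picks the operating point that balances sensitivity (TPR) and specificity (TNR) by maximizing G-mean = sqrt(TPR x TNR). A sketch over hypothetical confidence scores (the scores, labels, and candidate thresholds are illustrative, not the paper's data):

```python
import math

def gmean_threshold(scores, labels, candidates):
    """Pick the threshold maximizing the geometric mean of
    sensitivity (TPR) and specificity (TNR): G-mean = sqrt(TPR * TNR)."""
    best_t, best_g = None, -1.0
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        g = math.sqrt(tpr * tnr)
        if g > best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Hypothetical confidences: label 1 = parasite egg (in-domain), 0 = OO-Do.
scores = [0.95, 0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1,    1,   1,   0,   1,   0,    0,   0]
t, g = gmean_threshold(scores, labels, [0.1, 0.3, 0.5, 0.75])
print(t, round(g, 3))  # -> 0.75 0.866
```

Unlike a fixed Softmax cutoff, this data-driven choice adapts the rejection threshold to the score distributions of in-domain and OO-Do samples.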
Acknowledgements
We thank Anchalee Techasen and Thanapong Intharah for their support and advice in completing this research.
The website template was borrowed from Jon Barron.