The intersection of deep learning and photogrammetry unveils a critical need for balancing the power of deep neural networks with interpretability and trustworthiness, especially for safety-critical applications such as autonomous driving, medical imaging, or machine vision tasks with high demands on reliability. Quantifying the predictive uncertainty is a promising endeavour to open up the use of deep neural networks for such applications. Unfortunately, most currently available methods are computationally expensive. In this work, we present a novel approach for efficient and reliable uncertainty estimation in semantic segmentation, which we call Deep Uncertainty Distillation using Ensembles for Segmentation (DUDES). DUDES applies student-teacher distillation with a Deep Ensemble to accurately approximate predictive uncertainties in a single forward pass while maintaining simplicity and adaptability. Experimentally, DUDES accurately captures predictive uncertainties without sacrificing performance on the segmentation task and shows impressive capabilities of highlighting wrongly classified pixels and out-of-domain samples through high uncertainties on the Cityscapes and Pascal VOC 2012 datasets. With DUDES, we manage to simultaneously simplify and outperform previous work on Deep-Ensemble-based uncertainty distillation.
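To make the teacher's target concrete, the following is a minimal sketch (in PyTorch) of how a Deep Ensemble's per-pixel predictive uncertainty could be computed. We use the entropy of the mean softmax here, which is one common choice; the exact uncertainty measure used in the paper may differ.

import torch

def ensemble_predictive_uncertainty(logits_per_member):
    # logits_per_member: list of M tensors of shape (B, C, H, W),
    # one per ensemble member. Returns a (B, H, W) uncertainty map.
    probs = torch.stack([torch.softmax(l, dim=1) for l in logits_per_member])  # (M, B, C, H, W)
    mean_probs = probs.mean(dim=0)  # average the members' softmax outputs
    # Entropy of the mean prediction as per-pixel predictive uncertainty.
    return -(mean_probs * torch.log(mean_probs.clamp_min(1e-12))).sum(dim=1)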
A schematic overview of the training process of the student model in DUDES. DUDES is an easy-to-adapt framework for efficiently estimating predictive uncertainty through student-teacher distillation. The student model simultaneously outputs a segmentation prediction and a corresponding uncertainty prediction. Training the student involves a regular segmentation loss with the ground truth labels and an additional uncertainty loss. As ground truth uncertainties, we use the predictive uncertainty of a Deep Ensemble, which thereby acts as the teacher.
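As a hedged sketch of this training step: the student is assumed to return a pair (segmentation logits, uncertainty map), and the loss choices as well as the weighting factor lambda_unc are illustrative, not necessarily the paper's exact configuration.

import torch.nn.functional as F

def training_step(student, images, labels, teacher_uncertainty, lambda_unc=1.0):
    # seg_logits: (B, C, H, W), unc_pred: (B, H, W)
    seg_logits, unc_pred = student(images)
    # Regular segmentation loss against the ground truth labels;
    # ignore_index=255 follows the usual Cityscapes void-label convention
    # and is an assumption here.
    seg_loss = F.cross_entropy(seg_logits, labels, ignore_index=255)
    # Uncertainty loss against the Deep Ensemble's predictive uncertainty.
    unc_loss = F.l1_loss(unc_pred, teacher_uncertainty)
    return seg_loss + lambda_unc * unc_loss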
Comparison between the student's and the teacher's mean Intersection over Union (mIoU). We progressively ignore an increasing percentage of pixels in the segmentation prediction and re-evaluate the mIoU after each step. The pixels are sorted by their predictive uncertainty in descending order, so the most uncertain segmentation predictions are removed first.
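The evaluation protocol behind this figure can be sketched as follows (NumPy, with hypothetical helper names and fractions; the percentages evaluated in the paper may differ):

import numpy as np

def miou(pred, gt, num_classes, keep):
    # mIoU over pixels where keep is True; classes absent from
    # both prediction and ground truth are skipped.
    ious = []
    for c in range(num_classes):
        p = (pred == c) & keep
        g = (gt == c) & keep
        union = (p | g).sum()
        if union > 0:
            ious.append((p & g).sum() / union)
    return float(np.mean(ious))

def miou_vs_ignored_fraction(pred, gt, uncertainty, num_classes, fractions=(0.0, 0.1, 0.2, 0.5)):
    # Sort pixels by predictive uncertainty in descending order and
    # re-evaluate the mIoU after ignoring the most uncertain ones first.
    order = np.argsort(uncertainty.ravel())[::-1]
    results = {}
    for f in fractions:
        keep = np.ones(uncertainty.size, dtype=bool)
        keep[order[: int(f * uncertainty.size)]] = False
        results[f] = miou(pred.ravel(), gt.ravel(), num_classes, keep)
    return results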
Example images from the Cityscapes validation set (a) with corresponding ground truth labels (b), our student's segmentation predictions (c), a binary accuracy map (d), and the student's uncertainty prediction (e). White pixels in the binary accuracy map are either incorrect predictions or void classes; the latter appear black in the ground truth labels. In the uncertainty prediction, brighter pixels represent higher predictive uncertainties.
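The binary accuracy map in (d) reduces to a one-line comparison; the void label id 255 follows the Cityscapes convention and is an assumption here:

import numpy as np

def binary_accuracy_map(pred, gt, void_id=255):
    # True (white) where the prediction is wrong or the pixel is void,
    # False (black) elsewhere, matching the figure's convention.
    return (pred != gt) | (gt == void_id)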
In this work, we proposed DUDES, an efficient and reliable uncertainty quantification method based on student-teacher distillation that maintains simplicity and adaptability throughout the entire framework. We quantitatively demonstrated that DUDES accurately captures predictive uncertainties without sacrificing performance on the segmentation task. Additionally, qualitative results indicate impressive capabilities for identifying wrongly classified pixels and out-of-domain samples through a simple uncertainty-based threshold. With DUDES, we managed to simultaneously simplify and outperform previous work on Deep-Ensemble-based uncertainty quantification. We hope that DUDES encourages other researchers to incorporate uncertainties into state-of-the-art semantic segmentation approaches and to explore the usefulness of our proposed method for other tasks such as detection or depth estimation.
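A minimal sketch of such an uncertainty-based threshold, with a hypothetical threshold tau that would be chosen on a validation set:

import numpy as np

def flag_unreliable(uncertainty, tau):
    # Pixels whose predictive uncertainty exceeds tau are flagged
    # as potentially misclassified.
    return uncertainty > tau

def image_ood_score(uncertainty, tau):
    # Fraction of flagged pixels; a high value may indicate an
    # out-of-domain sample.
    return float(flag_unreliable(uncertainty, tau).mean())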
@article{landgraf2024dudes,
title={{DUDES}: Deep Uncertainty Distillation Using Ensembles for Semantic Segmentation},
author={Landgraf, Steven and Wursthorn, Kira and Hillemann, Markus and Ulrich, Markus},
journal={PFG--Journal of Photogrammetry, Remote Sensing and Geoinformation Science},
volume={92},
number={2},
pages={101--114},
year={2024},
publisher={Springer}
}