Effects of Label Noise on Deep Learning-Based Skin Cancer Classification

Hekler, Achim and Kather, Jakob N. and Krieghoff-Henning, Eva and Utikal, Jochen S. and Meier, Friedegund and Gellrich, Frank F. and Belzen, Julius Upmeier Zu and French, Lars and Schlager, Justin G. and Ghoreschi, Kamran and Wilhelm, Tabea and Kutzner, Heinz and Berking, Carola and Heppt, Markus and Haferkamp, Sebastian and Sondermann, Wiebke and Schadendorf, Dirk and Schilling, Bastian and Izar, Benjamin and Maron, Roman and Schmitt, Max and Froehling, Stefan and Lipka, Daniel B. and Brinker, Titus J. (2020) Effects of Label Noise on Deep Learning-Based Skin Cancer Classification. FRONTIERS IN MEDICINE, 7: 177. ISSN 2296-858X

Full text not available from this repository.

Abstract

Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNNs) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross-validation, comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths of training and test labels, high accuracies of 75.03% (95% CI: 74.39-75.66%) for dermatological and 73.80% (95% CI: 73.10-74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12-65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66-65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise, and future work should use biopsy-verified training images to mitigate this problem.
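
The comparison described in the abstract, scoring the same classifier outputs against two different ground truths, can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `accuracy_with_ci`, the toy label lists, and the use of a normal-approximation (Wald) interval are all assumptions made for illustration, and the abstract does not state how its confidence intervals were computed.

```python
import math
from typing import Sequence, Tuple


def accuracy_with_ci(
    predictions: Sequence[int],
    labels: Sequence[int],
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using a normal-approximation interval.

    Assumption: a Wald interval with z = 1.96 (about 95% coverage). The
    source abstract does not say which interval method was used.
    """
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    n = len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    acc = correct / n
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)


# Hypothetical toy data (1 = melanoma, 0 = nevus); not taken from the study.
biopsy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
dermatologist_labels = [1, 0, 0, 1, 0, 1, 1, 0]

# A hypothetical classifier that reproduces the dermatologist labels exactly.
predictions = [1, 0, 0, 1, 0, 1, 1, 0]

# Same ground truth as the one the predictions mimic.
acc_derm = accuracy_with_ci(predictions, dermatologist_labels)
# Different ground truth: disagreements between label sources count as errors.
acc_biopsy = accuracy_with_ci(predictions, biopsy_labels)

print("vs. dermatologist labels:", acc_derm)
print("vs. biopsy labels:", acc_biopsy)
```

In this toy setup the predictions agree with the dermatologist labels on every image but with the biopsy labels on only 6 of 8, which mirrors in miniature the drop in accuracy the abstract reports when training and test ground truths differ.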

Item Type: Article
Uncontrolled Keywords: DERMATOLOGISTS; MELANOMA; ALGORITHMS; dermatology; artificial intelligence; label noise; skin cancer; melanoma; nevi
Subjects: 600 Technology > 610 Medical sciences Medicine
Divisions: Medicine > Lehrstuhl für Dermatologie und Venerologie
Depositing User: Dr. Gernot Deinzer
Date Deposited: 24 Mar 2021 10:10
Last Modified: 24 Mar 2021 10:10
URI: https://pred.uni-regensburg.de/id/eprint/44596
