Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks

Maron, Roman C. and Weichenthal, Michael and Utikal, Jochen S. and Hekler, Achim and Berking, Carola and Hauschild, Axel and Enk, Alexander H. and Haferkamp, Sebastian and Klode, Joachim and Schadendorf, Dirk and Jansen, Philipp and Holland-Letz, Tim and Schilling, Bastian and von Kalle, Christof and Froehling, Stefan and Gaiser, Maria R. and Hartmann, Daniela and Gesierich, Anja and Kaehler, Katharina C. and Wehkamp, Ulrike and Karoglan, Ante and Baer, Claudia and Brinker, Titus J. and Schmitt, Laurenz and Peitsch, Wiebke K. and Hoffmann, Friederike and Becker, Juergen C. and Drusio, Christina and Jansen, Philipp and Klode, Joachim and Lodde, Georg and Sammet, Stefanie and Schadendorf, Dirk and Sondermann, Wiebke and Ugurel, Selma and Zader, Jeannine and Enk, Alexander and Salzmann, Martin and Schaefer, Sarah and Schaekel, Knut and Winkler, Julia and Woelbing, Priscilla and Asper, Hiba and Bohne, Ann-Sophie and Brown, Victoria and Burba, Bianca and Deffaa, Sophia and Dietrich, Cecilia and Dietrich, Matthias and Drerup, Katharina Antonia and Egberts, Friederike and Erkens, Anna-Sophie and Greven, Salim and Harde, Viola and Jost, Marion and Kaeding, Merit and Kosova, Katharina and Lischner, Stephan and Maagk, Maria and Messinger, Anna Laetitia and Metzner, Malte and Motamedi, Rogina and Rosenthal, Ann-Christine and Seidl, Ulrich and Stemmermann, Jana and Torz, Kaspar and Velez, Juliana Giraldo and Haiduk, Jennifer and Alter, Mareike and Baer, Claudia and Bergenthal, Paul and Gerlach, Anne and Holtorf, Christian and Karoglan, Ante and Kindermann, Sophie and Kraas, Luise and Felcht, Moritz and Gaiser, Maria R. and Klemke, Claus-Detlev and Kurzen, Hjalmar and Leibing, Thomas and Mueller, Verena and Reinhard, Raphael R. and Utikal, Jochen and Winter, Franziska and Berking, Carola and Eicher, Laurie and Hartmann, Daniela and Heppt, Markus and Kilian, Katharina and Krammer, Sebastian and Lill, Diana and Niesert, Anne-Charlotte and Oppel, Eva and Sattler, Elke and Senner, Sonja and Wallmichrath, Jens and Wolff, Hans and Giner, Tina and Glutsch, Valerie and Kerstan, Andreas and Presser, Dagmar and Schruefer, Philipp and Schummer, Patrick and Stolze, Ina and Weber, Judith and Drexler, Konstantin and Haferkamp, Sebastian and Mickler, Marion and Stauner, Camila Toledo and Thiem, Alexander (2019) Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks. European Journal of Cancer, 119. pp. 57-65. ISSN 0959-8049, 1879-0852

Full text not available from this repository.

Abstract

Background: Recently, convolutional neural networks (CNNs) systematically outperformed dermatologists in distinguishing dermoscopic melanoma and nevi images. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account.

Methods: Using 11,444 dermoscopic images, which covered dermatologic diagnoses comprising the majority of pigmented skin lesions commonly faced in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories.

Findings: Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, outperformance (p < 0.001) was achieved except for basal cell carcinoma (on-par performance).

Interpretation: Our findings show that automated classification of dermoscopic melanoma and nevi images is extendable to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still outperforming dermatologists at a significant level (p < 0.001). © 2019 The Author(s). Published by Elsevier Ltd.
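The quantities reported in the abstract (sensitivity, specificity, and a two-sided McNemar test) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names are hypothetical, the labels are invented toy data, and the exact binomial form of the McNemar test is an assumption, since the abstract does not say which variant was used or how readers and images were paired. The sketch also does not reproduce the "at equal sensitivity" threshold matching described in the findings.

```python
from math import comb


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); True means 'malignant'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)              # malignant, called malignant
    fn = sum(t and not p for t, p in pairs)          # malignant, missed
    tn = sum((not t) and (not p) for t, p in pairs)  # benign, called benign
    fp = sum((not t) and p for t, p in pairs)        # benign, called malignant
    return tp / (tp + fn), tn / (tn + fp)


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value computed from discordant pairs."""
    triples = list(zip(y_true, pred_a, pred_b))
    # Cases where exactly one of the two raters is correct.
    a_only = sum(a == t and b != t for t, a, b in triples)
    b_only = sum(a != t and b == t for t, a, b in triples)
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(a_only, b_only)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented toy labels (True = malignant); not data from the study.
y_true = [True, True, True, True, False, False, False, False, False, False]
cnn = [True, True, True, False, False, False, False, False, False, True]
derm = [True, True, True, False, True, True, True, False, False, True]

cnn_sens, cnn_spec = sensitivity_specificity(y_true, cnn)
derm_sens, derm_spec = sensitivity_specificity(y_true, derm)
p_value = mcnemar_exact(y_true, cnn, derm)
print(cnn_sens, cnn_spec, derm_sens, derm_spec, p_value)
```

In this toy set both raters have the same sensitivity and differ only in specificity, which mirrors the shape of the comparison in the findings. With so few discordant cases the test cannot reach significance, so the example shows the mechanics only.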

Item Type:	Article
Uncontrolled Keywords:	Level classification; Melanoma; Skin cancer; Artificial intelligence; Skin cancer screening
Subjects:	600 Technology > 610 Medical sciences Medicine
Divisions:	Medicine > Lehrstuhl für Dermatologie und Venerologie
Depositing User:	Dr. Gernot Deinzer
Date Deposited:	31 Mar 2020 05:09
Last Modified:	31 Mar 2020 05:09
URI:	https://pred.uni-regensburg.de/id/eprint/26315
