Assessing data quality - A probability-based metric for semantic consistency

Heinrich, Bernd and Klier, Mathias and Schiller, Alexander and Wagner, Gerit (2018) Assessing data quality - A probability-based metric for semantic consistency. Decision Support Systems, 110. pp. 95-106. ISSN 0167-9236, 1873-5797

Full text not available from this repository.

Abstract

We present a probability-based metric for semantic consistency that uses a set of uncertain rules. In contrast to existing metrics for semantic consistency, our metric can take into account rules that are expected to be fulfilled with specific probabilities. The resulting metric values represent the probability that the assessed dataset is free of internal contradictions with regard to the uncertain rules and thus have a clear interpretation. The metric values are determined on the basis of statistical tests and the concept of the p-value, which allows each metric value to be interpreted as a probability. We demonstrate the practical applicability and effectiveness of the metric in a real-world setting by analyzing a customer dataset of an insurance company. Here, the metric was applied to identify semantic consistency problems in the data and to support decision-making, for instance when offering individual products to customers.
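The idea of scoring an uncertain rule with a p-value can be illustrated with a minimal sketch. The abstract does not say which statistical test is used, so the exact two-sided binomial test below is an assumption, as are the function names and the example numbers. The sketch supposes that a rule is expected to be fulfilled with probability expected_prob, that n_relevant records fall under the rule, and that n_fulfilled of them actually satisfy it.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def consistency_p_value(n_relevant: int, n_fulfilled: int, expected_prob: float) -> float:
    """Two-sided exact binomial p-value for one uncertain rule (hypothetical helper).

    Assumption: the rule should hold for each relevant record independently
    with probability expected_prob. A small result means the observed count
    is hard to reconcile with the rule; a result near 1 means it fits well.
    """
    if not 0 <= n_fulfilled <= n_relevant:
        raise ValueError("n_fulfilled must lie between 0 and n_relevant")
    if not 0.0 <= expected_prob <= 1.0:
        raise ValueError("expected_prob must lie between 0 and 1")

    observed = binom_pmf(n_fulfilled, n_relevant, expected_prob)
    # Sum the probabilities of all outcomes that are at most as likely as
    # the observed one; the small relative tolerance guards against ties
    # being missed because of floating-point rounding.
    threshold = observed * (1 + 1e-7)
    total = sum(
        prob
        for i in range(n_relevant + 1)
        if (prob := binom_pmf(i, n_relevant, expected_prob)) <= threshold
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Made-up illustration: a rule expected to hold in 90% of cases,
    # 200 relevant records, 170 of which fulfil it.
    print(consistency_p_value(200, 170, 0.9))
```

Under these assumptions, a value close to 1 indicates that the data agree with the rule's expected probability, while a value close to 0 points to a possible semantic consistency problem. How the published metric combines several rules is not described in the abstract and is not modelled here.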

Item Type: Article
Uncontrolled Keywords: DATA CURRENCY; TAXONOMY; Data quality; Data quality assessment; Data quality metric; Data consistency
Subjects: 300 Social sciences > 330 Economics
Divisions: Business, Economics and Information Systems > Institut für Wirtschaftsinformatik > Lehrstuhl für Wirtschaftsinformatik II (Prof. Dr. Bernd Heinrich)
Depositing User: Dr. Gernot Deinzer
Date Deposited: 10 Mar 2020 13:37
Last Modified: 10 Mar 2020 13:37
URI: https://pred.uni-regensburg.de/id/eprint/14524
