Explaining Data Changes with Prototypes: A Measure-Driven Approach

Karolczak, J., Stefanowski, J.
Information Fusion
February 2026

Abstract

Prototype explanations of machine learning models have been considered solely for static data, and their use for concept-drifting data remains underexplored. In this work, this challenge is addressed using an algorithm that explains the predictions of the Random Forest tree ensemble classifier with a limited number of prototypes. The work also proposes new measures for evaluating prototypes in static and evolving settings, which enable comparison of prototype sets before and after a data change and the construction of new drift detectors. The proposals are evaluated in a series of experiments. In the first, on synthetic datasets produced by a diverse set of data generators, the new measures (mean minimal distance, mean centroid displacement, and prototype reassignment impact) proved effective. Then, for incremental learning, the RACE-P algorithm is introduced, which leverages prototypes for interpretable drift detection. Experiments demonstrate performance competitive with established detection methods such as ADWIN and Page-Hinkley. Finally, the use of prototypes to analyse and explain detected drifts is discussed, underscoring their potential to improve understanding of data evolution.
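For readers unfamiliar with the baseline detectors named above, the following is a minimal sketch of the classical Page-Hinkley test for an upward shift in the mean of a stream, for example a stream of 0/1 classification errors. It is not the RACE-P algorithm and not the API of any library: the class name `PageHinkley`, the helper `first_alarm`, and the parameter values `delta`, `threshold`, and `min_samples` are illustrative assumptions.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Sketch of the Page-Hinkley test for an increase in the stream mean.

    All parameter defaults are illustrative assumptions, not tuned values.
    """

    def __init__(self, delta: float = 0.005, threshold: float = 10.0,
                 min_samples: int = 30) -> None:
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm threshold (often written lambda)
        self.min_samples = min_samples  # warm-up period before any alarm
        self.n = 0                      # number of observations seen
        self.mean = 0.0                 # running mean of the stream
        self.cum = 0.0                  # cumulative deviation m_t
        self.min_cum = 0.0              # running minimum M_t of m_t

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        if self.n < self.min_samples:
            return False
        return (self.cum - self.min_cum) > self.threshold


def first_alarm(stream: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none occurs."""
    detector = PageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical error stream: no errors, then constant errors from index 200.
    stream = [0.0] * 200 + [1.0] * 200
    print("first alarm at index:", first_alarm(stream))
```

The detector signals only after the cumulative deviation has risen above its running minimum by more than `threshold`, so a larger `threshold` trades detection delay for fewer false alarms. Unlike the prototype-based approach described in the abstract, this test monitors a single scalar statistic and gives no indication of where in the feature space the data changed.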

Jacek Karolczak
