Evaluating Concept Explanations for CNNs Under Adversarial Image Transformations
Keywords:
Convolutional neural networks, adversarial attacks, concept explanations, fidelity, image transformation

Abstract
Concept-based explainers for convolutional neural networks (CNNs) provide human-understandable explanations by revealing what the CNN sees, rather than merely indicating where it looked. However, their performance is limited by the reducer at their core and by adversarial attacks. Although mild image transformations can sometimes enhance CNN classification performance, intense transformations cause noticeable variation in CNN predictions, and it is unclear how explainers perform under such conditions. This paper investigates, for the first time, the performance of state-of-the-art concept-based explainers under different levels of adversarial attack. We achieve this by applying different image transformations as adversarial attacks, including Gaussian noise, elastic transform, rotation, and contrast adjustment, to the ILSVRC2012 dataset. Our study shows that transformation techniques altering only image coordinates have little impact on classifier and explainer performance, whereas methods modifying image pixels, such as elastic transform and contrast, significantly affect performance, akin to introducing Gaussian noise. Our work underscores the importance of scrutinizing explainers during their development and adoption for CNNs.
https://doi.org/10.59200/ICONIC.2024.006
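
For illustration only, the following is a minimal sketch, not the paper's code, of how the four transformation families named in the abstract (Gaussian noise, elastic transform, rotation, contrast) could be applied to an image tensor using torchvision; the function names, severity scaling, and parameter values are assumptions chosen for readability.

```python
# Illustrative sketch of the transformation families discussed in the abstract.
# Parameter values and the notion of "severity" are assumptions, not the paper's settings.
import torch
import torchvision.transforms.functional as TF
from torchvision import transforms


def gaussian_noise(img: torch.Tensor, std: float = 0.1) -> torch.Tensor:
    """Add zero-mean Gaussian noise to a [0, 1] image tensor and re-clamp."""
    return (img + std * torch.randn_like(img)).clamp(0.0, 1.0)


def perturb(img: torch.Tensor, severity: float = 1.0) -> dict[str, torch.Tensor]:
    """Apply each transformation at an illustrative severity level."""
    return {
        "gaussian_noise": gaussian_noise(img, std=0.1 * severity),
        "elastic": transforms.ElasticTransform(alpha=50.0 * severity)(img),
        "rotation": TF.rotate(img, angle=15.0 * severity),
        "contrast": TF.adjust_contrast(img, contrast_factor=1.0 + severity),
    }


if __name__ == "__main__":
    # Example: a random (C, H, W) image in [0, 1]; in practice this would be an ILSVRC2012 sample.
    image = torch.rand(3, 224, 224)
    variants = perturb(image, severity=2.0)
    for name, out in variants.items():
        print(name, tuple(out.shape))
```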