That's an interesting idea, and I get the motivation. I often wonder whether we're already able to explain predictions. There are techniques like Grad-CAM that determine which inputs caused which activations; it's just that the explanations aren't very understandable or satisfying from a human perspective.
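For anyone unfamiliar with it, the core arithmetic of Grad-CAM is quite small: average the gradients of the class score over each feature map to get per-channel weights, then take a ReLU of the weighted sum of the maps. Here's a minimal pure-Python sketch with made-up toy activations and gradients (the function name and the toy numbers are just illustrative, not from any library):

```python
def grad_cam(activations, gradients):
    """Toy Grad-CAM: activations and gradients are lists of K
    feature maps, each an H x W list of lists."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # Channel weights: gradients global-average-pooled over space.
    weights = [sum(sum(row) for row in gradients[k]) / (H * W)
               for k in range(K)]
    # Heatmap: ReLU of the weighted sum of feature maps.
    return [[max(0.0, sum(weights[k] * activations[k][i][j]
                          for k in range(K)))
             for j in range(W)]
            for i in range(H)]

# Toy 2-channel, 2x2 example.
A = [[[1.0, 0.0], [0.0, 1.0]],      # channel 0 activations
     [[0.0, 2.0], [2.0, 0.0]]]      # channel 1 activations
G = [[[0.5, 0.5], [0.5, 0.5]],      # channel 0: uniformly positive gradient
     [[-1.0, -1.0], [-1.0, -1.0]]]  # channel 1: suppresses the class
print(grad_cam(A, G))  # → [[0.5, 0.0], [0.0, 0.5]]
```

The ReLU is what makes the result class-discriminative in the original paper's sense: only regions whose features push the score *up* survive, which is also part of why the maps feel coarse and unsatisfying.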
You should definitely look at recent advances in the field then. There's a lot of promising work beyond Grad-CAM, and a few techniques target exactly that gap: TCAV for human-oriented, concept-based explanations, and PatternNet/PatternAttribution as well as contrastive LRP for much more informative heatmaps that also depend on intermediate layers.