What Does a Higher SHAP Value Mean for One Demographic Group in Credit Scoring?
Learn what different SHAP values across demographic groups can mean in credit scoring and how uneven feature importance may signal potential disparate impact or fairness risk.
Question
You’re analyzing a credit scoring model using SHAP values. For applicants from Group A, you notice that the feature “years_at_current_address” has a mean absolute SHAP value of 0.12, while for Group B applicants, the same feature has a mean absolute SHAP value of 0.03. What does this difference most likely indicate?
A. The model treats both groups fairly because the feature is used for both.
B. The model may be exhibiting disparate impact by weighting this feature differently across demographic groups.
C. The model is more accurate for Group A than Group B.
D. Group A applicants have more stable addresses than Group B applicants.
Answer
B. The model may be exhibiting disparate impact by weighting this feature differently across demographic groups.
Explanation
The difference in mean absolute SHAP values suggests that the model relies much more heavily on years at current address when making predictions for Group A than for Group B: on average, the feature exerts four times as much influence (0.12 vs. 0.03) on Group A's predictions.
In a credit scoring setting, that kind of uneven feature influence across demographic groups can indicate a potential fairness concern. SHAP-based fairness analysis is often used to identify which features are contributing to demographic disparities in model outputs, so this pattern may point to possible disparate impact rather than equal treatment.
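As a minimal sketch of how such a group-wise comparison might be computed, the snippet below takes hypothetical, precomputed SHAP values for "years_at_current_address" (the numbers are illustrative, chosen to reproduce the 0.12 vs. 0.03 gap from the question) and averages their absolute values per demographic group:

```python
import numpy as np

# Hypothetical per-applicant SHAP values for the feature
# "years_at_current_address" (illustrative numbers only).
shap_vals = np.array([0.15, -0.10, 0.11, -0.12,   # Group A applicants
                      0.04, -0.02, 0.03, -0.03])  # Group B applicants
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

# Mean absolute SHAP value per group: the magnitude of the feature's
# average influence on predictions, regardless of direction.
for g in ("A", "B"):
    mean_abs = np.abs(shap_vals[groups == g]).mean()
    print(f"Group {g}: mean |SHAP| = {mean_abs:.2f}")
# Group A: mean |SHAP| = 0.12
# Group B: mean |SHAP| = 0.03
```

In practice the SHAP values would come from an explainer run over the model (e.g., via the `shap` library), but the comparison step itself is just this group-wise aggregation.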
Why the others are wrong
A is incorrect because simply using the same feature for both groups does not mean the model is treating both groups fairly. If the feature affects one group much more strongly, fairness concerns can still exist.
C is incorrect because SHAP values measure feature contribution to predictions, not model accuracy. A larger SHAP value does not mean the model performs better for that group.
D is incorrect because SHAP values do not directly tell you that Group A actually has more stable addresses. They measure how much the model's predictions depend on that feature, not whether the underlying real-world pattern holds.