Learn the best approach for understanding the relationship between two categorical features, such as day of week and age group, to discover product usage patterns in this AWS Machine Learning Specialty exam question.
Question
A machine learning (ML) specialist collected daily product usage data for a group of customers. The ML specialist appended customer metadata such as age and gender from an external data source.
The ML specialist wants to understand product usage patterns for each day of the week for customers in specific age groups. The ML specialist creates two categorical features named day_of_week and binned_age, respectively.
Which approach should the ML specialist use to discover the relationship between the two new categorical features?
A. Create a scatterplot for day_of_week and binned_age.
B. Create crosstabs for day_of_week and binned_age.
C. Create word clouds for day_of_week and binned_age.
D. Create a boxplot for day_of_week and binned_age.
Answer
B. Create crosstabs for day_of_week and binned_age.
Explanation
To understand the relationship between two categorical features like day_of_week and binned_age, the most appropriate approach is to create a crosstab (also known as a cross-tabulation or contingency table, and closely related to a pivot table of counts).
A crosstab shows the frequency distribution and relationship between two categorical variables. It displays the count or percentage of observations for each combination of categories from the two features. This allows you to easily see patterns, such as which age groups tend to use the product more on certain days of the week.
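As a rough illustration of the idea, here is a minimal sketch using pandas, assuming the usage data sits in a DataFrame with one row per recorded product use. The sample rows, the age bin labels, and the variable names `usage`, `counts`, and `row_share` are made up for this example and are not part of the exam question; only the column names day_of_week and binned_age come from it.

```python
import pandas as pd

# Hypothetical sample data: one row per recorded product use.
usage = pd.DataFrame({
    "day_of_week": ["Mon", "Mon", "Tue", "Sat", "Sat", "Sat", "Sun", "Tue"],
    "binned_age": ["18-29", "30-44", "18-29", "18-29", "18-29", "45+", "45+", "30-44"],
})

# Count of observations for each (age group, day) combination.
counts = pd.crosstab(usage["binned_age"], usage["day_of_week"])

# Same table as row proportions: within each age group,
# the share of that group's usage falling on each day.
row_share = pd.crosstab(
    usage["binned_age"], usage["day_of_week"], normalize="index"
)

print(counts)
print(row_share.round(2))
```

The count table answers "how many uses per age group per day", while the row-normalized table makes age groups of different sizes comparable, since each row sums to 1.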
The other options are not suitable for categorical features:
- A scatterplot (Choice A) is used to visualize the relationship between two continuous numeric variables, not categorical variables.
- Word clouds (Choice C) are used to display the frequency or importance of words in a body of text, not for analyzing relationships between categorical variables.
- A boxplot (Choice D) summarizes the distribution of a numeric variable for different categories, but does not show the relationship between two categorical variables.
Therefore, creating a crosstab is the best approach for the ML specialist to discover usage patterns across the day_of_week and binned_age categorical features in this scenario. The crosstab will clearly show the frequency of product usage for each age group on each day of the week.