
Amazon AWS Certified Machine Learning – Specialty: What’s the Best Way to Transform Categorical Country Codes for Machine Learning Models?

Learn how to efficiently transform two-letter country codes into numeric variables for machine learning model training while minimizing dimensionality increase and avoiding information loss.


Question

A data scientist is conducting exploratory data analysis (EDA) on a dataset that contains information about product suppliers. The dataset records the country where each product supplier is located as a two-letter text code. For example, the code for New Zealand is “NZ.”

The data scientist needs to transform the country codes for model training. The data scientist must choose the solution that will result in the smallest increase in dimensionality. The solution must not result in any information loss.

Which solution will meet these requirements?

A. Add a new column of data that includes the full country name.
B. Encode the country codes into numeric variables by using similarity encoding.
C. Map the country codes to continent names.
D. Encode the country codes into numeric variables by using one-hot encoding.

Answer

The solution that meets both requirements is:

B. Encode the country codes into numeric variables by using similarity encoding.

Explanation

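Similarity encoding maps each country code to a numeric vector derived from the string similarity between category values, typically by comparing character n-grams. It does not capture geographic proximity or economic ties between countries. In implementations that let you fix the length of the output vector, the number of added columns does not have to grow with the number of distinct countries, so the increase in dimensionality can stay well below that of one-hot encoding while each country code still receives its own representation.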
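
As a rough illustration of the idea, the sketch below encodes each code into a fixed-length vector by applying min-hash to its character 3-grams. This is an assumption-laden toy, not the implementation used by any particular AWS service or library: the function names char_ngrams and minhash_encode, the space padding, the SHA-256 based hashing, and the output dimension of 8 are all illustrative choices.

```python
import hashlib

def char_ngrams(text, n=3):
    """Return the set of character n-grams of `text`, padded with one space per side."""
    padded = f" {text} "
    grams = {padded[i:i + n] for i in range(len(padded) - n + 1)}
    # Fall back to the padded string itself if it is shorter than n.
    return grams or {padded}

def minhash_encode(text, dim=8, n=3):
    """Encode `text` as `dim` numbers between 0 and 1 using min-hash over its n-grams."""
    grams = char_ngrams(text, n)
    vector = []
    for seed in range(dim):
        # One salted hash per output dimension; keep the smallest hash value.
        smallest = min(
            int.from_bytes(hashlib.sha256(f"{seed}:{g}".encode()).digest()[:8], "big")
            for g in grams
        )
        vector.append(smallest / 2**64)
    return vector

# Hypothetical sample of supplier country codes.
codes = ["NZ", "AU", "US", "DE", "FR", "JP"]
encoded = {code: minhash_encode(code) for code in codes}

for code, vector in encoded.items():
    print(code, [round(v, 3) for v in vector])
```

The vector length is set by dim, not by the number of distinct codes, so adding more countries to the dataset would not add more columns. In this small sample every code also maps to a different vector, which is the property the question refers to as avoiding information loss.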

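In contrast:
A. Adding the full country name creates another text column that repeats information already in the code. It does not produce a numeric variable, so the feature still cannot be used directly for training.
C. Mapping the codes to continent names loses country-level granularity, because many countries collapse into the same value.
D. One-hot encoding creates a new binary column for every distinct country, so it preserves all information but greatly increases dimensionality.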
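
The trade-offs in options C and D can be checked with a few lines of plain Python. The sample codes and the continent_of lookup below are hypothetical illustration data, not part of the exam question.

```python
# Hypothetical sample of supplier country codes.
codes = ["NZ", "AU", "US", "CA", "DE", "FR"]

# Option D: one-hot encoding adds one binary column per distinct code.
categories = sorted(set(codes))
one_hot = {c: [1 if c == cat else 0 for cat in categories] for c in codes}
one_hot_width = len(categories)

# Option C: mapping to continents collapses several countries into one value.
continent_of = {
    "NZ": "Oceania", "AU": "Oceania",
    "US": "North America", "CA": "North America",
    "DE": "Europe", "FR": "Europe",
}
distinct_after_mapping = len({continent_of[c] for c in codes})

print("one-hot columns:", one_hot_width)
print("distinct values after continent mapping:", distinct_after_mapping)
```

With six distinct codes, one-hot encoding needs six columns, and that number keeps growing with every new country in the data. The continent mapping leaves only three distinct values, so the model can no longer tell NZ from AU.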

Therefore, similarity encoding is the only option that converts the codes to numeric values, keeps each country distinguishable, and adds fewer columns than one-hot encoding. The result is a more compact feature for the machine learning model to train on.
