Amazon AWS Certified Machine Learning - Specialty: How to Represent Multiple-Choice Survey Responses for Logistic Regression?

Learn the best method to comprehensively represent responses from a multiple-choice survey in a dataset to train a logistic regression model, comparing one-hot encoding, binning, categorical labels, and numeric features.

Table of Contents

Question
Answer
Explanation

Question

A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question.

A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model.

Which solution will meet these requirements?

A. Perform one-hot encoding on every possible option for each question of the survey.
B. Perform binning on all the answers each respondent selected for each question.
C. Use Amazon Mechanical Turk to create categorical labels for each set of possible responses.
D. Use Amazon Textract to create numeric features for each set of possible responses.

Answer

The correct solution to comprehensively represent every response from the multiple-choice survey in the dataset is:

A. Perform one-hot encoding on every possible option for each question of the survey.

Explanation

One-hot encoding is the most appropriate technique to convert the multiple choice responses into a format suitable for training a logistic regression model. Here's why:

One-hot encoding creates binary dummy variables for each possible response option. For example, if a survey question has options A, B, C, and D, one-hot encoding will create 4 binary features, one for each option. A "1" indicates the respondent selected that option, while a "0" means they did not.

This allows the model to consider each response option as an independent feature. Logistic regression requires input features to be numeric, and one-hot encoding converts the categorical responses into a numeric format in a way that doesn't impose any arbitrary ordering or numerical relationship between the options.

The other options have drawbacks:

Binning (B) is useful for converting continuous variables into discrete bins, but is not applicable for representing multiple-choice responses.
Using Mechanical Turk (C) to manually label the responses would be time-consuming and unnecessary, since the full set of possible response options is already known.
Extracting numeric features with Textract (D) is useful for OCR and analyzing text in images, but not relevant for a structured multiple-choice survey.

In summary, one-hot encoding is the best way to comprehensively represent the survey responses as input features to train the logistic regression model. It will create clear, independent numeric features for each possible response that the model can learn from.

Amazon AWS Certified Machine Learning - Specialty certification exam assessment practice question and answer (Q&A) dump including multiple choice questions (MCQ) and objective type questions, with detail explanation and reference available free, helpful to pass the Amazon AWS Certified Machine Learning - Specialty exam and earn Amazon AWS Certified Machine Learning - Specialty certification.