AI-900: How to Split Data for Machine Learning: Training and Evaluation Sets

Learn how to split data for machine learning using a random row split, the standard way to create training and evaluation sets for your algorithm.

Table of Contents

Question
Answer
Explanation

Question

In a machine learning algorithm, what method should you use to split data for training and evaluation?

A. Use features for training and labels for evaluation
B. Randomly split the data into rows for training and columns for evaluation
C. Randomly split the data into rows for training and rows for evaluation
D. Use labels for training and features for evaluation

Answer

C. Randomly split the data into rows for training and rows for evaluation

Explanation

In Azure Machine Learning, a percentage split is the technique typically used to divide the data. A given percentage of the rows is selected at random for training, and the remaining rows are held out for testing.

The correct answer is C. Randomly split the data into rows for training and rows for evaluation. Here is a detailed explanation:

In machine learning, data is usually organized in a tabular format, where each row represents an observation or an instance of the data, and each column represents a feature or an attribute of the data. For example, if the data is about students’ grades, each row could be a student, and each column could be a subject or a test score.
A machine learning algorithm learns from the data by finding patterns and relationships between the features and the labels, which are the target values or outcomes that we want to predict or classify. For example, if we want to predict the final grade of a student based on their test scores, the final grade would be the label, and the test scores would be the features.
To evaluate the performance and accuracy of a machine learning algorithm, we need to split the data into two sets: a training set and an evaluation set. The training set is used to train or fit the algorithm, and the evaluation set is used to test or validate the algorithm. The evaluation set should be different from the training set, so that we can measure how well the algorithm generalizes to new and unseen data.
The simplest and most common method to split the data into training and evaluation sets is to randomly split the data by rows. This means that we randomly select a subset of the rows to be the training set, and use the remaining rows as the evaluation set. For example, we could use 80% of the rows for training and 20% of the rows for evaluation. Because the rows are chosen at random, both sets are likely to have a similar distribution of features and labels, so the evaluation set is a reasonable sample of the data population. Every row keeps all of its columns, so both sets contain the features and the label.
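
As a minimal sketch of the 80/20 random row split described above, the following Python uses only the standard library. The function name `split_rows`, the column names, and the student data are hypothetical and chosen only for illustration; this is not the Azure Machine Learning implementation.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=0):
    """Randomly split rows into a training list and an evaluation list."""
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seeded so the split is repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical student data: each row holds two test scores (features)
# and a final grade (label).
students = [
    {"test1": 55 + i, "test2": 60 + i, "final_grade": 58 + i}
    for i in range(10)
]

train_rows, eval_rows = split_rows(students, train_fraction=0.8, seed=42)
print(len(train_rows), len(eval_rows))  # 8 rows for training, 2 for evaluation
```

Each row lands in exactly one of the two lists, and every row still carries both its features and its label. Fixing the seed is a design choice that makes the split reproducible between runs.
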
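The other options are incorrect because they do not split the data by rows. Options A and D separate the features from the labels, and option B assigns rows to one set and columns to the other. A model needs both features and labels during training, and evaluation needs both as well in order to compare predictions against known outcomes. Splitting any other way would either lose information or leak information between the training and evaluation sets, which would compromise the validity and reliability of the evaluation.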
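
To illustrate why the evaluation rows must be kept apart from the training rows, here is a deliberately extreme sketch: a "model" that only memorizes the rows it was trained on. The names `memorizing_model` and `accuracy`, the pass/fail rule, and the generated scores are all assumptions made for this example.

```python
import random

def memorizing_model(training_rows):
    """Return a predictor that only remembers rows it has already seen."""
    lookup = {features: label for features, label in training_rows}

    def predict(features):
        # Anything not seen during training gets a placeholder answer.
        return lookup.get(features, "unknown")

    return predict

def accuracy(predict, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)

# Hypothetical data: (test score 1, test score 2) -> pass/fail label.
rows = [((s1, s2), "pass" if s1 + s2 >= 100 else "fail")
        for s1 in range(30, 100, 10)
        for s2 in range(30, 100, 10)]

# Random 80/20 row split.
random.Random(0).shuffle(rows)
cut = int(len(rows) * 0.8)
train_rows, eval_rows = rows[:cut], rows[cut:]

model = memorizing_model(train_rows)
train_accuracy = accuracy(model, train_rows)
eval_accuracy = accuracy(model, eval_rows)
print(train_accuracy, eval_accuracy)  # 1.0 0.0
```

Scored on its own training rows, the memorizing model looks perfect. Only the held-out evaluation rows reveal that it has not learned anything that generalizes.
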

Microsoft Azure AI Fundamentals AI-900 certification exam practice questions and answers (Q&A) with detailed explanations and references, available free to help you pass the AI-900 exam and earn the Microsoft Azure AI Fundamentals certification.
