
AI-900: Proper Data Splitting for Machine Learning Model Evaluation

When training a machine learning model, the data must be split correctly into training and evaluation sets. Learn why random row-wise splitting is preferred over splitting by feature or label.

Question

For a machine learning project, how should you split data for training and evaluation?

A. Use features for training and labels for evaluation.
B. Randomly split the data into rows for training and rows for evaluation.
C. Use labels for training and features for evaluation.
D. Randomly split the data into columns for training and columns for evaluation.

Answer

B. Randomly split the data into rows for training and rows for evaluation.

Explanation

In Azure Machine Learning, data is commonly split by percentage: a given percentage of the rows, selected at random, goes to the training set, and the remaining rows go to the test set.

The Split Data module is particularly useful when you need to separate data into training and testing sets. Use the Split Rows option if you want to divide the data into two parts. You can specify the percentage of data to put in each split, but by default, the data is divided 50-50. You can also randomize the selection of rows in each group, and use stratified sampling.
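The options described above (a split percentage, randomized row selection, and stratified sampling) can be sketched in plain Python. This is only an illustration of the idea, not the Split Data module or any Azure API: the function name `split_rows`, its parameters, and the sample rows are all assumptions made for this example.

```python
import random
from collections import defaultdict


def split_rows(rows, fraction=0.5, randomize=True, stratify_by=None, seed=None):
    """Split rows into two lists; `fraction` of the rows go to the first list.

    Hypothetical helper for illustration only, not the Azure module's API.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)

    # Group rows by the stratification column; with no column given,
    # all rows fall into a single group.
    groups = defaultdict(list)
    for row in rows:
        key = row[stratify_by] if stratify_by is not None else None
        groups[key].append(row)

    first, second = [], []
    for group in groups.values():
        group = list(group)
        if randomize:
            rng.shuffle(group)  # random selection of rows within the group
        cut = round(len(group) * fraction)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


# Made-up data: 20 rows, 4 labeled "yes" and 16 labeled "no".
rows = [{"age": 20 + i, "label": "yes" if i % 5 == 0 else "no"} for i in range(20)]
train, test = split_rows(rows, fraction=0.75, stratify_by="label", seed=0)
print(len(train), len(test))
```

With `stratify_by` set, each label value is split separately, so the rarer "yes" rows appear in both parts in roughly the same proportion as in the full data set.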

The correct answer is B: for a machine learning project, randomly split the rows of the data into a training set and an evaluation set.

A machine learning project involves using data to train a model that can perform a specific task, such as classification, regression, or clustering. To train and evaluate a model, you need to split the data into two subsets: a training set and an evaluation set.

The training set is the data used to fit the model, that is, to adjust the model parameters so as to minimize the error between the model predictions and the actual outcomes. The evaluation set is data held back from training and used to measure how well the model performs on new, unseen data.
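As a minimal sketch of these two roles, the example below fits a straight line on the training rows only and then measures its error on rows it has not seen. The synthetic data, the choice of a least-squares line as the "model", and the helper names `fit_line` and `mean_absolute_error` are assumptions for illustration, not part of the exam material.

```python
import random


def fit_line(rows):
    """Least-squares fit of y = slope * x + intercept on (x, y) rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and actual outcomes."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


rng = random.Random(42)
# Synthetic data invented for this example: y is roughly 2x + 1 plus noise.
data = [(x, 2 * x + 1 + rng.uniform(-1, 1)) for x in range(50)]
rng.shuffle(data)  # randomize row order before cutting
train_rows, eval_rows = data[:35], data[35:]

model = fit_line(train_rows)  # parameters are adjusted on training rows only
train_error = mean_absolute_error(model, train_rows)
eval_error = mean_absolute_error(model, eval_rows)  # measured on unseen rows
print(f"train MAE={train_error:.3f}  eval MAE={eval_error:.3f}")
```

The evaluation error is the figure that indicates how the model might behave on new data, because none of the evaluation rows influenced the fitted parameters.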

To split the data into training and evaluation sets, you should randomly split the data into rows, not columns. A row in a data set represents an observation or an example, which consists of a set of features and a label. A feature is an attribute or a characteristic of the observation, such as age, height, or color. A label is the outcome or the target variable that you want to predict, such as income, grade, or category.

Randomly splitting the data by rows makes it likely that both the training and evaluation sets are representative samples of the same population, and every row in either set keeps the same features and label. This way, you can train and evaluate the model on data with the same format and a similar distribution.

If you split the data into columns, you are splitting the data into features and labels, not into training and evaluation sets. This is not a valid way to split the data, because you need both features and labels for both training and evaluation. If you use features for training and labels for evaluation, or vice versa, you are not training or evaluating the model properly, because you are not providing the model with the input and output data that it needs.
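The difference between the two kinds of split can be checked directly. In the sketch below, the data set, the column names, and the helper `has_features_and_label` are made up for illustration.

```python
# Each row is one observation: two features plus the label to predict.
# The values are invented for this example.
dataset = [
    {"age": 25, "height_cm": 170, "income": 40000},
    {"age": 32, "height_cm": 165, "income": 52000},
    {"age": 47, "height_cm": 180, "income": 61000},
    {"age": 51, "height_cm": 175, "income": 58000},
]
FEATURES = ("age", "height_cm")
LABEL = "income"


def has_features_and_label(rows):
    """True if every row carries all features and the label."""
    return all(all(k in row for k in FEATURES + (LABEL,)) for row in rows)


# Row-wise split (option B): each subset keeps complete observations.
# A fixed cut is used here for brevity; in practice, shuffle the rows first.
row_train, row_eval = dataset[:3], dataset[3:]

# Column-wise split (options A, C, D): one side gets features, the other labels.
col_train = [{k: row[k] for k in FEATURES} for row in dataset]
col_eval = [{LABEL: row[LABEL]} for row in dataset]

print(has_features_and_label(row_train), has_features_and_label(row_eval))
print(has_features_and_label(col_train), has_features_and_label(col_eval))
```

Both row-wise subsets pass the check, while neither column-wise subset does: one lacks the label needed to learn from or score against, and the other lacks the features needed to make a prediction.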

Reference

Microsoft Learn > Previous Versions > Module Categories and Descriptions > Data Transformation > Sample and Split > Split Data



Alex Lim is a certified IT Technical Support Architect with over 15 years of experience in designing, implementing, and troubleshooting complex IT systems and networks. He has worked for leading IT companies, such as Microsoft, IBM, and Cisco, providing technical support and solutions to clients across various industries and sectors. Alex has a bachelor’s degree in computer science from the National University of Singapore and a master’s degree in information security from the Massachusetts Institute of Technology. He is also the author of several best-selling books on IT technical support, such as The IT Technical Support Handbook and Troubleshooting IT Systems and Networks. Alex lives in Bandar, Johore, Malaysia with his wife and two children.
