AI-900: Splitting Data for Machine Learning Training and Evaluation

Learn the best practice for splitting data in a machine learning process, and why randomly dividing the rows into a training set and an evaluation set matters for accurate model assessment.

Question

For a machine learning process, how should you split data for training and evaluation?

A. Use features for training and labels for evaluation.
B. Randomly split the data into rows for training and rows for evaluation.
C. Use labels for training and features for evaluation.
D. Randomly split the data into columns for training and columns for evaluation.

Answer

B. Randomly split the data into rows for training and rows for evaluation.

Explanation

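Split the dataset into two subsets: one for training the model and another for evaluating its performance. Option B, randomly splitting the data into rows for training and rows for evaluation, is the correct approach. Each row is a complete observation with its features and its label, so both subsets keep every column. Random assignment makes it likely that both subsets reflect the overall distribution of the data, so the model learns patterns from one subset and is then tested on rows it has not seen. Options A and C fail because training a supervised model needs features and labels together, and option D fails because splitting by columns would give the two subsets different variables.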
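As a rough illustration of a random row split, here is a minimal Python sketch that uses only the standard library. The function name `split_rows`, the 70/30 fraction, the seed, and the toy dataset are all assumptions made for this example; they are not part of the exam question or of any Azure tool.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into training and evaluation parts."""
    shuffled = list(rows)                      # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)      # seeded shuffle for a repeatable split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset: each row holds two features and a label.
rows = [(i, i * 2, i % 2) for i in range(10)]

train_rows, eval_rows = split_rows(rows, train_fraction=0.7, seed=42)

print(len(train_rows), "rows for training")
print(len(eval_rows), "rows for evaluation")
```

Every row lands in exactly one subset, and each row keeps all of its columns, which is what distinguishes option B from the column-based option D.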

In Azure Machine Learning, the Split Data module is particularly useful for separating data into training and testing sets. Use the Split Rows option to divide the data into two parts. You can specify the fraction of rows to put in the first part; by default, the data is divided 50-50. You can also randomize the selection of rows in each part and use stratified sampling, which keeps the proportion of values in a chosen column about the same in both parts.
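The idea behind a stratified split can be sketched in plain Python. This is not how the Split Data module is implemented; it is only an illustration under stated assumptions. The helper `stratified_split`, its parameters, and the imbalanced toy dataset are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_of, first_fraction=0.5, seed=0):
    """Split rows so each label appears in both parts in about the same proportion."""
    rng = random.Random(seed)

    # Group the rows by label value.
    groups = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)

    first, second = [], []
    for label_rows in groups.values():
        rng.shuffle(label_rows)                          # randomize within each label
        cut = round(len(label_rows) * first_fraction)
        first.extend(label_rows[:cut])
        second.extend(label_rows[cut:])
    return first, second


# Hypothetical imbalanced dataset: 8 rows labeled 0 and 4 rows labeled 1.
rows = [(i, 0) for i in range(8)] + [(i, 1) for i in range(8, 12)]

first, second = stratified_split(rows, label_of=lambda r: r[1], first_fraction=0.5, seed=7)
```

With the default 50-50 fraction used here, each part receives four rows labeled 0 and two rows labeled 1, so the 2:1 label ratio of the full dataset is preserved in both parts.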

Microsoft Azure AI Fundamentals AI-900 certification exam practice question and answer (Q&A) dump with detail explanation and reference available free, helpful to pass the Microsoft Azure AI Fundamentals AI-900 exam and earn Microsoft Azure AI Fundamentals AI-900 certification.