When predicting numeric outcomes like taxi fares, choosing relevant input features is key. Trip distance provides useful signal for regression models to estimate ride costs.
Question
You have a dataset that contains information about taxi journeys that occurred during a given period. You need to train a model to predict the fare of a taxi journey. What should you use as a feature?
A. the number of taxi journeys in the dataset
B. the trip distance of individual taxi journeys
C. the fare of individual taxi journeys
D. the trip ID of individual taxi journeys
Answer
B. the trip distance of individual taxi journeys
Explanation
The correct answer is B. the trip distance of individual taxi journeys.
The trip distance of individual taxi journeys is a feature that can be used to train a model to predict the fare of a taxi journey. A feature is an input variable that has some predictive power for the output variable, which in this case is the fare. The trip distance is likely to have a strong correlation with the fare, as longer trips tend to cost more. Therefore, the trip distance is a relevant and useful feature for the model.
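To make the link between distance and fare concrete, here is a minimal sketch that fits a straight line to a handful of trips using ordinary least squares. The trip values, the `fit_line` and `predict_fare` helpers, and the base-fare-plus-per-mile pricing are all invented for illustration; real fare data is noisier and usually needs more than one feature.

```python
# Hypothetical trips: (trip_distance in miles, fare_amount in dollars).
# The values are made up so that fare = 2.50 + 2.00 * distance exactly.
trips = [(1.0, 4.5), (2.0, 6.5), (3.5, 9.5), (5.0, 12.5), (8.0, 18.5)]


def fit_line(points):
    """Ordinary least squares with a single feature.

    Returns (slope, intercept) of the best-fitting straight line.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(trips)


def predict_fare(distance):
    """Predict the fare for a new trip from its distance alone."""
    return intercept + slope * distance


print(f"fare ~ {intercept:.2f} + {slope:.2f} * distance")
print(f"predicted fare for 4 miles: {predict_fare(4.0):.2f}")
```

Because the distance varies from trip to trip and moves together with the fare, the fitted line can estimate the fare of a trip that was not in the training data.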
The other options are not correct for the following reasons:
- the number of taxi journeys in the dataset: This is not a feature, but a property of the dataset. It does not vary for each individual taxi journey, and it does not have any predictive power for the fare.
- the fare of individual taxi journeys: This is not a feature, but the label, that is, the output variable that the model is trying to predict. It cannot be used as an input, because the fare is unknown at prediction time and including it during training would leak the answer to the model.
- the trip ID of individual taxi journeys: This is not a feature, but an identifier for each taxi journey. It does not have any predictive power for the fare, as it is a random or arbitrary value.
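The three rejections above can be sketched as a simple column filter: drop the label, drop identifiers, and drop any column that holds the same value in every row. The rows, the column names `trip_id` and `journey_count`, and the `candidate_features` helper are hypothetical and only illustrate the reasoning; they are not part of any library.

```python
# Hypothetical rows. journey_count is the size of the dataset, so it is
# identical in every row; trip_id is an arbitrary identifier.
rows = [
    {"trip_id": "a91f", "journey_count": 3, "trip_distance": 1.0, "fare_amount": 4.5},
    {"trip_id": "07c2", "journey_count": 3, "trip_distance": 3.5, "fare_amount": 9.5},
    {"trip_id": "e5d8", "journey_count": 3, "trip_distance": 8.0, "fare_amount": 18.5},
]

LABEL = "fare_amount"
ID_COLUMNS = {"trip_id"}


def candidate_features(rows, label, id_columns):
    """Return the columns that could plausibly serve as model inputs."""
    features = []
    for column in rows[0]:
        if column == label:
            continue  # the value being predicted cannot also be an input
        if column in id_columns:
            continue  # identifiers are arbitrary and carry no signal
        if len({row[column] for row in rows}) == 1:
            continue  # a constant column cannot explain differences in fare
        features.append(column)
    return features


print(candidate_features(rows, LABEL, ID_COLUMNS))
```

Only `trip_distance` survives the filter, which matches answer B.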
The label is the column you want to predict. The features are the inputs you give the model to predict the label.
Example: The provided data set contains the following columns:
- vendor_id: The ID of the taxi vendor is a feature.
- rate_code: The rate type of the taxi trip is a feature.
- passenger_count: The number of passengers on the trip is a feature.
- trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment, you don’t know how long the trip would take. Thus, the trip time is not a feature and you’ll exclude this column from the model.
- trip_distance: The distance of the trip is a feature.
- payment_type: The payment method (cash or credit card) is a feature.
- fare_amount: The total taxi fare paid is the label.
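The column roles listed above can be sketched in code as follows. The column names come from the example; the record values and the `split_features_and_label` helper are hypothetical. Text-valued columns such as vendor_id and payment_type would still need to be encoded as numbers before most regression algorithms could use them, and that step is left out here.

```python
# Column roles taken from the example above.
FEATURE_COLUMNS = [
    "vendor_id",
    "rate_code",
    "passenger_count",
    "trip_distance",
    "payment_type",
]
LABEL_COLUMN = "fare_amount"
# Not known before the trip ends, so it is left out of the model inputs.
EXCLUDED_COLUMNS = ["trip_time_in_secs"]


def split_features_and_label(record):
    """Separate one trip record into model inputs and the value to predict."""
    features = {name: record[name] for name in FEATURE_COLUMNS}
    label = record[LABEL_COLUMN]
    return features, label


# A made-up trip record, for illustration only.
record = {
    "vendor_id": "VTS",
    "rate_code": 1,
    "passenger_count": 1,
    "trip_time_in_secs": 1140,
    "trip_distance": 3.75,
    "payment_type": "CRD",
    "fare_amount": 15.5,
}

features, label = split_features_and_label(record)
print(features)
print(label)
```

The trip time never reaches the model, and the fare is kept apart as the target the model learns to predict.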
Reference
Microsoft Learn > .NET > ML.NET guide > Tutorial: Predict prices using regression with ML.NET