CompTIA DA0-001: What is the reason an analyst would need to cleanse this income data set?

Learn why data cleansing is necessary in this sample CompTIA DA0-001 exam question about analyzing average income data containing an outlier value.

Question

An analyst is reporting on the average income for a county and is reviewing the following data:

Name | Address | Yearly income
Jessica Jones | 145 Stonebridge Avenue | $634,900
Spencer James | 1567 Watercress | $135,000
Olivia Baker | 456 Harvard Road | $95,000
Layla Harding | 5674 Yarding Street | $37,000

Which of the following is the reason the analyst would need to cleanse the data in this data set?

A. Data completeness
B. Data outliers
C. Duplicate data
D. Missing values

Answer

B. Data outliers

Explanation

In this data set, three of the four yearly incomes fall between $37,000 and $135,000. The fourth, $634,900 for Jessica Jones, is more than four times the next-highest value and appears to be an outlier.

An outlier is a data point that differs significantly from other observations. Outliers can arise from genuine variability in the data or from measurement and data-entry errors. In this case, Jessica Jones’ income is so much higher than the others that it may be an error, such as an extra digit being added, although it could also be a genuine high income. Either way, it needs to be checked before it is used in an average.
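One way to make "differs significantly" concrete is a modified z-score, which measures each value's distance from the median in units of the median absolute deviation (MAD). The sketch below is an illustration only, not part of the exam material: the function name `modified_z_scores` and the cutoff of 3.5 (a commonly cited rule of thumb) are assumptions, and with just four rows any such rule should be treated as a prompt to investigate rather than a verdict.

```python
from statistics import median

# Yearly incomes from the question's table.
incomes = {
    "Jessica Jones": 634_900,
    "Spencer James": 135_000,
    "Olivia Baker": 95_000,
    "Layla Harding": 37_000,
}

def modified_z_scores(values):
    """Return 0.6745 * (x - median) / MAD for each value (hypothetical helper)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]

THRESHOLD = 3.5  # assumed cutoff, a common rule of thumb

scores = modified_z_scores(list(incomes.values()))
flagged = [
    name for name, score in zip(incomes, scores) if abs(score) > THRESHOLD
]
print(flagged)  # ['Jessica Jones']
```

Only the $634,900 row exceeds the cutoff (its score is roughly 7.2), which matches the visual impression from the table.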

Outliers can skew statistical measures such as the mean. If this outlier were included, it would dramatically increase the calculated average income for the county and give a figure that does not represent a typical resident. Therefore, the analyst would need to cleanse the data by investigating this outlier and excluding or correcting it if it is determined to be an error.
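The size of the skew can be checked directly from the four values in the table. The following minimal sketch compares the mean with and without the $634,900 row, and shows the median for contrast. Dropping the row here is purely for illustration and assumes the value has already been judged erroneous.

```python
from statistics import mean, median

# Yearly incomes from the question's table.
incomes = [634_900, 135_000, 95_000, 37_000]

# Illustration only: assume the $634,900 value was found to be an error.
without_outlier = [v for v in incomes if v != 634_900]

mean_all = mean(incomes)              # 225475
mean_without = mean(without_outlier)  # 89000
median_all = median(incomes)          # 115000.0

print(f"Mean with outlier:    ${mean_all:,.0f}")
print(f"Mean without outlier: ${mean_without:,.0f}")
print(f"Median (all rows):    ${median_all:,.0f}")
```

With the outlier included, the mean ($225,475) is higher than three of the four incomes, while the mean of the remaining rows is $89,000. The median moves far less, which is why it is often reported alongside the mean for income data.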

The other options don’t apply:
A. Data completeness: every row has a name, an address, and a yearly income.
C. Duplicate data: no entry appears more than once.
D. Missing values: no field is blank or null.

In summary, the best answer is B: the analyst needs to cleanse this data to deal with the outlier value that could skew the average income. Identifying and handling outliers is an important data cleansing step to ensure accurate results.
