
Certified Associate Developer for Apache Spark: Returning New DataFrame with Constant String Column

Discover the correct Spark code to create a new DataFrame from an existing one with a column containing a constant string value. Gain comprehensive insights to ace the Databricks Certified Associate Developer for Apache Spark certification exam.

Question

Which of the following code blocks returns a new DataFrame from DataFrame storesDF where column modality is the constant string “PHYSICAL”? Assume DataFrame storesDF is the only defined language variable.

A. storesDF.withColumn("modality", lit(PHYSICAL))
B. storesDF.withColumn("modality", col("PHYSICAL"))
C. storesDF.withColumn("modality", lit("PHYSICAL"))
D. storesDF.withColumn("modality", StringType("PHYSICAL"))
E. storesDF.withColumn("modality", "PHYSICAL")

Answer

C. storesDF.withColumn("modality", lit("PHYSICAL"))

Explanation

The correct answer is C. storesDF.withColumn("modality", lit("PHYSICAL")).

The withColumn() method of the Spark DataFrame API returns a new DataFrame with a column added, or replaced if a column of that name already exists. Its first argument is the column name, and its second argument must be a Column expression that produces the column's values.

In this case, we want a column named "modality" that holds the constant string "PHYSICAL" in every row. The lit() function does this: it takes a constant value and returns a Column expression that evaluates to that value for each row.

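Option A, storesDF.withColumn("modality", lit(PHYSICAL)), is incorrect because PHYSICAL without quotes is an identifier, not a string. Since storesDF is the only defined variable, PHYSICAL refers to nothing and the code fails (in Python, with a NameError). Quoting the value, as in option C, makes it a string literal.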

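Option B, storesDF.withColumn("modality", col("PHYSICAL")), is incorrect because col("PHYSICAL") refers to an existing column named "PHYSICAL" rather than to the string value. If storesDF has no such column, Spark cannot resolve the reference and raises an error; if it does, the new column copies that column's values instead of holding a constant.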

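Option D, storesDF.withColumn("modality", StringType("PHYSICAL")), is incorrect because StringType is a Spark data type used to describe a column's schema, not a function that creates a literal column. It does not produce a Column, so it cannot serve as the second argument of withColumn().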

Option E, storesDF.withColumn("modality", "PHYSICAL"), is incorrect because withColumn() expects a Column as its second argument, not a bare string. The string "PHYSICAL" must be wrapped in lit() to turn it into a literal column.
