The Latest Real Exam Questions from the Latest DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Study Guide Try Free DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Practice Questions

DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Online Practice Questions and Answers

Question 4

The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the answer that correctly fills the blanks in the code block to accomplish this.

from pyspark import StorageLevel
transactionsDf.__1__(StorageLevel.__2__).__3__

A. 1. cache; 2. MEMORY_ONLY_2; 3. count()

B. 1. persist; 2. DISK_ONLY_2; 3. count()

C. 1. persist; 2. MEMORY_ONLY_2; 3. select()

D. 1. cache; 2. DISK_ONLY_2; 3. count()

E. 1. persist; 2. MEMORY_ONLY_2; 3. count()
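Option E (persist, MEMORY_ONLY_2, count()) fits the requirements: DataFrame.cache() accepts no storage-level argument, DISK_ONLY_2 writes to disk, and select() is a transformation, so it does not trigger the caching. The sketch below is a plain-Python model, not the PySpark API: the hypothetical Level tuple mirrors only three of the five fields that pyspark.StorageLevel carries (useDisk, useMemory, replication), and the helper pick_level is made up for illustration.

```python
from typing import List, NamedTuple


class Level(NamedTuple):
    """Simplified stand-in for pyspark.StorageLevel: only three of its five fields."""
    use_disk: bool
    use_memory: bool
    replication: int


# Flag values mirror the pyspark.StorageLevel constants of the same names.
LEVELS = {
    "MEMORY_ONLY": Level(False, True, 1),
    "MEMORY_ONLY_2": Level(False, True, 2),
    "MEMORY_AND_DISK": Level(True, True, 1),
    "MEMORY_AND_DISK_2": Level(True, True, 2),
    "DISK_ONLY": Level(True, False, 1),
    "DISK_ONLY_2": Level(True, False, 2),
}


def pick_level(allow_disk: bool, copies: int) -> List[str]:
    """Names of memory-backed levels that match the disk policy and the copy count."""
    return [
        name
        for name, level in LEVELS.items()
        if level.use_memory
        and level.use_disk == allow_disk
        and level.replication == copies
    ]


# Question 4: use memory, never write to disk, keep copies on two executors.
print(pick_level(allow_disk=False, copies=2))  # ['MEMORY_ONLY_2']
```

In PySpark, the matching call is transactionsDf.persist(StorageLevel.MEMORY_ONLY_2).count(): persist() only marks the DataFrame for caching, and the count() action materializes it.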

Question 5

Which of the following statements about stages is correct?

A. Different stages in a job may be executed in parallel.

B. Stages consist of one or more jobs.

C. Stages ephemerally store transactions, before they are committed through actions.

D. Tasks in a stage may be executed by multiple machines at the same time.

E. Stages may contain multiple actions, narrow, and wide transformations.
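Each action triggers a job, and Spark splits that job into stages at shuffle boundaries: narrow transformations are pipelined inside one stage, and a wide transformation starts a new one. Within a stage, Spark runs one task per partition, and those tasks can run on different executors on different machines at the same time, which is the behaviour option D describes. The plain-Python sketch below models only the stage-splitting rule; the split_into_stages helper and the (name, is_wide) pairs are hypothetical, and real Spark derives stages from the dependency graph rather than from a flat list.

```python
def split_into_stages(ops):
    """Group a linear chain of (name, is_wide) operations into stages.

    A wide transformation needs a shuffle, so it begins a new stage;
    narrow transformations stay in the current stage.
    """
    stages = [[]]
    for name, is_wide in ops:
        if is_wide and stages[-1]:
            stages.append([])
        stages[-1].append(name)
    return stages


pipeline = [
    ("read", False),
    ("filter", False),
    ("groupBy.agg", True),  # wide: requires a shuffle
    ("select", False),
    ("orderBy", True),      # wide: requires another shuffle
]
print(split_into_stages(pipeline))
# [['read', 'filter'], ['groupBy.agg', 'select'], ['orderBy']]
```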

Question 6

Which of the following describes characteristics of the Dataset API?

A. The Dataset API does not support unstructured data.

B. In Python, the Dataset API mainly resembles Pandas' DataFrame API.

C. In Python, the Dataset API's schema is constructed via type hints.

D. The Dataset API is available in Scala, but it is not available in Python.

E. The Dataset API does not provide compile-time type safety.

Question 7

Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf from last to first in alphabetical order?

+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

A. itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))

B. itemsDf.withColumn('attributes', sort_array(desc('attributes')))

C. itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))

D. itemsDf.withColumn("attributes", sort_array("attributes", asc=False))

E. itemsDf.select(sort_array("attributes"))
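Option D does this: pyspark.sql.functions.sort_array(col, asc=True) sorts the elements of each array in a column, and asc=False sorts them in descending order, while withColumn() keeps the other columns. The plain-Python model below shows the per-row effect on the sample data; it is not Spark code, and the list of dicts simply stands in for the rows of itemsDf.

```python
# Rows standing in for itemsDf.
items = [
    {"itemId": 1, "attributes": ["blue", "winter", "cozy"], "supplier": "Sports Company Inc."},
    {"itemId": 2, "attributes": ["red", "summer", "fresh", "cooling"], "supplier": "YetiX"},
    {"itemId": 3, "attributes": ["green", "summer", "travel"], "supplier": "Sports Company Inc."},
]

# Models withColumn("attributes", sort_array("attributes", asc=False)):
# each row keeps its other columns, and only the array is reordered (descending).
sorted_items = [
    {**row, "attributes": sorted(row["attributes"], reverse=True)} for row in items
]

for row in sorted_items:
    print(row["itemId"], row["attributes"])
# 1 ['winter', 'cozy', 'blue']
# 2 ['summer', 'red', 'fresh', 'cooling']
# 3 ['travel', 'summer', 'green']
```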

Question 8

Which of the following describes properties of a shuffle?

A. Operations involving shuffles are never evaluated lazily.

B. Shuffles involve only single partitions.

C. Shuffles belong to a class known as "full transformations".

D. A shuffle is one of many actions in Spark.

E. In a shuffle, Spark writes data to disk.
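In a shuffle, the tasks of the upstream stage write their output, grouped by target partition, to local disk as shuffle files, and the tasks of the next stage read those files; this is why option E is correct. Because records move between partitions, a shuffle spans many partitions, not a single one. The plain-Python model below shows only the redistribution step by hash-partitioning integer keys; the function name is hypothetical, and real Spark uses its own hash function rather than Python's hash().

```python
from collections import defaultdict


def shuffle_by_key(input_partitions, num_partitions):
    """Redistribute (key, value) records so that equal keys share a partition.

    Simplified model: the target partition is hash(key) % num_partitions.
    """
    output = defaultdict(list)
    for partition in input_partitions:
        for key, value in partition:
            output[hash(key) % num_partitions].append((key, value))
    return [output[i] for i in range(num_partitions)]


before = [
    [(1, "a"), (2, "b")],  # input partition 0
    [(1, "c"), (3, "d")],  # input partition 1
]
after = shuffle_by_key(before, num_partitions=2)
print(after)
# [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]
```

The records with key 1 start in two different input partitions and end up together in one output partition.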

Question 9

The code block displayed below contains an error. The code block should trigger Spark to cache DataFrame transactionsDf in executor memory where available, writing to disk where insufficient executor memory is available, in a fault-tolerant way. Find the error.

Code block:

transactionsDf.persist(StorageLevel.MEMORY_AND_DISK)

A. Caching is not supported in Spark, data are always recomputed.

B. Data caching capabilities can be accessed through the spark object, but not through the DataFrame API.

C. The storage level is inappropriate for fault-tolerant storage.

D. The code block uses the wrong operator for caching.

E. The DataFrameWriter needs to be invoked.
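Option C names the error: StorageLevel.MEMORY_AND_DISK keeps a single copy of each cached partition, whereas StorageLevel.MEMORY_AND_DISK_2 stores each partition on two nodes, so transactionsDf.persist(StorageLevel.MEMORY_AND_DISK_2) meets the fault-tolerance requirement. The plain-Python sketch below models why the second copy matters: it places copies of each partition on executors and checks which partitions are still cached after one executor is lost. The executor IDs and the round-robin placement are hypothetical simplifications; without a surviving copy, Spark recomputes lost partitions from lineage instead of reading them from the cache.

```python
def place_replicas(num_partitions, executors, replication):
    """Assign each partition's copies to distinct executors, round-robin (simplified)."""
    placement = {}
    for p in range(num_partitions):
        placement[p] = {executors[(p + r) % len(executors)] for r in range(replication)}
    return placement


def cached_after_failure(placement, failed_executor):
    """Partitions that still have at least one cached copy after an executor is lost."""
    return sorted(p for p, hosts in placement.items() if hosts - {failed_executor})


executors = ["exec-1", "exec-2", "exec-3"]  # hypothetical executor IDs

single = place_replicas(6, executors, replication=1)  # like MEMORY_AND_DISK
double = place_replicas(6, executors, replication=2)  # like MEMORY_AND_DISK_2

print(cached_after_failure(single, "exec-2"))  # [0, 2, 3, 5]
print(cached_after_failure(double, "exec-2"))  # [0, 1, 2, 3, 4, 5]
```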

Question 10

The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))

A. 1. sample; 2. True; 3. 0.15; 4. filter

B. 1. sample; 2. False; 3. 0.15; 4. select

C. 1. sample; 2. 0.85; 3. False; 4. select

D. 1. fraction; 2. 0.15; 3. True; 4. where

E. 1. fraction; 2. False; 3. 0.85; 4. select
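Option B matches: DataFrame.sample(withReplacement, fraction, seed) takes withReplacement first and fraction second, and select(avg('predError')) returns only the average. The fraction is the probability of keeping each row, so the result holds approximately, not exactly, 15% of the rows. The plain-Python sketch below models sampling without replacement as an independent keep-or-drop decision per row (Bernoulli sampling); the helper name and the synthetic predError values are made up for illustration.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`; no row can repeat."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


# Synthetic stand-in for transactionsDf: 10,000 rows with a predError value.
rows = [{"transactionId": i, "predError": i % 7} for i in range(10_000)]

subset = bernoulli_sample(rows, fraction=0.15, seed=42)
avg_pred_error = sum(r["predError"] for r in subset) / len(subset)

print(len(subset))  # roughly 1,500, but not exactly
print(round(avg_pred_error, 2))
```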

Question 11

Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

A. transactionsDf.withColumn("predErrorSqrt", sqrt(predError))

B. transactionsDf.select(sqrt(predError))

C. transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())

D. transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))

E. transactionsDf.select(sqrt("predError"))

Correct Answer: D

transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))

Correct. The DataFrame.withColumn() operator adds a new column to a DataFrame. It takes two arguments: the name of the new column (here: predErrorSqrt) and a Column expression that defines its values. In PySpark, you can refer to a column with col("predError") or by other means, for example transactionsDf.predError; most functions in pyspark.sql.functions also accept the column name as a plain string, "predError". The question asks for the square root: sqrt() is a function in pyspark.sql.functions that calculates the square root of a Column or of a column given by name. Here, its input is the predError column of DataFrame transactionsDf, expressed through col("predError").

transactionsDf.withColumn("predErrorSqrt", sqrt(predError))

Incorrect. In this expression, sqrt(predError) refers to predError as if it were a Python variable. Unless a variable with that name happens to be defined, Python raises a NameError before Spark ever sees the expression. You could pass transactionsDf.predError, col("predError") (as in the correct solution), or just "predError" instead.

transactionsDf.select(sqrt(predError))

Wrong. The explanation directly above about how to refer to predError applies here as well.

transactionsDf.select(sqrt("predError"))

No. While this is correct syntax, it returns a DataFrame with a single column holding the square root of column predError. However, the question asks for a column to be added to the original DataFrame transactionsDf.

transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())

No. The issue with this statement is that a Column such as col("predError") has no sqrt() method: sqrt() is a member of pyspark.sql.functions, not of pyspark.sql.Column.

More info: pyspark.sql.DataFrame.withColumn - PySpark 3.1.2 documentation and pyspark.sql.functions.sqrt - PySpark 3.1.2 documentation

Static notebook | Dynamic notebook: See test 2, 31 (Databricks import instructions)
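The difference between options D and E comes down to what each method keeps: withColumn() returns a new DataFrame with every existing column plus the new one, while select() returns only the listed expressions. The plain-Python sketch below models both on a list of dicts; with_column and select_only are hypothetical stand-ins, not PySpark functions, and math.sqrt plays the role of pyspark.sql.functions.sqrt.

```python
import math

# Hypothetical rows standing in for transactionsDf.
transactions = [
    {"transactionId": 1, "predError": 4.0},
    {"transactionId": 2, "predError": 9.0},
]


def with_column(rows, name, fn):
    """Model of DataFrame.withColumn: keep every column, add `name`, never mutate input."""
    return [{**row, name: fn(row)} for row in rows]


def select_only(rows, name, fn):
    """Model of DataFrame.select with a single expression: only that column remains."""
    return [{name: fn(row)} for row in rows]


def sqrt_pred_error(row):
    return math.sqrt(row["predError"])


added = with_column(transactions, "predErrorSqrt", sqrt_pred_error)
selected = select_only(transactions, "sqrt", sqrt_pred_error)  # column name is illustrative

print(added[0])     # {'transactionId': 1, 'predError': 4.0, 'predErrorSqrt': 2.0}
print(selected[0])  # {'sqrt': 2.0}
```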

Question 12

Which of the following code blocks concatenates rows of DataFrames transactionsDf and transactionsNewDf, omitting any duplicates?

A. transactionsDf.concat(transactionsNewDf).unique()

B. transactionsDf.union(transactionsNewDf).distinct()

C. spark.union(transactionsDf, transactionsNewDf).distinct()

D. transactionsDf.join(transactionsNewDf, how="union").distinct()

E. transactionsDf.union(transactionsNewDf).unique()
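Option B is the correct one: DataFrame.union() appends the rows of the second DataFrame, matching columns by position and keeping duplicates (like SQL UNION ALL), and distinct() then removes duplicate rows. The plain-Python sketch below models that two-step behaviour with tuples as rows; it is an illustration, not Spark code, and unlike Spark it preserves the first-seen order of the rows.

```python
# Hypothetical rows standing in for transactionsDf and transactionsNewDf.
transactions = [(1, "A", 10.0), (2, "B", 5.5)]
transactions_new = [(2, "B", 5.5), (3, "C", 7.25)]

# Model of union(): positional concatenation that keeps duplicates (like UNION ALL).
unioned = transactions + transactions_new

# Model of distinct(): drop repeated rows; dict.fromkeys keeps first-seen order.
deduplicated = list(dict.fromkeys(unioned))

print(len(unioned))   # 4
print(deduplicated)   # [(1, 'A', 10.0), (2, 'B', 5.5), (3, 'C', 7.25)]
```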

Question 13

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

A. 1. select; 2. col("storeId"); 3. cast; 4. StringType

B. 1. select; 2. col("storeId"); 3. StringType

C. 1. cast; 2. "storeId"; 3. StringType()

D. 1. select; 2. col("storeId"); 3. cast; 4. StringType()

E. 1. select; 2. storeId; 3. cast; 4. StringType()

Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK

Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0

Last Update: Jul 08, 2026

Questions: 180

PDF (Q&A)

$49.99

VCE

$55.99

PDF + VCE

$65.99
