DATABRICKS-MACHINE-LEARNING-ASSOCIATE Online Practice Questions and Answers

Questions 4

A data scientist has developed a random forest regressor rfr and included it as the final stage in a Spark ML Pipeline pipeline. They then set up a cross-validation process with pipeline as the estimator in the following code block:

Which of the following is a negative consequence of including pipeline as the estimator in the cross-validation process rather than rfr as the estimator?

A. The process will have a longer runtime because all stages of pipeline need to be refit or retransformed with each model

B. The process will leak data from the training set to the test set during the evaluation phase

C. The process will be unable to parallelize tuning due to the distributed nature of pipeline

D. The process will leak data prep information from the validation sets to the training sets for each model
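
To see how the choice of estimator affects runtime, it can help to count fit passes. The sketch below is plain Python rather than Spark code; the function names, the stage count, and the grid size are hypothetical values chosen for illustration. It assumes that cross-validation fits its estimator once per fold per parameter combination and then refits once on the full data set.

```python
def stage_fits_pipeline_as_estimator(num_stages, num_folds, grid_size):
    """Stage fits when the whole pipeline is the cross-validation estimator.

    Every stage is refit for each fold and parameter combination,
    plus one final refit of the full pipeline on all of the data.
    """
    pipeline_fits = num_folds * grid_size + 1
    return num_stages * pipeline_fits


def stage_fits_model_as_estimator(num_stages, num_folds, grid_size):
    """Stage fits when only the final model is cross-validated.

    The preparation stages are fit once; only the final model is refit
    for each fold and parameter combination, plus one final refit.
    """
    prep_fits = num_stages - 1
    model_fits = num_folds * grid_size + 1
    return prep_fits + model_fits


# Hypothetical setup: 4 stages (3 preparation stages + rfr), 3 folds, 6 combinations.
whole_pipeline = stage_fits_pipeline_as_estimator(4, 3, 6)
model_only = stage_fits_model_as_estimator(4, 3, 6)
print(whole_pipeline, model_only)
```

The cheaper arrangement is not automatically the safer one: preparation stages that are fit once on the full training data also see the rows that later serve as validation folds.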

Questions 5

A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df.

They are using the following code block to evaluate the model:

regression_evaluator.setMetricName("rmse").evaluate(preds_df)

Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?

A. They should exponentiate the computed RMSE value

B. They should take the log of the predictions before computing the RMSE

C. They should evaluate the MSE of the log predictions to compute the RMSE

D. They should exponentiate the predictions before computing the RMSE
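
The gap between an RMSE computed on the log scale and one computed on the price scale can be checked with a small numeric example. The sketch below uses plain Python lists instead of a Spark DataFrame, and all prices and predictions are made-up values.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices and made-up model predictions of log(price).
prices = [100.0, 200.0, 400.0]
log_prices = [math.log(p) for p in prices]
log_predictions = [math.log(110.0), math.log(180.0), math.log(420.0)]

# RMSE on the log scale: both inputs are logarithms of prices.
rmse_log = rmse(log_prices, log_predictions)

# RMSE on the price scale: exponentiate the values first, then compute the error.
price_predictions = [math.exp(v) for v in log_predictions]
rmse_price = rmse(prices, price_predictions)

print(rmse_log, rmse_price, math.exp(rmse_log))
```

In this example, exponentiating the log-scale RMSE gives a number close to 1, which is a ratio rather than an error in price units, while the price-scale RMSE is in the same units as price.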

Questions 6

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

A. MLflow Experiment Tracking

B. Spark ML

C. Autoscaling clusters

D. Autoscaling clusters

E. Delta Lake

Correct Answer: B

Spark ML (part of Apache Spark's MLlib) is designed to handle machine learning tasks across multiple nodes in a cluster, effectively parallelizing tasks like hyperparameter tuning. It supports various machine learning algorithms that can be optimized over a Spark cluster, making it suitable for parallelizing hyperparameter tuning for single-node machine learning models when they are adapted to run on Spark.

References:

Apache Spark MLlib Guide: https://spark.apache.org/docs/latest/ml-guide.html

Spark ML is a library within Apache Spark designed for scalable machine learning. It provides tools to handle large-scale machine learning tasks, including parallelizing the hyperparameter tuning process for single-node machine learning models using a Spark cluster. Here's a detailed explanation of how Spark ML can be used:

Hyperparameter Tuning with CrossValidator: Spark ML includes the CrossValidator and TrainValidationSplit classes, which are used for hyperparameter tuning. These classes can evaluate multiple sets of hyperparameters in parallel on a Spark cluster when their parallelism parameter is set above its default of 1.

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

# Define the model (placeholder for any Spark ML estimator)
model = ...

# Create a parameter grid (hyperparam1, hyperparam2, and value1..value4 are placeholders)
paramGrid = ParamGridBuilder() \
    .addGrid(model.hyperparam1, [value1, value2]) \
    .addGrid(model.hyperparam2, [value3, value4]) \
    .build()

# Define the evaluator
evaluator = BinaryClassificationEvaluator()

# Define the CrossValidator
crossval = CrossValidator(estimator=model,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)

Parallel Execution: With parallelism greater than 1, Spark fits models for several hyperparameter combinations at the same time. Each fit runs as a distributed Spark job across the cluster's nodes, which allows multiple models to be trained simultaneously.

Scalability: Spark ML leverages the distributed computing capabilities of Spark. This allows for efficient processing of large datasets and training of models across many nodes, which speeds up the hyperparameter tuning process significantly compared to single-node computations.
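
The idea of evaluating several parameter combinations concurrently can be sketched on a single machine with a thread pool. This is an analogy only, not Spark code: validation_error is a hypothetical stand-in for fitting and scoring a model, the grid values are made up, and max_workers plays a role loosely similar to the parallelism setting of a tuning class.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_error(params):
    """Hypothetical stand-in for 'fit a model and score it on validation data'."""
    depth, trees = params
    return (depth - 5) ** 2 + (trees - 50) ** 2 / 100


# Parameter grid: every combination of the candidate values.
grid = list(product([3, 5, 7], [20, 50, 100]))

# Evaluate up to 4 combinations at a time; map() returns results in grid order.
with ThreadPoolExecutor(max_workers=4) as pool:
    errors = list(pool.map(validation_error, grid))

# Keep the combination with the lowest validation error.
best_params = grid[errors.index(min(errors))]
print(best_params)
```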

Questions 7

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.

Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

A. A holdout set is not necessary when using a train-validation split

B. Reproducibility is achievable when using a train-validation split

C. Fewer hyperparameter values need to be tested when using a train-validation split

D. Bias is avoidable when using a train-validation split

E. Fewer models need to be trained when using a train-validation split
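
The training cost of the two validation strategies can be compared with simple arithmetic. The sketch below is plain Python; models_trained and the grid size of 12 are hypothetical, and the count covers only the fits made during the search, not any final refit on the full training data.

```python
def models_trained(grid_size, num_folds=None):
    """Number of model fits during the search itself (excluding any final refit).

    num_folds=None means a single train-validation split.
    """
    fits_per_combination = 1 if num_folds is None else num_folds
    return grid_size * fits_per_combination


grid_size = 12  # hypothetical number of hyperparameter combinations
kfold_fits = models_trained(grid_size, num_folds=5)  # 5-fold cross-validation
split_fits = models_trained(grid_size)               # single train-validation split
print(kfold_fits, split_fits)
```

The number of hyperparameter combinations is the same in both cases; only the number of fits per combination changes.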

Questions 8

A data scientist has been given an incomplete notebook from the data engineering team. The notebook uses a Spark DataFrame spark_df on which the data scientist needs to perform further feature engineering. Unfortunately, the data scientist has not yet learned the PySpark DataFrame API.

Which of the following blocks of code can the data scientist run to be able to use the pandas API on Spark?

A. import pyspark.pandas as ps; df = ps.DataFrame(spark_df)

B. import pyspark.pandas as ps; df = ps.to_pandas(spark_df)

C. spark_df.to_sql()

D. import pandas as pd; df = pd.DataFrame(spark_df)

E. spark_df.to_pandas()

Questions 9

A data scientist has produced two models for a single machine learning problem. One of the models performs well when one of the features has a value of less than 5, and the other model performs well when the value of that feature is greater than or equal to 5. The data scientist decides to combine the two models into a single machine learning solution.

Which of the following terms is used to describe this combination of models?

A. Bootstrap aggregation

B. Support vector machines

C. Bucketing

D. Ensemble learning

E. Stacking
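
The combination described in the question can be pictured as a small routing function. In the sketch below, model_low and model_high are hypothetical stand-ins for the two trained models, and their formulas are made up purely so the example runs.

```python
def model_low(feature):
    """Hypothetical model that performs well when feature < 5."""
    return 2.0 * feature


def model_high(feature):
    """Hypothetical model that performs well when feature >= 5."""
    return 10.0 + 0.5 * feature


def combined_predict(feature, threshold=5):
    """Route each input to the model that is trusted for that range."""
    if feature < threshold:
        return model_low(feature)
    return model_high(feature)


predictions = [combined_predict(x) for x in [1, 4, 5, 8]]
print(predictions)
```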

Questions 10

A health organization is developing a classification model to determine whether or not a patient currently has a specific type of infection. The organization's leaders want to maximize the number of positive cases identified by the model.

Which of the following classification metrics should be used to evaluate the model?

A. RMSE

B. Precision

C. Area under the residual operating curve

D. Accuracy

E. Recall
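
Precision and recall answer different questions about positive cases, which a short calculation makes concrete. The labels below are made up (1 = infected, 0 = not infected), and confusion_counts is a hypothetical helper written for this sketch.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, and false negatives (1 = positive)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, fn


actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]

tp, fp, fn = confusion_counts(actual, predicted)
precision = tp / (tp + fp)  # share of predicted positives that are truly positive
recall = tp / (tp + fn)     # share of truly positive cases that the model found
print(precision, recall)
```

A missed infection shows up as a false negative, which lowers recall but leaves precision unchanged.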

Questions 11

A data scientist is using the following code block to tune hyperparameters for a machine learning model:

Which change can they make to the above code block to improve the likelihood of a more accurate model?

A. Increase num_evals to 100

B. Change fmin() to fmax()

C. Change SparkTrials() to Trials()

D. Change tpe.suggest to random.suggest
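
The effect of the evaluation budget on a search can be illustrated without any tuning library. The sketch below is a plain random search, not Hyperopt and not TPE; objective, random_search, and the search range are hypothetical. Because both runs share a seed, the first 10 trials of the longer run are identical to the shorter run, so the longer run can only match or improve on it.

```python
import random


def objective(x):
    """Hypothetical loss to minimize; its true minimum is at x = 2."""
    return (x - 2.0) ** 2


def random_search(num_evals, seed=0):
    """Return the lowest loss found after num_evals random trials in [-10, 10]."""
    rng = random.Random(seed)
    return min(objective(rng.uniform(-10, 10)) for _ in range(num_evals))


best_10 = random_search(10)
best_100 = random_search(100)
print(best_10, best_100)
```

More evaluations raise the chance of finding a better configuration at the cost of a longer run; they do not guarantee one.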

Questions 12

A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.

Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?

A. Spark ML decision trees test every feature variable in the splitting algorithm

B. Spark ML decision trees automatically prune overfit trees

C. Spark ML decision trees test more split candidates in the splitting algorithm

D. Spark ML decision trees test a random sample of feature variables in the splitting algorithm

E. Spark ML decision trees test binned feature values as representative split candidates
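
One way two decision tree implementations can diverge is in the split thresholds they consider for a continuous feature. The sketch below contrasts exhaustive midpoints with a small set of bin boundaries. It is a simplification written for illustration and is not the actual algorithm of either library; both function names are hypothetical, and max_bins is only loosely modeled on Spark ML's maxBins parameter.

```python
def exact_split_candidates(values):
    """Midpoints between every pair of adjacent distinct sorted values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def binned_split_candidates(values, max_bins):
    """Approximate candidates: boundaries of roughly equal-frequency bins."""
    ordered = sorted(values)
    n = len(ordered)
    # max_bins bins need max_bins - 1 interior boundaries.
    boundaries = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(boundaries)


feature = list(range(1, 101))  # 100 made-up feature values
exact = exact_split_candidates(feature)
binned = binned_split_candidates(feature, max_bins=4)
print(len(exact), binned)
```

With fewer candidate thresholds, the binned search can pick a different split than the exhaustive one even when the data and hyperparameters match.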

Questions 13

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

A. Leave-one-out encoding

B. Target encoding

C. One-hot encoding

D. Categorical

E. String indexing
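
Turning a categorical column into binary indicator variables can be shown in a few lines of plain Python. The helper name one_hot_encode and the color values are hypothetical; libraries such as Spark ML provide their own encoders, which this sketch does not reproduce.

```python
def one_hot_encode(values):
    """Map each categorical value to a list of 0/1 indicators, one per category."""
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
print(encoded)
```

Each row contains exactly one 1, in the position of the category that row belongs to.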

Exam Code: DATABRICKS-MACHINE-LEARNING-ASSOCIATE

Exam Name: Databricks Certified Machine Learning Associate

Last Update: Jul 05, 2026

Questions: 74
