What is required in a presentation for business analysts?
A. Budgetary considerations and requests
B. Operational process changes
C. Detailed statistical explanation of the applicable modeling theory
D. The presentation author's credentials
What is Hadoop?
A. Java classes for HDFS types and MapReduce job management and HDFS
B. Java classes for HDFS types and MapReduce job management and the MapReduce paradigm
C. MapReduce paradigm and HDFS
D. MapReduce paradigm and massive unstructured data storage on commodity hardware
Which data asset is an example of semi-structured data?
A. XML data file
B. Database table
C. Webserver log
D. News article
Refer to the Exhibit.

In the Exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data?
A. Tree B
B. Tree A
C. Tree C
D. Tree D
Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?
A. Data exploration
B. Descriptive statistics
C. ETLT
D. Model selection
You are using the Apriori algorithm to determine the likelihood that a person who owns a home has a good credit score. You have determined that the confidence for the rules used in the algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are homeowners". What can you determine from the lift calculation?
A. Support for the association is low
B. Leverage of the rules is low
C. The rule is coincidental
D. The rule is true
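For reference, the measures named in the options above can be computed directly from transaction counts. The sketch below is illustrative only: the counts are hypothetical, picked so that confidence exceeds 75% and lift rounds to 1.011, and the function name `rule_metrics` is an assumption rather than anything from the original question. A lift close to 1 means the antecedent and consequent occur together about as often as they would if they were independent.

```python
def rule_metrics(n_total, n_x, n_y, n_both):
    """Return (confidence, lift, leverage) for the rule X -> Y.

    n_total: number of records
    n_x:     records where X holds (e.g. good credit)
    n_y:     records where Y holds (e.g. homeowner)
    n_both:  records where both X and Y hold
    """
    support_x = n_x / n_total
    support_y = n_y / n_total
    support_xy = n_both / n_total

    confidence = support_xy / support_x              # P(Y | X)
    lift = confidence / support_y                    # P(X and Y) / (P(X) * P(Y))
    leverage = support_xy - support_x * support_y    # P(X and Y) - P(X) * P(Y)
    return confidence, lift, leverage


# Hypothetical counts, not taken from the question.
confidence, lift, leverage = rule_metrics(
    n_total=1000, n_x=500, n_y=760, n_both=384
)
print(round(confidence, 3), round(lift, 3), round(leverage, 3))
```

With these assumed counts the confidence is high mainly because homeowners are common in the data, which is why confidence alone does not show that a rule is informative.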
You are attempting to find the Euclidean distance between two centroids:
Centroid A's coordinates: (X = 2, Y = 4)
Centroid B's coordinates: (X = 8, Y = 10)
Which formula finds the correct Euclidean distance?
A. SQRT((2-8)^2 + (4-10)^2) or 8.49
B. SQRT(((2-8) x 2) + ((4-10) x 2)) or 12.17
C. ((2-8)^2 + (4-10)^2) or 72
D. ((2-8) x 2 + (4-10) x 2) or 148
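As an arithmetic check on the options above, this minimal Python sketch computes the straight-line distance between the two centroids as the square root of the sum of squared coordinate differences. The function name `euclidean_distance` is chosen for illustration and does not come from the original question.

```python
import math


def euclidean_distance(p, q):
    """Return the Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


centroid_a = (2, 4)
centroid_b = (8, 10)

distance = euclidean_distance(centroid_a, centroid_b)
print(round(distance, 2))  # 8.49
```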
Which word or phrase completes the statement? A Data Scientist would consider that an RDBMS is to a Table as R is to a ______________.
A. Data frame
B. List
C. Matrix
D. Array
Refer to the Exhibit.

An analyst is searching a corpus of documents for the topic "solid state disk". In the Exhibit, Table A provides the inverse document frequency for each term across the corpus. Table B provides each term's frequency in four documents selected from the corpus. Which of the four documents is most relevant to the analyst's search?
A. Document B
B. Document A
C. Document C
D. Document D
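A common way to rank documents in this kind of question is to sum, over the query terms, each term's frequency in the document multiplied by its inverse document frequency (a TF-IDF score). The sketch below shows that calculation. The Exhibit is not reproduced here, so every number and document label in the code is hypothetical, and the helper name `tfidf_score` is an assumption.

```python
def tfidf_score(term_freq, idf, query_terms):
    """Sum tf * idf over the query terms; missing terms contribute zero."""
    return sum(term_freq.get(t, 0) * idf.get(t, 0.0) for t in query_terms)


query_terms = ["solid", "state", "disk"]

# Hypothetical inverse document frequencies (stand-in for Table A).
idf = {"solid": 1.5, "state": 0.8, "disk": 2.0}

# Hypothetical term frequencies per document (stand-in for Table B).
term_freqs = {
    "doc_1": {"solid": 2, "state": 5, "disk": 1},
    "doc_2": {"solid": 0, "state": 9, "disk": 0},
    "doc_3": {"solid": 3, "state": 1, "disk": 4},
    "doc_4": {"solid": 1, "state": 1, "disk": 1},
}

scores = {doc: tfidf_score(tf, idf, query_terms) for doc, tf in term_freqs.items()}
best = max(scores, key=scores.get)
print(best)  # doc_3
```

In this made-up data, doc_2 has the most query-term occurrences overall, but they all come from a low-IDF term, so it ranks below doc_3.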
You are studying the behavior of a population, and you are provided with multidimensional data at the individual level. You have identified four specific individuals who are valuable to your study, and would like to find all users who are most similar to each individual. Which algorithm is the most appropriate for this study?
A. K-means clustering
B. Linear regression
C. Association rules
D. Decision trees
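One reading of the scenario above is a distance-based grouping in which the four chosen individuals act as fixed reference points and every other user is assigned to the closest one. The sketch below shows only that assignment step, which is the same step k-means performs on each iteration; a full k-means run would also recompute the centroids. All coordinates are hypothetical, and `nearest_seed` is an illustrative name.

```python
import math


def nearest_seed(point, seeds):
    """Return the index of the seed closest to point (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda i: math.dist(point, seeds[i]))


# Hypothetical two-dimensional data: four reference individuals and five users.
seeds = [(1.0, 1.0), (1.0, 9.0), (9.0, 1.0), (9.0, 9.0)]
users = [(1.5, 0.5), (8.0, 8.5), (2.0, 8.0), (9.5, 2.0), (0.5, 2.0)]

# Group each user under the reference individual they most resemble.
groups = {i: [] for i in range(len(seeds))}
for user in users:
    groups[nearest_seed(user, seeds)].append(user)

for i, members in groups.items():
    print(seeds[i], members)
```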