Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
The minimum support is 25%. Which rule has a confidence equal to 50%?
A. {bread,milk} => {cheese}
B. {bread} => {milk}
C. {juice} => {soda}
D. {bread} => {cheese}
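The confidence of a rule X => Y is support(X and Y together) divided by support(X). The following is a minimal sketch that applies this definition to the four transactions above; the helper names `support` and `confidence` are illustrative and are not taken from any particular library.

```python
# The four transactions from the question, one set per transaction.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """confidence(lhs => rhs) = support(lhs union rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

# The candidate rules, keyed by answer letter.
rules = {
    "A": ({"bread", "milk"}, {"cheese"}),
    "B": ({"bread"}, {"milk"}),
    "C": ({"juice"}, {"soda"}),
    "D": ({"bread"}, {"cheese"}),
}

for label, (lhs, rhs) in rules.items():
    print(label, "support:", support(lhs | rhs),
          "confidence:", round(confidence(lhs, rhs), 3))
```

Each rule's support can be compared against the 25% minimum in the same loop, since `support(lhs | rhs)` is already printed.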
The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?
A. Sqoop
B. Pig
C. Chukwa
D. Scribe
The Marketing department of your company wishes to track opinion on a new product that was recently introduced. Marketing would like to know how many positive and negative reviews are appearing over a given period and potentially retrieve each review for more in-depth insight. They have identified several popular product review blogs that historically have published thousands of user reviews of your company's products. You have been asked to provide the desired analysis. You examine the RSS feeds for each blog and determine which fields are relevant. You then craft a regular expression to match your new product's name and extract the relevant text from each matching review. What is the next step you should take?
A. Convert the extracted text into a suitable document representation and index into a review corpus
B. Use the extracted text and your regular expression to perform a sentiment analysis based on mentions of the new product
C. Read the extracted text for each review and manually tabulate the results
D. Group the reviews using Naïve Bayesian classification
Which word or phrase completes the statement? A Data Scientist would consider that an RDBMS is to a Table as R is to a ______________.
A. Data frame
B. List
C. Matrix
D. Array
In which lifecycle stage are appropriate analytical techniques determined?
A. Model planning
B. Model building
C. Data preparation
D. Discovery
How are window functions different from regular aggregate functions?
A. Rows retain their separate identities and the window function can access more than the current row.
B. Rows are grouped into an output row and the window function can access more than the current row.
C. Rows retain their separate identities and the window function can only access the current row.
D. Rows are grouped into an output row and the window function can only access the current row.
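The contrast between the two kinds of function can be seen by running the same SUM once with GROUP BY and once with a window. The following is a sketch using Python's built-in `sqlite3` module; the `sales` table and its values are made up for illustration, and the window query assumes the bundled SQLite is version 3.25 or later.

```python
import sqlite3

# Hypothetical in-memory table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 20)],
)

# Regular aggregate: the rows of each region collapse into one output row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()

# Window function: every input row is kept, and each row also sees
# the sum over all rows in its partition.
windowed = conn.execute(
    "SELECT region, amount, SUM(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()

print(grouped)
print(windowed)
conn.close()
```

The grouped query returns one row per region, while the windowed query returns one row per sale with the regional total attached to each.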
You submit a MapReduce job to a Hadoop cluster and notice that although the job was successfully submitted, it is not completing. What should you do?
A. Ensure that the TaskTracker is running
B. Ensure that the JobTracker is running
C. Ensure that the NameNode is running
D. Ensure that a DataNode is running
What is the mandatory clause that must be included when using window functions?
A. OVER
B. RANK
C. PARTITION BY
D. RANK BY
Refer to the exhibit.
In the exhibit, the table shows the values for the input Boolean attributes "A", "B", and "C". It also shows the values for the output attribute "class". Which decision tree is valid for the data?

A. Tree B
B. Tree A
C. Tree C
D. Tree D
Refer to the exhibit.
You are going into a meeting where you know your manager will have a question on your dataset, specifically relating to customers that are classified as renters with good credit status.
In order to prepare for the meeting, you create a rule: RENTER => GOOD CREDIT. What is the confidence of the rule?

A. 63%
B. 41%
C. 18%
D. 73%