You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entry(ies) in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Nothing; the worker node will automatically join the cluster when the DataNode daemon is started.
B. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin -refreshHadoop on the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
D. Restart the NameNode
Your cluster has the following characteristics:
A rack-aware topology is configured and on
Replication is set to 3
Cluster block size is set to 64 MB
Which describes the file read process when a client application connects into the cluster and requests a 50MB file?
A. The client queries the NameNode which retrieves the block from the nearest DataNode to the client and then passes that block back to the client.
B. The client queries the NameNode for the locations of the block, and reads from a random location in the list it retrieves to eliminate network I/O loads by balancing which nodes it retrieves data from at any given time.
C. The client queries the NameNode for the locations of the block, and reads all three copies. The first copy to complete transfer to the client is the one the client reads as part of Hadoop's speculative execution framework.
D. The client queries the NameNode for the locations of the block, and reads from the first location in the list it receives.
On a cluster running CDH 5.0 or above, you use the hadoop fs -put command to write a 300 MB file into a previously empty directory using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when they look in the directory?
A. They will see the file with its original name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster.
B. They will see the file with a ._COPYING_ extension on its name. If they attempt to view the file, they will get a ConcurrentFileAccessException until the entire file write is completed on the cluster.
C. They will see the file with a ._COPYING_ extension on its name. If they view the file, they will see the contents of the file up to the last completed block (as each 64 MB block is written, that block becomes available).
D. The directory will appear to be empty until the entire file write is completed on the cluster
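The block arithmetic behind this scenario can be sketched as follows. This is only a rough illustration in Python, assuming the 64 MB block size and the 300 MB and 200 MB figures from the question; the function names are hypothetical and are not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 64  # block size stated in the question


def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of HDFS blocks needed to hold a file of the given size."""
    return math.ceil(file_size_mb / block_size_mb)


def completed_blocks(written_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of full blocks already written after written_mb megabytes."""
    return written_mb // block_size_mb


if __name__ == "__main__":
    print(blocks_needed(300))     # 5: four full blocks plus one 44 MB block
    print(completed_blocks(200))  # 3: 192 MB sits in completed blocks
```

At the 200 MB mark, three full blocks are complete and the fourth is still being written.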
You suspect that your NameNode is incorrectly configured, and is swapping memory to disk. Which Linux commands help you to identify whether swapping is occurring? (Select 3)
A. free
B. df
C. memcat
D. top
E. vmstat
F. swapinfo
Which YARN process runs as "container 0" of a submitted job and is responsible for resource requests?
A. ResourceManager
B. NodeManager
C. JobHistoryServer
D. ApplicationMaster
E. JobTracker
F. ApplicationManager
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A while later, you submit Job B. Now Job A and Job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?
A. When Job A gets submitted, it consumes all the task slots.
B. When Job A gets submitted, it doesn't consume all the task slots.
C. When Job B gets submitted, Job A has to finish first, before Job B can be scheduled.
D. When Job B gets submitted, it will get assigned tasks, while Job A continues to run with fewer tasks.
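As a rough illustration of the fair-sharing idea in this question, the sketch below splits a pool of task slots evenly among running jobs. It assumes equal weights, a single pool, and a hypothetical total of 20 slots; the real Fair Scheduler also considers weights, minimum shares, and per-job demand, none of which are modeled here.

```python
def equal_shares(total_slots, jobs):
    """Divide total_slots as evenly as possible among the given jobs.

    Simplified model: equal weights, no minimum shares, and every job
    is assumed to have enough tasks to use its whole share.
    """
    if not jobs:
        return {}
    base, extra = divmod(total_slots, len(jobs))
    # The first `extra` jobs receive one additional slot each.
    return {job: base + (1 if i < extra else 0) for i, job in enumerate(jobs)}


if __name__ == "__main__":
    # 20 slots is a hypothetical cluster size chosen for illustration.
    print(equal_shares(20, ["A"]))
    print(equal_shares(20, ["A", "B"]))
```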
You observe that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve maximum memory to disk I/O ratio?
A. Decrease the io.sort.mb value to 0
B. Increase the io.sort.mb to 1GB
C. For 1GB child heap size an io.sort.mb of 128 MB will always maximize memory to disk I/O
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records
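One way to reason about this tuning question is to compare the two counters it mentions. The sketch below is a minimal illustration, assuming you have already read the spilled-records and map-output-records counter values for the map side of a job; the function name and the sample numbers are hypothetical.

```python
def spill_ratio(spilled_records, map_output_records):
    """Return spilled records per map output record.

    A value near 1.0 suggests each record was written to disk about once;
    a larger value suggests records were spilled and merged more than once.
    """
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


if __name__ == "__main__":
    # Hypothetical counter values, not taken from a real job.
    print(spill_ratio(3_000_000, 1_000_000))  # 3.0
    print(spill_ratio(1_000_000, 1_000_000))  # 1.0
```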
Which is the default scheduler in YARN?
A. Fair Scheduler
B. FIFO Scheduler
C. Capacity Scheduler
D. YARN doesn't configure a default scheduler. You must first assign an appropriate scheduler class in yarn-site.xml
On a cluster running MapReduce v2 (MRv2) on YARN, a MapReduce job is given a directory of 10 plain text files as its input directory. Each file is made up of 3 HDFS blocks. How many Mappers will run?
A. We cannot say; the number of Mappers is determined by the ResourceManager
B. We cannot say; the number of Mappers is determined by the ApplicationManager
C. We cannot say; the number of Mappers is determined by the developer
D. 30
E. 3
F. 10
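The input-size arithmetic in this question can be written out as a short sketch. It assumes, as a simplification, one input split per HDFS block, which is the usual outcome for plain text input when the split size equals the block size; the function name is hypothetical.

```python
def estimated_splits(num_files, blocks_per_file):
    """Estimate input splits, assuming exactly one split per HDFS block."""
    return num_files * blocks_per_file


if __name__ == "__main__":
    print(estimated_splits(10, 3))  # 30
```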
You are working on a project where you need to chain together MapReduce and Pig jobs. You also need the ability to use forks, decision points, and path joins. Which ecosystem project should you use to perform these actions?
A. Oozie
B. Zookeeper
C. HBase
D. Sqoop
E. HUE