Your cluster's mapred-site.xml includes the following parameters:
mapreduce.map.memory.mb 4096
mapreduce.reduce.memory.mb 8192
And your cluster's yarn-site.xml includes the following parameter:
yarn.nodemanager.vmem-pmem-ratio 2.1
What is the maximum amount of virtual memory allocated for each map task before YARN will kill its Container?
A. 4 GB
B. 17.2 GB
C. 8.9 GB
D. 8.2 GB
E. 24.6 GB
Answer: D
Explanation: To get the maximum amount of virtual memory allocated for each map task, multiply mapreduce.map.memory.mb by yarn.nodemanager.vmem-pmem-ratio: 4096 MB x 2.1 = 8601.6 MB. The nearest answer is 8.2 GB, since 8.9 GB is more than 8601.6 MB.
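For reference, these settings would appear in the two files roughly as follows (a minimal sketch; the values are the ones given in the question):
<!-- mapred-site.xml: physical memory requested per map container -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<!-- yarn-site.xml: virtual-to-physical memory ratio enforced per container -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>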
Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a "split-brain" scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?
A. Two active NameNodes and two Standby NameNodes
B. One active NameNode and one Standby NameNode
C. Two active NameNodes and one Standby NameNode
D. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy
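In Hadoop 2, which this question targets, Quorum-based HA supports exactly two NameNodes per nameservice: one active and one standby. A minimal hdfs-site.xml sketch of that pairing, assuming a hypothetical nameservice mycluster with NameNode IDs nn1 and nn2:
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- exactly two NameNode IDs: one will be active, the other standby -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>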
Table schemas in Hive are:
A. Stored as metadata on the NameNode
B. Stored along with the data in HDFS
C. Stored in the Metastore
D. Stored in ZooKeeper
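Hive keeps table schemas in its metastore, a relational database configured through hive-site.xml. A minimal sketch, assuming a hypothetical MySQL-backed metastore on a host named metastorehost:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastorehost:3306/metastore</value>
</property>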
For each YARN job, the Hadoop framework generates task log files. Where are Hadoop task log files stored?
A. Cached by the NodeManager managing the job containers, then written to a log directory on the NameNode
B. Cached in the YARN container running the task, then copied into HDFS on job completion
C. In HDFS, in the directory of the user who generates the job
D. On the local disk of the slave node running the task
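Task logs are written to the local disk of the node that ran the task, under the NodeManager's log directory. A sketch of the controlling yarn-site.xml property (the path shown is an illustrative assumption, not a value from the question):
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/var/log/hadoop-yarn/containers</value>
</property>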
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A while later, you submit Job B. Now Job A and Job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs? (Choose two)
A. When Job B gets submitted, it will get assigned tasks, while Job A continues to run with fewer tasks.
B. When Job B gets submitted, Job A has to finish first, before Job B can get scheduled.
C. When Job A gets submitted, it doesn't consume all the task slots.
D. When Job A gets submitted, it consumes all the task slots.
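The Fair Scheduler's queues are defined in an allocations file named by yarn.scheduler.fair.allocation.file. A minimal sketch, assuming a single hypothetical queue: with no other jobs present, a lone job in this queue can take the whole cluster, and later jobs receive their fair share as tasks free up.
<?xml version="1.0"?>
<allocations>
  <queue name="default">
    <weight>1.0</weight>
  </queue>
</allocations>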
Each node in your Hadoop cluster, running YARN, has 64 GB of memory and 24 cores. Your yarn-site.xml has the following configuration:
You want YARN to launch no more than 16 containers per node. What should you do?
A. Modify yarn-site.xml with the following property:
B. Modify yarn-sites.xml with the following property:
C. Modify yarn-site.xml with the following property:
D. No action is needed: YARN's dynamic resource allocation automatically optimizes the node memory
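The property values from the original options are not reproduced here, but the arithmetic behind the question is: containers per node is roughly yarn.nodemanager.resource.memory-mb divided by yarn.scheduler.minimum-allocation-mb. A sketch, assuming the NodeManager offers the full 64 GB (65536 MB), so that 16 containers of 4096 MB each fit:
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>4096</value>
</property>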
You want a node to only swap Hadoop daemon data from RAM to disk when absolutely necessary. What should you do?
A. Delete the /dev/vmswap file on the node
B. Delete the /etc/swap file on the node
C. Set the ram.swap parameter to 0 in core-site.xml
D. Set the vm.swappiness parameter to 0 on the node
E. Delete the /swapfile file on the node
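Swap behavior is governed by the kernel's vm.swappiness setting, not by any Hadoop configuration file. A sketch of how it is typically lowered on Linux (the file path and commands are standard Linux convention, not taken from the question):
# /etc/sysctl.conf
vm.swappiness = 0
# apply the change without rebooting
sysctl -p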
You are configuring your cluster to run HDFS and MapReduce v2 (MRv2) on YARN. Which two daemons need to be installed on your cluster's master nodes? (Choose two)
B. ResourceManager
C. TaskManager
E. NameNode
Answer: B, E
Explanation: The ResourceManager is the YARN master daemon and the NameNode is the HDFS master daemon.
You observed that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 1000 MB. How would you tune your io.sort.mb value to achieve maximum memory-to-disk I/O ratio?
A. For a 1GB child heap size an io.sort.mb of 128 MB will always maximize memory to disk I/O
B. Increase the io.sort.mb to 1GB
C. Decrease the io.sort.mb value to 0
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records.
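More spilled records than map output records means each record is being written to disk more than once because the sort buffer overflows. The buffer is set in mapred-site.xml; a sketch using the MRv2 name of the property (io.sort.mb is the older MRv1 name used in the question; the value shown is illustrative and would be tuned as option D describes):
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>256</value>
</property>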
You are running a Hadoop cluster with a NameNode on host mynamenode, a secondary NameNode on host mysecondarynamenode and several DataNodes.
Which best describes how you determine when the last checkpoint happened?
A. Execute hdfs namenode -report on the command line and look at the Last Checkpoint information
B. Execute hdfs dfsadmin -saveNamespace on the command line, which returns to you the last checkpoint value in the fstime file
C. Connect to the web UI of the Secondary NameNode (http://mysecondary:50090/) and look at the "Last Checkpoint" information
D. Connect to the web UI of the NameNode (http://mynamenode:50070) and look at the "Last Checkpoint" information
Answer: C
Explanation: The Secondary NameNode performs the checkpoints, and its web UI (port 50090) reports when the last checkpoint completed; neither the NameNode's own web UI nor the dfsadmin commands in options A and B report this.
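If you prefer the shell, the same page can be fetched over plain HTTP (host and port as given in option C):
curl http://mysecondary:50090/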