Cloudera Certified Developer for Apache Hadoop (CCDH)
Exam Details
Exam Code
:CCD-410
Exam Name
:Cloudera Certified Developer for Apache Hadoop (CCDH)
Certification
:CCDH
Vendor
:Cloudera
Total Questions
:60 Q&As
Last Updated
:May 14, 2024
Cloudera CCDH CCD-410 Questions & Answers
Question 11:
Table metadata in Hive is:
A. Stored as metadata on the NameNode.
B. Stored along with the data in HDFS.
C. Stored in the Metastore.
D. Stored in ZooKeeper.
Correct Answer: C
By default, hive use an embedded Derby database to store metadata information. The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
The Metastore is an application that runs on an RDBMS and uses an open source ORM layer called DataNucleus, to convert object representations into a relational schema and vice versa. They chose this approach as opposed to storing this information in hdfs as they need the Metastore to be very low latency. The DataNucleus layer allows them to plugin many different RDBMS technologies.
Note:
*
By default, Hive stores metadata in an embedded Apache Derby database, and other client/server
databases like MySQL can optionally be used.
*
features of Hive include:
Metadata storage in an RDBMS, significantly reducing the time to perform semantic checks during query
execution.
Reference: Store Hive Metadata into RDBMS
Question 12:
In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next () method return?
A. It returns a reference to a different Writable object time.
B. It returns a reference to a Writable object from an object pool.
C. It returns a reference to the same Writable object each time, but populated with different data.
D. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.
E. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.
Correct Answer: C
Calling Iterator.next() will always return the SAME EXACT instance of IntWritable, with the contents of that instance replaced with the next value.
Reference: manupulating iterator in mapreduce
Question 13:
What types of algorithms are difficult to express in MapReduce v1 (MRv1)?
A. Algorithms that require applying the same mathematical function to large numbers of individual binary records.
B. Relational operations on large amounts of structured and semi-structured data.
C. Algorithms that require global, sharing states.
D. Large-scale graph algorithms that require one-step link traversal.
E. Text analysis algorithms on large collections of unstructured text (e.g, Web crawls).
Correct Answer: C
See 3) below.
Limitations of Mapreduce where not to use Mapreduce While very powerful and applicable to a wide variety of problems, MapReduce is not the answer to every problem. Here are some problems I found where MapReudce is not suited and some papers that address the limitations of MapReuce.
1.
Computation depends on previously computed values
If the computation of a value depends on previously computed values, then MapReduce cannot be used. One good example is the Fibonacci series where each value is summation of the previous two values. i.e., f(k+2) = f(k+1) + f(k). Also, if the data set is small enough to be computed on a single machine, then it is better to do it as a single reduce(map(data)) operation rather than going through the entire map reduce process.
2.
Full-text indexing or ad hoc searching
The index generated in the Map step is one dimensional, and the Reduce step must not generate a large amount of data or there will be a serious performance degradation. For example, CouchDB's MapReduce may not be a good fit for full-text indexing or ad hoc searching. This is a problem better suited for a tool such as Lucene.
3.
Algorithms depend on shared global state
Solutions to many interesting problems in text processing do not require global synchronization. As a result, they can be expressed naturally in MapReduce, since map and reduce tasks run independently and in isolation. However, there are many examples of algorithms that depend crucially on the existence of shared global state during processing, making them difficult to implement in MapReduce (since the single opportunity for global synchronization in MapReduce is the barrier between the map and reduce phases of processing)
Reference: Limitations of Mapreduce where not to use Mapreduce
Question 14:
MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.
A. Heath states checks (heartbeats)
B. Resource management
C. Job scheduling/monitoring
D. Job coordination between the ResourceManager and NodeManager
E. Launching tasks
F. Managing file system metadata
G. MapReduce metric reporting
H. Managing tasks
Correct Answer: BC
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource
management and job scheduling/monitoring, into separate daemons. The idea is to have a global
ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in
the classical sense of Map-Reduce jobs or a DAG of jobs.
Note:
The central goal of YARN is to clearly separate two things that are unfortunately smushed together in
current Hadoop, specifically in (mainly) JobTracker:
/ Monitoring the status of the cluster with respect to which nodes have which resources available.
Under YARN, this will be global.
/ Managing the parallelization execution of any specific job. Under YARN, this will be done separately for each job.
Reference: Apache Hadoop YARN Concepts and Applications
Question 15:
In a MapReduce job with 500 map tasks, how many map task attempts will there be?
A. It depends on the number of reduces in the job.
B. Between 500 and 1000.
C. At most 500.
D. At least 500.
E. Exactly 500.
Correct Answer: D
Explanation: From Cloudera Training Course: Task attempt is a particular instance of an attempt to execute a task There will be at least as many task attempts as there are tasks If a task attempt fails, another will be started by the JobTracker Speculative execution can also result in more task attempts than completed tasks
Question 16:
A combiner reduces:
A. The number of values across different keys in the iterator supplied to a single reduce method call.
B. The amount of intermediate data that must be transferred between the mapper and reducer.
C. The number of input files a mapper must process.
D. The number of output files a reducer must produce.
Correct Answer: B
Combiners are used to increase the efficiency of a MapReduce program. They are used to aggregate intermediate map output locally on individual mapper outputs. Combiners can help you reduce the amount of data that needs to be transferred across to the reducers. You can use your reducer code as a combiner if the operation performed is commutative and associative. The execution of combiner is not guaranteed, Hadoop may or may not execute a combiner. Also, if required it may execute it more then 1 times. Therefore your MapReduce jobs should not depend on the combiners execution.
Reference: 24 Interview Questions and Answers for Hadoop MapReduce developers, What are combiners? When should I use a combiner in my MapReduce Job?
Question 17:
You write MapReduce job to process 100 files in HDFS. Your MapReduce algorithm uses TextInputFormat: the mapper applies a regular expression over input values and emits key- values pairs with the key consisting of the matching text, and the value containing the filename and byte offset. Determine the difference between setting the number of reduces to one and settings the number of reducers to zero.
A. There is no difference in output between the two settings.
B. With zero reducers, no reducer runs and the job throws an exception. With one reducer, instances of matching patterns are stored in a single file on HDFS.
C. With zero reducers, all instances of matching patterns are gathered together in one file on HDFS. With one reducer, instances of matching patterns are stored in multiple files on HDFS.
D. With zero reducers, instances of matching patterns are stored in multiple files on HDFS. With one reducer, all instances of matching patterns are gathered together in one file on HDFS.
Correct Answer: D
*
It is legal to set the number of reduce-tasks to zero if no reduction is desired.
In this case the outputs of the map-tasks go directly to the FileSystem, into the output path set by setOutputPath(Path). The framework does not sort the map-outputs before writing them out to the FileSystem.
*
Often, you may want to process input data using a map function only. To do this, simply set mapreduce.job.reduces to zero. The MapReduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
Note:
Reduce
In this phase the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method is called for each pair in the grouped inputs.
The output of the reduce task is typically written to the FileSystem via OutputCollector.collect (WritableComparable, Writable).
Applications can use the Reporter to report progress, set application-level status messages and update Counters, or just indicate that they are alive.
The output of the Reducer is not sorted.
Question 18:
You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt. How many files will be processed by the FileInputFormat.setInputPaths () command when it's given a path object representing this directory?
A. Four, all files will be processed
B. Three, the pound sign is an invalid character for HDFS file names
C. Two, file names with a leading period or underscore are ignored
D. None, the directory cannot be named jobdata
E. One, no special characters can prefix the name of an input file
Correct Answer: C
Files starting with '_' are considered 'hidden' like unix files starting with '.'.
# characters are allowed in HDFS file names.
Question 19:
Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
A. Oozie
B. Flume
C. Pig
D. Hue
E. Hive
F. Sqoop
G. fuse-dfs
Correct Answer: F
Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities: Imports individual tables or entire databases to files in HDFS Generates Java classes to allow you to interact with your imported data Provides the ability to import from SQL databases straight into your Hive data warehouse Note: Data Movement Between Hadoop and Relational Databases Data can be moved between Hadoop and a relational database as a bulk data transfer, or relational tables
can be accessed from within a MapReduce map function. Note:
* Cloudera's Distribution for Hadoop provides a bulk data transfer tool (i.e., Sqoop) that imports individual tables or entire databases into HDFS files. The tool also generates Java classes that support interaction with the imported data. Sqoop supports all relational databases over JDBC, and Quest Software provides a connector (i.e., OraOop) that has been optimized for access to data residing in Oracle databases.
Reference: http://log.medcl.net/item/2011/08/hadoop-and-mapreduce-big-data-analytics-gartner/ (Data Movement between hadoop and relational databases, second paragraph)
Question 20:
You use the hadoop fs put command to write a 300 MB file using and HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this life?
A. They would see Hadoop throw an ConcurrentFileAccessException when they try to access this file.
B. They would see the current state of the file, up to the last bit written by the command.
C. They would see the current of the file through the last completed block.
D. They would see no content until the whole file written and closed.
Nowadays, the certification exams become more and more important and required by more and more enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare for the exam in a short time with less efforts? How to get a ideal result and how to find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provide not only Cloudera exam questions, answers and explanations but also complete assistance on your exam preparation and certification application. If you are confused on your CCD-410 exam preparations and Cloudera certification application, do not hesitate to visit our Vcedump.com to find your solutions here.