Exam Details

  • Exam Code: CCA175
  • Exam Name: CCA Spark and Hadoop Developer Exam
  • Certification: Cloudera Certified Associate CCA
  • Vendor: Cloudera
  • Total Questions: 95 Q&As
  • Last Updated: May 12, 2024

Cloudera Certified Associate (CCA) CCA175 Questions & Answers

  • Question 31:

    Problem Scenario 78 : You have been given a MySQL DB with the following details.

    user=retail_dba

    password=cloudera

    database=retail_db

    table=retail_db.orders

    table=retail_db.order_items

    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

    Columns of orders table : (order_id , order_date , order_customer_id, order_status)

    Columns of order_items table : (order_item_id , order_item_order_id ,

    order_item_product_id,

    order_item_quantity,order_item_subtotal,order_item_product_price)

    Please accomplish the following activities.

    1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p92_orders and p92_order_items.

    2. Join the data on order_id using Spark and Python.

    3. Calculate total revenue per day and per customer.

    4. Find the customer with the maximum revenue.
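    A minimal sketch of one approach (the scenario asks for Python; the same RDD logic is shown in Scala here for consistency with the other snippets in this set). It assumes the quickstart defaults above, a spark-shell session where sc is predefined, and sqoop's default comma-delimited output:

    // Step 1: copy both tables to HDFS with sqoop (run from the shell):
    //   sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
    //     --username retail_dba --password cloudera --table orders --target-dir p92_orders
    //   sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
    //     --username retail_dba --password cloudera --table order_items --target-dir p92_order_items

    // Step 2: key both datasets by order id and join.
    val orders = sc.textFile("p92_orders").map(_.split(","))
      .map(r => (r(0).toInt, r))                       // (order_id, full order row)
    val items = sc.textFile("p92_order_items").map(_.split(","))
      .map(r => (r(1).toInt, r(4).toFloat))            // (order_item_order_id, order_item_subtotal)
    val joined = orders.join(items)

    // Step 3: total revenue per day and per customer.
    val revenuePerDay = joined.map { case (_, (o, amt)) => (o(1), amt) }.reduceByKey(_ + _)
    val revenuePerCustomer = joined.map { case (_, (o, amt)) => (o(2), amt) }.reduceByKey(_ + _)

    // Step 4: customer with the maximum revenue.
    val topCustomer = revenuePerCustomer.sortBy(_._2, ascending = false).first()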

  • Question 32:

    Problem Scenario 74 : You have been given a MySQL DB with the following details.

    user=retail_dba

    password=cloudera

    database=retail_db

    table=retail_db.orders

    table=retail_db.order_items

    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

    Columns of orders table : (order_id , order_date , order_customer_id, order_status)

    Columns of order_items table : (order_item_id , order_item_order_id ,

    order_item_product_id,

    order_item_quantity,order_item_subtotal,order_item_product_price)

    Please accomplish the following activities.

    1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p89_orders and p89_order_items.

    2. Join the data on order_id using Spark and Python.

    3. From the joined data, fetch the selected columns: order_id, order_date, and the amount collected on the order.

    4. Calculate the total number of orders placed for each date, and produce the output sorted by date.
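    A minimal sketch, under the same assumptions as the Scenario 78 sketch above (Scala in spark-shell, sqoop import into p89_orders and p89_order_items with default delimiters):

    val orders = sc.textFile("p89_orders").map(_.split(","))
      .map(r => (r(0).toInt, r(1)))                    // (order_id, order_date)
    val items = sc.textFile("p89_order_items").map(_.split(","))
      .map(r => (r(1).toInt, r(4).toFloat))            // (order_id, order_item_subtotal)

    // Step 3: (order_id, order_date, amount collected on the order).
    val perOrder = orders.join(items)
      .map { case (id, (date, amt)) => ((id, date), amt) }
      .reduceByKey(_ + _)
      .map { case ((id, date), amt) => (id, date, amt) }

    // Step 4: total orders placed per date, sorted by date.
    val ordersPerDate = orders.map { case (_, date) => (date, 1) }
      .reduceByKey(_ + _)
      .sortByKey()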

  • Question 33:

    You have been given a MySQL DB with the following details.

    user=retail_dba

    password=cloudera

    database=retail_db

    table=retail_db.categories

    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

    Please accomplish the following activities.

    1. Connect to the MySQL DB and check the content of the table.

    2. Copy the "retail_db.categories" table to HDFS without specifying a directory name.

    3. Copy the "retail_db.categories" table to HDFS, into a directory named "categories_target".

    4. Copy the "retail_db.categories" table to HDFS, into a warehouse directory named "categories_warehouse".

  • Question 34:

    Problem Scenario 44 : You have been given 4 files, with the content as given below.

    spark11/file1.txt

    Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.

    spark11/file2.txt

    The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

    spark11/file3.txt

    This approach takes advantage of data locality (nodes manipulating the data they have access to) to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.

    spark11/file4.txt

    Apache Storm is focused on stream processing, or what some call complex event processing. Storm implements a fault-tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format.

    Write a Spark program which will give you the highest occurring word in each file, along with the file name.
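    A minimal sketch, assuming the four files live under spark11/ on HDFS and a spark-shell (Scala) session; whitespace tokenization is a simplification:

    // wholeTextFiles yields one (fileName, fileContent) pair per file.
    val topWordPerFile = sc.wholeTextFiles("spark11/*")
      .map { case (file, content) =>
        val counts = content.split("\\s+").filter(_.nonEmpty)
          .groupBy(identity).mapValues(_.length)       // word -> occurrences
        val (word, n) = counts.maxBy(_._2)             // highest occurring word
        (file, word, n)
      }
    topWordPerFile.collect().foreach(println)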

  • Question 35:

    Problem Scenario 90 : You have been given the two files below.

    course.txt
    id,course
    1,Hadoop
    2,Spark
    3,HBase

    fee.txt
    id,fee
    2,3900
    3,4200
    4,2900

    Accomplish the following activities.

    1. Select all the courses and their fees, whether a fee is listed or not.

    2. Select all the available fees and the respective courses. If the course does not exist, still list the fee.

    3. Select all the courses and their fees, whether a fee is listed or not, but ignore records having a null fee.
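    A minimal sketch using RDD outer joins in Scala (spark-shell assumed); requirement 1 maps to a left outer join, requirement 2 to a right outer join, and requirement 3 filters the None entries out of the left outer join:

    // Parse a two-column CSV, skipping its header line.
    def load(path: String) = {
      val raw = sc.textFile(path)
      val header = raw.first()
      raw.filter(_ != header).map(_.split(",")).map(r => (r(0), r(1)))
    }
    val courses = load("course.txt")                   // (id, course)
    val fees = load("fee.txt")                         // (id, fee)

    val all = courses.leftOuterJoin(fees)              // 1. every course, fee optional
    val allFees = courses.rightOuterJoin(fees)         // 2. every fee, course optional
    val withFee = all.filter { case (_, (_, fee)) => fee.isDefined }   // 3. drop null fees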

  • Question 36:

    Problem Scenario 34 : You have been given a file named spark6/user.csv. Data is given below:

    user.csv
    id,topic,hits
    Rahul,scala,120
    Nikita,spark,80
    Mithun,spark,1
    myself,cca175,180

    Now write Spark code in Scala which will remove the header and create an RDD of values as below for all rows, and also filter out the row if the id is "myself".

    Map(id -> Rahul, topic -> scala, hits -> 120)
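    A minimal sketch in Scala (spark-shell assumed), zipping the header fields with each row to build the maps:

    val raw = sc.textFile("spark6/user.csv")
    val header = raw.first()                           // "id,topic,hits"
    val keys = header.split(",")
    val maps = raw.filter(_ != header)                 // drop the header row
      .map(_.split(","))
      .filter(r => r(0) != "myself")                   // drop the "myself" row
      .map(r => keys.zip(r).toMap)                     // Map(id -> ..., topic -> ..., hits -> ...)
    maps.collect().foreach(println)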

  • Question 37:

    Problem Scenario 66 : You have been given the below code snippet.

    val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)

    val b = a.keyBy(_.length)

    val c = sc.parallelize(List("ant", "falcon", "squid"), 2)

    val d = c.keyBy(_.length)

    operation1

    Write a correct code snippet for operation1 which will produce the desired output, shown below.

    Array[(Int, String)] = Array((4,lion))
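    A sketch of one operation that produces this output: subtractByKey keeps the pairs of b whose key has no match in d, and the only string length present in b (3, 4, 5, 6) but absent from d (3, 5, 6) is 4:

    val operation1 = b.subtractByKey(d)                // pairs of b with keys missing from d
    operation1.collect                                 // Array((4,lion))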

  • Question 38:

    Problem Scenario 84 : In continuation of the previous question, please accomplish the following activities.

    1. Select all the products which have a null product code.

    2. Select all the products whose name starts with 'Pen', with the results ordered by price in descending order.

    3. Select all the products whose name starts with 'Pen', with the results ordered by price in descending order and quantity in ascending order.

    4. Select the top 2 products by price.
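    A minimal sketch in Spark SQL from Scala; the table name products and the column names product_code, name, price, and quantity are assumptions carried over from the omitted previous scenario:

    // Table and column names below are assumptions, not given in this excerpt.
    sqlContext.sql("SELECT * FROM products WHERE product_code IS NULL").show()
    sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC").show()
    sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC, quantity ASC").show()
    sqlContext.sql("SELECT * FROM products ORDER BY price DESC LIMIT 2").show()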

  • Question 39:

    Problem Scenario 32 : You have been given three files as below: spark3/sparkdir1/file1.txt, spark3/sparkdir2/file2.txt, and spark3/sparkdir3/file3.txt. Each file contains some text.

    spark3/sparkdir1/file1.txt

    Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework

    spark3/sparkdir2/file2.txt

    The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

    spark3/sparkdir3/file3.txt

    This approach takes advantage of data locality (nodes manipulating the data they have access to) to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.

    Now write Spark code in Scala which will load all three of these files from HDFS and do a word count, filtering out the following words, with the result sorted by word count in reverse order.

    Filter words: ("a","the","an","as","with","this","these","is","are","in","for","to","and","The","of")

    Also make sure you load all three files as a single RDD (all three files must be loaded using a single API call). You have also been given the following codec:

    import org.apache.hadoop.io.compress.GzipCodec

    Please use the above codec to compress the file while saving it to HDFS.
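    A minimal sketch in Scala (spark-shell assumed); a single textFile call accepts a comma-separated list of paths, which satisfies the single-API-call requirement, and the output path is an assumption:

    import org.apache.hadoop.io.compress.GzipCodec

    // One API call loading all three files into a single RDD.
    val content = sc.textFile(
      "spark3/sparkdir1/file1.txt,spark3/sparkdir2/file2.txt,spark3/sparkdir3/file3.txt")

    val stop = Set("a","the","an","as","with","this","these","is","are","in","for","to","and","The","of")

    val counts = content.flatMap(_.split(" "))
      .filter(w => w.nonEmpty && !stop.contains(w))    // drop the filter words
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .map { case (w, n) => (n, w) }
      .sortByKey(false)                                // sort by count, reverse order

    counts.saveAsTextFile("spark3/result", classOf[GzipCodec])   // gzip-compressed output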

  • Question 40:

    Problem Scenario 77 : You have been given a MySQL DB with the following details.

    user=retail_dba

    password=cloudera

    database=retail_db

    table=retail_db.orders

    table=retail_db.order_items

    jdbc URL = jdbc:mysql://quickstart:3306/retail_db

    Columns of orders table : (order_id , order_date , order_customer_id, order_status)

    Columns of order_items table : (order_item_id , order_item_order_id , order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)

    Please accomplish the following activities.

    1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p92_orders and p92_order_items.

    2. Join the data on order_id using Spark and Python.

    3. Calculate total revenue per day and per order.

    4. Calculate the total and average revenue for each date, using combineByKey or aggregateByKey.
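    A minimal sketch of step 4 with aggregateByKey (Scala in spark-shell assumed; steps 1 and 2 mirror the Scenario 78 sketch above). aggregateByKey folds each revenue amount into a (sum, count) accumulator per date, from which both the total and the average fall out:

    val orders = sc.textFile("p92_orders").map(_.split(","))
      .map(r => (r(0).toInt, r(1)))                    // (order_id, order_date)
    val items = sc.textFile("p92_order_items").map(_.split(","))
      .map(r => (r(1).toInt, r(4).toFloat))            // (order_id, order_item_subtotal)
    val revenueByDate = orders.join(items)
      .map { case (_, (date, amt)) => (date, amt) }

    val sumCount = revenueByDate.aggregateByKey((0.0f, 0))(
      (acc, amt) => (acc._1 + amt, acc._2 + 1),        // fold one amount into (sum, count)
      (a, b) => (a._1 + b._1, a._2 + b._2)             // merge partition accumulators
    )
    val totalAndAvg = sumCount.mapValues { case (sum, n) => (sum, sum / n) }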

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Cloudera exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your CCA175 exam preparation or your Cloudera certification application, do not hesitate to visit Vcedump.com to find your solutions.