Exam Details

  • Exam Code: CCA175
  • Exam Name: CCA Spark and Hadoop Developer Exam
  • Certification: Cloudera Certified Associate CCA
  • Vendor: Cloudera
  • Total Questions: 95 Q&As
  • Last Updated:

Cloudera Certified Associate (CCA) CCA175 Questions & Answers

  • Question 1:

    Problem Scenario 91 : You have been given data in json format as below.

    {"first_name":"Ankit", "last_name":"Jain"}

    {"first_name":"Amir", "last_name":"Khan"}

    {"first_name":"Rajesh", "last_name":"Khanna"}

    {"first_name":"Priynka", "last_name":"Chopra"}

    {"first_name":"Kareena", "last_name":"Kapoor"}

    {"first_name":"Lokesh", "last_name":"Yadav"}

    Do the following activities:

    1. Create an employee.json file locally.

    2. Load this file into HDFS.

    3. Register this data as a temp table in Spark using Python.

    4. Write a select query and print this data.

    5. Now save this selected data back in JSON format.
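Steps 3-5 (read the JSON, run a select, print, save back) can be sketched cluster-free in plain Python on the question's sample records; in PySpark the same flow would go through read.json, registerTempTable/sql, and a JSON write, but the block below uses only the standard library:

```python
import json

# Cluster-free sketch of steps 3-5, on three of the question's sample records.
lines = [
    '{"first_name":"Ankit", "last_name":"Jain"}',
    '{"first_name":"Amir", "last_name":"Khan"}',
    '{"first_name":"Rajesh", "last_name":"Khanna"}',
]

rows = [json.loads(line) for line in lines]                    # "read" the JSON lines
selected = [(r["first_name"], r["last_name"]) for r in rows]   # the select query
for row in selected:
    print(row)                                                 # print this data

# save the selected data back as JSON lines
out = "\n".join(json.dumps({"first_name": f, "last_name": l}) for f, l in selected)
```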

  • Question 2:

Problem Scenario 80 : You have been given a MySQL DB with the following details. user=retail_dba password=cloudera database=retail_db table=retail_db.products jdbc URL = jdbc:mysql://quickstart:3306/retail_db Columns of products table: (product_id | product_category_id | product_name | product_description | product_price | product_image). Please accomplish the following activities.

    1. Copy the "retail_db.products" table to HDFS in a directory p93_products.

    2. Now sort the products data by product price per category; use the product_category_id column to group by category.
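Step 2's per-category price sort can be sketched cluster-free in plain Python (the sample rows below are hypothetical; in Spark the same idea is a sort over a composite (category, price) key):

```python
# Cluster-free sketch of step 2: order products by price within each category.
# The rows are hypothetical sample data in the order
# (product_id, product_category_id, product_name, product_price).
products = [
    (1, 2, "Quest Q64", 59.98),
    (2, 2, "Under Armour Tee", 129.99),
    (3, 2, "Nike Shirt", 28.00),
    (4, 3, "Adidas Ball", 22.00),
    (5, 3, "Keeper Glove", 99.00),
]

# Spark analogue: sortBy a composite (category, price) key.
ordered = sorted(products, key=lambda p: (p[1], p[3]))
for row in ordered:
    print(row)
```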

  • Question 3:

    Problem Scenario 71 :

    Write a Spark script using Python that reads a file "Content.txt" (on HDFS) with the following content.

    After that, split each row into (key, value), where the key is the first word in the line and the entire line is the value.

    Filter out the empty lines.

    Then save these key-value pairs in "problem86" as a sequence file (on HDFS).

    Part 2 : Save as a sequence file where the key is null and the entire line is the value. Read back the stored sequence files.

    Content.txt

    Hello this is ABCTECH.com

    This is XYZTECH.com

    Apache Spark Training

    This is Spark Learning Session Spark is faster than MapReduce
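The split-and-filter logic can be sketched cluster-free in plain Python; in PySpark it would be a filter plus a map before saveAsSequenceFile:

```python
# Content.txt lines from the question, plus one empty line to exercise the filter.
content = [
    "Hello this is ABCTECH.com",
    "This is XYZTECH.com",
    "",
    "Apache Spark Training",
    "This is Spark Learning Session Spark is faster than MapReduce",
]

# filter out empty lines, then key each line by its first word
non_empty = [line for line in content if line.strip()]
pairs = [(line.split(" ")[0], line) for line in non_empty]
print(pairs)
```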

  • Question 4:

    Problem Scenario 23 : You have been given a log-generating service as below.

    Start_logs (it will generate continuous logs)

    Tail_logs (you can check what logs are being generated)

    Stop_logs (it will stop the log service)

    Path where logs are generated using the above service: /opt/gen_logs/logs/access.log

    Now write a Flume configuration file named flume3.conf; using that configuration file, dump logs into the HDFS file system in a directory called flumeflume3/%Y/%m/%d/%H/%M (meaning a new directory should be created every minute). Please use interceptors to provide timestamp information if the message header does not have it, and note that you have to preserve the existing timestamp if the message contains one. The Flume channel should have the following properties as well: it should commit after every 100 messages, use a non-durable/faster channel, and be able to hold a maximum of 1000 events.
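A minimal flume3.conf sketch for these requirements (the agent/component names a1, r1, k1, c1 and the exec source command are assumptions; the HDFS path is the one given in the question):

```properties
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# exec source tailing the generated log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/gen_logs/logs/access.log
a1.sources.r1.interceptors = i1
# timestamp interceptor adds a timestamp header when missing;
# preserveExisting keeps an existing timestamp rather than overwriting it
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i1.preserveExisting = true

# HDFS sink with minute-level escape sequences in the path
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = flumeflume3/%Y/%m/%d/%H/%M

# non-durable (memory) channel: max 1000 events, commit every 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```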

  • Question 5:

    Problem Scenario 87 : You have been given the below three files.

    product.csv (create this file in HDFS):

    productID,productCode,name,quantity,price,supplierid
    1001,PEN,Pen Red,5000,1.23,501
    1002,PEN,Pen Blue,8000,1.25,501
    1003,PEN,Pen Black,2000,1.25,501
    1004,PEC,Pencil 2B,10000,0.48,502
    1005,PEC,Pencil 2H,8000,0.49,502
    1006,PEC,Pencil HB,0,9999.99,502
    2001,PEC,Pencil 3B,500,0.52,501
    2002,PEC,Pencil 4B,200,0.62,501
    2003,PEC,Pencil 5B,100,0.73,501
    2004,PEC,Pencil 6B,500,0.47,502

    supplier.csv:

    supplierid,name,phone
    501,ABC Traders,88881111
    502,XYZ Company,88882222
    503,QQ Corp,88883333

    products_suppliers.csv:

    productID,supplierID
    2001,501
    2002,501
    2003,501
    2004,502
    2001,503

    Now accomplish the query given in the solution: using SparkSQL, select the product, its price, and its supplier name where the product price is less than 0.6.
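A cluster-free Python sketch of that query's logic, joining product.csv to supplier.csv on the supplierid column and filtering on price (a stand-in for the SparkSQL version, using only the standard library):

```python
import csv
import io

# Data from the question's product.csv and supplier.csv.
products_csv = """productID,productCode,name,quantity,price,supplierid
1001,PEN,Pen Red,5000,1.23,501
1002,PEN,Pen Blue,8000,1.25,501
1003,PEN,Pen Black,2000,1.25,501
1004,PEC,Pencil 2B,10000,0.48,502
1005,PEC,Pencil 2H,8000,0.49,502
1006,PEC,Pencil HB,0,9999.99,502
2001,PEC,Pencil 3B,500,0.52,501
2002,PEC,Pencil 4B,200,0.62,501
2003,PEC,Pencil 5B,100,0.73,501
2004,PEC,Pencil 6B,500,0.47,502"""

suppliers_csv = """supplierid,name,phone
501,ABC Traders,88881111
502,XYZ Company,88882222
503,QQ Corp,88883333"""

products = list(csv.DictReader(io.StringIO(products_csv)))
suppliers = {r["supplierid"]: r["name"] for r in csv.DictReader(io.StringIO(suppliers_csv))}

# Roughly: SELECT p.name, p.price, s.name FROM products p
#          JOIN suppliers s ON p.supplierid = s.supplierid
#          WHERE p.price < 0.6
cheap = [(p["name"], float(p["price"]), suppliers[p["supplierid"]])
         for p in products if float(p["price"]) < 0.6]
for row in cheap:
    print(row)
```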

  • Question 6:

Problem Scenario 6 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Compression Codec : org.apache.hadoop.io.compress.SnappyCodec Please accomplish the following.

    1. Import the entire database such that it can be used as Hive tables; they must be created in the default schema.

    2. Also make sure each table's data is partitioned into 3 files, e.g. part-00000, part-00002, part-00003.

    3. Store all the generated Java files in a directory called java_output for further evaluation.
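These requirements map naturally onto sqoop's import-all-tables; an untested command sketch, assuming the quickstart VM from the question (3 mappers yield 3 part files per table, and --outdir collects the generated Java files):

```shell
sqoop import-all-tables \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --num-mappers 3 \
  --hive-import \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --outdir java_output
```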

  • Question 7:

    Problem Scenario 41 : You have been given the code snippet below.

    val au1 = sc.parallelize(List(("a", Array(1,2)), ("b", Array(1,2))))

    val au2 = sc.parallelize(List(("a", Array(3)), ("b", Array(2))))

    Apply the Spark method that will generate the output below.

    Array[(String, Array[Int])] = Array((a,Array(1, 2)), (b,Array(1, 2)), (a,Array(3)), (b,Array(2)))
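The expected output keeps every pair from both RDDs, duplicates included, which is the behaviour of the RDD union method (union concatenates without deduplication); a plain-Python sketch of the same semantics:

```python
# Plain-Python sketch of RDD union semantics: concatenation of the two
# datasets, preserving duplicates and input order (no deduplication).
rdd1 = [("a", [1, 2]), ("b", [1, 2])]
rdd2 = [("a", [3]), ("b", [2])]

unioned = rdd1 + rdd2  # equivalent of rdd1.union(rdd2)
print(unioned)
```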

  • Question 8:

Problem Scenario 7 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.

    1. Import the departments table using your own custom boundary query, which imports departments between 1 and 25.

    2. Also make sure each table's data is partitioned into 2 files, e.g. part-00000, part-00002.

    3. Also make sure you have imported only two columns from the table: department_id and department_name.
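A boundary query replaces sqoop's automatic min/max split computation; an untested command sketch, assuming the quickstart VM (the target directory name is an assumption):

```shell
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --columns department_id,department_name \
  --boundary-query "SELECT MIN(department_id), MAX(department_id) FROM departments WHERE department_id BETWEEN 1 AND 25" \
  --num-mappers 2 \
  --target-dir /user/cloudera/departments_subset
```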

  • Question 9:

Problem Scenario 12 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.

    1. Create a table in retail_db with the following definition.

    CREATE TABLE departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());

    2. Now insert records from the departments table into departments_new.

    3. Now import data from the departments_new table to HDFS.

    4. Insert the following 5 records into the departments_new table.

    Insert into departments_new values(110, "Civil", null);
    Insert into departments_new values(111, "Mechanical", null);
    Insert into departments_new values(112, "Automobile", null);
    Insert into departments_new values(113, "Pharma", null);
    Insert into departments_new values(114, "Social Engineering", null);

    5. Now do an incremental import based on the created_date column.
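Step 5 could be sketched with sqoop's incremental append mode (untested; the --last-value shown is a placeholder for the timestamp captured after step 3, and the target directory is an assumption):

```shell
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments_new \
  --target-dir /user/cloudera/departments_new \
  --incremental append \
  --check-column created_date \
  --last-value "2016-01-01 00:00:00"
```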

  • Question 10:

Problem Scenario 3 : You have been given a MySQL DB with the following details. user=retail_dba password=cloudera database=retail_db table=retail_db.categories jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following activities.

    1. Import data from the categories table, where category=22 (data should be stored in categories_subset).

    2. Import data from the categories table, where category>22 (data should be stored in categories_subset_2).

    3. Import data from the categories table, where category is between 1 and 22 (data should be stored in categories_subset_3).

    4. While importing categories data, change the delimiter to '|' (data should be stored in categories_subset_S).

    5. Import data from the categories table and restrict the import to the category_name and category_id columns only, with the delimiter as '|'.

    6. Add null values in the table using the SQL statements below.

    ALTER TABLE categories MODIFY category_department_id int(11);
    INSERT INTO categories values (60, NULL, 'TESTING');

    7. Import data from the categories table (into a categories_subset_17 directory) using the '|' delimiter, with category_id between 1 and 61, and encode null values for both string and non-string columns.

    8. Import the entire retail_db schema into a directory categories_subset_all_tables.
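Item 7 combines several sqoop options; an untested command sketch, assuming the quickstart VM from the question (the null-encoding value shown is a common choice, not mandated by the text):

```shell
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table categories \
  --target-dir categories_subset_17 \
  --where "category_id BETWEEN 1 AND 61" \
  --fields-terminated-by '|' \
  --null-string '\\N' \
  --null-non-string '\\N'
```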

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important and are required by more and more enterprises when hiring. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Cloudera exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your CCA175 exam preparation and Cloudera certification application, do not hesitate to visit Vcedump.com to find your solutions.