Cloudera Certified Associate (CCA) CCA175 Questions & Answers
Question 11:
Problem Scenario 13 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.
1.
Create a table in retail_db with the following definition.
Step 4 : Now check whether the export has been done correctly or not. mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
select * from departments_export;
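For context, a minimal sqoop export sketch that populates departments_export from HDFS might look like the command below; the source directory is a placeholder, not taken from this scenario.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments_export \
--export-dir /user/cloudera/departments_data   # placeholder source directory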
Question 12:
Problem Scenario 38 : You have been given an RDD as below,
val rdd: RDD[Array[Byte]]
Now you have to save this RDD as a SequenceFile. And below is the code snippet.
import org.apache.hadoop.io.compress.GzipCodec
rdd.map(bytesArray => (A.get(), new B(bytesArray))).saveAsSequenceFile("/output/path", classOf[GzipCodec])
What would be the correct replacement for A and B in the above snippet?
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
A. NullWritable
B. BytesWritable
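Putting A and B in place, a self-contained sketch looks like this (the sample byte arrays and the output path are illustrative; note that the Scala API takes the codec as an Option):
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.io.compress.GzipCodec
// Illustrative RDD[Array[Byte]]; any byte-array RDD works here
val rdd = sc.parallelize(Seq("spark".getBytes, "hadoop".getBytes))
// NullWritable.get() supplies the (empty) key; BytesWritable wraps each byte array as the value
rdd.map(bytesArray => (NullWritable.get(), new BytesWritable(bytesArray)))
  .saveAsSequenceFile("/output/path", Some(classOf[GzipCodec]))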
Question 13:
Problem Scenario 43 : You have been given following code snippet.
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap {A =>
groupValues.map { value => B }
}
You need to generate the following output; hence replace A and B accordingly.
Array((1,two,3,4), (1,two,5,6))
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
A : case (key, groupValues)
B : (key._1, key._2, value._1, value._2)
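Assembled, the completed snippet runs as below (a sketch; the collect() call is added only to show the result):
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap { case (key, groupValues) =>
  groupValues.map { value => (key._1, key._2, value._1, value._2) }
}
flattened.collect()   // Array((1,two,3,4), (1,two,5,6))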
Question 14:
Problem Scenario 75 : You have been given a MySQL DB with the following details. user=retail_dba password=cloudera database=retail_db table=retail_db.orders table=retail_db.order_items jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following activities.
1.
Copy "retail_db.order_items" table to hdfs in respective directory p90_order_items .
2.
Compute the sum of the entire revenue in this table using pyspark.
3.
Find the maximum and minimum revenue as well.
4.
Calculate average revenue
Columns of the order_items table : (order_item_id , order_item_order_id ,
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Import the single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items --m 1
Note : Please check that there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.
Step 2 : Read the data from one of the partitions created using the above command.
hadoop fs -cat p90_order_items/part-m-00000
Step 3 : In pyspark, get the total revenue across all days and orders.
entireTableRDD = sc.textFile("p90_order_items")
#Cast string to float
extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(",")[4]))
Step 4 : Verify the extracted data, then sum the single revenue column with reduce.
for revenue in extractedRevenueColumn.collect(): print revenue
#Use the reduce function to sum a single column's values
totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)
Step 5 : Calculate the maximum revenue.
maximumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a >= b else b))
Step 6 : Calculate the minimum revenue.
minimumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a <= b else b))
Step 7 : Calculate the average revenue.
count = extractedRevenueColumn.count()
averageRev = totalRevenue / count
Question 15:
Problem Scenario 60 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : b.join(d).collect
join [Pair] : Performs an inner join using two key-value RDDs. Please note that the keys must be generally comparable to make this work.
keyBy : Constructs two-component tuples (key-value pairs) by applying a function to each data item. The result of the function becomes the key and the original data item becomes the value of the newly created tuples.
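Put together, the snippet with operation1 replaced by the join reads as the sketch below (the output shown is partial and unordered):
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
// Inner join on the word length used as the key
b.join(d).collect
// e.g. Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), ...)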
Question 16:
Problem Scenario 86 : In continuation of the previous question, please accomplish the following activities.
1.
Select Maximum, minimum, average , Standard Deviation, and total quantity.
2.
Select minimum and maximum price for each product code.
3.
Select Maximum, minimum, average, Standard Deviation, and total quantity for each product code; however, make sure Average and Standard Deviation have at most two decimal places.
4.
Select all the product codes and the average price, only where the product count is greater than or equal to 3.
5.
Select maximum, minimum , average and total of all the products for each code. Also produce the same across all the products.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
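These queries assume a products temporary table registered in the previous scenario. A minimal setup sketch, with assumed columns (code, name, quantity, price) and made-up sample rows, would be:
import sqlContext.implicits._
// Hypothetical sample data; the real table comes from the previous scenario
case class Product(code: String, name: String, quantity: Int, price: Double)
val products = sc.parallelize(Seq(
  Product("PEN", "Gel Pen", 10, 1.99),
  Product("PEN", "Ball Pen", 25, 0.99),
  Product("PEC", "Pencil 2B", 40, 0.49)
)).toDF()
products.registerTempTable("products")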
Step 1 : Select Maximum, minimum, average , Standard Deviation, and total quantity.
val results = sqlContext.sql("""SELECT MAX(price) AS MAX, MIN(price) AS MIN,
AVG(price) AS Average, STD(price) AS STD, SUM(quantity) AS total_products FROM
products""")
results.show()
Step 2 : Select minimum and maximum price for each product code.
val results = sqlContext.sql("""SELECT code, MAX(price) AS `Highest Price`, MIN(price)
AS `Lowest Price`
FROM products GROUP BY code""")
results.show()
Step 3 : Select Maximum, minimum, average, Standard Deviation, and total quantity for
each product code; however, make sure Average and Standard Deviation have at most
two decimal places.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`, CAST(STD(price) AS DECIMAL(7,2))
AS `Std Dev`, SUM(quantity) FROM products
GROUP BY code""")
results.show()
Step 4 : Select all the product code and average price only where product count is more
than or equal to 3.
val results = sqlContext.sql("""SELECT code AS `Product Code`,
COUNT(*) AS `Count`,
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average` FROM products GROUP BY code
HAVING Count >= 3""")
results.show()
Step 5 : Select maximum, minimum , average and total of all the products for each code.
Also produce the same across all the products.
val results = sqlContext.sql( """SELECT
code,
MAX(price),
MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`,
SUM(quantity) FROM products
GROUP BY code
WITH ROLLUP""" )
results.show()
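WITH ROLLUP adds one extra row (with a NULL code) holding the same aggregates computed across all products. For reference, a sketch of the roughly equivalent DataFrame-API call, assuming the products DataFrame from the setup sketch above:
import org.apache.spark.sql.functions._
// rollup produces the per-code rows plus a grand-total row where code is null
val rolled = products.rollup("code").agg(
  max("price"), min("price"),
  round(avg("price"), 2).alias("Average"),
  sum("quantity"))
rolled.show()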
Question 17:
Problem Scenario 92 : You have been given a spark scala application, which is bundled in
jar named hadoopexam.jar.
Your application class name is com.hadoopexam.MyTask
You want that, while submitting, your application should launch a driver on one of the
cluster nodes.
Please complete the following command to submit the application.
spark-submit XXX --master yarn \
YYY $SPARK_HOME/lib/hadoopexam.jar 10
Correct Answer: See the explanation for Step by Step Solution and configuration.
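Launching the driver on one of the cluster nodes means running in YARN cluster deploy mode, so a plausible completion of the command (a sketch, with XXX taken as the --class option and YYY as the --deploy-mode option) is:
spark-submit --class com.hadoopexam.MyTask \
--master yarn \
--deploy-mode cluster \
$SPARK_HOME/lib/hadoopexam.jar 10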
Problem Scenario 73 : You have been given data in json format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activity
1.
create employee.json file locally.
2.
Load this file on hdfs
3.
Register this data as a temp table in Spark using Python.
4.
Write select query and print this data.
5.
Now save back this selected data in json format.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create employee.json file locally.
vi employee.json (press insert) and paste the content.
Step 2 : Upload this file to hdfs (default location). hadoop fs -put employee.json
Step 3 : Write spark script
#Import SQLContext
from pyspark.sql import SQLContext
#Create instance of SQLContext
sqlContext = SQLContext(sc)
#Load json file
employee = sqlContext.jsonFile("employee.json")
#Register RDD as a temp table
employee.registerTempTable("EmployeeTab")
#Select data from Employee table
employeeInfo = sqlContext.sql("select * from EmployeeTab")
#Iterate data and print
for row in employeeInfo.collect():
print(row)
Step 4 : Write the data as a text file.
employeeInfo.toJSON().saveAsTextFile("employeeJson1")
Step 5 : Check whether the data has been created or not.
hadoop fs -cat employeeJson1/part*
Question 20:
Problem Scenario 10 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.
1. Create a database named hadoopexam and then create a table named departments in it, with the following fields: department_id int, department_name string
e.g. location should be hdfs://quickstart.cloudera:8020/user/hive/warehouse/hadoopexam.db/departments
2.
Please import data into the existing table created above from retail_db.departments into the hive table hadoopexam.departments.
3.
Please import data into a non-existing table, i.e. while importing, create a hive table named hadoopexam.departments_new.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Go to hive interface and create database.
hive
create database hadoopexam;
Step 2 : Use the database created in the above step and then create the table in it. use
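The Hive and Sqoop commands below are a hedged sketch for the three tasks, not the original step-by-step answer; flag choices and the staging layout are assumptions.
use hadoopexam;
create table departments (department_id int, department_name string);
# Task 2 (sketch): import into the existing Hive table
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments \
--hive-import --hive-table hadoopexam.departments --m 1
# Task 3 (sketch): import while creating a new Hive table; a distinct --target-dir may be
# needed if the default staging directory from the previous import still exists
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments \
--hive-import --create-hive-table --hive-table hadoopexam.departments_new --m 1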