Cloudera Certified Associate (CCA) CCA175 Questions & Answers
Question 11:
Problem Scenario 13 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.
1.
Create a table in retail_db with the following definition.
Step 4 : Now check whether the export has been done correctly or not. mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
select * from departments_export;
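For context, a minimal sqoop export sketch that populates departments_export from HDFS might look like the command below; the source directory is a placeholder, not taken from this scenario.
sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments_export \
--export-dir /user/cloudera/departments_data   # placeholder source directory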
Question 12:
Problem Scenario 38 : You have been given an RDD as below,
val rdd: RDD[Array[Byte]]
Now you have to save this RDD as a SequenceFile. And below is the code snippet.
import org.apache.hadoop.io.compress.GzipCodec
rdd.map(bytesArray => (A.get(), new B(bytesArray))).saveAsSequenceFile("/output/path", classOf[GzipCodec])
What would be the correct replacement for A and B in the above snippet?
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
A. NullWritable
B. BytesWritable
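Putting A and B in place, a self-contained sketch looks like this (the sample byte arrays and the output path are illustrative; note that the Scala API takes the codec as an Option):
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.hadoop.io.compress.GzipCodec
// Illustrative RDD[Array[Byte]]; any byte-array RDD works here
val rdd = sc.parallelize(Seq("spark".getBytes, "hadoop".getBytes))
// NullWritable.get() supplies the (empty) key; BytesWritable wraps each byte array as the value
rdd.map(bytesArray => (NullWritable.get(), new BytesWritable(bytesArray)))
  .saveAsSequenceFile("/output/path", Some(classOf[GzipCodec]))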
Question 13:
Problem Scenario 43 : You have been given following code snippet.
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap {A =>
groupValues.map { value => B }
}
You need to generate the following output; hence replace A and B accordingly.
Array((1,two,3,4), (1,two,5,6))
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
A : case (key, groupValues)
B : (key._1, key._2, value._1, value._2)
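Assembled, the completed snippet runs as below (a sketch; the collect() call is added only to show the result):
val grouped = sc.parallelize(Seq(((1, "two"), List((3, 4), (5, 6)))))
val flattened = grouped.flatMap { case (key, groupValues) =>
  groupValues.map { value => (key._1, key._2, value._1, value._2) }
}
flattened.collect()   // Array((1,two,3,4), (1,two,5,6))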
Question 14:
Problem Scenario 75 : You have been given a MySQL DB with the following details. user=retail_dba password=cloudera database=retail_db table=retail_db.orders table=retail_db.order_items jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following activities.
1.
Copy "retail_db.order_items" table to hdfs in respective directory p90_order_items .
2.
Compute the sum of the entire revenue in this table using pyspark.
3.
Find the maximum and minimum revenue as well.
4.
Calculate average revenue
Columns of the order_items table : (order_item_id , order_item_order_id ,
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Import the single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items --m 1
Note : Please check that there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.
Step 2 : Read the data from one of the partitions created using the above command.
hadoop fs -cat p90_order_items/part-m-00000
Step 3 : In pyspark, get the total revenue across all days and orders.
entireTableRDD = sc.textFile("p90_order_items")
#Cast string to float
extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(",")[4]))
Step 4 : Verify the extracted data, then sum the single revenue column with reduce.
for revenue in extractedRevenueColumn.collect(): print revenue
#Use the reduce function to sum a single column's values
totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)
Step 5 : Calculate the maximum revenue.
maximumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a >= b else b))
Step 6 : Calculate the minimum revenue.
minimumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a <= b else b))
Step 7 : Calculate the average revenue.
count = extractedRevenueColumn.count()
averageRev = totalRevenue / count
Question 15:
Problem Scenario 60 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : b.join(d).collect
join [Pair] : Performs an inner join using two key-value RDDs. Please note that the keys must be generally comparable to make this work.
keyBy : Constructs two-component tuples (key-value pairs) by applying a function to each data item. The result of the function becomes the key and the original data item becomes the value of the newly created tuples.
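Put together, the snippet with operation1 replaced by the join reads as the sketch below (the output shown is partial and unordered):
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
// Inner join on the word length used as the key
b.join(d).collect
// e.g. Array((6,(salmon,salmon)), (6,(salmon,rabbit)), (6,(salmon,turkey)), (3,(dog,dog)), (3,(dog,cat)), ...)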
Question 16:
Problem Scenario 86 : In continuation of the previous question, please accomplish the following activities.
1.
Select Maximum, minimum, average , Standard Deviation, and total quantity.
2.
Select minimum and maximum price for each product code.
3.
Select Maximum, minimum, average, Standard Deviation, and total quantity for each product code; however, make sure Average and Standard Deviation have at most two decimal places.
4.
Select all the product codes and the average price, only where the product count is greater than or equal to 3.
5.
Select maximum, minimum , average and total of all the products for each code. Also produce the same across all the products.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
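These queries assume a products temporary table registered in the previous scenario. A minimal setup sketch, with assumed columns (code, name, quantity, price) and made-up sample rows, would be:
import sqlContext.implicits._
// Hypothetical sample data; the real table comes from the previous scenario
case class Product(code: String, name: String, quantity: Int, price: Double)
val products = sc.parallelize(Seq(
  Product("PEN", "Gel Pen", 10, 1.99),
  Product("PEN", "Ball Pen", 25, 0.99),
  Product("PEC", "Pencil 2B", 40, 0.49)
)).toDF()
products.registerTempTable("products")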
Step 1 : Select Maximum, minimum, average , Standard Deviation, and total quantity.
val results = sqlContext.sql("""SELECT MAX(price) AS MAX, MIN(price) AS MIN,
AVG(price) AS Average, STD(price) AS STD, SUM(quantity) AS total_products FROM
products""")
results.show()
Step 2 : Select minimum and maximum price for each product code.
val results = sqlContext.sql("""SELECT code, MAX(price) AS `Highest Price`, MIN(price)
AS `Lowest Price`
FROM products GROUP BY code""")
results.show()
Step 3 : Select Maximum, minimum, average, Standard Deviation, and total quantity for
each product code; however, make sure Average and Standard Deviation have at most
two decimal places.
val results = sqlContext.sql("""SELECT code, MAX(price), MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`, CAST(STD(price) AS DECIMAL(7,2))
AS `Std Dev`, SUM(quantity) FROM products
GROUP BY code""")
results.show()
Step 4 : Select all the product code and average price only where product count is more
than or equal to 3.
val results = sqlContext.sql("""SELECT code AS `Product Code`,
COUNT(*) AS `Count`,
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average` FROM products GROUP BY code
HAVING Count >= 3""")
results.show()
Step 5 : Select maximum, minimum , average and total of all the products for each code.
Also produce the same across all the products.
val results = sqlContext.sql( """SELECT
code,
MAX(price),
MIN(price),
CAST(AVG(price) AS DECIMAL(7,2)) AS `Average`,
SUM(quantity) FROM products
GROUP BY code
WITH ROLLUP""" )
results.show()
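WITH ROLLUP adds one extra row (with a NULL code) holding the same aggregates computed across all products. For reference, a sketch of the roughly equivalent DataFrame-API call, assuming the products DataFrame from the setup sketch above:
import org.apache.spark.sql.functions._
// rollup produces the per-code rows plus a grand-total row where code is null
val rolled = products.rollup("code").agg(
  max("price"), min("price"),
  round(avg("price"), 2).alias("Average"),
  sum("quantity"))
rolled.show()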
Question 17:
Problem Scenario 92 : You have been given a spark scala application, which is bundled in
jar named hadoopexam.jar.
Your application class name is com.hadoopexam.MyTask
You want that, while submitting, your application should launch a driver on one of the
cluster nodes.
Please complete the following command to submit the application.
spark-submit XXX --master yarn \
YYY $SPARK_HOME/lib/hadoopexam.jar 10
Correct Answer: See the explanation for Step by Step Solution and configuration.
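Launching the driver on one of the cluster nodes means running in YARN cluster deploy mode, so a plausible completion of the command (a sketch, with XXX taken as the --class option and YYY as the --deploy-mode option) is:
spark-submit --class com.hadoopexam.MyTask \
--master yarn \
--deploy-mode cluster \
$SPARK_HOME/lib/hadoopexam.jar 10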
Problem Scenario 73 : You have been given data in json format as below.
{"first_name":"Ankit", "last_name":"Jain"}
{"first_name":"Amir", "last_name":"Khan"}
{"first_name":"Rajesh", "last_name":"Khanna"}
{"first_name":"Priynka", "last_name":"Chopra"}
{"first_name":"Kareena", "last_name":"Kapoor"}
{"first_name":"Lokesh", "last_name":"Yadav"}
Do the following activity
1.
create employee.json file locally.
2.
Load this file on hdfs
3.
Register this data as a temp table in Spark using Python.
4.
Write select query and print this data.
5.
Now save back this selected data in json format.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create employee.json file locally.
vi employee.json (press insert) and paste the content.
Step 2 : Upload this file to hdfs (default location). hadoop fs -put employee.json
Step 3 : Write spark script
#Import SQLContext
from pyspark.sql import SQLContext
#Create instance of SQLContext
sqlContext = SQLContext(sc)
#Load json file
employee = sqlContext.jsonFile("employee.json")
#Register RDD as a temp table
employee.registerTempTable("EmployeeTab")
#Select data from Employee table
employeeInfo = sqlContext.sql("select * from EmployeeTab")
#Iterate data and print
for row in employeeInfo.collect():
print(row)
Step 4 : Write the data as a text file.
employeeInfo.toJSON().saveAsTextFile("employeeJson1")
Step 5 : Check whether the data has been created or not.
hadoop fs -cat employeeJson1/part*
Question 20:
Problem Scenario 10 : You have been given the following MySQL database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Please accomplish the following.
1. Create a database named hadoopexam and then create a table named departments in it, with the following fields: department_id int, department_name string
e.g. location should be hdfs://quickstart.cloudera:8020/user/hive/warehouse/hadoopexam.db/departments
2.
Please import data into the existing table created above from retail_db.departments into the hive table hadoopexam.departments.
3.
Please import data into a non-existing table, i.e. while importing, create a hive table named hadoopexam.departments_new.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Go to hive interface and create database.
hive
create database hadoopexam;
Step 2 : Use the database created in the above step and then create the table in it. use
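The Hive and Sqoop commands below are a hedged sketch for the three tasks, not the original step-by-step answer; flag choices and the staging layout are assumptions.
use hadoopexam;
create table departments (department_id int, department_name string);
# Task 2 (sketch): import into the existing Hive table
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments \
--hive-import --hive-table hadoopexam.departments --m 1
# Task 3 (sketch): import while creating a new Hive table; a distinct --target-dir may be
# needed if the default staging directory from the previous import still exists
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba --password cloudera \
--table departments \
--hive-import --create-hive-table --hive-table hadoopexam.departments_new --m 1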