Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : b. countByValue
countByValue returns a map that contains each unique value of the RDD together with its occurrence count. (Warning: this operation will finally aggregate all the information in a single reducer.)
Listing variant: def countByValue(): Map[T, Long]
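For intuition only, the same unique-value counting can be mimicked outside Spark with a small shell pipeline. This is an analogy to what countByValue computes, not the Spark API itself:

```shell
# Count occurrences of each unique value, analogous to
# sc.parallelize(Seq("dog", "cat", "dog")).countByValue
printf 'dog\ncat\ndog\n' | sort | uniq -c
# prints (with leading spaces): 1 cat / 2 dog
```

Like countByValue, this collapses all values into a single place to count them, which is exactly why the warning above applies to large RDDs.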
Question 82:
Problem Scenario 11 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1.
Import the departments table into a directory called departments.
2.
Once the import is done, insert the following 5 records into the departments mysql table.
Insert into departments values(10, 'physics');
Insert into departments values(11, 'Chemistry');
Insert into departments values(12, 'Maths');
Insert into departments values(13, 'Science');
Insert into departments values(14, 'Engineering');
3.
Now import only the newly inserted records and append them to the existing directory created in the first step.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Clean already imported data. (In the real exam, please make sure you don't delete data you were not asked to remove.)
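The remaining steps can be sketched with standard Sqoop 1.x commands as below. This is a minimal sketch, not the graded answer: the jdbc URL and credentials come from the scenario, while the check column name (department_id) and the --last-value of 7 are assumptions that must be adjusted to the actual schema and to the highest id present after the first import.

```shell
# Hedged sketch of the sqoop steps; runs only where the sqoop CLI is installed.
CONNECT="jdbc:mysql://quickstart:3306/retail_db"

if command -v sqoop >/dev/null 2>&1; then
  # Step 1: remove any previously imported copy of the directory
  hdfs dfs -rm -R departments

  # Step 2: initial full import of the departments table
  sqoop import --connect "$CONNECT" \
    --username retail_dba --password cloudera \
    --table departments --target-dir departments

  # Step 3 (after the five mysql inserts): append only the new rows.
  # department_id and --last-value 7 are assumptions; use your real key
  # column and the highest value already imported.
  sqoop import --connect "$CONNECT" \
    --username retail_dba --password cloudera \
    --table departments --target-dir departments \
    --incremental append --check-column department_id --last-value 7
fi
```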
Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.
1.
Create a directory in hdfs named hdfs_commands.
2.
Create a file in hdfs named data.txt in hdfs_commands.
3.
Now copy this data.txt file to the local filesystem; while copying, please make sure the file properties (e.g. file permissions) are not changed.
4.
Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.
5.
Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.
6.
Create a file in the local filesystem named file1.txt and put it into hdfs.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the directory.
hdfs dfs -mkdir hdfs_commands
Step 2 : Create a file in hdfs named data.txt in hdfs_commands.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3 : Now copy this data.txt file to the local filesystem; the -p flag preserves file properties such as permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4 : Now create a file in the local directory named data_local.txt and move this file to hdfs in the hdfs_commands directory.
touch data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/
Step 5 : Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local file system.
hdfs dfs -touchz hdfs_commands/data_hdfs.txt
hdfs dfs -copyToLocal hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam
Step 6 : Create a file in the local filesystem named file1.txt and put it into hdfs.
touch file1.txt
hdfs dfs -put file1.txt hdfs_commands/
Use the netcat service on port 44444, and nc the above data line by line. Please do the following activities.
1.
Create a flume conf file using the fastest channel, which writes data into the hive warehouse directory, in a table called flumemaleemployee (create the hive table as well for the given data).
2.
While importing, make sure only male employee data is stored.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the hive table for flumemaleemployee.
CREATE TABLE flumemaleemployee (name string, salary int, sex string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2 : Create the flume configuration file with the below configuration for source, sink and channel, and save it as flume4.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
#Define interceptors
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = female
agent1.sources.source1.interceptors.i1.excludeEvents = true
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 properties.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3 : Run the below command, which will use this configuration file and append data in hdfs. Start the flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume4.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter the data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check whether the data is available in the hive table or not.
Step 7 : Stop the flume service by pressing ctrl+c.
Step 8 : Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or hue.
select avg(salary) from flumemaleemployee;
Question 87:
Problem Scenario 54 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"))
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Problem Scenario 22 : You have been given the below comma separated employee information.
name,salary,sex,age
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Use the netcat service on port 44444, and nc the above data line by line. Please do the following activities.
1.
Create a flume conf file using the fastest channel, which writes data into the hive warehouse directory, in a table called flumeemployee (create the hive table as well for the given data).
2.
Write a hive query to read average salary of all employees.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the hive table for flumeemployee.
CREATE TABLE flumeemployee (name string, salary int, sex string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2 : Create the flume configuration file with the below configuration for source, sink and channel, and save it as flume2.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 properties.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3 : Run the below command, which will use this configuration file and append data in hdfs. Start the flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume2.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter the data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check whether the data is available in the hive table or not.
Step 7 : Stop the flume service by pressing ctrl+c.
Step 8 : Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or hue.
select avg(salary) from flumeemployee;
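As a quick arithmetic check of Step 8, the same average can be computed from the six sample rows with awk. This mirrors the hive query on the sample data; it does not replace running it on the cluster:

```shell
# Average of the salary column (field 2) over the six sample records
printf '%s\n' \
  'alok,100000,male,29' \
  'jatin,105000,male,32' \
  'yogesh,134000,male,39' \
  'ragini,112000,female,35' \
  'jyotsana,129000,female,39' \
  'valmiki,123000,male,29' |
awk -F, '{ sum += $2; n++ } END { printf "%.2f\n", sum / n }'
# prints 117166.67
```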
Question 89:
Problem Scenario 18 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Now accomplish following activities.
1.
Create mysql table as below.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
2.
Now export data from the hive table departments_hive01 into departments_hive02. While
exporting, please note the following: wherever there is an empty string, it should be loaded as a null value in
mysql, and wherever there is a -999 value in an int field, it should also be loaded as a null value.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create table in mysql db as well.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
Step 2 : Now export data from hive table to mysql table as per the requirement.
Step 3 : Now validate the data.
select * from departments_hive02;
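Step 2 above can be sketched with a standard sqoop export, where the --input-null-string and --input-null-non-string flags map empty strings and -999 values to NULL in mysql. The warehouse path and the \001 field delimiter are assumptions based on hive defaults, so verify them on your cluster:

```shell
# Hedged sketch of the export; runs only where the sqoop CLI is installed.
EXPORT_DIR="/user/hive/warehouse/departments_hive01"   # assumed default warehouse path

if command -v sqoop >/dev/null 2>&1; then
  sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba --password cloudera \
    --table departments_hive02 \
    --export-dir "$EXPORT_DIR" \
    --input-fields-terminated-by '\001' \
    --input-null-string "" \
    --input-null-non-string -999
fi
```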
Question 90:
Problem Scenario 69 : Write a Spark application using Python which reads a file "Content.txt" (on hdfs) with the following content, filters out the words that are less than 2 characters, and ignores all empty lines. Once done, store the filtered data in a directory called "problem84" (on hdfs).
Content.txt
Hello this is ABCTECH.com
This is ABYTECH.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create a configuration object and set the app name
conf = SparkConf().setAppName("CCA 175 Problem 84")
sc = SparkContext(conf=conf)
# Read the file, drop empty lines, keep only words of at least 2 characters
content = sc.textFile("Content.txt")
words = content.filter(lambda line: line.strip() != "").flatMap(lambda line: line.split(" ")).filter(lambda word: len(word) >= 2)
words.saveAsTextFile("problem84")
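The application would then be submitted to the cluster with spark-submit. A minimal sketch, assuming problem84.py sits in the current directory and a default Spark configuration:

```shell
# Hedged sketch; runs only where spark-submit is installed.
APP="problem84.py"   # path to the script created in Step 1

if command -v spark-submit >/dev/null 2>&1; then
  spark-submit "$APP"
fi
```

After it finishes, the filtered words can be inspected with hdfs dfs -cat problem84/part-*.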