Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : b. countByValue
countByValue returns a map that contains each unique value of the RDD together with its occurrence count. (Warning: this operation will finally aggregate all the information in a single reducer.)
Listing variant: def countByValue(): Map[T, Long]
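For intuition only, the same unique-value counting can be mimicked outside Spark with a small shell pipeline. This is an analogy to what countByValue computes, not the Spark API itself:

```shell
# Count occurrences of each unique value, analogous to
# sc.parallelize(Seq("dog", "cat", "dog")).countByValue
printf 'dog\ncat\ndog\n' | sort | uniq -c
# prints (with leading spaces): 1 cat / 2 dog
```

Like countByValue, this collapses all values into a single place to count them, which is exactly why the warning above applies to large RDDs.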
Question 82:
Problem Scenario 11 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1.
Import the departments table into a directory called departments.
2.
Once the import is done, insert the following 5 records into the departments mysql table.
Insert into departments values(10, 'physics');
Insert into departments values(11, 'Chemistry');
Insert into departments values(12, 'Maths');
Insert into departments values(13, 'Science');
Insert into departments values(14, 'Engineering');
3.
Now import only the newly inserted records and append them to the existing directory created in the first step.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Clean already imported data. (In the real exam, please make sure you don't delete data you were not asked to remove.)
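The remaining steps can be sketched with standard Sqoop 1.x commands as below. This is a minimal sketch, not the graded answer: the jdbc URL and credentials come from the scenario, while the check column name (department_id) and the --last-value of 7 are assumptions that must be adjusted to the actual schema and to the highest id present after the first import.

```shell
# Hedged sketch of the sqoop steps; runs only where the sqoop CLI is installed.
CONNECT="jdbc:mysql://quickstart:3306/retail_db"

if command -v sqoop >/dev/null 2>&1; then
  # Step 1: remove any previously imported copy of the directory
  hdfs dfs -rm -R departments

  # Step 2: initial full import of the departments table
  sqoop import --connect "$CONNECT" \
    --username retail_dba --password cloudera \
    --table departments --target-dir departments

  # Step 3 (after the five mysql inserts): append only the new rows.
  # department_id and --last-value 7 are assumptions; use your real key
  # column and the highest value already imported.
  sqoop import --connect "$CONNECT" \
    --username retail_dba --password cloudera \
    --table departments --target-dir departments \
    --incremental append --check-column department_id --last-value 7
fi
```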
Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.
1.
Create a directory in hdfs named hdfs_commands.
2.
Create a file in hdfs named data.txt in hdfs_commands.
3.
Now copy this data.txt file to the local filesystem; while copying, please make sure the file properties (e.g. file permissions) are not changed.
4.
Now create a file in local directory named data_local.txt and move this file to hdfs in hdfs_commands directory.
5.
Create a file data_hdfs.txt in hdfs_commands directory and copy it to local file system.
6.
Create a file in the local filesystem named file1.txt and put it into hdfs.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the directory.
hdfs dfs -mkdir hdfs_commands
Step 2 : Create a file in hdfs named data.txt in hdfs_commands.
hdfs dfs -touchz hdfs_commands/data.txt
Step 3 : Now copy this data.txt file to the local filesystem; the -p flag preserves file properties such as permissions.
hdfs dfs -copyToLocal -p hdfs_commands/data.txt /home/cloudera/Desktop/HadoopExam
Step 4 : Now create a file in the local directory named data_local.txt and move this file to hdfs in the hdfs_commands directory.
touch data_local.txt
hdfs dfs -moveFromLocal /home/cloudera/Desktop/HadoopExam/data_local.txt hdfs_commands/
Step 5 : Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local file system.
hdfs dfs -touchz hdfs_commands/data_hdfs.txt
hdfs dfs -copyToLocal hdfs_commands/data_hdfs.txt /home/cloudera/Desktop/HadoopExam
Step 6 : Create a file in the local filesystem named file1.txt and put it into hdfs.
touch file1.txt
hdfs dfs -put file1.txt hdfs_commands/
Use the netcat service on port 44444, and nc the above data line by line. Please do the following activities.
1.
Create a flume conf file using the fastest channel, which writes data into the hive warehouse directory, in a table called flumemaleemployee (create the hive table as well for the given data).
2.
While importing, make sure only male employee data is stored.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the hive table for flumemaleemployee.
CREATE TABLE flumemaleemployee (name string, salary int, sex string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2 : Create the flume configuration file with the below configuration for source, sink and channel, and save it as flume4.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
#Define interceptors
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = female
agent1.sources.source1.interceptors.i1.excludeEvents = true
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 properties.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3 : Run the below command, which will use this configuration file and append data in hdfs. Start the flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume4.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter the data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check whether the data is available in the hive table or not.
Step 7 : Stop the flume service by pressing ctrl+c.
Step 8 : Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or hue.
select avg(salary) from flumemaleemployee;
Question 87:
Problem Scenario 54 : You have been given below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"))
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Problem Scenario 22 : You have been given the below comma separated employee information.
name,salary,sex,age
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Use the netcat service on port 44444, and nc the above data line by line. Please do the following activities.
1.
Create a flume conf file using the fastest channel, which writes data into the hive warehouse directory, in a table called flumeemployee (create the hive table as well for the given data).
2.
Write a hive query to read average salary of all employees.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create the hive table for flumeemployee.
CREATE TABLE flumeemployee (name string, salary int, sex string, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2 : Create the flume configuration file with the below configuration for source, sink and channel, and save it as flume2.conf.
#Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumeemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 properties.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3 : Run the below command, which will use this configuration file and append data in hdfs. Start the flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume2.conf --name agent1
Step 4 : Open another terminal and use the netcat service.
nc localhost 44444
Step 5 : Enter the data line by line.
alok,100000,male,29
jatin,105000,male,32
yogesh,134000,male,39
ragini,112000,female,35
jyotsana,129000,female,39
valmiki,123000,male,29
Step 6 : Open hue and check whether the data is available in the hive table or not.
Step 7 : Stop the flume service by pressing ctrl+c.
Step 8 : Calculate the average salary on the hive table using the below query. You can use either the hive command line tool or hue.
select avg(salary) from flumeemployee;
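As a quick arithmetic check of Step 8, the same average can be computed from the six sample rows with awk. This mirrors the hive query on the sample data; it does not replace running it on the cluster:

```shell
# Average of the salary column (field 2) over the six sample records
printf '%s\n' \
  'alok,100000,male,29' \
  'jatin,105000,male,32' \
  'yogesh,134000,male,39' \
  'ragini,112000,female,35' \
  'jyotsana,129000,female,39' \
  'valmiki,123000,male,29' |
awk -F, '{ sum += $2; n++ } END { printf "%.2f\n", sum / n }'
# prints 117166.67
```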
Question 89:
Problem Scenario 18 : You have been given following mysql database details as well as other info. user=retail_dba password=cloudera database=retail_db jdbc URL = jdbc:mysql://quickstart:3306/retail_db Now accomplish following activities.
1.
Create mysql table as below.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
2.
Now export data from the hive table departments_hive01 into departments_hive02. While
exporting, please note the following: wherever there is an empty string, it should be loaded as a null value in
mysql, and wherever there is a -999 value in an int field, it should also be loaded as a null value.
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create table in mysql db as well.
mysql --user=retail_dba --password=cloudera
use retail_db
CREATE TABLE IF NOT EXISTS departments_hive02(id int, department_name
varchar(45), avg_salary int);
show tables;
Step 2 : Now export data from hive table to mysql table as per the requirement.
Step 3 : Now validate the data.
select * from departments_hive02;
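Step 2 above can be sketched with a standard sqoop export, where the --input-null-string and --input-null-non-string flags map empty strings and -999 values to NULL in mysql. The warehouse path and the \001 field delimiter are assumptions based on hive defaults, so verify them on your cluster:

```shell
# Hedged sketch of the export; runs only where the sqoop CLI is installed.
EXPORT_DIR="/user/hive/warehouse/departments_hive01"   # assumed default warehouse path

if command -v sqoop >/dev/null 2>&1; then
  sqoop export --connect jdbc:mysql://quickstart:3306/retail_db \
    --username retail_dba --password cloudera \
    --table departments_hive02 \
    --export-dir "$EXPORT_DIR" \
    --input-fields-terminated-by '\001' \
    --input-null-string "" \
    --input-null-non-string -999
fi
```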
Question 90:
Problem Scenario 69 : Write a Spark application using Python which reads a file "Content.txt" (on hdfs) with the following content, filters out the words that are less than 2 characters, and ignores all empty lines. Once done, store the filtered data in a directory called "problem84" (on hdfs).
Content.txt
Hello this is ABCTECH.com
This is ABYTECH.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
Correct Answer: See the explanation for Step by Step Solution and configuration.
Solution : Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create a configuration object and set the app name
conf = SparkConf().setAppName("CCA 175 Problem 84")
sc = SparkContext(conf=conf)
# Read the file, drop empty lines, keep only words of at least 2 characters
content = sc.textFile("Content.txt")
words = content.filter(lambda line: line.strip() != "").flatMap(lambda line: line.split(" ")).filter(lambda word: len(word) >= 2)
words.saveAsTextFile("problem84")
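The application would then be submitted to the cluster with spark-submit. A minimal sketch, assuming problem84.py sits in the current directory and a default Spark configuration:

```shell
# Hedged sketch; runs only where spark-submit is installed.
APP="problem84.py"   # path to the script created in Step 1

if command -v spark-submit >/dev/null 2>&1; then
  spark-submit "$APP"
fi
```

After it finishes, the filtered words can be inspected with hdfs dfs -cat problem84/part-*.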