Big Data Quiz 2 Answers
I am not sure all the answers are correct, but you will definitely pass. Follow the questions below.
Which library supports real-time streaming?
Ans: Spark Streaming.
Choose the function which is not a transformation:
1. sample
2. reduce
3. cogroup
4. union
Which of these is not related to the Spark ecosystem?
Ans: Spark CTL
Which is the graph library for Spark?
1. Network
2. None of the above
3. NetworkX
4. GraphX
Which of the following statements is true about lazy evaluation?
Ans: It is used in situations where it is not mandatory to execute a bunch of operations immediately.
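Spark's lazy evaluation can be mimicked in plain Python (this is only a sketch, not Spark itself): a generator expression stands in for a lazy transformation, and sum() stands in for an action that finally forces the work.

```python
# Record when the computation actually runs.
log = []

def square(x):
    log.append(x)        # side effect so we can see when work happens
    return x * x

squared = (square(x) for x in range(4))  # like rdd.map(square): nothing runs yet
assert log == []                         # still no work done

total = sum(squared)                     # like an action (e.g. reduce): forces execution
# total is 0 + 1 + 4 + 9 = 14, and log now shows the deferred work ran
```

The transformation is only described when built; the work happens at the terminal call, which is exactly the transformation/action split the quiz asks about.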
What are the different memory storage levels?
1. MEMORY_ONLY_SER
2. MEMORY_AND_DISK
3. All of the above
4. MEMORY_ONLY
Can a worker node have more than one worker?
Ans: True
What are the different languages supported by Spark for big data applications?
1. Java
2. Scala
3. R
4. Python
5. All of the above
What are the characteristics of an RDD?
1. Immutable
2. Partitioned
3. Resilient
4. All of the above
Choose the Spark machine learning library from below:
1. MLlib
2. Mahout
3. BlinkDB
4. GraphX
Memory management, monitoring jobs, fault tolerance, job scheduling, and interaction with storage systems are handled by:
Ans: Spark Engine
What are the operations provided by RDDs?
1. Action
2. Both 1 & 3
3. Transformation
4. None of the above
Which are the common Spark ecosystems?
1. Spark SQL
2. Spark Streaming
3. BlinkDB
4. All of the above
Is the reduce function an action?
1. Yes
2. No
3. Don't know
What are the responsibilities of the Spark Engine?
1. Distributing
2. Scheduling
3. Monitoring
4. All of the above
Is a transformation executed before an action follows?
1. No
2. Yes
3. Don't know
Say I have a list of numbers in an RDD (say myrdd) and I want to compute the average:

def myAvg(x, y):
    return (x + y) / 2.0

avg = myrdd.reduce(myAvg)

Is there something wrong with it?
1. Yes
2. No
3. Don't know
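The answer is "yes": reduce requires an associative and commutative function, and pairwise averaging is neither, so the result depends on fold order and is not the true mean. A minimal sketch using plain Python's functools.reduce to stand in for the RDD's reduce:

```python
from functools import reduce

def my_avg(x, y):
    # The pairwise "average" from the question.
    return (x + y) / 2.0

nums = [1, 2, 3, 4]
wrong = reduce(my_avg, nums)       # folds as my_avg(my_avg(my_avg(1,2),3),4) = 3.125
correct = sum(nums) / len(nums)    # the true mean: 2.5
# wrong != correct: because my_avg is not associative, reduce's
# pairwise folding over-weights later elements instead of computing the mean.
```

On a real cluster the error would be worse still, since the fold order also depends on how the data is partitioned.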
Choose the function which is not an action:
1. reduce
2. take
3. collect
4. pipe
Which of the following are common Spark ecosystems?
1. MLlib
2. Mahout
3. Spark SQL
4. Both 1 & 3
Which of these is not an identifier in Scala?
1. Numeric identifiers
2. Alphanumeric identifiers
3. Operator identifiers
4. Literal identifiers
5. Mixed identifiers
The immutability feature of Scala helps with:
1. Equality issues
2. Sequential programs
3. Concurrent programs
4. Non-equality issues
5. Both 2 & 3
What is the easiest way to format a string?
1. Call .format()
2. All of the above
3. Call .arrange()
4. Call .formatString()
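In Scala the answer is calling .format on the string, e.g. "Hello, %s".format("Spark"). A rough Python analogue, shown here only as an illustrative sketch, is str.format:

```python
# Scala: "Hello, %s".format("Spark")
# Python analogue using str.format (values here are made up for illustration):
greeting = "Hello, {}! Partitions: {}".format("Spark", 4)
# greeting == "Hello, Spark! Partitions: 4"
```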
Which cluster managers can be used with Spark?
1. Mesos
2. YARN
3. Spark standalone
4. All of the above
Which one of the following is not a stateless transformation?
1. join
2. map
3. reduceByKey
4. filter
What is the default port for the Mesos web UI?
1. 8080
2. 8088
3. 5050
4. None of the above
When you call a join operation on two pair RDDs, e.g. (K, V) and (K, W), what is the result?
1. (K, (V+W)) pairs with all pairs of elements for each key
2. (K, (V, W)) pairs with all pairs of elements for each key
3. (K, (V-W)) pairs with all pairs of elements for each key
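The join semantics (option 2) can be sketched in plain Python on two lists of key/value pairs; this is only a stand-in for rdd1.join(rdd2), no Spark required:

```python
rdd1 = [("a", 1), ("b", 2), ("a", 3)]
rdd2 = [("a", 10), ("b", 20)]

def join(left, right):
    # For each key present on both sides, emit (k, (v, w)) for every
    # combination of a left value v and a right value w.
    return [(k, (v, w)) for k, v in left for k2, w in right if k == k2]

result = join(rdd1, rdd2)
# [('a', (1, 10)), ('b', (2, 20)), ('a', (3, 10))]
```

Note key 'a' appears twice in the output: every left value pairs with every matching right value, exactly the "(K, (V, W)) pairs with all pairs of elements for each key" wording in the answer.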
Which operations are used by wide transformations?
1. groupByKey
2. map()
3. reduceByKey
4. Both 1 & 3
Which one of the following commands is not sent by the driver program to the executors?
1. foreach
2. task
3. filter
4. map
Which of the following is not an example of creating an RDD using the SparkContext?
1. sc.parallelize(0 to 100)
2. sc.broadcast("hello")
3. sc.textFile("README.md")
4. Using sc.newAPIHadoopFile
Which Spark-related library allows reliable file sharing at memory speed across different cluster frameworks?
1. Tachyon
2. ByKey
3. map
4. reduceByKey()
Which of the following is not an operator in Spark?
1. map()
2. mapred()
3. filter()
4. reduceByKey()
5. groupByKey()
Which one of the following is not considered a block store?
1. RAM
2. memory
3. disk
4. off-heap
Point out the error in the following code:

val conf = new SparkConf()
  .setMaster("local[1]")
  .setAppName("CountingSheep")
val sc = new SparkContext(conf)

1. The SparkContext should be set first
2. setMaster("local[1]") should be replaced by set("local[1]")
3. "local[1]" means there is no parallelism
4. There is no error
Which of the following data sources can Spark not process?
1. HDFS
2. Cassandra
3. HBase
4. MySQL
The DAGScheduler uses an event queue architecture. True/False
1. True
2. False
What is a task with regard to Spark job execution?
1. A task can also be considered a stage on a partition in a given job attempt
2. A task belongs to a single stage and operates on a single partition (part of an RDD)
3. Tasks are spawned one by one for each stage and data partition
4. All of the above
A sliding window controls the transmission of data packets between various computer networks. True/False
Ans: True
Shuffling changes the number of partitions. True/False
Ans: False
How can you use the scikit-learn machine learning library, which is written in Python, with the Spark engine?
1. It can be used as a pipeline API
2. Using Spark MLlib
3. Using pipe()
4. All of the above
Which of the following is not a transformation operator on RDDs?
1. flatMap
2. reduceByKey
3. fork
4. cogroup
What is the advantage of the Parquet file format?
1. Limits I/O operations
2. Consumes less space
3. Fetches only required columns
4. All of the above
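Why a columnar format like Parquet fetches only required columns can be sketched in plain Python (this is an illustration of the layout idea, not Parquet itself; the records are made up):

```python
# Row layout: each record stores all fields together.
rows = [
    {"id": 1, "name": "ann", "score": 0.5},
    {"id": 2, "name": "bob", "score": 0.9},
]

# Columnar layout: the same data, one list per column, as Parquet
# arranges values on disk.
columns = {
    "id": [1, 2],
    "name": ["ann", "bob"],
    "score": [0.5, 0.9],
}

# Row layout: every record must be touched even if only "score" is needed.
scores_from_rows = [r["score"] for r in rows]
# Columnar layout: only the "score" column is read.
scores_from_columns = columns["score"]
```

Reading one column from the columnar layout skips the other columns entirely, which is what limits I/O and lets column-wise compression save space.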
The partitions of an RDD can be controlled using which of these operations?
1. repartition
2. partition
3. coalesce
4. Both 1 & 3
Which of the following is not a component of YARN?
1. Resource Manager
2. Application Master
3. Name Node
4. Name Node container
Which of the following methods do pair RDDs use?
1. map
2. reduceByKey
3. join
4. Both 2 & 3
What is the effect of setting spark.driver.allowMultipleContexts to true?
1. You can create multiple SparkContexts in a single JVM
2. Spark will log a warning instead of throwing an exception
3. You cannot set the property to true
4. Both 1 & 2
Which of the following are high availability schemes?
1. Standby masters with ZooKeeper
2. Standby masters in HDFS
3. Single-node recovery with the local file system
4. Both 1 & 3
How will you start the Mesos shuffle service?
1. /bin/start-mesos-shuffle-service.sh
2. /bin/mesos-shuffle-service.sh
3. sbin/start-mesos-shuffle-service.sh
4. /sbin/start-mesos-shuffle-service.sh
Which of the following operations do RDDs support?
1. Transformation
2. Action
3. Addition
4. Both 1 and 2
What is the purpose of the driver in the Spark architecture?
1. The driver splits the Spark application into tasks and schedules them to run on executors
2. A driver is where the task scheduler lives and spawns tasks across workers
3. A driver coordinates workers and the overall execution of tasks
How do you minimize data transfers when working with Spark?
1. Using broadcast variables
2. Using accumulators
3. Avoiding operations which trigger shuffles
4. All of the above
What is the use of Akka in Spark?
1. Combiner
2. Reducer
3. Scheduler
4. Mapper
Which one of the following does MLlib not provide?
1. Classification
2. Association
3. Regression
4. Clustering
Which one of the following is not a property of an RDD?
1. Preferred locations
2. Partitioner
3. Compute
4. Combiner
What is the level of security in Spark?
1. Matured
2. Infancy
3. Evolving
4. No security
Apache Spark is a framework with?
1. Scheduling
2. Monitoring
3. Distributing applications
4. All of the above
In Spark, data is represented as?
1. Blocks
2. Chunks
3. RDDs
4. None of the above
What are sparse vectors?
Ans: A sparse vector is a vector having a relatively small number of nonzero elements.
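A minimal plain-Python sketch of the idea: store only the nonzero entries as index -> value (Spark MLlib's SparseVector similarly keeps the size plus parallel index and value arrays). The vectors here are made up for illustration.

```python
dense = [0.0, 0.0, 3.0, 0.0, 0.0, 5.0]

# Keep only the nonzero entries.
sparse = {i: v for i, v in enumerate(dense) if v != 0.0}
# {2: 3.0, 5: 5.0} -- two stored entries instead of six

def sparse_dot(a, b):
    # Dot product touching only indices that are nonzero in both vectors.
    return sum(v * b[i] for i, v in a.items() if i in b)
```

The payoff is that both storage and arithmetic scale with the number of nonzeros, not the full dimension, which matters for the very wide feature vectors common in machine learning.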
An adaptive optimization framework that builds and maintains a set of multi-dimensional samples from original data over time is called:
1. Spark
2. Shark
3. BlinkDB
4. MapR
What is GraphX?
1. Library
2. Class
3. Object
4. File
The collectAsMap() function collects the result as a map to provide easy lookup. True/False
1. True
2. False
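The statement is true, and the semantics can be sketched in plain Python: a list of (key, value) pairs becomes a dict for easy lookup, with later values for a duplicate key winning (no Spark required; the pairs are illustrative):

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]

# Like pairRdd.collectAsMap(): pairs -> map, duplicate key keeps the last value.
as_map = dict(pairs)
# {'a': 3, 'b': 2}

lookup = as_map["b"]   # constant-time lookup by key
```

Because the whole result is pulled to the driver as one map, this is only appropriate for small pair RDDs.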
Which of the following is not a characteristic shared by Hadoop and Spark?
1. Both are data processing platforms
2. Both are cluster computing environments
3. Both have their own file system
4. Both use open-source APIs to link between different tools