Big Data Hadoop Ecosystem answers.
What is parquet?
Apache Parquet is an open-source, column-oriented data file format used for data storage and retrieval.
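For illustration, here is a minimal PySpark sketch of writing and reading a Parquet file; the path, column names, and sample rows are made up, and a local Spark installation is assumed.

```python
from pyspark.sql import SparkSession

# Minimal sketch: write a DataFrame as Parquet, then read selected columns back.
# The path "/tmp/people.parquet" and the sample rows are illustrative only.
spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.write.mode("overwrite").parquet("/tmp/people.parquet")

# Because Parquet is columnar, selecting a subset of columns only reads
# the data for those columns (column pruning).
spark.read.parquet("/tmp/people.parquet").select("name").show()
```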
What are the properties of an RDD?
An RDD is an immutable (read-only) collection of objects computed across different nodes of a cluster. It can be recreated or retrieved at any time, which makes caching, sharing, and replication possible.
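A rough PySpark sketch of these properties (assuming a local Spark session; the data is made up): transformations never modify an existing RDD, and cache() reuses computed partitions while the retained lineage allows lost partitions to be rebuilt.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# An RDD: an immutable, partitioned collection computed across cluster nodes.
rdd = sc.parallelize(range(10), numSlices=4)

# Transformations return a new RDD; the original is never modified (read-only).
squares = rdd.map(lambda x: x * x)

# cache() keeps computed partitions in memory for reuse; the lineage is kept,
# so lost partitions can be recomputed on another node.
squares.cache()
print(squares.sum())
```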
Which cluster managers can be used with Spark?
Standalone, Hadoop YARN, Apache Mesos, and Kubernetes.
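The cluster manager is usually selected through the master URL passed to spark-submit or the session builder. A minimal sketch; the host names and ports below are placeholders, only the URL formats are standard.

```python
from pyspark.sql import SparkSession

# Sketch of selecting a cluster manager via the master URL (hosts/ports are placeholders):
#   local[*]                          - run locally, no cluster manager
#   spark://master-host:7077          - standalone cluster manager
#   yarn                              - Hadoop YARN
#   mesos://mesos-host:5050           - Apache Mesos
#   k8s://https://k8s-apiserver:443   - Kubernetes
spark = (SparkSession.builder
         .appName("cluster-manager-demo")
         .master("local[*]")   # swap for one of the URLs above on a real cluster
         .getOrCreate())
```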
The sliding window protocol controls the transmission of data packets between computer networks. It regulates data packets between two devices where reliable delivery of data frames is needed, and it is also used in the TCP protocol.
Which methods are used with pair RDDs in Spark?
Key-value pair methods: groupByKey(), reduceByKey(), countByKey(), join().
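A short PySpark sketch of these pair-RDD operations (the sample data is made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pair-rdd-demo").getOrCreate()
sc = spark.sparkContext

sales = sc.parallelize([("apples", 3), ("pears", 2), ("apples", 5)])
prices = sc.parallelize([("apples", 0.5), ("pears", 0.8)])

print(sales.reduceByKey(lambda a, b: a + b).collect())  # total quantity per key
print(sales.groupByKey().mapValues(list).collect())     # all values grouped per key
print(dict(sales.countByKey()))                         # number of records per key
print(sales.join(prices).collect())                     # (key, (quantity, price)) pairs
```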
Which architecture does the DAG scheduler use?
Event Queue Architecture
Big data cannot be analyzed with traditional tools such as spreadsheets or an RDBMS.
In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file, regardless of how many blocks the input file occupies?
Write a custom MapRunner that iterates over all key-value pairs in the entire file.
What is reduce - side join?
Reduce-side join is a technique for merging data from different sources based on a specific key.
What is map - side join?
A map-side join is performed in the map phase and is done in memory.
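The same distinction carries over to Spark, which can serve as a quick illustration: an ordinary join shuffles both sides (reduce-side style), while broadcasting the small table lets each task join in memory (map-side style). This is a sketch with made-up data, not the MapReduce implementation itself.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("join-demo").getOrCreate()

orders = spark.createDataFrame([(1, "apples"), (2, "pears")], ["user_id", "item"])
users = spark.createDataFrame([(1, "alice"), (2, "bob")], ["user_id", "name"])

# Shuffle join (reduce-side style): both sides are repartitioned by the join key.
orders.join(users, "user_id").show()

# Broadcast join (map-side style): the small table is shipped to every task
# and joined in memory, avoiding a shuffle of the large side.
orders.join(broadcast(users), "user_id").show()
```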
How can you use binary data in MapReduce?
Binary data can be used directly by a MapReduce job. Often, binary data is added to a sequence file.
What are map files and why are they important?
Map files are sorted sequence files that also have an index. The index allows fast data lookup.
What are sequence files and why are they important?
Sequence files are binary-format files that can be compressed.
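As a small sketch (paths, records, and the codec choice are illustrative), PySpark can write and read key-value pairs as a SequenceFile, optionally with compression:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("seqfile-demo").getOrCreate()
sc = spark.sparkContext

# Write key-value pairs as a compressed SequenceFile (path and codec are examples).
pairs = sc.parallelize([("k1", "v1"), ("k2", "v2")])
pairs.saveAsSequenceFile("/tmp/pairs.seq",
                         "org.apache.hadoop.io.compress.GzipCodec")

# Read the SequenceFile back as an RDD of (key, value) pairs.
print(sc.sequenceFile("/tmp/pairs.seq").collect())
```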
What are the supported programming languages for MapReduce?
The most common programming language is Java, but scripting languages are also supported via Hadoop streaming.
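A minimal Hadoop Streaming word-count sketch in Python; the mapper and reducer would normally be two separate scripts passed via -mapper and -reducer, and the file name here is illustrative.

```python
#!/usr/bin/env python3
# streaming_wordcount.py (illustrative name): word-count mapper and reducer
# for Hadoop Streaming, combined in one file for brevity.
import sys
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word; streaming feeds input lines on stdin.
    for line in lines:
        for word in line.split():
            sys.stdout.write(f"{word}\t1\n")

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so consecutive lines
    # with the same word can be summed with groupby.
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        sys.stdout.write(f"{word}\t{total}\n")

if __name__ == "__main__":
    # Run as "streaming_wordcount.py reduce" for the reduce side, else map side.
    reducer(sys.stdin) if sys.argv[1:] == ["reduce"] else mapper(sys.stdin)
```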
How many methods does the Writable interface define?
Two
Which method of the FileSystem object is used for reading a file in HDFS?
open()
RPC means
Remote procedure call
The switch given to “hadoop fs” command for detailed help is
-help
The default block size in HDFS is
64 MB
Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution?
Split
What is the input to the Reduce function?
One key and a list of all values associated with that key.
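A plain-Python sketch of that contract, outside Hadoop entirely: the framework groups the mappers' (key, value) pairs so each reduce call receives one key together with the list of all values emitted for it.

```python
from collections import defaultdict

# Simulated mapper output; in a real job this comes from many map tasks.
mapped = [("apples", 1), ("pears", 1), ("apples", 1)]

# The shuffle/sort phase groups values by key ...
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# ... and the reduce function is called once per key with all of its values.
def reduce_fn(key, values):
    return key, sum(values)

print([reduce_fn(k, vs) for k, vs in groups.items()])
# [('apples', 2), ('pears', 1)]
```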
How can a distributed filesystem such as HDFS provide opportunities for optimization of a MapReduce operation?
A distributed filesystem makes random access faster because of the presence of a dedicated node serving file metadata.
Which of the following MapReduce execution frameworks focuses on execution in shared-memory environments?
Phoenix
What is the implementation language of the Hadoop MapReduce framework?
Java
The Combine stage, if present, must perform the same aggregation operation as Reduce.
False
Which of the following statements most accurately describes the general approach to error recovery when using MapReduce?
Ranger
Which MapReduce stage serves as a barrier, where all previous stages must be completed before it may proceed?
Combine
While SequenceFile stores each record as a key-value pair, the Avro system stores records as
D - schema and data
What is the default value used by sqoop when it encounters a missing value while importing from a CSV file?
B - null
Q 1 - If there is already a target directory with the same name as the table being imported then
A - The directory gets deleted and recreated.
B - The sqoop job fails
C - Another directory under the existing directory gets created.
D - The existing directory gets renamed
Q 2 - What does the --last-value parameter in sqoop incremental import signify?
A - What is the number of rows successfully imported in append type import
C - Both of the above
D - The count of the number of rows that were successful in the current import.
Q 3 - The argument in a saved sqoop job can be altered at run time by using the option
C - --exec
D - --changeparam
Q 4 - While inserting data into Relational system from Hadoop using sqoop, the various table constraints present in the relational table must be
D - Not violated
Q 5 - The insert query used to insert exported data into tables is generated by
A - Sqoop command and processed as such
B - Sqoop command and modified suitably by JDBC driver
D - Database specific driver
Q 6 - The –update-key parameter can take
A - Only one column name as key field
B - Two column names as key fields
C - Comma separated list of columns as keys
D - Table name and column name as key
Q 7 - Load all or load nothing semantics is implemented by using the parameter
D - --staging-table
Q 8 - To ensure that the columns created in hive by sqoop have the correct data types the parameter used by sqoop is
A - --map-column-hive
D - --map-table-hive
Q 9 - The parameter(s) used to load data using sqoop into the Hive partitions is/are
A - --hive-partition-key and --hive-partition-value
B - --hive-partition-key-value
D - --hive-sqoop-partition-value
Q 10 - The parameters in sqoop command can be passed in to Oozie by using which tags?
B - <args>
D - <command>
Q 1 - Which of the following is used by sqoop to establish a connection with enterprise data warehouses?
A - RDBMS driver
B - JDBC Driver
C - IDBC Driver
D - SQL Driver
Q 2 - Besides the JDBC driver, sqoop also needs which of the following to connect to remote databases?
A - Putty
B - SSH
C - Connector
D - sqoop client
Q 3 - To run sqoop from multiple nodes, it has to be installed in
A - Any one of the nodes in the local filesystem
B - Each of the nodes where it is supposed to run
C - Only on a pair of nodes of the cluster
D - Need not be installed
Q 4 - By default, the records from databases imported to HDFS by sqoop are
A - Tab separated
B - Concatenated columns
C - Space separated
D - Comma separated
Q 5 - To import data to the Hadoop cluster from a relational database, sqoop creates a mapreduce job. In this job
A - All the data is transferred in one go
B - Each mapper transfers a slice of the table's data
C - Each mapper transfers the table's data along with the table's metadata (name of the columns, etc.)
D - Only the schema of the relational table is validated without fetching data
Q 6 - The parameter in sqoop which specifies the output directory when importing data is
A - --output-path
B - --target-path
C - --output-dir
D - --target-dir
Q 7 - If there is already a target directory with the same name as the table being imported then
A - The directory gets deleted and recreated
B - The sqoop job fails
C - Another directory under the existing directory gets created
D - The existing directory gets renamed
Q 8 - To prevent the password from being mentioned in the sqoop import clause we can use the additional parameters
A - -p
B - --password-file
C - Both of these
D - Cannot be prevented
Q 9 - What are the two binary file formats supported by sqoop?
A - Avro & SequenceFile
B - RCFile and SequenceFile
C - ORC file and RC file
D - Avro and RC file
Q 10 - While SequenceFile stores each record as a key-value pair, the Avro system stores records as
A - Simple text
B - Chained lists
C - Linked lists
D - Schema and data
Q 11 - The compression mechanism used by sqoop is
A - Built in to sqoop
B - Delegated to Hadoop
C - Supplied as a java plugin to sqoop
D - Needs to be installed in the OS running sqoop
Q 12 - For some databases sqoop can do faster data transfer by using the parameter
A - --bulkload
B - --fastload
C - --dump
D - --direct
Q 13 - The data type mapping between the database column and sqoop column can be overridden by using the parameter
A - --override-column-type
B - --map-column-type
C - --override-column-java
D - --map-column-java
Q 14 - What does the num-mappers parameter serve?
A - Force sqoop to use only one map task
B - Set the number of map tasks sqoop can use
C - Store the data imported by each map task in a separate file
D - Fetch each row from the table using a new map task
Q 15 - What is the default value used by sqoop when it encounters a missing value while importing from a CSV file?
A - NULL
B - null
C - Space character
D - No values
Q 16 - What option can be used to import the entire database from a relational system using sqoop?
A - --import-all-db
B - --import-all-tables
C - --import-all
D - --import
Q 17 - What option can be used to import only some of the tables from a database while using the --import-all-tables parameter?
A - --skip-tables
B - --without-tables
C - --forgo-tables
D - --exclude-tables
Q 18 - Sqoop supports
A - Full import of tables
B - Partial import of data from tables
C - Both full and partial data import
D - Import of both the table and its partitions
Q 19 - What are the two different incremental modes of importing data into sqoop?
A - merge and add
B - append and modified
C - merge and lastmodified
D - append and lastmodified
Q 20 - What does the --last-value parameter in sqoop incremental import signify?
A - The number of rows successfully imported in an append type import
B - The date value to be used to select the rows for import in a last_update_date type import
C - Both of the above
D - The count of the number of rows that were successful in the current import
Q 21 - The --options-file parameter is used to
A - Save the import log
B - Specify the name of the data files to be created after import
C - Store all the sqoop variables
D - Store the parameters and their values in a file to be used by various sqoop commands
Q 22 - While specifying the connect string in the sqoop import command for a Hadoop cluster, if we specify localhost in place of a server address (hostname or IP address) in the URI, then
A - The import job will connect to local databases
B - Each node may connect to different databases
C - The import job may succeed
D - All of the above
Q 23 - What is the disadvantage of storing the password in the metastore as compared to storing it in a password file?
A - It is easily accessible
B - It may get deleted accidentally
C - It cannot be updated
D - It is unencrypted
Q 24 - What is the advantage of storing the password in a metastore as compared to storing it in a password file?
A - It can be run by any user with valid access to the sqoop environment
B - The password in the metastore can be updated while that in a password file cannot be
C - The password file can be encrypted while the metastore cannot be encrypted
D - User intervention is required with a password file but not with the metastore
Q 25 - The argument in a saved sqoop job can be altered at run time by using the option
A - --alter
B - --newval
C - --exec
D - --changeparam
Q 26 - What is achieved by using the --meta-connect parameter in a sqoop command?
A - Run the metastore as a service accessible remotely
B - Run the metastore as a service accessible locally
C - Connect to the metastore tables
D - Connect to the metadata of the external relational tables from which data has to be imported
Q 27 - The free-form query import feature in sqoop allows importing data from
A - Non-relational sources
B - A relational source without using a connector
C - A relational source using an SQL query
D - A relational source using custom java classes
Q 28 - The clause 'WHERE $CONDITIONS' in the SQL query specified to import data serves the purpose of
A - Splitting the query result into multiple chunks while importing
B - Picking a subset of rows from a table
C - Specifying the database from where the data needs to be imported
D - Specifying the target directory where the data will be stored
Q 29 - The parameter to give a custom name to the mapreduce job running a sqoop import command is
A - --sqoop-job-name
B - --map-job-name
C - --mapreduce-job-name
D - --rename-job
Q 30 - While using a free-form query to import data, Sqoop finds that two columns from the joined tables have the same name. In this case the job
A - Will fail
B - Will run, ignoring the column from each table
C - Will prompt the user to rename one of the columns
D - Will automatically create an alias for one of the columns and succeed
Q 31 - The --boundary-query parameter is used to
A - Select the maximum number of rows to be retrieved by the query
B - Select the maximum and minimum values of the column specified in the --split-by parameter
C - Select the number of splits the query can run
D - Select the maximum and minimum number of mapreduce tasks that will be used in the query
Q 32 - In a table import, the name of the mapreduce job
A - Is named after the table name
B - Can be customized
C - Can be passed as a query parameter
D - Is a random name decided by the system
Q 33 - In an import involving a join of two tables, if there are two columns with matching names between the two tables then this conflict can be resolved by
A - Using table aliases
B - Using column aliases
C - First creating temporary tables from each table with different column names
D - Renaming the columns in the source system and then importing
Q 34 - Data transfer using sqoop can be
A - Only imported into the Hadoop system
B - Both imported and exported from the Hadoop system
C - Transformed during import
D - Transformed during the export
Q 35 - While importing data into Hadoop using sqoop, the SQL SELECT clause is used. Similarly, while exporting data from Hadoop the SQL clause used is
A - APPEND
B - MERGE
C - UPDATE
D - INSERT
Q 36 - While inserting data into a relational system from Hadoop using sqoop, the various table constraints present in the relational table must be
A - Disabled temporarily
B - Dropped and re-created
C - Renamed
D - Not violated
Q 37 - The export and import of data between sqoop and a relational system happens through which of the following programs?
A - Sqoop client program
B - Mapreduce job submitted by the sqoop command
C - Database stored procedure
D - Hdfs file management program
Q 38 - When does sqoop gather the metadata of the relational table into which it exports the data?
A - Gathers the metadata of all tables only once, while establishing the connection to the database
B - Never, as it relies on the user to ensure the exported data matches the table's structure
C - Every time the sqoop export command is submitted, just before the data transfer starts
D - Only if the export fails does Sqoop access the metadata of the table
Q 39 - Sqoop's default behavior while inserting rows into relational tables is
A - One row at a time
B - Multiple rows depending on the memory availability
C - It depends on the database driver being used
D - Executes a random number of insert statements depending on the CPU availability
Q 40 - Which parameter in sqoop is used for bulk data export to relational tables?
A - --bulk
B - --batch
C - --load
D - --grouped data
Q 41 - What does the parameter "-Dsqoop.export.records.per.statement=10" do in a sqoop export command?
A - Exports 10 records in each insert statement
B - Exports 10 insert statements every time the command runs
C - Exports only the first 10 records to the table
D - Runs 10 data export commands in parallel
Q 42 - The parameter which decides how many rows will be inserted per transaction in sqoop is
A - -Dsqoop.export.rows.per.transaction
B - -Dsqoop.export.records.per.transaction
C - -Dsqoop.export.inserts.per.transaction
D - -Dsqoop.export.statements.per.transaction
Q 43 - The insert query used to insert exported data into tables is generated by
A - The sqoop command and processed as such
B - The sqoop command and modified suitably by the JDBC driver
C - The JDBC driver
D - The database-specific driver
Q 44 - When "sqoop.export.records.per.statement" is set to two or more, the query created by sqoop has the SQL form of
A - INSERT INTO TABLE VALUES (...); INSERT INTO TABLE VALUES (...); and so on
B - BULK INSERT INTO TABLE VALUES (...), (...), ...;
C - INSERT INTO TABLE VALUES (...), VALUES (...), VALUES (...), ...
D - INSERT INTO TABLE VALUES (...), (...), ...;
Q 45 - What happens if the sqoop-generated export query is not accepted by the database?
A - The export fails
B - The export succeeds partially
C - The export does not start
D - Sqoop automatically modifies the query to succeed after receiving the failure response from the database
Q 46 - Using a higher value for the parameter sqoop.export.statements.per.transaction will
A - Always increase the export performance
B - May or may not increase the export performance
C - May decrease the performance for some of the tables
D - Cause frequent commits to the database
Q 47 - The --staging-table parameter is used for
A - Storing some sample data from Hadoop before loading the real table
B - Storing all the required data from Hadoop before loading it to the real table
C - Storing the rejected rows
D - Storing the metadata structure of the tables to which data is being exported
Q 48 - With the --staging-table parameter, the data is moved from the staging to the final table
A - Automatically, if the staging load is successful
B - Has to be done by the user after verifying the data in staging
C - Depends on the data size
D - Depends on the memory available to move the data
Q 49 - Which of the following is a disadvantage of using the --staging-table parameter?
A - Data is stored twice and consumes more memory
B - The overall export time is more than a direct export to the final table
C - The user should ensure the structure of the staging table and the final table are in sync
D - All of the above
Q 50 - Using the --staging-table parameter while loading data to relational tables, the creation of the staging table is done
A - Automatically by sqoop
B - Automatically by the database
C - The user has to ensure it is created
D - Automatically created by a Hadoop process beyond sqoop