
Big Data Hadoop Lab Assignment: Module 1


Data Node Calculation
Let's assume that you have 100 TB of data to store and process with Hadoop. The configuration of each available DataNode is as follows:
8 GB RAM 
10 TB HDD 
100 MB/s read-write speed 
 
You have a Hadoop Cluster with replication factor = 3 and block size = 64 MB. 
In this case, the number of DataNodes required to store the data would be:
Total amount of data * Replication factor / Disk space available on each DataNode
= (100 TB * 3) / 10 TB
= 30 DataNodes
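
As a quick check, here is a minimal Python sketch of the same storage sizing (the capacities and replication factor are the assumed values above):

import math

total_data_tb = 100        # data to store, in TB
replication_factor = 3     # HDFS replication factor
disk_per_node_tb = 10      # HDD capacity of one DataNode, in TB

# Raw storage needed divided by per-node capacity, rounded up to whole nodes
datanodes_for_storage = math.ceil(total_data_tb * replication_factor / disk_per_node_tb)
print(datanodes_for_storage)   # 30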
 
Now, let's assume you need to process this 100 TB of data using MapReduce. 
Reading 100 TB of data at a speed of 100 MB/s using only one node would take:
Total data / Read-write speed
= (100 * 1024 * 1024) MB / 100 MB/s
= 1,048,576 seconds
≈ 291.27 hours
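
The same read-time arithmetic as a small Python sketch (again using the assumed 100 MB/s per-node speed):

total_data_mb = 100 * 1024 * 1024   # 100 TB expressed in MB
read_speed_mb_s = 100               # read-write speed of one DataNode, in MB/s

seconds_one_node = total_data_mb / read_speed_mb_s
print(seconds_one_node)             # 1048576.0 seconds
print(seconds_one_node / 3600)      # ~291.27 hours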
 
So, with 30 DataNodes you would be able to finish this MapReduce job in: 
= 291.27 hours / 30
≈ 9.71 hours
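
Dividing that single-node time across 30 DataNodes (assuming an ideal, perfectly parallel read with no overhead):

hours_one_node = 1048576 / 3600     # ~291.27 hours on a single DataNode
datanodes = 30
print(hours_one_node / datanodes)   # ~9.71 hours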
 

1. Problem Statement 


How many such DataNodes would you need to read 100 TB of data in 5 minutes in your Hadoop Cluster?

2. Problem Solution


2.1 Time required to read the data using a single DataNode
 
1 DataNode takes:

Total data / Read-write speed
= (100 * 1024 * 1024) MB / 100 MB/s
= 1,048,576 seconds, or ≈ 291.27 hours, to read 100 TB of data
 
2.2 DataNodes required to read the data in five minutes
 
Number of DataNodes required to read 100 TB in 5 minutes:

Time taken by 1 DataNode to read the 100 TB data / Total time given to finish the read
= (1,048,576 seconds / 60) / 5 minutes
≈ 3,495.25 DataNodes
 
So, you would need approximately 3,495 such DataNodes (3,496 after rounding up to whole nodes) to read the 100 TB of data in 5 minutes.
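
A short Python sketch of the full answer, under the same assumptions; the ceiling gives the whole number of DataNodes needed to stay within the 5-minute budget:

import math

total_data_mb = 100 * 1024 * 1024          # 100 TB in MB
read_speed_mb_s = 100                      # per-DataNode read speed, in MB/s
target_minutes = 5

minutes_one_node = (total_data_mb / read_speed_mb_s) / 60   # ~17476.27 minutes
datanodes_needed = minutes_one_node / target_minutes
print(datanodes_needed)                    # ~3495.25
print(math.ceil(datanodes_needed))         # 3496 whole DataNodes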
 
