Big Data Hadoop Lab Assignment: Module 1
Data Node Calculation
Let's assume that you have 100 TB of data to store and process with Hadoop. The configuration of each available DataNode is as follows:
• 8 GB RAM
• 10 TB HDD
• 100 MB/s read-write speed
You have a Hadoop Cluster with replication factor = 3 and block size = 64 MB.
In this case, the number of DataNodes required to store the data would be (see the sketch after this list):
• Total amount of data * replication factor / disk space available on each DataNode
• (100 TB * 3) / 10 TB
• 30 DataNodes
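A minimal Python sketch of this storage calculation (the variable names and the use of math.ceil to round up to whole nodes are assumptions of this write-up, not part of the assignment):

import math

# Cluster and data parameters from the assignment
total_data_tb = 100          # total data to store, in TB
replication_factor = 3       # each block is stored 3 times
disk_per_node_tb = 10        # HDD capacity per DataNode, in TB

# Total raw storage needed = data size * replication factor
required_storage_tb = total_data_tb * replication_factor

# DataNodes needed, rounded up to a whole node
datanodes_for_storage = math.ceil(required_storage_tb / disk_per_node_tb)
print(datanodes_for_storage)  # 30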
Now, let's assume you need to process this 100 TB of data using MapReduce.
Reading 100 TB of data at a speed of 100 MB/s using only 1 node would take:
• Total data / read-write speed
• (100 * 1024 * 1024 MB) / 100 MB/s
• 1,048,576 seconds
• ≈ 291.27 hours
So, with 30 DataNodes working in parallel, you would be able to finish this MapReduce job in (see the sketch after this list):
• 291.27 hours / 30
• ≈ 9.70 hours
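The read-time arithmetic can be sketched in Python as follows (the variable names are assumptions; like the assignment, this assumes the read scales linearly across DataNodes):

total_data_mb = 100 * 1024 * 1024     # 100 TB expressed in MB
read_speed_mb_per_s = 100             # per-node read-write speed, in MB/s
datanodes = 30

# Time for a single node to read all of the data
single_node_seconds = total_data_mb / read_speed_mb_per_s   # 1,048,576 s
single_node_hours = single_node_seconds / 3600              # ~291.27 hours

# With 30 DataNodes reading in parallel
parallel_hours = single_node_hours / datanodes              # ~9.7 hours
print(single_node_hours, parallel_hours)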
1. Problem Statement
How many such DataNodes would you need to read the 100 TB of data in 5 minutes in your Hadoop cluster?
2. Problem Solution
2.1 Time required for reading the data using a single DataNode
One DataNode takes:
Total data / read-write speed
= (100 TB * 1024 * 1024) MB / 100 MB/s
= 1,048,576 seconds, or ≈ 291.27 hours, to read the 100 TB of data
2.2 DataNodes required to read the data in five minutes
Number of DataNodes required to read 100 TB in 5 minutes:
= Time taken by 1 DataNode to read the 100 TB of data / total time allowed for the read
= (1,048,576 seconds / 60) minutes / 5 minutes
= 3495.25 DataNodes
So, you would need approximately 3,496 such DataNodes (3495.25, rounded up) to read the 100 TB of data in 5 minutes (see the sketch below).
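A short Python sketch of the 5-minute calculation (the variable names and the math.ceil rounding are assumptions of this write-up):

import math

total_data_mb = 100 * 1024 * 1024     # 100 TB in MB
read_speed_mb_per_s = 100             # per-node read-write speed, in MB/s
target_seconds = 5 * 60               # 5-minute target

# Time one DataNode needs to read all of the data
single_node_seconds = total_data_mb / read_speed_mb_per_s   # 1,048,576 s

# DataNodes needed so the parallel read finishes within the target
nodes_needed = single_node_seconds / target_seconds         # 3495.25...
print(math.ceil(nodes_needed))  # 3496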