Homework Assignment #2 – Calculating Size of Distributed File Systems

Exercise 3-1 Imagine that you want to analyze one terabyte (1 TB) of data that resides on a single machine with eight input/output channels, where each channel has a reading speed of 150 megabytes per second (MB/s).

1. Calculate the time it takes to read the entire file.

2. To speed up the read operation, consider adding more machines and creating a distributed cluster. What is the minimum number of machines you should install in the cluster so that the total read time is less than 10 seconds?
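The arithmetic behind this kind of question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 1 TB is taken as 1,000,000 MB (decimal units), every channel on every machine reads in parallel with no overhead, and the data is spread evenly across machines. The function names are hypothetical and are not taken from the course material.

```python
def read_time_seconds(file_mb, machines, channels_per_machine, channel_mb_per_s):
    """Time to read file_mb when all channels on all machines read in parallel."""
    aggregate_mb_per_s = machines * channels_per_machine * channel_mb_per_s
    return file_mb / aggregate_mb_per_s


def min_machines(file_mb, channels_per_machine, channel_mb_per_s, limit_s):
    """Smallest machine count whose read time is strictly below limit_s.

    All arguments must be integers. Integer division is used so that an
    exact tie (read time equal to limit_s) still rounds up to one more machine.
    """
    mb_per_machine_within_limit = channels_per_machine * channel_mb_per_s * limit_s
    return file_mb // mb_per_machine_within_limit + 1


TB_IN_MB = 1_000_000  # assumption: decimal units; use 1024 * 1024 for binary units

single = read_time_seconds(TB_IN_MB, machines=1,
                           channels_per_machine=8, channel_mb_per_s=150)
needed = min_machines(TB_IN_MB, channels_per_machine=8,
                      channel_mb_per_s=150, limit_s=10)
print(f"{single:.1f} s on one machine; {needed} machines for a read under 10 s")
```

Switching TB_IN_MB to binary units changes both results, so state the unit convention you use in your answer.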

Exercise 3-2 A cluster with 50 machines stores the data blocks of a file of customer complaints. The size of the file is 5 TB, and each machine has four channels with a reading speed of 100 MB/s per channel. Is the number of machines (50) sufficient to read the data in under 20 seconds? If not, how many more similar machines need to be added to the cluster?

Exercise 3-3 You want to store a 500 MB file in a cluster with 12 nodes, which are located in four different racks (three nodes per rack) as shown in the figure below.

1. If a data block can store 128 MB, how many data blocks are needed to split this file?

2. Use a replication factor of 3 and the write principles discussed earlier to allocate the data blocks in this cluster.

3. Repeat steps 1 and 2 but with a block size of 256 MB.
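The block count and one possible allocation can be sketched as follows. The placement rule here is an assumption: it follows the commonly described HDFS default (first replica on one rack, second replica on a different rack, third replica on another node of that second rack), which may differ from the write principles taught in the course. The rack and node labels are hypothetical, since the figure's names are not reproduced here.

```python
def num_blocks(file_mb, block_mb):
    """Number of blocks needed; the last block may be only partly filled."""
    return -(-file_mb // block_mb)  # ceiling division on integers


def place_replicas(n_blocks, racks):
    """Assign three replicas per block under the assumed rack-aware rule.

    racks maps a rack name to its list of node names. Requires at least
    two racks and at least two nodes in every rack.
    """
    rack_names = list(racks)
    placement = {}
    for b in range(n_blocks):
        local_rack = rack_names[b % len(rack_names)]
        remote_rack = rack_names[(b + 1) % len(rack_names)]
        local_nodes, remote_nodes = racks[local_rack], racks[remote_rack]
        i = b % len(remote_nodes)
        placement[f"B{b + 1}"] = [
            local_nodes[b % len(local_nodes)],          # replica 1: first rack
            remote_nodes[i],                            # replica 2: a different rack
            remote_nodes[(i + 1) % len(remote_nodes)],  # replica 3: same rack as replica 2, other node
        ]
    return placement


# Hypothetical labels: four racks (R1..R4) with three nodes each.
racks = {f"R{r}": [f"R{r}-N{n}" for n in range(1, 4)] for r in range(1, 5)}

for block_mb in (128, 256):
    n = num_blocks(500, block_mb)
    print(f"{block_mb} MB blocks: {n} blocks")
    for block, nodes in place_replicas(n, racks).items():
        print("  ", block, "->", nodes)
```

The round-robin choice of racks is only one of many valid allocations; any layout that satisfies the placement rule is acceptable.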

Exercise 3-4 Use the same cluster in the figure shown below for a file size of 50 GB. Each Data Node can store up to 8 GB of data. You need to allocate the data blocks, each 256 MB in size, in the cluster using a replication factor of 3.

1. Is the number of Data Nodes (12) sufficient to store this data file?

2. If not, how many more Data Nodes are needed? Add them to the cluster in a separate rack and allocate the blocks in the modified cluster.

3. If 12 is sufficient, allocate the data blocks in the cluster.

4. Repeat steps 1 through 3 but with a block size of 128 MB.
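A capacity check of this kind can be sketched as below. The sketch assumes that 1 GB = 1024 MB, that every replica occupies a full block-sized slot on its Data Node, and that a node's capacity is used entirely for block storage. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Ceiling division for positive integers."""
    return -(-a // b)


def nodes_needed(file_mb, block_mb, replication, node_capacity_mb):
    """Minimum number of Data Nodes that can hold every replica of the file."""
    blocks = ceil_div(file_mb, block_mb)         # blocks in the file
    replicas = blocks * replication              # block copies to store
    slots_per_node = node_capacity_mb // block_mb  # whole blocks per node
    return ceil_div(replicas, slots_per_node)


GB = 1024        # assumption: 1 GB = 1024 MB
AVAILABLE = 12   # Data Nodes in the cluster described above

for block_mb in (256, 128):
    required = nodes_needed(50 * GB, block_mb, replication=3,
                            node_capacity_mb=8 * GB)
    extra = max(0, required - AVAILABLE)
    print(f"{block_mb} MB blocks: {required} nodes required, {extra} more than available")
```

The same total can be cross-checked by hand: the raw storage demand is the file size multiplied by the replication factor, compared against the number of nodes multiplied by the per-node capacity.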

Exercise 3-5 Consider the block allocations shown in the figure below. Using a replication factor of 3, are all blocks allocated in the correct Rack and Data Node? If not, reallocate the blocks correctly. Explain your decision.

Exercise 3-6 Consider the block allocations shown in the figure below. Using a replication factor of 3, are all blocks allocated in the correct Rack and Data Node? If not, reallocate the blocks correctly. Explain your decision.
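Checking an allocation like the ones in Exercises 3-5 and 3-6 can be expressed as a small validator. The rules encoded here are an assumption based on the commonly described HDFS default placement for a replication factor of 3: three replicas, each on a distinct Data Node, spread over exactly two racks. The node-to-rack mapping is hypothetical and must be replaced with the layout from the figure.

```python
def check_block(replica_nodes, rack_of):
    """Return a list of rule violations for one block (empty list means valid)."""
    problems = []
    if len(replica_nodes) != 3:
        problems.append(f"expected 3 replicas, found {len(replica_nodes)}")
    if len(set(replica_nodes)) != len(replica_nodes):
        problems.append("two replicas share a Data Node")
    racks_used = {rack_of[node] for node in replica_nodes}
    if len(racks_used) != 2:
        problems.append(f"replicas span {len(racks_used)} rack(s), expected exactly 2")
    return problems


# Hypothetical layout: node name -> rack name.
rack_of = {"N1": "R1", "N2": "R1", "N3": "R1",
           "N4": "R2", "N5": "R2", "N6": "R2",
           "N7": "R3"}

print(check_block(["N1", "N4", "N5"], rack_of))  # one rack plus two nodes on another rack
print(check_block(["N1", "N2", "N3"], rack_of))  # all replicas on a single rack
```

A block that fails the check can be fixed by moving the offending replica to a node that restores the two-rack, three-node pattern.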

Exercise 3-7 Use the HDFS commands provided in Appendix A-Part 2 (HDFS) to perform the following tasks. Submit a document in which each command is associated with a screenshot of its result.

1. Create a directory in HDFS.

2. Copy any file from the local machine to the newly created directory.

3. List the directory’s contents.

4. View the contents of the file.

5. Rename any file in HDFS.

6. Create another directory in HDFS and move any file from one directory to another.

7. Delete any file.

8. Delete any directory.

9. Move any file from HDFS to the local machine.

10. Display the size of files.

11. Change the group of any file or directory.

12. Change permissions of any file or directory.