
MapReduce workload is batch oriented (long streaming reads, large sequential writes), so how can it optimize the network bandwidth usage?

Created: Jul 28, 2022 16:49:30. Latest reply: Aug 1, 2022 01:55:40.

Workloads are batch oriented, dominated by long streaming reads and large sequential writes. As a result, high sustained bandwidth is more important than low latency. This exactly describes the nature of MapReduce jobs, which are batch operations on large amounts of data. Due to the common-case workload, neither HDFS nor GFS implements any form of data caching.


Source: "Data Intensive Text Processing with MapReduce" (book)


According to my textbook, "Big Data Black Book" (DT Editorial Services):


To use the network optimally, a long stream of data should be sent by the application code when it's reading from or writing to the file system.


I don't get what it's trying to say.


I'd guess it's trying to say something along the lines of "multi-task, don't wait". But I don't get the exact scenario here.


What is the application code that is reading from or writing to the file system?


Where's the application code sending the data?


And how does it optimize the network usage?


For reference, here is the data flow in the map and reduce phases:

Map Phase data flow

Reduce phase data flow


Source:

https://courses.cs.duke.edu/spring16/compsci516/Lectures/Lecture-14.pdf



Featured Answers
olive.zhao
Admin Created Jul 29, 2022 02:11:05

Hello, friend!

1. The program is the MapReduce service program.

2. Once the service program starts running, it is divided into map tasks and reduce tasks: map tasks read data from the file system, and reduce tasks produce the output results.

3. Network optimization: MapReduce optimization is complex and needs to be analyzed based on the actual situation. For example, Hadoop splits a job into multiple independent map and reduce tasks for execution. It schedules each task, allocates the appropriate resources to it, and decides where in the cluster to run it. (Where possible, a task is placed on the node that already holds the data it will process, to minimize network overhead.)
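To make the "run the task where the data is" idea in point 3 concrete, here is a rough Python sketch. It is not Hadoop's scheduler; the function names, node names, and split names are all made up for illustration. It only shows the general preference: try a node that stores the input split first, and fall back to a remote node (which costs network traffic) only when no local slot is free.

```python
from typing import Dict, List


def assign_tasks(split_locations: Dict[str, List[str]],
                 free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign each input split to a node, preferring nodes that hold a replica.

    split_locations: hypothetical map of split name -> nodes storing that split.
    free_slots: hypothetical map of node name -> number of free task slots.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    for split, replicas in split_locations.items():
        # First choice: a node that stores the split and still has a free slot.
        node = next((n for n in replicas if slots.get(n, 0) > 0), None)
        if node is None:
            # Fallback: any node with a free slot; the data must cross the network.
            node = next((n for n, s in slots.items() if s > 0), None)
            if node is None:
                raise RuntimeError("no free slots left")
        slots[node] -= 1
        assignment[split] = node
    return assignment


def count_remote(assignment: Dict[str, str],
                 split_locations: Dict[str, List[str]]) -> int:
    """Count tasks that ended up on a node without a local copy of their split."""
    return sum(1 for split, node in assignment.items()
               if node not in split_locations[split])


# Made-up cluster: three splits, each stored on two of three nodes.
locations = {
    "split-0": ["node-a", "node-b"],
    "split-1": ["node-b", "node-c"],
    "split-2": ["node-a", "node-c"],
}
plan = assign_tasks(locations, {"node-a": 1, "node-b": 1, "node-c": 1})
print(plan, "remote tasks:", count_remote(plan, locations))
```

Every task that lands on a node holding its split reads from local disk, so only the fallback cases move input data over the network.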

Hope this helps!



All Answers
ariase88 Admin Created Jul 28, 2022 16:54:22

Thanks for contacting the Huawei community!

We are checking your question and will provide an answer to you shortly...


Saqibaz Created Jul 29, 2022 06:32:29

Thanks

Jackson.F Created 7 days ago

Reading and writing data, including writes to disk, puts pressure on the network. I think increasing the bandwidth helps, and so does compressing data and reducing the number of reads and writes over the network. Comprehensive service optimization is required.
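As a rough illustration of those two points, here is a small Python sketch. The cost model and every number in it are assumptions chosen for illustration, not measurements from any cluster: each request is assumed to pay one fixed latency plus the time to push its bytes through the link. The compression part uses Python's standard zlib module only to show that repetitive data shrinks before it is sent; it does not show how Hadoop itself is configured.

```python
import zlib


def transfer_time(total_bytes: int, request_bytes: int,
                  latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Toy model: every request pays a fixed latency, plus payload time overall."""
    requests = -(-total_bytes // request_bytes)  # ceiling division
    return requests * latency_s + total_bytes / bandwidth_bytes_per_s


def compressed_size(data: bytes) -> int:
    """Size of the data in bytes after zlib compression."""
    return len(zlib.compress(data))


# Assumed numbers: 1 GB to move, 1 ms per request, 100 MB/s link.
TOTAL = 1_000_000_000
small = transfer_time(TOTAL, 4_000, 0.001, 100_000_000)       # many tiny requests
large = transfer_time(TOTAL, 64_000_000, 0.001, 100_000_000)  # a few long streams
print(f"4 KB requests:  {small:.3f} s")
print(f"64 MB requests: {large:.3f} s")

# Repetitive text compresses well, so fewer bytes cross the network.
sample = b"the quick brown fox jumps over the lazy dog " * 1000
print(len(sample), "bytes raw,", compressed_size(sample), "bytes compressed")
```

In this model the payload time is the same in both cases; the difference comes entirely from paying the per-request latency 250,000 times instead of 16 times, which is one way to read the textbook's advice to send long streams of data.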