[Distributed Training on Atlas 800-9000] Ranktable file is not recognized

Latest reply: Mar 23, 2021 16:30:28

I am running the Inception v4 training code obtained from the model zoo on an Atlas 800-9000. I have some questions regarding the configuration of distributed training (up to 8 devices on a single server).


According to the documentation, a resource configuration file (the ranktable file) must be prepared, and it should contain the NIC IP addresses of the devices available in the training server.


However, I noticed that even if the IP addresses in the ranktable file are not valid, the training runs successfully and uses all 8 devices (I checked the device utilization using the npu-smi and ascend-dmi tools). So my questions are:

  • Is the ranktable file required for distributed training in a single-server scenario?

  • If the ranktable file is not required in such a scenario, how is the training job scheduled across the devices?
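
For reference, the kind of ranktable file described above could be sketched as follows. This is only an illustration under assumptions: the field names (`version`, `server_count`, `server_list`, `server_id`, `device`, `device_id`, `device_ip`, `rank_id`, `status`) follow the version 1.0 layout as I understand it and should be checked against the documentation for the installed software version. The helper `build_ranktable`, the server ID, and the IP addresses are hypothetical placeholders, not values from my setup.

```python
import json


def build_ranktable(server_id, device_ips):
    """Build a single-server ranktable dictionary.

    Assumption: field names follow the version 1.0 ranktable layout;
    verify them against the documentation before use.
    """
    devices = [
        {"device_id": str(i), "device_ip": ip, "rank_id": str(i)}
        for i, ip in enumerate(device_ips)
    ]
    return {
        "version": "1.0",
        "server_count": "1",
        "server_list": [{"server_id": server_id, "device": devices}],
        "status": "completed",
    }


# Placeholder addresses (documentation range), one per device.
placeholder_ips = [f"192.0.2.{i + 1}" for i in range(8)]
ranktable = build_ranktable("10.0.0.1", placeholder_ips)

# Print the JSON text that would be saved as the ranktable file.
print(json.dumps(ranktable, indent=2))
```

Each device entry pairs a device index with its NIC IP and its rank, which is why I expected invalid IP addresses here to break the job.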

Hello, dear!
It's nice to meet you in the community.
We're working on your problem. Please be patient.

Did you get a chance to have a look at this issue?
Thanks in advance.

Saqib123 Created Mar 23, 2021 16:29:10
Good

Nice
