Hi,
I tried to train my model on an Ascend AI Processor, but I got this error:
[ERROR] RUNTIME(6845)mem async copy error, retCode=0x87, [pcie dma copy error].
[ERROR] RUNTIME(6845)mem async copy failed device_id=1, stream_id=534, task_id=831
[ERROR] RUNTIME(6845)copy_type=1, memcpy_type=0, copy_data_type=0, src_addr=dbc0, dst_addr=2c00000001, length=4
Traceback (most recent call last):
...
RuntimeError: ACL stream synchronize failed.
THPModule_npu_shutdown success.
I was using the ascend-pytorch-x86:21.0.1 Docker image. After this error, npu-smi no longer shows NPU chip device 1.
Why did I get this error, and how can I fix it? I look forward to your help with this issue.