Hi all, I keep getting error E40007 when running YOLOv3 training on an Atlas 800 (model 3010).
I'm using this YOLOv3 implementation from the Ascend ModelZoo: https://www.hiascend.com/en/software/modelzoo/detail/2/7b5f73072a24453389602051affe9b31
The error occurs when the script tries to restore the pretrained Darknet weights (darknet53.ckpt).
Here is the error message:
INFO:tensorflow:Restoring parameters from ../../.././data/darknet_weights/darknet53.ckpt
2021-09-30 13:32:55.849385: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 0 in both shapes must be equal, but are 60 and 255. Shapes are [60] and [255]. for 'save/Assign_290' (op: 'Assign') with input shapes: [60], [255]..
2021-09-30 13:32:55.849444: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 3 in both shapes must be equal, but are 60 and 255. Shapes are [1,1,512,60] and [1,1,512,255]. for 'save/Assign_291' (op: 'Assign') with input shapes: [1,1,512,60], [1,1,512,255]..
2021-09-30 13:32:55.849462: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 0 in both shapes must be equal, but are 60 and 255. Shapes are [60] and [255]. for 'save/Assign_332' (op: 'Assign') with input shapes: [60], [255]..
2021-09-30 13:32:55.849477: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 3 in both shapes must be equal, but are 60 and 255. Shapes are [1,1,256,60] and [1,1,256,255]. for 'save/Assign_333' (op: 'Assign') with input shapes: [1,1,256,60], [1,1,256,255]..
2021-09-30 13:32:55.849491: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 0 in both shapes must be equal, but are 60 and 255. Shapes are [60] and [255]. for 'save/Assign_349' (op: 'Assign') with input shapes: [60], [255]..
2021-09-30 13:32:55.849505: W tf_adapter/util/infershape_util.cc:275] AddNode failed, errormsg is Dimension 3 in both shapes must be equal, but are 60 and 255. Shapes are [1,1,1024,60] and [1,1,1024,255]. for 'save/Assign_350' (op: 'Assign') with input shapes: [1,1,1024,60], [1,1,1024,255]..
2021-09-30 13:32:55.849522: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SOURCE is null.
2021-09-30 13:32:55.849529: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node _SINK is null.
2021-09-30 13:32:55.853388: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_290 is null.
2021-09-30 13:32:55.853409: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_291 is null.
2021-09-30 13:32:55.853557: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_332 is null.
2021-09-30 13:32:55.853567: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_333 is null.
2021-09-30 13:32:55.853652: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_349 is null.
2021-09-30 13:32:55.853660: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/Assign_350 is null.
2021-09-30 13:32:55.854274: W tf_adapter/util/infershape_util.cc:313] The InferenceContext of node save/restore_all is null.
2021-09-30 13:32:55.863400: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_290
2021-09-30 13:32:55.863439: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_291
2021-09-30 13:32:55.863871: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_332
2021-09-30 13:32:55.863899: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_333
2021-09-30 13:32:55.864175: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_349
2021-09-30 13:32:55.864200: W tf_adapter/kernels/geop_npu.cc:1176] [GEOP] There is no infershape of node : save/Assign_350
2021-09-30 13:32:57.587722: W tensorflow/core/framework/op_kernel.cc:1639] Unavailable: failed
2021-09-30 13:33:02.338308: F tf_adapter/kernels/geop_npu.cc:766] GeOp3_0GEOP::::DoRunAsync Failed Error Message is : E40007: Prebuild op[save/Assign_333] failed, oppath[/usr/local/Ascend/nnae/5.0.1/opp/op_impl/built-in/ai_core/tbe/impl/assign.py], optype[Assign]. Please check the op's compilation error message.
Prebuild op[save/Assign_332] failed, oppath[/usr/local/Ascend/nnae/5.0.1/opp/op_impl/built-in/ai_core/tbe/impl/assign.py], optype[Assign]. Please check the op's compilation error message.
Prebuild op[save/Assign_291] failed, oppath[/usr/local/Ascend/nnae/5.0.1/opp/op_impl/built-in/ai_core/tbe/impl/assign.py], optype[Assign]. Please check the op's compilation error message.
run_yolov3.sh: line 30: 55170 Aborted (core dumped) taskset -c $PID_START-$PID_END python3 $3/train.py --mode $4
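One thing I noticed in the warnings: every failing Assign pairs a size of 60 (the graph) with 255 (darknet53.ckpt), and the kernels have input depths of 512, 256 and 1024, which looks like the three YOLOv3 detection scales. Assuming this implementation uses the standard YOLOv3 head layout of 3 anchors per scale, each predicting 4 box offsets + 1 objectness score + one score per class (I have not confirmed this in the ModelZoo source), the small sketch below converts those channel counts back into class counts. The helper names are my own, not from the repo:

```python
# Sketch: relate YOLOv3 detection-head channel counts to the number of classes.
# Assumption: standard YOLOv3 layout, 3 anchors per scale, each predicting
# 4 box offsets + 1 objectness score + one score per class.
ANCHORS_PER_SCALE = 3
BOX_AND_OBJECTNESS = 5


def head_channels(num_classes: int) -> int:
    """Output channels of one detection layer for num_classes classes."""
    return ANCHORS_PER_SCALE * (num_classes + BOX_AND_OBJECTNESS)


def classes_from_channels(channels: int) -> int:
    """Inverse of head_channels; raises if the count does not fit the layout."""
    per_anchor, remainder = divmod(channels, ANCHORS_PER_SCALE)
    if remainder or per_anchor <= BOX_AND_OBJECTNESS:
        raise ValueError(f"{channels} channels do not match 3 * (classes + 5)")
    return per_anchor - BOX_AND_OBJECTNESS


# Channel counts taken from the mismatched Assign shapes in the log above.
graph_channels, ckpt_channels = 60, 255
print(f"graph head:      {graph_channels} channels -> {classes_from_channels(graph_channels)} classes")
print(f"checkpoint head: {ckpt_channels} channels -> {classes_from_channels(ckpt_channels)} classes")
```

If that reading is right, my graph is built for 15 classes while the head layers stored in the checkpoint were saved for 80 classes, which would explain why only the six head-related Assign ops (290/291, 332/333, 349/350) fail.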
Here are also the error messages from the /root/ascend/log/plog/ folder:
[ERROR] GE(50377,python3):2021-09-30-10:26:00.727.952 [error_manager.cc:131]50646 ReportErrMessage: ErrorNo: -1(failed) [COMP][PRE_OPT][Report][Error]error_code is not registered
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.727.997 [fusion_op.cc:5071]PrintOp Failed to pre-compile node [name:save/Assign_333, type: Assign]. Detailed pre-compilation info is:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.007 [fusion_op.cc:5328]GetFinishedCompilationTask FinishedTask[0]: taskID[139807837619968:2406], status[1], kernel[None], File[/usr/local/Ascend/nnae/5.0.1/opp/op_impl/built-in/ai_core/tbe/impl/assign.py], result: prebuild failed. module[impl.assign] func[assign], compile_info_key:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.011 ===========FOLLOWING IS ARGUMENTS OF NODE save/Assign_333, TYPE Assign=============
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.028 input0: {'shape': (1, 1, 256, 60), 'ori_shape': (1, 1, 256, 60), 'format': 'NHWC', 'sub_format': 0, 'ori_format': 'NHWC', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.032 input1: {'shape': (1, 1, 256, 255), 'ori_shape': (1, 1, 256, 255), 'format': 'NHWC', 'sub_format': 0, 'ori_format': 'NHWC', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.040 outputs0: {'shape': (1, 1, 256, 255), 'ori_shape': (1, 1, 256, 255), 'format': 'NHWC', 'sub_format': 0, 'ori_format': 'NHWC', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.044 Attributes are: {}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.047 ===========FOLLOWING IS STACK INFO OF NODE save/Assign_333=============
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.053 Stack:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.060 Stack:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.728.064 ===========END OF DETAILED OP INFO OF NODE save/Assign_333=============
[ERROR] FE(50377,python3):2021-09-30-10:26:00.728.161 [tbe_op_store_adapter.cc:142]50646 ProcessFailPreCompTask:"tid[139807837619968], taskId[2406], node[save/Assign_333], precompile failed"
[ERROR] FE(50377,python3):2021-09-30-10:26:00.728.165 [tbe_op_store_adapter.cc:295]50646 ParallelPreCompileOp:"Pre-build Tbe op failed, graph_id[139807837619968]"
[ERROR] FE(50377,python3):2021-09-30-10:26:00.728.188 [op_compiler.cc:399]50646 PreCompileOp:"PreCompileOp failed, graph [partition0_rank187_new_sub_graph646]"
[ERROR] GE(50377,python3):2021-09-30-10:26:00.728.209 [graph_optimize.cc:118]50646 OptimizeSubGraph: ErrorNo: -1(failed) [COMP][PRE_OPT][OptimizeSubGraph][OptimizeFusedGraph]: graph optimize failed, ret:-1
[ERROR] GE(50377,python3):2021-09-30-10:26:00.728.214 [graph_manager.cc:2699]50646 ProcessSubGraphWithMultiThreads: ErrorNo: -1(failed) [COMP][PRE_OPT]SubGraph optimize Failed AIcoreEngine
[ERROR] GE(50377,python3):2021-09-30-10:26:00.737.531 [error_manager.cc:131]50637 ReportErrMessage: ErrorNo: -1(failed) [COMP][PRE_OPT][Report][Error]error_code is not registered
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.576 [fusion_op.cc:5071]PrintOp Failed to pre-compile node [name:save/Assign_332, type: Assign]. Detailed pre-compilation info is:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.585 [fusion_op.cc:5328]GetFinishedCompilationTask FinishedTask[0]: taskID[139807988107008:2428], status[1], kernel[None], File[/usr/local/Ascend/nnae/5.0.1/opp/op_impl/built-in/ai_core/tbe/impl/assign.py], result: prebuild failed. module[impl.assign] func[assign], compile_info_key:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.589 ===========FOLLOWING IS ARGUMENTS OF NODE save/Assign_332, TYPE Assign=============
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.604 input0: {'shape': (60,), 'ori_shape': (60,), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.607 input1: {'shape': (255,), 'ori_shape': (255,), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.615 outputs0: {'shape': (255,), 'ori_shape': (255,), 'format': 'ND', 'sub_format': 0, 'ori_format': 'ND', 'dtype': 'float32', 'addr_type': 0, 'valid_shape': (), 'slice_offset': (), 'L1_workspace_size': -1, 'L1_fusion_type': -1, 'L1_addr_offset': 0, 'total_shape': (), 'split_index': 0}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.619 Attributes are: {}
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.622 ===========FOLLOWING IS STACK INFO OF NODE save/Assign_332=============
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.629 Stack:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.636 Stack:
[ERROR] TEFUSION(50377,python3):2021-09-30-10:26:00.737.640 ===========END OF DETAILED OP INFO OF NODE save/Assign_332=============
[ERROR] FE(50377,python3):2021-09-30-10:26:00.737.712 [tbe_op_store_adapter.cc:142]50637 ProcessFailPreCompTask:"tid[139807988107008], taskId[2428], node[save/Assign_332], precompile failed"
[ERROR] FE(50377,python3):2021-09-30-10:26:00.737.717 [tbe_op_store_adapter.cc:295]50637 ParallelPreCompileOp:"Pre-build Tbe op failed, graph_id[139807988107008]"
[ERROR] FE(50377,python3):2021-09-30-10:26:00.737.743 [op_compiler.cc:399]50637 PreCompileOp:"PreCompileOp failed, graph [partition0_rank201_new_sub_graph667]"
[ERROR] GE(50377,python3):2021-09-30-10:26:00.737.765 [graph_optimize.cc:118]50637 OptimizeSubGraph: ErrorNo: -1(failed) [COMP][PRE_OPT][OptimizeSubGraph][OptimizeFusedGraph]: graph optimize failed, ret:-1
[ERROR] GE(50377,python3):2021-09-30-10:26:00.737.770 [graph_manager.cc:2699]50637 ProcessSubGraphWithMultiThreads: ErrorNo: -1(failed) [COMP][PRE_OPT]SubGraph optimize Failed AIcoreEngine
[ERROR] GE(50377,python3):2021-09-30-10:26:00.778.326 [graph_manager.cc:707]50434 OptimizeSubGraphWithMultiThreads: ErrorNo: -1(failed) [COMP][PRE_OPT]subgraph 186 optimize failed
[ERROR] GE(50377,python3):2021-09-30-10:26:00.878.258 [graph_manager.cc:785]50434 SetSubgraph: ErrorNo: -1(failed) [COMP][PRE_OPT]Multiply optimize subgraph failed
[ERROR] GE(50377,python3):2021-09-30-10:26:00.878.330 [graph_manager.cc:3230]50434 OptimizeSubgraph: ErrorNo: -1(failed) [COMP][PRE_OPT]Graph set subgraph Failed
[ERROR] GE(50377,python3):2021-09-30-10:26:00.878.350 [graph_manager.cc:837]50434 PreRunOptimizeSubGraph: ErrorNo: -1(failed) [COMP][PRE_OPT]Failed to process GraphManager_OptimizeSubgraph
[ERROR] GE(50377,python3):2021-09-30-10:26:00.878.358 [graph_manager.cc:946]50434 PreRun: ErrorNo: -1(failed) [COMP][PRE_OPT]Run PreRunOptimizeSubGraph failed for graph:ge_default_20210930102559.
[ERROR] GE(50377,python3):2021-09-30-10:26:00.881.802 [graph_manager.cc:3094]50434 ReturnError: ErrorNo: -1(failed) [COMP][PRE_OPT]PreRun Failed..
[ERROR] FMK(50377,python3):2021-09-30-10:26:03.881.991 [tf_adapter/common/adp_logger.cc:34][TF_ADAPTER] [tf_adapter/kernels/geop_npu.cc:763] GeOp3_0GEOP::::DoRunAsync Failed

Could you please help me solve this issue?


