Following counters might not be supported by rocprof: SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F64
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:09.111545 123191299821376 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306038 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:09.120910 123191299821376 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.332960 123191299821376 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:09.463840 123191299821376 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342931 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.503380 123191299821376 generateRocpd.cpp:582] writing SQL database for process 2381692 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:09.504847 123191299821376 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381692_results.db (UUID=00004311-d04e-704e-8727-bfdce1d2c829)
   |-> [rocprofiler-sdk] W20260526 16:46:09.595190 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014341 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.596326 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001105 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.598808 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002453 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.603930 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003106 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.673242 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069284 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.675990 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002717 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.676018 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692026 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015993 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692055 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692067 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692079 123191299821376 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692306 123191299821376 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000208 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.692834 123191299821376 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.189454 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.698685 123191299821376 simple_timer.cpp:55] [rocprofv3] output generation ::     0.232303 sec
   |-> [rocprofiler-sdk] W20260526 16:46:09.698773 123191299821376 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.234881 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381692_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:11.963923 133015315824448 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304740 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:11.974034 133015315824448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.185692 133015315824448 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:12.315678 133015315824448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341644 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.355488 133015315824448 generateRocpd.cpp:582] writing SQL database for process 2381702 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:12.356768 133015315824448 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381702_results.db (UUID=00004311-db73-7b73-b401-feb689e11344)
   |-> [rocprofiler-sdk] W20260526 16:46:12.448378 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014581 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.449519 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001110 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.452052 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002505 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.457203 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003156 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.519905 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062673 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.522800 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002864 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.522829 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.538916 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016071 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.538950 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.538963 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.538984 133015315824448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.539202 133015315824448 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000205 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.539710 133015315824448 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.184223 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.545589 133015315824448 simple_timer.cpp:55] [rocprofv3] output generation ::     0.227346 sec
   |-> [rocprofiler-sdk] W20260526 16:46:12.545679 133015315824448 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.229949 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381702_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:14.786155 132001210654528 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304838 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:14.794884 132001210654528 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.008642 132001210654528 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:15.139124 132001210654528 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344240 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.178009 132001210654528 generateRocpd.cpp:582] writing SQL database for process 2381712 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:15.179285 132001210654528 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381712_results.db (UUID=00004311-e679-7679-899a-0e29ef2c66a2)
   |-> [rocprofiler-sdk] W20260526 16:46:15.270147 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014503 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.271296 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001113 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.273840 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002516 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.278906 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003065 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.346666 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067731 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.349528 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002831 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.349558 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.364925 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015348 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.364952 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.364964 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.364986 132001210654528 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.365182 132001210654528 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.365599 132001210654528 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.187591 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.371327 132001210654528 simple_timer.cpp:55] [rocprofv3] output generation ::     0.229766 sec
   |-> [rocprofiler-sdk] W20260526 16:46:15.371406 132001210654528 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.232231 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381712_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 23 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:17.628187 139833343827776 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305845 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:17.636789 139833343827776 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:17.847807 139833343827776 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:17.977879 139833343827776 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341090 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.017458 139833343827776 generateRocpd.cpp:582] writing SQL database for process 2381723 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:18.018751 139833343827776 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381723_results.db (UUID=00004311-f192-7192-be8b-1fbe4e4f4dc2)
   |-> [rocprofiler-sdk] W20260526 16:46:18.108776 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014470 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.109942 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001135 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.112509 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002538 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.117647 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003163 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.189914 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.072238 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.192819 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002875 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.192849 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.208661 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015798 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.208692 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.208704 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.208716 139833343827776 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.208950 139833343827776 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000213 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.209504 139833343827776 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.192047 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.215543 139833343827776 simple_timer.cpp:55] [rocprofv3] output generation ::     0.235198 sec
   |-> [rocprofiler-sdk] W20260526 16:46:18.215642 139833343827776 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.237711 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381723_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:20.476527 125940254662464 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302116 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:20.486638 125940254662464 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:20.702601 125940254662464 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:20.831892 125940254662464 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345255 sec
   |-> [rocprofiler-sdk] W20260526 16:46:20.871325 125940254662464 generateRocpd.cpp:582] writing SQL database for process 2381733 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:20.872641 125940254662464 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381733_results.db (UUID=00004311-fcb7-7cb7-8846-c805a40de0a0)
   |-> [rocprofiler-sdk] W20260526 16:46:20.980853 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.028296 sec
   |-> [rocprofiler-sdk] W20260526 16:46:20.982040 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001157 sec
   |-> [rocprofiler-sdk] W20260526 16:46:20.984323 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002254 sec
   |-> [rocprofiler-sdk] W20260526 16:46:20.989437 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003124 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.051916 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062450 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.054621 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002675 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.054650 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.070464 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015799 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.070494 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.070507 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.070519 125940254662464 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.070730 125940254662464 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000196 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.071245 125940254662464 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.199920 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.076754 125940254662464 simple_timer.cpp:55] [rocprofv3] output generation ::     0.242324 sec
   |-> [rocprofiler-sdk] W20260526 16:46:21.076839 125940254662464 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.244894 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381733_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:23.352956 133120571883328 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305061 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:23.362699 133120571883328 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.574283 133120571883328 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:23.703185 133120571883328 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340486 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.743122 133120571883328 generateRocpd.cpp:582] writing SQL database for process 2381743 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:23.744427 133120571883328 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381743_results.db (UUID=00004312-07f0-77f0-a030-6c78d46fd202)
   |-> [rocprofiler-sdk] W20260526 16:46:23.835568 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014282 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.836723 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001125 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.839277 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002526 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.844348 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003061 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.898607 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.054231 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.901461 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002819 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.901491 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.917856 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016350 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.917884 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.917896 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.917907 133120571883328 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.918124 133120571883328 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000195 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.918577 133120571883328 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.175456 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.924444 133120571883328 simple_timer.cpp:55] [rocprofv3] output generation ::     0.218658 sec
   |-> [rocprofiler-sdk] W20260526 16:46:23.924531 133120571883328 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.221295 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381743_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:26.216597 124551118286656 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304688 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:26.226525 124551118286656 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.438334 124551118286656 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:26.571169 124551118286656 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344644 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.609763 124551118286656 generateRocpd.cpp:582] writing SQL database for process 2381767 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:26.611050 124551118286656 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381767_results.db (UUID=00004312-1320-7320-9e1e-04df21175b11)
   |-> [rocprofiler-sdk] W20260526 16:46:26.701589 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014294 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.702760 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001136 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.705002 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002214 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.710072 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003148 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.760755 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.050654 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.763477 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002693 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.763506 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779201 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015680 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779229 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779242 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779253 124551118286656 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779458 124551118286656 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000186 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.779919 124551118286656 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.170157 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.785726 124551118286656 simple_timer.cpp:55] [rocprofv3] output generation ::     0.212047 sec
   |-> [rocprofiler-sdk] W20260526 16:46:26.785806 124551118286656 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.214586 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381767_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:29.096199 134868569046848 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304452 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:29.104748 134868569046848 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.316054 134868569046848 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:29.447842 134868569046848 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343094 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.487549 134868569046848 generateRocpd.cpp:582] writing SQL database for process 2381780 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:29.488859 134868569046848 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381780_results.db (UUID=00004312-1e60-7e60-8e31-0b4125f4c3a3)
   |-> [rocprofiler-sdk] W20260526 16:46:29.579837 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014742 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.580962 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001095 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.583561 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002557 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.588760 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003144 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.682892 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094105 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.685622 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002700 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.685652 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.701912 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016245 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.701941 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.701953 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.701965 134868569046848 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.702187 134868569046848 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000201 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.702757 134868569046848 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.215209 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.708713 134868569046848 simple_timer.cpp:55] [rocprofv3] output generation ::     0.258394 sec
   |-> [rocprofiler-sdk] W20260526 16:46:29.708811 134868569046848 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.260921 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381780_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:31.990882 134275555553088 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303875 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:32.000597 134275555553088 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.211487 134275555553088 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:32.344321 134275555553088 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343724 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.384197 134275555553088 generateRocpd.cpp:582] writing SQL database for process 2381791 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:32.385491 134275555553088 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381791_results.db (UUID=00004312-29af-79af-ae80-249eda7faad0)
   |-> [rocprofiler-sdk] W20260526 16:46:32.476773 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014435 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.477953 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001148 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.480662 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002667 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.485745 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003096 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.578478 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.092704 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.581230 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002721 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.581259 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.597774 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016500 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.597803 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.597815 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.597827 134275555553088 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.598069 134275555553088 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000221 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.598825 134275555553088 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.214629 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.605209 134275555553088 simple_timer.cpp:55] [rocprofv3] output generation ::     0.258419 sec
   |-> [rocprofiler-sdk] W20260526 16:46:32.605303 134275555553088 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.260928 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381791_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:34.915769 129555539312448 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.310317 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:34.925374 129555539312448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.136070 129555539312448 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:35.269940 129555539312448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344566 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.309226 129555539312448 generateRocpd.cpp:582] writing SQL database for process 2381802 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:35.310504 129555539312448 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381802_results.db (UUID=00004312-3516-7516-905e-253509424ef4)
   |-> [rocprofiler-sdk] W20260526 16:46:35.402225 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014918 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.403410 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001149 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.406058 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002620 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.411185 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003050 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.545992 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134779 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.548657 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002634 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.548686 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.564963 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016262 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.565002 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.565015 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.565027 129555539312448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.565266 129555539312448 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000218 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.565884 129555539312448 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.256659 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.571871 129555539312448 simple_timer.cpp:55] [rocprofv3] output generation ::     0.299379 sec
   |-> [rocprofiler-sdk] W20260526 16:46:35.571985 129555539312448 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.301982 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381802_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:37.903440 125881935556416 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.308744 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:37.913343 125881935556416 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.124628 125881935556416 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:38.259230 125881935556416 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345887 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.299144 125881935556416 generateRocpd.cpp:582] writing SQL database for process 2381813 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:38.300419 125881935556416 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381813_results.db (UUID=00004312-40c3-70c3-8a97-11c9bea67ec1)
   |-> [rocprofiler-sdk] W20260526 16:46:38.391559 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014839 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.392731 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001142 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.395468 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002709 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.400757 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003218 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.524825 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124040 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.527533 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002677 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.527561 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543153 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015576 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543180 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543192 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543204 125881935556416 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543403 125881935556416 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000179 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.543913 125881935556416 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.244770 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.549772 125881935556416 simple_timer.cpp:55] [rocprofv3] output generation ::     0.288008 sec
   |-> [rocprofiler-sdk] W20260526 16:46:38.549864 125881935556416 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.290585 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381813_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:46:40.816649 126125271809856 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304728 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:46:40.826720 126125271809856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.038266 126125271809856 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:46:41.173027 126125271809856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346308 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.212586 126125271809856 generateRocpd.cpp:582] writing SQL database for process 2381823 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:46:41.213849 126125271809856 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/dl385-20-mi100-3c48/2381823_results.db (UUID=00004312-4c28-7c28-b3ea-ffdbdfa7e948)
   |-> [rocprofiler-sdk] W20260526 16:46:41.303364 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014350 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.304499 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001104 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.307025 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002497 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.312064 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003060 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.388257 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076164 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.390991 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002704 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.391020 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.406932 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015897 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.406959 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.406980 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000009 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.407006 126125271809856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.407213 126125271809856 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.407643 126125271809856 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.195057 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.413530 126125271809856 simple_timer.cpp:55] [rocprofv3] output generation ::     0.238022 sec
   |-> [rocprofiler-sdk] W20260526 16:46:41.413612 126125271809856 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.240532 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/path/MI100/out/pmc_1/2381823_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
