Following counters might not be supported by rocprof: SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_MFMA_MOPS_F64
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: ['vecCopy']
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:05.685638 138156119646016 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.308472 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:05.695569 138156119646016 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:05.902684 138156119646016 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:06.036689 138156119646016 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341120 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.071295 138156119646016 generateRocpd.cpp:582] writing SQL database for process 2382236 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:06.072656 138156119646016 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382236_results.db (UUID=00004313-97a9-77a9-be6d-728d578c7107)
   |-> [rocprofiler-sdk] W20260526 16:48:06.152885 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.011091 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.153902 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.000988 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.156051 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002126 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.160602 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002807 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.214755 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.054131 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.217112 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002333 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.217139 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000005 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.230945 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.013793 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.230986 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.230995 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.231007 138156119646016 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.231207 138156119646016 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000192 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.231883 138156119646016 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.160590 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.236896 138156119646016 simple_timer.cpp:55] [rocprofv3] output generation ::     0.197426 sec
   |-> [rocprofiler-sdk] W20260526 16:48:06.237019 138156119646016 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.200172 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382236_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 32 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:08.503177 127117281554240 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.311606 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:08.513307 127117281554240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:08.728824 127117281554240 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:08.864130 127117281554240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.350823 sec
   |-> [rocprofiler-sdk] W20260526 16:48:08.904388 127117281554240 generateRocpd.cpp:582] writing SQL database for process 2382247 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:08.905749 127117281554240 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382247_results.db (UUID=00004313-a2a8-72a8-86f9-003a3370b3bf)
   |-> [rocprofiler-sdk] W20260526 16:48:08.998043 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014616 sec
   |-> [rocprofiler-sdk] W20260526 16:48:08.999216 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001141 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.001739 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002494 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.006887 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003120 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.070033 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.063118 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.072795 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002731 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.072824 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.088283 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015444 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.088311 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.088323 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.088335 127117281554240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.088549 127117281554240 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000193 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.089002 127117281554240 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.184614 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.094706 127117281554240 simple_timer.cpp:55] [rocprofv3] output generation ::     0.228003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:09.094793 127117281554240 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.230609 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382247_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:11.372372 137579951562560 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304053 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:11.382470 137579951562560 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.598159 137579951562560 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:11.730131 137579951562560 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347662 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.769878 137579951562560 generateRocpd.cpp:582] writing SQL database for process 2382257 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:11.771190 137579951562560 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382257_results.db (UUID=00004313-ade4-7de4-8d35-9e4dec4ed545)
   |-> [rocprofiler-sdk] W20260526 16:48:11.863561 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014500 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.864748 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001155 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.867356 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002580 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.872577 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003180 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.940414 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067808 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.943093 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002649 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.943122 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.958432 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015294 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.958459 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.958471 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.958483 137579951562560 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.958694 137579951562560 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000191 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.959141 137579951562560 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.189264 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.964855 137579951562560 simple_timer.cpp:55] [rocprofv3] output generation ::     0.232221 sec
   |-> [rocprofiler-sdk] W20260526 16:48:11.964943 137579951562560 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.234761 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382257_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:14.241888 127919406276416 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303194 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:14.251889 127919406276416 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.466990 127919406276416 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:14.599005 127919406276416 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347116 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.638368 127919406276416 generateRocpd.cpp:582] writing SQL database for process 2382268 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:14.639659 127919406276416 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382268_results.db (UUID=00004313-b91b-791b-9134-6b6fb6b1d25b)
   |-> [rocprofiler-sdk] W20260526 16:48:14.731574 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014390 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.732705 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001100 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.735220 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002486 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.740222 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003051 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.812253 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.071997 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.814859 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002576 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.814889 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.830540 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015635 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.830572 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.830584 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.830596 127919406276416 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.830815 127919406276416 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000199 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.831362 127919406276416 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.192995 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.837301 127919406276416 simple_timer.cpp:55] [rocprofv3] output generation ::     0.235816 sec
   |-> [rocprofiler-sdk] W20260526 16:48:14.837392 127919406276416 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.238334 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382268_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:17.125954 129681500819264 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305807 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:17.134721 129681500819264 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.347025 129681500819264 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:17.480651 129681500819264 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345930 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.515393 129681500819264 generateRocpd.cpp:582] writing SQL database for process 2382279 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:17.516423 129681500819264 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382279_results.db (UUID=00004313-c45c-745c-9a5a-4d19395f491e)
   |-> [rocprofiler-sdk] W20260526 16:48:17.591606 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.010974 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.592621 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.000991 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.594362 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001719 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.598626 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002563 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.647147 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.048499 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.649349 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002180 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.649371 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.661851 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.012469 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.661873 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.661882 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.661891 129681500819264 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.662053 129681500819264 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000152 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.662419 129681500819264 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.147027 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.666934 129681500819264 simple_timer.cpp:55] [rocprofv3] output generation ::     0.183656 sec
   |-> [rocprofiler-sdk] W20260526 16:48:17.667019 129681500819264 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.186295 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382279_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:48:19.908714 140337872604992 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303933 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:48:19.918611 140337872604992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.129477 140337872604992 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:20.258730 140337872604992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340119 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.297770 140337872604992 generateRocpd.cpp:582] writing SQL database for process 2382289 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:20.299088 140337872604992 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382289_results.db (UUID=00004313-cf3d-7f3d-beaa-8ecae51ef89d)
   |-> [rocprofiler-sdk] W20260526 16:48:20.389224 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014107 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.390366 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001110 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.392868 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002472 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.397962 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003081 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.451960 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.053943 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.454701 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002694 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.454730 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.470336 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015592 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.470364 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.470376 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.470388 140337872604992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.470588 140337872604992 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.471080 140337872604992 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.173311 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.476882 140337872604992 simple_timer.cpp:55] [rocprofv3] output generation ::     0.215692 sec
   |-> [rocprofiler-sdk] W20260526 16:48:20.476963 140337872604992 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.218182 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382289_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:22.744788 133348905402176 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305370 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:22.753715 133348905402176 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:22.965019 133348905402176 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:23.097884 133348905402176 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344170 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.129806 133348905402176 generateRocpd.cpp:582] writing SQL database for process 2382299 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:48:23.130809 133348905402176 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382299_results.db (UUID=00004313-da50-7a50-959c-5b706780ca56)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.203618 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.010582 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.204611 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.000970 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.206378 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001746 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.210599 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002548 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.249754 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.039134 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.252005 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002228 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.252027 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265102 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.013064 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265122 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265132 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265140 133348905402176 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265291 133348905402176 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000137 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.265591 133348905402176 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.135786 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.269780 133348905402176 simple_timer.cpp:55] [rocprofv3] output generation ::     0.169320 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:23.269845 133348905402176 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.171890 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382299_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:25.533529 132171130711872 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.308689 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:25.543552 132171130711872 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:25.755827 132171130711872 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:25.883133 132171130711872 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.339581 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:25.922279 132171130711872 generateRocpd.cpp:582] writing SQL database for process 2382323 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:48:25.923568 132171130711872 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382323_results.db (UUID=00004313-e531-7531-ba3a-e097b51a8590)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.012158 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014543 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.013255 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001066 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.015748 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002465 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.020817 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003086 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.114985 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094140 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.117646 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002631 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.117675 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.133757 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016065 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.133789 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.133806 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.133819 132171130711872 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.134041 132171130711872 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000209 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.134560 132171130711872 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.212281 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.140771 132171130711872 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255312 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:26.140869 132171130711872 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.257693 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382323_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:28.422199 135808839671616 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303371 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:28.431875 135808839671616 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.642714 135808839671616 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:28.773140 135808839671616 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341265 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.812581 135808839671616 generateRocpd.cpp:582] writing SQL database for process 2382333 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:48:28.813857 135808839671616 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382333_results.db (UUID=00004313-f07f-707f-a4dd-df498c0122c4)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.902799 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014647 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.903986 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001141 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.906431 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002411 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:28.911529 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003032 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.005406 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.093848 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.008063 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002627 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.008093 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.024571 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016464 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.024606 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.024619 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.024631 135808839671616 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.024858 135808839671616 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000207 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.025468 135808839671616 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.212887 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.031451 135808839671616 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255844 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:29.031550 135808839671616 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.258359 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382333_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:31.354349 139449724456768 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.313317 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:31.364211 139449724456768 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.576849 139449724456768 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:31.712923 139449724456768 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.348712 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.752414 139449724456768 generateRocpd.cpp:582] writing SQL database for process 2382343 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:48:31.753703 139449724456768 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382343_results.db (UUID=00004313-fbe9-7be9-a89f-4d38f6e98a14)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.842274 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014806 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.843427 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001122 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.846056 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002600 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.851131 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003109 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.985868 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134708 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.988457 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002556 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:31.988486 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.004571 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016070 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.004600 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.004612 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.004624 139449724456768 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.004840 139449724456768 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000197 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.005298 139449724456768 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.252884 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.011170 139449724456768 simple_timer.cpp:55] [rocprofv3] output generation ::     0.295757 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:32.011269 139449724456768 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.298284 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382343_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:34.348783 130801891311424 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.309720 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:34.358750 130801891311424 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.571784 130801891311424 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:34.708662 130801891311424 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.349912 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.747695 130801891311424 generateRocpd.cpp:582] writing SQL database for process 2382354 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:48:34.748986 130801891311424 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382354_results.db (UUID=00004314-079f-779f-91f4-29ad9c53e198)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.840217 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014984 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.841383 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001136 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.843988 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002576 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.849174 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003181 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.973178 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.123975 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.975854 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002642 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.975884 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.991454 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015556 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.991482 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.991493 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.991505 130801891311424 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.991711 130801891311424 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000191 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.992253 130801891311424 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.244559 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.998184 130801891311424 simple_timer.cpp:55] [rocprofv3] output generation ::     0.286948 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:34.998279 130801891311424 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.289566 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382354_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:37.272442 134399902310208 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307208 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:48:37.282653 134399902310208 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:48:37.494478 134399902310208 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:48:37.626870 134399902310208 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344218 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.664965 134399902310208 generateRocpd.cpp:582] writing SQL database for process 2382364 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:48:37.666246 134399902310208 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382364_results.db (UUID=00004314-130d-730d-91b1-ad2c33c0e8e9)
   |-> [rocprofiler-sdk] W20260526 16:48:37.754441 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014320 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.755579 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001109 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.758124 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002516 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.762967 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003048 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.838702 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.075696 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.841400 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002669 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.841430 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.856966 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015521 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.857008 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.857020 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.857032 134399902310208 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.857261 134399902310208 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000210 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.857826 134399902310208 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.192861 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.863813 134399902310208 simple_timer.cpp:55] [rocprofv3] output generation ::     0.234457 sec
   |-> [rocprofiler-sdk] W20260526 16:48:37.863906 134399902310208 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.236988 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel/MI100/out/pmc_1/2382364_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
