Following counters might not be supported by rocprof: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_TRANS_F16
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: ['vecPaste']
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:10.138326 131100630613824 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302340 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:10.148174 131100630613824 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.358670 131100630613824 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:10.491646 131100630613824 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343473 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.531221 131100630613824 generateRocpd.cpp:582] writing SQL database for process 2386671 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:10.532517 131100630613824 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386671_results.db (UUID=0000431c-d0d4-70d4-8c43-ed3e8419f51d)
   |-> [rocprofiler-sdk] W20260526 16:58:10.620581 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014356 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.621678 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001066 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.624096 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002390 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.629031 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003041 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.698161 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069102 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.700945 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002754 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.700983 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000012 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717090 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016092 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717122 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717134 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717146 131100630613824 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717372 131100630613824 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000206 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.717923 131100630613824 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.186703 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.723889 131100630613824 simple_timer.cpp:55] [rocprofv3] output generation ::     0.229677 sec
   |-> [rocprofiler-sdk] W20260526 16:58:10.723990 131100630613824 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.232294 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386671_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:13.014751 123406330883904 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306439 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:13.025079 123406330883904 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.238400 123406330883904 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:13.373823 123406330883904 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.348744 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.413751 123406330883904 generateRocpd.cpp:582] writing SQL database for process 2386681 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:13.415041 123406330883904 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386681_results.db (UUID=0000431c-dc0c-7c0c-ae0c-b3e975ab69b4)
   |-> [rocprofiler-sdk] W20260526 16:58:13.503882 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014454 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.505060 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001145 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.507580 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002492 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.512717 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003142 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.576067 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.063321 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.578899 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002802 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.578928 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.594877 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015934 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.594907 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.594919 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.594931 123406330883904 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.595141 123406330883904 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000195 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.595620 123406330883904 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.181870 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.601543 123406330883904 simple_timer.cpp:55] [rocprofv3] output generation ::     0.225237 sec
   |-> [rocprofiler-sdk] W20260526 16:58:13.601635 123406330883904 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.227758 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386681_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:15.851757 126331595734848 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303491 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:15.861631 126331595734848 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.073081 126331595734848 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:16.206268 126331595734848 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344637 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.245327 126331595734848 generateRocpd.cpp:582] writing SQL database for process 2386691 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:16.246629 126331595734848 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386691_results.db (UUID=0000431c-e725-7725-835b-56625a63c691)
   |-> [rocprofiler-sdk] W20260526 16:58:16.335468 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014508 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.336585 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001084 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.339111 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002498 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.344087 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003042 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.412036 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067920 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.414849 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002784 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.414878 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.430834 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015940 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.430864 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.430876 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.430889 126331595734848 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.431131 126331595734848 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000222 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.431785 126331595734848 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.186458 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.437724 126331595734848 simple_timer.cpp:55] [rocprofv3] output generation ::     0.228980 sec
   |-> [rocprofiler-sdk] W20260526 16:58:16.437819 126331595734848 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.231498 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386691_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:18.694479 133885309202240 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305190 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:18.704441 133885309202240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:18.916918 133885309202240 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:19.048235 133885309202240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343795 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.087800 133885309202240 generateRocpd.cpp:582] writing SQL database for process 2386701 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:19.089094 133885309202240 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386701_results.db (UUID=0000431c-f23d-723d-b213-bbc8596f4310)
   |-> [rocprofiler-sdk] W20260526 16:58:19.178438 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014567 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.179525 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001056 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.182005 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002451 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.187087 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003133 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.258940 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.071824 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.261767 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002794 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.261796 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.277445 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015635 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.277473 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.277485 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.277497 133885309202240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.277699 133885309202240 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.278177 133885309202240 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.190378 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.283942 133885309202240 simple_timer.cpp:55] [rocprofv3] output generation ::     0.233195 sec
   |-> [rocprofiler-sdk] W20260526 16:58:19.284029 133885309202240 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.235742 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386701_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:21.552833 129290411396928 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306105 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:21.562697 129290411396928 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:21.774310 129290411396928 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:21.902866 129290411396928 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340170 sec
   |-> [rocprofiler-sdk] W20260526 16:58:21.942343 129290411396928 generateRocpd.cpp:582] writing SQL database for process 2386711 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:21.943617 129290411396928 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386711_results.db (UUID=0000431c-fd67-7d67-90bf-9f3a78f50d16)
   |-> [rocprofiler-sdk] W20260526 16:58:22.033484 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014461 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.034649 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001135 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.037191 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002514 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.042206 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003098 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.105031 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062796 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.107874 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002812 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.107903 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.123418 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015500 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.123448 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.123461 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.123472 129290411396928 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.123690 129290411396928 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000199 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.124230 129290411396928 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.181887 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.130106 129290411396928 simple_timer.cpp:55] [rocprofv3] output generation ::     0.224720 sec
   |-> [rocprofiler-sdk] W20260526 16:58:22.130196 129290411396928 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.227279 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386711_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:24.362703 127515687796544 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303382 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:24.371831 127515687796544 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.582953 127515687796544 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:24.711962 127515687796544 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340131 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.751507 127515687796544 generateRocpd.cpp:582] writing SQL database for process 2386722 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:24.752785 127515687796544 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386722_results.db (UUID=0000431d-0863-7863-9258-81d21a22c93b)
   |-> [rocprofiler-sdk] W20260526 16:58:24.842209 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014415 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.843366 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001125 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.845928 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002534 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.850992 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003136 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.905076 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.054054 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.907888 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002782 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.907917 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.923651 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015720 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.923681 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.923693 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.923705 127515687796544 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.923907 127515687796544 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000185 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.924346 127515687796544 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.172839 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.930035 127515687796544 simple_timer.cpp:55] [rocprofv3] output generation ::     0.215660 sec
   |-> [rocprofiler-sdk] W20260526 16:58:24.930125 127515687796544 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.218101 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386722_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:27.194638 136523486076736 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305295 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:27.204624 136523486076736 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.415246 136523486076736 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:27.547193 136523486076736 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342569 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.586301 136523486076736 generateRocpd.cpp:582] writing SQL database for process 2386732 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:27.587591 136523486076736 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386732_results.db (UUID=0000431d-1372-7372-9782-46ec5045af58)
   |-> [rocprofiler-sdk] W20260526 16:58:27.676949 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014244 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.678114 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001115 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.680313 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002165 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.685361 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003054 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.736115 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.050726 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.738888 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002743 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.738917 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.754652 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015720 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.754682 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.754694 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.754706 136523486076736 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.754920 136523486076736 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000193 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.755396 136523486076736 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.169096 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.761247 136523486076736 simple_timer.cpp:55] [rocprofv3] output generation ::     0.211479 sec
   |-> [rocprofiler-sdk] W20260526 16:58:27.761334 136523486076736 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.214087 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386732_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:30.018833 128295847337792 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304445 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:30.027731 128295847337792 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.239739 128295847337792 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:30.374042 128295847337792 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346312 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.413000 128295847337792 generateRocpd.cpp:582] writing SQL database for process 2386743 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:30.414294 128295847337792 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386743_results.db (UUID=0000431d-1e7b-7e7b-8b0f-0dac654f5872)
   |-> [rocprofiler-sdk] W20260526 16:58:30.503855 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014506 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.505016 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001128 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.507493 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002448 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.512464 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003064 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.606895 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094402 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.609762 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002837 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.609790 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.625820 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016015 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.625850 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.625863 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.625875 128295847337792 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.626105 128295847337792 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000210 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.626623 128295847337792 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.213624 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.632503 128295847337792 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255989 sec
   |-> [rocprofiler-sdk] W20260526 16:58:30.632598 128295847337792 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.258502 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386743_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:32.910103 131976621870912 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306847 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:32.918765 131976621870912 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.130215 131976621870912 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:33.261851 131976621870912 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343086 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.301459 131976621870912 generateRocpd.cpp:582] writing SQL database for process 2386753 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:33.302764 131976621870912 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386753_results.db (UUID=0000431d-29c3-79c3-82d6-8cdc44f44ad8)
   |-> [rocprofiler-sdk] W20260526 16:58:33.392066 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014516 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.393224 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001125 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.395770 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002518 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.400813 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003100 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.493694 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.092847 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.496468 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002743 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.496497 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.512335 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015824 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.512368 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.512380 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.512392 131976621870912 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.512611 131976621870912 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000204 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.513188 131976621870912 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.211730 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.519531 131976621870912 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255161 sec
   |-> [rocprofiler-sdk] W20260526 16:58:33.519626 131976621870912 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.257726 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386753_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:35.829554 123214334672704 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.313304 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:35.839718 123214334672704 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.051468 123214334672704 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:36.187782 123214334672704 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.348064 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.227134 123214334672704 generateRocpd.cpp:582] writing SQL database for process 2386764 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:36.228392 123214334672704 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386764_results.db (UUID=0000431d-3524-7524-b6fe-aae3c6a998f3)
   |-> [rocprofiler-sdk] W20260526 16:58:36.317800 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014961 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.318950 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001119 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.321525 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002535 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.326583 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003116 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.461256 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134644 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.464048 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002761 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.464077 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.479984 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015891 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.480014 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.480027 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.480039 123214334672704 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.480258 123214334672704 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000205 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.480811 123214334672704 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.253677 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.486737 123214334672704 simple_timer.cpp:55] [rocprofv3] output generation ::     0.296441 sec
   |-> [rocprofiler-sdk] W20260526 16:58:36.486844 123214334672704 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.299008 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386764_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:38.790472 124207363612480 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.311725 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:38.800241 124207363612480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.010648 124207363612480 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:39.143930 124207363612480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343689 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.183087 124207363612480 generateRocpd.cpp:582] writing SQL database for process 2386774 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:39.184424 124207363612480 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386774_results.db (UUID=0000431d-40b7-70b7-9f25-d4073dbfb18a)
   |-> [rocprofiler-sdk] W20260526 16:58:39.275199 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014951 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.276369 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001139 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.278936 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002538 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.284007 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003069 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.408234 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124198 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.411075 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002811 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.411105 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427035 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015915 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427063 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427075 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427087 124207363612480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427293 124207363612480 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.427743 124207363612480 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.244656 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.433645 124207363612480 simple_timer.cpp:55] [rocprofv3] output generation ::     0.287173 sec
   |-> [rocprofiler-sdk] W20260526 16:58:39.433749 124207363612480 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.289761 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386774_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:41.731281 128968853008192 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307041 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:41.741313 128968853008192 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:41.953349 128968853008192 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:42.088813 128968853008192 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347500 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.128287 128968853008192 generateRocpd.cpp:582] writing SQL database for process 2386785 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:42.129562 128968853008192 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/dl385-20-mi100-3c48/2386785_results.db (UUID=0000431d-4c38-7c38-ac5d-0efe06f95210)
   |-> [rocprofiler-sdk] W20260526 16:58:42.217805 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014481 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.218932 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001097 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.221372 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002411 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.226213 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002985 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.302609 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076362 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.305455 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002814 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.305484 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.321315 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015816 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.321345 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.321357 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.321369 128968853008192 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.321586 128968853008192 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000204 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.322171 128968853008192 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.193884 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.328135 128968853008192 simple_timer.cpp:55] [rocprofv3] output generation ::     0.236805 sec
   |-> [rocprofiler-sdk] W20260526 16:58:42.328228 128968853008192 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.239365 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_str/MI100/out/pmc_1/2386785_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
