Following counters might not be supported by rocprof: SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_ADD_F32
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: ['42']
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[ 33%] Built target fmt
[ 22%] Built target gsl_assert
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:48.429858 132604752367424 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306823 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:48.438905 132604752367424 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.650666 132604752367424 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:48.780782 132604752367424 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341877 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.819555 132604752367424 generateRocpd.cpp:582] writing SQL database for process 2386843 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:48.820833 132604752367424 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386843_results.db (UUID=0000431d-6663-7663-8094-553b309c969f)
   |-> [rocprofiler-sdk] W20260526 16:58:48.908870 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014402 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.910026 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001126 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.912517 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002462 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.917487 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003047 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.987141 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069625 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.989961 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002790 sec
   |-> [rocprofiler-sdk] W20260526 16:58:48.990001 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006037 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016021 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006064 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006076 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006088 132604752367424 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006293 132604752367424 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.006715 132604752367424 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.187160 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.012453 132604752367424 simple_timer.cpp:55] [rocprofv3] output generation ::     0.229165 sec
   |-> [rocprofiler-sdk] W20260526 16:58:49.012541 132604752367424 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.231710 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386843_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:51.293007 130569212272448 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303905 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:51.302955 130569212272448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.514451 130569212272448 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:51.646727 130569212272448 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343772 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.685912 130569212272448 generateRocpd.cpp:582] writing SQL database for process 2386854 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:51.687183 130569212272448 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386854_results.db (UUID=0000431d-7195-7195-8c1e-c000740d09f0)
   |-> [rocprofiler-sdk] W20260526 16:58:51.771196 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.010883 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.772150 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.000928 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.774153 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.001978 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.778327 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002516 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.826785 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.048433 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.829252 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002441 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.829278 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.841932 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.012636 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.841957 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.841982 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.841997 130569212272448 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.842150 130569212272448 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000140 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.842527 130569212272448 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.156616 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.847156 130569212272448 simple_timer.cpp:55] [rocprofv3] output generation ::     0.197959 sec
   |-> [rocprofiler-sdk] W20260526 16:58:51.847223 130569212272448 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.200443 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386854_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:54.110846 133228973473600 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306059 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:54.119671 133228973473600 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.330817 133228973473600 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:54.463136 133228973473600 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343465 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.502318 133228973473600 generateRocpd.cpp:582] writing SQL database for process 2386865 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:54.503595 133228973473600 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386865_results.db (UUID=0000431d-7c95-7c95-9437-cf9e1cf7aab6)
   |-> [rocprofiler-sdk] W20260526 16:58:54.592923 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014605 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.594081 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001121 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.596666 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002556 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.601729 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003113 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.692530 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.090771 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.695445 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002880 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.695475 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.711579 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016089 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.711608 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.711620 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.711632 133228973473600 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.711855 133228973473600 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000203 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.712425 133228973473600 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.210108 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.718292 133228973473600 simple_timer.cpp:55] [rocprofv3] output generation ::     0.252695 sec
   |-> [rocprofiler-sdk] W20260526 16:58:54.718380 133228973473600 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.255195 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386865_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:56.995578 127352216780608 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306589 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:57.005635 127352216780608 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.218142 127352216780608 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:58:57.350437 127352216780608 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344803 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.382442 127352216780608 generateRocpd.cpp:582] writing SQL database for process 2386876 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:58:57.383457 127352216780608 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386876_results.db (UUID=0000431d-87d9-77d9-9dc6-d21d1f369c9d)
   |-> [rocprofiler-sdk] W20260526 16:58:57.455872 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.010864 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.456834 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.000938 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.458880 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002025 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.462942 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002453 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.518231 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.055267 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.520586 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002333 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.520609 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533166 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.012546 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533192 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533202 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533211 127352216780608 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533370 127352216780608 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000149 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.533795 127352216780608 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.151354 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.538549 127352216780608 simple_timer.cpp:55] [rocprofv3] output generation ::     0.185588 sec
   |-> [rocprofiler-sdk] W20260526 16:58:57.538624 127352216780608 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.188135 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386876_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:58:59.800013 124938055561024 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305620 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:58:59.808791 124938055561024 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.019104 124938055561024 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:00.154000 124938055561024 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345209 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.193639 124938055561024 generateRocpd.cpp:582] writing SQL database for process 2386888 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:00.194917 124938055561024 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386888_results.db (UUID=0000431d-92ce-72ce-a4c3-cd2bd66109a6)
   |-> [rocprofiler-sdk] W20260526 16:59:00.284445 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014404 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.285576 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001101 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.288104 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002499 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.293064 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003042 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.357493 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.064396 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.360455 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002932 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.360487 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.375993 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015491 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.376020 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.376032 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.376044 124938055561024 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.376250 124938055561024 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000185 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.376723 124938055561024 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.183084 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.382560 124938055561024 simple_timer.cpp:55] [rocprofv3] output generation ::     0.226061 sec
   |-> [rocprofiler-sdk] W20260526 16:59:00.382646 124938055561024 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.228593 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386888_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:02.650336 131269699276608 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305986 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:02.660270 131269699276608 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:02.876691 131269699276608 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:03.009958 131269699276608 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.349688 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.049210 131269699276608 generateRocpd.cpp:582] writing SQL database for process 2386910 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:03.050475 131269699276608 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386910_results.db (UUID=0000431d-9df0-7df0-931d-f873b9e22d30)
   |-> [rocprofiler-sdk] W20260526 16:59:03.141325 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014383 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.142450 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001094 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.144918 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002440 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.150154 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003258 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.204106 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.053924 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.206942 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002807 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.206971 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.222723 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015722 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.222750 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.222762 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.222774 131269699276608 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.222991 131269699276608 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000199 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.223396 131269699276608 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.174187 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.229169 131269699276608 simple_timer.cpp:55] [rocprofv3] output generation ::     0.216697 sec
   |-> [rocprofiler-sdk] W20260526 16:59:03.229249 131269699276608 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.219231 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386910_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:05.516579 127190879153984 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305323 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:05.526601 127190879153984 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:05.737091 127190879153984 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:05.866008 127190879153984 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.339408 sec
   |-> [rocprofiler-sdk] W20260526 16:59:05.904788 127190879153984 generateRocpd.cpp:582] writing SQL database for process 2386920 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:05.906083 127190879153984 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386920_results.db (UUID=0000431d-a923-7923-a39c-d32b179b2527)
   |-> [rocprofiler-sdk] W20260526 16:59:05.993402 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014185 sec
   |-> [rocprofiler-sdk] W20260526 16:59:05.994474 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001043 sec
   |-> [rocprofiler-sdk] W20260526 16:59:05.996649 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002146 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.001649 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003105 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.052082 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.050404 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.054822 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002711 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.054850 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070012 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015147 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070040 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070052 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070064 127190879153984 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070270 127190879153984 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000188 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.070730 127190879153984 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.165943 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.076482 127190879153984 simple_timer.cpp:55] [rocprofv3] output generation ::     0.208000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:06.076555 127190879153984 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.210498 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386920_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:08.359362 128876706246464 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302919 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:08.369167 128876706246464 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.580200 128876706246464 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:08.712675 128876706246464 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343508 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.752595 128876706246464 generateRocpd.cpp:582] writing SQL database for process 2386931 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:08.753874 128876706246464 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386931_results.db (UUID=0000431d-b440-7440-ae05-8eb296bcf20d)
   |-> [rocprofiler-sdk] W20260526 16:59:08.841666 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014513 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.842767 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001070 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.845321 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002525 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.850296 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003082 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.944592 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094266 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.947409 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002782 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.947439 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.963511 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016058 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.963541 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.963553 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.963565 128876706246464 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.963771 128876706246464 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000190 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.964286 128876706246464 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.211691 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.970143 128876706246464 simple_timer.cpp:55] [rocprofv3] output generation ::     0.254936 sec
   |-> [rocprofiler-sdk] W20260526 16:59:08.970241 128876706246464 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.257513 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386931_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:11.241678 126317027016512 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307610 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:11.250779 126317027016512 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.463046 126317027016512 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:11.595082 126317027016512 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344304 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.634956 126317027016512 generateRocpd.cpp:582] writing SQL database for process 2386942 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:11.636270 126317027016512 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386942_results.db (UUID=0000431d-bf7e-7f7e-9b76-73dc51897de4)
   |-> [rocprofiler-sdk] W20260526 16:59:11.726480 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014440 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.727654 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001142 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.730193 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002510 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.735180 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003068 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.827895 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.092687 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.830780 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002855 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.830809 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.846320 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015496 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.846348 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.846360 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.846371 126317027016512 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.846570 126317027016512 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000182 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.847038 126317027016512 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.212082 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.853241 126317027016512 simple_timer.cpp:55] [rocprofv3] output generation ::     0.255670 sec
   |-> [rocprofiler-sdk] W20260526 16:59:11.853329 126317027016512 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.258195 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386942_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:14.154604 123932961820480 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.312825 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:14.164555 123932961820480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.378496 123932961820480 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:14.512917 123932961820480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.348363 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.551971 123932961820480 generateRocpd.cpp:582] writing SQL database for process 2386952 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:14.553295 123932961820480 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386952_results.db (UUID=0000431d-cada-7ada-9351-59d55ef4e394)
   |-> [rocprofiler-sdk] W20260526 16:59:14.642080 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014938 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.643255 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001142 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.645769 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002486 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.650707 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.002999 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.785239 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134503 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.788008 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002739 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.788038 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.803427 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015375 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.803455 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.803467 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.803479 123932961820480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.803686 123932961820480 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000188 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.804184 123932961820480 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.252214 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.809999 123932961820480 simple_timer.cpp:55] [rocprofv3] output generation ::     0.294609 sec
   |-> [rocprofiler-sdk] W20260526 16:59:14.810091 123932961820480 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.297120 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386952_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:17.120216 127885390802752 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.309627 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:17.130081 127885390802752 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.340640 127885390802752 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:59:17.475016 127885390802752 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344936 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.514072 127885390802752 generateRocpd.cpp:582] writing SQL database for process 2386962 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:59:17.515358 127885390802752 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386962_results.db (UUID=0000431d-d673-7673-a7c9-11f32f3a7725)
   |-> [rocprofiler-sdk] W20260526 16:59:17.604411 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015016 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.605495 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001053 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.608041 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002516 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.613006 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003097 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.737252 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124217 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.740060 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002779 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.740090 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.755902 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015798 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.755931 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.755944 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.755955 127885390802752 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.756189 127885390802752 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000199 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.756685 127885390802752 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.242614 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.762486 127885390802752 simple_timer.cpp:55] [rocprofv3] output generation ::     0.285010 sec
   |-> [rocprofiler-sdk] W20260526 16:59:17.762576 127885390802752 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.287508 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386962_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:59:20.041781 131785440882496 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303299 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:59:20.050697 131785440882496 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.261291 131785440882496 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:59:20.392448 131785440882496 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341751 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:59:20.432073 131785440882496 generateRocpd.cpp:582] writing SQL database for process 2386973 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:59:20.433385 131785440882496 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/dl385-20-mi100-3c48/2386973_results.db (UUID=0000431d-e1e3-71e3-9d7c-1ddbc5b604f1)
   |-> [rocprofiler-sdk] W20260526 16:59:20.522909 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014573 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.524031 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001091 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.526492 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002433 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.531390 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003017 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.608149 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076730 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.611007 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002828 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.611037 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.626775 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015724 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.626804 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.626816 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.626827 131785440882496 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.627057 131785440882496 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000209 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.627501 131785440882496 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.195429 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.633345 131785440882496 simple_timer.cpp:55] [rocprofv3] output generation ::     0.238405 sec
   |-> [rocprofiler-sdk] W20260526 16:59:20.633427 131785440882496 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.240929 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/kernel_inv_int/MI100/out/pmc_1/2386973_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
