Following counters might not be supported by rocprof: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MFMA_MOPS_F64
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:01.760757 130554595262272 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307591 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:01.770684 130554595262272 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:01.982481 130554595262272 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [33m[rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16[0m
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:02.114001 130554595262272 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343317 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.153548 130554595262272 generateRocpd.cpp:582] writing SQL database for process 2382780 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:02.154856 130554595262272 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382780_results.db (UUID=00004315-5d15-7d15-adf0-d8fbbdfc1d14)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.246743 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014414 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.247897 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001123 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.250484 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002559 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.255621 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003170 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.324719 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069069 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.327407 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002655 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.327437 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343143 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015691 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343170 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343182 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343194 130554595262272 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343394 130554595262272 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000186 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.343811 130554595262272 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.190263 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.349659 130554595262272 simple_timer.cpp:55] [rocprofv3] output generation ::     0.233000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:02.349740 130554595262272 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.235687 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382780_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:04.628224 132400548654912 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.301633 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:04.638013 132400548654912 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:04.848317 132400548654912 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [33m[rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16[0m
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:04.980916 132400548654912 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342903 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.020086 132400548654912 generateRocpd.cpp:582] writing SQL database for process 2382796 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:05.021360 132400548654912 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382796_results.db (UUID=00004315-684f-784f-a3d0-4436fe28967b)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.111747 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014382 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.112932 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001150 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.115514 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002549 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.120614 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003124 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.183385 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062738 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.186033 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002614 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.186067 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.201340 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015246 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.201367 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.201379 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.201391 132400548654912 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.201603 132400548654912 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000192 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.202047 132400548654912 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.181962 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.207861 132400548654912 simple_timer.cpp:55] [rocprofv3] output generation ::     0.224425 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:05.207940 132400548654912 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.226933 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382796_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:07.471529 128355712376640 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302714 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:07.480947 128355712376640 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.693510 128355712376640 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [33m[rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64[0m
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:07.823894 128355712376640 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342948 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.863264 128355712376640 generateRocpd.cpp:582] writing SQL database for process 2382808 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:07.864584 128355712376640 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382808_results.db (UUID=00004315-7369-7369-beac-66c8454e05ca)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.956646 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014464 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.957801 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001123 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.960427 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002597 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:07.965698 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003266 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.033554 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067826 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.036323 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002734 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.036352 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.051803 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015436 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.051831 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.051843 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.051855 128355712376640 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.052084 128355712376640 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000209 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.052561 128355712376640 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.189298 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.058466 128355712376640 simple_timer.cpp:55] [rocprofv3] output generation ::     0.232060 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:08.058554 128355712376640 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.234607 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382808_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:10.303505 139230031212352 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303343 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:10.313249 139230031212352 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.525272 139230031212352 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:10.653489 139230031212352 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340240 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.692812 139230031212352 generateRocpd.cpp:582] writing SQL database for process 2382819 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:10.694092 139230031212352 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382819_results.db (UUID=00004315-7e78-7e78-93cf-c17367362952)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.785234 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014500 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.786389 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001124 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.788956 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002539 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.794227 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003215 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.866067 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.071811 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.868791 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002694 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.868820 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.884446 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015611 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.884474 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.884486 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.884498 139230031212352 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.884706 139230031212352 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.885178 139230031212352 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.192367 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.891040 139230031212352 simple_timer.cpp:55] [rocprofv3] output generation ::     0.235053 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:10.891128 139230031212352 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.237587 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382819_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:13.158538 127230663868224 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304694 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:13.168765 127230663868224 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.382938 127230663868224 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:13.513123 127230663868224 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344359 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.553385 127230663868224 generateRocpd.cpp:582] writing SQL database for process 2382830 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:13.554670 127230663868224 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382830_results.db (UUID=00004315-899e-799e-95e5-7f88a8c2c56a)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.642999 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014401 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.644105 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001076 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.646276 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002142 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.651275 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003087 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.714180 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062877 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.716807 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002598 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.716835 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732091 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015241 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732118 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732130 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732141 127230663868224 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732345 127230663868224 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.732786 127230663868224 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.179402 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.738367 127230663868224 simple_timer.cpp:55] [rocprofv3] output generation ::     0.222740 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:13.738447 127230663868224 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.225269 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382830_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:15.988308 124943799570240 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307362 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:15.998011 124943799570240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.209861 124943799570240 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:16.340451 124943799570240 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342440 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.381622 124943799570240 generateRocpd.cpp:582] writing SQL database for process 2382840 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:16.382948 124943799570240 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382840_results.db (UUID=00004315-94a9-74a9-86d3-215bb26c92bf)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.477136 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014276 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.478249 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001083 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.480780 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002503 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.486044 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003204 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.539829 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.053757 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.542497 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002639 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.542526 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558137 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015596 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558163 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558175 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558187 124943799570240 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558382 124943799570240 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000179 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.558794 124943799570240 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.177173 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.564432 124943799570240 simple_timer.cpp:55] [rocprofv3] output generation ::     0.221444 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:16.564514 124943799570240 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.223980 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382840_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:18.824429 137643623608128 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303932 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:18.834600 137643623608128 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.042355 137643623608128 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:19.171999 137643623608128 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.337399 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.209839 137643623608128 generateRocpd.cpp:582] writing SQL database for process 2382851 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:19.211119 137643623608128 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382851_results.db (UUID=00004315-9fc1-7fc1-9d4d-cfef82e2176b)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.302727 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014368 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.303923 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001165 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.306171 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002219 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.311336 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003165 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.362327 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.050958 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.364998 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002641 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.365027 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.380570 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015528 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.380600 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.380612 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.380623 137643623608128 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.380826 137643623608128 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000182 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.381296 137643623608128 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.171457 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.387096 137643623608128 simple_timer.cpp:55] [rocprofv3] output generation ::     0.212630 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:19.387176 137643623608128 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.215127 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382851_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:21.640061 131165672156992 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307402 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:21.648659 131165672156992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:21.859749 131165672156992 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:21.994194 131165672156992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345535 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.034168 131165672156992 generateRocpd.cpp:582] writing SQL database for process 2382861 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:22.035446 131165672156992 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382861_results.db (UUID=00004315-aabd-7abd-90b0-8c2bfbcaabf9)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.134967 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.021846 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.136134 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001121 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.138721 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002558 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.143988 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003191 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.238295 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.094278 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.241026 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002699 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.241055 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257041 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015971 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257069 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257081 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257093 131165672156992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257295 131165672156992 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000181 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.257749 131165672156992 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.223582 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.264807 131165672156992 simple_timer.cpp:55] [rocprofv3] output generation ::     0.268125 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:22.264894 131165672156992 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.270645 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382861_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:24.537012 129721441283904 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303571 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:24.546862 129721441283904 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:24.763300 129721441283904 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:24.894594 129721441283904 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.347733 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:24.933912 129721441283904 generateRocpd.cpp:582] writing SQL database for process 2382872 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:24.935262 129721441283904 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382872_results.db (UUID=00004315-b611-7611-99a0-5b40f56a3aef)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.027443 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014662 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.028576 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001103 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.031195 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002589 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.036220 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003095 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.129196 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.092946 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.131888 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002661 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.131917 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148157 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016225 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148187 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148199 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148211 129721441283904 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148449 129721441283904 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000222 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.148985 129721441283904 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.215074 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.155135 129721441283904 simple_timer.cpp:55] [rocprofv3] output generation ::     0.258041 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:25.155234 129721441283904 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.260587 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382872_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:27.464795 134696013283136 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.313701 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:27.473751 134696013283136 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.685369 134696013283136 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:27.820205 134696013283136 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346454 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.859649 134696013283136 generateRocpd.cpp:582] writing SQL database for process 2382894 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:27.860921 134696013283136 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382894_results.db (UUID=00004315-c177-7177-ab5b-c070ae502a86)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.953274 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015114 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.954491 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001185 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.957169 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002649 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:27.962370 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003156 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.096999 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.134599 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.099677 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002648 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.099706 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.115666 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015945 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.115696 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.115708 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.115720 134696013283136 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.115933 134696013283136 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000197 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.116536 134696013283136 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.256888 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.122704 134696013283136 simple_timer.cpp:55] [rocprofv3] output generation ::     0.299991 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:28.122812 134696013283136 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.302556 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382894_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:30.432110 123696235036480 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.311654 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:30.442045 123696235036480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.653498 123696235036480 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:30.785826 123696235036480 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343781 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.825096 123696235036480 generateRocpd.cpp:582] writing SQL database for process 2382904 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:30.826373 123696235036480 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382904_results.db (UUID=00004315-cd11-7d11-b8af-3e35eb170c9b)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.919689 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014994 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.920868 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001148 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.923425 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002529 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:30.928708 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003249 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.053040 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124304 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.055820 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002749 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.055849 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.071651 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015787 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.071679 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.071692 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.071703 123696235036480 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.071906 123696235036480 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000183 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.072356 123696235036480 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.247261 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.078190 123696235036480 simple_timer.cpp:55] [rocprofv3] output generation ::     0.289847 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:31.078279 123696235036480 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.292403 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382904_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:33.373043 129073957633856 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305709 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:33.383082 129073957633856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.594329 129073957633856 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] [mvcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] [0;33mW20260526 16:50:33.723752 129073957633856 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340670 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.763246 129073957633856 generateRocpd.cpp:582] writing SQL database for process 2382914 on node 2710291163
   |-> [rocprofiler-sdk] [m[0;31mE20260526 16:50:33.764509 129073957633856 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/dl385-20-mi100-3c48/2382914_results.db (UUID=00004315-d893-7893-bc0c-b403c798e7d3)
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.856395 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014516 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.857531 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001106 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.860069 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002510 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.865098 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003138 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.941455 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076328 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.944253 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002764 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.944282 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.959862 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015565 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.959889 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.959902 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.959913 129073957633856 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.960125 129073957633856 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000192 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.960566 129073957633856 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.197320 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.966376 129073957633856 simple_timer.cpp:55] [rocprofv3] output generation ::     0.240148 sec
   |-> [rocprofiler-sdk] [m[0;33mW20260526 16:50:33.966463 129073957633856 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.242660 sec
   |-> [rocprofiler-sdk] [m[rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_kernel/MI100/out/pmc_1/2382914_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
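
The launch geometry reported by vcopy ("Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384") can be cross-checked with simple arithmetic. The sketch below is an illustration, not part of rocprofiler-compute: it assumes that `-n` sets the total thread count and `-b` the block size (which the numbers in the log are consistent with), that "Grid Size" counts blocks rather than threads, and that MI100 runs 64-lane wavefronts. The helper name `launch_geometry` is hypothetical.

```python
# Hypothetical cross-check of the vcopy launch line:
# "Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384"
WAVEFRONT_SIZE = 64  # assumption: MI100 (CDNA) executes 64-lane wavefronts


def launch_geometry(total_threads: int, block_size: int,
                    wave_size: int = WAVEFRONT_SIZE) -> tuple[int, int]:
    """Return (number of blocks, total wavefronts) for a 1-D launch."""
    if total_threads % block_size:
        raise ValueError("total_threads must be a multiple of block_size")
    grid = total_threads // block_size                 # blocks in the grid
    waves_per_block = -(-block_size // wave_size)      # ceiling division
    return grid, grid * waves_per_block


grid, waves = launch_geometry(1048576, 256)
print(f"Grid Size: {grid}, Wavefronts: {waves}")  # Grid Size: 4096, Wavefronts: 16384
```

With 256 threads per block and 64 lanes per wavefront, each block holds 4 wavefronts, so 4096 blocks yield the 16384 wavefronts the log reports.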
