Following counters might not be supported by rocprof: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_MFMA_MOPS_F32
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: All

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/12][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:22.976934 140663928938304 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.305681 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:22.986915 140663928938304 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.198815 140663928938304 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_ADD_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_FMA_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:23.333887 140663928938304 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346973 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.375501 140663928938304 generateRocpd.cpp:582] writing SQL database for process 2382594 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:23.376773 140663928938304 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382594_results.db (UUID=00004314-c597-7597-8973-d939d2daf820)
   |-> [rocprofiler-sdk] W20260526 16:49:23.467233 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014522 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.468370 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001106 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.470961 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002563 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.476109 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003178 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.545640 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.069503 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.548307 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002637 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.548335 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.564325 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015975 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.564353 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.564364 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.564377 140663928938304 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.564590 140663928938304 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000191 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.565016 140663928938304 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.189516 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.570608 140663928938304 simple_timer.cpp:55] [rocprofv3] output generation ::     0.233650 sec
   |-> [rocprofiler-sdk] W20260526 16:49:23.570694 140663928938304 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.236242 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382594_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/12][Approximate profiling time left: 33 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:25.884939 124008738303808 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.307526 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:25.894386 124008738303808 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.104774 124008738303808 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_FMA_F64, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MUL_F16
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:26.237481 124008738303808 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.343095 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.276806 124008738303808 generateRocpd.cpp:582] writing SQL database for process 2382618 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:26.278116 124008738303808 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382618_results.db (UUID=00004314-d0f2-70f2-9cf8-91a9b984ee55)
   |-> [rocprofiler-sdk] W20260526 16:49:26.372518 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014421 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.373725 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001176 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.376349 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002596 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.381496 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003155 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.444446 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.062922 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.447159 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002683 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.447188 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.463191 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015988 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.463223 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.463236 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.463248 124008738303808 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.463480 124008738303808 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000214 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.464122 124008738303808 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.187317 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.470145 124008738303808 simple_timer.cpp:55] [rocprofv3] output generation ::     0.230192 sec
   |-> [rocprofiler-sdk] W20260526 16:49:26.470240 124008738303808 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.232709 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382618_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/12][Approximate profiling time left: 27 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_2.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:28.767574 124783447338816 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306234 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:28.777532 124783447338816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:28.988968 124783447338816 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] [rocprofiler-compute] [create_counter_collection_profile] WARNING: Requested counters not available: SQ_INSTS_VALU_MUL_F32, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_TRANS_F64
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:29.123120 124783447338816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345589 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.162075 124783447338816 generateRocpd.cpp:582] writing SQL database for process 2382628 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:29.163351 124783447338816 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382628_results.db (UUID=00004314-dc35-7c35-aff2-27b8d97a1a4c)
   |-> [rocprofiler-sdk] W20260526 16:49:29.255026 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014705 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.256223 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001165 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.258803 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002552 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.263913 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003099 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.331911 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.067970 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.334557 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002614 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.334586 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.349894 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015293 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.349921 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.349933 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.349945 124783447338816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.350165 124783447338816 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000202 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.350658 124783447338816 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.188583 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.356500 124783447338816 simple_timer.cpp:55] [rocprofv3] output generation ::     0.230921 sec
   |-> [rocprofiler-sdk] W20260526 16:49:29.356580 124783447338816 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.233410 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382628_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 4/12][Approximate profiling time left: 24 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_3.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:31.632596 140657601953600 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302687 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:31.642789 140657601953600 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:31.853372 140657601953600 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:31.984112 140657601953600 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341323 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.024162 140657601953600 generateRocpd.cpp:582] writing SQL database for process 2382638 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:32.025431 140657601953600 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382638_results.db (UUID=00004314-e76a-776a-b75d-b807f85e289b)
   |-> [rocprofiler-sdk] W20260526 16:49:32.117815 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014433 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.119105 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001260 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.121742 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002608 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.126899 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003142 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.198771 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.071844 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.201467 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002666 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.201497 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.217468 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015957 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.217495 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.217508 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.217519 140657601953600 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.217718 140657601953600 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000178 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.218178 140657601953600 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.194017 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.224017 140657601953600 simple_timer.cpp:55] [rocprofv3] output generation ::     0.237290 sec
   |-> [rocprofiler-sdk] W20260526 16:49:32.224103 140657601953600 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.239939 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382638_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 5/12][Approximate profiling time left: 20 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_4.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:34.505745 131660960649024 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302337 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:34.515658 131660960649024 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:34.730593 131660960649024 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:34.860411 131660960649024 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.344753 sec
   |-> [rocprofiler-sdk] W20260526 16:49:34.899941 131660960649024 generateRocpd.cpp:582] writing SQL database for process 2382649 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:34.901234 131660960649024 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382649_results.db (UUID=00004314-f2a3-72a3-ad2d-7f5f293b098e)
   |-> [rocprofiler-sdk] W20260526 16:49:34.993199 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014287 sec
   |-> [rocprofiler-sdk] W20260526 16:49:34.994336 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001106 sec
   |-> [rocprofiler-sdk] W20260526 16:49:34.996569 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002204 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.001797 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003240 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.064883 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.063056 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.067704 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002788 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.067734 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.083347 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015596 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.083379 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.083391 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.083403 131660960649024 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.083611 131660960649024 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000187 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.084094 131660960649024 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.184153 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.089868 131660960649024 simple_timer.cpp:55] [rocprofv3] output generation ::     0.227007 sec
   |-> [rocprofiler-sdk] W20260526 16:49:35.089950 131660960649024 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.229489 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382649_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 6/12][Approximate profiling time left: 17 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_5.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:37.359493 134401097031488 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304941 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:37.368861 134401097031488 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.579230 134401097031488 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:37.711362 134401097031488 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.342501 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.751145 134401097031488 generateRocpd.cpp:582] writing SQL database for process 2382659 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:37.752468 134401097031488 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382659_results.db (UUID=00004314-fdc7-7dc7-91d4-4fccb2041719)
   |-> [rocprofiler-sdk] W20260526 16:49:37.840937 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014112 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.842127 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001159 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.844607 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002450 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.849742 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003144 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.903955 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.054186 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.906614 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002612 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.906642 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922198 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015540 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922226 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922238 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922250 134401097031488 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922461 134401097031488 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000190 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.922881 134401097031488 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.171737 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.928829 134401097031488 simple_timer.cpp:55] [rocprofv3] output generation ::     0.215011 sec
   |-> [rocprofiler-sdk] W20260526 16:49:37.928906 134401097031488 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.217494 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382659_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 7/12][Approximate profiling time left: 14 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_6.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:40.197737 130410019561280 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.306136 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:40.207418 130410019561280 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.418701 130410019561280 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:40.548158 130410019561280 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340741 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.586750 130410019561280 generateRocpd.cpp:582] writing SQL database for process 2382669 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:40.588052 130410019561280 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382669_results.db (UUID=00004315-08dc-78dc-ac2a-d5693deb2f73)
   |-> [rocprofiler-sdk] W20260526 16:49:40.679749 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014239 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.680906 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001126 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.683189 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002256 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.688419 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003202 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.739506 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.051058 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.742189 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002654 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.742218 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.757592 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015359 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.757621 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.757633 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.757645 130410019561280 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.757856 130410019561280 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000192 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.758290 130410019561280 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.171541 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.764058 130410019561280 simple_timer.cpp:55] [rocprofv3] output generation ::     0.213440 sec
   |-> [rocprofiler-sdk] W20260526 16:49:40.764137 130410019561280 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.215930 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382669_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 8/12][Approximate profiling time left: 11 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_SQC_DCACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:43.038896 131941368282944 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.302532 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:43.048967 131941368282944 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.260068 131941368282944 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:43.389907 131941368282944 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340941 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.429252 131941368282944 generateRocpd.cpp:582] writing SQL database for process 2382680 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:43.430549 131941368282944 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382680_results.db (UUID=00004315-13f8-73f8-8b98-3913878a18f2)
   |-> [rocprofiler-sdk] W20260526 16:49:43.522367 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015204 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.523565 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001167 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.526182 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002589 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.531290 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003209 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.634516 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.103198 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.637369 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002823 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.637398 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000003 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.653606 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.016193 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.653634 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.653646 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.653658 131941368282944 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.653885 131941368282944 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000209 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.654330 131941368282944 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.225078 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.660144 131941368282944 simple_timer.cpp:55] [rocprofv3] output generation ::     0.267553 sec
   |-> [rocprofiler-sdk] W20260526 16:49:43.660231 131941368282944 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.270269 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382680_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 9/12][Approximate profiling time left: 8 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_SQC_ICACHE_INFLIGHT_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:45.934399 137181377834816 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.303358 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:45.944426 137181377834816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.154711 137181377834816 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:46.291018 137181377834816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.346592 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.330650 137181377834816 generateRocpd.cpp:582] writing SQL database for process 2382691 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:46.331939 137181377834816 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382691_results.db (UUID=00004315-1f47-7f47-a1ef-16332ec0a262)
   |-> [rocprofiler-sdk] W20260526 16:49:46.423743 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014498 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.424942 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001167 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.427564 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002594 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.432855 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003234 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.527899 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.095009 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.530653 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002719 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.530682 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.546465 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015768 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.546493 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.546505 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.546517 137181377834816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.546730 137181377834816 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000193 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.547249 137181377834816 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.216599 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.553169 137181377834816 simple_timer.cpp:55] [rocprofv3] output generation ::     0.259642 sec
   |-> [rocprofiler-sdk] W20260526 16:49:46.553265 137181377834816 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.262191 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382691_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 10/12][Approximate profiling time left: 5 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:48.888417 130317302927168 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.308689 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:48.898715 130317302927168 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.114903 130317302927168 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:49.247791 130317302927168 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.349077 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.287207 130317302927168 generateRocpd.cpp:582] writing SQL database for process 2382701 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:49.288517 130317302927168 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382701_results.db (UUID=00004315-2acc-7acc-b5b9-730e6491cd9f)
   |-> [rocprofiler-sdk] W20260526 16:49:49.379073 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015026 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.380254 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001151 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.382817 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002535 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.388000 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003175 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.523260 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.135231 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.525948 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002659 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.525987 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.541960 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015958 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.541996 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.542009 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.542020 130317302927168 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.542223 130317302927168 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000180 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.542721 130317302927168 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.255514 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.548666 130317302927168 simple_timer.cpp:55] [rocprofv3] output generation ::     0.298409 sec
   |-> [rocprofiler-sdk] W20260526 16:49:49.548755 130317302927168 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.300912 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382701_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 11/12][Approximate profiling time left: 2 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_SQ_INST_LEVEL_LDS_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:51.883680 135898107100992 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.312896 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:51.894299 135898107100992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.105223 135898107100992 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:52.240087 135898107100992 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.345788 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.278333 135898107100992 generateRocpd.cpp:582] writing SQL database for process 2382711 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:52.279607 135898107100992 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382711_results.db (UUID=00004315-367b-767b-afa2-64b2305b5a1e)
   |-> [rocprofiler-sdk] W20260526 16:49:52.373099 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.015101 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.374279 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001150 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.376920 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002613 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.382089 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003092 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.506456 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.124337 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.509201 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002713 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.509230 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.524924 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015679 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.524952 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.524964 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.524986 135898107100992 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.525175 135898107100992 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000178 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.525610 135898107100992 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.247278 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.531435 135898107100992 simple_timer.cpp:55] [rocprofv3] output generation ::     0.288863 sec
   |-> [rocprofiler-sdk] W20260526 16:49:52.531526 135898107100992 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.291387 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382711_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 12/12][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/perfmon/pmc_perf_SQ_LEVEL_WAVES_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:49:54.812997 132720381226816 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.304230 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:49:54.822835 132720381226816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.033785 132720381226816 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:49:55.163660 132720381226816 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.340825 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.203001 132720381226816 generateRocpd.cpp:582] writing SQL database for process 2382722 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:49:55.204298 132720381226816 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/dl385-20-mi100-3c48/2382722_results.db (UUID=00004315-41f5-71f5-8289-97d75e80127c)
   |-> [rocprofiler-sdk] W20260526 16:49:55.296598 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014469 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.297770 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001141 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.300321 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002522 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.305556 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003150 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.381763 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.076178 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.384535 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002736 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.384568 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.400332 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015748 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.400370 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.400382 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.400394 132720381226816 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.400613 132720381226816 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000198 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.401213 132720381226816 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.198213 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.407050 132720381226816 simple_timer.cpp:55] [rocprofv3] output generation ::     0.240922 sec
   |-> [rocprofiler-sdk] W20260526 16:49:55.407150 132720381226816 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.243436 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/join_type_grid/MI100/out/pmc_1/2382722_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
