alias: ins_cache, block id: 13
alias: sl1d, block id: 14
Rocprofiler-Compute version: 3.7.0
Profiler choice: rocprofiler-sdk
Output directory: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100
Target: MI100
Command: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3
Kernel Selection: None
Dispatch Selection: None
Filtered sections: ['ins_cache', 'sl1d']

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generating native tool project using command: cmake -S /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib -B /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
-- Checking for module 'libdw'
--   Package 'libdw', required by 'virtual:world', not found
-- Could NOT find libdw (missing: libdw_LIBRARY libdw_INCLUDE_DIR)
-- {fmt} version: 12.1.0
-- Build type:
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build
Building native tool using command: cmake --build /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build --parallel
[  0%] Built target gsl_assert
[ 33%] Built target fmt
[100%] Built target rocprofiler-compute-tool
Searching /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src by lib/_build/lib/librocprofiler-compute-tool.so for native collector
Using native collector: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
Using native counter collection tool: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/src/lib/_build/lib/librocprofiler-compute-tool.so
[profiling] Iteration multiplexing: Disabled
[Run 1/3][Approximate profiling time left: pending first measurement...]
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/perfmon/pmc_perf_0.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:51:35.873492 130123318050624 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.299364 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:51:35.883017 130123318050624 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.096835 130123318050624 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:51:36.224887 130123318050624 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341871 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.263633 130123318050624 generateRocpd.cpp:582] writing SQL database for process 2383275 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:51:36.264916 130123318050624 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/dl385-20-mi100-3c48/2383275_results.db (UUID=00004316-ccbe-7cbe-aa8b-df6c3bdd3c5c)
   |-> [rocprofiler-sdk] W20260526 16:51:36.357014 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014021 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.358213 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001168 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.360447 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002206 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.365652 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003192 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.379281 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.013600 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.381757 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002448 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.381786 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397308 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015507 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397336 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397347 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397359 130123318050624 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397558 130123318050624 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000182 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.397909 130123318050624 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.134277 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.403895 130123318050624 simple_timer.cpp:55] [rocprofv3] output generation ::     0.176468 sec
   |-> [rocprofiler-sdk] W20260526 16:51:36.403969 130123318050624 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.179030 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/2383275_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 2/3][Approximate profiling time left: 3 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/perfmon/pmc_perf_1.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:51:38.620408 125633598594880 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.297110 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:51:38.630448 125633598594880 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:38.840572 125633598594880 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:51:38.967795 125633598594880 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.337348 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.007171 125633598594880 generateRocpd.cpp:582] writing SQL database for process 2383287 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:51:39.008483 125633598594880 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/dl385-20-mi100-3c48/2383287_results.db (UUID=00004316-d77c-777c-a7d9-8cb8be6c4b13)
   |-> [rocprofiler-sdk] W20260526 16:51:39.100090 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.013959 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.101270 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001149 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.103473 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002175 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.108565 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003135 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.120537 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.011943 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.123129 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002563 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.123158 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.138746 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015573 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.138774 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.138785 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.138797 125633598594880 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.139012 125633598594880 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000194 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.139369 125633598594880 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.132198 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.145196 125633598594880 simple_timer.cpp:55] [rocprofv3] output generation ::     0.174843 sec
   |-> [rocprofiler-sdk] W20260526 16:51:39.145270 125633598594880 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.177422 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/2383287_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
[Run 3/3][Approximate profiling time left: 0 seconds]...
[profiling] Current input file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/perfmon/pmc_perf_SQ_IFETCH_LEVEL_ACCUM.yaml
   |-> [rocprofiler-sdk] [rocprofiler-compute] [rocprofiler_configure] (priority=1) is using rocprofiler-sdk v1.1.0 (1.1.0)
   |-> [rocprofiler-sdk] W20260526 16:51:41.358600 131047172226880 simple_timer.cpp:55] [rocprofv3] tool initialization ::     0.299175 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool init
   |-> [rocprofiler-sdk] W20260526 16:51:41.368312 131047172226880 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.580251 131047172226880 tool.cpp:2422] HSA version 8.20.1 initialized (instance=0)
   |-> [rocprofiler-sdk] vcopy testing on GCD 0
   |-> [rocprofiler-sdk] Finished allocating vectors on the CPU
   |-> [rocprofiler-sdk] Finished allocating vectors on the GPU
   |-> [rocprofiler-sdk] Finished copying vectors to the GPU
   |-> [rocprofiler-sdk] sw thinks it moved 1.000000 KB per wave
   |-> [rocprofiler-sdk] Total threads: 1048576, Grid Size: 4096 block Size:256, Wavefronts:16384:
   |-> [rocprofiler-sdk] Launching the  kernel on the GPU
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished executing kernel
   |-> [rocprofiler-sdk] Finished copying the output vector from the GPU to the CPU
   |-> [rocprofiler-sdk] Releasing GPU memory
   |-> [rocprofiler-sdk] Releasing CPU memory
   |-> [rocprofiler-sdk] W20260526 16:51:41.709955 131047172226880 simple_timer.cpp:55] [rocprofv3] '/home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/vcopy -n 1048576 -b 256 -i 3' ::     0.341643 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.749089 131047172226880 generateRocpd.cpp:582] writing SQL database for process 2383297 on node 2710291163
   |-> [rocprofiler-sdk] E20260526 16:51:41.750357 131047172226880 generateRocpd.cpp:605] Opened result file: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/dl385-20-mi100-3c48/2383297_results.db (UUID=00004316-e22b-722b-9766-ce76436796c6)
   |-> [rocprofiler-sdk] W20260526 16:51:41.842237 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_string             ::     0.014076 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.843481 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_node          ::     0.001213 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.845715 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_process       ::     0.002205 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.851003 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_agent         ::     0.003204 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.864515 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_info_pmc           ::     0.013483 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.866992 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd kernel info        ::     0.002447 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.867022 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_region             ::     0.000002 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.882801 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_kernel_dispatch    ::     0.015765 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.882831 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_pmc_event          ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.882844 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_copy        ::     0.000000 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.882855 131047172226880 simple_timer.cpp:55] SQLite3 generation :: rocpd_memory_allocate    ::     0.000001 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.883084 131047172226880 simple_timer.cpp:55] SQLite3 generation :: SQL indexing             ::     0.000210 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.883493 131047172226880 simple_timer.cpp:55] SQLite3 generation :: total                    ::     0.134404 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.889430 131047172226880 simple_timer.cpp:55] [rocprofv3] output generation ::     0.176902 sec
   |-> [rocprofiler-sdk] W20260526 16:51:41.889523 131047172226880 simple_timer.cpp:55] [rocprofv3] tool finalization ::     0.179436 sec
   |-> [rocprofiler-sdk] [rocprofiler-compute] In tool fini
   |-> [rocprofiler-sdk] [rocprofiler-compute] [write_counters] Counter collection data has been written to: /home/xuchen/dev/rocm-systems/projects/rocprofiler-compute/tests/workloads/ipblocks_SQC/MI100/out/pmc_1/2383297_native_counter_collection.csv
Intermediate results_*.csv generation from rocpd databases is deprecated and will be replaced with automatic .db file retention in a future release.
PC sampling data collection skipped as block 21 is not specified.
[roofline] Skipping roofline
