From: "Alex Bennée" <alex.bennee@linaro.org>
To: qemu-devel@nongnu.org
Cc: "Cleber Rosa" <crosa@redhat.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	"Peter Maydell" <peter.maydell@linaro.org>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>
Subject: Info about the slowest avocado tcg tests (arm, s390x, ppc, replay)
Date: Sat, 13 Aug 2022 07:10:32 +0100
Message-ID: <87r11kpt1j.fsf@linaro.org>

[RESEND because the ML ate my email again]

Hi All,

In my investigations into our current Avocado tests I identified all the
tests that ran for more than 30s with an --enable-debug build. They are:

  tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2: PASS( 763.81 s)
  tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3: PASS( 742.75 s)

The main slowness here is due to inefficient invalidation of the TBs
generated during the firmware phase once the rest of the system boots and
starts writing over those pages.
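
For what it's worth, here is a much simplified, standalone sketch of the
shape of that cost. None of the names or structures below are QEMU's real
declarations (the actual path is the store_helper -> notdirty_write ->
page_collection_lock -> tb_invalidate_phys_page_fast chain visible in the
s390x profile further down); the point is just that every store to a page
that ever held translated code drops into a slow path that has to lock
and tear down the blocks registered against that page:

  /* Illustrative only -- NOT QEMU's real declarations or logic. */
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define PAGE_SHIFT 12
  #define NUM_PAGES  1024

  typedef struct TB {
      struct TB *next;      /* per-page list of translated blocks */
      uint64_t pc;
  } TB;

  typedef struct {
      TB *tbs;              /* blocks translated from this page */
      int has_code;         /* page held code at some point */
  } PageEntry;

  static PageEntry pages[NUM_PAGES];
  static uint8_t ram[NUM_PAGES << PAGE_SHIFT];

  /* Firmware phase: register a freshly translated block for its page. */
  static void tb_register(uint64_t pc)
  {
      PageEntry *pe = &pages[pc >> PAGE_SHIFT];
      TB *tb = malloc(sizeof(*tb));
      tb->pc = pc;
      tb->next = pe->tbs;
      pe->tbs = tb;
      pe->has_code = 1;
  }

  /* The expensive part: throw away every block translated from this
   * page. The real code also takes page_collection_lock and unlinks
   * each block from jump lists and the global hash table. */
  static void invalidate_page(PageEntry *pe)
  {
      while (pe->tbs) {
          TB *tb = pe->tbs;
          pe->tbs = tb->next;
          free(tb);
      }
      pe->has_code = 0;
  }

  /* Guest store: cheap unless the page once contained translated code. */
  static void guest_store(uint64_t addr, uint8_t val)
  {
      PageEntry *pe = &pages[addr >> PAGE_SHIFT];
      if (pe->has_code) {
          invalidate_page(pe);  /* the slow "notdirty" path */
      }
      ram[addr] = val;
  }

  int main(void)
  {
      /* Firmware phase: lots of code translated from low memory. */
      for (uint64_t pc = 0; pc < (64ULL << PAGE_SHIFT); pc += 4) {
          tb_register(pc);
      }
      /* Later boot: the same pages are reused as data and rewritten;
       * the first store to each page pays for tearing down its blocks. */
      for (uint64_t addr = 0; addr < (64ULL << PAGE_SHIFT); addr++) {
          guest_store(addr, 0);
      }
      printf("stores done\n");
      return 0;
  }

Once the firmware pages get reused as data, the first store to each of
them pays for unlinking everything that was translated from it, which is
roughly where I think the aarch64 boots are losing their time.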

  tests/avocado/boot_linux.py:BootLinuxS390X.test_s390_ccw_virtio_tcg:  PASS( 433.48 s)

This looks like a very fat distro, with a lot of time spent waiting for
systemd units to come up. That said, the positively svelte Alpine ISO
still takes about 4min30s to churn through its userspace boot.

Looking at the call chains though, it might be helped by the same fixes
as the ARM TCG GIC tests:

-   28.15%     0.16%  qemu-system-s39  qemu-system-s390x             [.] store_helper
   - 27.99% store_helper
      - 23.08% notdirty_write
         - 17.51% page_collection_lock
              2.76% page_trylock_add
              1.22% g_tree_lookup
              0.62% g_tree_find_node
         - 4.53% tb_invalidate_phys_page_fast
            - 4.37% tb_invalidate_phys_page_range__locked
               - 3.44% tb_phys_invalidate__locked
                  - 3.44% do_tb_phys_invalidate
                       3.40% tb_page_remove
           0.63% page_trylock_add
      - 4.25% tlb_fill
         - 4.21% s390_cpu_tlb_fill
            - 2.35% mmu_translate
               - 1.77% mmu_translate_asce
                  - 1.58% read_table_entry
                     - 1.39% address_space_read_full
                        - 1.11% flatview_read
                           - 0.62% flatview_translate
                                0.55% flatview_do_translate
            - 1.70% tlb_set_page
               - 1.64% tlb_set_page_with_attrs
                    0.52% cpu_physical_memory_is_clean
-   25.10%     0.02%  qemu-system-s39  qemu-system-s390x             [.] notdirty_write
   - 25.08% notdirty_write
      - 18.14% page_collection_lock
           2.84% page_trylock_add
           1.25% g_tree_lookup
           0.64% g_tree_find_node
      - 5.86% tb_invalidate_phys_page_fast
         - 5.69% tb_invalidate_phys_page_range__locked
            - 4.62% tb_phys_invalidate__locked
               - 4.62% do_tb_phys_invalidate
                    4.57% tb_page_remove
        0.65% page_trylock_add

There might be some mileage in looking at the tb_jmp_cache utilisation
though, because it looks like pretty much every tb_lookup falls through
to the full QHT lookup (see the sketch after the profile):

-   19.82%     1.65%  qemu-system-s39  qemu-system-s390x             [.] tb_lookup
   - 18.17% tb_lookup
      - 17.84% tb_htable_lookup
         - 9.94% get_page_addr_code
            - 9.66% get_page_addr_code_hostp
               - 5.55% tlb_fill
                  - 5.48% s390_cpu_tlb_fill
                     - 3.55% mmu_translate
                        - 2.73% mmu_translate_asce
                           - 2.44% read_table_entry
                              - 2.17% address_space_read_full
                                 - 1.72% flatview_read
                                    - 0.93% flatview_translate
                                       - 0.83% flatview_do_translate
                                            0.58% address_space_translate_internal
                        - 0.54% mmu_absolute_addr_valid
                             0.51% address_space_access_valid
                     - 1.73% tlb_set_page
                        - 1.64% tlb_set_page_with_attrs
                             0.54% tlb_flush_vtlb_page_locked
               - 1.56% qemu_ram_addr_from_host_nofail
                  - 1.40% qemu_ram_addr_from_host
                       1.17% qemu_ram_block_from_host
                 0.73% tlb_entry
         - 4.30% qht_lookup_custom
            - 1.90% qht_do_lookup
                 1.21% tb_lookup_cmp
              1.59% seqlock_read_begin
         - 2.82% tb_hash_func
            - 2.66% qemu_xxhash6
               - 2.35% qemu_xxhash7
                    rol32
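
To make the "falls through to the full QHT lookup" point concrete, here
is a standalone sketch of the two-level lookup. The cache size, hash and
names are made up for the example rather than taken from QEMU; the first
level stands in for the small per-vCPU tb_jmp_cache, and only misses
should have to pay for the hash-table walk:

  /* Illustrative only -- not QEMU code. */
  #include <stdint.h>
  #include <stdio.h>

  #define JMP_CACHE_BITS 12
  #define JMP_CACHE_SIZE (1u << JMP_CACHE_BITS)

  typedef struct TB {
      uint64_t pc;
      /* host code pointer, flags, ... elided */
  } TB;

  static TB *jmp_cache[JMP_CACHE_SIZE];   /* per-vCPU fast cache */
  static unsigned long fast_hits, slow_lookups;

  static unsigned jmp_cache_hash(uint64_t pc)
  {
      return (pc >> 2) & (JMP_CACHE_SIZE - 1);   /* toy hash */
  }

  /* Stand-in for the full hash-table lookup (and possible
   * retranslation) that the profile is full of. */
  static TB *tb_htable_lookup_slow(uint64_t pc)
  {
      static TB pool[1 << 16];
      TB *tb = &pool[pc & 0xffff];
      slow_lookups++;
      tb->pc = pc;
      return tb;
  }

  static TB *tb_lookup(uint64_t pc)
  {
      unsigned h = jmp_cache_hash(pc);
      TB *tb = jmp_cache[h];
      if (tb && tb->pc == pc) {
          fast_hits++;
          return tb;                  /* cheap hit, no table walk */
      }
      tb = tb_htable_lookup_slow(pc); /* the expensive fallback */
      jmp_cache[h] = tb;
      return tb;
  }

  int main(void)
  {
      /* A workload that keeps revisiting a small set of PCs hits the
       * cache; one that sprays PCs (or keeps having its cache flushed)
       * does not. */
      for (int i = 0; i < 1000000; i++) {
          tb_lookup((uint64_t)(i % 64) * 4);
      }
      printf("fast hits: %lu, slow lookups: %lu\n",
             fast_hits, slow_lookups);
      return 0;
  }

If the fast-hit counter stays near zero on a real workload, as the
profile above suggests it does, then either the cache is being flushed
too often or the indexing doesn't suit the workload, and that is where
it might be worth digging.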

  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg: PASS( 341.90 s)
  tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg: PASS( 240.76 s)
  tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS( 221.70 s)

Nothing especially jumps out of perf (in fact it looks very similar to
s390x):

   6.40%  qemu-system-ppc  qemu-system-ppc64        [.] tb_lookup
   5.63%  qemu-system-ppc  qemu-system-ppc64        [.] check_regs
   4.87%  qemu-system-ppc  qemu-system-ppc64        [.] hreg_compute_hflags_value
   4.21%  qemu-system-ppc  qemu-system-ppc64        [.] pmu_increment_insns
   2.42%  qemu-system-ppc  qemu-system-ppc64        [.] helper_insns_inc
   2.26%  qemu-system-ppc  qemu-system-ppc64        [.] cpu_get_tb_cpu_state
   2.20%  qemu-system-ppc  qemu-system-ppc64        [.] type_get_parent
   2.07%  qemu-system-ppc  qemu-system-ppc64        [.] helper_lookup_tb_ptr
   1.96%  qemu-system-ppc  [kernel.kallsyms]        [k] list_lru_count_one
   1.63%  qemu-system-ppc  qemu-system-ppc64        [.] type_is_ancestor
   1.62%  qemu-system-ppc  qemu-system-ppc64        [.] helper_vsldoi
   1.49%  qemu-system-ppc  qemu-system-ppc64        [.] la_reset_pref
   1.14%  qemu-system-ppc  qemu-system-ppc64        [.] seqlock_read_begin
   1.02%  qemu-system-ppc  qemu-system-ppc64        [.] liveness_pass_1
   0.97%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock
   0.81%  qemu-system-ppc  qemu-system-ppc64        [.] curr_cflags
   0.76%  qemu-system-ppc  libglib-2.0.so.0.5600.4  [.] g_hash_table_lookup
   0.75%  qemu-system-ppc  qemu-system-ppc64        [.] check_for_breakpoints
   0.75%  qemu-system-ppc  qemu-system-ppc64        [.] tb_jmp_cache_hash_func                       

which all looks fairly typical for --debug code with heavy computed
jumps (triggering a lot of tb_lookups).
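
As a toy illustration of why computed jumps hurt (again, not QEMU code):
a direct branch can be chained once, so block A's generated code jumps
straight to block B's host code, whereas a computed jump's target is
only known at run time and every execution has to go back through a
lookup (the helper_lookup_tb_ptr -> tb_lookup pair in the profile above):

  /* Illustrative only -- not QEMU code. */
  #include <stdint.h>
  #include <stdio.h>

  typedef struct Block Block;
  struct Block {
      uint64_t pc;
      Block *chained_next;   /* patched once a direct target is known */
  };

  static unsigned long chained_exits, lookup_exits;

  /* Stand-in for the full tb_lookup path. */
  static Block *lookup_block(Block *blocks, int nblocks, uint64_t pc)
  {
      lookup_exits++;
      for (int i = 0; i < nblocks; i++) {
          if (blocks[i].pc == pc) {
              return &blocks[i];
          }
      }
      return NULL;
  }

  int main(void)
  {
      enum { N = 4 };
      Block blocks[N] = { { 0x0 }, { 0x10 }, { 0x20 }, { 0x30 } };

      /* Blocks 0..2 end in direct branches: chain them up front. */
      for (int i = 0; i < N - 1; i++) {
          blocks[i].chained_next = &blocks[i + 1];
      }
      /* Block 3 ends in a computed jump (e.g. an indirect call), so
       * its successor is never known at translation time. */

      Block *cur = &blocks[0];
      for (int step = 0; step < 1000; step++) {
          if (cur->chained_next) {
              chained_exits++;            /* stay in generated code */
              cur = cur->chained_next;
          } else {
              /* computed jump: look the target up every time */
              cur = lookup_block(blocks, N, 0x0);
          }
      }
      printf("chained exits: %lu, lookups: %lu\n",
             chained_exits, lookup_exits);
      return 0;
  }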

  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: WARN( 91.80 s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: WARN( 91.47 s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: WARN( 91.36 s)
  tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: WARN( 88.56 s)

These all end with WARN so they could be broken anyway and just timing
out. I didn't dig in too deeply.

  tests/avocado/replay_kernel.py:ReplayKernelNormal.test_arm_cubieboard_initrd: PASS( 63.24 s)
  tests/avocado/replay_kernel.py:ReplayKernelNormal.test_sparc_ss20: PASS( 59.36 s)

This could just be replay being inherently slow, given the workload runs
several times under icount.

  tests/avocado/boot_xen.py:BootXen.test_arm64_xen_411_and_dom0: PASS( 57.15 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd: PASS( 55.00 s)
  tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2500_romulus_openbmc_v2_9_0: PASS( 52.73 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_x86_64_pc: PASS( 51.08 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_aarch64_xlnx_versal_virt: PASS( 47.98 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_ast2600_debian: PASS( 47.48 s)
  tests/avocado/linux_initrd.py:LinuxInitrd.test_with_2gib_file_should_work_with_linux_v4_16: PASS( 45.29 s)
  tests/avocado/machine_aspeed.py:AST2x00Machine.test_arm_ast2400_palmetto_openbmc_v2_9_0: PASS( 44.54 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd: PASS( 40.96 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_exynos4210_initrd: PASS( 37.70 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_initrd: PASS( 34.29 s)
  tests/avocado/boot_linux_console.py:BootLinuxConsole.test_arm_cubieboard_sata: PASS( 33.19 s)

-- 
Alex Bennée

