Re: Slowness with multi-thread TCG?

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Alex Bennée" <alex.bennee@linaro.org>
To: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: "Matheus K. Ferst" <matheus.ferst@eldorado.org.br>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org
Subject: Re: Slowness with multi-thread TCG?
Date: Tue, 28 Jun 2022 16:12:40 +0100	[thread overview]
Message-ID: <87h744keg7.fsf@linaro.org> (raw)
In-Reply-To: <cf38218a-6e13-8024-8e08-fcdd74faa5a8@linux.ibm.com>


Frederic Barrat <fbarrat@linux.ibm.com> writes:

> On 28/06/2022 13:25, Matheus K. Ferst wrote:
>> On 27/06/2022 15:25, Frederic Barrat wrote:
>>> [ Resending as it was meant for the qemu-ppc list ]
>>>
>>> Hello,
>>>
>>> I've been looking at why our qemu powernv model is so slow when booting
>>> a compressed linux kernel, using multiple vcpus and multi-thread tcg.
>>> With only one vcpu, the decompression time of the kernel is what it is,
>>> but when using multiple vcpus, the decompression is actually slower. And
>>> worse: it degrades very fast with the number of vcpus!
>>>
>>> Rough measurement of the decompression time on a x86 laptop with
>>> multi-thread tcg and using the qemu powernv10 machine:
>>> 1 vcpu => 15 seconds
>>> 2 vcpus => 45 seconds
>>> 4 vcpus => 1 min 30 seconds
>>>
>>> Looking in details, when the firmware (skiboot) hands over execution to
>>> the linux kernel, there's one main thread entering some bootstrap code
>>> and running the kernel decompression algorithm. All the other secondary
>>> threads are left spinning in skiboot (1 thread per vpcu). So on paper,
>>> with multi-thread tcg and assuming the system has enough available
>>> physical cpus, I would expect the decompression to hog one physical cpu
>>> and the time needed to be constant, no matter the number of vpcus.
<snip>
>>>
>>> Ironically, the behavior seen with single thread tcg is what I would
>>> expect: 1 thread decompressing in 15 seconds, all the other threads
>>> spinning for that same amount of time, all sharing the same physical
>>> cpu, so it all adds up nicely: I see 60 seconds decompression time with
>>> 4 vcpus (4x15). Which means multi-thread tcg is slower by quite a bit.
>>> And single thread tcg hogs one physical cpu of the laptop vs. 4 physical
>>> cpus for the slower multi-thread tcg.
>>>
>>> Does anybody have an idea of what might happen or have suggestion to
>>> keep investigating?
>>> Thanks for your help!
>>>
>>>    Fred
>>>
>>>
>> Hi Frederic,
>> I did some boot time tests recently and didn't notice this behavior.
>> Could you share your QEMU command line with us? Did you build QEMU
>> with any debug option or sanitizer enabled?
>
>
> You should be able to see it with:
>
> qemu-system-ppc64 -machine powernv10 -smp 4 -m 4G -nographic -bios
> <path to skiboot.lid> -kernel <path to compresses kernel>   -initrd
> <path to initd>  -serial mon:stdio
>
>
> -smp is what matters.
>
> When simplifying the command line above, I noticed something
> interesting: the problem doesn't show using the skiboot.lid shipped
> with qemu! I'm using something closer to the current upstream head and
> the idle code (the for loop in my initial mail) had been reworked in
> between. So, clearly, the way the guest code is written matters. But
> that doesn't explain it.
>
> I'm using a kernel in debug mode, so it's pretty big and that's why I
> was using a compressed image. The compressed image is about 8 MB.

If the debug mode on PPC enables live patching of kernel functions for
instrumentation that can certainly slow things down. You would see that
in tcg_optimize appearing in the perf log and "info jit" showing
constantly growing translation buffers.

>
> The initrd shouldn't matter, the issue is seen during kernel
> decompression, before the init ram is used.
>
> I can share my binaries if you'd like. Especially a recent version of
> skiboot showing the problem.
>
>   Fred


-- 
Alex Bennée

next prev parent reply	other threads:[~2022-06-28 15:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-27 18:25 Slowness with multi-thread TCG? Frederic Barrat
2022-06-27 21:10 ` Alex Bennée
2022-06-28 11:25 ` Matheus K. Ferst
2022-06-28 13:08   ` Frederic Barrat
2022-06-28 15:12     ` Alex Bennée [this message]
2022-06-28 16:16       ` Frederic Barrat
2022-06-28 22:17         ` Alex Bennée
2022-06-29 15:36           ` Frederic Barrat
2022-06-29 16:01             ` Alex Bennée
2022-06-29 16:25             ` Matheus K. Ferst
2022-06-29 17:13               ` Alex Bennée
2022-06-29 20:55                 ` Cédric Le Goater
  -- strict thread matches above, loose matches on Subject: below --
2022-06-27 16:25 Frederic Barrat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h744keg7.fsf@linaro.org \
    --to=alex.bennee@linaro.org \
    --cc=fbarrat@linux.ibm.com \
    --cc=matheus.ferst@eldorado.org.br \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.