From: "Alex Bennée" <alex.bennee@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Richard Henderson <richard.henderson@linaro.org>,
	qemu-devel@nongnu.org,  qemu-ppc@nongnu.org,
	 Nicholas Piggin <npiggin@gmail.com>
Subject: Re: [RFC PATCH] target/ppc: Inline most of dcbz helper
Date: Tue, 29 Apr 2025 17:04:38 +0100
Message-ID: <875xin3qeh.fsf@draig.linaro.org>
In-Reply-To: <164d86d5-f17a-1f89-d973-c3e56255195d@eik.bme.hu> (BALATON Zoltan's message of "Tue, 29 Apr 2025 16:40:28 +0200 (CEST)")

BALATON Zoltan <balaton@eik.bme.hu> writes:

> On Mon, 28 Apr 2025, Richard Henderson wrote:
>> On 4/28/25 06:26, BALATON Zoltan wrote:
>>> I have tried profiling the case with dst in real card VRAM via vfio,
>>> using dcbz (with 100 iterations instead of the 10000 in the tests
>>> above), but I'm not sure I understand the results. vperm and dcbz show
>>> up, but not very high. Can somebody explain what is happening here and
>>> where the overhead likely comes from? Here is the profile I got:
>>> Samples: 104K of event 'cycles:Pu', Event count (approx.): 122371086557
>>>    Children      Self  Command          Shared Object            Symbol
>>> -   99.44%     0.95%  qemu-system-ppc  qemu-system-ppc          [.] cpu_exec_loop
>>>     - 98.49% cpu_exec_loop
>>>        - 98.48% cpu_tb_exec
>>>           - 90.95% 0x7f4e705d8f15
>>>                helper_ldub_mmu
>>>                do_ld_mmio_beN
>>>              - cpu_io_recompile
>>>                 - 45.79% cpu_loop_exit_noexc
>>
>> I think the real problem is the number of loop exits due to i/o.  If
>> I'm reading this rightly, 45% of execution is in cpu_io_recompile.
>>
>> I/O can only happen as the last insn of a translation block.
>
> I'm not sure I understand this. A comment above cpu_io_recompile says
> "In deterministic execution mode, instructions doing device I/Os must
> be at the end of the TB." Is that wrong? Otherwise shouldn't this only
> apply if running with icount or something like that?

That comment should be fixed. It used to apply only to icount mode, but
another race bug meant we now need to honour device accesses as the
last instruction of a TB in both modes.
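
As a rough illustration of the rule, here is a toy model (not QEMU code; `toy_insn` and `first_misplaced_io` are hypothetical names) that reports the first device access in a TB that is not its final instruction. That is the position where execution of the block would have to be abandoned, whether or not icount is in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction kinds for a toy translation block (TB). */
enum toy_insn { TOY_ALU, TOY_RAM_ACCESS, TOY_DEVICE_IO };

/*
 * Return the index of the first device access that is not the last
 * instruction of the TB, or -1 if the TB obeys the rule that device
 * I/O may only happen on its final instruction.
 */
int first_misplaced_io(const enum toy_insn *tb, size_t len)
{
    assert(tb != NULL || len == 0);
    for (size_t i = 0; i + 1 < len; i++) {
        if (tb[i] == TOY_DEVICE_IO) {
            return (int)i;
        }
    }
    return -1;
}
```

Note that the translator cannot know in advance which loads or stores will hit MMIO, so in practice a misplaced access is only discovered at run time; the array here simply stands in for that run-time knowledge.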

>
>> When we detect that it has happened in the middle of a translation
>> block, we abort the block, compile a new one, and restart execution.
>
> Where does that happen? The calls of cpu_io_recompile in this case
> seem to come from io_prepare which is called from do_ld16_mmio_beN if
> (!cpu->neg.can_do_io) but I don't see how can_do_io is set.

It is set inline, in the generated code, by set_can_do_io().
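
A minimal sketch of how the flag and the check interact, assuming a toy CPU with a single `can_do_io` field. `toy_io_cpu`, `toy_set_can_do_io`, `toy_device_access` and `toy_run_tb` are hypothetical stand-ins for the inline store, the check on the MMIO path and a TB's execution. In this model the flag is cleared when the TB starts and set just before its last instruction, so a device access anywhere else asks for a recompile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one CPUState field we care about. */
struct toy_io_cpu {
    bool can_do_io;
};

/* Outcome of a device access in this model. */
enum toy_io_result { TOY_IO_DONE, TOY_IO_RECOMPILE };

/* Models the inline store emitted into the TB: no helper call involved. */
static void toy_set_can_do_io(struct toy_io_cpu *cpu, bool val)
{
    cpu->can_do_io = val;
}

/* Models the MMIO-path check: flag clear means "not the last insn". */
static enum toy_io_result toy_device_access(const struct toy_io_cpu *cpu)
{
    return cpu->can_do_io ? TOY_IO_DONE : TOY_IO_RECOMPILE;
}

/*
 * Run a toy TB of n_insns instructions in which the instruction at
 * io_index touches a device.
 */
enum toy_io_result toy_run_tb(struct toy_io_cpu *cpu, int n_insns, int io_index)
{
    assert(n_insns > 0 && io_index >= 0 && io_index < n_insns);
    toy_set_can_do_io(cpu, false);          /* cleared at TB start */
    for (int i = 0; i < n_insns; i++) {
        if (i == n_insns - 1) {
            toy_set_can_do_io(cpu, true);   /* only the last insn may do I/O */
        }
        if (i == io_index) {
            return toy_device_access(cpu);
        }
    }
    return TOY_IO_DONE; /* not reached: io_index < n_insns */
}
```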

>> Where this becomes a bottleneck is when this same translation block
>> is in a loop.  Exactly this case of memset/memcpy of VRAM.  This
>> could be addressed by invalidating the previous translation block
>> and creating a new one which always ends with the i/o.
>
> And where to do that? cpu_io_recompile just exits the TB, but what
> generates the new TB? I need some more clues to understand how to do
> this.

  cpu->cflags_next_tb = curr_cflags(cpu) | CF_MEMI_ONLY | CF_NOIRQ | n;

sets the cflags for the next TB. The lookup with those flags will
typically fail to find an existing TB, so a new one is generated.
Normally cflags_next_tb is empty.
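
A minimal sketch of that one-shot override, assuming a toy CPU struct. `toy_tb_cpu`, the `TOY_CF_*` bit values and the all-ones "empty" sentinel are hypothetical and do not match QEMU's definitions; only the shape of the assignment follows the line above, with the low bits carrying the instruction count n. Because TBs are looked up by their cflags as well as the PC, a request carrying CF_MEMI_ONLY | CF_NOIRQ plus a count will usually miss and force a fresh translation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; QEMU's real CF_* values differ. */
#define TOY_CF_COUNT_MASK 0x000001ffu
#define TOY_CF_MEMI_ONLY  0x00010000u
#define TOY_CF_NOIRQ      0x00020000u
#define TOY_CF_UNSET      UINT32_MAX   /* "empty": no override pending */

struct toy_tb_cpu {
    uint32_t cflags_next_tb;
    uint32_t curr_cflags;   /* stands in for curr_cflags(cpu) */
};

/* Models the assignment in cpu_io_recompile(): request a TB of n insns. */
void toy_io_recompile(struct toy_tb_cpu *cpu, uint32_t n)
{
    assert(n > 0 && n <= TOY_CF_COUNT_MASK);
    cpu->cflags_next_tb =
        cpu->curr_cflags | TOY_CF_MEMI_ONLY | TOY_CF_NOIRQ | n;
}

/*
 * Models choosing the flags for the next TB lookup: a pending override
 * is used exactly once, otherwise the CPU's current flags apply.
 */
uint32_t toy_next_tb_cflags(struct toy_tb_cpu *cpu)
{
    uint32_t cflags = cpu->cflags_next_tb;
    if (cflags == TOY_CF_UNSET) {
        return cpu->curr_cflags;
    }
    cpu->cflags_next_tb = TOY_CF_UNSET;   /* consume the override */
    return cflags;
}
```

Once the override has been consumed, later lookups fall back to the normal flags and find the original TB again, which is why a loop hitting VRAM in the middle of a block can take this path on every iteration.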

>
> Regards,
> BALATON Zoltan

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Thread overview: 20+ messages
2024-07-01  0:59 [RFC PATCH] target/ppc: Inline most of dcbz helper BALATON Zoltan
2025-04-24 12:45 ` BALATON Zoltan
2025-04-28  0:12   ` BALATON Zoltan
2025-04-28 10:44     ` BALATON Zoltan
2025-04-28 13:26       ` BALATON Zoltan
2025-04-28 13:47         ` Richard Henderson
2025-04-29 14:40           ` BALATON Zoltan
2025-04-29 16:04             ` Alex Bennée [this message]
2025-04-29 17:14               ` BALATON Zoltan
2025-04-29 17:58                 ` Alex Bennée
2025-04-29 21:09                   ` BALATON Zoltan
2025-04-30  0:35                     ` Nicholas Piggin
2025-04-30 11:20                       ` BALATON Zoltan
2025-04-30 13:47                         ` Alex Bennée
2025-04-30 15:14                           ` BALATON Zoltan
2025-04-29 15:27         ` Alex Bennée
2025-04-29 17:11           ` BALATON Zoltan
2025-04-29 17:30           ` Richard Henderson
2025-04-29 18:00             ` Alex Bennée
2025-04-29 20:51               ` BALATON Zoltan
