From: "Alex Bennée" <alex.bennee@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
Nicholas Piggin <npiggin@gmail.com>,
Richard Henderson <richard.henderson@linaro.org>
Subject: Re: [RFC PATCH] target/ppc: Inline most of dcbz helper
Date: Tue, 29 Apr 2025 16:27:43 +0100
Message-ID: <87bjsf3s40.fsf@draig.linaro.org>
In-Reply-To: <e4fc537a-a15e-77dd-1167-32b12ee7a22d@eik.bme.hu> (BALATON Zoltan's message of "Mon, 28 Apr 2025 15:26:07 +0200 (CEST)")
BALATON Zoltan <balaton@eik.bme.hu> writes:
> On Mon, 28 Apr 2025, BALATON Zoltan wrote:
>> On Mon, 28 Apr 2025, BALATON Zoltan wrote:
>>> On Thu, 24 Apr 2025, BALATON Zoltan wrote:
>>>>> The test case I've used came out of a discussion about very slow
>>>>> access to VRAM of a graphics card passed through with vfio the reason
>>>>> for which is still not clear but it was already known that dcbz is
>>>>> often used by MacOS and AmigaOS for clearing memory and to avoid
>>>>> reading values about to be overwritten which is faster on real CPU but
>>>>> was found to be slower on QEMU. The optimised copy routines were
>>>>> posted here:
<snip>
>
> I have tried profiling the dst in real card vfio vram with dcbz case
> (with 100 iterations instead of 10000 in above tests) but I'm not sure
> I understand the results. vperm and dcbz show up but not too high. Can
> somebody explain what is happening here and where the overhead likely
> comes from? Here is the profile result I got:
>
> Samples: 104K of event 'cycles:Pu', Event count (approx.): 122371086557
> Children Self Command Shared Object Symbol
> - 99.44% 0.95% qemu-system-ppc qemu-system-ppc [.] cpu_exec_loop
> - 98.49% cpu_exec_loop
> - 98.48% cpu_tb_exec
> - 90.95% 0x7f4e705d8f15
> helper_ldub_mmu
> do_ld_mmio_beN
> - cpu_io_recompile
This looks like the dcbz instructions are being used to clear device
memory and are tripping over the can_do_io check (normally the translator
tries to ensure all device access is at the end of a block).
You could try ending the block on dcbz instructions and see if that
helps. Normally I would expect the helper to be more efficient, as it can
probe the whole address range once and then use host insns to blat the
memory.
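To illustrate the "probe once, then blat" point, here is a minimal toy
sketch in plain C. It is not QEMU code: ToyMem, toy_is_ram(),
toy_zero_bytewise(), toy_zero_probed() and the LINE_SIZE/RAM_SIZE values
are all hypothetical names and numbers made up for this example. It only
shows why one range check followed by a host memset does less work than
checking every byte access separately.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u   /* hypothetical line size for this toy model */
#define RAM_SIZE  256u  /* hypothetical size of the toy address space */

/* Toy guest memory: the first half is RAM, the second half stands in
 * for device (MMIO) space that must not be written directly. */
typedef struct {
    uint8_t  bytes[RAM_SIZE];
    unsigned checks;    /* how many access checks were performed */
} ToyMem;

/* Hypothetical check: true if [addr, addr + len) is plain RAM. */
static bool toy_is_ram(ToyMem *m, size_t addr, size_t len)
{
    m->checks++;
    return addr + len <= RAM_SIZE / 2;
}

/* Per-byte path: one check for every byte that is zeroed. */
static bool toy_zero_bytewise(ToyMem *m, size_t addr)
{
    addr &= ~(size_t)(LINE_SIZE - 1);       /* align down to the line */
    for (size_t i = 0; i < LINE_SIZE; i++) {
        if (!toy_is_ram(m, addr + i, 1)) {
            return false;                   /* caller needs a slow path */
        }
        m->bytes[addr + i] = 0;
    }
    return true;
}

/* Helper-style path: check the whole line once, then use memset. */
static bool toy_zero_probed(ToyMem *m, size_t addr)
{
    addr &= ~(size_t)(LINE_SIZE - 1);       /* align down to the line */
    if (!toy_is_ram(m, addr, LINE_SIZE)) {
        return false;                       /* caller needs a slow path */
    }
    memset(&m->bytes[addr], 0, LINE_SIZE);
    return true;
}
```

In this model the probed variant does a single check per line while the
bytewise one does LINE_SIZE checks, and both return false for the device
half, which is the case the profile above appears to be hitting.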
> - 45.79% cpu_loop_exit_noexc
> - cpu_loop_exit
> __longjmp_chk
> cpu_exec_setjmp
> - cpu_exec_loop
> - 45.78% cpu_tb_exec
> 42.35% 0x7f4e6f3f0000
> - 0.72% 0x7f4e99f37037
> helper_VPERM
> - 0.68% 0x7f4e99f3716d
> helper_VPERM
> - 45.16% rr_cpu_thread_fn
Hmm, you seem to be running in icount mode here for some reason.
> - 45.16% tcg_cpu_exec
> - 45.15% cpu_exec
> - 45.15% cpu_exec_setjmp
> - cpu_exec_loop
> - 45.14% cpu_tb_exec
> 42.08% 0x7f4e6f3f0000
> - 0.72% 0x7f4e99f37037
> helper_VPERM
> - 0.67% 0x7f4e99f3716d
> helper_VPERM
<snip>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro