From: Richard Henderson <richard.henderson@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>,
qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Cc: Nicholas Piggin <npiggin@gmail.com>,
Daniel Henrique Barboza <danielhb413@gmail.com>
Subject: Re: [PATCH] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()
Date: Sun, 23 Jun 2024 10:52:49 -0700 [thread overview]
Message-ID: <6664471b-7223-4c6e-a106-ce272be72f28@linaro.org> (raw)
In-Reply-To: <20240622204833.5F7C74E6000@zero.eik.bme.hu>
On 6/22/24 13:48, BALATON Zoltan wrote:
> Instead of passing a bool and select a value within dcbz_common() let
> the callers pass in the right value to avoid this conditional
> statement. On PPC dcbz is often used to zero memory and some code uses
> it a lot. This change improves the run time of a test case that copies
> memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.
>
> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
> ---
> This is just a small optimisation removing some of the overhead but
> dcbz still seems to be the biggest issue with this test. Removing the
> dcbz call it runs in 2 seconds. In a profile I see:
> Children Self Command Shared Object Symbol
> - 55.01% 11.44% qemu-ppc qemu-ppc [.] dcbz_common.constprop.0
> - 43.57% dcbz_common.constprop.0
> - probe_access
> - page_get_flags
> interval_tree_iter_first
> - 11.44% helper_raise_exception_err
> cpu_loop_exit_restore
> cpu_loop
> cpu_exec
> cpu_exec_setjmp.isra.0
> cpu_exec_loop.constprop.0
> cpu_tb_exec
> 0x7f262403636e
> helper_raise_exception_err
> cpu_loop_exit_restore
> cpu_loop
> cpu_exec
> cpu_exec_setjmp.isra.0
> cpu_exec_loop.constprop.0
> cpu_tb_exec
> - 0x7f26240386a4
> 11.20% helper_dcbz
> + 43.81% 12.28% qemu-ppc qemu-ppc [.] probe_access
> + 39.31% 0.00% qemu-ppc [JIT] tid 9969 [.] 0x00007f2624000000
> + 32.45% 4.51% qemu-ppc qemu-ppc [.] page_get_flags
> + 25.50% 2.10% qemu-ppc qemu-ppc [.] interval_tree_iter_first
> + 24.67% 24.67% qemu-ppc qemu-ppc [.] interval_tree_subtree_search
> + 16.75% 1.19% qemu-ppc qemu-ppc [.] helper_dcbz
> + 4.78% 4.78% qemu-ppc [JIT] tid 9969 [.] 0x00007f26240386be
> + 3.46% 3.46% qemu-ppc libc-2.32.so [.] __memset_avx2_unaligned_erms
> Any idea how this could be optimised further? (This is running with
> qemu-ppc user mode emulation but I think with system it might be even
> worse.) Could an inline implementation with TCG vector ops work to
> avoid the helper and let it compile to efficient host code? Even if
> that could work I don't know how to do that so I'd need some further
> advice on this.
>
> target/ppc/mem_helper.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index f88155ad45..361fd72226 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
> }
>
> static void dcbz_common(CPUPPCState *env, target_ulong addr,
> - uint32_t opcode, bool epid, uintptr_t retaddr)
> + uint32_t opcode, int mmu_idx, uintptr_t retaddr)
> {
> target_ulong mask, dcbz_size = env->dcache_line_size;
> uint32_t i;
> void *haddr;
> - int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
>
> #if defined(TARGET_PPC64)
> /* Check for dcbz vs dcbzl on 970 */
> @@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>
> void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> {
> - dcbz_common(env, addr, opcode, false, GETPC());
> + dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
This is already computed in the translator: DisasContext.mem_idx.
If you pass the mmu_idx as an argument, you can unify these two helpers.
r~
> }
>
> void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> {
> - dcbz_common(env, addr, opcode, true, GETPC());
> + dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
> }
>
> void helper_icbi(CPUPPCState *env, target_ulong addr)
next prev parent reply other threads:[~2024-06-23 17:53 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-22 20:48 [PATCH] target/ppc/mem_helper.c: Remove a conditional from dcbz_common() BALATON Zoltan
2024-06-23 17:52 ` Richard Henderson [this message]
2024-06-23 22:24 ` BALATON Zoltan
2024-06-24 3:23 ` Richard Henderson
2024-06-24 9:19 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6664471b-7223-4c6e-a106-ce272be72f28@linaro.org \
--to=richard.henderson@linaro.org \
--cc=balaton@eik.bme.hu \
--cc=danielhb413@gmail.com \
--cc=npiggin@gmail.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).