From: Richard Henderson <richard.henderson@linaro.org>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
Nicholas Piggin <npiggin@gmail.com>,
Daniel Henrique Barboza <danielhb413@gmail.com>
Subject: Re: [PATCH] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()
Date: Sun, 23 Jun 2024 20:23:06 -0700 [thread overview]
Message-ID: <ccb2fe51-4256-42a0-b9c8-1e4886c0472a@linaro.org> (raw)
In-Reply-To: <a0e7e8e3-97b1-34a3-b688-78bf77db5fd9@eik.bme.hu>
On 6/23/24 15:24, BALATON Zoltan wrote:
> On Sun, 23 Jun 2024, Richard Henderson wrote:
>> On 6/22/24 13:48, BALATON Zoltan wrote:
>>> Instead of passing a bool and select a value within dcbz_common() let
>>> the callers pass in the right value to avoid this conditional
>>> statement. On PPC dcbz is often used to zero memory and some code uses
>>> it a lot. This change improves the run time of a test case that copies
>>> memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.
>>>
>>> Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
>>> ---
>>> This is just a small optimisation removing some of the overhead but
>>> dcbz still seems to be the biggest issue with this test. Removing the
>>> dcbz call it runs in 2 seconds. In a profile I see:
>>> Children Self Command Shared Object Symbol
>>> - 55.01% 11.44% qemu-ppc qemu-ppc [.] dcbz_common.constprop.0
>>> - 43.57% dcbz_common.constprop.0
>>> - probe_access
>>> - page_get_flags
>>> interval_tree_iter_first
>>> - 11.44% helper_raise_exception_err
>>> cpu_loop_exit_restore
>>> cpu_loop
>>> cpu_exec
>>> cpu_exec_setjmp.isra.0
>>> cpu_exec_loop.constprop.0
>>> cpu_tb_exec
>>> 0x7f262403636e
>>> helper_raise_exception_err
>>> cpu_loop_exit_restore
>>> cpu_loop
>>> cpu_exec
>>> cpu_exec_setjmp.isra.0
>>> cpu_exec_loop.constprop.0
>>> cpu_tb_exec
>>> - 0x7f26240386a4
>>> 11.20% helper_dcbz
>>> + 43.81% 12.28% qemu-ppc qemu-ppc [.] probe_access
>>> + 39.31% 0.00% qemu-ppc [JIT] tid 9969 [.] 0x00007f2624000000
>>> + 32.45% 4.51% qemu-ppc qemu-ppc [.] page_get_flags
>>> + 25.50% 2.10% qemu-ppc qemu-ppc [.] interval_tree_iter_first
>>> + 24.67% 24.67% qemu-ppc qemu-ppc [.] interval_tree_subtree_search
>>> + 16.75% 1.19% qemu-ppc qemu-ppc [.] helper_dcbz
>>> + 4.78% 4.78% qemu-ppc [JIT] tid 9969 [.] 0x00007f26240386be
>>> + 3.46% 3.46% qemu-ppc libc-2.32.so [.] __memset_avx2_unaligned_erms
>>> Any idea how this could be optimised further? (This is running with
>>> qemu-ppc user mode emulation but I think with system it might be even
>>> worse.) Could an inline implementation with TCG vector ops work to
>>> avoid the helper and let it compile to efficient host code? Even if
>>> that could work I don't know how to do that so I'd need some further
>>> advice on this.
>>>
>>> target/ppc/mem_helper.c | 7 +++----
>>> 1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
>>> index f88155ad45..361fd72226 100644
>>> --- a/target/ppc/mem_helper.c
>>> +++ b/target/ppc/mem_helper.c
>>> @@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
>>> }
>>> static void dcbz_common(CPUPPCState *env, target_ulong addr,
>>> - uint32_t opcode, bool epid, uintptr_t retaddr)
>>> + uint32_t opcode, int mmu_idx, uintptr_t retaddr)
>>> {
>>> target_ulong mask, dcbz_size = env->dcache_line_size;
>>> uint32_t i;
>>> void *haddr;
>>> - int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
>>> #if defined(TARGET_PPC64)
>>> /* Check for dcbz vs dcbzl on 970 */
>>> @@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>>> void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
>>> {
>>> - dcbz_common(env, addr, opcode, false, GETPC());
>>> + dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
>>
>> This is already computed in the translator: DisasContext.mem_idx.
>> If you pass the mmu_idx as an argument, you can unify these two helpers.
>
> I've tried that. It works but slower: I get 5.9 seconds vs 5.83 with this patch so I think
> I'd stay with this one. Maybe it's because making the helper take 4 parameters instead of
> 3? I can submit the patch for reference if it would be useful but I'd keep this one for
> merging for now.
Interesting. Can you share your test case?
The other thing I see is that we can split out the 970 test to a separate helper. We can
test for POWERPC_EXCP_970 and !(opcode & 0x00200000) during translation. I think the
current placement, where dcbzep tests that as well, is wrong, since dcbze is an E500 insn.
Anyway, that would eliminate opcode as an argument bringing you back to 3.
r~
next prev parent reply other threads:[~2024-06-24 3:24 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-22 20:48 [PATCH] target/ppc/mem_helper.c: Remove a conditional from dcbz_common() BALATON Zoltan
2024-06-23 17:52 ` Richard Henderson
2024-06-23 22:24 ` BALATON Zoltan
2024-06-24 3:23 ` Richard Henderson [this message]
2024-06-24 9:19 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ccb2fe51-4256-42a0-b9c8-1e4886c0472a@linaro.org \
--to=richard.henderson@linaro.org \
--cc=balaton@eik.bme.hu \
--cc=danielhb413@gmail.com \
--cc=npiggin@gmail.com \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).