From: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
To: Richard Henderson <rth@twiddle.net>
Cc: blauwirbel@gmail.com, sw@weilnetz.de,
laurent.desnogues@gmail.com, qemu-devel@nongnu.org,
peter.maydell@linaro.org
Subject: Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
Date: Mon, 27 Aug 2012 16:23:57 +0900
Message-ID: <503B208D.2020407@samsung.com>
In-Reply-To: <50140795.5030209@samsung.com>
On 2012-07-29 00:39, Yeongkyoon Lee wrote:
> On 2012-07-25 23:00, Richard Henderson wrote:
>> On 07/25/2012 12:35 AM, Yeongkyoon Lee wrote:
>>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
>>> +/* Macros/structures for qemu_ld/st IR code optimization:
>>> + TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in
>>> exec-all.h. */
>>> +#define TCG_MAX_QEMU_LDST 640
>> Why statically size this ...
>
> This just followed the other TCG's code style, the allocation of the
> "labels" of "TCGContext" in tcg.c.
>
>
>>
>>> + /* labels info for qemu_ld/st IRs
>>> + The labels help to generate TLB miss case codes at the end
>>> of TB */
>>> + TCGLabelQemuLdst *qemu_ldst_labels;
>> ... and then allocate the array dynamically?
>
> ditto.
>
>>
>>> + /* jne slow_path */
>>> + /* XXX: How to avoid using OPC_JCC_long for peephole
>>> optimization? */
>>> + tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
>> You can't, not and maintain the code-generate-until-address-reached
>> exception invariant.
>>
>>> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
>>> uint8_t __ldb_mmu(target_ulong addr, int mmu_idx);
>>> void __stb_mmu(target_ulong addr, uint8_t val, int mmu_idx);
>>> uint16_t __ldw_mmu(target_ulong addr, int mmu_idx);
>>> @@ -28,6 +30,30 @@ void __stl_cmmu(target_ulong addr, uint32_t val,
>>> int mmu_idx);
>>> uint64_t __ldq_cmmu(target_ulong addr, int mmu_idx);
>>> void __stq_cmmu(target_ulong addr, uint64_t val, int mmu_idx);
>>> #else
>>> +/* Extended versions of MMU helpers for qemu_ld/st optimization.
>>> + The additional argument is a host code address accessing guest
>>> memory */
>>> +uint8_t ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
>> Don't tie LDST_OPTIMIZATION directly to the extended function calls.
>>
>> For a host supporting predication, like ARM, the best code sequence
>> may look like
>>
>> (1) TLB check
>> (2) If hit, load value from memory
>> (3) If miss, call miss case (5)
>> (4) ... next code
>> ...
>> (5) Load call parameters
>> (6) Tail call (aka jump) to MMU helper
>>
>> so that (a) we need not explicitly load the address of (3) by hand
>> for your RA parameter and (b) the mmu helper returns directly to (4).
>>
>>
>> r~
>
> The difference between current HEAD and the code sequence you said is,
> I think, code locality.
> My LDST_OPTIMIZATION patches enhances the code locality and also
> removes one jump.
> It shows about 4% rising of CoreMark performance on x86 host which
> supports predication like ARM.
> Probably, the performance enhancement for AREG0 cases might get more
> larger.
> I'm not sure where the performance enhancement came from now, and I'll
> check it by some tests later.
>
> In my humble opinion, there are no things to lose in LDST_OPTIMIZATION
> except
> for just adding one argument to MMU helper implicitly which doesn't
> look so critical.
> How about your opinion?
>
> Thanks.
>
It's been a long time.
I have measured the performance effect of the one-jump difference on the
fast path of qemu_ld/st (the TLB hit case).
Removing that one jump gives a 3.6% CoreMark improvement, with the slow
paths generated at the end of the block in both cases.
That means removing the jump accounts for most of the performance gain
from LDST_OPTIMIZATION.
As a result, the extended MMU helper functions are needed to get that
gain, and those extended functions are used only implicitly.
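To make the layout concrete, here is a minimal sketch in C. It is not the
patch code: it uses a toy buffer of abstract ops instead of real x86
encodings, and every name in it (Block, SlowPathLabel, emit_qemu_ldst,
emit_slow_paths) is hypothetical. The fast path only records a label for
each access, and the slow paths are emitted at the end of the block.
As I understand it, this is also why the extra argument is needed: the
helper call no longer sits next to the access, so the address following
the fast-path access (ra) has to be passed explicitly.

```c
#include <stddef.h>

enum { MAX_OPS = 256, MAX_LABELS = 16 };   /* toy limits, assumptions */

typedef enum { OP_TLB_CHECK, OP_JNE, OP_ACCESS, OP_CALL_HELPER, OP_JMP } OpKind;

typedef struct {
    OpKind kind;
    size_t arg;        /* branch target, or ra for OP_CALL_HELPER */
} Op;

/* One deferred slow path, recorded while the fast path is emitted. */
typedef struct {
    size_t jne_index;  /* index of the OP_JNE whose target must be patched */
    size_t ra;         /* index of the op right after the fast-path access */
} SlowPathLabel;

typedef struct {
    Op ops[MAX_OPS];
    size_t nb_ops;
    SlowPathLabel labels[MAX_LABELS];
    size_t nb_labels;
} Block;

/* Append one op and return its index. Callers check the capacity first. */
static size_t emit(Block *b, OpKind kind, size_t arg)
{
    b->ops[b->nb_ops].kind = kind;
    b->ops[b->nb_ops].arg = arg;
    return b->nb_ops++;
}

/* Emit only the fast path: TLB check, jne to a not-yet-known slow path,
   then the access itself. Returns 0 on success, -1 if a table is full. */
static int emit_qemu_ldst(Block *b)
{
    size_t jne;

    if (b->nb_labels == MAX_LABELS || b->nb_ops + 3 > MAX_OPS) {
        return -1;
    }
    emit(b, OP_TLB_CHECK, 0);
    jne = emit(b, OP_JNE, 0);          /* target patched at end of block */
    emit(b, OP_ACCESS, 0);

    b->labels[b->nb_labels].jne_index = jne;
    b->labels[b->nb_labels].ra = b->nb_ops;
    b->nb_labels++;
    return 0;
}

/* At the end of the block, emit one stub per recorded label: patch the
   jne, call the helper with the explicit ra, and jump back to ra. */
static int emit_slow_paths(Block *b)
{
    for (size_t i = 0; i < b->nb_labels; i++) {
        SlowPathLabel *l = &b->labels[i];

        if (b->nb_ops + 2 > MAX_OPS) {
            return -1;
        }
        b->ops[l->jne_index].arg = b->nb_ops;
        emit(b, OP_CALL_HELPER, l->ra);
        emit(b, OP_JMP, l->ra);
    }
    b->nb_labels = 0;
    return 0;
}
```

With two calls to emit_qemu_ldst() followed by emit_slow_paths(), the six
fast-path ops stay contiguous and each OP_JNE points at its own stub
placed after them.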
BTW, who will finally confirm my patches?
I have sent four versions of my patches, in which I have applied all the
reasonable feedback from this community.
Currently, v4 is the final candidate, though it might need to be merged
with the latest HEAD because it was sent a month ago.
Thanks.