From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:60608)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1Sv95h-00067o-Sl
	for qemu-devel@nongnu.org; Sat, 28 Jul 2012 11:38:06 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1Sv95g-0003aV-Qd
	for qemu-devel@nongnu.org; Sat, 28 Jul 2012 11:38:05 -0400
Received: from mailout1.samsung.com ([203.254.224.24]:15418)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1Sv95g-0003ZS-GB
	for qemu-devel@nongnu.org; Sat, 28 Jul 2012 11:38:04 -0400
Received: from epcpsbgm2.samsung.com (mailout1.samsung.com [203.254.224.24])
	by mailout1.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit
	(built Nov
	17 2011)) with ESMTP id <0M7V008P1NFBOS40@mailout1.samsung.com> for
	qemu-devel@nongnu.org; Sun, 29 Jul 2012 00:37:59 +0900 (KST)
Received: from [172.21.111.108] ([182.198.1.3])
	by mmp2.samsung.com (Oracle Communications Messaging Server 7u4-24.01
	(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTPA id <0M7V00LTNNFASEC0@mmp2.samsung.com> for
	qemu-devel@nongnu.org; Sun, 29 Jul 2012 00:37:59 +0900 (KST)
Date: Sun, 29 Jul 2012 00:39:01 +0900
From: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
In-reply-to: <500FFBE0.70700@twiddle.net>
Message-id: <50140795.5030209@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8; format=flowed
Content-transfer-encoding: QUOTED-PRINTABLE
References: <1343201734-12062-1-git-send-email-yeongkyoon.lee@samsung.com>
	<1343201734-12062-4-git-send-email-yeongkyoon.lee@samsung.com>
	<500FFBE0.70700@twiddle.net>
Subject: Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu_ld/st by
 generating slow paths at the end of a block
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>
Cc: blauwirbel@gmail.com, sw@weilnetz.de, laurent.desnogues@gmail.com, qemu-devel@nongnu.org, peter.maydell@linaro.org

On 2012-07-25 23:00, Richard Henderson wrote:
> On 07/25/2012 12:35 AM, Yeongkyoon Lee wrote:
>> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
>> +/* Macros/structures for qemu_ld/st IR code optimization:
>> +   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
>> +#define TCG_MAX_QEMU_LDST       640
> Why statically size this ...

This simply follows the existing TCG code style, namely the way the
"labels" array of "TCGContext" is allocated in tcg.c.


>
>> +    /* labels info for qemu_ld/st IRs
>> +       The labels help to generate TLB miss case codes at the end of TB */
>> +    TCGLabelQemuLdst *qemu_ldst_labels;
> ... and then allocate the array dynamically?

ditto.

>
>> +    /* jne slow_path */
>> +    /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
>> +    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
> You can't, not and maintain the code-generate-until-address-reached
> exception invariant.
>
>> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
>>   uint8_t __ldb_mmu(target_ulong addr, int mmu_idx);
>>   void __stb_mmu(target_ulong addr, uint8_t val, int mmu_idx);
>>   uint16_t __ldw_mmu(target_ulong addr, int mmu_idx);
>> @@ -28,6 +30,30 @@ void __stl_cmmu(target_ulong addr, uint32_t val, int mmu_idx);
>>   uint64_t __ldq_cmmu(target_ulong addr, int mmu_idx);
>>   void __stq_cmmu(target_ulong addr, uint64_t val, int mmu_idx);
>>   #else
>> +/* Extended versions of MMU helpers for qemu_ld/st optimization.
>> +   The additional argument is a host code address accessing guest memory */
>> +uint8_t ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);
> Don't tie LDST_OPTIMIZATION directly to the extended function calls.
>
> For a host supporting predication, like ARM, the best code sequence
> may look like
>
> 	(1) TLB check
> 	(2) If hit, load value from memory
> 	(3) If miss, call miss case (5)
> 	(4) ... next code
> 	...
> 	(5) Load call parameters
> 	(6) Tail call (aka jump) to MMU helper
>
> so that (a) we need not explicitly load the address of (3) by hand
> for your RA parameter and (b) the mmu helper returns directly to (4).
>
>
> r~

I think the difference between the current HEAD and the code sequence
you describe is code locality.
My LDST_OPTIMIZATION patches improve code locality and also remove one
jump.
They show about a 4% improvement in CoreMark performance on an x86 host
which supports predication like ARM.
The improvement for the AREG0 cases would probably be even larger.
I am not yet sure where the performance improvement comes from, and I
will check it with some tests later.
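
To make the locality argument concrete, here is a toy model of the idea
(a sketch only, not the patch itself). The fast path of each access is
emitted inline, a label is recorded, and all slow paths are emitted after
the last fast-path instruction. Block, Insn, LdstLabel, gen_qemu_ld() and
gen_slow_paths() are hypothetical names for this illustration, and the
explicit jump back to the fast path is an assumption of the toy model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR; these are not QEMU's real opcodes or structures. */
enum { OP_TLB_CHECK, OP_JNE, OP_LOAD, OP_NEXT, OP_CALL_HELPER, OP_JMP };

typedef struct {
    int op;
    int target;            /* branch target index, -1 if unused */
} Insn;

#define MAX_INSNS 64
#define MAX_LDST  8        /* fixed cap, by analogy with TCG_MAX_QEMU_LDST */

typedef struct {
    int jne_index;         /* index of the OP_JNE whose target is patched */
    int resume_index;      /* index where the fast path continues */
} LdstLabel;

typedef struct {
    Insn code[MAX_INSNS];
    int ncode;
    LdstLabel labels[MAX_LDST];
    int nlabels;
} Block;

/* Append one instruction and return its index. */
static int emit(Block *b, int op, int target)
{
    assert(b->ncode < MAX_INSNS);
    b->code[b->ncode].op = op;
    b->code[b->ncode].target = target;
    return b->ncode++;
}

/* Emit only the fast path of a guest load and record a label so that the
   slow path can be generated later. */
static void gen_qemu_ld(Block *b)
{
    int jne;

    assert(b->nlabels < MAX_LDST);
    emit(b, OP_TLB_CHECK, -1);
    jne = emit(b, OP_JNE, -1);          /* target is patched later */
    emit(b, OP_LOAD, -1);
    b->labels[b->nlabels].jne_index = jne;
    b->labels[b->nlabels].resume_index = b->ncode;
    b->nlabels++;
}

/* Emit every recorded slow path at the end of the block and patch the
   pending branches. Returns the index where the slow-path region starts. */
static int gen_slow_paths(Block *b)
{
    int start = b->ncode;
    int i;

    for (i = 0; i < b->nlabels; i++) {
        const LdstLabel *l = &b->labels[i];
        b->code[l->jne_index].target = b->ncode;
        emit(b, OP_CALL_HELPER, -1);
        emit(b, OP_JMP, l->resume_index);
    }
    b->nlabels = 0;
    return start;
}
```

With two loads in a block, the hit case runs straight through indexes
0..7 without any taken branch, and the miss code sits together at the
end of the block.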

In my humble opinion, there is nothing to lose with LDST_OPTIMIZATION
except for implicitly adding one argument to the MMU helpers, which does
not look so critical.
What is your opinion?
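
For reference, a minimal sketch of what the extra argument amounts to.
Once the slow path is reached by a jump from the end of the block, the
helper is told explicitly which host code address performed the guest
access. Everything prefixed with toy_ below is a hypothetical stand-in
and not the real helper implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from QEMU. */
typedef uint32_t toy_target_ulong;

#define TOY_RAM_SIZE 16
static uint8_t toy_ram[TOY_RAM_SIZE];
static uintptr_t toy_last_miss_ra;  /* host address blamed for the last miss */

/* Toy "extended" byte-load helper: besides the guest address and the MMU
   index it takes ra, the host code address of the access, as an explicit
   third argument. */
static uint8_t toy_ext_ldb_mmu(toy_target_ulong addr, int mmu_idx,
                               uintptr_t ra)
{
    (void)mmu_idx;                  /* unused in this toy model */
    if (addr >= TOY_RAM_SIZE) {
        toy_last_miss_ra = ra;      /* remember which access failed */
        return 0;
    }
    return toy_ram[addr];
}
```

The only per-call cost in this model is loading one more argument on
the slow path; the hit case is unaffected.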

Thanks.