From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45503) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Su28L-0002Oz-4Q for qemu-devel@nongnu.org; Wed, 25 Jul 2012 10:00:18 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Su28E-0002p7-3T for qemu-devel@nongnu.org; Wed, 25 Jul 2012 10:00:13 -0400
Received: from mail-pb0-f45.google.com ([209.85.160.45]:43811) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Su28D-0002ie-Oy for qemu-devel@nongnu.org; Wed, 25 Jul 2012 10:00:06 -0400
Received: by pbbro12 with SMTP id ro12so1430271pbb.4 for ; Wed, 25 Jul 2012 07:00:02 -0700 (PDT)
Sender: Richard Henderson
Message-ID: <500FFBE0.70700@twiddle.net>
Date: Wed, 25 Jul 2012 07:00:00 -0700
From: Richard Henderson
MIME-Version: 1.0
References: <1343201734-12062-1-git-send-email-yeongkyoon.lee@samsung.com> <1343201734-12062-4-git-send-email-yeongkyoon.lee@samsung.com>
In-Reply-To: <1343201734-12062-4-git-send-email-yeongkyoon.lee@samsung.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC][PATCH v4 3/3] tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Yeongkyoon Lee
Cc: blauwirbel@gmail.com, sw@weilnetz.de, laurent.desnogues@gmail.com, qemu-devel@nongnu.org, peter.maydell@linaro.org

On 07/25/2012 12:35 AM, Yeongkyoon Lee wrote:
> +#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
> +/* Macros/structures for qemu_ld/st IR code optimization:
> +   TCG_MAX_HELPER_LABELS is defined as same as OPC_BUF_SIZE in exec-all.h. */
> +#define TCG_MAX_QEMU_LDST 640

Why statically size this ...

> +    /* labels info for qemu_ld/st IRs
> +       The labels help to generate TLB miss case codes at the end of TB */
> +    TCGLabelQemuLdst *qemu_ldst_labels;

... and then allocate the array dynamically?
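
To make the question concrete: if the array lives on the heap anyway, the fixed bound buys nothing, and the array could simply grow on demand. Below is a minimal sketch of that alternative, not code from the patch. LdstLabel, LdstLabelVec and the ldst_label_* functions are hypothetical names, and the two fields merely stand in for whatever TCGLabelQemuLdst actually carries.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for TCGLabelQemuLdst; the fields are illustrative. */
typedef struct {
    int is_ld;        /* 1 for a qemu_ld, 0 for a qemu_st */
    uintptr_t raddr;  /* host address to resume at after the slow path */
} LdstLabel;

/* Growable array of labels (hypothetical; not part of the patch). */
typedef struct {
    LdstLabel *items;
    size_t len;
    size_t cap;
} LdstLabelVec;

/* Append one zeroed label, doubling the capacity when the array is full.
   Returns NULL if the allocation fails; the existing labels stay valid. */
static LdstLabel *ldst_label_push(LdstLabelVec *v)
{
    LdstLabel *l;

    assert(v != NULL);
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 16;
        LdstLabel *n = realloc(v->items, ncap * sizeof(*n));
        if (n == NULL) {
            return NULL;
        }
        v->items = n;
        v->cap = ncap;
    }
    l = &v->items[v->len++];
    l->is_ld = 0;
    l->raddr = 0;
    return l;
}

/* Forget all labels but keep the storage, e.g. when starting a new TB. */
static void ldst_label_reset(LdstLabelVec *v)
{
    v->len = 0;
}

/* Release the storage. */
static void ldst_label_free(LdstLabelVec *v)
{
    free(v->items);
    v->items = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Note that a pointer returned by ldst_label_push is only good until the next push, since realloc may move the array. The other consistent choice is the opposite one: keep the static bound and embed a fixed-size array, with no allocation at all.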
> +    /* jne slow_path */
> +    /* XXX: How to avoid using OPC_JCC_long for peephole optimization? */
> +    tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);

You can't, not while maintaining the code-generate-until-address-reached
exception invariant.

> +#ifndef CONFIG_QEMU_LDST_OPTIMIZATION
>  uint8_t __ldb_mmu(target_ulong addr, int mmu_idx);
>  void __stb_mmu(target_ulong addr, uint8_t val, int mmu_idx);
>  uint16_t __ldw_mmu(target_ulong addr, int mmu_idx);
> @@ -28,6 +30,30 @@ void __stl_cmmu(target_ulong addr, uint32_t val, int mmu_idx);
>  uint64_t __ldq_cmmu(target_ulong addr, int mmu_idx);
>  void __stq_cmmu(target_ulong addr, uint64_t val, int mmu_idx);
>  #else
> +/* Extended versions of MMU helpers for qemu_ld/st optimization.
> +   The additional argument is a host code address accessing guest memory */
> +uint8_t ext_ldb_mmu(target_ulong addr, int mmu_idx, uintptr_t ra);

Don't tie LDST_OPTIMIZATION directly to the extended function calls.

For a host supporting predication, like ARM, the best code sequence
may look like

	(1) TLB check
	(2) If hit, load value from memory
	(3) If miss, call miss case (5)
	(4) ... next code ...
	(5) Load call parameters
	(6) Tail call (aka jump) to MMU helper

so that (a) we need not explicitly load the address of (3) by hand
for your RA parameter and (b) the mmu helper returns directly to (4).


r~
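
The control flow of steps (1) to (6) can be modelled in plain C. This is only a toy model under loud assumptions, not generated host code and not QEMU's helpers: the direct-mapped TLB, guest_ram, ldb, ldb_slow and slow_calls are all hypothetical, and the return in tail position merely stands in for the jump in (6). A C compiler may or may not emit a real tail call for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS 12
#define PAGE_MASK ((1u << PAGE_BITS) - 1)
#define TLB_SIZE  4
#define RAM_PAGES 4

/* Hypothetical direct-mapped TLB entry: guest page number -> host page. */
typedef struct {
    uint32_t tag;
    const uint8_t *host_page;   /* NULL means the entry is invalid */
} TlbEntry;

static TlbEntry tlb[TLB_SIZE];
static uint8_t guest_ram[RAM_PAGES << PAGE_BITS];
static unsigned slow_calls;     /* counts entries into the miss case */

/* Steps (5) and (6): the miss case. It refills the TLB entry and produces
   the loaded value itself, so its return lands at the caller's (4). */
static uint8_t ldb_slow(uint32_t addr)
{
    uint32_t page = addr >> PAGE_BITS;
    TlbEntry *e = &tlb[page % TLB_SIZE];

    assert(addr < sizeof(guest_ram));   /* the toy model has no faults */
    slow_calls++;
    e->tag = page;
    e->host_page = &guest_ram[(size_t)page << PAGE_BITS];
    return e->host_page[addr & PAGE_MASK];
}

/* Steps (1) to (3): the inline fast path for a byte load. */
static uint8_t ldb(uint32_t addr)
{
    uint32_t page = addr >> PAGE_BITS;
    const TlbEntry *e = &tlb[page % TLB_SIZE];

    if (e->host_page != NULL && e->tag == page) {   /* (1) TLB check */
        return e->host_page[addr & PAGE_MASK];      /* (2) hit: load */
    }
    return ldb_slow(addr);                          /* (3) miss: go to (5) */
}
```

Here the caller of ldb plays the role of (4): whichever path runs, control comes back to the same place with the value, and nothing passes a return address by hand. The first ldb on a page goes through ldb_slow once; later loads from the same page take only the fast path.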