Message-ID: <51A346EC.2080005@huawei.com>
Date: Mon, 27 May 2013 13:43:40 +0200
From: Claudio Fontana
To: Richard Henderson
Cc: Peter Maydell, Jani Kokkonen, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 2/4] tcg/aarch64: implement new TCG target for aarch64
References: <5141F36E.10004@huawei.com> <519DCEC8.8060000@huawei.com> <519DD0BF.4090702@huawei.com> <519E43EE.6090702@twiddle.net> <519F2A8F.6090903@huawei.com> <519F9D11.9020603@twiddle.net>
In-Reply-To: <519F9D11.9020603@twiddle.net>

On 24.05.2013 19:02, Richard Henderson wrote:
> On 05/24/2013 01:53 AM, Claudio Fontana wrote:
>>> No real need to special case zero; it's just an extra test slowing down the
>>> compiler.
>>
>> Yes, we need to handle the special case zero.
>> Otherwise no instruction at all would be emitted for value 0.
>
> Hmm, true.  Although I'd been thinking more along the lines of
> arranging the code such that we'd use movz to set the zero.

I think we need to keep treating zero specially if we want to keep the
optimization where we don't emit needless MOVK instructions for half-words
of value 0000h.
I can, however, make a single function out of movi32 and movi64; it could
look like this:

    if (!value) {
        tcg_out_movr(s, 0, rd, TCG_REG_XZR);
        return;
    }
    base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;
    while (value) {
        /* etc etc */
    }

>> I actually don't know whether to prefer ext=0 or ext=1,
>> in the sense that it would be useful to know whether using the extended registers
>> with a small constant is performance-wise preferable to using the 32bit operation,
>> and relying on 0-extension. See also the rotation comment below.
>
> From the armv8 isa overview:
>
> # Rationale: [...] By maintaining this semantic information in the instruction
> # set, implementations can exploit this information to avoid expending energy
> # or cycles to compute, forward and store the unused upper 32 bits of such
> # data types. Implementations are free to exploit this freedom in whatever way
> # they choose to save energy.

I had not noticed that; it settles the issue.

>>> addr_reg almost certainly needs to be zero-extended for 32-bit guests, easily
>>> done by setting ext = 0 here.
>>
>> I can easily put an #ifdef just to be sure.
>
> No ifdef, just the TARGET_LONG_BITS == 64 comparison works.
>
>>> You initialize FP, but you don't reserve the register, so it's going to get
>>> clobbered.  We don't actually use the frame pointer in the translated code, so
>>> I don't think there's any call to actually initialize it either.
>>
>> The FP is not going to be clobbered, not by code here and not by called code.
>>
>> It is not going to be clobbered between our use before the jump and after the
>> jump, because all the called functions need to preserve FP as mandated by the
>> calling conventions.
>>
>> It is not going to be clobbered from the point of view of our caller,
>> because we save (FP, LR) along with (X19, X20) .. (X27, X28) and restore them
>> before returning.
>
> Ah, well, I didn't see it mentioned here,
>
>> +    tcg_regset_clear(s->reserved_regs);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
>
> but hadn't noticed that it's not listed in the reg_alloc_order.
>
>> We use FP to point to the callee_saved registers, and to move to/from them
>> in the tcg_out_store_pair and tcg_out_load_pair functions.
>
> I hadn't noticed you'd hard-coded FP into the load/store_pair functions.
> Let's *really* not do that.  Even if we decide to continue using it, let's
> pass it in explicitly.
>
> But I don't see that you're really gaining anything in the prologue from
> using FP instead of SP.  It seems like a waste of a register to me.
>
>
> r~
>

-- 
Claudio Fontana
Server OS Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158
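
For reference, the merged movi idea discussed above (zero handled up front, MOVK skipped for half-words of value 0000h) could be sketched roughly as follows. This is only a standalone illustration, not the code from the patch: struct insn_buf, emit(), movi() and REG_ZR are hypothetical names standing in for the backend's own helpers. The two MOVZ base opcodes are the ones quoted in the message; the MOVK bit (0x20000000) and the ORR-based register move are my reading of the A64 encodings and should be checked against the architecture manual.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend's output buffer and register names. */
enum { REG_ZR = 31 };               /* register number 31 reads as zero here */

struct insn_buf {
    uint32_t insn[4];               /* a 64-bit constant needs at most 4 insns */
    int count;
};

static void emit(struct insn_buf *b, uint32_t insn)
{
    assert(b->count < 4);
    b->insn[b->count++] = insn;
}

/* Load an immediate into rd using as few MOVZ/MOVK instructions as possible. */
static void movi(struct insn_buf *b, unsigned rd, uint64_t value)
{
    uint32_t base;
    unsigned shift;

    if (!value) {
        /* Without this case the loop below would emit nothing at all.
           32-bit register move from the zero register (ORR Wd, WZR, WZR). */
        emit(b, 0x2a0003e0u | ((uint32_t)REG_ZR << 16) | rd);
        return;
    }

    /* MOVZ: 64-bit form only if the value does not fit in 32 bits. */
    base = (value > 0xffffffffu) ? 0xd2800000u : 0x52800000u;

    for (shift = 0; value; shift += 16, value >>= 16) {
        uint32_t half = (uint32_t)(value & 0xffff);

        if (!half) {
            continue;               /* zero half-word: no instruction needed */
        }
        emit(b, base | ((shift / 16) << 21) | (half << 5) | rd);
        base |= 0x20000000u;        /* first insn is MOVZ, later ones are MOVK */
    }
}
```

With this arrangement the first non-zero half-word is written with MOVZ, which clears the rest of the register, so every zero half-word costs nothing; the price is the one extra test for value == 0 that the thread is arguing about.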