From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:38992)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1Ugvqg-0001K9-Oz
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:44:29 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1UgvqY-0008B0-6W
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:44:22 -0400
Received: from lhrrgout.huawei.com ([194.213.3.17]:7130)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <Claudio.Fontana@huawei.com>) id 1UgvqX-0008Ag-TX
	for qemu-devel@nongnu.org; Mon, 27 May 2013 07:44:14 -0400
Message-ID: <51A346EC.2080005@huawei.com>
Date: Mon, 27 May 2013 13:43:40 +0200
From: Claudio Fontana <claudio.fontana@huawei.com>
MIME-Version: 1.0
References: <5141F36E.10004@huawei.com> <519DCEC8.8060000@huawei.com>
	<519DD0BF.4090702@huawei.com> <519E43EE.6090702@twiddle.net>
	<519F2A8F.6090903@huawei.com> <519F9D11.9020603@twiddle.net>
In-Reply-To: <519F9D11.9020603@twiddle.net>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [PATCH 2/4] tcg/aarch64: implement new TCG target
	for aarch64
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Richard Henderson <rth@twiddle.net>
Cc: Peter Maydell <peter.maydell@linaro.org>, Jani Kokkonen <Jani.Kokkonen@huawei.com>, qemu-devel@nongnu.org

On 24.05.2013 19:02, Richard Henderson wrote:
> On 05/24/2013 01:53 AM, Claudio Fontana wrote:
>>> No real need to special case zero; it's just an extra test slowing down the
>>> compiler.
>>
>> Yes, we need to handle the special case zero.
>> Otherwise no instruction at all would be emitted for value 0.
> 
> Hmm, true.  Although I'd been thinking more along the lines of
> arranging the code such that we'd use movz to set the zero.

I think we need to keep treating zero specially if we want to keep the optimization where we don't emit needless MOVK instructions for half-words of value 0000h.

I can, however, merge movi32 and movi64 into a single function; it could look like this:

if (!value) {
    tcg_out_movr(s, 0, rd, TCG_REG_XZR);
    return;
}

base = (value > 0xffffffff) ? 0xd2800000 : 0x52800000;

while (value) {
    /* etc etc */
}
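
To make the elided loop concrete, here is a minimal standalone sketch, not the patch code. The name encode_movi and the output buffer are hypothetical and exist only for illustration; the sketch assumes that MOVK is the MOVZ opcode with bit 29 set, and that the zero case is a 32-bit MOV from WZR (encoded as ORR Wd, WZR, WZR).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): encode the MOVZ/MOVK sequence
 * that loads `value` into register `rd`. Instruction words are written to
 * `out`, which must have room for 4 words; the count written is returned. */
static size_t encode_movi(uint32_t *out, unsigned rd, uint64_t value)
{
    uint32_t base, movk = 0;
    size_t n = 0;
    unsigned shift;

    assert(rd < 31);                    /* 31 would encode XZR/WZR */

    if (!value) {
        /* Zero has no non-zero half-word, so the loop below would emit
         * nothing; assumed encoding: MOV Wd, WZR == ORR Wd, WZR, WZR. */
        out[n++] = 0x2a1f03e0u | rd;
        return n;
    }

    /* 64-bit MOVZ if the value does not fit in 32 bits, else 32-bit MOVZ. */
    base = (value > 0xffffffffu) ? 0xd2800000u : 0x52800000u;

    for (shift = 0; shift < 64; shift += 16) {
        uint32_t half = (uint32_t)(value >> shift) & 0xffffu;

        if (!half) {
            continue;                   /* skip half-words of value 0000h */
        }
        /* hw field (bits 22:21) selects the half-word, imm16 sits at bit 5. */
        out[n++] = base | movk | ((shift / 16) << 21) | (half << 5) | rd;
        movk = 0x20000000u;             /* later instructions become MOVK */
    }
    return n;
}
```

With this shape, 0x12340000 costs a single MOVZ with LSL #16, and a 64-bit value with two non-zero half-words costs one MOVZ plus one MOVK.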

>> I actually don't know whether to prefer ext=0 or ext=1,
>> in the sense that it would be useful to know whether using the extended registers
>> with a small constant is performance-wise preferable to using the 32bit operation,
>> and relying on 0-extension. See also the rotation comment below.
> 
> From the armv8 isa overview:
> 
> # Rationale: [...] By maintaining this semantic information in the instruction
> # set, implementations can exploit this information to avoid expending energy
> # or cycles to compute, forward and store the unused upper 32 bits of such
> # data types. Implementations are free to exploit this freedom in whatever way
> # they choose to save energy.

I had not noticed that; it settles the question.

>>> addr_reg almost certainly needs to be zero-extended for 32-bit guests, easily
>>> done by setting ext = 0 here.
>>
>> I can easily put an #ifdef just to be sure.
> 
> No ifdef, just the TARGET_LONG_BITS == 64 comparison works.
> 
>>> You initialize FP, but you don't reserve the register, so it's going to get
>>> clobbered.  We don't actually use the frame pointer in the translated code, so
>>> I don't think there's any call to actually initialize it either.
>>
>> The FP is not going to be clobbered, not by code here and not by called code.
>>
>> It is not going to be clobbered between our use before the jump and after the
>> jump, because all the called functions need to preserve FP as mandated by the
>> calling conventions.
>>
>> It is not going to be clobbered from the point of view of our caller,
>> because we save (FP, LR) along with (X19, X20) .. (X27, X28) and restore them
>> before returning.
> 
> Ah, well, I didn't see it mentioned here,
> 
>> +    tcg_regset_clear(s->reserved_regs);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP);
>> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_X18); /* platform register */
> 
> but hadn't noticed that it's not listed in the reg_alloc_order.
> 
>> We use FP to point to the callee_saved registers, and to move to/from them
>> in the tcg_out_store_pair and tcg_out_load_pair functions.
> 
> I hadn't noticed you'd hard-coded FP into the load/store_pair functions.
> Let's *really* not do that.  Even if we decide to continue using it, let's
> pass it in explicitly.
> 
> But I don't see that you're really gaining anything in the prologue from
> using FP instead of SP.  It seems like a waste of a register to me.
> 
> 
> r~
> 


-- 
Claudio Fontana
Server OS Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München

office: +49 89 158834 4135
mobile: +49 15253060158