From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52319)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aurelien@aurel32.net>) id 1ZFHiE-00061f-Jq
	for qemu-devel@nongnu.org; Wed, 15 Jul 2015 04:06:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <aurelien@aurel32.net>) id 1ZFHiC-0000aV-DM
	for qemu-devel@nongnu.org; Wed, 15 Jul 2015 04:06:42 -0400
Received: from hall.aurel32.net ([2001:bc8:30d7:100::1]:46878)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <aurelien@aurel32.net>) id 1ZFHiC-0000aG-87
	for qemu-devel@nongnu.org; Wed, 15 Jul 2015 04:06:40 -0400
Date: Wed, 15 Jul 2015 10:06:33 +0200
From: Aurelien Jarno <aurelien@aurel32.net>
Message-ID: <20150715080633.GJ11361@aurel32.net>
References: <1436891912-14742-1-git-send-email-leon.alrae@imgtec.com>
	<20150714170928.GC7569@aurel32.net> <55A552F1.70000@redhat.com>
	<20150714183735.GA2685@aurel32.net> <55A57792.5070509@redhat.com>
	<20150714220938.GA11278@aurel32.net> <55A60C4C.3070406@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <55A60C4C.3070406@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG
 optimizations for MFC1
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Leon Alrae <leon.alrae@imgtec.com>, qemu-devel@nongnu.org, rth@twiddle.net

On 2015-07-15 09:31, Paolo Bonzini wrote:
> Ok, I see your point.  If you put it like this :) the fault definitely
> lies in the backends.  What I'm proposing would be in a new
> tcg_reg_alloc_trunc function, and it would require implementing a
> non-noop trunc.

Why not reuse the existing trunc_shr_i64_i32 op? AFAIU, it was
designed exactly for that.

Actually I think we should implement the following ops as optional but
*real* TCG ops:
- trunc_shr_i64_i32
- extu_i32_i64
- ext_i32_i64

Then each backend can implement the ones it considers necessary. If an
op is not implemented in a backend, it is simply replaced by a mov. This
would also allow us to remove the "remember high bits as garbage" logic
in the optimizer, which I consider a band-aid more than a real fix.
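
To make the intended semantics of the three ops concrete, here is a
minimal reference sketch in plain C. It is not TCG backend code; the
ref_* names are made up for illustration, and the shift count on the
truncating op is assumed from its name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers, not part of the TCG API. */

/* trunc_shr_i64_i32: shift right by count, then keep the low 32 bits. */
static inline uint32_t ref_trunc_shr_i64_i32(uint64_t v, unsigned count)
{
    assert(count < 64);          /* a shift by 64 or more is undefined in C */
    return (uint32_t)(v >> count);
}

/* extu_i32_i64: zero-extend a 32-bit value to 64 bits. */
static inline uint64_t ref_extu_i32_i64(uint32_t v)
{
    return (uint64_t)v;
}

/* ext_i32_i64: sign-extend a 32-bit value to 64 bits.
 * Assumes the usual two's complement uint32_t -> int32_t conversion. */
static inline uint64_t ref_ext_i32_i64(uint32_t v)
{
    return (uint64_t)(int64_t)(int32_t)v;
}
```

A backend that does not implement one of these would, in this model,
just copy the register and leave the high half as it was.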

Note that we might have multiple choices, for example on x86:

1) implement trunc_shr_i64_i32 and ext_i32_i64
This way we make sure that all 32-bit values are always stored
zero-extended (even if a move has been propagated by the register
allocator or by the optimizer). The extu_i32_i64 can therefore always
be considered as a mov op.

2) implement extu_i32_i64 and ext_i32_i64
We have to guarantee that all 32-bit ops ignore the high part of the
registers (which is currently not the case for qemu_ld/st in user mode),
as it might contain garbage. Given that, we have to properly zero- or
sign-extend the value when converting a 32-bit value into a 64-bit value.
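
The difference between the two choices can be modelled roughly as
below. This is only a sketch: hostreg and the opt1_*/opt2_* names are
hypothetical, a uint64_t stands in for a 64-bit host register, and the
shift count of the truncating op is taken to be 0.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t hostreg;   /* hypothetical model of a 64-bit host register */

/* Sign-extend the low 32 bits of a register (same in both options). */
static hostreg sext_low32(hostreg r)
{
    return (uint64_t)(int64_t)(int32_t)(uint32_t)r;
}

/* Option 1: a 32-bit value is always stored zero-extended. */
static hostreg opt1_trunc(hostreg r64)
{
    return r64 & 0xffffffffull;      /* real op: clears the high half */
}

static hostreg opt1_extu(hostreg r32)
{
    assert((r32 >> 32) == 0);        /* relies on the invariant above */
    return r32;                      /* plain mov */
}

static hostreg opt1_ext(hostreg r32)
{
    return sext_low32(r32);
}

/* Option 2: the high half of a 32-bit value may hold garbage. */
static hostreg opt2_trunc(hostreg r64)
{
    return r64;                      /* plain mov, garbage stays */
}

static hostreg opt2_extu(hostreg r32)
{
    return r32 & 0xffffffffull;      /* real op: must zero the high half */
}

static hostreg opt2_ext(hostreg r32)
{
    return sext_low32(r32);
}
```

Both models give the same result after a truncate/extend round trip;
they only differ in which op does the real work, and in what a 32-bit
op may assume about the high half in between.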

> I still believe the register allocator can be improved to do 32-bit
> loads, though as an optimization and not as a bugfix:
> 
> > > Even if the prefix was added, modifying the register allocator to use
> > > 32-bit loads would still be useful as an optimization, since on x86
> > > 32-bit loads are smaller than 64-bit loads.
> >
> > AFAIK, that's already the case. The REXW prefix is only emitted for
> > 64-bit ops.
> 
> Yes, but a load from a 64-bit register to a 32-bit destination emits
> REX.W.  From Leon's dump:
> 
>  mov_i32 tmp1,w0.d0  => mov    0xe8(%r14),%rbp
>  mov_i32 tmp0,tmp1
>  mov_i32 t8,tmp0     => mov    %ebp,0x60(%r14)
> 
> Note %rbp as the load destination and %ebp as the source of the store.

Indeed, that's something we might want to improve (it is due to the
fact that we have simply replaced trunc_shr_i64_i32 by a move on x86).
Note, however, that this simplification might be target specific (it is
at least little-endian specific if we don't adjust the address).
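
To illustrate the endianness point, here is a small standalone C sketch
(again not QEMU code, helper names are hypothetical): the low 32 bits of
a 64-bit slot sit at offset 0 on a little-endian host, but at offset 4
on a big-endian one, so a narrowed load has to adjust the address there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the low 32 bits inside a 64-bit memory slot. */
static size_t low32_offset(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first ? 0 : 4;    /* little endian : big endian */
}

/* Load only the low 32 bits of a 64-bit slot with a 32-bit access. */
static uint32_t load_low32(const uint64_t *slot)
{
    uint32_t v;

    assert(slot != NULL);
    memcpy(&v, (const unsigned char *)slot + low32_offset(), sizeof(v));
    return v;
}
```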

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net