Date: Wed, 15 Jul 2015 10:06:33 +0200
From: Aurelien Jarno
To: Paolo Bonzini
Cc: Leon Alrae, qemu-devel@nongnu.org, rth@twiddle.net
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
Message-ID: <20150715080633.GJ11361@aurel32.net>
In-Reply-To: <55A60C4C.3070406@redhat.com>
References: <1436891912-14742-1-git-send-email-leon.alrae@imgtec.com>
 <20150714170928.GC7569@aurel32.net> <55A552F1.70000@redhat.com>
 <20150714183735.GA2685@aurel32.net> <55A57792.5070509@redhat.com>
 <20150714220938.GA11278@aurel32.net> <55A60C4C.3070406@redhat.com>

On 2015-07-15 09:31, Paolo Bonzini wrote:
> Ok, I see your point. If you put it like this :) the fault definitely
> lies in the backends. What I'm proposing would be in a new
> tcg_reg_alloc_trunc function, and it would require implementing a
> non-noop trunc.

Why not reuse the existing trunc_shr_i64_i32 op? AFAIU, it has been
designed exactly for that.

Actually I think we should implement the following ops as optional but
*real* TCG ops:
 - trunc_shr_i64_i32
 - extu_i32_i64
 - ext_i32_i64

Then each backend can implement the ones it considers necessary. If an op
is not implemented in a backend, it is simply replaced by a mov.

This would also allow us to remove the "remember high bits as garbage"
logic in the optimizer, which I consider a band-aid more than a real fix.

Note that we might have multiple choices, for example on x86:

1) implement trunc_shr_i64_i32 and ext_i32_i64
   This way we make sure that all 32-bit values are always stored
   zero-extended (even if a move has been propagated by the register
   allocator or by the optimizer). The extu_i32_i64 op can therefore
   always be considered as a mov op.

2) implement extu_i32_i64 and ext_i32_i64
   We have to guarantee that all 32-bit ops ignore the high part of the
   registers (which is not the case currently for qemu_ld/st in user
   mode) as they might contain garbage. Given that, we have to properly
   zero- or sign-extend the value when converting a 32-bit value to a
   64-bit value.

> I still believe the register allocator can be improved to do 32-bit
> loads, though as an optimization and not as a bugfix:
>
> > > Even if the prefix was added, modifying the register allocator to use
> > > 32-bit loads would still be useful as an optimization, since on x86
> > > 32-bit loads are smaller than 64-bit loads.
> >
> > AFAIK, that's already the case. The REXW prefix is only emitted for
> > 64-bit ops.
>
> Yes, but a load from a 64-bit register to a 32-bit destination emits
> REX.W. From Leon's dump:
>
>     mov_i32 tmp1,w0.d0     => mov 0xe8(%r14),%rbp
>     mov_i32 tmp0,tmp1
>     mov_i32 t8,tmp0        => mov %ebp,0x60(%r14)
>
> Note %rbp as the load destination and %ebp as the source of the store.

Indeed, that's something we might want to improve (and it is due to the
fact that we have just replaced trunc_shr_i64_i32 with a move on x86).
Note however that this simplification might be target specific (it is at
least little-endian specific if we don't adjust the address).

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net