From: Paolo Bonzini
Date: Tue, 14 Jul 2015 22:56:50 +0200
Subject: Re: [Qemu-devel] [PATCH] target-mips: apply workaround for TCG optimizations for MFC1
To: Aurelien Jarno
Cc: Leon Alrae, qemu-devel@nongnu.org, rth@twiddle.net
Message-ID: <55A57792.5070509@redhat.com>
In-Reply-To: <20150714183735.GA2685@aurel32.net>
References: <1436891912-14742-1-git-send-email-leon.alrae@imgtec.com> <20150714170928.GC7569@aurel32.net> <55A552F1.70000@redhat.com> <20150714183735.GA2685@aurel32.net>

On 14/07/2015 20:37, Aurelien Jarno wrote:
>> >
>> > I certainly don't have a global view, so much that I didn't think at
>> > all of the optimizer...  Instead, it looks to me like a bug in the
>> > register allocator.  In particular this code in tcg_reg_alloc_mov:
> That's exactly my point when I said that someone doesn't have a global
> view. I think the fact that we don't check for type when simplifying
> moves in the register allocator is intentional, the same way we simply
> transform the trunc op into a mov op (except on sparc).
> This is done because it's not needed for example on x86 and most
> architectures, given 32-bit instructions do not care about the high
> part of the registers.
>
> Basically size changing ops are trunc_i64_i32, ext_i32_i64 and
> extu_i32_i64. We can be conservative and implement all of them as real
> instructions in all TCG backends. In that case the mov op never has
> to deal with registers of different size (just like we enforce that at
> the TCG frontend level), and the register allocator and the optimizer
> do not have to deal with this. However that's suboptimal on some
> architectures, that's why on x86 we decided to just replace the
> trunc_i64_i32 by a move. But if we do this simplification it should be
> done everywhere (in that case, including in the qemu_ld op). And
> DOCUMENTED somewhere, given different choices can be made for different
> backends.

I think there are three cases:

1) 64-bit processors that do not have loads with 32-bit addresses, and
do not zero extend on 32-bit operations---possibly because 32-bit
operations do not exist at all.

=> qemu_ld/qemu_st must truncate the address

ia64, s390, sparc all fall under this group.

2) 64-bit processors that have loads with 32-bit addresses.

=> qemu_ld/qemu_st can use 32-bit addresses to do the truncation

aarch64, I think, falls under this group.

3) Processors that do not have 32-bit loads, and automatically zero
extend on 32-bit operations

=> qemu_ld/qemu_st could use 64-bit addresses and no truncation

x86 currently falls under 3, because it doesn't use ADDR32, but the
register allocator is breaking case 3 by forcing 64-bit operations when
loading from a global.  I am not sure if the optimizer could also break
this case, or if it is working by chance.

So, the simplest fix for 2.4 would be to add the prefix as suggested in
the comment and make x86 fall under 2.  If the optimizer is not breaking
this case, fixing the register allocator would be an option, and then
the ADDR32 prefix could be reverted.
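
To make the cases above a bit more concrete, here is a rough model in
plain C.  It is not QEMU code and every name in it is made up for
illustration; it only assumes that a 32-bit guest address sits in a
64-bit host register whose high half may or may not have been zeroed by
whoever wrote it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code.  A 64-bit host register holds a
 * 32-bit guest address; its high half is only trustworthy if the
 * instruction that wrote the register zero-extended the value. */

/* Producer A: a 32-bit operation on a host that zero-extends its
 * result into the full register (as x86-64 does). */
static inline uint64_t write_reg_32bit_op(uint32_t value)
{
    return (uint64_t)value;
}

/* Producer B: a 64-bit load from a global.  The high half is whatever
 * the global contained, e.g. a sign extension (assumption for this sketch). */
static inline uint64_t write_reg_64bit_load(uint64_t global)
{
    return global;
}

/* Case 1: qemu_ld/qemu_st truncates the address explicitly. */
static inline uint64_t addr_case1(uint64_t reg)
{
    return reg & UINT64_C(0xffffffff);
}

/* Case 2: the host load only looks at the low 32 bits of the register
 * (a load with a 32-bit address), so the truncation comes for free. */
static inline uint64_t addr_case2(uint64_t reg)
{
    return (uint32_t)reg;
}

/* Case 3: the full 64-bit register is used as is; correct only if the
 * producer zero-extended. */
static inline uint64_t addr_case3(uint64_t reg)
{
    return reg;
}

/* Small self-check showing where case 3 goes wrong. */
static inline void check_cases(void)
{
    uint64_t clean = write_reg_32bit_op(0x80001000u);
    uint64_t dirty = write_reg_64bit_load(UINT64_C(0xffffffff80001000));

    assert(addr_case1(dirty) == UINT64_C(0x80001000));
    assert(addr_case2(dirty) == UINT64_C(0x80001000));
    assert(addr_case3(clean) == UINT64_C(0x80001000));
    assert(addr_case3(dirty) != UINT64_C(0x80001000)); /* stale high bits leak */
}
```

In this model cases 1 and 2 are safe whatever the producer did, while
case 3 is only safe as long as nothing (register allocator or
optimizer) replaces producer A with producer B.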

Even if the prefix were added, modifying the register allocator to use
32-bit loads would still be useful as an optimization, since on x86
32-bit loads have a shorter encoding than 64-bit loads (they do not
need REX.W).

Paolo