From: Paolo Bonzini
Date: Thu, 04 Jun 2015 12:54:32 +0200
Subject: Re: [Qemu-devel] [PATCH v3 3/8] target-sh4: optimize addc using add2
To: Richard Henderson, Aurelien Jarno, qemu-devel@nongnu.org
Message-ID: <55702E68.3070908@redhat.com>
In-Reply-To: <556FDC26.9090302@twiddle.net>
References: <1432510638-21021-1-git-send-email-aurelien@aurel32.net>
 <1432510638-21021-4-git-send-email-aurelien@aurel32.net>
 <556FDC26.9090302@twiddle.net>

On 04/06/2015 07:03, Richard Henderson wrote:
>> +        tcg_gen_add2_i32(t1, t2, REG(B11_8), t0, REG(B7_4), t0);
>> +        tcg_gen_add2_i32(REG(B11_8), cpu_sr_t, t1, t2, cpu_sr_t, t0);
>
> Swap these two adds and you don't need t2.  You can consume sr_t
> immediately and start producing it in the same go.

Could TCG do some kind of intra-basic-block live range splitting?  In
this case, the new sr_t could be allocated to a different register than
the old one, saving one instruction on 2-address targets.

The pseudocode below uses "dest, src" operand order:

    // add2(t1, cpu_sr_t, cpu_sr_t, t0, REG(B7_4), t0)
    add sr_t_in, B7_4        // instead of mov t1, sr_t; add t1, B7_4
    mov sr_t_out, 0
    adc sr_t_out, 0          // cout(B7_4 + sr_t_in)

    // add2(REG(B11_8), cpu_sr_t, t1, cpu_sr_t, REG(B11_8), t0)
    add B11_8, sr_t_in       // B11_8 + B7_4 + sr_t_in
    adc sr_t_out, 0          // cout(B11_8 + B7_4 + sr_t_in)

Paolo
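
For reference, here is a small standalone C sketch of why the swapped
order gives the same result as the order in the patch.  It is not QEMU
code: add2_i32() below is only my own model of the add2 op, taken to
compute (rh:rl) = (ah:al) + (bh:bl), and addc_ref(), addc_original(),
addc_swapped() and check_addc_orders() are hypothetical names made up
for the illustration.  It assumes T is always 0 or 1 on entry.

```c
#include <assert.h>
#include <stdint.h>

/* Model (assumption) of the TCG add2 op: (*rh:*rl) = (ah:al) + (bh:bl).
 * Inputs are passed by value, so outputs may alias the input variables. */
static void add2_i32(uint32_t *rl, uint32_t *rh,
                     uint32_t al, uint32_t ah,
                     uint32_t bl, uint32_t bh)
{
    uint64_t a = ((uint64_t)ah << 32) | al;
    uint64_t b = ((uint64_t)bh << 32) | bl;
    uint64_t r = a + b;

    *rl = (uint32_t)r;
    *rh = (uint32_t)(r >> 32);
}

/* Reference addc: Rn = Rn + Rm + T, T = carry out. */
static void addc_ref(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint64_t r = (uint64_t)*rn + rm + *t;

    *rn = (uint32_t)r;
    *t = (uint32_t)(r >> 32);
}

/* Order used in the patch: needs two temporaries, t1 and t2. */
static void addc_original(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint32_t t1, t2;

    add2_i32(&t1, &t2, *rn, 0, rm, 0);   /* (t2:t1) = Rn + Rm      */
    add2_i32(rn, t, t1, t2, *t, 0);      /* (T:Rn)  = (t2:t1) + T  */
}

/* Swapped order: T is consumed first and reused for the carry. */
static void addc_swapped(uint32_t *rn, uint32_t rm, uint32_t *t)
{
    uint32_t t1;

    add2_i32(&t1, t, *t, 0, rm, 0);      /* (T:t1) = T + Rm        */
    add2_i32(rn, t, t1, *t, *rn, 0);     /* (T:Rn) = (T:t1) + Rn   */
}

/* Compare all three on a few carry-heavy corner cases. */
static void check_addc_orders(void)
{
    static const uint32_t vals[] = { 0, 1, 0x7fffffffu, 0x80000000u,
                                     0xfffffffeu, 0xffffffffu };
    const int n = (int)(sizeof(vals) / sizeof(vals[0]));

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            for (uint32_t t_in = 0; t_in <= 1; t_in++) {
                uint32_t r0 = vals[i], t0 = t_in;
                uint32_t r1 = vals[i], t1 = t_in;
                uint32_t r2 = vals[i], t2 = t_in;

                addc_ref(&r0, vals[j], &t0);
                addc_original(&r1, vals[j], &t1);
                addc_swapped(&r2, vals[j], &t2);
                assert(r0 == r1 && t0 == t1);
                assert(r0 == r2 && t0 == t2);
            }
        }
    }
}
```

Calling check_addc_orders() from a main() exercises the comparison.  The
swapped version writes the first carry straight into T, which is where
the register allocation question above comes from: the old and new T
share one TCG variable even though their live ranges do not overlap.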