From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39979) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPcQb-00067Y-UL for qemu-devel@nongnu.org; Mon, 17 Mar 2014 14:38:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WPcQW-0001uS-2n for qemu-devel@nongnu.org; Mon, 17 Mar 2014 14:38:25 -0400
Received: from mail-qa0-x22a.google.com ([2607:f8b0:400d:c00::22a]:36964) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPcQV-0001uG-Uz for qemu-devel@nongnu.org; Mon, 17 Mar 2014 14:38:20 -0400
Received: by mail-qa0-f42.google.com with SMTP id k15so5855977qaq.1 for ; Mon, 17 Mar 2014 11:38:19 -0700 (PDT)
Sender: Richard Henderson
From: Richard Henderson
Date: Mon, 17 Mar 2014 11:37:49 -0700
Message-Id: <1395081476-6038-8-git-send-email-rth@twiddle.net>
In-Reply-To: <1395081476-6038-1-git-send-email-rth@twiddle.net>
References: <1395081476-6038-1-git-send-email-rth@twiddle.net>
Subject: [Qemu-devel] [PATCH 07/14] tcg-sparc: Implement muls2_i32
List-Id: 
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
To: qemu-devel@nongnu.org
Cc: blauwirbel@gmail.com, aurelien@aurel32.net

Using the 32-bit SMUL is a tad more efficient than resorting to
extending and using the 64-bit MULX.

Signed-off-by: Richard Henderson
---
 tcg/sparc/tcg-target.c | 18 +++++++++++++++---
 tcg/sparc/tcg-target.h |  2 +-
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index d086c10..43ede5b 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -200,6 +200,7 @@ static const int tcg_target_call_oarg_regs[] = {
 #define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x08))
 #define ARITH_SUBX (INSN_OP(2) | INSN_OP3(0x0c))
 #define ARITH_UMUL (INSN_OP(2) | INSN_OP3(0x0a))
+#define ARITH_SMUL (INSN_OP(2) | INSN_OP3(0x0b))
 #define ARITH_UDIV (INSN_OP(2) | INSN_OP3(0x0e))
 #define ARITH_SDIV (INSN_OP(2) | INSN_OP3(0x0f))
 #define ARITH_MULX (INSN_OP(2) | INSN_OP3(0x09))
@@ -1284,9 +1285,19 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                         ARITH_SUBCC, ARITH_SUBX);
         break;
     case INDEX_op_mulu2_i32:
-        tcg_out_arithc(s, args[0], args[2], args[3], const_args[3],
-                       ARITH_UMUL);
-        tcg_out_rdy(s, args[1]);
+        c = ARITH_UMUL;
+        goto do_mul2;
+    case INDEX_op_muls2_i32:
+        c = ARITH_SMUL;
+    do_mul2:
+        /* The 32-bit multiply insns produce a full 64-bit result.  If the
+           destination register can hold it, we can avoid the slower RDY.  */
+        tcg_out_arithc(s, args[0], args[2], args[3], const_args[3], c);
+        if (SPARC64 || args[0] <= TCG_REG_O7) {
+            tcg_out_arithi(s, args[1], args[0], 32, SHIFT_SRLX);
+        } else {
+            tcg_out_rdy(s, args[1]);
+        }
         break;
 
     case INDEX_op_qemu_ld_i32:
@@ -1418,6 +1429,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_sub2_i32, { "r", "r", "rZ", "rZ", "rJ", "rJ" } },
     { INDEX_op_mulu2_i32, { "r", "r", "rZ", "rJ" } },
+    { INDEX_op_muls2_i32, { "r", "r", "rZ", "rJ" } },
 
     { INDEX_op_mov_i64, { "R", "R" } },
     { INDEX_op_movi_i64, { "R" } },
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 5442d45..091224c 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -108,7 +108,7 @@ typedef enum {
 #define TCG_TARGET_HAS_add2_i32 1
 #define TCG_TARGET_HAS_sub2_i32 1
 #define TCG_TARGET_HAS_mulu2_i32 1
-#define TCG_TARGET_HAS_muls2_i32 0
+#define TCG_TARGET_HAS_muls2_i32 1
 #define TCG_TARGET_HAS_muluh_i32 0
 #define TCG_TARGET_HAS_mulsh_i32 0
 
-- 
1.8.5.3
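
For reference, below is a minimal, self-contained C sketch (not part of the
patch; the helper name muls2_i32_ref is illustrative only, not a QEMU
function) of the contract the new muls2_i32 op provides: both 32-bit halves
of the signed 32x32->64 product. Per the comment in the patch, SMUL already
produces the full 64-bit result, so the high half only needs an SRLX by 32
when the destination register can hold it, falling back to RDY otherwise.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Reference model (hypothetical helper, for illustration only): *lo and
   *hi receive the low and high 32-bit halves of the signed 64-bit
   product of a and b, which is what muls2_i32 asks the backend for.  */
static void muls2_i32_ref(int32_t a, int32_t b, uint32_t *lo, uint32_t *hi)
{
    int64_t prod = (int64_t)a * (int64_t)b;  /* full 64-bit result, as SMUL produces */
    *lo = (uint32_t)prod;                    /* low word of the product */
    *hi = (uint32_t)((uint64_t)prod >> 32);  /* high word: SRLX by 32, or RDY */
}

int main(void)
{
    uint32_t lo, hi;
    muls2_i32_ref(-3, 0x40000000, &lo, &hi);
    printf("hi=0x%08" PRIx32 " lo=0x%08" PRIx32 "\n", hi, lo);
    return 0;
}

Running this prints hi=0xffffffff lo=0x40000000, the sign-extended
64-bit product of -3 and 0x40000000 split into the two output words.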