* [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
@ 2004-07-19 15:35 Maciej W. Rozycki
2004-07-19 16:59 ` Richard Sandiford
0 siblings, 1 reply; 36+ messages in thread
From: Maciej W. Rozycki @ 2004-07-19 15:35 UTC (permalink / raw)
To: Richard Sandiford; +Cc: Ralf Baechle, gcc-patches, linux-mips
Hello Richard,
Your change to gcc 3.5:
2004-05-17 Richard Sandiford <rsandifo@redhat.com>
* config/mips/mips.h (MASK_DEBUG_G, TARGET_DEBUG_G_MODE): Delete.
(TARGET_SWITCHES): Remove debugg.
* config/mips/mips.md (adddi3, ashldi3, ashrdi3, lshrdi3): Only handle
TARGET_64BIT.
(subdi3): Replace the define_expand with a define_insn, the latter
renamed from subdi3_internal_3.
(negdi2): Likewise negdi2_internal_2.
(adddi3_internal_[12], subdi3_internal, ashldi3_internal{,2,3})
(ashrdi3_internal{,2,3}, lshrdi3_internal{,2,3}): Remove patterns
and associated define_splits.
(adddi3_internal): Renamed from adddi3_internal_3.
(ashldi3_internal): Likewise ashldi3_internal4.
(ashrdi3_internal): Likewise ashrdi3_internal4.
(lshrdi3_internal): Likewise lshrdi3_internal4.
breaks 32-bit Linux builds. Linux relies on simple operations
(addition/subtraction and shifts) on "long long" variables being
implemented inline without a call to libgcc, which isn't linked in.
After your change Linux has unresolved references to external __ashldi3(),
__ashrdi3() and __lshrdi3() functions at the final link.
Here is a complete revert of the relevant changes that works for me, but
please feel free to provide a replacement. Either way, please make
ashldi3, ashrdi3 and lshrdi3 available for 32-bit targets again.
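The inline expansion being asked for is small: on a 32-bit target, a 64-bit left shift decomposes into a handful of 32-bit word operations, which is what the restored ashldi3 patterns emit. A C sketch of the operation (`shift_left_di` is a hypothetical name used here for illustration; the libcall symbol GCC actually references is `__ashldi3`):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for libgcc's __ashldi3: shift a 64-bit value,
   held as two 32-bit words, left by 0..63 bits.  The two branches mirror
   the ashldi3_internal2 (count >= 32) and ashldi3_internal3 (count < 32)
   patterns restored by this patch.  */
static uint64_t shift_left_di(uint64_t x, unsigned int amount)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);

    if (amount == 0)
        return x;
    if (amount & 32) {
        /* Count >= 32: low word moves into the high word, low becomes 0
           (sll %M0,%L1,%2; move %L0,$0).  */
        hi = lo << (amount & 31);
        lo = 0;
    } else {
        /* Count < 32: high bits of the low word spill into the high word
           (sll %M0,%M1,%2; srl %3,%L1,%4; or %M0,%M0,%3; sll %L0,%L1,%2).  */
        hi = (hi << amount) | (lo >> (32 - amount));
        lo <<= amount;
    }
    return ((uint64_t) hi << 32) | lo;
}
```

On a host with native 64-bit shifts, `x << amount` must agree with this for any amount in 0..63.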
2004-07-19 Maciej W. Rozycki <macro@linux-mips.org>
* config/mips/mips.md (ashldi3, ashrdi3, lshrdi3): Handle
!TARGET_64BIT again, partially reverting the change from
2004-05-17.
(ashldi3_internal{,2,3}, ashrdi3_internal{,2,3},
lshrdi3_internal{,2,3}): Restore patterns and associated
define_splits.
(ashldi3_internal4): Rename from ashldi3_internal.
(ashrdi3_internal4): Likewise ashrdi3_internal.
(lshrdi3_internal4): Likewise lshrdi3_internal.
Maciej
gcc-3.5.0-20040714-mips-xshxdi32.patch
diff -up --recursive --new-file gcc-3.5.0-20040714.macro/gcc/config/mips/mips.md gcc-3.5.0-20040714/gcc/config/mips/mips.md
--- gcc-3.5.0-20040714.macro/gcc/config/mips/mips.md 2004-07-13 04:59:54.000000000 +0000
+++ gcc-3.5.0-20040714/gcc/config/mips/mips.md 2004-07-17 19:00:14.000000000 +0000
@@ -4997,37 +4997,213 @@ dsrl\t%3,%3,1\n\
{ operands[2] = GEN_INT (INTVAL (operands[2]) - 8); })
(define_expand "ashldi3"
- [(set (match_operand:DI 0 "register_operand")
- (ashift:DI (match_operand:DI 1 "register_operand")
- (match_operand:SI 2 "arith_operand")))]
- "TARGET_64BIT"
+ [(parallel [(set (match_operand:DI 0 "register_operand")
+ (ashift:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "arith_operand")))
+ (clobber (match_dup 3))])]
+ "TARGET_64BIT || !TARGET_MIPS16"
{
- /* On the mips16, a shift of more than 8 is a four byte
- instruction, so, for a shift between 8 and 16, it is just as
- fast to do two shifts of 8 or less. If there is a lot of
- shifting going on, we may win in CSE. Otherwise combine will
- put the shifts back together again. This can be called by
- function_arg, so we must be careful not to allocate a new
- register if we've reached the reload pass. */
- if (TARGET_MIPS16
- && optimize
- && GET_CODE (operands[2]) == CONST_INT
- && INTVAL (operands[2]) > 8
- && INTVAL (operands[2]) <= 16
- && ! reload_in_progress
- && ! reload_completed)
+ if (TARGET_64BIT)
{
- rtx temp = gen_reg_rtx (DImode);
+ /* On the mips16, a shift of more than 8 is a four byte
+ instruction, so, for a shift between 8 and 16, it is just as
+ fast to do two shifts of 8 or less. If there is a lot of
+ shifting going on, we may win in CSE. Otherwise combine will
+ put the shifts back together again. This can be called by
+ function_arg, so we must be careful not to allocate a new
+ register if we've reached the reload pass. */
+ if (TARGET_MIPS16
+ && optimize
+ && GET_CODE (operands[2]) == CONST_INT
+ && INTVAL (operands[2]) > 8
+ && INTVAL (operands[2]) <= 16
+ && ! reload_in_progress
+ && ! reload_completed)
+ {
+ rtx temp = gen_reg_rtx (DImode);
- emit_insn (gen_ashldi3_internal (temp, operands[1], GEN_INT (8)));
- emit_insn (gen_ashldi3_internal (operands[0], temp,
- GEN_INT (INTVAL (operands[2]) - 8)));
+ emit_insn (gen_ashldi3_internal4 (temp, operands[1], GEN_INT (8)));
+ emit_insn (gen_ashldi3_internal4 (operands[0], temp,
+ GEN_INT (INTVAL (operands[2]) - 8)));
+ DONE;
+ }
+
+ emit_insn (gen_ashldi3_internal4 (operands[0], operands[1],
+ operands[2]));
DONE;
}
+
+ operands[3] = gen_reg_rtx (SImode);
})
(define_insn "ashldi3_internal"
+ [(set (match_operand:DI 0 "register_operand" "=&d")
+ (ashift:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "register_operand" "d")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16"
+ "sll\t%3,%2,26\;\
+bgez\t%3,1f%#\;\
+sll\t%M0,%L1,%2\;\
+%(b\t3f\;\
+move\t%L0,%.%)\
+\n\n\
+%~1:\;\
+%(beq\t%3,%.,2f\;\
+sll\t%M0,%M1,%2%)\
+\n\;\
+subu\t%3,%.,%2\;\
+srl\t%3,%L1,%3\;\
+or\t%M0,%M0,%3\n\
+%~2:\;\
+sll\t%L0,%L1,%2\n\
+%~3:"
+ [(set_attr "type" "multi")
+ (set_attr "mode" "SI")
+ (set_attr "length" "48")])
+
+
+(define_insn "ashldi3_internal2"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (ashift:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16
+ && (INTVAL (operands[2]) & 32) != 0"
+{
+ operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);
+ return "sll\t%M0,%L1,%2\;move\t%L0,%.";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "8")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashift:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4) (ashift:SI (subreg:SI (match_dup 1) 0) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 0) (const_int 0))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashift:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0) (ashift:SI (subreg:SI (match_dup 1) 4) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 4) (const_int 0))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_insn "ashldi3_internal3"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (ashift:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+{
+ int amount = INTVAL (operands[2]);
+
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+
+ return "sll\t%M0,%M1,%2\;srl\t%3,%L1,%4\;or\t%M0,%M0,%3\;sll\t%L0,%L1,%2";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "16")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashift:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4)
+ (ashift:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (ior:SI (subreg:SI (match_dup 0) 4)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (ashift:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashift:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0)
+ (ashift:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (ior:SI (subreg:SI (match_dup 0) 0)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (ashift:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_insn "ashldi3_internal4"
[(set (match_operand:DI 0 "register_operand" "=d")
(ashift:DI (match_operand:DI 1 "register_operand" "d")
(match_operand:SI 2 "arith_operand" "dI")))]
@@ -5157,33 +5333,208 @@ dsrl\t%3,%3,1\n\
{ operands[2] = GEN_INT (INTVAL (operands[2]) - 8); })
(define_expand "ashrdi3"
- [(set (match_operand:DI 0 "register_operand")
- (ashiftrt:DI (match_operand:DI 1 "register_operand")
- (match_operand:SI 2 "arith_operand")))]
- "TARGET_64BIT"
+ [(parallel [(set (match_operand:DI 0 "register_operand")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "arith_operand")))
+ (clobber (match_dup 3))])]
+ "TARGET_64BIT || !TARGET_MIPS16"
{
- /* On the mips16, a shift of more than 8 is a four byte
- instruction, so, for a shift between 8 and 16, it is just as
- fast to do two shifts of 8 or less. If there is a lot of
- shifting going on, we may win in CSE. Otherwise combine will
- put the shifts back together again. */
- if (TARGET_MIPS16
- && optimize
- && GET_CODE (operands[2]) == CONST_INT
- && INTVAL (operands[2]) > 8
- && INTVAL (operands[2]) <= 16)
+ if (TARGET_64BIT)
{
- rtx temp = gen_reg_rtx (DImode);
+ /* On the mips16, a shift of more than 8 is a four byte
+ instruction, so, for a shift between 8 and 16, it is just as
+ fast to do two shifts of 8 or less. If there is a lot of
+ shifting going on, we may win in CSE. Otherwise combine will
+ put the shifts back together again. */
+ if (TARGET_MIPS16
+ && optimize
+ && GET_CODE (operands[2]) == CONST_INT
+ && INTVAL (operands[2]) > 8
+ && INTVAL (operands[2]) <= 16)
+ {
+ rtx temp = gen_reg_rtx (DImode);
+
+ emit_insn (gen_ashrdi3_internal4 (temp, operands[1], GEN_INT (8)));
+ emit_insn (gen_ashrdi3_internal4 (operands[0], temp,
+ GEN_INT (INTVAL (operands[2]) - 8)));
+ DONE;
+ }
- emit_insn (gen_ashrdi3_internal (temp, operands[1], GEN_INT (8)));
- emit_insn (gen_ashrdi3_internal (operands[0], temp,
- GEN_INT (INTVAL (operands[2]) - 8)));
+ emit_insn (gen_ashrdi3_internal4 (operands[0], operands[1],
+ operands[2]));
DONE;
}
+
+ operands[3] = gen_reg_rtx (SImode);
})
(define_insn "ashrdi3_internal"
+ [(set (match_operand:DI 0 "register_operand" "=&d")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "register_operand" "d")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16"
+ "sll\t%3,%2,26\;\
+bgez\t%3,1f%#\;\
+sra\t%L0,%M1,%2\;\
+%(b\t3f\;\
+sra\t%M0,%M1,31%)\
+\n\n\
+%~1:\;\
+%(beq\t%3,%.,2f\;\
+srl\t%L0,%L1,%2%)\
+\n\;\
+subu\t%3,%.,%2\;\
+sll\t%3,%M1,%3\;\
+or\t%L0,%L0,%3\n\
+%~2:\;\
+sra\t%M0,%M1,%2\n\
+%~3:"
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "48")])
+
+
+(define_insn "ashrdi3_internal2"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && (INTVAL (operands[2]) & 32) != 0"
+{
+ operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);
+ return "sra\t%L0,%M1,%2\;sra\t%M0,%M1,31";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "8")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0) (ashiftrt:SI (subreg:SI (match_dup 1) 4) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 4) (ashiftrt:SI (subreg:SI (match_dup 1) 4) (const_int 31)))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4) (ashiftrt:SI (subreg:SI (match_dup 1) 0) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 0) (ashiftrt:SI (subreg:SI (match_dup 1) 0) (const_int 31)))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_insn "ashrdi3_internal3"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+{
+ int amount = INTVAL (operands[2]);
+
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+
+ return "srl\t%L0,%L1,%2\;sll\t%3,%M1,%4\;or\t%L0,%L0,%3\;sra\t%M0,%M1,%2";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "16")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (ashift:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (ior:SI (subreg:SI (match_dup 0) 0)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (ashiftrt:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (ashiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (ashift:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (ior:SI (subreg:SI (match_dup 0) 4)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (ashiftrt:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_insn "ashrdi3_internal4"
[(set (match_operand:DI 0 "register_operand" "=d")
(ashiftrt:DI (match_operand:DI 1 "register_operand" "d")
(match_operand:SI 2 "arith_operand" "dI")))]
@@ -5332,33 +5683,209 @@ dsrl\t%3,%3,1\n\
(set_attr "length" "16")])
(define_expand "lshrdi3"
- [(set (match_operand:DI 0 "register_operand")
- (lshiftrt:DI (match_operand:DI 1 "register_operand")
- (match_operand:SI 2 "arith_operand")))]
- "TARGET_64BIT"
+ [(parallel [(set (match_operand:DI 0 "register_operand")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "arith_operand")))
+ (clobber (match_dup 3))])]
+ "TARGET_64BIT || !TARGET_MIPS16"
{
- /* On the mips16, a shift of more than 8 is a four byte
- instruction, so, for a shift between 8 and 16, it is just as
- fast to do two shifts of 8 or less. If there is a lot of
- shifting going on, we may win in CSE. Otherwise combine will
- put the shifts back together again. */
- if (TARGET_MIPS16
- && optimize
- && GET_CODE (operands[2]) == CONST_INT
- && INTVAL (operands[2]) > 8
- && INTVAL (operands[2]) <= 16)
+ if (TARGET_64BIT)
{
- rtx temp = gen_reg_rtx (DImode);
+ /* On the mips16, a shift of more than 8 is a four byte
+ instruction, so, for a shift between 8 and 16, it is just as
+ fast to do two shifts of 8 or less. If there is a lot of
+ shifting going on, we may win in CSE. Otherwise combine will
+ put the shifts back together again. */
+ if (TARGET_MIPS16
+ && optimize
+ && GET_CODE (operands[2]) == CONST_INT
+ && INTVAL (operands[2]) > 8
+ && INTVAL (operands[2]) <= 16)
+ {
+ rtx temp = gen_reg_rtx (DImode);
- emit_insn (gen_lshrdi3_internal (temp, operands[1], GEN_INT (8)));
- emit_insn (gen_lshrdi3_internal (operands[0], temp,
- GEN_INT (INTVAL (operands[2]) - 8)));
+ emit_insn (gen_lshrdi3_internal4 (temp, operands[1], GEN_INT (8)));
+ emit_insn (gen_lshrdi3_internal4 (operands[0], temp,
+ GEN_INT (INTVAL (operands[2]) - 8)));
+ DONE;
+ }
+
+ emit_insn (gen_lshrdi3_internal4 (operands[0], operands[1],
+ operands[2]));
DONE;
}
+
+ operands[3] = gen_reg_rtx (SImode);
})
(define_insn "lshrdi3_internal"
+ [(set (match_operand:DI 0 "register_operand" "=&d")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "register_operand" "d")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16"
+ "sll\t%3,%2,26\;\
+bgez\t%3,1f%#\;\
+srl\t%L0,%M1,%2\;\
+%(b\t3f\;\
+move\t%M0,%.%)\
+\n\n\
+%~1:\;\
+%(beq\t%3,%.,2f\;\
+srl\t%L0,%L1,%2%)\
+\n\;\
+subu\t%3,%.,%2\;\
+sll\t%3,%M1,%3\;\
+or\t%L0,%L0,%3\n\
+%~2:\;\
+srl\t%M0,%M1,%2\n\
+%~3:"
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "48")])
+
+
+(define_insn "lshrdi3_internal2"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16
+ && (INTVAL (operands[2]) & 32) != 0"
+{
+ operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);
+ return "srl\t%L0,%M1,%2\;move\t%M0,%.";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "8")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0) (lshiftrt:SI (subreg:SI (match_dup 1) 4) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 4) (const_int 0))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 32) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4) (lshiftrt:SI (subreg:SI (match_dup 1) 0) (match_dup 2)))
+ (set (subreg:SI (match_dup 0) 0) (const_int 0))]
+
+ "operands[2] = GEN_INT (INTVAL (operands[2]) & 0x1f);")
+
+
+(define_insn "lshrdi3_internal3"
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
+ (match_operand:SI 2 "small_int" "IJK")))
+ (clobber (match_operand:SI 3 "register_operand" "=d"))]
+ "!TARGET_64BIT && !TARGET_MIPS16
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+{
+ int amount = INTVAL (operands[2]);
+
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+
+ return "srl\t%L0,%L1,%2\;sll\t%3,%M1,%4\;or\t%L0,%L0,%3\;srl\t%M0,%M1,%2";
+}
+ [(set_attr "type" "multi")
+ (set_attr "mode" "DI")
+ (set_attr "length" "16")])
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && !WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 0)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (ashift:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (ior:SI (subreg:SI (match_dup 0) 0)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (lshiftrt:DI (match_operand:DI 1 "register_operand")
+ (match_operand:SI 2 "small_int")))
+ (clobber (match_operand:SI 3 "register_operand"))]
+ "reload_completed && WORDS_BIG_ENDIAN && !TARGET_64BIT
+ && !TARGET_DEBUG_D_MODE && !TARGET_MIPS16
+ && GET_CODE (operands[0]) == REG && REGNO (operands[0]) < FIRST_PSEUDO_REGISTER
+ && GET_CODE (operands[1]) == REG && REGNO (operands[1]) < FIRST_PSEUDO_REGISTER
+ && (INTVAL (operands[2]) & 63) < 32
+ && (INTVAL (operands[2]) & 63) != 0"
+
+ [(set (subreg:SI (match_dup 0) 4)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 4)
+ (match_dup 2)))
+
+ (set (match_dup 3)
+ (ashift:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 4)))
+
+ (set (subreg:SI (match_dup 0) 4)
+ (ior:SI (subreg:SI (match_dup 0) 4)
+ (match_dup 3)))
+
+ (set (subreg:SI (match_dup 0) 0)
+ (lshiftrt:SI (subreg:SI (match_dup 1) 0)
+ (match_dup 2)))]
+{
+ int amount = INTVAL (operands[2]);
+ operands[2] = GEN_INT (amount & 31);
+ operands[4] = GEN_INT ((-amount) & 31);
+})
+
+
+(define_insn "lshrdi3_internal4"
[(set (match_operand:DI 0 "register_operand" "=d")
(lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
(match_operand:SI 2 "arith_operand" "dI")))]
^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Richard Sandiford @ 2004-07-19 16:59 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, gcc-patches, linux-mips

"Maciej W. Rozycki" <macro@linux-mips.org> writes:
> Linux relies on simple operations (addition/subtraction and shifts) on
> "long long" variables being implemented inline without a call to
> libgcc, which isn't linked in.

Sorry, but I don't think this is a reasonable expectation for 64-bit
shifts on 32-bit targets.  If Linux insists on not using libgcc, it
should provide:

> After your change Linux has unresolved references to external __ashldi3(),
> __ashrdi3() and __lshrdi3() functions at the final link.

...these functions itself.

Richard
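Out-of-line helpers of the kind suggested here are short. A hedged C sketch of the two right shifts, decomposed into 32-bit word operations the way the kernel's arch/*/lib implementations are (`lshr_di`/`ashr_di` are hypothetical names standing in for libgcc's `__lshrdi3`/`__ashrdi3`; arithmetic right shift of negative signed values is assumed, as GCC provides):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __lshrdi3: logical right shift by 0..63.  */
static uint64_t lshr_di(uint64_t x, unsigned int n)
{
    uint32_t lo = (uint32_t) x;
    uint32_t hi = (uint32_t) (x >> 32);

    if (n == 0)
        return x;
    if (n & 32)                         /* count >= 32: high word only */
        return hi >> (n & 31);
    lo = (lo >> n) | (hi << (32 - n));  /* spill high bits into low word */
    return ((uint64_t) (hi >> n) << 32) | lo;
}

/* Hypothetical stand-in for __ashrdi3: arithmetic right shift by 0..63.  */
static int64_t ashr_di(int64_t x, unsigned int n)
{
    uint32_t lo = (uint32_t) x;
    int32_t hi = (int32_t) (x >> 32);   /* sign bits live in the high word */

    if (n == 0)
        return x;
    if (n & 32)                         /* count >= 32: replicate sign bit */
        return ((int64_t) (hi >> 31) << 32) | (uint32_t) (hi >> (n & 31));
    lo = (lo >> n) | ((uint32_t) hi << (32 - n));
    return ((int64_t) (hi >> n) << 32) | lo;
}
```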
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: David Edelsohn @ 2004-07-19 17:32 UTC (permalink / raw)
To: Richard Sandiford
Cc: Maciej W. Rozycki, Ralf Baechle, gcc-patches, linux-mips

>>>>> Richard Sandiford writes:

> Sorry, but I don't think this is a reasonable expectation for 64-bit
> shifts on 32-bit targets.  If Linux insists on not using libgcc,
> it should provide:

Other targets provide those DImode operations.  Part of the mission of
GCC is to support GNU and GNU/Linux.

David
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Maciej W. Rozycki @ 2004-07-19 17:33 UTC (permalink / raw)
To: Richard Sandiford; +Cc: Ralf Baechle, gcc-patches, linux-mips

On Mon, 19 Jul 2004, Richard Sandiford wrote:

> > Linux relies on simple operations (addition/subtraction and shifts) on
> > "long long" variables being implemented inline without a call to
> > libgcc, which isn't linked in.
>
> Sorry, but I don't think this is a reasonable expectation for 64-bit
> shifts on 32-bit targets.  If Linux insists on not using libgcc,

See e.g. "http://www.ussg.iu.edu/hypermail/linux/kernel/0009.2/0655.html"
for the rationale behind that.

> it should provide:
>
> > After your change Linux has unresolved references to external __ashldi3(),
> > __ashrdi3() and __lshrdi3() functions at the final link.
>
> ...these functions itself.

Well, other targets, like the i386 (which didn't even have a 64-bit
variation until recently), do not force Linux to go through such
contortions.  I can't see a reason why MIPS should be different -- it's
not any harder to implement shifts for this processor than for an
average other platform.

Anyway, the patch works for me and it has been published so that others
can use it, so I have no incentive to do anything else, sorry.

  Maciej
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Richard Sandiford @ 2004-07-19 17:37 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Ralf Baechle, gcc-patches, linux-mips

"Maciej W. Rozycki" <macro@linux-mips.org> writes:
> Well, other targets, like the i386 (which didn't even have a 64-bit
> variation until recently), do not force Linux to go through such
> contortions.

But ARM targets (for instance) don't provide inline 64-bit shifts, do
they?  I don't think there's anything special in the MIPS ISA that
makes them easier for MIPS than they are for ARM.

Richard
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Richard Henderson @ 2004-07-19 21:38 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Richard Sandiford, Ralf Baechle, gcc-patches, linux-mips

On Mon, Jul 19, 2004 at 07:33:14PM +0200, Maciej W. Rozycki wrote:
> Well, other targets, like the i386 (which didn't even have a 64-bit
> variation until recently)...

Except that the 80386 has 64-bit shifts in hardware.

And in rebuttal to the "does not make Linux jump through hoops"
argument, see arch/*/lib/ for arm, h8300, m68k, sparc, v850.

r~
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Maciej W. Rozycki @ 2004-07-23 14:41 UTC (permalink / raw)
To: Richard Henderson
Cc: Richard Sandiford, Ralf Baechle, gcc-patches, linux-mips

On Mon, 19 Jul 2004, Richard Henderson wrote:

> > Well, other targets, like the i386 (which didn't even have a 64-bit
> > variation until recently)...
>
> Except that the 80386 has 64-bit shifts in hardware.

Indeed -- I tend to forget about those two, sigh...

> And in rebuttal to the "does not make Linux jump through hoops"
> argument, see arch/*/lib/ for arm, h8300, m68k, sparc, v850.

OK -- but then is there any way to convince GCC to embed a "static
inline" version of these functions instead of emitting a call?
Sometimes putting these eight (or nine for ashrdi3) instructions
inline would be a performance win.

  Maciej
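For illustration, a "static inline" helper of the kind asked about might look like the C sketch below (`inline_shl64` is a hypothetical name). Note the caveat in the code: this cannot transparently replace the compiler's own `__ashldi3` libcall, so it only helps code that invokes it by name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "static inline" variant of a 64-bit left shift.  GCC
   still emits an __ashldi3 call for an ordinary `long long' shift, so
   this only helps callers that use it explicitly.  */
static inline uint64_t inline_shl64(uint64_t x, unsigned int n)
{
    uint32_t lo = (uint32_t) x, hi = (uint32_t) (x >> 32);

    if (n == 0)
        return x;
    if (n & 32)                 /* count >= 32: only the low word survives */
        return (uint64_t) (lo << (n & 31)) << 32;
    /* count < 32: spill the low word's high bits into the high word */
    return ((uint64_t) ((hi << n) | (lo >> (32 - n))) << 32)
           | (lo << n);
}
```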
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
From: Richard Henderson @ 2004-07-23 20:27 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Richard Sandiford, Ralf Baechle, gcc-patches, linux-mips

On Fri, Jul 23, 2004 at 04:41:35PM +0200, Maciej W. Rozycki wrote:
> OK -- but then is there any way to convince GCC to embed a "static
> inline" version of these functions instead of emitting a call?

No.

> Sometimes putting these eight (or nine for ashrdi3) instructions
> inline would be a performance win.

Sometimes, maybe.  I suspect you'll find that in general it's
nothing but bloat.

r~
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-07-23 20:27 ` Richard Henderson @ 2004-07-23 21:12 ` Ralf Baechle 2004-07-26 11:56 ` Maciej W. Rozycki 0 siblings, 1 reply; 36+ messages in thread From: Ralf Baechle @ 2004-07-23 21:12 UTC (permalink / raw) To: Richard Henderson, Maciej W. Rozycki, Richard Sandiford, gcc-patches, linux-mips On Fri, Jul 23, 2004 at 01:27:03PM -0700, Richard Henderson wrote: > > Sometimes putting these eight (or nine for ashrdi3) instructions > > inline would be a performance win. > > Sometimes, maybe. I suspect you'll find that in general it's > nothing but bloat. With a bit of hand waving, because I haven't done benchmarks, I guess Richard might be right. The subroutine calling overhead on modern processors is rather low, and smaller code means better cache hit rates ... Ralf ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-07-23 21:12 ` Ralf Baechle @ 2004-07-26 11:56 ` Maciej W. Rozycki 2004-08-02 20:03 ` Nigel Stephens 0 siblings, 1 reply; 36+ messages in thread From: Maciej W. Rozycki @ 2004-07-26 11:56 UTC (permalink / raw) To: Ralf Baechle Cc: Richard Henderson, Richard Sandiford, gcc-patches, linux-mips On Fri, 23 Jul 2004, Ralf Baechle wrote: > With a bit of hand waving, because I haven't done benchmarks, I guess > Richard might be right. The subroutine calling overhead on modern > processors is rather low, and smaller code means better cache hit rates ... Well, I just worry that the call sequence may itself comprise at least as many instructions as the callee would take if inlined; there would then be no way for the call to be faster. That can happen even for a leaf function -- the call itself plus $ra saving/restoration is already four instructions. It is then enough for two temporaries to need preserving across such a call for the caller to grow by the same amount, and with three such temporaries you lose even for a non-leaf function. That's for a function containing a single call to such a shift -- if there are more, then you may win (but is that common?). So not only may it not be faster, the resulting code may be bigger as well. That said, GCC's current implementation of these operations is not exactly optimal for current MIPS processors. That's trivial to deal with in Linux, but would it be possible to pick a different implementation from libgcc based on the "-march=" setting, too? Maciej ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-07-26 11:56 ` Maciej W. Rozycki @ 2004-08-02 20:03 ` Nigel Stephens 2004-08-03 5:30 ` Richard Sandiford 0 siblings, 1 reply; 36+ messages in thread From: Nigel Stephens @ 2004-08-02 20:03 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Ralf Baechle, Richard Henderson, Richard Sandiford, gcc-patches, linux-mips Maciej W. Rozycki wrote: >On Fri, 23 Jul 2004, Ralf Baechle wrote: > > > >>With a bit of hand waiving because haven't done benchmarks I guess Richard >>might be right. The subroutine calling overhead on modern processors is >>rather low and smaller code means better cache hit rates ... >> >> > > Well, I just worry the call may itself include at least the same number >of instructions as the callee if inlined. There would be no way for it to >be faster. > > That may happen for a leaf function -- the call itself, plus $ra >saving/restoration is already four instructions. Now it's sufficient for >two statics to be needed to preserve temporaries across such a call and >the size of the caller is already the same. With three statics, you lose >even for a non-leaf function. That's for a function containing a single >call to such a shift -- if there are more, then you may win (but is it >common?). > > So not only it may not be faster, but the resulting code may be bigger as >well. That said, the current GCC's implementation of these operations is >not exactly optimal for current MIPS processors. That's trivial to deal >with in Linux, but would it be possible to pick a different implementation >from libgcc based on the "-march=" setting, too? > > > I second Maciej. My own recent experience when tuning the hell out of a software floating-point emulator was that efficient 64-bit shifts were really critical. 
I have a patch against gcc-3.4 which makes the 64-bit inline shifts somewhat smaller on ISAs which include the conditional move (movz/movn) instructions, but more importantly removes all branches from the inline code -- which can be very expensive on long-pipeline CPUs, since in this sort of code they tend to cause many branch mispredicts. Let me know if you want me to extract the patch -- here's a table of the number of instructions generated by the original md pattern and the patched version:

	         Instructions
	         Old     New
	ashldi3   12       9
	ashrdi3   12      12
	lshrdi3   12       9

If people really don't like the inline expansion, then maybe it could be enabled or disabled by a new -m option.

Nigel

-- 
Nigel Stephens               Mailto:nigel@mips.com
MIPS Technologies            Phone.: +44 1223 706200
The Fruit Farm               Direct: +44 1223 706207
Ely Road, Chittering         Fax...: +44 1223 706250
Cambridge CB5 9PH            Cell..: +44 7976 686470
England                      http://www.mips.com

^ permalink raw reply [flat|nested] 36+ messages in thread
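The branch-free movz/movn idea described above can be modelled in portable C, with the conditional moves replaced by mask selects. This is only an illustrative sketch, not code from Nigel's patch: the function name and the mask-based selects are inventions of this sketch, and the count is truncated to its low six bits, as the MIPS doubleword shift instructions would do.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative branch-free 64-bit left shift built from 32-bit
   operations, modelled on the movz/movn sequence discussed above.  */
static uint64_t ashl64_branchless(uint32_t lo, uint32_t hi, unsigned c)
{
	c &= 63;			/* dsllv-style count truncation */
	uint32_t new_lo = lo << (c & 31);
	/* Shifting by 1 first and then by (31 - c) gives lo >> (32 - c)
	   without an undefined shift, and yields 0 when c % 32 == 0.  */
	uint32_t carry = (lo >> 1) >> (31 - (c & 31));
	uint32_t new_hi = (hi << (c & 31)) | carry;
	/* All-ones when c >= 32: plays the role of the movn/movz pair.  */
	uint32_t sel = -(uint32_t)(c >> 5);
	uint32_t out_hi = (new_hi & ~sel) | (new_lo & sel);
	uint32_t out_lo = new_lo & ~sel;
	return ((uint64_t)out_hi << 32) | out_lo;
}
```

Compiled for a 32-bit target, each line corresponds roughly to one instruction of such a sequence; the two-step carry computation is what lets a count of zero carry nothing.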
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-02 20:03 ` Nigel Stephens @ 2004-08-03 5:30 ` Richard Sandiford 2004-08-03 9:22 ` Nigel Stephens 0 siblings, 1 reply; 36+ messages in thread From: Richard Sandiford @ 2004-08-03 5:30 UTC (permalink / raw) To: Nigel Stephens Cc: Maciej W. Rozycki, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Nigel Stephens <nigel@mips.com> writes: > I have a patch against gcc-3.4 which makes the 64-bit inline shifts > somewhat smaller on ISAs which include the conditional move > (movz/movn) instructions, but more importantly removes all branches > from the inline code - which can be very expensive on long pipeline > CPUs, since in this sort of code they tend to cause many branch > mispredicts. Let me know if you want me to extract the patch - here's > a table of the number of instructions generated by the original md > pattern and the patched version: > > Instructions > Old New > ashldi3 12 9 > ashrdi3 12 12 > lshrdi3 12 9 > > > If people really don't like the inline expansion, then maybe it could be > enabled or disabled by a new -m option. IMO, controlling with optimize_size would be enough. But it sounds from your description like the patch just adds a new hard-coded multi-insn asm string. Is that right? If so, I'd really like to avoid that. It would be much better IMO if we handled this in the target-independent parts of the compiler. We can already open-code certain non-native operations, it's "just" that wide shifts are a missing case. If we handle it in a target-independent way, with each insn exposed separately, we will be able to optimize special cases better. We'll also get the usual scheduling benefits. Richard ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-03 5:30 ` Richard Sandiford @ 2004-08-03 9:22 ` Nigel Stephens 2004-08-03 9:36 ` Richard Sandiford 0 siblings, 1 reply; 36+ messages in thread From: Nigel Stephens @ 2004-08-03 9:22 UTC (permalink / raw) To: Richard Sandiford Cc: Maciej W. Rozycki, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Richard Sandiford wrote: >Nigel Stephens <nigel@mips.com> writes: > > >>I have a patch against gcc-3.4 >><snip> >>If people really don't like the inline expansion, then maybe it could be >>enabled or disabled by a new -m option. >> >> > >IMO, controlling with optimize_size would be enough. > Yes, that sounds right. >But it sounds from >your description like the patch just adds a new hard-coded multi-insn >asm string. Is that right? If so, I'd really like to avoid that. > > > Yes, and I totally agree with you. >It would much better IMO if we handle this in the target-independent >parts of the compiler. We can already open-code certain non-native >operations, it's "just" that wide shifts are a missing case. > > > >If we handle it in a target-independent way, with each insn exposed >separately, we will be able to optimize special cases better. >We'll also get the usual scheduling benefits. > > I agree that we should open-code it for the obvious reasons, but does it have to be target independent, or could/should we prototype it with define_expand? Nigel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-03 9:22 ` Nigel Stephens @ 2004-08-03 9:36 ` Richard Sandiford 2004-08-03 9:54 ` Nigel Stephens 0 siblings, 1 reply; 36+ messages in thread From: Richard Sandiford @ 2004-08-03 9:36 UTC (permalink / raw) To: Nigel Stephens Cc: Maciej W. Rozycki, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Nigel Stephens <nigel@mips.com> writes: >>If we handle it in a target-independent way, with each insn exposed >>separately, we will be able to optimize special cases better. >>We'll also get the usual scheduling benefits. > > I agree that we should open-code it for the obvious reasons, but does it > have to be target independent, or could/should we prototype it with > define_expand? I think we should only use define_expands if there's a truly MIPS-specific feature in the expansion (as there is in the block move stuff, for example, where we use left/right loads and stores). Now obviously I'm only guessing what insn sequence you're using, but I suspect it doesn't involve anything that the middle-end couldn't work out from stock optabs. If there are different trade-offs to be made during the expansion, they should probably be predicated on rtx_costs. Richard ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-03 9:36 ` Richard Sandiford @ 2004-08-03 9:54 ` Nigel Stephens 2004-08-04 19:57 ` Maciej W. Rozycki 0 siblings, 1 reply; 36+ messages in thread From: Nigel Stephens @ 2004-08-03 9:54 UTC (permalink / raw) To: Richard Sandiford Cc: Maciej W. Rozycki, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Richard Sandiford wrote: >I think we should only use define_expands if there's a truly >MIPS-specific feature in the expansion (as there is in the block >move stuff, for example, where we use left/right loads and stores). > > > Fair enough. >Now obviously I'm only guessing what insn sequence you're using, > > OK, the simplest thing is for me to attach the define_insns. See below. Note that there is one slightly controversial aspect of these sequences, which is that they don't truncate the shift count, so a shift outside of the range 0 to 63 will generate an "unusual" result. This didn't cause any regression failures, and I believe that this is strictly speaking acceptable for C, since a shift is undefined outside of this range - but it could cause some "buggy" code to break. It wouldn't be hard to add an extra mask with 0x3f if people were nervous about this - it's just that I didn't have enough spare temp registers within the constraints of the existing DImode patterns. 
---- cut here ---

;; XXX Would be better done using define_expand, so it can be scheduled
;; XXX Note won't handle a shift count outside the range 0 - 63
(define_insn "ashldi3_internal_movc"
  [(set (match_operand:DI 0 "register_operand" "=&d")
	(ashift:DI (match_operand:DI 1 "register_operand" "d")
		   (match_operand:SI 2 "register_operand" "d")))
   (clobber (match_operand:SI 3 "register_operand" "=&d"))]
  "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16 && ISA_HAS_CONDMOVE"
  "subu\t%3,%.,%2\;\
sll\t%M0,%M1,%2\;\
srl\t%3,%L1,%3\;\
sll\t%L0,%L1,%2\;\
movz\t%3,%.,%2\;\
or\t%M0,%M0,%3\;\
and\t%3,%2,32\;\
movn\t%M0,%L0,%3\;\
movn\t%L0,%.,%3"
  [(set_attr "type" "darith")
   (set_attr "mode" "DI")
   (set_attr "length" "36")])

;; Same length as before, but avoids branches
;; XXX Note won't handle a shift count outside the range 0 - 63
(define_insn "ashrdi3_internal_movc"
  [(set (match_operand:DI 0 "register_operand" "=&d")
	(ashiftrt:DI (match_operand:DI 1 "register_operand" "d")
		     (match_operand:SI 2 "register_operand" "d")))
   (clobber (match_operand:SI 3 "register_operand" "=&d"))]
  "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16 && ISA_HAS_CONDMOVE"
  "subu\t%3,%.,%2\;\
srl\t%L0,%L1,%2\;\
sll\t%3,%M1,%3\;\
sra\t%M0,%M1,%2\;\
movz\t%3,%.,%2\;\
or\t%L0,%L0,%3\;\
and\t%3,%2,32\;\
movn\t%L0,%M0,%3\;\
movn\t%M0,%.,%3\;\
movn\t%3,%L0,%3\;\
sra\t%3,%3,31\;\
or\t%M0,%M0,%3"
  [(set_attr "type" "darith")
   (set_attr "mode" "DI")
   (set_attr "length" "48")])

;;; XXX Note won't handle a shift count outside the range 0 - 63
(define_insn "lshrdi3_internal_movc"
  [(set (match_operand:DI 0 "register_operand" "=&d")
	(lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
		     (match_operand:SI 2 "register_operand" "d")))
   (clobber (match_operand:SI 3 "register_operand" "=&d"))]
  "!TARGET_64BIT && !TARGET_DEBUG_G_MODE && !TARGET_MIPS16 && ISA_HAS_CONDMOVE"
  "subu\t%3,%.,%2\;\
srl\t%L0,%L1,%2\;\
sll\t%3,%M1,%3\;\
srl\t%M0,%M1,%2\;\
movz\t%3,%.,%2\;\
or\t%L0,%L0,%3\;\
and\t%3,%2,32\;\
movn\t%L0,%M0,%3\;\
movn\t%M0,%.,%3"
  [(set_attr "type" "darith")
   (set_attr "mode" "DI")
   (set_attr "length" "36")])

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-03 9:54 ` Nigel Stephens @ 2004-08-04 19:57 ` Maciej W. Rozycki 2004-08-04 20:37 ` Nigel Stephens 2004-08-07 19:01 ` Richard Sandiford 0 siblings, 2 replies; 36+ messages in thread From: Maciej W. Rozycki @ 2004-08-04 19:57 UTC (permalink / raw) To: Nigel Stephens Cc: Richard Sandiford, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips On Tue, 3 Aug 2004, Nigel Stephens wrote: > Note that there is one slightly controversial aspect of these sequences, > which is that they don't truncate the shift count, so a shift outside of > the range 0 to 63 will generate an "unusual" result. This didn't cause > any regression failures, and I believe that this is strictly speaking > acceptable for C, since a shift is undefined outside of this range - but > it could cause some "buggy" code to break. It wouldn't be hard to add an > extra mask with 0x3f if people were nervous about this - it's just that > I didn't have enough spare temp registers within the constraints of the > existing DImode patterns. Well, masking is trivial with no additional temporary :-) and for ashrdi3 we can "cheat" and use $at to require only a single additional instruction compared to the others. Here are the proposals I referred to previously. Instruction counts are 9, 9 and 10, respectively, as I had missed an additional instruction required to handle shifts by 0 (or actually any multiple of 64). The semantics they implement correspond to those of dsllv, dsrlv and dsrav, respectively. I've expressed them in terms of functions rather than RTL patterns, but a conversion is trivial. This form was simply easier for me to validate, and they can be used as libgcc function replacements for Linux for MIPS IV and higher ISAs.
long long __ashldi3(long long v, int c)
{
	long long r;
	long r0;

	asm(
		"sllv %L0, %L2, %3\n\t"
		"sllv %M0, %M2, %3\n\t"
		"not %1, %3\n\t"
		"srlv %1, %L2, %1\n\t"
		"srl %1, %1, 1\n\t"
		"or %M0, %M0, %1\n\t"
		"andi %1, %3, 0x20\n\t"
		"movn %M0, %L0, %1\n\t"
		"movn %L0, $0, %1"
		: "=&r" (r), "=&r" (r0)
		: "r" (v), "r" (c));
	return r;
}

unsigned long long __lshrdi3(unsigned long long v, int c)
{
	unsigned long long r;
	long r0;

	asm(
		"srlv %M0, %M2, %3\n\t"
		"srlv %L0, %L2, %3\n\t"
		"not %1, %3\n\t"
		"sllv %1, %M2, %1\n\t"
		"sll %1, %1, 1\n\t"
		"or %L0, %L0, %1\n\t"
		"andi %1, %3, 0x20\n\t"
		"movn %L0, %M0, %1\n\t"
		"movn %M0, $0, %1"
		: "=&r" (r), "=&r" (r0)
		: "r" (v), "r" (c));
	return r;
}

long long __ashrdi3(long long v, int c)
{
	long long r;
	long r0;

	asm(
		"not %1, %3\n\t"
		"srav %M0, %M2, %3\n\t"
		"srlv %L0, %L2, %3\n\t"
		"sllv %1, %M2, %1\n\t"
		"sll %1, %1, 1\n\t"
		"or %L0, %L0, %1\n\t"
		"andi %1, %3, 0x20\n\t"
		".set push\n\t"
		".set noat\n\t"
		"sra $1, %M2, 31\n\t"
		"movn %L0, %M0, %1\n\t"
		"movn %M0, $1, %1\n\t"
		".set pop"
		: "=&r" (r), "=&r" (r0)
		: "r" (v), "r" (c));
	return r;
}

I don't know if the middle-end is capable of expressing these operations, but they are pure ALU code, so I'd expect it to be.

 Maciej

^ permalink raw reply [flat|nested] 36+ messages in thread
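The __ashrdi3 sequence above can be cross-checked without MIPS hardware using a portable C model. The following is an illustrative sketch, not code from the thread: the function name is invented, the movn selects become mask operations, the "sra $1, %M2, 31" sign fill is made explicit, and it assumes that `>>` on a negative signed operand behaves arithmetically, as GCC defines it.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of the 64-bit arithmetic right shift sequence above.
   lo/hi are the two halves of the input; the count is truncated to
   six bits as dsrav would do.  */
static int64_t ashr64_model(uint32_t lo, uint32_t hi, unsigned c)
{
	c &= 63;
	uint32_t new_hi = (uint32_t)((int32_t)hi >> (c & 31));
	/* hi << (32 - c) done as two shifts, so c % 32 == 0 carries 0.  */
	uint32_t carry = (hi << (31 - (c & 31))) << 1;
	uint32_t new_lo = (lo >> (c & 31)) | carry;
	uint32_t sign = (uint32_t)((int32_t)hi >> 31);	/* "sra $1, %M2, 31" */
	uint32_t sel = -(uint32_t)(c >> 5);		/* movn-style select */
	uint32_t out_lo = (new_lo & ~sel) | (new_hi & sel);
	uint32_t out_hi = (new_hi & ~sel) | (sign & sel);
	return (int64_t)(((uint64_t)out_hi << 32) | out_lo);
}
```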
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-04 19:57 ` Maciej W. Rozycki @ 2004-08-04 20:37 ` Nigel Stephens 2004-08-04 20:54 ` Maciej W. Rozycki 2004-08-07 19:01 ` Richard Sandiford 1 sibling, 1 reply; 36+ messages in thread From: Nigel Stephens @ 2004-08-04 20:37 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Richard Sandiford, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Maciej W. Rozycki wrote: > Here are my proposals I've referred to previously. Instruction counts >are 9, 9 and 10, respectively, as I've missed an additional instruction >required to handle shifts by 0 (or actually any multiples of 64). > IMHO handling a shift by zero correctly is important. > "not %1, %3\n\t" > "srlv %1, %L2, %1\n\t" > "srl %1, %1, 1\n\t" > Why not the shorter: > "neg %1, %3\n\t" > "srlv %1, %L2, %1\n\t" > > > And then in __ashrdi3: "andi %1, %3, 0x20\n\t" ".set push\n\t" ".set noat\n\t" "sra $1, %M2, 31\n\t" "movn %L0, %M0, %1\n\t" "movn %M0, $1, %1\n\t" ".set pop" Cute, but I think that should be "sra $1, %M0, 31\n\t" (i.e %M0 not %M2) Nigel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-04 20:37 ` Nigel Stephens @ 2004-08-04 20:54 ` Maciej W. Rozycki 2004-08-04 23:39 ` Nigel Stephens 0 siblings, 1 reply; 36+ messages in thread From: Maciej W. Rozycki @ 2004-08-04 20:54 UTC (permalink / raw) To: Nigel Stephens Cc: Richard Sandiford, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips On Wed, 4 Aug 2004, Nigel Stephens wrote: > > Here are my proposals I've referred to previously. Instruction counts > >are 9, 9 and 10, respectively, as I've missed an additional instruction > >required to handle shifts by 0 (or actually any multiples of 64). > > IMHO handling a shift by zero correctly is important. Agreed -- hence the additional instruction needed. > > "not %1, %3\n\t" > > "srlv %1, %L2, %1\n\t" > > "srl %1, %1, 1\n\t" > > > > Why not the shorter: > > > "neg %1, %3\n\t" > > "srlv %1, %L2, %1\n\t" Notice the difference -- this shorter code doesn't handle shifts by zero correctly. ;-) > And then in __ashrdi3: > > "andi %1, %3, 0x20\n\t" > ".set push\n\t" > ".set noat\n\t" > "sra $1, %M2, 31\n\t" > "movn %L0, %M0, %1\n\t" > "movn %M0, $1, %1\n\t" > ".set pop" > > Cute, but I think that should be > > "sra $1, %M0, 31\n\t" > > (i.e %M0 not %M2) Well, I've tested it for all shift counts and it works properly as is -- we only care about the value of bit #31 being shifted in, and at this stage it's the same in both registers. So it's just a matter of style. Maciej ^ permalink raw reply [flat|nested] 36+ messages in thread
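The shift-by-zero point in this exchange is easy to demonstrate with a small C model (written here purely for illustration; srlv is modelled as using only the low five bits of its count, as MIPS defines it):

```c
#include <assert.h>
#include <stdint.h>

/* MIPS srlv only looks at the low five bits of the count.  */
static uint32_t srlv(uint32_t x, uint32_t c)
{
	return x >> (c & 31);
}

/* Carry bits moved out of the low word during a 64-bit left shift,
   for counts 0..31: should be lo >> (32 - c), and 0 when c == 0.  */
static uint32_t carry_not(uint32_t lo, uint32_t c)
{
	/* not; srlv; srl -- one instruction longer, but c == 0 gives 0.  */
	return srlv(lo, ~c) >> 1;
}

static uint32_t carry_neg(uint32_t lo, uint32_t c)
{
	/* neg; srlv -- shorter, but (0 - 0) & 31 == 0, so for c == 0
	   the entire low word leaks into the carry.  */
	return srlv(lo, 0u - c);
}
```

The two agree for counts 1..31 and differ exactly at a count of zero, which is why the longer not/srlv/srl form was kept.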
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-04 20:54 ` Maciej W. Rozycki @ 2004-08-04 23:39 ` Nigel Stephens 0 siblings, 0 replies; 36+ messages in thread From: Nigel Stephens @ 2004-08-04 23:39 UTC (permalink / raw) To: Maciej W. Rozycki Cc: Richard Sandiford, Ralf Baechle, Richard Henderson, gcc-patches, linux-mips Maciej W. Rozycki wrote: >On Wed, 4 Aug 2004, Nigel Stephens wrote: > > > >>>Here are my proposals I've referred to previously. Instruction counts >>>are 9, 9 and 10, respectively, as I've missed an additional instruction >>>required to handle shifts by 0 (or actually any multiples of 64). >>> >>> >>IMHO handling a shift by zero correctly is important. >> >> > > Agreed, hence an additional instruction needed. > > > >>> "not %1, %3\n\t" >>> "srlv %1, %L2, %1\n\t" >>> "srl %1, %1, 1\n\t" >>> >>> >>> >>Why not the shorter: >> >> >> >>> "neg %1, %3\n\t" >>> "srlv %1, %L2, %1\n\t" >>> >>> > > Notice the difference -- this shorter code doesn't handle shifts by zero >correctly. ;-) > > Ah yes, I see. I did it with a conditional move to fix up after the shift, but same result. >>And then in __ashrdi3: >> >> "andi %1, %3, 0x20\n\t" >> ".set push\n\t" >> ".set noat\n\t" >> "sra $1, %M2, 31\n\t" >> "movn %L0, %M0, %1\n\t" >> "movn %M0, $1, %1\n\t" >> ".set pop" >> >>Cute, but I think that should be >> >> "sra $1, %M0, 31\n\t" >> >>(i.e %M0 not %M2) >> >> > > Well, I've tested it for all shift counts and it works properly as is -- >we care of the value of bit #31 to be shifted only and at this stage it's >the same in both registers. So it's just a matter of style. > > > OK, I see Nigel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets 2004-08-04 19:57 ` Maciej W. Rozycki 2004-08-04 20:37 ` Nigel Stephens @ 2004-08-07 19:01 ` Richard Sandiford 2004-08-09 22:08 ` Richard Henderson 1 sibling, 1 reply; 36+ messages in thread From: Richard Sandiford @ 2004-08-07 19:01 UTC (permalink / raw) To: Maciej W. Rozycki; +Cc: Nigel Stephens, gcc-patches, linux-mips FWIW, here's a work-in-progress patch. It takes the current optabs handling of doubleword shifts by constants and generalises it to handle variable shifts as well. I've tried to take advantage of SHIFT_COUNT_TRUNCATED where possible. Unfortunately, I'm not sure whether the code is good enough in the !SHIFT_COUNT_TRUNCATED case. E.g. ARM has special asm versions of 64-bit shifts, and it takes advantage of the fact that word shifts by 32 bits or more will set the result to 0 (ashl or lshr) or -1 (ashr). We can't do that in optabs.c since !SHIFT_COUNT_TRUNCATED doesn't mean that shifts _aren't_ truncated, it simply means that the behaviour of out-of-range shifts is undefined. I've checked that the new open-coded variable shifts do work on ARM, but perhaps they should be disabled on !SHIFT_COUNT_TRUNCATED targets, at least for now. Anyway, the SHIFT_COUNT_TRUNCATED version produces MIPS sequences that are the same length as Maciej's asm versions (assuming conditional moves are available). They should take 5 cycles on a typical 2-way superscalar target. I've bootstrapped & regression tested the patch on mips-sgi-irix6.5 but it's not really ready for approval yet. Just posting for info & comments. Of course, this doesn't let Linux off the hook. It still needs to define the libgcc shift functions if it wants to support -Os compilation. Richard

	* optabs.c (simplify_expand_binop, expand_superword_shift)
	(expand_subword_shift, expand_doubleword_shift): New functions.
	Generalize expand_binop's handling of doubleword shifts so that
	it can cope with non-constant shift amounts.
	(expand_binop): Replace said handling with expand_doubleword_shift.

Index: optabs.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/optabs.c,v
retrieving revision 1.231
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.231 optabs.c
*** optabs.c	22 Jul 2004 08:20:35 -0000	1.231
--- optabs.c	7 Aug 2004 07:14:19 -0000
*************** optab_for_tree_code (enum tree_code code
*** 709,715 ****
--- 709,981 ----
        return NULL;
      }
  }
+ \f
+ /* Like expand_binop, but return a constant rtx if the result can be
+    calculated at compile time.  The arguments and return value are
+    otherwise the same as for expand_binop.  */
+ 
+ static rtx
+ simplify_expand_binop (enum machine_mode mode, optab binoptab,
+ 		       rtx op0, rtx op1, rtx target, int unsignedp,
+ 		       enum optab_methods methods)
+ {
+   if (CONSTANT_P (op0) && CONSTANT_P (op1))
+     return simplify_gen_binary (binoptab->code, mode, op0, op1);
+   else
+     return expand_binop (mode, binoptab, op0, op1, target, unsignedp, methods);
+ }
+ 
+ /* This subroutine of expand_doubleword_shift handles the cases in which
+    the effective shift value is >= BITS_PER_WORD.  The arguments and return
+    value are the same as for the parent routine.  */
+ 
+ static bool
+ expand_superword_shift (enum machine_mode op1_mode, optab binoptab,
+ 			rtx outof_input, rtx op1,
+ 			rtx outof_target, rtx into_target,
+ 			int unsignedp, enum optab_methods methods)
+ {
+   rtx tmp;
+ 
+   /* If shifts aren't truncated, we should shift OUTOF_INPUT by
+      (OP1 - BITS_PER_WORD) bits and store the result in INTO_TARGET.
+      If shifts are truncated, we can just shift by OP1 itself.  */
+   if (SHIFT_COUNT_TRUNCATED && !CONSTANT_P (op1))
+     tmp = op1;
+   else
+     {
+       tmp = immed_double_const (BITS_PER_WORD, 0, op1_mode);
+       tmp = simplify_expand_binop (op1_mode, sub_optab, op1, tmp,
+ 				   0, true, methods);
+       if (tmp == 0)
+ 	return false;
+     }
+   tmp = expand_binop (word_mode, binoptab, outof_input, tmp,
+ 		      into_target, unsignedp, methods);
+   if (tmp == 0)
+     return false;
+   if (tmp != into_target)
+     emit_move_insn (into_target, tmp);
+ 
+   /* For a signed right shift, we must fill OUTOF_TARGET with copies
+      of the sign bit, otherwise we must fill it with zeros.  */
+   if (binoptab != ashr_optab)
+     tmp = CONST0_RTX (word_mode);
+   else
+     {
+       tmp = expand_binop (word_mode, binoptab,
+ 			  outof_input, GEN_INT (BITS_PER_WORD - 1),
+ 			  outof_target, unsignedp, methods);
+       if (tmp == 0)
+ 	return false;
+     }
+   if (tmp != outof_target)
+     emit_move_insn (outof_target, tmp);
+ 
+   return true;
+ }
+ 
+ /* This subroutine of expand_doubleword_shift handles the cases in which
+    the effective shift value is < BITS_PER_WORD.  The arguments and return
+    value are the same as for the parent routine.  */
+ 
+ static bool
+ expand_subword_shift (enum machine_mode op1_mode, optab binoptab,
+ 		      rtx outof_input, rtx into_input, rtx op1,
+ 		      rtx outof_target, rtx into_target,
+ 		      int unsignedp, enum optab_methods methods)
+ {
+   optab reverse_unsigned_shift, unsigned_shift;
+   rtx tmp, carries;
+ 
+   reverse_unsigned_shift = (binoptab == ashl_optab ? lshr_optab : ashl_optab);
+   unsigned_shift = (binoptab == ashl_optab ? ashl_optab : lshr_optab);
+ 
+   /* The low OP1 bits of INTO_TARGET come from the high bits of OUTOF_INPUT.
+      We therefore need to shift OUTOF_INPUT by (BITS_PER_WORD - OP1) bits in
+      the opposite direction to BINOPTAB.  */
+   if (GET_CODE (op1) != CONST_INT || INTVAL (op1) == 0)
+     {
+       /* We must avoid shifting by BITS_PER_WORD bits since that is
+ 	 either the same as a zero shift (if SHIFT_COUNT_TRUNCATED)
+ 	 or has undefined RTL semantics.  Do a single shift first,
+ 	 then shift by the remainder.  It's OK to use ~OP1 as the
+ 	 remainder if shift counts are truncated.  */
+       carries = expand_binop (word_mode, reverse_unsigned_shift,
+ 			      outof_input, const1_rtx, 0, unsignedp, methods);
+       if (SHIFT_COUNT_TRUNCATED && !CONSTANT_P (op1))
+ 	{
+ 	  tmp = immed_double_const (-1, -1, op1_mode);
+ 	  tmp = simplify_expand_binop (op1_mode, xor_optab, op1, tmp,
+ 				       0, true, methods);
+ 	}
+       else
+ 	{
+ 	  tmp = immed_double_const (BITS_PER_WORD - 1, 0, op1_mode);
+ 	  tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1,
+ 				       0, true, methods);
+ 	}
+     }
+   else
+     {
+       carries = outof_input;
+       tmp = immed_double_const (BITS_PER_WORD, 0, op1_mode);
+       tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1,
+ 				   0, true, methods);
+     }
+   if (tmp == 0 || carries == 0)
+     return false;
+   carries = expand_binop (word_mode, reverse_unsigned_shift,
+ 			  carries, tmp, 0, unsignedp, methods);
+   if (carries == 0)
+     return false;
+ 
+   /* Shift INTO_INPUT logically by OP1 bits...  */
+   tmp = expand_binop (word_mode, unsigned_shift, into_input, op1,
+ 		      0, unsignedp, methods);
+   if (tmp == 0)
+     return false;
+ 
+   /* ...and OR in the bits carried over from OUTOF_INPUT.  */
+   tmp = expand_binop (word_mode, ior_optab, carries, tmp,
+ 		      into_target, unsignedp, methods);
+   if (tmp == 0)
+     return false;
+   if (tmp != into_target)
+     emit_move_insn (into_target, tmp);
+ 
+   /* Use a standard word_mode shift for the out-of half.  */
+   tmp = expand_binop (word_mode, binoptab, outof_input, op1,
+ 		      outof_target, unsignedp, methods);
+   if (tmp == 0)
+     return false;
+   if (tmp != outof_target)
+     emit_move_insn (outof_target, tmp);
+ 
+   return true;
+ }
+ 
+ /* Expand a doubleword shift (ashl, ashr or lshr) using word-mode shifts.
+    OUTOF_INPUT is the input word that we are shifting away from and
+    INTO_INPUT is the word that we are shifting towards.  OUTOF_TARGET
+    and INTO_TARGET specify the equivalent words of the output.  OP1 is
+    the shift amount, which has mode OP1_MODE.  BINOPTAB, UNSIGNEDP and
+    METHODS are as for expand_binop.
+ 
+    This function must not assign to OUTOF_TARGET or INTO_TARGET
+    until it has completely finished with the input operands.
+ 
+    Return true if the shift could be successfully synthesized.  */
+ 
+ static bool
+ expand_doubleword_shift (enum machine_mode op1_mode, optab binoptab,
+ 			 rtx outof_input, rtx into_input, rtx op1,
+ 			 rtx outof_target, rtx into_target,
+ 			 int unsignedp, enum optab_methods methods)
+ {
+   rtx tmp, cmp1, cmp2;
+   rtx subword_label, done_label;
+ #ifdef HAVE_conditional_move
+   rtx start, outof_superword, into_superword;
+ #endif
+   enum rtx_code cmp_code;
+ 
+   /* Set CMP_CODE, CMP1 and CMP2 so that the rtx (CMP_CODE CMP1 CMP2)
+      is true when the effective shift value is less than BITS_PER_WORD.  */
+   tmp = immed_double_const (BITS_PER_WORD, 0, op1_mode);
+   if (SHIFT_COUNT_TRUNCATED)
+     {
+       cmp1 = simplify_expand_binop (op1_mode, and_optab,
+ 				    op1, tmp, 0, true, methods);
+       cmp2 = const0_rtx;
+       cmp_code = EQ;
+       if (cmp1 == 0)
+ 	return false;
+     }
+   else
+     {
+       cmp1 = op1;
+       cmp2 = tmp;
+       cmp_code = LT;
+     }
+ 
+   /* If we can compute the condition at compile time, pick the
+      appropriate subroutine.  */
+   tmp = simplify_relational_operation (cmp_code, SImode, op1_mode, cmp1, cmp2);
+   if (tmp != 0 && GET_CODE (tmp) == CONST_INT)
+     {
+       if (tmp == const0_rtx)
+ 	return expand_superword_shift (op1_mode, binoptab,
+ 				       outof_input, op1,
+ 				       outof_target, into_target,
+ 				       unsignedp, methods);
+       else
+ 	return expand_subword_shift (op1_mode, binoptab,
+ 				     outof_input, into_input, op1,
+ 				     outof_target, into_target,
+ 				     unsignedp, methods);
+     }
+ 
+ #ifdef HAVE_conditional_move
+   /* Try using conditional moves to select between the subword and
+      superword forms.  Do the superword version first, putting the
+      result into a temporary comprised of OUTOF_SUPERWORD and
+      INTO_SUPERWORD.  Then do the subword version and store it
+      directly into the final output.
+ 
+      Note that OUTOF_TARGET and INTO_SUPERWORD are both equal to
+      (shift OUTOF_INPUT OP1) when shift counts are truncated.  */
+   outof_superword = gen_reg_rtx (word_mode);
+   into_superword = gen_reg_rtx (word_mode);
+ 
+   start = get_last_insn ();
+   if (expand_superword_shift (op1_mode, binoptab,
+ 			      outof_input, op1,
+ 			      outof_superword, into_superword,
+ 			      unsignedp, methods)
+       && expand_subword_shift (op1_mode, binoptab,
+ 			       outof_input, into_input, op1,
+ 			       outof_target, into_target,
+ 			       unsignedp, methods)
+       && emit_conditional_move (into_target, cmp_code, cmp1, cmp2, op1_mode,
+ 				into_target, (SHIFT_COUNT_TRUNCATED
+ 					      ? outof_target
+ 					      : into_superword),
+ 				word_mode, unsignedp)
+       && emit_conditional_move (outof_target, cmp_code, cmp1, cmp2, op1_mode,
+ 				outof_target, outof_superword, word_mode,
+ 				unsignedp))
+     return true;
+ 
+   delete_insns_since (start);
+ #endif
+ 
+   /* As a last resort, use branches to select the correct alternative.  */
+   subword_label = gen_label_rtx ();
+   done_label = gen_label_rtx ();
+ 
+   do_compare_rtx_and_jump (cmp1, cmp2, cmp_code, true, op1_mode,
+ 			   0, 0, subword_label);
+ 
+   if (!expand_superword_shift (op1_mode, binoptab,
+ 			       outof_input, op1,
+ 			       outof_target, into_target,
+ 			       unsignedp, methods))
+     return false;
+ 
+   emit_jump_insn (gen_jump (done_label));
+   emit_barrier ();
+   emit_label (subword_label);
+ 
+   if (!expand_subword_shift (op1_mode, binoptab,
+ 			     outof_input, into_input, op1,
+ 			     outof_target, into_target,
+ 			     unsignedp, methods))
+     return false;
+ 
+   emit_label (done_label);
+   return true;
+ }
 \f
 /* Wrapper around expand_binop which takes an rtx code to specify
    the operation to perform, not an optab pointer.  All other
*************** expand_binop (enum machine_mode mode, op
*** 1035,1050 ****
  
    if ((binoptab == lshr_optab || binoptab == ashl_optab
         || binoptab == ashr_optab)
        && class == MODE_INT
!       && GET_CODE (op1) == CONST_INT
        && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
        && binoptab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
        && ashl_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
        && lshr_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing)
      {
!       rtx insns, inter, equiv_value;
        rtx into_target, outof_target;
        rtx into_input, outof_input;
!       int shift_count, left_shift, outof_word;
  
        /* If TARGET is the same as one of the operands, the REG_EQUAL note
  	 won't be accurate, so use a new target.  */
--- 1301,1317 ----
  
    if ((binoptab == lshr_optab || binoptab == ashl_optab
         || binoptab == ashr_optab)
        && class == MODE_INT
!       && (GET_CODE (op1) == CONST_INT || !optimize_size)
        && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
        && binoptab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
        && ashl_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
        && lshr_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing)
      {
!       rtx insns, equiv_value;
        rtx into_target, outof_target;
        rtx into_input, outof_input;
!       enum machine_mode op1_mode;
!       int left_shift, outof_word;
  
        /* If TARGET is the same as one of the operands, the REG_EQUAL note
  	 won't be accurate, so use a new target.  */
*************** expand_binop (enum machine_mode mode, op
*** 1053,1059 ****
  
        start_sequence ();
  
!       shift_count = INTVAL (op1);
  
        /* OUTOF_* is the word we are shifting bits away from, and
  	 INTO_* is the word that we are shifting bits towards, thus
--- 1320,1326 ----
  
        start_sequence ();
  
!       op1_mode = GET_MODE (op1) != VOIDmode ? GET_MODE (op1) : word_mode;
  
        /* OUTOF_* is the word we are shifting bits away from, and
  	 INTO_* is the word that we are shifting bits towards, thus
*************** expand_binop (enum machine_mode mode, op
*** 1069,1145 ****
        outof_input = operand_subword_force (op0, outof_word, mode);
        into_input = operand_subword_force (op0, 1 - outof_word, mode);
  
!       if (shift_count >= BITS_PER_WORD)
! 	{
! 	  inter = expand_binop (word_mode, binoptab,
! 				outof_input,
! 				GEN_INT (shift_count - BITS_PER_WORD),
! 				into_target, unsignedp, next_methods);
! 
! 	  if (inter != 0 && inter != into_target)
! 	    emit_move_insn (into_target, inter);
! 
! 	  /* For a signed right shift, we must fill the word we are shifting
! 	     out of with copies of the sign bit.  Otherwise it is zeroed.  */
! 	  if (inter != 0 && binoptab != ashr_optab)
! 	    inter = CONST0_RTX (word_mode);
! 	  else if (inter != 0)
! 	    inter = expand_binop (word_mode, binoptab,
! 				  outof_input,
! 				  GEN_INT (BITS_PER_WORD - 1),
! 				  outof_target, unsignedp, next_methods);
! 
! 	  if (inter != 0 && inter != outof_target)
! 	    emit_move_insn (outof_target, inter);
! 	}
!       else
  	{
! 	  rtx carries;
! 	  optab reverse_unsigned_shift, unsigned_shift;
! 
! 	  /* For a shift of less then BITS_PER_WORD, to compute the carry,
! 	     we must do a logical shift in the opposite direction of the
! 	     desired shift.  */
! 
! 	  reverse_unsigned_shift = (left_shift ? lshr_optab : ashl_optab);
! 
! 	  /* For a shift of less than BITS_PER_WORD, to compute the word
! 	     shifted towards, we need to unsigned shift the orig value of
! 	     that word.  */
! 
! 	  unsigned_shift = (left_shift ? ashl_optab : lshr_optab);
! 
! 	  carries = expand_binop (word_mode, reverse_unsigned_shift,
! 				  outof_input,
! 				  GEN_INT (BITS_PER_WORD - shift_count),
!
0, unsignedp, next_methods); - if (carries == 0) - inter = 0; - else - inter = expand_binop (word_mode, unsigned_shift, into_input, - op1, 0, unsignedp, next_methods); - - if (inter != 0) - inter = expand_binop (word_mode, ior_optab, carries, inter, - into_target, unsignedp, next_methods); - - if (inter != 0 && inter != into_target) - emit_move_insn (into_target, inter); - - if (inter != 0) - inter = expand_binop (word_mode, binoptab, outof_input, - op1, outof_target, unsignedp, next_methods); - - if (inter != 0 && inter != outof_target) - emit_move_insn (outof_target, inter); - } - - insns = get_insns (); - end_sequence (); - - if (inter != 0) - { if (binoptab->code != UNKNOWN) equiv_value = gen_rtx_fmt_ee (binoptab->code, mode, op0, op1); else --- 1336,1349 ---- outof_input = operand_subword_force (op0, outof_word, mode); into_input = operand_subword_force (op0, 1 - outof_word, mode); ! if (expand_doubleword_shift (op1_mode, binoptab, ! outof_input, into_input, op1, ! outof_target, into_target, ! unsignedp, methods)) { ! insns = get_insns (); ! end_sequence (); if (binoptab->code != UNKNOWN) equiv_value = gen_rtx_fmt_ee (binoptab->code, mode, op0, op1); else *************** expand_binop (enum machine_mode mode, op *** 1148,1153 **** --- 1352,1358 ---- emit_no_conflict_block (insns, target, op0, op1, equiv_value); return target; } + end_sequence (); } /* Synthesize double word rotates from single word shifts. */ ^ permalink raw reply [flat|nested] 36+ messages in thread
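For illustration only (this is not GCC code): the word-by-word decomposition that the doubleword-shift expander performs can be sketched in plain C. `shl64_by_words` is a hypothetical name; it assumes a 32-bit word and a shift count already reduced to the range [0, 64):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of synthesizing a 64-bit left shift from
   32-bit word operations.  Assumes 0 <= n < 64.  */
static uint64_t
shl64_by_words (uint32_t lo, uint32_t hi, unsigned n)
{
  uint32_t out_lo, out_hi;
  if (n >= 32)
    {
      /* "Superword" case: the low input word is shifted entirely
	 into the high result word; the low result word is zero.  */
      out_hi = lo << (n - 32);
      out_lo = 0;
    }
  else if (n == 0)
    {
      out_hi = hi;
      out_lo = lo;
    }
  else
    {
      /* "Subword" case: the bits carried out of the low word are
	 computed with a logical shift in the opposite direction.  */
      uint32_t carries = lo >> (32 - n);
      out_hi = (hi << n) | carries;
      out_lo = lo << n;
    }
  return ((uint64_t) out_hi << 32) | out_lo;
}
```

The superword/subword split above is exactly the choice that the patch makes either at compile time (when the count is constant), with conditional moves, or with a branch.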
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-07 19:01 ` Richard Sandiford
@ 2004-08-09 22:08   ` Richard Henderson
  2004-08-10  5:30     ` Richard Sandiford
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2004-08-09 22:08 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Sat, Aug 07, 2004 at 08:01:43PM +0100, Richard Sandiford wrote:
> +  do_compare_rtx_and_jump (cmp1, cmp2, cmp_code, true, op1_mode,
> +			   0, 0, subword_label);
> +
> +  if (!expand_superword_shift (op1_mode, binoptab,
> +			       outof_input, op1,
> +			       outof_target, into_target,
> +			       unsignedp, methods))
> +    return false;

Return without cleaning up the branch emitted?  In particular, doing so
without emitting the labels will result in ICEs.


r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-09 22:08   ` Richard Henderson
@ 2004-08-10  5:30     ` Richard Sandiford
  2004-08-10 23:20       ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-08-10  5:30 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> On Sat, Aug 07, 2004 at 08:01:43PM +0100, Richard Sandiford wrote:
>> +  do_compare_rtx_and_jump (cmp1, cmp2, cmp_code, true, op1_mode,
>> +			   0, 0, subword_label);
>> +
>> +  if (!expand_superword_shift (op1_mode, binoptab,
>> +			       outof_input, op1,
>> +			       outof_target, into_target,
>> +			       unsignedp, methods))
>> +    return false;
>
> Return without cleaning up the branch emitted?  In particular,
> doing so without emitting the labels will result in ICEs.

The whole thing's in a sequence that gets discarded if
expand_doubleword_shift returns false.  Isn't that enough?

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-10  5:30     ` Richard Sandiford
@ 2004-08-10 23:20       ` Richard Henderson
  2004-08-11  0:24         ` Andreas Schwab
                          ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Richard Henderson @ 2004-08-10 23:20 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Tue, Aug 10, 2004 at 06:30:28AM +0100, Richard Sandiford wrote:
> The whole thing's in a sequence that gets discarded if
> expand_doubleword_shift returns false.  Isn't that enough?

Missed that, sorry.

Patch seems ok then.  We'd have to add a new macro/target flag
to handle non-truncating shifts -- we've got cases:

  (1) Large shift shifts out all bits (ARM)
  (2) Large shifts trap (VAX)
  (3) Shift count truncated to 31, always, which means QI/HI
      shifts yield undefined results with large shifts. (i386)


r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-10 23:20       ` Richard Henderson
@ 2004-08-11  0:24         ` Andreas Schwab
  0 siblings, 0 replies; 36+ messages in thread
From: Andreas Schwab @ 2004-08-11  0:24 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Richard Sandiford, Maciej W. Rozycki, Nigel Stephens, gcc-patches,
	linux-mips

Richard Henderson <rth@redhat.com> writes:
> Patch seems ok then.  We'd have to add a new macro/target flag
> to handle non-truncating shifts -- we've got cases:
>
>   (1) Large shift shifts out all bits (ARM)
>   (2) Large shifts trap (VAX)
>   (3) Shift count truncated to 31, always, which means QI/HI
>       shifts yield undefined results with large shifts. (i386)

  (4) Shift count reduced modulo 64 (m68k)

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-10 23:20       ` Richard Henderson
  2004-08-11  0:24         ` Andreas Schwab
@ 2004-08-11  0:40         ` Paul Brook
  2004-08-11  4:32           ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Paul Brook @ 2004-08-11  0:40 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Henderson, Richard Sandiford, Maciej W. Rozycki,
	Nigel Stephens, linux-mips

On Wednesday 11 August 2004 00:20, Richard Henderson wrote:
> On Tue, Aug 10, 2004 at 06:30:28AM +0100, Richard Sandiford wrote:
> > The whole thing's in a sequence that gets discarded if
> > expand_doubleword_shift returns false.  Isn't that enough?
>
> Missed that, sorry.
>
> Patch seems ok then.  We'd have to add a new macro/target flag
> to handle non-truncating shifts -- we've got cases:
>
> (1) Large shift shifts out all bits (ARM)

ARM is actually shift count modulo 256.

Paul

^ permalink raw reply	[flat|nested] 36+ messages in thread
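For illustration, the per-target behaviours listed in this subthread can all be modelled as a mask applied to the count before a well-defined word shift. `model_shl32` is a hypothetical helper, not anything in GCC, and the mask values are taken from the cases above:

```c
#include <assert.h>
#include <stdint.h>

/* Model of per-target shift-count handling (illustrative only).
   A truncating target applies `count & mask` before shifting:
   i386 word shifts use mask 31, m68k uses 63, and ARM's
   register-specified shifts effectively use 255.  */
static uint32_t
model_shl32 (uint32_t x, unsigned count, unsigned mask)
{
  count &= mask;
  /* If the masked count is still >= 32, every bit is shifted out
     (the ARM case for counts 32..255).  */
  return count >= 32 ? 0 : x << count;
}
```

A mask of 0 in the TARGET_SHIFT_TRUNCATION_MASK sense (introduced later in the thread) corresponds to making no promise at all about out-of-range counts.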
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-11  0:40         ` Paul Brook
@ 2004-08-11  4:32           ` Richard Henderson
  0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2004-08-11  4:32 UTC (permalink / raw)
  To: Paul Brook
  Cc: gcc-patches, Richard Sandiford, Maciej W. Rozycki, Nigel Stephens,
	linux-mips

On Wed, Aug 11, 2004 at 01:40:03AM +0100, Paul Brook wrote:
> ARM is actually shift count modulo 256.

Ah, well.


r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
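Because ARM reduces register-specified shift counts modulo 256, counts in [32, 255] shift out all bits. That behaviour is what makes a branch-free doubleword shift possible. The following is a hedged C model of the idea, not the code GCC emits (the real ARM sequence uses a conditional move in place of the final OR term); `wshl`/`wshr` model a word shift that yields zero for any count of 32 or more:

```c
#include <assert.h>
#include <stdint.h>

/* Word shifts that fill with zeros for counts >= 32, as ARM's
   register-specified shifts do for counts up to 255.  */
static uint32_t wshl (uint32_t x, unsigned n) { return n >= 32 ? 0 : x << n; }
static uint32_t wshr (uint32_t x, unsigned n) { return n >= 32 ? 0 : x >> n; }

/* 64-bit left shift with no comparison: for n < 32 the middle term
   supplies the carried bits, for n > 32 the last term supplies the
   superword result, and the unwanted terms vanish because their
   (wrapped) counts are >= 32.  Assumes 0 <= n < 64.  */
static uint64_t
shl64_branch_free (uint32_t lo, uint32_t hi, unsigned n)
{
  uint32_t out_hi = wshl (hi, n) | wshr (lo, 32u - n) | wshl (lo, n - 32u);
  return ((uint64_t) out_hi << 32) | wshl (lo, n);
}
```

At n == 32 both the carry term and the superword term evaluate to `lo`, so the OR still gives the right answer.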
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-10 23:20       ` Richard Henderson
  2004-08-11  0:24         ` Andreas Schwab
  2004-08-11  0:40         ` Paul Brook
@ 2004-08-31 19:51         ` Richard Sandiford
  2004-09-03  6:53           ` Richard Henderson
  2 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-08-31 19:51 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> Patch seems ok then.  We'd have to add a new macro/target flag
> to handle non-truncating shifts -- we've got cases:
>
> (1) Large shift shifts out all bits (ARM)
> (2) Large shifts trap (VAX)
> (3) Shift count truncated to 31, always, which means QI/HI
>     shifts yield undefined results with large shifts. (i386)

I'm not sure whether (2) really affects things much.  By default, the
code is supposed to do exactly what libgcc2.c would do, i.e.:

  - the double-word shift guarantees no particular behaviour for shift
    counts outside the range [0, BITS_PER_WORD * 2)

  - for counts inside that range, the code will only use well-defined
    word-mode shifts.

In the patch I posted originally, SHIFT_COUNT_TRUNCATED changed the
default behaviour in two ways:

  a) it guaranteed that the doubleword shift would truncate the shift
     count.

  b) it enabled some extra optimisations, particularly in the
     conditional move case.

As you say, using S_C_T was a bit limited, especially since it requires
a particular behaviour for unrelated things like ZERO_EXTRACT.  So, to
deal with (1) and (3) from your list, the patch below adds a new target
hook:

     int TARGET_SHIFT_TRUNCATION_MASK (enum machine_mode MODE)

     This function describes how the standard shift patterns for MODE
     deal with shifts by negative amounts or by more than the width of
     the mode.  *Note shift patterns::.
     On many machines, the shift patterns will apply a mask M to the
     shift count, meaning that a fixed-width shift of X by Y is
     equivalent to an arbitrary-width shift of X by Y & M.  If this is
     true for mode MODE, the function should return M, otherwise it
     should return 0.  A return value of 0 indicates that no
     particular behavior is guaranteed.

     Note that, unlike `SHIFT_COUNT_TRUNCATED', this function does
     _not_ apply to general shift rtxes; it applies only to
     instructions that are generated by the named shift patterns.

     The default implementation of this function returns
     `GET_MODE_BITSIZE (MODE) - 1' if `SHIFT_COUNT_TRUNCATED' and 0
     otherwise.  This definition is always safe, but if
     `SHIFT_COUNT_TRUNCATED' is false, and some shift patterns
     nevertheless truncate the shift count, you may get better code by
     overriding it.

Thus the optimisations from (b) that used to be conditional on S_C_T
are now conditional on:

  TARGET_SHIFT_TRUNCATION_MASK (word_mode) == BITS_PER_WORD - 1

The truncation behaviour from (a) is guaranteed if:

  TARGET_SHIFT_TRUNCATION_MASK (double_word_mode) == BITS_PER_WORD * 2 - 1

although the optimisation only handles the latter if the former is
also true.  In other cases, it punts unless T_S_T_M returns 0 for
double_word_mode.

The point of the new hook is that it allows us to optimise case (1)
from your list.  If:

  TARGET_SHIFT_TRUNCATION_MASK (word_mode) >= BITS_PER_WORD * 2 - 1

then (because out-of-range shift counts are undefined for the
doubleword shifts) we can use:

  outof_target = (shift outof_input op1)

as per config/arm/lib1funcs.asm.

One potential drawback of all this is that it generates rtxes that
rely on the behaviour of shifts by more than the word width.  At the
moment, simplify-rtx.c will fold such shifts using whatever the host
compiler thinks is suitable.
E.g.:

	case ASHIFT:
	  if (arg1 < 0)
	    return 0;

	  if (SHIFT_COUNT_TRUNCATED)
	    arg1 %= width;

	  val = ((unsigned HOST_WIDE_INT) arg0) << arg1;
	  break;

which accepts any positive arg1, even if !SHIFT_COUNT_TRUNCATED.

This seems pretty dubious anyway.  What if a define_expand in the
backend uses shifts to implement a complex named pattern?  I'd have
thought the backend would be free to use target-specific knowledge
about what that shift does with out-of-range values.  And if we are
later able to constant-fold the result, the code above might not do
what the target machine would do.

The patch therefore refuses to optimise out-of-range counts unless
SHIFT_COUNT_TRUNCATED.  It also fixes the following sign-extension
code:

	  /* Bootstrap compiler may not have sign extended the right shift.
	     Manually extend the sign to insure bootstrap cc matches gcc.  */
	  if (arg0s < 0 && arg1 > 0)
	    val |= ((HOST_WIDE_INT) -1) << (HOST_BITS_PER_WIDE_INT - arg1);

which isn't right for modes whose width is != HOST_BITS_PER_WIDE_INT.

As an example:

	unsigned long long
	f (unsigned long long x, int y)
	{
	  return x << y;
	}

is now implemented thus for arm-elf-gcc -O2 -fno-schedule-insns:

	mov	r1, r1, asl r2
	rsb	r3, r2, #32
	orr	r1, r1, r0, lsr r3
	subs	ip, r2, #32
	movpl	r1, r0, asl ip
	mov	r0, r0, asl r2
	bx	lr

(without -fno-schedule-insns, the scheduler will increase register
pressure and force some call-saved registers live.)  This sequence is
at least the same length as the hand-coded lib1funcs.asm version, so I
hope it's better than the call we get now.

For mips64-elf-gcc -O2 -march=rm9000 -mabi=32, we get:

	nor	$7,$0,$6
	srl	$8,$5,1
	sll	$2,$4,$6
	srl	$8,$8,$7
	sll	$3,$5,$6
	or	$2,$8,$2
	andi	$6,$6,0x20
	movn	$2,$3,$6
	j	$31
	movn	$3,$0,$6

which is nicely superscalar, and the same length as Maciej's
hand-coded version.  As before, the patch will fall back on jumps if
conditional moves aren't available.

Bootstrapped & regression tested on i686-pc-linux-gnu and
mips-sgi-irix6.5.
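The sign-extension fix described above can be illustrated in C. `fold_ashiftrt` is a hypothetical stand-in for the ASHIFTRT constant folding, using a 64-bit host word; it assumes, as GCC arranges, that arg0s has already been sign-extended from the mode's width, and it extends from `width` rather than from the host width:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative fold of an arithmetic right shift of a WIDTH-bit
   value held sign-extended in a 64-bit host word.  The old code
   extended from HOST_BITS_PER_WIDE_INT (wrong when width differs);
   the fix extends from WIDTH.  Assumes 0 < arg1 < width.  */
static int64_t
fold_ashiftrt (int64_t arg0s, unsigned arg1, unsigned width)
{
  uint64_t val = (uint64_t) arg0s >> arg1;   /* logical shift first */
  if (arg0s < 0)
    val |= (uint64_t) -1 << (width - arg1);  /* replicate the sign bit */
  return (int64_t) val;
}
```

For a negative input the result stays correctly sign-extended whatever the mode width, which is exactly what the patched simplify-rtx.c code achieves with `width - arg1`.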
Also tested on arm-elf (default language set, test pattern arm-elf{,-mthumb}). OK to install? Richard * doc/md.texi (shift patterns): New anchor. Add reference to TARGET_SHIFT_TRUNCATION_MASK. * doc/tm.texi (TARGET_SHIFT_TRUNCATION_MASK): Document. * target.h (shift_truncation_mask): New target hook. * targhook.h (default_shift_truncation_mask): Declare. * targhook.c (default_shift_truncation_mask): Define. * target-def.h (TARGET_SHIFT_TRUNCATION_MASK): Define. (TARGET_INITIALIZER): Include it. * simplify-rtx.c (simplify_binary_operation): Combine ASHIFT, ASHIFTRT and LSHIFTRT cases. Truncate arg1 if SHIFT_COUNT_TRUNCATED, otherwise reject all out-of-range values. Fix sign-extension code for modes whose width is smaller than HOST_BITS_PER_WIDE_INT. * optabs.c (simplify_expand_binop, force_expand_binop): New functions. (expand_superword_shift, expand_subword_shift): Likewise. (expand_doubleword_shift_condmove, expand_doubleword_shift): Likewise. (expand_binop): Use them to implement double-word shifts. * config/arm/arm.c (arm_shift_truncation_mask): New function. (TARGET_SHIFT_TRUNCATION_MASK): Define. Index: doc/md.texi =================================================================== RCS file: /cvs/gcc/gcc/gcc/doc/md.texi,v retrieving revision 1.108 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.108 md.texi *** doc/md.texi 23 Aug 2004 05:55:46 -0000 1.108 --- doc/md.texi 31 Aug 2004 18:44:55 -0000 *************** quotient or remainder and generate the a *** 2884,2896 **** @item @samp{udivmod@var{m}4} Similar, but does unsigned division. @cindex @code{ashl@var{m}3} instruction pattern @item @samp{ashl@var{m}3} Arithmetic-shift operand 1 left by a number of bits specified by operand 2, and store the result in operand 0. Here @var{m} is the mode of operand 0 and operand 1; operand 2's mode is specified by the instruction pattern, and the compiler will convert the operand to that ! mode before generating the instruction. 
@cindex @code{ashr@var{m}3} instruction pattern @cindex @code{lshr@var{m}3} instruction pattern --- 2884,2899 ---- @item @samp{udivmod@var{m}4} Similar, but does unsigned division. + @anchor{shift patterns} @cindex @code{ashl@var{m}3} instruction pattern @item @samp{ashl@var{m}3} Arithmetic-shift operand 1 left by a number of bits specified by operand 2, and store the result in operand 0. Here @var{m} is the mode of operand 0 and operand 1; operand 2's mode is specified by the instruction pattern, and the compiler will convert the operand to that ! mode before generating the instruction. The meaning of out-of-range shift ! counts can optionally be specified by @code{TARGET_SHIFT_TRUNCATION_MASK}. ! @xref{TARGET_SHIFT_TRUNCATION_MASK}. @cindex @code{ashr@var{m}3} instruction pattern @cindex @code{lshr@var{m}3} instruction pattern Index: doc/tm.texi =================================================================== RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v retrieving revision 1.360 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.360 tm.texi *** doc/tm.texi 29 Aug 2004 22:10:44 -0000 1.360 --- doc/tm.texi 31 Aug 2004 18:45:09 -0000 *************** the implied truncation of the shift inst *** 8731,8736 **** --- 8731,8761 ---- You need not define this macro if it would always have the value of zero. @end defmac + @anchor{TARGET_SHIFT_TRUNCATION_MASK} + @deftypefn {Target Hook} int TARGET_SHIFT_TRUNCATION_MASK (enum machine_mode @var{mode}) + This function describes how the standard shift patterns for @var{mode} + deal with shifts by negative amounts or by more than the width of the mode. + @xref{shift patterns}. + + On many machines, the shift patterns will apply a mask @var{m} to the + shift count, meaning that a fixed-width shift of @var{x} by @var{y} is + equivalent to an arbitrary-width shift of @var{x} by @var{y & m}. If + this is true for mode @var{mode}, the function should return @var{m}, + otherwise it should return 0. 
A return value of 0 indicates that no + particular behavior is guaranteed. + + Note that, unlike @code{SHIFT_COUNT_TRUNCATED}, this function does + @emph{not} apply to general shift rtxes; it applies only to instructions + that are generated by the named shift patterns. + + The default implementation of this function returns + @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED} + and 0 otherwise. This definition is always safe, but if + @code{SHIFT_COUNT_TRUNCATED} is false, and some shift patterns + nevertheless truncate the shift count, you may get better code + by overriding it. + @end deftypefn + @defmac TRULY_NOOP_TRUNCATION (@var{outprec}, @var{inprec}) A C expression which is nonzero if on this machine it is safe to ``convert'' an integer of @var{inprec} bits to one of @var{outprec} Index: target.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/target.h,v retrieving revision 1.109 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.109 target.h *** target.h 26 Aug 2004 00:24:34 -0000 1.109 --- target.h 31 Aug 2004 18:45:10 -0000 *************** struct gcc_target *** 378,383 **** --- 378,387 ---- /* Undo the effects of encode_section_info on the symbol string. */ const char * (* strip_name_encoding) (const char *); + /* If shift optabs for MODE are known to always truncate the shift count, + return the mask that they apply. Return 0 otherwise. */ + unsigned HOST_WIDE_INT (* shift_truncation_mask) (enum machine_mode mode); + /* True if MODE is valid for a pointer in __attribute__((mode("MODE"))). 
*/ bool (* valid_pointer_mode) (enum machine_mode mode); Index: targhooks.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/targhooks.h,v retrieving revision 2.18 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r2.18 targhooks.h *** targhooks.h 26 Aug 2004 00:24:34 -0000 2.18 --- targhooks.h 31 Aug 2004 18:45:10 -0000 *************** extern bool hook_bool_CUMULATIVE_ARGS_fa *** 32,37 **** --- 32,39 ---- extern bool default_pretend_outgoing_varargs_named (CUMULATIVE_ARGS *); extern enum machine_mode default_eh_return_filter_mode (void); + extern unsigned HOST_WIDE_INT default_shift_truncation_mask + (enum machine_mode); extern bool hook_bool_CUMULATIVE_ARGS_true (CUMULATIVE_ARGS *); extern tree default_cxx_guard_type (void); Index: targhooks.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/targhooks.c,v retrieving revision 2.27 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r2.27 targhooks.c *** targhooks.c 26 Aug 2004 00:24:34 -0000 2.27 --- targhooks.c 31 Aug 2004 18:45:10 -0000 *************** default_eh_return_filter_mode (void) *** 135,140 **** --- 135,148 ---- return word_mode; } + /* The default implementation of TARGET_SHIFT_TRUNCATION_MASK. */ + + unsigned HOST_WIDE_INT + default_shift_truncation_mask (enum machine_mode mode) + { + return SHIFT_COUNT_TRUNCATED ? GET_MODE_BITSIZE (mode) - 1 : 0; + } + /* Generic hook that takes a CUMULATIVE_ARGS pointer and returns true. 
*/ bool Index: target-def.h =================================================================== RCS file: /cvs/gcc/gcc/gcc/target-def.h,v retrieving revision 1.98 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.98 target-def.h *** target-def.h 26 Aug 2004 00:24:33 -0000 1.98 --- target-def.h 31 Aug 2004 18:45:11 -0000 *************** #define TARGET_STRIP_NAME_ENCODING defau *** 301,306 **** --- 301,310 ---- #define TARGET_BINDS_LOCAL_P default_binds_local_p #endif + #ifndef TARGET_SHIFT_TRUNCATION_MASK + #define TARGET_SHIFT_TRUNCATION_MASK default_shift_truncation_mask + #endif + #ifndef TARGET_VALID_POINTER_MODE #define TARGET_VALID_POINTER_MODE default_valid_pointer_mode #endif *************** #define TARGET_INITIALIZER \ *** 478,483 **** --- 482,488 ---- TARGET_BINDS_LOCAL_P, \ TARGET_ENCODE_SECTION_INFO, \ TARGET_STRIP_NAME_ENCODING, \ + TARGET_SHIFT_TRUNCATION_MASK, \ TARGET_VALID_POINTER_MODE, \ TARGET_SCALAR_MODE_SUPPORTED_P, \ TARGET_VECTOR_MODE_SUPPORTED_P, \ Index: simplify-rtx.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v retrieving revision 1.202 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.202 simplify-rtx.c *** simplify-rtx.c 27 Jul 2004 19:09:32 -0000 1.202 --- simplify-rtx.c 31 Aug 2004 18:45:15 -0000 *************** simplify_binary_operation (enum rtx_code *** 2343,2383 **** break; case LSHIFTRT: - /* If shift count is undefined, don't fold it; let the machine do - what it wants. But truncate it if the machine will do that. */ - if (arg1 < 0) - return 0; - - if (SHIFT_COUNT_TRUNCATED) - arg1 %= width; - - val = ((unsigned HOST_WIDE_INT) arg0) >> arg1; - break; - case ASHIFT: - if (arg1 < 0) - return 0; - - if (SHIFT_COUNT_TRUNCATED) - arg1 %= width; - - val = ((unsigned HOST_WIDE_INT) arg0) << arg1; - break; - case ASHIFTRT: ! if (arg1 < 0) ! return 0; ! if (SHIFT_COUNT_TRUNCATED) ! arg1 %= width; ! ! val = arg0s >> arg1; ! ! 
/* Bootstrap compiler may not have sign extended the right shift. ! Manually extend the sign to insure bootstrap cc matches gcc. */ ! if (arg0s < 0 && arg1 > 0) ! val |= ((HOST_WIDE_INT) -1) << (HOST_BITS_PER_WIDE_INT - arg1); break; case ROTATERT: --- 2343,2368 ---- break; case LSHIFTRT: case ASHIFT: case ASHIFTRT: ! /* Truncate the shift if SHIFT_COUNT_TRUNCATED, otherwise make sure the ! value is in range. We can't return any old value for out-of-range ! arguments because either the middle-end (via shift_truncation_mask) ! or the back-end might be relying on target-specific knowledge. ! Nor can we rely on shift_truncation_mask, since the shift might ! not be part of an ashlM3, lshrM3 or ashrM3 instruction. */ if (SHIFT_COUNT_TRUNCATED) ! arg1 = (unsigned HOST_WIDE_INT) arg1 % width; ! else if (arg1 < 0 || arg1 >= GET_MODE_BITSIZE (mode)) ! return 0; + val = (code == ASHIFT + ? ((unsigned HOST_WIDE_INT) arg0) << arg1 + : ((unsigned HOST_WIDE_INT) arg0) >> arg1); + + /* Sign-extend the result for arithmetic right shifts. */ + if (code == ASHIFTRT && arg0s < 0 && arg1 > 0) + val |= ((HOST_WIDE_INT) -1) << (width - arg1); break; case ROTATERT: Index: optabs.c =================================================================== RCS file: /cvs/gcc/gcc/gcc/optabs.c,v retrieving revision 1.235 diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.235 optabs.c *** optabs.c 19 Aug 2004 22:24:54 -0000 1.235 --- optabs.c 31 Aug 2004 18:45:20 -0000 *************** optab_for_tree_code (enum tree_code code *** 709,715 **** --- 709,1064 ---- return NULL; } } + \f + /* Like expand_binop, but return a constant rtx if the result can be + calculated at compile time. The arguments and return value are + otherwise the same as for expand_binop. 
*/ + + static rtx + simplify_expand_binop (enum machine_mode mode, optab binoptab, + rtx op0, rtx op1, rtx target, int unsignedp, + enum optab_methods methods) + { + if (CONSTANT_P (op0) && CONSTANT_P (op1)) + return simplify_gen_binary (binoptab->code, mode, op0, op1); + else + return expand_binop (mode, binoptab, op0, op1, target, unsignedp, methods); + } + + /* Like simplify_expand_binop, but always put the result in TARGET. + Return true if the expansion succeeded. */ + + static bool + force_expand_binop (enum machine_mode mode, optab binoptab, + rtx op0, rtx op1, rtx target, int unsignedp, + enum optab_methods methods) + { + rtx x = simplify_expand_binop (mode, binoptab, op0, op1, + target, unsignedp, methods); + if (x == 0) + return false; + if (x != target) + emit_move_insn (target, x); + return true; + } + + /* This subroutine of expand_doubleword_shift handles the cases in which + the effective shift value is >= BITS_PER_WORD. The arguments and return + value are the same as for the parent routine, except that SUPERWORD_OP1 + is the shift count to use when shifting OUTOF_INPUT into INTO_TARGET. + INTO_TARGET may be null if the caller has decided to calculate it. */ + + static bool + expand_superword_shift (optab binoptab, rtx outof_input, rtx superword_op1, + rtx outof_target, rtx into_target, + int unsignedp, enum optab_methods methods) + { + if (into_target != 0) + if (!force_expand_binop (word_mode, binoptab, outof_input, superword_op1, + into_target, unsignedp, methods)) + return false; + + if (outof_target != 0) + { + /* For a signed right shift, we must fill OUTOF_TARGET with copies + of the sign bit, otherwise we must fill it with zeros. 
*/ + if (binoptab != ashr_optab) + emit_move_insn (outof_target, CONST0_RTX (word_mode)); + else + if (!force_expand_binop (word_mode, binoptab, + outof_input, GEN_INT (BITS_PER_WORD - 1), + outof_target, unsignedp, methods)) + return false; + } + return true; + } + + /* This subroutine of expand_doubleword_shift handles the cases in which + the effective shift value is < BITS_PER_WORD. The arguments and return + value are the same as for the parent routine. */ + + static bool + expand_subword_shift (enum machine_mode op1_mode, optab binoptab, + rtx outof_input, rtx into_input, rtx op1, + rtx outof_target, rtx into_target, + int unsignedp, enum optab_methods methods, + unsigned HOST_WIDE_INT shift_mask) + { + optab reverse_unsigned_shift, unsigned_shift; + rtx tmp, carries; + + reverse_unsigned_shift = (binoptab == ashl_optab ? lshr_optab : ashl_optab); + unsigned_shift = (binoptab == ashl_optab ? ashl_optab : lshr_optab); + + /* The low OP1 bits of INTO_TARGET come from the high bits of OUTOF_INPUT. + We therefore need to shift OUTOF_INPUT by (BITS_PER_WORD - OP1) bits in + the opposite direction to BINOPTAB. */ + if (CONSTANT_P (op1) || shift_mask >= BITS_PER_WORD) + { + carries = outof_input; + tmp = immed_double_const (BITS_PER_WORD, 0, op1_mode); + tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1, + 0, true, methods); + } + else + { + /* We must avoid shifting by BITS_PER_WORD bits since that is either + the same as a zero shift (if shift_mask == BITS_PER_WORD - 1) or + has unknown behaviour. Do a single shift first, then shift by the + remainder. It's OK to use ~OP1 as the remainder if shift counts + are truncated to the mode size. 
*/ + carries = expand_binop (word_mode, reverse_unsigned_shift, + outof_input, const1_rtx, 0, unsignedp, methods); + if (shift_mask == BITS_PER_WORD - 1) + { + tmp = immed_double_const (-1, -1, op1_mode); + tmp = simplify_expand_binop (op1_mode, xor_optab, op1, tmp, + 0, true, methods); + } + else + { + tmp = immed_double_const (BITS_PER_WORD - 1, 0, op1_mode); + tmp = simplify_expand_binop (op1_mode, sub_optab, tmp, op1, + 0, true, methods); + } + } + if (tmp == 0 || carries == 0) + return false; + carries = expand_binop (word_mode, reverse_unsigned_shift, + carries, tmp, 0, unsignedp, methods); + if (carries == 0) + return false; + + /* Shift INTO_INPUT logically by OP1. This is the last use of INTO_INPUT + so the result can go directly into INTO_TARGET if convenient. */ + tmp = expand_binop (word_mode, unsigned_shift, into_input, op1, + into_target, unsignedp, methods); + if (tmp == 0) + return false; + + /* Now OR in the bits carried over from OUTOF_INPUT. */ + if (!force_expand_binop (word_mode, ior_optab, tmp, carries, + into_target, unsignedp, methods)) + return false; + + /* Use a standard word_mode shift for the out-of half. */ + if (outof_target != 0) + if (!force_expand_binop (word_mode, binoptab, outof_input, op1, + outof_target, unsignedp, methods)) + return false; + + return true; + } + + #ifdef HAVE_conditional_move + /* Try implementing expand_doubleword_shift using conditional moves. + The shift is by < BITS_PER_WORD if (CMP_CODE CMP1 CMP2) is true, + otherwise it is by >= BITS_PER_WORD. SUBWORD_OP1 and SUPERWORD_OP1 + are the shift counts to use in the former and latter case. All other + arguments are the same as the parent routine. 
*/ + + static bool + expand_doubleword_shift_condmove (enum machine_mode op1_mode, optab binoptab, + enum rtx_code cmp_code, rtx cmp1, rtx cmp2, + rtx outof_input, rtx into_input, + rtx subword_op1, rtx superword_op1, + rtx outof_target, rtx into_target, + int unsignedp, enum optab_methods methods, + unsigned HOST_WIDE_INT shift_mask) + { + rtx outof_superword, into_superword; + + /* Put the superword version of the output into OUTOF_SUPERWORD and + INTO_SUPERWORD. */ + outof_superword = outof_target != 0 ? gen_reg_rtx (word_mode) : 0; + if (outof_target != 0 && subword_op1 == superword_op1) + { + /* The value INTO_TARGET >> SUBWORD_OP1, which we later store in + OUTOF_TARGET, is the same as the value of INTO_SUPERWORD. */ + into_superword = outof_target; + if (!expand_superword_shift (binoptab, outof_input, superword_op1, + outof_superword, 0, unsignedp, methods)) + return false; + } + else + { + into_superword = gen_reg_rtx (word_mode); + if (!expand_superword_shift (binoptab, outof_input, superword_op1, + outof_superword, into_superword, + unsignedp, methods)) + return false; + } + + /* Put the subword version directly in OUTOF_TARGET and INTO_TARGET. */ + if (!expand_subword_shift (op1_mode, binoptab, + outof_input, into_input, subword_op1, + outof_target, into_target, + unsignedp, methods, shift_mask)) + return false; + + /* Select between them. Do the INTO half first because INTO_SUPERWORD + might be the current value of OUTOF_TARGET. */ + if (!emit_conditional_move (into_target, cmp_code, cmp1, cmp2, op1_mode, + into_target, into_superword, word_mode, false)) + return false; + + if (outof_target != 0) + if (!emit_conditional_move (outof_target, cmp_code, cmp1, cmp2, op1_mode, + outof_target, outof_superword, + word_mode, false)) + return false; + + return true; + } + #endif + + /* Expand a doubleword shift (ashl, ashr or lshr) using word-mode shifts. 
+    OUTOF_INPUT and INTO_INPUT are the two word-sized halves of the first
+    input operand; the shift moves bits in the direction OUTOF_INPUT->
+    INTO_TARGET.  OUTOF_TARGET and INTO_TARGET are the equivalent words
+    of the target.  OP1 is the shift count and OP1_MODE is its mode.
+    If OP1 is constant, it will have been truncated as appropriate
+    and is known to be nonzero.
+ 
+    If SHIFT_MASK is zero, the result of word shifts is undefined when the
+    shift count is outside the range [0, BITS_PER_WORD).  This routine must
+    avoid generating such shifts for OP1s in the range [0, BITS_PER_WORD * 2).
+ 
+    If SHIFT_MASK is nonzero, all word-mode shift counts are effectively
+    masked by it and shifts in the range [BITS_PER_WORD, SHIFT_MASK) will
+    fill with zeros or sign bits as appropriate.
+ 
+    If SHIFT_MASK is BITS_PER_WORD - 1, this routine will synthesise
+    a doubleword shift whose equivalent mask is BITS_PER_WORD * 2 - 1.
+    Doing this preserves semantics required by SHIFT_COUNT_TRUNCATED.
+    In all other cases, shifts by values outside [0, BITS_PER_UNIT * 2)
+    are undefined.
+ 
+    BINOPTAB, UNSIGNEDP and METHODS are as for expand_binop.  This function
+    may not use INTO_INPUT after modifying INTO_TARGET, and similarly for
+    OUTOF_INPUT and OUTOF_TARGET.  OUTOF_TARGET can be null if the parent
+    function wants to calculate it itself.
+ 
+    Return true if the shift could be successfully synthesized.  */
+ 
+ static bool
+ expand_doubleword_shift (enum machine_mode op1_mode, optab binoptab,
+ 			 rtx outof_input, rtx into_input, rtx op1,
+ 			 rtx outof_target, rtx into_target,
+ 			 int unsignedp, enum optab_methods methods,
+ 			 unsigned HOST_WIDE_INT shift_mask)
+ {
+   rtx superword_op1, tmp, cmp1, cmp2;
+   rtx subword_label, done_label;
+   enum rtx_code cmp_code;
+ 
+   /* See if word-mode shifts by BITS_PER_WORD...BITS_PER_WORD * 2 - 1 will
+      fill the result with sign or zero bits as appropriate.  If so, the value
+      of OUTOF_TARGET will always be (SHIFT OUTOF_INPUT OP1).
Recursively call
+      this routine to calculate INTO_TARGET (which depends on both OUTOF_INPUT
+      and INTO_INPUT), then emit code to set up OUTOF_TARGET.
+ 
+      This isn't worthwhile for constant shifts since the optimizers will
+      cope better with in-range shift counts.  */
+   if (shift_mask >= BITS_PER_WORD
+       && outof_target != 0
+       && !CONSTANT_P (op1))
+     {
+       if (!expand_doubleword_shift (op1_mode, binoptab,
+ 				    outof_input, into_input, op1,
+ 				    0, into_target,
+ 				    unsignedp, methods, shift_mask))
+ 	return false;
+       if (!force_expand_binop (word_mode, binoptab, outof_input, op1,
+ 			       outof_target, unsignedp, methods))
+ 	return false;
+       return true;
+     }
+ 
+   /* Set CMP_CODE, CMP1 and CMP2 so that the rtx (CMP_CODE CMP1 CMP2)
+      is true when the effective shift value is less than BITS_PER_WORD.
+      Set SUPERWORD_OP1 to the shift count that should be used to shift
+      OUTOF_INPUT into INTO_TARGET when the condition is false.  */
+   tmp = immed_double_const (BITS_PER_WORD, 0, op1_mode);
+   if (!CONSTANT_P (op1) && shift_mask == BITS_PER_WORD - 1)
+     {
+       /* Set CMP1 to OP1 & BITS_PER_WORD.  The result is zero iff OP1
+ 	 is a subword shift count.  */
+       cmp1 = simplify_expand_binop (op1_mode, and_optab, op1, tmp,
+ 				    0, true, methods);
+       cmp2 = CONST0_RTX (op1_mode);
+       cmp_code = EQ;
+       superword_op1 = op1;
+     }
+   else
+     {
+       /* Set CMP1 to OP1 - BITS_PER_WORD.  */
+       cmp1 = simplify_expand_binop (op1_mode, sub_optab, op1, tmp,
+ 				    0, true, methods);
+       cmp2 = CONST0_RTX (op1_mode);
+       cmp_code = LT;
+       superword_op1 = cmp1;
+     }
+   if (cmp1 == 0)
+     return false;
+ 
+   /* If we can compute the condition at compile time, pick the
+      appropriate subroutine.
*/
+   tmp = simplify_relational_operation (cmp_code, SImode, op1_mode, cmp1, cmp2);
+   if (tmp != 0 && GET_CODE (tmp) == CONST_INT)
+     {
+       if (tmp == const0_rtx)
+ 	return expand_superword_shift (binoptab, outof_input, superword_op1,
+ 				       outof_target, into_target,
+ 				       unsignedp, methods);
+       else
+ 	return expand_subword_shift (op1_mode, binoptab,
+ 				     outof_input, into_input, op1,
+ 				     outof_target, into_target,
+ 				     unsignedp, methods, shift_mask);
+     }
+ 
+ #ifdef HAVE_conditional_move
+   /* Try using conditional moves to generate straight-line code.  */
+   {
+     rtx start = get_last_insn ();
+     if (expand_doubleword_shift_condmove (op1_mode, binoptab,
+ 					  cmp_code, cmp1, cmp2,
+ 					  outof_input, into_input,
+ 					  op1, superword_op1,
+ 					  outof_target, into_target,
+ 					  unsignedp, methods, shift_mask))
+       return true;
+     delete_insns_since (start);
+   }
+ #endif
+ 
+   /* As a last resort, use branches to select the correct alternative.  */
+   subword_label = gen_label_rtx ();
+   done_label = gen_label_rtx ();
+ 
+   do_compare_rtx_and_jump (cmp1, cmp2, cmp_code, false, op1_mode,
+ 			   0, 0, subword_label);
+ 
+   if (!expand_superword_shift (binoptab, outof_input, superword_op1,
+ 			       outof_target, into_target,
+ 			       unsignedp, methods))
+     return false;
+ 
+   emit_jump_insn (gen_jump (done_label));
+   emit_barrier ();
+   emit_label (subword_label);
+ 
+   if (!expand_subword_shift (op1_mode, binoptab,
+ 			     outof_input, into_input, op1,
+ 			     outof_target, into_target,
+ 			     unsignedp, methods, shift_mask))
+     return false;
+ 
+   emit_label (done_label);
+   return true;
+ }
 \f
 /* Wrapper around expand_binop which takes an rtx code to specify
    the operation to perform, not an optab pointer.  All other
*************** expand_binop (enum machine_mode mode, op
*** 1035,1152 ****
   if ((binoptab == lshr_optab || binoptab == ashl_optab
        || binoptab == ashr_optab)
       && class == MODE_INT
!
      && GET_CODE (op1) == CONST_INT
       && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
       && binoptab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
       && ashl_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
       && lshr_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing)
     {
!       rtx insns, inter, equiv_value;
!       rtx into_target, outof_target;
!       rtx into_input, outof_input;
!       int shift_count, left_shift, outof_word;
! 
!       /* If TARGET is the same as one of the operands, the REG_EQUAL note
! 	 won't be accurate, so use a new target.  */
!       if (target == 0 || target == op0 || target == op1)
! 	target = gen_reg_rtx (mode);
! 
!       start_sequence ();
! 
!       shift_count = INTVAL (op1);
! 
!       /* OUTOF_* is the word we are shifting bits away from, and
! 	 INTO_* is the word that we are shifting bits towards, thus
! 	 they differ depending on the direction of the shift and
! 	 WORDS_BIG_ENDIAN.  */
! 
!       left_shift = binoptab == ashl_optab;
!       outof_word = left_shift ^ ! WORDS_BIG_ENDIAN;
! 
!       outof_target = operand_subword (target, outof_word, 1, mode);
!       into_target = operand_subword (target, 1 - outof_word, 1, mode);
! 
!       outof_input = operand_subword_force (op0, outof_word, mode);
!       into_input = operand_subword_force (op0, 1 - outof_word, mode);
! 
!       if (shift_count >= BITS_PER_WORD)
! 	{
! 	  inter = expand_binop (word_mode, binoptab,
! 				outof_input,
! 				GEN_INT (shift_count - BITS_PER_WORD),
! 				into_target, unsignedp, next_methods);
! 
! 	  if (inter != 0 && inter != into_target)
! 	    emit_move_insn (into_target, inter);
! 
! 	  /* For a signed right shift, we must fill the word we are shifting
! 	     out of with copies of the sign bit.  Otherwise it is zeroed.  */
! 	  if (inter != 0 && binoptab != ashr_optab)
! 	    inter = CONST0_RTX (word_mode);
! 	  else if (inter != 0)
! 	    inter = expand_binop (word_mode, binoptab,
! 				  outof_input,
! 				  GEN_INT (BITS_PER_WORD - 1),
! 				  outof_target, unsignedp, next_methods);
! 
! 	  if (inter != 0 && inter != outof_target)
! 	    emit_move_insn (outof_target, inter);
! 	}
!       else
! 	{
! 	  rtx carries;
! 
	  optab reverse_unsigned_shift, unsigned_shift;
! 
! 	  /* For a shift of less then BITS_PER_WORD, to compute the carry,
! 	     we must do a logical shift in the opposite direction of the
! 	     desired shift.  */
! 
! 	  reverse_unsigned_shift = (left_shift ? lshr_optab : ashl_optab);
! 
! 	  /* For a shift of less than BITS_PER_WORD, to compute the word
! 	     shifted towards, we need to unsigned shift the orig value of
! 	     that word.  */
! 
! 	  unsigned_shift = (left_shift ? ashl_optab : lshr_optab);
! 
! 	  carries = expand_binop (word_mode, reverse_unsigned_shift,
! 				  outof_input,
! 				  GEN_INT (BITS_PER_WORD - shift_count),
! 				  0, unsignedp, next_methods);
! 
! 	  if (carries == 0)
! 	    inter = 0;
! 	  else
! 	    inter = expand_binop (word_mode, unsigned_shift, into_input,
! 				  op1, 0, unsignedp, next_methods);
! 
! 	  if (inter != 0)
! 	    inter = expand_binop (word_mode, ior_optab, carries, inter,
! 				  into_target, unsignedp, next_methods);
! 
! 	  if (inter != 0 && inter != into_target)
! 	    emit_move_insn (into_target, inter);
! 
! 	  if (inter != 0)
! 	    inter = expand_binop (word_mode, binoptab, outof_input,
! 				  op1, outof_target, unsignedp, next_methods);
!	  if (inter != 0 && inter != outof_target)
! 	    emit_move_insn (outof_target, inter);
! 	}
! 
!       insns = get_insns ();
!       end_sequence ();
! 
!       if (inter != 0)
! 	{
! 	  if (binoptab->code != UNKNOWN)
! 	    equiv_value = gen_rtx_fmt_ee (binoptab->code, mode, op0, op1);
! 	  else
! 	    equiv_value = 0;
! 	  emit_no_conflict_block (insns, target, op0, op1, equiv_value);
! 	  return target;
 	}
     }
--- 1384,1454 ----
   if ((binoptab == lshr_optab || binoptab == ashl_optab
        || binoptab == ashr_optab)
       && class == MODE_INT
!      && (GET_CODE (op1) == CONST_INT || !optimize_size)
       && GET_MODE_SIZE (mode) == 2 * UNITS_PER_WORD
       && binoptab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
       && ashl_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing
       && lshr_optab->handlers[(int) word_mode].insn_code != CODE_FOR_nothing)
     {
!       unsigned HOST_WIDE_INT shift_mask, double_shift_mask;
!       enum machine_mode op1_mode;
! 
      double_shift_mask = targetm.shift_truncation_mask (mode);
!       shift_mask = targetm.shift_truncation_mask (word_mode);
!       op1_mode = GET_MODE (op1) != VOIDmode ? GET_MODE (op1) : word_mode;
! 
!       /* Apply the truncation to constant shifts.  */
!       if (double_shift_mask > 0 && GET_CODE (op1) == CONST_INT)
! 	op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
! 
!       if (op1 == CONST0_RTX (op1_mode))
! 	return op0;
! 
!       /* Make sure that this is a combination that expand_doubleword_shift
! 	 can handle.  See the comments there for details.  */
!       if (double_shift_mask == 0
! 	  || (shift_mask == BITS_PER_WORD - 1
! 	      && double_shift_mask == BITS_PER_WORD * 2 - 1))
! 	{
! 	  rtx insns, equiv_value;
! 	  rtx into_target, outof_target;
! 	  rtx into_input, outof_input;
! 	  int left_shift, outof_word;
! 
! 	  /* If TARGET is the same as one of the operands, the REG_EQUAL note
! 	     won't be accurate, so use a new target.  */
! 	  if (target == 0 || target == op0 || target == op1)
! 	    target = gen_reg_rtx (mode);
! 
! 	  start_sequence ();
! 
! 	  /* OUTOF_* is the word we are shifting bits away from, and
! 	     INTO_* is the word that we are shifting bits towards, thus
! 	     they differ depending on the direction of the shift and
! 	     WORDS_BIG_ENDIAN.  */
! 
! 	  left_shift = binoptab == ashl_optab;
! 	  outof_word = left_shift ^ ! WORDS_BIG_ENDIAN;
! 
! 	  outof_target = operand_subword (target, outof_word, 1, mode);
! 	  into_target = operand_subword (target, 1 - outof_word, 1, mode);
! 
! 	  outof_input = operand_subword_force (op0, outof_word, mode);
! 	  into_input = operand_subword_force (op0, 1 - outof_word, mode);
! 
! 	  if (expand_doubleword_shift (op1_mode, binoptab,
! 				       outof_input, into_input, op1,
! 				       outof_target, into_target,
! 				       unsignedp, methods, shift_mask))
! 	    {
! 	      insns = get_insns ();
! 	      end_sequence ();
! 	      equiv_value = gen_rtx_fmt_ee (binoptab->code, mode, op0, op1);
! 	      emit_no_conflict_block (insns, target, op0, op1, equiv_value);
! 	      return target;
! 	    }
! 
	  end_sequence ();
 	}
     }
Index: config/arm/arm.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/arm/arm.c,v
retrieving revision 1.399
diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.399 arm.c
*** config/arm/arm.c	25 Aug 2004 09:52:09 -0000	1.399
--- config/arm/arm.c	31 Aug 2004 18:45:34 -0000
*************** static tree arm_get_cookie_size (tree);
*** 173,179 ****
  static bool arm_cookie_has_size (void);
  static bool arm_cxx_cdtor_returns_this (void);
  static void arm_init_libfuncs (void);
! 
 \f
  /* Initialize the GCC target structure.  */
  #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
--- 173,179 ----
  static bool arm_cookie_has_size (void);
  static bool arm_cxx_cdtor_returns_this (void);
  static void arm_init_libfuncs (void);
! static unsigned HOST_WIDE_INT arm_shift_truncation_mask (enum machine_mode);
 \f
  /* Initialize the GCC target structure.  */
  #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
*************** #define TARGET_RTX_COSTS arm_slowmul_rtx
*** 246,251 ****
--- 246,253 ----
  #undef  TARGET_ADDRESS_COST
  #define TARGET_ADDRESS_COST arm_address_cost
+ #undef  TARGET_SHIFT_TRUNCATION_MASK
+ #define TARGET_SHIFT_TRUNCATION_MASK arm_shift_truncation_mask
  #undef  TARGET_VECTOR_MODE_SUPPORTED_P
  #define TARGET_VECTOR_MODE_SUPPORTED_P arm_vector_mode_supported_p
*************** arm_vector_mode_supported_p (enum machin
*** 14307,14309 ****
--- 14309,14322 ----
    return false;
  }
+ 
+ /* Implement TARGET_SHIFT_TRUNCATION_MASK.  SImode shifts use normal
+    ARM insns and therefore guarantee that the shift count is modulo 256.
+    DImode shifts (those implemented by libgcc1.asm or by optabs.c)
+    guarantee no particular behavior for out-of-range counts.  */
+ 
+ static unsigned HOST_WIDE_INT
+ arm_shift_truncation_mask (enum machine_mode mode)
+ {
+   return mode == SImode ? 255 : 0;
+ }

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-08-31 19:51           ` Richard Sandiford
@ 2004-09-03  6:53             ` Richard Henderson
  2004-09-03  7:05               ` Richard Sandiford
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2004-09-03  6:53 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Tue, Aug 31, 2004 at 08:51:20PM +0100, Richard Sandiford wrote:
>      int TARGET_SHIFT_TRUNCATION_MASK (enum machine_mode MODE)
...
>      Note that, unlike `SHIFT_COUNT_TRUNCATED', this function does
>      _not_ apply to general shift rtxes; it applies only to instructions
>      that are generated by the named shift patterns.

I'm not particularly thrilled about this notion.  I'd much prefer a
target hook that could replace SHIFT_COUNT_TRUNCATED.  How often are
the named patterns going to differ from the rtxes anyway?

r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  6:53             ` Richard Henderson
@ 2004-09-03  7:05               ` Richard Sandiford
  2004-09-03  7:08                 ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-09-03  7:05 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> On Tue, Aug 31, 2004 at 08:51:20PM +0100, Richard Sandiford wrote:
>>      int TARGET_SHIFT_TRUNCATION_MASK (enum machine_mode MODE)
> ...
>>      Note that, unlike `SHIFT_COUNT_TRUNCATED', this function does
>>      _not_ apply to general shift rtxes; it applies only to instructions
>>      that are generated by the named shift patterns.
>
> I'm not particularly thrilled about this notion.  I'd much prefer a
> target hook that could replace SHIFT_COUNT_TRUNCATED.  How often are
> the named patterns going to differ from the rtxes anyway?

Well, the problem is that SHIFT_COUNT_TRUNCATED applies to all shift
rtxes, including those synthesised by things like
combine.c:expand_compound_operation().  I assume that's why
SHIFT_COUNT_TRUNCATED is documented as follows:

     A C expression that is nonzero if on this machine the number of
     bits actually used for the count of a shift operation is equal to
     the number of bits needed to represent the size of the object
     being shifted.

     When this macro is nonzero, the compiler will assume that it is
     safe to omit a sign-extend, zero-extend, and certain bitwise `and'
     instructions that truncates the count of a shift operation.  On
     machines that have instructions that act on bit-fields at variable
     positions, which may include `bit test' instructions, a nonzero
     @code{SHIFT_COUNT_TRUNCATED} also enables deletion of truncations
     of the values that serve as arguments to bit-field instructions.
     If both types of instructions truncate the count (for shifts) and
     position (for bit-field operations), or if no variable-position
     bit-field instructions exist, you should define this macro.

     However, on some machines, such as the 80386 and the 680x0,
     truncation only applies to shift operations and not the (real or
     pretended) bit-field operations.  Define @code{SHIFT_COUNT_TRUNCATED}
     to be zero on such machines.  Instead, add patterns to the
     @file{md} file that include the implied truncation of the shift
     instructions.

I was deliberately trying to avoid this fuzziness with the new target
hook.  E.g., if it ever becomes useful to know that ashlsi3 truncates
on x86, then it will be possible to use the new hook there too, even
though the requirements of S_C_T aren't met.

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  7:05               ` Richard Sandiford
@ 2004-09-03  7:08                 ` Richard Henderson
  2004-09-03  7:11                   ` Richard Sandiford
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2004-09-03  7:08 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Fri, Sep 03, 2004 at 08:05:15AM +0100, Richard Sandiford wrote:
>      However, on some machines, such as the 80386 and the 680x0,
>      truncation only applies to shift operations and not the (real or
>      pretended) bit-field operations.  Define @code{SHIFT_COUNT_TRUNCATED}
>      to be zero on such machines.  Instead, add patterns to the
>      @file{md} file that include the implied truncation of the shift
>      instructions.
>
> I was deliberately trying to avoid this fuzziness with the new target hook.

Hmm.  I suppose we could pass the shift operation in there;
ASHIFT, LSHIFT, ZERO_EXTRACT, SIGN_EXTRACT.

r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  7:08                 ` Richard Henderson
@ 2004-09-03  7:11                   ` Richard Sandiford
  2004-09-03  7:20                     ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-09-03  7:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> On Fri, Sep 03, 2004 at 08:05:15AM +0100, Richard Sandiford wrote:
>>      However, on some machines, such as the 80386 and the 680x0,
>>      truncation only applies to shift operations and not the (real or
>>      pretended) bit-field operations.  Define @code{SHIFT_COUNT_TRUNCATED}
>>      to be zero on such machines.  Instead, add patterns to the
>>      @file{md} file that include the implied truncation of the shift
>>      instructions.
>>
>> I was deliberately trying to avoid this fuzziness with the new target hook.
>
> Hmm.  I suppose we could pass the shift operation in there;
> ASHIFT, LSHIFT, ZERO_EXTRACT, SIGN_EXTRACT.

But the point as I understand it is that the generic optimisers
(e.g. simplify-rtx.c) can't tell the difference between an ASHIFT
that came from an (ashift ...) in the instruction stream or from
something that was generated artificially by expand_compound_operation.

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  7:11                   ` Richard Sandiford
@ 2004-09-03  7:20                     ` Richard Henderson
  2004-09-03  7:29                       ` Richard Sandiford
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2004-09-03  7:20 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Fri, Sep 03, 2004 at 08:11:47AM +0100, Richard Sandiford wrote:
> But the point as I understand it is that the generic optimisers
> (e.g. simplify-rtx.c) can't tell the difference between an ASHIFT
> that came from an (ashift ...) in the instruction stream or from
> something that was generated artificially by expand_compound_operation.

That would be a bug in expand_compound_operation, I would think.

The alternative is to not add your new hook and do what you can
with the existing SHIFT_COUNT_TRUNCATED macro.  Which I recommend
that you do; I don't think you really want to have the shift bits
dependent on a cleanup / infrastructure change of this scale.

r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  7:20                     ` Richard Henderson
@ 2004-09-03  7:29                       ` Richard Sandiford
  2004-09-03 20:15                         ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-09-03  7:29 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> On Fri, Sep 03, 2004 at 08:11:47AM +0100, Richard Sandiford wrote:
>> But the point as I understand it is that the generic optimisers
>> (e.g. simplify-rtx.c) can't tell the difference between an ASHIFT
>> that came from an (ashift ...) in the instruction stream or from
>> something that was generated artificially by expand_compound_operation.
>
> That would be a bug in expand_compound_operation, I would think.
>
> The alternative is to not add your new hook and do what you can
> with the existing SHIFT_COUNT_TRUNCATED macro.  Which I recommend
> that you do; I don't think you really want to have the shift bits
> dependent on a cleanup / infrastructure change of this scale.

FWIW, that's what my original patch did:

    http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00461.html

The patch I posted this week was in response to the request for
wider-ranging target support:

    http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00606.html

But because it depended on S_C_T, the original patch produced much
worse code for ARM than the new one does.

Is the new target hook really that invasive?  It doesn't affect any
other code as such.  The only change not directly related to the
optabs expansion was the simplify-rtx.c thing, and like I said in my
covering message, I think that code's bogus anyway:

    This seems pretty dubious anyway.  What if a define_expand in the
    backend uses shifts to implement a complex named pattern?  I'd
    have thought the backend would be free to use target-specific
    knowledge about what that shift does with out-of-range values.
    And if we are later able to constant-fold the result, the code
    above might not do what the target machine would do.

To be honest, I'd still like to apply that hunk even if we go back
to S_C_T.

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03  7:29                       ` Richard Sandiford
@ 2004-09-03 20:15                         ` Richard Henderson
  2004-09-04  8:53                           ` Richard Sandiford
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2004-09-03 20:15 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Fri, Sep 03, 2004 at 08:29:39AM +0100, Richard Sandiford wrote:
> The patch I posted this week was in response to the request for
> wider-ranging target support:
>
>     http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00606.html

Ah.  I wasn't actually asking for wider target support.  I was
commenting that if we wanted something better, we'd have to add
new target support.  Sorry for the confusion.

> Is the new target hook really that invasive?  It doesn't affect any
> other code as such.

No... I guess not.  And it is a start if we ever do decide to
expand its meaning to replace S_C_T.

> The only change not directly related to the optabs expansion was the
> simplify-rtx.c thing, and like I said in my covering message, I think
> that code's bogus anyway ...

Agreed.

Ok, revised patch approved.

r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-03 20:15                         ` Richard Henderson
@ 2004-09-04  8:53                           ` Richard Sandiford
  2004-09-05  0:03                             ` Richard Henderson
  0 siblings, 1 reply; 36+ messages in thread
From: Richard Sandiford @ 2004-09-04  8:53 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

Richard Henderson <rth@redhat.com> writes:
> On Fri, Sep 03, 2004 at 08:29:39AM +0100, Richard Sandiford wrote:
>> The patch I posted this week was in response to the request for
>> wider-ranging target support:
>>
>>     http://gcc.gnu.org/ml/gcc-patches/2004-08/msg00606.html
>
> Ah.  I wasn't actually asking for wider target support.  I was
> commenting that if we wanted something better, we'd have to add
> new target support.

Ooops!  I misunderstood, sorry.

>> Is the new target hook really that invasive?  It doesn't affect any
>> other code as such.
>
> No... I guess not.  And it is a start if we ever do decide to
> expand its meaning to replace S_C_T.
[...]
> Ok, revised patch approved.

Thanks, applied.

Looking back, I see I didn't do a very good job of explaining why I
think S_C_T and this target hook are doing two different things.
A bit more explanation (mostly for the record, since I doubt I'm
saying anything surprising here):

I can only see two optimisations guarded by !S_C_T, both of them in
combine.c.  They only disallow S_C_T because we might have already
optimised the construct in a different way.  All other uses of S_C_T
are used for shifts.  So the important point is that: although S_C_T
requires a particular behaviour for things like ZERO_EXTRACT, it is
never actually used in a positive context for ZERO_EXTRACT rtxes.
S_C_T is only ever used in a positive context for shift rtxes.

As I understand it, the reason that S_C_T has those extra requirements
is that we have no way of tracking where a particular shift came from.
Sometimes it might come from an insn's PATTERN, sometimes it might be
the result of some temporary rewrites, such as that performed by
combine.c for "compound" operations.  The definition of S_C_T says
that these rewrites must be valid.

It sounds like you're saying that the target hook should eventually be
extended to cover shift rtxes rather than shift optabs, and that
anything which generates a shift rtx should make sure the rtx behaves
correctly wrt these hooks.  So, for example, the onus for verifying
the ZERO_EXTRACT rewrites should be with combine rather than the
backend.  Is that right?  If so, I'll try to look at that in the 3.6
timeframe.

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets
  2004-09-04  8:53                           ` Richard Sandiford
@ 2004-09-05  0:03                             ` Richard Henderson
  0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2004-09-05  0:03 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Maciej W. Rozycki, Nigel Stephens, gcc-patches, linux-mips

On Sat, Sep 04, 2004 at 09:53:28AM +0100, Richard Sandiford wrote:
> So, for example, the onus for verifying the ZERO_EXTRACT rewrites
> should be with combine rather than the backend.  Is that right?

Yes.

r~

^ permalink raw reply	[flat|nested] 36+ messages in thread
end of thread, other threads:[~2004-09-05  0:03 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-19 15:35 [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets Maciej W. Rozycki
2004-07-19 16:59 ` Richard Sandiford
2004-07-19 17:32   ` [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bittargets David Edelsohn
2004-07-19 17:33   ` [patch] MIPS/gcc: Revert removal of DImode shifts for 32-bit targets Maciej W. Rozycki
2004-07-19 17:37     ` Richard Sandiford
2004-07-19 21:38 ` Richard Henderson
2004-07-23 14:41   ` Maciej W. Rozycki
2004-07-23 20:27     ` Richard Henderson
2004-07-23 21:12       ` Ralf Baechle
2004-07-26 11:56       ` Maciej W. Rozycki
2004-08-02 20:03         ` Nigel Stephens
2004-08-03  5:30           ` Richard Sandiford
2004-08-03  9:22             ` Nigel Stephens
2004-08-03  9:36               ` Richard Sandiford
2004-08-03  9:54                 ` Nigel Stephens
2004-08-04 19:57                 ` Maciej W. Rozycki
2004-08-04 20:37                   ` Nigel Stephens
2004-08-04 20:54                     ` Maciej W. Rozycki
2004-08-04 23:39                       ` Nigel Stephens
2004-08-07 19:01             ` Richard Sandiford
2004-08-09 22:08               ` Richard Henderson
2004-08-10  5:30                 ` Richard Sandiford
2004-08-10 23:20                   ` Richard Henderson
2004-08-11  0:24                     ` Andreas Schwab
2004-08-11  0:40                       ` Paul Brook
2004-08-11  4:32                       ` Richard Henderson
2004-08-31 19:51                 ` Richard Sandiford
2004-09-03  6:53                   ` Richard Henderson
2004-09-03  7:05                     ` Richard Sandiford
2004-09-03  7:08                       ` Richard Henderson
2004-09-03  7:11                         ` Richard Sandiford
2004-09-03  7:20                           ` Richard Henderson
2004-09-03  7:29                             ` Richard Sandiford
2004-09-03 20:15                               ` Richard Henderson
2004-09-04  8:53                                 ` Richard Sandiford
2004-09-05  0:03                                   ` Richard Henderson