* [PATCH v2 0/6] tcg: Improve extract and deposit code gen
@ 2026-02-04  5:24 Richard Henderson
  2026-02-04  5:24 ` [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini

Supersedes: 20260119000740.50516-1-richard.henderson@linaro.org
[PATCH 0/3] tcg: Lower deposit/extract2 during optimize

Supersedes: 20260115135453.140870-1-pbonzini@redhat.com
[PATCH 0/2] tcg: improve instruction selection for extract and deposit_z

This merges those two patch sets.  Paolo, which inputs were
you looking at?

On random aarch64 guest binaries with an x86_64 host, I still
see most of the benefit coming from the lowering during optimize.
It's not a lot, but every little bit helps, I guess.


r~


Paolo Bonzini (2):
  tcg: Add tcg_op_imm_match
  tcg: target-dependent lowering of extract to shr/and

Richard Henderson (4):
  tcg/optimize: Lower unsupported deposit during optimize
  tcg/optimize: Lower unsupported extract2 during optimize
  tcg: Expand missing rotri with extract2
  tcg/optimize: possibly expand deposit into zero with shifts

 tcg/tcg-internal.h |   5 +
 tcg/optimize.c     | 279 ++++++++++++++++++++++++++++++++++++++++-----
 tcg/tcg-op.c       | 210 ++++++++--------------------------
 tcg/tcg.c          |  21 +++-
 4 files changed, 322 insertions(+), 193 deletions(-)

-- 
2.43.0




* [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-25 13:34   ` Jim MacArthur
  2026-02-04  5:24 ` [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 " Richard Henderson
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini

The expansions that we chose in tcg-op.c may be suboptimal.
Delay lowering until optimize, by which point constants have been
propagated and the known-zero/known-one masks have been computed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 194 +++++++++++++++++++++++++++++++++++++++++++------
 tcg/tcg-op.c   | 113 ++--------------------------
 2 files changed, 178 insertions(+), 129 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 801a0a2c68..890c8068fb 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1652,12 +1652,17 @@ static bool fold_ctpop(OptContext *ctx, TCGOp *op)
 
 static bool fold_deposit(OptContext *ctx, TCGOp *op)
 {
-    TempOptInfo *t1 = arg_info(op->args[1]);
-    TempOptInfo *t2 = arg_info(op->args[2]);
+    TCGArg ret = op->args[0];
+    TCGArg arg1 = op->args[1];
+    TCGArg arg2 = op->args[2];
     int ofs = op->args[3];
     int len = op->args[4];
-    int width = 8 * tcg_type_size(ctx->type);
-    uint64_t z_mask, o_mask, s_mask;
+    TempOptInfo *t1 = arg_info(arg1);
+    TempOptInfo *t2 = arg_info(arg2);
+    int width;
+    uint64_t z_mask, o_mask, s_mask, type_mask, len_mask;
+    TCGOp *op2;
+    bool valid;
 
     if (ti_is_const(t1) && ti_is_const(t2)) {
         return tcg_opt_gen_movi(ctx, op, op->args[0],
@@ -1665,35 +1670,182 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
                                           ti_const_val(t2)));
     }
 
-    /* Inserting a value into zero at offset 0. */
-    if (ti_is_const_val(t1, 0) && ofs == 0) {
-        uint64_t mask = MAKE_64BIT_MASK(0, len);
+    width = 8 * tcg_type_size(ctx->type);
+    type_mask = MAKE_64BIT_MASK(0, width);
+    len_mask = MAKE_64BIT_MASK(0, len);
 
+    /* Inserting all-zero into a value. */
+    if ((t2->z_mask & len_mask) == 0) {
         op->opc = INDEX_op_and;
-        op->args[1] = op->args[2];
-        op->args[2] = arg_new_constant(ctx, mask);
+        op->args[2] = arg_new_constant(ctx, ~(len_mask << ofs));
         return fold_and(ctx, op);
     }
 
-    /* Inserting zero into a value. */
-    if (ti_is_const_val(t2, 0)) {
-        uint64_t mask = deposit64(-1, ofs, len, 0);
-
-        op->opc = INDEX_op_and;
-        op->args[2] = arg_new_constant(ctx, mask);
-        return fold_and(ctx, op);
+    /* Inserting all-one into a value. */
+    if ((t2->o_mask & len_mask) == len_mask) {
+        op->opc = INDEX_op_or;
+        op->args[2] = arg_new_constant(ctx, len_mask << ofs);
+        return fold_or(ctx, op);
     }
 
-    /* The s_mask from the top portion of the deposit is still valid. */
-    if (ofs + len == width) {
-        s_mask = t2->s_mask << ofs;
-    } else {
-        s_mask = t1->s_mask & ~MAKE_64BIT_MASK(0, ofs + len);
+    valid = TCG_TARGET_deposit_valid(ctx->type, ofs, len);
+
+    /* Lower invalid deposit of constant as AND + OR. */
+    if (!valid && ti_is_const(t2)) {
+        uint64_t ins_val = (ti_const_val(t2) & len_mask) << ofs;
+
+        op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
+        op2->args[0] = ret;
+        op2->args[1] = arg1;
+        op2->args[2] = arg_new_constant(ctx, ~(len_mask << ofs));
+        fold_and(ctx, op2);
+
+        op->opc = INDEX_op_or;
+        op->args[1] = ret;
+        op->args[2] = arg_new_constant(ctx, ins_val);
+        return fold_or(ctx, op);
     }
 
+    /*
+     * Compute result masks before calling other fold_* subroutines
+     * which could modify the masks of our inputs.
+     */
     z_mask = deposit64(t1->z_mask, ofs, len, t2->z_mask);
     o_mask = deposit64(t1->o_mask, ofs, len, t2->o_mask);
+    if (ofs + len < width) {
+        s_mask = t1->s_mask & ~MAKE_64BIT_MASK(0, ofs + len);
+    } else {
+        s_mask = t2->s_mask << ofs;
+    }
 
+    /* Inserting a value into zero. */
+    if (ti_is_const_val(t1, 0)) {
+        uint64_t need_mask;
+
+        /* Always lower deposit into zero at 0 as AND. */
+        if (ofs == 0) {
+            op->opc = INDEX_op_and;
+            op->args[1] = arg2;
+            op->args[2] = arg_new_constant(ctx, len_mask);
+            return fold_and(ctx, op);
+        }
+
+        /*
+         * If the portion of the value outside len that remains after
+         * shifting is zero, we can elide the mask and just shift.
+         */
+        need_mask = t2->z_mask & ~len_mask;
+        need_mask = (need_mask << ofs) & type_mask;
+        if (!need_mask) {
+            op->opc = INDEX_op_shl;
+            op->args[1] = arg2;
+            op->args[2] = arg_new_constant(ctx, ofs);
+            goto done;
+        }
+
+        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
+        if (!valid) {
+            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
+                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+                op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = arg_new_constant(ctx, ofs);
+
+                op->opc = INDEX_op_extract;
+                op->args[1] = ret;
+                op->args[2] = 0;
+                op->args[3] = ofs + len;
+                goto done;
+            }
+
+            op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
+            op2->args[0] = ret;
+            op2->args[1] = arg2;
+            op2->args[2] = arg_new_constant(ctx, len_mask);
+            fold_and(ctx, op2);
+
+            op->opc = INDEX_op_shl;
+            op->args[1] = ret;
+            op->args[2] = arg_new_constant(ctx, ofs);
+            goto done;
+        }
+    }
+
+    /* After special cases, lower invalid deposit. */
+    if (!valid) {
+        TCGArg tmp;
+        bool has_ext2 = tcg_op_supported(INDEX_op_extract2, ctx->type, 0);
+        bool has_rotl = tcg_op_supported(INDEX_op_rotl, ctx->type, 0);
+
+        /*
+         * ret = arg2:arg1 >> len
+         * ret = rotl(ret, len)
+         */
+        if (ofs == 0 && has_ext2 && has_rotl) {
+            op2 = opt_insert_before(ctx, op, INDEX_op_extract2, 4);
+            op2->args[0] = ret;
+            op2->args[1] = arg1;
+            op2->args[2] = arg2;
+            op2->args[3] = len;
+
+            op->opc = INDEX_op_rotl;
+            op->args[1] = ret;
+            op->args[2] = arg_new_constant(ctx, len);
+            goto done;
+        }
+
+        /*
+         * tmp = arg1 << len
+         * ret = arg2:tmp >> len
+         */
+        if (ofs + len == width && has_ext2) {
+            tmp = ret == arg2 ? arg_new_temp(ctx) : ret;
+
+            op2 = opt_insert_before(ctx, op, INDEX_op_shl, 4);
+            op2->args[0] = tmp;
+            op2->args[1] = arg1;
+            op2->args[2] = arg_new_constant(ctx, len);
+
+            op->opc = INDEX_op_extract2;
+            op->args[0] = ret;
+            op->args[1] = tmp;
+            op->args[2] = arg2;
+            op->args[3] = len;
+            goto done;
+        }
+
+        /*
+         * tmp = arg2 & mask
+         * ret = arg1 & ~(mask << ofs)
+         * tmp = tmp << ofs
+         * ret = ret | tmp
+         */
+        tmp = arg_new_temp(ctx);
+
+        op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
+        op2->args[0] = tmp;
+        op2->args[1] = arg2;
+        op2->args[2] = arg_new_constant(ctx, len_mask);
+        fold_and(ctx, op2);
+
+        op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
+        op2->args[0] = tmp;
+        op2->args[1] = tmp;
+        op2->args[2] = arg_new_constant(ctx, ofs);
+
+        op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
+        op2->args[0] = ret;
+        op2->args[1] = arg1;
+        op2->args[2] = arg_new_constant(ctx, ~(len_mask << ofs));
+        fold_and(ctx, op2);
+
+        op->opc = INDEX_op_or;
+        op->args[1] = ret;
+        op->args[2] = tmp;
+    }
+
+ done:
     return fold_masks_zos(ctx, op, z_mask, o_mask, s_mask);
 }
 
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 8d67acc4fc..96f72ba381 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -876,9 +876,6 @@ void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
                          unsigned int ofs, unsigned int len)
 {
-    uint32_t mask;
-    TCGv_i32 t1;
-
     tcg_debug_assert(ofs < 32);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 32);
@@ -886,39 +883,9 @@ void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
 
     if (len == 32) {
         tcg_gen_mov_i32(ret, arg2);
-        return;
-    }
-    if (TCG_TARGET_deposit_valid(TCG_TYPE_I32, ofs, len)) {
-        tcg_gen_op5ii_i32(INDEX_op_deposit, ret, arg1, arg2, ofs, len);
-        return;
-    }
-
-    t1 = tcg_temp_ebb_new_i32();
-
-    if (tcg_op_supported(INDEX_op_extract2, TCG_TYPE_I32, 0)) {
-        if (ofs + len == 32) {
-            tcg_gen_shli_i32(t1, arg1, len);
-            tcg_gen_extract2_i32(ret, t1, arg2, len);
-            goto done;
-        }
-        if (ofs == 0) {
-            tcg_gen_extract2_i32(ret, arg1, arg2, len);
-            tcg_gen_rotli_i32(ret, ret, len);
-            goto done;
-        }
-    }
-
-    mask = (1u << len) - 1;
-    if (ofs + len < 32) {
-        tcg_gen_andi_i32(t1, arg2, mask);
-        tcg_gen_shli_i32(t1, t1, ofs);
     } else {
-        tcg_gen_shli_i32(t1, arg2, ofs);
+        tcg_gen_op5ii_i32(INDEX_op_deposit, ret, arg1, arg2, ofs, len);
     }
-    tcg_gen_andi_i32(ret, arg1, ~(mask << ofs));
-    tcg_gen_or_i32(ret, ret, t1);
- done:
-    tcg_temp_free_i32(t1);
 }
 
 void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
@@ -932,28 +899,10 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
     if (ofs + len == 32) {
         tcg_gen_shli_i32(ret, arg, ofs);
     } else if (ofs == 0) {
-        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
-    } else if (TCG_TARGET_deposit_valid(TCG_TYPE_I32, ofs, len)) {
+        tcg_gen_extract_i32(ret, arg, 0, len);
+    } else {
         TCGv_i32 zero = tcg_constant_i32(0);
         tcg_gen_op5ii_i32(INDEX_op_deposit, ret, zero, arg, ofs, len);
-    } else {
-        /*
-         * To help two-operand hosts we prefer to zero-extend first,
-         * which allows ARG to stay live.
-         */
-        if (TCG_TARGET_extract_valid(TCG_TYPE_I32, 0, len)) {
-            tcg_gen_extract_i32(ret, arg, 0, len);
-            tcg_gen_shli_i32(ret, ret, ofs);
-            return;
-        }
-        /* Otherwise prefer zero-extension over AND for code size.  */
-        if (TCG_TARGET_extract_valid(TCG_TYPE_I32, 0, ofs + len)) {
-            tcg_gen_shli_i32(ret, arg, ofs);
-            tcg_gen_extract_i32(ret, ret, 0, ofs + len);
-            return;
-        }
-        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
-        tcg_gen_shli_i32(ret, ret, ofs);
     }
 }
 
@@ -2148,9 +2097,6 @@ void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
                          unsigned int ofs, unsigned int len)
 {
-    uint64_t mask;
-    TCGv_i64 t1;
-
     tcg_debug_assert(ofs < 64);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 64);
@@ -2158,40 +2104,9 @@ void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
 
     if (len == 64) {
         tcg_gen_mov_i64(ret, arg2);
-        return;
-    }
-
-    if (TCG_TARGET_deposit_valid(TCG_TYPE_I64, ofs, len)) {
-        tcg_gen_op5ii_i64(INDEX_op_deposit, ret, arg1, arg2, ofs, len);
-        return;
-    }
-
-    t1 = tcg_temp_ebb_new_i64();
-
-    if (tcg_op_supported(INDEX_op_extract2, TCG_TYPE_I64, 0)) {
-        if (ofs + len == 64) {
-            tcg_gen_shli_i64(t1, arg1, len);
-            tcg_gen_extract2_i64(ret, t1, arg2, len);
-            goto done;
-        }
-        if (ofs == 0) {
-            tcg_gen_extract2_i64(ret, arg1, arg2, len);
-            tcg_gen_rotli_i64(ret, ret, len);
-            goto done;
-        }
-    }
-
-    mask = (1ull << len) - 1;
-    if (ofs + len < 64) {
-        tcg_gen_andi_i64(t1, arg2, mask);
-        tcg_gen_shli_i64(t1, t1, ofs);
     } else {
-        tcg_gen_shli_i64(t1, arg2, ofs);
+        tcg_gen_op5ii_i64(INDEX_op_deposit, ret, arg1, arg2, ofs, len);
     }
-    tcg_gen_andi_i64(ret, arg1, ~(mask << ofs));
-    tcg_gen_or_i64(ret, ret, t1);
- done:
-    tcg_temp_free_i64(t1);
 }
 
 void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
@@ -2206,27 +2121,9 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
         tcg_gen_shli_i64(ret, arg, ofs);
     } else if (ofs == 0) {
         tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
-    } else if (TCG_TARGET_deposit_valid(TCG_TYPE_I64, ofs, len)) {
+    } else {
         TCGv_i64 zero = tcg_constant_i64(0);
         tcg_gen_op5ii_i64(INDEX_op_deposit, ret, zero, arg, ofs, len);
-    } else {
-        /*
-         * To help two-operand hosts we prefer to zero-extend first,
-         * which allows ARG to stay live.
-         */
-        if (TCG_TARGET_extract_valid(TCG_TYPE_I64, 0, len)) {
-            tcg_gen_extract_i64(ret, arg, 0, len);
-            tcg_gen_shli_i64(ret, ret, ofs);
-            return;
-        }
-        /* Otherwise prefer zero-extension over AND for code size.  */
-        if (TCG_TARGET_extract_valid(TCG_TYPE_I64, 0, ofs + len)) {
-            tcg_gen_shli_i64(ret, arg, ofs);
-            tcg_gen_extract_i64(ret, ret, 0, ofs + len);
-            return;
-        }
-        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
-        tcg_gen_shli_i64(ret, ret, ofs);
     }
 }
 
-- 
2.43.0




* [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 during optimize
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
  2026-02-04  5:24 ` [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-25 14:47   ` Jim MacArthur
  2026-02-04  5:24 ` [PATCH v2 3/6] tcg: Expand missing rotri with extract2 Richard Henderson
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini, Manos Pitsidianakis

The expansions that we chose in tcg-op.c may be suboptimal.
Delay lowering until optimize, by which point constants have been
propagated and the known-zero/known-one masks have been computed.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++----
 tcg/tcg-op.c   |  9 ++------
 2 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 890c8068fb..e6a16921c9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1933,21 +1933,74 @@ static bool fold_extract2(OptContext *ctx, TCGOp *op)
     uint64_t z2 = t2->z_mask;
     uint64_t o1 = t1->o_mask;
     uint64_t o2 = t2->o_mask;
+    uint64_t zr, or;
     int shr = op->args[3];
+    int shl;
 
     if (ctx->type == TCG_TYPE_I32) {
         z1 = (uint32_t)z1 >> shr;
         o1 = (uint32_t)o1 >> shr;
-        z2 = (uint64_t)((int32_t)z2 << (32 - shr));
-        o2 = (uint64_t)((int32_t)o2 << (32 - shr));
+        shl = 32 - shr;
+        z2 = (uint64_t)((int32_t)z2 << shl);
+        o2 = (uint64_t)((int32_t)o2 << shl);
     } else {
         z1 >>= shr;
         o1 >>= shr;
-        z2 <<= 64 - shr;
-        o2 <<= 64 - shr;
+        shl = 64 - shr;
+        z2 <<= shl;
+        o2 <<= shl;
+    }
+    zr = z1 | z2;
+    or = o1 | o2;
+
+    if (zr == or) {
+        return tcg_opt_gen_movi(ctx, op, op->args[0], zr);
     }
 
-    return fold_masks_zo(ctx, op, z1 | z2, o1 | o2);
+    if (z2 == 0) {
+        /* High part zeros folds to simple right shift. */
+        op->opc = INDEX_op_shr;
+        op->args[2] = arg_new_constant(ctx, shr);
+    } else if (z1 == 0) {
+        /* Low part zeros folds to simple left shift. */
+        op->opc = INDEX_op_shl;
+        op->args[1] = op->args[2];
+        op->args[2] = arg_new_constant(ctx, shl);
+    } else if (!tcg_op_supported(INDEX_op_extract2, ctx->type, 0)) {
+        TCGArg tmp = arg_new_temp(ctx);
+        TCGOp *op2 = opt_insert_before(ctx, op, INDEX_op_shr, 3);
+
+        op2->args[0] = tmp;
+        op2->args[1] = op->args[1];
+        op2->args[2] = arg_new_constant(ctx, shr);
+
+        if (TCG_TARGET_deposit_valid(ctx->type, shl, shr)) {
+            /*
+             * Deposit has more arguments than extract2,
+             * so we need to create a new TCGOp.
+             */
+            op2 = opt_insert_before(ctx, op, INDEX_op_deposit, 5);
+            op2->args[0] = op->args[0];
+            op2->args[1] = tmp;
+            op2->args[2] = op->args[2];
+            op2->args[3] = shl;
+            op2->args[4] = shr;
+
+            tcg_op_remove(ctx->tcg, op);
+            op = op2;
+        } else {
+            op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
+            op2->args[0] = op->args[0];
+            op2->args[1] = op->args[2];
+            op2->args[2] = arg_new_constant(ctx, shl);
+
+            op->opc = INDEX_op_or;
+            op->args[1] = op->args[0];
+            op->args[2] = tmp;
+        }
+    }
+
+    return fold_masks_zo(ctx, op, zr, or);
 }
 
 static bool fold_exts(OptContext *ctx, TCGOp *op)
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 96f72ba381..8a4fd14ad5 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1000,13 +1000,8 @@ void tcg_gen_extract2_i32(TCGv_i32 ret, TCGv_i32 al, TCGv_i32 ah,
         tcg_gen_mov_i32(ret, ah);
     } else if (al == ah) {
         tcg_gen_rotri_i32(ret, al, ofs);
-    } else if (tcg_op_supported(INDEX_op_extract2, TCG_TYPE_I32, 0)) {
-        tcg_gen_op4i_i32(INDEX_op_extract2, ret, al, ah, ofs);
     } else {
-        TCGv_i32 t0 = tcg_temp_ebb_new_i32();
-        tcg_gen_shri_i32(t0, al, ofs);
-        tcg_gen_deposit_i32(ret, t0, ah, 32 - ofs, ofs);
-        tcg_temp_free_i32(t0);
+        tcg_gen_op4i_i32(INDEX_op_extract2, ret, al, ah, ofs);
     }
 }
 
@@ -2221,7 +2216,7 @@ void tcg_gen_extract2_i64(TCGv_i64 ret, TCGv_i64 al, TCGv_i64 ah,
         tcg_gen_mov_i64(ret, ah);
     } else if (al == ah) {
         tcg_gen_rotri_i64(ret, al, ofs);
-    } else if (tcg_op_supported(INDEX_op_extract2, TCG_TYPE_I64, 0)) {
+    } else if (TCG_TARGET_REG_BITS == 64) {
         tcg_gen_op4i_i64(INDEX_op_extract2, ret, al, ah, ofs);
     } else {
         TCGv_i64 t0 = tcg_temp_ebb_new_i64();
-- 
2.43.0




* [PATCH v2 3/6] tcg: Expand missing rotri with extract2
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
  2026-02-04  5:24 ` [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
  2026-02-04  5:24 ` [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 " Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-25 14:54   ` Jim MacArthur
  2026-02-04  5:24 ` [PATCH v2 4/6] tcg: Add tcg_op_imm_match Richard Henderson
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini

Use extract2 to implement rotri when the host has neither rotr
nor rotl.  To make this easier, redefine rotli in terms of rotri,
rather than the reverse.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 52 ++++++++++++++++++++++++----------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 8a4fd14ad5..078adce610 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -826,23 +826,12 @@ void tcg_gen_rotl_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 void tcg_gen_rotli_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
     tcg_debug_assert(arg2 >= 0 && arg2 < 32);
-    /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i32(ret, arg1);
     } else if (tcg_op_supported(INDEX_op_rotl, TCG_TYPE_I32, 0)) {
-        TCGv_i32 t0 = tcg_constant_i32(arg2);
-        tcg_gen_op3_i32(INDEX_op_rotl, ret, arg1, t0);
-    } else if (tcg_op_supported(INDEX_op_rotr, TCG_TYPE_I32, 0)) {
-        TCGv_i32 t0 = tcg_constant_i32(32 - arg2);
-        tcg_gen_op3_i32(INDEX_op_rotr, ret, arg1, t0);
+        tcg_gen_op3_i32(INDEX_op_rotl, ret, arg1, tcg_constant_i32(arg2));
     } else {
-        TCGv_i32 t0 = tcg_temp_ebb_new_i32();
-        TCGv_i32 t1 = tcg_temp_ebb_new_i32();
-        tcg_gen_shli_i32(t0, arg1, arg2);
-        tcg_gen_shri_i32(t1, arg1, 32 - arg2);
-        tcg_gen_or_i32(ret, t0, t1);
-        tcg_temp_free_i32(t0);
-        tcg_temp_free_i32(t1);
+        tcg_gen_rotri_i32(ret, arg1, -arg2 & 31);
     }
 }
 
@@ -870,7 +859,16 @@ void tcg_gen_rotr_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2)
 void tcg_gen_rotri_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
 {
     tcg_debug_assert(arg2 >= 0 && arg2 < 32);
-    tcg_gen_rotli_i32(ret, arg1, -arg2 & 31);
+    if (arg2 == 0) {
+        tcg_gen_mov_i32(ret, arg1);
+    } else if (tcg_op_supported(INDEX_op_rotr, TCG_TYPE_I32, 0)) {
+        tcg_gen_op3_i32(INDEX_op_rotr, ret, arg1, tcg_constant_i32(arg2));
+    } else if (tcg_op_supported(INDEX_op_rotl, TCG_TYPE_I32, 0)) {
+        tcg_gen_op3_i32(INDEX_op_rotl, ret, arg1, tcg_constant_i32(32 - arg2));
+    } else {
+        /* Do not recurse with the rotri simplification. */
+        tcg_gen_op4i_i32(INDEX_op_extract2, ret, arg1, arg1, arg2);
+    }
 }
 
 void tcg_gen_deposit_i32(TCGv_i32 ret, TCGv_i32 arg1, TCGv_i32 arg2,
@@ -2042,23 +2040,12 @@ void tcg_gen_rotl_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 void tcg_gen_rotli_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
     tcg_debug_assert(arg2 >= 0 && arg2 < 64);
-    /* some cases can be optimized here */
     if (arg2 == 0) {
         tcg_gen_mov_i64(ret, arg1);
     } else if (tcg_op_supported(INDEX_op_rotl, TCG_TYPE_I64, 0)) {
-        TCGv_i64 t0 = tcg_constant_i64(arg2);
-        tcg_gen_op3_i64(INDEX_op_rotl, ret, arg1, t0);
-    } else if (tcg_op_supported(INDEX_op_rotr, TCG_TYPE_I64, 0)) {
-        TCGv_i64 t0 = tcg_constant_i64(64 - arg2);
-        tcg_gen_op3_i64(INDEX_op_rotr, ret, arg1, t0);
+        tcg_gen_op3_i64(INDEX_op_rotl, ret, arg1, tcg_constant_i64(arg2));
     } else {
-        TCGv_i64 t0 = tcg_temp_ebb_new_i64();
-        TCGv_i64 t1 = tcg_temp_ebb_new_i64();
-        tcg_gen_shli_i64(t0, arg1, arg2);
-        tcg_gen_shri_i64(t1, arg1, 64 - arg2);
-        tcg_gen_or_i64(ret, t0, t1);
-        tcg_temp_free_i64(t0);
-        tcg_temp_free_i64(t1);
+        tcg_gen_rotri_i64(ret, arg1, -arg2 & 63);
     }
 }
 
@@ -2086,7 +2073,16 @@ void tcg_gen_rotr_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2)
 void tcg_gen_rotri_i64(TCGv_i64 ret, TCGv_i64 arg1, int64_t arg2)
 {
     tcg_debug_assert(arg2 >= 0 && arg2 < 64);
-    tcg_gen_rotli_i64(ret, arg1, -arg2 & 63);
+    if (arg2 == 0) {
+        tcg_gen_mov_i64(ret, arg1);
+    } else if (tcg_op_supported(INDEX_op_rotr, TCG_TYPE_I64, 0)) {
+        tcg_gen_op3_i64(INDEX_op_rotr, ret, arg1, tcg_constant_i64(arg2));
+    } else if (tcg_op_supported(INDEX_op_rotl, TCG_TYPE_I64, 0)) {
+        tcg_gen_op3_i64(INDEX_op_rotl, ret, arg1, tcg_constant_i64(64 - arg2));
+    } else {
+        /* Do not recurse with the rotri simplification. */
+        tcg_gen_op4i_i64(INDEX_op_extract2, ret, arg1, arg1, arg2);
+    }
 }
 
 void tcg_gen_deposit_i64(TCGv_i64 ret, TCGv_i64 arg1, TCGv_i64 arg2,
-- 
2.43.0




* [PATCH v2 4/6] tcg: Add tcg_op_imm_match
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
                   ` (2 preceding siblings ...)
  2026-02-04  5:24 ` [PATCH v2 3/6] tcg: Expand missing rotri with extract2 Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-25 15:06   ` Jim MacArthur
  2026-02-04  5:24 ` [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and Richard Henderson
  2026-02-04  5:24 ` [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts Richard Henderson
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini

From: Paolo Bonzini <pbonzini@redhat.com>

Create a function to test whether the second operand of a
binary operation allows a given immediate.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[rth: Split out from a larger patch; keep the declaration internal.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-internal.h |  5 +++++
 tcg/tcg.c          | 21 +++++++++++++++++----
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 2cbfb5d5ca..c1ce50998e 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -94,4 +94,9 @@ TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op,
 TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op,
                            TCGOpcode, TCGType, unsigned nargs);
 
+/*
+ * For a binary opcode OP, return true if the second input operand allows IMM.
+ */
+bool tcg_op_imm_match(TCGOpcode op, TCGType type, tcg_target_ulong imm);
+
 #endif /* TCG_INTERNAL_H */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index e7bf4dad4e..778268f5cd 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -3391,11 +3391,9 @@ static void process_constraint_sets(void)
     }
 }
 
-static const TCGArgConstraint *opcode_args_ct(const TCGOp *op)
+static const TCGArgConstraint *op_args_ct(TCGOpcode opc, TCGType type,
+                                          unsigned flags)
 {
-    TCGOpcode opc = op->opc;
-    TCGType type = TCGOP_TYPE(op);
-    unsigned flags = TCGOP_FLAGS(op);
     const TCGOpDef *def = &tcg_op_defs[opc];
     const TCGOutOp *outop = all_outop[opc];
     TCGConstraintSetIndex con_set;
@@ -3422,6 +3420,21 @@ static const TCGArgConstraint *opcode_args_ct(const TCGOp *op)
     return all_cts[con_set];
 }
 
+static const TCGArgConstraint *opcode_args_ct(const TCGOp *op)
+{
+    return op_args_ct(op->opc, TCGOP_TYPE(op), TCGOP_FLAGS(op));
+}
+
+bool tcg_op_imm_match(TCGOpcode opc, TCGType type, tcg_target_ulong imm)
+{
+    const TCGArgConstraint *args_ct = op_args_ct(opc, type, 0);
+    const TCGOpDef *def = &tcg_op_defs[opc];
+
+    tcg_debug_assert(def->nb_oargs == 1);
+    tcg_debug_assert(def->nb_iargs == 2);
+    return tcg_target_const_match(imm, args_ct[2].ct, type, 0, 0);
+}
+
 static void remove_label_use(TCGOp *op, int idx)
 {
     TCGLabel *label = arg_label(op->args[idx]);
-- 
2.43.0




* [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
                   ` (3 preceding siblings ...)
  2026-02-04  5:24 ` [PATCH v2 4/6] tcg: Add tcg_op_imm_match Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-25 15:16   ` Jim MacArthur
  2026-02-04  5:24 ` [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts Richard Henderson
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC
  To: qemu-devel; +Cc: pbonzini

From: Paolo Bonzini <pbonzini@redhat.com>

Instead of assuming that AND accepts only 8-bit masks plus the
16-bit (and, for i64, 32-bit) special cases, consult the backend
in order to decide between SHL/SHR and SHR/AND.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[rth: Split from a larger patch]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg-op.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 078adce610..263d208002 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -907,6 +907,8 @@ void tcg_gen_deposit_z_i32(TCGv_i32 ret, TCGv_i32 arg,
 void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
                          unsigned int ofs, unsigned int len)
 {
+    uint32_t mask;
+
     tcg_debug_assert(ofs < 32);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 32);
@@ -922,8 +924,10 @@ void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
         tcg_gen_op4ii_i32(INDEX_op_extract, ret, arg, ofs, len);
         return;
     }
+
+    mask = (1u << len) - 1;
     if (ofs == 0) {
-        tcg_gen_andi_i32(ret, arg, (1u << len) - 1);
+        tcg_gen_andi_i32(ret, arg, mask);
         return;
     }
 
@@ -934,18 +938,12 @@ void tcg_gen_extract_i32(TCGv_i32 ret, TCGv_i32 arg,
         return;
     }
 
-    /* ??? Ideally we'd know what values are available for immediate AND.
-       Assume that 8 bits are available, plus the special case of 16,
-       so that we get ext8u, ext16u.  */
-    switch (len) {
-    case 1 ... 8: case 16:
+    if (tcg_op_imm_match(INDEX_op_and, TCG_TYPE_I32, mask)) {
         tcg_gen_shri_i32(ret, arg, ofs);
-        tcg_gen_andi_i32(ret, ret, (1u << len) - 1);
-        break;
-    default:
+        tcg_gen_andi_i32(ret, ret, mask);
+    } else {
         tcg_gen_shli_i32(ret, arg, 32 - len - ofs);
         tcg_gen_shri_i32(ret, ret, 32 - len);
-        break;
     }
 }
 
@@ -2121,6 +2119,8 @@ void tcg_gen_deposit_z_i64(TCGv_i64 ret, TCGv_i64 arg,
 void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
                          unsigned int ofs, unsigned int len)
 {
+    uint64_t mask;
+
     tcg_debug_assert(ofs < 64);
     tcg_debug_assert(len > 0);
     tcg_debug_assert(len <= 64);
@@ -2136,8 +2136,10 @@ void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
         tcg_gen_op4ii_i64(INDEX_op_extract, ret, arg, ofs, len);
         return;
     }
+
+    mask = (1ull << len) - 1;
     if (ofs == 0) {
-        tcg_gen_andi_i64(ret, arg, (1ull << len) - 1);
+        tcg_gen_andi_i64(ret, arg, mask);
         return;
     }
 
@@ -2148,18 +2150,12 @@ void tcg_gen_extract_i64(TCGv_i64 ret, TCGv_i64 arg,
         return;
     }
 
-    /* ??? Ideally we'd know what values are available for immediate AND.
-       Assume that 8 bits are available, plus the special cases of 16 and 32,
-       so that we get ext8u, ext16u, and ext32u.  */
-    switch (len) {
-    case 1 ... 8: case 16: case 32:
+    if (tcg_op_imm_match(INDEX_op_and, TCG_TYPE_I64, mask)) {
         tcg_gen_shri_i64(ret, arg, ofs);
-        tcg_gen_andi_i64(ret, ret, (1ull << len) - 1);
-        break;
-    default:
+        tcg_gen_andi_i64(ret, ret, mask);
+    } else {
         tcg_gen_shli_i64(ret, arg, 64 - len - ofs);
         tcg_gen_shri_i64(ret, ret, 64 - len);
-        break;
     }
 }
 
-- 
2.43.0
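For readers checking this patch: the two fallback expansions in tcg_gen_extract_i32 compute the same field and differ only in cost. Choosing between them is what the tcg_op_imm_match query now decides. The following is a minimal standalone sketch that uses plain C arithmetic, not TCG ops; all helper names are hypothetical. It compares both expansions against a bit-by-bit reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 'len' bits set, for 1 <= len <= 32 (avoids the undefined 1u << 32). */
static uint32_t low_mask32(unsigned len)
{
    return len == 32 ? UINT32_MAX : (UINT32_C(1) << len) - 1;
}

/* Bit-by-bit reference for extract: 'len' bits of 'x' starting at 'ofs'. */
static uint32_t extract_ref(uint32_t x, unsigned ofs, unsigned len)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < len; i++) {
        r |= ((x >> (ofs + i)) & 1u) << i;
    }
    return r;
}

/* SHR + AND: worthwhile when the host AND accepts low_mask32(len) as an immediate. */
static uint32_t extract_shr_and(uint32_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 32);
    return (x >> ofs) & low_mask32(len);
}

/* SHL + SHR: move the field to the top bits, then shift it down logically. */
static uint32_t extract_shl_shr(uint32_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 32);
    return (x << (32 - len - ofs)) >> (32 - len);
}

/* Compare both expansions with the reference for every (ofs, len) pair. */
static bool extract_expansions_agree(uint32_t x)
{
    for (unsigned len = 1; len <= 32; len++) {
        for (unsigned ofs = 0; ofs + len <= 32; ofs++) {
            uint32_t want = extract_ref(x, ofs, len);
            if (extract_shr_and(x, ofs, len) != want ||
                extract_shl_shr(x, ofs, len) != want) {
                return false;
            }
        }
    }
    return true;
}
```

Because both forms always agree, the patch can base the choice purely on whether the mask is encodable as an AND immediate on the host.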




* [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
                   ` (4 preceding siblings ...)
  2026-02-04  5:24 ` [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and Richard Henderson
@ 2026-02-04  5:24 ` Richard Henderson
  2026-02-04  8:05   ` Paolo Bonzini
  5 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  5:24 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini

Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index e6a16921c9..2944c5a748 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
             goto done;
         }
 
-        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
+        /* Lower invalid deposit into zero. */
         if (!valid) {
-            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
-                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
+                /* EXTRACT (at 0) + SHL */
+                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = 0;
+                op2->args[3] = len;
+            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
+                /* SHL + EXTRACT (at 0) */
                 op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
                 op2->args[0] = ret;
                 op2->args[1] = arg2;
@@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
                 op->args[2] = 0;
                 op->args[3] = ofs + len;
                 goto done;
+            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
+                /* AND + SHL */
+                op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = arg_new_constant(ctx, len_mask);
+            } else {
+                /* SHL + SHR */
+                int shl = width - len;
+                int shr = width - len - ofs;
+
+                op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
+                op2->args[0] = ret;
+                op2->args[1] = arg2;
+                op2->args[2] = arg_new_constant(ctx, shl);
+
+                op->opc = INDEX_op_shr;
+                op->args[1] = ret;
+                op->args[2] = arg_new_constant(ctx, shr);
+                goto done;
             }
 
-            op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
-            op2->args[0] = ret;
-            op2->args[1] = arg2;
-            op2->args[2] = arg_new_constant(ctx, len_mask);
-            fold_and(ctx, op2);
-
+            /* Finish the (EXTRACT|AND) + SHL cases. */
             op->opc = INDEX_op_shl;
             op->args[1] = ret;
             op->args[2] = arg_new_constant(ctx, ofs);
-- 
2.43.0
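The patched fold_deposit chooses among four lowerings: EXTRACT+SHL, SHL+EXTRACT, AND+SHL, and SHL+SHR. All four produce the same value and differ only in host availability and cost. The following sketch assumes a 64-bit type and uses ordinary C arithmetic. The enum and function names are hypothetical, and the sketch does not model which ops a backend accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum deposit_z_lowering {
    EXTRACT_SHL,   /* extract(x, 0, len), then shl by ofs */
    SHL_EXTRACT,   /* shl by ofs, then extract(.., 0, ofs + len) */
    AND_SHL,       /* and with 2^len - 1, then shl by ofs */
    SHL_SHR,       /* shl by (64 - len), then logical shr by (64 - len - ofs) */
};

/* Low 'len' bits set, for 1 <= len <= 64. */
static uint64_t low_mask64(unsigned len)
{
    return len == 64 ? UINT64_MAX : (UINT64_C(1) << len) - 1;
}

/* Reference: the low 'len' bits of 'x' placed at bit 'ofs', all else zero. */
static uint64_t deposit_z_ref(uint64_t x, unsigned ofs, unsigned len)
{
    uint64_t r = 0;
    for (unsigned i = 0; i < len; i++) {
        r |= ((x >> i) & 1) << (ofs + i);
    }
    return r;
}

static uint64_t lower_deposit_z(enum deposit_z_lowering how,
                                uint64_t x, unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    switch (how) {
    case EXTRACT_SHL:
    case AND_SHL:
        /* extract at offset 0 and AND with 2^len - 1 give the same value. */
        return (x & low_mask64(len)) << ofs;
    case SHL_EXTRACT:
        return (x << ofs) & low_mask64(ofs + len);
    case SHL_SHR:
        return (x << (64 - len)) >> (64 - len - ofs);
    }
    return 0; /* not reached */
}

/* Check every lowering against the reference for every (ofs, len) pair. */
static bool deposit_z_lowerings_agree(uint64_t x)
{
    for (unsigned len = 1; len <= 64; len++) {
        for (unsigned ofs = 0; ofs + len <= 64; ofs++) {
            uint64_t want = deposit_z_ref(x, ofs, len);
            for (int how = EXTRACT_SHL; how <= SHL_SHR; how++) {
                if (lower_deposit_z((enum deposit_z_lowering)how,
                                    x, ofs, len) != want) {
                    return false;
                }
            }
        }
    }
    return true;
}
```

The shift amounts in the SHL_SHR case match the `shl = width - len` and `shr = width - len - ofs` computed in the patch.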




* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04  5:24 ` [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts Richard Henderson
@ 2026-02-04  8:05   ` Paolo Bonzini
  2026-02-04  9:06     ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2026-02-04  8:05 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel

On 2/4/26 06:24, Richard Henderson wrote:
> Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
>   1 file changed, 31 insertions(+), 9 deletions(-)
> 
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index e6a16921c9..2944c5a748 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>               goto done;
>           }
>   
> -        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
> +        /* Lower invalid deposit into zero. */
>           if (!valid) {
> -            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
> -                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> +            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> +                /* EXTRACT (at 0) + SHL */
> +                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
> +                op2->args[0] = ret;
> +                op2->args[1] = arg2;
> +                op2->args[2] = 0;
> +                op2->args[3] = len;
> +            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
> +                /* SHL + EXTRACT (at 0) */
>                   op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
>                   op2->args[0] = ret;
>                   op2->args[1] = arg2;
> @@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>                   op->args[2] = 0;
>                   op->args[3] = ofs + len;
>                   goto done;
> +            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
> +                /* AND + SHL */

Even if these extracts are valid, can they really be cheaper than an AND
with an immediate argument, or back-to-back shifts?  You still have a
dependency between the two instructions.  I wouldn't bother with using
EXTRACT here.

Paolo

> +                op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
> +                op2->args[0] = ret;
> +                op2->args[1] = arg2;
> +                op2->args[2] = arg_new_constant(ctx, len_mask);
> +            } else {
> +                /* SHL + SHR */
> +                int shl = width - len;
> +                int shr = width - len - ofs;
> +
> +                op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
> +                op2->args[0] = ret;
> +                op2->args[1] = arg2;
> +                op2->args[2] = arg_new_constant(ctx, shl);
> +
> +                op->opc = INDEX_op_shr;
> +                op->args[1] = ret;
> +                op->args[2] = arg_new_constant(ctx, shr);
> +                goto done;
>               }
>   
> -            op2 = opt_insert_before(ctx, op, INDEX_op_and, 3);
> -            op2->args[0] = ret;
> -            op2->args[1] = arg2;
> -            op2->args[2] = arg_new_constant(ctx, len_mask);
> -            fold_and(ctx, op2);
> -
> +            /* Finish the (EXTRACT|AND) + SHL cases. */
>               op->opc = INDEX_op_shl;
>               op->args[1] = ret;
>               op->args[2] = arg_new_constant(ctx, ofs);




* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04  8:05   ` Paolo Bonzini
@ 2026-02-04  9:06     ` Richard Henderson
  2026-02-04 10:41       ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04  9:06 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

On 2/4/26 18:05, Paolo Bonzini wrote:
> On 2/4/26 06:24, Richard Henderson wrote:
>> Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
>>   1 file changed, 31 insertions(+), 9 deletions(-)
>>
>> diff --git a/tcg/optimize.c b/tcg/optimize.c
>> index e6a16921c9..2944c5a748 100644
>> --- a/tcg/optimize.c
>> +++ b/tcg/optimize.c
>> @@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>>               goto done;
>>           }
>> -        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
>> +        /* Lower invalid deposit into zero. */
>>           if (!valid) {
>> -            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
>> -                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
>> +            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
>> +                /* EXTRACT (at 0) + SHL */
>> +                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
>> +                op2->args[0] = ret;
>> +                op2->args[1] = arg2;
>> +                op2->args[2] = 0;
>> +                op2->args[3] = len;
>> +            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
>> +                /* SHL + EXTRACT (at 0) */
>>                   op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
>>                   op2->args[0] = ret;
>>                   op2->args[1] = arg2;
>> @@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
>>                   op->args[2] = 0;
>>                   op->args[3] = ofs + len;
>>                   goto done;
>> +            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
>> +                /* AND + SHL */
> 
> Even if these extracts are valid, can they really be cheaper than an AND with an immediate
> argument, or back-to-back shifts?

This is primarily for x86.

(1) movz is a 2-operand insn (separate source and destination), so it may avoid clobbering an input;
(2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.

Because of these, there's a comment somewhere that says we'll prefer extract over AND
(perhaps in tcg_gen_andi_* or fold_and).  IIRC this also happens to simplify ppc and s390x
insn selection (AND vs rotate-and-mask).  AFAIK, no other hosts are penalized.



r~



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04  9:06     ` Richard Henderson
@ 2026-02-04 10:41       ` Paolo Bonzini
  2026-02-04 20:45         ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2026-02-04 10:41 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


On Wed, 4 Feb 2026, 10:06 Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/4/26 18:05, Paolo Bonzini wrote:
> > On 2/4/26 06:24, Richard Henderson wrote:
> >> Use tcg_op_imm_match to choose between expanding with AND+SHL vs SHL+SHR.
> >>
> >> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   tcg/optimize.c | 40 +++++++++++++++++++++++++++++++---------
> >>   1 file changed, 31 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/tcg/optimize.c b/tcg/optimize.c
> >> index e6a16921c9..2944c5a748 100644
> >> --- a/tcg/optimize.c
> >> +++ b/tcg/optimize.c
> >> @@ -1743,10 +1743,17 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>               goto done;
> >>           }
> >> -        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
> >> +        /* Lower invalid deposit into zero. */
> >>           if (!valid) {
> >> -            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
> >> -                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +            if (TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> >> +                /* EXTRACT (at 0) + SHL */
> >> +                op2 = opt_insert_before(ctx, op, INDEX_op_extract, 4);
> >> +                op2->args[0] = ret;
> >> +                op2->args[1] = arg2;
> >> +                op2->args[2] = 0;
> >> +                op2->args[3] = len;
> >> +            } else if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len)) {
> >> +                /* SHL + EXTRACT (at 0) */
> >>                   op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
> >>                   op2->args[0] = ret;
> >>                   op2->args[1] = arg2;
> >> @@ -1757,14 +1764,29 @@ static bool fold_deposit(OptContext *ctx, TCGOp *op)
> >>                   op->args[2] = 0;
> >>                   op->args[3] = ofs + len;
> >>                   goto done;
> >> +            } else if (tcg_op_imm_match(INDEX_op_and, ctx->type, len_mask)) {
> >> +                /* AND + SHL */
> >
> > Even if these extracts are valid, can they really be cheaper than an AND with an immediate
> > argument, or back-to-back shifts?
>
> This is primarily for x86.
>
> (1) movz is a 2-operand insn (separate source and destination), so it may avoid clobbering an input;
> (2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.
>
> Because of these, there's a comment somewhere that says we'll prefer extract over AND
> (perhaps in tcg_gen_andi_* or fold_and).  IIRC this also happens to simplify ppc and s390x
> insn selection (AND vs rotate-and-mask).  AFAIK, no other hosts are penalized.
>

I think it would be better to pick a canonical form for AND with 2^n-1 and
handle conversion to extract (like PPC rotates or movz) in the backend.

Picking AND as the canonical form also makes the macros for extract
validity simpler; adding an extra constraint for immediate 2^n-1 is
easier, and it generalizes to other PPC rotate-and-mask cases.

Paolo

>
>
>
> r~
>
>



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04 10:41       ` Paolo Bonzini
@ 2026-02-04 20:45         ` Richard Henderson
  2026-02-05  8:22           ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-04 20:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 2/4/26 20:41, Paolo Bonzini wrote:
>     This is primarily for x86.
> 
>     (1) movz is a 2-operand insn (separate source and destination), so it may avoid clobbering an input;
>     (2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.
> 
>     Because of these, there's a comment somewhere that says we'll prefer extract over AND
>     (perhaps in tcg_gen_andi_* or fold_and).  IIRC this also happens to simplify ppc and s390x
>     insn selection (AND vs rotate-and-mask).  AFAIK, no other hosts are penalized.
> 
> 
> I think it would be better to pick a canonical form for AND with 2^n-1 and handle 
> conversion to extract (like PPC rotates or movz) in the backend.
> 
> Picking AND as the canonical form also makes the macros for extract validity
> simpler; adding an extra constraint for immediate 2^n-1 is easier, and it generalizes
> to other PPC rotate-and-mask cases.

Picking AND means we have to use "r,0,ri" for x86, losing register allocation flexibility.


r~



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-04 20:45         ` Richard Henderson
@ 2026-02-05  8:22           ` Paolo Bonzini
  2026-02-05 22:29             ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2026-02-05  8:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


On Wed, 4 Feb 2026, 21:46 Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/4/26 20:41, Paolo Bonzini wrote:
> >     This is primarily for x86.
> >
> >     (1) movz is a 2-operand insn (separate source and destination), so it may avoid clobbering an input;
> >     (2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.
> >
> >     Because of these, there's a comment somewhere that says we'll prefer extract over AND
> >     (perhaps in tcg_gen_andi_* or fold_and).  IIRC this also happens to simplify ppc and s390x
> >     insn selection (AND vs rotate-and-mask).  AFAIK, no other hosts are penalized.
> >
> >
> > I think it would be better to pick a canonical form for AND with 2^n-1 and handle
> > conversion to extract (like PPC rotates or movz) in the backend.
> >
> > Picking AND as the canonical form also makes the macros for extract validity
> > simpler; adding an extra constraint for immediate 2^n-1 is easier, and it generalizes
> > to other PPC rotate-and-mask cases.
>
> Picking AND means we have to use "r,0,ri" for x86, losing register allocation flexibility.
>

Then could you wrap the target-specific extract_valid with one that allows
ofs == 0 if AND allows the immediate 2^len-1? That would also simplify this
series.

Paolo


>
> r~
>
>
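Here is one possible reading of the wrapper suggested above: report an extract at offset 0 as valid whenever the equivalent AND mask 2^len-1 would be accepted as an immediate. This is a sketch with made-up predicates. Both host_* functions are hypothetical stand-ins for backend queries, not QEMU's TCG_TARGET_extract_valid or tcg_op_imm_match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backend query: an x86-like host that can zero-extend
   from 8, 16 or 32 bits, and only at offset 0. */
static bool host_extract_valid(unsigned ofs, unsigned len)
{
    return ofs == 0 && (len == 8 || len == 16 || len == 32);
}

/* Hypothetical backend query: AND accepts a sign-extended 32-bit immediate. */
static bool host_and_imm_valid(int64_t imm)
{
    return imm >= INT32_MIN && imm <= INT32_MAX;
}

/* One reading of the suggestion: also report extract at offset 0 as valid
   whenever the equivalent mask 2^len - 1 is an acceptable AND immediate. */
static bool wrapped_extract_valid(unsigned ofs, unsigned len)
{
    assert(len >= 1 && ofs + len <= 64);
    if (host_extract_valid(ofs, len)) {
        return true;
    }
    /* len < 64 keeps the shift below well-defined. */
    return ofs == 0 && len < 64 &&
           host_and_imm_valid((int64_t)((UINT64_C(1) << len) - 1));
}
```

Under this reading, a 5-bit field at offset 0 counts as a valid extract on the modelled host even though it has no matching zero-extension insn. Whether extract or AND should then be emitted for it is exactly the question debated in this subthread.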



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-05  8:22           ` Paolo Bonzini
@ 2026-02-05 22:29             ` Richard Henderson
  2026-02-05 23:22               ` Paolo Bonzini
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2026-02-05 22:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 2/5/26 18:22, Paolo Bonzini wrote:
> 
> 
> On Wed, 4 Feb 2026, 21:46 Richard Henderson <richard.henderson@linaro.org> wrote:
> 
>     On 2/4/26 20:41, Paolo Bonzini wrote:
>      >     This is primarily for x86.
>      >
>      >     (1) movz is a 2-operand insn (separate source and destination), so it may avoid clobbering an input;
>      >     (2) movz is 3-4 bytes, whereas and r/i32 is 6-7 bytes.
>      >
>      >     Because of these, there's a comment somewhere that says we'll prefer extract over AND
>      >     (perhaps in tcg_gen_andi_* or fold_and).  IIRC this also happens to simplify ppc and s390x
>      >     insn selection (AND vs rotate-and-mask).  AFAIK, no other hosts are penalized.
>      >
>      >
>      > I think it would be better to pick a canonical form for AND with 2^n-1 and handle
>      > conversion to extract (like PPC rotates or movz) in the backend.
>      >
>      > Picking AND as the canonical form also makes the macros for extract validity
>      > simpler; adding an extra constraint for immediate 2^n-1 is easier, and it generalizes
>      > to other PPC rotate-and-mask cases.
> 
>     Picking AND means we have to use "r,0,ri" for x86, losing register allocation flexibility.
> 
> 
> Then could you wrap the target specific extract_valid with one that allows ofs == 0 if AND 
> allows the immediate 2^len-1? That would also simplify this series.

I don't understand your suggestion here.


r~



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-05 22:29             ` Richard Henderson
@ 2026-02-05 23:22               ` Paolo Bonzini
  2026-02-06  1:09                 ` Richard Henderson
  0 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2026-02-05 23:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel


On Thu, 5 Feb 2026, 23:29 Richard Henderson <richard.henderson@linaro.org> wrote:

> >      > I think it would be better to pick a canonical form for AND with 2^n-1 and handle
> >      > conversion to extract (like PPC rotates or movz) in the backend.
> >      >
> >      > Picking AND as the canonical form also makes the macros for extract validity
> >      > simpler; adding an extra constraint for immediate 2^n-1 is easier, and it generalizes
> >      > to other PPC rotate-and-mask cases.
> >
> >     Picking AND means we have to use "r,0,ri" for x86, losing register allocation flexibility.
> >
> >
> > Then could you wrap the target-specific extract_valid with one that allows ofs == 0 if AND
> > allows the immediate 2^len-1? That would also simplify this series.
>
> I don't understand your suggestion here.
>

I am not sure about it either... I am just not sure why extract is
guaranteed to be cheaper or have better constraints than AND.

It does happen to be true for x86, though only for len == 8 or 16; but is
it true of all targets that have a more expansive extract instruction?

Paolo



>
> r~
>
>



* Re: [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts
  2026-02-05 23:22               ` Paolo Bonzini
@ 2026-02-06  1:09                 ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2026-02-06  1:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 2/6/26 09:22, Paolo Bonzini wrote:
> 
> 
> On Thu, 5 Feb 2026, 23:29 Richard Henderson <richard.henderson@linaro.org> wrote:
> 
>      >      > I think it would be better to pick a canonical form for AND with 2^n-1 and handle
>      >      > conversion to extract (like PPC rotates or movz) in the backend.
>      >      >
>      >      > Picking AND as the canonical form also makes the macros for extract validity
>      >      > simpler; adding an extra constraint for immediate 2^n-1 is easier, and it generalizes
>      >      > to other PPC rotate-and-mask cases.
>      >
>      >     Picking AND means we have to use "r,0,ri" for x86, losing register allocation flexibility.
>      >
>      >
>      > Then could you wrap the target-specific extract_valid with one that allows ofs == 0 if AND
>      > allows the immediate 2^len-1? That would also simplify this series.
> 
>     I don't understand your suggestion here.
> 
> 
> I am not sure about it either... I am just not sure why extract is guaranteed to be 
> cheaper or have better constraints than AND.
> 
> It does happen to be true for x86, though only for len == 8 or 16; but is it true of all 
> targets that have a more expansive extract instruction?

x86 includes len == 32 via 'movl', fwiw.

Similarly, riscv64 has quite a number of filter conditions for extract, mostly because of 
a 12-bit signed argument for AND, and a collection of other zero-extend insns.

AArch64, loongarch64, and ppc64 all emit ANDI if possible during tgen_extract.

So it really is all about using extract if valid, and allowing the backend to use the more 
favorable set of constraints.


r~
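This subthread describes an order of preference: use an extract when the backend says it is valid, otherwise an AND immediate, otherwise a pair of shifts. That order can be sketched as a small decision function that mirrors the tests in the patched fold_deposit for an unsupported deposit into zero. The host predicates below are invented for illustration and loosely modelled on a 12-bit signed AND immediate. They are not any real backend's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum deposit_z_plan {
    PLAN_EXTRACT_SHL,   /* extract(0, len), then shl */
    PLAN_SHL_EXTRACT,   /* shl, then extract(0, ofs + len) */
    PLAN_AND_SHL,       /* and with 2^len - 1, then shl */
    PLAN_SHL_SHR,       /* shl, then logical shr */
};

/* Hypothetical host: zero-extension at offset 0 from 16 or 32 bits only. */
static bool hyp_extract_valid(unsigned ofs, unsigned len)
{
    return ofs == 0 && (len == 16 || len == 32);
}

/* Hypothetical host: AND takes a signed 12-bit immediate. */
static bool hyp_and_imm_valid(int64_t imm)
{
    return imm >= -2048 && imm <= 2047;
}

/* Same order of tests as the patched fold_deposit (64-bit type assumed). */
static enum deposit_z_plan choose_plan(unsigned ofs, unsigned len)
{
    assert(len >= 1 && len < 64 && ofs + len <= 64);
    if (hyp_extract_valid(0, len)) {
        return PLAN_EXTRACT_SHL;
    }
    if (hyp_extract_valid(0, ofs + len)) {
        return PLAN_SHL_EXTRACT;
    }
    if (hyp_and_imm_valid((int64_t)((UINT64_C(1) << len) - 1))) {
        return PLAN_AND_SHL;
    }
    return PLAN_SHL_SHR;
}
```

On this made-up host, a 5-bit field gets AND+SHL because 31 fits the immediate, whereas a 20-bit field falls back to SHL+SHR.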



* Re: [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize
  2026-02-04  5:24 ` [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
@ 2026-02-25 13:34   ` Jim MacArthur
  0 siblings, 0 replies; 20+ messages in thread
From: Jim MacArthur @ 2026-02-25 13:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: richard.henderson

On Wed, Feb 04, 2026 at 03:24:51PM +1000, Richard Henderson wrote:
> The expansions that we chose in tcg-op.c may be less than optimal.
> Delay lowering until optimize, so that we have propagated constants
> and have computed known zero/one masks.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> +        /* Lower invalid deposit into zero as AND + SHL or SHL + AND. */
> +        if (!valid) {
> +            if (TCG_TARGET_extract_valid(ctx->type, 0, ofs + len) &&
> +                !TCG_TARGET_extract_valid(ctx->type, 0, len)) {
> +                op2 = opt_insert_before(ctx, op, INDEX_op_shl, 3);
> +                op2->args[0] = ret;
> +                op2->args[1] = arg2;
> +                op2->args[2] = arg_new_constant(ctx, ofs);
> +
> +                op->opc = INDEX_op_extract;
> +                op->args[1] = ret;
> +                op->args[2] = 0;
> +                op->args[3] = ofs + len;
> +                goto done;
> +            }

I also had questions about extract vs shift/and here. You've explained this on patch 6, but a comment here about it might help future developers.

Nonetheless,

Reviewed-by: Jim MacArthur <jim.macarthur@linaro.org>



* Re: [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 during optimize
  2026-02-04  5:24 ` [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 " Richard Henderson
@ 2026-02-25 14:47   ` Jim MacArthur
  0 siblings, 0 replies; 20+ messages in thread
From: Jim MacArthur @ 2026-02-25 14:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

On Wed, Feb 04, 2026 at 03:24:52PM +1000, Richard Henderson wrote:
> The expansions that we chose in tcg-op.c may be less than optimal.
> Delay lowering until optimize, so that we have propagated constants
> and have computed known zero/one masks.
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Jim MacArthur <jim.macarthur@linaro.org>




* Re: [PATCH v2 3/6] tcg: Expand missing rotri with extract2
  2026-02-04  5:24 ` [PATCH v2 3/6] tcg: Expand missing rotri with extract2 Richard Henderson
@ 2026-02-25 14:54   ` Jim MacArthur
  0 siblings, 0 replies; 20+ messages in thread
From: Jim MacArthur @ 2026-02-25 14:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson

On Wed, Feb 04, 2026 at 03:24:53PM +1000, Richard Henderson wrote:
> Use extract2 to implement rotri.  To make this easier,
> redefine rotli in terms of rotri, rather than the reverse.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  tcg/tcg-op.c | 52 ++++++++++++++++++++++++----------------------------
>  1 file changed, 24 insertions(+), 28 deletions(-)
> 

Reviewed-by: Jim MacArthur <jim.macarthur@linaro.org>
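Patch 3 relies on two identities. First, rotating right by a constant is an extract2 whose two inputs are the same register. Second, a left rotate by c is a right rotate by (width - c) mod width. The following is a sketch in plain C. It assumes extract2 takes the width-bit field starting at bit pos of the concatenation hi:lo, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed extract2 semantics for 64-bit values: the 64-bit field starting
   at bit 'pos' of the 128-bit concatenation hi:lo. */
static uint64_t extract2_64(uint64_t lo, uint64_t hi, unsigned pos)
{
    assert(pos < 64);
    if (pos == 0) {
        return lo;   /* avoid the undefined shift by 64 */
    }
    return (lo >> pos) | (hi << (64 - pos));
}

/* Rotate right by a constant: the same value supplies both halves. */
static uint64_t rotri_64(uint64_t x, unsigned c)
{
    return extract2_64(x, x, c % 64);
}

/* Rotate left expressed in terms of rotate right. */
static uint64_t rotli_64(uint64_t x, unsigned c)
{
    return rotri_64(x, (64 - c % 64) % 64);
}
```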




* Re: [PATCH v2 4/6] tcg: Add tcg_op_imm_match
  2026-02-04  5:24 ` [PATCH v2 4/6] tcg: Add tcg_op_imm_match Richard Henderson
@ 2026-02-25 15:06   ` Jim MacArthur
  0 siblings, 0 replies; 20+ messages in thread
From: Jim MacArthur @ 2026-02-25 15:06 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Paolo Bonzini

On Wed, Feb 04, 2026 at 03:24:54PM +1000, Richard Henderson wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Create a function to test whether the second operand of a
> binary operation allows a given immediate.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [rth: Split out from a larger patch; keep the declaration internal.]
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Jim MacArthur <jim.macarthur@linaro.org>




* Re: [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and
  2026-02-04  5:24 ` [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and Richard Henderson
@ 2026-02-25 15:16   ` Jim MacArthur
  0 siblings, 0 replies; 20+ messages in thread
From: Jim MacArthur @ 2026-02-25 15:16 UTC (permalink / raw)
  To: qemu-devel; +Cc: Richard Henderson, Paolo Bonzini

On Wed, Feb 04, 2026 at 03:24:55PM +1000, Richard Henderson wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> Instead of assuming only small immediates are available for AND,
> consult the backend in order to decide between SHL/SHR and SHR/AND.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [rth: Split from a larger patch]
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Jim MacArthur <jim.macarthur@linaro.org>




Thread overview: 20+ messages
2026-02-04  5:24 [PATCH v2 0/6] tcg: Improve extract and deposit code gen Richard Henderson
2026-02-04  5:24 ` [PATCH v2 1/6] tcg/optimize: Lower unsupported deposit during optimize Richard Henderson
2026-02-25 13:34   ` Jim MacArthur
2026-02-04  5:24 ` [PATCH v2 2/6] tcg/optimize: Lower unsupported extract2 " Richard Henderson
2026-02-25 14:47   ` Jim MacArthur
2026-02-04  5:24 ` [PATCH v2 3/6] tcg: Expand missing rotri with extract2 Richard Henderson
2026-02-25 14:54   ` Jim MacArthur
2026-02-04  5:24 ` [PATCH v2 4/6] tcg: Add tcg_op_imm_match Richard Henderson
2026-02-25 15:06   ` Jim MacArthur
2026-02-04  5:24 ` [PATCH v2 5/6] tcg: target-dependent lowering of extract to shr/and Richard Henderson
2026-02-25 15:16   ` Jim MacArthur
2026-02-04  5:24 ` [PATCH v2 6/6] tcg/optimize: possibly expand deposit into zero with shifts Richard Henderson
2026-02-04  8:05   ` Paolo Bonzini
2026-02-04  9:06     ` Richard Henderson
2026-02-04 10:41       ` Paolo Bonzini
2026-02-04 20:45         ` Richard Henderson
2026-02-05  8:22           ` Paolo Bonzini
2026-02-05 22:29             ` Richard Henderson
2026-02-05 23:22               ` Paolo Bonzini
2026-02-06  1:09                 ` Richard Henderson
