Re: [Qemu-devel] [PATCH v4 22/33] tcg-aarch64: Use MOVN in tcg_out_movi

From: Claudio Fontana <claudio.fontana@huawei.com>
To: Richard Henderson <rth@twiddle.net>
Cc: peter.maydell@linaro.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v4 22/33] tcg-aarch64: Use MOVN in tcg_out_movi
Date: Mon, 16 Sep 2013 11:16:29 +0200
Message-ID: <5236CC6D.40901@huawei.com>
In-Reply-To: <1379195690-6509-23-git-send-email-rth@twiddle.net>

On 14.09.2013 23:54, Richard Henderson wrote:
> When profitable, initialize the register with MOVN instead of MOVZ,
> before setting the remaining lanes with MOVK.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/aarch64/tcg-target.c | 88 +++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 75 insertions(+), 13 deletions(-)
> 
> diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
> index f9319ed..cecda05 100644
> --- a/tcg/aarch64/tcg-target.c
> +++ b/tcg/aarch64/tcg-target.c
> @@ -559,24 +559,86 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
>                           tcg_target_long value)
>  {
>      AArch64Insn insn;
> -
> -    if (type == TCG_TYPE_I32) {
> +    int i, wantinv, shift;
> +    tcg_target_long svalue = value;
> +    tcg_target_long ivalue, imask;
> +
> +    /* For 32-bit values, discard potential garbage in value.  For 64-bit
> +       values within [2**31, 2**32-1], we can create smaller sequences by
> +       interpreting this as a negative 32-bit number, while ensuring that
> +       the high 32 bits are cleared by setting SF=0.  */
> +    if (type == TCG_TYPE_I32 || (value & ~0xffffffffull) == 0) {
> +        svalue = (int32_t)value;
>          value = (uint32_t)value;
> +        type = TCG_TYPE_I32;
> +    }
> +
> +    /* Would it take fewer insns to begin with MOVN?  For the value and its
> +       inverse, count the number of 16-bit lanes that are 0.  For the benefit
> +       of 32-bit quantities, compare the zero-extended normal value vs the
> +       sign-extended inverted value.  For example,
> +            v = 0x00000000f100ffff, zeros = 2
> +           ~v = 0xffffffff0eff0000, zeros = 1
> +          ~sv = 0x000000000eff0000, zeros = 3
> +       By using ~sv we see that 3 > 2, leading us to emit just a single insn
> +       "movn ret, 0x0eff, lsl #16".  */
> +
> +    ivalue = ~svalue;
> +    imask = 0;
> +    wantinv = 0;
> +
> +    /* ??? This can be done in the simd unit without a loop:
> +        // Move value and ivalue into V0 and V1 respectively.
> +        mov     v0.d[0], value
> +        mov     v1.d[0], ivalue
> +        // Compare each 16-bit lane vs 0, producing -1 for true.
> +        cmeq    v0.4h, v0.4h, #0
> +        cmeq    v1.4h, v1.4h, #0
> +        mov     imask, v1.d[0]
> +        // Sum the comparisons, producing 0 to -4.
> +        addv    h0, v0.4h
> +        addv    h1, v1.4h
> +        // Subtract the two, forming a positive wantinv result.
> +        sub     v0.4h, v0.4h, v1.4h
> +        smov    wantinv, v0.h[0]
> +     */
> +    for (i = 0; i < 64; i += 16) {
> +        tcg_target_long mask = 0xffffull << i;
> +        if ((value & mask) == 0) {
> +            wantinv -= 1;
> +        }
> +        if ((ivalue & mask) == 0) {
> +            wantinv += 1;
> +            imask |= mask;
> +        }
>      }
>  
> -    /* count trailing zeros in 16 bit steps, mapping 64 to 0. Emit the
> -       first MOVZ with the half-word immediate skipping the zeros, with a shift
> -       (LSL) equal to this number. Then all next instructions use MOVKs.
> -       Zero the processed half-word in the value, continue until empty.
> -       We build the final result 16bits at a time with up to 4 instructions,
> -       but do not emit instructions for 16bit zero holes. */
> +    /* If we had more 0xffff than 0x0000, invert VALUE and use MOVN.  */
>      insn = INSN_MOVZ;
> -    do {
> -        unsigned shift = ctz64(value) & (63 & -16);
> -        tcg_fmt_Rd_uimm(s, insn, shift >= 32, rd, value >> shift, shift);
> +    if (wantinv > 0) {
> +        value = ivalue;
> +        insn = INSN_MOVN;
> +    }
> +
> +    /* Find the lowest lane that is not 0x0000.  */
> +    shift = ctz64(value) & (63 & -16);
> +    tcg_fmt_Rd_uimm(s, insn, type, rd, value >> shift, shift);
> +
> +    if (wantinv > 0) {
> +        /* Re-invert the value, so MOVK sees non-inverted bits.  */
> +        value = ~value;
> +        /* Clear out all the 0xffff lanes.  */
> +        value ^= imask;
> +    }
> +    /* Clear out the lane that we just set.  */
> +    value &= ~(0xffffUL << shift);
> +
> +    /* Iterate until all lanes have been set, and thus cleared from VALUE.  */
> +    while (value) {
> +        shift = ctz64(value) & (63 & -16);
> +        tcg_fmt_Rd_uimm(s, INSN_MOVK, type, rd, value >> shift, shift);
>          value &= ~(0xffffUL << shift);
> -        insn = INSN_MOVK;
> -    } while (value);
> +    }
>  }
>  
>  static inline void tcg_out_ldst_r(TCGContext *s,
> 

I agree in general with the approach "let's see if it is more convenient to start with MOVN".
The existing implementation, although not easy, is leaner.
Can we make this one a little bit leaner?
I'll think about it myself as well.

C.
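
For reference, the lane-counting heuristic from the quoted comment can be modelled outside of TCG. The following is only a minimal sketch under stated assumptions, not the patch's code: `sketch_movi` and `emit` are hypothetical names, the output is assembly text rather than encoded instructions, and the destination register `x0`/`w0` is a placeholder. The `wantinv` computation follows the quoted loop. The MOVK step deliberately differs: it emits a MOVK for every remaining lane that differs from the fill pattern left by the first instruction (0x0000 after MOVZ, 0xffff after MOVN), including 0x0000 lanes after a MOVN, which the quoted `while (value)` loop appears to skip.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Append one instruction as text; returns the number of bytes used so far. */
static size_t emit(char *buf, size_t len, size_t used, const char *op,
                   int is64, unsigned imm16, int shift)
{
    if (used < len) {
        int n = snprintf(buf + used, len - used, "%s %c0, #0x%x, lsl #%d\n",
                         op, is64 ? 'x' : 'w', imm16, shift);
        if (n > 0) {
            used += (size_t)n;
        }
    }
    return used;
}

/* Hypothetical stand-in for tcg_out_movi: writes the chosen sequence into
   buf as text and returns the number of instructions. */
static int sketch_movi(int is64, uint64_t value, char *buf, size_t len)
{
    uint64_t svalue = value, ivalue, target;
    unsigned fill, lane;
    int i, limit, wantinv = 0, first = -1, count = 0;
    size_t used = 0;

    if (len > 0) {
        buf[0] = '\0';
    }

    /* Values that fit in 32 bits are built in a W register (SF=0), which
       clears the high half.  svalue is the sign-extended 32-bit view. */
    if (!is64 || (value & ~0xffffffffull) == 0) {
        value = (uint32_t)value;
        svalue = (value & 0x80000000u) ? (value | ~0xffffffffull) : value;
        is64 = 0;
    }
    ivalue = ~svalue;

    /* Count 0x0000 lanes in value against 0x0000 lanes in ivalue. */
    for (i = 0; i < 64; i += 16) {
        if (((value >> i) & 0xffff) == 0) {
            wantinv -= 1;
        }
        if (((ivalue >> i) & 0xffff) == 0) {
            wantinv += 1;
        }
    }

    /* After the first insn, every other lane holds the fill pattern. */
    fill = wantinv > 0 ? 0xffffu : 0u;
    target = wantinv > 0 ? svalue : value;
    limit = is64 ? 64 : 32;

    /* The first insn sets the lowest lane that differs from the fill. */
    for (i = 0; i < limit; i += 16) {
        if ((unsigned)((target >> i) & 0xffff) != fill) {
            first = i;
            break;
        }
    }
    if (first < 0) {
        first = 0;              /* all lanes already match the fill */
    }
    lane = (unsigned)((target >> first) & 0xffff);
    used = emit(buf, len, used, wantinv > 0 ? "movn" : "movz", is64,
                wantinv > 0 ? (~lane & 0xffffu) : lane, first);
    count++;

    /* MOVK every remaining lane that differs from the fill. */
    for (i = first + 16; i < limit; i += 16) {
        lane = (unsigned)((target >> i) & 0xffff);
        if (lane != fill) {
            used = emit(buf, len, used, "movk", is64, lane, i);
            count++;
        }
    }
    return count;
}
```

With the example from the quoted comment, `sketch_movi(1, 0xf100ffff, buf, sizeof(buf))` returns 1 and leaves `movn w0, #0xeff, lsl #16` in `buf`. For 0xffffffff0000abcd it returns 2: a `movn` for lane 0 followed by a `movk` that writes 0 into lane 1.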

