Re: [PATCH for-9.1 09/19] target/i386: move 60-BF opcodes to new decoder


From: Richard Henderson <richard.henderson@linaro.org>
To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [PATCH for-9.1 09/19] target/i386: move 60-BF opcodes to new decoder
Date: Wed, 10 Apr 2024 18:12:35 -1000
Message-ID: <f211d5d7-9d0f-455a-97c5-d2c09d600bcb@linaro.org>
In-Reply-To: <20240409164323.776660-10-pbonzini@redhat.com>

On 4/9/24 06:43, Paolo Bonzini wrote:
> +static void gen_ARPL(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
> +{
> +    TCGLabel *label1 = gen_new_label();
> +    TCGv rpl_adj = tcg_temp_new();
> +    TCGv flags = tcg_temp_new();
> +
> +    gen_mov_eflags(s, flags);
> +    tcg_gen_andi_tl(flags, flags, ~CC_Z);
> +
> +    /* Compute dest[rpl] - src[rpl], adjust if result <0.  */
> +    tcg_gen_andi_tl(rpl_adj, s->T0, 3);
> +    tcg_gen_andi_tl(s->T1, s->T1, 3);
> +    tcg_gen_sub_tl(rpl_adj, rpl_adj, s->T1);
> +
> +    tcg_gen_brcondi_tl(TCG_COND_LT, rpl_adj, 0, label1);

The comment is right, but the branch condition is wrong.

I think this might be better as:

     /* SRC = DST with SRC[RPL] */
     tcg_gen_deposit_tl(s->T1, s->T0, s->T1, 0, 2);
     /* Z flag set if DST < SRC */
     tcg_gen_setcond_tl(TCG_COND_LTU, tmp, s->T0, s->T1);
     /* Install Z */
     tcg_gen_deposit_tl(flags, flags, tmp, ctz32(CC_Z), 1);
     /* DST with maximum RPL */
     tcg_gen_umax_tl(s->T0, s->T0, s->T1);
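To see why the branchless sequence above matches ARPL, here is a minimal plain-C model of it, checked against the architectural definition (ZF is set and DST[RPL] is raised to SRC[RPL] only when DST[RPL] < SRC[RPL]). This is a sketch, not QEMU code: the function names arpl_reference, arpl_branchless and arpl_selfcheck are hypothetical, and the TCG ops are modelled with ordinary integer arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference semantics: raise DST[RPL] to SRC[RPL] if it is lower. */
static uint16_t arpl_reference(uint16_t dst, uint16_t src, bool *zf)
{
    if ((dst & 3u) < (src & 3u)) {
        *zf = true;
        return (uint16_t)((dst & ~3u) | (src & 3u));
    }
    *zf = false;
    return dst;
}

/* Model of the deposit / setcond(LTU) / umax sequence. */
static uint16_t arpl_branchless(uint16_t dst, uint16_t src, bool *zf)
{
    /* deposit: DST with its low two bits replaced by SRC[RPL] */
    uint16_t adj = (uint16_t)((dst & ~3u) | (src & 3u));

    /* setcond LTU: the values differ only in RPL, so this compares RPLs */
    *zf = dst < adj;

    /* umax: keep whichever value has the larger RPL */
    return dst > adj ? dst : adj;
}

/* Exhaustive check over all DST values and all four SRC RPLs. */
static void arpl_selfcheck(void)
{
    for (unsigned d = 0; d <= 0xFFFFu; d++) {
        for (unsigned r = 0; r < 4; r++) {
            bool z1, z2;
            /* 0xABC0 is an arbitrary selector index; only RPL matters */
            uint16_t src = (uint16_t)(0xABC0u | r);
            uint16_t v1 = arpl_reference((uint16_t)d, src, &z1);
            uint16_t v2 = arpl_branchless((uint16_t)d, src, &z2);
            assert(v1 == v2 && z1 == z2);
        }
    }
}
```

The key point is that after the deposit the two values can differ only in their low two bits, so a full-width unsigned compare and a full-width umax act purely on the RPL field.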


> +    case MO_32:
> +#ifdef TARGET_X86_64
> +        /*
> +         * This could also use the same algorithm as MO_16.  It produces fewer
> +         * TCG ops and better code if flags are needed, but it requires a 64-bit
> +         * multiply even if they are not (and thus the high part of the multiply
> +         * is dead).
> +         */

Is 64-bit multiply ever slower these days?
My intuition says "slow" multiply is at least a decade out of date.
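For reference, the two ways of detecting IMUL overflow being weighed here can be modelled in plain C roughly as follows. This is only a sketch under assumptions: the MO_16 code is not quoted in this message, so the "wide" variant is my reading of "the same algorithm as MO_16" (one full-width multiply, then compare against the sign-extended truncation), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the muls2_i32 approach: a 32x32->64 multiply yielding separate
 * low and high halves; overflow if the high half is not the sign
 * extension of the low half.
 */
static bool imul32_overflow_muls2(int32_t a, int32_t b, int32_t *res)
{
    int64_t prod = (int64_t)a * b;
    /* Truncation to int32_t wraps on all compilers QEMU supports. */
    int32_t lo = (int32_t)(uint32_t)prod;
    int32_t hi = (int32_t)(prod >> 32);

    *res = lo;
    return hi != (lo < 0 ? -1 : 0);
}

/*
 * Model of the single wide multiply: sign-extend both inputs, multiply
 * once at 64 bits, and compare the product with its truncated result
 * sign-extended back to 64 bits.
 */
static bool imul32_overflow_wide(int32_t a, int32_t b, int32_t *res)
{
    int64_t prod = (int64_t)a * b;

    *res = (int32_t)(uint32_t)prod;
    return prod != (int64_t)*res;
}
```

Both variants compute the same result and the same overflow condition; the difference is only in which host operations they map to.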

> +        tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
> +        tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);

Avoid s->tmp*, especially in new code.

> +        tcg_gen_muls2_i32(s->tmp2_i32, s->tmp3_i32,
> +                          s->tmp2_i32, s->tmp3_i32);
> +        tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
> +
> +        cc_src_rhs = tcg_temp_new();
> +        tcg_gen_extu_i32_tl(cc_src_rhs, s->tmp3_i32);
> +        /* Compare the high part to the sign bit of the truncated result */
> +        tcg_gen_negsetcondi_i32(TCG_COND_LT, s->tmp2_i32, s->tmp2_i32, 0);

This seems like something the optimizer should handle, but doesn't.
I'd write this as

     tcg_gen_sari_i32(tmp, tmp, 31);
or
     tcg_gen_sextract_i32(tmp, tmp, 31, 1);

which I know will expand to the same thing.
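As a quick illustration of why these forms are interchangeable, here is a plain-C model of the three operations. It is a sketch, not TCG code: the function names are hypothetical, and the arithmetic right shift of a negative value is implementation-defined in ISO C, although it behaves as a sign-propagating shift on the compilers QEMU supports.

```c
#include <assert.h>
#include <stdint.h>

/* Model of negsetcond(LT, x, 0): -1 if x is negative, else 0. */
static int32_t sign_mask_negsetcond(int32_t x)
{
    return -(int32_t)(x < 0);
}

/* Model of sari(x, 31): arithmetic shift replicates the sign bit. */
static int32_t sign_mask_sari(int32_t x)
{
    return x >> 31;
}

/* Model of sextract(x, 31, 1): take bit 31 and sign-extend it. */
static int32_t sign_mask_sextract(int32_t x)
{
    return -(int32_t)(((uint32_t)x >> 31) & 1u);
}

/* Check that all three agree on a few boundary values. */
static void sign_mask_selfcheck(void)
{
    static const int32_t samples[] = {
        INT32_MIN, -2, -1, 0, 1, 0x40000000, INT32_MAX
    };

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        int32_t x = samples[i];
        assert(sign_mask_negsetcond(x) == sign_mask_sari(x));
        assert(sign_mask_sari(x) == sign_mask_sextract(x));
    }
}
```

Each one yields an all-ones mask for a negative input and zero otherwise, which is the value the high part of the product is compared against.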

> +    case MO_64:
> +#endif
> +        cc_src_rhs = tcg_temp_new();
> +        tcg_gen_muls2_tl(s->T0, cc_src_rhs, s->T0, s->T1);
> +        /* Compare the high part to the sign bit of the truncated result */
> +        tcg_gen_negsetcondi_tl(TCG_COND_LT, s->T1, s->T0, 0);

Similarly.


r~
