qemu-devel.nongnu.org archive mirror
From: Richard Henderson <richard.henderson@linaro.org>
To: "Philippe Mathieu-Daudé" <f4bug@amsat.org>, qemu-devel@nongnu.org
Cc: "Thomas Huth" <thuth@redhat.com>,
	"Aleksandar Rikalo" <aleksandar.rikalo@syrmia.com>,
	"Fredrik Noring" <noring@nocrew.org>,
	"Laurent Vivier" <laurent@vivier.eu>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Aurelien Jarno" <aurelien@aurel32.net>,
	"Maciej W. Rozycki" <macro@orcam.me.uk>
Subject: Re: [RFC PATCH 18/42] target/mips/tx79: Introduce PEXTU[BHW] opcodes (Parallel Extend Lower)
Date: Mon, 15 Feb 2021 10:28:32 -0800
Message-ID: <8668c62a-89c9-0456-bc33-07527dd16d91@linaro.org>
In-Reply-To: <20210214175912.732946-19-f4bug@amsat.org>

On 2/14/21 9:58 AM, Philippe Mathieu-Daudé wrote:
> Introduce the 'Parallel Extend Lower' opcodes:

In $SUBJECT, s/PEXTU/PEXTL/: these are the Parallel Extend Lower opcodes.

> +    /* Lower halve */
> +    for (int i = 0; i < 64 / (2 * wlen); i++) {
> +        tcg_gen_deposit_i64(cpu_gpr[a->rd],
> +                            cpu_gpr[a->rd], bx, 2 * wlen * i, wlen);
> +        tcg_gen_deposit_i64(cpu_gpr[a->rd],
> +                            cpu_gpr[a->rd], ax, 2 * wlen * i + wlen, wlen);
> +        tcg_gen_shri_i64(bx, bx, wlen);
> +        tcg_gen_shri_i64(ax, ax, wlen);
> +    }
> +    /* Upper halve */
> +    for (int i = 0; i < 64 / (2 * wlen); i++) {
> +        tcg_gen_deposit_i64(cpu_gpr_hi[a->rd],
> +                            cpu_gpr_hi[a->rd], bx, 2 * wlen * i, wlen);
> +        tcg_gen_deposit_i64(cpu_gpr_hi[a->rd],
> +                            cpu_gpr_hi[a->rd], ax, 2 * wlen * i + wlen, wlen);
> +        tcg_gen_shri_i64(bx, bx, wlen);
> +        tcg_gen_shri_i64(ax, ax, wlen);
> +    }

Right, so, for pextlb this expands to (4 * 4 * 2) = 32 operations if deposit
is supported: 4 operations per iteration, 4 iterations, 2 halves. If it is not,
each deposit costs 4 operations, giving ((4*2 + 2) * 4 * 2) = 80 operations.
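
For reference, a plain-C model of what the quoted loops compute can be handy
when checking any faster expansion. This is only a sketch on host integers,
not TCG code: deposit64_model and pextl_ref are hypothetical names, not QEMU
functions.

```c
#include <stdint.h>

/* Model of a deposit: replace len bits of dst at bit ofs with the low bits of src. */
static uint64_t deposit64_model(uint64_t dst, uint64_t src,
                                unsigned ofs, unsigned len)
{
    uint64_t mask = (len == 64 ? ~0ull : ((1ull << len) - 1)) << ofs;
    return (dst & ~mask) | ((src << ofs) & mask);
}

/*
 * Hypothetical reference for PEXTL[BHW] with wlen = 8, 16 or 32:
 * interleave the low 64 bits of rs and rt, rt in the even lanes and
 * rs in the odd lanes, following the quoted loops step by step.
 */
static void pextl_ref(uint64_t rs, uint64_t rt, unsigned wlen,
                      uint64_t *lo, uint64_t *hi)
{
    uint64_t out[2] = { 0, 0 };

    for (int half = 0; half < 2; half++) {
        for (unsigned i = 0; i < 64 / (2 * wlen); i++) {
            out[half] = deposit64_model(out[half], rt, 2 * wlen * i, wlen);
            out[half] = deposit64_model(out[half], rs, 2 * wlen * i + wlen, wlen);
            rt >>= wlen;
            rs >>= wlen;
        }
    }
    *lo = out[0];
    *hi = out[1];
}
```

Calling pextl_ref(rs, rt, 8, &lo, &hi) gives the expected pextlb result for
the low and high 64 bits of rd.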

We can do a bit better, though, exploiting parallelism.

/* 5 or 8 operations, w/ or w/o deposit */
void gen_widen_b(TCGv_i64 d, TCGv_i64 s)
{
    TCGv_i64 x = tcg_temp_new_i64();
    TCGv_i64 y = tcg_temp_new_i64();
    TCGv_i64 m0 = tcg_constant_i64(0x0000ff000000ff00ull);
    TCGv_i64 m1 = tcg_constant_i64(0x000000ff000000ffull);

    /* s = abcdefgh (bytes, a most significant) */
    tcg_gen_deposit_i64(x, s, s, 16, 48);
    /* x = cdefghgh */
    tcg_gen_and_i64(y, x, m0);
    /* y = 00e000g0 */
    tcg_gen_and_i64(x, x, m1);
    /* x = 000f000h */
    tcg_gen_shli_i64(y, y, 8);
    /* y = 0e000g00 */
    tcg_gen_or_i64(d, x, y);
    /* d = 0e0f0g0h */

    tcg_temp_free_i64(x);
    tcg_temp_free_i64(y);
}
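
The byte shuffling in gen_widen_b can be checked on the host with a plain-C
model of the same five steps. A sketch only: widen_b_model is a made-up name,
and it follows the layout in the step comments, keeping f and h with an
explicit 0x000000ff000000ff mask so that x really is 000f000h before the or.

```c
#include <stdint.h>

/* Spread the low four bytes of s (efgh) into the even byte lanes: 0e0f0g0h. */
static uint64_t widen_b_model(uint64_t s)
{
    uint64_t x = (s << 16) | (s & 0xffff);      /* deposit(s, s, 16, 48): cdefghgh */
    uint64_t y = x & 0x0000ff000000ff00ull;     /* 00e000g0 */

    x &= 0x000000ff000000ffull;                 /* 000f000h */
    y <<= 8;                                    /* 0e000g00 */
    return x | y;                               /* 0e0f0g0h */
}
```

With s = 0x1122334455667788 the model returns 0x0055006600770088.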

/* 12 or 18 operations w/ or w/o deposit */
void gen_pextb(TCGv_i64 d, TCGv_i64 s, TCGv_i64 t)
{
    TCGv_i64 x = tcg_temp_new_i64();

    /* s supplies the odd byte lanes, t the even ones */
    gen_widen_b(x, s);
    gen_widen_b(d, t);
    tcg_gen_shli_i64(x, x, 8);
    tcg_gen_or_i64(d, d, x);

    tcg_temp_free_i64(x);
}

then

    gen_read_gpr(s, a->rs);
    gen_read_gpr(t, a->rt);
    gen_pextb(cpu_gpr[a->rd], s, t);

    tcg_gen_shri_i64(s, s, 32);
    tcg_gen_shri_i64(t, t, 32);
    gen_pextb(cpu_gpr_hi[a->rd], s, t);

gives you the result in 26 or 38 operations.
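
To sanity-check the whole sequence, here is a host-side sketch of the two
gen_pextb calls, assuming rs supplies the odd byte lanes and rt the even ones
as in the quoted patch. All the *_model names are hypothetical, and the code
works on plain integers rather than TCG values.

```c
#include <stdint.h>

/* Low four bytes efgh of s widened to 0e0f0g0h. */
static uint64_t widen_b_model(uint64_t s)
{
    uint64_t x = (s << 16) | (s & 0xffff);
    uint64_t y = (x & 0x0000ff000000ff00ull) << 8;

    return (x & 0x000000ff000000ffull) | y;
}

/* Interleave the low four bytes of s and t, s one byte lane above t. */
static uint64_t pextb_model(uint64_t s, uint64_t t)
{
    return (widen_b_model(s) << 8) | widen_b_model(t);
}

/* Full 128-bit result: bits 31..0 feed the low half, bits 63..32 the high half. */
static void pextlb_model(uint64_t rs, uint64_t rt, uint64_t *lo, uint64_t *hi)
{
    *lo = pextb_model(rs, rt);
    *hi = pextb_model(rs >> 32, rt >> 32);
}
```

For rs = 0x0807060504030201 and rt = 0x1817161514131211 this yields
lo = 0x0414031302120111 and hi = 0x0818071706160515.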

Similarly

void gen_widen_h(TCGv_i64 d, TCGv_i64 s)
{
    TCGv_i64 x = tcg_temp_new_i64();

    /* s = abcd */
    tcg_gen_andi_i64(x, s, 0xffff0000u);
    /* x = 00c0 */
    tcg_gen_deposit_i64(d, s, x, 16, 48);
    /* d = 0c0d */

    tcg_temp_free_i64(x);
}
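
A host-side sketch of the halfword case, again on plain integers.
widen_h_model mirrors the andi + deposit pair above; pexth_model is an
assumption built by analogy with gen_pextb and is not spelled out in this
mail. Both names are hypothetical.

```c
#include <stdint.h>

/* Low two halfwords cd of s widened to 0c0d. */
static uint64_t widen_h_model(uint64_t s)
{
    uint64_t x = s & 0xffff0000u;           /* 00c0 */

    return (x << 16) | (s & 0xffff);        /* deposit(s, x, 16, 48): 0c0d */
}

/* Assumed analogue of gen_pextb: s one halfword lane above t. */
static uint64_t pexth_model(uint64_t s, uint64_t t)
{
    return (widen_h_model(s) << 16) | widen_h_model(t);
}
```

With s = 0x1111222233334444, widen_h_model returns 0x0000333300004444.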


r~

