From: David Laight <david.laight.linux@gmail.com>
To: cp0613@linux.alibaba.com
Cc: alex@ghiti.fr, aou@eecs.berkeley.edu, arnd@arndb.de,
linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-riscv@lists.infradead.org, linux@rasmusvillemoes.dk,
palmer@dabbelt.com, paul.walmsley@sifive.com,
yury.norov@gmail.com
Subject: Re: [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension
Date: Sun, 29 Jun 2025 11:38:40 +0100
Message-ID: <20250629113840.2f319956@pumpkin>
In-Reply-To: <20250628120816.1679-1-cp0613@linux.alibaba.com>
On Sat, 28 Jun 2025 20:08:16 +0800
cp0613@linux.alibaba.com wrote:
> On Wed, 25 Jun 2025 17:02:34 +0100, david.laight.linux@gmail.com wrote:
>
> > Is it even a gain in the zbb case?
> > The "rorw" is only ever going to help full word rotates.
> > Here you might as well do ((word << 8 | word) >> shift).
> >
> > For "rol8" you'd need ((word << 24 | word) 'rol' shift).
> > I still bet the generic code is faster (but see below).
> >
> > Same for 16bit rotates.
> >
> > Actually the generic version is (probably) horrid for everything except x86.
> > See https://www.godbolt.org/z/xTxYj57To
>
> Thanks for your suggestion, this website is very inspiring. According to the
> results, the generic version is indeed the most friendly to x86. I think this
> is also a reason why other architectures should be optimized. Take the riscv64
> ror32 implementation as an example, compare the number of assembly instructions
> of the following two functions:
> ```
> u32 zbb_opt_ror32(u32 word, unsigned int shift)
> {
> asm volatile(
> ".option push\n"
> ".option arch,+zbb\n"
> "rorw %0, %1, %2\n"
> ".option pop\n"
> : "=r" (word) : "r" (word), "r" (shift) :);
>
> return word;
> }
>
> u16 generic_ror32(u16 word, unsigned int shift)
> {
> return (word >> (shift & 31)) | (word << ((-shift) & 31));
> }
> ```
> Their disassembly is:
> ```
> zbb_opt_ror32:
> <+0>: addi sp,sp,-16
> <+2>: sd s0,0(sp)
> <+4>: sd ra,8(sp)
> <+6>: addi s0,sp,16
> <+8>: .insn 4, 0x60b5553b
> <+12>: ld ra,8(sp)
> <+14>: ld s0,0(sp)
> <+16>: sext.w a0,a0
> <+18>: addi sp,sp,16
> <+20>: ret
>
> generic_ror32:
> <+0>: addi sp,sp,-16
> <+2>: andi a1,a1,31
> <+4>: sd s0,0(sp)
> <+6>: sd ra,8(sp)
> <+8>: addi s0,sp,16
> <+10>: negw a5,a1
> <+14>: sllw a5,a0,a5
> <+18>: ld ra,8(sp)
> <+20>: ld s0,0(sp)
> <+22>: srlw a0,a0,a1
> <+26>: or a0,a0,a5
> <+28>: slli a0,a0,0x30
> <+30>: srli a0,a0,0x30
> <+32>: addi sp,sp,16
> <+34>: ret
> ```
> It can be found that the zbb optimized implementation uses fewer instructions,
> even for 16-bit and 8-bit data.

Far too many register spills to stack.
I think you've forgotten to specify -O2.
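
For reference, the byte-duplication idea quoted above for the 8-bit
rotates can be sketched in plain C roughly as follows. This is only an
illustration: the names ref_ror8, ref_rol8, rol32_sketch, dup_ror8 and
dup_rol8 are made up here (they are not kernel helpers), and it uses
<stdint.h> types rather than the kernel's u8/u32 so that it builds
standalone.

```c
#include <stdint.h>

/* Reference 8-bit rotates, written in the style of the generic code. */
static inline uint8_t ref_ror8(uint8_t word, unsigned int shift)
{
	return (uint8_t)((word >> (shift & 7)) | (word << ((-shift) & 7)));
}

static inline uint8_t ref_rol8(uint8_t word, unsigned int shift)
{
	return (uint8_t)((word << (shift & 7)) | (word >> ((-shift) & 7)));
}

/* Plain C 32-bit rotate left; stands in for a full-word rotate insn. */
static inline uint32_t rol32_sketch(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/*
 * ror8: copy the byte into bits 8..15, then a single right shift
 * brings the wrapped-around bits into the low byte.
 */
static inline uint8_t dup_ror8(uint8_t word, unsigned int shift)
{
	uint32_t w = word;

	return (uint8_t)(((w << 8) | w) >> (shift & 7));
}

/*
 * rol8: copy the byte into bits 24..31, then a 32-bit rotate left
 * wraps the top bits of that copy into the low byte.
 */
static inline uint8_t dup_rol8(uint8_t word, unsigned int shift)
{
	uint32_t w = word;

	return (uint8_t)rol32_sketch((w << 24) | w, shift & 7);
}
```

Whether either form beats the generic one would need checking against
optimised (-O2) compiler output for the target.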
David