public inbox for linux-arch@vger.kernel.org
 help / color / mirror / Atom feed
From: David Laight <david.laight.linux@gmail.com>
To: cp0613@linux.alibaba.com
Cc: alex@ghiti.fr, aou@eecs.berkeley.edu, arnd@arndb.de,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org, linux@rasmusvillemoes.dk,
	palmer@dabbelt.com, paul.walmsley@sifive.com,
	yury.norov@gmail.com
Subject: Re: [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension
Date: Mon, 30 Jun 2025 18:35:34 +0100	[thread overview]
Message-ID: <20250630183534.160b9823@pumpkin> (raw)
In-Reply-To: <20250630121430.1989-1-cp0613@linux.alibaba.com>

On Mon, 30 Jun 2025 20:14:30 +0800
cp0613@linux.alibaba.com wrote:

> On Sun, 29 Jun 2025 11:38:40 +0100, david.laight.linux@gmail.com wrote:
> 
> > > It can be found that the zbb optimized implementation uses fewer instructions,
> > > even for 16-bit and 8-bit data.  
> > 
> > Far too many register spills to stack.
> > I think you've forgotten to specify -O2  
> 
> Yes, I extracted it from the vmlinux disassembly, without compiling with -O2, and
> I used the web tool you provided as follows:
> ```
> unsigned int generic_ror32(unsigned int word, unsigned int shift)
> {
> 	return (word >> (shift & 31)) | (word << ((-shift) & 31));
> }
> 
> unsigned int zbb_opt_ror32(unsigned int word, unsigned int shift)
> {
> #ifdef __riscv
> 	__asm__ volatile("nop"); // ALTERNATIVE(nop)
> 
> 	__asm__ volatile(
> 		".option push\n"
> 		".option arch,+zbb\n"
> 		"rorw %0, %1, %2\n"
> 		".option pop\n"
> 		: "=r" (word) : "r" (word), "r" (shift) :);
> #endif
> 	return word;
> }
> 
> unsigned short generic_ror16(unsigned short word, unsigned int shift)
> {
> 	return (word >> (shift & 15)) | (word << ((-shift) & 15));
> }
> 
> unsigned short zbb_opt_ror16(unsigned short word, unsigned int shift)
> {
> 	unsigned int word32 = ((unsigned int)word << 16) | word;
> #ifdef __riscv
> 	__asm__ volatile("nop"); // ALTERNATIVE(nop)
> 
> 	__asm__ volatile(
> 		".option push\n"
> 		".option arch,+zbb\n"
> 		"rorw %0, %1, %2\n"
> 		".option pop\n"
> 		: "=r" (word32) : "r" (word32), "r" (shift) :);
> #endif
> 	return (unsigned short)word;
> }
> ```
> The disassembly obtained is:
> ```
> generic_ror32:
>     andi    a1,a1,31

The compiler shouldn't be generating that mask.
After all it knows the negated value doesn't need the same mask.
(I'd guess the cpu just ignores the high bits of the shift - most do.)

>     negw    a5,a1
>     sllw    a5,a0,a5
>     srlw    a0,a0,a1
>     or      a0,a5,a0
>     ret
> 
> zbb_opt_ror32:
>     nop
>     rorw a0, a0, a1
>     sext.w  a0,a0

Is that a sign extend?
Why is it there?
If it is related to the (broken) 'feature' of riscv-64 that 32bit results
are sign extended, why isn't there one in the example above.

You also need to consider the code for non-zbb cpu.

>     ret
> 
> generic_ror16:
>     andi    a1,a1,15
>     negw    a5,a1
>     andi    a5,a5,15
>     sllw    a5,a0,a5
>     srlw    a0,a0,a1
>     or      a0,a0,a5
>     slli    a0,a0,48
>     srli    a0,a0,48

The last two instructions mask the result with 0xffff.
If that is necessary it is missing from the zbb version below.

>     ret
> 
> zbb_opt_ror16:
>     slliw   a5,a0,16
>     addw    a5,a5,a0

At this point you can just do a 'shift right' on all cpu.
For rol16 you can do a variable shift left and a 16 bit
shift right on all cpu.
If the zbb version ends up with a nop (as below) then it is
likely to be much the same speed.

	David

>     nop
>     rorw a5, a5, a1
>     ret
> ```
> 
> Thanks,
> Pei


  reply	other threads:[~2025-06-30 17:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-20 11:16 [PATCH 0/2] Implementing bitops rotate using riscv Zbb extension cp0613
2025-06-20 11:16 ` [PATCH 1/2] bitops: generic rotate cp0613
2025-06-20 15:47   ` kernel test robot
2025-06-23 11:59   ` kernel test robot
2025-06-20 11:16 ` [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension cp0613
2025-06-20 16:20   ` Yury Norov
2025-06-25 16:02     ` David Laight
2025-06-28 12:08       ` cp0613
2025-06-29 10:38         ` David Laight
2025-06-30 12:14           ` cp0613
2025-06-30 17:35             ` David Laight [this message]
2025-07-01 13:01               ` cp0613
2025-06-28 11:13     ` cp0613
2025-06-29  1:48       ` Yury Norov
2025-06-30 12:04         ` cp0613
2025-06-30 16:53           ` Yury Norov
2025-07-01 12:47             ` cp0613
2025-07-01 18:32               ` Yury Norov
2025-07-02 10:11                 ` David Laight
2025-07-03 16:58                   ` Yury Norov
2025-07-02 12:30                 ` cp0613

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250630183534.160b9823@pumpkin \
    --to=david.laight.linux@gmail.com \
    --cc=alex@ghiti.fr \
    --cc=aou@eecs.berkeley.edu \
    --cc=arnd@arndb.de \
    --cc=cp0613@linux.alibaba.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox