Re: [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension

linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Yury Norov <yury.norov@gmail.com>
To: David Laight <david.laight.linux@gmail.com>
Cc: cp0613@linux.alibaba.com, alex@ghiti.fr, aou@eecs.berkeley.edu,
	arnd@arndb.de, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
	linux@rasmusvillemoes.dk, palmer@dabbelt.com,
	paul.walmsley@sifive.com
Subject: Re: [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension
Date: Thu, 3 Jul 2025 12:58:28 -0400	[thread overview]
Message-ID: <aGa2tKXJutz_Gvsi@yury> (raw)
In-Reply-To: <20250702111135.37854d1b@pumpkin>

On Wed, Jul 02, 2025 at 11:11:35AM +0100, David Laight wrote:
> On Tue, 1 Jul 2025 14:32:14 -0400
> Yury Norov <yury.norov@gmail.com> wrote:
> 
> I'd not worry about rotates of 8 bits or more (for ror8).
> They can be treated as 'undefined behaviour' under the assumption they don't happen.

Good for you. But generic implementation is safe against overflowing
the shift, so the arch must be safe as well.

> The 'generic' version needs them to get gcc to generate a 'rorb' on x86.
> The negated shift needs masking so that clang doesn't throw the code away when
> the value is constant.

...

> > > I compared the performance of ror8 (zbb optimized) and generic_ror8 on the XUANTIE C908
> > > by looping them. ror8 is better, and the advantage of ror8 becomes more obvious as the
> > > number of iterations increases. The test code is as follows:
> > > ```
> > > 	u8 word = 0x5a;
> > > 	u32 shift = 9;
> > > 	u32 i, loop = 100;
> > > 	u8 ret1, ret2;
> > > 
> > > 	u64 t1 = ktime_get_ns();
> > > 	for (i = 0; i < loop; i++) {
> > > 		ret2 = generic_ror8(word, shift);
> > > 	}
> > > 	u64 t2 = ktime_get_ns();
> > > 	for (i = 0; i < loop; i++) {
> > > 		ret1 = ror8(word, shift);
> > > 	}
> > > 	u64 t3 = ktime_get_ns();
> > > 
> > > 	pr_info("t2-t1=%lld t3-t2=%lld\n", t2 - t1, t3 - t2);
> > > ```  
> > 
> > Please do the following:
> > 
> > 1. Drop the generic_ror8() and keep only ror/l8()
> > 2. Add ror/l16, 34 and 64 tests.
> > 3. Adjust the 'loop' so that each subtest will take 1-10 ms on your hw.
> 
> That is far too many iterations.
> You'll get interrupts dominating the tests.

That's interesting observation. Can you show numbers for your
hardware?

> The best thing is to do 'just enough' iterations to get a meaningful result,
> and then repeat a few times and report the fastest (or average excluding
> any large outliers).
> 
> You also need to ensure the compiler doesn't (or isn't allowed to) pull
> the contents of the inlined function outside the loop - and then throw
> the loop away,

Not me - Chen Pei needs. I wrote __always_used for it. It should
help.
 
> The other question is whether any of it is worth the effort.
> How many ror8() and ror16() calls are there?
> I suspect not many.

I'm not a RISC-V engineer, and I can't judge how they want to use the
functions. This doesn't bring significant extra burden on generic
side, so I don't object against arch ror8() on RISCs.

> Improving the generic ones might be worth while.
> Perhaps moving the current versions to x86 only.
> (I suspect the only other cpu with byte/short rotates is m68k)
> 
> 	David

next prev parent reply	other threads:[~2025-07-03 16:58 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-20 11:16 [PATCH 0/2] Implementing bitops rotate using riscv Zbb extension cp0613
2025-06-20 11:16 ` [PATCH 1/2] bitops: generic rotate cp0613
2025-06-20 15:47   ` kernel test robot
2025-06-23 11:59   ` kernel test robot
2025-06-20 11:16 ` [PATCH 2/2] bitops: rotate: Add riscv implementation using Zbb extension cp0613
2025-06-20 16:20   ` Yury Norov
2025-06-25 16:02     ` David Laight
2025-06-28 12:08       ` cp0613
2025-06-29 10:38         ` David Laight
2025-06-30 12:14           ` cp0613
2025-06-30 17:35             ` David Laight
2025-07-01 13:01               ` cp0613
2025-06-28 11:13     ` cp0613
2025-06-29  1:48       ` Yury Norov
2025-06-30 12:04         ` cp0613
2025-06-30 16:53           ` Yury Norov
2025-07-01 12:47             ` cp0613
2025-07-01 18:32               ` Yury Norov
2025-07-02 10:11                 ` David Laight
2025-07-03 16:58                   ` Yury Norov [this message]
2025-07-02 12:30                 ` cp0613

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aGa2tKXJutz_Gvsi@yury \
    --to=yury.norov@gmail.com \
    --cc=alex@ghiti.fr \
    --cc=aou@eecs.berkeley.edu \
    --cc=arnd@arndb.de \
    --cc=cp0613@linux.alibaba.com \
    --cc=david.laight.linux@gmail.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).