Re: Linux 5.19-rc8 - Russell King (Oracle)

public inbox for linux-m68k@lists.linux-m68k.org

From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yury Norov <yury.norov@gmail.com>,
	Dennis Zhou <dennis@kernel.org>,
	Guenter Roeck <linux@roeck-us.net>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	linux-m68k@lists.linux-m68k.org
Subject: Re: Linux 5.19-rc8
Date: Tue, 26 Jul 2022 20:44:34 +0100
Message-ID: <YuBEIiLL1xZVyEFl@shell.armlinux.org.uk>
In-Reply-To: <CAHk-=wg2-j8zocUjurAeg_bimNz7C5h5HDEXKK6PxDmR+DaHRg@mail.gmail.com>

On Tue, Jul 26, 2022 at 11:36:21AM -0700, Linus Torvalds wrote:
> On Tue, Jul 26, 2022 at 11:18 AM Yury Norov <yury.norov@gmail.com> wrote:
> >
> > We have find_bit_benchmark to check how it works in practice. Would
> > be great if someone with access to the hardware can share numbers.
> 
> Honestly, I doubt benchmarking find_bit in a loop is all that sensible.

Yes, that's what I was thinking; I've never seen it crop up in any of
the perf traces I've looked at.

Nevertheless, here are some numbers from a single run of the
find_bit_benchmark module, with the kernel built with:
arm-linux-gnueabihf-gcc (Debian 10.2.1-6) 10.2.1 20210110

Current native implementation:

[   46.184565]
               Start testing find_bit() with random-filled bitmap
[   46.195127] find_next_bit:                 2440833 ns, 163112 iterations
[   46.204226] find_next_zero_bit:            2372128 ns, 164569 iterations
[   46.213152] find_last_bit:                 2199779 ns, 163112 iterations
[   46.299398] find_first_bit:               79526013 ns,  16234 iterations
[   46.684026] find_first_and_bit:          377912990 ns,  32617 iterations
[   46.692020] find_next_and_bit:             1269071 ns,  73562 iterations
[   46.698745]
               Start testing find_bit() with sparse bitmap
[   46.705711] find_next_bit:                  118652 ns,    656 iterations
[   46.716621] find_next_zero_bit:            4183472 ns, 327025 iterations
[   46.723395] find_last_bit:                   50448 ns,    656 iterations
[   46.762308] find_first_bit:               32190802 ns,    656 iterations
[   46.769093] find_first_and_bit:              52129 ns,      1 iterations
[   46.775882] find_next_and_bit:               62522 ns,      1 iterations

Generic implementation:

[   25.149238]
               Start testing find_bit() with random-filled bitmap
[   25.160002] find_next_bit:                 2640943 ns, 163537 iterations
[   25.169567] find_next_zero_bit:            2838485 ns, 164144 iterations
[   25.178595] find_last_bit:                 2302372 ns, 163538 iterations
[   25.204016] find_first_bit:               18697630 ns,  16373 iterations
[   25.602571] find_first_and_bit:          391841480 ns,  32555 iterations
[   25.610563] find_next_and_bit:             1260306 ns,  73587 iterations
[   25.617295]
               Start testing find_bit() with sparse bitmap
[   25.624222] find_next_bit:                   70289 ns,    656 iterations
[   25.636478] find_next_zero_bit:            5527050 ns, 327025 iterations
[   25.643253] find_last_bit:                   52147 ns,    656 iterations
[   25.657304] find_first_bit:                7328573 ns,    656 iterations
[   25.664087] find_first_and_bit:              48518 ns,      1 iterations
[   25.670871] find_next_and_bit:               59750 ns,      1 iterations

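Overall, I would say they are pretty similar (the generic version is
marginally better on some tests, the native version on others), with the
exception of find_first_bit(), which is roughly four times faster with
the generic implementation, and find_next_zero_bit(), which is
noticeably worse with it (about 20% slower per iteration on the
random-filled bitmap and about 30% slower on the sparse one).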
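To make "much better" and "noticeably worse" concrete, the totals above can be turned into a per-iteration cost by dividing each ns figure by its iteration count. This is only a minimal sketch: ns_per_iter() and native_over_generic() are hypothetical helper names, and the inputs are the figures from the single run quoted above, not new measurements.

```c
/* Hypothetical helpers (not part of find_bit_benchmark): convert the
 * totals printed by the benchmark into an average cost per iteration
 * and compare the native and generic builds. */

/* Average nanoseconds spent per iteration. */
static double ns_per_iter(double total_ns, double iterations)
{
	return total_ns / iterations;
}

/* Native per-iteration cost divided by generic per-iteration cost:
 * a result above 1.0 means the generic implementation was faster. */
static double native_over_generic(double native_ns, double native_iters,
				  double generic_ns, double generic_iters)
{
	return ns_per_iter(native_ns, native_iters) /
	       ns_per_iter(generic_ns, generic_iters);
}
```

For find_first_bit() on the random-filled bitmap, native_over_generic(79526013, 16234, 18697630, 16373) comes out at roughly 4.3. For find_next_zero_bit() on the sparse bitmap, native_over_generic(4183472, 327025, 5527050, 327025) is about 0.76, meaning the generic version took roughly 32% longer per iteration.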

So there's pretty much no difference of any relevance between them,
which may come as a surprise given the byte vs word access differences
between the two implementations.
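The byte vs word distinction can be illustrated with a minimal userspace sketch. This is not the kernel's code (neither the ARM assembly nor the generic C version). next_bit_bytewise() and next_bit_wordwise() are hypothetical names. Both assume a little-endian bitmap layout and GCC/Clang's __builtin_ctz()/__builtin_ctzl(). Like the kernel's find_next_bit(), both return size when no set bit is found at or after offset.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Byte-at-a-time scan: one byte load per step. Assumes little-endian,
 * so bitmap bit n lives in byte n / 8 at bit position n % 8. */
static size_t next_bit_bytewise(const unsigned long *addr, size_t size,
				size_t offset)
{
	const unsigned char *p = (const unsigned char *)addr;
	size_t bit = offset;

	while (bit < size) {
		/* Shift out the bits below 'bit' within this byte. */
		unsigned int b = p[bit / 8] >> (bit % 8);

		if (b) {
			size_t found = bit + (size_t)__builtin_ctz(b);
			return found < size ? found : size;
		}
		bit = (bit / 8 + 1) * 8;	/* advance to next byte boundary */
	}
	return size;
}

/* Word-at-a-time scan: one unsigned long load per step. */
static size_t next_bit_wordwise(const unsigned long *addr, size_t size,
				size_t offset)
{
	size_t bit = offset;

	while (bit < size) {
		/* Shift out the bits below 'bit' within this word. */
		unsigned long w = addr[bit / BITS_PER_LONG] >> (bit % BITS_PER_LONG);

		if (w) {
			size_t found = bit + (size_t)__builtin_ctzl(w);
			return found < size ? found : size;
		}
		bit = (bit / BITS_PER_LONG + 1) * BITS_PER_LONG;	/* next word */
	}
	return size;
}
```

Across a run of zero words, the byte-wise loop makes sizeof(unsigned long) times as many iterations as the word-wise one (four on 32-bit ARM). That is why word loads would be expected to win on paper.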

I suspect the reason may be that the native implementation's code is
smaller than the generic implementation's, which outweighs the effect of
accessing the bitmap by byte rather than by word. I would also suspect
that, because of its smaller size, the native version performs better
than the generic one in an I-cache-cold situation. Lastly, I suspect
that if we fixed the bug in the native version and converted it to use
word loads, it would probably be better than the generic version. I have
nothing to base that on other than gut feeling at the moment, but I can
make the changes to the native implementation and see what effect that
has, possibly tomorrow.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
