public inbox for linux-kernel@vger.kernel.org
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Arnd Bergmann" <arnd@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Alexander Usyskin" <alexander.usyskin@intel.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Mateusz Jończyk" <mat.jonczyk@o2.pl>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C
Date: Mon, 28 Apr 2025 08:58:31 +0200
Message-ID: <aA8nF0moBYOIgC5J@gmail.com>
In-Reply-To: <CAHk-=wj0S2vWui0Y+1hpYMEhCiXKexbQ01h+Ckvww8hB29az_A@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 27 Apr 2025 at 12:17, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > ffs/fls are commonly found inside loops where x is the loop condition
> > too.  Therefore, using statically_true() to provide a form without the
> > zero compatibility turns out to be a win.
> 
> We already have the version without the zero capability - it's just
> called "__ffs()" and "__fls()", and performance-critical code uses
> those.
> 
> So fls/ffs are the "standard" library functions that have to handle
> zero, and add that stupid "+1" because that interface was designed by
> some Pascal person who doesn't understand that we start counting from
> 0.
> 
> Standards bodies: "companies aren't sending their best people".
> 
> But it's silly that we then spend effort on magic cmov in inline asm
> on those things when it's literally the "don't use this version unless
> you don't actually care about performance" case.
> 
> I don't think it would be wrong to just make the x86-32 code just do
> the check against zero ahead of time - in C.
> 
> And yes, that will generate some extra code - you'll test for zero
> before, and then the caller might also test for a zero result that
> then results in another test for zero that can't actually happen (but
> the compiler doesn't know that). But I suspect that on the whole, it
> is likely to generate better code anyway just because the compiler
> sees that first test and can DTRT.
> 
> UNTESTED patch applied in case somebody wants to play with this. It
> removes 10 lines of silly code, and along with them that 'cmov' use.
> 
> Anybody?

Makes sense - it seems to boot here, but I only did some very light 
testing.
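
For anyone following along, the two interface families being discussed 
can be sketched in plain user-space C. This is only an illustration 
under stated assumptions, not the kernel implementation: the sketch_*() 
names are made up, it relies on the GCC/Clang __builtin_ctz() and 
__builtin_clz() builtins instead of BSF/BSR inline asm, and it uses 
unsigned int throughout, whereas the kernel's __ffs()/__fls() operate 
on unsigned long:

```c
#include <limits.h>

/*
 * Hypothetical helper names; a user-space sketch only. GCC/Clang
 * builtins stand in for the BSF/BSR inline asm used by the kernel.
 */

/* 0-based index of the lowest set bit. Undefined for x == 0. */
static inline unsigned int sketch___ffs(unsigned int x)
{
	return (unsigned int)__builtin_ctz(x);
}

/* 0-based index of the highest set bit. Undefined for x == 0. */
static inline unsigned int sketch___fls(unsigned int x)
{
	unsigned int top = (unsigned int)(sizeof(x) * CHAR_BIT) - 1;

	return top - (unsigned int)__builtin_clz(x);
}

/* 1-based "library" variant: returns 0 when no bit is set. */
static inline int sketch_ffs(int x)
{
	if (!x)		/* zero check done ahead of time, in C */
		return 0;
	return (int)sketch___ffs((unsigned int)x) + 1;
}

/* 1-based "library" variant: returns 0 when no bit is set. */
static inline int sketch_fls(unsigned int x)
{
	if (!x)		/* zero check done ahead of time, in C */
		return 0;
	return (int)sketch___fls(x) + 1;
}
```

Because the zero test is ordinary C, a caller that has already 
established x != 0 (say, as its loop condition) gives the compiler a 
chance to drop the redundant branch after inlining, which is harder 
when the zero handling is hidden inside an asm() statement.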

There's a minor text size increase (+180 bytes) on x86-32 defconfig, 
GCC 14.2.0:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloat-o-meter output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

The patch is attached, with a changelog and your SOB added. Does it 
look good to you?

Thanks,

	Ingo

================>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 28 Apr 2025 08:38:35 +0200
Subject: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C

Don't do the complicated and probably questionable BSFL/BSRL + CMOVZL
asm() optimization (or its JNZ + MOVL fallback) in variable_ffs() and
fls(): performance-critical code already uses __ffs() and __fls(),
which have sane interfaces close to the machine instructions. Check
for zero ahead of time in C instead.

There's a minor text size increase on x86-32 defconfig:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloat-o-meter output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/bitops.h | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 100413aff640..6061c87f14ac 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -321,15 +321,10 @@ static __always_inline int variable_ffs(int x)
 	asm("bsfl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-	asm("bsfl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=&r" (r) : "rm" (x), "r" (-1));
 #else
-	asm("bsfl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
+	if (!x)
+		return 0;
+	asm("bsfl %1,%0" : "=r" (r) : "rm" (x));
 #endif
 	return r + 1;
 }
@@ -378,15 +373,10 @@ static __always_inline int fls(unsigned int x)
 	asm("bsrl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-	asm("bsrl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=&r" (r) : "rm" (x), "rm" (-1));
 #else
-	asm("bsrl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
+	if (!x)
+		return 0;
+	asm("bsrl %1,%0" : "=r" (r) : "rm" (x));
 #endif
 	return r + 1;
 }
