All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Arnd Bergmann" <arnd@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Alexander Usyskin" <alexander.usyskin@intel.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Mateusz Jończyk" <mat.jonczyk@o2.pl>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C
Date: Mon, 28 Apr 2025 08:58:31 +0200	[thread overview]
Message-ID: <aA8nF0moBYOIgC5J@gmail.com> (raw)
In-Reply-To: <CAHk-=wj0S2vWui0Y+1hpYMEhCiXKexbQ01h+Ckvww8hB29az_A@mail.gmail.com>


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, 27 Apr 2025 at 12:17, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >
> > ffs/fls are commonly found inside loops where x is the loop condition
> > too.  Therefore, using statically_true() to provide a form without the
> > zero compatibility turns out to be a win.
> 
> We already have the version without the zero capability - it's just
> called "__ffs()" and "__fls()", and performance-critical code uses
> those.
> 
> So fls/ffs are the "standard" library functions that have to handle
> zero, and add that stupid "+1" because that interface was designed by
> some Pascal person who doesn't understand that we start counting from
> 0.
> 
> Standards bodies: "companies aren't sending their best people".
> 
> But it's silly that we then spend effort on magic cmov in inline asm
> on those things when it's literally the "don't use this version unless
> you don't actually care about performance" case.
> 
> I don't think it would be wrong to just make the x86-32 code just do
> the check against zero ahead of time - in C.
> 
> And yes, that will generate some extra code - you'll test for zero
> before, and then the caller might also test for a zero result that
> then results in another test for zero that can't actually happen (but
> the compiler doesn't know that). But I suspect that on the whole, it
> is likely to generate better code anyway just because the compiler
> sees that first test and can DTRT.
> 
> UNTESTED patch applied in case somebody wants to play with this. It
> removes 10 lines of silly code, and along with them that 'cmov' use.
> 
> Anybody?

Makes sense - it seems to boot here, but I only did some very light 
testing.

There's a minor text size increase on x86-32 defconfig, GCC 14.2.0:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloatometer output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

Patch with changelog and your SOB added attached. Does it look good to 
you?

Thanks,

	Ingo

================>
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 28 Apr 2025 08:38:35 +0200
Subject: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C

Don't do the complicated and probably questionable BS*L+CMOVZL
asm() optimization in variable_ffs() and fls(): performance-critical
code is already using __ffs() and __fls() that use sane interfaces
close to the machine instruction ABI. Check ahead for zero in C.

There's a minor text size increase on x86-32 defconfig:

      text       data        bss         dec        hex    filename
  16577728    7598826    1744896    25921450    18b87aa    vmlinux.before
  16577908    7598838    1744896    25921642    18b886a    vmlinux.after

bloatometer output:

  add/remove: 2/1 grow/shrink: 201/189 up/down: 5681/-3486 (2195)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/bitops.h | 22 ++++++----------------
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 100413aff640..6061c87f14ac 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -321,15 +321,10 @@ static __always_inline int variable_ffs(int x)
 	asm("bsfl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-	asm("bsfl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=&r" (r) : "rm" (x), "r" (-1));
 #else
-	asm("bsfl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
+	if (!x)
+		return 0;
+	asm("bsfl %1,%0" : "=r" (r) : "rm" (x));
 #endif
 	return r + 1;
 }
@@ -378,15 +373,10 @@ static __always_inline int fls(unsigned int x)
 	asm("bsrl %1,%0"
 	    : "=r" (r)
 	    : ASM_INPUT_RM (x), "0" (-1));
-#elif defined(CONFIG_X86_CMOV)
-	asm("bsrl %1,%0\n\t"
-	    "cmovzl %2,%0"
-	    : "=&r" (r) : "rm" (x), "rm" (-1));
 #else
-	asm("bsrl %1,%0\n\t"
-	    "jnz 1f\n\t"
-	    "movl $-1,%0\n"
-	    "1:" : "=r" (r) : "rm" (x));
+	if (!x)
+		return 0;
+	asm("bsrl %1,%0" : "=r" (r) : "rm" (x));
 #endif
 	return r + 1;
 }


  parent reply	other threads:[~2025-04-28  6:58 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-25 14:15 [PATCH] [RFC] x86/cpu: rework instruction set selection Arnd Bergmann
2025-04-25 15:34 ` H. Peter Anvin
2025-04-25 16:13   ` Arnd Bergmann
2025-04-25 20:15     ` H. Peter Anvin
2025-04-26  9:08 ` Ingo Molnar
2025-04-26 13:17   ` H. Peter Anvin
2025-04-26 18:55     ` Ingo Molnar
2025-04-27  0:35       ` H. Peter Anvin
2025-04-26 18:58   ` Arnd Bergmann
2025-04-26 19:09     ` Ingo Molnar
2025-04-27 13:24       ` Arnd Bergmann
2025-04-27 21:17         ` H. Peter Anvin
2025-04-26 19:24     ` Linus Torvalds
2025-04-26 19:55       ` Linus Torvalds
2025-04-26 23:47         ` H. Peter Anvin
2025-04-27 10:18           ` Ingo Molnar
2025-04-27  0:02         ` H. Peter Anvin
2025-04-27 19:17         ` Andrew Cooper
2025-04-27 19:34           ` Linus Torvalds
2025-04-27 21:14             ` H. Peter Anvin
2025-04-28  6:58             ` Ingo Molnar [this message]
2025-04-28  7:05               ` [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C Ingo Molnar
2025-04-28  7:14                 ` Ingo Molnar
2025-04-28 12:30                   ` Arnd Bergmann
2025-04-28 13:41                   ` H. Peter Anvin
2025-04-28 16:23                   ` Linus Torvalds
2025-04-29 10:08                     ` Ingo Molnar
2025-04-29 14:32                       ` H. Peter Anvin
2025-04-28 16:14                 ` Linus Torvalds
2025-04-28 21:38                   ` H. Peter Anvin
2025-04-29  0:12                     ` Andrew Cooper
2025-04-29  2:00                       ` H. Peter Anvin
2025-04-29  2:22                         ` Linus Torvalds
2025-04-29  2:25                         ` Andrew Cooper
2025-04-29  3:13                           ` H. Peter Anvin
2025-04-29 14:38                             ` Andrew Cooper
2025-04-29 18:05                               ` Linus Torvalds
2025-04-29 19:13                                 ` Andrew Cooper
2025-04-29 20:12                                   ` Linus Torvalds
2025-04-29 21:23                                     ` H. Peter Anvin
2025-04-29 21:53                                       ` Linus Torvalds
2025-04-29 21:59                                         ` Andrew Cooper
2025-04-29 22:04                                           ` Linus Torvalds
2025-04-29 22:10                                             ` H. Peter Anvin
2025-04-29 22:22                                             ` Andrew Cooper
2025-04-29 22:34                                               ` Linus Torvalds
2025-04-27  9:50       ` [PATCH] [RFC] x86/cpu: rework instruction set selection Ingo Molnar
2025-04-30 21:54       ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aA8nF0moBYOIgC5J@gmail.com \
    --to=mingo@kernel.org \
    --cc=alexander.usyskin@intel.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ardb@kernel.org \
    --cc=arnd@arndb.de \
    --cc=arnd@kernel.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mat.jonczyk@o2.pl \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.