public inbox for linux-kernel@vger.kernel.org
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
	"Ingo Molnar" <mingo@kernel.org>, "Arnd Bergmann" <arnd@arndb.de>,
	"Arnd Bergmann" <arnd@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	x86@kernel.org, "Juergen Gross" <jgross@suse.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Alexander Usyskin" <alexander.usyskin@intel.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Mateusz Jończyk" <mat.jonczyk@o2.pl>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Ard Biesheuvel" <ardb@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case handling to C
Date: Tue, 29 Apr 2025 20:13:37 +0100
Message-ID: <81ed8b53-1a40-4777-ab87-4f4abe032dbc@citrix.com>
In-Reply-To: <CAHk-=wig1E4B-e1_6=it1LfVQ64DJsVgO6f6Ytnbzm2qChbAdw@mail.gmail.com>

On 29/04/2025 7:05 pm, Linus Torvalds wrote:
> On Tue, 29 Apr 2025 at 07:38, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> I tried that.  (The thread started as a question around
>> __builtin_constant_p() but did grow to cover __builtin_ffs().)
> Maybe we could do something like
>
>    #define ffs(x) \
>         (statically_true((x) != 0) ? __ffs(x)+1 : __builtin_ffs(x))
>
> which uses our "statically_true()" helper that is actually fairly good
> at the whole "let the compiler tell us that it knows that value cannot
> be zero"
>
> I didn't check what code that generated, but I've seen gcc do well on
> that statically_true() thing in the past.
>
> Then we can just remove our current variable_ffs() thing entirely,
> because we now depend on our (good) __ffs() and the builtin being
> "good enough" for the bad case.

That would improve code generation for 32-bit, but would generally regress 64-bit.

Preloading the destination register with -1 is better than the CMOV form
emitted for __builtin_ffs(); BSF's habit of conditionally not writing the
destination register *is* a CMOV of sorts.


When I cleaned this up in Xen, there were several areas where I
thought improvements could be made.

Having both ffs() and __ffs(), where the latter is undefined in a common
case, is a trap waiting for an unwary programmer.  I have no particular
love for ffs() being off-by-one from normal, but it is well defined for
all inputs.

Also, leaving the constant folding to each arch-optimised implementation
means that it often gets forgotten.  Therefore, I rearranged everything
so that this is common to all architectures:

static always_inline attr_const unsigned int ffs(unsigned int x)
{
    if ( __builtin_constant_p(x) )
        return __builtin_ffs(x);

#ifdef arch_ffs
    return arch_ffs(x);
#else
    return generic_ffsl(x);
#endif
}

with most architectures implementing arch_ffs as:

#define arch_ffs(x) ((x) ? 1 + __builtin_ctz(x) : 0)

and x86 as:

static always_inline unsigned int arch_ffs(unsigned int x)
{
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /*
         * A common code pattern is:
         *
         *     while ( bits )
         *     {
         *         bit = ffs(bits);
         *         ...
         *
         * and the optimiser really can work with the knowledge of x being
         * non-zero without knowing its exact value, in which case we don't
         * need to compensate for BSF's corner cases.  Otherwise...
         */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x) );
    }
    else
    {
        /*
         * ... the AMD manual states that BSF won't modify the destination
         * register if x=0.  The Intel manual states that the result is
         * undefined, but the architects have said that the register is
         * written back with its old value (zero extended as normal).
         */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
}
#define arch_ffs arch_ffs

and finally, providing compatibility for the other forms as:

#define __ffs(x) (ffs(x) - 1)


The end result is fewer APIs to implement in arch-specific code, and the
removal of undefined behaviour.

That said, I don't envy anyone wanting to try and untangle this in
Linux, even if consensus were to agree on it as an approach.

~Andrew
