public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Peter Anvin <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: More annoying code generation by clang
Date: Sat, 6 Apr 2024 12:56:27 +0200	[thread overview]
Message-ID: <ZhEqW748nht2M4Si@gmail.com> (raw)
In-Reply-To: <CAHk-=whHWjKK1TOMT1XvxFj8e-_uctJnXPxM=SyWHmW63B_EDw@mail.gmail.com>


[ I've Cc:-ed a few more people who might be interested in this. ]
[ Copy of Linus's email below. ]

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> So this doesn't really matter in any real life situation, but it
> really grated on me.
> 
> Clang has this nasty habit of taking our nice asm constraints, and
> turning them into worst-case garbage. It's been reported a couple of
> times where we use "g" to tell the compiler that pretty much any
> source to the asm works, and then clang takes that to mean "I will
> take that to use 'memory'" even when that makes no sense what-so-ever.
> 
> See for example
> 
>     https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
> 
> where I was ranting about clang just doing pointlessly stupid things.
> 
> However, I found a case where yes, clang does pointlessly stupid
> things, but it's at least _partly_ our fault, and gcc can't generate
> optimal code either.
> 
> We have this fairly critical code in __fget_files_rcu() to look up a
> 'struct file *' from an fd, and it does this:
> 
>                 /* Mask is a 0 for invalid fd's, ~0 for valid ones */
>                 nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> and clang makes a *horrid* mess of it, generating this code:
> 
>         movl    %edi, %r14d
>         movq    32(%rbx), %rdx
>         movl    (%rdx), %eax
>         movq    %rax, 8(%rsp)
>         cmpq    8(%rsp), %r14
>         sbbq    %rcx, %rcx
> 
> which is just crazy. Notice how it does that "move rax to stack, then
> do the compare against the stack", instead of just using %rax.
> 
> In fact, that function shouldn't have a stack frame at all, and the
> only reason it is generated is because of this whole oddity.
> 
> All clang's fault, right?
> 
> Yeah, mostly. But it turns out that what really messes with clangs
> little head is that the x86 array_index_mask_nospec() function is
> being a bit annoying.
> 
> This is what we do:
> 
>   static __always_inline unsigned long
> array_index_mask_nospec(unsigned long index,
>                 unsigned long size)
>   {
>         unsigned long mask;
> 
>         asm volatile ("cmp %1,%2; sbb %0,%0;"
>                         :"=r" (mask)
>                         :"g"(size),"r" (index)
>                         :"cc");
>         return mask;
>   }
> 
> and look at the use again:
> 
>         nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
> 
> here all the values are actually 'unsigned int'. So what happens is
> that clang can't just use the fdt->max_fds value *directly* from
> memory, because it needs to be expanded from 32-bit to 64-bit because
> we've made our array_index_mask_nospec() function only work on 64-bit
> 'unsigned long' values.
> 
> So it turns out that by massaging this a bit, and making it just be a
> macro - so that the asm can decide that "I can do this in 32-bit" - I
> can get clang to generate much better code.
> 
> Clang still absolutely hates the "g" constraint, so to get clang to
> really get this right I have to use "ir" instead of "g". Which is
> wrong. Because gcc does this right, and could use the memory op
> directly. But even gcc cannot do that with our *current* function,
> because of that "the memory value is 32-bit, we require a 64-bit
> value"
> 
> Anyway, I can get gcc to generate the right code:
> 
>         movq    32(%r13), %rdx
>         cmp (%rdx),%ebx
>         sbb %esi,%esi
> 
> which is basically the right code for the six crazy instructions clang
> generates. And if I make the "g" be "ir", I can get clang to generate
> 
>         movq    32(%rdi), %rcx
>         movl    (%rcx), %eax
>         cmpl    %eax, %esi
>         sbbl    %esi, %esi
> 
> which is the same thing, but with that (pointless) load to a register.
> 
> And now clang doesn't generate that stack frame at all.
> 
> Anyway, this was a long email to explain the odd attached patch.
> 
> Comments? Note that this patch is *entirely* untested, I have done
> this purely by looking at the code generation in fs/file.c.
> 
>                 Linus

>  arch/x86/include/asm/barrier.h | 23 +++++++++--------------
>  1 file changed, 9 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
>   * Returns:
>   *     0 - (index < size)
>   */
> -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> -		unsigned long size)
> -{
> -	unsigned long mask;
> -
> -	asm volatile ("cmp %1,%2; sbb %0,%0;"
> -			:"=r" (mask)
> -			:"g"(size),"r" (index)
> -			:"cc");
> -	return mask;
> -}
> -
> -/* Override the default implementation from linux/nospec.h. */
> -#define array_index_mask_nospec array_index_mask_nospec
> +#define array_index_mask_nospec(idx,sz) ({	\
> +	typeof((idx)+(sz)) __idx = (idx);	\
> +	typeof(__idx) __sz = (sz);		\
> +	typeof(__idx) __mask;			\
> +	asm volatile ("cmp %1,%2; sbb %0,%0"	\
> +			:"=r" (__mask)		\
> +			:"ir"(__sz),"r" (__idx)	\
> +			:"cc");			\
> +	__mask; })
>  
>  /* Prevent speculative execution past this barrier. */
>  #define barrier_nospec() asm volatile("lfence":::"memory")


  reply	other threads:[~2024-04-06 10:56 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
2024-04-06 10:56 ` Ingo Molnar [this message]
2024-04-06 12:30   ` Uros Bizjak
2024-04-06 15:39     ` Linus Torvalds
2024-04-06 16:04       ` Linus Torvalds
2024-04-08  8:49 ` Peter Zijlstra
2024-04-08 13:41   ` H. Peter Anvin
2024-04-08 18:32   ` Linus Torvalds
2024-04-08 19:42     ` Linus Torvalds
2024-04-09 10:45       ` Peter Zijlstra
2024-04-09 15:37         ` Nick Desaulniers
2024-04-10 19:23           ` Bill Wendling
2024-04-10  8:11       ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhEqW748nht2M4Si@gmail.com \
    --to=mingo@kernel.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=ubizjak@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox