From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Nick Desaulniers <ndesaulniers@google.com>,
Nathan Chancellor <nathan@kernel.org>,
Uros Bizjak <ubizjak@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, Peter Anvin <hpa@zytor.com>,
the arch/x86 maintainers <x86@kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: More annoying code generation by clang
Date: Sat, 6 Apr 2024 12:56:27 +0200 [thread overview]
Message-ID: <ZhEqW748nht2M4Si@gmail.com> (raw)
In-Reply-To: <CAHk-=whHWjKK1TOMT1XvxFj8e-_uctJnXPxM=SyWHmW63B_EDw@mail.gmail.com>
[ I've Cc:-ed a few more people who might be interested in this. ]
[ Copy of Linus's email below. ]
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> So this doesn't really matter in any real life situation, but it
> really grated on me.
>
> Clang has this nasty habit of taking our nice asm constraints, and
> turning them into worst-case garbage. It's been reported a couple of
> times where we use "g" to tell the compiler that pretty much any
> source to the asm works, and then clang takes that to mean "I will
> take that to use 'memory'" even when that makes no sense what-so-ever.
>
> See for example
>
> https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/
>
> where I was ranting about clang just doing pointlessly stupid things.
>
> However, I found a case where yes, clang does pointlessly stupid
> things, but it's at least _partly_ our fault, and gcc can't generate
> optimal code either.
>
> We have this fairly critical code in __fget_files_rcu() to look up a
> 'struct file *' from an fd, and it does this:
>
> /* Mask is a 0 for invalid fd's, ~0 for valid ones */
> nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
>
> and clang makes a *horrid* mess of it, generating this code:
>
> movl %edi, %r14d
> movq 32(%rbx), %rdx
> movl (%rdx), %eax
> movq %rax, 8(%rsp)
> cmpq 8(%rsp), %r14
> sbbq %rcx, %rcx
>
> which is just crazy. Notice how it does that "move rax to stack, then
> do the compare against the stack", instead of just using %rax.
>
> In fact, that function shouldn't have a stack frame at all, and the
> only reason it is generated is because of this whole oddity.
>
> All clang's fault, right?
>
> Yeah, mostly. But it turns out that what really messes with clangs
> little head is that the x86 array_index_mask_nospec() function is
> being a bit annoying.
>
> This is what we do:
>
> static __always_inline unsigned long
> array_index_mask_nospec(unsigned long index,
> unsigned long size)
> {
> unsigned long mask;
>
> asm volatile ("cmp %1,%2; sbb %0,%0;"
> :"=r" (mask)
> :"g"(size),"r" (index)
> :"cc");
> return mask;
> }
>
> and look at the use again:
>
> nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);
>
> here all the values are actually 'unsigned int'. So what happens is
> that clang can't just use the fdt->max_fds value *directly* from
> memory, because it needs to be expanded from 32-bit to 64-bit because
> we've made our array_index_mask_nospec() function only work on 64-bit
> 'unsigned long' values.
>
> So it turns out that by massaging this a bit, and making it just be a
> macro - so that the asm can decide that "I can do this in 32-bit" - I
> can get clang to generate much better code.
>
> Clang still absolutely hates the "g" constraint, so to get clang to
> really get this right I have to use "ir" instead of "g". Which is
> wrong. Because gcc does this right, and could use the memory op
> directly. But even gcc cannot do that with our *current* function,
> because of that "the memory value is 32-bit, we require a 64-bit
> value"
>
> Anyway, I can get gcc to generate the right code:
>
> movq 32(%r13), %rdx
> cmp (%rdx),%ebx
> sbb %esi,%esi
>
> which is basically the right code for the six crazy instructions clang
> generates. And if I make the "g" be "ir", I can get clang to generate
>
> movq 32(%rdi), %rcx
> movl (%rcx), %eax
> cmpl %eax, %esi
> sbbl %esi, %esi
>
> which is the same thing, but with that (pointless) load to a register.
>
> And now clang doesn't generate that stack frame at all.
>
> Anyway, this was a long email to explain the odd attached patch.
>
> Comments? Note that this patch is *entirely* untested, I have done
> this purely by looking at the code generation in fs/file.c.
>
> Linus
> arch/x86/include/asm/barrier.h | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 66e57c010392..6159d2cbbfde 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -33,20 +33,15 @@
> * Returns:
> * 0 - (index < size)
> */
> -static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
> - unsigned long size)
> -{
> - unsigned long mask;
> -
> - asm volatile ("cmp %1,%2; sbb %0,%0;"
> - :"=r" (mask)
> - :"g"(size),"r" (index)
> - :"cc");
> - return mask;
> -}
> -
> -/* Override the default implementation from linux/nospec.h. */
> -#define array_index_mask_nospec array_index_mask_nospec
> +#define array_index_mask_nospec(idx,sz) ({ \
> + typeof((idx)+(sz)) __idx = (idx); \
> + typeof(__idx) __sz = (sz); \
> + typeof(__idx) __mask; \
> + asm volatile ("cmp %1,%2; sbb %0,%0" \
> + :"=r" (__mask) \
> + :"ir"(__sz),"r" (__idx) \
> + :"cc"); \
> + __mask; })
>
> /* Prevent speculative execution past this barrier. */
> #define barrier_nospec() asm volatile("lfence":::"memory")
next prev parent reply other threads:[~2024-04-06 10:56 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
2024-04-06 10:56 ` Ingo Molnar [this message]
2024-04-06 12:30 ` Uros Bizjak
2024-04-06 15:39 ` Linus Torvalds
2024-04-06 16:04 ` Linus Torvalds
2024-04-08 8:49 ` Peter Zijlstra
2024-04-08 13:41 ` H. Peter Anvin
2024-04-08 18:32 ` Linus Torvalds
2024-04-08 19:42 ` Linus Torvalds
2024-04-09 10:45 ` Peter Zijlstra
2024-04-09 15:37 ` Nick Desaulniers
2024-04-10 19:23 ` Bill Wendling
2024-04-10 8:11 ` David Laight
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZhEqW748nht2M4Si@gmail.com \
--to=mingo@kernel.org \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=ubizjak@gmail.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.