public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* More annoying code generation by clang
@ 2024-04-04 22:53 Linus Torvalds
  2024-04-06 10:56 ` Ingo Molnar
  2024-04-08  8:49 ` Peter Zijlstra
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2024-04-04 22:53 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner, Peter Anvin
  Cc: the arch/x86 maintainers, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 3725 bytes --]

So this doesn't really matter in any real life situation, but it
really grated on me.

Clang has this nasty habit of taking our nice asm constraints, and
turning them into worst-case garbage. It's been reported a couple of
times where we use "g" to tell the compiler that pretty much any
source to the asm works, and then clang takes that to mean "I will
take that to use 'memory'" even when that makes no sense what-so-ever.

See for example

    https://lore.kernel.org/all/CAHk-=wgobnShg4c2yyMbk2p=U-wmnOmX_0=b3ZY_479Jjey2xw@mail.gmail.com/

where I was ranting about clang just doing pointlessly stupid things.

However, I found a case where yes, clang does pointlessly stupid
things, but it's at least _partly_ our fault, and gcc can't generate
optimal code either.

We have this fairly critical code in __fget_files_rcu() to look up a
'struct file *' from an fd, and it does this:

                /* Mask is a 0 for invalid fd's, ~0 for valid ones */
                nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

and clang makes a *horrid* mess of it, generating this code:

        movl    %edi, %r14d
        movq    32(%rbx), %rdx
        movl    (%rdx), %eax
        movq    %rax, 8(%rsp)
        cmpq    8(%rsp), %r14
        sbbq    %rcx, %rcx

which is just crazy. Notice how it does that "move rax to stack, then
do the compare against the stack", instead of just using %rax.

In fact, that function shouldn't have a stack frame at all, and the
only reason it is generated is because of this whole oddity.

All clang's fault, right?

Yeah, mostly. But it turns out that what really messes with clangs
little head is that the x86 array_index_mask_nospec() function is
being a bit annoying.

This is what we do:

  static __always_inline unsigned long
array_index_mask_nospec(unsigned long index,
                unsigned long size)
  {
        unsigned long mask;

        asm volatile ("cmp %1,%2; sbb %0,%0;"
                        :"=r" (mask)
                        :"g"(size),"r" (index)
                        :"cc");
        return mask;
  }

and look at the use again:

        nospec_mask = array_index_mask_nospec(fd, fdt->max_fds);

here all the values are actually 'unsigned int'. So what happens is
that clang can't just use the fdt->max_fds value *directly* from
memory, because it needs to be expanded from 32-bit to 64-bit because
we've made our array_index_mask_nospec() function only work on 64-bit
'unsigned long' values.

So it turns out that by massaging this a bit, and making it just be a
macro - so that the asm can decide that "I can do this in 32-bit" - I
can get clang to generate much better code.

Clang still absolutely hates the "g" constraint, so to get clang to
really get this right I have to use "ir" instead of "g". Which is
wrong. Because gcc does this right, and could use the memory op
directly. But even gcc cannot do that with our *current* function,
because of that "the memory value is 32-bit, we require a 64-bit
value"

Anyway, I can get gcc to generate the right code:

        movq    32(%r13), %rdx
        cmp (%rdx),%ebx
        sbb %esi,%esi

which is basically the right code for the six crazy instructions clang
generates. And if I make the "g" be "ir", I can get clang to generate

        movq    32(%rdi), %rcx
        movl    (%rcx), %eax
        cmpl    %eax, %esi
        sbbl    %esi, %esi

which is the same thing, but with that (pointless) load to a register.

And now clang doesn't generate that stack frame at all.

Anyway, this was a long email to explain the odd attached patch.

Comments? Note that this patch is *entirely* untested, I have done
this purely by looking at the code generation in fs/file.c.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 1122 bytes --]

 arch/x86/include/asm/barrier.h | 23 +++++++++--------------
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 66e57c010392..6159d2cbbfde 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -33,20 +33,15 @@
  * Returns:
  *     0 - (index < size)
  */
-static __always_inline unsigned long array_index_mask_nospec(unsigned long index,
-		unsigned long size)
-{
-	unsigned long mask;
-
-	asm volatile ("cmp %1,%2; sbb %0,%0;"
-			:"=r" (mask)
-			:"g"(size),"r" (index)
-			:"cc");
-	return mask;
-}
-
-/* Override the default implementation from linux/nospec.h. */
-#define array_index_mask_nospec array_index_mask_nospec
+#define array_index_mask_nospec(idx,sz) ({	\
+	typeof((idx)+(sz)) __idx = (idx);	\
+	typeof(__idx) __sz = (sz);		\
+	typeof(__idx) __mask;			\
+	asm volatile ("cmp %1,%2; sbb %0,%0"	\
+			:"=r" (__mask)		\
+			:"ir"(__sz),"r" (__idx)	\
+			:"cc");			\
+	__mask; })
 
 /* Prevent speculative execution past this barrier. */
 #define barrier_nospec() asm volatile("lfence":::"memory")

^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-10 19:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-04 22:53 More annoying code generation by clang Linus Torvalds
2024-04-06 10:56 ` Ingo Molnar
2024-04-06 12:30   ` Uros Bizjak
2024-04-06 15:39     ` Linus Torvalds
2024-04-06 16:04       ` Linus Torvalds
2024-04-08  8:49 ` Peter Zijlstra
2024-04-08 13:41   ` H. Peter Anvin
2024-04-08 18:32   ` Linus Torvalds
2024-04-08 19:42     ` Linus Torvalds
2024-04-09 10:45       ` Peter Zijlstra
2024-04-09 15:37         ` Nick Desaulniers
2024-04-10 19:23           ` Bill Wendling
2024-04-10  8:11       ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox