public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* clang and drm issue: objtool warnings from clang build
@ 2025-04-26 17:42 Linus Torvalds
  2025-04-26 17:52 ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2025-04-26 17:42 UTC (permalink / raw)
  To: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt
  Cc: the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List

So with clang, I get these drm build warnings

  drivers/gpu/drm/amd/amdgpu/../display/dc/basics/fixpt31_32.o:
     warning: objtool: dc_fixpt_recip() falls through to next function
dc_fixpt_sinc()

  drivers/gpu/drm/amd/amdgpu/../display/dc/sspl/spl_fixpt31_32.o:
     warning: objtool: spl_fixpt_recip() falls through to next
function spl_fixpt_sinc()

and the warnings seem real. I ignored them because it wasn't entirely
obvious what was going on and I was looking at other things, but today
I looked at why these happen.

What is going on is that the *_fixpt_recip() function has a

        SPL_ASSERT(arg.value);

which results in basically a test for 'arg.value' being zero and a WARN_ON()

Then - through inlining - we get from *_fixpt_recip() to

  spl_fixpt_from_fraction ->
    spl_complete_integer_division_u64()

here we have *another* check for the divisor not being zero:

        SPL_ASSERT(divisor);

and then it goes on to inline the code:

      spl_div64_u64_rem() ->
        div64_u64_rem()

which does

        *remainder = dividend % divisor;
        return dividend / divisor;

so now what has happened is that clang sees that when it inlines those
things, it basically has a path with two warnings, following a divide
by a value known to be the constant zero.

And that makes clang just stop generating any code at all, and the asm
looks like this:

spl_fixpt_recip:                        # @spl_fixpt_recip
# %bb.0:
        callq   __fentry__
        testq   %rdi, %rdi
        je      .LBB3_10
  ...
.LBB3_10:
        #APP
     .. disgusting unreadable for WARN_ON() on line 199 ..
        #NO_APP
        #APP
     .. disgusting unreadable for WARN_ON() on line 32 ..
        #NO_APP
.Lfunc_end3:

because clang has decided - correctly - that this path now divides by
zero. Notice how it just falls off at .Lfunc_end3 with no actual
divide, and no other sign of "you have failed at life".

However, this is actually problematic for the kernel: yes, the warning
hopefully helps show what is wrong, but because clang has now
generated code that basically jumps to random code, you get random
security issues if the warnign ever triggers (and there are
configurations where the warning is just disabled).

So there are two problems here:

 - the drm code should not "assert" crap. Do proper error handling, or
don't. But don't have a random "this is a known bad value" and then
fall back to using that value anyway.

We had something similar some time ago, where there was a drm
assertion without error handling, which caused the compiler to see
that there was a static path where the invalid value was used, and
then caused other problems. I forget the details, and gmail search
isn't helping me

But I *really* think that clang silently just generating known bad
code for invalida operations like this is very very dangerous, and is
a bug in clang.

So I'm cc'ing drm and clang people (and x86 people) in the hope that
we can fix both of these things.

Can we *please* get a flag for clang that it doesn't just stop
generating code because it has decided some path is unreachable or
undefined? Add a TRAP instruction, for Chrissake! PLEASE!

 Maybe such a flag already exists, and the kernel just doesn't know
about it. This whole "do random things for undefined behavior" is a
bug, dammit.

And yes, the drm code shouldn't be doing this. Adding warnings for
things that will oops is actually counter-productive. It generates
extra code, but this is an example of it actually causing *more*
problems. We'd have been better off with a nice clean divide-by-zero
oops than with a warning followed by random behavior.

               Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 17:42 clang and drm issue: objtool warnings from clang build Linus Torvalds
@ 2025-04-26 17:52 ` Linus Torvalds
  2025-04-26 20:05   ` Nathan Chancellor
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2025-04-26 17:52 UTC (permalink / raw)
  To: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nathan Chancellor, Nick Desaulniers, Bill Wendling, Justin Stitt
  Cc: the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List

On Sat, 26 Apr 2025 at 10:42, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> We had something similar some time ago, where there was a drm
> assertion without error handling, which caused the compiler to see
> that there was a static path where the invalid value was used, and
> then caused other problems. I forget the details, and gmail search
> isn't helping me

My dim memories came back and helped me with the right search terms,
and this is what I was talking about:

   https://lore.kernel.org/all/CAHk-=wg4ETks+pGUco4gDrRxT+1UBbFGQtpOqSxLSzvVAWpm5w@mail.gmail.com/

different compiler, very different results, but same kind of issue:
warning about an error case without actually *dealing* with the error,
which results in the compiler seeing a static code path from the
warning to an invalid situation, and causing odd problems.

Please people: "ASSERT()" like behavior is simply not acceptable in
the kernel. WARN_ON() and friends need to either be otherwise benign
(ie "warn but then continue to do valid things") or they need to be
*handled* (ie "warn and then refuse to do things that aren't valid").

Just warning and then doing random crap is not sane. If you aren't
capable of dealing with the situation, don't do the bogus test. Just
warning about it isn't fixing the code, and can make things actively
worse as in these two examples.

But I do think that clang needs to stop doing that "make things
actively worse" part. Maybe even have an actual honest-to-goodness
"this is a static undefined situation, I will stop generating code AND
THAT MEANS I FAIL THE BUILD".

Not this silent "now I generate random code by falling through to
something else entirely" that clang does now. Not good.

              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 17:52 ` Linus Torvalds
@ 2025-04-26 20:05   ` Nathan Chancellor
  2025-04-26 20:56     ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Nathan Chancellor @ 2025-04-26 20:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List,
	llvm

Hi Linus,

On Sat, Apr 26, 2025 at 10:42:40AM -0700, Linus Torvalds wrote:
> But I *really* think that clang silently just generating known bad
> code for invalida operations like this is very very dangerous, and is
> a bug in clang.
> 
> Can we *please* get a flag for clang that it doesn't just stop
> generating code because it has decided some path is unreachable or
> undefined? Add a TRAP instruction, for Chrissake! PLEASE!
> 
>  Maybe such a flag already exists, and the kernel just doesn't know
> about it. This whole "do random things for undefined behavior" is a
> bug, dammit.

I think there is an internal LLVM flag, '-trap-unreachable', that does
what we would want here. Within the last year, I tested adding something
like

    KBUILD_CFLAGS += -mllvm -trap-unreachable

to Makefile under an 'ifdef CONFIG_CC_IS_CLANG' and it eliminated most
objtool warnings but I seem to recall it introducing some new ones, I
think around __noreturn functions? I know Josh has done a lot of work on
objtool recently so I should retest. GCC has this behavior exposed under
-funreachable-traps, I could see about trying to expose that in clang.

This specific case started with clang-20 and your analysis is spot on:

https://lore.kernel.org/20241220223403.GA2605890@ax162/

On Sat, Apr 26, 2025 at 10:52:10AM -0700, Linus Torvalds wrote:
> Please people: "ASSERT()" like behavior is simply not acceptable in
> the kernel. WARN_ON() and friends need to either be otherwise benign
> (ie "warn but then continue to do valid things") or they need to be
> *handled* (ie "warn and then refuse to do things that aren't valid").
> 
> Just warning and then doing random crap is not sane. If you aren't
> capable of dealing with the situation, don't do the bogus test. Just
> warning about it isn't fixing the code, and can make things actively
> worse as in these two examples.

This was the most recent series to clear up those objtool warnings in
the AMD driver:

https://lore.kernel.org/20250114132856.19463-1-yangtiezhu@loongson.cn/

Not sure if other ASSERT() uses should be audited or eliminated too but
that would at least make the build cleaner.

> Not this silent "now I generate random code by falling through to
> something else entirely" that clang does now. Not good.

Aside from what I mention above, I suppose getting OBJTOOL_WERROR in
useful shape would help with this a little bit because it will be easier
for me and others testing tip of tree LLVM to notice when optimizations
introduce new warnings.

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 20:05   ` Nathan Chancellor
@ 2025-04-26 20:56     ` Linus Torvalds
  2025-04-26 23:23       ` Nathan Chancellor
  2025-04-28 17:53       ` Bill Wendling
  0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2025-04-26 20:56 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List,
	llvm

On Sat, 26 Apr 2025 at 13:05, Nathan Chancellor <nathan@kernel.org> wrote:
>
>     KBUILD_CFLAGS += -mllvm -trap-unreachable

Hmm. That certainly builds for me, but yeah, it generates new objtool
warnings, notably

   panic() missing __noreturn in .c/.h or NORETURN() in noreturns.h

and I *think* that is because that flag makes clang not honour our
*explicit* "this code is unreachable" annotations.

So now objtool complains about the fact that clang has generated some
random code that follows a call to 'panic()' even though objtool knows
that panic() cannot return.

And those explicit annotations definitely should be honored.

IOW, there's a *big* difference between "the programmer told me this
is unreachable, so I won't generate code past this point" and "I have
decided this is undefined behavior, so now I won't generate code past
this point".

So what I'm asking for is absolutely not "trap on unreachable". That's
wrong and just plain stupid.

I'm asking for "trap on UD instead of *assuming* it's unreachable".

Because clearly that code *can* be reached, it's just doing something undefined.

See? Big big difference.

             Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 20:56     ` Linus Torvalds
@ 2025-04-26 23:23       ` Nathan Chancellor
  2025-04-27  0:31         ` Linus Torvalds
  2025-04-28 17:53       ` Bill Wendling
  1 sibling, 1 reply; 13+ messages in thread
From: Nathan Chancellor @ 2025-04-26 23:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List,
	llvm

On Sat, Apr 26, 2025 at 01:56:59PM -0700, Linus Torvalds wrote:
> IOW, there's a *big* difference between "the programmer told me this
> is unreachable, so I won't generate code past this point" and "I have
> decided this is undefined behavior, so now I won't generate code past
> this point".
> 
> So what I'm asking for is absolutely not "trap on unreachable". That's
> wrong and just plain stupid.
> 
> I'm asking for "trap on UD instead of *assuming* it's unreachable".
> 
> Because clearly that code *can* be reached, it's just doing something undefined.
> 
> See? Big big difference.

Ah yes, that is a big yet subtle difference that I had not considered,
my bad for missing that. I was only thinking about the implicitly
inserted __builtin_unreachable() from potential UB, not the explicitly
added ones from the developers.

I suspect that it would not be easy to split that distinction in LLVM
but since I am not a compiler person, I will see if this has come up
before and talk to people otherwise. I know there has been work in LLVM
to try and stop undefined behavior from destroying control flow with
things like the freeze instruction but I am not sure that would help us
in this situation. Pardon my ignorance though, isn't something like this
basically just '-fsanitize=undefined -fsanitize-trap=all'?

Cheers,
Nathan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 23:23       ` Nathan Chancellor
@ 2025-04-27  0:31         ` Linus Torvalds
  2025-04-28 18:08           ` Bill Wendling
  0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2025-04-27  0:31 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Josh Poimboeuf, Harry Wentland, Leo Li, Alex Deucher,
	Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Bill Wendling, Justin Stitt,
	the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List,
	llvm

On Sat, 26 Apr 2025 at 16:23, Nathan Chancellor <nathan@kernel.org> wrote:
>
>  Pardon my ignorance though, isn't something like this
> basically just '-fsanitize=undefined -fsanitize-trap=all'?

Sure. Except -fsanitize=undefined is a horrible horrible thing.

Why? Because it pointlessly adds code to *look* for undefined
behavior, which is only extra overhead.

I)OW, if we have a divide, just *DO THE DIVIDE*. Don't some extra
pointless code to "is the divisor zero, and trap if so".

Because dammit, that's what the divide instruction ALREADY DOES.

The whole concept of "use undefined C behavior to change code
generation" is complete and utter BS. It's wrong. It's stupid. And a
compiler shouldn't do it.

The argument for it is "once it's udnefined, I might as well optimize
it away". But in reality, that argument is pure garbage.

It's garbage for several reasons:

 - there's no real life optimization to have in practice. You aren't
actually improving code generation.

 - there are real and serious downsides in security, and this case is
an example of that very issue

 - the historical reason for most C undefined behavior DOES NOT
ACTUALLY EXIST ANY MORE.

Nick Desaulniers recently pointed me at a paper that is worth reading
by any compiler person:

   https://web.ist.utl.pt/nuno.lopes/pubs/ub-pldi25.pdf

which backs me up on that "UD optimizations aren't actually optimizing
anything" thing.

So please. Clang people need to get a clue. Yes, we care *deeply*
about performance in the kernel, but a C compiler that thinks that
using UD to generate "better" code is a disgrace and pure garbage.
Because security matters a whole lot too, and the downsides of turning
undefined behavior into random garbage are about a million times
bigger than the "I can remove one integer instruction for zero gain".

For the kerrnel, we want to disable absolutely all undefined behavior
crap ideas by the compiler. It's why we use -fwrapv and have for
years. It's why we disable the idiotic "strict alias" stuff that
should never have become part of C. And it's why I want that "turn UD
into unreachable" mindfart fixed.

The notion of "optimizing" unreachable code is crazy. And the notion
of thinking that "UD means unreachable" is so incredibly insane that
any compiler person that thinks it is reasonable should have his head
examined.

               Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-26 20:56     ` Linus Torvalds
  2025-04-26 23:23       ` Nathan Chancellor
@ 2025-04-28 17:53       ` Bill Wendling
  1 sibling, 0 replies; 13+ messages in thread
From: Bill Wendling @ 2025-04-28 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nathan Chancellor, Josh Poimboeuf, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Sat, Apr 26, 2025 at 1:57 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Sat, 26 Apr 2025 at 13:05, Nathan Chancellor <nathan@kernel.org> wrote:
> >
> >     KBUILD_CFLAGS += -mllvm -trap-unreachable
>
> Hmm. That certainly builds for me, but yeah, it generates new objtool
> warnings, notably
>
>    panic() missing __noreturn in .c/.h or NORETURN() in noreturns.h
>
> and I *think* that is because that flag makes clang not honour our
> *explicit* "this code is unreachable" annotations.
>
There's also this flag:

  -mllvm -no-trap-after-noreturn

Here's the documentation for both flags:

static cl::opt<bool>
    EnableTrapUnreachable("trap-unreachable", cl::Hidden,
                          cl::desc("Enable generating trap for unreachable"));

static cl::opt<bool> EnableNoTrapAfterNoreturn(
    "no-trap-after-noreturn", cl::Hidden,
    cl::desc("Do not emit a trap instruction for 'unreachable' IR instructions "
             "after noreturn calls, even if --trap-unreachable is set."));


Could you share how you configured your build?

-bw

> So now objtool complains about the fact that clang has generated some
> random code that follows a call to 'panic()' even though objtool knows
> that panic() cannot return.
>
> And those explicit annotations definitely should be honored.
>
> IOW, there's a *big* difference between "the programmer told me this
> is unreachable, so I won't generate code past this point" and "I have
> decided this is undefined behavior, so now I won't generate code past
> this point".
>
> So what I'm asking for is absolutely not "trap on unreachable". That's
> wrong and just plain stupid.
>
> I'm asking for "trap on UD instead of *assuming* it's unreachable".
>
> Because clearly that code *can* be reached, it's just doing something undefined.
>
> See? Big big difference.
>
>              Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-27  0:31         ` Linus Torvalds
@ 2025-04-28 18:08           ` Bill Wendling
  2025-04-28 19:34             ` Linus Torvalds
  0 siblings, 1 reply; 13+ messages in thread
From: Bill Wendling @ 2025-04-28 18:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nathan Chancellor, Josh Poimboeuf, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Sat, Apr 26, 2025 at 5:31 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So please. Clang people need to get a clue. Yes, we care *deeply*
> about performance in the kernel, but a C compiler that thinks that
> using UD to generate "better" code is a disgrace and pure garbage.
> Because security matters a whole lot too, and the downsides of turning
> undefined behavior into random garbage are about a million times
> bigger than the "I can remove one integer instruction for zero gain".
>
> For the kerrnel, we want to disable absolutely all undefined behavior
> crap ideas by the compiler. It's why we use -fwrapv and have for
> years. It's why we disable the idiotic "strict alias" stuff that
> should never have become part of C. And it's why I want that "turn UD
> into unreachable" mindfart fixed.
>
> The notion of "optimizing" unreachable code is crazy. And the notion
> of thinking that "UD means unreachable" is so incredibly insane that
> any compiler person that thinks it is reasonable should have his head
> examined.

I tend to agree that generating bad code in the face of UB is bad;
there was another, unrelated, instance where Clang silently generated
bad code with UB.

The problem borders on the philosophical. But I believe the argument
is roughly "generating 'good' code in the face of UB is a never-ending
game of Wack-A-Mole(tm), where we don't actually know what the correct
outcome should be." ("Correct" implying that the compiler could
somehow divine what the programmer was hoping for, or at least get
close to it, but it's not possible.) This situation is one of the
easier ones: "do something other than fall into the next function";
but there are far more involved examples, of course. And even in this
case, the compiler needs to know if a "trap" is okay, or would
returning with garbage in %rax be okay.

-bw

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-28 18:08           ` Bill Wendling
@ 2025-04-28 19:34             ` Linus Torvalds
  2025-04-28 19:54               ` Josh Poimboeuf
                                 ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Linus Torvalds @ 2025-04-28 19:34 UTC (permalink / raw)
  To: Bill Wendling
  Cc: Nathan Chancellor, Josh Poimboeuf, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Mon, 28 Apr 2025 at 11:08, Bill Wendling <morbo@google.com> wrote:
>
> This situation is one of the
> easier ones: "do something other than fall into the next function";

Note that the "fall into the next function" is just something that
objtool notices. It *could* be "fall into the next basic block of the
same function, and objtool wouldn't warn, because objtool generally
wouldn't notice (there could be other things that make objtool notice,
of course - things like stack updates being out of whack or similar).

But I really wish that clang would look at a "don't depend on UD as a
code generation model AT ALL" as a flag.

The whole "this is undefined, so I'll generate something different"
model is just wrong.

That said, there are certainly graduations of wrong:

> but there are far more involved examples, of course. And even in this
> case, the compiler needs to know if a "trap" is okay, or would
> returning with garbage in %rax be okay.

Honestly, the least wrong thing is to just NOT HAVE THE CHECK FOR ZERO AT ALL.

IOW, just generate the divide instruction.

I can almost guarantee that that will actually then generate the best
code too, because you'll probably just end up sharing the divide
instruction will all the *normal* cases.

So the best model is to literally remove that pointless and stupid "is
this a divide by zero" code. It's pointless and stupid because it
literally just makes for more work both for the compiler AND it
generates worse code.

Why do extra work to generate worse code?

Btu if some religious nutcase decides that "I will not generate divide
instructions if I know the divisor is zero" is a hill they will die
on, generating a "trap" instruction is certainly not inexcusable.

Generating a random value for %eax is WRONG. Now, that said, it's
clearly less wrong than falling through to some unrelated code
entirely, so it would be an improvement on the *current* situation,
but that's like saying that getting shot in the foot is an improvement
on getting shot in the head: true, but if the alternative is not
getting shot at all, why is that "less bad" alternative even on the
table?

The "just execute random code" is clearly so bad that it *should* be
off the table in the first place, and I don't understand why it is
what clang does now. It's just crazy.

And yes, this really is a very potential and real security issue. In
the kernel I don't think we have this ever happening, partly because a
lot of configurations use gcc which afaik doesn't have this particular
horrendous model of UD.

But this isn't just a kernel issue, it's a "anybody using clang to
build any program that might have security issues would be *insane* to
think this is a good model for dealing with UD". We do more checking
than most on the code generation, so we actually had tools that
noticed this odd code generaton. I can guarantee you that 99% of all
projects out there would never have even noticed.

And who knows what cases we *don't* find.

And obviously hopefully UD doesn't actually happen. But that's like
saying "hopefully we have no bugs". It's not reality.

Using UD to change code generation really is a truly horrendously bad
idea in the first place, but doing it in anything where security might
matter takes "bad idea" to "just don't do this".

                 Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-28 19:34             ` Linus Torvalds
@ 2025-04-28 19:54               ` Josh Poimboeuf
  2025-04-28 20:57                 ` Linus Torvalds
  2025-04-28 20:58               ` Bill Wendling
  2025-04-29  9:17               ` Peter Zijlstra
  2 siblings, 1 reply; 13+ messages in thread
From: Josh Poimboeuf @ 2025-04-28 19:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bill Wendling, Nathan Chancellor, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Mon, Apr 28, 2025 at 12:34:27PM -0700, Linus Torvalds wrote:
> Honestly, the least wrong thing is to just NOT HAVE THE CHECK FOR ZERO AT ALL.
> 
> IOW, just generate the divide instruction.
> 
> I can almost guarantee that that will actually then generate the best
> code too, because you'll probably just end up sharing the divide
> instruction will all the *normal* cases.
> 
> So the best model is to literally remove that pointless and stupid "is
> this a divide by zero" code. It's pointless and stupid because it
> literally just makes for more work both for the compiler AND it
> generates worse code.

BTW, I've noticed Clang also generates UB for negative shift values.  I
assume we'd want it to stop checking for those as well.

-- 
Josh

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-28 19:54               ` Josh Poimboeuf
@ 2025-04-28 20:57                 ` Linus Torvalds
  0 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2025-04-28 20:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Bill Wendling, Nathan Chancellor, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Mon, 28 Apr 2025 at 12:54, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> BTW, I've noticed Clang also generates UB for negative shift values.  I
> assume we'd want it to stop checking for those as well.

Yeah, that seems to match the exact same issue.

And again - the correct fix would be for the compiler to not do extra
work only to make for worse results.

Note that it's different if we *ask* for -fsanitize-undefined-xyz: at
that point we're literally asking the compiler to add extra code for
reporting.

But even then we absolutely don't want the "fall through to random
code" behavior. We'd only want the reporting part, and then still
generate valid code.

So the "turn undefined behavior into truly random behavior" is *never*
a valid model.

It would be much better handled as implementation-defined. So "divide
by zero" would have a very valid model - it will raise an exception.
And shift by negative would have all the usual semantics on x86 (the
shift value is just masked).

Undefined behavior is a bad bad thing to try to take advantage of.
It's bad in CPU design, it's bad in compiler design. If we have a bug
- and bugs *will* happen - it's a lot better if that bug then causes
some reliably bad behavior. It's better for security, it's better for
debuggability.

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-28 19:34             ` Linus Torvalds
  2025-04-28 19:54               ` Josh Poimboeuf
@ 2025-04-28 20:58               ` Bill Wendling
  2025-04-29  9:17               ` Peter Zijlstra
  2 siblings, 0 replies; 13+ messages in thread
From: Bill Wendling @ 2025-04-28 20:58 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nathan Chancellor, Josh Poimboeuf, Harry Wentland, Leo Li,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Nick Desaulniers, Justin Stitt, the arch/x86 maintainers,
	dri-devel, Linux Kernel Mailing List, llvm

On Mon, Apr 28, 2025 at 12:34 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Mon, 28 Apr 2025 at 11:08, Bill Wendling <morbo@google.com> wrote:
> >
> > This situation is one of the
> > easier ones: "do something other than fall into the next function";
>
> Note that the "fall into the next function" is just something that
> objtool notices. It *could* be "fall into the next basic block of the
> same function, and objtool wouldn't warn, because objtool generally
> wouldn't notice (there could be other things that make objtool notice,
> of course - things like stack updates being out of whack or similar).
>
> But I really wish that clang would look at a "don't depend on UD as a
> code generation model AT ALL" as a flag.
>
> The whole "this is undefined, so I'll generate something different"
> model is just wrong.
>
> That said, there are certainly graduations of wrong:
>
> > but there are far more involved examples, of course. And even in this
> > case, the compiler needs to know if a "trap" is okay, or would
> > returning with garbage in %rax be okay.
>
> Honestly, the least wrong thing is to just NOT HAVE THE CHECK FOR ZERO AT ALL.
>
> IOW, just generate the divide instruction.
>
> I can almost guarantee that that will actually then generate the best
> code too, because you'll probably just end up sharing the divide
> instruction will all the *normal* cases.
>
I get what you're saying, I really do. I'm actually in the "playing
Wack-A-Mole(tm) is far better than generating code that accidentally
launches the nukes" crowd. The fact that the compiler silently
generates something wrong is horrifying to me. The compiler has a ton
of options to allow for "bad" math, but they're mostly (all?) for
floating point operations. It has some for integers, like the -fwrapv
you mentioned.

> So the best model is to literally remove that pointless and stupid "is
> this a divide by zero" code. It's pointless and stupid because it
> literally just makes for more work both for the compiler AND it
> generates worse code.
>
> Why do extra work to generate worse code?
>
> Btu if some religious nutcase decides that "I will not generate divide
> instructions if I know the divisor is zero" is a hill they will die
> on, generating a "trap" instruction is certainly not inexcusable.
>
I'll see what I can do with this. I might be able to sneak a patch in
past the religious nutcases. The fact that we have the two flags
Nathan and I mentioned could indicate that someone will be amenable to
the patch.

-bw

> Generating a random value for %eax is WRONG. Now, that said, it's
> clearly less wrong than falling through to some unrelated code
> entirely, so it would be an improvement on the *current* situation,
> but that's like saying that getting shot in the foot is an improvement
> on getting shot in the head: true, but if the alternative is not
> getting shot at all, why is that "less bad" alternative even on the
> table?
>
> The "just execute random code" is clearly so bad that it *should* be
> off the table in the first place, and I don't understand why it is
> what clang does now. It's just crazy.
>
> And yes, this really is a very potential and real security issue. In
> the kernel I don't think we have this ever happening, partly because a
> lot of configurations use gcc which afaik doesn't have this particular
> horrendous model of UD.
>
> But this isn't just a kernel issue, it's a "anybody using clang to
> build any program that might have security issues would be *insane* to
> think this is a good model for dealing with UD". We do more checking
> than most on the code generation, so we actually had tools that
> noticed this odd code generaton. I can guarantee you that 99% of all
> projects out there would never have even noticed.
>
> And who knows what cases we *don't* find.
>
> And obviously hopefully UD doesn't actually happen. But that's like
> saying "hopefully we have no bugs". It's not reality.
>
> Using UD to change code generation really is a truly horrendously bad
> idea in the first place, but doing it in anything where security might
> matter takes "bad idea" to "just don't do this".
>
>                  Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: clang and drm issue: objtool warnings from clang build
  2025-04-28 19:34             ` Linus Torvalds
  2025-04-28 19:54               ` Josh Poimboeuf
  2025-04-28 20:58               ` Bill Wendling
@ 2025-04-29  9:17               ` Peter Zijlstra
  2 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2025-04-29  9:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Bill Wendling, Nathan Chancellor, Josh Poimboeuf, Harry Wentland,
	Leo Li, Alex Deucher, Christian König, David Airlie,
	Simona Vetter, Nick Desaulniers, Justin Stitt,
	the arch/x86 maintainers, dri-devel, Linux Kernel Mailing List,
	llvm

On Mon, Apr 28, 2025 at 12:34:27PM -0700, Linus Torvalds wrote:

> And yes, this really is a very potential and real security issue. In
> the kernel I don't think we have this ever happening, partly because a
> lot of configurations use gcc which afaik doesn't have this particular
> horrendous model of UD.

I see more and more people use clang, in a large part because of Rust.

Anyway, I've seen clang pull this stop-codegen-on-UB trick before (link
was upstream in the thread) and yes, it is horrific crap. At the time I
proposed emitting at the very least a UD2 instruction rather than just
straight up stopping code gen, but I think your proposal for a code-gen
knob to just not do this 'optimisation' at all is much better.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-04-29  9:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-26 17:42 clang and drm issue: objtool warnings from clang build Linus Torvalds
2025-04-26 17:52 ` Linus Torvalds
2025-04-26 20:05   ` Nathan Chancellor
2025-04-26 20:56     ` Linus Torvalds
2025-04-26 23:23       ` Nathan Chancellor
2025-04-27  0:31         ` Linus Torvalds
2025-04-28 18:08           ` Bill Wendling
2025-04-28 19:34             ` Linus Torvalds
2025-04-28 19:54               ` Josh Poimboeuf
2025-04-28 20:57                 ` Linus Torvalds
2025-04-28 20:58               ` Bill Wendling
2025-04-29  9:17               ` Peter Zijlstra
2025-04-28 17:53       ` Bill Wendling

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox