From: Thomas Gleixner <tglx@kernel.org>
To: Alexander Potapenko <glider@google.com>,
Mark Rutland <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>,
syzbot <syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com>,
kasan-dev <kasan-dev@googlegroups.com>,
linux-kernel@vger.kernel.org, luto@kernel.org,
peterz@infradead.org, syzkaller-bugs@googlegroups.com,
ruanjinjie@huawei.com
Subject: Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
Date: Wed, 13 May 2026 02:36:10 +0200 [thread overview]
Message-ID: <87zf246t8l.ffs@tglx> (raw)
In-Reply-To: <CAG_fn=VGfkvsTJTq_TFkg_=FiCFmbY5uj=2bwSzMR+8Cmdxi+g@mail.gmail.com>
Alexander!
On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
>> > Again, this only matters because we are calling an instrumented function from
>> > a non-instrumented one, otherwise it's perfectly fine to call between
>> > compilation units.
>>
>> For context, can you explain how this is expected to work across
>> compilation units when the caller and callee *are* instrumented, when an
>> argument is passed in a register?
>
> Per KMSAN ABI, the metadata for the function arguments is passed via
> buffers in the so-called context state (see
> include/linux/kmsan_types.h)
> In the userspace, these buffers are thread-local variables referenced
> by inline loads and stores.
> In the kernel space, the compiler inserts a call
> __msan_get_context_state() at the beginning of every function, and
> then the instrumentation code uses whatever that function returned.
>
> Assume we have a function:
>
> int sum(int a, int b) {
> result = a + b;
> return result;
> }
>
> Its instrumented version looks roughly as follows (we'll omit origin
> tracking for simplicity):
>
> int sum(int a, int b) {
> struct kmsan_context_state *kcs = __msan_get_context_state();
> int s_a = ((int)kcs->param_tls)[0]; // shadow of a
> int s_b = ((int)kcs->param_tls)[1]; // shadow of b
> result = a + b;
> s_result = s_a | s_b;
> ((int)kcs->retval_tls)[0] = s_result; // returned shadow
> return result;
> }
>
> Most certainly `a` and `b` will be passed using registers, but that
> doesn't matter: their metadata is safe as long as the caller does:
>
> ((int)kcs->param_tls)[0] = s_a;
> ((int)kcs->param_tls)[1] = s_b;
> result = sum(a, b);
> s_result = ((int)kcs->retval_tls)[0];
Thanks for the explanation.
TBH, I'm failing to see how this is not inSANe, but I might be missing
something obvious.
What really bothers me is the lack of distinction between arguments
passed by value vs. arguments passed by reference. For me there is a
clear distinction.
Any argument passed by reference, i.e. pointer, obviously requires to be
validated at the actual usage site, which might be several levels deep
into the call chain. That obviously does not include by value arguments
which are passed on the stack due to exceeding the register passing
constraints,
But for arguments passed by value, I fail to see the justification.
The compiler can validate at the call site that the value arguments are
properly initialized both at compile and at runtime:
1) If the argument value is a constant it is initialized
2) If the argument value is read from memory at the call site, then
the function which reads has to validate that the memory is
initialized before reading it.
3) If the argument value was returned from a previous function call
then that invoked function has to make sure that the returned
value is properly initialized.
4) If the argument is a local variable which is not initialized
that's catched at compile time with and without KMSAN. If people
ignore that warning, then they will ignore the resulting KMSAN
splat too.
5) If the argument is only propagated to the next call, then #1 - #4
apply to the caller, i.e. all of that propagates through the
call chain.
6) If the argument is handed in from the previous caller and
modified by the function which calls the next one, then still #1
- #4 apply.
IOW, the producer of a value argument has to ensure that the value
argument is properly initialized.
This whole 'is initialized' propagation for by value arguments is
overengineered voodoo in my opinion. Why?
If the compiler can't ensure at the producer site that something is
properly initialized and then can't ensure that the intermediates
handing the value argument down the callchain are doing the right
thing, then KMSAN won't help either as it is then by definition
equally untrustworthy as the untrustworthy compiler which emitted the
KMSAN self-validation along with the broken code.
So it just adds overhead for nothing and makes everything look "sane"
while in fact it provides no value and creates exactly the problems we
are debating right now.
No?
> All the metadata tracking is implemented using an LLVM IR pass, which
> ensures that the shadow for each uninitialized value is tracked
> regardless of where that value is stored - be that heap memory, stack
> or registers:
> - when a value is created or loaded from memory, the compiler inserts
> SSA registers corresponding to that value's shadow;
> - when a value is written to memory, a shadow store is inserted;
> - when a value is used in an operation, the result of that operation
> receives a shadow value depending on the operands and their shadow
> values;
> - when a value is passed to a function, the corresponding context
> state stores and loads are emitted.
>
> Now, this works fine under the assumption that all code in the kernel
> is instrumented with KMSAN.
> However this is not true.
We know that since KMSAN got integrated and you pointed yourself to that
discussion which we had back then about handling the @regs pointer
argument.
There was a brief mentioning of the same problem about non-pointer
arguments, but that dried out because the test runs did not barf anymore
after the @regs unpoisoning got fixed.
There were discussions about utilizing instrumentation_begin()/end() to
tell the compiler that this instrumentation_begin() lifts the 'do not
sanitize directive' and it could rightfully inject instrumentation code
there including calls, but obviously nothing happened.
What really worries me more than this particular problem at hand is that
we have tons of places which invoke instrumented functions from
non-instrumented code with arguments originating from the
non-instrumented code. Here is an example:
DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
{
do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
0, NULL);
}
#define DEFINE_IDTENTRY_ERRORCODE(func) \
static __always_inline void __##func(struct pt_regs *regs, \
unsigned long error_code); \
\
__visible noinstr void func(struct pt_regs *regs, \
unsigned long error_code) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
instrumentation_begin(); \
__##func (regs, error_code); \
instrumentation_end(); \
irqentry_exit(regs, state); \
} \
\
static __always_inline void __##func(struct pt_regs *regs, \
unsigned long error_code)
@error_code comes all the way from the ASM entry code.
The function which gets inlined within the instrumentation_begin()/end()
region in the non-instrumented exc_invalid_tss() is:
static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code)
{
do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL);
}
Anything else than @regs (already unpoisoned) and @error_code is compile
time constant.
do_error_trap() is in the same compilation unit, static and
instrumentable:
static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
unsigned long trapnr, int signr, int sicode, void __user *addr)
{
RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
NOTIFY_STOP) {
cond_local_irq_enable(regs);
do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
cond_local_irq_disable(regs);
}
}
That should suffer from the very same problem vs. _all_ arguments
except @regs, right?
But no, it does not. The same compilers which emit imaginary initialized
checks for something they out of lined themself despite knowing that the
only calling context is not instrumentable, ignore the very same for
do_error_trap().
According to the disassembly do_error_trap() just goes and invokes
notify_die() without any checks and notify_die() takes the same
(reordered) arguments uninspected and stores them into a on stack data
struct which is properly poisened first and then unpoisoned for each
member stored.
That's either insanely inconsistent or I've missed something in the
resulting ASM maze. But I'm pretty sure that I did not, so I spare you
the ASM analysis as you surely are able to look at the resulting
inconsistent assembly code yourself (clang-19 debian based).
>> > So what we could do is to add annotations manualy on either the caller side or
>> > the callee side.
>>
>> Without some understanding of those "hints" you mention, I don't see how
>> we can do that on the caller side.
>
> In this case by "hints" I meant any kind of instrumentation that the
> compiler could have inserted automatically to mark the data in the
> instrumented functions as initialized.
> There are no such "hints" right now (apart from letting KMSAN
> instrument the function).
> As for the manual annotations, we only have the mentioned
> `__no_kmsan_checks`, `__no_sanitize_memory`, and
> kmsan_unpoison_memory().
Which is a sad state of affairs. These hints have been debated four
years ago for the very same reasons and nothing happened since then
other than the very same compiler becoming more agressive about
uninling.
>> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
>> > making all its inputs initialized. This is the easiest solution, it may
>> > introduce false negatives, but we are on a very thin ice anyway, so perhaps
>> > doing so is better than dealing with more false positives in the interrupt code.
>> >
>> > Another option for the callee would be applying `__always_inline`, so that
>> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
>> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
>> > `__always_inline`, it might be the right thing to do.
>> >
>> We can do that, but this really suggests that there's a fundemantal
>> inability to pass arguments between code which is noinstr and code which
>> isinstrumented with AddressSanitizer, and that's inevitably going to
>> bite us in future.
For the problem at hand I really want to go and mark the inline
__always_inline so it's gone for now.
I hate that "solution" with a passion as it just papers over the
underlying problem and I definitely agree with Mark that this can only
to be considered to be a temporary band aid and is going to bite use
over and over again.
As I pointed out with exc_invalid_tss() this just works by chance and
not by design and there are far more places than that to illustrate it.
> This is true (assuming you mean MemorySanitizer), but:
> - I don't think we can avoid that, given that there will always be
> non-instrumented code;
> - so far the number of annotations has been manageable.
I agree that it's unavoidable, but I don't agree with the second
sentiment unless you meant: "Has been manageable by sheer luck".
What I said in that four year old discussion you pointed to earlier
"I'm all for making use of advanced instrumentation, validation and
debugging features, but this mindset of 'make the code comply to what
the tool of today provides' is fundamentally wrong. Tools have to
provide value to the programmer and not the other way round.
Yes, it's more work on the tooling side, but the tooling side is
mostly a one time effort while chasing the false positives is a long
term nightmare."
still applies and while we might paper over the underlying problem for
now to keep the CI robots happy, we really have to come up with
something which is reliable and at the same time provides the maximum
coverage for instrumentation.
After reading back on that old thread I still think that we should stick
some hint for the compiler into instrumentation_begin()/end().
These macros exist to provide objtool a way to validate that
non-instrumentable code does not accidentaly call out into
instrumentable code without someone having done the reasoning why it is
safe. But at the same time these sections obviously can tell the
compiler that the by value arguments handed to any function invoked
within that section have to be considered valid.
I deliberately restricted this to by value arguments because by
reference arguments need to be explicitly covered as we do today for
@regs (also see above).
So the compiler can either emit the required KMSAN magic right before
the call or create a wrapper function, which does the required context
initialization.
IIRC, this has been debated four years ago already in some form.
But now four years down the road that obviously does not solve the
problem because nothing happened and we have to deal with existing
compilers and the only thing we can do until compiler people get their
act together is work around this insanity in kernel code again.
Why does TCMalloc come to my mind right now?
Ranted enough.
Here is what I came up with which seems to be "workable" after throwing
at least a dozen other "brilliant and trivial" ideas out:
instrumentation_begin();
- func(arg1, arg2);
+ call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2);
instrumentation_end();
and let call_instr() inject the necessary KMSAN fixups before invoking
@func. That would probably do some useless stuff when @func is an empty
stub, a __always_inline function or subject to arbitrary inlining
decisions of the compiler. But that's not the end of the world and the
conversion can be mostly done by coccinelle or your favorite code
conversion tool of the day.
Something insane like this:
#define KMSAN_FIXUP_VA_1(insane, type, arg) insane(type, arg)
#define KMSAN_FIXUP_VA_2(insane, arg, ...) KMSAN_FIXUP_VA_1(insane, arg, __VA_ARGS__)
#define KMSAN_FIXUP_VA_3(insane, type, arg, ...) insane(type, arg), KMSAN_FIXUP_VA_2(insane, __VA_ARGS__)
#define KMSAN_FIXUP_VA_4(insane, arg, ...) KMSAN_FIXUP_VA_3(insane, arg, __VA_ARGS__)
#define KMSAN_FIXUP_VA_FOR_EACH(insane, ...) \
CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_ARGS__)
#define KMSAN_VA_ARG_2(a1, a2) (a2)
#define KMSAN_VA_ARG_4(a1, a2, a3, a4) (a2), (a4)
#define VA_CALL(f, ...) \
f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__))
#define kmsan_fixup_arg_int(arg)
__kmsan_fixup_arg_4(&(arg))
#define KMSAN_FIXUP_ARG(type, arg) \
CONCATENATE(kmsan_fixup_arg_, type)(arg)
#define call_instr(f, ...) \
do { \
VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__); \
VA_CALL(f, __VA_ARGS__); \
} while (0)
That's obiously restricted to _two_ arguments and _one_ data type for
illustration purposes, but making it work for the limited number of
arguments and data types shouldn't be rocket science..
As a bonus it will fail to compile immediately for unknown data types
and argument numbers. It preserves type safety by not playing silly
casting games with the arguments on the way.
Not that I'm a fan of this, but that's at least better than playing
whack a mole with the insanities of toolchains forever.
Thoughts?
Thanks,
tglx
next prev parent reply other threads:[~2026-05-13 0:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <69e7ee1f.a00a0220.17a17.001d.GAE@google.com>
[not found] ` <CACT4Y+b5wU2wHfugXH0tjGyLS1ibTjj6KibXMMhKN59rXRuafw@mail.gmail.com>
[not found] ` <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com>
2026-05-11 12:25 ` [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt Thomas Gleixner
2026-05-12 9:33 ` Alexander Potapenko
2026-05-12 11:15 ` Mark Rutland
2026-05-12 16:24 ` Alexander Potapenko
2026-05-12 17:46 ` Mark Rutland
2026-05-13 0:36 ` Thomas Gleixner [this message]
2026-05-13 7:54 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87zf246t8l.ffs@tglx \
--to=tglx@kernel.org \
--cc=dvyukov@google.com \
--cc=glider@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mark.rutland@arm.com \
--cc=peterz@infradead.org \
--cc=ruanjinjie@huawei.com \
--cc=syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox