From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 784A319DF4F
	for <linux-kernel@vger.kernel.org>; Wed, 13 May 2026 00:36:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1778632574; cv=none; b=UKwbcn6EC9/ore4wGiWtg5YIe7jIdge300jh/fB2MsV2OObOjJ5NeQ6m4HFmLcNjO4A1sUcPXm7JszWk7cuQqC45IwbZFff6FzRuyVno9S5ErtuaiNRrI4hWpm1DbtiBq9Zd3keYf8r5c6Fmwn8tLB8PE7KVebwK2qE2WMHAAXw=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1778632574; c=relaxed/simple;
	bh=GK3kOBeEF+l4MMvszg0cOr3LzKO9UljrVZtKjJtXt1k=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID:
	 MIME-Version:Content-Type; b=Y3As+5TpDwx4o0LcmR3xz/sQQ3nIoorGL1+u6zsq1/XwgBry6Q2o3t5DXgQFAuGHNVLGfX0hCooLsfmNY0K329qsznPDncivc0rMU2+2cgpPiZ8vjbwhOdHjC6rHYo2U0EDsbz0URMEBTTOBeLZgZytgzrZRaeEbF5Abne8a8tw=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Gp8VzwlN; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Gp8VzwlN"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7EB60C2BCB0;
	Wed, 13 May 2026 00:36:13 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1778632574;
	bh=GK3kOBeEF+l4MMvszg0cOr3LzKO9UljrVZtKjJtXt1k=;
	h=From:To:Cc:Subject:In-Reply-To:References:Date:From;
	b=Gp8VzwlNocu+yK4I3B0Q4YJuhUZ3z9acwxhoggDiNT7BmhI3v98kzUDniKeeOpwL3
	 OzlvzhMIl/uWK1YeliuiUgsy5Osc2bVmGDQuqiP5B92QCGRH7fced6rKp8jSHYFy7L
	 peoLyotVZDn8llCJuBr6lelM8yB3g61+O22LpHNd/AkmEvaGBThuuIUJsT4tj6Ojeq
	 C74Gctwrm6hSMo8QmdpeQvDtOjkpwkldM1DzFkQy0XmOePgN3vXPZB53J/fgkiut/D
	 zt0aa/07W28JFgvoxe/S/RKYiipWvVo8ba/2SX8DiLtGsxxjSJ9nM1inUR4ZSpzPju
	 WRCeN8pIyQH1w==
From: Thomas Gleixner <tglx@kernel.org>
To: Alexander Potapenko <glider@google.com>, Mark Rutland
 <mark.rutland@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>, syzbot
 <syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com>, kasan-dev
 <kasan-dev@googlegroups.com>, linux-kernel@vger.kernel.org,
 luto@kernel.org, peterz@infradead.org, syzkaller-bugs@googlegroups.com,
 ruanjinjie@huawei.com
Subject: Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in
 irqentry_exit_to_kernel_mode_preempt
In-Reply-To: <CAG_fn=VGfkvsTJTq_TFkg_=FiCFmbY5uj=2bwSzMR+8Cmdxi+g@mail.gmail.com>
References: <69e7ee1f.a00a0220.17a17.001d.GAE@google.com>
 <CACT4Y+b5wU2wHfugXH0tjGyLS1ibTjj6KibXMMhKN59rXRuafw@mail.gmail.com>
 <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com>
 <87v7cu876c.ffs@tglx>
 <CAG_fn=Uyg0z65X+7DTju9hYPHtBHfB8tD1o2j_qxObm34LgTKQ@mail.gmail.com>
 <agMLyQ-ztO0yU3jZ@J2N7QTR9R3>
 <CAG_fn=VGfkvsTJTq_TFkg_=FiCFmbY5uj=2bwSzMR+8Cmdxi+g@mail.gmail.com>
Date: Wed, 13 May 2026 02:36:10 +0200
Message-ID: <87zf246t8l.ffs@tglx>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Alexander!

On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 1:15=E2=80=AFPM Mark Rutland <mark.rutland@arm.co=
m> wrote:
>> > Again, this only matters because we are calling an instrumented functi=
on from
>> > a non-instrumented one, otherwise it's perfectly fine to call between
>> > compilation units.
>>
>> For context, can you explain how this is expected to work across
>> compilation units when the caller and callee *are* instrumented, when an
>> argument is passed in a register?
>
> Per KMSAN ABI, the metadata for the function arguments is passed via
> buffers in the so-called context state (see
> include/linux/kmsan_types.h)
> In the userspace, these buffers are thread-local variables referenced
> by inline loads and stores.
> In the kernel space, the compiler inserts a call
> __msan_get_context_state() at the beginning of every function, and
> then the instrumentation code uses whatever that function returned.
>
> Assume we have a function:
>
> int sum(int a, int b) {
>     result =3D a + b;
>     return result;
> }
>
> Its instrumented version looks roughly as follows (we'll omit origin
> tracking for simplicity):
>
> int sum(int a, int b) {
>   struct kmsan_context_state *kcs =3D __msan_get_context_state();
>   int s_a =3D ((int)kcs->param_tls)[0];  // shadow of a
>   int s_b =3D ((int)kcs->param_tls)[1];  // shadow of b
>   result =3D a + b;
>   s_result =3D s_a | s_b;
>   ((int)kcs->retval_tls)[0] =3D s_result;  // returned shadow
>   return result;
> }
>
> Most certainly `a` and `b` will be passed using registers, but that
> doesn't matter: their metadata is safe as long as the caller does:
>
> ((int)kcs->param_tls)[0] =3D s_a;
> ((int)kcs->param_tls)[1] =3D s_b;
> result =3D sum(a, b);
> s_result =3D ((int)kcs->retval_tls)[0];

Thanks for the explanation.

TBH, I'm failing to see how this is not inSANe, but I might be missing
something obvious.

What really bothers me is the lack of distinction between arguments
passed by value vs. arguments passed by reference. For me there is a
clear distinction.

Any argument passed by reference, i.e. pointer, obviously requires to be
validated at the actual usage site, which might be several levels deep
into the call chain. That obviously does not include by value arguments
which are passed on the stack due to exceeding the register passing
constraints,

But for arguments passed by value, I fail to see the justification.

The compiler can validate at the call site that the value arguments are
properly initialized both at compile and at runtime:

    1) If the argument value is a constant it is initialized

    2) If the argument value is read from memory at the call site, then
       the function which reads has to validate that the memory is
       initialized before reading it.

    3) If the argument value was returned from a previous function call
       then that invoked function has to make sure that the returned
       value is properly initialized.

    4) If the argument is a local variable which is not initialized
       that's catched at compile time with and without KMSAN. If people
       ignore that warning, then they will ignore the resulting KMSAN
       splat too.
=20=20=20=20=20=20=20=20
    5) If the argument is only propagated to the next call, then #1 - #4
       apply to the caller, i.e. all of that propagates through the
       call chain.

    6) If the argument is handed in from the previous caller and
       modified by the function which calls the next one, then still #1
       - #4 apply.

IOW, the producer of a value argument has to ensure that the value
argument is properly initialized.

This whole 'is initialized' propagation for by value arguments is
overengineered voodoo in my opinion. Why?

  If the compiler can't ensure at the producer site that something is
  properly initialized and then can't ensure that the intermediates
  handing the value argument down the callchain are doing the right
  thing, then KMSAN won't help either as it is then by definition
  equally untrustworthy as the untrustworthy compiler which emitted the
  KMSAN self-validation along with the broken code.

  So it just adds overhead for nothing and makes everything look "sane"
  while in fact it provides no value and creates exactly the problems we
  are debating right now.

No?

> All the metadata tracking is implemented using an LLVM IR pass, which
> ensures that the shadow for each uninitialized value is tracked
> regardless of where that value is stored - be that heap memory, stack
> or registers:
> - when a value is created or loaded from memory, the compiler inserts
> SSA registers corresponding to that value's shadow;
> - when a value is written to memory, a shadow store is inserted;
> - when a value is used in an operation, the result of that operation
> receives a shadow value depending on the operands and their shadow
> values;
> - when a value is passed to a function, the corresponding context
> state stores and loads are emitted.
>
> Now, this works fine under the assumption that all code in the kernel
> is instrumented with KMSAN.
> However this is not true.

We know that since KMSAN got integrated and you pointed yourself to that
discussion which we had back then about handling the @regs pointer
argument.

There was a brief mentioning of the same problem about non-pointer
arguments, but that dried out because the test runs did not barf anymore
after the @regs unpoisoning got fixed.

There were discussions about utilizing instrumentation_begin()/end() to
tell the compiler that this instrumentation_begin() lifts the 'do not
sanitize directive' and it could rightfully inject instrumentation code
there including calls, but obviously nothing happened.

What really worries me more than this particular problem at hand is that
we have tons of places which invoke instrumented functions from
non-instrumented code with arguments originating from the
non-instrumented code. Here is an example:

DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
{
	do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
		      0, NULL);
}

#define DEFINE_IDTENTRY_ERRORCODE(func)					\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code);		\
									\
__visible noinstr void func(struct pt_regs *regs,			\
			    unsigned long error_code)			\
{									\
	irqentry_state_t state =3D irqentry_enter(regs);			\
									\
	instrumentation_begin();					\
	__##func (regs, error_code);					\
	instrumentation_end();						\
	irqentry_exit(regs, state);					\
}									\
									\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code)

@error_code comes all the way from the ASM entry code.

The function which gets inlined within the instrumentation_begin()/end()
region in the non-instrumented exc_invalid_tss() is:

static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigne=
d long error_code)
{
	do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NU=
LL);
}

Anything else than @regs (already unpoisoned) and @error_code is compile
time constant.

do_error_trap() is in the same compilation unit, static and
instrumentable:

static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
	unsigned long trapnr, int signr, int sicode, void __user *addr)
{
	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=3D
			NOTIFY_STOP) {
		cond_local_irq_enable(regs);
		do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
		cond_local_irq_disable(regs);
	}
}

That should suffer from the very same problem vs. _all_ arguments
except @regs, right?

But no, it does not. The same compilers which emit imaginary initialized
checks for something they out of lined themself despite knowing that the
only calling context is not instrumentable, ignore the very same for
do_error_trap().

According to the disassembly do_error_trap() just goes and invokes
notify_die() without any checks and notify_die() takes the same
(reordered) arguments uninspected and stores them into a on stack data
struct which is properly poisened first and then unpoisoned for each
member stored.

That's either insanely inconsistent or I've missed something in the
resulting ASM maze. But I'm pretty sure that I did not, so I spare you
the ASM analysis as you surely are able to look at the resulting
inconsistent assembly code yourself (clang-19 debian based).

>> > So what we could do is to add annotations manualy on either the caller=
 side or
>> > the callee side.
>>
>> Without some understanding of those "hints" you mention, I don't see how
>> we can do that on the caller side.
>
> In this case by "hints" I meant any kind of instrumentation that the
> compiler could have inserted automatically to mark the data in the
> instrumented functions as initialized.
> There are no such "hints" right now (apart from letting KMSAN
> instrument the function).
> As for the manual annotations, we only have the mentioned
> `__no_kmsan_checks`, `__no_sanitize_memory`, and
> kmsan_unpoison_memory().

Which is a sad state of affairs. These hints have been debated four
years ago for the very same reasons and nothing happened since then
other than the very same compiler becoming more agressive about
uninling.

>> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preem=
pt(),
>> > making all its inputs initialized. This is the easiest solution, it may
>> > introduce false negatives, but we are on a very thin ice anyway, so pe=
rhaps
>> > doing so is better than dealing with more false positives in the inter=
rupt code.
>> >
>> > Another option for the callee would be applying `__always_inline`, so =
that
>> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
>> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
>> > `__always_inline`, it might be the right thing to do.
>> >
>> We can do that, but this really suggests that there's a fundemantal
>> inability to pass arguments between code which is noinstr and code which
>> isinstrumented with AddressSanitizer, and that's inevitably going to
>> bite us in future.

For the problem at hand I really want to go and mark the inline
__always_inline so it's gone for now.

I hate that "solution" with a passion as it just papers over the
underlying problem and I definitely agree with Mark that this can only
to be considered to be a temporary band aid and is going to bite use
over and over again.

As I pointed out with exc_invalid_tss() this just works by chance and
not by design and there are far more places than that to illustrate it.

> This is true (assuming you mean MemorySanitizer), but:
> - I don't think we can avoid that, given that there will always be
> non-instrumented code;
> - so far the number of annotations has been manageable.

I agree that it's unavoidable, but I don't agree with the second
sentiment unless you meant: "Has been manageable by sheer luck".

What I said in that four year old discussion you pointed to earlier

  "I'm all for making use of advanced instrumentation, validation and
   debugging features, but this mindset of 'make the code comply to what
   the tool of today provides' is fundamentally wrong. Tools have to
   provide value to the programmer and not the other way round.

   Yes, it's more work on the tooling side, but the tooling side is
   mostly a one time effort while chasing the false positives is a long
   term nightmare."

still applies and while we might paper over the underlying problem for
now to keep the CI robots happy, we really have to come up with
something which is reliable and at the same time provides the maximum
coverage for instrumentation.

After reading back on that old thread I still think that we should stick
some hint for the compiler into instrumentation_begin()/end().

These macros exist to provide objtool a way to validate that
non-instrumentable code does not accidentaly call out into
instrumentable code without someone having done the reasoning why it is
safe. But at the same time these sections obviously can tell the
compiler that the by value arguments handed to any function invoked
within that section have to be considered valid.

I deliberately restricted this to by value arguments because by
reference arguments need to be explicitly covered as we do today for
@regs (also see above).

So the compiler can either emit the required KMSAN magic right before
the call or create a wrapper function, which does the required context
initialization.

IIRC, this has been debated four years ago already in some form.

But now four years down the road that obviously does not solve the
problem because nothing happened and we have to deal with existing
compilers and the only thing we can do until compiler people get their
act together is work around this insanity in kernel code again.

Why does TCMalloc come to my mind right now?

Ranted enough.

Here is what I came up with which seems to be "workable" after throwing
at least a dozen other "brilliant and trivial" ideas out:

     instrumentation_begin();
-    func(arg1, arg2);
+    call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2);=20=20=20=20=
=20
     instrumentation_end();

and let call_instr() inject the necessary KMSAN fixups before invoking
@func. That would probably do some useless stuff when @func is an empty
stub, a __always_inline function or subject to arbitrary inlining
decisions of the compiler. But that's not the end of the world and the
conversion can be mostly done by coccinelle or your favorite code
conversion tool of the day.

Something insane like this:

#define KMSAN_FIXUP_VA_1(insane, type, arg)          insane(type, arg)
#define KMSAN_FIXUP_VA_2(insane, arg, ...)           KMSAN_FIXUP_VA_1(insan=
e, arg, __VA_ARGS__)
#define KMSAN_FIXUP_VA_3(insane, type, arg, ...)     insane(type, arg), KMS=
AN_FIXUP_VA_2(insane, __VA_ARGS__)
#define KMSAN_FIXUP_VA_4(insane, arg, ...)           KMSAN_FIXUP_VA_3(insan=
e, arg, __VA_ARGS__)

#define KMSAN_FIXUP_VA_FOR_EACH(insane, ...)                             \
        CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_=
ARGS__)

#define KMSAN_VA_ARG_2(a1, a2)                        (a2)
#define KMSAN_VA_ARG_4(a1, a2, a3, a4)                (a2), (a4)

#define VA_CALL(f, ...)                                            \
        f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__))

#define kmsan_fixup_arg_int(arg)
        __kmsan_fixup_arg_4(&(arg))

#define KMSAN_FIXUP_ARG(type, arg)                                 \
        CONCATENATE(kmsan_fixup_arg_, type)(arg)

#define call_instr(f, ...)                                         \
do {                                                               \
        VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__);      \
        VA_CALL(f, __VA_ARGS__);                                   \
} while (0)

That's obiously restricted to _two_ arguments and _one_ data type for
illustration purposes, but making it work for the limited number of
arguments and data types shouldn't be rocket science..

As a bonus it will fail to compile immediately for unknown data types
and argument numbers. It preserves type safety by not playing silly
casting games with the arguments on the way.

Not that I'm a fan of this, but that's at least better than playing
whack a mole with the insanities of toolchains forever.

Thoughts?

Thanks,

        tglx