From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 784A319DF4F for ; Wed, 13 May 2026 00:36:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778632574; cv=none; b=UKwbcn6EC9/ore4wGiWtg5YIe7jIdge300jh/fB2MsV2OObOjJ5NeQ6m4HFmLcNjO4A1sUcPXm7JszWk7cuQqC45IwbZFff6FzRuyVno9S5ErtuaiNRrI4hWpm1DbtiBq9Zd3keYf8r5c6Fmwn8tLB8PE7KVebwK2qE2WMHAAXw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778632574; c=relaxed/simple; bh=GK3kOBeEF+l4MMvszg0cOr3LzKO9UljrVZtKjJtXt1k=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=Y3As+5TpDwx4o0LcmR3xz/sQQ3nIoorGL1+u6zsq1/XwgBry6Q2o3t5DXgQFAuGHNVLGfX0hCooLsfmNY0K329qsznPDncivc0rMU2+2cgpPiZ8vjbwhOdHjC6rHYo2U0EDsbz0URMEBTTOBeLZgZytgzrZRaeEbF5Abne8a8tw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Gp8VzwlN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Gp8VzwlN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7EB60C2BCB0; Wed, 13 May 2026 00:36:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778632574; bh=GK3kOBeEF+l4MMvszg0cOr3LzKO9UljrVZtKjJtXt1k=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=Gp8VzwlNocu+yK4I3B0Q4YJuhUZ3z9acwxhoggDiNT7BmhI3v98kzUDniKeeOpwL3 OzlvzhMIl/uWK1YeliuiUgsy5Osc2bVmGDQuqiP5B92QCGRH7fced6rKp8jSHYFy7L peoLyotVZDn8llCJuBr6lelM8yB3g61+O22LpHNd/AkmEvaGBThuuIUJsT4tj6Ojeq C74Gctwrm6hSMo8QmdpeQvDtOjkpwkldM1DzFkQy0XmOePgN3vXPZB53J/fgkiut/D zt0aa/07W28JFgvoxe/S/RKYiipWvVo8ba/2SX8DiLtGsxxjSJ9nM1inUR4ZSpzPju WRCeN8pIyQH1w== From: Thomas Gleixner To: Alexander Potapenko , Mark Rutland Cc: Dmitry Vyukov , syzbot , kasan-dev , linux-kernel@vger.kernel.org, luto@kernel.org, peterz@infradead.org, syzkaller-bugs@googlegroups.com, ruanjinjie@huawei.com Subject: Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt In-Reply-To: References: <69e7ee1f.a00a0220.17a17.001d.GAE@google.com> <87v7cu876c.ffs@tglx> Date: Wed, 13 May 2026 02:36:10 +0200 Message-ID: <87zf246t8l.ffs@tglx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Alexander! On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 1:15=E2=80=AFPM Mark Rutland wrote: >> > Again, this only matters because we are calling an instrumented functi= on from >> > a non-instrumented one, otherwise it's perfectly fine to call between >> > compilation units. >> >> For context, can you explain how this is expected to work across >> compilation units when the caller and callee *are* instrumented, when an >> argument is passed in a register? > > Per KMSAN ABI, the metadata for the function arguments is passed via > buffers in the so-called context state (see > include/linux/kmsan_types.h) > In the userspace, these buffers are thread-local variables referenced > by inline loads and stores. > In the kernel space, the compiler inserts a call > __msan_get_context_state() at the beginning of every function, and > then the instrumentation code uses whatever that function returned. > > Assume we have a function: > > int sum(int a, int b) { > result =3D a + b; > return result; > } > > Its instrumented version looks roughly as follows (we'll omit origin > tracking for simplicity): > > int sum(int a, int b) { > struct kmsan_context_state *kcs =3D __msan_get_context_state(); > int s_a =3D ((int)kcs->param_tls)[0]; // shadow of a > int s_b =3D ((int)kcs->param_tls)[1]; // shadow of b > result =3D a + b; > s_result =3D s_a | s_b; > ((int)kcs->retval_tls)[0] =3D s_result; // returned shadow > return result; > } > > Most certainly `a` and `b` will be passed using registers, but that > doesn't matter: their metadata is safe as long as the caller does: > > ((int)kcs->param_tls)[0] =3D s_a; > ((int)kcs->param_tls)[1] =3D s_b; > result =3D sum(a, b); > s_result =3D ((int)kcs->retval_tls)[0]; Thanks for the explanation. TBH, I'm failing to see how this is not inSANe, but I might be missing something obvious. What really bothers me is the lack of distinction between arguments passed by value vs. arguments passed by reference. For me there is a clear distinction. Any argument passed by reference, i.e. pointer, obviously requires to be validated at the actual usage site, which might be several levels deep into the call chain. That obviously does not include by value arguments which are passed on the stack due to exceeding the register passing constraints, But for arguments passed by value, I fail to see the justification. The compiler can validate at the call site that the value arguments are properly initialized both at compile and at runtime: 1) If the argument value is a constant it is initialized 2) If the argument value is read from memory at the call site, then the function which reads has to validate that the memory is initialized before reading it. 3) If the argument value was returned from a previous function call then that invoked function has to make sure that the returned value is properly initialized. 4) If the argument is a local variable which is not initialized that's catched at compile time with and without KMSAN. If people ignore that warning, then they will ignore the resulting KMSAN splat too. =20=20=20=20=20=20=20=20 5) If the argument is only propagated to the next call, then #1 - #4 apply to the caller, i.e. all of that propagates through the call chain. 6) If the argument is handed in from the previous caller and modified by the function which calls the next one, then still #1 - #4 apply. IOW, the producer of a value argument has to ensure that the value argument is properly initialized. This whole 'is initialized' propagation for by value arguments is overengineered voodoo in my opinion. Why? If the compiler can't ensure at the producer site that something is properly initialized and then can't ensure that the intermediates handing the value argument down the callchain are doing the right thing, then KMSAN won't help either as it is then by definition equally untrustworthy as the untrustworthy compiler which emitted the KMSAN self-validation along with the broken code. So it just adds overhead for nothing and makes everything look "sane" while in fact it provides no value and creates exactly the problems we are debating right now. No? > All the metadata tracking is implemented using an LLVM IR pass, which > ensures that the shadow for each uninitialized value is tracked > regardless of where that value is stored - be that heap memory, stack > or registers: > - when a value is created or loaded from memory, the compiler inserts > SSA registers corresponding to that value's shadow; > - when a value is written to memory, a shadow store is inserted; > - when a value is used in an operation, the result of that operation > receives a shadow value depending on the operands and their shadow > values; > - when a value is passed to a function, the corresponding context > state stores and loads are emitted. > > Now, this works fine under the assumption that all code in the kernel > is instrumented with KMSAN. > However this is not true. We know that since KMSAN got integrated and you pointed yourself to that discussion which we had back then about handling the @regs pointer argument. There was a brief mentioning of the same problem about non-pointer arguments, but that dried out because the test runs did not barf anymore after the @regs unpoisoning got fixed. There were discussions about utilizing instrumentation_begin()/end() to tell the compiler that this instrumentation_begin() lifts the 'do not sanitize directive' and it could rightfully inject instrumentation code there including calls, but obviously nothing happened. What really worries me more than this particular problem at hand is that we have tons of places which invoke instrumented functions from non-instrumented code with arguments originating from the non-instrumented code. Here is an example: DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); } #define DEFINE_IDTENTRY_ERRORCODE(func) \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code); \ \ __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ irqentry_state_t state =3D irqentry_enter(regs); \ \ instrumentation_begin(); \ __##func (regs, error_code); \ instrumentation_end(); \ irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code) @error_code comes all the way from the ASM entry code. The function which gets inlined within the instrumentation_begin()/end() region in the non-instrumented exc_invalid_tss() is: static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigne= d long error_code) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NU= LL); } Anything else than @regs (already unpoisoned) and @error_code is compile time constant. do_error_trap() is in the same compilation unit, static and instrumentable: static void do_error_trap(struct pt_regs *regs, long error_code, char *str, unsigned long trapnr, int signr, int sicode, void __user *addr) { RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=3D NOTIFY_STOP) { cond_local_irq_enable(regs); do_trap(trapnr, signr, str, regs, error_code, sicode, addr); cond_local_irq_disable(regs); } } That should suffer from the very same problem vs. _all_ arguments except @regs, right? But no, it does not. The same compilers which emit imaginary initialized checks for something they out of lined themself despite knowing that the only calling context is not instrumentable, ignore the very same for do_error_trap(). According to the disassembly do_error_trap() just goes and invokes notify_die() without any checks and notify_die() takes the same (reordered) arguments uninspected and stores them into a on stack data struct which is properly poisened first and then unpoisoned for each member stored. That's either insanely inconsistent or I've missed something in the resulting ASM maze. But I'm pretty sure that I did not, so I spare you the ASM analysis as you surely are able to look at the resulting inconsistent assembly code yourself (clang-19 debian based). >> > So what we could do is to add annotations manualy on either the caller= side or >> > the callee side. >> >> Without some understanding of those "hints" you mention, I don't see how >> we can do that on the caller side. > > In this case by "hints" I meant any kind of instrumentation that the > compiler could have inserted automatically to mark the data in the > instrumented functions as initialized. > There are no such "hints" right now (apart from letting KMSAN > instrument the function). > As for the manual annotations, we only have the mentioned > `__no_kmsan_checks`, `__no_sanitize_memory`, and > kmsan_unpoison_memory(). Which is a sad state of affairs. These hints have been debated four years ago for the very same reasons and nothing happened since then other than the very same compiler becoming more agressive about uninling. >> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preem= pt(), >> > making all its inputs initialized. This is the easiest solution, it may >> > introduce false negatives, but we are on a very thin ice anyway, so pe= rhaps >> > doing so is better than dealing with more false positives in the inter= rupt code. >> > >> > Another option for the callee would be applying `__always_inline`, so = that >> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. >> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already >> > `__always_inline`, it might be the right thing to do. >> > >> We can do that, but this really suggests that there's a fundemantal >> inability to pass arguments between code which is noinstr and code which >> isinstrumented with AddressSanitizer, and that's inevitably going to >> bite us in future. For the problem at hand I really want to go and mark the inline __always_inline so it's gone for now. I hate that "solution" with a passion as it just papers over the underlying problem and I definitely agree with Mark that this can only to be considered to be a temporary band aid and is going to bite use over and over again. As I pointed out with exc_invalid_tss() this just works by chance and not by design and there are far more places than that to illustrate it. > This is true (assuming you mean MemorySanitizer), but: > - I don't think we can avoid that, given that there will always be > non-instrumented code; > - so far the number of annotations has been manageable. I agree that it's unavoidable, but I don't agree with the second sentiment unless you meant: "Has been manageable by sheer luck". What I said in that four year old discussion you pointed to earlier "I'm all for making use of advanced instrumentation, validation and debugging features, but this mindset of 'make the code comply to what the tool of today provides' is fundamentally wrong. Tools have to provide value to the programmer and not the other way round. Yes, it's more work on the tooling side, but the tooling side is mostly a one time effort while chasing the false positives is a long term nightmare." still applies and while we might paper over the underlying problem for now to keep the CI robots happy, we really have to come up with something which is reliable and at the same time provides the maximum coverage for instrumentation. After reading back on that old thread I still think that we should stick some hint for the compiler into instrumentation_begin()/end(). These macros exist to provide objtool a way to validate that non-instrumentable code does not accidentaly call out into instrumentable code without someone having done the reasoning why it is safe. But at the same time these sections obviously can tell the compiler that the by value arguments handed to any function invoked within that section have to be considered valid. I deliberately restricted this to by value arguments because by reference arguments need to be explicitly covered as we do today for @regs (also see above). So the compiler can either emit the required KMSAN magic right before the call or create a wrapper function, which does the required context initialization. IIRC, this has been debated four years ago already in some form. But now four years down the road that obviously does not solve the problem because nothing happened and we have to deal with existing compilers and the only thing we can do until compiler people get their act together is work around this insanity in kernel code again. Why does TCMalloc come to my mind right now? Ranted enough. Here is what I came up with which seems to be "workable" after throwing at least a dozen other "brilliant and trivial" ideas out: instrumentation_begin(); - func(arg1, arg2); + call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2);=20=20=20=20= =20 instrumentation_end(); and let call_instr() inject the necessary KMSAN fixups before invoking @func. That would probably do some useless stuff when @func is an empty stub, a __always_inline function or subject to arbitrary inlining decisions of the compiler. But that's not the end of the world and the conversion can be mostly done by coccinelle or your favorite code conversion tool of the day. Something insane like this: #define KMSAN_FIXUP_VA_1(insane, type, arg) insane(type, arg) #define KMSAN_FIXUP_VA_2(insane, arg, ...) KMSAN_FIXUP_VA_1(insan= e, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_3(insane, type, arg, ...) insane(type, arg), KMS= AN_FIXUP_VA_2(insane, __VA_ARGS__) #define KMSAN_FIXUP_VA_4(insane, arg, ...) KMSAN_FIXUP_VA_3(insan= e, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_FOR_EACH(insane, ...) \ CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_= ARGS__) #define KMSAN_VA_ARG_2(a1, a2) (a2) #define KMSAN_VA_ARG_4(a1, a2, a3, a4) (a2), (a4) #define VA_CALL(f, ...) \ f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)) #define kmsan_fixup_arg_int(arg) __kmsan_fixup_arg_4(&(arg)) #define KMSAN_FIXUP_ARG(type, arg) \ CONCATENATE(kmsan_fixup_arg_, type)(arg) #define call_instr(f, ...) \ do { \ VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__); \ VA_CALL(f, __VA_ARGS__); \ } while (0) That's obiously restricted to _two_ arguments and _one_ data type for illustration purposes, but making it work for the limited number of arguments and data types shouldn't be rocket science.. As a bonus it will fail to compile immediately for unknown data types and argument numbers. It preserves type safety by not playing silly casting games with the arguments on the way. Not that I'm a fan of this, but that's at least better than playing whack a mole with the insanities of toolchains forever. Thoughts? Thanks, tglx