* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt [not found] ` <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com> @ 2026-05-11 12:25 ` Thomas Gleixner 2026-05-12 9:33 ` Alexander Potapenko 0 siblings, 1 reply; 7+ messages in thread From: Thomas Gleixner @ 2026-05-11 12:25 UTC (permalink / raw) To: Alexander Potapenko, Dmitry Vyukov, Mark Rutland Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Wed, Apr 29 2026 at 10:09, Alexander Potapenko wrote: > On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > I believe Mark's recent changes introduced this. I don't believe any of this. I prefer facts. > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > and `state`. > Because there is a lot of non-instrumented code around, we fail to > initialize these variables. > Instead, irqentry_enter_from_kernel_mode() explicitly calls > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > the state. > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > at the same place. Good luck with unpoisoning 'state'. 'state' is not memory to begin with. 'state' is composed in irqentry_enter_from_kernel_mode() and returned from there to the caller as value in RAX. That returned value is then provided as function argument to irqentry_exit() and subsequently to the inlines invoked from there. See the deeper ASM analysis below. Aside of that the whole splat makes no sense at all due to this: >> > Uninit was created at: >> > slab_post_alloc_hook mm/slub.c:4576 [inline] >> > slab_alloc_node mm/slub.c:4898 [inline] >> > __do_kmalloc_node mm/slub.c:5294 [inline] >> > __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403 >> > kmalloc_reserve net/core/skbuff.c:635 [inline] >> > __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713 >> > alloc_skb include/linux/skbuff.h:1383 [inline] >> > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline] >> > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] >> > nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922 How can irqentry_exit_to_kernel_mode_preempt() "touch" an uninitialized SKB which was allocated by the interrupted code, when it only evaluates regs->eflags and the rcu_exit portion of the register supplied 'state' argument? >> > ===================================================== >> > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 >> > irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 >> > irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline] >> > irqentry_exit+0x7b/0x760 kernel/entry/common.c:164 >> > sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061 >> > asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697 >> > __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210 >> > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline] >> > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] >> > nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922 So I got the vmlinux and disassembled the related code parts: ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>: ffffffff81da7240: 55 push %rbp ffffffff81da7241: 48 89 e5 mov %rsp,%rbp ffffffff81da7244: 41 57 push %r15 ffffffff81da7246: 41 56 push %r14 ffffffff81da7248: 41 55 push %r13 ffffffff81da724a: 41 54 push %r12 ffffffff81da724c: 53 push %rbx ffffffff81da724d: 41 89 f6 mov %esi,%r14d // Save 'state' in r14 ffffffff81da7250: 49 89 ff mov %rdi,%r15 // Save 'regs' in r15 ffffffff81da7253: e8 08 84 d4 00 call ffffffff82aef660 <__msan_get_context_state> ffffffff81da7258: 44 0f b6 68 08 movzbl 0x8(%rax),%r13d // Read magic value into r13d (related to 'state') ffffffff81da725d: 8b 98 90 0c 00 00 mov 0xc90(%rax),%ebx // Read magic value into ebx (related to 'regs') ffffffff81da7263: 4d 8b a7 90 00 00 00 mov 0x90(%r15),%r12 // Read regs->eflags into r12 ffffffff81da726a: 49 81 c7 90 00 00 00 add $0x90,%r15 // Add eflags offset to r15 ffffffff81da7271: 4c 89 ff mov %r15,%rdi // MSAN check for accessing regs->eflags ffffffff81da7274: e8 b7 78 d4 00 call ffffffff82aeeb30 <__msan_metadata_ptr_for_load_8> ffffffff81da7279: 8b 08 mov (%rax),%ecx // Read magic data from the address returned by msan ffffffff81da727b: 89 c8 mov %ecx,%eax // Transfer result to eax ffffffff81da727d: f7 d0 not %eax // One's complement negate eax ffffffff81da727f: 41 81 e4 00 02 00 00 and $0x200,%r12d // Isolate eflags::IF (bit 9) ffffffff81da7286: c1 e9 09 shr $0x9,%ecx // Right shift the magic value by 9 (IF into bit 0) ffffffff81da7289: 0d ff fd ff ff or $0xfffffdff,%eax // Set all bits in eax except bit 9 ffffffff81da728e: 41 85 c4 test %eax,%r12d // Logical compare between eax and r12d The result is in ZF: 1 when bit 9 is not set in eax and/or r12d. 0 if bit 9 is set in both. ffffffff81da7291: 0f 94 c0 sete %al // If ZF==1 AL=1 otherwise AL=0 ffffffff81da7294: 20 c8 and %cl,%al // And the right shifted bit 9 ffffffff81da7296: 4d 85 e4 test %r12,%r12 // Check whether r12 is 0 ( IF set or not ) ffffffff81da7299: 41 0f 94 c7 sete %r15b // If ZF==1 r15b=1 otherwise r15b=0 ffffffff81da729d: 44 89 e1 mov %r12d,%ecx // Transfer the isolated bit 9 into ecx ffffffff81da72a0: c1 e9 09 shr $0x9,%ecx // Shift it into bit 0 ffffffff81da72a3: 44 20 e9 and %r13b,%cl // And the magic context state value ffffffff81da72a6: 45 08 f7 or %r14b,%r15b // Or 'state' on the IF result created above ffffffff81da72a9: 41 f6 d6 not %r14b // One's complement negate 'state' ffffffff81da72ac: 45 08 ee or %r13b,%r14b // Or the above magic KMSAN value on the negated 'state' 'state' is a register argument value. So how on earth can KMSAN mangle this with information stored in the KMSAN context? See below. ffffffff81da72af: 41 0f b6 f6 movzbl %r14b,%esi // zero extend the result into esi ffffffff81da72b3: 0f b6 c9 movzbl %cl,%ecx // zero extend the magic result into ecx ffffffff81da72b6: 84 c0 test %al,%al // Check whether al is 0 (magic IF result) ffffffff81da72b8: 0f 45 ce cmovne %esi,%ecx // If not, move the magic mangled state result into ecx This is where the whole thing goes south because it uses the register state which was wrongfully mangled with the KMSAN state in r13b. ffffffff81da72bb: f6 c1 01 test $0x1,%cl // Check whether the result is 1 // If 1, jump to the kmsan warning section ffffffff81da72be: 75 1b jne ffffffff81da72db <irqentry_exit_to_kernel_mode_preempt+0x9b> ffffffff81da72c0: 41 f6 c7 01 test $0x1,%r15b // Check the actual condition i.e. if (regs_irqs_disabled(regs) || state.exit_rcu) ffffffff81da72c4: 75 05 jne ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b> ffffffff81da72c6: e8 9d f1 37 0f call ffffffff91126468 <__SCT__irqentry_exit_cond_resched> ffffffff81da72cb: 5b pop %rbx ffffffff81da72cc: 41 5c pop %r12 ffffffff81da72ce: 41 5d pop %r13 ffffffff81da72d0: 41 5e pop %r14 ffffffff81da72d2: 41 5f pop %r15 ffffffff81da72d4: 5d pop %rbp ffffffff81da72d5: 2e e9 d5 e2 37 0f cs jmp ffffffff911255b0 <__pi___x86_return_thunk> // KMSAN warning ffffffff81da72db: 31 c9 xor %ecx,%ecx ffffffff81da72dd: 4d 85 e4 test %r12,%r12 ffffffff81da72e0: 0f 44 d9 cmove %ecx,%ebx ffffffff81da72e3: 84 c0 test %al,%al ffffffff81da72e5: 74 02 je ffffffff81da72e9 <irqentry_exit_to_kernel_mode_preempt+0xa9> ffffffff81da72e7: 8b 1a mov (%rdx),%ebx ffffffff81da72e9: 89 df mov %ebx,%edi ffffffff81da72eb: e8 30 83 d4 00 call ffffffff82aef620 <__msan_warning> ffffffff81da72f0: 41 f6 c7 01 test $0x1,%r15b ffffffff81da72f4: 74 d0 je ffffffff81da72c6 <irqentry_exit_to_kernel_mode_preempt+0x86> ffffffff81da72f6: eb d3 jmp ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b> Let's look at the call sites to prove that KMSAN cannot observe 'state' as a memory access ever. noinstr sysvec_apic_timer_interrupt(regs) { // Not observable stack or register state irqentry_state_t state = irqentry_enter(regs); instrumentation_begin(); __sysvec_apic_timer_interrupt(regs); instrumentation_end(); irqentry_exit(regs, state); } Here is the disassembly: ffffffff910e6840 <sysvec_apic_timer_interrupt>: ffffffff910e6840: f3 0f 1e fa endbr64 ffffffff910e6844: 55 push %rbp ffffffff910e6845: 48 89 e5 mov %rsp,%rbp ffffffff910e6848: 41 56 push %r14 ffffffff910e684a: 53 push %rbx ffffffff910e684b: 48 89 fb mov %rdi,%rbx // Saves 'regs' in rbx ffffffff910e684e: e8 ad 07 00 00 call ffffffff910e7000 <irqentry_enter> // invokes irqentry_enter() ffffffff910e6853: 41 89 c6 mov %eax,%r14d // Saves the returned 'state' in r14d ffffffff910e6856: 65 c6 05 a2 d9 7f 04 01 movb $0x1,%gs:0x47fd9a2(%rip) # ffffffff958e4200 <irq_stat> ffffffff910e685e: 90 nop ffffffff910e685f: f6 83 88 00 00 00 03 testb $0x3,0x88(%rbx) ffffffff910e6866: 75 0b jne ffffffff910e6873 <sysvec_apic_timer_interrupt+0x33> ffffffff910e6868: 65 8a 05 c3 f7 76 04 mov %gs:0x476f7c3(%rip),%al # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e686f: 84 c0 test %al,%al ffffffff910e6871: 74 29 je ffffffff910e689c <sysvec_apic_timer_interrupt+0x5c> ffffffff910e6873: e8 28 eb 8e f0 call ffffffff819d53a0 <irq_enter_rcu> ffffffff910e6878: 48 89 df mov %rbx,%rdi ffffffff910e687b: e8 c0 a5 80 f0 call ffffffff818f0e40 <__sysvec_apic_timer_interrupt> ffffffff910e6880: e8 3b ed 8e f0 call ffffffff819d55c0 <irq_exit_rcu> ffffffff910e6885: 90 nop ffffffff910e6886: 41 0f b6 f6 movzbl %r14b,%esi // Moves the saved 'state' into esi ffffffff910e688a: 48 89 df mov %rbx,%rdi // Moves the saved 'regs' into rdi ffffffff910e688d: e8 4e 08 00 00 call ffffffff910e70e0 <irqentry_exit> // Invokes irqentry_exit() ffffffff910e6892: 5b pop %rbx ffffffff910e6893: 41 5e pop %r14 ffffffff910e6895: 5d pop %rbp ffffffff910e6896: 2e e9 14 ed 03 00 cs jmp ffffffff911255b0 <__pi___x86_return_thunk> ffffffff910e689c: 65 c6 05 8e f7 76 04 01 movb $0x1,%gs:0x476f78e(%rip) # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e68a4: 65 4c 8b 1d 54 f7 76 04 mov %gs:0x476f754(%rip),%r11 # ffffffff95856000 <hardirq_stack_ptr> ffffffff910e68ac: 49 89 23 mov %rsp,(%r11) ffffffff910e68af: 4c 89 dc mov %r11,%rsp ffffffff910e68b2: e8 e9 ea 8e f0 call ffffffff819d53a0 <irq_enter_rcu> ffffffff910e68b7: 48 89 df mov %rbx,%rdi ffffffff910e68ba: e8 81 a5 80 f0 call ffffffff818f0e40 <__sysvec_apic_timer_interrupt> ffffffff910e68bf: e8 fc ec 8e f0 call ffffffff819d55c0 <irq_exit_rcu> ffffffff910e68c4: 5c pop %rsp ffffffff910e68c5: 65 c6 05 65 f7 76 04 00 movb $0x0,%gs:0x476f765(%rip) # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e68cd: eb b6 jmp ffffffff910e6885 <sysvec_apic_timer_interrupt+0x45> ffffffff910e68cf: 90 nop Purely register state in a function which is not instrumented to begin with. Now let's look at irqentry_enter(): noinstr irqentry_enter(regs) { // Not observable stack or register state irqentry_state_t state = ....; .... return state; } ffffffff910e7000 <irqentry_enter>: ffffffff910e7000: f3 0f 1e fa endbr64 ffffffff910e7004: 55 push %rbp ffffffff910e7005: 48 89 e5 mov %rsp,%rbp ffffffff910e7008: 53 push %rbx ffffffff910e7009: f6 87 88 00 00 00 03 testb $0x3,0x88(%rdi) // check for kernel entry ffffffff910e7010: 74 59 je ffffffff910e706b <irqentry_enter+0x6b> // branch taken ... ffffffff910e706b: 65 48 8b 05 95 ef 76 04 mov %gs:0x476ef95(%rip),%rax # ffffffff95856008 <current_task> ffffffff910e7073: f6 40 2c 02 testb $0x2,0x2c(%rax) // is_idle_task()? ffffffff910e7077: 75 0b jne ffffffff910e7084 <irqentry_enter+0x84> ffffffff910e7079: 90 nop ffffffff910e707a: e8 91 99 a0 f1 call ffffffff82af0a10 <kmsan_unpoison_entry_regs> // Unpoisons regs ffffffff910e707f: 90 nop ffffffff910e7080: 31 c0 xor %eax,%eax // state = 0 ffffffff910e7082: eb 14 jmp ffffffff910e7098 <irqentry_enter+0x98> // return ffffffff910e7084: 48 89 fb mov %rdi,%rbx // Preserve 'regs' ffffffff910e7087: e8 44 0c 00 00 call ffffffff910e7cd0 <ct_irq_enter> ffffffff910e708c: 90 nop ffffffff910e708d: 48 89 df mov %rbx,%rdi // Restore 'regs' ffffffff910e7090: e8 7b 99 a0 f1 call ffffffff82af0a10 <kmsan_unpoison_entry_regs> // Unpoisons 'regs' ffffffff910e7095: 90 nop ffffffff910e7096: b0 01 mov $0x1,%al // state = 1 ffffffff910e7098: 5b pop %rbx // return ffffffff910e7099: 5d pop %rbp ffffffff910e709a: 2e e9 10 e5 03 00 cs jmp ffffffff911255b0 <__pi___x86_return_thunk> As this just returns state constants in RAX, there is zero related KMSAN memory state, right? Now irqentry_exit: noinstr irqentry_exit(regs, state) { .... irqentry_exit_to_kernel_mode_preempt(regs, state); } ffffffff910e70e0 <irqentry_exit>: ffffffff910e70e0: f3 0f 1e fa endbr64 ffffffff910e70e4: 55 push %rbp ffffffff910e70e5: 48 89 e5 mov %rsp,%rbp ffffffff910e70e8: 41 57 push %r15 ffffffff910e70ea: 41 56 push %r14 ffffffff910e70ec: 41 55 push %r13 ffffffff910e70ee: 41 54 push %r12 ffffffff910e70f0: 53 push %rbx ffffffff910e70f1: 48 83 ec 68 sub $0x68,%rsp ffffffff910e70f5: 49 89 fe mov %rdi,%r14 // Save 'regs' ffffffff910e70f8: f6 87 88 00 00 00 03 testb $0x3,0x88(%rdi) // return to kernel mode ffffffff910e70ff: 74 4c je ffffffff910e714d <irqentry_exit+0x6d> // branch taken ... ffffffff910e714d: 89 f3 mov %esi,%ebx // Save 'state' ffffffff910e714f: 90 nop ffffffff910e7150: 0f b6 f3 movzbl %bl,%esi ffffffff910e7153: 4c 89 f7 mov %r14,%rdi ffffffff910e7156: e8 e5 00 cc f0 call ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt> ffffffff910e715b: 90 nop Again. 'state' is a pure register value, which is handed to irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). But KMSAN magically associates a memory access which it and then claims it belongs to a SKB which was allocated in the interrupted code. What Mark's change actually does is to make the register value 'state' observable in an instrumented function, while before that 'state' was always confined in the non instrumentable code. But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() is a pure register value, which could be even a constant supplied by the caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. I just built the same .config with clang-19 and it is hallucinating in the same way, though the actual assembly is more readable than the syzbot clang-21 variant: 0000000000002710 <irqentry_exit_to_kernel_mode_preempt>: 2710: 55 push %rbp 2711: 48 89 e5 mov %rsp,%rbp 2714: 41 57 push %r15 2716: 41 56 push %r14 2718: 41 55 push %r13 271a: 41 54 push %r12 271c: 53 push %rbx 271d: 89 f3 mov %esi,%ebx // save 'state' in EBX 271f: 49 89 ff mov %rdi,%r15 // save 'regs' in R15 2722: e8 00 00 00 00 call 2727 <irqentry_exit_to_kernel_mode_preempt+0x17> 2723: R_X86_64_PLT32 __msan_get_context_state-0x4 2727: 44 0f b6 60 08 movzbl 0x8(%rax),%r12d // Magic KMSAN data related to 'state' 272c: 44 8b b0 90 0c 00 00 mov 0xc90(%rax),%r14d // Magic KMSAN data related to 'regs::eflags' 2733: 4d 8b af 90 00 00 00 mov 0x90(%r15),%r13 // Read the actual regs->eflags 273a: 49 81 c7 90 00 00 00 add $0x90,%r15 // Adjust pointer to regs::eflags 2741: 4c 89 ff mov %r15,%rdi 2744: e8 00 00 00 00 call 2749 <irqentry_exit_to_kernel_mode_preempt+0x39> 2745: R_X86_64_PLT32 __msan_metadata_ptr_for_load_8-0x4 2749: 48 8b 00 mov (%rax),%rax // Read the KMSAN state 274c: 25 00 02 00 00 and $0x200,%eax // Isolate bit 9 (IF?) 2751: 41 81 e5 00 02 00 00 and $0x200,%r13d // Isolate IF in the actual value 2758: 48 85 c0 test %rax,%rax // Test whether the KMSAN IF state is 0 275b: 74 07 je 2764 <irqentry_exit_to_kernel_mode_preempt+0x54> // Not zero 275d: f7 d0 not %eax // negate the KMSAN state 275f: 44 21 e8 and %r13d,%eax // And the actual value (IF isolated) // If zero, goto MSAN warning 2762: 74 25 je 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2764: 4d 85 ed test %r13,%r13 // Check whether IF is set // If not, return 2767: 74 10 je 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2769: 41 f6 c4 01 test $0x1,%r12b // Check the magic KMSAN state (related to 'state') // If not zero, goto MSAN warning 276d: 75 28 jne 2797 <irqentry_exit_to_kernel_mode_preempt+0x87> 276f: f6 c3 01 test $0x1,%bl // Check state.rcu_exit // If not, return 2772: 75 05 jne 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2774: e8 00 00 00 00 call 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2775: R_X86_64_PLT32 __SCT__irqentry_exit_cond_resched-0x4 2779: 5b pop %rbx 277a: 41 5c pop %r12 277c: 41 5d pop %r13 277e: 41 5e pop %r14 2780: 41 5f pop %r15 2782: 5d pop %rbp 2783: 2e e9 00 00 00 00 cs jmp 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2785: R_X86_64_PLT32 __x86_return_thunk-0x4 2789: 8b 3a mov (%rdx),%edi 278b: e8 00 00 00 00 call 2790 <irqentry_exit_to_kernel_mode_preempt+0x80> 278c: R_X86_64_PLT32 __msan_warning-0x4 2790: 4d 85 ed test %r13,%r13 2793: 75 d4 jne 2769 <irqentry_exit_to_kernel_mode_preempt+0x59> 2795: eb e2 jmp 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2797: 44 89 f7 mov %r14d,%edi 279a: e8 00 00 00 00 call 279f <irqentry_exit_to_kernel_mode_preempt+0x8f> 279b: R_X86_64_PLT32 __msan_warning-0x4 279f: f6 c3 01 test $0x1,%bl 27a2: 74 d0 je 2774 <irqentry_exit_to_kernel_mode_preempt+0x64> 27a4: eb d3 jmp 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> The compiler _cannot_ assume anything about the 'state' argument as that's handed in as value in RSI from a completely different compilation unit. Something is wrong in KMSAN/compiler land or do you still believe that you just need to unpoison the non existing memory 'state'? Thanks, tglx ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-11 12:25 ` [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt Thomas Gleixner @ 2026-05-12 9:33 ` Alexander Potapenko 2026-05-12 11:15 ` Mark Rutland 0 siblings, 1 reply; 7+ messages in thread From: Alexander Potapenko @ 2026-05-12 9:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Dmitry Vyukov, Mark Rutland, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > Hi Thomas, thanks for diving into this! > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > and `state`. > > Because there is a lot of non-instrumented code around, we fail to > > initialize these variables. > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > the state. > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > at the same place. > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. My bad. Normally, the compiler would happily move `state` to memory if it is address-taken, and make sure that its shadow is tracked properly. But not in this case, because irqentry_enter_from_kernel_mode() is inlined into a `noinstr` function. <I omit the elaborate assembly analysis here, tipping my hat!> > Again. 'state' is a pure register value, which is handed to > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). There is no notion of a 'pure register value' in C, and the compiler may make arbitrary decisions about whether a particular value is stored on the stack or in the registers. Luckily, KMSAN does not have to know about that, because it works on the LLVM IR level and can track the state of a value regardless of where it is stored. In particular, it normally works for function return values - unless `noinstr` kicks in. > > But KMSAN magically associates a memory access which it and then claims > it belongs to a SKB which was allocated in the interrupted code. > > What Mark's change actually does is to make the register value 'state' > observable in an instrumented function, while before that 'state' was > always confined in the non instrumentable code. Agreed. > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > is a pure register value, which could be even a constant supplied by the > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. I disagree with a general implication that KMSAN has zero business to fiddle with values passed in registers. But I agree we are doing a poor job trying to pull the shadow for `state` out of thin air. > > The compiler _cannot_ assume anything about the 'state' argument as > that's handed in as value in RSI from a completely different compilation > unit. > Again, this only matters because we are calling an instrumented function from a non-instrumented one, otherwise it's perfectly fine to call between compilation units. > Something is wrong in KMSAN/compiler land or do you still believe that > you just need to unpoison the non existing memory 'state'? > When we call an instrumented function from a non-instrumented one, the compiler is doomed to not understand that and to be unable to track the function parameters properly. Exactly because `noinstr` implies no instrumentation whatsoever, the compiler may not add any hints on the caller side that would help the callee understand what's going on - even if KMSAN is able to see this `noinstr` function (which is not always the case). So what we could do is to add annotations manualy on either the caller side or the callee side. We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), making all its inputs initialized. This is the easiest solution, it may introduce false negatives, but we are on a very thin ice anyway, so perhaps doing so is better than dealing with more false positives in the interrupt code. Another option for the callee would be applying `__always_inline`, so that irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. Given that irqentry_exit_to_kernel_mode_after_preempt() is already `__always_inline`, it might be the right thing to do. On the caller side, we could do something creative with instrumentation_begin() and instrumentation_end(). We've had a discussion about that exactly four years ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u , but came to a conclusion that a handful of annotations on the noinstr/instr boundary may do a better job than a solution that doesn't cover all cases. In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in instrumentation_begin(). But it wouldn't solve more complex (yet rare, and non-existing today) cases where two functions are called from an instrumented region, and the first function somehow leaves the argument state poisoned. Do you think it's worth revisiting the instrumentation_begin() approach, or shall we go with one of the compiler attributes instead? Thank you, Alexander -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 9:33 ` Alexander Potapenko @ 2026-05-12 11:15 ` Mark Rutland 2026-05-12 16:24 ` Alexander Potapenko 0 siblings, 1 reply; 7+ messages in thread From: Mark Rutland @ 2026-05-12 11:15 UTC (permalink / raw) To: Alexander Potapenko Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote: > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > > and `state`. > > > Because there is a lot of non-instrumented code around, we fail to > > > initialize these variables. > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > > the state. > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > > at the same place. > > > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. > > My bad. Normally, the compiler would happily move `state` to memory if it is > address-taken, and make sure that its shadow is tracked properly. > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into > a `noinstr` function. > > <I omit the elaborate assembly analysis here, tipping my hat!> > > > Again. 'state' is a pure register value, which is handed to > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). > > There is no notion of a 'pure register value' in C, and the compiler may make > arbitrary decisions about whether a particular value is stored on the stack or > in the registers. > > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR > level and can track the state of a value regardless of where it is stored. > In particular, it normally works for function return values - unless `noinstr` > kicks in. > > > > > But KMSAN magically associates a memory access which it and then claims > > it belongs to a SKB which was allocated in the interrupted code. > > > > What Mark's change actually does is to make the register value 'state' > > observable in an instrumented function, while before that 'state' was > > always confined in the non instrumentable code. > > Agreed. > > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > > is a pure register value, which could be even a constant supplied by the > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. > > I disagree with a general implication that KMSAN has zero business to fiddle > with values passed in registers. But I agree we are doing a poor job trying > to pull the shadow for `state` out of thin air. > > > The compiler _cannot_ assume anything about the 'state' argument as > > that's handed in as value in RSI from a completely different compilation > > unit. > > Again, this only matters because we are calling an instrumented function from > a non-instrumented one, otherwise it's perfectly fine to call between > compilation units. For context, can you explain how this is expected to work across compilation units when the caller and callee *are* instrumented, when an argument is passed in a register? Below you suggest that the caller might add "hints", but it's not clear specifically what this means. > > Something is wrong in KMSAN/compiler land or do you still believe that > > you just need to unpoison the non existing memory 'state'? > > When we call an instrumented function from a non-instrumented one, the compiler > is doomed to not understand that and to be unable to track the function > parameters properly. Exactly because `noinstr` implies no instrumentation > whatsoever, the compiler may not add any hints on the caller side that would > help the callee understand what's going on - even if KMSAN is able to see this > `noinstr` function (which is not always the case). > > So what we could do is to add annotations manualy on either the caller side or > the callee side. Without some understanding of those "hints" you mention, I don't see how we can do that on the caller side. > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > making all its inputs initialized. This is the easiest solution, it may > introduce false negatives, but we are on a very thin ice anyway, so perhaps > doing so is better than dealing with more false positives in the interrupt code. > > Another option for the callee would be applying `__always_inline`, so that > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > `__always_inline`, it might be the right thing to do. We can do that, but this really suggests that there's a fundemantal inability to pass arguments between code which is noinstr and code which isinstrumented with AddressSanitizer, and that's inevitably going to bite us in future. > On the caller side, we could do something creative with instrumentation_begin() > and instrumentation_end(). We've had a discussion about that exactly four years > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > , but came to a conclusion that a handful of annotations on the noinstr/instr > boundary may do a better job than a solution that doesn't cover all cases. That doesn't look general at all, so I am not keen on that. > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > non-existing today) cases where two functions are called from an instrumented > region, and the first function somehow leaves the argument state poisoned. How exactly is kmsan_context_state used? If that's supposed to carry some global or current context, surely blatting that in entry code will affect the code that was interrupted? I see kmsan_get_context() has an in_task() check, but that can't help with nested exceptions, so this doesn't look right at all. > Do you think it's worth revisiting the instrumentation_begin() approach, or > shall we go with one of the compiler attributes instead? I think we need a better understanding of this first. It looks to me that there are bigger problems here. Mark. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 11:15 ` Mark Rutland @ 2026-05-12 16:24 ` Alexander Potapenko 2026-05-12 17:46 ` Mark Rutland 2026-05-13 0:36 ` Thomas Gleixner 0 siblings, 2 replies; 7+ messages in thread From: Alexander Potapenko @ 2026-05-12 16:24 UTC (permalink / raw) To: Mark Rutland Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote: > > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > > > and `state`. > > > > Because there is a lot of non-instrumented code around, we fail to > > > > initialize these variables. > > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > > > the state. > > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > > > at the same place. > > > > > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. > > > > My bad. Normally, the compiler would happily move `state` to memory if it is > > address-taken, and make sure that its shadow is tracked properly. > > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into > > a `noinstr` function. > > > > <I omit the elaborate assembly analysis here, tipping my hat!> > > > > > Again. 'state' is a pure register value, which is handed to > > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). > > > > There is no notion of a 'pure register value' in C, and the compiler may make > > arbitrary decisions about whether a particular value is stored on the stack or > > in the registers. > > > > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR > > level and can track the state of a value regardless of where it is stored. > > In particular, it normally works for function return values - unless `noinstr` > > kicks in. > > > > > > > > But KMSAN magically associates a memory access which it and then claims > > > it belongs to a SKB which was allocated in the interrupted code. > > > > > > What Mark's change actually does is to make the register value 'state' > > > observable in an instrumented function, while before that 'state' was > > > always confined in the non instrumentable code. > > > > Agreed. > > > > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > > > is a pure register value, which could be even a constant supplied by the > > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. > > > > I disagree with a general implication that KMSAN has zero business to fiddle > > with values passed in registers. But I agree we are doing a poor job trying > > to pull the shadow for `state` out of thin air. > > > > > The compiler _cannot_ assume anything about the 'state' argument as > > > that's handed in as value in RSI from a completely different compilation > > > unit. > > > > Again, this only matters because we are calling an instrumented function from > > a non-instrumented one, otherwise it's perfectly fine to call between > > compilation units. > > For context, can you explain how this is expected to work across > compilation units when the caller and callee *are* instrumented, when an > argument is passed in a register? Per KMSAN ABI, the metadata for the function arguments is passed via buffers in the so-called context state (see include/linux/kmsan_types.h) In the userspace, these buffers are thread-local variables referenced by inline loads and stores. In the kernel space, the compiler inserts a call __msan_get_context_state() at the beginning of every function, and then the instrumentation code uses whatever that function returned. Assume we have a function: int sum(int a, int b) { result = a + b; return result; } Its instrumented version looks roughly as follows (we'll omit origin tracking for simplicity): int sum(int a, int b) { struct kmsan_context_state *kcs = __msan_get_context_state(); int s_a = ((int)kcs->param_tls)[0]; // shadow of a int s_b = ((int)kcs->param_tls)[1]; // shadow of b result = a + b; s_result = s_a | s_b; ((int)kcs->retval_tls)[0] = s_result; // returned shadow return result; } Most certainly `a` and `b` will be passed using registers, but that doesn't matter: their metadata is safe as long as the caller does: ((int)kcs->param_tls)[0] = s_a; ((int)kcs->param_tls)[1] = s_b; result = sum(a, b); s_result = ((int)kcs->retval_tls)[0]; All the metadata tracking is implemented using an LLVM IR pass, which ensures that the shadow for each uninitialized value is tracked regardless of where that value is stored - be that heap memory, stack or registers: - when a value is created or loaded from memory, the compiler inserts SSA registers corresponding to that value's shadow; - when a value is written to memory, a shadow store is inserted; - when a value is used in an operation, the result of that operation receives a shadow value depending on the operands and their shadow values; - when a value is passed to a function, the corresponding context state stores and loads are emitted. Now, this works fine under the assumption that all code in the kernel is instrumented with KMSAN. However this is not true. 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some prevent infinite recursion caused by KMSAN calling instrumented code; others disable instrumentation of the code that executes before KMSAN is fully set up. 2. There are annotations to soft-disable KMSAN for a particular function by adding a `__no_kmsan_checks` attribute. There will still be instrumentation, but all the function outputs (stores and returns) will be initialized, and no errors will be reported. This is handy when the compiler cannot properly instrument the underlying code. One example would be tricky inline assembly, another one - stack walking functions that deliberately read and write uninitialized values. 3. There are cases in which KMSAN must be disabled for the whole function (`__no_sanitize_memory`). We disable instrumentation for `noinstr` functions. Additionally, on x86 there is exactly one case where this is done to avoid infinite recursion when storing the origins. It's important to note that, due to how function attributes are implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will be applied to a function if that function exists during the KMSAN instrumentation pass. If it was inlined before that pass, the compiler will apply the instrumentation rules for whichever function is calling the inlined one. Now, because of the described ABI, calls between instrumented functions and/or functions marked `__no_kmsan_checks` are perfectly fine because the function arguments are stored and loaded on both ends. What does not work well is calling non-instrumented functions from instrumented functions, and vice versa. In the former case, KMSAN will not see the callee's side effects (return values or memory stores), in the latter case, the callee may receive incorrect shadow values for the function parameters or memory stores in the caller. Both will We have few tools to deal with such cases. It often helps to move the border between the instrumented and non-instrumented code by applying __no_kmsan_checks or __no_sanitize_memory until we get to a point where there are no incorrect shadow arguments. Another way to deal with uninitialized data coming from non-instrumented code is kmsan_unpoison_memory(address, size). Unfortunately, as Thomas pointed out, we can't use it for locals in non-instrumented code. > Below you suggest that the caller might add "hints", but it's not clear > specifically what this means. > > > > Something is wrong in KMSAN/compiler land or do you still believe that > > > you just need to unpoison the non existing memory 'state'? > > > > When we call an instrumented function from a non-instrumented one, the compiler > > is doomed to not understand that and to be unable to track the function > > parameters properly. Exactly because `noinstr` implies no instrumentation > > whatsoever, the compiler may not add any hints on the caller side that would > > help the callee understand what's going on - even if KMSAN is able to see this > > `noinstr` function (which is not always the case). > > > > So what we could do is to add annotations manualy on either the caller side or > > the callee side. > > Without some understanding of those "hints" you mention, I don't see how > we can do that on the caller side. In this case by "hints" I meant any kind of instrumentation that the compiler could have inserted automatically to mark the data in the instrumented functions as initialized. There are no such "hints" right now (apart from letting KMSAN instrument the function). As for the manual annotations, we only have the mentioned `__no_kmsan_checks`, `__no_sanitize_memory`, and kmsan_unpoison_memory(). > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > making all its inputs initialized. This is the easiest solution, it may > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > doing so is better than dealing with more false positives in the interrupt code. > > > > Another option for the callee would be applying `__always_inline`, so that > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > `__always_inline`, it might be the right thing to do. > > We can do that, but this really suggests that there's a fundemantal > inability to pass arguments between code which is noinstr and code which > isinstrumented with AddressSanitizer, and that's inevitably going to > bite us in future. This is true (assuming you mean MemorySanitizer), but: - I don't think we can avoid that, given that there will always be non-instrumented code; - so far the number of annotations has been manageable. > > > On the caller side, we could do something creative with instrumentation_begin() > > and instrumentation_end(). We've had a discussion about that exactly four years > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > > , but came to a conclusion that a handful of annotations on the noinstr/instr > > boundary may do a better job than a solution that doesn't cover all cases. > > That doesn't look general at all, so I am not keen on that. Ack. > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > > non-existing today) cases where two functions are called from an instrumented > > region, and the first function somehow leaves the argument state poisoned. > > How exactly is kmsan_context_state used? Hope the explanation above helps. > > If that's supposed to carry some global or current context, surely > blatting that in entry code will affect the code that was interrupted? It will mostly affect the bottom-level function called by the entry code, after that, nested functions will be just passing their parameters as per KMSAN ABI. > I see kmsan_get_context() has an in_task() check, but that can't help > with nested exceptions, so this doesn't look right at all. KMSAN context is per-task, and if !in_task(), it is per-CPU. For the nested exceptions, we conservatively bail out (see kmsan_in_runtime()) to avoid deadlocking. This may cause false negatives. > > Do you think it's worth revisiting the instrumentation_begin() approach, or > > shall we go with one of the compiler attributes instead? > I think we need a better understanding of this first. > > It looks to me that there are bigger problems here. > I'll be happy to discuss this further. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 16:24 ` Alexander Potapenko @ 2026-05-12 17:46 ` Mark Rutland 2026-05-13 0:36 ` Thomas Gleixner 1 sibling, 0 replies; 7+ messages in thread From: Mark Rutland @ 2026-05-12 17:46 UTC (permalink / raw) To: Alexander Potapenko Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie Hi, Thanks for this; the explanation of the KMSAN ABI is *very* helpful. I have a few questions and comments inline below. On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > For context, can you explain how this is expected to work across > > compilation units when the caller and callee *are* instrumented, when an > > argument is passed in a register? > > Per KMSAN ABI, the metadata for the function arguments is passed via > buffers in the so-called context state (see > include/linux/kmsan_types.h) > In the userspace, these buffers are thread-local variables referenced > by inline loads and stores. > In the kernel space, the compiler inserts a call > __msan_get_context_state() at the beginning of every function, and > then the instrumentation code uses whatever that function returned. > > Assume we have a function: > > int sum(int a, int b) { > result = a + b; > return result; > } > > Its instrumented version looks roughly as follows (we'll omit origin > tracking for simplicity): > > int sum(int a, int b) { > struct kmsan_context_state *kcs = __msan_get_context_state(); > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > result = a + b; > s_result = s_a | s_b; > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > return result; > } > > Most certainly `a` and `b` will be passed using registers, but that > doesn't matter: their metadata is safe as long as the caller does: > > ((int)kcs->param_tls)[0] = s_a; > ((int)kcs->param_tls)[1] = s_b; > result = sum(a, b); > s_result = ((int)kcs->retval_tls)[0]; Ok. I see how that works when both caller and callee are instrumented. I assume that's tracked for all arguments regardless of type? e.g. that includes pointers, structs, etc? > All the metadata tracking is implemented using an LLVM IR pass, which > ensures that the shadow for each uninitialized value is tracked > regardless of where that value is stored - be that heap memory, stack > or registers: > - when a value is created or loaded from memory, the compiler inserts > SSA registers corresponding to that value's shadow; > - when a value is written to memory, a shadow store is inserted; > - when a value is used in an operation, the result of that operation > receives a shadow value depending on the operands and their shadow > values; > - when a value is passed to a function, the corresponding context > state stores and loads are emitted. > > Now, this works fine under the assumption that all code in the kernel > is instrumented with KMSAN. > However this is not true. > > 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some > prevent infinite recursion caused by KMSAN calling instrumented code; > others disable instrumentation of the code that executes before KMSAN > is fully set up. > 2. There are annotations to soft-disable KMSAN for a particular > function by adding a `__no_kmsan_checks` attribute. There will still > be instrumentation, but all the function outputs (stores and returns) > will be initialized, and no errors will be reported. > This is handy when the compiler cannot properly instrument the > underlying code. One example would be tricky inline assembly, another > one - stack walking functions that deliberately read and write > uninitialized values. > 3. There are cases in which KMSAN must be disabled for the whole > function (`__no_sanitize_memory`). > We disable instrumentation for `noinstr` functions. Additionally, on > x86 there is exactly one case where this is done to avoid infinite > recursion when storing the origins. In addition to the above cases, out-of-line assembly functions won't manipulate the shadow (so they're effectively the same as noinstr). > It's important to note that, due to how function attributes are > implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will > be applied to a function if that function exists during the KMSAN > instrumentation pass. If it was inlined before that pass, the compiler > will apply the instrumentation rules for whichever function is calling > the inlined one. > > Now, because of the described ABI, calls between instrumented > functions and/or functions marked `__no_kmsan_checks` are perfectly > fine because the function arguments are stored and loaded on both > ends. > What does not work well is calling non-instrumented functions from > instrumented functions, and vice versa. Yep; understood. > In the former case, KMSAN will not see the callee's side effects > (return values or memory stores), in the latter case, the callee may > receive incorrect shadow values for the function parameters or memory > stores in the caller. Both will Incomplete sentence here? > We have few tools to deal with such cases. It often helps to move the > border between the instrumented and non-instrumented code by applying > __no_kmsan_checks or __no_sanitize_memory until we get to a point > where there are no incorrect shadow arguments. > Another way to deal with uninitialized data coming from > non-instrumented code is kmsan_unpoison_memory(address, size). > Unfortunately, as Thomas pointed out, we can't use it for locals in > non-instrumented code. It's clearly not sufficient to use kmsan_unpoison_memory() for locals, and I think it's non-sensical because conceptually they aren't memory. I don't think kmsan_unpoison_entry_regs() alone is sufficient for the regs. Surely the pointer argument itself needs to be unpoisoned before being passed to a callee? Otherwise, when pointer to regs is passed as an argument to a callee, the callee will consume stale shadow, which might indicate that the pointer to regs is uninitialized? That seems to be exactly what we're hitting with the irq entry state, so I'm surprised we're not hitting it for the regs. Is that sheer luck, or is something else protecting us? > > Below you suggest that the caller might add "hints", but it's not clear > > specifically what this means. > > > > > > Something is wrong in KMSAN/compiler land or do you still believe that > > > > you just need to unpoison the non existing memory 'state'? > > > > > > When we call an instrumented function from a non-instrumented one, the compiler > > > is doomed to not understand that and to be unable to track the function > > > parameters properly. Exactly because `noinstr` implies no instrumentation > > > whatsoever, the compiler may not add any hints on the caller side that would > > > help the callee understand what's going on - even if KMSAN is able to see this > > > `noinstr` function (which is not always the case). > > > > > > So what we could do is to add annotations manualy on either the caller side or > > > the callee side. > > > > Without some understanding of those "hints" you mention, I don't see how > > we can do that on the caller side. > > In this case by "hints" I meant any kind of instrumentation that the > compiler could have inserted automatically to mark the data in the > instrumented functions as initialized. > There are no such "hints" right now (apart from letting KMSAN > instrument the function). > As for the manual annotations, we only have the mentioned > `__no_kmsan_checks`, `__no_sanitize_memory`, and > kmsan_unpoison_memory(). Ok. For noinstr code calling instrumented code, I see that we could explicitly modify the argument shadow in the noinstr caller, but that's *really* grotty, and I don't think we want to do that. > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > > making all its inputs initialized. This is the easiest solution, it may > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > > doing so is better than dealing with more false positives in the interrupt code. > > > > > > Another option for the callee would be applying `__always_inline`, so that > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > > `__always_inline`, it might be the right thing to do. > > > > We can do that, but this really suggests that there's a fundemantal > > inability to pass arguments between code which is noinstr and code which > > isinstrumented with AddressSanitizer, and that's inevitably going to > > bite us in future. > > This is true (assuming you mean MemorySanitizer), but: Sorry, yes, I meant MemorySanitizer. > - I don't think we can avoid that, given that there will always be > non-instrumented code; > - so far the number of annotations has been manageable. As above, I suspect that we're actually missing many necessary annotations and getting away with this by accident. I suspect that we need additional compiler support to say something like "assume this argument/return value IS initialized". > > > On the caller side, we could do something creative with instrumentation_begin() > > > and instrumentation_end(). We've had a discussion about that exactly four years > > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > > > , but came to a conclusion that a handful of annotations on the noinstr/instr > > > boundary may do a better job than a solution that doesn't cover all cases. > > > > That doesn't look general at all, so I am not keen on that. > > Ack. > > > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > > > non-existing today) cases where two functions are called from an instrumented > > > region, and the first function somehow leaves the argument state poisoned. > > > > How exactly is kmsan_context_state used? > > Hope the explanation above helps. > > > > If that's supposed to carry some global or current context, surely > > blatting that in entry code will affect the code that was interrupted? > > It will mostly affect the bottom-level function called by the entry > code, after that, nested functions will be just passing their > parameters as per KMSAN ABI. The problem I was explaining was the other way around: we clobber the context of the function that was executing *before* exception entry. Consider when the CPU is executing some function foo(), and an exception is taken. Within the exception handler, code will clobber the context state. AFAICT, that state is not saved/restored, and nor is it zeroed prior to exception return. The exception handler returns back to the middle of foo(). From foo's PoV, the context state has been clobbered arbitrarily, and that clobbered state could trigger false positives or false negatives. The way kmsan_get_context() uses in_task() will avoid that in *some* cases, but not *all* cases. In particular, I don't think that's going to help when a fault is taken for a uaccess. Maybe I'm missing something here? > > I see kmsan_get_context() has an in_task() check, but that can't help > > with nested exceptions, so this doesn't look right at all. > > KMSAN context is per-task, and if !in_task(), it is per-CPU. > > For the nested exceptions, we conservatively bail out (see > kmsan_in_runtime()) to avoid deadlocking. > This may cause false negatives. AFAICT that doesn't address the case I've described above. Mark. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 16:24 ` Alexander Potapenko 2026-05-12 17:46 ` Mark Rutland @ 2026-05-13 0:36 ` Thomas Gleixner 2026-05-13 7:54 ` Peter Zijlstra 1 sibling, 1 reply; 7+ messages in thread From: Thomas Gleixner @ 2026-05-13 0:36 UTC (permalink / raw) To: Alexander Potapenko, Mark Rutland Cc: Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie Alexander! On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: >> > Again, this only matters because we are calling an instrumented function from >> > a non-instrumented one, otherwise it's perfectly fine to call between >> > compilation units. >> >> For context, can you explain how this is expected to work across >> compilation units when the caller and callee *are* instrumented, when an >> argument is passed in a register? > > Per KMSAN ABI, the metadata for the function arguments is passed via > buffers in the so-called context state (see > include/linux/kmsan_types.h) > In the userspace, these buffers are thread-local variables referenced > by inline loads and stores. > In the kernel space, the compiler inserts a call > __msan_get_context_state() at the beginning of every function, and > then the instrumentation code uses whatever that function returned. > > Assume we have a function: > > int sum(int a, int b) { > result = a + b; > return result; > } > > Its instrumented version looks roughly as follows (we'll omit origin > tracking for simplicity): > > int sum(int a, int b) { > struct kmsan_context_state *kcs = __msan_get_context_state(); > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > result = a + b; > s_result = s_a | s_b; > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > return result; > } > > Most certainly `a` and `b` will be passed using registers, but that > doesn't matter: their metadata is safe as long as the caller does: > > ((int)kcs->param_tls)[0] = s_a; > ((int)kcs->param_tls)[1] = s_b; > result = sum(a, b); > s_result = ((int)kcs->retval_tls)[0]; Thanks for the explanation. TBH, I'm failing to see how this is not inSANe, but I might be missing something obvious. What really bothers me is the lack of distinction between arguments passed by value vs. arguments passed by reference. For me there is a clear distinction. Any argument passed by reference, i.e. pointer, obviously requires to be validated at the actual usage site, which might be several levels deep into the call chain. That obviously does not include by value arguments which are passed on the stack due to exceeding the register passing constraints, But for arguments passed by value, I fail to see the justification. The compiler can validate at the call site that the value arguments are properly initialized both at compile and at runtime: 1) If the argument value is a constant it is initialized 2) If the argument value is read from memory at the call site, then the function which reads has to validate that the memory is initialized before reading it. 3) If the argument value was returned from a previous function call then that invoked function has to make sure that the returned value is properly initialized. 4) If the argument is a local variable which is not initialized that's catched at compile time with and without KMSAN. If people ignore that warning, then they will ignore the resulting KMSAN splat too. 5) If the argument is only propagated to the next call, then #1 - #4 apply to the caller, i.e. all of that propagates through the call chain. 6) If the argument is handed in from the previous caller and modified by the function which calls the next one, then still #1 - #4 apply. IOW, the producer of a value argument has to ensure that the value argument is properly initialized. This whole 'is initialized' propagation for by value arguments is overengineered voodoo in my opinion. Why? If the compiler can't ensure at the producer site that something is properly initialized and then can't ensure that the intermediates handing the value argument down the callchain are doing the right thing, then KMSAN won't help either as it is then by definition equally untrustworthy as the untrustworthy compiler which emitted the KMSAN self-validation along with the broken code. So it just adds overhead for nothing and makes everything look "sane" while in fact it provides no value and creates exactly the problems we are debating right now. No? > All the metadata tracking is implemented using an LLVM IR pass, which > ensures that the shadow for each uninitialized value is tracked > regardless of where that value is stored - be that heap memory, stack > or registers: > - when a value is created or loaded from memory, the compiler inserts > SSA registers corresponding to that value's shadow; > - when a value is written to memory, a shadow store is inserted; > - when a value is used in an operation, the result of that operation > receives a shadow value depending on the operands and their shadow > values; > - when a value is passed to a function, the corresponding context > state stores and loads are emitted. > > Now, this works fine under the assumption that all code in the kernel > is instrumented with KMSAN. > However this is not true. We know that since KMSAN got integrated and you pointed yourself to that discussion which we had back then about handling the @regs pointer argument. There was a brief mentioning of the same problem about non-pointer arguments, but that dried out because the test runs did not barf anymore after the @regs unpoisoning got fixed. There were discussions about utilizing instrumentation_begin()/end() to tell the compiler that this instrumentation_begin() lifts the 'do not sanitize directive' and it could rightfully inject instrumentation code there including calls, but obviously nothing happened. What really worries me more than this particular problem at hand is that we have tons of places which invoke instrumented functions from non-instrumented code with arguments originating from the non-instrumented code. Here is an example: DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); } #define DEFINE_IDTENTRY_ERRORCODE(func) \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code); \ \ __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ __##func (regs, error_code); \ instrumentation_end(); \ irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code) @error_code comes all the way from the ASM entry code. The function which gets inlined within the instrumentation_begin()/end() region in the non-instrumented exc_invalid_tss() is: static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); } Anything else than @regs (already unpoisoned) and @error_code is compile time constant. do_error_trap() is in the same compilation unit, static and instrumentable: static void do_error_trap(struct pt_regs *regs, long error_code, char *str, unsigned long trapnr, int signr, int sicode, void __user *addr) { RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) != NOTIFY_STOP) { cond_local_irq_enable(regs); do_trap(trapnr, signr, str, regs, error_code, sicode, addr); cond_local_irq_disable(regs); } } That should suffer from the very same problem vs. _all_ arguments except @regs, right? But no, it does not. The same compilers which emit imaginary initialized checks for something they out of lined themself despite knowing that the only calling context is not instrumentable, ignore the very same for do_error_trap(). According to the disassembly do_error_trap() just goes and invokes notify_die() without any checks and notify_die() takes the same (reordered) arguments uninspected and stores them into a on stack data struct which is properly poisened first and then unpoisoned for each member stored. That's either insanely inconsistent or I've missed something in the resulting ASM maze. But I'm pretty sure that I did not, so I spare you the ASM analysis as you surely are able to look at the resulting inconsistent assembly code yourself (clang-19 debian based). >> > So what we could do is to add annotations manualy on either the caller side or >> > the callee side. >> >> Without some understanding of those "hints" you mention, I don't see how >> we can do that on the caller side. > > In this case by "hints" I meant any kind of instrumentation that the > compiler could have inserted automatically to mark the data in the > instrumented functions as initialized. > There are no such "hints" right now (apart from letting KMSAN > instrument the function). > As for the manual annotations, we only have the mentioned > `__no_kmsan_checks`, `__no_sanitize_memory`, and > kmsan_unpoison_memory(). Which is a sad state of affairs. These hints have been debated four years ago for the very same reasons and nothing happened since then other than the very same compiler becoming more agressive about uninling. >> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), >> > making all its inputs initialized. This is the easiest solution, it may >> > introduce false negatives, but we are on a very thin ice anyway, so perhaps >> > doing so is better than dealing with more false positives in the interrupt code. >> > >> > Another option for the callee would be applying `__always_inline`, so that >> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. >> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already >> > `__always_inline`, it might be the right thing to do. >> > >> We can do that, but this really suggests that there's a fundemantal >> inability to pass arguments between code which is noinstr and code which >> isinstrumented with AddressSanitizer, and that's inevitably going to >> bite us in future. For the problem at hand I really want to go and mark the inline __always_inline so it's gone for now. I hate that "solution" with a passion as it just papers over the underlying problem and I definitely agree with Mark that this can only to be considered to be a temporary band aid and is going to bite use over and over again. As I pointed out with exc_invalid_tss() this just works by chance and not by design and there are far more places than that to illustrate it. > This is true (assuming you mean MemorySanitizer), but: > - I don't think we can avoid that, given that there will always be > non-instrumented code; > - so far the number of annotations has been manageable. I agree that it's unavoidable, but I don't agree with the second sentiment unless you meant: "Has been manageable by sheer luck". What I said in that four year old discussion you pointed to earlier "I'm all for making use of advanced instrumentation, validation and debugging features, but this mindset of 'make the code comply to what the tool of today provides' is fundamentally wrong. Tools have to provide value to the programmer and not the other way round. Yes, it's more work on the tooling side, but the tooling side is mostly a one time effort while chasing the false positives is a long term nightmare." still applies and while we might paper over the underlying problem for now to keep the CI robots happy, we really have to come up with something which is reliable and at the same time provides the maximum coverage for instrumentation. After reading back on that old thread I still think that we should stick some hint for the compiler into instrumentation_begin()/end(). These macros exist to provide objtool a way to validate that non-instrumentable code does not accidentaly call out into instrumentable code without someone having done the reasoning why it is safe. But at the same time these sections obviously can tell the compiler that the by value arguments handed to any function invoked within that section have to be considered valid. I deliberately restricted this to by value arguments because by reference arguments need to be explicitly covered as we do today for @regs (also see above). So the compiler can either emit the required KMSAN magic right before the call or create a wrapper function, which does the required context initialization. IIRC, this has been debated four years ago already in some form. But now four years down the road that obviously does not solve the problem because nothing happened and we have to deal with existing compilers and the only thing we can do until compiler people get their act together is work around this insanity in kernel code again. Why does TCMalloc come to my mind right now? Ranted enough. Here is what I came up with which seems to be "workable" after throwing at least a dozen other "brilliant and trivial" ideas out: instrumentation_begin(); - func(arg1, arg2); + call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2); instrumentation_end(); and let call_instr() inject the necessary KMSAN fixups before invoking @func. That would probably do some useless stuff when @func is an empty stub, a __always_inline function or subject to arbitrary inlining decisions of the compiler. But that's not the end of the world and the conversion can be mostly done by coccinelle or your favorite code conversion tool of the day. Something insane like this: #define KMSAN_FIXUP_VA_1(insane, type, arg) insane(type, arg) #define KMSAN_FIXUP_VA_2(insane, arg, ...) KMSAN_FIXUP_VA_1(insane, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_3(insane, type, arg, ...) insane(type, arg), KMSAN_FIXUP_VA_2(insane, __VA_ARGS__) #define KMSAN_FIXUP_VA_4(insane, arg, ...) KMSAN_FIXUP_VA_3(insane, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_FOR_EACH(insane, ...) \ CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_ARGS__) #define KMSAN_VA_ARG_2(a1, a2) (a2) #define KMSAN_VA_ARG_4(a1, a2, a3, a4) (a2), (a4) #define VA_CALL(f, ...) \ f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)) #define kmsan_fixup_arg_int(arg) __kmsan_fixup_arg_4(&(arg)) #define KMSAN_FIXUP_ARG(type, arg) \ CONCATENATE(kmsan_fixup_arg_, type)(arg) #define call_instr(f, ...) \ do { \ VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__); \ VA_CALL(f, __VA_ARGS__); \ } while (0) That's obiously restricted to _two_ arguments and _one_ data type for illustration purposes, but making it work for the limited number of arguments and data types shouldn't be rocket science.. As a bonus it will fail to compile immediately for unknown data types and argument numbers. It preserves type safety by not playing silly casting games with the arguments on the way. Not that I'm a fan of this, but that's at least better than playing whack a mole with the insanities of toolchains forever. Thoughts? Thanks, tglx ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-13 0:36 ` Thomas Gleixner @ 2026-05-13 7:54 ` Peter Zijlstra 0 siblings, 0 replies; 7+ messages in thread From: Peter Zijlstra @ 2026-05-13 7:54 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexander Potapenko, Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, syzkaller-bugs, ruanjinjie On Wed, May 13, 2026 at 02:36:10AM +0200, Thomas Gleixner wrote: > But now four years down the road that obviously does not solve the > problem because nothing happened and we have to deal with existing > compilers and the only thing we can do until compiler people get their > act together is work around this insanity in kernel code again. > Thoughts? KMSAN is debug cruft. I don't see why we cannot demand they fix it and make KMSAN depend whatever clang will have the thing fixed. There is no reason we need to support existing compilers what so ever for debug code. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-13 7:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <69e7ee1f.a00a0220.17a17.001d.GAE@google.com>
[not found] ` <CACT4Y+b5wU2wHfugXH0tjGyLS1ibTjj6KibXMMhKN59rXRuafw@mail.gmail.com>
[not found] ` <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com>
2026-05-11 12:25 ` [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt Thomas Gleixner
2026-05-12 9:33 ` Alexander Potapenko
2026-05-12 11:15 ` Mark Rutland
2026-05-12 16:24 ` Alexander Potapenko
2026-05-12 17:46 ` Mark Rutland
2026-05-13 0:36 ` Thomas Gleixner
2026-05-13 7:54 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox