Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode

The Linux Kernel Mailing List
 help / color / mirror / Atom feed

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
       [not found]   ` <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com>
@ 2026-05-11 12:25     ` Thomas Gleixner
  2026-05-12  9:33       ` Alexander Potapenko
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2026-05-11 12:25 UTC (permalink / raw)
  To: Alexander Potapenko, Dmitry Vyukov, Mark Rutland
  Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs,
	ruanjinjie

On Wed, Apr 29 2026 at 10:09, Alexander Potapenko wrote:
> On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> I believe Mark's recent changes introduced this.

I don't believe any of this. I prefer facts.

> irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> and `state`.
> Because there is a lot of non-instrumented code around, we fail to
> initialize these variables.
> Instead, irqentry_enter_from_kernel_mode() explicitly calls
> kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> the state.
> We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> at the same place.

Good luck with unpoisoning 'state'. 'state' is not memory to begin with.

'state' is composed in irqentry_enter_from_kernel_mode() and returned
from there to the caller as value in RAX. That returned value is then
provided as function argument to irqentry_exit() and subsequently to the
inlines invoked from there. See the deeper ASM analysis below.

Aside of that the whole splat makes no sense at all due to this:

>> > Uninit was created at:
>> >  slab_post_alloc_hook mm/slub.c:4576 [inline]
>> >  slab_alloc_node mm/slub.c:4898 [inline]
>> >  __do_kmalloc_node mm/slub.c:5294 [inline]
>> >  __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403
>> >  kmalloc_reserve net/core/skbuff.c:635 [inline]
>> >  __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713
>> >  alloc_skb include/linux/skbuff.h:1383 [inline]
>> >  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline]
>> >  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>> >  nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922

How can irqentry_exit_to_kernel_mode_preempt() "touch" an uninitialized
SKB which was allocated by the interrupted code, when it only evaluates
regs->eflags and the rcu_exit portion of the register supplied 'state'
argument?

>> > =====================================================
>> > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>> >  irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>> >  irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline]
>> >  irqentry_exit+0x7b/0x760 kernel/entry/common.c:164
>> >  sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061
>> >  asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697
>> >  __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210
>> >  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline]
>> >  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>> >  nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922

So I got the vmlinux and disassembled the related code parts:

ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>:
ffffffff81da7240:       55                      push   %rbp
ffffffff81da7241:       48 89 e5                mov    %rsp,%rbp
ffffffff81da7244:       41 57                   push   %r15
ffffffff81da7246:       41 56                   push   %r14
ffffffff81da7248:       41 55                   push   %r13
ffffffff81da724a:       41 54                   push   %r12
ffffffff81da724c:       53                      push   %rbx
ffffffff81da724d:       41 89 f6                mov    %esi,%r14d	// Save 'state' in r14
ffffffff81da7250:       49 89 ff                mov    %rdi,%r15        // Save 'regs' in r15
ffffffff81da7253:       e8 08 84 d4 00          call   ffffffff82aef660 <__msan_get_context_state>
ffffffff81da7258:       44 0f b6 68 08          movzbl 0x8(%rax),%r13d  // Read magic value into r13d (related to 'state')
ffffffff81da725d:       8b 98 90 0c 00 00       mov    0xc90(%rax),%ebx // Read magic value into ebx (related to 'regs')

ffffffff81da7263:       4d 8b a7 90 00 00 00    mov    0x90(%r15),%r12  // Read regs->eflags into r12

ffffffff81da726a:       49 81 c7 90 00 00 00    add    $0x90,%r15	// Add eflags offset to r15
ffffffff81da7271:       4c 89 ff                mov    %r15,%rdi	// MSAN check for accessing regs->eflags
ffffffff81da7274:       e8 b7 78 d4 00          call   ffffffff82aeeb30 <__msan_metadata_ptr_for_load_8>
ffffffff81da7279:       8b 08                   mov    (%rax),%ecx	// Read magic data from the address returned by msan
ffffffff81da727b:       89 c8                   mov    %ecx,%eax	// Transfer result to eax
ffffffff81da727d:       f7 d0                   not    %eax		// One's complement negate eax
ffffffff81da727f:       41 81 e4 00 02 00 00    and    $0x200,%r12d	// Isolate eflags::IF (bit 9)

ffffffff81da7286:       c1 e9 09                shr    $0x9,%ecx	// Right shift the magic value by 9 (IF into bit 0)

ffffffff81da7289:       0d ff fd ff ff          or     $0xfffffdff,%eax	// Set all bits in eax except bit 9
ffffffff81da728e:       41 85 c4                test   %eax,%r12d	// Logical compare between eax and r12d

The result is in ZF: 1 when bit 9 is not set in eax and/or r12d. 0 if bit 9 is set in both.
	
ffffffff81da7291:       0f 94 c0                sete   %al		// If ZF==1 AL=1 otherwise AL=0
ffffffff81da7294:       20 c8                   and    %cl,%al		// And the right shifted bit 9

ffffffff81da7296:       4d 85 e4                test   %r12,%r12	// Check whether r12 is 0 ( IF set or not )
ffffffff81da7299:       41 0f 94 c7             sete   %r15b		// If ZF==1 r15b=1 otherwise r15b=0

ffffffff81da729d:       44 89 e1                mov    %r12d,%ecx	// Transfer the isolated bit 9 into ecx
ffffffff81da72a0:       c1 e9 09                shr    $0x9,%ecx	// Shift it into bit 0
ffffffff81da72a3:       44 20 e9                and    %r13b,%cl	// And the magic context state value

ffffffff81da72a6:       45 08 f7                or     %r14b,%r15b	// Or 'state' on the IF result created above

ffffffff81da72a9:       41 f6 d6                not    %r14b		// One's complement negate 'state'
ffffffff81da72ac:       45 08 ee                or     %r13b,%r14b	// Or the above magic KMSAN value on the negated 'state'

'state' is a register argument value. So how on earth can KMSAN mangle this
with information stored in the KMSAN context? See below.

ffffffff81da72af:       41 0f b6 f6             movzbl %r14b,%esi	// zero extend the result into esi
ffffffff81da72b3:       0f b6 c9                movzbl %cl,%ecx		// zero extend the magic result into ecx
ffffffff81da72b6:       84 c0                   test   %al,%al		// Check whether al is 0 (magic IF result)

ffffffff81da72b8:       0f 45 ce                cmovne %esi,%ecx        // If not, move the magic mangled state result into ecx

This is where the whole thing goes south because it uses the register
state which was wrongfully mangled with the KMSAN state in r13b.

ffffffff81da72bb:       f6 c1 01                test   $0x1,%cl		// Check whether the result is 1
			      			       			// If 1, jump to the kmsan warning section

ffffffff81da72be:       75 1b                   jne    ffffffff81da72db <irqentry_exit_to_kernel_mode_preempt+0x9b>

ffffffff81da72c0:       41 f6 c7 01             test   $0x1,%r15b       // Check the actual condition

i.e.  if (regs_irqs_disabled(regs) || state.exit_rcu)

ffffffff81da72c4:       75 05                   jne    ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b>
ffffffff81da72c6:       e8 9d f1 37 0f          call   ffffffff91126468 <__SCT__irqentry_exit_cond_resched>
ffffffff81da72cb:       5b                      pop    %rbx
ffffffff81da72cc:       41 5c                   pop    %r12
ffffffff81da72ce:       41 5d                   pop    %r13
ffffffff81da72d0:       41 5e                   pop    %r14
ffffffff81da72d2:       41 5f                   pop    %r15
ffffffff81da72d4:       5d                      pop    %rbp
ffffffff81da72d5:       2e e9 d5 e2 37 0f       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>

// KMSAN warning

ffffffff81da72db:       31 c9                   xor    %ecx,%ecx
ffffffff81da72dd:       4d 85 e4                test   %r12,%r12
ffffffff81da72e0:       0f 44 d9                cmove  %ecx,%ebx
ffffffff81da72e3:       84 c0                   test   %al,%al
ffffffff81da72e5:       74 02                   je     ffffffff81da72e9 <irqentry_exit_to_kernel_mode_preempt+0xa9>
ffffffff81da72e7:       8b 1a                   mov    (%rdx),%ebx
ffffffff81da72e9:       89 df                   mov    %ebx,%edi
ffffffff81da72eb:       e8 30 83 d4 00          call   ffffffff82aef620 <__msan_warning>
ffffffff81da72f0:       41 f6 c7 01             test   $0x1,%r15b
ffffffff81da72f4:       74 d0                   je     ffffffff81da72c6 <irqentry_exit_to_kernel_mode_preempt+0x86>
ffffffff81da72f6:       eb d3                   jmp    ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b>

Let's look at the call sites to prove that KMSAN cannot observe 'state'
as a memory access ever.

noinstr sysvec_apic_timer_interrupt(regs)
{
  // Not observable stack or register state
  irqentry_state_t state = irqentry_enter(regs);

  instrumentation_begin();
  __sysvec_apic_timer_interrupt(regs);
  instrumentation_end();

  irqentry_exit(regs, state);
}

Here is the disassembly:

ffffffff910e6840 <sysvec_apic_timer_interrupt>:
ffffffff910e6840:       f3 0f 1e fa             endbr64
ffffffff910e6844:       55                      push   %rbp
ffffffff910e6845:       48 89 e5                mov    %rsp,%rbp
ffffffff910e6848:       41 56                   push   %r14
ffffffff910e684a:       53                      push   %rbx
ffffffff910e684b:       48 89 fb                mov    %rdi,%rbx			       // Saves 'regs' in rbx
ffffffff910e684e:       e8 ad 07 00 00          call   ffffffff910e7000 <irqentry_enter>       // invokes irqentry_enter()
ffffffff910e6853:       41 89 c6                mov    %eax,%r14d			       // Saves the returned 'state' in r14d
ffffffff910e6856:       65 c6 05 a2 d9 7f 04 01 movb   $0x1,%gs:0x47fd9a2(%rip)        # ffffffff958e4200 <irq_stat>
ffffffff910e685e:       90                      nop
ffffffff910e685f:       f6 83 88 00 00 00 03    testb  $0x3,0x88(%rbx)
ffffffff910e6866:       75 0b                   jne    ffffffff910e6873 <sysvec_apic_timer_interrupt+0x33>
ffffffff910e6868:       65 8a 05 c3 f7 76 04    mov    %gs:0x476f7c3(%rip),%al        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e686f:       84 c0                   test   %al,%al
ffffffff910e6871:       74 29                   je     ffffffff910e689c <sysvec_apic_timer_interrupt+0x5c>
ffffffff910e6873:       e8 28 eb 8e f0          call   ffffffff819d53a0 <irq_enter_rcu>
ffffffff910e6878:       48 89 df                mov    %rbx,%rdi
ffffffff910e687b:       e8 c0 a5 80 f0          call   ffffffff818f0e40 <__sysvec_apic_timer_interrupt>
ffffffff910e6880:       e8 3b ed 8e f0          call   ffffffff819d55c0 <irq_exit_rcu>
ffffffff910e6885:       90                      nop
ffffffff910e6886:       41 0f b6 f6             movzbl %r14b,%esi			// Moves the saved 'state' into esi
ffffffff910e688a:       48 89 df                mov    %rbx,%rdi			// Moves the saved 'regs' into rdi
ffffffff910e688d:       e8 4e 08 00 00          call   ffffffff910e70e0 <irqentry_exit> // Invokes irqentry_exit()
ffffffff910e6892:       5b                      pop    %rbx
ffffffff910e6893:       41 5e                   pop    %r14
ffffffff910e6895:       5d                      pop    %rbp
ffffffff910e6896:       2e e9 14 ed 03 00       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>
ffffffff910e689c:       65 c6 05 8e f7 76 04 01 movb   $0x1,%gs:0x476f78e(%rip)        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e68a4:       65 4c 8b 1d 54 f7 76 04 mov    %gs:0x476f754(%rip),%r11        # ffffffff95856000 <hardirq_stack_ptr>
ffffffff910e68ac:       49 89 23                mov    %rsp,(%r11)
ffffffff910e68af:       4c 89 dc                mov    %r11,%rsp
ffffffff910e68b2:       e8 e9 ea 8e f0          call   ffffffff819d53a0 <irq_enter_rcu>
ffffffff910e68b7:       48 89 df                mov    %rbx,%rdi
ffffffff910e68ba:       e8 81 a5 80 f0          call   ffffffff818f0e40 <__sysvec_apic_timer_interrupt>
ffffffff910e68bf:       e8 fc ec 8e f0          call   ffffffff819d55c0 <irq_exit_rcu>
ffffffff910e68c4:       5c                      pop    %rsp
ffffffff910e68c5:       65 c6 05 65 f7 76 04 00 movb   $0x0,%gs:0x476f765(%rip)        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e68cd:       eb b6                   jmp    ffffffff910e6885 <sysvec_apic_timer_interrupt+0x45>
ffffffff910e68cf:       90                      nop

Purely register state in a function which is not instrumented to begin with.

Now let's look at irqentry_enter():

noinstr irqentry_enter(regs)
{
  // Not observable stack or register state
  irqentry_state_t state = ....;
  ....
  return state;
}


ffffffff910e7000 <irqentry_enter>:
ffffffff910e7000:       f3 0f 1e fa             endbr64
ffffffff910e7004:       55                      push   %rbp
ffffffff910e7005:       48 89 e5                mov    %rsp,%rbp
ffffffff910e7008:       53                      push   %rbx
ffffffff910e7009:       f6 87 88 00 00 00 03    testb  $0x3,0x88(%rdi)				// check for kernel entry
ffffffff910e7010:       74 59                   je     ffffffff910e706b <irqentry_enter+0x6b>   // branch taken 

...

ffffffff910e706b:       65 48 8b 05 95 ef 76 04 mov    %gs:0x476ef95(%rip),%rax        # ffffffff95856008 <current_task>
ffffffff910e7073:       f6 40 2c 02             testb  $0x2,0x2c(%rax)				       // is_idle_task()?
ffffffff910e7077:       75 0b                   jne    ffffffff910e7084 <irqentry_enter+0x84>	
ffffffff910e7079:       90                      nop
ffffffff910e707a:       e8 91 99 a0 f1          call   ffffffff82af0a10 <kmsan_unpoison_entry_regs>    // Unpoisons regs
ffffffff910e707f:       90                      nop
ffffffff910e7080:       31 c0                   xor    %eax,%eax				       // state = 0
ffffffff910e7082:       eb 14                   jmp    ffffffff910e7098 <irqentry_enter+0x98>          // return
ffffffff910e7084:       48 89 fb                mov    %rdi,%rbx				       // Preserve 'regs'	  
ffffffff910e7087:       e8 44 0c 00 00          call   ffffffff910e7cd0 <ct_irq_enter>
ffffffff910e708c:       90                      nop
ffffffff910e708d:       48 89 df                mov    %rbx,%rdi				      // Restore 'regs'
ffffffff910e7090:       e8 7b 99 a0 f1          call   ffffffff82af0a10 <kmsan_unpoison_entry_regs>   // Unpoisons 'regs'
ffffffff910e7095:       90                      nop
ffffffff910e7096:       b0 01                   mov    $0x1,%al					      // state = 1
ffffffff910e7098:       5b                      pop    %rbx					      // return
ffffffff910e7099:       5d                      pop    %rbp
ffffffff910e709a:       2e e9 10 e5 03 00       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>

As this just returns state constants in RAX, there is zero related KMSAN memory state, right?

Now irqentry_exit:

noinstr irqentry_exit(regs, state)
{
	....
	irqentry_exit_to_kernel_mode_preempt(regs, state);
}

ffffffff910e70e0 <irqentry_exit>:
ffffffff910e70e0:       f3 0f 1e fa             endbr64
ffffffff910e70e4:       55                      push   %rbp
ffffffff910e70e5:       48 89 e5                mov    %rsp,%rbp
ffffffff910e70e8:       41 57                   push   %r15
ffffffff910e70ea:       41 56                   push   %r14
ffffffff910e70ec:       41 55                   push   %r13
ffffffff910e70ee:       41 54                   push   %r12
ffffffff910e70f0:       53                      push   %rbx
ffffffff910e70f1:       48 83 ec 68             sub    $0x68,%rsp
ffffffff910e70f5:       49 89 fe                mov    %rdi,%r14	// Save 'regs'
ffffffff910e70f8:       f6 87 88 00 00 00 03    testb  $0x3,0x88(%rdi)	// return to kernel mode
ffffffff910e70ff:       74 4c                   je     ffffffff910e714d <irqentry_exit+0x6d>  // branch taken
...
ffffffff910e714d:       89 f3                   mov    %esi,%ebx	// Save 'state'
ffffffff910e714f:       90                      nop
ffffffff910e7150:       0f b6 f3                movzbl %bl,%esi
ffffffff910e7153:       4c 89 f7                mov    %r14,%rdi
ffffffff910e7156:       e8 e5 00 cc f0          call   ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>
ffffffff910e715b:       90                      nop

Again. 'state' is a pure register value, which is handed to
irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().

But KMSAN magically associates a memory access which it and then claims
it belongs to a SKB which was allocated in the interrupted code.

What Mark's change actually does is to make the register value 'state'
observable in an instrumented function, while before that 'state' was
always confined in the non instrumentable code.

But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
is a pure register value, which could be even a constant supplied by the
caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.

I just built the same .config with clang-19 and it is hallucinating in
the same way, though the actual assembly is more readable than the
syzbot clang-21 variant:

0000000000002710 <irqentry_exit_to_kernel_mode_preempt>:
    2710:       55                      push   %rbp
    2711:       48 89 e5                mov    %rsp,%rbp
    2714:       41 57                   push   %r15
    2716:       41 56                   push   %r14
    2718:       41 55                   push   %r13
    271a:       41 54                   push   %r12
    271c:       53                      push   %rbx
    271d:       89 f3                   mov    %esi,%ebx	// save 'state' in EBX
    271f:       49 89 ff                mov    %rdi,%r15	// save 'regs' in R15
    2722:       e8 00 00 00 00          call   2727 <irqentry_exit_to_kernel_mode_preempt+0x17> 2723: R_X86_64_PLT32    __msan_get_context_state-0x4
    2727:       44 0f b6 60 08          movzbl 0x8(%rax),%r12d	  // Magic KMSAN data related to 'state'
    272c:       44 8b b0 90 0c 00 00    mov    0xc90(%rax),%r14d  // Magic KMSAN data related to 'regs::eflags'
    2733:       4d 8b af 90 00 00 00    mov    0x90(%r15),%r13	  // Read the actual regs->eflags
    273a:       49 81 c7 90 00 00 00    add    $0x90,%r15	  // Adjust pointer to regs::eflags
    2741:       4c 89 ff                mov    %r15,%rdi
    2744:       e8 00 00 00 00          call   2749 <irqentry_exit_to_kernel_mode_preempt+0x39> 2745: R_X86_64_PLT32    __msan_metadata_ptr_for_load_8-0x4
    2749:       48 8b 00                mov    (%rax),%rax	  // Read the KMSAN state
    274c:       25 00 02 00 00          and    $0x200,%eax        // Isolate bit 9 (IF?)

    2751:       41 81 e5 00 02 00 00    and    $0x200,%r13d	  // Isolate IF in the actual value
    2758:       48 85 c0                test   %rax,%rax	  // Test whether the KMSAN IF state is 0
    275b:       74 07                   je     2764 <irqentry_exit_to_kernel_mode_preempt+0x54>
								  // Not zero
    275d:       f7 d0                   not    %eax		  // negate the KMSAN state
    275f:       44 21 e8                and    %r13d,%eax	  // And the actual value (IF isolated)
    		      			       			  // If zero, goto MSAN warning
    2762:       74 25                   je     2789 <irqentry_exit_to_kernel_mode_preempt+0x79>
    
    2764:       4d 85 ed                test   %r13,%r13	  // Check whether IF is set
    		      			       			  // If not, return
    2767:       74 10                   je     2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2769:       41 f6 c4 01             test   $0x1,%r12b	  // Check the magic KMSAN state (related to 'state')
		      	 		       			  // If not zero, goto MSAN warning   
    276d:       75 28                   jne    2797 <irqentry_exit_to_kernel_mode_preempt+0x87>

    276f:       f6 c3 01                test   $0x1,%bl		  // Check state.rcu_exit
    		      			       			  // If not, return
    2772:       75 05                   jne    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2774:       e8 00 00 00 00          call   2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2775: R_X86_64_PLT32    __SCT__irqentry_exit_cond_resched-0x4
    2779:       5b                      pop    %rbx
    277a:       41 5c                   pop    %r12
    277c:       41 5d                   pop    %r13
    277e:       41 5e                   pop    %r14
    2780:       41 5f                   pop    %r15
    2782:       5d                      pop    %rbp
    2783:       2e e9 00 00 00 00       cs jmp 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2785: R_X86_64_PLT32    __x86_return_thunk-0x4
    2789:       8b 3a                   mov    (%rdx),%edi
    278b:       e8 00 00 00 00          call   2790 <irqentry_exit_to_kernel_mode_preempt+0x80> 278c: R_X86_64_PLT32    __msan_warning-0x4
    2790:       4d 85 ed                test   %r13,%r13
    2793:       75 d4                   jne    2769 <irqentry_exit_to_kernel_mode_preempt+0x59>
    2795:       eb e2                   jmp    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2797:       44 89 f7                mov    %r14d,%edi
    279a:       e8 00 00 00 00          call   279f <irqentry_exit_to_kernel_mode_preempt+0x8f> 279b: R_X86_64_PLT32    __msan_warning-0x4
    279f:       f6 c3 01                test   $0x1,%bl
    27a2:       74 d0                   je     2774 <irqentry_exit_to_kernel_mode_preempt+0x64>
    27a4:       eb d3                   jmp    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>

The compiler _cannot_ assume anything about the 'state' argument as
that's handed in as value in RSI from a completely different compilation
unit.

Something is wrong in KMSAN/compiler land or do you still believe that
you just need to unpoison the non existing memory 'state'?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-11 12:25     ` [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt Thomas Gleixner
@ 2026-05-12  9:33       ` Alexander Potapenko
  2026-05-12 11:15         ` Mark Rutland
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander Potapenko @ 2026-05-12  9:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Vyukov, Mark Rutland, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
>

Hi Thomas,

thanks for diving into this!

> > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > and `state`.
> > Because there is a lot of non-instrumented code around, we fail to
> > initialize these variables.
> > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > the state.
> > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > at the same place.
>
> Good luck with unpoisoning 'state'. 'state' is not memory to begin with.

My bad. Normally, the compiler would happily move `state` to memory if it is
address-taken, and make sure that its shadow is tracked properly.
But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
a `noinstr` function.

<I omit the elaborate assembly analysis here, tipping my hat!>

> Again. 'state' is a pure register value, which is handed to
> irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().

There is no notion of a 'pure register value' in C, and the compiler may make
arbitrary decisions about whether a particular value is stored on the stack or
in the registers.

Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
level and can track the state of a value regardless of where it is stored.
In particular, it normally works for function return values - unless `noinstr`
kicks in.

>
> But KMSAN magically associates a memory access which it and then claims
> it belongs to a SKB which was allocated in the interrupted code.
>
> What Mark's change actually does is to make the register value 'state'
> observable in an instrumented function, while before that 'state' was
> always confined in the non instrumentable code.

Agreed.

> But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> is a pure register value, which could be even a constant supplied by the
> caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.

I disagree with a general implication that KMSAN has zero business to fiddle
with values passed in registers. But I agree we are doing a poor job trying
to pull the shadow for `state` out of thin air.

>
> The compiler _cannot_ assume anything about the 'state' argument as
> that's handed in as value in RSI from a completely different compilation
> unit.
>

Again, this only matters because we are calling an instrumented function from
a non-instrumented one, otherwise it's perfectly fine to call between
compilation units.

> Something is wrong in KMSAN/compiler land or do you still believe that
> you just need to unpoison the non existing memory 'state'?
>

When we call an instrumented function from a non-instrumented one, the compiler
is doomed to not understand that and to be unable to track the function
parameters properly. Exactly because `noinstr` implies no instrumentation
whatsoever, the compiler may not add any hints on the caller side that would
help the callee understand what's going on - even if KMSAN is able to see this
`noinstr` function (which is not always the case).

So what we could do is to add annotations manualy on either the caller side or
the callee side.

We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
making all its inputs initialized. This is the easiest solution, it may
introduce false negatives, but we are on a very thin ice anyway, so perhaps
doing so is better than dealing with more false positives in the interrupt code.

Another option for the callee would be applying `__always_inline`, so that
irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
Given that irqentry_exit_to_kernel_mode_after_preempt() is already
`__always_inline`, it might be the right thing to do.

On the caller side, we could do something creative with instrumentation_begin()
and instrumentation_end(). We've had a discussion about that exactly four years
ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
, but came to a conclusion that a handful of annotations on the noinstr/instr
boundary may do a better job than a solution that doesn't cover all cases.

In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
non-existing today) cases where two functions are called from an instrumented
region, and the first function somehow leaves the argument state poisoned.

Do you think it's worth revisiting the instrumentation_begin() approach, or
shall we go with one of the compiler attributes instead?

Thank you,
Alexander

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12  9:33       ` Alexander Potapenko
@ 2026-05-12 11:15         ` Mark Rutland
  2026-05-12 16:24           ` Alexander Potapenko
  0 siblings, 1 reply; 5+ messages in thread
From: Mark Rutland @ 2026-05-12 11:15 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote:
> On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
> > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > > and `state`.
> > > Because there is a lot of non-instrumented code around, we fail to
> > > initialize these variables.
> > > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > > the state.
> > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > > at the same place.
> >
> > Good luck with unpoisoning 'state'. 'state' is not memory to begin with.
> 
> My bad. Normally, the compiler would happily move `state` to memory if it is
> address-taken, and make sure that its shadow is tracked properly.
> But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
> a `noinstr` function.
> 
> <I omit the elaborate assembly analysis here, tipping my hat!>
> 
> > Again. 'state' is a pure register value, which is handed to
> > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().
> 
> There is no notion of a 'pure register value' in C, and the compiler may make
> arbitrary decisions about whether a particular value is stored on the stack or
> in the registers.
> 
> Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
> level and can track the state of a value regardless of where it is stored.
> In particular, it normally works for function return values - unless `noinstr`
> kicks in.
> 
> >
> > But KMSAN magically associates a memory access which it and then claims
> > it belongs to a SKB which was allocated in the interrupted code.
> >
> > What Mark's change actually does is to make the register value 'state'
> > observable in an instrumented function, while before that 'state' was
> > always confined in the non instrumentable code.
> 
> Agreed.
> 
> > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> > is a pure register value, which could be even a constant supplied by the
> > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.
> 
> I disagree with a general implication that KMSAN has zero business to fiddle
> with values passed in registers. But I agree we are doing a poor job trying
> to pull the shadow for `state` out of thin air.
> 
> > The compiler _cannot_ assume anything about the 'state' argument as
> > that's handed in as value in RSI from a completely different compilation
> > unit.
> 
> Again, this only matters because we are calling an instrumented function from
> a non-instrumented one, otherwise it's perfectly fine to call between
> compilation units.

For context, can you explain how this is expected to work across
compilation units when the caller and callee *are* instrumented, when an
argument is passed in a register?

Below you suggest that the caller might add "hints", but it's not clear
specifically what this means.

> > Something is wrong in KMSAN/compiler land or do you still believe that
> > you just need to unpoison the non existing memory 'state'?
> 
> When we call an instrumented function from a non-instrumented one, the compiler
> is doomed to not understand that and to be unable to track the function
> parameters properly. Exactly because `noinstr` implies no instrumentation
> whatsoever, the compiler may not add any hints on the caller side that would
> help the callee understand what's going on - even if KMSAN is able to see this
> `noinstr` function (which is not always the case).
>
> So what we could do is to add annotations manualy on either the caller side or
> the callee side.

Without some understanding of those "hints" you mention, I don't see how
we can do that on the caller side.

> We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> making all its inputs initialized. This is the easiest solution, it may
> introduce false negatives, but we are on a very thin ice anyway, so perhaps
> doing so is better than dealing with more false positives in the interrupt code.
> 
> Another option for the callee would be applying `__always_inline`, so that
> irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> `__always_inline`, it might be the right thing to do.

We can do that, but this really suggests that there's a fundemantal
inability to pass arguments between code which is noinstr and code which
isinstrumented with AddressSanitizer, and that's inevitably going to
bite us in future.

> On the caller side, we could do something creative with instrumentation_begin()
> and instrumentation_end(). We've had a discussion about that exactly four years
> ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> , but came to a conclusion that a handful of annotations on the noinstr/instr
> boundary may do a better job than a solution that doesn't cover all cases.

That doesn't look general at all, so I am not keen on that.

> In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> non-existing today) cases where two functions are called from an instrumented
> region, and the first function somehow leaves the argument state poisoned.

How exactly is kmsan_context_state used?

If that's supposed to carry some global or current context, surely
blatting that in entry code will affect the code that was interrupted?

I see kmsan_get_context() has an in_task() check, but that can't help
with nested exceptions, so this doesn't look right at all.

> Do you think it's worth revisiting the instrumentation_begin() approach, or
> shall we go with one of the compiler attributes instead?

I think we need a better understanding of this first.

It looks to me that there are bigger problems here.

Mark.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 11:15         ` Mark Rutland
@ 2026-05-12 16:24           ` Alexander Potapenko
  2026-05-12 17:46             ` Mark Rutland
  0 siblings, 1 reply; 5+ messages in thread
From: Alexander Potapenko @ 2026-05-12 16:24 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote:
> > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
> > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > > > and `state`.
> > > > Because there is a lot of non-instrumented code around, we fail to
> > > > initialize these variables.
> > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > > > the state.
> > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > > > at the same place.
> > >
> > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with.
> >
> > My bad. Normally, the compiler would happily move `state` to memory if it is
> > address-taken, and make sure that its shadow is tracked properly.
> > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
> > a `noinstr` function.
> >
> > <I omit the elaborate assembly analysis here, tipping my hat!>
> >
> > > Again. 'state' is a pure register value, which is handed to
> > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().
> >
> > There is no notion of a 'pure register value' in C, and the compiler may make
> > arbitrary decisions about whether a particular value is stored on the stack or
> > in the registers.
> >
> > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
> > level and can track the state of a value regardless of where it is stored.
> > In particular, it normally works for function return values - unless `noinstr`
> > kicks in.
> >
> > >
> > > But KMSAN magically associates a memory access which it and then claims
> > > it belongs to a SKB which was allocated in the interrupted code.
> > >
> > > What Mark's change actually does is to make the register value 'state'
> > > observable in an instrumented function, while before that 'state' was
> > > always confined in the non instrumentable code.
> >
> > Agreed.
> >
> > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> > > is a pure register value, which could be even a constant supplied by the
> > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.
> >
> > I disagree with a general implication that KMSAN has zero business to fiddle
> > with values passed in registers. But I agree we are doing a poor job trying
> > to pull the shadow for `state` out of thin air.
> >
> > > The compiler _cannot_ assume anything about the 'state' argument as
> > > that's handed in as value in RSI from a completely different compilation
> > > unit.
> >
> > Again, this only matters because we are calling an instrumented function from
> > a non-instrumented one, otherwise it's perfectly fine to call between
> > compilation units.
>
> For context, can you explain how this is expected to work across
> compilation units when the caller and callee *are* instrumented, when an
> argument is passed in a register?

Per KMSAN ABI, the metadata for the function arguments is passed via
buffers in the so-called context state (see
include/linux/kmsan_types.h)
In the userspace, these buffers are thread-local variables referenced
by inline loads and stores.
In the kernel space, the compiler inserts a call
__msan_get_context_state() at the beginning of every function, and
then the instrumentation code uses whatever that function returned.

Assume we have a function:

int sum(int a, int b) {
    result = a + b;
    return result;
}

Its instrumented version looks roughly as follows (we'll omit origin
tracking for simplicity):

int sum(int a, int b) {
  struct kmsan_context_state *kcs = __msan_get_context_state();
  int s_a = ((int)kcs->param_tls)[0];  // shadow of a
  int s_b = ((int)kcs->param_tls)[1];  // shadow of b
  result = a + b;
  s_result = s_a | s_b;
  ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
  return result;
}

Most certainly `a` and `b` will be passed using registers, but that
doesn't matter: their metadata is safe as long as the caller does:

((int)kcs->param_tls)[0] = s_a;
((int)kcs->param_tls)[1] = s_b;
result = sum(a, b);
s_result = ((int)kcs->retval_tls)[0];

All the metadata tracking is implemented using an LLVM IR pass, which
ensures that the shadow for each uninitialized value is tracked
regardless of where that value is stored - be that heap memory, stack
or registers:
- when a value is created or loaded from memory, the compiler inserts
SSA registers corresponding to that value's shadow;
- when a value is written to memory, a shadow store is inserted;
- when a value is used in an operation, the result of that operation
receives a shadow value depending on the operands and their shadow
values;
- when a value is passed to a function, the corresponding context
state stores and loads are emitted.

Now, this works fine under the assumption that all code in the kernel
is instrumented with KMSAN.
However this is not true.

1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some
prevent infinite recursion caused by KMSAN calling instrumented code;
others disable instrumentation of the code that executes before KMSAN
is fully set up.
2. There are annotations to soft-disable KMSAN for a particular
function by adding a `__no_kmsan_checks` attribute. There will still
be instrumentation, but all the function outputs (stores and returns)
will be initialized, and no errors will be reported.
This is handy when the compiler cannot properly instrument the
underlying code. One example would be tricky inline assembly, another
one - stack walking functions that deliberately read and write
uninitialized values.
3. There are cases in which KMSAN must be disabled for the whole
function (`__no_sanitize_memory`).
We disable instrumentation for `noinstr` functions. Additionally, on
x86 there is exactly one case where this is done to avoid infinite
recursion when storing the origins.

It's important to note that, due to how function attributes are
implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will
be applied to a function if that function exists during the KMSAN
instrumentation pass. If it was inlined before that pass, the compiler
will apply the instrumentation rules for whichever function is calling
the inlined one.

Now, because of the described ABI, calls between instrumented
functions and/or functions marked `__no_kmsan_checks` are perfectly
fine because the function arguments are stored and loaded on both
ends.
What does not work well is calling non-instrumented functions from
instrumented functions, and vice versa.
In the former case, KMSAN will not see the callee's side effects
(return values or memory stores), in the latter case, the callee may
receive incorrect shadow values for the function parameters or memory
stores in the caller. Both will
We have few tools to deal with such cases. It often helps to move the
border between the instrumented and non-instrumented code by applying
__no_kmsan_checks or __no_sanitize_memory until we get to a point
where there are no incorrect shadow arguments.
Another way to deal with uninitialized data coming from
non-instrumented code is kmsan_unpoison_memory(address, size).
Unfortunately, as Thomas pointed out, we can't use it for locals in
non-instrumented code.

> Below you suggest that the caller might add "hints", but it's not clear
> specifically what this means.
>
> > > Something is wrong in KMSAN/compiler land or do you still believe that
> > > you just need to unpoison the non existing memory 'state'?
> >
> > When we call an instrumented function from a non-instrumented one, the compiler
> > is doomed to not understand that and to be unable to track the function
> > parameters properly. Exactly because `noinstr` implies no instrumentation
> > whatsoever, the compiler may not add any hints on the caller side that would
> > help the callee understand what's going on - even if KMSAN is able to see this
> > `noinstr` function (which is not always the case).
> >
> > So what we could do is to add annotations manualy on either the caller side or
> > the callee side.
>
> Without some understanding of those "hints" you mention, I don't see how
> we can do that on the caller side.

In this case by "hints" I meant any kind of instrumentation that the
compiler could have inserted automatically to mark the data in the
instrumented functions as initialized.
There are no such "hints" right now (apart from letting KMSAN
instrument the function).
As for the manual annotations, we only have the mentioned
`__no_kmsan_checks`, `__no_sanitize_memory`, and
kmsan_unpoison_memory().

>
> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > making all its inputs initialized. This is the easiest solution, it may
> > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > doing so is better than dealing with more false positives in the interrupt code.
> >
> > Another option for the callee would be applying `__always_inline`, so that
> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > `__always_inline`, it might be the right thing to do.
>
> We can do that, but this really suggests that there's a fundemantal
> inability to pass arguments between code which is noinstr and code which
> isinstrumented with AddressSanitizer, and that's inevitably going to
> bite us in future.

This is true (assuming you mean MemorySanitizer), but:
- I don't think we can avoid that, given that there will always be
non-instrumented code;
- so far the number of annotations has been manageable.

>
> > On the caller side, we could do something creative with instrumentation_begin()
> > and instrumentation_end(). We've had a discussion about that exactly four years
> > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> > , but came to a conclusion that a handful of annotations on the noinstr/instr
> > boundary may do a better job than a solution that doesn't cover all cases.
>
> That doesn't look general at all, so I am not keen on that.

Ack.

> > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> > non-existing today) cases where two functions are called from an instrumented
> > region, and the first function somehow leaves the argument state poisoned.
>
> How exactly is kmsan_context_state used?

Hope the explanation above helps.

>
> If that's supposed to carry some global or current context, surely
> blatting that in entry code will affect the code that was interrupted?

It will mostly affect the bottom-level function called by the entry
code, after that, nested functions will be just passing their
parameters as per KMSAN ABI.

> I see kmsan_get_context() has an in_task() check, but that can't help
> with nested exceptions, so this doesn't look right at all.

KMSAN context is per-task, and if !in_task(), it is per-CPU.

For the nested exceptions, we conservatively bail out (see
kmsan_in_runtime()) to avoid deadlocking.
This may cause false negatives.

> > Do you think it's worth revisiting the instrumentation_begin() approach, or
> > shall we go with one of the compiler attributes instead?

> I think we need a better understanding of this first.
>
> It looks to me that there are bigger problems here.
>

I'll be happy to discuss this further.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 16:24           ` Alexander Potapenko
@ 2026-05-12 17:46             ` Mark Rutland
  0 siblings, 0 replies; 5+ messages in thread
From: Mark Rutland @ 2026-05-12 17:46 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

Hi,

Thanks for this; the explanation of the KMSAN ABI is *very* helpful.

I have a few questions and comments inline below.

On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > For context, can you explain how this is expected to work across
> > compilation units when the caller and callee *are* instrumented, when an
> > argument is passed in a register?
> 
> Per KMSAN ABI, the metadata for the function arguments is passed via
> buffers in the so-called context state (see
> include/linux/kmsan_types.h)
> In the userspace, these buffers are thread-local variables referenced
> by inline loads and stores.
> In the kernel space, the compiler inserts a call
> __msan_get_context_state() at the beginning of every function, and
> then the instrumentation code uses whatever that function returned.
> 
> Assume we have a function:
> 
> int sum(int a, int b) {
>     result = a + b;
>     return result;
> }
> 
> Its instrumented version looks roughly as follows (we'll omit origin
> tracking for simplicity):
> 
> int sum(int a, int b) {
>   struct kmsan_context_state *kcs = __msan_get_context_state();
>   int s_a = ((int)kcs->param_tls)[0];  // shadow of a
>   int s_b = ((int)kcs->param_tls)[1];  // shadow of b
>   result = a + b;
>   s_result = s_a | s_b;
>   ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
>   return result;
> }
> 
> Most certainly `a` and `b` will be passed using registers, but that
> doesn't matter: their metadata is safe as long as the caller does:
> 
> ((int)kcs->param_tls)[0] = s_a;
> ((int)kcs->param_tls)[1] = s_b;
> result = sum(a, b);
> s_result = ((int)kcs->retval_tls)[0];

Ok. I see how that works when both caller and callee are instrumented.

I assume that's tracked for all arguments regardless of type? e.g. that
includes pointers, structs, etc?

> All the metadata tracking is implemented using an LLVM IR pass, which
> ensures that the shadow for each uninitialized value is tracked
> regardless of where that value is stored - be that heap memory, stack
> or registers:
> - when a value is created or loaded from memory, the compiler inserts
> SSA registers corresponding to that value's shadow;
> - when a value is written to memory, a shadow store is inserted;
> - when a value is used in an operation, the result of that operation
> receives a shadow value depending on the operands and their shadow
> values;
> - when a value is passed to a function, the corresponding context
> state stores and loads are emitted.
> 
> Now, this works fine under the assumption that all code in the kernel
> is instrumented with KMSAN.
> However this is not true.
> 
> 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some
> prevent infinite recursion caused by KMSAN calling instrumented code;
> others disable instrumentation of the code that executes before KMSAN
> is fully set up.
> 2. There are annotations to soft-disable KMSAN for a particular
> function by adding a `__no_kmsan_checks` attribute. There will still
> be instrumentation, but all the function outputs (stores and returns)
> will be initialized, and no errors will be reported.
> This is handy when the compiler cannot properly instrument the
> underlying code. One example would be tricky inline assembly, another
> one - stack walking functions that deliberately read and write
> uninitialized values.
> 3. There are cases in which KMSAN must be disabled for the whole
> function (`__no_sanitize_memory`).
> We disable instrumentation for `noinstr` functions. Additionally, on
> x86 there is exactly one case where this is done to avoid infinite
> recursion when storing the origins.

In addition to the above cases, out-of-line assembly functions won't
manipulate the shadow (so they're effectively the same as noinstr).

> It's important to note that, due to how function attributes are
> implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will
> be applied to a function if that function exists during the KMSAN
> instrumentation pass. If it was inlined before that pass, the compiler
> will apply the instrumentation rules for whichever function is calling
> the inlined one.
> 
> Now, because of the described ABI, calls between instrumented
> functions and/or functions marked `__no_kmsan_checks` are perfectly
> fine because the function arguments are stored and loaded on both
> ends.
> What does not work well is calling non-instrumented functions from
> instrumented functions, and vice versa.

Yep; understood.

> In the former case, KMSAN will not see the callee's side effects
> (return values or memory stores), in the latter case, the callee may
> receive incorrect shadow values for the function parameters or memory
> stores in the caller. Both will

Incomplete sentence here?

> We have few tools to deal with such cases. It often helps to move the
> border between the instrumented and non-instrumented code by applying
> __no_kmsan_checks or __no_sanitize_memory until we get to a point
> where there are no incorrect shadow arguments.
> Another way to deal with uninitialized data coming from
> non-instrumented code is kmsan_unpoison_memory(address, size).
> Unfortunately, as Thomas pointed out, we can't use it for locals in
> non-instrumented code.

It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
and I think it's non-sensical because conceptually they aren't memory.

I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
regs. Surely the pointer argument itself needs to be unpoisoned before
being passed to a callee?

Otherwise, when pointer to regs is passed as an argument to a callee,
the callee will consume stale shadow, which might indicate that the
pointer to regs is uninitialized?

That seems to be exactly what we're hitting with the irq entry state, so
I'm surprised we're not hitting it for the regs. Is that sheer luck, or
is something else protecting us?

> > Below you suggest that the caller might add "hints", but it's not clear
> > specifically what this means.
> >
> > > > Something is wrong in KMSAN/compiler land or do you still believe that
> > > > you just need to unpoison the non existing memory 'state'?
> > >
> > > When we call an instrumented function from a non-instrumented one, the compiler
> > > is doomed to not understand that and to be unable to track the function
> > > parameters properly. Exactly because `noinstr` implies no instrumentation
> > > whatsoever, the compiler may not add any hints on the caller side that would
> > > help the callee understand what's going on - even if KMSAN is able to see this
> > > `noinstr` function (which is not always the case).
> > >
> > > So what we could do is to add annotations manualy on either the caller side or
> > > the callee side.
> >
> > Without some understanding of those "hints" you mention, I don't see how
> > we can do that on the caller side.
> 
> In this case by "hints" I meant any kind of instrumentation that the
> compiler could have inserted automatically to mark the data in the
> instrumented functions as initialized.
> There are no such "hints" right now (apart from letting KMSAN
> instrument the function).
> As for the manual annotations, we only have the mentioned
> `__no_kmsan_checks`, `__no_sanitize_memory`, and
> kmsan_unpoison_memory().

Ok.

For noinstr code calling instrumented code, I see that we could
explicitly modify the argument shadow in the noinstr caller, but that's
*really* grotty, and I don't think we want to do that.

> > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > making all its inputs initialized. This is the easiest solution, it may
> > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > doing so is better than dealing with more false positives in the interrupt code.
> > >
> > > Another option for the callee would be applying `__always_inline`, so that
> > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > `__always_inline`, it might be the right thing to do.
> >
> > We can do that, but this really suggests that there's a fundemantal
> > inability to pass arguments between code which is noinstr and code which
> > isinstrumented with AddressSanitizer, and that's inevitably going to
> > bite us in future.
> 
> This is true (assuming you mean MemorySanitizer), but:

Sorry, yes, I meant MemorySanitizer.

> - I don't think we can avoid that, given that there will always be
> non-instrumented code;
> - so far the number of annotations has been manageable.

As above, I suspect that we're actually missing many necessary
annotations and getting away with this by accident.

I suspect that we need additional compiler support to say something like
"assume this argument/return value IS initialized".

> > > On the caller side, we could do something creative with instrumentation_begin()
> > > and instrumentation_end(). We've had a discussion about that exactly four years
> > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> > > , but came to a conclusion that a handful of annotations on the noinstr/instr
> > > boundary may do a better job than a solution that doesn't cover all cases.
> >
> > That doesn't look general at all, so I am not keen on that.
> 
> Ack.
> 
> > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> > > non-existing today) cases where two functions are called from an instrumented
> > > region, and the first function somehow leaves the argument state poisoned.
> >
> > How exactly is kmsan_context_state used?
> 
> Hope the explanation above helps.
> >
> > If that's supposed to carry some global or current context, surely
> > blatting that in entry code will affect the code that was interrupted?
> 
> It will mostly affect the bottom-level function called by the entry
> code, after that, nested functions will be just passing their
> parameters as per KMSAN ABI.

The problem I was explaining was the other way around: we clobber the
context of the function that was executing *before* exception entry.

Consider when the CPU is executing some function foo(), and an exception
is taken. Within the exception handler, code will clobber the context
state. AFAICT, that state is not saved/restored, and nor is it zeroed
prior to exception return. The exception handler returns back to the
middle of foo(). From foo's PoV, the context state has been clobbered
arbitrarily, and that clobbered state could trigger false positives or
false negatives.

The way kmsan_get_context() uses in_task() will avoid that in *some*
cases, but not *all* cases. In particular, I don't think that's going to
help when a fault is taken for a uaccess.

Maybe I'm missing something here?

> > I see kmsan_get_context() has an in_task() check, but that can't help
> > with nested exceptions, so this doesn't look right at all.
> 
> KMSAN context is per-task, and if !in_task(), it is per-CPU.
> 
> For the nested exceptions, we conservatively bail out (see
> kmsan_in_runtime()) to avoid deadlocking.
> This may cause false negatives.

AFAICT that doesn't address the case I've described above.

Mark.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-05-12 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <69e7ee1f.a00a0220.17a17.001d.GAE@google.com>
     [not found] ` <CACT4Y+b5wU2wHfugXH0tjGyLS1ibTjj6KibXMMhKN59rXRuafw@mail.gmail.com>
     [not found]   ` <CAG_fn=WGjaOSEh7JJ9=e-QvpoMSnSv7=__cerR696-qAajc=Vw@mail.gmail.com>
2026-05-11 12:25     ` [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt Thomas Gleixner
2026-05-12  9:33       ` Alexander Potapenko
2026-05-12 11:15         ` Mark Rutland
2026-05-12 16:24           ` Alexander Potapenko
2026-05-12 17:46             ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox