[syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode

All of lore.kernel.org
 help / color / mirror / Atom feed

* [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
@ 2026-04-21 21:37 syzbot
  2026-04-29  6:29 ` Dmitry Vyukov
  0 siblings, 1 reply; 18+ messages in thread
From: syzbot @ 2026-04-21 21:37 UTC (permalink / raw)
  To: linux-kernel, luto, peterz, syzkaller-bugs, tglx

Hello,

syzbot found the following issue on:

HEAD commit:    4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810
dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3
compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz
kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com

bridge0: port 1(bridge_slave_0) entered blocking state
bridge0: port 1(bridge_slave_0) entered forwarding state
=====================================================
BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
 irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
 irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline]
 irqentry_exit+0x7b/0x760 kernel/entry/common.c:164
 sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061
 asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697
 __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210
 nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline]
 nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
 nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385
 worker_thread+0xee4/0x1590 kernel/workqueue.c:3466
 kthread+0x53f/0x600 kernel/kthread.c:436
 ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

Uninit was created at:
 slab_post_alloc_hook mm/slub.c:4576 [inline]
 slab_alloc_node mm/slub.c:4898 [inline]
 __do_kmalloc_node mm/slub.c:5294 [inline]
 __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403
 kmalloc_reserve net/core/skbuff.c:635 [inline]
 __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713
 alloc_skb include/linux/skbuff.h:1383 [inline]
 nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline]
 nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
 nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922
 process_one_work kernel/workqueue.c:3302 [inline]
 process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385
 worker_thread+0xee4/0x1590 kernel/workqueue.c:3466
 kthread+0x53f/0x600 kernel/kthread.c:436
 ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

CPU: 1 UID: 0 PID: 13 Comm: kworker/u8:1 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
Workqueue: events_unbound nsim_dev_trap_report_work
=====================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-04-21 21:37 [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt syzbot
@ 2026-04-29  6:29 ` Dmitry Vyukov
  2026-04-29  8:09   ` Alexander Potapenko
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2026-04-29  6:29 UTC (permalink / raw)
  To: syzbot, kasan-dev, Alexander Potapenko
  Cc: linux-kernel, luto, peterz, syzkaller-bugs, tglx

On Tue, 21 Apr 2026 at 23:37, syzbot
<syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810
> dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3
> compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com
>

+kmsan maintainers, not sure where it should be fixed: in irq code or
in kmsan code

> =====================================================
> BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>  irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>  irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline]
>  irqentry_exit+0x7b/0x760 kernel/entry/common.c:164
>  sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061
>  asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697
>  __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210
>  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline]
>  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>  nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922
>  process_one_work kernel/workqueue.c:3302 [inline]
>  process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385
>  worker_thread+0xee4/0x1590 kernel/workqueue.c:3466
>  kthread+0x53f/0x600 kernel/kthread.c:436
>  ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> Uninit was created at:
>  slab_post_alloc_hook mm/slub.c:4576 [inline]
>  slab_alloc_node mm/slub.c:4898 [inline]
>  __do_kmalloc_node mm/slub.c:5294 [inline]
>  __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403
>  kmalloc_reserve net/core/skbuff.c:635 [inline]
>  __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713
>  alloc_skb include/linux/skbuff.h:1383 [inline]
>  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline]
>  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>  nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922
>  process_one_work kernel/workqueue.c:3302 [inline]
>  process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385
>  worker_thread+0xee4/0x1590 kernel/workqueue.c:3466
>  kthread+0x53f/0x600 kernel/kthread.c:436
>  ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>
> CPU: 1 UID: 0 PID: 13 Comm: kworker/u8:1 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026
> Workqueue: events_unbound nsim_dev_trap_report_work
> =====================================================
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-04-29  6:29 ` Dmitry Vyukov
@ 2026-04-29  8:09   ` Alexander Potapenko
  2026-05-11 12:25     ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2026-04-29  8:09 UTC (permalink / raw)
  To: Dmitry Vyukov, Mark Rutland
  Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs,
	tglx, ruanjinjie

On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Tue, 21 Apr 2026 at 23:37, syzbot
> <syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810
> > dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3
> > compiler:       Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com
> >
>
> +kmsan maintainers, not sure where it should be fixed: in irq code or
> in kmsan code

I believe Mark's recent changes introduced this.
irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
and `state`.
Because there is a lot of non-instrumented code around, we fail to
initialize these variables.
Instead, irqentry_enter_from_kernel_mode() explicitly calls
kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
the state.
We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
at the same place.

>
> > =====================================================
> > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
> >  irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
> >  irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline]
> >  irqentry_exit+0x7b/0x760 kernel/entry/common.c:164
> >  sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061
> >  asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697
> >  __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210
> >  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline]
> >  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
> >  nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922
> >  process_one_work kernel/workqueue.c:3302 [inline]
> >  process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385
> >  worker_thread+0xee4/0x1590 kernel/workqueue.c:3466
> >  kthread+0x53f/0x600 kernel/kthread.c:436
> >  ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158
> >  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-04-29  8:09   ` Alexander Potapenko
@ 2026-05-11 12:25     ` Thomas Gleixner
  2026-05-12  9:33       ` Alexander Potapenko
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2026-05-11 12:25 UTC (permalink / raw)
  To: Alexander Potapenko, Dmitry Vyukov, Mark Rutland
  Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs,
	ruanjinjie

On Wed, Apr 29 2026 at 10:09, Alexander Potapenko wrote:
> On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> I believe Mark's recent changes introduced this.

I don't believe any of this. I prefer facts.

> irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> and `state`.
> Because there is a lot of non-instrumented code around, we fail to
> initialize these variables.
> Instead, irqentry_enter_from_kernel_mode() explicitly calls
> kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> the state.
> We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> at the same place.

Good luck with unpoisoning 'state'. 'state' is not memory to begin with.

'state' is composed in irqentry_enter_from_kernel_mode() and returned
from there to the caller as value in RAX. That returned value is then
provided as function argument to irqentry_exit() and subsequently to the
inlines invoked from there. See the deeper ASM analysis below.

Aside of that the whole splat makes no sense at all due to this:

>> > Uninit was created at:
>> >  slab_post_alloc_hook mm/slub.c:4576 [inline]
>> >  slab_alloc_node mm/slub.c:4898 [inline]
>> >  __do_kmalloc_node mm/slub.c:5294 [inline]
>> >  __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403
>> >  kmalloc_reserve net/core/skbuff.c:635 [inline]
>> >  __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713
>> >  alloc_skb include/linux/skbuff.h:1383 [inline]
>> >  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline]
>> >  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>> >  nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922

How can irqentry_exit_to_kernel_mode_preempt() "touch" an uninitialized
SKB which was allocated by the interrupted code, when it only evaluates
regs->eflags and the rcu_exit portion of the register supplied 'state'
argument?

>> > =====================================================
>> > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>> >  irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472
>> >  irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline]
>> >  irqentry_exit+0x7b/0x760 kernel/entry/common.c:164
>> >  sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061
>> >  asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697
>> >  __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210
>> >  nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline]
>> >  nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline]
>> >  nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922

So I got the vmlinux and disassembled the related code parts:

ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>:
ffffffff81da7240:       55                      push   %rbp
ffffffff81da7241:       48 89 e5                mov    %rsp,%rbp
ffffffff81da7244:       41 57                   push   %r15
ffffffff81da7246:       41 56                   push   %r14
ffffffff81da7248:       41 55                   push   %r13
ffffffff81da724a:       41 54                   push   %r12
ffffffff81da724c:       53                      push   %rbx
ffffffff81da724d:       41 89 f6                mov    %esi,%r14d	// Save 'state' in r14
ffffffff81da7250:       49 89 ff                mov    %rdi,%r15        // Save 'regs' in r15
ffffffff81da7253:       e8 08 84 d4 00          call   ffffffff82aef660 <__msan_get_context_state>
ffffffff81da7258:       44 0f b6 68 08          movzbl 0x8(%rax),%r13d  // Read magic value into r13d (related to 'state')
ffffffff81da725d:       8b 98 90 0c 00 00       mov    0xc90(%rax),%ebx // Read magic value into ebx (related to 'regs')

ffffffff81da7263:       4d 8b a7 90 00 00 00    mov    0x90(%r15),%r12  // Read regs->eflags into r12

ffffffff81da726a:       49 81 c7 90 00 00 00    add    $0x90,%r15	// Add eflags offset to r15
ffffffff81da7271:       4c 89 ff                mov    %r15,%rdi	// MSAN check for accessing regs->eflags
ffffffff81da7274:       e8 b7 78 d4 00          call   ffffffff82aeeb30 <__msan_metadata_ptr_for_load_8>
ffffffff81da7279:       8b 08                   mov    (%rax),%ecx	// Read magic data from the address returned by msan
ffffffff81da727b:       89 c8                   mov    %ecx,%eax	// Transfer result to eax
ffffffff81da727d:       f7 d0                   not    %eax		// One's complement negate eax
ffffffff81da727f:       41 81 e4 00 02 00 00    and    $0x200,%r12d	// Isolate eflags::IF (bit 9)

ffffffff81da7286:       c1 e9 09                shr    $0x9,%ecx	// Right shift the magic value by 9 (IF into bit 0)

ffffffff81da7289:       0d ff fd ff ff          or     $0xfffffdff,%eax	// Set all bits in eax except bit 9
ffffffff81da728e:       41 85 c4                test   %eax,%r12d	// Logical compare between eax and r12d

The result is in ZF: 1 when bit 9 is not set in eax and/or r12d. 0 if bit 9 is set in both.
	
ffffffff81da7291:       0f 94 c0                sete   %al		// If ZF==1 AL=1 otherwise AL=0
ffffffff81da7294:       20 c8                   and    %cl,%al		// And the right shifted bit 9

ffffffff81da7296:       4d 85 e4                test   %r12,%r12	// Check whether r12 is 0 ( IF set or not )
ffffffff81da7299:       41 0f 94 c7             sete   %r15b		// If ZF==1 r15b=1 otherwise r15b=0

ffffffff81da729d:       44 89 e1                mov    %r12d,%ecx	// Transfer the isolated bit 9 into ecx
ffffffff81da72a0:       c1 e9 09                shr    $0x9,%ecx	// Shift it into bit 0
ffffffff81da72a3:       44 20 e9                and    %r13b,%cl	// And the magic context state value

ffffffff81da72a6:       45 08 f7                or     %r14b,%r15b	// Or 'state' on the IF result created above

ffffffff81da72a9:       41 f6 d6                not    %r14b		// One's complement negate 'state'
ffffffff81da72ac:       45 08 ee                or     %r13b,%r14b	// Or the above magic KMSAN value on the negated 'state'

'state' is a register argument value. So how on earth can KMSAN mangle this
with information stored in the KMSAN context? See below.

ffffffff81da72af:       41 0f b6 f6             movzbl %r14b,%esi	// zero extend the result into esi
ffffffff81da72b3:       0f b6 c9                movzbl %cl,%ecx		// zero extend the magic result into ecx
ffffffff81da72b6:       84 c0                   test   %al,%al		// Check whether al is 0 (magic IF result)

ffffffff81da72b8:       0f 45 ce                cmovne %esi,%ecx        // If not, move the magic mangled state result into ecx

This is where the whole thing goes south because it uses the register
state which was wrongfully mangled with the KMSAN state in r13b.

ffffffff81da72bb:       f6 c1 01                test   $0x1,%cl		// Check whether the result is 1
			      			       			// If 1, jump to the kmsan warning section

ffffffff81da72be:       75 1b                   jne    ffffffff81da72db <irqentry_exit_to_kernel_mode_preempt+0x9b>

ffffffff81da72c0:       41 f6 c7 01             test   $0x1,%r15b       // Check the actual condition

i.e.  if (regs_irqs_disabled(regs) || state.exit_rcu)

ffffffff81da72c4:       75 05                   jne    ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b>
ffffffff81da72c6:       e8 9d f1 37 0f          call   ffffffff91126468 <__SCT__irqentry_exit_cond_resched>
ffffffff81da72cb:       5b                      pop    %rbx
ffffffff81da72cc:       41 5c                   pop    %r12
ffffffff81da72ce:       41 5d                   pop    %r13
ffffffff81da72d0:       41 5e                   pop    %r14
ffffffff81da72d2:       41 5f                   pop    %r15
ffffffff81da72d4:       5d                      pop    %rbp
ffffffff81da72d5:       2e e9 d5 e2 37 0f       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>

// KMSAN warning

ffffffff81da72db:       31 c9                   xor    %ecx,%ecx
ffffffff81da72dd:       4d 85 e4                test   %r12,%r12
ffffffff81da72e0:       0f 44 d9                cmove  %ecx,%ebx
ffffffff81da72e3:       84 c0                   test   %al,%al
ffffffff81da72e5:       74 02                   je     ffffffff81da72e9 <irqentry_exit_to_kernel_mode_preempt+0xa9>
ffffffff81da72e7:       8b 1a                   mov    (%rdx),%ebx
ffffffff81da72e9:       89 df                   mov    %ebx,%edi
ffffffff81da72eb:       e8 30 83 d4 00          call   ffffffff82aef620 <__msan_warning>
ffffffff81da72f0:       41 f6 c7 01             test   $0x1,%r15b
ffffffff81da72f4:       74 d0                   je     ffffffff81da72c6 <irqentry_exit_to_kernel_mode_preempt+0x86>
ffffffff81da72f6:       eb d3                   jmp    ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b>

Let's look at the call sites to prove that KMSAN cannot observe 'state'
as a memory access ever.

noinstr sysvec_apic_timer_interrupt(regs)
{
  // Not observable stack or register state
  irqentry_state_t state = irqentry_enter(regs);

  instrumentation_begin();
  __sysvec_apic_timer_interrupt(regs);
  instrumentation_end();

  irqentry_exit(regs, state);
}

Here is the disassembly:

ffffffff910e6840 <sysvec_apic_timer_interrupt>:
ffffffff910e6840:       f3 0f 1e fa             endbr64
ffffffff910e6844:       55                      push   %rbp
ffffffff910e6845:       48 89 e5                mov    %rsp,%rbp
ffffffff910e6848:       41 56                   push   %r14
ffffffff910e684a:       53                      push   %rbx
ffffffff910e684b:       48 89 fb                mov    %rdi,%rbx			       // Saves 'regs' in rbx
ffffffff910e684e:       e8 ad 07 00 00          call   ffffffff910e7000 <irqentry_enter>       // invokes irqentry_enter()
ffffffff910e6853:       41 89 c6                mov    %eax,%r14d			       // Saves the returned 'state' in r14d
ffffffff910e6856:       65 c6 05 a2 d9 7f 04 01 movb   $0x1,%gs:0x47fd9a2(%rip)        # ffffffff958e4200 <irq_stat>
ffffffff910e685e:       90                      nop
ffffffff910e685f:       f6 83 88 00 00 00 03    testb  $0x3,0x88(%rbx)
ffffffff910e6866:       75 0b                   jne    ffffffff910e6873 <sysvec_apic_timer_interrupt+0x33>
ffffffff910e6868:       65 8a 05 c3 f7 76 04    mov    %gs:0x476f7c3(%rip),%al        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e686f:       84 c0                   test   %al,%al
ffffffff910e6871:       74 29                   je     ffffffff910e689c <sysvec_apic_timer_interrupt+0x5c>
ffffffff910e6873:       e8 28 eb 8e f0          call   ffffffff819d53a0 <irq_enter_rcu>
ffffffff910e6878:       48 89 df                mov    %rbx,%rdi
ffffffff910e687b:       e8 c0 a5 80 f0          call   ffffffff818f0e40 <__sysvec_apic_timer_interrupt>
ffffffff910e6880:       e8 3b ed 8e f0          call   ffffffff819d55c0 <irq_exit_rcu>
ffffffff910e6885:       90                      nop
ffffffff910e6886:       41 0f b6 f6             movzbl %r14b,%esi			// Moves the saved 'state' into esi
ffffffff910e688a:       48 89 df                mov    %rbx,%rdi			// Moves the saved 'regs' into rdi
ffffffff910e688d:       e8 4e 08 00 00          call   ffffffff910e70e0 <irqentry_exit> // Invokes irqentry_exit()
ffffffff910e6892:       5b                      pop    %rbx
ffffffff910e6893:       41 5e                   pop    %r14
ffffffff910e6895:       5d                      pop    %rbp
ffffffff910e6896:       2e e9 14 ed 03 00       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>
ffffffff910e689c:       65 c6 05 8e f7 76 04 01 movb   $0x1,%gs:0x476f78e(%rip)        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e68a4:       65 4c 8b 1d 54 f7 76 04 mov    %gs:0x476f754(%rip),%r11        # ffffffff95856000 <hardirq_stack_ptr>
ffffffff910e68ac:       49 89 23                mov    %rsp,(%r11)
ffffffff910e68af:       4c 89 dc                mov    %r11,%rsp
ffffffff910e68b2:       e8 e9 ea 8e f0          call   ffffffff819d53a0 <irq_enter_rcu>
ffffffff910e68b7:       48 89 df                mov    %rbx,%rdi
ffffffff910e68ba:       e8 81 a5 80 f0          call   ffffffff818f0e40 <__sysvec_apic_timer_interrupt>
ffffffff910e68bf:       e8 fc ec 8e f0          call   ffffffff819d55c0 <irq_exit_rcu>
ffffffff910e68c4:       5c                      pop    %rsp
ffffffff910e68c5:       65 c6 05 65 f7 76 04 00 movb   $0x0,%gs:0x476f765(%rip)        # ffffffff95856032 <hardirq_stack_inuse>
ffffffff910e68cd:       eb b6                   jmp    ffffffff910e6885 <sysvec_apic_timer_interrupt+0x45>
ffffffff910e68cf:       90                      nop

Purely register state in a function which is not instrumented to begin with.

Now let's look at irqentry_enter():

noinstr irqentry_enter(regs)
{
  // Not observable stack or register state
  irqentry_state_t state = ....;
  ....
  return state;
}


ffffffff910e7000 <irqentry_enter>:
ffffffff910e7000:       f3 0f 1e fa             endbr64
ffffffff910e7004:       55                      push   %rbp
ffffffff910e7005:       48 89 e5                mov    %rsp,%rbp
ffffffff910e7008:       53                      push   %rbx
ffffffff910e7009:       f6 87 88 00 00 00 03    testb  $0x3,0x88(%rdi)				// check for kernel entry
ffffffff910e7010:       74 59                   je     ffffffff910e706b <irqentry_enter+0x6b>   // branch taken 

...

ffffffff910e706b:       65 48 8b 05 95 ef 76 04 mov    %gs:0x476ef95(%rip),%rax        # ffffffff95856008 <current_task>
ffffffff910e7073:       f6 40 2c 02             testb  $0x2,0x2c(%rax)				       // is_idle_task()?
ffffffff910e7077:       75 0b                   jne    ffffffff910e7084 <irqentry_enter+0x84>	
ffffffff910e7079:       90                      nop
ffffffff910e707a:       e8 91 99 a0 f1          call   ffffffff82af0a10 <kmsan_unpoison_entry_regs>    // Unpoisons regs
ffffffff910e707f:       90                      nop
ffffffff910e7080:       31 c0                   xor    %eax,%eax				       // state = 0
ffffffff910e7082:       eb 14                   jmp    ffffffff910e7098 <irqentry_enter+0x98>          // return
ffffffff910e7084:       48 89 fb                mov    %rdi,%rbx				       // Preserve 'regs'	  
ffffffff910e7087:       e8 44 0c 00 00          call   ffffffff910e7cd0 <ct_irq_enter>
ffffffff910e708c:       90                      nop
ffffffff910e708d:       48 89 df                mov    %rbx,%rdi				      // Restore 'regs'
ffffffff910e7090:       e8 7b 99 a0 f1          call   ffffffff82af0a10 <kmsan_unpoison_entry_regs>   // Unpoisons 'regs'
ffffffff910e7095:       90                      nop
ffffffff910e7096:       b0 01                   mov    $0x1,%al					      // state = 1
ffffffff910e7098:       5b                      pop    %rbx					      // return
ffffffff910e7099:       5d                      pop    %rbp
ffffffff910e709a:       2e e9 10 e5 03 00       cs jmp ffffffff911255b0 <__pi___x86_return_thunk>

As this just returns state constants in RAX, there is zero related KMSAN memory state, right?

Now irqentry_exit:

noinstr irqentry_exit(regs, state)
{
	....
	irqentry_exit_to_kernel_mode_preempt(regs, state);
}

ffffffff910e70e0 <irqentry_exit>:
ffffffff910e70e0:       f3 0f 1e fa             endbr64
ffffffff910e70e4:       55                      push   %rbp
ffffffff910e70e5:       48 89 e5                mov    %rsp,%rbp
ffffffff910e70e8:       41 57                   push   %r15
ffffffff910e70ea:       41 56                   push   %r14
ffffffff910e70ec:       41 55                   push   %r13
ffffffff910e70ee:       41 54                   push   %r12
ffffffff910e70f0:       53                      push   %rbx
ffffffff910e70f1:       48 83 ec 68             sub    $0x68,%rsp
ffffffff910e70f5:       49 89 fe                mov    %rdi,%r14	// Save 'regs'
ffffffff910e70f8:       f6 87 88 00 00 00 03    testb  $0x3,0x88(%rdi)	// return to kernel mode
ffffffff910e70ff:       74 4c                   je     ffffffff910e714d <irqentry_exit+0x6d>  // branch taken
...
ffffffff910e714d:       89 f3                   mov    %esi,%ebx	// Save 'state'
ffffffff910e714f:       90                      nop
ffffffff910e7150:       0f b6 f3                movzbl %bl,%esi
ffffffff910e7153:       4c 89 f7                mov    %r14,%rdi
ffffffff910e7156:       e8 e5 00 cc f0          call   ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>
ffffffff910e715b:       90                      nop

Again. 'state' is a pure register value, which is handed to
irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().

But KMSAN magically associates a memory access which it and then claims
it belongs to a SKB which was allocated in the interrupted code.

What Mark's change actually does is to make the register value 'state'
observable in an instrumented function, while before that 'state' was
always confined in the non instrumentable code.

But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
is a pure register value, which could be even a constant supplied by the
caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.

I just built the same .config with clang-19 and it is hallucinating in
the same way, though the actual assembly is more readable than the
syzbot clang-21 variant:

0000000000002710 <irqentry_exit_to_kernel_mode_preempt>:
    2710:       55                      push   %rbp
    2711:       48 89 e5                mov    %rsp,%rbp
    2714:       41 57                   push   %r15
    2716:       41 56                   push   %r14
    2718:       41 55                   push   %r13
    271a:       41 54                   push   %r12
    271c:       53                      push   %rbx
    271d:       89 f3                   mov    %esi,%ebx	// save 'state' in EBX
    271f:       49 89 ff                mov    %rdi,%r15	// save 'regs' in R15
    2722:       e8 00 00 00 00          call   2727 <irqentry_exit_to_kernel_mode_preempt+0x17> 2723: R_X86_64_PLT32    __msan_get_context_state-0x4
    2727:       44 0f b6 60 08          movzbl 0x8(%rax),%r12d	  // Magic KMSAN data related to 'state'
    272c:       44 8b b0 90 0c 00 00    mov    0xc90(%rax),%r14d  // Magic KMSAN data related to 'regs::eflags'
    2733:       4d 8b af 90 00 00 00    mov    0x90(%r15),%r13	  // Read the actual regs->eflags
    273a:       49 81 c7 90 00 00 00    add    $0x90,%r15	  // Adjust pointer to regs::eflags
    2741:       4c 89 ff                mov    %r15,%rdi
    2744:       e8 00 00 00 00          call   2749 <irqentry_exit_to_kernel_mode_preempt+0x39> 2745: R_X86_64_PLT32    __msan_metadata_ptr_for_load_8-0x4
    2749:       48 8b 00                mov    (%rax),%rax	  // Read the KMSAN state
    274c:       25 00 02 00 00          and    $0x200,%eax        // Isolate bit 9 (IF?)

    2751:       41 81 e5 00 02 00 00    and    $0x200,%r13d	  // Isolate IF in the actual value
    2758:       48 85 c0                test   %rax,%rax	  // Test whether the KMSAN IF state is 0
    275b:       74 07                   je     2764 <irqentry_exit_to_kernel_mode_preempt+0x54>
								  // Not zero
    275d:       f7 d0                   not    %eax		  // negate the KMSAN state
    275f:       44 21 e8                and    %r13d,%eax	  // And the actual value (IF isolated)
    		      			       			  // If zero, goto MSAN warning
    2762:       74 25                   je     2789 <irqentry_exit_to_kernel_mode_preempt+0x79>
    
    2764:       4d 85 ed                test   %r13,%r13	  // Check whether IF is set
    		      			       			  // If not, return
    2767:       74 10                   je     2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2769:       41 f6 c4 01             test   $0x1,%r12b	  // Check the magic KMSAN state (related to 'state')
		      	 		       			  // If not zero, goto MSAN warning   
    276d:       75 28                   jne    2797 <irqentry_exit_to_kernel_mode_preempt+0x87>

    276f:       f6 c3 01                test   $0x1,%bl		  // Check state.rcu_exit
    		      			       			  // If not, return
    2772:       75 05                   jne    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2774:       e8 00 00 00 00          call   2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2775: R_X86_64_PLT32    __SCT__irqentry_exit_cond_resched-0x4
    2779:       5b                      pop    %rbx
    277a:       41 5c                   pop    %r12
    277c:       41 5d                   pop    %r13
    277e:       41 5e                   pop    %r14
    2780:       41 5f                   pop    %r15
    2782:       5d                      pop    %rbp
    2783:       2e e9 00 00 00 00       cs jmp 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2785: R_X86_64_PLT32    __x86_return_thunk-0x4
    2789:       8b 3a                   mov    (%rdx),%edi
    278b:       e8 00 00 00 00          call   2790 <irqentry_exit_to_kernel_mode_preempt+0x80> 278c: R_X86_64_PLT32    __msan_warning-0x4
    2790:       4d 85 ed                test   %r13,%r13
    2793:       75 d4                   jne    2769 <irqentry_exit_to_kernel_mode_preempt+0x59>
    2795:       eb e2                   jmp    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>
    2797:       44 89 f7                mov    %r14d,%edi
    279a:       e8 00 00 00 00          call   279f <irqentry_exit_to_kernel_mode_preempt+0x8f> 279b: R_X86_64_PLT32    __msan_warning-0x4
    279f:       f6 c3 01                test   $0x1,%bl
    27a2:       74 d0                   je     2774 <irqentry_exit_to_kernel_mode_preempt+0x64>
    27a4:       eb d3                   jmp    2779 <irqentry_exit_to_kernel_mode_preempt+0x69>

The compiler _cannot_ assume anything about the 'state' argument as
that's handed in as value in RSI from a completely different compilation
unit.

Something is wrong in KMSAN/compiler land or do you still believe that
you just need to unpoison the non existing memory 'state'?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-11 12:25     ` Thomas Gleixner
@ 2026-05-12  9:33       ` Alexander Potapenko
  2026-05-12 11:15         ` Mark Rutland
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2026-05-12  9:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Vyukov, Mark Rutland, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
>

Hi Thomas,

thanks for diving into this!

> > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > and `state`.
> > Because there is a lot of non-instrumented code around, we fail to
> > initialize these variables.
> > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > the state.
> > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > at the same place.
>
> Good luck with unpoisoning 'state'. 'state' is not memory to begin with.

My bad. Normally, the compiler would happily move `state` to memory if it is
address-taken, and make sure that its shadow is tracked properly.
But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
a `noinstr` function.

<I omit the elaborate assembly analysis here, tipping my hat!>

> Again. 'state' is a pure register value, which is handed to
> irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().

There is no notion of a 'pure register value' in C, and the compiler may make
arbitrary decisions about whether a particular value is stored on the stack or
in the registers.

Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
level and can track the state of a value regardless of where it is stored.
In particular, it normally works for function return values - unless `noinstr`
kicks in.

>
> But KMSAN magically associates a memory access which it and then claims
> it belongs to a SKB which was allocated in the interrupted code.
>
> What Mark's change actually does is to make the register value 'state'
> observable in an instrumented function, while before that 'state' was
> always confined in the non instrumentable code.

Agreed.

> But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> is a pure register value, which could be even a constant supplied by the
> caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.

I disagree with a general implication that KMSAN has zero business to fiddle
with values passed in registers. But I agree we are doing a poor job trying
to pull the shadow for `state` out of thin air.

>
> The compiler _cannot_ assume anything about the 'state' argument as
> that's handed in as value in RSI from a completely different compilation
> unit.
>

Again, this only matters because we are calling an instrumented function from
a non-instrumented one, otherwise it's perfectly fine to call between
compilation units.

> Something is wrong in KMSAN/compiler land or do you still believe that
> you just need to unpoison the non existing memory 'state'?
>

When we call an instrumented function from a non-instrumented one, the compiler
is doomed to not understand that and to be unable to track the function
parameters properly. Exactly because `noinstr` implies no instrumentation
whatsoever, the compiler may not add any hints on the caller side that would
help the callee understand what's going on - even if KMSAN is able to see this
`noinstr` function (which is not always the case).

So what we could do is to add annotations manualy on either the caller side or
the callee side.

We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
making all its inputs initialized. This is the easiest solution, it may
introduce false negatives, but we are on a very thin ice anyway, so perhaps
doing so is better than dealing with more false positives in the interrupt code.

Another option for the callee would be applying `__always_inline`, so that
irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
Given that irqentry_exit_to_kernel_mode_after_preempt() is already
`__always_inline`, it might be the right thing to do.

On the caller side, we could do something creative with instrumentation_begin()
and instrumentation_end(). We've had a discussion about that exactly four years
ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
, but came to a conclusion that a handful of annotations on the noinstr/instr
boundary may do a better job than a solution that doesn't cover all cases.

In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
non-existing today) cases where two functions are called from an instrumented
region, and the first function somehow leaves the argument state poisoned.

Do you think it's worth revisiting the instrumentation_begin() approach, or
shall we go with one of the compiler attributes instead?

Thank you,
Alexander

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12  9:33       ` Alexander Potapenko
@ 2026-05-12 11:15         ` Mark Rutland
  2026-05-12 16:24           ` Alexander Potapenko
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2026-05-12 11:15 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote:
> On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
> > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > > and `state`.
> > > Because there is a lot of non-instrumented code around, we fail to
> > > initialize these variables.
> > > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > > the state.
> > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > > at the same place.
> >
> > Good luck with unpoisoning 'state'. 'state' is not memory to begin with.
> 
> My bad. Normally, the compiler would happily move `state` to memory if it is
> address-taken, and make sure that its shadow is tracked properly.
> But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
> a `noinstr` function.
> 
> <I omit the elaborate assembly analysis here, tipping my hat!>
> 
> > Again. 'state' is a pure register value, which is handed to
> > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().
> 
> There is no notion of a 'pure register value' in C, and the compiler may make
> arbitrary decisions about whether a particular value is stored on the stack or
> in the registers.
> 
> Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
> level and can track the state of a value regardless of where it is stored.
> In particular, it normally works for function return values - unless `noinstr`
> kicks in.
> 
> >
> > But KMSAN magically associates a memory access which it and then claims
> > it belongs to a SKB which was allocated in the interrupted code.
> >
> > What Mark's change actually does is to make the register value 'state'
> > observable in an instrumented function, while before that 'state' was
> > always confined in the non instrumentable code.
> 
> Agreed.
> 
> > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> > is a pure register value, which could be even a constant supplied by the
> > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.
> 
> I disagree with a general implication that KMSAN has zero business to fiddle
> with values passed in registers. But I agree we are doing a poor job trying
> to pull the shadow for `state` out of thin air.
> 
> > The compiler _cannot_ assume anything about the 'state' argument as
> > that's handed in as value in RSI from a completely different compilation
> > unit.
> 
> Again, this only matters because we are calling an instrumented function from
> a non-instrumented one, otherwise it's perfectly fine to call between
> compilation units.

For context, can you explain how this is expected to work across
compilation units when the caller and callee *are* instrumented, when an
argument is passed in a register?

Below you suggest that the caller might add "hints", but it's not clear
specifically what this means.

> > Something is wrong in KMSAN/compiler land or do you still believe that
> > you just need to unpoison the non existing memory 'state'?
> 
> When we call an instrumented function from a non-instrumented one, the compiler
> is doomed to not understand that and to be unable to track the function
> parameters properly. Exactly because `noinstr` implies no instrumentation
> whatsoever, the compiler may not add any hints on the caller side that would
> help the callee understand what's going on - even if KMSAN is able to see this
> `noinstr` function (which is not always the case).
>
> So what we could do is to add annotations manualy on either the caller side or
> the callee side.

Without some understanding of those "hints" you mention, I don't see how
we can do that on the caller side.

> We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> making all its inputs initialized. This is the easiest solution, it may
> introduce false negatives, but we are on a very thin ice anyway, so perhaps
> doing so is better than dealing with more false positives in the interrupt code.
> 
> Another option for the callee would be applying `__always_inline`, so that
> irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> `__always_inline`, it might be the right thing to do.

We can do that, but this really suggests that there's a fundemantal
inability to pass arguments between code which is noinstr and code which
isinstrumented with AddressSanitizer, and that's inevitably going to
bite us in future.

> On the caller side, we could do something creative with instrumentation_begin()
> and instrumentation_end(). We've had a discussion about that exactly four years
> ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> , but came to a conclusion that a handful of annotations on the noinstr/instr
> boundary may do a better job than a solution that doesn't cover all cases.

That doesn't look general at all, so I am not keen on that.

> In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> non-existing today) cases where two functions are called from an instrumented
> region, and the first function somehow leaves the argument state poisoned.

How exactly is kmsan_context_state used?

If that's supposed to carry some global or current context, surely
blatting that in entry code will affect the code that was interrupted?

I see kmsan_get_context() has an in_task() check, but that can't help
with nested exceptions, so this doesn't look right at all.

> Do you think it's worth revisiting the instrumentation_begin() approach, or
> shall we go with one of the compiler attributes instead?

I think we need a better understanding of this first.

It looks to me that there are bigger problems here.

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 11:15         ` Mark Rutland
@ 2026-05-12 16:24           ` Alexander Potapenko
  2026-05-12 17:46             ` Mark Rutland
  2026-05-13  0:36             ` Thomas Gleixner
  0 siblings, 2 replies; 18+ messages in thread
From: Alexander Potapenko @ 2026-05-12 16:24 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote:
> > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote:
> > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs`
> > > > and `state`.
> > > > Because there is a lot of non-instrumented code around, we fail to
> > > > initialize these variables.
> > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls
> > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not
> > > > the state.
> > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))`
> > > > at the same place.
> > >
> > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with.
> >
> > My bad. Normally, the compiler would happily move `state` to memory if it is
> > address-taken, and make sure that its shadow is tracked properly.
> > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into
> > a `noinstr` function.
> >
> > <I omit the elaborate assembly analysis here, tipping my hat!>
> >
> > > Again. 'state' is a pure register value, which is handed to
> > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt().
> >
> > There is no notion of a 'pure register value' in C, and the compiler may make
> > arbitrary decisions about whether a particular value is stored on the stack or
> > in the registers.
> >
> > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR
> > level and can track the state of a value regardless of where it is stored.
> > In particular, it normally works for function return values - unless `noinstr`
> > kicks in.
> >
> > >
> > > But KMSAN magically associates a memory access which it and then claims
> > > it belongs to a SKB which was allocated in the interrupted code.
> > >
> > > What Mark's change actually does is to make the register value 'state'
> > > observable in an instrumented function, while before that 'state' was
> > > always confined in the non instrumentable code.
> >
> > Agreed.
> >
> > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt()
> > > is a pure register value, which could be even a constant supplied by the
> > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it.
> >
> > I disagree with a general implication that KMSAN has zero business to fiddle
> > with values passed in registers. But I agree we are doing a poor job trying
> > to pull the shadow for `state` out of thin air.
> >
> > > The compiler _cannot_ assume anything about the 'state' argument as
> > > that's handed in as value in RSI from a completely different compilation
> > > unit.
> >
> > Again, this only matters because we are calling an instrumented function from
> > a non-instrumented one, otherwise it's perfectly fine to call between
> > compilation units.
>
> For context, can you explain how this is expected to work across
> compilation units when the caller and callee *are* instrumented, when an
> argument is passed in a register?

Per KMSAN ABI, the metadata for the function arguments is passed via
buffers in the so-called context state (see
include/linux/kmsan_types.h)
In the userspace, these buffers are thread-local variables referenced
by inline loads and stores.
In the kernel space, the compiler inserts a call
__msan_get_context_state() at the beginning of every function, and
then the instrumentation code uses whatever that function returned.

Assume we have a function:

int sum(int a, int b) {
    result = a + b;
    return result;
}

Its instrumented version looks roughly as follows (we'll omit origin
tracking for simplicity):

int sum(int a, int b) {
  struct kmsan_context_state *kcs = __msan_get_context_state();
  int s_a = ((int)kcs->param_tls)[0];  // shadow of a
  int s_b = ((int)kcs->param_tls)[1];  // shadow of b
  result = a + b;
  s_result = s_a | s_b;
  ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
  return result;
}

Most certainly `a` and `b` will be passed using registers, but that
doesn't matter: their metadata is safe as long as the caller does:

((int)kcs->param_tls)[0] = s_a;
((int)kcs->param_tls)[1] = s_b;
result = sum(a, b);
s_result = ((int)kcs->retval_tls)[0];

All the metadata tracking is implemented using an LLVM IR pass, which
ensures that the shadow for each uninitialized value is tracked
regardless of where that value is stored - be that heap memory, stack
or registers:
- when a value is created or loaded from memory, the compiler inserts
SSA registers corresponding to that value's shadow;
- when a value is written to memory, a shadow store is inserted;
- when a value is used in an operation, the result of that operation
receives a shadow value depending on the operands and their shadow
values;
- when a value is passed to a function, the corresponding context
state stores and loads are emitted.

Now, this works fine under the assumption that all code in the kernel
is instrumented with KMSAN.
However this is not true.

1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some
prevent infinite recursion caused by KMSAN calling instrumented code;
others disable instrumentation of the code that executes before KMSAN
is fully set up.
2. There are annotations to soft-disable KMSAN for a particular
function by adding a `__no_kmsan_checks` attribute. There will still
be instrumentation, but all the function outputs (stores and returns)
will be initialized, and no errors will be reported.
This is handy when the compiler cannot properly instrument the
underlying code. One example would be tricky inline assembly, another
one - stack walking functions that deliberately read and write
uninitialized values.
3. There are cases in which KMSAN must be disabled for the whole
function (`__no_sanitize_memory`).
We disable instrumentation for `noinstr` functions. Additionally, on
x86 there is exactly one case where this is done to avoid infinite
recursion when storing the origins.

It's important to note that, due to how function attributes are
implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will
be applied to a function if that function exists during the KMSAN
instrumentation pass. If it was inlined before that pass, the compiler
will apply the instrumentation rules for whichever function is calling
the inlined one.

Now, because of the described ABI, calls between instrumented
functions and/or functions marked `__no_kmsan_checks` are perfectly
fine because the function arguments are stored and loaded on both
ends.
What does not work well is calling non-instrumented functions from
instrumented functions, and vice versa.
In the former case, KMSAN will not see the callee's side effects
(return values or memory stores), in the latter case, the callee may
receive incorrect shadow values for the function parameters or memory
stores in the caller. Both will
We have few tools to deal with such cases. It often helps to move the
border between the instrumented and non-instrumented code by applying
__no_kmsan_checks or __no_sanitize_memory until we get to a point
where there are no incorrect shadow arguments.
Another way to deal with uninitialized data coming from
non-instrumented code is kmsan_unpoison_memory(address, size).
Unfortunately, as Thomas pointed out, we can't use it for locals in
non-instrumented code.

> Below you suggest that the caller might add "hints", but it's not clear
> specifically what this means.
>
> > > Something is wrong in KMSAN/compiler land or do you still believe that
> > > you just need to unpoison the non existing memory 'state'?
> >
> > When we call an instrumented function from a non-instrumented one, the compiler
> > is doomed to not understand that and to be unable to track the function
> > parameters properly. Exactly because `noinstr` implies no instrumentation
> > whatsoever, the compiler may not add any hints on the caller side that would
> > help the callee understand what's going on - even if KMSAN is able to see this
> > `noinstr` function (which is not always the case).
> >
> > So what we could do is to add annotations manualy on either the caller side or
> > the callee side.
>
> Without some understanding of those "hints" you mention, I don't see how
> we can do that on the caller side.

In this case by "hints" I meant any kind of instrumentation that the
compiler could have inserted automatically to mark the data in the
instrumented functions as initialized.
There are no such "hints" right now (apart from letting KMSAN
instrument the function).
As for the manual annotations, we only have the mentioned
`__no_kmsan_checks`, `__no_sanitize_memory`, and
kmsan_unpoison_memory().

>
> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > making all its inputs initialized. This is the easiest solution, it may
> > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > doing so is better than dealing with more false positives in the interrupt code.
> >
> > Another option for the callee would be applying `__always_inline`, so that
> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > `__always_inline`, it might be the right thing to do.
>
> We can do that, but this really suggests that there's a fundemantal
> inability to pass arguments between code which is noinstr and code which
> isinstrumented with AddressSanitizer, and that's inevitably going to
> bite us in future.

This is true (assuming you mean MemorySanitizer), but:
- I don't think we can avoid that, given that there will always be
non-instrumented code;
- so far the number of annotations has been manageable.

>
> > On the caller side, we could do something creative with instrumentation_begin()
> > and instrumentation_end(). We've had a discussion about that exactly four years
> > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> > , but came to a conclusion that a handful of annotations on the noinstr/instr
> > boundary may do a better job than a solution that doesn't cover all cases.
>
> That doesn't look general at all, so I am not keen on that.

Ack.

> > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> > non-existing today) cases where two functions are called from an instrumented
> > region, and the first function somehow leaves the argument state poisoned.
>
> How exactly is kmsan_context_state used?

Hope the explanation above helps.

>
> If that's supposed to carry some global or current context, surely
> blatting that in entry code will affect the code that was interrupted?

It will mostly affect the bottom-level function called by the entry
code, after that, nested functions will be just passing their
parameters as per KMSAN ABI.

> I see kmsan_get_context() has an in_task() check, but that can't help
> with nested exceptions, so this doesn't look right at all.

KMSAN context is per-task, and if !in_task(), it is per-CPU.

For the nested exceptions, we conservatively bail out (see
kmsan_in_runtime()) to avoid deadlocking.
This may cause false negatives.

> > Do you think it's worth revisiting the instrumentation_begin() approach, or
> > shall we go with one of the compiler attributes instead?

> I think we need a better understanding of this first.
>
> It looks to me that there are bigger problems here.
>

I'll be happy to discuss this further.

--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 16:24           ` Alexander Potapenko
@ 2026-05-12 17:46             ` Mark Rutland
  2026-05-22  6:26               ` Alexander Potapenko
  2026-05-13  0:36             ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Rutland @ 2026-05-12 17:46 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

Hi,

Thanks for this; the explanation of the KMSAN ABI is *very* helpful.

I have a few questions and comments inline below.

On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > For context, can you explain how this is expected to work across
> > compilation units when the caller and callee *are* instrumented, when an
> > argument is passed in a register?
> 
> Per KMSAN ABI, the metadata for the function arguments is passed via
> buffers in the so-called context state (see
> include/linux/kmsan_types.h)
> In the userspace, these buffers are thread-local variables referenced
> by inline loads and stores.
> In the kernel space, the compiler inserts a call
> __msan_get_context_state() at the beginning of every function, and
> then the instrumentation code uses whatever that function returned.
> 
> Assume we have a function:
> 
> int sum(int a, int b) {
>     result = a + b;
>     return result;
> }
> 
> Its instrumented version looks roughly as follows (we'll omit origin
> tracking for simplicity):
> 
> int sum(int a, int b) {
>   struct kmsan_context_state *kcs = __msan_get_context_state();
>   int s_a = ((int)kcs->param_tls)[0];  // shadow of a
>   int s_b = ((int)kcs->param_tls)[1];  // shadow of b
>   result = a + b;
>   s_result = s_a | s_b;
>   ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
>   return result;
> }
> 
> Most certainly `a` and `b` will be passed using registers, but that
> doesn't matter: their metadata is safe as long as the caller does:
> 
> ((int)kcs->param_tls)[0] = s_a;
> ((int)kcs->param_tls)[1] = s_b;
> result = sum(a, b);
> s_result = ((int)kcs->retval_tls)[0];

Ok. I see how that works when both caller and callee are instrumented.

I assume that's tracked for all arguments regardless of type? e.g. that
includes pointers, structs, etc?

> All the metadata tracking is implemented using an LLVM IR pass, which
> ensures that the shadow for each uninitialized value is tracked
> regardless of where that value is stored - be that heap memory, stack
> or registers:
> - when a value is created or loaded from memory, the compiler inserts
> SSA registers corresponding to that value's shadow;
> - when a value is written to memory, a shadow store is inserted;
> - when a value is used in an operation, the result of that operation
> receives a shadow value depending on the operands and their shadow
> values;
> - when a value is passed to a function, the corresponding context
> state stores and loads are emitted.
> 
> Now, this works fine under the assumption that all code in the kernel
> is instrumented with KMSAN.
> However this is not true.
> 
> 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some
> prevent infinite recursion caused by KMSAN calling instrumented code;
> others disable instrumentation of the code that executes before KMSAN
> is fully set up.
> 2. There are annotations to soft-disable KMSAN for a particular
> function by adding a `__no_kmsan_checks` attribute. There will still
> be instrumentation, but all the function outputs (stores and returns)
> will be initialized, and no errors will be reported.
> This is handy when the compiler cannot properly instrument the
> underlying code. One example would be tricky inline assembly, another
> one - stack walking functions that deliberately read and write
> uninitialized values.
> 3. There are cases in which KMSAN must be disabled for the whole
> function (`__no_sanitize_memory`).
> We disable instrumentation for `noinstr` functions. Additionally, on
> x86 there is exactly one case where this is done to avoid infinite
> recursion when storing the origins.

In addition to the above cases, out-of-line assembly functions won't
manipulate the shadow (so they're effectively the same as noinstr).

> It's important to note that, due to how function attributes are
> implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will
> be applied to a function if that function exists during the KMSAN
> instrumentation pass. If it was inlined before that pass, the compiler
> will apply the instrumentation rules for whichever function is calling
> the inlined one.
> 
> Now, because of the described ABI, calls between instrumented
> functions and/or functions marked `__no_kmsan_checks` are perfectly
> fine because the function arguments are stored and loaded on both
> ends.
> What does not work well is calling non-instrumented functions from
> instrumented functions, and vice versa.

Yep; understood.

> In the former case, KMSAN will not see the callee's side effects
> (return values or memory stores), in the latter case, the callee may
> receive incorrect shadow values for the function parameters or memory
> stores in the caller. Both will

Incomplete sentence here?

> We have few tools to deal with such cases. It often helps to move the
> border between the instrumented and non-instrumented code by applying
> __no_kmsan_checks or __no_sanitize_memory until we get to a point
> where there are no incorrect shadow arguments.
> Another way to deal with uninitialized data coming from
> non-instrumented code is kmsan_unpoison_memory(address, size).
> Unfortunately, as Thomas pointed out, we can't use it for locals in
> non-instrumented code.

It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
and I think it's non-sensical because conceptually they aren't memory.

I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
regs. Surely the pointer argument itself needs to be unpoisoned before
being passed to a callee?

Otherwise, when pointer to regs is passed as an argument to a callee,
the callee will consume stale shadow, which might indicate that the
pointer to regs is uninitialized?

That seems to be exactly what we're hitting with the irq entry state, so
I'm surprised we're not hitting it for the regs. Is that sheer luck, or
is something else protecting us?

> > Below you suggest that the caller might add "hints", but it's not clear
> > specifically what this means.
> >
> > > > Something is wrong in KMSAN/compiler land or do you still believe that
> > > > you just need to unpoison the non existing memory 'state'?
> > >
> > > When we call an instrumented function from a non-instrumented one, the compiler
> > > is doomed to not understand that and to be unable to track the function
> > > parameters properly. Exactly because `noinstr` implies no instrumentation
> > > whatsoever, the compiler may not add any hints on the caller side that would
> > > help the callee understand what's going on - even if KMSAN is able to see this
> > > `noinstr` function (which is not always the case).
> > >
> > > So what we could do is to add annotations manualy on either the caller side or
> > > the callee side.
> >
> > Without some understanding of those "hints" you mention, I don't see how
> > we can do that on the caller side.
> 
> In this case by "hints" I meant any kind of instrumentation that the
> compiler could have inserted automatically to mark the data in the
> instrumented functions as initialized.
> There are no such "hints" right now (apart from letting KMSAN
> instrument the function).
> As for the manual annotations, we only have the mentioned
> `__no_kmsan_checks`, `__no_sanitize_memory`, and
> kmsan_unpoison_memory().

Ok.

For noinstr code calling instrumented code, I see that we could
explicitly modify the argument shadow in the noinstr caller, but that's
*really* grotty, and I don't think we want to do that.

> > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > making all its inputs initialized. This is the easiest solution, it may
> > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > doing so is better than dealing with more false positives in the interrupt code.
> > >
> > > Another option for the callee would be applying `__always_inline`, so that
> > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > `__always_inline`, it might be the right thing to do.
> >
> > We can do that, but this really suggests that there's a fundemantal
> > inability to pass arguments between code which is noinstr and code which
> > isinstrumented with AddressSanitizer, and that's inevitably going to
> > bite us in future.
> 
> This is true (assuming you mean MemorySanitizer), but:

Sorry, yes, I meant MemorySanitizer.

> - I don't think we can avoid that, given that there will always be
> non-instrumented code;
> - so far the number of annotations has been manageable.

As above, I suspect that we're actually missing many necessary
annotations and getting away with this by accident.

I suspect that we need additional compiler support to say something like
"assume this argument/return value IS initialized".

> > > On the caller side, we could do something creative with instrumentation_begin()
> > > and instrumentation_end(). We've had a discussion about that exactly four years
> > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u
> > > , but came to a conclusion that a handful of annotations on the noinstr/instr
> > > boundary may do a better job than a solution that doesn't cover all cases.
> >
> > That doesn't look general at all, so I am not keen on that.
> 
> Ack.
> 
> > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have
> > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in
> > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and
> > > non-existing today) cases where two functions are called from an instrumented
> > > region, and the first function somehow leaves the argument state poisoned.
> >
> > How exactly is kmsan_context_state used?
> 
> Hope the explanation above helps.
> >
> > If that's supposed to carry some global or current context, surely
> > blatting that in entry code will affect the code that was interrupted?
> 
> It will mostly affect the bottom-level function called by the entry
> code, after that, nested functions will be just passing their
> parameters as per KMSAN ABI.

The problem I was explaining was the other way around: we clobber the
context of the function that was executing *before* exception entry.

Consider when the CPU is executing some function foo(), and an exception
is taken. Within the exception handler, code will clobber the context
state. AFAICT, that state is not saved/restored, and nor is it zeroed
prior to exception return. The exception handler returns back to the
middle of foo(). From foo's PoV, the context state has been clobbered
arbitrarily, and that clobbered state could trigger false positives or
false negatives.

The way kmsan_get_context() uses in_task() will avoid that in *some*
cases, but not *all* cases. In particular, I don't think that's going to
help when a fault is taken for a uaccess.

Maybe I'm missing something here?

> > I see kmsan_get_context() has an in_task() check, but that can't help
> > with nested exceptions, so this doesn't look right at all.
> 
> KMSAN context is per-task, and if !in_task(), it is per-CPU.
> 
> For the nested exceptions, we conservatively bail out (see
> kmsan_in_runtime()) to avoid deadlocking.
> This may cause false negatives.

AFAICT that doesn't address the case I've described above.

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 17:46             ` Mark Rutland
@ 2026-05-22  6:26               ` Alexander Potapenko
  2026-06-02 11:53                 ` Mark Rutland
  0 siblings, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2026-05-22  6:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi,
>
> Thanks for this; the explanation of the KMSAN ABI is *very* helpful.
>
> I have a few questions and comments inline below.
>
> On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > For context, can you explain how this is expected to work across
> > > compilation units when the caller and callee *are* instrumented, when an
> > > argument is passed in a register?
> >
> > Per KMSAN ABI, the metadata for the function arguments is passed via
> > buffers in the so-called context state (see
> > include/linux/kmsan_types.h)
> > In the userspace, these buffers are thread-local variables referenced
> > by inline loads and stores.
> > In the kernel space, the compiler inserts a call
> > __msan_get_context_state() at the beginning of every function, and
> > then the instrumentation code uses whatever that function returned.
> >
> > Assume we have a function:
> >
> > int sum(int a, int b) {
> >     result = a + b;
> >     return result;
> > }
> >
> > Its instrumented version looks roughly as follows (we'll omit origin
> > tracking for simplicity):
> >
> > int sum(int a, int b) {
> >   struct kmsan_context_state *kcs = __msan_get_context_state();
> >   int s_a = ((int)kcs->param_tls)[0];  // shadow of a
> >   int s_b = ((int)kcs->param_tls)[1];  // shadow of b
> >   result = a + b;
> >   s_result = s_a | s_b;
> >   ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
> >   return result;
> > }
> >
> > Most certainly `a` and `b` will be passed using registers, but that
> > doesn't matter: their metadata is safe as long as the caller does:
> >
> > ((int)kcs->param_tls)[0] = s_a;
> > ((int)kcs->param_tls)[1] = s_b;
> > result = sum(a, b);
> > s_result = ((int)kcs->retval_tls)[0];
>
> Ok. I see how that works when both caller and callee are instrumented.
>
> I assume that's tracked for all arguments regardless of type? e.g. that
> includes pointers, structs, etc?

That's tracked for all parameters passed to a function, whether on the
stack or in registers.
This includes cases when a struct can be passed by value, e.g.:

struct arg {
  int a, b;
};

int sum(struct arg arg) {
  return arg.a + arg.b;
}

For anything passed to the function by pointer, we only track the
initializedness of the pointer itself at the call site and the
function prologue.
Why is that? Because we already track the state of the pointed-to
memory elsewhere.
For every heap allocation, there is shadow memory backing it, which is
initially poisoned (unless it is a GFP_ZERO allocation).
Similarly, shadow memory backs every stack allocation. The state of
stack allocations is initially set by __msan_poison_alloca().
When we write an initialized value to a memory buffer, the
instrumentation code updates that buffer's shadow memory.

So the fact that a buffer is passed to a function by pointer does not
change that buffer's state; that's why we don't need additional
tracking of the pointed-to memory around the calls.
(this is all assuming both the caller and the callee are instrumented).

> > 3. There are cases in which KMSAN must be disabled for the whole
> > function (`__no_sanitize_memory`).
> > We disable instrumentation for `noinstr` functions. Additionally, on
> > x86 there is exactly one case where this is done to avoid infinite
> > recursion when storing the origins.
>
> In addition to the above cases, out-of-line assembly functions won't
> manipulate the shadow (so they're effectively the same as noinstr).

This is correct.


> > In the former case, KMSAN will not see the callee's side effects
> > (return values or memory stores), in the latter case, the callee may
> > receive incorrect shadow values for the function parameters or memory
> > stores in the caller. Both will
>
> Incomplete sentence here?

Yeah, sorry, disregard this.

>
> > We have few tools to deal with such cases. It often helps to move the
> > border between the instrumented and non-instrumented code by applying
> > __no_kmsan_checks or __no_sanitize_memory until we get to a point
> > where there are no incorrect shadow arguments.
> > Another way to deal with uninitialized data coming from
> > non-instrumented code is kmsan_unpoison_memory(address, size).
> > Unfortunately, as Thomas pointed out, we can't use it for locals in
> > non-instrumented code.
>
> It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
> and I think it's non-sensical because conceptually they aren't memory.
>
> I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
> regs. Surely the pointer argument itself needs to be unpoisoned before
> being passed to a callee?
>
> Otherwise, when pointer to regs is passed as an argument to a callee,
> the callee will consume stale shadow, which might indicate that the
> pointer to regs is uninitialized?
>
> That seems to be exactly what we're hitting with the irq entry state, so
> I'm surprised we're not hitting it for the regs. Is that sheer luck, or
> is something else protecting us?

Struct regs is created by non-instrumented code on the stack, that's
why its shadow contains whatever garbage that stack slot had when
leaving instrumented code.
For the data pointed to by individual registers, there are usually two
possibilities:
1. It is allocated by instrumented code and already has some state,
and struct regs is being used to transport it between instrumented
regions.
2. It is allocated by non-instrumented code, belongs to the same shim
that created it, and instrumented code is not supposed to touch it.

So I think this is not "sheer luck", but rather the fact that
non-instrumented code doesn't allocate memory.


> > In this case by "hints" I meant any kind of instrumentation that the
> > compiler could have inserted automatically to mark the data in the
> > instrumented functions as initialized.
> > There are no such "hints" right now (apart from letting KMSAN
> > instrument the function).
> > As for the manual annotations, we only have the mentioned
> > `__no_kmsan_checks`, `__no_sanitize_memory`, and
> > kmsan_unpoison_memory().
>
> Ok.
>
> For noinstr code calling instrumented code, I see that we could
> explicitly modify the argument shadow in the noinstr caller, but that's
> *really* grotty, and I don't think we want to do that.

Agreed.

>
> > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > > making all its inputs initialized. This is the easiest solution, it may
> > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > > doing so is better than dealing with more false positives in the interrupt code.
> > > >
> > > > Another option for the callee would be applying `__always_inline`, so that
> > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > > `__always_inline`, it might be the right thing to do.
> > >
> > > We can do that, but this really suggests that there's a fundemantal
> > > inability to pass arguments between code which is noinstr and code which
> > > isinstrumented with AddressSanitizer, and that's inevitably going to
> > > bite us in future.
> >
> > This is true (assuming you mean MemorySanitizer), but:
>
> Sorry, yes, I meant MemorySanitizer.
>
> > - I don't think we can avoid that, given that there will always be
> > non-instrumented code;
> > - so far the number of annotations has been manageable.
>
> As above, I suspect that we're actually missing many necessary
> annotations and getting away with this by accident.
>
> I suspect that we need additional compiler support to say something like
> "assume this argument/return value IS initialized".

There is already a more generic one saying "assume all
arguments/return values are uninitialized" - that's __no_kmsan_checks.
Do we need a narrower attribute knowing that for a call from
uninstrumented code we'll anyway need to mark all the arguments?


> > Hope the explanation above helps.
> > >
> > > If that's supposed to carry some global or current context, surely
> > > blatting that in entry code will affect the code that was interrupted?
> >
> > It will mostly affect the bottom-level function called by the entry
> > code, after that, nested functions will be just passing their
> > parameters as per KMSAN ABI.
>
> The problem I was explaining was the other way around: we clobber the
> context of the function that was executing *before* exception entry.
>
> Consider when the CPU is executing some function foo(), and an exception
> is taken. Within the exception handler, code will clobber the context
> state. AFAICT, that state is not saved/restored, and nor is it zeroed
> prior to exception return. The exception handler returns back to the
> middle of foo(). From foo's PoV, the context state has been clobbered
> arbitrarily, and that clobbered state could trigger false positives or
> false negatives.
>
> The way kmsan_get_context() uses in_task() will avoid that in *some*
> cases, but not *all* cases. In particular, I don't think that's going to
> help when a fault is taken for a uaccess.
>
> Maybe I'm missing something here?
>
> > > I see kmsan_get_context() has an in_task() check, but that can't help
> > > with nested exceptions, so this doesn't look right at all.
> >
> > KMSAN context is per-task, and if !in_task(), it is per-CPU.
> >
> > For the nested exceptions, we conservatively bail out (see
> > kmsan_in_runtime()) to avoid deadlocking.
> > This may cause false negatives.
>
> AFAICT that doesn't address the case I've described above.
>
> Mark.

You are right, it's quite messy for uaccess, and maybe other cases as well.
In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq()
to switch between different per-cpu contexts, but that didn't work out
well, because preempt_count() could change in the middle of certain
low-level functions (which the instrumentation ignored, as the
reference to the context was only calculated in the prologue).
I think this part indeed works by sheer luck.
We need an annotation telling the tool where an interrupt context
starts and ends. On the compiler side, we'll probably also need an
intrinsic to reload the context pointer.


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-22  6:26               ` Alexander Potapenko
@ 2026-06-02 11:53                 ` Mark Rutland
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Rutland @ 2026-06-02 11:53 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Fri, May 22, 2026 at 08:26:30AM +0200, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote:
> > > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
> > > > For context, can you explain how this is expected to work across
> > > > compilation units when the caller and callee *are* instrumented, when an
> > > > argument is passed in a register?
> > >
> > > Per KMSAN ABI, the metadata for the function arguments is passed via
> > > buffers in the so-called context state (see
> > > include/linux/kmsan_types.h)
> > > In the userspace, these buffers are thread-local variables referenced
> > > by inline loads and stores.
> > > In the kernel space, the compiler inserts a call
> > > __msan_get_context_state() at the beginning of every function, and
> > > then the instrumentation code uses whatever that function returned.
> > >
> > > Assume we have a function:
> > >
> > > int sum(int a, int b) {
> > >     result = a + b;
> > >     return result;
> > > }
> > >
> > > Its instrumented version looks roughly as follows (we'll omit origin
> > > tracking for simplicity):
> > >
> > > int sum(int a, int b) {
> > >   struct kmsan_context_state *kcs = __msan_get_context_state();
> > >   int s_a = ((int)kcs->param_tls)[0];  // shadow of a
> > >   int s_b = ((int)kcs->param_tls)[1];  // shadow of b
> > >   result = a + b;
> > >   s_result = s_a | s_b;
> > >   ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
> > >   return result;
> > > }
> > >
> > > Most certainly `a` and `b` will be passed using registers, but that
> > > doesn't matter: their metadata is safe as long as the caller does:
> > >
> > > ((int)kcs->param_tls)[0] = s_a;
> > > ((int)kcs->param_tls)[1] = s_b;
> > > result = sum(a, b);
> > > s_result = ((int)kcs->retval_tls)[0];
> >
> > Ok. I see how that works when both caller and callee are instrumented.
> >
> > I assume that's tracked for all arguments regardless of type? e.g. that
> > includes pointers, structs, etc?
> 
> That's tracked for all parameters passed to a function, whether on the
> stack or in registers.
> This includes cases when a struct can be passed by value, e.g.:
> 
> struct arg {
>   int a, b;
> };
> 
> int sum(struct arg arg) {
>   return arg.a + arg.b;
> }
> 
> For anything passed to the function by pointer, we only track the
> initializedness of the pointer itself at the call site and the
> function prologue.

Yep, that was what I expected. Thanks for confirming!

> Why is that? Because we already track the state of the pointed-to
> memory elsewhere.
> For every heap allocation, there is shadow memory backing it, which is
> initially poisoned (unless it is a GFP_ZERO allocation).
> Similarly, shadow memory backs every stack allocation. The state of
> stack allocations is initially set by __msan_poison_alloca().
> When we write an initialized value to a memory buffer, the
> instrumentation code updates that buffer's shadow memory.
> 
> So the fact that a buffer is passed to a function by pointer does not
> change that buffer's state; that's why we don't need additional
> tracking of the pointed-to memory around the calls.
> (this is all assuming both the caller and the callee are instrumented).

Understood.

[...]

> > > We have few tools to deal with such cases. It often helps to move the
> > > border between the instrumented and non-instrumented code by applying
> > > __no_kmsan_checks or __no_sanitize_memory until we get to a point
> > > where there are no incorrect shadow arguments.
> > > Another way to deal with uninitialized data coming from
> > > non-instrumented code is kmsan_unpoison_memory(address, size).
> > > Unfortunately, as Thomas pointed out, we can't use it for locals in
> > > non-instrumented code.
> >
> > It's clearly not sufficient to use kmsan_unpoison_memory() for locals,
> > and I think it's non-sensical because conceptually they aren't memory.
> >
> > I don't think kmsan_unpoison_entry_regs() alone is sufficient for the
> > regs. Surely the pointer argument itself needs to be unpoisoned before
> > being passed to a callee?
> >
> > Otherwise, when pointer to regs is passed as an argument to a callee,
> > the callee will consume stale shadow, which might indicate that the
> > pointer to regs is uninitialized?
> >
> > That seems to be exactly what we're hitting with the irq entry state, so
> > I'm surprised we're not hitting it for the regs. Is that sheer luck, or
> > is something else protecting us?
> 
> Struct regs is created by non-instrumented code on the stack, that's
> why its shadow contains whatever garbage that stack slot had when
> leaving instrumented code.

I'm talking about the *pointer itself*, not the memory the pointer
points to. I understand that the shadow for the memory will contain
garbage, but I'm asking about the tracking of the *local variable* that
is a pointer to that memory.

Note: I assume that by 'struct regs' you mean 'struct pt_regs'.

> For the data pointed to by individual registers, there are usually two
> possibilities:
> 1. It is allocated by instrumented code and already has some state,
> and struct regs is being used to transport it between instrumented
> regions.
> 2. It is allocated by non-instrumented code, belongs to the same shim
> that created it, and instrumented code is not supposed to touch it.
> 
> So I think this is not "sheer luck", but rather the fact that
> non-instrumented code doesn't allocate memory.

As above, that doesn't address my concern, becuase I'm talking about the
*pointer* not the *memory*. I still think that it's sheer luck that
callees think the *pointer* is initiailiased.

What ensures that the *pointer to the regs* is marked as initialised?

The entry asm doesn't ensure that. The noinstr entry code only unpoisons
the memory, but not the pointer itself. The instrumented code called by
the noinstr code doesn't seem to have anything to suppress checks on the
pointer argument.

AFAICT, at the point where noinstr code calls instrumented code, we're
relying on stable shadow indicating that register arguments have been
initialized, including the pointer to the regs.

[...]

> > > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
> > > > > making all its inputs initialized. This is the easiest solution, it may
> > > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps
> > > > > doing so is better than dealing with more false positives in the interrupt code.
> > > > >
> > > > > Another option for the callee would be applying `__always_inline`, so that
> > > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
> > > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
> > > > > `__always_inline`, it might be the right thing to do.
> > > >
> > > > We can do that, but this really suggests that there's a fundemantal
> > > > inability to pass arguments between code which is noinstr and code which
> > > > isinstrumented with AddressSanitizer, and that's inevitably going to
> > > > bite us in future.
> > >
> > > This is true (assuming you mean MemorySanitizer), but:
> >
> > Sorry, yes, I meant MemorySanitizer.
> >
> > > - I don't think we can avoid that, given that there will always be
> > > non-instrumented code;
> > > - so far the number of annotations has been manageable.
> >
> > As above, I suspect that we're actually missing many necessary
> > annotations and getting away with this by accident.
> >
> > I suspect that we need additional compiler support to say something like
> > "assume this argument/return value IS initialized".
> 
> There is already a more generic one saying "assume all
> arguments/return values are uninitialized" - that's __no_kmsan_checks.
> Do we need a narrower attribute knowing that for a call from
> uninstrumented code we'll anyway need to mark all the arguments?

We certainly need to be inhibiting checks in cases where we're not
today.

How does that work for something like:

	void do_something_with_foo(struct foo *f)
	{
		[[ accesses foo fields here ]]
	}

	void called_from_noinstr(struct foo *f) __no_kmsan_checks
	{
		do_something_with_foo(f);
	}

	noinstr void foo(void)
	{
		struct foo f = { ... };
		called_from_noinstr(f);
	}

... e.g. will called_from_noinstr() initialize the argument when calling
do_something_with_foo() ?

[...]

> > Consider when the CPU is executing some function foo(), and an exception
> > is taken. Within the exception handler, code will clobber the context
> > state. AFAICT, that state is not saved/restored, and nor is it zeroed
> > prior to exception return. The exception handler returns back to the
> > middle of foo(). From foo's PoV, the context state has been clobbered
> > arbitrarily, and that clobbered state could trigger false positives or
> > false negatives.
> >
> > The way kmsan_get_context() uses in_task() will avoid that in *some*
> > cases, but not *all* cases. In particular, I don't think that's going to
> > help when a fault is taken for a uaccess.
> >
> > Maybe I'm missing something here?
> >
> > > > I see kmsan_get_context() has an in_task() check, but that can't help
> > > > with nested exceptions, so this doesn't look right at all.
> > >
> > > KMSAN context is per-task, and if !in_task(), it is per-CPU.
> > >
> > > For the nested exceptions, we conservatively bail out (see
> > > kmsan_in_runtime()) to avoid deadlocking.
> > > This may cause false negatives.
> >
> > AFAICT that doesn't address the case I've described above.
> >
> > Mark.
> 
> You are right, it's quite messy for uaccess, and maybe other cases as well.
> In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq()
> to switch between different per-cpu contexts, but that didn't work out
> well, because preempt_count() could change in the middle of certain
> low-level functions (which the instrumentation ignored, as the
> reference to the context was only calculated in the prologue).

To be frank, I don't think we can rely on per-cpu contexts here, and
that shapre of check is the wrong approach.

AFAICT, the only way to make this work without false negatives is to
have this nest, with entry code saving/restoring the original context.
Given the size of that context, it seems impractical to put that on the
stack, so I don't see how we could reaonably make that work.

We might get away with clobbering the state when returning from an
exception, marking all everything as initialized, but that will lead to
false negatives a lot of the time.

> I think this part indeed works by sheer luck.
> We need an annotation telling the tool where an interrupt context
> starts and ends. On the compiler side, we'll probably also need an
> intrinsic to reload the context pointer.

As above, I think there are further problems with that.

Mark.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-12 16:24           ` Alexander Potapenko
  2026-05-12 17:46             ` Mark Rutland
@ 2026-05-13  0:36             ` Thomas Gleixner
  2026-05-13  7:54               ` Peter Zijlstra
  2026-05-27 12:39               ` Alexander Potapenko
  1 sibling, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-05-13  0:36 UTC (permalink / raw)
  To: Alexander Potapenko, Mark Rutland
  Cc: Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz,
	syzkaller-bugs, ruanjinjie

Alexander!

On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote:
> On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote:
>> > Again, this only matters because we are calling an instrumented function from
>> > a non-instrumented one, otherwise it's perfectly fine to call between
>> > compilation units.
>>
>> For context, can you explain how this is expected to work across
>> compilation units when the caller and callee *are* instrumented, when an
>> argument is passed in a register?
>
> Per KMSAN ABI, the metadata for the function arguments is passed via
> buffers in the so-called context state (see
> include/linux/kmsan_types.h)
> In the userspace, these buffers are thread-local variables referenced
> by inline loads and stores.
> In the kernel space, the compiler inserts a call
> __msan_get_context_state() at the beginning of every function, and
> then the instrumentation code uses whatever that function returned.
>
> Assume we have a function:
>
> int sum(int a, int b) {
>     result = a + b;
>     return result;
> }
>
> Its instrumented version looks roughly as follows (we'll omit origin
> tracking for simplicity):
>
> int sum(int a, int b) {
>   struct kmsan_context_state *kcs = __msan_get_context_state();
>   int s_a = ((int)kcs->param_tls)[0];  // shadow of a
>   int s_b = ((int)kcs->param_tls)[1];  // shadow of b
>   result = a + b;
>   s_result = s_a | s_b;
>   ((int)kcs->retval_tls)[0] = s_result;  // returned shadow
>   return result;
> }
>
> Most certainly `a` and `b` will be passed using registers, but that
> doesn't matter: their metadata is safe as long as the caller does:
>
> ((int)kcs->param_tls)[0] = s_a;
> ((int)kcs->param_tls)[1] = s_b;
> result = sum(a, b);
> s_result = ((int)kcs->retval_tls)[0];

Thanks for the explanation.

TBH, I'm failing to see how this is not inSANe, but I might be missing
something obvious.

What really bothers me is the lack of distinction between arguments
passed by value vs. arguments passed by reference. For me there is a
clear distinction.

Any argument passed by reference, i.e. pointer, obviously requires to be
validated at the actual usage site, which might be several levels deep
into the call chain. That obviously does not include by value arguments
which are passed on the stack due to exceeding the register passing
constraints,

But for arguments passed by value, I fail to see the justification.

The compiler can validate at the call site that the value arguments are
properly initialized both at compile and at runtime:

    1) If the argument value is a constant it is initialized

    2) If the argument value is read from memory at the call site, then
       the function which reads has to validate that the memory is
       initialized before reading it.

    3) If the argument value was returned from a previous function call
       then that invoked function has to make sure that the returned
       value is properly initialized.

    4) If the argument is a local variable which is not initialized
       that's catched at compile time with and without KMSAN. If people
       ignore that warning, then they will ignore the resulting KMSAN
       splat too.

    5) If the argument is only propagated to the next call, then #1 - #4
       apply to the caller, i.e. all of that propagates through the
       call chain.

    6) If the argument is handed in from the previous caller and
       modified by the function which calls the next one, then still #1
       - #4 apply.

IOW, the producer of a value argument has to ensure that the value
argument is properly initialized.

This whole 'is initialized' propagation for by value arguments is
overengineered voodoo in my opinion. Why?

  If the compiler can't ensure at the producer site that something is
  properly initialized and then can't ensure that the intermediates
  handing the value argument down the callchain are doing the right
  thing, then KMSAN won't help either as it is then by definition
  equally untrustworthy as the untrustworthy compiler which emitted the
  KMSAN self-validation along with the broken code.

  So it just adds overhead for nothing and makes everything look "sane"
  while in fact it provides no value and creates exactly the problems we
  are debating right now.

No?

> All the metadata tracking is implemented using an LLVM IR pass, which
> ensures that the shadow for each uninitialized value is tracked
> regardless of where that value is stored - be that heap memory, stack
> or registers:
> - when a value is created or loaded from memory, the compiler inserts
> SSA registers corresponding to that value's shadow;
> - when a value is written to memory, a shadow store is inserted;
> - when a value is used in an operation, the result of that operation
> receives a shadow value depending on the operands and their shadow
> values;
> - when a value is passed to a function, the corresponding context
> state stores and loads are emitted.
>
> Now, this works fine under the assumption that all code in the kernel
> is instrumented with KMSAN.
> However this is not true.

We know that since KMSAN got integrated and you pointed yourself to that
discussion which we had back then about handling the @regs pointer
argument.

There was a brief mentioning of the same problem about non-pointer
arguments, but that dried out because the test runs did not barf anymore
after the @regs unpoisoning got fixed.

There were discussions about utilizing instrumentation_begin()/end() to
tell the compiler that this instrumentation_begin() lifts the 'do not
sanitize directive' and it could rightfully inject instrumentation code
there including calls, but obviously nothing happened.

What really worries me more than this particular problem at hand is that
we have tons of places which invoke instrumented functions from
non-instrumented code with arguments originating from the
non-instrumented code. Here is an example:

DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
{
	do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
		      0, NULL);
}

#define DEFINE_IDTENTRY_ERRORCODE(func)					\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code);		\
									\
__visible noinstr void func(struct pt_regs *regs,			\
			    unsigned long error_code)			\
{									\
	irqentry_state_t state = irqentry_enter(regs);			\
									\
	instrumentation_begin();					\
	__##func (regs, error_code);					\
	instrumentation_end();						\
	irqentry_exit(regs, state);					\
}									\
									\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code)

@error_code comes all the way from the ASM entry code.

The function which gets inlined within the instrumentation_begin()/end()
region in the non-instrumented exc_invalid_tss() is:

static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code)
{
	do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL);
}

Anything else than @regs (already unpoisoned) and @error_code is compile
time constant.

do_error_trap() is in the same compilation unit, static and
instrumentable:

static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
	unsigned long trapnr, int signr, int sicode, void __user *addr)
{
	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
			NOTIFY_STOP) {
		cond_local_irq_enable(regs);
		do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
		cond_local_irq_disable(regs);
	}
}

That should suffer from the very same problem vs. _all_ arguments
except @regs, right?

But no, it does not. The same compilers which emit imaginary initialized
checks for something they out of lined themself despite knowing that the
only calling context is not instrumentable, ignore the very same for
do_error_trap().

According to the disassembly do_error_trap() just goes and invokes
notify_die() without any checks and notify_die() takes the same
(reordered) arguments uninspected and stores them into a on stack data
struct which is properly poisened first and then unpoisoned for each
member stored.

That's either insanely inconsistent or I've missed something in the
resulting ASM maze. But I'm pretty sure that I did not, so I spare you
the ASM analysis as you surely are able to look at the resulting
inconsistent assembly code yourself (clang-19 debian based).

>> > So what we could do is to add annotations manualy on either the caller side or
>> > the callee side.
>>
>> Without some understanding of those "hints" you mention, I don't see how
>> we can do that on the caller side.
>
> In this case by "hints" I meant any kind of instrumentation that the
> compiler could have inserted automatically to mark the data in the
> instrumented functions as initialized.
> There are no such "hints" right now (apart from letting KMSAN
> instrument the function).
> As for the manual annotations, we only have the mentioned
> `__no_kmsan_checks`, `__no_sanitize_memory`, and
> kmsan_unpoison_memory().

Which is a sad state of affairs. These hints have been debated four
years ago for the very same reasons and nothing happened since then
other than the very same compiler becoming more agressive about
uninling.

>> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(),
>> > making all its inputs initialized. This is the easiest solution, it may
>> > introduce false negatives, but we are on a very thin ice anyway, so perhaps
>> > doing so is better than dealing with more false positives in the interrupt code.
>> >
>> > Another option for the callee would be applying `__always_inline`, so that
>> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented.
>> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already
>> > `__always_inline`, it might be the right thing to do.
>> >
>> We can do that, but this really suggests that there's a fundemantal
>> inability to pass arguments between code which is noinstr and code which
>> isinstrumented with AddressSanitizer, and that's inevitably going to
>> bite us in future.

For the problem at hand I really want to go and mark the inline
__always_inline so it's gone for now.

I hate that "solution" with a passion as it just papers over the
underlying problem and I definitely agree with Mark that this can only
to be considered to be a temporary band aid and is going to bite use
over and over again.

As I pointed out with exc_invalid_tss() this just works by chance and
not by design and there are far more places than that to illustrate it.

> This is true (assuming you mean MemorySanitizer), but:
> - I don't think we can avoid that, given that there will always be
> non-instrumented code;
> - so far the number of annotations has been manageable.

I agree that it's unavoidable, but I don't agree with the second
sentiment unless you meant: "Has been manageable by sheer luck".

What I said in that four year old discussion you pointed to earlier

  "I'm all for making use of advanced instrumentation, validation and
   debugging features, but this mindset of 'make the code comply to what
   the tool of today provides' is fundamentally wrong. Tools have to
   provide value to the programmer and not the other way round.

   Yes, it's more work on the tooling side, but the tooling side is
   mostly a one time effort while chasing the false positives is a long
   term nightmare."

still applies and while we might paper over the underlying problem for
now to keep the CI robots happy, we really have to come up with
something which is reliable and at the same time provides the maximum
coverage for instrumentation.

After reading back on that old thread I still think that we should stick
some hint for the compiler into instrumentation_begin()/end().

These macros exist to provide objtool a way to validate that
non-instrumentable code does not accidentaly call out into
instrumentable code without someone having done the reasoning why it is
safe. But at the same time these sections obviously can tell the
compiler that the by value arguments handed to any function invoked
within that section have to be considered valid.

I deliberately restricted this to by value arguments because by
reference arguments need to be explicitly covered as we do today for
@regs (also see above).

So the compiler can either emit the required KMSAN magic right before
the call or create a wrapper function, which does the required context
initialization.

IIRC, this has been debated four years ago already in some form.

But now four years down the road that obviously does not solve the
problem because nothing happened and we have to deal with existing
compilers and the only thing we can do until compiler people get their
act together is work around this insanity in kernel code again.

Why does TCMalloc come to my mind right now?

Ranted enough.

Here is what I came up with which seems to be "workable" after throwing
at least a dozen other "brilliant and trivial" ideas out:

     instrumentation_begin();
-    func(arg1, arg2);
+    call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2);     
     instrumentation_end();

and let call_instr() inject the necessary KMSAN fixups before invoking
@func. That would probably do some useless stuff when @func is an empty
stub, a __always_inline function or subject to arbitrary inlining
decisions of the compiler. But that's not the end of the world and the
conversion can be mostly done by coccinelle or your favorite code
conversion tool of the day.

Something insane like this:

#define KMSAN_FIXUP_VA_1(insane, type, arg)          insane(type, arg)
#define KMSAN_FIXUP_VA_2(insane, arg, ...)           KMSAN_FIXUP_VA_1(insane, arg, __VA_ARGS__)
#define KMSAN_FIXUP_VA_3(insane, type, arg, ...)     insane(type, arg), KMSAN_FIXUP_VA_2(insane, __VA_ARGS__)
#define KMSAN_FIXUP_VA_4(insane, arg, ...)           KMSAN_FIXUP_VA_3(insane, arg, __VA_ARGS__)

#define KMSAN_FIXUP_VA_FOR_EACH(insane, ...)                             \
        CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_ARGS__)

#define KMSAN_VA_ARG_2(a1, a2)                        (a2)
#define KMSAN_VA_ARG_4(a1, a2, a3, a4)                (a2), (a4)

#define VA_CALL(f, ...)                                            \
        f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__))

#define kmsan_fixup_arg_int(arg)
        __kmsan_fixup_arg_4(&(arg))

#define KMSAN_FIXUP_ARG(type, arg)                                 \
        CONCATENATE(kmsan_fixup_arg_, type)(arg)

#define call_instr(f, ...)                                         \
do {                                                               \
        VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__);      \
        VA_CALL(f, __VA_ARGS__);                                   \
} while (0)

That's obiously restricted to _two_ arguments and _one_ data type for
illustration purposes, but making it work for the limited number of
arguments and data types shouldn't be rocket science..

As a bonus it will fail to compile immediately for unknown data types
and argument numbers. It preserves type safety by not playing silly
casting games with the arguments on the way.

Not that I'm a fan of this, but that's at least better than playing
whack a mole with the insanities of toolchains forever.

Thoughts?

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-13  0:36             ` Thomas Gleixner
@ 2026-05-13  7:54               ` Peter Zijlstra
  2026-05-27 12:39               ` Alexander Potapenko
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2026-05-13  7:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alexander Potapenko, Mark Rutland, Dmitry Vyukov, syzbot,
	kasan-dev, linux-kernel, luto, syzkaller-bugs, ruanjinjie

On Wed, May 13, 2026 at 02:36:10AM +0200, Thomas Gleixner wrote:

> But now four years down the road that obviously does not solve the
> problem because nothing happened and we have to deal with existing
> compilers and the only thing we can do until compiler people get their
> act together is work around this insanity in kernel code again.

> Thoughts?

KMSAN is debug cruft. I don't see why we cannot demand they fix it and
make KMSAN depend whatever clang will have the thing fixed. There is no
reason we need to support existing compilers what so ever for debug
code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-13  0:36             ` Thomas Gleixner
  2026-05-13  7:54               ` Peter Zijlstra
@ 2026-05-27 12:39               ` Alexander Potapenko
  2026-06-09 13:34                 ` Alexander Potapenko
  2026-06-25 20:54                 ` Thomas Gleixner
  1 sibling, 2 replies; 18+ messages in thread
From: Alexander Potapenko @ 2026-05-27 12:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Wed, May 13, 2026 at 2:36 AM Thomas Gleixner <tglx@kernel.org> wrote:
>
> Alexander!


>
> Thanks for the explanation.
>
> TBH, I'm failing to see how this is not inSANe, but I might be missing
> something obvious.
>
> What really bothers me is the lack of distinction between arguments
> passed by value vs. arguments passed by reference. For me there is a
> clear distinction.
>
> Any argument passed by reference, i.e. pointer, obviously requires to be
> validated at the actual usage site, which might be several levels deep
> into the call chain. That obviously does not include by value arguments
> which are passed on the stack due to exceeding the register passing
> constraints,
>
> But for arguments passed by value, I fail to see the justification.
>
> The compiler can validate at the call site that the value arguments are
> properly initialized both at compile and at runtime:
>
>     1) If the argument value is a constant it is initialized
>
>     2) If the argument value is read from memory at the call site, then
>        the function which reads has to validate that the memory is
>        initialized before reading it.
>
>     3) If the argument value was returned from a previous function call
>        then that invoked function has to make sure that the returned
>        value is properly initialized.
>
>     4) If the argument is a local variable which is not initialized
>        that's catched at compile time with and without KMSAN. If people
>        ignore that warning, then they will ignore the resulting KMSAN
>        splat too.
>
>     5) If the argument is only propagated to the next call, then #1 - #4
>        apply to the caller, i.e. all of that propagates through the
>        call chain.
>
>     6) If the argument is handed in from the previous caller and
>        modified by the function which calls the next one, then still #1
>        - #4 apply.
>
> IOW, the producer of a value argument has to ensure that the value
> argument is properly initialized.
>
> This whole 'is initialized' propagation for by value arguments is
> overengineered voodoo in my opinion. Why?

Sorry, I left out a big chunk of how KMSAN works.
In fact, the above cases are covered by
`-fsanitize-memory-param-retval`, which eagerly checks parameters and
return values having the `noundef` attribute.
For such cases, the compiler will not propagate the shadow into TLS.

Yet we still have to maintain TLS parameter passing for types that
lack the `noundef` attribute (e.g. structs with internal padding or
vector types) and for variadic function arguments.

>
>   If the compiler can't ensure at the producer site that something is
>   properly initialized and then can't ensure that the intermediates
>   handing the value argument down the callchain are doing the right
>   thing, then KMSAN won't help either as it is then by definition
>   equally untrustworthy as the untrustworthy compiler which emitted the
>   KMSAN self-validation along with the broken code.

Agreed.

>   So it just adds overhead for nothing and makes everything look "sane"
>   while in fact it provides no value and creates exactly the problems we
>   are debating right now.
>
> No?

I hope the above addresses this concern to some extent.
While eager checks simplify the instrumentation code, they (as
mentioned) do not completely remove the need for TLS parameter
passing.
Long story short, it's unsafe to assume at the beginning of an
instrumented function that all its by-value arguments are initialized.
If that were true, we would indeed have no problems with
non-instrumented functions calling instrumented ones.


> >
> > Now, this works fine under the assumption that all code in the kernel
> > is instrumented with KMSAN.
> > However this is not true.
>
> We know that since KMSAN got integrated and you pointed yourself to that
> discussion which we had back then about handling the @regs pointer
> argument.
>
> There was a brief mentioning of the same problem about non-pointer
> arguments, but that dried out because the test runs did not barf anymore
> after the @regs unpoisoning got fixed.
>
> There were discussions about utilizing instrumentation_begin()/end() to
> tell the compiler that this instrumentation_begin() lifts the 'do not
> sanitize directive' and it could rightfully inject instrumentation code
> there including calls, but obviously nothing happened.

Right. At that point, we thought __no_kmsan_checks should be enough to
keep the complexity manageable.
The more I think about it now, the more I am convinced it shouldn't be
hard to implement intrinsics that
allow instrumentation between
instrumentation_begin()/instrumentation_end() and treat all values
passed into that region as initialized.
This should fix problems with noinstr functions calling instrumented functions.

> What really worries me more than this particular problem at hand is that
> we have tons of places which invoke instrumented functions from
> non-instrumented code with arguments originating from the
> non-instrumented code. Here is an example:
>
> DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss)
> {
>         do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV,
>                       0, NULL);
> }
>
> #define DEFINE_IDTENTRY_ERRORCODE(func)                                 \
> static __always_inline void __##func(struct pt_regs *regs,              \
>                                      unsigned long error_code);         \
>                                                                         \
> __visible noinstr void func(struct pt_regs *regs,                       \
>                             unsigned long error_code)                   \
> {                                                                       \
>         irqentry_state_t state = irqentry_enter(regs);                  \
>                                                                         \
>         instrumentation_begin();                                        \
>         __##func (regs, error_code);                                    \
>         instrumentation_end();                                          \
>         irqentry_exit(regs, state);                                     \
> }                                                                       \
>                                                                         \
> static __always_inline void __##func(struct pt_regs *regs,              \
>                                      unsigned long error_code)
>
> @error_code comes all the way from the ASM entry code.
>
> The function which gets inlined within the instrumentation_begin()/end()
> region in the non-instrumented exc_invalid_tss() is:
>
> static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code)
> {
>         do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL);
> }
>
> Anything else than @regs (already unpoisoned) and @error_code is compile
> time constant.
>
> do_error_trap() is in the same compilation unit, static and
> instrumentable:
>
> static void do_error_trap(struct pt_regs *regs, long error_code, char *str,
>         unsigned long trapnr, int signr, int sicode, void __user *addr)
> {
>         RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
>
>         if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) !=
>                         NOTIFY_STOP) {
>                 cond_local_irq_enable(regs);
>                 do_trap(trapnr, signr, str, regs, error_code, sicode, addr);
>                 cond_local_irq_disable(regs);
>         }
> }
>
> That should suffer from the very same problem vs. _all_ arguments
> except @regs, right?
>
> But no, it does not. The same compilers which emit imaginary initialized
> checks for something they out of lined themself despite knowing that the
> only calling context is not instrumentable, ignore the very same for
> do_error_trap().
>
> According to the disassembly do_error_trap() just goes and invokes
> notify_die() without any checks and notify_die() takes the same
> (reordered) arguments uninspected and stores them into a on stack data
> struct which is properly poisened first and then unpoisoned for each
> member stored.

Let's look at the compilation result with `-S -emit-llvm` - it is
generally easier to comprehend and gives better understanding of
what's going on with the instrumentation:

; Function Attrs: fn_ret_thunk_extern noredzone nounwind
null_pointer_is_valid sanitize_memory
define internal fastcc void @do_error_trap(ptr noundef %regs, i64
noundef %error_code, ptr noundef %str, i64 noundef range(i64 0, 13)
%trapnr, i32 noundef range(i32 4, 12) %signr, i32 noundef range(i32 0,
3) %sicode, ptr noundef %addr) unnamed_addr #9 align 16 !dbg !10395 {
entry:
  %0 = call ptr @__msan_get_context_state() #19, !dbg !11994
  %retval_shadow = getelementptr i8, ptr %0, i64 800, !dbg !11994
  %param_origin = getelementptr i8, ptr %0, i64 3208, !dbg !11994
  %retval_origin = getelementptr i8, ptr %0, i64 4008, !dbg !11994
    #dbg_value(ptr %regs, !10394, !DIExpression(), !11995)
    #dbg_value(i64 %error_code, !10399, !DIExpression(), !11995)
    #dbg_value(ptr %str, !10400, !DIExpression(), !11995)
    #dbg_value(i64 %trapnr, !10401, !DIExpression(), !11995)
    #dbg_value(i32 %signr, !10402, !DIExpression(), !11995)
    #dbg_value(i32 %sicode, !10403, !DIExpression(), !11995)
    #dbg_value(ptr %addr, !10404, !DIExpression(), !11995)
  call void @__sanitizer_cov_trace_pc() #20, !dbg !11994
  %conv = trunc nuw nsw i64 %trapnr to i32, !dbg !11996
  store i32 0, ptr %retval_shadow, align 8, !dbg !11997
  %call = call i32 @notify_die(i32 noundef 8, ptr noundef %str, ptr
noundef %regs, i64 noundef %error_code, i32 noundef %conv, i32 noundef
%signr) #21, !dbg !11997
...

There are no checks for `@notify_die` arguments because
`@do_error_trap` receives them as `noundef`.
If eager checks are enabled, KMSAN pass knows that these arguments
were already checked previously (see
https://www.llvm.org/doxygen/MemorySanitizer_8cpp_source.html#l02149),
so it omits the checks.

`@do_error_trap` is not instrumented, but even if it were, the check
for `%error_code` would have been pushed to its caller because of the
`noundef` attribute.

; Function Attrs: disable_sanitizer_instrumentation
fn_ret_thunk_extern noinline noprofile noredzone nosanitize_coverage
nounwind null_pointer_is_valid
define dso_local void @exc_invalid_tss(ptr noundef %regs, i64 noundef
%error_code) local_unnamed_addr #4 section ".noinstr.text" align 16
!dbg !10467 {
entry:
    #dbg_value(ptr %regs, !10469, !DIExpression(), !10472)
    #dbg_value(i64 %error_code, !10470, !DIExpression(), !10472)
  %call = call i8 @irqentry_enter(ptr noundef %regs) #21, !dbg !10473
    #dbg_value(i8 %call, !10471, !DIExpression(), !10472)
  call void asm sideeffect "801: nop\0A\09.pushsection
.discard.annotate_insn, \22M\22, @progbits, 8; .long 801b - ., 3;
.popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 801) #19, !dbg
!10474, !srcloc !10477
    #dbg_value(ptr %regs, !10478, !DIExpression(), !10482)
    #dbg_value(i64 %error_code, !10481, !DIExpression(), !10482)
  call fastcc void @do_error_trap(ptr noundef %regs, i64 noundef
%error_code, ptr noundef nonnull @.str.15, i64 noundef 10, i32 noundef
11, i32 noundef 0, ptr noundef null) #22, !dbg !10485
  call void asm sideeffect "802: nop\0A\09.pushsection
.discard.annotate_insn, \22M\22, @progbits, 8; .long 802b - ., 4;
.popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 802) #19, !dbg
!10486, !srcloc !10489
  call void @irqentry_exit(ptr noundef %regs, i8 %call) #21, !dbg !10490
  ret void, !dbg !10473
}

So, if a `noundef` value is passed along a chain of function calls, it
is checked only once, at the point where we don't yet know that it is
`noundef`.
As mentioned, for `noundef` arguments, no shadow is stored in the TLS.
Thus, even if non-instrumented functions are along the way, their
callees still assume that the `noundef` argument is initialized (which
may introduce false negatives, but these are inevitable when
non-instrumented code is present).
To achieve the same behavior for the function argument with TLS shadow
propagation, we'll indeed need to enable instrumentation within the
instrumentation_begin()/instrumentation_end() region.

I think doing so will resolve the issues with instrumented functions
being called from non-instrumented functions in the same KMSAN
context, and will not require the call_instr() magic you are
suggesting.
So let me finally take a stab at it.

Note that the issue with incorrect context tracking in [soft]irqs is
orthogonal to this and will need to be addressed separately.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-27 12:39               ` Alexander Potapenko
@ 2026-06-09 13:34                 ` Alexander Potapenko
  2026-06-22  9:37                   ` Alexander Potapenko
  2026-06-25 21:04                   ` Thomas Gleixner
  2026-06-25 20:54                 ` Thomas Gleixner
  1 sibling, 2 replies; 18+ messages in thread
From: Alexander Potapenko @ 2026-06-09 13:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

> There are no checks for `@notify_die` arguments because
> `@do_error_trap` receives them as `noundef`.
> If eager checks are enabled, KMSAN pass knows that these arguments
> were already checked previously (see
> https://www.llvm.org/doxygen/MemorySanitizer_8cpp_source.html#l02149),
> so it omits the checks.
>
> `@do_error_trap` is not instrumented, but even if it were, the check
> for `%error_code` would have been pushed to its caller because of the
> `noundef` attribute.
>
> ; Function Attrs: disable_sanitizer_instrumentation
> fn_ret_thunk_extern noinline noprofile noredzone nosanitize_coverage
> nounwind null_pointer_is_valid
> define dso_local void @exc_invalid_tss(ptr noundef %regs, i64 noundef
> %error_code) local_unnamed_addr #4 section ".noinstr.text" align 16
> !dbg !10467 {
> entry:
>     #dbg_value(ptr %regs, !10469, !DIExpression(), !10472)
>     #dbg_value(i64 %error_code, !10470, !DIExpression(), !10472)
>   %call = call i8 @irqentry_enter(ptr noundef %regs) #21, !dbg !10473
>     #dbg_value(i8 %call, !10471, !DIExpression(), !10472)
>   call void asm sideeffect "801: nop\0A\09.pushsection
> .discard.annotate_insn, \22M\22, @progbits, 8; .long 801b - ., 3;
> .popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 801) #19, !dbg
> !10474, !srcloc !10477
>     #dbg_value(ptr %regs, !10478, !DIExpression(), !10482)
>     #dbg_value(i64 %error_code, !10481, !DIExpression(), !10482)
>   call fastcc void @do_error_trap(ptr noundef %regs, i64 noundef
> %error_code, ptr noundef nonnull @.str.15, i64 noundef 10, i32 noundef
> 11, i32 noundef 0, ptr noundef null) #22, !dbg !10485
>   call void asm sideeffect "802: nop\0A\09.pushsection
> .discard.annotate_insn, \22M\22, @progbits, 8; .long 802b - ., 4;
> .popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 802) #19, !dbg
> !10486, !srcloc !10489
>   call void @irqentry_exit(ptr noundef %regs, i8 %call) #21, !dbg !10490
>   ret void, !dbg !10473
> }
>
> So, if a `noundef` value is passed along a chain of function calls, it
> is checked only once, at the point where we don't yet know that it is
> `noundef`.
> As mentioned, for `noundef` arguments, no shadow is stored in the TLS.
> Thus, even if non-instrumented functions are along the way, their
> callees still assume that the `noundef` argument is initialized (which
> may introduce false negatives, but these are inevitable when
> non-instrumented code is present).
> To achieve the same behavior for the function argument with TLS shadow
> propagation, we'll indeed need to enable instrumentation within the
> instrumentation_begin()/instrumentation_end() region.
>
> I think doing so will resolve the issues with instrumented functions
> being called from non-instrumented functions in the same KMSAN
> context, and will not require the call_instr() magic you are
> suggesting.
> So let me finally take a stab at it.

Hi Thomas,

I have a draft Clang patch implementing the following intrinsics at
https://github.com/llvm/llvm-project/pull/202603:
- llvm.kmsan.instrumentation.begin
- llvm.kmsan.instrumentation.end
- llvm.kmsan.instrumentation.update.context (to reload the context
pointer; I am not using it yet).

Below is the kernel patch that I am applying to insert these
intrinsics into the instrumentation regions.
I thought it would be a good idea to warn if
llvm.kmsan.instrumentation.{begin,end} are called from functions that
are fully instrumented or have `no_sanitize("memory")` (aka
__no_kmsan_checks), but surprisingly, there were loads of such
functions.

It turned out that WARN() and BUG() unconditionally invoke
instrumentation_begin() and instrumentation_end(), even for
instrumented functions.
Was there any long-term plan to fix that, or is my warning useless by design?

Patch follows:
```
diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
index bf675a8aef8ab..49a2ee638120b 100644
--- a/include/linux/instrumentation.h
+++ b/include/linux/instrumentation.h
@@ -4,6 +4,7 @@

 #ifdef CONFIG_NOINSTR_VALIDATION

+#include <linux/kmsan-checks.h>
 #include <linux/objtool.h>
 #include <linux/stringify.h>

@@ -12,6 +13,7 @@
  asm volatile(__stringify(c) ": nop\n\t" \
       ANNOTATE_INSTR_BEGIN(__ASM_BREF(c)) \
       : : "i" (c)); \
+ kmsan_instrumentation_begin(); \
 })
 #define instrumentation_begin() __instrumentation_begin(__COUNTER__)

@@ -47,6 +49,7 @@
  * part of the condition block and does not escape.
  */
 #define __instrumentation_end(c) ({ \
+ kmsan_instrumentation_end(); \
  asm volatile(__stringify(c) ": nop\n\t" \
       ANNOTATE_INSTR_END(__ASM_BREF(c)) \
       : : "i" (c)); \
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index e1082dc40abc2..be493f836acf0 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -12,6 +12,7 @@

 #include <linux/types.h>

+#ifndef __ASSEMBLY__
 #ifdef CONFIG_KMSAN

 /**
@@ -72,6 +73,11 @@ void kmsan_copy_to_user(void __user *to, const void
*from, size_t to_copy,
  */
 void kmsan_memmove(void *to, const void *from, size_t to_copy);

+void llvm_kmsan_instrumentation_begin(void)
__asm__("llvm.kmsan.instrumentation.begin");
+void llvm_kmsan_instrumentation_end(void)
__asm__("llvm.kmsan.instrumentation.end");
+#define kmsan_instrumentation_begin() llvm_kmsan_instrumentation_begin()
+#define kmsan_instrumentation_end() llvm_kmsan_instrumentation_end()
+
 #else

 static inline void kmsan_poison_memory(const void *address, size_t size,
@@ -93,6 +99,10 @@ static inline void kmsan_memmove(void *to, const
void *from, size_t to_copy)
 {
 }

+#define kmsan_instrumentation_begin() do { } while (0)
+#define kmsan_instrumentation_end() do { } while (0)
+
 #endif
+#endif // !__ASSEMBLY__

 #endif /* _LINUX_KMSAN_CHECKS_H */
```

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-06-09 13:34                 ` Alexander Potapenko
@ 2026-06-22  9:37                   ` Alexander Potapenko
  2026-06-25 21:06                     ` Thomas Gleixner
  2026-06-25 21:04                   ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Alexander Potapenko @ 2026-06-22  9:37 UTC (permalink / raw)
  To: Thomas Gleixner, Mark Rutland, Marco Elver
  Cc: Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz,
	syzkaller-bugs, ruanjinjie

>
> Hi Thomas,
>
> I have a draft Clang patch implementing the following intrinsics at
> https://github.com/llvm/llvm-project/pull/202603:
> - llvm.kmsan.instrumentation.begin
> - llvm.kmsan.instrumentation.end
> - llvm.kmsan.instrumentation.update.context (to reload the context
> pointer; I am not using it yet).

For the record, Marco Elver has an alternative proposal: modify
__kmsan_get_context() to take _RET_IP_ from the caller.
Then, if noinstr function A calls an instrumented function B, B will
pass the return address belonging to A to __kmsan_get_context(), and
the latter can check that return address and wipe out the argument
shadow if it belongs to the noinstr function.

This solution is quite elegant; it requires only minor compiler
changes, and won't require any kernel annotations.
However, there are downsides:
- We are slowing down each call, including those between instrumented functions.
- For noinstr calls from modules, we'll need complex lookups involving
RCU, with a potential for infinite recursion.




> Below is the kernel patch that I am applying to insert these
> intrinsics into the instrumentation regions.
> I thought it would be a good idea to warn if
> llvm.kmsan.instrumentation.{begin,end} are called from functions that
> are fully instrumented or have `no_sanitize("memory")` (aka
> __no_kmsan_checks), but surprisingly, there were loads of such
> functions.
>
> It turned out that WARN() and BUG() unconditionally invoke
> instrumentation_begin() and instrumentation_end(), even for
> instrumented functions.
> Was there any long-term plan to fix that, or is my warning useless by design?
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-06-22  9:37                   ` Alexander Potapenko
@ 2026-06-25 21:06                     ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-25 21:06 UTC (permalink / raw)
  To: Alexander Potapenko, Mark Rutland, Marco Elver
  Cc: Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz,
	syzkaller-bugs, ruanjinjie

On Mon, Jun 22 2026 at 11:37, Alexander Potapenko wrote:
>> I have a draft Clang patch implementing the following intrinsics at
>> https://github.com/llvm/llvm-project/pull/202603:
>> - llvm.kmsan.instrumentation.begin
>> - llvm.kmsan.instrumentation.end
>> - llvm.kmsan.instrumentation.update.context (to reload the context
>> pointer; I am not using it yet).
>
> For the record, Marco Elver has an alternative proposal: modify
> __kmsan_get_context() to take _RET_IP_ from the caller.
> Then, if noinstr function A calls an instrumented function B, B will
> pass the return address belonging to A to __kmsan_get_context(), and
> the latter can check that return address and wipe out the argument
> shadow if it belongs to the noinstr function.
>
> This solution is quite elegant; it requires only minor compiler
> changes, and won't require any kernel annotations.
> However, there are downsides:
> - We are slowing down each call, including those between instrumented functions.
> - For noinstr calls from modules, we'll need complex lookups involving
> RCU, with a potential for infinite recursion.

Let's not go there. I rather spend some effort to disentangle the
BUG/WARN maze.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-06-09 13:34                 ` Alexander Potapenko
  2026-06-22  9:37                   ` Alexander Potapenko
@ 2026-06-25 21:04                   ` Thomas Gleixner
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-25 21:04 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

On Tue, Jun 09 2026 at 15:34, Alexander Potapenko wrote:
>> So let me finally take a stab at it.
>
> Hi Thomas,
>
> I have a draft Clang patch implementing the following intrinsics at
> https://github.com/llvm/llvm-project/pull/202603:
> - llvm.kmsan.instrumentation.begin
> - llvm.kmsan.instrumentation.end
> - llvm.kmsan.instrumentation.update.context (to reload the context
> pointer; I am not using it yet).
>
> Below is the kernel patch that I am applying to insert these
> intrinsics into the instrumentation regions.
> I thought it would be a good idea to warn if
> llvm.kmsan.instrumentation.{begin,end} are called from functions that
> are fully instrumented or have `no_sanitize("memory")` (aka
> __no_kmsan_checks), but surprisingly, there were loads of such
> functions.
>
> It turned out that WARN() and BUG() unconditionally invoke
> instrumentation_begin() and instrumentation_end(), even for
> instrumented functions.
> Was there any long-term plan to fix that, or is my warning useless by design?

The reason why e.g. WARN() and BUG() unconditionally do that is that we
decided that getting out the WARN/BUG information has priority and IIRC
we mostly did that to shut the objtool noinstr validation up. There is a
halfways comprehensible comment in x86...bug.h:

/*
 * This instrumentation_begin() is strictly speaking incorrect; but it
 * suppresses the complaints from WARN()s in noinstr code. If such a WARN()
 * were to trigger, we'd rather wreck the machine in an attempt to get the
 * message out than not know about it.
 */

It could be fixed by having some special version of BUG/WARN() which will
be a bit of a whack a mole game, but if it helps to make the sanitizers
saner, then that's probably worth it.

> Patch follows:
> ```
> diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
> index bf675a8aef8ab..49a2ee638120b 100644
> --- a/include/linux/instrumentation.h
> +++ b/include/linux/instrumentation.h
> @@ -4,6 +4,7 @@
>
>  #ifdef CONFIG_NOINSTR_VALIDATION
>
> +#include <linux/kmsan-checks.h>
>  #include <linux/objtool.h>
>  #include <linux/stringify.h>
>
> @@ -12,6 +13,7 @@
>   asm volatile(__stringify(c) ": nop\n\t" \
>        ANNOTATE_INSTR_BEGIN(__ASM_BREF(c)) \
>        : : "i" (c)); \
> + kmsan_instrumentation_begin(); \
>  })
>  #define instrumentation_begin() __instrumentation_begin(__COUNTER__)
>
> @@ -47,6 +49,7 @@
>   * part of the condition block and does not escape.
>   */
>  #define __instrumentation_end(c) ({ \
> + kmsan_instrumentation_end(); \
>   asm volatile(__stringify(c) ": nop\n\t" \
>        ANNOTATE_INSTR_END(__ASM_BREF(c)) \
>        : : "i" (c)); \

Looks very reasonable.

Thanks for looking into this!

       tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt
  2026-05-27 12:39               ` Alexander Potapenko
  2026-06-09 13:34                 ` Alexander Potapenko
@ 2026-06-25 20:54                 ` Thomas Gleixner
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2026-06-25 20:54 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel,
	luto, peterz, syzkaller-bugs, ruanjinjie

Alexander!

First of all sorry, that I ingored this for almost a month. I was
burried in other stuff including private things which turned out to be
more important.

On Wed, May 27 2026 at 14:39, Alexander Potapenko wrote:
> On Wed, May 13, 2026 at 2:36 AM Thomas Gleixner <tglx@kernel.org> wrote:
>> IOW, the producer of a value argument has to ensure that the value
>> argument is properly initialized.
>>
>> This whole 'is initialized' propagation for by value arguments is
>> overengineered voodoo in my opinion. Why?
>
> Sorry, I left out a big chunk of how KMSAN works.
> In fact, the above cases are covered by
> `-fsanitize-memory-param-retval`, which eagerly checks parameters and
> return values having the `noundef` attribute.
> For such cases, the compiler will not propagate the shadow into TLS.

Ah. That makes a lot more sense.

> Yet we still have to maintain TLS parameter passing for types that
> lack the `noundef` attribute (e.g. structs with internal padding or
> vector types) and for variadic function arguments.

Ok.

>>   So it just adds overhead for nothing and makes everything look "sane"
>>   while in fact it provides no value and creates exactly the problems we
>>   are debating right now.
>>
>> No?
>
> I hope the above addresses this concern to some extent.

It does.

> While eager checks simplify the instrumentation code, they (as
> mentioned) do not completely remove the need for TLS parameter
> passing.
> Long story short, it's unsafe to assume at the beginning of an
> instrumented function that all its by-value arguments are initialized.
> If that were true, we would indeed have no problems with
> non-instrumented functions calling instrumented ones.

Right, except for the already covered pointers

>> There were discussions about utilizing instrumentation_begin()/end() to
>> tell the compiler that this instrumentation_begin() lifts the 'do not
>> sanitize directive' and it could rightfully inject instrumentation code
>> there including calls, but obviously nothing happened.
>
> Right. At that point, we thought __no_kmsan_checks should be enough to
> keep the complexity manageable.
> The more I think about it now, the more I am convinced it shouldn't be
> hard to implement intrinsics that
> allow instrumentation between
> instrumentation_begin()/instrumentation_end() and treat all values
> passed into that region as initialized.
> This should fix problems with noinstr functions calling instrumented functions.

Ok.

>> According to the disassembly do_error_trap() just goes and invokes
>> notify_die() without any checks and notify_die() takes the same
>> (reordered) arguments uninspected and stores them into a on stack data
>> struct which is properly poisened first and then unpoisoned for each
>> member stored.
>
> Let's look at the compilation result with `-S -emit-llvm` - it is
> generally easier to comprehend and gives better understanding of
> what's going on with the instrumentation:
>
> ; Function Attrs: fn_ret_thunk_extern noredzone nounwind
> null_pointer_is_valid sanitize_memory
> define internal fastcc void @do_error_trap(ptr noundef %regs, i64
> noundef %error_code, ptr noundef %str, i64 noundef range(i64 0, 13)
> %trapnr, i32 noundef range(i32 4, 12) %signr, i32 noundef range(i32 0,
> 3) %sicode, ptr noundef %addr) unnamed_addr #9 align 16 !dbg !10395 {
> entry:
>   %0 = call ptr @__msan_get_context_state() #19, !dbg !11994
>   %retval_shadow = getelementptr i8, ptr %0, i64 800, !dbg !11994
>   %param_origin = getelementptr i8, ptr %0, i64 3208, !dbg !11994
>   %retval_origin = getelementptr i8, ptr %0, i64 4008, !dbg !11994
>     #dbg_value(ptr %regs, !10394, !DIExpression(), !11995)
>     #dbg_value(i64 %error_code, !10399, !DIExpression(), !11995)
>     #dbg_value(ptr %str, !10400, !DIExpression(), !11995)
>     #dbg_value(i64 %trapnr, !10401, !DIExpression(), !11995)
>     #dbg_value(i32 %signr, !10402, !DIExpression(), !11995)
>     #dbg_value(i32 %sicode, !10403, !DIExpression(), !11995)
>     #dbg_value(ptr %addr, !10404, !DIExpression(), !11995)
>   call void @__sanitizer_cov_trace_pc() #20, !dbg !11994
>   %conv = trunc nuw nsw i64 %trapnr to i32, !dbg !11996
>   store i32 0, ptr %retval_shadow, align 8, !dbg !11997
>   %call = call i32 @notify_die(i32 noundef 8, ptr noundef %str, ptr
> noundef %regs, i64 noundef %error_code, i32 noundef %conv, i32 noundef
> %signr) #21, !dbg !11997
> ...
>
> There are no checks for `@notify_die` arguments because
> `@do_error_trap` receives them as `noundef`.
> If eager checks are enabled, KMSAN pass knows that these arguments
> were already checked previously (see
> https://www.llvm.org/doxygen/MemorySanitizer_8cpp_source.html#l02149),
> so it omits the checks.

Interesting.

> So, if a `noundef` value is passed along a chain of function calls, it
> is checked only once, at the point where we don't yet know that it is
> `noundef`.
> As mentioned, for `noundef` arguments, no shadow is stored in the TLS.
> Thus, even if non-instrumented functions are along the way, their
> callees still assume that the `noundef` argument is initialized (which
> may introduce false negatives, but these are inevitable when
> non-instrumented code is present).
> To achieve the same behavior for the function argument with TLS shadow
> propagation, we'll indeed need to enable instrumentation within the
> instrumentation_begin()/instrumentation_end() region.

Right.

> I think doing so will resolve the issues with instrumented functions
> being called from non-instrumented functions in the same KMSAN
> context, and will not require the call_instr() magic you are
> suggesting.
> So let me finally take a stab at it.

Thank you!

> Note that the issue with incorrect context tracking in [soft]irqs is
> orthogonal to this and will need to be addressed separately.

Ok.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2026-06-25 21:06 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21 21:37 [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt syzbot
2026-04-29  6:29 ` Dmitry Vyukov
2026-04-29  8:09   ` Alexander Potapenko
2026-05-11 12:25     ` Thomas Gleixner
2026-05-12  9:33       ` Alexander Potapenko
2026-05-12 11:15         ` Mark Rutland
2026-05-12 16:24           ` Alexander Potapenko
2026-05-12 17:46             ` Mark Rutland
2026-05-22  6:26               ` Alexander Potapenko
2026-06-02 11:53                 ` Mark Rutland
2026-05-13  0:36             ` Thomas Gleixner
2026-05-13  7:54               ` Peter Zijlstra
2026-05-27 12:39               ` Alexander Potapenko
2026-06-09 13:34                 ` Alexander Potapenko
2026-06-22  9:37                   ` Alexander Potapenko
2026-06-25 21:06                     ` Thomas Gleixner
2026-06-25 21:04                   ` Thomas Gleixner
2026-06-25 20:54                 ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.