* [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt @ 2026-04-21 21:37 syzbot 2026-04-29 6:29 ` Dmitry Vyukov 0 siblings, 1 reply; 13+ messages in thread From: syzbot @ 2026-04-21 21:37 UTC (permalink / raw) To: linux-kernel, luto, peterz, syzkaller-bugs, tglx Hello, syzbot found the following issue on: HEAD commit: 4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000 kernel config: https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810 dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3 compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com bridge0: port 1(bridge_slave_0) entered blocking state bridge0: port 1(bridge_slave_0) entered forwarding state ===================================================== BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline] irqentry_exit+0x7b/0x760 kernel/entry/common.c:164 sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061 asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697 __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210 nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922 process_one_work kernel/workqueue.c:3302 [inline] process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385 worker_thread+0xee4/0x1590 kernel/workqueue.c:3466 kthread+0x53f/0x600 kernel/kthread.c:436 ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 Uninit was created at: slab_post_alloc_hook mm/slub.c:4576 [inline] slab_alloc_node mm/slub.c:4898 [inline] __do_kmalloc_node mm/slub.c:5294 [inline] __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403 kmalloc_reserve net/core/skbuff.c:635 [inline] __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713 alloc_skb include/linux/skbuff.h:1383 [inline] nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline] nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922 process_one_work kernel/workqueue.c:3302 [inline] process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385 worker_thread+0xee4/0x1590 kernel/workqueue.c:3466 kthread+0x53f/0x600 kernel/kthread.c:436 ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 CPU: 1 UID: 0 PID: 13 Comm: kworker/u8:1 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026 Workqueue: events_unbound nsim_dev_trap_report_work ===================================================== --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-04-21 21:37 [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt syzbot @ 2026-04-29 6:29 ` Dmitry Vyukov 2026-04-29 8:09 ` Alexander Potapenko 0 siblings, 1 reply; 13+ messages in thread From: Dmitry Vyukov @ 2026-04-29 6:29 UTC (permalink / raw) To: syzbot, kasan-dev, Alexander Potapenko Cc: linux-kernel, luto, peterz, syzkaller-bugs, tglx On Tue, 21 Apr 2026 at 23:37, syzbot <syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com> wrote: > > Hello, > > syzbot found the following issue on: > > HEAD commit: 4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel.. > git tree: upstream > console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000 > kernel config: https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810 > dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3 > compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8 > > Downloadable assets: > disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz > vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz > kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com > +kmsan maintainers, not sure where it should be fixed: in irq code or in kmsan code > ===================================================== > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 > irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 > irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline] > irqentry_exit+0x7b/0x760 kernel/entry/common.c:164 > sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061 > asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697 > __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210 > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline] > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] > nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922 > process_one_work kernel/workqueue.c:3302 [inline] > process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385 > worker_thread+0xee4/0x1590 kernel/workqueue.c:3466 > kthread+0x53f/0x600 kernel/kthread.c:436 > ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158 > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 > > Uninit was created at: > slab_post_alloc_hook mm/slub.c:4576 [inline] > slab_alloc_node mm/slub.c:4898 [inline] > __do_kmalloc_node mm/slub.c:5294 [inline] > __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403 > kmalloc_reserve net/core/skbuff.c:635 [inline] > __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713 > alloc_skb include/linux/skbuff.h:1383 [inline] > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline] > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] > nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922 > process_one_work kernel/workqueue.c:3302 [inline] > process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385 > worker_thread+0xee4/0x1590 kernel/workqueue.c:3466 > kthread+0x53f/0x600 kernel/kthread.c:436 > ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158 > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 > > CPU: 1 UID: 0 PID: 13 Comm: kworker/u8:1 Not tainted syzkaller #0 PREEMPT(full) > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026 > Workqueue: events_unbound nsim_dev_trap_report_work > ===================================================== > > > --- > This report is generated by a bot. It may contain errors. > See https://goo.gl/tpsmEJ for more information about syzbot. > syzbot engineers can be reached at syzkaller@googlegroups.com. > > syzbot will keep track of this issue. See: > https://goo.gl/tpsmEJ#status for how to communicate with syzbot. > > If the report is already addressed, let syzbot know by replying with: > #syz fix: exact-commit-title > > If you want to overwrite report's subsystems, reply with: > #syz set subsystems: new-subsystem > (See the list of subsystem names on the web dashboard) > > If the report is a duplicate of another one, reply with: > #syz dup: exact-subject-of-another-report > > If you want to undo deduplication, reply with: > #syz undup ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-04-29 6:29 ` Dmitry Vyukov @ 2026-04-29 8:09 ` Alexander Potapenko 2026-05-11 12:25 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Alexander Potapenko @ 2026-04-29 8:09 UTC (permalink / raw) To: Dmitry Vyukov, Mark Rutland Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, tglx, ruanjinjie On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > On Tue, 21 Apr 2026 at 23:37, syzbot > <syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com> wrote: > > > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 4ee64205ffaa Merge tag 'clk-for-linus' of git://git.kernel.. > > git tree: upstream > > console output: https://syzkaller.appspot.com/x/log.txt?x=1706d702580000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=ce4d707ce513c810 > > dashboard link: https://syzkaller.appspot.com/bug?extid=cdcfd55737fe43eeb3a3 > > compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8 > > > > Downloadable assets: > > disk image: https://storage.googleapis.com/syzbot-assets/eed48de63104/disk-4ee64205.raw.xz > > vmlinux: https://storage.googleapis.com/syzbot-assets/6ebe911e6995/vmlinux-4ee64205.xz > > kernel image: https://storage.googleapis.com/syzbot-assets/557a239d8722/bzImage-4ee64205.xz > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+cdcfd55737fe43eeb3a3@syzkaller.appspotmail.com > > > > +kmsan maintainers, not sure where it should be fixed: in irq code or > in kmsan code I believe Mark's recent changes introduced this. irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` and `state`. Because there is a lot of non-instrumented code around, we fail to initialize these variables. Instead, irqentry_enter_from_kernel_mode() explicitly calls kmsan_unpoison_entry_regs(regs) to take care of the registers, but not the state. We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` at the same place. > > > ===================================================== > > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 > > irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 > > irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline] > > irqentry_exit+0x7b/0x760 kernel/entry/common.c:164 > > sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061 > > asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697 > > __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210 > > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline] > > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] > > nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922 > > process_one_work kernel/workqueue.c:3302 [inline] > > process_scheduled_works+0xb65/0x1e40 kernel/workqueue.c:3385 > > worker_thread+0xee4/0x1590 kernel/workqueue.c:3466 > > kthread+0x53f/0x600 kernel/kthread.c:436 > > ret_from_fork+0x20f/0x8d0 arch/x86/kernel/process.c:158 > > ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-04-29 8:09 ` Alexander Potapenko @ 2026-05-11 12:25 ` Thomas Gleixner 2026-05-12 9:33 ` Alexander Potapenko 0 siblings, 1 reply; 13+ messages in thread From: Thomas Gleixner @ 2026-05-11 12:25 UTC (permalink / raw) To: Alexander Potapenko, Dmitry Vyukov, Mark Rutland Cc: syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Wed, Apr 29 2026 at 10:09, Alexander Potapenko wrote: > On Wed, Apr 29, 2026 at 8:29 AM Dmitry Vyukov <dvyukov@google.com> wrote: > > I believe Mark's recent changes introduced this. I don't believe any of this. I prefer facts. > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > and `state`. > Because there is a lot of non-instrumented code around, we fail to > initialize these variables. > Instead, irqentry_enter_from_kernel_mode() explicitly calls > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > the state. > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > at the same place. Good luck with unpoisoning 'state'. 'state' is not memory to begin with. 'state' is composed in irqentry_enter_from_kernel_mode() and returned from there to the caller as value in RAX. That returned value is then provided as function argument to irqentry_exit() and subsequently to the inlines invoked from there. See the deeper ASM analysis below. Aside of that the whole splat makes no sense at all due to this: >> > Uninit was created at: >> > slab_post_alloc_hook mm/slub.c:4576 [inline] >> > slab_alloc_node mm/slub.c:4898 [inline] >> > __do_kmalloc_node mm/slub.c:5294 [inline] >> > __kmalloc_node_track_caller_noprof+0x4f6/0x1750 mm/slub.c:5403 >> > kmalloc_reserve net/core/skbuff.c:635 [inline] >> > __alloc_skb+0x90d/0x1190 net/core/skbuff.c:713 >> > alloc_skb include/linux/skbuff.h:1383 [inline] >> > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:819 [inline] >> > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] >> > nsim_dev_trap_report_work+0x3f2/0x1430 drivers/net/netdevsim/dev.c:922 How can irqentry_exit_to_kernel_mode_preempt() "touch" an uninitialized SKB which was allocated by the interrupted code, when it only evaluates regs->eflags and the rcu_exit portion of the register supplied 'state' argument? >> > ===================================================== >> > BUG: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 >> > irqentry_exit_to_kernel_mode_preempt+0xb0/0xc0 include/linux/irq-entry-common.h:472 >> > irqentry_exit_to_kernel_mode include/linux/irq-entry-common.h:547 [inline] >> > irqentry_exit+0x7b/0x760 kernel/entry/common.c:164 >> > sysvec_apic_timer_interrupt+0x52/0x90 arch/x86/kernel/apic/apic.c:1061 >> > asm_sysvec_apic_timer_interrupt+0x1f/0x30 arch/x86/include/asm/idtentry.h:697 >> > __sanitizer_cov_trace_pc+0x5/0x70 kernel/kcov.c:210 >> > nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:842 [inline] >> > nsim_dev_trap_report drivers/net/netdevsim/dev.c:876 [inline] >> > nsim_dev_trap_report_work+0x8c0/0x1430 drivers/net/netdevsim/dev.c:922 So I got the vmlinux and disassembled the related code parts: ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt>: ffffffff81da7240: 55 push %rbp ffffffff81da7241: 48 89 e5 mov %rsp,%rbp ffffffff81da7244: 41 57 push %r15 ffffffff81da7246: 41 56 push %r14 ffffffff81da7248: 41 55 push %r13 ffffffff81da724a: 41 54 push %r12 ffffffff81da724c: 53 push %rbx ffffffff81da724d: 41 89 f6 mov %esi,%r14d // Save 'state' in r14 ffffffff81da7250: 49 89 ff mov %rdi,%r15 // Save 'regs' in r15 ffffffff81da7253: e8 08 84 d4 00 call ffffffff82aef660 <__msan_get_context_state> ffffffff81da7258: 44 0f b6 68 08 movzbl 0x8(%rax),%r13d // Read magic value into r13d (related to 'state') ffffffff81da725d: 8b 98 90 0c 00 00 mov 0xc90(%rax),%ebx // Read magic value into ebx (related to 'regs') ffffffff81da7263: 4d 8b a7 90 00 00 00 mov 0x90(%r15),%r12 // Read regs->eflags into r12 ffffffff81da726a: 49 81 c7 90 00 00 00 add $0x90,%r15 // Add eflags offset to r15 ffffffff81da7271: 4c 89 ff mov %r15,%rdi // MSAN check for accessing regs->eflags ffffffff81da7274: e8 b7 78 d4 00 call ffffffff82aeeb30 <__msan_metadata_ptr_for_load_8> ffffffff81da7279: 8b 08 mov (%rax),%ecx // Read magic data from the address returned by msan ffffffff81da727b: 89 c8 mov %ecx,%eax // Transfer result to eax ffffffff81da727d: f7 d0 not %eax // One's complement negate eax ffffffff81da727f: 41 81 e4 00 02 00 00 and $0x200,%r12d // Isolate eflags::IF (bit 9) ffffffff81da7286: c1 e9 09 shr $0x9,%ecx // Right shift the magic value by 9 (IF into bit 0) ffffffff81da7289: 0d ff fd ff ff or $0xfffffdff,%eax // Set all bits in eax except bit 9 ffffffff81da728e: 41 85 c4 test %eax,%r12d // Logical compare between eax and r12d The result is in ZF: 1 when bit 9 is not set in eax and/or r12d. 0 if bit 9 is set in both. ffffffff81da7291: 0f 94 c0 sete %al // If ZF==1 AL=1 otherwise AL=0 ffffffff81da7294: 20 c8 and %cl,%al // And the right shifted bit 9 ffffffff81da7296: 4d 85 e4 test %r12,%r12 // Check whether r12 is 0 ( IF set or not ) ffffffff81da7299: 41 0f 94 c7 sete %r15b // If ZF==1 r15b=1 otherwise r15b=0 ffffffff81da729d: 44 89 e1 mov %r12d,%ecx // Transfer the isolated bit 9 into ecx ffffffff81da72a0: c1 e9 09 shr $0x9,%ecx // Shift it into bit 0 ffffffff81da72a3: 44 20 e9 and %r13b,%cl // And the magic context state value ffffffff81da72a6: 45 08 f7 or %r14b,%r15b // Or 'state' on the IF result created above ffffffff81da72a9: 41 f6 d6 not %r14b // One's complement negate 'state' ffffffff81da72ac: 45 08 ee or %r13b,%r14b // Or the above magic KMSAN value on the negated 'state' 'state' is a register argument value. So how on earth can KMSAN mangle this with information stored in the KMSAN context? See below. ffffffff81da72af: 41 0f b6 f6 movzbl %r14b,%esi // zero extend the result into esi ffffffff81da72b3: 0f b6 c9 movzbl %cl,%ecx // zero extend the magic result into ecx ffffffff81da72b6: 84 c0 test %al,%al // Check whether al is 0 (magic IF result) ffffffff81da72b8: 0f 45 ce cmovne %esi,%ecx // If not, move the magic mangled state result into ecx This is where the whole thing goes south because it uses the register state which was wrongfully mangled with the KMSAN state in r13b. ffffffff81da72bb: f6 c1 01 test $0x1,%cl // Check whether the result is 1 // If 1, jump to the kmsan warning section ffffffff81da72be: 75 1b jne ffffffff81da72db <irqentry_exit_to_kernel_mode_preempt+0x9b> ffffffff81da72c0: 41 f6 c7 01 test $0x1,%r15b // Check the actual condition i.e. if (regs_irqs_disabled(regs) || state.exit_rcu) ffffffff81da72c4: 75 05 jne ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b> ffffffff81da72c6: e8 9d f1 37 0f call ffffffff91126468 <__SCT__irqentry_exit_cond_resched> ffffffff81da72cb: 5b pop %rbx ffffffff81da72cc: 41 5c pop %r12 ffffffff81da72ce: 41 5d pop %r13 ffffffff81da72d0: 41 5e pop %r14 ffffffff81da72d2: 41 5f pop %r15 ffffffff81da72d4: 5d pop %rbp ffffffff81da72d5: 2e e9 d5 e2 37 0f cs jmp ffffffff911255b0 <__pi___x86_return_thunk> // KMSAN warning ffffffff81da72db: 31 c9 xor %ecx,%ecx ffffffff81da72dd: 4d 85 e4 test %r12,%r12 ffffffff81da72e0: 0f 44 d9 cmove %ecx,%ebx ffffffff81da72e3: 84 c0 test %al,%al ffffffff81da72e5: 74 02 je ffffffff81da72e9 <irqentry_exit_to_kernel_mode_preempt+0xa9> ffffffff81da72e7: 8b 1a mov (%rdx),%ebx ffffffff81da72e9: 89 df mov %ebx,%edi ffffffff81da72eb: e8 30 83 d4 00 call ffffffff82aef620 <__msan_warning> ffffffff81da72f0: 41 f6 c7 01 test $0x1,%r15b ffffffff81da72f4: 74 d0 je ffffffff81da72c6 <irqentry_exit_to_kernel_mode_preempt+0x86> ffffffff81da72f6: eb d3 jmp ffffffff81da72cb <irqentry_exit_to_kernel_mode_preempt+0x8b> Let's look at the call sites to prove that KMSAN cannot observe 'state' as a memory access ever. noinstr sysvec_apic_timer_interrupt(regs) { // Not observable stack or register state irqentry_state_t state = irqentry_enter(regs); instrumentation_begin(); __sysvec_apic_timer_interrupt(regs); instrumentation_end(); irqentry_exit(regs, state); } Here is the disassembly: ffffffff910e6840 <sysvec_apic_timer_interrupt>: ffffffff910e6840: f3 0f 1e fa endbr64 ffffffff910e6844: 55 push %rbp ffffffff910e6845: 48 89 e5 mov %rsp,%rbp ffffffff910e6848: 41 56 push %r14 ffffffff910e684a: 53 push %rbx ffffffff910e684b: 48 89 fb mov %rdi,%rbx // Saves 'regs' in rbx ffffffff910e684e: e8 ad 07 00 00 call ffffffff910e7000 <irqentry_enter> // invokes irqentry_enter() ffffffff910e6853: 41 89 c6 mov %eax,%r14d // Saves the returned 'state' in r14d ffffffff910e6856: 65 c6 05 a2 d9 7f 04 01 movb $0x1,%gs:0x47fd9a2(%rip) # ffffffff958e4200 <irq_stat> ffffffff910e685e: 90 nop ffffffff910e685f: f6 83 88 00 00 00 03 testb $0x3,0x88(%rbx) ffffffff910e6866: 75 0b jne ffffffff910e6873 <sysvec_apic_timer_interrupt+0x33> ffffffff910e6868: 65 8a 05 c3 f7 76 04 mov %gs:0x476f7c3(%rip),%al # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e686f: 84 c0 test %al,%al ffffffff910e6871: 74 29 je ffffffff910e689c <sysvec_apic_timer_interrupt+0x5c> ffffffff910e6873: e8 28 eb 8e f0 call ffffffff819d53a0 <irq_enter_rcu> ffffffff910e6878: 48 89 df mov %rbx,%rdi ffffffff910e687b: e8 c0 a5 80 f0 call ffffffff818f0e40 <__sysvec_apic_timer_interrupt> ffffffff910e6880: e8 3b ed 8e f0 call ffffffff819d55c0 <irq_exit_rcu> ffffffff910e6885: 90 nop ffffffff910e6886: 41 0f b6 f6 movzbl %r14b,%esi // Moves the saved 'state' into esi ffffffff910e688a: 48 89 df mov %rbx,%rdi // Moves the saved 'regs' into rdi ffffffff910e688d: e8 4e 08 00 00 call ffffffff910e70e0 <irqentry_exit> // Invokes irqentry_exit() ffffffff910e6892: 5b pop %rbx ffffffff910e6893: 41 5e pop %r14 ffffffff910e6895: 5d pop %rbp ffffffff910e6896: 2e e9 14 ed 03 00 cs jmp ffffffff911255b0 <__pi___x86_return_thunk> ffffffff910e689c: 65 c6 05 8e f7 76 04 01 movb $0x1,%gs:0x476f78e(%rip) # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e68a4: 65 4c 8b 1d 54 f7 76 04 mov %gs:0x476f754(%rip),%r11 # ffffffff95856000 <hardirq_stack_ptr> ffffffff910e68ac: 49 89 23 mov %rsp,(%r11) ffffffff910e68af: 4c 89 dc mov %r11,%rsp ffffffff910e68b2: e8 e9 ea 8e f0 call ffffffff819d53a0 <irq_enter_rcu> ffffffff910e68b7: 48 89 df mov %rbx,%rdi ffffffff910e68ba: e8 81 a5 80 f0 call ffffffff818f0e40 <__sysvec_apic_timer_interrupt> ffffffff910e68bf: e8 fc ec 8e f0 call ffffffff819d55c0 <irq_exit_rcu> ffffffff910e68c4: 5c pop %rsp ffffffff910e68c5: 65 c6 05 65 f7 76 04 00 movb $0x0,%gs:0x476f765(%rip) # ffffffff95856032 <hardirq_stack_inuse> ffffffff910e68cd: eb b6 jmp ffffffff910e6885 <sysvec_apic_timer_interrupt+0x45> ffffffff910e68cf: 90 nop Purely register state in a function which is not instrumented to begin with. Now let's look at irqentry_enter(): noinstr irqentry_enter(regs) { // Not observable stack or register state irqentry_state_t state = ....; .... return state; } ffffffff910e7000 <irqentry_enter>: ffffffff910e7000: f3 0f 1e fa endbr64 ffffffff910e7004: 55 push %rbp ffffffff910e7005: 48 89 e5 mov %rsp,%rbp ffffffff910e7008: 53 push %rbx ffffffff910e7009: f6 87 88 00 00 00 03 testb $0x3,0x88(%rdi) // check for kernel entry ffffffff910e7010: 74 59 je ffffffff910e706b <irqentry_enter+0x6b> // branch taken ... ffffffff910e706b: 65 48 8b 05 95 ef 76 04 mov %gs:0x476ef95(%rip),%rax # ffffffff95856008 <current_task> ffffffff910e7073: f6 40 2c 02 testb $0x2,0x2c(%rax) // is_idle_task()? ffffffff910e7077: 75 0b jne ffffffff910e7084 <irqentry_enter+0x84> ffffffff910e7079: 90 nop ffffffff910e707a: e8 91 99 a0 f1 call ffffffff82af0a10 <kmsan_unpoison_entry_regs> // Unpoisons regs ffffffff910e707f: 90 nop ffffffff910e7080: 31 c0 xor %eax,%eax // state = 0 ffffffff910e7082: eb 14 jmp ffffffff910e7098 <irqentry_enter+0x98> // return ffffffff910e7084: 48 89 fb mov %rdi,%rbx // Preserve 'regs' ffffffff910e7087: e8 44 0c 00 00 call ffffffff910e7cd0 <ct_irq_enter> ffffffff910e708c: 90 nop ffffffff910e708d: 48 89 df mov %rbx,%rdi // Restore 'regs' ffffffff910e7090: e8 7b 99 a0 f1 call ffffffff82af0a10 <kmsan_unpoison_entry_regs> // Unpoisons 'regs' ffffffff910e7095: 90 nop ffffffff910e7096: b0 01 mov $0x1,%al // state = 1 ffffffff910e7098: 5b pop %rbx // return ffffffff910e7099: 5d pop %rbp ffffffff910e709a: 2e e9 10 e5 03 00 cs jmp ffffffff911255b0 <__pi___x86_return_thunk> As this just returns state constants in RAX, there is zero related KMSAN memory state, right? Now irqentry_exit: noinstr irqentry_exit(regs, state) { .... irqentry_exit_to_kernel_mode_preempt(regs, state); } ffffffff910e70e0 <irqentry_exit>: ffffffff910e70e0: f3 0f 1e fa endbr64 ffffffff910e70e4: 55 push %rbp ffffffff910e70e5: 48 89 e5 mov %rsp,%rbp ffffffff910e70e8: 41 57 push %r15 ffffffff910e70ea: 41 56 push %r14 ffffffff910e70ec: 41 55 push %r13 ffffffff910e70ee: 41 54 push %r12 ffffffff910e70f0: 53 push %rbx ffffffff910e70f1: 48 83 ec 68 sub $0x68,%rsp ffffffff910e70f5: 49 89 fe mov %rdi,%r14 // Save 'regs' ffffffff910e70f8: f6 87 88 00 00 00 03 testb $0x3,0x88(%rdi) // return to kernel mode ffffffff910e70ff: 74 4c je ffffffff910e714d <irqentry_exit+0x6d> // branch taken ... ffffffff910e714d: 89 f3 mov %esi,%ebx // Save 'state' ffffffff910e714f: 90 nop ffffffff910e7150: 0f b6 f3 movzbl %bl,%esi ffffffff910e7153: 4c 89 f7 mov %r14,%rdi ffffffff910e7156: e8 e5 00 cc f0 call ffffffff81da7240 <irqentry_exit_to_kernel_mode_preempt> ffffffff910e715b: 90 nop Again. 'state' is a pure register value, which is handed to irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). But KMSAN magically associates a memory access which it and then claims it belongs to a SKB which was allocated in the interrupted code. What Mark's change actually does is to make the register value 'state' observable in an instrumented function, while before that 'state' was always confined in the non instrumentable code. But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() is a pure register value, which could be even a constant supplied by the caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. I just built the same .config with clang-19 and it is hallucinating in the same way, though the actual assembly is more readable than the syzbot clang-21 variant: 0000000000002710 <irqentry_exit_to_kernel_mode_preempt>: 2710: 55 push %rbp 2711: 48 89 e5 mov %rsp,%rbp 2714: 41 57 push %r15 2716: 41 56 push %r14 2718: 41 55 push %r13 271a: 41 54 push %r12 271c: 53 push %rbx 271d: 89 f3 mov %esi,%ebx // save 'state' in EBX 271f: 49 89 ff mov %rdi,%r15 // save 'regs' in R15 2722: e8 00 00 00 00 call 2727 <irqentry_exit_to_kernel_mode_preempt+0x17> 2723: R_X86_64_PLT32 __msan_get_context_state-0x4 2727: 44 0f b6 60 08 movzbl 0x8(%rax),%r12d // Magic KMSAN data related to 'state' 272c: 44 8b b0 90 0c 00 00 mov 0xc90(%rax),%r14d // Magic KMSAN data related to 'regs::eflags' 2733: 4d 8b af 90 00 00 00 mov 0x90(%r15),%r13 // Read the actual regs->eflags 273a: 49 81 c7 90 00 00 00 add $0x90,%r15 // Adjust pointer to regs::eflags 2741: 4c 89 ff mov %r15,%rdi 2744: e8 00 00 00 00 call 2749 <irqentry_exit_to_kernel_mode_preempt+0x39> 2745: R_X86_64_PLT32 __msan_metadata_ptr_for_load_8-0x4 2749: 48 8b 00 mov (%rax),%rax // Read the KMSAN state 274c: 25 00 02 00 00 and $0x200,%eax // Isolate bit 9 (IF?) 2751: 41 81 e5 00 02 00 00 and $0x200,%r13d // Isolate IF in the actual value 2758: 48 85 c0 test %rax,%rax // Test whether the KMSAN IF state is 0 275b: 74 07 je 2764 <irqentry_exit_to_kernel_mode_preempt+0x54> // Not zero 275d: f7 d0 not %eax // negate the KMSAN state 275f: 44 21 e8 and %r13d,%eax // And the actual value (IF isolated) // If zero, goto MSAN warning 2762: 74 25 je 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2764: 4d 85 ed test %r13,%r13 // Check whether IF is set // If not, return 2767: 74 10 je 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2769: 41 f6 c4 01 test $0x1,%r12b // Check the magic KMSAN state (related to 'state') // If not zero, goto MSAN warning 276d: 75 28 jne 2797 <irqentry_exit_to_kernel_mode_preempt+0x87> 276f: f6 c3 01 test $0x1,%bl // Check state.rcu_exit // If not, return 2772: 75 05 jne 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2774: e8 00 00 00 00 call 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2775: R_X86_64_PLT32 __SCT__irqentry_exit_cond_resched-0x4 2779: 5b pop %rbx 277a: 41 5c pop %r12 277c: 41 5d pop %r13 277e: 41 5e pop %r14 2780: 41 5f pop %r15 2782: 5d pop %rbp 2783: 2e e9 00 00 00 00 cs jmp 2789 <irqentry_exit_to_kernel_mode_preempt+0x79> 2785: R_X86_64_PLT32 __x86_return_thunk-0x4 2789: 8b 3a mov (%rdx),%edi 278b: e8 00 00 00 00 call 2790 <irqentry_exit_to_kernel_mode_preempt+0x80> 278c: R_X86_64_PLT32 __msan_warning-0x4 2790: 4d 85 ed test %r13,%r13 2793: 75 d4 jne 2769 <irqentry_exit_to_kernel_mode_preempt+0x59> 2795: eb e2 jmp 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> 2797: 44 89 f7 mov %r14d,%edi 279a: e8 00 00 00 00 call 279f <irqentry_exit_to_kernel_mode_preempt+0x8f> 279b: R_X86_64_PLT32 __msan_warning-0x4 279f: f6 c3 01 test $0x1,%bl 27a2: 74 d0 je 2774 <irqentry_exit_to_kernel_mode_preempt+0x64> 27a4: eb d3 jmp 2779 <irqentry_exit_to_kernel_mode_preempt+0x69> The compiler _cannot_ assume anything about the 'state' argument as that's handed in as value in RSI from a completely different compilation unit. Something is wrong in KMSAN/compiler land or do you still believe that you just need to unpoison the non existing memory 'state'? Thanks, tglx ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-11 12:25 ` Thomas Gleixner @ 2026-05-12 9:33 ` Alexander Potapenko 2026-05-12 11:15 ` Mark Rutland 0 siblings, 1 reply; 13+ messages in thread From: Alexander Potapenko @ 2026-05-12 9:33 UTC (permalink / raw) To: Thomas Gleixner Cc: Dmitry Vyukov, Mark Rutland, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > Hi Thomas, thanks for diving into this! > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > and `state`. > > Because there is a lot of non-instrumented code around, we fail to > > initialize these variables. > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > the state. > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > at the same place. > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. My bad. Normally, the compiler would happily move `state` to memory if it is address-taken, and make sure that its shadow is tracked properly. But not in this case, because irqentry_enter_from_kernel_mode() is inlined into a `noinstr` function. <I omit the elaborate assembly analysis here, tipping my hat!> > Again. 'state' is a pure register value, which is handed to > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). There is no notion of a 'pure register value' in C, and the compiler may make arbitrary decisions about whether a particular value is stored on the stack or in the registers. Luckily, KMSAN does not have to know about that, because it works on the LLVM IR level and can track the state of a value regardless of where it is stored. In particular, it normally works for function return values - unless `noinstr` kicks in. > > But KMSAN magically associates a memory access which it and then claims > it belongs to a SKB which was allocated in the interrupted code. > > What Mark's change actually does is to make the register value 'state' > observable in an instrumented function, while before that 'state' was > always confined in the non instrumentable code. Agreed. > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > is a pure register value, which could be even a constant supplied by the > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. I disagree with a general implication that KMSAN has zero business to fiddle with values passed in registers. But I agree we are doing a poor job trying to pull the shadow for `state` out of thin air. > > The compiler _cannot_ assume anything about the 'state' argument as > that's handed in as value in RSI from a completely different compilation > unit. > Again, this only matters because we are calling an instrumented function from a non-instrumented one, otherwise it's perfectly fine to call between compilation units. > Something is wrong in KMSAN/compiler land or do you still believe that > you just need to unpoison the non existing memory 'state'? > When we call an instrumented function from a non-instrumented one, the compiler is doomed to not understand that and to be unable to track the function parameters properly. Exactly because `noinstr` implies no instrumentation whatsoever, the compiler may not add any hints on the caller side that would help the callee understand what's going on - even if KMSAN is able to see this `noinstr` function (which is not always the case). So what we could do is to add annotations manualy on either the caller side or the callee side. We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), making all its inputs initialized. This is the easiest solution, it may introduce false negatives, but we are on a very thin ice anyway, so perhaps doing so is better than dealing with more false positives in the interrupt code. Another option for the callee would be applying `__always_inline`, so that irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. Given that irqentry_exit_to_kernel_mode_after_preempt() is already `__always_inline`, it might be the right thing to do. On the caller side, we could do something creative with instrumentation_begin() and instrumentation_end(). We've had a discussion about that exactly four years ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u , but came to a conclusion that a handful of annotations on the noinstr/instr boundary may do a better job than a solution that doesn't cover all cases. In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in instrumentation_begin(). But it wouldn't solve more complex (yet rare, and non-existing today) cases where two functions are called from an instrumented region, and the first function somehow leaves the argument state poisoned. Do you think it's worth revisiting the instrumentation_begin() approach, or shall we go with one of the compiler attributes instead? Thank you, Alexander -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 9:33 ` Alexander Potapenko @ 2026-05-12 11:15 ` Mark Rutland 2026-05-12 16:24 ` Alexander Potapenko 0 siblings, 1 reply; 13+ messages in thread From: Mark Rutland @ 2026-05-12 11:15 UTC (permalink / raw) To: Alexander Potapenko Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote: > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > > and `state`. > > > Because there is a lot of non-instrumented code around, we fail to > > > initialize these variables. > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > > the state. > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > > at the same place. > > > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. > > My bad. Normally, the compiler would happily move `state` to memory if it is > address-taken, and make sure that its shadow is tracked properly. > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into > a `noinstr` function. > > <I omit the elaborate assembly analysis here, tipping my hat!> > > > Again. 'state' is a pure register value, which is handed to > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). > > There is no notion of a 'pure register value' in C, and the compiler may make > arbitrary decisions about whether a particular value is stored on the stack or > in the registers. > > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR > level and can track the state of a value regardless of where it is stored. > In particular, it normally works for function return values - unless `noinstr` > kicks in. > > > > > But KMSAN magically associates a memory access which it and then claims > > it belongs to a SKB which was allocated in the interrupted code. > > > > What Mark's change actually does is to make the register value 'state' > > observable in an instrumented function, while before that 'state' was > > always confined in the non instrumentable code. > > Agreed. > > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > > is a pure register value, which could be even a constant supplied by the > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. > > I disagree with a general implication that KMSAN has zero business to fiddle > with values passed in registers. But I agree we are doing a poor job trying > to pull the shadow for `state` out of thin air. > > > The compiler _cannot_ assume anything about the 'state' argument as > > that's handed in as value in RSI from a completely different compilation > > unit. > > Again, this only matters because we are calling an instrumented function from > a non-instrumented one, otherwise it's perfectly fine to call between > compilation units. For context, can you explain how this is expected to work across compilation units when the caller and callee *are* instrumented, when an argument is passed in a register? Below you suggest that the caller might add "hints", but it's not clear specifically what this means. > > Something is wrong in KMSAN/compiler land or do you still believe that > > you just need to unpoison the non existing memory 'state'? > > When we call an instrumented function from a non-instrumented one, the compiler > is doomed to not understand that and to be unable to track the function > parameters properly. Exactly because `noinstr` implies no instrumentation > whatsoever, the compiler may not add any hints on the caller side that would > help the callee understand what's going on - even if KMSAN is able to see this > `noinstr` function (which is not always the case). > > So what we could do is to add annotations manualy on either the caller side or > the callee side. Without some understanding of those "hints" you mention, I don't see how we can do that on the caller side. > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > making all its inputs initialized. This is the easiest solution, it may > introduce false negatives, but we are on a very thin ice anyway, so perhaps > doing so is better than dealing with more false positives in the interrupt code. > > Another option for the callee would be applying `__always_inline`, so that > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > `__always_inline`, it might be the right thing to do. We can do that, but this really suggests that there's a fundemantal inability to pass arguments between code which is noinstr and code which isinstrumented with AddressSanitizer, and that's inevitably going to bite us in future. > On the caller side, we could do something creative with instrumentation_begin() > and instrumentation_end(). We've had a discussion about that exactly four years > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > , but came to a conclusion that a handful of annotations on the noinstr/instr > boundary may do a better job than a solution that doesn't cover all cases. That doesn't look general at all, so I am not keen on that. > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > non-existing today) cases where two functions are called from an instrumented > region, and the first function somehow leaves the argument state poisoned. How exactly is kmsan_context_state used? If that's supposed to carry some global or current context, surely blatting that in entry code will affect the code that was interrupted? I see kmsan_get_context() has an in_task() check, but that can't help with nested exceptions, so this doesn't look right at all. > Do you think it's worth revisiting the instrumentation_begin() approach, or > shall we go with one of the compiler attributes instead? I think we need a better understanding of this first. It looks to me that there are bigger problems here. Mark. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 11:15 ` Mark Rutland @ 2026-05-12 16:24 ` Alexander Potapenko 2026-05-12 17:46 ` Mark Rutland 2026-05-13 0:36 ` Thomas Gleixner 0 siblings, 2 replies; 13+ messages in thread From: Alexander Potapenko @ 2026-05-12 16:24 UTC (permalink / raw) To: Mark Rutland Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, May 12, 2026 at 11:33:47AM +0200, Alexander Potapenko wrote: > > On Mon, May 11, 2026 at 2:25 PM Thomas Gleixner <tglx@kernel.org> wrote: > > > > irqentry_exit_to_kernel_mode_preempt() is now checking for both `regs` > > > > and `state`. > > > > Because there is a lot of non-instrumented code around, we fail to > > > > initialize these variables. > > > > Instead, irqentry_enter_from_kernel_mode() explicitly calls > > > > kmsan_unpoison_entry_regs(regs) to take care of the registers, but not > > > > the state. > > > > We should probably call `kmsan_unpoison_memory(&state, sizeof(state))` > > > > at the same place. > > > > > > Good luck with unpoisoning 'state'. 'state' is not memory to begin with. > > > > My bad. Normally, the compiler would happily move `state` to memory if it is > > address-taken, and make sure that its shadow is tracked properly. > > But not in this case, because irqentry_enter_from_kernel_mode() is inlined into > > a `noinstr` function. > > > > <I omit the elaborate assembly analysis here, tipping my hat!> > > > > > Again. 'state' is a pure register value, which is handed to > > > irqentry_exit() and irqentry_exit_to_kernel_mode_preempt(). > > > > There is no notion of a 'pure register value' in C, and the compiler may make > > arbitrary decisions about whether a particular value is stored on the stack or > > in the registers. > > > > Luckily, KMSAN does not have to know about that, because it works on the LLVM IR > > level and can track the state of a value regardless of where it is stored. > > In particular, it normally works for function return values - unless `noinstr` > > kicks in. > > > > > > > > But KMSAN magically associates a memory access which it and then claims > > > it belongs to a SKB which was allocated in the interrupted code. > > > > > > What Mark's change actually does is to make the register value 'state' > > > observable in an instrumented function, while before that 'state' was > > > always confined in the non instrumentable code. > > > > Agreed. > > > > > But as that 'state' argument of irqentry_exit_to_kernel_mode_preempt() > > > is a pure register value, which could be even a constant supplied by the > > > caller of irqentry_exit(), KMSAN has _ZERO_ business to fiddle with it. > > > > I disagree with a general implication that KMSAN has zero business to fiddle > > with values passed in registers. But I agree we are doing a poor job trying > > to pull the shadow for `state` out of thin air. > > > > > The compiler _cannot_ assume anything about the 'state' argument as > > > that's handed in as value in RSI from a completely different compilation > > > unit. > > > > Again, this only matters because we are calling an instrumented function from > > a non-instrumented one, otherwise it's perfectly fine to call between > > compilation units. > > For context, can you explain how this is expected to work across > compilation units when the caller and callee *are* instrumented, when an > argument is passed in a register? Per KMSAN ABI, the metadata for the function arguments is passed via buffers in the so-called context state (see include/linux/kmsan_types.h) In the userspace, these buffers are thread-local variables referenced by inline loads and stores. In the kernel space, the compiler inserts a call __msan_get_context_state() at the beginning of every function, and then the instrumentation code uses whatever that function returned. Assume we have a function: int sum(int a, int b) { result = a + b; return result; } Its instrumented version looks roughly as follows (we'll omit origin tracking for simplicity): int sum(int a, int b) { struct kmsan_context_state *kcs = __msan_get_context_state(); int s_a = ((int)kcs->param_tls)[0]; // shadow of a int s_b = ((int)kcs->param_tls)[1]; // shadow of b result = a + b; s_result = s_a | s_b; ((int)kcs->retval_tls)[0] = s_result; // returned shadow return result; } Most certainly `a` and `b` will be passed using registers, but that doesn't matter: their metadata is safe as long as the caller does: ((int)kcs->param_tls)[0] = s_a; ((int)kcs->param_tls)[1] = s_b; result = sum(a, b); s_result = ((int)kcs->retval_tls)[0]; All the metadata tracking is implemented using an LLVM IR pass, which ensures that the shadow for each uninitialized value is tracked regardless of where that value is stored - be that heap memory, stack or registers: - when a value is created or loaded from memory, the compiler inserts SSA registers corresponding to that value's shadow; - when a value is written to memory, a shadow store is inserted; - when a value is used in an operation, the result of that operation receives a shadow value depending on the operands and their shadow values; - when a value is passed to a function, the corresponding context state stores and loads are emitted. Now, this works fine under the assumption that all code in the kernel is instrumented with KMSAN. However this is not true. 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some prevent infinite recursion caused by KMSAN calling instrumented code; others disable instrumentation of the code that executes before KMSAN is fully set up. 2. There are annotations to soft-disable KMSAN for a particular function by adding a `__no_kmsan_checks` attribute. There will still be instrumentation, but all the function outputs (stores and returns) will be initialized, and no errors will be reported. This is handy when the compiler cannot properly instrument the underlying code. One example would be tricky inline assembly, another one - stack walking functions that deliberately read and write uninitialized values. 3. There are cases in which KMSAN must be disabled for the whole function (`__no_sanitize_memory`). We disable instrumentation for `noinstr` functions. Additionally, on x86 there is exactly one case where this is done to avoid infinite recursion when storing the origins. It's important to note that, due to how function attributes are implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will be applied to a function if that function exists during the KMSAN instrumentation pass. If it was inlined before that pass, the compiler will apply the instrumentation rules for whichever function is calling the inlined one. Now, because of the described ABI, calls between instrumented functions and/or functions marked `__no_kmsan_checks` are perfectly fine because the function arguments are stored and loaded on both ends. What does not work well is calling non-instrumented functions from instrumented functions, and vice versa. In the former case, KMSAN will not see the callee's side effects (return values or memory stores), in the latter case, the callee may receive incorrect shadow values for the function parameters or memory stores in the caller. Both will We have few tools to deal with such cases. It often helps to move the border between the instrumented and non-instrumented code by applying __no_kmsan_checks or __no_sanitize_memory until we get to a point where there are no incorrect shadow arguments. Another way to deal with uninitialized data coming from non-instrumented code is kmsan_unpoison_memory(address, size). Unfortunately, as Thomas pointed out, we can't use it for locals in non-instrumented code. > Below you suggest that the caller might add "hints", but it's not clear > specifically what this means. > > > > Something is wrong in KMSAN/compiler land or do you still believe that > > > you just need to unpoison the non existing memory 'state'? > > > > When we call an instrumented function from a non-instrumented one, the compiler > > is doomed to not understand that and to be unable to track the function > > parameters properly. Exactly because `noinstr` implies no instrumentation > > whatsoever, the compiler may not add any hints on the caller side that would > > help the callee understand what's going on - even if KMSAN is able to see this > > `noinstr` function (which is not always the case). > > > > So what we could do is to add annotations manualy on either the caller side or > > the callee side. > > Without some understanding of those "hints" you mention, I don't see how > we can do that on the caller side. In this case by "hints" I meant any kind of instrumentation that the compiler could have inserted automatically to mark the data in the instrumented functions as initialized. There are no such "hints" right now (apart from letting KMSAN instrument the function). As for the manual annotations, we only have the mentioned `__no_kmsan_checks`, `__no_sanitize_memory`, and kmsan_unpoison_memory(). > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > making all its inputs initialized. This is the easiest solution, it may > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > doing so is better than dealing with more false positives in the interrupt code. > > > > Another option for the callee would be applying `__always_inline`, so that > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > `__always_inline`, it might be the right thing to do. > > We can do that, but this really suggests that there's a fundemantal > inability to pass arguments between code which is noinstr and code which > isinstrumented with AddressSanitizer, and that's inevitably going to > bite us in future. This is true (assuming you mean MemorySanitizer), but: - I don't think we can avoid that, given that there will always be non-instrumented code; - so far the number of annotations has been manageable. > > > On the caller side, we could do something creative with instrumentation_begin() > > and instrumentation_end(). We've had a discussion about that exactly four years > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > > , but came to a conclusion that a handful of annotations on the noinstr/instr > > boundary may do a better job than a solution that doesn't cover all cases. > > That doesn't look general at all, so I am not keen on that. Ack. > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > > non-existing today) cases where two functions are called from an instrumented > > region, and the first function somehow leaves the argument state poisoned. > > How exactly is kmsan_context_state used? Hope the explanation above helps. > > If that's supposed to carry some global or current context, surely > blatting that in entry code will affect the code that was interrupted? It will mostly affect the bottom-level function called by the entry code, after that, nested functions will be just passing their parameters as per KMSAN ABI. > I see kmsan_get_context() has an in_task() check, but that can't help > with nested exceptions, so this doesn't look right at all. KMSAN context is per-task, and if !in_task(), it is per-CPU. For the nested exceptions, we conservatively bail out (see kmsan_in_runtime()) to avoid deadlocking. This may cause false negatives. > > Do you think it's worth revisiting the instrumentation_begin() approach, or > > shall we go with one of the compiler attributes instead? > I think we need a better understanding of this first. > > It looks to me that there are bigger problems here. > I'll be happy to discuss this further. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 16:24 ` Alexander Potapenko @ 2026-05-12 17:46 ` Mark Rutland 2026-05-22 6:26 ` Alexander Potapenko 2026-05-13 0:36 ` Thomas Gleixner 1 sibling, 1 reply; 13+ messages in thread From: Mark Rutland @ 2026-05-12 17:46 UTC (permalink / raw) To: Alexander Potapenko Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie Hi, Thanks for this; the explanation of the KMSAN ABI is *very* helpful. I have a few questions and comments inline below. On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > For context, can you explain how this is expected to work across > > compilation units when the caller and callee *are* instrumented, when an > > argument is passed in a register? > > Per KMSAN ABI, the metadata for the function arguments is passed via > buffers in the so-called context state (see > include/linux/kmsan_types.h) > In the userspace, these buffers are thread-local variables referenced > by inline loads and stores. > In the kernel space, the compiler inserts a call > __msan_get_context_state() at the beginning of every function, and > then the instrumentation code uses whatever that function returned. > > Assume we have a function: > > int sum(int a, int b) { > result = a + b; > return result; > } > > Its instrumented version looks roughly as follows (we'll omit origin > tracking for simplicity): > > int sum(int a, int b) { > struct kmsan_context_state *kcs = __msan_get_context_state(); > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > result = a + b; > s_result = s_a | s_b; > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > return result; > } > > Most certainly `a` and `b` will be passed using registers, but that > doesn't matter: their metadata is safe as long as the caller does: > > ((int)kcs->param_tls)[0] = s_a; > ((int)kcs->param_tls)[1] = s_b; > result = sum(a, b); > s_result = ((int)kcs->retval_tls)[0]; Ok. I see how that works when both caller and callee are instrumented. I assume that's tracked for all arguments regardless of type? e.g. that includes pointers, structs, etc? > All the metadata tracking is implemented using an LLVM IR pass, which > ensures that the shadow for each uninitialized value is tracked > regardless of where that value is stored - be that heap memory, stack > or registers: > - when a value is created or loaded from memory, the compiler inserts > SSA registers corresponding to that value's shadow; > - when a value is written to memory, a shadow store is inserted; > - when a value is used in an operation, the result of that operation > receives a shadow value depending on the operands and their shadow > values; > - when a value is passed to a function, the corresponding context > state stores and loads are emitted. > > Now, this works fine under the assumption that all code in the kernel > is instrumented with KMSAN. > However this is not true. > > 1. There are a few `KMSAN_SANITIZE := n` in the Makefiles: some > prevent infinite recursion caused by KMSAN calling instrumented code; > others disable instrumentation of the code that executes before KMSAN > is fully set up. > 2. There are annotations to soft-disable KMSAN for a particular > function by adding a `__no_kmsan_checks` attribute. There will still > be instrumentation, but all the function outputs (stores and returns) > will be initialized, and no errors will be reported. > This is handy when the compiler cannot properly instrument the > underlying code. One example would be tricky inline assembly, another > one - stack walking functions that deliberately read and write > uninitialized values. > 3. There are cases in which KMSAN must be disabled for the whole > function (`__no_sanitize_memory`). > We disable instrumentation for `noinstr` functions. Additionally, on > x86 there is exactly one case where this is done to avoid infinite > recursion when storing the origins. In addition to the above cases, out-of-line assembly functions won't manipulate the shadow (so they're effectively the same as noinstr). > It's important to note that, due to how function attributes are > implemented in Clang, __no_kmsan_checks and __no_sanitize_memory will > be applied to a function if that function exists during the KMSAN > instrumentation pass. If it was inlined before that pass, the compiler > will apply the instrumentation rules for whichever function is calling > the inlined one. > > Now, because of the described ABI, calls between instrumented > functions and/or functions marked `__no_kmsan_checks` are perfectly > fine because the function arguments are stored and loaded on both > ends. > What does not work well is calling non-instrumented functions from > instrumented functions, and vice versa. Yep; understood. > In the former case, KMSAN will not see the callee's side effects > (return values or memory stores), in the latter case, the callee may > receive incorrect shadow values for the function parameters or memory > stores in the caller. Both will Incomplete sentence here? > We have few tools to deal with such cases. It often helps to move the > border between the instrumented and non-instrumented code by applying > __no_kmsan_checks or __no_sanitize_memory until we get to a point > where there are no incorrect shadow arguments. > Another way to deal with uninitialized data coming from > non-instrumented code is kmsan_unpoison_memory(address, size). > Unfortunately, as Thomas pointed out, we can't use it for locals in > non-instrumented code. It's clearly not sufficient to use kmsan_unpoison_memory() for locals, and I think it's non-sensical because conceptually they aren't memory. I don't think kmsan_unpoison_entry_regs() alone is sufficient for the regs. Surely the pointer argument itself needs to be unpoisoned before being passed to a callee? Otherwise, when pointer to regs is passed as an argument to a callee, the callee will consume stale shadow, which might indicate that the pointer to regs is uninitialized? That seems to be exactly what we're hitting with the irq entry state, so I'm surprised we're not hitting it for the regs. Is that sheer luck, or is something else protecting us? > > Below you suggest that the caller might add "hints", but it's not clear > > specifically what this means. > > > > > > Something is wrong in KMSAN/compiler land or do you still believe that > > > > you just need to unpoison the non existing memory 'state'? > > > > > > When we call an instrumented function from a non-instrumented one, the compiler > > > is doomed to not understand that and to be unable to track the function > > > parameters properly. Exactly because `noinstr` implies no instrumentation > > > whatsoever, the compiler may not add any hints on the caller side that would > > > help the callee understand what's going on - even if KMSAN is able to see this > > > `noinstr` function (which is not always the case). > > > > > > So what we could do is to add annotations manualy on either the caller side or > > > the callee side. > > > > Without some understanding of those "hints" you mention, I don't see how > > we can do that on the caller side. > > In this case by "hints" I meant any kind of instrumentation that the > compiler could have inserted automatically to mark the data in the > instrumented functions as initialized. > There are no such "hints" right now (apart from letting KMSAN > instrument the function). > As for the manual annotations, we only have the mentioned > `__no_kmsan_checks`, `__no_sanitize_memory`, and > kmsan_unpoison_memory(). Ok. For noinstr code calling instrumented code, I see that we could explicitly modify the argument shadow in the noinstr caller, but that's *really* grotty, and I don't think we want to do that. > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > > making all its inputs initialized. This is the easiest solution, it may > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > > doing so is better than dealing with more false positives in the interrupt code. > > > > > > Another option for the callee would be applying `__always_inline`, so that > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > > `__always_inline`, it might be the right thing to do. > > > > We can do that, but this really suggests that there's a fundemantal > > inability to pass arguments between code which is noinstr and code which > > isinstrumented with AddressSanitizer, and that's inevitably going to > > bite us in future. > > This is true (assuming you mean MemorySanitizer), but: Sorry, yes, I meant MemorySanitizer. > - I don't think we can avoid that, given that there will always be > non-instrumented code; > - so far the number of annotations has been manageable. As above, I suspect that we're actually missing many necessary annotations and getting away with this by accident. I suspect that we need additional compiler support to say something like "assume this argument/return value IS initialized". > > > On the caller side, we could do something creative with instrumentation_begin() > > > and instrumentation_end(). We've had a discussion about that exactly four years > > > ago: https://lore.kernel.org/all/20220426164315.625149-29-glider@google.com/T/#u > > > , but came to a conclusion that a handful of annotations on the noinstr/instr > > > boundary may do a better job than a solution that doesn't cover all cases. > > > > That doesn't look general at all, so I am not keen on that. > > Ack. > > > > In particular, the case of irqentry_exit_to_kernel_mode_preempt() could have > > > been solved by `__memset(state, 0, sizeof(struct kmsan_context_state))` in > > > instrumentation_begin(). But it wouldn't solve more complex (yet rare, and > > > non-existing today) cases where two functions are called from an instrumented > > > region, and the first function somehow leaves the argument state poisoned. > > > > How exactly is kmsan_context_state used? > > Hope the explanation above helps. > > > > If that's supposed to carry some global or current context, surely > > blatting that in entry code will affect the code that was interrupted? > > It will mostly affect the bottom-level function called by the entry > code, after that, nested functions will be just passing their > parameters as per KMSAN ABI. The problem I was explaining was the other way around: we clobber the context of the function that was executing *before* exception entry. Consider when the CPU is executing some function foo(), and an exception is taken. Within the exception handler, code will clobber the context state. AFAICT, that state is not saved/restored, and nor is it zeroed prior to exception return. The exception handler returns back to the middle of foo(). From foo's PoV, the context state has been clobbered arbitrarily, and that clobbered state could trigger false positives or false negatives. The way kmsan_get_context() uses in_task() will avoid that in *some* cases, but not *all* cases. In particular, I don't think that's going to help when a fault is taken for a uaccess. Maybe I'm missing something here? > > I see kmsan_get_context() has an in_task() check, but that can't help > > with nested exceptions, so this doesn't look right at all. > > KMSAN context is per-task, and if !in_task(), it is per-CPU. > > For the nested exceptions, we conservatively bail out (see > kmsan_in_runtime()) to avoid deadlocking. > This may cause false negatives. AFAICT that doesn't address the case I've described above. Mark. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 17:46 ` Mark Rutland @ 2026-05-22 6:26 ` Alexander Potapenko 2026-06-02 11:53 ` Mark Rutland 0 siblings, 1 reply; 13+ messages in thread From: Alexander Potapenko @ 2026-05-22 6:26 UTC (permalink / raw) To: Mark Rutland Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@arm.com> wrote: > > Hi, > > Thanks for this; the explanation of the KMSAN ABI is *very* helpful. > > I have a few questions and comments inline below. > > On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote: > > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > For context, can you explain how this is expected to work across > > > compilation units when the caller and callee *are* instrumented, when an > > > argument is passed in a register? > > > > Per KMSAN ABI, the metadata for the function arguments is passed via > > buffers in the so-called context state (see > > include/linux/kmsan_types.h) > > In the userspace, these buffers are thread-local variables referenced > > by inline loads and stores. > > In the kernel space, the compiler inserts a call > > __msan_get_context_state() at the beginning of every function, and > > then the instrumentation code uses whatever that function returned. > > > > Assume we have a function: > > > > int sum(int a, int b) { > > result = a + b; > > return result; > > } > > > > Its instrumented version looks roughly as follows (we'll omit origin > > tracking for simplicity): > > > > int sum(int a, int b) { > > struct kmsan_context_state *kcs = __msan_get_context_state(); > > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > > result = a + b; > > s_result = s_a | s_b; > > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > > return result; > > } > > > > Most certainly `a` and `b` will be passed using registers, but that > > doesn't matter: their metadata is safe as long as the caller does: > > > > ((int)kcs->param_tls)[0] = s_a; > > ((int)kcs->param_tls)[1] = s_b; > > result = sum(a, b); > > s_result = ((int)kcs->retval_tls)[0]; > > Ok. I see how that works when both caller and callee are instrumented. > > I assume that's tracked for all arguments regardless of type? e.g. that > includes pointers, structs, etc? That's tracked for all parameters passed to a function, whether on the stack or in registers. This includes cases when a struct can be passed by value, e.g.: struct arg { int a, b; }; int sum(struct arg arg) { return arg.a + arg.b; } For anything passed to the function by pointer, we only track the initializedness of the pointer itself at the call site and the function prologue. Why is that? Because we already track the state of the pointed-to memory elsewhere. For every heap allocation, there is shadow memory backing it, which is initially poisoned (unless it is a GFP_ZERO allocation). Similarly, shadow memory backs every stack allocation. The state of stack allocations is initially set by __msan_poison_alloca(). When we write an initialized value to a memory buffer, the instrumentation code updates that buffer's shadow memory. So the fact that a buffer is passed to a function by pointer does not change that buffer's state; that's why we don't need additional tracking of the pointed-to memory around the calls. (this is all assuming both the caller and the callee are instrumented). > > 3. There are cases in which KMSAN must be disabled for the whole > > function (`__no_sanitize_memory`). > > We disable instrumentation for `noinstr` functions. Additionally, on > > x86 there is exactly one case where this is done to avoid infinite > > recursion when storing the origins. > > In addition to the above cases, out-of-line assembly functions won't > manipulate the shadow (so they're effectively the same as noinstr). This is correct. > > In the former case, KMSAN will not see the callee's side effects > > (return values or memory stores), in the latter case, the callee may > > receive incorrect shadow values for the function parameters or memory > > stores in the caller. Both will > > Incomplete sentence here? Yeah, sorry, disregard this. > > > We have few tools to deal with such cases. It often helps to move the > > border between the instrumented and non-instrumented code by applying > > __no_kmsan_checks or __no_sanitize_memory until we get to a point > > where there are no incorrect shadow arguments. > > Another way to deal with uninitialized data coming from > > non-instrumented code is kmsan_unpoison_memory(address, size). > > Unfortunately, as Thomas pointed out, we can't use it for locals in > > non-instrumented code. > > It's clearly not sufficient to use kmsan_unpoison_memory() for locals, > and I think it's non-sensical because conceptually they aren't memory. > > I don't think kmsan_unpoison_entry_regs() alone is sufficient for the > regs. Surely the pointer argument itself needs to be unpoisoned before > being passed to a callee? > > Otherwise, when pointer to regs is passed as an argument to a callee, > the callee will consume stale shadow, which might indicate that the > pointer to regs is uninitialized? > > That seems to be exactly what we're hitting with the irq entry state, so > I'm surprised we're not hitting it for the regs. Is that sheer luck, or > is something else protecting us? Struct regs is created by non-instrumented code on the stack, that's why its shadow contains whatever garbage that stack slot had when leaving instrumented code. For the data pointed to by individual registers, there are usually two possibilities: 1. It is allocated by instrumented code and already has some state, and struct regs is being used to transport it between instrumented regions. 2. It is allocated by non-instrumented code, belongs to the same shim that created it, and instrumented code is not supposed to touch it. So I think this is not "sheer luck", but rather the fact that non-instrumented code doesn't allocate memory. > > In this case by "hints" I meant any kind of instrumentation that the > > compiler could have inserted automatically to mark the data in the > > instrumented functions as initialized. > > There are no such "hints" right now (apart from letting KMSAN > > instrument the function). > > As for the manual annotations, we only have the mentioned > > `__no_kmsan_checks`, `__no_sanitize_memory`, and > > kmsan_unpoison_memory(). > > Ok. > > For noinstr code calling instrumented code, I see that we could > explicitly modify the argument shadow in the noinstr caller, but that's > *really* grotty, and I don't think we want to do that. Agreed. > > > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > > > making all its inputs initialized. This is the easiest solution, it may > > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > > > doing so is better than dealing with more false positives in the interrupt code. > > > > > > > > Another option for the callee would be applying `__always_inline`, so that > > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > > > `__always_inline`, it might be the right thing to do. > > > > > > We can do that, but this really suggests that there's a fundemantal > > > inability to pass arguments between code which is noinstr and code which > > > isinstrumented with AddressSanitizer, and that's inevitably going to > > > bite us in future. > > > > This is true (assuming you mean MemorySanitizer), but: > > Sorry, yes, I meant MemorySanitizer. > > > - I don't think we can avoid that, given that there will always be > > non-instrumented code; > > - so far the number of annotations has been manageable. > > As above, I suspect that we're actually missing many necessary > annotations and getting away with this by accident. > > I suspect that we need additional compiler support to say something like > "assume this argument/return value IS initialized". There is already a more generic one saying "assume all arguments/return values are uninitialized" - that's __no_kmsan_checks. Do we need a narrower attribute knowing that for a call from uninstrumented code we'll anyway need to mark all the arguments? > > Hope the explanation above helps. > > > > > > If that's supposed to carry some global or current context, surely > > > blatting that in entry code will affect the code that was interrupted? > > > > It will mostly affect the bottom-level function called by the entry > > code, after that, nested functions will be just passing their > > parameters as per KMSAN ABI. > > The problem I was explaining was the other way around: we clobber the > context of the function that was executing *before* exception entry. > > Consider when the CPU is executing some function foo(), and an exception > is taken. Within the exception handler, code will clobber the context > state. AFAICT, that state is not saved/restored, and nor is it zeroed > prior to exception return. The exception handler returns back to the > middle of foo(). From foo's PoV, the context state has been clobbered > arbitrarily, and that clobbered state could trigger false positives or > false negatives. > > The way kmsan_get_context() uses in_task() will avoid that in *some* > cases, but not *all* cases. In particular, I don't think that's going to > help when a fault is taken for a uaccess. > > Maybe I'm missing something here? > > > > I see kmsan_get_context() has an in_task() check, but that can't help > > > with nested exceptions, so this doesn't look right at all. > > > > KMSAN context is per-task, and if !in_task(), it is per-CPU. > > > > For the nested exceptions, we conservatively bail out (see > > kmsan_in_runtime()) to avoid deadlocking. > > This may cause false negatives. > > AFAICT that doesn't address the case I've described above. > > Mark. You are right, it's quite messy for uaccess, and maybe other cases as well. In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq() to switch between different per-cpu contexts, but that didn't work out well, because preempt_count() could change in the middle of certain low-level functions (which the instrumentation ignored, as the reference to the context was only calculated in the prologue). I think this part indeed works by sheer luck. We need an annotation telling the tool where an interrupt context starts and ends. On the compiler side, we'll probably also need an intrinsic to reload the context pointer. -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Liana Sebastian Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-22 6:26 ` Alexander Potapenko @ 2026-06-02 11:53 ` Mark Rutland 0 siblings, 0 replies; 13+ messages in thread From: Mark Rutland @ 2026-06-02 11:53 UTC (permalink / raw) To: Alexander Potapenko Cc: Thomas Gleixner, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Fri, May 22, 2026 at 08:26:30AM +0200, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 7:46 PM Mark Rutland <mark.rutland@arm.com> wrote: > > On Tue, May 12, 2026 at 06:24:03PM +0200, Alexander Potapenko wrote: > > > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: > > > > For context, can you explain how this is expected to work across > > > > compilation units when the caller and callee *are* instrumented, when an > > > > argument is passed in a register? > > > > > > Per KMSAN ABI, the metadata for the function arguments is passed via > > > buffers in the so-called context state (see > > > include/linux/kmsan_types.h) > > > In the userspace, these buffers are thread-local variables referenced > > > by inline loads and stores. > > > In the kernel space, the compiler inserts a call > > > __msan_get_context_state() at the beginning of every function, and > > > then the instrumentation code uses whatever that function returned. > > > > > > Assume we have a function: > > > > > > int sum(int a, int b) { > > > result = a + b; > > > return result; > > > } > > > > > > Its instrumented version looks roughly as follows (we'll omit origin > > > tracking for simplicity): > > > > > > int sum(int a, int b) { > > > struct kmsan_context_state *kcs = __msan_get_context_state(); > > > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > > > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > > > result = a + b; > > > s_result = s_a | s_b; > > > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > > > return result; > > > } > > > > > > Most certainly `a` and `b` will be passed using registers, but that > > > doesn't matter: their metadata is safe as long as the caller does: > > > > > > ((int)kcs->param_tls)[0] = s_a; > > > ((int)kcs->param_tls)[1] = s_b; > > > result = sum(a, b); > > > s_result = ((int)kcs->retval_tls)[0]; > > > > Ok. I see how that works when both caller and callee are instrumented. > > > > I assume that's tracked for all arguments regardless of type? e.g. that > > includes pointers, structs, etc? > > That's tracked for all parameters passed to a function, whether on the > stack or in registers. > This includes cases when a struct can be passed by value, e.g.: > > struct arg { > int a, b; > }; > > int sum(struct arg arg) { > return arg.a + arg.b; > } > > For anything passed to the function by pointer, we only track the > initializedness of the pointer itself at the call site and the > function prologue. Yep, that was what I expected. Thanks for confirming! > Why is that? Because we already track the state of the pointed-to > memory elsewhere. > For every heap allocation, there is shadow memory backing it, which is > initially poisoned (unless it is a GFP_ZERO allocation). > Similarly, shadow memory backs every stack allocation. The state of > stack allocations is initially set by __msan_poison_alloca(). > When we write an initialized value to a memory buffer, the > instrumentation code updates that buffer's shadow memory. > > So the fact that a buffer is passed to a function by pointer does not > change that buffer's state; that's why we don't need additional > tracking of the pointed-to memory around the calls. > (this is all assuming both the caller and the callee are instrumented). Understood. [...] > > > We have few tools to deal with such cases. It often helps to move the > > > border between the instrumented and non-instrumented code by applying > > > __no_kmsan_checks or __no_sanitize_memory until we get to a point > > > where there are no incorrect shadow arguments. > > > Another way to deal with uninitialized data coming from > > > non-instrumented code is kmsan_unpoison_memory(address, size). > > > Unfortunately, as Thomas pointed out, we can't use it for locals in > > > non-instrumented code. > > > > It's clearly not sufficient to use kmsan_unpoison_memory() for locals, > > and I think it's non-sensical because conceptually they aren't memory. > > > > I don't think kmsan_unpoison_entry_regs() alone is sufficient for the > > regs. Surely the pointer argument itself needs to be unpoisoned before > > being passed to a callee? > > > > Otherwise, when pointer to regs is passed as an argument to a callee, > > the callee will consume stale shadow, which might indicate that the > > pointer to regs is uninitialized? > > > > That seems to be exactly what we're hitting with the irq entry state, so > > I'm surprised we're not hitting it for the regs. Is that sheer luck, or > > is something else protecting us? > > Struct regs is created by non-instrumented code on the stack, that's > why its shadow contains whatever garbage that stack slot had when > leaving instrumented code. I'm talking about the *pointer itself*, not the memory the pointer points to. I understand that the shadow for the memory will contain garbage, but I'm asking about the tracking of the *local variable* that is a pointer to that memory. Note: I assume that by 'struct regs' you mean 'struct pt_regs'. > For the data pointed to by individual registers, there are usually two > possibilities: > 1. It is allocated by instrumented code and already has some state, > and struct regs is being used to transport it between instrumented > regions. > 2. It is allocated by non-instrumented code, belongs to the same shim > that created it, and instrumented code is not supposed to touch it. > > So I think this is not "sheer luck", but rather the fact that > non-instrumented code doesn't allocate memory. As above, that doesn't address my concern, becuase I'm talking about the *pointer* not the *memory*. I still think that it's sheer luck that callees think the *pointer* is initiailiased. What ensures that the *pointer to the regs* is marked as initialised? The entry asm doesn't ensure that. The noinstr entry code only unpoisons the memory, but not the pointer itself. The instrumented code called by the noinstr code doesn't seem to have anything to suppress checks on the pointer argument. AFAICT, at the point where noinstr code calls instrumented code, we're relying on stable shadow indicating that register arguments have been initialized, including the pointer to the regs. [...] > > > > > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), > > > > > making all its inputs initialized. This is the easiest solution, it may > > > > > introduce false negatives, but we are on a very thin ice anyway, so perhaps > > > > > doing so is better than dealing with more false positives in the interrupt code. > > > > > > > > > > Another option for the callee would be applying `__always_inline`, so that > > > > > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. > > > > > Given that irqentry_exit_to_kernel_mode_after_preempt() is already > > > > > `__always_inline`, it might be the right thing to do. > > > > > > > > We can do that, but this really suggests that there's a fundemantal > > > > inability to pass arguments between code which is noinstr and code which > > > > isinstrumented with AddressSanitizer, and that's inevitably going to > > > > bite us in future. > > > > > > This is true (assuming you mean MemorySanitizer), but: > > > > Sorry, yes, I meant MemorySanitizer. > > > > > - I don't think we can avoid that, given that there will always be > > > non-instrumented code; > > > - so far the number of annotations has been manageable. > > > > As above, I suspect that we're actually missing many necessary > > annotations and getting away with this by accident. > > > > I suspect that we need additional compiler support to say something like > > "assume this argument/return value IS initialized". > > There is already a more generic one saying "assume all > arguments/return values are uninitialized" - that's __no_kmsan_checks. > Do we need a narrower attribute knowing that for a call from > uninstrumented code we'll anyway need to mark all the arguments? We certainly need to be inhibiting checks in cases where we're not today. How does that work for something like: void do_something_with_foo(struct foo *f) { [[ accesses foo fields here ]] } void called_from_noinstr(struct foo *f) __no_kmsan_checks { do_something_with_foo(f); } noinstr void foo(void) { struct foo f = { ... }; called_from_noinstr(f); } ... e.g. will called_from_noinstr() initialize the argument when calling do_something_with_foo() ? [...] > > Consider when the CPU is executing some function foo(), and an exception > > is taken. Within the exception handler, code will clobber the context > > state. AFAICT, that state is not saved/restored, and nor is it zeroed > > prior to exception return. The exception handler returns back to the > > middle of foo(). From foo's PoV, the context state has been clobbered > > arbitrarily, and that clobbered state could trigger false positives or > > false negatives. > > > > The way kmsan_get_context() uses in_task() will avoid that in *some* > > cases, but not *all* cases. In particular, I don't think that's going to > > help when a fault is taken for a uaccess. > > > > Maybe I'm missing something here? > > > > > > I see kmsan_get_context() has an in_task() check, but that can't help > > > > with nested exceptions, so this doesn't look right at all. > > > > > > KMSAN context is per-task, and if !in_task(), it is per-CPU. > > > > > > For the nested exceptions, we conservatively bail out (see > > > kmsan_in_runtime()) to avoid deadlocking. > > > This may cause false negatives. > > > > AFAICT that doesn't address the case I've described above. > > > > Mark. > > You are right, it's quite messy for uaccess, and maybe other cases as well. > In the past, I attempted to use in_nmi(), in_hardirq(), in_softirq() > to switch between different per-cpu contexts, but that didn't work out > well, because preempt_count() could change in the middle of certain > low-level functions (which the instrumentation ignored, as the > reference to the context was only calculated in the prologue). To be frank, I don't think we can rely on per-cpu contexts here, and that shapre of check is the wrong approach. AFAICT, the only way to make this work without false negatives is to have this nest, with entry code saving/restoring the original context. Given the size of that context, it seems impractical to put that on the stack, so I don't see how we could reaonably make that work. We might get away with clobbering the state when returning from an exception, marking all everything as initialized, but that will lead to false negatives a lot of the time. > I think this part indeed works by sheer luck. > We need an annotation telling the tool where an interrupt context > starts and ends. On the compiler side, we'll probably also need an > intrinsic to reload the context pointer. As above, I think there are further problems with that. Mark. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-12 16:24 ` Alexander Potapenko 2026-05-12 17:46 ` Mark Rutland @ 2026-05-13 0:36 ` Thomas Gleixner 2026-05-13 7:54 ` Peter Zijlstra 2026-05-27 12:39 ` Alexander Potapenko 1 sibling, 2 replies; 13+ messages in thread From: Thomas Gleixner @ 2026-05-13 0:36 UTC (permalink / raw) To: Alexander Potapenko, Mark Rutland Cc: Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie Alexander! On Tue, May 12 2026 at 18:24, Alexander Potapenko wrote: > On Tue, May 12, 2026 at 1:15 PM Mark Rutland <mark.rutland@arm.com> wrote: >> > Again, this only matters because we are calling an instrumented function from >> > a non-instrumented one, otherwise it's perfectly fine to call between >> > compilation units. >> >> For context, can you explain how this is expected to work across >> compilation units when the caller and callee *are* instrumented, when an >> argument is passed in a register? > > Per KMSAN ABI, the metadata for the function arguments is passed via > buffers in the so-called context state (see > include/linux/kmsan_types.h) > In the userspace, these buffers are thread-local variables referenced > by inline loads and stores. > In the kernel space, the compiler inserts a call > __msan_get_context_state() at the beginning of every function, and > then the instrumentation code uses whatever that function returned. > > Assume we have a function: > > int sum(int a, int b) { > result = a + b; > return result; > } > > Its instrumented version looks roughly as follows (we'll omit origin > tracking for simplicity): > > int sum(int a, int b) { > struct kmsan_context_state *kcs = __msan_get_context_state(); > int s_a = ((int)kcs->param_tls)[0]; // shadow of a > int s_b = ((int)kcs->param_tls)[1]; // shadow of b > result = a + b; > s_result = s_a | s_b; > ((int)kcs->retval_tls)[0] = s_result; // returned shadow > return result; > } > > Most certainly `a` and `b` will be passed using registers, but that > doesn't matter: their metadata is safe as long as the caller does: > > ((int)kcs->param_tls)[0] = s_a; > ((int)kcs->param_tls)[1] = s_b; > result = sum(a, b); > s_result = ((int)kcs->retval_tls)[0]; Thanks for the explanation. TBH, I'm failing to see how this is not inSANe, but I might be missing something obvious. What really bothers me is the lack of distinction between arguments passed by value vs. arguments passed by reference. For me there is a clear distinction. Any argument passed by reference, i.e. pointer, obviously requires to be validated at the actual usage site, which might be several levels deep into the call chain. That obviously does not include by value arguments which are passed on the stack due to exceeding the register passing constraints, But for arguments passed by value, I fail to see the justification. The compiler can validate at the call site that the value arguments are properly initialized both at compile and at runtime: 1) If the argument value is a constant it is initialized 2) If the argument value is read from memory at the call site, then the function which reads has to validate that the memory is initialized before reading it. 3) If the argument value was returned from a previous function call then that invoked function has to make sure that the returned value is properly initialized. 4) If the argument is a local variable which is not initialized that's catched at compile time with and without KMSAN. If people ignore that warning, then they will ignore the resulting KMSAN splat too. 5) If the argument is only propagated to the next call, then #1 - #4 apply to the caller, i.e. all of that propagates through the call chain. 6) If the argument is handed in from the previous caller and modified by the function which calls the next one, then still #1 - #4 apply. IOW, the producer of a value argument has to ensure that the value argument is properly initialized. This whole 'is initialized' propagation for by value arguments is overengineered voodoo in my opinion. Why? If the compiler can't ensure at the producer site that something is properly initialized and then can't ensure that the intermediates handing the value argument down the callchain are doing the right thing, then KMSAN won't help either as it is then by definition equally untrustworthy as the untrustworthy compiler which emitted the KMSAN self-validation along with the broken code. So it just adds overhead for nothing and makes everything look "sane" while in fact it provides no value and creates exactly the problems we are debating right now. No? > All the metadata tracking is implemented using an LLVM IR pass, which > ensures that the shadow for each uninitialized value is tracked > regardless of where that value is stored - be that heap memory, stack > or registers: > - when a value is created or loaded from memory, the compiler inserts > SSA registers corresponding to that value's shadow; > - when a value is written to memory, a shadow store is inserted; > - when a value is used in an operation, the result of that operation > receives a shadow value depending on the operands and their shadow > values; > - when a value is passed to a function, the corresponding context > state stores and loads are emitted. > > Now, this works fine under the assumption that all code in the kernel > is instrumented with KMSAN. > However this is not true. We know that since KMSAN got integrated and you pointed yourself to that discussion which we had back then about handling the @regs pointer argument. There was a brief mentioning of the same problem about non-pointer arguments, but that dried out because the test runs did not barf anymore after the @regs unpoisoning got fixed. There were discussions about utilizing instrumentation_begin()/end() to tell the compiler that this instrumentation_begin() lifts the 'do not sanitize directive' and it could rightfully inject instrumentation code there including calls, but obviously nothing happened. What really worries me more than this particular problem at hand is that we have tons of places which invoke instrumented functions from non-instrumented code with arguments originating from the non-instrumented code. Here is an example: DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); } #define DEFINE_IDTENTRY_ERRORCODE(func) \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code); \ \ __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ irqentry_state_t state = irqentry_enter(regs); \ \ instrumentation_begin(); \ __##func (regs, error_code); \ instrumentation_end(); \ irqentry_exit(regs, state); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, \ unsigned long error_code) @error_code comes all the way from the ASM entry code. The function which gets inlined within the instrumentation_begin()/end() region in the non-instrumented exc_invalid_tss() is: static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code) { do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); } Anything else than @regs (already unpoisoned) and @error_code is compile time constant. do_error_trap() is in the same compilation unit, static and instrumentable: static void do_error_trap(struct pt_regs *regs, long error_code, char *str, unsigned long trapnr, int signr, int sicode, void __user *addr) { RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) != NOTIFY_STOP) { cond_local_irq_enable(regs); do_trap(trapnr, signr, str, regs, error_code, sicode, addr); cond_local_irq_disable(regs); } } That should suffer from the very same problem vs. _all_ arguments except @regs, right? But no, it does not. The same compilers which emit imaginary initialized checks for something they out of lined themself despite knowing that the only calling context is not instrumentable, ignore the very same for do_error_trap(). According to the disassembly do_error_trap() just goes and invokes notify_die() without any checks and notify_die() takes the same (reordered) arguments uninspected and stores them into a on stack data struct which is properly poisened first and then unpoisoned for each member stored. That's either insanely inconsistent or I've missed something in the resulting ASM maze. But I'm pretty sure that I did not, so I spare you the ASM analysis as you surely are able to look at the resulting inconsistent assembly code yourself (clang-19 debian based). >> > So what we could do is to add annotations manualy on either the caller side or >> > the callee side. >> >> Without some understanding of those "hints" you mention, I don't see how >> we can do that on the caller side. > > In this case by "hints" I meant any kind of instrumentation that the > compiler could have inserted automatically to mark the data in the > instrumented functions as initialized. > There are no such "hints" right now (apart from letting KMSAN > instrument the function). > As for the manual annotations, we only have the mentioned > `__no_kmsan_checks`, `__no_sanitize_memory`, and > kmsan_unpoison_memory(). Which is a sad state of affairs. These hints have been debated four years ago for the very same reasons and nothing happened since then other than the very same compiler becoming more agressive about uninling. >> > We can apply `__no_kmsan_checks` to irqentry_exit_to_kernel_mode_preempt(), >> > making all its inputs initialized. This is the easiest solution, it may >> > introduce false negatives, but we are on a very thin ice anyway, so perhaps >> > doing so is better than dealing with more false positives in the interrupt code. >> > >> > Another option for the callee would be applying `__always_inline`, so that >> > irqentry_exit_to_kernel_mode_preempt() also becomes non-instrumented. >> > Given that irqentry_exit_to_kernel_mode_after_preempt() is already >> > `__always_inline`, it might be the right thing to do. >> > >> We can do that, but this really suggests that there's a fundemantal >> inability to pass arguments between code which is noinstr and code which >> isinstrumented with AddressSanitizer, and that's inevitably going to >> bite us in future. For the problem at hand I really want to go and mark the inline __always_inline so it's gone for now. I hate that "solution" with a passion as it just papers over the underlying problem and I definitely agree with Mark that this can only to be considered to be a temporary band aid and is going to bite use over and over again. As I pointed out with exc_invalid_tss() this just works by chance and not by design and there are far more places than that to illustrate it. > This is true (assuming you mean MemorySanitizer), but: > - I don't think we can avoid that, given that there will always be > non-instrumented code; > - so far the number of annotations has been manageable. I agree that it's unavoidable, but I don't agree with the second sentiment unless you meant: "Has been manageable by sheer luck". What I said in that four year old discussion you pointed to earlier "I'm all for making use of advanced instrumentation, validation and debugging features, but this mindset of 'make the code comply to what the tool of today provides' is fundamentally wrong. Tools have to provide value to the programmer and not the other way round. Yes, it's more work on the tooling side, but the tooling side is mostly a one time effort while chasing the false positives is a long term nightmare." still applies and while we might paper over the underlying problem for now to keep the CI robots happy, we really have to come up with something which is reliable and at the same time provides the maximum coverage for instrumentation. After reading back on that old thread I still think that we should stick some hint for the compiler into instrumentation_begin()/end(). These macros exist to provide objtool a way to validate that non-instrumentable code does not accidentaly call out into instrumentable code without someone having done the reasoning why it is safe. But at the same time these sections obviously can tell the compiler that the by value arguments handed to any function invoked within that section have to be considered valid. I deliberately restricted this to by value arguments because by reference arguments need to be explicitly covered as we do today for @regs (also see above). So the compiler can either emit the required KMSAN magic right before the call or create a wrapper function, which does the required context initialization. IIRC, this has been debated four years ago already in some form. But now four years down the road that obviously does not solve the problem because nothing happened and we have to deal with existing compilers and the only thing we can do until compiler people get their act together is work around this insanity in kernel code again. Why does TCMalloc come to my mind right now? Ranted enough. Here is what I came up with which seems to be "workable" after throwing at least a dozen other "brilliant and trivial" ideas out: instrumentation_begin(); - func(arg1, arg2); + call_instr(func, typeof(arg1), arg1, typeof(arg2), arg2); instrumentation_end(); and let call_instr() inject the necessary KMSAN fixups before invoking @func. That would probably do some useless stuff when @func is an empty stub, a __always_inline function or subject to arbitrary inlining decisions of the compiler. But that's not the end of the world and the conversion can be mostly done by coccinelle or your favorite code conversion tool of the day. Something insane like this: #define KMSAN_FIXUP_VA_1(insane, type, arg) insane(type, arg) #define KMSAN_FIXUP_VA_2(insane, arg, ...) KMSAN_FIXUP_VA_1(insane, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_3(insane, type, arg, ...) insane(type, arg), KMSAN_FIXUP_VA_2(insane, __VA_ARGS__) #define KMSAN_FIXUP_VA_4(insane, arg, ...) KMSAN_FIXUP_VA_3(insane, arg, __VA_ARGS__) #define KMSAN_FIXUP_VA_FOR_EACH(insane, ...) \ CONCATENATE(KMSAN_FIXUP_VA_, COUNT_ARGS(__VA_ARGS__))(insane, __VA_ARGS__) #define KMSAN_VA_ARG_2(a1, a2) (a2) #define KMSAN_VA_ARG_4(a1, a2, a3, a4) (a2), (a4) #define VA_CALL(f, ...) \ f(CONCATENATE(KMSAN_VA_ARG_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)) #define kmsan_fixup_arg_int(arg) __kmsan_fixup_arg_4(&(arg)) #define KMSAN_FIXUP_ARG(type, arg) \ CONCATENATE(kmsan_fixup_arg_, type)(arg) #define call_instr(f, ...) \ do { \ VA_KMSAN_FIXUP_FOR_EACH(KMSAN_FIXUP_ARG, __VA_ARGS__); \ VA_CALL(f, __VA_ARGS__); \ } while (0) That's obiously restricted to _two_ arguments and _one_ data type for illustration purposes, but making it work for the limited number of arguments and data types shouldn't be rocket science.. As a bonus it will fail to compile immediately for unknown data types and argument numbers. It preserves type safety by not playing silly casting games with the arguments on the way. Not that I'm a fan of this, but that's at least better than playing whack a mole with the insanities of toolchains forever. Thoughts? Thanks, tglx ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-13 0:36 ` Thomas Gleixner @ 2026-05-13 7:54 ` Peter Zijlstra 2026-05-27 12:39 ` Alexander Potapenko 1 sibling, 0 replies; 13+ messages in thread From: Peter Zijlstra @ 2026-05-13 7:54 UTC (permalink / raw) To: Thomas Gleixner Cc: Alexander Potapenko, Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, syzkaller-bugs, ruanjinjie On Wed, May 13, 2026 at 02:36:10AM +0200, Thomas Gleixner wrote: > But now four years down the road that obviously does not solve the > problem because nothing happened and we have to deal with existing > compilers and the only thing we can do until compiler people get their > act together is work around this insanity in kernel code again. > Thoughts? KMSAN is debug cruft. I don't see why we cannot demand they fix it and make KMSAN depend whatever clang will have the thing fixed. There is no reason we need to support existing compilers what so ever for debug code. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt 2026-05-13 0:36 ` Thomas Gleixner 2026-05-13 7:54 ` Peter Zijlstra @ 2026-05-27 12:39 ` Alexander Potapenko 1 sibling, 0 replies; 13+ messages in thread From: Alexander Potapenko @ 2026-05-27 12:39 UTC (permalink / raw) To: Thomas Gleixner Cc: Mark Rutland, Dmitry Vyukov, syzbot, kasan-dev, linux-kernel, luto, peterz, syzkaller-bugs, ruanjinjie On Wed, May 13, 2026 at 2:36 AM Thomas Gleixner <tglx@kernel.org> wrote: > > Alexander! > > Thanks for the explanation. > > TBH, I'm failing to see how this is not inSANe, but I might be missing > something obvious. > > What really bothers me is the lack of distinction between arguments > passed by value vs. arguments passed by reference. For me there is a > clear distinction. > > Any argument passed by reference, i.e. pointer, obviously requires to be > validated at the actual usage site, which might be several levels deep > into the call chain. That obviously does not include by value arguments > which are passed on the stack due to exceeding the register passing > constraints, > > But for arguments passed by value, I fail to see the justification. > > The compiler can validate at the call site that the value arguments are > properly initialized both at compile and at runtime: > > 1) If the argument value is a constant it is initialized > > 2) If the argument value is read from memory at the call site, then > the function which reads has to validate that the memory is > initialized before reading it. > > 3) If the argument value was returned from a previous function call > then that invoked function has to make sure that the returned > value is properly initialized. > > 4) If the argument is a local variable which is not initialized > that's catched at compile time with and without KMSAN. If people > ignore that warning, then they will ignore the resulting KMSAN > splat too. > > 5) If the argument is only propagated to the next call, then #1 - #4 > apply to the caller, i.e. all of that propagates through the > call chain. > > 6) If the argument is handed in from the previous caller and > modified by the function which calls the next one, then still #1 > - #4 apply. > > IOW, the producer of a value argument has to ensure that the value > argument is properly initialized. > > This whole 'is initialized' propagation for by value arguments is > overengineered voodoo in my opinion. Why? Sorry, I left out a big chunk of how KMSAN works. In fact, the above cases are covered by `-fsanitize-memory-param-retval`, which eagerly checks parameters and return values having the `noundef` attribute. For such cases, the compiler will not propagate the shadow into TLS. Yet we still have to maintain TLS parameter passing for types that lack the `noundef` attribute (e.g. structs with internal padding or vector types) and for variadic function arguments. > > If the compiler can't ensure at the producer site that something is > properly initialized and then can't ensure that the intermediates > handing the value argument down the callchain are doing the right > thing, then KMSAN won't help either as it is then by definition > equally untrustworthy as the untrustworthy compiler which emitted the > KMSAN self-validation along with the broken code. Agreed. > So it just adds overhead for nothing and makes everything look "sane" > while in fact it provides no value and creates exactly the problems we > are debating right now. > > No? I hope the above addresses this concern to some extent. While eager checks simplify the instrumentation code, they (as mentioned) do not completely remove the need for TLS parameter passing. Long story short, it's unsafe to assume at the beginning of an instrumented function that all its by-value arguments are initialized. If that were true, we would indeed have no problems with non-instrumented functions calling instrumented ones. > > > > Now, this works fine under the assumption that all code in the kernel > > is instrumented with KMSAN. > > However this is not true. > > We know that since KMSAN got integrated and you pointed yourself to that > discussion which we had back then about handling the @regs pointer > argument. > > There was a brief mentioning of the same problem about non-pointer > arguments, but that dried out because the test runs did not barf anymore > after the @regs unpoisoning got fixed. > > There were discussions about utilizing instrumentation_begin()/end() to > tell the compiler that this instrumentation_begin() lifts the 'do not > sanitize directive' and it could rightfully inject instrumentation code > there including calls, but obviously nothing happened. Right. At that point, we thought __no_kmsan_checks should be enough to keep the complexity manageable. The more I think about it now, the more I am convinced it shouldn't be hard to implement intrinsics that allow instrumentation between instrumentation_begin()/instrumentation_end() and treat all values passed into that region as initialized. This should fix problems with noinstr functions calling instrumented functions. > What really worries me more than this particular problem at hand is that > we have tons of places which invoke instrumented functions from > non-instrumented code with arguments originating from the > non-instrumented code. Here is an example: > > DEFINE_IDTENTRY_ERRORCODE(exc_invalid_tss) > { > do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, > 0, NULL); > } > > #define DEFINE_IDTENTRY_ERRORCODE(func) \ > static __always_inline void __##func(struct pt_regs *regs, \ > unsigned long error_code); \ > \ > __visible noinstr void func(struct pt_regs *regs, \ > unsigned long error_code) \ > { \ > irqentry_state_t state = irqentry_enter(regs); \ > \ > instrumentation_begin(); \ > __##func (regs, error_code); \ > instrumentation_end(); \ > irqentry_exit(regs, state); \ > } \ > \ > static __always_inline void __##func(struct pt_regs *regs, \ > unsigned long error_code) > > @error_code comes all the way from the ASM entry code. > > The function which gets inlined within the instrumentation_begin()/end() > region in the non-instrumented exc_invalid_tss() is: > > static __always_inline void __exc_invalid_tss(struct pt_regs *regs, unsigned long error_code) > { > do_error_trap(regs, error_code, "invalid TSS", X86_TRAP_TS, SIGSEGV, 0, NULL); > } > > Anything else than @regs (already unpoisoned) and @error_code is compile > time constant. > > do_error_trap() is in the same compilation unit, static and > instrumentable: > > static void do_error_trap(struct pt_regs *regs, long error_code, char *str, > unsigned long trapnr, int signr, int sicode, void __user *addr) > { > RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU"); > > if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) != > NOTIFY_STOP) { > cond_local_irq_enable(regs); > do_trap(trapnr, signr, str, regs, error_code, sicode, addr); > cond_local_irq_disable(regs); > } > } > > That should suffer from the very same problem vs. _all_ arguments > except @regs, right? > > But no, it does not. The same compilers which emit imaginary initialized > checks for something they out of lined themself despite knowing that the > only calling context is not instrumentable, ignore the very same for > do_error_trap(). > > According to the disassembly do_error_trap() just goes and invokes > notify_die() without any checks and notify_die() takes the same > (reordered) arguments uninspected and stores them into a on stack data > struct which is properly poisened first and then unpoisoned for each > member stored. Let's look at the compilation result with `-S -emit-llvm` - it is generally easier to comprehend and gives better understanding of what's going on with the instrumentation: ; Function Attrs: fn_ret_thunk_extern noredzone nounwind null_pointer_is_valid sanitize_memory define internal fastcc void @do_error_trap(ptr noundef %regs, i64 noundef %error_code, ptr noundef %str, i64 noundef range(i64 0, 13) %trapnr, i32 noundef range(i32 4, 12) %signr, i32 noundef range(i32 0, 3) %sicode, ptr noundef %addr) unnamed_addr #9 align 16 !dbg !10395 { entry: %0 = call ptr @__msan_get_context_state() #19, !dbg !11994 %retval_shadow = getelementptr i8, ptr %0, i64 800, !dbg !11994 %param_origin = getelementptr i8, ptr %0, i64 3208, !dbg !11994 %retval_origin = getelementptr i8, ptr %0, i64 4008, !dbg !11994 #dbg_value(ptr %regs, !10394, !DIExpression(), !11995) #dbg_value(i64 %error_code, !10399, !DIExpression(), !11995) #dbg_value(ptr %str, !10400, !DIExpression(), !11995) #dbg_value(i64 %trapnr, !10401, !DIExpression(), !11995) #dbg_value(i32 %signr, !10402, !DIExpression(), !11995) #dbg_value(i32 %sicode, !10403, !DIExpression(), !11995) #dbg_value(ptr %addr, !10404, !DIExpression(), !11995) call void @__sanitizer_cov_trace_pc() #20, !dbg !11994 %conv = trunc nuw nsw i64 %trapnr to i32, !dbg !11996 store i32 0, ptr %retval_shadow, align 8, !dbg !11997 %call = call i32 @notify_die(i32 noundef 8, ptr noundef %str, ptr noundef %regs, i64 noundef %error_code, i32 noundef %conv, i32 noundef %signr) #21, !dbg !11997 ... There are no checks for `@notify_die` arguments because `@do_error_trap` receives them as `noundef`. If eager checks are enabled, KMSAN pass knows that these arguments were already checked previously (see https://www.llvm.org/doxygen/MemorySanitizer_8cpp_source.html#l02149), so it omits the checks. `@do_error_trap` is not instrumented, but even if it were, the check for `%error_code` would have been pushed to its caller because of the `noundef` attribute. ; Function Attrs: disable_sanitizer_instrumentation fn_ret_thunk_extern noinline noprofile noredzone nosanitize_coverage nounwind null_pointer_is_valid define dso_local void @exc_invalid_tss(ptr noundef %regs, i64 noundef %error_code) local_unnamed_addr #4 section ".noinstr.text" align 16 !dbg !10467 { entry: #dbg_value(ptr %regs, !10469, !DIExpression(), !10472) #dbg_value(i64 %error_code, !10470, !DIExpression(), !10472) %call = call i8 @irqentry_enter(ptr noundef %regs) #21, !dbg !10473 #dbg_value(i8 %call, !10471, !DIExpression(), !10472) call void asm sideeffect "801: nop\0A\09.pushsection .discard.annotate_insn, \22M\22, @progbits, 8; .long 801b - ., 3; .popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 801) #19, !dbg !10474, !srcloc !10477 #dbg_value(ptr %regs, !10478, !DIExpression(), !10482) #dbg_value(i64 %error_code, !10481, !DIExpression(), !10482) call fastcc void @do_error_trap(ptr noundef %regs, i64 noundef %error_code, ptr noundef nonnull @.str.15, i64 noundef 10, i32 noundef 11, i32 noundef 0, ptr noundef null) #22, !dbg !10485 call void asm sideeffect "802: nop\0A\09.pushsection .discard.annotate_insn, \22M\22, @progbits, 8; .long 802b - ., 4; .popsection", "i,~{dirflag},~{fpsr},~{flags}"(i32 802) #19, !dbg !10486, !srcloc !10489 call void @irqentry_exit(ptr noundef %regs, i8 %call) #21, !dbg !10490 ret void, !dbg !10473 } So, if a `noundef` value is passed along a chain of function calls, it is checked only once, at the point where we don't yet know that it is `noundef`. As mentioned, for `noundef` arguments, no shadow is stored in the TLS. Thus, even if non-instrumented functions are along the way, their callees still assume that the `noundef` argument is initialized (which may introduce false negatives, but these are inevitable when non-instrumented code is present). To achieve the same behavior for the function argument with TLS shadow propagation, we'll indeed need to enable instrumentation within the instrumentation_begin()/instrumentation_end() region. I think doing so will resolve the issues with instrumented functions being called from non-instrumented functions in the same KMSAN context, and will not require the call_instr() magic you are suggesting. So let me finally take a stab at it. Note that the issue with incorrect context tracking in [soft]irqs is orthogonal to this and will need to be addressed separately. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2026-06-02 11:54 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-21 21:37 [syzbot] [kernel?] upstream test error: KMSAN: uninit-value in irqentry_exit_to_kernel_mode_preempt syzbot 2026-04-29 6:29 ` Dmitry Vyukov 2026-04-29 8:09 ` Alexander Potapenko 2026-05-11 12:25 ` Thomas Gleixner 2026-05-12 9:33 ` Alexander Potapenko 2026-05-12 11:15 ` Mark Rutland 2026-05-12 16:24 ` Alexander Potapenko 2026-05-12 17:46 ` Mark Rutland 2026-05-22 6:26 ` Alexander Potapenko 2026-06-02 11:53 ` Mark Rutland 2026-05-13 0:36 ` Thomas Gleixner 2026-05-13 7:54 ` Peter Zijlstra 2026-05-27 12:39 ` Alexander Potapenko
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.