* Re: [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
[not found] <bug-219349-41252@https.bugzilla.kernel.org/>
@ 2024-10-03 21:39 ` Bjorn Helgaas
2024-10-04 5:55 ` Marcin Mirosław
0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2024-10-03 21:39 UTC (permalink / raw)
To: Lu Baolu, Kevin Tian, Joerg Roedel; +Cc: David Woodhouse, linux-pci, iommu
On Thu, Oct 03, 2024 at 08:15:16PM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219349
>
> Summary: RIP: 0010:pci_for_each_dma_alias
> (./include/linux/pci.h:692 drivers/pci/search.c:41)
> Kernel Version: 6.12.0-rc1 8c245fe7dde3
> Regression: Yes
>
> Created attachment 306959
> --> https://bugzilla.kernel.org/attachment.cgi?id=306959&action=edit
> lspci -vv
>
> Hello,
> I see BUG: kernel NULL pointer dereference using kernel 6.12.0-rc1 (actually at
> 8c245fe7dde3 but don't know what is first bad commit).
Thanks very much for this report! You marked this as a regression;
Marcin, do you know the most recent kernel where you did not see this
issue?
> RPC: Registered tcp NFSv4.1 backchannel transport module.
> intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)#012 SUBSYSTEM=pci#012
> DEVICE=+pci:0000:00:15.0
> platform idma64.0: Adding to iommu group 12#012 SUBSYSTEM=platform#012
> DEVICE=+platform:idma64.0
> BUG: kernel NULL pointer dereference, address: 00000000000000d8
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> CPU: 0 UID: 0 PID: 430 Comm: (udev-worker) Not tainted
> 6.12.0-rc1-00113-g8c245fe7dde3 #50
> Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
> RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
> drivers/pci/search.c:41)
> Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
> e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
> e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
> All code
> ========
> 0: 00 0f add %cl,(%rdi)
> 2: 1f (bad)
> 3: 40 00 0f rex add %cl,(%rdi)
> 6: 1f (bad)
> 7: 44 00 00 add %r8b,(%rax)
> a: 41 56 push %r14
> c: 41 55 push %r13
> e: 41 54 push %r12
> 10: 49 89 d4 mov %rdx,%r12
> 13: 55 push %rbp
> 14: 48 89 f5 mov %rsi,%rbp
> 17: 53 push %rbx
> 18: e8 18 d2 ff ff call 0xffffffffffffd235
> 1d: 4c 89 e2 mov %r12,%rdx
> 20: 48 89 c3 mov %rax,%rbx
> 23: 48 8b 40 10 mov 0x10(%rax),%rax
> 27: 48 89 df mov %rbx,%rdi
> 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
> instruction
> 31: c1 e6 08 shl $0x8,%esi
> 34: 66 0b 73 38 or 0x38(%rbx),%si
> 38: 0f b7 f6 movzwl %si,%esi
> 3b: ff d5 call *%rbp
> 3d: 85 c0 test %eax,%eax
> 3f: 41 rex.B
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
> 7: c1 e6 08 shl $0x8,%esi
> a: 66 0b 73 38 or 0x38(%rbx),%si
> e: 0f b7 f6 movzwl %si,%esi
> 11: ff d5 call *%rbp
> 13: 85 c0 test %eax,%eax
> 15: 41 rex.B
> RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
> RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
> RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
> R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
> R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
> FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
> ? page_fault_oops (arch/x86/mm/fault.c:715)
> ? do_user_addr_fault (./include/linux/kprobes.h:589 (discriminator 1)
> arch/x86/mm/fault.c:1240 (discriminator 1))
> ? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:152
> kernel/locking/spinlock.c:194)
> ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37
> ./arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489
> arch/x86/mm/fault.c:1539)
> ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
> ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
> ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
> ? pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
> ? pci_for_each_dma_alias (drivers/pci/search.c:41 (discriminator 1))
> device_block_translation (drivers/iommu/intel/iommu.c:3370)
> intel_iommu_attach_device (drivers/iommu/intel/iommu.c:3635)
> __iommu_attach_device (drivers/iommu/iommu.c:2084)
> __iommu_device_set_domain (drivers/iommu/iommu.c:2272)
I suspect this path:
__iommu_device_set_domain
__iommu_attach_device
domain->ops->attach_dev
intel_iommu_attach_device
device_block_translation
domain_context_clear
if (!dev_is_pci(info->dev))
domain_context_clear_one
pci_for_each_dma_alias
and 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with
context mapping") looks a little suspicious to me since
domain_context_clear() now calls domain_context_clear_one() for
non-PCI devices, but then goes on to also use pci_for_each_dma_alias()
for ALL devices, even non-PCI ones.
But 9a16ab9d6402 appeared in v6.7-rc4, so it's been around a while.
Maybe a more recent change added non-PCI devices into the mix, so
previously we only got there with PCI devices?
> iommu_setup_default_domain (drivers/iommu/iommu.c:2326 (discriminator 2)
> drivers/iommu/iommu.c:2992 (discriminator 2))
> __iommu_probe_device (drivers/iommu/iommu.c:567)
> iommu_probe_device (drivers/iommu/iommu.c:604)
> iommu_bus_notifier (drivers/iommu/iommu.c:1668 drivers/iommu/iommu.c:1659)
> notifier_call_chain (kernel/notifier.c:95)
> blocking_notifier_call_chain (kernel/notifier.c:389 kernel/notifier.c:376)
> bus_notify (./include/linux/kobject.h:193 drivers/base/base.h:73
> drivers/base/bus.c:999)
> device_add (drivers/base/core.c:3656)
> platform_device_add (drivers/base/platform.c:717)
> mfd_add_device (drivers/mfd/mfd-core.c:274) mfd_core
> ? alloc_inode (fs/inode.c:265)
> ? make_kgid (kernel/user_namespace.c:483)
> ? inode_init_always (fs/inode.c:219)
> mfd_add_devices (drivers/mfd/mfd-core.c:329) mfd_core
> intel_lpss_probe (drivers/mfd/intel-lpss.c:443 drivers/mfd/intel-lpss.c:390)
> intel_lpss
> ? _raw_spin_lock_irqsave (./arch/x86/include/asm/paravirt.h:584
> ./arch/x86/include/asm/qspinlock.h:51 ./include/asm-generic/qspinlock.h:114
> ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:111
> kernel/locking/spinlock.c:162)
> ? pci_conf1_write (arch/x86/pci/direct.c:78)
> intel_lpss_pci_probe (drivers/mfd/intel-lpss-pci.c:80) intel_lpss_pci
> local_pci_probe (drivers/pci/pci-driver.c:325)
> pci_device_probe (drivers/pci/pci-driver.c:392 (discriminator 1)
> drivers/pci/pci-driver.c:417 (discriminator 1) drivers/pci/pci-driver.c:451
> (discriminator 1))
> really_probe (drivers/base/dd.c:581 drivers/base/dd.c:658)
> ? __device_attach_driver (drivers/base/dd.c:1157)
> __driver_probe_device (drivers/base/dd.c:800)
> driver_probe_device (drivers/base/dd.c:831)
> __driver_attach (drivers/base/dd.c:1217 drivers/base/dd.c:1156)
> bus_for_each_dev (drivers/base/bus.c:369)
> bus_add_driver (drivers/base/bus.c:676)
> driver_register (drivers/base/driver.c:247)
> ? 0xffffffffc061b000
> do_one_initcall (init/main.c:1269)
> do_init_module (kernel/module/main.c:2544)
> init_module_from_file (kernel/module/main.c:3199)
> idempotent_init_module (kernel/module/main.c:3210)
> __x64_sys_finit_module (./include/linux/file.h:68 kernel/module/main.c:3238
> kernel/module/main.c:3220 kernel/module/main.c:3220)
> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)
> arch/x86/entry/common.c:83 (discriminator 1))
> ? mntput_no_expire (fs/namespace.c:1460)
> ? terminate_walk (fs/namei.c:693 (discriminator 1))
> ? path_openat (fs/namei.c:3934)
> ? do_filp_open (fs/namei.c:3961 (discriminator 2))
> ? copy_from_kernel_nofault (mm/maccess.c:31 (discriminator 1))
> ? kmem_cache_alloc_noprof (mm/slub.c:494 mm/slub.c:539 mm/slub.c:528
> mm/slub.c:3964 mm/slub.c:4123 mm/slub.c:4142)
> ? do_sys_openat2 (fs/open.c:1424)
> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> kernel/entry/common.c:220)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> kernel/entry/common.c:220)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> kernel/entry/common.c:220)
> ? do_syscall_64 (arch/x86/entry/common.c:102)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7fe49238eb8d
> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
> c3 48 8b 0d 6b 72 0c 00 f7 d8 64 89 01 48
> All code
> ========
> 0: ff c3 inc %ebx
> 2: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 9: 00 00 00
> c: 90 nop
> d: f3 0f 1e fa endbr64
> 11: 48 89 f8 mov %rdi,%rax
> 14: 48 89 f7 mov %rsi,%rdi
> 17: 48 89 d6 mov %rdx,%rsi
> 1a: 48 89 ca mov %rcx,%rdx
> 1d: 4d 89 c2 mov %r8,%r10
> 20: 4d 89 c8 mov %r9,%r8
> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
> 28: 0f 05 syscall
> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
> trapping instruction
> 30: 73 01 jae 0x33
> 32: c3 ret
> 33: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc72a5
> 3a: f7 d8 neg %eax
> 3c: 64 89 01 mov %eax,%fs:(%rcx)
> 3f: 48 rex.W
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
> 6: 73 01 jae 0x9
> 8: c3 ret
> 9: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc727b
> 10: f7 d8 neg %eax
> 12: 64 89 01 mov %eax,%fs:(%rcx)
> 15: 48 rex.W
> RSP: 002b:00007ffdf72f16f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 000055b0dae03330 RCX: 00007fe49238eb8d
> RDX: 0000000000000004 RSI: 00007fe49252231b RDI: 000000000000000d
> RBP: 0000000000000004 R08: 000055b0dae05940 R09: 000055b0dae03e80
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe49252231b
> R13: 0000000000020000 R14: 000055b0dadd00e0 R15: 000055b0dae03330
> </TASK>
> Modules linked in: intel_lpss_pci(+) intel_lpss idma64 raid_class virt_dma
> scsi_transport_sas mfd_core sunrpc dm_mirror dm_region_hash dm_log dm_mod btrfs
> blake2b_generic
> CR2: 00000000000000d8
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
> drivers/pci/search.c:41)
> Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
> e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
> e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
> All code
> ========
> 0: 00 0f add %cl,(%rdi)
> 2: 1f (bad)
> 3: 40 00 0f rex add %cl,(%rdi)
> 6: 1f (bad)
> 7: 44 00 00 add %r8b,(%rax)
> a: 41 56 push %r14
> c: 41 55 push %r13
> e: 41 54 push %r12
> 10: 49 89 d4 mov %rdx,%r12
> 13: 55 push %rbp
> 14: 48 89 f5 mov %rsi,%rbp
> 17: 53 push %rbx
> 18: e8 18 d2 ff ff call 0xffffffffffffd235
> 1d: 4c 89 e2 mov %r12,%rdx
> 20: 48 89 c3 mov %rax,%rbx
> 23: 48 8b 40 10 mov 0x10(%rax),%rax
> 27: 48 89 df mov %rbx,%rdi
> 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
> instruction
> 31: c1 e6 08 shl $0x8,%esi
> 34: 66 0b 73 38 or 0x38(%rbx),%si
> 38: 0f b7 f6 movzwl %si,%esi
> 3b: ff d5 call *%rbp
> 3d: 85 c0 test %eax,%eax
> 3f: 41 rex.B
>
> Code starting with the faulting instruction
> 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
> 7: c1 e6 08 shl $0x8,%esi
> a: 66 0b 73 38 or 0x38(%rbx),%si
> e: 0f b7 f6 movzwl %si,%esi
> 11: ff d5 call *%rbp
> 13: 85 c0 test %eax,%eax
> 15: 41 rex.B
> RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
> RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
> RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
> R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
> R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
> FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> note: (udev-worker)[430] exited with irqs disabled
> mpt3sas version 48.100.00.00 loaded
> md/raid1: md22: active with 2 out of 2 mirrors
> md22: detected capacity change from 0 to 62912512
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
2024-10-03 21:39 ` [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41) Bjorn Helgaas
@ 2024-10-04 5:55 ` Marcin Mirosław
2024-10-07 17:20 ` Bjorn Helgaas
0 siblings, 1 reply; 4+ messages in thread
From: Marcin Mirosław @ 2024-10-04 5:55 UTC (permalink / raw)
To: Bjorn Helgaas, Lu Baolu, Kevin Tian, Joerg Roedel
Cc: David Woodhouse, linux-pci, iommu
W dniu 03.10.2024 o 23:39, Bjorn Helgaas pisze:
> On Thu, Oct 03, 2024 at 08:15:16PM +0000, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219349
>>
>> Summary: RIP: 0010:pci_for_each_dma_alias
>> (./include/linux/pci.h:692 drivers/pci/search.c:41)
>> Kernel Version: 6.12.0-rc1 8c245fe7dde3
>> Regression: Yes
>>
>> Created attachment 306959
>> --> https://bugzilla.kernel.org/attachment.cgi?id=306959&action=edit
>> lspci -vv
>>
>> Hello,
>> I see BUG: kernel NULL pointer dereference using kernel 6.12.0-rc1 (actually at
>> 8c245fe7dde3 but don't know what is first bad commit).
>
> Thanks very much for this report! You marked this as a regression;
> Marcin, do you know the most recent kernel where you did not see this
> issue?
Kernel 6.11 works correctly, I didn't narrow suspect commit yet.
>> RPC: Registered tcp NFSv4.1 backchannel transport module.
>> intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)#012 SUBSYSTEM=pci#012
>> DEVICE=+pci:0000:00:15.0
>> platform idma64.0: Adding to iommu group 12#012 SUBSYSTEM=platform#012
>> DEVICE=+platform:idma64.0
>> BUG: kernel NULL pointer dereference, address: 00000000000000d8
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0
>> Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> CPU: 0 UID: 0 PID: 430 Comm: (udev-worker) Not tainted
>> 6.12.0-rc1-00113-g8c245fe7dde3 #50
>> Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
>> RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
>> drivers/pci/search.c:41)
>> Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
>> e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
>> e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
>> All code
>> ========
>> 0: 00 0f add %cl,(%rdi)
>> 2: 1f (bad)
>> 3: 40 00 0f rex add %cl,(%rdi)
>> 6: 1f (bad)
>> 7: 44 00 00 add %r8b,(%rax)
>> a: 41 56 push %r14
>> c: 41 55 push %r13
>> e: 41 54 push %r12
>> 10: 49 89 d4 mov %rdx,%r12
>> 13: 55 push %rbp
>> 14: 48 89 f5 mov %rsi,%rbp
>> 17: 53 push %rbx
>> 18: e8 18 d2 ff ff call 0xffffffffffffd235
>> 1d: 4c 89 e2 mov %r12,%rdx
>> 20: 48 89 c3 mov %rax,%rbx
>> 23: 48 8b 40 10 mov 0x10(%rax),%rax
>> 27: 48 89 df mov %rbx,%rdi
>> 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
>> instruction
>> 31: c1 e6 08 shl $0x8,%esi
>> 34: 66 0b 73 38 or 0x38(%rbx),%si
>> 38: 0f b7 f6 movzwl %si,%esi
>> 3b: ff d5 call *%rbp
>> 3d: 85 c0 test %eax,%eax
>> 3f: 41 rex.B
>>
>> Code starting with the faulting instruction
>> ===========================================
>> 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
>> 7: c1 e6 08 shl $0x8,%esi
>> a: 66 0b 73 38 or 0x38(%rbx),%si
>> e: 0f b7 f6 movzwl %si,%esi
>> 11: ff d5 call *%rbp
>> 13: 85 c0 test %eax,%eax
>> 15: 41 rex.B
>> RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
>> RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
>> RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
>> RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
>> R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
>> R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
>> FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> <TASK>
>> ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>> ? page_fault_oops (arch/x86/mm/fault.c:715)
>> ? do_user_addr_fault (./include/linux/kprobes.h:589 (discriminator 1)
>> arch/x86/mm/fault.c:1240 (discriminator 1))
>> ? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:152
>> kernel/locking/spinlock.c:194)
>> ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37
>> ./arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489
>> arch/x86/mm/fault.c:1539)
>> ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
>> ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
>> ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
>> ? pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
>> ? pci_for_each_dma_alias (drivers/pci/search.c:41 (discriminator 1))
>> device_block_translation (drivers/iommu/intel/iommu.c:3370)
>> intel_iommu_attach_device (drivers/iommu/intel/iommu.c:3635)
>> __iommu_attach_device (drivers/iommu/iommu.c:2084)
>> __iommu_device_set_domain (drivers/iommu/iommu.c:2272)
>
> I suspect this path:
>
> __iommu_device_set_domain
> __iommu_attach_device
> domain->ops->attach_dev
> intel_iommu_attach_device
> device_block_translation
> domain_context_clear
> if (!dev_is_pci(info->dev))
> domain_context_clear_one
> pci_for_each_dma_alias
>
> and 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with
> context mapping") looks a little suspicious to me since
> domain_context_clear() now calls domain_context_clear_one() for
> non-PCI devices, but then goes on to also use pci_for_each_dma_alias()
> for ALL devices, even non-PCI ones.
>
> But 9a16ab9d6402 appeared in v6.7-rc4, so it's been around a while.
> Maybe a more recent change added non-PCI devices into the mix, so
> previously we only got there with PCI devices?
>
>> iommu_setup_default_domain (drivers/iommu/iommu.c:2326 (discriminator 2)
>> drivers/iommu/iommu.c:2992 (discriminator 2))
>> __iommu_probe_device (drivers/iommu/iommu.c:567)
>> iommu_probe_device (drivers/iommu/iommu.c:604)
>> iommu_bus_notifier (drivers/iommu/iommu.c:1668 drivers/iommu/iommu.c:1659)
>> notifier_call_chain (kernel/notifier.c:95)
>> blocking_notifier_call_chain (kernel/notifier.c:389 kernel/notifier.c:376)
>> bus_notify (./include/linux/kobject.h:193 drivers/base/base.h:73
>> drivers/base/bus.c:999)
>> device_add (drivers/base/core.c:3656)
>> platform_device_add (drivers/base/platform.c:717)
>> mfd_add_device (drivers/mfd/mfd-core.c:274) mfd_core
>> ? alloc_inode (fs/inode.c:265)
>> ? make_kgid (kernel/user_namespace.c:483)
>> ? inode_init_always (fs/inode.c:219)
>> mfd_add_devices (drivers/mfd/mfd-core.c:329) mfd_core
>> intel_lpss_probe (drivers/mfd/intel-lpss.c:443 drivers/mfd/intel-lpss.c:390)
>> intel_lpss
>> ? _raw_spin_lock_irqsave (./arch/x86/include/asm/paravirt.h:584
>> ./arch/x86/include/asm/qspinlock.h:51 ./include/asm-generic/qspinlock.h:114
>> ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:111
>> kernel/locking/spinlock.c:162)
>> ? pci_conf1_write (arch/x86/pci/direct.c:78)
>> intel_lpss_pci_probe (drivers/mfd/intel-lpss-pci.c:80) intel_lpss_pci
>> local_pci_probe (drivers/pci/pci-driver.c:325)
>> pci_device_probe (drivers/pci/pci-driver.c:392 (discriminator 1)
>> drivers/pci/pci-driver.c:417 (discriminator 1) drivers/pci/pci-driver.c:451
>> (discriminator 1))
>> really_probe (drivers/base/dd.c:581 drivers/base/dd.c:658)
>> ? __device_attach_driver (drivers/base/dd.c:1157)
>> __driver_probe_device (drivers/base/dd.c:800)
>> driver_probe_device (drivers/base/dd.c:831)
>> __driver_attach (drivers/base/dd.c:1217 drivers/base/dd.c:1156)
>> bus_for_each_dev (drivers/base/bus.c:369)
>> bus_add_driver (drivers/base/bus.c:676)
>> driver_register (drivers/base/driver.c:247)
>> ? 0xffffffffc061b000
>> do_one_initcall (init/main.c:1269)
>> do_init_module (kernel/module/main.c:2544)
>> init_module_from_file (kernel/module/main.c:3199)
>> idempotent_init_module (kernel/module/main.c:3210)
>> __x64_sys_finit_module (./include/linux/file.h:68 kernel/module/main.c:3238
>> kernel/module/main.c:3220 kernel/module/main.c:3220)
>> do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)
>> arch/x86/entry/common.c:83 (discriminator 1))
>> ? mntput_no_expire (fs/namespace.c:1460)
>> ? terminate_walk (fs/namei.c:693 (discriminator 1))
>> ? path_openat (fs/namei.c:3934)
>> ? do_filp_open (fs/namei.c:3961 (discriminator 2))
>> ? copy_from_kernel_nofault (mm/maccess.c:31 (discriminator 1))
>> ? kmem_cache_alloc_noprof (mm/slub.c:494 mm/slub.c:539 mm/slub.c:528
>> mm/slub.c:3964 mm/slub.c:4123 mm/slub.c:4142)
>> ? do_sys_openat2 (fs/open.c:1424)
>> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
>> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
>> kernel/entry/common.c:220)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
>> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
>> kernel/entry/common.c:220)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
>> ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
>> kernel/entry/common.c:220)
>> ? do_syscall_64 (arch/x86/entry/common.c:102)
>> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>> RIP: 0033:0x7fe49238eb8d
>> Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
>> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
>> c3 48 8b 0d 6b 72 0c 00 f7 d8 64 89 01 48
>> All code
>> ========
>> 0: ff c3 inc %ebx
>> 2: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
>> 9: 00 00 00
>> c: 90 nop
>> d: f3 0f 1e fa endbr64
>> 11: 48 89 f8 mov %rdi,%rax
>> 14: 48 89 f7 mov %rsi,%rdi
>> 17: 48 89 d6 mov %rdx,%rsi
>> 1a: 48 89 ca mov %rcx,%rdx
>> 1d: 4d 89 c2 mov %r8,%r10
>> 20: 4d 89 c8 mov %r9,%r8
>> 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
>> 28: 0f 05 syscall
>> 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
>> trapping instruction
>> 30: 73 01 jae 0x33
>> 32: c3 ret
>> 33: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc72a5
>> 3a: f7 d8 neg %eax
>> 3c: 64 89 01 mov %eax,%fs:(%rcx)
>> 3f: 48 rex.W
>>
>> Code starting with the faulting instruction
>> ===========================================
>> 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
>> 6: 73 01 jae 0x9
>> 8: c3 ret
>> 9: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc727b
>> 10: f7 d8 neg %eax
>> 12: 64 89 01 mov %eax,%fs:(%rcx)
>> 15: 48 rex.W
>> RSP: 002b:00007ffdf72f16f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> RAX: ffffffffffffffda RBX: 000055b0dae03330 RCX: 00007fe49238eb8d
>> RDX: 0000000000000004 RSI: 00007fe49252231b RDI: 000000000000000d
>> RBP: 0000000000000004 R08: 000055b0dae05940 R09: 000055b0dae03e80
>> R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe49252231b
>> R13: 0000000000020000 R14: 000055b0dadd00e0 R15: 000055b0dae03330
>> </TASK>
>> Modules linked in: intel_lpss_pci(+) intel_lpss idma64 raid_class virt_dma
>> scsi_transport_sas mfd_core sunrpc dm_mirror dm_region_hash dm_log dm_mod btrfs
>> blake2b_generic
>> CR2: 00000000000000d8
>> ---[ end trace 0000000000000000 ]---
>> RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
>> drivers/pci/search.c:41)
>> Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
>> e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
>> e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
>> All code
>> ========
>> 0: 00 0f add %cl,(%rdi)
>> 2: 1f (bad)
>> 3: 40 00 0f rex add %cl,(%rdi)
>> 6: 1f (bad)
>> 7: 44 00 00 add %r8b,(%rax)
>> a: 41 56 push %r14
>> c: 41 55 push %r13
>> e: 41 54 push %r12
>> 10: 49 89 d4 mov %rdx,%r12
>> 13: 55 push %rbp
>> 14: 48 89 f5 mov %rsi,%rbp
>> 17: 53 push %rbx
>> 18: e8 18 d2 ff ff call 0xffffffffffffd235
>> 1d: 4c 89 e2 mov %r12,%rdx
>> 20: 48 89 c3 mov %rax,%rbx
>> 23: 48 8b 40 10 mov 0x10(%rax),%rax
>> 27: 48 89 df mov %rbx,%rdi
>> 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
>> instruction
>> 31: c1 e6 08 shl $0x8,%esi
>> 34: 66 0b 73 38 or 0x38(%rbx),%si
>> 38: 0f b7 f6 movzwl %si,%esi
>> 3b: ff d5 call *%rbp
>> 3d: 85 c0 test %eax,%eax
>> 3f: 41 rex.B
>>
>> Code starting with the faulting instruction
>> 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
>> 7: c1 e6 08 shl $0x8,%esi
>> a: 66 0b 73 38 or 0x38(%rbx),%si
>> e: 0f b7 f6 movzwl %si,%esi
>> 11: ff d5 call *%rbp
>> 13: 85 c0 test %eax,%eax
>> 15: 41 rex.B
>> RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
>> RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
>> RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
>> RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
>> R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
>> R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
>> FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> note: (udev-worker)[430] exited with irqs disabled
>> mpt3sas version 48.100.00.00 loaded
>> md/raid1: md22: active with 2 out of 2 mirrors
>> md22: detected capacity change from 0 to 62912512
>
Marcin
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
2024-10-04 5:55 ` Marcin Mirosław
@ 2024-10-07 17:20 ` Bjorn Helgaas
2024-10-07 20:46 ` Bjorn Helgaas
0 siblings, 1 reply; 4+ messages in thread
From: Bjorn Helgaas @ 2024-10-07 17:20 UTC (permalink / raw)
To: Lu Baolu
Cc: Kevin Tian, Joerg Roedel, David Woodhouse, linux-pci, iommu,
Marcin Mirosław
[+to Lu, author of 2031c469f816]
On Fri, Oct 04, 2024 at 07:55:29AM +0200, Marcin Mirosław wrote:
> W dniu 03.10.2024 o 23:39, Bjorn Helgaas pisze:
> > On Thu, Oct 03, 2024 at 08:15:16PM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219349
> > >
> > > Summary: RIP: 0010:pci_for_each_dma_alias
> > > (./include/linux/pci.h:692 drivers/pci/search.c:41)
> > > Kernel Version: 6.12.0-rc1 8c245fe7dde3
> > > Regression: Yes
> > >
> > > Created attachment 306959
> > > --> https://bugzilla.kernel.org/attachment.cgi?id=306959&action=edit
> > > lspci -vv
> > >
> > > Hello,
> > > I see BUG: kernel NULL pointer dereference using kernel 6.12.0-rc1 (actually at
> > > 8c245fe7dde3 but don't know what is first bad commit).
> >
> > Thanks very much for this report! You marked this as a regression;
> > Marcin, do you know the most recent kernel where you did not see this
> > issue?
>
> Kernel 6.11 works correctly, I didn't narrow suspect commit yet.
Update from the bugzilla:
Marcin bisected the problem to 2031c469f816 ("iommu/vt-d: Add support
for static identity domain") and verified that reverting that commit
from v6.12-rc2 avoids the problem.
> > > RPC: Registered tcp NFSv4.1 backchannel transport module.
> > > intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)#012 SUBSYSTEM=pci#012
> > > DEVICE=+pci:0000:00:15.0
> > > platform idma64.0: Adding to iommu group 12#012 SUBSYSTEM=platform#012
> > > DEVICE=+platform:idma64.0
> > > BUG: kernel NULL pointer dereference, address: 00000000000000d8
> > > #PF: supervisor read access in kernel mode
> > > #PF: error_code(0x0000) - not-present page
> > > PGD 0 P4D 0
> > > Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > > CPU: 0 UID: 0 PID: 430 Comm: (udev-worker) Not tainted
> > > 6.12.0-rc1-00113-g8c245fe7dde3 #50
> > > Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
> > > RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
> > > drivers/pci/search.c:41)
> > > Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
> > > e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
> > > e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
> > > All code
> > > ========
> > > 0: 00 0f add %cl,(%rdi)
> > > 2: 1f (bad)
> > > 3: 40 00 0f rex add %cl,(%rdi)
> > > 6: 1f (bad)
> > > 7: 44 00 00 add %r8b,(%rax)
> > > a: 41 56 push %r14
> > > c: 41 55 push %r13
> > > e: 41 54 push %r12
> > > 10: 49 89 d4 mov %rdx,%r12
> > > 13: 55 push %rbp
> > > 14: 48 89 f5 mov %rsi,%rbp
> > > 17: 53 push %rbx
> > > 18: e8 18 d2 ff ff call 0xffffffffffffd235
> > > 1d: 4c 89 e2 mov %r12,%rdx
> > > 20: 48 89 c3 mov %rax,%rbx
> > > 23: 48 8b 40 10 mov 0x10(%rax),%rax
> > > 27: 48 89 df mov %rbx,%rdi
> > > 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
> > > instruction
> > > 31: c1 e6 08 shl $0x8,%esi
> > > 34: 66 0b 73 38 or 0x38(%rbx),%si
> > > 38: 0f b7 f6 movzwl %si,%esi
> > > 3b: ff d5 call *%rbp
> > > 3d: 85 c0 test %eax,%eax
> > > 3f: 41 rex.B
> > >
> > > Code starting with the faulting instruction
> > > ===========================================
> > > 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
> > > 7: c1 e6 08 shl $0x8,%esi
> > > a: 66 0b 73 38 or 0x38(%rbx),%si
> > > e: 0f b7 f6 movzwl %si,%esi
> > > 11: ff d5 call *%rbp
> > > 13: 85 c0 test %eax,%eax
> > > 15: 41 rex.B
> > > RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
> > > RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
> > > RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
> > > RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
> > > R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
> > > R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
> > > FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <TASK>
> > > ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
> > > ? page_fault_oops (arch/x86/mm/fault.c:715)
> > > ? do_user_addr_fault (./include/linux/kprobes.h:589 (discriminator 1)
> > > arch/x86/mm/fault.c:1240 (discriminator 1))
> > > ? _raw_spin_unlock_irqrestore (./include/linux/spinlock_api_smp.h:152
> > > kernel/locking/spinlock.c:194)
> > > ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37
> > > ./arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489
> > > arch/x86/mm/fault.c:1539)
> > > ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
> > > ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
> > > ? domain_context_clear_one (drivers/iommu/intel/iommu.c:3328)
> > > ? pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
> > > ? pci_for_each_dma_alias (drivers/pci/search.c:41 (discriminator 1))
> > > device_block_translation (drivers/iommu/intel/iommu.c:3370)
> > > intel_iommu_attach_device (drivers/iommu/intel/iommu.c:3635)
> > > __iommu_attach_device (drivers/iommu/iommu.c:2084)
> > > __iommu_device_set_domain (drivers/iommu/iommu.c:2272)
> >
> > I suspect this path:
> >
> > __iommu_device_set_domain
> > __iommu_attach_device
> > domain->ops->attach_dev
> > intel_iommu_attach_device
> > device_block_translation
> > domain_context_clear
> > if (!dev_is_pci(info->dev))
> > domain_context_clear_one
> > pci_for_each_dma_alias
> >
> > and 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with
> > context mapping") looks a little suspicious to me since
> > domain_context_clear() now calls domain_context_clear_one() for
> > non-PCI devices, but then goes on to also use pci_for_each_dma_alias()
> > for ALL devices, even non-PCI ones.
> >
> > But 9a16ab9d6402 appeared in v6.7-rc4, so it's been around a while.
> > Maybe a more recent change added non-PCI devices into the mix, so
> > previously we only got there with PCI devices?
> >
> > > iommu_setup_default_domain (drivers/iommu/iommu.c:2326 (discriminator 2)
> > > drivers/iommu/iommu.c:2992 (discriminator 2))
> > > __iommu_probe_device (drivers/iommu/iommu.c:567)
> > > iommu_probe_device (drivers/iommu/iommu.c:604)
> > > iommu_bus_notifier (drivers/iommu/iommu.c:1668 drivers/iommu/iommu.c:1659)
> > > notifier_call_chain (kernel/notifier.c:95)
> > > blocking_notifier_call_chain (kernel/notifier.c:389 kernel/notifier.c:376)
> > > bus_notify (./include/linux/kobject.h:193 drivers/base/base.h:73
> > > drivers/base/bus.c:999)
> > > device_add (drivers/base/core.c:3656)
> > > platform_device_add (drivers/base/platform.c:717)
> > > mfd_add_device (drivers/mfd/mfd-core.c:274) mfd_core
> > > ? alloc_inode (fs/inode.c:265)
> > > ? make_kgid (kernel/user_namespace.c:483)
> > > ? inode_init_always (fs/inode.c:219)
> > > mfd_add_devices (drivers/mfd/mfd-core.c:329) mfd_core
> > > intel_lpss_probe (drivers/mfd/intel-lpss.c:443 drivers/mfd/intel-lpss.c:390)
> > > intel_lpss
> > > ? _raw_spin_lock_irqsave (./arch/x86/include/asm/paravirt.h:584
> > > ./arch/x86/include/asm/qspinlock.h:51 ./include/asm-generic/qspinlock.h:114
> > > ./include/linux/spinlock.h:187 ./include/linux/spinlock_api_smp.h:111
> > > kernel/locking/spinlock.c:162)
> > > ? pci_conf1_write (arch/x86/pci/direct.c:78)
> > > intel_lpss_pci_probe (drivers/mfd/intel-lpss-pci.c:80) intel_lpss_pci
> > > local_pci_probe (drivers/pci/pci-driver.c:325)
> > > pci_device_probe (drivers/pci/pci-driver.c:392 (discriminator 1)
> > > drivers/pci/pci-driver.c:417 (discriminator 1) drivers/pci/pci-driver.c:451
> > > (discriminator 1))
> > > really_probe (drivers/base/dd.c:581 drivers/base/dd.c:658)
> > > ? __device_attach_driver (drivers/base/dd.c:1157)
> > > __driver_probe_device (drivers/base/dd.c:800)
> > > driver_probe_device (drivers/base/dd.c:831)
> > > __driver_attach (drivers/base/dd.c:1217 drivers/base/dd.c:1156)
> > > bus_for_each_dev (drivers/base/bus.c:369)
> > > bus_add_driver (drivers/base/bus.c:676)
> > > driver_register (drivers/base/driver.c:247)
> > > ? 0xffffffffc061b000
> > > do_one_initcall (init/main.c:1269)
> > > do_init_module (kernel/module/main.c:2544)
> > > init_module_from_file (kernel/module/main.c:3199)
> > > idempotent_init_module (kernel/module/main.c:3210)
> > > __x64_sys_finit_module (./include/linux/file.h:68 kernel/module/main.c:3238
> > > kernel/module/main.c:3220 kernel/module/main.c:3220)
> > > do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1)
> > > arch/x86/entry/common.c:83 (discriminator 1))
> > > ? mntput_no_expire (fs/namespace.c:1460)
> > > ? terminate_walk (fs/namei.c:693 (discriminator 1))
> > > ? path_openat (fs/namei.c:3934)
> > > ? do_filp_open (fs/namei.c:3961 (discriminator 2))
> > > ? copy_from_kernel_nofault (mm/maccess.c:31 (discriminator 1))
> > > ? kmem_cache_alloc_noprof (mm/slub.c:494 mm/slub.c:539 mm/slub.c:528
> > > mm/slub.c:3964 mm/slub.c:4123 mm/slub.c:4142)
> > > ? do_sys_openat2 (fs/open.c:1424)
> > > ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> > > ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> > > kernel/entry/common.c:220)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> > > ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> > > kernel/entry/common.c:220)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > ? syscall_exit_to_user_mode (./arch/x86/include/asm/processor.h:701
> > > ./arch/x86/include/asm/entry-common.h:100 ./include/linux/entry-common.h:364
> > > kernel/entry/common.c:220)
> > > ? do_syscall_64 (arch/x86/entry/common.c:102)
> > > entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> > > RIP: 0033:0x7fe49238eb8d
> > > Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
> > > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > c3 48 8b 0d 6b 72 0c 00 f7 d8 64 89 01 48
> > > All code
> > > ========
> > > 0: ff c3 inc %ebx
> > > 2: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> > > 9: 00 00 00
> > > c: 90 nop
> > > d: f3 0f 1e fa endbr64
> > > 11: 48 89 f8 mov %rdi,%rax
> > > 14: 48 89 f7 mov %rsi,%rdi
> > > 17: 48 89 d6 mov %rdx,%rsi
> > > 1a: 48 89 ca mov %rcx,%rdx
> > > 1d: 4d 89 c2 mov %r8,%r10
> > > 20: 4d 89 c8 mov %r9,%r8
> > > 23: 4c 8b 4c 24 08 mov 0x8(%rsp),%r9
> > > 28: 0f 05 syscall
> > > 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <--
> > > trapping instruction
> > > 30: 73 01 jae 0x33
> > > 32: c3 ret
> > > 33: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc72a5
> > > 3a: f7 d8 neg %eax
> > > 3c: 64 89 01 mov %eax,%fs:(%rcx)
> > > 3f: 48 rex.W
> > >
> > > Code starting with the faulting instruction
> > > ===========================================
> > > 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
> > > 6: 73 01 jae 0x9
> > > 8: c3 ret
> > > 9: 48 8b 0d 6b 72 0c 00 mov 0xc726b(%rip),%rcx # 0xc727b
> > > 10: f7 d8 neg %eax
> > > 12: 64 89 01 mov %eax,%fs:(%rcx)
> > > 15: 48 rex.W
> > > RSP: 002b:00007ffdf72f16f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > > RAX: ffffffffffffffda RBX: 000055b0dae03330 RCX: 00007fe49238eb8d
> > > RDX: 0000000000000004 RSI: 00007fe49252231b RDI: 000000000000000d
> > > RBP: 0000000000000004 R08: 000055b0dae05940 R09: 000055b0dae03e80
> > > R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe49252231b
> > > R13: 0000000000020000 R14: 000055b0dadd00e0 R15: 000055b0dae03330
> > > </TASK>
> > > Modules linked in: intel_lpss_pci(+) intel_lpss idma64 raid_class virt_dma
> > > scsi_transport_sas mfd_core sunrpc dm_mirror dm_region_hash dm_log dm_mod btrfs
> > > blake2b_generic
> > > CR2: 00000000000000d8
> > > ---[ end trace 0000000000000000 ]---
> > > RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692
> > > drivers/pci/search.c:41)
> > > Code: 00 0f 1f 40 00 0f 1f 44 00 00 41 56 41 55 41 54 49 89 d4 55 48 89 f5 53
> > > e8 18 d2 ff ff 4c 89 e2 48 89 c3 48 8b 40 10 48 89 df <0f> b6 b0 d8 00 00 00 c1
> > > e6 08 66 0b 73 38 0f b7 f6 ff d5 85 c0 41
> > > All code
> > > ========
> > > 0: 00 0f add %cl,(%rdi)
> > > 2: 1f (bad)
> > > 3: 40 00 0f rex add %cl,(%rdi)
> > > 6: 1f (bad)
> > > 7: 44 00 00 add %r8b,(%rax)
> > > a: 41 56 push %r14
> > > c: 41 55 push %r13
> > > e: 41 54 push %r12
> > > 10: 49 89 d4 mov %rdx,%r12
> > > 13: 55 push %rbp
> > > 14: 48 89 f5 mov %rsi,%rbp
> > > 17: 53 push %rbx
> > > 18: e8 18 d2 ff ff call 0xffffffffffffd235
> > > 1d: 4c 89 e2 mov %r12,%rdx
> > > 20: 48 89 c3 mov %rax,%rbx
> > > 23: 48 8b 40 10 mov 0x10(%rax),%rax
> > > 27: 48 89 df mov %rbx,%rdi
> > > 2a:* 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi <-- trapping
> > > instruction
> > > 31: c1 e6 08 shl $0x8,%esi
> > > 34: 66 0b 73 38 or 0x38(%rbx),%si
> > > 38: 0f b7 f6 movzwl %si,%esi
> > > 3b: ff d5 call *%rbp
> > > 3d: 85 c0 test %eax,%eax
> > > 3f: 41 rex.B
> > >
> > > Code starting with the faulting instruction
> > > 0: 0f b6 b0 d8 00 00 00 movzbl 0xd8(%rax),%esi
> > > 7: c1 e6 08 shl $0x8,%esi
> > > a: 66 0b 73 38 or 0x38(%rbx),%si
> > > e: 0f b7 f6 movzwl %si,%esi
> > > 11: ff d5 call *%rbp
> > > 13: 85 c0 test %eax,%eax
> > > 15: 41 rex.B
> > > RSP: 0018:ffffbff6006bb3a8 EFLAGS: 00010296
> > > RAX: 0000000000000000 RBX: ffff9dd828880348 RCX: 0000000000000000
> > > RDX: ffff9dd80f838ea0 RSI: ffffffffac8045b0 RDI: ffff9dd828880348
> > > RBP: ffffffffac8045b0 R08: 0000000000000000 R09: 0000000200000025
> > > R10: 000000000000007c R11: 00000000000001f4 R12: ffff9dd80f838ea0
> > > R13: ffff9dd826ab9700 R14: 0000000000000001 R15: 0000000000000001
> > > FS: 00007fe4921ab5c0(0000) GS:ffff9ddb5c600000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000000000d8 CR3: 000000010fa9b004 CR4: 00000000003706f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > note: (udev-worker)[430] exited with irqs disabled
> > > mpt3sas version 48.100.00.00 loaded
> > > md/raid1: md22: active with 2 out of 2 mirrors
> > > md22: detected capacity change from 0 to 62912512
> >
>
>
> Marcin
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41)
2024-10-07 17:20 ` Bjorn Helgaas
@ 2024-10-07 20:46 ` Bjorn Helgaas
0 siblings, 0 replies; 4+ messages in thread
From: Bjorn Helgaas @ 2024-10-07 20:46 UTC (permalink / raw)
To: Lu Baolu
Cc: Kevin Tian, Joerg Roedel, David Woodhouse, linux-pci, iommu,
Marcin Mirosław
On Mon, Oct 07, 2024 at 12:20:01PM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2024 at 07:55:29AM +0200, Marcin Mirosław wrote:
> > W dniu 03.10.2024 o 23:39, Bjorn Helgaas pisze:
> > > On Thu, Oct 03, 2024 at 08:15:16PM +0000, bugzilla-daemon@kernel.org wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=219349
> > > >
> > > > Summary: RIP: 0010:pci_for_each_dma_alias
> > > > (./include/linux/pci.h:692 drivers/pci/search.c:41)
> > > > Kernel Version: 6.12.0-rc1 8c245fe7dde3
> > > > Regression: Yes
> > > >
> > > > Created attachment 306959
> > > > --> https://bugzilla.kernel.org/attachment.cgi?id=306959&action=edit
> > > > lspci -vv
> > > >
> > > > Hello,
> > > > I see BUG: kernel NULL pointer dereference using kernel 6.12.0-rc1 (actually at
> > > > 8c245fe7dde3 but don't know what is first bad commit).
> > >
> > > Thanks very much for this report! You marked this as a regression;
> > > Marcin, do you know the most recent kernel where you did not see this
> > > issue?
> >
> > Kernel 6.11 works correctly, I didn't narrow suspect commit yet.
>
> Update from the bugzilla:
>
> Marcin bisected the problem to 2031c469f816 ("iommu/vt-d: Add support
> for static identity domain") and verified that reverting that commit
> from v6.12-rc2 avoids the problem.
#regzbot introduced: 2031c469f816 ("iommu/vt-d: Add support for static identity domain")
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-10-07 20:46 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <bug-219349-41252@https.bugzilla.kernel.org/>
2024-10-03 21:39 ` [Bug 219349] New: RIP: 0010:pci_for_each_dma_alias (./include/linux/pci.h:692 drivers/pci/search.c:41) Bjorn Helgaas
2024-10-04 5:55 ` Marcin Mirosław
2024-10-07 17:20 ` Bjorn Helgaas
2024-10-07 20:46 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).