* [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
@ 2024-05-28 2:23 Yi Zhang
2024-05-28 5:00 ` Joerg Roedel
0 siblings, 1 reply; 8+ messages in thread
From: Yi Zhang @ 2024-05-28 2:23 UTC
To: linux-block, iommu; +Cc: Shinichiro Kawasaki, joro, suravee.suthikulpanit
Hello,
I found a regression on the latest 6.10-rc1: blktests block/008 triggers a
kernel panic, and the panic cannot be reproduced on 6.9. Please help check
and let me know if you need any more info or testing. Thanks.
reproducer
# cat config
TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
# ./check block/008
block/008 => nvme0n1 (do IO while hotplugging CPUs)
read iops 131813 ...
runtime 32.097s ...
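block/008 does I/O while CPUs are being hotplugged. As a rough illustration of the hotplug half only, the sketch below toggles one CPU through the standard sysfs `online` file. It is not the blktests implementation; the helper names (`cpu_online_path`, `set_cpu_online`, `cycle_cpu`) are made up for this example, and writing to the real sysfs tree needs root and a CPU that supports being offlined.

```python
from pathlib import Path

# Standard sysfs location of the per-CPU hotplug control files.
SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def cpu_online_path(cpu: int, root: Path = SYSFS_CPU_ROOT) -> Path:
    """Return the path of the 'online' control file for one CPU."""
    return root / f"cpu{cpu}" / "online"


def set_cpu_online(cpu: int, online: bool, root: Path = SYSFS_CPU_ROOT) -> None:
    """Write '1' (bring the CPU up) or '0' (take it down) to the control file."""
    cpu_online_path(cpu, root).write_text("1" if online else "0")


def cycle_cpu(cpu: int, rounds: int, root: Path = SYSFS_CPU_ROOT) -> None:
    """Offline and re-online one CPU 'rounds' times (hypothetical helper).

    Each re-online makes the kernel run the registered CPU hotplug
    callbacks for that CPU again.
    """
    for _ in range(rounds):
        set_cpu_online(cpu, False, root)
        set_cpu_online(cpu, True, root)
```

Calling `cycle_cpu(4, rounds=10)` as root while I/O runs on the test device would approximate the load pattern; the `root` parameter exists only so the helpers can be pointed at a scratch directory instead of the live system.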
[ 973.823246] run blktests block/008 at 2024-05-27 22:11:38
[ 977.485983] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 977.493463] BUG: unable to handle page fault for address: ffffffffb3d5e310
[ 977.500334] #PF: supervisor instruction fetch in kernel mode
[ 977.505992] #PF: error_code(0x0011) - permissions violation
[ 977.511567] PGD 719225067 P4D 719225067 PUD 719226063 PMD 71a5ff063 PTE 8000000719d5e163
[ 977.519662] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI
[ 977.524541] CPU: 4 PID: 42 Comm: cpuhp/4 Not tainted 6.10.0-0.rc1.17.eln136.x86_64 #1
[ 977.532366] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.13.3 09/12/2023
[ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[ 977.545329] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[ 977.564076] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[ 977.569301] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[ 977.576433] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[ 977.583567] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[ 977.590698] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[ 977.597833] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[ 977.604963] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[ 977.613050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 977.618795] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[ 977.625927] Call Trace:
[ 977.628376] <TASK>
[ 977.630480] ? srso_return_thunk+0x5/0x5f
[ 977.634491] ? show_trace_log_lvl+0x255/0x2f0
[ 977.638851] ? show_trace_log_lvl+0x255/0x2f0
[ 977.643213] ? cpuhp_invoke_callback+0x122/0x410
[ 977.647830] ? __die_body.cold+0x8/0x12
[ 977.651669] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.656979] ? page_fault_oops+0x146/0x160
[ 977.661080] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.666392] ? exc_page_fault+0x152/0x160
[ 977.670405] ? asm_exc_page_fault+0x26/0x30
[ 977.674590] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.679905] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.685215] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.690527] cpuhp_invoke_callback+0x122/0x410
[ 977.694977] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 977.699593] cpuhp_thread_fun+0x98/0x140
[ 977.703521] smpboot_thread_fn+0xdd/0x1d0
[ 977.707533] kthread+0xd2/0x100
[ 977.710677] ? __pfx_kthread+0x10/0x10
[ 977.714431] ret_from_fork+0x34/0x50
[ 977.718009] ? __pfx_kthread+0x10/0x10
[ 977.721763] ret_from_fork_asm+0x1a/0x30
[ 977.725692] </TASK>
[ 977.727879] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs sunrpc vfat fat dm_multipath ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd dell_wmi sparse_keymap rfkill video kvm_amd dcdbas kvm dell_smbios dell_wmi_descriptor wmi_bmof rapl mgag200 pcspkr acpi_cpufreq i2c_algo_bit acpi_power_meter ptdma k10temp i2c_piix4 ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler fuse xfs sd_mod sg ahci crct10dif_pclmul nvme libahci crc32_pclmul crc32c_intel mpt3sas ghash_clmulni_intel libata nvme_core tg3 ccp nvme_auth raid_class t10_pi scsi_transport_sas sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[ 977.786224] CR2: ffffffffb3d5e310
[ 977.789544] ---[ end trace 0000000000000000 ]---
[ 977.883220] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[ 977.888532] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[ 977.907277] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[ 977.912503] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[ 977.919633] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[ 977.926767] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[ 977.933900] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[ 977.941030] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[ 977.948163] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[ 977.956251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 977.961995] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[ 977.969129] Kernel panic - not syncing: Fatal exception
[ 977.974439] Kernel Offset: 0x30400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 978.087528] ---[ end Kernel panic - not syncing: Fatal exception ]---
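For anyone reading the oops above, the sketch below decodes three values from it: the page-fault error code (0x0011), the NX bit of the reported PTE, and the faulting address with the reported Kernel Offset subtracted so it can be compared against an unrelocated vmlinux or System.map. The bit meanings follow the x86 page-fault error code and page-table entry layouts; the function names are invented for this example, and treating the Kernel Offset as a plain subtraction is an assumption about how the KASLR shift was applied.

```python
from typing import List

# x86 page-fault error code bits: (meaning if set, meaning if clear).
PF_ERROR_BITS = {
    0: ("protection violation (page was present)", "page not present"),
    1: ("write access", "not a write"),
    2: ("user mode", "supervisor mode"),
    3: ("reserved bit set in a paging entry", None),
    4: ("instruction fetch", None),
}

PTE_NX = 1 << 63  # bit 63 of an x86-64 PTE: no-execute


def decode_pf_error_code(code: int) -> List[str]:
    """Translate an x86 #PF error code into human-readable flags."""
    flags = []
    for bit, (if_set, if_clear) in PF_ERROR_BITS.items():
        text = if_set if code & (1 << bit) else if_clear
        if text is not None:
            flags.append(text)
    return flags


def pte_is_no_execute(pte: int) -> bool:
    """True if the PTE forbids instruction fetches from its page."""
    return bool(pte & PTE_NX)


def unrelocated_address(addr: int, kernel_offset: int) -> int:
    """Subtract the KASLR shift reported as 'Kernel Offset' in the panic."""
    return addr - kernel_offset


# Values copied from the log above.
flags = decode_pf_error_code(0x0011)
nx = pte_is_no_execute(0x8000000719D5E163)
static_addr = unrelocated_address(0xFFFFFFFFB3D5E310, 0x30400000)
print(flags, nx, hex(static_addr))
```

With the values from this report, the error code decodes to a supervisor-mode instruction fetch from a present page, and the PTE has NX set, which matches the "kernel tried to execute NX-protected page" line.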
--
Best Regards,
Yi Zhang
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
2024-05-28 2:23 [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 Yi Zhang
@ 2024-05-28 5:00 ` Joerg Roedel
2024-05-28 17:30 ` Vasant Hegde
0 siblings, 1 reply; 8+ messages in thread
From: Joerg Roedel @ 2024-05-28 5:00 UTC
To: Yi Zhang, Vasant Hegde
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Adding Vasant.

On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> Hello
> I found this regression panic issue on the latest 6.10-rc1 and it
> cannot be reproduced on 6.9, please help check and let me know if you
> need any info/testing for it, thanks.
>
> reproducer
> # cat config
> TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
> # ./check block/008
> [...]
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
2024-05-28 5:00 ` Joerg Roedel
@ 2024-05-28 17:30 ` Vasant Hegde
2024-05-28 17:40 ` Vasant Hegde
0 siblings, 1 reply; 8+ messages in thread
From: Vasant Hegde @ 2024-05-28 17:30 UTC
To: Joerg Roedel, Yi Zhang
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Hi Yi,

On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> Adding Vasant.
>
> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>> Hello
>> I found this regression panic issue on the latest 6.10-rc1 and it
>> cannot be reproduced on 6.9, please help check and let me know if you
>> need any info/testing for it, thanks.

I tried to reproduce this issue on my system, but so far I have not been
able to.

Would you be able to bisect the kernel?

>> reproducer
>> # cat config
>> TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
>> # ./check block/008
>> block/008 => nvme0n1 (do IO while hotplugging CPUs)
>> [...]
>> [ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10

amd_iommu_enable_faulting() just returns zero.

-Vasant

>> [...]
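As a reading aid for the RIP line discussed above, the sketch below splits the `symbol+offset/size` notation used in the oops. The function name `parse_symbol_offset` and the regular expression are made up for this example. For this report the offset is 0, meaning the fault hit the very first byte of the function, so the function body (whatever it contains) never got to run; that reading is an inference from the log, not something stated in the thread.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in kernel oops output.
_SYMBOL_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_symbol_offset(text: str):
    """Return (symbol, offset, size) for the first symbol+off/size in text."""
    match = _SYMBOL_RE.search(text)
    if match is None:
        raise ValueError(f"no symbol+offset/size found in {text!r}")
    return (
        match["symbol"],
        int(match["offset"], 16),
        int(match["size"], 16),
    )


symbol, offset, size = parse_symbol_offset(
    "RIP: 0010:amd_iommu_enable_faulting+0x0/0x10"
)
# Offset 0 of a 16-byte function: the instruction fetch at the entry
# point itself faulted.
print(symbol, offset, size)
```

The same helper works on call-trace entries such as `cpuhp_invoke_callback+0x122/0x410`, which is the caller that jumped to the faulting address.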
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
2024-05-28 17:30 ` Vasant Hegde
@ 2024-05-28 17:40 ` Vasant Hegde
2024-05-29 6:16 ` Yi Zhang
0 siblings, 1 reply; 8+ messages in thread
From: Vasant Hegde @ 2024-05-28 17:40 UTC
To: Joerg Roedel, Yi Zhang
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Hi Yi,

On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> Hi Yi,
>
> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
>> Adding Vasant.
>>
>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>>> Hello
>>> I found this regression panic issue on the latest 6.10-rc1 and it
>>> cannot be reproduced on 6.9, please help check and let me know if you
>>> need any info/testing for it, thanks.
>
> I have tried to reproduce this issue on my system. So far I am not able to
> reproduce it.
>
> Will you be able to bisect the kernel?

I see that the patch below touched this code path. Can you revert it and
test again?

commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
Author: Dimitri Sivanich <sivanich@hpe.com>
Date: Wed Apr 24 15:16:29 2024 +0800

    iommu/vt-d: Allocate DMAR fault interrupts locally

-Vasant

>>> [...]
>>> [ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
>
> amd_iommu_enable_faulting() just returns zero.
>
> -Vasant
>
>>> [...]
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
2024-05-28 17:40 ` Vasant Hegde
@ 2024-05-29 6:16 ` Yi Zhang
2024-05-29 6:26 ` Vasant Hegde
0 siblings, 1 reply; 8+ messages in thread
From: Yi Zhang @ 2024-05-29 6:16 UTC
To: Vasant Hegde
Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Hi Yi,
>
> [...]
>
> I see that below patch touched this code path. Can you revert below patch and
> test it again?

Yes, the panic can no longer be reproduced after reverting this patch.

> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
> Author: Dimitri Sivanich <sivanich@hpe.com>
> Date: Wed Apr 24 15:16:29 2024 +0800
>
>     iommu/vt-d: Allocate DMAR fault interrupts locally
>
> -Vasant
>
> [...]

--
Best Regards,
Yi Zhang
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
2024-05-29 6:16 ` Yi Zhang
@ 2024-05-29 6:26 ` Vasant Hegde
2024-05-29 8:13 ` Tian, Kevin
0 siblings, 1 reply; 8+ messages in thread
From: Vasant Hegde @ 2024-05-29 6:26 UTC
To: Yi Zhang, sivanich, kevin.tian, Baolu Lu
Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Hi Yi,

+Dimitri, Lu, Tian.

On 5/29/2024 11:46 AM, Yi Zhang wrote:
> On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>> [...]
>> I see that below patch touched this code path. Can you revert below patch and
>> test it again?
>
> Yes, the panic cannot be reproduced now after revert this patch.

Thanks for verifying.

The AMD code path (amd_iommu_enable_faulting()) just returns zero; it
doesn't do anything else. I am not familiar with the cpuhp_setup_state()
code path.

@Dimitri, can you look into this issue?

-Vasant

>> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
>> Author: Dimitri Sivanich <sivanich@hpe.com>
>> Date: Wed Apr 24 15:16:29 2024 +0800
>>
>>     iommu/vt-d: Allocate DMAR fault interrupts locally
>>
>> -Vasant
>>
>>>>> [...]
>>>>> [ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
>>>
>>> amd_iommu_enable_faulting() just returns zero.
>>>
>>> -Vasant
>>>
>>>>> [...]
>>>>> [ 977.625927] Call Trace:
>>>>> [ 977.628376] <TASK>
>>>>> [...]
>>>>> [ 977.690527] cpuhp_invoke_callback+0x122/0x410
>>>>> [ 977.694977] ? __pfx_smpboot_thread_fn+0x10/0x10
>>>>> [ 977.699593] cpuhp_thread_fun+0x98/0x140
>>>>> [ 977.703521] smpboot_thread_fn+0xdd/0x1d0
>>>>> [ 977.707533] kthread+0xd2/0x100
>>>>> [ 977.710677] ? __pfx_kthread+0x10/0x10
>>>>> [ 977.714431] ret_from_fork+0x34/0x50
>>>>> [ 977.718009] ?
__pfx_kthread+0x10/0x10 >>>>> [ 977.721763] ret_from_fork_asm+0x1a/0x30 >>>>> [ 977.725692] </TASK> >>>>> [ 977.727879] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 >>>>> dns_resolver nfs lockd grace netfs sunrpc vfat fat dm_multipath >>>>> ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac >>>>> edac_mce_amd dell_wmi sparse_keymap rfkill video kvm_amd dcdbas kvm >>>>> dell_smbios dell_wmi_descriptor wmi_bmof rapl mgag200 pcspkr >>>>> acpi_cpufreq i2c_algo_bit acpi_power_meter ptdma k10temp i2c_piix4 >>>>> ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler fuse xfs sd_mod sg ahci >>>>> crct10dif_pclmul nvme libahci crc32_pclmul crc32c_intel mpt3sas >>>>> ghash_clmulni_intel libata nvme_core tg3 ccp nvme_auth raid_class >>>>> t10_pi scsi_transport_sas sp5100_tco wmi dm_mirror dm_region_hash >>>>> dm_log dm_mod >>>>> [ 977.786224] CR2: ffffffffb3d5e310 >>>>> [ 977.789544] ---[ end trace 0000000000000000 ]--- >>>>> [ 977.883220] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 >>>>> [ 977.888532] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 >>>>> 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 >>>>> 00 00 >>>>> [ 977.907277] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246 >>>>> [ 977.912503] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000 >>>>> [ 977.919633] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004 >>>>> [ 977.926767] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0 >>>>> [ 977.933900] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0 >>>>> [ 977.941030] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848 >>>>> [ 977.948163] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 977.956251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 977.961995] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0 >>>>> [ 
977.969129] Kernel panic - not syncing: Fatal exception >>>>> [ 977.974439] Kernel Offset: 0x30400000 from 0xffffffff81000000 >>>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >>>>> [ 978.087528] ---[ end Kernel panic - not syncing: Fatal exception ]--- >>>>> >>>>> -- >>>>> Best Regards, >>>>> Yi Zhang >>>>> >> > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 2024-05-29 6:26 ` Vasant Hegde @ 2024-05-29 8:13 ` Tian, Kevin 2024-05-29 12:49 ` Dimitri Sivanich 0 siblings, 1 reply; 8+ messages in thread From: Tian, Kevin @ 2024-05-29 8:13 UTC (permalink / raw) To: Vasant Hegde, Yi Zhang, sivanich@hpe.com, Baolu Lu Cc: Joerg Roedel, linux-block, iommu@lists.linux.dev, Shinichiro Kawasaki, suravee.suthikulpanit@amd.com > From: Vasant Hegde <vasant.hegde@amd.com> > Sent: Wednesday, May 29, 2024 2:26 PM > > Hi Yi, > > +Dimitri, Lu, Tian. > > > On 5/29/2024 11:46 AM, Yi Zhang wrote: > > On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> > wrote: > >> > >> Hi Yi, > >> > >> > >> On 5/28/2024 11:00 PM, Vasant Hegde wrote: > >>> Hi Yi, > >>> > >>> > >>> On 5/28/2024 10:30 AM, Joerg Roedel wrote: > >>>> Adding Vasant. > >>>> > >>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote: > >>>>> Hello > >>>>> I found this regression panic issue on the latest 6.10-rc1 and it > >>>>> cannot be reproduced on 6.9, please help check and let me know if > you > >>>>> need any info/testing for it, thanks. > >>> > >>> I have tried to reproduce this issue on my system. So far I am not able to > >>> reproduce it. > >>> > >>> Will you be able to bisect the kernel? > >> > >> I see that below patch touched this code path. Can you revert below patch > and > >> test it again? > > > > Yes, the panic cannot be reproduced now after revert this patch. > > Thanks for verifying. AMD code path (amd_iommu_enable_faulting()) just > return > zero. It doesn't do anything else. I am not familiar with cpuhp_setup_state() > code path. > > @Dimitri, Can you look into this issue? > -int __init amd_iommu_enable_faulting(void) +int __init amd_iommu_enable_faulting(unsigned int cpu) likely it's due to '__init' not being removed... ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 2024-05-29 8:13 ` Tian, Kevin @ 2024-05-29 12:49 ` Dimitri Sivanich 0 siblings, 0 replies; 8+ messages in thread From: Dimitri Sivanich @ 2024-05-29 12:49 UTC (permalink / raw) To: Tian, Kevin Cc: Vasant Hegde, Yi Zhang, Baolu Lu, Joerg Roedel, linux-block, iommu@lists.linux.dev, Shinichiro Kawasaki, suravee.suthikulpanit@amd.com On Wed, May 29, 2024 at 08:13:42AM +0000, Tian, Kevin wrote: > > From: Vasant Hegde <vasant.hegde@amd.com> > > Sent: Wednesday, May 29, 2024 2:26 PM > > > > Hi Yi, > > > > +Dimitri, Lu, Tian. > > > > > > On 5/29/2024 11:46 AM, Yi Zhang wrote: > > > On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> > > wrote: > > >> > > >> Hi Yi, > > >> > > >> > > >> On 5/28/2024 11:00 PM, Vasant Hegde wrote: > > >>> Hi Yi, > > >>> > > >>> > > >>> On 5/28/2024 10:30 AM, Joerg Roedel wrote: > > >>>> Adding Vasant. > > >>>> > > >>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote: > > >>>>> Hello > > >>>>> I found this regression panic issue on the latest 6.10-rc1 and it > > >>>>> cannot be reproduced on 6.9, please help check and let me know if > > you > > >>>>> need any info/testing for it, thanks. > > >>> > > >>> I have tried to reproduce this issue on my system. So far I am not able to > > >>> reproduce it. > > >>> > > >>> Will you be able to bisect the kernel? > > >> > > >> I see that below patch touched this code path. Can you revert below patch > > and > > >> test it again? > > > > > > Yes, the panic cannot be reproduced now after revert this patch. > > > > Thanks for verifying. AMD code path (amd_iommu_enable_faulting()) just > > return > > zero. It doesn't do anything else. I am not familiar with cpuhp_setup_state() > > code path. > > > > @Dimitri, Can you look into this issue? 
> >
>
> -int __init amd_iommu_enable_faulting(void)
> +int __init amd_iommu_enable_faulting(unsigned int cpu)
>
> likely it's due to '__init' not being removed...

Yes, agreed. I will submit a patch with this change soon.

^ permalink raw reply [flat|nested] 8+ messages in thread
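For readers following the diagnosis above: a function marked '__init' is placed in the kernel's init text section, which is released once boot completes. A CPU hotplug callback that still points at such a function would then jump into freed memory on a later CPU online event, which is consistent with the "NX-protected page" fault and the all-zero Code: bytes in the oops. The userspace sketch below only models that registration pattern and is not the actual patch. '__init' is stubbed to nothing so the file builds outside the kernel, and cpuhp_cb_t, register_startup() and simulate_cpu_online() are hypothetical stand-ins, not kernel APIs.

```c
/*
 * In the kernel, __init places a function in .init.text, which is freed
 * after boot. It is stubbed out here so the sketch builds in userspace.
 */
#define __init

/* Hypothetical stand-in for a CPU hotplug startup callback type. */
typedef int (*cpuhp_cb_t)(unsigned int cpu);

/*
 * Sketch of the callback with __init dropped (assumed shape of the fix):
 * the function must stay resident because hotplug can call it at any
 * time after boot. As noted in the thread, the AMD version only returns 0.
 */
int amd_iommu_enable_faulting(unsigned int cpu)
{
	(void)cpu;	/* unused on the AMD path */
	return 0;
}

/* Hypothetical single-slot registry standing in for the hotplug state table. */
static cpuhp_cb_t startup_cb;

void register_startup(cpuhp_cb_t cb)
{
	startup_cb = cb;
}

/* Invoke the registered callback as a CPU online event would; -1 if none. */
int simulate_cpu_online(unsigned int cpu)
{
	return startup_cb ? startup_cb(cpu) : -1;
}
```

With the callback kept out of the init section, calling simulate_cpu_online(4) after register_startup(amd_iommu_enable_faulting) simply returns 0. In the failing kernel the equivalent call landed on a page that no longer held the function.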
end of thread, other threads:[~2024-05-29 13:53 UTC | newest]

Thread overview: 8+ messages
2024-05-28 2:23 [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 Yi Zhang
2024-05-28 5:00 ` Joerg Roedel
2024-05-28 17:30 ` Vasant Hegde
2024-05-28 17:40 ` Vasant Hegde
2024-05-29 6:16 ` Yi Zhang
2024-05-29 6:26 ` Vasant Hegde
2024-05-29 8:13 ` Tian, Kevin
2024-05-29 12:49 ` Dimitri Sivanich