* [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Yi Zhang @ 2024-05-28 2:23 UTC
To: linux-block, iommu; +Cc: Shinichiro Kawasaki, joro, suravee.suthikulpanit
Hello,
I hit this panic on the latest 6.10-rc1; it cannot be reproduced on 6.9.
Please help check, and let me know if you need any more info or testing.
Thanks.
Reproducer:
# cat config
TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
# ./check block/008
block/008 => nvme0n1 (do IO while hotplugging CPUs)
read iops 131813 ...
runtime 32.097s ...
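block/008 runs I/O while CPUs are hotplugged. blktests cases are shell scripts, so the C sketch below is not the test's code; it only illustrates, with hypothetical helper names, the standard sysfs interface for CPU hotplug: writing 0 or 1 to /sys/devices/system/cpu/cpuN/online takes CPU N offline or brings it back online. Bringing a CPU online runs the registered CPU-hotplug startup callbacks for that CPU, which is the context (the cpuhp/4 kernel thread) that appears in the oops below.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helpers (not blktests code) around the kernel's CPU-hotplug
 * sysfs knob /sys/devices/system/cpu/cpuN/online.
 */

/* Build "/sys/devices/system/cpu/cpu<N>/online"; 0 on success, -1 if buf is too small. */
int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write "0" (offline) or "1" (online) to the knob. Needs root, and not every
 * CPU can be offlined. Onlining a CPU makes the kernel run the registered
 * CPU-hotplug startup callbacks for it. Returns 0 or a negative errno.
 */
int set_cpu_online(unsigned int cpu, int online)
{
    char path[64];
    FILE *f;

    if (cpu_online_path(path, sizeof(path), cpu) != 0)
        return -ENAMETOOLONG;
    f = fopen(path, "w");
    if (!f)
        return -errno;
    if (fputs(online ? "1" : "0", f) == EOF) {
        int err = errno;

        fclose(f);
        return -err;
    }
    /* stdio buffers the write; a rejected sysfs write surfaces here */
    if (fclose(f) != 0)
        return -errno;
    return 0;
}
```

For example, set_cpu_online(4, 0) followed by set_cpu_online(4, 1), run as root, offlines and then re-onlines CPU 4.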
[ 973.823246] run blktests block/008 at 2024-05-27 22:11:38
[ 977.485983] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[ 977.493463] BUG: unable to handle page fault for address: ffffffffb3d5e310
[ 977.500334] #PF: supervisor instruction fetch in kernel mode
[ 977.505992] #PF: error_code(0x0011) - permissions violation
[ 977.511567] PGD 719225067 P4D 719225067 PUD 719226063 PMD 71a5ff063 PTE 8000000719d5e163
[ 977.519662] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI
[ 977.524541] CPU: 4 PID: 42 Comm: cpuhp/4 Not tainted 6.10.0-0.rc1.17.eln136.x86_64 #1
[ 977.532366] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.13.3 09/12/2023
[ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[ 977.545329] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[ 977.564076] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[ 977.569301] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[ 977.576433] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[ 977.583567] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[ 977.590698] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[ 977.597833] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[ 977.604963] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[ 977.613050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 977.618795] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[ 977.625927] Call Trace:
[ 977.628376] <TASK>
[ 977.630480] ? srso_return_thunk+0x5/0x5f
[ 977.634491] ? show_trace_log_lvl+0x255/0x2f0
[ 977.638851] ? show_trace_log_lvl+0x255/0x2f0
[ 977.643213] ? cpuhp_invoke_callback+0x122/0x410
[ 977.647830] ? __die_body.cold+0x8/0x12
[ 977.651669] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.656979] ? page_fault_oops+0x146/0x160
[ 977.661080] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.666392] ? exc_page_fault+0x152/0x160
[ 977.670405] ? asm_exc_page_fault+0x26/0x30
[ 977.674590] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.679905] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.685215] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[ 977.690527] cpuhp_invoke_callback+0x122/0x410
[ 977.694977] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 977.699593] cpuhp_thread_fun+0x98/0x140
[ 977.703521] smpboot_thread_fn+0xdd/0x1d0
[ 977.707533] kthread+0xd2/0x100
[ 977.710677] ? __pfx_kthread+0x10/0x10
[ 977.714431] ret_from_fork+0x34/0x50
[ 977.718009] ? __pfx_kthread+0x10/0x10
[ 977.721763] ret_from_fork_asm+0x1a/0x30
[ 977.725692] </TASK>
[ 977.727879] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs sunrpc vfat fat dm_multipath ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd dell_wmi sparse_keymap rfkill video kvm_amd dcdbas kvm dell_smbios dell_wmi_descriptor wmi_bmof rapl mgag200 pcspkr acpi_cpufreq i2c_algo_bit acpi_power_meter ptdma k10temp i2c_piix4 ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler fuse xfs sd_mod sg ahci crct10dif_pclmul nvme libahci crc32_pclmul crc32c_intel mpt3sas ghash_clmulni_intel libata nvme_core tg3 ccp nvme_auth raid_class t10_pi scsi_transport_sas sp5100_tco wmi dm_mirror dm_region_hash dm_log dm_mod
[ 977.786224] CR2: ffffffffb3d5e310
[ 977.789544] ---[ end trace 0000000000000000 ]---
[ 977.883220] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[ 977.888532] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[ 977.907277] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[ 977.912503] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[ 977.919633] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[ 977.926767] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[ 977.933900] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[ 977.941030] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[ 977.948163] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[ 977.956251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 977.961995] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[ 977.969129] Kernel panic - not syncing: Fatal exception
[ 977.974439] Kernel Offset: 0x30400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 978.087528] ---[ end Kernel panic - not syncing: Fatal exception ]---
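The two #PF lines and the PTE value above can be decoded by hand. The C sketch below uses hypothetical helper names; the bit positions follow the x86 page-fault error code and the x86-64 page-table entry layout (Intel SDM Vol. 3A). It shows that error_code 0x0011 is a supervisor-mode instruction fetch (bit 4) that hit a present page (bit 0) and was refused as a permissions violation, and that the leaf PTE 8000000719d5e163 for the faulting address is present and writable with the NX bit (bit 63) set, i.e. that page is mapped as non-executable data.

```c
#include <stdint.h>

/* x86 #PF error-code bits (Intel SDM Vol. 3A, page-fault exception). */
#define PF_PROT  (1ul << 0)  /* 1 = protection violation on a present page, 0 = not present */
#define PF_WRITE (1ul << 1)  /* 1 = write access, 0 = read access */
#define PF_USER  (1ul << 2)  /* 1 = user-mode access, 0 = supervisor access */
#define PF_RSVD  (1ul << 3)  /* 1 = reserved bit set in a paging-structure entry */
#define PF_INSTR (1ul << 4)  /* 1 = fault caused by an instruction fetch */
#define PF_PK    (1ul << 5)  /* 1 = protection-key violation */

/* x86-64 leaf page-table-entry bits. */
#define PTE_PRESENT (1ull << 0)
#define PTE_RW      (1ull << 1)
#define PTE_NX      (1ull << 63) /* execute-disable */

enum pf_access { PF_ACCESS_READ, PF_ACCESS_WRITE, PF_ACCESS_FETCH };
enum pf_reason { PF_NOT_PRESENT, PF_RESERVED_BIT, PF_PKEY, PF_PERMISSIONS };

/* Kind of access that faulted: read, write or instruction fetch. */
enum pf_access pf_access_type(unsigned long ec)
{
    if (ec & PF_INSTR)
        return PF_ACCESS_FETCH;
    return (ec & PF_WRITE) ? PF_ACCESS_WRITE : PF_ACCESS_READ;
}

/* Why it faulted, checked in the order the "error_code(...) - ..." text implies. */
enum pf_reason pf_fault_reason(unsigned long ec)
{
    if (!(ec & PF_PROT))
        return PF_NOT_PRESENT;
    if (ec & PF_RSVD)
        return PF_RESERVED_BIT;
    if (ec & PF_PK)
        return PF_PKEY;
    return PF_PERMISSIONS;
}

/* 1 for a user-mode access, 0 for a supervisor (kernel) access. */
int pf_is_user(unsigned long ec) { return (ec & PF_USER) != 0; }

int pte_present(uint64_t pte)  { return (pte & PTE_PRESENT) != 0; }
int pte_writable(uint64_t pte) { return (pte & PTE_RW) != 0; }
int pte_nx(uint64_t pte)       { return (pte & PTE_NX) != 0; }
```

For the values in this oops, pf_access_type(0x11) is PF_ACCESS_FETCH, pf_is_user(0x11) is 0 and pf_fault_reason(0x11) is PF_PERMISSIONS, matching the two #PF lines; pte_nx(0x8000000719d5e163) is 1.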
--
Best Regards,
Yi Zhang
* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Joerg Roedel @ 2024-05-28 5:00 UTC
To: Yi Zhang, Vasant Hegde
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit
Adding Vasant.
On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> Hello
> I found this regression panic issue on the latest 6.10-rc1 and it
> cannot be reproduced on 6.9, please help check and let me know if you
> need any info/testing for it, thanks.
* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Vasant Hegde @ 2024-05-28 17:30 UTC
To: Joerg Roedel, Yi Zhang
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit
Hi Yi,
On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> Adding Vasant.
>
> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>> Hello
>> I found this regression panic issue on the latest 6.10-rc1 and it
>> cannot be reproduced on 6.9, please help check and let me know if you
>> need any info/testing for it, thanks.
I have tried to reproduce this issue on my system, but so far I have not been
able to.
Would you be able to bisect the kernel?
>> [ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
amd_iommu_enable_faulting() just returns zero.
-Vasant
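This observation fits two details of the RIP line. `+0x0/0x10` means the faulting instruction is at offset 0 of a 0x10-byte function, so the fetch of its very first byte faulted and none of its body ran; for an instruction-fetch fault, CR2 (ffffffffb3d5e310) is that same address. To check such an address against the System.map of this build, subtract the KASLR "Kernel Offset" (0x30400000) printed at the end of the panic, giving ffffffff8395e310. The C sketch below only does that parsing and arithmetic; the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Split the "symbol+0xOFFSET/0xSIZE" text of an oops RIP line, e.g.
 * "amd_iommu_enable_faulting+0x0/0x10". sym must hold 128 bytes.
 * Returns 0 on success, -1 if the text does not have that shape.
 */
int parse_rip_symbol(const char *s, char sym[128],
                     unsigned long *off, unsigned long *size)
{
    return sscanf(s, "%127[^+]+0x%lx/0x%lx", sym, off, size) == 3 ? 0 : -1;
}

/*
 * With KASLR, runtime address = link-time address + "Kernel Offset".
 * Subtracting the offset gives the address to look up in the System.map
 * or vmlinux of the same build.
 */
uint64_t unrelocate(uint64_t runtime_addr, uint64_t kernel_offset)
{
    return runtime_addr - kernel_offset;
}
```

An offset of 0 from parse_rip_symbol() is what distinguishes "faulted on entry" from a crash somewhere inside the function body.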
* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Vasant Hegde @ 2024-05-28 17:40 UTC
To: Joerg Roedel, Yi Zhang
Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit
Hi Yi,
On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> Hi Yi,
>
>
> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
>> Adding Vasant.
>>
>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>>> Hello
>>> I found this regression panic issue on the latest 6.10-rc1 and it
>>> cannot be reproduced on 6.9, please help check and let me know if you
>>> need any info/testing for it, thanks.
>
> I have tried to reproduce this issue on my system. So far I am not able to
> reproduce it.
>
> Will you be able to bisect the kernel?
I see that the patch below touched this code path. Can you revert it and test
again?
commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
Author: Dimitri Sivanich <sivanich@hpe.com>
Date: Wed Apr 24 15:16:29 2024 +0800
iommu/vt-d: Allocate DMAR fault interrupts locally
-Vasant
* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Yi Zhang @ 2024-05-29 6:16 UTC
To: Vasant Hegde
Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki,
suravee.suthikulpanit
On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Hi Yi,
>
>
> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> > Hi Yi,
> >
> >
> > On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> >> Adding Vasant.
> >>
> >> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> >>> Hello
> >>> I found this regression panic issue on the latest 6.10-rc1 and it
> >>> cannot be reproduced on 6.9, please help check and let me know if you
> >>> need any info/testing for it, thanks.
> >
> > I have tried to reproduce this issue on my system. So far I am not able to
> > reproduce it.
> >
> > Will you be able to bisect the kernel?
>
> I see that below patch touched this code path. Can you revert below patch and
> test it again?
Yes, the panic can no longer be reproduced after reverting this patch.
>
> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
> Author: Dimitri Sivanich <sivanich@hpe.com>
> Date: Wed Apr 24 15:16:29 2024 +0800
>
> iommu/vt-d: Allocate DMAR fault interrupts locally
>
> -Vasant
>
--
Best Regards,
Yi Zhang
* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Vasant Hegde @ 2024-05-29 6:26 UTC
To: Yi Zhang, sivanich, kevin.tian, Baolu Lu
Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki,
suravee.suthikulpanit
Hi Yi,
+Dimitri, Lu, Tian.
On 5/29/2024 11:46 AM, Yi Zhang wrote:
> On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>>
>> Hi Yi,
>>
>>
>> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
>>> Hi Yi,
>>>
>>>
>>> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
>>>> Adding Vasant.
>>>>
>>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>>>>> Hello
>>>>> I found this regression panic issue on the latest 6.10-rc1 and it
>>>>> cannot be reproduced on 6.9, please help check and let me know if you
>>>>> need any info/testing for it, thanks.
>>>
>>> I have tried to reproduce this issue on my system. So far I am not able to
>>> reproduce it.
>>>
>>> Will you be able to bisect the kernel?
>>
>> I see that below patch touched this code path. Can you revert below patch and
>> test it again?
>
> Yes, the panic cannot be reproduced now after revert this patch.
Thanks for verifying. The AMD code path (amd_iommu_enable_faulting()) just returns
zero; it doesn't do anything else. I am not familiar with the cpuhp_setup_state()
code path.
@Dimitri, can you look into this issue?
-Vasant
>
>>
>> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
>> Author: Dimitri Sivanich <sivanich@hpe.com>
>> Date: Wed Apr 24 15:16:29 2024 +0800
>>
>> iommu/vt-d: Allocate DMAR fault interrupts locally
>>
>> -Vasant
>>
>>>
>>>>>
>>>>> reproducer
>>>>> # cat config
>>>>> TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
>>>>> # ./check block/008
>>>>> block/008 => nvme0n1 (do IO while hotplugging CPUs)
>>>>> read iops 131813 ...
>>>>> runtime 32.097s ...
>>>>>
>>>>> [ 973.823246] run blktests block/008 at 2024-05-27 22:11:38
>>>>> [ 977.485983] kernel tried to execute NX-protected page - exploit
>>>>> attempt? (uid: 0)
>>>>> [ 977.493463] BUG: unable to handle page fault for address: ffffffffb3d5e310
>>>>> [ 977.500334] #PF: supervisor instruction fetch in kernel mode
>>>>> [ 977.505992] #PF: error_code(0x0011) - permissions violation
>>>>> [ 977.511567] PGD 719225067 P4D 719225067 PUD 719226063 PMD 71a5ff063
>>>>> PTE 8000000719d5e163
>>>>> [ 977.519662] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI
>>>>> [ 977.524541] CPU: 4 PID: 42 Comm: cpuhp/4 Not tainted
>>>>> 6.10.0-0.rc1.17.eln136.x86_64 #1
>>>>> [ 977.532366] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS
>>>>> 2.13.3 09/12/2023
>>>>> [ 977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
>>>
>>> amd_iommu_enable_faulting() just returns zero.
>>>
>>> -Vasant
>>>
>>>
>>>>> [ 977.545329] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00
>>>>> 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40
>>>>> 00 00
>>>>> [ 977.564076] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
>>>>> [ 977.569301] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
>>>>> [ 977.576433] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
>>>>> [ 977.583567] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
>>>>> [ 977.590698] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
>>>>> [ 977.597833] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
>>>>> [ 977.604963] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 977.613050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 977.618795] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
>>>>> [ 977.625927] Call Trace:
>>>>> [ 977.628376] <TASK>
>>>>> [ 977.630480] ? srso_return_thunk+0x5/0x5f
>>>>> [ 977.634491] ? show_trace_log_lvl+0x255/0x2f0
>>>>> [ 977.638851] ? show_trace_log_lvl+0x255/0x2f0
>>>>> [ 977.643213] ? cpuhp_invoke_callback+0x122/0x410
>>>>> [ 977.647830] ? __die_body.cold+0x8/0x12
>>>>> [ 977.651669] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
>>>>> [ 977.656979] ? page_fault_oops+0x146/0x160
>>>>> [ 977.661080] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
>>>>> [ 977.666392] ? exc_page_fault+0x152/0x160
>>>>> [ 977.670405] ? asm_exc_page_fault+0x26/0x30
>>>>> [ 977.674590] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
>>>>> [ 977.679905] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
>>>>> [ 977.685215] ? __pfx_amd_iommu_enable_faulting+0x10/0x10
>>>>> [ 977.690527] cpuhp_invoke_callback+0x122/0x410
>>>>> [ 977.694977] ? __pfx_smpboot_thread_fn+0x10/0x10
>>>>> [ 977.699593] cpuhp_thread_fun+0x98/0x140
>>>>> [ 977.703521] smpboot_thread_fn+0xdd/0x1d0
>>>>> [ 977.707533] kthread+0xd2/0x100
>>>>> [ 977.710677] ? __pfx_kthread+0x10/0x10
>>>>> [ 977.714431] ret_from_fork+0x34/0x50
>>>>> [ 977.718009] ? __pfx_kthread+0x10/0x10
>>>>> [ 977.721763] ret_from_fork_asm+0x1a/0x30
>>>>> [ 977.725692] </TASK>
>>>>> [ 977.727879] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>> dns_resolver nfs lockd grace netfs sunrpc vfat fat dm_multipath
>>>>> ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac
>>>>> edac_mce_amd dell_wmi sparse_keymap rfkill video kvm_amd dcdbas kvm
>>>>> dell_smbios dell_wmi_descriptor wmi_bmof rapl mgag200 pcspkr
>>>>> acpi_cpufreq i2c_algo_bit acpi_power_meter ptdma k10temp i2c_piix4
>>>>> ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler fuse xfs sd_mod sg ahci
>>>>> crct10dif_pclmul nvme libahci crc32_pclmul crc32c_intel mpt3sas
>>>>> ghash_clmulni_intel libata nvme_core tg3 ccp nvme_auth raid_class
>>>>> t10_pi scsi_transport_sas sp5100_tco wmi dm_mirror dm_region_hash
>>>>> dm_log dm_mod
>>>>> [ 977.786224] CR2: ffffffffb3d5e310
>>>>> [ 977.789544] ---[ end trace 0000000000000000 ]---
>>>>> [ 977.883220] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
>>>>> [ 977.888532] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00
>>>>> 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40
>>>>> 00 00
>>>>> [ 977.907277] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
>>>>> [ 977.912503] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
>>>>> [ 977.919633] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
>>>>> [ 977.926767] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
>>>>> [ 977.933900] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
>>>>> [ 977.941030] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
>>>>> [ 977.948163] FS: 0000000000000000(0000) GS:ffff8f40df000000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 977.956251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 977.961995] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
>>>>> [ 977.969129] Kernel panic - not syncing: Fatal exception
>>>>> [ 977.974439] Kernel Offset: 0x30400000 from 0xffffffff81000000
>>>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> [ 978.087528] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>>
>>>>> --
>>>>> Best Regards,
>>>>> Yi Zhang
>>>>>
>>
>
>
* RE: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Tian, Kevin @ 2024-05-29 8:13 UTC (permalink / raw)
To: Vasant Hegde, Yi Zhang, sivanich@hpe.com, Baolu Lu
Cc: Joerg Roedel, linux-block, iommu@lists.linux.dev,
Shinichiro Kawasaki, suravee.suthikulpanit@amd.com
> From: Vasant Hegde <vasant.hegde@amd.com>
> Sent: Wednesday, May 29, 2024 2:26 PM
>
> Hi Yi,
>
> +Dimitri, Lu, Tian.
>
>
> On 5/29/2024 11:46 AM, Yi Zhang wrote:
> > On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
> >>
> >> Hi Yi,
> >>
> >>
> >> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> >>> Hi Yi,
> >>>
> >>>
> >>> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> >>>> Adding Vasant.
> >>>>
> >>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> >>>>> Hello
> >>>>> I found this regression panic issue on the latest 6.10-rc1 and it
> >>>>> cannot be reproduced on 6.9, please help check and let me know if you
> >>>>> need any info/testing for it, thanks.
> >>>
> >>> I have tried to reproduce this issue on my system. So far I am not able to
> >>> reproduce it.
> >>>
> >>> Will you be able to bisect the kernel?
> >>
> >> I see that below patch touched this code path. Can you revert below patch and
> >> test it again?
> >
> > Yes, the panic cannot be reproduced now after revert this patch.
>
> Thanks for verifying. The AMD code path (amd_iommu_enable_faulting()) just
> returns zero; it doesn't do anything else. I am not familiar with the
> cpuhp_setup_state() code path.
>
> @Dimitri, can you look into this issue?
>
-int __init amd_iommu_enable_faulting(void)
+int __init amd_iommu_enable_faulting(unsigned int cpu)
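Likely it's due to '__init' not being removed: the function is now a CPU
hotplug callback (it runs from cpuhp_invoke_callback() when a CPU comes
online), but __init code is freed once boot completes.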
* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Dimitri Sivanich @ 2024-05-29 12:49 UTC (permalink / raw)
To: Tian, Kevin
Cc: Vasant Hegde, Yi Zhang, Baolu Lu, Joerg Roedel, linux-block,
iommu@lists.linux.dev, Shinichiro Kawasaki,
suravee.suthikulpanit@amd.com
On Wed, May 29, 2024 at 08:13:42AM +0000, Tian, Kevin wrote:
> > From: Vasant Hegde <vasant.hegde@amd.com>
> > Sent: Wednesday, May 29, 2024 2:26 PM
> >
> > Hi Yi,
> >
> > +Dimitri, Lu, Tian.
> >
> >
> > On 5/29/2024 11:46 AM, Yi Zhang wrote:
> > > On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
> > >>
> > >> Hi Yi,
> > >>
> > >>
> > >> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> > >>> Hi Yi,
> > >>>
> > >>>
> > >>> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> > >>>> Adding Vasant.
> > >>>>
> > >>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> > >>>>> Hello
> > >>>>> I found this regression panic issue on the latest 6.10-rc1 and it
> > >>>>> cannot be reproduced on 6.9, please help check and let me know if you
> > >>>>> need any info/testing for it, thanks.
> > >>>
> > >>> I have tried to reproduce this issue on my system. So far I am not able to
> > >>> reproduce it.
> > >>>
> > >>> Will you be able to bisect the kernel?
> > >>
> > >> I see that below patch touched this code path. Can you revert below patch and
> > >> test it again?
> > >
> > > Yes, the panic cannot be reproduced now after revert this patch.
> >
> > Thanks for verifying. The AMD code path (amd_iommu_enable_faulting()) just
> > returns zero; it doesn't do anything else. I am not familiar with the
> > cpuhp_setup_state() code path.
> >
> > @Dimitri, can you look into this issue?
> >
>
> -int __init amd_iommu_enable_faulting(void)
> +int __init amd_iommu_enable_faulting(unsigned int cpu)
>
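> Likely it's due to '__init' not being removed: the function is now a CPU
> hotplug callback (it runs from cpuhp_invoke_callback() when a CPU comes
> online), but __init code is freed once boot completes.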
Yes, agreed. I will submit a patch with this change soon.