public inbox for linux-block@vger.kernel.org
* [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Yi Zhang @ 2024-05-28  2:23 UTC (permalink / raw)
  To: linux-block, iommu; +Cc: Shinichiro Kawasaki, joro, suravee.suthikulpanit

Hello
I found this regression panic on the latest 6.10-rc1; it cannot be
reproduced on 6.9. Please help check, and let me know if you need any
info or testing for it. Thanks.

reproducer
# cat config
TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
# ./check block/008
block/008 => nvme0n1 (do IO while hotplugging CPUs)
    read iops  131813   ...
    runtime    32.097s  ...
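
block/008 exercises IO while CPUs are hot-plugged. As a hedged illustration of the hotplug half only (this is not the blktests implementation), the Python sketch below toggles CPUs through the standard sysfs interface /sys/devices/system/cpu/cpuN/online. The sysfs root is a parameter so the logic can be exercised against a fake directory tree, and the function names are hypothetical. Writing to the real files requires root and takes CPUs offline, so it should only be run on a test machine.

```python
import os
import random

SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def hotpluggable_cpus(root=SYSFS_CPU_ROOT):
    """CPU ids that expose an 'online' control file (cpu0 often does not)."""
    cpus = []
    for name in os.listdir(root):
        # Match cpu0, cpu1, ... but skip entries such as cpufreq or cpuidle.
        if name.startswith("cpu") and name[3:].isdigit():
            if os.path.exists(os.path.join(root, name, "online")):
                cpus.append(int(name[3:]))
    return sorted(cpus)


def set_cpu_online(cpu, online, root=SYSFS_CPU_ROOT):
    """Write '1' or '0' to cpuN/online; on a live system this needs root."""
    with open(os.path.join(root, f"cpu{cpu}", "online"), "w") as f:
        f.write("1" if online else "0")


def hotplug_cycles(iterations, root=SYSFS_CPU_ROOT, rng=None):
    """Offline and then re-online a randomly chosen CPU `iterations` times.

    Each write drives the kernel's CPU-hotplug state machine; the reported
    oops occurred in that machinery (cpuhp/4 in cpuhp_invoke_callback).
    Returns the (cpu, state) writes performed, in order.
    """
    if rng is None:
        rng = random.Random()
    cpus = hotpluggable_cpus(root)
    actions = []
    if not cpus:
        return actions
    for _ in range(iterations):
        cpu = rng.choice(cpus)
        for state in (0, 1):
            set_cpu_online(cpu, state, root)
            actions.append((cpu, state))
    return actions
```

On a disposable test box, something like hotplug_cycles(100) could be run as root while an IO workload (for example fio) targets the NVMe devices listed in TEST_DEVS.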

[  973.823246] run blktests block/008 at 2024-05-27 22:11:38
[  977.485983] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[  977.493463] BUG: unable to handle page fault for address: ffffffffb3d5e310
[  977.500334] #PF: supervisor instruction fetch in kernel mode
[  977.505992] #PF: error_code(0x0011) - permissions violation
[  977.511567] PGD 719225067 P4D 719225067 PUD 719226063 PMD 71a5ff063 PTE 8000000719d5e163
[  977.519662] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI
[  977.524541] CPU: 4 PID: 42 Comm: cpuhp/4 Not tainted 6.10.0-0.rc1.17.eln136.x86_64 #1
[  977.532366] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.13.3 09/12/2023
[  977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[  977.545329] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[  977.564076] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[  977.569301] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[  977.576433] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[  977.583567] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[  977.590698] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[  977.597833] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[  977.604963] FS:  0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[  977.613050] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  977.618795] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[  977.625927] Call Trace:
[  977.628376]  <TASK>
[  977.630480]  ? srso_return_thunk+0x5/0x5f
[  977.634491]  ? show_trace_log_lvl+0x255/0x2f0
[  977.638851]  ? show_trace_log_lvl+0x255/0x2f0
[  977.643213]  ? cpuhp_invoke_callback+0x122/0x410
[  977.647830]  ? __die_body.cold+0x8/0x12
[  977.651669]  ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[  977.656979]  ? page_fault_oops+0x146/0x160
[  977.661080]  ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[  977.666392]  ? exc_page_fault+0x152/0x160
[  977.670405]  ? asm_exc_page_fault+0x26/0x30
[  977.674590]  ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[  977.679905]  ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[  977.685215]  ? __pfx_amd_iommu_enable_faulting+0x10/0x10
[  977.690527]  cpuhp_invoke_callback+0x122/0x410
[  977.694977]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  977.699593]  cpuhp_thread_fun+0x98/0x140
[  977.703521]  smpboot_thread_fn+0xdd/0x1d0
[  977.707533]  kthread+0xd2/0x100
[  977.710677]  ? __pfx_kthread+0x10/0x10
[  977.714431]  ret_from_fork+0x34/0x50
[  977.718009]  ? __pfx_kthread+0x10/0x10
[  977.721763]  ret_from_fork_asm+0x1a/0x30
[  977.725692]  </TASK>
[  977.727879] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace netfs sunrpc vfat fat dm_multipath
ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac
edac_mce_amd dell_wmi sparse_keymap rfkill video kvm_amd dcdbas kvm
dell_smbios dell_wmi_descriptor wmi_bmof rapl mgag200 pcspkr
acpi_cpufreq i2c_algo_bit acpi_power_meter ptdma k10temp i2c_piix4
ipmi_si acpi_ipmi ipmi_devintf ipmi_msghandler fuse xfs sd_mod sg ahci
crct10dif_pclmul nvme libahci crc32_pclmul crc32c_intel mpt3sas
ghash_clmulni_intel libata nvme_core tg3 ccp nvme_auth raid_class
t10_pi scsi_transport_sas sp5100_tco wmi dm_mirror dm_region_hash
dm_log dm_mod
[  977.786224] CR2: ffffffffb3d5e310
[  977.789544] ---[ end trace 0000000000000000 ]---
[  977.883220] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
[  977.888532] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00
[  977.907277] RSP: 0018:ffffa5bd80437e58 EFLAGS: 00010246
[  977.912503] RAX: ffffffffb324bf00 RBX: ffff8f40df020820 RCX: 0000000000000000
[  977.919633] RDX: 0000000000000001 RSI: 00000000000000c0 RDI: 0000000000000004
[  977.926767] RBP: 0000000000000004 R08: ffff8f40df020848 R09: ffff8f398664ece0
[  977.933900] R10: 0000000000000000 R11: 0000000000000008 R12: 00000000000000c0
[  977.941030] R13: ffffffffb3d5e310 R14: 0000000000000000 R15: ffff8f40df020848
[  977.948163] FS:  0000000000000000(0000) GS:ffff8f40df000000(0000) knlGS:0000000000000000
[  977.956251] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  977.961995] CR2: ffffffffb3d5e310 CR3: 0000000719220000 CR4: 0000000000350ef0
[  977.969129] Kernel panic - not syncing: Fatal exception
[  977.974439] Kernel Offset: 0x30400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  978.087528] ---[ end Kernel panic - not syncing: Fatal exception ]---
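
The fault details above can be decoded mechanically. The following is a minimal Python sketch, not a kernel tool: it splits the #PF error code into the flag names used by the kernel's X86_PF_* definitions, splits the reported PTE into its x86 flag bits and physical frame address, and subtracts the printed Kernel Offset to get the link-time address that could then be looked up in this build's System.map. It assumes x86-64 paging with a 4 KiB leaf PTE (as the PTE line in the log suggests); the helper names are hypothetical.

```python
# Page-fault error code bits, named after the kernel's X86_PF_* flags.
PF_ERROR_BITS = {0: "PROT", 1: "WRITE", 2: "USER", 3: "RSVD", 4: "INSTR", 5: "PK"}

# Flag bits of a 4 KiB x86-64 PTE (bit 7 is PAT here, PS in higher levels).
PTE_BITS = {0: "PRESENT", 1: "RW", 2: "USER", 3: "PWT", 4: "PCD",
            5: "ACCESSED", 6: "DIRTY", 7: "PAT", 8: "GLOBAL", 63: "NX"}

# Bits 12..51 hold the physical address of the 4 KiB page frame.
PTE_FRAME_MASK = 0x000F_FFFF_FFFF_F000


def set_flags(value, table):
    """Names of the bits from `table` that are set in `value`, lowest bit first."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]


def decode_pf_error(code):
    return set_flags(code, PF_ERROR_BITS)


def decode_pte(pte):
    """Return (flag names, physical frame address) for a leaf PTE."""
    return set_flags(pte, PTE_BITS), pte & PTE_FRAME_MASK


def unrelocate(addr, kaslr_offset):
    """Undo the KASLR shift reported as 'Kernel Offset' in the oops."""
    return addr - kaslr_offset


if __name__ == "__main__":
    print(decode_pf_error(0x0011))  # ['PROT', 'INSTR']
    flags, frame = decode_pte(0x8000000719D5E163)
    print(flags)  # ['PRESENT', 'RW', 'ACCESSED', 'DIRTY', 'GLOBAL', 'NX']
    print(hex(frame))  # 0x719d5e000
    print(hex(unrelocate(0xFFFFFFFFB3D5E310, 0x30400000)))  # 0xffffffff8395e310
```

For this oops the error code 0x0011 decodes to an instruction fetch that hit a protection check on a present page, and the PTE for the faulting address is present and writable but has NX set, which matches the "kernel tried to execute NX-protected page" line. The decode alone does not say why execution reached that address.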

-- 
Best Regards,
  Yi Zhang



* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Joerg Roedel @ 2024-05-28  5:00 UTC (permalink / raw)
  To: Yi Zhang, Vasant Hegde
  Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Adding Vasant.

On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> Hello
> I found this regression panic on the latest 6.10-rc1; it cannot be
> reproduced on 6.9. Please help check, and let me know if you need any
> info or testing for it. Thanks.


* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Vasant Hegde @ 2024-05-28 17:30 UTC (permalink / raw)
  To: Joerg Roedel, Yi Zhang
  Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Hi Yi,


On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> Adding Vasant.
> 
> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>> Hello
>> I found this regression panic on the latest 6.10-rc1; it cannot be
>> reproduced on 6.9. Please help check, and let me know if you need any
>> info or testing for it. Thanks.

I have tried to reproduce this issue on my system. So far I am not able to
reproduce it.

Will you be able to bisect the kernel?
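
For readers unfamiliar with bisecting: git bisect narrows a regression window (here, good on v6.9 and bad on v6.10-rc1) by repeatedly testing a commit in the middle of the remaining range. The sketch below is a hypothetical, simplified model of that search over a linear list of commits; real git bisect also handles merge history and "git bisect skip", which this ignores. In practice the search would be driven with "git bisect start v6.10-rc1 v6.9" and then "git bisect good" or "git bisect bad" after each build-and-test.

```python
def first_bad(commits, is_bad):
    """Return (first bad commit, number of tests run).

    Assumes `commits` is ordered oldest to newest, the last entry is known
    bad, and the predicate is monotonic: once a commit is bad, every later
    commit is bad too (the same assumption git bisect makes).
    """
    if not commits:
        raise ValueError("empty commit range")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad, all before lo are good
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid + 1  # the regression is after mid
    return commits[lo], tests


if __name__ == "__main__":
    history = [f"c{i}" for i in range(1000)]
    culprit, _ = first_bad(history, lambda c: int(c[1:]) >= 637)
    print(culprit)  # c637
```

With about a thousand candidate commits this needs at most ten test runs, which is why bisecting stays practical even when each step requires a kernel build plus a reproducer run.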

>> [  977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10

amd_iommu_enable_faulting() just returns zero.

-Vasant




* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Vasant Hegde @ 2024-05-28 17:40 UTC (permalink / raw)
  To: Joerg Roedel, Yi Zhang
  Cc: linux-block, iommu, Shinichiro Kawasaki, suravee.suthikulpanit

Hi Yi,


On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> Hi Yi,
> 
> 
> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
>> Adding Vasant.
>>
>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>>> Hello
>>> I found this regression panic on the latest 6.10-rc1; it cannot be
>>> reproduced on 6.9. Please help check, and let me know if you need any
>>> info or testing for it. Thanks.
> 
> I have tried to reproduce this issue on my system. So far I am not able to
> reproduce it.
> 
> Will you be able to bisect the kernel?

I see that the patch below touched this code path. Can you revert it and
test again?

commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
Author: Dimitri Sivanich <sivanich@hpe.com>
Date:   Wed Apr 24 15:16:29 2024 +0800

    iommu/vt-d: Allocate DMAR fault interrupts locally

-Vasant



* Re: [bug report][regression] blktests block/008 leads to kernel panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
From: Yi Zhang @ 2024-05-29  6:16 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki,
	suravee.suthikulpanit

On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>
> Hi Yi,
>
>
> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> > Hi Yi,
> >
> >
> > On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> >> Adding Vasant.
> >>
> >> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> >>> Hello
> >>> I found this regression panic on the latest 6.10-rc1; it cannot be
> >>> reproduced on 6.9. Please help check, and let me know if you need any
> >>> info or testing for it. Thanks.
> >
> > I have tried to reproduce this issue on my system. So far I am not able to
> > reproduce it.
> >
> > Will you be able to bisect the kernel?
>
> I see that the patch below touched this code path. Can you revert it and
> test again?

Yes, the panic can no longer be reproduced after reverting this patch.

>
> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
> Author: Dimitri Sivanich <sivanich@hpe.com>
> Date:   Wed Apr 24 15:16:29 2024 +0800
>
>     iommu/vt-d: Allocate DMAR fault interrupts locally
>
> -Vasant
>


-- 
Best Regards,
  Yi Zhang



* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
  2024-05-29  6:16       ` Yi Zhang
@ 2024-05-29  6:26         ` Vasant Hegde
  2024-05-29  8:13           ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Vasant Hegde @ 2024-05-29  6:26 UTC (permalink / raw)
  To: Yi Zhang, sivanich, kevin.tian, Baolu Lu
  Cc: Joerg Roedel, linux-block, iommu, Shinichiro Kawasaki,
	suravee.suthikulpanit

Hi Yi,

+Dimitri, Lu, Tian.


On 5/29/2024 11:46 AM, Yi Zhang wrote:
> On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
>>
>> Hi Yi,
>>
>>
>> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
>>> Hi Yi,
>>>
>>>
>>> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
>>>> Adding Vasant.
>>>>
>>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
>>>>> Hello
>>>>> I found this regression panic issue on the latest 6.10-rc1 and it
>>>>> cannot be reproduced on 6.9, please help check and let me know if you
>>>>> need any info/testing for it, thanks.
>>>
>>> I have tried to reproduce this issue on my system. So far I am not able to
>>> reproduce it.
>>>
>>> Will you be able to bisect the kernel?
>>
>> I see that the patch below touched this code path. Can you revert it and
>> test again?
> 
> Yes, the panic can no longer be reproduced after reverting this patch.

Thanks for verifying. The AMD code path (amd_iommu_enable_faulting()) just returns
zero; it doesn't do anything else. I am not familiar with the cpuhp_setup_state()
code path.
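For readers who are also less familiar with the cpuhp_setup_state() path: a startup callback registered there is run once on every CPU that is already online, and then again each time a CPU comes online. That second case is what block/008 ("do IO while hotplugging CPUs") exercises. The sketch below is a hypothetical userspace model of that behaviour only; model_setup_state(), model_cpu_up(), model_cpu_down() and the other names are illustrative assumptions, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical userspace model of a single dynamic CPU hotplug state.
 * None of these names exist in the kernel; they only mimic the
 * behaviour of cpuhp_setup_state() and of a CPU coming online. */
#define MODEL_NR_CPUS 8

typedef int (*model_startup_fn)(unsigned int cpu);

static model_startup_fn registered_startup; /* saved callback pointer */
static int model_online[MODEL_NR_CPUS];     /* 1 = CPU is online */

/* Like cpuhp_setup_state(): save the callback, then run it on every CPU
 * that is already online. Returns the first non-zero result, or 0. */
static int model_setup_state(model_startup_fn startup)
{
    registered_startup = startup;
    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        if (!model_online[cpu])
            continue;
        int ret = startup(cpu);
        if (ret)
            return ret;
    }
    return 0;
}

/* A CPU going offline: no callback is invoked in this model. */
static void model_cpu_down(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    model_online[cpu] = 0;
}

/* A CPU coming back online during hotplug: the saved callback runs
 * again, possibly long after boot has finished. */
static int model_cpu_up(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    model_online[cpu] = 1;
    return registered_startup ? registered_startup(cpu) : 0;
}

/* Example callback that only records how often, and where, it ran. */
static unsigned int calls, last_cpu;

static int count_calls(unsigned int cpu)
{
    calls++;
    last_cpu = cpu;
    return 0;
}
```

In this model, registration during boot would not expose a callback that is only valid during boot; only the later model_cpu_up() call does. That is consistent with the crash coming from the cpuhp/4 thread while the test was running rather than at boot.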

@Dimitri, can you look into this issue?

-Vasant

> 
>>
>> commit d74169ceb0d2e32438946a2f1f9fc8c803304bd6
>> Author: Dimitri Sivanich <sivanich@hpe.com>
>> Date:   Wed Apr 24 15:16:29 2024 +0800
>>
>>     iommu/vt-d: Allocate DMAR fault interrupts locally
>>
>> -Vasant
>>
>>>
>>>>>
>>>>> reproducer
>>>>> # cat config
>>>>> TEST_DEVS=(/dev/nvme0n1 /dev/nvme1n1)
>>>>> # ./check block/008
>>>>> block/008 => nvme0n1 (do IO while hotplugging CPUs)
>>>>>     read iops  131813   ...
>>>>>     runtime    32.097s  ...
>>>>>
>>>>> [  973.823246] run blktests block/008 at 2024-05-27 22:11:38
>>>>> [  977.485983] kernel tried to execute NX-protected page - exploit
>>>>> attempt? (uid: 0)
>>>>> [  977.493463] BUG: unable to handle page fault for address: ffffffffb3d5e310
>>>>> [  977.500334] #PF: supervisor instruction fetch in kernel mode
>>>>> [  977.505992] #PF: error_code(0x0011) - permissions violation
>>>>> [  977.511567] PGD 719225067 P4D 719225067 PUD 719226063 PMD 71a5ff063
>>>>> PTE 8000000719d5e163
>>>>> [  977.519662] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI
>>>>> [  977.524541] CPU: 4 PID: 42 Comm: cpuhp/4 Not tainted
>>>>> 6.10.0-0.rc1.17.eln136.x86_64 #1
>>>>> [  977.532366] Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS
>>>>> 2.13.3 09/12/2023
>>>>> [  977.540017] RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
>>>
>>> amd_iommu_enable_faulting() just returns zero.
>>>
>>> -Vasant
>>>
>>
> 
> 


* RE: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
  2024-05-29  6:26         ` Vasant Hegde
@ 2024-05-29  8:13           ` Tian, Kevin
  2024-05-29 12:49             ` Dimitri Sivanich
  0 siblings, 1 reply; 8+ messages in thread
From: Tian, Kevin @ 2024-05-29  8:13 UTC (permalink / raw)
  To: Vasant Hegde, Yi Zhang, sivanich@hpe.com, Baolu Lu
  Cc: Joerg Roedel, linux-block, iommu@lists.linux.dev,
	Shinichiro Kawasaki, suravee.suthikulpanit@amd.com

> From: Vasant Hegde <vasant.hegde@amd.com>
> Sent: Wednesday, May 29, 2024 2:26 PM
> 
> Hi Yi,
> 
> +Dimitri, Lu, Tian.
> 
> 
> On 5/29/2024 11:46 AM, Yi Zhang wrote:
> > On Wed, May 29, 2024 at 1:40 AM Vasant Hegde <vasant.hegde@amd.com> wrote:
> >>
> >> Hi Yi,
> >>
> >>
> >> On 5/28/2024 11:00 PM, Vasant Hegde wrote:
> >>> Hi Yi,
> >>>
> >>>
> >>> On 5/28/2024 10:30 AM, Joerg Roedel wrote:
> >>>> Adding Vasant.
> >>>>
> >>>> On Tue, May 28, 2024 at 10:23:10AM +0800, Yi Zhang wrote:
> >>>>> Hello
> >>>>> I found this regression panic issue on the latest 6.10-rc1 and it
> >>>>> cannot be reproduced on 6.9, please help check and let me know if you
> >>>>> need any info/testing for it, thanks.
> >>>
> >>> I have tried to reproduce this issue on my system. So far I am not able to
> >>> reproduce it.
> >>>
> >>> Will you be able to bisect the kernel?
> >>
> >> I see that the patch below touched this code path. Can you revert it and
> >> test again?
> >
> > Yes, the panic can no longer be reproduced after reverting this patch.
> 
> Thanks for verifying. The AMD code path (amd_iommu_enable_faulting()) just returns
> zero; it doesn't do anything else. I am not familiar with the cpuhp_setup_state()
> code path.
> 
> @Dimitri, can you look into this issue?
> 

-int __init amd_iommu_enable_faulting(void)
+int __init amd_iommu_enable_faulting(unsigned int cpu)

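Likely it's due to '__init' not being removed: the function is now a CPU
hotplug callback, so it also runs when a CPU comes online after boot, by
which time __init text has already been freed.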
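A hedged userspace model of the failure described above: a function marked __init lives in init text that is freed after boot. A later call through the saved hotplug pointer therefore jumps into memory that is no longer executable. That fits the "kernel tried to execute NX-protected page" message and the RIP at amd_iommu_enable_faulting+0x0 in the cpuhp/4 thread. struct model_cb, the in_init_text flag, model_free_initmem() and MODEL_FAULT are hypothetical names for illustration only. The fix discussed in this thread is simply dropping __init from int amd_iommu_enable_faulting(unsigned int cpu).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model (not kernel code) of why an __init
 * hotplug callback oopses: __init text is discarded once boot completes,
 * so a later call through a saved pointer lands in freed memory. */
typedef int (*model_cb_fn)(unsigned int cpu);

struct model_cb {
    model_cb_fn fn;
    bool in_init_text;  /* true if the function was annotated __init */
};

static bool init_text_freed;  /* set when the model "frees init memory" */

#define MODEL_FAULT (-14)     /* stands in for the NX page-fault oops */

static int model_invoke(const struct model_cb *cb, unsigned int cpu)
{
    assert(cb->fn);
    if (cb->in_init_text && init_text_freed)
        return MODEL_FAULT;   /* the real kernel oopses here instead */
    return cb->fn(cpu);
}

/* Stand-in for amd_iommu_enable_faulting(), which per this thread
 * has nothing to do and just returns 0. */
static int enable_faulting(unsigned int cpu)
{
    (void)cpu;
    return 0;
}

/* Same function body; only the (modelled) section placement differs. */
static const struct model_cb buggy = { .fn = enable_faulting, .in_init_text = true };
static const struct model_cb fixed = { .fn = enable_faulting, .in_init_text = false };

static void model_free_initmem(void)
{
    init_text_freed = true;
}
```

Both variants behave identically during boot; they diverge only once init memory is gone, which is why the bug shows up under CPU hotplug testing and not at registration time.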


* Re: [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10
  2024-05-29  8:13           ` Tian, Kevin
@ 2024-05-29 12:49             ` Dimitri Sivanich
  0 siblings, 0 replies; 8+ messages in thread
From: Dimitri Sivanich @ 2024-05-29 12:49 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Vasant Hegde, Yi Zhang, Baolu Lu, Joerg Roedel, linux-block,
	iommu@lists.linux.dev, Shinichiro Kawasaki,
	suravee.suthikulpanit@amd.com

On Wed, May 29, 2024 at 08:13:42AM +0000, Tian, Kevin wrote:
> -int __init amd_iommu_enable_faulting(void)
> +int __init amd_iommu_enable_faulting(unsigned int cpu)
> 
> likely it's due to '__init' not being removed...

Yes, agreed.  I will submit a patch with this change soon.




Thread overview: 8+ messages
2024-05-28  2:23 [bug report][regression] blktests block/008 lead kerne panic at RIP: 0010:amd_iommu_enable_faulting+0x0/0x10 Yi Zhang
2024-05-28  5:00 ` Joerg Roedel
2024-05-28 17:30   ` Vasant Hegde
2024-05-28 17:40     ` Vasant Hegde
2024-05-29  6:16       ` Yi Zhang
2024-05-29  6:26         ` Vasant Hegde
2024-05-29  8:13           ` Tian, Kevin
2024-05-29 12:49             ` Dimitri Sivanich
