* BUG in amd_sfh_get_report
From: Dominique Martinet @ 2024-07-30 20:53 UTC
To: Basavaraj Natikar
Cc: Jiri Kosina, Mario Limonciello, Nehal Shah, Shyam Sundar S K,
linux-input
Hello,
I just rebooted my server this morning and was greeted by this bug:
--------------
[ 9.251535] BUG: unable to handle page fault for address: ffffffff85600000
[ 9.254214] #PF: supervisor read access in kernel mode
[ 9.257295] #PF: error_code(0x0000) - not-present page
[ 9.259928] PGD 181a25067 P4D 181a25067 PUD 181a26063 PMD 0
[ 9.259940] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 9.259945] CPU: 11 PID: 723 Comm: (udev-worker) Tainted: P O 6.6.42 #1-NixOS
[ 9.259949] Hardware name: /Default string, BIOS FP7R2_B5D_04A.45 06/14/2023
[ 9.259950] RIP: 0010:amd_sfh_get_report+0x43/0x140 [amd_sfh]
[ 9.272030] Code: 00 48 8b 68 08 8b 45 10 85 c0 0f 84 d9 00 00 00 49 89 fc 41 89 f6 41 89 d7 31 db eb 0d 48 83 c3 01 48 39 c3 0f 84 bf 00 00 00 <4c> 39 64 dd 68 75 ec 48 8b 44 24 30 48 33 05 92 d3 c7 c2 be c0 0d
[ 9.272037] RSP: 0018:ffffc90000f8fb40 EFLAGS: 00010287
[ 9.272041] RAX: 0000000048000000 RBX: 0000000000545c2d RCX: 0000000000000000
[ 9.272043] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff88812ce84000
[ 9.272045] RBP: ffffffff82bd1e30 R08: ffffc90000f8fbd8 R09: ffffc90000f8fbd8
[ 9.272046] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88812ce84000
[ 9.272047] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 9.272049] FS: 00007f7175005100(0000) GS:ffff88838ff80000(0000) knlGS:0000000000000000
[ 9.272050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.272051] CR2: ffffffff85600000 CR3: 0000000117900000 CR4: 0000000000f50ee0
[ 9.297338] PKRU: 55555554
[ 9.297345] Call Trace:
[ 9.297353] <TASK>
[ 9.297360] ? __die+0x23/0x80
[ 9.297371] ? page_fault_oops+0x171/0x500
[ 9.297376] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.297382] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.297384] ? search_bpf_extables+0x5f/0x90
[ 9.319385] ? exc_page_fault+0x158/0x160
[ 9.319397] ? asm_exc_page_fault+0x26/0x30
[ 9.319403] ? __pfx_css_release+0x10/0x10
[ 9.319417] ? amd_sfh_get_report+0x43/0x140 [amd_sfh]
[ 9.319426] amdtp_hid_request+0x3e/0x60 [amd_sfh]
[ 9.319435] sensor_hub_get_feature+0xad/0x180 [hid_sensor_hub]
[ 9.319448] hid_sensor_parse_common_attributes+0x217/0x320 [hid_sensor_iio_common]
[ 9.319457] hid_accel_3d_probe+0xb7/0x320 [hid_sensor_accel_3d]
[ 9.319463] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.319466] platform_probe+0x44/0xa0
[ 9.319474] really_probe+0x1ac/0x3f0
[ 9.319478] ? __pfx___driver_attach+0x10/0x10
[ 9.319480] __driver_probe_device+0x78/0x170
[ 9.319482] driver_probe_device+0x1f/0xa0
[ 9.319485] __driver_attach+0xea/0x1e0
[ 9.319487] bus_for_each_dev+0x8c/0xe0
[ 9.319493] bus_add_driver+0x14d/0x280
[ 9.319497] driver_register+0x5d/0x120
[ 9.319500] ? __pfx_hid_accel_3d_platform_driver_init+0x10/0x10 [hid_sensor_accel_3d]
[ 9.319504] do_one_initcall+0x5d/0x330
[ 9.319513] do_init_module+0x90/0x270
[ 9.319517] __do_sys_init_module+0x18a/0x1c0
[ 9.319520] ? srso_alias_return_thunk+0x5/0xfbef5
[ 9.319525] do_syscall_64+0x39/0x90
[ 9.319530] entry_SYSCALL_64_after_hwframe+0x78/0xe2
[ 9.319534] RIP: 0033:0x7f7174b1a61e
[ 9.319579] Code: 48 8b 0d 0d 68 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d da 67 0d 00 f7 d8 64 89 01 48
[ 9.319581] RSP: 002b:00007ffda3136658 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 9.319584] RAX: ffffffffffffffda RBX: 000055a3f7725040 RCX: 00007f7174b1a61e
[ 9.319585] RDX: 00007f7175181304 RSI: 0000000000007fd0 RDI: 000055a3f773edb0
[ 9.319587] RBP: 000055a3f773edb0 R08: 0000000000000000 R09: 0000000000000000
[ 9.319588] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f7175181304
[ 9.319589] R13: 0000000000020000 R14: 000055a3f771fa40 R15: 0000000000000000
[ 9.319593] </TASK>
[ 9.319594] Modules linked in: hid_sensor_gyro_3d hid_sensor_magn_3d snd_sof_amd_renoir intel_rapl_msr(+) edac_core nls_iso8859_1 hid_sensor_accel_3d(+) snd_sof_amd_acp rtw88_core intel_rapl_common hid_sensor_trigger nls_cp437 industrialio_triggered_buffer snd_sof_pci kfifo_buf snd_sof_xtensa_dsp hid_sensor_iio_common vfat industrialio snd_sof fat kvm_amd mac80211 snd_sof_utils hid_sensor_hub snd_hda_codec_realtek snd_hda_codec_hdmi kvm snd_hda_codec_generic drm_exec snd_soc_core amdxcp snd_usb_audio drm_buddy irqbypass snd_compress snd_hda_intel crc32_pclmul eeepc_wmi(-) polyval_clmulni ac97_bus snd_intel_dspcfg gpu_sched btusb asus_wmi snd_pcm_dmaengine polyval_generic snd_intel_sdw_acpi snd_usbmidi_lib gf128mul drm_suballoc_helper btrtl battery snd_pci_ps ghash_clmulni_intel snd_ump drm_ttm_helper snd_hda_codec snd_rpl_pci_acp6x btintel snd_rawmidi ttm btbcm snd_acp_pci ledtrig_audio input_leds sha512_ssse3 snd_seq_device snd_hda_core sparse_keymap btmtk evdev wmi_bmof snd_acp_legacy_common sha256_ssse3
[ 9.319659] drm_display_helper mc led_class snd_pci_acp6x snd_hwdep sha1_ssse3 cfg80211 mac_hid bluetooth r8169 aesni_intel snd_pcm cec crypto_simd cryptd snd_pci_acp5x i2c_algo_bit sp5100_tco snd_rn_pci_acp3x realtek snd_timer tpm_crb snd_acp_config mdio_devres ecdh_generic uas watchdog video snd tpm_tis amd_pmf snd_soc_acpi tiny_power_button rfkill ecc rapl usb_storage crc16 libphy libarc4 soundcore k10temp amd_sfh(+) i2c_piix4 ccp snd_pci_acp3x backlight wmi thermal tpm_tis_core platform_profile button acpi_tad serio_raw zfs(PO+) nfsd spl(O) tun tap auth_rpcgss macvlan nfs_acl lockd bridge grace stp llc fuse sunrpc efi_pstore configfs nfnetlink zram efivarfs tpm rng_core dmi_sysfs ip_tables x_tables autofs4 hid_generic sd_mod usbhid atkbd libps2 hid vivaldi_fmap ahci libahci libata nvme xhci_pci xhci_pci_renesas nvme_core scsi_mod xhci_hcd nvme_common t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic crct10dif_pclmul scsi_common crct10dif_common rtc_cmos i8042 serio dm_mod dax btrfs blake2b_generic
[ 9.319743] libcrc32c crc32c_generic crc32c_intel xor raid6_pq
[ 9.319749] CR2: ffffffff85600000
[ 9.319752] ---[ end trace 0000000000000000 ]---
[ 9.444407] RIP: 0010:amd_sfh_get_report+0x43/0x140 [amd_sfh]
[ 9.563701] Code: 00 48 8b 68 08 8b 45 10 85 c0 0f 84 d9 00 00 00 49 89 fc 41 89 f6 41 89 d7 31 db eb 0d 48 83 c3 01 48 39 c3 0f 84 bf 00 00 00 <4c> 39 64 dd 68 75 ec 48 8b 44 24 30 48 33 05 92 d3 c7 c2 be c0 0d
[ 9.563707] RSP: 0018:ffffc90000f8fb40 EFLAGS: 00010287
[ 9.563710] RAX: 0000000048000000 RBX: 0000000000545c2d RCX: 0000000000000000
[ 9.563711] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff88812ce84000
[ 9.563713] RBP: ffffffff82bd1e30 R08: ffffc90000f8fbd8 R09: ffffc90000f8fbd8
[ 9.563714] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88812ce84000
[ 9.563715] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 9.563716] FS: 00007f7175005100(0000) GS:ffff88838ff80000(0000) knlGS:0000000000000000
[ 9.594612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.594617] CR2: ffffffff85600000 CR3: 0000000117900000 CR4: 0000000000f50ee0
[ 9.594619] PKRU: 55555554
[ 9.594622] note: (udev-worker)[723] exited with irqs disabled
------
Thankfully the system was able to boot, but udev got a thread stuck
trying to remove the device (I'm not quite sure whether that's related;
probably the thread died with some lock held), and everything was very
slow. Something else crashed again shortly after, so I didn't have time
to investigate the bugged state much.
- 6.6.42 kernel from nixos unstable
- CPU identified as AMD Ryzen 7 7735HS with Radeon Graphics in
/proc/cpuinfo
- this card:
05:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub [1022:15e4]
I'd offer to test mainline, but I cannot reboot this machine easily, and
passing the card through to qemu unfortunately didn't reproduce the issue
(amd_sfh_dis_sts_v2() != 0 so the driver doesn't load, and skipping that
check doesn't help), so I'm afraid I won't be of much help with further
debugging, but hopefully this gives a starting point.
I unfortunately have no easy way to get debug info, but a quick look
at the disassembly suggests that amd_sfh_get_report+0x43 is the
access to cli_data->hid_sensor_hubs[i]:
('i++')
aa6: 48 83 c3 01 add $0x1,%rbx
('i < cli_data->num_hid_devices' check)
aaa: 48 39 c3 cmp %rax,%rbx
aad: 0f 84 bf 00 00 00 je b72 <amd_sfh_get_report+0x102>
(amd_sfh_get_report+0x43;
'if (cli_data->hid_sensor_hubs[i] == hid) {'
0x68 is the offset of hid_sensor_hubs in struct amdtp_cl_data;
the registers / bug address also match rbp+8*rbx+0x68 = ffffffff85600000)
ab3: 4c 39 64 dd 68 cmp %r12,0x68(%rbp,%rbx,8)
ab8: 75 ec jne aa6 <amd_sfh_get_report+0x36>
So the problem would be that num_hid_devices somehow holds 0x48000000
(the value in rax), which let i run far past the end of hid_sensor_hubs[]?
I can't find anything wrong with how num_hid_devices is initialized for a
given cli_data in amd_sfh_hid_client_init(), but amd_sfh_get_report() might
have been called on something that isn't fully valid yet or is in the
process of being removed?...
Sorry, my previous reboot was a while ago, so I can't even tell whether
this is reproducible; but the code hasn't changed much recently, so this
is probably a race condition, which would explain why I hadn't seen it
before...
(... And I honestly have no idea what this driver is for, even after
having looked at the code, so I've just blacklisted the module for now.
Good luck!)
--
Dominique Martinet | Asmadeus
* Re: BUG in amd_sfh_get_report
From: Jiri Kosina @ 2024-08-19 18:06 UTC
To: Dominique Martinet
Cc: Basavaraj Natikar, Mario Limonciello, Nehal Shah,
Shyam Sundar S K, linux-input
On Wed, 31 Jul 2024, Dominique Martinet wrote:
> Hello,
>
> I just rebooted my server this morning and was greeted by this bug:
AMD folks, did you have a chance to look into this report, please?
--
Jiri Kosina
SUSE Labs
* Re: BUG in amd_sfh_get_report
From: Basavaraj Natikar @ 2024-08-20 6:24 UTC
To: Jiri Kosina, Dominique Martinet
Cc: Basavaraj Natikar, Mario Limonciello, Nehal Shah,
Shyam Sundar S K, linux-input
On 8/19/2024 11:36 PM, Jiri Kosina wrote:
> On Wed, 31 Jul 2024, Dominique Martinet wrote:
>
>> Hello,
>>
>> I just rebooted my server this morning and was greeted by this bug:
> AMD folks, did you have a chance to look into this report, please?
Yes Jiri, we tried to reproduce this issue but were unable to recreate it.
We are continuously monitoring the behavior to determine whether the issue occurs again.
Thanks,
--
Basavaraj