From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Haochen Tong <linux@hexchain.org>, stable@vger.kernel.org
Cc: Linux Regressions <regressions@lists.linux.dev>,
Linux Input Devices <linux-input@vger.kernel.org>,
Basavaraj Natikar <basavaraj.natikar@amd.com>,
Jiri Kosina <jikos@kernel.org>,
Benjamin Tissoires <benjamin.tissoires@redhat.com>
Subject: Re: amd_sfh driver causes kernel oops during boot
Date: Wed, 24 May 2023 17:08:19 +0700
Message-ID: <ZG3iE4l5X0V4WMdI@debian.me>
In-Reply-To: <f40e3897-76f1-2cd0-2d83-e48d87130eab@hexchain.org>
On Wed, May 24, 2023 at 01:27:57AM +0800, Haochen Tong wrote:
> Hi,
>
> Since kernel 6.3.0 (and also 6.4rc3), on a ThinkPad Z13 system with Arch
> Linux, I've noticed that the amd_sfh driver spews a lot of stack traces
> during boot. Sometimes it is an oops:
>
> BUG: unable to handle page fault for address: 000000000001000f
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 8 PID: 457 Comm: (udev-worker) Not tainted 6.3.3-arch1-1 #1
> fa7b7e0107004b3021a57a74b951e0a25e7e8584
> Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 )
> 12/08/2022
> RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 57
> 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 85 c0 0f
> 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
> RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
> RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
> RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
> R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
> FS: 00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
> <TASK>
> amdtp_hid_request+0x36/0x50 [amd_sfh
> 2e3095779aada9fdb1764f08ca578ccb14e41fe4]
> sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub
> d6157999c9d260a1bfa6f27d4a0dc2c3e2c5654e]
> hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common
> 07a7935272aa9c7a28193b574580b3e953a64ec4]
> hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d
> 9f2eb51294a1f0c0315b365f335617cbaef01eab]
> platform_probe+0x44/0xa0
> really_probe+0x19e/0x3e0
> ? __pfx___driver_attach+0x10/0x10
> __driver_probe_device+0x78/0x160
> driver_probe_device+0x1f/0x90
> __driver_attach+0xd2/0x1c0
> bus_for_each_dev+0x88/0xd0
> bus_add_driver+0x116/0x220
> driver_register+0x59/0x100
> ? __pfx_init_module+0x10/0x10 [hid_sensor_gyro_3d
> 9f2eb51294a1f0c0315b365f335617cbaef01eab]
> do_one_initcall+0x5d/0x240
> do_init_module+0x4a/0x200
> __do_sys_init_module+0x17f/0x1b0
> do_syscall_64+0x60/0x90
> ? ksys_read+0x6f/0xf0
> ? syscall_exit_to_user_mode+0x1b/0x40
> ? do_syscall_64+0x6c/0x90
> ? exc_page_fault+0x7c/0x180
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fe9ce721f9e
> Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00
> 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffd280dd828 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 000055b72a37f630 RCX: 00007fe9ce721f9e
> RDX: 00007fe9cec7a343 RSI: 00000000000077f8 RDI: 000055b72a56c7f0
> RBP: 00007fe9cec7a343 R08: 00000000000077f8 R09: 0000000000000000
> R10: 000000000001a0a1 R11: 0000000000000246 R12: 0000000000020000
> R13: 000055b72a363b90 R14: 000055b72a37f630 R15: 000055b72a36a070
> </TASK>
> Modules linked in: hid_sensor_accel_3d(+) hid_sensor_gyro_3d(+) qrtr
> hid_sensor_trigger snd_sof industrialio_triggered_buffer ath11k_pci(+)
> kfifo_buf snd_sof_utils hid_sensor_iio_common joydev ath11k industrialio
> snd_soc_core mousedev qmi_helpers snd_compress hid_sensor_hub
> snd_hda_scodec_cs35l41_spi ac97_bus snd_hda_codec_realtek(+)
> snd_pcm_dmaengine intel_rapl_msr snd_hda_codec_hdmi snd_hda_codec_generic
> intel_rapl_common mac80211 snd_pci_ps btusb snd_rpl_pci_acp6x btrtl
> snd_hda_intel edac_mce_amd uvcvideo btbcm snd_acp_pci snd_intel_dspcfg
> snd_pci_acp6x videobuf2_vmalloc snd_intel_sdw_acpi libarc4 uvc btintel
> snd_usb_audio(+) snd_pci_acp5x videobuf2_memops btmtk snd_hda_codec kvm_amd
> videobuf2_v4l2 snd_hda_scodec_cs35l41_i2c snd_usbmidi_lib
> snd_hda_scodec_cs35l41 snd_rn_pci_acp3x ucsi_acpi bluetooth videodev
> snd_hda_core typec_ucsi snd_acp_config snd_hda_cs_dsp_ctls wacom(+)
> hid_multitouch cfg80211 snd_rawmidi sp5100_tco kvm snd_seq_device cs_dsp
> videobuf2_common typec ecdh_generic snd_soc_acpi
> think_lmi snd_hwdep snd_pcm irqbypass crc16 snd_soc_cs35l41_lib mhi
> thunderbolt firmware_attributes_class snd_pci_acp3x amd_sfh(+) k10temp
> psmouse roles rapl i2c_piix4 mc snd_timer wmi_bmof serial_multi_instantiate
> i2c_hid_acpi acpi_tad i2c_hid amd_pmf amd_pmc mac_hid sch_fq tcp_bbr
> dm_multipath i2c_dev crypto_user fuse loop zram ip_tables x_tables xfs
> libcrc32c crc32c_generic dm_crypt cbc encrypted_keys trusted asn1_encoder
> tee usbhid dm_mod amdgpu i2c_algo_bit serio_raw thinkpad_acpi drm_ttm_helper
> atkbd libps2 crct10dif_pclmul vivaldi_fmap crc32_pclmul ledtrig_audio
> crc32c_intel polyval_clmulni ttm polyval_generic drm_buddy nvme gf128mul
> platform_profile gpu_sched ghash_clmulni_intel sha512_ssse3 snd aesni_intel
> soundcore drm_display_helper crypto_simd rfkill nvme_core xhci_pci cryptd
> cec ccp xhci_pci_renesas i8042 video nvme_common serio wmi
> CR2: 000000000001000f
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:amd_sfh_get_report+0x1e/0x110 [amd_sfh]
> Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 57
> 41 56 41 55 41 54 55 53 48 8b 87 60 1d 00 00 48 8b 68 08 <8b> 45 10 85 c0 0f
> 84 a9 00 00 00 49 89 fc 41 89 f7 41 89 d6 31 db
> RSP: 0018:ffffb164426f3a20 EFLAGS: 00010246
> RAX: ffff9b0ae6b7bd00 RBX: ffff9b0ac0f46000 RCX: 0000000000000000
> RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff9b0ac0f46000
> RBP: 000000000000ffff R08: ffffb164426f3ab8 R09: ffffb164426f3ab8
> R10: 000000000020031b R11: ffff9b0ace40ac00 R12: ffff9b0ace40ac00
> R13: 0000000000000002 R14: 0000000000000002 R15: ffff9b0acd213010
> FS: 00007fe9ceb82200(0000) GS:ffff9b1122000000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000000001000f CR3: 000000010940c000 CR4: 0000000000750ee0
> PKRU: 55555554
>
> Sometimes it is a list corruption in the same function with a similar stack:
>
> ------------[ cut here ]------------
> list_add corruption. next is NULL.
> WARNING: CPU: 5 PID: 433 at lib/list_debug.c:25 __list_add_valid+0x57/0xa0
> ...
> CPU: 5 PID: 433 Comm: (udev-worker) Not tainted 6.4.0-rc3-1-mainline #1
> b60166e85cb97a6631db26f9dcda0196ed7a0c93
> Hardware name: LENOVO 21D2CTO1WW/21D2CTO1WW, BIOS N3GET47W (1.27 )
> 12/08/2022
> RIP: 0010:__list_add_valid+0x57/0xa0
> Code: 01 00 00 00 c3 cc cc cc cc 48 c7 c7 58 91 e6 9a e8 1e b9 a8 ff 0f 0b
> 31 c0 c3 cc cc cc cc 48 c7 c7 80 91 e6 9a e8 09 b9 a8 ff <0f> 0b eb e9 48 89
> c1 48 c7 c7 a8 91 e6 9a e8 f6 b8 a8 ff 0f 0b eb
> RSP: 0018:ffffad9dc0c7bb10 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff92d5a8099448 RCX: 0000000000000027
> RDX: ffff92dbe1f61688 RSI: 0000000000000001 RDI: ffff92dbe1f61680
> RBP: ffff92d59ea93508 R08: 0000000000000000 R09: ffffad9dc0c7b9a0
> R10: 0000000000000003 R11: ffffffff9b6ca808 R12: 0000000000000000
> R13: ffff92d5a8099440 R14: ffff92d59ea93760 R15: 0000000000000002
> FS: 00007fbaf0262200(0000) GS:ffff92dbe1f40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005651de666000 CR3: 000000011cfee000 CR4: 0000000000750ee0
> PKRU: 55555554
> Call Trace:
> <TASK>
> amd_sfh_get_report+0xba/0x110 [amd_sfh
> 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
> amdtp_hid_request+0x36/0x50 [amd_sfh
> 78bf82e66cdb2ccf24cbe871a0835ef4eedddb17]
> sensor_hub_get_feature+0xad/0x170 [hid_sensor_hub
> 30e53e2c49ea1702e2482c0b3860e22265679e39]
> hid_sensor_parse_common_attributes+0x217/0x310 [hid_sensor_iio_common
> ed7fba7a4d4147d48156e6a4b2a034ad3fc94350]
> hid_gyro_3d_probe+0x7f/0x2e0 [hid_sensor_gyro_3d
> 10978a2cdfc8979f2a7366fcd005e0ea826088eb]
> platform_probe+0x44/0xa0
> really_probe+0x19e/0x3e0
> ? __pfx___driver_attach+0x10/0x10
> __driver_probe_device+0x78/0x160
> driver_probe_device+0x1f/0x90
> __driver_attach+0xd2/0x1c0
> bus_for_each_dev+0x88/0xd0
> bus_add_driver+0x116/0x220
> driver_register+0x59/0x100
> ? __pfx_hid_gyro_3d_platform_driver_init+0x10/0x10 [hid_sensor_gyro_3d
> 10978a2cdfc8979f2a7366fcd005e0ea826088eb]
> do_one_initcall+0x5d/0x240
> do_init_module+0x60/0x240
> __do_sys_init_module+0x17f/0x1b0
> do_syscall_64+0x60/0x90
> ? exc_page_fault+0x7f/0x180
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fbaf06c0f9e
> Code: 48 8b 0d bd ed 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00
> 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d 8a ed 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffc5ce88528 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
> RAX: ffffffffffffffda RBX: 00005651de36dff0 RCX: 00007fbaf06c0f9e
> RDX: 00007fbaf0ba9343 RSI: 00000000000079f0 RDI: 00005651de646fe0
> RBP: 00007fbaf0ba9343 R08: 00000000000079f0 R09: 0000000000000000
> R10: 0000000000019fb1 R11: 0000000000000246 R12: 0000000000020000
> R13: 00005651de45fb10 R14: 00005651de36dff0 R15: 00005651de44d5f0
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> This occurs during almost every boot. When it happens there is usually a
> (udev-worker) process lingering forever, which is unkillable and even
> prevents shutdown.
>
> Looking at past journals it never happened before 6.3 so I believe it is a
> regression.
>
> Relevant device:
> 63:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc.
> [AMD] Sensor Fusion Hub [1022:15e4]
> Subsystem: Lenovo Sensor Fusion Hub [17aa:22f1]
> Kernel driver in use: pcie_mp2_amd
> Kernel modules: amd_sfh
>
Thanks for the bug report. I'm adding it to regzbot:

#regzbot ^introduced: v6.2..v6.3
#regzbot title: amd_sfh driver causes kernel oops (udev-worker becomes zombie) on ThinkPad Z13

--
An old man doll... just what I always wanted! - Clara