From: Malte Starostik <malte@starostik.de>
To: Benjamin Tissoires <benjamin.tissoires@redhat.com>,
Linux regressions mailing list <regressions@lists.linux.dev>,
"Limonciello, Mario" <mario.limonciello@amd.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>,
basavaraj.natikar@amd.com, linux-input@vger.kernel.org,
linux@hexchain.org, stable@vger.kernel.org
Subject: Re: amd_sfh driver causes kernel oops during boot
Date: Wed, 07 Jun 2023 00:57:07 +0200
Message-ID: <5980752.YW5z2jdOID@zen>
In-Reply-To: <79bd270e-4a0d-b4be-992b-73c65d085624@amd.com>
On Tuesday, 6 June 2023, 17:25:13 CEST, Limonciello, Mario wrote:
> On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
> > On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >>>> Hello,
> >>>>
> >>>> chiming in here as I'm experiencing what looks like the exact same
> >>>> issue, also on a Lenovo Z13 notebook, also on Arch:
> >>>> Oops during startup in task udev-worker followed by udev-worker
> >>>> blocking all attempts to suspend or cleanly shutdown/reboot the
> >>>> machine
> > I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
> > are enabled, clean up") because the stack trace says that there is a bad
> > list_add, which could happen if the object is not correctly initialized.
> >
> > However, that commit was present in v6.2, so it might not be that one.
> >
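To illustrate the "bad list_add" suspicion quoted above, here is a minimal userspace sketch, not the kernel's code. It assumes a simplified `struct list_head` and a hypothetical helper `list_add_checked()` that refuses to insert when the head was never initialized, in the spirit of the kernel's list debugging checks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same idea as INIT_LIST_HEAD(): an empty list points at itself. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Hypothetical helper: insert 'new' right after 'head', but only if the
 * head looks sane. A zeroed (never initialized) head has next == NULL,
 * and a corrupted one fails the back-pointer check.
 */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	if (next == NULL || next->prev != head)
		return false;	/* this is the "bad list_add" case */

	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
	return true;
}
```

With a head set up by `init_list_head()` the insert succeeds; with a zero-filled head it is rejected instead of dereferencing a NULL pointer.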
> If I'm not mistaken the Z13 doesn't actually have any
> sensors connected to SFH. So I think the suspicion on
> 7bcfdab3f0c6 and theory this is triggered by HID init makes
> a lot of sense.
>
> Can you try this patch?
>
> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> index d9b7b01900b5..fa693a5224c6 100644
> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
>  			devm_kfree(dev, cl_data->report_descr[i]);
>  		}
>  		dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled);
> +		cl_data->num_hid_devices = 0;
>  		return -EOPNOTSUPP;
>  	}
>  	schedule_delayed_work(&cl_data->work_buffer, msecs_to_jiffies(AMD_SFH_IDLE_LOOP));
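The intent of the one-line change in the quoted patch can be sketched in userspace as follows. This is an illustration under assumptions, not the driver's code: `struct client_data`, `client_setup()`, `client_abort()` and `client_teardown()` are hypothetical names. The idea is that once the failure path has freed the per-device buffers, zeroing the device count turns any later loop over that count into a no-op, so the stale pointers are never touched again.

```c
#include <stdlib.h>

#define MAX_DEVICES 4

/* Hypothetical, heavily simplified stand-in for the driver's client data. */
struct client_data {
	int num_devices;
	void *report[MAX_DEVICES];
};

/* Allocate one buffer per device; returns 0 on success, -1 on failure. */
static int client_setup(struct client_data *cl, int count)
{
	cl->num_devices = 0;
	for (int i = 0; i < count && i < MAX_DEVICES; i++) {
		cl->report[i] = malloc(64);
		if (!cl->report[i])
			return -1;
		cl->num_devices++;
	}
	return 0;
}

/*
 * Failure path: free every buffer. The pointers are left dangling on
 * purpose; resetting the count is what keeps later code away from them.
 */
static void client_abort(struct client_data *cl)
{
	for (int i = 0; i < cl->num_devices; i++)
		free(cl->report[i]);
	cl->num_devices = 0;	/* the step the patch adds */
}

/* Later teardown: frees what is still counted, returns entries visited. */
static int client_teardown(struct client_data *cl)
{
	int visited = 0;

	for (int i = 0; i < cl->num_devices; i++) {
		free(cl->report[i]);
		visited++;
	}
	cl->num_devices = 0;
	return visited;
}
```

After `client_abort()`, a call to `client_teardown()` visits zero entries. Without the reset it would free the same buffers a second time.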
I have now applied this patch to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c, which
was the origin when I started the whole bisection. After a clean rebuild, the
issue still persists.
Out of 50 boots, I got:
25 clean
22 Oops as posted by the OP
1 same Oops, followed by a panic
1 lockup [1]
1 hanging with just a blank screen
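For reference, the tally above can be turned into shares of all boots with a few lines of C. This is only a sketch: the table copies the counts listed above, and `total_boots()` and `share_percent()` are hypothetical helper names.

```c
#include <stddef.h>

/* One row per observed boot outcome, counts taken from the list above. */
struct outcome {
	const char *label;
	int count;
};

static const struct outcome outcomes[] = {
	{ "clean", 25 },
	{ "Oops as posted by the OP", 22 },
	{ "same Oops, followed by a panic", 1 },
	{ "lockup", 1 },
	{ "hang with a blank screen", 1 },
};

/* Sum of all counts in the table. */
static int total_boots(void)
{
	int total = 0;

	for (size_t i = 0; i < sizeof(outcomes) / sizeof(outcomes[0]); i++)
		total += outcomes[i].count;
	return total;
}

/* Share of all boots, in percent, for a given count. */
static double share_percent(int count)
{
	int total = total_boots();

	return total ? 100.0 * count / total : 0.0;
}
```

The counts add up to the 50 boots mentioned, so half of the boots were clean and the other half hit one of the failure modes.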
I'm not sure whether the lockups are related, but [1] mentions modprobe and
udev-worker as well, and all problems, including the blank-screen one, appear
at roughly the same time during boot. As this is before a graphics mode switch,
I suspect the last-mentioned case may be like [1], only with the screen blanked.
To support the timing correlation: the UVC error for the IR cam shown in the
photo (normal boot noise) also appears right before the BUG in the non-lockup
bad case.
I do see the dev_warn in dmesg, so the code path modified in your patch is
indeed hit:
[ 10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not enabled is 1
[ 10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95
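The "error -95" in the probe message matches the `-EOPNOTSUPP` returned on the patched code path, since kernel functions report failures as negative errno values. A small userspace sketch of that decoding follows; `describe_probe_error()` is a hypothetical helper, and the value 95 for `EOPNOTSUPP` is assumed to hold on Linux for common architectures such as x86.

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel-style return values are negative errno codes, so flip the sign
 * before asking libc for the message text.
 */
static const char *describe_probe_error(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling `describe_probe_error(-95)` on such a system yields the libc message for `EOPNOTSUPP`.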
BR Malte
[1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8