From: Malte Starostik <malte@starostik.de>
To: Benjamin Tissoires <benjamin.tissoires@redhat.com>,
	Linux regressions mailing list <regressions@lists.linux.dev>,
	"Limonciello, Mario" <mario.limonciello@amd.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>,
	basavaraj.natikar@amd.com, linux-input@vger.kernel.org,
	linux@hexchain.org, stable@vger.kernel.org
Subject: Re: amd_sfh driver causes kernel oops during boot
Date: Wed, 07 Jun 2023 00:57:07 +0200
Message-ID: <5980752.YW5z2jdOID@zen>
In-Reply-To: <79bd270e-4a0d-b4be-992b-73c65d085624@amd.com>

On Tuesday, 6 June 2023, 17:25:13 CEST, Limonciello, Mario wrote:
> On 6/6/2023 3:08 AM, Benjamin Tissoires wrote:
> > On Jun 06 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>> On Mon, Jun 05, 2023 at 01:24:25PM +0200, Malte Starostik wrote:
> >>>> Hello,
> >>>> 
> >>>> chiming in here as I'm experiencing what looks like the exact same
> >>>> issue, also on a Lenovo Z13 notebook, also on Arch:
> >>>> Oops during startup in task udev-worker followed by udev-worker
> >>>> blocking all attempts to suspend or cleanly shutdown/reboot the
> >>>> machine

> > I have a suspicion on commit 7bcfdab3f0c6 ("HID: amd_sfh: if no sensors
> > are enabled, clean up") because the stack trace says that there is a bad
> > list_add, which could happen if the object is not correctly initialized.
> > 
> > However, that commit was present in v6.2, so it might not be that one.
> > 
> If I'm not mistaken the Z13 doesn't actually have any
> sensors connected to SFH.  So I think the suspicion on
> 7bcfdab3f0c6 and theory this is triggered by HID init makes
> a lot of sense.
> 
> Can you try this patch?
> 
> diff --git a/drivers/hid/amd-sfh-hid/amd_sfh_client.c b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> index d9b7b01900b5..fa693a5224c6 100644
> --- a/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> +++ b/drivers/hid/amd-sfh-hid/amd_sfh_client.c
> @@ -324,6 +324,7 @@ int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
>                          devm_kfree(dev, cl_data->report_descr[i]);
>                  }
>                  dev_warn(dev, "Failed to discover, sensors not enabled is %d\n", cl_data->is_any_sensor_enabled);
> +               cl_data->num_hid_devices = 0;
>                  return -EOPNOTSUPP;
>          }
>          schedule_delayed_work(&cl_data->work_buffer, msecs_to_jiffies(AMD_SFH_IDLE_LOOP));

I applied this to 9e87b63ed37e202c77aa17d4112da6ae0c7c097c now, which was the 
origin when I started the whole bisection. Clean rebuild, issue still 
persists.

Out of 50 boots, I got:

25 clean
22 Oops as posted by the OP
1 same Oops, followed by a panic
1 lockup [1]
1 hanging with just a blank screen

I'm not sure whether the lockups are related, but [1] mentions modprobe and
udev-worker as well, and all problems, including the blank-screen one, appear
at roughly the same time during boot. As this is before a graphics mode
switch, I suspect the last-mentioned case may be the same as [1], only with
the screen blanked. To support the timing correlation: the UVC error for the
IR cam shown in the photo (normal boot noise) also appears right before the
BUG in the non-lockup bad case.

I do see the dev_warn in dmesg, so the code path modified in your patch is 
indeed hit:
[   10.897521] pcie_mp2_amd 0000:63:00.7: Failed to discover, sensors not enabled is 1
[   10.897533] pcie_mp2_amd: probe of 0000:63:00.7 failed with error -95

BR Malte

[1] https://photos.app.goo.gl/2FAvQ7DqBsHEF6Bd8


