public inbox for linux-next@vger.kernel.org
From: Bert Karwatzki <spasswolf@web.de>
To: "Mario Limonciello" <superm1@kernel.org>,
	"Christian König" <christian.koenig@amd.com>,
	linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	Alex Deucher <alexander.deucher@amd.com>,
	spasswolf@web.de
Subject: Re: [PATCH] Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem"
Date: Mon, 02 Feb 2026 17:43:31 +0100
Message-ID: <15517fd926161aee77b4df2ffa8bab4bd08eab9a.camel@web.de>
In-Reply-To: <2a3a3d4f-efa2-46e5-8fee-f51cf12812a9@kernel.org>

On Monday, 02.02.2026 at 10:11 -0600, Mario Limonciello wrote:
> On 2/2/26 8:35 AM, Christian König wrote:
> > On 2/2/26 15:25, Mario Limonciello wrote:
> > > On 1/31/26 6:24 PM, Bert Karwatzki wrote:
> > > > This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
> > > > 
> > > > This commit was erroneously applied again after commit 0ab5d711ec74
> > > > ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
> > > > removed it, leading to very hard to debug crashes, when used with a system with two
> > > > AMD GPUs of which only one supports ASPM.
> > > > 
> > > > Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
> > > > Link: https://github.com/acpica/acpica/issues/1060
> > > > Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
> > > > 
> > > > Signed-off-by: Bert Karwatzki <spasswolf@web.de>
> > > > ---
> > > 
> > > Amazing detective work, thanks so much.
> > > 
> > > This added the code initially:
> > > cba07cce39ace drm/amd: Check if ASPM is enabled from PCIe subsystem
> > > 
> > > This effectively removed it:
> > > 0ab5d711ec74d drm/amd: Refactor `amdgpu_aspm` to be evaluated per device
> > > 
> > > This was the accidental re-apply:
> > > 7294863a6f012 drm/amd: Check if ASPM is enabled from PCIe subsystem
> > > 
> > > It looks like this was right on the edge of 5.17-rc6 and 5.18-rc1.
> > > I think drm-fixes-2022-02-25 and amd-drm-next-5.18-2022-02-25 ended up with different content.
> > > 
> > > Nonetheless this is the correct change and I've applied it to amd-staging-drm-next.
> > > 
> > > Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> > 
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> > 
> > There is just one major question left: Why is disabling ASPM causing problems?
> > 
> 
> My theory is that it's a mismatch between the PCIe core and amdgpu. I.e., if
> the PCIe core thinks ASPM is enabled but amdgpu thinks it is disabled, we can
> hit some corner cases.

That's also my theory. In my case the discrete GPU is probed first:

[    1.652505] [    T194] amdgpu 0000:03:00.0: enabling device (0000 -> 0002)
[    1.658662] [    T194] amdgpu 0000:03:00.0: amdgpu: initializing kernel modesetting (DIMGREY_CAVEFISH 0x1002:0x73FF 0x1462:0x1313 0xC3).
[    1.665045] [    T194] amdgpu 0000:03:00.0: amdgpu: register mmio base: 0xFCA00000
[    1.671399] [    T194] amdgpu 0000:03:00.0: amdgpu: register mmio size: 1048576
[    1.681596] [    T194] amdgpu 0000:03:00.0: amdgpu: detected ip block number 0 <common_v1_0_0> (nv_common)

Then the built-in GPU is probed and sets amdgpu_aspm = 0:

[    4.883191] [    T194] amdgpu 0000:08:00.0: enabling device (0006 -> 0007)
[    4.890078] [    T194] amdgpu 0000:08:00.0: amdgpu: initializing kernel modesetting (RENOIR 0x1002:0x1638 0x1462:0x1313 0xC5).
[    4.895907] [    T194] amdgpu 0000:08:00.0: amdgpu: register mmio base: 0xFC900000
[    4.901640] [    T194] amdgpu 0000:08:00.0: amdgpu: register mmio size: 524288
[    4.909833] [    T194] amdgpu 0000:08:00.0: amdgpu: detected ip block number 0 <common_v2_0_0> (soc15_common)

I'm going to monitor calls to amdgpu_device_should_use_aspm() to check whether it is
called during the suspend/resume cycle and gives the wrong answer (i.e. false when ASPM
is actually enabled).
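
The suspected mismatch can be sketched as a small standalone C model. This is
only an illustration under stated assumptions, not the driver code: struct
model_gpu, model_aspm_param and all model_* functions are hypothetical names.
Only the idea of a module-wide amdgpu_aspm setting and a per-device check like
amdgpu_device_should_use_aspm() comes from this thread. The model assumes the
discrete GPU has ASPM enabled in the PCIe core and the built-in GPU does not.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one GPU; NOT the real struct amdgpu_device. */
struct model_gpu {
	const char *name;
	bool pcie_aspm_enabled;	/* what the PCIe core reports for this device */
};

/* Module-wide knob modelled on amdgpu_aspm: -1 = auto, 0 = off, 1 = on. */
static int model_aspm_param = -1;

/*
 * Suspected problematic pattern (assumption): probing one device
 * overwrites the module-wide knob, which then applies to every GPU.
 */
static void model_probe_global(const struct model_gpu *gpu)
{
	if (!gpu->pcie_aspm_enabled)
		model_aspm_param = 0;
}

/* Decision based on the shared knob; later probes can change the answer. */
static bool model_should_use_aspm_global(const struct model_gpu *gpu)
{
	if (model_aspm_param == 0)
		return false;
	if (model_aspm_param == 1)
		return true;
	return gpu->pcie_aspm_enabled;
}

/* Per-device evaluation: the user setting is only read, never modified. */
static bool model_should_use_aspm_per_device(const struct model_gpu *gpu,
					     int user_param)
{
	if (user_param == 0)
		return false;
	if (user_param == 1)
		return true;
	return gpu->pcie_aspm_enabled;
}

/* Replays the probe order from the logs above and prints each decision. */
static void model_demo(void)
{
	struct model_gpu dgpu = { "discrete", true };
	struct model_gpu igpu = { "built-in", false };

	model_aspm_param = -1;
	model_probe_global(&dgpu);
	printf("after dGPU probe, global view of dGPU: %d\n",
	       model_should_use_aspm_global(&dgpu));
	model_probe_global(&igpu);
	printf("after iGPU probe, global view of dGPU: %d\n",
	       model_should_use_aspm_global(&dgpu));
	printf("per-device view of dGPU: %d\n",
	       model_should_use_aspm_per_device(&dgpu, -1));
}
```

In this model the answer for the discrete GPU flips from true to false once the
built-in GPU has been probed, although the PCIe core still has ASPM enabled for
it. The per-device variant keeps returning true for the discrete GPU.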

Bert Karwatzki



Thread overview: 5+ messages
2026-02-01  0:24 [PATCH] Revert "drm/amd: Check if ASPM is enabled from PCIe subsystem" Bert Karwatzki
2026-02-02 14:25 ` Mario Limonciello
2026-02-02 14:35   ` Christian König
2026-02-02 16:11     ` Mario Limonciello
2026-02-02 16:43       ` Bert Karwatzki [this message]
