public inbox for linux-acpi@vger.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: Bert Karwatzki <spasswolf@web.de>, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, linux-stable@vger.kernel.org,
	regressions@lists.linux.dev, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org,
	Mario Limonciello <superm1@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>
Subject: Re: [REGRESSION 00/04] Crash during resume of pcie bridge
Date: Mon, 6 Oct 2025 14:39:18 +0200
Message-ID: <232324a9-e82d-40b3-b88b-538947411a24@amd.com>
In-Reply-To: <20251006120944.7880-1-spasswolf@web.de>

On 06.10.25 14:09, Bert Karwatzki wrote:
> Since Linux v6.15 I have experienced random crashes on my MSI Alpha 15 laptop
> running Debian trixie (amd64). The first such crash happened around the middle
> of June, and as there were no useful log messages, not even with netconsole,
> I suspected faulty hardware. So I ran memtest86+,
> found a faulty address line and replaced the memory (unfortunately 64G to 16G).
> But the crashes occurred again, so I did a thorough investigation.
> 
> The crashes occur after 30min to 33h (yes, hours) of uptime and consist of a
> sudden reboot after which the PCI bridge at 00:02.4 and the nvme device 
> connected to it are missing. If there's sound running during the crash then the
> first sign of the crash is the sound looping like a broken record for about 2s,
> after which the reboot happens. With the missing nvme device the reboot drops to
> a rescue shell. Using "shutdown -h now" from that shell and starting the laptop
> with the power button restores the missing PCI bridge and nvme device.

Oh well, it sounds like some PCIe device is dropping off the bus and taking its upstream bridge with it.

> As the bisections were not successful I tried to monitor the crash using
> netconsole and CONFIG_ACPI_DEBUG and "acpi.debug_layer=0xf acpi.debug_level=0x107"
> as command line parameters. With this the last message on netconsole before
> the crash is usually:
> 
> [21465.639279] [    T251]    evmisc-0132 ev_queue_notify_reques: Dispatching Notify on [GPP0] (Device) Value 0x00 (Bus Check) Node 00000000f81f36b8

A full dump of that might be helpful. That sounds like the dGPU is powering up/down.

> 
> GPP0 is the ACPI name of this PCI bridge (at least that's my best guess):
> 
> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
> 
> to which the discrete GPU is connected
> 
> 03:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c3)
> 
> via the pci express switch
> 
> 01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c3)
> 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
> 
> While the GUI (xfce on xorg) on my laptop runs on the built-in GPU the discrete 
> GPU usually wakes up quite often, e.g. when a window is opened or when scrolling down on youtube.

Yeah, that is a known issue and we are working on it.

Basically an application enumerates the possible render or video decode devices in the system and that wakes up the dGPU even when it isn't actually used.
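
To see how often that happens, one option is to poll the runtime PM state of the dGPU in sysfs; reading power/runtime_status should not itself wake the device. A minimal sketch, assuming the dGPU sits at 0000:03:00.0 as in the lspci output quoted above. The function names and the one-second interval are only for illustration:

```bash
#!/bin/bash
# Print the runtime PM state ("active", "suspended", ...) of a PCI device.
# $1: sysfs device directory, e.g. /sys/bus/pci/devices/0000:03:00.0
runtime_status() {
	local dev="$1"
	if [ -r "$dev/power/runtime_status" ]; then
		cat "$dev/power/runtime_status"
	else
		echo "unknown"
	fi
}

# Log every state change with a timestamp (stop with Ctrl-C).
# The default device path is an assumption taken from the quoted lspci output.
watch_dgpu() {
	local dev="${1:-/sys/bus/pci/devices/0000:03:00.0}" last="" cur
	while true; do
		cur=$(runtime_status "$dev")
		if [ "$cur" != "$last" ]; then
			echo "$(date +%T) $cur"
			last="$cur"
		fi
		sleep 1
	done
}
```

Running watch_dgpu while opening windows or scrolling should then show how frequently the device leaves the suspended state.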

> A somewhat reliable method to generate GPP0 notifies is putting on a YouTube
> video and then periodically starting evolution with this script:
> 
> #!/bin/bash
> for i in {0..1000}
> do
> 	echo $i
> 	evolution &
> 	sleep 5
> 	killall evolution
> 	sleep 55
> done
> 
> This is also the method I used to test the debug kernel in the following mails.

To further narrow down the issue please run your laptop with amdgpu.runpm=0 on the kernel command line for a while and see if that is stable or not.
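
After rebooting it is worth confirming that the option really took effect. A minimal sketch, assuming amdgpu is loaded so that its parameters show up under /sys/module; the helper names are made up for this example:

```bash
#!/bin/bash
# Print the value of amdgpu.runpm found in a kernel command line string,
# or nothing if the option is absent. If it is given more than once,
# this helper reports the last occurrence.
cmdline_runpm() {
	local -a args
	local arg val=""
	read -ra args <<< "$1"
	for arg in "${args[@]}"; do
		case "$arg" in
			amdgpu.runpm=*) val="${arg#amdgpu.runpm=}" ;;
		esac
	done
	echo "$val"
}

# Show what was requested on the command line and what the driver reports.
check_runpm() {
	echo "cmdline: $(cmdline_runpm "$(cat /proc/cmdline)")"
	if [ -r /sys/module/amdgpu/parameters/runpm ]; then
		echo "module:  $(cat /sys/module/amdgpu/parameters/runpm)"
	fi
}
```

With the option in place, check_runpm should report 0 in both lines.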

Thanks,
Christian.

> 
> Bert Karwatzki


