linux-pci.vger.kernel.org archive mirror
From: Bert Karwatzki <spasswolf@web.de>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: "Christian König" <christian.koenig@amd.com>,
	"Mario Limonciello (AMD) (kernel.org)" <superm1@kernel.org>,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	regressions@lists.linux.dev, linux-pci@vger.kernel.org,
	linux-acpi@vger.kernel.org,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	acpica-devel@lists.linux.dev,
	"Robert Moore" <robert.moore@intel.com>,
	"Saket Dumbre" <saket.dumbre@intel.com>,
	spasswolf@web.de
Subject: Re: Crash during resume of pcie bridge due to infinite loop in ACPICA
Date: Tue, 02 Dec 2025 20:53:58 +0100
Message-ID: <a3e1729912a94d10d7dd211efe837d4d6c7a3eaf.camel@web.de>
In-Reply-To: <CAJZ5v0heOKxk8=4kwXcRLFfKhxYBX+ze_Dc2w90xrM14jvCirg@mail.gmail.com>

On Tuesday, 02.12.2025 at 19:59 +0100, Rafael J. Wysocki wrote:
> On Fri, Nov 28, 2025 at 9:47 PM Bert Karwatzki <spasswolf@web.de> wrote:
> > 
> > This is not an ACPICA problem after all:
> > 
> > I did some more monitoring:
> > https://gitlab.freedesktop.org/spasswolf/linux-stable/-/commits/amdgpu_suspend_resume?ref_type=heads
> > and I still get a crash, but perhaps due to the delays the printk()s caused I actually get a helpful error message in netconsole:
> > 
> > T5971;ACPI BIOS Error (bug): Could not resolve symbol [\x5cM013.VARR], AE_NOT_FOUND (20240827/psargs-332)
> > T5971;acpi_ps_complete_op returned 0x5
> > T5971;acpi_ps_parse_aml_debug: parse loop returned = 0x5
> > T5971;ACPI Error: Aborting method \x5cM013 due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;ACPI Error: Aborting method \x5cM017 due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;ACPI Error: Aborting method \x5cM019 due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;ACPI Error: Aborting method \x5c_SB.PCI0.GPP0.M439 due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;ACPI Error: Aborting method \x5c_SB.PCI0.GPP0.M241 due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;ACPI Error: Aborting method \x5c_SB.PCI0.GPP0.M237._ON due to previous error (AE_NOT_FOUND) (20240827/psparse-935)
> > T5971;acpi_ps_parse_aml_debug: after walk loop
> > T5971;acpi_ps_execute_method_debug 331
> > T5971;acpi_ns_evaluate_debug 475 METHOD
> > T5971;acpi_evaluate_object_debug 255
> > T5971;__acpi_power_on_debug 369
> > T5971;acpi_power_on_unlocked_debug 442
> > T5971;acpi_power_on_unlocked_debug 446
> > T5971;acpi_power_on_debug 471
> > T5971;acpi_power_on_list_debug 649: result = -19
> > T5971;pcieport 0000:00:01.1: pci_pm_default_resume_early 568#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:00:01.1
> > T5971;pcieport 0000:00:01.1: broken device, retraining non-functional downstream link at 2.5GT/s#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:00:01.1
> > T5971;pcieport 0000:00:01.1: retraining failed#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:00:01.1
> > T5971;pcieport 0000:00:01.1: Data Link Layer Link Active not set in 1000 msec#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:00:01.1
> > T5971;pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:01:00.0
> > 
> > This shows that there seems to be no problem with ACPICA: acpi_power_on_list(_debug)() returns -ENODEV (-19),
> > and the crash occurs later.
> > 
> > This leaves two questions:
> > 1. Is this crash avoidable by different error handling in the pci code?
> > 2. If the crash is not avoidable, can we at least modify the error handling in such a way that
> > we get an error message through netconsole by default? (perhaps a little delay will suffice)
> 
> I'm not sure how far this is going to get you, but you may try the
> attached patch.

This looks worth trying; I'll test it once my current test run has crashed.

Currently I'm trying to figure out why this line is there:

pcieport 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible#012 SUBSYSTEM=pci#012 DEVICE=+pci:0000:01:00.0

This line comes from this part of pci_power_up():

	pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
	if (PCI_POSSIBLE_ERROR(pmcsr)) {
		pci_err(dev, "Unable to change power state from %s to D0, device inaccessible\n",
			pci_power_name(dev->current_state));
		WARN(1, "Who is calling %s?\n", __func__); // My debug statement. (No result, yet.)
		dev->current_state = PCI_D3cold;
		return -EIO;
	}
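
For illustration, here is a small userspace model of that check. It is only
a sketch under stated assumptions, not the kernel code: struct model_dev and
the model_*() helpers are hypothetical names invented for this example. It
relies on the convention that a config read from an inaccessible device
comes back as all ones, which is what PCI_POSSIBLE_ERROR() tests for, and on
PCI_PM_CTRL being at offset 4 within the PM capability.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the PM control/status register within the PM capability
 * (same value as PCI_PM_CTRL in the kernel headers). */
#define MODEL_PCI_PM_CTRL 4

/* Hypothetical stand-in for a PCI device: 256 bytes of config space
 * plus a flag saying whether the device still answers config reads. */
struct model_dev {
	bool reachable;
	uint8_t pm_cap;          /* offset of the PM capability */
	uint16_t config[128];    /* config space as 16-bit words */
};

/* Model of a 16-bit config read: an unreachable device (or an
 * out-of-range offset) yields all ones instead of real data. */
static uint16_t model_read_config_word(const struct model_dev *dev,
				       unsigned int offset)
{
	if (!dev->reachable || offset >= 256)
		return 0xffff;
	return dev->config[offset / 2];
}

/* Model of PCI_POSSIBLE_ERROR() for a 16-bit value: all ones may be
 * an error response rather than register contents. */
static bool model_possible_error(uint16_t val)
{
	return val == (uint16_t)~0ULL;
}

/* Model of the quoted part of pci_power_up(): returns -EIO when the
 * PMCSR read looks like an error response, 0 otherwise. */
static int model_power_up_check(const struct model_dev *dev)
{
	uint16_t pmcsr = model_read_config_word(dev,
					dev->pm_cap + MODEL_PCI_PM_CTRL);

	if (model_possible_error(pmcsr))
		return -EIO;
	return 0;
}
```

In this model, a device with reachable == false always takes the -EIO path,
which corresponds to the "device inaccessible" message above.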

The interesting thing here is that the PCI device 0000:01:00.0 had already been disconnected
(with pci_dev_set_disconnected()) when the resume of the bridge at 0000:00:01.1 failed
(in the failure path of pci_pm_bridge_power_up_actions()). I know this for sure
because I put printk()s there, too. I'm not sure whether pci_power_up() should be called at all in this case.
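
The open question above can be made concrete with a userspace sketch. This
is not a proposed patch and not how the kernel currently behaves: struct
sketch_dev and the sketch_*() functions are hypothetical, and the early exit
with -ENODEV is an assumption chosen for illustration only. It models a
power-up path that skips the config-space read when the device has already
been marked disconnected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the relevant device state. */
struct sketch_dev {
	bool disconnected;          /* models the mark left by pci_dev_set_disconnected() */
	bool reachable;             /* whether config reads return real data */
	unsigned int config_reads;  /* counts modeled config-space accesses */
};

/* Modeled PMCSR read: counts the access and returns all ones when the
 * device does not answer. */
static uint16_t sketch_read_pmcsr(struct sketch_dev *dev)
{
	dev->config_reads++;
	return dev->reachable ? 0x0000 : 0xffff;
}

/* Modeled power-up with a guard in front of the config read. */
static int sketch_power_up(struct sketch_dev *dev)
{
	/* Hypothetical early exit: do not touch a device that is already
	 * known to be gone. */
	if (dev->disconnected)
		return -ENODEV;

	/* Same shape as the quoted check: all ones means inaccessible. */
	if (sketch_read_pmcsr(dev) == 0xffff)
		return -EIO;

	return 0;
}
```

With such a guard, a disconnected device would never reach the PMCSR read,
so the "Unable to change power state" message would not be printed for it.
Whether that is the right behaviour for the real pci_power_up() is exactly
the question raised above.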

Bert Karwatzki






