public inbox for linux-pci@vger.kernel.org
From: Mario Limonciello <mario.limonciello@amd.com>
To: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: "bhelgaas@google.com" <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, lkml <linux-kernel@vger.kernel.org>,
	Jay Vosburgh <jay.vosburgh@canonical.com>
Subject: Re: [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly put devices into D0 when initializing"
Date: Wed, 3 Dec 2025 23:29:42 -0600
Message-ID: <222da706-19c5-485c-be90-2ebda20c1142@amd.com>
In-Reply-To: <CAKAwkKvmZUGi+gEhr1nw5MV+rfyVP=Exu4AW1_WOPHDH6tSYug@mail.gmail.com>



On 12/3/2025 11:04 PM, Matthew Ruffell wrote:
> Hi Mario,
> 
> Thank you for your prompt reply, and apologies for my delayed one.
> Answers inline.
> 
>> When you say AWS-specific patches, can you be more specific?  What is
>> missing from a mainline kernel to use this hardware?  I.e., how do I know
>> there aren't Ubuntu-specific patches *causing* this issue?
> 
> I can reproduce the issue with the current HEAD of Linus's tree, with no
> additional patches applied. My current HEAD for testing is the 6.19 merge
> window, commit 51ab33fc0a8bef9454849371ef897a1241911b37.
> To get the mainline build to work on c5.metal on AWS I needed to edit a few
> config parameters, and I have attached the config I used.
> 
>> Now I've never used AWS - do you have an opportunity to do "regular"
>> reboots, or only kexec reboots?
>>
>> This issue only happens with a kexec reboot, right?
> 
> We can do regular and kexec reboots with the c5.metal instance type. The issue
> only happens with a kexec reboot.
> 
>> The first thing that jumps out at me is the code in
>> pci_device_shutdown() that clears bus mastering for a kexec reboot.
>> If you comment that out what happens?
> 
> I commented out the code that clears bus mastering (diff below), and kexec now
> boots correctly: the NVME drive appears just as it did before
> "4d4c10f PCI: Explicitly put devices into D0 when initializing".
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 302d61783f6c..0cb14ff32475 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -517,8 +517,9 @@ static void pci_device_shutdown(struct device *dev)
>           * If it is not a kexec reboot, firmware will hit the PCI
>           * devices with big hammer and stop their DMA any way.
>           */
> -       if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> -               pci_clear_master(pci_dev);
> +/*     if (kexec_in_progress && (pci_dev->current_state <= PCI_D3hot))
> + *             pci_clear_master(pci_dev);
> + */
>   }
> 
>   #ifdef CONFIG_PM_SLEEP
> 
> Since this works, does that mean that the bus master bit isn't being set on the
> NVME device on the other side of kexec?

That's at least what it seems like, and I guess trying to set D0
without bus mastering enabled is causing a problem.

Could you try adding a pci_set_master() call to pci_power_up()?  This is 
what I have in mind (only compile tested):

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..68661e333032 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1323,6 +1323,7 @@ int pci_power_up(struct pci_dev *dev)
                 return -EIO;
         }

+       pci_set_master(dev);
         pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
         if (PCI_POSSIBLE_ERROR(pmcsr)) {
                 pci_err(dev, "Unable to change power state from %s to D0, device inaccessible\n",

> 
>> The next thing I would wonder is whether you're compiling with
>> CONFIG_KEXEC_JUMP and whether that has an impact on your issue.  When this
>> is defined, kernel_kexec() runs a device suspend sequence, which invokes
>> various suspend-related callbacks.  Maybe the issue is actually in one of
>> those callbacks.
> 
> Yes, Ubuntu kernels set CONFIG_KEXEC_JUMP=y. I did a build with
> CONFIG_KEXEC_JUMP=n and it has the same symptoms.
> 
>> A possible way to determine this would be to run rtcwake to suspend and
>> resume and see if the drive survives.  If it doesn't, it's a hint that
>> there is something going on with power management in this drive or the
>> bridge it's connected to.  Maybe one of them isn't handling D3 very well.
> 
> Unfortunately, this c5.metal instance type doesn't support rtcwake with mode mem
> or disk, as hibernation is disabled on these instance types. But since
> CONFIG_KEXEC_JUMP=n doesn't help, I'm going to add some debug statements to
> pci_device_shutdown() to see what state the NVME device is in with and without
> "4d4c10f PCI: Explicitly put devices into D0 when initializing".
> 
> Thanks,
> Matthew

Thanks for the updates.

I have a relatively ignorant question.  Can you reproduce with kdump and 
a crash too?

I don't actually know: if you configure kdump and then crash the kernel
(say, with the magic SysRq key), does pci_device_shutdown() get called in
order to do the kexec?  Or, because the kernel is already in a crash state,
does it just jump to the crash kernel image location?


Thread overview: 15+ messages
2025-09-19  3:52 [PROBLEM] c5.metal on AWS fails to kexec after "PCI: Explicitly put devices into D0 when initializing" Matthew Ruffell
2025-09-19  5:02 ` Mario Limonciello
2025-12-04  5:04   ` Matthew Ruffell
2025-12-04  5:29     ` Mario Limonciello [this message]
2025-12-05  3:06       ` Matthew Ruffell
2025-12-05  3:10         ` Matthew Ruffell
2025-12-05  5:31           ` Mario Limonciello
2026-01-06  6:06             ` Mario Limonciello
2026-02-13  5:54               ` Matthew Ruffell
2026-02-13 19:26                 ` Bjorn Helgaas
2026-02-17 14:36                   ` Mario Limonciello
2026-02-23  6:04                     ` Mario Limonciello
2026-02-25  5:21                       ` Matthew Ruffell
2026-02-25  5:42                         ` Mario Limonciello
2025-12-05  5:28         ` Mario Limonciello
