public inbox for linux-s390@vger.kernel.org
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: helgaas@kernel.org, lukas@wunner.de,
	Farhan Ali <alifm@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
Cc: alex@shazbot.org, clg@redhat.com, stable@vger.kernel.org,
	mjrosato@linux.ibm.com
Subject: Re: [PATCH v6 3/9] PCI: Avoid saving config space state if inaccessible
Date: Tue, 02 Dec 2025 13:20:15 +0100
Message-ID: <2940d7cd662aed9d8b60f7c8fec9ced44f059166.camel@linux.ibm.com>
In-Reply-To: <20251201220823.3350-4-alifm@linux.ibm.com>

On Mon, 2025-12-01 at 14:08 -0800, Farhan Ali wrote:
> The current reset process saves the device's config space state before
> reset and restores it afterward. However, errors may occur unexpectedly,
> and the device may become inaccessible or the config space itself may
> be corrupted. This results in saving corrupted values that get
> written back to the device during state restoration.
> 
> With a reset we want to recover/restore the device into a functional
> state. So avoid saving the state of the config space when the
> device config space is inaccessible/corrupted.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>

I think the commit message needs more focus. Specifically, I think the
main point is the case Lukas described in the cover letter of his
"PCI: Universal error recoverability of devices" series:

"However errors may occur unexpectedly and it may then be impossible
to save Config Space because the device may be inaccessible (e.g. DPC)
or Config Space may be corrupted. So it must be saved ahead of time."

That case is inevitable whenever a state save / reset happens while a
PCI device is in the error state on platforms such as s390 or POWER, or
with DPC, where Config Space is then inaccessible.

Moreover, I'd like to stress that this issue is independent of the
rest of your series. As your experiments have shown, it can be
triggered today when a vfio-pci user process blocks recovery, e.g. by
not handling the eventfd, and the user then tries to mitigate the
situation by performing a reset through sysfs. That reset saves the
0xff bytes read from the inaccessible config space, and restoring them
may subsequently kill the device.
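
To make the failure mode concrete, here is a minimal userspace sketch.
It is not kernel code: struct mock_dev, MOCK_ERROR_RESPONSE and all
mock_* helpers are hypothetical names invented for illustration, and a
single Command register stands in for the whole config space. It only
assumes what is described above, namely that reads from an inaccessible
function return all-ones and that restore writes the saved snapshot
back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_ERROR_RESPONSE 0xffffffffu	/* all-ones read from a dead function */

/* Hypothetical stand-in for a PCI function, not the kernel's struct pci_dev. */
struct mock_dev {
	bool accessible;	/* false while the function is in the error state */
	uint32_t command;	/* live register contents */
	uint32_t saved;		/* snapshot taken before reset */
	bool saved_valid;	/* true once a snapshot exists */
};

/* Config read: returns all-ones while the function is inaccessible. */
uint32_t mock_read_config(const struct mock_dev *dev)
{
	return dev->accessible ? dev->command : MOCK_ERROR_RESPONSE;
}

/* Unguarded save: snapshots whatever the read returns. */
void mock_save_unguarded(struct mock_dev *dev)
{
	dev->saved = mock_read_config(dev);
	dev->saved_valid = true;
}

/* Guarded save: refuses to snapshot an all-ones read. */
bool mock_save_guarded(struct mock_dev *dev)
{
	uint32_t val = mock_read_config(dev);

	if (val == MOCK_ERROR_RESPONSE)
		return false;
	dev->saved = val;
	dev->saved_valid = true;
	return true;
}

/* Restore after reset: writes the snapshot back if one exists. */
void mock_restore(struct mock_dev *dev)
{
	if (dev->saved_valid)
		dev->command = dev->saved;
}

/* Walks both paths for a function that is inaccessible at save time. */
void mock_demo(void)
{
	struct mock_dev bad = { .accessible = false, .command = 0x00100006u };
	struct mock_dev ok = bad;

	mock_save_unguarded(&bad);
	bad.accessible = true;		/* reset brought the function back */
	mock_restore(&bad);
	assert(bad.command == MOCK_ERROR_RESPONSE);	/* garbage written back */

	assert(!mock_save_guarded(&ok));	/* nothing was snapshotted */
	ok.accessible = true;
	mock_restore(&ok);
	assert(ok.command == 0x00100006u);	/* no stale snapshot restored */
}
```

Calling mock_demo() exercises both paths: the unguarded save ends up
writing all-ones into the register after the reset, while the guarded
save leaves nothing to restore.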

> ---
>  drivers/pci/pci.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 608d64900fee..28c6b9e7f526 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5105,6 +5105,7 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
>  
>  static void pci_dev_save_and_disable(struct pci_dev *dev)
>  {
> +	u32 val;
>  	const struct pci_error_handlers *err_handler =
>  			dev->driver ? dev->driver->err_handler : NULL;
>  
> @@ -5125,6 +5126,12 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>  	 */
>  	pci_set_power_state(dev, PCI_D0);
>  
> +	pci_read_config_dword(dev, PCI_COMMAND, &val);
> +	if (PCI_POSSIBLE_ERROR(val)) {
> +		pci_warn(dev, "Device config space inaccessible\n");
> +		return;
> +	}
> +

Can you explain your reasoning for not using pci_channel_offline()
here? This was suggested by Lukas in a previous iteration (link below)
and I would tend to prefer that as well.

https://lore.kernel.org/all/aOZoWDQV0TNh-NiM@wunner.de/
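
For illustration, the two kinds of check can be contrasted in a small
userspace sketch. Everything here is a mock: struct mock_fn, the
MOCK_IO_* values and the mock_* helpers are hypothetical, loosely
modelled on the kernel's error_state field and pci_channel_io_* states.
The one difference it shows, that a state-based check needs no config
access while a value-based check has to issue a read, is an assumption
about why one might be preferred, not a statement of either reviewer's
reasoning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical states, modelled on the kernel's pci_channel_io_* values. */
enum mock_channel_state {
	MOCK_IO_NORMAL = 1,
	MOCK_IO_FROZEN,
	MOCK_IO_PERM_FAILURE,
};

/* Hypothetical stand-in for a PCI function. */
struct mock_fn {
	enum mock_channel_state error_state;	/* set by error handling */
	uint32_t command;			/* live register contents */
	unsigned int config_reads;		/* counts config accesses */
};

/* State-based test: anything other than "normal" counts as offline. */
bool mock_channel_offline(const struct mock_fn *fn)
{
	return fn->error_state != MOCK_IO_NORMAL;
}

/* Config read: returns all-ones while the channel is offline. */
uint32_t mock_fn_read_command(struct mock_fn *fn)
{
	fn->config_reads++;
	return mock_channel_offline(fn) ? 0xffffffffu : fn->command;
}

/* Value-based decision: issue a read and look for all-ones. */
bool mock_may_save_by_value(struct mock_fn *fn)
{
	return mock_fn_read_command(fn) != 0xffffffffu;
}

/* State-based decision: consult the recorded state, no config access. */
bool mock_may_save_by_state(const struct mock_fn *fn)
{
	return !mock_channel_offline(fn);
}

/* Both checks agree for a frozen function; only one touches the device. */
void mock_compare(void)
{
	struct mock_fn fn = { .error_state = MOCK_IO_FROZEN, .command = 0x6u };

	assert(!mock_may_save_by_state(&fn));
	assert(fn.config_reads == 0);	/* no access was needed */

	assert(!mock_may_save_by_value(&fn));
	assert(fn.config_reads == 1);	/* a read had to be issued */
}
```

In this mock the two checks always agree because the read result is
derived from the state; on real hardware the recorded state and the
values actually read back need not change at the same moment.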

>  	pci_save_state(dev);
>  	/*
>  	 * Disable the device by clearing the Command register, except for
