public inbox for linux-s390@vger.kernel.org
 help / color / mirror / Atom feed
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Farhan Ali <alifm@linux.ibm.com>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org
Cc: helgaas@kernel.org, lukas@wunner.de, alex@shazbot.org,
	clg@redhat.com, mjrosato@linux.ibm.com
Subject: Re: [PATCH v14 4/7] s390/pci: Store PCI error information for passthrough devices
Date: Thu, 30 Apr 2026 09:48:03 +0200	[thread overview]
Message-ID: <c6363f08f898c7f4bb3e291ab2d3f7d8e3280459.camel@linux.ibm.com> (raw)
In-Reply-To: <3f0d25e2-fe8f-470e-a857-f5474cba8cdf@linux.ibm.com>

On Wed, 2026-04-29 at 09:48 -0700, Farhan Ali wrote:
> On 4/29/2026 4:41 AM, Niklas Schnelle wrote:
> > On Tue, 2026-04-21 at 09:30 -0700, Farhan Ali wrote:
> > > For a passthrough device we need co-operation from user space to recover
> > > the device. This would require to bubble up any error information to user
> > > space.  Let's store this error information for passthrough devices, so it
> > > can be retrieved later.
> > > 
> > > We can now have userspace drivers (vfio-pci based) on s390x. The userspace
> > > drivers will not have any KVM fd and so no kzdev associated with them. So
> > > we need to update the logic for detecting passthrough devices to not depend
> > > on struct kvm_zdev.
> > > 
> > > Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > ---
> > >   arch/s390/include/asm/pci.h      |  30 ++++++++
> > >   arch/s390/pci/pci.c              |   1 +
> > >   arch/s390/pci/pci_event.c        | 117 +++++++++++++++++--------------
> > >   drivers/vfio/pci/vfio_pci_zdev.c |   9 ++-
> > >   4 files changed, 105 insertions(+), 52 deletions(-)
> > > 
> > --- snip ---
> > > +
> > > +void zpci_start_mediated_recovery(struct zpci_dev *zdev)
> > > +{
> > > +	guard(mutex)(&zdev->pending_errs_lock);
> > > +	zdev->pending_errs.mediated_recovery = true;
> > > +}
> > > +EXPORT_SYMBOL_GPL(zpci_start_mediated_recovery);
> > > +
> > > +void zpci_stop_mediated_recovery(struct zpci_dev *zdev)
> > > +{
> > > +	struct pci_dev *pdev = NULL;
> > > +
> > > +	guard(mutex)(&zdev->pending_errs_lock);
> > > +	zdev->pending_errs.mediated_recovery = false;
> > > +	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> > > +	if (zdev->pending_errs.count)
> > > +		pr_info("%s: Unhandled PCI error events count=%d",
> > > +			pci_name(pdev), zdev->pending_errs.count);
> > Sashiko notes a possible ABBA locking issue here between
> > pending_errs_lock and the pci_bus_sem inside pci_get_slot(). I wonder
> > if that would also be visible with lockdep? Also Sashiko notes that
> > zdev->zbus->bus could be NULL I don't think this is possible at the
> > current callsites via vfio-pci though. Similarly I don't think
> > pci_get_slot() can really be NULL at the current call sites. This makes
> > me wonder however, would it maybe be cleaner to pass a struct pci_dev
> > here so you don't need the pci_get_slot() at all? Both
> > vfio_pci_zdev_open_device() and vfio_pci_zdev_close_device() have that
> > readily available via vdev->pdev.
> 
> The pdev here was meant for helpful error message. On second thought 
> maybe removing the pdev usage, and using the fid would be better?

Either that or maybe just return the number of left over errors and do
the print in the caller?

> 
> > 
> > 
> > > +	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> > > +	pci_dev_put(pdev);
> > > +}
> > > +EXPORT_SYMBOL_GPL(zpci_stop_mediated_recovery);
> > > +
> > >   static pci_ers_result_t zpci_event_notify_error_detected(struct pci_dev *pdev,
> > >   							 struct pci_driver *driver)
> > >   {
> > > @@ -175,7 +190,8 @@ static pci_ers_result_t zpci_event_do_reset(struct pci_dev *pdev,
> > >    * and the platform determines which functions are affected for
> > >    * multi-function devices.
> > >    */
> > > -static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
> > > +static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev,
> > > +							  struct zpci_ccdf_err *ccdf)
> > >   {
> > > 
--- snip ---
> > >   
> > > +	scoped_guard(mutex, &zdev->pending_errs_lock) {
> > > +		if (zdev->pending_errs.mediated_recovery) {
> > > +			pr_info("%s: Leaving recovery of pass-through device to user-space\n",
> > > +				pci_name(pdev));
> > > +			ers_res = PCI_ERS_RESULT_RECOVERED;
> > > +			status_str = "in progress";
> > > +			goto out_unlock;
> > > +		}
> > > +	}
> > > +
> > Sashiko notes that mixing goto unlock and lock guards within one
> > function is discouraged. Here it's not that hard to read since it is
> > just a scoped guard but I think we should still not mix it.
> > 
> > However if we also convert the device_lock() to a guard lock here the
> > goto would still make sense to go to the zpci_report_status() and that
> > is still a bit odd with guarded locks. So I think an alternative simple
> > fix, that makes this overall cleaner too, is to put the whole
> > scoped_guard() block above into a helper function then you can do the
> > goto out_unlock on that helper returning PCI_ERS_RESULT_RECVOERED in
> > line with e.g. zpci_event_notify_error_detected(). That way you don't
> > need to touch existing locks and you get to keep the guard locking.
> 
> How about changing it to mutex_lock/mutex_unlock? I think the block is 
> small enough that it shouldn't be too confusing. Moving to a function 
> opens up the possibility of using a stale value for mediated_recovery.

I'm probably missing something but I don't see how moving it to a
helper function changes whether the mediated_recovery can be stale?
You'd still use the same pending_errs_lock and passed in struct
zpci_dev * then you just return PCI_ERS_RESULT_RECOVERED if mediated
recovery is in effect? And you can do the pr_info() at the caller so
you don't need to pass in the pdev too.

> > 

  reply	other threads:[~2026-04-30  7:49 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-21 16:30 [PATCH v14 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-04-21 16:30 ` [PATCH v14 1/7] PCI: Allow per function PCI slots to fix slot reset on s390 Farhan Ali
2026-05-04 15:52   ` Gerd Bayer
2026-05-04 17:00     ` Farhan Ali
2026-04-21 16:30 ` [PATCH v14 2/7] PCI: Avoid saving config space state if inaccessible Farhan Ali
2026-04-21 16:30 ` [PATCH v14 3/7] PCI: Fail FLR when config space is inaccessible Farhan Ali
2026-04-21 16:30 ` [PATCH v14 4/7] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-04-29 11:41   ` Niklas Schnelle
2026-04-29 16:48     ` Farhan Ali
2026-04-30  7:48       ` Niklas Schnelle [this message]
2026-04-30 16:44         ` Farhan Ali
2026-04-21 16:30 ` [PATCH v14 5/7] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-04-29  9:40   ` Niklas Schnelle
2026-04-29 16:59     ` Farhan Ali
2026-04-30  8:35       ` Niklas Schnelle
2026-04-21 16:30 ` [PATCH v14 6/7] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-04-21 16:30 ` [PATCH v14 7/7] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
2026-04-28 18:30 ` [PATCH v14 0/7] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-04-28 22:01 ` Bjorn Helgaas
2026-04-29 17:02   ` Farhan Ali

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c6363f08f898c7f4bb3e291ab2d3f7d8e3280459.camel@linux.ibm.com \
    --to=schnelle@linux.ibm.com \
    --cc=alex@shazbot.org \
    --cc=alifm@linux.ibm.com \
    --cc=clg@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mjrosato@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox