public inbox for kvm@vger.kernel.org
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Farhan Ali <alifm@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Cc: alex.williamson@redhat.com, helgaas@kernel.org, mjrosato@linux.ibm.com
Subject: Re: [PATCH v3 07/10] s390/pci: Store PCI error information for passthrough devices
Date: Tue, 16 Sep 2025 12:54:30 +0200
Message-ID: <6703760a502d146909482f3aeb4333bf33cb431b.camel@linux.ibm.com>
In-Reply-To: <98a3bc6f-9b75-48cd-b09f-343831f5dcbf@linux.ibm.com>

On Mon, 2025-09-15 at 11:12 -0700, Farhan Ali wrote:
> On 9/15/2025 4:42 AM, Niklas Schnelle wrote:
> > On Thu, 2025-09-11 at 11:33 -0700, Farhan Ali wrote:
> > > For a passthrough device we need cooperation from user space to recover
> > > the device. This requires bubbling up any error information to user
> > > space. Let's store this error information for passthrough devices, so it
> > > can be retrieved later.
> > > 
> > > Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> > > ---
> > > 
--- snip ---
> > > +	mutex_unlock(&zdev->pending_errs_lock);
> > > +}
> > > +
> > > +void zpci_cleanup_pending_errors(struct zpci_dev *zdev)
> > > +{
> > > +	struct pci_dev *pdev = NULL;
> > > +
> > > +	mutex_lock(&zdev->pending_errs_lock);
> > > +	pdev = pci_get_slot(zdev->zbus->bus, zdev->devfn);
> > > +	if (zdev->pending_errs.count)
> > > +		pr_err("%s: Unhandled PCI error events count=%zu",
> > > +				pci_name(pdev), zdev->pending_errs.count);
> > I think this could be a zpci_dbg(). That way you also don't need the
> > pci_get_slot(), which is buggy anyway as it misses a matching
> > pci_dev_put(). The message also doesn't seem useful for the user. As I
> > understand it, this would happen if a vfio-pci user dies without
> > handling all the error events, but then vfio-pci will also reset the
> > slot on closing of the fds, no? So the device will get reset anyway.
> 
> Right, the device will reset anyway. But I wanted to at least give an
> indication to the user that some events were not handled correctly.
> Maybe pr_err() is a little extreme, so I can convert it to a pr_warn()?
> This should be rare, as well-behaved applications shouldn't do this. I am
> fine with zpci_dbg() as well; it's just that the kernel needs to be in
> debug mode for us to get this info.

No, zpci_dbg() logs to /sys/kernel/debug/s390dbf/pci_msg/sprintf
without needing debug mode. I'm also ok with a pr_warn() or maybe even
pr_info(). I can see your argument that this may be useful to have in
dmesg, e.g. when debugging a user-space driver without having to know
about s390-specific debug aids.
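
If the pr_warn()/pr_info() route is kept and the message should still
carry pci_name(), the lookup has to be balanced: pci_get_slot() returns a
referenced pci_dev that needs a pci_dev_put() on every path, and it may
return NULL, which pci_name() does not handle. A rough userspace model of
that pattern (the model_* types and helpers are hypothetical stand-ins,
not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct pci_dev: a name and a reference count. */
struct model_pci_dev {
	const char *name;
	int refcount;
};

/* Models pci_get_slot(): takes a reference if a device exists, may return NULL. */
static struct model_pci_dev *model_get_slot(struct model_pci_dev *slot)
{
	if (slot)
		slot->refcount++;
	return slot;
}

/* Models pci_dev_put(): NULL-safe, drops the reference taken above. */
static void model_dev_put(struct model_pci_dev *pdev)
{
	if (!pdev)
		return;
	assert(pdev->refcount > 0);
	pdev->refcount--;
}

/*
 * Report unhandled events into buf. Returns the snprintf() result, or 0 if
 * there was nothing to report. The reference is dropped on every path.
 */
static int model_report_unhandled(struct model_pci_dev *slot, size_t count,
				  char *buf, size_t len)
{
	struct model_pci_dev *pdev = model_get_slot(slot);
	int n = 0;

	if (count)
		n = snprintf(buf, len, "%s: Unhandled PCI error events count=%zu\n",
			     pdev ? pdev->name : "(no device)", count);
	model_dev_put(pdev);
	return n;
}
```

With zpci_dbg() the lookup and the reference handling go away entirely,
which is the simpler option.
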

> 
> > 
> > > +	memset(&zdev->pending_errs, 0, sizeof(struct zpci_ccdf_pending));
> > If this goes wrong and we subsequently crash or take a live memory dump,
> > I'd prefer to have bread crumbs such as the errors that weren't cleaned
> > up. Wouldn't it be enough to just set the count to zero? For debug, the
> > original count will be in s390dbf.
> 
> I think setting the count to zero should be enough, but I am wary about
> keeping stale state around. How about just logging the number of
> unhandled events in s390dbf? I think we already dump the ccdf in s390dbf
> if we get any error event, so that should be enough for us to trace back
> the unhandled error events?
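
For reference, a rough userspace model of the count-only reset. Everything
here is hypothetical: the model_* names, the flat array instead of whatever
layout struct zpci_ccdf_pending really uses, and the dbf string standing in
for an s390dbf entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_PENDING 4	/* hypothetical queue capacity */

/* Hypothetical, trimmed-down error record; the real CCDF has more fields. */
struct model_ccdf {
	unsigned int pec;
};

struct model_pending {
	size_t count;
	struct model_ccdf err[MODEL_MAX_PENDING];
};

struct model_zdev {
	pthread_mutex_t lock;		/* models zdev->pending_errs_lock */
	struct model_pending pending;	/* models zdev->pending_errs */
	char dbf[128];			/* stand-in for an s390dbf entry */
};

static void model_zdev_init(struct model_zdev *z)
{
	memset(z, 0, sizeof(*z));
	pthread_mutex_init(&z->lock, NULL);
}

/* Queue an error record; returns -1 when the (hypothetical) queue is full. */
static int model_store_error(struct model_zdev *z, unsigned int pec)
{
	int ret = -1;

	pthread_mutex_lock(&z->lock);
	if (z->pending.count < MODEL_MAX_PENDING) {
		z->pending.err[z->pending.count++].pec = pec;
		ret = 0;
	}
	pthread_mutex_unlock(&z->lock);
	return ret;
}

/* Log the unhandled count, then reset only the count; records stay in memory. */
static void model_cleanup_pending(struct model_zdev *z)
{
	pthread_mutex_lock(&z->lock);
	if (z->pending.count)
		snprintf(z->dbf, sizeof(z->dbf), "unhandled error events count=%zu",
			 z->pending.count);
	z->pending.count = 0;
	pthread_mutex_unlock(&z->lock);
}
```

In this model readers only look at entries below count, so the leftover
records are never handed out again; they only matter to someone looking
at a dump, and the next stored error simply overwrites them.
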
> 
> > Also maybe it would make sense
> > to pull the zdev->mediated_recovery clearing in here?
> 
> I would like to keep the mediated_recovery flag separate from just
> cleaning up the errors. The flag gets initialized when we open the vfio
> device, so clearing it on close makes this easier to track, IMHO.

Ok, yeah, I can see the symmetry argument.
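
A rough model of that open/close symmetry. Only mediated_recovery comes
from the patch; the open/close hook names are hypothetical and the error
state is reduced to a bare count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_zdev {
	bool mediated_recovery;	/* name taken from the patch */
	size_t pending_count;	/* models zdev->pending_errs.count */
};

/* Models zpci_cleanup_pending_errors(): only touches the error state. */
static void model_cleanup_pending_errors(struct model_zdev *z)
{
	z->pending_count = 0;
}

/* Hypothetical open hook: enables user-space (mediated) recovery. */
static void model_open_device(struct model_zdev *z)
{
	z->mediated_recovery = true;
}

/* Hypothetical close hook: mirrors open, then drops leftover errors. */
static void model_close_device(struct model_zdev *z)
{
	z->mediated_recovery = false;
	model_cleanup_pending_errors(z);
}
```

The flag is set and cleared by the paired hooks, while the cleanup helper
stays limited to the error bookkeeping.
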
> 
