Linux s390 Architecture development
From: Alex Williamson <alex@shazbot.org>
To: Farhan Ali <alifm@linux.ibm.com>
Cc: Keith Busch <kbusch@kernel.org>,
	linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, helgaas@kernel.org,
	schnelle@linux.ibm.com, mjrosato@linux.ibm.com,
	Julian Ruess <julianr@linux.ibm.com>,
	alex@shazbot.org, Chengwen Feng <fengchengwen@huawei.com>
Subject: Re: [PATCH v18 3/4] vfio/pci: Add a reset_done callback for vfio-pci driver
Date: Thu, 4 Jun 2026 13:57:17 -0600
Message-ID: <20260604135717.5dc69583@shazbot.org>
In-Reply-To: <d9ca420b-a1e6-4d12-bd42-3993e401d58b@linux.ibm.com>

On Thu, 4 Jun 2026 10:17:04 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 6/4/2026 1:28 AM, Keith Busch wrote:
> > On Wed, Jun 03, 2026 at 11:24:14AM -0700, Farhan Ali wrote:  
> >> +static void vfio_pci_core_aer_reset_done(struct pci_dev *pdev)
> >> +{
> >> +	struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev);
> >> +
> >> +	if (!vdev->pci_saved_state)
> >> +		return;
> >> +
> >> +	pci_load_saved_state(pdev, vdev->pci_saved_state);
> >> +	pci_restore_state(pdev);
> >> +}  
> > Shouldn't there be a corresponding user space notification that the
> > device has been restored? There's an eventfd on the error detected side
> > so user space can know the device needs recovery, but how does it come
> > to know that the reset is completed?  
> 
> I think if the VFIO_DEVICE_RESET ioctl completes successfully it should 
> be an indication that the reset has completed? AFAIU the ioctl will 
> drive a reset via pci_try_reset_function(). If the reset completes 
> successfully the reset_done() callback is called via pci_dev_restore(). 
> So I don't think we need an eventfd to notify on reset completion. 
> Otherwise we would have the same problem today, where userspace is 
> unaware that VFIO_DEVICE_RESET did indeed successfully reset the device, 
> no? Or am I missing something?

I'm starting to feel a little sketchy about this.  I asked Claude to
enumerate the state restores and the source of that restored state.
Hopefully this ascii table survives:

  ┌──────────────────────────┬────────────────────────┬─────────────────────┐
  │           Step           │         Source         │ Snapshot-dependent? │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │                          │ EXP cap save buffer    │                     │
  │ pci_restore_pcie_state   │ (pci_find_saved_cap,   │ YES                 │
  │                          │ cap.data)              │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │                          │ live                   │                     │
  │ pci_restore_pasid_state  │ pdev->pasid_enabled +  │ no                  │
  │                          │ pasid_features         │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_pri_state    │ live pdev->pri_enabled │ no                  │
  │                          │  + pri_reqs_alloc      │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_ats_state    │ live dev->ats_enabled  │ no                  │
  │                          │ + ats_stu              │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_vc_state     │ VC ext-cap save buffer │ YES                 │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │                          │ live resource_size()   │                     │
  │ pci_restore_rebar_state  │ (re-derived, written   │ no                  │
  │                          │ to hw)                 │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_dpc_state    │ DPC ext-cap save       │ YES                 │
  │                          │ buffer                 │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_ptm_state    │ PTM ext-cap save       │ YES                 │
  │                          │ buffer                 │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │                          │ TPH ext-cap save       │                     │
  │ pci_restore_tph_state    │ buffer, gated on live  │ YES (gated)         │
  │                          │ tph_enabled            │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_aer_clear_status     │ clears hw status (not  │ n/a                 │
  │                          │ a restore)             │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_aer_state    │ ERR ext-cap save       │ YES                 │
  │                          │ buffer                 │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │                          │ saved_config_space[16] │                     │
  │ pci_restore_config_space │  — type-0 header       │ YES                 │
  │                          │ (COMMAND, BARs,        │                     │
  │                          │ cacheline…)            │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_pcix_state   │ PCI-X cap save buffer  │ YES                 │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_msi_state    │ live msi_desc list +   │ no                  │
  │                          │ msi(x)_enabled         │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_enable_acs           │ re-derived from ACS    │ no                  │
  │                          │ policy                 │                     │
  ├──────────────────────────┼────────────────────────┼─────────────────────┤
  │ pci_restore_iov_state    │ live dev->sriov        │ no                  │
  │                          │ (num_VFs, ctrl)        │                     │
  └──────────────────────────┴────────────────────────┴─────────────────────┘

For things like MSI/MSI-X, SR-IOV, Resizable BAR, etc. we're actually
restoring from live kernel state rather than from the save buffer, so
reloading an older snapshot is a no-op for them.  However, one entry in
that list stands out: TPH.

We don't yet support enabling TPH, but there are series on the list
that propose adding it.  The TPH save buffer is allocated whenever the
capability is present.  At open, TPH is disabled, so its saved state is
left untouched: all zeros.  If the user then enables TPH and the device
is reset, the pre-reset save populates the TPH save buffer and that
state is restored post-reset.  With the change here, reset_done would
then load and push the open-time saved state.  Because the live TPH
state is still enabled, the gated restore goes ahead and writes the
original open-time state, zeros, to the device.
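
The TPH sequence above can be traced with a toy model. Following the description here, it assumes that saving and restoring the TPH buffer are both gated on the live tph_enabled flag. The toy_* names and the 0x5 control value are made up for illustration.

```c
/*
 * Toy model, not kernel code: traces the TPH scenario. Assumes that
 * saving and restoring the TPH buffer are both gated on the live
 * tph_enabled flag. All toy_* names are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_tph_dev {
	bool tph_enabled;	/* live kernel state; gates save and restore */
	uint32_t save_buf;	/* TPH save buffer, allocated if the cap exists */
	uint32_t hw_ctrl;	/* simulated TPH control register */
};

static void toy_save_state(struct toy_tph_dev *d)
{
	if (d->tph_enabled)
		d->save_buf = d->hw_ctrl;
}

static void toy_restore_state(struct toy_tph_dev *d)
{
	if (d->tph_enabled)
		d->hw_ctrl = d->save_buf;
}

/* Returns the TPH control value left in the device after a reset. */
uint32_t toy_tph_sequence(bool reload_open_state_in_reset_done)
{
	struct toy_tph_dev d = { .tph_enabled = false, .save_buf = 0, .hw_ctrl = 0 };
	uint32_t open_state;

	/* open: TPH disabled, so the save leaves the buffer at zero */
	toy_save_state(&d);
	open_state = d.save_buf;

	/* user enables TPH with an arbitrary example value */
	d.tph_enabled = true;
	d.hw_ctrl = 0x5;

	/* reset: pre-reset save, simulated reset clears hw, post-reset restore */
	toy_save_state(&d);
	d.hw_ctrl = 0;
	toy_restore_state(&d);
	assert(d.hw_ctrl == 0x5);

	/* reset_done as proposed: reload the open-time state and restore again */
	if (reload_open_state_in_reset_done) {
		d.save_buf = open_state;
		toy_restore_state(&d);
	}
	return d.hw_ctrl;
}
```

Without the reload, the device ends up with the user's TPH setting (0x5 in this model). With the reset_done reload, it ends up with the open-time zeros even though tph_enabled is still true.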

This would result in a user-visible change and, perhaps more importantly,
shows that we're relying on ad-hoc behavior without any specific policy
that makes this work reliably.  It actually seems that only in the close
path, where we've disabled anything the user might have enabled, is it
really valid to restore the original state.  Thanks,

Alex

Thread overview: 18+ messages
2026-06-03 18:24 [PATCH v18 0/4] [VFIO] Error recovery for vfio-pci devices on s390x Farhan Ali
2026-06-03 18:24 ` [PATCH v18 1/4] s390/pci: Store PCI error information for passthrough devices Farhan Ali
2026-06-03 22:20   ` Alex Williamson
2026-06-03 23:35     ` Farhan Ali
2026-06-04 18:27       ` Alex Williamson
2026-06-03 18:24 ` [PATCH v18 2/4] vfio-pci/zdev: Add a device feature for error information Farhan Ali
2026-06-03 22:37   ` Alex Williamson
2026-06-03 23:40     ` Farhan Ali
2026-06-03 18:24 ` [PATCH v18 3/4] vfio/pci: Add a reset_done callback for vfio-pci driver Farhan Ali
2026-06-03 22:46   ` Alex Williamson
2026-06-04  0:01     ` Farhan Ali
2026-06-04  8:28   ` Keith Busch
2026-06-04 17:17     ` Farhan Ali
2026-06-04 19:57       ` Alex Williamson [this message]
2026-06-08 19:26         ` Farhan Ali
2026-06-04 20:42       ` Keith Busch
2026-06-05 18:41         ` Farhan Ali
2026-06-03 18:24 ` [PATCH v18 4/4] vfio/pci: Remove the pcie check for VFIO_PCI_ERR_IRQ_INDEX Farhan Ali
