From: Jason Gunthorpe <jgg@nvidia.com>
To: Brett Creeley <brett.creeley@amd.com>
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org,
alex.williamson@redhat.com, yishaih@nvidia.com,
shameerali.kolothum.thodi@huawei.com, kevin.tian@intel.com,
shannon.nelson@amd.com, drivers@pensando.io,
simon.horman@corigine.com
Subject: Re: [PATCH v8 vfio 6/7] vfio/pds: Add support for firmware recovery
Date: Fri, 14 Apr 2023 09:56:27 -0300
Message-ID: <ZDlNeyv/HLG4SPwB@nvidia.com>
In-Reply-To: <20230404190141.57762-7-brett.creeley@amd.com>

On Tue, Apr 04, 2023 at 12:01:40PM -0700, Brett Creeley wrote:
> It's possible that the device firmware crashes and is able to recover
> due to some configuration and/or other issue. If a live migration
> is in progress while the firmware crashes, the live migration will
> fail. However, the VF PCI device should still be functional post
> crash recovery and subsequent migrations should go through as
> expected.
>
> When the pds_core device notices that firmware crashes it sends an
> event to all its client drivers. When the pds_vfio driver receives
> this event while migration is in progress it will request a deferred
> reset on the next migration state transition. This state transition
> will report failure as well as any subsequent state transition
> requests from the VMM/VFIO. Based on uapi/vfio.h the only way out of
> VFIO_DEVICE_STATE_ERROR is by issuing VFIO_DEVICE_RESET. Once this
> reset is done, the migration state will be reset to
> VFIO_DEVICE_STATE_RUNNING and migration can be performed.
>
> If the event is received while no migration is in progress (i.e.
> the VM is in normal operating mode), then no actions are taken
> and the migration state remains VFIO_DEVICE_STATE_RUNNING.
>
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>
> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
> ---
> drivers/vfio/pci/pds/pci_drv.c | 110 +++++++++++++++++++++++++++++++-
> drivers/vfio/pci/pds/vfio_dev.c | 34 +++++++++-
> drivers/vfio/pci/pds/vfio_dev.h | 6 +-
> 3 files changed, 146 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/pds/pci_drv.c b/drivers/vfio/pci/pds/pci_drv.c
> index b0781d9f4246..b155ac9b98ae 100644
> --- a/drivers/vfio/pci/pds/pci_drv.c
> +++ b/drivers/vfio/pci/pds/pci_drv.c
> @@ -20,6 +20,104 @@
> #define PDS_VFIO_DRV_DESCRIPTION "AMD/Pensando VFIO Device Driver"
> #define PCI_VENDOR_ID_PENSANDO 0x1dd8
>
> +static void
> +pds_vfio_recovery_work(struct work_struct *work)
> +{
> + struct pds_vfio_pci_device *pds_vfio =
> + container_of(work, struct pds_vfio_pci_device, work);
> + bool deferred_reset_needed = false;
> +
> + /* Documentation states that the kernel migration driver must not
> + * generate asynchronous device state transitions outside of
> + * manipulation by the user or the VFIO_DEVICE_RESET ioctl.
> + *
> + * Since recovery is an asynchronous event received from the device,
> + * initiate a deferred reset. Only issue the deferred reset if a
> + * migration is in progress, which will cause the next step of the
> + * migration to fail. Also, if the device is in a state that will
> + * be set to VFIO_DEVICE_STATE_RUNNING on the next action (i.e. VM is
> + * shutdown and device is in VFIO_DEVICE_STATE_STOP) as that will clear
> + * the VFIO_DEVICE_STATE_ERROR when the VM starts back up.
> + */
> + mutex_lock(&pds_vfio->state_mutex);
> + if ((pds_vfio->state != VFIO_DEVICE_STATE_RUNNING &&
> + pds_vfio->state != VFIO_DEVICE_STATE_ERROR) ||
> + (pds_vfio->state == VFIO_DEVICE_STATE_RUNNING &&
> + pds_vfio_dirty_is_enabled(pds_vfio)))
> + deferred_reset_needed = true;
> + mutex_unlock(&pds_vfio->state_mutex);
> +
> + /* On the next user initiated state transition, the device will
> + * transition to the VFIO_DEVICE_STATE_ERROR. At this point it's the user's
> + * responsibility to reset the device.
> + *
> + * If a VFIO_DEVICE_RESET is requested post recovery and before the next
> + * state transition, then the deferred reset state will be set to
> + * VFIO_DEVICE_STATE_RUNNING.
> + */
> + if (deferred_reset_needed)
> + pds_vfio_deferred_reset(pds_vfio, VFIO_DEVICE_STATE_ERROR);
> +}

Why is this a work item? It is called from a blocking_notifier_chain,
so it can take the mutex directly, can't it?

Why is the locking like this? Can't you just call
pds_vfio_deferred_reset() under the mutex?
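
To make the question concrete, here is a rough user-space sketch of the
shape being asked about: evaluate the state and request the deferred
reset in a single critical section, with no work item in between. Every
name below (model_dev, model_recovery_event, and so on) is a
hypothetical stand-in, and a pthread mutex models state_mutex. Whether
the real pds_vfio_deferred_reset() can be called with state_mutex held
depends on its own locking, which is not shown in this message.

```c
/*
 * User-space model only. None of these types or functions exist in the
 * pds_vfio driver; they are hypothetical stand-ins for illustration.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state {
	MODEL_STATE_ERROR,
	MODEL_STATE_STOP,
	MODEL_STATE_RUNNING,
};

struct model_dev {
	pthread_mutex_t state_mutex;	/* stands in for pds_vfio->state_mutex */
	enum model_state state;		/* stands in for pds_vfio->state */
	bool dirty_enabled;		/* stands in for pds_vfio_dirty_is_enabled() */
	bool deferred_reset;		/* a reset has been requested */
	enum model_state deferred_reset_state;
};

/* Caller must hold state_mutex. Records the request; nothing more. */
static void model_deferred_reset_locked(struct model_dev *dev,
					enum model_state reset_state)
{
	dev->deferred_reset = true;
	dev->deferred_reset_state = reset_state;
}

/*
 * Models a recovery event handled directly in the notifier callback.
 * The condition is the one from the quoted patch. The check and the
 * reset request happen under one lock hold, so the state cannot change
 * between them. Returns true if a deferred reset was requested.
 */
static bool model_recovery_event(struct model_dev *dev)
{
	bool needed;

	assert(dev != NULL);

	pthread_mutex_lock(&dev->state_mutex);
	needed = (dev->state != MODEL_STATE_RUNNING &&
		  dev->state != MODEL_STATE_ERROR) ||
		 (dev->state == MODEL_STATE_RUNNING && dev->dirty_enabled);
	if (needed)
		model_deferred_reset_locked(dev, MODEL_STATE_ERROR);
	pthread_mutex_unlock(&dev->state_mutex);

	return needed;
}
```

In this model a device in MODEL_STATE_STOP gets a deferred reset to
MODEL_STATE_ERROR, while one in MODEL_STATE_RUNNING without dirty
tracking is left untouched, matching the behavior the commit message
describes.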
Jason