From: Yishai Hadas <yishaih@nvidia.com>
To: <alex.williamson@redhat.com>, <jgg@nvidia.com>
Cc: <kvm@vger.kernel.org>, <kevin.tian@intel.com>,
<joao.m.martins@oracle.com>, <leonro@nvidia.com>,
<shayd@nvidia.com>, <maorg@nvidia.com>, <avihaih@nvidia.com>,
<cohuck@redhat.com>
Subject: Re: [PATCH V1 vfio 00/14] Add migration PRE_COPY support for mlx5 driver
Date: Wed, 30 Nov 2022 12:33:04 +0200
Message-ID: <a3db348a-246f-dbff-7667-e5837ef8be47@nvidia.com>
In-Reply-To: <20221124173932.194654-1-yishaih@nvidia.com>

On 24/11/2022 19:39, Yishai Hadas wrote:
> This series adds migration PRE_COPY uAPIs and their implementation as
> part of mlx5 driver.
>
> The uAPIs follow some discussion that was done in the mailing list [1]
> in this area.
>
> When those patches were sent, there was no driver implementation
> for the uAPIs; now we have one for the mlx5 driver.
>
> The optional PRE_COPY state opens the saving data transfer FD before
> reaching STOP_COPY and allows the device to dirty track the internal
> state changes with the general idea to reduce the volume of data
> transferred in the STOP_COPY stage.
>
> While in PRE_COPY the device remains RUNNING, but the saving FD is open.
>
> A new ioctl VFIO_MIG_GET_PRECOPY_INFO is provided to allow userspace to
> query the progress of the precopy operation in the driver, so that it
> can decide when to move to STOP_COPY: at the earliest once the initial
> data set has been transferred, and possibly only after the dirty size
> has shrunk appropriately.
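For illustration only, the decision described above can be sketched as a small userspace policy function. This is not code from the series: the struct, the function name and the threshold parameter are hypothetical, and the field names initial_bytes and dirty_bytes are an assumption modeled on what the uAPI patch is expected to report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of the two progress counters; not the uAPI struct. */
struct precopy_progress {
    uint64_t initial_bytes; /* assumed: initial data set bytes not yet read */
    uint64_t dirty_bytes;   /* assumed: estimate of state dirtied since then */
};

/*
 * Hypothetical policy: allow the move to STOP_COPY only once the initial
 * data set is fully transferred and the dirty estimate is at or below a
 * caller-chosen threshold.
 */
bool ready_for_stop_copy(const struct precopy_progress *p,
                         uint64_t dirty_threshold)
{
    if (p->initial_bytes != 0)
        return false; /* initial data set still pending */
    return p->dirty_bytes <= dirty_threshold;
}
```

A real user would fill the counters from the ioctl on the saving FD and pick the threshold from its downtime budget; both are left out of this sketch.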
>
> User space can detect whether PRE_COPY is supported for a given device
> by checking the VFIO_MIGRATION_PRE_COPY flag once using the
> VFIO_DEVICE_FEATURE_MIGRATION ioctl.
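As a rough sketch of that check, assuming userspace has already fetched the migration flags word through the VFIO_DEVICE_FEATURE_MIGRATION ioctl: the EX_ constants and the function name below are hypothetical, and the bit positions are assumptions for illustration; the real values must come from <linux/vfio.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, for illustration only; use the ones in <linux/vfio.h>. */
#define EX_MIGRATION_STOP_COPY (UINT64_C(1) << 0)
#define EX_MIGRATION_PRE_COPY  (UINT64_C(1) << 2)

/* Returns true when the (already queried) flags word advertises PRE_COPY. */
bool device_supports_pre_copy(uint64_t mig_flags)
{
    return (mig_flags & EX_MIGRATION_PRE_COPY) != 0;
}
```

The result can be cached per device, since the flag only needs to be checked once.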
>
> Extra details exist as part of the specific uAPI patch from the series.
>
> Finally, we come with mlx5 implementation based on its device
> specification for PRE_COPY.
>
> To support PRE_COPY, the mlx5 driver transfers multiple states
> (images) of the device, e.g. the source VF can save and transfer
> multiple states, and the target VF will load them in that order.
>
> The device saves three kinds of states:
> 1) Initial state - when the device moves to PRE_COPY state.
> 2) Middle state - during PRE_COPY phase via VFIO_MIG_GET_PRECOPY_INFO,
> can be multiple such states.
> 3) Final state - when the device moves to STOP_COPY state.
>
> After moving to PRE_COPY state, the user holds the saving FD and
> should use it for transferring the data from the source to the target
> while the VM is still running. From the user's point of view, it is a
> stream of data; from the mlx5 driver's point of view, however, it
> includes multiple images/states. For that, the driver sets some headers
> with metadata on the source, to be parsed on the target.
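To make the "one stream, multiple images" idea concrete, here is a minimal sketch of how a reader could walk such a stream. The header layout, field names and function name are hypothetical and chosen only for illustration; the actual mlx5 header is defined in the SW headers patch of the series and may differ. Host byte order is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; not the layout used by the mlx5 driver. */
struct ex_image_header {
    uint64_t image_size; /* payload bytes that follow this header */
    uint32_t flags;      /* unused in this sketch */
    uint32_t reserved;
};

/*
 * Walk a buffer of back-to-back [header | payload] records.
 * Returns the number of complete images, or -1 if the stream is truncated.
 */
int ex_count_images(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct ex_image_header hdr;

        if (len - off < sizeof(hdr))
            return -1; /* partial header */
        memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
        off += sizeof(hdr);

        if (hdr.image_size > len - off)
            return -1; /* partial payload */
        off += (size_t)hdr.image_size;
        count++;
    }
    return count;
}
```

The point of the sketch is only that the user-visible byte stream stays opaque, while the receiving side can recover image boundaries from the per-image metadata.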
>
> At some point, the user may switch the device state from PRE_COPY to
> STOP_COPY; this triggers saving of the final state.
>
> As discussed earlier in the mailing list, the data that is returned as
> part of PRE_COPY is not required to have any bearing relative to the
> data size available during the STOP_COPY phase.
>
> For this, we have the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE option.
>
> With this series, the mlx5 driver gains about a 20-30 percent
> improvement in downtime compared to the previous code, where PRE_COPY
> wasn't supported.
>
> The series starts with some preparatory patches for managing multiple
> images, followed by the PRE_COPY implementation itself.
>
> The matching qemu changes can be previewed here [2].
>
> They come on top of the v2 migration protocol patches that were sent
> already to the mailing list.
>
> Note:
> As this series includes a net/mlx5 patch, we may need to send it to
> VFIO as a pull request to avoid conflicts before acceptance.
>
> [1] https://lore.kernel.org/kvm/20220302172903.1995-8-shameerali.kolothum.thodi@huawei.com/
> [2] https://github.com/avihai1122/qemu/commits/mig_v2_precopy
>
> Changes from V0: https://www.spinics.net/lists/kvm/msg294247.html
>
> Drop the first 2 patches, which Alex has already merged.
> Refactor the mlx5 implementation based on Jason's comments on V0,
> including:
> * Refactor the PD usage to be aligned with the migration file life cycle.
> * Refactor the MKEY usage to be aligned with the migration file life cycle.
> * Refactor the migration file state.
> * Use queue based data chunks to simplify the driver code.
> * Use the FSM model on the target to simplify the driver code.
> * Extend the driver pre_copy header for future use.
>
> Yishai
>
>
> Jason Gunthorpe (1):
>   vfio: Extend the device migration protocol with PRE_COPY
>
> Shay Drory (3):
>   net/mlx5: Introduce ifc bits for pre_copy
>   vfio/mlx5: Fallback to STOP_COPY upon specific PRE_COPY error
>   vfio/mlx5: Enable MIGRATION_PRE_COPY flag
>
> Yishai Hadas (10):
>   vfio/mlx5: Enforce a single SAVE command at a time
>   vfio/mlx5: Refactor PD usage
>   vfio/mlx5: Refactor MKEY usage
>   vfio/mlx5: Refactor migration file state
>   vfio/mlx5: Refactor to use queue based data chunks
>   vfio/mlx5: Introduce device transitions of PRE_COPY
>   vfio/mlx5: Introduce SW headers for migration states
>   vfio/mlx5: Introduce vfio precopy ioctl implementation
>   vfio/mlx5: Consider temporary end of stream as part of PRE_COPY
>   vfio/mlx5: Introduce multiple loads
>
>  drivers/vfio/pci/mlx5/cmd.c   | 408 ++++++++++++++----
>  drivers/vfio/pci/mlx5/cmd.h   |  93 ++++-
>  drivers/vfio/pci/mlx5/main.c  | 750 ++++++++++++++++++++++++++++------
>  drivers/vfio/vfio_main.c      |  74 +++-
>  include/linux/mlx5/mlx5_ifc.h |  14 +-
>  include/uapi/linux/vfio.h     | 122 +++++-
>  6 files changed, 1241 insertions(+), 220 deletions(-)
>

Hi Alex,

Any comments on V1? I would like to send V2 very soon with a small fix
that was found in patch #9 (see my note on the list).

As no comments have been posted on the UAPI patch so far and the rest of
the code is mlx5 driver specific, I believe that V2 can be a merge
candidate.

Let's try to complete the review soon so that we can send a PR for the
first net/mlx5 patch; the others should go later on via the vfio tree.

That way, kernel 6.2 can be feature complete for migration V2 in the
UAPI area, covering what existed in V1, and we can focus on completing
the QEMU side.

Thanks,
Yishai