* [PATCH vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-27 20:42 ` Alex Williamson
2026-02-24 8:20 ` [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 Yishai Hadas
` (5 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
As currently defined, initial_bytes is monotonically decreasing and
precedes dirty_bytes when reading from the saving file descriptor.
The transition from initial_bytes to dirty_bytes is unidirectional and
irreversible.

The initial_bytes are considered critical data that is highly
recommended to be transferred to the target as part of PRE_COPY. Without
this data, the PRE_COPY phase would be ineffective.

This patch addresses the case where a new chunk of critical data is
introduced during the PRE_COPY phase and the driver would like to report
an entirely new value for initial_bytes.

To that end, extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
flag named VFIO_PRECOPY_INFO_REINIT, which allows drivers to report a
new initial_bytes value during the PRE_COPY phase.

Existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign
info.flags before copy_to_user(). This effectively echoes
userspace-provided flags back as output, preventing the field from being
used to report new reliable data from the drivers.

Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag therefore requires
userspace to explicitly opt in by enabling the
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature.

When the caller opts in, the driver may report an entirely new
value for initial_bytes. It may be larger or smaller, and it may
include or discard the previous unread initial_bytes, depending on the
driver logic and state.
The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the
driver indicates that new initial data is present on the stream.

Once the caller sees this flag, readiness for the transition to
STOP_COPY should be re-evaluated against the new initial_bytes value.
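
As an illustration of the userspace side described above, here is a
minimal sketch of how a migration tool might fold each
VFIO_MIG_GET_PRECOPY_INFO result into its STOP_COPY readiness decision.
It is not part of the patch. The struct is a local mirror of
struct vfio_precopy_info, and precopy_tracker / precopy_update() are
hypothetical names introduced only for this example; the flag value is
the one defined in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct vfio_precopy_info (illustration only). */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Output flag as defined in this patch: bit 0 of flags. */
#define PRECOPY_INFO_REINIT (1u << 0)

/* Hypothetical bookkeeping kept by the migration tool per device. */
struct precopy_tracker {
	bool initial_done;    /* initial data fully read since the last REINIT */
	unsigned int reinits; /* number of REINIT reports seen so far */
};

/*
 * Fold one precopy info result into the tracker. Returns true when the
 * (possibly re-initialized) initial data has been fully read, which is
 * the recommended point for leaving PRE_COPY.
 */
static bool precopy_update(struct precopy_tracker *t,
			   const struct precopy_info *info)
{
	assert(t != NULL && info != NULL);

	if (info->flags & PRECOPY_INFO_REINIT) {
		/* New initial data: earlier readiness no longer holds. */
		t->initial_done = false;
		t->reinits++;
	}
	if (info->initial_bytes == 0)
		t->initial_done = true;

	return t->initial_done;
}
```

The tracker only trusts the flag when the v2 opt-in succeeded; without
the opt-in the flags field should be ignored entirely.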
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
include/uapi/linux/vfio.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index bb7b89330d35..b6efda07000f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1266,6 +1266,17 @@ enum vfio_device_mig_state {
* The initial_bytes field indicates the amount of initial precopy
* data available from the device. This field should have a non-zero initial
* value and decrease as migration data is read from the device.
+ * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates
+ * that new initial data is present on the stream.
+ * In that case initial_bytes may report a non-zero value irrespective of
+ * any previously reported values, which progresses towards zero as precopy
+ * data is read from the data stream. dirty_bytes is also reset
+ * to zero and represents the state change of the device relative to the new
+ * initial_bytes.
+ * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field
+ * of struct vfio_precopy_info is reserved for bug-compatibility reasons.
+ *
* It is recommended to leave PRE_COPY for STOP_COPY only after this field
* reaches zero. Leaving PRE_COPY earlier might make things slower.
*
@@ -1301,6 +1312,7 @@ enum vfio_device_mig_state {
struct vfio_precopy_info {
__u32 argsz;
__u32 flags;
+#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */
__aligned_u64 initial_bytes;
__aligned_u64 dirty_bytes;
};
@@ -1510,6 +1522,16 @@ struct vfio_device_feature_dma_buf {
struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
};
+/*
+ * Enables the migration prepcopy_info_v2 behaviour.
+ *
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
+ *
+ * On SET, enables the v2 pre_copy_info behaviour, where the
+ * vfio_precopy_info.flags is a valid output field.
+ */
+#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.18.1
* Re: [PATCH vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
2026-02-24 8:20 ` [PATCH vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase Yishai Hadas
@ 2026-02-27 20:42 ` Alex Williamson
0 siblings, 0 replies; 17+ messages in thread
From: Alex Williamson @ 2026-02-27 20:42 UTC (permalink / raw)
To: Yishai Hadas
Cc: alex.williamson, jgg, kvm, kevin.tian, joao.m.martins, leonro,
maorg, avihaih, liulongfang, giovanni.cabiddu, kwankhede, alex
On Tue, 24 Feb 2026 10:20:14 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> As currently defined, initial_bytes is monotonically decreasing and
> precedes dirty_bytes when reading from the saving file descriptor.
> The transition from initial_bytes to dirty_bytes is unidirectional and
> irreversible.
>
> The initial_bytes are considered as critical data that is highly
> recommended to be transferred to the target as part of PRE_COPY, without
> this data, the PRE_COPY phase would be ineffective.
>
> We come to solve the case when a new chunk of critical data is
> introduced during the PRE_COPY phase and the driver would like to report
> an entirely new value for the initial_bytes.
>
> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new
> initial_bytes value during the PRE_COPY phase.
>
> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
> assign info.flags before copy_to_user(), this effectively echoes
> userspace-provided flags back as output, preventing the field from being
> used to report new reliable data from the drivers.
>
> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
> to explicitly opt in by enabling the
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature.
>
> When the caller opts in, the driver may report an entirely new
> value for initial_bytes. It may be larger, it may be smaller, it may
> include the previous unread initial_bytes, it may discard the previous
> unread initial_bytes, up to the driver logic and state.
> The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the
> driver indicates that new initial data is present on the stream.
>
> Once the caller sees this flag, the initial_bytes value should be
> re-evaluated relative to the readiness state for transition to
> STOP_COPY.
>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
> include/uapi/linux/vfio.h | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index bb7b89330d35..b6efda07000f 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1266,6 +1266,17 @@ enum vfio_device_mig_state {
> * The initial_bytes field indicates the amount of initial precopy
> * data available from the device. This field should have a non-zero initial
> * value and decrease as migration data is read from the device.
> + * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates
> + * that new initial data is present on the stream.
> + * In that case initial_bytes may report a non-zero value irrespective of
> + * any previously reported values, which progresses towards zero as precopy
> + * data is read from the data stream. dirty_bytes is also reset
> + * to zero and represents the state change of the device relative to the new
> + * initial_bytes.
> + * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to
> + * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field
> + * of struct vfio_precopy_info is reserved for bug-compatibility reasons.
> + *
> * It is recommended to leave PRE_COPY for STOP_COPY only after this field
> * reaches zero. Leaving PRE_COPY earlier might make things slower.
> *
> @@ -1301,6 +1312,7 @@ enum vfio_device_mig_state {
> struct vfio_precopy_info {
> __u32 argsz;
> __u32 flags;
> +#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */
> __aligned_u64 initial_bytes;
> __aligned_u64 dirty_bytes;
> };
> @@ -1510,6 +1522,16 @@ struct vfio_device_feature_dma_buf {
> struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
> };
>
> +/*
> + * Enables the migration prepcopy_info_v2 behaviour.
s/prepcopy/precopy/
Thanks,
Alex
> + *
> + * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
> + *
> + * On SET, enables the v2 pre_copy_info behaviour, where the
> + * vfio_precopy_info.flags is a valid output field.
> + */
> +#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
* [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-27 20:42 ` Alex Williamson
2026-02-24 8:20 ` [PATCH vfio 3/6] vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl Yishai Hadas
` (4 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
Existing VFIO_MIG_GET_PRECOPY_INFO implementations don't assign
info.flags before copy_to_user().

Because they copy the struct in from userspace first, this effectively
echoes userspace-provided flags back as output, preventing the field
from being used to report new reliable data from the drivers.

Add support for a new device feature named
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.

On SET, it enables the v2 pre_copy_info behaviour, where
vfio_precopy_info.flags is a valid output field.
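
For illustration only, the opt-in request a userspace caller would pass
to the VFIO_DEVICE_FEATURE ioctl could be built roughly as below. This
is a sketch, not part of the patch: feature_req is a local mirror of
the fixed header of struct vfio_device_feature, the operation bit
positions are my understanding of the existing uAPI and should be
checked against linux/vfio.h, and build_precopy_v2_req() is a
hypothetical helper. The feature index 12 is the value proposed in
patch 1/6. The feature carries no payload, so argsz covers the header
only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fixed part of struct vfio_device_feature. */
struct feature_req {
	uint32_t argsz;
	uint32_t flags;
};

/* Operation bits (assumed; verify against linux/vfio.h). */
#define FEATURE_MASK  0xffffu
#define FEATURE_GET   (1u << 16)
#define FEATURE_SET   (1u << 17)
#define FEATURE_PROBE (1u << 18)

/* Feature index proposed by this series. */
#define FEATURE_MIG_PRECOPY_INFOv2 12u

/*
 * Fill in a SET request for the v2 precopy info feature. With
 * probe_only set, the request only asks whether SET is supported and
 * does not change device state.
 */
static void build_precopy_v2_req(struct feature_req *req, int probe_only)
{
	assert(req != NULL);

	req->argsz = sizeof(*req);
	req->flags = FEATURE_MIG_PRECOPY_INFOv2 | FEATURE_SET;
	if (probe_only)
		req->flags |= FEATURE_PROBE;
}
```

A caller would typically issue the probe form first and fall back to
v1 behaviour (ignoring flags) if the probe fails.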
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/vfio_pci_core.c | 1 +
drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
include/linux/vfio.h | 1 +
3 files changed, 22 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c84..e22280f53ebf 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_core_disable(vdev);
+ core_vdev->precopy_info_flags_fix = 0;
vfio_pci_dma_buf_cleanup(vdev);
mutex_lock(&vdev->igate);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 742477546b15..2243a6eb5547 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -964,6 +964,23 @@ vfio_ioctl_device_feature_migration_data_size(struct vfio_device *device,
return 0;
}
+static int
+vfio_ioctl_device_feature_migration_precopy_info_v2(struct vfio_device *device,
+ u32 flags, size_t argsz)
+{
+ int ret;
+
+ if (!(device->migration_flags & VFIO_MIGRATION_PRE_COPY))
+ return -EINVAL;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+ if (ret != 1)
+ return ret;
+
+ device->precopy_info_flags_fix = 1;
+ return 0;
+}
+
static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
u32 flags, void __user *arg,
size_t argsz)
@@ -1251,6 +1268,9 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
return vfio_ioctl_device_feature_migration_data_size(
device, feature.flags, arg->data,
feature.argsz - minsz);
+ case VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2:
+ return vfio_ioctl_device_feature_migration_precopy_info_v2(
+ device, feature.flags, feature.argsz - minsz);
default:
if (unlikely(!device->ops->device_feature))
return -ENOTTY;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..3ff21374aeee 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -52,6 +52,7 @@ struct vfio_device {
struct vfio_device_set *dev_set;
struct list_head dev_set_list;
unsigned int migration_flags;
+ u8 precopy_info_flags_fix;
struct kvm *kvm;
/* Members below here are private, not for driver use */
--
2.18.1
* Re: [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
2026-02-24 8:20 ` [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 Yishai Hadas
@ 2026-02-27 20:42 ` Alex Williamson
2026-02-28 19:51 ` Alex Williamson
0 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2026-02-27 20:42 UTC (permalink / raw)
To: Yishai Hadas
Cc: alex.williamson, jgg, kvm, kevin.tian, joao.m.martins, leonro,
maorg, avihaih, liulongfang, giovanni.cabiddu, kwankhede, alex
On Tue, 24 Feb 2026 10:20:15 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
> assign info.flags before copy_to_user().
>
> Because they copy the struct in from userspace first, this effectively
> echoes userspace-provided flags back as output, preventing the field
> from being used to report new reliable data from the drivers.
>
> Add support for a new device feature named
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>
> On SET, enables the v2 pre_copy_info behaviour, where the
> vfio_precopy_info.flags is a valid output field.
>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 1 +
> drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
> include/linux/vfio.h | 1 +
> 3 files changed, 22 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..e22280f53ebf 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> #endif
> vfio_pci_core_disable(vdev);
>
> + core_vdev->precopy_info_flags_fix = 0;
> vfio_pci_dma_buf_cleanup(vdev);
>
> mutex_lock(&vdev->igate);
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index 742477546b15..2243a6eb5547 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -964,6 +964,23 @@ vfio_ioctl_device_feature_migration_data_size(struct vfio_device *device,
> return 0;
> }
>
> +static int
> +vfio_ioctl_device_feature_migration_precopy_info_v2(struct vfio_device *device,
> + u32 flags, size_t argsz)
> +{
> + int ret;
> +
> + if (!(device->migration_flags & VFIO_MIGRATION_PRE_COPY))
> + return -EINVAL;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
This should be VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_PROBE.
Probe support is essentially free, but we've not been good about
including it. Thanks,
Alex
> + if (ret != 1)
> + return ret;
> +
> + device->precopy_info_flags_fix = 1;
> + return 0;
> +}
> +
> static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
> u32 flags, void __user *arg,
> size_t argsz)
> @@ -1251,6 +1268,9 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
> return vfio_ioctl_device_feature_migration_data_size(
> device, feature.flags, arg->data,
> feature.argsz - minsz);
> + case VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2:
> + return vfio_ioctl_device_feature_migration_precopy_info_v2(
> + device, feature.flags, feature.argsz - minsz);
> default:
> if (unlikely(!device->ops->device_feature))
> return -ENOTTY;
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index e90859956514..3ff21374aeee 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -52,6 +52,7 @@ struct vfio_device {
> struct vfio_device_set *dev_set;
> struct list_head dev_set_list;
> unsigned int migration_flags;
> + u8 precopy_info_flags_fix;
> struct kvm *kvm;
>
> /* Members below here are private, not for driver use */
* Re: [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
2026-02-27 20:42 ` Alex Williamson
@ 2026-02-28 19:51 ` Alex Williamson
0 siblings, 0 replies; 17+ messages in thread
From: Alex Williamson @ 2026-02-28 19:51 UTC (permalink / raw)
To: Yishai Hadas
Cc: Alex Williamson, Jason Gunthorpe, kvm, Kevin Tian, joao.m.martins,
leonro, maorg, avihaih, liulongfang, giovanni.cabiddu, kwankhede
On Fri, Feb 27, 2026, at 1:42 PM, Alex Williamson wrote:
> On Tue, 24 Feb 2026 10:20:15 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
>> assign info.flags before copy_to_user().
>>
>> Because they copy the struct in from userspace first, this effectively
>> echoes userspace-provided flags back as output, preventing the field
>> from being used to report new reliable data from the drivers.
>>
>> Add support for a new device feature named
>> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>>
>> On SET, enables the v2 pre_copy_info behaviour, where the
>> vfio_precopy_info.flags is a valid output field.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>> drivers/vfio/pci/vfio_pci_core.c | 1 +
>> drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
>> include/linux/vfio.h | 1 +
>> 3 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index d43745fe4c84..e22280f53ebf 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>> #endif
>> vfio_pci_core_disable(vdev);
>>
>> + core_vdev->precopy_info_flags_fix = 0;
>> vfio_pci_dma_buf_cleanup(vdev);
>>
>> mutex_lock(&vdev->igate);
>> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
>> index 742477546b15..2243a6eb5547 100644
>> --- a/drivers/vfio/vfio_main.c
>> +++ b/drivers/vfio/vfio_main.c
>> @@ -964,6 +964,23 @@ vfio_ioctl_device_feature_migration_data_size(struct vfio_device *device,
>> return 0;
>> }
>>
>> +static int
>> +vfio_ioctl_device_feature_migration_precopy_info_v2(struct vfio_device *device,
>> + u32 flags, size_t argsz)
>> +{
>> + int ret;
>> +
>> + if (!(device->migration_flags & VFIO_MIGRATION_PRE_COPY))
>> + return -EINVAL;
>> +
>> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
>
> This should be VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_PROBE.
> Probe support is essentially free, but we've not been good about
> including it. Thanks,
Sorry, ignore this, only GET and SET are checked for supported ops, PROBE is implicitly supported. Thanks,
Alex
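
To make the point about implicit PROBE support concrete, here is a
small userspace model of the check. It follows my reading of
vfio_check_feature() in include/linux/vfio.h and is not a copy to rely
on; the flag bit positions are assumptions to verify against the uAPI
header, and check_feature_model() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Operation bits (assumed; verify against linux/vfio.h). */
#define FEATURE_GET   (1u << 16)
#define FEATURE_SET   (1u << 17)
#define FEATURE_PROBE (1u << 18)

/*
 * Returns 1 when the caller should perform the operation, 0 when a
 * probe succeeded and nothing more needs to be done, or a negative
 * errno. Only GET and SET are compared against supported_ops.
 */
static int check_feature_model(uint32_t flags, size_t argsz,
			       uint32_t supported_ops, size_t minsz)
{
	if ((flags & (FEATURE_GET | FEATURE_SET)) & ~supported_ops)
		return -EINVAL;
	if (flags & FEATURE_PROBE)
		return 0;
	/* Without PROBE, one of GET or SET must be requested. */
	if (!(flags & (FEATURE_GET | FEATURE_SET)))
		return -EINVAL;
	if (argsz < minsz)
		return -EINVAL;
	return 1;
}

/* Expected results for a SET-only feature such as the one in this patch. */
static void check_feature_examples(void)
{
	assert(check_feature_model(FEATURE_SET, 0, FEATURE_SET, 0) == 1);
	assert(check_feature_model(FEATURE_SET | FEATURE_PROBE, 0,
				   FEATURE_SET, 0) == 0);
	assert(check_feature_model(FEATURE_GET, 0, FEATURE_SET, 0) == -EINVAL);
}
```

Under this model, passing only VFIO_DEVICE_FEATURE_SET as the
supported ops already lets SET | PROBE succeed, which matches the
correction above.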
>> + if (ret != 1)
>> + return ret;
>> +
>> + device->precopy_info_flags_fix = 1;
>> + return 0;
>> +}
>> +
>> static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
>> u32 flags, void __user *arg,
>> size_t argsz)
>> @@ -1251,6 +1268,9 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
>> return vfio_ioctl_device_feature_migration_data_size(
>> device, feature.flags, arg->data,
>> feature.argsz - minsz);
>> + case VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2:
>> + return vfio_ioctl_device_feature_migration_precopy_info_v2(
>> + device, feature.flags, feature.argsz - minsz);
>> default:
>> if (unlikely(!device->ops->device_feature))
>> return -ENOTTY;
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index e90859956514..3ff21374aeee 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -52,6 +52,7 @@ struct vfio_device {
>> struct vfio_device_set *dev_set;
>> struct list_head dev_set_list;
>> unsigned int migration_flags;
>> + u8 precopy_info_flags_fix;
>> struct kvm *kvm;
>>
>> /* Members below here are private, not for driver use */
* [PATCH vfio 3/6] vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 4/6] net/mlx5: Add IFC bits for migration state Yishai Hadas
` (3 subsequent siblings)
6 siblings, 0 replies; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
Introduce a core helper function for VFIO_MIG_GET_PRECOPY_INFO and adapt
all drivers to use it.

It centralizes the common code and ensures that output flags are cleared
on entry when userspace has opted in to
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
This prevents any unintended echoing of userspace data back to
userspace.
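
The echo problem and its fix can be modelled in plain userspace C as
below. This is only a sketch of the round trip, not the kernel code:
memcpy() stands in for copy_from_user()/copy_to_user(), the struct is a
local mirror of struct vfio_precopy_info, precopy_round_trip() is a
hypothetical name, and the byte counts are arbitrary example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_precopy_info (illustration only). */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/*
 * Model of one ioctl round trip. flags_fix mirrors the per-device
 * precopy_info_flags_fix bit set by the v2 opt-in.
 */
static void precopy_round_trip(struct precopy_info *user, bool flags_fix)
{
	struct precopy_info info;

	assert(user != NULL);

	memcpy(&info, user, sizeof(info)); /* copy_from_user() stand-in */

	/* v1 leaves flags untouched; v2 treats flags as pure output. */
	if (flags_fix)
		info.flags = 0;

	/* The "driver" fills only the byte counters. */
	info.initial_bytes = 4096;
	info.dirty_bytes = 0;

	memcpy(user, &info, sizeof(info)); /* copy_to_user() stand-in */
}
```

With flags_fix clear, whatever the caller placed in flags comes back
unchanged, so a set bit cannot be trusted as driver output. With
flags_fix set, any bit seen on return was set after the clear.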
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +++-----
drivers/vfio/pci/mlx5/main.c | 18 +++------
drivers/vfio/pci/qat/main.c | 17 +++-----
drivers/vfio/pci/virtio/migrate.c | 17 +++-----
include/linux/vfio.h | 39 +++++++++++++++++++
samples/vfio-mdev/mtty.c | 16 +++-----
6 files changed, 68 insertions(+), 56 deletions(-)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 1d367cff7dcf..bb121f635b9f 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -857,18 +857,12 @@ static long hisi_acc_vf_precopy_ioctl(struct file *filp,
struct hisi_acc_vf_core_device *hisi_acc_vdev = migf->hisi_acc_vdev;
loff_t *pos = &filp->f_pos;
struct vfio_precopy_info info;
- unsigned long minsz;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&hisi_acc_vdev->core_device.vdev, cmd,
+ arg, &info);
+ if (ret)
+ return ret;
mutex_lock(&hisi_acc_vdev->state_mutex);
if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
@@ -893,7 +887,8 @@ static long hisi_acc_vf_precopy_ioctl(struct file *filp,
mutex_unlock(&migf->lock);
mutex_unlock(&hisi_acc_vdev->state_mutex);
- return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ return copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
out:
mutex_unlock(&migf->lock);
mutex_unlock(&hisi_acc_vdev->state_mutex);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index dbba6173894b..fb541c17c712 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -463,21 +463,14 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mlx5_vhca_data_buffer *buf;
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
- unsigned long minsz;
size_t inc_length = 0;
bool end_of_data = false;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
-
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&mvdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&mvdev->state_mutex);
if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -545,7 +538,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
done:
mlx5vf_state_mutex_unlock(mvdev);
- if (copy_to_user((void __user *)arg, &info, minsz))
+ if (copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)))
return -EFAULT;
return 0;
diff --git a/drivers/vfio/pci/qat/main.c b/drivers/vfio/pci/qat/main.c
index b982d4ae666c..b3a4b7a55696 100644
--- a/drivers/vfio/pci/qat/main.c
+++ b/drivers/vfio/pci/qat/main.c
@@ -121,18 +121,12 @@ static long qat_vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct qat_mig_dev *mig_dev = qat_vdev->mdev;
struct vfio_precopy_info info;
loff_t *pos = &filp->f_pos;
- unsigned long minsz;
int ret = 0;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&qat_vdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&qat_vdev->state_mutex);
if (qat_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -160,7 +154,8 @@ static long qat_vf_precopy_ioctl(struct file *filp, unsigned int cmd,
mutex_unlock(&qat_vdev->state_mutex);
if (ret)
return ret;
- return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ return copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
}
static ssize_t qat_vf_save_read(struct file *filp, char __user *buf,
diff --git a/drivers/vfio/pci/virtio/migrate.c b/drivers/vfio/pci/virtio/migrate.c
index 35fa2d6ed611..7e11834ad512 100644
--- a/drivers/vfio/pci/virtio/migrate.c
+++ b/drivers/vfio/pci/virtio/migrate.c
@@ -443,19 +443,13 @@ static long virtiovf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
bool end_of_data = false;
- unsigned long minsz;
u32 ctx_size = 0;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
-
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&virtvdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&virtvdev->state_mutex);
if (virtvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -514,7 +508,8 @@ static long virtiovf_precopy_ioctl(struct file *filp, unsigned int cmd,
done:
virtiovf_state_mutex_unlock(virtvdev);
- if (copy_to_user((void __user *)arg, &info, minsz))
+ if (copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)))
return -EFAULT;
return 0;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 3ff21374aeee..cd71261b6d01 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -16,6 +16,7 @@
#include <linux/cdev.h>
#include <uapi/linux/vfio.h>
#include <linux/iova_bitmap.h>
+#include <linux/uaccess.h>
struct kvm;
struct iommufd_ctx;
@@ -285,6 +286,44 @@ static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops,
return 1;
}
+/**
+ * vfio_check_precopy_ioctl - Validate user input for the VFIO_MIG_GET_PRECOPY_INFO ioctl
+ * @vdev: The vfio device
+ * @cmd: Cmd from the ioctl
+ * @arg: Arg from the ioctl
+ * @info: Driver pointer to hold the userspace input to the ioctl
+ *
+ * For use in a driver's get_precopy_info. Checks that the inputs to the
+ * VFIO_MIG_GET_PRECOPY_INFO ioctl are correct.
+
+ * Returns 0 on success, otherwise errno.
+ */
+
+static inline int
+vfio_check_precopy_ioctl(struct vfio_device *vdev, unsigned int cmd,
+ unsigned long arg, struct vfio_precopy_info *info)
+{
+ unsigned long minsz;
+
+ if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+ return -ENOTTY;
+
+ minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+ if (copy_from_user(info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info->argsz < minsz)
+ return -EINVAL;
+
+ /* keep v1 behaviour as is for compatibility reasons */
+ if (vdev->precopy_info_flags_fix)
+ /* flags are output, set its initial value to 0 */
+ info->flags = 0;
+
+ return 0;
+}
+
struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
const struct vfio_device_ops *ops);
#define vfio_alloc_device(dev_struct, member, dev, ops) \
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index bd92c38379b8..c1070af69544 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -837,18 +837,11 @@ static long mtty_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mdev_state *mdev_state = migf->mdev_state;
loff_t *pos = &filp->f_pos;
struct vfio_precopy_info info = {};
- unsigned long minsz;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&mdev_state->vdev, cmd, arg, &info);
+ if (ret)
+ return ret;
mutex_lock(&mdev_state->state_mutex);
if (mdev_state->state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -875,7 +868,8 @@ static long mtty_precopy_ioctl(struct file *filp, unsigned int cmd,
info.initial_bytes = migf->filled_size - *pos;
mutex_unlock(&migf->lock);
- ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ ret = copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
unlock:
mtty_state_mutex_unlock(mdev_state);
return ret;
--
2.18.1
* [PATCH vfio 4/6] net/mlx5: Add IFC bits for migration state
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (2 preceding siblings ...)
2026-02-24 8:20 ` [PATCH vfio 3/6] vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 5/6] vfio/mlx5: consider inflight SAVE during PRE_COPY Yishai Hadas
` (2 subsequent siblings)
6 siblings, 0 replies; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
Add the relevant IFC bits for querying an extra migration state from the
device.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 775cb0c56865..1c8922c58c8f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2173,7 +2173,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
u8 sf_eq_usage[0x1];
u8 reserved_at_d3[0x5];
u8 multiplane[0x1];
- u8 reserved_at_d9[0x7];
+ u8 migration_state[0x1];
+ u8 reserved_at_da[0x6];
u8 cross_vhca_object_to_object_supported[0x20];
@@ -13280,13 +13281,24 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
u8 reserved_at_60[0x20];
};
+enum {
+ MLX5_QUERY_VHCA_MIG_STATE_UNINITIALIZED = 0x0,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_IDLE = 0x1,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_READY = 0x2,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_DIRTY = 0x3,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT = 0x4,
+};
+
struct mlx5_ifc_query_vhca_migration_state_out_bits {
u8 status[0x8];
u8 reserved_at_8[0x18];
u8 syndrome[0x20];
- u8 reserved_at_40[0x40];
+ u8 reserved_at_40[0x20];
+
+ u8 migration_state[0x4];
+ u8 reserved_at_64[0x1c];
u8 required_umem_size[0x20];
--
2.18.1
* [PATCH vfio 5/6] vfio/mlx5: consider inflight SAVE during PRE_COPY
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (3 preceding siblings ...)
2026-02-24 8:20 ` [PATCH vfio 4/6] net/mlx5: Add IFC bits for migration state Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-24 8:20 ` [PATCH vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO Yishai Hadas
2026-02-27 20:23 ` [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Alex Williamson
6 siblings, 0 replies; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
Consider an inflight SAVE operation during the PRE_COPY phase, so the
caller will wait when no data is currently available but is expected
to arrive.
This enables a follow-up patch to avoid returning -ENOMSG while a new
*initial_bytes* chunk is still pending from an asynchronous SAVE command
issued by the VFIO_MIG_GET_PRECOPY_INFO ioctl.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/mlx5/cmd.c | 5 +++++
drivers/vfio/pci/mlx5/cmd.h | 1 +
drivers/vfio/pci/mlx5/main.c | 3 ++-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index ca6d95f293cd..18b8d8594070 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -606,6 +606,8 @@ static void
mlx5vf_save_callback_complete(struct mlx5_vf_migration_file *migf,
struct mlx5vf_async_data *async_data)
{
+ migf->inflight_save = 0;
+ wake_up_interruptible(&migf->poll_wait);
kvfree(async_data->out);
complete(&migf->save_comp);
fput(migf->filp);
@@ -809,6 +811,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
async_data->header_buf = header_buf;
get_file(migf->filp);
+ migf->inflight_save = 1;
err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
async_data->out,
out_size, mlx5vf_save_callback,
@@ -819,6 +822,8 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
return 0;
err_exec:
+ migf->inflight_save = 0;
+ wake_up_interruptible(&migf->poll_wait);
if (header_buf)
mlx5vf_put_data_buffer(header_buf);
fput(migf->filp);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index d7821b5ca772..7d2c10be2e60 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -111,6 +111,7 @@ struct mlx5_vf_migration_file {
struct completion save_comp;
struct mlx5_async_ctx async_ctx;
struct mlx5vf_async_data async_data;
+ u8 inflight_save:1;
};
struct mlx5_vhca_cq_buf {
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index fb541c17c712..68e051c48d40 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -179,7 +179,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
!list_empty(&migf->buf_list) ||
migf->state == MLX5_MIGF_STATE_ERROR ||
migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR ||
- migf->state == MLX5_MIGF_STATE_PRE_COPY ||
+ (migf->state == MLX5_MIGF_STATE_PRE_COPY &&
+ !migf->inflight_save) ||
migf->state == MLX5_MIGF_STATE_COMPLETE))
return -ERESTARTSYS;
}
--
2.18.1
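The effect of the changed wait condition in mlx5vf_save_read() can be
modeled in isolation. The sketch below is a userspace approximation, not
the driver code: the enum and function names are hypothetical, and
SKETCH_MIGF_OTHER stands in for any state not listed in the hunk. It only
evaluates the predicate that decides whether a blocked reader stops
waiting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MLX5_MIGF_STATE_* values in the hunk. */
enum sketch_migf_state {
	SKETCH_MIGF_ERROR,
	SKETCH_MIGF_PRE_COPY_ERROR,
	SKETCH_MIGF_PRE_COPY,
	SKETCH_MIGF_COMPLETE,
	SKETCH_MIGF_OTHER,	/* any state the hunk does not list */
};

/*
 * Mirrors the patched wait condition: returns true when the reader should
 * stop sleeping and go on to look at the buffer list and state.
 */
static bool sketch_read_may_proceed(bool buf_list_empty,
				    enum sketch_migf_state state,
				    bool inflight_save)
{
	return !buf_list_empty ||
	       state == SKETCH_MIGF_ERROR ||
	       state == SKETCH_MIGF_PRE_COPY_ERROR ||
	       /* New: PRE_COPY no longer ends the wait while a SAVE is pending. */
	       (state == SKETCH_MIGF_PRE_COPY && !inflight_save) ||
	       state == SKETCH_MIGF_COMPLETE;
}
```

With an empty buffer list in PRE_COPY, the predicate is true only when no
SAVE is in flight, which matches the commit message: the caller keeps
waiting for data that is expected to arrive instead of proceeding to the
-ENOMSG path.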
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (4 preceding siblings ...)
2026-02-24 8:20 ` [PATCH vfio 5/6] vfio/mlx5: consider inflight SAVE during PRE_COPY Yishai Hadas
@ 2026-02-24 8:20 ` Yishai Hadas
2026-02-27 20:23 ` [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Alex Williamson
6 siblings, 0 replies; 17+ messages in thread
From: Yishai Hadas @ 2026-02-24 8:20 UTC (permalink / raw)
To: alex.williamson, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
When userspace opts into VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2, the
driver may report the VFIO_PRECOPY_INFO_REINIT output flag in response
to the VFIO_MIG_GET_PRECOPY_INFO ioctl, along with a new initial_bytes
value.
The presence of the VFIO_PRECOPY_INFO_REINIT flag indicates to the
caller that new initial data is available in the migration stream.
If the firmware reports a new initial-data chunk, any previously dirty
bytes in memory are treated as initial bytes, since the caller must read
both sets before reaching the end of the initial-data region.
In this case, the driver issues a new SAVE command to fetch the data and
prepare it for a subsequent read() from userspace.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/mlx5/cmd.c | 20 ++++++--
drivers/vfio/pci/mlx5/cmd.h | 5 +-
drivers/vfio/pci/mlx5/main.c | 97 +++++++++++++++++++++++-------------
3 files changed, 83 insertions(+), 39 deletions(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 18b8d8594070..5fe0621b5fbd 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -87,7 +87,7 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
size_t *state_size, u64 *total_size,
- u8 query_flags)
+ u8 *mig_state, u8 query_flags)
{
u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
@@ -152,6 +152,10 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
MLX5_GET64(query_vhca_migration_state_out, out,
remaining_total_size) : *state_size;
+ if (mig_state && mvdev->mig_state_cap)
+ *mig_state = MLX5_GET(query_vhca_migration_state_out, out,
+ migration_state);
+
return 0;
}
@@ -277,6 +281,9 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
mvdev->chunk_mode = 1;
+ if (MLX5_CAP_GEN_2(mvdev->mdev, migration_state))
+ mvdev->mig_state_cap = 1;
+
end:
mlx5_vf_put_core_dev(mvdev->mdev);
}
@@ -555,6 +562,7 @@ void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf)
{
spin_lock_irq(&buf->migf->list_lock);
buf->stop_copy_chunk_num = 0;
+ buf->pre_copy_init_bytes_chunk = false;
list_add_tail(&buf->buf_elm, &buf->migf->avail_list);
spin_unlock_irq(&buf->migf->list_lock);
}
@@ -689,7 +697,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
!next_required_umem_size;
if (async_data->header_buf) {
status = add_buf_header(async_data->header_buf, image_size,
- initial_pre_copy);
+ initial_pre_copy ||
+ async_data->buf->pre_copy_init_bytes_chunk);
if (status)
goto err;
}
@@ -708,9 +717,12 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
}
}
spin_unlock_irqrestore(&migf->list_lock, flags);
- if (initial_pre_copy) {
+ if (initial_pre_copy || async_data->buf->pre_copy_init_bytes_chunk) {
migf->pre_copy_initial_bytes += image_size;
- migf->state = MLX5_MIGF_STATE_PRE_COPY;
+ if (initial_pre_copy)
+ migf->state = MLX5_MIGF_STATE_PRE_COPY;
+ if (async_data->buf->pre_copy_init_bytes_chunk)
+ async_data->buf->pre_copy_init_bytes_chunk = false;
}
if (stop_copy_last_chunk)
migf->state = MLX5_MIGF_STATE_COMPLETE;
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 7d2c10be2e60..deed0f132f39 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -62,6 +62,7 @@ struct mlx5_vhca_data_buffer {
u32 *mkey_in;
enum dma_data_direction dma_dir;
u8 stop_copy_chunk_num;
+ bool pre_copy_init_bytes_chunk;
struct list_head buf_elm;
struct mlx5_vf_migration_file *migf;
};
@@ -97,6 +98,7 @@ struct mlx5_vf_migration_file {
u32 record_tag;
u64 stop_copy_prep_size;
u64 pre_copy_initial_bytes;
+ u64 pre_copy_initial_bytes_start;
size_t next_required_umem_size;
u8 num_ready_chunks;
/* Upon chunk mode preserve another set of buffers for stop_copy phase */
@@ -175,6 +177,7 @@ struct mlx5vf_pci_core_device {
u8 mdev_detach:1;
u8 log_active:1;
u8 chunk_mode:1;
+ u8 mig_state_cap:1;
struct completion tracker_comp;
/* protect migration state */
struct mutex state_mutex;
@@ -199,7 +202,7 @@ int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
size_t *state_size, u64 *total_size,
- u8 query_flags);
+ u8 *migration_state, u8 query_flags);
void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
const struct vfio_migration_ops *mig_ops,
const struct vfio_log_ops *log_ops);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 68e051c48d40..0d4e363a4e3b 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -464,8 +464,10 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mlx5_vhca_data_buffer *buf;
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
+ u8 migration_state = 0;
size_t inc_length = 0;
- bool end_of_data = false;
+ bool reinit_state;
+ bool end_of_data;
int ret;
ret = vfio_check_precopy_ioctl(&mvdev->core_device.vdev, cmd, arg,
@@ -492,7 +494,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
* As so, the other code below is safe with the proper locks.
*/
ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
- NULL, MLX5VF_QUERY_INC);
+ NULL, &migration_state,
+ MLX5VF_QUERY_INC);
if (ret)
goto err_state_unlock;
}
@@ -503,41 +506,67 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
goto err_migf_unlock;
}
- if (migf->pre_copy_initial_bytes > *pos) {
- info.initial_bytes = migf->pre_copy_initial_bytes - *pos;
+ /*
+ * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
+ * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
+ */
+ reinit_state = mvdev->core_device.vdev.precopy_info_flags_fix &&
+ migration_state == MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
+ end_of_data = !(migf->max_pos - *pos);
+ if (reinit_state) {
+ /*
+ * Any bytes already present in memory are treated as initial
+ * bytes, since the caller is required to read them before
+ * reaching the new initial-bytes region.
+ */
+ migf->pre_copy_initial_bytes_start = *pos;
+ migf->pre_copy_initial_bytes = migf->max_pos - *pos;
+ info.initial_bytes = migf->pre_copy_initial_bytes + inc_length;
+ info.flags |= VFIO_PRECOPY_INFO_REINIT;
} else {
- info.dirty_bytes = migf->max_pos - *pos;
- if (!info.dirty_bytes)
- end_of_data = true;
- info.dirty_bytes += inc_length;
+ if (migf->pre_copy_initial_bytes_start +
+ migf->pre_copy_initial_bytes > *pos) {
+ WARN_ON_ONCE(end_of_data);
+ info.initial_bytes = migf->pre_copy_initial_bytes_start +
+ migf->pre_copy_initial_bytes - *pos;
+ } else {
+ info.dirty_bytes = (migf->max_pos - *pos) + inc_length;
+ }
}
+ mutex_unlock(&migf->lock);
- if (!end_of_data || !inc_length) {
- mutex_unlock(&migf->lock);
- goto done;
- }
+ if ((reinit_state || end_of_data) && inc_length) {
+ /*
+ * In case we finished transferring the current state and the
+ * device has a dirty state, or that the device has a new init
+ * state, save a new state to be ready for.
+ */
+ buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
+ DMA_FROM_DEVICE);
+ if (IS_ERR(buf)) {
+ ret = PTR_ERR(buf);
+ mlx5vf_mark_err(migf);
+ goto err_state_unlock;
+ }
- mutex_unlock(&migf->lock);
- /*
- * We finished transferring the current state and the device has a
- * dirty state, save a new state to be ready for.
- */
- buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
- DMA_FROM_DEVICE);
- if (IS_ERR(buf)) {
- ret = PTR_ERR(buf);
- mlx5vf_mark_err(migf);
- goto err_state_unlock;
- }
+ buf->pre_copy_init_bytes_chunk = reinit_state;
+ ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
+ if (ret) {
+ mlx5vf_mark_err(migf);
+ mlx5vf_put_data_buffer(buf);
+ goto err_state_unlock;
+ }
- ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
- if (ret) {
- mlx5vf_mark_err(migf);
- mlx5vf_put_data_buffer(buf);
- goto err_state_unlock;
+ /*
+ * SAVE appends a header record via add_buf_header(),
+ * let's account it as well.
+ */
+ if (reinit_state)
+ info.initial_bytes += sizeof(struct mlx5_vf_migration_header);
+ else
+ info.dirty_bytes += sizeof(struct mlx5_vf_migration_header);
}
-done:
mlx5vf_state_mutex_unlock(mvdev);
if (copy_to_user((void __user *)arg, &info,
offsetofend(struct vfio_precopy_info, dirty_bytes)))
@@ -570,7 +599,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
if (migf->state == MLX5_MIGF_STATE_ERROR)
return -ENODEV;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL,
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL, NULL,
MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
if (ret)
goto err;
@@ -636,7 +665,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
if (ret)
goto out;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, 0);
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, NULL, 0);
if (ret)
goto out_pd;
@@ -1123,7 +1152,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
enum mlx5_vf_migf_state state;
size_t size;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL,
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL, NULL,
MLX5VF_QUERY_INC | MLX5VF_QUERY_CLEANUP);
if (ret)
return ERR_PTR(ret);
@@ -1248,7 +1277,7 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
mutex_lock(&mvdev->state_mutex);
ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &state_size,
- &total_size, 0);
+ &total_size, NULL, 0);
if (!ret)
*stop_copy_length = total_size;
mlx5vf_state_mutex_unlock(mvdev);
--
2.18.1
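The byte accounting in the reworked mlx5vf_precopy_ioctl() is easier to
follow with concrete numbers. The sketch below is a userspace model of the
arithmetic in the hunk above, with locking, buffer allocation and the SAVE
command left out. All sketch_* names are hypothetical, the flag value is a
placeholder rather than the uAPI value, and the header size is passed in
as a parameter because sizeof(struct mlx5_vf_migration_header) is not
shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PRECOPY_INFO_REINIT (1u << 0)	/* placeholder, not the uAPI value */

struct sketch_migf {
	uint64_t pos;		/* current read offset (*pos in the patch) */
	uint64_t max_pos;	/* end of the data already saved */
	uint64_t init_start;	/* pre_copy_initial_bytes_start */
	uint64_t init_bytes;	/* pre_copy_initial_bytes */
};

struct sketch_info {
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

static struct sketch_info sketch_precopy_info(struct sketch_migf *m,
					      uint64_t inc_length,
					      bool reinit_state,
					      uint64_t header_size)
{
	struct sketch_info info = { 0 };
	bool end_of_data = (m->max_pos == m->pos);

	if (reinit_state) {
		/* Unread bytes already in memory become initial bytes. */
		m->init_start = m->pos;
		m->init_bytes = m->max_pos - m->pos;
		info.initial_bytes = m->init_bytes + inc_length;
		info.flags |= SKETCH_PRECOPY_INFO_REINIT;
	} else if (m->init_start + m->init_bytes > m->pos) {
		/* Still inside the initial-bytes region. */
		info.initial_bytes = m->init_start + m->init_bytes - m->pos;
	} else {
		info.dirty_bytes = (m->max_pos - m->pos) + inc_length;
	}

	/* A new SAVE would be issued here; account for its header record. */
	if ((reinit_state || end_of_data) && inc_length) {
		if (reinit_state)
			info.initial_bytes += header_size;
		else
			info.dirty_bytes += header_size;
	}
	return info;
}
```

For example, with pos = 100, max_pos = 150, an incremental length of 1000
and a 16-byte header, a reinit report yields initial_bytes = 50 + 1000 +
16 = 1066 with the REINIT flag set, and the initial region is rebased to
start at offset 100.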
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-02-24 8:20 [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (5 preceding siblings ...)
2026-02-24 8:20 ` [PATCH vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO Yishai Hadas
@ 2026-02-27 20:23 ` Alex Williamson
2026-02-28 6:27 ` Cédric Le Goater
6 siblings, 1 reply; 17+ messages in thread
From: Alex Williamson @ 2026-02-27 20:23 UTC (permalink / raw)
To: Cédric Le Goater, Peter Xu
Cc: Yishai Hadas, alex.williamson, jgg, kvm, kevin.tian,
joao.m.martins, leonro, maorg, avihaih, liulongfang,
giovanni.cabiddu, kwankhede, alex
+Cédric, +Peter, please see what you think of this approach relative to
QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
unfortunate, but we need an opt-in for the driver to enable REINIT
reporting anyway. Thanks,
Alex
On Tue, 24 Feb 2026 10:20:13 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> This series introduces support for re-initializing the initial_bytes
> value during the VFIO PRE_COPY migration phase.
>
> Background
> ==========
> As currently defined, initial_bytes is monotonically decreasing and
> precedes dirty_bytes when reading from the saving file descriptor.
> The transition from initial_bytes to dirty_bytes is unidirectional and
> irreversible.
>
> The initial_bytes are considered critical data that is highly
> recommended to be transferred to the target as part of PRE_COPY.
> Without this data, the PRE_COPY phase would be ineffective.
>
> Problem Statement
> =================
> In some cases, a new chunk of critical data may appear during the
> PRE_COPY phase. The current API does not provide a mechanism for the
> driver to report an updated initial_bytes value when this occurs.
>
> Solution
> ========
> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new
> initial_bytes value during the PRE_COPY phase.
>
> However, Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations
> don't assign info.flags before copy_to_user(), this effectively echoes
> userspace-provided flags back as output, preventing the field from being
> used to report new reliable data from the drivers.
>
> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
> to explicitly opt in. For that we introduce a new feature named
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>
> User should opt-in to the above feature with a SET operation, no data is
> required and any supplied data is ignored.
>
> When the caller opts in:
> - We set info.flags to zero, otherwise we keep v1 behaviour as is for
> compatibility reasons.
> - The new output flag VFIO_PRECOPY_INFO_REINIT can be used reliably.
> - The VFIO_PRECOPY_INFO_REINIT output flag indicates that new initial
> data is present on the stream. The initial_bytes value should be
> re-evaluated relative to the readiness state for transition to
> STOP_COPY.
>
> The mlx5 VFIO driver is extended to support this case when the
> underlying firmware also supports the REINIT migration state.
>
> As part of this series, a core helper function is introduced to provide
> shared functionality for implementing the VFIO_MIG_GET_PRECOPY_INFO
> ioctl, and all drivers have been updated to use it.
>
> Note:
> We may need to send the net/mlx5 patch to VFIO as a pull request to
> avoid conflicts prior to acceptance.
>
> Yishai
>
> Yishai Hadas (6):
> vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
> vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
> vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
> net/mlx5: Add IFC bits for migration state
> vfio/mlx5: consider inflight SAVE during PRE_COPY
> vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
>
> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +--
> drivers/vfio/pci/mlx5/cmd.c | 25 +++-
> drivers/vfio/pci/mlx5/cmd.h | 6 +-
> drivers/vfio/pci/mlx5/main.c | 118 +++++++++++-------
> drivers/vfio/pci/qat/main.c | 17 +--
> drivers/vfio/pci/vfio_pci_core.c | 1 +
> drivers/vfio/pci/virtio/migrate.c | 17 +--
> drivers/vfio/vfio_main.c | 20 +++
> include/linux/mlx5/mlx5_ifc.h | 16 ++-
> include/linux/vfio.h | 40 ++++++
> include/uapi/linux/vfio.h | 22 ++++
> samples/vfio-mdev/mtty.c | 16 +--
> 12 files changed, 217 insertions(+), 98 deletions(-)
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-02-27 20:23 ` [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Alex Williamson
@ 2026-02-28 6:27 ` Cédric Le Goater
2026-03-01 12:43 ` Yishai Hadas
0 siblings, 1 reply; 17+ messages in thread
From: Cédric Le Goater @ 2026-02-28 6:27 UTC (permalink / raw)
To: Alex Williamson, Peter Xu
Cc: Yishai Hadas, alex.williamson, jgg, kvm, kevin.tian,
joao.m.martins, leonro, maorg, avihaih, liulongfang,
giovanni.cabiddu, kwankhede
Hello,
On 2/27/26 21:23, Alex Williamson wrote:
>
> +Cédric, +Peter, please see what you think of this approach relative to
> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
> unfortunate, but we need an opt-in for the driver to enable REINIT
> reporting anyway. Thanks,
>
> Alex
I took a quick look. The series would be a little cleaner if
vfio_check_precopy_ioctl() came first, and some parts are a little ugly
(precopy_info_flags_fix). I will take a closer look when back from PTO.
Is there a QEMU implementation?
Thanks,
C.
>
> On Tue, 24 Feb 2026 10:20:13 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> This series introduces support for re-initializing the initial_bytes
>> value during the VFIO PRE_COPY migration phase.
>>
>> Background
>> ==========
>> As currently defined, initial_bytes is monotonically decreasing and
>> precedes dirty_bytes when reading from the saving file descriptor.
>> The transition from initial_bytes to dirty_bytes is unidirectional and
>> irreversible.
>>
>> The initial_bytes are considered critical data that is highly
>> recommended to be transferred to the target as part of PRE_COPY.
>> Without this data, the PRE_COPY phase would be ineffective.
>>
>> Problem Statement
>> =================
>> In some cases, a new chunk of critical data may appear during the
>> PRE_COPY phase. The current API does not provide a mechanism for the
>> driver to report an updated initial_bytes value when this occurs.
>>
>> Solution
>> ========
>> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
>> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new
>> initial_bytes value during the PRE_COPY phase.
>>
>> However, Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations
>> don't assign info.flags before copy_to_user(), this effectively echoes
>> userspace-provided flags back as output, preventing the field from being
>> used to report new reliable data from the drivers.
>>
>> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
>> to explicitly opt in. For that we introduce a new feature named
>> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>>
>> User should opt-in to the above feature with a SET operation, no data is
>> required and any supplied data is ignored.
>>
>> When the caller opts in:
>> - We set info.flags to zero, otherwise we keep v1 behaviour as is for
>> compatibility reasons.
>> - The new output flag VFIO_PRECOPY_INFO_REINIT can be used reliably.
>> - The VFIO_PRECOPY_INFO_REINIT output flag indicates that new initial
>> data is present on the stream. The initial_bytes value should be
>> re-evaluated relative to the readiness state for transition to
>> STOP_COPY.
>>
>> The mlx5 VFIO driver is extended to support this case when the
>> underlying firmware also supports the REINIT migration state.
>>
>> As part of this series, a core helper function is introduced to provide
>> shared functionality for implementing the VFIO_MIG_GET_PRECOPY_INFO
>> ioctl, and all drivers have been updated to use it.
>>
>> Note:
>> We may need to send the net/mlx5 patch to VFIO as a pull request to
>> avoid conflicts prior to acceptance.
>>
>> Yishai
>>
>> Yishai Hadas (6):
>> vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
>> vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
>> vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
>> net/mlx5: Add IFC bits for migration state
>> vfio/mlx5: consider inflight SAVE during PRE_COPY
>> vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
>>
>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +--
>> drivers/vfio/pci/mlx5/cmd.c | 25 +++-
>> drivers/vfio/pci/mlx5/cmd.h | 6 +-
>> drivers/vfio/pci/mlx5/main.c | 118 +++++++++++-------
>> drivers/vfio/pci/qat/main.c | 17 +--
>> drivers/vfio/pci/vfio_pci_core.c | 1 +
>> drivers/vfio/pci/virtio/migrate.c | 17 +--
>> drivers/vfio/vfio_main.c | 20 +++
>> include/linux/mlx5/mlx5_ifc.h | 16 ++-
>> include/linux/vfio.h | 40 ++++++
>> include/uapi/linux/vfio.h | 22 ++++
>> samples/vfio-mdev/mtty.c | 16 +--
>> 12 files changed, 217 insertions(+), 98 deletions(-)
>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-02-28 6:27 ` Cédric Le Goater
@ 2026-03-01 12:43 ` Yishai Hadas
2026-03-10 9:46 ` Yishai Hadas
0 siblings, 1 reply; 17+ messages in thread
From: Yishai Hadas @ 2026-03-01 12:43 UTC (permalink / raw)
To: Cédric Le Goater, Alex Williamson, Peter Xu
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede, yishaih
On 28/02/2026 8:27, Cédric Le Goater wrote:
> Hello,
>
> On 2/27/26 21:23, Alex Williamson wrote:
>>
>> +Cédric, +Peter, please see what you think of this approach relative to
>> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
>> unfortunate, but we need an opt-in for the driver to enable REINIT
>> reporting anyway. Thanks,
>>
>> Alex
>
>
> I took a quick look. The series would be a little cleaner if
> vfio_check_precopy_ioctl() came first
The motivation for introducing that core helper and adapting all the
drivers to use it was to centralize the common code and ensure that
output flags are cleared on entry.
This can be done only once the previous opt-in patch is in place, as we
would like to keep the v1 behavior for compatibility reasons.
> and some parts are little ugly
> (precopy_info_flags_fix). Will take a closer look when back from PTO.
>
I'm open to a better name. Do you have a specific suggestion?
> Is there a QEMU implementation ?
Yes, please see [1] for the candidate QEMU patches that the kernel
series was tested with.
[1] https://github.com/avihai1122/qemu/commits/vfio_precopy_info_reinit/
Thanks,
Yishai
>
> Thanks,
>
> C.
>
>
>
>>
>> On Tue, 24 Feb 2026 10:20:13 +0200
>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>
>>> This series introduces support for re-initializing the initial_bytes
>>> value during the VFIO PRE_COPY migration phase.
>>>
>>> Background
>>> ==========
>>> As currently defined, initial_bytes is monotonically decreasing and
>>> precedes dirty_bytes when reading from the saving file descriptor.
>>> The transition from initial_bytes to dirty_bytes is unidirectional and
>>> irreversible.
>>>
>>> The initial_bytes are considered critical data that is highly
>>> recommended to be transferred to the target as part of PRE_COPY.
>>> Without this data, the PRE_COPY phase would be ineffective.
>>>
>>> Problem Statement
>>> =================
>>> In some cases, a new chunk of critical data may appear during the
>>> PRE_COPY phase. The current API does not provide a mechanism for the
>>> driver to report an updated initial_bytes value when this occurs.
>>>
>>> Solution
>>> ========
>>> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
>>> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new
>>> initial_bytes value during the PRE_COPY phase.
>>>
>>> However, Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations
>>> don't assign info.flags before copy_to_user(), this effectively echoes
>>> userspace-provided flags back as output, preventing the field from being
>>> used to report new reliable data from the drivers.
>>>
>>> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
>>> to explicitly opt in. For that we introduce a new feature named
>>> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>>>
>>> User should opt-in to the above feature with a SET operation, no data is
>>> required and any supplied data is ignored.
>>>
>>> When the caller opts in:
>>> - We set info.flags to zero, otherwise we keep v1 behaviour as is for
>>> compatibility reasons.
>>> - The new output flag VFIO_PRECOPY_INFO_REINIT can be used reliably.
>>> - The VFIO_PRECOPY_INFO_REINIT output flag indicates that new initial
>>> data is present on the stream. The initial_bytes value should be
>>> re-evaluated relative to the readiness state for transition to
>>> STOP_COPY.
>>>
>>> The mlx5 VFIO driver is extended to support this case when the
>>> underlying firmware also supports the REINIT migration state.
>>>
>>> As part of this series, a core helper function is introduced to provide
>>> shared functionality for implementing the VFIO_MIG_GET_PRECOPY_INFO
>>> ioctl, and all drivers have been updated to use it.
>>>
>>> Note:
>>> We may need to send the net/mlx5 patch to VFIO as a pull request to
>>> avoid conflicts prior to acceptance.
>>>
>>> Yishai
>>>
>>> Yishai Hadas (6):
>>> vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
>>> vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
>>> vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
>>> net/mlx5: Add IFC bits for migration state
>>> vfio/mlx5: consider inflight SAVE during PRE_COPY
>>> vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
>>>
>>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +--
>>> drivers/vfio/pci/mlx5/cmd.c | 25 +++-
>>> drivers/vfio/pci/mlx5/cmd.h | 6 +-
>>> drivers/vfio/pci/mlx5/main.c | 118 +++++++++++-------
>>> drivers/vfio/pci/qat/main.c | 17 +--
>>> drivers/vfio/pci/vfio_pci_core.c | 1 +
>>> drivers/vfio/pci/virtio/migrate.c | 17 +--
>>> drivers/vfio/vfio_main.c | 20 +++
>>> include/linux/mlx5/mlx5_ifc.h | 16 ++-
>>> include/linux/vfio.h | 40 ++++++
>>> include/uapi/linux/vfio.h | 22 ++++
>>> samples/vfio-mdev/mtty.c | 16 +--
>>> 12 files changed, 217 insertions(+), 98 deletions(-)
>>>
>>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-03-01 12:43 ` Yishai Hadas
@ 2026-03-10 9:46 ` Yishai Hadas
2026-03-10 10:09 ` Cédric Le Goater
0 siblings, 1 reply; 17+ messages in thread
From: Yishai Hadas @ 2026-03-10 9:46 UTC (permalink / raw)
To: Cédric Le Goater, Alex Williamson, Peter Xu
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
On 01/03/2026 14:43, Yishai Hadas wrote:
> On 28/02/2026 8:27, Cédric Le Goater wrote:
>> Hello,
>>
>> On 2/27/26 21:23, Alex Williamson wrote:
>>>
>>> +Cédric, +Peter, please see what you think of this approach relative to
>>> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
>>> unfortunate, but we need an opt-in for the driver to enable REINIT
>>> reporting anyway. Thanks,
>>>
>>> Alex
>>
>>
>> I took a quick look. The series would be a little cleaner if
>> vfio_check_precopy_ioctl() came first
>
> The motivation to introduce that core helper and adapt all the drivers
> to use it, was to centralize the common code and ensures that output
> flags are cleared on entry.
>
> This can be done only after that we have the previous opt-in patch as we
> would like to keep the V1 behavior for compatibility reasons.
>
>> and some parts are little ugly
>> (precopy_info_flags_fix). Will take a closer look when back from PTO.
>>
>
> I'm open for any better name, any specific suggestion ?
How about renaming it to precopy_info_v2, which is closer to the feature
name?
>
>> Is there a QEMU implementation ?
>
> Yes, please see here [1] the candidate QEMU patches that the kernel
> series was tested with.
>
> [1] https://github.com/avihai1122/qemu/commits/vfio_precopy_info_reinit/
>
Cédric,
Did you have a chance to look at the matching QEMU patches?
For now, only minor notes remain open, and I would like to send a V1
soon to make progress.
Thanks,
Yishai
> Thanks,
> Yishai
>
>>
>> Thanks,
>>
>> C.
>>
>>
>>
>>>
>>> On Tue, 24 Feb 2026 10:20:13 +0200
>>> Yishai Hadas <yishaih@nvidia.com> wrote:
>>>
>>>> This series introduces support for re-initializing the initial_bytes
>>>> value during the VFIO PRE_COPY migration phase.
>>>>
>>>> Background
>>>> ==========
>>>> As currently defined, initial_bytes is monotonically decreasing and
>>>> precedes dirty_bytes when reading from the saving file descriptor.
>>>> The transition from initial_bytes to dirty_bytes is unidirectional and
>>>> irreversible.
>>>>
>>>> The initial_bytes are considered critical data that is highly
>>>> recommended to be transferred to the target as part of PRE_COPY.
>>>> Without this data, the PRE_COPY phase would be ineffective.
>>>>
>>>> Problem Statement
>>>> =================
>>>> In some cases, a new chunk of critical data may appear during the
>>>> PRE_COPY phase. The current API does not provide a mechanism for the
>>>> driver to report an updated initial_bytes value when this occurs.
>>>>
>>>> Solution
>>>> ========
>>>> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
>>>> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers reporting a new
>>>> initial_bytes value during the PRE_COPY phase.
>>>>
>>>> However, Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations
>>>> don't assign info.flags before copy_to_user(), this effectively echoes
>>>> userspace-provided flags back as output, preventing the field from
>>>> being
>>>> used to report new reliable data from the drivers.
>>>>
>>>> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires
>>>> userspace to explicitly opt in. For that we introduce a new feature
>>>> named VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>>>>
>>>> Userspace should opt in to the above feature with a SET operation; no
>>>> data is required and any supplied data is ignored.
>>>>
>>>> When the caller opts in:
>>>> - We set info.flags to zero, otherwise we keep v1 behaviour as is for
>>>> compatibility reasons.
>>>> - The new output flag VFIO_PRECOPY_INFO_REINIT can be used reliably.
>>>> - The VFIO_PRECOPY_INFO_REINIT output flag indicates that new initial
>>>> data is present on the stream. The initial_bytes value should be
>>>> re-evaluated relative to the readiness state for transition to
>>>> STOP_COPY.
>>>>
>>>> The mlx5 VFIO driver is extended to support this case when the
>>>> underlying firmware also supports the REINIT migration state.
>>>>
>>>> As part of this series, a core helper function is introduced to provide
>>>> shared functionality for implementing the VFIO_MIG_GET_PRECOPY_INFO
>>>> ioctl, and all drivers have been updated to use it.
>>>>
>>>> Note:
>>>> We may need to send the net/mlx5 patch to VFIO as a pull request to
>>>> avoid conflicts prior to acceptance.
>>>>
>>>> Yishai
>>>>
>>>> Yishai Hadas (6):
>>>> vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
>>>> vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
>>>> vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
>>>> net/mlx5: Add IFC bits for migration state
>>>> vfio/mlx5: consider inflight SAVE during PRE_COPY
>>>> vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
>>>>
>>>> .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +--
>>>> drivers/vfio/pci/mlx5/cmd.c | 25 +++-
>>>> drivers/vfio/pci/mlx5/cmd.h | 6 +-
>>>> drivers/vfio/pci/mlx5/main.c | 118 +++++++++++-------
>>>> drivers/vfio/pci/qat/main.c | 17 +--
>>>> drivers/vfio/pci/vfio_pci_core.c | 1 +
>>>> drivers/vfio/pci/virtio/migrate.c | 17 +--
>>>> drivers/vfio/vfio_main.c | 20 +++
>>>> include/linux/mlx5/mlx5_ifc.h | 16 ++-
>>>> include/linux/vfio.h | 40 ++++++
>>>> include/uapi/linux/vfio.h | 22 ++++
>>>> samples/vfio-mdev/mtty.c | 16 +--
>>>> 12 files changed, 217 insertions(+), 98 deletions(-)
>>>>
>>>
>>
>
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-03-10 9:46 ` Yishai Hadas
@ 2026-03-10 10:09 ` Cédric Le Goater
2026-03-10 12:44 ` Yishai Hadas
0 siblings, 1 reply; 17+ messages in thread
From: Cédric Le Goater @ 2026-03-10 10:09 UTC (permalink / raw)
To: Yishai Hadas, Alex Williamson, Peter Xu
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
On 3/10/26 10:46, Yishai Hadas wrote:
> On 01/03/2026 14:43, Yishai Hadas wrote:
>> On 28/02/2026 8:27, Cédric Le Goater wrote:
>>> Hello,
>>>
>>> On 2/27/26 21:23, Alex Williamson wrote:
>>>>
>>>> +Cédric, +Peter, please see what you think of this approach relative to
>>>> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
>>>> unfortunate, but we need an opt-in for the driver to enable REINIT
>>>> reporting anyway. Thanks,
>>>>
>>>> Alex
>>>
>>>
>>> I took a quick look. The series would be a little cleaner if
>>> vfio_check_precopy_ioctl() came first
>>
>> The motivation to introduce that core helper and adapt all the drivers to use it was to centralize the common code and ensure that output flags are cleared on entry.
>>
>> This can be done only after we have the previous opt-in patch, as we would like to keep the V1 behavior for compatibility reasons.
>>
>>> and some parts are a little ugly
>>> (precopy_info_flags_fix). Will take a closer look when back from PTO.
>>>
>>
>> I'm open to a better name. Any specific suggestion?
>
> How about renaming it to precopy_info_v2, which is closer to the feature name ?
I found the '_fix' suffix confusing and saw no connection with
the names of the new flag (VFIO_PRECOPY_INFO_REINIT) or the feature
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
Based on how 'precopy_info_flags_fix' is used in mlx5vf_precopy_ioctl() :
+ /*
+ * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
+ * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
+ */
+ reinit_state = mvdev->core_device.vdev.precopy_info_flags_fix &&
+ migration_state == MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
a name like 'precopy_reinit_allow' seems appropriate to me.
'precopy_info_v2' could work just as well, but that would tie the
precopy reinit/reset to an ioctl, which might be a bit limiting.
As you wish.
>>> Is there a QEMU implementation ?
>>
>> Yes, please see here [1] the candidate QEMU patches that the kernel series was tested with.
>>
>> [1] https://github.com/avihai1122/qemu/commits/vfio_precopy_info_reinit/
>>
>
> Cedric,
> Did you have the chance to look at the matching QEMU patches ?
I will this week. I just got back and work has piled up on many fronts.
> For now, only minor notes remain open, and I would like to send a V1 soon to make progress.
You should send the QEMU series. Naming is not a blocker, and the series
will help assess kernel support and compatibility.
Thanks,
C.
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-03-10 10:09 ` Cédric Le Goater
@ 2026-03-10 12:44 ` Yishai Hadas
2026-03-10 16:00 ` Alex Williamson
0 siblings, 1 reply; 17+ messages in thread
From: Yishai Hadas @ 2026-03-10 12:44 UTC (permalink / raw)
To: Cédric Le Goater, Alex Williamson, Peter Xu
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih,
liulongfang, giovanni.cabiddu, kwankhede
On 10/03/2026 12:09, Cédric Le Goater wrote:
> On 3/10/26 10:46, Yishai Hadas wrote:
>> On 01/03/2026 14:43, Yishai Hadas wrote:
>>> On 28/02/2026 8:27, Cédric Le Goater wrote:
>>>> Hello,
>>>>
>>>> On 2/27/26 21:23, Alex Williamson wrote:
>>>>>
>>>>> +Cédric, +Peter, please see what you think of this approach
>>>>> relative to
>>>>> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
>>>>> unfortunate, but we need an opt-in for the driver to enable REINIT
>>>>> reporting anyway. Thanks,
>>>>>
>>>>> Alex
>>>>
>>>>
>>>> I took a quick look. The series would be a little cleaner if
>>>> vfio_check_precopy_ioctl() came first
>>>
>>> The motivation to introduce that core helper and adapt all the
>>> drivers to use it was to centralize the common code and ensure that
>>> output flags are cleared on entry.
>>>
>>> This can be done only after we have the previous opt-in patch, as
>>> we would like to keep the V1 behavior for compatibility reasons.
>>>
>>>> and some parts are a little ugly
>>>> (precopy_info_flags_fix). Will take a closer look when back from PTO.
>>>>
>>>
>>> I'm open for any better name, any specific suggestion ?
>>
>> How about renaming it to precopy_info_v2, which is closer to the
>> feature name ?
>
> I found the '_fix' suffix confusing and saw no connection with
> the names of the new flag (VFIO_PRECOPY_INFO_REINIT) or the feature
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>
>
> Based on how 'precopy_info_flags_fix' is used in mlx5vf_precopy_ioctl() :
>
> + /*
> + * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
> + * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
> + */
> + reinit_state = mvdev->core_device.vdev.precopy_info_flags_fix &&
> + migration_state ==
> MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
>
>
> a name like 'precopy_reinit_allow' seems appropriate to me.
The opt-in allows using the 'flags' field not only for the reinit case,
but for any further usage.
See also its usage as part of vfio_check_precopy_ioctl(), where we
set info->flags to 0 based on it.
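
To make the flags handling concrete, here is a minimal userspace model of
it. It is only a sketch and not the code from the series: the struct
mirrors the layout of struct vfio_precopy_info, while the sketch_* names
and the bit value chosen for the REINIT flag are assumptions made for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the layout of struct vfio_precopy_info. */
struct sketch_precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Hypothetical bit value; the real one is defined by the uAPI patch. */
#define SKETCH_PRECOPY_INFO_REINIT (1u << 0)

/*
 * Models the flags handling only. With the v2 opt-in, flags start from
 * zero, so every bit userspace reads back is driver output. Without the
 * opt-in, the userspace-provided value is left untouched (v1 behaviour).
 */
static void sketch_fill_flags(struct sketch_precopy_info *info,
			      bool opted_in_v2, bool driver_reinit)
{
	assert(info != NULL);

	if (!opted_in_v2)
		return;

	info->flags = 0;
	if (driver_reinit)
		info->flags |= SKETCH_PRECOPY_INFO_REINIT;
}
```

With a garbage input value in flags, the v1 path hands the same garbage
back, while the v2 path returns only the bits the driver chose to raise.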
> 'precopy_info_v2' could work just as well but that would tie the
> precopy reinit/reset to an ioctl which might be a bit limiting.
> As you wish.
IMO, renaming to precopy_info_v2 seems more suitable to describe the
feature/expected usage.
>
>
>>>> Is there a QEMU implementation ?
>>>
>>> Yes, please see here [1] the candidate QEMU patches that the kernel
>>> series was tested with.
>>>
>>> [1] https://github.com/avihai1122/qemu/commits/vfio_precopy_info_reinit/
>>>
>>
>> Cedric,
>> Did you have the chance to look at the matching QEMU patches ?
>
> I will this week. I just got back and work has piled up on many fronts.
>
>> For now, only minor notes remain open, and I would like to send a V1
>> soon to make progress.
>
> you should send the QEMU series. Naming is not a blocker and it will
> help assess kernel support and compatibility.
>
Avihai will probably send the QEMU series next week.
In the meantime, you can review the patches at the GitHub URL
published above.
I plan to send the kernel V1 soon.
Yishai
* Re: [PATCH vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-03-10 12:44 ` Yishai Hadas
@ 2026-03-10 16:00 ` Alex Williamson
0 siblings, 0 replies; 17+ messages in thread
From: Alex Williamson @ 2026-03-10 16:00 UTC (permalink / raw)
To: Yishai Hadas
Cc: Cédric Le Goater, Peter Xu, jgg, kvm, kevin.tian,
joao.m.martins, leonro, maorg, avihaih, liulongfang,
giovanni.cabiddu, kwankhede, alex
On Tue, 10 Mar 2026 14:44:04 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> On 10/03/2026 12:09, Cédric Le Goater wrote:
> > On 3/10/26 10:46, Yishai Hadas wrote:
> >> On 01/03/2026 14:43, Yishai Hadas wrote:
> >>> On 28/02/2026 8:27, Cédric Le Goater wrote:
> >>>> Hello,
> >>>>
> >>>> On 2/27/26 21:23, Alex Williamson wrote:
> >>>>>
> >>>>> +Cédric, +Peter, please see what you think of this approach
> >>>>> relative to
> >>>>> QEMU. The broken uAPI for flags on the PRECOPY_INFO ioctl is
> >>>>> unfortunate, but we need an opt-in for the driver to enable REINIT
> >>>>> reporting anyway. Thanks,
> >>>>>
> >>>>> Alex
> >>>>
> >>>>
> >>>> I took a quick look. The series would be a little cleaner if
> >>>> vfio_check_precopy_ioctl() came first
> >>>
> >>> The motivation to introduce that core helper and adapt all the
> >>> drivers to use it was to centralize the common code and ensure that
> >>> output flags are cleared on entry.
> >>>
> >>> This can be done only after we have the previous opt-in patch, as
> >>> we would like to keep the V1 behavior for compatibility reasons.
> >>>
> >>>> and some parts are little ugly
> >>>> (precopy_info_flags_fix). Will take a closer look when back from PTO.
> >>>>
> >>>
> >>> I'm open for any better name, any specific suggestion ?
> >>
> >> How about renaming it to precopy_info_v2, which is closer to the
> >> feature name ?
> >
> > I found the prefix '_fix' confusing and saw no connection with
> > the names of the new flag (VFIO_PRECOPY_INFO_REINIT) or ioctl
> > VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
> >
> >
> > Based on how 'precopy_info_flags_fix' is used in mlx5vf_precopy_ioctl() :
> >
> > + /*
> > + * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
> > + * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
> > + */
> > + reinit_state = mvdev->core_device.vdev.precopy_info_flags_fix &&
> > + migration_state ==
> > MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
> >
> >
> > a name like 'precopy_reinit_allow' seems appropriate to me.
>
> The opt-in allows using the 'flags' field not only for the reinit case,
> but for any further usage.
Yes. It started out as an opt-in for reinit, but then the failure to
sanitize flags was discovered and it seemed to make sense that if we're
defining a v2 interface where flags is valid, reinit could be a
fundamental part of the new base v2 protocol.
If a driver doesn't want/need to support reinit, it never needs to
raise that flag. Userspace can also always choose to ignore the flag
and transition to stop-copy at any point. It didn't seem worth a
separate opt-in beyond v2, but do comment if you have other ideas.
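
As an illustration of the userspace side, the following is a minimal
sketch of one possible policy, not the QEMU implementation. The sketch_*
types, the honor_reinit knob, and the bit value used for the REINIT flag
are all assumptions introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit value; the real one is defined by the uAPI patch. */
#define SKETCH_PRECOPY_INFO_REINIT (1u << 0)

/* The fields of a PRECOPY_INFO reply that this policy looks at. */
struct sketch_info {
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Hypothetical per-device bookkeeping kept by the migration code. */
struct sketch_migration {
	bool initial_data_done;	/* all reported initial data was read */
	bool honor_reinit;	/* false: ignore the REINIT flag */
};

/*
 * Returns true when this policy considers the device ready for
 * STOP_COPY. A REINIT report withdraws readiness until initial_bytes
 * drains to zero again, unless userspace chose to ignore the flag.
 */
static bool sketch_update_readiness(struct sketch_migration *m,
				    const struct sketch_info *info)
{
	assert(m != NULL && info != NULL);

	if (m->honor_reinit && (info->flags & SKETCH_PRECOPY_INFO_REINIT))
		m->initial_data_done = false;
	if (info->initial_bytes == 0)
		m->initial_data_done = true;

	return m->initial_data_done;
}
```

With honor_reinit cleared, a device that was already ready stays ready no
matter what the driver reports, which corresponds to userspace ignoring
the flag.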
This could have also gone the route of making a new v2 ioctl, but this
seemed slightly cleaner. Thanks,
Alex