* [PATCH V2 vfio 1/6] vfio: Define uAPI for re-init initial bytes during the PRE_COPY phase
2026-03-17 16:17 [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
@ 2026-03-17 16:17 ` Yishai Hadas
2026-03-17 16:17 ` [PATCH V2 vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 Yishai Hadas
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-03-17 16:17 UTC (permalink / raw)
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
As currently defined, initial_bytes is monotonically decreasing and
precedes dirty_bytes when reading from the saving file descriptor.
The transition from initial_bytes to dirty_bytes is unidirectional and
irreversible.
The initial_bytes are considered critical data that is highly
recommended to be transferred to the target as part of PRE_COPY;
without this data, the PRE_COPY phase would be ineffective.
This patch addresses the case where a new chunk of critical data is
introduced during the PRE_COPY phase and the driver needs to report
an entirely new value for initial_bytes.
To that end, extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
flag named VFIO_PRECOPY_INFO_REINIT that allows drivers to report a new
initial_bytes value during the PRE_COPY phase.
Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
assign info.flags before copy_to_user(); this effectively echoes
userspace-provided flags back as output, which prevents the field from
being used to report new, reliable data from the drivers.
Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
to explicitly opt in by enabling the
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 device feature.
When the caller opts in, the driver may report an entirely new
value for initial_bytes. The new value may be larger or smaller than
the previous one, and it may either include or discard the previously
unread initial_bytes, depending on the driver logic and state.
The presence of the VFIO_PRECOPY_INFO_REINIT output flag set by the
driver indicates that new initial data is present on the stream.
Once the caller sees this flag, the initial_bytes value should be
re-evaluated relative to the readiness state for transition to
STOP_COPY.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
include/uapi/linux/vfio.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index bb7b89330d35..bb4a2df0550d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1266,6 +1266,19 @@ enum vfio_device_mig_state {
* The initial_bytes field indicates the amount of initial precopy
* data available from the device. This field should have a non-zero initial
* value and decrease as migration data is read from the device.
+ * The presence of the VFIO_PRECOPY_INFO_REINIT output flag indicates
+ * that new initial data is present on the stream.
+ * The new initial data may result, for example, from device reconfiguration
+ * during migration that requires additional initialization data.
+ * In that case initial_bytes may report a non-zero value irrespective of
+ * any previously reported values, which progresses towards zero as precopy
+ * data is read from the data stream. dirty_bytes is also reset
+ * to zero and represents the state change of the device relative to the new
+ * initial_bytes.
+ * VFIO_PRECOPY_INFO_REINIT can be reported only after userspace opts in to
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. Without this opt-in, the flags field
+ * of struct vfio_precopy_info is reserved for bug-compatibility reasons.
+ *
* It is recommended to leave PRE_COPY for STOP_COPY only after this field
* reaches zero. Leaving PRE_COPY earlier might make things slower.
*
@@ -1301,6 +1314,7 @@ enum vfio_device_mig_state {
struct vfio_precopy_info {
__u32 argsz;
__u32 flags;
+#define VFIO_PRECOPY_INFO_REINIT (1 << 0) /* output - new initial data is present */
__aligned_u64 initial_bytes;
__aligned_u64 dirty_bytes;
};
@@ -1510,6 +1524,16 @@ struct vfio_device_feature_dma_buf {
struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
};
+/*
+ * Enables the migration precopy_info_v2 behaviour.
+ *
+ * VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
+ *
+ * On SET, enables the v2 precopy_info behaviour, in which
+ * vfio_precopy_info.flags is a valid output field.
+ */
+#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
--
2.18.1
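For illustration, the userspace side of this contract might look roughly like the sketch below. It is a hypothetical model, not QEMU code and not part of the uAPI: struct precopy_info and PRECOPY_INFO_REINIT are local mirrors of the definitions added in this patch, while struct precopy_tracker and precopy_update() are invented names. The ioctl calls are omitted; the function only shows how a caller could honor VFIO_PRECOPY_INFO_REINIT by re-evaluating its readiness for STOP_COPY.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct vfio_precopy_info as extended by this patch. */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Mirrors VFIO_PRECOPY_INFO_REINIT (1 << 0). */
#define PRECOPY_INFO_REINIT (1u << 0)

/* Hypothetical per-device state kept by a VMM. */
struct precopy_tracker {
	bool v2_enabled;	/* SET of ..._MIG_PRECOPY_INFOv2 succeeded */
	bool initial_done;	/* initial_bytes has been observed at zero */
	unsigned int reinits;	/* number of REINIT events seen */
};

/*
 * Feed one VFIO_MIG_GET_PRECOPY_INFO result into the tracker. Returns true
 * when leaving PRE_COPY for STOP_COPY is recommended, i.e. all currently
 * known initial data has been read.
 */
static bool precopy_update(struct precopy_tracker *t,
			   const struct precopy_info *info)
{
	/* Without the opt-in, flags is reserved and may echo user input. */
	if (t->v2_enabled && (info->flags & PRECOPY_INFO_REINIT)) {
		/* New initial data: drop any earlier "initial done" state. */
		t->initial_done = false;
		t->reinits++;
	}

	if (info->initial_bytes == 0)
		t->initial_done = true;

	return t->initial_done;
}
```

Under v1 semantics the latch is redundant, since initial_bytes only decreases; it starts to matter once REINIT lets initial_bytes grow again after having reached zero.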
* [PATCH V2 vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
From: Yishai Hadas @ 2026-03-17 16:17 UTC (permalink / raw)
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
assign info.flags before copy_to_user().
Because they copy the struct in from userspace first, this effectively
echoes userspace-provided flags back as output, preventing the field
from being used to report new reliable data from the drivers.
Add support for a new device feature named
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
On SET, enables the v2 pre_copy_info behaviour, where the
vfio_precopy_info.flags is a valid output field.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/vfio_pci_core.c | 1 +
drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
include/linux/vfio.h | 1 +
3 files changed, 22 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index d43745fe4c84..1daaceb5b2c8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_core_disable(vdev);
+ core_vdev->precopy_info_v2 = 0;
vfio_pci_dma_buf_cleanup(vdev);
mutex_lock(&vdev->igate);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 742477546b15..dcb879018f27 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -964,6 +964,23 @@ vfio_ioctl_device_feature_migration_data_size(struct vfio_device *device,
return 0;
}
+static int
+vfio_ioctl_device_feature_migration_precopy_info_v2(struct vfio_device *device,
+ u32 flags, size_t argsz)
+{
+ int ret;
+
+ if (!(device->migration_flags & VFIO_MIGRATION_PRE_COPY))
+ return -EINVAL;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0);
+ if (ret != 1)
+ return ret;
+
+ device->precopy_info_v2 = 1;
+ return 0;
+}
+
static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
u32 flags, void __user *arg,
size_t argsz)
@@ -1251,6 +1268,9 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
return vfio_ioctl_device_feature_migration_data_size(
device, feature.flags, arg->data,
feature.argsz - minsz);
+ case VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2:
+ return vfio_ioctl_device_feature_migration_precopy_info_v2(
+ device, feature.flags, feature.argsz - minsz);
default:
if (unlikely(!device->ops->device_feature))
return -ENOTTY;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index e90859956514..7c1d33283e04 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -52,6 +52,7 @@ struct vfio_device {
struct vfio_device_set *dev_set;
struct list_head dev_set_list;
unsigned int migration_flags;
+ u8 precopy_info_v2;
struct kvm *kvm;
/* Members below here are private, not for driver use */
--
2.18.1
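The SET handler in this patch can be exercised outside the kernel with a small model. This is a sketch under assumptions: model_check_feature() restates the logic of the in-kernel vfio_check_feature() helper, the FEATURE_* and MIGRATION_PRE_COPY constants are local copies assumed to match the uAPI values, and struct model_device is a hypothetical stand-in for struct vfio_device.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies, assumed to match include/uapi/linux/vfio.h. */
#define FEATURE_GET		(1u << 16)	/* VFIO_DEVICE_FEATURE_GET */
#define FEATURE_SET		(1u << 17)	/* VFIO_DEVICE_FEATURE_SET */
#define FEATURE_PROBE		(1u << 18)	/* VFIO_DEVICE_FEATURE_PROBE */
#define MIGRATION_PRE_COPY	(1u << 2)	/* VFIO_MIGRATION_PRE_COPY */

/* Hypothetical stand-in for the relevant struct vfio_device fields. */
struct model_device {
	unsigned int migration_flags;
	uint8_t precopy_info_v2;
};

/* Restatement of vfio_check_feature(): 0 = PROBE ok, 1 = proceed. */
static int model_check_feature(uint32_t flags, size_t argsz,
			       uint32_t supported_ops, size_t minsz)
{
	if ((flags & (FEATURE_GET | FEATURE_SET)) & ~supported_ops)
		return -EINVAL;
	if (flags & FEATURE_PROBE)
		return 0;
	/* Without PROBE, one of GET or SET must be requested. */
	if (!(flags & (FEATURE_GET | FEATURE_SET)))
		return -EINVAL;
	if (argsz < minsz)
		return -EINVAL;
	return 1;
}

/* Mirrors vfio_ioctl_device_feature_migration_precopy_info_v2(). */
static int model_precopy_info_v2(struct model_device *dev, uint32_t flags,
				 size_t argsz)
{
	int ret;

	if (!(dev->migration_flags & MIGRATION_PRE_COPY))
		return -EINVAL;

	ret = model_check_feature(flags, argsz, FEATURE_SET, 0);
	if (ret != 1)
		return ret;	/* 0 for a successful PROBE, -errno on error */

	dev->precopy_info_v2 = 1;
	return 0;
}
```

One consequence visible in the model: a PROBE succeeds without enabling anything, so userspace can detect support first, whereas on a device without VFIO_MIGRATION_PRE_COPY even a PROBE fails with -EINVAL.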
* Re: [PATCH V2 vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
From: Alex Williamson @ 2026-03-18 22:03 UTC (permalink / raw)
To: Yishai Hadas
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih, clg,
peterx, liulongfang, giovanni.cabiddu, kwankhede, alex
On Tue, 17 Mar 2026 18:17:49 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
> assign info.flags before copy_to_user().
>
> Because they copy the struct in from userspace first, this effectively
> echoes userspace-provided flags back as output, preventing the field
> from being used to report new reliable data from the drivers.
>
> Add support for a new device feature named
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>
> On SET, enables the v2 pre_copy_info behaviour, where the
> vfio_precopy_info.flags is a valid output field.
>
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> ---
> drivers/vfio/pci/vfio_pci_core.c | 1 +
> drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
> include/linux/vfio.h | 1 +
> 3 files changed, 22 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index d43745fe4c84..1daaceb5b2c8 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> #endif
> vfio_pci_core_disable(vdev);
>
> + core_vdev->precopy_info_v2 = 0;
> vfio_pci_dma_buf_cleanup(vdev);
There's a minor discrepancy here, enabling precopy_info_v2 is a core
vfio feature, but clearing the previous user's opt-in is only
implemented in the core helper for vfio-pci and associated variant
drivers. This should be moved to vfio_df_device_last_close() to be
common. A follow-up fix rather than a v3 is fine if you agree. Thanks,
Alex
* Re: [PATCH V2 vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
From: Yishai Hadas @ 2026-03-19 15:26 UTC (permalink / raw)
To: Alex Williamson
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih, clg,
peterx, liulongfang, giovanni.cabiddu, kwankhede
On 19/03/2026 0:03, Alex Williamson wrote:
> On Tue, 17 Mar 2026 18:17:49 +0200
> Yishai Hadas <yishaih@nvidia.com> wrote:
>
>> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
>> assign info.flags before copy_to_user().
>>
>> Because they copy the struct in from userspace first, this effectively
>> echoes userspace-provided flags back as output, preventing the field
>> from being used to report new reliable data from the drivers.
>>
>> Add support for a new device feature named
>> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>>
>> On SET, enables the v2 pre_copy_info behaviour, where the
>> vfio_precopy_info.flags is a valid output field.
>>
>> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
>> ---
>> drivers/vfio/pci/vfio_pci_core.c | 1 +
>> drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
>> include/linux/vfio.h | 1 +
>> 3 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
>> index d43745fe4c84..1daaceb5b2c8 100644
>> --- a/drivers/vfio/pci/vfio_pci_core.c
>> +++ b/drivers/vfio/pci/vfio_pci_core.c
>> @@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
>> #endif
>> vfio_pci_core_disable(vdev);
>>
>> + core_vdev->precopy_info_v2 = 0;
>> vfio_pci_dma_buf_cleanup(vdev);
>
> There's a minor discrepancy here, enabling precopy_info_v2 is a core
> vfio feature, but clearing the previous user's opt-in is only
> implemented in the core helper for vfio-pci and associated variant
> drivers. This should be moved to vfio_df_device_last_close() to be
> common. A follow-up fix rather than a v3 is fine if you agree. Thanks,
>
Sure, up to you.
We can consider also using the below fix-up chunk as part of merging V2.
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 1daaceb5b2c8..d43745fe4c84 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -736,7 +736,6 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
#endif
vfio_pci_core_disable(vdev);
- core_vdev->precopy_info_v2 = 0;
vfio_pci_dma_buf_cleanup(vdev);
mutex_lock(&vdev->igate);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index dcb879018f27..8666f35fb3f0 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -553,6 +553,7 @@ static void vfio_df_device_last_close(struct vfio_device_file *df)
vfio_df_iommufd_unbind(df);
else
vfio_device_group_unuse_iommu(device);
+ device->precopy_info_v2 = 0;
module_put(device->dev->driver->owner);
}
Yishai
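The effect of this fix-up can be modelled in a few lines. This hypothetical sketch assumes the core's open accounting behaves like a plain counter whose transition from 1 to 0 triggers vfio_df_device_last_close(); struct model_device, model_open() and model_close() are invented names. Because the reset lives in the core close path, it applies to every driver, not only to vfio-pci and its variants.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the core open/close accounting. */
struct model_device {
	unsigned int open_count;
	uint8_t precopy_info_v2;
};

static void model_open(struct model_device *dev)
{
	dev->open_count++;
}

/* The opt-in is dropped by the core on last close, for any driver. */
static void model_close(struct model_device *dev)
{
	if (dev->open_count == 1)
		dev->precopy_info_v2 = 0;	/* vfio_df_device_last_close() */
	dev->open_count--;
}
```

With this placement, a later user that never opts in gets the v1 behaviour again, regardless of what a previous user enabled.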
* Re: [PATCH V2 vfio 2/6] vfio: Add support for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2
From: Alex Williamson @ 2026-03-19 18:28 UTC (permalink / raw)
To: Yishai Hadas
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih, clg,
peterx, liulongfang, giovanni.cabiddu, kwankhede, alex
On Thu, 19 Mar 2026 17:26:46 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> On 19/03/2026 0:03, Alex Williamson wrote:
> > On Tue, 17 Mar 2026 18:17:49 +0200
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >
> >> Currently, existing VFIO_MIG_GET_PRECOPY_INFO implementations don't
> >> assign info.flags before copy_to_user().
> >>
> >> Because they copy the struct in from userspace first, this effectively
> >> echoes userspace-provided flags back as output, preventing the field
> >> from being used to report new reliable data from the drivers.
> >>
> >> Add support for a new device feature named
> >> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
> >>
> >> On SET, enables the v2 pre_copy_info behaviour, where the
> >> vfio_precopy_info.flags is a valid output field.
> >>
> >> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> >> ---
> >> drivers/vfio/pci/vfio_pci_core.c | 1 +
> >> drivers/vfio/vfio_main.c | 20 ++++++++++++++++++++
> >> include/linux/vfio.h | 1 +
> >> 3 files changed, 22 insertions(+)
> >>
> >> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> >> index d43745fe4c84..1daaceb5b2c8 100644
> >> --- a/drivers/vfio/pci/vfio_pci_core.c
> >> +++ b/drivers/vfio/pci/vfio_pci_core.c
> >> @@ -736,6 +736,7 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
> >> #endif
> >> vfio_pci_core_disable(vdev);
> >>
> >> + core_vdev->precopy_info_v2 = 0;
> >> vfio_pci_dma_buf_cleanup(vdev);
> >
> > There's a minor discrepancy here, enabling precopy_info_v2 is a core
> > vfio feature, but clearing the previous user's opt-in is only
> > implemented in the core helper for vfio-pci and associated variant
> > drivers. This should be moved to vfio_df_device_last_close() to be
> > common. A follow-up fix rather than a v3 is fine if you agree. Thanks,
> >
>
> Sure, up to you.
>
> We can consider also using the below fix-up chunk as part of merging V2.
I'll fold it in. Thanks,
Alex
* [PATCH V2 vfio 3/6] vfio: Adapt drivers to use the core helper vfio_check_precopy_ioctl
From: Yishai Hadas @ 2026-03-17 16:17 UTC (permalink / raw)
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
Introduce a core helper function for VFIO_MIG_GET_PRECOPY_INFO and adapt
all drivers to use it.
The helper centralizes the common code and ensures that the output flags
are cleared on entry when the user has opted in to
VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2. This prevents any unintended
echoing of userspace data back to userspace.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 17 +++-----
drivers/vfio/pci/mlx5/main.c | 18 +++------
drivers/vfio/pci/qat/main.c | 17 +++-----
drivers/vfio/pci/virtio/migrate.c | 17 +++-----
include/linux/vfio.h | 39 +++++++++++++++++++
samples/vfio-mdev/mtty.c | 16 +++-----
6 files changed, 68 insertions(+), 56 deletions(-)
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index 1d367cff7dcf..bb121f635b9f 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -857,18 +857,12 @@ static long hisi_acc_vf_precopy_ioctl(struct file *filp,
struct hisi_acc_vf_core_device *hisi_acc_vdev = migf->hisi_acc_vdev;
loff_t *pos = &filp->f_pos;
struct vfio_precopy_info info;
- unsigned long minsz;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&hisi_acc_vdev->core_device.vdev, cmd,
+ arg, &info);
+ if (ret)
+ return ret;
mutex_lock(&hisi_acc_vdev->state_mutex);
if (hisi_acc_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY) {
@@ -893,7 +887,8 @@ static long hisi_acc_vf_precopy_ioctl(struct file *filp,
mutex_unlock(&migf->lock);
mutex_unlock(&hisi_acc_vdev->state_mutex);
- return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ return copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
out:
mutex_unlock(&migf->lock);
mutex_unlock(&hisi_acc_vdev->state_mutex);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index dbba6173894b..fb541c17c712 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -463,21 +463,14 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mlx5_vhca_data_buffer *buf;
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
- unsigned long minsz;
size_t inc_length = 0;
bool end_of_data = false;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
-
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&mvdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&mvdev->state_mutex);
if (mvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -545,7 +538,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
done:
mlx5vf_state_mutex_unlock(mvdev);
- if (copy_to_user((void __user *)arg, &info, minsz))
+ if (copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)))
return -EFAULT;
return 0;
diff --git a/drivers/vfio/pci/qat/main.c b/drivers/vfio/pci/qat/main.c
index b982d4ae666c..b3a4b7a55696 100644
--- a/drivers/vfio/pci/qat/main.c
+++ b/drivers/vfio/pci/qat/main.c
@@ -121,18 +121,12 @@ static long qat_vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct qat_mig_dev *mig_dev = qat_vdev->mdev;
struct vfio_precopy_info info;
loff_t *pos = &filp->f_pos;
- unsigned long minsz;
int ret = 0;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&qat_vdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&qat_vdev->state_mutex);
if (qat_vdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -160,7 +154,8 @@ static long qat_vf_precopy_ioctl(struct file *filp, unsigned int cmd,
mutex_unlock(&qat_vdev->state_mutex);
if (ret)
return ret;
- return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ return copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
}
static ssize_t qat_vf_save_read(struct file *filp, char __user *buf,
diff --git a/drivers/vfio/pci/virtio/migrate.c b/drivers/vfio/pci/virtio/migrate.c
index 35fa2d6ed611..7e11834ad512 100644
--- a/drivers/vfio/pci/virtio/migrate.c
+++ b/drivers/vfio/pci/virtio/migrate.c
@@ -443,19 +443,13 @@ static long virtiovf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
bool end_of_data = false;
- unsigned long minsz;
u32 ctx_size = 0;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
-
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&virtvdev->core_device.vdev, cmd, arg,
+ &info);
+ if (ret)
+ return ret;
mutex_lock(&virtvdev->state_mutex);
if (virtvdev->mig_state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -514,7 +508,8 @@ static long virtiovf_precopy_ioctl(struct file *filp, unsigned int cmd,
done:
virtiovf_state_mutex_unlock(virtvdev);
- if (copy_to_user((void __user *)arg, &info, minsz))
+ if (copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)))
return -EFAULT;
return 0;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 7c1d33283e04..50b474334a19 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -16,6 +16,7 @@
#include <linux/cdev.h>
#include <uapi/linux/vfio.h>
#include <linux/iova_bitmap.h>
+#include <linux/uaccess.h>
struct kvm;
struct iommufd_ctx;
@@ -285,6 +286,44 @@ static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops,
return 1;
}
+/**
+ * vfio_check_precopy_ioctl - Validate user input for the VFIO_MIG_GET_PRECOPY_INFO ioctl
+ * @vdev: The vfio device
+ * @cmd: Cmd from the ioctl
+ * @arg: Arg from the ioctl
+ * @info: Driver pointer to hold the userspace input to the ioctl
+ *
+ * For use in a driver's get_precopy_info. Checks that the inputs to the
+ * VFIO_MIG_GET_PRECOPY_INFO ioctl are correct.
+ *
+ * Returns 0 on success, otherwise errno.
+ */
+
+static inline int
+vfio_check_precopy_ioctl(struct vfio_device *vdev, unsigned int cmd,
+ unsigned long arg, struct vfio_precopy_info *info)
+{
+ unsigned long minsz;
+
+ if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
+ return -ENOTTY;
+
+ minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
+
+ if (copy_from_user(info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info->argsz < minsz)
+ return -EINVAL;
+
+ /* keep v1 behaviour as is for compatibility reasons */
+ if (vdev->precopy_info_v2)
+ /* flags are output, set their initial value to 0 */
+ info->flags = 0;
+
+ return 0;
+}
+
struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev,
const struct vfio_device_ops *ops);
#define vfio_alloc_device(dev_struct, member, dev, ops) \
diff --git a/samples/vfio-mdev/mtty.c b/samples/vfio-mdev/mtty.c
index bd92c38379b8..c1070af69544 100644
--- a/samples/vfio-mdev/mtty.c
+++ b/samples/vfio-mdev/mtty.c
@@ -837,18 +837,11 @@ static long mtty_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mdev_state *mdev_state = migf->mdev_state;
loff_t *pos = &filp->f_pos;
struct vfio_precopy_info info = {};
- unsigned long minsz;
int ret;
- if (cmd != VFIO_MIG_GET_PRECOPY_INFO)
- return -ENOTTY;
-
- minsz = offsetofend(struct vfio_precopy_info, dirty_bytes);
-
- if (copy_from_user(&info, (void __user *)arg, minsz))
- return -EFAULT;
- if (info.argsz < minsz)
- return -EINVAL;
+ ret = vfio_check_precopy_ioctl(&mdev_state->vdev, cmd, arg, &info);
+ if (ret)
+ return ret;
mutex_lock(&mdev_state->state_mutex);
if (mdev_state->state != VFIO_DEVICE_STATE_PRE_COPY &&
@@ -875,7 +868,8 @@ static long mtty_precopy_ioctl(struct file *filp, unsigned int cmd,
info.initial_bytes = migf->filled_size - *pos;
mutex_unlock(&migf->lock);
- ret = copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0;
+ ret = copy_to_user((void __user *)arg, &info,
+ offsetofend(struct vfio_precopy_info, dirty_bytes)) ? -EFAULT : 0;
unlock:
mtty_state_mutex_unlock(mdev_state);
return ret;
--
2.18.1
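The echo problem and the helper's fix can be reproduced in a userspace model. This is a sketch, not the kernel helper: memcpy() stands in for copy_from_user(), the cmd check is dropped, model_check_precopy() and OFFSETOFEND() are invented names, and struct precopy_info is a local mirror of struct vfio_precopy_info whose natural-alignment offsets (0, 4, 8, 16) are assumed to match the uAPI. Note that minsz covers flags, so whatever the helper leaves in info->flags is what the drivers copy back.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct vfio_precopy_info. */
struct precopy_info {
	uint32_t argsz;
	uint32_t flags;
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
};

/* Same idea as the kernel's offsetofend(). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/*
 * Model of vfio_check_precopy_ioctl() without the cmd check;
 * memcpy() stands in for copy_from_user().
 */
static int model_check_precopy(int precopy_info_v2, const void *user_arg,
			       struct precopy_info *info)
{
	size_t minsz = OFFSETOFEND(struct precopy_info, dirty_bytes);

	memcpy(info, user_arg, minsz);
	if (info->argsz < minsz)
		return -EINVAL;

	/* v1 keeps whatever userspace passed in (the echo). */
	if (precopy_info_v2)
		info->flags = 0;	/* flags are output-only in v2 */

	return 0;
}
```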
* [PATCH V2 vfio 4/6] net/mlx5: Add IFC bits for migration state
From: Yishai Hadas @ 2026-03-17 16:17 UTC
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
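Add the IFC bits needed to query the device's migration state: a
migration_state capability bit in cmd_hca_cap_2 and a 4-bit
migration_state field in the QUERY_VHCA_MIGRATION_STATE output.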
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 775cb0c56865..1c8922c58c8f 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -2173,7 +2173,8 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
u8 sf_eq_usage[0x1];
u8 reserved_at_d3[0x5];
u8 multiplane[0x1];
- u8 reserved_at_d9[0x7];
+ u8 migration_state[0x1];
+ u8 reserved_at_da[0x6];
u8 cross_vhca_object_to_object_supported[0x20];
@@ -13280,13 +13281,24 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
u8 reserved_at_60[0x20];
};
+enum {
+ MLX5_QUERY_VHCA_MIG_STATE_UNINITIALIZED = 0x0,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_IDLE = 0x1,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_READY = 0x2,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_DIRTY = 0x3,
+ MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT = 0x4,
+};
+
struct mlx5_ifc_query_vhca_migration_state_out_bits {
u8 status[0x8];
u8 reserved_at_8[0x18];
u8 syndrome[0x20];
- u8 reserved_at_40[0x40];
+ u8 reserved_at_40[0x20];
+
+ u8 migration_state[0x4];
+ u8 reserved_at_64[0x1c];
u8 required_umem_size[0x20];
--
2.18.1
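The new output field sits immediately after syndrome. reserved_at_40 shrinks from 0x40 to 0x20 bits, so migration_state starts at bit 0x60. Together with reserved_at_64[0x1c], the region still ends at bit 0x80, which leaves required_umem_size at its old offset. In the same way, the capability bit takes bit 0xd9 from the former seven reserved bits. The sketch below decodes these fields from a raw big-endian command output. It assumes the MLX5_GET() convention that bit 0 is the most significant bit of the first dword. prm_get(), the OUT_* constants and the shortened MIG_STATE_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the MLX5_QUERY_VHCA_MIG_STATE_* values added above. */
enum {
	MIG_STATE_UNINITIALIZED = 0x0,
	MIG_STATE_OPER_MIGRATION_IDLE = 0x1,
	MIG_STATE_OPER_MIGRATION_READY = 0x2,
	MIG_STATE_OPER_MIGRATION_DIRTY = 0x3,
	MIG_STATE_OPER_MIGRATION_INIT = 0x4,
};

/* Bit offsets/widths derived from mlx5_ifc_query_vhca_migration_state_out_bits. */
#define OUT_MIGRATION_STATE_OFF		0x60
#define OUT_MIGRATION_STATE_SZ		0x4
#define OUT_REQUIRED_UMEM_SIZE_OFF	0x80
#define OUT_REQUIRED_UMEM_SIZE_SZ	0x20

/*
 * Read a field that does not cross a 32-bit boundary from a big-endian
 * buffer, where bit 0 is the MSB of the first dword.
 */
static uint32_t prm_get(const uint8_t *buf, unsigned int bit_off, unsigned int bit_sz)
{
	const uint8_t *p = buf + (bit_off / 32) * 4;
	uint32_t dw = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		      ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	unsigned int shift = 32 - bit_sz - (bit_off % 32);
	uint32_t mask = bit_sz == 32 ? UINT32_MAX : (1u << bit_sz) - 1;

	return (dw >> shift) & mask;
}
```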
* [PATCH V2 vfio 5/6] vfio/mlx5: consider inflight SAVE during PRE_COPY
2026-03-17 16:17 [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (3 preceding siblings ...)
2026-03-17 16:17 ` [PATCH V2 vfio 4/6] net/mlx5: Add IFC bits for migration state Yishai Hadas
@ 2026-03-17 16:17 ` Yishai Hadas
2026-03-17 16:17 ` [PATCH V2 vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO Yishai Hadas
2026-03-20 21:17 ` [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Alex Williamson
6 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-03-17 16:17 UTC
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
Take an in-flight SAVE operation into account during the PRE_COPY phase,
so that a reader waits when no data is currently available but more is
expected to arrive.
This enables a follow-up patch to avoid returning -ENOMSG while a new
*initial_bytes* chunk is still pending from an asynchronous SAVE command
issued by the VFIO_MIG_GET_PRECOPY_INFO ioctl.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/mlx5/cmd.c | 5 +++++
drivers/vfio/pci/mlx5/cmd.h | 1 +
drivers/vfio/pci/mlx5/main.c | 3 ++-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index ca6d95f293cd..18b8d8594070 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -606,6 +606,8 @@ static void
mlx5vf_save_callback_complete(struct mlx5_vf_migration_file *migf,
struct mlx5vf_async_data *async_data)
{
+ migf->inflight_save = 0;
+ wake_up_interruptible(&migf->poll_wait);
kvfree(async_data->out);
complete(&migf->save_comp);
fput(migf->filp);
@@ -809,6 +811,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
async_data->header_buf = header_buf;
get_file(migf->filp);
+ migf->inflight_save = 1;
err = mlx5_cmd_exec_cb(&migf->async_ctx, in, sizeof(in),
async_data->out,
out_size, mlx5vf_save_callback,
@@ -819,6 +822,8 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
return 0;
err_exec:
+ migf->inflight_save = 0;
+ wake_up_interruptible(&migf->poll_wait);
if (header_buf)
mlx5vf_put_data_buffer(header_buf);
fput(migf->filp);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index d7821b5ca772..7d2c10be2e60 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -111,6 +111,7 @@ struct mlx5_vf_migration_file {
struct completion save_comp;
struct mlx5_async_ctx async_ctx;
struct mlx5vf_async_data async_data;
+ u8 inflight_save:1;
};
struct mlx5_vhca_cq_buf {
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index fb541c17c712..68e051c48d40 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -179,7 +179,8 @@ static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
!list_empty(&migf->buf_list) ||
migf->state == MLX5_MIGF_STATE_ERROR ||
migf->state == MLX5_MIGF_STATE_PRE_COPY_ERROR ||
- migf->state == MLX5_MIGF_STATE_PRE_COPY ||
+ (migf->state == MLX5_MIGF_STATE_PRE_COPY &&
+ !migf->inflight_save) ||
migf->state == MLX5_MIGF_STATE_COMPLETE))
return -ERESTARTSYS;
}
--
2.18.1
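A few lines of single-threaded code are enough to model how inflight_save affects the reader's wait condition. This is not driver code. struct migf_model, save_submit(), save_finish() and reader_may_proceed() are hypothetical. The state enum keeps only the values visible in the hunk plus a placeholder. Only the part of the mlx5vf_save_read() wait condition shown in the diff is reproduced.

```c
#include <stdbool.h>

enum migf_state {
	MIGF_STATE_PRE_COPY,
	MIGF_STATE_PRE_COPY_ERROR,
	MIGF_STATE_ERROR,
	MIGF_STATE_COMPLETE,
	MIGF_STATE_OTHER,	/* hypothetical placeholder for remaining states */
};

struct migf_model {
	enum migf_state state;
	bool buf_list_empty;
	bool inflight_save;
	unsigned int wakeups;	/* counts wake_up_interruptible() calls */
};

/* Mirrors setting inflight_save just before the async SAVE is issued. */
static void save_submit(struct migf_model *m)
{
	m->inflight_save = true;
}

/* Completion callback and submit error path: clear the flag, wake readers. */
static void save_finish(struct migf_model *m, bool produced_data)
{
	if (produced_data)
		m->buf_list_empty = false;
	m->inflight_save = false;
	m->wakeups++;
}

/* Visible part of the read() wait condition after this patch. */
static bool reader_may_proceed(const struct migf_model *m)
{
	return !m->buf_list_empty ||
	       m->state == MIGF_STATE_ERROR ||
	       m->state == MIGF_STATE_PRE_COPY_ERROR ||
	       (m->state == MIGF_STATE_PRE_COPY && !m->inflight_save) ||
	       m->state == MIGF_STATE_COMPLETE;
}
```

Before this change, the PRE_COPY state alone satisfied the condition, so a reader could give up while a SAVE was still producing data.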
* [PATCH V2 vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO
2026-03-17 16:17 [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (4 preceding siblings ...)
2026-03-17 16:17 ` [PATCH V2 vfio 5/6] vfio/mlx5: consider inflight SAVE during PRE_COPY Yishai Hadas
@ 2026-03-17 16:17 ` Yishai Hadas
2026-03-20 21:17 ` [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Alex Williamson
6 siblings, 0 replies; 11+ messages in thread
From: Yishai Hadas @ 2026-03-17 16:17 UTC
To: alex, jgg
Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg, avihaih,
clg, peterx, liulongfang, giovanni.cabiddu, kwankhede
When userspace opts into VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2, the
driver may report the VFIO_PRECOPY_INFO_REINIT output flag in response
to the VFIO_MIG_GET_PRECOPY_INFO ioctl, along with a new initial_bytes
value.
The presence of the VFIO_PRECOPY_INFO_REINIT flag indicates to the
caller that new initial data is available in the migration stream.
If the firmware reports a new initial-data chunk, any dirty bytes already
buffered in memory are treated as initial bytes too, since the caller
must read both the buffered bytes and the new chunk before reaching the
end of the initial-data region.
In this case, the driver issues a new SAVE command to fetch the data and
prepare it for a subsequent read() from userspace.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
drivers/vfio/pci/mlx5/cmd.c | 20 ++++++--
drivers/vfio/pci/mlx5/cmd.h | 5 +-
drivers/vfio/pci/mlx5/main.c | 97 +++++++++++++++++++++++-------------
3 files changed, 83 insertions(+), 39 deletions(-)
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 18b8d8594070..5fe0621b5fbd 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -87,7 +87,7 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
size_t *state_size, u64 *total_size,
- u8 query_flags)
+ u8 *mig_state, u8 query_flags)
{
u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
@@ -152,6 +152,10 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
MLX5_GET64(query_vhca_migration_state_out, out,
remaining_total_size) : *state_size;
+ if (mig_state && mvdev->mig_state_cap)
+ *mig_state = MLX5_GET(query_vhca_migration_state_out, out,
+ migration_state);
+
return 0;
}
@@ -277,6 +281,9 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
mvdev->chunk_mode = 1;
+ if (MLX5_CAP_GEN_2(mvdev->mdev, migration_state))
+ mvdev->mig_state_cap = 1;
+
end:
mlx5_vf_put_core_dev(mvdev->mdev);
}
@@ -555,6 +562,7 @@ void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf)
{
spin_lock_irq(&buf->migf->list_lock);
buf->stop_copy_chunk_num = 0;
+ buf->pre_copy_init_bytes_chunk = false;
list_add_tail(&buf->buf_elm, &buf->migf->avail_list);
spin_unlock_irq(&buf->migf->list_lock);
}
@@ -689,7 +697,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
!next_required_umem_size;
if (async_data->header_buf) {
status = add_buf_header(async_data->header_buf, image_size,
- initial_pre_copy);
+ initial_pre_copy ||
+ async_data->buf->pre_copy_init_bytes_chunk);
if (status)
goto err;
}
@@ -708,9 +717,12 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
}
}
spin_unlock_irqrestore(&migf->list_lock, flags);
- if (initial_pre_copy) {
+ if (initial_pre_copy || async_data->buf->pre_copy_init_bytes_chunk) {
migf->pre_copy_initial_bytes += image_size;
- migf->state = MLX5_MIGF_STATE_PRE_COPY;
+ if (initial_pre_copy)
+ migf->state = MLX5_MIGF_STATE_PRE_COPY;
+ if (async_data->buf->pre_copy_init_bytes_chunk)
+ async_data->buf->pre_copy_init_bytes_chunk = false;
}
if (stop_copy_last_chunk)
migf->state = MLX5_MIGF_STATE_COMPLETE;
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 7d2c10be2e60..deed0f132f39 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -62,6 +62,7 @@ struct mlx5_vhca_data_buffer {
u32 *mkey_in;
enum dma_data_direction dma_dir;
u8 stop_copy_chunk_num;
+ bool pre_copy_init_bytes_chunk;
struct list_head buf_elm;
struct mlx5_vf_migration_file *migf;
};
@@ -97,6 +98,7 @@ struct mlx5_vf_migration_file {
u32 record_tag;
u64 stop_copy_prep_size;
u64 pre_copy_initial_bytes;
+ u64 pre_copy_initial_bytes_start;
size_t next_required_umem_size;
u8 num_ready_chunks;
/* Upon chunk mode preserve another set of buffers for stop_copy phase */
@@ -175,6 +177,7 @@ struct mlx5vf_pci_core_device {
u8 mdev_detach:1;
u8 log_active:1;
u8 chunk_mode:1;
+ u8 mig_state_cap:1;
struct completion tracker_comp;
/* protect migration state */
struct mutex state_mutex;
@@ -199,7 +202,7 @@ int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
size_t *state_size, u64 *total_size,
- u8 query_flags);
+ u8 *migration_state, u8 query_flags);
void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
const struct vfio_migration_ops *mig_ops,
const struct vfio_log_ops *log_ops);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 68e051c48d40..de306dee1d1a 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -464,8 +464,10 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
struct mlx5_vhca_data_buffer *buf;
struct vfio_precopy_info info = {};
loff_t *pos = &filp->f_pos;
+ u8 migration_state = 0;
size_t inc_length = 0;
- bool end_of_data = false;
+ bool reinit_state;
+ bool end_of_data;
int ret;
ret = vfio_check_precopy_ioctl(&mvdev->core_device.vdev, cmd, arg,
@@ -492,7 +494,8 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
* As so, the other code below is safe with the proper locks.
*/
ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
- NULL, MLX5VF_QUERY_INC);
+ NULL, &migration_state,
+ MLX5VF_QUERY_INC);
if (ret)
goto err_state_unlock;
}
@@ -503,41 +506,67 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
goto err_migf_unlock;
}
- if (migf->pre_copy_initial_bytes > *pos) {
- info.initial_bytes = migf->pre_copy_initial_bytes - *pos;
+ /*
+ * opt-in for VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 serves
+ * as opt-in for VFIO_PRECOPY_INFO_REINIT as well
+ */
+ reinit_state = mvdev->core_device.vdev.precopy_info_v2 &&
+ migration_state == MLX5_QUERY_VHCA_MIG_STATE_OPER_MIGRATION_INIT;
+ end_of_data = !(migf->max_pos - *pos);
+ if (reinit_state) {
+ /*
+ * Any bytes already present in memory are treated as initial
+ * bytes, since the caller is required to read them before
+ * reaching the new initial-bytes region.
+ */
+ migf->pre_copy_initial_bytes_start = *pos;
+ migf->pre_copy_initial_bytes = migf->max_pos - *pos;
+ info.initial_bytes = migf->pre_copy_initial_bytes + inc_length;
+ info.flags |= VFIO_PRECOPY_INFO_REINIT;
} else {
- info.dirty_bytes = migf->max_pos - *pos;
- if (!info.dirty_bytes)
- end_of_data = true;
- info.dirty_bytes += inc_length;
+ if (migf->pre_copy_initial_bytes_start +
+ migf->pre_copy_initial_bytes > *pos) {
+ WARN_ON_ONCE(end_of_data);
+ info.initial_bytes = migf->pre_copy_initial_bytes_start +
+ migf->pre_copy_initial_bytes - *pos;
+ } else {
+ info.dirty_bytes = (migf->max_pos - *pos) + inc_length;
+ }
}
+ mutex_unlock(&migf->lock);
- if (!end_of_data || !inc_length) {
- mutex_unlock(&migf->lock);
- goto done;
- }
+ if ((reinit_state || end_of_data) && inc_length) {
+ /*
+ * If the current state has been fully transferred and the device has
+ * dirty state, or the device reports new initial state, issue a SAVE so
+ * the new data is ready for the next read().
+ */
+ buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
+ DMA_FROM_DEVICE);
+ if (IS_ERR(buf)) {
+ ret = PTR_ERR(buf);
+ mlx5vf_mark_err(migf);
+ goto err_state_unlock;
+ }
- mutex_unlock(&migf->lock);
- /*
- * We finished transferring the current state and the device has a
- * dirty state, save a new state to be ready for.
- */
- buf = mlx5vf_get_data_buffer(migf, DIV_ROUND_UP(inc_length, PAGE_SIZE),
- DMA_FROM_DEVICE);
- if (IS_ERR(buf)) {
- ret = PTR_ERR(buf);
- mlx5vf_mark_err(migf);
- goto err_state_unlock;
- }
+ buf->pre_copy_init_bytes_chunk = reinit_state;
+ ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
+ if (ret) {
+ mlx5vf_mark_err(migf);
+ mlx5vf_put_data_buffer(buf);
+ goto err_state_unlock;
+ }
- ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, true);
- if (ret) {
- mlx5vf_mark_err(migf);
- mlx5vf_put_data_buffer(buf);
- goto err_state_unlock;
+ /*
+ * SAVE appends a header record via add_buf_header(),
+ * so account for it as well.
+ */
+ if (reinit_state)
+ info.initial_bytes += sizeof(struct mlx5_vf_migration_header);
+ else
+ info.dirty_bytes += sizeof(struct mlx5_vf_migration_header);
}
-done:
mlx5vf_state_mutex_unlock(mvdev);
if (copy_to_user((void __user *)arg, &info,
offsetofend(struct vfio_precopy_info, dirty_bytes)))
@@ -570,7 +599,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
if (migf->state == MLX5_MIGF_STATE_ERROR)
return -ENODEV;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL,
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL, NULL,
MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
if (ret)
goto err;
@@ -636,7 +665,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
if (ret)
goto out;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, 0);
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, NULL, 0);
if (ret)
goto out_pd;
@@ -1123,7 +1152,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
enum mlx5_vf_migf_state state;
size_t size;
- ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL,
+ ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &size, NULL, NULL,
MLX5VF_QUERY_INC | MLX5VF_QUERY_CLEANUP);
if (ret)
return ERR_PTR(ret);
@@ -1248,7 +1277,7 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
mutex_lock(&mvdev->state_mutex);
ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &state_size,
- &total_size, 0);
+ &total_size, NULL, 0);
if (!ret)
*stop_copy_length = total_size;
mlx5vf_state_mutex_unlock(mvdev);
--
2.18.1
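The REINIT case is easier to follow once the arithmetic in mlx5vf_precopy_ioctl() is pulled out into a pure function. This sketch rests on several assumptions. struct precopy_model and precopy_compute() are hypothetical. Locking and the actual SAVE submission are omitted. hdr_size stands in for sizeof(struct mlx5_vf_migration_header). The completion callback, which later grows pre_copy_initial_bytes by image_size, is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

struct precopy_model {
	uint64_t pos;		/* reader's f_pos */
	uint64_t max_pos;	/* bytes currently buffered (migf->max_pos) */
	uint64_t init_start;	/* pre_copy_initial_bytes_start */
	uint64_t init_bytes;	/* pre_copy_initial_bytes */
};

struct precopy_report {
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
	bool reinit;		/* stands in for VFIO_PRECOPY_INFO_REINIT */
	bool issue_save;	/* whether the driver would start a new SAVE */
};

static struct precopy_report precopy_compute(struct precopy_model *m,
					     uint64_t inc_length, bool v2_opted_in,
					     bool fw_init_state, uint64_t hdr_size)
{
	struct precopy_report r = {0};
	bool reinit = v2_opted_in && fw_init_state;
	bool end_of_data = m->max_pos == m->pos;

	if (reinit) {
		/* Already-buffered bytes start the new initial region. */
		m->init_start = m->pos;
		m->init_bytes = m->max_pos - m->pos;
		r.initial_bytes = m->init_bytes + inc_length;
		r.reinit = true;
	} else if (m->init_start + m->init_bytes > m->pos) {
		/* Still inside the initial region (the driver WARNs if no data is buffered here). */
		r.initial_bytes = m->init_start + m->init_bytes - m->pos;
	} else {
		r.dirty_bytes = (m->max_pos - m->pos) + inc_length;
	}

	if ((reinit || end_of_data) && inc_length) {
		r.issue_save = true;
		/* The SAVE also emits a record header; account for it. */
		if (reinit)
			r.initial_bytes += hdr_size;
		else
			r.dirty_bytes += hdr_size;
	}
	return r;
}
```

Take the case where 50 buffered bytes remain unread and firmware reports 1000 new bytes in the INIT state. The reported initial_bytes then covers both amounts plus one header, and the 50 buffered bytes become the start of the new initial region.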
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization
2026-03-17 16:17 [PATCH V2 vfio 0/6] Add support for PRE_COPY initial bytes re-initialization Yishai Hadas
` (5 preceding siblings ...)
2026-03-17 16:17 ` [PATCH V2 vfio 6/6] vfio/mlx5: Add REINIT support to VFIO_MIG_GET_PRECOPY_INFO Yishai Hadas
@ 2026-03-20 21:17 ` Alex Williamson
6 siblings, 0 replies; 11+ messages in thread
From: Alex Williamson @ 2026-03-20 21:17 UTC
To: Yishai Hadas
Cc: jgg, kvm, kevin.tian, joao.m.martins, leonro, maorg, avihaih, clg,
peterx, liulongfang, giovanni.cabiddu, kwankhede, alex
On Tue, 17 Mar 2026 18:17:47 +0200
Yishai Hadas <yishaih@nvidia.com> wrote:
> This series introduces support for re-initializing the initial_bytes
> value during the VFIO PRE_COPY migration phase.
>
> Background
> ==========
> As currently defined, initial_bytes is monotonically decreasing and
> precedes dirty_bytes when reading from the saving file descriptor.
> The transition from initial_bytes to dirty_bytes is unidirectional and
> irreversible.
>
> The initial_bytes are considered critical data that is highly
> recommended to be transferred to the target as part of PRE_COPY.
> Without this data, the PRE_COPY phase would be ineffective.
>
> Problem Statement
> =================
> In some cases, a new chunk of critical data may appear during the
> PRE_COPY phase. The current API does not provide a mechanism for the
> driver to report an updated initial_bytes value when this occurs.
>
> Solution
> ========
> For that, we extend the VFIO_MIG_GET_PRECOPY_INFO ioctl with an output
> flag named VFIO_PRECOPY_INFO_REINIT to allow drivers to report a new
> initial_bytes value during the PRE_COPY phase.
>
> However, existing VFIO_MIG_GET_PRECOPY_INFO implementations currently
> don't assign info.flags before copy_to_user(). This effectively echoes
> userspace-provided flags back as output, preventing the field from being
> used to report reliable new data from the drivers.
>
> Reliable use of the new VFIO_PRECOPY_INFO_REINIT flag requires userspace
> to explicitly opt in. For that we introduce a new feature named
> VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2.
>
> Userspace opts in to the above feature with a SET operation; no data is
> required and any supplied data is ignored.
>
> When the caller opts in:
> - We set info.flags to zero; without the opt-in, the v1 behaviour is
> kept as is for compatibility reasons.
> - The new output flag VFIO_PRECOPY_INFO_REINIT can be used reliably.
> - The VFIO_PRECOPY_INFO_REINIT output flag indicates that new initial
> data is present on the stream. The initial_bytes value should be
> re-evaluated relative to the readiness state for transition to
> STOP_COPY.
>
> The mlx5 VFIO driver is extended to support this case when the
> underlying firmware also supports the REINIT migration state.
>
> As part of this series, a core helper function is introduced to provide
> shared functionality for implementing the VFIO_MIG_GET_PRECOPY_INFO
> ioctl, and all drivers have been updated to use it.
>
> Changes from V1:
> https://patchwork.kernel.org/project/kvm/cover/20260310164006.4020-1-yishaih@nvidia.com/
>
> Patch #1:
> - Extend the uAPI documentation to refer to the source of new
> initial_bytes data.
Applied to the vfio next branch for v7.1, with the fix discussed in 2/6.
Thanks,
Alex
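On the userspace side, the contract in the cover letter comes down to two rules. First, trust info.flags only after the v2 opt-in. Second, treat initial_bytes as monotonically decreasing unless VFIO_PRECOPY_INFO_REINIT is reported. A VMM might keep bookkeeping along the lines of the hypothetical helper below. The ioctl calls are left out. The readiness policy (initial data drained, dirty data under a threshold) is an illustrative assumption, not something the series mandates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace view of one VFIO_MIG_GET_PRECOPY_INFO result. */
struct precopy_sample {
	uint64_t initial_bytes;
	uint64_t dirty_bytes;
	bool reinit_flag;	/* VFIO_PRECOPY_INFO_REINIT as seen in info.flags */
};

struct precopy_tracker {
	bool v2_opted_in;	/* set after a successful SET of the v2 feature */
	bool have_sample;
	uint64_t initial_remaining;
};

/* Returns 0 on success, -1 if the sample breaks the monotonic rule. */
static int tracker_update(struct precopy_tracker *t, const struct precopy_sample *s)
{
	/* Without the opt-in, info.flags may merely echo our input: ignore it. */
	bool reinit = t->v2_opted_in && s->reinit_flag;

	if (t->have_sample && !reinit && s->initial_bytes > t->initial_remaining)
		return -1;
	/* On REINIT the new value may be larger or smaller than before. */
	t->initial_remaining = s->initial_bytes;
	t->have_sample = true;
	return 0;
}

/* Illustrative policy: initial data drained and dirty data small enough. */
static bool tracker_ready_for_stop_copy(const struct precopy_tracker *t,
					uint64_t dirty_bytes, uint64_t dirty_threshold)
{
	return t->have_sample && t->initial_remaining == 0 &&
	       dirty_bytes <= dirty_threshold;
}
```

A REINIT report resets the readiness decision. Even a VMM that had already drained the initial data has to read the new chunk before it considers STOP_COPY.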