* [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support
@ 2026-06-30 10:41 Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev Anthony Krowiak
` (14 more replies)
0 siblings, 15 replies; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
This patch series implements live guest migration support for KVM guests
with s390 AP (Adjunct Processor) devices passed through via the VFIO
mediated device framework.
Background
~~~~~~~~~~
The vfio-ap device driver differs from typical VFIO device drivers in that
it does not virtualize a physical device. Instead, it manages AP
configuration metadata identifying the AP adapters, domains, and control
domains to which a guest will be granted access. These AP resources are
configured by assigning them to a vfio-ap mediated device via its sysfs
assignment interfaces. When the fd for the VFIO device is opened by
userspace, the vfio_ap device driver sets the guest's AP configuration
from the metadata stored with the mediated device. As such, the AP devices
are not accessed directly through the vfio_ap driver, so the driver has no
internal AP device state to migrate. What it does migrate is the AP
configuration metadata of the source guest.
Implementation Approach
~~~~~~~~~~~~~~~~~~~~~~~
This series implements the VFIO migration protocol using the STOP_COPY
migration flow. The key aspects are:
1. On transition of the migration state from STOP to STOP_COPY
- The vfio_ap device driver creates a filestream for userspace to use to
read the guest's AP configuration from the mdev
2. During the STOP_COPY phase
- Userspace uses the filestream created in #1 to read the source guest's
AP configuration
- The vfio_ap device driver copies the source guest's AP configuration
information to userspace
3. On transition of the migration state from STOP to RESUMING
- The vfio_ap device driver creates a filestream for userspace to use to
write the source guest's AP configuration information so it can be
restored to the mdev on the destination host.
4. During the RESUMING phase
- Userspace uses the filestream created in #3 to send the source guest's
AP configuration information to the vfio_ap device driver on the
destination host.
- The vfio_ap device driver first verifies the source guest's AP
configuration is compatible with the destination host's.
- The driver restores AP configuration to the mdev on the destination
host which automatically hot plugs the AP resources identified
therein.
5. Documentation
- Add live guest migration chapter to vfio-ap.rst
Compatibility Validation
~~~~~~~~~~~~~~~~~~~~~~~~
The series includes comprehensive validation to ensure source and
destination AP configurations are compatible. For each queue, the following
characteristics must match:
- AP type (target must be same or newer than source)
- Installed facilities (APSC, APQKM, AP4KC, SLCF)
- Operating mode (CCA, Accelerator, XCP)
- APXA facility setting
- Classification (native vs stateless functions)
- Queue usability (binding/associated state)
When incompatibilities are detected, migration fails with detailed error
messages identifying the specific queue and characteristic that caused
the failure.
Configuration Management
~~~~~~~~~~~~~~~~~~~~~~~~
This implementation does not prevent configuration changes during
migration. Configuration stability is an orchestration-layer
responsibility, consistent with other VFIO device types. The driver's
role is to validate configurations and provide clear diagnostics when
incompatibilities are detected, enabling orchestration tools to implement
appropriate policies.
Change log v2 => v3:
~~~~~~~~~~~~~~~~~~~
Patch 1:
* Removed this patch because the hardware info will be retrieved using
PQAP(TAPQ) command when needed in cases changes have been made to the
host's AP configuration since the mdev was probed.
* Removed #include "uapi/linux/vfio_ap.h" from vfio_ap_migration.c
file
Patch 3:
* Removed vfio_ap_migration_file class and replaced with file pointers as
there is no longer a need to store AP configuration information which
will now be retrieved from the mdev (during STOP_COPY phase) or stored
with the mdev (during RESUMING phase)
* Added padding fields to struct vfio_ap_queue_info & struct vfio_ap_config
to ensure alignment
Patch 4:
* vfio_ap_release_mig_files function now only needs to fput the files used
during ghe STOP_COPY and RESUMING phases of migration since there are no
longer vfio_ap_migration_file objects containing the file pointers as
well as AP configuration data.
* Removed #include "uapi/linux/vfio_ap.h" from vfio_ap_migration.c
file
* Set the struct vfio_device migration_flags and mig_ops fields from mdev
probe callback prior to call to vfio_ap_init_migration_capabilities
function so the VFIO core doesn't see mig_ops as NULL during the
registration.
* In the vfio_ap_mdev_open_device function in vfio_ap_ops.c, make a call
to vfio_ap_release_migration_data if the vfio_ap_mdev_set_kvm function
fails to clean up mig_data. This prevents data leak of mig_data if
multiple opens are executed.
* Call vfio_ap_release_migration_data if setting the KVM pointer fails in
the open vfio device callback to prevent leaking the allocation of
the matrix_mdev->mig_data pointer.
Patch 5:
* Combined the state transition checks from STOP_COPY to STOP and RESUMING
to STOP into one if statement since the handling is the same.
* Squashed patch 16 into this patch
Patch 6:
* No longer copies the AP configuration from the mdev to a
vfio_ap_migration_file structure (removed) and simply opens a file stream
for STOP_COPY phase of migration.
Patch 7:
* No longer copies a vfio_ap_config object retrieved from a
vfio_ap_migration_file structure (removed) to userspace.
* Now creates an vfio_ap_config object and obtains the hardware information
stored for each APQN assigned to the mdev via the PQAP(TAPQ) instruction,
then copies the vfio_ap_configuration object to userspace
* Removed dev_err messages from validate_save_read_parms function to
avoid kernal log spamming.
* Add check to validate_save_read_parms function to verify that matrix_mdev
is stored as private data of the file object and that
matrix_mdev->mig_data is not NULL since we access it in the read
callback.
* Release matrix_dev->mdevs_lock while copying AP configuration data to
userspace in vfio_ap_save_read function
Patch 8:
* No longer creates a struct vfio_ap_migration_data object (removed) in
which to store the AP configuration data sent from userspace during the
RESUMING phase of migration.
* Now, merely creates and returns the filestream for userspace to use to
restore the source guest's data on the destination
Patch 9:
* Complete redesign to better comport with the VFIO migration framework
whereby the vfio device state of the source device is restored to the
destination device. Prior to this change, the destination guest's AP
configuration was acquired at startup from the vfio-ap mediated device.
The vfio_ap driver would only verify that the source and destination
AP configurations are compatible.
- The source guest's state (AP configuration) is read from the file
stream and validated
- If valid, it is stored directly in the ap_matrix_mdev object and hot
hot plugged into the running guest
Patches 10 and 11:
* Squashed patches 10 and 11 together as the state transition to from
STOP_COPY to STOP and RESUMING to STOP are handled the same way (also see
changes for Patch 5 above.
Patch 16:
* Updated vfio-ap.rst to include text related to design changes
Anthony Krowiak (15):
s390/vfio-ap: Provide function to get the number of queues assigned to
mdev
s390/vfio-ap: Data structures for facilitating vfio device migration
s390/vfio-ap: Initialize/release vfio device migration data
s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler
s390-vfio-ap: Callback to get/set vfio device mig state during guest
migration
s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY
s390/vfio-ap: File ops called to save the vfio device migration state
s390/vfio-ap: Transition device migration state from STOP to RESUMING
s390/vfio-ap: Add method to set a new guest AP configuration
s390/vfio-ap: File ops called to resume the vfio device migration
s390/vfio-ap: Transition device migration state to STOP
s390/vfio-ap: Transition device migration state from STOP to RUNNING
and vice versa
s390/vfio-ap: Callback to get the size of data to be migrated during
guest migration
s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute
s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst
Documentation/arch/s390/vfio-ap.rst | 514 +++++++--
drivers/s390/crypto/Makefile | 2 +-
drivers/s390/crypto/vfio_ap_drv.c | 4 +-
drivers/s390/crypto/vfio_ap_migration.c | 1373 +++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_ops.c | 462 ++++++--
drivers/s390/crypto/vfio_ap_private.h | 72 ++
6 files changed, 2225 insertions(+), 202 deletions(-)
create mode 100644 drivers/s390/crypto/vfio_ap_migration.c
--
2.53.0
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration Anthony Krowiak
` (13 subsequent siblings)
14 siblings, 0 replies; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Provides a function that returns the number of queues that are or will be
passed through to a guest when the mdev is attached to a guest.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 14 ++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 2 ++
2 files changed, 16 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 44b3a1dcc1b3..71d3e2c2e07c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2098,6 +2098,20 @@ static void vfio_ap_mdev_request(struct vfio_device *vdev, unsigned int count)
release_update_locks_for_mdev(matrix_mdev);
}
+int vfio_ap_mdev_get_num_queues(struct ap_matrix *ap_matrix)
+{
+ unsigned long apid, apqi;
+ int num_queues = 0;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ for_each_set_bit_inv(apid, ap_matrix->apm, AP_DEVICES)
+ for_each_set_bit_inv(apqi, ap_matrix->aqm, AP_DOMAINS)
+ num_queues++;
+
+ return num_queues;
+}
+
static int vfio_ap_mdev_get_device_info(unsigned long arg)
{
unsigned long minsz;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 9bff666b0b35..9677e49554d7 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -152,6 +152,8 @@ struct vfio_ap_queue {
struct work_struct reset_work;
};
+int vfio_ap_mdev_get_num_queues(struct ap_matrix *ap_matrix);
+
int vfio_ap_mdev_register(void);
void vfio_ap_mdev_unregister(void);
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 10:55 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data Anthony Krowiak
` (12 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Creates the data structures used to facilitate state transitions during
vfio device migration.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/Makefile | 2 +-
drivers/s390/crypto/vfio_ap_migration.c | 56 +++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_private.h | 5 +++
3 files changed, 62 insertions(+), 1 deletion(-)
create mode 100644 drivers/s390/crypto/vfio_ap_migration.c
diff --git a/drivers/s390/crypto/Makefile b/drivers/s390/crypto/Makefile
index e83c6603c858..20f29184825a 100644
--- a/drivers/s390/crypto/Makefile
+++ b/drivers/s390/crypto/Makefile
@@ -34,5 +34,5 @@ pkey-uv-objs := pkey_uv.o
obj-$(CONFIG_PKEY_UV) += pkey-uv.o
# adjunct processor matrix
-vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o
+vfio_ap-objs := vfio_ap_drv.o vfio_ap_ops.o vfio_ap_migration.o
obj-$(CONFIG_VFIO_AP) += vfio_ap.o
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
new file mode 100644
index 000000000000..239168c4b7ff
--- /dev/null
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Drives vfio_ap mdev migration.
+ *
+ * Copyright IBM Corp. 2025
+ */
+#include "vfio_ap_private.h"
+
+/**
+ * vfio_ap_migration_data - the data needed to migrate a guest with pass-through
+ * access to AP devices
+ *
+ * @mig_state: the current migration state
+ * @resuming_migf: the object used to resume the target guest
+ * @saving_migf: the object used to save the state of the source guest
+ */
+struct vfio_ap_migration_data {
+ enum vfio_device_mig_state mig_state;
+
+ struct {
+ struct file *filp;
+ struct vfio_ap_config *ap_config;
+ size_t config_sz;
+ } resuming_mig_state;
+
+ struct file *stop_copy_mig_file;
+};
+
+/**
+ * vfio_ap_queue_info - the information for an AP queue
+ *
+ * @data: contains the queue information returned in GR2 from the PQAP(TAPQ)
+ * command
+ * @apqn: the APQN of the queue
+ * @reserved: padding to ensure consistent structure size across platforms
+ */
+struct vfio_ap_queue_info {
+ u64 data;
+ u16 apqn;
+ u8 reserved[6];
+};
+
+/**
+ * vfio_ap_config - the guest's AP configuration
+ *
+ * @num_queues: the number of queues passed through to the guest
+ * @reserved: padding to ensure proper alignment of qinfo array
+ * @qinfo: an array of vfio_ap_queue_info objects, each specifying the
+ * queue information for a queue passed through to the guest
+ */
+struct vfio_ap_config {
+ u32 num_queues;
+ u8 reserved[4];
+ size_t config_sz;
+ struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
+};
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 9677e49554d7..2b542648964b 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -91,6 +91,9 @@ struct ap_queue_table {
DECLARE_HASHTABLE(queues, 8);
};
+/* Forward declaration for migration data structure */
+struct vfio_ap_migration_data;
+
/**
* struct ap_matrix_mdev - Contains the data associated with a matrix mediated
* device.
@@ -110,6 +113,7 @@ struct ap_queue_table {
* @aqm_add: bitmap of APQIs added to the host's AP configuration
* @adm_add: bitmap of control domain numbers added to the host's AP
* configuration
+ * @mig_data: vfio device migration data
*/
struct ap_matrix_mdev {
struct vfio_device vdev;
@@ -125,6 +129,7 @@ struct ap_matrix_mdev {
DECLARE_BITMAP(apm_add, AP_DEVICES);
DECLARE_BITMAP(aqm_add, AP_DOMAINS);
DECLARE_BITMAP(adm_add, AP_DOMAINS);
+ struct vfio_ap_migration_data *mig_data;
};
/**
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:04 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler Anthony Krowiak
` (11 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Provides the functions that initialize and release the data structures
used during live guest migration. These functions should
be called when the mdev fd is opened and closed respectively. The function
to release the data structures shall also be invoked when the mdev
is released.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 75 +++++++++++++++++++++++++
drivers/s390/crypto/vfio_ap_ops.c | 60 +++++++++++++++++---
drivers/s390/crypto/vfio_ap_private.h | 4 ++
3 files changed, 131 insertions(+), 8 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 239168c4b7ff..5b6d48ccf332 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -4,6 +4,7 @@
*
* Copyright IBM Corp. 2025
*/
+#include <linux/file.h>
#include "vfio_ap_private.h"
/**
@@ -54,3 +55,77 @@ struct vfio_ap_config {
size_t config_sz;
struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
};
+
+static struct file *vfio_ap_set_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state new_state)
+{
+ return NULL;
+}
+
+static int vfio_ap_get_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state *current_state)
+{
+ return -EOPNOTSUPP;
+}
+
+static int vfio_ap_get_data_size(struct vfio_device *vdev,
+ unsigned long *stop_copy_length)
+{
+ return -EOPNOTSUPP;
+}
+
+static const struct vfio_migration_ops vfio_ap_migration_ops = {
+ .migration_set_state = vfio_ap_set_state,
+ .migration_get_state = vfio_ap_get_state,
+ .migration_get_data_size = vfio_ap_get_data_size,
+};
+
+/**
+ * vfio_ap_init_migrations_capabilities - initialize migration capabilities
+ *
+ * @matrix_mdev: pointer to object containing the mdev state
+ */
+void vfio_ap_init_migration_capabilities(struct ap_matrix_mdev *matrix_mdev)
+{
+ matrix_mdev->vdev.migration_flags = VFIO_MIGRATION_STOP_COPY;
+ matrix_mdev->vdev.mig_ops = &vfio_ap_migration_ops;
+}
+
+/**
+ * vfio_ap_init_migration_data - initialize migration data and functions
+ *
+ * @matrix_mdev: pointer to object containing the mdev state
+ *
+ * Return: zero if initialization is successful; otherwise, returns a error.
+ */
+int vfio_ap_init_migration_data(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_migration_data *mig_data;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ mig_data = kzalloc_obj(struct vfio_ap_migration_data, GFP_KERNEL);
+ if (!mig_data)
+ return -ENOMEM;
+
+ mig_data->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ matrix_mdev->mig_data = mig_data;
+
+ return 0;
+}
+
+/**
+ * vfio_ap_release_migration_data: reclaim private migration data
+ *
+ * @vdev: pointer to the mdev
+ */
+void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev)
+{
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ if (!matrix_mdev->mig_data)
+ return;
+
+ kfree(matrix_mdev->mig_data);
+ matrix_mdev->mig_data = NULL;
+}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 71d3e2c2e07c..813290214866 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -775,18 +775,30 @@ static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev,
static int vfio_ap_mdev_init_dev(struct vfio_device *vdev)
{
- struct ap_matrix_mdev *matrix_mdev =
- container_of(vdev, struct ap_matrix_mdev, vdev);
+ struct ap_matrix_mdev *matrix_mdev;
+ mutex_lock(&matrix_dev->mdevs_lock);
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
matrix_mdev->mdev = to_mdev_device(vdev->dev);
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->matrix);
matrix_mdev->pqap_hook = handle_pqap;
vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb);
hash_init(matrix_mdev->qtable.queues);
+ mutex_unlock(&matrix_dev->mdevs_lock);
return 0;
}
+static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ vfio_ap_release_migration_data(matrix_mdev);
+ mutex_unlock(&matrix_dev->mdevs_lock);
+}
+
static int vfio_ap_mdev_probe(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev;
@@ -797,19 +809,30 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
if (IS_ERR(matrix_mdev))
return PTR_ERR(matrix_mdev);
+ mutex_lock(&matrix_dev->mdevs_lock);
+
+ /*
+ * Migration capabilities must be initialized before calling
+ * vfio_register_emulated_iommu_dev; otherwise, the VFIO core
+ * will see mig_ops as NULL during the registration. This could
+ * prevent the VFIO core from properly setting up migration
+ * infrastructure like debugfs entries.
+ */
+ vfio_ap_init_migration_capabilities(matrix_mdev);
+
ret = vfio_register_emulated_iommu_dev(&matrix_mdev->vdev);
if (ret)
goto err_put_vdev;
matrix_mdev->req_trigger = NULL;
matrix_mdev->cfg_chg_trigger = NULL;
dev_set_drvdata(&mdev->dev, matrix_mdev);
- mutex_lock(&matrix_dev->mdevs_lock);
list_add(&matrix_mdev->node, &matrix_dev->mdev_list);
mutex_unlock(&matrix_dev->mdevs_lock);
return 0;
err_put_vdev:
vfio_put_device(&matrix_mdev->vdev);
+ mutex_unlock(&matrix_dev->mdevs_lock);
return ret;
}
@@ -2052,19 +2075,39 @@ static int vfio_ap_mdev_reset_qlist(struct list_head *qlist)
static int vfio_ap_mdev_open_device(struct vfio_device *vdev)
{
- struct ap_matrix_mdev *matrix_mdev =
- container_of(vdev, struct ap_matrix_mdev, vdev);
+ struct ap_matrix_mdev *matrix_mdev;
+ int ret;
if (!vdev->kvm)
return -EINVAL;
- return vfio_ap_mdev_set_kvm(matrix_mdev, vdev->kvm);
+ mutex_lock(&matrix_dev->mdevs_lock);
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ ret = vfio_ap_init_migration_data(matrix_mdev);
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ if (ret)
+ return ret;
+
+ ret = vfio_ap_mdev_set_kvm(matrix_mdev, vdev->kvm);
+ if (ret) {
+ /* Clean up migration data on failure */
+ mutex_lock(&matrix_dev->mdevs_lock);
+ vfio_ap_release_migration_data(matrix_mdev);
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ }
+
+ return ret;
}
static void vfio_ap_mdev_close_device(struct vfio_device *vdev)
{
- struct ap_matrix_mdev *matrix_mdev =
- container_of(vdev, struct ap_matrix_mdev, vdev);
+ struct ap_matrix_mdev *matrix_mdev;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ vfio_ap_release_migration_data(matrix_mdev);
+ mutex_unlock(&matrix_dev->mdevs_lock);
vfio_ap_mdev_unset_kvm(matrix_mdev);
}
@@ -2374,6 +2417,7 @@ static const struct attribute_group vfio_queue_attr_group = {
static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.init = vfio_ap_mdev_init_dev,
+ .release = vfio_ap_mdev_release_dev,
.open_device = vfio_ap_mdev_open_device,
.close_device = vfio_ap_mdev_close_device,
.ioctl = vfio_ap_mdev_ioctl,
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 2b542648964b..a2a713f93674 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -172,4 +172,8 @@ void vfio_ap_on_cfg_changed(struct ap_config_info *new_config_info,
void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
struct ap_config_info *old_config_info);
+void vfio_ap_init_migration_capabilities(struct ap_matrix_mdev *matrix_mdev);
+int vfio_ap_init_migration_data(struct ap_matrix_mdev *matrix_mdev);
+void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (2 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:10 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration Anthony Krowiak
` (10 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
According to the VFIO migration specifications, userspace must issue a
VFIO_DEVICE_RESET ioctl if a vfio device enters an error state or fails a
state transition to recover the device and return its migration state to
RUNNING. This patch adds a function that is called when a VFIO_DEVICE_RESET
is issued that releases the migration files and frees the storage allocated
for maintaining the migration state.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 20 ++++++++++++++++++++
drivers/s390/crypto/vfio_ap_ops.c | 4 ++++
drivers/s390/crypto/vfio_ap_private.h | 1 +
3 files changed, 25 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 5b6d48ccf332..c7c71980bdbb 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -129,3 +129,23 @@ void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev)
kfree(matrix_mdev->mig_data);
matrix_mdev->mig_data = NULL;
}
+
+/**
+ * vfio_ap_reset_migration_state - Reset the vfio-ap migration state
+ *
+ * @matrix_mdev: pointer to the object maintaining the vfio-ap device state
+ *
+ * Called during VFIO_DEVICE_RESET to clean up any active migration
+ * stte and reset the device to RUNNING state as required by the VFIO
+ * migration specification.
+ */
+void vfio_ap_reset_migration_state(struct ap_matrix_mdev *matrix_mdev)
+{
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ if (!matrix_mdev->mig_data)
+ return;
+
+ vfio_ap_release_mig_files(matrix_mdev);
+ matrix_mdev->mig_data->mig_state = VFIO_DEVICE_STATE_RUNNING;
+}
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 813290214866..8a9e83921f74 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -2332,6 +2332,10 @@ static ssize_t vfio_ap_mdev_ioctl(struct vfio_device *vdev,
break;
case VFIO_DEVICE_RESET:
ret = vfio_ap_mdev_reset_queues(matrix_mdev);
+ if (ret == 0) {
+ /* Reset migration state per VFIO migration spec */
+ vfio_ap_reset_migration_state(matrix_mdev);
+ }
break;
case VFIO_DEVICE_GET_IRQ_INFO:
ret = vfio_ap_get_irq_info(arg);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index a2a713f93674..1fbdfcce5a11 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -175,5 +175,6 @@ void vfio_ap_on_scan_complete(struct ap_config_info *new_config_info,
void vfio_ap_init_migration_capabilities(struct ap_matrix_mdev *matrix_mdev);
int vfio_ap_init_migration_data(struct ap_matrix_mdev *matrix_mdev);
void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev);
+void vfio_ap_reset_migration_state(struct ap_matrix_mdev *matrix_mdev);
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (3 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:11 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY Anthony Krowiak
` (9 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Implements two callbacks related to the current state of a vfio-ap device:
* The callback that sets a new migration state of a vfio-ap device during
live migration of guests with pass-through access to AP devices. This
callback is mandatory for VFIO_DEVICE_FEATURE_MIGRATION support.
The function pointer for this callback is specified via the
'migration_set_state' field of the 'vfio_migration_ops' structure
which is stored with the VFIO device when the 'vfio_device'
structure representing the mediated device is initialized.
* The callback that returns the current vfio device migration state during
live migration of guests with pass-through access to AP devices.
The function pointer for this callback is specified via the
'migration_get_state' field of the 'vfio_migration_ops' structure
which is stored with the VFIO device when the 'vfio_device'
structure representing the mediated device is initialized.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 130 +++++++++++++++++++++++-
1 file changed, 127 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index c7c71980bdbb..8274e7b65eea 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -5,6 +5,7 @@
* Copyright IBM Corp. 2025
*/
#include <linux/file.h>
+#include "ap_bus.h"
#include "vfio_ap_private.h"
/**
@@ -56,16 +57,140 @@ struct vfio_ap_config {
struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
};
+static struct file *
+vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
+ enum vfio_device_mig_state new_state)
+{
+ struct vfio_ap_migration_data *mig_data;
+ enum vfio_device_mig_state cur_state;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+ mig_data = matrix_mdev->mig_data;
+ cur_state = mig_data->mig_state;
+ dev_dbg(matrix_mdev->vdev.dev, "%s: %d -> %d\n", __func__, cur_state,
+ new_state);
+
+ if (cur_state == VFIO_DEVICE_STATE_STOP &&
+ new_state == VFIO_DEVICE_STATE_STOP_COPY) {
+ /* TODO */
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+
+ if (cur_state == VFIO_DEVICE_STATE_STOP &&
+ new_state == VFIO_DEVICE_STATE_RESUMING) {
+ /* TODO */
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+
+ if ((cur_state == VFIO_DEVICE_STATE_RESUMING &&
+ new_state == VFIO_DEVICE_STATE_STOP) ||
+ (cur_state == VFIO_DEVICE_STATE_STOP_COPY &&
+ new_state == VFIO_DEVICE_STATE_STOP)) {
+ /* TODO */
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+
+ if ((cur_state == VFIO_DEVICE_STATE_STOP &&
+ new_state == VFIO_DEVICE_STATE_RUNNING) ||
+ (cur_state == VFIO_DEVICE_STATE_RUNNING &&
+ new_state == VFIO_DEVICE_STATE_STOP)) {
+ /* TODO */
+ return ERR_PTR(-EOPNOTSUPP);
+ }
+
+ /* vfio_mig_get_next_state() does not use arcs other than the above */
+ WARN_ON(true);
+
+ return ERR_PTR(-EINVAL);
+}
+
static struct file *vfio_ap_set_state(struct vfio_device *vdev,
enum vfio_device_mig_state new_state)
{
- return NULL;
+ int ret;
+ struct file *filp = NULL;
+ struct ap_matrix_mdev *matrix_mdev;
+ enum vfio_device_mig_state next_state;
+ struct vfio_ap_migration_data *mig_data;
+
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+ if (ap_is_se_guest()) {
+ dev_err(matrix_mdev->vdev.dev,
+ "Migration not allowed from or to a Secure Execution guest\n");
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ return ERR_PTR(-EPERM);
+ }
+
+ mig_data = matrix_mdev->mig_data;
+
+ /*
+ * The mig_data pointer is set in the vfio_ap_init_migration_data
+ * function which is called when the vfio-ap device fd is opened.
+ * Since the implicit pre-open state is RUNNING, a request to set
+ * RUNNING is a no-op. Any other state transition is invalid before
+ * open_device.
+ */
+ if (!mig_data) {
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ if (new_state == VFIO_DEVICE_STATE_RUNNING)
+ return NULL;
+ return ERR_PTR(-ENODEV);
+ }
+
+ dev_dbg(vdev->dev, "%s -> %d\n", __func__, new_state);
+
+ while (mig_data->mig_state != new_state) {
+ ret = vfio_mig_get_next_state(vdev, mig_data->mig_state,
+ new_state, &next_state);
+ if (ret) {
+ filp = ERR_PTR(ret);
+ break;
+ }
+
+ filp = vfio_ap_transition_to_state(matrix_mdev, next_state);
+ if (IS_ERR(filp))
+ break;
+
+ mig_data->mig_state = next_state;
+
+ if (WARN_ON(filp && new_state != next_state)) {
+ fput(filp);
+ filp = ERR_PTR(-EINVAL);
+ break;
+ }
+ }
+
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ return filp;
}
static int vfio_ap_get_state(struct vfio_device *vdev,
enum vfio_device_mig_state *current_state)
{
- return -EOPNOTSUPP;
+ struct ap_matrix_mdev *matrix_mdev;
+ struct vfio_ap_migration_data *mig_data;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ mig_data = matrix_mdev->mig_data;
+
+ /*
+ * The mig_data pointer is set in the vfio_ap_init_migration_data
+ * function which is called when the vfio-ap device fd is opened.
+ * If mig_data is NULL, report RUNNING as the implicit pre-open state
+ * so userspace doesn't need to perform any state transition before the
+ * device becomes active.
+ */
+ *current_state = (mig_data) ? mig_data->mig_state :
+ VFIO_DEVICE_STATE_RUNNING;
+
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ return 0;
}
static int vfio_ap_get_data_size(struct vfio_device *vdev,
@@ -146,6 +271,5 @@ void vfio_ap_reset_migration_state(struct ap_matrix_mdev *matrix_mdev)
if (!matrix_mdev->mig_data)
return;
- vfio_ap_release_mig_files(matrix_mdev);
matrix_mdev->mig_data->mig_state = VFIO_DEVICE_STATE_RUNNING;
}
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (4 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:23 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state Anthony Krowiak
` (8 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
The transition of the guest migration state from VFIO_DEVICE_STATE_STOP to
VFIO_DEVICE_STATE_STOP_COPY begins the process of saving the
vfio device state. The vfio device state is comprised of the guest's
AP configuration which specifies the adapters, domains and control
domains to which the guest will be given access.
To begin the process of saving the vfio device state, the VFIO AP device
driver opens a file stream that will be used to read the source guest's
AP configuration.
Note:
This commit introduces the file operations structure for migration state
files with a stub implementation of the read operation. The complete
implementation is provided in the subsequent commit.
This incremental approach allows for easier review of the migration
infrastructure setup separate from the actual implementation logic.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 68 ++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 8274e7b65eea..76e06abe5a9e 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -4,6 +4,7 @@
*
* Copyright IBM Corp. 2025
*/
+#include <linux/anon_inodes.h>
#include <linux/file.h>
#include "ap_bus.h"
#include "vfio_ap_private.h"
@@ -57,6 +58,57 @@ struct vfio_ap_config {
struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
};
+static void
+vfio_ap_release_stop_copy_file(struct vfio_ap_migration_data *mig_data)
+{
+ if (mig_data->stop_copy_mig_file)
+ mig_data->stop_copy_mig_file = NULL;
+}
+
+static ssize_t
+vfio_ap_stop_copy_read(struct file *, char __user *, size_t, loff_t *)
+{
+ /* TODO */
+ return -EOPNOTSUPP;
+}
+
+static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
+{
+ struct ap_matrix_mdev *matrix_mdev = filp->private_data;
+
+ if (!matrix_mdev || !matrix_mdev->mig_data)
+ return -ENODEV;
+
+ if (filp == matrix_mdev->mig_data->stop_copy_mig_file)
+ vfio_ap_release_stop_copy_file(matrix_mdev->mig_data);
+ else
+ return -ENOENT;
+
+ return 0;
+}
+
+static const struct file_operations vfio_ap_stop_copy_fops = {
+ .owner = THIS_MODULE,
+ .read = vfio_ap_stop_copy_read,
+ .compat_ioctl = compat_ptr_ioctl,
+ .release = vfio_ap_release_mig_file,
+};
+
+static struct file *vfio_ap_open_file_stream(struct ap_matrix_mdev *matrix_mdev,
+ const struct file_operations *fops,
+ int flags)
+{
+ struct file *filp;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ filp = anon_inode_getfile("vfio_ap_mig_file", fops, matrix_mdev, flags);
+ if (!IS_ERR(filp))
+ stream_open(filp->f_inode, filp);
+
+ return filp;
+}
+
static struct file *
vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
enum vfio_device_mig_state new_state)
@@ -70,10 +122,22 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
dev_dbg(matrix_mdev->vdev.dev, "%s: %d -> %d\n", __func__, cur_state,
new_state);
+ /*
+ * Begins the process of saving the vfio device state by creating and
+ * returning a streaming data_fd to be used to read out the internal
+ * state of the vfio-ap device on the source host.
+ */
if (cur_state == VFIO_DEVICE_STATE_STOP &&
new_state == VFIO_DEVICE_STATE_STOP_COPY) {
- /* TODO */
- return ERR_PTR(-EOPNOTSUPP);
+ struct file *filp = vfio_ap_open_file_stream(matrix_mdev,
+ &vfio_ap_stop_copy_fops,
+ O_RDONLY);
+ if (IS_ERR(filp))
+ return ERR_CAST(filp);
+
+ mig_data->stop_copy_mig_file = filp;
+
+ return filp;
}
if (cur_state == VFIO_DEVICE_STATE_STOP &&
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (5 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:26 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING Anthony Krowiak
` (7 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Implements the read callback function that was added to the
file_operations structure for the file created to save the state of the
vfio-ap device when the migration state transitioned from STOP to
to the STOP_COPY state.
This function copies the guest's AP configuration information to
userspace. The information copied is comprised of the APQN of each queue
device passed through to the guest along with its hardware information.
This state data will be transferred to the vfio_ap device driver on the
destination host when the state is transitioned to RESUMING.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 216 +++++++++++++++++++++++-
1 file changed, 209 insertions(+), 7 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 76e06abe5a9e..5ef7127b697a 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -65,13 +65,6 @@ vfio_ap_release_stop_copy_file(struct vfio_ap_migration_data *mig_data)
mig_data->stop_copy_mig_file = NULL;
}
-static ssize_t
-vfio_ap_stop_copy_read(struct file *, char __user *, size_t, loff_t *)
-{
- /* TODO */
- return -EOPNOTSUPP;
-}
-
static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
{
struct ap_matrix_mdev *matrix_mdev = filp->private_data;
@@ -87,6 +80,215 @@ static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
return 0;
}
+/**
+ * validate_stop_copy_read_parms: Validate the input parameters to the
+ * vfio_ap_stop_copy_read function
+ *
+ * @matrix_mdev: The object device containing the state to be read
+ * @filp: Pointer to the file stream used to read the vfio-ap device state
+ * @pos: The file offset from which to start reading data
+ * @len: The length of the data to be read
+ *
+ * Verify the following:
+ * - @filp private data is an ap_matrix_mdev instance
+ * - @filp is the instance opened when state transitioned from STOP to STOP_COPY
+ * - @pos + @len does not cause integer overflow
+ *
+ * Returns: 0 if the parameters pass validation; otherwise returns an error
+ */
+static int validate_stop_copy_read_parms(struct file *filp, loff_t *pos,
+ size_t len)
+{
+ struct vfio_ap_migration_data *mig_data;
+ struct ap_matrix_mdev *matrix_mdev;
+ loff_t total_len;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ if (check_add_overflow((loff_t)len, *pos, &total_len))
+ return -EIO;
+
+ matrix_mdev = filp->private_data;
+
+ if (!matrix_mdev || !matrix_mdev->mig_data)
+ return -ENODEV;
+
+ mig_data = matrix_mdev->mig_data;
+
+ if (mig_data->stop_copy_mig_file != filp)
+ return -EINVAL;
+
+ return 0;
+}
+
+static size_t vfio_ap_config_size(struct ap_matrix_mdev *matrix_mdev,
+ int *num_queues)
+{
+ size_t qinfo_size;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ *num_queues = vfio_ap_mdev_get_num_queues(&matrix_mdev->shadow_apcb);
+ qinfo_size = *num_queues * sizeof(struct vfio_ap_queue_info);
+
+ return qinfo_size + sizeof(struct vfio_ap_config);
+}
+
+static int get_hardware_info_for_queue(struct ap_matrix_mdev *matrix_mdev,
+ struct ap_tapq_hwinfo *hwinfo,
+ unsigned long apqn)
+{
+ struct ap_queue_status status;
+
+ status = ap_tapq(apqn, hwinfo);
+
+ switch (status.response_code) {
+ case AP_RESPONSE_NORMAL:
+ case AP_RESPONSE_RESET_IN_PROGRESS:
+ case AP_RESPONSE_DECONFIGURED:
+ case AP_RESPONSE_CHECKSTOPPED:
+ case AP_RESPONSE_BUSY:
+ /* For all these RCs the tapq info should be available */
+ return 0;
+ case AP_RESPONSE_Q_NOT_AVAIL:
+ dev_err(matrix_mdev->vdev.dev,
+ "migration failed: Failed to get hwinfo for queue %02lx.%04lx on target host: TAPQ rc=%d",
+ AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+ return -ENODEV;
+ default:
+ /* On a pending async error the tapq info should be available */
+ if (!status.async)
+ return 0;
+
+ dev_err(matrix_mdev->vdev.dev,
+ "Failed to get hwinfo for queue %02lx.%04lx: TAPQ rc=%d",
+ AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+ return -EIO;
+ }
+
+ return -EINVAL;
+}
+
+static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
+ struct vfio_ap_config *ap_config)
+{
+ unsigned long *apm, *aqm, num_queues, apid, apqi, apqn;
+ struct ap_tapq_hwinfo source_hwinfo;
+ int ret;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ apm = matrix_mdev->shadow_apcb.apm;
+ aqm = matrix_mdev->shadow_apcb.aqm;
+ num_queues = 0;
+
+ for_each_set_bit_inv(apid, apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
+ apqn = AP_MKQID(apid, apqi);
+
+ ret = get_hardware_info_for_queue(matrix_mdev,
+ &source_hwinfo, apqn);
+ if (ret)
+ return ret;
+
+ ap_config->qinfo[num_queues].apqn = apqn;
+ ap_config->qinfo[num_queues].data = source_hwinfo.value;
+ num_queues += 1;
+ }
+ }
+
+ return (num_queues != ap_config->num_queues) ? -EINVAL : 0;
+}
+
+static int
+vfio_ap_get_config(struct ap_matrix_mdev *matrix_mdev,
+ struct vfio_ap_config **ap_config, size_t *ap_config_size)
+{
+ struct vfio_ap_config *ap_configuration;
+ int num_queues, ret;
+
+ *ap_config_size = vfio_ap_config_size(matrix_mdev, &num_queues);
+
+ ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
+ if (!ap_configuration)
+ return -ENOMEM;
+
+ ap_configuration->num_queues = num_queues;
+
+ ret = vfio_ap_store_queue_info(matrix_mdev, ap_configuration);
+ if (ret) {
+ kfree(ap_configuration);
+ return ret;
+ }
+
+ *ap_config = ap_configuration;
+
+ return 0;
+}
+
+static ssize_t vfio_ap_stop_copy_read(struct file *filp, char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+ size_t ret = 0, ap_config_size;
+ struct vfio_ap_config *ap_config;
+
+ /*
+ * When userspace calls read() with an explicit offset (pread), pos is
+ * non-NULL and the function rejects it with -ESPIPE (illegal seek). For
+ * normal read() calls, pos is NULL, so we'll use the file's internal
+ * position filp->f_pos
+ */
+ if (pos)
+ return -ESPIPE;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+
+ pos = &filp->f_pos;
+
+ ret = validate_stop_copy_read_parms(filp, pos, len);
+ if (ret) {
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ return ret;
+ }
+
+ matrix_mdev = filp->private_data;
+
+ ret = vfio_ap_get_config(matrix_mdev, &ap_config, &ap_config_size);
+ if (ret) {
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ return ret;
+ }
+
+ /*
+ * If the position exceeds the size of the AP configuration data,
+ * then indicate EOF; otherwise calculate the length of the data to
+ * read such that a buffer overrun is prevented.
+ */
+ if (*pos >= ap_config_size)
+ len = 0;
+ else
+ len = min_t(size_t, ap_config_size - *pos, len);
+
+ /* If we've reached an EOF condition, let the caller know */
+ if (len == 0) {
+ kfree(ap_config);
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ return 0;
+ }
+
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ if (copy_to_user(buf, (char *)ap_config + *pos, len)) {
+ kfree(ap_config);
+ return -EFAULT;
+ }
+
+ kfree(ap_config);
+ *pos += len;
+ return len;
+}
+
static const struct file_operations vfio_ap_stop_copy_fops = {
.owner = THIS_MODULE,
.read = vfio_ap_stop_copy_read,
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (6 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:28 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration Anthony Krowiak
` (6 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Transitioning from VFIO_DEVICE_STATE_STOP to VFIO_DEVICE_STATE_RESUMING
starts a process of restoring the device state of the vfio device on the
target system. To prepare for restoring the device state, open a file
stream to receive the guest's AP configuration saved when the device state
on the source system transitioned from VFIO_DEVICE_STATE_STOP to
VFIO_DEVICE_STATE_STOP_COPY.
Note:
This commit introduces the file operations structure for migration state
files with a stub implementation of the write operation. The complete
implementation is provided in the subsequent commit.
This incremental approach allows for easier review of the migration
infrastructure setup separate from the actual implementation logic.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 54 ++++++++++++++++++++++++-
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 5ef7127b697a..e917cc995d35 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -65,6 +65,17 @@ vfio_ap_release_stop_copy_file(struct vfio_ap_migration_data *mig_data)
mig_data->stop_copy_mig_file = NULL;
}
+static void
+vfio_ap_release_resuming_file(struct vfio_ap_migration_data *mig_data)
+{
+ if (mig_data->resuming_mig_state.filp)
+ mig_data->resuming_mig_state.filp = NULL;
+
+ kfree(mig_data->resuming_mig_state.ap_config);
+ mig_data->resuming_mig_state.ap_config = NULL;
+ mig_data->resuming_mig_state.config_sz = 0;
+}
+
static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
{
struct ap_matrix_mdev *matrix_mdev = filp->private_data;
@@ -74,6 +85,8 @@ static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
if (filp == matrix_mdev->mig_data->stop_copy_mig_file)
vfio_ap_release_stop_copy_file(matrix_mdev->mig_data);
+ else if (filp == matrix_mdev->mig_data->resuming_mig_state.filp)
+ vfio_ap_release_resuming_file(matrix_mdev->mig_data);
else
return -ENOENT;
@@ -311,6 +324,32 @@ static struct file *vfio_ap_open_file_stream(struct ap_matrix_mdev *matrix_mdev,
return filp;
}
+static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ /* TODO */
+ return -EOPNOTSUPP;
+}
+
+static const struct file_operations vfio_ap_resume_fops = {
+ .owner = THIS_MODULE,
+ .write = vfio_ap_resuming_write,
+ .release = vfio_ap_release_mig_file,
+};
+
+static struct file *vfio_ap_resuming_init(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_migration_data *mig_data;
+ struct file *filp;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ mig_data = matrix_mdev->mig_data;
+ filp = vfio_ap_open_file_stream(matrix_mdev, &vfio_ap_resume_fops, O_WRONLY);
+
+ return filp;
+}
+
static struct file *
vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
enum vfio_device_mig_state new_state)
@@ -342,10 +381,21 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
return filp;
}
+ /*
+ * Begins the process of restoring the vfio device state by creating and
+ * returning a streaming data_fd to be used to read in the internal
+ * state of the vfio-ap device on the destination host.
+ */
if (cur_state == VFIO_DEVICE_STATE_STOP &&
new_state == VFIO_DEVICE_STATE_RESUMING) {
- /* TODO */
- return ERR_PTR(-EOPNOTSUPP);
+ struct file *filp = vfio_ap_resuming_init(matrix_mdev);
+
+ if (IS_ERR(filp))
+ return ERR_CAST(filp);
+
+ mig_data->resuming_mig_state.filp = filp;
+
+ return filp;
}
if ((cur_state == VFIO_DEVICE_STATE_RESUMING &&
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (7 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:34 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration Anthony Krowiak
` (5 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Add a new vfio_ap_set_new_config function to set a guest's AP
configuration. This is needed in order to set the state of the mdev when
it is migrated from a remote host system during the RESUMING phase.
Key changes:
* Refactored code from the ap_config_store function - handles changes to
the sysfs ap_config attribute - into a new, non-static function which
is callable from the ap_config_store function as well as the live guest
migration code.
* A flag parameter indicating whether the update to the guest's AP
configuration can proceed if APIDs must be filtered out because one or
more of the queues are not bound to the vfio_ap device driver. This is
primarily for calls from the migration code because we don't want code
that is running in the guest to lose access. The ap_config_store function
will continue to operate as it did before and allow filtering.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 358 +++++++++++++++++++++-----
drivers/s390/crypto/vfio_ap_private.h | 4 +
2 files changed, 298 insertions(+), 64 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 8a9e83921f74..c5eb9e4aa85c 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -105,6 +105,27 @@ static inline void get_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
mutex_lock(&matrix_dev->mdevs_lock);
}
+/**
+ * assert_has_update_locks_for_mdev:
+ *
+ * Assert the locks required to dynamically update a KVM guest's APCB are
+ * currently held.
+ *
+ * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
+ * configuration data to use to update a KVM guest's APCB.
+ *
+ * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
+ * lock will not be taken.
+ */
+static inline void
+assert_has_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
+{
+ lockdep_assert_held(&matrix_dev->guests_lock);
+ if (matrix_mdev && matrix_mdev->kvm)
+ lockdep_assert_held(&matrix_mdev->kvm->lock);
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+}
+
/**
* release_update_locks_for_mdev: Release the locks used to dynamically update a
* KVM guest's APCB in the proper order.
@@ -875,7 +896,40 @@ static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
q = vfio_ap_mdev_get_queue(matrix_mdev,
AP_MKQID(apid, apqi));
if (q)
- q->matrix_mdev = NULL;
+ vfio_ap_mdev_link_queue(matrix_mdev, q);
+ }
+ }
+}
+
+static void vfio_ap_unlink_queues(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apid, apqi;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ q = vfio_ap_mdev_get_queue(matrix_mdev,
+ AP_MKQID(apid, apqi));
+ if (q) {
+ vfio_ap_unlink_queue_fr_mdev(q);
+ vfio_ap_unlink_mdev_fr_queue(q);
+ }
+ }
+ }
+}
+
+static void vfio_ap_link_queues(struct ap_matrix_mdev *matrix_mdev)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apid, apqi;
+
+ for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) {
+ for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
+ AP_DOMAINS) {
+ q = vfio_ap_find_queue(AP_MKQID(apid, apqi));
+ if (q)
+ vfio_ap_mdev_link_queue(matrix_mdev, q);
}
}
}
@@ -1014,19 +1068,36 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev *matrix_mdev,
unsigned long apqi;
for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS)
- vfio_ap_mdev_link_apqn(matrix_mdev,
- AP_MKQID(apid, apqi));
+ vfio_ap_mdev_link_apqn(matrix_mdev, AP_MKQID(apid, apqi));
}
-static void collect_queues_to_reset(struct ap_matrix_mdev *matrix_mdev,
- unsigned long apid,
- struct list_head *qlist)
+static void collect_queues_by_apid(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apid,
+ struct list_head *qlist)
{
struct vfio_ap_queue *q;
unsigned long apqi;
for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
- q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
+ q = matrix_mdev ?
+ vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi)) :
+ vfio_ap_find_queue(AP_MKQID(apid, apqi));
+ if (q)
+ list_add_tail(&q->reset_qnode, qlist);
+ }
+}
+
+static void collect_queues_by_apqi(struct ap_matrix_mdev *matrix_mdev,
+ unsigned long apqi,
+ struct list_head *qlist)
+{
+ struct vfio_ap_queue *q;
+ unsigned long apid;
+
+ for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
+ q = matrix_mdev ?
+ vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi)) :
+ vfio_ap_find_queue(AP_MKQID(apid, apqi));
if (q)
list_add_tail(&q->reset_qnode, qlist);
}
@@ -1038,7 +1109,7 @@ static void reset_queues_for_apid(struct ap_matrix_mdev *matrix_mdev,
struct list_head qlist;
INIT_LIST_HEAD(&qlist);
- collect_queues_to_reset(matrix_mdev, apid, &qlist);
+ collect_queues_by_apid(matrix_mdev, apid, &qlist);
vfio_ap_mdev_reset_qlist(&qlist);
}
@@ -1054,7 +1125,7 @@ static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev,
INIT_LIST_HEAD(&qlist);
for_each_set_bit_inv(apid, apm_reset, AP_DEVICES)
- collect_queues_to_reset(matrix_mdev, apid, &qlist);
+ collect_queues_by_apid(matrix_mdev, apid, &qlist);
return vfio_ap_mdev_reset_qlist(&qlist);
}
@@ -1725,82 +1796,241 @@ static void ap_matrix_copy(struct ap_matrix *dst, struct ap_matrix *src)
bitmap_copy(dst->adm, src->adm, AP_DOMAINS);
}
-static ssize_t ap_config_store(struct device *dev, struct device_attribute *attr,
- const char *buf, size_t count)
+static int validate_ap_matrix(struct ap_matrix_mdev *matrix_mdev)
{
- struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
- struct ap_matrix m_new, m_old, m_added, m_removed;
- DECLARE_BITMAP(apm_filtered, AP_DEVICES);
- unsigned long newbit;
- char *newbuf, *rest;
- int rc = count;
- bool do_update;
+ int rc;
- newbuf = kstrndup(buf, AP_CONFIG_STRLEN, GFP_KERNEL);
- if (!newbuf)
- return -ENOMEM;
- rest = newbuf;
+ lockdep_assert_held(&ap_attr_mutex);
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
- mutex_lock(&ap_attr_mutex);
- get_update_locks_for_mdev(matrix_mdev);
+ rc = vfio_ap_mdev_validate_masks(matrix_mdev);
+ if (rc)
+ return rc;
+ rc = ap_matrix_overflow_check(matrix_mdev);
+ if (rc)
+ return rc;
- /* Save old state */
- ap_matrix_copy(&m_old, &matrix_mdev->matrix);
- if (parse_bitmap(&rest, m_new.apm, AP_DEVICES) ||
- parse_bitmap(&rest, m_new.aqm, AP_DOMAINS) ||
- parse_bitmap(&rest, m_new.adm, AP_DOMAINS)) {
- rc = -EINVAL;
- goto out;
+ return 0;
+}
+
+static void get_removed_matrixes(struct ap_matrix *m_removed,
+ struct ap_matrix *m_old,
+ struct ap_matrix *m_new)
+{
+ bitmap_andnot(m_removed->apm, m_old->apm, m_new->apm, AP_DEVICES);
+ bitmap_andnot(m_removed->aqm, m_old->aqm, m_new->aqm, AP_DOMAINS);
+ bitmap_andnot(m_removed->adm, m_old->adm, m_new->adm, AP_DOMAINS);
+}
+
+static void reset_removed_queues_by_apid(unsigned long *apm_removed,
+ unsigned long *apm_filtered,
+ struct list_head *qlist)
+{
+ DECLARE_BITMAP(apids_removed, AP_DEVICES);
+ unsigned long apid;
+
+ for_each_set_bit_inv(apid, apm_filtered, AP_DEVICES)
+ set_bit_inv(apid, apids_removed);
+
+ for_each_set_bit_inv(apid, apm_removed, AP_DEVICES) {
+ if (!test_bit_inv(apid, apids_removed))
+ set_bit_inv(apid, apids_removed);
}
- bitmap_andnot(m_removed.apm, m_old.apm, m_new.apm, AP_DEVICES);
- bitmap_andnot(m_removed.aqm, m_old.aqm, m_new.aqm, AP_DOMAINS);
- bitmap_andnot(m_added.apm, m_new.apm, m_old.apm, AP_DEVICES);
- bitmap_andnot(m_added.aqm, m_new.aqm, m_old.aqm, AP_DOMAINS);
+ if (!bitmap_empty(apids_removed, AP_DEVICES)) {
+ for_each_set_bit_inv(apid, apids_removed, AP_DEVICES)
+ collect_queues_by_apid(NULL, apid, qlist);
+ }
- /* Need new bitmaps in matrix_mdev for validation */
- ap_matrix_copy(&matrix_mdev->matrix, &m_new);
+ if (!list_empty(qlist))
+ vfio_ap_mdev_reset_qlist(qlist);
+}
- /* Ensure new state is valid, else undo new state */
- rc = vfio_ap_mdev_validate_masks(matrix_mdev);
- if (rc) {
- ap_matrix_copy(&matrix_mdev->matrix, &m_old);
- goto out;
+/**
+ * remove_queues_already_reset:
+ *
+ * Remove the queues that have already beeen reset from a list of queues that
+ * have yet to be reset.
+ *
+ * @qlist_rst: A list of queues that have already been reset
+ * @qlist_rem: A list of queues from which already reset queues are to be
+ * removed.
+ */
+static void remove_queues_already_reset(struct list_head *qlist_rst,
+ struct list_head *qlist_rem)
+{
+ struct vfio_ap_queue *rq, *qr, *trq, *tqr;
+
+ if (list_empty(qlist_rst))
+ return;
+
+ /*
+ * Each queue in qlist_reset has already been reset, so remove the
+ * matching queues from qlist_reset so they don't get reset again
+ */
+ list_for_each_entry_safe(qr, tqr, qlist_rem, reset_qnode) {
+ list_for_each_entry_safe(rq, trq, qlist_rst, reset_qnode) {
+ if (qr->apqn == rq->apqn)
+ list_del(&qr->reset_qnode);
+ }
}
- rc = ap_matrix_overflow_check(matrix_mdev);
- if (rc) {
- ap_matrix_copy(&matrix_mdev->matrix, &m_old);
- goto out;
+}
+
+static void reset_removed_queues(struct ap_matrix *m_removed,
+ unsigned long *apm_filtered)
+{
+ struct list_head qlist_by_apid, qlist_by_apqi;
+ DECLARE_BITMAP(apqis, AP_DOMAINS);
+ unsigned long apqi;
+
+ INIT_LIST_HEAD(&qlist_by_apid);
+ INIT_LIST_HEAD(&qlist_by_apqi);
+ bitmap_clear(apqis, 0, AP_DOMAINS);
+
+ reset_removed_queues_by_apid(m_removed->apm, apm_filtered, &qlist_by_apid);
+
+ for_each_set_bit_inv(apqi, m_removed->aqm, AP_DEVICES) {
+ set_bit_inv(apqi, apqis);
+ collect_queues_by_apqi(NULL, apqi, &qlist_by_apqi);
}
- rc = count;
- /* Need old bitmaps in matrix_mdev for unplug/unlink */
- ap_matrix_copy(&matrix_mdev->matrix, &m_old);
+ if (list_empty(&qlist_by_apqi))
+ return;
- /* Unlink removed adapters/domains */
- vfio_ap_mdev_hot_unplug_adapters(matrix_mdev, m_removed.apm);
- vfio_ap_mdev_hot_unplug_domains(matrix_mdev, m_removed.aqm);
+ remove_queues_already_reset(&qlist_by_apid, &qlist_by_apqi);
- /* Need new bitmaps in matrix_mdev for linking new adapters/domains */
- ap_matrix_copy(&matrix_mdev->matrix, &m_new);
+ if (!list_empty(&qlist_by_apqi))
+ vfio_ap_mdev_reset_qlist(&qlist_by_apid);
+}
- /* Link newly added adapters */
- for_each_set_bit_inv(newbit, m_added.apm, AP_DEVICES)
- vfio_ap_mdev_link_adapter(matrix_mdev, newbit);
+/**
+ * restore_mdev_state:
+ *
+ * Restore the mdev to its previous state:
+ * - Unlink the queues from the updated mdev
+ * - Copy the previous matrix and shadow_apcb to the mdev
+ * - Re-link the original queues to the mdev
+ *
+ * @matrix_mdev: The object that maintains the AP configuration for a guest
+ * @m_old: The object containing the bitmaps specifying the guest AP
+ * configuration profile as it was prior to the attempt to set
+ * a new one
+ * @m_old_shadow: The object containing the bitmaps specifying the guest AP
+ * configuration as it was prior to the attempt to set a new
+ * one
+ */
+static void restore_mdev_state(struct ap_matrix_mdev *matrix_mdev,
+ struct ap_matrix *m_old,
+ struct ap_matrix *m_old_shadow)
+{
+ vfio_ap_unlink_queues(matrix_mdev);
+ ap_matrix_copy(&matrix_mdev->matrix, m_old);
+ ap_matrix_copy(&matrix_mdev->shadow_apcb, m_old_shadow);
+ vfio_ap_link_queues(matrix_mdev);
+}
- for_each_set_bit_inv(newbit, m_added.aqm, AP_DOMAINS)
- vfio_ap_mdev_link_domain(matrix_mdev, newbit);
+/**
+ * vfio_ap_set_new_guest_config:
+ *
+ * Set a new AP configuration for a guest.
+ *
+ * @matrix_mdev: Object used to maintain the AP configuration for a guest
+ * @m_new: Object used to set the new AP configuration
+ * @filtering_allowable: A boolean value indicating whether any filtering of
+ * new AP configuration is acceptable. If the configuration
+ * needs to be filtered before it can be passed through to
+ * the guest and this flag is set to false, then
+ * the operation shall be terminated with an error.
+ *
+ * Returns: zero (0) if the new AP configuration is successfully set; otherwise,
+ * returns an error:
+ *
+ * ~ EADDRNOTAVAIL One or more APQNs are reserved for host use
+ * ~ EADDRINUSE One or more APQNs are assigned to another mdev
+ * ~ ENODEV An adapter, domain or control domain in the new
+ * AP configuration exceeds the max architected value
+ * ~ ECANCELED @filtering_allowed was specified as false and the
+ * AP configuration needs to be filtered.
+ */
+int vfio_ap_set_new_guest_config(struct ap_matrix_mdev *matrix_mdev,
+ struct ap_matrix *m_new,
+ bool filtering_allowable)
+{
+ DECLARE_BITMAP(apm_filtered, AP_DEVICES);
+ struct ap_matrix m_old, m_old_shadow, m_removed;
+ bool do_update;
+ int rc;
- /* filter resources not bound to vfio-ap */
+ lockdep_assert_held(&ap_attr_mutex);
+ assert_has_update_locks_for_mdev(matrix_mdev);
+
+ /* Save old state */
+ ap_matrix_copy(&m_old, &matrix_mdev->matrix);
+ ap_matrix_copy(&m_old_shadow, &matrix_mdev->shadow_apcb);
+
+ /* Reset mdev state */
+ vfio_ap_unlink_queues(matrix_mdev);
+ ap_matrix_copy(&matrix_mdev->matrix, m_new);
+ vfio_ap_link_queues(matrix_mdev);
+
+ rc = validate_ap_matrix(matrix_mdev);
+ if (rc) {
+ restore_mdev_state(matrix_mdev, &m_old, &m_old_shadow);
+ return rc;
+ }
+
+ /*
+ * If APIDs need to be filtered from the guest AP config and filtering
+ * is not allowable according to the caller, then terminate the operation.
+ */
do_update = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
+ if (!bitmap_empty(apm_filtered, AP_DEVICES) && !filtering_allowable) {
+ restore_mdev_state(matrix_mdev, &m_old, &m_old_shadow);
+ return -ECANCELED;
+ }
+
do_update |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
- /* Apply changes to shadow apbc if things changed */
- if (do_update) {
+ if (do_update)
vfio_ap_mdev_update_guest_apcb(matrix_mdev);
- reset_queues_for_apids(matrix_mdev, apm_filtered);
+
+ get_removed_matrixes(&m_removed, &m_old, m_new);
+ if (!bitmap_empty(m_removed.apm, AP_DEVICES) ||
+ !bitmap_empty(apm_filtered, AP_DEVICES))
+ reset_removed_queues(&m_removed, apm_filtered);
+
+ return 0;
+}
+
+static ssize_t ap_config_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
+ struct ap_matrix m_new;
+ char *newbuf, *rest;
+ ssize_t rc;
+
+ newbuf = kstrndup(buf, AP_CONFIG_STRLEN, GFP_KERNEL);
+ if (!newbuf)
+ return -ENOMEM;
+ rest = newbuf;
+
+ mutex_lock(&ap_attr_mutex);
+ get_update_locks_for_mdev(matrix_mdev);
+
+ if (parse_bitmap(&rest, m_new.apm, AP_DEVICES) ||
+ parse_bitmap(&rest, m_new.aqm, AP_DOMAINS) ||
+ parse_bitmap(&rest, m_new.adm, AP_DOMAINS)) {
+ kfree(newbuf);
+ release_update_locks_for_mdev(matrix_mdev);
+ mutex_unlock(&ap_attr_mutex);
+ return -EINVAL;
}
-out:
+
+ rc = vfio_ap_set_new_guest_config(matrix_mdev, &m_new, true);
+ if (!rc)
+ rc = count;
+
release_update_locks_for_mdev(matrix_mdev);
mutex_unlock(&ap_attr_mutex);
kfree(newbuf);
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 1fbdfcce5a11..7ae48729cafa 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -177,4 +177,8 @@ int vfio_ap_init_migration_data(struct ap_matrix_mdev *matrix_mdev);
void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev);
void vfio_ap_reset_migration_state(struct ap_matrix_mdev *matrix_mdev);
+int vfio_ap_set_new_guest_config(struct ap_matrix_mdev *matrix_mdev,
+ struct ap_matrix *m_new,
+ bool filtering_allowable);
+
#endif /* _VFIO_AP_PRIVATE_H_ */
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (8 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:37 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP Anthony Krowiak
` (4 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Implements the 'write' callback function that was added to the
'file_operations' structure for the file stream created to restore the
state of the vfio-ap device on the destination system when the migration
state transitioned from STOP to RESUMING
The write callback retrieves the vfio device migration state saved to the
file stream created when the vfio device state was transitioned from
STOP to STOP_COPY. The saved state contains the source guest's AP
configuration information. This data is copied from the userspace buffer
passed to the 'write' callback and stored in the vfio_ap_config structure
used to set the state of the vfio-ap device on the destination host. If the
source guest's AP configuration is compatible with the AP configuration on
the destination host, it will be hot plugged into the destination guest.
In order for the source guest's and destination host's AP configurations
to be considered compatible:
* Each APQN in the source guest's AP configuration must also be in the
destination host's AP configuration
* Each matching APQN in the destination host's AP configuration must be
bound to the vfio_ap device driver
* Each matching APQN in the destination host's AP configuration must
reference a queue device with compatible hardware:
- The source and destination queues must have the same facilities
installed:
~ APSC facility
~ APQKM facility
~ AP4KC facility
- The source and destination queues must have the same mode:
~ Coprocessor-mode
~ Accelerator-mode
~ XCP-mode
- The source and destination queues must have the same APXA facility
setting
~ If the APXA facility is installed on source queue, it must also
be installed on the destination queue and vice versa
- The source and destination queues must have a compatible
classification setting. If the source queue has full native card
function, then the destination queue must also have full native
card function. If the source queue has stateless functions, then
the destination queue can have stateless functions or full native card
function because the latter includes the stateless functions.
- The binding and associated state for both the source and destination
queues must indicate that the queue is usable for all messages
(i.e., BS bits equal to 00).
- The AP type of the destination queue must be the same as or newer than
the source queue (backward compatibility)
Note: The get_hardware_info_for_queue function that was created in
a previous patch was modified to take a mediated device name rather
than an ap_matrix_mdev object because that is what is needed for
this patch so the function can be executed without holding the
matrix_dev->mdevs_lock.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 785 +++++++++++++++++++++++-
drivers/s390/crypto/vfio_ap_ops.c | 68 --
drivers/s390/crypto/vfio_ap_private.h | 56 ++
3 files changed, 829 insertions(+), 80 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index e917cc995d35..9cd1e7a3f743 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -9,6 +9,40 @@
#include "ap_bus.h"
#include "vfio_ap_private.h"
+/*
+ * Masks the fields of the queue information returned from the PQAP(TAPQ)
+ * command. In order to migrate a guest, it's AP configuration must be
+ * compatible with AP configuration assigned to the target guest's mdev.
+ * This mask is used to verify that the queue information for each source and
+ * target queue is compatible (i.e., the masked fields are equivalent).
+ *
+ * The relevant fields covered by this mask are:
+ * S bit 0: APSC facility installed
+ * M bit 1: APQKM facility installed
+ * C bit 2: AP4KC facility installed
+ * Mode bits 3-5:
+ * D bit 3: CCA-mode facility
+ * A bit 4: accelerator-mode facility
+ * X bit 5: XCP-mode facility
+ * N bit 6: APXA facility installed
+ * SL bit 7: SLCF facility installed
+ * Classification (functional capabilities) bits 8-16
+ * bit 8: Native card function
+ * bit 9: Only stateless functions
+ * BS bits 16-17:
+ * AP Type bits 32-40:
+ */
+#define QINFO_DATA_MASK 0xffffc000ff000000
+
+/*
+ * Masks the bit that indicates whether full native card function is available
+ * from the 8 bits specifying the functional capabilities of a queue
+ */
+#define CLASSIFICATION_NATIVE_FCN_MASK 0x80
+
+/* The maximum number of queues that can be installed in an s390 system */
+#define MAX_AP_QUEUES (AP_DEVICES * AP_DOMAINS)
+
/**
* vfio_ap_migration_data - the data needed to migrate a guest with pass-through
* access to AP devices
@@ -147,7 +181,7 @@ static size_t vfio_ap_config_size(struct ap_matrix_mdev *matrix_mdev,
return qinfo_size + sizeof(struct vfio_ap_config);
}
-static int get_hardware_info_for_queue(struct ap_matrix_mdev *matrix_mdev,
+static int get_hardware_info_for_queue(const char *mdev_name,
struct ap_tapq_hwinfo *hwinfo,
unsigned long apqn)
{
@@ -164,22 +198,20 @@ static int get_hardware_info_for_queue(struct ap_matrix_mdev *matrix_mdev,
/* For all these RCs the tapq info should be available */
return 0;
case AP_RESPONSE_Q_NOT_AVAIL:
- dev_err(matrix_mdev->vdev.dev,
- "migration failed: Failed to get hwinfo for queue %02lx.%04lx on target host: TAPQ rc=%d",
- AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+ pr_err("vfio_ap_mdev %s: migration failed: Failed to get hwinfo for queue %02lx.%04lx on target host: TAPQ rc=%d",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+
return -ENODEV;
default:
/* On a pending async error the tapq info should be available */
if (!status.async)
return 0;
- dev_err(matrix_mdev->vdev.dev,
- "Failed to get hwinfo for queue %02lx.%04lx: TAPQ rc=%d",
- AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+ pr_err("vfio_ap_mdev %s: Failed to get hwinfo for queue %02lx.%04lx: TAPQ rc=%d",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
+
return -EIO;
}
-
- return -EINVAL;
}
static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
@@ -187,10 +219,12 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
{
unsigned long *apm, *aqm, num_queues, apid, apqi, apqn;
struct ap_tapq_hwinfo source_hwinfo;
+ const char *mdev_name;
int ret;
lockdep_assert_held(&matrix_dev->mdevs_lock);
+ mdev_name = dev_name(matrix_mdev->vdev.dev);
apm = matrix_mdev->shadow_apcb.apm;
aqm = matrix_mdev->shadow_apcb.aqm;
num_queues = 0;
@@ -199,7 +233,7 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
apqn = AP_MKQID(apid, apqi);
- ret = get_hardware_info_for_queue(matrix_mdev,
+ ret = get_hardware_info_for_queue(mdev_name,
&source_hwinfo, apqn);
if (ret)
return ret;
@@ -324,11 +358,738 @@ static struct file *vfio_ap_open_file_stream(struct ap_matrix_mdev *matrix_mdev,
return filp;
}
+static int validate_resuming_write_parms(struct file *filp,
+ size_t len, loff_t *pos)
+{
+ struct ap_matrix_mdev *matrix_mdev;
+ loff_t total_len;
+ int ret;
+
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+
+ matrix_mdev = filp->private_data;
+ if (!matrix_mdev || !matrix_mdev->mig_data) {
+ ret = -ENODEV;
+ goto done;
+ }
+
+ if (filp != matrix_mdev->mig_data->resuming_mig_state.filp) {
+ ret = -ENXIO;
+ goto done;
+ }
+
+ if (*pos < 0) {
+ ret = -EINVAL;
+ goto done;
+ }
+
+ if (check_add_overflow((loff_t)len, *pos, &total_len)) {
+ ret = -ERANGE;
+ goto done;
+ }
+
+ /*
+ * If the ap_config has not yet been allocated and the file position
+ * indicates this is not the first write, or the ap_config has been allocated
+ * but the file position indicates this is the first write, then this is an
+ * error condition.
+ */
+ if ((!matrix_mdev->mig_data->resuming_mig_state.ap_config && *pos != 0) ||
+ (matrix_mdev->mig_data->resuming_mig_state.ap_config && *pos == 0)) {
+ ret = -EFAULT;
+
+ goto done;
+ }
+
+ ret = 0;
+
+done:
+ return ret;
+}
+
+static ssize_t calculate_ap_config_size(unsigned int num_queues)
+{
+ size_t qinfo_size;
+
+ if (num_queues > MAX_AP_QUEUES)
+ return -EINVAL;
+
+ qinfo_size = num_queues * sizeof(struct vfio_ap_queue_info);
+ return qinfo_size + sizeof(struct vfio_ap_config);
+}
+
+/**
+ * allocate_ap_config:
+ *
+ * Allocate storage for the source guest's AP configuration data sent from
+ * userspace.
+ *
+ * @ap_config: The location in which to store the pointer to the storage
+ * allocated for the AP configuration data.
+ * @buf: The userspace buffer containing some or all of the source
+ * guest's AP configuration data
+ * @len: The number of bytes of data to copy from @buf
+ *
+ * Returns: The number of bytes of storage allocated for the config data or
+ * an error:
+ *
+ *
+ * -EIO: failed to copy data from @buf
+ * -EINVAL: the number of queues specified exceeds the max allowed
+ * -ENOMEM: the allocation of storage failed
+ */
+static ssize_t allocate_ap_config(struct vfio_ap_config **ap_config,
+ const char __user *buf, size_t len)
+{
+ struct vfio_ap_config tmp_ap_config;
+ ssize_t config_size;
+ size_t copy_size;
+
+ /*
+ * If the length of the data sent exceeds the size of the vfio_ap_config
+ * structure, then we will copy enough data from userspace to get the
+ * number of queues which we can use to allocate enough space all of
+ * the queue information.
+ */
+ copy_size = min(len, sizeof(tmp_ap_config));
+
+ if (copy_from_user(&tmp_ap_config, buf, copy_size))
+ return -EIO;
+
+ /*
+ * If the length of data sent includes the number of queues
+ * in the AP configuration, then calculate its size; otherwise
+ * set config_size to the length of data sent.
+ */
+ if (len >= sizeof(struct vfio_ap_config)) {
+ config_size = calculate_ap_config_size(tmp_ap_config.num_queues);
+
+ /* If the calculation returned an error */
+ if (config_size < 0)
+ return config_size;
+ } else {
+ config_size = len;
+ }
+
+ *ap_config = kzalloc(config_size, GFP_KERNEL_ACCOUNT);
+ if (!*ap_config)
+ return -ENOMEM;
+
+ return config_size;
+}
+
+/**
+ * reallocate_ap_config:
+ *
+ * Reallocate the storage buffer so it is large enough to store the source
+ * guest's AP configuration data sent from userspace.
+ *
+ * @mig_ap_config: The location in which to store the pointer to the storage
+ * reallocated for the AP configuration data.
+ * len: The length of the data to be stored
+ *
+ * Returns: The size of the memory allocated for the source guest's
+ * AP configuration data or an error:
+ *
+ * -ENOMEM: The call to krealloc failed
+ * -EINVAL: The guest's AP configuration size changed between
+ * calls to the vfio_ap_resuming_write function.
+ *
+ */
+static ssize_t reallocate_ap_config(struct vfio_ap_config **mig_ap_config,
+ size_t len)
+{
+ struct vfio_ap_config *ap_config = *mig_ap_config;
+ struct vfio_ap_config *new_ap_config;
+ size_t new_cfg_sz, cur_cfg_sz;
+ unsigned int num_queues;
+
+ cur_cfg_sz = ap_config->config_sz;
+ num_queues = ap_config->num_queues;
+
+ /*
+ * If the current configuration size is greater than the
+ * size of a vfio_ap_config structure (i.e., contains the num_queues
+ * field), then there should already be enough storage allocated
+ * to store the source guest's AP configuration. Let's verify that the
+ * amount of storage allocated is what we expect based on the number of
+ * vfio_ap_queue_info objects that must be stored.
+ */
+ if (cur_cfg_sz >= sizeof(struct vfio_ap_config)) {
+ new_cfg_sz = calculate_ap_config_size(num_queues);
+ if (cur_cfg_sz != new_cfg_sz)
+ return -EINVAL;
+ } else {
+ new_cfg_sz = cur_cfg_sz + len;
+ }
+
+ new_ap_config = krealloc(ap_config, new_cfg_sz, GFP_KERNEL_ACCOUNT);
+ if (!new_ap_config)
+ return -ENOMEM;
+
+ *mig_ap_config = new_ap_config;
+
+ return new_cfg_sz;
+}
+
+/**
+ * qdev_is_bound_to_vfio_ap:
+ *
+ * Query to determine whether a queue with the specified APQN is available on
+ * the host system and bound to the vfio_ap device driver.
+ *
+ * @apqn: The APQN of the queue device being queried
+ *
+ * Returns: True if there is a queue device with the specified @apqn installed
+ * in the system and is bound to the vfio_ap device driver; otherwise,
+ * returns false.
+ */
+static bool qdev_is_bound_to_vfio_ap(unsigned int apqn)
+{
+ struct ap_queue *queue;
+ bool is_bound = true;
+
+ queue = ap_get_qdev(apqn);
+ if (!queue)
+ return false;
+
+ if (queue->ap_dev.device.driver != &matrix_dev->vfio_ap_drv->driver)
+ is_bound = false;
+
+ put_device(&queue->ap_dev.device);
+
+ return is_bound;
+}
+
+/**
+ * queues_available_on_target_system:
+ *
+ * Query whether each queue from the source guest's AP configuration is
+ * available and bound to the vfio_ap device driver; if not, log an error
+ * message.
+ *
+ * @mdev_name: The mdev name to use in error messages
+ * @source_config: The object specifying the source guest's AP configuration
+ *
+ * Returns: true if each queue identified in @source_config is available and
+ * bound to the vfio_ap device driver; otherwise, returns false.
+ */
+static bool
+queues_available_on_target_system(const char *mdev_name,
+ struct vfio_ap_config *source_config)
+{
+ unsigned long apqn;
+ bool ret = true;
+
+ for (int i = 0; i < source_config->num_queues; i++) {
+ apqn = source_config->qinfo[i].apqn;
+
+ /*
+ * Find the queue device bound to the vfio_ap device driver. If it is
+ * not found, log an error and continue so users see all problems
+ * at once, not one-at-a-time through retries of the migration.
+ */
+ if (!qdev_is_bound_to_vfio_ap(apqn)) {
+ pr_err("vfio_ap_mdev %s: Queue %02lx.%04lx not available to vfio_ap driver on target host\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ ret = false;
+ }
+ }
+
+ return ret;
+}
+
+static void report_facilities_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (src_hwinfo->apsc != target_hwinfo->apsc) {
+ if (src_hwinfo->apsc) {
+ pr_err("vfio_ap_mdev %s: APSC facility installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: APSC facility not installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: APSC facility not installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s APSC facility installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+
+ if (src_hwinfo->mex4k != target_hwinfo->mex4k) {
+ if (src_hwinfo->mex4k) {
+ pr_err("vfio_ap_mdev %s: mex4k facility installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: mex4k facility not installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: mex4k facility not installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: mex4k facility installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+
+ if (src_hwinfo->crt4k != target_hwinfo->crt4k) {
+ if (src_hwinfo->crt4k) {
+ pr_err("vfio_ap_mdev %s: crt4k facility installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: crt4k facility not installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: crt4k facility not installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: crt4k facility installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+}
+
+static void report_mode_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (src_hwinfo->cca != target_hwinfo->cca) {
+ if (src_hwinfo->cca) {
+ pr_err("vfio_ap_mdev %s: Coprocessor-mode facility installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Coprocessor-mode facility not installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: Coprocessor-mode facility not installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Coprocessor-mode facility installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+
+ if (src_hwinfo->accel != target_hwinfo->accel) {
+ if (src_hwinfo->accel) {
+ pr_err("vfio_ap_mdev %s: Accelerator-mode facility installed source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Accelerator-mode facility not installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: Accelerator-mode facility not installed source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Accelerator-mode facility installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+
+ if (src_hwinfo->ep11 != target_hwinfo->ep11) {
+ if (src_hwinfo->ep11) {
+ pr_err("vfio_ap_mdev %s: XCP-mode facility installed source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: XCP-mode facility not installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: XCP-mode facility not installed source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: XCP-mode facility installed target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+}
+
+static void report_apxa_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (src_hwinfo->apxa != target_hwinfo->apxa) {
+ if (src_hwinfo->apxa) {
+ pr_err("vfio_ap_mdev %s: AP-extended-addressing (APXA) facility installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: AP-extended-addressing (APXA) facility not installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: AP-extended-addressing (APXA) facility not installed in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: AP-extended-addressing (APXA) facility installed in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+}
+
+static void report_slcf_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (src_hwinfo->slcf != target_hwinfo->slcf) {
+ if (src_hwinfo->slcf) {
+ pr_err("vfio_ap_mdev %s: Stateless-command-filtering (SLCF) available in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Stateless-command-filtering (SLCF) not available in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ } else {
+ pr_err("vfio_ap_mdev %s: Stateless-command-filtering (SLCF) not available in source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Stateless-command-filtering (SLCF) available in target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+ }
+}
+
+static void report_bs_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ /*
+ * The BS field on both the source and destination must be 0, so if one of
+ * them is not, then report an error.
+ */
+ if (src_hwinfo->bs || target_hwinfo->bs) {
+ pr_err("vfio_ap_mdev %s: Bind/associate state for source (%01x) and target (%01x) queue %02lx.%04lx must be 0\n",
+ mdev_name, src_hwinfo->bs, target_hwinfo->bs,
+ AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+}
+
+static void report_aptype_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (src_hwinfo->at > target_hwinfo->at) {
+ pr_err("vfio_ap_mdev %s: AP type of source (%02x) not compatible with target (%02x)\n",
+ mdev_name, src_hwinfo->at, target_hwinfo->at);
+ }
+}
+
+static bool classes_compatible(struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ unsigned long src_native, target_native;
+
+ src_native = src_hwinfo->class & CLASSIFICATION_NATIVE_FCN_MASK;
+ target_native = target_hwinfo->class & CLASSIFICATION_NATIVE_FCN_MASK;
+
+ /*
+ * If the source queue has full native card function and the
+ * target queue has only stateless functions available, then
+ * there may be instructions that will not execute on the
+ * target queue. This shall be reported as an error.
+ *
+ * If the source queue has only stateless card functions and the
+ * target queue has full native card function available, then
+ * we are okay because the target queue can run all stateless card
+ * functions.
+ */
+ return (src_native != target_native) ? !src_native : true;
+}
+
+static void report_class_compatibility(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ if (!classes_compatible(src_hwinfo, target_hwinfo)) {
+ pr_err("vfio_ap_mdev %s: Full native card function available on source queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ pr_err("vfio_ap_mdev %s: Only stateless functions available on target queue %02lx.%04lx\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+ }
+}
+
+/*
+ * Log a device error reporting that migration failed due to queue
+ * incompatibilities followed by a device error for each incompatible feature.
+ */
+static void report_qinfo_incompatibilities(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ pr_err("vfio_ap_mdev %s: Migration failed: Source and target queue (%02lx.%04lx) not compatible\n",
+ mdev_name, AP_QID_CARD(apqn), AP_QID_QUEUE(apqn));
+
+ report_facilities_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_mode_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_apxa_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_slcf_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_aptype_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_bs_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+ report_class_compatibility(mdev_name, apqn, src_hwinfo, target_hwinfo);
+}
+
+/**
+ * queue_hardware_info_is_compatible:
+ *
+ * Verify whether the hardware information for a source queue is compatible with
+ * the hardware info for the corresponding queue on this system.
+ *
+ * In order to be compatible, the hardware information for each queue must
+ * meet the following requirements:
+ *
+ * 1. The hardware facilities bits much match
+ * 2. The AP type of the source queue must be the same as or older than that
+ * of the target queue (target is backwards compatible)
+ * 3. The classification bits must indicate:
+ * - Both queues have full native card function or both have stateless
+ * functions available
+ * - If the classification bits don't match, then the only acceptable
+ * configuration is stateless functions for the source queue and
+ * full native function for the target queue
+ * 4. The BS bits for both queues must be 0 (Queue usable for all messages
+ * supported by the adapter)
+ *
+ * @mdev_name: The mdev name to use in error messages
+ * @apqn: The APQN for the queues
+ * @src_hwinfo: The hardware info for the source queue
+ * @target_hwinfo: The hardware info for the corresponding queue on this system
+ *
+ * Returns: true if the hardware info for the two queues is compatible;
+ * otherwise, returns false.
+ */
+static bool queue_hardware_info_is_compatible(const char *mdev_name,
+ unsigned long apqn,
+ struct ap_tapq_hwinfo *src_hwinfo,
+ struct ap_tapq_hwinfo *target_hwinfo)
+{
+ unsigned long src_bits, target_bits;
+
+ src_bits = src_hwinfo->value & QINFO_DATA_MASK;
+ target_bits = target_hwinfo->value & QINFO_DATA_MASK;
+
+ /* If all bits match the queues are compatible */
+ if (src_bits == target_bits)
+ return true;
+
+ if (src_hwinfo->fac == target_hwinfo->fac &&
+ src_hwinfo->at <= target_hwinfo->at &&
+ classes_compatible(src_hwinfo, target_hwinfo) &&
+ (src_hwinfo->bs == 0 && target_hwinfo->bs == 0)) {
+ return true;
+ }
+
+ report_qinfo_incompatibilities(mdev_name, apqn, src_hwinfo, target_hwinfo);
+
+ return false;
+}
+
+/**
+ * verify_ap_configs_are_compatible:
+ *
+ * Verifies that the queues in the source guest's AP configuration are
+ * compatible with the corresponding queues on this system.
+ *
+ * @mdev_name: The mdev name to use in error messages
+ * @source_config: The object specifying the source guest's AP configuration
+ *
+ * Returns: an error indicating either a failure to retrieve a queue's
+ * hardware information or one or more source queues are not
+ * compatible with the corresponding queue on this system; otherwise,
+ * returns zero to indicate compatibility.
+ */
+static int verify_ap_configs_are_compatible(const char *mdev_name,
+ struct vfio_ap_config *source_config)
+{
+ struct ap_tapq_hwinfo src_hwinfo, dest_hwinfo;
+ unsigned long apqn;
+ int ret = 0, rc;
+
+ for (int i = 0; i < source_config->num_queues; i++) {
+ apqn = source_config->qinfo[i].apqn;
+
+ /*
+ * If we can't get the hardware info for a particular queue, then let's
+ * capture the function return code and continue so we can log all
+ * errors to aid in debugging of migration.
+ */
+ rc = get_hardware_info_for_queue(mdev_name, &dest_hwinfo, apqn);
+ if (rc) {
+ ret = rc;
+ continue;
+ }
+
+ src_hwinfo.value = source_config->qinfo[i].data;
+
+ if (!queue_hardware_info_is_compatible(mdev_name, apqn,
+ &src_hwinfo,
+ &dest_hwinfo))
+ ret = -EFAULT;
+ }
+
+ return ret;
+}
+
+static int do_post_copy_validation(const char *mdev_name,
+ struct vfio_ap_config *source_config)
+{
+ if (!queues_available_on_target_system(mdev_name, source_config))
+ return -ENODEV;
+
+ return verify_ap_configs_are_compatible(mdev_name, source_config);
+}
+
+/**
+ * setup_ap_matrix_from_ap_config:
+ *
+ * Set the bits corresponding to the adapters, domains and control domains
+ * in the source guest's AP configuration into an ap_matrix object to be
+ * used to update the target guest to run on this host. An error message will
+ * be logged for each adapter, domain or control domain that is not available
+ * on this host. Returning an error after the each error may result in needing
+ * to initiate multiple migrations in order to find and fix each of them.
+ *
+ * Returns: zero (0) if each adapter, domain and control domain from the
+ * source guest's ap configuration is available on this host;
+ * otherwise, returns -ENODEV.
+ */
+static void setup_ap_matrix_from_ap_config(const char *mdev_name,
+ struct vfio_ap_config *ap_config,
+ struct ap_matrix *guest_matrix)
+{
+ struct vfio_ap_queue_info qinfo;
+ unsigned long apid, apqi;
+
+ for (int i = 0; i < ap_config->num_queues; i++) {
+ qinfo = ap_config->qinfo[i];
+ apid = AP_QID_CARD(qinfo.apqn);
+ apqi = AP_QID_QUEUE(qinfo.apqn);
+
+ if (!test_bit_inv(apid, guest_matrix->apm))
+ set_bit_inv(apid, guest_matrix->apm);
+ if (!test_bit_inv(apqi, guest_matrix->aqm))
+ set_bit_inv(apqi, guest_matrix->aqm);
+ }
+}
+
static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
size_t len, loff_t *pos)
{
- /* TODO */
- return -EOPNOTSUPP;
+ struct vfio_ap_migration_data *mig_data;
+ struct ap_matrix_mdev *matrix_mdev;
+ struct vfio_ap_config *ap_config;
+ struct ap_matrix guest_matrix;
+ ssize_t ret = 0, cfg_sz;
+ const char *mdev_name;
+
+ /*
+ * When userspace calls write() with an explicit offset (pwrite), pos is
+ * non-NULL and the function rejects it with -ESPIPE (illegal seek). For
+ * normal write() calls, pos is NULL, so we'll use the file's internal
+ * position filp->f_pos
+ */
+ if (pos)
+ return -ESPIPE;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+ pos = &filp->f_pos;
+
+ ret = validate_resuming_write_parms(filp, len, pos);
+ if (ret)
+ goto done;
+
+ matrix_mdev = filp->private_data;
+ mig_data = matrix_mdev->mig_data;
+ mdev_name = dev_name(matrix_mdev->vdev.dev);
+
+ /*
+ * If this is the first write operation, then allocate storage for the
+ * AP configuration information; otherwise, reallocate the
+ * struct vfio_ap_config object used to store the AP configuration data
+ * sent from userspace.
+ */
+ if (*pos == 0) {
+ ret = allocate_ap_config(&ap_config, buf, len);
+
+ /* If the allocation failed, we'll return the error */
+ if (ret < 0)
+ goto done;
+
+ cfg_sz = ret;
+ } else {
+ ap_config = mig_data->resuming_mig_state.ap_config;
+
+ ret = reallocate_ap_config(&ap_config, len);
+ if (ret < 0)
+ goto cleanup;
+
+ cfg_sz = ret;
+ }
+
+ if (*pos + len > cfg_sz) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ /*
+ * We don't want to lock all mdevs while copying data from userspace so
+ * we don't block all other mdevs in case the I/O takes a long time.
+ * From here on out we don't need this lock because we are not
+ * accessing the matrix_mdev until we need to get the update
+ * locks to set the new destination guest's AP configuration in which
+ * case this lock will be taken then.
+ */
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ if (copy_from_user((char *)ap_config + *pos, buf, len)) {
+ ret = -EIO;
+ goto cleanup;
+ }
+
+ /* Check if we've completed writing the entire configuration */
+ if (*pos + len == cfg_sz) {
+ ret = do_post_copy_validation(mdev_name, ap_config);
+ if (ret < 0)
+ goto cleanup;
+
+ setup_ap_matrix_from_ap_config(mdev_name, ap_config,
+ &guest_matrix);
+
+ /* Acquire locks required to update the guest's AP config */
+ mutex_lock(&ap_attr_mutex);
+ get_update_locks_for_mdev(matrix_mdev);
+
+ ret = vfio_ap_set_new_guest_config(matrix_mdev, &guest_matrix,
+ false);
+ if (!ret) {
+ mig_data->resuming_mig_state.ap_config = ap_config;
+ mig_data->resuming_mig_state.config_sz = cfg_sz;
+ }
+
+ release_update_locks_for_mdev(matrix_mdev);
+ mutex_unlock(&ap_attr_mutex);
+
+ if (ret)
+ goto cleanup;
+ }
+
+ ret = len;
+ *pos += len;
+ goto done;
+
+cleanup:
+ kfree(ap_config);
+done:
+ if (mutex_is_locked(&matrix_dev->mdevs_lock))
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ return ret;
}
static const struct file_operations vfio_ap_resume_fops = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index c5eb9e4aa85c..2c4cca4920b3 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -81,74 +81,6 @@ static inline void release_update_locks_for_kvm(struct kvm *kvm)
mutex_unlock(&matrix_dev->guests_lock);
}
-/**
- * get_update_locks_for_mdev: Acquire the locks required to dynamically update a
- * KVM guest's APCB in the proper order.
- *
- * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
- * configuration data to use to update a KVM guest's APCB.
- *
- * The proper locking order is:
- * 1. matrix_dev->guests_lock: required to use the KVM pointer to update a KVM
- * guest's APCB.
- * 2. matrix_mdev->kvm->lock: required to update a guest's APCB
- * 3. matrix_dev->mdevs_lock: required to access data stored in a matrix_mdev
- *
- * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
- * lock will not be taken.
- */
-static inline void get_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
-{
- mutex_lock(&matrix_dev->guests_lock);
- if (matrix_mdev && matrix_mdev->kvm)
- mutex_lock(&matrix_mdev->kvm->lock);
- mutex_lock(&matrix_dev->mdevs_lock);
-}
-
-/**
- * assert_has_update_locks_for_mdev:
- *
- * Assert the locks required to dynamically update a KVM guest's APCB are
- * currently held.
- *
- * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
- * configuration data to use to update a KVM guest's APCB.
- *
- * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
- * lock will not be taken.
- */
-static inline void
-assert_has_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
-{
- lockdep_assert_held(&matrix_dev->guests_lock);
- if (matrix_mdev && matrix_mdev->kvm)
- lockdep_assert_held(&matrix_mdev->kvm->lock);
- lockdep_assert_held(&matrix_dev->mdevs_lock);
-}
-
-/**
- * release_update_locks_for_mdev: Release the locks used to dynamically update a
- * KVM guest's APCB in the proper order.
- *
- * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
- * configuration data to use to update a KVM guest's APCB.
- *
- * The proper unlocking order is:
- * 1. matrix_dev->mdevs_lock
- * 2. matrix_mdev->kvm->lock
- * 3. matrix_dev->guests_lock
- *
- * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
- * lock will not be released.
- */
-static inline void release_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
-{
- mutex_unlock(&matrix_dev->mdevs_lock);
- if (matrix_mdev && matrix_mdev->kvm)
- mutex_unlock(&matrix_mdev->kvm->lock);
- mutex_unlock(&matrix_dev->guests_lock);
-}
-
/**
* get_update_locks_by_apqn: Find the mdev to which an APQN is assigned and
* acquire the locks required to update the APCB of
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 7ae48729cafa..7f0c69f2fd9e 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -157,6 +157,62 @@ struct vfio_ap_queue {
struct work_struct reset_work;
};
+/**
+ * get_update_locks_for_mdev: Acquire the locks required to dynamically update a
+ * KVM guest's APCB in the proper order.
+ *
+ * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
+ * configuration data to use to update a KVM guest's APCB.
+ *
+ * The proper locking order is:
+ * 1. matrix_dev->guests_lock: required to use the KVM pointer to update a KVM
+ * guest's APCB.
+ * 2. matrix_mdev->kvm->lock: required to update a guest's APCB
+ * 3. matrix_dev->mdevs_lock: required to access data stored in a matrix_mdev
+ *
+ * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
+ * lock will not be taken.
+ */
+static inline void get_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
+{
+ mutex_lock(&matrix_dev->guests_lock);
+ if (matrix_mdev && matrix_mdev->kvm)
+ mutex_lock(&matrix_mdev->kvm->lock);
+ mutex_lock(&matrix_dev->mdevs_lock);
+}
+
+/**
+ * release_update_locks_for_mdev: Release the locks used to dynamically update a
+ * KVM guest's APCB in the proper order.
+ *
+ * @matrix_mdev: a pointer to a struct ap_matrix_mdev object containing the AP
+ * configuration data to use to update a KVM guest's APCB.
+ *
+ * The proper unlocking order is:
+ * 1. matrix_dev->mdevs_lock
+ * 2. matrix_mdev->kvm->lock
+ * 3. matrix_dev->guests_lock
+ *
+ * Note: If @matrix_mdev is NULL or is not attached to a KVM guest, the KVM
+ * lock will not be released.
+ */
+static inline void release_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
+{
+ mutex_unlock(&matrix_dev->mdevs_lock);
+ if (matrix_mdev && matrix_mdev->kvm)
+ mutex_unlock(&matrix_mdev->kvm->lock);
+ mutex_unlock(&matrix_dev->guests_lock);
+}
+
+static inline void
+assert_has_update_locks_for_mdev(struct ap_matrix_mdev *matrix_mdev)
+{
+ lockdep_assert_held(&matrix_dev->guests_lock);
+ if (matrix_mdev && matrix_mdev->kvm)
+ lockdep_assert_held(&matrix_mdev->kvm->lock);
+ lockdep_assert_held(&matrix_dev->mdevs_lock);
+}
+
int vfio_ap_mdev_get_num_queues(struct ap_matrix *ap_matrix);
int vfio_ap_mdev_register(void);
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (9 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:46 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa Anthony Krowiak
` (3 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Typically at this point, interrupt processing for the device, DMA and other
changes to the device's external state must be prohibited. For VFIO AP,
there is no physical DMA device virtualized by the vfio-ap device . It
merely manages the guest's AP configuration that identifies the devices
passed through to the guest, but does not have access to any of them. Those
AP devices are passed through and controlled exclusively by the SIE program
used to start the guest. There only thing to do at this point is return
NULL to the caller since the file streams and associated data will get
cleaned up as a result of the VFIO migration framework releasing the files.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 9cd1e7a3f743..1b526b743ad0 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -1159,12 +1159,17 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
return filp;
}
+ /*
+ * Terminates the data transfer session of the vfio-ap device state
+ * between the source and target hosts. Since the vfio-ap device does
+ * not virtualize a DMA device, there is no internal device state to
+ * incorporate into the vfio-ap device on the target.
+ */
if ((cur_state == VFIO_DEVICE_STATE_RESUMING &&
new_state == VFIO_DEVICE_STATE_STOP) ||
(cur_state == VFIO_DEVICE_STATE_STOP_COPY &&
new_state == VFIO_DEVICE_STATE_STOP)) {
- /* TODO */
- return ERR_PTR(-EOPNOTSUPP);
+ return NULL;
}
if ((cur_state == VFIO_DEVICE_STATE_STOP &&
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (10 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:48 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration Anthony Krowiak
` (2 subsequent siblings)
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
The transition from STOP to RUNNING and vice versa indicates that
the vfio device is operating normally and migration is not
currently taking place. In this case, just set the new state.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 1b526b743ad0..572db04c4ac1 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -1172,12 +1172,17 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
return NULL;
}
+ /*
+ * These states indicate migration has either not been initiated or
+ * has completed and the vfio-ap device is operating normally.Since the
+ * vfio-ap device does not virtualize a DMA device, there is no internal
+ * device state to incorporate into the vfio-ap device on the target.
+ */
if ((cur_state == VFIO_DEVICE_STATE_STOP &&
new_state == VFIO_DEVICE_STATE_RUNNING) ||
(cur_state == VFIO_DEVICE_STATE_RUNNING &&
new_state == VFIO_DEVICE_STATE_STOP)) {
- /* TODO */
- return ERR_PTR(-EOPNOTSUPP);
+ return NULL;
}
/* vfio_mig_get_next_state() does not use arcs other than the above */
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (11 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:49 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst Anthony Krowiak
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Callback to get the estimated data length of the data that will be required
to complete reading of the vfio device's internal state. It's mandatory for
VFIO_DEVICE_FEATURE_MIGRATION migration support.
The function pointer for this callback is specified via the
'migration_get_data_size' field of the 'vfio_migration_ops' structure
which is stored with the VFIO device when the 'vfio_device'
structure representing the mediated device is initialized.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_migration.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
index 572db04c4ac1..83e1ce2c40c0 100644
--- a/drivers/s390/crypto/vfio_ap_migration.c
+++ b/drivers/s390/crypto/vfio_ap_migration.c
@@ -1283,7 +1283,18 @@ static int vfio_ap_get_state(struct vfio_device *vdev,
static int vfio_ap_get_data_size(struct vfio_device *vdev,
unsigned long *stop_copy_length)
{
- return -EOPNOTSUPP;
+ struct ap_matrix_mdev *matrix_mdev;
+ size_t qinfo_sz;
+ int num_queues;
+
+ mutex_lock(&matrix_dev->mdevs_lock);
+ matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
+ num_queues = vfio_ap_mdev_get_num_queues(&matrix_mdev->shadow_apcb);
+ qinfo_sz = num_queues * sizeof(struct vfio_ap_queue_info);
+ *stop_copy_length = qinfo_sz + sizeof(struct vfio_ap_config);
+ mutex_unlock(&matrix_dev->mdevs_lock);
+
+ return 0;
}
static const struct vfio_migration_ops vfio_ap_migration_ops = {
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (12 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:56 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst Anthony Krowiak
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Add the 'migratable' feature to the features provided by the sysfs
/sys/devices/vfio_ap/matrix/feature attribute to indicate that migration
of vfio devices is supported.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_drv.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
index fd7394d81880..d2fc3598cd24 100644
--- a/drivers/s390/crypto/vfio_ap_drv.c
+++ b/drivers/s390/crypto/vfio_ap_drv.c
@@ -26,9 +26,11 @@ MODULE_LICENSE("GPL v2");
struct ap_matrix_dev *matrix_dev;
debug_info_t *vfio_ap_dbf_info;
+#define FEATURES "guest_matrix hotplug ap_config migratable"
+
static ssize_t features_show(struct device *dev, struct device_attribute *attr, char *buf)
{
- return sysfs_emit(buf, "guest_matrix hotplug ap_config\n");
+ return sysfs_emit(buf, "%s\n", FEATURES);
}
static DEVICE_ATTR_RO(features);
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
` (13 preceding siblings ...)
2026-06-30 10:41 ` [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute Anthony Krowiak
@ 2026-06-30 10:41 ` Anthony Krowiak
2026-06-30 11:54 ` sashiko-bot
14 siblings, 1 reply; 30+ messages in thread
From: Anthony Krowiak @ 2026-06-30 10:41 UTC (permalink / raw)
To: linux-s390, linux-kernel, kvm
Cc: jjherne, borntraeger, mjrosato, pasic, alex, kwankhede, fiuczy,
pbonzini, frankja, imbrenda, agordeev, hca, gor
Add a 'Live Guest Migration' chapter to the
Documentation/arch/s390/vfio-ap.rst document to describe the details for
initiating live guest migration for a guest to which AP adapters, domains
and control domains have been passed through.
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
---
Documentation/arch/s390/vfio-ap.rst | 514 +++++++++++++++++++++++-----
1 file changed, 433 insertions(+), 81 deletions(-)
diff --git a/Documentation/arch/s390/vfio-ap.rst b/Documentation/arch/s390/vfio-ap.rst
index ac0c07f76ddd..fc0707c0dffd 100644
--- a/Documentation/arch/s390/vfio-ap.rst
+++ b/Documentation/arch/s390/vfio-ap.rst
@@ -1012,7 +1012,7 @@ guest_matrix dyn ap_config
the following features are advertised:
----------------+---------------------------------------------------------------+
++--------------+---------------------------------------------------------------+
| Flag | Description |
+==============+===============================================================+
| guest_matrix | guest_matrix attribute exists. It reports the matrix of |
@@ -1021,105 +1021,457 @@ the following features are advertised:
+--------------+---------------------------------------------------------------+
| dyn | Indicates hot plug/unplug of AP adapters, domains and control |
| | domains for a guest to which the mdev is attached. |
-+------------+-----------------------------------------------------------------+
++--------------+---------------------------------------------------------------+
| ap_config | ap_config interface for one-shot modifications to mdev config |
+--------------+---------------------------------------------------------------+
+| migratable | Indicates that live guest migration is supported for guests |
+| | to which crypto devices are passed through |
++--------------+---------------------------------------------------------------+
-Limitations
-===========
-Live guest migration is not supported for guests using AP devices without
-intervention by a system administrator. Before a KVM guest can be migrated,
-the vfio_ap mediated device must be removed. Unfortunately, it can not be
-removed manually (i.e., echo 1 > /sys/devices/vfio_ap/matrix/$UUID/remove) while
-the mdev is in use by a KVM guest. If the guest is being emulated by QEMU,
-its mdev can be hot unplugged from the guest in one of two ways:
-
-1. If the KVM guest was started with libvirt, you can hot unplug the mdev via
- the following commands:
-
- virsh detach-device <guestname> <path-to-device-xml>
-
- For example, to hot unplug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 from
- the guest named 'my-guest':
-
- virsh detach-device my-guest ~/config/my-guest-hostdev.xml
-
- The contents of my-guest-hostdev.xml:
-
-.. code-block:: xml
-
- <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
- <source>
- <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
- </source>
- </hostdev>
-
-
- virsh qemu-monitor-command <guest-name> --hmp "device-del <device-id>"
-
- For example, to hot unplug the vfio_ap mediated device identified on the
- qemu command line with 'id=hostdev0' from the guest named 'my-guest':
-
-.. code-block:: sh
-
- virsh qemu-monitor-command my-guest --hmp "device_del hostdev0"
-
-2. A vfio_ap mediated device can be hot unplugged by attaching the qemu monitor
- to the guest and using the following qemu monitor command:
+Live Guest Migration
+====================
+The VFIO AP mediated device is not used to provide userspace with direct
+access to a device as is the case with other devices that use the VFIO
+framework for device pass-through. Instead, it manages AP
+configuration metadata identifying the adapters, domains, and control
+domains to which a guest will be granted access. These AP resources are
+configured by assigning them to a vfio-ap mediated device via its sysfs
+assignment interfaces. When the guest is started, the vfio_ap device driver
+sets the guest's AP configuration from the metadata stored with the mediated
+device. The AP devices are not accessed directly through the vfio_ap driver,
+so the driver does not migrate the state of the AP devices themselves.
+Instead, it migrates the AP configuration metadata. The vfio_ap device
+driver on the destination host will then ensure that the AP configuration
+metadata migrated from the source host is compatible with the AP
+configuration available on the destination host and if
+so, will set the destination guest's AP configuration accordingly.
- (QEMU) device-del id=<device-id>
+To be considered compatible, the AP configuration for the destination host must
+meet these requirements:
- For example, to hot unplug the vfio_ap mediated device that was specified
- on the qemu command line with 'id=hostdev0' when the guest was started:
+* Each AP adapter, domain and control domain number assigned to the source
+ guest's AP configuration must be in the destination host's AP
+ configuration. This can be verified as follows:
- (QEMU) device-del id=hostdev0
+ * On the source host, display the AP configuration of the source guest:
-After live migration of the KVM guest completes, an AP configuration can be
-restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
-system into the guest in one of two ways:
+ .. code-block:: bash
-1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
- device into the guest via the following virsh commands:
+ cat /sys/devices/vfio_ap/matrix/$UUID/guest_matrix
+ cat /sys/devices/vfio_ap/matrix/$UUID/control_domains
- virsh attach-device <guestname> <path-to-device-xml>
+ .. note::
+ Replace ``$UUID`` with the actual UUID of your mediated device.
- For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
- the guest named 'my-guest':
+ * Display the AP configuration of the destination host:
- virsh attach-device my-guest ~/config/my-guest-hostdev.xml
+ .. code-block:: bash
- The contents of my-guest-hostdev.xml:
+ lszcrypt -V
+ cat /sys/bus/ap/ap_control_domain_mask
-.. code-block:: xml
+ Note that each bit in the ap_control_domain_mask corresponds to
+ correspond to domain numbers 0-255. If a bit is set to one, then
+ the control domain is available on the destination host. For
+ example:
- <hostdev mode='subsystem' type='mdev' managed='no' model='vfio-ap'>
- <source>
- <address uuid='62177883-f1bb-47f0-914d-32a22e3a8804'/>
- </source>
- </hostdev>
+ .. code-block:: bash
+ 0x0000000000001020000000000000000000000000000000000000000000000000
- virsh qemu-monitor-command <guest-name> --hmp \
- "device_add vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+ This mask indicates that AP adapters 51 (0x33) and 58 (0x58) are
+ available on the host system.
- For example, to hot plug the vfio_ap mediated device
- 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest named 'my-guest' with
- device-id hostdev0:
+ * Verify the configurations:
- virsh qemu-monitor-command my-guest --hmp \
- "device_add vfio-ap,\
- sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
- id=hostdev0"
+ * Each APQN in the guest's sysfs ``guest_matrix`` file must match an
+ APQN returned from the lszcrypt command on the destination host. It
+ is important to note the driver to which the queue is bound which
+ is displayed in the ``driver`` column. Queues not bound to the vfio_ap
+ driver are not available to the destination guest. That, however, does
+ not guarantee they can be used by the guest being migrated; for example,
+ migration will fail if any of these APQNs are assigned to a mediated
+ device other than the one being migrated.
-2. A vfio_ap mediated device can be hot plugged by attaching the qemu monitor
- to the guest and using the following qemu monitor command:
+ * Each domain number from the source guest's sysfs
+ ``control_domains`` attribute must correspond to a bit set to one in
+ the destination host's ``ap_control_domain_mask``.
- (qemu) device_add "vfio-ap,sysfsdev=<path-to-mdev>,id=<device-id>"
+* Each APQN assigned to the source guest and destination guest must
+ reference a queue device with compatible hardware capabilities:
- For example, to plug the vfio_ap mediated device
- 62177883-f1bb-47f0-914d-32a22e3a8804 into the guest with the device-id
- hostdev0:
++--------------+---------------------------------------------------------------+
+| Hardware | Description |
+| Capabilities | |
++==============+===============================================================+
+| [1] | * AP special command facility (APSC) |
+| Facilities | * AP 4096-bit ME PKU commands facility (AP4KM) |
+| | * AP 4096-bit CRT PKU commands (AP4KC) |
++--------------+---------------------------------------------------------------+
+| [1] Mode | * CCA-mode |
+| | * Accelerator-mode |
+| | * XCP-mode (EP11) |
++--------------+---------------------------------------------------------------+
+| [1] AP | APXA installed |
+| extended | |
+| addressing | |
++--------------+---------------------------------------------------------------+
+| [1] Command | Command filtering available |
+| filtering | |
++--------------+---------------------------------------------------------------+
+| [2] | * Full native card function |
+| Functional | * Only stateless functions |
+| capabilities | |
++--------------+---------------------------------------------------------------+
+| [3] AP type | * 10 (0x0a): CEX4 |
+| | * 11 (0x0b): CEX5 |
+| | * 12 (0x0c) CEX6 |
+| | * 13 (0x0d) CEX7 |
+| | * 14 (0x0e) CEX8 |
++--------------+---------------------------------------------------------------+
- (QEMU) device-add "vfio-ap,\
- sysfsdev=/sys/devices/vfio_ap/matrix/62177883-f1bb-47f0-914d-32a22e3a8804,\
- id=hostdev0"
++-----------------------------------------------------------------------------+
+| Table Keys |
++=============================================================================+
+| |
+| [1] The hardware capability must be the same for both the source |
+| and destination queue device. |
+| |
+| [2] The source and destination queues must either both have |
+| full native card function or both have stateless functions. |
+| If the functional capabilities don't match, then the source |
+| queue can have stateless capabilities since full native card |
+| function can handle stateless functions; otherwise, migration |
+| will fail. |
+| |
+| [3] The AP type on the source and destination guests can |
+| differ if the queue passed through to the target guest is a |
+| newer model (backwards compatible) |
++--------------+--------------------------------------------------------------+
+
+* To verify the hardware capabilities are compatible:
+
+ * On both the source and destination hosts, display the hardware capabilities
+ for each AP adapter in the AP configuration for that host::
+
+ cat /sys/bus/ap/devices/card$APID/ap_functions
+
+ .. note::
+ The ``$APID`` is the two-character adapter number in hexidecimal format;
+ for example, card02 or card1f. The ``/sys/bus/ap/devices`` directory
+ also lists the APQNs of the queue devices installed in the respective
+ host system with the first two characters being the APID.
+
+ * Verify the hardware capabilities for each AP adapter device on the source
+ and destination host are compatible. The ``ap_functions`` attribute is a
+ bitmask. The bits in the mask read from left to right starting with bit 0.
+ Each bit that is set to one indicates the corresponding hardware capability
+ is installed:
+
+ * Bits 0-3 are the facilities bits. Each bit value must match for the AP
+ devices on the source and destination systems. The values indicate the
+ following:
+ * bit-0: APSC is installed
+ * bit-1: AP4KM is installed
+ * bit-2: AP4KC is installed
+
+ * Bits 3-5 specify the adapter Modes. Only one of these bits will be set to
+ one and must be the same for both the source and destination adapter
+ devices:
+ * bit-3: CCA-mode
+ * bit-4: Accelerator-mode
+ * bit-5: XCP-mode (EP11)
+
+ * Bit 6 specifies whether APXA is installed and must be the same for both
+ the source and destination adapter devices.
+
+ * Bit 7 specifies command whether command filtering is installed
+ and must be the same for both the source and destination adapter devices.
+
+ * Bits 8-15 are the classification bits. There are only two relevant
+ bits and only one of them will be set to one. For the source and destination
+ adapter devices to be considered compatible:
+
+ * Bit 8 may be set to 1 on both hosts (full native card function)
+ * Bit 9 may be set to 1 on both hosts (only stateless functions available)
+ * Bit 9 may be set to 1 on the source host and bit 8 may be set to 1 on
+ the destination host
+
+ * Bits 16-17 are the binding state bits which must be 00
+
+* To verify the AP types are compatible:
+
+ * On both the source and destination hosts, display the hardware type
+ for each AP adapter in the AP configuration for that host to ensure the
+ source and destination adapter types are the same, or the source type is
+ less than the destination type::
+
+ cat /sys/bus/ap/devices/card$APID/hwtype
+
+ .. note::
+ The ``$APID`` is the two-character adapter number in hexidecimal format;
+ for example, card02 or card1f.
+
+Setting up for live guest migration on the destination host
+-----------------------------------------------------------
+To set up a host as the destination for live guest migration, do the
+following:
+
+Create a mediated device
+~~~~~~~~~~~~~~~~~~~~~~~~
+Create a mediated device with the same name (UUID) as the mediated device
+used to supply the AP configuration to the source guest. For example, if
+the mediated device on the source guest is named
+62177883-f1bb-47f0-914d-32a22e3a8804:
+
+.. code-block:: bash
+
+ echo 62177883-f1bb-47f0-914d-32a22e3a8804 > \
+ /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/create
+
+Reserve adapters and domains for the vfio_ap device driver's use
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Reserve the same adapters and domains passed through to the source guest
+for the vfio_ap device driver on the destination host by clearing the
+adapter numbers from the ``/sys/bus/ap/ap_mask`` and/or the domain numbers
+from the ``/sys/bus/ap/aqmask`` files.
+
+For example, suppose adapters 0, 20 (0x14) and 22 (0x16) as well as
+domains 51 (0x33) and 58 (0x3a) are passed through to the source guest. To
+ensure they can be passed through to the destination guest, they must be
+made available to the vfio_ap device driver. To make them available, do one
+or both of the following on the destination host:
+
+.. code-block:: bash
+
+ echo -0,-0x14,-22 > /sys/bus/ap/apmask
+ echo -0,-0x33,-0x3a > /sys/bus/ap/aqmask
+
+.. note::
+
+ * It is not necessary to reserve both the adapters and domains.
+ Reserving an adapter implicitly allows the vfio_ap driver to assign
+ each of the domains available on the system along with the reserved
+ adapters to a guest's AP configuration and vice versa.
+
+ * Reserving adapters and/or domains will result in binding the queue
+ devices referenced by the APQNs derived from the Cartesian product of
+ the adapters and domains made available to the vfio_ap device driver
+ with one caveat; an APQN does not necessarily have to reference a
+ queue device installed in the system in order to reserve it for the
+ vfio_ap device driver; however, if the APQN does not reference a
+ queue device installed on the destination host, migration will fail.
+
+ * It is not necessary to configure the destination mediated device with
+ adapters, domains and control domains because the source guest's AP
+ configuration will overlay the destination guest's during migration.
+
+ * To verify that the queue devices are reserved for the vfio_ap
+ device driver, issue the ``lszcrypt -V`` command and you should
+ see ``vfio_ap`` in the ``driver`` column of the output.
+
+Live guest migration failures due to AP configuration errors
+------------------------------------------------------------
+This section describes the errors that may occur during live guest migration
+when the AP configuration of the source guest and destination host are not
+compatible and where to look for problem determination.
+
+The destination host is missing the mediated device
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This error occurs when a mediated device with the same name (UUID) as the
+mediated device being migrated has not been created on the destination host.
+
+**Source host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | error: device not found: mediated device '$UUID' not |
+| | found |
++--------------+----------------------------------------------------------------+
+| QEMU log: | initiating migration |
+| | qemu-system-s390x: Sibling indicated error 1 |
++--------------+----------------------------------------------------------------+
+
+.. note::
+ $UUID is the UUID of the mediated device being migrated from the source host.
+
+
+One or more queues are not bound to the vfio_ap driver on the destination host
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The queue devices passed through to the source guest are not installed in the
+destination host or are not bound to its vfio_ap device driver.
+
+**Source host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | error: operation failed: migration failed. Message from the |
+| | source host: operation failed: job 'migration out' failed: |
+| | Sibling indicated error 1. Message from the destination host: |
+| | operation failed: job 'migration in' failed: load of migration |
+| | failed: No such device: Failed to load vmstate version_id: 1, |
+| | ret: -19 |
++--------------+----------------------------------------------------------------+
+
+**Destinaton host**
+
++--------------+----------------------------------------------------------------+
+| Kernel log: | vfio_ap_mdev $UUID: Queue $APID.$APQI not available to |
+| | vfio_ap driver on target host |
++--------------+----------------------------------------------------------------+
+| QEMU log: | shutting down, reason=failed |
+| | qemu-system-s390x: terminating on signal $SIGNAL_NUM from |
+| | pid $PID $DAEMON |
++--------------+----------------------------------------------------------------+
+
+
+.. note::
+ * $UUID is the UUID of the mediated device being migrated from the source host.
+ * $APID.$APQI is adapter number and queue index (domain number) comprising the
+ $APQN (for example, 00.0033) of the queue device that is not available.
+ * $SIGNAL_NUM is the signal number that caused the QEMU process to terminate.
+ * $PID is the process ID of the QEMU process on the destination host.
+ * $DAEMON is the name of the QEMU daemon process.
+ * This message will be logged for each queue for which this error is detected.
+
+Failure to get hardware information for a queue device on destination host
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The vfio_ap device driver on the destination host will execute an AP instruction
+to get the hardware information for a queue device to perform queue compatibility
+checks. The instruction can fail for a number of reasons in which case the
+migration will be terminated.
+
+**Source host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | error: operation failed: migration failed. Message from the |
+| | source host: operation failed: job 'migration out' failed: |
+| | Sibling indicated error 1. Message from the destination host: |
+| | operation failed: job 'migration in' failed: load of migration |
+| | failed: failed to get hardware information for queue device |
+| | $APQN |
++--------------+----------------------------------------------------------------+
+
+**Destination host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | vfio_ap_mdev $UUID: migration failed: Failed to get hwinfo for |
+| | queue $APQN on target host: rc=$RESPONSE-CODE" |
++--------------+----------------------------------------------------------------+
+| QEMU log: | shutting down, reason=failed |
+| | qemu-system-s390x: terminating on signal $SIGNAL-ID from pid |
+| | $PID $DAEMON |
++--------------+----------------------------------------------------------------+
+
+.. note::
+ * $UUID is the UUID of the mediated device being migrated from the source host.
+ * $APQN (for example, 00.0033) of the queue device that is not compatible.
+ * $RESPONSE_CODE is the response code from the AP instruction.
+ * $SIGNAL_NUM is the signal number of the signal that caused the QEMU process
+ to terminate.
+ * $PID is the process ID of the QEMU process on the destination host.
+ * $DAEMON is the name of the QEMU daemon process.
+ * This message will be logged for each queue for which this error is detected.
+
+The AP configurations of the source guest and destination host not incompatible
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The functional capabilities or facilities available on one or more of the queues
+passed through to the source guest are not compatible with the queue device with
+the same APQN on the destination system (see the Hardware Capabilities table above)
+
+**Source host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | error: operation failed: migration failed. Message from the |
+| | source host: operation failed: job 'migration out' failed: |
+| | Sibling indicated error 1. Message from the destination host: |
+| | operation failed: job 'migration in' failed: load of migration |
+| | failed: No such device: Failed to load vmstate version_id: 1, |
+| | ret: -19 |
++--------------+----------------------------------------------------------------+
+| Kernel log: | N/A |
++--------------+----------------------------------------------------------------+
+| QEMU log: | *initiating migration* |
+| | |
+| | *qemu-system-s390x: Sibling indicated error 1* |
++--------------+----------------------------------------------------------------+
+
+**Destination host**
+
++--------------+----------------------------------------------------------------+
+| Log | Message |
++==============+================================================================+
+| Console log: | N/A |
++--------------+----------------------------------------------------------------+
+| Kernel log: | *vfio_ap_mdev $UUID: Migration failed: Source and target |
+| | queue ($APQN) not compatible* |
+| | |
+| | The message above will be followed by one or more messages |
+| | enumerating the incompatible features; for example: |
+| | |
+| | ``vfio_ap_mdev $UUID: APSC facility installed in source queue |
+| | $APQN`` |
+| | ``vfio_ap_mdev $UUID: APSC facility not installed in target |
+| | queue $APQN`` |
+| | |
+| | ``AP type of source ($APTYPE1) not compatible with target |
+| | ($APTYPE2)`` ... |
++--------------+----------------------------------------------------------------+
+| QEMU log: | *initiating migration* |
+| | |
+| | *qemu-system-s390x: error while loading state section id ...* |
+| | |
+| | *shutting down, reason=failed* |
+| | |
+| | *terminating on signal $SIGNAL-ID from pid $PID $DAEMON* |
++--------------+----------------------------------------------------------------+
+
+.. note::
+ * $UUID is the UUID of the mediated device being migrated from the source host.
+ * $APQN (for example, 00.0033) of the queue device that is incompatible.
+ * $APTYPE1 is the type of the source queue device.
+ * $APTYPE2 is the type of the target queue device.
+ * $SIGNAL-ID is the signal number of the signal that caused the QEMU process
+ to terminate.
+ * $PROCESS-ID is the process ID of the QEMU process on the source host.
+ * $PID is the process ID of the QEMU process on the destination host.
+ * $DAEMON is the name of the QEMU daemon process.
+ * Each follow-on message logging the incompatibility will be logged for each
+ error detected for the specified queue device.
+
+AP Configuration Management
+---------------------------
+The AP configurations of the source guest and destination host must be kept
+in synchronization or live guest migration will likely fail due to potential
+incompatibilities introduced by such changes. In particular, changing adapter
+or domain reservations via the sysfs ``apmask`` or ``aqmask`` attributes on
+either the source or destination host without coordinating such changes between
+the two hosts -especially while a migration is taking place - can result in
+migration failures.
+
+Configuration stability is an orchestration-layer or system administrator
+responsibility, which is consistent with other VFIO device types. The vfio_ap
+device driver will validate source and destination configurations and provide
+clear diagnostics when incompatibilities are detected, enabling orchestration
+tools to implement appropriate policies; for example, blocking migration when
+such changes are to be made.
+
+Master Key administration
+--------------------------
+It is imperative that s390 Common Cryptographic Architecture (CCA) master key
+administration be performed on both the source and destination AP
+devices to synchronize the key values prior to allowing live guest
+migration. If the master keys do not match, then crypto applications that
+rely on secure keys wrapped by a CCA master key will fail when the guest on
+which they are running is migrated to the destination host. Again, this is
+an orchestration-layer or system administrator responsibility and migration must
+be blocked while master key administration is being performed.
--
2.53.0
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration
2026-06-30 10:41 ` [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration Anthony Krowiak
@ 2026-06-30 10:55 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 10:55 UTC (permalink / raw)
To: Anthony Krowiak
Cc: linux-s390, kvm, Vasily Gorbik, Alexander Gordeev, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Out-of-bounds read in `do_post_copy_validation` due to redundant and unsynchronized size fields (`num_queues` and `config_sz`) during fragmented migration stream writes.
- [Low] Use of the architecture-dependent type `size_t` in a binary structure intended for serialization.
- [Low] Severe mismatch between kernel-doc comments and actual structure fields in `vfio_ap_migration_data` and `vfio_ap_config`.
--
commit b64c84539884d50cda90f18a69a0a94397d84158
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Data structures for facilitating vfio device migration
This commit introduces the foundational data structures needed to facilitate
state transitions during vfio device migration for the s390 vfio-ap driver.
It adds structures for migration data, queue information, and AP configuration.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> new file mode 100644
> index 0000000000000..239168c4b7fff
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -0,0 +1,56 @@
[ ... ]
> +/**
> + * vfio_ap_migration_data - the data needed to migrate a guest with pass-through
> + * access to AP devices
> + *
> + * @mig_state: the current migration state
> + * @resuming_migf: the object used to resume the target guest
> + * @saving_migf: the object used to save the state of the source guest
> + */
> +struct vfio_ap_migration_data {
> + enum vfio_device_mig_state mig_state;
> +
> + struct {
> + struct file *filp;
> + struct vfio_ap_config *ap_config;
> + size_t config_sz;
> + } resuming_mig_state;
> +
> + struct file *stop_copy_mig_file;
> +};
[Severity: Low]
Are the kernel-doc fields for vfio_ap_migration_data out of sync with the
implementation? The comments document resuming_migf and saving_migf, but the
actual fields are resuming_mig_state and stop_copy_mig_file.
[ ... ]
> +/**
> + * vfio_ap_config - the guest's AP configuration
> + *
> + * @num_queues: the number of queues passed through to the guest
> + * @reserved: padding to ensure proper alignment of qinfo array
> + * @qinfo: an array of vfio_ap_queue_info objects, each specifying the
> + * queue information for a queue passed through to the guest
> + */
> +struct vfio_ap_config {
> + u32 num_queues;
> + u8 reserved[4];
> + size_t config_sz;
[Severity: Low]
Does the kernel-doc block for vfio_ap_config need an update to include the
config_sz field?
[Severity: Low]
Could the use of size_t here cause consistency issues? Since size_t has
variable width depending on the architecture, using it in a binary structure
intended for serialization over file descriptors or the network is generally
avoided in favor of fixed-width types.
[Severity: Critical]
Could having both num_queues and config_sz in this struct lead to an
out-of-bounds read later during migration?
If a fragmented migration stream write manipulates config_sz to be a smaller
value than expected, the validation checks in reallocate_ap_config() could be
bypassed:
reallocate_ap_config() {
cur_cfg_sz = ap_config->config_sz;
num_queues = ap_config->num_queues;
if (cur_cfg_sz >= sizeof(struct vfio_ap_config)) {
...
} else {
new_cfg_sz = cur_cfg_sz + len;
}
}
This could cause an out-of-bounds read in do_post_copy_validation() when
looping num_queues times over the qinfo array:
queues_available_on_target_system() {
for (int i = 0; i < source_config->num_queues; i++) {
apqn = source_config->qinfo[i].apqn;
...
}
Is it possible to synchronize these fields or derive the size directly from
num_queues to prevent a mismatch?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=2
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data
2026-06-30 10:41 ` [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data Anthony Krowiak
@ 2026-06-30 11:04 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:04 UTC (permalink / raw)
To: Anthony Krowiak
Cc: Vasily Gorbik, Alexander Gordeev, kvm, Heiko Carstens, linux-s390
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Self-deadlock in the error path of vfio_ap_mdev_probe() when vfio_register_emulated_iommu_dev() fails.
--
commit 5aa03de641783e67e8d873c30619dc8b515b536e
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Initialize/release vfio device migration data
This commit provides functions to initialize and release data structures
used during live guest migration. It also adds a release callback for the
vfio device to clean up the migration data when the mdev is released.
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
[ ... ]
> +static void vfio_ap_mdev_release_dev(struct vfio_device *vdev)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> +
> + mutex_lock(&matrix_dev->mdevs_lock);
> + matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
> + vfio_ap_release_migration_data(matrix_mdev);
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +}
> +
> static int vfio_ap_mdev_probe(struct mdev_device *mdev)
[ ... ]
> @@ -797,19 +809,30 @@ static int vfio_ap_mdev_probe(struct mdev_device *mdev)
> if (IS_ERR(matrix_mdev))
> return PTR_ERR(matrix_mdev);
>
> + mutex_lock(&matrix_dev->mdevs_lock);
> +
[ ... ]
> ret = vfio_register_emulated_iommu_dev(&matrix_mdev->vdev);
> if (ret)
> goto err_put_vdev;
[ ... ]
> err_put_vdev:
> vfio_put_device(&matrix_mdev->vdev);
> + mutex_unlock(&matrix_dev->mdevs_lock);
[Severity: High]
Does this create a self-deadlock when vfio_register_emulated_iommu_dev()
fails?
In this error path, matrix_dev->mdevs_lock is held when calling
vfio_put_device(). If this drops the last reference, it will synchronously
invoke the device's release callback, vfio_ap_mdev_release_dev().
Looking at the newly added vfio_ap_mdev_release_dev() above, it
unconditionally attempts to acquire the exact same matrix_dev->mdevs_lock.
Could this cause the thread to hang indefinitely since the non-recursive
lock is already held by vfio_ap_mdev_probe()?
> return ret;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=3
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler
2026-06-30 10:41 ` [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler Anthony Krowiak
@ 2026-06-30 11:10 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:10 UTC (permalink / raw)
To: Anthony Krowiak
Cc: Vasily Gorbik, kvm, Alexander Gordeev, Heiko Carstens, linux-s390
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues:
- [Low] Typo in the function comment block for `vfio_ap_reset_migration_state`.
Pre-existing issues:
- [High] Preexisting memory leak of `ap_config` in `vfio_ap_release_migration_data`.
--
commit 814157f788ce7725968457dca0dde1e8bb23d18c
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler
This patch adds a handler to reset the vfio-ap migration state during a
VFIO_DEVICE_RESET ioctl. It releases migration files and frees the storage
allocated for maintaining the migration state to return it to RUNNING.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 5b6d48ccf332c..c7c71980bdbb8 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -129,3 +129,23 @@ void vfio_ap_release_migration_data(struct ap_matrix_mdev *matrix_mdev)
> kfree(matrix_mdev->mig_data);
> matrix_mdev->mig_data = NULL;
> }
[Severity: High]
This isn't a bug introduced by this patch, but does this function leak the
dynamically allocated ap_config pointer stored within mig_data?
If the VFIO device file descriptor is closed before the migration file
descriptor, vfio_ap_mdev_close_device() calls this function which frees
matrix_mdev->mig_data without first freeing
mig_data->resuming_mig_state.ap_config.
When the migration file descriptor is subsequently closed, its release
callback vfio_ap_release_mig_file() checks matrix_mdev->mig_data. Since it
is now NULL, it returns -ENODEV and skips cleanup, which seems to permanently
leak the ap_config memory. Could this be triggered repeatedly by userspace?
> +
> +/**
> + * vfio_ap_reset_migration_state - Reset the vfio-ap migration state
> + *
> + * @matrix_mdev: pointer to the object maintaining the vfio-ap device state
> + *
> + * Called during VFIO_DEVICE_RESET to clean up any active migration
> + * stte and reset the device to RUNNING state as required by the VFIO
[Severity: Low]
This isn't a bug, but there is a typo in the comment block. Did you mean
"state" instead of "stte"?
> + * migration specification.
> + */
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=4
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration
2026-06-30 10:41 ` [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration Anthony Krowiak
@ 2026-06-30 11:11 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:11 UTC (permalink / raw)
To: Anthony Krowiak
Cc: linux-s390, kvm, Heiko Carstens, Alexander Gordeev, Vasily Gorbik
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Race condition in `vfio_ap_release_mig_file` accessing `mig_data` without locking, resulting in Use-After-Free and memory leak.
- [High] Removal of migration file cleanup logic from vfio_ap_reset_migration_state leaves stale file pointers, causing resource leaks.
- [High] The device state is not updated to VFIO_DEVICE_STATE_ERROR when a state transition fails.
- [Medium] Unratelimited dev_err allows unprivileged users to flood the kernel log.
- [Medium] Use of global mutex for device-specific state operations allows cross-VM interference.
--
commit d82cbc8a070306f90cbac1743dbb46ba5b58d080
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390-vfio-ap: Callback to get/set vfio device mig state during guest migration
Implements two callbacks related to the current state of a vfio-ap device:
* The callback that sets a new migration state of a vfio-ap device during
live migration of guests with pass-through access to AP devices. This
callback is mandatory for VFIO_DEVICE_FEATURE_MIGRATION support.
The function pointer for this callback is specified via the
'migration_set_state' field of the 'vfio_migration_ops' structure
which is stored with the VFIO device when the 'vfio_device'
structure representing the mediated device is initialized.
* The callback that returns the current vfio device migration state during
live migration of guests with pass-through access to AP devices.
The function pointer for this callback is specified via the
'migration_get_state' field of the 'vfio_migration_ops' structure
which is stored with the VFIO device when the 'vfio_device'
structure representing the mediated device is initialized.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index c7c71980bdbb8..8274e7b65eea4 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
[ ... ]
> @@ -56,16 +57,140 @@ struct vfio_ap_config {
> struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
> };
>
[ ... ]
> +static struct file *vfio_ap_set_state(struct vfio_device *vdev,
> + enum vfio_device_mig_state new_state)
> {
> - return NULL;
> + int ret;
> + struct file *filp = NULL;
> + struct ap_matrix_mdev *matrix_mdev;
> + enum vfio_device_mig_state next_state;
> + struct vfio_ap_migration_data *mig_data;
> +
> + matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
> +
> + mutex_lock(&matrix_dev->mdevs_lock);
[Severity: Medium]
Does using the global mdevs_lock here allow an attacker controlling a guest
to heavily contend the lock by repeatedly calling the migration ioctl? Would
it be better to use a per-device lock to prevent potential cross-VM
interference?
> + if (ap_is_se_guest()) {
> + dev_err(matrix_mdev->vdev.dev,
> + "Migration not allowed from or to a Secure Execution guest\n");
[Severity: Medium]
Could this unratelimited error message allow an unprivileged user program to
flood the kernel log by calling the ioctl in a tight loop?
> + mutex_unlock(&matrix_dev->mdevs_lock);
> + return ERR_PTR(-EPERM);
> + }
> +
> + mig_data = matrix_mdev->mig_data;
> +
[ ... ]
> + while (mig_data->mig_state != new_state) {
> + ret = vfio_mig_get_next_state(vdev, mig_data->mig_state,
> + new_state, &next_state);
> + if (ret) {
> + filp = ERR_PTR(ret);
> + break;
> + }
> +
> + filp = vfio_ap_transition_to_state(matrix_mdev, next_state);
> + if (IS_ERR(filp))
> + break;
[Severity: High]
If a state transition fails here, should the device be placed into
VFIO_DEVICE_STATE_ERROR? The current logic seems to break the loop and
return the error, but mig_data->mig_state is left unmodified, so subsequent
calls might report the old valid state instead of the required error state.
> +
> + mig_data->mig_state = next_state;
> +
> + if (WARN_ON(filp && new_state != next_state)) {
> + fput(filp);
> + filp = ERR_PTR(-EINVAL);
> + break;
> + }
> + }
> +
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +
> + return filp;
> }
>
> static int vfio_ap_get_state(struct vfio_device *vdev,
> enum vfio_device_mig_state *current_state)
> {
> - return -EOPNOTSUPP;
> + struct ap_matrix_mdev *matrix_mdev;
> + struct vfio_ap_migration_data *mig_data;
> +
> + mutex_lock(&matrix_dev->mdevs_lock);
[Severity: Medium]
Similar to the concern in vfio_ap_set_state(), could acquiring the global
mdevs_lock just to read the device-specific state lead to cross-VM lock
contention?
> +
> + matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
> + mig_data = matrix_mdev->mig_data;
> +
[ ... ]
> @@ -146,6 +271,5 @@ void vfio_ap_reset_migration_state(struct ap_matrix_mdev *matrix_mdev)
> if (!matrix_mdev->mig_data)
> return;
>
> - vfio_ap_release_mig_files(matrix_mdev);
> matrix_mdev->mig_data->mig_state = VFIO_DEVICE_STATE_RUNNING;
> }
[Severity: High]
Is it safe to remove the call to vfio_ap_release_mig_files() here? If
userspace starts a new migration, won't the new file pointers overwrite the
old ones in mig_data? When the old file descriptors are closed later, the
driver's release callback won't find a match in mig_data, which appears to
result in a permanent memory leak of the associated allocations.
[Severity: Critical]
Are there also race conditions related to accessing mig_data without locks
during release? Looking at vfio_ap_release_mig_file():
struct ap_matrix_mdev *matrix_mdev = filp->private_data;
if (!matrix_mdev || !matrix_mdev->mig_data)
return -ENODEV;
if (filp == matrix_mdev->mig_data->stop_copy_mig_file)
Could an unprivileged process cause a use-after-free by closing the VFIO
device file descriptor while keeping the migration file descriptor open,
and then closing the migration file descriptor? Since mig_data is accessed
locklessly here, it seems it could be freed concurrently during device
closure, leading to a dereference of freed memory.
Additionally, if vfio_ap_release_migration_data() is called concurrently,
does it properly free internal allocations like resuming_mig_state.ap_config,
or are those leaked when the parent mig_data is freed?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=5
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY
2026-06-30 10:41 ` [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY Anthony Krowiak
@ 2026-06-30 11:23 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:23 UTC (permalink / raw)
To: Anthony Krowiak
Cc: Alexander Gordeev, kvm, linux-s390, Heiko Carstens, Vasily Gorbik
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The migration file descriptor does not hold a reference to `matrix_mdev` nor uses an independent context structure, leading to a Use-After-Free when the migration file is closed after the device is removed.
- [High] `vfio_ap_release_mig_file()` accesses and modifies `matrix_mdev->mig_data` without holding `matrix_dev->mdevs_lock`, leading to a TOCTOU race and Use-After-Free.
--
commit 24121dd0a897a029f9bc2979eb21b1bf31847440
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY
This patch implements the transition to the STOP_COPY migration state for
vfio-ap devices. It sets up the migration infrastructure by creating a file
stream used to read the guest's AP configuration. The file operations are
introduced with a stub read implementation.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 8274e7b65eea4..76e06abe5a9e9 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
[ ... ]
> @@ -57,6 +58,57 @@ struct vfio_ap_config {
> struct vfio_ap_queue_info qinfo[] __counted_by(num_queues);
> };
>
[ ... ]
> +static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
> +{
> + struct ap_matrix_mdev *matrix_mdev = filp->private_data;
> +
[Severity: High]
Does this code race with device closure?
If userspace concurrently closes the main VFIO device FD and the migration
FD, one thread could process the main FD closure, acquire the mdevs_lock,
and free matrix_mdev->mig_data.
If vfio_ap_release_mig_file() is called by the other thread and accesses
matrix_mdev->mig_data without acquiring matrix_dev->mdevs_lock, could it
dereference a freed mig_data pointer?
> + if (!matrix_mdev || !matrix_mdev->mig_data)
> + return -ENODEV;
> +
> + if (filp == matrix_mdev->mig_data->stop_copy_mig_file)
> + vfio_ap_release_stop_copy_file(matrix_mdev->mig_data);
> + else
> + return -ENOENT;
> +
> + return 0;
> +}
> +
> +static const struct file_operations vfio_ap_stop_copy_fops = {
> + .owner = THIS_MODULE,
> + .read = vfio_ap_stop_copy_read,
> + .compat_ioctl = compat_ptr_ioctl,
> + .release = vfio_ap_release_mig_file,
> +};
> +
> +static struct file *vfio_ap_open_file_stream(struct ap_matrix_mdev *matrix_mdev,
> + const struct file_operations *fops,
> + int flags)
> +{
> + struct file *filp;
> +
> + lockdep_assert_held(&matrix_dev->mdevs_lock);
> +
> + filp = anon_inode_getfile("vfio_ap_mig_file", fops, matrix_mdev, flags);
[Severity: High]
Can this lead to a use-after-free of matrix_mdev?
Because the migration file descriptor does not hold a reference to
matrix_mdev, what happens if userspace closes the main VFIO device FD
while keeping the migration FD open, and the administrator removes the
mediated device?
The VFIO core might wait for the main device FD to close and then free the
matrix_mdev object. When userspace eventually closes the migration FD,
vfio_ap_release_mig_file() will dereference filp->private_data which
points to the freed matrix_mdev memory.
> + if (!IS_ERR(filp))
> + stream_open(filp->f_inode, filp);
> +
> + return filp;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=6
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state
2026-06-30 10:41 ` [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state Anthony Krowiak
@ 2026-06-30 11:26 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:26 UTC (permalink / raw)
To: Anthony Krowiak
Cc: kvm, Alexander Gordeev, Vasily Gorbik, linux-s390, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [Critical] Uninitialized config_sz in vfio_ap_config breaks migration data restoration on target
- [Critical] Concurrent read() calls race on lockless f_pos updates causing Out-of-Bounds kernel read
- [High] Dynamic state re-generation on chunked read() causes data stream corruption
- [High] Lockless release of migration file causes Use-After-Free of mig_data
- [Medium] Inverted logic for handling async hardware errors in get_hardware_info_for_queue
--
commit 41fe322bf22b6ffa832bc302520c59909d8f5cca
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: File ops called to save the vfio device migration state
This commit implements the read callback for the vfio-ap device's stop-copy
migration file. It reads hardware queue states and exposes them to userspace
for transfer to the destination host during migration.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 76e06abe5a9e9..5ef7127b697a5 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -65,13 +65,6 @@ vfio_ap_release_stop_copy_file(struct vfio_ap_migration_data *mig_data)
> mig_data->stop_copy_mig_file = NULL;
> }
>
> -static ssize_t
> -vfio_ap_stop_copy_read(struct file *, char __user *, size_t, loff_t *)
> -{
> - /* TODO */
> - return -EOPNOTSUPP;
> -}
> -
> static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
> {
> struct ap_matrix_mdev *matrix_mdev = filp->private_data;
[Severity: High]
Can this lockless release cause a use-after-free of mig_data?
In vfio_ap_release_mig_file(), matrix_mdev->mig_data is checked and then
dereferenced locklessly. Concurrently, if the device fd is closed,
vfio_ap_release_migration_data() frees matrix_mdev->mig_data and sets it to
NULL.
If a user closes the device fd while simultaneously closing the migration fd,
could this read a non-NULL pointer that is subsequently freed before being
dereferenced?
> @@ -87,6 +80,215 @@ static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
> return 0;
> }
>
[ ... ]
> +static int get_hardware_info_for_queue(struct ap_matrix_mdev *matrix_mdev,
> + struct ap_tapq_hwinfo *hwinfo,
> + unsigned long apqn)
> +{
> + struct ap_queue_status status;
> +
> + status = ap_tapq(apqn, hwinfo);
> +
> + switch (status.response_code) {
> + case AP_RESPONSE_NORMAL:
> + case AP_RESPONSE_RESET_IN_PROGRESS:
> + case AP_RESPONSE_DECONFIGURED:
> + case AP_RESPONSE_CHECKSTOPPED:
> + case AP_RESPONSE_BUSY:
> + /* For all these RCs the tapq info should be available */
> + return 0;
> + case AP_RESPONSE_Q_NOT_AVAIL:
> + dev_err(matrix_mdev->vdev.dev,
> + "migration failed: Failed to get hwinfo for queue %02lx.%04lx on target host: TAPQ rc=%d",
> + AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
> + return -ENODEV;
> + default:
> + /* On a pending async error the tapq info should be available */
> + if (!status.async)
> + return 0;
[Severity: Medium]
Is this logic inverted?
If there is no pending async error (!status.async), this returns 0 for an
unknown response code. However, it returns -EIO if there is an async error.
The comment suggests the info should be available on a pending async error.
Should this check be if (status.async) instead?
> +
> + dev_err(matrix_mdev->vdev.dev,
> + "Failed to get hwinfo for queue %02lx.%04lx: TAPQ rc=%d",
> + AP_QID_CARD(apqn), AP_QID_QUEUE(apqn), status.response_code);
> + return -EIO;
> + }
> +
> + return -EINVAL;
> +}
[ ... ]
> +static int
> +vfio_ap_get_config(struct ap_matrix_mdev *matrix_mdev,
> + struct vfio_ap_config **ap_config, size_t *ap_config_size)
> +{
> + struct vfio_ap_config *ap_configuration;
> + int num_queues, ret;
> +
> + *ap_config_size = vfio_ap_config_size(matrix_mdev, &num_queues);
> +
> + ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
> + if (!ap_configuration)
> + return -ENOMEM;
> +
> + ap_configuration->num_queues = num_queues;
[Severity: Critical]
Does this code fail to initialize ap_configuration->config_sz?
The target host receives this structure and relies on config_sz during
reallocate_ap_config(). If config_sz is 0, the check:
if (cur_cfg_sz >= sizeof(struct vfio_ap_config))
falls through to:
new_cfg_sz = cur_cfg_sz + len; // evaluates to (0 + len)
This might cause the configuration buffer to be incorrectly reallocated to
just len bytes, stripping the previously allocated space.
Will this lead to memory corruption when writing subsequent queue data?
> +
> + ret = vfio_ap_store_queue_info(matrix_mdev, ap_configuration);
> + if (ret) {
> + kfree(ap_configuration);
> + return ret;
> + }
> +
> + *ap_config = ap_configuration;
> +
> + return 0;
> +}
> +
> +static ssize_t vfio_ap_stop_copy_read(struct file *filp, char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + struct ap_matrix_mdev *matrix_mdev;
> + size_t ret = 0, ap_config_size;
> + struct vfio_ap_config *ap_config;
> +
> + /*
> + * When userspace calls read() with an explicit offset (pread), pos is
> + * non-NULL and the function rejects it with -ESPIPE (illegal seek). For
> + * normal read() calls, pos is NULL, so we'll use the file's internal
> + * position filp->f_pos
> + */
> + if (pos)
> + return -ESPIPE;
> +
> + mutex_lock(&matrix_dev->mdevs_lock);
> +
> + pos = &filp->f_pos;
> +
> + ret = validate_stop_copy_read_parms(filp, pos, len);
> + if (ret) {
> + mutex_unlock(&matrix_dev->mdevs_lock);
> + return ret;
> + }
> +
> + matrix_mdev = filp->private_data;
> +
> + ret = vfio_ap_get_config(matrix_mdev, &ap_config, &ap_config_size);
[Severity: High]
Does this dynamically regenerate the entire device state on every read() call?
If userspace reads the data in smaller chunks (e.g., 4KB at a time), the
hardware state is re-fetched and re-allocated from scratch for every chunk.
If a queue's state or the number of queues changes on the host between
chunk reads, the generated size and data will shift.
Could this result in torn structures, truncated streams, or duplicated data?
> + if (ret) {
> + mutex_unlock(&matrix_dev->mdevs_lock);
> + return ret;
> + }
> +
> + /*
> + * If the position exceeds the size of the AP configuration data,
> + * then indicate EOF; otherwise calculate the length of the data to
> + * read such that a buffer overrun is prevented.
> + */
> + if (*pos >= ap_config_size)
> + len = 0;
> + else
> + len = min_t(size_t, ap_config_size - *pos, len);
> +
> + /* If we've reached an EOF condition, let the caller know */
> + if (len == 0) {
> + kfree(ap_config);
> + mutex_unlock(&matrix_dev->mdevs_lock);
> + return 0;
> + }
> +
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +
> + if (copy_to_user(buf, (char *)ap_config + *pos, len)) {
[Severity: Critical]
Can concurrent read() calls race here and cause an out-of-bounds kernel
memory read?
Since this file descriptor uses FMODE_STREAM, vfs_read() bypasses f_pos_lock.
The mutex is dropped just before copy_to_user(), which directly dereferences
*pos.
If Thread A and Thread B read concurrently:
Thread A evaluates len and drops the lock.
Thread B evaluates len and drops the lock.
Thread A completes copy_to_user() and updates *pos += len.
Thread B would then evaluate (char *)ap_config + *pos using Thread A's
updated *pos.
Could Thread B end up reading len bytes from beyond the end of the
ap_config kernel buffer?
> + kfree(ap_config);
> + return -EFAULT;
> + }
> +
> + kfree(ap_config);
> + *pos += len;
> + return len;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=7
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING
2026-06-30 10:41 ` [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING Anthony Krowiak
@ 2026-06-30 11:28 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:28 UTC (permalink / raw)
To: Anthony Krowiak
Cc: kvm, Vasily Gorbik, linux-s390, Alexander Gordeev, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
Pre-existing issues:
- [High] The migration file descriptor lacks a reference to the vfio_device, and its release handler lacks locking, leading to critical Use-After-Free and NULL pointer dereferences.
- [High] Concurrent read operations in vfio_ap_stop_copy_read bypass VFS locks and race on filp->f_pos, corrupting the migration stream.
- [High] Missing initialization of config_sz in vfio_ap_get_config causes a zero value to be sent to userspace, breaking the resuming phase protocol.
--
commit fb4283ff8475958b596a7e1a1f9bf2aba56d90d8
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
Subject: s390/vfio-ap: Transition device migration state from STOP to RESUMING
This commit introduces the transition from VFIO_DEVICE_STATE_STOP to
VFIO_DEVICE_STATE_RESUMING. It prepares for restoring the device state by
opening a file stream to receive the guest's AP configuration. It adds a stub
implementation for the write operation.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 5ef7127b697a5..e917cc995d352 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -65,6 +65,17 @@ vfio_ap_release_stop_copy_file(struct vfio_ap_migration_data *mig_data)
> mig_data->stop_copy_mig_file = NULL;
> }
>
> +static void
> +vfio_ap_release_resuming_file(struct vfio_ap_migration_data *mig_data)
> +{
> + if (mig_data->resuming_mig_state.filp)
> + mig_data->resuming_mig_state.filp = NULL;
> +
> + kfree(mig_data->resuming_mig_state.ap_config);
> + mig_data->resuming_mig_state.ap_config = NULL;
> + mig_data->resuming_mig_state.config_sz = 0;
> +}
> +
> static int vfio_ap_release_mig_file(struct inode *file_inode, struct file *filp)
> {
> struct ap_matrix_mdev *matrix_mdev = filp->private_data;
[Severity: High]
This is a pre-existing issue, but does vfio_ap_release_mig_file() safely
access matrix_mdev and mig_data here?
It appears the migration file descriptor is created in
vfio_ap_open_file_stream() without taking a reference to the vfio_device via
vfio_device_get(). Because of this, the file descriptor can outlive the
device structure.
If userspace closes the main device file descriptor and unbinds the mdev,
matrix_mdev is freed. When the migration file descriptor is subsequently
closed, vfio_ap_release_mig_file() accesses the freed matrix_mdev.
Also, since this release handler doesn't hold matrix_dev->mdevs_lock, can it
race with vfio_ap_release_migration_data() freeing mig_data, leading to a
use-after-free or NULL pointer dereference when accessing
matrix_mdev->mig_data?
[ ... ]
> @@ -311,6 +324,32 @@ static struct file *vfio_ap_open_file_stream(struct ap_matrix_mdev *matrix_mdev,
> return filp;
> }
>
> +static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + /* TODO */
> + return -EOPNOTSUPP;
> +}
[Severity: High]
This is a pre-existing issue, but while implementing the resumption phase,
will the data coming from vfio_ap_get_config() be properly formed?
In vfio_ap_get_config():
ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
if (!ap_configuration)
return -ENOMEM;
ap_configuration->num_queues = num_queues;
The config_sz field is never initialized, so a zero value is copied to
userspace. During this write operation, if data is handled in chunks and
uses config_sz to calculate the new allocation size, could it read 0, shrink
the buffer with krealloc() to an undersized amount, and cause an
out-of-bounds write during copy_from_user()?
> +
> +static const struct file_operations vfio_ap_resume_fops = {
> + .owner = THIS_MODULE,
> + .write = vfio_ap_resuming_write,
> + .release = vfio_ap_release_mig_file,
> +};
[Severity: High]
This isn't a bug introduced by this patch, but considering these new file
operations, is the f_pos handling safe from data races under concurrent I/O?
Looking at the existing vfio_ap_stop_copy_read(), the file's internal
position is updated entirely outside the protection of
matrix_dev->mdevs_lock:
vfio_ap_stop_copy_read()
mutex_lock(&matrix_dev->mdevs_lock);
...
mutex_unlock(&matrix_dev->mdevs_lock);
if (copy_to_user(buf, (char *)ap_config + *pos, len)) {
kfree(ap_config);
return -EFAULT;
}
kfree(ap_config);
*pos += len;
Since stream_open() disables the standard VFS f_pos_lock serialization,
can concurrent read() calls race here, sample the same offset, transmit
duplicate data blocks, and double-increment the position pointer?
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=8
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration
2026-06-30 10:41 ` [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration Anthony Krowiak
@ 2026-06-30 11:34 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:34 UTC (permalink / raw)
To: Anthony Krowiak
Cc: linux-s390, Heiko Carstens, Alexander Gordeev, Vasily Gorbik, kvm
Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:
- [Critical] Refactoring error replaces unlinking with linking, causing a Critical Use-After-Free on mdev removal.
- [High] Unconditional NULL pointer dereference in queue collection causes kernel panic.
- [High] Uninitialized stack-allocated bitmap `apids_removed` causes unintended hardware resets.
- [High] Typo in `reset_removed_queues` skips hardware resets for removed domains, leaking cryptographic state.
- [High] List corruption due to double addition of queues to local reset lists.
- [High] Premature queue linking before matrix validation permanently corrupts hash tables of concurrently running mdevs.
- [High] Domain-only removals completely bypass the hardware reset sequence, leaking cryptographic state.
- [Low] Loop over domain mask uses incorrect limit constant `AP_DEVICES` instead of `AP_DOMAINS`.
--
commit 1cdca5acb48932a1408594014bf5e0f8fa51e334
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Add method to set a new guest AP configuration
Add a new vfio_ap_set_new_config function to set a guest's AP
configuration. This is needed in order to set the state of the mdev when
it is migrated from a remote host system during the RESUMING phase.
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
[ ... ]
> @@ -875,7 +896,40 @@ static void vfio_ap_mdev_unlink_fr_queues(struct ap_matrix_mdev *matrix_mdev)
> q = vfio_ap_mdev_get_queue(matrix_mdev,
> AP_MKQID(apid, apqi));
> if (q)
> - q->matrix_mdev = NULL;
> + vfio_ap_mdev_link_queue(matrix_mdev, q);
> }
> }
> }
[Severity: Critical]
Does replacing the unlink assignment with vfio_ap_mdev_link_queue() leave
dangling pointers after device teardown?
This function is supposed to unlink the queue from the mdev by setting
q->matrix_mdev = NULL. Actively linking it instead means the queues are never
unlinked, which could cause a use-after-free when the matrix_mdev is
subsequently freed during removal.
[ ... ]
> -static void collect_queues_to_reset(struct ap_matrix_mdev *matrix_mdev,
> - unsigned long apid,
> - struct list_head *qlist)
> +static void collect_queues_by_apid(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apid,
> + struct list_head *qlist)
> {
> struct vfio_ap_queue *q;
> unsigned long apqi;
>
> for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) {
[Severity: High]
Does this unconditionally dereference matrix_mdev?
The loop condition accesses matrix_mdev->shadow_apcb.aqm without checking if
it is valid. Later code passes a NULL pointer to this function, which will
cause an immediate kernel panic here.
> - q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi));
> + q = matrix_mdev ?
> + vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi)) :
> + vfio_ap_find_queue(AP_MKQID(apid, apqi));
> + if (q)
> list_add_tail(&q->reset_qnode, qlist);
[Severity: High]
Does appending queues directly to local lists here corrupt the list structure?
If a queue matches both a removed adapter and a removed domain, it is collected
by both collect_queues_by_apid() and collect_queues_by_apqi(). Calling
list_add_tail() unconditionally overwrites the list links of the node,
corrupting the structure of the first list if it wasn't unlinked first.
> + }
> +}
> +
> +static void collect_queues_by_apqi(struct ap_matrix_mdev *matrix_mdev,
> + unsigned long apqi,
> + struct list_head *qlist)
> +{
> + struct vfio_ap_queue *q;
> + unsigned long apid;
> +
> + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) {
[Severity: High]
Does this also unconditionally dereference matrix_mdev?
Similar to collect_queues_by_apid(), accessing matrix_mdev->shadow_apcb.apm
here will panic if matrix_mdev is NULL.
[ ... ]
> +static void reset_removed_queues_by_apid(unsigned long *apm_removed,
> + unsigned long *apm_filtered,
> + struct list_head *qlist)
> +{
> + DECLARE_BITMAP(apids_removed, AP_DEVICES);
[Severity: High]
Can this uninitialized stack-allocated bitmap cause unintended hardware resets?
DECLARE_BITMAP() allocates the bitmap on the stack without zero-initializing
it. Setting specific bits in it without ever clearing it first means it
retains garbage stack data, which could lead to resetting arbitrary AP
queues across the host.
> + unsigned long apid;
> +
> + for_each_set_bit_inv(apid, apm_filtered, AP_DEVICES)
> + set_bit_inv(apid, apids_removed);
> +
> + for_each_set_bit_inv(apid, apm_removed, AP_DEVICES) {
> + if (!test_bit_inv(apid, apids_removed))
> + set_bit_inv(apid, apids_removed);
> + }
> +
> + if (!bitmap_empty(apids_removed, AP_DEVICES)) {
> + for_each_set_bit_inv(apid, apids_removed, AP_DEVICES)
> + collect_queues_by_apid(NULL, apid, qlist);
[Severity: High]
Is it intentional to pass NULL as the matrix_mdev here?
As mentioned earlier, collect_queues_by_apid() unconditionally dereferences
this pointer in its loop header, so this call will result in a NULL pointer
dereference.
[ ... ]
> +static void reset_removed_queues(struct ap_matrix *m_removed,
> + unsigned long *apm_filtered)
> +{
> + struct list_head qlist_by_apid, qlist_by_apqi;
> + DECLARE_BITMAP(apqis, AP_DOMAINS);
> + unsigned long apqi;
> +
> + INIT_LIST_HEAD(&qlist_by_apid);
> + INIT_LIST_HEAD(&qlist_by_apqi);
> + bitmap_clear(apqis, 0, AP_DOMAINS);
> +
> + reset_removed_queues_by_apid(m_removed->apm, apm_filtered, &qlist_by_apid);
> +
> + for_each_set_bit_inv(apqi, m_removed->aqm, AP_DEVICES) {
[Severity: Low]
Should this loop use AP_DOMAINS instead of AP_DEVICES for the limit?
While both constants might currently be defined identically, iterating over the
domain mask (aqm) with the device limit constant could lead to out-of-bounds
accesses if the architecture limits diverge in the future.
> + set_bit_inv(apqi, apqis);
> + collect_queues_by_apqi(NULL, apqi, &qlist_by_apqi);
> }
>
> + if (list_empty(&qlist_by_apqi))
> + return;
>
> + remove_queues_already_reset(&qlist_by_apid, &qlist_by_apqi);
>
> + if (!list_empty(&qlist_by_apqi))
> + vfio_ap_mdev_reset_qlist(&qlist_by_apid);
[Severity: High]
Is there a typo here where qlist_by_apid is being reset instead of
qlist_by_apqi?
The code checks if the apqi list is empty, but then explicitly calls the
reset function on the apid list. This seems to bypass the reset of removed
domain queues entirely, leaving their cryptographic state active.
[ ... ]
> +int vfio_ap_set_new_guest_config(struct ap_matrix_mdev *matrix_mdev,
> + struct ap_matrix *m_new,
> + bool filtering_allowable)
> +{
> + DECLARE_BITMAP(apm_filtered, AP_DEVICES);
> + struct ap_matrix m_old, m_old_shadow, m_removed;
> + bool do_update;
> + int rc;
>
> + lockdep_assert_held(&ap_attr_mutex);
> + assert_has_update_locks_for_mdev(matrix_mdev);
> +
> + /* Save old state */
> + ap_matrix_copy(&m_old, &matrix_mdev->matrix);
> + ap_matrix_copy(&m_old_shadow, &matrix_mdev->shadow_apcb);
> +
> + /* Reset mdev state */
> + vfio_ap_unlink_queues(matrix_mdev);
> + ap_matrix_copy(&matrix_mdev->matrix, m_new);
> + vfio_ap_link_queues(matrix_mdev);
> +
> + rc = validate_ap_matrix(matrix_mdev);
[Severity: High]
Does linking the queues before validating the matrix corrupt the hash tables
of concurrently running mdevs?
Queues are forcibly linked to the new matrix_mdev's hash table here before
verifying if they are already owned by another mdev in validate_ap_matrix().
If a queue is currently owned by another mdev, hash_add could blindly
overwrite its mdev_qnode pointers, destroying the victim mdev's hash bucket
chain.
> + if (rc) {
> + restore_mdev_state(matrix_mdev, &m_old, &m_old_shadow);
> + return rc;
> + }
> +
> + /*
> + * If APIDs need to be filtered from the guest AP config and filtering
> + * is not allowable according to the caller, then terminate the operation.
> + */
> do_update = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
> + if (!bitmap_empty(apm_filtered, AP_DEVICES) && !filtering_allowable) {
> + restore_mdev_state(matrix_mdev, &m_old, &m_old_shadow);
> + return -ECANCELED;
> + }
> +
> do_update |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
>
> + if (do_update)
> vfio_ap_mdev_update_guest_apcb(matrix_mdev);
> +
> + get_removed_matrixes(&m_removed, &m_old, m_new);
> + if (!bitmap_empty(m_removed.apm, AP_DEVICES) ||
> + !bitmap_empty(apm_filtered, AP_DEVICES))
> + reset_removed_queues(&m_removed, apm_filtered);
[Severity: High]
Does this condition bypass the hardware reset sequence for domain-only removals?
If the configuration update only removes AP domains, meaning m_removed.aqm is
populated but m_removed.apm is empty, the entire reset sequence appears to be
skipped. This could leak sensitive cryptographic state to potential subsequent
users.
> +
> + return 0;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=9
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration
2026-06-30 10:41 ` [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration Anthony Krowiak
@ 2026-06-30 11:37 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:37 UTC (permalink / raw)
To: Anthony Krowiak
Cc: kvm, Heiko Carstens, Alexander Gordeev, linux-s390, Vasily Gorbik
Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:
- [Critical] The `guest_matrix` struct is allocated on the stack without initialization, leading to uninitialized kernel stack memory being applied to the guest's AP configuration.
- [Critical] Error paths and `krealloc` usage leave dangling pointers in `mig_data->resuming_mig_state.ap_config`, resulting in Use-After-Free and Double Free vulnerabilities.
- [High] Unsafe use of `mutex_is_locked` to conditionally unlock `mdevs_lock` can release a mutex held by another thread.
- [High] A short write smaller than `sizeof(struct vfio_ap_config)` bypasses size checks and prematurely triggers post-copy validation, leading to out-of-bounds heap reads.
- [High] A double fetch of the `num_queues` field from userspace allows an attacker to bypass bounds checks and trigger out-of-bounds heap reads.
- [High] Partial writes to the migration stream leak the dynamically allocated configuration buffer and permanently break the state machine.
- [High] The `vfio_ap_resuming_write` function unconditionally returns `-ESPIPE` because it assumes `pos` can be NULL for normal writes, breaking migration entirely.
- [High] Dropping `mdevs_lock` while using `&filp->f_pos` on a stream without `FMODE_ATOMIC_POS` leads to data races.
- [Medium] The hardware compatibility check bypasses the requirement that BS bits must be 0 if the source and target queues have identical hardware information.
- [Medium] Strict equality checks on the `fac` bitfield override and break the intended backward compatibility for functionality classes.
- [Medium] The migration functionality completely ignores AP control domains (`adm`), silently dropping them during live migration.
--
commit 1f07710d77bb19f343d29ddba26d02cffa013a44
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: File ops called to resume the vfio device migration
This commit implements the write callback function to restore the state
of the vfio-ap device on the destination system during migration. It
copies the source guest's AP configuration from userspace, validates it,
and configures it on the destination host.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index e917cc995d352..9cd1e7a3f7430 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
[ ... ]
> @@ -187,10 +219,12 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
> {
> unsigned long *apm, *aqm, num_queues, apid, apqi, apqn;
> struct ap_tapq_hwinfo source_hwinfo;
> + const char *mdev_name;
> int ret;
>
> lockdep_assert_held(&matrix_dev->mdevs_lock);
>
> + mdev_name = dev_name(matrix_mdev->vdev.dev);
> apm = matrix_mdev->shadow_apcb.apm;
> aqm = matrix_mdev->shadow_apcb.aqm;
> num_queues = 0;
> @@ -199,7 +233,7 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
> for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> apqn = AP_MKQID(apid, apqi);
>
[Severity: Medium]
Are we missing the control domains (adm) here? It appears the migration
functionality iterates over apm and aqm, but leaves out adm entirely. Will
guests with AP control domain capabilities silently lose these resources
after live migration?
> - ret = get_hardware_info_for_queue(matrix_mdev,
> + ret = get_hardware_info_for_queue(mdev_name,
> &source_hwinfo, apqn);
> if (ret)
> return ret;
[ ... ]
> +static ssize_t allocate_ap_config(struct vfio_ap_config **ap_config,
> + const char __user *buf, size_t len)
> +{
> + struct vfio_ap_config tmp_ap_config;
> + ssize_t config_size;
> + size_t copy_size;
> +
> + /*
> + * If the length of the data sent exceeds the size of the vfio_ap_config
> + * structure, then we will copy enough data from userspace to get the
> + * number of queues which we can use to allocate enough space all of
> + * the queue information.
> + */
> + copy_size = min(len, sizeof(tmp_ap_config));
> +
> + if (copy_from_user(&tmp_ap_config, buf, copy_size))
> + return -EIO;
> +
[Severity: High]
We are copying tmp_ap_config from userspace here to validate num_queues
and size the buffer. In vfio_ap_resuming_write(), a second copy_from_user
is performed directly into the allocated buffer without re-validating
num_queues.
Could userspace modify num_queues between these two fetches to a large
value, bypassing bounds checks and leading to out-of-bounds heap reads
during subsequent validation loops?
> + /*
> + * If the length of data sent includes the number of queues
> + * in the AP configuration, then calculate its size; otherwise
> + * set config_size to the length of data sent.
> + */
> + if (len >= sizeof(struct vfio_ap_config)) {
> + config_size = calculate_ap_config_size(tmp_ap_config.num_queues);
> +
> + /* If the calculation returned an error */
> + if (config_size < 0)
> + return config_size;
> + } else {
> + config_size = len;
> + }
> +
[Severity: High]
If len is less than sizeof(struct vfio_ap_config), we set config_size to
len. Does this bypass the size validation and allocate an undersized
buffer? Later in vfio_ap_resuming_write(), this can prematurely trigger
post-copy validation on uninitialized heap memory.
> + *ap_config = kzalloc(config_size, GFP_KERNEL_ACCOUNT);
> + if (!*ap_config)
> + return -ENOMEM;
> +
> + return config_size;
> +}
> +
> +/**
> + * reallocate_ap_config:
[ ... ]
> +static ssize_t reallocate_ap_config(struct vfio_ap_config **mig_ap_config,
> + size_t len)
> +{
> + struct vfio_ap_config *ap_config = *mig_ap_config;
> + struct vfio_ap_config *new_ap_config;
> + size_t new_cfg_sz, cur_cfg_sz;
> + unsigned int num_queues;
> +
> + cur_cfg_sz = ap_config->config_sz;
> + num_queues = ap_config->num_queues;
[ ... ]
> + new_ap_config = krealloc(ap_config, new_cfg_sz, GFP_KERNEL_ACCOUNT);
> + if (!new_ap_config)
> + return -ENOMEM;
> +
> + *mig_ap_config = new_ap_config;
> +
> + return new_cfg_sz;
> +}
> +
[Severity: Critical]
krealloc() frees the old memory and returns a new pointer, updating the
local variable but not the global mig_data pointer until the transfer is
fully complete in vfio_ap_resuming_write(). Can a subsequent write or an
error path dereference the freed pointer?
> +/**
> + * qdev_is_bound_to_vfio_ap:
[ ... ]
> +static bool queue_hardware_info_is_compatible(const char *mdev_name,
> + unsigned long apqn,
> + struct ap_tapq_hwinfo *src_hwinfo,
> + struct ap_tapq_hwinfo *target_hwinfo)
> +{
> + unsigned long src_bits, target_bits;
> +
> + src_bits = src_hwinfo->value & QINFO_DATA_MASK;
> + target_bits = target_hwinfo->value & QINFO_DATA_MASK;
> +
> + /* If all bits match the queues are compatible */
> + if (src_bits == target_bits)
> + return true;
> +
[Severity: Medium]
The specification requires that the Bind/Associate State (BS) bits must be
0 for both queues. Does returning true early when src_bits == target_bits
bypass this requirement if both queues have identical, non-zero BS bits?
> + if (src_hwinfo->fac == target_hwinfo->fac &&
> + src_hwinfo->at <= target_hwinfo->at &&
> + classes_compatible(src_hwinfo, target_hwinfo) &&
> + (src_hwinfo->bs == 0 && target_hwinfo->bs == 0)) {
> + return true;
> + }
> +
[Severity: Medium]
The fac bitfield structurally encompasses the class field. If the source
and target queues have different classes (e.g., stateless vs native), will
src_hwinfo->fac == target_hwinfo->fac unconditionally evaluate to false and
break the intended backward compatibility logic in classes_compatible()?
> + report_qinfo_incompatibilities(mdev_name, apqn, src_hwinfo, target_hwinfo);
> +
> + return false;
> +}
[ ... ]
> +static void setup_ap_matrix_from_ap_config(const char *mdev_name,
> + struct vfio_ap_config *ap_config,
> + struct ap_matrix *guest_matrix)
> +{
> + struct vfio_ap_queue_info qinfo;
> + unsigned long apid, apqi;
> +
> + for (int i = 0; i < ap_config->num_queues; i++) {
> + qinfo = ap_config->qinfo[i];
> + apid = AP_QID_CARD(qinfo.apqn);
> + apqi = AP_QID_QUEUE(qinfo.apqn);
> +
> + if (!test_bit_inv(apid, guest_matrix->apm))
> + set_bit_inv(apid, guest_matrix->apm);
> + if (!test_bit_inv(apqi, guest_matrix->aqm))
> + set_bit_inv(apqi, guest_matrix->aqm);
> + }
> +}
> +
[Severity: Medium]
Similar to vfio_ap_store_queue_info(), this reconstruction appears to
discard the adm control domains.
> static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
> size_t len, loff_t *pos)
> {
> - /* TODO */
> - return -EOPNOTSUPP;
> + struct vfio_ap_migration_data *mig_data;
> + struct ap_matrix_mdev *matrix_mdev;
> + struct vfio_ap_config *ap_config;
> + struct ap_matrix guest_matrix;
> + ssize_t ret = 0, cfg_sz;
> + const char *mdev_name;
> +
[Severity: Critical]
Is guest_matrix fully initialized before use? It is allocated on the stack,
and setup_ap_matrix_from_ap_config() only modifies the apm and aqm bits.
Does passing this to vfio_ap_set_new_guest_config() apply uninitialized
kernel stack memory (including the uninitialized adm bits) to the guest's
cryptographic configuration?
> + /*
> + * When userspace calls write() with an explicit offset (pwrite), pos is
> + * non-NULL and the function rejects it with -ESPIPE (illegal seek). For
> + * normal write() calls, pos is NULL, so we'll use the file's internal
> + * position filp->f_pos
> + */
> + if (pos)
> + return -ESPIPE;
> +
[Severity: High]
The VFS layer always passes a valid pos pointer to the write callback
(never NULL). Does this mean normal writes will unconditionally fail with
-ESPIPE and break migration entirely?
> + mutex_lock(&matrix_dev->mdevs_lock);
> + pos = &filp->f_pos;
> +
> + ret = validate_resuming_write_parms(filp, len, pos);
> + if (ret)
> + goto done;
> +
> + matrix_mdev = filp->private_data;
> + mig_data = matrix_mdev->mig_data;
> + mdev_name = dev_name(matrix_mdev->vdev.dev);
> +
> + /*
> + * If this is the first write operation, then allocate storage for the
> + * AP configuration information; otherwise, reallocate the
> + * struct vfio_ap_config object used to store the AP configuration data
> + * sent from userspace.
> + */
> + if (*pos == 0) {
> + ret = allocate_ap_config(&ap_config, buf, len);
> +
> + /* If the allocation failed, we'll return the error */
> + if (ret < 0)
> + goto done;
> +
> + cfg_sz = ret;
> + } else {
> + ap_config = mig_data->resuming_mig_state.ap_config;
> +
> + ret = reallocate_ap_config(&ap_config, len);
> + if (ret < 0)
> + goto cleanup;
> +
> + cfg_sz = ret;
> + }
> +
> + if (*pos + len > cfg_sz) {
> + ret = -EIO;
> + goto cleanup;
> + }
> +
> + /*
> + * We don't want to lock all mdevs while copying data from userspace so
> + * we don't block all other mdevs in case the I/O takes a long time.
> + * From here on out we don't need this lock because we are not
> + * accessing the matrix_mdev until we need to get the update
> + * locks to set the new destination guest's AP configuration in which
> + * case this lock will be taken then.
> + */
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +
[Severity: High]
Since the file is opened with stream_open (which unsets FMODE_ATOMIC_POS),
f_pos_lock is not held by VFS. Does dropping mdevs_lock here allow
concurrent writes to race on filp->f_pos, leading to data races and memory
corruption?
> + if (copy_from_user((char *)ap_config + *pos, buf, len)) {
> + ret = -EIO;
> + goto cleanup;
> + }
> +
> + /* Check if we've completed writing the entire configuration */
> + if (*pos + len == cfg_sz) {
> + ret = do_post_copy_validation(mdev_name, ap_config);
> + if (ret < 0)
> + goto cleanup;
> +
[Severity: High]
If a short write occurred where len is less than sizeof(struct
vfio_ap_config), could *pos + len == cfg_sz evaluate to true prematurely?
Will this trigger post-copy validation on an undersized and partially
uninitialized buffer, leading to out-of-bounds heap reads?
> + setup_ap_matrix_from_ap_config(mdev_name, ap_config,
> + &guest_matrix);
> +
> + /* Acquire locks required to update the guest's AP config */
> + mutex_lock(&ap_attr_mutex);
> + get_update_locks_for_mdev(matrix_mdev);
> +
> + ret = vfio_ap_set_new_guest_config(matrix_mdev, &guest_matrix,
> + false);
> + if (!ret) {
> + mig_data->resuming_mig_state.ap_config = ap_config;
> + mig_data->resuming_mig_state.config_sz = cfg_sz;
> + }
> +
[Severity: High]
If a write is partial, it returns before hitting this block. Does this leak
the dynamically allocated configuration buffer since it isn't assigned to
mig_data? And does this permanently break the state machine, as the next
write will fail the validation in validate_resuming_write_parms()?
> + release_update_locks_for_mdev(matrix_mdev);
> + mutex_unlock(&ap_attr_mutex);
> +
> + if (ret)
> + goto cleanup;
> + }
> +
> + ret = len;
> + *pos += len;
> + goto done;
> +
> +cleanup:
> + kfree(ap_config);
[Severity: Critical]
In the error path, we kfree(ap_config). Because krealloc() might have freed
the memory stored in mig_data->resuming_mig_state.ap_config during a prior
reallocate_ap_config(), could calling kfree(ap_config) here without
clearing the global state lead to a Double Free later when the file is
released?
> +done:
> + if (mutex_is_locked(&matrix_dev->mdevs_lock))
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +
[Severity: High]
The mutex_is_locked() function only checks if the lock is held by any
thread, not necessarily the current thread. If another thread has acquired
mdevs_lock while this thread was unlocked, will we erroneously unlock it
here and destroy mutual exclusion?
> + return ret;
> }
>
> static const struct file_operations vfio_ap_resume_fops = {
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=10
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP
2026-06-30 10:41 ` [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP Anthony Krowiak
@ 2026-06-30 11:46 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:46 UTC (permalink / raw)
To: Anthony Krowiak
Cc: Vasily Gorbik, Heiko Carstens, Alexander Gordeev, linux-s390, kvm
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
New issues:
- [High] Migration stream file descriptors are not invalidated upon transitioning out of data transfer states, allowing untrusted userspace to modify running guest state.
- [High] The driver unconditionally allows the transition from RESUMING to STOP, failing to reject incomplete migration data as required by the VFIO specification.
Pre-existing issues:
- [Critical] Chunked or partial writes to the RESUMING fd leak memory and cause a NULL pointer dereference on subsequent chunks.
- [Critical] Migration file descriptors do not hold a reference to the `vfio_device` or `matrix_mdev`, leading to a Use-After-Free if the device is removed while the migration FD is kept open.
- [Critical] vfio_ap_release_mig_file accesses and modifies migration state without holding matrix_dev->mdevs_lock, leading to UAF.
- [High] Writing fewer than `sizeof(struct vfio_ap_config)` bytes to the resuming stream bypasses size validation, triggering a Heap Out-of-Bounds read.
- [High] The dynamically allocated `ap_config` is permanently leaked if the VFIO device is closed while the migration file descriptor remains open.
--
commit 1ef55b17508b1d713795918bce912d07492d77e0
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Transition device migration state to STOP
This commit updates the vfio-ap migration state machine to allow
transitions to the stop state. It allows returning null when
transitioning from resuming or stop copy to stop.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 9cd1e7a3f7430..1b526b743ad02 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -1159,12 +1159,17 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
> return filp;
> }
>
> + /*
> + * Terminates the data transfer session of the vfio-ap device state
> + * between the source and target hosts. Since the vfio-ap device does
> + * not virtualize a DMA device, there is no internal device state to
> + * incorporate into the vfio-ap device on the target.
> + */
> if ((cur_state == VFIO_DEVICE_STATE_RESUMING &&
> new_state == VFIO_DEVICE_STATE_STOP) ||
> (cur_state == VFIO_DEVICE_STATE_STOP_COPY &&
> new_state == VFIO_DEVICE_STATE_STOP)) {
> - /* TODO */
> - return ERR_PTR(-EOPNOTSUPP);
> + return NULL;
[Severity: High]
Does transitioning to the stop state here leave the active file stream
pointers open for abuse? By simply returning null without clearing
mig_data->resuming_mig_state.filp, userspace might be able to
transition the device to stop and then running, while keeping the
resuming file descriptor open to write to it.
In validate_resuming_write_parms(), it only validates that filp matches
the cached pointer without checking if the device is actually in the
VFIO_DEVICE_STATE_RESUMING state:
validate_resuming_write_parms() {
if (filp != matrix_mdev->mig_data->resuming_mig_state.filp) {
ret = -ENXIO;
goto done;
}
}
Could this trigger asynchronous mutation of the AP hardware
configuration of a live guest?
[Severity: High]
Does unconditionally returning null here bypass verifying the completeness
of the migration data transfer?
If userspace writes a partial configuration and transitions to stop,
will the device silently remain unconfigured without signaling an error
to userspace?
> }
>
> if ((cur_state == VFIO_DEVICE_STATE_STOP &&
[Severity: Critical]
This is a pre-existing issue, but do chunked or partial writes to the
resuming file descriptor leak memory and cause a null pointer dereference?
If a user writes a partial chunk in vfio_ap_resuming_write() where the size
is less than cfg_sz, it skips assigning the dynamically allocated ap_config
back to the global state:
vfio_ap_resuming_write() {
...
if (*pos + len == cfg_sz) {
ret = do_post_copy_validation(mdev_name, ap_config);
...
}
On the subsequent write chunk, retrieving ap_config from mig_data fetches
null, which then gets passed to reallocate_ap_config():
reallocate_ap_config() {
struct vfio_ap_config *ap_config = *mig_ap_config;
...
cur_cfg_sz = ap_config->config_sz;
}
Will this dereference cause a kernel oops?
[Severity: High]
This is a pre-existing issue, but does writing fewer than the expected size
bypass size validation and trigger a heap out-of-bounds read?
In allocate_ap_config(), writing an undersized buffer sets config_size
to the short length:
allocate_ap_config() {
...
} else {
config_size = len;
}
}
If vfio_ap_resuming_write() sees that the received size matches cfg_sz,
it assumes the configuration is fully received and passes the undersized
buffer for validation.
Since userspace controls num_queues, will iterating over it read out
of bounds?
queues_available_on_target_system() {
for (int i = 0; i < source_config->num_queues; i++) {
apqn = source_config->qinfo[i].apqn;
...
}
[Severity: High]
This is a pre-existing issue, but is the dynamically allocated ap_config
permanently leaked if the device is closed while the migration file
descriptor remains open?
If userspace closes the main device file descriptor, the container struct
is freed:
vfio_ap_release_migration_data() {
...
kfree(matrix_mdev->mig_data);
matrix_mdev->mig_data = NULL;
}
When the migration file descriptor is later closed, it bails out early
if the container is null, leaving the nested ap_config allocation orphaned:
vfio_ap_release_mig_file() {
if (!matrix_mdev || !matrix_mdev->mig_data)
return -ENODEV;
...
}
[Severity: Critical]
This is a pre-existing issue, but do migration file descriptors fail to
hold a reference to the vfio_device or matrix_mdev?
When opening the stream, anon_inode_getfile() is called without bumping
the reference count:
vfio_ap_open_file_stream() {
filp = anon_inode_getfile("vfio_ap_mig_file", fops, matrix_mdev, flags);
...
}
If the device is removed via sysfs while the migration file descriptor
is open, will the final reference be dropped?
vfio_ap_mdev_remove() {
...
vfio_put_device(&matrix_mdev->vdev);
}
Could subsequent operations on the open migration file descriptor
dereference the freed matrix_mdev and result in a use-after-free?
[Severity: Critical]
This is a pre-existing issue, but does accessing the migration state
locklessly in vfio_ap_release_mig_file() lead to a use-after-free?
The function checks mig_data without holding a lock:
vfio_ap_release_mig_file() {
if (!matrix_mdev || !matrix_mdev->mig_data)
return -ENODEV;
if (filp == matrix_mdev->mig_data->stop_copy_mig_file)
...
}
If userspace concurrently triggers closure of the primary device, can
the memory be freed under the mdevs_lock?
vfio_ap_release_migration_data() {
kfree(matrix_mdev->mig_data);
matrix_mdev->mig_data = NULL;
}
Could this allow vfio_ap_release_mig_file() to dereference mig_data
immediately after it is freed?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=11
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa
2026-06-30 10:41 ` [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa Anthony Krowiak
@ 2026-06-30 11:48 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:48 UTC (permalink / raw)
To: Anthony Krowiak
Cc: Alexander Gordeev, kvm, linux-s390, Vasily Gorbik, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
New issues:
- [High] Transitioning the VFIO AP device from RUNNING to STOP state does not quiesce or reset the underlying AP queues, potentially allowing hardware to modify guest memory asynchronously.
Pre-existing issues:
- [Critical] Writing a chunk smaller than `sizeof(struct vfio_ap_config)` to the resuming file allocates an undersized buffer, triggering an out-of-bounds read during validation.
- [Critical] A partial initial write to the vfio-ap migration resuming stream file skips saving the configuration pointer, leading to a memory leak and a subsequent NULL pointer dereference.
- [Critical] Error paths after a successful initial write double-free the configuration pointer, or leave a dangling pointer leading to a Use-After-Free.
- [High] The function `vfio_ap_resuming_write` conditionally unlocks a global mutex using `mutex_is_locked()`, incorrectly dropping the lock when held by other threads and corrupting mutex state.
- [High] The driver allocates large, potentially fragmented user-controlled memory chunks using `kzalloc()`, leading to unreliable device migration.
--
commit a3f4b819ba0ee66ec445b971234644e51eeab7e9
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa
This commit updates the device migration state transitions to handle the
transition from STOP to RUNNING and vice versa. It treats these state
changes as a normal operation state where no internal device state needs
to be migrated.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 1b526b743ad02..572db04c4ac15 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -1172,12 +1172,17 @@ vfio_ap_transition_to_state(struct ap_matrix_mdev *matrix_mdev,
> return NULL;
> }
>
> + /*
> + * These states indicate migration has either not been initiated or
> + * has completed and the vfio-ap device is operating normally.Since the
> + * vfio-ap device does not virtualize a DMA device, there is no internal
> + * device state to incorporate into the vfio-ap device on the target.
> + */
> if ((cur_state == VFIO_DEVICE_STATE_STOP &&
> new_state == VFIO_DEVICE_STATE_RUNNING) ||
> (cur_state == VFIO_DEVICE_STATE_RUNNING &&
> new_state == VFIO_DEVICE_STATE_STOP)) {
> - /* TODO */
> - return ERR_PTR(-EOPNOTSUPP);
> + return NULL;
[Severity: High]
Does returning NULL here safely quiesce the underlying AP queues when
transitioning from RUNNING to STOP?
The VFIO migration specification requires that when a device transitions to
the STOP state, it must cease processing commands, generating interrupts, and
writing to guest memory. If the queues are not stopped or reset during a
RUNNING to STOP transition, the hardware may continue to modify memory after
the page is migrated, which could result in missed completion events.
> }
>
> /* vfio_mig_get_next_state() does not use arcs other than the above */
[Severity: Critical]
This is a pre-existing issue, but can writing a small chunk to the resuming
file lead to an out-of-bounds read?
If userspace writes a chunk smaller than the configuration struct size,
allocate_ap_config() appears to allocate an undersized buffer:
drivers/s390/crypto/vfio_ap_migration.c:allocate_ap_config() {
...
*ap_config = kzalloc(config_size, GFP_KERNEL_ACCOUNT);
...
}
If this data is copied and triggers early validation in
vfio_ap_resuming_write():
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_resuming_write() {
...
if (*pos + len == cfg_sz) {
ret = do_post_copy_validation(mdev_name, ap_config);
...
}
Could this result in do_post_copy_validation() reading past the allocation
since the buffer was sized based on the short length rather than the full
struct?
[Severity: Critical]
This is a pre-existing issue, but does a partial initial write skip saving
the configuration pointer?
In vfio_ap_resuming_write(), if the initial write does not equal the full
configuration size, it seems to jump to done without saving the newly
allocated ap_config:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_resuming_write() {
...
ret = len;
*pos += len;
goto done;
cleanup:
kfree(ap_config);
...
}
If a subsequent write occurs, reallocate_ap_config() might dereference a
null pointer because the pointer was never saved during the first partial
write:
drivers/s390/crypto/vfio_ap_migration.c:reallocate_ap_config() {
...
struct vfio_ap_config *ap_config = *mig_ap_config;
...
cur_cfg_sz = ap_config->config_sz;
...
}
Could this sequence lead to a memory leak and a null pointer dereference?
[Severity: Critical]
This is a pre-existing issue, but can error paths in vfio_ap_resuming_write()
cause a double-free or use-after-free?
If an additional write triggers an error, the code jumps to cleanup and frees
ap_config:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_resuming_write() {
...
ret = reallocate_ap_config(&ap_config, len);
if (ret < 0)
goto cleanup;
...
cleanup:
kfree(ap_config);
...
}
However, the persistently tracked pointer in resuming_mig_state is not
nullified. When the file is released, the pointer is freed again:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_release_resuming_file() {
...
kfree(mig_data->resuming_mig_state.ap_config);
mig_data->resuming_mig_state.ap_config = NULL;
...
}
Does this leave a dangling pointer that gets double-freed when the file is
closed?
[Severity: High]
This isn't a bug introduced by this patch, but does vfio_ap_resuming_write()
incorrectly unlock the global mutex if it is held by another thread?
At the end of the write operation, the function conditionally unlocks the
mutex:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_resuming_write() {
...
done:
if (mutex_is_locked(&matrix_dev->mdevs_lock))
mutex_unlock(&matrix_dev->mdevs_lock);
...
}
Since mutex_is_locked() checks if the lock is held by any thread, not just
the current thread, could this cause the current thread to erroneously unlock
it when it is safely held by a completely different thread?
[Severity: High]
This is a pre-existing issue, but is kzalloc() safe for large allocations
when getting the AP configuration?
Both allocate_ap_config() and vfio_ap_get_config() use kzalloc() for memory
that can exceed 1MB depending on the user-supplied queue length:
drivers/s390/crypto/vfio_ap_migration.c:allocate_ap_config() {
...
*ap_config = kzalloc(config_size, GFP_KERNEL_ACCOUNT);
...
}
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_get_config() {
...
ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
...
}
Since this requires large contiguous physical chunks, could this lead to
sporadic ENOMEM failures due to memory fragmentation? Would it be more
reliable to use kvzalloc() here since contiguous physical pages are not
strictly required for staging buffers?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=12
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration
2026-06-30 10:41 ` [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration Anthony Krowiak
@ 2026-06-30 11:49 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:49 UTC (permalink / raw)
To: Anthony Krowiak
Cc: kvm, Alexander Gordeev, Vasily Gorbik, linux-s390, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues:
- [Critical] Chunked writes to the resuming migration file descriptor cause a memory leak and subsequent NULL pointer dereference.
- [High] Migration config size is uninitialized on the source, breaking chunked reallocation on the destination.
--
commit 267d5567c135fa27576ba7a999b70cfabea69848
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Callback to get the size of data to be migrated during guest migration
Callback to get the estimated data length of the data that will be required
to complete reading of the vfio device's internal state. It's mandatory for
VFIO_DEVICE_FEATURE_MIGRATION migration support.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index 572db04c4ac15..83e1ce2c40c0c 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -1283,7 +1283,18 @@ static int vfio_ap_get_state(struct vfio_device *vdev,
> static int vfio_ap_get_data_size(struct vfio_device *vdev,
> unsigned long *stop_copy_length)
> {
> - return -EOPNOTSUPP;
> + struct ap_matrix_mdev *matrix_mdev;
> + size_t qinfo_sz;
> + int num_queues;
> +
> + mutex_lock(&matrix_dev->mdevs_lock);
> + matrix_mdev = container_of(vdev, struct ap_matrix_mdev, vdev);
> + num_queues = vfio_ap_mdev_get_num_queues(&matrix_mdev->shadow_apcb);
> + qinfo_sz = num_queues * sizeof(struct vfio_ap_queue_info);
> + *stop_copy_length = qinfo_sz + sizeof(struct vfio_ap_config);
> + mutex_unlock(&matrix_dev->mdevs_lock);
> +
> + return 0;
> }
>
> static const struct vfio_migration_ops vfio_ap_migration_ops = {
[Severity: Critical]
This is a pre-existing issue, but chunked writes to the resuming migration
file descriptor can cause a memory leak and subsequent NULL pointer
dereference.
When userspace writes resuming migration data in chunks (a write where
*pos + len < cfg_sz), vfio_ap_resuming_write() successfully processes the
chunk but bypasses saving the newly allocated ap_config to mig_data, and
fails to free it:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_resuming_write() {
...
if (*pos == 0) {
ret = allocate_ap_config(&ap_config, buf, len);
...
} else {
...
}
...
/* Check if we've completed writing the entire configuration */
if (*pos + len == cfg_sz) {
...
if (!ret) {
mig_data->resuming_mig_state.ap_config = ap_config;
mig_data->resuming_mig_state.config_sz = cfg_sz;
}
...
}
ret = len;
*pos += len;
goto done;
cleanup:
kfree(ap_config);
done:
...
}
If the first write is a partial chunk, ap_config is allocated but the
function skips assigning it to mig_data->resuming_mig_state.ap_config and
jumps to done, leaking ap_config.
On the next chunk write (*pos > 0), the else branch reads
mig_data->resuming_mig_state.ap_config which is still NULL. It passes it to
reallocate_ap_config(), where it is unconditionally dereferenced:
drivers/s390/crypto/vfio_ap_migration.c:reallocate_ap_config() {
struct vfio_ap_config *ap_config = *mig_ap_config;
...
cur_cfg_sz = ap_config->config_sz;
Can this sequence cause a memory leak on partial writes, followed by a NULL
pointer dereference on the next chunk write?
[Severity: High]
This is a pre-existing issue, but the migration config size appears to be
left uninitialized on the source, which can break chunked reallocation on
the destination.
In vfio_ap_get_config(), ap_configuration is allocated via kzalloc but the
config_sz field is never populated:
drivers/s390/crypto/vfio_ap_migration.c:vfio_ap_get_config() {
...
*ap_config_size = vfio_ap_config_size(matrix_mdev, &num_queues);
ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
if (!ap_configuration)
return -ENOMEM;
ap_configuration->num_queues = num_queues;
...
}
Because config_sz remains 0, the source sends a zeroed config_sz to
userspace. On the destination, vfio_ap_resuming_write() copies this
structure from userspace. If a subsequent chunk triggers
reallocate_ap_config(), it reads the 0 size and reallocates the buffer to
merely len. This causes the subsequent validation:
if (*pos + len > cfg_sz) {
ret = -EIO;
goto cleanup;
}
to evaluate as true (since *pos + len > len is true when *pos > 0),
aborting the migration with -EIO.
Should config_sz be initialized in vfio_ap_get_config() before it is sent
to the destination?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=13
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst
2026-06-30 10:41 ` [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst Anthony Krowiak
@ 2026-06-30 11:54 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:54 UTC (permalink / raw)
To: Anthony Krowiak
Cc: linux-s390, Alexander Gordeev, Vasily Gorbik, kvm, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Low] Malformed reStructuredText syntax leading to potential build warnings (grid table and nested lists).
- [Low] Multiple typographical, grammatical errors, and inconsistent variable names in the newly added text.
- [Low] Logical contradiction in section heading.
--
commit 5afeaef82632a184153d66843b7e2a58e98d0cdd
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst
This commit adds a 'Live Guest Migration' chapter to the vfio-ap.rst
documentation. It describes the details for initiating live guest migration
for a guest to which AP adapters, domains and control domains have been
passed through.
> diff --git a/Documentation/arch/s390/vfio-ap.rst b/Documentation/arch/s390/vfio-ap.rst
> index ac0c07f76ddd7..fc0707c0dffd6 100644
> --- a/Documentation/arch/s390/vfio-ap.rst
> +++ b/Documentation/arch/s390/vfio-ap.rst
[ ... ]
> +To be considered compatible, the AP configuration for the destination host must
> +meet these requirements:
>
> - (QEMU) device-del id=<device-id>
> +* Each AP adapter, domain and control domain number assigned to the source
> + guest's AP configuration must be in the destination host's AP
> + configuration. This can be verified as follows:
>
> - For example, to hot unplug the vfio_ap mediated device that was specified
> - on the qemu command line with 'id=hostdev0' when the guest was started:
> + * On the source host, display the AP configuration of the source guest:
>
> - (QEMU) device-del id=hostdev0
> + .. code-block:: bash
>
> -After live migration of the KVM guest completes, an AP configuration can be
> -restored to the KVM guest by hot plugging a vfio_ap mediated device on the target
> -system into the guest in one of two ways:
> + cat /sys/devices/vfio_ap/matrix/$UUID/guest_matrix
> + cat /sys/devices/vfio_ap/matrix/$UUID/control_domains
>
> -1. If the KVM guest was started with libvirt, you can hot plug a matrix mediated
> - device into the guest via the following virsh commands:
> + .. note::
> + Replace ``$UUID`` with the actual UUID of your mediated device.
>
> - virsh attach-device <guestname> <path-to-device-xml>
> + * Display the AP configuration of the destination host:
>
> - For example, to hot plug mdev 62177883-f1bb-47f0-914d-32a22e3a8804 into
> - the guest named 'my-guest':
> + .. code-block:: bash
>
> - virsh attach-device my-guest ~/config/my-guest-hostdev.xml
> + lszcrypt -V
> + cat /sys/bus/ap/ap_control_domain_mask
>
> - The contents of my-guest-hostdev.xml:
> + Note that each bit in the ap_control_domain_mask corresponds to
> + correspond to domain numbers 0-255. If a bit is set to one, then
[Severity: Low]
This isn't a bug, but the phrase "corresponds to correspond to" appears
to be duplicated.
> + the control domain is available on the destination host. For
> + example:
[ ... ]
> ++-----------------------------------------------------------------------------+
> +| Table Keys |
> ++=============================================================================+
> +| |
> +| [1] The hardware capability must be the same for both the source |
> +| and destination queue device. |
> +| |
> +| [2] The source and destination queues must either both have |
> +| full native card function or both have stateless functions. |
> +| If the functional capabilities don't match, then the source |
> +| queue can have stateless capabilities since full native card |
> +| function can handle stateless functions; otherwise, migration |
> +| will fail. |
> +| |
> +| [3] The AP type on the source and destination guests can |
> +| differ if the queue passed through to the target guest is a |
> +| newer model (backwards compatible) |
> ++--------------+--------------------------------------------------------------+
[Severity: Low]
Will this malformed table bottom border cause sphinx build warnings?
The table body defines a single column, but the bottom border is split
into two columns by the + character, which violates reStructuredText grid
table syntax.
> +
> +* To verify the hardware capabilities are compatible:
> +
> + * On both the source and destination hosts, display the hardware capabilities
> + for each AP adapter in the AP configuration for that host::
> +
> + cat /sys/bus/ap/devices/card$APID/ap_functions
> +
> + .. note::
> + The ``$APID`` is the two-character adapter number in hexidecimal format;
[Severity: Low]
This isn't a bug, but there is a typo in the word "hexidecimal".
> + for example, card02 or card1f. The ``/sys/bus/ap/devices`` directory
> + also lists the APQNs of the queue devices installed in the respective
> + host system with the first two characters being the APID.
> +
> + * Verify the hardware capabilities for each AP adapter device on the source
> + and destination host are compatible. The ``ap_functions`` attribute is a
> + bitmask. The bits in the mask read from left to right starting with bit 0.
> + Each bit that is set to one indicates the corresponding hardware capability
> + is installed:
> +
> + * Bits 0-3 are the facilities bits. Each bit value must match for the AP
> + devices on the source and destination systems. The values indicate the
> + following:
> + * bit-0: APSC is installed
> + * bit-1: AP4KM is installed
> + * bit-2: AP4KC is installed
[Severity: Low]
Is a blank line needed before the nested list items (starting with bit-0)
to avoid reStructuredText parsing warnings?
> +
> + * Bits 3-5 specify the adapter Modes. Only one of these bits will be set to
> + one and must be the same for both the source and destination adapter
> + devices:
> + * bit-3: CCA-mode
> + * bit-4: Accelerator-mode
> + * bit-5: XCP-mode (EP11)
> +
> + * Bit 6 specifies whether APXA is installed and must be the same for both
> + the source and destination adapter devices.
> +
> + * Bit 7 specifies command whether command filtering is installed
[Severity: Low]
This isn't a bug, but the phrase "command whether command filtering" has
duplicated words.
> + and must be the same for both the source and destination adapter devices.
[ ... ]
> +One or more queues are not bound to the vfio_ap driver on the destination host
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +The queue devices passed through to the source guest are not installed in the
> +destination host or are not bound to its vfio_ap device driver.
> +
> +**Source host**
> +
> ++--------------+----------------------------------------------------------------+
> +| Log | Message |
> ++==============+================================================================+
> +| Console log: | error: operation failed: migration failed. Message from the |
> +| | source host: operation failed: job 'migration out' failed: |
> +| | Sibling indicated error 1. Message from the destination host: |
> +| | operation failed: job 'migration in' failed: load of migration |
> +| | failed: No such device: Failed to load vmstate version_id: 1, |
> +| | ret: -19 |
> ++--------------+----------------------------------------------------------------+
> +
> +**Destinaton host**
[Severity: Low]
This isn't a bug, but there is a typo in the word "Destinaton".
> +
> ++--------------+----------------------------------------------------------------+
> +| Kernel log: | vfio_ap_mdev $UUID: Queue $APID.$APQI not available to |
> +| | vfio_ap driver on target host |
> ++--------------+----------------------------------------------------------------+
> +| QEMU log: | shutting down, reason=failed |
> +| | qemu-system-s390x: terminating on signal $SIGNAL_NUM from |
> +| | pid $PID $DAEMON |
> ++--------------+----------------------------------------------------------------+
[ ... ]
> +**Destination host**
> +
> ++--------------+----------------------------------------------------------------+
> +| Log | Message |
> ++==============+================================================================+
> +| Console log: | vfio_ap_mdev $UUID: migration failed: Failed to get hwinfo for |
> +| | queue $APQN on target host: rc=$RESPONSE-CODE" |
[Severity: Low]
This isn't a bug, but there appears to be an unmatched trailing quote
character at the end of the rc=$RESPONSE-CODE string.
> ++--------------+----------------------------------------------------------------+
> +| QEMU log: | shutting down, reason=failed |
> +| | qemu-system-s390x: terminating on signal $SIGNAL-ID from pid |
> +| | $PID $DAEMON |
> ++--------------+----------------------------------------------------------------+
> +
> +.. note::
> + * $UUID is the UUID of the mediated device being migrated from the source host.
> + * $APQN (for example, 00.0033) of the queue device that is not compatible.
> + * $RESPONSE_CODE is the response code from the AP instruction.
> + * $SIGNAL_NUM is the signal number of the signal that caused the QEMU process
> + to terminate.
> + * $PID is the process ID of the QEMU process on the destination host.
> + * $DAEMON is the name of the QEMU daemon process.
> + * This message will be logged for each queue for which this error is detected.
> +
> +The AP configurations of the source guest and destination host not incompatible
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[Severity: Low]
Does this heading have a double negative?
The text below it enumerates failures when the configurations are not
compatible, but the heading says "not incompatible" which logically means
they are compatible.
> +The functional capabilities or facilities available on one or more of the queues
> +passed through to the source guest are not compatible with the queue device with
> +the same APQN on the destination system (see the Hardware Capabilities table above)
> +
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=15
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute
2026-06-30 10:41 ` [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute Anthony Krowiak
@ 2026-06-30 11:56 ` sashiko-bot
0 siblings, 0 replies; 30+ messages in thread
From: sashiko-bot @ 2026-06-30 11:56 UTC (permalink / raw)
To: Anthony Krowiak
Cc: linux-s390, Vasily Gorbik, Alexander Gordeev, kvm, Heiko Carstens
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [Critical] Dangling pointer in migration data after `krealloc`.
- [High] Unlocking an arbitrary thread's mutex using `mutex_is_locked()`.
- [High] Failure to persist `ap_config` pointer during chunked resuming writes causes memory leaks and breaks subsequent writes.
- [High] Uninitialized `config_sz` sent over the wire breaks buffer reallocation on the destination host.
--
commit 0de28964ca6b520852841fe66380c0b0caacaec6
Author: Anthony Krowiak <akrowiak@linux.ibm.com>
Date: Tue Jun 30 10:41:24 2026 +0000
s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute
Add the 'migratable' feature to the features provided by the sysfs
/sys/devices/vfio_ap/matrix/feature attribute to indicate that migration
of vfio devices is supported.
> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> new file mode 100644
> index 0000000000000..83e1ce2c40c0c
> --- /dev/null
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
> @@ -0,0 +1,1373 @@
[ ... ]
> +static int
> +vfio_ap_get_config(struct ap_matrix_mdev *matrix_mdev,
> + struct vfio_ap_config **ap_config, size_t *ap_config_size)
> +{
> + struct vfio_ap_config *ap_configuration;
> + int num_queues, ret;
> +
> + *ap_config_size = vfio_ap_config_size(matrix_mdev, &num_queues);
> +
> + ap_configuration = kzalloc(*ap_config_size, GFP_KERNEL_ACCOUNT);
> + if (!ap_configuration)
> + return -ENOMEM;
> +
> + ap_configuration->num_queues = num_queues;
[Severity: High]
Does this code fail to initialize ap_configuration->config_sz before sending
it over the wire during vfio_ap_stop_copy_read()?
The config_sz field is left as 0, which is then sent to the destination host.
> +
> + ret = vfio_ap_store_queue_info(matrix_mdev, ap_configuration);
[ ... ]
> +static ssize_t reallocate_ap_config(struct vfio_ap_config **mig_ap_config,
> + size_t len)
> +{
> + struct vfio_ap_config *ap_config = *mig_ap_config;
> + struct vfio_ap_config *new_ap_config;
> + size_t new_cfg_sz, cur_cfg_sz;
> + unsigned int num_queues;
> +
> + cur_cfg_sz = ap_config->config_sz;
> + num_queues = ap_config->num_queues;
> +
> + /*
> + * If the current configuration size is greater than the
> + * size of a vfio_ap_config structure (i.e., contains the num_queues
> + * field), then there should already be enough storage allocated
> + * to store the source guest's AP configuration. Let's verify that the
> + * amount of storage allocated is what we expect based on the number of
> + * vfio_ap_queue_info objects that must be stored.
> + */
> + if (cur_cfg_sz >= sizeof(struct vfio_ap_config)) {
[Severity: High]
Can this condition fail unexpectedly due to the uninitialized config_sz sent
from the source host?
During restoration, allocate_ap_config() receives the first chunk, storing 0
in config_sz. On the next chunk, reallocate_ap_config() reads cur_cfg_sz as 0.
The else branch is taken, where new_cfg_sz = cur_cfg_sz + len calculates len,
truncating the buffer to just the size of the incoming chunk, causing immediate
-EIO failures.
> + new_cfg_sz = calculate_ap_config_size(num_queues);
[ ... ]
> +static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
> + size_t len, loff_t *pos)
> +{
> + struct vfio_ap_migration_data *mig_data;
> + struct ap_matrix_mdev *matrix_mdev;
> + struct vfio_ap_config *ap_config;
> + struct ap_matrix guest_matrix;
> + ssize_t ret = 0, cfg_sz;
> + const char *mdev_name;
[ ... ]
> + /*
> + * If this is the first write operation, then allocate storage for the
> + * AP configuration information; otherwise, reallocate the
> + * struct vfio_ap_config object used to store the AP configuration data
> + * sent from userspace.
> + */
> + if (*pos == 0) {
> + ret = allocate_ap_config(&ap_config, buf, len);
> +
> + /* If the allocation failed, we'll return the error */
> + if (ret < 0)
> + goto done;
> +
> + cfg_sz = ret;
[Severity: High]
Is it possible for a chunked write to fail to persist the ap_config pointer?
If allocate_ap_config() succeeds but the write does not contain the complete
configuration (*pos + len != cfg_sz), ap_config is not assigned to
mig_data->resuming_mig_state.ap_config before the function returns.
The allocated ap_config memory leaks, and on the next chunk, *pos > 0 but the
state pointer remains NULL. This will cause validate_resuming_write_parms() to
falsely detect an illegal state and return -EFAULT.
> + } else {
> + ap_config = mig_data->resuming_mig_state.ap_config;
> +
> + ret = reallocate_ap_config(&ap_config, len);
[Severity: Critical]
Does this code create a dangling pointer in the migration data after krealloc()?
The reallocate_ap_config() call uses krealloc(), which may free the old buffer
and return a new one. The new pointer is stored in the local ap_config variable,
but it isn't updated in mig_data->resuming_mig_state.ap_config unless the
entire configuration is received successfully.
If copy_from_user() fails below, the new buffer is freed via kfree(ap_config),
but mig_data retains the old, already-freed pointer. When the file is closed,
vfio_ap_release_resuming_file() calls kfree() on the old pointer again,
leading to a double-free.
> + if (ret < 0)
> + goto cleanup;
> +
> + cfg_sz = ret;
> + }
[ ... ]
> +cleanup:
> + kfree(ap_config);
> +done:
> + if (mutex_is_locked(&matrix_dev->mdevs_lock))
> + mutex_unlock(&matrix_dev->mdevs_lock);
[Severity: High]
Does this error path illegally unlock matrix_dev->mdevs_lock if it is held by
another thread?
Because mutex_is_locked() just checks if the lock is held by any thread, and
vfio_ap_resuming_write() dropped mdevs_lock earlier to call copy_from_user(),
if a concurrent thread is currently holding mdevs_lock, this will unlock it.
This leads to race conditions and potential data corruption.
> +
> + return ret;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=14
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2026-06-30 11:56 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration Anthony Krowiak
2026-06-30 10:55 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data Anthony Krowiak
2026-06-30 11:04 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler Anthony Krowiak
2026-06-30 11:10 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration Anthony Krowiak
2026-06-30 11:11 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY Anthony Krowiak
2026-06-30 11:23 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state Anthony Krowiak
2026-06-30 11:26 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING Anthony Krowiak
2026-06-30 11:28 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration Anthony Krowiak
2026-06-30 11:34 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration Anthony Krowiak
2026-06-30 11:37 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP Anthony Krowiak
2026-06-30 11:46 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa Anthony Krowiak
2026-06-30 11:48 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration Anthony Krowiak
2026-06-30 11:49 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute Anthony Krowiak
2026-06-30 11:56 ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst Anthony Krowiak
2026-06-30 11:54 ` sashiko-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox