Re: [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration

Kernel KVM virtualization development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Anthony Krowiak" <akrowiak@linux.ibm.com>
Cc: kvm@vger.kernel.org, "Heiko Carstens" <hca@linux.ibm.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	linux-s390@vger.kernel.org, "Vasily Gorbik" <gor@linux.ibm.com>
Subject: Re: [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration
Date: Tue, 30 Jun 2026 11:37:53 +0000	[thread overview]
Message-ID: <20260630113754.1A56B1F00A3D@smtp.kernel.org> (raw)
In-Reply-To: <20260630104132.1598851-11-akrowiak@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 11 potential issue(s) to consider:
- [Critical] The `guest_matrix` struct is allocated on the stack without initialization, leading to uninitialized kernel stack memory being applied to the guest's AP configuration.
- [Critical] Error paths and `krealloc` usage leave dangling pointers in `mig_data->resuming_mig_state.ap_config`, resulting in Use-After-Free and Double Free vulnerabilities.
- [High] Unsafe use of `mutex_is_locked` to conditionally unlock `mdevs_lock` can release a mutex held by another thread.
- [High] A short write smaller than `sizeof(struct vfio_ap_config)` bypasses size checks and prematurely triggers post-copy validation, leading to out-of-bounds heap reads.
- [High] A double fetch of the `num_queues` field from userspace allows an attacker to bypass bounds checks and trigger out-of-bounds heap reads.
- [High] Partial writes to the migration stream leak the dynamically allocated configuration buffer and permanently break the state machine.
- [High] The `vfio_ap_resuming_write` function unconditionally returns `-ESPIPE` because it assumes `pos` can be NULL for normal writes, breaking migration entirely.
- [High] Dropping `mdevs_lock` while using `&filp->f_pos` on a stream without `FMODE_ATOMIC_POS` leads to data races.
- [Medium] The hardware compatibility check bypasses the requirement that BS bits must be 0 if the source and target queues have identical hardware information.
- [Medium] Strict equality checks on the `fac` bitfield override and break the intended backward compatibility for functionality classes.
- [Medium] The migration functionality completely ignores AP control domains (`adm`), silently dropping them during live migration.
--

commit 1f07710d77bb19f343d29ddba26d02cffa013a44
Author: Anthony Krowiak <akrowiak@linux.ibm.com>

s390/vfio-ap: File ops called to resume the vfio device migration

This commit implements the write callback function to restore the state
of the vfio-ap device on the destination system during migration. It
copies the source guest's AP configuration from userspace, validates it,
and configures it on the destination host.

> diff --git a/drivers/s390/crypto/vfio_ap_migration.c b/drivers/s390/crypto/vfio_ap_migration.c
> index e917cc995d352..9cd1e7a3f7430 100644
> --- a/drivers/s390/crypto/vfio_ap_migration.c
> +++ b/drivers/s390/crypto/vfio_ap_migration.c
[ ... ]
> @@ -187,10 +219,12 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
>  {
>  	unsigned long *apm, *aqm, num_queues, apid, apqi, apqn;
>  	struct ap_tapq_hwinfo source_hwinfo;
> +	const char *mdev_name;
>  	int ret;
>  
>  	lockdep_assert_held(&matrix_dev->mdevs_lock);
>  
> +	mdev_name = dev_name(matrix_mdev->vdev.dev);
>  	apm = matrix_mdev->shadow_apcb.apm;
>  	aqm = matrix_mdev->shadow_apcb.aqm;
>  	num_queues = 0;
> @@ -199,7 +233,7 @@ static int vfio_ap_store_queue_info(struct ap_matrix_mdev *matrix_mdev,
>  		for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>  			apqn = AP_MKQID(apid, apqi);
>  

[Severity: Medium]
Are we missing the control domains (adm) here? It appears the migration
functionality iterates over apm and aqm, but leaves out adm entirely. Will
guests with AP control domain capabilities silently lose these resources
after live migration?

> -			ret = get_hardware_info_for_queue(matrix_mdev,
> +			ret = get_hardware_info_for_queue(mdev_name,
>  							  &source_hwinfo, apqn);
>  			if (ret)
>  				return ret;
[ ... ]
> +static ssize_t allocate_ap_config(struct vfio_ap_config **ap_config,
> +				  const char __user *buf, size_t len)
> +{
> +	struct vfio_ap_config tmp_ap_config;
> +	ssize_t config_size;
> +	size_t copy_size;
> +
> +	/*
> +	 * If the length of the data sent exceeds the size of the vfio_ap_config
> +	 * structure, then we will copy enough data from userspace to get the
> +	 * number of queues which we can use to allocate enough space all of
> +	 * the queue information.
> +	 */
> +	copy_size = min(len, sizeof(tmp_ap_config));
> +
> +	if (copy_from_user(&tmp_ap_config, buf, copy_size))
> +		return -EIO;
> +

[Severity: High]
We are copying tmp_ap_config from userspace here to validate num_queues
and size the buffer. In vfio_ap_resuming_write(), a second copy_from_user
is performed directly into the allocated buffer without re-validating
num_queues.

Could userspace modify num_queues between these two fetches to a large
value, bypassing bounds checks and leading to out-of-bounds heap reads
during subsequent validation loops?

> +	/*
> +	 * If the length of data sent includes the number of queues
> +	 * in the AP configuration, then calculate its size; otherwise
> +	 * set config_size to the length of data sent.
> +	 */
> +	if (len >= sizeof(struct vfio_ap_config)) {
> +		config_size = calculate_ap_config_size(tmp_ap_config.num_queues);
> +
> +		/* If the calculation returned an error */
> +		if (config_size < 0)
> +			return config_size;
> +	} else {
> +		config_size = len;
> +	}
> +

[Severity: High]
If len is less than sizeof(struct vfio_ap_config), we set config_size to
len. Does this bypass the size validation and allocate an undersized
buffer? Later in vfio_ap_resuming_write(), this can prematurely trigger
post-copy validation on uninitialized heap memory.

> +	*ap_config = kzalloc(config_size, GFP_KERNEL_ACCOUNT);
> +	if (!*ap_config)
> +		return -ENOMEM;
> +
> +	return config_size;
> +}
> +
> +/**
> + * reallocate_ap_config:
[ ... ]
> +static ssize_t reallocate_ap_config(struct vfio_ap_config **mig_ap_config,
> +				    size_t len)
> +{
> +	struct vfio_ap_config *ap_config = *mig_ap_config;
> +	struct vfio_ap_config *new_ap_config;
> +	size_t new_cfg_sz, cur_cfg_sz;
> +	unsigned int num_queues;
> +
> +	cur_cfg_sz = ap_config->config_sz;
> +	num_queues = ap_config->num_queues;
[ ... ]
> +	new_ap_config = krealloc(ap_config, new_cfg_sz, GFP_KERNEL_ACCOUNT);
> +	if (!new_ap_config)
> +		return -ENOMEM;
> +
> +	*mig_ap_config = new_ap_config;
> +
> +	return new_cfg_sz;
> +}
> +

[Severity: Critical]
krealloc() frees the old memory and returns a new pointer, updating the
local variable but not the global mig_data pointer until the transfer is
fully complete in vfio_ap_resuming_write(). Can a subsequent write or an
error path dereference the freed pointer?

> +/**
> + * qdev_is_bound_to_vfio_ap:
[ ... ]
> +static bool queue_hardware_info_is_compatible(const char *mdev_name,
> +					      unsigned long apqn,
> +					      struct ap_tapq_hwinfo *src_hwinfo,
> +					      struct ap_tapq_hwinfo *target_hwinfo)
> +{
> +	unsigned long src_bits, target_bits;
> +
> +	src_bits = src_hwinfo->value & QINFO_DATA_MASK;
> +	target_bits = target_hwinfo->value & QINFO_DATA_MASK;
> +
> +	/* If all bits match the queues are compatible */
> +	if (src_bits == target_bits)
> +		return true;
> +

[Severity: Medium]
The specification requires that the Bind/Associate State (BS) bits must be
0 for both queues. Does returning true early when src_bits == target_bits
bypass this requirement if both queues have identical, non-zero BS bits?

> +	if (src_hwinfo->fac == target_hwinfo->fac &&
> +	    src_hwinfo->at <= target_hwinfo->at &&
> +	    classes_compatible(src_hwinfo, target_hwinfo) &&
> +	    (src_hwinfo->bs == 0 && target_hwinfo->bs == 0)) {
> +		return true;
> +	}
> +

[Severity: Medium]
The fac bitfield structurally encompasses the class field. If the source
and target queues have different classes (e.g., stateless vs native), will
src_hwinfo->fac == target_hwinfo->fac unconditionally evaluate to false and
break the intended backward compatibility logic in classes_compatible()?

> +	report_qinfo_incompatibilities(mdev_name, apqn, src_hwinfo, target_hwinfo);
> +
> +	return false;
> +}
[ ... ]
> +static void setup_ap_matrix_from_ap_config(const char *mdev_name,
> +					   struct vfio_ap_config *ap_config,
> +					   struct ap_matrix *guest_matrix)
> +{
> +	struct vfio_ap_queue_info qinfo;
> +	unsigned long apid, apqi;
> +
> +	for (int i = 0; i < ap_config->num_queues; i++) {
> +		qinfo = ap_config->qinfo[i];
> +		apid = AP_QID_CARD(qinfo.apqn);
> +		apqi = AP_QID_QUEUE(qinfo.apqn);
> +
> +		if (!test_bit_inv(apid, guest_matrix->apm))
> +			set_bit_inv(apid, guest_matrix->apm);
> +		if (!test_bit_inv(apqi, guest_matrix->aqm))
> +			set_bit_inv(apqi, guest_matrix->aqm);
> +	}
> +}
> +

[Severity: Medium]
Similar to vfio_ap_store_queue_info(), this reconstruction appears to
discard the adm control domains.

>  static ssize_t vfio_ap_resuming_write(struct file *filp, const char __user *buf,
>  				      size_t len, loff_t *pos)
>  {
> -	/* TODO */
> -	return -EOPNOTSUPP;
> +	struct vfio_ap_migration_data *mig_data;
> +	struct ap_matrix_mdev *matrix_mdev;
> +	struct vfio_ap_config *ap_config;
> +	struct ap_matrix guest_matrix;
> +	ssize_t ret = 0, cfg_sz;
> +	const char *mdev_name;
> +

[Severity: Critical]
Is guest_matrix fully initialized before use? It is allocated on the stack,
and setup_ap_matrix_from_ap_config() only modifies the apm and aqm bits.
Does passing this to vfio_ap_set_new_guest_config() apply uninitialized
kernel stack memory (including the uninitialized adm bits) to the guest's
cryptographic configuration?

> +	/*
> +	 * When userspace calls write() with an explicit offset (pwrite), pos is
> +	 * non-NULL and the function rejects it with -ESPIPE (illegal seek). For
> +	 * normal write() calls, pos is NULL, so we'll use the file's internal
> +	 * position filp->f_pos
> +	 */
> +	if (pos)
> +		return -ESPIPE;
> +

[Severity: High]
The VFS layer always passes a valid pos pointer to the write callback
(never NULL). Does this mean normal writes will unconditionally fail with
-ESPIPE and break migration entirely?

> +	mutex_lock(&matrix_dev->mdevs_lock);
> +	pos = &filp->f_pos;
> +
> +	ret = validate_resuming_write_parms(filp, len, pos);
> +	if (ret)
> +		goto done;
> +
> +	matrix_mdev = filp->private_data;
> +	mig_data = matrix_mdev->mig_data;
> +	mdev_name = dev_name(matrix_mdev->vdev.dev);
> +
> +	/*
> +	 * If this is the first write operation, then allocate storage for the
> +	 * AP configuration information; otherwise, reallocate the
> +	 * struct vfio_ap_config object used to store the AP configuration data
> +	 * sent from userspace.
> +	 */
> +	if (*pos == 0) {
> +		ret = allocate_ap_config(&ap_config, buf, len);
> +
> +		/* If the allocation failed, we'll return the error */
> +		if (ret < 0)
> +			goto done;
> +
> +		cfg_sz = ret;
> +	} else {
> +		ap_config = mig_data->resuming_mig_state.ap_config;
> +
> +		ret = reallocate_ap_config(&ap_config, len);
> +		if (ret < 0)
> +			goto cleanup;
> +
> +		cfg_sz = ret;
> +	}
> +
> +	if (*pos + len > cfg_sz) {
> +		ret = -EIO;
> +		goto cleanup;
> +	}
> +
> +	/*
> +	 * We don't want to lock all mdevs while copying data from userspace so
> +	 * we don't block all other mdevs in case the I/O takes a long time.
> +	 * From here on out we don't need this lock because we are not
> +	 * accessing the matrix_mdev until we need to get the update
> +	 * locks to set the new destination guest's AP configuration in which
> +	 * case this lock will be taken then.
> +	 */
> +	mutex_unlock(&matrix_dev->mdevs_lock);
> +

[Severity: High]
Since the file is opened with stream_open (which unsets FMODE_ATOMIC_POS),
f_pos_lock is not held by VFS. Does dropping mdevs_lock here allow
concurrent writes to race on filp->f_pos, leading to data races and memory
corruption?

> +	if (copy_from_user((char *)ap_config + *pos, buf, len)) {
> +		ret = -EIO;
> +		goto cleanup;
> +	}
> +
> +	/* Check if we've completed writing the entire configuration */
> +	if (*pos + len == cfg_sz) {
> +		ret = do_post_copy_validation(mdev_name, ap_config);
> +		if (ret < 0)
> +			goto cleanup;
> +

[Severity: High]
If a short write occurred where len is less than sizeof(struct
vfio_ap_config), could *pos + len == cfg_sz evaluate to true prematurely?
Will this trigger post-copy validation on an undersized and partially
uninitialized buffer, leading to out-of-bounds heap reads?

> +		setup_ap_matrix_from_ap_config(mdev_name, ap_config,
> +					       &guest_matrix);
> +
> +		/* Acquire locks required to update the guest's AP config */
> +		mutex_lock(&ap_attr_mutex);
> +		get_update_locks_for_mdev(matrix_mdev);
> +
> +		ret = vfio_ap_set_new_guest_config(matrix_mdev, &guest_matrix,
> +						   false);
> +		if (!ret) {
> +			mig_data->resuming_mig_state.ap_config = ap_config;
> +			mig_data->resuming_mig_state.config_sz = cfg_sz;
> +		}
> +

[Severity: High]
If a write is partial, it returns before hitting this block. Does this leak
the dynamically allocated configuration buffer since it isn't assigned to
mig_data? And does this permanently break the state machine, as the next
write will fail the validation in validate_resuming_write_parms()?

> +		release_update_locks_for_mdev(matrix_mdev);
> +		mutex_unlock(&ap_attr_mutex);
> +
> +		if (ret)
> +			goto cleanup;
> +	}
> +
> +	ret = len;
> +	*pos += len;
> +	goto done;
> +
> +cleanup:
> +	kfree(ap_config);

[Severity: Critical]
In the error path, we kfree(ap_config). Because krealloc() might have freed
the memory stored in mig_data->resuming_mig_state.ap_config during a prior
reallocate_ap_config(), could calling kfree(ap_config) here without
clearing the global state lead to a Double Free later when the file is
released?

> +done:
> +	if (mutex_is_locked(&matrix_dev->mdevs_lock))
> +		mutex_unlock(&matrix_dev->mdevs_lock);
> +

[Severity: High]
The mutex_is_locked() function only checks if the lock is held by any
thread, not necessarily the current thread. If another thread has acquired
mdevs_lock while this thread was unlocked, will we erroneously unlock it
here and destroy mutual exclusion?

> +	return ret;
>  }
>  
>  static const struct file_operations vfio_ap_resume_fops = {

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260630104132.1598851-1-akrowiak@linux.ibm.com?part=10

next prev parent reply	other threads:[~2026-06-30 11:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-30 10:41 [PATCH v3 00/15] s390/vfio-ap: Add live guest migration support Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 01/15] s390/vfio-ap: Provide function to get the number of queues assigned to mdev Anthony Krowiak
2026-06-30 10:41 ` [PATCH v3 02/15] s390/vfio-ap: Data structures for facilitating vfio device migration Anthony Krowiak
2026-06-30 10:55   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 03/15] s390/vfio-ap: Initialize/release vfio device migration data Anthony Krowiak
2026-06-30 11:04   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 04/15] s390/vfio-ap: Reset migration state in VFIO_DEVICE_RESET ioctl handler Anthony Krowiak
2026-06-30 11:10   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 05/15] s390-vfio-ap: Callback to get/set vfio device mig state during guest migration Anthony Krowiak
2026-06-30 11:11   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 06/15] s390/vfio-ap: Transition guest migration state from STOP to STOP_COPY Anthony Krowiak
2026-06-30 11:23   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 07/15] s390/vfio-ap: File ops called to save the vfio device migration state Anthony Krowiak
2026-06-30 11:26   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 08/15] s390/vfio-ap: Transition device migration state from STOP to RESUMING Anthony Krowiak
2026-06-30 11:28   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 09/15] s390/vfio-ap: Add method to set a new guest AP configuration Anthony Krowiak
2026-06-30 11:34   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 10/15] s390/vfio-ap: File ops called to resume the vfio device migration Anthony Krowiak
2026-06-30 11:37   ` sashiko-bot [this message]
2026-06-30 10:41 ` [PATCH v3 11/15] s390/vfio-ap: Transition device migration state to STOP Anthony Krowiak
2026-06-30 11:46   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 12/15] s390/vfio-ap: Transition device migration state from STOP to RUNNING and vice versa Anthony Krowiak
2026-06-30 11:48   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 13/15] s390/vfio-ap: Callback to get the size of data to be migrated during guest migration Anthony Krowiak
2026-06-30 11:49   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 14/15] s390/vfio-ap: Add 'migratable' feature to sysfs 'features' attribute Anthony Krowiak
2026-06-30 11:56   ` sashiko-bot
2026-06-30 10:41 ` [PATCH v3 15/15] s390/vfio-ap: Add live guest migration chapter to vfio-ap.rst Anthony Krowiak
2026-06-30 11:54   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260630113754.1A56B1F00A3D@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=akrowiak@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox