All of lore.kernel.org
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "David Matlack" <dmatlack@google.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 03/11] PCI: liveupdate: Track incoming preserved PCI devices
Date: Thu, 14 May 2026 01:05:18 +0000	[thread overview]
Message-ID: <20260514010518.B0B41C19425@smtp.kernel.org> (raw)
In-Reply-To: <20260512184846.119396-4-dmatlack@google.com>

Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Integer truncation in `pci_ser_xa_key()` drops the upper 16 bits of the PCI domain.
- [High] Memory leak of KHO-preserved physical pages in the error path of `pci_flb_retrieve()`.
- [High] Subsystem guide violation: `pci_flb_preserve()` calls a KHO serialization API without checking `kho_is_enabled()`.
- [High] Data corruption in `pci_liveupdate_unpreserve()` due to decrementing `ser->nr_devices` without compacting the array.
- [High] Use-after-free risk because `pci_flb_unpreserve()` leaves dangling pointers in `dev->liveupdate.outgoing`.
- [High] `pci_liveupdate_finish()` calls `pci_liveupdate_flb_put_incoming()` under a write lock, risking a sleep-in-atomic bug.
--

commit 46fdddc110c525ef02301643989348b7f03a10c0
Author: David Matlack <dmatlack@google.com>

PCI: liveupdate: Track incoming preserved PCI devices

This commit adds support for tracking pci devices preserved across a kexec
via live update. It introduces setup and cleanup hooks for the pci core to
manage the incoming live update state of these devices.

> diff --git a/drivers/pci/liveupdate.c b/drivers/pci/liveupdate.c
> index 9c4582ecd55cc..f14396dd14773 100644
> --- a/drivers/pci/liveupdate.c
> +++ b/drivers/pci/liveupdate.c
> @@ -85,6 +101,21 @@ struct pci_flb_outgoing {
>  	struct mutex lock;
>  };
>  
> +/**
> + * struct pci_flb_incoming - Incoming PCI FLB object
> + * @ser: The incoming struct pci_ser from the previous kernel.
> + * @xa: Xarray used to quickly lookup devices in @ser.
> + */
> +struct pci_flb_incoming {
> +	struct pci_ser *ser;
> +	struct xarray xa;
> +};
> +
> +static unsigned long pci_ser_xa_key(u32 domain, u16 bdf)
> +{
> +	return domain << 16 | bdf;
> +}

Can this integer shift truncate the upper 16 bits of the pci domain?

Because domain is a u32, the shift expression evaluates as a 32-bit integer.
This discards the upper 16 bits before the result is promoted to the 64-bit
unsigned long return type.

If devices are in domains >= 0x10000, this will produce colliding keys,
causing them to overwrite each other in the xarray.

Could this be cast before the shift, such as
((unsigned long)domain << 16) | bdf?

> +
>  static int pci_flb_preserve(struct liveupdate_flb_op_args *args)
>  {
>  	struct pci_flb_outgoing *outgoing;

Does this function need to check kho_is_enabled() before proceeding?

According to the kho subsystem guidelines, all callers of serialization-side
kho APIs (like kho_alloc_preserve() used later in this function) must
gate their usage on kho_is_enabled().

Failing to check the enabled state can cause null pointer dereferences or
silently add useless tracking state that will never be used.

[ ... ]

> @@ -140,13 +171,44 @@ static void pci_flb_unpreserve(struct liveupdate_flb_op_args *args)

Can pci_flb_unpreserve() cause a use-after-free risk by leaving dangling
pointers?

When an outgoing live update is aborted, this function frees the outgoing->ser
memory via kho_unpreserve_free(). However, it does not clear the
dev->liveupdate.outgoing pointers in the pci devices that were already
preserved.

If a new live update is initiated and pci_liveupdate_unpreserve() is called
on those devices, it will execute memset(dev_ser, 0, ...) on the dangling
dev->liveupdate.outgoing pointer, corrupting memory.

>  
>  static int pci_flb_retrieve(struct liveupdate_flb_op_args *args)
>  {
> -	args->obj = phys_to_virt(args->data);
> +	struct pci_flb_incoming *incoming;
> +	int i, ret;
> +
> +	incoming = kmalloc_obj(*incoming);
> +	if (!incoming)
> +		return -ENOMEM;
> +
> +	incoming->ser = phys_to_virt(args->data);
> +
> +	xa_init(&incoming->xa);
> +
> +	for (i = 0; i < incoming->ser->max_nr_devices; i++) {
> +		struct pci_dev_ser *dev_ser = &incoming->ser->devices[i];
> +		unsigned long key;
> +
> +		if (!dev_ser->refcount)
> +			continue;
> +
> +		key = pci_ser_xa_key(dev_ser->domain, dev_ser->bdf);
> +		ret = xa_err(xa_store(&incoming->xa, key, dev_ser, GFP_KERNEL));
> +		if (ret) {
> +			xa_destroy(&incoming->xa);
> +			kfree(incoming);
> +			return ret;
> +		}

Does this error path leak the preserved physical pages?

If xa_store() fails, this returns an error code without calling
kho_restore_free(incoming->ser). Because the function returns an error and
does not populate args->obj, the live update core will not call the .finish
callback to clean up the memory.

[ ... ]

> @@ -268,6 +330,163 @@ void pci_liveupdate_unpreserve(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_liveupdate_unpreserve);

Can pci_liveupdate_unpreserve() corrupt the live update tracking state by
leaving holes in the array?

Looking at the body of pci_liveupdate_unpreserve():

drivers/pci/liveupdate.c:pci_liveupdate_unpreserve() {
    ...
    ser->nr_devices--;
    memset(dev_ser, 0, sizeof(*dev_ser));
    ...
}

If a device other than the most recently preserved one is unpreserved, this
leaves a hole in the array and moves the allocation cursor backward.
Subsequent device preservations will overwrite currently preserved, valid
devices at that index.

[ ... ]

> +void pci_liveupdate_finish(struct pci_dev *dev)
> +{
> +	guard(write_lock)(&dev->liveupdate.lock);
> +
> +	if (!dev->liveupdate.incoming) {
> +		pci_warn(dev, "Cannot finish preserving an unpreserved device\n");
> +		return;
> +	}
> +
> +	pci_info(dev, "Device is finished participating in Live Update\n");
> +
> +	/*
> +	 * Drop the refcount so this device does not get treated as an incoming
> +	 * device again, e.g. in case pci_liveupdate_setup_device() gets called
> +	 * again because the device is hot-plugged.
> +	 */
> +	dev->liveupdate.incoming->refcount = 0;
> +	dev->liveupdate.incoming = NULL;
> +
> +	/* Drop this device's reference on the incoming FLB. */
> +	pci_liveupdate_flb_put_incoming();
> +}

Can calling pci_liveupdate_flb_put_incoming() under a write lock cause a
sleep-in-atomic bug?

This function calls pci_liveupdate_flb_put_incoming() while holding
dev->liveupdate.lock (an rwlock) for writing. If the put function acquires a
sleepable lock internally, calling it from atomic context will trigger a bug.

This contrasts with pci_liveupdate_cleanup_device() which drops the lock
first using scoped_guard().

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260512184846.119396-1-dmatlack@google.com?part=3

  reply	other threads:[~2026-05-14  1:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 18:48 [PATCH v5 00/11] PCI: liveupdate: PCI core support for Live Update David Matlack
2026-05-12 18:48 ` [PATCH v5 01/11] PCI: liveupdate: Set up FLB handler for the PCI core David Matlack
2026-05-12 18:48 ` [PATCH v5 02/11] PCI: liveupdate: Track outgoing preserved PCI devices David Matlack
2026-05-14  0:31   ` sashiko-bot
2026-05-14 18:16     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 03/11] PCI: liveupdate: Track incoming " David Matlack
2026-05-14  1:05   ` sashiko-bot [this message]
2026-05-14 18:27     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 04/11] PCI: liveupdate: Document driver binding responsibilities David Matlack
2026-05-12 18:48 ` [PATCH v5 05/11] PCI: liveupdate: Keep bus numbers constant during Live Update David Matlack
2026-05-14  1:36   ` sashiko-bot
2026-05-14 18:39     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 06/11] PCI: liveupdate: Auto-preserve upstream bridges across " David Matlack
2026-05-14  2:05   ` sashiko-bot
2026-05-14 18:41     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 07/11] PCI: liveupdate: Inherit ACS flags in incoming preserved devices David Matlack
2026-05-14  2:37   ` sashiko-bot
2026-05-14 18:46     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 08/11] PCI: liveupdate: Inherit ARI Forwarding Enable on preserved bridges David Matlack
2026-05-12 18:48 ` [PATCH v5 09/11] PCI: liveupdate: Freeze preservation status during shutdown David Matlack
2026-05-14  3:14   ` sashiko-bot
2026-05-14 18:48     ` David Matlack
2026-05-12 18:48 ` [PATCH v5 10/11] PCI: liveupdate: Do not disable bus mastering on preserved devices during kexec David Matlack
2026-05-12 18:48 ` [PATCH v5 11/11] Documentation: PCI: Add documentation for Live Update David Matlack

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260514010518.B0B41C19425@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=dmatlack@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.