Re: [PATCH v5 2/4] PCI: endpoint: Add DOE mailbox support for endpoint functions

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Aksh Garg" <a-garg7@ti.com>
Cc: linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 2/4] PCI: endpoint: Add DOE mailbox support for endpoint functions
Date: Wed, 10 Jun 2026 10:17:16 +0000	[thread overview]
Message-ID: <20260610101716.95CD01F00898@smtp.kernel.org> (raw)
In-Reply-To: <20260610100256.1889111-3-a-garg7@ti.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] A sticky `PCI_DOE_FLAG_CANCEL` flag causes spurious aborts of valid requests following an idle mailbox abort.
- [High] Potential hard deadlock due to acquiring non-IRQ-safe xarray spinlocks in hardirq context.
- [High] A TOCTOU race between `pci_ep_doe_submit_task()` and `pci_ep_doe_destroy_mb()` causes `queue_work()` to be called on a destroyed workqueue.
- [High] Clearing the `PCI_DOE_FLAG_CANCEL` flag unconditionally can silently erase a concurrent DOE Abort signal from the host.
--

commit 0a3d7b35451c74090ea52cb3095ab14d0e4a6a56
Author: Aksh Garg <a-garg7@ti.com>

PCI: endpoint: Add DOE mailbox support for endpoint functions

This commit implements Data Object Exchange (DOE) for PCIe endpoint devices.
It provides framework APIs to register mailboxes and handle DOE requests
using workqueues. Discovery protocol support is built-in.

> diff --git a/drivers/pci/endpoint/pci-ep-doe.c b/drivers/pci/endpoint/pci-ep-doe.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/pci/endpoint/pci-ep-doe.c
[ ... ]
> +static struct pci_ep_doe_mb *pci_ep_doe_get_mailbox(struct pci_epc *epc,
> +						    u8 func_no, u16 cap_offset)
> +{
> +	struct pci_ep_doe_mb *doe_mb;
> +	unsigned long key;
> +
> +	if (!epc)
> +		return NULL;
> +
> +	key = PCI_DOE_MB_KEY(func_no, cap_offset);
> +
> +	xa_lock(&epc->doe_mbs);

[Severity: High]
Does acquiring the plain xa_lock spinlock here pose a deadlock risk?

Both pci_ep_doe_process_request() and pci_ep_doe_abort() call this function,
and these are intended to be executed from a controller's hardware interrupt
handler. Because xa_lock() does not disable interrupts, if a thread is
accessing the xarray in process context (like in pci_ep_doe_add_mailbox() or
pci_ep_doe_destroy()) and is preempted by a DOE interrupt on the same CPU,
attempting to acquire the lock in the interrupt handler will deadlock.

Could this be resolved by using xa_lock_irqsave() or initializing the xarray
with XA_FLAGS_LOCK_IRQ?

> +
> +	doe_mb = xa_load(&epc->doe_mbs, key);
> +	if (doe_mb && !refcount_inc_not_zero(&doe_mb->refs))
> +		doe_mb = NULL;
> +
> +	xa_unlock(&epc->doe_mbs);
> +
> +	return doe_mb;
> +}
[ ... ]
> +static void signal_task_complete(struct pci_ep_doe_task *task, int status)
> +{
> +	struct pci_ep_doe_mb *doe_mb = task->doe_mb;
> +
> +	task->complete(doe_mb->epc, doe_mb->func_no, doe_mb->cap_offset,
> +		       status, task->feat.vid, task->feat.type,
> +		       task->response_pl, task->response_pl_sz);
> +
> +	/* Clear the CANCEL flag for next DOE request */
> +	clear_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags);

[Severity: High]
Can this unconditional clear_bit() unintentionally erase a concurrent DOE
Abort signal?

If the host issues a DOE Abort interrupt while the task->complete() callback
is still running, pci_ep_doe_abort() will set the PCI_DOE_FLAG_CANCEL bit.
However, once the completion callback returns, clear_bit() is unconditionally
called here, erasing the concurrent abort signal. As a result, subsequent
queued requests might be processed instead of being aborted.

> +
> +	kfree(task->request_pl);
> +	kfree(task);
> +
> +	/* Release the mailbox reference acquired during process_request */
> +	pci_ep_doe_put_mailbox(doe_mb);
> +}
> +
> +/**
> + * doe_ep_task_work() - Work function for processing DOE EP tasks
> + * @work: Work structure
> + *
> + * Process a DOE request by calling the appropriate protocol handler.
> + */
> +static void doe_ep_task_work(struct work_struct *work)
> +{
> +	struct pci_ep_doe_task *task = container_of(work, struct pci_ep_doe_task,
> +						    work);
> +	struct pci_ep_doe_mb *doe_mb = task->doe_mb;
> +	pci_doe_protocol_handler_t handler;
> +	int rc;
> +
> +	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags)) {
> +		signal_task_complete(task, -EIO);
> +		return;
> +	}
> +
> +	/* Check if request was aborted */
> +	if (test_bit(PCI_DOE_FLAG_CANCEL, &doe_mb->flags)) {

[Severity: High]
Is it possible for a newly submitted request to be erroneously aborted here
due to a stale CANCEL flag?

Because pci_ep_doe_abort() simply sets the PCI_DOE_FLAG_CANCEL flag and
returns, the flag is only cleared by signal_task_complete(). If an abort
occurs when the mailbox is entirely idle, no worker will execute to clear
the flag. When the host eventually sends a valid request, this worker will
see the old CANCEL flag, drop the new request with -ECANCELED, and only
then clear it, breaking the expected request sequence.

> +		signal_task_complete(task, -ECANCELED);
> +		return;
> +	}
[ ... ]
> +static int pci_ep_doe_submit_task(struct pci_ep_doe_mb *doe_mb,
> +				  struct pci_ep_doe_task *task)
> +{
> +	if (test_bit(PCI_DOE_FLAG_DEAD, &doe_mb->flags))
> +		return -EIO;
> +
> +	task->doe_mb = doe_mb;
> +	INIT_WORK(&task->work, doe_ep_task_work);
> +	queue_work(doe_mb->work_queue, &task->work);

[Severity: High]
Could a Time-Of-Check to Time-Of-Use race here cause queue_work() to be
called on a destroyed workqueue?

If the executing thread is preempted immediately after the lockless
test_bit(PCI_DOE_FLAG_DEAD, ...) check evaluates to false, another thread
running pci_ep_doe_destroy() could mark the mailbox DEAD and call
destroy_workqueue(). The mailbox structure itself is kept alive by its
reference count, but the workqueue pointer would point to freed memory.

When the preempted thread resumes, it would blindly call queue_work() on
the freed workqueue, resulting in a use-after-free panic.

> +	return 0;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260610100256.1889111-1-a-garg7@ti.com?part=2

next prev parent reply	other threads:[~2026-06-10 10:17 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-10 10:02 [PATCH v5 0/4] PCI: Add DOE support for endpoint Aksh Garg
2026-06-10 10:02 ` [PATCH v5 1/4] PCI/DOE: Move common definitions to the header file Aksh Garg
2026-06-10 10:02 ` [PATCH v5 2/4] PCI: endpoint: Add DOE mailbox support for endpoint functions Aksh Garg
2026-06-10 10:17   ` sashiko-bot [this message]
2026-06-10 11:19     ` Aksh Garg
2026-06-10 10:02 ` [PATCH v5 3/4] PCI: endpoint: Add support for DOE initialization and setup in EPC core Aksh Garg
2026-06-10 10:17   ` sashiko-bot
2026-06-10 10:02 ` [PATCH v5 4/4] Documentation: PCI: Add documentation for DOE endpoint support Aksh Garg
2026-06-10 23:21   ` Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260610101716.95CD01F00898@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=a-garg7@ti.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.