From: sashiko-bot@kernel.org
To: "Nathan Lynch via B4 Relay" <devnull+nathan.lynch.amd.com@kernel.org>
Cc: Frank.Li@kernel.org, dmaengine@vger.kernel.org,
linux-pci@vger.kernel.org, vkoul@kernel.org
Subject: Re: [PATCH v2 23/23] dmaengine: sdxi: Add DMA engine provider
Date: Wed, 13 May 2026 22:57:43 +0000
Message-ID: <20260513225744.2E6F5C19425@smtp.kernel.org>
In-Reply-To: <20260511-sdxi-base-v2-23-889cfed17e3f@amd.com>

Thank you for your contribution! Sashiko AI review found 13 potential issues to consider:
- [Critical] Out-of-bounds array initialization due to TOCTOU on dma_channels module parameter.
- [Critical] DMA Use-After-Free (Memory Corruption) when sdxi_completion_poll times out.
- [High] On sdxi_alloc_akey() failure, alloc_chan_resources returns a stale positive IRQ number instead of a negative errno, causing a silent failure.
- [High] Hardware ring stalls permanently if prepared descriptors are freed before submission.
- [High] sdxi_dma_terminate_all fails to abort active transfers, causing Use-After-Free on target buffers.
- [High] Silent early return on allocation failure in sdxi_stop_cxt leads to DMA Use-After-Free.
- [High] Out-of-order completion in sdxi_tx_status breaks completed_cookie monotonicity and leaks descriptors.
- [High] Unhandled DMA transfer errors in sdxi_dma_cxt_irq lead to silent data corruption.
- [High] Misplaced read memory barrier in sdxi_completion_signaled risks data corruption.
- [High] Missing tasklet_kill() during channel teardown leads to Use-After-Free of the virt-dma tasklet.
- [Medium] sdxi_dma_terminate_all leaks descriptors by moving them to the unswept desc_terminated list.
- [Medium] sdxi_dma_synchronize fails to flush tasklets on error paths.
- [Medium] umax() mishandles wrapped hardware doorbell indices in sdxi_dma_issue_pending.
--
commit 17b81cc17c9b5c350d13ff97f58dfc2cf02880d1
Author: Nathan Lynch <nathan.lynch@amd.com>
Subject: dmaengine: sdxi: Add DMA engine provider
This commit adds a DMA engine provider for the Smart Data Accelerator Interface
(SDXI). It registers a memory-to-memory DMA device supporting interrupt and
polled completion modes.
> diff --git a/drivers/dma/sdxi/dma.c b/drivers/dma/sdxi/dma.c
> new file mode 100644
> --- /dev/null
> +++ b/drivers/dma/sdxi/dma.c
[ ... ]
> +static void sdxi_tx_desc_free(struct virt_dma_desc *vdesc)
> +{
> + struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> +
> + sdxi_completion_free(sddesc->completion);
> + kfree(to_sdxi_dma_desc(vdesc));
> +}
If a client frees a prepared descriptor before submitting it, does
sdxi_tx_desc_free() fail to write valid NOPs to the abandoned hardware ring
slots? If the slots are left with their valid bits cleared, will the hardware
permanently stall when it reaches them?
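If so, something along these lines might avoid the stall (a sketch, assuming
the reservation in sddesc->resv is still valid when an unsubmitted descriptor
is freed):

static void sdxi_tx_desc_free(struct virt_dma_desc *vdesc)
{
	struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
	struct sdxi_desc *hwdesc;

	/* Write valid nops into slots the client abandoned so the
	 * engine can step over them. Sketch only: descriptors that
	 * were already submitted would need a flag so their live
	 * slots are not rewritten here, and a doorbell push would
	 * still be required for the engine to consume the nops. */
	sdxi_ring_resv_foreach(&sddesc->resv, hwdesc) {
		sdxi_serialize_nop(hwdesc);
		sdxi_desc_make_valid(hwdesc);
	}

	sdxi_completion_free(sddesc->completion);
	kfree(sddesc);
}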
[ ... ]
> +static enum dma_status sdxi_tx_status(struct dma_chan *chan,
> + dma_cookie_t cookie,
> + struct dma_tx_state *state)
> +{
> + struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(chan);
> + struct sdxi_dma_desc *sddesc;
> + enum dma_status status;
> + struct virt_dma_desc *vdesc;
> +
> + status = dma_cookie_status(chan, cookie, state);
> + if (status == DMA_COMPLETE)
> + return status;
> +
> + guard(spinlock_irqsave)(&sdchan->vchan.lock);
> +
> + vdesc = vchan_find_desc(&sdchan->vchan, cookie);
> + if (!vdesc)
> + return status;
> +
> + sddesc = to_sdxi_dma_desc(vdesc);
> +
> + if (WARN_ON_ONCE(!sddesc->completion))
> + return DMA_ERROR;
> +
> + if (!sdxi_completion_signaled(sddesc->completion))
> + return DMA_IN_PROGRESS;
> +
> + if (sdxi_completion_errored(sddesc->completion))
> + return DMA_ERROR;
> +
> + list_del(&vdesc->node);
> + vchan_cookie_complete(vdesc);
If a newer descriptor is polled and completed before an older unpolled
descriptor, does calling vchan_cookie_complete() set completed_cookie forward
and break DMA engine monotonicity? Would unpolled descriptors preceding the
polled one permanently leak in the desc_issued list?
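A hedged alternative would be to retire signaled descriptors strictly in issue
order, stopping at the first unsignaled one (untested sketch):

	struct virt_dma_desc *vd, *tmp;

	list_for_each_entry_safe(vd, tmp, &sdchan->vchan.desc_issued, node) {
		struct sdxi_dma_desc *d = to_sdxi_dma_desc(vd);

		if (!sdxi_completion_signaled(d->completion))
			break;	/* keeps completed_cookie monotonic */
		list_del(&vd->node);
		vchan_cookie_complete(vd);
		if (vd == vdesc)
			break;	/* retired up to the polled cookie */
	}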
> +
> + return dma_cookie_status(chan, cookie, state);
> +}
> +
> +static void sdxi_dma_issue_pending(struct dma_chan *dma_chan)
> +{
> + struct virt_dma_chan *vchan = to_virt_chan(dma_chan);
> + struct virt_dma_desc *vdesc;
> + u64 dbval = 0;
> +
> + scoped_guard(spinlock_irqsave, &vchan->lock) {
> + /*
> + * This can happen with racing submitters.
> + */
> + if (list_empty(&vchan->desc_submitted))
> + return;
> +
> + list_for_each_entry(vdesc, &vchan->desc_submitted, node) {
> + struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> + struct sdxi_desc *hwdesc;
> +
> + sdxi_ring_resv_foreach(&sddesc->resv, hwdesc)
> + sdxi_desc_make_valid(hwdesc);
> + /*
> + * The reservations ought to be ordered
> + * ascending, but use umax() just in case.
> + */
> + dbval = umax(sdxi_ring_resv_dbval(&sddesc->resv), dbval);
If the underlying ring index wraps around (e.g., from capacity - 1 to 0), does
umax() erroneously retain the older, larger index instead of taking the newer
wrapped index? Would this point the hardware doorbell backward and cause the
channel to stall?
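If the value can wrap, a time_after()-style signed comparison may be safer (a
sketch, assuming the doorbell value's width permits a signed-difference test;
alternatively, since the list is in submission order, simply keep the value
from the last reservation visited):

/* hypothetical wrap-safe "newer than" test for doorbell values */
static inline bool sdxi_dbval_after(u64 a, u64 b)
{
	return (s64)(a - b) > 0;
}

			if (sdxi_dbval_after(sdxi_ring_resv_dbval(&sddesc->resv), dbval))
				dbval = sdxi_ring_resv_dbval(&sddesc->resv);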
> + }
> +
> + vchan_issue_pending(vchan);
> + }
> +
> + /*
> + * The implementation is required to handle out-of-order
> + * doorbell updates; we can do this after dropping the
> + * lock.
> + */
> + sdxi_cxt_push_doorbell(to_sdxi_dma_chan(dma_chan)->cxt, dbval);
> +}
> +
> +static int sdxi_dma_terminate_all(struct dma_chan *dma_chan)
> +{
> + struct virt_dma_chan *vchan = to_virt_chan(dma_chan);
> + u64 dbval = 0;
> +
> + /*
> + * Allocated and submitted txds are in the ring but not valid
> + * yet. Overwrite them with nops and then set their valid
> + * bits.
> + *
> + * The implementation may start consuming these as soon as the
> + * valid bits flip. sdxi_dma_synchronize() will ensure they're
> + * all done.
> + */
> + scoped_guard(spinlock_irqsave, &vchan->lock) {
> + struct virt_dma_desc *vdesc;
> + LIST_HEAD(head);
> +
> + list_splice_tail_init(&vchan->desc_allocated, &head);
> + list_splice_tail_init(&vchan->desc_submitted, &head);
Should sdxi_dma_terminate_all() also abort transfers in the desc_issued list?
By only addressing allocated and submitted descriptors, does this leave active
hardware transfers running in the background?
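If issued descriptors must be aborted too, one hedged option is the usual
virt-dma terminate pattern, quiescing the context before reclaiming every
list (a sketch, assuming sdxi_stop_cxt() halts in-flight work and the
context can be restarted before reuse):

static int sdxi_dma_terminate_all(struct dma_chan *dma_chan)
{
	struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(dma_chan);
	LIST_HEAD(head);

	sdxi_stop_cxt(sdchan->cxt);	/* quiesce the engine first */

	scoped_guard(spinlock_irqsave, &sdchan->vchan.lock)
		vchan_get_all_descriptors(&sdchan->vchan, &head);

	vchan_dma_desc_free_list(&sdchan->vchan, &head);
	return 0;
}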
> +
> + if (list_empty(&head))
> + return 0;
> +
> + list_for_each_entry(vdesc, &head, node) {
> + struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> + struct sdxi_desc *hwdesc;
> +
> + sdxi_ring_resv_foreach(&sddesc->resv, hwdesc) {
> + sdxi_serialize_nop(hwdesc);
> + sdxi_desc_make_valid(hwdesc);
> + }
> +
> + dbval = umax(sdxi_ring_resv_dbval(&sddesc->resv), dbval);
Does umax() here also suffer from the same ring index wrapping issue as in
sdxi_dma_issue_pending()?
> + }
> +
> + list_splice_tail(&head, &vchan->desc_terminated);
Does moving the descriptors to desc_terminated without sweeping them result in
a memory leak if a client repeatedly calls dmaengine_terminate_all() during
the channel's active lifetime?
> + }
> +
> + sdxi_cxt_push_doorbell(to_sdxi_dma_chan(dma_chan)->cxt, dbval);
> +
> + return 0;
> +}
> +
> +static void sdxi_dma_synchronize(struct dma_chan *dma_chan)
> +{
> + struct sdxi_cxt *cxt = to_sdxi_dma_chan(dma_chan)->cxt;
> + struct sdxi_ring_resv resv;
> + struct sdxi_desc *nop;
> + int err;
> +
> + /* Submit a single nop with fence and wait for it to complete. */
> +
> + if (sdxi_ring_reserve(cxt->ring_state, 1, &resv))
> + return;
> +
> + struct sdxi_completion *comp __free(sdxi_completion) = sdxi_completion_alloc(cxt->sdxi);
> + if (!comp)
> + return;
If sdxi_ring_reserve() or sdxi_completion_alloc() fails, this returns early
without calling vchan_synchronize(). Does this violate the guarantee that all
callbacks have completed, allowing the client to prematurely free memory?
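A sketch of one way to guarantee the flush on every path (same logic, with
the error paths funneled into the sync; note the reserved slot would still
need a valid nop on the allocation-failure path to avoid a ring stall):

	struct sdxi_completion *comp __free(sdxi_completion) = NULL;

	if (sdxi_ring_reserve(cxt->ring_state, 1, &resv))
		goto sync;

	comp = sdxi_completion_alloc(cxt->sdxi);
	if (!comp)
		goto sync;
	...
sync:
	vchan_synchronize(to_virt_chan(dma_chan));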
> +
> + nop = sdxi_ring_resv_next(&resv);
> + sdxi_serialize_nop(nop);
> + sdxi_completion_attach(nop, comp);
> + sdxi_desc_set_fence(nop);
> + sdxi_desc_make_valid(nop);
> + sdxi_cxt_push_doorbell(cxt, sdxi_ring_resv_dbval(&resv));
> +
> + err = sdxi_completion_poll(comp);
If sdxi_completion_poll() reaches its timeout and returns, the
__free(sdxi_completion) cleanup frees the DMA-mapped status block. Does the
hardware descriptor remain in the ring? If the hardware eventually processes
the descriptor, will it write to the freed block?
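If that window is real, a hedged mitigation is to deliberately leak the
status block on timeout rather than return it to the allocator (sketch;
assumes sdxi_completion_free() tolerates NULL, so clearing the __free()
variable leaks the original pointer):

	err = sdxi_completion_poll(comp);
	if (err)
		comp = NULL;	/* hw may still own the block: leak it
				 * instead of freeing memory the device
				 * can still write */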
> + WARN_ONCE(err, "got %d polling cst_blk", err);
> +
> + vchan_synchronize(to_virt_chan(dma_chan));
> +}
> +
> +static irqreturn_t sdxi_dma_cxt_irq(int irq, void *data)
> +{
> + struct sdxi_dma_chan *sdchan = data;
> + struct virt_dma_chan *vchan = &sdchan->vchan;
> + struct virt_dma_desc *vdesc;
> + bool completed = false;
> +
> + guard(spinlock_irqsave)(&vchan->lock);
> +
> + while ((vdesc = vchan_next_desc(vchan))) {
> + struct sdxi_dma_desc *sddesc = to_sdxi_dma_desc(vdesc);
> +
> + if (!sdxi_completion_signaled(sddesc->completion))
> + break;
The interrupt handler appears to ignore the hardware error state exposed by
sdxi_completion_errored(). Are errored descriptors unconditionally completed
via vchan_cookie_complete(), potentially masking hardware faults?
Also, sdxi_completion_signaled() places dma_rmb() before the check of the
synchronization variable (sc->cst_blk->signal == 0). Does this misplaced
barrier fail to order the signal read with subsequent data reads, allowing
stale data to be consumed by the client?
> +
> + list_del(&vdesc->node);
> + vchan_cookie_complete(&sddesc->vdesc);
> + completed = true;
> + }
> +
> + if (completed)
> + sdxi_ring_wake_up(sdchan->cxt->ring_state);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int sdxi_dma_alloc_chan_resources(struct dma_chan *dma_chan)
> +{
> + struct sdxi_dev *sdxi = dev_get_drvdata(dma_chan->device->dev);
> + struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(dma_chan);
> + int vector, irq, err;
> +
> + sdchan->cxt = sdxi_cxt_new(sdxi);
> + if (!sdchan->cxt)
> + return -ENOMEM;
> + /*
> + * This irq and akey setup should perhaps all be pushed into
> + * the context allocation.
> + */
> + err = vector = sdxi_alloc_vector(sdxi);
> + if (vector < 0)
> + goto exit_cxt;
> +
> + sdchan->vector = vector;
> +
> + err = irq = sdxi_vector_to_irq(sdxi, vector);
> + if (irq < 0)
> + goto free_vector;
> +
> + sdchan->irq = irq;
> +
> + /*
> + * Note this akey entry is used for both the completion
> + * interrupt and source and destination access for copies.
> + */
> + sdchan->akey = sdxi_alloc_akey(sdchan->cxt);
> + if (!sdchan->akey)
> + goto free_vector;
If sdxi_alloc_akey() fails, does the function return the stale positive irq
value in err? Since the DMA engine core treats any return value >= 0 from
alloc_chan_resources as success, would this cause a silent allocation
failure?
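If so, a minimal fix might be (assuming -ENOMEM is an appropriate errno for
AKey exhaustion):

	sdchan->akey = sdxi_alloc_akey(sdchan->cxt);
	if (!sdchan->akey) {
		err = -ENOMEM;
		goto free_vector;
	}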
> +
> + *sdchan->akey = (typeof(*sdchan->akey)) {
> + .intr_num = cpu_to_le16(FIELD_PREP(SDXI_AKEY_ENT_VL, 1) |
> + FIELD_PREP(SDXI_AKEY_ENT_IV, 1) |
> + FIELD_PREP(SDXI_AKEY_ENT_INTR_NUM,
> + vector)),
> + };
> +
> + err = request_irq(sdchan->irq, sdxi_dma_cxt_irq,
> + IRQF_TRIGGER_NONE, "SDXI DMAengine", sdchan);
> + if (err)
> + goto free_akey;
> +
> + err = sdxi_start_cxt(sdchan->cxt);
If sdxi_start_cxt() calls sdxi_completion_poll() and that reaches its timeout,
could it similarly leave a hardware descriptor running that will later write to
the freed status block?
> + if (err)
> + goto free_irq;
> +
> + return 0;
> +free_irq:
> + free_irq(sdchan->irq, sdchan);
> +free_akey:
> + sdxi_free_akey(sdchan->cxt, sdchan->akey);
> +free_vector:
> + sdxi_free_vector(sdxi, vector);
> +exit_cxt:
> + sdxi_cxt_exit(sdchan->cxt);
> + return err;
> +}
> +
> +static void sdxi_dma_free_chan_resources(struct dma_chan *dma_chan)
> +{
> + struct sdxi_dma_chan *sdchan = to_sdxi_dma_chan(dma_chan);
> +
> + sdxi_stop_cxt(sdchan->cxt);
If sdxi_stop_cxt() aborts early (e.g. due to sdxi_completion_alloc() returning
NULL), does it fail to actually stop the hardware context? When the teardown
then proceeds to free the context and its DMA ring buffers via
sdxi_cxt_exit(), will the running hardware corrupt the freed context memory?
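If sdxi_stop_cxt() can bail out that way, one hedged option is to have it
return an errno and skip the frees that would hand live DMA memory back to
the allocator (sketch; deliberately leaks on failure):

	if (sdxi_stop_cxt(sdchan->cxt)) {
		dev_err(sdchan->cxt->sdxi->dev,
			"failed to stop context; leaking its resources\n");
		return;
	}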
> + free_irq(sdchan->irq, sdchan);
> + sdxi_free_vector(sdchan->cxt->sdxi, sdchan->vector);
> + sdxi_free_akey(sdchan->cxt, sdchan->akey);
> + vchan_free_chan_resources(to_virt_chan(dma_chan));
Does sdxi_dma_free_chan_resources() miss a call to
tasklet_kill(&sdchan->vchan.task)? If the virt-dma tasklet is scheduled just
before the IRQ is freed, could it run asynchronously after the device is
unbound?
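If so, killing the tasklet once the IRQ can no longer reschedule it might
look like (sketch):

	free_irq(sdchan->irq, sdchan);
	tasklet_kill(&sdchan->vchan.task);	/* IRQ is gone, nothing reschedules it */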
> + sdxi_cxt_exit(sdchan->cxt);
> +}
> +
> +int sdxi_dma_register(struct sdxi_dev *sdxi)
> +{
> + struct device *dev = sdxi->dev;
> + struct sdxi_dma_dev *sddev;
> + struct dma_device *dma_dev;
> + int err;
> +
> + if (!dma_channels)
> + return 0;
> + /*
> + * Note that this code assumes the device supports the
> + * interrupt operation group (IntrGrp), which is optional. See
> + * SDXI 1.0 Table 6-1 SDXI Operation Groups.
> + *
> + * TODO: check sdxi->op_grp_cap for IntrGrp support and error
> + * out if it's missing.
> + */
> +
> + sddev = devm_kzalloc(dev, struct_size(sddev, sdchan, dma_channels),
> + GFP_KERNEL);
> + if (!sddev)
> + return -ENOMEM;
> +
> + sddev->nr_channels = dma_channels;
Could dma_channels be modified concurrently via sysfs between the struct_size
calculation and this assignment? If so, does the subsequent initialization loop
write past the allocated bounds?
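If dma_channels is writable at runtime, snapshotting it once would close the
window (sketch):

	unsigned int nr_channels = READ_ONCE(dma_channels);

	if (!nr_channels)
		return 0;

	sddev = devm_kzalloc(dev, struct_size(sddev, sdchan, nr_channels),
			     GFP_KERNEL);
	if (!sddev)
		return -ENOMEM;

	sddev->nr_channels = nr_channels;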
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260511-sdxi-base-v2-0-889cfed17e3f@amd.com?part=23