Re: [PATCH v18 06/14] dmaengine: qcom: bam_dma: add support for BAM locking

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Bartosz Golaszewski" <bartosz.golaszewski@oss.qualcomm.com>
Cc: Frank.Li@kernel.org, vkoul@kernel.org, dmaengine@vger.kernel.org
Subject: Re: [PATCH v18 06/14] dmaengine: qcom: bam_dma: add support for BAM locking
Date: Fri, 22 May 2026 15:25:05 +0000	[thread overview]
Message-ID: <20260522152506.63A241F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260522-qcom-qce-cmd-descr-v18-6-99103926bafc@oss.qualcomm.com>

Thank you for your contribution! Sashiko AI review found 10 potential issue(s) to consider:
- [High] Assigning dmaengine cookies to internal dummy descriptors breaks the monotonic sequence of completions, causing premature completion reporting for client descriptors.
- [High] Unconditional addition of LOCK/UNLOCK descriptors in `bam_start_dma()` causes duplicate lock injections and queue accumulation upon channel resume.
- [High] Failure to allocate the UNLOCK descriptor leaves the hardware pipeline permanently locked.
- [High] The command element is DMA-mapped using the client peripheral device instead of the DMA controller device.
- [High] Per-descriptor metadata updates globally shared channel state, causing a persistent state leak across transactions.
- [High] The DMA-mapped `lock_ce` buffer is not cacheline-aligned and shares a cacheline with CPU-modified fields.
- [High] Internal dummy descriptors are not assigned the `DMA_CTRL_ACK` flag, leading to a permanent memory and DMA mapping leak.
- [Medium] The pipe locking logic checks the wrong transfer direction, completely disabling the feature for RX channels.
- [Medium] `bam_metadata_attach()` does not validate the `len` argument before dereferencing the metadata payload.
- [Medium] Missing `cpu_to_le32()` conversion when assigning the DMA address to the hardware descriptor.
--

commit e7baf190c2e5ffe3e7505c85ae0eb210175ae10a
Author: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

dmaengine: qcom: bam_dma: add support for BAM locking

Add support for BAM pipe locking. To that end: when starting DMA on an RX
channel - prepend the existing queue of issued descriptors with an
additional "dummy" command descriptor with the LOCK bit set. Once the
transaction is done (no more issued descriptors), issue one more dummy
descriptor with the UNLOCK bit.

> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 04fe1d546be73..40b5d5c24067a 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
[ ... ]
> @@ -72,6 +76,10 @@ struct bam_async_desc {
>  
>  	struct bam_desc_hw *curr_desc;
>  
> +	/* BAM locking infrastructure */
> +	struct scatterlist lock_sg;
> +	struct bam_cmd_element lock_ce;
> +
>  	/* list node for the desc in the bam_chan list of descriptors */
>  	struct list_head desc_node;

[Severity: High]
Does this struct layout cause cacheline tearing on non-coherent
architectures?

lock_ce is embedded directly inside the dynamically allocated
struct bam_async_desc without cacheline alignment, meaning it shares a
cacheline with CPU-modified fields like desc_node. Concurrently modifying
list pointers while the DMA API performs cache maintenance operations on the
same cacheline violates ARCH_DMA_MINALIGN rules and can corrupt data.

[ ... ]
> @@ -652,6 +664,35 @@ static int bam_slave_config(struct dma_chan *chan,
>  	return 0;
>  }
>  
> +static int bam_metadata_attach(struct dma_async_tx_descriptor *desc, void *data, size_t len)
> +{
> +	struct bam_chan *bchan = to_bam_chan(desc->chan);
> +	const struct bam_device_data *bdata = bchan->bdev->dev_data;
> +	struct bam_desc_metadata *metadata = data;

[Severity: Medium]
Could this result in an out-of-bounds read if a caller provides a
truncated payload?

The data pointer is cast and used without verifying that len is at
least sizeof(struct bam_desc_metadata).

> +
> +	if (!data)
> +		return -EINVAL;
> +
> +	if (!bdata->pipe_lock_supported)
> +		/*
> +		 * The client wants to use locking but this BAM version doesn't
> +		 * support it. Don't return an error here as this will stop the
> +		 * client from using DMA at all for no reason.
> +		 */
> +		return 0;
> +
> +	guard(spinlock_irqsave)(&bchan->vc.lock);
> +
> +	bchan->scratchpad_addr = metadata->scratchpad_addr;

[Severity: High]
Does saving scratchpad_addr to the shared channel structure cause a state
leak across transactions?

This state is never cleared. If a client submits a transfer with metadata
and later submits transfers without metadata, the stale scratchpad_addr
remains in bchan, potentially causing dummy lock writes to freed or
reallocated physical addresses.

[ ... ]
> +static struct bam_async_desc *
> +bam_make_lock_desc(struct bam_chan *bchan, unsigned long flag)
> +{
> +	struct dma_chan *chan = &bchan->vc.chan;
> +	struct bam_async_desc *async_desc;
> +	struct bam_desc_hw *desc;
> +	struct virt_dma_desc *vd;
> +	struct virt_dma_chan *vc;
> +	unsigned int mapped;
> +	dma_cookie_t cookie;
> +	int ret;
> +
> +	async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
> +	if (!async_desc) {
> +		dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	sg_init_table(&async_desc->lock_sg, 1);
> +
> +	async_desc->num_desc = 1;
> +	async_desc->curr_desc = async_desc->desc;
> +	async_desc->dir = DMA_MEM_TO_DEV;
> +
> +	desc = async_desc->desc;
> +
> +	bam_prep_ce_le32(&async_desc->lock_ce, bchan->scratchpad_addr, BAM_WRITE_COMMAND, 0);
> +	sg_set_buf(&async_desc->lock_sg, &async_desc->lock_ce, sizeof(async_desc->lock_ce));
> +
> +	mapped = dma_map_sg(chan->slave, &async_desc->lock_sg, 1, DMA_TO_DEVICE);

[Severity: High]
Should this mapping use the DMA controller's device instead of the client
peripheral device?

The BAM DMA controller is the bus master fetching the command element, so
mapping it with chan->slave will place it in the wrong IOMMU domain.
Additionally, chan->slave may be NULL in modern dmaengine drivers,
potentially leading to a NULL pointer dereference.

> +	if (!mapped) {
> +		kfree(async_desc);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
> +	desc->addr = sg_dma_address(&async_desc->lock_sg);

[Severity: Medium]
Does this missing endianness conversion write an invalid address to hardware
on big-endian architectures?

This assignment implicitly casts the CPU-endian dma_addr_t to a
little-endian __le32 field without the necessary cpu_to_le32()
conversion.

> +	desc->size = cpu_to_le16(sizeof(struct bam_cmd_element));
> +
> +	vc = &bchan->vc;
> +	vd = &async_desc->vd;
> +
> +	dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
> +	vd->tx.flags = DMA_PREP_CMD;

[Severity: High]
Will the omission of the DMA_CTRL_ACK flag cause an unbounded memory leak?

The virt-dma core retains completed descriptors in vc->desc_completed
until async_tx_test_ack() evaluates to true. Since these dummy
descriptors are never returned to the client to be ACKed, they will
permanently accumulate in the list.

> +	vd->tx.desc_free = vchan_tx_desc_free;
> +	vd->tx_result.result = DMA_TRANS_NOERROR;
> +	vd->tx_result.residue = 0;
> +
> +	cookie = dma_cookie_assign(&vd->tx);

[Severity: High]
Does assigning cookies to dummy descriptors break the monotonic sequence of
completions?

Because the LOCK descriptor is prepended to desc_issued, it executes and
completes before the client's payload descriptors. When the LOCK completes,
vchan_cookie_complete() advances completed_cookie. A client polling
via dma_cookie_status() will falsely see its pending transfers as complete
before they have even started.

> +	ret = dma_submit_error(cookie);
> +	if (ret) {
> +		dma_unmap_sg(chan->slave, &async_desc->lock_sg, 1, DMA_TO_DEVICE);
> +		kfree(async_desc);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return async_desc;
> +}
> +
> +static int bam_do_setup_pipe_lock(struct bam_chan *bchan, bool lock)
> +{
> +	struct bam_device *bdev = bchan->bdev;
> +	const struct bam_device_data *bdata = bdev->dev_data;
> +	struct bam_async_desc *lock_desc;
> +	unsigned long flag;
> +
> +	lockdep_assert_held(&bchan->vc.lock);
> +
> +	if (!bdata->pipe_lock_supported || !bchan->scratchpad_addr ||
> +	    bchan->direction != DMA_MEM_TO_DEV)

[Severity: Medium]
Does this check entirely skip locking for RX channels?

The commit message explicitly states the feature is designed for "starting
DMA on an RX channel" (which is DMA_DEV_TO_MEM). However, aborting if
bchan->direction != DMA_MEM_TO_DEV restricts this entirely to TX channels,
contradicting the stated intent.

> +		return 0;
> +
> +	flag = lock ? DESC_FLAG_LOCK : DESC_FLAG_UNLOCK;
> +
> +	lock_desc = bam_make_lock_desc(bchan, flag);
> +	if (IS_ERR(lock_desc))
> +		return PTR_ERR(lock_desc);
> +
> +	if (lock)
> +		list_add(&lock_desc->vd.node, &bchan->vc.desc_issued);
> +	else
> +		list_add_tail(&lock_desc->vd.node, &bchan->vc.desc_issued);
> +
> +	return 0;
> +}
> +
> +static void bam_setup_pipe_lock(struct bam_chan *bchan)
> +{
> +	if (bam_do_setup_pipe_lock(bchan, true) || bam_do_setup_pipe_lock(bchan, false))

[Severity: High]
Does this short-circuit logic risk permanently locking the hardware pipeline?

If the LOCK descriptor allocation succeeds but the UNLOCK allocation fails
(e.g., -ENOMEM), the short-circuit evaluates to true and the function
exits. The engine will execute the already-queued LOCK but never receive an
UNLOCK, resulting in a permanent hardware deadlock.

> +		dev_err(bchan->vc.chan.slave, "Failed to setup BAM pipe lock descriptors");
> +}
> +
>  /**
>   * bam_start_dma - start next transaction
>   * @bchan: bam dma channel
>   */
>  static void bam_start_dma(struct bam_chan *bchan)
>  {
> -	struct virt_dma_desc *vd = vchan_next_desc(&bchan->vc);
> +	struct virt_dma_desc *vd;
>  	struct bam_device *bdev = bchan->bdev;
>  	struct bam_async_desc *async_desc = NULL;
>  	struct bam_desc_hw *desc;
> @@ -1030,6 +1169,9 @@ static void bam_start_dma(struct bam_chan *bchan)
>  
>  	lockdep_assert_held(&bchan->vc.lock);
>  
> +	bam_setup_pipe_lock(bchan);
> +

[Severity: High]
Will unconditionally calling bam_setup_pipe_lock() cause duplicate lock
injections if the hardware FIFO is full?

If bam_start_dma() exits early due to IS_BUSY(bchan), unprocessed payload
descriptors and the UNLOCK descriptor remain in the list. When the channel
resumes and bam_start_dma() is re-invoked to process the backlog, it will
blindly inject another pair of LOCK/UNLOCK descriptors into the middle of the
ongoing transaction stream.

> +	vd = vchan_next_desc(&bchan->vc);
>  	if (!vd)
>  		return;

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260522-qcom-qce-cmd-descr-v18-0-99103926bafc@oss.qualcomm.com?part=6

next prev parent reply	other threads:[~2026-05-22 15:25 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-22 13:39 [PATCH v18 00/14] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
2026-05-22 13:39 ` [PATCH v18 01/14] dmaengine: constify struct dma_descriptor_metadata_ops Bartosz Golaszewski
2026-05-22 13:39 ` [PATCH v18 02/14] dmaengine: qcom: bam_dma: free interrupt before the clock in error path Bartosz Golaszewski
2026-05-22 14:17   ` sashiko-bot
2026-05-22 13:39 ` [PATCH v18 03/14] dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue Bartosz Golaszewski
2026-05-22 14:37   ` sashiko-bot
2026-05-22 13:39 ` [PATCH v18 04/14] dmaengine: qcom: bam_dma: Extend the driver's device match data Bartosz Golaszewski
2026-05-22 13:39 ` [PATCH v18 05/14] dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support Bartosz Golaszewski
2026-05-22 13:39 ` [PATCH v18 06/14] dmaengine: qcom: bam_dma: add support for BAM locking Bartosz Golaszewski
2026-05-22 15:25   ` sashiko-bot [this message]
2026-05-22 13:40 ` [PATCH v18 07/14] crypto: qce - Cancel work on device detach Bartosz Golaszewski
2026-05-22 15:57   ` sashiko-bot
2026-05-22 13:40 ` [PATCH v18 08/14] crypto: qce - Include algapi.h in the core.h header Bartosz Golaszewski
2026-05-22 13:40 ` [PATCH v18 09/14] crypto: qce - Remove unused ignore_buf Bartosz Golaszewski
2026-05-22 16:44   ` sashiko-bot
2026-05-22 13:40 ` [PATCH v18 10/14] crypto: qce - Simplify arguments of devm_qce_dma_request() Bartosz Golaszewski
2026-05-22 13:40 ` [PATCH v18 11/14] crypto: qce - Use existing devres APIs in devm_qce_dma_request() Bartosz Golaszewski
2026-05-22 17:15   ` sashiko-bot
2026-05-22 13:40 ` [PATCH v18 12/14] crypto: qce - Map crypto memory for DMA Bartosz Golaszewski
2026-05-22 17:56   ` sashiko-bot
2026-05-22 13:40 ` [PATCH v18 13/14] crypto: qce - Add BAM DMA support for crypto register I/O Bartosz Golaszewski
2026-05-22 13:40 ` [PATCH v18 14/14] crypto: qce - Communicate the base physical address to the dmaengine Bartosz Golaszewski
2026-05-24 20:49 ` [PATCH v18 00/14] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Eric Biggers
2026-05-25  5:44   ` Manivannan Sadhasivam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260522152506.63A241F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=Frank.Li@kernel.org \
    --cc=bartosz.golaszewski@oss.qualcomm.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=vkoul@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.