From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 34F0419CD0A for ; Tue, 19 May 2026 13:56:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779198998; cv=none; b=XVlFk6OhTRfOM2k/WtI0SdaQ3n5PJ7TGstXIPgQiKutIWXYvk9P2VdsCMMyjWHD6iu9sbkiLLkvSgec+47vrzNSbSUeK2/lz6I0dIZ/2oXRXfFinfaW6+AhwwpZpZmojMmfR5HiMFjhGHDxQneZwdALQm88zmdFstkGPRVCILzA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779198998; c=relaxed/simple; bh=xwxvHd7V8VHSKrK2otqrgdTFXSmm/3W4zbLV23EsLDg=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=TWDnyH4ZKAdemzAkR+h1cKR0Xr0cGOGDDv4R/77DNgkMAAWDJwJ3SAUQU4lvLwU1FAGWD/lMujeJzWDU/Cy1qAfsRM0DCckmFBQcY3DI+5wm7r6HKGOv2sDgYc7kkkmDh/XvVwdfoMxNS1fMH+ns3GgAsRhlwesufNXdtG1VpqU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=I6Wwxmz6; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="I6Wwxmz6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 8FDAAC2BCB8; Tue, 19 May 2026 13:56:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1779198997; bh=xwxvHd7V8VHSKrK2otqrgdTFXSmm/3W4zbLV23EsLDg=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date:From; b=I6Wwxmz6uk274ErxzdRzFyFy8R//NEr97iuRU8ABKeGY/8qDLkuJXAkvQgNbv7GGK mY0jPfAKqVagwL6YRYTLTzHUInMPFRQQg1VWcCkiM11I6X/To+MIW9ZYH/k/oLEjum 7tfG7ncUhc7Teb3PJCdhOJSfi15Ei8Qf863/gvQH8GPOj+5kd/HVFmqpefGfE4ISys vkjTC2ZdFo2idXvxnB2J7IuTU43J1Q3aOqLrQTEw/I85o0v9sWYFBMne8cggsK+fjV WpxllNvPDhgAIPt9KNIoxGU6KzzOZTmf1K0PUWJLbx6xWx1QE9jDTs003Akf1pmBQj 7LLfGbgpprp1w== From: sashiko-bot@kernel.org Subject: Re: [PATCH v17 13/14] crypto: qce - Add BAM DMA support for crypto register I/O Reply-To: sashiko-reviews@lists.linux.dev To: "Bartosz Golaszewski" Cc: vkoul@kernel.org, Frank.Li@kernel.org, dmaengine@vger.kernel.org In-Reply-To: <20260519-qcom-qce-cmd-descr-v17-13-53a595414b79@oss.qualcomm.com> References: <20260519-qcom-qce-cmd-descr-v17-13-53a595414b79@oss.qualcomm.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Tue, 19 May 2026 13:56:30 +0000 Message-Id: <20260519135637.8FDAAC2BCB8@smtp.kernel.org> Precedence: bulk X-Mailing-List: dmaengine@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 7 potential issue(= s) to consider: - [High] DMA API misuse causing potential memory corruption or IOMMU faults. - [High] Missing endianness conversion for BAM DMA command data. - [High] Buffer overflow due to missing bounds checking in BAM DMA arrays. - [High] Resource leak of DMA mappings on successful crypto operations. - [Medium] Race condition in `qce_dma_terminate_all()` due to improper DMA = unmap ordering. - [Low] Unused coherent DMA memory allocation for register reads. - [Low] Suboptimal scatterlist construction for BAM command descriptors. -- commit 7d591e31fc1682350db9b4ee4360e0e434326203 Author: Bartosz Golaszewski crypto: qce - Add BAM DMA support for crypto register I/O =20 Switch to using BAM DMA for register I/O in addition to passing data. To that end: provide the necessary infrastructure in the driver, modify the ordering of operations as required and replace all direct register writes with wrappers queueing DMA command descriptors. > diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c > index 3db46fc0c419a..b66e6386fccda 100644 > --- a/drivers/crypto/qce/dma.c > +++ b/drivers/crypto/qce/dma.c [ ... ] > +int qce_submit_cmd_desc(struct qce_device *qce) > +{ > + struct qce_desc_info *qce_desc =3D qce->dma.bam_txn->desc; > + struct qce_bam_transaction *bam_txn =3D qce->dma.bam_txn; > + struct dma_async_tx_descriptor *dma_desc; > + struct dma_chan *chan =3D qce->dma.rxchan; > + unsigned long attrs =3D DMA_PREP_CMD; > + dma_cookie_t cookie; > + unsigned int mapped; > + int ret; > + > + mapped =3D dma_map_sg(qce->dev, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt, D= MA_TO_DEVICE); > + if (!mapped) > + return -ENOMEM; Does this leak the DMA mapping on successful crypto operations? The wr_sgl scatterlist is mapped for DMA here, but there does not appear to be a corresponding dma_unmap_sg() on the successful completion paths of the crypto requests. The unmapping is currently only handled in qce_dma_terminate_all() during error cleanup. > + > + dma_desc =3D dmaengine_prep_slave_sg(chan, bam_txn->wr_sgl, bam_txn->wr= _sgl_cnt, > + DMA_MEM_TO_DEV, attrs); Should this pass the mapped variable instead of bam_txn->wr_sgl_cnt? Because dma_map_sg() coalesces contiguous physical memory regions, the number of returned segments can be fewer than the original count. Passing the original wr_sgl_cnt might force the DMA engine to process trailing, unmapped scatterlist entries, which could lead to IOMMU faults or memory corruption. > + if (!dma_desc) { > + ret =3D -ENOMEM; > + goto err_unmap_sg; > + } [ ... ] > +static void qce_prep_dma_cmd_desc(struct qce_device *qce, struct qce_dma= _data *dma, > + unsigned int addr, void *buf) > +{ > + struct qce_bam_transaction *bam_txn =3D dma->bam_txn; > + struct bam_cmd_element *bam_ce_buf; > + int bam_ce_size, cnt, idx; > + > + idx =3D bam_txn->bam_ce_idx; > + bam_ce_buf =3D &bam_txn->bam_ce[idx]; > + bam_prep_ce_le32(bam_ce_buf, addr, BAM_WRITE_COMMAND, *((__le32 *)buf)); Could this result in incorrect endianness on big-endian hardware? The value passed to buf originates as a native host-endian u32 from qce_write_dma(). Casting it directly to __le32 * and dereferencing it skips the use of cpu_to_le32(). This would write native big-endian values directly to the little-endian BAM DMA hardware. > + > + bam_ce_buf =3D &bam_txn->bam_ce[bam_txn->pre_bam_ce_idx]; > + bam_txn->bam_ce_idx++; Does this code overflow the bam_txn->bam_ce[] array? The array is sized to QCE_BAM_CMD_ELEMENT_SIZE (128), but there are no bounds checks before unconditionally incrementing bam_ce_idx. > + bam_ce_size =3D (bam_txn->bam_ce_idx - bam_txn->pre_bam_ce_idx) * sizeo= f(*bam_ce_buf); > + > + cnt =3D bam_txn->wr_sgl_cnt; > + > + sg_set_buf(&bam_txn->wr_sgl[cnt], bam_ce_buf, bam_ce_size); > + > + ++bam_txn->wr_sgl_cnt; Similarly, does this overflow the bam_txn->wr_sgl[] array? It is also sized to QCE_BAM_CMD_SGL_SIZE (128) and incremented without checking if the bounds have been exceeded. > + bam_txn->pre_bam_ce_idx =3D bam_txn->bam_ce_idx; Could the scatterlist construction be improved here? By setting pre_bam_ce_idx equal to bam_ce_idx after every write, this appears to create a separate 16-byte scatterlist entry for every individual register write instead of batching adjacent command elements into a single contiguous scatterlist segment. > +} > + > +void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val) > +{ > + unsigned int reg_addr =3D ((unsigned int)(qce->base_phys) + offset); > + > + qce_prep_dma_cmd_desc(qce, &qce->dma, reg_addr, &val); > +} > + > int devm_qce_dma_request(struct qce_device *qce) > { [ ... ] > + qce->read_buf =3D dmam_alloc_coherent(qce->dev, QCE_MAX_REG_READ * size= of(*qce->read_buf), > + &qce->read_buf_dma, GFP_KERNEL); Is this allocated memory actually used? The driver appears to allocate this coherent DMA memory, but continues to use the synchronous readl() for qce_read(), leaving this buffer unused. > + if (!qce->read_buf) > + return -ENOMEM; > + > return 0; > } [ ... ] > int qce_dma_terminate_all(struct qce_dma_data *dma) > { > + struct qce_device *qce =3D container_of(dma, struct qce_device, dma); > + struct qce_bam_transaction *bam_txn =3D dma->bam_txn; > int ret; > =20 > + dma_unmap_sg(qce->dev, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt, DMA_TO_DEV= ICE); > + > ret =3D dmaengine_terminate_all(dma->rxchan); > return ret ?: dmaengine_terminate_all(dma->txchan); > } Could this ordering cause an IOMMU fault? Calling dma_unmap_sg() before dmaengine_terminate_all() can mean that the memory is unmapped while the DMA engine is still actively processing descriptors during an error or cancellation path. Should the hardware be terminated prior to unmapping the memory? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260519-qcom-qce-c= md-descr-v17-0-53a595414b79@oss.qualcomm.com?part=3D13