public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
@ 2026-03-02 15:57 Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 01/12] crypto: qce - Include algapi.h in the core.h header Bartosz Golaszewski
                   ` (12 more replies)
  0 siblings, 13 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski,
	Konrad Dybcio, Dmitry Baryshkov, Bjorn Andersson

NOTE: Please note that even though this is version 11, I changed the
prefix to RFC as this is an entirely new approach resulting from
discussions under v9. I AM AWARE of the existing memory leaks in the
last patch of this series - I'm sending it because I want to first
discuss the approach and get a green light from Vinod as well as Mani
and Bjorn. Especially when it comes to communicating the address for the
dummy writes from the client to the BAM driver.
/NOTE

Currently the QCE crypto driver accesses the crypto engine registers
directly from the CPU. TrustZone may perform crypto operations
simultaneously, resulting in a race condition. To remedy that, let's
introduce support
for BAM locking/unlocking to the driver. The BAM driver will now wrap
any existing issued descriptor chains with additional descriptors
performing the locking when the client starts the transaction
(dmaengine_issue_pending()). A client wanting to benefit from locking
needs to switch to performing register I/O over DMA and communicate the
address to which to perform the dummy writes via a call to
dmaengine_slave_config().

In the specific case of the BAM DMA this translates to sending command
descriptors performing dummy writes with the relevant flags set. The BAM
will then lock all other pipes not related to the current pipe group, and
keep handling the current pipe only until it sees the unlock bit.

In order for the locking to work correctly, we also need to switch to
using DMA for all register I/O.

On top of this, the series contains some additional tweaks and
refactoring.

The goal of this series is not to improve performance but to prepare the
driver for supporting decryption into secure buffers in the future.

Tested with tcrypt.ko, kcapi and cryptsetup.

Shout out to Daniel and Udit from Qualcomm for helping me out with some
DMA issues we encountered.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
Changes in v11:
- Use new approach, not requiring the client to be involved in locking.
- Add a patch constifying dma_descriptor_metadata_ops
- Rebase on top of v7.0-rc1
- Link to v10: https://lore.kernel.org/r/20251219-qcom-qce-cmd-descr-v10-0-ff7e4bf7dad4@oss.qualcomm.com

Changes in v10:
- Move DESC_FLAG_(UN)LOCK BIT definitions from patch 2 to 3
- Add a patch constifying the dma engine metadata as the first in the
  series
- Use the VERSION register for dummy lock/unlock writes
- Link to v9: https://lore.kernel.org/r/20251128-qcom-qce-cmd-descr-v9-0-9a5f72b89722@linaro.org

Changes in v9:
- Drop the global, generic LOCK/UNLOCK flags and instead use DMA
  descriptor metadata ops to pass BAM-specific information from the QCE
  to the DMA engine
- Link to v8: https://lore.kernel.org/r/20251106-qcom-qce-cmd-descr-v8-0-ecddca23ca26@linaro.org

Changes in v8:
- Rework the command descriptor logic and drop a lot of unneeded code
- Use the physical address for BAM command descriptor access, not the
  mapped DMA address
- Fix the problems with iommu faults on newer platforms
- Generalize the LOCK/UNLOCK flags in dmaengine and reword the docs and
  commit messages
- Make the BAM locking logic stricter in the DMA engine driver
- Add some additional minor QCE driver refactoring changes to the series
- Lots of small reworks and tweaks to rebase on current mainline and fix
  previous issues
- Link to v7: https://lore.kernel.org/all/20250311-qce-cmd-descr-v7-0-db613f5d9c9f@linaro.org/

Changes in v7:
- remove unused code: writing to multiple registers was not used in v6,
  neither were the functions for reading registers over BAM DMA
- don't read the SW_VERSION register needlessly in the BAM driver;
  instead encode the information on whether the IP supports BAM locking
  in the device match data
- shrink code where possible with logic modifications (for instance:
  change the implementation of qce_write() instead of replacing it
  everywhere with a new symbol)
- remove duplicated error messages
- rework commit messages
- a lot of shuffling code around for easier review and a more
  streamlined series
- Link to v6: https://lore.kernel.org/all/20250115103004.3350561-1-quic_mdalam@quicinc.com/

Changes in v6:
- change "BAM" to "DMA"
- Ensured this series is compilable with the current Linux-next tip of
  the tree (TOT).

Changes in v5:
- Added DMA_PREP_LOCK and DMA_PREP_UNLOCK flag support in separate patch
- Removed DMA_PREP_LOCK & DMA_PREP_UNLOCK flag
- Added FIELD_GET and GENMASK macro to extract major and minor version

Changes in v4:
- Added the feature description along with the test hardware and test
  command
- Fixed patch version numbering
- Dropped dt-binding patch
- Dropped device tree changes
- Added BAM_SW_VERSION register read
- Handled the error path of the dma_map_resource() API in probe
- Updated the commit messages for better readability
- Squashed the change introducing the qce_bam_acquire_lock() and
  qce_bam_release_lock() APIs into the change introducing the
  lock/unlock flags
- changed cover letter subject heading to
  "dmaengine: qcom: bam_dma: add cmd descriptor support"
- Added a link to the very first post of the BAM lock/unlock patch as
  v1 to track this feature

Changes in v3:
- https://lore.kernel.org/lkml/183d4f5e-e00a-8ef6-a589-f5704bc83d4a@quicinc.com/
- Addressed all the comments from v2
- Added the dt-binding
- Fix alignment issue
- Removed type casting from qce_write_reg_dma()
  and qce_read_reg_dma()
- Removed the "qce_bam_txn = dma->qce_bam_txn;" line from the
  qce_alloc_bam_txn() API and returned dma->qce_bam_txn directly

Changes in v2:
- https://lore.kernel.org/lkml/20231214114239.2635325-1-quic_mdalam@quicinc.com/
- Initial set of patches for cmd descriptor support
- Add client driver to use BAM lock/unlock feature
- Added register read/write via BAM in QCE Crypto driver
  to use BAM lock/unlock feature

---
Bartosz Golaszewski (12):
      crypto: qce - Include algapi.h in the core.h header
      crypto: qce - Remove unused ignore_buf
      crypto: qce - Simplify arguments of devm_qce_dma_request()
      crypto: qce - Use existing devres APIs in devm_qce_dma_request()
      crypto: qce - Map crypto memory for DMA
      crypto: qce - Add BAM DMA support for crypto register I/O
      crypto: qce - Communicate the base physical address to the dmaengine
      dmaengine: constify struct dma_descriptor_metadata_ops
      dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue
      dmaengine: qcom: bam_dma: Extend the driver's device match data
      dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support
      dmaengine: qcom: bam_dma: add support for BAM locking

 drivers/crypto/qce/aead.c       |   2 -
 drivers/crypto/qce/common.c     |  20 +++--
 drivers/crypto/qce/core.c       |  29 ++++++-
 drivers/crypto/qce/core.h       |  11 +++
 drivers/crypto/qce/dma.c        | 151 ++++++++++++++++++++++++++++++-------
 drivers/crypto/qce/dma.h        |  11 ++-
 drivers/crypto/qce/sha.c        |   2 -
 drivers/crypto/qce/skcipher.c   |   2 -
 drivers/dma/qcom/bam_dma.c      | 163 ++++++++++++++++++++++++++++++++++------
 drivers/dma/ti/k3-udma.c        |   2 +-
 drivers/dma/xilinx/xilinx_dma.c |   2 +-
 include/linux/dmaengine.h       |   2 +-
 12 files changed, 321 insertions(+), 76 deletions(-)
---
base-commit: d8cfb11ffd167a413269993d7e24d307ad47aa49
change-id: 20251103-qcom-qce-cmd-descr-c5e9b11fe609

Best regards,
-- 
Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 01/12] crypto: qce - Include algapi.h in the core.h header
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 02/12] crypto: qce - Remove unused ignore_buf Bartosz Golaszewski
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

The header defines a struct embedding struct crypto_queue, whose full
definition (in crypto/algapi.h) must therefore be visible. Move the
inclusion from core.c to core.h.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/core.c | 1 -
 drivers/crypto/qce/core.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
index b966f3365b7de8d2a8f6707397a34aa4facdc4ac..65205100c3df961ffaa4b7bc9e217e8d3e08ed57 100644
--- a/drivers/crypto/qce/core.c
+++ b/drivers/crypto/qce/core.c
@@ -13,7 +13,6 @@
 #include <linux/mod_devicetable.h>
 #include <linux/platform_device.h>
 #include <linux/types.h>
-#include <crypto/algapi.h>
 #include <crypto/internal/hash.h>
 
 #include "core.h"
diff --git a/drivers/crypto/qce/core.h b/drivers/crypto/qce/core.h
index eb6fa7a8b64a81daf9ad5304a3ae4e5e597a70b8..f092ce2d3b04a936a37805c20ac5ba78d8fdd2df 100644
--- a/drivers/crypto/qce/core.h
+++ b/drivers/crypto/qce/core.h
@@ -8,6 +8,7 @@
 
 #include <linux/mutex.h>
 #include <linux/workqueue.h>
+#include <crypto/algapi.h>
 
 #include "dma.h"
 

-- 
2.47.3



* [PATCH RFC v11 02/12] crypto: qce - Remove unused ignore_buf
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 01/12] crypto: qce - Include algapi.h in the core.h header Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 03/12] crypto: qce - Simplify arguments of devm_qce_dma_request() Bartosz Golaszewski
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

It's unclear what the purpose of this field is. It has been here since
the initial commit but without any explanation. The driver works fine
without it. We still allocate the extra space in the result buffer; we
just don't need to store its address. While at it, move the
QCE_IGNORE_BUF_SZ definition into dma.c as it's not used outside of this
compilation unit.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/dma.c | 4 ++--
 drivers/crypto/qce/dma.h | 2 --
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index 68cafd4741ad3d91906d39e817fc7873b028d498..08bf3e8ec12433c1a8ee17003f3487e41b7329e4 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -9,6 +9,8 @@
 
 #include "dma.h"
 
+#define QCE_IGNORE_BUF_SZ		(2 * QCE_BAM_BURST_SIZE)
+
 static void qce_dma_release(void *data)
 {
 	struct qce_dma_data *dma = data;
@@ -41,8 +43,6 @@ int devm_qce_dma_request(struct device *dev, struct qce_dma_data *dma)
 		goto error_nomem;
 	}
 
-	dma->ignore_buf = dma->result_buf + QCE_RESULT_BUF_SZ;
-
 	return devm_add_action_or_reset(dev, qce_dma_release, dma);
 
 error_nomem:
diff --git a/drivers/crypto/qce/dma.h b/drivers/crypto/qce/dma.h
index 31629185000e12242fa07c2cc08b95fcbd5d4b8c..fc337c435cd14917bdfb99febcf9119275afdeba 100644
--- a/drivers/crypto/qce/dma.h
+++ b/drivers/crypto/qce/dma.h
@@ -23,7 +23,6 @@ struct qce_result_dump {
 	u32 status2;
 };
 
-#define QCE_IGNORE_BUF_SZ	(2 * QCE_BAM_BURST_SIZE)
 #define QCE_RESULT_BUF_SZ	\
 		ALIGN(sizeof(struct qce_result_dump), QCE_BAM_BURST_SIZE)
 
@@ -31,7 +30,6 @@ struct qce_dma_data {
 	struct dma_chan *txchan;
 	struct dma_chan *rxchan;
 	struct qce_result_dump *result_buf;
-	void *ignore_buf;
 };
 
 int devm_qce_dma_request(struct device *dev, struct qce_dma_data *dma);

-- 
2.47.3



* [PATCH RFC v11 03/12] crypto: qce - Simplify arguments of devm_qce_dma_request()
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 01/12] crypto: qce - Include algapi.h in the core.h header Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 02/12] crypto: qce - Remove unused ignore_buf Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 04/12] crypto: qce - Use existing devres APIs in devm_qce_dma_request() Bartosz Golaszewski
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

This function can extract all the information it needs from struct
qce_device alone, so simplify its arguments. This is done in preparation
for adding support for register I/O over DMA, which will require
accessing even more fields of struct qce_device.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/core.c | 2 +-
 drivers/crypto/qce/dma.c  | 5 ++++-
 drivers/crypto/qce/dma.h  | 4 +++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
index 65205100c3df961ffaa4b7bc9e217e8d3e08ed57..8b7bcd0c420c45caf8b29e5455e0f384fd5c5616 100644
--- a/drivers/crypto/qce/core.c
+++ b/drivers/crypto/qce/core.c
@@ -226,7 +226,7 @@ static int qce_crypto_probe(struct platform_device *pdev)
 	if (ret)
 		return ret;
 
-	ret = devm_qce_dma_request(qce->dev, &qce->dma);
+	ret = devm_qce_dma_request(qce);
 	if (ret)
 		return ret;
 
diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index 08bf3e8ec12433c1a8ee17003f3487e41b7329e4..c29b0abe9445381a019e0447d30acfd7319d5c1f 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -7,6 +7,7 @@
 #include <linux/dmaengine.h>
 #include <crypto/scatterwalk.h>
 
+#include "core.h"
 #include "dma.h"
 
 #define QCE_IGNORE_BUF_SZ		(2 * QCE_BAM_BURST_SIZE)
@@ -20,8 +21,10 @@ static void qce_dma_release(void *data)
 	kfree(dma->result_buf);
 }
 
-int devm_qce_dma_request(struct device *dev, struct qce_dma_data *dma)
+int devm_qce_dma_request(struct qce_device *qce)
 {
+	struct qce_dma_data *dma = &qce->dma;
+	struct device *dev = qce->dev;
 	int ret;
 
 	dma->txchan = dma_request_chan(dev, "tx");
diff --git a/drivers/crypto/qce/dma.h b/drivers/crypto/qce/dma.h
index fc337c435cd14917bdfb99febcf9119275afdeba..483789d9fa98e79d1283de8297bf2fc2a773f3a7 100644
--- a/drivers/crypto/qce/dma.h
+++ b/drivers/crypto/qce/dma.h
@@ -8,6 +8,8 @@
 
 #include <linux/dmaengine.h>
 
+struct qce_device;
+
 /* maximum data transfer block size between BAM and CE */
 #define QCE_BAM_BURST_SIZE		64
 
@@ -32,7 +34,7 @@ struct qce_dma_data {
 	struct qce_result_dump *result_buf;
 };
 
-int devm_qce_dma_request(struct device *dev, struct qce_dma_data *dma);
+int devm_qce_dma_request(struct qce_device *qce);
 int qce_dma_prep_sgs(struct qce_dma_data *dma, struct scatterlist *sg_in,
 		     int in_ents, struct scatterlist *sg_out, int out_ents,
 		     dma_async_tx_callback cb, void *cb_param);

-- 
2.47.3



* [PATCH RFC v11 04/12] crypto: qce - Use existing devres APIs in devm_qce_dma_request()
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (2 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 03/12] crypto: qce - Simplify arguments of devm_qce_dma_request() Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 05/12] crypto: qce - Map crypto memory for DMA Bartosz Golaszewski
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski,
	Konrad Dybcio

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Switch to devm_kmalloc() and devm_dma_request_chan() in
devm_qce_dma_request(). This allows us to drop two labels and shrink the
function.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/dma.c | 39 +++++++++------------------------------
 1 file changed, 9 insertions(+), 30 deletions(-)

diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index c29b0abe9445381a019e0447d30acfd7319d5c1f..a46264735bb895b6199969e83391383ccbbacc5f 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -12,47 +12,26 @@
 
 #define QCE_IGNORE_BUF_SZ		(2 * QCE_BAM_BURST_SIZE)
 
-static void qce_dma_release(void *data)
-{
-	struct qce_dma_data *dma = data;
-
-	dma_release_channel(dma->txchan);
-	dma_release_channel(dma->rxchan);
-	kfree(dma->result_buf);
-}
-
 int devm_qce_dma_request(struct qce_device *qce)
 {
 	struct qce_dma_data *dma = &qce->dma;
 	struct device *dev = qce->dev;
-	int ret;
 
-	dma->txchan = dma_request_chan(dev, "tx");
+	dma->txchan = devm_dma_request_chan(dev, "tx");
 	if (IS_ERR(dma->txchan))
 		return dev_err_probe(dev, PTR_ERR(dma->txchan),
 				     "Failed to get TX DMA channel\n");
 
-	dma->rxchan = dma_request_chan(dev, "rx");
-	if (IS_ERR(dma->rxchan)) {
-		ret = dev_err_probe(dev, PTR_ERR(dma->rxchan),
-				    "Failed to get RX DMA channel\n");
-		goto error_rx;
-	}
-
-	dma->result_buf = kmalloc(QCE_RESULT_BUF_SZ + QCE_IGNORE_BUF_SZ,
-				  GFP_KERNEL);
-	if (!dma->result_buf) {
-		ret = -ENOMEM;
-		goto error_nomem;
-	}
+	dma->rxchan = devm_dma_request_chan(dev, "rx");
+	if (IS_ERR(dma->rxchan))
+		return dev_err_probe(dev, PTR_ERR(dma->rxchan),
+				     "Failed to get RX DMA channel\n");
 
-	return devm_add_action_or_reset(dev, qce_dma_release, dma);
+	dma->result_buf = devm_kmalloc(dev, QCE_RESULT_BUF_SZ + QCE_IGNORE_BUF_SZ, GFP_KERNEL);
+	if (!dma->result_buf)
+		return -ENOMEM;
 
-error_nomem:
-	dma_release_channel(dma->rxchan);
-error_rx:
-	dma_release_channel(dma->txchan);
-	return ret;
+	return 0;
 }
 
 struct scatterlist *

-- 
2.47.3



* [PATCH RFC v11 05/12] crypto: qce - Map crypto memory for DMA
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (3 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 04/12] crypto: qce - Use existing devres APIs in devm_qce_dma_request() Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 06/12] crypto: qce - Add BAM DMA support for crypto register I/O Bartosz Golaszewski
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

As the first step in converting the driver to using DMA for register
I/O, let's map the crypto memory range.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/core.c | 25 +++++++++++++++++++++++--
 drivers/crypto/qce/core.h |  6 ++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
index 8b7bcd0c420c45caf8b29e5455e0f384fd5c5616..2667fcd67fee826a44080da8f88a3e2abbb9b2cf 100644
--- a/drivers/crypto/qce/core.c
+++ b/drivers/crypto/qce/core.c
@@ -185,10 +185,19 @@ static int qce_check_version(struct qce_device *qce)
 	return 0;
 }
 
+static void qce_crypto_unmap_dma(void *data)
+{
+	struct qce_device *qce = data;
+
+	dma_unmap_resource(qce->dev, qce->base_dma, qce->dma_size,
+			   DMA_BIDIRECTIONAL, 0);
+}
+
 static int qce_crypto_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct qce_device *qce;
+	struct resource *res;
 	int ret;
 
 	qce = devm_kzalloc(dev, sizeof(*qce), GFP_KERNEL);
@@ -198,7 +207,7 @@ static int qce_crypto_probe(struct platform_device *pdev)
 	qce->dev = dev;
 	platform_set_drvdata(pdev, qce);
 
-	qce->base = devm_platform_ioremap_resource(pdev, 0);
+	qce->base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
 	if (IS_ERR(qce->base))
 		return PTR_ERR(qce->base);
 
@@ -244,7 +253,19 @@ static int qce_crypto_probe(struct platform_device *pdev)
 	qce->async_req_enqueue = qce_async_request_enqueue;
 	qce->async_req_done = qce_async_request_done;
 
-	return devm_qce_register_algs(qce);
+	ret = devm_qce_register_algs(qce);
+	if (ret)
+		return ret;
+
+	qce->dma_size = resource_size(res);
+	qce->base_dma = dma_map_resource(dev, res->start, qce->dma_size,
+					 DMA_BIDIRECTIONAL, 0);
+	qce->base_phys = res->start;
+	ret = dma_mapping_error(dev, qce->base_dma);
+	if (ret)
+		return ret;
+
+	return devm_add_action_or_reset(qce->dev, qce_crypto_unmap_dma, qce);
 }
 
 static const struct of_device_id qce_crypto_of_match[] = {
diff --git a/drivers/crypto/qce/core.h b/drivers/crypto/qce/core.h
index f092ce2d3b04a936a37805c20ac5ba78d8fdd2df..a80e12eac6c87e5321cce16c56a4bf5003474ef0 100644
--- a/drivers/crypto/qce/core.h
+++ b/drivers/crypto/qce/core.h
@@ -27,6 +27,9 @@
  * @dma: pointer to dma data
  * @burst_size: the crypto burst size
  * @pipe_pair_id: which pipe pair id the device using
+ * @base_dma: base DMA address
+ * @base_phys: base physical address
+ * @dma_size: size of memory mapped for DMA
  * @async_req_enqueue: invoked by every algorithm to enqueue a request
  * @async_req_done: invoked by every algorithm to finish its request
  */
@@ -43,6 +46,9 @@ struct qce_device {
 	struct qce_dma_data dma;
 	int burst_size;
 	unsigned int pipe_pair_id;
+	dma_addr_t base_dma;
+	phys_addr_t base_phys;
+	size_t dma_size;
 	int (*async_req_enqueue)(struct qce_device *qce,
 				 struct crypto_async_request *req);
 	void (*async_req_done)(struct qce_device *qce, int ret);

-- 
2.47.3



* [PATCH RFC v11 06/12] crypto: qce - Add BAM DMA support for crypto register I/O
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (4 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 05/12] crypto: qce - Map crypto memory for DMA Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine Bartosz Golaszewski
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Implement the infrastructure for performing register I/O over BAM DMA
rather than from the CPU.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/aead.c     |   2 -
 drivers/crypto/qce/common.c   |  20 ++++----
 drivers/crypto/qce/core.h     |   4 ++
 drivers/crypto/qce/dma.c      | 109 ++++++++++++++++++++++++++++++++++++++++++
 drivers/crypto/qce/dma.h      |   5 ++
 drivers/crypto/qce/sha.c      |   2 -
 drivers/crypto/qce/skcipher.c |   2 -
 7 files changed, 127 insertions(+), 17 deletions(-)

diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c
index abb438d2f8888d313d134161fda97dcc9d82d6d9..a4e8158803eb59cd0d40076674d0059bb94759ce 100644
--- a/drivers/crypto/qce/aead.c
+++ b/drivers/crypto/qce/aead.c
@@ -473,8 +473,6 @@ qce_aead_async_req_handle(struct crypto_async_request *async_req)
 	if (ret)
 		goto error_unmap_src;
 
-	qce_dma_issue_pending(&qce->dma);
-
 	ret = qce_start(async_req, tmpl->crypto_alg_type);
 	if (ret)
 		goto error_terminate;
diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index 04253a8d33409a2a51db527435d09ae85a7880af..b2b0e751a06517ac06e7a468599bd18666210e0c 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -25,7 +25,7 @@ static inline u32 qce_read(struct qce_device *qce, u32 offset)
 
 static inline void qce_write(struct qce_device *qce, u32 offset, u32 val)
 {
-	writel(val, qce->base + offset);
+	qce_write_dma(qce, offset, val);
 }
 
 static inline void qce_write_array(struct qce_device *qce, u32 offset,
@@ -82,6 +82,8 @@ static void qce_setup_config(struct qce_device *qce)
 {
 	u32 config;
 
+	qce_clear_bam_transaction(qce);
+
 	/* get big endianness */
 	config = qce_config_reg(qce, 0);
 
@@ -90,12 +92,14 @@ static void qce_setup_config(struct qce_device *qce)
 	qce_write(qce, REG_CONFIG, config);
 }
 
-static inline void qce_crypto_go(struct qce_device *qce, bool result_dump)
+static inline int qce_crypto_go(struct qce_device *qce, bool result_dump)
 {
 	if (result_dump)
 		qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT));
 	else
 		qce_write(qce, REG_GOPROC, BIT(GO_SHIFT));
+
+	return qce_submit_cmd_desc(qce);
 }
 
 #if defined(CONFIG_CRYPTO_DEV_QCE_SHA) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD)
@@ -223,9 +227,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request *async_req)
 	config = qce_config_reg(qce, 1);
 	qce_write(qce, REG_CONFIG, config);
 
-	qce_crypto_go(qce, true);
-
-	return 0;
+	return qce_crypto_go(qce, true);
 }
 #endif
 
@@ -386,9 +388,7 @@ static int qce_setup_regs_skcipher(struct crypto_async_request *async_req)
 	config = qce_config_reg(qce, 1);
 	qce_write(qce, REG_CONFIG, config);
 
-	qce_crypto_go(qce, true);
-
-	return 0;
+	return qce_crypto_go(qce, true);
 }
 #endif
 
@@ -535,9 +535,7 @@ static int qce_setup_regs_aead(struct crypto_async_request *async_req)
 	qce_write(qce, REG_CONFIG, config);
 
 	/* Start the process */
-	qce_crypto_go(qce, !IS_CCM(flags));
-
-	return 0;
+	return qce_crypto_go(qce, !IS_CCM(flags));
 }
 #endif
 
diff --git a/drivers/crypto/qce/core.h b/drivers/crypto/qce/core.h
index a80e12eac6c87e5321cce16c56a4bf5003474ef0..d238097f834e4605f3825f23d0316d4196439116 100644
--- a/drivers/crypto/qce/core.h
+++ b/drivers/crypto/qce/core.h
@@ -30,6 +30,8 @@
  * @base_dma: base DMA address
  * @base_phys: base physical address
  * @dma_size: size of memory mapped for DMA
+ * @read_buf: Buffer for DMA to write back to
+ * @read_buf_dma: Mapped address of the read buffer
  * @async_req_enqueue: invoked by every algorithm to enqueue a request
  * @async_req_done: invoked by every algorithm to finish its request
  */
@@ -49,6 +51,8 @@ struct qce_device {
 	dma_addr_t base_dma;
 	phys_addr_t base_phys;
 	size_t dma_size;
+	__le32 *read_buf;
+	dma_addr_t read_buf_dma;
 	int (*async_req_enqueue)(struct qce_device *qce,
 				 struct crypto_async_request *req);
 	void (*async_req_done)(struct qce_device *qce, int ret);
diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index a46264735bb895b6199969e83391383ccbbacc5f..ba7a52fd4c6349d59c075c346f75741defeb6034 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -4,6 +4,8 @@
  */
 
 #include <linux/device.h>
+#include <linux/dma/qcom_bam_dma.h>
+#include <linux/dma-mapping.h>
 #include <linux/dmaengine.h>
 #include <crypto/scatterwalk.h>
 
@@ -11,6 +13,98 @@
 #include "dma.h"
 
 #define QCE_IGNORE_BUF_SZ		(2 * QCE_BAM_BURST_SIZE)
+#define QCE_BAM_CMD_SGL_SIZE		128
+#define QCE_BAM_CMD_ELEMENT_SIZE	128
+#define QCE_MAX_REG_READ		8
+
+struct qce_desc_info {
+	struct dma_async_tx_descriptor *dma_desc;
+	enum dma_data_direction dir;
+};
+
+struct qce_bam_transaction {
+	struct bam_cmd_element bam_ce[QCE_BAM_CMD_ELEMENT_SIZE];
+	struct scatterlist wr_sgl[QCE_BAM_CMD_SGL_SIZE];
+	struct qce_desc_info *desc;
+	u32 bam_ce_idx;
+	u32 pre_bam_ce_idx;
+	u32 wr_sgl_cnt;
+};
+
+void qce_clear_bam_transaction(struct qce_device *qce)
+{
+	struct qce_bam_transaction *bam_txn = qce->dma.bam_txn;
+
+	bam_txn->bam_ce_idx = 0;
+	bam_txn->wr_sgl_cnt = 0;
+	bam_txn->bam_ce_idx = 0;
+	bam_txn->pre_bam_ce_idx = 0;
+}
+
+int qce_submit_cmd_desc(struct qce_device *qce)
+{
+	struct qce_desc_info *qce_desc = qce->dma.bam_txn->desc;
+	struct qce_bam_transaction *bam_txn = qce->dma.bam_txn;
+	struct dma_async_tx_descriptor *dma_desc;
+	struct dma_chan *chan = qce->dma.rxchan;
+	unsigned long attrs = DMA_PREP_CMD;
+	dma_cookie_t cookie;
+	unsigned int mapped;
+	int ret;
+
+	mapped = dma_map_sg_attrs(qce->dev, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt,
+				  DMA_TO_DEVICE, attrs);
+	if (!mapped)
+		return -ENOMEM;
+
+	dma_desc = dmaengine_prep_slave_sg(chan, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt,
+					   DMA_MEM_TO_DEV, attrs);
+	if (!dma_desc) {
+		dma_unmap_sg(qce->dev, bam_txn->wr_sgl, bam_txn->wr_sgl_cnt, DMA_TO_DEVICE);
+		return -ENOMEM;
+	}
+
+	qce_desc->dma_desc = dma_desc;
+	cookie = dmaengine_submit(qce_desc->dma_desc);
+
+	ret = dma_submit_error(cookie);
+	if (ret)
+		return ret;
+
+	qce_dma_issue_pending(&qce->dma);
+
+	return 0;
+}
+
+static void qce_prep_dma_cmd_desc(struct qce_device *qce, struct qce_dma_data *dma,
+				  unsigned int addr, void *buf)
+{
+	struct qce_bam_transaction *bam_txn = dma->bam_txn;
+	struct bam_cmd_element *bam_ce_buf;
+	int bam_ce_size, cnt, idx;
+
+	idx = bam_txn->bam_ce_idx;
+	bam_ce_buf = &bam_txn->bam_ce[idx];
+	bam_prep_ce_le32(bam_ce_buf, addr, BAM_WRITE_COMMAND, *((__le32 *)buf));
+
+	bam_ce_buf = &bam_txn->bam_ce[bam_txn->pre_bam_ce_idx];
+	bam_txn->bam_ce_idx++;
+	bam_ce_size = (bam_txn->bam_ce_idx - bam_txn->pre_bam_ce_idx) * sizeof(*bam_ce_buf);
+
+	cnt = bam_txn->wr_sgl_cnt;
+
+	sg_set_buf(&bam_txn->wr_sgl[cnt], bam_ce_buf, bam_ce_size);
+
+	++bam_txn->wr_sgl_cnt;
+	bam_txn->pre_bam_ce_idx = bam_txn->bam_ce_idx;
+}
+
+void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val)
+{
+	unsigned int reg_addr = ((unsigned int)(qce->base_phys) + offset);
+
+	qce_prep_dma_cmd_desc(qce, &qce->dma, reg_addr, &val);
+}
 
 int devm_qce_dma_request(struct qce_device *qce)
 {
@@ -31,6 +125,21 @@ int devm_qce_dma_request(struct qce_device *qce)
 	if (!dma->result_buf)
 		return -ENOMEM;
 
+	dma->bam_txn = devm_kzalloc(dev, sizeof(*dma->bam_txn), GFP_KERNEL);
+	if (!dma->bam_txn)
+		return -ENOMEM;
+
+	dma->bam_txn->desc = devm_kzalloc(dev, sizeof(*dma->bam_txn->desc), GFP_KERNEL);
+	if (!dma->bam_txn->desc)
+		return -ENOMEM;
+
+	sg_init_table(dma->bam_txn->wr_sgl, QCE_BAM_CMD_SGL_SIZE);
+
+	qce->read_buf = dmam_alloc_coherent(qce->dev, QCE_MAX_REG_READ * sizeof(*qce->read_buf),
+					    &qce->read_buf_dma, GFP_KERNEL);
+	if (!qce->read_buf)
+		return -ENOMEM;
+
 	return 0;
 }
 
diff --git a/drivers/crypto/qce/dma.h b/drivers/crypto/qce/dma.h
index 483789d9fa98e79d1283de8297bf2fc2a773f3a7..f05dfa9e6b25bd60e32f45079a8bc7e6a4cf81f9 100644
--- a/drivers/crypto/qce/dma.h
+++ b/drivers/crypto/qce/dma.h
@@ -8,6 +8,7 @@
 
 #include <linux/dmaengine.h>
 
+struct qce_bam_transaction;
 struct qce_device;
 
 /* maximum data transfer block size between BAM and CE */
@@ -32,6 +33,7 @@ struct qce_dma_data {
 	struct dma_chan *txchan;
 	struct dma_chan *rxchan;
 	struct qce_result_dump *result_buf;
+	struct qce_bam_transaction *bam_txn;
 };
 
 int devm_qce_dma_request(struct qce_device *qce);
@@ -43,5 +45,8 @@ int qce_dma_terminate_all(struct qce_dma_data *dma);
 struct scatterlist *
 qce_sgtable_add(struct sg_table *sgt, struct scatterlist *sg_add,
 		unsigned int max_len);
+void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val);
+int qce_submit_cmd_desc(struct qce_device *qce);
+void qce_clear_bam_transaction(struct qce_device *qce);
 
 #endif /* _DMA_H_ */
diff --git a/drivers/crypto/qce/sha.c b/drivers/crypto/qce/sha.c
index d7b6d042fb44f4856a6b4f9c901376dd7531454d..9552a74bf191def412fc32f3859356e569e5d400 100644
--- a/drivers/crypto/qce/sha.c
+++ b/drivers/crypto/qce/sha.c
@@ -113,8 +113,6 @@ static int qce_ahash_async_req_handle(struct crypto_async_request *async_req)
 	if (ret)
 		goto error_unmap_dst;
 
-	qce_dma_issue_pending(&qce->dma);
-
 	ret = qce_start(async_req, tmpl->crypto_alg_type);
 	if (ret)
 		goto error_terminate;
diff --git a/drivers/crypto/qce/skcipher.c b/drivers/crypto/qce/skcipher.c
index 872b65318233ed21e3559853f6bbdad030a1b81f..e80452c19b03496faaee38d4ac792289f560d082 100644
--- a/drivers/crypto/qce/skcipher.c
+++ b/drivers/crypto/qce/skcipher.c
@@ -147,8 +147,6 @@ qce_skcipher_async_req_handle(struct crypto_async_request *async_req)
 	if (ret)
 		goto error_unmap_src;
 
-	qce_dma_issue_pending(&qce->dma);
-
 	ret = qce_start(async_req, tmpl->crypto_alg_type);
 	if (ret)
 		goto error_terminate;

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (5 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 06/12] crypto: qce - Add BAM DMA support for crypto register I/O Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-04 14:39   ` Vinod Koul
  2026-03-02 15:57 ` [PATCH RFC v11 08/12] dmaengine: constify struct dma_descriptor_metadata_ops Bartosz Golaszewski
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

In order to let the BAM DMA engine know which address is used for
register I/O, call dmaengine_slave_config() after requesting the RX
channel and use the config structure to pass that information to the
dmaengine core. This is done ahead of extending the BAM driver with
support for pipe locking, which requires performing dummy writes when
passing the lock/unlock flags alongside the command descriptors.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/crypto/qce/core.c | 3 ++-
 drivers/crypto/qce/dma.c  | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
index 2667fcd67fee826a44080da8f88a3e2abbb9b2cf..f6363d2a1231dcee0176824135389c42bec02153 100644
--- a/drivers/crypto/qce/core.c
+++ b/drivers/crypto/qce/core.c
@@ -211,6 +211,8 @@ static int qce_crypto_probe(struct platform_device *pdev)
 	if (IS_ERR(qce->base))
 		return PTR_ERR(qce->base);
 
+	qce->base_phys = res->start;
+
 	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
 	if (ret < 0)
 		return ret;
@@ -260,7 +262,6 @@ static int qce_crypto_probe(struct platform_device *pdev)
 	qce->dma_size = resource_size(res);
 	qce->base_dma = dma_map_resource(dev, res->start, qce->dma_size,
 					 DMA_BIDIRECTIONAL, 0);
-	qce->base_phys = res->start;
 	ret = dma_mapping_error(dev, qce->base_dma);
 	if (ret)
 		return ret;
diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
index ba7a52fd4c6349d59c075c346f75741defeb6034..86f22c9a11f8a9e055c243dd8beaf1ded6f88bb9 100644
--- a/drivers/crypto/qce/dma.c
+++ b/drivers/crypto/qce/dma.c
@@ -109,7 +109,9 @@ void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val)
 int devm_qce_dma_request(struct qce_device *qce)
 {
 	struct qce_dma_data *dma = &qce->dma;
+	struct dma_slave_config cfg = { };
 	struct device *dev = qce->dev;
+	int ret;
 
 	dma->txchan = devm_dma_request_chan(dev, "tx");
 	if (IS_ERR(dma->txchan))
@@ -121,6 +123,12 @@ int devm_qce_dma_request(struct qce_device *qce)
 		return dev_err_probe(dev, PTR_ERR(dma->rxchan),
 				     "Failed to get RX DMA channel\n");
 
+	cfg.dst_addr = qce->base_phys;
+	cfg.direction = DMA_MEM_TO_DEV;
+	ret = dmaengine_slave_config(dma->rxchan, &cfg);
+	if (ret)
+		return ret;
+
 	dma->result_buf = devm_kmalloc(dev, QCE_RESULT_BUF_SZ + QCE_IGNORE_BUF_SZ, GFP_KERNEL);
 	if (!dma->result_buf)
 		return -ENOMEM;

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 08/12] dmaengine: constify struct dma_descriptor_metadata_ops
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (6 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 09/12] dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue Bartosz Golaszewski
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

There's no reason for the instances of this struct to be modifiable.
Constify the pointer in struct dma_async_tx_descriptor and all drivers
currently using it.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/dma/ti/k3-udma.c        | 2 +-
 drivers/dma/xilinx/xilinx_dma.c | 2 +-
 include/linux/dmaengine.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/ti/k3-udma.c b/drivers/dma/ti/k3-udma.c
index c964ebfcf3b68d86e4bbc9b62bad2212f0ce3ee9..8a2f235b669aaf084a6f7b3e6b23d06b04768608 100644
--- a/drivers/dma/ti/k3-udma.c
+++ b/drivers/dma/ti/k3-udma.c
@@ -3408,7 +3408,7 @@ static int udma_set_metadata_len(struct dma_async_tx_descriptor *desc,
 	return 0;
 }
 
-static struct dma_descriptor_metadata_ops metadata_ops = {
+static const struct dma_descriptor_metadata_ops metadata_ops = {
 	.attach = udma_attach_metadata,
 	.get_ptr = udma_get_metadata_ptr,
 	.set_len = udma_set_metadata_len,
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c
index b53292e02448fe528f1ae9ba33b4bcf408f89fd6..97b934ca54101ea699e3ab28d419bed1b45dee4a 100644
--- a/drivers/dma/xilinx/xilinx_dma.c
+++ b/drivers/dma/xilinx/xilinx_dma.c
@@ -653,7 +653,7 @@ static void *xilinx_dma_get_metadata_ptr(struct dma_async_tx_descriptor *tx,
 	return seg->hw.app;
 }
 
-static struct dma_descriptor_metadata_ops xilinx_dma_metadata_ops = {
+static const struct dma_descriptor_metadata_ops xilinx_dma_metadata_ops = {
 	.get_ptr = xilinx_dma_get_metadata_ptr,
 };
 
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 99efe2b9b4ea9844ca6161208362ef18ef111d96..92566c4c100e98f48750de21249ae3b5de06c763 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -623,7 +623,7 @@ struct dma_async_tx_descriptor {
 	void *callback_param;
 	struct dmaengine_unmap_data *unmap;
 	enum dma_desc_metadata_mode desc_metadata_mode;
-	struct dma_descriptor_metadata_ops *metadata_ops;
+	const struct dma_descriptor_metadata_ops *metadata_ops;
 #ifdef CONFIG_ASYNC_TX_ENABLE_CHANNEL_SWITCH
 	struct dma_async_tx_descriptor *next;
 	struct dma_async_tx_descriptor *parent;

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 09/12] dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (7 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 08/12] dmaengine: constify struct dma_descriptor_metadata_ops Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 10/12] dmaengine: qcom: bam_dma: Extend the driver's device match data Bartosz Golaszewski
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski,
	Dmitry Baryshkov, Bjorn Andersson

BH workqueues are a modern mechanism aiming to replace legacy tasklets.
Let's convert the BAM DMA driver to use the high-priority variant of the
BH workqueue.

[Vinod: suggested using the BH workqueue instead of the regular one
running in process context]

Suggested-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/dma/qcom/bam_dma.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 19116295f8325767a0d97a7848077885b118241c..c8601bac555edf1bb4384fd39cb3449ec6e86334 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -42,6 +42,7 @@
 #include <linux/pm_runtime.h>
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
+#include <linux/workqueue.h>
 
 #include "../dmaengine.h"
 #include "../virt-dma.h"
@@ -397,8 +398,8 @@ struct bam_device {
 	struct clk *bamclk;
 	int irq;
 
-	/* dma start transaction tasklet */
-	struct tasklet_struct task;
+	/* dma start transaction workqueue */
+	struct work_struct work;
 };
 
 /**
@@ -863,7 +864,7 @@ static u32 process_channel_irqs(struct bam_device *bdev)
 			/*
 			 * if complete, process cookie. Otherwise
 			 * push back to front of desc_issued so that
-			 * it gets restarted by the tasklet
+			 * it gets restarted by the work queue.
 			 */
 			if (!async_desc->num_desc) {
 				vchan_cookie_complete(&async_desc->vd);
@@ -893,9 +894,9 @@ static irqreturn_t bam_dma_irq(int irq, void *data)
 
 	srcs |= process_channel_irqs(bdev);
 
-	/* kick off tasklet to start next dma transfer */
+	/* kick off the work queue to start next dma transfer */
 	if (srcs & P_IRQ)
-		tasklet_schedule(&bdev->task);
+		queue_work(system_bh_highpri_wq, &bdev->work);
 
 	ret = pm_runtime_get_sync(bdev->dev);
 	if (ret < 0)
@@ -1091,14 +1092,14 @@ static void bam_start_dma(struct bam_chan *bchan)
 }
 
 /**
- * dma_tasklet - DMA IRQ tasklet
- * @t: tasklet argument (bam controller structure)
+ * bam_dma_work() - DMA interrupt work queue callback
+ * @work: work queue struct embedded in the BAM controller device struct
  *
  * Sets up next DMA operation and then processes all completed transactions
  */
-static void dma_tasklet(struct tasklet_struct *t)
+static void bam_dma_work(struct work_struct *work)
 {
-	struct bam_device *bdev = from_tasklet(bdev, t, task);
+	struct bam_device *bdev = from_work(bdev, work, work);
 	struct bam_chan *bchan;
 	unsigned int i;
 
@@ -1111,14 +1112,13 @@ static void dma_tasklet(struct tasklet_struct *t)
 		if (!list_empty(&bchan->vc.desc_issued) && !IS_BUSY(bchan))
 			bam_start_dma(bchan);
 	}
-
 }
 
 /**
  * bam_issue_pending - starts pending transactions
  * @chan: dma channel
  *
- * Calls tasklet directly which in turn starts any pending transactions
+ * Calls work queue directly which in turn starts any pending transactions
  */
 static void bam_issue_pending(struct dma_chan *chan)
 {
@@ -1286,14 +1286,14 @@ static int bam_dma_probe(struct platform_device *pdev)
 	if (ret)
 		goto err_disable_clk;
 
-	tasklet_setup(&bdev->task, dma_tasklet);
+	INIT_WORK(&bdev->work, bam_dma_work);
 
 	bdev->channels = devm_kcalloc(bdev->dev, bdev->num_channels,
 				sizeof(*bdev->channels), GFP_KERNEL);
 
 	if (!bdev->channels) {
 		ret = -ENOMEM;
-		goto err_tasklet_kill;
+		goto err_workqueue_cancel;
 	}
 
 	/* allocate and initialize channels */
@@ -1358,8 +1358,8 @@ static int bam_dma_probe(struct platform_device *pdev)
 err_bam_channel_exit:
 	for (i = 0; i < bdev->num_channels; i++)
 		tasklet_kill(&bdev->channels[i].vc.task);
-err_tasklet_kill:
-	tasklet_kill(&bdev->task);
+err_workqueue_cancel:
+	cancel_work_sync(&bdev->work);
 err_disable_clk:
 	clk_disable_unprepare(bdev->bamclk);
 
@@ -1393,7 +1393,7 @@ static void bam_dma_remove(struct platform_device *pdev)
 			    bdev->channels[i].fifo_phys);
 	}
 
-	tasklet_kill(&bdev->task);
+	cancel_work_sync(&bdev->work);
 
 	clk_disable_unprepare(bdev->bamclk);
 }

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 10/12] dmaengine: qcom: bam_dma: Extend the driver's device match data
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (8 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 09/12] dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 11/12] dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support Bartosz Golaszewski
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

In preparation for supporting the pipe locking feature flag, extend the
amount of information we can carry in device match data: create a
separate structure and make the register information one of its fields.
This way, in subsequent patches, it will be just a matter of adding a
new field to the device data.

Reviewed-by: Dmitry Baryshkov <lumag@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/dma/qcom/bam_dma.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index c8601bac555edf1bb4384fd39cb3449ec6e86334..8f6d03f6c673b57ed13aeca6c8331c71596d077b 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -113,6 +113,10 @@ struct reg_offset_data {
 	unsigned int pipe_mult, evnt_mult, ee_mult;
 };
 
+struct bam_device_data {
+	const struct reg_offset_data *reg_info;
+};
+
 static const struct reg_offset_data bam_v1_3_reg_info[] = {
 	[BAM_CTRL]		= { 0x0F80, 0x00, 0x00, 0x00 },
 	[BAM_REVISION]		= { 0x0F84, 0x00, 0x00, 0x00 },
@@ -142,6 +146,10 @@ static const struct reg_offset_data bam_v1_3_reg_info[] = {
 	[BAM_P_FIFO_SIZES]	= { 0x1020, 0x00, 0x40, 0x00 },
 };
 
+static const struct bam_device_data bam_v1_3_data = {
+	.reg_info = bam_v1_3_reg_info,
+};
+
 static const struct reg_offset_data bam_v1_4_reg_info[] = {
 	[BAM_CTRL]		= { 0x0000, 0x00, 0x00, 0x00 },
 	[BAM_REVISION]		= { 0x0004, 0x00, 0x00, 0x00 },
@@ -171,6 +179,10 @@ static const struct reg_offset_data bam_v1_4_reg_info[] = {
 	[BAM_P_FIFO_SIZES]	= { 0x1820, 0x00, 0x1000, 0x00 },
 };
 
+static const struct bam_device_data bam_v1_4_data = {
+	.reg_info = bam_v1_4_reg_info,
+};
+
 static const struct reg_offset_data bam_v1_7_reg_info[] = {
 	[BAM_CTRL]		= { 0x00000, 0x00, 0x00, 0x00 },
 	[BAM_REVISION]		= { 0x01000, 0x00, 0x00, 0x00 },
@@ -200,6 +212,10 @@ static const struct reg_offset_data bam_v1_7_reg_info[] = {
 	[BAM_P_FIFO_SIZES]	= { 0x13820, 0x00, 0x1000, 0x00 },
 };
 
+static const struct bam_device_data bam_v1_7_data = {
+	.reg_info = bam_v1_7_reg_info,
+};
+
 /* BAM CTRL */
 #define BAM_SW_RST			BIT(0)
 #define BAM_EN				BIT(1)
@@ -393,7 +409,7 @@ struct bam_device {
 	bool powered_remotely;
 	u32 active_channels;
 
-	const struct reg_offset_data *layout;
+	const struct bam_device_data *dev_data;
 
 	struct clk *bamclk;
 	int irq;
@@ -411,7 +427,7 @@ struct bam_device {
 static inline void __iomem *bam_addr(struct bam_device *bdev, u32 pipe,
 		enum bam_reg reg)
 {
-	const struct reg_offset_data r = bdev->layout[reg];
+	const struct reg_offset_data r = bdev->dev_data->reg_info[reg];
 
 	return bdev->regs + r.base_offset +
 		r.pipe_mult * pipe +
@@ -1205,9 +1221,9 @@ static void bam_channel_init(struct bam_device *bdev, struct bam_chan *bchan,
 }
 
 static const struct of_device_id bam_of_match[] = {
-	{ .compatible = "qcom,bam-v1.3.0", .data = &bam_v1_3_reg_info },
-	{ .compatible = "qcom,bam-v1.4.0", .data = &bam_v1_4_reg_info },
-	{ .compatible = "qcom,bam-v1.7.0", .data = &bam_v1_7_reg_info },
+	{ .compatible = "qcom,bam-v1.3.0", .data = &bam_v1_3_data },
+	{ .compatible = "qcom,bam-v1.4.0", .data = &bam_v1_4_data },
+	{ .compatible = "qcom,bam-v1.7.0", .data = &bam_v1_7_data },
 	{}
 };
 
@@ -1231,7 +1247,7 @@ static int bam_dma_probe(struct platform_device *pdev)
 		return -ENODEV;
 	}
 
-	bdev->layout = match->data;
+	bdev->dev_data = match->data;
 
 	bdev->regs = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(bdev->regs))

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 11/12] dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (9 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 10/12] dmaengine: qcom: bam_dma: Extend the driver's device match data Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-02 15:57 ` [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking Bartosz Golaszewski
  2026-03-03 12:43 ` [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Manivannan Sadhasivam
  12 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski,
	Dmitry Baryshkov

From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Extend the device match data with a flag indicating whether the IP
supports the BAM lock/unlock feature. Set it to true on BAM IP versions
1.4.0 and above.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/dma/qcom/bam_dma.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 8f6d03f6c673b57ed13aeca6c8331c71596d077b..83491e7c2f17d8c9d12a1a055baea7e3a0a75a53 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -115,6 +115,7 @@ struct reg_offset_data {
 
 struct bam_device_data {
 	const struct reg_offset_data *reg_info;
+	bool pipe_lock_supported;
 };
 
 static const struct reg_offset_data bam_v1_3_reg_info[] = {
@@ -181,6 +182,7 @@ static const struct reg_offset_data bam_v1_4_reg_info[] = {
 
 static const struct bam_device_data bam_v1_4_data = {
 	.reg_info = bam_v1_4_reg_info,
+	.pipe_lock_supported = true,
 };
 
 static const struct reg_offset_data bam_v1_7_reg_info[] = {
@@ -214,6 +216,7 @@ static const struct reg_offset_data bam_v1_7_reg_info[] = {
 
 static const struct bam_device_data bam_v1_7_data = {
 	.reg_info = bam_v1_7_reg_info,
+	.pipe_lock_supported = true,
 };
 
 /* BAM CTRL */

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (10 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 11/12] dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support Bartosz Golaszewski
@ 2026-03-02 15:57 ` Bartosz Golaszewski
  2026-03-04 14:53   ` Vinod Koul
  2026-03-03 12:43 ` [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Manivannan Sadhasivam
  12 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-02 15:57 UTC (permalink / raw)
  To: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li
  Cc: dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Bartosz Golaszewski

Add support for BAM pipe locking. To that end, when starting the DMA on
an RX channel, wrap the already issued descriptors with additional
command descriptors that perform dummy writes to the base register
supplied by the client via dmaengine_slave_config() (if any), carrying
the lock/unlock HW flags.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
---
 drivers/dma/qcom/bam_dma.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
index 83491e7c2f17d8c9d12a1a055baea7e3a0a75a53..b149cbe9613f0bdc8e26cae4f0cc6922997480d5 100644
--- a/drivers/dma/qcom/bam_dma.c
+++ b/drivers/dma/qcom/bam_dma.c
@@ -28,11 +28,13 @@
 #include <linux/clk.h>
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
+#include <linux/dma/qcom_bam_dma.h>
 #include <linux/dmaengine.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
+#include <linux/lockdep.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
 #include <linux/of_dma.h>
@@ -60,6 +62,8 @@ struct bam_desc_hw {
 #define DESC_FLAG_EOB BIT(13)
 #define DESC_FLAG_NWD BIT(12)
 #define DESC_FLAG_CMD BIT(11)
+#define DESC_FLAG_LOCK BIT(10)
+#define DESC_FLAG_UNLOCK BIT(9)
 
 struct bam_async_desc {
 	struct virt_dma_desc vd;
@@ -391,6 +395,12 @@ struct bam_chan {
 	struct list_head desc_list;
 
 	struct list_head node;
+
+	/* BAM locking infrastructure */
+	struct scatterlist lock_sg;
+	struct scatterlist unlock_sg;
+	struct bam_cmd_element lock_ce;
+	struct bam_cmd_element unlock_ce;
 };
 
 static inline struct bam_chan *to_bam_chan(struct dma_chan *common)
@@ -1012,14 +1022,92 @@ static void bam_apply_new_config(struct bam_chan *bchan,
 	bchan->reconfigure = 0;
 }
 
+static struct bam_async_desc *
+bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
+		   struct bam_cmd_element *ce, unsigned int flag)
+{
+	struct dma_chan *chan = &bchan->vc.chan;
+	struct bam_async_desc *async_desc;
+	struct bam_desc_hw *desc;
+	struct virt_dma_desc *vd;
+	struct virt_dma_chan *vc;
+	unsigned int mapped;
+	dma_cookie_t cookie;
+	int ret;
+
+	async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
+	if (!async_desc) {
+		dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
+		return NULL;
+	}
+
+	async_desc->num_desc = 1;
+	async_desc->curr_desc = async_desc->desc;
+	async_desc->dir = DMA_MEM_TO_DEV;
+
+	desc = async_desc->desc;
+
+	bam_prep_ce_le32(ce, bchan->slave.dst_addr, BAM_WRITE_COMMAND, 0);
+	sg_set_buf(sg, ce, sizeof(*ce));
+
+	mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
+	if (!mapped) {
+		kfree(async_desc);
+		return NULL;
+	}
+
+	desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
+	desc->addr = sg_dma_address(sg);
+	desc->size = sizeof(struct bam_cmd_element);
+
+	vc = &bchan->vc;
+	vd = &async_desc->vd;
+
+	dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
+	vd->tx.flags = DMA_PREP_CMD;
+	vd->tx.desc_free = vchan_tx_desc_free;
+	vd->tx_result.result = DMA_TRANS_NOERROR;
+	vd->tx_result.residue = 0;
+
+	cookie = dma_cookie_assign(&vd->tx);
+	ret = dma_submit_error(cookie);
+	if (ret)
+		return NULL;
+
+	return async_desc;
+}
+
+static int bam_setup_pipe_lock(struct bam_chan *bchan)
+{
+	struct bam_async_desc *lock_desc, *unlock_desc;
+
+	lock_desc = bam_make_lock_desc(bchan, &bchan->lock_sg,
+				       &bchan->lock_ce, DESC_FLAG_LOCK);
+	if (!lock_desc)
+		return -ENOMEM;
+
+	unlock_desc = bam_make_lock_desc(bchan, &bchan->unlock_sg,
+					 &bchan->unlock_ce, DESC_FLAG_UNLOCK);
+	if (!unlock_desc) {
+		kfree(lock_desc);
+		return -ENOMEM;
+	}
+
+	list_add(&lock_desc->vd.node, &bchan->vc.desc_issued);
+	list_add_tail(&unlock_desc->vd.node, &bchan->vc.desc_issued);
+
+	return 0;
+}
+
 /**
  * bam_start_dma - start next transaction
  * @bchan: bam dma channel
  */
 static void bam_start_dma(struct bam_chan *bchan)
 {
-	struct virt_dma_desc *vd = vchan_next_desc(&bchan->vc);
+	struct virt_dma_desc *vd;
 	struct bam_device *bdev = bchan->bdev;
+	const struct bam_device_data *bdata = bdev->dev_data;
 	struct bam_async_desc *async_desc = NULL;
 	struct bam_desc_hw *desc;
 	struct bam_desc_hw *fifo = PTR_ALIGN(bchan->fifo_virt,
@@ -1030,6 +1118,16 @@ static void bam_start_dma(struct bam_chan *bchan)
 
 	lockdep_assert_held(&bchan->vc.lock);
 
+	if (bdata->pipe_lock_supported && bchan->slave.dst_addr &&
+	    bchan->slave.direction == DMA_MEM_TO_DEV) {
+		ret = bam_setup_pipe_lock(bchan);
+		if (ret) {
+			dev_err(bdev->dev, "Failed to set up the BAM lock\n");
+			return;
+		}
+	}
+
+	vd = vchan_next_desc(&bchan->vc);
 	if (!vd)
 		return;
 

-- 
2.47.3


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
                   ` (11 preceding siblings ...)
  2026-03-02 15:57 ` [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking Bartosz Golaszewski
@ 2026-03-03 12:43 ` Manivannan Sadhasivam
  2026-03-03 12:56   ` Bartosz Golaszewski
  2026-03-05 11:59   ` Stephan Gerhold
  12 siblings, 2 replies; 28+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-03 12:43 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Vinod Koul, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, brgl, Bartosz Golaszewski, Konrad Dybcio,
	Dmitry Baryshkov, Bjorn Andersson

On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> NOTE: Please note that even though this is version 11, I changed the
> prefix to RFC as this is an entirely new approach resulting from
> discussions under v9. I AM AWARE of the existing memory leaks in the
> last patch of this series - I'm sending it because I want to first
> discuss the approach and get a green light from Vinod as well as Mani
> and Bjorn. Especially when it comes to communicating the address for the
> dummy writes from the client to the BAM driver.
> /NOTE
> 
> Currently the QCE crypto driver accesses the crypto engine registers
> directly via CPU. Trust Zone may perform crypto operations simultaneously
> resulting in a race condition. To remedy that, let's introduce support
> for BAM locking/unlocking to the driver. The BAM driver will now wrap
> any existing issued descriptor chains with additional descriptors
> performing the locking when the client starts the transaction
> (dmaengine_issue_pending()). The client wanting to profit from locking
> needs to switch to performing register I/O over DMA and communicate the
> address to which to perform the dummy writes via a call to
> dmaengine_slave_config().
> 

Thanks for moving the LOCK/UNLOCK bits out of the client to the BAM driver. It
looks neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to
perform a dummy write to an address in the client's register space. In that
case, you could also use the previous metadata approach to pass the scratchpad
register address to the BAM driver from the clients. The BAM driver can then
use this register to perform LOCK/UNLOCK.

It may sound like I'm suggesting a part of your previous design, but it fits the
design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
the scratchpad register address from the clients through the metadata once.

It is very unfortunate that the IP doesn't accept a '0' address for LOCK/UNLOCK,
and that LOCK/UNLOCK cannot always be appended to the actual CMD descriptors
passed from the clients. Either of these would have made the code/design even
cleaner.

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-03 12:43 ` [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Manivannan Sadhasivam
@ 2026-03-03 12:56   ` Bartosz Golaszewski
  2026-03-05 11:59   ` Stephan Gerhold
  1 sibling, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-03 12:56 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Vinod Koul
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski, Konrad Dybcio,
	Dmitry Baryshkov, Bjorn Andersson

On Tue, Mar 3, 2026 at 1:44 PM Manivannan Sadhasivam <mani@kernel.org> wrote:
>
> On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > NOTE: Please note that even though this is version 11, I changed the
> > prefix to RFC as this is an entirely new approach resulting from
> > discussions under v9. I AM AWARE of the existing memory leaks in the
> > last patch of this series - I'm sending it because I want to first
> > discuss the approach and get a green light from Vinod as well as Mani
> > and Bjorn. Especially when it comes to communicating the address for the
> > dummy writes from the client to the BAM driver.
> > /NOTE
> >
> > Currently the QCE crypto driver accesses the crypto engine registers
> > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > resulting in a race condition. To remedy that, let's introduce support
> > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > any existing issued descriptor chains with additional descriptors
> > performing the locking when the client starts the transaction
> > (dmaengine_issue_pending()). The client wanting to profit from locking
> > needs to switch to performing register I/O over DMA and communicate the
> > address to which to perform the dummy writes via a call to
> > dmaengine_slave_config().
> >
>
> Thanks for moving the LOCK/UNLOCK bits out of client to the BAM driver. It looks
> neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> a dummy write to an address in the client register space. So in this case, you
> can also use the previous metadata approach to pass the scratchpad register to
> the BAM driver from clients. The BAM driver can use this register to perform
> LOCK/UNLOCK.
>

I thought about reusing descriptor metadata, but metadata is attached to
every descriptor created with dmaengine_prep_slave_sg() (as opposed to
being set once with dmaengine_slave_config()). We would also still need
a custom struct in the existing BAM DMA header.

I'm fine with that. Vinod: could you please say whether that sounds good
to you? If so, I'll rework it and also fix the memory leaks by reworking
the cleanup path.

> It may sound like I'm suggesting a part of your previous design, but it fits the
> design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> the scratchpad register address from the clients through the metadata once.
>

If that gets it upstream, then it's fine with me.

> It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> the clients. These would've made the code/design even more cleaner.
>

For sure but I double-checked this. :(

Bart

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine
  2026-03-02 15:57 ` [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine Bartosz Golaszewski
@ 2026-03-04 14:39   ` Vinod Koul
  2026-03-04 15:05     ` Bartosz Golaszewski
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2026-03-04 14:39 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel, brgl,
	Bartosz Golaszewski

On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> In order to let the BAM DMA engine know which address is used for
> register I/O, call dmaengine_slave_config() after requesting the RX
> channel and use the config structure to pass that information to the
> dmaengine core. This is done ahead of extending the BAM driver with
> support for pipe locking, which requires performing dummy writes when
> passing the lock/unlock flags alongside the command descriptors.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> ---
>  drivers/crypto/qce/core.c | 3 ++-
>  drivers/crypto/qce/dma.c  | 8 ++++++++
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/crypto/qce/core.c b/drivers/crypto/qce/core.c
> index 2667fcd67fee826a44080da8f88a3e2abbb9b2cf..f6363d2a1231dcee0176824135389c42bec02153 100644
> --- a/drivers/crypto/qce/core.c
> +++ b/drivers/crypto/qce/core.c
> @@ -211,6 +211,8 @@ static int qce_crypto_probe(struct platform_device *pdev)
>  	if (IS_ERR(qce->base))
>  		return PTR_ERR(qce->base);
>  
> +	qce->base_phys = res->start;
> +
>  	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>  	if (ret < 0)
>  		return ret;
> @@ -260,7 +262,6 @@ static int qce_crypto_probe(struct platform_device *pdev)
>  	qce->dma_size = resource_size(res);
>  	qce->base_dma = dma_map_resource(dev, res->start, qce->dma_size,
>  					 DMA_BIDIRECTIONAL, 0);
> -	qce->base_phys = res->start;
>  	ret = dma_mapping_error(dev, qce->base_dma);
>  	if (ret)
>  		return ret;
> diff --git a/drivers/crypto/qce/dma.c b/drivers/crypto/qce/dma.c
> index ba7a52fd4c6349d59c075c346f75741defeb6034..86f22c9a11f8a9e055c243dd8beaf1ded6f88bb9 100644
> --- a/drivers/crypto/qce/dma.c
> +++ b/drivers/crypto/qce/dma.c
> @@ -109,7 +109,9 @@ void qce_write_dma(struct qce_device *qce, unsigned int offset, u32 val)
>  int devm_qce_dma_request(struct qce_device *qce)
>  {
>  	struct qce_dma_data *dma = &qce->dma;
> +	struct dma_slave_config cfg = { };
>  	struct device *dev = qce->dev;
> +	int ret;
>  
>  	dma->txchan = devm_dma_request_chan(dev, "tx");
>  	if (IS_ERR(dma->txchan))
> @@ -121,6 +123,12 @@ int devm_qce_dma_request(struct qce_device *qce)
>  		return dev_err_probe(dev, PTR_ERR(dma->rxchan),
>  				     "Failed to get RX DMA channel\n");
>  
> +	cfg.dst_addr = qce->base_phys;
> +	cfg.direction = DMA_MEM_TO_DEV;

So is this the crypto engine address to which the DMA data is
supposed to be pushed?

-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking
  2026-03-02 15:57 ` [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking Bartosz Golaszewski
@ 2026-03-04 14:53   ` Vinod Koul
  2026-03-04 15:27     ` Bartosz Golaszewski
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2026-03-04 14:53 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel, brgl,
	Bartosz Golaszewski

On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> Add support for BAM pipe locking. To that end: when starting the DMA on
> an RX channel - wrap the already issued descriptors with additional
> command descriptors performing dummy writes to the base register
> supplied by the client via dmaengine_slave_config() (if any) alongside
> the lock/unlock HW flags.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> ---
>  drivers/dma/qcom/bam_dma.c | 100 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 99 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/qcom/bam_dma.c b/drivers/dma/qcom/bam_dma.c
> index 83491e7c2f17d8c9d12a1a055baea7e3a0a75a53..b149cbe9613f0bdc8e26cae4f0cc6922997480d5 100644
> --- a/drivers/dma/qcom/bam_dma.c
> +++ b/drivers/dma/qcom/bam_dma.c
> @@ -28,11 +28,13 @@
>  #include <linux/clk.h>
>  #include <linux/device.h>
>  #include <linux/dma-mapping.h>
> +#include <linux/dma/qcom_bam_dma.h>
>  #include <linux/dmaengine.h>
>  #include <linux/init.h>
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/kernel.h>
> +#include <linux/lockdep.h>
>  #include <linux/module.h>
>  #include <linux/of_address.h>
>  #include <linux/of_dma.h>
> @@ -60,6 +62,8 @@ struct bam_desc_hw {
>  #define DESC_FLAG_EOB BIT(13)
>  #define DESC_FLAG_NWD BIT(12)
>  #define DESC_FLAG_CMD BIT(11)
> +#define DESC_FLAG_LOCK BIT(10)
> +#define DESC_FLAG_UNLOCK BIT(9)
>  
>  struct bam_async_desc {
>  	struct virt_dma_desc vd;
> @@ -391,6 +395,12 @@ struct bam_chan {
>  	struct list_head desc_list;
>  
>  	struct list_head node;
> +
> +	/* BAM locking infrastructure */
> +	struct scatterlist lock_sg;
> +	struct scatterlist unlock_sg;
> +	struct bam_cmd_element lock_ce;
> +	struct bam_cmd_element unlock_ce;
>  };
>  
>  static inline struct bam_chan *to_bam_chan(struct dma_chan *common)
> @@ -1012,14 +1022,92 @@ static void bam_apply_new_config(struct bam_chan *bchan,
>  	bchan->reconfigure = 0;
>  }
>  
> +static struct bam_async_desc *
> +bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
> +		   struct bam_cmd_element *ce, unsigned int flag)
> +{
> +	struct dma_chan *chan = &bchan->vc.chan;
> +	struct bam_async_desc *async_desc;
> +	struct bam_desc_hw *desc;
> +	struct virt_dma_desc *vd;
> +	struct virt_dma_chan *vc;
> +	unsigned int mapped;
> +	dma_cookie_t cookie;
> +	int ret;
> +
> +	async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
> +	if (!async_desc) {
> +		dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
> +		return NULL;
> +	}
> +
> +	async_desc->num_desc = 1;
> +	async_desc->curr_desc = async_desc->desc;
> +	async_desc->dir = DMA_MEM_TO_DEV;
> +
> +	desc = async_desc->desc;
> +
> +	bam_prep_ce_le32(ce, bchan->slave.dst_addr, BAM_WRITE_COMMAND, 0);
> +	sg_set_buf(sg, ce, sizeof(*ce));
> +
> +	mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
> +	if (!mapped) {
> +		kfree(async_desc);
> +		return NULL;
> +	}
> +
> +	desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
> +	desc->addr = sg_dma_address(sg);
> +	desc->size = sizeof(struct bam_cmd_element);
> +
> +	vc = &bchan->vc;
> +	vd = &async_desc->vd;
> +
> +	dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
> +	vd->tx.flags = DMA_PREP_CMD;
> +	vd->tx.desc_free = vchan_tx_desc_free;
> +	vd->tx_result.result = DMA_TRANS_NOERROR;
> +	vd->tx_result.residue = 0;
> +
> +	cookie = dma_cookie_assign(&vd->tx);
> +	ret = dma_submit_error(cookie);

I am not sure I understand this.

At the start you add a descriptor to the queue which, ideally, should be
queued after the existing descriptors have completed!

Also, I thought you wanted to append the pipe cmd to the descriptors - why
not do this while preparing them, adding the pipe cmd at the start and end
of the sequence at prepare time? This way you don't need to create a
cookie like this.


> +	if (ret)
> +		return NULL;
> +
> +	return async_desc;
> +}
> +
> +static int bam_setup_pipe_lock(struct bam_chan *bchan)
> +{
> +	struct bam_async_desc *lock_desc, *unlock_desc;
> +
> +	lock_desc = bam_make_lock_desc(bchan, &bchan->lock_sg,
> +				       &bchan->lock_ce, DESC_FLAG_LOCK);
> +	if (!lock_desc)
> +		return -ENOMEM;
> +
> +	unlock_desc = bam_make_lock_desc(bchan, &bchan->unlock_sg,
> +					 &bchan->unlock_ce, DESC_FLAG_UNLOCK);
> +	if (!unlock_desc) {
> +		kfree(lock_desc);
> +		return -ENOMEM;
> +	}
> +
> +	list_add(&lock_desc->vd.node, &bchan->vc.desc_issued);
> +	list_add_tail(&unlock_desc->vd.node, &bchan->vc.desc_issued);
> +
> +	return 0;
> +}
> +
>  /**
>   * bam_start_dma - start next transaction
>   * @bchan: bam dma channel
>   */
>  static void bam_start_dma(struct bam_chan *bchan)
>  {
> -	struct virt_dma_desc *vd = vchan_next_desc(&bchan->vc);
> +	struct virt_dma_desc *vd;
>  	struct bam_device *bdev = bchan->bdev;
> +	const struct bam_device_data *bdata = bdev->dev_data;
>  	struct bam_async_desc *async_desc = NULL;
>  	struct bam_desc_hw *desc;
>  	struct bam_desc_hw *fifo = PTR_ALIGN(bchan->fifo_virt,
> @@ -1030,6 +1118,16 @@ static void bam_start_dma(struct bam_chan *bchan)
>  
>  	lockdep_assert_held(&bchan->vc.lock);
>  
> +	if (bdata->pipe_lock_supported && bchan->slave.dst_addr &&
> +	    bchan->slave.direction == DMA_MEM_TO_DEV) {
> +		ret = bam_setup_pipe_lock(bchan);
> +		if (ret) {
> +			dev_err(bdev->dev, "Failed to set up the BAM lock\n");
> +			return;
> +		}
> +	}
> +
> +	vd = vchan_next_desc(&bchan->vc);
>  	if (!vd)
>  		return;
>  
> 
> -- 
> 2.47.3

-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine
  2026-03-04 14:39   ` Vinod Koul
@ 2026-03-04 15:05     ` Bartosz Golaszewski
  2026-03-07 20:35       ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-04 15:05 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski

On Wed, Mar 4, 2026 at 3:39 PM Vinod Koul <vkoul@kernel.org> wrote:
>
> On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> > In order to let the BAM DMA engine know which address is used for
> > register I/O, call dmaengine_slave_config() after requesting the RX
> > channel and use the config structure to pass that information to the
> > dmaengine core. This is done ahead of extending the BAM driver with
> > support for pipe locking, which requires performing dummy writes when
> > passing the lock/unlock flags alongside the command descriptors.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> > ---
> >
> >       dma->txchan = devm_dma_request_chan(dev, "tx");
> >       if (IS_ERR(dma->txchan))
> > @@ -121,6 +123,12 @@ int devm_qce_dma_request(struct qce_device *qce)
> >               return dev_err_probe(dev, PTR_ERR(dma->rxchan),
> >                                    "Failed to get RX DMA channel\n");
> >
> > +     cfg.dst_addr = qce->base_phys;
> > +     cfg.direction = DMA_MEM_TO_DEV;
>
> So is this the address of crypto engine address where dma data is
> supposed to be pushed to..?
>

No. In case I wasn't clear enough in the cover letter: this is the
address of the *crypto engine* register which we use as a scratchpad
for the dummy write when issuing the lock/unlock command. Mani
suggested under the cover letter to use the descriptor metadata for
that.

Bart

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking
  2026-03-04 14:53   ` Vinod Koul
@ 2026-03-04 15:27     ` Bartosz Golaszewski
  2026-03-07 20:33       ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-04 15:27 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski

On Wed, Mar 4, 2026 at 3:53 PM Vinod Koul <vkoul@kernel.org> wrote:
>
> On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> > Add support for BAM pipe locking. To that end: when starting the DMA on
> > an RX channel - wrap the already issued descriptors with additional
> > command descriptors performing dummy writes to the base register
> > supplied by the client via dmaengine_slave_config() (if any) alongside
> > the lock/unlock HW flags.
> >
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>

[snip]

> > +static struct bam_async_desc *
> > +bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
> > +                struct bam_cmd_element *ce, unsigned int flag)
> > +{
> > +     struct dma_chan *chan = &bchan->vc.chan;
> > +     struct bam_async_desc *async_desc;
> > +     struct bam_desc_hw *desc;
> > +     struct virt_dma_desc *vd;
> > +     struct virt_dma_chan *vc;
> > +     unsigned int mapped;
> > +     dma_cookie_t cookie;
> > +     int ret;
> > +
> > +     async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
> > +     if (!async_desc) {
> > +             dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
> > +             return NULL;
> > +     }
> > +
> > +     async_desc->num_desc = 1;
> > +     async_desc->curr_desc = async_desc->desc;
> > +     async_desc->dir = DMA_MEM_TO_DEV;
> > +
> > +     desc = async_desc->desc;
> > +
> > +     bam_prep_ce_le32(ce, bchan->slave.dst_addr, BAM_WRITE_COMMAND, 0);
> > +     sg_set_buf(sg, ce, sizeof(*ce));
> > +
> > +     mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
> > +     if (!mapped) {
> > +             kfree(async_desc);
> > +             return NULL;
> > +     }
> > +
> > +     desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
> > +     desc->addr = sg_dma_address(sg);
> > +     desc->size = sizeof(struct bam_cmd_element);
> > +
> > +     vc = &bchan->vc;
> > +     vd = &async_desc->vd;
> > +
> > +     dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
> > +     vd->tx.flags = DMA_PREP_CMD;
> > +     vd->tx.desc_free = vchan_tx_desc_free;
> > +     vd->tx_result.result = DMA_TRANS_NOERROR;
> > +     vd->tx_result.residue = 0;
> > +
> > +     cookie = dma_cookie_assign(&vd->tx);
> > +     ret = dma_submit_error(cookie);
>
> I am not sure I understand this.
>
> At start you add a descriptor in the queue, ideally which should be
> queued after the existing descriptors are completed!
>
> Also I thought you want to append Pipe cmd to descriptors, why not do
> this while preparing the descriptors and add the pipe cmd and start and
> end of the sequence when you prepare... This was you dont need to create
> a cookie like this
>

The client (in this case: the crypto engine) can call
dmaengine_prep_slave_sg() multiple times, adding several logical
descriptors, each of which can itself consist of several hardware
descriptors. We want to lock the channel before issuing the first queued
descriptor (for crypto: typically a data descriptor) and unlock it once
the final descriptor is processed (typically a command descriptor). To
that end, we insert a dummy command descriptor with the lock flag at the
head of the queue and one with the unlock flag at the tail - "wrapping"
the existing queue with lock/unlock operations.

Bart

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-03 12:43 ` [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Manivannan Sadhasivam
  2026-03-03 12:56   ` Bartosz Golaszewski
@ 2026-03-05 11:59   ` Stephan Gerhold
  2026-03-05 13:10     ` Bartosz Golaszewski
  1 sibling, 1 reply; 28+ messages in thread
From: Stephan Gerhold @ 2026-03-05 11:59 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel,
	Konrad Dybcio, Dmitry Baryshkov, Bjorn Andersson

On Tue, Mar 03, 2026 at 06:13:56PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > NOTE: Please note that even though this is version 11, I changed the
> > prefix to RFC as this is an entirely new approach resulting from
> > discussions under v9. I AM AWARE of the existing memory leaks in the
> > last patch of this series - I'm sending it because I want to first
> > discuss the approach and get a green light from Vinod as well as Mani
> > and Bjorn. Especially when it comes to communicating the address for the
> > dummy writes from the client to the BAM driver.
> > /NOTE
> > 
> > Currently the QCE crypto driver accesses the crypto engine registers
> > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > resulting in a race condition. To remedy that, let's introduce support
> > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > any existing issued descriptor chains with additional descriptors
> > performing the locking when the client starts the transaction
> > (dmaengine_issue_pending()). The client wanting to profit from locking
> > needs to switch to performing register I/O over DMA and communicate the
> > address to which to perform the dummy writes via a call to
> > dmaengine_slave_config().
> > 
> 
> Thanks for moving the LOCK/UNLOCK bits out of client to the BAM driver. It looks
> neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> a dummy write to an address in the client register space. So in this case, you
> can also use the previous metadata approach to pass the scratchpad register to
> the BAM driver from clients. The BAM driver can use this register to perform
> LOCK/UNLOCK.
> 
> It may sound like I'm suggesting a part of your previous design, but it fits the
> design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> the scratchpad register address from the clients through the metadata once.
> 
> It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> the clients. These would've made the code/design even more cleaner.
> 

I was staring at the downstream drivers for QCE (qce50.c?) [1] for a bit
and my impression is that they manage to get along without dummy writes.
It's a big mess, but it looks like they always have some commands
(depending on the crypto operation) that they are sending anyway, and
they just assign the LOCK/UNLOCK flag to one of those command descriptors.

It is similar for the second relevant user of the LOCK/UNLOCK flags, the
QPIC NAND driver (msm_qpic_nand.c in downstream [2], qcom_nandc.c in
mainline), where the flag is assigned as part of the register programming
sequence instead of using a dummy write. In addition, the UNLOCK flag is
sometimes assigned to a READ command descriptor rather than a WRITE.

@Bartosz: Can we get by without doing any dummy writes?
If not, would a dummy read perhaps be less intrusive than a dummy write?

Thanks,
Stephan

[1]: https://git.codelinaro.org/clo/la/platform/vendor/qcom/opensource/securemsm-kernel/-/blob/sec-kernel.lnx.14.16.r4-rel/crypto-qti/qce50.c
[2]: https://git.codelinaro.org/clo/la/kernel/msm-5.15/-/blob/kernel.lnx.5.15.r46-rel/drivers/mtd/devices/msm_qpic_nand.c#L542-562

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-05 11:59   ` Stephan Gerhold
@ 2026-03-05 13:10     ` Bartosz Golaszewski
  2026-03-05 13:43       ` Stephan Gerhold
  0 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-05 13:10 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel,
	Konrad Dybcio, Dmitry Baryshkov, Bjorn Andersson

On Thu, Mar 5, 2026 at 1:00 PM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> On Tue, Mar 03, 2026 at 06:13:56PM +0530, Manivannan Sadhasivam wrote:
> > On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > > NOTE: Please note that even though this is version 11, I changed the
> > > prefix to RFC as this is an entirely new approach resulting from
> > > discussions under v9. I AM AWARE of the existing memory leaks in the
> > > last patch of this series - I'm sending it because I want to first
> > > discuss the approach and get a green light from Vinod as well as Mani
> > > and Bjorn. Especially when it comes to communicating the address for the
> > > dummy writes from the client to the BAM driver.
> > > /NOTE
> > >
> > > Currently the QCE crypto driver accesses the crypto engine registers
> > > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > > resulting in a race condition. To remedy that, let's introduce support
> > > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > > any existing issued descriptor chains with additional descriptors
> > > performing the locking when the client starts the transaction
> > > (dmaengine_issue_pending()). The client wanting to profit from locking
> > > needs to switch to performing register I/O over DMA and communicate the
> > > address to which to perform the dummy writes via a call to
> > > dmaengine_slave_config().
> > >
> >
> > Thanks for moving the LOCK/UNLOCK bits out of client to the BAM driver. It looks
> > neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> > a dummy write to an address in the client register space. So in this case, you
> > can also use the previous metadata approach to pass the scratchpad register to
> > the BAM driver from clients. The BAM driver can use this register to perform
> > LOCK/UNLOCK.
> >
> > It may sound like I'm suggesting a part of your previous design, but it fits the
> > design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> > the scratchpad register address from the clients through the metadata once.
> >
> > It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> > some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> > the clients. These would've made the code/design even more cleaner.
> >
>
> I was staring at the downstream drivers for QCE (qce50.c?) [1] for a bit
> and my impression is that they manage to get along without dummy writes.
> It's a big mess, but it looks like they always have some commands
> (depending on the crypto operation) that they are sending anyway and
> they just assign the LOCK/UNLOCK flag to the command descriptor of that.
>
> It is similar for the second relevant user of the LOCK/UNLOCK flags, the
> QPIC NAND driver (msm_qpic_nand.c in downstream [2], qcom_nandc.c in
> mainline), it is assigned as part of the register programming sequence
> instead of using a dummy write. In addition, the UNLOCK flag is
> sometimes assigned to a READ command descriptor rather than a WRITE.
>
> @Bartosz: Can we get by without doing any dummy writes?
> If not, would a dummy read perhaps be less intrusive than a dummy write?
>

The HPG says that the LOCK/UNLOCK flag *must* be set on a command
descriptor, not a data descriptor. For a simple encryption we will
typically have a data descriptor and a command descriptor with
register writes. So we need a command descriptor in front of the data
and - while we could technically set the UNLOCK bit on the subsequent
command descriptor - it's unclear from the HPG whether it will unlock
before or after processing the command descriptor with the UNLOCK bit
set. Hence the additional command descriptor at the end.

The HPG also only mentions a write command and says nothing about a
read. In any case: that's the least of the problems as switching to
read doesn't solve the issue of passing the address of the scratchpad
register.

So while some of this *may* just work, I would prefer to stick to what
documentation says *will* work. :)

Bartosz

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-05 13:10     ` Bartosz Golaszewski
@ 2026-03-05 13:43       ` Stephan Gerhold
  2026-03-05 13:54         ` Bartosz Golaszewski
  0 siblings, 1 reply; 28+ messages in thread
From: Stephan Gerhold @ 2026-03-05 13:43 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel,
	Konrad Dybcio, Dmitry Baryshkov, Bjorn Andersson

On Thu, Mar 05, 2026 at 02:10:55PM +0100, Bartosz Golaszewski wrote:
> On Thu, Mar 5, 2026 at 1:00 PM Stephan Gerhold
> <stephan.gerhold@linaro.org> wrote:
> >
> > On Tue, Mar 03, 2026 at 06:13:56PM +0530, Manivannan Sadhasivam wrote:
> > > On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > > > NOTE: Please note that even though this is version 11, I changed the
> > > > prefix to RFC as this is an entirely new approach resulting from
> > > > discussions under v9. I AM AWARE of the existing memory leaks in the
> > > > last patch of this series - I'm sending it because I want to first
> > > > discuss the approach and get a green light from Vinod as well as Mani
> > > > and Bjorn. Especially when it comes to communicating the address for the
> > > > dummy writes from the client to the BAM driver.
> > > > /NOTE
> > > >
> > > > Currently the QCE crypto driver accesses the crypto engine registers
> > > > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > > > resulting in a race condition. To remedy that, let's introduce support
> > > > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > > > any existing issued descriptor chains with additional descriptors
> > > > performing the locking when the client starts the transaction
> > > > (dmaengine_issue_pending()). The client wanting to profit from locking
> > > > needs to switch to performing register I/O over DMA and communicate the
> > > > address to which to perform the dummy writes via a call to
> > > > dmaengine_slave_config().
> > > >
> > >
> > > Thanks for moving the LOCK/UNLOCK bits out of client to the BAM driver. It looks
> > > neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> > > a dummy write to an address in the client register space. So in this case, you
> > > can also use the previous metadata approach to pass the scratchpad register to
> > > the BAM driver from clients. The BAM driver can use this register to perform
> > > LOCK/UNLOCK.
> > >
> > > It may sound like I'm suggesting a part of your previous design, but it fits the
> > > design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> > > the scratchpad register address from the clients through the metadata once.
> > >
> > > It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> > > some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> > > the clients. These would've made the code/design even cleaner.
> > >
> >
> > I was staring at the downstream drivers for QCE (qce50.c?) [1] for a bit
> > and my impression is that they manage to get along without dummy writes.
> > It's a big mess, but it looks like they always have some commands
> > (depending on the crypto operation) that they are sending anyway and
> > they just assign the LOCK/UNLOCK flag to the command descriptor of that.
> >
> > It is similar for the second relevant user of the LOCK/UNLOCK flags, the
> > QPIC NAND driver (msm_qpic_nand.c in downstream [2], qcom_nandc.c in
> > mainline), it is assigned as part of the register programming sequence
> > instead of using a dummy write. In addition, the UNLOCK flag is
> > sometimes assigned to a READ command descriptor rather than a WRITE.
> >
> > @Bartosz: Can we get by without doing any dummy writes?
> > If not, would a dummy read perhaps be less intrusive than a dummy write?
> >
> 
> The HPG says that the LOCK/UNLOCK flag *must* be set on a command
> descriptor, not a data descriptor. For a simple encryption we will
> typically have a data descriptor and a command descriptor with
> register writes. So we need a command descriptor in front of the data
> and - while we could technically set the UNLOCK bit on the subsequent
> command descriptor - it's unclear from the HPG whether it will unlock
> before or after processing the command descriptor with the UNLOCK bit
> set. Hence the additional command descriptor at the end.
>

I won't pretend that I actually understand what the downstream QCE
driver is doing, but e.g. qce_ablk_cipher_req() in the qce50.c I linked
looks like they just put the command descriptor with all the register
writes first and then the data second (followed by another command
descriptor for cleanup/unlocking). Is it actually required to put the
data first?

> The HPG also only mentions a write command and says nothing about a
> read. In any case: that's the least of the problems as switching to
> read doesn't solve the issue of passing the address of the scratchpad
> register.

True.

> 
> So while some of this *may* just work, I would prefer to stick to what
> documentation says *will* work. :)
> 

Well, the question is if there is always a dummy register that can be
safely written (without causing any side effects). This will always
just be based on experiments, since the concept of a dummy write doesn't
seem to exist downstream (and I assume the documentation doesn't suggest
a specific register to use either).

NAND_VERSION (0xf08) might work for qcom_nandc.c (which might be the
only other relevant user of the BAM locking functionality...).

Thanks,
Stephan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-05 13:43       ` Stephan Gerhold
@ 2026-03-05 13:54         ` Bartosz Golaszewski
  2026-03-05 16:15           ` Stephan Gerhold
  0 siblings, 1 reply; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-05 13:54 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam, Dmitry Baryshkov,
	Peter Ujfalusi, Michal Simek, Frank Li, dmaengine, linux-doc,
	linux-kernel, linux-arm-msm, linux-crypto, linux-arm-kernel,
	Konrad Dybcio, Dmitry Baryshkov, Bjorn Andersson

On Thu, Mar 5, 2026 at 2:43 PM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> On Thu, Mar 05, 2026 at 02:10:55PM +0100, Bartosz Golaszewski wrote:
> > On Thu, Mar 5, 2026 at 1:00 PM Stephan Gerhold
> > <stephan.gerhold@linaro.org> wrote:
> > >
> > > On Tue, Mar 03, 2026 at 06:13:56PM +0530, Manivannan Sadhasivam wrote:
> > > > On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > > > > NOTE: Please note that even though this is version 11, I changed the
> > > > > prefix to RFC as this is an entirely new approach resulting from
> > > > > discussions under v9. I AM AWARE of the existing memory leaks in the
> > > > > last patch of this series - I'm sending it because I want to first
> > > > > discuss the approach and get a green light from Vinod as well as Mani
> > > > > and Bjorn. Especially when it comes to communicating the address for the
> > > > > dummy writes from the client to the BAM driver.
> > > > > /NOTE
> > > > >
> > > > > Currently the QCE crypto driver accesses the crypto engine registers
> > > > > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > > > > resulting in a race condition. To remedy that, let's introduce support
> > > > > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > > > > any existing issued descriptor chains with additional descriptors
> > > > > performing the locking when the client starts the transaction
> > > > > (dmaengine_issue_pending()). The client wanting to profit from locking
> > > > > needs to switch to performing register I/O over DMA and communicate the
> > > > > address to which to perform the dummy writes via a call to
> > > > > dmaengine_slave_config().
> > > > >
> > > >
> > > > Thanks for moving the LOCK/UNLOCK bits out of the client to the BAM driver. It looks
> > > > neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> > > > a dummy write to an address in the client register space. So in this case, you
> > > > can also use the previous metadata approach to pass the scratchpad register to
> > > > the BAM driver from clients. The BAM driver can use this register to perform
> > > > LOCK/UNLOCK.
> > > >
> > > > It may sound like I'm suggesting a part of your previous design, but it fits the
> > > > design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> > > > the scratchpad register address from the clients through the metadata once.
> > > >
> > > > It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> > > > some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> > > > the clients. These would've made the code/design even cleaner.
> > > >
> > >
> > > I was staring at the downstream drivers for QCE (qce50.c?) [1] for a bit
> > > and my impression is that they manage to get along without dummy writes.
> > > It's a big mess, but it looks like they always have some commands
> > > (depending on the crypto operation) that they are sending anyway and
> > > they just assign the LOCK/UNLOCK flag to the command descriptor of that.
> > >
> > > It is similar for the second relevant user of the LOCK/UNLOCK flags, the
> > > QPIC NAND driver (msm_qpic_nand.c in downstream [2], qcom_nandc.c in
> > > mainline), it is assigned as part of the register programming sequence
> > > instead of using a dummy write. In addition, the UNLOCK flag is
> > > sometimes assigned to a READ command descriptor rather than a WRITE.
> > >
> > > @Bartosz: Can we get by without doing any dummy writes?
> > > If not, would a dummy read perhaps be less intrusive than a dummy write?
> > >
> >
> > The HPG says that the LOCK/UNLOCK flag *must* be set on a command
> > descriptor, not a data descriptor. For a simple encryption we will
> > typically have a data descriptor and a command descriptor with
> > register writes. So we need a command descriptor in front of the data
> > and - while we could technically set the UNLOCK bit on the subsequent
> > command descriptor - it's unclear from the HPG whether it will unlock
> > before or after processing the command descriptor with the UNLOCK bit
> > set. Hence the additional command descriptor at the end.
> >
>
> I won't pretend that I actually understand what the downstream QCE
> driver is doing, but e.g. qce_ablk_cipher_req() in the qce50.c I linked
> looks like they just put the command descriptor with all the register
> writes first and then the data second (followed by another command
> descriptor for cleanup/unlocking). Is it actually required to put the
> data first?
>

Well, now you're getting into the philosophical issue of imposing
requirements on the client which seemed to be the main point of
contention in earlier versions. If you start requiring the client to
put the DMA operations in a certain order (and it's not based on any
HW requirement but rather on how the DMA driver is implemented) then
how is it better than having the client just drive the locking
altogether like pre v11? We won't get away without at least some
requirements - like the client doing register I/O over DMA or
providing the scratchpad address - but I think just wrapping the
existing queue with additional descriptors in a way transparent to
consumers is better in this case. And as I said: the HPG doesn't
explicitly say that it unlocks the pipe *after* the descriptor with
the unlock bit is processed. It doesn't even hint at what the real
ordering is.
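
To illustrate, the wrapping can be modeled in plain C. This is just a
userspace sketch with made-up names (fake_desc, wrap_with_lock), not the
actual bam_dma code:

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for the BAM hardware descriptor flags. */
#define DESC_FLAG_CMD    (1u << 0)
#define DESC_FLAG_LOCK   (1u << 1)
#define DESC_FLAG_UNLOCK (1u << 2)

struct fake_desc {
	unsigned int flags;	/* command vs. data, lock/unlock bits */
};

/*
 * Model of what the BAM driver would do at dmaengine_issue_pending()
 * time: wrap the client's already-queued descriptor chain with a
 * leading dummy command descriptor carrying the lock bit and a
 * trailing one carrying the unlock bit. Returns the new chain length.
 */
size_t wrap_with_lock(const struct fake_desc *queue, size_t n,
		      struct fake_desc *out)
{
	out[0].flags = DESC_FLAG_CMD | DESC_FLAG_LOCK;	/* dummy write */
	memcpy(&out[1], queue, n * sizeof(*queue));	/* client chain */
	out[n + 1].flags = DESC_FLAG_CMD | DESC_FLAG_UNLOCK;

	return n + 2;
}
```

So a typical chain of one data descriptor plus one command descriptor
ends up as four descriptors, with the pipe locked before the first and
unlocked after the last, without the client having to know about it.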

> > The HPG also only mentions a write command and says nothing about a
> > read. In any case: that's the least of the problems as switching to
> > read doesn't solve the issue of passing the address of the scratchpad
> > register.
>
> True.
>
> >
> > So while some of this *may* just work, I would prefer to stick to what
> > documentation says *will* work. :)
> >
>
> Well, the question is if there is always a dummy register that can be
> safely written (without causing any side effects). This will always
> just be based on experiments, since the concept of a dummy write doesn't
> seem to exist downstream (and I assume the documentation doesn't suggest
> a specific register to use either).
>

You'd think so but the HPG actually does use the word "dummy" to
describe the write operation with lock/unlock bits set. Though it does
not recommend any particular register to do it.

> NAND_VERSION (0xf08) might work for qcom_nandc.c (which might be the
> only other relevant user of the BAM locking functionality...).
>

Yeah, I do the same for QCE, write to the version register.

Bart

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-05 13:54         ` Bartosz Golaszewski
@ 2026-03-05 16:15           ` Stephan Gerhold
  2026-03-06 11:07             ` Bartosz Golaszewski
  0 siblings, 1 reply; 28+ messages in thread
From: Stephan Gerhold @ 2026-03-05 16:15 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Md Sadre Alam, Dmitry Baryshkov, Peter Ujfalusi,
	Michal Simek, Frank Li, dmaengine, linux-doc, linux-kernel,
	linux-arm-msm, linux-crypto, linux-arm-kernel, Konrad Dybcio,
	Dmitry Baryshkov, Bjorn Andersson

On Thu, Mar 05, 2026 at 02:54:02PM +0100, Bartosz Golaszewski wrote:
> On Thu, Mar 5, 2026 at 2:43 PM Stephan Gerhold
> <stephan.gerhold@linaro.org> wrote:
> >
> > On Thu, Mar 05, 2026 at 02:10:55PM +0100, Bartosz Golaszewski wrote:
> > > On Thu, Mar 5, 2026 at 1:00 PM Stephan Gerhold
> > > <stephan.gerhold@linaro.org> wrote:
> > > >
> > > > On Tue, Mar 03, 2026 at 06:13:56PM +0530, Manivannan Sadhasivam wrote:
> > > > > On Mon, Mar 02, 2026 at 04:57:13PM +0100, Bartosz Golaszewski wrote:
> > > > > > NOTE: Please note that even though this is version 11, I changed the
> > > > > > prefix to RFC as this is an entirely new approach resulting from
> > > > > > discussions under v9. I AM AWARE of the existing memory leaks in the
> > > > > > last patch of this series - I'm sending it because I want to first
> > > > > > discuss the approach and get a green light from Vinod as well as Mani
> > > > > > and Bjorn. Especially when it comes to communicating the address for the
> > > > > > dummy writes from the client to the BAM driver.
> > > > > > /NOTE
> > > > > >
> > > > > > Currently the QCE crypto driver accesses the crypto engine registers
> > > > > > directly via CPU. Trust Zone may perform crypto operations simultaneously
> > > > > > resulting in a race condition. To remedy that, let's introduce support
> > > > > > for BAM locking/unlocking to the driver. The BAM driver will now wrap
> > > > > > any existing issued descriptor chains with additional descriptors
> > > > > > performing the locking when the client starts the transaction
> > > > > > (dmaengine_issue_pending()). The client wanting to profit from locking
> > > > > > needs to switch to performing register I/O over DMA and communicate the
> > > > > > address to which to perform the dummy writes via a call to
> > > > > > dmaengine_slave_config().
> > > > > >
> > > > >
> > > > > Thanks for moving the LOCK/UNLOCK bits out of the client to the BAM driver. It looks
> > > > > neat now. I understand the limitation that for LOCK/UNLOCK, BAM needs to perform
> > > > > a dummy write to an address in the client register space. So in this case, you
> > > > > can also use the previous metadata approach to pass the scratchpad register to
> > > > > the BAM driver from clients. The BAM driver can use this register to perform
> > > > > LOCK/UNLOCK.
> > > > >
> > > > > It may sound like I'm suggesting a part of your previous design, but it fits the
> > > > > design more cleanly IMO. The BAM performs LOCK/UNLOCK on its own, but it gets
> > > > > the scratchpad register address from the clients through the metadata once.
> > > > >
> > > > > It is very unfortunate that the IP doesn't accept '0' address for LOCK/UNLOCK or
> > > > > some of them cannot append LOCK/UNLOCK to the actual CMD descriptors passed from
> > > > > the clients. These would've made the code/design even cleaner.
> > > > >
> > > >
> > > > I was staring at the downstream drivers for QCE (qce50.c?) [1] for a bit
> > > > and my impression is that they manage to get along without dummy writes.
> > > > It's a big mess, but it looks like they always have some commands
> > > > (depending on the crypto operation) that they are sending anyway and
> > > > they just assign the LOCK/UNLOCK flag to the command descriptor of that.
> > > >
> > > > It is similar for the second relevant user of the LOCK/UNLOCK flags, the
> > > > QPIC NAND driver (msm_qpic_nand.c in downstream [2], qcom_nandc.c in
> > > > mainline), it is assigned as part of the register programming sequence
> > > > instead of using a dummy write. In addition, the UNLOCK flag is
> > > > sometimes assigned to a READ command descriptor rather than a WRITE.
> > > >
> > > > @Bartosz: Can we get by without doing any dummy writes?
> > > > If not, would a dummy read perhaps be less intrusive than a dummy write?
> > > >
> > >
> > > The HPG says that the LOCK/UNLOCK flag *must* be set on a command
> > > descriptor, not a data descriptor. For a simple encryption we will
> > > typically have a data descriptor and a command descriptor with
> > > register writes. So we need a command descriptor in front of the data
> > > and - while we could technically set the UNLOCK bit on the subsequent
> > > command descriptor - it's unclear from the HPG whether it will unlock
> > > before or after processing the command descriptor with the UNLOCK bit
> > > set. Hence the additional command descriptor at the end.
> > >
> >
> > I won't pretend that I actually understand what the downstream QCE
> > driver is doing, but e.g. qce_ablk_cipher_req() in the qce50.c I linked
> > looks like they just put the command descriptor with all the register
> > writes first and then the data second (followed by another command
> > descriptor for cleanup/unlocking). Is it actually required to put the
> > data first?
> >
> 
> Well, now you're getting into the philosophical issue of imposing
> requirements on the client which seemed to be the main point of
> contention in earlier versions. If you start requiring the client to
> put the DMA operations in a certain order (and it's not based on any
> HW requirement but rather on how the DMA driver is implemented) then
> how is it better than having the client just drive the locking
> altogether like pre v11? We won't get away without at least some
> requirements - like the client doing register I/O over DMA or
> providing the scratchpad address - but I think just wrapping the
> existing queue with additional descriptors in a way transparent to
> consumers is better in this case. And as I said: the HPG doesn't
> explicitly say that it unlocks the pipe *after* the descriptor with
> the unlock bit is processed. It doesn't even hint at what the real
> ordering is.
> 

Yes, I think the transparent approach here is reasonable given the
design of the Linux DMA engine API. Since Mani commented "It is very
unfortunate that the IP doesn't [...]" I mainly wanted to point out that
this is probably because the main use cases the HW designers had in mind
don't strictly require use of a dummy descriptor. The comment about the
dummy descriptors might be more of a side note than an actual
recommendation to implement it that way (otherwise, the downstream
drivers would likely use the dummy descriptor approach as well).

> > > The HPG also only mentions a write command and says nothing about a
> > > read. In any case: that's the least of the problems as switching to
> > > read doesn't solve the issue of passing the address of the scratchpad
> > > register.
> >
> > True.
> >
> > >
> > > So while some of this *may* just work, I would prefer to stick to what
> > > documentation says *will* work. :)
> > >
> >
> > Well, the question is if there is always a dummy register that can be
> > safely written (without causing any side effects). This will be always
> > just based on experiments, since the concept of a dummy write doesn't
> > seem to exist downstream (and I assume the documentation doesn't suggest
> > a specific register to use either).
> >
> 
> You'd think so but the HPG actually does use the word "dummy" to
> describe the write operation with lock/unlock bits set. Though it does
> not recommend any particular register to do it.
> 

I guess the documentation I'm looking at (8.7.3.4 BAM operation in the
public APQ8016E TRM) might be an excerpt from some older version of the
BAM HPG. It also has a note about "dummy" command descriptors:

  "NOTE: Pipe locking and unlocking should appear only in
   command-descriptor. In case a lock is required on a data descriptor
   this can be implemented by a dummy command descriptor with
   lock/unlock bit asserted preceding/following the data descriptor."

This one doesn't draw any distinction between READ and WRITE command
descriptors (and both are documented in the chapter).

Personally, I would prefer using a read over a write if possible. Unless
you can confirm that the register used for the dummy write is actually
read-only *and* write-ignore, writing to the register is essentially
undefined behavior. It will probably do the right thing on most
platforms, but there could also be one out there where writing to the
register triggers an error or potentially even silently ends up writing
into another register. Register logic can be fun in practice, commit
e9a48ea4d90b ("irqchip/qcom-pdc: Workaround hardware register bug on
X1E80100") [1] is a good example of that. :')

Thanks,
Stephan

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a48ea4d90be251e0d057d41665745caccb0351

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O
  2026-03-05 16:15           ` Stephan Gerhold
@ 2026-03-06 11:07             ` Bartosz Golaszewski
  0 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-06 11:07 UTC (permalink / raw)
  To: Stephan Gerhold
  Cc: Bartosz Golaszewski, Manivannan Sadhasivam, Vinod Koul,
	Jonathan Corbet, Thara Gopinath, Herbert Xu, David S. Miller,
	Udit Tiwari, Md Sadre Alam, Dmitry Baryshkov, Peter Ujfalusi,
	Michal Simek, Frank Li, dmaengine, linux-doc, linux-kernel,
	linux-arm-msm, linux-crypto, linux-arm-kernel, Konrad Dybcio,
	Dmitry Baryshkov, Bjorn Andersson

On Thu, Mar 5, 2026 at 5:16 PM Stephan Gerhold
<stephan.gerhold@linaro.org> wrote:
>
> >
> > You'd think so but the HPG actually does use the word "dummy" to
> > describe the write operation with lock/unlock bits set. Though it does
> > not recommend any particular register to do it.
> >
>
> I guess the documentation I'm looking at (8.7.3.4 BAM operation in the
> public APQ8016E TRM) might be an excerpt from some older version of the
> BAM HPG. It also has a note about "dummy" command descriptors:
>
>   "NOTE: Pipe locking and unlocking should appear only in
>    command-descriptor. In case a lock is required on a data descriptor
>    this can be implemented by a dummy command descriptor with
>    lock/unlock bit asserted preceding/following the data descriptor."
>
> This one doesn't draw any distinction between READ and WRITE command
> descriptors (and both are documented in the chapter).
>
> Personally, I would prefer using a read over a write if possible. Unless
> you can confirm that the register used for the dummy write is actually
> read-only *and* write-ignore, writing to the register is essentially
> undefined behavior. It will probably do the right thing on most
> platforms, but there could also be one out there where writing to the
> register triggers an error or potentially even silently ends up writing
> into another register. Register logic can be fun in practice, commit
> e9a48ea4d90b ("irqchip/qcom-pdc: Workaround hardware register bug on
> X1E80100") [1] is a good example of that. :')
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9a48ea4d90be251e0d057d41665745caccb0351

I agree in general but I also learned from the QCE team at Qualcomm
that apparently there were some issues with register reads over DMA,
which makes writes preferable. While the VERSION register is
officially read-only, I've been told writing to it is safe. So it's a
choice between doing a READ that may not work on some platforms and
doing a WRITE that may theoretically not work on some platforms.

I'll send v12, which will be a proper series using register metadata
and correctly freeing resources, and we can discuss again. Doing one
or the other is actually a minor detail of the whole thing after all.
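
The metadata handover could look roughly like this. Again a userspace
model only - the real interface would presumably be the dmaengine
descriptor metadata API (dmaengine_desc_attach_metadata()), and every
name below (bam_lock_metadata, fake_tx_desc, etc.) is invented for
illustration:

```c
#include <stddef.h>

/* Illustrative stand-in for the client-provided metadata blob. */
struct bam_lock_metadata {
	unsigned long long scratchpad_addr; /* register for dummy writes */
};

struct fake_tx_desc {
	const void *metadata;
	size_t metadata_len;
};

/* Model of the client attaching metadata to a prepared descriptor. */
int attach_metadata(struct fake_tx_desc *d, const void *data, size_t len)
{
	if (!d || !data || !len)
		return -1;

	d->metadata = data;
	d->metadata_len = len;

	return 0;
}

/* Model of the BAM driver retrieving the scratchpad address at issue time. */
unsigned long long scratchpad_from_desc(const struct fake_tx_desc *d)
{
	const struct bam_lock_metadata *md = d->metadata;

	if (!md || d->metadata_len != sizeof(*md))
		return 0;

	return md->scratchpad_addr;
}
```

The point being that the address travels with the descriptor rather
than overloading dst_addr in dmaengine_slave_config().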

Bart

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking
  2026-03-04 15:27     ` Bartosz Golaszewski
@ 2026-03-07 20:33       ` Vinod Koul
  2026-03-09 17:05         ` Bartosz Golaszewski
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2026-03-07 20:33 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski

On 04-03-26, 16:27, Bartosz Golaszewski wrote:
> On Wed, Mar 4, 2026 at 3:53 PM Vinod Koul <vkoul@kernel.org> wrote:
> >
> > On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> > > Add support for BAM pipe locking. To that end: when starting the DMA on
> > > an RX channel - wrap the already issued descriptors with additional
> > > command descriptors performing dummy writes to the base register
> > > supplied by the client via dmaengine_slave_config() (if any) alongside
> > > the lock/unlock HW flags.
> > >
> > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> 
> [snip]
> 
> > > +static struct bam_async_desc *
> > > +bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
> > > +                struct bam_cmd_element *ce, unsigned int flag)
> > > +{
> > > +     struct dma_chan *chan = &bchan->vc.chan;
> > > +     struct bam_async_desc *async_desc;
> > > +     struct bam_desc_hw *desc;
> > > +     struct virt_dma_desc *vd;
> > > +     struct virt_dma_chan *vc;
> > > +     unsigned int mapped;
> > > +     dma_cookie_t cookie;
> > > +     int ret;
> > > +
> > > +     async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
> > > +     if (!async_desc) {
> > > +             dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
> > > +             return NULL;
> > > +     }
> > > +
> > > +     async_desc->num_desc = 1;
> > > +     async_desc->curr_desc = async_desc->desc;
> > > +     async_desc->dir = DMA_MEM_TO_DEV;
> > > +
> > > +     desc = async_desc->desc;
> > > +
> > > +     bam_prep_ce_le32(ce, bchan->slave.dst_addr, BAM_WRITE_COMMAND, 0);
> > > +     sg_set_buf(sg, ce, sizeof(*ce));
> > > +
> > > +     mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
> > > +     if (!mapped) {
> > > +             kfree(async_desc);
> > > +             return NULL;
> > > +     }
> > > +
> > > +     desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
> > > +     desc->addr = sg_dma_address(sg);
> > > +     desc->size = sizeof(struct bam_cmd_element);
> > > +
> > > +     vc = &bchan->vc;
> > > +     vd = &async_desc->vd;
> > > +
> > > +     dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
> > > +     vd->tx.flags = DMA_PREP_CMD;
> > > +     vd->tx.desc_free = vchan_tx_desc_free;
> > > +     vd->tx_result.result = DMA_TRANS_NOERROR;
> > > +     vd->tx_result.residue = 0;
> > > +
> > > +     cookie = dma_cookie_assign(&vd->tx);
> > > +     ret = dma_submit_error(cookie);
> >
> > I am not sure I understand this.
> >
> > At the start you add a descriptor to the queue, which ideally should be
> > queued after the existing descriptors are completed!
> >
> > Also I thought you want to append Pipe cmd to descriptors, why not do
> > this while preparing the descriptors and add the pipe cmd at the start and
> > end of the sequence when you prepare... This way you don't need to create
> > a cookie like this
> >
> 
> Client (in this case - crypto engine) can call
> dmaengine_prep_slave_sg() multiple times adding several logical
> descriptors which themselves can have several hardware descriptors. We
> want to lock the channel before issuing the first queued descriptor
> (for crypto: typically data descriptor) and unlock it once the final
> descriptor is processed (typically command descriptor). To that end:
> we insert the dummy command descriptor with the lock flag at the head
> of the queue and the one with the unlock flag at the tail - "wrapping"
> the existing queue with lock/unlock operations.

Why not do this per prep call submitted to the engine? It would be
simpler to just add lock and unlock at the start and end of the transaction.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine
  2026-03-04 15:05     ` Bartosz Golaszewski
@ 2026-03-07 20:35       ` Vinod Koul
  0 siblings, 0 replies; 28+ messages in thread
From: Vinod Koul @ 2026-03-07 20:35 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski

On 04-03-26, 16:05, Bartosz Golaszewski wrote:
> On Wed, Mar 4, 2026 at 3:39 PM Vinod Koul <vkoul@kernel.org> wrote:
> >
> > On 02-03-26, 16:57, Bartosz Golaszewski wrote:
> > > In order to let the BAM DMA engine know which address is used for
> > > register I/O, call dmaengine_slave_config() after requesting the RX
> > > channel and use the config structure to pass that information to the
> > > dmaengine core. This is done ahead of extending the BAM driver with
> > > support for pipe locking, which requires performing dummy writes when
> > > passing the lock/unlock flags alongside the command descriptors.
> > >
> > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
> > > ---
> > >
> > >       dma->txchan = devm_dma_request_chan(dev, "tx");
> > >       if (IS_ERR(dma->txchan))
> > > @@ -121,6 +123,12 @@ int devm_qce_dma_request(struct qce_device *qce)
> > >               return dev_err_probe(dev, PTR_ERR(dma->rxchan),
> > >                                    "Failed to get RX DMA channel\n");
> > >
> > > +     cfg.dst_addr = qce->base_phys;
> > > +     cfg.direction = DMA_MEM_TO_DEV;
> >
> > So is this the address of crypto engine address where dma data is
> > supposed to be pushed to..?
> 
> No. In case I wasn't clear enough in the cover letter: this is the
> address of the *crypto engine* register which we use as a scratchpad
> for the dummy write when issuing the lock/unlock command. Mani
> suggested under the cover letter to use the descriptor metadata for
> that.

This is overloading the address field. If we go this way, I would add a
comment in the code explaining this and why the engine does not need a
peripheral address in the setup. Metadata sounds okay as well.

-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking
  2026-03-07 20:33       ` Vinod Koul
@ 2026-03-09 17:05         ` Bartosz Golaszewski
  0 siblings, 0 replies; 28+ messages in thread
From: Bartosz Golaszewski @ 2026-03-09 17:05 UTC (permalink / raw)
  To: Vinod Koul
  Cc: Bartosz Golaszewski, Jonathan Corbet, Thara Gopinath, Herbert Xu,
	David S. Miller, Udit Tiwari, Daniel Perez-Zoghbi, Md Sadre Alam,
	Dmitry Baryshkov, Peter Ujfalusi, Michal Simek, Frank Li,
	dmaengine, linux-doc, linux-kernel, linux-arm-msm, linux-crypto,
	linux-arm-kernel, Bartosz Golaszewski, Bartosz Golaszewski

On Sat, 7 Mar 2026 21:33:22 +0100, Vinod Koul <vkoul@kernel.org> said:
> On 04-03-26, 16:27, Bartosz Golaszewski wrote:
>> On Wed, Mar 4, 2026 at 3:53 PM Vinod Koul <vkoul@kernel.org> wrote:
>> >
>> > On 02-03-26, 16:57, Bartosz Golaszewski wrote:
>> > > Add support for BAM pipe locking. To that end: when starting the DMA on
>> > > an RX channel - wrap the already issued descriptors with additional
>> > > command descriptors performing dummy writes to the base register
>> > > supplied by the client via dmaengine_slave_config() (if any) alongside
>> > > the lock/unlock HW flags.
>> > >
>> > > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
>>
>> [snip]
>>
>> > > +static struct bam_async_desc *
>> > > +bam_make_lock_desc(struct bam_chan *bchan, struct scatterlist *sg,
>> > > +                struct bam_cmd_element *ce, unsigned int flag)
>> > > +{
>> > > +     struct dma_chan *chan = &bchan->vc.chan;
>> > > +     struct bam_async_desc *async_desc;
>> > > +     struct bam_desc_hw *desc;
>> > > +     struct virt_dma_desc *vd;
>> > > +     struct virt_dma_chan *vc;
>> > > +     unsigned int mapped;
>> > > +     dma_cookie_t cookie;
>> > > +     int ret;
>> > > +
>> > > +     async_desc = kzalloc_flex(*async_desc, desc, 1, GFP_NOWAIT);
>> > > +     if (!async_desc) {
>> > > +             dev_err(bchan->bdev->dev, "failed to allocate the BAM lock descriptor\n");
>> > > +             return NULL;
>> > > +     }
>> > > +
>> > > +     async_desc->num_desc = 1;
>> > > +     async_desc->curr_desc = async_desc->desc;
>> > > +     async_desc->dir = DMA_MEM_TO_DEV;
>> > > +
>> > > +     desc = async_desc->desc;
>> > > +
>> > > +     bam_prep_ce_le32(ce, bchan->slave.dst_addr, BAM_WRITE_COMMAND, 0);
>> > > +     sg_set_buf(sg, ce, sizeof(*ce));
>> > > +
>> > > +     mapped = dma_map_sg_attrs(chan->slave, sg, 1, DMA_TO_DEVICE, DMA_PREP_CMD);
>> > > +     if (!mapped) {
>> > > +             kfree(async_desc);
>> > > +             return NULL;
>> > > +     }
>> > > +
>> > > +     desc->flags |= cpu_to_le16(DESC_FLAG_CMD | flag);
>> > > +     desc->addr = sg_dma_address(sg);
>> > > +     desc->size = sizeof(struct bam_cmd_element);
>> > > +
>> > > +     vc = &bchan->vc;
>> > > +     vd = &async_desc->vd;
>> > > +
>> > > +     dma_async_tx_descriptor_init(&vd->tx, &vc->chan);
>> > > +     vd->tx.flags = DMA_PREP_CMD;
>> > > +     vd->tx.desc_free = vchan_tx_desc_free;
>> > > +     vd->tx_result.result = DMA_TRANS_NOERROR;
>> > > +     vd->tx_result.residue = 0;
>> > > +
>> > > +     cookie = dma_cookie_assign(&vd->tx);
>> > > +     ret = dma_submit_error(cookie);
>> >
>> > I am not sure I understand this.
>> >
>> > At the start you add a descriptor to the queue, which ideally should
>> > be queued after the existing descriptors have completed!
>> >
>> > Also I thought you want to append the pipe cmd to descriptors, why not
>> > do this while preparing the descriptors and add the pipe cmd at the
>> > start and end of the sequence when you prepare... This way you don't
>> > need to create a cookie like this.
>> >
>>
>> The client (in this case, the crypto engine) can call
>> dmaengine_prep_slave_sg() multiple times, adding several logical
>> descriptors which themselves can consist of several hardware
>> descriptors. We want to lock the channel before issuing the first
>> queued descriptor (for crypto: typically a data descriptor) and
>> unlock it once the final descriptor is processed (typically a command
>> descriptor). To that end, we insert a dummy command descriptor with
>> the lock flag at the head of the queue and one with the unlock flag
>> at the tail - "wrapping" the existing queue with lock/unlock
>> operations.
>
> Why not do this per prep call submitted to the engine? It would be
> simpler to just add lock and unlock at the start and end of each
> transaction.
>

Because then we'd have:

  [LOCK] [DATA] [UNLOCK] [LOCK] [CMD] [UNLOCK]

while what we want is:

  [LOCK] [DATA] [CMD] [UNLOCK]

Bartosz

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-03-09 17:06 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-02 15:57 [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 01/12] crypto: qce - Include algapi.h in the core.h header Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 02/12] crypto: qce - Remove unused ignore_buf Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 03/12] crypto: qce - Simplify arguments of devm_qce_dma_request() Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 04/12] crypto: qce - Use existing devres APIs in devm_qce_dma_request() Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 05/12] crypto: qce - Map crypto memory for DMA Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 06/12] crypto: qce - Add BAM DMA support for crypto register I/O Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 07/12] crypto: qce - Communicate the base physical address to the dmaengine Bartosz Golaszewski
2026-03-04 14:39   ` Vinod Koul
2026-03-04 15:05     ` Bartosz Golaszewski
2026-03-07 20:35       ` Vinod Koul
2026-03-02 15:57 ` [PATCH RFC v11 08/12] dmaengine: constify struct dma_descriptor_metadata_ops Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 09/12] dmaengine: qcom: bam_dma: convert tasklet to a BH workqueue Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 10/12] dmaengine: qcom: bam_dma: Extend the driver's device match data Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 11/12] dmaengine: qcom: bam_dma: Add pipe_lock_supported flag support Bartosz Golaszewski
2026-03-02 15:57 ` [PATCH RFC v11 12/12] dmaengine: qcom: bam_dma: add support for BAM locking Bartosz Golaszewski
2026-03-04 14:53   ` Vinod Koul
2026-03-04 15:27     ` Bartosz Golaszewski
2026-03-07 20:33       ` Vinod Koul
2026-03-09 17:05         ` Bartosz Golaszewski
2026-03-03 12:43 ` [PATCH RFC v11 00/12] crypto/dmaengine: qce: introduce BAM locking and use DMA for register I/O Manivannan Sadhasivam
2026-03-03 12:56   ` Bartosz Golaszewski
2026-03-05 11:59   ` Stephan Gerhold
2026-03-05 13:10     ` Bartosz Golaszewski
2026-03-05 13:43       ` Stephan Gerhold
2026-03-05 13:54         ` Bartosz Golaszewski
2026-03-05 16:15           ` Stephan Gerhold
2026-03-06 11:07             ` Bartosz Golaszewski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox