* [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
@ 2025-06-03 12:37 T Pratham
  2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: T Pratham @ 2025-06-03 12:37 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: T Pratham, linux-crypto, devicetree, linux-kernel,
	Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a
new crypto accelerator which contains multiple crypto IPs [1].
This series implements support for ECB and CBC modes of AES for the AES
Engine of the DTHE, using skcipher APIs of the kernel.

Tested with:
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y

and tcrypt,
sudo modprobe tcrypt mode=500 sec=1

Signed-off-by: T Pratham <t-pratham@ti.com>
---
[1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE)
Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf
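
For reference, a rough, untested sketch of exercising the registered
cbc(aes) skcipher from kernel code (not part of this series; the module
and function names below are made up, the crypto API calls are the
standard kernel ones):

#include <crypto/aes.h>
#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int __init dthev2_smoke_test_init(void)
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req = NULL;
	DECLARE_CRYPTO_WAIT(wait);
	struct scatterlist sg;
	u8 key[AES_KEYSIZE_128] = { 0 };
	u8 iv[AES_BLOCK_SIZE] = { 0 };
	u8 *buf = NULL;
	int ret;

	tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!buf || !req) {
		ret = -ENOMEM;
		goto out;
	}

	ret = crypto_skcipher_setkey(tfm, key, sizeof(key));
	if (ret)
		goto out;

	/* Encrypt one page of zeroes in place and wait for completion */
	sg_init_one(&sg, buf, PAGE_SIZE);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
				      CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, PAGE_SIZE, iv);

	ret = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
	pr_info("cbc(aes) encrypt returned %d (driver: %s)\n",
		ret, crypto_skcipher_driver_name(tfm));
out:
	skcipher_request_free(req);
	kfree(buf);
	crypto_free_skcipher(tfm);
	return ret;
}
module_init(dthev2_smoke_test_init);

MODULE_DESCRIPTION("cbc(aes) skcipher smoke test (illustrative only)");
MODULE_LICENSE("GPL");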

Change log:
v5:
 - Simplified tfm ctx struct
 - Set cra_reqsize instead of using crypto_skcipher_set_reqsize()
 - Move setting sysconfig and irqenable registers to dthe_aes_run
v4:
 - Corrected dt-bindings example indentation
 - Simplified dt-bindings example, removing the node surrounding crypto
 - Fixed typo in dthev2-common.h header guard
 - Removed unused ctx field in dev_data struct
 - Moved per-op data into request context
v3:
 - Corrected dt-bindings "reg length is too long" error
 - Converted AES driver code to use crypto_engine APIs for using
   internal crypto queue instead of mutex.
 - Removed calls to skcipher_request_complete in paths not returning
   -EINPROGRESS before.
 - Added missing KConfig import, which was accidentally removed in v2.

v2:
 - Corrected dt-bindings syntax errors and other review comments in v1.
 - Completely changed driver code structure, splitting code into
   multiple files

Link to previous versions:
v4: https://lore.kernel.org/all/20250508101723.846210-2-t-pratham@ti.com/
v3: https://lore.kernel.org/all/20250502121253.456974-2-t-pratham@ti.com/
v2: https://lore.kernel.org/all/20250411091321.2925308-1-t-pratham@ti.com/
v1: https://lore.kernel.org/all/20250206-dthe-v2-aes-v1-0-1e86cf683928@ti.com/
---

T Pratham (2):
  dt-bindings: crypto: Add binding for TI DTHE V2
  crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)

 .../bindings/crypto/ti,am62l-dthev2.yaml      |  50 +++
 MAINTAINERS                                   |   7 +
 drivers/crypto/Kconfig                        |   1 +
 drivers/crypto/Makefile                       |   1 +
 drivers/crypto/ti/Kconfig                     |  13 +
 drivers/crypto/ti/Makefile                    |   3 +
 drivers/crypto/ti/dthev2-aes.c                | 420 ++++++++++++++++++
 drivers/crypto/ti/dthev2-common.c             | 220 +++++++++
 drivers/crypto/ti/dthev2-common.h             | 101 +++++
 9 files changed, 816 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml
 create mode 100644 drivers/crypto/ti/Kconfig
 create mode 100644 drivers/crypto/ti/Makefile
 create mode 100644 drivers/crypto/ti/dthev2-aes.c
 create mode 100644 drivers/crypto/ti/dthev2-common.c
 create mode 100644 drivers/crypto/ti/dthev2-common.h

-- 
2.43.0



* [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2
  2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham
@ 2025-06-03 12:37 ` T Pratham
  2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
  2025-06-17  4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers
  2 siblings, 0 replies; 16+ messages in thread
From: T Pratham @ 2025-06-03 12:37 UTC (permalink / raw)
  To: T Pratham, Herbert Xu, David S. Miller, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley
  Cc: Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry, Krzysztof Kozlowski, linux-crypto, devicetree,
	linux-kernel

Add DT binding for Texas Instruments DTHE V2 crypto accelerator.

DTHE V2 is introduced as part of the TI AM62L SoC and is currently found
only on that SoC.

Signed-off-by: T Pratham <t-pratham@ti.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
 .../bindings/crypto/ti,am62l-dthev2.yaml      | 50 +++++++++++++++++++
 MAINTAINERS                                   |  6 +++
 2 files changed, 56 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml

diff --git a/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml b/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml
new file mode 100644
index 000000000000..5486bfeb2fe8
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml
@@ -0,0 +1,50 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/crypto/ti,am62l-dthev2.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: K3 SoC DTHE V2 crypto module
+
+maintainers:
+  - T Pratham <t-pratham@ti.com>
+
+properties:
+  compatible:
+    enum:
+      - ti,am62l-dthev2
+
+  reg:
+    maxItems: 1
+
+  dmas:
+    items:
+      - description: AES Engine RX DMA Channel
+      - description: AES Engine TX DMA Channel
+      - description: SHA Engine TX DMA Channel
+
+  dma-names:
+    items:
+      - const: rx
+      - const: tx1
+      - const: tx2
+
+required:
+  - compatible
+  - reg
+  - dmas
+  - dma-names
+
+additionalProperties: false
+
+examples:
+  - |
+    crypto@40800000 {
+        compatible = "ti,am62l-dthev2";
+        reg = <0x40800000 0x10000>;
+
+        dmas = <&main_bcdma 0 0 0x4700 0>,
+               <&main_bcdma 0 0 0xc701 0>,
+               <&main_bcdma 0 0 0xc700 0>;
+        dma-names = "rx", "tx1", "tx2";
+    };
diff --git a/MAINTAINERS b/MAINTAINERS
index b119ae6b5f14..b990850b4994 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24552,6 +24552,12 @@ S:	Odd Fixes
 F:	drivers/clk/ti/
 F:	include/linux/clk/ti.h
 
+TI DATA TRANSFORM AND HASHING ENGINE (DTHE) V2 CRYPTO ACCELERATOR DRIVER
+M:	T Pratham <t-pratham@ti.com>
+L:	linux-crypto@vger.kernel.org
+S:	Supported
+F:	Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml
+
 TI DAVINCI MACHINE SUPPORT
 M:	Bartosz Golaszewski <brgl@bgdev.pl>
 L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
-- 
2.43.0



* [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham
  2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham
@ 2025-06-03 12:37 ` T Pratham
  2025-06-11  9:31   ` Herbert Xu
                     ` (2 more replies)
  2025-06-17  4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers
  2 siblings, 3 replies; 16+ messages in thread
From: T Pratham @ 2025-06-03 12:37 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, T Pratham
  Cc: Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry, linux-kernel, linux-crypto

Add support for ECB and CBC modes in the AES Engine of the DTHE V2
hardware crypto accelerator.

Signed-off-by: T Pratham <t-pratham@ti.com>
---
 MAINTAINERS                       |   1 +
 drivers/crypto/Kconfig            |   1 +
 drivers/crypto/Makefile           |   1 +
 drivers/crypto/ti/Kconfig         |  13 +
 drivers/crypto/ti/Makefile        |   3 +
 drivers/crypto/ti/dthev2-aes.c    | 420 ++++++++++++++++++++++++++++++
 drivers/crypto/ti/dthev2-common.c | 220 ++++++++++++++++
 drivers/crypto/ti/dthev2-common.h | 101 +++++++
 8 files changed, 760 insertions(+)
 create mode 100644 drivers/crypto/ti/Kconfig
 create mode 100644 drivers/crypto/ti/Makefile
 create mode 100644 drivers/crypto/ti/dthev2-aes.c
 create mode 100644 drivers/crypto/ti/dthev2-common.c
 create mode 100644 drivers/crypto/ti/dthev2-common.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b990850b4994..25a42e0b19e4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -24557,6 +24557,7 @@ M:	T Pratham <t-pratham@ti.com>
 L:	linux-crypto@vger.kernel.org
 S:	Supported
 F:	Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml
+F:	drivers/crypto/ti/
 
 TI DAVINCI MACHINE SUPPORT
 M:	Bartosz Golaszewski <brgl@bgdev.pl>
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 5686369779be..c1407236f137 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -849,5 +849,6 @@ config CRYPTO_DEV_SA2UL
 source "drivers/crypto/aspeed/Kconfig"
 source "drivers/crypto/starfive/Kconfig"
 source "drivers/crypto/inside-secure/eip93/Kconfig"
+source "drivers/crypto/ti/Kconfig"
 
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 22eadcc8f4a2..c0e8d4acc37c 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -48,3 +48,4 @@ obj-$(CONFIG_CRYPTO_DEV_AMLOGIC_GXL) += amlogic/
 obj-y += intel/
 obj-y += starfive/
 obj-y += cavium/
+obj-$(CONFIG_ARCH_K3) += ti/
diff --git a/drivers/crypto/ti/Kconfig b/drivers/crypto/ti/Kconfig
new file mode 100644
index 000000000000..575e09fa7d96
--- /dev/null
+++ b/drivers/crypto/ti/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config CRYPTO_DEV_TI_DTHEV2
+	tristate "Support for TI DTHE V2 crypto accelerators"
+	depends on CRYPTO && CRYPTO_HW && ARCH_K3
+	select CRYPTO_ENGINE
+	select CRYPTO_SKCIPHER
+	select CRYPTO_ECB
+	select CRYPTO_CBC
+	help
+	  This enables support for the TI DTHE V2 hw crypto accelerator
+	  which can be found on TI K3 SOCs. Selecting this enables use
+	  of hardware acceleration for cryptographic algorithms on
+	  these devices.
diff --git a/drivers/crypto/ti/Makefile b/drivers/crypto/ti/Makefile
new file mode 100644
index 000000000000..b883078f203d
--- /dev/null
+++ b/drivers/crypto/ti/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_CRYPTO_DEV_TI_DTHEV2) += dthev2.o
+dthev2-objs := dthev2-common.o dthev2-aes.o
diff --git a/drivers/crypto/ti/dthev2-aes.c b/drivers/crypto/ti/dthev2-aes.c
new file mode 100644
index 000000000000..7e02797b7b14
--- /dev/null
+++ b/drivers/crypto/ti/dthev2-aes.c
@@ -0,0 +1,420 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * K3 DTHE V2 crypto accelerator driver
+ *
+ * Copyright (C) Texas Instruments 2025 - https://www.ti.com
+ * Author: T Pratham <t-pratham@ti.com>
+ */
+
+#include <crypto/aead.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/engine.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/skcipher.h>
+
+#include "dthev2-common.h"
+
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/scatterlist.h>
+
+/* Registers */
+
+// AES Engine
+#define DTHE_P_AES_BASE		0x7000
+#define DTHE_P_AES_KEY1_0	0x0038
+#define DTHE_P_AES_KEY1_1	0x003C
+#define DTHE_P_AES_KEY1_2	0x0030
+#define DTHE_P_AES_KEY1_3	0x0034
+#define DTHE_P_AES_KEY1_4	0x0028
+#define DTHE_P_AES_KEY1_5	0x002C
+#define DTHE_P_AES_KEY1_6	0x0020
+#define DTHE_P_AES_KEY1_7	0x0024
+#define DTHE_P_AES_IV_IN_0	0x0040
+#define DTHE_P_AES_IV_IN_1	0x0044
+#define DTHE_P_AES_IV_IN_2	0x0048
+#define DTHE_P_AES_IV_IN_3	0x004C
+#define DTHE_P_AES_CTRL		0x0050
+#define DTHE_P_AES_C_LENGTH_0	0x0054
+#define DTHE_P_AES_C_LENGTH_1	0x0058
+#define DTHE_P_AES_AUTH_LENGTH	0x005C
+#define DTHE_P_AES_DATA_IN_OUT	0x0060
+
+#define DTHE_P_AES_SYSCONFIG	0x0084
+#define DTHE_P_AES_IRQSTATUS	0x008C
+#define DTHE_P_AES_IRQENABLE	0x0090
+
+/* Register write values and macros */
+
+enum aes_ctrl_mode_masks {
+	AES_CTRL_ECB_MASK = 0x00,
+	AES_CTRL_CBC_MASK = BIT(5),
+};
+
+#define DTHE_AES_CTRL_MODE_CLEAR_MASK		~GENMASK(28, 5)
+
+#define DTHE_AES_CTRL_DIR_ENC			BIT(2)
+
+#define DTHE_AES_CTRL_KEYSIZE_16B		BIT(3)
+#define DTHE_AES_CTRL_KEYSIZE_24B		BIT(4)
+#define DTHE_AES_CTRL_KEYSIZE_32B		(BIT(3) | BIT(4))
+
+#define DTHE_AES_CTRL_SAVE_CTX_SET		BIT(29)
+
+#define DTHE_AES_CTRL_OUTPUT_READY		BIT_MASK(0)
+#define DTHE_AES_CTRL_INPUT_READY		BIT_MASK(1)
+#define DTHE_AES_CTRL_SAVED_CTX_READY		BIT_MASK(30)
+#define DTHE_AES_CTRL_CTX_READY			BIT_MASK(31)
+
+#define DTHE_AES_SYSCONFIG_DMA_DATA_IN_OUT_EN	GENMASK(6, 5)
+#define DTHE_AES_IRQENABLE_EN_ALL		GENMASK(3, 0)
+
+/* Misc */
+#define AES_IV_SIZE				AES_BLOCK_SIZE
+#define AES_BLOCK_WORDS				(AES_BLOCK_SIZE / sizeof(u32))
+#define AES_IV_WORDS				AES_BLOCK_WORDS
+
+static int dthe_cipher_init_tfm(struct crypto_skcipher *tfm)
+{
+	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct dthe_data *dev_data = dthe_get_dev(ctx);
+
+	memzero_explicit(ctx, sizeof(*ctx));
+	ctx->dev_data = dev_data;
+	ctx->keylen = 0;
+
+	return 0;
+}
+
+static void dthe_cipher_exit_tfm(struct crypto_skcipher *tfm)
+{
+	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	ctx->keylen = 0;
+}
+
+static int dthe_aes_setkey(struct crypto_skcipher *tfm, const u8 *key, unsigned int keylen)
+{
+	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+	if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 && keylen != AES_KEYSIZE_256)
+		return -EINVAL;
+
+	ctx->keylen = keylen;
+	memcpy(ctx->key, key, keylen);
+
+	return 0;
+}
+
+static void dthe_aes_set_ctrl_key(struct dthe_tfm_ctx *ctx,
+				  struct dthe_aes_req_ctx *rctx,
+				  u32 *iv_in)
+{
+	struct dthe_data *dev_data = dthe_get_dev(ctx);
+	void __iomem *aes_base_reg = dev_data->regs + DTHE_P_AES_BASE;
+	u32 ctrl_val = 0;
+
+	writel_relaxed(ctx->key[0], aes_base_reg + DTHE_P_AES_KEY1_0);
+	writel_relaxed(ctx->key[1], aes_base_reg + DTHE_P_AES_KEY1_1);
+	writel_relaxed(ctx->key[2], aes_base_reg + DTHE_P_AES_KEY1_2);
+	writel_relaxed(ctx->key[3], aes_base_reg + DTHE_P_AES_KEY1_3);
+
+	if (ctx->keylen > AES_KEYSIZE_128) {
+		writel_relaxed(ctx->key[4], aes_base_reg + DTHE_P_AES_KEY1_4);
+		writel_relaxed(ctx->key[5], aes_base_reg + DTHE_P_AES_KEY1_5);
+	}
+	if (ctx->keylen == AES_KEYSIZE_256) {
+		writel_relaxed(ctx->key[6], aes_base_reg + DTHE_P_AES_KEY1_6);
+		writel_relaxed(ctx->key[7], aes_base_reg + DTHE_P_AES_KEY1_7);
+	}
+
+	if (rctx->enc)
+		ctrl_val |= DTHE_AES_CTRL_DIR_ENC;
+
+	if (ctx->keylen == AES_KEYSIZE_128)
+		ctrl_val |= DTHE_AES_CTRL_KEYSIZE_16B;
+	else if (ctx->keylen == AES_KEYSIZE_192)
+		ctrl_val |= DTHE_AES_CTRL_KEYSIZE_24B;
+	else
+		ctrl_val |= DTHE_AES_CTRL_KEYSIZE_32B;
+
+	// Write AES mode
+	ctrl_val &= DTHE_AES_CTRL_MODE_CLEAR_MASK;
+	switch (rctx->mode) {
+	case DTHE_AES_ECB:
+		ctrl_val |= AES_CTRL_ECB_MASK;
+		break;
+	case DTHE_AES_CBC:
+		ctrl_val |= AES_CTRL_CBC_MASK;
+		break;
+	}
+
+	if (iv_in) {
+		ctrl_val |= DTHE_AES_CTRL_SAVE_CTX_SET;
+		for (int i = 0; i < AES_IV_WORDS; ++i)
+			writel_relaxed(iv_in[i],
+				       aes_base_reg + DTHE_P_AES_IV_IN_0 + (DTHE_REG_SIZE * i));
+	}
+
+	writel_relaxed(ctrl_val, aes_base_reg + DTHE_P_AES_CTRL);
+}
+
+static void dthe_aes_dma_in_callback(void *data)
+{
+	struct skcipher_request *req = (struct skcipher_request *)data;
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	complete(&rctx->aes_compl);
+}
+
+static int dthe_aes_run(struct crypto_engine *engine, void *areq)
+{
+	struct skcipher_request *req = container_of(areq, struct skcipher_request, base);
+	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
+	struct dthe_data *dev_data = dthe_get_dev(ctx);
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	unsigned int len = req->cryptlen;
+	struct scatterlist *src = req->src;
+	struct scatterlist *dst = req->dst;
+
+	int src_nents = sg_nents_for_len(src, len);
+	int dst_nents;
+
+	int src_mapped_nents;
+	int dst_mapped_nents;
+
+	bool diff_dst;
+	enum dma_data_direction src_dir, dst_dir;
+
+	struct device *tx_dev, *rx_dev;
+	struct dma_async_tx_descriptor *desc_in, *desc_out;
+
+	int ret;
+
+	void __iomem *aes_base_reg = dev_data->regs + DTHE_P_AES_BASE;
+
+	u32 aes_irqenable_val = readl_relaxed(aes_base_reg + DTHE_P_AES_IRQENABLE);
+	u32 aes_sysconfig_val = readl_relaxed(aes_base_reg + DTHE_P_AES_SYSCONFIG);
+
+	aes_sysconfig_val |= DTHE_AES_SYSCONFIG_DMA_DATA_IN_OUT_EN;
+	writel_relaxed(aes_sysconfig_val, aes_base_reg + DTHE_P_AES_SYSCONFIG);
+
+	aes_irqenable_val |= DTHE_AES_IRQENABLE_EN_ALL;
+	writel_relaxed(aes_irqenable_val, aes_base_reg + DTHE_P_AES_IRQENABLE);
+
+	if (src == dst) {
+		diff_dst = false;
+		src_dir = DMA_BIDIRECTIONAL;
+		dst_dir = DMA_BIDIRECTIONAL;
+	} else {
+		diff_dst = true;
+		src_dir = DMA_TO_DEVICE;
+		dst_dir  = DMA_FROM_DEVICE;
+	}
+
+	tx_dev = dmaengine_get_dma_device(dev_data->dma_aes_tx);
+	rx_dev = dmaengine_get_dma_device(dev_data->dma_aes_rx);
+
+	src_mapped_nents = dma_map_sg(tx_dev, src, src_nents, src_dir);
+	if (src_mapped_nents == 0) {
+		ret = -EINVAL;
+		goto aes_err;
+	}
+
+	if (!diff_dst) {
+		dst_nents = src_nents;
+		dst_mapped_nents = src_mapped_nents;
+	} else {
+		dst_nents = sg_nents_for_len(dst, len);
+		dst_mapped_nents = dma_map_sg(rx_dev, dst, dst_nents, dst_dir);
+		if (dst_mapped_nents == 0) {
+			dma_unmap_sg(tx_dev, src, src_nents, src_dir);
+			ret = -EINVAL;
+			goto aes_err;
+		}
+	}
+
+	desc_in = dmaengine_prep_slave_sg(dev_data->dma_aes_rx, dst, dst_mapped_nents,
+					  DMA_DEV_TO_MEM, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	if (!desc_in) {
+		dev_err(dev_data->dev, "IN prep_slave_sg() failed\n");
+		ret = -EINVAL;
+		goto aes_prep_err;
+	}
+
+	desc_out = dmaengine_prep_slave_sg(dev_data->dma_aes_tx, src, src_mapped_nents,
+					   DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+	if (!desc_out) {
+		dev_err(dev_data->dev, "OUT prep_slave_sg() failed\n");
+		ret = -EINVAL;
+		goto aes_prep_err;
+	}
+
+	desc_in->callback = dthe_aes_dma_in_callback;
+	desc_in->callback_param = req;
+
+	init_completion(&rctx->aes_compl);
+
+	if (rctx->mode == DTHE_AES_ECB)
+		dthe_aes_set_ctrl_key(ctx, rctx, NULL);
+	else
+		dthe_aes_set_ctrl_key(ctx, rctx, (u32 *)req->iv);
+
+	writel_relaxed(req->cryptlen, aes_base_reg + DTHE_P_AES_C_LENGTH_0);
+
+	dmaengine_submit(desc_in);
+	dmaengine_submit(desc_out);
+
+	dma_async_issue_pending(dev_data->dma_aes_rx);
+	dma_async_issue_pending(dev_data->dma_aes_tx);
+
+	// Need to do a timeout to ensure finalise gets called if DMA callback fails for any reason
+	ret = wait_for_completion_timeout(&rctx->aes_compl, msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));
+	if (!ret) {
+		ret = -ETIMEDOUT;
+
+		for (int i = 0; i < AES_BLOCK_WORDS; ++i)
+			readl_relaxed(aes_base_reg + DTHE_P_AES_DATA_IN_OUT + (DTHE_REG_SIZE * i));
+	} else {
+		ret = 0;
+	}
+
+	// For modes other than ECB, read IV_OUT
+	if (rctx->mode != DTHE_AES_ECB) {
+		u32 *iv_out = (u32 *)req->iv;
+
+		for (int i = 0; i < AES_IV_WORDS; ++i)
+			iv_out[i] = readl_relaxed(aes_base_reg +
+						  DTHE_P_AES_IV_IN_0 +
+						  (DTHE_REG_SIZE * i));
+	}
+
+aes_prep_err:
+	dma_unmap_sg(tx_dev, src, src_nents, src_dir);
+	if (dst_dir != DMA_BIDIRECTIONAL)
+		dma_unmap_sg(rx_dev, dst, dst_nents, dst_dir);
+
+aes_err:
+	local_bh_disable();
+	crypto_finalize_skcipher_request(dev_data->engine, req, ret);
+	local_bh_enable();
+	return 0;
+}
+
+static int dthe_aes_crypt(struct skcipher_request *req)
+{
+	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
+	struct dthe_data *dev_data = dthe_get_dev(ctx);
+	struct crypto_engine *engine;
+
+	/*
+	 * If data is not a multiple of AES_BLOCK_SIZE, need to return -EINVAL
+	 * If data length input is zero, no need to do any operation.
+	 */
+	if (req->cryptlen % AES_BLOCK_SIZE)
+		return -EINVAL;
+
+	if (req->cryptlen == 0)
+		return 0;
+
+	engine = dev_data->engine;
+	return crypto_transfer_skcipher_request_to_engine(engine, req);
+}
+
+static int dthe_aes_ecb_encrypt(struct skcipher_request *req)
+{
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	rctx->enc = 1;
+	rctx->mode = DTHE_AES_ECB;
+	return dthe_aes_crypt(req);
+}
+
+static int dthe_aes_ecb_decrypt(struct skcipher_request *req)
+{
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	rctx->enc = 0;
+	rctx->mode = DTHE_AES_ECB;
+	return dthe_aes_crypt(req);
+}
+
+static int dthe_aes_cbc_encrypt(struct skcipher_request *req)
+{
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	rctx->enc = 1;
+	rctx->mode = DTHE_AES_CBC;
+	return dthe_aes_crypt(req);
+}
+
+static int dthe_aes_cbc_decrypt(struct skcipher_request *req)
+{
+	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);
+
+	rctx->enc = 0;
+	rctx->mode = DTHE_AES_CBC;
+	return dthe_aes_crypt(req);
+}
+
+static struct skcipher_engine_alg cipher_algs[] = {
+	{
+		.base.init			= dthe_cipher_init_tfm,
+		.base.exit			= dthe_cipher_exit_tfm,
+		.base.setkey			= dthe_aes_setkey,
+		.base.encrypt			= dthe_aes_ecb_encrypt,
+		.base.decrypt			= dthe_aes_ecb_decrypt,
+		.base.min_keysize		= AES_MIN_KEY_SIZE,
+		.base.max_keysize		= AES_MAX_KEY_SIZE,
+		.base.base = {
+			.cra_name		= "ecb(aes)",
+			.cra_driver_name	= "ecb-aes-dthev2",
+			.cra_priority		= 30000,
+			.cra_flags		= CRYPTO_ALG_TYPE_SKCIPHER |
+						  CRYPTO_ALG_KERN_DRIVER_ONLY,
+			.cra_alignmask		= AES_BLOCK_SIZE - 1,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= sizeof(struct dthe_tfm_ctx),
+			.cra_reqsize		= sizeof(struct dthe_aes_req_ctx),
+			.cra_module		= THIS_MODULE,
+		},
+		.op.do_one_request = dthe_aes_run,
+	}, /* ECB AES */
+	{
+		.base.init			= dthe_cipher_init_tfm,
+		.base.exit			= dthe_cipher_exit_tfm,
+		.base.setkey			= dthe_aes_setkey,
+		.base.encrypt			= dthe_aes_cbc_encrypt,
+		.base.decrypt			= dthe_aes_cbc_decrypt,
+		.base.min_keysize		= AES_MIN_KEY_SIZE,
+		.base.max_keysize		= AES_MAX_KEY_SIZE,
+		.base.ivsize			= AES_IV_SIZE,
+		.base.base = {
+			.cra_name		= "cbc(aes)",
+			.cra_driver_name	= "cbc-aes-dthev2",
+			.cra_priority		= 30000,
+			.cra_flags		= CRYPTO_ALG_TYPE_SKCIPHER |
+						  CRYPTO_ALG_KERN_DRIVER_ONLY,
+			.cra_alignmask		= AES_BLOCK_SIZE - 1,
+			.cra_blocksize		= AES_BLOCK_SIZE,
+			.cra_ctxsize		= sizeof(struct dthe_tfm_ctx),
+			.cra_reqsize		= sizeof(struct dthe_aes_req_ctx),
+			.cra_module		= THIS_MODULE,
+		},
+		.op.do_one_request = dthe_aes_run,
+	} /* CBC AES */
+};
+
+int dthe_register_aes_algs(void)
+{
+	return crypto_engine_register_skciphers(cipher_algs, ARRAY_SIZE(cipher_algs));
+}
+
+void dthe_unregister_aes_algs(void)
+{
+	crypto_engine_unregister_skciphers(cipher_algs, ARRAY_SIZE(cipher_algs));
+}
diff --git a/drivers/crypto/ti/dthev2-common.c b/drivers/crypto/ti/dthev2-common.c
new file mode 100644
index 000000000000..4d5b59d3ef1f
--- /dev/null
+++ b/drivers/crypto/ti/dthev2-common.c
@@ -0,0 +1,220 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * K3 DTHE V2 crypto accelerator driver
+ *
+ * Copyright (C) Texas Instruments 2025 - https://www.ti.com
+ * Author: T Pratham <t-pratham@ti.com>
+ */
+
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/engine.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/skcipher.h>
+
+#include "dthev2-common.h"
+
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dmapool.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/platform_device.h>
+#include <linux/scatterlist.h>
+
+#define DRIVER_NAME	"dthev2"
+
+static struct dthe_list dthe_dev_list = {
+	.dev_list = LIST_HEAD_INIT(dthe_dev_list.dev_list),
+	.lock = __SPIN_LOCK_UNLOCKED(dthe_dev_list.lock),
+};
+
+struct dthe_data *dthe_get_dev(struct dthe_tfm_ctx *ctx)
+{
+	struct dthe_data *dev_data;
+
+	if (ctx->dev_data)
+		return ctx->dev_data;
+
+	spin_lock_bh(&dthe_dev_list.lock);
+	dev_data = list_first_entry(&dthe_dev_list.dev_list, struct dthe_data, list);
+	if (dev_data)
+		list_move_tail(&dev_data->list, &dthe_dev_list.dev_list);
+	spin_unlock_bh(&dthe_dev_list.lock);
+
+	return dev_data;
+}
+
+static int dthe_dma_init(struct dthe_data *dev_data)
+{
+	int ret;
+	struct dma_slave_config cfg;
+
+	dev_data->dma_aes_rx = NULL;
+	dev_data->dma_aes_tx = NULL;
+	dev_data->dma_sha_tx = NULL;
+
+	dev_data->dma_aes_rx = dma_request_chan(dev_data->dev, "rx");
+	if (IS_ERR(dev_data->dma_aes_rx)) {
+		return dev_err_probe(dev_data->dev, PTR_ERR(dev_data->dma_aes_rx),
+				     "Unable to request rx DMA channel\n");
+	}
+
+	dev_data->dma_aes_tx = dma_request_chan(dev_data->dev, "tx1");
+	if (IS_ERR(dev_data->dma_aes_tx)) {
+		ret = dev_err_probe(dev_data->dev, PTR_ERR(dev_data->dma_aes_tx),
+				    "Unable to request tx1 DMA channel\n");
+		goto err_dma_aes_tx;
+	}
+
+	dev_data->dma_sha_tx = dma_request_chan(dev_data->dev, "tx2");
+	if (IS_ERR(dev_data->dma_sha_tx)) {
+		ret = dev_err_probe(dev_data->dev, PTR_ERR(dev_data->dma_sha_tx),
+				    "Unable to request tx2 DMA channel\n");
+		goto err_dma_sha_tx;
+	}
+
+	memzero_explicit(&cfg, sizeof(cfg));
+
+	cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.src_maxburst = 4;
+
+	ret = dmaengine_slave_config(dev_data->dma_aes_rx, &cfg);
+	if (ret) {
+		dev_err(dev_data->dev, "Can't configure IN dmaengine slave: %d\n", ret);
+		goto err_dma_config;
+	}
+
+	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.dst_maxburst = 4;
+
+	ret = dmaengine_slave_config(dev_data->dma_aes_tx, &cfg);
+	if (ret) {
+		dev_err(dev_data->dev, "Can't configure OUT dmaengine slave: %d\n", ret);
+		goto err_dma_config;
+	}
+
+	return 0;
+
+err_dma_config:
+	dma_release_channel(dev_data->dma_sha_tx);
+err_dma_sha_tx:
+	dma_release_channel(dev_data->dma_aes_tx);
+err_dma_aes_tx:
+	dma_release_channel(dev_data->dma_aes_rx);
+
+	return ret;
+}
+
+static int dthe_register_algs(void)
+{
+	return dthe_register_aes_algs();
+}
+
+static void dthe_unregister_algs(void)
+{
+	dthe_unregister_aes_algs();
+}
+
+static int dthe_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct dthe_data *dev_data;
+	int ret;
+
+	dev_data = devm_kzalloc(dev, sizeof(*dev_data), GFP_KERNEL);
+	if (!dev_data)
+		return -ENOMEM;
+
+	dev_data->dev = dev;
+	dev_data->regs = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(dev_data->regs))
+		return PTR_ERR(dev_data->regs);
+
+	platform_set_drvdata(pdev, dev_data);
+
+	spin_lock(&dthe_dev_list.lock);
+	list_add(&dev_data->list, &dthe_dev_list.dev_list);
+	spin_unlock(&dthe_dev_list.lock);
+
+	ret = dthe_dma_init(dev_data);
+	if (ret)
+		goto probe_dma_err;
+
+	dev_data->engine = crypto_engine_alloc_init(dev, 1);
+	if (!dev_data->engine) {
+		ret = -ENOMEM;
+		goto probe_engine_err;
+	}
+
+	ret = crypto_engine_start(dev_data->engine);
+	if (ret) {
+		dev_err(dev, "Failed to start crypto engine\n");
+		goto probe_engine_start_err;
+	}
+
+	ret = dthe_register_algs();
+	if (ret) {
+		dev_err(dev, "Failed to register algs\n");
+		goto probe_reg_err;
+	}
+
+	return 0;
+
+probe_reg_err:
+	crypto_engine_stop(dev_data->engine);
+probe_engine_start_err:
+	crypto_engine_exit(dev_data->engine);
+probe_engine_err:
+	dma_release_channel(dev_data->dma_aes_rx);
+	dma_release_channel(dev_data->dma_aes_tx);
+	dma_release_channel(dev_data->dma_sha_tx);
+probe_dma_err:
+	spin_lock(&dthe_dev_list.lock);
+	list_del(&dev_data->list);
+	spin_unlock(&dthe_dev_list.lock);
+
+	return ret;
+}
+
+static void dthe_remove(struct platform_device *pdev)
+{
+	struct dthe_data *dev_data = platform_get_drvdata(pdev);
+
+	spin_lock(&dthe_dev_list.lock);
+	list_del(&dev_data->list);
+	spin_unlock(&dthe_dev_list.lock);
+
+	dthe_unregister_algs();
+
+	crypto_engine_stop(dev_data->engine);
+	crypto_engine_exit(dev_data->engine);
+
+	dma_release_channel(dev_data->dma_aes_rx);
+	dma_release_channel(dev_data->dma_aes_tx);
+	dma_release_channel(dev_data->dma_sha_tx);
+}
+
+static const struct of_device_id dthe_of_match[] = {
+	{ .compatible = "ti,am62l-dthev2", },
+	{},
+};
+MODULE_DEVICE_TABLE(of, dthe_of_match);
+
+static struct platform_driver dthe_driver = {
+	.probe	= dthe_probe,
+	.remove	= dthe_remove,
+	.driver = {
+		.name		= DRIVER_NAME,
+		.of_match_table	= dthe_of_match,
+	},
+};
+
+module_platform_driver(dthe_driver);
+
+MODULE_AUTHOR("T Pratham <t-pratham@ti.com>");
+MODULE_DESCRIPTION("Texas Instruments DTHE V2 driver");
+MODULE_LICENSE("GPL");
diff --git a/drivers/crypto/ti/dthev2-common.h b/drivers/crypto/ti/dthev2-common.h
new file mode 100644
index 000000000000..a6c1a69a2437
--- /dev/null
+++ b/drivers/crypto/ti/dthev2-common.h
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * K3 DTHE V2 crypto accelerator driver
+ *
+ * Copyright (C) Texas Instruments 2025 - https://www.ti.com
+ * Author: T Pratham <t-pratham@ti.com>
+ */
+
+#ifndef __TI_DTHEV2_H__
+#define __TI_DTHEV2_H__
+
+#include <crypto/aead.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/engine.h>
+#include <crypto/hash.h>
+#include <crypto/internal/aead.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/skcipher.h>
+
+#include <linux/delay.h>
+#include <linux/dmaengine.h>
+#include <linux/dmapool.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/scatterlist.h>
+
+#define DTHE_REG_SIZE		4
+#define DTHE_DMA_TIMEOUT_MS	2000
+
+enum dthe_aes_mode {
+	DTHE_AES_ECB = 0,
+	DTHE_AES_CBC,
+};
+
+/* Driver specific struct definitions */
+
+/**
+ * struct dthe_data - DTHE_V2 driver instance data
+ * @dev: Device pointer
+ * @regs: Base address of the register space
+ * @list: list node for dev
+ * @engine: Crypto engine instance
+ * @dma_aes_rx: AES Rx DMA Channel
+ * @dma_aes_tx: AES Tx DMA Channel
+ * @dma_sha_tx: SHA Tx DMA Channel
+ */
+struct dthe_data {
+	struct device *dev;
+	void __iomem *regs;
+	struct list_head list;
+	struct crypto_engine *engine;
+
+	struct dma_chan *dma_aes_rx;
+	struct dma_chan *dma_aes_tx;
+
+	struct dma_chan *dma_sha_tx;
+};
+
+/**
+ * struct dthe_list - device data list head
+ * @dev_list: linked list head
+ * @lock: Spinlock protecting accesses to the list
+ */
+struct dthe_list {
+	struct list_head dev_list;
+	spinlock_t lock;
+};
+
+/**
+ * struct dthe_tfm_ctx - Transform ctx struct containing ctx for all sub-components of DTHE V2
+ * @dev_data: Device data struct pointer
+ * @keylen: AES key length
+ * @key: AES key
+ */
+struct dthe_tfm_ctx {
+	struct dthe_data *dev_data;
+	unsigned int keylen;
+	u32 key[AES_KEYSIZE_256 / sizeof(u32)];
+};
+
+/**
+ * struct dthe_aes_req_ctx - AES engine req ctx struct
+ * @mode: AES mode
+ * @enc: flag indicating encryption or decryption operation
+ * @aes_compl: Completion variable for use in manual completion in case of DMA callback failure
+ */
+struct dthe_aes_req_ctx {
+	enum dthe_aes_mode mode;
+	int enc;
+	struct completion aes_compl;
+};
+
+/* Struct definitions end */
+
+struct dthe_data *dthe_get_dev(struct dthe_tfm_ctx *ctx);
+
+int dthe_register_aes_algs(void);
+void dthe_unregister_aes_algs(void);
+
+#endif
-- 
2.43.0



* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
@ 2025-06-11  9:31   ` Herbert Xu
  2025-06-11  9:32   ` Herbert Xu
  2025-06-11  9:53   ` Herbert Xu
  2 siblings, 0 replies; 16+ messages in thread
From: Herbert Xu @ 2025-06-11  9:31 UTC (permalink / raw)
  To: T Pratham
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto

On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
>
> +static int dthe_cipher_init_tfm(struct crypto_skcipher *tfm)
> +{
> +	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +	struct dthe_data *dev_data = dthe_get_dev(ctx);
> +
> +	memzero_explicit(ctx, sizeof(*ctx));

This is unnecessary.  The tfm context is zeroed by default.
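
For illustration only, the init could then shrink to something like this
(untested sketch):

static int dthe_cipher_init_tfm(struct crypto_skcipher *tfm)
{
	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);

	/* ctx is already zeroed; just cache the device pointer */
	ctx->dev_data = dthe_get_dev(ctx);

	return 0;
}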

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
  2025-06-11  9:31   ` Herbert Xu
@ 2025-06-11  9:32   ` Herbert Xu
  2025-06-12  5:49     ` T Pratham
  2025-06-11  9:53   ` Herbert Xu
  2 siblings, 1 reply; 16+ messages in thread
From: Herbert Xu @ 2025-06-11  9:32 UTC (permalink / raw)
  To: T Pratham
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto

On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
>
> +static void dthe_cipher_exit_tfm(struct crypto_skcipher *tfm)
> +{
> +	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
> +
> +	ctx->keylen = 0;
> +}

This is pointless.  Just get rid of the exit_tfm function.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
  2025-06-11  9:31   ` Herbert Xu
  2025-06-11  9:32   ` Herbert Xu
@ 2025-06-11  9:53   ` Herbert Xu
  2025-06-12 10:15     ` T Pratham
  2025-06-18 17:53     ` Vinod Koul
  2 siblings, 2 replies; 16+ messages in thread
From: Herbert Xu @ 2025-06-11  9:53 UTC (permalink / raw)
  To: T Pratham
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto,
	Vinod Koul, dmaengine

On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
>
> +	// Need to do a timeout to ensure finalise gets called if DMA callback fails for any reason
> +	ret = wait_for_completion_timeout(&rctx->aes_compl, msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));

This doesn't look safe.  What if the callback is invoked after a
timeout? That would be a UAF.

Does the DMA engine provide any timeout mechanism? If not, then
you could do it with a delayed work struct.  Just make sure that
you cancel the work struct in the normal path callback.  Vice versa
you need to terminate the DMA job in the timeout work struct.
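
Roughly along these lines (completely untested sketch; the timeout_work,
err and dev_data members on the request context are hypothetical
additions, and <linux/workqueue.h> would be needed):

static void dthe_aes_dma_in_callback(void *data)
{
	struct skcipher_request *req = data;
	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);

	/* Normal path: complete only if the timeout work had not started */
	if (cancel_delayed_work(&rctx->timeout_work))
		complete(&rctx->aes_compl);
}

static void dthe_aes_timeout_work(struct work_struct *work)
{
	struct dthe_aes_req_ctx *rctx =
		container_of(to_delayed_work(work),
			     struct dthe_aes_req_ctx, timeout_work);

	/* Timeout path: tear the DMA job down before finalising */
	dmaengine_terminate_sync(rctx->dev_data->dma_aes_rx);
	dmaengine_terminate_sync(rctx->dev_data->dma_aes_tx);
	rctx->err = -ETIMEDOUT;
	complete(&rctx->aes_compl);
}

	/* in dthe_aes_run(), replacing wait_for_completion_timeout(): */
	rctx->err = 0;
	INIT_DELAYED_WORK(&rctx->timeout_work, dthe_aes_timeout_work);
	schedule_delayed_work(&rctx->timeout_work,
			      msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));
	wait_for_completion(&rctx->aes_compl);
	ret = rctx->err;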

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-11  9:32   ` Herbert Xu
@ 2025-06-12  5:49     ` T Pratham
  0 siblings, 0 replies; 16+ messages in thread
From: T Pratham @ 2025-06-12  5:49 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto

On 11/06/25 15:02, Herbert Xu wrote:
> On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
>>
>> +static void dthe_cipher_exit_tfm(struct crypto_skcipher *tfm)
>> +{
>> +	struct dthe_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
>> +
>> +	ctx->keylen = 0;
>> +}
> 
> This is pointless.  Just get rid of the exit_tfm function.
> 
> Cheers,

Okay. I was under the impression that init and exit functions were mandatory. Seems like they are not. I'll remove this function.

Regards
T Pratham <t-pratham@ti.com>


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-11  9:53   ` Herbert Xu
@ 2025-06-12 10:15     ` T Pratham
  2025-06-16  2:49       ` Herbert Xu
  2025-06-18 17:53     ` Vinod Koul
  1 sibling, 1 reply; 16+ messages in thread
From: T Pratham @ 2025-06-12 10:15 UTC (permalink / raw)
  To: Herbert Xu
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto,
	Vinod Koul, dmaengine

On 11/06/25 15:23, Herbert Xu wrote:
> On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
>>
>> +	// Need to do a timeout to ensure finalise gets called if DMA callback fails for any reason
>> +	ret = wait_for_completion_timeout(&rctx->aes_compl, msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));
> 
> This doesn't look safe.  What if the callback is invoked after a
> timeout? That would be a UAF.
> 
> Does the DMA engine provide any timeout mechanism? If not, then
> you could do it with a delayed work struct.  Just make sure that
> you cancel the work struct in the normal path callback.  Vice versa
> you need to terminate the DMA job in the timeout work struct.
> 
> Cheers,

Calling dmaengine_terminate_sync() here should suffice, I presume? I'll update the code accordingly.
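
Something roughly like this in dthe_aes_run() (sketch only):

	ret = wait_for_completion_timeout(&rctx->aes_compl,
					  msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));
	if (!ret) {
		/* Stop both channels before the request goes away */
		dmaengine_terminate_sync(dev_data->dma_aes_rx);
		dmaengine_terminate_sync(dev_data->dma_aes_tx);
		ret = -ETIMEDOUT;
	} else {
		ret = 0;
	}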

Regards
T Pratham <t-pratham@ti.com>


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-12 10:15     ` T Pratham
@ 2025-06-16  2:49       ` Herbert Xu
  0 siblings, 0 replies; 16+ messages in thread
From: Herbert Xu @ 2025-06-16  2:49 UTC (permalink / raw)
  To: T Pratham
  Cc: David S. Miller, Kamlesh Gurudasani, Vignesh Raghavendra,
	Praneeth Bajjuri, Manorit Chawdhry, linux-kernel, linux-crypto,
	Vinod Koul, dmaengine

On Thu, Jun 12, 2025 at 03:45:54PM +0530, T Pratham wrote:
>
> Calling dmaengine_terminate_sync() here should suffice, I presume? I'll update the code accordingly.

This is just me having a skim through the DMA engine API.  Vinod,
could you please confirm whether this is how timeouts should be
dealt with through the DMA engine?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham
  2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham
  2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
@ 2025-06-17  4:27 ` Eric Biggers
  2025-06-18 10:30   ` Kamlesh Gurudasani
  2025-06-20  3:07   ` Simon Richter
  2 siblings, 2 replies; 16+ messages in thread
From: Eric Biggers @ 2025-06-17  4:27 UTC (permalink / raw)
  To: T Pratham
  Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, linux-crypto, devicetree, linux-kernel,
	Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote:
> This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a
> new crypto accelerator which contains multiple crypto IPs [1].
> This series implements support for ECB and CBC modes of AES for the AES
> Engine of the DTHE, using skcipher APIs of the kernel.
> 
> Tested with:
> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> 
> and tcrypt,
> sudo modprobe tcrypt mode=500 sec=1
> 
> Signed-off-by: T Pratham <t-pratham@ti.com>
> ---
> [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE)
> Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf

Numbers, please.  What is the specific, real use case in Linux where this
patchset actually improves performance?  Going off the CPU and back again just
to en/decrypt some data is hugely expensive.

Note that the manual you linked to above explicitly states that the CPU supports
the ARMv8 Cryptography Extensions.  That definitively makes any off-CPU offload
obsolete.  But even without that, these sorts of off-CPU offloads have always
been highly questionable.

I think it's implausible that this patchset could actually be beneficial.

In fact, it might actually be really harmful.  You set your algorithms to
priority 30000, which makes them be prioritized over ARMv8 CE.  I've seen
exactly that bug with other "accelerators", which actually regressed performance
by over 50x compared to simply staying on the CPU.

- Eric


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-17  4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers
@ 2025-06-18 10:30   ` Kamlesh Gurudasani
  2025-06-18 17:58     ` Eric Biggers
  2025-06-20  3:07   ` Simon Richter
  1 sibling, 1 reply; 16+ messages in thread
From: Kamlesh Gurudasani @ 2025-06-18 10:30 UTC (permalink / raw)
  To: Eric Biggers, T Pratham
  Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, linux-crypto, devicetree, linux-kernel,
	Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry

Eric Biggers <ebiggers@kernel.org> writes:

> On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote:
>> This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a
>> new crypto accelerator which contains multiple crypto IPs [1].
>> This series implements support for ECB and CBC modes of AES for the AES
>> Engine of the DTHE, using skcipher APIs of the kernel.
>> 
>> Tested with:
>> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
>> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
>> 
>> and tcrypt,
>> sudo modprobe tcrypt mode=500 sec=1
>> 
>> Signed-off-by: T Pratham <t-pratham@ti.com>
>> ---
>> [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE)
>> Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf
>
> Numbers, please.  What is the specific, real use case in Linux where this
> patchset actually improves performance?  Going off the CPU and back again just
> to en/decrypt some data is hugely expensive.
>
We don't really care about speed here. These crypto accelerators are for
embedded systems, often with fewer than 4 cores, and this particular SoC
has a variant with only one core.

The ARMv8 cores clock at 1.4 GHz and DTHEv2 at 400 MHz, so there is no
way it can give better performance numbers in terms of speed. But crypto
accelerators are also designed specifically for lower power consumption.
The ARMv8 crypto extensions leverage SIMD registers, but a dedicated
crypto accelerator is still more efficient. Think about battery-operated,
low-cost devices.

These embedded devices are often out in the open, within reach of an
attacker. Crypto accelerators are much more secure. [1]

Bottom line:
1. Crypto accelerators can deliver higher cryptography performance.
2. Crypto accelerators can deliver better energy efficiency.
3. Cryptography hardware usually has lower timing and power side-channel
leakage than running cryptography algorithms on the processor.

IPsec and partition encryption/decryption/authentication use cases are
bulk operations, where the setup cost is often low compared to the
operation itself.

[1] https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf

Cheers,
Kamlesh

> Note that the manual you linked to above explicitly states that the CPU supports
> the ARMv8 Cryptography Extensions.  That definitively makes any off-CPU offload
> obsolete.  But even without that, these sorts of off-CPU offloads have always
> been highly questionable.
>
> I think it's implausible that this patchset could actually be beneficial.
>
> In fact, it might actually be really harmful.  You set your algorithms to
> priority 30000, which makes them be prioritized over ARMv8 CE.  I've seen
> exactly that bug with other "accelerators", which actually regressed performance
> by over 50x compared to simply staying on the CPU.
>
> - Eric


* Re: [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC)
  2025-06-11  9:53   ` Herbert Xu
  2025-06-12 10:15     ` T Pratham
@ 2025-06-18 17:53     ` Vinod Koul
  1 sibling, 0 replies; 16+ messages in thread
From: Vinod Koul @ 2025-06-18 17:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: T Pratham, David S. Miller, Kamlesh Gurudasani,
	Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry,
	linux-kernel, linux-crypto, dmaengine

On 11-06-25, 17:53, Herbert Xu wrote:
> On Tue, Jun 03, 2025 at 06:07:29PM +0530, T Pratham wrote:
> >
> > +	// Need to do a timeout to ensure finalise gets called if DMA callback fails for any reason
> > +	ret = wait_for_completion_timeout(&rctx->aes_compl, msecs_to_jiffies(DTHE_DMA_TIMEOUT_MS));
> 
> This doesn't look safe.  What if the callback is invoked after a
> timeout? That would be a UAF.
> 
> Does the DMA engine provide any timeout mechanism? If not, then
> you could do it with a delayed work struct.  Just make sure that
> you cancel the work struct in the normal path callback.  Vice versa
> you need to terminate the DMA job in the timeout work struct.

Typically no. Most of the hardware may not have that capability, so we
have APIs to terminate.

-- 
~Vinod


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-18 10:30   ` Kamlesh Gurudasani
@ 2025-06-18 17:58     ` Eric Biggers
  2025-06-26 13:33       ` Kamlesh Gurudasani
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Biggers @ 2025-06-18 17:58 UTC (permalink / raw)
  To: Kamlesh Gurudasani
  Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree,
	linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

On Wed, Jun 18, 2025 at 04:00:12PM +0530, Kamlesh Gurudasani wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> > On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote:
> >> This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a
> >> new crypto accelerator which contains multiple crypto IPs [1].
> >> This series implements support for ECB and CBC modes of AES for the AES
> >> Engine of the DTHE, using skcipher APIs of the kernel.
> >> 
> >> Tested with:
> >> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
> >> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> >> 
> >> and tcrypt,
> >> sudo modprobe tcrypt mode=500 sec=1
> >> 
> >> Signed-off-by: T Pratham <t-pratham@ti.com>
> >> ---
> >> [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE)
> >> Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf
> >
> > Numbers, please.  What is the specific, real use case in Linux where this
> > patchset actually improves performance?  Going off the CPU and back again just
> > to en/decrypt some data is hugely expensive.
> >
> We don't really care about speed here. These crypto accelerators are for
> embedded systems, often with fewer than 4 cores, and this particular SoC
> has a variant with only one core.
> 
> The ARMv8 cores clock at 1.4 GHz and DTHEv2 at 400 MHz, so there is no
> way it can give better performance numbers in terms of speed. But crypto
> accelerators are also designed specifically for lower power consumption.
> The ARMv8 crypto extensions leverage SIMD registers, but a dedicated
> crypto accelerator is still more efficient. Think about battery-operated,
> low-cost devices.
> 
> These embedded devices are often out in the open, within reach of an
> attacker. Crypto accelerators are much more secure. [1]
> 
> Bottom line:
> 1. Crypto accelerators can deliver higher cryptography performance.
> 2. Crypto accelerators can deliver better energy efficiency.
> 3. Cryptography hardware usually has lower timing and power side-channel
> leakage than running cryptography algorithms on the processor.
> 
> IPsec and partition encryption/decryption/authentication use cases are
> bulk operations, where the setup cost is often low compared to the
> operation itself.
> 
> [1] https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf
> 
> Cheers,
> Kamlesh

Okay, so you admit that your "accelerator" is much slower than the CPU.  So (1)
does not apply.

As for (2), it's not clear that applies here.  Sure, your AES engine *by itself*
may be more power-efficient than the AES instructions on the CPU.  However,
using the offload requires all the additional work associated with offloading
the operation from the CPU.  Since it's much slower, it will also cause the
operation to be dragged out over a much longer period of time, keeping the
system awake for longer when it could have gone into suspend earlier.

Thus, using the "accelerator" could actually increase power usage.

As for (3), a couple issues.  First, you're just making an argument from
generalities and are not claiming that it's actually true in this case.  ARMv8
CE instructions are in fact constant time.

Sure, ARMv8 CE is generally not hardened against power analysis attacks.  But
you haven't actually claimed that your crypto engine is either.

Second, these side channels, especially the ones other than timing, just aren't
part of the threat model of most users.

Meanwhile, a security issue we do have is that the hardware drivers tend not to
be tested before the kernel is released, and often are released in a broken
state where they don't even do the en/decryption correctly.  Furthermore,
unprivileged userspace programs may use AF_ALG to exploit buggy drivers.

It seems implausible that this patch is more helpful than harmful.

- Eric


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-17  4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers
  2025-06-18 10:30   ` Kamlesh Gurudasani
@ 2025-06-20  3:07   ` Simon Richter
  1 sibling, 0 replies; 16+ messages in thread
From: Simon Richter @ 2025-06-20  3:07 UTC (permalink / raw)
  To: Eric Biggers, T Pratham
  Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, linux-crypto, devicetree, linux-kernel,
	Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

Hi,

On 6/17/25 13:27, Eric Biggers wrote:

> Numbers, please.  What is the specific, real use case in Linux where this
> patchset actually improves performance?  Going off the CPU and back again just
> to en/decrypt some data is hugely expensive.

It would be cool to get some numbers from the IBM folks as well -- the 
NX coprocessor can do AES and SHA, but it is not enabled in the Linux 
kernel, only GZIP is (where I can definitely see a benefit, usually 
somewhere between 3 to 9 GB/s depending on how hard it needs to look for 
repetitions), so I'm wondering if that is an oversight, or deliberate.

I also wonder if for some hardware, we can get a speedup by offloading 
and polling for completion instead of waiting for an interrupt. It feels 
wrong, but the thread is blocked no matter what.

The other thing to ponder would be whether we can define a data size 
threshold where the offloading overhead becomes small enough that it's 
still worth it. That would also work for fscrypt, because with 4k 
blocks, it would simply never choose the offload engine.
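
As a rough illustration (hypothetical, not something in this patchset),
such a threshold could reuse the standard software-fallback pattern;
ctx->fallback_tfm, rctx->fallback_req and the threshold value are all
made-up names here:

#define DTHE_OFFLOAD_THRESHOLD	(16 * 1024)	/* purely illustrative */

/* Called from the ecb/cbc encrypt/decrypt wrappers after they have set
 * rctx->mode and rctx->enc.  Assumes init_tfm allocated ctx->fallback_tfm
 * with crypto_alloc_skcipher(alg_name, 0, CRYPTO_ALG_NEED_FALLBACK) and
 * that the request context reserves room for the sub-request.
 */
static int dthe_aes_crypt_maybe_fallback(struct skcipher_request *req)
{
	struct dthe_tfm_ctx *ctx =
		crypto_skcipher_ctx(crypto_skcipher_reqtfm(req));
	struct dthe_aes_req_ctx *rctx = skcipher_request_ctx(req);

	if (req->cryptlen >= DTHE_OFFLOAD_THRESHOLD)
		return dthe_aes_crypt(req);	/* offload to the engine */

	/* Small request: run it on the CPU via the fallback tfm */
	skcipher_request_set_tfm(&rctx->fallback_req, ctx->fallback_tfm);
	skcipher_request_set_callback(&rctx->fallback_req, req->base.flags,
				      req->base.complete, req->base.data);
	skcipher_request_set_crypt(&rctx->fallback_req, req->src, req->dst,
				   req->cryptlen, req->iv);

	return rctx->enc ? crypto_skcipher_encrypt(&rctx->fallback_req) :
			   crypto_skcipher_decrypt(&rctx->fallback_req);
}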

    Simon


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-18 17:58     ` Eric Biggers
@ 2025-06-26 13:33       ` Kamlesh Gurudasani
  2025-06-26 18:35         ` Eric Biggers
  0 siblings, 1 reply; 16+ messages in thread
From: Kamlesh Gurudasani @ 2025-06-26 13:33 UTC (permalink / raw)
  To: Eric Biggers
  Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree,
	linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

Eric Biggers <ebiggers@kernel.org> writes:

>
> Okay, so you admit that your "accelerator" is much slower than the CPU.  So (1)
> does not apply.
>
> As for (2), it's not clear that applies here.  Sure, your AES engine *by itself*
> may be more power-efficient than the AES instructions on the CPU.  However,
> using the offload requires all the additional work associated with offloading
> the operation from the CPU.  Since it's much slower, it will also cause the
> operation to be dragged out over much a longer period of time, keeping the
> system awake for longer when it could have gone into suspend earlier.
>
> Thus, using the "accelerator" could actually increase power usage.
>
> As for (3), a couple issues.  First, you're just making an argument from
> generalities and are not claiming that it's actually true in this case.  ARMv8
> CE instructions are in fact constant time.
>
> Sure, ARMv8 CE is generally not hardened against power analysis attacks.  But
> you haven't actually claimed that your crypto engine is either.
1. AES/PKE engine inside DTHEv2 is DPA and EMA resistant.

>
> Second, these side channels, especially the ones other than timing, just aren't
> part of the threat model of most users.
2. Certifications like SESIP, PSA and IEC 62443 (being certified for the
CIP kernel - LFX [1]) all have requirements for side-channel attack
resistance (check level 3+).
Most of our users have these requirements and they don't even care about
performance in terms of speed.

>
> Meanwhile, a security issue we do have is that the hardware drivers tend not to
> be tested before the kernel is released, and often are released in a broken
> state where they don't even do the en/decryption correctly.  Furthermore,
> unprivileged userspace programs may use AF_ALG to exploit buggy drivers.
3. We have devices in KernelCI and we have regular testing and engineers
working on accelerators internally too; we can be more careful to ensure
that these drivers go through the prescribed testing for all revisions.

We can reduce the priority for the hw accelerator by default, if that's
what you're trying to imply, and let users decide.
>
> It seems implausible that this patch is more helpful than harmful.
>
I don't understand why you call it harmful when it provides security
against side-channel attacks.

If ARM itself prescribes using crypto accelerators when they are
available, then it is beyond my understanding why you would push towards
using the CE extensions. [3]

Are we not more serious about security than performance itself?

For us,
points 1 and 2 are the top priority, and being an SoC vendor we want to
make sure that we provide all the support end customers need for their
threat modeling.

For embedded systems, resource utilization is also very important: I can
use the crypto accelerator and save the CPU for other activities.

But let's look at the numbers. They are not 50x worse as you mentioned in
an earlier mail; they are just 2x worse. These are from a system with a
single-core CPU at 833 MHz and DTHEv2 at 400 MHz:

root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc
cryptsetup benchmark --cipher aes-cbc
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-cbc        256b        77.7 MiB/s        77.5 MiB/s
root@am62lxx-evm:~# modprobe -r dthev2
modprobe -r dthev2
root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc
cryptsetup benchmark --cipher aes-cbc
# Tests are approximate using memory only (no storage IO).
# Algorithm |       Key |      Encryption |      Decryption
    aes-cbc        256b       150.4 MiB/s       163.8 MiB/s

[1]https://dashboard.kernelci.org/hardware?hs=ti
[2]https://www.cip-project.org/about/security-iec-62443-4-2
[3]https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf

Cheers,
Kamlesh


* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator
  2025-06-26 13:33       ` Kamlesh Gurudasani
@ 2025-06-26 18:35         ` Eric Biggers
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Biggers @ 2025-06-26 18:35 UTC (permalink / raw)
  To: Kamlesh Gurudasani
  Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree,
	linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri,
	Manorit Chawdhry

On Thu, Jun 26, 2025 at 07:03:53PM +0530, Kamlesh Gurudasani wrote:
> Eric Biggers <ebiggers@kernel.org> writes:
> 
> >
> > Okay, so you admit that your "accelerator" is much slower than the CPU.  So (1)
> > does not apply.
> >
> > As for (2), it's not clear that applies here.  Sure, your AES engine *by itself*
> > may be more power-efficient than the AES instructions on the CPU.  However,
> > using the offload requires all the additional work associated with offloading
> > the operation from the CPU.  Since it's much slower, it will also cause the
> > operation to be dragged out over a much longer period of time, keeping the
> > system awake for longer when it could have gone into suspend earlier.
> >
> > Thus, using the "accelerator" could actually increase power usage.
> >
> > As for (3), a couple issues.  First, you're just making an argument from
> > generalities and are not claiming that it's actually true in this case.  ARMv8
> > CE instructions are in fact constant time.
> >
> > Sure, ARMv8 CE is generally not hardened against power analysis attacks.  But
> > you haven't actually claimed that your crypto engine is either.
> 1. AES/PKE engine inside DTHEv2 is DPA and EMA resistant.
> >
> > Second, these side channels, especially the ones other than timing, just aren't
> > part of the threat model of most users.
> 2. Certifications like SESIP, PSA and IEC 62443 (being certified for
> the CIP kernel - LFX [1]): all of these have requirements for
> side-channel attack resistance (check level 3+).
> Most of our users have these requirements, and raw speed is not even a
> concern for them.
> 
> >
> > Meanwhile, a security issue we do have is that the hardware drivers tend not to
> > be tested before the kernel is released, and often are released in a broken
> > state where they don't even do the en/decryption correctly.  Furthermore,
> > unprivileged userspace programs may use AF_ALG to exploit buggy drivers.
> 3. We have devices in KernelCI, we run regular testing, and we have
> engineers working on the accelerators internally too, so we can make
> sure these drivers go through the prescribed testing for every
> revision.
> 
> We can reduce the priority of the hw accelerator by default, if that's
> what you're trying to imply, and let users decide.
> >
> > It seems implausible that this patch is more helpful than harmful.
> >
> I don't understand why you call it harmful when it provides security
> against side-channel attacks.
> 
> If ARM itself recommends using crypto accelerators where they are
> available [3], then it is beyond my understanding why you would push
> towards using the CE extensions.
> 
> Are we not more serious about security than about performance itself?
> 
> For us,
> points 1 and 2 are the top priority, and as an SoC vendor we want to
> make sure we provide all the support that end customers need for their
> threat modeling.

If this is the motivation, then maybe it should be presented as the motivation?
Let's look at the patchset itself:

    "Add support for Texas Instruments DTHE V2 crypto accelerator"

    "This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a
    new crypto accelerator which contains multiple crypto IPs [1].  This series
    implements support for ECB and CBC modes of AES for the AES Engine of the
    DTHE, using skcipher APIs of the kernel."

    config CRYPTO_DEV_TI_DTHEV2
        tristate "Support for TI DTHE V2 crypto accelerators"
        depends on CRYPTO && CRYPTO_HW && ARCH_K3
        select CRYPTO_ENGINE
        select CRYPTO_SKCIPHER
        select CRYPTO_ECB
        select CRYPTO_CBC
        help
          This enables support for the TI DTHE V2 hw crypto accelerator
          which can be found on TI K3 SOCs. Selecting this enables use
          of hardware acceleration for cryptographic algorithms on
          these devices.

Nothing about side channel resistance, but everything about it being an
"accelerator" and providing "hardware acceleration".  That implies that
performance is the primary motivation.

(Also, nothing about any actual use case like dm-crypt...)

If your crypto engine does indeed provide additional side channel resistance
beyond that of ARMv8 CE, and you have an actual use case where that provides a
meaningful benefit, that's potentially valuable.

Of course, it has to be weighed against the fact that these sorts of crypto
engines are problematic in pretty much every other way.  Besides actually being
slower than the CPU, they also often have bugs/issues where they produce
the wrong output or corrupt data.  Getting those things right should be the
first priority.  Yes, you'll vouch for your driver, but so does everyone else,
and yet they actually still have these issues.  Unfortunately the odds are kind
of stacked against you; these drivers are really hard to get right.  And the
crypto self-tests don't even properly test them.

As I mentioned, these drivers also exacerbate the usual issues we have with
kernel security, where userspace programs can exploit kernel bugs to escalate
privileges.  This is because they're all accessible to userspace via AF_ALG.
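
To make that concrete, here's a rough userspace sketch (my own
illustration, error handling omitted, not taken from any real test) of
how an unprivileged process gets to drive whichever "cbc(aes)"
implementation currently wins on cra_priority, hardware drivers
included:

    #include <linux/if_alg.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_alg sa = {
            .salg_family = AF_ALG,
            .salg_type   = "skcipher",
            .salg_name   = "cbc(aes)",  /* resolved to the highest-priority driver */
        };
        unsigned char key[32] = { 0 }, iv[16] = { 0 }, buf[4096] = { 0 };
        union {                         /* control buffer for ALG_SET_OP + ALG_SET_IV */
            char buf[CMSG_SPACE(4) +
                     CMSG_SPACE(offsetof(struct af_alg_iv, iv) + sizeof(iv))];
            struct cmsghdr align;
        } control = { 0 };
        struct iovec io = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msghdr msg = {
            .msg_iov        = &io,
            .msg_iovlen     = 1,
            .msg_control    = control.buf,
            .msg_controllen = sizeof(control.buf),
        };
        struct cmsghdr *c;
        struct af_alg_iv *alg_iv;
        int tfm, op;

        tfm = socket(AF_ALG, SOCK_SEQPACKET, 0);
        bind(tfm, (struct sockaddr *)&sa, sizeof(sa));
        setsockopt(tfm, SOL_ALG, ALG_SET_KEY, key, sizeof(key));
        op = accept(tfm, NULL, 0);

        /* First cmsg: request an encryption.  Second cmsg: the (all-zero) IV. */
        c = CMSG_FIRSTHDR(&msg);
        c->cmsg_level = SOL_ALG;
        c->cmsg_type  = ALG_SET_OP;
        c->cmsg_len   = CMSG_LEN(4);
        *(unsigned int *)CMSG_DATA(c) = ALG_OP_ENCRYPT;

        c = CMSG_NXTHDR(&msg, c);
        c->cmsg_level = SOL_ALG;
        c->cmsg_type  = ALG_SET_IV;
        c->cmsg_len   = CMSG_LEN(offsetof(struct af_alg_iv, iv) + sizeof(iv));
        alg_iv = (struct af_alg_iv *)CMSG_DATA(c);
        alg_iv->ivlen = sizeof(iv);
        memcpy(alg_iv->iv, iv, sizeof(iv));

        sendmsg(op, &msg, 0);           /* 4 KiB of plaintext in */
        read(op, buf, sizeof(buf));     /* ciphertext back out */

        close(op);
        close(tfm);
        return 0;
    }

None of this needs any privilege, so every bug in whatever driver ends
up behind that socket becomes unprivileged attack surface.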

Anyway, if this is supported at all, it should be opt-in at runtime.  So yes,
please decrease the cra_priority that you're registering the algorithms with.
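
Roughly along these lines (a sketch only: the context struct and
callbacks below are placeholders, and it uses a plain skcipher_alg
rather than the crypto_engine wrappers your driver actually uses), so
that "cbc(aes)" keeps resolving to the CE code unless someone explicitly
allocates the hardware driver by name:

    #include <crypto/aes.h>
    #include <crypto/internal/skcipher.h>
    #include <linux/errno.h>
    #include <linux/module.h>
    #include <linux/string.h>

    /* Placeholder tfm context, standing in for the driver's real one. */
    struct dthe_tfm_ctx_sketch {
        u8 key[AES_MAX_KEY_SIZE];
        unsigned int keylen;
    };

    static int dthe_aes_setkey(struct crypto_skcipher *tfm, const u8 *key,
                               unsigned int keylen)
    {
        struct dthe_tfm_ctx_sketch *ctx = crypto_skcipher_ctx(tfm);

        if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_192 &&
            keylen != AES_KEYSIZE_256)
            return -EINVAL;

        memcpy(ctx->key, key, keylen);
        ctx->keylen = keylen;
        return 0;
    }

    static int dthe_aes_cbc_encrypt(struct skcipher_request *req)
    {
        /* The real driver would queue req on the crypto engine here. */
        return -ENODEV;
    }

    static int dthe_aes_cbc_decrypt(struct skcipher_request *req)
    {
        return -ENODEV;
    }

    static struct skcipher_alg dthe_cbc_aes_sketch = {
        .base.cra_name        = "cbc(aes)",
        .base.cra_driver_name = "cbc-aes-dthev2",
        /*
         * Below the 300 used by the arm64 Crypto Extensions implementation,
         * so the engine is only used when a caller asks for it explicitly
         * by driver name.  The exact value is a policy choice.
         */
        .base.cra_priority    = 100,
        .base.cra_flags       = CRYPTO_ALG_ASYNC | CRYPTO_ALG_KERN_DRIVER_ONLY,
        .base.cra_blocksize   = AES_BLOCK_SIZE,
        .base.cra_ctxsize     = sizeof(struct dthe_tfm_ctx_sketch),
        .base.cra_module      = THIS_MODULE,
        .min_keysize          = AES_MIN_KEY_SIZE,
        .max_keysize          = AES_MAX_KEY_SIZE,
        .ivsize               = AES_BLOCK_SIZE,
        .setkey               = dthe_aes_setkey,
        .encrypt              = dthe_aes_cbc_encrypt,
        .decrypt              = dthe_aes_cbc_decrypt,
    };

That keeps the hardened path available for the users who actually want
it, while everyone else gets the faster, better-tested CPU code by
default.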

> For embedded systems, resource utilization is also very important: I
> can use the crypto accelerator and save the CPU for other activities.

For the small message sizes that get used in practice this doesn't seem very
plausible, especially when the alternative is ARMv8 CE.  The driver and
scheduling overhead is just too high at small message sizes.

> But let's look at the numbers. They are not 50x worse, as you mentioned
> in an earlier mail; they are only about 2x worse. These are from a
> system with a single-core 833 MHz CPU and the DTHE V2 at 400 MHz:
> 
> root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc
> # Tests are approximate using memory only (no storage IO).
> # Algorithm |       Key |      Encryption |      Decryption
>     aes-cbc        256b        77.7 MiB/s        77.5 MiB/s
> root@am62lxx-evm:~# modprobe -r dthev2
> root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc
> # Tests are approximate using memory only (no storage IO).
> # Algorithm |       Key |      Encryption |      Decryption
>     aes-cbc        256b       150.4 MiB/s       163.8 MiB/s
> 
> [1] https://dashboard.kernelci.org/hardware?hs=ti
> [2] https://www.cip-project.org/about/security-iec-62443-4-2
> [3] https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf

I'm afraid 'cryptsetup benchmark --cipher aes-cbc' is not at all the right
benchmark to use here, and it's quite misleading:

- 'cryptsetup benchmark' uses a 64 KiB message size by default.  That's 16 times
  longer than the messages that dm-crypt typically uses.  The longer messages
  strongly skew the numbers towards the hardware crypto engine.

- 'cryptsetup benchmark' uses AF_ALG and measures not just the crypto
  performance but also the overhead of AF_ALG.  This has the effect of
  diminishing any difference in speeds.  The real difference is larger.

- You benchmarked AES-CBC, which is outdated for storage encryption.  AES-XTS is
  generally the better choice, and it's faster than AES-CBC on the CPU.
  Presumably you chose AES-CBC because your driver does not support AES-XTS.

With an 833 MHz CPU, I don't think you'll see 50x worse like I saw on some other
boards.  However, the real difference will be more than the 2x worse you're
seeing with 'cryptsetup benchmark --cipher aes-cbc'.  A more accurate comparison
would be an in-kernel benchmark with 4 KiB messages: AES-XTS (ARMv8 CE) versus
either AES-CBC-ESSIV with the AES-CBC component offloaded to your crypto
engine, or AES-XTS with the AES-ECB component offloaded.
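
As a rough starting point (just a sketch, and it still doesn't cover the
ESSIV/offload combinations above), the kernel's tcrypt speed tests
already include a 4096-byte block size, so comparing just those rows
with and without your driver loaded gets much closer to the dm-crypt
case than an AF_ALG-based tool does:

    # with the DTHE driver loaded (it services cbc(aes) when it wins on priority);
    # tcrypt deliberately never stays loaded, the results end up in dmesg
    modprobe tcrypt mode=500 sec=1
    dmesg | grep 4096              # pick out the 4 KiB rows for cbc(aes)/xts(aes)

    # repeat with only the CPU implementations registered
    modprobe -r dthev2
    modprobe tcrypt mode=500 sec=1
    dmesg | grep 4096

mode=500 covers the AES skciphers (cbc(aes), xts(aes), ...), so the
4 KiB rows give an in-kernel number at a realistic message size without
the AF_ALG overhead skewing things.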

- Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-06-26 18:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham
2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham
2025-06-03 12:37 ` [PATCH v5 2/2] crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) T Pratham
2025-06-11  9:31   ` Herbert Xu
2025-06-11  9:32   ` Herbert Xu
2025-06-12  5:49     ` T Pratham
2025-06-11  9:53   ` Herbert Xu
2025-06-12 10:15     ` T Pratham
2025-06-16  2:49       ` Herbert Xu
2025-06-18 17:53     ` Vinod Koul
2025-06-17  4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers
2025-06-18 10:30   ` Kamlesh Gurudasani
2025-06-18 17:58     ` Eric Biggers
2025-06-26 13:33       ` Kamlesh Gurudasani
2025-06-26 18:35         ` Eric Biggers
2025-06-20  3:07   ` Simon Richter
