* [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator @ 2025-06-03 12:37 T Pratham 2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham 2025-06-17 4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers 0 siblings, 2 replies; 8+ messages in thread From: T Pratham @ 2025-06-03 12:37 UTC (permalink / raw) To: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley Cc: T Pratham, linux-crypto, devicetree, linux-kernel, Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a new crypto accelerator which contains multiple crypto IPs [1]. This series implements support for ECB and CBC modes of AES for the AES Engine of the DTHE, using skcipher APIs of the kernel. Tested with: CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y and tcrypt, sudo modprobe tcrypt mode=500 sec=1 Signed-off-by: T Pratham <t-pratham@ti.com> --- [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE) Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf Change log: v5: - Simplified tfm ctx struct - Set cra_reqsize instead of using crypto_skcipher_set_reqsize() - Move setting sysconfig and irqenable registers to dthe_aes_run v4: - Corrected dt-bindings example indentation - Simplified dt-bindings example, removing the node surrounding crypto - Fixed typo in dthev2-common.h header guard - Removed unused ctx field in dev_data struct - Moved per-op data into request context v3: - Corrected dt-bindings reg length is too long error - Converted AES driver code to use crypto_engine APIs for using internal crypto queue instead of mutex. - Removed calls to skcipher_request_complete in paths not returning -EINPROGRESS before. - Added missing KConfig import, which was accidentally removed in v2. v2: - Corrected dt-bindings syntax errors and other review comments in v1. - Completely changed driver code structure, splitting code into multiple files Link to previous versions: v4: https://lore.kernel.org/all/20250508101723.846210-2-t-pratham@ti.com/ v3: https://lore.kernel.org/all/20250502121253.456974-2-t-pratham@ti.com/ v2: https://lore.kernel.org/all/20250411091321.2925308-1-t-pratham@ti.com/ v1: https://lore.kernel.org/all/20250206-dthe-v2-aes-v1-0-1e86cf683928@ti.com/ --- T Pratham (2): dt-bindings: crypto: Add binding for TI DTHE V2 crypto: ti: Add driver for DTHE V2 AES Engine (ECB, CBC) .../bindings/crypto/ti,am62l-dthev2.yaml | 50 +++ MAINTAINERS | 7 + drivers/crypto/Kconfig | 1 + drivers/crypto/Makefile | 1 + drivers/crypto/ti/Kconfig | 13 + drivers/crypto/ti/Makefile | 3 + drivers/crypto/ti/dthev2-aes.c | 420 ++++++++++++++++++ drivers/crypto/ti/dthev2-common.c | 220 +++++++++ drivers/crypto/ti/dthev2-common.h | 101 +++++ 9 files changed, 816 insertions(+) create mode 100644 Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml create mode 100644 drivers/crypto/ti/Kconfig create mode 100644 drivers/crypto/ti/Makefile create mode 100644 drivers/crypto/ti/dthev2-aes.c create mode 100644 drivers/crypto/ti/dthev2-common.c create mode 100644 drivers/crypto/ti/dthev2-common.h -- 2.43.0 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham @ 2025-06-03 12:37 ` T Pratham 2025-06-17 4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers 1 sibling, 0 replies; 8+ messages in thread From: T Pratham @ 2025-06-03 12:37 UTC (permalink / raw) To: T Pratham, Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley Cc: Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry, Krzysztof Kozlowski, linux-crypto, devicetree, linux-kernel Add DT binding for Texas Instruments DTHE V2 crypto accelerator. DTHE V2 is introduced as a part of TI AM62L SoC and can currently be only found in it. Signed-off-by: T Pratham <t-pratham@ti.com> Reviewed-By: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> --- .../bindings/crypto/ti,am62l-dthev2.yaml | 50 +++++++++++++++++++ MAINTAINERS | 6 +++ 2 files changed, 56 insertions(+) create mode 100644 Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml diff --git a/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml b/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml new file mode 100644 index 000000000000..5486bfeb2fe8 --- /dev/null +++ b/Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml @@ -0,0 +1,50 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/crypto/ti,am62l-dthev2.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: K3 SoC DTHE V2 crypto module + +maintainers: + - T Pratham <t-pratham@ti.com> + +properties: + compatible: + enum: + - ti,am62l-dthev2 + + reg: + maxItems: 1 + + dmas: + items: + - description: AES Engine RX DMA Channel + - description: AES Engine TX DMA Channel + - description: SHA Engine TX DMA Channel + + dma-names: + items: + - const: rx + - const: tx1 + - const: tx2 + +required: + - compatible + - reg + - dmas + - dma-names + +additionalProperties: false + +examples: + - | + crypto@40800000 { + compatible = "ti,am62l-dthev2"; + reg = <0x40800000 0x10000>; + + dmas = <&main_bcdma 0 0 0x4700 0>, + <&main_bcdma 0 0 0xc701 0>, + <&main_bcdma 0 0 0xc700 0>; + dma-names = "rx", "tx1", "tx2"; + }; diff --git a/MAINTAINERS b/MAINTAINERS index b119ae6b5f14..b990850b4994 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -24552,6 +24552,12 @@ S: Odd Fixes F: drivers/clk/ti/ F: include/linux/clk/ti.h +TI DATA TRANSFORM AND HASHING ENGINE (DTHE) V2 CRYPTO ACCELERATOR DRIVER +M: T Pratham <t-pratham@ti.com> +L: linux-crypto@vger.kernel.org +S: Supported +F: Documentation/devicetree/bindings/crypto/ti,am62l-dthev2.yaml + TI DAVINCI MACHINE SUPPORT M: Bartosz Golaszewski <brgl@bgdev.pl> L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) -- 2.43.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham 2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham @ 2025-06-17 4:27 ` Eric Biggers 2025-06-18 10:30 ` Kamlesh Gurudasani 2025-06-20 3:07 ` Simon Richter 1 sibling, 2 replies; 8+ messages in thread From: Eric Biggers @ 2025-06-17 4:27 UTC (permalink / raw) To: T Pratham Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote: > This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a > new crypto accelerator which contains multiple crypto IPs [1]. > This series implements support for ECB and CBC modes of AES for the AES > Engine of the DTHE, using skcipher APIs of the kernel. > > Tested with: > CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set > CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y > > and tcrypt, > sudo modprobe tcrypt mode=500 sec=1 > > Signed-off-by: T Pratham <t-pratham@ti.com> > --- > [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE) > Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf Numbers, please. What is the specific, real use case in Linux where this patchset actually improves performance? Going off the CPU and back again just to en/decrypt some data is hugely expensive. Note that the manual you linked to above explicitly states that the CPU supports the ARMv8 Cryptography Extensions. That definitively makes any off-CPU offload obsolete. But even without that, these sorts of off-CPU offloads have always been highly questionable. I think it's implausible that this patchset could actually be beneficial. In fact, it might actually be really harmful. You set your algorithms to priority 30000, which makes them be prioritized over ARMv8 CE. I've seen exactly that bug with other "accelerators", which actually regressed performance by over 50x compared to simply staying on the CPU. - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-17 4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers @ 2025-06-18 10:30 ` Kamlesh Gurudasani 2025-06-18 17:58 ` Eric Biggers 2025-06-20 3:07 ` Simon Richter 1 sibling, 1 reply; 8+ messages in thread From: Kamlesh Gurudasani @ 2025-06-18 10:30 UTC (permalink / raw) To: Eric Biggers, T Pratham Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry Eric Biggers <ebiggers@kernel.org> writes: > On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote: >> This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a >> new crypto accelerator which contains multiple crypto IPs [1]. >> This series implements support for ECB and CBC modes of AES for the AES >> Engine of the DTHE, using skcipher APIs of the kernel. >> >> Tested with: >> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set >> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y >> >> and tcrypt, >> sudo modprobe tcrypt mode=500 sec=1 >> >> Signed-off-by: T Pratham <t-pratham@ti.com> >> --- >> [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE) >> Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf > > Numbers, please. What is the specific, real use case in Linux where this > patchset actually improves performance? Going off the CPU and back again just > to en/decrypt some data is hugely expensive. > We don't really care about the speed here. These crypto accelerators are from embedded system. Often less than 4 cores and this particular SOC have variant with only one core. ARMv8 is clocking at 1.4ghz and DTHEv2 at 400Mhz, so no way it can give better performance number in term of speed. But crypto acclerators are designed specifically for lower power consumption as well. ARMv8 crypto extensions leverage SIMD registers, but dedicated crypto accelerator are still more efficient. Think about battery operated low cost devices. These embedded devices are often in the open and vicinity of attacker. Crypto accelerator are much more secure.[1] Bottomline: 1. Crypto accelerators can deliver a higher cryptography performance. 2. Crypto accelerators can deliver better energy efficiency. 3. Cryptography hardware usually has lower timing and power side channel leakage than running cryptography algorithms on the processor. IPSEC and partition encryption/decryption/authentication use cases are bulk operations and often have low setup cost than operation itself. [1] https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf Cheers, Kamlesh > Note that the manual you linked to above explicitly states that the CPU supports > the ARMv8 Cryptography Extensions. That definitively makes any off-CPU offload > obsolete. But even without that, these sorts of off-CPU offloads have always > been highly questionable. > > I think it's implausible that this patchset could actually be beneficial. > > In fact, it might actually be really harmful. You set your algorithms to > priority 30000, which makes them be prioritized over ARMv8 CE. I've seen > exactly that bug with other "accelerators", which actually regressed performance > by over 50x compared to simply staying on the CPU. > > - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-18 10:30 ` Kamlesh Gurudasani @ 2025-06-18 17:58 ` Eric Biggers 2025-06-26 13:33 ` Kamlesh Gurudasani 0 siblings, 1 reply; 8+ messages in thread From: Eric Biggers @ 2025-06-18 17:58 UTC (permalink / raw) To: Kamlesh Gurudasani Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry On Wed, Jun 18, 2025 at 04:00:12PM +0530, Kamlesh Gurudasani wrote: > Eric Biggers <ebiggers@kernel.org> writes: > > > On Tue, Jun 03, 2025 at 06:07:27PM +0530, T Pratham wrote: > >> This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a > >> new crypto accelerator which contains multiple crypto IPs [1]. > >> This series implements support for ECB and CBC modes of AES for the AES > >> Engine of the DTHE, using skcipher APIs of the kernel. > >> > >> Tested with: > >> CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set > >> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y > >> > >> and tcrypt, > >> sudo modprobe tcrypt mode=500 sec=1 > >> > >> Signed-off-by: T Pratham <t-pratham@ti.com> > >> --- > >> [1]: Section 14.6.3 (DMA Control Registers -> DMASS_DTHE) > >> Link: https://www.ti.com/lit/ug/sprujb4/sprujb4.pdf > > > > Numbers, please. What is the specific, real use case in Linux where this > > patchset actually improves performance? Going off the CPU and back again just > > to en/decrypt some data is hugely expensive. > > > We don't really care about the speed here. These crypto accelerators are > from embedded system. Often less than 4 cores and this particular SOC > have variant with only one core. > > ARMv8 is clocking at 1.4ghz and DTHEv2 at 400Mhz, so no way it can give > better performance number in term of speed. But crypto acclerators are > designed specifically for lower power consumption as well. ARMv8 crypto > extensions leverage SIMD registers, but dedicated crypto accelerator are > still more efficient. Think about battery operated low cost devices. > > These embedded devices are often in the open and vicinity of attacker. > Crypto accelerator are much more secure.[1] > > Bottomline: > 1. Crypto accelerators can deliver a higher cryptography performance. > 2. Crypto accelerators can deliver better energy efficiency. > 3. Cryptography hardware usually has lower timing and power side channel leakage than running > cryptography algorithms on the processor. > > IPSEC and partition encryption/decryption/authentication use cases are bulk > operations and often have low setup cost than operation itself. > > [1] https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf > > Cheers, > Kamlesh Okay, so you admit that your "accelerator" is much slower than the CPU. So (1) does not apply. As for (2), it's not clear that applies here. Sure, your AES engine *by itself* may be more power-efficient than the AES instructions on the CPU. However, using the offload requires all the additional work associated with offloading the operation from the CPU. Since it's much slower, it will also cause the operation to be dragged out over much a longer period of time, keeping the system awake for longer when it could have gone into suspend earlier. Thus, using the "accelerator" could actually increase power usage. As for (3), a couple issues. First, you're just making an argument from generalities and are not claiming that it's actually true in this case. ARMv8 CE instructions are in fact constant time. Sure, ARMv8 CE is generally not hardened against power analysis attacks. But you haven't actually claimed that your crypto engine is either. Second, these side channels, especially the ones other than timing, just aren't part of the threat model of most users. Meanwhile, a security issue we do have is that the hardware drivers tend not to be tested before the kernel is released, and often are released in a broken state where they don't even do the en/decryption correctly. Furthermore, unprivileged userspace programs may use AF_ALG to exploit buggy drivers. It seems implausible that this patch is more helpful than harmful. - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-18 17:58 ` Eric Biggers @ 2025-06-26 13:33 ` Kamlesh Gurudasani 2025-06-26 18:35 ` Eric Biggers 0 siblings, 1 reply; 8+ messages in thread From: Kamlesh Gurudasani @ 2025-06-26 13:33 UTC (permalink / raw) To: Eric Biggers Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry Eric Biggers <ebiggers@kernel.org> writes: > > Okay, so you admit that your "accelerator" is much slower than the CPU. So (1) > does not apply. > > As for (2), it's not clear that applies here. Sure, your AES engine *by itself* > may be more power-efficient than the AES instructions on the CPU. However, > using the offload requires all the additional work associated with offloading > the operation from the CPU. Since it's much slower, it will also cause the > operation to be dragged out over much a longer period of time, keeping the > system awake for longer when it could have gone into suspend earlier. > > Thus, using the "accelerator" could actually increase power usage. > > As for (3), a couple issues. First, you're just making an argument from > generalities and are not claiming that it's actually true in this case. ARMv8 > CE instructions are in fact constant time. > > Sure, ARMv8 CE is generally not hardened against power analysis attacks. But > you haven't actually claimed that your crypto engine is either. 1. AES/PKE engine inside DTHEv2 is DPA and EMA resistant. > > Second, these side channels, especially the ones other than timing, just aren't > part of the threat model of most users. 2. Certification like SESIP, PSA and IEC62443(being certified for CIP kernel- LFX [1]) All these have requirements for sidechannel attacks resistance.(check lvl 3+) Most of our users have these requirements and they don't even care about performance in terms of speed. > > Meanwhile, a security issue we do have is that the hardware drivers tend not to > be tested before the kernel is released, and often are released in a broken > state where they don't even do the en/decryption correctly. Furthermore, > unprivileged userspace programs may use AF_ALG to exploit buggy drivers. 3. We have devices in kerneCI and we have regular testing and engineers working on acceleratprs internally too, we can be more careful about that these drivers are going through prescribed testing for all revisions. We can reduce the prority for hw Accelerator by default if that's what you're trying to imply and let users decide. > > It seems implausible that this patch is more helpful than harmful. > I don't understand why you call it harmful when it is providing the security against side channel attacks. If ARM itself prescribing to use crypto acclerators if they are avialable, then it is beyond my understanding why would you push towards using CE extensions.[3] Are we not serious about the security than the performance itself? For us, Point 1 and 2 is at top priority and being a SOC vendor we want to make sure that we provide all support that is needed by end customers for their threat modeling. For embedded systems, resource utilization is also very important, I can use crypto accelerator and save CPU for other activities But lets look at numbers, They are not 50x worse as you have mentioned in earlier mail, they are just 2x bad. These a system with one core cpu 833Mhz and DTHEv2 at 400Mhz root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc cryptsetup benchmark --cipher aes-cbc # Tests are approximate using memory only (no storage IO). # Algorithm | Key | Encryption | Decryption aes-cbc 256b 77.7 MiB/s 77.5 MiB/s root@am62lxx-evm:~# modprobe -r dthev2 modprobe -r dthev2 root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc cryptsetup benchmark --cipher aes-cbc # Tests are approximate using memory only (no storage IO). # Algorithm | Key | Encryption | Decryption aes-cbc 256b 150.4 MiB/s 163.8 MiB/s [1]https://dashboard.kernelci.org/hardware?hs=ti [2]https://www.cip-project.org/about/security-iec-62443-4-2 [3]https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf Cheers, Kamlesh ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-26 13:33 ` Kamlesh Gurudasani @ 2025-06-26 18:35 ` Eric Biggers 0 siblings, 0 replies; 8+ messages in thread From: Eric Biggers @ 2025-06-26 18:35 UTC (permalink / raw) To: Kamlesh Gurudasani Cc: T Pratham, Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry On Thu, Jun 26, 2025 at 07:03:53PM +0530, Kamlesh Gurudasani wrote: > Eric Biggers <ebiggers@kernel.org> writes: > > > > > Okay, so you admit that your "accelerator" is much slower than the CPU. So (1) > > does not apply. > > > > As for (2), it's not clear that applies here. Sure, your AES engine *by itself* > > may be more power-efficient than the AES instructions on the CPU. However, > > using the offload requires all the additional work associated with offloading > > the operation from the CPU. Since it's much slower, it will also cause the > > operation to be dragged out over much a longer period of time, keeping the > > system awake for longer when it could have gone into suspend earlier. > > > > Thus, using the "accelerator" could actually increase power usage. > > > > As for (3), a couple issues. First, you're just making an argument from > > generalities and are not claiming that it's actually true in this case. ARMv8 > > CE instructions are in fact constant time. > > > > Sure, ARMv8 CE is generally not hardened against power analysis attacks. But > > you haven't actually claimed that your crypto engine is either. > 1. AES/PKE engine inside DTHEv2 is DPA and EMA resistant. > > > > Second, these side channels, especially the ones other than timing, just aren't > > part of the threat model of most users. > 2. Certification like SESIP, PSA and > IEC62443(being certified for CIP kernel- LFX [1]) > All these have requirements for sidechannel attacks resistance.(check > lvl 3+) > Most of our users have these requirements and they don't even care about > performance in terms of speed. > > > > > Meanwhile, a security issue we do have is that the hardware drivers tend not to > > be tested before the kernel is released, and often are released in a broken > > state where they don't even do the en/decryption correctly. Furthermore, > > unprivileged userspace programs may use AF_ALG to exploit buggy drivers. > 3. We have devices in kerneCI and we have regular testing and engineers > working on acceleratprs internally too, we can be more careful about > that these drivers are going through prescribed testing for all > revisions. > > We can reduce the prority for hw Accelerator by default if that's what > you're trying to imply and let users decide. > > > > It seems implausible that this patch is more helpful than harmful. > > > I don't understand why you call it harmful when it is providing the > security against side channel attacks. > > If ARM itself prescribing to use crypto acclerators if they are > avialable, then it is beyond my understanding why would you push towards > using CE extensions.[3] > > Are we not serious about the security than the performance itself? > > For us, > Point 1 and 2 is at top priority and being a SOC vendor we want to make > sure that we provide all support that is needed by end customers for > their threat modeling. If this is the motivation, then maybe it should be presented as the motivation? Let's look at the patchset itself: "Add support for Texas Instruments DTHE V2 crypto accelerator" "This series adds support for TI DTHE V2 crypto accelerator. DTHE V2 is a new crypto accelerator which contains multiple crypto IPs [1]. This series implements support for ECB and CBC modes of AES for the AES Engine of the DTHE, using skcipher APIs of the kernel." config CRYPTO_DEV_TI_DTHEV2 tristate "Support for TI DTHE V2 crypto accelerators" depends on CRYPTO && CRYPTO_HW && ARCH_K3 select CRYPTO_ENGINE select CRYPTO_SKCIPHER select CRYPTO_ECB select CRYPTO_CBC help This enables support for the TI DTHE V2 hw crypto accelerator which can be found on TI K3 SOCs. Selecting this enables use of hardware acceleration for cryptographic algorithms on these devices. Nothing about side channel resistance, but everything about it being an "accelerator" and providing "hardware acceleration". That implies that performance is the primary motivation. (Also, nothing about any actual use case like dm-crypt...) If your crypto engine does indeed provide additional side channel resistance beyond that of ARMv8 CE, and you have an actual use case where that provides a meaningful benefit, that's potentially valuable. Of course, it has to be weighed against the fact that these sorts of crypto engines are problematic in pretty much every other way. Besides actually being slower than the CPU, they also they often have bugs/issues where they produce the wrong output or corrupt data. Getting those things right should be the first priority. Yes, you'll vouch for your driver, but so does everyone else, and yet they actually still have these issues. Unfortunately the odds are kind of stacked against you; these drivers are really hard to get right. And the crypto self-tests don't even properly test them. As I mentioned, these drivers also exacerbate the usual issues we have with kernel security, where userspace programs can exploit kernel bugs to escalate privileges. This is because they're all accessible to userspace via AF_ALG. Anyway, if this is supported at all, it should be opt-in at runtime. So yes, please decrease the cra_priority that you're registering the algorithms with. > For embedded systems, resource utilization is also very important, > I can use crypto accelerator and save CPU for other activities For the small message sizes that get used in practice this doesn't seem very plausible, especially when the alternative is ARMv8 CE. The driver overhead and scheduling overhead is just too much on small message sizes. > But lets look at numbers, They are not 50x worse as you have mentioned in > earlier mail, they are just 2x bad. These a system with one core cpu > 833Mhz and DTHEv2 at 400Mhz > > root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc > cryptsetup benchmark --cipher aes-cbc > # Tests are approximate using memory only (no storage IO). > # Algorithm | Key | Encryption | Decryption > aes-cbc 256b 77.7 MiB/s 77.5 MiB/s > root@am62lxx-evm:~# modprobe -r dthev2 > modprobe -r dthev2 > root@am62lxx-evm:~# cryptsetup benchmark --cipher aes-cbc > cryptsetup benchmark --cipher aes-cbc > # Tests are approximate using memory only (no storage IO). > # Algorithm | Key | Encryption | Decryption > aes-cbc 256b 150.4 MiB/s 163.8 MiB/s > > [1]https://dashboard.kernelci.org/hardware?hs=ti > [2]https://www.cip-project.org/about/security-iec-62443-4-2 > [3]https://www.trustedfirmware.org/docs/Introduction_to_Physical_protection_for_MCU_developers_final.pdf I'm afraid 'cryptsetup benchmark --cipher aes-cbc' is not at all the right benchmark to use here, and it's quite misleading here: - 'cryptsetup benchmark' uses a 64 KiB message size by default. That's 16 times longer than the messages that dm-crypt typically uses. The longer messages strongly skew the numbers towards the hardware crypto engine. - 'cryptsetup benchmark' uses AF_ALG and measures not just the crypto performance but also the overhead of AF_ALG. This has the effect of diminishing any difference in speeds. The real difference is larger. - You benchmarked AES-CBC, which is outdated for storage encryption. AES-XTS is generally the better choice, and it's faster than AES-CBC on the CPU. Presumably you chose AES-CBC because your driver does not support AES-XTS. With an 833 MHz CPU, I don't think you'll see 50x worse like I saw on some other boards. However, the real difference will be more than the 2x worse you're seeing with 'cryptsetup benchmark --cipher aes-cbc'. A more accurate benchmark would be to do an in-kernel benchmark with 4 KiB messages, of AES-XTS (ARMv8 CE) vs either AES-CBC-ESSIV with the AES-CBC component offloaded to your crypto engine or AES-XTS with the AES-ECB component offloaded. - Eric ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator 2025-06-17 4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers 2025-06-18 10:30 ` Kamlesh Gurudasani @ 2025-06-20 3:07 ` Simon Richter 1 sibling, 0 replies; 8+ messages in thread From: Simon Richter @ 2025-06-20 3:07 UTC (permalink / raw) To: Eric Biggers, T Pratham Cc: Herbert Xu, David S. Miller, Rob Herring, Krzysztof Kozlowski, Conor Dooley, linux-crypto, devicetree, linux-kernel, Kamlesh Gurudasani, Vignesh Raghavendra, Praneeth Bajjuri, Manorit Chawdhry Hi, On 6/17/25 13:27, Eric Biggers wrote: > Numbers, please. What is the specific, real use case in Linux where this > patchset actually improves performance? Going off the CPU and back again just > to en/decrypt some data is hugely expensive. It would be cool to get some numbers from the IBM folks as well -- the NX coprocessor can do AES and SHA, but it is not enabled in the Linux kernel, only GZIP is (where I can definitely see a benefit, usually somewhere between 3 to 9 GB/s depending on how hard it needs to look for repetitions), so I'm wondering if that is an oversight, or deliberate. I also wonder if for some hardware, we can get a speedup by offloading and polling for completion instead of waiting for an interrupt. It feels wrong, but the thread is blocked no matter what. The other thing to ponder would be whether we can define a data size threshold where the offloading overhead becomes small enough that it's still worth it. That would also work for fscrypt, because with 4k blocks, it would simply never choose the offload engine. Simon ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-06-26 18:36 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-06-03 12:37 [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator T Pratham 2025-06-03 12:37 ` [PATCH v5 1/2] dt-bindings: crypto: Add binding for TI DTHE V2 T Pratham 2025-06-17 4:27 ` [PATCH v5 0/2] Add support for Texas Instruments DTHE V2 crypto accelerator Eric Biggers 2025-06-18 10:30 ` Kamlesh Gurudasani 2025-06-18 17:58 ` Eric Biggers 2025-06-26 13:33 ` Kamlesh Gurudasani 2025-06-26 18:35 ` Eric Biggers 2025-06-20 3:07 ` Simon Richter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).