ceph-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fscrypt: don't use hardware offload Crypto API drivers
@ 2025-06-11 20:58 Eric Biggers
  2025-06-12  0:21 ` Simon Richter
  2025-06-13  9:01 ` Maxime MERE
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-11 20:58 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

From: Eric Biggers <ebiggers@google.com>

fscrypt has never properly supported the old-school Crypto API hardware
offload drivers, as it processes each request synchronously.  There was
one report of someone successfully using such a driver 8 years ago.  But
otherwise this style of hardware offload is basically obsolete and has
been superseded by hardware-accelerated crypto instructions directly on
the CPU as well as inline storage encryption (UFS/eMMC).  Since then,
all I've gotten are issue reports where someone accidentally used a
buggy offload driver and it caused a problem for them, for example:

- https://lore.kernel.org/r/PH0PR02MB731916ECDB6C613665863B6CFFAA2@PH0PR02MB7319.namprd02.prod.outlook.com
- https://github.com/google/fscryptctl/issues/32

To protect users from these buggy and seemingly unhelpful drivers that I
have no way of testing, let's make fscrypt not use them.  Unfortunately
there is no direct support for doing so in the Crypto API, but we can
achieve something very close to it by disallowing algorithms that have
ASYNC, ALLOCATES_MEMORY, or KERN_DRIVER_ONLY set.

Disallowing ASYNC has the bonus that the code that waits for
asynchronous requests to complete is no longer necessary.

Note that for years dm-crypt has already had most of these drivers
disabled, as it disallows algorithms with ALLOCATES_MEMORY.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 fs/crypto/crypto.c          |  7 +++----
 fs/crypto/fname.c           | 18 ++++++++----------
 fs/crypto/fscrypt_private.h | 14 ++++++++++++++
 fs/crypto/hkdf.c            |  2 +-
 fs/crypto/keysetup.c        |  3 ++-
 fs/crypto/keysetup_v1.c     | 12 ++++++------
 6 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index b74b5937e695c..0acb440ae9723 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -111,11 +111,10 @@ int fscrypt_crypt_data_unit(const struct fscrypt_inode_info *ci,
 			    unsigned int len, unsigned int offs,
 			    gfp_t gfp_flags)
 {
 	union fscrypt_iv iv;
 	struct skcipher_request *req = NULL;
-	DECLARE_CRYPTO_WAIT(wait);
 	struct scatterlist dst, src;
 	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
 	int res = 0;
 
 	if (WARN_ON_ONCE(len <= 0))
@@ -129,21 +128,21 @@ int fscrypt_crypt_data_unit(const struct fscrypt_inode_info *ci,
 	if (!req)
 		return -ENOMEM;
 
 	skcipher_request_set_callback(
 		req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-		crypto_req_done, &wait);
+		NULL, NULL);
 
 	sg_init_table(&dst, 1);
 	sg_set_page(&dst, dest_page, len, offs);
 	sg_init_table(&src, 1);
 	sg_set_page(&src, src_page, len, offs);
 	skcipher_request_set_crypt(req, &src, &dst, len, &iv);
 	if (rw == FS_DECRYPT)
-		res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
+		res = crypto_skcipher_decrypt(req);
 	else
-		res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+		res = crypto_skcipher_encrypt(req);
 	skcipher_request_free(req);
 	if (res) {
 		fscrypt_err(ci->ci_inode,
 			    "%scryption failed for data unit %llu: %d",
 			    (rw == FS_DECRYPT ? "De" : "En"), index, res);
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 010f9c0a4c2f1..6365070211c02 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -91,11 +91,10 @@ static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
  */
 int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 			  u8 *out, unsigned int olen)
 {
 	struct skcipher_request *req = NULL;
-	DECLARE_CRYPTO_WAIT(wait);
 	const struct fscrypt_inode_info *ci = inode->i_crypt_info;
 	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
 	union fscrypt_iv iv;
 	struct scatterlist sg;
 	int res;
@@ -114,18 +113,18 @@ int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname,
 
 	/* Set up the encryption request */
 	req = skcipher_request_alloc(tfm, GFP_NOFS);
 	if (!req)
 		return -ENOMEM;
-	skcipher_request_set_callback(req,
-			CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-			crypto_req_done, &wait);
+	skcipher_request_set_callback(
+		req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+		NULL, NULL);
 	sg_init_one(&sg, out, olen);
 	skcipher_request_set_crypt(req, &sg, &sg, olen, &iv);
 
 	/* Do the encryption */
-	res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	res = crypto_skcipher_encrypt(req);
 	skcipher_request_free(req);
 	if (res < 0) {
 		fscrypt_err(inode, "Filename encryption failed: %d", res);
 		return res;
 	}
@@ -147,33 +146,32 @@ EXPORT_SYMBOL_GPL(fscrypt_fname_encrypt);
 static int fname_decrypt(const struct inode *inode,
 			 const struct fscrypt_str *iname,
 			 struct fscrypt_str *oname)
 {
 	struct skcipher_request *req = NULL;
-	DECLARE_CRYPTO_WAIT(wait);
 	struct scatterlist src_sg, dst_sg;
 	const struct fscrypt_inode_info *ci = inode->i_crypt_info;
 	struct crypto_skcipher *tfm = ci->ci_enc_key.tfm;
 	union fscrypt_iv iv;
 	int res;
 
 	/* Allocate request */
 	req = skcipher_request_alloc(tfm, GFP_NOFS);
 	if (!req)
 		return -ENOMEM;
-	skcipher_request_set_callback(req,
-		CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-		crypto_req_done, &wait);
+	skcipher_request_set_callback(
+		req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+		NULL, NULL);
 
 	/* Initialize IV */
 	fscrypt_generate_iv(&iv, 0, ci);
 
 	/* Create decryption request */
 	sg_init_one(&src_sg, iname->name, iname->len);
 	sg_init_one(&dst_sg, oname->name, oname->len);
 	skcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, &iv);
-	res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
+	res = crypto_skcipher_decrypt(req);
 	skcipher_request_free(req);
 	if (res < 0) {
 		fscrypt_err(inode, "Filename decryption failed: %d", res);
 		return res;
 	}
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index c1d92074b65c5..e9acf96f76739 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -43,10 +43,24 @@
  * hardware-wrapped keys has made it misleading as it's only for raw keys.
  * Don't use it in kernel code; use one of the above constants instead.
  */
 #undef FSCRYPT_MAX_KEY_SIZE
 
+/*
+ * This mask is passed as the third argument to crypto_alloc_*() functions to
+ * disallow CryptoAPI algorithm implementations that have any of these flags
+ * set.  This ensures that fscrypt uses only either well-tested "software"
+ * crypto (which these days is usually hardware-accelerated by specialized CPU
+ * instructions) or an inline encryption engine, not obsolete dedicated crypto
+ * accelerators.  Dedicated crypto accelerators often have buggy drivers and/or
+ * are slower than the "software" crypto, and fscrypt has never really been able
+ * to properly take advantage of them, as it en/decrypts data synchronously.
+ */
+#define FSCRYPT_CRYPTOAPI_MASK                            \
+	(CRYPTO_ALG_ASYNC | CRYPTO_ALG_ALLOCATES_MEMORY | \
+	 CRYPTO_ALG_KERN_DRIVER_ONLY)
+
 #define FSCRYPT_CONTEXT_V1	1
 #define FSCRYPT_CONTEXT_V2	2
 
 /* Keep this in sync with include/uapi/linux/fscrypt.h */
 #define FSCRYPT_MODE_MAX	FSCRYPT_MODE_AES_256_HCTR2
diff --git a/fs/crypto/hkdf.c b/fs/crypto/hkdf.c
index 0f3028adc9c72..5b9c21cfe2b45 100644
--- a/fs/crypto/hkdf.c
+++ b/fs/crypto/hkdf.c
@@ -56,11 +56,11 @@ int fscrypt_init_hkdf(struct fscrypt_hkdf *hkdf, const u8 *master_key,
 	struct crypto_shash *hmac_tfm;
 	static const u8 default_salt[HKDF_HASHLEN];
 	u8 prk[HKDF_HASHLEN];
 	int err;
 
-	hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, 0);
+	hmac_tfm = crypto_alloc_shash(HKDF_HMAC_ALG, 0, FSCRYPT_CRYPTOAPI_MASK);
 	if (IS_ERR(hmac_tfm)) {
 		fscrypt_err(NULL, "Error allocating " HKDF_HMAC_ALG ": %ld",
 			    PTR_ERR(hmac_tfm));
 		return PTR_ERR(hmac_tfm);
 	}
diff --git a/fs/crypto/keysetup.c b/fs/crypto/keysetup.c
index 0d71843af9469..d8113a7196979 100644
--- a/fs/crypto/keysetup.c
+++ b/fs/crypto/keysetup.c
@@ -101,11 +101,12 @@ fscrypt_allocate_skcipher(struct fscrypt_mode *mode, const u8 *raw_key,
 			  const struct inode *inode)
 {
 	struct crypto_skcipher *tfm;
 	int err;
 
-	tfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
+	tfm = crypto_alloc_skcipher(mode->cipher_str, 0,
+				    FSCRYPT_CRYPTOAPI_MASK);
 	if (IS_ERR(tfm)) {
 		if (PTR_ERR(tfm) == -ENOENT) {
 			fscrypt_warn(inode,
 				     "Missing crypto API support for %s (API name: \"%s\")",
 				     mode->friendly_name, mode->cipher_str);
diff --git a/fs/crypto/keysetup_v1.c b/fs/crypto/keysetup_v1.c
index b70521c55132b..3fdf174384f3d 100644
--- a/fs/crypto/keysetup_v1.c
+++ b/fs/crypto/keysetup_v1.c
@@ -48,13 +48,13 @@ static int derive_key_aes(const u8 *master_key,
 			  const u8 nonce[FSCRYPT_FILE_NONCE_SIZE],
 			  u8 *derived_key, unsigned int derived_keysize)
 {
 	int res = 0;
 	struct skcipher_request *req = NULL;
-	DECLARE_CRYPTO_WAIT(wait);
 	struct scatterlist src_sg, dst_sg;
-	struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
+	struct crypto_skcipher *tfm =
+		crypto_alloc_skcipher("ecb(aes)", 0, FSCRYPT_CRYPTOAPI_MASK);
 
 	if (IS_ERR(tfm)) {
 		res = PTR_ERR(tfm);
 		tfm = NULL;
 		goto out;
@@ -63,22 +63,22 @@ static int derive_key_aes(const u8 *master_key,
 	req = skcipher_request_alloc(tfm, GFP_KERNEL);
 	if (!req) {
 		res = -ENOMEM;
 		goto out;
 	}
-	skcipher_request_set_callback(req,
-			CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-			crypto_req_done, &wait);
+	skcipher_request_set_callback(
+		req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
+		NULL, NULL);
 	res = crypto_skcipher_setkey(tfm, nonce, FSCRYPT_FILE_NONCE_SIZE);
 	if (res < 0)
 		goto out;
 
 	sg_init_one(&src_sg, master_key, derived_keysize);
 	sg_init_one(&dst_sg, derived_key, derived_keysize);
 	skcipher_request_set_crypt(req, &src_sg, &dst_sg, derived_keysize,
 				   NULL);
-	res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
+	res = crypto_skcipher_encrypt(req);
 out:
 	skcipher_request_free(req);
 	crypto_free_skcipher(tfm);
 	return res;
 }

base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-11 20:58 [PATCH] fscrypt: don't use hardware offload Crypto API drivers Eric Biggers
@ 2025-06-12  0:21 ` Simon Richter
  2025-06-12  0:59   ` Eric Biggers
  2025-06-13  9:01 ` Maxime MERE
  1 sibling, 1 reply; 18+ messages in thread
From: Simon Richter @ 2025-06-12  0:21 UTC (permalink / raw)
  To: Eric Biggers, linux-fscrypt
  Cc: linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

Hi,

On 6/12/25 05:58, Eric Biggers wrote:

> But
> otherwise this style of hardware offload is basically obsolete and has
> been superseded by hardware-accelerated crypto instructions directly on
> the CPU as well as inline storage encryption (UFS/eMMC).

For desktop, yes, but embedded still has quite a few of these, for 
example the STM32 crypto offload engine, and I expect quite a few FPGA 
based implementations exist, so this would require vendors to maintain a 
fork to keep their out-of-tree drivers functional when updating the kernel.

POWER also has an asynchronous offload engine with AES, SHA and gzip 
support, these are significantly faster than the CPU.

If a buggy engine passes self-test, can this simply be fixed by adding 
more tests? :>

    Simon

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12  0:21 ` Simon Richter
@ 2025-06-12  0:59   ` Eric Biggers
  2025-06-12  6:25     ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2025-06-12  0:59 UTC (permalink / raw)
  To: Simon Richter
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Thu, Jun 12, 2025 at 09:21:26AM +0900, Simon Richter wrote:
> Hi,
> 
> On 6/12/25 05:58, Eric Biggers wrote:
> 
> > But
> > otherwise this style of hardware offload is basically obsolete and has
> > been superseded by hardware-accelerated crypto instructions directly on
> > the CPU as well as inline storage encryption (UFS/eMMC).
> 
> For desktop, yes, but embedded still has quite a few of these, for example
> the STM32 crypto offload engine, and I expect quite a few FPGA based
> implementations exist, so this would require vendors to maintain a fork to
> keep their out-of-tree drivers functional when updating the kernel.
> 
> POWER also has an asynchronous offload engine with AES, SHA and gzip
> support, these are significantly faster than the CPU.

Do you know if anyone is actually still using a legacy-style accelerator with
fscrypt, and if so what specific performance improvements are they getting?

Arguing that legacy-style crypto acceleration could theoretically be useful in
general isn't really relevant here.  What's relevant here is whether it's
actually useful and worthwhile with this specific kernel subsystem.

As I mentioned, fscrypt has never been optimized for legacy-style accelerators
anyway, and no one has ever tried to do so.  It just hasn't been relevant.

Meanwhile, the real feedback I *do* get from users is that these drivers are
causing problems for users, since the Crypto API (unwisely) enables them by
default and thus fscrypt uses them by default.

There are people who do care about some of these drivers, but they tend to be
the manufacturer of the hardware, not the users.  To me it seems like more of a
check-box exercise than something that is actually worth using in practice.

> If a buggy engine passes self-test, can this simply be fixed by adding more
> tests? :>

The crypto self-tests are disabled by default, just like any other kernel
subsystem.  They are supposed to be run before a kernel is released, but for
most of the drivers they aren't, since people don't have the hardware.  Thus, a
lot of drivers in-tree don't even pass the tests we do have.

Some distros do enable the crypto self-tests in production kernels, but only the
fast tests; the full set of tests is too slow to enable in production.  But even
the full tests don't properly test the hardware offload drivers, which have
unique challenges that do not exist in software code.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12  0:59   ` Eric Biggers
@ 2025-06-12  6:25     ` Eric Biggers
  2025-06-12  8:50       ` Giovanni Cabiddu
  2025-06-25  6:32       ` Eric Biggers
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-12  6:25 UTC (permalink / raw)
  To: Simon Richter
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Thu, Jun 12, 2025 at 12:59:14AM +0000, Eric Biggers wrote:
> On Thu, Jun 12, 2025 at 09:21:26AM +0900, Simon Richter wrote:
> > Hi,
> > 
> > On 6/12/25 05:58, Eric Biggers wrote:
> > 
> > > But
> > > otherwise this style of hardware offload is basically obsolete and has
> > > been superseded by hardware-accelerated crypto instructions directly on
> > > the CPU as well as inline storage encryption (UFS/eMMC).
> > 
> > For desktop, yes, but embedded still has quite a few of these, for example
> > the STM32 crypto offload engine

By the way, I noticed you specifically mentioned STM32.  I'm not sure if you
looked at the links I had in my commit message, but one of them
(https://github.com/google/fscryptctl/issues/32) was actually for the STM32
driver being broken and returning the wrong results, which broke filename
encryption.  The user fixed the issue by disabling the STM32 driver, and they
seemed okay with that.

That doesn't sound like something useful, IMO.  It sounds more like something
actively harmful to users.

Here's another one I forgot to mention:
https://github.com/google/fscryptctl/issues/9

I get blamed for these issues, because it's fscrypt that breaks.

FWIW, here's what happens if you try to use the Intel QAT driver with dm-crypt:
https://lore.kernel.org/r/CACsaVZ+mt3CfdXV0_yJh7d50tRcGcRZ12j3n6-hoX2cz3+njsg@mail.gmail.com/
https://lore.kernel.org/r/0171515-7267-624-5a22-238af829698f@redhat.com/

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12  6:25     ` Eric Biggers
@ 2025-06-12  8:50       ` Giovanni Cabiddu
  2025-06-12 15:57         ` Eric Biggers
  2025-06-25  6:32       ` Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Giovanni Cabiddu @ 2025-06-12  8:50 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Wed, Jun 11, 2025 at 11:25:21PM -0700, Eric Biggers wrote:

...

> FWIW, here's what happens if you try to use the Intel QAT driver with dm-crypt:
> https://lore.kernel.org/r/CACsaVZ+mt3CfdXV0_yJh7d50tRcGcRZ12j3n6-hoX2cz3+njsg@mail.gmail.com/

/s/happens/happened/

... and it got fixed
https://lore.kernel.org/all/20220506082327.21605-1-giovanni.cabiddu@intel.com/

Regards,

-- 
Giovanni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12  8:50       ` Giovanni Cabiddu
@ 2025-06-12 15:57         ` Eric Biggers
  2025-06-13  1:23           ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2025-06-12 15:57 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Thu, Jun 12, 2025 at 09:50:26AM +0100, Giovanni Cabiddu wrote:
> On Wed, Jun 11, 2025 at 11:25:21PM -0700, Eric Biggers wrote:
> 
> ...
> 
> > FWIW, here's what happens if you try to use the Intel QAT driver with dm-crypt:
> > https://lore.kernel.org/r/CACsaVZ+mt3CfdXV0_yJh7d50tRcGcRZ12j3n6-hoX2cz3+njsg@mail.gmail.com/
> 
> /s/happens/happened/
> 
> ... and it got fixed
> https://lore.kernel.org/all/20220506082327.21605-1-giovanni.cabiddu@intel.com/

But it reached users in the first place, including stable kernels.  And
apparently the issues were going on for years and were known to the authors of
the driver
(https://lore.kernel.org/linux-crypto/91fe9f87-54d7-4140-4d1a-eac8e2081a7c@gmail.com/).

We simply don't have issues like this with the AES-NI or VAES XTS code.

And separately, QAT was reported to be much slower than AES-NI for synchronous use
(https://lore.kernel.org/linux-crypto/0171515-7267-624-5a22-238af829698f@redhat.com/)

Later, I added VAES accelerated AES-XTS code which is over twice as fast as
AES-NI on the latest Intel CPUs, so that likely widened the gap even more.

Yet, the QAT driver registers its "xts(aes)" implementation with priority 4001,
compared to priority 800 for the VAES accelerated one.  So the QAT one is the
one that will be used by fscrypt!

That seems like a major issue even just from a performance perspective.

I expect this patch will significantly improve fscrypt performance on Intel
servers that have QAT.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12 15:57         ` Eric Biggers
@ 2025-06-13  1:23           ` Eric Biggers
  2025-06-13 11:10             ` Giovanni Cabiddu
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2025-06-13  1:23 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Thu, Jun 12, 2025 at 03:57:43PM +0000, Eric Biggers wrote:
> On Thu, Jun 12, 2025 at 09:50:26AM +0100, Giovanni Cabiddu wrote:
> > On Wed, Jun 11, 2025 at 11:25:21PM -0700, Eric Biggers wrote:
> > 
> > ...
> > 
> > > FWIW, here's what happens if you try to use the Intel QAT driver with dm-crypt:
> > > https://lore.kernel.org/r/CACsaVZ+mt3CfdXV0_yJh7d50tRcGcRZ12j3n6-hoX2cz3+njsg@mail.gmail.com/
> > 
> > /s/happens/happened/
> > 
> > ... and it got fixed
> > https://lore.kernel.org/all/20220506082327.21605-1-giovanni.cabiddu@intel.com/
> 
> But it reached users in the first place, including stable kernels.  And
> apparently the issues were going on for years and were known to the authors of
> the driver
> (https://lore.kernel.org/linux-crypto/91fe9f87-54d7-4140-4d1a-eac8e2081a7c@gmail.com/).
> 
> We simply don't have issues like this with the AES-NI or VAES XTS code.
> 
> And separately, QAT was reported to be much slower than AES-NI for synchronous use
> (https://lore.kernel.org/linux-crypto/0171515-7267-624-5a22-238af829698f@redhat.com/)
> 
> Later, I added VAES accelerated AES-XTS code which is over twice as fast as
> AES-NI on the latest Intel CPUs, so that likely widened the gap even more.
> 
> Yet, the QAT driver registers its "xts(aes)" implementation with priority 4001,
> compared to priority 800 for the VAES accelerated one.  So the QAT one is the
> one that will be used by fscrypt!
> 
> That seems like a major issue even just from a performance perspective.
> 
> I expect this patch will significantly improve fscrypt performance on Intel
> servers that have QAT.

I was curious, so I actually ran a benchmark on an Intel Emerald Rapids server.
Specifically, I used a kernel module that repeatedly en/decrypted 4096-byte
messages with AES-XTS using crypto_skcipher_en/decrypt().  That's basically what
fscrypt's file contents encryption does, but here I just measured the raw crypto
performance.  I tested both xts-aes-vaes-avx512 and qat_aes_xts.  For both, the
difference between encryption and decryption was within the margin of error, so
I'll give just one number for each.

Results:

    xts-aes-vaes-avx512: 16171 MB/s
    qat_aes_xts: 289 MB/s

So, QAT is 55 times slower than the VAES-optimized software code!

It's even slower than the generic C code:
     
    xts(ecb(aes-generic)): 305 MB/s

Now, it could be argued that this is user error -- I "should" have created lots
of asynchronous crypto requests for 4K blocks, submitted them all at once, and
waited for them to complete.  Thus allowing parallel processing by QAT.

But, that's simply not what fscrypt does.  And even if it did, it could only
plausibly help for large bios.  Short bios, for which latency is really
important, would continue to be massively regressed by using QAT for them.

Even for large bios, it would have to get over 55 times faster to be worth it,
which seems (very?) tenuous.

Also, as is known from dm-crypt which does do async processing, the code that's
needed to do it is quite complex and error-prone.

In any case, async processing would be a theoretical future improvement.  It's
simply not what fscrypt does today, or has ever done.

I also found that, even though I built the QAT driver as a loadable module, it
was loaded automatically on the system and prioritized itself over the VAES-
accelerated AES-XTS.  Thus, it would be what fscrypt uses on Intel servers where
the QAT driver is enabled in kconfig, even just as 'm'.

Even disregarding the historical data corruption issues with QAT, I think this
makes it *very* clear that the QAT driver is harmful to fscrypt users.

And I've seen similar results with the Qualcomm crypto engine
(https://lore.kernel.org/r/20241203180553.16893-1-ebiggers@kernel.org/).
So this isn't even unique to this particular accelerator either.

This has gone on for long enough.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-11 20:58 [PATCH] fscrypt: don't use hardware offload Crypto API drivers Eric Biggers
  2025-06-12  0:21 ` Simon Richter
@ 2025-06-13  9:01 ` Maxime MERE
  2025-06-13 14:42   ` Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Maxime MERE @ 2025-06-13  9:01 UTC (permalink / raw)
  To: Eric Biggers, linux-fscrypt
  Cc: linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

Hello,

On 6/11/25 22:58, Eric Biggers wrote:
> To protect users from these buggy and seemingly unhelpful drivers that I
> have no way of testing, let's make fscrypt not use them.  Unfortunately
> there is no direct support for doing so in the Crypto API, but we can
> achieve something very close to it by disallowing algorithms that have
> ASYNC, ALLOCATES_MEMORY, or KERN_DRIVER_ONLY set.

I agree that software drivers are more efficient and less prone to bugs 
than hardware drivers. However, I would like to highlight the fact that 
certain ST products (the STM32MP2x series) have features that allow the 
loading of a secret key via an internal bus from a Secure OS to the CRYP 
peripheral (usable by the kernel). This enables cryptographic operations 
to be delegated to the non-secure side (the kernel) without exposing the 
key.

If fscrypt no longer supports hardware drivers, then this type of 
functionality could not be used, which I find unfortunate because it is 
something that might interest users.


cheers,

Maxime

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-13  1:23           ` Eric Biggers
@ 2025-06-13 11:10             ` Giovanni Cabiddu
  0 siblings, 0 replies; 18+ messages in thread
From: Giovanni Cabiddu @ 2025-06-13 11:10 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Fri, Jun 13, 2025 at 01:23:57AM +0000, Eric Biggers wrote:
> On Thu, Jun 12, 2025 at 03:57:43PM +0000, Eric Biggers wrote:
> > On Thu, Jun 12, 2025 at 09:50:26AM +0100, Giovanni Cabiddu wrote:
> > > On Wed, Jun 11, 2025 at 11:25:21PM -0700, Eric Biggers wrote:
> > > 
> > > ...
> > > 
> > > > FWIW, here's what happens if you try to use the Intel QAT driver with dm-crypt:
> > > > https://lore.kernel.org/r/CACsaVZ+mt3CfdXV0_yJh7d50tRcGcRZ12j3n6-hoX2cz3+njsg@mail.gmail.com/
> > > 
> > > /s/happens/happened/
> > > 
> > > ... and it got fixed
> > > https://lore.kernel.org/all/20220506082327.21605-1-giovanni.cabiddu@intel.com/
> > 
> > But it reached users in the first place, including stable kernels.  And
> > apparently the issues were going on for years and were known to the authors of
> > the driver
> > (https://lore.kernel.org/linux-crypto/91fe9f87-54d7-4140-4d1a-eac8e2081a7c@gmail.com/).
> > 
> > We simply don't have issues like this with the AES-NI or VAES XTS code.
> > 
> > And separately, QAT was reported to be much slower than AES-NI for synchronous use
> > (https://lore.kernel.org/linux-crypto/0171515-7267-624-5a22-238af829698f@redhat.com/)
> > 
> > Later, I added VAES accelerated AES-XTS code which is over twice as fast as
> > AES-NI on the latest Intel CPUs, so that likely widened the gap even more.
> > 
> > Yet, the QAT driver registers its "xts(aes)" implementation with priority 4001,
> > compared to priority 800 for the VAES accelerated one.  So the QAT one is the
> > one that will be used by fscrypt!
> > 
> > That seems like a major issue even just from a performance perspective.
> > 
> > I expect this patch will significantly improve fscrypt performance on Intel
> > servers that have QAT.
> 
> I was curious, so I actually ran a benchmark on an Intel Emerald Rapids server.
> Specifically, I used a kernel module that repeatedly en/decrypted 4096-byte
> messages with AES-XTS using crypto_skcipher_en/decrypt().  That's basically what
> fscrypt's file contents encryption does, but here I just measured the raw crypto
> performance.  I tested both xts-aes-vaes-avx512 and qat_aes_xts.  For both, the
> difference between encryption and decryption was within the margin of error, so
> I'll give just one number for each.
> 
> Results:
> 
>     xts-aes-vaes-avx512: 16171 MB/s
>     qat_aes_xts: 289 MB/s
> 
> So, QAT is 55 times slower than the VAES-optimized software code!
> 
> It's even slower than the generic C code:
>      
>     xts(ecb(aes-generic)): 305 MB/s
> 
> Now, it could be argued that this is user error -- I "should" have created lots
> of asynchronous crypto requests for 4K blocks, submitted them all at once, and
> waited for them to complete.  Thus allowing parallel processing by QAT.
> 
> But, that's simply not what fscrypt does.  And even if it did, it could only
> plausibly help for large bios.  Short bios, for which latency is really
> important, would continue to be massively regressed by using QAT for them.
> 
> Even for large bios, it would have to get over 55 times faster to be worth it,
> which seems (very?) tenuous.
> 
> Also, as is known from dm-crypt which does do async processing, the code that's
> needed to do it is quite complex and error-prone.
> 
> In any case, async processing would be a theoretical future improvement.  It's
> simply not what fscrypt does today, or has ever done.
> 
> I also found that, even though I built the QAT driver as a loadable module, it
> was loaded automatically on the system and prioritized itself over the VAES-
> accelerated AES-XTS.  Thus, it would be what fscrypt uses on Intel servers where
> the QAT driver is enabled in kconfig, even just as 'm'.
I just sent a patch to lower the priority of the skcipher (and aead)
algorithms in the QAT driver. This should allow xts-aes-vaes-avx512 to be
selected by default.

As for the module loading behaviour: loadable modules are automatically
loaded at startup if hardware that matches the device IDs they support
is found.

Regards,

-- 
Giovanni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-13  9:01 ` Maxime MERE
@ 2025-06-13 14:42   ` Eric Biggers
  2025-06-25 16:29     ` Maxime MERE
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2025-06-13 14:42 UTC (permalink / raw)
  To: Maxime MERE
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Fri, Jun 13, 2025 at 11:01:03AM +0200, Maxime MERE wrote:
> Hello,
> 
> On 6/11/25 22:58, Eric Biggers wrote:
> > To protect users from these buggy and seemingly unhelpful drivers that I
> > have no way of testing, let's make fscrypt not use them.  Unfortunately
> > there is no direct support for doing so in the Crypto API, but we can
> > achieve something very close to it by disallowing algorithms that have
> > ASYNC, ALLOCATES_MEMORY, or KERN_DRIVER_ONLY set.
> 
> I agree that software drivers are more efficient and less prone to bugs than
> hardware drivers. However, I would like to highlight the fact that certain
> ST products (the STM32MP2x series) have features that allow the loading of a
> secret key via an internal bus from a Secure OS to the CRYP peripheral
> (usable by the kernel). This enables cryptographic operations to be
> delegated to the non-secure side (the kernel) without exposing the key.
> 
> If fscrypt no longer supports hardware drivers, then this type of
> functionality could not be used, which I find unfortunate because it is
> something that might interest users.

What?  fscrypt doesn't support that anyway, and there isn't any path forward to
supporting it in a way that would actually improve security.  (Considering how
fscrypt's key derivation etc. works.)

fscrypt does support hardware wrapped *inline encryption* keys, which is
actually designed properly and does work.

Honestly, the responses to this thread so far have made it even more clear that
this patch is the right decision.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-12  6:25     ` Eric Biggers
  2025-06-12  8:50       ` Giovanni Cabiddu
@ 2025-06-25  6:32       ` Eric Biggers
  2025-06-25 12:44         ` Theodore Ts'o
  2025-06-25 16:29         ` Maxime MERE
  1 sibling, 2 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-25  6:32 UTC (permalink / raw)
  To: Simon Richter
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Wed, Jun 11, 2025 at 11:25:21PM -0700, Eric Biggers wrote:
> On Thu, Jun 12, 2025 at 12:59:14AM +0000, Eric Biggers wrote:
> > On Thu, Jun 12, 2025 at 09:21:26AM +0900, Simon Richter wrote:
> > > Hi,
> > > 
> > > On 6/12/25 05:58, Eric Biggers wrote:
> > > 
> > > > But
> > > > otherwise this style of hardware offload is basically obsolete and has
> > > > been superseded by hardware-accelerated crypto instructions directly on
> > > > the CPU as well as inline storage encryption (UFS/eMMC).
> > > 
> > > For desktop, yes, but embedded still has quite a few of these, for example
> > > the STM32 crypto offload engine
> 
> By the way, I noticed you specifically mentioned STM32.  I'm not sure if you
> looked at the links I had in my commit message, but one of them
> (https://github.com/google/fscryptctl/issues/32) was actually for the STM32
> driver being broken and returning the wrong results, which broke filename
> encryption.  The user fixed the issue by disabling the STM32 driver, and they
> seemed okay with that.
> 
> That doesn't sound like something useful, IMO.  It sounds more like something
> actively harmful to users.
> 
> Here's another one I forgot to mention:
> https://github.com/google/fscryptctl/issues/9
> 
> I get blamed for these issues, because it's fscrypt that breaks.

Since two people were pushing the STM32 crypto engine in this thread:

I measured decryption throughput on 4 KiB messages on an STM32MP157F-DK2.  This
is an embedded evaluation board that includes an STM32 crypto engine and has an
800 MHz Cortex-A7 processor.  Cortex-A7 doesn't have AES instructions:

    AES-128-CBC-ESSIV:
        essiv(stm32-cbc-aes,sha256-arm):
            3.1 MB/s
        essiv(cbc-aes-neonbs,sha256-arm): 
            15.5 MB/s

    AES-256-XTS:
        xts(stm32-ecb-aes):
            3.1 MB/s
        xts-aes-neonbs:
            11.0 MB/s
            
    Adiantum:
        adiantum(xchacha12-arm,aes-arm,nhpoly1305-neon):
            53.1 MB/s

That was the synchronous throughput.  However, submitting multiple requests
asynchronously (which again, fscrypt doesn't actually do) barely helps.
Apparently the STM32 crypto engine has only one hardware queue.

I already strongly suspected that these non-inline crypto engines aren't worth
using.  But I didn't realize they are quite this bad.  Even with AES on a
Cortex-A7 CPU that lacks AES instructions, the CPU is much faster!

But of course Adiantum is even faster, as it was specifically designed for CPUs
that don't have AES instructions.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25  6:32       ` Eric Biggers
@ 2025-06-25 12:44         ` Theodore Ts'o
  2025-06-25 18:38           ` Eric Biggers
  2025-06-25 16:29         ` Maxime MERE
  1 sibling, 1 reply; 18+ messages in thread
From: Theodore Ts'o @ 2025-06-25 12:44 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Tue, Jun 24, 2025 at 11:32:52PM -0700, Eric Biggers wrote:
> 
> That was the synchronous throughput.  However, submitting multiple requests
> asynchronously (which again, fscrypt doesn't actually do) barely helps.
> Apparently the STM32 crypto engine has only one hardware queue.
> 
> I already strongly suspected that these non-inline crypto engines
> aren't worth using.  But I didn't realize they are quite this bad.
> Even with AES on a Cortex-A7 CPU that lacks AES instructions, the
> CPU is much faster!

I wonder if the primary design goal of the STM32 crypto engine is that
it might reduce power consumption --- after all, one of the primary
benchmarketing metrics that vendors care about is "hours of You Tube
watch time" --- and decryptoing a video stream doesn't require high
performance.

Given that the typical benchmarketing number which handset vendors
tend to care about is SQLite transactions per second, maybe they
wouldn't be all that eager to use the crypto engine.  :-)

    	     	      	      	     	      - Ted

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-13 14:42   ` Eric Biggers
@ 2025-06-25 16:29     ` Maxime MERE
  2025-06-25 18:57       ` Eric Biggers
  2025-06-26  2:36       ` Eric Biggers
  0 siblings, 2 replies; 18+ messages in thread
From: Maxime MERE @ 2025-06-25 16:29 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel



On 6/13/25 16:42, Eric Biggers wrote:
> Honestly, the responses to this thread so far have made it even more clear that
> this patch is the right decision.

The chaining system I previously presented is just an example intended 
to demonstrate the value of hardware drivers in the context of ST platforms.

The key point is that our hardware IP allows us to securely embed 
encryption keys directly in hardware, making sure they are never visible 
or accessible from Linux, which runs in a non-secure environment. Our 
software architectures rely on a Secure OS running in parallel with 
Linux, similar to what is done on Android. This Secure OS is responsible 
for sensitive cryptographic operations.

This Secure OS can manages the keys with a dedicated hardware peripheral 
(SAES). The Linux side never sees the keys directly. Instead, the Secure 
OS prepares the keys and shares them securely with the cryptographic 
engine (CRYP) through a dedicated hardware bus.

This architecture improves security boundary: keys isolated from the 
non-secure Linux environment. But decryption can be processed by the 
linux kernel.

In addition, ST’s hardware crypto peripherals come with built-in 
protections against side-channel attacks and have been certified with 
SESIP and PSA level 3 security assurance, providing a level of security 
difficult to achieve with software alone.

Regarding robustness and maintenance, ST ensures regular updates of its 
drivers and can fix any reported bugs. We have conducted internal tests 
with dm-crypt that demonstrate the proper functioning of these drivers 
for this type of application.

Maxime

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25  6:32       ` Eric Biggers
  2025-06-25 12:44         ` Theodore Ts'o
@ 2025-06-25 16:29         ` Maxime MERE
  2025-06-25 19:17           ` Eric Biggers
  1 sibling, 1 reply; 18+ messages in thread
From: Maxime MERE @ 2025-06-25 16:29 UTC (permalink / raw)
  To: Eric Biggers, Simon Richter
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

Hi,

On 6/25/25 08:32, Eric Biggers wrote:
> That was the synchronous throughput.  However, submitting multiple requests
> asynchronously (which again, fscrypt doesn't actually do) barely helps.
> Apparently the STM32 crypto engine has only one hardware queue.
> 
> I already strongly suspected that these non-inline crypto engines aren't worth
> using.  But I didn't realize they are quite this bad.  Even with AES on a
> Cortex-A7 CPU that lacks AES instructions, the CPU is much faster!

 From a performance perspective, using hardware crypto offloads the CPU, 
which is important in real-world applications where the CPU must handle 
multiple tasks. Our processors are often single-core and not the highest 
performing, so hardware acceleration is valuable.

I can show you performance test realized with openSSL (3.2.4) who shows, 
less CPU usage and better performance for large block of data when our 
driver is used (via afalg):

command used: ```openssl speed -evp aes-256-cbc -engine afalg -elapsed```

+--------------------+--------------+-----------------+
| Block Size (bytes) | AFALG (MB/s) | SW BASED (MB/s) |
+--------------------+--------------+-----------------+
| 16                 | 0.09         | 9.44            |
| 64                 | 0.34         | 11.43           |
| 256                | 1.31         | 12.08           |
| 1024               | 4.96         | 12.27           |
| 8192               | 18.18        | 12.33           |
| 16384              | 22.48        | 12.33           |
+--------------------+--------------+-----------------+

to test CPU usage I've used a monocore stm32mp157f.
here with afalg, we have an average CPU usage of ~75%, with the sw based
approach CPU is used at ~100%

Maxime

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25 12:44         ` Theodore Ts'o
@ 2025-06-25 18:38           ` Eric Biggers
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-25 18:38 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Wed, Jun 25, 2025 at 08:44:45AM -0400, Theodore Ts'o wrote:
> On Tue, Jun 24, 2025 at 11:32:52PM -0700, Eric Biggers wrote:
> > 
> > That was the synchronous throughput.  However, submitting multiple requests
> > asynchronously (which again, fscrypt doesn't actually do) barely helps.
> > Apparently the STM32 crypto engine has only one hardware queue.
> > 
> > I already strongly suspected that these non-inline crypto engines
> > aren't worth using.  But I didn't realize they are quite this bad.
> > Even with AES on a Cortex-A7 CPU that lacks AES instructions, the
> > CPU is much faster!
> 
> I wonder if the primary design goal of the STM32 crypto engine is that
> it might reduce power consumption --- after all, one of the primary
> benchmarketing metrics that vendors care about is "hours of You Tube
> watch time" --- and decryptoing a video stream doesn't require high
> performance.
> 
> Given that the typical benchmarketing number which handset vendors
> tend to care about is SQLite transactions per second, maybe they
> wouldn't be all that eager to use the crypto engine.  :-)
> 

My STM32MP157F-DK2 board (with screen removed) is pulling 1.5W regardless of
whether it's running the benchmark with the STM32 crypto engine or with the NEON
bit-sliced code.  However, the NEON bit-sliced code finishes 5 times faster.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25 16:29     ` Maxime MERE
@ 2025-06-25 18:57       ` Eric Biggers
  2025-06-26  2:36       ` Eric Biggers
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-25 18:57 UTC (permalink / raw)
  To: Maxime MERE
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Wed, Jun 25, 2025 at 06:29:17PM +0200, Maxime MERE wrote:
> 
> 
> On 6/13/25 16:42, Eric Biggers wrote:
> > Honestly, the responses to this thread so far have made it even more clear that
> > this patch is the right decision.
> 
> The chaining system I previously presented is just an example intended to
> demonstrate the value of hardware drivers in the context of ST platforms.
> 
> The key point is that our hardware IP allows us to securely embed encryption
> keys directly in hardware, making sure they are never visible or accessible
> from Linux, which runs in a non-secure environment. Our software
> architectures rely on a Secure OS running in parallel with Linux, similar to
> what is done on Android. This Secure OS is responsible for sensitive
> cryptographic operations.
> 
> This Secure OS can manages the keys with a dedicated hardware peripheral
> (SAES). The Linux side never sees the keys directly. Instead, the Secure OS
> prepares the keys and shares them securely with the cryptographic engine
> (CRYP) through a dedicated hardware bus.
> 
> This architecture improves security boundary: keys isolated from the
> non-secure Linux environment. But decryption can be processed by the linux
> kernel.

Can you please stop hijacking this thread with what is basically an irrelevant
marketing blurb?  The STM32 driver actually just stores the keys in memory.  See
stm32_cryp_ctx::key in drivers/crypto/stm32/stm32-cryp.c, and struct
stm32_hash_ctx::key in drivers/crypto/stm32/stm32-hash.c.

So even if the STM32 crypto engine technically supports hardware keys, the
potential benefits of that are not actually being realized in Linux.

And for applications like fscrypt that derive a large number of keys, it isn't
actually possible to use hardware keys even if the driver supported it, without
key wrapping and proper integration with the key hierarchy.  (FWIW, the
hardware-wrapped inline encryption keys feature does do that, but that is very
different from what we're talking about here.)

Also, the STM32 driver does not actually fully support any of fscrypt's file
contents encryption modes.  It does support AES-256-ECB and AES-128-CBC, so it
can be used for AES-256-XTS and AES-128-CBC-ESSIV when the CPU handles the XTS
masking or IV encryption respectively.  But to do that, the CPU needs access to
part of the secret keys.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25 16:29         ` Maxime MERE
@ 2025-06-25 19:17           ` Eric Biggers
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-25 19:17 UTC (permalink / raw)
  To: Maxime MERE
  Cc: Simon Richter, linux-fscrypt, linux-crypto, linux-kernel,
	linux-mtd, linux-ext4, linux-f2fs-devel, ceph-devel

On Wed, Jun 25, 2025 at 06:29:26PM +0200, Maxime MERE wrote:
> Hi,
> 
> On 6/25/25 08:32, Eric Biggers wrote:
> > That was the synchronous throughput.  However, submitting multiple requests
> > asynchronously (which again, fscrypt doesn't actually do) barely helps.
> > Apparently the STM32 crypto engine has only one hardware queue.
> > 
> > I already strongly suspected that these non-inline crypto engines aren't worth
> > using.  But I didn't realize they are quite this bad.  Even with AES on a
> > Cortex-A7 CPU that lacks AES instructions, the CPU is much faster!
> 
> From a performance perspective, using hardware crypto offloads the CPU,
> which is important in real-world applications where the CPU must handle
> multiple tasks. Our processors are often single-core and not the highest
> performing, so hardware acceleration is valuable.
> 
> I can show you performance test realized with openSSL (3.2.4) who shows,
> less CPU usage and better performance for large block of data when our
> driver is used (via afalg):
> 
> command used: ```openssl speed -evp aes-256-cbc -engine afalg -elapsed```
> 
> +--------------------+--------------+-----------------+
> | Block Size (bytes) | AFALG (MB/s) | SW BASED (MB/s) |
> +--------------------+--------------+-----------------+
> | 16                 | 0.09         | 9.44            |
> | 64                 | 0.34         | 11.43           |
> | 256                | 1.31         | 12.08           |
> | 1024               | 4.96         | 12.27           |
> | 8192               | 18.18        | 12.33           |
> | 16384              | 22.48        | 12.33           |
> +--------------------+--------------+-----------------+
> 
> to test CPU usage I've used a monocore stm32mp157f.
> here with afalg, we have an average CPU usage of ~75%, with the sw based
> approach CPU is used at ~100%
> 
> Maxime

fscrypt is almost always used with 4096-byte blocks, which in my benchmark took
about 1300 μs each with AES-128-CBC-ESSIV w/ STM32 engine, 264 μs each with
AES-128-CBC-ESSIV w/ CPU, or 77 μs each with Adiantum w/ CPU.  The CPU-based
times seem short enough that there isn't much time for another task to be
usefully scheduled while waiting for each block.  It's important to consider (a)
driver overhead, (b) scheduling overhead, and (c) the low instructions per
second of this processor in the first place.

By the way, the board I have (STM32MP157F-DK2) is actually multi-core.  It seems
this is common among ST's offerings that are intended to run Linux?  (Of course,
the microcontrollers that don't run Linux are another story.)

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] fscrypt: don't use hardware offload Crypto API drivers
  2025-06-25 16:29     ` Maxime MERE
  2025-06-25 18:57       ` Eric Biggers
@ 2025-06-26  2:36       ` Eric Biggers
  1 sibling, 0 replies; 18+ messages in thread
From: Eric Biggers @ 2025-06-26  2:36 UTC (permalink / raw)
  To: Maxime MERE
  Cc: linux-fscrypt, linux-crypto, linux-kernel, linux-mtd, linux-ext4,
	linux-f2fs-devel, ceph-devel

On Wed, Jun 25, 2025 at 06:29:17PM +0200, Maxime MERE wrote:
> Regarding robustness and maintenance, ST ensures regular updates of its
> drivers and can fix any reported bugs. We have conducted internal tests with
> dm-crypt that demonstrate the proper functioning of these drivers for this
> type of application.

In addition to the bug I mentioned earlier where the STM32 crypto driver
produced incorrect ciphertext (https://github.com/google/fscryptctl/issues/32),
the following fix shows that the STM32 crypto driver computed incorrect hash
values for years (2017 through 2023):

    https://git.kernel.org/linus/e6af5c0c4d32a27e

While these bugs may be fixed now, they show a serious lack of testing.  They
also show that these sorts of drivers are really hard to get right.

I absolutely do not want fscrypt using anything like this.  I want the crypto to
be done correctly.

(And also efficiently, which clearly these offloads don't actually do either.)

BTW, it seems all the hardware offload crypto drivers have quality issues like
this.  I gave other examples in the thread, for example the Intel QAT driver
causing data corruption.  So my intent isn't to single out the STM32 driver per
se.  (And of course this patch applies to all drivers.)  I'm just responding to
STM32 because of the people pushing it in this thread for some reason.

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-06-26  2:37 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-06-11 20:58 [PATCH] fscrypt: don't use hardware offload Crypto API drivers Eric Biggers
2025-06-12  0:21 ` Simon Richter
2025-06-12  0:59   ` Eric Biggers
2025-06-12  6:25     ` Eric Biggers
2025-06-12  8:50       ` Giovanni Cabiddu
2025-06-12 15:57         ` Eric Biggers
2025-06-13  1:23           ` Eric Biggers
2025-06-13 11:10             ` Giovanni Cabiddu
2025-06-25  6:32       ` Eric Biggers
2025-06-25 12:44         ` Theodore Ts'o
2025-06-25 18:38           ` Eric Biggers
2025-06-25 16:29         ` Maxime MERE
2025-06-25 19:17           ` Eric Biggers
2025-06-13  9:01 ` Maxime MERE
2025-06-13 14:42   ` Eric Biggers
2025-06-25 16:29     ` Maxime MERE
2025-06-25 18:57       ` Eric Biggers
2025-06-26  2:36       ` Eric Biggers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).