linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators
@ 2024-04-26 10:54 Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 1/6] Revert "crypto: testmgr - Remove zlib-deflate" Giovanni Cabiddu
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

Add support for zlib compression and decompression through the acomp
APIs in BTRFS. This enables [de]compression operations to be offloaded
to accelerators. This is a rework of [1].

This set also re-enables zlib-deflate in the Crypto API and in the QAT
driver as they were removed in [2] since there was no user in kernel.
The re-enablement is done by reverting the commits that removed such
feature.

The code has been benchmarked on a system with the following specs:
 * Dual socket Intel(R) Xeon(R) Platinum 8470N
 * 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s])
 * 4 NVMe disks (349.3G INTEL SSDPE21K375GA)
 * 2 QAT 4xxx devices, one per socket, configured for compression only
 * Kernel 6.8.2

The test consisted of 4 processes running `dd` that wrote in parallel
50GB of data (Silesia corpus) to the 4 NVMe disks separately. We captured
disk write throughput, CPU utilization and compression ratio:

    +---------------------------+---------+---------+---------+---------+
    |                           | QAT-L9  | ZSTD-L3 | ZLIB-L3 | LZO-L1  |
    +---------------------------+---------+---------+---------+---------+
    | Disk Write TPUT (GiB/s)   | 6.5     | 5.2     | 2.2     | 6.5     |
    +---------------------------+---------+---------+---------+---------+
    | CPU utils %age @208 cores | 4.56%   | 15.67%  | 12.79%  | 19.85%  |
    +---------------------------+---------+---------+---------+---------+
    | Compression Ratio         | 34%     | 35%     | 37%     | 58%     |
    +---------------------------+---------+---------+---------+---------+

From the results we see that BTRFS with QAT configured for zlib-deflate Level 9
provides the best throughput with less CPU utilization and better compression
ratio compared with software zstd-l3, zlib-l3 and lzo. 

Limitations: 
  * The implementation is synchronous, even if acomp is an asynchronous API.
  * The implementation tries always to use an acomp tfm even if only
    zlib-deflate-scomp is present. This ignores the compression levels
    configuration for zlib.
  * There is no way to configure a compression level for acomp(zlib-deflate).
    This is hardcoded in the acomp algorithm implementation/provider.

[1] https://lore.kernel.org/all/1467083180-111750-1-git-send-email-weigang.li@intel.com/  
[2] https://lore.kernel.org/all/ZO8ULhlJSrJ0Mcsx@gondor.apana.org.au/

Giovanni Cabiddu (5):
  Revert "crypto: testmgr - Remove zlib-deflate"
  Revert "crypto: deflate - Remove zlib-deflate"
  Revert "crypto: qat - Remove zlib-deflate"
  Revert "crypto: qat - remove unused macros in qat_comp_alg.c"
  crypto: qat - change compressor settings for QAT GEN4

Weigang Li (1):
  btrfs: zlib: add support for zlib-deflate through acomp

 crypto/deflate.c                              |  61 +++--
 crypto/testmgr.c                              |  10 +
 crypto/testmgr.h                              |  75 ++++++
 .../crypto/intel/qat/qat_common/adf_gen4_dc.c |   4 +-
 .../intel/qat/qat_common/qat_comp_algs.c      | 138 ++++++++++-
 fs/btrfs/zlib.c                               | 216 ++++++++++++++++++
 6 files changed, 484 insertions(+), 20 deletions(-)

base-commit: ed265f7fd9a635d77c8022fc6d9a1b735dd4dfd7
-- 
2.44.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH 1/6] Revert "crypto: testmgr - Remove zlib-deflate"
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 2/6] Revert "crypto: deflate " Giovanni Cabiddu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

This reverts commit 30febae71c6182e0762dc7744737012b4f8e6a6d.
---
 crypto/testmgr.c | 10 +++++++
 crypto/testmgr.h | 75 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 00f5a6cf341a..ab904ab74bee 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -5733,6 +5733,16 @@ static const struct alg_test_desc alg_test_descs[] = {
 		.suite = {
 			.hash = __VECS(xxhash64_tv_template)
 		}
+	}, {
+		.alg = "zlib-deflate",
+		.test = alg_test_comp,
+		.fips_allowed = 1,
+		.suite = {
+			.comp = {
+				.comp = __VECS(zlib_deflate_comp_tv_template),
+				.decomp = __VECS(zlib_deflate_decomp_tv_template)
+			}
+		}
 	}, {
 		.alg = "zstd",
 		.test = alg_test_comp,
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index 5350cfd9d325..71d87a2fd842 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -34752,6 +34752,81 @@ static const struct comp_testvec deflate_decomp_tv_template[] = {
 	},
 };
 
+static const struct comp_testvec zlib_deflate_comp_tv_template[] = {
+	{
+		.inlen	= 70,
+		.outlen	= 44,
+		.input	= "Join us now and share the software "
+			"Join us now and share the software ",
+		.output	= "\x78\x5e\xf3\xca\xcf\xcc\x53\x28"
+			  "\x2d\x56\xc8\xcb\x2f\x57\x48\xcc"
+			  "\x4b\x51\x28\xce\x48\x2c\x4a\x55"
+			  "\x28\xc9\x48\x55\x28\xce\x4f\x2b"
+			  "\x29\x07\x71\xbc\x08\x2b\x01\x00"
+			  "\x7c\x65\x19\x3d",
+	}, {
+		.inlen	= 191,
+		.outlen	= 129,
+		.input	= "This document describes a compression method based on the DEFLATE"
+			"compression algorithm.  This document defines the application of "
+			"the DEFLATE algorithm to the IP Payload Compression Protocol.",
+		.output	= "\x78\x5e\x5d\xce\x41\x0a\xc3\x30"
+			  "\x0c\x04\xc0\xaf\xec\x0b\xf2\x87"
+			  "\xd2\xa6\x50\xe8\xc1\x07\x7f\x40"
+			  "\xb1\x95\x5a\x60\x5b\xc6\x56\x0f"
+			  "\xfd\x7d\x93\x1e\x42\xe8\x51\xec"
+			  "\xee\x20\x9f\x64\x20\x6a\x78\x17"
+			  "\xae\x86\xc8\x23\x74\x59\x78\x80"
+			  "\x10\xb4\xb4\xce\x63\x88\x56\x14"
+			  "\xb6\xa4\x11\x0b\x0d\x8e\xd8\x6e"
+			  "\x4b\x8c\xdb\x7c\x7f\x5e\xfc\x7c"
+			  "\xae\x51\x7e\x69\x17\x4b\x65\x02"
+			  "\xfc\x1f\xbc\x4a\xdd\xd8\x7d\x48"
+			  "\xad\x65\x09\x64\x3b\xac\xeb\xd9"
+			  "\xc2\x01\xc0\xf4\x17\x3c\x1c\x1c"
+			  "\x7d\xb2\x52\xc4\xf5\xf4\x8f\xeb"
+			  "\x6a\x1a\x34\x4f\x5f\x2e\x32\x45"
+			  "\x4e",
+	},
+};
+
+static const struct comp_testvec zlib_deflate_decomp_tv_template[] = {
+	{
+		.inlen	= 128,
+		.outlen	= 191,
+		.input	= "\x78\x9c\x5d\x8d\x31\x0e\xc2\x30"
+			  "\x10\x04\xbf\xb2\x2f\xc8\x1f\x10"
+			  "\x04\x09\x89\xc2\x85\x3f\x70\xb1"
+			  "\x2f\xf8\x24\xdb\x67\xd9\x47\xc1"
+			  "\xef\x49\x68\x12\x51\xae\x76\x67"
+			  "\xd6\x27\x19\x88\x1a\xde\x85\xab"
+			  "\x21\xf2\x08\x5d\x16\x1e\x20\x04"
+			  "\x2d\xad\xf3\x18\xa2\x15\x85\x2d"
+			  "\x69\xc4\x42\x83\x23\xb6\x6c\x89"
+			  "\x71\x9b\xef\xcf\x8b\x9f\xcf\x33"
+			  "\xca\x2f\xed\x62\xa9\x4c\x80\xff"
+			  "\x13\xaf\x52\x37\xed\x0e\x52\x6b"
+			  "\x59\x02\xd9\x4e\xe8\x7a\x76\x1d"
+			  "\x02\x98\xfe\x8a\x87\x83\xa3\x4f"
+			  "\x56\x8a\xb8\x9e\x8e\x5c\x57\xd3"
+			  "\xa0\x79\xfa\x02\x2e\x32\x45\x4e",
+		.output	= "This document describes a compression method based on the DEFLATE"
+			"compression algorithm.  This document defines the application of "
+			"the DEFLATE algorithm to the IP Payload Compression Protocol.",
+	}, {
+		.inlen	= 44,
+		.outlen	= 70,
+		.input	= "\x78\x9c\xf3\xca\xcf\xcc\x53\x28"
+			  "\x2d\x56\xc8\xcb\x2f\x57\x48\xcc"
+			  "\x4b\x51\x28\xce\x48\x2c\x4a\x55"
+			  "\x28\xc9\x48\x55\x28\xce\x4f\x2b"
+			  "\x29\x07\x71\xbc\x08\x2b\x01\x00"
+			  "\x7c\x65\x19\x3d",
+		.output	= "Join us now and share the software "
+			"Join us now and share the software ",
+	},
+};
+
 /*
  * LZO test vectors (null-terminated strings).
  */
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 2/6] Revert "crypto: deflate - Remove zlib-deflate"
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 1/6] Revert "crypto: testmgr - Remove zlib-deflate" Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 3/6] Revert "crypto: qat " Giovanni Cabiddu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

This reverts commit 62a465c25e99b9a98259a6b7f5bb759f5296d501.
---
 crypto/deflate.c | 61 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/crypto/deflate.c b/crypto/deflate.c
index 6e31e0db0e86..b2a46f6dc961 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -39,20 +39,24 @@ struct deflate_ctx {
 	struct z_stream_s decomp_stream;
 };
 
-static int deflate_comp_init(struct deflate_ctx *ctx)
+static int deflate_comp_init(struct deflate_ctx *ctx, int format)
 {
 	int ret = 0;
 	struct z_stream_s *stream = &ctx->comp_stream;
 
 	stream->workspace = vzalloc(zlib_deflate_workspacesize(
-				    -DEFLATE_DEF_WINBITS, MAX_MEM_LEVEL));
+				    MAX_WBITS, MAX_MEM_LEVEL));
 	if (!stream->workspace) {
 		ret = -ENOMEM;
 		goto out;
 	}
-	ret = zlib_deflateInit2(stream, DEFLATE_DEF_LEVEL, Z_DEFLATED,
-				-DEFLATE_DEF_WINBITS, DEFLATE_DEF_MEMLEVEL,
-				Z_DEFAULT_STRATEGY);
+	if (format)
+		ret = zlib_deflateInit(stream, 3);
+	else
+		ret = zlib_deflateInit2(stream, DEFLATE_DEF_LEVEL, Z_DEFLATED,
+					-DEFLATE_DEF_WINBITS,
+					DEFLATE_DEF_MEMLEVEL,
+					Z_DEFAULT_STRATEGY);
 	if (ret != Z_OK) {
 		ret = -EINVAL;
 		goto out_free;
@@ -64,7 +68,7 @@ static int deflate_comp_init(struct deflate_ctx *ctx)
 	goto out;
 }
 
-static int deflate_decomp_init(struct deflate_ctx *ctx)
+static int deflate_decomp_init(struct deflate_ctx *ctx, int format)
 {
 	int ret = 0;
 	struct z_stream_s *stream = &ctx->decomp_stream;
@@ -74,7 +78,10 @@ static int deflate_decomp_init(struct deflate_ctx *ctx)
 		ret = -ENOMEM;
 		goto out;
 	}
-	ret = zlib_inflateInit2(stream, -DEFLATE_DEF_WINBITS);
+	if (format)
+		ret = zlib_inflateInit(stream);
+	else
+		ret = zlib_inflateInit2(stream, -DEFLATE_DEF_WINBITS);
 	if (ret != Z_OK) {
 		ret = -EINVAL;
 		goto out_free;
@@ -98,21 +105,21 @@ static void deflate_decomp_exit(struct deflate_ctx *ctx)
 	vfree(ctx->decomp_stream.workspace);
 }
 
-static int __deflate_init(void *ctx)
+static int __deflate_init(void *ctx, int format)
 {
 	int ret;
 
-	ret = deflate_comp_init(ctx);
+	ret = deflate_comp_init(ctx, format);
 	if (ret)
 		goto out;
-	ret = deflate_decomp_init(ctx);
+	ret = deflate_decomp_init(ctx, format);
 	if (ret)
 		deflate_comp_exit(ctx);
 out:
 	return ret;
 }
 
-static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
+static void *gen_deflate_alloc_ctx(struct crypto_scomp *tfm, int format)
 {
 	struct deflate_ctx *ctx;
 	int ret;
@@ -121,7 +128,7 @@ static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
 	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
-	ret = __deflate_init(ctx);
+	ret = __deflate_init(ctx, format);
 	if (ret) {
 		kfree(ctx);
 		return ERR_PTR(ret);
@@ -130,11 +137,21 @@ static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
 	return ctx;
 }
 
+static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
+{
+	return gen_deflate_alloc_ctx(tfm, 0);
+}
+
+static void *zlib_deflate_alloc_ctx(struct crypto_scomp *tfm)
+{
+	return gen_deflate_alloc_ctx(tfm, 1);
+}
+
 static int deflate_init(struct crypto_tfm *tfm)
 {
 	struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	return __deflate_init(ctx);
+	return __deflate_init(ctx, 0);
 }
 
 static void __deflate_exit(void *ctx)
@@ -269,7 +286,7 @@ static struct crypto_alg alg = {
 	.coa_decompress  	= deflate_decompress } }
 };
 
-static struct scomp_alg scomp = {
+static struct scomp_alg scomp[] = { {
 	.alloc_ctx		= deflate_alloc_ctx,
 	.free_ctx		= deflate_free_ctx,
 	.compress		= deflate_scompress,
@@ -279,7 +296,17 @@ static struct scomp_alg scomp = {
 		.cra_driver_name = "deflate-scomp",
 		.cra_module	 = THIS_MODULE,
 	}
-};
+}, {
+	.alloc_ctx		= zlib_deflate_alloc_ctx,
+	.free_ctx		= deflate_free_ctx,
+	.compress		= deflate_scompress,
+	.decompress		= deflate_sdecompress,
+	.base			= {
+		.cra_name	= "zlib-deflate",
+		.cra_driver_name = "zlib-deflate-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
+} };
 
 static int __init deflate_mod_init(void)
 {
@@ -289,7 +316,7 @@ static int __init deflate_mod_init(void)
 	if (ret)
 		return ret;
 
-	ret = crypto_register_scomp(&scomp);
+	ret = crypto_register_scomps(scomp, ARRAY_SIZE(scomp));
 	if (ret) {
 		crypto_unregister_alg(&alg);
 		return ret;
@@ -301,7 +328,7 @@ static int __init deflate_mod_init(void)
 static void __exit deflate_mod_fini(void)
 {
 	crypto_unregister_alg(&alg);
-	crypto_unregister_scomp(&scomp);
+	crypto_unregister_scomps(scomp, ARRAY_SIZE(scomp));
 }
 
 subsys_initcall(deflate_mod_init);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 3/6] Revert "crypto: qat - Remove zlib-deflate"
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 1/6] Revert "crypto: testmgr - Remove zlib-deflate" Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 2/6] Revert "crypto: deflate " Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 4/6] Revert "crypto: qat - remove unused macros in qat_comp_alg.c" Giovanni Cabiddu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

This reverts commit e9dd20e0e5f62d01d9404db2cf9824d1faebcf71.
---
 .../intel/qat/qat_common/qat_comp_algs.c      | 129 +++++++++++++++++-
 1 file changed, 128 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c b/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
index 2ba4aa22e092..79de04cfa012 100644
--- a/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
+++ b/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
@@ -100,6 +100,69 @@ static void qat_comp_resubmit(struct work_struct *work)
 	acomp_request_complete(areq, ret);
 }
 
+static int parse_zlib_header(u16 zlib_h)
+{
+	int ret = -EINVAL;
+	__be16 header;
+	u8 *header_p;
+	u8 cmf, flg;
+
+	header = cpu_to_be16(zlib_h);
+	header_p = (u8 *)&header;
+
+	flg = header_p[0];
+	cmf = header_p[1];
+
+	if (cmf >> QAT_RFC_1950_CM_OFFSET > QAT_RFC_1950_CM_DEFLATE_CINFO_32K)
+		return ret;
+
+	if ((cmf & QAT_RFC_1950_CM_MASK) != QAT_RFC_1950_CM_DEFLATE)
+		return ret;
+
+	if (flg & QAT_RFC_1950_DICT_MASK)
+		return ret;
+
+	return 0;
+}
+
+static int qat_comp_rfc1950_callback(struct qat_compression_req *qat_req,
+				     void *resp)
+{
+	struct acomp_req *areq = qat_req->acompress_req;
+	enum direction dir = qat_req->dir;
+	__be32 qat_produced_adler;
+
+	qat_produced_adler = cpu_to_be32(qat_comp_get_produced_adler32(resp));
+
+	if (dir == COMPRESSION) {
+		__be16 zlib_header;
+
+		zlib_header = cpu_to_be16(QAT_RFC_1950_COMP_HDR);
+		scatterwalk_map_and_copy(&zlib_header, areq->dst, 0, QAT_RFC_1950_HDR_SIZE, 1);
+		areq->dlen += QAT_RFC_1950_HDR_SIZE;
+
+		scatterwalk_map_and_copy(&qat_produced_adler, areq->dst, areq->dlen,
+					 QAT_RFC_1950_FOOTER_SIZE, 1);
+		areq->dlen += QAT_RFC_1950_FOOTER_SIZE;
+	} else {
+		__be32 decomp_adler;
+		int footer_offset;
+		int consumed;
+
+		consumed = qat_comp_get_consumed_ctr(resp);
+		footer_offset = consumed + QAT_RFC_1950_HDR_SIZE;
+		if (footer_offset + QAT_RFC_1950_FOOTER_SIZE > areq->slen)
+			return -EBADMSG;
+
+		scatterwalk_map_and_copy(&decomp_adler, areq->src, footer_offset,
+					 QAT_RFC_1950_FOOTER_SIZE, 0);
+
+		if (qat_produced_adler != decomp_adler)
+			return -EBADMSG;
+	}
+	return 0;
+}
+
 static void qat_comp_generic_callback(struct qat_compression_req *qat_req,
 				      void *resp)
 {
@@ -221,6 +284,18 @@ static void qat_comp_alg_exit_tfm(struct crypto_acomp *acomp_tfm)
 	memset(ctx, 0, sizeof(*ctx));
 }
 
+static int qat_comp_alg_rfc1950_init_tfm(struct crypto_acomp *acomp_tfm)
+{
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
+	struct qat_compression_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret;
+
+	ret = qat_comp_alg_init_tfm(acomp_tfm);
+	ctx->qat_comp_callback = &qat_comp_rfc1950_callback;
+
+	return ret;
+}
+
 static int qat_comp_alg_compress_decompress(struct acomp_req *areq, enum direction dir,
 					    unsigned int shdr, unsigned int sftr,
 					    unsigned int dhdr, unsigned int dftr)
@@ -316,6 +391,43 @@ static int qat_comp_alg_decompress(struct acomp_req *req)
 	return qat_comp_alg_compress_decompress(req, DECOMPRESSION, 0, 0, 0, 0);
 }
 
+static int qat_comp_alg_rfc1950_compress(struct acomp_req *req)
+{
+	if (!req->dst && req->dlen != 0)
+		return -EINVAL;
+
+	if (req->dst && req->dlen <= QAT_RFC_1950_HDR_SIZE + QAT_RFC_1950_FOOTER_SIZE)
+		return -EINVAL;
+
+	return qat_comp_alg_compress_decompress(req, COMPRESSION, 0, 0,
+						QAT_RFC_1950_HDR_SIZE,
+						QAT_RFC_1950_FOOTER_SIZE);
+}
+
+static int qat_comp_alg_rfc1950_decompress(struct acomp_req *req)
+{
+	struct crypto_acomp *acomp_tfm = crypto_acomp_reqtfm(req);
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp_tfm);
+	struct qat_compression_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct adf_accel_dev *accel_dev = ctx->inst->accel_dev;
+	u16 zlib_header;
+	int ret;
+
+	if (req->slen <= QAT_RFC_1950_HDR_SIZE + QAT_RFC_1950_FOOTER_SIZE)
+		return -EBADMSG;
+
+	scatterwalk_map_and_copy(&zlib_header, req->src, 0, QAT_RFC_1950_HDR_SIZE, 0);
+
+	ret = parse_zlib_header(zlib_header);
+	if (ret) {
+		dev_dbg(&GET_DEV(accel_dev), "Error parsing zlib header\n");
+		return ret;
+	}
+
+	return qat_comp_alg_compress_decompress(req, DECOMPRESSION, QAT_RFC_1950_HDR_SIZE,
+						QAT_RFC_1950_FOOTER_SIZE, 0, 0);
+}
+
 static struct acomp_alg qat_acomp[] = { {
 	.base = {
 		.cra_name = "deflate",
@@ -331,7 +443,22 @@ static struct acomp_alg qat_acomp[] = { {
 	.decompress = qat_comp_alg_decompress,
 	.dst_free = sgl_free,
 	.reqsize = sizeof(struct qat_compression_req),
-}};
+}, {
+	.base = {
+		.cra_name = "zlib-deflate",
+		.cra_driver_name = "qat_zlib_deflate",
+		.cra_priority = 4001,
+		.cra_flags = CRYPTO_ALG_ASYNC,
+		.cra_ctxsize = sizeof(struct qat_compression_ctx),
+		.cra_module = THIS_MODULE,
+	},
+	.init = qat_comp_alg_rfc1950_init_tfm,
+	.exit = qat_comp_alg_exit_tfm,
+	.compress = qat_comp_alg_rfc1950_compress,
+	.decompress = qat_comp_alg_rfc1950_decompress,
+	.dst_free = sgl_free,
+	.reqsize = sizeof(struct qat_compression_req),
+} };
 
 int qat_comp_algs_register(void)
 {
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 4/6] Revert "crypto: qat - remove unused macros in qat_comp_alg.c"
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
                   ` (2 preceding siblings ...)
  2024-04-26 10:54 ` [RFC PATCH 3/6] Revert "crypto: qat " Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 5/6] crypto: qat - change compressor settings for QAT GEN4 Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp Giovanni Cabiddu
  5 siblings, 0 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

This reverts commit b4bf8295892924fca60d0704ac7cbc3b5897d233.
---
 drivers/crypto/intel/qat/qat_common/qat_comp_algs.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c b/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
index 79de04cfa012..b533984906ec 100644
--- a/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
+++ b/drivers/crypto/intel/qat/qat_common/qat_comp_algs.c
@@ -13,6 +13,15 @@
 #include "qat_compression.h"
 #include "qat_algs_send.h"
 
+#define QAT_RFC_1950_HDR_SIZE 2
+#define QAT_RFC_1950_FOOTER_SIZE 4
+#define QAT_RFC_1950_CM_DEFLATE 8
+#define QAT_RFC_1950_CM_DEFLATE_CINFO_32K 7
+#define QAT_RFC_1950_CM_MASK 0x0f
+#define QAT_RFC_1950_CM_OFFSET 4
+#define QAT_RFC_1950_DICT_MASK 0x20
+#define QAT_RFC_1950_COMP_HDR 0x785e
+
 static DEFINE_MUTEX(algs_lock);
 static unsigned int active_devs;
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 5/6] crypto: qat - change compressor settings for QAT GEN4
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
                   ` (3 preceding siblings ...)
  2024-04-26 10:54 ` [RFC PATCH 4/6] Revert "crypto: qat - remove unused macros in qat_comp_alg.c" Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-26 10:54 ` [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp Giovanni Cabiddu
  5 siblings, 0 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li, Giovanni Cabiddu

Enable dynamic compression by default for QAT GEN4 and change
compression level to 9.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/intel/qat/qat_common/adf_gen4_dc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/intel/qat/qat_common/adf_gen4_dc.c b/drivers/crypto/intel/qat/qat_common/adf_gen4_dc.c
index 5859238e37de..34f418b88738 100644
--- a/drivers/crypto/intel/qat/qat_common/adf_gen4_dc.c
+++ b/drivers/crypto/intel/qat/qat_common/adf_gen4_dc.c
@@ -22,7 +22,7 @@ static void qat_comp_build_deflate(void *ctx)
 	header->hdr_flags =
 		ICP_QAT_FW_COMN_HDR_FLAGS_BUILD(ICP_QAT_FW_COMN_REQ_FLAG_SET);
 	header->service_type = ICP_QAT_FW_COMN_REQ_CPM_FW_COMP;
-	header->service_cmd_id = ICP_QAT_FW_COMP_CMD_STATIC;
+	header->service_cmd_id = ICP_QAT_FW_COMP_CMD_DYNAMIC;
 	header->comn_req_flags =
 		ICP_QAT_FW_COMN_FLAGS_BUILD(QAT_COMN_CD_FLD_TYPE_16BYTE_DATA,
 					    QAT_COMN_PTR_TYPE_SGL);
@@ -35,7 +35,7 @@ static void qat_comp_build_deflate(void *ctx)
 	hw_comp_lower_csr.skip_ctrl = ICP_QAT_HW_COMP_20_BYTE_SKIP_3BYTE_LITERAL;
 	hw_comp_lower_csr.algo = ICP_QAT_HW_COMP_20_HW_COMP_FORMAT_ILZ77;
 	hw_comp_lower_csr.lllbd = ICP_QAT_HW_COMP_20_LLLBD_CTRL_LLLBD_ENABLED;
-	hw_comp_lower_csr.sd = ICP_QAT_HW_COMP_20_SEARCH_DEPTH_LEVEL_1;
+	hw_comp_lower_csr.sd = ICP_QAT_HW_COMP_20_SEARCH_DEPTH_LEVEL_9;
 	hw_comp_lower_csr.hash_update = ICP_QAT_HW_COMP_20_SKIP_HASH_UPDATE_DONT_ALLOW;
 	hw_comp_lower_csr.edmm = ICP_QAT_HW_COMP_20_EXTENDED_DELAY_MATCH_MODE_EDMM_ENABLED;
 	hw_comp_upper_csr.nice = ICP_QAT_HW_COMP_20_CONFIG_CSR_NICE_PARAM_DEFAULT_VAL;
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
                   ` (4 preceding siblings ...)
  2024-04-26 10:54 ` [RFC PATCH 5/6] crypto: qat - change compressor settings for QAT GEN4 Giovanni Cabiddu
@ 2024-04-26 10:54 ` Giovanni Cabiddu
  2024-04-29 13:56   ` Josef Bacik
  2024-04-29 15:57   ` David Sterba
  5 siblings, 2 replies; 24+ messages in thread
From: Giovanni Cabiddu @ 2024-04-26 10:54 UTC (permalink / raw)
  To: clm, josef, dsterba, herbert
  Cc: linux-btrfs, linux-crypto, qat-linux, embg, cyan, brian.will,
	weigang.li

From: Weigang Li <weigang.li@intel.com>

Add support for zlib compression and decompression through the acomp
APIs.
Input pages are added to an sg-list and sent to acomp in one request.
Since acomp is asynchronous, the thread is put to sleep and then the CPU
is freed up. Once compression is done, the acomp callback is triggered
and the thread is woke up.

This patch doesn't change the BTRFS disk format, this means that files
compressed by hardware engines can be de-compressed by the zlib software
library, and vice versa.

Limitations:
  * The implementation tries always to use an acomp even if only
    zlib-deflate-scomp is present
  * Acomp does not provide a way to support compression levels
  * Acomp is an asynchronous API but used here synchronously

Signed-off-by: Weigang Li <weigang.li@intel.com>
---
 fs/btrfs/zlib.c | 216 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 216 insertions(+)

diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c
index e5b3f2003896..b5bbb8c97244 100644
--- a/fs/btrfs/zlib.c
+++ b/fs/btrfs/zlib.c
@@ -18,6 +18,8 @@
 #include <linux/pagemap.h>
 #include <linux/bio.h>
 #include <linux/refcount.h>
+#include <crypto/acompress.h>
+#include <linux/scatterlist.h>
 #include "compression.h"
 
 /* workspace buffer size for s390 zlib hardware support */
@@ -33,6 +35,201 @@ struct workspace {
 
 static struct workspace_manager wsm;
 
+static int acomp_comp_pages(struct address_space *mapping, u64 start,
+			    unsigned long len, struct page **pages,
+			    unsigned long *out_pages,
+			    unsigned long *total_in,
+			    unsigned long *total_out)
+{
+	unsigned int nr_src_pages = 0, nr_dst_pages = 0, nr_pages = 0;
+	struct sg_table in_sg = { 0 }, out_sg = { 0 };
+	struct page *in_page, *out_page, **in_pages;
+	struct crypto_acomp *tfm = NULL;
+	struct acomp_req *req = NULL;
+	struct crypto_wait wait;
+	int ret, i;
+
+	nr_src_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	in_pages = kcalloc(nr_src_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!in_pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < nr_src_pages; i++) {
+		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
+		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (!in_page || !out_page) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		in_pages[i] = in_page;
+		pages[i] = out_page;
+		nr_dst_pages += 1;
+		start += PAGE_SIZE;
+	}
+
+	ret = sg_alloc_table_from_pages(&in_sg, in_pages, nr_src_pages, 0,
+					nr_src_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	ret = sg_alloc_table_from_pages(&out_sg, pages, nr_dst_pages, 0,
+					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	crypto_init_wait(&wait);
+	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
+	if (IS_ERR(tfm)) {
+		ret = PTR_ERR(tfm);
+		goto out;
+	}
+
+	req = acomp_request_alloc(tfm);
+	if (!req) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, len,
+				 nr_dst_pages << PAGE_SHIFT);
+	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, &wait);
+
+	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
+	if (ret)
+		goto out;
+
+	*total_in = len;
+	*total_out = req->dlen;
+	nr_pages = (*total_out + PAGE_SIZE - 1) >> PAGE_SHIFT;
+
+out:
+	sg_free_table(&in_sg);
+	sg_free_table(&out_sg);
+
+	if (in_pages) {
+		for (i = 0; i < nr_src_pages; i++)
+			put_page(in_pages[i]);
+		kfree(in_pages);
+	}
+
+	/* free un-used out pages */
+	for (i = nr_pages; i < nr_dst_pages; i++)
+		put_page(pages[i]);
+
+	if (req)
+		acomp_request_free(req);
+
+	if (tfm)
+		crypto_free_acomp(tfm);
+
+	*out_pages = nr_pages;
+
+	return ret;
+}
+
+static int acomp_zlib_decomp_bio(struct page **in_pages,
+				 struct compressed_bio *cb, size_t srclen,
+				 unsigned long total_pages_in)
+{
+	unsigned int nr_dst_pages = BTRFS_MAX_COMPRESSED_PAGES;
+	struct sg_table in_sg = { 0 }, out_sg = { 0 };
+	struct bio *orig_bio = &cb->orig_bbio->bio;
+	char *data_out = NULL, *bv_buf = NULL;
+	int copy_len = 0, bytes_left = 0;
+	struct crypto_acomp *tfm = NULL;
+	struct page **out_pages = NULL;
+	struct acomp_req *req = NULL;
+	struct crypto_wait wait;
+	struct bio_vec bvec;
+	int ret, i = 0;
+
+	ret = sg_alloc_table_from_pages(&in_sg, in_pages, total_pages_in,
+					0, srclen, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	out_pages = kcalloc(nr_dst_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!out_pages) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	for (i = 0; i < nr_dst_pages; i++) {
+		out_pages[i] = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
+		if (!out_pages[i]) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
+	ret = sg_alloc_table_from_pages(&out_sg, out_pages, nr_dst_pages, 0,
+					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
+	if (ret)
+		goto out;
+
+	crypto_init_wait(&wait);
+	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
+	if (IS_ERR(tfm)) {
+		ret = PTR_ERR(tfm);
+		goto out;
+	}
+
+	req = acomp_request_alloc(tfm);
+	if (!req) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, srclen,
+				 nr_dst_pages << PAGE_SHIFT);
+	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+				   crypto_req_done, &wait);
+
+	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
+	if (ret)
+		goto out;
+
+	/* Copy decompressed buffer to bio pages */
+	bytes_left = req->dlen;
+	for (i = 0; i < nr_dst_pages; i++) {
+		copy_len = bytes_left > PAGE_SIZE ? PAGE_SIZE : bytes_left;
+		data_out = kmap_local_page(out_pages[i]);
+
+		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
+		bv_buf = kmap_local_page(bvec.bv_page);
+		memcpy(bv_buf, data_out, copy_len);
+		kunmap_local(bv_buf);
+
+		bio_advance(orig_bio, copy_len);
+		if (!orig_bio->bi_iter.bi_size)
+			break;
+		bytes_left -= copy_len;
+		if (bytes_left <= 0)
+			break;
+	}
+out:
+	sg_free_table(&in_sg);
+	sg_free_table(&out_sg);
+
+	if (out_pages) {
+		for (i = 0; i < nr_dst_pages; i++) {
+			if (out_pages[i])
+				put_page(out_pages[i]);
+		}
+		kfree(out_pages);
+	}
+
+	if (req)
+		acomp_request_free(req);
+	if (tfm)
+		crypto_free_acomp(tfm);
+
+	return ret;
+}
+
 struct list_head *zlib_get_workspace(unsigned int level)
 {
 	struct list_head *ws = btrfs_get_workspace(BTRFS_COMPRESS_ZLIB, level);
@@ -108,6 +305,15 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
 	unsigned long nr_dest_pages = *out_pages;
 	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
 
+	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
+		ret = acomp_comp_pages(mapping, start, len, pages, out_pages,
+				       total_in, total_out);
+		if (!ret)
+			return ret;
+
+		pr_warn("BTRFS: acomp compression failed: ret = %d\n", ret);
+		/* Fallback to SW implementation if HW compression failed */
+	}
 	*out_pages = 0;
 	*total_out = 0;
 	*total_in = 0;
@@ -281,6 +487,16 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
 	unsigned long buf_start;
 	struct page **pages_in = cb->compressed_pages;
 
+	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
+		ret = acomp_zlib_decomp_bio(pages_in, cb, srclen,
+					    total_pages_in);
+		if (!ret)
+			return ret;
+
+		pr_warn("BTRFS: acomp decompression failed, ret=%d\n", ret);
+		/* Fallback to SW implementation if HW decompression failed */
+	}
+
 	data_in = kmap_local_page(pages_in[page_in_index]);
 	workspace->strm.next_in = data_in;
 	workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-26 10:54 ` [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp Giovanni Cabiddu
@ 2024-04-29 13:56   ` Josef Bacik
  2024-04-29 15:21     ` Cabiddu, Giovanni
  2024-04-29 15:41     ` David Sterba
  2024-04-29 15:57   ` David Sterba
  1 sibling, 2 replies; 24+ messages in thread
From: Josef Bacik @ 2024-04-29 13:56 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: clm, dsterba, herbert, linux-btrfs, linux-crypto, qat-linux, embg,
	cyan, brian.will, weigang.li

On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> From: Weigang Li <weigang.li@intel.com>
> 
> Add support for zlib compression and decompression through the acomp
> APIs.
> Input pages are added to an sg-list and sent to acomp in one request.
> Since acomp is asynchronous, the thread is put to sleep and then the CPU
> is freed up. Once compression is done, the acomp callback is triggered
> and the thread is woke up.
> 
> This patch doesn't change the BTRFS disk format, this means that files
> compressed by hardware engines can be de-compressed by the zlib software
> library, and vice versa.
> 
> Limitations:
>   * The implementation tries always to use an acomp even if only
>     zlib-deflate-scomp is present
>   * Acomp does not provide a way to support compression levels

That's a non-starter.  We can't just lie to the user about the compression level
that is being used.  If the user just does "-o compress=zlib" then you need to
update btrfs_compress_set_level() to figure out the compression level that acomp
is going to use and set that appropriately, so we can report to the user what is
actually being used.

Additionally if a user specifies a compression level you need to make sure we
don't do acomp if it doesn't match what acomp is going to do.

Finally, for the normal code review, there's a bunch of things that need to be
fixed up before I take a closer look

- We don't use pr_(), we have btrfs specific printk helpers, please use those.
- We do 1 variable per line, fix up the variable declarations in your functions.

Thanks,

Josef

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-29 13:56   ` Josef Bacik
@ 2024-04-29 15:21     ` Cabiddu, Giovanni
  2024-04-29 15:44       ` David Sterba
  2024-05-03 10:04       ` Herbert Xu
  2024-04-29 15:41     ` David Sterba
  1 sibling, 2 replies; 24+ messages in thread
From: Cabiddu, Giovanni @ 2024-04-29 15:21 UTC (permalink / raw)
  To: Josef Bacik, herbert
  Cc: clm, dsterba, linux-btrfs, linux-crypto, qat-linux, embg, cyan,
	brian.will, weigang.li

On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > From: Weigang Li <weigang.li@intel.com>
> > 
> > Add support for zlib compression and decompression through the acomp
> > APIs.
> > Input pages are added to an sg-list and sent to acomp in one request.
> > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > is freed up. Once compression is done, the acomp callback is triggered
> > and the thread is woke up.
> > 
> > This patch doesn't change the BTRFS disk format, this means that files
> > compressed by hardware engines can be de-compressed by the zlib software
> > library, and vice versa.
> > 
> > Limitations:
> >   * The implementation tries always to use an acomp even if only
> >     zlib-deflate-scomp is present
> >   * Acomp does not provide a way to support compression levels
> 
> That's a non-starter.  We can't just lie to the user about the compression level
> that is being used.  If the user just does "-o compress=zlib" then you need to
> update btrfs_compress_set_level() to figure out the compression level that acomp
> is going to use and set that appropriately, so we can report to the user what is
> actually being used.
> 
> Additionally if a user specifies a compression level you need to make sure we
> don't do acomp if it doesn't match what acomp is going to do.
Thanks for the feedback. We should then extend the acomp API to take the
compression level.
@Herbert, do you have any objection if we add the compression level to
the acomp tfm and we add an API to set it? Example:

    tfm = crypto_alloc_acomp("deflate", 0, 0);
    acomp_set_level(tfm, compression_level);

> Finally, for the normal code review, there's a bunch of things that need to be
> fixed up before I take a closer look
> 
> - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> - We do 1 variable per line, fix up the variable declarations in your functions.
I see that the code in fs/btrfs/zlib.c uses both pr_() and more than one
variable per line. If we change it, will mixed style be a concern?

-- 
Giovanni

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-29 13:56   ` Josef Bacik
  2024-04-29 15:21     ` Cabiddu, Giovanni
@ 2024-04-29 15:41     ` David Sterba
  2025-05-06 15:38       ` Cabiddu, Giovanni
  1 sibling, 1 reply; 24+ messages in thread
From: David Sterba @ 2024-04-29 15:41 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Giovanni Cabiddu, clm, dsterba, herbert, linux-btrfs,
	linux-crypto, qat-linux, embg, cyan, brian.will, weigang.li

On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > From: Weigang Li <weigang.li@intel.com>
> > 
> > Add support for zlib compression and decompression through the acomp
> > APIs.
> > Input pages are added to an sg-list and sent to acomp in one request.
> > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > is freed up. Once compression is done, the acomp callback is triggered
> > and the thread is woke up.
> > 
> > This patch doesn't change the BTRFS disk format, this means that files
> > compressed by hardware engines can be de-compressed by the zlib software
> > library, and vice versa.
> > 
> > Limitations:
> >   * The implementation tries always to use an acomp even if only
> >     zlib-deflate-scomp is present
> >   * Acomp does not provide a way to support compression levels
> 
> That's a non-starter.  We can't just lie to the user about the compression level
> that is being used.  If the user just does "-o compress=zlib" then you need to
> update btrfs_compress_set_level() to figure out the compression level that acomp
> is going to use and set that appropriately, so we can report to the user what is
> actually being used.
> 
> Additionally if a user specifies a compression level you need to make sure we
> don't do acomp if it doesn't match what acomp is going to do.
> 
> Finally, for the normal code review, there's a bunch of things that need to be
> fixed up before I take a closer look
> 
> - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> - We do 1 variable per line, fix up the variable declarations in your functions.

I'd skip the style and implementation details for now. The absence of
compression level support seems like the biggest problem, also in
combination with uncondtional use of the acomp interface. We'd have to
enhance the compression format specifier to make it configurable in the
sense: if accelerator is available use it, otherwise do CPU and
synchronous compression.

On the other hand, the compression levels are to trade off time and
space. If the QAT implementation with zlib level 9 is always better than
CPU compression then it's not that bad, not counting the possibly
misleading level to the users.

If QAT can also support ZSTD I'm not sure that lack of levels can work
there though, the memory overhead is bigger and it's a more complex
algorithm. Extending the acomp API with levels would be necessary.

Regarding the implementation, there are many allocations that set up the
async request. This is problematic as the compression is at the end of
the IO path and potentially called after memory pressure. We still do
some allocations there but also try not to fail due to ENOMEM, each
allocation is a new failure point. Anything that could be reused should
be in the workspace memory.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-29 15:21     ` Cabiddu, Giovanni
@ 2024-04-29 15:44       ` David Sterba
  2024-05-03 10:04       ` Herbert Xu
  1 sibling, 0 replies; 24+ messages in thread
From: David Sterba @ 2024-04-29 15:44 UTC (permalink / raw)
  To: Cabiddu, Giovanni
  Cc: Josef Bacik, herbert, clm, dsterba, linux-btrfs, linux-crypto,
	qat-linux, embg, cyan, brian.will, weigang.li

On Mon, Apr 29, 2024 at 04:21:46PM +0100, Cabiddu, Giovanni wrote:
> On Mon, Apr 29, 2024 at 09:56:45AM -0400, Josef Bacik wrote:
> > On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> > > From: Weigang Li <weigang.li@intel.com>
> > > 
> > > Add support for zlib compression and decompression through the acomp
> > > APIs.
> > > Input pages are added to an sg-list and sent to acomp in one request.
> > > Since acomp is asynchronous, the thread is put to sleep and then the CPU
> > > is freed up. Once compression is done, the acomp callback is triggered
> > > and the thread is woke up.
> > > 
> > > This patch doesn't change the BTRFS disk format, this means that files
> > > compressed by hardware engines can be de-compressed by the zlib software
> > > library, and vice versa.
> > > 
> > > Limitations:
> > >   * The implementation tries always to use an acomp even if only
> > >     zlib-deflate-scomp is present
> > >   * Acomp does not provide a way to support compression levels
> > 
> > That's a non-starter.  We can't just lie to the user about the compression level
> > that is being used.  If the user just does "-o compress=zlib" then you need to
> > update btrfs_compress_set_level() to figure out the compression level that acomp
> > is going to use and set that appropriately, so we can report to the user what is
> > actually being used.
> > 
> > Additionally if a user specifies a compression level you need to make sure we
> > don't do acomp if it doesn't match what acomp is going to do.
> Thanks for the feedback. We should then extend the acomp API to take the
> compression level.
> @Herbert, do you have any objection if we add the compression level to
> the acomp tfm and we add an API to set it? Example:
> 
>     tfm = crypto_alloc_acomp("deflate", 0, 0);
>     acomp_set_level(tfm, compression_level);
> 
> > Finally, for the normal code review, there's a bunch of things that need to be
> > fixed up before I take a closer look
> > 
> > - We don't use pr_(), we have btrfs specific printk helpers, please use those.
> > - We do 1 variable per line, fix up the variable declarations in your functions.
> I see that the code in fs/btrfs/zlib.c uses both pr_() and more than one
> variable per line. If we change it, will mixed style be a concern?

I have a work in progress to rework the messages in compression. The
messages with pr_() helpers are there for historical reasons and using
proper btrfs_info/... need extracting structures like fs_info from
various data. Josef's comment is valid but you can skip that for the QAT
series.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-26 10:54 ` [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp Giovanni Cabiddu
  2024-04-29 13:56   ` Josef Bacik
@ 2024-04-29 15:57   ` David Sterba
  1 sibling, 0 replies; 24+ messages in thread
From: David Sterba @ 2024-04-29 15:57 UTC (permalink / raw)
  To: Giovanni Cabiddu
  Cc: clm, josef, dsterba, herbert, linux-btrfs, linux-crypto,
	qat-linux, embg, cyan, brian.will, weigang.li

On Fri, Apr 26, 2024 at 11:54:29AM +0100, Giovanni Cabiddu wrote:
> From: Weigang Li <weigang.li@intel.com>
> +static int acomp_comp_pages(struct address_space *mapping, u64 start,
> +			    unsigned long len, struct page **pages,
> +			    unsigned long *out_pages,
> +			    unsigned long *total_in,
> +			    unsigned long *total_out)
> +{
> +	unsigned int nr_src_pages = 0, nr_dst_pages = 0, nr_pages = 0;
> +	struct sg_table in_sg = { 0 }, out_sg = { 0 };
> +	struct page *in_page, *out_page, **in_pages;
> +	struct crypto_acomp *tfm = NULL;
> +	struct acomp_req *req = NULL;
> +	struct crypto_wait wait;
> +	int ret, i;
> +
> +	nr_src_pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +	in_pages = kcalloc(nr_src_pages, sizeof(struct page *), GFP_KERNEL);

The maximum length is bounded so you could store the in_pages array in
zlib's workspace.

> +	if (!in_pages) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < nr_src_pages; i++) {
> +		in_page = find_get_page(mapping, start >> PAGE_SHIFT);
> +		out_page = alloc_page(GFP_NOFS | __GFP_HIGHMEM);

Output pages should be newly allocated by btrfs_alloc_compr_folio()

> +		if (!in_page || !out_page) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +		in_pages[i] = in_page;
> +		pages[i] = out_page;
> +		nr_dst_pages += 1;
> +		start += PAGE_SIZE;
> +	}
> +
> +	ret = sg_alloc_table_from_pages(&in_sg, in_pages, nr_src_pages, 0,
> +					nr_src_pages << PAGE_SHIFT, GFP_KERNEL);

I'm not sure if the sg interface allows to use an existing buffer but
the input parameters are bounded in size and count so the allocation
should be dropped and replaced by workspace data.

> +	if (ret)
> +		goto out;
> +
> +	ret = sg_alloc_table_from_pages(&out_sg, pages, nr_dst_pages, 0,
> +					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
> +	if (ret)
> +		goto out;
> +
> +	crypto_init_wait(&wait);
> +	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);

AFAIK the TFM should be allocated only once way before any IO is done
and then reused, this can trigger resolving the best implementation or
maybe even module loading.

> +	if (IS_ERR(tfm)) {
> +		ret = PTR_ERR(tfm);
> +		goto out;
> +	}
> +
> +	req = acomp_request_alloc(tfm);

The request should be in workspace, the only initialization I see
setting the right ->tfm pointer.

> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, len,
> +				 nr_dst_pages << PAGE_SHIFT);
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &wait);
> +
> +	ret = crypto_wait_req(crypto_acomp_compress(req), &wait);
> +	if (ret)
> +		goto out;
> +
> +	*total_in = len;
> +	*total_out = req->dlen;
> +	nr_pages = (*total_out + PAGE_SIZE - 1) >> PAGE_SHIFT;
> +
> +out:
> +	sg_free_table(&in_sg);
> +	sg_free_table(&out_sg);
> +
> +	if (in_pages) {
> +		for (i = 0; i < nr_src_pages; i++)
> +			put_page(in_pages[i]);
> +		kfree(in_pages);

Pages returned back to the pool by btrfs_free_compr_folio()

> +	}
> +
> +	/* free un-used out pages */
> +	for (i = nr_pages; i < nr_dst_pages; i++)
> +		put_page(pages[i]);
> +
> +	if (req)
> +		acomp_request_free(req);
> +
> +	if (tfm)
> +		crypto_free_acomp(tfm);
> +
> +	*out_pages = nr_pages;
> +
> +	return ret;
> +}
> +
> +static int acomp_zlib_decomp_bio(struct page **in_pages,
> +				 struct compressed_bio *cb, size_t srclen,
> +				 unsigned long total_pages_in)
> +{
> +	unsigned int nr_dst_pages = BTRFS_MAX_COMPRESSED_PAGES;
> +	struct sg_table in_sg = { 0 }, out_sg = { 0 };
> +	struct bio *orig_bio = &cb->orig_bbio->bio;
> +	char *data_out = NULL, *bv_buf = NULL;
> +	int copy_len = 0, bytes_left = 0;
> +	struct crypto_acomp *tfm = NULL;
> +	struct page **out_pages = NULL;
> +	struct acomp_req *req = NULL;
> +	struct crypto_wait wait;
> +	struct bio_vec bvec;
> +	int ret, i = 0;
> +
> +	ret = sg_alloc_table_from_pages(&in_sg, in_pages, total_pages_in,
> +					0, srclen, GFP_KERNEL);

Any allocation here needs to be GFP_NOFS for now. Actually we'd need
memalloc_nofs_save/memalloc_nofs_restore around all compression and
decompression code that does not use GFP_NOFS directly and could call
other APIs that do GFP_KERNEL. Like crypto or sg.

> +	if (ret)
> +		goto out;
> +
> +	out_pages = kcalloc(nr_dst_pages, sizeof(struct page *), GFP_KERNEL);
> +	if (!out_pages) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	for (i = 0; i < nr_dst_pages; i++) {
> +		out_pages[i] = alloc_page(GFP_NOFS | __GFP_HIGHMEM);
> +		if (!out_pages[i]) {
> +			ret = -ENOMEM;
> +			goto out;
> +		}
> +	}
> +
> +	ret = sg_alloc_table_from_pages(&out_sg, out_pages, nr_dst_pages, 0,
> +					nr_dst_pages << PAGE_SHIFT, GFP_KERNEL);
> +	if (ret)
> +		goto out;
> +
> +	crypto_init_wait(&wait);
> +	tfm = crypto_alloc_acomp("zlib-deflate", 0, 0);
> +	if (IS_ERR(tfm)) {
> +		ret = PTR_ERR(tfm);
> +		goto out;
> +	}
> +
> +	req = acomp_request_alloc(tfm);
> +	if (!req) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
> +
> +	acomp_request_set_params(req, in_sg.sgl, out_sg.sgl, srclen,
> +				 nr_dst_pages << PAGE_SHIFT);
> +	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
> +				   crypto_req_done, &wait);
> +
> +	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);
> +	if (ret)
> +		goto out;
> +
> +	/* Copy decompressed buffer to bio pages */
> +	bytes_left = req->dlen;
> +	for (i = 0; i < nr_dst_pages; i++) {
> +		copy_len = bytes_left > PAGE_SIZE ? PAGE_SIZE : bytes_left;
> +		data_out = kmap_local_page(out_pages[i]);
> +
> +		bvec = bio_iter_iovec(orig_bio, orig_bio->bi_iter);
> +		bv_buf = kmap_local_page(bvec.bv_page);
> +		memcpy(bv_buf, data_out, copy_len);
> +		kunmap_local(bv_buf);
> +
> +		bio_advance(orig_bio, copy_len);
> +		if (!orig_bio->bi_iter.bi_size)
> +			break;
> +		bytes_left -= copy_len;
> +		if (bytes_left <= 0)
> +			break;
> +	}
> +out:
> +	sg_free_table(&in_sg);
> +	sg_free_table(&out_sg);
> +
> +	if (out_pages) {
> +		for (i = 0; i < nr_dst_pages; i++) {
> +			if (out_pages[i])
> +				put_page(out_pages[i]);
> +		}
> +		kfree(out_pages);
> +	}
> +
> +	if (req)
> +		acomp_request_free(req);
> +	if (tfm)
> +		crypto_free_acomp(tfm);
> +
> +	return ret;
> +}
> +
>  struct list_head *zlib_get_workspace(unsigned int level)
>  {
>  	struct list_head *ws = btrfs_get_workspace(BTRFS_COMPRESS_ZLIB, level);
> @@ -108,6 +305,15 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
>  	unsigned long nr_dest_pages = *out_pages;
>  	const unsigned long max_out = nr_dest_pages * PAGE_SIZE;
>  
> +	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
> +		ret = acomp_comp_pages(mapping, start, len, pages, out_pages,
> +				       total_in, total_out);
> +		if (!ret)
> +			return ret;
> +
> +		pr_warn("BTRFS: acomp compression failed: ret = %d\n", ret);
> +		/* Fallback to SW implementation if HW compression failed */
> +	}
>  	*out_pages = 0;
>  	*total_out = 0;
>  	*total_in = 0;
> @@ -281,6 +487,16 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
>  	unsigned long buf_start;
>  	struct page **pages_in = cb->compressed_pages;
>  
> +	if (crypto_has_acomp("zlib-deflate", 0, 0)) {
> +		ret = acomp_zlib_decomp_bio(pages_in, cb, srclen,
> +					    total_pages_in);
> +		if (!ret)
> +			return ret;
> +
> +		pr_warn("BTRFS: acomp decompression failed, ret=%d\n", ret);
> +		/* Fallback to SW implementation if HW decompression failed */
> +	}
> +
>  	data_in = kmap_local_page(pages_in[page_in_index]);
>  	workspace->strm.next_in = data_in;
>  	workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE);
> -- 
> 2.44.0
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-29 15:21     ` Cabiddu, Giovanni
  2024-04-29 15:44       ` David Sterba
@ 2024-05-03 10:04       ` Herbert Xu
  1 sibling, 0 replies; 24+ messages in thread
From: Herbert Xu @ 2024-05-03 10:04 UTC (permalink / raw)
  To: Cabiddu, Giovanni
  Cc: Josef Bacik, clm, dsterba, linux-btrfs, linux-crypto, qat-linux,
	embg, cyan, brian.will, weigang.li

On Mon, Apr 29, 2024 at 04:21:46PM +0100, Cabiddu, Giovanni wrote:
>
> @Herbert, do you have any objection if we add the compression level to
> the acomp tfm and we add an API to set it? Example:
> 
>     tfm = crypto_alloc_acomp("deflate", 0, 0);
>     acomp_set_level(tfm, compression_level);

Yes I think that's fine.  I'd make it a more generic interface
and model it after setkey so that you can set any parameter for
the given algorithm.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2024-04-29 15:41     ` David Sterba
@ 2025-05-06 15:38       ` Cabiddu, Giovanni
  2025-05-07  2:23         ` Herbert Xu
  2025-05-07 12:43         ` David Sterba
  0 siblings, 2 replies; 24+ messages in thread
From: Cabiddu, Giovanni @ 2025-05-06 15:38 UTC (permalink / raw)
  To: dsterba@suse.cz
  Cc: Josef Bacik, clm@fb.com, dsterba@suse.com,
	herbert@gondor.apana.org.au, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

Hi David,

I've resumed work on this, now that I have all necessary dependencies to
offload ZSTD to QAT.

On Mon, Apr 29, 2024 at 04:41:29PM +0100, David Sterba wrote:
...
> I'd skip the style and implementation details for now. The absence of
> compression level support seems like the biggest problem, also in
> combination with uncondtional use of the acomp interface.
Regarding compression levels, Herbert suggested a new interface that
would effectively address this concern [1].

> We'd have to enhance the compression format specifier to make it
> configurable in the sense: if accelerator is available use it, otherwise
> do CPU and synchronous compression.
For usability, wouldn't it be better to have a transparent solution? If
an accelerator is present, use it, rather than having a configuration
knob.

In any case, I would like to understand the best way forward here. I
would like to offload both zlib_deflate and ZSTD. However, I wouldn't
like to replicate the logic that calls the acomp APIs in both
fs/btrfs/zlib.c and fs/btrfs/zstd.c.

Thanks,

-- 
Giovanni

[1] https://lore.kernel.org/all/cover.1716202860.git.herbert@gondor.apana.org.au/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-06 15:38       ` Cabiddu, Giovanni
@ 2025-05-07  2:23         ` Herbert Xu
  2025-05-07 12:17           ` David Sterba
  2025-05-07 12:43         ` David Sterba
  1 sibling, 1 reply; 24+ messages in thread
From: Herbert Xu @ 2025-05-07  2:23 UTC (permalink / raw)
  To: Cabiddu, Giovanni
  Cc: dsterba@suse.cz, Josef Bacik, clm@fb.com, dsterba@suse.com,
	linux-btrfs@vger.kernel.org, linux-crypto@vger.kernel.org,
	qat-linux, embg@meta.com, Collet, Yann, Will, Brian, Li, Weigang

On Tue, May 06, 2025 at 04:38:11PM +0100, Cabiddu, Giovanni wrote:
>
> > We'd have to enhance the compression format specifier to make it
> > configurable in the sense: if accelerator is available use it, otherwise
> > do CPU and synchronous compression.
> For usability, wouldn't it be better to have a transparent solution? If
> an accelerator is present, use it, rather than having a configuration
> knob.

If you go through the Crypto API you won't need to add a new knob
at all.

The Crypto API is already configurable and comes with a knob
pre-installed.  Just download crconf and you can configure which
algorithm will be used by default:

https://git.code.sf.net/p/crconf/code

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-07  2:23         ` Herbert Xu
@ 2025-05-07 12:17           ` David Sterba
  2025-05-08  4:19             ` Eric Biggers
  0 siblings, 1 reply; 24+ messages in thread
From: David Sterba @ 2025-05-07 12:17 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Cabiddu, Giovanni, Josef Bacik, clm@fb.com, dsterba@suse.com,
	linux-btrfs@vger.kernel.org, linux-crypto@vger.kernel.org,
	qat-linux, embg@meta.com, Collet, Yann, Will, Brian, Li, Weigang

On Wed, May 07, 2025 at 10:23:53AM +0800, Herbert Xu wrote:
> On Tue, May 06, 2025 at 04:38:11PM +0100, Cabiddu, Giovanni wrote:
> >
> > > We'd have to enhance the compression format specifier to make it
> > > configurable in the sense: if accelerator is available use it, otherwise
> > > do CPU and synchronous compression.
> > For usability, wouldn't it be better to have a transparent solution? If
> > an accelerator is present, use it, rather than having a configuration
> > knob.
> 
> If you go through the Crypto API you won't need to add a new knob
> at all.
> 
> The Crypto API is already configurable and comes with a knob
> pre-installed.  Just download crconf and you can configure which
> algorithm will be used by default:
> 
> https://git.code.sf.net/p/crconf/code

First time I hear about such tool and given what it does I think it
should have better visibility and presence also in "the other" git
sources. It's not mentioned in linux Documentation either.

SourceForge is not taken seriously for quite some time and the project
landing page https://sourceforge.net/projects/crconf/ matches the
pattern of abandoned SF projects, last commit 5 years ago.

The first hit on gihub is https://github.com/Thermi/crconf and at least
it matches the SF commits.

The command line interface follows the iproute2 style that is
arguably hard to navigate:

  $ ./crconf
  Usage: crconf add { ALG | DRIVER } TYPE [ PRIORITY ]
	 crconf del DRIVER TYPE
	 crconf update DRIVER TYPE [ PRIORITY ]
	 crconf show { DRIVER TYPE | all }
	 crconf help
  ALG := alg <alg-name>
  DRIVER := driver <driver-name>
  TYPE := type ALGO-TYPE
  PRIORITY := priority <number>
  ALGO-TYPE := { 1 | 2 | 3 | 5 | 8 | 10 | 11 | 12 | 13 | 14 | 15 }
		 1 == alg type cipher
		 2 == alg type compress
		 3 == alg type aead
		 5 == alg type skcipher
		 8 == alg type kpp
		10 == alg type acompress
		11 == alg type scompress
		12 == alg type rng
		13 == alg type akcipher
		14 == alg type hash
		14 == alg type shash
		15 == alg type ahash

The manual page lacks any useful examples and figuring out how to set
the priority took me a few minutes of grepping in /proc/crypto, trial
and error to end up with:

  sudo ./crconf update driver sha256-generic type 14 priority 1000

The usability has a lot of room for improvement.

Anyway, assuming there will be a maintained, packaged in distros and
user friendly tool to tweak the linux crypto subsystem I agree we don't
have to do it in the filesystem or other subsystems.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-06 15:38       ` Cabiddu, Giovanni
  2025-05-07  2:23         ` Herbert Xu
@ 2025-05-07 12:43         ` David Sterba
  2025-05-07 13:12           ` Cabiddu, Giovanni
  1 sibling, 1 reply; 24+ messages in thread
From: David Sterba @ 2025-05-07 12:43 UTC (permalink / raw)
  To: Cabiddu, Giovanni
  Cc: Josef Bacik, clm@fb.com, dsterba@suse.com,
	herbert@gondor.apana.org.au, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

On Tue, May 06, 2025 at 04:38:11PM +0100, Cabiddu, Giovanni wrote:
> Hi David,
> 
> I've resumed work on this, now that I have all necessary dependencies to
> offload ZSTD to QAT.
> 
> On Mon, Apr 29, 2024 at 04:41:29PM +0100, David Sterba wrote:
> ...
> > I'd skip the style and implementation details for now. The absence of
> > compression level support seems like the biggest problem, also in
> > combination with uncondtional use of the acomp interface.
> Regarding compression levels, Herbert suggested a new interface that
> would effectively address this concern [1].

It seems to be sufficient for passing the level from filesystem to
crypto layer.

One thing I'm not sure is considered is that zstd has different
requirements for workspace size depending on the level. In btrfs this is
done in zstd_calc_ws_mem_sizes().

> > We'd have to enhance the compression format specifier to make it
> > configurable in the sense: if accelerator is available use it, otherwise
> > do CPU and synchronous compression.
> For usability, wouldn't it be better to have a transparent solution? If
> an accelerator is present, use it, rather than having a configuration
> knob.
> 
> In any case, I would like to understand the best way forward here. I
> would like to offload both zlib_deflate and ZSTD. However, I wouldn't
> like to replicate the logic that calls the acomp APIs in both
> fs/btrfs/zlib.c and fs/btrfs/zstd.c.

This looks doable, acomp_comp_pages() from your patch could work for
both, the only difference is the parameter to crypto_alloc_acomp() and
eventually the level.

Otherwise, how to go forward with that. I think we'd need to get a few
iterations staring from what you have, with added support for the levels
and then we'll remove/replace the problematic parts like the numerous
allocations.

As the first step, please send an update with the acomp levels added to
zlib callbacks, so it works in normal conditons with all the
allocations.

We'll need to move repeatedly used structures to the workspaces so that
will be the second step. Once we settle on someting reasonable we can
extend it and add zstd support.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-07 12:43         ` David Sterba
@ 2025-05-07 13:12           ` Cabiddu, Giovanni
  0 siblings, 0 replies; 24+ messages in thread
From: Cabiddu, Giovanni @ 2025-05-07 13:12 UTC (permalink / raw)
  To: David Sterba
  Cc: Josef Bacik, clm@fb.com, dsterba@suse.com,
	herbert@gondor.apana.org.au, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

On Wed, May 07, 2025 at 02:43:21PM +0200, David Sterba wrote:
> On Tue, May 06, 2025 at 04:38:11PM +0100, Cabiddu, Giovanni wrote:
> > Hi David,
> > 
> > I've resumed work on this, now that I have all necessary dependencies to
> > offload ZSTD to QAT.
> > 
> > On Mon, Apr 29, 2024 at 04:41:29PM +0100, David Sterba wrote:
> > ...
> > > I'd skip the style and implementation details for now. The absence of
> > > compression level support seems like the biggest problem, also in
> > > combination with uncondtional use of the acomp interface.
> > Regarding compression levels, Herbert suggested a new interface that
> > would effectively address this concern [1].
> 
> It seems to be sufficient for passing the level from filesystem to
> crypto layer.
> 
> One thing I'm not sure is considered is that zstd has different
> requirements for workspace size depending on the level. In btrfs this is
> done in zstd_calc_ws_mem_sizes().
> 
> > > We'd have to enhance the compression format specifier to make it
> > > configurable in the sense: if accelerator is available use it, otherwise
> > > do CPU and synchronous compression.
> > For usability, wouldn't it be better to have a transparent solution? If
> > an accelerator is present, use it, rather than having a configuration
> > knob.
> > 
> > In any case, I would like to understand the best way forward here. I
> > would like to offload both zlib_deflate and ZSTD. However, I wouldn't
> > like to replicate the logic that calls the acomp APIs in both
> > fs/btrfs/zlib.c and fs/btrfs/zstd.c.
> 
> This looks doable, acomp_comp_pages() from your patch could work for
> both, the only difference is the parameter to crypto_alloc_acomp() and
> eventually the level.
> 
> Otherwise, how to go forward with that. I think we'd need to get a few
> iterations staring from what you have, with added support for the levels
> and then we'll remove/replace the problematic parts like the numerous
> allocations.
> 
> As the first step, please send an update with the acomp levels added to
> zlib callbacks, so it works in normal conditons with all the
> allocations.
> 
> We'll need to move repeatedly used structures to the workspaces so that
> will be the second step. Once we settle on someting reasonable we can
> extend it and add zstd support.

Thanks David. I like the incremental approach.

I'll be sending a new version that is rebased on top of 6.15-rc. This
update will include the addition of compression levels to zlib-deflate
and will utilize folios instead of pages.
We can proceed from there.

Regards,

-- 
Giovanni

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-07 12:17           ` David Sterba
@ 2025-05-08  4:19             ` Eric Biggers
  2025-05-12 17:52               ` David Sterba
  2025-05-27  2:32               ` Gao Xiang
  0 siblings, 2 replies; 24+ messages in thread
From: Eric Biggers @ 2025-05-08  4:19 UTC (permalink / raw)
  To: David Sterba
  Cc: Herbert Xu, Cabiddu, Giovanni, Josef Bacik, clm@fb.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

On Wed, May 07, 2025 at 02:17:54PM +0200, David Sterba wrote:
> Anyway, assuming there will be a maintained, packaged in distros and
> user friendly tool to tweak the linux crypto subsystem I agree we don't
> have to do it in the filesystem or other subsystems.

I don't think there ever will be.  NETLINK_CRYPTO is obscure and hardly anyone
uses it.  The kernel's generic crypto infrastructure is also really cumbersome
to use, so the trend in the kernel overall has been a move away from the generic
crypto infrastructure and towards straightforward library APIs (e.g.
lib/crypto/) that just do the right thing with no configuration needed.

btrfs already uses the compression library APIs.  Considering how disastrous
crypto_acomp has been so far when other people tried to use it, most likely the
right decision will be to keep using the library APIs for the vast majority of
btrfs users, and have an alternative code path that uses crypto_acomp only when
hardware offload is actually being used.

There may also be a way to rework things so that the compression library APIs
can use hardware offload, in which case the crypto API would play no role at
all.  I understand the Zstandard library in userspace can use Intel QAT as an
external sequence provider, so it's been proven that this can be done.

BTW, I also have to wonder why this patchset is proposing accelerating zlib
instead of Zstandard.  Zstandard is a much more modern algorithm.

- Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-08  4:19             ` Eric Biggers
@ 2025-05-12 17:52               ` David Sterba
  2025-05-27  2:32               ` Gao Xiang
  1 sibling, 0 replies; 24+ messages in thread
From: David Sterba @ 2025-05-12 17:52 UTC (permalink / raw)
  To: Eric Biggers
  Cc: David Sterba, Herbert Xu, Cabiddu, Giovanni, Josef Bacik,
	clm@fb.com, dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

On Wed, May 07, 2025 at 09:19:14PM -0700, Eric Biggers wrote:
> On Wed, May 07, 2025 at 02:17:54PM +0200, David Sterba wrote:
> > Anyway, assuming there will be a maintained, packaged in distros and
> > user friendly tool to tweak the linux crypto subsystem I agree we don't
> > have to do it in the filesystem or other subsystems.
> 
> I don't think there ever will be.  NETLINK_CRYPTO is obscure and hardly anyone
> uses it.  The kernel's generic crypto infrastructure is also really cumbersome
> to use, so the trend in the kernel overall has been a move away from the generic
> crypto infrastructure and towards straightforward library APIs (e.g.
> lib/crypto/) that just do the right thing with no configuration needed.

Ok, so on hand the recommendation is to use an obscure tool and
interface and ont the other hand kernel is moving towards library API.
I don't mind using the library approach, as long as it provides the
automatic selection of the fastest implementation available (it could be
even an extra API call e.g. at mount time).

> btrfs already uses the compression library APIs.  Considering how disastrous
> crypto_acomp has been so far when other people tried to use it, most likely the
> right decision will be to keep using the library APIs for the vast majority of
> btrfs users, and have an alternative code path that uses crypto_acomp only when
> hardware offload is actually being used.

No problem with that. I once had a prototype to do async checksumming
and using the ahash was indeed cumbersome, with the mempools and request
handling to avoid deadlocks.

> There may also be a way to rework things so that the compression library APIs
> can use hardware offload, in which case the crypto API would play no role at
> all.  I understand the Zstandard library in userspace can use Intel QAT as an
> external sequence provider, so it's been proven that this can be done.

As along as the switch to library or QAT can be in a wrapper I don't
mind.

> BTW, I also have to wonder why this patchset is proposing accelerating zlib
> instead of Zstandard.  Zstandard is a much more modern algorithm.

I think the plan is to support zstd as well but the QAT integration
should be simpler on zlib. I'd like to see some up to date code first to
get better idea of what and how.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-08  4:19             ` Eric Biggers
  2025-05-12 17:52               ` David Sterba
@ 2025-05-27  2:32               ` Gao Xiang
  2025-05-27  2:45                 ` Gao Xiang
  2025-05-27 11:17                 ` David Sterba
  1 sibling, 2 replies; 24+ messages in thread
From: Gao Xiang @ 2025-05-27  2:32 UTC (permalink / raw)
  To: Eric Biggers, David Sterba
  Cc: Herbert Xu, Cabiddu, Giovanni, Josef Bacik, clm@fb.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang



On 2025/5/8 12:19, Eric Biggers wrote:

...

> 
> BTW, I also have to wonder why this patchset is proposing accelerating zlib
> instead of Zstandard.  Zstandard is a much more modern algorithm.

I think simply because QAT doesn't support the Zstandard native offload.
At least, for Intel Xeon Sapphire Rapids processors (it seems to have
built-in QAT 4xxx), only LZ4 and deflate-family are natively supported.

I've confirmed that SPR QAT deflate hardware decompresion already surpasses
LZ4 software decompression on our cloud server setup, which is useful since
it greatly improves decompression performance (even compared to software LZ4)
and saves CPU overhead completely.

For Zstandard, currently it seems it [1] just leverages part of the QAT
engine: hardware sequence producer (matchfinder) to boost up Zstandard
match finding And I think the current hardware QAT will have no positive
effect on the Zstandard decompression.  Without hardware improvements, it
seems it can be hardly changed.

I think it's true for both 5th Sapphire Rapids processors and 6th Granite
Rapids processors.  Correct if I'm wrong here but that is what I got so far.

Thanks,
Gao Xiang

[1] https://github.com/intel/QAT-ZSTD-Plugin

> 
> - Eric
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-27  2:32               ` Gao Xiang
@ 2025-05-27  2:45                 ` Gao Xiang
  2025-05-27 11:17                 ` David Sterba
  1 sibling, 0 replies; 24+ messages in thread
From: Gao Xiang @ 2025-05-27  2:45 UTC (permalink / raw)
  To: Eric Biggers, David Sterba
  Cc: Herbert Xu, Cabiddu, Giovanni, Josef Bacik, clm@fb.com,
	dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang



On 2025/5/27 10:32, Gao Xiang wrote:
> 
> 
> On 2025/5/8 12:19, Eric Biggers wrote:
> 
> ...
> 
>>
>> BTW, I also have to wonder why this patchset is proposing accelerating zlib
>> instead of Zstandard.  Zstandard is a much more modern algorithm.
> 
> I think simply because QAT doesn't support the Zstandard native offload.
> At least, for Intel Xeon Sapphire Rapids processors (it seems to have
> built-in QAT 4xxx), only LZ4 and deflate-family are natively supported.
> 
> I've confirmed that SPR QAT deflate hardware decompresion already surpasses
> LZ4 software decompression on our cloud server setup, which is useful since
> it greatly improves decompression performance (even compared to software LZ4)
> and saves CPU overhead completely.

Also see:
https://intel.github.io/quickassist/PG/services_compression_api.html

for more details on QAT hardware native supported algorithms.

> 
> For Zstandard, currently it seems it [1] just leverages part of the QAT
> engine: hardware sequence producer (matchfinder) to boost up Zstandard
> match finding And I think the current hardware QAT will have no positive
> effect on the Zstandard decompression.  Without hardware improvements, it
> seems it can be hardly changed.
> 
> I think it's true for both 5th Sapphire Rapids processors and 6th Granite
> Rapids processors.  Correct if I'm wrong here but that is what I got so far.
> 
> Thanks,
> Gao Xiang
> 
> [1] https://github.com/intel/QAT-ZSTD-Plugin
> 
>>
>> - Eric
>>
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-27  2:32               ` Gao Xiang
  2025-05-27  2:45                 ` Gao Xiang
@ 2025-05-27 11:17                 ` David Sterba
  2025-05-27 12:08                   ` Gao Xiang
  1 sibling, 1 reply; 24+ messages in thread
From: David Sterba @ 2025-05-27 11:17 UTC (permalink / raw)
  To: Gao Xiang
  Cc: Eric Biggers, Herbert Xu, Cabiddu, Giovanni, Josef Bacik,
	clm@fb.com, dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

On Tue, May 27, 2025 at 10:32:00AM +0800, Gao Xiang wrote:
> 
> 
> On 2025/5/8 12:19, Eric Biggers wrote:
> 
> ...
> 
> > 
> > BTW, I also have to wonder why this patchset is proposing accelerating zlib
> > instead of Zstandard.  Zstandard is a much more modern algorithm.
> 
> I think simply because QAT doesn't support the Zstandard native offload.
> At least, for Intel Xeon Sapphire Rapids processors (it seems to have
> built-in QAT 4xxx), only LZ4 and deflate-family are natively supported.
> 
> I've confirmed that SPR QAT deflate hardware decompresion already surpasses
> LZ4 software decompression on our cloud server setup, which is useful since
> it greatly improves decompression performance (even compared to software LZ4)
> and saves CPU overhead completely.

Does this measure the overall time of decompression (including the setup
steps, like the scatter/gather or similar, allocating requests, waiting
etc)?. Comparing that to the library calls plus the input page iteration.
I haven't found any public benchmarks with the QAT enabled compression.
I'm interested how it's benchmarked because we'v had people pointing out
that LZ4 itself is very fast, but when the overhead is taken into
account it's reducing the overall performance. Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp
  2025-05-27 11:17                 ` David Sterba
@ 2025-05-27 12:08                   ` Gao Xiang
  0 siblings, 0 replies; 24+ messages in thread
From: Gao Xiang @ 2025-05-27 12:08 UTC (permalink / raw)
  To: dsterba
  Cc: Eric Biggers, Herbert Xu, Cabiddu, Giovanni, Josef Bacik,
	clm@fb.com, dsterba@suse.com, linux-btrfs@vger.kernel.org,
	linux-crypto@vger.kernel.org, qat-linux, embg@meta.com,
	Collet, Yann, Will, Brian, Li, Weigang

Hi David,

On 2025/5/27 19:17, David Sterba wrote:
> On Tue, May 27, 2025 at 10:32:00AM +0800, Gao Xiang wrote:
>>
>>
>> On 2025/5/8 12:19, Eric Biggers wrote:
>>
>> ...
>>
>>>
>>> BTW, I also have to wonder why this patchset is proposing accelerating zlib
>>> instead of Zstandard.  Zstandard is a much more modern algorithm.
>>
>> I think simply because QAT doesn't support the Zstandard native offload.
>> At least, for Intel Xeon Sapphire Rapids processors (it seems to have
>> built-in QAT 4xxx), only LZ4 and deflate-family are natively supported.
>>
>> I've confirmed that SPR QAT deflate hardware decompresion already surpasses
>> LZ4 software decompression on our cloud server setup, which is useful since
>> it greatly improves decompression performance (even compared to software LZ4)
>> and saves CPU overhead completely.
> 
> Does this measure the overall time of decompression (including the setup
> steps, like the scatter/gather or similar, allocating requests, waiting
> etc)?. Comparing that to the library calls plus the input page iteration.
> I haven't found any public benchmarks with the QAT enabled compression.
> I'm interested how it's benchmarked because we'v had people pointing out
> that LZ4 itself is very fast, but when the overhead is taken into
> account it's reducing the overall performance. Thanks.

Yes, EROFS already supports QAT end-to-end since the ongoing Linux 6.16:

Processor: Intel(R) Xeon(R) Platinum 8475B (192 cores)
Memory: 512 GiB
Dataset: enwik9
Test command: fio --filename=enwik9 -rw=read -readonly -bs=4k -ioengine=psync -name=job1

1) $ mkfs.erofs -zdeflate -C1048576 enwik9.dfl enwik9
    $ echo qat_deflate > /sys/fs/erofs/accel
    READ: bw=662MiB/s (694MB/s), 662MiB/s-662MiB/s (694MB/s-694MB/s), io=954MiB (1000MB), run=1440-1440msec
    $ echo >  /sys/fs/erofs/accel
    READ: bw=381MiB/s (400MB/s), 381MiB/s-381MiB/s (400MB/s-400MB/s), io=954MiB (1000MB), run=2500-2500msec

2) $ mkfs.erofs -zlz4hc -C1048576 enwik9.lz4 enwik9
    READ: bw=541MiB/s (568MB/s), 541MiB/s-541MiB/s (568MB/s-568MB/s), io=954MiB (1000MB), run=1762-1762msec

However, my current test case that the cloud disk is slow (I use the cheapest
cloud disk setup because it will be used for rootfs and container images), so
the overall e2e is I/O bound instead of CPU bound, so in that case since
deflate can compress better (so can save more disk I/Os), it can surpass the
LZ4 one (because even LZ4 is faster but cost more I/Os due to large image).

If the storage/CPU combination is CPU bound, I think it could have different
results anyway.

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-05-27 12:08 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-26 10:54 [RFC PATCH 0/6] btrfs: offload zlib-deflate to accelerators Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 1/6] Revert "crypto: testmgr - Remove zlib-deflate" Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 2/6] Revert "crypto: deflate " Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 3/6] Revert "crypto: qat " Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 4/6] Revert "crypto: qat - remove unused macros in qat_comp_alg.c" Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 5/6] crypto: qat - change compressor settings for QAT GEN4 Giovanni Cabiddu
2024-04-26 10:54 ` [RFC PATCH 6/6] btrfs: zlib: add support for zlib-deflate through acomp Giovanni Cabiddu
2024-04-29 13:56   ` Josef Bacik
2024-04-29 15:21     ` Cabiddu, Giovanni
2024-04-29 15:44       ` David Sterba
2024-05-03 10:04       ` Herbert Xu
2024-04-29 15:41     ` David Sterba
2025-05-06 15:38       ` Cabiddu, Giovanni
2025-05-07  2:23         ` Herbert Xu
2025-05-07 12:17           ` David Sterba
2025-05-08  4:19             ` Eric Biggers
2025-05-12 17:52               ` David Sterba
2025-05-27  2:32               ` Gao Xiang
2025-05-27  2:45                 ` Gao Xiang
2025-05-27 11:17                 ` David Sterba
2025-05-27 12:08                   ` Gao Xiang
2025-05-07 12:43         ` David Sterba
2025-05-07 13:12           ` Cabiddu, Giovanni
2024-04-29 15:57   ` David Sterba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).