* [PATCH RFC v2 1/5] zram: Rename zcomp_strm_{init, free}()
2026-03-09 12:23 [PATCH RFC v2 0/5] zram: Allow zcomps to manage their own streams Jihan LIN via B4 Relay
@ 2026-03-09 12:23 ` Jihan LIN via B4 Relay
2026-03-09 12:23 ` [PATCH RFC v2 2/5] zram: Separate the lock from zcomp_strm Jihan LIN via B4 Relay
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Jihan LIN via B4 Relay @ 2026-03-09 12:23 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe
Cc: linux-kernel, linux-block, Jihan LIN
From: Jihan LIN <linjh22s@gmail.com>
Currently, zcomp uses a preemptive per-CPU stream model where streams
are allocated for each online CPU and guarded by mutexes. The existing
names zcomp_strm_{init, free}() are too generic to make it clear that
they operate on per-CPU streams.
Rename them to zcomp_strm_{init, free}_percpu(). This helps distinguish
them from future streams that may not be per-CPU based. No functional
changes are intended.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
drivers/block/zram/zcomp.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index a771a8ecc540bad8bf535ec87712da694e34a1d3..f2834898d3b700746db7bd2296ea9f4186e183c8 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -43,7 +43,7 @@ static const struct zcomp_ops *backends[] = {
NULL
};
-static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
+static void zcomp_strm_free_percpu(struct zcomp *comp, struct zcomp_strm *zstrm)
{
comp->ops->destroy_ctx(&zstrm->ctx);
vfree(zstrm->local_copy);
@@ -51,7 +51,7 @@ static void zcomp_strm_free(struct zcomp *comp, struct zcomp_strm *zstrm)
zstrm->buffer = NULL;
}
-static int zcomp_strm_init(struct zcomp *comp, struct zcomp_strm *zstrm)
+static int zcomp_strm_init_percpu(struct zcomp *comp, struct zcomp_strm *zstrm)
{
int ret;
@@ -66,7 +66,7 @@ static int zcomp_strm_init(struct zcomp *comp, struct zcomp_strm *zstrm)
*/
zstrm->buffer = vzalloc(2 * PAGE_SIZE);
if (!zstrm->buffer || !zstrm->local_copy) {
- zcomp_strm_free(comp, zstrm);
+ zcomp_strm_free_percpu(comp, zstrm);
return -ENOMEM;
}
return 0;
@@ -172,7 +172,7 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
int ret;
- ret = zcomp_strm_init(comp, zstrm);
+ ret = zcomp_strm_init_percpu(comp, zstrm);
if (ret)
pr_err("Can't allocate a compression stream\n");
return ret;
@@ -184,7 +184,7 @@ int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
mutex_lock(&zstrm->lock);
- zcomp_strm_free(comp, zstrm);
+ zcomp_strm_free_percpu(comp, zstrm);
mutex_unlock(&zstrm->lock);
return 0;
}
--
2.51.0
* [PATCH RFC v2 2/5] zram: Separate the lock from zcomp_strm
From: Jihan LIN via B4 Relay @ 2026-03-09 12:23 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe
Cc: linux-kernel, linux-block, Jihan LIN
From: Jihan LIN <linjh22s@gmail.com>
Currently zcomp_strm has a lock for default per-CPU streams. This field
should not be part of the generic stream structure.
Remove the lock from zcomp_strm and introduce struct percpu_zstrm for
per-CPU streams. This cleans up struct zcomp_strm and separates the
stream definition from its locking policy.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
drivers/block/zram/zcomp.c | 44 +++++++++++++++++++++++++++++---------------
drivers/block/zram/zcomp.h | 5 +++--
2 files changed, 32 insertions(+), 17 deletions(-)
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index f2834898d3b700746db7bd2296ea9f4186e183c8..daea592f01c37106b14dca9c6d8727a2240de54b 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -43,17 +43,32 @@ static const struct zcomp_ops *backends[] = {
NULL
};
-static void zcomp_strm_free_percpu(struct zcomp *comp, struct zcomp_strm *zstrm)
+struct percpu_zstrm {
+ struct zcomp_strm strm;
+ struct mutex lock;
+};
+
+static struct percpu_zstrm *zstrm_to_pcpu(struct zcomp_strm *zstrm)
+{
+ return container_of(zstrm, struct percpu_zstrm, strm);
+}
+
+static void zcomp_strm_free_percpu(struct zcomp *comp,
+ struct percpu_zstrm *zstrm_pcpu)
{
+ struct zcomp_strm *zstrm = &zstrm_pcpu->strm;
+
comp->ops->destroy_ctx(&zstrm->ctx);
vfree(zstrm->local_copy);
vfree(zstrm->buffer);
zstrm->buffer = NULL;
}
-static int zcomp_strm_init_percpu(struct zcomp *comp, struct zcomp_strm *zstrm)
+static int zcomp_strm_init_percpu(struct zcomp *comp,
+ struct percpu_zstrm *zstrm_pcpu)
{
int ret;
+ struct zcomp_strm *zstrm = &zstrm_pcpu->strm;
ret = comp->ops->create_ctx(comp->params, &zstrm->ctx);
if (ret)
@@ -66,7 +81,7 @@ static int zcomp_strm_init_percpu(struct zcomp *comp, struct zcomp_strm *zstrm)
*/
zstrm->buffer = vzalloc(2 * PAGE_SIZE);
if (!zstrm->buffer || !zstrm->local_copy) {
- zcomp_strm_free_percpu(comp, zstrm);
+ zcomp_strm_free_percpu(comp, zstrm_pcpu);
return -ENOMEM;
}
return 0;
@@ -110,7 +125,7 @@ ssize_t zcomp_available_show(const char *comp, char *buf, ssize_t at)
struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
{
for (;;) {
- struct zcomp_strm *zstrm = raw_cpu_ptr(comp->stream);
+ struct percpu_zstrm *zstrm_pcpu = raw_cpu_ptr(comp->stream);
/*
* Inspired by zswap
@@ -122,16 +137,16 @@ struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
* from a CPU that has already destroyed its stream. If
* so then unlock and re-try on the current CPU.
*/
- mutex_lock(&zstrm->lock);
- if (likely(zstrm->buffer))
- return zstrm;
- mutex_unlock(&zstrm->lock);
+ mutex_lock(&zstrm_pcpu->lock);
+ if (likely(zstrm_pcpu->strm.buffer))
+ return &zstrm_pcpu->strm;
+ mutex_unlock(&zstrm_pcpu->lock);
}
}
void zcomp_stream_put(struct zcomp_strm *zstrm)
{
- mutex_unlock(&zstrm->lock);
+ mutex_unlock(&zstrm_to_pcpu(zstrm)->lock);
}
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
@@ -169,7 +184,7 @@ int zcomp_decompress(struct zcomp *comp, struct zcomp_strm *zstrm,
int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
{
struct zcomp *comp = hlist_entry(node, struct zcomp, node);
- struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
+ struct percpu_zstrm *zstrm = per_cpu_ptr(comp->stream, cpu);
int ret;
ret = zcomp_strm_init_percpu(comp, zstrm);
@@ -181,11 +196,10 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
{
struct zcomp *comp = hlist_entry(node, struct zcomp, node);
- struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
+ struct percpu_zstrm *zstrm_pcpu = per_cpu_ptr(comp->stream, cpu);
- mutex_lock(&zstrm->lock);
- zcomp_strm_free_percpu(comp, zstrm);
- mutex_unlock(&zstrm->lock);
+ guard(mutex)(&zstrm_pcpu->lock);
+ zcomp_strm_free_percpu(comp, zstrm_pcpu);
return 0;
}
@@ -193,7 +207,7 @@ static int zcomp_init(struct zcomp *comp, struct zcomp_params *params)
{
int ret, cpu;
- comp->stream = alloc_percpu(struct zcomp_strm);
+ comp->stream = alloc_percpu(struct percpu_zstrm);
if (!comp->stream)
return -ENOMEM;
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index eacfd3f7d61d9395694292713fb5da4f0023d6d7..9784bc3f432cf0e22085399b8772b8ba669071de 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -38,7 +38,6 @@ struct zcomp_ctx {
};
struct zcomp_strm {
- struct mutex lock;
/* compression buffer */
void *buffer;
/* local copy of handle memory */
@@ -46,6 +45,8 @@ struct zcomp_strm {
struct zcomp_ctx ctx;
};
+struct percpu_zstrm;
+
struct zcomp_req {
const unsigned char *src;
const size_t src_len;
@@ -71,7 +72,7 @@ struct zcomp_ops {
/* dynamic per-device compression frontend */
struct zcomp {
- struct zcomp_strm __percpu *stream;
+ struct percpu_zstrm __percpu *stream;
const struct zcomp_ops *ops;
struct zcomp_params *params;
struct hlist_node node;
--
2.51.0
* [PATCH RFC v2 3/5] zram: Introduce zcomp-managed streams
From: Jihan LIN via B4 Relay @ 2026-03-09 12:23 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe
Cc: linux-kernel, linux-block, Jihan LIN
From: Jihan LIN <linjh22s@gmail.com>
Currently, zcomp uses a per-CPU stream model. This design is restrictive
for hardware-accelerated or batched zcomp backends, which often need to
manage their own resources rather than relying on a generic
mutex-protected per-CPU stream.
Extend the zcomp interface to allow backends to optionally manage their
own streams while generic per-CPU streams still remain allocated as a
complementary mechanism.
Introduce a zstrm_mgmt flag in struct zcomp_params. Backends set this
flag during zcomp_ops->setup_params() to advertise their capability to
manage streams.
Add zcomp_ops->{get, put}_stream() to allow zcomp backends to implement
their own stream strategies.
Modify zcomp_stream_get() to accept a new parameter indicating that
zcomp-managed streams are preferred, and update zcomp_stream_put() to
route a zcomp-managed stream back to its backend. If the backend
advertises this capability and the caller prefers managed streams, try
to get a stream from the backend; otherwise, fall back to the generic
per-CPU stream.
All existing call sites request the default per-CPU stream to preserve
the original behavior.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
drivers/block/zram/zcomp.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/block/zram/zcomp.h | 30 ++++++++++++++++++++++++++++--
drivers/block/zram/zram_drv.c | 10 +++++-----
3 files changed, 72 insertions(+), 9 deletions(-)
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index daea592f01c37106b14dca9c6d8727a2240de54b..1b5c8e8f6c6cb78a812320334da0b61391bb38f0 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -84,6 +84,7 @@ static int zcomp_strm_init_percpu(struct zcomp *comp,
zcomp_strm_free_percpu(comp, zstrm_pcpu);
return -ENOMEM;
}
+ zstrm->zcomp_managed = false;
return 0;
}
@@ -122,7 +123,7 @@ ssize_t zcomp_available_show(const char *comp, char *buf, ssize_t at)
return at;
}
-struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
+static inline struct zcomp_strm *zcomp_stream_pcpu_get(struct zcomp *comp)
{
for (;;) {
struct percpu_zstrm *zstrm_pcpu = raw_cpu_ptr(comp->stream);
@@ -144,9 +145,37 @@ struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
}
}
+struct zcomp_strm *zcomp_stream_get(struct zcomp *comp, enum zstrm_pref pref)
+{
+ if (comp->params->zstrm_mgmt && pref == ZSTRM_PREFER_MGMT) {
+ struct managed_zstrm *zstrm_managed =
+ comp->ops->get_stream(comp->params);
+
+ if (zstrm_managed) {
+ zstrm_managed->comp = comp;
+ return &zstrm_managed->strm;
+ }
+ }
+
+ return zcomp_stream_pcpu_get(comp);
+}
+
+static inline void zcomp_stream_pcpu_put(struct percpu_zstrm *zstrm)
+{
+ mutex_unlock(&zstrm->lock);
+}
+
void zcomp_stream_put(struct zcomp_strm *zstrm)
{
- mutex_unlock(&zstrm_to_pcpu(zstrm)->lock);
+ if (zstrm->zcomp_managed) {
+ struct managed_zstrm *zstrm_managed =
+ zstrm_to_managed(zstrm);
+
+ zstrm_managed->comp->ops->put_stream(
+ zstrm_managed->comp->params, zstrm_managed);
+ } else {
+ zcomp_stream_pcpu_put(zstrm_to_pcpu(zstrm));
+ }
}
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
@@ -211,11 +240,19 @@ static int zcomp_init(struct zcomp *comp, struct zcomp_params *params)
if (!comp->stream)
return -ENOMEM;
+ params->zstrm_mgmt = false;
comp->params = params;
ret = comp->ops->setup_params(comp->params);
if (ret)
goto cleanup;
+ if (params->zstrm_mgmt &&
+ !(comp->ops->get_stream && comp->ops->put_stream)) {
+ params->zstrm_mgmt = false;
+ pr_warn("Missing managed stream ops in %s, managed stream disabled\n",
+ comp->ops->name);
+ }
+
for_each_possible_cpu(cpu)
mutex_init(&per_cpu_ptr(comp->stream, cpu)->lock);
diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
index 9784bc3f432cf0e22085399b8772b8ba669071de..3543e7e4d2b3b1344bb191b321bcdd69f67031f6 100644
--- a/drivers/block/zram/zcomp.h
+++ b/drivers/block/zram/zcomp.h
@@ -24,6 +24,7 @@ struct zcomp_params {
union {
struct deflate_params deflate;
};
+ bool zstrm_mgmt;
void *drv_data;
};
@@ -31,13 +32,14 @@ struct zcomp_params {
/*
* Run-time driver context - scratch buffers, etc. It is modified during
* request execution (compression/decompression), cannot be shared, so
- * it's in per-CPU area.
+ * it's in per-CPU area or managed by the backend.
*/
struct zcomp_ctx {
void *context;
};
struct zcomp_strm {
+ bool zcomp_managed;
/* compression buffer */
void *buffer;
/* local copy of handle memory */
@@ -47,6 +49,11 @@ struct zcomp_strm {
struct percpu_zstrm;
+struct managed_zstrm {
+ struct zcomp *comp;
+ struct zcomp_strm strm;
+};
+
struct zcomp_req {
const unsigned char *src;
const size_t src_len;
@@ -55,6 +62,11 @@ struct zcomp_req {
size_t dst_len;
};
+enum zstrm_pref {
+ ZSTRM_DEFAULT, /* always use the generic per-CPU stream */
+ ZSTRM_PREFER_MGMT, /* try managed stream; fallback to per-CPU */
+};
+
struct zcomp_ops {
int (*compress)(struct zcomp_params *params, struct zcomp_ctx *ctx,
struct zcomp_req *req);
@@ -67,6 +79,15 @@ struct zcomp_ops {
int (*setup_params)(struct zcomp_params *params);
void (*release_params)(struct zcomp_params *params);
+ /*
+ * get_stream() needs to prepare zstrm->ctx. The backend must ensure
+ * returned stream has zcomp_managed set and matches the per-cpu
+ * stream sizing: local_copy >= PAGE_SIZE, buffer >= 2 * PAGE_SIZE.
+ */
+ struct managed_zstrm *(*get_stream)(struct zcomp_params *params);
+ void (*put_stream)(struct zcomp_params *params,
+ struct managed_zstrm *zstrm);
+
const char *name;
};
@@ -86,7 +107,7 @@ bool zcomp_available_algorithm(const char *comp);
struct zcomp *zcomp_create(const char *alg, struct zcomp_params *params);
void zcomp_destroy(struct zcomp *comp);
-struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
+struct zcomp_strm *zcomp_stream_get(struct zcomp *comp, enum zstrm_pref pref);
void zcomp_stream_put(struct zcomp_strm *zstrm);
int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
@@ -94,4 +115,9 @@ int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
int zcomp_decompress(struct zcomp *comp, struct zcomp_strm *zstrm,
const void *src, unsigned int src_len, void *dst);
+static inline struct managed_zstrm *zstrm_to_managed(struct zcomp_strm *zstrm)
+{
+ return container_of(zstrm, struct managed_zstrm, strm);
+}
+
#endif /* _ZCOMP_H_ */
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index bca33403fc8bf872569c63e65af0fe143287eaaf..7be88cfb56adb12fcc1edc6b4d42271044ef71b5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1377,7 +1377,7 @@ static int decompress_bdev_page(struct zram *zram, struct page *page, u32 index)
size = get_slot_size(zram, index);
prio = get_slot_comp_priority(zram, index);
- zstrm = zcomp_stream_get(zram->comps[prio]);
+ zstrm = zcomp_stream_get(zram->comps[prio], ZSTRM_DEFAULT);
src = kmap_local_page(page);
ret = zcomp_decompress(zram->comps[prio], zstrm, src, size,
zstrm->local_copy);
@@ -2083,7 +2083,7 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
size = get_slot_size(zram, index);
prio = get_slot_comp_priority(zram, index);
- zstrm = zcomp_stream_get(zram->comps[prio]);
+ zstrm = zcomp_stream_get(zram->comps[prio], ZSTRM_DEFAULT);
src = zs_obj_read_begin(zram->mem_pool, handle, size,
zstrm->local_copy);
dst = kmap_local_page(page);
@@ -2111,7 +2111,7 @@ static int read_from_zspool_raw(struct zram *zram, struct page *page, u32 index)
* case if object spans two physical pages. No decompression
* takes place here, as we read raw compressed data.
*/
- zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
+ zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP], ZSTRM_DEFAULT);
src = zs_obj_read_begin(zram->mem_pool, handle, size,
zstrm->local_copy);
memcpy_to_page(page, 0, src, size);
@@ -2265,7 +2265,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
if (same_filled)
return write_same_filled_page(zram, element, index);
- zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
+ zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP], ZSTRM_DEFAULT);
mem = kmap_local_page(page);
ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
mem, &comp_len);
@@ -2447,7 +2447,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
if (!zram->comps[prio])
continue;
- zstrm = zcomp_stream_get(zram->comps[prio]);
+ zstrm = zcomp_stream_get(zram->comps[prio], ZSTRM_DEFAULT);
src = kmap_local_page(page);
ret = zcomp_compress(zram->comps[prio], zstrm,
src, &comp_len_new);
--
2.51.0
* Re: [PATCH RFC v2 3/5] zram: Introduce zcomp-managed streams
From: Sergey Senozhatsky @ 2026-03-10 1:05 UTC (permalink / raw)
To: linjh22s
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, linux-kernel,
linux-block
Hi Jihan,
On (26/03/09 12:23), Jihan LIN via B4 Relay wrote:
> From: Jihan LIN <linjh22s@gmail.com>
>
> Currently, zcomp uses a per-CPU stream model. This design is restrictive
> for hardware-accelerated or batched zcomp backends.
zcomp doesn't support hardware-accelerated compression, and that's
why we plan to delete the zcomp API sometime this year and switch to
the acomp crypto API instead. Does the crypto API address your use case?
* Re: [PATCH RFC v2 3/5] zram: Introduce zcomp-managed streams
From: Jihan LIN @ 2026-03-10 13:31 UTC (permalink / raw)
To: Sergey Senozhatsky
Cc: Minchan Kim, Jens Axboe, linux-kernel, linux-block, Kairui Song
Hi Sergey,
On Tue, Mar 10, 2026 at 9:05 AM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
> zcomp doesn't support hardware-accelerated compression, and that's
> why we plan to delete the zcomp API sometime this year and switch to
> the acomp crypto API instead.
Thanks for the note.
Yes, the acomp crypto API can cover my use case. However, per-CPU
streams still limit concurrency to num_online_cpus() even with acomp,
as in mm/zswap.c. And simply replacing them with a global idle stream
list leads to a significant lock-contention regression, as tested by
Kairui [1].
Would it make sense to try the per-CPU stream first and fall back to a
global idle stream pool only when the per-CPU stream is busy? Happy to
help with the migration effort.
Best Regards,
Jihan
[1]: https://lore.kernel.org/all/CAMgjq7BFUrUY5Xq5Eks4ibqbWgJcca1vqB4kq=otQVyG=23FRw@mail.gmail.com/
* Re: [PATCH RFC v2 3/5] zram: Introduce zcomp-managed streams
From: Sergey Senozhatsky @ 2026-03-11 8:58 UTC (permalink / raw)
To: Jihan LIN
Cc: Sergey Senozhatsky, Minchan Kim, Jens Axboe, linux-kernel,
linux-block, Kairui Song
Hi,
On (26/03/10 21:31), Jihan LIN wrote:
> Yes, the acomp crypto API can cover my user case. However, per-cpu
> streams can still limit concurrency to num_online_cpus() even with acomp
> as in mm/zswap.c. And simply replacing them with a global idle stream
> list leads to a significant lock contention regression, as tested by
> Kairui[1].
>
> Would it make sense to try using per-CPU stream first and fallback to a
> global idle stream pool only when the per-cpu stream is busy? Happy to
> help with the migration effort.
That's a good and difficult question. I was not planning on
re-introducing an idle list; I wanted to keep things per-CPU,
the way they currently are. Hmm, but our per-CPU model doesn't
fit at all. I don't have any good ideas right now.
* [PATCH RFC v2 4/5] zram: Use zcomp-managed streams for async write requests
From: Jihan LIN via B4 Relay @ 2026-03-09 12:23 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe
Cc: linux-kernel, linux-block, Jihan LIN
From: Jihan LIN <linjh22s@gmail.com>
Current per-CPU streams limit write concurrency to the number of online
CPUs. Hardware accelerators with deep submission queues can handle far
more concurrent requests. Use zcomp-managed streams for async write
requests to take advantage of this.
Modify zram_write_page() to accept a flag indicating the request is
asynchronous. If the bio request is considered non-synchronous and the
backend supports zcomp-managed streams, attempt to acquire one.
zcomp_stream_get() handles the fallback to per-CPU streams.
Sync writes block waiting for completion (e.g., blk_wait_io() in
submit_bio_wait() from callers) and remain on per-CPU streams to keep
per-request latency low. Reads are unchanged since they are treated as
synchronous operations. Recompression also remains unchanged, as it
prioritizes compression ratio.
Although zram_write_page() currently waits for compression to complete,
using zcomp-managed streams allows write concurrency to exceed the
number of CPUs.
Supporting multiple pages within a single bio request is deferred to
keep this series simple and focused.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
drivers/block/zram/zram_drv.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7be88cfb56adb12fcc1edc6b4d42271044ef71b5..3db4579776f758c16006fd3108b4f778b84fea30 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2083,6 +2083,7 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
size = get_slot_size(zram, index);
prio = get_slot_comp_priority(zram, index);
+ /* Reads are treated as synchronous, see op_is_sync(). */
zstrm = zcomp_stream_get(zram->comps[prio], ZSTRM_DEFAULT);
src = zs_obj_read_begin(zram->mem_pool, handle, size,
zstrm->local_copy);
@@ -2249,7 +2250,8 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
return 0;
}
-static int zram_write_page(struct zram *zram, struct page *page, u32 index)
+static int zram_write_page(struct zram *zram, struct page *page, u32 index,
+ bool is_async)
{
int ret = 0;
unsigned long handle;
@@ -2265,7 +2267,16 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
if (same_filled)
return write_same_filled_page(zram, element, index);
- zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP], ZSTRM_DEFAULT);
+ /*
+ * Using a zcomp-managed stream and waiting for compression makes this
+ * appear synchronous.
+ *
+ * At this time, zram_bio_write handles pages one by one.
+ * However, preferring zcomp-managed streams allows backends to utilize
+ * their own resources.
+ */
+ zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP],
+ is_async ? ZSTRM_PREFER_MGMT : ZSTRM_DEFAULT);
mem = kmap_local_page(page);
ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
mem, &comp_len);
@@ -2327,7 +2338,8 @@ static int zram_bvec_write_partial(struct zram *zram, struct bio_vec *bvec,
ret = zram_read_page(zram, page, index, bio);
if (!ret) {
memcpy_from_bvec(page_address(page) + offset, bvec);
- ret = zram_write_page(zram, page, index);
+ ret = zram_write_page(zram, page, index,
+ !op_is_sync(bio->bi_opf));
}
__free_page(page);
return ret;
@@ -2338,7 +2350,8 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec,
{
if (is_partial_io(bvec))
return zram_bvec_write_partial(zram, bvec, index, offset, bio);
- return zram_write_page(zram, bvec->bv_page, index);
+ return zram_write_page(zram, bvec->bv_page, index,
+ !op_is_sync(bio->bi_opf));
}
#ifdef CONFIG_ZRAM_MULTI_COMP
--
2.51.0
* [PATCH RFC v2 5/5] zram: Add lz4 PoC for zcomp-managed streams
From: Jihan LIN via B4 Relay @ 2026-03-09 12:23 UTC (permalink / raw)
To: Minchan Kim, Sergey Senozhatsky, Jens Axboe
Cc: linux-kernel, linux-block, Jihan LIN
From: Jihan LIN <linjh22s@gmail.com>
This patch provides a proof-of-concept implementation of zcomp-managed
streams for the lz4 backend, demonstrating how a hardware-accelerated
compression backend would integrate with the zcomp-managed streams
introduced earlier in this series.
The PoC simulates a hardware accelerator with a fixed queue depth of
128. Global stream buffers are shared across all zram devices, while
contexts are per-device. Both are pre-allocated. During compression,
requests are submitted to a double-buffered kfifo queue and processed by
a dedicated kthread.
Known limitations:
- The single kthread serializes all compression work.
- Pool sizes are hard-coded.
- Uses global mutexes; contention is expected to be high under load.
- Assumes !HIGHMEM; kmap_local_page mappings are passed to a kthread.
Signed-off-by: Jihan LIN <linjh22s@gmail.com>
---
drivers/block/zram/backend_lz4.c | 464 +++++++++++++++++++++++++++++++++++++--
1 file changed, 442 insertions(+), 22 deletions(-)
diff --git a/drivers/block/zram/backend_lz4.c b/drivers/block/zram/backend_lz4.c
index 04e18661476086502ac41355e9cc38cc2f353d52..adf689003a62770cbbf0901ca07da84108f8d9d7 100644
--- a/drivers/block/zram/backend_lz4.c
+++ b/drivers/block/zram/backend_lz4.c
@@ -2,8 +2,12 @@
#include <linux/lz4.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
+#include <linux/kthread.h>
+#include <linux/kfifo.h>
+#include <linux/completion.h>
#include "backend_lz4.h"
+#include "zcomp.h"
struct lz4_ctx {
void *mem;
@@ -12,18 +16,326 @@ struct lz4_ctx {
LZ4_stream_t *cstrm;
};
+struct lz4_stream {
+ struct managed_zstrm zstrm;
+ struct list_head node;
+ struct completion completion;
+ int result;
+};
+
+struct lz4_req {
+ struct lz4_stream *strm;
+ struct lz4_ctx *ctx;
+ struct zcomp_params *params;
+ struct zcomp_req *zreq;
+};
+
+#define BACKEND_LZ4_STREAM_MAX 128
+#define BACKEND_LZ4_QUEUE_NR 2
+
+struct lz4_global {
+ struct list_head stream_head;
+ struct mutex stream_lock;
+ struct task_struct *tsk;
+
+ struct completion new_task_ready;
+
+ struct mutex working_lock;
+ DECLARE_KFIFO(working_queue[BACKEND_LZ4_QUEUE_NR], struct lz4_req,
+ BACKEND_LZ4_STREAM_MAX);
+ int working_submit_idx;
+
+ struct kref ref;
+};
+
+static DEFINE_MUTEX(lz4_global_lock);
+static struct lz4_global *lz4_global_data;
+
+static void lz4_stream_free(struct lz4_stream *src)
+{
+ if (IS_ERR_OR_NULL(src))
+ return;
+
+ vfree(src->zstrm.strm.buffer);
+ vfree(src->zstrm.strm.local_copy);
+
+ kvfree(src);
+}
+
+DEFINE_FREE(lz4_stream, struct lz4_stream *, lz4_stream_free(_T));
+static struct lz4_stream *lz4_stream_alloc(void)
+{
+ struct lz4_stream *strm __free(lz4_stream) = NULL;
+ void *buffer __free(kvfree) = NULL;
+ void *local_copy __free(kvfree) = NULL;
+
+ strm = kvzalloc_obj(struct lz4_stream, GFP_KERNEL);
+ if (!strm)
+ return ERR_PTR(-ENOMEM);
+
+ buffer = vmalloc(PAGE_SIZE * 2);
+ local_copy = vmalloc(PAGE_SIZE);
+ if (!buffer || !local_copy)
+ return ERR_PTR(-ENOMEM);
+
+ strm->zstrm.strm.buffer = no_free_ptr(buffer);
+ strm->zstrm.strm.local_copy = no_free_ptr(local_copy);
+ strm->zstrm.strm.zcomp_managed = true;
+
+ return_ptr(strm);
+}
+
+static void lz4_streams_destroy(struct lz4_global *inst)
+{
+ struct lz4_stream *pos, *tmp;
+
+ list_for_each_entry_safe(pos, tmp, &inst->stream_head, node) {
+ list_del(&pos->node);
+ lz4_stream_free(pos);
+ }
+}
+
+static int lz4_streams_init(struct zcomp_params *params,
+ struct lz4_global *inst)
+{
+ int err = 0;
+
+ INIT_LIST_HEAD(&inst->stream_head);
+ for (int i = 0; i < BACKEND_LZ4_STREAM_MAX; i++) {
+ struct lz4_stream *curr_zstrm __free(lz4_stream) = NULL;
+
+ curr_zstrm = lz4_stream_alloc();
+ if (IS_ERR(curr_zstrm)) {
+ err = PTR_ERR(curr_zstrm);
+ break;
+ }
+
+ /* lz4_ctx is linked to stream in get_stream() */
+ list_add(&curr_zstrm->node, &inst->stream_head);
+ init_completion(&curr_zstrm->completion);
+ retain_and_null_ptr(curr_zstrm);
+ }
+
+ if (err) {
+ lz4_streams_destroy(inst);
+ return err;
+ }
+
+ return 0;
+}
+
+static int __lz4_compress(struct zcomp_params *params, struct lz4_ctx *zctx,
+ struct zcomp_req *req);
+
+static void lz4_do_compression(struct lz4_global *inst)
+{
+ struct lz4_req req;
+ int idx;
+
+ scoped_guard(mutex, &inst->working_lock)
+ {
+ idx = inst->working_submit_idx;
+ inst->working_submit_idx = (idx + 1) % BACKEND_LZ4_QUEUE_NR;
+ }
+
+ while (kfifo_get(&inst->working_queue[idx], &req)) {
+ struct lz4_stream *lz4_strm = req.strm;
+
+ lz4_strm->result =
+ __lz4_compress(req.params, req.ctx, req.zreq);
+ complete(&lz4_strm->completion);
+ }
+}
+
+static int lz4_thread_worker(void *data)
+{
+ struct lz4_global *inst = data;
+
+ while (!kthread_should_stop()) {
+ int err;
+
+ err = wait_for_completion_interruptible(&inst->new_task_ready);
+ if (err)
+ continue;
+ lz4_do_compression(inst);
+ }
+ return 0;
+}
+
+static int lz4_global_init(struct zcomp_params *params)
+{
+ int err = 0;
+ struct lz4_global *newinst = NULL;
+
+ mutex_lock(&lz4_global_lock);
+
+ if (lz4_global_data) {
+ kref_get(&lz4_global_data->ref);
+ err = 0;
+ goto out_unlock;
+ }
+
+ newinst = kvzalloc_obj(*newinst);
+ if (!newinst) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+
+ INIT_KFIFO(newinst->working_queue[0]);
+ INIT_KFIFO(newinst->working_queue[1]);
+ newinst->working_submit_idx = 0;
+
+ mutex_init(&newinst->stream_lock);
+ mutex_init(&newinst->working_lock);
+ kref_init(&newinst->ref);
+ err = lz4_streams_init(params, newinst);
+ if (err)
+ goto err_stream_init;
+ init_completion(&newinst->new_task_ready);
+ newinst->tsk = kthread_run(lz4_thread_worker, newinst, "zcomp_lz4");
+ if (IS_ERR(newinst->tsk)) {
+ err = PTR_ERR(newinst->tsk);
+ goto err_kthread_init;
+ }
+
+ lz4_global_data = newinst;
+ mutex_unlock(&lz4_global_lock);
+ return 0;
+
+err_kthread_init:
+ lz4_streams_destroy(newinst);
+err_stream_init:
+ mutex_destroy(&newinst->working_lock);
+ mutex_destroy(&newinst->stream_lock);
+ kvfree(newinst);
+out_unlock:
+ mutex_unlock(&lz4_global_lock);
+ return err;
+}
+
+static void lz4_global_destroy(struct kref *ref)
+{
+ struct lz4_global *inst;
+
+ lockdep_assert_held(&lz4_global_lock);
+
+ if (!lz4_global_data)
+ return;
+ inst = container_of(ref, struct lz4_global, ref);
+ WARN_ON(inst != lz4_global_data);
+
+ lz4_global_data = NULL;
+ kthread_stop(inst->tsk);
+ lz4_streams_destroy(inst);
+ mutex_destroy(&inst->stream_lock);
+ mutex_destroy(&inst->working_lock);
+ kvfree(inst);
+}
+
+struct lz4_drv {
+ struct mutex lock;
+ DECLARE_KFIFO(ctxqueue, struct lz4_ctx *, BACKEND_LZ4_STREAM_MAX);
+ struct lz4_ctx ctxs[BACKEND_LZ4_STREAM_MAX];
+};
+
+static int lz4_ctx_init(struct zcomp_params *params, struct lz4_ctx *zctx);
+static void lz4_ctx_destroy(struct lz4_ctx *zctx);
+
+static void lz4_drv_free(struct lz4_drv *drv_data)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(drv_data->ctxs); i++)
+ lz4_ctx_destroy(&drv_data->ctxs[i]);
+
+ mutex_destroy(&drv_data->lock);
+
+ kvfree(drv_data);
+}
+
+static int lz4_drv_alloc(struct zcomp_params *params)
+{
+ struct lz4_drv *drv_data = NULL;
+ int i, len;
+ int err = 0;
+
+ drv_data = kvzalloc_obj(*drv_data);
+ if (!drv_data)
+ return -ENOMEM;
+ mutex_init(&drv_data->lock);
+
+ INIT_KFIFO(drv_data->ctxqueue);
+
+ len = kfifo_size(&drv_data->ctxqueue);
+
+ for (i = 0; i < min(len, ARRAY_SIZE(drv_data->ctxs)); i++) {
+ struct lz4_ctx *ctx = &drv_data->ctxs[i];
+
+ err = lz4_ctx_init(params, ctx);
+ if (err)
+ break;
+
+ kfifo_put(&drv_data->ctxqueue, ctx);
+ }
+
+ if (err) {
+ lz4_drv_free(drv_data);
+ return err;
+ }
+
+ params->drv_data = drv_data;
+
+ return 0;
+}
+
static void lz4_release_params(struct zcomp_params *params)
{
+ struct lz4_drv *drv_data = params->drv_data;
+
+ if (!params->zstrm_mgmt)
+ return;
+
+ lz4_drv_free(drv_data);
+
+ kref_put_mutex(&lz4_global_data->ref, lz4_global_destroy,
+ &lz4_global_lock);
}
static int lz4_setup_params(struct zcomp_params *params)
{
+ int err = 0;
+
if (params->level == ZCOMP_PARAM_NOT_SET)
params->level = LZ4_ACCELERATION_DEFAULT;
+ params->zstrm_mgmt = false;
+ err = lz4_global_init(params);
+ if (err) {
+ pr_err("lz4 global init failed: %d, managed stream disabled\n",
+ err);
+ return 0;
+ }
+
+ err = lz4_drv_alloc(params);
+
+ if (err) {
+ pr_err("lz4 drv init failed: %d, managed stream disabled\n", err);
+ kref_put_mutex(&lz4_global_data->ref, lz4_global_destroy,
+ &lz4_global_lock);
+ return 0;
+ }
+
+ params->zstrm_mgmt = true;
return 0;
}
+static void lz4_ctx_destroy(struct lz4_ctx *zctx)
+{
+ vfree(zctx->mem);
+ kfree(zctx->dstrm);
+ kfree(zctx->cstrm);
+}
+
static void lz4_destroy(struct zcomp_ctx *ctx)
{
struct lz4_ctx *zctx = ctx->context;
@@ -31,12 +343,36 @@ static void lz4_destroy(struct zcomp_ctx *ctx)
if (!zctx)
return;
- vfree(zctx->mem);
- kfree(zctx->dstrm);
- kfree(zctx->cstrm);
+ lz4_ctx_destroy(zctx);
kfree(zctx);
}
+static int lz4_ctx_init(struct zcomp_params *params, struct lz4_ctx *zctx)
+{
+ void *mem __free(kvfree) = NULL;
+ LZ4_streamDecode_t *dstrm __free(kfree) = NULL;
+ LZ4_stream_t *cstrm __free(kfree) = NULL;
+
+ if (params->dict_sz == 0) {
+ mem = vmalloc(LZ4_MEM_COMPRESS);
+ if (!mem)
+ return -ENOMEM;
+ } else {
+ dstrm = kzalloc_obj(*zctx->dstrm);
+ if (!dstrm)
+ return -ENOMEM;
+
+ cstrm = kzalloc_obj(*zctx->cstrm);
+ if (!cstrm)
+ return -ENOMEM;
+ }
+
+ zctx->mem = no_free_ptr(mem);
+ zctx->dstrm = no_free_ptr(dstrm);
+ zctx->cstrm = no_free_ptr(cstrm);
+ return 0;
+}
+
static int lz4_create(struct zcomp_params *params, struct zcomp_ctx *ctx)
{
struct lz4_ctx *zctx;
@@ -46,31 +382,17 @@ static int lz4_create(struct zcomp_params *params, struct zcomp_ctx *ctx)
return -ENOMEM;
ctx->context = zctx;
- if (params->dict_sz == 0) {
- zctx->mem = vmalloc(LZ4_MEM_COMPRESS);
- if (!zctx->mem)
- goto error;
- } else {
- zctx->dstrm = kzalloc_obj(*zctx->dstrm);
- if (!zctx->dstrm)
- goto error;
-
- zctx->cstrm = kzalloc_obj(*zctx->cstrm);
- if (!zctx->cstrm)
- goto error;
+ if (lz4_ctx_init(params, zctx)) {
+ lz4_destroy(ctx);
+ return -ENOMEM;
}
return 0;
-
-error:
- lz4_destroy(ctx);
- return -ENOMEM;
}
-static int lz4_compress(struct zcomp_params *params, struct zcomp_ctx *ctx,
- struct zcomp_req *req)
+static int __lz4_compress(struct zcomp_params *params, struct lz4_ctx *zctx,
+ struct zcomp_req *req)
{
- struct lz4_ctx *zctx = ctx->context;
int ret;
if (!zctx->cstrm) {
@@ -92,6 +414,47 @@ static int lz4_compress(struct zcomp_params *params, struct zcomp_ctx *ctx,
return 0;
}
+static int lz4_compress_managed(struct zcomp_params *params,
+ struct lz4_ctx *zctx, struct zcomp_req *req,
+ struct zcomp_strm *zstrm)
+{
+ struct lz4_stream *mngt_strm =
+ container_of(zstrm_to_managed(zstrm), struct lz4_stream, zstrm);
+
+ scoped_guard(mutex, &lz4_global_data->working_lock)
+ {
+ int cnt;
+ int idx = lz4_global_data->working_submit_idx;
+ struct lz4_req lz4req = {
+ .strm = mngt_strm,
+ .params = params,
+ .ctx = zctx,
+ .zreq = req
+ };
+
+ /* ctx->src is mapped by kmap_local_page() */
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHMEM));
+ cnt = kfifo_put(&lz4_global_data->working_queue[idx], lz4req);
+ if (cnt == 0)
+ return -EBUSY;
+ }
+ complete(&lz4_global_data->new_task_ready);
+ wait_for_completion(&mngt_strm->completion);
+ return mngt_strm->result;
+}
+
+static int lz4_compress(struct zcomp_params *params, struct zcomp_ctx *ctx,
+ struct zcomp_req *req)
+{
+ struct lz4_ctx *zctx = ctx->context;
+ struct zcomp_strm *zstrm = container_of(ctx, struct zcomp_strm, ctx);
+
+ if (!zstrm->zcomp_managed)
+ return __lz4_compress(params, zctx, req);
+
+ return lz4_compress_managed(params, zctx, req, zstrm);
+}
+
static int lz4_decompress(struct zcomp_params *params, struct zcomp_ctx *ctx,
struct zcomp_req *req)
{
@@ -116,6 +479,61 @@ static int lz4_decompress(struct zcomp_params *params, struct zcomp_ctx *ctx,
return 0;
}
+static struct managed_zstrm *lz4_get_stream(struct zcomp_params *params)
+{
+ struct lz4_stream *lz4_strm;
+ struct lz4_drv *drv_data = params->drv_data;
+ struct lz4_ctx *ctx;
+
+ if (!params->zstrm_mgmt)
+ return NULL;
+
+ scoped_guard(mutex, &lz4_global_data->stream_lock)
+ {
+ lz4_strm = list_first_entry_or_null(
+ &lz4_global_data->stream_head, struct lz4_stream, node);
+ if (!lz4_strm)
+ return NULL;
+
+ list_del_init(&lz4_strm->node);
+ }
+
+ scoped_guard(mutex, &drv_data->lock)
+ if (!kfifo_get(&drv_data->ctxqueue, &ctx))
+ ctx = NULL;
+ if (!ctx) {
+ guard(mutex)(&lz4_global_data->stream_lock);
+ list_add(&lz4_strm->node, &lz4_global_data->stream_head);
+ return NULL;
+ }
+ reinit_completion(&lz4_strm->completion);
+ lz4_strm->zstrm.strm.ctx.context = ctx;
+
+ return &lz4_strm->zstrm;
+}
+
+static void lz4_put_stream(struct zcomp_params *params,
+ struct managed_zstrm *zstrm)
+{
+ struct lz4_stream *lz4_strm;
+ struct lz4_ctx *ctx;
+ struct lz4_drv *drv_data = params->drv_data;
+
+ if (!zstrm)
+ return;
+ if (WARN_ON(!params->zstrm_mgmt))
+ return;
+
+ lz4_strm = container_of(zstrm, struct lz4_stream, zstrm);
+ ctx = zstrm->strm.ctx.context;
+ lz4_strm->zstrm.strm.ctx.context = NULL;
+
+ scoped_guard(mutex, &lz4_global_data->stream_lock)
+ list_add(&lz4_strm->node, &lz4_global_data->stream_head);
+ scoped_guard(mutex, &drv_data->lock)
+ kfifo_put(&drv_data->ctxqueue, ctx);
+}
+
const struct zcomp_ops backend_lz4 = {
.compress = lz4_compress,
.decompress = lz4_decompress,
@@ -123,5 +541,7 @@ const struct zcomp_ops backend_lz4 = {
.destroy_ctx = lz4_destroy,
.setup_params = lz4_setup_params,
.release_params = lz4_release_params,
+ .get_stream = lz4_get_stream,
+ .put_stream = lz4_put_stream,
.name = "lz4",
};
--
2.51.0
^ permalink raw reply related [flat|nested] 11+ messages in thread

* Re: [PATCH RFC v2 0/5] zram: Allow zcomps to manage their own streams
2026-03-09 12:23 [PATCH RFC v2 0/5] zram: Allow zcomps to manage their own streams Jihan LIN via B4 Relay
` (4 preceding siblings ...)
2026-03-09 12:23 ` [PATCH RFC v2 5/5] zram: Add lz4 PoC for zcomp-managed streams Jihan LIN via B4 Relay
@ 2026-03-11 8:51 ` Sergey Senozhatsky
2026-03-13 14:42 ` Jihan LIN
5 siblings, 1 reply; 11+ messages in thread
From: Sergey Senozhatsky @ 2026-03-11 8:51 UTC (permalink / raw)
To: linjh22s
Cc: Minchan Kim, Sergey Senozhatsky, Jens Axboe, linux-kernel,
linux-block
A quick question:
On (26/03/09 12:23), Jihan LIN via B4 Relay wrote:
> This RFC series focuses on the stream management interface required for
> accelerator backends, laying the groundwork for batched asynchronous
> operations in zram. Since I cannot verify this on specific accelerators
> at this moment, a PoC patch that simulates this behavior in software is
> included to verify new stream operations without requiring specific
> accelerators. The next step would be to add a non-blocking interface to
> fully utilize their concurrency, and allow backends to be built as
> separate modules. Any feedback would be greatly appreciated.
So does such hardware exist? This series is a little too
complex, so it had better solve some real problem, so to speak,
before we start looking into it.
* Re: [PATCH RFC v2 0/5] zram: Allow zcomps to manage their own streams
2026-03-11 8:51 ` [PATCH RFC v2 0/5] zram: Allow zcomps to manage their own streams Sergey Senozhatsky
@ 2026-03-13 14:42 ` Jihan LIN
0 siblings, 0 replies; 11+ messages in thread
From: Jihan LIN @ 2026-03-13 14:42 UTC (permalink / raw)
To: Sergey Senozhatsky; +Cc: Minchan Kim, Jens Axboe, linux-kernel, linux-block
Hi Sergey,
On Wed, Mar 11, 2026 at 4:52 PM Sergey Senozhatsky
<senozhatsky@chromium.org> wrote:
> So does such a hardware exist?
Yes, there are a few examples, as far as I know.
LZ4 is relevant here because it is widely used, and decompression is
already very fast on the CPU. So a compression-only accelerator makes
sense.
HiSilicon has hisi_zip in its server SoCs. For LZ4, hisi_zip offloads
compression only, with decompression handled in software[1].
I also found some out-of-tree examples, such as qpace_drv for SM8845 in
OnePlus's tree[2] and mtk_hwz for some MediaTek SoCs from Samsung[3].
These examples suggest a similar setup: compression is offloaded to
queue-based hardware, while software (or a synchronous path) still
handles some decompression paths. These are the kinds of devices I had
in mind, and their deeper hardware queues do not fit well into the
current model. I don't have any of these on hand yet, but this is the
kind of use case behind this series.
[1]: https://lore.kernel.org/all/20260117023435.1616703-1-huangchenghai2@huawei.com/
[2]: https://github.com/OnePlusOSS/android_kernel_oneplus_sm8845/tree/ecfc67b9e933937140df7a1cf39060de8dbd11be/drivers/block/zram
[3]: https://github.com/samsung-mediatek/android_kernel_device_modules-6.12/tree/4749bfe7783c045f53c50160e05b67a9a2acc3f4/drivers/misc/mediatek/mtk_zram