From: Yosry Ahmed <yosry.ahmed@linux.dev>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Kairui Song <ryncsn@gmail.com>, Minchan Kim <minchan@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 02/18] zram: permit preemption with active compression stream
Date: Wed, 12 Feb 2025 16:01:41 +0000 [thread overview]
Message-ID: <Z6zF5QvTQwVoAhMP@google.com> (raw)
In-Reply-To: <20250212063153.179231-3-senozhatsky@chromium.org>
On Wed, Feb 12, 2025 at 03:27:00PM +0900, Sergey Senozhatsky wrote:
> Currently, per-CPU stream access is done from a non-preemptible
> (atomic) section, which imposes the same atomicity requirements on
> compression backends as entry spin-lock, and makes it impossible
> to use algorithms that can schedule/wait/sleep during compression
> and decompression.
>
> Switch to preemptible per-CPU model, similar to the one used
> in zswap. Instead of a per-CPU local lock, each stream carries
> a mutex which is locked throughout entire time zram uses it
> for compression or decompression, so that cpu-dead event waits
> for zram to stop using a particular per-CPU stream and release
> it.
>
> Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
> ---
> drivers/block/zram/zcomp.c | 36 +++++++++++++++++++++++++----------
> drivers/block/zram/zcomp.h | 6 +++---
> drivers/block/zram/zram_drv.c | 20 +++++++++----------
> 3 files changed, 39 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
> index bb514403e305..e83dd9a80a81 100644
> --- a/drivers/block/zram/zcomp.c
> +++ b/drivers/block/zram/zcomp.c
> @@ -7,6 +7,7 @@
> #include <linux/wait.h>
> #include <linux/sched.h>
> #include <linux/cpu.h>
> +#include <linux/cpuhotplug.h>
What code changes prompt this?
> #include <linux/crypto.h>
> #include <linux/vmalloc.h>
>
> @@ -54,6 +55,7 @@ static int zcomp_strm_init(struct zcomp *comp, struct zcomp_strm *zstrm)
> {
> int ret;
>
> + mutex_init(&zstrm->lock);
I don't think we can initialize the mutex in the hotplug callback. I
think the following scenario is possible:
CPU #1 CPU #2
zcomp_stream_get()
zstrm = raw_cpu_ptr()
/* task migrated to CPU 2 */
CPU goes offline
zcomp_cpu_dead()
mutex_lock()
..
mutex_unlock()
/* migrated task continues */
zcomp_stream_get()
mutex_lock()
CPU goes online
mutex_init()
mutex_unlock() /* problem */
In this case we'll end up initializing the mutex on CPU #1 while CPU #2
has it locked. When we unlocked it on CPU #2 we will corrupt it AFAICT.
This is why I moved the mutex initialization out of the hotplug callback
in zswap. I suspect to do something similar for zram we'd need to do it
in zcomp_init()?
> ret = comp->ops->create_ctx(comp->params, &zstrm->ctx);
> if (ret)
> return ret;
> @@ -109,13 +111,29 @@ ssize_t zcomp_available_show(const char *comp, char *buf)
>
> struct zcomp_strm *zcomp_stream_get(struct zcomp *comp)
> {
> - local_lock(&comp->stream->lock);
> - return this_cpu_ptr(comp->stream);
> + for (;;) {
> + struct zcomp_strm *zstrm = raw_cpu_ptr(comp->stream);
> +
> + /*
> + * Inspired by zswap
> + *
> + * stream is returned with ->mutex locked which prevents
> + * cpu_dead() from releasing this stream under us, however
> + * there is still a race window between raw_cpu_ptr() and
> + * mutex_lock(), during which we could have been migrated
> + * to a CPU that has already destroyed its stream. If so
"we could have been migrated from** a CPU that has already destroyed its
stream"? Right?
> + * then unlock and re-try on the current CPU.
> + */
> + mutex_lock(&zstrm->lock);
> + if (likely(zstrm->buffer))
> + return zstrm;
> + mutex_unlock(&zstrm->lock);
> + }
> }
>
> -void zcomp_stream_put(struct zcomp *comp)
> +void zcomp_stream_put(struct zcomp_strm *zstrm)
> {
> - local_unlock(&comp->stream->lock);
> + mutex_unlock(&zstrm->lock);
> }
>
> int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> @@ -151,12 +169,9 @@ int zcomp_decompress(struct zcomp *comp, struct zcomp_strm *zstrm,
> int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
> {
> struct zcomp *comp = hlist_entry(node, struct zcomp, node);
> - struct zcomp_strm *zstrm;
> + struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
> int ret;
>
> - zstrm = per_cpu_ptr(comp->stream, cpu);
> - local_lock_init(&zstrm->lock);
> -
> ret = zcomp_strm_init(comp, zstrm);
> if (ret)
> pr_err("Can't allocate a compression stream\n");
> @@ -166,10 +181,11 @@ int zcomp_cpu_up_prepare(unsigned int cpu, struct hlist_node *node)
> int zcomp_cpu_dead(unsigned int cpu, struct hlist_node *node)
> {
> struct zcomp *comp = hlist_entry(node, struct zcomp, node);
> - struct zcomp_strm *zstrm;
> + struct zcomp_strm *zstrm = per_cpu_ptr(comp->stream, cpu);
>
> - zstrm = per_cpu_ptr(comp->stream, cpu);
> + mutex_lock(&zstrm->lock);
> zcomp_strm_free(comp, zstrm);
> + mutex_unlock(&zstrm->lock);
> return 0;
> }
>
> diff --git a/drivers/block/zram/zcomp.h b/drivers/block/zram/zcomp.h
> index ad5762813842..23b8236b9090 100644
> --- a/drivers/block/zram/zcomp.h
> +++ b/drivers/block/zram/zcomp.h
> @@ -3,7 +3,7 @@
> #ifndef _ZCOMP_H_
> #define _ZCOMP_H_
>
> -#include <linux/local_lock.h>
> +#include <linux/mutex.h>
>
> #define ZCOMP_PARAM_NO_LEVEL INT_MIN
>
> @@ -31,7 +31,7 @@ struct zcomp_ctx {
> };
>
> struct zcomp_strm {
> - local_lock_t lock;
> + struct mutex lock;
> /* compression buffer */
> void *buffer;
> struct zcomp_ctx ctx;
> @@ -77,7 +77,7 @@ struct zcomp *zcomp_create(const char *alg, struct zcomp_params *params);
> void zcomp_destroy(struct zcomp *comp);
>
> struct zcomp_strm *zcomp_stream_get(struct zcomp *comp);
> -void zcomp_stream_put(struct zcomp *comp);
> +void zcomp_stream_put(struct zcomp_strm *zstrm);
>
> int zcomp_compress(struct zcomp *comp, struct zcomp_strm *zstrm,
> const void *src, unsigned int *dst_len);
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 3708436f1d1f..43f460a45e3e 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1608,7 +1608,7 @@ static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
> ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
> kunmap_local(dst);
> zs_unmap_object(zram->mem_pool, handle);
> - zcomp_stream_put(zram->comps[prio]);
> + zcomp_stream_put(zstrm);
>
> return ret;
> }
> @@ -1769,14 +1769,14 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
> kunmap_local(mem);
>
> if (unlikely(ret)) {
> - zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
> + zcomp_stream_put(zstrm);
> pr_err("Compression failed! err=%d\n", ret);
> zs_free(zram->mem_pool, handle);
> return ret;
> }
>
> if (comp_len >= huge_class_size) {
> - zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
> + zcomp_stream_put(zstrm);
> return write_incompressible_page(zram, page, index);
> }
>
> @@ -1800,7 +1800,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
> __GFP_HIGHMEM |
> __GFP_MOVABLE);
> if (IS_ERR_VALUE(handle)) {
> - zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
> + zcomp_stream_put(zstrm);
> atomic64_inc(&zram->stats.writestall);
> handle = zs_malloc(zram->mem_pool, comp_len,
> GFP_NOIO | __GFP_HIGHMEM |
> @@ -1812,7 +1812,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
> }
>
> if (!zram_can_store_page(zram)) {
> - zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
> + zcomp_stream_put(zstrm);
> zs_free(zram->mem_pool, handle);
> return -ENOMEM;
> }
> @@ -1820,7 +1820,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
> dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
>
> memcpy(dst, zstrm->buffer, comp_len);
> - zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
> + zcomp_stream_put(zstrm);
> zs_unmap_object(zram->mem_pool, handle);
>
> zram_slot_lock(zram, index);
> @@ -1979,7 +1979,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
> kunmap_local(src);
>
> if (ret) {
> - zcomp_stream_put(zram->comps[prio]);
> + zcomp_stream_put(zstrm);
> return ret;
> }
>
> @@ -1989,7 +1989,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
> /* Continue until we make progress */
> if (class_index_new >= class_index_old ||
> (threshold && comp_len_new >= threshold)) {
> - zcomp_stream_put(zram->comps[prio]);
> + zcomp_stream_put(zstrm);
> continue;
> }
>
> @@ -2047,13 +2047,13 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
> __GFP_HIGHMEM |
> __GFP_MOVABLE);
> if (IS_ERR_VALUE(handle_new)) {
> - zcomp_stream_put(zram->comps[prio]);
> + zcomp_stream_put(zstrm);
> return PTR_ERR((void *)handle_new);
> }
>
> dst = zs_map_object(zram->mem_pool, handle_new, ZS_MM_WO);
> memcpy(dst, zstrm->buffer, comp_len_new);
> - zcomp_stream_put(zram->comps[prio]);
> + zcomp_stream_put(zstrm);
>
> zs_unmap_object(zram->mem_pool, handle_new);
>
> --
> 2.48.1.502.g6dc24dfdaf-goog
>
next prev parent reply other threads:[~2025-02-12 16:02 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-12 6:26 [PATCH v5 00/18] zsmalloc/zram: there be preemption Sergey Senozhatsky
2025-02-12 6:26 ` [PATCH v5 01/18] zram: sleepable entry locking Sergey Senozhatsky
2025-02-13 0:08 ` Andrew Morton
2025-02-13 0:52 ` Sergey Senozhatsky
2025-02-13 1:42 ` Sergey Senozhatsky
2025-02-13 8:49 ` Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 02/18] zram: permit preemption with active compression stream Sergey Senozhatsky
2025-02-12 16:01 ` Yosry Ahmed [this message]
2025-02-13 1:04 ` Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 03/18] zram: remove crypto include Sergey Senozhatsky
2025-02-12 16:13 ` Yosry Ahmed
2025-02-13 0:53 ` Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 04/18] zram: remove max_comp_streams device attr Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 05/18] zram: remove two-staged handle allocation Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 06/18] zram: remove writestall zram_stats member Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 07/18] zram: limit max recompress prio to num_active_comps Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 08/18] zram: filter out recomp targets based on priority Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 09/18] zram: rework recompression loop Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 10/18] zsmalloc: factor out pool locking helpers Sergey Senozhatsky
2025-02-12 16:18 ` Yosry Ahmed
2025-02-12 16:19 ` Yosry Ahmed
2025-02-13 0:57 ` Sergey Senozhatsky
2025-02-13 1:12 ` Yosry Ahmed
2025-02-13 2:54 ` Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 11/18] zsmalloc: factor out size-class " Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 12/18] zsmalloc: make zspage lock preemptible Sergey Senozhatsky
2025-02-12 17:14 ` Yosry Ahmed
2025-02-13 1:20 ` Sergey Senozhatsky
2025-02-13 1:31 ` Yosry Ahmed
2025-02-13 1:53 ` Sergey Senozhatsky
2025-02-13 11:32 ` Hillf Danton
2025-02-13 12:29 ` Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 13/18] zsmalloc: introduce new object mapping API Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 14/18] zram: switch to new zsmalloc " Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 15/18] zram: permit reclaim in zstd custom allocator Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 16/18] zram: do not leak page on recompress_store error path Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 17/18] zram: do not leak page on writeback_store " Sergey Senozhatsky
2025-02-12 6:27 ` [PATCH v5 18/18] zram: add might_sleep to zcomp API Sergey Senozhatsky
2025-02-13 0:09 ` [PATCH v5 00/18] zsmalloc/zram: there be preemption Andrew Morton
2025-02-13 0:51 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z6zF5QvTQwVoAhMP@google.com \
--to=yosry.ahmed@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=ryncsn@gmail.com \
--cc=senozhatsky@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.