public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/6] zram: split page type read/write handling
@ 2024-12-10 10:53 Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 1/6] zram: cond_resched() in writeback loop Sergey Senozhatsky
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

This is a subset of the [1] series which contains only fixes and
improvements (no new features, as the ZRAM_HUGE split is still
under consideration).

The motivation for factoring things out is that zram_write_page()
keeps getting more and more complex, because it tries to handle
too many scenarios: ZRAM_SAME store, ZRAM_HUGE store, compressed
page store with the zs_malloc() allocation slow path and
conditional recompression, etc.  Factor those out and make things
easier to handle.

The addition of cond_resched() is simply a fix: I can trigger the
watchdog from zram writeback().  And the early slot free is just
a reasonable thing to do.

[1] https://lore.kernel.org/linux-kernel/20241119072057.3440039-1-senozhatsky@chromium.org

Sergey Senozhatsky (6):
  zram: cond_resched() in writeback loop
  zram: free slot memory early during write
  zram: remove entry element member
  zram: factor out ZRAM_SAME write
  zram: factor out ZRAM_HUGE write
  zram: factor out different page types read

 drivers/block/zram/zram_drv.c | 293 +++++++++++++++++++---------------
 drivers/block/zram/zram_drv.h |   5 +-
 2 files changed, 167 insertions(+), 131 deletions(-)

-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
@ 2024-12-10 10:53 ` Sergey Senozhatsky
  2024-12-11  0:54   ` Andrew Morton
  2024-12-10 10:53 ` [PATCH 2/6] zram: free slot memory early during write Sergey Senozhatsky
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

The writeback loop can run for quite a while (depending on the
wb device performance, the compression algorithm and the number
of entries we write back), so we need to do cond_resched() there,
similarly to what we do in the recompress loop.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 5de7f30d0aa6..172546aa1fce 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
 next:
 		zram_slot_unlock(zram, index);
 		release_pp_slot(zram, pps);
+
+		cond_resched();
 	}
 
 	if (blk_idx)
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/6] zram: free slot memory early during write
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 1/6] zram: cond_resched() in writeback loop Sergey Senozhatsky
@ 2024-12-10 10:53 ` Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 3/6] zram: remove entry element member Sergey Senozhatsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

In the current implementation, an entry's previously allocated
memory is released at the very last moment, when we have already
allocated new memory for the new data.  This, basically,
temporarily increases memory usage for no good reason.  For
example, consider the case when both the old (stale) and the new
entry data are incompressible: such an entry will temporarily use
two physical pages - one for the stale (old) data and one for
the new data.  We can release the old memory as soon as we get a
write request for the entry.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 172546aa1fce..174b4053189b 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1647,6 +1647,11 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	unsigned long element = 0;
 	enum zram_pageflags flags = 0;
 
+	/* First, free memory allocated to this slot (if any) */
+	zram_slot_lock(zram, index);
+	zram_free_page(zram, index);
+	zram_slot_unlock(zram, index);
+
 	mem = kmap_local_page(page);
 	if (page_same_filled(mem, &element)) {
 		kunmap_local(mem);
@@ -1742,13 +1747,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	zs_unmap_object(zram->mem_pool, handle);
 	atomic64_add(comp_len, &zram->stats.compr_data_size);
 out:
-	/*
-	 * Free memory associated with this sector
-	 * before overwriting unused sectors.
-	 */
 	zram_slot_lock(zram, index);
-	zram_free_page(zram, index);
-
 	if (comp_len == PAGE_SIZE) {
 		zram_set_flag(zram, index, ZRAM_HUGE);
 		atomic64_inc(&zram->stats.huge_pages);
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/6] zram: remove entry element member
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 1/6] zram: cond_resched() in writeback loop Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 2/6] zram: free slot memory early during write Sergey Senozhatsky
@ 2024-12-10 10:53 ` Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 4/6] zram: factor out ZRAM_SAME write Sergey Senozhatsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

Element is in the same anonymous union as handle and hence
holds the same value, which makes the code below sort of
confusing:

    handle = zram_get_handle()
    if (!handle)
	element = zram_get_element()

Element doesn't really simplify the code, so let's just
remove it.  We already re-purpose handle to store the
block id of a written-back page.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 23 +++++------------------
 drivers/block/zram/zram_drv.h |  5 +----
 2 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 174b4053189b..f68916527846 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -112,17 +112,6 @@ static void zram_clear_flag(struct zram *zram, u32 index,
 	zram->table[index].flags &= ~BIT(flag);
 }
 
-static inline void zram_set_element(struct zram *zram, u32 index,
-			unsigned long element)
-{
-	zram->table[index].element = element;
-}
-
-static unsigned long zram_get_element(struct zram *zram, u32 index)
-{
-	return zram->table[index].element;
-}
-
 static size_t zram_get_obj_size(struct zram *zram, u32 index)
 {
 	return zram->table[index].flags & (BIT(ZRAM_FLAG_SHIFT) - 1);
@@ -879,7 +868,7 @@ static ssize_t writeback_store(struct device *dev,
 
 		zram_free_page(zram, index);
 		zram_set_flag(zram, index, ZRAM_WB);
-		zram_set_element(zram, index, blk_idx);
+		zram_set_handle(zram, index, blk_idx);
 		blk_idx = 0;
 		atomic64_inc(&zram->stats.pages_stored);
 		spin_lock(&zram->wb_limit_lock);
@@ -1502,7 +1491,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 
 	if (zram_test_flag(zram, index, ZRAM_WB)) {
 		zram_clear_flag(zram, index, ZRAM_WB);
-		free_block_bdev(zram, zram_get_element(zram, index));
+		free_block_bdev(zram, zram_get_handle(zram, index));
 		goto out;
 	}
 
@@ -1546,12 +1535,10 @@ static int zram_read_from_zspool(struct zram *zram, struct page *page,
 
 	handle = zram_get_handle(zram, index);
 	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
-		unsigned long value;
 		void *mem;
 
-		value = handle ? zram_get_element(zram, index) : 0;
 		mem = kmap_local_page(page);
-		zram_fill_page(mem, PAGE_SIZE, value);
+		zram_fill_page(mem, PAGE_SIZE, handle);
 		kunmap_local(mem);
 		return 0;
 	}
@@ -1597,7 +1584,7 @@ static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 		 */
 		zram_slot_unlock(zram, index);
 
-		ret = read_from_bdev(zram, page, zram_get_element(zram, index),
+		ret = read_from_bdev(zram, page, zram_get_handle(zram, index),
 				     parent);
 	}
 
@@ -1756,7 +1743,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 
 	if (flags) {
 		zram_set_flag(zram, index, flags);
-		zram_set_element(zram, index, element);
+		zram_set_handle(zram, index, element);
 	}  else {
 		zram_set_handle(zram, index, handle);
 		zram_set_obj_size(zram, index, comp_len);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 134be414e210..db78d7c01b9a 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -62,10 +62,7 @@ enum zram_pageflags {
 
 /* Allocated for each disk page */
 struct zram_table_entry {
-	union {
-		unsigned long handle;
-		unsigned long element;
-	};
+	unsigned long handle;
 	unsigned int flags;
 	spinlock_t lock;
 #ifdef CONFIG_ZRAM_TRACK_ENTRY_ACTIME
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/6] zram: factor out ZRAM_SAME write
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2024-12-10 10:53 ` [PATCH 3/6] zram: remove entry element member Sergey Senozhatsky
@ 2024-12-10 10:53 ` Sergey Senozhatsky
  2024-12-10 10:53 ` [PATCH 5/6] zram: factor out ZRAM_HUGE write Sergey Senozhatsky
  2024-12-10 10:54 ` [PATCH 6/6] zram: factor out different page types read Sergey Senozhatsky
  5 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

Handling of ZRAM_SAME currently uses a goto to the final stages
of zram_write_page(), plus it introduces a branch and a flags
variable, which does not make the code any simpler.  In reality,
we can handle ZRAM_SAME immediately when we detect such pages
and remove the goto and the branch.

Factor out ZRAM_SAME handling into a separate routine to
simplify zram_write_page().

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 37 ++++++++++++++++++++---------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f68916527846..22c6ab363ae6 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1622,6 +1622,20 @@ static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
 	return zram_read_page(zram, bvec->bv_page, index, bio);
 }
 
+static int write_same_filled_page(struct zram *zram, unsigned long fill,
+				  u32 index)
+{
+	zram_slot_lock(zram, index);
+	zram_set_flag(zram, index, ZRAM_SAME);
+	zram_set_handle(zram, index, fill);
+	zram_slot_unlock(zram, index);
+
+	atomic64_inc(&zram->stats.same_pages);
+	atomic64_inc(&zram->stats.pages_stored);
+
+	return 0;
+}
+
 static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 {
 	int ret = 0;
@@ -1632,7 +1646,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	void *src, *dst, *mem;
 	struct zcomp_strm *zstrm;
 	unsigned long element = 0;
-	enum zram_pageflags flags = 0;
+	bool same_filled;
 
 	/* First, free memory allocated to this slot (if any) */
 	zram_slot_lock(zram, index);
@@ -1640,14 +1654,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	zram_slot_unlock(zram, index);
 
 	mem = kmap_local_page(page);
-	if (page_same_filled(mem, &element)) {
-		kunmap_local(mem);
-		/* Free memory associated with this sector now. */
-		flags = ZRAM_SAME;
-		atomic64_inc(&zram->stats.same_pages);
-		goto out;
-	}
+	same_filled = page_same_filled(mem, &element);
 	kunmap_local(mem);
+	if (same_filled)
+		return write_same_filled_page(zram, element, index);
 
 compress_again:
 	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
@@ -1733,7 +1743,7 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 	zs_unmap_object(zram->mem_pool, handle);
 	atomic64_add(comp_len, &zram->stats.compr_data_size);
-out:
+
 	zram_slot_lock(zram, index);
 	if (comp_len == PAGE_SIZE) {
 		zram_set_flag(zram, index, ZRAM_HUGE);
@@ -1741,13 +1751,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		atomic64_inc(&zram->stats.huge_pages_since);
 	}
 
-	if (flags) {
-		zram_set_flag(zram, index, flags);
-		zram_set_handle(zram, index, element);
-	}  else {
-		zram_set_handle(zram, index, handle);
-		zram_set_obj_size(zram, index, comp_len);
-	}
+	zram_set_handle(zram, index, handle);
+	zram_set_obj_size(zram, index, comp_len);
 	zram_slot_unlock(zram, index);
 
 	/* Update stats */
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/6] zram: factor out ZRAM_HUGE write
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2024-12-10 10:53 ` [PATCH 4/6] zram: factor out ZRAM_SAME write Sergey Senozhatsky
@ 2024-12-10 10:53 ` Sergey Senozhatsky
  2024-12-10 11:31   ` Sergey Senozhatsky
  2024-12-11 10:06   ` Sergey Senozhatsky
  2024-12-10 10:54 ` [PATCH 6/6] zram: factor out different page types read Sergey Senozhatsky
  5 siblings, 2 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

zram_write_page() handles three cases: ZRAM_SAME page stores
(already factored out), regular page stores and ZRAM_HUGE page
stores.

ZRAM_HUGE handling adds a significant amount of complexity.
Instead, we can handle ZRAM_HUGE in a separate function.  This
allows us to simplify the zs_handle allocation slow path, as it
no longer needs to handle the ZRAM_HUGE case.  The ZRAM_HUGE
zs_handle allocation, on the other hand, can now drop
__GFP_KSWAPD_RECLAIM because we handle ZRAM_HUGE in a
preemptible context (outside of the local-lock scope).

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 139 ++++++++++++++++++++--------------
 1 file changed, 83 insertions(+), 56 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 22c6ab363ae6..18263e4c208e 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -132,6 +132,27 @@ static inline bool zram_allocated(struct zram *zram, u32 index)
 			zram_test_flag(zram, index, ZRAM_WB);
 }
 
+static inline void update_used_max(struct zram *zram, const unsigned long pages)
+{
+	unsigned long cur_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		if (cur_max >= pages)
+			return;
+	} while (!atomic_long_try_cmpxchg(&zram->stats.max_used_pages,
+					  &cur_max, pages));
+}
+
+static bool zram_can_store_page(struct zram *zram)
+{
+	unsigned long alloced_pages;
+
+	alloced_pages = zs_get_total_pages(zram->mem_pool);
+	update_used_max(zram, alloced_pages);
+
+	return !zram->limit_pages || alloced_pages <= zram->limit_pages;
+}
+
 #if PAGE_SIZE != 4096
 static inline bool is_partial_io(struct bio_vec *bvec)
 {
@@ -266,18 +287,6 @@ static struct zram_pp_slot *select_pp_slot(struct zram_pp_ctl *ctl)
 }
 #endif
 
-static inline void update_used_max(struct zram *zram,
-					const unsigned long pages)
-{
-	unsigned long cur_max = atomic_long_read(&zram->stats.max_used_pages);
-
-	do {
-		if (cur_max >= pages)
-			return;
-	} while (!atomic_long_try_cmpxchg(&zram->stats.max_used_pages,
-					  &cur_max, pages));
-}
-
 static inline void zram_fill_page(void *ptr, unsigned long len,
 					unsigned long value)
 {
@@ -1636,14 +1645,55 @@ static int write_same_filled_page(struct zram *zram, unsigned long fill,
 	return 0;
 }
 
+static int write_incompressible_page(struct zram *zram, struct page *page,
+				     u32 index)
+{
+	unsigned long handle;
+	void *src, *dst;
+
+	/*
+	 * This function is called from preemptible context so we don't need
+	 * to do optimistic and fallback to pessimistic handle allocation,
+	 * like we do for compressible pages.
+	 */
+	handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
+			   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
+	if (IS_ERR_VALUE(handle))
+		return PTR_ERR((void *)handle);
+
+	if (!zram_can_store_page(zram)) {
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+		zs_free(zram->mem_pool, handle);
+		return -ENOMEM;
+	}
+
+	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
+	src = kmap_local_page(page);
+	memcpy(dst, src, PAGE_SIZE);
+	kunmap_local(src);
+	zs_unmap_object(zram->mem_pool, handle);
+
+	zram_slot_lock(zram, index);
+	zram_set_flag(zram, index, ZRAM_HUGE);
+	zram_set_handle(zram, index, handle);
+	zram_set_obj_size(zram, index, PAGE_SIZE);
+	zram_slot_unlock(zram, index);
+
+	atomic64_add(PAGE_SIZE, &zram->stats.compr_data_size);
+	atomic64_inc(&zram->stats.huge_pages);
+	atomic64_inc(&zram->stats.huge_pages_since);
+	atomic64_inc(&zram->stats.pages_stored);
+
+	return 0;
+}
+
 static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 {
 	int ret = 0;
-	unsigned long alloced_pages;
 	unsigned long handle = -ENOMEM;
 	unsigned int comp_len = 0;
 	unsigned int last_comp_len = 0;
-	void *src, *dst, *mem;
+	void *dst, *mem;
 	struct zcomp_strm *zstrm;
 	unsigned long element = 0;
 	bool same_filled;
@@ -1661,10 +1711,10 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 
 compress_again:
 	zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
-	src = kmap_local_page(page);
+	mem = kmap_local_page(page);
 	ret = zcomp_compress(zram->comps[ZRAM_PRIMARY_COMP], zstrm,
-			     src, &comp_len);
-	kunmap_local(src);
+			     mem, &comp_len);
+	kunmap_local(mem);
 
 	if (unlikely(ret)) {
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
@@ -1673,13 +1723,16 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 		return ret;
 	}
 
-	if (comp_len >= huge_class_size)
-		comp_len = PAGE_SIZE;
-
 	if (last_comp_len && (last_comp_len != comp_len)) {
 		zs_free(zram->mem_pool, handle);
 		handle = -ENOMEM;
 	}
+
+	if (comp_len >= huge_class_size) {
+		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
+		return write_incompressible_page(zram, page, index);
+	}
+
 	/*
 	 * handle allocation has 2 paths:
 	 * a) fast path is executed with preemption disabled (for
@@ -1695,37 +1748,22 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	 */
 	if (IS_ERR_VALUE(handle))
 		handle = zs_malloc(zram->mem_pool, comp_len,
-				__GFP_KSWAPD_RECLAIM |
-				__GFP_NOWARN |
-				__GFP_HIGHMEM |
-				__GFP_MOVABLE);
+				   __GFP_KSWAPD_RECLAIM |
+				   __GFP_NOWARN |
+				   __GFP_HIGHMEM |
+				   __GFP_MOVABLE);
 	if (IS_ERR_VALUE(handle)) {
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 		atomic64_inc(&zram->stats.writestall);
 		handle = zs_malloc(zram->mem_pool, comp_len,
-				GFP_NOIO | __GFP_HIGHMEM |
-				__GFP_MOVABLE);
+				   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
 		if (IS_ERR_VALUE(handle))
 			return PTR_ERR((void *)handle);
 
-		if (comp_len != PAGE_SIZE) {
-			last_comp_len = comp_len;
-			goto compress_again;
-		}
-		/*
-		 * If the page is not compressible, you need to acquire the
-		 * lock and execute the code below. The zcomp_stream_get()
-		 * call is needed to disable the cpu hotplug and grab the
-		 * zstrm buffer back. It is necessary that the dereferencing
-		 * of the zstrm variable below occurs correctly.
-		 */
-		zstrm = zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP]);
+		goto compress_again;
 	}
 
-	alloced_pages = zs_get_total_pages(zram->mem_pool);
-	update_used_max(zram, alloced_pages);
-
-	if (zram->limit_pages && alloced_pages > zram->limit_pages) {
+	if (!zram_can_store_page(zram)) {
 		zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 		zs_free(zram->mem_pool, handle);
 		return -ENOMEM;
@@ -1733,30 +1771,19 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 
 	dst = zs_map_object(zram->mem_pool, handle, ZS_MM_WO);
 
-	src = zstrm->buffer;
-	if (comp_len == PAGE_SIZE)
-		src = kmap_local_page(page);
-	memcpy(dst, src, comp_len);
-	if (comp_len == PAGE_SIZE)
-		kunmap_local(src);
-
+	memcpy(dst, zstrm->buffer, comp_len);
 	zcomp_stream_put(zram->comps[ZRAM_PRIMARY_COMP]);
 	zs_unmap_object(zram->mem_pool, handle);
-	atomic64_add(comp_len, &zram->stats.compr_data_size);
 
 	zram_slot_lock(zram, index);
-	if (comp_len == PAGE_SIZE) {
-		zram_set_flag(zram, index, ZRAM_HUGE);
-		atomic64_inc(&zram->stats.huge_pages);
-		atomic64_inc(&zram->stats.huge_pages_since);
-	}
-
 	zram_set_handle(zram, index, handle);
 	zram_set_obj_size(zram, index, comp_len);
 	zram_slot_unlock(zram, index);
 
 	/* Update stats */
 	atomic64_inc(&zram->stats.pages_stored);
+	atomic64_add(comp_len, &zram->stats.compr_data_size);
+
 	return ret;
 }
 
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/6] zram: factor out different page types read
  2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2024-12-10 10:53 ` [PATCH 5/6] zram: factor out ZRAM_HUGE write Sergey Senozhatsky
@ 2024-12-10 10:54 ` Sergey Senozhatsky
  5 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 10:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

Similarly to the write path, split the page read code into
ZRAM_HUGE read, ZRAM_SAME read and compressed page read to
simplify the code.

Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
---
 drivers/block/zram/zram_drv.c | 85 +++++++++++++++++++++--------------
 1 file changed, 52 insertions(+), 33 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 18263e4c208e..e4a7191ec13c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1528,54 +1528,73 @@ static void zram_free_page(struct zram *zram, size_t index)
 	zram_set_obj_size(zram, index, 0);
 }
 
-/*
- * Reads (decompresses if needed) a page from zspool (zsmalloc).
- * Corresponding ZRAM slot should be locked.
- */
-static int zram_read_from_zspool(struct zram *zram, struct page *page,
+static int read_same_filled_page(struct zram *zram, struct page *page,
 				 u32 index)
 {
-	struct zcomp_strm *zstrm;
+	void *mem;
+
+	mem = kmap_local_page(page);
+	zram_fill_page(mem, PAGE_SIZE, zram_get_handle(zram, index));
+	kunmap_local(mem);
+	return 0;
+}
+
+static int read_incompressible_page(struct zram *zram, struct page *page,
+				    u32 index)
+{
 	unsigned long handle;
-	unsigned int size;
 	void *src, *dst;
-	u32 prio;
-	int ret;
 
 	handle = zram_get_handle(zram, index);
-	if (!handle || zram_test_flag(zram, index, ZRAM_SAME)) {
-		void *mem;
+	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
+	dst = kmap_local_page(page);
+	copy_page(dst, src);
+	kunmap_local(dst);
+	zs_unmap_object(zram->mem_pool, handle);
 
-		mem = kmap_local_page(page);
-		zram_fill_page(mem, PAGE_SIZE, handle);
-		kunmap_local(mem);
-		return 0;
-	}
+	return 0;
+}
 
-	size = zram_get_obj_size(zram, index);
+static int read_compressed_page(struct zram *zram, struct page *page, u32 index)
+{
+	struct zcomp_strm *zstrm;
+	unsigned long handle;
+	unsigned int size;
+	void *src, *dst;
+	int ret, prio;
 
-	if (size != PAGE_SIZE) {
-		prio = zram_get_priority(zram, index);
-		zstrm = zcomp_stream_get(zram->comps[prio]);
-	}
+	handle = zram_get_handle(zram, index);
+	size = zram_get_obj_size(zram, index);
+	prio = zram_get_priority(zram, index);
 
+	zstrm = zcomp_stream_get(zram->comps[prio]);
 	src = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);
-	if (size == PAGE_SIZE) {
-		dst = kmap_local_page(page);
-		copy_page(dst, src);
-		kunmap_local(dst);
-		ret = 0;
-	} else {
-		dst = kmap_local_page(page);
-		ret = zcomp_decompress(zram->comps[prio], zstrm,
-				       src, size, dst);
-		kunmap_local(dst);
-		zcomp_stream_put(zram->comps[prio]);
-	}
+	dst = kmap_local_page(page);
+	ret = zcomp_decompress(zram->comps[prio], zstrm, src, size, dst);
+	kunmap_local(dst);
 	zs_unmap_object(zram->mem_pool, handle);
+	zcomp_stream_put(zram->comps[prio]);
+
 	return ret;
 }
 
+/*
+ * Reads (decompresses if needed) a page from zspool (zsmalloc).
+ * Corresponding ZRAM slot should be locked.
+ */
+static int zram_read_from_zspool(struct zram *zram, struct page *page,
+				 u32 index)
+{
+	if (zram_test_flag(zram, index, ZRAM_SAME) ||
+	    !zram_get_handle(zram, index))
+		return read_same_filled_page(zram, page, index);
+
+	if (!zram_test_flag(zram, index, ZRAM_HUGE))
+		return read_compressed_page(zram, page, index);
+	else
+		return read_incompressible_page(zram, page, index);
+}
+
 static int zram_read_page(struct zram *zram, struct page *page, u32 index,
 			  struct bio *parent)
 {
-- 
2.47.1.613.gc27f4b7a9f-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] zram: factor out ZRAM_HUGE write
  2024-12-10 10:53 ` [PATCH 5/6] zram: factor out ZRAM_HUGE write Sergey Senozhatsky
@ 2024-12-10 11:31   ` Sergey Senozhatsky
  2024-12-11 10:06   ` Sergey Senozhatsky
  1 sibling, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-10 11:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Minchan Kim, linux-kernel, Sergey Senozhatsky

On (24/12/10 19:53), Sergey Senozhatsky wrote:
> @@ -1673,13 +1723,16 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
>  		return ret;
>  	}
>  
> -	if (comp_len >= huge_class_size)
> -		comp_len = PAGE_SIZE;
> -
>  	if (last_comp_len && (last_comp_len != comp_len)) {
>  		zs_free(zram->mem_pool, handle);
>  		handle = -ENOMEM;
>  	}

Andrew, JFI, this requires [1] fixup to be applied.

[1] https://lore.kernel.org/mm-commits/20241210093835.GN16709@google.com/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-10 10:53 ` [PATCH 1/6] zram: cond_resched() in writeback loop Sergey Senozhatsky
@ 2024-12-11  0:54   ` Andrew Morton
  2024-12-11  3:43     ` Sergey Senozhatsky
  2024-12-11  4:11     ` Sergey Senozhatsky
  0 siblings, 2 replies; 16+ messages in thread
From: Andrew Morton @ 2024-12-11  0:54 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Minchan Kim, linux-kernel

On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:

> The writeback loop can run for quite a while (depending on the
> wb device performance, the compression algorithm and the number
> of entries we write back), so we need to do cond_resched() there,
> similarly to what we do in the recompress loop.
> 
> ...
>
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
>  next:
>  		zram_slot_unlock(zram, index);
>  		release_pp_slot(zram, pps);
> +
> +		cond_resched();
>  	}
>  
>  	if (blk_idx)

Should this be treated as a hotfix?  With a -stable backport?

If so, we'd need to explain our reasoning in the changelog.  "Fixes a
watchdog lockup splat when running <workload>".  And a Fixes: would be
nice if appropriate.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-11  0:54   ` Andrew Morton
@ 2024-12-11  3:43     ` Sergey Senozhatsky
  2024-12-11  3:59       ` Sergey Senozhatsky
  2024-12-11  4:11     ` Sergey Senozhatsky
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-11  3:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, Minchan Kim, linux-kernel

On (24/12/10 16:54), Andrew Morton wrote:
> > The writeback loop can run for quite a while (depending on the
> > wb device performance, the compression algorithm and the number
> > of entries we write back), so we need to do cond_resched() there,
> > similarly to what we do in the recompress loop.
> > 
> > ...
> >
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
> >  next:
> >  		zram_slot_unlock(zram, index);
> >  		release_pp_slot(zram, pps);
> > +
> > +		cond_resched();
> >  	}
> >  
> >  	if (blk_idx)
> 
> Should this be treated as a hotfix?  With a -stable backport?
> 
> If so, we'd need to explain our reasoning in the changelog.  "Fixes a
> watchdog lockup splat when running <workload>".  And a Fixes: would be
> nice if appropriate.

Good point.  This fixes a commit from 2018, I guess no one runs writebacks
on preempt-none systems, but I don't see why this should be in -stable.
I'll send an updated patch in a bit.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-11  3:43     ` Sergey Senozhatsky
@ 2024-12-11  3:59       ` Sergey Senozhatsky
  0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-11  3:59 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, Minchan Kim, linux-kernel

On (24/12/11 12:43), Sergey Senozhatsky wrote:
> > Should this be treated as a hotfix?  With a -stable backport?
> > 
> > If so, we'd need to explain our reasoning in the changelog.  "Fixes a
> > watchdog lockup splat when running <workload>".  And a Fixes: would be
> > nice if appropriate.
> 
> Good point.  This fixes a commit from 2018, I guess no one runs writebacks
> on preempt-none systems, but I don't see why this should be in -stable.
                                                 ^^ should not be in -stable

> I'll send an updated patch in a bit.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-11  0:54   ` Andrew Morton
  2024-12-11  3:43     ` Sergey Senozhatsky
@ 2024-12-11  4:11     ` Sergey Senozhatsky
  2024-12-11  7:49       ` Sergey Senozhatsky
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-11  4:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, Minchan Kim, linux-kernel

On (24/12/10 16:54), Andrew Morton wrote:
> On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> > The writeback loop can run for quite a while (depending on the
> > wb device performance, the compression algorithm and the number
> > of entries we write back), so we need to do cond_resched() there,
> > similarly to what we do in the recompress loop.
> > 
> > ...
> >
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
> >  next:
> >  		zram_slot_unlock(zram, index);
> >  		release_pp_slot(zram, pps);
> > +
> > +		cond_resched();
> >  	}
> >  
> >  	if (blk_idx)
> 
> Should this be treated as a hotfix?  With a -stable backport?

Actually... can I please ask you to drop this particular patch [1]
for now?  The stall should not happen, because submit_bio_wait() is
a rescheduling point (in blk_wait_io()).  So I'm not sure why I'm
seeing unhappy watchdogs.

[1] https://lore.kernel.org/mm-commits/20241211005510.842DFC4CED6@smtp.kernel.org

* Re: [PATCH 1/6] zram: cond_resched() in writeback loop
  2024-12-11  4:11     ` Sergey Senozhatsky
@ 2024-12-11  7:49       ` Sergey Senozhatsky
  0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-11  7:49 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, Minchan Kim, linux-kernel

On (24/12/11 13:11), Sergey Senozhatsky wrote:
> On (24/12/10 16:54), Andrew Morton wrote:
> > On Tue, 10 Dec 2024 19:53:55 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> > 
> > > Writeback loop can run for quite a while (depending on
> > > wb device performance, compression algorithm and the
> > > number of entries we writeback), so we need to do
> > > cond_resched() there, similarly to what we do in
> > > recompress loop.
> > > 
> > > ...
> > >
> > > --- a/drivers/block/zram/zram_drv.c
> > > +++ b/drivers/block/zram/zram_drv.c
> > > @@ -889,6 +889,8 @@ static ssize_t writeback_store(struct device *dev,
> > >  next:
> > >  		zram_slot_unlock(zram, index);
> > >  		release_pp_slot(zram, pps);
> > > +
> > > +		cond_resched();
> > >  	}
> > >  
> > >  	if (blk_idx)
> > 
> > Should this be treated as a hotfix?  With a -stable backport?
> 
> Actually... can I please ask you to drop this [1] particular patch for
> now?  The stall should not happen, because submit_bio_wait() is a
> rescheduling point (in blk_wait_io()).  So I'm not sure why I'm seeing
> unhappy watchdogs.

OK, so.  submit_bio_wait() is not necessarily a rescheduling point.
By the time it calls blk_wait_io() the I/O can already be completed
so it won't schedule().  Why would I/O be completed is another story.
For instance, the backing device may have BD_HAS_SUBMIT_BIO bit set
so __submit_bio() would call disk->fops->submit_bio(bio) on the backing
device directly.  So on such setups we end up in a loop

		for_each (target slot) {
			decompress slot
			submit bio
				disk->fops->submit_bio
		}

without rescheduling.

* Re: [PATCH 5/6] zram: factor out ZRAM_HUGE write
  2024-12-10 10:53 ` [PATCH 5/6] zram: factor out ZRAM_HUGE write Sergey Senozhatsky
  2024-12-10 11:31   ` Sergey Senozhatsky
@ 2024-12-11 10:06   ` Sergey Senozhatsky
  2024-12-11 23:51     ` Andrew Morton
  1 sibling, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-11 10:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, Minchan Kim, linux-kernel

On (24/12/10 19:53), Sergey Senozhatsky wrote:
[..]
> -				GFP_NOIO | __GFP_HIGHMEM |
> -				__GFP_MOVABLE);
> +				   GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
>  		if (IS_ERR_VALUE(handle))
>  			return PTR_ERR((void *)handle);
>  
> -		if (comp_len != PAGE_SIZE) {
> -			last_comp_len = comp_len;
> -			goto compress_again;
> -		}

I deleted too much; we need `last_comp_len = comp_len`.  This is why
I think zram_write_page() is too complex now.  (I'm still very very
unsure about the last_comp_len patch.)


Andrew, please drop this whole series. I'll resend the combined
series (including zram writeback cond_resched() patch).

* Re: [PATCH 5/6] zram: factor out ZRAM_HUGE write
  2024-12-11 10:06   ` Sergey Senozhatsky
@ 2024-12-11 23:51     ` Andrew Morton
  2024-12-12  3:45       ` Sergey Senozhatsky
  0 siblings, 1 reply; 16+ messages in thread
From: Andrew Morton @ 2024-12-11 23:51 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Minchan Kim, linux-kernel

On Wed, 11 Dec 2024 19:06:38 +0900 Sergey Senozhatsky <senozhatsky@chromium.org> wrote:

> Andrew, please drop this whole series. I'll resend the combined
> series (including zram writeback cond_resched() patch).

Gone.

Probably it's better to include linux-mm on zram patches?

* Re: [PATCH 5/6] zram: factor out ZRAM_HUGE write
  2024-12-11 23:51     ` Andrew Morton
@ 2024-12-12  3:45       ` Sergey Senozhatsky
  0 siblings, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2024-12-12  3:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Sergey Senozhatsky, Minchan Kim, linux-kernel

On (24/12/11 15:51), Andrew Morton wrote:
> 
> Probably it's better to include linux-mm on zram patches?

OK, I'll try to.

end of thread, other threads:[~2024-12-12  3:45 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-10 10:53 [PATCH 0/6] zram: split page type read/write handling Sergey Senozhatsky
2024-12-10 10:53 ` [PATCH 1/6] zram: cond_resched() in writeback loop Sergey Senozhatsky
2024-12-11  0:54   ` Andrew Morton
2024-12-11  3:43     ` Sergey Senozhatsky
2024-12-11  3:59       ` Sergey Senozhatsky
2024-12-11  4:11     ` Sergey Senozhatsky
2024-12-11  7:49       ` Sergey Senozhatsky
2024-12-10 10:53 ` [PATCH 2/6] zram: free slot memory early during write Sergey Senozhatsky
2024-12-10 10:53 ` [PATCH 3/6] zram: remove entry element member Sergey Senozhatsky
2024-12-10 10:53 ` [PATCH 4/6] zram: factor out ZRAM_SAME write Sergey Senozhatsky
2024-12-10 10:53 ` [PATCH 5/6] zram: factor out ZRAM_HUGE write Sergey Senozhatsky
2024-12-10 11:31   ` Sergey Senozhatsky
2024-12-11 10:06   ` Sergey Senozhatsky
2024-12-11 23:51     ` Andrew Morton
2024-12-12  3:45       ` Sergey Senozhatsky
2024-12-10 10:54 ` [PATCH 6/6] zram: factor out different page types read Sergey Senozhatsky
