linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/8] mm: Remove PG_reclaim
@ 2025-01-13  9:34 Kirill A. Shutemov
  2025-01-13  9:34 ` [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios Kirill A. Shutemov
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

Use PG_dropbehind instead of PG_reclaim and remove PG_reclaim.

With PG_reclaim removed, PG_readahead becomes the exclusive user of its
page flag bit.
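To make the shared-bit conflict concrete, here is a small userspace model (not kernel code). In one layout the reclaim and readahead names alias a single bit, as they did before this series. In the other, dropbehind has its own bit. The toy_* names and the bit positions are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for folio->flags; bit numbers are illustrative. */
struct toy_folio {
	unsigned long flags;
};

/* Before the series: PG_readahead was defined as PG_reclaim. */
enum { OLD_PG_reclaim = 1, OLD_PG_readahead = OLD_PG_reclaim };
/* After the series: readahead keeps the bit, dropbehind has another one. */
enum { NEW_PG_readahead = 1, NEW_PG_dropbehind = 2 };

static_assert(OLD_PG_readahead == OLD_PG_reclaim, "old layout aliases the bit");
static_assert(NEW_PG_readahead != NEW_PG_dropbehind, "new layout does not");

static void toy_set(struct toy_folio *f, int bit)
{
	f->flags |= 1UL << bit;
}

static bool toy_test(const struct toy_folio *f, int bit)
{
	return f->flags & (1UL << bit);
}

/* A write path marks the folio for early freeing: does readahead see it? */
static bool old_write_mark_looks_like_readahead(void)
{
	struct toy_folio f = { 0 };

	toy_set(&f, OLD_PG_reclaim);
	return toy_test(&f, OLD_PG_readahead);	/* true: shared bit */
}

static bool new_write_mark_looks_like_readahead(void)
{
	struct toy_folio f = { 0 };

	toy_set(&f, NEW_PG_dropbehind);
	return toy_test(&f, NEW_PG_readahead);	/* false: separate bits */
}
```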

Kirill A. Shutemov (8):
  drm/i915/gem: Convert __shmem_writeback() to folios
  drm/i915/gem: Use PG_dropbehind instead of PG_reclaim
  mm/zswap: Use PG_dropbehind instead of PG_reclaim
  mm/swap: Use PG_dropbehind instead of PG_reclaim
  mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  mm/vmscan: Use PG_dropbehind instead of PG_reclaim in
    shrink_folio_list()
  mm/mglru: Check PG_dropcache instead of PG_reclaim in
    lru_gen_folio_seq()
  mm: Remove PG_reclaim

 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 18 ++++-----
 fs/fuse/dev.c                             |  2 +-
 fs/proc/page.c                            |  2 +-
 include/linux/mm_inline.h                 |  4 +-
 include/linux/page-flags.h                | 15 +++-----
 include/trace/events/mmflags.h            |  2 +-
 include/uapi/linux/kernel-page-flags.h    |  2 +-
 mm/filemap.c                              | 12 ------
 mm/migrate.c                              | 10 +----
 mm/page-writeback.c                       | 16 +-------
 mm/page_io.c                              | 15 +++-----
 mm/swap.c                                 | 24 +-----------
 mm/vmscan.c                               | 46 ++++++-----------------
 mm/zswap.c                                |  4 +-
 tools/mm/page-types.c                     |  8 +---
 15 files changed, 41 insertions(+), 139 deletions(-)

-- 
2.45.2



* [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:05   ` David Hildenbrand
  2025-01-13  9:34 ` [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim Kirill A. Shutemov
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

Convert __shmem_writeback() to use folios instead of pages.

This is in preparation for removing PG_reclaim.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..9016832b20fc 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -320,25 +320,25 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
 
 	/* Begin writeback on each dirty page */
 	for (i = 0; i < size >> PAGE_SHIFT; i++) {
-		struct page *page;
+		struct folio *folio;
 
-		page = find_lock_page(mapping, i);
-		if (!page)
+		folio = filemap_lock_folio(mapping, i);
+		if (IS_ERR(folio))
 			continue;
 
-		if (!page_mapped(page) && clear_page_dirty_for_io(page)) {
+		if (!folio_mapped(folio) && folio_clear_dirty_for_io(folio)) {
 			int ret;
 
-			SetPageReclaim(page);
-			ret = mapping->a_ops->writepage(page, &wbc);
+			folio_set_reclaim(folio);
+			ret = mapping->a_ops->writepage(&folio->page, &wbc);
 			if (!PageWriteback(page))
-				ClearPageReclaim(page);
+				folio_clear_reclaim(folio);
 			if (!ret)
 				goto put;
 		}
-		unlock_page(page);
+		folio_unlock(folio);
 put:
-		put_page(page);
+		folio_put(folio);
 	}
 }
 
-- 
2.45.2



* [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
  2025-01-13  9:34 ` [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:06   ` David Hildenbrand
  2025-01-13  9:34 ` [PATCH 3/8] mm/zswap: " Kirill A. Shutemov
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
__shmem_writeback().

It is safe to leave PG_dropbehind set on the folio if, for some reason
(possibly a bug), the folio is not under writeback after ->writepage().
In that case the kernel used to clear PG_reclaim, because PG_reclaim
shared a page flag bit with PG_readahead.
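The difference at writeback completion can be sketched with a simplified userspace model. Following the commit message and the folio_rotate_reclaimable() comment removed later in this series, it assumes PG_reclaim only moved the folio to the tail of the inactive list for vmscan to free later, while PG_dropbehind lets the completion path drop the folio itself. The toy_* names are hypothetical, and can_invalidate is a hypothetical stand-in for whatever conditions the real completion path must meet before it can free the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified folio state; not the kernel's struct folio. */
struct toy_folio {
	bool writeback;
	bool reclaim;		/* old hint: rotate so vmscan finds it */
	bool dropbehind;	/* new hint: drop on I/O completion */
	bool can_invalidate;	/* stand-in for "completion path may free it" */
};

enum toy_outcome {
	TOY_KEPT,		/* stays in the page cache as-is */
	TOY_ROTATED,		/* moved to the inactive tail; vmscan frees it later */
	TOY_DROPPED,		/* freed right at writeback completion */
};

static enum toy_outcome toy_end_writeback(struct toy_folio *f)
{
	assert(f->writeback);
	f->writeback = false;

	if (f->reclaim) {
		f->reclaim = false;
		return TOY_ROTATED;
	}
	if (f->dropbehind && f->can_invalidate)
		return TOY_DROPPED;
	return TOY_KEPT;
}
```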

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index 9016832b20fc..c1724847c001 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -329,10 +329,8 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
 		if (!folio_mapped(folio) && folio_clear_dirty_for_io(folio)) {
 			int ret;
 
-			folio_set_reclaim(folio);
+			folio_set_dropbehind(folio);
 			ret = mapping->a_ops->writepage(&folio->page, &wbc);
-			if (!PageWriteback(page))
-				folio_clear_reclaim(folio);
 			if (!ret)
 				goto put;
 		}
-- 
2.45.2



* [PATCH 3/8] mm/zswap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
  2025-01-13  9:34 ` [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios Kirill A. Shutemov
  2025-01-13  9:34 ` [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:06   ` David Hildenbrand
  2025-01-13 16:10   ` Yosry Ahmed
  2025-01-13  9:34 ` [PATCH 4/8] mm/swap: " Kirill A. Shutemov
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
zswap_writeback_entry().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/zswap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 167ae641379f..c20bad0b0978 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1096,8 +1096,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
 	/* folio is up to date */
 	folio_mark_uptodate(folio);
 
-	/* move it to the tail of the inactive list after end_writeback */
-	folio_set_reclaim(folio);
+	/* free the folio after writeback */
+	folio_set_dropbehind(folio);
 
 	/* start writeback */
 	__swap_writepage(folio, &wbc);
-- 
2.45.2



* [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (2 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 3/8] mm/zswap: " Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:07   ` David Hildenbrand
  2025-01-13 16:17   ` Yosry Ahmed
  2025-01-13  9:34 ` [PATCH 5/8] mm/vmscan: " Kirill A. Shutemov
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
lru_deactivate_file().

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/swap.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index fc8281ef4241..4eb33b4804a8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
 	folio_clear_referenced(folio);
 
 	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
-		/*
-		 * Setting the reclaim flag could race with
-		 * folio_end_writeback() and confuse readahead.  But the
-		 * race window is _really_ small and  it's not a critical
-		 * problem.
-		 */
 		lruvec_add_folio(lruvec, folio);
-		folio_set_reclaim(folio);
+		folio_set_dropbehind(folio);
 	} else {
 		/*
 		 * The folio's writeback ended while it was in the batch.
-- 
2.45.2



* [PATCH 5/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (3 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 4/8] mm/swap: " Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:07   ` David Hildenbrand
  2025-01-13  9:34 ` [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list() Kirill A. Shutemov
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
pageout().

It is safe to leave PG_dropbehind set on the folio if, for some reason
(possibly a bug), the folio is not under writeback after ->writepage().
In that case the kernel used to clear PG_reclaim, because PG_reclaim
shared a page flag bit with PG_readahead.
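The control flow that pageout() keeps after this change can be modelled in userspace as below. This is a sketch, not kernel code: toy_pageout() and the writepage callbacks are hypothetical, and TOY_WP_ACTIVATE stands in for AOP_WRITEPAGE_ACTIVATE, the only result that undoes the hint. A synchronous writepage that leaves the folio not under writeback needs no cleanup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified folio state. */
struct toy_folio {
	bool dropbehind;
	bool writeback;
};

enum toy_wp_result { TOY_WP_OK, TOY_WP_ACTIVATE };
enum toy_pageout_t { TOY_PAGE_SUCCESS, TOY_PAGE_ACTIVATE };

typedef enum toy_wp_result (*toy_writepage_fn)(struct toy_folio *);

static enum toy_pageout_t toy_pageout(struct toy_folio *f,
				      toy_writepage_fn writepage)
{
	f->dropbehind = true;		/* mark before issuing the write */

	if (writepage(f) == TOY_WP_ACTIVATE) {
		f->dropbehind = false;	/* folio is kept: undo the hint */
		return TOY_PAGE_ACTIVATE;
	}
	/* No cleanup even if the folio is not under writeback. */
	return TOY_PAGE_SUCCESS;
}

/* Example callbacks: async I/O, synchronous write, and a refusal. */
static enum toy_wp_result toy_async_wp(struct toy_folio *f)
{
	f->writeback = true;
	return TOY_WP_OK;
}

static enum toy_wp_result toy_sync_wp(struct toy_folio *f)
{
	f->writeback = false;
	return TOY_WP_OK;
}

static enum toy_wp_result toy_refuse_wp(struct toy_folio *f)
{
	(void)f;
	return TOY_WP_ACTIVATE;
}
```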

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/vmscan.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a099876fa029..d15f80333d6b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
 		if (shmem_mapping(mapping) && folio_test_large(folio))
 			wbc.list = folio_list;
 
-		folio_set_reclaim(folio);
+		folio_set_dropbehind(folio);
+
 		res = mapping->a_ops->writepage(&folio->page, &wbc);
 		if (res < 0)
 			handle_write_error(mapping, folio, res);
 		if (res == AOP_WRITEPAGE_ACTIVATE) {
-			folio_clear_reclaim(folio);
+			folio_clear_dropbehind(folio);
 			return PAGE_ACTIVATE;
 		}
 
-		if (!folio_test_writeback(folio)) {
-			/* synchronous write or broken a_ops? */
-			folio_clear_reclaim(folio);
-		}
 		trace_mm_vmscan_write_folio(folio);
 		node_stat_add_folio(folio, NR_VMSCAN_WRITE);
 		return PAGE_SUCCESS;
-- 
2.45.2



* [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list()
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (4 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 5/8] mm/vmscan: " Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:08   ` David Hildenbrand
  2025-01-13  9:34 ` [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq() Kirill A. Shutemov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The recently introduced PG_dropbehind allows for freeing folios
immediately after writeback. Unlike PG_reclaim, it does not need vmscan
to be involved to get the folio freed.

Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
shrink_folio_list().

It is safe to leave PG_dropbehind set on the folio if, for some reason
(possibly a bug), the folio is not under writeback after ->writepage().
In that case the kernel used to clear PG_reclaim, because PG_reclaim
shared a page flag bit with PG_readahead.

Also use PG_dropbehind instead of PG_reclaim to detect I/O congestion.
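The congestion heuristic in shrink_folio_list() counts folios that are still under writeback yet already carry the immediate-reclaim hint; per the existing comment there, such folios have reached the end of the LRU a second time before their I/O completed. A userspace sketch of that counting with the new flag, using hypothetical toy_* types in place of the kernel's folio and reclaim statistics:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a folio and reclaim statistics. */
struct toy_folio {
	bool writeback;
	bool dropbehind;
	unsigned long nr_pages;
};

struct toy_reclaim_stat {
	unsigned long nr_congested;
};

/*
 * Mirrors: if (writeback && folio_test_dropbehind(folio))
 *                  stat->nr_congested += nr_pages;
 */
static void toy_count_congested(const struct toy_folio *folios, size_t n,
				struct toy_reclaim_stat *stat)
{
	for (size_t i = 0; i < n; i++) {
		if (folios[i].writeback && folios[i].dropbehind)
			stat->nr_congested += folios[i].nr_pages;
	}
}
```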

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 mm/vmscan.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d15f80333d6b..bb5ec22f97b5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1140,7 +1140,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * for immediate reclaim are making it to the end of
 		 * the LRU a second time.
 		 */
-		if (writeback && folio_test_reclaim(folio))
+		if (writeback && folio_test_dropbehind(folio))
 			stat->nr_congested += nr_pages;
 
 		/*
@@ -1149,7 +1149,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 *
 		 * 1) If reclaim is encountering an excessive number
 		 *    of folios under writeback and this folio has both
-		 *    the writeback and reclaim flags set, then it
+		 *    the writeback and dropbehind flags set, then it
 		 *    indicates that folios are being queued for I/O but
 		 *    are being recycled through the LRU before the I/O
 		 *    can complete. Waiting on the folio itself risks an
@@ -1174,7 +1174,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 *    would probably show more reasons.
 		 *
 		 * 3) Legacy memcg encounters a folio that already has the
-		 *    reclaim flag set. memcg does not have any dirty folio
+		 *    dropbehind flag set. memcg does not have any dirty folio
 		 *    throttling so we could easily OOM just because too many
 		 *    folios are in writeback and there is nothing else to
 		 *    reclaim. Wait for the writeback to complete.
@@ -1193,31 +1193,17 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 			/* Case 1 above */
 			if (current_is_kswapd() &&
-			    folio_test_reclaim(folio) &&
+			    folio_test_dropbehind(folio) &&
 			    test_bit(PGDAT_WRITEBACK, &pgdat->flags)) {
 				stat->nr_immediate += nr_pages;
 				goto activate_locked;
 
 			/* Case 2 above */
 			} else if (writeback_throttling_sane(sc) ||
-			    !folio_test_reclaim(folio) ||
+			    !folio_test_dropbehind(folio) ||
 			    !may_enter_fs(folio, sc->gfp_mask) ||
 			    (mapping && mapping_writeback_indeterminate(mapping))) {
-				/*
-				 * This is slightly racy -
-				 * folio_end_writeback() might have
-				 * just cleared the reclaim flag, then
-				 * setting the reclaim flag here ends up
-				 * interpreted as the readahead flag - but
-				 * that does not matter enough to care.
-				 * What we do want is for this folio to
-				 * have the reclaim flag set next time
-				 * memcg reclaim reaches the tests above,
-				 * so it will then wait for writeback to
-				 * avoid OOM; and it's also appropriate
-				 * in global reclaim.
-				 */
-				folio_set_reclaim(folio);
+				folio_set_dropbehind(folio);
 				stat->nr_writeback += nr_pages;
 				goto activate_locked;
 
@@ -1372,7 +1358,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			 */
 			if (folio_is_file_lru(folio) &&
 			    (!current_is_kswapd() ||
-			     !folio_test_reclaim(folio) ||
+			     !folio_test_dropbehind(folio) ||
 			     !test_bit(PGDAT_DIRTY, &pgdat->flags))) {
 				/*
 				 * Immediately reclaim when written back.
@@ -1382,7 +1368,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				 */
 				node_stat_mod_folio(folio, NR_VMSCAN_IMMEDIATE,
 						nr_pages);
-				folio_set_reclaim(folio);
+				folio_set_dropbehind(folio);
 
 				goto activate_locked;
 			}
-- 
2.45.2



* [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq()
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (5 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list() Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:09   ` David Hildenbrand
  2025-01-13  9:34 ` [PATCH 8/8] mm: Remove PG_reclaim Kirill A. Shutemov
  2025-01-13 13:45 ` [PATCH 0/8] " Matthew Wilcox
  8 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

The kernel now sets PG_dropbehind instead of PG_reclaim everywhere. Check
PG_dropbehind in lru_gen_folio_seq().

There is no need to check for dirty or writeback anymore, as there is no
longer a conflict with PG_readahead.
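The old and new predicates can be compared side by side in a userspace model. It covers only the two branches at the end of the hunk (earlier branches are not modelled), treats folio_test_workingset() as 0 or 1, and assumes MIN_NR_GENS and MAX_NR_GENS are 2 and 4 as defined in include/linux/mmzone.h. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MIN_NR_GENS 2	/* assumed value of MIN_NR_GENS */
#define TOY_MAX_NR_GENS 4	/* assumed value of MAX_NR_GENS */

/* Hypothetical, simplified folio state. */
struct toy_folio {
	bool file_lru, swapcache, workingset;
	bool dirty, writeback;
	bool shared_bit;	/* old PG_reclaim/PG_readahead bit */
	bool dropbehind;
};

/* Old test: the shared bit means "reclaim" only on dirty/writeback folios. */
static bool toy_old_min_gen(const struct toy_folio *f)
{
	return (!f->file_lru && !f->swapcache) ||
	       (f->shared_bit && (f->dirty || f->writeback));
}

/* New test: PG_dropbehind is unambiguous on its own. */
static bool toy_new_min_gen(const struct toy_folio *f)
{
	return (!f->file_lru && !f->swapcache) || f->dropbehind;
}

/* Generation offset chosen by the last two branches of the hunk. */
static int toy_gen(bool min_gen, const struct toy_folio *f)
{
	return min_gen ? TOY_MIN_NR_GENS : TOY_MAX_NR_GENS - f->workingset;
}
```

A clean file folio carrying only a readahead mark never matched the old test, and a folio marked with PG_dropbehind now matches without any dirty or writeback check.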

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 include/linux/mm_inline.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f9157a0c42a5..f353d3c610ac 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -241,8 +241,7 @@ static inline unsigned long lru_gen_folio_seq(struct lruvec *lruvec, struct foli
 	else if (reclaiming)
 		gen = MAX_NR_GENS;
 	else if ((!folio_is_file_lru(folio) && !folio_test_swapcache(folio)) ||
-		 (folio_test_reclaim(folio) &&
-		  (folio_test_dirty(folio) || folio_test_writeback(folio))))
+		 folio_test_dropbehind(folio))
 		gen = MIN_NR_GENS;
 	else
 		gen = MAX_NR_GENS - folio_test_workingset(folio);
-- 
2.45.2



* [PATCH 8/8] mm: Remove PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (6 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq() Kirill A. Shutemov
@ 2025-01-13  9:34 ` Kirill A. Shutemov
  2025-01-13 10:11   ` David Hildenbrand
  2025-01-13 15:28   ` Matthew Wilcox
  2025-01-13 13:45 ` [PATCH 0/8] " Matthew Wilcox
  8 siblings, 2 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13  9:34 UTC
  To: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe
  Cc: Jason A. Donenfeld, Kirill A. Shutemov, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

Nobody sets PG_reclaim anymore.

Remove it, making PG_readahead the exclusive user of the page flag bit.
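One consequence shows up in folio_migrate_flags(): while the two flags shared a bit, ending writeback on the new folio cleared PG_reclaim and could wipe a readahead mark copied earlier, so the copy had to happen afterwards. A userspace model of the copy-first ordering under both layouts, with hypothetical toy_* names and an illustrative bit number:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags word; bit 1 is illustrative only. */
struct toy_folio {
	unsigned long flags;
};

#define TOY_SHARED_BIT (1UL << 1)	/* old reclaim == readahead; new readahead */

/* Old folio_end_writeback(): a set bit looked like PG_reclaim and was cleared. */
static void toy_old_end_writeback(struct toy_folio *f)
{
	if (f->flags & TOY_SHARED_BIT)
		f->flags &= ~TOY_SHARED_BIT;
}

/* New folio_end_writeback(): there is no reclaim bit to clear. */
static void toy_new_end_writeback(struct toy_folio *f)
{
	(void)f;
}

/* Copy readahead first, then end writeback, as the patched code does. */
static bool toy_copy_then_end(void (*end_writeback)(struct toy_folio *))
{
	struct toy_folio src = { TOY_SHARED_BIT }, dst = { 0 };

	if (src.flags & TOY_SHARED_BIT)
		dst.flags |= TOY_SHARED_BIT;
	end_writeback(&dst);
	return dst.flags & TOY_SHARED_BIT;	/* did the readahead mark survive? */
}
```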

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 fs/fuse/dev.c                          |  2 +-
 fs/proc/page.c                         |  2 +-
 include/linux/mm_inline.h              |  1 -
 include/linux/page-flags.h             | 15 +++++----------
 include/trace/events/mmflags.h         |  2 +-
 include/uapi/linux/kernel-page-flags.h |  2 +-
 mm/filemap.c                           | 12 ------------
 mm/migrate.c                           | 10 ++--------
 mm/page-writeback.c                    | 16 +---------------
 mm/page_io.c                           | 15 +++++----------
 mm/swap.c                              | 16 ----------------
 mm/vmscan.c                            |  7 -------
 tools/mm/page-types.c                  |  8 +-------
 13 files changed, 18 insertions(+), 90 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 27ccae63495d..20005e2e1d28 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -827,7 +827,7 @@ static int fuse_check_folio(struct folio *folio)
 	       1 << PG_lru |
 	       1 << PG_active |
 	       1 << PG_workingset |
-	       1 << PG_reclaim |
+	       1 << PG_readahead |
 	       1 << PG_waiters |
 	       LRU_GEN_MASK | LRU_REFS_MASK))) {
 		dump_page(&folio->page, "fuse: trying to steal weird page");
diff --git a/fs/proc/page.c b/fs/proc/page.c
index a55f5acefa97..59860ba2393c 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -189,7 +189,7 @@ u64 stable_page_flags(const struct page *page)
 	u |= kpf_copy_bit(k, KPF_LRU,		PG_lru);
 	u |= kpf_copy_bit(k, KPF_REFERENCED,	PG_referenced);
 	u |= kpf_copy_bit(k, KPF_ACTIVE,	PG_active);
-	u |= kpf_copy_bit(k, KPF_RECLAIM,	PG_reclaim);
+	u |= kpf_copy_bit(k, KPF_READAHEAD,	PG_readahead);
 
 #define SWAPCACHE ((1 << PG_swapbacked) | (1 << PG_swapcache))
 	if ((k & SWAPCACHE) == SWAPCACHE)
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f353d3c610ac..269acf1f77b4 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -270,7 +270,6 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio,
 	set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active), flags);
 
 	lru_gen_update_size(lruvec, folio, -1, gen);
-	/* for folio_rotate_reclaimable() */
 	if (reclaiming)
 		list_add_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 	else
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 2414e7921eea..8f59fd8b86c9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -63,8 +63,8 @@
  * might lose their PG_swapbacked flag when they simply can be dropped (e.g. as
  * a result of MADV_FREE).
  *
- * PG_referenced, PG_reclaim are used for page reclaim for anonymous and
- * file-backed pagecache (see mm/vmscan.c).
+ * PG_referenced is used for page reclaim for anonymous and file-backed
+ * pagecache (see mm/vmscan.c).
  *
  * PG_arch_1 is an architecture specific page state bit.  The generic code
  * guarantees that this bit is cleared for a page when it first is entered into
@@ -107,7 +107,7 @@ enum pageflags {
 	PG_reserved,
 	PG_private,		/* If pagecache, has fs-private data */
 	PG_private_2,		/* If pagecache, has fs aux data */
-	PG_reclaim,		/* To be reclaimed asap */
+	PG_readahead,
 	PG_swapbacked,		/* Page is backed by RAM/swap */
 	PG_unevictable,		/* Page is "unevictable"  */
 	PG_dropbehind,		/* drop pages on IO completion */
@@ -129,8 +129,6 @@ enum pageflags {
 #endif
 	__NR_PAGEFLAGS,
 
-	PG_readahead = PG_reclaim,
-
 	/* Anonymous memory (and shmem) */
 	PG_swapcache = PG_owner_priv_1, /* Swap page: swp_entry_t in private */
 	/* Some filesystems */
@@ -168,7 +166,7 @@ enum pageflags {
 	PG_xen_remapped = PG_owner_priv_1,
 
 	/* non-lru isolated movable page */
-	PG_isolated = PG_reclaim,
+	PG_isolated = PG_readahead,
 
 	/* Only valid for buddy pages. Used to track pages that are reported */
 	PG_reported = PG_uptodate,
@@ -187,7 +185,7 @@ enum pageflags {
 	/* At least one page in this folio has the hwpoison flag set */
 	PG_has_hwpoisoned = PG_active,
 	PG_large_rmappable = PG_workingset, /* anon or file-backed */
-	PG_partially_mapped = PG_reclaim, /* was identified to be partially mapped */
+	PG_partially_mapped = PG_readahead, /* was identified to be partially mapped */
 };
 
 #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
@@ -594,9 +592,6 @@ TESTPAGEFLAG(Writeback, writeback, PF_NO_TAIL)
 	TESTSCFLAG(Writeback, writeback, PF_NO_TAIL)
 FOLIO_FLAG(mappedtodisk, FOLIO_HEAD_PAGE)
 
-/* PG_readahead is only used for reads; PG_reclaim is only for writes */
-PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
-	TESTCLEARFLAG(Reclaim, reclaim, PF_NO_TAIL)
 FOLIO_FLAG(readahead, FOLIO_HEAD_PAGE)
 	FOLIO_TEST_CLEAR_FLAG(readahead, FOLIO_HEAD_PAGE)
 
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 3bc8656c8359..15d92784a745 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -114,7 +114,7 @@
 	DEF_PAGEFLAG_NAME(private_2),					\
 	DEF_PAGEFLAG_NAME(writeback),					\
 	DEF_PAGEFLAG_NAME(head),					\
-	DEF_PAGEFLAG_NAME(reclaim),					\
+	DEF_PAGEFLAG_NAME(readahead),					\
 	DEF_PAGEFLAG_NAME(swapbacked),					\
 	DEF_PAGEFLAG_NAME(unevictable),					\
 	DEF_PAGEFLAG_NAME(dropbehind)					\
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index ff8032227876..e5a9a113e079 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -15,7 +15,7 @@
 #define KPF_ACTIVE		6
 #define KPF_SLAB		7
 #define KPF_WRITEBACK		8
-#define KPF_RECLAIM		9
+#define KPF_READAHEAD		9
 #define KPF_BUDDY		10
 
 /* 11-20: new additions in 2.6.31 */
diff --git a/mm/filemap.c b/mm/filemap.c
index 5ca26f5e7238..8951c37c8a38 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1624,18 +1624,6 @@ void folio_end_writeback(struct folio *folio)
 
 	VM_BUG_ON_FOLIO(!folio_test_writeback(folio), folio);
 
-	/*
-	 * folio_test_clear_reclaim() could be used here but it is an
-	 * atomic operation and overkill in this particular case. Failing
-	 * to shuffle a folio marked for immediate reclaim is too mild
-	 * a gain to justify taking an atomic operation penalty at the
-	 * end of every folio writeback.
-	 */
-	if (folio_test_reclaim(folio)) {
-		folio_clear_reclaim(folio);
-		folio_rotate_reclaimable(folio);
-	}
-
 	/*
 	 * Writeback does not hold a folio reference of its own, relying
 	 * on truncation to wait for the clearing of PG_writeback.
diff --git a/mm/migrate.c b/mm/migrate.c
index caadbe393aa2..beba72da5e33 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -686,6 +686,8 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
 		folio_set_young(newfolio);
 	if (folio_test_idle(folio))
 		folio_set_idle(newfolio);
+	if (folio_test_readahead(folio))
+		folio_set_readahead(newfolio);
 
 	folio_migrate_refs(newfolio, folio);
 	/*
@@ -728,14 +730,6 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
 	if (folio_test_writeback(newfolio))
 		folio_end_writeback(newfolio);
 
-	/*
-	 * PG_readahead shares the same bit with PG_reclaim.  The above
-	 * end_page_writeback() may clear PG_readahead mistakenly, so set the
-	 * bit after that.
-	 */
-	if (folio_test_readahead(folio))
-		folio_set_readahead(newfolio);
-
 	folio_copy_owner(newfolio, folio);
 	pgalloc_tag_swap(newfolio, folio);
 
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 4f5970723cf2..f2b94a2cbfcf 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2888,22 +2888,8 @@ bool folio_mark_dirty(struct folio *folio)
 {
 	struct address_space *mapping = folio_mapping(folio);
 
-	if (likely(mapping)) {
-		/*
-		 * readahead/folio_deactivate could remain
-		 * PG_readahead/PG_reclaim due to race with folio_end_writeback
-		 * About readahead, if the folio is written, the flags would be
-		 * reset. So no problem.
-		 * About folio_deactivate, if the folio is redirtied,
-		 * the flag will be reset. So no problem. but if the
-		 * folio is used by readahead it will confuse readahead
-		 * and make it restart the size rampup process. But it's
-		 * a trivial problem.
-		 */
-		if (folio_test_reclaim(folio))
-			folio_clear_reclaim(folio);
+	if (likely(mapping))
 		return mapping->a_ops->dirty_folio(mapping, folio);
-	}
 
 	return noop_dirty_folio(mapping, folio);
 }
diff --git a/mm/page_io.c b/mm/page_io.c
index 9b983de351f9..0cb71f318fb1 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -37,14 +37,11 @@ static void __end_swap_bio_write(struct bio *bio)
 		 * Re-dirty the page in order to avoid it being reclaimed.
 		 * Also print a dire warning that things will go BAD (tm)
 		 * very quickly.
-		 *
-		 * Also clear PG_reclaim to avoid folio_rotate_reclaimable()
 		 */
 		folio_mark_dirty(folio);
 		pr_alert_ratelimited("Write-error on swap-device (%u:%u:%llu)\n",
 				     MAJOR(bio_dev(bio)), MINOR(bio_dev(bio)),
 				     (unsigned long long)bio->bi_iter.bi_sector);
-		folio_clear_reclaim(folio);
 	}
 	folio_end_writeback(folio);
 }
@@ -350,19 +347,17 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
 
 	if (ret != sio->len) {
 		/*
-		 * In the case of swap-over-nfs, this can be a
-		 * temporary failure if the system has limited
-		 * memory for allocating transmit buffers.
-		 * Mark the page dirty and avoid
-		 * folio_rotate_reclaimable but rate-limit the
-		 * messages.
+		 * In the case of swap-over-nfs, this can be a temporary failure
+		 * if the system has limited memory for allocating transmit
+		 * buffers.
+		 *
+		 * Mark the page dirty but rate-limit the messages.
 		 */
 		pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n",
 				   ret, swap_dev_pos(page_swap_entry(page)));
 		for (p = 0; p < sio->pages; p++) {
 			page = sio->bvec[p].bv_page;
 			set_page_dirty(page);
-			ClearPageReclaim(page);
 		}
 	}
 
diff --git a/mm/swap.c b/mm/swap.c
index 4eb33b4804a8..5b94f13821e3 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -221,22 +221,6 @@ static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)
 	__count_vm_events(PGROTATED, folio_nr_pages(folio));
 }
 
-/*
- * Writeback is about to end against a folio which has been marked for
- * immediate reclaim.  If it still appears to be reclaimable, move it
- * to the tail of the inactive list.
- *
- * folio_rotate_reclaimable() must disable IRQs, to prevent nasty races.
- */
-void folio_rotate_reclaimable(struct folio *folio)
-{
-	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
-	    folio_test_unevictable(folio))
-		return;
-
-	folio_batch_add_and_move(folio, lru_move_tail, true);
-}
-
 void lru_note_cost(struct lruvec *lruvec, bool file,
 		   unsigned int nr_io, unsigned int nr_rotated)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bb5ec22f97b5..e61e88e63511 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3216,9 +3216,6 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
 
 		new_flags = old_flags & ~(LRU_GEN_MASK | LRU_REFS_FLAGS);
 		new_flags |= (new_gen + 1UL) << LRU_GEN_PGOFF;
-		/* for folio_end_writeback() */
-		if (reclaiming)
-			new_flags |= BIT(PG_reclaim);
 	} while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
 
 	lru_gen_update_size(lruvec, folio, old_gen, new_gen);
@@ -4460,9 +4457,6 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
 	if (!folio_test_referenced(folio))
 		set_mask_bits(&folio->flags, LRU_REFS_MASK, 0);
 
-	/* for shrink_folio_list() */
-	folio_clear_reclaim(folio);
-
 	success = lru_gen_del_folio(lruvec, folio, true);
 	VM_WARN_ON_ONCE_FOLIO(!success, folio);
 
@@ -4659,7 +4653,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 			continue;
 		}
 
-		/* retry folios that may have missed folio_rotate_reclaimable() */
 		if (!skip_retry && !folio_test_active(folio) && !folio_mapped(folio) &&
 		    !folio_test_dirty(folio) && !folio_test_writeback(folio)) {
 			list_move(&folio->lru, &clean);
diff --git a/tools/mm/page-types.c b/tools/mm/page-types.c
index bcac7ebfb51f..c06647501370 100644
--- a/tools/mm/page-types.c
+++ b/tools/mm/page-types.c
@@ -85,7 +85,6 @@
  * not part of kernel API
  */
 #define KPF_ANON_EXCLUSIVE	47
-#define KPF_READAHEAD		48
 #define KPF_SLUB_FROZEN		50
 #define KPF_SLUB_DEBUG		51
 #define KPF_FILE		61
@@ -108,7 +107,7 @@ static const char * const page_flag_names[] = {
 	[KPF_ACTIVE]		= "A:active",
 	[KPF_SLAB]		= "S:slab",
 	[KPF_WRITEBACK]		= "W:writeback",
-	[KPF_RECLAIM]		= "I:reclaim",
+	[KPF_READAHEAD]		= "I:readahead",
 	[KPF_BUDDY]		= "B:buddy",
 
 	[KPF_MMAP]		= "M:mmap",
@@ -139,7 +138,6 @@ static const char * const page_flag_names[] = {
 	[KPF_ARCH_2]		= "H:arch_2",
 
 	[KPF_ANON_EXCLUSIVE]	= "d:anon_exclusive",
-	[KPF_READAHEAD]		= "I:readahead",
 	[KPF_SLUB_FROZEN]	= "A:slub_frozen",
 	[KPF_SLUB_DEBUG]	= "E:slub_debug",
 
@@ -484,10 +482,6 @@ static uint64_t expand_overloaded_flags(uint64_t flags, uint64_t pme)
 			flags ^= BIT(ERROR) | BIT(SLUB_DEBUG);
 	}
 
-	/* PG_reclaim is overloaded as PG_readahead in the read path */
-	if ((flags & (BIT(RECLAIM) | BIT(WRITEBACK))) == BIT(RECLAIM))
-		flags ^= BIT(RECLAIM) | BIT(READAHEAD);
-
 	if (pme & PM_SOFT_DIRTY)
 		flags |= BIT(SOFTDIRTY);
 	if (pme & PM_FILE)
-- 
2.45.2
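Before this patch, PG_reclaim and PG_readahead were the same bit. That is why folio_migrate_flags() copied PG_readahead only after folio_end_writeback(), and why tools/mm/page-types.c guessed the bit's meaning from PG_writeback. The user-space model below shows the ambiguity with a plain bitmask. The SK_* bit numbers and helper names are hypothetical and do not correspond to the kernel's enum pageflags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit numbers; they do not match enum pageflags. */
enum {
	SK_WRITEBACK = 0,
	SK_SHARED    = 1,	/* old layout: PG_reclaim aliased PG_readahead */
	SK_READAHEAD = 2,	/* new layout: PG_readahead owns its own bit */
};

#define SK_BIT(nr) (UINT64_C(1) << (nr))

/* Old completion path: a set shared bit is treated as PG_reclaim and cleared. */
static uint64_t old_end_writeback(uint64_t flags)
{
	if (flags & SK_BIT(SK_SHARED))
		flags &= ~SK_BIT(SK_SHARED);	/* a readahead hint is lost too */
	return flags & ~SK_BIT(SK_WRITEBACK);
}

/* New completion path: the readahead bit is never touched. */
static uint64_t new_end_writeback(uint64_t flags)
{
	return flags & ~SK_BIT(SK_WRITEBACK);
}

/*
 * Same test as the heuristic removed from page-types.c: the shared bit
 * without writeback was reported as readahead, with writeback as reclaim.
 */
static bool old_shared_bit_is_readahead(uint64_t flags)
{
	uint64_t mask = SK_BIT(SK_SHARED) | SK_BIT(SK_WRITEBACK);

	return (flags & mask) == SK_BIT(SK_SHARED);
}
```

With separate bits, new_end_writeback() leaves the readahead hint alone, which is why the copy in folio_migrate_flags() can move up next to the other flags.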


* Re: [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios
  2025-01-13  9:34 ` [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios Kirill A. Shutemov
@ 2025-01-13 10:05   ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:05 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> Use folios instead of pages.
> 
> This is preparation for removing PG_reclaim.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


* Re: [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim Kirill A. Shutemov
@ 2025-01-13 10:06   ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> __shmem_writeback()
> 
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.

I think this is correct.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb
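The difference described in the commit message can be modelled as two writeback-completion policies. With PG_reclaim, the folio was only rotated to the tail of the inactive list (via folio_rotate_reclaimable(), which skipped locked, dirty and unevictable folios), and vmscan still had to free it. With PG_dropbehind, the folio is meant to be dropped when writeback completes. This is a toy model, not kernel code: the types are hypothetical, and treating a locked or redirtied folio as not droppable at completion time is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible results of writeback completion in this toy model. */
enum wb_outcome {
	WB_KEEP,		/* folio stays where it is */
	WB_ROTATE_TO_TAIL,	/* old PG_reclaim: vmscan frees it later */
	WB_FREE_NOW,		/* PG_dropbehind: dropped at completion */
};

/* Hypothetical stand-in for the relevant folio state. */
struct wb_folio {
	bool reclaim;		/* models PG_reclaim */
	bool dropbehind;	/* models PG_dropbehind */
	bool dirty;
	bool locked;
};

/* Old behaviour: clear PG_reclaim, then rotate if still reclaimable. */
static enum wb_outcome old_end_writeback(struct wb_folio *f)
{
	if (!f->reclaim)
		return WB_KEEP;
	f->reclaim = false;
	/*
	 * folio_rotate_reclaimable() skipped locked or dirty folios;
	 * its unevictable check is omitted here.
	 */
	if (f->locked || f->dirty)
		return WB_KEEP;
	return WB_ROTATE_TO_TAIL;
}

/* New behaviour: drop the folio right away if the hint is set. */
static enum wb_outcome new_end_writeback(const struct wb_folio *f)
{
	/* Assumption: a locked or redirtied folio cannot be dropped here. */
	if (f->dropbehind && !f->locked && !f->dirty)
		return WB_FREE_NOW;
	return WB_KEEP;
}
```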


* Re: [PATCH 3/8] mm/zswap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 3/8] mm/zswap: " Kirill A. Shutemov
@ 2025-01-13 10:06   ` David Hildenbrand
  2025-01-13 16:10   ` Yosry Ahmed
  1 sibling, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:06 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> zswap_writeback_entry().
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   mm/zswap.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 167ae641379f..c20bad0b0978 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1096,8 +1096,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>   	/* folio is up to date */
>   	folio_mark_uptodate(folio);
>   
> -	/* move it to the tail of the inactive list after end_writeback */
> -	folio_set_reclaim(folio);
> +	/* free the folio after writeback */
> +	folio_set_dropbehind(folio);

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 4/8] mm/swap: " Kirill A. Shutemov
@ 2025-01-13 10:07   ` David Hildenbrand
  2025-01-13 16:17   ` Yosry Ahmed
  1 sibling, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:07 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> lru_deactivate_file().
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   mm/swap.c | 8 +-------
>   1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index fc8281ef4241..4eb33b4804a8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
>   	folio_clear_referenced(folio);
>   
>   	if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> -		/*
> -		 * Setting the reclaim flag could race with
> -		 * folio_end_writeback() and confuse readahead.  But the
> -		 * race window is _really_ small and  it's not a critical
> -		 * problem.
> -		 */
>   		lruvec_add_folio(lruvec, folio);
> -		folio_set_reclaim(folio);
> +		folio_set_dropbehind(folio);
>   	} else {
>   		/*
>   		 * The folio's writeback ended while it was in the batch.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


* Re: [PATCH 5/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 5/8] mm/vmscan: " Kirill A. Shutemov
@ 2025-01-13 10:07   ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:07 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> pageout().
> 
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   mm/vmscan.c | 9 +++------
>   1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a099876fa029..d15f80333d6b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -692,19 +692,16 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>   		if (shmem_mapping(mapping) && folio_test_large(folio))
>   			wbc.list = folio_list;
>   
> -		folio_set_reclaim(folio);
> +		folio_set_dropbehind(folio);
> +
>   		res = mapping->a_ops->writepage(&folio->page, &wbc);
>   		if (res < 0)
>   			handle_write_error(mapping, folio, res);
>   		if (res == AOP_WRITEPAGE_ACTIVATE) {
> -			folio_clear_reclaim(folio);
> +			folio_clear_dropbehind(folio);
>   			return PAGE_ACTIVATE;
>   		}
>   
> -		if (!folio_test_writeback(folio)) {
> -			/* synchronous write or broken a_ops? */
> -			folio_clear_reclaim(folio);
> -		}
>   		trace_mm_vmscan_write_folio(folio);
>   		node_stat_add_folio(folio, NR_VMSCAN_WRITE);
>   		return PAGE_SUCCESS;

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb
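The pageout() hunk quoted above reduces to a simple control flow: set the hint before calling ->writepage(), undo it only when the filesystem returns AOP_WRITEPAGE_ACTIVATE, and skip the old clean-up when writeback was not started. The sketch below uses hypothetical toy types and stub writepage callbacks. Error handling via handle_write_error() is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy results; only the names echo AOP_WRITEPAGE_ACTIVATE and pageout_t. */
enum po_wp_result { PO_WP_OK, PO_WP_ACTIVATE };
enum po_result { PO_PAGE_SUCCESS, PO_PAGE_ACTIVATE };

struct po_folio {
	bool dropbehind;	/* models PG_dropbehind */
	bool writeback;		/* models PG_writeback */
};

typedef enum po_wp_result (*po_writepage_fn)(struct po_folio *f);

static enum po_result po_pageout(struct po_folio *f, po_writepage_fn writepage)
{
	/* Set the hint before I/O so that completion can drop the folio. */
	f->dropbehind = true;

	if (writepage(f) == PO_WP_ACTIVATE) {
		/* The filesystem wants to keep the folio: undo the hint. */
		f->dropbehind = false;
		return PO_PAGE_ACTIVATE;
	}

	/*
	 * No clean-up when writeback was not started (e.g. a synchronous
	 * write): a stale PG_dropbehind no longer aliases PG_readahead.
	 */
	return PO_PAGE_SUCCESS;
}

/* Stub callbacks for three hypothetical ->writepage() behaviours. */
static enum po_wp_result po_async_writepage(struct po_folio *f)
{
	f->writeback = true;	/* I/O submitted, completes later */
	return PO_WP_OK;
}

static enum po_wp_result po_sync_writepage(struct po_folio *f)
{
	(void)f;		/* I/O finished, PG_writeback never set */
	return PO_WP_OK;
}

static enum po_wp_result po_activate_writepage(struct po_folio *f)
{
	(void)f;
	return PO_WP_ACTIVATE;
}
```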


* Re: [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list()
  2025-01-13  9:34 ` [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list() Kirill A. Shutemov
@ 2025-01-13 10:08   ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:08 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
> 
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> shrink_folio_list().
> 
> It is safe to leave PG_dropbehind on the folio if, for some reason
> (bug?), the folio is not in a writeback state after ->writepage().
> In these cases, the kernel had to clear PG_reclaim as it shared a page
> flag bit with PG_readahead.
> 
> Also use PG_dropbehind instead of PG_reclaim to detect I/O congestion.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


* Re: [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq()
  2025-01-13  9:34 ` [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq() Kirill A. Shutemov
@ 2025-01-13 10:09   ` David Hildenbrand
  0 siblings, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:09 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> Kernel sets PG_dropcache instead of PG_reclaim everywhere. Check
> PG_dropcache in lru_gen_folio_seq().

Subject and description PG_dropcache->PG_dropbehind

Apart from that LGTM

Acked-by: David Hildenbrand <david@redhat.com>


-- 
Cheers,

David / dhildenb


* Re: [PATCH 8/8] mm: Remove PG_reclaim
  2025-01-13  9:34 ` [PATCH 8/8] mm: Remove PG_reclaim Kirill A. Shutemov
@ 2025-01-13 10:11   ` David Hildenbrand
  2025-01-13 15:28   ` Matthew Wilcox
  1 sibling, 0 replies; 28+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:11 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe
  Cc: Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On 13.01.25 10:34, Kirill A. Shutemov wrote:
> Nobody sets the flag anymore.
> 
> Remove PG_reclaim, making PG_readahead the exclusive user of the page
> flag bit.
> 

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb


* Re: [PATCH 0/8] mm: Remove PG_reclaim
  2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
                   ` (7 preceding siblings ...)
  2025-01-13  9:34 ` [PATCH 8/8] mm: Remove PG_reclaim Kirill A. Shutemov
@ 2025-01-13 13:45 ` Matthew Wilcox
  2025-01-13 14:07   ` Kirill A. Shutemov
  8 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2025-01-13 13:45 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Jens Axboe, Jason A. Donenfeld, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 11:34:45AM +0200, Kirill A. Shutemov wrote:
> Use PG_dropbehind instead of PG_reclaim and remove PG_reclaim.

I was hoping we'd end up with the name PG_reclaim instead of the name
PG_dropbehind.  PG_reclaim is a better name for this functionality.


* Re: [PATCH 0/8] mm: Remove PG_reclaim
  2025-01-13 13:45 ` [PATCH 0/8] " Matthew Wilcox
@ 2025-01-13 14:07   ` Kirill A. Shutemov
  0 siblings, 0 replies; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-13 14:07 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Jens Axboe, Jason A. Donenfeld, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 01:45:48PM +0000, Matthew Wilcox wrote:
> On Mon, Jan 13, 2025 at 11:34:45AM +0200, Kirill A. Shutemov wrote:
> > Use PG_dropbehind instead of PG_reclaim and remove PG_reclaim.
> 
> I was hoping we'd end up with the name PG_reclaim instead of the name
> PG_dropbehind.  PG_reclaim is a better name for this functionality.

I got burned by re-using the name with MAX_ORDER redefinition.
I guess it is less risky as it is less used, but still...

Anyway, it can be done with a patch on top of the patchset. We must get
rid of the current PG_reclaim first.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

* Re: [PATCH 8/8] mm: Remove PG_reclaim
  2025-01-13  9:34 ` [PATCH 8/8] mm: Remove PG_reclaim Kirill A. Shutemov
  2025-01-13 10:11   ` David Hildenbrand
@ 2025-01-13 15:28   ` Matthew Wilcox
  2025-01-14  8:30     ` Kirill A. Shutemov
  1 sibling, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2025-01-13 15:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Jens Axboe, Jason A. Donenfeld, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 11:34:53AM +0200, Kirill A. Shutemov wrote:
> diff --git a/mm/migrate.c b/mm/migrate.c
> index caadbe393aa2..beba72da5e33 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -686,6 +686,8 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
>  		folio_set_young(newfolio);
>  	if (folio_test_idle(folio))
>  		folio_set_idle(newfolio);
> +	if (folio_test_readahead(folio))
> +		folio_set_readahead(newfolio);
>  
>  	folio_migrate_refs(newfolio, folio);
>  	/*

Not a problem with this patch ... but aren't we missing a
test_dropbehind / set_dropbehind pair in this function?  Or are we
prohibited from migrating a folio with the dropbehind flag set
somewhere?

> +++ b/mm/swap.c
> @@ -221,22 +221,6 @@ static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)
>  	__count_vm_events(PGROTATED, folio_nr_pages(folio));
>  }
>  
> -/*
> - * Writeback is about to end against a folio which has been marked for
> - * immediate reclaim.  If it still appears to be reclaimable, move it
> - * to the tail of the inactive list.
> - *
> - * folio_rotate_reclaimable() must disable IRQs, to prevent nasty races.
> - */
> -void folio_rotate_reclaimable(struct folio *folio)
> -{
> -	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
> -	    folio_test_unevictable(folio))
> -		return;
> -
> -	folio_batch_add_and_move(folio, lru_move_tail, true);
> -}

I think this is the last caller of lru_move_tail(), which means we can
get rid of fbatches->lru_move_tail and the local_lock that protects it.
Or did I miss something?
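Regarding the first question above: folio_migrate_flags() transfers state as a series of test/set pairs, so carrying PG_dropbehind across migration would presumably be one more pair (folio_test_dropbehind()/folio_set_dropbehind() in kernel terms). The model below is hypothetical and only illustrates the pattern. Whether the flag should survive migration is exactly what is being asked, and the posted patch does not do this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of folio flags relevant to migration. */
struct mig_folio {
	bool idle;
	bool readahead;
	bool dropbehind;
};

/* Pattern of folio_migrate_flags(): copy each flag that is set. */
static void mig_copy_flags(struct mig_folio *newf, const struct mig_folio *oldf)
{
	if (oldf->idle)
		newf->idle = true;
	if (oldf->readahead)		/* moved up by patch 8/8 */
		newf->readahead = true;
	if (oldf->dropbehind)		/* hypothetical, not in the patch */
		newf->dropbehind = true;
}
```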

* Re: [PATCH 3/8] mm/zswap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 3/8] mm/zswap: " Kirill A. Shutemov
  2025-01-13 10:06   ` David Hildenbrand
@ 2025-01-13 16:10   ` Yosry Ahmed
  1 sibling, 0 replies; 28+ messages in thread
From: Yosry Ahmed @ 2025-01-13 16:10 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe,
	Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, David Hildenbrand,
	Hao Ge, Jani Nikula, Johannes Weiner, Joonas Lahtinen,
	Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi,
	Nhat Pham, Oscar Salvador, Ran Xiaokai, Rodrigo Vivi,
	Simona Vetter, Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> zswap_writeback_entry().
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Acked-by: Yosry Ahmed <yosryahmed@google.com>

> ---
>  mm/zswap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 167ae641379f..c20bad0b0978 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1096,8 +1096,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
>         /* folio is up to date */
>         folio_mark_uptodate(folio);
>
> -       /* move it to the tail of the inactive list after end_writeback */
> -       folio_set_reclaim(folio);
> +       /* free the folio after writeback */
> +       folio_set_dropbehind(folio);
>
>         /* start writeback */
>         __swap_writepage(folio, &wbc);
> --
> 2.45.2
>

* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13  9:34 ` [PATCH 4/8] mm/swap: " Kirill A. Shutemov
  2025-01-13 10:07   ` David Hildenbrand
@ 2025-01-13 16:17   ` Yosry Ahmed
  2025-01-14  8:12     ` Kirill A. Shutemov
  1 sibling, 1 reply; 28+ messages in thread
From: Yosry Ahmed @ 2025-01-13 16:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe,
	Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, David Hildenbrand,
	Hao Ge, Jani Nikula, Johannes Weiner, Joonas Lahtinen,
	Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi,
	Nhat Pham, Oscar Salvador, Ran Xiaokai, Rodrigo Vivi,
	Simona Vetter, Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> The recently introduced PG_dropbehind allows for freeing folios
> immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> to be involved to get the folio freed.
>
> Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> lru_deactivate_file().
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  mm/swap.c | 8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index fc8281ef4241..4eb33b4804a8 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
>         folio_clear_referenced(folio);
>
>         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> -               /*
> -                * Setting the reclaim flag could race with
> -                * folio_end_writeback() and confuse readahead.  But the
> -                * race window is _really_ small and  it's not a critical
> -                * problem.
> -                */
>                 lruvec_add_folio(lruvec, folio);
> -               folio_set_reclaim(folio);
> +               folio_set_dropbehind(folio);
>         } else {
>                 /*
>                  * The folio's writeback ended while it was in the batch.

Now there's a difference in behavior here depending on whether or not
the folio is under writeback (or will be written back soon). If it is,
we set PG_dropbehind to get it freed right after, but if writeback has
already ended we put it on the tail of the LRU to be freed later.

It's a bit counterintuitive to me that folios with pending writeback
get freed faster than folios that completed their writeback already.
Am I missing something?

* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-13 16:17   ` Yosry Ahmed
@ 2025-01-14  8:12     ` Kirill A. Shutemov
  2025-01-14 18:02       ` Yosry Ahmed
  0 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-14  8:12 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe,
	Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, David Hildenbrand,
	Hao Ge, Jani Nikula, Johannes Weiner, Joonas Lahtinen,
	Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi,
	Nhat Pham, Oscar Salvador, Ran Xiaokai, Rodrigo Vivi,
	Simona Vetter, Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
> On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > The recently introduced PG_dropbehind allows for freeing folios
> > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > to be involved to get the folio freed.
> >
> > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > lru_deactivate_file().
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  mm/swap.c | 8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/mm/swap.c b/mm/swap.c
> > index fc8281ef4241..4eb33b4804a8 100644
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> >         folio_clear_referenced(folio);
> >
> >         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > -               /*
> > -                * Setting the reclaim flag could race with
> > -                * folio_end_writeback() and confuse readahead.  But the
> > -                * race window is _really_ small and  it's not a critical
> > -                * problem.
> > -                */
> >                 lruvec_add_folio(lruvec, folio);
> > -               folio_set_reclaim(folio);
> > +               folio_set_dropbehind(folio);
> >         } else {
> >                 /*
> >                  * The folio's writeback ended while it was in the batch.
> 
> Now there's a difference in behavior here depending on whether or not
> the folio is under writeback (or will be written back soon). If it is,
> we set PG_dropbehind to get it freed right after, but if writeback has
> already ended we put it on the tail of the LRU to be freed later.
> 
> It's a bit counterintuitive to me that folios with pending writeback
> get freed faster than folios that completed their writeback already.
> Am I missing something?

Yeah, it is strange.

I think we can drop the writeback/dirty check: set PG_dropbehind and put
the page on the tail of the LRU unconditionally. The check was required to
avoid confusion with PG_readahead.

The comment above the function is no longer valid.

But a folio that is still dirty or under writeback will be freed sooner,
as we get rid of it right after writeback completes, while a clean page
can dangle on the LRU for a while.

I don't think we have any convenient place to free clean dropbehind pages
other than shrink_folio_list(). Or do we?

Looking at shrink_folio_list(), I think we need to bypass page demotion
for PG_dropbehind pages.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
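A toy user-space model may help make the ordering Yosry describes concrete. It is not kernel code: every toy_* name is hypothetical, and it assumes only what the patch description states, namely that a PG_dropbehind folio is freed as soon as its writeback completes, while a clean folio moved to the inactive tail waits until reclaim scans it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio state relevant to this discussion. */
struct toy_folio {
	bool dirty;
	bool writeback;
	bool dropbehind;   /* models PG_dropbehind */
	bool at_lru_tail;  /* true: next in line for reclaim */
	bool freed;
};

/* Mirrors the patched branch of lru_deactivate_file(): dirty/writeback
 * folios get PG_dropbehind and go to the head of the inactive list,
 * clean folios go to the tail. */
static void toy_deactivate_file(struct toy_folio *f)
{
	if (f->writeback || f->dirty) {
		f->at_lru_tail = false;
		f->dropbehind = true;
	} else {
		f->at_lru_tail = true;
	}
}

/* Assumed dropbehind semantics: the folio is freed right after writeback
 * ends, without vmscan being involved. Simplification: the folio is
 * treated as fully clean once writeback ends. */
static void toy_end_writeback(struct toy_folio *f)
{
	f->writeback = false;
	f->dirty = false;
	if (f->dropbehind)
		f->freed = true;
}

/* Reclaim frees a folio only when it reaches it at the tail and it is clean. */
static void toy_reclaim_tail(struct toy_folio *f)
{
	if (f->at_lru_tail && !f->dirty && !f->writeback)
		f->freed = true;
}
```

In this model the folio that was under writeback is freed first, while the clean folio is freed only once a reclaim pass reaches the tail.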


* Re: [PATCH 8/8] mm: Remove PG_reclaim
  2025-01-13 15:28   ` Matthew Wilcox
@ 2025-01-14  8:30     ` Kirill A. Shutemov
  2025-01-14 17:01       ` Yu Zhao
  0 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2025-01-14  8:30 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Jens Axboe, Jason A. Donenfeld, Andi Shyti,
	Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Mon, Jan 13, 2025 at 03:28:43PM +0000, Matthew Wilcox wrote:
> On Mon, Jan 13, 2025 at 11:34:53AM +0200, Kirill A. Shutemov wrote:
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index caadbe393aa2..beba72da5e33 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -686,6 +686,8 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
> >  		folio_set_young(newfolio);
> >  	if (folio_test_idle(folio))
> >  		folio_set_idle(newfolio);
> > +	if (folio_test_readahead(folio))
> > +		folio_set_readahead(newfolio);
> >  
> >  	folio_migrate_refs(newfolio, folio);
> >  	/*
> 
> Not a problem with this patch ... but aren't we missing a
> test_dropbehind / set_dropbehind pair in this function?  Or are we
> prohibited from migrating a folio with the dropbehind flag set
> somewhere?

Hm. Good catch.

We might want to drop clean dropbehind pages instead of migrating them.

But I am not sure about dirty ones. With slow backing storage it might be
better for the system to migrate them instead of keeping them in the old
place for a potentially long time.

Any opinions?

> > +++ b/mm/swap.c
> > @@ -221,22 +221,6 @@ static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)
> >  	__count_vm_events(PGROTATED, folio_nr_pages(folio));
> >  }
> >  
> > -/*
> > - * Writeback is about to end against a folio which has been marked for
> > - * immediate reclaim.  If it still appears to be reclaimable, move it
> > - * to the tail of the inactive list.
> > - *
> > - * folio_rotate_reclaimable() must disable IRQs, to prevent nasty races.
> > - */
> > -void folio_rotate_reclaimable(struct folio *folio)
> > -{
> > -	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
> > -	    folio_test_unevictable(folio))
> > -		return;
> > -
> > -	folio_batch_add_and_move(folio, lru_move_tail, true);
> > -}
> 
> I think this is the last caller of lru_move_tail(), which means we can
> get rid of fbatches->lru_move_tail and the local_lock that protects it.
> Or did I miss something?

I see lru_move_tail() being used by lru_add_drain_cpu().

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
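As a sketch of the two options raised above (copying the flag during migration, or dropping clean dropbehind folios instead of migrating them), the following models page flags as a plain bitmask. The TOY_PG_* bits and both helpers are hypothetical; the real PG_* numbering and folio_migrate_flags() differ, and only the flags discussed in this thread are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real PG_* numbering differs. */
enum {
	TOY_PG_DIRTY      = 1u << 0,
	TOY_PG_WRITEBACK  = 1u << 1,
	TOY_PG_READAHEAD  = 1u << 2,
	TOY_PG_DROPBEHIND = 1u << 3,
};

/* Copy the hint flags from the old folio to the new one, including the
 * dropbehind bit Matthew noticed is not carried over today. Other state
 * (such as dirty) is outside the scope of this toy. */
static unsigned int toy_migrate_flags(unsigned int old_flags,
				      unsigned int new_flags)
{
	new_flags |= old_flags & (TOY_PG_READAHEAD | TOY_PG_DROPBEHIND);
	return new_flags;
}

/* One possible policy from the thread: drop clean dropbehind folios
 * instead of migrating them, but still migrate dirty or writeback ones
 * that may sit on slow storage for a long time. */
static bool toy_drop_instead_of_migrate(unsigned int flags)
{
	return (flags & TOY_PG_DROPBEHIND) &&
	       !(flags & (TOY_PG_DIRTY | TOY_PG_WRITEBACK));
}
```

Whether the dirty case should migrate or wait for writeback is exactly the open question in the message above; the helper just encodes one answer.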


* Re: [PATCH 8/8] mm: Remove PG_reclaim
  2025-01-14  8:30     ` Kirill A. Shutemov
@ 2025-01-14 17:01       ` Yu Zhao
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Zhao @ 2025-01-14 17:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Matthew Wilcox, Andrew Morton, Jens Axboe, Jason A. Donenfeld,
	Andi Shyti, Chengming Zhou, Christian Brauner, Christophe Leroy,
	Dan Carpenter, David Airlie, David Hildenbrand, Hao Ge,
	Jani Nikula, Johannes Weiner, Joonas Lahtinen, Josef Bacik,
	Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi, Nhat Pham,
	Oscar Salvador, Ran Xiaokai, Rodrigo Vivi, Simona Vetter,
	Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka, Yosry Ahmed,
	intel-gfx, dri-devel, linux-kernel, linux-fsdevel, linux-mm,
	linux-trace-kernel

On Tue, Jan 14, 2025 at 1:30 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Mon, Jan 13, 2025 at 03:28:43PM +0000, Matthew Wilcox wrote:
> > On Mon, Jan 13, 2025 at 11:34:53AM +0200, Kirill A. Shutemov wrote:
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index caadbe393aa2..beba72da5e33 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -686,6 +686,8 @@ void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
> > >             folio_set_young(newfolio);
> > >     if (folio_test_idle(folio))
> > >             folio_set_idle(newfolio);
> > > +   if (folio_test_readahead(folio))
> > > +           folio_set_readahead(newfolio);
> > >
> > >     folio_migrate_refs(newfolio, folio);
> > >     /*
> >
> > Not a problem with this patch ... but aren't we missing a
> > test_dropbehind / set_dropbehind pair in this function?  Or are we
> > prohibited from migrating a folio with the dropbehind flag set
> > somewhere?
>
> Hm. Good catch.
>
> We might want to drop clean dropbehind pages instead of migrating them.
>
> But I am not sure about dirty ones. With slow backing storage it might be
> better for the system to migrate them instead of keeping them in the old
> place for a potentially long time.
>
> Any opinions?
>
> > > +++ b/mm/swap.c
> > > @@ -221,22 +221,6 @@ static void lru_move_tail(struct lruvec *lruvec, struct folio *folio)
> > >     __count_vm_events(PGROTATED, folio_nr_pages(folio));
> > >  }
> > >
> > > -/*
> > > - * Writeback is about to end against a folio which has been marked for
> > > - * immediate reclaim.  If it still appears to be reclaimable, move it
> > > - * to the tail of the inactive list.
> > > - *
> > > - * folio_rotate_reclaimable() must disable IRQs, to prevent nasty races.
> > > - */
> > > -void folio_rotate_reclaimable(struct folio *folio)
> > > -{
> > > -   if (folio_test_locked(folio) || folio_test_dirty(folio) ||
> > > -       folio_test_unevictable(folio))
> > > -           return;
> > > -
> > > -   folio_batch_add_and_move(folio, lru_move_tail, true);
> > > -}
> >
> > I think this is the last caller of lru_move_tail(), which means we can
> > get rid of fbatches->lru_move_tail and the local_lock that protects it.
> > Or did I miss something?
>
> I see lru_move_tail() being used by lru_add_drain_cpu().

That can be deleted too, since you've already removed the producer to
fbatches->lru_move_tail.
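The batching referred to here can be sketched as an array that a producer fills and a drain step flushes (per CPU in the kernel, a single batch here). This is a hypothetical model: TOY_BATCH_SIZE and the toy_* helpers are assumptions, not the kernel's folio_batch API; toy_rotate_reclaimable() mirrors the filter in the removed folio_rotate_reclaimable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_SIZE 4   /* hypothetical; not the kernel's batch size */

struct toy_folio {
	bool locked, dirty, unevictable;
	bool at_tail;
};

struct toy_batch {
	size_t nr;
	struct toy_folio *folios[TOY_BATCH_SIZE];
};

/* Stand-in for lru_move_tail(): move one folio to the inactive tail. */
static void toy_move_tail(struct toy_folio *f)
{
	f->at_tail = true;
}

/* Stand-in for the drain step in lru_add_drain_cpu(): apply the move
 * to every queued folio, then empty the batch. */
static void toy_drain(struct toy_batch *b)
{
	for (size_t i = 0; i < b->nr; i++)
		toy_move_tail(b->folios[i]);
	b->nr = 0;
}

/* Stand-in for the removed folio_rotate_reclaimable(), the only producer
 * for this batch: skip locked, dirty or unevictable folios, queue the
 * rest, and drain when the batch is full. */
static void toy_rotate_reclaimable(struct toy_batch *b, struct toy_folio *f)
{
	if (f->locked || f->dirty || f->unevictable)
		return;
	b->folios[b->nr++] = f;
	if (b->nr == TOY_BATCH_SIZE)
		toy_drain(b);
}
```

Once the producer is gone, nothing ever increments nr, so the drain step only ever sees an empty batch, which is the reasoning for deleting it as well.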


* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-14  8:12     ` Kirill A. Shutemov
@ 2025-01-14 18:02       ` Yosry Ahmed
  2025-01-15  4:28         ` Yu Zhao
  0 siblings, 1 reply; 28+ messages in thread
From: Yosry Ahmed @ 2025-01-14 18:02 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Andrew Morton, Matthew Wilcox (Oracle), Jens Axboe,
	Jason A. Donenfeld, Andi Shyti, Chengming Zhou, Christian Brauner,
	Christophe Leroy, Dan Carpenter, David Airlie, David Hildenbrand,
	Hao Ge, Jani Nikula, Johannes Weiner, Joonas Lahtinen,
	Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers, Miklos Szeredi,
	Nhat Pham, Oscar Salvador, Ran Xiaokai, Rodrigo Vivi,
	Simona Vetter, Steven Rostedt, Tvrtko Ursulin, Vlastimil Babka,
	Yu Zhao, intel-gfx, dri-devel, linux-kernel, linux-fsdevel,
	linux-mm, linux-trace-kernel

On Tue, Jan 14, 2025 at 12:12 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
> > On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > The recently introduced PG_dropbehind allows for freeing folios
> > > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > > to be involved to get the folio freed.
> > >
> > > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > > lru_deactivate_file().
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > ---
> > >  mm/swap.c | 8 +-------
> > >  1 file changed, 1 insertion(+), 7 deletions(-)
> > >
> > > diff --git a/mm/swap.c b/mm/swap.c
> > > index fc8281ef4241..4eb33b4804a8 100644
> > > --- a/mm/swap.c
> > > +++ b/mm/swap.c
> > > @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> > >         folio_clear_referenced(folio);
> > >
> > >         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > > -               /*
> > > -                * Setting the reclaim flag could race with
> > > -                * folio_end_writeback() and confuse readahead.  But the
> > > -                * race window is _really_ small and  it's not a critical
> > > -                * problem.
> > > -                */
> > >                 lruvec_add_folio(lruvec, folio);
> > > -               folio_set_reclaim(folio);
> > > +               folio_set_dropbehind(folio);
> > >         } else {
> > >                 /*
> > >                  * The folio's writeback ended while it was in the batch.
> >
> > Now there's a difference in behavior here depending on whether or not
> > the folio is under writeback (or will be written back soon). If it is,
> > we set PG_dropbehind to get it freed right after, but if writeback has
> > already ended we put it on the tail of the LRU to be freed later.
> >
> > It's a bit counterintuitive to me that folios with pending writeback
> > get freed faster than folios that completed their writeback already.
> > Am I missing something?
>
> Yeah, it is strange.
>
> I think we can drop the writeback/dirty check. Set PG_dropbehind and put
> the page on the tail of LRU unconditionally. The check was required to
> avoid confusion with PG_readahead.
>
> Comment above the function is not valid anymore.

My read is that we don't put dirty/writeback folios at the tail of the
LRU because they cannot be freed immediately and we want to give them
time to be written back before reclaim reaches them. So I don't think
we want to change that and always put the pages at the tail.

>
> But the folio that is still dirty under writeback will be freed faster as
> we get rid of the folio just after writeback is done while clean page can
> dangle on LRU for a while.

Yeah, if we reuse PG_dropbehind, then we cannot avoid
folio_end_writeback() freeing these folios faster than clean ones.

>
> I don't think we have any convenient place to free clean dropbehind page
> other than shrink_folio_list(). Or do we?

Not sure tbh. FWIW I am not saying it's necessarily a bad thing to
free dirty/writeback folios before clean ones when deactivated; it's
just strange and a behavioral change from today that I wanted to point
out. Perhaps that's the best we can do for now.

>
> Looking at shrink_folio_list(), I think we need to bypass page demotion
> for PG_dropbehind pages.
>
> --
>   Kiryl Shutsemau / Kirill A. Shutemov
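Yosry's argument rests on reclaim scanning the inactive list from its tail. The sketch below re-implements a minimal circular list in the style of the kernel's list_head (toy_list_add() inserts at the head like list_add(), toy_list_add_tail() at the tail like list_add_tail()) and shows that a dirty folio placed at the tail is reached first and has to be skipped, while one placed at the head leaves time for writeback. The folio struct and the scan function are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_head { struct list_head *next, *prev; };

static void toy_list_init(struct list_head *h) { h->next = h->prev = h; }

static void toy_list_insert(struct list_head *n, struct list_head *prev,
			    struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Insert at the head: the newest end of the LRU. */
static void toy_list_add(struct list_head *n, struct list_head *h)
{
	toy_list_insert(n, h, h->next);
}

/* Insert at the tail: the oldest end, scanned first by reclaim. */
static void toy_list_add_tail(struct list_head *n, struct list_head *h)
{
	toy_list_insert(n, h->prev, h);
}

struct toy_folio {
	struct list_head lru;  /* first member, so a cast recovers the folio */
	bool dirty;
	int id;
};

/* Scan from the tail and return the id of the first folio that could be
 * freed immediately (-1 if none), counting dirty folios skipped on the way. */
static int toy_scan_from_tail(struct list_head *h, int *skipped)
{
	*skipped = 0;
	for (struct list_head *p = h->prev; p != h; p = p->prev) {
		struct toy_folio *f = (struct toy_folio *)p;
		if (!f->dirty)
			return f->id;
		(*skipped)++;
	}
	return -1;
}
```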


* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-14 18:02       ` Yosry Ahmed
@ 2025-01-15  4:28         ` Yu Zhao
  2025-01-15  4:31           ` Yu Zhao
  0 siblings, 1 reply; 28+ messages in thread
From: Yu Zhao @ 2025-01-15  4:28 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers,
	Miklos Szeredi, Nhat Pham, Oscar Salvador, Ran Xiaokai,
	Rodrigo Vivi, Simona Vetter, Steven Rostedt, Tvrtko Ursulin,
	Vlastimil Babka, intel-gfx, dri-devel, linux-kernel,
	linux-fsdevel, linux-mm, linux-trace-kernel

On Tue, Jan 14, 2025 at 11:03 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Jan 14, 2025 at 12:12 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
> > > On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
> > > <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > The recently introduced PG_dropbehind allows for freeing folios
> > > > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > > > to be involved to get the folio freed.
> > > >
> > > > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > > > lru_deactivate_file().
> > > >
> > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > ---
> > > >  mm/swap.c | 8 +-------
> > > >  1 file changed, 1 insertion(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/swap.c b/mm/swap.c
> > > > index fc8281ef4241..4eb33b4804a8 100644
> > > > --- a/mm/swap.c
> > > > +++ b/mm/swap.c
> > > > @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> > > >         folio_clear_referenced(folio);
> > > >
> > > >         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > > > -               /*
> > > > -                * Setting the reclaim flag could race with
> > > > -                * folio_end_writeback() and confuse readahead.  But the
> > > > -                * race window is _really_ small and  it's not a critical
> > > > -                * problem.
> > > > -                */
> > > >                 lruvec_add_folio(lruvec, folio);
> > > > -               folio_set_reclaim(folio);
> > > > +               folio_set_dropbehind(folio);
> > > >         } else {
> > > >                 /*
> > > >                  * The folio's writeback ended while it was in the batch.
> > >
> > > Now there's a difference in behavior here depending on whether or not
> > > the folio is under writeback (or will be written back soon). If it is,
> > > we set PG_dropbehind to get it freed right after, but if writeback has
> > > already ended we put it on the tail of the LRU to be freed later.
> > >
> > > It's a bit counterintuitive to me that folios with pending writeback
> > > get freed faster than folios that completed their writeback already.
> > > Am I missing something?
> >
> > Yeah, it is strange.
> >
> > I think we can drop the writeback/dirty check. Set PG_dropbehind and put
> > the page on the tail of LRU unconditionally. The check was required to
> > avoid confusion with PG_readahead.
> >
> > Comment above the function is not valid anymore.
>
> My read is that we don't put dirty/writeback folios at the tail of the
> LRU because they cannot be freed immediately and we want to give them
> time to be written back before reclaim reaches them. So I don't think
> we want to change that and always put the pages at the tail.
>
> >
> > But the folio that is still dirty under writeback will be freed faster as
> > we get rid of the folio just after writeback is done while clean page can
> > dangle on LRU for a while.
>
> Yeah if we reuse PG_dropbehind then we cannot avoid
> folio_end_writeback() freeing the folio faster than clean ones.
>
> >
> > I don't think we have any convenient place to free clean dropbehind page
> > other than shrink_folio_list(). Or do we?
>
> Not sure tbh. FWIW I am not saying it's necessarily a bad thing to
> free dirty/writeback folios before clean ones when deactivated, it's
> just strange and a behavioral change from today that I wanted to point
> out. Perhaps that's the best we can do for now.
>
> >
> > Looking at shrink_folio_list(), I think we need to bypass page demotion
> > for PG_dropbehind pages.

I agree with Yosry. I don't think lru_deactivate_file() is still
needed: it was needed only because, when truncation fails to free a
dirty/writeback folio, page reclaim can do that quickly. For the other
conditions under which mapping_evict_folio() returns 0, there isn't much
page reclaim can do, and those cases are not what deactivate_file_folio()
and lru_deactivate_file() were meant for. So the following should be
enough, and it's a lot cleaner:

diff --git a/mm/truncate.c b/mm/truncate.c
index e2e115adfbc5..12d2aa608517 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
                         * of interest and try to speed up its reclaim.
                         */
                        if (!ret) {
-                               deactivate_file_folio(folio);
+                               folio_set_dropbehind(folio);
                                /* Likely in the lru cache of a remote CPU */
                                if (nr_failed)
                                        (*nr_failed)++;

Then we can drop deactivate_file_folio() and lru_deactivate_file().
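A sketch of the flow proposed above for mapping_try_invalidate(): a folio that cannot be evicted is marked dropbehind and counted in nr_failed, instead of being handed to deactivate_file_folio(). toy_evict() is a hypothetical stand-in for mapping_evict_folio() that models only the dirty/writeback failure case, and returning the number of evicted folios is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_folio {
	bool dirty, writeback, dropbehind, evicted;
};

/* Hypothetical stand-in for mapping_evict_folio(): returns 0 when the
 * folio cannot be dropped right now (only dirty/writeback is modelled). */
static int toy_evict(struct toy_folio *f)
{
	if (f->dirty || f->writeback)
		return 0;
	f->evicted = true;
	return 1;
}

/* Proposed flow: on failure, mark the folio dropbehind so it can be freed
 * once writeback finishes, and count the failure if the caller asked. */
static unsigned long toy_try_invalidate(struct toy_folio *folios, size_t n,
					unsigned long *nr_failed)
{
	unsigned long count = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = toy_evict(&folios[i]);

		if (!ret) {
			folios[i].dropbehind = true;
			if (nr_failed)
				(*nr_failed)++;
		}
		count += ret;
	}
	return count;
}
```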


* Re: [PATCH 4/8] mm/swap: Use PG_dropbehind instead of PG_reclaim
  2025-01-15  4:28         ` Yu Zhao
@ 2025-01-15  4:31           ` Yu Zhao
  0 siblings, 0 replies; 28+ messages in thread
From: Yu Zhao @ 2025-01-15  4:31 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Kirill A. Shutemov, Andrew Morton, Matthew Wilcox (Oracle),
	Jens Axboe, Jason A. Donenfeld, Andi Shyti, Chengming Zhou,
	Christian Brauner, Christophe Leroy, Dan Carpenter, David Airlie,
	David Hildenbrand, Hao Ge, Jani Nikula, Johannes Weiner,
	Joonas Lahtinen, Josef Bacik, Masami Hiramatsu, Mathieu Desnoyers,
	Miklos Szeredi, Nhat Pham, Oscar Salvador, Ran Xiaokai,
	Rodrigo Vivi, Simona Vetter, Steven Rostedt, Tvrtko Ursulin,
	Vlastimil Babka, intel-gfx, dri-devel, linux-kernel,
	linux-fsdevel, linux-mm, linux-trace-kernel

On Tue, Jan 14, 2025 at 9:28 PM Yu Zhao <yuzhao@google.com> wrote:
>
> On Tue, Jan 14, 2025 at 11:03 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > On Tue, Jan 14, 2025 at 12:12 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > On Mon, Jan 13, 2025 at 08:17:20AM -0800, Yosry Ahmed wrote:
> > > > On Mon, Jan 13, 2025 at 1:35 AM Kirill A. Shutemov
> > > > <kirill.shutemov@linux.intel.com> wrote:
> > > > >
> > > > > The recently introduced PG_dropbehind allows for freeing folios
> > > > > immediately after writeback. Unlike PG_reclaim, it does not need vmscan
> > > > > to be involved to get the folio freed.
> > > > >
> > > > > Instead of using folio_set_reclaim(), use folio_set_dropbehind() in
> > > > > lru_deactivate_file().
> > > > >
> > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > > > ---
> > > > >  mm/swap.c | 8 +-------
> > > > >  1 file changed, 1 insertion(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/mm/swap.c b/mm/swap.c
> > > > > index fc8281ef4241..4eb33b4804a8 100644
> > > > > --- a/mm/swap.c
> > > > > +++ b/mm/swap.c
> > > > > @@ -562,14 +562,8 @@ static void lru_deactivate_file(struct lruvec *lruvec, struct folio *folio)
> > > > >         folio_clear_referenced(folio);
> > > > >
> > > > >         if (folio_test_writeback(folio) || folio_test_dirty(folio)) {
> > > > > -               /*
> > > > > -                * Setting the reclaim flag could race with
> > > > > -                * folio_end_writeback() and confuse readahead.  But the
> > > > > -                * race window is _really_ small and  it's not a critical
> > > > > -                * problem.
> > > > > -                */
> > > > >                 lruvec_add_folio(lruvec, folio);
> > > > > -               folio_set_reclaim(folio);
> > > > > +               folio_set_dropbehind(folio);
> > > > >         } else {
> > > > >                 /*
> > > > >                  * The folio's writeback ended while it was in the batch.
> > > >
> > > > Now there's a difference in behavior here depending on whether or not
> > > > the folio is under writeback (or will be written back soon). If it is,
> > > > we set PG_dropbehind to get it freed right after, but if writeback has
> > > > already ended we put it on the tail of the LRU to be freed later.
> > > >
> > > > It's a bit counterintuitive to me that folios with pending writeback
> > > > get freed faster than folios that completed their writeback already.
> > > > Am I missing something?
> > >
> > > Yeah, it is strange.
> > >
> > > I think we can drop the writeback/dirty check. Set PG_dropbehind and put
> > > the page on the tail of LRU unconditionally. The check was required to
> > > avoid confusion with PG_readahead.
> > >
> > > Comment above the function is not valid anymore.
> >
> > My read is that we don't put dirty/writeback folios at the tail of the
> > LRU because they cannot be freed immediately and we want to give them
> > time to be written back before reclaim reaches them. So I don't think
> > we want to change that and always put the pages at the tail.
> >
> > >
> > > But the folio that is still dirty under writeback will be freed faster as
> > > we get rid of the folio just after writeback is done while clean page can
> > > dangle on LRU for a while.
> >
> > Yeah if we reuse PG_dropbehind then we cannot avoid
> > folio_end_writeback() freeing the folio faster than clean ones.
> >
> > >
> > > I don't think we have any convenient place to free clean dropbehind page
> > > other than shrink_folio_list(). Or do we?
> >
> > Not sure tbh. FWIW I am not saying it's necessarily a bad thing to
> > free dirty/writeback folios before clean ones when deactivated, it's
> > just strange and a behavioral change from today that I wanted to point
> > out. Perhaps that's the best we can do for now.
> >
> > >
> > > Looking at shrink_folio_list(), I think we need to bypass page demotion
> > > for PG_dropbehind pages.
>
> I agree with Yosry. I don't think lru_deactivate_file() is still
> needed -- it was needed only because when truncation fails to free a
> dirty/writeback folio, page reclaim can do that quickly. For other
> conditions that mapping_evict_folio() returns 0, there isn't much page
> reclaim can do, and those conditions are not deactivate_file_folio()
> and lru_deactivate_file()'s intentions. So the following should be
> enough, and it's a lot cleaner :
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index e2e115adfbc5..12d2aa608517 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -486,7 +486,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
>                          * of interest and try to speed up its reclaim.
>                          */
>                         if (!ret) {
> -                               deactivate_file_folio(folio);
> +                               folio_set_dropbehind(folio);
>                                 /* Likely in the lru cache of a remote CPU */
>                                 if (nr_failed)
>                                         (*nr_failed)++;
>
> Then we can drop deactivate_file_folio() and lru_deactivate_file().

And with the above and lru_move_tail() removed, we can also remove
lruvec_add_folio_tail().


end of thread, other threads:[~2025-01-15  4:31 UTC | newest]

Thread overview: 28+ messages
2025-01-13  9:34 [PATCH 0/8] mm: Remove PG_reclaim Kirill A. Shutemov
2025-01-13  9:34 ` [PATCH 1/8] drm/i915/gem: Convert __shmem_writeback() to folios Kirill A. Shutemov
2025-01-13 10:05   ` David Hildenbrand
2025-01-13  9:34 ` [PATCH 2/8] drm/i915/gem: Use PG_dropbehind instead of PG_reclaim Kirill A. Shutemov
2025-01-13 10:06   ` David Hildenbrand
2025-01-13  9:34 ` [PATCH 3/8] mm/zswap: " Kirill A. Shutemov
2025-01-13 10:06   ` David Hildenbrand
2025-01-13 16:10   ` Yosry Ahmed
2025-01-13  9:34 ` [PATCH 4/8] mm/swap: " Kirill A. Shutemov
2025-01-13 10:07   ` David Hildenbrand
2025-01-13 16:17   ` Yosry Ahmed
2025-01-14  8:12     ` Kirill A. Shutemov
2025-01-14 18:02       ` Yosry Ahmed
2025-01-15  4:28         ` Yu Zhao
2025-01-15  4:31           ` Yu Zhao
2025-01-13  9:34 ` [PATCH 5/8] mm/vmscan: " Kirill A. Shutemov
2025-01-13 10:07   ` David Hildenbrand
2025-01-13  9:34 ` [PATCH 6/8] mm/vmscan: Use PG_dropbehind instead of PG_reclaim in shrink_folio_list() Kirill A. Shutemov
2025-01-13 10:08   ` David Hildenbrand
2025-01-13  9:34 ` [PATCH 7/8] mm/mglru: Check PG_dropcache instead of PG_reclaim in lru_gen_folio_seq() Kirill A. Shutemov
2025-01-13 10:09   ` David Hildenbrand
2025-01-13  9:34 ` [PATCH 8/8] mm: Remove PG_reclaim Kirill A. Shutemov
2025-01-13 10:11   ` David Hildenbrand
2025-01-13 15:28   ` Matthew Wilcox
2025-01-14  8:30     ` Kirill A. Shutemov
2025-01-14 17:01       ` Yu Zhao
2025-01-13 13:45 ` [PATCH 0/8] " Matthew Wilcox
2025-01-13 14:07   ` Kirill A. Shutemov
