linux-mm.kvack.org archive mirror
* [RFC PATCH 00/14] Rearrange batched folio freeing
@ 2023-08-25 13:59 Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
                   ` (18 more replies)
  0 siblings, 19 replies; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Other than the obvious "remove calls to compound_head" changes, the
fundamental belief here is that iterating a linked list is much slower
than iterating an array (5-15x slower in my testing).  There's also
an associated belief that, since we iterate the batch of folios three
times, we do better with a small array (i.e., 15 entries) than with a
batch that is hundreds of entries long, which only gives the first
entries time to fall out of cache before we reach the end.
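
A rough way to see the list-vs-array gap outside the kernel is a toy
userspace benchmark like the one below.  It is only a sketch: the node
layout, the count and the timing method are illustrative assumptions,
not the kernel's workload, so the exact ratio will differ.

/* Walk the same set of individually-allocated nodes twice: once by
 * chasing ->next pointers, once through an array of pointers.  The
 * array walk has no load-to-load dependency, so the CPU can run ahead
 * and prefetch; the list walk cannot. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct node {
	struct node *next;
	long payload;
};

#define N (1 << 20)

static long ns(struct timespec a, struct timespec b)
{
	return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
	struct node **array = malloc(N * sizeof(*array));
	struct node *head = NULL;
	struct timespec t0, t1, t2;
	long sum = 0;

	for (long i = 0; i < N; i++) {
		struct node *n = malloc(sizeof(*n));

		n->payload = i;
		n->next = head;
		head = n;
		array[i] = n;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (struct node *n = head; n; n = n->next)
		sum += n->payload;
	clock_gettime(CLOCK_MONOTONIC, &t1);
	for (long i = 0; i < N; i++)
		sum += array[i]->payload;
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("list:  %ld ns\narray: %ld ns\n(sum %ld)\n",
	       ns(t0, t1), ns(t1, t2), sum);
	return 0;
}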

The one place where that probably falls down is "Free folios in a batch
in shrink_folio_list()", where we'll flush the TLB once per batch instead
of once at the end.  That's going to take some benchmarking.

Matthew Wilcox (Oracle) (14):
  mm: Make folios_put() the basis of release_pages()
  mm: Convert free_unref_page_list() to use folios
  mm: Add free_unref_folios()
  mm: Use folios_put() in __folio_batch_release()
  memcg: Add mem_cgroup_uncharge_folios()
  mm: Remove use of folio list from folios_put()
  mm: Use free_unref_folios() in put_pages_list()
  mm: use __page_cache_release() in folios_put()
  mm: Handle large folios in free_unref_folios()
  mm: Allow non-hugetlb large folios to be batch processed
  mm: Free folios in a batch in shrink_folio_list()
  mm: Free folios directly in move_folios_to_lru()
  memcg: Remove mem_cgroup_uncharge_list()
  mm: Remove free_unref_page_list()

 include/linux/memcontrol.h |  24 ++---
 include/linux/mm.h         |  19 +---
 mm/internal.h              |   4 +-
 mm/memcontrol.c            |  16 ++--
 mm/mlock.c                 |   3 +-
 mm/page_alloc.c            |  74 ++++++++-------
 mm/swap.c                  | 180 ++++++++++++++++++++-----------------
 mm/vmscan.c                |  51 +++++------
 8 files changed, 181 insertions(+), 190 deletions(-)

-- 
2.40.1



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:21   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

By making release_pages() call folios_put(), we can get rid of the calls
to compound_head() for the callers that already know they have folios.
We can also get rid of the lock_batch tracking, since the size of the
batch is already limited by the folio_batch.
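
The reworked release_pages() is an instance of a generic shape: feed an
arbitrary-length input through a small fixed-size batch, flush whenever
the batch fills, and flush once more for the remainder.  A standalone
sketch of that shape (batch_add()/flush() and BATCH_SIZE are made-up
stand-ins, not the kernel API; the only property borrowed from
folio_batch_add() is that adding returns the space left, as the diff's
"> 0" test relies on):

#include <stdio.h>

#define BATCH_SIZE 15

struct batch {
	unsigned int nr;
	void *slots[BATCH_SIZE];
};

/* Add an item; return how many slots remain, so 0 means "now full". */
static unsigned int batch_add(struct batch *b, void *item)
{
	b->slots[b->nr++] = item;
	return BATCH_SIZE - b->nr;
}

/* Stand-in for folios_put(): process every entry, then reset nr. */
static void flush(struct batch *b)
{
	for (unsigned int i = 0; i < b->nr; i++)
		printf("put %p\n", b->slots[i]);
	b->nr = 0;
}

static void release_all(void **items, int nr)
{
	struct batch b = { .nr = 0 };

	for (int i = 0; i < nr; i++) {
		if (batch_add(&b, items[i]) > 0)
			continue;
		flush(&b);		/* batch is full */
	}
	if (b.nr)
		flush(&b);		/* partial final batch */
}

int main(void)
{
	void *items[40];

	for (int i = 0; i < 40; i++)
		items[i] = &items[i];
	release_all(items, 40);
	return 0;
}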

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h | 19 ++---------
 mm/mlock.c         |  3 +-
 mm/swap.c          | 84 +++++++++++++++++++++++++++-------------------
 3 files changed, 52 insertions(+), 54 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c6b54b5506a..7d1d96b75d11 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -36,6 +36,7 @@ struct anon_vma;
 struct anon_vma_chain;
 struct user_struct;
 struct pt_regs;
+struct folio_batch;
 
 extern int sysctl_page_lock_unfairness;
 
@@ -1521,23 +1522,7 @@ typedef union {
 } release_pages_arg __attribute__ ((__transparent_union__));
 
 void release_pages(release_pages_arg, int nr);
-
-/**
- * folios_put - Decrement the reference count on an array of folios.
- * @folios: The folios.
- * @nr: How many folios there are.
- *
- * Like folio_put(), but for an array of folios.  This is more efficient
- * than writing the loop yourself as it will optimise the locks which
- * need to be taken if the folios are freed.
- *
- * Context: May be called in process or interrupt context, but not in NMI
- * context.  May be called while holding a spinlock.
- */
-static inline void folios_put(struct folio **folios, unsigned int nr)
-{
-	release_pages(folios, nr);
-}
+void folios_put(struct folio_batch *folios);
 
 static inline void put_page(struct page *page)
 {
diff --git a/mm/mlock.c b/mm/mlock.c
index 06bdfab83b58..67bd74a6268a 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -206,8 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
 
 	if (lruvec)
 		unlock_page_lruvec_irq(lruvec);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 void mlock_drain_local(void)
diff --git a/mm/swap.c b/mm/swap.c
index cd8f0150ba3a..7bdc63b56859 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -89,7 +89,7 @@ static void __page_cache_release(struct folio *folio)
 		__folio_clear_lru_flags(folio);
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
-	/* See comment on folio_test_mlocked in release_pages() */
+	/* See comment on folio_test_mlocked in folios_put() */
 	if (unlikely(folio_test_mlocked(folio))) {
 		long nr_pages = folio_nr_pages(folio);
 
@@ -175,7 +175,7 @@ static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
 	 * while the LRU lock is held.
 	 *
 	 * (That is not true of __page_cache_release(), and not necessarily
-	 * true of release_pages(): but those only clear the mlocked flag after
+	 * true of folios_put(): but those only clear the mlocked flag after
 	 * folio_put_testzero() has excluded any other users of the folio.)
 	 */
 	if (folio_evictable(folio)) {
@@ -221,8 +221,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
-	folios_put(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
@@ -946,41 +945,27 @@ void lru_cache_disable(void)
 }
 
 /**
- * release_pages - batched put_page()
- * @arg: array of pages to release
- * @nr: number of pages
+ * folios_put - Decrement the reference count on a batch of folios.
+ * @folios: The folios.
  *
- * Decrement the reference count on all the pages in @arg.  If it
- * fell to zero, remove the page from the LRU and free it.
+ * Like folio_put(), but for a batch of folios.  This is more efficient
+ * than writing the loop yourself as it will optimise the locks which need
+ * to be taken if the folios are freed.  The folios batch is returned
+ * empty and ready to be reused for another batch; there is no need to
+ * reinitialise it.
  *
- * Note that the argument can be an array of pages, encoded pages,
- * or folio pointers. We ignore any encoded bits, and turn any of
- * them into just a folio that gets free'd.
+ * Context: May be called in process or interrupt context, but not in NMI
+ * context.  May be called while holding a spinlock.
  */
-void release_pages(release_pages_arg arg, int nr)
+void folios_put(struct folio_batch *folios)
 {
 	int i;
-	struct encoded_page **encoded = arg.encoded_pages;
 	LIST_HEAD(pages_to_free);
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
-	unsigned int lock_batch;
 
-	for (i = 0; i < nr; i++) {
-		struct folio *folio;
-
-		/* Turn any of the argument types into a folio */
-		folio = page_folio(encoded_page_ptr(encoded[i]));
-
-		/*
-		 * Make sure the IRQ-safe lock-holding time does not get
-		 * excessive with a continuous string of pages from the
-		 * same lruvec. The lock is held only if lruvec != NULL.
-		 */
-		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
-			unlock_page_lruvec_irqrestore(lruvec, flags);
-			lruvec = NULL;
-		}
+	for (i = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
 
 		if (is_huge_zero_page(&folio->page))
 			continue;
@@ -1010,13 +995,8 @@ void release_pages(release_pages_arg arg, int nr)
 		}
 
 		if (folio_test_lru(folio)) {
-			struct lruvec *prev_lruvec = lruvec;
-
 			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
 									&flags);
-			if (prev_lruvec != lruvec)
-				lock_batch = 0;
-
 			lruvec_del_folio(lruvec, folio);
 			__folio_clear_lru_flags(folio);
 		}
@@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr)
 
 	mem_cgroup_uncharge_list(&pages_to_free);
 	free_unref_page_list(&pages_to_free);
+	folios->nr = 0;
+}
+EXPORT_SYMBOL(folios_put);
+
+/**
+ * release_pages - batched put_page()
+ * @arg: array of pages to release
+ * @nr: number of pages
+ *
+ * Decrement the reference count on all the pages in @arg.  If it
+ * fell to zero, remove the page from the LRU and free it.
+ *
+ * Note that the argument can be an array of pages, encoded pages,
+ * or folio pointers. We ignore any encoded bits, and turn any of
+ * them into just a folio that gets free'd.
+ */
+void release_pages(release_pages_arg arg, int nr)
+{
+	struct folio_batch fbatch;
+	struct encoded_page **encoded = arg.encoded_pages;
+	int i;
+
+	folio_batch_init(&fbatch);
+	for (i = 0; i < nr; i++) {
+		/* Turn any of the argument types into a folio */
+		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
+
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		folios_put(&fbatch);
+	}
+
+	if (fbatch.nr)
+		folios_put(&fbatch);
 }
 EXPORT_SYMBOL(release_pages);
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:29   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 03/14] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Most of its callees are not yet ready to accept a folio, but we know
all of the pages passed in are actually folios because they're linked
through ->lru.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page_alloc.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 442c1b3480aa..f1ee96fd9bef 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2469,17 +2469,17 @@ void free_unref_page(struct page *page, unsigned int order)
 void free_unref_page_list(struct list_head *list)
 {
 	unsigned long __maybe_unused UP_flags;
-	struct page *page, *next;
+	struct folio *folio, *next;
 	struct per_cpu_pages *pcp = NULL;
 	struct zone *locked_zone = NULL;
 	int batch_count = 0;
 	int migratetype;
 
 	/* Prepare pages for freeing */
-	list_for_each_entry_safe(page, next, list, lru) {
-		unsigned long pfn = page_to_pfn(page);
-		if (!free_unref_page_prepare(page, pfn, 0)) {
-			list_del(&page->lru);
+	list_for_each_entry_safe(folio, next, list, lru) {
+		unsigned long pfn = folio_pfn(folio);
+		if (!free_unref_page_prepare(&folio->page, pfn, 0)) {
+			list_del(&folio->lru);
 			continue;
 		}
 
@@ -2487,24 +2487,25 @@ void free_unref_page_list(struct list_head *list)
 		 * Free isolated pages directly to the allocator, see
 		 * comment in free_unref_page.
 		 */
-		migratetype = get_pcppage_migratetype(page);
+		migratetype = get_pcppage_migratetype(&folio->page);
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			list_del(&page->lru);
-			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
+			list_del(&folio->lru);
+			free_one_page(folio_zone(folio), &folio->page, pfn,
+					0, migratetype, FPI_NONE);
 			continue;
 		}
 	}
 
-	list_for_each_entry_safe(page, next, list, lru) {
-		struct zone *zone = page_zone(page);
+	list_for_each_entry_safe(folio, next, list, lru) {
+		struct zone *zone = folio_zone(folio);
 
-		list_del(&page->lru);
-		migratetype = get_pcppage_migratetype(page);
+		list_del(&folio->lru);
+		migratetype = get_pcppage_migratetype(&folio->page);
 
 		/*
 		 * Either different zone requiring a different pcp lock or
 		 * excessive lock hold times when freeing a large list of
-		 * pages.
+		 * folios.
 		 */
 		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
 			if (pcp) {
@@ -2515,15 +2516,16 @@ void free_unref_page_list(struct list_head *list)
 			batch_count = 0;
 
 			/*
-			 * trylock is necessary as pages may be getting freed
+			 * trylock is necessary as folios may be getting freed
 			 * from IRQ or SoftIRQ context after an IO completion.
 			 */
 			pcp_trylock_prepare(UP_flags);
 			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
-				free_one_page(zone, page, page_to_pfn(page),
-					      0, migratetype, FPI_NONE);
+				free_one_page(zone, &folio->page,
+						folio_pfn(folio), 0,
+						migratetype, FPI_NONE);
 				locked_zone = NULL;
 				continue;
 			}
@@ -2537,8 +2539,8 @@ void free_unref_page_list(struct list_head *list)
 		if (unlikely(migratetype >= MIGRATE_PCPTYPES))
 			migratetype = MIGRATE_MOVABLE;
 
-		trace_mm_page_free_batched(page);
-		free_unref_page_commit(zone, pcp, page, migratetype, 0);
+		trace_mm_page_free_batched(&folio->page);
+		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
 		batch_count++;
 	}
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 03/14] mm: Add free_unref_folios()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:39   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Iterate over a folio_batch rather than a linked list.  This is
easier for the CPU to prefetch and has a batch count naturally
built in so we don't need to track it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/internal.h   |  5 +++--
 mm/page_alloc.c | 59 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 7499b5ea1cf6..5c6a53371aeb 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -441,8 +441,9 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
 					gfp_t gfp_flags);
 extern int user_min_free_kbytes;
 
-extern void free_unref_page(struct page *page, unsigned int order);
-extern void free_unref_page_list(struct list_head *list);
+void free_unref_page(struct page *page, unsigned int order);
+void free_unref_folios(struct folio_batch *fbatch);
+void free_unref_page_list(struct list_head *list);
 
 extern void zone_pcp_reset(struct zone *zone);
 extern void zone_pcp_disable(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f1ee96fd9bef..bca5c70b5576 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -32,6 +32,7 @@
 #include <linux/sysctl.h>
 #include <linux/cpu.h>
 #include <linux/cpuset.h>
+#include <linux/pagevec.h>
 #include <linux/memory_hotplug.h>
 #include <linux/nodemask.h>
 #include <linux/vmstat.h>
@@ -2464,57 +2465,51 @@ void free_unref_page(struct page *page, unsigned int order)
 }
 
 /*
- * Free a list of 0-order pages
+ * Free a batch of 0-order pages
  */
-void free_unref_page_list(struct list_head *list)
+void free_unref_folios(struct folio_batch *folios)
 {
 	unsigned long __maybe_unused UP_flags;
-	struct folio *folio, *next;
 	struct per_cpu_pages *pcp = NULL;
 	struct zone *locked_zone = NULL;
-	int batch_count = 0;
-	int migratetype;
+	int i, j, migratetype;
 
-	/* Prepare pages for freeing */
-	list_for_each_entry_safe(folio, next, list, lru) {
+	/* Prepare folios for freeing */
+	for (i = 0, j = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
 		unsigned long pfn = folio_pfn(folio);
-		if (!free_unref_page_prepare(&folio->page, pfn, 0)) {
-			list_del(&folio->lru);
+		if (!free_unref_page_prepare(&folio->page, pfn, 0))
 			continue;
-		}
 
 		/*
-		 * Free isolated pages directly to the allocator, see
+		 * Free isolated folios directly to the allocator, see
 		 * comment in free_unref_page.
 		 */
 		migratetype = get_pcppage_migratetype(&folio->page);
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			list_del(&folio->lru);
 			free_one_page(folio_zone(folio), &folio->page, pfn,
 					0, migratetype, FPI_NONE);
 			continue;
 		}
+		if (j != i)
+			folios->folios[j] = folio;
+		j++;
 	}
+	folios->nr = j;
 
-	list_for_each_entry_safe(folio, next, list, lru) {
+	for (i = 0; i < folios->nr; i++) {
+		struct folio *folio = folios->folios[i];
 		struct zone *zone = folio_zone(folio);
 
-		list_del(&folio->lru);
 		migratetype = get_pcppage_migratetype(&folio->page);
 
-		/*
-		 * Either different zone requiring a different pcp lock or
-		 * excessive lock hold times when freeing a large list of
-		 * folios.
-		 */
-		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
+		/* Different zone requires a different pcp lock */
+		if (zone != locked_zone) {
 			if (pcp) {
 				pcp_spin_unlock(pcp);
 				pcp_trylock_finish(UP_flags);
 			}
 
-			batch_count = 0;
-
 			/*
 			 * trylock is necessary as folios may be getting freed
 			 * from IRQ or SoftIRQ context after an IO completion.
@@ -2541,13 +2536,31 @@ void free_unref_page_list(struct list_head *list)
 
 		trace_mm_page_free_batched(&folio->page);
 		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
-		batch_count++;
 	}
 
 	if (pcp) {
 		pcp_spin_unlock(pcp);
 		pcp_trylock_finish(UP_flags);
 	}
+	folios->nr = 0;
+}
+
+void free_unref_page_list(struct list_head *list)
+{
+	struct folio_batch fbatch;
+
+	folio_batch_init(&fbatch);
+	while (!list_empty(list)) {
+		struct folio *folio = list_first_entry(list, struct folio, lru);
+
+		list_del(&folio->lru);
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		free_unref_folios(&fbatch);
+	}
+
+	if (fbatch.nr)
+		free_unref_folios(&fbatch);
 }
 
 /*
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 03/14] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:41   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

There's no need to indirect through release_pages() and iterate
over this batch of folios an extra time; we can just use the batch
that we have.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/swap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 7bdc63b56859..c5ea0c6669e7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1073,8 +1073,7 @@ void __folio_batch_release(struct folio_batch *fbatch)
 		lru_add_drain();
 		fbatch->percpu_pvec_drained = true;
 	}
-	release_pages(fbatch->folios, folio_batch_count(fbatch));
-	folio_batch_reinit(fbatch);
+	folios_put(fbatch);
 }
 EXPORT_SYMBOL(__folio_batch_release);
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:49   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 06/14] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Almost identical to mem_cgroup_uncharge_list(), except it takes a
folio_batch instead of a list_head.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h | 12 ++++++++++++
 mm/memcontrol.c            | 13 +++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ab94ad4597d0..d0caeb1c1136 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -713,6 +713,14 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
 	__mem_cgroup_uncharge_list(page_list);
 }
 
+void __mem_cgroup_uncharge_folios(struct folio_batch *folios);
+static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
+{
+	if (mem_cgroup_disabled())
+		return;
+	__mem_cgroup_uncharge_folios(folios);
+}
+
 void mem_cgroup_migrate(struct folio *old, struct folio *new);
 
 /**
@@ -1273,6 +1281,10 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
 {
 }
 
+static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
+{
+}
+
 static inline void mem_cgroup_migrate(struct folio *old, struct folio *new)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 26ba9d1a66d1..a0fb008afa23 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -33,6 +33,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/hugetlb.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/vm_event_item.h>
 #include <linux/smp.h>
 #include <linux/page-flags.h>
@@ -7190,6 +7191,18 @@ void __mem_cgroup_uncharge_list(struct list_head *page_list)
 		uncharge_batch(&ug);
 }
 
+void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
+{
+	struct uncharge_gather ug;
+	unsigned int i;
+
+	uncharge_gather_clear(&ug);
+	for (i = 0; i < folios->nr; i++)
+		uncharge_folio(folios->folios[i], &ug);
+	if (ug.memcg)
+		uncharge_batch(&ug);
+}
+
 /**
  * mem_cgroup_migrate - Charge a folio's replacement.
  * @old: Currently circulating folio.
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 06/14] mm: Remove use of folio list from folios_put()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 14:53   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 07/14] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Instead of putting the interesting folios on a list, delete the
uninteresting ones from the folio_batch.
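
The list is replaced by the usual two-index compaction idiom: walk the
batch with i, copy the entries that should stay down to slot j, and set
nr to j afterwards.  A minimal standalone sketch, with an integer batch
and "odd value" standing in for the real "uninteresting" tests (refcount
did not reach zero, huge zero page, and so on):

#include <stdio.h>

#define BATCH_SIZE 15

struct batch {
	unsigned int nr;
	int slots[BATCH_SIZE];
};

/* Drop uninteresting entries in place, mirroring the i/j loop that
 * folios_put() now uses instead of building a second list. */
static void compact(struct batch *b)
{
	unsigned int i, j;

	for (i = 0, j = 0; i < b->nr; i++) {
		int v = b->slots[i];

		if (v & 1)		/* uninteresting: skip it */
			continue;
		if (j != i)		/* avoid self-copy in the common case */
			b->slots[j] = v;
		j++;
	}
	b->nr = j;
}

int main(void)
{
	struct batch b = { .nr = 8, .slots = { 3, 4, 7, 8, 10, 11, 12, 15 } };

	compact(&b);
	for (unsigned int i = 0; i < b.nr; i++)
		printf("%d ", b.slots[i]);
	printf("(nr = %u)\n", b.nr);	/* prints: 4 8 10 12 (nr = 4) */
	return 0;
}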

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/swap.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index c5ea0c6669e7..6b2f3f77450c 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -959,12 +959,11 @@ void lru_cache_disable(void)
  */
 void folios_put(struct folio_batch *folios)
 {
-	int i;
-	LIST_HEAD(pages_to_free);
+	int i, j;
 	struct lruvec *lruvec = NULL;
 	unsigned long flags = 0;
 
-	for (i = 0; i < folios->nr; i++) {
+	for (i = 0, j = 0; i < folios->nr; i++) {
 		struct folio *folio = folios->folios[i];
 
 		if (is_huge_zero_page(&folio->page))
@@ -1013,14 +1012,18 @@ void folios_put(struct folio_batch *folios)
 			count_vm_event(UNEVICTABLE_PGCLEARED);
 		}
 
-		list_add(&folio->lru, &pages_to_free);
+		if (j != i)
+			folios->folios[j] = folio;
+		j++;
 	}
 	if (lruvec)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
+	folios->nr = j;
+	if (!j)
+		return;
 
-	mem_cgroup_uncharge_list(&pages_to_free);
-	free_unref_page_list(&pages_to_free);
-	folios->nr = 0;
+	mem_cgroup_uncharge_folios(folios);
+	free_unref_folios(folios);
 }
 EXPORT_SYMBOL(folios_put);
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 07/14] mm: Use free_unref_folios() in put_pages_list()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 06/14] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 08/14] mm: use __page_cache_release() in folios_put() Matthew Wilcox (Oracle)
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Break up the list of folios into batches here so that the folios are
more likely to be cache hot when doing the rest of the processing.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/swap.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 6b2f3f77450c..fffdb48cfbf9 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -138,22 +138,25 @@ EXPORT_SYMBOL(__folio_put);
  */
 void put_pages_list(struct list_head *pages)
 {
-	struct folio *folio, *next;
+	struct folio_batch fbatch;
+	struct folio *folio;
 
-	list_for_each_entry_safe(folio, next, pages, lru) {
-		if (!folio_put_testzero(folio)) {
-			list_del(&folio->lru);
+	folio_batch_init(&fbatch);
+	list_for_each_entry(folio, pages, lru) {
+		if (!folio_put_testzero(folio))
 			continue;
-		}
 		if (folio_test_large(folio)) {
-			list_del(&folio->lru);
 			__folio_put_large(folio);
 			continue;
 		}
 		/* LRU flag must be clear because it's passed using the lru */
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		free_unref_folios(&fbatch);
 	}
 
-	free_unref_page_list(pages);
+	if (fbatch.nr)
+		free_unref_folios(&fbatch);
 	INIT_LIST_HEAD(pages);
 }
 EXPORT_SYMBOL(put_pages_list);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 08/14] mm: use __page_cache_release() in folios_put()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 07/14] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-25 13:59 ` [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Pass a pointer to the lruvec so we can take advantage of
folio_lruvec_relock_irqsave().  Adjust the calling convention of
folio_lruvec_relock_irqsave() to suit, and add a page_cache_release()
wrapper.
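
The new convention is easier to see in a reduced form: the helper takes
a pointer to "which lock do I currently hold" and only switches locks
when the caller moves to an object covered by a different one, so runs
of items from the same lruvec pay for a single lock/unlock pair.  A
userspace sketch with pthread mutexes (bucket/bucket_relock() are
made-up names; the IRQ flags and the memcg lookup handled by the kernel
version are omitted):

#include <pthread.h>
#include <stdio.h>

struct bucket {
	pthread_mutex_t lock;
	long count;
};

/* Lock the bucket for this item unless *lockedp already points at it,
 * dropping whatever other bucket lock we were holding first. */
static void bucket_relock(struct bucket *b, struct bucket **lockedp)
{
	if (*lockedp == b)
		return;
	if (*lockedp)
		pthread_mutex_unlock(&(*lockedp)->lock);
	pthread_mutex_lock(&b->lock);
	*lockedp = b;
}

int main(void)
{
	struct bucket buckets[2] = {
		{ PTHREAD_MUTEX_INITIALIZER, 0 },
		{ PTHREAD_MUTEX_INITIALIZER, 0 },
	};
	int seq[] = { 0, 0, 0, 1, 1, 0 };	/* which bucket each item is in */
	struct bucket *locked = NULL;

	for (unsigned int i = 0; i < sizeof(seq) / sizeof(seq[0]); i++) {
		bucket_relock(&buckets[seq[i]], &locked);
		buckets[seq[i]].count++;	/* work done under the lock */
	}
	if (locked)
		pthread_mutex_unlock(&locked->lock);

	printf("%ld %ld\n", buckets[0].count, buckets[1].count);
	return 0;
}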

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h | 16 +++++-----
 mm/swap.c                  | 62 ++++++++++++++++++--------------------
 2 files changed, 37 insertions(+), 41 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d0caeb1c1136..44d3e459d16f 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1666,18 +1666,18 @@ static inline struct lruvec *folio_lruvec_relock_irq(struct folio *folio,
 	return folio_lruvec_lock_irq(folio);
 }
 
-/* Don't lock again iff page's lruvec locked */
-static inline struct lruvec *folio_lruvec_relock_irqsave(struct folio *folio,
-		struct lruvec *locked_lruvec, unsigned long *flags)
+/* Don't lock again iff folio's lruvec locked */
+static inline void folio_lruvec_relock_irqsave(struct folio *folio,
+		struct lruvec **lruvecp, unsigned long *flags)
 {
-	if (locked_lruvec) {
-		if (folio_matches_lruvec(folio, locked_lruvec))
-			return locked_lruvec;
+	if (*lruvecp) {
+		if (folio_matches_lruvec(folio, *lruvecp))
+			return;
 
-		unlock_page_lruvec_irqrestore(locked_lruvec, *flags);
+		unlock_page_lruvec_irqrestore(*lruvecp, *flags);
 	}
 
-	return folio_lruvec_lock_irqsave(folio, flags);
+	*lruvecp = folio_lruvec_lock_irqsave(folio, flags);
 }
 
 #ifdef CONFIG_CGROUP_WRITEBACK
diff --git a/mm/swap.c b/mm/swap.c
index fffdb48cfbf9..21c2df0f7928 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -74,22 +74,21 @@ static DEFINE_PER_CPU(struct cpu_fbatches, cpu_fbatches) = {
 	.lock = INIT_LOCAL_LOCK(lock),
 };
 
-/*
- * This path almost never happens for VM activity - pages are normally freed
- * in batches.  But it gets used by networking - and for compound pages.
- */
-static void __page_cache_release(struct folio *folio)
+static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp,
+		unsigned long *flagsp)
 {
 	if (folio_test_lru(folio)) {
-		struct lruvec *lruvec;
-		unsigned long flags;
-
-		lruvec = folio_lruvec_lock_irqsave(folio, &flags);
-		lruvec_del_folio(lruvec, folio);
+		folio_lruvec_relock_irqsave(folio, lruvecp, flagsp);
+		lruvec_del_folio(*lruvecp, folio);
 		__folio_clear_lru_flags(folio);
-		unlock_page_lruvec_irqrestore(lruvec, flags);
 	}
-	/* See comment on folio_test_mlocked in folios_put() */
+
+	/*
+	 * In rare cases, when truncation or holepunching raced with
+	 * munlock after VM_LOCKED was cleared, Mlocked may still be
+	 * found set here.  This does not indicate a problem, unless
+	 * "unevictable_pgs_cleared" appears worryingly large.
+	 */
 	if (unlikely(folio_test_mlocked(folio))) {
 		long nr_pages = folio_nr_pages(folio);
 
@@ -99,9 +98,23 @@ static void __page_cache_release(struct folio *folio)
 	}
 }
 
+/*
+ * This path almost never happens for VM activity - pages are normally freed
+ * in batches.  But it gets used by networking - and for compound pages.
+ */
+static void page_cache_release(struct folio *folio)
+{
+	struct lruvec *lruvec = NULL;
+	unsigned long flags;
+
+	__page_cache_release(folio, &lruvec, &flags);
+	if (lruvec)
+		unlock_page_lruvec_irqrestore(lruvec, flags);
+}
+
 static void __folio_put_small(struct folio *folio)
 {
-	__page_cache_release(folio);
+	page_cache_release(folio);
 	mem_cgroup_uncharge(folio);
 	free_unref_page(&folio->page, 0);
 }
@@ -115,7 +128,7 @@ static void __folio_put_large(struct folio *folio)
 	 * be called for hugetlb (it has a separate hugetlb_cgroup.)
 	 */
 	if (!folio_test_hugetlb(folio))
-		__page_cache_release(folio);
+		page_cache_release(folio);
 	destroy_large_folio(folio);
 }
 
@@ -216,7 +229,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 		if (move_fn != lru_add_fn && !folio_test_clear_lru(folio))
 			continue;
 
-		lruvec = folio_lruvec_relock_irqsave(folio, lruvec, &flags);
+		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
 		move_fn(lruvec, folio);
 
 		folio_set_lru(folio);
@@ -996,24 +1009,7 @@ void folios_put(struct folio_batch *folios)
 			continue;
 		}
 
-		if (folio_test_lru(folio)) {
-			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
-									&flags);
-			lruvec_del_folio(lruvec, folio);
-			__folio_clear_lru_flags(folio);
-		}
-
-		/*
-		 * In rare cases, when truncation or holepunching raced with
-		 * munlock after VM_LOCKED was cleared, Mlocked may still be
-		 * found set here.  This does not indicate a problem, unless
-		 * "unevictable_pgs_cleared" appears worryingly large.
-		 */
-		if (unlikely(folio_test_mlocked(folio))) {
-			__folio_clear_mlocked(folio);
-			zone_stat_sub_folio(folio, NR_MLOCK);
-			count_vm_event(UNEVICTABLE_PGCLEARED);
-		}
+		__page_cache_release(folio, &lruvec, &flags);
 
 		if (j != i)
 			folios->folios[j] = folio;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 08/14] mm: use __page_cache_release() in folios_put() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 15:21   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Call folio_undo_large_rmappable() if needed.  free_unref_page_prepare()
destroys the ability to call folio_order(), so stash the order in
folio->private for the benefit of the second loop.
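
The second loop can no longer ask the folio for its order, so the first
loop parks it in the otherwise-unused private field and the second loop
reads it back.  A tiny standalone sketch of the same stash-before-destroy
idea (struct obj and prepare() are stand-ins, not kernel structures):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct obj {
	unsigned int order;	/* wiped by prepare(), like the compound metadata */
	void *private;		/* spare per-object slot, like folio->private */
};

/* Stand-in for free_unref_page_prepare(): after this, o->order is gone. */
static void prepare(struct obj *o)
{
	memset(o, 0, offsetof(struct obj, private));
}

int main(void)
{
	struct obj o = { .order = 4, .private = NULL };
	unsigned int order;

	/* First pass: remember the order before the destructive step. */
	o.private = (void *)(unsigned long)o.order;
	prepare(&o);

	/* Second pass: recover the order from the stashed copy. */
	order = (unsigned int)(unsigned long)o.private;
	o.private = NULL;

	printf("order = %u\n", order);	/* prints 4 */
	return 0;
}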

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/page_alloc.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bca5c70b5576..e586d17fb7f2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2465,7 +2465,7 @@ void free_unref_page(struct page *page, unsigned int order)
 }
 
 /*
- * Free a batch of 0-order pages
+ * Free a batch of folios
  */
 void free_unref_folios(struct folio_batch *folios)
 {
@@ -2478,7 +2478,11 @@ void free_unref_folios(struct folio_batch *folios)
 	for (i = 0, j = 0; i < folios->nr; i++) {
 		struct folio *folio = folios->folios[i];
 		unsigned long pfn = folio_pfn(folio);
-		if (!free_unref_page_prepare(&folio->page, pfn, 0))
+		unsigned int order = folio_order(folio);
+
+		if (order > 0 && folio_test_large_rmappable(folio))
+			folio_undo_large_rmappable(folio);
+		if (!free_unref_page_prepare(&folio->page, pfn, order))
 			continue;
 
 		/*
@@ -2486,11 +2490,13 @@ void free_unref_folios(struct folio_batch *folios)
 		 * comment in free_unref_page.
 		 */
 		migratetype = get_pcppage_migratetype(&folio->page);
-		if (unlikely(is_migrate_isolate(migratetype))) {
+		if (order > PAGE_ALLOC_COSTLY_ORDER ||
+		    is_migrate_isolate(migratetype)) {
 			free_one_page(folio_zone(folio), &folio->page, pfn,
-					0, migratetype, FPI_NONE);
+					order, migratetype, FPI_NONE);
 			continue;
 		}
+		folio->private = (void *)(unsigned long)order;
 		if (j != i)
 			folios->folios[j] = folio;
 		j++;
@@ -2500,7 +2506,9 @@ void free_unref_folios(struct folio_batch *folios)
 	for (i = 0; i < folios->nr; i++) {
 		struct folio *folio = folios->folios[i];
 		struct zone *zone = folio_zone(folio);
+		unsigned int order = (unsigned long)folio->private;
 
+		folio->private = NULL;
 		migratetype = get_pcppage_migratetype(&folio->page);
 
 		/* Different zone requires a different pcp lock */
@@ -2519,7 +2527,7 @@ void free_unref_folios(struct folio_batch *folios)
 			if (unlikely(!pcp)) {
 				pcp_trylock_finish(UP_flags);
 				free_one_page(zone, &folio->page,
-						folio_pfn(folio), 0,
+						folio_pfn(folio), order,
 						migratetype, FPI_NONE);
 				locked_zone = NULL;
 				continue;
@@ -2535,7 +2543,8 @@ void free_unref_folios(struct folio_batch *folios)
 			migratetype = MIGRATE_MOVABLE;
 
 		trace_mm_page_free_batched(&folio->page);
-		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
+		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
+				order);
 	}
 
 	if (pcp) {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (8 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 15:28   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

Hugetlb folios still get special treatment, but normal large folios
can now be freed by free_unref_folios().  This should have a reasonable
performance impact, TBD.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/swap.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 21c2df0f7928..8bd15402cd8f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1000,12 +1000,13 @@ void folios_put(struct folio_batch *folios)
 		if (!folio_put_testzero(folio))
 			continue;
 
-		if (folio_test_large(folio)) {
+		/* hugetlb has its own memcg */
+		if (folio_test_hugetlb(folio)) {
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			__folio_put_large(folio);
+			free_huge_folio(folio);
 			continue;
 		}
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (9 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-09-04  3:43   ` Matthew Wilcox
  2023-08-25 13:59 ` [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), Mel Gorman

Use free_unref_folios() to free the folios.  This may increase the
number of IPIs from calling try_to_unmap_flush() more often, but
that's going to be very workload-dependent.
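
To put a rough number on that: with a folio_batch of about 15 entries
(the size mentioned in the cover letter), the flush now happens once
per full batch plus once at the end, where the list-based code flushed
only once per call.  A trivial arithmetic sketch, nothing kernel about
it:

#include <stdio.h>

#define BATCH_SIZE 15	/* assumed folio_batch capacity, per the cover letter */

int main(void)
{
	for (int nr = 15; nr <= 960; nr *= 4)
		printf("%4d folios: %3d flushes batched vs 1 flush list-based\n",
		       nr, nr / BATCH_SIZE + 1);	/* per full batch + final */
	return 0;
}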

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..965c429847fd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1706,14 +1706,15 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		struct pglist_data *pgdat, struct scan_control *sc,
 		struct reclaim_stat *stat, bool ignore_references)
 {
+	struct folio_batch free_folios;
 	LIST_HEAD(ret_folios);
-	LIST_HEAD(free_folios);
 	LIST_HEAD(demote_folios);
 	unsigned int nr_reclaimed = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
 	struct swap_iocb *plug = NULL;
 
+	folio_batch_init(&free_folios);
 	memset(stat, 0, sizeof(*stat));
 	cond_resched();
 	do_demote_pass = can_demote(pgdat->node_id, sc);
@@ -2111,14 +2112,11 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		nr_reclaimed += nr_pages;
 
-		/*
-		 * Is there need to periodically free_folio_list? It would
-		 * appear not as the counts should be low
-		 */
-		if (unlikely(folio_test_large(folio)))
-			destroy_large_folio(folio);
-		else
-			list_add(&folio->lru, &free_folios);
+		if (folio_batch_add(&free_folios, folio) == 0) {
+			mem_cgroup_uncharge_folios(&free_folios);
+			try_to_unmap_flush();
+			free_unref_folios(&free_folios);
+		}
 		continue;
 
 activate_locked_split:
@@ -2182,9 +2180,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
 
-	mem_cgroup_uncharge_list(&free_folios);
+	mem_cgroup_uncharge_folios(&free_folios);
 	try_to_unmap_flush();
-	free_unref_page_list(&free_folios);
+	free_unref_folios(&free_folios);
 
 	list_splice(&ret_folios, folio_list);
 	count_vm_events(PGACTIVATE, pgactivate);
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (10 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 15:46   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

The few folios which can't be moved to the LRU list (because their
refcount dropped to zero) used to be returned to the caller to dispose
of.  Simplify the calling convention by freeing those folios directly
through free_unref_folios().
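
As the diff shows, the freeing work is done outside lru_lock: when the
batch fills, the lock is dropped, the batch is uncharged and freed, and
the lock is retaken, with one more round for the final partial batch.
A reduced userspace sketch of that unlock/flush/relock shape (a pthread
mutex and the flush_batch() name stand in for the kernel primitives):

#include <pthread.h>
#include <stdio.h>

#define BATCH_SIZE 15

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int batch[BATCH_SIZE];
static unsigned int batch_nr;

/* Heavy work that we choose not to run under list_lock. */
static void flush_batch(void)
{
	for (unsigned int i = 0; i < batch_nr; i++)
		printf("free %d\n", batch[i]);
	batch_nr = 0;
}

int main(void)
{
	pthread_mutex_lock(&list_lock);
	for (int i = 0; i < 40; i++) {
		/* ... work that needs list_lock ... */
		batch[batch_nr++] = i;
		if (batch_nr < BATCH_SIZE)
			continue;
		/* Batch is full: drop the lock, flush, retake it. */
		pthread_mutex_unlock(&list_lock);
		flush_batch();
		pthread_mutex_lock(&list_lock);
	}
	if (batch_nr) {
		pthread_mutex_unlock(&list_lock);
		flush_batch();
		pthread_mutex_lock(&list_lock);
	}
	pthread_mutex_unlock(&list_lock);
	return 0;
}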

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/vmscan.c | 31 ++++++++++++-------------------
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 965c429847fd..d5080510608e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2489,8 +2489,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		struct list_head *list)
 {
 	int nr_pages, nr_moved = 0;
-	LIST_HEAD(folios_to_free);
+	struct folio_batch free_folios;
 
+	folio_batch_init(&free_folios);
 	while (!list_empty(list)) {
 		struct folio *folio = lru_to_folio(list);
 
@@ -2519,12 +2520,12 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 		if (unlikely(folio_put_testzero(folio))) {
 			__folio_clear_lru_flags(folio);
 
-			if (unlikely(folio_test_large(folio))) {
+			if (folio_batch_add(&free_folios, folio) == 0) {
 				spin_unlock_irq(&lruvec->lru_lock);
-				destroy_large_folio(folio);
+				mem_cgroup_uncharge_folios(&free_folios);
+				free_unref_folios(&free_folios);
 				spin_lock_irq(&lruvec->lru_lock);
-			} else
-				list_add(&folio->lru, &folios_to_free);
+			}
 
 			continue;
 		}
@@ -2541,10 +2542,12 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
 			workingset_age_nonresident(lruvec, nr_pages);
 	}
 
-	/*
-	 * To save our caller's stack, now use input list for pages to free.
-	 */
-	list_splice(&folios_to_free, list);
+	if (free_folios.nr) {
+		spin_unlock_irq(&lruvec->lru_lock);
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+		spin_lock_irq(&lruvec->lru_lock);
+	}
 
 	return nr_moved;
 }
@@ -2623,8 +2626,6 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	spin_unlock_irq(&lruvec->lru_lock);
 
 	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
-	mem_cgroup_uncharge_list(&folio_list);
-	free_unref_page_list(&folio_list);
 
 	/*
 	 * If dirty folios are scanned that are not queued for IO, it
@@ -2765,8 +2766,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	nr_activate = move_folios_to_lru(lruvec, &l_active);
 	nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
-	/* Keep all free folios in l_active list */
-	list_splice(&l_inactive, &l_active);
 
 	__count_vm_events(PGDEACTIVATE, nr_deactivate);
 	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
@@ -2776,8 +2775,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	if (nr_rotated)
 		lru_note_cost(lruvec, file, 0, nr_rotated);
-	mem_cgroup_uncharge_list(&l_active);
-	free_unref_page_list(&l_active);
 	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
@@ -5238,10 +5235,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 
 	spin_unlock_irq(&lruvec->lru_lock);
 
-	mem_cgroup_uncharge_list(&list);
-	free_unref_page_list(&list);
-
-	INIT_LIST_HEAD(&list);
 	list_splice_init(&clean, &list);
 
 	if (!list_empty(&list)) {
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (11 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 18:26   ` Ryan Roberts
  2023-08-25 13:59 ` [RFC PATCH 14/14] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

All users have been converted to mem_cgroup_uncharge_folios() so
we can remove this API.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/memcontrol.h | 12 ------------
 mm/memcontrol.c            | 19 -------------------
 2 files changed, 31 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 44d3e459d16f..2b3966ff85f2 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -705,14 +705,6 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
 	__mem_cgroup_uncharge(folio);
 }
 
-void __mem_cgroup_uncharge_list(struct list_head *page_list);
-static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
-{
-	if (mem_cgroup_disabled())
-		return;
-	__mem_cgroup_uncharge_list(page_list);
-}
-
 void __mem_cgroup_uncharge_folios(struct folio_batch *folios);
 static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
@@ -1277,10 +1269,6 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
 {
 }
 
-static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
-{
-}
-
 static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a0fb008afa23..c89265bd9ce6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7172,25 +7172,6 @@ void __mem_cgroup_uncharge(struct folio *folio)
 	uncharge_batch(&ug);
 }
 
-/**
- * __mem_cgroup_uncharge_list - uncharge a list of page
- * @page_list: list of pages to uncharge
- *
- * Uncharge a list of pages previously charged with
- * __mem_cgroup_charge().
- */
-void __mem_cgroup_uncharge_list(struct list_head *page_list)
-{
-	struct uncharge_gather ug;
-	struct folio *folio;
-
-	uncharge_gather_clear(&ug);
-	list_for_each_entry(folio, page_list, lru)
-		uncharge_folio(folio, &ug);
-	if (ug.memcg)
-		uncharge_batch(&ug);
-}
-
 void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
 	struct uncharge_gather ug;
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 14/14] mm: Remove free_unref_page_list()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (12 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
@ 2023-08-25 13:59 ` Matthew Wilcox (Oracle)
  2023-08-31 18:27   ` Ryan Roberts
  2023-08-30 18:50 ` [RFC PATCH 15/18] mm: Convert free_pages_and_swap_cache() to use folios_put() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-25 13:59 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle)

All callers now use free_unref_folios() so we can delete this function.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/internal.h   |  1 -
 mm/page_alloc.c | 18 ------------------
 2 files changed, 19 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5c6a53371aeb..42feb3f59ee5 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -443,7 +443,6 @@ extern int user_min_free_kbytes;
 
 void free_unref_page(struct page *page, unsigned int order);
 void free_unref_folios(struct folio_batch *fbatch);
-void free_unref_page_list(struct list_head *list);
 
 extern void zone_pcp_reset(struct zone *zone);
 extern void zone_pcp_disable(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e586d17fb7f2..496304014ec0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2554,24 +2554,6 @@ void free_unref_folios(struct folio_batch *folios)
 	folios->nr = 0;
 }
 
-void free_unref_page_list(struct list_head *list)
-{
-	struct folio_batch fbatch;
-
-	folio_batch_init(&fbatch);
-	while (!list_empty(list)) {
-		struct folio *folio = list_first_entry(list, struct folio, lru);
-
-		list_del(&folio->lru);
-		if (folio_batch_add(&fbatch, folio) > 0)
-			continue;
-		free_unref_folios(&fbatch);
-	}
-
-	if (fbatch.nr)
-		free_unref_folios(&fbatch);
-}
-
 /*
  * split_page takes a non-compound higher-order page, and splits it into
  * n (1<<order) sub-pages: page[0..n]
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 15/18] mm: Convert free_pages_and_swap_cache() to use folios_put()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (13 preceding siblings ...)
  2023-08-25 13:59 ` [RFC PATCH 14/14] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
@ 2023-08-30 18:50 ` Matthew Wilcox (Oracle)
  2023-08-30 18:50 ` [RFC PATCH 16/18] mm: Use a folio in __collapse_huge_page_copy_succeeded() Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-30 18:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), Ryan Roberts

Process the pages in batch-sized quantities instead of all at once.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/swap_state.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index b3b14bd0dd64..f68ddeb93698 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -14,6 +14,7 @@
 #include <linux/swapops.h>
 #include <linux/init.h>
 #include <linux/pagemap.h>
+#include <linux/pagevec.h>
 #include <linux/backing-dev.h>
 #include <linux/blkdev.h>
 #include <linux/migrate.h>
@@ -309,10 +310,18 @@ void free_page_and_swap_cache(struct page *page)
  */
 void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 {
+	struct folio_batch folios;
+
 	lru_add_drain();
-	for (int i = 0; i < nr; i++)
-		free_swap_cache(encoded_page_ptr(pages[i]));
-	release_pages(pages, nr);
+	folio_batch_init(&folios);
+	for (int i = 0; i < nr; i++) {
+		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
+		free_swap_cache(&folio->page);
+		if (folio_batch_add(&folios, folio) == 0)
+			folios_put(&folios);
+	}
+	if (folios.nr)
+		folios_put(&folios);
 }
 
 static inline bool swap_use_vma_readahead(void)
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 16/18] mm: Use a folio in __collapse_huge_page_copy_succeeded()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (14 preceding siblings ...)
  2023-08-30 18:50 ` [RFC PATCH 15/18] mm: Convert free_pages_and_swap_cache() to use folios_put() Matthew Wilcox (Oracle)
@ 2023-08-30 18:50 ` Matthew Wilcox (Oracle)
  2023-08-30 18:50 ` [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-30 18:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), Ryan Roberts

These pages are all chained together through the lru list, so we know
they're folios.  Use the folio APIs to save three hidden calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/khugepaged.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 88433cc25d8a..afc94c281035 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -685,8 +685,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 						spinlock_t *ptl,
 						struct list_head *compound_pagelist)
 {
-	struct page *src_page;
-	struct page *tmp;
+	struct folio *src, *tmp;
 	pte_t *_pte;
 	pte_t pteval;
 
@@ -705,7 +704,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 				ksm_might_unmap_zero_page(vma->vm_mm, pteval);
 			}
 		} else {
-			src_page = pte_page(pteval);
+			struct page *src_page = pte_page(pteval);
 			if (!PageCompound(src_page))
 				release_pte_page(src_page);
 			/*
@@ -721,14 +720,13 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 		}
 	}
 
-	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
-		list_del(&src_page->lru);
-		mod_node_page_state(page_pgdat(src_page),
-				    NR_ISOLATED_ANON + page_is_file_lru(src_page),
-				    -compound_nr(src_page));
-		unlock_page(src_page);
-		free_swap_cache(src_page);
-		putback_lru_page(src_page);
+	list_for_each_entry_safe(src, tmp, compound_pagelist, lru) {
+		list_del(&src->lru);
+		node_stat_sub_folio(src, NR_ISOLATED_ANON +
+				folio_is_file_lru(src));
+		folio_unlock(src);
+		free_swap_cache(&src->page);
+		folio_putback_lru(src);
 	}
 }
 
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (15 preceding siblings ...)
  2023-08-30 18:50 ` [RFC PATCH 16/18] mm: Use a folio in __collapse_huge_page_copy_succeeded() Matthew Wilcox (Oracle)
@ 2023-08-30 18:50 ` Matthew Wilcox (Oracle)
  2023-08-31 18:49   ` Ryan Roberts
  2023-08-30 18:50 ` [RFC PATCH 18/18] mm: Add pfn_range_put() Matthew Wilcox (Oracle)
  2023-09-04 13:25 ` [RFC PATCH 00/14] Rearrange batched folio freeing Ryan Roberts
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-30 18:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), Ryan Roberts

All but one caller already has a folio, so convert free_swap_cache()
to take a folio and remove the call to page_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/swap.h |  8 ++++----
 mm/khugepaged.c      |  2 +-
 mm/memory.c          |  2 +-
 mm/swap_state.c      | 12 ++++++------
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 493487ed7c38..3536595e3bda 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -449,9 +449,9 @@ static inline unsigned long total_swapcache_pages(void)
 	return global_node_page_state(NR_SWAPCACHE);
 }
 
-extern void free_swap_cache(struct page *page);
-extern void free_page_and_swap_cache(struct page *);
-extern void free_pages_and_swap_cache(struct encoded_page **, int);
+void free_swap_cache(struct folio *folio);
+void free_page_and_swap_cache(struct page *);
+void free_pages_and_swap_cache(struct encoded_page **, int);
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
 extern long total_swap_pages;
@@ -534,7 +534,7 @@ static inline void put_swap_device(struct swap_info_struct *si)
 /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
 #define free_swap_and_cache(e) is_pfn_swap_entry(e)
 
-static inline void free_swap_cache(struct page *page)
+static inline void free_swap_cache(struct folio *folio)
 {
 }
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index afc94c281035..7b83bb6a1199 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -725,7 +725,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 		node_stat_sub_folio(src, NR_ISOLATED_ANON +
 				folio_is_file_lru(src));
 		folio_unlock(src);
-		free_swap_cache(&src->page);
+		free_swap_cache(src);
 		folio_putback_lru(src);
 	}
 }
diff --git a/mm/memory.c b/mm/memory.c
index e35328c2f76e..2611d0fa4465 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3194,7 +3194,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 		folio_put(new_folio);
 	if (old_folio) {
 		if (page_copied)
-			free_swap_cache(&old_folio->page);
+			free_swap_cache(old_folio);
 		folio_put(old_folio);
 	}
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index f68ddeb93698..e6b4a00d3655 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -282,10 +282,8 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
  * folio_free_swap() _with_ the lock.
  * 					- Marcelo
  */
-void free_swap_cache(struct page *page)
+void free_swap_cache(struct folio *folio)
 {
-	struct folio *folio = page_folio(page);
-
 	if (folio_test_swapcache(folio) && !folio_mapped(folio) &&
 	    folio_trylock(folio)) {
 		folio_free_swap(folio);
@@ -299,9 +297,11 @@ void free_swap_cache(struct page *page)
  */
 void free_page_and_swap_cache(struct page *page)
 {
-	free_swap_cache(page);
+	struct folio *folio = page_folio(page);
+
+	free_swap_cache(folio);
 	if (!is_huge_zero_page(page))
-		put_page(page);
+		folio_put(folio);
 }
 
 /*
@@ -316,7 +316,7 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	folio_batch_init(&folios);
 	for (int i = 0; i < nr; i++) {
 		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
-		free_swap_cache(&folio->page);
+		free_swap_cache(folio);
 		if (folio_batch_add(&folios, folio) == 0)
 			folios_put(&folios);
 	}
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC PATCH 18/18] mm: Add pfn_range_put()
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (16 preceding siblings ...)
  2023-08-30 18:50 ` [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)
@ 2023-08-30 18:50 ` Matthew Wilcox (Oracle)
  2023-08-31 19:03   ` Ryan Roberts
  2023-09-04 13:25 ` [RFC PATCH 00/14] Rearrange batched folio freeing Ryan Roberts
  18 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-30 18:50 UTC (permalink / raw)
  To: linux-mm; +Cc: Matthew Wilcox (Oracle), Ryan Roberts

This function will be used imminently.
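
As a rough illustration of the intended calling convention (a hypothetical
sketch only; 'large' and 'small' are made-up folio variables, not code from
this series):

	/*
	 * Each pfn_range must lie entirely within one folio; end - start
	 * references are dropped against the folio containing 'start'.
	 */
	struct pfn_range ranges[2];

	ranges[0].start = folio_pfn(large) + 4;	/* pages 4..6 of 'large' */
	ranges[0].end = folio_pfn(large) + 7;	/* drops 3 references */
	ranges[1].start = folio_pfn(small);	/* a whole order-0 folio */
	ranges[1].end = folio_pfn(small) + 1;	/* drops 1 reference */

	pfn_range_put(ranges, 2);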

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 include/linux/mm.h |  7 ++++++
 mm/swap.c          | 56 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a517ef8d2386..4f9e2cfb372e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1515,6 +1515,13 @@ typedef union {
 void release_pages(release_pages_arg, int nr);
 void folios_put(struct folio_batch *folios);
 
+struct pfn_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+void pfn_range_put(struct pfn_range *range, unsigned int nr);
+
 static inline void put_page(struct page *page)
 {
 	struct folio *folio = page_folio(page);
diff --git a/mm/swap.c b/mm/swap.c
index 8bd15402cd8f..218d2cc4c6f4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -1027,6 +1027,62 @@ void folios_put(struct folio_batch *folios)
 }
 EXPORT_SYMBOL(folios_put);
 
+void pfn_range_put(struct pfn_range *range, unsigned int nr)
+{
+	struct folio_batch folios;
+	unsigned int i;
+	struct lruvec *lruvec = NULL;
+	unsigned long flags = 0;
+
+	folio_batch_init(&folios);
+	for (i = 0; i < nr; i++) {
+		struct folio *folio = pfn_folio(range[i].start);
+		unsigned int refs = range[i].end - range[i].start;
+
+		if (folio_is_zone_device(folio)) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			if (put_devmap_managed_page_refs(&folio->page, refs))
+				continue;
+			if (folio_ref_sub_and_test(folio, refs))
+				free_zone_device_page(&folio->page);
+			continue;
+		}
+
+		if (!folio_ref_sub_and_test(folio, refs))
+			continue;
+
+		/* hugetlb has its own memcg */
+		if (folio_test_hugetlb(folio)) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			free_huge_folio(folio);
+			continue;
+		}
+
+		__page_cache_release(folio, &lruvec, &flags);
+		if (folio_batch_add(&folios, folio) == 0) {
+			if (lruvec) {
+				unlock_page_lruvec_irqrestore(lruvec, flags);
+				lruvec = NULL;
+			}
+			mem_cgroup_uncharge_folios(&folios);
+			free_unref_folios(&folios);
+		}
+	}
+	if (lruvec)
+		unlock_page_lruvec_irqrestore(lruvec, flags);
+
+	if (folios.nr) {
+		mem_cgroup_uncharge_folios(&folios);
+		free_unref_folios(&folios);
+	}
+}
+
 /**
  * release_pages - batched put_page()
  * @arg: array of pages to release
-- 
2.40.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages()
  2023-08-25 13:59 ` [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
@ 2023-08-31 14:21   ` Ryan Roberts
  2023-09-01  3:58     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> By making release_pages() call folios_put(), we can get rid of the calls
> to compound_head() for the callers that already know they have folios.
> We can also get rid of the lock_batch tracking as we know the size of
> the batch is limited by folio_batch.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h | 19 ++---------
>  mm/mlock.c         |  3 +-
>  mm/swap.c          | 84 +++++++++++++++++++++++++++-------------------
>  3 files changed, 52 insertions(+), 54 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2c6b54b5506a..7d1d96b75d11 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -36,6 +36,7 @@ struct anon_vma;
>  struct anon_vma_chain;
>  struct user_struct;
>  struct pt_regs;
> +struct folio_batch;
>  
>  extern int sysctl_page_lock_unfairness;
>  
> @@ -1521,23 +1522,7 @@ typedef union {
>  } release_pages_arg __attribute__ ((__transparent_union__));
>  
>  void release_pages(release_pages_arg, int nr);
> -
> -/**
> - * folios_put - Decrement the reference count on an array of folios.
> - * @folios: The folios.
> - * @nr: How many folios there are.
> - *
> - * Like folio_put(), but for an array of folios.  This is more efficient
> - * than writing the loop yourself as it will optimise the locks which
> - * need to be taken if the folios are freed.
> - *
> - * Context: May be called in process or interrupt context, but not in NMI
> - * context.  May be called while holding a spinlock.
> - */
> -static inline void folios_put(struct folio **folios, unsigned int nr)
> -{
> -	release_pages(folios, nr);
> -}
> +void folios_put(struct folio_batch *folios);
>  
>  static inline void put_page(struct page *page)
>  {
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 06bdfab83b58..67bd74a6268a 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -206,8 +206,7 @@ static void mlock_folio_batch(struct folio_batch *fbatch)
>  
>  	if (lruvec)
>  		unlock_page_lruvec_irq(lruvec);
> -	folios_put(fbatch->folios, folio_batch_count(fbatch));
> -	folio_batch_reinit(fbatch);
> +	folios_put(fbatch);
>  }
>  
>  void mlock_drain_local(void)
> diff --git a/mm/swap.c b/mm/swap.c
> index cd8f0150ba3a..7bdc63b56859 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -89,7 +89,7 @@ static void __page_cache_release(struct folio *folio)
>  		__folio_clear_lru_flags(folio);
>  		unlock_page_lruvec_irqrestore(lruvec, flags);
>  	}
> -	/* See comment on folio_test_mlocked in release_pages() */
> +	/* See comment on folio_test_mlocked in folios_put() */
>  	if (unlikely(folio_test_mlocked(folio))) {
>  		long nr_pages = folio_nr_pages(folio);
>  
> @@ -175,7 +175,7 @@ static void lru_add_fn(struct lruvec *lruvec, struct folio *folio)
>  	 * while the LRU lock is held.
>  	 *
>  	 * (That is not true of __page_cache_release(), and not necessarily
> -	 * true of release_pages(): but those only clear the mlocked flag after
> +	 * true of folios_put(): but those only clear the mlocked flag after
>  	 * folio_put_testzero() has excluded any other users of the folio.)
>  	 */
>  	if (folio_evictable(folio)) {
> @@ -221,8 +221,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>  
>  	if (lruvec)
>  		unlock_page_lruvec_irqrestore(lruvec, flags);
> -	folios_put(fbatch->folios, folio_batch_count(fbatch));
> -	folio_batch_reinit(fbatch);
> +	folios_put(fbatch);
>  }
>  
>  static void folio_batch_add_and_move(struct folio_batch *fbatch,
> @@ -946,41 +945,27 @@ void lru_cache_disable(void)
>  }
>  
>  /**
> - * release_pages - batched put_page()
> - * @arg: array of pages to release
> - * @nr: number of pages
> + * folios_put - Decrement the reference count on a batch of folios.
> + * @folios: The folios.
>   *
> - * Decrement the reference count on all the pages in @arg.  If it
> - * fell to zero, remove the page from the LRU and free it.
> + * Like folio_put(), but for a batch of folios.  This is more efficient
> + * than writing the loop yourself as it will optimise the locks which need
> + * to be taken if the folios are freed.  The folios batch is returned
> + * empty and ready to be reused for another batch; there is no need to
> + * reinitialise it.
>   *
> - * Note that the argument can be an array of pages, encoded pages,
> - * or folio pointers. We ignore any encoded bits, and turn any of
> - * them into just a folio that gets free'd.
> + * Context: May be called in process or interrupt context, but not in NMI
> + * context.  May be called while holding a spinlock.
>   */
> -void release_pages(release_pages_arg arg, int nr)
> +void folios_put(struct folio_batch *folios)
>  {
>  	int i;
> -	struct encoded_page **encoded = arg.encoded_pages;
>  	LIST_HEAD(pages_to_free);
>  	struct lruvec *lruvec = NULL;
>  	unsigned long flags = 0;
> -	unsigned int lock_batch;
>  
> -	for (i = 0; i < nr; i++) {
> -		struct folio *folio;
> -
> -		/* Turn any of the argument types into a folio */
> -		folio = page_folio(encoded_page_ptr(encoded[i]));
> -
> -		/*
> -		 * Make sure the IRQ-safe lock-holding time does not get
> -		 * excessive with a continuous string of pages from the
> -		 * same lruvec. The lock is held only if lruvec != NULL.
> -		 */
> -		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {

SWAP_CLUSTER_MAX is 32. By using the folio_batch, I think you are limited to 15
in your batch, so I guess you could be taking/releasing the lock twice as often?
Is there any perf implication?
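
(For reference, the folio_batch capacity comes from PAGEVEC_SIZE; roughly,
paraphrased from include/linux/pagevec.h, so details may differ:)

	#define PAGEVEC_SIZE	15

	struct folio_batch {
		unsigned char nr;
		bool percpu_pvec_drained;
		struct folio *folios[PAGEVEC_SIZE];
	};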

> -			unlock_page_lruvec_irqrestore(lruvec, flags);
> -			lruvec = NULL;
> -		}
> +	for (i = 0; i < folios->nr; i++) {
> +		struct folio *folio = folios->folios[i];
>  
>  		if (is_huge_zero_page(&folio->page))
>  			continue;
> @@ -1010,13 +995,8 @@ void release_pages(release_pages_arg arg, int nr)
>  		}
>  
>  		if (folio_test_lru(folio)) {
> -			struct lruvec *prev_lruvec = lruvec;
> -
>  			lruvec = folio_lruvec_relock_irqsave(folio, lruvec,
>  									&flags);
> -			if (prev_lruvec != lruvec)
> -				lock_batch = 0;
> -
>  			lruvec_del_folio(lruvec, folio);
>  			__folio_clear_lru_flags(folio);
>  		}
> @@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr)
>  
>  	mem_cgroup_uncharge_list(&pages_to_free);
>  	free_unref_page_list(&pages_to_free);
> +	folios->nr = 0;

folio_batch_reinit(folios) ?

> +}
> +EXPORT_SYMBOL(folios_put);
> +
> +/**
> + * release_pages - batched put_page()
> + * @arg: array of pages to release
> + * @nr: number of pages
> + *
> + * Decrement the reference count on all the pages in @arg.  If it
> + * fell to zero, remove the page from the LRU and free it.
> + *
> + * Note that the argument can be an array of pages, encoded pages,
> + * or folio pointers. We ignore any encoded bits, and turn any of
> + * them into just a folio that gets free'd.
> + */
> +void release_pages(release_pages_arg arg, int nr)
> +{
> +	struct folio_batch fbatch;
> +	struct encoded_page **encoded = arg.encoded_pages;
> +	int i;
> +
> +	folio_batch_init(&fbatch);
> +	for (i = 0; i < nr; i++) {
> +		/* Turn any of the argument types into a folio */
> +		struct folio *folio = page_folio(encoded_page_ptr(encoded[i]));
> +
> +		if (folio_batch_add(&fbatch, folio) > 0)
> +			continue;
> +		folios_put(&fbatch);
> +	}
> +
> +	if (fbatch.nr)

if (folio_batch_count(&fbatch)) ?

> +		folios_put(&fbatch);
>  }
>  EXPORT_SYMBOL(release_pages);
>  



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios
  2023-08-25 13:59 ` [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
@ 2023-08-31 14:29   ` Ryan Roberts
  2023-09-01  4:03     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:29 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Most of its callees are not yet ready to accept a folio, but we know
> all of the pages passed in are actually folios because they're linked
> through ->lru.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/page_alloc.c | 38 ++++++++++++++++++++------------------
>  1 file changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 442c1b3480aa..f1ee96fd9bef 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2469,17 +2469,17 @@ void free_unref_page(struct page *page, unsigned int order)
>  void free_unref_page_list(struct list_head *list)

Given the function has *page* in the title and this conversion to folios
internally doesn't actually change any behaviour, I don't personally see a lot
of value in doing this conversion. But if the aim is that everything eventually
becomes a folio, then fair enough.

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

>  {
>  	unsigned long __maybe_unused UP_flags;
> -	struct page *page, *next;
> +	struct folio *folio, *next;
>  	struct per_cpu_pages *pcp = NULL;
>  	struct zone *locked_zone = NULL;
>  	int batch_count = 0;
>  	int migratetype;
>  
>  	/* Prepare pages for freeing */
> -	list_for_each_entry_safe(page, next, list, lru) {
> -		unsigned long pfn = page_to_pfn(page);
> -		if (!free_unref_page_prepare(page, pfn, 0)) {
> -			list_del(&page->lru);
> +	list_for_each_entry_safe(folio, next, list, lru) {
> +		unsigned long pfn = folio_pfn(folio);
> +		if (!free_unref_page_prepare(&folio->page, pfn, 0)) {
> +			list_del(&folio->lru);
>  			continue;
>  		}
>  
> @@ -2487,24 +2487,25 @@ void free_unref_page_list(struct list_head *list)
>  		 * Free isolated pages directly to the allocator, see
>  		 * comment in free_unref_page.
>  		 */
> -		migratetype = get_pcppage_migratetype(page);
> +		migratetype = get_pcppage_migratetype(&folio->page);
>  		if (unlikely(is_migrate_isolate(migratetype))) {
> -			list_del(&page->lru);
> -			free_one_page(page_zone(page), page, pfn, 0, migratetype, FPI_NONE);
> +			list_del(&folio->lru);
> +			free_one_page(folio_zone(folio), &folio->page, pfn,
> +					0, migratetype, FPI_NONE);
>  			continue;
>  		}
>  	}
>  
> -	list_for_each_entry_safe(page, next, list, lru) {
> -		struct zone *zone = page_zone(page);
> +	list_for_each_entry_safe(folio, next, list, lru) {
> +		struct zone *zone = folio_zone(folio);
>  
> -		list_del(&page->lru);
> -		migratetype = get_pcppage_migratetype(page);
> +		list_del(&folio->lru);
> +		migratetype = get_pcppage_migratetype(&folio->page);
>  
>  		/*
>  		 * Either different zone requiring a different pcp lock or
>  		 * excessive lock hold times when freeing a large list of
> -		 * pages.
> +		 * folios.
>  		 */
>  		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
>  			if (pcp) {
> @@ -2515,15 +2516,16 @@ void free_unref_page_list(struct list_head *list)
>  			batch_count = 0;
>  
>  			/*
> -			 * trylock is necessary as pages may be getting freed
> +			 * trylock is necessary as folios may be getting freed
>  			 * from IRQ or SoftIRQ context after an IO completion.
>  			 */
>  			pcp_trylock_prepare(UP_flags);
>  			pcp = pcp_spin_trylock(zone->per_cpu_pageset);
>  			if (unlikely(!pcp)) {
>  				pcp_trylock_finish(UP_flags);
> -				free_one_page(zone, page, page_to_pfn(page),
> -					      0, migratetype, FPI_NONE);
> +				free_one_page(zone, &folio->page,
> +						folio_pfn(folio), 0,
> +						migratetype, FPI_NONE);
>  				locked_zone = NULL;
>  				continue;
>  			}
> @@ -2537,8 +2539,8 @@ void free_unref_page_list(struct list_head *list)
>  		if (unlikely(migratetype >= MIGRATE_PCPTYPES))
>  			migratetype = MIGRATE_MOVABLE;
>  
> -		trace_mm_page_free_batched(page);
> -		free_unref_page_commit(zone, pcp, page, migratetype, 0);
> +		trace_mm_page_free_batched(&folio->page);
> +		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
>  		batch_count++;
>  	}
>  



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 03/14] mm: Add free_unref_folios()
  2023-08-25 13:59 ` [RFC PATCH 03/14] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
@ 2023-08-31 14:39   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:39 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Iterate over a folio_batch rather than a linked list.  This is
> easier for the CPU to prefetch and has a batch count naturally
> built in so we don't need to track it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/internal.h   |  5 +++--
>  mm/page_alloc.c | 59 ++++++++++++++++++++++++++++++-------------------
>  2 files changed, 39 insertions(+), 25 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 7499b5ea1cf6..5c6a53371aeb 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -441,8 +441,9 @@ extern void post_alloc_hook(struct page *page, unsigned int order,
>  					gfp_t gfp_flags);
>  extern int user_min_free_kbytes;
>  
> -extern void free_unref_page(struct page *page, unsigned int order);
> -extern void free_unref_page_list(struct list_head *list);
> +void free_unref_page(struct page *page, unsigned int order);
> +void free_unref_folios(struct folio_batch *fbatch);
> +void free_unref_page_list(struct list_head *list);
>  
>  extern void zone_pcp_reset(struct zone *zone);
>  extern void zone_pcp_disable(struct zone *zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f1ee96fd9bef..bca5c70b5576 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -32,6 +32,7 @@
>  #include <linux/sysctl.h>
>  #include <linux/cpu.h>
>  #include <linux/cpuset.h>
> +#include <linux/pagevec.h>
>  #include <linux/memory_hotplug.h>
>  #include <linux/nodemask.h>
>  #include <linux/vmstat.h>
> @@ -2464,57 +2465,51 @@ void free_unref_page(struct page *page, unsigned int order)
>  }
>  
>  /*
> - * Free a list of 0-order pages
> + * Free a batch of 0-order pages
>   */
> -void free_unref_page_list(struct list_head *list)
> +void free_unref_folios(struct folio_batch *folios)
>  {
>  	unsigned long __maybe_unused UP_flags;
> -	struct folio *folio, *next;
>  	struct per_cpu_pages *pcp = NULL;
>  	struct zone *locked_zone = NULL;
> -	int batch_count = 0;
> -	int migratetype;
> +	int i, j, migratetype;
>  
> -	/* Prepare pages for freeing */
> -	list_for_each_entry_safe(folio, next, list, lru) {
> +	/* Prepare folios for freeing */
> +	for (i = 0, j = 0; i < folios->nr; i++) {
> +		struct folio *folio = folios->folios[i];
>  		unsigned long pfn = folio_pfn(folio);
> -		if (!free_unref_page_prepare(&folio->page, pfn, 0)) {
> -			list_del(&folio->lru);
> +		if (!free_unref_page_prepare(&folio->page, pfn, 0))
>  			continue;
> -		}
>  
>  		/*
> -		 * Free isolated pages directly to the allocator, see
> +		 * Free isolated folios directly to the allocator, see
>  		 * comment in free_unref_page.
>  		 */
>  		migratetype = get_pcppage_migratetype(&folio->page);
>  		if (unlikely(is_migrate_isolate(migratetype))) {
> -			list_del(&folio->lru);
>  			free_one_page(folio_zone(folio), &folio->page, pfn,
>  					0, migratetype, FPI_NONE);
>  			continue;
>  		}
> +		if (j != i)
> +			folios->folios[j] = folio;
> +		j++;
>  	}
> +	folios->nr = j;
>  
> -	list_for_each_entry_safe(folio, next, list, lru) {
> +	for (i = 0; i < folios->nr; i++) {
> +		struct folio *folio = folios->folios[i];
>  		struct zone *zone = folio_zone(folio);
>  
> -		list_del(&folio->lru);
>  		migratetype = get_pcppage_migratetype(&folio->page);
>  
> -		/*
> -		 * Either different zone requiring a different pcp lock or
> -		 * excessive lock hold times when freeing a large list of
> -		 * folios.
> -		 */
> -		if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {

Same comment as for release_pages(): the batch count is effectively halved. Does
this have perf implications? I guess you'll need to benchmark...

> +		/* Different zone requires a different pcp lock */
> +		if (zone != locked_zone) {
>  			if (pcp) {
>  				pcp_spin_unlock(pcp);
>  				pcp_trylock_finish(UP_flags);
>  			}
>  
> -			batch_count = 0;
> -
>  			/*
>  			 * trylock is necessary as folios may be getting freed
>  			 * from IRQ or SoftIRQ context after an IO completion.
> @@ -2541,13 +2536,31 @@ void free_unref_page_list(struct list_head *list)
>  
>  		trace_mm_page_free_batched(&folio->page);
>  		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
> -		batch_count++;
>  	}
>  
>  	if (pcp) {
>  		pcp_spin_unlock(pcp);
>  		pcp_trylock_finish(UP_flags);
>  	}
> +	folios->nr = 0;

Same nits as for previous patch: Better to use the APIs rather than manipulate
the internal folio_batch state directly?

> +}
> +
> +void free_unref_page_list(struct list_head *list)
> +{
> +	struct folio_batch fbatch;
> +
> +	folio_batch_init(&fbatch);
> +	while (!list_empty(list)) {
> +		struct folio *folio = list_first_entry(list, struct folio, lru);
> +
> +		list_del(&folio->lru);
> +		if (folio_batch_add(&fbatch, folio) > 0)
> +			continue;
> +		free_unref_folios(&fbatch);
> +	}
> +
> +	if (fbatch.nr)
> +		free_unref_folios(&fbatch);
>  }
>  
>  /*



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release()
  2023-08-25 13:59 ` [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
@ 2023-08-31 14:41   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:41 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> There's no need to indirect through release_pages() and iterate
> over this batch of folios an extra time; we can just use the batch
> that we have.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  mm/swap.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 7bdc63b56859..c5ea0c6669e7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1073,8 +1073,7 @@ void __folio_batch_release(struct folio_batch *fbatch)
>  		lru_add_drain();
>  		fbatch->percpu_pvec_drained = true;
>  	}
> -	release_pages(fbatch->folios, folio_batch_count(fbatch));
> -	folio_batch_reinit(fbatch);
> +	folios_put(fbatch);
>  }
>  EXPORT_SYMBOL(__folio_batch_release);
>  



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios()
  2023-08-25 13:59 ` [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
@ 2023-08-31 14:49   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Almost identical to mem_cgroup_uncharge_list(), except it takes a
> folio_batch instead of a list_head.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  include/linux/memcontrol.h | 12 ++++++++++++
>  mm/memcontrol.c            | 13 +++++++++++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index ab94ad4597d0..d0caeb1c1136 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -713,6 +713,14 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
>  	__mem_cgroup_uncharge_list(page_list);
>  }
>  
> +void __mem_cgroup_uncharge_folios(struct folio_batch *folios);
> +static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
> +{
> +	if (mem_cgroup_disabled())
> +		return;
> +	__mem_cgroup_uncharge_folios(folios);
> +}
> +
>  void mem_cgroup_migrate(struct folio *old, struct folio *new);
>  
>  /**
> @@ -1273,6 +1281,10 @@ static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
>  {
>  }
>  
> +static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
> +{
> +}
> +
>  static inline void mem_cgroup_migrate(struct folio *old, struct folio *new)
>  {
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 26ba9d1a66d1..a0fb008afa23 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -33,6 +33,7 @@
>  #include <linux/shmem_fs.h>
>  #include <linux/hugetlb.h>
>  #include <linux/pagemap.h>
> +#include <linux/pagevec.h>
>  #include <linux/vm_event_item.h>
>  #include <linux/smp.h>
>  #include <linux/page-flags.h>
> @@ -7190,6 +7191,18 @@ void __mem_cgroup_uncharge_list(struct list_head *page_list)
>  		uncharge_batch(&ug);
>  }
>  
> +void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
> +{
> +	struct uncharge_gather ug;
> +	unsigned int i;
> +
> +	uncharge_gather_clear(&ug);
> +	for (i = 0; i < folios->nr; i++)
> +		uncharge_folio(folios->folios[i], &ug);
> +	if (ug.memcg)
> +		uncharge_batch(&ug);
> +}
> +
>  /**
>   * mem_cgroup_migrate - Charge a folio's replacement.
>   * @old: Currently circulating folio.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 06/14] mm: Remove use of folio list from folios_put()
  2023-08-25 13:59 ` [RFC PATCH 06/14] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
@ 2023-08-31 14:53   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 14:53 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Instead of putting the interesting folios on a list, delete the
> uninteresting one from the folio_batch.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  mm/swap.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index c5ea0c6669e7..6b2f3f77450c 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -959,12 +959,11 @@ void lru_cache_disable(void)
>   */
>  void folios_put(struct folio_batch *folios)
>  {
> -	int i;
> -	LIST_HEAD(pages_to_free);
> +	int i, j;
>  	struct lruvec *lruvec = NULL;
>  	unsigned long flags = 0;
>  
> -	for (i = 0; i < folios->nr; i++) {
> +	for (i = 0, j = 0; i < folios->nr; i++) {
>  		struct folio *folio = folios->folios[i];
>  
>  		if (is_huge_zero_page(&folio->page))
> @@ -1013,14 +1012,18 @@ void folios_put(struct folio_batch *folios)
>  			count_vm_event(UNEVICTABLE_PGCLEARED);
>  		}
>  
> -		list_add(&folio->lru, &pages_to_free);
> +		if (j != i)
> +			folios->folios[j] = folio;
> +		j++;
>  	}
>  	if (lruvec)
>  		unlock_page_lruvec_irqrestore(lruvec, flags);
> +	folios->nr = j;
> +	if (!j)
> +		return;
>  
> -	mem_cgroup_uncharge_list(&pages_to_free);
> -	free_unref_page_list(&pages_to_free);
> -	folios->nr = 0;
> +	mem_cgroup_uncharge_folios(folios);
> +	free_unref_folios(folios);
>  }
>  EXPORT_SYMBOL(folios_put);
>  



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios()
  2023-08-25 13:59 ` [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
@ 2023-08-31 15:21   ` Ryan Roberts
  2023-09-01  4:09     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 15:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Call folio_undo_large_rmappable() if needed.  free_unref_page_prepare()
> destroys the ability to call folio_order(), so stash the order in
> folio->private for the benefit of the second loop.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/page_alloc.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bca5c70b5576..e586d17fb7f2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2465,7 +2465,7 @@ void free_unref_page(struct page *page, unsigned int order)
>  }
>  
>  /*
> - * Free a batch of 0-order pages
> + * Free a batch of folios
>   */
>  void free_unref_folios(struct folio_batch *folios)
>  {
> @@ -2478,7 +2478,11 @@ void free_unref_folios(struct folio_batch *folios)
>  	for (i = 0, j = 0; i < folios->nr; i++) {
>  		struct folio *folio = folios->folios[i];
>  		unsigned long pfn = folio_pfn(folio);
> -		if (!free_unref_page_prepare(&folio->page, pfn, 0))
> +		unsigned int order = folio_order(folio);

Do you need to do anything special for hugetlb folios? I see that
destroy_large_folio() has:

	if (folio_test_hugetlb(folio)) {
		free_huge_folio(folio);
		return;
	}

> +
> +		if (order > 0 && folio_test_large_rmappable(folio))
> +			folio_undo_large_rmappable(folio);
> +		if (!free_unref_page_prepare(&folio->page, pfn, order))
>  			continue;
>  
>  		/*
> @@ -2486,11 +2490,13 @@ void free_unref_folios(struct folio_batch *folios)
>  		 * comment in free_unref_page.
>  		 */
>  		migratetype = get_pcppage_migratetype(&folio->page);
> -		if (unlikely(is_migrate_isolate(migratetype))) {
> +		if (order > PAGE_ALLOC_COSTLY_ORDER ||

Should this be `if (!pcp_allowed_order(order) ||` ? That helper includes the THP
pageblock_order too.
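
For reference, pcp_allowed_order() is roughly the following (paraphrased
from mm/page_alloc.c, so treat it as a sketch rather than the exact code):

	static inline bool pcp_allowed_order(unsigned int order)
	{
		if (order <= PAGE_ALLOC_COSTLY_ORDER)
			return true;
	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
		if (order == pageblock_order)
			return true;
	#endif
		return false;
	}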

> +		    is_migrate_isolate(migratetype)) {
>  			free_one_page(folio_zone(folio), &folio->page, pfn,
> -					0, migratetype, FPI_NONE);
> +					order, migratetype, FPI_NONE);
>  			continue;
>  		}
> +		folio->private = (void *)(unsigned long)order;
>  		if (j != i)
>  			folios->folios[j] = folio;
>  		j++;
> @@ -2500,7 +2506,9 @@ void free_unref_folios(struct folio_batch *folios)
>  	for (i = 0; i < folios->nr; i++) {
>  		struct folio *folio = folios->folios[i];
>  		struct zone *zone = folio_zone(folio);
> +		unsigned int order = (unsigned long)folio->private;
>  
> +		folio->private = NULL;
>  		migratetype = get_pcppage_migratetype(&folio->page);
>  
>  		/* Different zone requires a different pcp lock */
> @@ -2519,7 +2527,7 @@ void free_unref_folios(struct folio_batch *folios)
>  			if (unlikely(!pcp)) {
>  				pcp_trylock_finish(UP_flags);
>  				free_one_page(zone, &folio->page,
> -						folio_pfn(folio), 0,
> +						folio_pfn(folio), order,
>  						migratetype, FPI_NONE);
>  				locked_zone = NULL;
>  				continue;
> @@ -2535,7 +2543,8 @@ void free_unref_folios(struct folio_batch *folios)
>  			migratetype = MIGRATE_MOVABLE;
>  
>  		trace_mm_page_free_batched(&folio->page);
> -		free_unref_page_commit(zone, pcp, &folio->page, migratetype, 0);
> +		free_unref_page_commit(zone, pcp, &folio->page, migratetype,
> +				order);
>  	}
>  
>  	if (pcp) {



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed
  2023-08-25 13:59 ` [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
@ 2023-08-31 15:28   ` Ryan Roberts
  2023-09-01  4:10     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 15:28 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Hugetlb folios still get special treatment, but normal large folios
> can now be freed by free_unref_folios().  This should have a reasonable
> performance impact, TBD.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  mm/swap.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 21c2df0f7928..8bd15402cd8f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1000,12 +1000,13 @@ void folios_put(struct folio_batch *folios)
>  		if (!folio_put_testzero(folio))
>  			continue;
>  
> -		if (folio_test_large(folio)) {
> +		/* hugetlb has its own memcg */
> +		if (folio_test_hugetlb(folio)) {

Ahh I see, you special case hugetlb here. You can disregard my comment about
hugetlb in the previous patch.

>  			if (lruvec) {
>  				unlock_page_lruvec_irqrestore(lruvec, flags);
>  				lruvec = NULL;
>  			}
> -			__folio_put_large(folio);
> +			free_huge_folio(folio);
>  			continue;
>  		}
>  



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru()
  2023-08-25 13:59 ` [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
@ 2023-08-31 15:46   ` Ryan Roberts
  2023-09-01  4:16     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 15:46 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> The few folios which can't be moved to the LRU list (because their
> refcount dropped to zero) used to be returned to the caller to dispose
> of.  Make this simpler to call by freeing the folios directly through
> free_unref_folios().
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  mm/vmscan.c | 31 ++++++++++++-------------------
>  1 file changed, 12 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 965c429847fd..d5080510608e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2489,8 +2489,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
>  		struct list_head *list)

The comment for this function is now stale:

/*
 * move_folios_to_lru() moves folios from private @list to appropriate LRU list.
 * On return, @list is reused as a list of folios to be freed by the caller.
 *
 * Returns the number of pages moved to the given lruvec.
 */

I think the "On return" bit is no longer true.


>  {
>  	int nr_pages, nr_moved = 0;
> -	LIST_HEAD(folios_to_free);
> +	struct folio_batch free_folios;
>  
> +	folio_batch_init(&free_folios);
>  	while (!list_empty(list)) {
>  		struct folio *folio = lru_to_folio(list);
>  
> @@ -2519,12 +2520,12 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
>  		if (unlikely(folio_put_testzero(folio))) {
>  			__folio_clear_lru_flags(folio);
>  
> -			if (unlikely(folio_test_large(folio))) {
> +			if (folio_batch_add(&free_folios, folio) == 0) {
>  				spin_unlock_irq(&lruvec->lru_lock);
> -				destroy_large_folio(folio);
> +				mem_cgroup_uncharge_folios(&free_folios);
> +				free_unref_folios(&free_folios);
>  				spin_lock_irq(&lruvec->lru_lock);
> -			} else
> -				list_add(&folio->lru, &folios_to_free);
> +			}
>  
>  			continue;
>  		}
> @@ -2541,10 +2542,12 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
>  			workingset_age_nonresident(lruvec, nr_pages);
>  	}
>  
> -	/*
> -	 * To save our caller's stack, now use input list for pages to free.
> -	 */
> -	list_splice(&folios_to_free, list);
> +	if (free_folios.nr) {
> +		spin_unlock_irq(&lruvec->lru_lock);
> +		mem_cgroup_uncharge_folios(&free_folios);
> +		free_unref_folios(&free_folios);
> +		spin_lock_irq(&lruvec->lru_lock);
> +	}
>  
>  	return nr_moved;
>  }
> @@ -2623,8 +2626,6 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
>  	spin_unlock_irq(&lruvec->lru_lock);
>  
>  	lru_note_cost(lruvec, file, stat.nr_pageout, nr_scanned - nr_reclaimed);
> -	mem_cgroup_uncharge_list(&folio_list);
> -	free_unref_page_list(&folio_list);
>  
>  	/*
>  	 * If dirty folios are scanned that are not queued for IO, it
> @@ -2765,8 +2766,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  
>  	nr_activate = move_folios_to_lru(lruvec, &l_active);
>  	nr_deactivate = move_folios_to_lru(lruvec, &l_inactive);
> -	/* Keep all free folios in l_active list */
> -	list_splice(&l_inactive, &l_active);
>  
>  	__count_vm_events(PGDEACTIVATE, nr_deactivate);
>  	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
> @@ -2776,8 +2775,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  
>  	if (nr_rotated)
>  		lru_note_cost(lruvec, file, 0, nr_rotated);
> -	mem_cgroup_uncharge_list(&l_active);
> -	free_unref_page_list(&l_active);
>  	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
>  			nr_deactivate, nr_rotated, sc->priority, file);
>  }
> @@ -5238,10 +5235,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>  
>  	spin_unlock_irq(&lruvec->lru_lock);
>  
> -	mem_cgroup_uncharge_list(&list);
> -	free_unref_page_list(&list);
> -
> -	INIT_LIST_HEAD(&list);
>  	list_splice_init(&clean, &list);
>  
>  	if (!list_empty(&list)) {



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list()
  2023-08-25 13:59 ` [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
@ 2023-08-31 18:26   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 18:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> All users have been converted to mem_cgroup_uncharge_folios() so
> we can remove this API.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  include/linux/memcontrol.h | 12 ------------
>  mm/memcontrol.c            | 19 -------------------
>  2 files changed, 31 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 44d3e459d16f..2b3966ff85f2 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -705,14 +705,6 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
>  	__mem_cgroup_uncharge(folio);
>  }
>  
> -void __mem_cgroup_uncharge_list(struct list_head *page_list);
> -static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
> -{
> -	if (mem_cgroup_disabled())
> -		return;
> -	__mem_cgroup_uncharge_list(page_list);
> -}
> -
>  void __mem_cgroup_uncharge_folios(struct folio_batch *folios);
>  static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
>  {
> @@ -1277,10 +1269,6 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
>  {
>  }
>  
> -static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
> -{
> -}
> -
>  static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
>  {
>  }
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a0fb008afa23..c89265bd9ce6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7172,25 +7172,6 @@ void __mem_cgroup_uncharge(struct folio *folio)
>  	uncharge_batch(&ug);
>  }
>  
> -/**
> - * __mem_cgroup_uncharge_list - uncharge a list of page
> - * @page_list: list of pages to uncharge
> - *
> - * Uncharge a list of pages previously charged with
> - * __mem_cgroup_charge().
> - */
> -void __mem_cgroup_uncharge_list(struct list_head *page_list)
> -{
> -	struct uncharge_gather ug;
> -	struct folio *folio;
> -
> -	uncharge_gather_clear(&ug);
> -	list_for_each_entry(folio, page_list, lru)
> -		uncharge_folio(folio, &ug);
> -	if (ug.memcg)
> -		uncharge_batch(&ug);
> -}
> -
>  void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
>  {
>  	struct uncharge_gather ug;



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 14/14] mm: Remove free_unref_page_list()
  2023-08-25 13:59 ` [RFC PATCH 14/14] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
@ 2023-08-31 18:27   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 18:27 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> All callers now use free_unref_folios() so we can delete this function.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  mm/internal.h   |  1 -
>  mm/page_alloc.c | 18 ------------------
>  2 files changed, 19 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 5c6a53371aeb..42feb3f59ee5 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -443,7 +443,6 @@ extern int user_min_free_kbytes;
>  
>  void free_unref_page(struct page *page, unsigned int order);
>  void free_unref_folios(struct folio_batch *fbatch);
> -void free_unref_page_list(struct list_head *list);
>  
>  extern void zone_pcp_reset(struct zone *zone);
>  extern void zone_pcp_disable(struct zone *zone);
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e586d17fb7f2..496304014ec0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2554,24 +2554,6 @@ void free_unref_folios(struct folio_batch *folios)
>  	folios->nr = 0;
>  }
>  
> -void free_unref_page_list(struct list_head *list)
> -{
> -	struct folio_batch fbatch;
> -
> -	folio_batch_init(&fbatch);
> -	while (!list_empty(list)) {
> -		struct folio *folio = list_first_entry(list, struct folio, lru);
> -
> -		list_del(&folio->lru);
> -		if (folio_batch_add(&fbatch, folio) > 0)
> -			continue;
> -		free_unref_folios(&fbatch);
> -	}
> -
> -	if (fbatch.nr)
> -		free_unref_folios(&fbatch);
> -}
> -
>  /*
>   * split_page takes a non-compound higher-order page, and splits it into
>   * n (1<<order) sub-pages: page[0..n]



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio
  2023-08-30 18:50 ` [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)
@ 2023-08-31 18:49   ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 18:49 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 30/08/2023 19:50, Matthew Wilcox (Oracle) wrote:
> All but one caller of free_swap_cache() already has a folio, so convert
> it to take a folio and remove its internal call to page_folio().  The
> one remaining page-based caller, free_page_and_swap_cache(), now does
> the page_folio() conversion itself.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  include/linux/swap.h |  8 ++++----
>  mm/khugepaged.c      |  2 +-
>  mm/memory.c          |  2 +-
>  mm/swap_state.c      | 12 ++++++------
>  4 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 493487ed7c38..3536595e3bda 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -449,9 +449,9 @@ static inline unsigned long total_swapcache_pages(void)
>  	return global_node_page_state(NR_SWAPCACHE);
>  }
>  
> -extern void free_swap_cache(struct page *page);
> -extern void free_page_and_swap_cache(struct page *);
> -extern void free_pages_and_swap_cache(struct encoded_page **, int);
> +void free_swap_cache(struct folio *folio);
> +void free_page_and_swap_cache(struct page *);
> +void free_pages_and_swap_cache(struct encoded_page **, int);
>  /* linux/mm/swapfile.c */
>  extern atomic_long_t nr_swap_pages;
>  extern long total_swap_pages;
> @@ -534,7 +534,7 @@ static inline void put_swap_device(struct swap_info_struct *si)
>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>  
> -static inline void free_swap_cache(struct page *page)
> +static inline void free_swap_cache(struct folio *folio)
>  {
>  }
>  
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index afc94c281035..7b83bb6a1199 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -725,7 +725,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
>  		node_stat_sub_folio(src, NR_ISOLATED_ANON +
>  				folio_is_file_lru(src));
>  		folio_unlock(src);
> -		free_swap_cache(&src->page);
> +		free_swap_cache(src);
>  		folio_putback_lru(src);
>  	}
>  }
> diff --git a/mm/memory.c b/mm/memory.c
> index e35328c2f76e..2611d0fa4465 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3194,7 +3194,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>  		folio_put(new_folio);
>  	if (old_folio) {
>  		if (page_copied)
> -			free_swap_cache(&old_folio->page);
> +			free_swap_cache(old_folio);
>  		folio_put(old_folio);
>  	}
>  
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index f68ddeb93698..e6b4a00d3655 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -282,10 +282,8 @@ void clear_shadow_from_swap_cache(int type, unsigned long begin,
>   * folio_free_swap() _with_ the lock.
>   * 					- Marcelo
>   */
> -void free_swap_cache(struct page *page)
> +void free_swap_cache(struct folio *folio)
>  {
> -	struct folio *folio = page_folio(page);
> -
>  	if (folio_test_swapcache(folio) && !folio_mapped(folio) &&
>  	    folio_trylock(folio)) {
>  		folio_free_swap(folio);
> @@ -299,9 +297,11 @@ void free_swap_cache(struct page *page)
>   */
>  void free_page_and_swap_cache(struct page *page)
>  {
> -	free_swap_cache(page);
> +	struct folio *folio = page_folio(page);
> +
> +	free_swap_cache(folio);
>  	if (!is_huge_zero_page(page))
> -		put_page(page);
> +		folio_put(folio);
>  }
>  
>  /*
> @@ -316,7 +316,7 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
>  	folio_batch_init(&folios);
>  	for (int i = 0; i < nr; i++) {
>  		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
> -		free_swap_cache(&folio->page);
> +		free_swap_cache(folio);
>  		if (folio_batch_add(&folios, folio) == 0)
>  			folios_put(&folios);
>  	}



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 18/18] mm: Add pfn_range_put()
  2023-08-30 18:50 ` [RFC PATCH 18/18] mm: Add pfn_range_put() Matthew Wilcox (Oracle)
@ 2023-08-31 19:03   ` Ryan Roberts
  2023-09-01  4:27     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-08-31 19:03 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 30/08/2023 19:50, Matthew Wilcox (Oracle) wrote:
> This function will be used imminently.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  include/linux/mm.h |  7 ++++++
>  mm/swap.c          | 56 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a517ef8d2386..4f9e2cfb372e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1515,6 +1515,13 @@ typedef union {
>  void release_pages(release_pages_arg, int nr);
>  void folios_put(struct folio_batch *folios);
>  
> +struct pfn_range {
> +	unsigned long start;
> +	unsigned long end;
> +};

As mentioned in the other thread, I think we need to better convey that a
pfn_range must be fully contained within a single folio. Perhaps it's better to
call it `struct folio_region`?

> +
> +void pfn_range_put(struct pfn_range *range, unsigned int nr);

If going with `struct folio_region`, we could call this folios_put_refs()?

Or if you hate that idea and want to stick with pfn, then at least call it
pfn_ranges_put() (with s on ranges).
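
Roughly what I have in mind, purely as a sketch (not a proposal for the
exact layout; the comment wording is just illustrative):

	/* a region of pfns that must stay within a single folio */
	struct folio_region {
		unsigned long start;
		unsigned long end;
	};

	void folios_put_refs(struct folio_region *regions, unsigned int nr);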

> +
>  static inline void put_page(struct page *page)
>  {
>  	struct folio *folio = page_folio(page);
> diff --git a/mm/swap.c b/mm/swap.c
> index 8bd15402cd8f..218d2cc4c6f4 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1027,6 +1027,62 @@ void folios_put(struct folio_batch *folios)
>  }
>  EXPORT_SYMBOL(folios_put);
>  
> +void pfn_range_put(struct pfn_range *range, unsigned int nr)
> +{
> +	struct folio_batch folios;
> +	unsigned int i;
> +	struct lruvec *lruvec = NULL;
> +	unsigned long flags = 0;
> +
> +	folio_batch_init(&folios);
> +	for (i = 0; i < nr; i++) {
> +		struct folio *folio = pfn_folio(range[i].start);
> +		unsigned int refs = range[i].end - range[i].start;

Don't you need to check for huge zero page like in folios_put()?

		if (is_huge_zero_page(&folio->page))
			continue;

> +
> +		if (folio_is_zone_device(folio)) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			if (put_devmap_managed_page_refs(&folio->page, refs))
> +				continue;
> +			if (folio_ref_sub_and_test(folio, refs))
> +				free_zone_device_page(&folio->page);
> +			continue;
> +		}
> +
> +		if (!folio_ref_sub_and_test(folio, refs))
> +			continue;
> +
> +		/* hugetlb has its own memcg */
> +		if (folio_test_hugetlb(folio)) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			free_huge_folio(folio);
> +			continue;
> +		}
> +
> +		__page_cache_release(folio, &lruvec, &flags);
> +		if (folio_batch_add(&folios, folio) == 0) {
> +			if (lruvec) {
> +				unlock_page_lruvec_irqrestore(lruvec, flags);
> +				lruvec = NULL;
> +			}
> +			mem_cgroup_uncharge_folios(&folios);
> +			free_unref_folios(&folios);
> +		}
> +	}
> +	if (lruvec)
> +		unlock_page_lruvec_irqrestore(lruvec, flags);
> +
> +	if (folios.nr) {
> +		mem_cgroup_uncharge_folios(&folios);
> +		free_unref_folios(&folios);
> +	}
> +}

This still duplicates a lot of the logic in folios_put(), but I see you have an
idea in the other thread for improving this situation - I'll reply in the
context of that thread.

But overall this looks great - thanks for taking the time to put this together -
it will integrate nicely with my mmu_gather series.

Thanks,
Ryan


> +
>  /**
>   * release_pages - batched put_page()
>   * @arg: array of pages to release



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages()
  2023-08-31 14:21   ` Ryan Roberts
@ 2023-09-01  3:58     ` Matthew Wilcox
  2023-09-01  8:14       ` Ryan Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  3:58 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 03:21:53PM +0100, Ryan Roberts wrote:
> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> > By making release_pages() call folios_put(), we can get rid of the calls
> > to compound_head() for the callers that already know they have folios.
> > We can also get rid of the lock_batch tracking as we know the size of
> > the batch is limited by folio_batch.
> > -		/*
> > -		 * Make sure the IRQ-safe lock-holding time does not get
> > -		 * excessive with a continuous string of pages from the
> > -		 * same lruvec. The lock is held only if lruvec != NULL.
> > -		 */
> > -		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
> 
> SWAP_CLUSTER_MAX is 32. By using the folio_batch, I think you are limited to 15
> in your batch, so I guess you could be taking/releasing the lock twice as often?
> Is there any perf implication?

Yes, if the batch size is larger than 15, we'll take/release the lru lock
more often.  We could increase the size of the folio_batch if that becomes
a problem.  I'm not sure how often it's a problem; we already limit the
number of folios to process to 15 in, eg, folio_batch_add_and_move().
I'm not really sure why this code gets to be special and hold the lock
for twice as long as the callers of folio_batch_add_and_move.

> > @@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr)
> >  
> >  	mem_cgroup_uncharge_list(&pages_to_free);
> >  	free_unref_page_list(&pages_to_free);
> > +	folios->nr = 0;
> 
> folio_batch_reinit(folios) ?

I don't really like the abstraction here.  Back to folio_batch_move_lru()
as an example:

        for (i = 0; i < folio_batch_count(fbatch); i++) {
                struct folio *folio = fbatch->folios[i];
...
	}
...
	folio_batch_reinit(fbatch);

vs what I'd rather write:

	for (i = 0; i < fbatch->nr; i++) {
		struct folio *folio = fbatch->folios[i];
...
	}
...
	fbatch->nr = 0;

OK, we've successfully abstracted away that there is a member of
folio_batch called 'nr', but we still have to go poking around inside
folio_batch to extract the folio itself.  So it's not like we've
managed to make folio_batch a completely opaque type.  And I don't
think that folio_batch_count() is really all that much more descriptive
than fbatch->nr.  Indeed, I think the second one is easier to read;
it's obviously a plain loop.

I suppose that folio_batch_count() / _reinit() are easier to grep for
than '>nr\>' but I don't think that's a particularly useful thing to do.
We could add abstractions to get the folio_batch_folio(fbatch, i), but
when we start to get into something like folio_batch_remove_exceptionals()
(and there's something similar happening in this patchset where we strip
out the hugetlb folios), you're messing with the internal structure of
the folio_batch so much that you may as well not bother with any kind
of abstraction.

I'm really tempted to rip out folio_batch_count() and folio_batch_reinit().
They don't seem useful enough.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios
  2023-08-31 14:29   ` Ryan Roberts
@ 2023-09-01  4:03     ` Matthew Wilcox
  2023-09-01  8:15       ` Ryan Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  4:03 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 03:29:59PM +0100, Ryan Roberts wrote:
> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> > Most of its callees are not yet ready to accept a folio, but we know
> > all of the pages passed in are actually folios because they're linked
> > through ->lru.
> > @@ -2469,17 +2469,17 @@ void free_unref_page(struct page *page, unsigned int order)
> >  void free_unref_page_list(struct list_head *list)
> 
> Given the function has *page* in the title and this conversion to folios
> internally doesn't actually change any behaviour, I don't personally see a lot
> of value in doing this conversion. But if the aim is that everything eventually
> becomes a folio, then fair enough.

It makes the next patch _way_ more obvious ;-)  If you merge the two
patches together, almost every line changes.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios()
  2023-08-31 15:21   ` Ryan Roberts
@ 2023-09-01  4:09     ` Matthew Wilcox
  0 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  4:09 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 04:21:53PM +0100, Ryan Roberts wrote:
> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> > @@ -2478,7 +2478,11 @@ void free_unref_folios(struct folio_batch *folios)
> >  	for (i = 0, j = 0; i < folios->nr; i++) {
> >  		struct folio *folio = folios->folios[i];
> >  		unsigned long pfn = folio_pfn(folio);
> > -		if (!free_unref_page_prepare(&folio->page, pfn, 0))
> > +		unsigned int order = folio_order(folio);
> 
> Do you need to do anything special for hugetlb folios? I see that
> destroy_large_folio() has:
> 
> 	if (folio_test_hugetlb(folio)) {
> 		free_huge_folio(folio);
> 		return;
> 	}

Right; hugetlb folios get freed specially and never come this way.
I could put in an assertion, I suppose?
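Perhaps something like this (just a sketch):

	VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);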

> > @@ -2486,11 +2490,13 @@ void free_unref_folios(struct folio_batch *folios)
> >  		 * comment in free_unref_page.
> >  		 */
> >  		migratetype = get_pcppage_migratetype(&folio->page);
> > -		if (unlikely(is_migrate_isolate(migratetype))) {
> > +		if (order > PAGE_ALLOC_COSTLY_ORDER ||
> 
> Should this be `if (!pcp_allowed_order(order) ||` ? That helper includes the THP
> pageblock_order too.

Oh, yes, that's obviously far better than what I had here.  I got the
BUG in order_to_pindex() and didn't think to look around for the
correct predicate.  Thanks!



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed
  2023-08-31 15:28   ` Ryan Roberts
@ 2023-09-01  4:10     ` Matthew Wilcox
  0 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  4:10 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 04:28:06PM +0100, Ryan Roberts wrote:
> Ahh I see, you special case hugetlb here. You can disregard my comment about
> hugetlb in the previous patch.

Doh ;-)


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru()
  2023-08-31 15:46   ` Ryan Roberts
@ 2023-09-01  4:16     ` Matthew Wilcox
  0 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  4:16 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 04:46:50PM +0100, Ryan Roberts wrote:
> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> > The few folios which can't be moved to the LRU list (because their
> > refcount dropped to zero) used to be returned to the caller to dispose
> > of.  Make this simpler to call by freeing the folios directly through
> > free_unref_folios().
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  mm/vmscan.c | 31 ++++++++++++-------------------
> >  1 file changed, 12 insertions(+), 19 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 965c429847fd..d5080510608e 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2489,8 +2489,9 @@ static unsigned int move_folios_to_lru(struct lruvec *lruvec,
> >  		struct list_head *list)
> 
> The comment for this function is now stale:
> 
> /*
>  * move_folios_to_lru() moves folios from private @list to appropriate LRU list.
>  * On return, @list is reused as a list of folios to be freed by the caller.
>  *
>  * Returns the number of pages moved to the given lruvec.
>  */
> 
> I think the "On return" bit is no longer true.

It's still true, but misleading ;-)  I'll amend it to

- * On return, @list is reused as a list of folios to be freed by the caller.
+ * On return, @list is empty.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 18/18] mm: Add pfn_range_put()
  2023-08-31 19:03   ` Ryan Roberts
@ 2023-09-01  4:27     ` Matthew Wilcox
  2023-09-01  7:59       ` Ryan Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-01  4:27 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Thu, Aug 31, 2023 at 08:03:14PM +0100, Ryan Roberts wrote:
> > +++ b/include/linux/mm.h
> > @@ -1515,6 +1515,13 @@ typedef union {
> >  void release_pages(release_pages_arg, int nr);
> >  void folios_put(struct folio_batch *folios);
> >  
> > +struct pfn_range {
> > +	unsigned long start;
> > +	unsigned long end;
> > +};
> 
> As mentioned in the other thread, I think we need to better convey that a
> pfn_range must be fully contained within a single folio. Perhaps it's better to
> call it `struct folio_region`?

Yep, I'm not taking a strong position on that.  We can call it whatever
you like, subject to usual nitpicking later.

> > +
> > +void pfn_range_put(struct pfn_range *range, unsigned int nr);
> 
> If going with `struct folio_region`, we could call this folios_put_refs()?
> 
> Or if you hate that idea and want to stick with pfn, then at least call it
> pfn_ranges_put() (with s on ranges).

folio_regions_put()?  I don't know at this point; nothing's speaking
to me.

> > +void pfn_range_put(struct pfn_range *range, unsigned int nr)
> > +{
> > +	struct folio_batch folios;
> > +	unsigned int i;
> > +	struct lruvec *lruvec = NULL;
> > +	unsigned long flags = 0;
> > +
> > +	folio_batch_init(&folios);
> > +	for (i = 0; i < nr; i++) {
> > +		struct folio *folio = pfn_folio(range[i].start);
> > +		unsigned int refs = range[i].end - range[i].start;
> 
> Don't you need to check for huge zero page like in folios_put()?
> 
> 		if (is_huge_zero_page(&folio->page))
> 			continue;

Maybe?  Can we put the huge zero page in here, or would we filter it
out earlier?

> > +	if (lruvec)
> > +		unlock_page_lruvec_irqrestore(lruvec, flags);
> > +
> > +	if (folios.nr) {
> > +		mem_cgroup_uncharge_folios(&folios);
> > +		free_unref_folios(&folios);
> > +	}
> > +}
> 
> This still duplicates a lot of the logic in folios_put(), but I see you have an
> idea in the other thread for improving this situation - I'll reply in the
> context of that thread.
> 
> But overall this looks great - thanks for taking the time to put this together -
> it will integrate nicely with my mmugather series.
> 

Thanks!  One thing I was wondering; intended to look at today, but didn't
have time for it -- can we use mmugather to solve how vmscan wants to
handle batched TLB IPIs for mapped pages, instead of having its own thing?
72b252aed506 is the commit to look at.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 18/18] mm: Add pfn_range_put()
  2023-09-01  4:27     ` Matthew Wilcox
@ 2023-09-01  7:59       ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-09-01  7:59 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On 01/09/2023 05:27, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 08:03:14PM +0100, Ryan Roberts wrote:
>>> +++ b/include/linux/mm.h
>>> @@ -1515,6 +1515,13 @@ typedef union {
>>>  void release_pages(release_pages_arg, int nr);
>>>  void folios_put(struct folio_batch *folios);
>>>  
>>> +struct pfn_range {
>>> +	unsigned long start;
>>> +	unsigned long end;
>>> +};
>>
>> As mentioned in the other thread, I think we need to better convey that a
>> pfn_range must be fully contained within a single folio. Perhaps it's better to
>> call it `struct folio_region`?
> 
> Yep, I'm not taking a strong position on that.  We can call it whatever
> you like, subject to usual nitpicking later.
> 
>>> +
>>> +void pfn_range_put(struct pfn_range *range, unsigned int nr);
>>
>> If going with `struct folio_region`, we could call this folios_put_refs()?
>>
>> Or if you hate that idea and want to stick with pfn, then at least call it
>> pfn_ranges_put() (with s on ranges).
> 
> folio_regions_put()?  I don't know at this point; nothing's speaking
> to me.

I'm inclined to call it `struct folio_region` and
free_folio_regions_and_swap_cache(). Then it replaces
free_pages_and_swap_cache() (as per suggestion against your patch set). For now,
I'll implement free_folio_regions_and_swap_cache() similarly to release_pages()
+ free_swap_cache(). Then your patch set just has to reimplement
free_folio_regions_and_swap_cache() using the folio_batch approach. Shout if
that sounds problematic. Otherwise I'll post a new version of my patch set later
today.

> 
>>> +void pfn_range_put(struct pfn_range *range, unsigned int nr)
>>> +{
>>> +	struct folio_batch folios;
>>> +	unsigned int i;
>>> +	struct lruvec *lruvec = NULL;
>>> +	unsigned long flags = 0;
>>> +
>>> +	folio_batch_init(&folios);
>>> +	for (i = 0; i < nr; i++) {
>>> +		struct folio *folio = pfn_folio(range[i].start);
>>> +		unsigned int refs = range[i].end - range[i].start;
>>
>> Don't you need to check for huge zero page like in folios_put()?
>>
>> 		if (is_huge_zero_page(&folio->page))
>> 			continue;
> 
> Maybe?  Can we put the huge zero page in here, or would we filter it
> out earlier?

I'm not sure there is any earlier opportunity. The next earliest would be to
filter it out from the mmugather batch and that feels like confusing
responsibilities too much. So my vote is to put it here.


> 
>>> +	if (lruvec)
>>> +		unlock_page_lruvec_irqrestore(lruvec, flags);
>>> +
>>> +	if (folios.nr) {
>>> +		mem_cgroup_uncharge_folios(&folios);
>>> +		free_unref_folios(&folios);
>>> +	}
>>> +}
>>
>> This still duplicates a lot of the logic in folios_put(), but I see you have an
>> idea in the other thread for improving this situation - I'll reply in the
>> context of that thread.
>>
>> But overall this looks great - thanks for taking the time to put this together -
>> it will integrate nicely with my mmugather series.
>>
> 
> Thanks!  One thing I was wondering; intended to look at today, but didn't
> have time for it -- can we use mmugather to solve how vmscan wants to
> handle batched TLB IPIs for mapped pages, instead of having its own thing?
> 72b252aed506 is the commit to look at.

Yeah I thought about that too. mmugather works well for TLBIing a contiguous
range of VAs (it just maintains a range that gets expanded to the lowest and
highest entries). That works well for things like munmap and mmap_exit where a
virtually contiguous block is being zapped. But I'm guessing that for vmscan,
the pages are usually not contiguous? If so, then I think mmugather would
over-TLBI and likely harm perf?
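
For context, the mmu_gather range tracking is essentially just this
(include/asm-generic/tlb.h, roughly):

static inline void __tlb_adjust_range(struct mmu_gather *tlb,
				      unsigned long address,
				      unsigned int range_size)
{
	tlb->start = min(tlb->start, address);
	tlb->end = max(tlb->end, address + range_size);
}

so two reclaimed pages at opposite ends of the address space would expand
the flush range to cover everything in between.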



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages()
  2023-09-01  3:58     ` Matthew Wilcox
@ 2023-09-01  8:14       ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-09-01  8:14 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On 01/09/2023 04:58, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:21:53PM +0100, Ryan Roberts wrote:
>> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
>>> By making release_pages() call folios_put(), we can get rid of the calls
>>> to compound_head() for the callers that already know they have folios.
>>> We can also get rid of the lock_batch tracking as we know the size of
>>> the batch is limited by folio_batch.
>>> -		/*
>>> -		 * Make sure the IRQ-safe lock-holding time does not get
>>> -		 * excessive with a continuous string of pages from the
>>> -		 * same lruvec. The lock is held only if lruvec != NULL.
>>> -		 */
>>> -		if (lruvec && ++lock_batch == SWAP_CLUSTER_MAX) {
>>
>> SWAP_CLUSTER_MAX is 32. By using the folio_batch, I think you are limitted to 15
>> in your batch, so I guess you could be taking/releasing the lock x2 as often? Is
>> there any perf implication?
> 
> Yes, if the batch size is larger than 15, we'll take/release the lru lock
> more often.  We could increase the size of the folio_batch if that becomes
> a problem.  I'm not sure how often it's a problem; we already limit the
> number of folios to process to 15 in, eg, folio_batch_add_and_move().
> I'm not really sure why this code gets to be special and hold the lock
> for twice as long as the callers of folio_batch_add_and_move.

mmugather stores page pointers in batches of PAGE_SIZE (minus a small header).
So for 4K pages on a 64-bit system, that's ~512 page pointers per batch. I would
imagine you will notice 15 vs 32 for the munmap and mmap_exit cases.

As promised, I'm planning to do some benchmarking today, so I could play with
the size of the folio_batch.

Is there any special reason PAGEVEC_SIZE is 15? I can see you changed it from
14 a while back, by packing the struct pagevec header more tightly so
that the structure is still the same size it used to be. But 14 was set in
pre-history. Are things likely to break if we expand it to 31 (doubling its size)?
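
For reference, the batch is currently declared roughly as follows
(include/linux/pagevec.h):

#define PAGEVEC_SIZE	15

struct folio_batch {
	unsigned char nr;
	bool percpu_pvec_drained;
	struct folio *folios[PAGEVEC_SIZE];
};

so on 64-bit the header pads to 8 bytes and the whole structure is 128 bytes,
i.e. two cachelines; 31 entries would make it four.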

> 
>>> @@ -1040,6 +1020,40 @@ void release_pages(release_pages_arg arg, int nr)
>>>  
>>>  	mem_cgroup_uncharge_list(&pages_to_free);
>>>  	free_unref_page_list(&pages_to_free);
>>> +	folios->nr = 0;
>>
>> folio_batch_reinit(folios) ?
> 
> I don't really like the abstraction here.  Back to folio_batch_move_lru()
> as an example:
> 
>         for (i = 0; i < folio_batch_count(fbatch); i++) {
>                 struct folio *folio = fbatch->folios[i];
> ...
> 	}
> ...
> 	folio_batch_reinit(fbatch);
> 
> vs what I'd rather write:
> 
> 	for (i = 0; i < fbatch->nr; i++) {
> 		struct folio *folio = fbatch->folios[i];
> ...
> 	}
> ...
> 	fbatch->nr = 0;
> 
> OK, we've successfully abstracted away that there is a member of
> folio_batch called 'nr', but we still have to go poking around inside
> folio_batch to extract the folio itself.  So it's not like we've
> managed to make folio_batch a completely opaque type.  And I don't
> think that folio_batch_count() is really all that much more descriptive
> than fbatch->nr.  Indeed, I think the second one is easier to read;
> it's obviously a plain loop.
> 
> I suppose that folio_batch_count() / _reinit() are easier to grep for
> than '>nr\>' but I don't think that's a particularly useful thing to do.
> We could add abstractions to get the folio_batch_folio(fbatch, i), but
> when we start to get into something like folio_batch_remove_exceptionals()
> (and there's something similar happening in this patchset where we strip
> out the hugetlb folios), you're messing with the internal structure of
> the folio_batch so much that you may as well not bother with any kind
> of abstraction.
> 
> I'm really tempted to rip out folio_batch_count() and folio_batch_reinit().
> They don't seem useful enough.

Looks like folio_batch_count() is used 71 times, and folio_batch_reinit() 0
times (after your changes). So the former will create churn to rip out. My
opinion is simply that if there is an API or something, you should use it rather
than touch the internals directly. It's just a nit from my perspective, so
interpret as you like ;-)

> 



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios
  2023-09-01  4:03     ` Matthew Wilcox
@ 2023-09-01  8:15       ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-09-01  8:15 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On 01/09/2023 05:03, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:29:59PM +0100, Ryan Roberts wrote:
>> On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
>>> Most of its callees are not yet ready to accept a folio, but we know
>>> all of the pages passed in are actually folios because they're linked
>>> through ->lru.
>>> @@ -2469,17 +2469,17 @@ void free_unref_page(struct page *page, unsigned int order)
>>>  void free_unref_page_list(struct list_head *list)
>>
>> Given the function has *page* in the title and this conversion to folios
>> internally doesn't actually change any behaviour, I don't personally see a lot
>> of value in doing this conversion. But if the aim is that everything eventually
>> becomes a folio, then fair enough.
> 
> It makes the next patch _way_ more obvious ;-)  If you merge the two
> patches together, almost every line changes.

Yep, agreed now that I have the context of the whole series!


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list()
  2023-08-25 13:59 ` [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
@ 2023-09-04  3:43   ` Matthew Wilcox
  2024-01-05 17:00     ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-04  3:43 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman

On Fri, Aug 25, 2023 at 02:59:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Use free_unref_page_batch() to free the folios.  This may increase
> the number of IPIs from calling try_to_unmap_flush() more often,
> but that's going to be very workload-dependent.

I'd like to propose this as a replacement for this patch.  Queue the
mapped folios up so we can flush them all in one go.  Free the unmapped
ones, and the mapped ones after the flush.

It does change the ordering of mem_cgroup_uncharge_folios() and
the page flush.  I think that's OK.  This is only build-tested;
something has messed up my laptop and I can no longer launch VMs.

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..526d5bb84622 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1706,14 +1706,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		struct pglist_data *pgdat, struct scan_control *sc,
 		struct reclaim_stat *stat, bool ignore_references)
 {
+	struct folio_batch free_folios;
 	LIST_HEAD(ret_folios);
-	LIST_HEAD(free_folios);
+	LIST_HEAD(mapped_folios);
 	LIST_HEAD(demote_folios);
 	unsigned int nr_reclaimed = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
 	struct swap_iocb *plug = NULL;
 
+	folio_batch_init(&free_folios);
 	memset(stat, 0, sizeof(*stat));
 	cond_resched();
 	do_demote_pass = can_demote(pgdat->node_id, sc);
@@ -1723,7 +1725,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		struct address_space *mapping;
 		struct folio *folio;
 		enum folio_references references = FOLIOREF_RECLAIM;
-		bool dirty, writeback;
+		bool dirty, writeback, mapped = false;
 		unsigned int nr_pages;
 
 		cond_resched();
@@ -1957,6 +1959,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 					stat->nr_lazyfree_fail += nr_pages;
 				goto activate_locked;
 			}
+			mapped = true;
 		}
 
 		/*
@@ -2111,14 +2114,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 */
 		nr_reclaimed += nr_pages;
 
-		/*
-		 * Is there need to periodically free_folio_list? It would
-		 * appear not as the counts should be low
-		 */
-		if (unlikely(folio_test_large(folio)))
-			destroy_large_folio(folio);
-		else
-			list_add(&folio->lru, &free_folios);
+		if (mapped) {
+			list_add(&folio->lru, &mapped_folios);
+		} else if (folio_batch_add(&free_folios, folio) == 0) {
+			mem_cgroup_uncharge_folios(&free_folios);
+			free_unref_folios(&free_folios);
+		}
 		continue;
 
 activate_locked_split:
@@ -2182,9 +2183,22 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 
 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
 
-	mem_cgroup_uncharge_list(&free_folios);
 	try_to_unmap_flush();
-	free_unref_page_list(&free_folios);
+	while (!list_empty(&mapped_folios)) {
+		struct folio *folio = list_first_entry(&mapped_folios,
+					struct folio, lru);
+
+		list_del(&folio->lru);
+		if (folio_batch_add(&free_folios, folio) > 0)
+			continue;
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+	}
+
+	if (free_folios.nr) {
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+	}
 
 	list_splice(&ret_folios, folio_list);
 	count_vm_events(PGACTIVATE, pgactivate);


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
                   ` (17 preceding siblings ...)
  2023-08-30 18:50 ` [RFC PATCH 18/18] mm: Add pfn_range_put() Matthew Wilcox (Oracle)
@ 2023-09-04 13:25 ` Ryan Roberts
  2023-09-05 13:15   ` Matthew Wilcox
  18 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-09-04 13:25 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), linux-mm

On 25/08/2023 14:59, Matthew Wilcox (Oracle) wrote:
> Other than the obvious "remove calls to compound_head" changes, the
> fundamental belief here is that iterating a linked list is much slower
> than iterating an array (5-15x slower in my testing).  There's also
> an associated belief that since we iterate the batch of folios three
> times, we do better when the array is small (ie 15 entries) than we do
> with a batch that is hundreds of entries long, which only gives us the
> opportunity for the first pages to fall out of cache by the time we get
> to the end.
> 
> The one place where that probably falls down is "Free folios in a batch
> in shrink_folio_list()" where we'll flush the TLB once per batch instead
> of at the end.  That's going to take some benchmarking.
> 
> Matthew Wilcox (Oracle) (14):
>   mm: Make folios_put() the basis of release_pages()
>   mm: Convert free_unref_page_list() to use folios
>   mm: Add free_unref_folios()
>   mm: Use folios_put() in __folio_batch_release()
>   memcg: Add mem_cgroup_uncharge_folios()
>   mm: Remove use of folio list from folios_put()
>   mm: Use free_unref_folios() in put_pages_list()
>   mm: use __page_cache_release() in folios_put()
>   mm: Handle large folios in free_unref_folios()
>   mm: Allow non-hugetlb large folios to be batch processed
>   mm: Free folios in a batch in shrink_folio_list()
>   mm: Free folios directly in move_folios_to_lru()
>   memcg: Remove mem_cgroup_uncharge_list()
>   mm: Remove free_unref_page_list()
> 
>  include/linux/memcontrol.h |  24 ++---
>  include/linux/mm.h         |  19 +---
>  mm/internal.h              |   4 +-
>  mm/memcontrol.c            |  16 ++--
>  mm/mlock.c                 |   3 +-
>  mm/page_alloc.c            |  74 ++++++++-------
>  mm/swap.c                  | 180 ++++++++++++++++++++-----------------
>  mm/vmscan.c                |  51 +++++------
>  8 files changed, 181 insertions(+), 190 deletions(-)
> 

Hi Matthew,

I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?


--->8---

[  205.942771] ================================================================================
[  205.951201] UBSAN: array-index-out-of-bounds in mm/page_alloc.c:668:46
[  205.957716] index 10283 is out of range for type 'list_head [6]'
[  205.963774] ================================================================================
[  205.972200] Unable to handle kernel paging request at virtual address 006808190c06670f
[  205.980103] Mem abort info:
[  205.982884]   ESR = 0x0000000096000044
[  205.986620]   EC = 0x25: DABT (current EL), IL = 32 bits
[  205.991918]   SET = 0, FnV = 0
[  205.994959]   EA = 0, S1PTW = 0
[  205.998088]   FSC = 0x04: level 0 translation fault
[  206.002952] Data abort info:
[  206.005819]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[  206.011291]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  206.016329]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  206.021627] [006808190c06670f] address between user and kernel address ranges
[  206.028749] Internal error: Oops: 0000000096000044 [#1] SMP
[  206.034310] Modules linked in: nfs lockd grace sunrpc fscache netfs nls_iso8859_1 scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv_intf crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce tls nvme psample nvme_core aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  206.075579] CPU: 43 PID: 159703 Comm: git Not tainted 6.5.0-rc4-ryarob01-all #1
[  206.082875] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[  206.095638] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  206.102587] pc : free_pcppages_bulk+0x330/0x7f8
[  206.107104] lr : free_pcppages_bulk+0x7a8/0x7f8
[  206.111622] sp : ffff8000aeef3680
[  206.114923] x29: ffff8000aeef3680 x28: 000000000000282b x27: 00000000000000fc
[  206.122046] x26: 000000008015a39a x25: ffff08181ef9e840 x24: ffff0818836caf80
[  206.129169] x23: 0000000000000001 x22: 0000000000000000 x21: ffff08181ef9e850
[  206.136292] x20: fffffc200368e680 x19: fffffc200368e6c0 x18: 0000000000000000
[  206.143414] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[  206.150537] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
[  206.157659] x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : fffffc200368e688
[  206.164782] x8 : fffffc200368e680 x7 : 205d343737333639 x6 : ffff08181dee0000
[  206.171904] x5 : ffff0818836caf80 x4 : 0000000000000000 x3 : 0000000000000001
[  206.179027] x2 : ffff0818836f3330 x1 : ffff0818836f3230 x0 : 006808190c066707
[  206.186149] Call trace:
[  206.188582]  free_pcppages_bulk+0x330/0x7f8
[  206.192752]  free_unref_page_commit+0x15c/0x250
[  206.197270]  free_unref_folios+0x37c/0x4a8
[  206.201353]  release_unref_folios+0xac/0xf8
[  206.205524]  folios_put+0xe0/0x1f0
[  206.208914]  __folio_batch_release+0x34/0x88
[  206.213170]  truncate_inode_pages_range+0x160/0x540
[  206.218035]  truncate_inode_pages_final+0x58/0x90
[  206.222726]  ext4_evict_inode+0x164/0x900
[  206.226722]  evict+0xac/0x160
[  206.229678]  iput+0x170/0x228
[  206.232633]  do_unlinkat+0x1d0/0x290
[  206.236196]  __arm64_sys_unlinkat+0x48/0x98
[  206.240367]  invoke_syscall+0x74/0xf8
[  206.244016]  el0_svc_common.constprop.0+0x58/0x130
[  206.248793]  do_el0_svc+0x40/0xa8
[  206.252095]  el0_svc+0x2c/0xb8
[  206.255137]  el0t_64_sync_handler+0xc0/0xc8
[  206.259308]  el0t_64_sync+0x1a8/0x1b0
[  206.262958] Code: 8b0110a1 8b050305 8b010301 f9408020 (f9000409) 
[  206.269039] ---[ end trace 0000000000000000 ]---

--->8---


UBSAN is complaining about migratetype being out of range here:

/* Used for pages not on another list */
static inline void add_to_free_list(struct page *page, struct zone *zone,
				    unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_add(&page->buddy_list, &area->free_list[migratetype]);
	area->nr_free++;
}

And I think that is called from __free_one_page(), which is called from free_pcppages_bulk() at the top of the stack trace. migratetype originates from get_pcppage_migratetype(page), which is page->index. But I can't see where this might be getting corrupted, or how yours or my changes could affect this.
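
For reference, the pcp migratetype caching is just this pair of helpers in
mm/page_alloc.c (roughly):

static inline int get_pcppage_migratetype(struct page *page)
{
	return page->index;
}

static inline void set_pcppage_migratetype(struct page *page, int migratetype)
{
	page->index = migratetype;
}

so if a former page-cache page reaches the pcp list without going through
set_pcppage_migratetype(), its stale file index gets interpreted as a
migratetype when the page is later flushed to the buddy free lists.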




Config: This particular config actually has my large anon folios and mmu_gather series as well as this series from you. This issue is happening for file-backed memory, though, so I'm hoping it's not related to LAF.

I have modified your series in a couple of relevant ways though:

 - I'm using `pcp_allowed_order(order)` instead of a direct comparison with PAGE_ALLOC_COSTLY_ORDER in free_unref_folios() (as I suggested in the review).

 - I've refactored folios_put() as you suggested in another thread to make the implementation of my free_folio_regions_and_swap_cache() simpler:


static void release_unref_folios(struct folio_batch *folios)
{
	struct lruvec *lruvec = NULL;
	unsigned long flags = 0;
	int i;

	for (i = 0; i < folios->nr; i++)
		__page_cache_release(folios->folios[i], &lruvec, &flags);

	if (lruvec)
		unlock_page_lruvec_irqrestore(lruvec, flags);

	mem_cgroup_uncharge_folios(folios);
	free_unref_folios(folios);
}

/**
 * free_folio_regions_and_swap_cache - free swap cache and put refs
 * @folios: array of `struct folio_region`s to release
 * @nr: number of folio regions in array
 *
 * Each `struct folio_region` describes the start and end pfn of a region within
 * a single folio. The folio's swap cache is freed, then the folio reference
 * count is decremented once for each pfn in the region. If it falls to zero,
 * remove the folio from the LRU and free it.
 */
void free_folio_regions_and_swap_cache(struct folio_region *folios, int nr)
{
	struct folio_batch fbatch;
	unsigned int i;

	folio_batch_init(&fbatch);
	lru_add_drain();

	for (i = 0; i < nr; i++) {
		struct folio *folio = pfn_folio(folios[i].pfn_start);
		int refs = folios[i].pfn_end - folios[i].pfn_start;

		free_swap_cache(folio);

		if (is_huge_zero_page(&folio->page))
			continue;

		if (folio_is_zone_device(folio)) {
			if (put_devmap_managed_page_refs(&folio->page, refs))
				continue;
			if (folio_ref_sub_and_test(folio, refs))
				free_zone_device_page(&folio->page);
			continue;
		}

		if (!folio_ref_sub_and_test(folio, refs))
			continue;

		/* hugetlb has its own memcg */
		if (folio_test_hugetlb(folio)) {
			free_huge_folio(folio);
			continue;
		}

		if (folio_batch_add(&fbatch, folio) == 0)
			release_unref_folios(&fbatch);
	}

	if (fbatch.nr)
		release_unref_folios(&fbatch);
}

/**
 * folios_put - Decrement the reference count on a batch of folios.
 * @folios: The folios.
 *
 * Like folio_put(), but for a batch of folios.  This is more efficient
 * than writing the loop yourself as it will optimise the locks which need
 * to be taken if the folios are freed.  The folios batch is returned
 * empty and ready to be reused for another batch; there is no need to
 * reinitialise it.
 *
 * Context: May be called in process or interrupt context, but not in NMI
 * context.  May be called while holding a spinlock.
 */
void folios_put(struct folio_batch *folios)
{
	int i, j;

	for (i = 0, j = 0; i < folios->nr; i++) {
		struct folio *folio = folios->folios[i];

		if (is_huge_zero_page(&folio->page))
			continue;

		if (folio_is_zone_device(folio)) {
			if (put_devmap_managed_page(&folio->page))
				continue;
			if (folio_put_testzero(folio))
				free_zone_device_page(&folio->page);
			continue;
		}

		if (!folio_put_testzero(folio))
			continue;

		/* hugetlb has its own memcg */
		if (folio_test_hugetlb(folio)) {
			free_huge_folio(folio);
			continue;
		}

		if (j != i)
			folios->folios[j] = folio;
		j++;
	}

	folios->nr = j;
	if (!j)
		return;
	release_unref_folios(folios);
}
EXPORT_SYMBOL(folios_put);


Let me know if you have any thoughts.

Thanks,
Ryan




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-09-04 13:25 ` [RFC PATCH 00/14] Rearrange batched folio freeing Ryan Roberts
@ 2023-09-05 13:15   ` Matthew Wilcox
  2023-09-05 13:26     ` Ryan Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-05 13:15 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Mon, Sep 04, 2023 at 02:25:41PM +0100, Ryan Roberts wrote:
> I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?

I didn't hit that with my testing.  Admittedly I was using xfs rather
than ext4, but ...

>  UBSAN: array-index-out-of-bounds in mm/page_alloc.c:668:46
>  index 10283 is out of range for type 'list_head [6]'
>  pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : free_pcppages_bulk+0x330/0x7f8
>  lr : free_pcppages_bulk+0x7a8/0x7f8
>  sp : ffff8000aeef3680
>  x29: ffff8000aeef3680 x28: 000000000000282b x27: 00000000000000fc
>  x26: 000000008015a39a x25: ffff08181ef9e840 x24: ffff0818836caf80
>  x23: 0000000000000001 x22: 0000000000000000 x21: ffff08181ef9e850
>  x20: fffffc200368e680 x19: fffffc200368e6c0 x18: 0000000000000000
>  x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
>  x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
>  x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : fffffc200368e688
>  x8 : fffffc200368e680 x7 : 205d343737333639 x6 : ffff08181dee0000
>  x5 : ffff0818836caf80 x4 : 0000000000000000 x3 : 0000000000000001
>  x2 : ffff0818836f3330 x1 : ffff0818836f3230 x0 : 006808190c066707
>  Call trace:
>   free_pcppages_bulk+0x330/0x7f8
>   free_unref_page_commit+0x15c/0x250
>   free_unref_folios+0x37c/0x4a8
>   release_unref_folios+0xac/0xf8
>   folios_put+0xe0/0x1f0
>   __folio_batch_release+0x34/0x88
>   truncate_inode_pages_range+0x160/0x540
>   truncate_inode_pages_final+0x58/0x90
>   ext4_evict_inode+0x164/0x900
>   evict+0xac/0x160
>   iput+0x170/0x228
>   do_unlinkat+0x1d0/0x290
>   __arm64_sys_unlinkat+0x48/0x98
> 
> UBSAN is complaining about migratetype being out of range here:
> 
> /* Used for pages not on another list */
> static inline void add_to_free_list(struct page *page, struct zone *zone,
> 				    unsigned int order, int migratetype)
> {
> 	struct free_area *area = &zone->free_area[order];
> 
> 	list_add(&page->buddy_list, &area->free_list[migratetype]);
> 	area->nr_free++;
> }
> 
> And I think that is called from __free_one_page(), which is called
> from free_pcppages_bulk() at the top of the stack trace. migratetype
> originates from get_pcppage_migratetype(page), which is page->index. But
> I can't see where this might be getting corrupted, or how yours or my
> changes could affect this.

Agreed with your analysis.

My best guess is that page->index still contains the file index from
when this page was in the page cache instead of being overwritten with
the migratetype.  This is ext4, so large folios aren't in use.

I'll look more later, but I don't immediately see the problem.



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-09-05 13:15   ` Matthew Wilcox
@ 2023-09-05 13:26     ` Ryan Roberts
  2023-09-05 14:00       ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Ryan Roberts @ 2023-09-05 13:26 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On 05/09/2023 14:15, Matthew Wilcox wrote:
> On Mon, Sep 04, 2023 at 02:25:41PM +0100, Ryan Roberts wrote:
>> I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?
> 
> I didn't hit that with my testing.  Admittedly I was using xfs rather
> than ext4, but ...

I've only seen it once.

I have a bit of a hybrid setup - my rootfs is xfs (and using large folios), but
the linux tree (which is being built during the benchmark) is on an ext4
partition. Large anon folios is enabled in this config, so there will be plenty
of large folios in the system.

I'm not sure if the fact that this fired from the ext4 path is too relevant -
the page with the dodgy index is already on the PCP list so may or may not be large.

> 
>>  UBSAN: array-index-out-of-bounds in mm/page_alloc.c:668:46
>>  index 10283 is out of range for type 'list_head [6]'
>>  pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>  pc : free_pcppages_bulk+0x330/0x7f8
>>  lr : free_pcppages_bulk+0x7a8/0x7f8
>>  sp : ffff8000aeef3680
>>  x29: ffff8000aeef3680 x28: 000000000000282b x27: 00000000000000fc
>>  x26: 000000008015a39a x25: ffff08181ef9e840 x24: ffff0818836caf80
>>  x23: 0000000000000001 x22: 0000000000000000 x21: ffff08181ef9e850
>>  x20: fffffc200368e680 x19: fffffc200368e6c0 x18: 0000000000000000
>>  x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
>>  x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
>>  x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : fffffc200368e688
>>  x8 : fffffc200368e680 x7 : 205d343737333639 x6 : ffff08181dee0000
>>  x5 : ffff0818836caf80 x4 : 0000000000000000 x3 : 0000000000000001
>>  x2 : ffff0818836f3330 x1 : ffff0818836f3230 x0 : 006808190c066707
>>  Call trace:
>>   free_pcppages_bulk+0x330/0x7f8
>>   free_unref_page_commit+0x15c/0x250
>>   free_unref_folios+0x37c/0x4a8
>>   release_unref_folios+0xac/0xf8
>>   folios_put+0xe0/0x1f0
>>   __folio_batch_release+0x34/0x88
>>   truncate_inode_pages_range+0x160/0x540
>>   truncate_inode_pages_final+0x58/0x90
>>   ext4_evict_inode+0x164/0x900
>>   evict+0xac/0x160
>>   iput+0x170/0x228
>>   do_unlinkat+0x1d0/0x290
>>   __arm64_sys_unlinkat+0x48/0x98
>>
>> UBSAN is complaining about migratetype being out of range here:
>>
>> /* Used for pages not on another list */
>> static inline void add_to_free_list(struct page *page, struct zone *zone,
>> 				    unsigned int order, int migratetype)
>> {
>> 	struct free_area *area = &zone->free_area[order];
>>
>> 	list_add(&page->buddy_list, &area->free_list[migratetype]);
>> 	area->nr_free++;
>> }
>>
>> And I think that is called from __free_one_page(), which is called
>> from free_pcppages_bulk() at the top of the stack trace. migratetype
>> originates from get_pcppage_migratetype(page), which is page->index. But
>> I can't see where this might be getting corrupted, or how yours or my
>> changes could affect this.
> 
> Agreed with your analysis.
> 
> My best guess is that page->index still contains the file index from
> when this page was in the page cache instead of being overwritten with
> the migratetype.  

Yeah that was my guess too. But I couldn't see how that was possible. So started
thinking it could be some separate corruption somehow...

> This is ext4, so large folios aren't in use.
> 
> I'll look more later, but I don't immediately see the problem.
> 



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-09-05 13:26     ` Ryan Roberts
@ 2023-09-05 14:00       ` Matthew Wilcox
  2023-09-06  3:48         ` Matthew Wilcox
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-05 14:00 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Tue, Sep 05, 2023 at 02:26:54PM +0100, Ryan Roberts wrote:
> On 05/09/2023 14:15, Matthew Wilcox wrote:
> > On Mon, Sep 04, 2023 at 02:25:41PM +0100, Ryan Roberts wrote:
> >> I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?
> > 
> > I didn't hit that with my testing.  Admittedly I was using xfs rather
> > than ext4, but ...
> 
> I've only seen it once.
> 
> I have a bit of a hybrid setup - my rootfs is xfs (and using large folios), but
> the linux tree (which is being built during the benchmark) is on an ext4
> partition. Large anon folios is enabled in this config, so there will be plenty
> of large folios in the system.
> 
> I'm not sure if the fact that this fired from the ext4 path is too relevant -
> the page with the dodgy index is already on the PCP list so may or may not be large.

Indeed.  I have a suspicion that this may be more common, but if pages
are commonly freed to and allocated from the PCP list without ever being
transferred to the free list, we'll never see it.  Perhaps adding a
check when pages are added to the PCP list that page->index is less
than 8 would catch the miscreant relatively quickly?

> > My best guess is that page->index still contains the file index from
> > when this page was in the page cache instead of being overwritten with
> > the migratetype.  
> 
> Yeah that was my guess too. But I couldn't see how that was possible. So started
> thinking it could be some separate corruption somehow...

It could be, but a value around 10,000 would put the page at an offset of
roughly 40MB into the file (10283 * 4KB is ~40MB), which is entirely
plausible.  But yes, it could have been some entirely separate path that
freed it.

VM_BUG_ON_PAGE() would dump the entire struct page which will give us
a lot more clues including the order of the page.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-09-05 14:00       ` Matthew Wilcox
@ 2023-09-06  3:48         ` Matthew Wilcox
  2023-09-06 10:23           ` Ryan Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Matthew Wilcox @ 2023-09-06  3:48 UTC (permalink / raw)
  To: Ryan Roberts; +Cc: linux-mm

On Tue, Sep 05, 2023 at 03:00:51PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 05, 2023 at 02:26:54PM +0100, Ryan Roberts wrote:
> > On 05/09/2023 14:15, Matthew Wilcox wrote:
> > > On Mon, Sep 04, 2023 at 02:25:41PM +0100, Ryan Roberts wrote:
> > >> I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?
> > > 
> > > I didn't hit that with my testing.  Admittedly I was using xfs rather
> > > than ext4, but ...
> > 
> > I've only seen it once.
> > 
> > I have a bit of a hybrid setup - my rootfs is xfs (and using large folios), but
> > the linux tree (which is being built during the benchmark) is on an ext4
> > partition. Large anon folios is enabled in this config, so there will be plenty
> > of large folios in the system.
> > 
> > I'm not sure if the fact that this fired from the ext4 path is too relevant -
> > the page with the dodgy index is already on the PCP list so may or may not be large.
> 
> Indeed.  I have a suspicion that this may be more common, but if pages
> are commonly freed to and allocated from the PCP list without ever being
> transferred to the free list, we'll never see it.  Perhaps adding a
> check when pages are added to the PCP list that page->index is less
> than 8 would catch the miscreant relatively quickly?

Somehow my qemu setup started working again.  This stuff is black magic.

Anyway, I did this:

+++ b/mm/page_alloc.c
@@ -2405,6 +2405,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,

        __count_vm_events(PGFREE, 1 << order);
        pindex = order_to_pindex(migratetype, order);
+       VM_BUG_ON_PAGE(page->index > 7, page);
        list_add(&page->pcp_list, &pcp->lists[pindex]);
        pcp->count += 1 << order;


but I haven't caught a wascally wabbit yet after an hour of running
xfstests.  I think that's the only place we add a page to the
pcp->lists.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 00/14] Rearrange batched folio freeing
  2023-09-06  3:48         ` Matthew Wilcox
@ 2023-09-06 10:23           ` Ryan Roberts
  0 siblings, 0 replies; 49+ messages in thread
From: Ryan Roberts @ 2023-09-06 10:23 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm

On 06/09/2023 04:48, Matthew Wilcox wrote:
> On Tue, Sep 05, 2023 at 03:00:51PM +0100, Matthew Wilcox wrote:
>> On Tue, Sep 05, 2023 at 02:26:54PM +0100, Ryan Roberts wrote:
>>> On 05/09/2023 14:15, Matthew Wilcox wrote:
>>>> On Mon, Sep 04, 2023 at 02:25:41PM +0100, Ryan Roberts wrote:
>>>>> I've been doing some benchmarking of this series, as promised, but have hit an oops. It doesn't appear to be easily reproducible, and I'm struggling to figure out the root cause, so thought I would share in case you have any ideas?
>>>>
>>>> I didn't hit that with my testing.  Admittedly I was using xfs rather
>>>> than ext4, but ...
>>>
>>> I've only seen it once.
>>>
>>> I have a bit of a hybrid setup - my rootfs is xfs (and using large folios), but
>>> the linux tree (which is being built during the benchmark) is on an ext4
>>> partition. Large anon folios is enabled in this config, so there will be plenty
>>> of large folios in the system.
>>>
>>> I'm not sure if the fact that this fired from the ext4 path is too relevant -
>>> the page with the dodgy index is already on the PCP list so may or may not be large.
>>
>> Indeed.  I have a suspicion that this may be more common, but if pages
>> are commonly freed to and allocated from the PCP list without ever being
>> transferred to the free list, we'll never see it.  Perhaps adding a
>> check when pages are added to the PCP list that page->index is less
>> than 8 would catch the miscreant relatively quickly?
> 
> Somehow my qemu setup started working again.  This stuff is black magic.
> 
> Anyway, I did this:
> 
> +++ b/mm/page_alloc.c
> @@ -2405,6 +2405,7 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
> 
>         __count_vm_events(PGFREE, 1 << order);
>         pindex = order_to_pindex(migratetype, order);
> +       VM_BUG_ON_PAGE(page->index > 7, page);
>         list_add(&page->pcp_list, &pcp->lists[pindex]);
>         pcp->count += 1 << order;
> 
> 
> but I haven't caught a wascally wabbit yet after an hour of running
> xfstests.  I think that's the only place we add a page to the
> pcp->lists.


I added a smattering of VM_BUG_ON_PAGE(page->index > 5, page) to the places where the page is added and removed from the pcp lists. And one triggered on removing the page from the list (the same place I saw the UBSAN oops previously). But there is no page info dumped! I've enabled CONFIG_DEBUG_VM (and friends). I can't see how it's possible to get the BUG report but not the dump_page() bit - what am I doing wrong?

Anyway, the fact that it did not trigger on insertion into the list suggests this is a corruption issue? I'll keep trying...



[  334.307831] kernel BUG at mm/page_alloc.c:1217!
[  334.312351] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[  334.318433] Modules linked in: nfs lockd grace sunrpc fscache netfs nls_iso8859_1 scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core mlx5_core mlxfw pci_hyperv_intf crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce tls nvme psample nvme_core aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
[  334.359704] CPU: 26 PID: 260858 Comm: git Not tainted 6.5.0-rc4-ryarob01-all-debug #1
[  334.367521] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[  334.380285] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  334.387233] pc : free_pcppages_bulk+0x1b0/0x2d0
[  334.391753] lr : free_pcppages_bulk+0x1b0/0x2d0
[  334.396270] sp : ffff80008fe9b810
[  334.399571] x29: ffff80008fe9b810 x28: 0000000000000001 x27: 00000000000000e7
[  334.406694] x26: fffffc2007e07780 x25: ffff08181ed8f840 x24: ffff08181ed8f868
[  334.413817] x23: 0000000000000000 x22: ffff0818836caf80 x21: ffff800081bbf008
[  334.420939] x20: 0000000000000001 x19: ffff08181ed8f850 x18: 0000000000000000
[  334.428061] x17: 6666373066666666 x16: 2066666666666666 x15: 6632303030303030
[  334.435184] x14: 0000000000000000 x13: 2935203e20746d28 x12: 454741505f4e4f5f
[  334.442306] x11: 4755425f4d56203a x10: 6573756163656220 x9 : ffff80008014ef40
[  334.449429] x8 : 5f4d56203a657375 x7 : 6163656220646570 x6 : ffff08181dee0000
[  334.456551] x5 : ffff08181ed78d88 x4 : 0000000000000000 x3 : ffff80008fe9b538
[  334.463674] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000002b
[  334.470796] Call trace:
[  334.473230]  free_pcppages_bulk+0x1b0/0x2d0
[  334.477401]  free_unref_page_commit+0x124/0x2a8
[  334.481918]  free_unref_folios+0x3b4/0x4e8
[  334.486003]  release_unref_folios+0xac/0xf8
[  334.490175]  folios_put+0x100/0x228
[  334.493651]  __folio_batch_release+0x34/0x88
[  334.497908]  truncate_inode_pages_range+0x168/0x690
[  334.502773]  truncate_inode_pages_final+0x58/0x90
[  334.507464]  ext4_evict_inode+0x164/0x900
[  334.511463]  evict+0xac/0x160
[  334.514419]  iput+0x170/0x228
[  334.517375]  do_unlinkat+0x1d0/0x290
[  334.520938]  __arm64_sys_unlinkat+0x48/0x98
[  334.525108]  invoke_syscall+0x74/0xf8
[  334.528758]  el0_svc_common.constprop.0+0x58/0x130
[  334.533536]  do_el0_svc+0x40/0xa8
[  334.536837]  el0_svc+0x2c/0xb8
[  334.539881]  el0t_64_sync_handler+0xc0/0xc8
[  334.544052]  el0t_64_sync+0x1a8/0x1b0
[  334.547703] Code: aa1a03e0 90009dc1 91072021 97ff1097 (d4210000) 
[  334.553783] ---[ end trace 0000000000000000 ]---


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list()
  2023-09-04  3:43   ` Matthew Wilcox
@ 2024-01-05 17:00     ` Matthew Wilcox
  0 siblings, 0 replies; 49+ messages in thread
From: Matthew Wilcox @ 2024-01-05 17:00 UTC (permalink / raw)
  To: linux-mm; +Cc: Mel Gorman

On Mon, Sep 04, 2023 at 04:43:22AM +0100, Matthew Wilcox wrote:
> On Fri, Aug 25, 2023 at 02:59:15PM +0100, Matthew Wilcox (Oracle) wrote:
> > Use free_unref_page_batch() to free the folios.  This may increase
> > the number of IPIs from calling try_to_unmap_flush() more often,
> > but that's going to be very workload-dependent.
> 
> I'd like to propose this as a replacement for this patch.  Queue the
> mapped folios up so we can flush them all in one go.  Free the unmapped
> ones, and the mapped ones after the flush.

Any reaction to this patch?  I'm putting together a v2 for posting after
the merge window, and I got no feedback on whether the former version
or this one is better.

> It does change the ordering of mem_cgroup_uncharge_folios() and
> the page flush.  I think that's OK.  This is only build-tested;
> something has messed up my laptop and I can no longer launch VMs.
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6f13394b112e..526d5bb84622 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1706,14 +1706,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		struct pglist_data *pgdat, struct scan_control *sc,
>  		struct reclaim_stat *stat, bool ignore_references)
>  {
> +	struct folio_batch free_folios;
>  	LIST_HEAD(ret_folios);
> -	LIST_HEAD(free_folios);
> +	LIST_HEAD(mapped_folios);
>  	LIST_HEAD(demote_folios);
>  	unsigned int nr_reclaimed = 0;
>  	unsigned int pgactivate = 0;
>  	bool do_demote_pass;
>  	struct swap_iocb *plug = NULL;
>  
> +	folio_batch_init(&free_folios);
>  	memset(stat, 0, sizeof(*stat));
>  	cond_resched();
>  	do_demote_pass = can_demote(pgdat->node_id, sc);
> @@ -1723,7 +1725,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		struct address_space *mapping;
>  		struct folio *folio;
>  		enum folio_references references = FOLIOREF_RECLAIM;
> -		bool dirty, writeback;
> +		bool dirty, writeback, mapped = false;
>  		unsigned int nr_pages;
>  
>  		cond_resched();
> @@ -1957,6 +1959,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  					stat->nr_lazyfree_fail += nr_pages;
>  				goto activate_locked;
>  			}
> +			mapped = true;
>  		}
>  
>  		/*
> @@ -2111,14 +2114,12 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		 */
>  		nr_reclaimed += nr_pages;
>  
> -		/*
> -		 * Is there need to periodically free_folio_list? It would
> -		 * appear not as the counts should be low
> -		 */
> -		if (unlikely(folio_test_large(folio)))
> -			destroy_large_folio(folio);
> -		else
> -			list_add(&folio->lru, &free_folios);
> +		if (mapped) {
> +			list_add(&folio->lru, &mapped_folios);
> +		} else if (folio_batch_add(&free_folios, folio) == 0) {
> +			mem_cgroup_uncharge_folios(&free_folios);
> +			free_unref_folios(&free_folios);
> +		}
>  		continue;
>  
>  activate_locked_split:
> @@ -2182,9 +2183,22 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  
>  	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
>  
> -	mem_cgroup_uncharge_list(&free_folios);
>  	try_to_unmap_flush();
> -	free_unref_page_list(&free_folios);
> +	while (!list_empty(&mapped_folios)) {
> +		struct folio *folio = list_first_entry(&mapped_folios,
> +					struct folio, lru);
> +
> +		list_del(&folio->lru);
> +		if (folio_batch_add(&free_folios, folio) > 0)
> +			continue;
> +		mem_cgroup_uncharge_folios(&free_folios);
> +		free_unref_folios(&free_folios);
> +	}
> +
> +	if (free_folios.nr) {
> +		mem_cgroup_uncharge_folios(&free_folios);
> +		free_unref_folios(&free_folios);
> +	}
>  
>  	list_splice(&ret_folios, folio_list);
>  	count_vm_events(PGACTIVATE, pgactivate);
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread

Thread overview: 49+ messages
2023-08-25 13:59 [RFC PATCH 00/14] Rearrange batched folio freeing Matthew Wilcox (Oracle)
2023-08-25 13:59 ` [RFC PATCH 01/14] mm: Make folios_put() the basis of release_pages() Matthew Wilcox (Oracle)
2023-08-31 14:21   ` Ryan Roberts
2023-09-01  3:58     ` Matthew Wilcox
2023-09-01  8:14       ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 02/14] mm: Convert free_unref_page_list() to use folios Matthew Wilcox (Oracle)
2023-08-31 14:29   ` Ryan Roberts
2023-09-01  4:03     ` Matthew Wilcox
2023-09-01  8:15       ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 03/14] mm: Add free_unref_folios() Matthew Wilcox (Oracle)
2023-08-31 14:39   ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 04/14] mm: Use folios_put() in __folio_batch_release() Matthew Wilcox (Oracle)
2023-08-31 14:41   ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 05/14] memcg: Add mem_cgroup_uncharge_folios() Matthew Wilcox (Oracle)
2023-08-31 14:49   ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 06/14] mm: Remove use of folio list from folios_put() Matthew Wilcox (Oracle)
2023-08-31 14:53   ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 07/14] mm: Use free_unref_folios() in put_pages_list() Matthew Wilcox (Oracle)
2023-08-25 13:59 ` [RFC PATCH 08/14] mm: use __page_cache_release() in folios_put() Matthew Wilcox (Oracle)
2023-08-25 13:59 ` [RFC PATCH 09/14] mm: Handle large folios in free_unref_folios() Matthew Wilcox (Oracle)
2023-08-31 15:21   ` Ryan Roberts
2023-09-01  4:09     ` Matthew Wilcox
2023-08-25 13:59 ` [RFC PATCH 10/14] mm: Allow non-hugetlb large folios to be batch processed Matthew Wilcox (Oracle)
2023-08-31 15:28   ` Ryan Roberts
2023-09-01  4:10     ` Matthew Wilcox
2023-08-25 13:59 ` [RFC PATCH 11/14] mm: Free folios in a batch in shrink_folio_list() Matthew Wilcox (Oracle)
2023-09-04  3:43   ` Matthew Wilcox
2024-01-05 17:00     ` Matthew Wilcox
2023-08-25 13:59 ` [RFC PATCH 12/14] mm: Free folios directly in move_folios_to_lru() Matthew Wilcox (Oracle)
2023-08-31 15:46   ` Ryan Roberts
2023-09-01  4:16     ` Matthew Wilcox
2023-08-25 13:59 ` [RFC PATCH 13/14] memcg: Remove mem_cgroup_uncharge_list() Matthew Wilcox (Oracle)
2023-08-31 18:26   ` Ryan Roberts
2023-08-25 13:59 ` [RFC PATCH 14/14] mm: Remove free_unref_page_list() Matthew Wilcox (Oracle)
2023-08-31 18:27   ` Ryan Roberts
2023-08-30 18:50 ` [RFC PATCH 15/18] mm: Convert free_pages_and_swap_cache() to use folios_put() Matthew Wilcox (Oracle)
2023-08-30 18:50 ` [RFC PATCH 16/18] mm: Use a folio in __collapse_huge_page_copy_succeeded() Matthew Wilcox (Oracle)
2023-08-30 18:50 ` [RFC PATCH 17/18] mm: Convert free_swap_cache() to take a folio Matthew Wilcox (Oracle)
2023-08-31 18:49   ` Ryan Roberts
2023-08-30 18:50 ` [RFC PATCH 18/18] mm: Add pfn_range_put() Matthew Wilcox (Oracle)
2023-08-31 19:03   ` Ryan Roberts
2023-09-01  4:27     ` Matthew Wilcox
2023-09-01  7:59       ` Ryan Roberts
2023-09-04 13:25 ` [RFC PATCH 00/14] Rearrange batched folio freeing Ryan Roberts
2023-09-05 13:15   ` Matthew Wilcox
2023-09-05 13:26     ` Ryan Roberts
2023-09-05 14:00       ` Matthew Wilcox
2023-09-06  3:48         ` Matthew Wilcox
2023-09-06 10:23           ` Ryan Roberts
