linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*()
@ 2024-12-03  9:47 David Hildenbrand
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

For some reason v1 didn't make it to linux-mm; same with v2. Let's try
a different SMTP server (gmail), maybe there are some issues with the
RH one ...

Let's clean up the gfp flags handling, and support __GFP_ZERO, such that we
can finally remove the TODO in memtrace code.

I did some alloc_contig_*() testing with virtio-mem and hugetlb; I did not
test powernv/memtrace -- I cross-compiled it, though.

v1 -> v2:
* "mm/page_alloc: forward the gfp flags from alloc_contig_range() to
   post_alloc_hook()"
 -> Fixup patch description

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>

David Hildenbrand (6):
  mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()
  mm/page_isolation: don't pass gfp flags to start_isolate_page_range()
  mm/page_alloc: make __alloc_contig_migrate_range() static
  mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  mm/page_alloc: forward the gfp flags from alloc_contig_range() to
    post_alloc_hook()
  powernv/memtrace: use __GFP_ZERO with alloc_contig_pages()

 arch/powerpc/platforms/powernv/memtrace.c | 31 +++---------
 include/linux/page-isolation.h            |  2 +-
 mm/internal.h                             |  4 --
 mm/memory_hotplug.c                       |  3 +-
 mm/page_alloc.c                           | 62 +++++++++++++++++++----
 mm/page_isolation.c                       | 12 ++---
 6 files changed, 63 insertions(+), 51 deletions(-)


base-commit: 679694cdccaf75df589c2737f233954669a5f601
-- 
2.47.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 13:31   ` Vlastimil Babka
                     ` (2 more replies)
  2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

The flags are no longer used, we can stop passing them to
isolate_single_pageblock().

Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_isolation.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 7e04047977cf..e680d40d96de 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -286,7 +286,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * within a free or in-use page.
  * @boundary_pfn:		pageblock-aligned pfn that a page might cross
  * @flags:			isolation flags
- * @gfp_flags:			GFP flags used for migrating pages
  * @isolate_before:	isolate the pageblock before the boundary_pfn
  * @skip_isolation:	the flag to skip the pageblock isolation in second
  *			isolate_single_pageblock()
@@ -306,8 +305,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * the in-use page then splitting the free page.
  */
 static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
-			gfp_t gfp_flags, bool isolate_before, bool skip_isolation,
-			int migratetype)
+		bool isolate_before, bool skip_isolation, int migratetype)
 {
 	unsigned long start_pfn;
 	unsigned long isolate_pageblock;
@@ -489,7 +487,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	bool skip_isolation = false;
 
 	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
-	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false,
+	ret = isolate_single_pageblock(isolate_start, flags, false,
 			skip_isolation, migratetype);
 	if (ret)
 		return ret;
@@ -498,7 +496,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 		skip_isolation = true;
 
 	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
-	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
+	ret = isolate_single_pageblock(isolate_end, flags, true,
 			skip_isolation, migratetype);
 	if (ret) {
 		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range()
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 13:32   ` Vlastimil Babka
                     ` (2 more replies)
  2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

The parameter is unused, so let's stop passing it.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-isolation.h | 2 +-
 mm/memory_hotplug.c            | 3 +--
 mm/page_alloc.c                | 2 +-
 mm/page_isolation.c            | 4 +---
 4 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 73dc2c1841ec..898bb788243b 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -31,7 +31,7 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page,
 				  int migratetype);
 
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     int migratetype, int flags, gfp_t gfp_flags);
+			     int migratetype, int flags);
 
 void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     int migratetype);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c43b4e7fb298..9b184ba064a0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1992,8 +1992,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
-				       MEMORY_OFFLINE | REPORT_FAILURE,
-				       GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL);
+				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret) {
 		reason = "failure to isolate range";
 		goto failed_removal_pcplists_disabled;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cc3296cf8c95..f371fbf2145b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6451,7 +6451,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 	 * put back to page allocator so that buddy can use them.
 	 */
 
-	ret = start_isolate_page_range(start, end, migratetype, 0, gfp_mask);
+	ret = start_isolate_page_range(start, end, migratetype, 0);
 	if (ret)
 		goto done;
 
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index e680d40d96de..c608e9d72865 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -442,8 +442,6 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
  *					 and PageOffline() pages.
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
- * @gfp_flags:		GFP flags used for migrating pages that sit across the
- *			range boundaries.
  *
  * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
  * the range will never be allocated. Any free pages and pages freed in the
@@ -476,7 +474,7 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
 int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			     int migratetype, int flags, gfp_t gfp_flags)
+			     int migratetype, int flags)
 {
 	unsigned long pfn;
 	struct page *page;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
  2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 13:33   ` Vlastimil Babka
                     ` (2 more replies)
  2024-12-03  9:47 ` [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess David Hildenbrand
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

The single user is in page_alloc.c.

Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/internal.h   | 4 ----
 mm/page_alloc.c | 5 ++---
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 74713b44bedb..4bd3685c33ef 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -839,10 +839,6 @@ int
 isolate_migratepages_range(struct compact_control *cc,
 			   unsigned long low_pfn, unsigned long end_pfn);
 
-int __alloc_contig_migrate_range(struct compact_control *cc,
-					unsigned long start, unsigned long end,
-					int migratetype);
-
 /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
 void init_cma_reserved_pageblock(struct page *page);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index f371fbf2145b..ce7589a4ec01 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6284,9 +6284,8 @@ static void alloc_contig_dump_pages(struct list_head *page_list)
  * @migratetype: using migratetype to filter the type of migration in
  *		trace_mm_alloc_contig_migrate_range_info.
  */
-int __alloc_contig_migrate_range(struct compact_control *cc,
-					unsigned long start, unsigned long end,
-					int migratetype)
+static int __alloc_contig_migrate_range(struct compact_control *cc,
+		unsigned long start, unsigned long end, int migratetype)
 {
 	/* This function is based on compact_zone() from compaction.c. */
 	unsigned int nr_reclaimed;
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
                   ` (2 preceding siblings ...)
  2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 13:55   ` Vlastimil Babka
  2024-12-04  9:00   ` Oscar Salvador
  2024-12-03  9:47 ` [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() David Hildenbrand
  2024-12-03  9:47 ` [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages() David Hildenbrand
  5 siblings, 2 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

It's all a bit complicated for alloc_contig_range(). For example, we don't
support many flags, so let's start bailing out on unsupported
ones -- ignoring the placement hints, as we are already given the range
to allocate.

While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
simply create yet another GFP mask whereby we ignore the reclaim flags
specify by the caller. That looks very inconsistent.

Let's clean it up, constructing the gfp flags used for
compaction/migration exactly once. Update the documentation of the
gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().

Acked-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 48 ++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce7589a4ec01..54594cc4f650 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	int ret = 0;
 	struct migration_target_control mtc = {
 		.nid = zone_to_nid(cc->zone),
-		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+		.gfp_mask = cc->gfp_mask,
 		.reason = MR_CONTIG_RANGE,
 	};
 	struct page *page;
@@ -6390,6 +6390,39 @@ static void split_free_pages(struct list_head *list)
 	}
 }
 
+static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
+{
+	const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
+	const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+	const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+
+	/*
+	 * We are given the range to allocate; node, mobility and placement
+	 * hints are irrelevant at this point. We'll simply ignore them.
+	 */
+	gfp_mask &= ~(GFP_ZONEMASK | __GFP_RECLAIMABLE | __GFP_WRITE |
+		      __GFP_HARDWALL | __GFP_THISNODE | __GFP_MOVABLE);
+
+	/*
+	 * We only support most reclaim flags (but not NOFAIL/NORETRY), and
+	 * selected action flags.
+	 */
+	if (gfp_mask & ~(reclaim_mask | action_mask))
+		return -EINVAL;
+
+	/*
+	 * Flags to control page compaction/migration/reclaim, to free up our
+	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
+	 * for them.
+	 *
+	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
+	 * keep doing that to not degrade callers.
+	 */
+	*gfp_cc_mask = (gfp_mask & (reclaim_mask | cc_action_mask)) |
+			__GFP_HARDWALL | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
+	return 0;
+}
+
 /**
  * alloc_contig_range() -- tries to allocate given range of pages
  * @start:	start PFN to allocate
@@ -6398,7 +6431,9 @@ static void split_free_pages(struct list_head *list)
  *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
  *			in range must have the same migratetype and it must
  *			be either of the two.
- * @gfp_mask:	GFP mask to use during compaction
+ * @gfp_mask:	GFP mask. Node/zone/placement hints are ignored; only some
+ *		action and reclaim modifiers are supported. Reclaim modifiers
+ *		control allocation behavior during compaction/migration/reclaim.
  *
  * The PFN range does not have to be pageblock aligned. The PFN range must
  * belong to a single zone.
@@ -6424,11 +6459,14 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 		.mode = MIGRATE_SYNC,
 		.ignore_skip_hint = true,
 		.no_set_skip_hint = true,
-		.gfp_mask = current_gfp_context(gfp_mask),
 		.alloc_contig = true,
 	};
 	INIT_LIST_HEAD(&cc.migratepages);
 
+	gfp_mask = current_gfp_context(gfp_mask);
+	if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
+		return -EINVAL;
+
 	/*
 	 * What we do here is we mark all pageblocks in range as
 	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
@@ -6571,7 +6609,9 @@ static bool zone_spans_last_pfn(const struct zone *zone,
 /**
  * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
  * @nr_pages:	Number of contiguous pages to allocate
- * @gfp_mask:	GFP mask to limit search and used during compaction
+ * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
+ *		action and reclaim modifiers are supported. Reclaim modifiers
+ *		control allocation behavior during compaction/migration/reclaim.
  * @nid:	Target node
  * @nodemask:	Mask for other possible nodes
  *
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook()
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
                   ` (3 preceding siblings ...)
  2024-12-03  9:47 ` [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 14:36   ` Vlastimil Babka
  2024-12-04  9:03   ` Oscar Salvador
  2024-12-03  9:47 ` [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages() David Hildenbrand
  5 siblings, 2 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

In the __GFP_COMP case, we already pass the gfp_flags to
prep_new_page()->post_alloc_hook(). However, in the !__GFP_COMP case, we
essentially pass only hardcoded __GFP_MOVABLE to post_alloc_hook(),
preventing some action modifiers from being effective..

Let's pass our now properly adjusted gfp flags there as well.

This way, we can now support __GFP_ZERO for alloc_contig_*().

As a side effect, we now also support __GFP_SKIP_ZERO and__GFP_ZEROTAGS;
but we'll keep the more special stuff (KASAN, NOLOCKDEP) disabled for
now.

It's worth noting that with __GFP_ZERO, we might unnecessarily zero pages
when we have to release part of our range using free_contig_range() again.
This can be optimized in the future, if ever required; the caller we'll
be converting (powernv/memtrace) next won't trigger this.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/page_alloc.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 54594cc4f650..71d70bc0ad79 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6364,7 +6364,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 	return (ret < 0) ? ret : 0;
 }
 
-static void split_free_pages(struct list_head *list)
+static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
 {
 	int order;
 
@@ -6375,7 +6375,7 @@ static void split_free_pages(struct list_head *list)
 		list_for_each_entry_safe(page, next, &list[order], lru) {
 			int i;
 
-			post_alloc_hook(page, order, __GFP_MOVABLE);
+			post_alloc_hook(page, order, gfp_mask);
 			set_page_refcounted(page);
 			if (!order)
 				continue;
@@ -6393,7 +6393,8 @@ static void split_free_pages(struct list_head *list)
 static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
 {
 	const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
-	const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
+	const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN |
+				  __GFP_ZERO | __GFP_ZEROTAGS | __GFP_SKIP_ZERO;
 	const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
 
 	/*
@@ -6541,7 +6542,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
 	}
 
 	if (!(gfp_mask & __GFP_COMP)) {
-		split_free_pages(cc.freepages);
+		split_free_pages(cc.freepages, gfp_mask);
 
 		/* Free head and tail (if any) */
 		if (start != outer_start)
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages()
  2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
                   ` (4 preceding siblings ...)
  2024-12-03  9:47 ` [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() David Hildenbrand
@ 2024-12-03  9:47 ` David Hildenbrand
  2024-12-03 14:39   ` Vlastimil Babka
  5 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03  9:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-mm, linuxppc-dev, David Hildenbrand, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

alloc_contig_pages()->alloc_contig_range() now supports __GFP_ZERO,
so let's use that instead to resolve our TODO.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/platforms/powernv/memtrace.c | 31 +++++------------------
 1 file changed, 6 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c b/arch/powerpc/platforms/powernv/memtrace.c
index 877720c64515..4ac9808e55a4 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -88,26 +88,6 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 	}
 }
 
-static void memtrace_clear_range(unsigned long start_pfn,
-				 unsigned long nr_pages)
-{
-	unsigned long pfn;
-
-	/* As HIGHMEM does not apply, use clear_page() directly. */
-	for (pfn = start_pfn; pfn < start_pfn + nr_pages; pfn++) {
-		if (IS_ALIGNED(pfn, PAGES_PER_SECTION))
-			cond_resched();
-		clear_page(__va(PFN_PHYS(pfn)));
-	}
-	/*
-	 * Before we go ahead and use this range as cache inhibited range
-	 * flush the cache.
-	 */
-	flush_dcache_range_chunked((unsigned long)pfn_to_kaddr(start_pfn),
-				   (unsigned long)pfn_to_kaddr(start_pfn + nr_pages),
-				   FLUSH_CHUNK_SIZE);
-}
-
 static u64 memtrace_alloc_node(u32 nid, u64 size)
 {
 	const unsigned long nr_pages = PHYS_PFN(size);
@@ -119,17 +99,18 @@ static u64 memtrace_alloc_node(u32 nid, u64 size)
 	 * by alloc_contig_pages().
 	 */
 	page = alloc_contig_pages(nr_pages, GFP_KERNEL | __GFP_THISNODE |
-				  __GFP_NOWARN, nid, NULL);
+				  __GFP_NOWARN | __GFP_ZERO, nid, NULL);
 	if (!page)
 		return 0;
 	start_pfn = page_to_pfn(page);
 
 	/*
-	 * Clear the range while we still have a linear mapping.
-	 *
-	 * TODO: use __GFP_ZERO with alloc_contig_pages() once supported.
+	 * Before we go ahead and use this range as cache inhibited range
+	 * flush the cache.
 	 */
-	memtrace_clear_range(start_pfn, nr_pages);
+	flush_dcache_range_chunked((unsigned long)pfn_to_kaddr(start_pfn),
+				   (unsigned long)pfn_to_kaddr(start_pfn + nr_pages),
+				   FLUSH_CHUNK_SIZE);
 
 	/*
 	 * Set pages PageOffline(), to indicate that nobody (e.g., hibernation,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
@ 2024-12-03 13:31   ` Vlastimil Babka
  2024-12-03 15:30   ` Oscar Salvador
  2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 13:31 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> The flags are no longer used, we can stop passing them to
> isolate_single_pageblock().
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_isolation.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 7e04047977cf..e680d40d96de 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -286,7 +286,6 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * within a free or in-use page.
>   * @boundary_pfn:		pageblock-aligned pfn that a page might cross
>   * @flags:			isolation flags
> - * @gfp_flags:			GFP flags used for migrating pages
>   * @isolate_before:	isolate the pageblock before the boundary_pfn
>   * @skip_isolation:	the flag to skip the pageblock isolation in second
>   *			isolate_single_pageblock()
> @@ -306,8 +305,7 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * the in-use page then splitting the free page.
>   */
>  static int isolate_single_pageblock(unsigned long boundary_pfn, int flags,
> -			gfp_t gfp_flags, bool isolate_before, bool skip_isolation,
> -			int migratetype)
> +		bool isolate_before, bool skip_isolation, int migratetype)
>  {
>  	unsigned long start_pfn;
>  	unsigned long isolate_pageblock;
> @@ -489,7 +487,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	bool skip_isolation = false;
>  
>  	/* isolate [isolate_start, isolate_start + pageblock_nr_pages) pageblock */
> -	ret = isolate_single_pageblock(isolate_start, flags, gfp_flags, false,
> +	ret = isolate_single_pageblock(isolate_start, flags, false,
>  			skip_isolation, migratetype);
>  	if (ret)
>  		return ret;
> @@ -498,7 +496,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  		skip_isolation = true;
>  
>  	/* isolate [isolate_end - pageblock_nr_pages, isolate_end) pageblock */
> -	ret = isolate_single_pageblock(isolate_end, flags, gfp_flags, true,
> +	ret = isolate_single_pageblock(isolate_end, flags, true,
>  			skip_isolation, migratetype);
>  	if (ret) {
>  		unset_migratetype_isolate(pfn_to_page(isolate_start), migratetype);


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range()
  2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
@ 2024-12-03 13:32   ` Vlastimil Babka
  2024-12-03 15:32   ` Oscar Salvador
  2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 13:32 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> The parameter is unused, so let's stop passing it.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static
  2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
@ 2024-12-03 13:33   ` Vlastimil Babka
  2024-12-03 15:33   ` Oscar Salvador
  2024-12-03 21:45   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 13:33 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> The single user is in page_alloc.c.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/internal.h   | 4 ----
>  mm/page_alloc.c | 5 ++---
>  2 files changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/internal.h b/mm/internal.h
> index 74713b44bedb..4bd3685c33ef 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -839,10 +839,6 @@ int
>  isolate_migratepages_range(struct compact_control *cc,
>  			   unsigned long low_pfn, unsigned long end_pfn);
>  
> -int __alloc_contig_migrate_range(struct compact_control *cc,
> -					unsigned long start, unsigned long end,
> -					int migratetype);
> -
>  /* Free whole pageblock and set its migration type to MIGRATE_CMA. */
>  void init_cma_reserved_pageblock(struct page *page);
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index f371fbf2145b..ce7589a4ec01 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6284,9 +6284,8 @@ static void alloc_contig_dump_pages(struct list_head *page_list)
>   * @migratetype: using migratetype to filter the type of migration in
>   *		trace_mm_alloc_contig_migrate_range_info.
>   */
> -int __alloc_contig_migrate_range(struct compact_control *cc,
> -					unsigned long start, unsigned long end,
> -					int migratetype)
> +static int __alloc_contig_migrate_range(struct compact_control *cc,
> +		unsigned long start, unsigned long end, int migratetype)
>  {
>  	/* This function is based on compact_zone() from compaction.c. */
>  	unsigned int nr_reclaimed;


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03  9:47 ` [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess David Hildenbrand
@ 2024-12-03 13:55   ` Vlastimil Babka
  2024-12-03 14:12     ` David Hildenbrand
  2024-12-04  9:00   ` Oscar Salvador
  1 sibling, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 13:55 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> It's all a bit complicated for alloc_contig_range(). For example, we don't
> support many flags, so let's start bailing out on unsupported
> ones -- ignoring the placement hints, as we are already given the range
> to allocate.
> 
> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
> simply create yet another GFP mask whereby we ignore the reclaim flags
> specify by the caller. That looks very inconsistent.
> 
> Let's clean it up, constructing the gfp flags used for
> compaction/migration exactly once. Update the documentation of the
> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
> 
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> +	/*
> +	 * Flags to control page compaction/migration/reclaim, to free up our
> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
> +	 * for them.
> +	 *
> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
> +	 * keep doing that to not degrade callers.
> +	 */

Wonder if we could revisit that eventually. Why limit migration targets by
cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?

Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
__GFP_NOWARN in few places, so it's mostly migration_target_control the
callers could meaningfully influence.

> +	*gfp_cc_mask = (gfp_mask & (reclaim_mask | cc_action_mask)) |
> +			__GFP_HARDWALL | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL;
> +	return 0;
> +}
> +
>  /**
>   * alloc_contig_range() -- tries to allocate given range of pages
>   * @start:	start PFN to allocate
> @@ -6398,7 +6431,9 @@ static void split_free_pages(struct list_head *list)
>   *			#MIGRATE_MOVABLE or #MIGRATE_CMA).  All pageblocks
>   *			in range must have the same migratetype and it must
>   *			be either of the two.
> - * @gfp_mask:	GFP mask to use during compaction
> + * @gfp_mask:	GFP mask. Node/zone/placement hints are ignored; only some
> + *		action and reclaim modifiers are supported. Reclaim modifiers
> + *		control allocation behavior during compaction/migration/reclaim.
>   *
>   * The PFN range does not have to be pageblock aligned. The PFN range must
>   * belong to a single zone.
> @@ -6424,11 +6459,14 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>  		.mode = MIGRATE_SYNC,
>  		.ignore_skip_hint = true,
>  		.no_set_skip_hint = true,
> -		.gfp_mask = current_gfp_context(gfp_mask),
>  		.alloc_contig = true,
>  	};
>  	INIT_LIST_HEAD(&cc.migratepages);
>  
> +	gfp_mask = current_gfp_context(gfp_mask);
> +	if (__alloc_contig_verify_gfp_mask(gfp_mask, (gfp_t *)&cc.gfp_mask))
> +		return -EINVAL;
> +
>  	/*
>  	 * What we do here is we mark all pageblocks in range as
>  	 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
> @@ -6571,7 +6609,9 @@ static bool zone_spans_last_pfn(const struct zone *zone,
>  /**
>   * alloc_contig_pages() -- tries to find and allocate contiguous range of pages
>   * @nr_pages:	Number of contiguous pages to allocate
> - * @gfp_mask:	GFP mask to limit search and used during compaction
> + * @gfp_mask:	GFP mask. Node/zone/placement hints limit the search; only some
> + *		action and reclaim modifiers are supported. Reclaim modifiers
> + *		control allocation behavior during compaction/migration/reclaim.
>   * @nid:	Target node
>   * @nodemask:	Mask for other possible nodes
>   *


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 13:55   ` Vlastimil Babka
@ 2024-12-03 14:12     ` David Hildenbrand
  2024-12-03 14:24       ` Vlastimil Babka
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03 14:12 UTC (permalink / raw)
  To: Vlastimil Babka, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 03.12.24 14:55, Vlastimil Babka wrote:
> On 12/3/24 10:47, David Hildenbrand wrote:
>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>> support many flags, so let's start bailing out on unsupported
>> ones -- ignoring the placement hints, as we are already given the range
>> to allocate.
>>
>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>> simply create yet another GFP mask whereby we ignore the reclaim flags
>> specify by the caller. That looks very inconsistent.
>>
>> Let's clean it up, constructing the gfp flags used for
>> compaction/migration exactly once. Update the documentation of the
>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>
>> Acked-by: Zi Yan <ziy@nvidia.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> 
>> +	/*
>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>> +	 * for them.
>> +	 *
>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>> +	 * keep doing that to not degrade callers.
>> +	 */
> 
> Wonder if we could revisit that eventually. Why limit migration targets by
> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?

See below.

> 
> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
> __GFP_NOWARN in few places, so it's mostly migration_target_control the
> callers could meaningfully influence.

Note the fist change in the file, where we now use the mask instead of coming up
with another one out of the blue. :)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ce7589a4ec01..54594cc4f650 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
  	int ret = 0;
  	struct migration_target_control mtc = {
  		.nid = zone_to_nid(cc->zone),
-		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
+		.gfp_mask = cc->gfp_mask,
  		.reason = MR_CONTIG_RANGE,
  	};

GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but
likely the thing we are assuming here is that we are migrating a page, and
usually, these are user allocation (except maybe balloon and some other non-lru
movable things).

The __GFP_RETRY_MAYFAIL should be moved to relevant callers a some point,
__GFP_HARDWALL, I really don't know ...

Thanks!

-- 
Cheers,

David / dhildenb


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 14:12     ` David Hildenbrand
@ 2024-12-03 14:24       ` Vlastimil Babka
  2024-12-03 15:49         ` Zi Yan
  2024-12-03 19:19         ` David Hildenbrand
  0 siblings, 2 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 14:24 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 15:12, David Hildenbrand wrote:
> On 03.12.24 14:55, Vlastimil Babka wrote:
>> On 12/3/24 10:47, David Hildenbrand wrote:
>>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>>> support many flags, so let's start bailing out on unsupported
>>> ones -- ignoring the placement hints, as we are already given the range
>>> to allocate.
>>>
>>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>>> simply create yet another GFP mask whereby we ignore the reclaim flags
>>> specify by the caller. That looks very inconsistent.
>>>
>>> Let's clean it up, constructing the gfp flags used for
>>> compaction/migration exactly once. Update the documentation of the
>>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>>
>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> 
>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>> 
>>> +	/*
>>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>>> +	 * for them.
>>> +	 *
>>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>>> +	 * keep doing that to not degrade callers.
>>> +	 */
>> 
>> Wonder if we could revisit that eventually. Why limit migration targets by
>> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
>> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?
> 
> See below.
> 
>> 
>> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
>> __GFP_NOWARN in few places, so it's mostly migration_target_control the
>> callers could meaningfully influence.
> 
> Note the fist change in the file, where we now use the mask instead of coming up
> with another one out of the blue. :)

I know. What I wanted to say - cc->gfp is on its own only checked in few
places, but now since we also translate it to migration_target_control's
gfp_mask, it's mostly that part the caller might influence with the passed
flags. But we still impose own additions to it, limiting that influence.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ce7589a4ec01..54594cc4f650 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   	int ret = 0;
>   	struct migration_target_control mtc = {
>   		.nid = zone_to_nid(cc->zone),
> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> +		.gfp_mask = cc->gfp_mask,
>   		.reason = MR_CONTIG_RANGE,
>   	};
> 
> GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but

Yeah wonder if GFP_USER was used specifically for that part, or just randomly :)

> likely the thing we are assuming here is that we are migrating a page, and
> usually, these are user allocation (except maybe balloon and some other non-lru
> movable things).

Yeah and user allocations obey cpuset and mempolicies etc. But these are
likely somebody elses allocations that were done according to their
policies. With our migration we might be actually violating those, which
probably can't be helped (is at least migration within the same node
preferred? hmm). But it doesn't seem to me that our caller's restrictions
(if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
somebody else's pages?

> The __GFP_RETRY_MAYFAIL should be moved to relevant callers a some point,
> __GFP_HARDWALL, I really don't know ...
> 
> Thanks!
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook()
  2024-12-03  9:47 ` [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() David Hildenbrand
@ 2024-12-03 14:36   ` Vlastimil Babka
  2024-12-04  9:03   ` Oscar Salvador
  1 sibling, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 14:36 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> In the __GFP_COMP case, we already pass the gfp_flags to
> prep_new_page()->post_alloc_hook(). However, in the !__GFP_COMP case, we
> essentially pass only hardcoded __GFP_MOVABLE to post_alloc_hook(),
> preventing some action modifiers from being effective..
> 
> Let's pass our now properly adjusted gfp flags there as well.
> 
> This way, we can now support __GFP_ZERO for alloc_contig_*().
> 
> As a side effect, we now also support __GFP_SKIP_ZERO and__GFP_ZEROTAGS;
> but we'll keep the more special stuff (KASAN, NOLOCKDEP) disabled for
> now.
> 
> It's worth noting that with __GFP_ZERO, we might unnecessarily zero pages
> when we have to release part of our range using free_contig_range() again.
> This can be optimized in the future, if ever required; the caller we'll
> be converting (powernv/memtrace) next won't trigger this.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/page_alloc.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 54594cc4f650..71d70bc0ad79 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6364,7 +6364,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>  	return (ret < 0) ? ret : 0;
>  }
>  
> -static void split_free_pages(struct list_head *list)
> +static void split_free_pages(struct list_head *list, gfp_t gfp_mask)
>  {
>  	int order;
>  
> @@ -6375,7 +6375,7 @@ static void split_free_pages(struct list_head *list)
>  		list_for_each_entry_safe(page, next, &list[order], lru) {
>  			int i;
>  
> -			post_alloc_hook(page, order, __GFP_MOVABLE);
> +			post_alloc_hook(page, order, gfp_mask);
>  			set_page_refcounted(page);
>  			if (!order)
>  				continue;
> @@ -6393,7 +6393,8 @@ static void split_free_pages(struct list_head *list)
>  static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
>  {
>  	const gfp_t reclaim_mask = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> -	const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
> +	const gfp_t action_mask = __GFP_COMP | __GFP_RETRY_MAYFAIL | __GFP_NOWARN |
> +				  __GFP_ZERO | __GFP_ZEROTAGS | __GFP_SKIP_ZERO;
>  	const gfp_t cc_action_mask = __GFP_RETRY_MAYFAIL | __GFP_NOWARN;
>  
>  	/*
> @@ -6541,7 +6542,7 @@ int alloc_contig_range_noprof(unsigned long start, unsigned long end,
>  	}
>  
>  	if (!(gfp_mask & __GFP_COMP)) {
> -		split_free_pages(cc.freepages);
> +		split_free_pages(cc.freepages, gfp_mask);
>  
>  		/* Free head and tail (if any) */
>  		if (start != outer_start)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages()
  2024-12-03  9:47 ` [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages() David Hildenbrand
@ 2024-12-03 14:39   ` Vlastimil Babka
  0 siblings, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-03 14:39 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 10:47, David Hildenbrand wrote:
> alloc_contig_pages()->alloc_contig_range() now supports __GFP_ZERO,
> so let's use that instead to resolve our TODO.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
  2024-12-03 13:31   ` Vlastimil Babka
@ 2024-12-03 15:30   ` Oscar Salvador
  2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Oscar Salvador @ 2024-12-03 15:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:27AM +0100, David Hildenbrand wrote:
> The flags are no longer used, we can stop passing them to
> isolate_single_pageblock().
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range()
  2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
  2024-12-03 13:32   ` Vlastimil Babka
@ 2024-12-03 15:32   ` Oscar Salvador
  2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Oscar Salvador @ 2024-12-03 15:32 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:28AM +0100, David Hildenbrand wrote:
> The parameter is unused, so let's stop passing it.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static
  2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
  2024-12-03 13:33   ` Vlastimil Babka
@ 2024-12-03 15:33   ` Oscar Salvador
  2024-12-03 21:45   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Oscar Salvador @ 2024-12-03 15:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:29AM +0100, David Hildenbrand wrote:
> The single user is in page_alloc.c.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 14:24       ` Vlastimil Babka
@ 2024-12-03 15:49         ` Zi Yan
  2024-12-03 19:07           ` David Hildenbrand
  2024-12-03 19:19         ` David Hildenbrand
  1 sibling, 1 reply; 33+ messages in thread
From: Zi Yan @ 2024-12-03 15:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	Andrew Morton, Oscar Salvador, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On 3 Dec 2024, at 9:24, Vlastimil Babka wrote:

> On 12/3/24 15:12, David Hildenbrand wrote:
>> On 03.12.24 14:55, Vlastimil Babka wrote:
>>> On 12/3/24 10:47, David Hildenbrand wrote:
>>>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>>>> support many flags, so let's start bailing out on unsupported
>>>> ones -- ignoring the placement hints, as we are already given the range
>>>> to allocate.
>>>>
>>>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>>>> simply create yet another GFP mask whereby we ignore the reclaim flags
>>>> specify by the caller. That looks very inconsistent.
>>>>
>>>> Let's clean it up, constructing the gfp flags used for
>>>> compaction/migration exactly once. Update the documentation of the
>>>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>>>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>
>>>> +	/*
>>>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>>>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>>>> +	 * for them.
>>>> +	 *
>>>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>>>> +	 * keep doing that to not degrade callers.
>>>> +	 */
>>>
>>> Wonder if we could revisit that eventually. Why limit migration targets by
>>> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
>>> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?
>>
>> See below.
>>
>>>
>>> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
>>> __GFP_NOWARN in few places, so it's mostly migration_target_control the
>>> callers could meaningfully influence.
>>
>> Note the fist change in the file, where we now use the mask instead of coming up
>> with another one out of the blue. :)
>
> I know. What I wanted to say - cc->gfp is on its own only checked in few
> places, but now since we also translate it to migration_target_control's
> gfp_mask, it's mostly that part the caller might influence with the passed
> flags. But we still impose own additions to it, limiting that influence.
>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ce7589a4ec01..54594cc4f650 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>   	int ret = 0;
>>   	struct migration_target_control mtc = {
>>   		.nid = zone_to_nid(cc->zone),
>> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>> +		.gfp_mask = cc->gfp_mask,
>>   		.reason = MR_CONTIG_RANGE,
>>   	};
>>
>> GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but
>
> Yeah wonder if GFP_USER was used specifically for that part, or just randomly :)
>
>> likely the thing we are assuming here is that we are migrating a page, and
>> usually, these are user allocation (except maybe balloon and some other non-lru
>> movable things).
>
> Yeah and user allocations obey cpuset and mempolicies etc. But these are
> likely somebody elses allocations that were done according to their
> policies. With our migration we might be actually violating those, which
> probably can't be helped (is at least migration within the same node
> preferred? hmm). But it doesn't seem to me that our caller's restrictions
> (if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
> somebody else's pages?

Yeah, I was wondering why current_gfp_context() is used to adjust gfp_mask,
since current context might not be relevant. But I see it is used in
the original code, so I did not ask. If current context is irrelevant w.r.t
the to-be-migrated pages, should current_gfp_context() part be removed?

Ideally, to respect the to-be-migrated page’s gfp, we might need to go through
rmap to find its corresponding vma and possible task struct, but that seems
overly complicated.


Best Regards,
Yan, Zi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 15:49         ` Zi Yan
@ 2024-12-03 19:07           ` David Hildenbrand
  0 siblings, 0 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03 19:07 UTC (permalink / raw)
  To: Zi Yan, Vlastimil Babka
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton,
	Oscar Salvador, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On 03.12.24 16:49, Zi Yan wrote:
> On 3 Dec 2024, at 9:24, Vlastimil Babka wrote:
> 
>> On 12/3/24 15:12, David Hildenbrand wrote:
>>> On 03.12.24 14:55, Vlastimil Babka wrote:
>>>> On 12/3/24 10:47, David Hildenbrand wrote:
>>>>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>>>>> support many flags, so let's start bailing out on unsupported
>>>>> ones -- ignoring the placement hints, as we are already given the range
>>>>> to allocate.
>>>>>
>>>>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>>>>> simply create yet another GFP mask whereby we ignore the reclaim flags
>>>>> specify by the caller. That looks very inconsistent.
>>>>>
>>>>> Let's clean it up, constructing the gfp flags used for
>>>>> compaction/migration exactly once. Update the documentation of the
>>>>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>>>>
>>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>
>>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>>
>>>>> +	/*
>>>>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>>>>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>>>>> +	 * for them.
>>>>> +	 *
>>>>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>>>>> +	 * keep doing that to not degrade callers.
>>>>> +	 */
>>>>
>>>> Wonder if we could revisit that eventually. Why limit migration targets by
>>>> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
>>>> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?
>>>
>>> See below.
>>>
>>>>
>>>> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
>>>> __GFP_NOWARN in few places, so it's mostly migration_target_control the
>>>> callers could meaningfully influence.
>>>
>>> Note the fist change in the file, where we now use the mask instead of coming up
>>> with another one out of the blue. :)
>>
>> I know. What I wanted to say - cc->gfp is on its own only checked in few
>> places, but now since we also translate it to migration_target_control's
>> gfp_mask, it's mostly that part the caller might influence with the passed
>> flags. But we still impose own additions to it, limiting that influence.
>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index ce7589a4ec01..54594cc4f650 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>>    	int ret = 0;
>>>    	struct migration_target_control mtc = {
>>>    		.nid = zone_to_nid(cc->zone),
>>> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>>> +		.gfp_mask = cc->gfp_mask,
>>>    		.reason = MR_CONTIG_RANGE,
>>>    	};
>>>
>>> GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but
>>
>> Yeah wonder if GFP_USER was used specifically for that part, or just randomly :)
>>
>>> likely the thing we are assuming here is that we are migrating a page, and
>>> usually, these are user allocation (except maybe balloon and some other non-lru
>>> movable things).
>>
>> Yeah and user allocations obey cpuset and mempolicies etc. But these are
>> likely somebody elses allocations that were done according to their
>> policies. With our migration we might be actually violating those, which
>> probably can't be helped (is at least migration within the same node
>> preferred? hmm). But it doesn't seem to me that our caller's restrictions
>> (if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
>> somebody else's pages?
> 
> Yeah, I was wondering why current_gfp_context() is used to adjust gfp_mask,
> since current context might not be relevant. But I see it is used in
> the original code, so I did not ask. If current context is irrelevant w.r.t
> the to-be-migrated pages, should current_gfp_context() part be removed?

Please see how current_gfp_context() is only concerned (excluding the 
__GFP_MOVABLE thing we unconditionally set ...) about reclaim flags. 
This part make absolute sense to respect here.

So that is something different than __GFP_HARDWALL that *we so far 
unconditionally set* and is not a "reclaim" flag.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 14:24       ` Vlastimil Babka
  2024-12-03 15:49         ` Zi Yan
@ 2024-12-03 19:19         ` David Hildenbrand
  2024-12-04  8:54           ` Vlastimil Babka
  2024-12-04  8:59           ` Oscar Salvador
  1 sibling, 2 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-03 19:19 UTC (permalink / raw)
  To: Vlastimil Babka, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 03.12.24 15:24, Vlastimil Babka wrote:
> On 12/3/24 15:12, David Hildenbrand wrote:
>> On 03.12.24 14:55, Vlastimil Babka wrote:
>>> On 12/3/24 10:47, David Hildenbrand wrote:
>>>> It's all a bit complicated for alloc_contig_range(). For example, we don't
>>>> support many flags, so let's start bailing out on unsupported
>>>> ones -- ignoring the placement hints, as we are already given the range
>>>> to allocate.
>>>>
>>>> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
>>>> simply create yet another GFP mask whereby we ignore the reclaim flags
>>>> specify by the caller. That looks very inconsistent.
>>>>
>>>> Let's clean it up, constructing the gfp flags used for
>>>> compaction/migration exactly once. Update the documentation of the
>>>> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
>>>>
>>>> Acked-by: Zi Yan <ziy@nvidia.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>
>>> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>>>
>>>> +	/*
>>>> +	 * Flags to control page compaction/migration/reclaim, to free up our
>>>> +	 * page range. Migratable pages are movable, __GFP_MOVABLE is implied
>>>> +	 * for them.
>>>> +	 *
>>>> +	 * Traditionally we always had __GFP_HARDWALL|__GFP_RETRY_MAYFAIL set,
>>>> +	 * keep doing that to not degrade callers.
>>>> +	 */
>>>
>>> Wonder if we could revisit that eventually. Why limit migration targets by
>>> cpuset via __GFP_HARDWALL if we were not called with __GFP_HARDWALL? And why
>>> weaken the attempts with __GFP_RETRY_MAYFAIL if we didn't specify it?
>>
>> See below.
>>
>>>
>>> Unless I'm missing something, cc->gfp is only checked for __GFP_FS and
>>> __GFP_NOWARN in few places, so it's mostly migration_target_control the
>>> callers could meaningfully influence.
>>
>> Note the fist change in the file, where we now use the mask instead of coming up
>> with another one out of the blue. :)
> 
> I know. What I wanted to say - cc->gfp is on its own only checked in few
> places, but now since we also translate it to migration_target_control's
> gfp_mask, it's mostly that part the caller might influence with the passed
> flags. But we still impose own additions to it, limiting that influence.
> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index ce7589a4ec01..54594cc4f650 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6294,7 +6294,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>>    	int ret = 0;
>>    	struct migration_target_control mtc = {
>>    		.nid = zone_to_nid(cc->zone),
>> -		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>> +		.gfp_mask = cc->gfp_mask,
>>    		.reason = MR_CONTIG_RANGE,
>>    	};
>>
>> GFP_USER contains __GFP_HARDWALL. I am not sure if that matters here, but
> 
> Yeah wonder if GFP_USER was used specifically for that part, or just randomly :)
> 
>> likely the thing we are assuming here is that we are migrating a page, and
>> usually, these are user allocation (except maybe balloon and some other non-lru
>> movable things).
> 
> Yeah and user allocations obey cpuset and mempolicies etc. But these are
> likely somebody elses allocations that were done according to their
> policies. With our migration we might be actually violating those, which
> probably can't be helped (is at least migration within the same node
> preferred? hmm).

I would hope that we handle memory policies somehow (via VMAs? not 
sure). cpuset? I have no idea.

But it doesn't seem to me that our caller's restrictions
> (if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
> somebody else's pages?

It was always set using "GFP_USER | __GFP_MOVABLE | 
__GFP_RETRY_MAYFAIL", and I removed the same flag combination in #2 from 
memory offline code, and we do have the exact same thing in 
do_migrate_range() in mm/memory_hotplug.c.

We should investigate if__GFP_HARDWALL is the right thing to use here, 
and if we can get rid of that by switching to GFP_KERNEL in all these 
places.

I can look into it + send a follow-up patch.

Thanks!

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()
  2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
  2024-12-03 13:31   ` Vlastimil Babka
  2024-12-03 15:30   ` Oscar Salvador
@ 2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vishal Moola @ 2024-12-03 21:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:27AM +0100, David Hildenbrand wrote:
> The flags are no longer used, we can stop passing them to
> isolate_single_pageblock().
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range()
  2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
  2024-12-03 13:32   ` Vlastimil Babka
  2024-12-03 15:32   ` Oscar Salvador
@ 2024-12-03 21:44   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vishal Moola @ 2024-12-03 21:44 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:28AM +0100, David Hildenbrand wrote:
> The parameter is unused, so let's stop passing it.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static
  2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
  2024-12-03 13:33   ` Vlastimil Babka
  2024-12-03 15:33   ` Oscar Salvador
@ 2024-12-03 21:45   ` Vishal Moola
  2 siblings, 0 replies; 33+ messages in thread
From: Vishal Moola @ 2024-12-03 21:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton,
	Oscar Salvador, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:29AM +0100, David Hildenbrand wrote:
> The single user is in page_alloc.c.
> 
> Reviewed-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 19:19         ` David Hildenbrand
@ 2024-12-04  8:54           ` Vlastimil Babka
  2024-12-04  8:59           ` Oscar Salvador
  1 sibling, 0 replies; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-04  8:54 UTC (permalink / raw)
  To: David Hildenbrand, linux-kernel
  Cc: linux-mm, linuxppc-dev, Andrew Morton, Oscar Salvador, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/3/24 20:19, David Hildenbrand wrote:
> On 03.12.24 15:24, Vlastimil Babka wrote:
>> On 12/3/24 15:12, David Hildenbrand wrote:
>>> On 03.12.24 14:55, Vlastimil Babka wrote:
>>> likely the thing we are assuming here is that we are migrating a page, and
>>> usually, these are user allocation (except maybe balloon and some other non-lru
>>> movable things).
>> 
>> Yeah and user allocations obey cpuset and mempolicies etc. But these are
>> likely somebody elses allocations that were done according to their
>> policies. With our migration we might be actually violating those, which
>> probably can't be helped (is at least migration within the same node
>> preferred? hmm).
> 
> I would hope that we handle memory policies somehow (via VMAs? not 
> sure). cpuset? I have no idea.

They are handled when allocating, but then the info is lost, the allocation
doesn't carry its effective nodemask.
But that's really a separate issue that just occured to me.

> But it doesn't seem to me that our caller's restrictions
>> (if those exist, would be enforced by __GFP_HARDWALL) are that relevant for
>> somebody else's pages?
> 
> It was always set using "GFP_USER | __GFP_MOVABLE | 
> __GFP_RETRY_MAYFAIL", and I removed the same flag combination in #2 from 
> memory offline code, and we do have the exact same thing in 
> do_migrate_range() in mm/memory_hotplug.c.

Yeah I agree a refactoring patch shouldn't change the existing behavior...

> We should investigate if__GFP_HARDWALL is the right thing to use here, 
> and if we can get rid of that by switching to GFP_KERNEL in all these 
> places.
> 
> I can look into it + send a follow-up patch.

...but it's a great opportunity to start questioning it and possibly change
it as a follow up :)

Thanks!

> Thanks!
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03 19:19         ` David Hildenbrand
  2024-12-04  8:54           ` Vlastimil Babka
@ 2024-12-04  8:59           ` Oscar Salvador
  2024-12-04  9:03             ` Vlastimil Babka
  1 sibling, 1 reply; 33+ messages in thread
From: Oscar Salvador @ 2024-12-04  8:59 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Vlastimil Babka, linux-kernel, linux-mm, linuxppc-dev,
	Andrew Morton, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
> It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
> and I removed the same flag combination in #2 from memory offline code, and
> we do have the exact same thing in do_migrate_range() in
> mm/memory_hotplug.c.
> 
> We should investigate if__GFP_HARDWALL is the right thing to use here, and
> if we can get rid of that by switching to GFP_KERNEL in all these places.

Why would not we want __GFP_HARDWALL set?
Without it, we could potentially migrate the page to a node which is not
part of the cpuset of the task that originally allocated it, thus violating the
policy? Is not that a problem?
 

-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-03  9:47 ` [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess David Hildenbrand
  2024-12-03 13:55   ` Vlastimil Babka
@ 2024-12-04  9:00   ` Oscar Salvador
  1 sibling, 0 replies; 33+ messages in thread
From: Oscar Salvador @ 2024-12-04  9:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:30AM +0100, David Hildenbrand wrote:
> It's all a bit complicated for alloc_contig_range(). For example, we don't
> support many flags, so let's start bailing out on unsupported
> ones -- ignoring the placement hints, as we are already given the range
> to allocate.
> 
> While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we
> simply create yet another GFP mask whereby we ignore the reclaim flags
> specify by the caller. That looks very inconsistent.
> 
> Let's clean it up, constructing the gfp flags used for
> compaction/migration exactly once. Update the documentation of the
> gfp_mask parameter for alloc_contig_range() and alloc_contig_pages().
> 
> Acked-by: Zi Yan <ziy@nvidia.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook()
  2024-12-03  9:47 ` [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() David Hildenbrand
  2024-12-03 14:36   ` Vlastimil Babka
@ 2024-12-04  9:03   ` Oscar Salvador
  1 sibling, 0 replies; 33+ messages in thread
From: Oscar Salvador @ 2024-12-04  9:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On Tue, Dec 03, 2024 at 10:47:31AM +0100, David Hildenbrand wrote:
> In the __GFP_COMP case, we already pass the gfp_flags to
> prep_new_page()->post_alloc_hook(). However, in the !__GFP_COMP case, we
> essentially pass only hardcoded __GFP_MOVABLE to post_alloc_hook(),
> preventing some action modifiers from being effective..
> 
> Let's pass our now properly adjusted gfp flags there as well.
> 
> This way, we can now support __GFP_ZERO for alloc_contig_*().
> 
> As a side effect, we now also support __GFP_SKIP_ZERO and__GFP_ZEROTAGS;
> but we'll keep the more special stuff (KASAN, NOLOCKDEP) disabled for
> now.
> 
> It's worth noting that with __GFP_ZERO, we might unnecessarily zero pages
> when we have to release part of our range using free_contig_range() again.
> This can be optimized in the future, if ever required; the caller we'll
> be converting (powernv/memtrace) next won't trigger this.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Oscar Salvador <osalvador@suse.de>


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-04  8:59           ` Oscar Salvador
@ 2024-12-04  9:03             ` Vlastimil Babka
  2024-12-04  9:15               ` Oscar Salvador
  0 siblings, 1 reply; 33+ messages in thread
From: Vlastimil Babka @ 2024-12-04  9:03 UTC (permalink / raw)
  To: Oscar Salvador, David Hildenbrand
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 12/4/24 09:59, Oscar Salvador wrote:
> On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
>> It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
>> and I removed the same flag combination in #2 from memory offline code, and
>> we do have the exact same thing in do_migrate_range() in
>> mm/memory_hotplug.c.
>> 
>> We should investigate if__GFP_HARDWALL is the right thing to use here, and
>> if we can get rid of that by switching to GFP_KERNEL in all these places.
> 
> Why would not we want __GFP_HARDWALL set?
> Without it, we could potentially migrate the page to a node which is not
> part of the cpuset of the task that originally allocated it, thus violating the
> policy? Is not that a problem?

The task doing the alloc_contig_range() will likely not be the same task as
the one that originally allocated the page, so its policy would be
different, no? So even with __GFP_HARDWALL we might be already migrating
outside the original tasks's constraint? Am I missing something?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-04  9:03             ` Vlastimil Babka
@ 2024-12-04  9:15               ` Oscar Salvador
  2024-12-04  9:28                 ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: Oscar Salvador @ 2024-12-04  9:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Hildenbrand, linux-kernel, linux-mm, linuxppc-dev,
	Andrew Morton, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Wed, Dec 04, 2024 at 10:03:28AM +0100, Vlastimil Babka wrote:
> On 12/4/24 09:59, Oscar Salvador wrote:
> > On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
> >> It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
> >> and I removed the same flag combination in #2 from memory offline code, and
> >> we do have the exact same thing in do_migrate_range() in
> >> mm/memory_hotplug.c.
> >> 
> >> We should investigate if__GFP_HARDWALL is the right thing to use here, and
> >> if we can get rid of that by switching to GFP_KERNEL in all these places.
> > 
> > Why would not we want __GFP_HARDWALL set?
> > Without it, we could potentially migrate the page to a node which is not
> > part of the cpuset of the task that originally allocated it, thus violating the
> > policy? Is not that a problem?
> 
> The task doing the alloc_contig_range() will likely not be the same task as
> the one that originally allocated the page, so its policy would be
> different, no? So even with __GFP_HARDWALL we might be already migrating
> outside the original tasks's constraint? Am I missing something?

Yes, that is right, I thought we derive the policy from the old page
somehow when migrating it, but reading the code does not seem to be the
case.

Looking at prepare_alloc_pages(), if !ac->nodemask, which would be the
case here, we would get the policy from the current task
(alloc_contig_range()) when cpusets are enabled.

So yes, I am a bit puzzled why __GFP_HARDWALL was chosen in the first
place.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-04  9:15               ` Oscar Salvador
@ 2024-12-04  9:28                 ` David Hildenbrand
  2024-12-04 10:04                   ` Oscar Salvador
  0 siblings, 1 reply; 33+ messages in thread
From: David Hildenbrand @ 2024-12-04  9:28 UTC (permalink / raw)
  To: Oscar Salvador, Vlastimil Babka
  Cc: linux-kernel, linux-mm, linuxppc-dev, Andrew Morton, Zi Yan,
	Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
	Madhavan Srinivasan

On 04.12.24 10:15, Oscar Salvador wrote:
> On Wed, Dec 04, 2024 at 10:03:28AM +0100, Vlastimil Babka wrote:
>> On 12/4/24 09:59, Oscar Salvador wrote:
>>> On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
>>>> It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
>>>> and I removed the same flag combination in #2 from memory offline code, and
>>>> we do have the exact same thing in do_migrate_range() in
>>>> mm/memory_hotplug.c.
>>>>
>>>> We should investigate if__GFP_HARDWALL is the right thing to use here, and
>>>> if we can get rid of that by switching to GFP_KERNEL in all these places.
>>>
>>> Why would not we want __GFP_HARDWALL set?
>>> Without it, we could potentially migrate the page to a node which is not
>>> part of the cpuset of the task that originally allocated it, thus violating the
>>> policy? Is not that a problem?
>>
>> The task doing the alloc_contig_range() will likely not be the same task as
>> the one that originally allocated the page, so its policy would be
>> different, no? So even with __GFP_HARDWALL we might be already migrating
>> outside the original tasks's constraint? Am I missing something?
> 
> Yes, that is right, I thought we derive the policy from the old page
> somehow when migrating it, but reading the code does not seem to be the
> case.
> 
> Looking at prepare_alloc_pages(), if !ac->nodemask, which would be the
> case here, we would get the policy from the current task
> (alloc_contig_range()) when cpusets are enabled.
> 
> So yes, I am a bit puzzled why __GFP_HARDWALL was chosen in the first
> place.

I suspect because "GFP_USER" felt like the appropriate thing to do.

Before:

commit f90b1d2f1aaaa40c6519a32e69615edc25bb97d5
Author: Paul Jackson <pj@sgi.com>
Date:   Tue Sep 6 15:18:10 2005 -0700

     [PATCH] cpusets: new __GFP_HARDWALL flag

     Add another GFP flag: __GFP_HARDWALL.


GFP_USER  and GFP_KERNEL were the same thing. But memory 
offlining/alloc_contig were added later.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-04  9:28                 ` David Hildenbrand
@ 2024-12-04 10:04                   ` Oscar Salvador
  2024-12-04 11:05                     ` David Hildenbrand
  0 siblings, 1 reply; 33+ messages in thread
From: Oscar Salvador @ 2024-12-04 10:04 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Vlastimil Babka, linux-kernel, linux-mm, linuxppc-dev,
	Andrew Morton, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On Wed, Dec 04, 2024 at 10:28:39AM +0100, David Hildenbrand wrote:
> On 04.12.24 10:15, Oscar Salvador wrote:
> > On Wed, Dec 04, 2024 at 10:03:28AM +0100, Vlastimil Babka wrote:
> > > On 12/4/24 09:59, Oscar Salvador wrote:
> > > > On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
> > > > > It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
> > > > > and I removed the same flag combination in #2 from memory offline code, and
> > > > > we do have the exact same thing in do_migrate_range() in
> > > > > mm/memory_hotplug.c.
> > > > > 
> > > > > We should investigate if__GFP_HARDWALL is the right thing to use here, and
> > > > > if we can get rid of that by switching to GFP_KERNEL in all these places.
> > > > 
> > > > Why would not we want __GFP_HARDWALL set?
> > > > Without it, we could potentially migrate the page to a node which is not
> > > > part of the cpuset of the task that originally allocated it, thus violating the
> > > > policy? Is not that a problem?
> > > 
> > > The task doing the alloc_contig_range() will likely not be the same task as
> > > the one that originally allocated the page, so its policy would be
> > > different, no? So even with __GFP_HARDWALL we might be already migrating
> > > outside the original tasks's constraint? Am I missing something?
> > 
> > Yes, that is right, I thought we derive the policy from the old page
> > somehow when migrating it, but reading the code does not seem to be the
> > case.
> > 
> > Looking at prepare_alloc_pages(), if !ac->nodemask, which would be the
> > case here, we would get the policy from the current task
> > (alloc_contig_range()) when cpusets are enabled.
> > 
> > So yes, I am a bit puzzled why __GFP_HARDWALL was chosen in the first
> > place.
> 
> I suspect because "GFP_USER" felt like the appropriate thing to do.

Looking back at when the whole contiguous allocator patchset was posted,
it seems that it kinda copied what memory-offline code was doing, which
was migrating pages with GFP_HIGHUSER_MOVABLE (hotremove_migrate_alloc()).

Then, the HIGHMEM modifier was dropped due to HIGHMEM restrictions on
some systems, ending up with GFP_USER.


-- 
Oscar Salvador
SUSE Labs

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess
  2024-12-04 10:04                   ` Oscar Salvador
@ 2024-12-04 11:05                     ` David Hildenbrand
  0 siblings, 0 replies; 33+ messages in thread
From: David Hildenbrand @ 2024-12-04 11:05 UTC (permalink / raw)
  To: Oscar Salvador
  Cc: Vlastimil Babka, linux-kernel, linux-mm, linuxppc-dev,
	Andrew Morton, Zi Yan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Naveen N Rao, Madhavan Srinivasan

On 04.12.24 11:04, Oscar Salvador wrote:
> On Wed, Dec 04, 2024 at 10:28:39AM +0100, David Hildenbrand wrote:
>> On 04.12.24 10:15, Oscar Salvador wrote:
>>> On Wed, Dec 04, 2024 at 10:03:28AM +0100, Vlastimil Babka wrote:
>>>> On 12/4/24 09:59, Oscar Salvador wrote:
>>>>> On Tue, Dec 03, 2024 at 08:19:02PM +0100, David Hildenbrand wrote:
>>>>>> It was always set using "GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL",
>>>>>> and I removed the same flag combination in #2 from memory offline code, and
>>>>>> we do have the exact same thing in do_migrate_range() in
>>>>>> mm/memory_hotplug.c.
>>>>>>
>>>>>> We should investigate if__GFP_HARDWALL is the right thing to use here, and
>>>>>> if we can get rid of that by switching to GFP_KERNEL in all these places.
>>>>>
>>>>> Why would not we want __GFP_HARDWALL set?
>>>>> Without it, we could potentially migrate the page to a node which is not
>>>>> part of the cpuset of the task that originally allocated it, thus violating the
>>>>> policy? Is not that a problem?
>>>>
>>>> The task doing the alloc_contig_range() will likely not be the same task as
>>>> the one that originally allocated the page, so its policy would be
>>>> different, no? So even with __GFP_HARDWALL we might be already migrating
>>>> outside the original tasks's constraint? Am I missing something?
>>>
>>> Yes, that is right, I thought we derive the policy from the old page
>>> somehow when migrating it, but reading the code does not seem to be the
>>> case.
>>>
>>> Looking at prepare_alloc_pages(), if !ac->nodemask, which would be the
>>> case here, we would get the policy from the current task
>>> (alloc_contig_range()) when cpusets are enabled.
>>>
>>> So yes, I am a bit puzzled why __GFP_HARDWALL was chosen in the first
>>> place.
>>
>> I suspect because "GFP_USER" felt like the appropriate thing to do.
> 
> Looking back at when the whole contiguous allocator patchset was posted,
> it seems that it kinda copied what memory-offline code was doing, which
> was migrating pages with GFP_HIGHUSER_MOVABLE (hotremove_migrate_alloc()).
> 
> Then, the HIGHMEM modifier was dropped due to HIGHMEM restrictions on
> some systems, ending up with GFP_USER.

Looking at some other migration_target_control users, some of them also 
shouldn't be setting GFP_USER->HARDWALL either I think. Essentially, 
whenever we are migrating a page that is not guaranteed to be "ours" in 
the context of the caller.

mm/damon/paddr.c:__damon_pa_migrate_folio_list() for example, which 
obtained the addresses by scanning a chunk of physical address space.

For others it's less clear: soft_offline_in_use_page() may be called 
either using madvise() from process context, but also using sysfs using 
a PFN.


-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2024-12-04 11:05 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-03  9:47 [PATCH RESEND v2 0/6] mm/page_alloc: gfp flags cleanups for alloc_contig_*() David Hildenbrand
2024-12-03  9:47 ` [PATCH RESEND v2 1/6] mm/page_isolation: don't pass gfp flags to isolate_single_pageblock() David Hildenbrand
2024-12-03 13:31   ` Vlastimil Babka
2024-12-03 15:30   ` Oscar Salvador
2024-12-03 21:44   ` Vishal Moola
2024-12-03  9:47 ` [PATCH RESEND v2 2/6] mm/page_isolation: don't pass gfp flags to start_isolate_page_range() David Hildenbrand
2024-12-03 13:32   ` Vlastimil Babka
2024-12-03 15:32   ` Oscar Salvador
2024-12-03 21:44   ` Vishal Moola
2024-12-03  9:47 ` [PATCH RESEND v2 3/6] mm/page_alloc: make __alloc_contig_migrate_range() static David Hildenbrand
2024-12-03 13:33   ` Vlastimil Babka
2024-12-03 15:33   ` Oscar Salvador
2024-12-03 21:45   ` Vishal Moola
2024-12-03  9:47 ` [PATCH RESEND v2 4/6] mm/page_alloc: sort out the alloc_contig_range() gfp flags mess David Hildenbrand
2024-12-03 13:55   ` Vlastimil Babka
2024-12-03 14:12     ` David Hildenbrand
2024-12-03 14:24       ` Vlastimil Babka
2024-12-03 15:49         ` Zi Yan
2024-12-03 19:07           ` David Hildenbrand
2024-12-03 19:19         ` David Hildenbrand
2024-12-04  8:54           ` Vlastimil Babka
2024-12-04  8:59           ` Oscar Salvador
2024-12-04  9:03             ` Vlastimil Babka
2024-12-04  9:15               ` Oscar Salvador
2024-12-04  9:28                 ` David Hildenbrand
2024-12-04 10:04                   ` Oscar Salvador
2024-12-04 11:05                     ` David Hildenbrand
2024-12-04  9:00   ` Oscar Salvador
2024-12-03  9:47 ` [PATCH RESEND v2 5/6] mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook() David Hildenbrand
2024-12-03 14:36   ` Vlastimil Babka
2024-12-04  9:03   ` Oscar Salvador
2024-12-03  9:47 ` [PATCH RESEND v2 6/6] powernv/memtrace: use __GFP_ZERO with alloc_contig_pages() David Hildenbrand
2024-12-03 14:39   ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).