All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Mike Rapoport <rppt@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator
Date: Wed, 1 Jul 2026 16:57:59 -0400	[thread overview]
Message-ID: <akV_VwtcvUN7Mq16@cmpxchg.org> (raw)
In-Reply-To: <c11c73ba-4422-4ef1-834d-6cc230146dea@kernel.org>

On Wed, Jul 01, 2026 at 08:02:55PM +0200, Vlastimil Babka (SUSE) wrote:
> On 6/26/26 20:21, Johannes Weiner wrote:
> > The compaction capturing code assumes the allocation request order
> > and compaction target order are the same. That won't be true once
> > defrag_mode promotes sub-block allocations to pageblock-order
> > compaction: compaction targets the larger order, capture should
> > remain at the original allocation order.
> 
> Well I guess you could also try to capture the whole-pageblock page and then
> deal with it as with whole pageblock stealing? But this works too and
> perhaps it's simpler.

You mean capture the whole block in *page and then break off the chunk
we need? That's pretty far upstack. We don't hold the zone->lock at
that point to expand()... I think this is indeed a bit simpler.

> > @@ -2808,35 +2808,22 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
> >  		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
> >  		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
> >  	};
> > -	struct capture_control capc = {
> > -		.cc = &cc,
> > -		.page = NULL,
> > -	};
> >  
> > -	/*
> > -	 * Make sure the structs are really initialized before we expose the
> > -	 * capture control, in case we are interrupted and the interrupt handler
> > -	 * frees a page.
> > -	 */
> > +	/* See the comment in __alloc_pages_direct_compact() */
> >  	barrier();
> > -	WRITE_ONCE(current->capture_control, &capc);
> > +	WRITE_ONCE(capc->cc, &cc);
> >  
> > -	ret = compact_zone(&cc, &capc);
> > +	ret = compact_zone(&cc, capc);
> > +
> > +	WRITE_ONCE(capc->cc, NULL);
> 
> I wonder if it makes sense to continue having capc->cc and this whole dance
> in two functions. AFAICS (after patch 4/4) we access only capc->cc->zone and
> capc->cc->migratetype. migratetype is stable in the whole
> try_to_compact_pages(), could be part of capc. Order can be added by this
> patch (with no semantic change to it) and not the next one. Zone varies, but
> could be also in capc and set by try_to_compact_pages() before every call to
> compact_zone_order(). Then compact_zone_order() doesn't have to set up any
> capc fields anymore?

You mean something like this?

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index f29ef0653546..66a2f70e9e01 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,7 @@ enum compact_result {
 };
 
 struct alloc_context; /* in mm/internal.h */
+struct capture_control; /* in mm/internal.h */
 
 /*
  * Number of free order-0 pages that should be available above given watermark
@@ -92,7 +93,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio,
-		struct page **page);
+		struct capture_control *capc);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
 				unsigned long watermark, int highest_zoneidx);
diff --git a/mm/compaction.c b/mm/compaction.c
index 7df3a85d43af..f37c130bc5cc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2791,7 +2791,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
 		unsigned int alloc_flags, int highest_zoneidx,
-		struct page **capture)
+		struct capture_control *capc)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2808,10 +2808,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
-	struct capture_control capc = {
-		.cc = &cc,
-		.page = NULL,
-	};
 
 	/*
 	 * Make sure the structs are really initialized before we expose the
@@ -2819,9 +2815,9 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * frees a page.
 	 */
 	barrier();
-	WRITE_ONCE(current->capture_control, &capc);
+	WRITE_ONCE(current->capture_control, capc);
 
-	ret = compact_zone(&cc, &capc);
+	ret = compact_zone(&cc, capc);
 
 	/*
 	 * Make sure we hide capture control first before we read the captured
@@ -2829,14 +2825,14 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * and we would leak it.
 	 */
 	WRITE_ONCE(current->capture_control, NULL);
-	*capture = READ_ONCE(capc.page);
+
 	/*
 	 * Technically, it is also possible that compaction is skipped but
 	 * the page is still captured out of luck(IRQ came and freed the page).
 	 * Returning COMPACT_SUCCESS in such cases helps in properly accounting
 	 * the COMPACT[STALL|FAIL] when compaction is skipped.
 	 */
-	if (*capture)
+	if (READ_ONCE(capc->page))
 		ret = COMPACT_SUCCESS;
 
 	return ret;
@@ -2849,13 +2845,13 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
  * @alloc_flags: The allocation flags of the current allocation
  * @ac: The context of current allocation
  * @prio: Determines how hard direct compaction should try to succeed
- * @capture: Pointer to free page created by compaction will be stored here
+ * @capc: Free page capture bypassing the freelist
  *
  * This is the main entry point for direct page compaction.
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio, struct page **capture)
+		enum compact_priority prio, struct capture_control *capc)
 {
 	struct zoneref *z;
 	struct zone *zone;
@@ -2882,8 +2878,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		capc->zone = zone;
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-				alloc_flags, ac->highest_zoneidx, capture);
+				alloc_flags, ac->highest_zoneidx, capc);
 		rc = max(status, rc);
 
 		/* The allocation should succeed, stop compacting */
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..6d3402001b93 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1059,7 +1059,9 @@ struct compact_control {
  * immediately when one is created during the free path.
  */
 struct capture_control {
-	struct compact_control *cc;
+	struct zone *zone;
+	int migratetype;
+	int order;
 	struct page *page;
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb422505c6ef..a2cceaaaccb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -721,14 +721,14 @@ static inline struct capture_control *task_capc(struct zone *zone)
 	return unlikely(capc) &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
-		capc->cc->zone == zone ? capc : NULL;
+		capc->zone == zone ? capc : NULL;
 }
 
 static inline bool
 compaction_capture(struct capture_control *capc, struct page *page,
 		   int order, int migratetype)
 {
-	if (!capc || order != capc->cc->order)
+	if (!capc || order != capc->order)
 		return false;
 
 	/* Do not accidentally pollute CMA or isolated regions*/
@@ -744,12 +744,12 @@ compaction_capture(struct capture_control *capc, struct page *page,
 	 * have trouble finding a high-order free page.
 	 */
 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE &&
-	    capc->cc->migratetype != MIGRATE_MOVABLE)
+	    capc->migratetype != MIGRATE_MOVABLE)
 		return false;
 
-	if (migratetype != capc->cc->migratetype)
-		trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
-					    capc->cc->migratetype, migratetype);
+	if (migratetype != capc->migratetype)
+		trace_mm_page_alloc_extfrag(page, capc->order, order,
+					    capc->migratetype, migratetype);
 
 	capc->page = page;
 	return true;
@@ -4146,6 +4146,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	unsigned long pflags;
 	unsigned int noreclaim_flag;
+	struct capture_control capc = {
+		.zone = NULL,
+		.migratetype = ac->migratetype,
+		.order = order,
+		.page = NULL,
+	};
 
 	if (!order)
 		return NULL;
@@ -4156,13 +4162,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-								prio, &page);
+							       prio, &capc);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
 	psi_memstall_leave(&pflags);
 	delayacct_compact_end();
 
+	page = READ_ONCE(capc.page);
+
 	if (*compact_result == COMPACT_SKIPPED ||
 	    *compact_result == COMPACT_DEFERRED)
 		return NULL;

That could work as well, but let me know if that's what you thought.

The only thing I don't like so much is that it leaks that
READ_ONCE(capc.page) out of the function where we have those nice IRQ
preemption comments and the current->capture_control barriering.

> > @@ -718,7 +718,7 @@ static inline struct capture_control *task_capc(struct zone *zone)
> >  {
> >  	struct capture_control *capc = current->capture_control;
> >  
> > -	return unlikely(capc) &&
> > +	return unlikely(capc && capc->cc) &&
> >  		!(current->flags & PF_KTHREAD) &&
> >  		!capc->page &&
> >  		capc->cc->zone == zone ? capc : NULL;
> > @@ -4146,23 +4146,42 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> >  	struct page *page = NULL;
> >  	unsigned long pflags;
> >  	unsigned int noreclaim_flag;
> > +	struct capture_control capc = {
> > +		.page = NULL,
> 
> You didn't set .cc to NULL explicitly...
> 
> > +	};
> >  
> >  	if (!order)
> >  		return NULL;
> >  
> > +	/*
> > +	 * Make sure the structs are really initialized before we expose the
> > +	 * capture control, in case we are interrupted and the interrupt handler
> > +	 * frees a page.
> > +	 */
> > +	barrier();
> 
> So either an implicit { } NULL / zero initialization + barrier() is enough
> (I hope so) and we don't need to set NULL / zero in every field explicitly.
> Or not and then we should set every field and not just page.

That partial capc = { .page = NULL, } will 0 the other fields. I don't
see a correctness issue. Or were you talking about readability?

Thanks for taking a look!

  reply	other threads:[~2026-07-01 20:58 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-26 18:21 [PATCH 0/4] mm: fix reclaim storms in defrag_mode Johannes Weiner
2026-06-26 18:21 ` [PATCH 1/4] mm: page_alloc: __GFP_FS lockdep annotation for direct compaction Johannes Weiner
2026-07-01 13:45   ` Vlastimil Babka (SUSE)
2026-06-26 18:21 ` [PATCH 2/4] mm: compaction: support non-movable compaction for pageblock requests Johannes Weiner
2026-07-01 14:19   ` Vlastimil Babka (SUSE)
2026-07-01 15:28     ` Johannes Weiner
2026-07-01 18:14       ` Vlastimil Babka (SUSE)
2026-07-01 21:11         ` Johannes Weiner
2026-06-26 18:21 ` [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator Johannes Weiner
2026-07-01 18:02   ` Vlastimil Babka (SUSE)
2026-07-01 20:57     ` Johannes Weiner [this message]
2026-06-26 18:21 ` [PATCH 4/4] mm: page_alloc: fix non-movable reclaim storm in defrag_mode Johannes Weiner
2026-06-26 18:29   ` Zi Yan
2026-06-26 18:43     ` Johannes Weiner
2026-07-01 18:06   ` Vlastimil Babka (SUSE)
2026-07-01 21:02     ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akV_VwtcvUN7Mq16@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=jackmanb@google.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.