The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Mike Rapoport <rppt@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator
Date: Wed, 1 Jul 2026 16:57:59 -0400	[thread overview]
Message-ID: <akV_VwtcvUN7Mq16@cmpxchg.org> (raw)
In-Reply-To: <c11c73ba-4422-4ef1-834d-6cc230146dea@kernel.org>

On Wed, Jul 01, 2026 at 08:02:55PM +0200, Vlastimil Babka (SUSE) wrote:
> On 6/26/26 20:21, Johannes Weiner wrote:
> > The compaction capturing code assumes the allocation request order
> > and compaction target order are the same. That won't be true once
> > defrag_mode promotes sub-block allocations to pageblock-order
> > compaction: compaction targets the larger order, capture should
> > remain at the original allocation order.
> 
> Well I guess you could also try to capture the whole-pageblock page and then
> deal with it as with whole pageblock stealing? But this works too and
> perhaps it's simpler.

You mean capture the whole block in *page and then break off the chunk
we need? That's pretty far upstack. We don't hold the zone->lock at
that point to expand()... I think this is indeed a bit simpler.

> > @@ -2808,35 +2808,22 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
> >  		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
> >  		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
> >  	};
> > -	struct capture_control capc = {
> > -		.cc = &cc,
> > -		.page = NULL,
> > -	};
> >  
> > -	/*
> > -	 * Make sure the structs are really initialized before we expose the
> > -	 * capture control, in case we are interrupted and the interrupt handler
> > -	 * frees a page.
> > -	 */
> > +	/* See the comment in __alloc_pages_direct_compact() */
> >  	barrier();
> > -	WRITE_ONCE(current->capture_control, &capc);
> > +	WRITE_ONCE(capc->cc, &cc);
> >  
> > -	ret = compact_zone(&cc, &capc);
> > +	ret = compact_zone(&cc, capc);
> > +
> > +	WRITE_ONCE(capc->cc, NULL);
> 
> I wonder if it makes sense to continue having capc->cc and this whole dance
> in two functions. AFAICS (after patch 4/4) we access only capc->cc->zone and
> capc->cc->migratetype. migratetype is stable in the whole
> try_to_compact_pages(), could be part of capc. Order can be added by this
> patch (with no semantic change to it) and not the next one. Zone varies, but
> could be also in capc and set by try_to_compact_pages() before every call to
> compact_zone_order(). Then compact_zone_order() doesn't have to set up any
> capc fields anymore?

You mean something like this?

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index f29ef0653546..66a2f70e9e01 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,7 @@ enum compact_result {
 };
 
 struct alloc_context; /* in mm/internal.h */
+struct capture_control; /* in mm/internal.h */
 
 /*
  * Number of free order-0 pages that should be available above given watermark
@@ -92,7 +93,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio,
-		struct page **page);
+		struct capture_control *capc);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
 				unsigned long watermark, int highest_zoneidx);
diff --git a/mm/compaction.c b/mm/compaction.c
index 7df3a85d43af..f37c130bc5cc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2791,7 +2791,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
 		unsigned int alloc_flags, int highest_zoneidx,
-		struct page **capture)
+		struct capture_control *capc)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2808,10 +2808,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
-	struct capture_control capc = {
-		.cc = &cc,
-		.page = NULL,
-	};
 
 	/*
 	 * Make sure the structs are really initialized before we expose the
@@ -2819,9 +2815,9 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * frees a page.
 	 */
 	barrier();
-	WRITE_ONCE(current->capture_control, &capc);
+	WRITE_ONCE(current->capture_control, capc);
 
-	ret = compact_zone(&cc, &capc);
+	ret = compact_zone(&cc, capc);
 
 	/*
 	 * Make sure we hide capture control first before we read the captured
@@ -2829,14 +2825,14 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * and we would leak it.
 	 */
 	WRITE_ONCE(current->capture_control, NULL);
-	*capture = READ_ONCE(capc.page);
+
 	/*
 	 * Technically, it is also possible that compaction is skipped but
 	 * the page is still captured out of luck(IRQ came and freed the page).
 	 * Returning COMPACT_SUCCESS in such cases helps in properly accounting
 	 * the COMPACT[STALL|FAIL] when compaction is skipped.
 	 */
-	if (*capture)
+	if (READ_ONCE(capc->page))
 		ret = COMPACT_SUCCESS;
 
 	return ret;
@@ -2849,13 +2845,13 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
  * @alloc_flags: The allocation flags of the current allocation
  * @ac: The context of current allocation
  * @prio: Determines how hard direct compaction should try to succeed
- * @capture: Pointer to free page created by compaction will be stored here
+ * @capc: Free page capture bypassing the freelist
  *
  * This is the main entry point for direct page compaction.
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio, struct page **capture)
+		enum compact_priority prio, struct capture_control *capc)
 {
 	struct zoneref *z;
 	struct zone *zone;
@@ -2882,8 +2878,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		capc->zone = zone;
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-				alloc_flags, ac->highest_zoneidx, capture);
+				alloc_flags, ac->highest_zoneidx, capc);
 		rc = max(status, rc);
 
 		/* The allocation should succeed, stop compacting */
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..6d3402001b93 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1059,7 +1059,9 @@ struct compact_control {
  * immediately when one is created during the free path.
  */
 struct capture_control {
-	struct compact_control *cc;
+	struct zone *zone;
+	int migratetype;
+	int order;
 	struct page *page;
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb422505c6ef..a2cceaaaccb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -721,14 +721,14 @@ static inline struct capture_control *task_capc(struct zone *zone)
 	return unlikely(capc) &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
-		capc->cc->zone == zone ? capc : NULL;
+		capc->zone == zone ? capc : NULL;
 }
 
 static inline bool
 compaction_capture(struct capture_control *capc, struct page *page,
 		   int order, int migratetype)
 {
-	if (!capc || order != capc->cc->order)
+	if (!capc || order != capc->order)
 		return false;
 
 	/* Do not accidentally pollute CMA or isolated regions*/
@@ -744,12 +744,12 @@ compaction_capture(struct capture_control *capc, struct page *page,
 	 * have trouble finding a high-order free page.
 	 */
 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE &&
-	    capc->cc->migratetype != MIGRATE_MOVABLE)
+	    capc->migratetype != MIGRATE_MOVABLE)
 		return false;
 
-	if (migratetype != capc->cc->migratetype)
-		trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
-					    capc->cc->migratetype, migratetype);
+	if (migratetype != capc->migratetype)
+		trace_mm_page_alloc_extfrag(page, capc->order, order,
+					    capc->migratetype, migratetype);
 
 	capc->page = page;
 	return true;
@@ -4146,6 +4146,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	unsigned long pflags;
 	unsigned int noreclaim_flag;
+	struct capture_control capc = {
+		.zone = NULL,
+		.migratetype = ac->migratetype,
+		.order = order,
+		.page = NULL,
+	};
 
 	if (!order)
 		return NULL;
@@ -4156,13 +4162,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-								prio, &page);
+							       prio, &capc);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
 	psi_memstall_leave(&pflags);
 	delayacct_compact_end();
 
+	page = READ_ONCE(capc.page);
+
 	if (*compact_result == COMPACT_SKIPPED ||
 	    *compact_result == COMPACT_DEFERRED)
 		return NULL;

That could work as well, but let me know if that's what you thought.

The only thing I don't like so much is that it leaks that
READ_ONCE(capc.page) out of the function where we have those nice IRQ
preemption comments and the current->capture_control barriering.

> > @@ -718,7 +718,7 @@ static inline struct capture_control *task_capc(struct zone *zone)
> >  {
> >  	struct capture_control *capc = current->capture_control;
> >  
> > -	return unlikely(capc) &&
> > +	return unlikely(capc && capc->cc) &&
> >  		!(current->flags & PF_KTHREAD) &&
> >  		!capc->page &&
> >  		capc->cc->zone == zone ? capc : NULL;
> > @@ -4146,23 +4146,42 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> >  	struct page *page = NULL;
> >  	unsigned long pflags;
> >  	unsigned int noreclaim_flag;
> > +	struct capture_control capc = {
> > +		.page = NULL,
> 
> You didn't set .cc to NULL explicitly...
> 
> > +	};
> >  
> >  	if (!order)
> >  		return NULL;
> >  
> > +	/*
> > +	 * Make sure the structs are really initialized before we expose the
> > +	 * capture control, in case we are interrupted and the interrupt handler
> > +	 * frees a page.
> > +	 */
> > +	barrier();
> 
> So either an implicit { } NULL / zero initialization + barrier() is enough
> (I hope so) and we don't need to set NULL / zero in every field explicitly.
> Or not and then we should set every field and not just page.

That partial capc = { .page = NULL, } will 0 the other fields. I don't
see a correctness issue. Or were you talking about readability?

Thanks for taking a look!

  reply	other threads:[~2026-07-01 20:58 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-26 18:21 [PATCH 0/4] mm: fix reclaim storms in defrag_mode Johannes Weiner
2026-06-26 18:21 ` [PATCH 1/4] mm: page_alloc: __GFP_FS lockdep annotation for direct compaction Johannes Weiner
2026-07-01 13:45   ` Vlastimil Babka (SUSE)
2026-07-02  0:39   ` Shakeel Butt
2026-06-26 18:21 ` [PATCH 2/4] mm: compaction: support non-movable compaction for pageblock requests Johannes Weiner
2026-07-01 14:19   ` Vlastimil Babka (SUSE)
2026-07-01 15:28     ` Johannes Weiner
2026-07-01 18:14       ` Vlastimil Babka (SUSE)
2026-07-01 21:11         ` Johannes Weiner
2026-07-02  9:28           ` Vlastimil Babka (SUSE)
2026-07-02  1:47   ` Zi Yan
2026-06-26 18:21 ` [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator Johannes Weiner
2026-07-01 18:02   ` Vlastimil Babka (SUSE)
2026-07-01 20:57     ` Johannes Weiner [this message]
2026-07-02 10:20       ` Vlastimil Babka (SUSE)
2026-06-26 18:21 ` [PATCH 4/4] mm: page_alloc: fix non-movable reclaim storm in defrag_mode Johannes Weiner
2026-06-26 18:29   ` Zi Yan
2026-06-26 18:43     ` Johannes Weiner
2026-07-01 18:06   ` Vlastimil Babka (SUSE)
2026-07-01 21:02     ` Johannes Weiner
2026-07-02  0:31 ` [PATCH 0/4] mm: fix reclaim storms " Shakeel Butt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=akV_VwtcvUN7Mq16@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@kernel.org \
    --cc=jackmanb@google.com \
    --cc=liam@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ljs@kernel.org \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox