From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07793347514
	for <linux-kernel@vger.kernel.org>; Wed,  1 Jul 2026 20:58:05 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.170
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1782939488; cv=none; b=NOEFkvMg/gSi+gpDyC+EWMEaewulvcSd848eSzLZuYBcGAglsp4mtmXT/gJ9dZgNh4va/FFUWPsj9WxXmCp3Bw9PmslTFdn0zrf4Xl7KdaZT5IBGoK0kUlJmHcjdgHMLLMNSDl55ilfP2Vb80In86pCRnkkgVIHtLKwYBIBxQvU=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1782939488; c=relaxed/simple;
	bh=xve+mOvjd+BvgjOSFLd2Ks3l0XGwzj+Xnq91l3VGPVc=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=HBJswfCJnPYuV/QGj5uu1sM015S/4Qzs4Nu5FB+JW2u2iz9+azMdNyHyH3Uk76t0aMA17P6F6obrYHrFzzreswypFPeTDrHRYTdN+BHJSWoKi8HO8pzCeVR41FDbxjyULwxFoVc/AU019lvd/ul2aoDoArpet3Yc5ywSd3tTKUg=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=isOc6gwG; arc=none smtp.client-ip=209.85.222.170
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="isOc6gwG"
Received: by mail-qk1-f170.google.com with SMTP id af79cd13be357-9217d13c276so63121085a.1
        for <linux-kernel@vger.kernel.org>; Wed, 01 Jul 2026 13:58:05 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=cmpxchg.org; s=google; t=1782939485; x=1783544285; darn=vger.kernel.org;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to;
        bh=1lIXAcoKGevoWcF6ZG0C1AXYxTwWYspPDm0VDN/gH3w=;
        b=isOc6gwGhVRTDa8bO8fIV3c35vudS2zMvFMED7ascylSpYh7L/pcpAwKTbStxwAjim
         6Tmd80hu33LHvbxVf2lJ2yp2XIz7tUKnqXEq99cqx+Blbte0UvNRY/TLPgDaQWBYvYdC
         muGUuoCXtRo70AFRsLi+WMxhTr0tXcvkYGUX8B8N5cv73m6qDTLlsSH4JwDWGVxfk+7U
         7TI4y+3yEToLnfBBWAzpRubXuI4kn1pffHl9scrG9caam5oHNFTT7D/Z6YpSdjQGkGoJ
         W0nnJTn7STSntMXPuojEvBPyoPt6T97paqtMmErsPw9mIJvWHiFN18pLIeUmtwL6jiXH
         x2MQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20251104; t=1782939485; x=1783544285;
        h=in-reply-to:content-disposition:mime-version:references:message-id
         :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=1lIXAcoKGevoWcF6ZG0C1AXYxTwWYspPDm0VDN/gH3w=;
        b=avgUbLgyeCwtk5FLjbwr8Q6pC9drI5yKAzj0pSA9F2ZMuH6WfTPjcHYDdnIJq3tXpF
         DJChKbkXXXfOosEuQPRzqnTTLdabfI+Bp9o4DSliiqedP1LsP/ZQwomxX2UWonlGUCui
         ab4aVeqswVDGewJzkzISH89NHfmjYbiVE7vFsPyufYXlRs0aUf6lSLM0d4/UXL+rYxoY
         VJ4sExG3pYDSEQNq4+EDPowTW5+cOKAmkHPU1O7bP2hsTkX4DVOVmXxMQDA2AXejo3iW
         4MZZABC6xLzWGPw0QHAf1TPr8IYW6CIuiom/vHqyyS48mar7ZHIWRzNnex7ZBoIa0o8h
         SoPQ==
X-Forwarded-Encrypted: i=1; AFNElJ+WM5MuxwcJQHhNWSJsljNUz6vjnDam6RspjqYsuSmOJes64wsT8HY4IpE5Jzh7s07fsTh0B2W+LcKPvxA=@vger.kernel.org
X-Gm-Message-State: AOJu0YzdxP8ebYtpgAMvnEv2+3sYO5guDVwjxI7CE/k29OlKM7qNNd8m
	06WR/UoViwdTNUrA5H9z4Qjporb7oFuMeM3jIyOKrxQUYRDpSYI4P+xpxbe44yBoQZY=
X-Gm-Gg: AfdE7cnzBEfLrgnrWLrAC11zaMX2JjCwzv4tdwUVEbv0IvOxNXhKQQ2dOqWTq8MZZWl
	V4Xl95meykYCvbaH7xuhgQxVHvvAAjjczJTJMNega/oJZBgdzJPeAyGbFZ+Zws7mwI+jHa7PjCK
	T9alxKHTlGocqAtz1eCRZO42QAZzx8dJwXl3r9GIY0vHz+IeUiIbPCjNx2xjiBvPBZozvW3vPCN
	FXdXGg7kjibopjhtdKsap8q2QNzdAYM9HsVNRZzd90UOvln30A0U5R3nFyMYFh+0EOid78vTfOh
	tgNI6TBktYHAvkky2GVl2U7IWhGWFHyjBWkvKMhbQkCHcTr/vTKWeX2wFh/GfkTpN45K5Wd7iZf
	AGm/7skypH/tEc4LKz0enp8tAJtZ9i/sT8PQq0XMri2kidgcH1VPCNkvOfNEBNvXfHUvndv1JLs
	70Nq9zkulBeoA=
X-Received: by 2002:a05:620a:31a7:b0:90c:be8b:3676 with SMTP id af79cd13be357-92e784fa513mr481282085a.56.1782939484638;
        Wed, 01 Jul 2026 13:58:04 -0700 (PDT)
Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29])
        by smtp.gmail.com with ESMTPSA id af79cd13be357-92e80161816sm40893585a.22.2026.07.01.13.58.03
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 01 Jul 2026 13:58:03 -0700 (PDT)
Date: Wed, 1 Jul 2026 16:57:59 -0400
From: Johannes Weiner <hannes@cmpxchg.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <ljs@kernel.org>,
	"Liam R. Howlett" <liam@infradead.org>,
	Mike Rapoport <rppt@kernel.org>, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: page_alloc: move capture_control to the page
 allocator
Message-ID: <akV_VwtcvUN7Mq16@cmpxchg.org>
References: <20260626182215.1107966-1-hannes@cmpxchg.org>
 <20260626182215.1107966-4-hannes@cmpxchg.org>
 <c11c73ba-4422-4ef1-834d-6cc230146dea@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <c11c73ba-4422-4ef1-834d-6cc230146dea@kernel.org>

On Wed, Jul 01, 2026 at 08:02:55PM +0200, Vlastimil Babka (SUSE) wrote:
> On 6/26/26 20:21, Johannes Weiner wrote:
> > The compaction capturing code assumes the allocation request order
> > and compaction target order are the same. That won't be true once
> > defrag_mode promotes sub-block allocations to pageblock-order
> > compaction: compaction targets the larger order, capture should
> > remain at the original allocation order.
> 
> Well I guess you could also try to capture the whole-pageblock page and then
> deal with it as with whole pageblock stealing? But this works too and
> perhaps it's simpler.

You mean capture the whole block in *page and then break off the chunk
we need? That's pretty far upstack. We don't hold the zone->lock at
that point to expand()... I think this is indeed a bit simpler.

> > @@ -2808,35 +2808,22 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
> >  		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
> >  		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
> >  	};
> > -	struct capture_control capc = {
> > -		.cc = &cc,
> > -		.page = NULL,
> > -	};
> >  
> > -	/*
> > -	 * Make sure the structs are really initialized before we expose the
> > -	 * capture control, in case we are interrupted and the interrupt handler
> > -	 * frees a page.
> > -	 */
> > +	/* See the comment in __alloc_pages_direct_compact() */
> >  	barrier();
> > -	WRITE_ONCE(current->capture_control, &capc);
> > +	WRITE_ONCE(capc->cc, &cc);
> >  
> > -	ret = compact_zone(&cc, &capc);
> > +	ret = compact_zone(&cc, capc);
> > +
> > +	WRITE_ONCE(capc->cc, NULL);
> 
> I wonder if it makes sense to continue having capc->cc and this whole dance
> in two functions. AFAICS (after patch 4/4) we access only capc->cc->zone and
> capc->cc->migratetype. migratetype is stable in the whole
> try_to_compact_pages(), could be part of capc. Order can be added by this
> patch (with no semantic change to it) and not the next one. Zone varies, but
> could be also in capc and set by try_to_compact_pages() before every call to
> compact_zone_order(). Then compact_zone_order() doesn't have to set up any
> capc fields anymore?

You mean something like this?

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index f29ef0653546..66a2f70e9e01 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,7 @@ enum compact_result {
 };
 
 struct alloc_context; /* in mm/internal.h */
+struct capture_control; /* in mm/internal.h */
 
 /*
  * Number of free order-0 pages that should be available above given watermark
@@ -92,7 +93,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
 		unsigned int order, unsigned int alloc_flags,
 		const struct alloc_context *ac, enum compact_priority prio,
-		struct page **page);
+		struct capture_control *capc);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern bool compaction_suitable(struct zone *zone, int order,
 				unsigned long watermark, int highest_zoneidx);
diff --git a/mm/compaction.c b/mm/compaction.c
index 7df3a85d43af..f37c130bc5cc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2791,7 +2791,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 static enum compact_result compact_zone_order(struct zone *zone, int order,
 		gfp_t gfp_mask, enum compact_priority prio,
 		unsigned int alloc_flags, int highest_zoneidx,
-		struct page **capture)
+		struct capture_control *capc)
 {
 	enum compact_result ret;
 	struct compact_control cc = {
@@ -2808,10 +2808,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 		.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
 		.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
 	};
-	struct capture_control capc = {
-		.cc = &cc,
-		.page = NULL,
-	};
 
 	/*
 	 * Make sure the structs are really initialized before we expose the
@@ -2819,9 +2815,9 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * frees a page.
 	 */
 	barrier();
-	WRITE_ONCE(current->capture_control, &capc);
+	WRITE_ONCE(current->capture_control, capc);
 
-	ret = compact_zone(&cc, &capc);
+	ret = compact_zone(&cc, capc);
 
 	/*
 	 * Make sure we hide capture control first before we read the captured
@@ -2829,14 +2825,14 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	 * and we would leak it.
 	 */
 	WRITE_ONCE(current->capture_control, NULL);
-	*capture = READ_ONCE(capc.page);
+
 	/*
 	 * Technically, it is also possible that compaction is skipped but
 	 * the page is still captured out of luck(IRQ came and freed the page).
 	 * Returning COMPACT_SUCCESS in such cases helps in properly accounting
 	 * the COMPACT[STALL|FAIL] when compaction is skipped.
 	 */
-	if (*capture)
+	if (READ_ONCE(capc->page))
 		ret = COMPACT_SUCCESS;
 
 	return ret;
@@ -2849,13 +2845,13 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
  * @alloc_flags: The allocation flags of the current allocation
  * @ac: The context of current allocation
  * @prio: Determines how hard direct compaction should try to succeed
- * @capture: Pointer to free page created by compaction will be stored here
+ * @capc: Free page capture bypassing the freelist
  *
  * This is the main entry point for direct page compaction.
  */
 enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 		unsigned int alloc_flags, const struct alloc_context *ac,
-		enum compact_priority prio, struct page **capture)
+		enum compact_priority prio, struct capture_control *capc)
 {
 	struct zoneref *z;
 	struct zone *zone;
@@ -2882,8 +2878,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
 			continue;
 		}
 
+		capc->zone = zone;
 		status = compact_zone_order(zone, order, gfp_mask, prio,
-				alloc_flags, ac->highest_zoneidx, capture);
+				alloc_flags, ac->highest_zoneidx, capc);
 		rc = max(status, rc);
 
 		/* The allocation should succeed, stop compacting */
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..6d3402001b93 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1059,7 +1059,9 @@ struct compact_control {
  * immediately when one is created during the free path.
  */
 struct capture_control {
-	struct compact_control *cc;
+	struct zone *zone;
+	int migratetype;
+	int order;
 	struct page *page;
 };
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb422505c6ef..a2cceaaaccb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -721,14 +721,14 @@ static inline struct capture_control *task_capc(struct zone *zone)
 	return unlikely(capc) &&
 		!(current->flags & PF_KTHREAD) &&
 		!capc->page &&
-		capc->cc->zone == zone ? capc : NULL;
+		capc->zone == zone ? capc : NULL;
 }
 
 static inline bool
 compaction_capture(struct capture_control *capc, struct page *page,
 		   int order, int migratetype)
 {
-	if (!capc || order != capc->cc->order)
+	if (!capc || order != capc->order)
 		return false;
 
 	/* Do not accidentally pollute CMA or isolated regions*/
@@ -744,12 +744,12 @@ compaction_capture(struct capture_control *capc, struct page *page,
 	 * have trouble finding a high-order free page.
 	 */
 	if (order < pageblock_order && migratetype == MIGRATE_MOVABLE &&
-	    capc->cc->migratetype != MIGRATE_MOVABLE)
+	    capc->migratetype != MIGRATE_MOVABLE)
 		return false;
 
-	if (migratetype != capc->cc->migratetype)
-		trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
-					    capc->cc->migratetype, migratetype);
+	if (migratetype != capc->migratetype)
+		trace_mm_page_alloc_extfrag(page, capc->order, order,
+					    capc->migratetype, migratetype);
 
 	capc->page = page;
 	return true;
@@ -4146,6 +4146,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	struct page *page = NULL;
 	unsigned long pflags;
 	unsigned int noreclaim_flag;
+	struct capture_control capc = {
+		.zone = NULL,
+		.migratetype = ac->migratetype,
+		.order = order,
+		.page = NULL,
+	};
 
 	if (!order)
 		return NULL;
@@ -4156,13 +4162,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	noreclaim_flag = memalloc_noreclaim_save();
 
 	*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
-								prio, &page);
+							       prio, &capc);
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
 	psi_memstall_leave(&pflags);
 	delayacct_compact_end();
 
+	page = READ_ONCE(capc.page);
+
 	if (*compact_result == COMPACT_SKIPPED ||
 	    *compact_result == COMPACT_DEFERRED)
 		return NULL;

That could work as well, but let me know if that's what you thought.

The only thing I don't like so much is that it leaks that
READ_ONCE(capc.page) out of the function where we have those nice IRQ
preemption comments and the current->capture_control barriering.

> > @@ -718,7 +718,7 @@ static inline struct capture_control *task_capc(struct zone *zone)
> >  {
> >  	struct capture_control *capc = current->capture_control;
> >  
> > -	return unlikely(capc) &&
> > +	return unlikely(capc && capc->cc) &&
> >  		!(current->flags & PF_KTHREAD) &&
> >  		!capc->page &&
> >  		capc->cc->zone == zone ? capc : NULL;
> > @@ -4146,23 +4146,42 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> >  	struct page *page = NULL;
> >  	unsigned long pflags;
> >  	unsigned int noreclaim_flag;
> > +	struct capture_control capc = {
> > +		.page = NULL,
> 
> You didn't set .cc to NULL explicitly...
> 
> > +	};
> >  
> >  	if (!order)
> >  		return NULL;
> >  
> > +	/*
> > +	 * Make sure the structs are really initialized before we expose the
> > +	 * capture control, in case we are interrupted and the interrupt handler
> > +	 * frees a page.
> > +	 */
> > +	barrier();
> 
> So either an implicit { } NULL / zero initialization + barrier() is enough
> (I hope so) and we don't need to set NULL / zero in every field explicitly.
> Or not and then we should set every field and not just page.

That partial capc = { .page = NULL, } will 0 the other fields. I don't
see a correctness issue. Or were you talking about readability?

Thanks for taking a look!