From: Johannes Weiner <hannes@cmpxchg.org>
To: "Vlastimil Babka (SUSE)" <vbabka@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>, Zi Yan <ziy@nvidia.com>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <ljs@kernel.org>,
"Liam R. Howlett" <liam@infradead.org>,
Mike Rapoport <rppt@kernel.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator
Date: Wed, 1 Jul 2026 16:57:59 -0400 [thread overview]
Message-ID: <akV_VwtcvUN7Mq16@cmpxchg.org> (raw)
In-Reply-To: <c11c73ba-4422-4ef1-834d-6cc230146dea@kernel.org>
On Wed, Jul 01, 2026 at 08:02:55PM +0200, Vlastimil Babka (SUSE) wrote:
> On 6/26/26 20:21, Johannes Weiner wrote:
> > The compaction capturing code assumes the allocation request order
> > and compaction target order are the same. That won't be true once
> > defrag_mode promotes sub-block allocations to pageblock-order
> > compaction: compaction targets the larger order, capture should
> > remain at the original allocation order.
>
> Well I guess you could also try to capture the whole-pageblock page and then
> deal with it as with whole pageblock stealing? But this works too and
> perhaps it's simpler.
You mean capture the whole block in *page and then break off the chunk
we need? That's pretty far upstack. We don't hold the zone->lock at
that point to expand()... I think this is indeed a bit simpler.
> > @@ -2808,35 +2808,22 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
> > .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
> > .ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
> > };
> > - struct capture_control capc = {
> > - .cc = &cc,
> > - .page = NULL,
> > - };
> >
> > - /*
> > - * Make sure the structs are really initialized before we expose the
> > - * capture control, in case we are interrupted and the interrupt handler
> > - * frees a page.
> > - */
> > + /* See the comment in __alloc_pages_direct_compact() */
> > barrier();
> > - WRITE_ONCE(current->capture_control, &capc);
> > + WRITE_ONCE(capc->cc, &cc);
> >
> > - ret = compact_zone(&cc, &capc);
> > + ret = compact_zone(&cc, capc);
> > +
> > + WRITE_ONCE(capc->cc, NULL);
>
> I wonder if it makes sense to continue having capc->cc and this whole dance
> in two functions. AFAICS (after patch 4/4) we access only capc->cc->zone and
> capc->cc->migratetype. migratetype is stable in the whole
> try_to_compact_pages(), could be part of capc. Order can be added by this
> patch (with no semantic change to it) and not the next one. Zone varies, but
> could be also in capc and set by try_to_compact_pages() before every call to
> compact_zone_order(). Then compact_zone_order() doesn't have to set up any
> capc fields anymore?
You mean something like this?
diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index f29ef0653546..66a2f70e9e01 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -58,6 +58,7 @@ enum compact_result {
};
struct alloc_context; /* in mm/internal.h */
+struct capture_control; /* in mm/internal.h */
/*
* Number of free order-0 pages that should be available above given watermark
@@ -92,7 +93,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
unsigned int order, unsigned int alloc_flags,
const struct alloc_context *ac, enum compact_priority prio,
- struct page **page);
+ struct capture_control *capc);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern bool compaction_suitable(struct zone *zone, int order,
unsigned long watermark, int highest_zoneidx);
diff --git a/mm/compaction.c b/mm/compaction.c
index 7df3a85d43af..f37c130bc5cc 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2791,7 +2791,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
static enum compact_result compact_zone_order(struct zone *zone, int order,
gfp_t gfp_mask, enum compact_priority prio,
unsigned int alloc_flags, int highest_zoneidx,
- struct page **capture)
+ struct capture_control *capc)
{
enum compact_result ret;
struct compact_control cc = {
@@ -2808,10 +2808,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
.ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY),
.ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY)
};
- struct capture_control capc = {
- .cc = &cc,
- .page = NULL,
- };
/*
* Make sure the structs are really initialized before we expose the
@@ -2819,9 +2815,9 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
* frees a page.
*/
barrier();
- WRITE_ONCE(current->capture_control, &capc);
+ WRITE_ONCE(current->capture_control, capc);
- ret = compact_zone(&cc, &capc);
+ ret = compact_zone(&cc, capc);
/*
* Make sure we hide capture control first before we read the captured
@@ -2829,14 +2825,14 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
* and we would leak it.
*/
WRITE_ONCE(current->capture_control, NULL);
- *capture = READ_ONCE(capc.page);
+
/*
* Technically, it is also possible that compaction is skipped but
* the page is still captured out of luck(IRQ came and freed the page).
* Returning COMPACT_SUCCESS in such cases helps in properly accounting
* the COMPACT[STALL|FAIL] when compaction is skipped.
*/
- if (*capture)
+ if (READ_ONCE(capc->page))
ret = COMPACT_SUCCESS;
return ret;
@@ -2849,13 +2845,13 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
* @alloc_flags: The allocation flags of the current allocation
* @ac: The context of current allocation
* @prio: Determines how hard direct compaction should try to succeed
- * @capture: Pointer to free page created by compaction will be stored here
+ * @capc: Free page capture bypassing the freelist
*
* This is the main entry point for direct page compaction.
*/
enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
- enum compact_priority prio, struct page **capture)
+ enum compact_priority prio, struct capture_control *capc)
{
struct zoneref *z;
struct zone *zone;
@@ -2882,8 +2878,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
continue;
}
+ capc->zone = zone;
status = compact_zone_order(zone, order, gfp_mask, prio,
- alloc_flags, ac->highest_zoneidx, capture);
+ alloc_flags, ac->highest_zoneidx, capc);
rc = max(status, rc);
/* The allocation should succeed, stop compacting */
diff --git a/mm/internal.h b/mm/internal.h
index 181e79f1d6a2..6d3402001b93 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1059,7 +1059,9 @@ struct compact_control {
* immediately when one is created during the free path.
*/
struct capture_control {
- struct compact_control *cc;
+ struct zone *zone;
+ int migratetype;
+ int order;
struct page *page;
};
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cb422505c6ef..a2cceaaaccb1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -721,14 +721,14 @@ static inline struct capture_control *task_capc(struct zone *zone)
return unlikely(capc) &&
!(current->flags & PF_KTHREAD) &&
!capc->page &&
- capc->cc->zone == zone ? capc : NULL;
+ capc->zone == zone ? capc : NULL;
}
static inline bool
compaction_capture(struct capture_control *capc, struct page *page,
int order, int migratetype)
{
- if (!capc || order != capc->cc->order)
+ if (!capc || order != capc->order)
return false;
/* Do not accidentally pollute CMA or isolated regions*/
@@ -744,12 +744,12 @@ compaction_capture(struct capture_control *capc, struct page *page,
* have trouble finding a high-order free page.
*/
if (order < pageblock_order && migratetype == MIGRATE_MOVABLE &&
- capc->cc->migratetype != MIGRATE_MOVABLE)
+ capc->migratetype != MIGRATE_MOVABLE)
return false;
- if (migratetype != capc->cc->migratetype)
- trace_mm_page_alloc_extfrag(page, capc->cc->order, order,
- capc->cc->migratetype, migratetype);
+ if (migratetype != capc->migratetype)
+ trace_mm_page_alloc_extfrag(page, capc->order, order,
+ capc->migratetype, migratetype);
capc->page = page;
return true;
@@ -4146,6 +4146,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
struct page *page = NULL;
unsigned long pflags;
unsigned int noreclaim_flag;
+ struct capture_control capc = {
+ .zone = NULL,
+ .migratetype = ac->migratetype,
+ .order = order,
+ .page = NULL,
+ };
if (!order)
return NULL;
@@ -4156,13 +4162,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
noreclaim_flag = memalloc_noreclaim_save();
*compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac,
- prio, &page);
+ prio, &capc);
memalloc_noreclaim_restore(noreclaim_flag);
fs_reclaim_release(gfp_mask);
psi_memstall_leave(&pflags);
delayacct_compact_end();
+ page = READ_ONCE(capc.page);
+
if (*compact_result == COMPACT_SKIPPED ||
*compact_result == COMPACT_DEFERRED)
return NULL;
That could work as well, but let me know if that's what you thought.
The only thing I don't like so much is that it leaks that
READ_ONCE(capc.page) out of the function where we have those nice IRQ
preemption comments and the current->capture_control barriering.
> > @@ -718,7 +718,7 @@ static inline struct capture_control *task_capc(struct zone *zone)
> > {
> > struct capture_control *capc = current->capture_control;
> >
> > - return unlikely(capc) &&
> > + return unlikely(capc && capc->cc) &&
> > !(current->flags & PF_KTHREAD) &&
> > !capc->page &&
> > capc->cc->zone == zone ? capc : NULL;
> > @@ -4146,23 +4146,42 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
> > struct page *page = NULL;
> > unsigned long pflags;
> > unsigned int noreclaim_flag;
> > + struct capture_control capc = {
> > + .page = NULL,
>
> You didn't set .cc to NULL explicitly...
>
> > + };
> >
> > if (!order)
> > return NULL;
> >
> > + /*
> > + * Make sure the structs are really initialized before we expose the
> > + * capture control, in case we are interrupted and the interrupt handler
> > + * frees a page.
> > + */
> > + barrier();
>
> So either an implicit { } NULL / zero initialization + barrier() is enough
> (I hope so) and we don't need to set NULL / zero in every field explicitly.
> Or not and then we should set every field and not just page.
That partial capc = { .page = NULL, } will 0 the other fields. I don't
see a correctness issue. Or were you talking about readability?
Thanks for taking a look!
next prev parent reply other threads:[~2026-07-01 20:58 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-26 18:21 [PATCH 0/4] mm: fix reclaim storms in defrag_mode Johannes Weiner
2026-06-26 18:21 ` [PATCH 1/4] mm: page_alloc: __GFP_FS lockdep annotation for direct compaction Johannes Weiner
2026-07-01 13:45 ` Vlastimil Babka (SUSE)
2026-07-02 0:39 ` Shakeel Butt
2026-06-26 18:21 ` [PATCH 2/4] mm: compaction: support non-movable compaction for pageblock requests Johannes Weiner
2026-07-01 14:19 ` Vlastimil Babka (SUSE)
2026-07-01 15:28 ` Johannes Weiner
2026-07-01 18:14 ` Vlastimil Babka (SUSE)
2026-07-01 21:11 ` Johannes Weiner
2026-07-02 9:28 ` Vlastimil Babka (SUSE)
2026-07-02 1:47 ` Zi Yan
2026-06-26 18:21 ` [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator Johannes Weiner
2026-07-01 18:02 ` Vlastimil Babka (SUSE)
2026-07-01 20:57 ` Johannes Weiner [this message]
2026-07-02 10:20 ` Vlastimil Babka (SUSE)
2026-06-26 18:21 ` [PATCH 4/4] mm: page_alloc: fix non-movable reclaim storm in defrag_mode Johannes Weiner
2026-06-26 18:29 ` Zi Yan
2026-06-26 18:43 ` Johannes Weiner
2026-07-01 18:06 ` Vlastimil Babka (SUSE)
2026-07-01 21:02 ` Johannes Weiner
2026-07-02 0:31 ` [PATCH 0/4] mm: fix reclaim storms " Shakeel Butt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=akV_VwtcvUN7Mq16@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=david@kernel.org \
--cc=jackmanb@google.com \
--cc=liam@infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox