From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-qk1-f170.google.com (mail-qk1-f170.google.com [209.85.222.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07793347514 for ; Wed, 1 Jul 2026 20:58:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782939488; cv=none; b=NOEFkvMg/gSi+gpDyC+EWMEaewulvcSd848eSzLZuYBcGAglsp4mtmXT/gJ9dZgNh4va/FFUWPsj9WxXmCp3Bw9PmslTFdn0zrf4Xl7KdaZT5IBGoK0kUlJmHcjdgHMLLMNSDl55ilfP2Vb80In86pCRnkkgVIHtLKwYBIBxQvU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1782939488; c=relaxed/simple; bh=xve+mOvjd+BvgjOSFLd2Ks3l0XGwzj+Xnq91l3VGPVc=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=HBJswfCJnPYuV/QGj5uu1sM015S/4Qzs4Nu5FB+JW2u2iz9+azMdNyHyH3Uk76t0aMA17P6F6obrYHrFzzreswypFPeTDrHRYTdN+BHJSWoKi8HO8pzCeVR41FDbxjyULwxFoVc/AU019lvd/ul2aoDoArpet3Yc5ywSd3tTKUg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org; spf=pass smtp.mailfrom=cmpxchg.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b=isOc6gwG; arc=none smtp.client-ip=209.85.222.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cmpxchg.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cmpxchg.org header.i=@cmpxchg.org header.b="isOc6gwG" Received: by mail-qk1-f170.google.com with SMTP id af79cd13be357-9217d13c276so63121085a.1 for ; Wed, 01 Jul 2026 13:58:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg.org; s=google; t=1782939485; x=1783544285; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=1lIXAcoKGevoWcF6ZG0C1AXYxTwWYspPDm0VDN/gH3w=; b=isOc6gwGhVRTDa8bO8fIV3c35vudS2zMvFMED7ascylSpYh7L/pcpAwKTbStxwAjim 6Tmd80hu33LHvbxVf2lJ2yp2XIz7tUKnqXEq99cqx+Blbte0UvNRY/TLPgDaQWBYvYdC muGUuoCXtRo70AFRsLi+WMxhTr0tXcvkYGUX8B8N5cv73m6qDTLlsSH4JwDWGVxfk+7U 7TI4y+3yEToLnfBBWAzpRubXuI4kn1pffHl9scrG9caam5oHNFTT7D/Z6YpSdjQGkGoJ W0nnJTn7STSntMXPuojEvBPyoPt6T97paqtMmErsPw9mIJvWHiFN18pLIeUmtwL6jiXH x2MQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1782939485; x=1783544285; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-gg:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1lIXAcoKGevoWcF6ZG0C1AXYxTwWYspPDm0VDN/gH3w=; b=avgUbLgyeCwtk5FLjbwr8Q6pC9drI5yKAzj0pSA9F2ZMuH6WfTPjcHYDdnIJq3tXpF DJChKbkXXXfOosEuQPRzqnTTLdabfI+Bp9o4DSliiqedP1LsP/ZQwomxX2UWonlGUCui ab4aVeqswVDGewJzkzISH89NHfmjYbiVE7vFsPyufYXlRs0aUf6lSLM0d4/UXL+rYxoY VJ4sExG3pYDSEQNq4+EDPowTW5+cOKAmkHPU1O7bP2hsTkX4DVOVmXxMQDA2AXejo3iW 4MZZABC6xLzWGPw0QHAf1TPr8IYW6CIuiom/vHqyyS48mar7ZHIWRzNnex7ZBoIa0o8h SoPQ== X-Forwarded-Encrypted: i=1; AFNElJ+WM5MuxwcJQHhNWSJsljNUz6vjnDam6RspjqYsuSmOJes64wsT8HY4IpE5Jzh7s07fsTh0B2W+LcKPvxA=@vger.kernel.org X-Gm-Message-State: AOJu0YzdxP8ebYtpgAMvnEv2+3sYO5guDVwjxI7CE/k29OlKM7qNNd8m 06WR/UoViwdTNUrA5H9z4Qjporb7oFuMeM3jIyOKrxQUYRDpSYI4P+xpxbe44yBoQZY= X-Gm-Gg: AfdE7cnzBEfLrgnrWLrAC11zaMX2JjCwzv4tdwUVEbv0IvOxNXhKQQ2dOqWTq8MZZWl V4Xl95meykYCvbaH7xuhgQxVHvvAAjjczJTJMNega/oJZBgdzJPeAyGbFZ+Zws7mwI+jHa7PjCK T9alxKHTlGocqAtz1eCRZO42QAZzx8dJwXl3r9GIY0vHz+IeUiIbPCjNx2xjiBvPBZozvW3vPCN FXdXGg7kjibopjhtdKsap8q2QNzdAYM9HsVNRZzd90UOvln30A0U5R3nFyMYFh+0EOid78vTfOh tgNI6TBktYHAvkky2GVl2U7IWhGWFHyjBWkvKMhbQkCHcTr/vTKWeX2wFh/GfkTpN45K5Wd7iZf AGm/7skypH/tEc4LKz0enp8tAJtZ9i/sT8PQq0XMri2kidgcH1VPCNkvOfNEBNvXfHUvndv1JLs 70Nq9zkulBeoA= X-Received: by 2002:a05:620a:31a7:b0:90c:be8b:3676 with SMTP id af79cd13be357-92e784fa513mr481282085a.56.1782939484638; Wed, 01 Jul 2026 13:58:04 -0700 (PDT) Received: from localhost ([2603:7001:f100:500:365a:60ff:fe62:ff29]) by smtp.gmail.com with ESMTPSA id af79cd13be357-92e80161816sm40893585a.22.2026.07.01.13.58.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jul 2026 13:58:03 -0700 (PDT) Date: Wed, 1 Jul 2026 16:57:59 -0400 From: Johannes Weiner To: "Vlastimil Babka (SUSE)" Cc: Andrew Morton , Suren Baghdasaryan , Michal Hocko , Brendan Jackman , Zi Yan , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Mike Rapoport , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 3/4] mm: page_alloc: move capture_control to the page allocator Message-ID: References: <20260626182215.1107966-1-hannes@cmpxchg.org> <20260626182215.1107966-4-hannes@cmpxchg.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Wed, Jul 01, 2026 at 08:02:55PM +0200, Vlastimil Babka (SUSE) wrote: > On 6/26/26 20:21, Johannes Weiner wrote: > > The compaction capturing code assumes the allocation request order > > and compaction target order are the same. That won't be true once > > defrag_mode promotes sub-block allocations to pageblock-order > > compaction: compaction targets the larger order, capture should > > remain at the original allocation order. > > Well I guess you could also try to capture the whole-pageblock page and then > deal with it as with whole pageblock stealing? But this works too and > perhaps it's simpler. You mean capture the whole block in *page and then break off the chunk we need? That's pretty far upstack. We don't hold the zone->lock at that point to expand()... I think this is indeed a bit simpler. > > @@ -2808,35 +2808,22 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, > > .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY), > > .ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY) > > }; > > - struct capture_control capc = { > > - .cc = &cc, > > - .page = NULL, > > - }; > > > > - /* > > - * Make sure the structs are really initialized before we expose the > > - * capture control, in case we are interrupted and the interrupt handler > > - * frees a page. > > - */ > > + /* See the comment in __alloc_pages_direct_compact() */ > > barrier(); > > - WRITE_ONCE(current->capture_control, &capc); > > + WRITE_ONCE(capc->cc, &cc); > > > > - ret = compact_zone(&cc, &capc); > > + ret = compact_zone(&cc, capc); > > + > > + WRITE_ONCE(capc->cc, NULL); > > I wonder if it makes sense to continue having capc->cc and this whole dance > in two functions. AFAICS (after patch 4/4) we access only capc->cc->zone and > capc->cc->migratetype. migratetype is stable in the whole > try_to_compact_pages(), could be part of capc. Order can be added by this > patch (with no semantic change to it) and not the next one. Zone varies, but > could be also in capc and set by try_to_compact_pages() before every call to > compact_zone_order(). Then compact_zone_order() doesn't have to set up any > capc fields anymore? You mean something like this? diff --git a/include/linux/compaction.h b/include/linux/compaction.h index f29ef0653546..66a2f70e9e01 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -58,6 +58,7 @@ enum compact_result { }; struct alloc_context; /* in mm/internal.h */ +struct capture_control; /* in mm/internal.h */ /* * Number of free order-0 pages that should be available above given watermark @@ -92,7 +93,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order); extern enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, enum compact_priority prio, - struct page **page); + struct capture_control *capc); extern void reset_isolation_suitable(pg_data_t *pgdat); extern bool compaction_suitable(struct zone *zone, int order, unsigned long watermark, int highest_zoneidx); diff --git a/mm/compaction.c b/mm/compaction.c index 7df3a85d43af..f37c130bc5cc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2791,7 +2791,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc) static enum compact_result compact_zone_order(struct zone *zone, int order, gfp_t gfp_mask, enum compact_priority prio, unsigned int alloc_flags, int highest_zoneidx, - struct page **capture) + struct capture_control *capc) { enum compact_result ret; struct compact_control cc = { @@ -2808,10 +2808,6 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, .ignore_skip_hint = (prio == MIN_COMPACT_PRIORITY), .ignore_block_suitable = (prio == MIN_COMPACT_PRIORITY) }; - struct capture_control capc = { - .cc = &cc, - .page = NULL, - }; /* * Make sure the structs are really initialized before we expose the @@ -2819,9 +2815,9 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, * frees a page. */ barrier(); - WRITE_ONCE(current->capture_control, &capc); + WRITE_ONCE(current->capture_control, capc); - ret = compact_zone(&cc, &capc); + ret = compact_zone(&cc, capc); /* * Make sure we hide capture control first before we read the captured @@ -2829,14 +2825,14 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, * and we would leak it. */ WRITE_ONCE(current->capture_control, NULL); - *capture = READ_ONCE(capc.page); + /* * Technically, it is also possible that compaction is skipped but * the page is still captured out of luck(IRQ came and freed the page). * Returning COMPACT_SUCCESS in such cases helps in properly accounting * the COMPACT[STALL|FAIL] when compaction is skipped. */ - if (*capture) + if (READ_ONCE(capc->page)) ret = COMPACT_SUCCESS; return ret; @@ -2849,13 +2845,13 @@ static enum compact_result compact_zone_order(struct zone *zone, int order, * @alloc_flags: The allocation flags of the current allocation * @ac: The context of current allocation * @prio: Determines how hard direct compaction should try to succeed - * @capture: Pointer to free page created by compaction will be stored here + * @capc: Free page capture bypassing the freelist * * This is the main entry point for direct page compaction. */ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, unsigned int alloc_flags, const struct alloc_context *ac, - enum compact_priority prio, struct page **capture) + enum compact_priority prio, struct capture_control *capc) { struct zoneref *z; struct zone *zone; @@ -2882,8 +2878,9 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order, continue; } + capc->zone = zone; status = compact_zone_order(zone, order, gfp_mask, prio, - alloc_flags, ac->highest_zoneidx, capture); + alloc_flags, ac->highest_zoneidx, capc); rc = max(status, rc); /* The allocation should succeed, stop compacting */ diff --git a/mm/internal.h b/mm/internal.h index 181e79f1d6a2..6d3402001b93 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1059,7 +1059,9 @@ struct compact_control { * immediately when one is created during the free path. */ struct capture_control { - struct compact_control *cc; + struct zone *zone; + int migratetype; + int order; struct page *page; }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cb422505c6ef..a2cceaaaccb1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -721,14 +721,14 @@ static inline struct capture_control *task_capc(struct zone *zone) return unlikely(capc) && !(current->flags & PF_KTHREAD) && !capc->page && - capc->cc->zone == zone ? capc : NULL; + capc->zone == zone ? capc : NULL; } static inline bool compaction_capture(struct capture_control *capc, struct page *page, int order, int migratetype) { - if (!capc || order != capc->cc->order) + if (!capc || order != capc->order) return false; /* Do not accidentally pollute CMA or isolated regions*/ @@ -744,12 +744,12 @@ compaction_capture(struct capture_control *capc, struct page *page, * have trouble finding a high-order free page. */ if (order < pageblock_order && migratetype == MIGRATE_MOVABLE && - capc->cc->migratetype != MIGRATE_MOVABLE) + capc->migratetype != MIGRATE_MOVABLE) return false; - if (migratetype != capc->cc->migratetype) - trace_mm_page_alloc_extfrag(page, capc->cc->order, order, - capc->cc->migratetype, migratetype); + if (migratetype != capc->migratetype) + trace_mm_page_alloc_extfrag(page, capc->order, order, + capc->migratetype, migratetype); capc->page = page; return true; @@ -4146,6 +4146,12 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, struct page *page = NULL; unsigned long pflags; unsigned int noreclaim_flag; + struct capture_control capc = { + .zone = NULL, + .migratetype = ac->migratetype, + .order = order, + .page = NULL, + }; if (!order) return NULL; @@ -4156,13 +4162,15 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, noreclaim_flag = memalloc_noreclaim_save(); *compact_result = try_to_compact_pages(gfp_mask, order, alloc_flags, ac, - prio, &page); + prio, &capc); memalloc_noreclaim_restore(noreclaim_flag); fs_reclaim_release(gfp_mask); psi_memstall_leave(&pflags); delayacct_compact_end(); + page = READ_ONCE(capc.page); + if (*compact_result == COMPACT_SKIPPED || *compact_result == COMPACT_DEFERRED) return NULL; That could work as well, but let me know if that's what you thought. The only thing I don't like so much is that it leaks that READ_ONCE(capc.page) out of the function where we have those nice IRQ preemption comments and the current->capture_control barriering. > > @@ -718,7 +718,7 @@ static inline struct capture_control *task_capc(struct zone *zone) > > { > > struct capture_control *capc = current->capture_control; > > > > - return unlikely(capc) && > > + return unlikely(capc && capc->cc) && > > !(current->flags & PF_KTHREAD) && > > !capc->page && > > capc->cc->zone == zone ? capc : NULL; > > @@ -4146,23 +4146,42 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order, > > struct page *page = NULL; > > unsigned long pflags; > > unsigned int noreclaim_flag; > > + struct capture_control capc = { > > + .page = NULL, > > You didn't set .cc to NULL explicitly... > > > + }; > > > > if (!order) > > return NULL; > > > > + /* > > + * Make sure the structs are really initialized before we expose the > > + * capture control, in case we are interrupted and the interrupt handler > > + * frees a page. > > + */ > > + barrier(); > > So either an implicit { } NULL / zero initialization + barrier() is enough > (I hope so) and we don't need to set NULL / zero in every field explicitly. > Or not and then we should set every field and not just page. That partial capc = { .page = NULL, } will 0 the other fields. I don't see a correctness issue. Or were you talking about readability? Thanks for taking a look!