From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 18 Feb 2025 15:38:30 -0500
From: Johannes Weiner <hannes@cmpxchg.org>
To: Vlastimil Babka
Cc: Brendan Jackman, Andrew Morton, Mel Gorman, Michal Hocko,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org, Yosry Ahmed
Subject: Re: [PATCH] mm/page_alloc: Clarify some migratetype fallback code
Message-ID: <20250218203830.GA1264207@cmpxchg.org>
References: <20250214-clarify-steal-v1-1-79dc5adf1b79@google.com>
 <20250214212647.GB233399@cmpxchg.org>
 <764394d9-592c-4d68-8910-67362dd8810a@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <764394d9-592c-4d68-8910-67362dd8810a@suse.cz>

On Tue,
Feb 18, 2025 at 11:19:58AM +0100, Vlastimil Babka wrote:
> On 2/17/25 17:26, Brendan Jackman wrote:
> > On Fri, 14 Feb 2025 at 22:26, Johannes Weiner wrote:
> >> > 2. can_steal_fallback() sounds as though it's answering the question "am
> >> >    I functionally permitted to allocate from that other type" but in
> >> >    fact it is encoding a heuristic preference.
> >>
> >> Here I don't see that nuance tbh.
> >
> > I think I do.
>
> >>
> >> > 3. The same piece of data has different names in different places:
> >> >    can_steal vs whole_block. This reinforces point 2 because it looks
>
> I think some of the weirdness comes from the time before the migratetype
> hygiene series. IIRC it was possible to steal just the page we want to
> allocate, that and extra pages but not the whole block, or the whole
> block. Now it's either just the page or the whole block.
>
> >> >    like the different names reflect a shift in intent from "am I
> >> >    allowed to steal" to "do I want to steal", but no such shift exists.
> >> >
> >> > Fix 1. by avoiding the term "steal" in ambiguous contexts. This fixes
> >> > 3. automatically since the natural name for can_steal is whole_block.
> >>
> >> I'm not a fan of whole_block because it loses the action verb. It
> >> makes sense in the context of steal_suitable_fallback(), but becomes
> >> ominous in find_suitable_fallback().
> >>
> >> Maybe @block_claimable would be better?
>
> How about @claim_block? That's even more "action verb" as the verb is not
> passive. Sounds good to me.

Yeah that sounds good to me.

> >> > @@ -1948,7 +1948,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
> >> >  	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
> >> >  		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
> >> >
> >> > -	/* We are not allowed to try stealing from the whole block */
> >> > +	/* No point in stealing from the whole block */
> >>
> >> The original comment actually makes more sense to me. Why is there no
> >> point? It could well find enough free+alike pages to steal the
> >> block... It's just not allowed to.
> >
> > OK so this is basically point 2 from the commit message, maybe I don't
> > really understand why this condition is here, and maybe I'm about to
> > look stupid.
> >
> > Why don't we allow this code to steal the whole block? There wouldn't
> > be any functional bug if it did, right? I thought that !whole_block
> > just meant "flipping that block would not have any real benefit from a
> > fragmentation point of view". So we'd just be doing work and modifying
> > data structures for no good reason. Is there some stronger reason I'm
> > missing why we really _mustn't_ steal it?
>
> Agreed with your view.

Thanks, I hadn't looked at it this way. There is also this comment

 * If we are stealing a relatively large buddy page, it is likely there will
 * be more free pages in the pageblock, so try to steal them all.

explaining that it's about whether there is a point in trying.

> >> I will say, the current code is pretty hard to reason about:
> >>
> >> On one hand we check the block size statically in can_steal_fallback;
> >> on the other hand, we do that majority scan for compatible pages in
> >> steal_suitable_fallback(). The effective outcomes are hard to discern,
> >> and I'm honestly not convinced they're all intentional.
> >>
> >> For example, if we're allowed to steal the block because of this in
> >> can_steal_fallback():
> >>
> >> 	order >= pageblock_order/2
> >>
> >> surely, we'll always satisfy this in steal_suitable_fallback()
> >>
> >> 	free_pages + alike_pages >= (1 << (pageblock_order-1))
> >>
> >> on free_pages alone.
> >
> > No, the former is half the _order_, the latter is half the _number of
> > pages_. So e.g. we could have order=6 (assuming pageblock_order=10),
> > then free_pages might be only 1<<6, which is less than 1<<9.

Doh, absolute brainfart. I should have known better:

"On Fri, 14 Feb 2025 at 22:26, Johannes Weiner wrote:"
    ^^^                 ^^^^^

> Aha!
> Looks like this is also a leftover from before the migratetype hygiene
> series that went unnoticed. The logic was, if we're making an unmovable
> allocation stealing from a movable block, even if we don't claim the whole
> block, at least steal the highest order available. Then more unmovable
> allocations can be satisfied from what remains of the high-order split,
> before we have to steal from another movable pageblock.

Ah, right. So it was

1. buddy >= pageblock_order: steal free pages & claim type
2. buddy >= pageblock_order/2: steal free pages
3. otherwise: steal only the requested page

The hygiene patches eliminated case 2 by disallowing type mismatches
between freelists and the pages on them. That's why the pageblock_order/2
check looks a lot more spurious now than it used to.

> If we're making a movable allocation stealing from an unmovable pageblock,
> we don't need to make this buffer, as polluting unmovable pageblocks with
> movable allocations is not that bad - they can be compacted later. So we go
> for the smallest order we need.
>
> However IIUC this is all moot now. If we don't claim the whole pageblock,
> and split a high-order page, the remains of the split will go to the
> freelists of the migratetype of the unclaimed pageblock (i.e. movable);
> previously (before migratetype hygiene) they would go to the freelists of
> the migratetype we want to allocate (unmovable).
>
> So now it makes no sense to want the highest order if we're not claiming the
> whole pageblock, we're just causing more fragmentation for no good reason.
> We should always do the find_smallest. It would be good to fix this.

That's a good point. It still makes sense to have the two passes, though,
right? One pass where we try to steal a whole block starting from the
biggest buddies; and then one pass where we try to steal an individual
page starting from the smallest buddies.
Something like this, completely untested draft:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d6644aeaabec..f16e3f2bf3dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1911,13 +1911,12 @@ static inline bool boost_watermark(struct zone *zone)
  * can claim the whole pageblock for the requested migratetype. If not, we check
  * the pageblock for constituent pages; if at least half of the pages are free
  * or compatible, we can still claim the whole block, so pages freed in the
- * future will be put on the correct free list. Otherwise, we isolate exactly
- * the order we need from the fallback block and leave its migratetype alone.
+ * future will be put on the correct free list.
  */
 static struct page *
-steal_suitable_fallback(struct zone *zone, struct page *page,
-			int current_order, int order, int start_type,
-			unsigned int alloc_flags, bool whole_block)
+try_to_steal_block(struct zone *zone, struct page *page,
+		   int current_order, int order, int start_type,
+		   unsigned int alloc_flags)
 {
 	int free_pages, movable_pages, alike_pages;
 	unsigned long start_pfn;
@@ -1930,7 +1929,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	 * highatomic accounting.
 	 */
 	if (is_migrate_highatomic(block_type))
-		goto single_page;
+		return NULL;
 
 	/* Take ownership for orders >= pageblock_order */
 	if (current_order >= pageblock_order) {
@@ -1951,14 +1950,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 	if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD))
 		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);
 
-	/* We are not allowed to try stealing from the whole block */
-	if (!whole_block)
-		goto single_page;
-
 	/* moving whole block can fail due to zone boundary conditions */
 	if (!prep_move_freepages_block(zone, page, &start_pfn,
 				       &free_pages, &movable_pages))
-		goto single_page;
+		return NULL;
 
 	/*
 	 * Determine how many pages are compatible with our allocation.
@@ -1991,9 +1986,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page,
 		return __rmqueue_smallest(zone, order, start_type);
 	}
 
-single_page:
-	page_del_and_expand(zone, page, order, current_order, block_type);
-	return page;
+	return NULL;
 }
 
 /*
@@ -2216,23 +2209,16 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 		if (fallback_mt == -1)
 			continue;
 
-		/*
-		 * We cannot steal all free pages from the pageblock and the
-		 * requested migratetype is movable. In that case it's better to
-		 * steal and split the smallest available page instead of the
-		 * largest available page, because even if the next movable
-		 * allocation falls back into a different pageblock than this
-		 * one, it won't cause permanent fragmentation.
-		 */
-		if (!can_steal && start_migratetype == MIGRATE_MOVABLE
-					&& current_order > order)
+		if (!can_steal)
 			goto find_smallest;
 
-		goto do_steal;
+		page = get_page_from_free_area(area, fallback_mt);
+		page = try_to_steal_block(zone, page, current_order, order,
+					  start_migratetype, alloc_flags);
+		if (page)
+			goto out;
 	}
 
-	return NULL;
-
 find_smallest:
 	for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
 		area = &(zone->free_area[current_order]);
@@ -2248,13 +2234,9 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
 	 */
 	VM_BUG_ON(current_order > MAX_PAGE_ORDER);
 
-do_steal:
 	page = get_page_from_free_area(area, fallback_mt);
-
-	/* take off list, maybe claim block, expand remainder */
-	page = steal_suitable_fallback(zone, page, current_order, order,
-				       start_migratetype, alloc_flags, can_steal);
-
+	page_del_and_expand(zone, page, order, current_order, fallback_mt);
+out:
 	trace_mm_page_alloc_extfrag(page, order, current_order,
 				    start_migratetype, fallback_mt);