Date: Tue, 16 Dec 2025 17:26:52 +0100
From: Michal Hocko
To: Vlastimil Babka
Cc: Andrew Morton, Suren Baghdasaryan, Brendan Jackman, Johannes Weiner,
	Zi Yan, David Rientjes, David Hildenbrand, Lorenzo Stoakes,
	"Liam R. Howlett", Mike Rapoport, Joshua Hahn, Pedro Falcato,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 1/2] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations
References: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
	<20251216-thp-thisnode-tweak-v1-1-0e499d13d2eb@suse.cz>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20251216-thp-thisnode-tweak-v1-1-0e499d13d2eb@suse.cz>

On Tue 16-12-25 16:54:21, Vlastimil Babka wrote:
> Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of
> local-only and all-node allocations"), THP page fault allocations have
> settled on the following scheme (from the commit log):
>
> 1. local node only THP allocation with no reclaim, just compaction.
> 2. for madvised VMA's or when synchronous compaction is enabled always - THP
>    allocation from any node with effort determined by global defrag setting
>    and VMA madvise
> 3. fallback to base pages on any node
>
> Recent customer reports however revealed we have a gap in step 1 above.
> What we have seen is excessive reclaim due to THP page faults on a NUMA
> node that's close to its high watermark, while other nodes have plenty
> of free memory.
>
> The problem with step 1 is that it promises no reclaim after the
> compaction attempt, however reclaim is only avoided for certain
> compaction outcomes (deferred, or skipped due to insufficient free base
> pages), and not e.g. when compaction is actually performed but fails (we
> did see compact_fail vmstat counter increasing).
>
> THP page faults can therefore exhibit a zone_reclaim_mode-like behavior,
> which is not the intention.
>
> Thus add a check for __GFP_THISNODE that corresponds to this exact
> situation and prevents continuing with reclaim/compaction once the
> initial compaction attempt isn't successful in allocating the page.
>
> Note that commit cc638f329ef6 has not introduced this over-reclaim
> possibility; it appears to exist in some form since commit 2f0799a0ffc0
> ("mm, thp: restore node-local hugepage allocations"). Followup commits
> b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction
> may not succeed") and cc638f329ef6 have moved in the right direction,
> but left the abovementioned gap.
>
> Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
> Signed-off-by: Vlastimil Babka

Yes, this makes sense as an intermediate state (to make a fix for stable
and other older kernels that might be interested in the fix).
I would have objected that we should just simplify this whole thing, but
you have done that in patch 2.

Acked-by: Michal Hocko
Thanks

> ---
>  mm/page_alloc.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 822e05f1a964..e6fd1213328b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4788,6 +4788,20 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  			    compact_result == COMPACT_DEFERRED)
>  				goto nopage;
>
> +			/*
> +			 * THP page faults may attempt local node only first,
> +			 * but are then allowed to only compact, not reclaim,
> +			 * see alloc_pages_mpol()
> +			 *
> +			 * compaction can fail for other reasons than those
> +			 * checked above and we don't want such THP allocations
> +			 * to put reclaim pressure on a single node in a
> +			 * situation where other nodes might have plenty of
> +			 * available memory
> +			 */
> +			if (gfp_mask & __GFP_THISNODE)
> +				goto nopage;
> +
>  			/*
>  			 * Looks like reclaim/compaction is worth trying, but
>  			 * sync compaction could be very expensive, so keep
>
> --
> 2.52.0

-- 
Michal Hocko
SUSE Labs