Date: Tue, 16 Dec 2025 15:11:23 -0500
From: Johannes Weiner
To: Vlastimil Babka
Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan, David Rientjes, David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Mike Rapoport, Joshua Hahn, Pedro Falcato, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 1/2] mm, page_alloc, thp: prevent reclaim for __GFP_THISNODE THP allocations
Message-ID: <20251216201123.GI905277@cmpxchg.org>
References: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz> <20251216-thp-thisnode-tweak-v1-1-0e499d13d2eb@suse.cz>
In-Reply-To: <20251216-thp-thisnode-tweak-v1-1-0e499d13d2eb@suse.cz>

On Tue, Dec 16, 2025 at 04:54:21PM +0100, Vlastimil Babka wrote:
> Since commit cc638f329ef6 ("mm, thp: tweak reclaim/compaction effort of
> local-only and all-node allocations"), THP page fault allocations have
> settled on the following scheme (from the commit log):
>
> 1. local node only THP allocation with no reclaim, just compaction.
> 2. for madvised VMA's or when synchronous compaction is enabled always - THP
>    allocation from any node with effort determined by global defrag setting
>    and VMA madvise
> 3. fallback to base pages on any node
>
> Recent customer reports however revealed we have a gap in step 1 above.
> What we have seen is excessive reclaim due to THP page faults on a NUMA
> node that's close to its high watermark, while other nodes have plenty
> of free memory.
>
> The problem with step 1 is that it promises no reclaim after the
> compaction attempt, however reclaim is only avoided for certain
> compaction outcomes (deferred, or skipped due to insufficient free base
> pages), and not e.g. when compaction is actually performed but fails (we
> did see compact_fail vmstat counter increasing).
>
> THP page faults can therefore exhibit a zone_reclaim_mode-like behavior,
> which is not the intention.
>
> Thus add a check for __GFP_THISNODE that corresponds to this exact
> situation and prevents continuing with reclaim/compaction once the
> initial compaction attempt isn't successful in allocating the page.
>
> Note that commit cc638f329ef6 has not introduced this over-reclaim
> possibility; it appears to exist in some form since commit 2f0799a0ffc0
> ("mm, thp: restore node-local hugepage allocations"). Followup commits
> b39d0ee2632d ("mm, page_alloc: avoid expensive reclaim when compaction
> may not succeed") and cc638f329ef6 have moved in the right direction,
> but left the abovementioned gap.
>
> Fixes: 2f0799a0ffc0 ("mm, thp: restore node-local hugepage allocations")
> Signed-off-by: Vlastimil Babka

Acked-by: Johannes Weiner
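For readers following the step-1 logic in the quoted changelog, the decision can be modelled as a small standalone C sketch. This is a hypothetical model, not the kernel code: the enum `sketch_compact_result`, its values, and the helper `sketch_proceeds_to_reclaim()` are invented names that only loosely mirror the compaction outcomes the changelog mentions (deferred, skipped, failed), and every other gfp flag and retry path is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical outcomes of the initial compaction attempt. These only
 * loosely mirror the cases named in the changelog; they are NOT the
 * kernel's enum compact_result.
 */
enum sketch_compact_result {
	SKETCH_COMPACT_SUCCESS,  /* compaction yielded a free huge page */
	SKETCH_COMPACT_DEFERRED, /* compaction recently failed at this order */
	SKETCH_COMPACT_SKIPPED,  /* too few free base pages to compact */
	SKETCH_COMPACT_FAILED,   /* compaction ran but produced no huge page */
};

/*
 * Hypothetical model: after the initial compaction attempt failed to
 * provide a THP, does the slow path go on to direct reclaim?
 *   thisnode - the allocation carries __GFP_THISNODE (step 1)
 *   patched  - apply the extra check added by the patch
 */
static bool sketch_proceeds_to_reclaim(enum sketch_compact_result r,
				       bool thisnode, bool patched)
{
	/* Only reached when compaction did not already satisfy the request. */
	assert(r != SKETCH_COMPACT_SUCCESS);

	/* Pre-existing bail-out: compaction was deferred or skipped. */
	if (r == SKETCH_COMPACT_DEFERRED || r == SKETCH_COMPACT_SKIPPED)
		return false;

	/*
	 * The gap: compaction actually ran and failed. Without the fix,
	 * a node-local THP fault falls through to reclaim on that node.
	 * With the fix, __GFP_THISNODE stops here so the caller can move
	 * on to the next step of the scheme instead.
	 */
	if (patched && thisnode)
		return false;

	return true;
}
```

Under this model, a failed (not deferred or skipped) compaction on a __GFP_THISNODE fault leads to reclaim only in the unpatched configuration, which is the zone_reclaim_mode-like behavior the changelog describes.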