Date: Wed, 17 Dec 2025 11:35:12 -0500
From: Johannes Weiner
To: Vlastimil Babka
Cc: Andrew Morton, Suren Baghdasaryan, Michal Hocko, Brendan Jackman,
	Zi Yan, David Rientjes, David Hildenbrand, Lorenzo Stoakes,
	"Liam R. Howlett", Mike Rapoport, Joshua Hahn, Pedro Falcato,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 2/2] mm, page_alloc: fail costly __GFP_NORETRY allocations faster
References: <20251216-thp-thisnode-tweak-v1-0-0e499d13d2eb@suse.cz>
	<20251216-thp-thisnode-tweak-v1-2-0e499d13d2eb@suse.cz>
	<20251216203243.GJ905277@cmpxchg.org>
	<9881b540-7e22-404b-aeaa-282dc5eeb5d5@suse.cz>
In-Reply-To: <9881b540-7e22-404b-aeaa-282dc5eeb5d5@suse.cz>

On Wed, Dec 17, 2025 at 09:46:34AM +0100, Vlastimil Babka wrote:
> On 12/16/25 21:32, Johannes Weiner wrote:
> > On Tue, Dec 16, 2025 at 04:54:22PM +0100, Vlastimil Babka wrote:
> >> It might make therefore more sense to just fail unconditionally after
> >> the initial compaction attempt, so do that instead. Costly allocations
> >> that do want the reclaim/compaction to happen at least once can omit
> >> __GFP_NORETRY, or even specify __GFP_RETRY_MAYFAIL for more than one
> >> attempt.
> >>
> >> There is a slight potential unfairness in that costly __GFP_NORETRY
> >> allocations that can't perform direct compaction (i.e. lack __GFP_IO)
> >> will still be allowed to direct reclaim, while those that can direct
> >> compact will now never attempt direct reclaim. However, in cases of
> >> memory pressure causing compaction to be skipped due to insufficient
> >> base pages, direct reclaim was already not done before, so there should
> >> be no functional regressions from this change.
> >
> > Hm, kind of. There could be enough basepages for compaction_suitable()
> > but compaction odds are still higher with more free pages. So there
> > might be cases it regresses.
> >
> > __GFP_NORETRY semantics say it'll try reclaim at least once. We should
> > be able to keep that and still simplify, no?
> >
> >>  	if (costly_order && (gfp_mask & __GFP_NORETRY)) {
> >> -		if (gfp_mask & __GFP_THISNODE)
> >> -			goto nopage;
> >> +		goto nopage;
> >
> > IOW, maybe directly select for the NUMA-THP special case here?
> >
> > 	/* Optimistic node-local huge page - only compact once */
> > 	if (costly_order &&
> > 	    ((gfp_mask & (__GFP_NORETRY|__GFP_THISNODE)) ==
> > 	     (__GFP_NORETRY|__GFP_THISNODE)))
> > 		goto nopage;
> >
> > and then let other __GFP_NORETRY fall through.
>
> I did consider it as an alternative when realizing the potential unfairness
> mentioned above, but then went with the simpler code option.
>
> With your suggestion we keep the THP-specific check but at least remove the
> arguably illogical compaction feedback.

Yes, I'm in favor of removing those either way.

Reclaim makes its own decisions around costly orders. For example, it
targets a higher number of free pages through compaction_ready() than
where compaction would return SKIPPED, to account for concurrency. I
don't think the allocator should have conflicting opinions.

Regarding __GFP_NORETRY: I think it would just be a chance to simplify
the mental model around it again.
If somebody does a NORETRY request when memory is full of stale page
cache, I think it's reasonable to expect at least one shot at dropping
some cache to make it happen. Shortcutting directly to compaction is a
good optimization when we suspect it could succeed without requiring
reclaim. But I'm not sure it's reasonable to ONLY do that and give up.

Btw, I do wonder why that up-front compaction run is so explicit, when
we have __alloc_pages_direct_reclaim() and __alloc_pages_direct_compact()
calls following below. Couldn't we check for conditions upfront and
set a flag to skip reclaim initially? Then handle priority adjustments
in the retry conditions?

IOW, something like:

	unsigned long did_some_progress = 0;

	if (can_compact && costly_order)
		skip_reclaim = true;
	if (can_compact && order > 0 && ac->migratetype != MIGRATE_MOVABLE)
		skip_reclaim = true;
	if (gfp_thisnode_noretry(gfp_mask))
		skip_reclaim = true;

retry:
	page = get_page_from_freelist(..., alloc_flags, ...);
	if (page)
		goto got_pg;

	if (!skip_reclaim) {
		page = __alloc_pages_direct_reclaim(..., &did_some_progress);
		if (page)
			goto got_pg;
	}

	page = __alloc_pages_direct_compact(...);
	if (page)
		goto got_pg;

	if (should_loop()) {
		skip_reclaim = false;
		compact_priority = ...;
		goto retry;
	}

That would naturally get rid of the gfp_pfmemalloc_allowed() branch
for the upfront check as well, because the ALLOC_NO_WATERMARKS attempt
happens before we do the reclaim/compaction calls.
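
The upfront conditions in the sketch above can be modeled in plain
userspace C. This is only an illustration under stated assumptions: the
DEMO_GFP_* bit values are made up and do not match the kernel's real
__GFP_* definitions, and demo_thisnode_noretry() and demo_skip_reclaim()
are hypothetical names standing in for the gfp_thisnode_noretry() helper
and the three skip_reclaim checks, not existing kernel functions.

```c
#include <stdbool.h>

/*
 * Made-up flag bits, for illustration only. The kernel's real
 * __GFP_NORETRY / __GFP_THISNODE / __GFP_IO values are different.
 */
#define DEMO_GFP_NORETRY	(1u << 0)
#define DEMO_GFP_THISNODE	(1u << 1)
#define DEMO_GFP_IO		(1u << 2)

/*
 * Hypothetical stand-in for gfp_thisnode_noretry(): true only when
 * BOTH bits are set, same shape as the combined mask test quoted
 * earlier in the thread.
 */
static bool demo_thisnode_noretry(unsigned int gfp_mask)
{
	const unsigned int both = DEMO_GFP_NORETRY | DEMO_GFP_THISNODE;

	return (gfp_mask & both) == both;
}

/*
 * Hypothetical model of the three upfront checks: returns whether the
 * first pass would skip direct reclaim and go straight to compaction.
 * "movable" stands in for ac->migratetype == MIGRATE_MOVABLE.
 */
static bool demo_skip_reclaim(bool can_compact, bool costly_order,
			      unsigned int order, bool movable,
			      unsigned int gfp_mask)
{
	if (can_compact && costly_order)
		return true;
	if (can_compact && order > 0 && !movable)
		return true;
	if (demo_thisnode_noretry(gfp_mask))
		return true;
	return false;
}
```

In this model, a request with only DEMO_GFP_NORETRY set that cannot
compact does not skip reclaim on the first pass, which matches the
"at least one shot at reclaim" expectation discussed above.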