Date: Tue, 26 May 2026 13:51:40 -0400
From: Johannes Weiner
To: "Vlastimil Babka (SUSE)"
Cc: Dmitry Ilvokhin, Andrew Morton, Suren Baghdasaryan, Michal Hocko,
    Brendan Jackman, Zi Yan, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2] mm/page_alloc: fix defrag_mode for non-reclaimable allocations
References: <20260520122228.201550-1-d@ilvokhin.com>
    <20260521165910.e7dea6a4e591d66293d2bd47@linux-foundation.org>
    <2aedfd17-17e6-4dfe-8ae5-c7342ead708b@kernel.org>
In-Reply-To: <2aedfd17-17e6-4dfe-8ae5-c7342ead708b@kernel.org>

On Tue, May 26, 2026 at 03:13:09PM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/22/26 3:05 PM, Dmitry Ilvokhin wrote:
> > On Thu, May 21, 2026 at 04:59:10PM -0700, Andrew Morton wrote:
> >> On Wed, 20 May 2026 12:22:28 +0000 Dmitry Ilvokhin wrote:
> >>
> >>> When defrag_mode is enabled, ALLOC_NOFRAGMENT is enforced to prevent
> >>> migratetype fallbacks and keep pageblocks clean. The allocator relies on
> >>> reclaim and compaction to free pages of the correct type before allowing
> >>> fallback as a last resort.
> >>>
> >>> However, non-reclaimable allocations such as GFP_ATOMIC cannot invoke
> >>> direct reclaim or compaction.
> >>> With defrag_mode=1, these allocations hit
> >>> the !can_direct_reclaim bailout in __alloc_pages_slowpath() with
> >>> ALLOC_NOFRAGMENT still set, and fail without ever attempting a fallback.
> >>>
> >>> This causes a large number of SLUB allocation failures for
> >>> skbuff_head_cache under network-heavy workloads, despite free memory
> >>> being available in other migratetype freelists.
> >>
> >> That sounds painful.
> >>
> >>> Clear ALLOC_NOFRAGMENT and retry for allocations that request kswapd
> >>> reclaim but cannot do direct reclaim themselves (GFP_ATOMIC). Purely
> >>> speculative allocations like GFP_TRANSHUGE_LIGHT that don't set
> >>> __GFP_KSWAPD_RECLAIM are left to fail, since they have reasonable
> >>> fallbacks and should not cause fragmentation.
> >>
> >> How serious is this to our users when running real-world workloads?
> >
> > We observed it on a few of the Meta workloads that adopted
> > defrag_mode=1.
>
> Do you (or Johannes) have some observations to share about what
> motivated those to adopt it, what kind of workloads benefit and how?

As you may remember it was developed to help with higher order / THP
success rates under pressure.

The impetus for actually deploying it was that we saw issues with
avalanches of large page cache folios vacuuming up the higher-order
chunks; this (ironically) also led to failures on the network side.

It's kind of a structural problem. We have real preproduction buffers
for order-0 pages through the watermarks. But for higher orders we
only ensure there is at least one page. That easily fails under even
mild competition.

Since we wanted to roll defrag_mode for THP in multi-tenant systems
anyway, we figured we might as well take the plunge now and battle
test the feature this way.

defrag_mode fixes *that* issue, by preproducing watermark buffers in
contiguous pageblocks - making everything up to that order more
readily available.

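To make the order-0 vs. higher-order asymmetry concrete, here is a
rough userspace sketch of that kind of watermark check. It is a toy
model and not the kernel's code: struct toy_zone, NR_ORDERS and all
function names are made up for illustration, and it ignores lowmem
reserves, migratetypes and the other details of the real check.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS 11 /* hypothetical: orders 0..10 */

/* Hypothetical, heavily simplified zone: free-block counts per order only. */
struct toy_zone {
	unsigned long nr_free[NR_ORDERS]; /* free blocks at each order */
	unsigned long min_wmark;          /* watermark, in base pages */
};

/* Total free memory in base pages, summed over all orders. */
static unsigned long toy_free_pages(const struct toy_zone *z)
{
	unsigned long pages = 0;

	for (int o = 0; o < NR_ORDERS; o++)
		pages += z->nr_free[o] << o;
	return pages;
}

static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order)
{
	/* Order-0: a real buffer of pages must remain above the watermark. */
	if (toy_free_pages(z) <= z->min_wmark)
		return false;
	if (order == 0)
		return true;

	/* Higher orders: a single free block of sufficient size is enough. */
	for (unsigned int o = order; o < NR_ORDERS; o++)
		if (z->nr_free[o])
			return true;
	return false;
}

/* One competing order-4 allocation is enough to flip the result. */
static void toy_demo(void)
{
	struct toy_zone z = { .nr_free = { [0] = 1000, [4] = 1 }, .min_wmark = 500 };

	assert(toy_watermark_ok(&z, 4));
	z.nr_free[4] = 0; /* somebody else took the only order-4 block */
	assert(toy_watermark_ok(&z, 0));
	assert(!toy_watermark_ok(&z, 4));
}
```

In this model the zone still looks healthy for order-0 after the one
order-4 block is gone, which is the "at least one page" problem.
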
I'm still hoping to make it the default eventually, which was the plan
with the original huge page allocator series. As we keep leaning into
higher order requests more and more, and especially grow the
non-optional ones, we kind of need non-optional preproduction
guarantees for higher orders as well.

But there are bugs like this one, and we're still figuring out some
overreclaim issues with it in production as well. So I'm glad it's
optional for the time being ;-)
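
For anyone skimming the thread, the condition described in the quoted
changelog reads roughly as below. This is a toy model written from the
changelog text, not the patch itself: the TOY_* bit values and the
function name are invented for illustration and do not match the
kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's real flags are defined differently. */
#define TOY_GFP_DIRECT_RECLAIM 0x1u
#define TOY_GFP_KSWAPD_RECLAIM 0x2u
#define TOY_ALLOC_NOFRAGMENT   0x1u

/*
 * Should the allocation drop the no-fragmentation restriction and retry?
 * Only when it cannot reclaim or compact by itself, the restriction is
 * still set, and it is not a purely speculative request.
 */
static bool toy_should_drop_nofragment(unsigned int gfp, unsigned int alloc_flags)
{
	bool can_direct_reclaim = (gfp & TOY_GFP_DIRECT_RECLAIM) != 0;

	if (can_direct_reclaim)
		return false; /* reclaim and compaction get to run first */
	if (!(alloc_flags & TOY_ALLOC_NOFRAGMENT))
		return false; /* nothing to clear */
	/* Atomic-style requests wake kswapd; speculative ones do not. */
	return (gfp & TOY_GFP_KSWAPD_RECLAIM) != 0;
}

static void toy_gate_demo(void)
{
	/* Atomic-like: kswapd reclaim only, so fall back instead of failing. */
	assert(toy_should_drop_nofragment(TOY_GFP_KSWAPD_RECLAIM,
					  TOY_ALLOC_NOFRAGMENT));
	/* Speculative: no reclaim flags at all, left to fail. */
	assert(!toy_should_drop_nofragment(0, TOY_ALLOC_NOFRAGMENT));
}
```
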