Date: Fri, 24 Apr 2026 10:32:27 +0200
From: Michal Hocko
To: "JP Kobryn (Meta)"
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, vbabka@kernel.org,
	willy@infradead.org, hannes@cmpxchg.org, shakeel.butt@linux.dev,
	riel@surriel.com, chrisl@kernel.org, kasong@tencent.com,
	shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
	baohua@kernel.org, youngjun.park@lge.com, qi.zheng@linux.dev,
	axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH] mm/lruvec: preemptively free dead folios during lru_add drain
In-Reply-To: <20260423164307.29805-1-jp.kobryn@linux.dev>
References: <20260423164307.29805-1-jp.kobryn@linux.dev>

On Thu 23-04-26 09:43:07, JP Kobryn (Meta) wrote:
> Of all observable lruvec lock contention in our fleet, we find that ~24%
> occurs when dead folios are present in lru_add batches at drain time. This
> is wasteful in the sense that the folio is added to the LRU only to be
> immediately removed via folios_put_refs(), incurring two unnecessary lock
> acquisitions.
>
> Eliminate this overhead by preemptively cleaning up dead folios before
> they make it into the LRU. Use folio_ref_freeze() to filter folios whose
> only remaining reference is the batch ref. When dead folios are found,
> move them off the add batch and onto a temporary batch to be freed.
>
> During A/B testing on one of our production Instagram workloads
> (high-frequency, short-lived requests), the patch intercepted almost all
> dead folios before they entered the LRU. Data collected with the
> mm_lru_insertion tracepoint shows the effectiveness of the patch:
>
> Per-host LRU add averages at 95% CPU load
> (60 hosts per side, 3 x 60s intervals)
>
>              dead folios/min   total folios/min   dead %
> unpatched:   1,297,785         19,341,986         6.7097%
> patched:     14                19,039,996         0.0001%
>
> Within this workload, we save ~2.6M lock acquisitions per minute per host
> as a result.
>
> System-wide memory stats on the patched side also improved at 95% CPU load:
> - direct reclaim scanning reduced 7%
> - allocation stalls reduced 5.2%
> - compaction stalls reduced 12.3%
> - page frees reduced 4.9%
>
> No regressions were observed in requests served per second or request tail
> latency (p99). Both metrics showed directional improvement at higher CPU
> utilization (comparing 85% to 95%).
>
> Signed-off-by: JP Kobryn (Meta)

Acked-by: Michal Hocko

Thanks!
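For anyone skimming the diff below: the key primitive is folio_ref_freeze(folio, 1),
which atomically drops the refcount to zero only if the batch reference is the last
one remaining. Any concurrent reference holder makes the freeze fail, so a live
folio can never be filtered by mistake. As a minimal userspace sketch of those
semantics (C11 atomics; the fake_* names are illustrative stand-ins, not the
kernel's implementation, which builds on page_ref_freeze()/atomic_cmpxchg):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	struct fake_folio {
		atomic_int refcount;
	};

	/*
	 * Sketch of the freeze semantics: succeed (pinning the count at
	 * zero) only when refcount == expected, i.e. the caller holds the
	 * sole remaining reference. A concurrent holder makes the
	 * compare-and-swap fail, so a live folio is never freed early.
	 */
	static bool fake_ref_freeze(struct fake_folio *folio, int expected)
	{
		return atomic_compare_exchange_strong(&folio->refcount,
						      &expected, 0);
	}

	int main(void)
	{
		struct fake_folio dead = { .refcount = 1 }; /* batch ref only */
		struct fake_folio live = { .refcount = 2 }; /* still in use */

		printf("dead: %s\n", fake_ref_freeze(&dead, 1) ? "frozen" : "kept");
		printf("live: %s\n", fake_ref_freeze(&live, 1) ? "frozen" : "kept");
		return 0;
	}

The failure path is the common case: live folios fall through to the normal LRU
add, and only truly dead ones take the new free path.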
> ---
>  mm/swap.c | 36 +++++++++++++++++++++++++++++++++++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/mm/swap.c b/mm/swap.c
> index 5cc44f0de9877..71607b0ce3d18 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -160,13 +160,36 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>  	int i;
>  	struct lruvec *lruvec = NULL;
>  	unsigned long flags = 0;
> +	struct folio_batch free_fbatch;
> +	bool is_lru_add = (move_fn == lru_add);
> +
> +	/*
> +	 * If we're adding to the LRU, preemptively filter dead folios. Use
> +	 * this dedicated folio batch for temp storage and deferred cleanup.
> +	 */
> +	if (is_lru_add)
> +		folio_batch_init(&free_fbatch);
>
>  	for (i = 0; i < folio_batch_count(fbatch); i++) {
>  		struct folio *folio = fbatch->folios[i];
>
>  		/* block memcg migration while the folio moves between lru */
> -		if (move_fn != lru_add && !folio_test_clear_lru(folio))
> +		if (!is_lru_add && !folio_test_clear_lru(folio))
> +			continue;
> +
> +		/*
> +		 * Filter dead folios by moving them from the add batch to the
> +		 * temp batch for freeing after this loop.
> +		 *
> +		 * Since the folio may be part of a huge page, unqueue from
> +		 * deferred split list to avoid a dangling list entry.
> +		 */
> +		if (is_lru_add && folio_ref_freeze(folio, 1)) {
> +			folio_unqueue_deferred_split(folio);
> +			fbatch->folios[i] = NULL;
> +			folio_batch_add(&free_fbatch, folio);
>  			continue;
> +		}
>
>  		folio_lruvec_relock_irqsave(folio, &lruvec, &flags);
>  		move_fn(lruvec, folio);
> @@ -176,6 +199,13 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
>
>  	if (lruvec)
>  		lruvec_unlock_irqrestore(lruvec, flags);
> +
> +	/* Cleanup filtered dead folios. */
> +	if (is_lru_add) {
> +		mem_cgroup_uncharge_folios(&free_fbatch);
> +		free_unref_folios(&free_fbatch);
> +	}
> +
>  	folios_put(fbatch);
>  }
>
> @@ -964,6 +994,10 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
>  		struct folio *folio = folios->folios[i];
>  		unsigned int nr_refs = refs ? refs[i] : 1;
>
> +		/* Folio batch entry may have been preemptively removed during drain. */
> +		if (!folio)
> +			continue;
> +
>  		if (is_huge_zero_folio(folio))
>  			continue;
>
> --
> 2.52.0

-- 
Michal Hocko
SUSE Labs