From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 15 Apr 2026 18:01:54 -0700
From: Shakeel Butt
To: Matt Fleming
Cc: Andrew Morton, Christoph Hellwig, Jens Axboe, Sergey Senozhatsky,
	Roman Gushchin, Minchan Kim, kernel-team@cloudflare.com,
	Matt Fleming, Johannes Weiner, Chris Li, Kairui Song, Kemeng Shi,
	Nhat Pham, Baoquan He, Barry Song, Vlastimil Babka,
	Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand, Qi Zheng,
	Lorenzo Stoakes, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
References: <20260410101550.2930139-1-matt@readmodwrite.com>
In-Reply-To: <20260410101550.2930139-1-matt@readmodwrite.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 10, 2026 at 11:15:49AM +0100, Matt Fleming wrote:
> From: Matt Fleming
>
> should_reclaim_retry() uses zone_reclaimable_pages() to estimate whether
> retrying reclaim could eventually satisfy an allocation. It's possible
> for reclaim to make minimal or no progress on an LRU type despite having
> ample reclaimable pages, e.g. anonymous pages when the only swap is
> RAM-backed (zram).

Or incompressible memory on zswap with writeback disabled or
overcommitted memory.min.

> This can cause the reclaim path to loop indefinitely.
>
> Track LRU reclaim progress (anon vs file) through a new struct
> reclaim_progress passed out of try_to_free_pages(), and only count a
> type's reclaimable pages if at least reclaim_progress_pct% was actually
> reclaimed in the last cycle.
>
> The threshold is exposed as /proc/sys/vm/reclaim_progress_pct (default
> 1, range 0-100).

Let's not expose any sysctl or user-visible API for this heuristic. It
will evolve, and then this interface would be awkward and hard to
remove.

> Setting 0 disables the gate and restores the previous
> behaviour. Environments with only RAM-backed swap (zram) and small
> memory may need a higher value to prevent futile anon LRU churn from
> keeping the allocator spinning.
>
> Suggested-by: Johannes Weiner
> Signed-off-by: Matt Fleming
> ---

[...]

>
> @@ -4637,7 +4672,24 @@ should_reclaim_retry(gfp_t gfp_mask, unsigned order,
>  			!__cpuset_zone_allowed(zone, gfp_mask))
>  				continue;
>
> -		available = reclaimable = zone_reclaimable_pages(zone);
> +		/*
> +		 * Only count reclaimable pages from an LRU type if reclaim
> +		 * actually made headway on that type in the last cycle.
> +		 * This prevents the allocator from looping endlessly on
> +		 * account of a large pool of pages that reclaim cannot make
> +		 * progress on, e.g. anonymous pages when the only swap is
> +		 * RAM-backed (zram).
> +		 */
> +		reclaimable = 0;
> +		reclaimable_file = zone_reclaimable_file_pages(zone);
> +		reclaimable_anon = zone_reclaimable_anon_pages(zone);

Here we are getting the current reclaimable pages.

> +
> +		if (reclaim_progress_sufficient(progress->nr_file, reclaimable_file))
> +			reclaimable += reclaimable_file;
> +		if (reclaim_progress_sufficient(progress->nr_anon, reclaimable_anon))
> +			reclaimable += reclaimable_anon;

And here we are comparing the current reclaimable pages with the
progress made in the last iteration. Is this intentional to keep things
simple?

> +
> +		available = reclaimable;
>  		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>

Another heuristic we can play with is to also pass through the vmscan
scan count. If, for a couple of consecutive iterations, we continue to
see low reclaim efficiency, go for OOM. Also, maybe compare the scan
count with the watermark: I expect we won't see much difference in scan
count across consecutive reclaim iterations, so it is a good
representative of reclaimable memory.

The reclaim efficiency heuristic should handle the swap-on-zram and
incompressible-zswap-with-no-writeback cases. Treating the scan count as
a proxy for reclaimable memory should handle the overcommitted
memory.min case.
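
To make the two ideas above concrete, here is a rough userspace sketch,
not kernel code. Every name in it (struct reclaim_stats, struct
retry_state, should_retry() and both thresholds) is hypothetical and
chosen only for illustration; the 1% efficiency cutoff and the
two-iteration streak are assumptions, not values from the patch.

```c
#include <stdbool.h>

/* Hypothetical per-iteration numbers handed back by reclaim. */
struct reclaim_stats {
	unsigned long nr_scanned;	/* pages vmscan looked at */
	unsigned long nr_reclaimed;	/* pages it actually freed */
};

/* Hypothetical state carried across retry iterations. */
struct retry_state {
	unsigned int low_eff_streak;	/* consecutive low-efficiency rounds */
};

#define LOW_EFFICIENCY_PCT	1	/* assumed cutoff: reclaimed/scanned < 1% */
#define MAX_LOW_EFF_ROUNDS	2	/* assumed "couple of iterations" */

/* Efficiency is low when less than LOW_EFFICIENCY_PCT% of scanned pages were freed. */
static bool efficiency_is_low(const struct reclaim_stats *s)
{
	if (s->nr_scanned == 0)
		return true;
	/* Cross-multiply to avoid integer division. */
	return s->nr_reclaimed * 100 < s->nr_scanned * LOW_EFFICIENCY_PCT;
}

/*
 * Return true if another reclaim attempt looks worthwhile, false if the
 * caller should give up and go for OOM.
 */
static bool should_retry(struct retry_state *st, const struct reclaim_stats *s,
			 unsigned long free_pages, unsigned long wmark)
{
	/* Heuristic 1: give up after repeated low-efficiency rounds. */
	if (efficiency_is_low(s)) {
		if (++st->low_eff_streak >= MAX_LOW_EFF_ROUNDS)
			return false;
	} else {
		st->low_eff_streak = 0;
	}

	/* Heuristic 2: use the scan count as a proxy for reclaimable memory. */
	return free_pages + s->nr_scanned > wmark;
}
```

In this sketch the efficiency streak is meant to cover the zram and
zswap cases, where pages are scanned but almost nothing is freed, while
the scan-count comparison is meant to cover memory.min, where protected
pages are never scanned in the first place.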