From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 20 Apr 2026 11:13:02 +0200
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
From: "Vlastimil Babka (SUSE)"
To: Matt Fleming
Cc: Andrew Morton, Christoph Hellwig, Jens Axboe, Sergey Senozhatsky,
 Roman Gushchin, Minchan Kim, kernel-team@cloudflare.com, Matt Fleming,
 Johannes Weiner, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
 Barry Song, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
 Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand, Qi Zheng,
 Shakeel Butt, Lorenzo Stoakes, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20260410101550.2930139-1-matt@readmodwrite.com>
 <6ca33173-145b-43aa-8a8a-34985d375246@kernel.org>

On 4/15/26 11:11, Matt Fleming wrote:
> On Mon, Apr 13, 2026 at 05:38:19PM +0200, Vlastimil Babka (SUSE) wrote:
>>
>> Hi Matt,
>>
>> so have you tested it for your use case with zram, and do you have any
>> observations on how it helped, what values you set, etc?
>
> Hey Vlastimil,
>
> Yeah, I've tested this out.
> So far, results have been positive -- I see system-wide OOM kills when
> memory is low and direct reclaim occurs, but not so many OOM kills that
> the SRE folks have started screaming at me.

Hmm...

> I've only run with the proposed 1% value so far. I also ran a bunch of
> benchmarks alongside a memory-hogging app that periodically touches
> anonymous memory.
>
> Workload                    rpp=0                 rpp=1                 Notes
> ----------------------------------------------------------------------------------------------
> Kernel compile + anon hog   Completed, no OOM     Completed,            Global OOM confirmed from
>                                                   Global OOM fired      __alloc_pages_slowpath

Completed in both cases... but was it faster? Also, what got OOM killed,
the hog?

> Memcached + anon hog        282k / 2.30M ops/s    562k / 3.53M ops/s    Global OOM killed hog,
>                             No OOM                Global OOM fired      then benchmark ran faster

The improvement is nice. However, even in the rpp=0 case there doesn't
seem to have been thrashing so bad that the system couldn't recover. At a
minimum, I think that's an argument against having this enabled by
default: by default we don't want to cause premature OOMs while the
system is still making progress. (And yes, we do have trouble recognizing
when it's not making progress and actually triggering the OOM killer.)
Trading an OOM kill of one workload for better throughput of another is a
good fit for certain kinds of servers/workloads, but not as a default.
And once you go down that road, you might be better off looking at the
PSI metrics, which would be more holistic than this heuristic?

> Pure fio (5 reruns each)    median 3710 MiB/s     median 3702 MiB/s     No reproducible regression
> Mixed fio + anon hog        2747 MiB/s            2915 MiB/s            Global OOM killed
>                                                                         unrelated services
>
> reclaim_progress_pct=1 seems to help in these memory-exhausted
> situations, and doesn't appear to cause a regression for the pure-file
> workload case.
>
> If you have any suggestions for other tests or benchmarks to run, I'd
> be happy to do that.
>
> Thanks,
> Matt
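
For concreteness, the PSI metrics mentioned above are exposed to
userspace via /proc/pressure/memory (documented in
Documentation/accounting/psi.rst). Below is a minimal userspace sketch of
reading them; the 10% threshold and the choice to key off the "full"
avg10 figure are illustrative assumptions, not something proposed in this
thread or in the patch under discussion.

/*
 * Minimal sketch: report the "full" memory-stall average from PSI.
 * Lines in /proc/pressure/memory look like (per psi.rst):
 *   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
 *   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
 * The 10.0 threshold below is an illustrative assumption.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256];
	double avg10;

	if (!f) {
		perror("/proc/pressure/memory");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* Matches only the "full" line; "some" fails the literal match. */
		if (sscanf(line, "full avg10=%lf", &avg10) == 1) {
			printf("full memory stall, 10s avg: %.2f%%\n", avg10);
			if (avg10 > 10.0)
				printf("sustained stalls: tasks are blocked on reclaim\n");
		}
	}
	fclose(f);
	return 0;
}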