From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 20 Apr 2026 11:13:02 +0200
Subject: Re: [PATCH] mm: Require LRU reclaim progress before retrying direct reclaim
From: "Vlastimil Babka (SUSE)"
To: Matt Fleming
Cc: Andrew Morton, Christoph Hellwig, Jens Axboe, Sergey Senozhatsky,
 Roman Gushchin, Minchan Kim, kernel-team@cloudflare.com, Matt Fleming,
 Johannes Weiner, Chris Li, Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He,
 Barry Song, Suren Baghdasaryan, Michal Hocko, Brendan Jackman, Zi Yan,
 Axel Rasmussen, Yuanchu Xie, Wei Xu, David Hildenbrand, Qi Zheng,
 Shakeel Butt, Lorenzo Stoakes, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20260410101550.2930139-1-matt@readmodwrite.com>
 <6ca33173-145b-43aa-8a8a-34985d375246@kernel.org>

On 4/15/26 11:11, Matt Fleming wrote:
> On Mon, Apr 13, 2026 at 05:38:19PM +0200, Vlastimil Babka (SUSE) wrote:
>>
>> Hi Matt,
>>
>> so have you tested it for your use case with zram, and do you have any
>> observations on how it helped, what values you set, etc?
>
> Hey Vlastimil,
>
> Yeah, I've tested this out.
> So far, results have been positive -- I see system-wide OOM kills when
> memory is low and direct reclaim occurs, but not so many OOM kills that
> the SRE folks have started screaming at me.

Hmm...

> I've only run with the proposed 1% value so far. I also ran a bunch of
> benchmarks alongside a memory-hogging app that periodically touches
> anonymous memory.
>
> Workload                    rpp=0                 rpp=1                 Notes
> ----------------------------------------------------------------------------------------------
> Kernel compile + anon hog   Completed, no OOM     Completed,            Global OOM confirmed from
>                                                   Global OOM fired      __alloc_pages_slowpath

Completed in both cases... but was it faster? Also, what got OOM killed,
the hog?

> Memcached + anon hog        282k / 2.30M ops/s    562k / 3.53M ops/s    Global OOM killed hog,
>                             No OOM                Global OOM fired      then benchmark ran faster

The improvement is nice. However, even in the rpp=0 case there doesn't
seem to have been thrashing so bad that the system couldn't recover. At a
minimum, I think that's an argument against having this enabled by
default: by default we don't want to cause premature OOMs while the
system is still making progress. (And yes, we do have trouble recognizing
when it's not making progress and actually triggering the OOM killer.)
Trading an OOM kill of one workload for better throughput of another is a
good fit for certain kinds of servers/workloads, but not as a default.
And once you go down that road, you might be better off looking at the
PSI metrics, which would be more holistic than this heuristic?

> Pure fio (5 reruns each)    median 3710 MiB/s     median 3702 MiB/s     No reproducible regression
> Mixed fio + anon hog        2747 MiB/s            2915 MiB/s            Global OOM killed
>                                                                         unrelated services
>
> reclaim_progress_pct=1 seems to help in these memory-exhausted
> situations, and doesn't appear to cause a regression for the pure-file
> workload case.
>
> If you have any suggestions for other tests or benchmarks to run, I'd
> be happy to do that.
>
> Thanks,
> Matt
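
For concreteness, the PSI metrics mentioned above are exposed to
userspace via /proc/pressure/memory (documented in
Documentation/accounting/psi.rst). Below is a minimal userspace sketch of
reading them; the 10% threshold and the choice to key off the "full"
avg10 figure are illustrative assumptions, not something proposed in this
thread or in the patch under discussion.

/*
 * Minimal sketch: report the "full" memory-stall average from PSI.
 * Lines in /proc/pressure/memory look like (per psi.rst):
 *   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
 *   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
 * The 10.0 threshold below is an illustrative assumption.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/pressure/memory", "r");
	char line[256];
	double avg10;

	if (!f) {
		perror("/proc/pressure/memory");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		/* Matches only the "full" line; "some" fails the literal match. */
		if (sscanf(line, "full avg10=%lf", &avg10) == 1) {
			printf("full memory stall, 10s avg: %.2f%%\n", avg10);
			if (avg10 > 10.0)
				printf("sustained stalls: tasks are blocked on reclaim\n");
		}
	}
	fclose(f);
	return 0;
}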