From: Roman Gushchin
To: Andrew Morton
Cc: Jan Kara, Matthew Wilcox, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Liu Shixin
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing
In-Reply-To: <20250710135713.916a4898fb02f595206ac861@linux-foundation.org> (Andrew Morton's message of "Thu, 10 Jul 2025 13:57:13 -0700")
References: <20250710195232.124790-1-roman.gushchin@linux.dev> <20250710135713.916a4898fb02f595206ac861@linux-foundation.org>
Date: Thu, 10 Jul 2025 15:54:17 -0700
Message-ID: <8734b3ac86.fsf@linux.dev>

Andrew Morton writes:

> On Thu, 10 Jul 2025 12:52:32 -0700 Roman Gushchin wrote:
>
>> We've noticed in production that under a very heavy memory pressure
>> the readahead behavior becomes unstable causing spikes in memory
>> pressure and CPU contention on zone locks.
>>
>> The current mmap_miss heuristics considers minor pagefaults as a
>> good reason to decrease mmap_miss and conditionally start async
>> readahead.
>> This creates a vicious cycle: asynchronous readahead
>> loads more pages, which in turn causes more minor pagefaults.
>> This problem is especially pronounced when multiple threads of
>> an application fault on consecutive pages of an evicted executable,
>> aggressively lowering the mmap_miss counter and preventing readahead
>> from being disabled.
>>
>> To improve the logic let's check for !uptodate and workingset
>> folios in do_async_mmap_readahead(). The presence of such pages
>> is a strong indicator of thrashing, which is also used by the
>> delay accounting code, e.g. in folio_wait_bit_common(). So instead
>> of decreasing mmap_miss and lower chances to disable readahead,
>> let's do the opposite and bump it by MMAP_LOTSAMISS / 2.
>
> Are there any testing results to share?

Nothing from production yet, but it makes a big difference for the
reproducer I use (authored by Greg Thelen), which basically runs a huge
binary with twice as many threads as CPUs in a very constrained memory
cgroup.

Without this change the system oscillates between performing more or
less well and being completely stuck on zone lock contention, with 256
threads all competing for a small number of pages. With this change the
system is pretty stable once it reaches the point where readahead is
disabled.

>
> What sort of workloads might be harmed by this change?

I hope none, but maybe I'm missing something.

>
> We do seem to be thrashing around (heh) with these readahead
> heuristics. Lots of potential for playing whack-a-mole.
>
> Should we make the readahead code more observable? We don't seem to
> have much in the way of statistics, counters, etc. And no tracepoints,
> which is surprising.

I think it's another good mm candidate (the first being oom killer
policies, working on it) for eventual bpf-ization. For example, I can
easily see that a policy specific to a file format can make a large
difference.

In this particular case I guess we can disable readahead based on
memory psi metrics, potentially all in bpf.

Thanks
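
For readers following the thread, the counter behavior described in the
quoted patch description can be roughly modeled in userspace C. This is
only an illustrative sketch, not the patch itself: struct ra_model and
the model_*() helpers are hypothetical names, the value 100 for
MMAP_LOTSAMISS and the 10x cap are assumptions about mm/filemap.c, and
the clamp applied after the bump is an assumption too.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to match mm/filemap.c; check the tree you are working on. */
#define MMAP_LOTSAMISS 100

/* Hypothetical stand-in for the per-file readahead state. */
struct ra_model {
	unsigned int mmap_miss;
};

/* Readahead is treated as disabled once the miss counter gets too high. */
bool model_readahead_enabled(const struct ra_model *ra)
{
	assert(ra);
	return ra->mmap_miss <= MMAP_LOTSAMISS;
}

/* Fault on a page missing from the page cache: count a miss (capped). */
void model_sync_fault(struct ra_model *ra)
{
	assert(ra);
	if (ra->mmap_miss < MMAP_LOTSAMISS * 10)
		ra->mmap_miss++;
}

/*
 * Minor fault on a page that is already in the page cache.
 * Without the change every such fault lowers mmap_miss. With the change
 * (patched == true), a !uptodate workingset folio is taken as a sign of
 * thrashing and the counter is bumped by MMAP_LOTSAMISS / 2 instead.
 */
void model_minor_fault(struct ra_model *ra, bool uptodate, bool workingset,
		       bool patched)
{
	assert(ra);
	if (patched && !uptodate && workingset) {
		ra->mmap_miss += MMAP_LOTSAMISS / 2;
		/* Clamp to the same cap as the miss path (an assumption). */
		if (ra->mmap_miss > MMAP_LOTSAMISS * 10)
			ra->mmap_miss = MMAP_LOTSAMISS * 10;
		return;
	}
	if (ra->mmap_miss)
		ra->mmap_miss--;
}
```

Alternating model_sync_fault() and model_minor_fault() on thrashing
folios keeps the counter near zero in the unpatched mode, so readahead
never gets disabled, while in the patched mode the counter crosses
MMAP_LOTSAMISS after a couple of rounds.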