Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Roman Gushchin <roman.gushchin@linux.dev>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Matthew Wilcox <willy@infradead.org>,
	 linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Liu Shixin <liushixin2@huawei.com>
Subject: Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing
Date: Fri, 25 Jul 2025 16:25:49 -0700	[thread overview]
Message-ID: <87jz3vdf9e.fsf@linux.dev> (raw)
In-Reply-To: <875xffsxj4.fsf@linux.dev> (Roman Gushchin's message of "Fri, 25 Jul 2025 15:42:07 -0700")

Roman Gushchin <roman.gushchin@linux.dev> writes:

> Jan Kara <jack@suse.cz> writes:
>
>> On Thu 10-07-25 12:52:32, Roman Gushchin wrote:
>>> We've noticed in production that under a very heavy memory pressure
>>> the readahead behavior becomes unstable causing spikes in memory
>>> pressure and CPU contention on zone locks.
>>> 
>>> The current mmap_miss heuristics considers minor pagefaults as a
>>> good reason to decrease mmap_miss and conditionally start async
>>> readahead. This creates a vicious cycle: asynchronous readahead
>>> loads more pages, which in turn causes more minor pagefaults.
>>> This problem is especially pronounced when multiple threads of
>>> an application fault on consecutive pages of an evicted executable,
>>> aggressively lowering the mmap_miss counter and preventing readahead
>>> from being disabled.
>>
>> I think you're talking about filemap_map_pages() logic of handling
>> mmap_miss. It would be nice to mention it in the changelog. There's one
>> thing that doesn't quite make sense to me: When there's memory pressure,
>> I'd expect the pages to be reclaimed from memory and not just unmapped. 
>> Also given your solution uses !uptodate folios suggests the pages were
>> actually fully reclaimed and the problem really is that filemap_map_pages()
>> treats as minor page fault (i.e., cache hit) what is in fact a major page
>> fault (i.e., cache miss)?
>>
>> Actually, now that I digged deeper I've remembered that based on Liu
>> Shixin's report
>> (https://lore.kernel.org/all/20240201100835.1626685-1-liushixin2@huawei.com/)
>> which sounds a lot like what you're reporting, we have eventually merged his
>> fixes (ended up as commits 0fd44ab213bc ("mm/readahead: break read-ahead
>> loop if filemap_add_folio return -ENOMEM"), 5c46d5319bde ("mm/filemap:
>> don't decrease mmap_miss when folio has workingset flag")). Did you test a
>> kernel with these fixes (6.10 or later)? In particular after these fixes
>> the !folio_test_workingset() check in filemap_map_folio_range() and
>> filemap_map_order0_folio() should make sure we don't decrease mmap_miss
>> when faulting fresh pages. Or was in your case page evicted so long ago
>> that workingset bit is already clear?
>>
>> Once we better understand the situation, let me also mention that I have
>> two patches which I originally proposed to fix Liu's problems. They didn't
>> quite fix them so his patches got merged in the end but the problems
>> described there are still somewhat valid:
>
> Ok, I got a better understanding of the situation now. Basically we have
> a multi-threaded application which is under very heavy memory pressure.
> I multiple threads are faulting simultaneously into the same page,
> do_sync_mmap_readahead() can be called multiple times for the same page.
> This creates a negative pressure on the mmap_miss counter, which can't be
> matched by do_sync_mmap_readahead(), which is be called only once
> for every page. This basically keeps the readahead on, despite the heavy
> memory pressure.
>
> The following patch solves the problem, at least in my test scenario.
> Wdyt?

Actually, a better version is below. We don't have to avoid the actual
readahead, just not decrease mmap_miss if the page is locked.

--

diff --git a/mm/filemap.c b/mm/filemap.c
index 0d0369fb5fa1..1756690dd275 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3323,9 +3323,15 @@ static struct file *do_async_mmap_readahead(struct vm_fault *vmf,
        if (vmf->vma->vm_flags & VM_RAND_READ || !ra->ra_pages)
                return fpin;
 
-       mmap_miss = READ_ONCE(ra->mmap_miss);
-       if (mmap_miss)
-               WRITE_ONCE(ra->mmap_miss, --mmap_miss);
+       /* If folio is locked, we're likely racing against another fault,
+        * don't decrease the mmap_miss counter to avoid decreasing it
+        * multiple times for the same page and break the balance.
+        */
+       if (likely(!folio_test_locked(folio))) {
+               mmap_miss = READ_ONCE(ra->mmap_miss);
+               if (mmap_miss)
+                       WRITE_ONCE(ra->mmap_miss, --mmap_miss);
+       }
 
        if (folio_test_readahead(folio)) {
                fpin = maybe_unlock_mmap_for_io(vmf, fpin);

next prev parent reply	other threads:[~2025-07-25 23:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-10 19:52 [PATCH] mm: consider disabling readahead if there are signs of thrashing Roman Gushchin
2025-07-10 20:57 ` Andrew Morton
2025-07-10 22:54   ` Roman Gushchin
2025-07-10 21:43 ` Matthew Wilcox
2025-07-11 16:29   ` Roman Gushchin
2025-07-14 15:16 ` Jan Kara
2025-07-14 20:12   ` Roman Gushchin
2025-07-25 22:42   ` Roman Gushchin
2025-07-25 23:25     ` Roman Gushchin [this message]
2025-07-28  9:16       ` Jan Kara

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:0d0369fb5fa dfblob:1756690dd27 )
 OR (
bs:"Re: [PATCH] mm: consider disabling readahead if there are signs of thrashing" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87jz3vdf9e.fsf@linux.dev \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=liushixin2@huawei.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).