From: David Hildenbrand <david@redhat.com>
To: Hugh Dickins <hughd@google.com>, Will Deacon <will@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Keir Fraser <keirf@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Frederick Mayle <fmayle@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Xu <peterx@redhat.com>, Rik van Riel <riel@surriel.com>,
Vlastimil Babka <vbabka@suse.cz>, Ge Yang <yangge1116@126.com>
Subject: Re: [PATCH] mm/gup: Drain batched mlock folio processing before attempting migration
Date: Thu, 28 Aug 2025 10:59:20 +0200 [thread overview]
Message-ID: <a0d1d889-c711-494b-a85a-33cbde4688ba@redhat.com> (raw)
In-Reply-To: <8376d8a3-cc36-ae70-0fa8-427e9ca17b9b@google.com>

On 28.08.25 10:47, Hugh Dickins wrote:
> On Sun, 24 Aug 2025, Hugh Dickins wrote:
>> On Mon, 18 Aug 2025, Will Deacon wrote:
>>> On Mon, Aug 18, 2025 at 02:31:42PM +0100, Will Deacon wrote:
>>>> On Fri, Aug 15, 2025 at 09:14:48PM -0700, Hugh Dickins wrote:
>>>>> I think replace the folio_test_mlocked(folio) part of it by
>>>>> (folio_test_mlocked(folio) && !folio_test_unevictable(folio)).
>>>>> That should reduce the extra calls to a much more reasonable
>>>>> number, while still solving your issue.
>>>>
>>>> Alas, I fear that the folio may be unevictable by this point (which
>>>> seems to coincide with the readahead fault adding it to the LRU above)
>>>> but I can try it out.
>>>
>>> I gave this a spin but I still see failures with this change.
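
For reference, the narrowing Hugh suggests above amounts to something
like the following, in the style of the "drain once, then give up"
check on the GUP long-term pinning path. This is only a sketch of the
condition under discussion, not Will's actual patch, and the
surrounding loop and failure handling are elided:

	/*
	 * Sketch: drain the per-CPU batches at most once if the folio
	 * cannot be isolated, extending the existing !folio_test_lru()
	 * check to folios that are mlocked but not yet marked
	 * unevictable (i.e. likely still sitting in a per-CPU mlock
	 * batch).
	 */
	if (drain_allow &&
	    (!folio_test_lru(folio) ||
	     (folio_test_mlocked(folio) && !folio_test_unevictable(folio)))) {
		lru_add_drain_all();
		drain_allow = false;
	}

As Will notes above, the folio may already be unevictable by this
point, which is consistent with the narrowed check still missing the
failing case.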
>>
>> Many thanks, Will, for the precisely relevant traces (in which,
>> by the way, mapcount=0 really means _mapcount=0 hence mapcount=1).
>>
>> Yes, those do indeed illustrate a case which my suggested
>> (folio_test_mlocked(folio) && !folio_test_unevictable(folio))
>> failed to cover. Very helpful to have an example of that.
>>
>> And many thanks, David, for your reminder of commit 33dfe9204f29
>> ("mm/gup: clear the LRU flag of a page before adding to LRU batch").
>>
>> Yes, I strongly agree with your suggestion that the mlock batch
>> be brought into line with its change to the ordinary LRU batches,
>> and agree that doing so will be likely to solve Will's issue
>> (and similar cases elsewhere, without needing to modify them).
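
For anyone without 33dfe9204f29 fresh in mind, the rule it introduced
for the ordinary per-CPU LRU batches is roughly the following. This is
a simplified sketch of the pattern, not the exact mm/swap.c code;
lru_batch_add_sketch() and cpu_fbatch are stand-ins for the real
helpers and for the per-CPU folio_batch taken under its local lock:

	static void lru_batch_add_sketch(struct folio *folio,
					 struct folio_batch *cpu_fbatch)
	{
		/*
		 * Only the task that successfully clears the LRU flag
		 * may add the folio to a per-CPU batch, so a folio can
		 * sit in at most one batch at a time; whoever later
		 * drains that batch moves the folio and sets the LRU
		 * flag again.
		 */
		folio_get(folio);
		if (!folio_test_clear_lru(folio)) {
			/* Already claimed by another batch: skip the hint. */
			folio_put(folio);
			return;
		}
		folio_batch_add(cpu_fbatch, folio);	/* drained when full */
	}

Bringing the mlock batch "into line" would mean applying the same
claim-the-LRU-flag rule before a folio is added to the per-CPU mlock
batch.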
>>
>> Now I just have to cool my head and get back down into those
>> mlock batches. I am fearful that making a change there to suit
>> this case will turn out later to break another case (and I just
>> won't have time to redevelop as thorough a grasp of the races as
>> I had back then). But if we're lucky, applying that "one batch
>> at a time" rule will actually make it all more comprehensible.
>>
>> (I so wish we had spare room in struct page to keep the address
>> of that one batch entry, or the CPU to which that one batch
>> belongs: then, although that wouldn't eliminate all uses of
>> lru_add_drain_all(), it would allow us to efficiently extract
>> a target page from its LRU batch without a remote drain.)
>>
>> I have not yet begun to write such a patch, and I'm not yet sure
>> that it's even feasible: this mail sent to get the polite thank
>> yous out of my mind, to help clear it for getting down to work.
>
> It took several days in search of the least bad compromise, but
> in the end I concluded the opposite of what we'd intended above.
>
> There is a fundamental incompatibility between my 5.18 2fbb0c10d1e8
> ("mm/munlock: mlock_page() munlock_page() batch by pagevec")
> and Ge Yang's 6.11 33dfe9204f29
> ("mm/gup: clear the LRU flag of a page before adding to LRU batch").
>
> It turns out that the mm/swap.c folio batches (apart from lru_add)
> are all for best-effort operations, where it does not matter if one
> is missed; whereas mlock and munlock are more serious. Probably mlock
> could be (not very satisfactorily) converted, but then munlock? Because
> of failed folio_test_clear_lru()s, it would be far too likely to
> err on either side, munlocking too soon or too late.
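
Spelled out, the asymmetry looks something like this (hypothetical
helper name, purely to illustrate the shape of the problem):

	/*
	 * A best-effort batch user (rotation, activation, ...): if the
	 * folio is already claimed by some other batch, silently doing
	 * nothing is acceptable.
	 */
	static void best_effort_hint(struct folio *folio)
	{
		if (!folio_test_clear_lru(folio))
			return;		/* fine to miss */
		/* ... take the local lock, add to this CPU's batch ... */
	}

	/*
	 * mlock/munlock: the state change must not be lost.  If
	 * folio_test_clear_lru() fails here, the folio cannot simply be
	 * skipped; it would end up munlocked too soon or left mlocked
	 * too long, so the "claim the LRU flag or give up" rule does
	 * not transfer cleanly.
	 */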
>
> I've concluded that one or the other has to go. If we're having
> a beauty contest, there's no doubt that 33dfe9204f29 is much nicer
> than 2fbb0c10d1e8 (which is itself far from perfect). But functionally,
> I'm afraid that removing the mlock/munlock batching will show up as a
> perceptible regression in realistic workloads; and on consideration,
> I've found no real justification for the LRU flag clearing change.

Just to understand what you are saying: are you saying that we will go
back to having a folio being part of multiple LRU caches? :/ If so, I
really really hope that we can find another way and not go back to that
old handling.
--
Cheers
David / dhildenb
Thread overview: 20+ messages
2025-08-15 10:18 [PATCH] mm/gup: Drain batched mlock folio processing before attempting migration Will Deacon
2025-08-16 1:03 ` John Hubbard
2025-08-16 4:33 ` Hugh Dickins
2025-08-18 13:38 ` Will Deacon
2025-08-16 4:14 ` Hugh Dickins
2025-08-16 8:15 ` David Hildenbrand
2025-08-18 13:31 ` Will Deacon
2025-08-18 14:31 ` Will Deacon
2025-08-25 1:25 ` Hugh Dickins
2025-08-25 16:04 ` David Hildenbrand
2025-08-28 8:47 ` Hugh Dickins
2025-08-28 8:59 ` David Hildenbrand [this message]
2025-08-28 16:12 ` Hugh Dickins
2025-08-28 20:38 ` David Hildenbrand
2025-08-29 1:58 ` Hugh Dickins
2025-08-29 8:56 ` David Hildenbrand
2025-08-29 11:57 ` Will Deacon
2025-08-29 13:21 ` Will Deacon
2025-08-29 16:04 ` Hugh Dickins
2025-08-29 15:46 ` Hugh Dickins