* Re: [PATCH v1 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios
From: Liam R. Howlett @ 2026-05-08 20:57 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: Michal Hocko, Minchan Kim, Suren Baghdasaryan, akpm, hca,
linux-s390, brauner, linux-mm, linux-kernel, timmurray,
Liam R. Howlett
On 26/04/30 08:08AM, David Hildenbrand (Arm) wrote:
> On 4/29/26 16:44, Michal Hocko wrote:
> > On Wed 29-04-26 15:07:04, David Hildenbrand wrote:
> >> On 4/29/26 12:33, Michal Hocko wrote:
> >>>
> >>> While the oom is the only current kernel user of MMF_UNSTABLE (in the
> >>> sense that it sets the flag), the flag should denote that page faults are
> >>> unreliable, because they might fault in fresh memory and the user would
> >>> lose the previous content without knowing that. Not sure MMF_OOM_REAPING
> >>> would reflect that reality better.
> >>
> >> We use it for failed fork() as well, but with slightly different semantics
> >> (page faults never made sense there).
>
> Well, there is a difference: a failed-fork process was never scheduled and will
> never get scheduled.
>
> In fact, we added the MMF_UNSTABLE to the fork path in
>
> commit 64c37e134b120fb462fb4a80694bfb8e7be77b14
> Author: Liam R. Howlett <liam@infradead.org>
> Date: Mon Jan 27 12:02:21 2025 -0500
>
> kernel: be more careful about dup_mmap() failures and uprobe registering
>
> If a memory allocation fails during dup_mmap(), the maple tree can be left
> in an unsafe state for other iterators besides the exit path. All the
> locks are dropped before the exit_mmap() call (in mm/mmap.c), but the
> incomplete mm_struct can be reached through (at least) the rmap finding
> the vmas which have a pointer back to the mm_struct.
>
> Up to this point, there have been no issues with being able to find an
> mm_struct that was only partially initialised. Syzbot was able to make
> the incomplete mm_struct fail with recent forking changes, so it has been
> proven unsafe to use the mm_struct that hasn't been initialised, as
> referenced in the link below.
>
> Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to
> invalid mm") fixed the uprobe access, it does not completely remove the
> race.
>
> This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the
> oom side (even though this is extremely unlikely to be selected as an oom
> victim in the race window), and sets MMF_UNSTABLE to avoid other potential
> users from using a partially initialised mm_struct.
>
> Which was later changed in
>
> commit 43873af772f8138c5cb4b76dde9c26339e89be3b
> Author: Liam R. Howlett <liam@infradead.org>
> Date: Wed Jan 21 11:49:42 2026 -0500
>
> mm: change dup_mmap() recovery
>
> When the dup_mmap() fails during the vma duplication or setup, don't write
> the XA_ZERO entry in the vma tree. Instead, destroy the tree and free the
> new resources, leaving an empty vma tree.
>
> Using XA_ZERO introduced races where the vma could be found between
> dup_mmap() dropping all locks and exit_mmap() taking the locks. The race
> can occur because the mm can be reached through the other trees via
> successfully copied vmas and other methods such as the swapoff code.
> ...
>
> and I am not sure if MMF_UNSTABLE is still required, as we don't leave these
> stale VMA copies in the maple tree.
>
> The process might look like just another process that is getting torn down now.
>
> But we'd have to learn from Liam :)
Yes, it will be an empty (zero-entry) tree now.
I left the flag to indicate that the mm is unstable, not to affect faulting,
but so that it is skipped by OOM events and process_mrelease(): neither
should bother doing anything in the window between the dup_mm() failure
and exit_mmap(), during which the write lock is dropped.
We can safely drop the flag now if you want to, because everything has
to deal with an empty vma tree anyway: a race can already occur between a
call to unmap everything and the task segfaulting.
>
>
> >
> > The bottom line is the same. Make sure the page fault fails rather than
> > silently providing potentially corrupted data.
> >
> >> Looking at the original patch here, using MMF_OOM_REAPING to modify zapping
> >> behavior would be clearer than MMF_UNSTABLE, I guess.
> >
> > Ohh, you mean to add a new flag, right?
>
> We could do that as well, if it's of any help.
I really think this goes back to the life cycle of the mm being somewhat
difficult to figure out. I'm fine with another flag.
Thanks,
Liam
* Re: [PATCH v1 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios
From: David Hildenbrand (Arm) @ 2026-05-11 13:05 UTC (permalink / raw)
To: Liam R. Howlett
Cc: Michal Hocko, Minchan Kim, Suren Baghdasaryan, akpm, hca,
linux-s390, brauner, linux-mm, linux-kernel, timmurray,
Liam R. Howlett
>>
>> But we'd have to learn from Liam :)
>
> Yes, it will be an empty (zero-entry) tree now.
>
> I left the flag to indicate that the mm is unstable, not to affect faulting,
> but so that it is skipped by OOM events and process_mrelease(): neither
> should bother doing anything in the window between the dup_mm() failure
> and exit_mmap(), during which the write lock is dropped.
>
> We can safely drop the flag now if you want to, because everything has
> to deal with an empty vma tree anyway: a race can already occur between a
> call to unmap everything and the task segfaulting.
Thanks for confirming!
>
>>
>>
>>>
>>> The bottom line is the same. Make sure the page fault fails rather than
>>> silently providing potentially corrupted data.
>>>
>>>
>>> Ohh, you mean to add a new flag, right?
>>
>> We could do that as well, if it's of any help.
>
> I really think this goes back to the life cycle of the mm being somewhat
> difficult to figure out.
Agreed.
> I'm fine with another flag.
Right, alternatively we could just turn the unstable flag into an "OOM is hiding
in the bushes to reap this MM" flag.
--
Cheers,
David
* Re: [PATCH v1 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios
From: Michal Hocko @ 2026-05-13 6:47 UTC (permalink / raw)
To: Liam R. Howlett
Cc: David Hildenbrand (Arm), Minchan Kim, Suren Baghdasaryan, akpm,
hca, linux-s390, brauner, linux-mm, linux-kernel, timmurray,
Liam R. Howlett
On Fri 08-05-26 22:57:31, Liam R. Howlett wrote:
> We can safely drop the flag now if you want to, because everything has
> to deal with an empty vma tree anyway: a race can already occur between a
> call to unmap everything and the task segfaulting.
Let's just drop it if it doesn't serve any real need anymore.
--
Michal Hocko
SUSE Labs