* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
@ 2025-06-03 18:29 ` Matthew Wilcox
2025-06-03 18:37 ` David Hildenbrand
2025-06-03 19:03 ` Jann Horn
2025-06-03 18:33 ` David Hildenbrand
` (3 subsequent siblings)
4 siblings, 2 replies; 24+ messages in thread
From: Matthew Wilcox @ 2025-06-03 18:29 UTC (permalink / raw)
To: Jann Horn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, Peter Xu,
linux-kernel, stable
On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.
> 2. After we've copied this page, future writes to other pages may
> continue to be visible to the child while future writes to this page are
> no longer visible to the child.
>
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
Um, OK, but isn't that expected behaviour? POSIX says:
: A process shall be created with a single thread. If a multi-threaded
: process calls fork(), the new process shall contain a replica of the
: calling thread and its entire address space, possibly including the
: states of mutexes and other resources. Consequently, the application
: shall ensure that the child process only executes async-signal-safe
: operations until such time as one of the exec functions is successful.
It's always been my understanding that you really, really shouldn't call
fork() from a multithreaded process.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:29 ` Matthew Wilcox
@ 2025-06-03 18:37 ` David Hildenbrand
2025-06-03 19:09 ` Jann Horn
2025-06-03 19:03 ` Jann Horn
1 sibling, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-06-03 18:37 UTC (permalink / raw)
To: Matthew Wilcox, Jann Horn
Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
Peter Xu, linux-kernel, stable
On 03.06.25 20:29, Matthew Wilcox wrote:
> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
>> When fork() encounters possibly-pinned pages, those pages are immediately
>> copied instead of just marking PTEs to make CoW happen later. If the parent
>> is multithreaded, this can cause the child to see memory contents that are
>> inconsistent in multiple ways:
>>
>> 1. We are copying the contents of a page with a memcpy() while userspace
>> may be writing to it. This can cause the resulting data in the child to
>> be inconsistent.
>> 2. After we've copied this page, future writes to other pages may
>> continue to be visible to the child while future writes to this page are
>> no longer visible to the child.
>>
>> This means the child could theoretically see incoherent states where
>> allocator freelists point to objects that are actually in use or stuff like
>> that. A mitigating factor is that, unless userspace already has a deadlock
>> bug, userspace can pretty much only observe such issues when fancy lockless
>> data structures are used (because if another thread was in the middle of
>> mutating data during fork() and the post-fork child tried to take the mutex
>> protecting that data, it might wait forever).
>
> Um, OK, but isn't that expected behaviour? POSIX says:
>
> : A process shall be created with a single thread. If a multi-threaded
> : process calls fork(), the new process shall contain a replica of the
> : calling thread and its entire address space, possibly including the
> : states of mutexes and other resources. Consequently, the application
> : shall ensure that the child process only executes async-signal-safe
> : operations until such time as one of the exec functions is successful.
>
> It's always been my understanding that you really, really shouldn't call
> fork() from a multithreaded process.
I have the same recollection, but rather because of concurrent O_DIRECT
and locking (pthread_atfork ...).
Using the allocator above example: what makes sure that no other thread
is halfway through modifying allocator state? You really have to sync
somehow before calling fork() -- e.g., grabbing allocator locks in
pthread_atfork().
For Linux we document in the man page
"After a fork() in a multithreaded program, the child can safely call
only async-signal-safe functions (see signal-safety(7)) until such time
as it calls execve(2)."
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:37 ` David Hildenbrand
@ 2025-06-03 19:09 ` Jann Horn
2025-06-03 20:17 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: Jann Horn @ 2025-06-03 19:09 UTC (permalink / raw)
To: David Hildenbrand
Cc: Matthew Wilcox, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Peter Xu, linux-kernel, stable
On Tue, Jun 3, 2025 at 8:37 PM David Hildenbrand <david@redhat.com> wrote:
> On 03.06.25 20:29, Matthew Wilcox wrote:
> > On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> >> When fork() encounters possibly-pinned pages, those pages are immediately
> >> copied instead of just marking PTEs to make CoW happen later. If the parent
> >> is multithreaded, this can cause the child to see memory contents that are
> >> inconsistent in multiple ways:
> >>
> >> 1. We are copying the contents of a page with a memcpy() while userspace
> >> may be writing to it. This can cause the resulting data in the child to
> >> be inconsistent.
> >> 2. After we've copied this page, future writes to other pages may
> >> continue to be visible to the child while future writes to this page are
> >> no longer visible to the child.
> >>
> >> This means the child could theoretically see incoherent states where
> >> allocator freelists point to objects that are actually in use or stuff like
> >> that. A mitigating factor is that, unless userspace already has a deadlock
> >> bug, userspace can pretty much only observe such issues when fancy lockless
> >> data structures are used (because if another thread was in the middle of
> >> mutating data during fork() and the post-fork child tried to take the mutex
> >> protecting that data, it might wait forever).
> >
> > Um, OK, but isn't that expected behaviour? POSIX says:
> >
> > : A process shall be created with a single thread. If a multi-threaded
> > : process calls fork(), the new process shall contain a replica of the
> > : calling thread and its entire address space, possibly including the
> > : states of mutexes and other resources. Consequently, the application
> > : shall ensure that the child process only executes async-signal-safe
> > : operations until such time as one of the exec functions is successful.
> >
> > It's always been my understanding that you really, really shouldn't call
> > fork() from a multithreaded process.
>
> I have the same recollection, but rather because of concurrent O_DIRECT
> and locking (pthread_atfork ...).
>
> Using the allocator above example: what makes sure that no other thread
> is halfway through modifying allocator state? You really have to sync
> somehow before calling fork() -- e.g., grabbing allocator locks in
> pthread_atfork().
Yeah, like what glibc does for its malloc implementation to prevent
allocator calls from racing with fork(), so that malloc() keeps
working after fork(), even though POSIX says that the libc doesn't
have to guarantee that.
> For Linux we document in the man page
>
> "After a fork() in a multithreaded program, the child can safely call
> only async-signal-safe functions (see signal-safety(7)) until such time
> as it calls execve(2)."
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 19:09 ` Jann Horn
@ 2025-06-03 20:17 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-06-03 20:17 UTC (permalink / raw)
To: Jann Horn
Cc: Matthew Wilcox, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Peter Xu, linux-kernel, stable
On 03.06.25 21:09, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 8:37 PM David Hildenbrand <david@redhat.com> wrote:
>> On 03.06.25 20:29, Matthew Wilcox wrote:
>>> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
>>>> When fork() encounters possibly-pinned pages, those pages are immediately
>>>> copied instead of just marking PTEs to make CoW happen later. If the parent
>>>> is multithreaded, this can cause the child to see memory contents that are
>>>> inconsistent in multiple ways:
>>>>
>>>> 1. We are copying the contents of a page with a memcpy() while userspace
>>>> may be writing to it. This can cause the resulting data in the child to
>>>> be inconsistent.
>>>> 2. After we've copied this page, future writes to other pages may
>>>> continue to be visible to the child while future writes to this page are
>>>> no longer visible to the child.
>>>>
>>>> This means the child could theoretically see incoherent states where
>>>> allocator freelists point to objects that are actually in use or stuff like
>>>> that. A mitigating factor is that, unless userspace already has a deadlock
>>>> bug, userspace can pretty much only observe such issues when fancy lockless
>>>> data structures are used (because if another thread was in the middle of
>>>> mutating data during fork() and the post-fork child tried to take the mutex
>>>> protecting that data, it might wait forever).
>>>
>>> Um, OK, but isn't that expected behaviour? POSIX says:
>>>
>>> : A process shall be created with a single thread. If a multi-threaded
>>> : process calls fork(), the new process shall contain a replica of the
>>> : calling thread and its entire address space, possibly including the
>>> : states of mutexes and other resources. Consequently, the application
>>> : shall ensure that the child process only executes async-signal-safe
>>> : operations until such time as one of the exec functions is successful.
>>>
>>> It's always been my understanding that you really, really shouldn't call
>>> fork() from a multithreaded process.
>>
>> I have the same recollection, but rather because of concurrent O_DIRECT
>> and locking (pthread_atfork ...).
>>
>> Using the allocator above example: what makes sure that no other thread
>> is halfway through modifying allocator state? You really have to sync
>> somehow before calling fork() -- e.g., grabbing allocator locks in
>> pthread_atfork().
>
> Yeah, like what glibc does for its malloc implementation to prevent
> allocator calls from racing with fork(), so that malloc() keeps
> working after fork(), even though POSIX says that the libc doesn't
> have to guarantee that.
I mean, the patch here is simple, and there is already a performance
penalty when allocating+copying the page, so it's not really the common
hot path.
Merely a question if this was ever officially supported and warrents a
"Fixes:".
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:29 ` Matthew Wilcox
2025-06-03 18:37 ` David Hildenbrand
@ 2025-06-03 19:03 ` Jann Horn
2025-06-04 12:22 ` David Hildenbrand
1 sibling, 1 reply; 24+ messages in thread
From: Jann Horn @ 2025-06-03 19:03 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, Peter Xu,
linux-kernel, stable
On Tue, Jun 3, 2025 at 8:29 PM Matthew Wilcox <willy@infradead.org> wrote:
> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> > may be writing to it. This can cause the resulting data in the child to
> > be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> > continue to be visible to the child while future writes to this page are
> > no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
>
> Um, OK, but isn't that expected behaviour? POSIX says:
I don't think it is expected behavior that locklessly-updated data
structures in application code could break.
> : A process shall be created with a single thread. If a multi-threaded
> : process calls fork(), the new process shall contain a replica of the
> : calling thread and its entire address space, possibly including the
> : states of mutexes and other resources. Consequently, the application
> : shall ensure that the child process only executes async-signal-safe
> : operations until such time as one of the exec functions is successful.
I think that is only talking about ways in which you can interact with
libc, and does not mean that application code couldn't access its own
lockless data structures or such.
Though admittedly that is a fairly theoretical point, since in
practice the most likely place where you'd encounter this kind of
issue would be in an allocator implementation or such.
> It's always been my understanding that you really, really shouldn't call
> fork() from a multithreaded process.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 19:03 ` Jann Horn
@ 2025-06-04 12:22 ` David Hildenbrand
0 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-06-04 12:22 UTC (permalink / raw)
To: Jann Horn, Matthew Wilcox
Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, linux-mm,
Peter Xu, linux-kernel, stable
On 03.06.25 21:03, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 8:29 PM Matthew Wilcox <willy@infradead.org> wrote:
>> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
>>> When fork() encounters possibly-pinned pages, those pages are immediately
>>> copied instead of just marking PTEs to make CoW happen later. If the parent
>>> is multithreaded, this can cause the child to see memory contents that are
>>> inconsistent in multiple ways:
>>>
>>> 1. We are copying the contents of a page with a memcpy() while userspace
>>> may be writing to it. This can cause the resulting data in the child to
>>> be inconsistent.
>>> 2. After we've copied this page, future writes to other pages may
>>> continue to be visible to the child while future writes to this page are
>>> no longer visible to the child.
>>>
>>> This means the child could theoretically see incoherent states where
>>> allocator freelists point to objects that are actually in use or stuff like
>>> that. A mitigating factor is that, unless userspace already has a deadlock
>>> bug, userspace can pretty much only observe such issues when fancy lockless
>>> data structures are used (because if another thread was in the middle of
>>> mutating data during fork() and the post-fork child tried to take the mutex
>>> protecting that data, it might wait forever).
>>
>> Um, OK, but isn't that expected behaviour? POSIX says:
>
> I don't think it is expected behavior that locklessly-updated data
> structures in application code could break.
>
>> : A process shall be created with a single thread. If a multi-threaded
>> : process calls fork(), the new process shall contain a replica of the
>> : calling thread and its entire address space, possibly including the
>> : states of mutexes and other resources. Consequently, the application
>> : shall ensure that the child process only executes async-signal-safe
>> : operations until such time as one of the exec functions is successful.
>
> I think that is only talking about ways in which you can interact with
> libc, and does not mean that application code couldn't access its own
> lockless data structures or such.
>
> Though admittedly that is a fairly theoretical point, since in
> practice the most likely place where you'd encounter this kind of
> issue would be in an allocator implementation or such.
I thought a bit further about that.
The thing is, that another thread in the parent might be doing something
lockless using atomics etc. And in the parent, that thread will make
progress as soon as fork() is done. In the child, that thread will never
make progress again, as it is gone.
So the assumption must be that another thread possibly stalling forever
and not making any progress will not affect the algorithm in the child.
I am not sure if that is actually the case for many lockless algorithms
in allocators. I'm curious, do we have any examples of such algorithms,
in particular, regarding allocators?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
2025-06-03 18:29 ` Matthew Wilcox
@ 2025-06-03 18:33 ` David Hildenbrand
2025-06-03 20:32 ` Pedro Falcato
` (2 subsequent siblings)
4 siblings, 0 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-06-03 18:33 UTC (permalink / raw)
To: Jann Horn, Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Vlastimil Babka, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
Cc: Peter Xu, linux-kernel, stable
On 03.06.25 20:21, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.
> 2. After we've copied this page, future writes to other pages may>
continue to be visible to the child while future writes to this page are
> no longer visible to the child.
> > This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
>
Hmm, interesting.
> On top of that, this issue is only observable when pages are either
> DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> references and the parent process having used DMA-pinning at least once
> before.
Right.
>
> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> mm/memory.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 49199410805c..b406dfda976b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
> /*
> * We have a prealloc page, all good! Take it
> * over and copy the page & arm it.
> + *
> + * One nasty aspect is that we could be in a multithreaded process or
> + * such, where another thread is in the middle of writing to memory
> + * while this thread is forking. As long as we're just marking PTEs as
> + * read-only to make copy-on-write happen *later*, that's easy; we just
> + * need to do a single TLB flush before dropping the mmap/VMA locks, and
> + * that's enough to guarantee that the child gets a coherent snapshot of
> + * memory.
> + * But here, where we're doing an immediate copy, we must ensure that
> + * threads in the parent process can no longer write into the page being
> + * copied until we're done forking.
> + * This means that we still need to mark the source PTE as read-only,
> + * with an immediate TLB flush.
> + * (To make the source PTE writable again after fork() is done, we can
> + * rely on the page fault handler to do that lazily, thanks to
> + * PageAnonExclusive().)
> */
> + ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
> + flush_tlb_page(src_vma, addr);
Would we need something similar for hugetlb, or is that already handled?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
2025-06-03 18:29 ` Matthew Wilcox
2025-06-03 18:33 ` David Hildenbrand
@ 2025-06-03 20:32 ` Pedro Falcato
2025-06-04 15:41 ` Jann Horn
2025-06-05 7:33 ` Vlastimil Babka
2025-06-06 12:49 ` Jann Horn
4 siblings, 1 reply; 24+ messages in thread
From: Pedro Falcato @ 2025-06-03 20:32 UTC (permalink / raw)
To: Jann Horn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, Peter Xu,
linux-kernel, stable
On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.
This is an interesting problem, but we'll get to it later.
> 2. After we've copied this page, future writes to other pages may
> continue to be visible to the child while future writes to this page are
> no longer visible to the child.
>
Yes, and this is not fixable. It's also a problem for the regular write-protect
pte path where inevitably only a part of the address space will be write-protected.
This would only be fixable if e.g we suspended every thread on a multi-threaded fork.
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
>
Ok, so the issue here is that atomics + memcpy (or our kernel variants) will
possibly observe tearing. This is indeed a problem, and POSIX doesn't _really_
tell us anything about this. _However_:
POSIX says:
> Any locks held by any thread in the calling process that have been set to be process-shared
> shall not be held by the child process. For locks held by any thread in the calling process
> that have not been set to be process-shared, any attempt by the child process to perform
> any operation on the lock results in undefined behavior (regardless of whether the calling
> process is single-threaded or multi-threaded).
The interesting bit here is "For locks held by any thread [...] any attempt by
the child [...] results in UB". I don't think it's entirely far-fetched to say
the spirit of the law is that atomics may also be UB (just like a lock[1] that was
held by a separate thread, then unlocked mid-concurrent-fork is in a UB state).
In any way, I think the bottom-line is that fork memory snapshot coherency is
a fallacy. It's really impossible to reach without adding insane constraints
(like the aforementioned thread suspending + resume). It's not even possible
when going through normal write-protect paths that have been conceptually stable since
the BSDs in the 1980s (due to the write-protect-a-page-at-a-time-problem).
Thus, personally I don't think this is worth fixing.
[1] This (at least in theory) covers every lock, so it also encompasses pthread spinlocks
--
Pedro
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 20:32 ` Pedro Falcato
@ 2025-06-04 15:41 ` Jann Horn
2025-06-04 16:16 ` Pedro Falcato
0 siblings, 1 reply; 24+ messages in thread
From: Jann Horn @ 2025-06-04 15:41 UTC (permalink / raw)
To: Pedro Falcato
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, Peter Xu,
linux-kernel, stable
On Tue, Jun 3, 2025 at 10:32 PM Pedro Falcato <pfalcato@suse.de> wrote:
> On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> > may be writing to it. This can cause the resulting data in the child to
> > be inconsistent.
>
> This is an interesting problem, but we'll get to it later.
>
> > 2. After we've copied this page, future writes to other pages may
> > continue to be visible to the child while future writes to this page are
> > no longer visible to the child.
> >
>
> Yes, and this is not fixable. It's also a problem for the regular write-protect
> pte path where inevitably only a part of the address space will be write-protected.
I don't understand what you mean by "inevitably only a part of the
address space will be write-protected". Are you talking about how
shared pages are kept shared between parent in child? Or are you
talking about how there is a point in time at which part of the
address space is write-protected while another part is not yet
write-protected? In that case: Yes, that can happen, but that's not a
problem.
> This would only be fixable if e.g we suspended every thread on a multi-threaded fork.
No, I think it is fine to keep threads running in parallel on a
multi-threaded fork as long as all the writes they do are guaranteed
to also be observable in the child. Such writes are no different from
writes performed before fork().
It would only get problematic if something in the parent first wrote
to page A, which has already been copied to the child (so the child no
longer sees the write) and then wrote to page B, which is CoWed (so
the child would see the write). I prevent this scenario by effectively
suspending the thread that tries to write to page A until the fork is
over (by making it block on the mmap lock in the fault handling path).
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
> >
>
> Ok, so the issue here is that atomics + memcpy (or our kernel variants) will
> possibly observe tearing. This is indeed a problem, and POSIX doesn't _really_
> tell us anything about this. _However_:
>
> POSIX says:
> > Any locks held by any thread in the calling process that have been set to be process-shared
> > shall not be held by the child process. For locks held by any thread in the calling process
> > that have not been set to be process-shared, any attempt by the child process to perform
> > any operation on the lock results in undefined behavior (regardless of whether the calling
> > process is single-threaded or multi-threaded).
>
> The interesting bit here is "For locks held by any thread [...] any attempt by
> the child [...] results in UB". I don't think it's entirely far-fetched to say
> the spirit of the law is that atomics may also be UB (just like a lock[1] that was
> held by a separate thread, then unlocked mid-concurrent-fork is in a UB state).
I think interpreting atomic operations as locks is far-fetched. Also,
POSIX is a sort of minimal bar, and if we only implemented things
explicitly required by POSIX, we might not have a particularly useful
operating system.
Besides, I think things specified by the C standard override whatever
POSIX says, and C23 specifies that there are atomic operations, and I
haven't seen anything in C23 that restricts availability of those to
before fork().
> In any way, I think the bottom-line is that fork memory snapshot coherency is
> a fallacy. It's really impossible to reach without adding insane constraints
> (like the aforementioned thread suspending + resume). It's not even possible
> when going through normal write-protect paths that have been conceptually stable since
> the BSDs in the 1980s (due to the write-protect-a-page-at-a-time-problem).
No, Linux already had memory snapshot coherency before commit
70e806e4e645 ("mm: Do early cow for pinned pages during fork() for
ptes"). Write-protecting a page at a time does not cause coherency
issues, because letting a concurrent thread write into such memory
during fork() is no different from letting it do so before fork() from
a memory coherency perspective, as long as fork() write-locks memory
management for the process.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-04 15:41 ` Jann Horn
@ 2025-06-04 16:16 ` Pedro Falcato
0 siblings, 0 replies; 24+ messages in thread
From: Pedro Falcato @ 2025-06-04 16:16 UTC (permalink / raw)
To: Jann Horn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm, Peter Xu,
linux-kernel, stable
On Wed, Jun 04, 2025 at 05:41:47PM +0200, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 10:32 PM Pedro Falcato <pfalcato@suse.de> wrote:
> > On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
> > > When fork() encounters possibly-pinned pages, those pages are immediately
> > > copied instead of just marking PTEs to make CoW happen later. If the parent
> > > is multithreaded, this can cause the child to see memory contents that are
> > > inconsistent in multiple ways:
> > >
> > > 1. We are copying the contents of a page with a memcpy() while userspace
> > > may be writing to it. This can cause the resulting data in the child to
> > > be inconsistent.
> >
> > This is an interesting problem, but we'll get to it later.
> >
> > > 2. After we've copied this page, future writes to other pages may
> > > continue to be visible to the child while future writes to this page are
> > > no longer visible to the child.
> > >
> >
> > Yes, and this is not fixable. It's also a problem for the regular write-protect
> > pte path where inevitably only a part of the address space will be write-protected.
>
> I don't understand what you mean by "inevitably only a part of the
> address space will be write-protected". Are you talking about how
> shared pages are kept shared between parent in child? Or are you
> talking about how there is a point in time at which part of the
> address space is write-protected while another part is not yet
> write-protected? In that case: Yes, that can happen, but that's not a
> problem.
>
> > This would only be fixable if e.g we suspended every thread on a multi-threaded fork.
>
> No, I think it is fine to keep threads running in parallel on a
> multi-threaded fork as long as all the writes they do are guaranteed
> to also be observable in the child. Such writes are no different from
> writes performed before fork().
>
> It would only get problematic if something in the parent first wrote
> to page A, which has already been copied to the child (so the child no
> longer sees the write) and then wrote to page B, which is CoWed (so
> the child would see the write). I prevent this scenario by effectively
> suspending the thread that tries to write to page A until the fork is
> over (by making it block on the mmap lock in the fault handling path).
>
Ah yes, I see my mistake - we write lock all VMAs as we dup them, so
the problem I described can't happen. Thanks for the explanation :)
--
Pedro
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
` (2 preceding siblings ...)
2025-06-03 20:32 ` Pedro Falcato
@ 2025-06-05 7:33 ` Vlastimil Babka
2025-06-05 12:30 ` Pedro Falcato
2025-06-06 12:55 ` Jann Horn
2025-06-06 12:49 ` Jann Horn
4 siblings, 2 replies; 24+ messages in thread
From: Vlastimil Babka @ 2025-06-05 7:33 UTC (permalink / raw)
To: Jann Horn, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Pedro Falcato
Cc: Peter Xu, linux-kernel, stable
On 6/3/25 20:21, Jann Horn wrote:
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:
>
> 1. We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.
> 2. After we've copied this page, future writes to other pages may
> continue to be visible to the child while future writes to this page are
> no longer visible to the child.
>
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).
>
> On top of that, this issue is only observable when pages are either
> DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> references and the parent process having used DMA-pinning at least once
> before.
Seems the changelog seems to be missing the part describing what it's doing
to fix the issue? Some details are not immediately obvious (the writing
threads become blocked in page fault) as the conversation has shown.
> Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> Cc: stable@vger.kernel.org
> Signed-off-by: Jann Horn <jannh@google.com>
Given how the fix seems to be localized to the already rare slowpath and
doesn't require us to pessimize every trivial fork(), it seems reasonable to
me even if don't have a concrete example of a sane code in the wild that's
broken by the current behavior, so:
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> mm/memory.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 49199410805c..b406dfda976b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
> /*
> * We have a prealloc page, all good! Take it
> * over and copy the page & arm it.
> + *
> + * One nasty aspect is that we could be in a multithreaded process or
> + * such, where another thread is in the middle of writing to memory
> + * while this thread is forking. As long as we're just marking PTEs as
> + * read-only to make copy-on-write happen *later*, that's easy; we just
> + * need to do a single TLB flush before dropping the mmap/VMA locks, and
> + * that's enough to guarantee that the child gets a coherent snapshot of
> + * memory.
> + * But here, where we're doing an immediate copy, we must ensure that
> + * threads in the parent process can no longer write into the page being
> + * copied until we're done forking.
> + * This means that we still need to mark the source PTE as read-only,
> + * with an immediate TLB flush.
> + * (To make the source PTE writable again after fork() is done, we can
> + * rely on the page fault handler to do that lazily, thanks to
> + * PageAnonExclusive().)
> */
> + ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
> + flush_tlb_page(src_vma, addr);
>
> if (copy_mc_user_highpage(&new_folio->page, page, addr, src_vma))
> return -EHWPOISON;
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-05 7:33 ` Vlastimil Babka
@ 2025-06-05 12:30 ` Pedro Falcato
2025-06-06 12:55 ` Jann Horn
1 sibling, 0 replies; 24+ messages in thread
From: Pedro Falcato @ 2025-06-05 12:30 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Jann Horn, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Peter Xu, linux-kernel, stable
On Thu, Jun 05, 2025 at 09:33:24AM +0200, Vlastimil Babka wrote:
> On 6/3/25 20:21, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> > may be writing to it. This can cause the resulting data in the child to
> > be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> > continue to be visible to the child while future writes to this page are
> > no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
> >
> > On top of that, this issue is only observable when pages are either
> > DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> > references and the parent process having used DMA-pinning at least once
> > before.
>
> Seems the changelog seems to be missing the part describing what it's doing
> to fix the issue? Some details are not immediately obvious (the writing
> threads become blocked in page fault) as the conversation has shown.
>
> > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Given how the fix seems to be localized to the already rare slowpath and
> doesn't require us to pessimize every trivial fork(), it seems reasonable to
> me even if don't have a concrete example of a sane code in the wild that's
> broken by the current behavior, so:
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Pedro Falcato <pfalcato@suse.de>
--
Pedro
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-05 7:33 ` Vlastimil Babka
2025-06-05 12:30 ` Pedro Falcato
@ 2025-06-06 12:55 ` Jann Horn
2025-06-06 15:34 ` Vlastimil Babka
1 sibling, 1 reply; 24+ messages in thread
From: Jann Horn @ 2025-06-06 12:55 UTC (permalink / raw)
To: Vlastimil Babka
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Pedro Falcato, Peter Xu, linux-kernel, stable
On Thu, Jun 5, 2025 at 9:33 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 6/3/25 20:21, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> > may be writing to it. This can cause the resulting data in the child to
> > be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> > continue to be visible to the child while future writes to this page are
> > no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
> >
> > On top of that, this issue is only observable when pages are either
> > DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> > references and the parent process having used DMA-pinning at least once
> > before.
>
> Seems the changelog seems to be missing the part describing what it's doing
> to fix the issue? Some details are not immediately obvious (the writing
> threads become blocked in page fault) as the conversation has shown.
I tried to document this in patch 2/2, where I wrote this (though I
guess I should maybe make it more verbose and not just say "subsequent
writes are delayed until mmap_write_unlock()"):
+ * - Before mmap_write_unlock(), a TLB flush ensures that parent threads can't
+ * write to copy-on-write pages anymore.
+ * - Before dup_mmap() copies page contents (which happens rarely), the
+ * parent's PTE for the page is made read-only and a TLB flush is issued, so
+ * subsequent writes are delayed until mmap_write_unlock().
But I guess this way makes it hard to review patch 1/2 individually.
Should I just squash the two patches together, and then write in the
commit message "see the comment blocks I'm adding for the fix
approach"? Or is there value in repeating the explanation in the
commit message?
> > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Given how the fix seems to be localized to the already rare slowpath and
> doesn't require us to pessimize every trivial fork(), it seems reasonable to
> me even if don't have a concrete example of a sane code in the wild that's
> broken by the current behavior, so:
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-06 12:55 ` Jann Horn
@ 2025-06-06 15:34 ` Vlastimil Babka
0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2025-06-06 15:34 UTC (permalink / raw)
To: Jann Horn
Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm, Pedro Falcato, Peter Xu, linux-kernel, stable
On 6/6/25 14:55, Jann Horn wrote:
> On Thu, Jun 5, 2025 at 9:33 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> + * - Before mmap_write_unlock(), a TLB flush ensures that parent threads can't
> + * write to copy-on-write pages anymore.
> + * - Before dup_mmap() copies page contents (which happens rarely), the
> + * parent's PTE for the page is made read-only and a TLB flush is issued, so
> + * subsequent writes are delayed until mmap_write_unlock().
>
> But I guess this way makes it hard to review patch 1/2 individually.
> Should I just squash the two patches together, and then write in the
> commit message "see the comment blocks I'm adding for the fix
That would be good unless it makes it increases the conflicts in stable
backports.
> approach"? Or is there value in repeating the explanation in the
> commit message?
Depending on the above, if the stable fix needs to stay minimal, it would be
valuable to make it more self-contained by repeating that in the commit
message. So that the LLM has an easier job marking it as a CVE. </sarcasm>
Vlastimil
>> > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
>> > Cc: stable@vger.kernel.org
>> > Signed-off-by: Jann Horn <jannh@google.com>
>>
>> Given how the fix seems to be localized to the already rare slowpath and
>> doesn't require us to pessimize every trivial fork(), it seems reasonable to
>> me even if don't have a concrete example of a sane code in the wild that's
>> broken by the current behavior, so:
>>
>> Acked-by: Vlastimil Babka <vbabka@suse.cz>
>
> Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
` (3 preceding siblings ...)
2025-06-05 7:33 ` Vlastimil Babka
@ 2025-06-06 12:49 ` Jann Horn
2025-06-06 15:49 ` Vlastimil Babka
4 siblings, 1 reply; 24+ messages in thread
From: Jann Horn @ 2025-06-06 12:49 UTC (permalink / raw)
To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Vlastimil Babka, Mike Rapoport,
Suren Baghdasaryan, Michal Hocko, linux-mm
Cc: Peter Xu, linux-kernel, stable
On Tue, Jun 3, 2025 at 8:21 PM Jann Horn <jannh@google.com> wrote:
> @@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
> /*
> * We have a prealloc page, all good! Take it
> * over and copy the page & arm it.
> + *
> + * One nasty aspect is that we could be in a multithreaded process or
> + * such, where another thread is in the middle of writing to memory
> + * while this thread is forking. As long as we're just marking PTEs as
> + * read-only to make copy-on-write happen *later*, that's easy; we just
> + * need to do a single TLB flush before dropping the mmap/VMA locks, and
> + * that's enough to guarantee that the child gets a coherent snapshot of
> + * memory.
> + * But here, where we're doing an immediate copy, we must ensure that
> + * threads in the parent process can no longer write into the page being
> + * copied until we're done forking.
> + * This means that we still need to mark the source PTE as read-only,
> + * with an immediate TLB flush.
> + * (To make the source PTE writable again after fork() is done, we can
> + * rely on the page fault handler to do that lazily, thanks to
> + * PageAnonExclusive().)
> */
> + ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
> + flush_tlb_page(src_vma, addr);
Hmm... this is actually wrong, because we did
arch_enter_lazy_mmu_mode() up in copy_pte_range(). So I guess I
actually have to do:
arch_leave_lazy_mmu_mode();
ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
flush_tlb_page(src_vma, addr);
arch_enter_lazy_mmu_mode();
(arch_flush_lazy_mmu_mode() would look a bit nicer, but powerpc
doesn't implement that.)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
2025-06-06 12:49 ` Jann Horn
@ 2025-06-06 15:49 ` Vlastimil Babka
0 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2025-06-06 15:49 UTC (permalink / raw)
To: Jann Horn, Andrew Morton, David Hildenbrand, Lorenzo Stoakes,
Liam R. Howlett, Mike Rapoport, Suren Baghdasaryan, Michal Hocko,
linux-mm
Cc: Peter Xu, linux-kernel, stable
On 6/6/25 14:49, Jann Horn wrote:
> On Tue, Jun 3, 2025 at 8:21 PM Jann Horn <jannh@google.com> wrote:
>> @@ -917,7 +917,25 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
>> /*
>> * We have a prealloc page, all good! Take it
>> * over and copy the page & arm it.
>> + *
>> + * One nasty aspect is that we could be in a multithreaded process or
>> + * such, where another thread is in the middle of writing to memory
>> + * while this thread is forking. As long as we're just marking PTEs as
>> + * read-only to make copy-on-write happen *later*, that's easy; we just
>> + * need to do a single TLB flush before dropping the mmap/VMA locks, and
>> + * that's enough to guarantee that the child gets a coherent snapshot of
>> + * memory.
>> + * But here, where we're doing an immediate copy, we must ensure that
>> + * threads in the parent process can no longer write into the page being
>> + * copied until we're done forking.
>> + * This means that we still need to mark the source PTE as read-only,
>> + * with an immediate TLB flush.
>> + * (To make the source PTE writable again after fork() is done, we can
>> + * rely on the page fault handler to do that lazily, thanks to
>> + * PageAnonExclusive().)
>> */
>> + ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
>> + flush_tlb_page(src_vma, addr);
>
> Hmm... this is actually wrong, because we did
> arch_enter_lazy_mmu_mode() up in copy_pte_range(). So I guess I
> actually have to do:
>
> arch_leave_lazy_mmu_mode();
> ptep_set_wrprotect(src_vma->vm_mm, addr, src_pte);
> flush_tlb_page(src_vma, addr);
> arch_enter_lazy_mmu_mode();
>
> (arch_flush_lazy_mmu_mode() would look a bit nicer, but powerpc
> doesn't implement that.)
Hm isn't that kinda weird that an arch can
#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE and then just not implement an
important part of it?
IIUC think it's the same with arch/sparc/include/asm/tlbflush_64.h ?
Should be possible to implement reusing part of the code from the respective
arch_leave_lazy_mmu_mode() right?
^ permalink raw reply [flat|nested] 24+ messages in thread