From: Lorenzo Stoakes <ljs@kernel.org>
To: Yang Shi <shy828301@gmail.com>
Cc: Barry Song <baohua@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
surenb@google.com, akpm@linux-foundation.org,
linux-mm@kvack.org, david@kernel.org, liam@infradead.org,
vbabka@kernel.org, rppt@kernel.org, mhocko@suse.com,
jack@suse.cz, pfalcato@suse.de, wanglian@kylinos.cn,
chentao@kylinos.cn, lianux.mm@gmail.com, kunwu.chan@gmail.com,
liyangouwen1@oppo.com, chrisl@kernel.org, kasong@tencent.com,
shikemeng@huaweicloud.com, nphamcs@gmail.com, bhe@redhat.com,
youngjun.park@lge.com, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, Nanzhe Zhao <nzzhao@126.com>
Subject: Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Date: Wed, 20 May 2026 09:11:20 +0100
Message-ID: <ag1qo0q_bfePgOzx@lucifer>
In-Reply-To: <CAHbLzkrTF7w+T5mGsQuDRuhnTk6evTKBNRcH4oS=nRcUg2zpsg@mail.gmail.com>
On Tue, May 19, 2026 at 02:02:09PM -0700, Yang Shi wrote:
> On Tue, May 19, 2026 at 11:41 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > >
> > > > >
> > > > > Secondly, if vma->anon_vma is NULL, it basically means either no page
> > > > > fault happened or no cow happened, so there is no page table to copy,
> > > > > this is also what copy_page_range() does currently. So we can shrink
> > > > > the critical section to:
> > > >
> > > > Firstly, with no VMA write lock, !vma->anon_vma means a fault can race and
> > > > secondly copy_page_range() checks vma_needs_copy(), there are other cases - PFN
> > > > maps, mixed maps, UFFD W/P (ugh), guard regions.
> > > >
> > > > So yeah this isn't sufficient.
> > >
> > > However this is true...
> >
> > Yes, fault can race with fork. Basically this is actually the purpose
> > of this idea. We can have improved page fault scalability. In my
> > proposal (take write vma lock if vma->anon_vma is not NULL), the race
> > just happens on the VMAs which page fault has not happened on before.
>
> Sorry, this is incorrect. Page fault can't happen on those VMAs
> because page fault needs to create anon_vma, but it requires taking
> mmap_lock.
> If anon_vma is not NULL, vma write lock will serialize against page
> fault. So there should be no race with page fault. Removing vma write
> lock suggested by Barry may increase race.
Firstly, let's none of us worry about making mistakes here: the anon_vma
stuff is confusing, and I've stared at it more than most, yet even so I
managed to make mistakes (as corrected here) and forget details :))
It's a sign it all needs simplifying, but hey that's what my scalable CoW
project is (partly) about :)
Removing the VMA write lock would allow races with page faults, which can
result in page tables being installed that are then not correctly duplicated
for ranges that must be duplicated.
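To make that race concrete, here is a toy, single-threaded userspace model. It is only a sketch and not kernel code: toy_vma, toy_fault() and toy_fork() are made-up names, the "page table" is a flat array, and the concurrent fault is simulated by injecting it partway through the copy loop rather than by a real second thread.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PTES 4

/* Hypothetical stand-in for a VMA: a write-lock flag plus a flat "page table". */
struct toy_vma {
	bool write_locked;
	int pte[NR_PTES];	/* 0 = not present, non-zero = mapped */
};

/* Toy fault handler: refuses to install an entry while the VMA is write-locked. */
static bool toy_fault(struct toy_vma *vma, size_t idx)
{
	if (vma->write_locked)
		return false;	/* a real handler would back off and retry later */
	vma->pte[idx] = 1;
	return true;
}

/*
 * Toy fork: copy parent entries into the child. Partway through, a simulated
 * "concurrent" fault hits slot 0, which the copy loop has already passed.
 * Returns the number of parent entries that the child ended up missing.
 */
static int toy_fork(struct toy_vma *parent, struct toy_vma *child, bool take_lock)
{
	int missed = 0;

	parent->write_locked = take_lock;
	for (size_t i = 0; i < NR_PTES; i++) {
		child->pte[i] = parent->pte[i];
		if (i == 1)
			toy_fault(parent, 0);	/* injected fault on an already-copied slot */
	}
	parent->write_locked = false;

	/* Count entries present in the parent but absent from the child. */
	for (size_t i = 0; i < NR_PTES; i++)
		if (parent->pte[i] && !child->pte[i])
			missed++;

	return missed;
}
```

In this model, calling toy_fork() with take_lock set to false leaves the child missing the entry installed mid-copy, while calling it with take_lock set to true makes the fault back off so that parent and child stay consistent.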
And again, I think the underlying points here overall are:
1. Clearly many cases require serialisation (any that cause copy_page_range() to
fire).
2. If we were to decide not to take a lock with concurrent page faults, that
lays a trap for any future change that (reasonably) assumes that page tables
cannot be simultaneously copied while being accessible to page fault
handlers, which is bug prone.
3. As per 2, even if we were to only take the lock when we felt we absolutely
needed to, we would still be adding yet another 'you just have to know'
hazard to this part of mm.
4. The serialisation is quite likely relied upon by other things, this is often
the case in mm, and we may only realise that such serialisation is critical
at the point a subtle issue arises out of it.
5. Fork is one of the most sensitive, intuition-defying, complicated, and
corner-case-problem-baiting areas of mm and I really oppose us changing
fundamental behaviour here unless incredibly well justified.
On this basis, let's let sleeping dogs lie and leave fork alone, I think :)
I think I am far more inclined to take Barry's fault approach (as I've said to
him) vs. changing fork behaviour.
But I want to make sure there's not a 'third way' that could avoid either!
I am going to have a look through Barry's series in detail so we can have some
movement on this one way or another :)
>
> Thanks,
> Yang
>
Cheers, Lorenzo