All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <ljs@kernel.org>
To: Yang Shi <shy828301@gmail.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Barry Song <baohua@kernel.org>,
	 Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	 liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	mhocko@suse.com,  jack@suse.cz, pfalcato@suse.de,
	wanglian@kylinos.cn, chentao@kylinos.cn,  lianux.mm@gmail.com,
	kunwu.chan@gmail.com, liyangouwen1@oppo.com, chrisl@kernel.org,
	 kasong@tencent.com, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com,  youngjun.park@lge.com,
	linux-arm-kernel@lists.infradead.org,
	 linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org,  linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, Nanzhe Zhao <nzzhao@126.com>
Subject: Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Date: Fri, 22 May 2026 16:37:54 +0100	[thread overview]
Message-ID: <ahB3DSwipVXw6tmK@lucifer> (raw)
In-Reply-To: <CAHbLzkq3S7NDYe4LXjurKNQU+40-wtqrD_PT18YcyHbAcNxiRQ@mail.gmail.com>

On Wed, May 20, 2026 at 02:39:49PM -0700, Yang Shi wrote:
> On Wed, May 20, 2026 at 3:34 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 5/19/26 14:53, Lorenzo Stoakes wrote:
> > > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > >
> > >>>
> > >>> I think we either need to fix `fork()`, or keep the current
> > >>> behavior of dropping the VMA lock before performing I/O.
> > >>
> > >> I see. So, this problem arises from the fact that we are changing the
> > >> pagefaults requiring I/O operation to hold VMA lock...
> > >> And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > >> is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > >> anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > >> any VMA modification. On the surface, that looks ok to me but I might
> > >> be missing some corner cases. If nobody sees any obvious issues, I
> > >> think it's worth a try.
> > >
> > > Not sure if you noticed but I did raise concerns ;)
> > >
> > > I wonder if you've confused the fault path and fork here, as I think Barry has
> > > been a little unclear on that.
> > >
> > > What's being suggested in this thread is to fundamentally change fork behaviour
> > > so it's different from the entire history of the kernel (or - presumably - at
> > > least recent history :)
> > I don't want fork() to become different in that regard.
> >
> > There is already a slight difference with vs. without per-VMA locks, because
> > there is a window in-between us taking the write mmap_lock and all the per-VMA
> > locks. I raised that previously [1] and assumed that it is probably fine.
> >
> > I also raised in the past why I think we must not allow concurrent page faults,
> > at least as soon as anonymous memory is involved [2].
>
> Thanks for sharing the context, it is quite helpful to understand the
> race conditions. Because Lorenzo also raised the concern about page
> fault race, I will reply to all the concerns regarding page fault race
> together in this thread.
>
> IIUC, there is already some sort of race with per vma lock. Before per
> vma lock, mmap_lock did lock everything. So page fault happened either
> before fork or after fork. But page fault can happen on other VMAs
> which have not been lock'ed yet during fork with per vma lock. For
> example, we have 3 VMAs, we lock the first VMA, but page fault still
> can happen on the other 2 VMAs during fork if they already have
> anon_vma. This is the status quo now, but it seems not harmful.
>
> The bad race shared by David is caused by racing with copy page. So it
> seems like it will be fine as long as we serialize copy page against
> page fault if I don't miss anything. Since we decide whether to copy
> page or not by checking vma->anon_vma, so it seems fine to not take
> vma lock if vma->anon_vma is NULL. This will not introduce more race
> either because setting up a new  anon_vma in page fault or madvise
> requires taking mmap_lock according to the earlier discussions.

NAK. No.

We're not doing this, we're not changing how fork fundamentally behaves because
of concerns about the fault path.

I've delineated exactly why I think this is a problem and you're pressing ahead
without addressing those concerns.

So at this point I'm going to be a grumpy maintainer and just say no, stop
please :)

Let's fix this in the right place. You don't fix a leak in the roof by repairing
a shelf next door :)

Thanks, Lorenzo


>
> Thanks,
> Yang
>
> >
> > ... and I raised that this is pretty much slower by design right now: "Well, the
> > design decision that CONFIG_PER_VMA_LOCK made for now to make page faults fast
> > and to make blocking any page faults from happening to  be slower ..." [3]
> >
> > [1] https://lore.kernel.org/all/970295ab-e85d-7af3-76e6-df53a5c52f8b@redhat.com/
> > [2] https://lore.kernel.org/all/7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com/
> > [3] https://lore.kernel.org/all/2efa2c89-3765-721d-2c3c-00590054aa5b@redhat.com/
> >
> > --
> > Cheers,
> >
> > David
> >


WARNING: multiple messages have this Message-ID (diff)
From: Lorenzo Stoakes <ljs@kernel.org>
To: Yang Shi <shy828301@gmail.com>
Cc: "David Hildenbrand (Arm)" <david@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Barry Song <baohua@kernel.org>,
	 Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	 liam@infradead.org, vbabka@kernel.org, rppt@kernel.org,
	mhocko@suse.com,  jack@suse.cz, pfalcato@suse.de,
	wanglian@kylinos.cn, chentao@kylinos.cn,  lianux.mm@gmail.com,
	kunwu.chan@gmail.com, liyangouwen1@oppo.com, chrisl@kernel.org,
	 kasong@tencent.com, shikemeng@huaweicloud.com,
	nphamcs@gmail.com, bhe@redhat.com,  youngjun.park@lge.com,
	linux-arm-kernel@lists.infradead.org,
	 linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linuxppc-dev@lists.ozlabs.org,  linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, Nanzhe Zhao <nzzhao@126.com>
Subject: Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Date: Fri, 22 May 2026 16:37:54 +0100	[thread overview]
Message-ID: <ahB3DSwipVXw6tmK@lucifer> (raw)
In-Reply-To: <CAHbLzkq3S7NDYe4LXjurKNQU+40-wtqrD_PT18YcyHbAcNxiRQ@mail.gmail.com>

On Wed, May 20, 2026 at 02:39:49PM -0700, Yang Shi wrote:
> On Wed, May 20, 2026 at 3:34 AM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
> >
> > On 5/19/26 14:53, Lorenzo Stoakes wrote:
> > > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > >
> > >>>
> > >>> I think we either need to fix `fork()`, or keep the current
> > >>> behavior of dropping the VMA lock before performing I/O.
> > >>
> > >> I see. So, this problem arises from the fact that we are changing the
> > >> pagefaults requiring I/O operation to hold VMA lock...
> > >> And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > >> is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > >> anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > >> any VMA modification. On the surface, that looks ok to me but I might
> > >> be missing some corner cases. If nobody sees any obvious issues, I
> > >> think it's worth a try.
> > >
> > > Not sure if you noticed but I did raise concerns ;)
> > >
> > > I wonder if you've confused the fault path and fork here, as I think Barry has
> > > been a little unclear on that.
> > >
> > > What's being suggested in this thread is to fundamentally change fork behaviour
> > > so it's different from the entire history of the kernel (or - presumably - at
> > > least recent history :)
> > I don't want fork() to become different in that regard.
> >
> > There is already a slight difference with vs. without per-VMA locks, because
> > there is a window in-between us taking the write mmap_lock and all the per-VMA
> > locks. I raised that previously [1] and assumed that it is probably fine.
> >
> > I also raised in the past why I think we must not allow concurrent page faults,
> > at least as soon as anonymous memory is involved [2].
>
> Thanks for sharing the context, it is quite helpful to understand the
> race conditions. Because Lorenzo also raised the concern about page
> fault race, I will reply to all the concerns regarding page fault race
> together in this thread.
>
> IIUC, there is already some sort of race with per vma lock. Before per
> vma lock, mmap_lock did lock everything. So page fault happened either
> before fork or after fork. But page fault can happen on other VMAs
> which have not been lock'ed yet during fork with per vma lock. For
> example, we have 3 VMAs, we lock the first VMA, but page fault still
> can happen on the other 2 VMAs during fork if they already have
> anon_vma. This is the status quo now, but it seems not harmful.
>
> The bad race shared by David is caused by racing with copy page. So it
> seems like it will be fine as long as we serialize copy page against
> page fault if I don't miss anything. Since we decide whether to copy
> page or not by checking vma->anon_vma, so it seems fine to not take
> vma lock if vma->anon_vma is NULL. This will not introduce more race
> either because setting up a new  anon_vma in page fault or madvise
> requires taking mmap_lock according to the earlier discussions.

NAK. No.

We're not doing this, we're not changing how fork fundamentally behaves because
of concerns about the fault path.

I've delineated exactly why I think this is a problem and you're pressing ahead
without addressing those concerns.

So at this point I'm going to be a grumpy maintainer and just say no, stop
please :)

Let's fix this in the right place. You don't fix a leak in the roof by repairing
a shelf next door :)

Thanks, Lorenzo


>
> Thanks,
> Yang
>
> >
> > ... and I raised that this is pretty much slower by design right now: "Well, the
> > design decision that CONFIG_PER_VMA_LOCK made for now to make page faults fast
> > and to make blocking any page faults from happening to  be slower ..." [3]
> >
> > [1] https://lore.kernel.org/all/970295ab-e85d-7af3-76e6-df53a5c52f8b@redhat.com/
> > [2] https://lore.kernel.org/all/7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com/
> > [3] https://lore.kernel.org/all/2efa2c89-3765-721d-2c3c-00590054aa5b@redhat.com/
> >
> > --
> > Cheers,
> >
> > David
> >

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2026-05-22 15:38 UTC|newest]

Thread overview: 145+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-30  4:04 [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance Barry Song (Xiaomi)
2026-04-30  4:04 ` Barry Song (Xiaomi)
2026-04-30  4:04 ` [PATCH v2 1/5] mm/filemap: Retry fault by VMA lock if the lock was released for I/O Barry Song (Xiaomi)
2026-04-30  4:04   ` Barry Song (Xiaomi)
2026-04-30  4:04 ` [PATCH v2 2/5] mm/swapin: Retry swapin " Barry Song (Xiaomi)
2026-04-30  4:04   ` Barry Song (Xiaomi)
2026-04-30  4:04 ` [PATCH v2 3/5] mm: Move folio_lock_or_retry() and drop __folio_lock_or_retry() Barry Song (Xiaomi)
2026-04-30  4:04   ` Barry Song (Xiaomi)
2026-04-30  4:04 ` [PATCH v2 4/5] mm: Don't retry page fault if folio is uptodate during swap-in Barry Song (Xiaomi)
2026-04-30  4:04   ` Barry Song (Xiaomi)
2026-04-30 12:35   ` Matthew Wilcox
2026-04-30 12:35     ` Matthew Wilcox
2026-05-01 16:11     ` Matthew Wilcox
2026-05-01 16:11       ` Matthew Wilcox
2026-04-30  4:04 ` [PATCH v2 5/5] mm/filemap: Avoid retrying page faults on uptodate folios in filemap faults Barry Song (Xiaomi)
2026-04-30  4:04   ` Barry Song (Xiaomi)
2026-04-30 12:37 ` [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance Matthew Wilcox
2026-04-30 12:37   ` Matthew Wilcox
2026-04-30 22:49   ` Barry Song
2026-04-30 22:49     ` Barry Song
2026-05-01 14:56     ` Matthew Wilcox
2026-05-01 14:56       ` Matthew Wilcox
2026-05-01 17:44       ` Barry Song
2026-05-01 17:44         ` Barry Song
2026-05-01 17:57         ` Matthew Wilcox
2026-05-01 17:57           ` Matthew Wilcox
2026-05-01 18:25           ` Barry Song
2026-05-01 18:25             ` Barry Song
2026-05-01 19:39             ` Matthew Wilcox
2026-05-01 19:39               ` Matthew Wilcox
2026-05-03 20:39               ` Barry Song
2026-05-03 20:39                 ` Barry Song
2026-05-03 13:13           ` Jan Kara
2026-05-03 13:13             ` Jan Kara
2026-05-03 19:55             ` Barry Song
2026-05-03 19:55               ` Barry Song
2026-05-04 13:03               ` Jan Kara
2026-05-04 13:03                 ` Jan Kara
2026-05-04 13:35                 ` Barry Song
2026-05-04 13:35                   ` Barry Song
2026-05-04 14:15                 ` Barry Song
2026-05-04 14:15                   ` Barry Song
2026-05-17  8:45           ` Barry Song
2026-05-17  8:45             ` Barry Song
2026-05-18  9:46             ` Lorenzo Stoakes
2026-05-18  9:46               ` Lorenzo Stoakes
2026-05-18 11:25               ` Barry Song
2026-05-18 11:25                 ` Barry Song
2026-05-18 16:17                 ` Matthew Wilcox
2026-05-18 16:17                   ` Matthew Wilcox
2026-05-18 20:50                   ` Barry Song
2026-05-18 20:50                     ` Barry Song
2026-05-18 19:56                 ` Suren Baghdasaryan
2026-05-18 19:56                   ` Suren Baghdasaryan
2026-05-18 21:14                   ` Barry Song
2026-05-18 21:14                     ` Barry Song
2026-05-19 12:45                     ` Lorenzo Stoakes
2026-05-19 12:45                       ` Lorenzo Stoakes
2026-05-19 14:17                     ` Liam R. Howlett
2026-05-19 14:17                       ` Liam R. Howlett
2026-05-19 22:01                       ` Barry Song
2026-05-19 22:01                         ` Barry Song
2026-05-20 21:04                         ` Matthew Wilcox
2026-05-20 21:04                           ` Matthew Wilcox
2026-05-20 21:14                           ` Barry Song
2026-05-20 21:14                             ` Barry Song
2026-05-20 21:15                             ` Matthew Wilcox
2026-05-20 21:15                               ` Matthew Wilcox
2026-05-20 21:35                               ` David Hildenbrand (Arm)
2026-05-20 21:35                                 ` David Hildenbrand (Arm)
2026-05-20 23:37                                 ` Barry Song
2026-05-20 23:37                                   ` Barry Song
2026-05-22 15:53                                   ` Lorenzo Stoakes
2026-05-22 15:53                                     ` Lorenzo Stoakes
2026-05-22 21:31                                     ` Barry Song
2026-05-22 21:31                                       ` Barry Song
2026-05-22  2:33                               ` Barry Song (Xiaomi)
2026-05-22  2:33                                 ` Barry Song (Xiaomi)
2026-05-22 13:09                                 ` Matthew Wilcox
2026-05-22 13:09                                   ` Matthew Wilcox
2026-05-22 13:36                                   ` Barry Song
2026-05-22 13:36                                     ` Barry Song
2026-05-22 13:48                                     ` Barry Song
2026-05-22 13:48                                       ` Barry Song
2026-05-22 15:42                                       ` Lorenzo Stoakes
2026-05-22 15:42                                         ` Lorenzo Stoakes
2026-05-19 12:53                   ` Lorenzo Stoakes
2026-05-19 12:53                     ` Lorenzo Stoakes
2026-05-19 21:18                     ` Barry Song
2026-05-19 21:18                       ` Barry Song
2026-05-20  7:50                       ` Lorenzo Stoakes
2026-05-20  7:50                         ` Lorenzo Stoakes
2026-05-20  9:07                         ` Barry Song
2026-05-20  9:07                           ` Barry Song
2026-05-20 10:07                           ` Lorenzo Stoakes
2026-05-20 10:07                             ` Lorenzo Stoakes
2026-05-20 16:20                           ` Suren Baghdasaryan
2026-05-20 16:20                             ` Suren Baghdasaryan
2026-05-20  5:51                     ` Suren Baghdasaryan
2026-05-20  5:51                       ` Suren Baghdasaryan
2026-05-22 15:39                       ` Lorenzo Stoakes
2026-05-22 15:39                         ` Lorenzo Stoakes
2026-05-20 10:33                     ` David Hildenbrand (Arm)
2026-05-20 10:33                       ` David Hildenbrand (Arm)
2026-05-20 12:55                       ` Lorenzo Stoakes
2026-05-20 12:55                         ` Lorenzo Stoakes
2026-05-20 21:39                       ` Yang Shi
2026-05-20 21:39                         ` Yang Shi
2026-05-22 15:37                         ` Lorenzo Stoakes [this message]
2026-05-22 15:37                           ` Lorenzo Stoakes
2026-05-19 12:43                 ` Lorenzo Stoakes
2026-05-19 12:43                   ` Lorenzo Stoakes
2026-05-18  9:53             ` David Hildenbrand (Arm)
2026-05-18  9:53               ` David Hildenbrand (Arm)
2026-05-19 13:42               ` Lorenzo Stoakes
2026-05-19 13:42                 ` Lorenzo Stoakes
2026-05-18 21:21             ` Yang Shi
2026-05-18 21:21               ` Yang Shi
2026-05-19 11:07               ` Barry Song
2026-05-19 11:07                 ` Barry Song
2026-05-19 13:34                 ` Lorenzo Stoakes
2026-05-19 13:34                   ` Lorenzo Stoakes
2026-05-19 18:50                 ` Yang Shi
2026-05-19 18:50                   ` Yang Shi
2026-05-19 20:53                   ` Yang Shi
2026-05-19 20:53                     ` Yang Shi
2026-05-19 13:12               ` Lorenzo Stoakes
2026-05-19 13:12                 ` Lorenzo Stoakes
2026-05-19 13:39                 ` Lorenzo Stoakes
2026-05-19 13:39                   ` Lorenzo Stoakes
2026-05-19 18:41                   ` Yang Shi
2026-05-19 18:41                     ` Yang Shi
2026-05-19 21:02                     ` Yang Shi
2026-05-19 21:02                       ` Yang Shi
2026-05-20  8:11                       ` Lorenzo Stoakes
2026-05-20  8:11                         ` Lorenzo Stoakes
2026-05-01 15:52   ` Lorenzo Stoakes
2026-05-01 15:52     ` Lorenzo Stoakes
2026-05-01 16:06     ` Matthew Wilcox
2026-05-01 16:06       ` Matthew Wilcox
2026-05-01 17:09       ` Lorenzo Stoakes
2026-05-01 17:09         ` Lorenzo Stoakes
2026-05-01 17:59     ` Barry Song
2026-05-01 17:59       ` Barry Song
2026-05-20  2:04   ` Hillf Danton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahB3DSwipVXw6tmK@lucifer \
    --to=ljs@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=baohua@kernel.org \
    --cc=bhe@redhat.com \
    --cc=chentao@kylinos.cn \
    --cc=chrisl@kernel.org \
    --cc=david@kernel.org \
    --cc=jack@suse.cz \
    --cc=kasong@tencent.com \
    --cc=kunwu.chan@gmail.com \
    --cc=liam@infradead.org \
    --cc=lianux.mm@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=liyangouwen1@oppo.com \
    --cc=loongarch@lists.linux.dev \
    --cc=mhocko@suse.com \
    --cc=nphamcs@gmail.com \
    --cc=nzzhao@126.com \
    --cc=pfalcato@suse.de \
    --cc=rppt@kernel.org \
    --cc=shikemeng@huaweicloud.com \
    --cc=shy828301@gmail.com \
    --cc=surenb@google.com \
    --cc=vbabka@kernel.org \
    --cc=wanglian@kylinos.cn \
    --cc=willy@infradead.org \
    --cc=youngjun.park@lge.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.