From: "Barry Song (Xiaomi)" <baohua@kernel.org>
To: willy@infradead.org
Cc: akpm@linux-foundation.org, baohua@kernel.org, bhe@redhat.com,
chentao@kylinos.cn, chrisl@kernel.org, david@kernel.org,
jack@suse.cz, kasong@tencent.com, kunwu.chan@gmail.com,
liam@infradead.org, lianux.mm@gmail.com,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, liyangouwen1@oppo.com,
ljs@kernel.org, loongarch@lists.linux.dev, mhocko@suse.com,
nphamcs@gmail.com, nzzhao@126.com, pfalcato@suse.de,
rppt@kernel.org, shikemeng@huaweicloud.com, surenb@google.com,
vbabka@kernel.org, wanglian@kylinos.cn, youngjun.park@lge.com
Subject: Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance
Date: Fri, 22 May 2026 10:33:05 +0800 [thread overview]
Message-ID: <20260522023305.98223-1-baohua@kernel.org> (raw)
In-Reply-To: <ag4kj84EcKqamdB-@casper.infradead.org>
On Thu, May 21, 2026 at 5:16 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, May 21, 2026 at 05:14:20AM +0800, Barry Song wrote:
> > My understanding is that we should not blame applications here. This is 2026:
> > there are basically only two kinds of applications — single-threaded and
> > multi-threaded — and single-threaded applications are nearly extinct.
>
> all of the applications i run are either single threaded or don't fork.
> what multithreaded applications call fork?
As I replied to David [1], we cannot control what those apps do.
Technically, I agree with you that calling fork() within a
multithreaded app may not be a good idea. But in such a complex
ecosystem, we cannot simply say no to those apps.
Especially when our phones are improving the kernel with this fix,
our customers may instead complain that our phones regress their
apps first. That feels unfair.
I can offer a two-step plan. For the first step, we keep the
current approach of dropping the VMA lock and retrying page faults,
while trying to make the smallest possible change.
As discussed with Suren, the draft code is being changed from a
whitelist approach to a blacklist approach. This way, we do not
need to touch `filemap.c` at all (probably because you are already
maintaining `filemap.c` perfectly):
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 63de8e8684f2..4101d5fa7a82 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1322,6 +1322,7 @@ void do_user_addr_fault(struct pt_regs *regs,
if (!(flags & FAULT_FLAG_USER))
goto lock_mmap;
+retry_vma:
vma = lock_vma_under_rcu(mm, address);
if (!vma)
goto lock_mmap;
@@ -1351,6 +1352,8 @@ void do_user_addr_fault(struct pt_regs *regs,
ARCH_DEFAULT_PKEY);
return;
}
+ if (!(fault & VM_FAULT_RETRY_HARD))
+ goto retry_vma;
lock_mmap:
retry:
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index a308e2c23b82..eeb7d6091bef 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1659,6 +1659,7 @@ typedef __bitwise unsigned int vm_fault_t;
* @VM_FAULT_NOPAGE: ->fault installed the pte, not return page
* @VM_FAULT_LOCKED: ->fault locked the returned page
* @VM_FAULT_RETRY: ->fault blocked, must retry
+ * @VM_FAULT_RETRY_HARD: ->fault blocked, must retry via mmap_lock
* @VM_FAULT_FALLBACK: huge page fault failed, fall back to small
* @VM_FAULT_DONE_COW: ->fault has fully handled COW
* @VM_FAULT_NEEDDSYNC: ->fault did not modify page tables and needs
@@ -1678,10 +1679,11 @@ enum vm_fault_reason {
VM_FAULT_NOPAGE = (__force vm_fault_t)0x000100,
VM_FAULT_LOCKED = (__force vm_fault_t)0x000200,
VM_FAULT_RETRY = (__force vm_fault_t)0x000400,
- VM_FAULT_FALLBACK = (__force vm_fault_t)0x000800,
- VM_FAULT_DONE_COW = (__force vm_fault_t)0x001000,
- VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x002000,
- VM_FAULT_COMPLETED = (__force vm_fault_t)0x004000,
+ VM_FAULT_RETRY_HARD = (__force vm_fault_t)0x000800,
+ VM_FAULT_FALLBACK = (__force vm_fault_t)0x001000,
+ VM_FAULT_DONE_COW = (__force vm_fault_t)0x002000,
+ VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x004000,
+ VM_FAULT_COMPLETED = (__force vm_fault_t)0x008000,
VM_FAULT_HINDEX_MASK = (__force vm_fault_t)0x0f0000,
};
diff --git a/mm/memory.c b/mm/memory.c
index 7c020995eafc..b3e7ffdd83f9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3797,7 +3797,7 @@ static inline vm_fault_t vmf_can_call_fault(const struct vm_fault *vmf)
if (vma->vm_ops->map_pages || !(vmf->flags & FAULT_FLAG_VMA_LOCK))
return 0;
vma_end_read(vma);
- return VM_FAULT_RETRY;
+ return VM_FAULT_RETRY | VM_FAULT_RETRY_HARD;
}
/**
@@ -3824,7 +3824,7 @@ vm_fault_t __vmf_anon_prepare(struct vm_fault *vmf)
return 0;
if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
if (!mmap_read_trylock(vma->vm_mm))
- return VM_FAULT_RETRY;
+ return VM_FAULT_RETRY | VM_FAULT_RETRY_HARD;
}
if (__anon_vma_prepare(vma))
ret = VM_FAULT_OOM;
@@ -4778,7 +4778,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
* under VMA lock.
*/
vma_end_read(vma);
- ret = VM_FAULT_RETRY;
+ ret = VM_FAULT_RETRY | VM_FAULT_RETRY_HARD;
goto out;
}
For the second step, we can move forward with your approach of
ripping out the PF retry code, after getting in touch with the
owners of those popular apps one by one to understand why they are
doing this and whether they can find a different approach. In
short, this would allow for a one- or two-year transition period.
What do you think about that?
[1] https://lore.kernel.org/linux-mm/CAGsJ_4xC5LdhuoWV1=tK-RZ5rkjc8aOKOkmb1L_8BG_3gtJhDg@mail.gmail.com/
next prev parent reply other threads:[~2026-05-22 2:33 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-30 4:04 [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance Barry Song (Xiaomi)
2026-04-30 4:04 ` [PATCH v2 1/5] mm/filemap: Retry fault by VMA lock if the lock was released for I/O Barry Song (Xiaomi)
2026-04-30 4:04 ` [PATCH v2 2/5] mm/swapin: Retry swapin " Barry Song (Xiaomi)
2026-04-30 4:04 ` [PATCH v2 3/5] mm: Move folio_lock_or_retry() and drop __folio_lock_or_retry() Barry Song (Xiaomi)
2026-04-30 4:04 ` [PATCH v2 4/5] mm: Don't retry page fault if folio is uptodate during swap-in Barry Song (Xiaomi)
2026-04-30 12:35 ` Matthew Wilcox
2026-05-01 16:11 ` Matthew Wilcox
2026-04-30 4:04 ` [PATCH v2 5/5] mm/filemap: Avoid retrying page faults on uptodate folios in filemap faults Barry Song (Xiaomi)
2026-04-30 12:37 ` [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance Matthew Wilcox
2026-04-30 22:49 ` Barry Song
2026-05-01 14:56 ` Matthew Wilcox
2026-05-01 17:44 ` Barry Song
2026-05-01 17:57 ` Matthew Wilcox
2026-05-01 18:25 ` Barry Song
2026-05-01 19:39 ` Matthew Wilcox
2026-05-03 20:39 ` Barry Song
2026-05-03 13:13 ` Jan Kara
2026-05-03 19:55 ` Barry Song
2026-05-04 13:03 ` Jan Kara
2026-05-04 13:35 ` Barry Song
2026-05-04 14:15 ` Barry Song
2026-05-17 8:45 ` Barry Song
2026-05-18 9:46 ` Lorenzo Stoakes
2026-05-18 11:25 ` Barry Song
2026-05-18 16:17 ` Matthew Wilcox
2026-05-18 20:50 ` Barry Song
2026-05-18 19:56 ` Suren Baghdasaryan
2026-05-18 21:14 ` Barry Song
2026-05-19 12:45 ` Lorenzo Stoakes
2026-05-19 14:17 ` Liam R. Howlett
2026-05-19 22:01 ` Barry Song
2026-05-20 21:04 ` Matthew Wilcox
2026-05-20 21:14 ` Barry Song
2026-05-20 21:15 ` Matthew Wilcox
2026-05-20 21:35 ` David Hildenbrand (Arm)
2026-05-20 23:37 ` Barry Song
2026-05-22 2:33 ` Barry Song (Xiaomi) [this message]
2026-05-19 12:53 ` Lorenzo Stoakes
2026-05-19 21:18 ` Barry Song
2026-05-20 7:50 ` Lorenzo Stoakes
2026-05-20 9:07 ` Barry Song
2026-05-20 10:07 ` Lorenzo Stoakes
2026-05-20 16:20 ` Suren Baghdasaryan
2026-05-20 5:51 ` Suren Baghdasaryan
2026-05-20 10:33 ` David Hildenbrand (Arm)
2026-05-20 12:55 ` Lorenzo Stoakes
2026-05-20 21:39 ` Yang Shi
2026-05-19 12:43 ` Lorenzo Stoakes
2026-05-18 9:53 ` David Hildenbrand (Arm)
2026-05-19 13:42 ` Lorenzo Stoakes
2026-05-18 21:21 ` Yang Shi
2026-05-19 11:07 ` Barry Song
2026-05-19 13:34 ` Lorenzo Stoakes
2026-05-19 18:50 ` Yang Shi
2026-05-19 20:53 ` Yang Shi
2026-05-19 13:12 ` Lorenzo Stoakes
2026-05-19 13:39 ` Lorenzo Stoakes
2026-05-19 18:41 ` Yang Shi
2026-05-19 21:02 ` Yang Shi
2026-05-20 8:11 ` Lorenzo Stoakes
2026-05-01 15:52 ` Lorenzo Stoakes
2026-05-01 16:06 ` Matthew Wilcox
2026-05-01 17:09 ` Lorenzo Stoakes
2026-05-01 17:59 ` Barry Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260522023305.98223-1-baohua@kernel.org \
--to=baohua@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=bhe@redhat.com \
--cc=chentao@kylinos.cn \
--cc=chrisl@kernel.org \
--cc=david@kernel.org \
--cc=jack@suse.cz \
--cc=kasong@tencent.com \
--cc=kunwu.chan@gmail.com \
--cc=liam@infradead.org \
--cc=lianux.mm@gmail.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=liyangouwen1@oppo.com \
--cc=ljs@kernel.org \
--cc=loongarch@lists.linux.dev \
--cc=mhocko@suse.com \
--cc=nphamcs@gmail.com \
--cc=nzzhao@126.com \
--cc=pfalcato@suse.de \
--cc=rppt@kernel.org \
--cc=shikemeng@huaweicloud.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=wanglian@kylinos.cn \
--cc=willy@infradead.org \
--cc=youngjun.park@lge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox