From: Lance Yang <lance.yang@linux.dev>
To: hughd@google.com
Cc: Liam.Howlett@oracle.com, akpm@linux-foundation.org,
baohua@kernel.org, baolin.wang@linux.alibaba.com,
david@kernel.org, dev.jain@arm.com, ioworker0@gmail.com,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
lorenzo.stoakes@oracle.com, mhocko@suse.com, npache@redhat.com,
rppt@kernel.org, ryan.roberts@arm.com, surenb@google.com,
vbabka@suse.cz, ziy@nvidia.com, Lance Yang <lance.yang@linux.dev>
Subject: Re: [PATCH v1 1/1] mm/khugepaged: move tlb_remove_table_sync_one out from under PTL
Date: Sun, 18 Jan 2026 16:39:11 +0800
Message-ID: <20260118083911.21523-1-lance.yang@linux.dev>
In-Reply-To: <62e637cf-91e6-454d-a943-e5946bdf7784@linux.dev>
Hi Hugh,
Could you check if my understanding is correct?
On PAE, pmdp_get_lockless() reads pmd_low first, then pmd_high, and then
rechecks pmd_low. Because the entry is read in two halves, another CPU can
modify the PMD between the reads, and the two halves may not belong together.
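To make the PAE hazard concrete, here is a rough userspace C11 model of the
split read. It is only a sketch under assumptions. struct model_pmd,
MODEL_PRESENT and the model_* helpers are hypothetical. C11 acquire/release
atomics stand in for smp_rmb()/smp_wmb(). The writer ordering (high before low
when setting, low before high when clearing) is an assumption of the model.
Only the shape of the PAE reader loop (read low, read high, recheck low) is
mirrored. Even with the recheck, the halves are not guaranteed to match, which
is why the real reader also keeps interrupts off.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical model of a PAE PMD: a 64-bit entry stored as two 32-bit
 * halves that cannot be loaded with one plain instruction.
 * This is not the kernel's pmd_t.
 */
struct model_pmd {
	_Atomic uint32_t low;	/* carries the present bit in this model */
	_Atomic uint32_t high;
};

#define MODEL_PRESENT 0x1u

/* Not present -> present (assumed ordering): publish high before low. */
static void model_pmd_set(struct model_pmd *p, uint64_t val)
{
	assert((uint32_t)val & MODEL_PRESENT);	/* model only sets present entries */
	atomic_store_explicit(&p->high, (uint32_t)(val >> 32), memory_order_relaxed);
	atomic_store_explicit(&p->low, (uint32_t)val, memory_order_release);
}

/* Present -> not present (assumed ordering): clear low before high. */
static void model_pmd_clear(struct model_pmd *p)
{
	atomic_store_explicit(&p->low, 0, memory_order_relaxed);
	atomic_store_explicit(&p->high, 0, memory_order_release);
}

/* Same shape as the PAE reader: read low, read high, retry if low moved. */
static uint64_t model_pmd_get_lockless(struct model_pmd *p)
{
	uint32_t low, high;

	do {
		low = atomic_load_explicit(&p->low, memory_order_acquire);
		high = atomic_load_explicit(&p->high, memory_order_acquire);
	} while (low != atomic_load_explicit(&p->low, memory_order_relaxed));

	return ((uint64_t)high << 32) | low;
}
```

The recheck catches a change to pmd_low that lands between the two reads. It
cannot prove that pmd_high was written together with the pmd_low it was paired
with. In the kernel, that gap is covered by keeping interrupts off.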
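Commit 146b42e07494[1] added pmdp_get_lockless_start() and
pmdp_get_lockless_end() around the split read in __pte_offset_map(). These
do local_irq_save() and local_irq_restore(), so a CPU in the middle of the
read cannot handle a TLB flush IPI.
On the writer side, after the PMD has been changed, pmdp_get_lockless_sync()
sends an IPI to all CPUs and waits for them. This ensures that any split read
already in progress has completed before pte_free_defer() is called.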
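The handshake can be modelled in userspace as well. This is a sketch, not
kernel code. MODEL_NR_CPUS, model_read_seq and the model_* functions are
hypothetical. A per-CPU counter plays the role of "interrupts off", and
spinning on that counter replaces the real IPI. The model shows the one
property the argument relies on: when the sync returns, every read that was
already in progress when it started has finished.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one read counter per "CPU". An odd value means that
 * CPU is inside its irq-off lockless read section. */
#define MODEL_NR_CPUS 4

static _Atomic unsigned long model_read_seq[MODEL_NR_CPUS];

static bool model_cpu_in_read(int cpu)
{
	return atomic_load(&model_read_seq[cpu]) & 1;
}

/* Stands in for pmdp_get_lockless_start(), i.e. local_irq_save(). */
static void model_read_begin(int cpu)
{
	assert(!model_cpu_in_read(cpu));		/* no nesting in this model */
	atomic_fetch_add(&model_read_seq[cpu], 1);	/* counter becomes odd */
}

/* Stands in for pmdp_get_lockless_end(), i.e. local_irq_restore(). */
static void model_read_end(int cpu)
{
	assert(model_cpu_in_read(cpu));
	atomic_fetch_add(&model_read_seq[cpu], 1);	/* counter becomes even */
}

/*
 * Stands in for pmdp_get_lockless_sync() on PAE. A CPU with interrupts off
 * cannot handle the real IPI, so waiting for every CPU to answer means
 * waiting for every read section that was already open. Here we snapshot
 * the counters and spin on each CPU that was mid-read at that moment.
 * Returns the number of CPUs that had to be waited for.
 */
static int model_lockless_sync(void)
{
	unsigned long snap[MODEL_NR_CPUS];
	int waited = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		snap[cpu] = atomic_load(&model_read_seq[cpu]);

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (!(snap[cpu] & 1))
			continue;	/* not reading when the sync started */
		waited++;
		while (atomic_load(&model_read_seq[cpu]) == snap[cpu])
			;		/* spin until that section is closed */
	}
	return waited;
}
```

The writer still has to clear the entry before calling the sync, because the
sync only covers readers that were already mid-read. The busy wait is also
what makes the real call expensive enough that we would rather not do it
under PTL.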
As commit 146b42e07494[1] says:
```
Complement this pmdp_get_lockless_start() and pmdp_get_lockless_end(),
used only locally in __pte_offset_map(), with a pmdp_get_lockless_sync()
synonym for tlb_remove_table_sync_one(): to send the necessary interrupt
at the right moment on those configs which do not already send it.
```
And commit 1043173eb5eb[2] says:
```
Follow the pattern in retract_page_tables(); and using pte_free_defer()
removes most of the need for tlb_remove_table_sync_one() here; but call
pmdp_get_lockless_sync() to use it in the PAE case.
```
Regarding moving pmdp_get_lockless_sync() out from under PTL: lockless
readers (e.g., GUP-fast, __pte_offset_map()) are protected by
local_irq_save() rather than by PTL, so holding PTL adds nothing to this
synchronization. pmdp_get_lockless_sync() can therefore be called outside
PTL, as long as it is called after the PMD has been cleared by
pmdp_collapse_flush() and before pte_free_defer().
In contrast, for non-PAE, PMD reads are atomic, so pmdp_get_lockless_sync()
is a no-op.
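For the khugepaged side, here is a trace-only sketch of the ordering I have in
mind. It is simplified and hypothetical. model_step(), model_collapse() and
model_index_of() only record step names. A single "ptl" stands in for the
page table lock(s) involved. The pae flag is a runtime stand-in for what is a
compile-time choice in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer: each step records its name, nothing more. */
#define MODEL_MAX_STEPS 8

static const char *model_trace[MODEL_MAX_STEPS];
static int model_nsteps;

static void model_step(const char *name)
{
	assert(model_nsteps < MODEL_MAX_STEPS);
	model_trace[model_nsteps++] = name;
}

/* Simplified collapse path with the sync moved after the unlock. In the
 * kernel the non-PAE pmdp_get_lockless_sync() is empty, modelled here by
 * skipping the step when pae is false. */
static void model_collapse(bool pae)
{
	model_nsteps = 0;
	model_step("spin_lock(ptl)");
	model_step("pmdp_collapse_flush");	/* PMD cleared, TLB flushed */
	model_step("spin_unlock(ptl)");
	if (pae)
		model_step("pmdp_get_lockless_sync");	/* IPIs sent outside PTL */
	model_step("pte_free_defer");		/* table handed off only after sync */
}

/* Position of a step in the trace, or -1 if it did not run. */
static int model_index_of(const char *name)
{
	for (int i = 0; i < model_nsteps; i++)
		if (strcmp(model_trace[i], name) == 0)
			return i;
	return -1;
}
```

The constraints that matter are that the sync comes after
pmdp_collapse_flush() and before pte_free_defer(). Whether it runs before or
after the unlock makes no difference to the lockless readers. With pae set to
false the sync step disappears, matching the no-op on non-PAE.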
[1] https://github.com/torvalds/linux/commit/146b42e07494e45f7c7bcf2cbf7afd1424afd78e
[2] https://github.com/torvalds/linux/commit/1043173eb5eb351a1dba11cca12705075fe74a9e
Thanks,
Lance
On Fri, 16 Jan 2026 09:25:54 +0800, Lance Yang wrote:
>
>
> On 2026/1/16 09:03, Baolin Wang wrote:
> >
> >
> > On 1/15/26 8:28 PM, Lance Yang wrote:
> >>
> >>
> >> On 2026/1/15 18:00, Baolin Wang wrote:
> >>> Hi Lance,
> >>>
> >>> On 1/15/26 3:16 PM, Lance Yang wrote:
> >>>> From: Lance Yang <lance.yang@linux.dev>
> >>>>
> >>>> tlb_remove_table_sync_one() sends IPIs to all CPUs and waits for them,
> >>>> which we really don't want to do while holding PTL.
> >>>
> >>> Could you add more comments to explain why this is safe for the PAE
> >>> case?
> >>
> >> Yep, IIUC, it is safe because we've already done pmdp_collapse_flush()
> >> which ensures the PMD change is visible.
> >>
> >> pmdp_get_lockless_sync() (which calls tlb_remove_table_sync_one() on PAE)
> >> is just to ensure any ongoing lockless pmd readers (e.g., GUP-fast)
> >> complete before we proceed. It sends IPIs to all CPUs and waits for
> >> responses - a CPU can only respond when it's not between
> >> local_irq_save() and local_irq_restore().
> >>
> >> Moving it out from under PTL doesn't change the synchronization
> >> semantics, since lockless readers don't depend on PTL anyway.
> >
> > Cc Hugh who introduced the pmdp_get_lockless_sync(), to double check.
> >
> > Sounds reasonable to me, please add these comments into the commit
> > message. Thanks.
>
> Yes, will do. Thanks!
>
> >
> >>> For the non-PAE case, you added a new tlb_remove_table_sync_one(),
> >>> why do we need this (what problem does it solve)? Please also add
> >>> more comments to explain.
> >>
> >> Oops, you're right, the original macro was a no-op for non-PAE.
> >>
> >> I should just move the macro call out from under PTL, rather than
> >> replacing it with direct tlb_remove_table_sync_one() calls.
> >
> > OK.
>
> Cheers,
> Lance
>