Linux Documentation
From: Lance Yang <lance.yang@linux.dev>
To: ljs@kernel.org, david@kernel.org, npache@redhat.com
Cc: lance.yang@linux.dev, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org, aarcange@redhat.com,
	akpm@linux-foundation.org, anshuman.khandual@arm.com,
	apopple@nvidia.com, baohua@kernel.org,
	baolin.wang@linux.alibaba.com, byungchul@sk.com,
	catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
	dave.hansen@linux.intel.com, dev.jain@arm.com, gourry@gourry.net,
	hannes@cmpxchg.org, hughd@google.com, jack@suse.cz,
	jackmanb@google.com, jannh@google.com, jglisse@google.com,
	joshua.hahnjy@gmail.com, kas@kernel.org, liam@infradead.org,
	mathieu.desnoyers@efficios.com, matthew.brost@intel.com,
	mhiramat@kernel.org, mhocko@suse.com, peterx@redhat.com,
	pfalcato@suse.de, rakie.kim@sk.com, raquini@redhat.com,
	rdunlap@infradead.org, richard.weiyang@gmail.com,
	rientjes@google.com, rostedt@goodmis.org, rppt@kernel.org,
	ryan.roberts@arm.com, shivankg@amd.com, sunnanyong@huawei.com,
	surenb@google.com, thomas.hellstrom@linux.intel.com,
	tiwai@suse.de, usamaarif642@gmail.com, vbabka@suse.cz,
	vishal.moola@gmail.com, wangkefeng.wang@huawei.com,
	will@kernel.org, willy@infradead.org,
	yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
	ziy@nvidia.com, zokeefe@google.com, usama.arif@linux.dev
Subject: Re: [PATCH mm-unstable v18 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse
Date: Fri,  5 Jun 2026 16:59:03 +0800
Message-ID: <20260605085903.77186-1-lance.yang@linux.dev>
In-Reply-To: <aiJ90SWqXvwN9dNT@lucifer>


On Fri, Jun 05, 2026 at 09:07:23AM +0100, Lorenzo Stoakes wrote:
>On Fri, Jun 05, 2026 at 09:18:27AM +0200, David Hildenbrand (Arm) wrote:
>> On 6/4/26 19:04, Nico Pache wrote:
>> > On Mon, Jun 1, 2026 at 9:00 AM Nico Pache <npache@redhat.com> wrote:
>> >>
>> >> On Mon, Jun 1, 2026 at 5:14 AM David Hildenbrand (Arm) <david@kernel.org> wrote:
>> >>>
>> >>>
>> >>> Yeah. BTW, I think we'd need a spin_lock_nested(), so @Nico, treat my code as a
>> >>> draft.
>> >>
>> >> Okay, I read the above and did some investigating.
>> >>
>> >> I will try to implement and verify the changes you suggested :)
>> >
>> > I've implemented something slightly different actually and I *think* it's better!
>> >
>> > } else {
>> >        /* this is map_anon_folio_pte_nopf with no mmu update */
>> >         __map_anon_folio_pte_nopf(folio, pte, vma, start_addr,
>> >                       /*uffd_wp=*/ false);
>> >        smp_wmb();
>> >         pmd_populate(mm, pmd, pmd_pgtable(_pmd));
>> >         /*
>> >          * Some architectures (e.g. MIPS) walk the live page table in
>> >          * their implementation. update_mmu_cache_range() must be called
>> >          * with a valid page table hierarchy and the PTE lock held.
>> >          * Acquire it nested inside pmd_ptl when they are distinct locks.
>> >          */
>> >         if (pte_ptl != pmd_ptl)
>> >             spin_lock_nested(pte_ptl, SINGLE_DEPTH_NESTING);
>> >         update_mmu_cache_range(NULL, vma, start_addr, pte, nr_pages);
>> >         if (pte_ptl != pmd_ptl)
>> >             spin_unlock(pte_ptl);
>> >     }
>> > spin_unlock(pmd_ptl);
>> >
>> > The logic here is that when the PMD becomes visible, the PTEs are already
>> > populated (no possibility of spurious faults on the local CPU).
>> >
>> > The smp_wmb() makes sure of the above.
>
>The locks prevent those 'spurious' (really: incorrect) faults anyway so I don't
>think this is necessary.
>
>> >
>> > And the pmd is installed with the pte and pmd lock both held through
>> > the mmu_cache update.
>> >
>> > This follows the conventions used in pmd_install() and clears the
>> > potential for local CPU faults hitting cleared PTE entries.
>>
>> After the pmdp_collapse_flush() we'd be getting CPU faults due to the cleared
>> PMD already? So the case here is rather different.

The issue I was worried about: update_mmu_cache_range() can re-walk
vma->vm_mm while the PTE page table is still not reachable through the
PMD. And, yeah, that assumption is ugly, but it is what it is, and there
may be similar code elsewhere ...

So the ordering we need is "the PMD points to the PTE page table from
_pmd before update_mmu_cache_range()", not "new PTEs before PMD".

Those PTEs are cleared, but we hold the PTL, so nobody else can install
anything there :)

So David's original suggestion looks sufficient to me:

if (pte_ptl != pmd_ptl)
        spin_lock_nested(pte_ptl, SINGLE_DEPTH_NESTING);

pmd_populate();
map_anon_folio_pte_nopf();

if (pte_ptl != pmd_ptl)
        spin_unlock(pte_ptl);
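
For illustration only, here is a rough userspace model of that ordering.
It is not kernel code: pthread mutexes stand in for the PTLs, and
struct pt_model and all model_* helpers are hypothetical names made up
for this sketch. It only shows that the re-walk sees the PTE table when
the PMD is populated first, and that the PTE lock is taken only when it
is a distinct lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct pt_model {
	pthread_mutex_t *pmd_ptl;  /* models the PMD page table lock */
	pthread_mutex_t *pte_ptl;  /* aliases pmd_ptl when PTLs are not split */
	bool pmd_populated;        /* PMD points at the PTE table again */
	bool ptes_mapped;          /* new PTEs have been installed */
	bool walk_saw_table;       /* what the modelled re-walk observed */
};

/* Models update_mmu_cache_range() re-walking the page tables. */
static void model_update_mmu_cache(struct pt_model *m)
{
	m->walk_saw_table = m->pmd_populated;
}

/* Models map_anon_folio_pte_nopf(): install PTEs, then cache maintenance. */
static void model_map_ptes(struct pt_model *m)
{
	m->ptes_mapped = true;
	model_update_mmu_cache(m);
}

/* Tail of the collapse path. The caller already holds pmd_ptl. */
static void model_collapse_tail(struct pt_model *m)
{
	bool split = m->pte_ptl != m->pmd_ptl;

	/* Only take the PTE lock when it is a different lock. */
	if (split)
		pthread_mutex_lock(m->pte_ptl);  /* spin_lock_nested() in the kernel */

	m->pmd_populated = true;             /* pmd_populate() first ... */
	model_map_ptes(m);                   /* ... then map and re-walk */

	if (split)
		pthread_mutex_unlock(m->pte_ptl);
}

/* Runs the tail with split or shared locks; returns what the walk saw. */
static bool model_run(bool split_locks)
{
	pthread_mutex_t pmd_lock = PTHREAD_MUTEX_INITIALIZER;
	pthread_mutex_t pte_lock = PTHREAD_MUTEX_INITIALIZER;
	struct pt_model m = {
		.pmd_ptl = &pmd_lock,
		.pte_ptl = split_locks ? &pte_lock : &pmd_lock,
	};

	pthread_mutex_lock(m.pmd_ptl);
	model_collapse_tail(&m);
	pthread_mutex_unlock(m.pmd_ptl);

	assert(m.ptes_mapped);
	return m.walk_saw_table;
}
```

Calling model_run(true) models split PTLs and model_run(false) models a
shared lock. In the shared case the pointer comparison is what keeps the
model from locking the same mutex twice.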

>Yeah conceptually the code above is problematic because you immediately make the
>PTE available right at the point you populate, so taking a PTE lock after that
>is rather shutting the stable door after the horse has bolted.
>
>Doing it this way is not a good idea in any case because we're adding
>complexity, an extra function and an open-coded cache maintenance call for
>really no benefit.
>
>I asked Nico to abstract the anon folio mapping stuff explicitly so we could
>avoid this sort of duplication so let's not roll that back :)
>
>So again, I think going with the original suggestion (with an updated comment)
>is the right thing to do.
>
>
>Anyway, as an aside: in practice we can't have page faults here, right? The VMA is:
>
>- Ensured to span at least the PMD range (this isn't immediately obvious in the
>  code)
>- VMA write locked (mmap write lock held)
>
>And we hold the anon_vma lock so no rmap walkers can walk the page tables here
>either.
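
To spell out what "spans at least the PMD range" means, here is a
minimal sketch. It is not the actual khugepaged check. It assumes 2 MiB
PMDs (as on x86-64 with 4 KiB base pages), and the MODEL_* macros and
model_vma_spans_pmd() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB PMDs, as on x86-64 with 4 KiB base pages. */
#define MODEL_PMD_SHIFT 21
#define MODEL_PMD_SIZE  (1UL << MODEL_PMD_SHIFT)
#define MODEL_PMD_MASK  (~(MODEL_PMD_SIZE - 1))

/*
 * Returns true when [vm_start, vm_end) fully covers the PMD-aligned
 * range that contains addr.
 */
static bool model_vma_spans_pmd(unsigned long vm_start,
				unsigned long vm_end,
				unsigned long addr)
{
	unsigned long start = addr & MODEL_PMD_MASK;
	unsigned long end = start + MODEL_PMD_SIZE;

	assert(vm_start < vm_end);  /* a VMA is never empty */
	return vm_start <= start && end <= vm_end;
}
```

A VMA that starts or ends even one page inside the aligned range fails
this check.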
>
>So I actually wonder, given that, whether we need the PTE PTL at all.

I'd keep it. Cheap, and lets us sleep better at night :P

>But.
>
>At this stage it'll almost certainly be an owned exclusive cache line so it's
>very low cost to do it, and it means we honour the update_mmu_cache_range()
>contract.
>
>And it also makes it clear that we're gating changes on the PTE being
>untouchable so any future stuff that maybe changes some of these rules doesn't
>get caught out.
>
>So probably worth keeping.

Yes!

Cheers, Lance

>>
>> --
>> Cheers,
>>
>> David
>
>Thanks, Lorenzo
>
