From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Nico Pache <npache@redhat.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org
Cc: aarcange@redhat.com, akpm@linux-foundation.org,
anshuman.khandual@arm.com, apopple@nvidia.com, baohua@kernel.org,
baolin.wang@linux.alibaba.com, byungchul@sk.com,
catalin.marinas@arm.com, cl@gentwo.org, corbet@lwn.net,
dave.hansen@linux.intel.com, dev.jain@arm.com, gourry@gourry.net,
hannes@cmpxchg.org, hughd@google.com, jack@suse.cz,
jackmanb@google.com, jannh@google.com, jglisse@google.com,
joshua.hahnjy@gmail.com, kas@kernel.org, lance.yang@linux.dev,
liam@infradead.org, ljs@kernel.org,
mathieu.desnoyers@efficios.com, matthew.brost@intel.com,
mhiramat@kernel.org, mhocko@suse.com, peterx@redhat.com,
pfalcato@suse.de, rakie.kim@sk.com, raquini@redhat.com,
rdunlap@infradead.org, richard.weiyang@gmail.com,
rientjes@google.com, rostedt@goodmis.org, rppt@kernel.org,
ryan.roberts@arm.com, shivankg@amd.com, sunnanyong@huawei.com,
surenb@google.com, thomas.hellstrom@linux.intel.com,
tiwai@suse.de, usamaarif642@gmail.com, vbabka@suse.cz,
vishal.moola@gmail.com, wangkefeng.wang@huawei.com,
will@kernel.org, willy@infradead.org,
yang@os.amperecomputing.com, ying.huang@linux.alibaba.com,
ziy@nvidia.com, zokeefe@google.com,
Usama Arif <usama.arif@linux.dev>
Subject: Re: [PATCH mm-unstable v18 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse
Date: Fri, 22 May 2026 23:47:25 +0200
Message-ID: <eabb0c67-e595-4215-b88b-9c821b4de01d@kernel.org>
In-Reply-To: <20260522150009.121603-7-npache@redhat.com>

On 5/22/26 17:00, Nico Pache wrote:
> Pass an order and offset to collapse_huge_page to support collapsing anon
> memory to arbitrary orders within a PMD. order indicates what mTHP size we
> are attempting to collapse to, and offset indicates where in the PMD to
> start the collapse attempt.
>
> For non-PMD collapse we must leave the anon VMA write locked until after
> we collapse the mTHP-- in the PMD case all the pages are isolated, but in
> the mTHP case this is not true, and we must keep the lock to prevent
> access/changes to the page tables. This can happen if the rmap walkers hit
> a pmd_none while the PMD entry is currently unavailable due to being
> temporarily removed during the collapse phase.
>
> Acked-by: Usama Arif <usama.arif@linux.dev>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
I guess we should add a comment here like:
/*
 * Only notify about the PTE range we will actually modify. While we
 * temporarily unmap the whole PTE table for mTHP collapse, we'll remap
 * it later, leaving other PTEs effectively unmodified. The locks we hold
 * prevent anybody from stumbling over such temporarily unmapped PTE tables.
 */
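
To illustrate the range that comment talks about, here is a minimal userspace sketch. It assumes 4 KiB base pages, a 2 MiB PMD, and that offset is counted in base pages from the PMD-aligned address; the helper name mthp_collapse_range() and struct collapse_range are hypothetical, and the quoted hunks do not show how the patch actually derives start_addr and end_addr.

```c
#include <assert.h>

/* Assumed constants: 4 KiB base pages and a 2 MiB PMD. */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (1UL << PAGE_SHIFT)
#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_SIZE  (PAGE_SIZE << HPAGE_PMD_ORDER)

struct collapse_range {
	unsigned long start_addr;	/* first byte we will modify */
	unsigned long end_addr;		/* one past the last byte */
};

/*
 * Hypothetical helper: derive the notified range from the PMD-aligned
 * address, the page offset within the PMD, and the collapse order.
 */
static struct collapse_range mthp_collapse_range(unsigned long pmd_addr,
						 unsigned int offset,
						 unsigned int order)
{
	struct collapse_range r;

	assert((pmd_addr & (HPAGE_PMD_SIZE - 1)) == 0);
	assert(order <= HPAGE_PMD_ORDER);
	/* Assumption: the target range is naturally aligned to its order. */
	assert(offset % (1U << order) == 0);
	assert(offset + (1U << order) <= (1U << HPAGE_PMD_ORDER));

	r.start_addr = pmd_addr + offset * PAGE_SIZE;
	r.end_addr = r.start_addr + (PAGE_SIZE << order);
	return r;
}
```

Under these assumptions, an order-4 collapse at offset 16 covers only 64 KiB of the 2 MiB PMD range, whereas order 9 at offset 0 covers the same range as the old address..address + HPAGE_PMD_SIZE notification.
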
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> - address + HPAGE_PMD_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, start_addr,
> + end_addr);
> mmu_notifier_invalidate_range_start(&range);
>
> pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> @@ -1294,26 +1297,23 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> * Parallel GUP-fast is fine since GUP-fast will back off when
> * it detects PMD is changed.
> */
> - _pmd = pmdp_collapse_flush(vma, address, pmd);
> + _pmd = pmdp_collapse_flush(vma, pmd_addr, pmd);
> spin_unlock(pmd_ptl);
> mmu_notifier_invalidate_range_end(&range);
> tlb_remove_table_sync_one();
>
> - pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
> + pte = pte_offset_map_lock(mm, &_pmd, start_addr, &pte_ptl);
> if (pte) {
> - result = __collapse_huge_page_isolate(vma, address, pte, cc,
> - HPAGE_PMD_ORDER,
> - &compound_pagelist);
> + result = __collapse_huge_page_isolate(vma, start_addr, pte, cc,
> + order, &compound_pagelist);
> spin_unlock(pte_ptl);
> } else {
> result = SCAN_NO_PTE_TABLE;
> }
>
> if (unlikely(result != SCAN_SUCCEED)) {
> - if (pte)
> - pte_unmap(pte);
> spin_lock(pmd_ptl);
> - BUG_ON(!pmd_none(*pmd));
> + WARN_ON_ONCE(!pmd_none(*pmd));
Likely VM_WARN_ON_ONCE is sufficient.
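
For context, a simplified userspace sketch of the difference between the two: in the kernel, VM_WARN_ON_ONCE() behaves like WARN_ON_ONCE() only with CONFIG_DEBUG_VM enabled and otherwise merely type-checks the condition. The macros below are stand-ins written for illustration, not the kernel definitions, and warn_count plus the two check_*() functions are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* number of warnings actually emitted */

/* Simplified stand-in: warn only the first time this call site fires. */
#define WARN_ON_ONCE(cond) do {					\
	static bool warned;					\
	if ((cond) && !warned) {				\
		warned = true;					\
		warn_count++;					\
		fprintf(stderr, "WARNING: %s\n", #cond);	\
	}							\
} while (0)

#ifdef CONFIG_DEBUG_VM
#define VM_WARN_ON_ONCE(cond) WARN_ON_ONCE(cond)
#else
/* Type-check the condition without evaluating it. */
#define VM_WARN_ON_ONCE(cond) ((void)sizeof((long)(cond)))
#endif

/* Always checked, regardless of debug configuration. */
static void check_always(bool pmd_is_none)
{
	WARN_ON_ONCE(!pmd_is_none);
}

/* Checked only when built with -DCONFIG_DEBUG_VM. */
static void check_debug_only(bool pmd_is_none)
{
	VM_WARN_ON_ONCE(!pmd_is_none);
}
```

Built without -DCONFIG_DEBUG_VM, check_debug_only() compiles to nothing, which is the point of the suggestion: the sanity check stays available in debug builds without costing anything in production ones.
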
> /*
> * We can only use set_pmd_at when establishing
> * hugepmds and never for establishing regular pmds that
> @@ -1321,21 +1321,24 @@ static enum scan_result collapse_huge_page(struct mm_struct *mm, unsigned long a
> */
> pmd_populate(mm, pmd, pmd_pgtable(_pmd));
> spin_unlock(pmd_ptl);
> - anon_vma_unlock_write(vma->anon_vma);
> goto out_up_write;
> }
>
> /*
> - * All pages are isolated and locked so anon_vma rmap
> - * can't run anymore.
> + * For PMD collapse all pages are isolated and locked so anon_vma
> + * rmap can't run anymore. For mTHP collapse the PMD entry has been
> + * removed and not all pages are isolated and locked, so we must hold
> + * the lock to prevent neighboring folios from attempting to access
> + * this PMD until its reinstalled.
> */
That makes sense. I was wondering whether there was another reason for dropping
the anon_vma lock ... I guess it was just for latency purposes given that there
was no actual need for the lock anymore once all folios in the range were
isolated+locked.

With the two nits above addressed

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

--
Cheers,
David