From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,willy@infradead.org,vbabka@suse.cz,tj@kernel.org,tglx@linutronix.de,muchun.song@linux.dev,mkoutny@suse.com,mingo@redhat.com,luto@kernel.org,lorenzo.stoakes@oracle.com,lizefan.x@bytedance.com,liam.howlett@oracle.com,kirill.shutemov@linux.intel.com,jannh@google.com,ioworker0@gmail.com,hannes@cmpxchg.org,dave.hansen@linux.intel.com,corbet@lwn.net,bp@alien8.de,david@redhat.com,akpm@linux-foundation.org
Subject: + mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch added to mm-unstable branch
Date: Mon, 03 Mar 2025 14:56:11 -0800 [thread overview]
Message-ID: <20250303225611.A246DC4CEE8@smtp.kernel.org> (raw)
The patch titled
Subject: mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
has been added to the -mm mm-unstable branch. Its filename is
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david@redhat.com>
Subject: mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
Date: Mon, 3 Mar 2025 17:30:06 +0100
Currently, we never end up reusing PTE-mapped THPs after fork. This
wasn't really a problem with PMD-sized THPs, because they would have to be
PTE-mapped first, but it's getting a problem with smaller THP sizes that
are effectively always PTE-mapped.
With our new "mapped exclusively" vs "maybe mapped shared" logic for large
folios, implementing CoW reuse for PTE-mapped THPs is straight forward: if
exclusively mapped, make sure that all references are from these (our)
mappings. Add some helpful comments to explain the details.
CONFIG_TRANSPARENT_HUGEPAGE selects CONFIG_MM_ID. If we spot an anon
large folio without CONFIG_TRANSPARENT_HUGEPAGE in that code, something is
seriously messed up.
There are plenty of things we can optimize in the future: For example, we
could remember that the folio is fully exclusive so we could speedup the
next fault further. Also, we could try "faulting around", turning
surrounding PTEs that map the same folio writable. But especially the
latter might increase COW latency, so it would need further investigation.
Link: https://lkml.kernel.org/r/20250303163014.1128035-14-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andy Lutomirks^H^Hski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michal Koutn <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: tejun heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/memory.c | 83 +++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 75 insertions(+), 8 deletions(-)
--- a/mm/memory.c~mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp
+++ a/mm/memory.c
@@ -3729,19 +3729,86 @@ static vm_fault_t wp_page_shared(struct
return ret;
}
-static bool wp_can_reuse_anon_folio(struct folio *folio,
- struct vm_area_struct *vma)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+ struct vm_area_struct *vma)
{
+ bool exclusive = false;
+
+ /* Let's just free up a large folio if only a single page is mapped. */
+ if (folio_large_mapcount(folio) <= 1)
+ return false;
+
/*
- * We could currently only reuse a subpage of a large folio if no
- * other subpages of the large folios are still mapped. However,
- * let's just consistently not reuse subpages even if we could
- * reuse in that scenario, and give back a large folio a bit
- * sooner.
+ * The assumption for anonymous folios is that each page can only get
+ * mapped once into each MM. The only exception are KSM folios, which
+ * are always small.
+ *
+ * Each taken mapcount must be paired with exactly one taken reference,
+ * whereby the refcount must be incremented before the mapcount when
+ * mapping a page, and the refcount must be decremented after the
+ * mapcount when unmapping a page.
+ *
+ * If all folio references are from mappings, and all mappings are in
+ * the page tables of this MM, then this folio is exclusive to this MM.
*/
- if (folio_test_large(folio))
+ if (folio_test_large_maybe_mapped_shared(folio))
return false;
+ VM_WARN_ON_ONCE(folio_test_ksm(folio));
+ VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
+ VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
+
+ if (unlikely(folio_test_swapcache(folio))) {
+ /*
+ * Note: freeing up the swapcache will fail if some PTEs are
+ * still swap entries.
+ */
+ if (!folio_trylock(folio))
+ return false;
+ folio_free_swap(folio);
+ folio_unlock(folio);
+ }
+
+ if (folio_large_mapcount(folio) != folio_ref_count(folio))
+ return false;
+
+ /* Stabilize the mapcount vs. refcount and recheck. */
+ folio_lock_large_mapcount(folio);
+ VM_WARN_ON_ONCE(folio_large_mapcount(folio) < folio_ref_count(folio));
+
+ if (folio_test_large_maybe_mapped_shared(folio))
+ goto unlock;
+ if (folio_large_mapcount(folio) != folio_ref_count(folio))
+ goto unlock;
+
+ VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != vma->vm_mm->mm_id &&
+ folio_mm_id(folio, 1) != vma->vm_mm->mm_id);
+
+ /*
+ * Do we need the folio lock? Likely not. If there would have been
+ * references from page migration/swapout, we would have detected
+ * an additional folio reference and never ended up here.
+ */
+ exclusive = true;
+unlock:
+ folio_unlock_large_mapcount(folio);
+ return exclusive;
+}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ BUILD_BUG();
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static bool wp_can_reuse_anon_folio(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
+ return __wp_can_reuse_large_anon_folio(folio, vma);
+
/*
* We have to verify under folio lock: these early checks are
* just an optimization to avoid locking the folio and freeing
_
Patches currently in -mm which might be from david@redhat.com are
mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive-fix.patch
mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch
mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch
mm-use-single-swp_device_exclusive-entry-type.patch
mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch
kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch
mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch
mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch
mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch
mm-rmap-avoid-ebusy-from-make_device_exclusive.patch
lib-test_hmm-make-dmirror_atomic_map-consume-a-single-page.patch
mm-memory-remove-pageanonexclusive-sanity-check-in-restore_exclusive_pte.patch
mm-memory-pass-folio-and-pte-to-restore_exclusive_pte.patch
mm-memory-document-restore_exclusive_pte.patch
mm-mmu_notifier-use-mmu_notify_clear-in-remove_device_exclusive_entry.patch
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch
reply other threads:[~2025-03-03 22:56 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250303225611.A246DC4CEE8@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=ioworker0@gmail.com \
--cc=jannh@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=liam.howlett@oracle.com \
--cc=lizefan.x@bytedance.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=luto@kernel.org \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=mm-commits@vger.kernel.org \
--cc=muchun.song@linux.dev \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.