Date: Mon, 03 Mar 2025 14:56:11 -0800
To:
mm-commits@vger.kernel.org,willy@infradead.org,vbabka@suse.cz,tj@kernel.org,tglx@linutronix.de,muchun.song@linux.dev,mkoutny@suse.com,mingo@redhat.com,luto@kernel.org,lorenzo.stoakes@oracle.com,lizefan.x@bytedance.com,liam.howlett@oracle.com,kirill.shutemov@linux.intel.com,jannh@google.com,ioworker0@gmail.com,hannes@cmpxchg.org,dave.hansen@linux.intel.com,corbet@lwn.net,bp@alien8.de,david@redhat.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch added to mm-unstable branch
Message-Id: <20250303225611.A246DC4CEE8@smtp.kernel.org>

The patch titled
     Subject: mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
has been added to the -mm mm-unstable branch.  Its filename is
     mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch

This patch will later appear in the mm-unstable branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: David Hildenbrand
Subject: mm: Copy-on-Write (COW) reuse support for PTE-mapped THP
Date: Mon, 3 Mar 2025 17:30:06 +0100

Currently, we never end up reusing PTE-mapped THPs after fork.
This wasn't really a problem with PMD-sized THPs, because they would have
to be PTE-mapped first, but it's becoming a problem with smaller THP sizes
that are effectively always PTE-mapped.

With our new "mapped exclusively" vs. "maybe mapped shared" logic for
large folios, implementing CoW reuse for PTE-mapped THPs is
straightforward: if exclusively mapped, make sure that all references are
from these (our) mappings.  Add some helpful comments to explain the
details.

CONFIG_TRANSPARENT_HUGEPAGE selects CONFIG_MM_ID.  If we spot an anon
large folio without CONFIG_TRANSPARENT_HUGEPAGE in that code, something is
seriously messed up.

There are plenty of things we can optimize in the future.  For example, we
could remember that the folio is fully exclusive so we could speed up the
next fault further.  Also, we could try "faulting around", turning
surrounding PTEs that map the same folio writable.  The latter in
particular might increase COW latency, so it would need further
investigation.

Link: https://lkml.kernel.org/r/20250303163014.1128035-14-david@redhat.com
Signed-off-by: David Hildenbrand
Cc: Andy Lutomirski
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: Ingo Molnar
Cc: Jann Horn
Cc: Johannes Weiner
Cc: Jonathan Corbet
Cc: Kirill A. Shutemov
Cc: Lance Yang
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Matthew Wilcox (Oracle)
Cc: Michal Koutný
Cc: Muchun Song
Cc: Tejun Heo
Cc: Thomas Gleixner
Cc: Vlastimil Babka
Cc: Zefan Li
Signed-off-by: Andrew Morton
---

 mm/memory.c | 83 +++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 75 insertions(+), 8 deletions(-)

--- a/mm/memory.c~mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp
+++ a/mm/memory.c
@@ -3729,19 +3729,86 @@ static vm_fault_t wp_page_shared(struct
 	return ret;
 }
 
-static bool wp_can_reuse_anon_folio(struct folio *folio,
-				    struct vm_area_struct *vma)
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
 {
+	bool exclusive = false;
+
+	/* Let's just free up a large folio if only a single page is mapped. */
+	if (folio_large_mapcount(folio) <= 1)
+		return false;
+
 	/*
-	 * We could currently only reuse a subpage of a large folio if no
-	 * other subpages of the large folios are still mapped. However,
-	 * let's just consistently not reuse subpages even if we could
-	 * reuse in that scenario, and give back a large folio a bit
-	 * sooner.
+	 * The assumption for anonymous folios is that each page can only get
+	 * mapped once into each MM. The only exception are KSM folios, which
+	 * are always small.
+	 *
+	 * Each taken mapcount must be paired with exactly one taken reference,
+	 * whereby the refcount must be incremented before the mapcount when
+	 * mapping a page, and the refcount must be decremented after the
+	 * mapcount when unmapping a page.
+	 *
+	 * If all folio references are from mappings, and all mappings are in
+	 * the page tables of this MM, then this folio is exclusive to this MM.
 	 */
-	if (folio_test_large(folio))
+	if (folio_test_large_maybe_mapped_shared(folio))
 		return false;
 
+	VM_WARN_ON_ONCE(folio_test_ksm(folio));
+	VM_WARN_ON_ONCE(folio_mapcount(folio) > folio_nr_pages(folio));
+	VM_WARN_ON_ONCE(folio_entire_mapcount(folio));
+
+	if (unlikely(folio_test_swapcache(folio))) {
+		/*
+		 * Note: freeing up the swapcache will fail if some PTEs are
+		 * still swap entries.
+		 */
+		if (!folio_trylock(folio))
+			return false;
+		folio_free_swap(folio);
+		folio_unlock(folio);
+	}
+
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
+		return false;
+
+	/* Stabilize the mapcount vs. refcount and recheck. */
+	folio_lock_large_mapcount(folio);
+	VM_WARN_ON_ONCE(folio_large_mapcount(folio) < folio_ref_count(folio));
+
+	if (folio_test_large_maybe_mapped_shared(folio))
+		goto unlock;
+	if (folio_large_mapcount(folio) != folio_ref_count(folio))
+		goto unlock;
+
+	VM_WARN_ON_ONCE(folio_mm_id(folio, 0) != vma->vm_mm->mm_id &&
+			folio_mm_id(folio, 1) != vma->vm_mm->mm_id);
+
+	/*
+	 * Do we need the folio lock? Likely not. If there would have been
+	 * references from page migration/swapout, we would have detected
+	 * an additional folio reference and never ended up here.
+	 */
+	exclusive = true;
+unlock:
+	folio_unlock_large_mapcount(folio);
+	return exclusive;
+}
+#else /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static bool __wp_can_reuse_large_anon_folio(struct folio *folio,
+		struct vm_area_struct *vma)
+{
+	BUILD_BUG();
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static bool wp_can_reuse_anon_folio(struct folio *folio,
+				    struct vm_area_struct *vma)
+{
+	if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && folio_test_large(folio))
+		return __wp_can_reuse_large_anon_folio(folio, vma);
+
 	/*
 	 * We have to verify under folio lock: these early checks are
 	 * just an optimization to avoid locking the folio and freeing
_

Patches currently in -mm which might be from david@redhat.com are

mm-gup-reject-foll_split_pmd-with-hugetlb-vmas.patch
mm-rmap-reject-hugetlb-folios-in-folio_make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive.patch
mm-rmap-convert-make_device_exclusive_range-to-make_device_exclusive-fix.patch
mm-rmap-implement-make_device_exclusive-using-folio_walk-instead-of-rmap-walk.patch
mm-memory-detect-writability-in-restore_exclusive_pte-through-can_change_pte_writable.patch
mm-use-single-swp_device_exclusive-entry-type.patch
mm-page_vma_mapped-device-exclusive-entries-are-not-migration-entries.patch
kernel-events-uprobes-handle-device-exclusive-entries-correctly-in-__replace_page.patch
mm-ksm-handle-device-exclusive-entries-correctly-in-write_protect_page.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_unmap_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-try_to_migrate_one.patch
mm-rmap-handle-device-exclusive-entries-correctly-in-page_vma_mkclean_one.patch
mm-page_idle-handle-device-exclusive-entries-correctly-in-page_idle_clear_pte_refs_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_young_one.patch
mm-damon-handle-device-exclusive-entries-correctly-in-damon_folio_mkold_one.patch
mm-rmap-keep-mapcount-untouched-for-device-exclusive-entries.patch
mm-rmap-avoid-ebusy-from-make_device_exclusive.patch
lib-test_hmm-make-dmirror_atomic_map-consume-a-single-page.patch
mm-memory-remove-pageanonexclusive-sanity-check-in-restore_exclusive_pte.patch
mm-memory-pass-folio-and-pte-to-restore_exclusive_pte.patch
mm-memory-document-restore_exclusive_pte.patch
mm-mmu_notifier-use-mmu_notify_clear-in-remove_device_exclusive_entry.patch
mm-factor-out-large-folio-handling-from-folio_order-into-folio_large_order.patch
mm-factor-out-large-folio-handling-from-folio_nr_pages-into-folio_large_nr_pages.patch
mm-let-_folio_nr_pages-overlay-memcg_data-in-first-tail-page.patch
mm-move-hugetlb-specific-things-in-folio-to-page.patch
mm-move-_pincount-in-folio-to-page-on-32bit.patch
mm-move-_entire_mapcount-in-folio-to-page-on-32bit.patch
mm-rmap-pass-dst_vma-to-folio_dup_file_rmap_pte-and-friends.patch
mm-rmap-pass-vma-to-__folio_add_rmap.patch
mm-rmap-abstract-large-mapcount-operations-for-large-folios-hugetlb.patch
bit_spinlock-__always_inline-unlock-functions.patch
mm-rmap-use-folio_large_nr_pages-in-add-remove-functions.patch
mm-rmap-basic-mm-owner-tracking-for-large-folios-hugetlb.patch
mm-copy-on-write-cow-reuse-support-for-pte-mapped-thp.patch
mm-convert-folio_likely_mapped_shared-to-folio_maybe_mapped_shared.patch
mm-config_no_page_mapcount-to-prepare-for-not-maintain-per-page-mapcounts-in-large-folios.patch
fs-proc-page-remove-per-page-mapcount-dependency-for-proc-kpagecount-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-pm_mmap_exclusive-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-mapmax-config_no_page_mapcount.patch
fs-proc-task_mmu-remove-per-page-mapcount-dependency-for-smaps-smaps_rollup-config_no_page_mapcount.patch
mm-stop-maintaining-the-per-page-mapcount-of-large-folios-config_no_page_mapcount.patch