From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
linux-fsdevel@vger.kernel.org,
"David Hildenbrand" <david@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Tejun Heo" <tj@kernel.org>, "Zefan Li" <lizefan.x@bytedance.com>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Andy Lutomirski" <luto@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Ingo Molnar" <mingo@redhat.com>,
"Borislav Petkov" <bp@alien8.de>,
"Dave Hansen" <dave.hansen@linux.intel.com>
Subject: [PATCH v1 12/17] mm: remove per-page mapcount dependency in folio_likely_mapped_shared() (CONFIG_NO_PAGE_MAPCOUNT)
Date: Thu, 29 Aug 2024 18:56:15 +0200
Message-ID: <20240829165627.2256514-13-david@redhat.com>
In-Reply-To: <20240829165627.2256514-1-david@redhat.com>
Let's remove the dependency on the mapcount of the first folio page in
large folios and consequently any "false negatives" from
folio_likely_mapped_shared().
In theory, we could implement this change only with CONFIG_MM_ID,
without gluing it to another config option. But we'll be a bit
careful for the time being, because folio_likely_mapped_shared() can now
return "false positives" more frequently. Glue it to
CONFIG_NO_PAGE_MAPCOUNT, which expresses the "EXPERIMENTAL" character for
now.
Let's reuse our new MM ownership tracking infrastructure for large folios.
Thoroughly document the changed semantics. We might now detect a
folio as "mapped shared" although it no longer is -- this can only happen
if more than two MMs mapped a folio at the same time, and neither of the
first two is the last one mapping the folio.
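
To illustrate the false-positive scenario described above, here is a minimal
userspace sketch. It is a toy model, not the kernel implementation: it assumes
that a folio tracks at most two MMs in "slots" next to the total mapcount, and
that a folio counts as exclusive only if one tracked MM accounts for all
mappings. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NO_MM 0	/* hypothetical marker for a free slot */

/* Toy folio: total mapcount plus two MM tracking slots (assumption). */
struct toy_folio {
	int mapcount;
	int slot_mm[2];
	int slot_mapcount[2];
};

/* Account one new page table mapping of the folio by @mm. */
static void toy_map(struct toy_folio *f, int mm)
{
	int i;

	f->mapcount++;
	for (i = 0; i < 2; i++) {
		if (f->slot_mm[i] == mm) {
			f->slot_mapcount[i]++;
			return;
		}
	}
	for (i = 0; i < 2; i++) {
		if (f->slot_mm[i] == TOY_NO_MM) {
			f->slot_mm[i] = mm;
			f->slot_mapcount[i] = 1;
			return;
		}
	}
	/* Both slots are taken: this MM stays untracked. */
}

/* Account the removal of one page table mapping of the folio by @mm. */
static void toy_unmap(struct toy_folio *f, int mm)
{
	int i;

	assert(f->mapcount > 0);
	f->mapcount--;
	for (i = 0; i < 2; i++) {
		if (f->slot_mm[i] == mm) {
			if (--f->slot_mapcount[i] == 0)
				f->slot_mm[i] = TOY_NO_MM;
			return;
		}
	}
}

/* "Shared" unless a single tracked MM owns all mappings. */
static bool toy_likely_mapped_shared(const struct toy_folio *f)
{
	int i;

	if (f->mapcount <= 1)
		return false;
	for (i = 0; i < 2; i++) {
		if (f->slot_mm[i] != TOY_NO_MM &&
		    f->slot_mapcount[i] == f->mapcount)
			return false;
	}
	return true;
}

/* Three MMs map two pages each, then the first two unmap again. */
static bool toy_demo_false_positive(void)
{
	struct toy_folio f = { 0 };
	int mm, i;

	for (mm = 1; mm <= 3; mm++)
		for (i = 0; i < 2; i++)
			toy_map(&f, mm);
	for (mm = 1; mm <= 2; mm++)
		for (i = 0; i < 2; i++)
			toy_unmap(&f, mm);
	/* Only MM 3 is left, but it was never tracked in a slot. */
	return toy_likely_mapped_shared(&f);
}
```

In this model, toy_demo_false_positive() reports "shared" although only the
third MM still maps the folio. With just two MMs involved, the remaining MM
always owns a slot, so the result becomes exact again.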
"false positives" in this context are certainly better than "false
negatives" when it comes to enforcing policies (e.g., is process 1
allowed to migrate a folio that might also be used by another process?),
but in an ideal world we wouldn't have these "false positives" either.
It's worth noting that there will not be a change for small folios and
hugetlb folios. In general, for PMD-mapped THP we don't expect a change,
only for PTE-mapped THP.
This will affect various users of folio_likely_mapped_shared():
(1) khugepaged counts PTEs that target shared folios towards the
max_ptes_shared. With false positives we might collapse too little,
with false negatives too much.
(2) NUMA hinting: PROT_NONE NUMA protection will be skipped for shared
folios in COW mappings. With false positives we skip too many, with
false negatives we don't skip some we should be skipping.
During NUMA hinting faults, we will set TNF_SHARED with shared folios
in shared mappings. With false positives we set it too often, with
false negatives not often enough.
During NUMA hinting faults, we will refuse to migrate shared folios in
mappings with execute permissions (expectation: shared libraries).
With false positives we refuse to migrate some we could migrate, with
false negatives we migrate too many.
(3) MADV_COLD / MADV_PAGEOUT / MADV_FREE will not try splitting PTE-mapped
THPs that are considered shared but not fully covered by the
requested range, consequently not processing them. With false
positives we will not split+process some we could have processed, with
false negatives we split some folios we probably shouldn't have split.
(4) mbind() / migrate_pages() / move_pages() will refuse to migrate shared
folios unless MPOL_MF_MOVE_ALL is effective (requires CAP_SYS_NICE).
With false positives we refuse to migrate some folios that could be
migrated, with false negatives we migrate some folios that shouldn't
have been migrated.
(5) folio_referenced_one() will skip exclusive swapbacked folios in
dying processes. Shared folios will not be skipped. With false
positives we might skip this optimization, with false negatives we
might apply this optimization wrongly.
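
As a rough illustration of why a false positive is the conservative outcome
for a policy such as (4), consider the following sketch. It is not the actual
mempolicy code; TOY_MOVE_ALL is a hypothetical stand-in for MPOL_MF_MOVE_ALL
and toy_may_migrate() is a made-up helper.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the MPOL_MF_MOVE_ALL request flag. */
#define TOY_MOVE_ALL 0x1u

/*
 * Sketch of the decision in (4): a folio that is estimated to be
 * "mapped shared" is only migrated if the caller requested (and is
 * allowed to use) the "move all" behavior.
 */
static bool toy_may_migrate(bool likely_mapped_shared, unsigned int flags)
{
	if (!likely_mapped_shared)
		return true;
	return (flags & TOY_MOVE_ALL) != 0;
}
```

A false positive makes toy_may_migrate(true, 0) return false for a folio that
could have been migrated, so some work is skipped. A false negative would make
toy_may_migrate(false, 0) return true for a folio that another process still
uses, which is the case the policy is meant to prevent.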
Likely (3) and (4) are not really used a lot on folios that are heavily
shared among processes -- rather on anonymous memory (mostly from a
single parent process) or almost-exclusively mmap'ed files.
Similarly (1) is not expected to matter much in practice, and if so,
only for long-running child processes after fork(). But even here, it's
unlikely that it matters in practice.
(5) is not expected to matter much at all; it's a new optimization
either way.
(2) is interesting: the expectation here is that for anon folios it
might not make a big difference. For file-backed pages it might,
we'll have to learn about that.
Long story short: this paves the way for a complete
CONFIG_NO_PAGE_MAPCOUNT implementation, but maybe we'll have to
switch to another MM ownership tracking later.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 98411e53da916..b37f20b26776d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2142,9 +2142,9 @@ static inline size_t folio_size(const struct folio *folio)
* are independent.
*
* As precise information is not easily available for all folios, this function
- * estimates the number of MMs ("sharers") that are currently mapping a folio
- * using the number of times the first page of the folio is currently mapped
- * into page tables.
+ * must sometimes estimate the number of MMs ("sharers") that are currently
+ * mapping a folio using the number of times the first page of the folio is
+ * currently mapped into page tables.
*
* For small anonymous folios and anonymous hugetlb folios, the return
* value will be exactly correct: non-KSM folios can only be mapped at most once
@@ -2152,13 +2152,21 @@ static inline size_t folio_size(const struct folio *folio)
* considered shared even if mapped multiple times into the same MM.
*
* For other folios, the result can be fuzzy:
- * #. For partially-mappable large folios (THP), the return value can wrongly
- * indicate "mapped exclusively" (false negative) when the folio is
- * only partially mapped into at least one MM.
+ * #. With CONFIG_PAGE_MAPCOUNT: For partially-mappable large folios (THP),
+ * the return value can wrongly indicate "mapped exclusively" (false
+ * negative) when the folio is only partially mapped into at least one MM.
+ * #. With CONFIG_NO_PAGE_MAPCOUNT: For partially-mappable large folios
+ * (THP), the return value can wrongly indicate "mapped shared" (false
+ * positive) in some scenarios. This can only happen if two MMs are
+ * already mapping a folio and one more MM starts mapping the folio. We
+ * would still detect the folio as "mapped shared" after the first
+ * two MMs no longer map the folio.
* #. For pagecache folios (including hugetlb), the return value can wrongly
* indicate "mapped shared" (false positive) when two VMAs in the same MM
* cover the same file range.
*
+ * With CONFIG_MM_ID, this function will never return "false negatives".
+ *
* Further, this function only considers current page table mappings that
* are tracked using the folio mapcount(s).
*
@@ -2183,12 +2191,16 @@ static inline bool folio_likely_mapped_shared(struct folio *folio)
if (mapcount <= 1)
return false;
+#ifdef CONFIG_PAGE_MAPCOUNT
/* If any page is mapped more than once we treat it "mapped shared". */
if (folio_entire_mapcount(folio) || mapcount > folio_large_nr_pages(folio))
return true;
/* Let's guess based on the first subpage. */
return atomic_read(&folio->_mapcount) > 0;
+#else /* !CONFIG_PAGE_MAPCOUNT */
+ return !folio_test_large_mapped_exclusively(folio);
+#endif /* !CONFIG_PAGE_MAPCOUNT */
}
#ifndef HAVE_ARCH_MAKE_FOLIO_ACCESSIBLE
--
2.46.0