From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Matthew Wilcox <willy@infradead.org>,
Hugh Dickins <hughd@google.com>,
Yin Fengwei <fengwei.yin@intel.com>,
Yang Shi <shy828301@gmail.com>, Ying Huang <ying.huang@intel.com>,
Zi Yan <ziy@nvidia.com>, Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Subject: [PATCH WIP v1 09/20] mm: improve folio_mapped_shared() for partially-mappable folios using rmap IDs
Date: Fri, 24 Nov 2023 14:26:14 +0100
Message-ID: <20231124132626.235350-10-david@redhat.com>
In-Reply-To: <20231124132626.235350-1-david@redhat.com>

Let's make folio_mapped_shared() precise by using our rmap ID
magic to identify if a single MM is responsible for all mappings.
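
To illustrate the idea, here is a minimal userspace sketch, not the kernel
code. It assumes every MM has a few per-slot sub-IDs and that each (un)map
adds or subtracts them to per-folio values; the folio is then mapped by a
single MM exactly when every value equals that MM's sub-ID times the
mapcount. The names toy_folio, toy_mm, toy_rmap_add() and
toy_mapped_exclusively(), and the sub-ID values used with them, are made up
for illustration. That sums of sub-IDs from different MMs cannot collide is
a property of how the series picks sub-IDs and is simply assumed here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_VALS 4	/* hypothetical; the patch uses 4, 5 or 6 values */

/* Hypothetical stand-in for the per-MM sub-IDs. */
struct toy_mm {
	unsigned long subid[TOY_NR_VALS];
};

/* Hypothetical stand-in for the folio's rmap values and total mapcount. */
struct toy_folio {
	unsigned long rmap_val[TOY_NR_VALS];
	long mapcount;
};

/* Account "count" new mappings by "mm"; a negative count unmaps. */
static void toy_rmap_add(struct toy_folio *f, const struct toy_mm *mm,
			 long count)
{
	for (int i = 0; i < TOY_NR_VALS; i++)
		f->rmap_val[i] += mm->subid[i] * (unsigned long)count;
	f->mapcount += count;
}

/*
 * True if every value equals subid * mapcount. Like the patch, XOR the
 * stored value with the expected one and OR the results, so that a single
 * test of "diff" covers all slots.
 */
static bool toy_mapped_exclusively(const struct toy_folio *f,
				   const struct toy_mm *mm)
{
	unsigned long diff = 0;

	assert(f->mapcount >= 0);
	for (int i = 0; i < TOY_NR_VALS; i++)
		diff |= f->rmap_val[i] ^
			(mm->subid[i] * (unsigned long)f->mapcount);
	return !diff;
}
```

With sub-IDs {3, 7, 11, 19} for one MM and {5, 13, 17, 23} for another,
three mappings by the first MM match only the first MM; adding a single
mapping by the second MM makes the check fail for both.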

If there is a lot of concurrent (un)map activity, we could theoretically
spin for quite a while. But we only look at the rmap values if we didn't
already identify the folio as "obviously shared". In most cases, there
should only be one or a handful of page tables involved.

For current THPs with ~512 to 2048 subpages, we really shouldn't see a
lot of concurrent updates that keep us spinning for a long time. Anyhow,
should this ever become a problem, it can be optimized later.

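
The spinning mentioned above comes from a seqcount-style read loop: take a
snapshot of the sequence, read the mapcount and the rmap value, and retry
if a writer ran in between. Below is a simplified userspace sketch of that
pattern. It uses a classic single-writer seqcount built on C11 atomics (odd
sequence means "write in progress"), which is an assumption made for
brevity and not the atomic_seqcount from this series, which supports
concurrent writers. All seq_toy_* names and the single rmap value are
hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio fields the reader looks at. */
struct seq_toy_folio {
	atomic_uint seq;	/* odd while a (single) writer is active */
	atomic_long mapcount;
	atomic_ulong rmap_val;
};

/* Writer: account "count" mappings by an MM with the given sub-ID. */
static void seq_toy_map(struct seq_toy_folio *f, unsigned long subid,
			long count)
{
	atomic_fetch_add(&f->seq, 1);		/* sequence becomes odd */
	atomic_fetch_add(&f->mapcount, count);
	atomic_fetch_add(&f->rmap_val, subid * (unsigned long)count);
	atomic_fetch_add(&f->seq, 1);		/* sequence becomes even */
}

/* Reader: wait until no writer is active and return the sequence. */
static unsigned int seq_toy_read_begin(struct seq_toy_folio *f)
{
	unsigned int seq;

	while ((seq = atomic_load(&f->seq)) & 1)
		;	/* spin while a write is in progress */
	return seq;
}

/* Reader: true if a writer ran since seq_toy_read_begin(). */
static bool seq_toy_read_retry(struct seq_toy_folio *f, unsigned int start)
{
	return atomic_load(&f->seq) != start;
}

/* Same shape as the loop in __folio_large_mapped_shared(). */
static bool seq_toy_mapped_shared(struct seq_toy_folio *f,
				  unsigned long subid, long nr_pages)
{
	unsigned int start;
	bool exclusive;
	long mapcount;

	do {
		start = seq_toy_read_begin(f);
		mapcount = atomic_load(&f->mapcount);
		/* More mappings than pages: obviously shared, stop early. */
		if (mapcount > nr_pages)
			return true;
		exclusive = atomic_load(&f->rmap_val) ==
			    subid * (unsigned long)mapcount;
	} while (seq_toy_read_retry(f, start));

	return !exclusive;
}
```

The early return inside the loop does not need a retry check: a mapcount
above the number of pages means "shared" no matter what the rmap value
currently holds.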
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 21 ++++++++++++---
include/linux/rmap.h | 2 ++
mm/rmap_id.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 765e688690f1..1081a8faa1a3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2113,6 +2113,17 @@ static inline size_t folio_size(struct folio *folio)
return PAGE_SIZE << folio_order(folio);
}
+#ifdef CONFIG_RMAP_ID
+bool __folio_large_mapped_shared(struct folio *folio, struct mm_struct *mm);
+#else
+static inline bool __folio_large_mapped_shared(struct folio *folio,
+		struct mm_struct *mm)
+{
+	/* ... guess based on the mapcount of the first page of the folio. */
+	return atomic_read(&folio->page._mapcount) > 0;
+}
+#endif
+
/**
* folio_mapped_shared - Report if a folio is certainly mapped by
* multiple entities in their page tables
@@ -2141,8 +2152,11 @@ static inline size_t folio_size(struct folio *folio)
* PMD-mapped PMD-sized THP), the result will be exactly correct.
*
* For all other (partially-mappable) folios, such as PTE-mapped THP, the
- * return value is partially fuzzy: true is not fuzzy, because it means
- * "certainly mapped shared", but false means "maybe mapped exclusively".
+ * return value is partially fuzzy without CONFIG_RMAP_ID: true is not fuzzy,
+ * because it means "certainly mapped shared", but false means
+ * "maybe mapped exclusively".
+ *
+ * With CONFIG_RMAP_ID, the result will be exactly correct.
*
* Note that this function only considers *current* page table mappings
* tracked via rmap -- that properly adjusts the folio mapcount(s) -- and
@@ -2177,8 +2191,7 @@ static inline bool folio_mapped_shared(struct folio *folio,
*/
if (total_mapcount > folio_nr_pages(folio))
return true;
- /* ... guess based on the mapcount of the first page of the folio. */
- return atomic_read(&folio->page._mapcount) > 0;
+ return __folio_large_mapped_shared(folio, mm);
}
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 19c9dc3216df..a73e146d82d1 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -253,6 +253,8 @@ void __folio_set_large_rmap_val(struct folio *folio, int count,
struct mm_struct *mm);
void __folio_add_large_rmap_val(struct folio *folio, int count,
struct mm_struct *mm);
+bool __folio_has_large_matching_rmap_val(struct folio *folio, int count,
+ struct mm_struct *mm);
#else
static inline void __folio_prep_large_rmap(struct folio *folio)
{
diff --git a/mm/rmap_id.c b/mm/rmap_id.c
index e66b0f5aea2d..85a61c830f19 100644
--- a/mm/rmap_id.c
+++ b/mm/rmap_id.c
@@ -322,6 +322,69 @@ void __folio_add_large_rmap_val(struct folio *folio, int count,
}
}
+bool __folio_has_large_matching_rmap_val(struct folio *folio, int count,
+		struct mm_struct *mm)
+{
+	const unsigned int order = folio_order(folio);
+	unsigned long diff = 0;
+
+	switch (order) {
+#if MAX_ORDER >= RMAP_SUBID_6_MIN_ORDER
+	case RMAP_SUBID_6_MIN_ORDER ... RMAP_SUBID_6_MAX_ORDER:
+		diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_6(mm, 0) * count);
+		diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_6(mm, 1) * count);
+		diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_6(mm, 2) * count);
+		diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_6(mm, 3) * count);
+		diff |= atomic_long_read(&folio->_rmap_val4) ^ (get_rmap_subid_6(mm, 4) * count);
+		diff |= atomic_long_read(&folio->_rmap_val5) ^ (get_rmap_subid_6(mm, 5) * count);
+		break;
+#endif
+#if MAX_ORDER >= RMAP_SUBID_5_MIN_ORDER
+	case RMAP_SUBID_5_MIN_ORDER ... RMAP_SUBID_5_MAX_ORDER:
+		diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_5(mm, 0) * count);
+		diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_5(mm, 1) * count);
+		diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_5(mm, 2) * count);
+		diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_5(mm, 3) * count);
+		diff |= atomic_long_read(&folio->_rmap_val4) ^ (get_rmap_subid_5(mm, 4) * count);
+		break;
+#endif
+	default:
+		diff |= atomic_long_read(&folio->_rmap_val0) ^ (get_rmap_subid_4(mm, 0) * count);
+		diff |= atomic_long_read(&folio->_rmap_val1) ^ (get_rmap_subid_4(mm, 1) * count);
+		diff |= atomic_long_read(&folio->_rmap_val2) ^ (get_rmap_subid_4(mm, 2) * count);
+		diff |= atomic_long_read(&folio->_rmap_val3) ^ (get_rmap_subid_4(mm, 3) * count);
+		break;
+	}
+	return !diff;
+}
+
+bool __folio_large_mapped_shared(struct folio *folio, struct mm_struct *mm)
+{
+	unsigned long start;
+	bool exclusive;
+	int mapcount;
+
+	VM_WARN_ON_ONCE(!folio_test_large_rmappable(folio));
+	VM_WARN_ON_ONCE(folio_test_hugetlb(folio));
+
+	/*
+	 * Livelocking here is unlikely, as the caller already handles the
+	 * "obviously shared" cases. If ever an issue and there is too much
+	 * concurrent (un)mapping happening (using different page tables), we
+	 * could stop earlier and just return "shared".
+	 */
+	do {
+		start = raw_read_atomic_seqcount_begin(&folio->_rmap_atomic_seqcount);
+		mapcount = folio_mapcount(folio);
+		if (unlikely(mapcount > folio_nr_pages(folio)))
+			return true;
+		exclusive = __folio_has_large_matching_rmap_val(folio, mapcount, mm);
+	} while (raw_read_atomic_seqcount_retry(&folio->_rmap_atomic_seqcount,
+						start));
+
+	return !exclusive;
+}
+
int alloc_rmap_id(void)
{
int id;
--
2.41.0