From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Shakeel Butt <shakeelb@google.com>,
John Hubbard <jhubbard@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Yang Shi <shy828301@gmail.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
Michal Hocko <mhocko@kernel.org>, Nadav Amit <namit@vmware.com>,
Rik van Riel <riel@surriel.com>, Roman Gushchin <guro@fb.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Xu <peterx@redhat.com>, Donald Dutile <ddutile@redhat.com>,
Christoph Hellwig <hch@lst.de>, Oleg Nesterov <oleg@redhat.com>,
Jan Kara <jack@suse.cz>, Liang Zhang <zhangliang5@huawei.com>,
Pedro Gomes <pedrodemargomes@gmail.com>,
Oded Gabbay <oded.gabbay@gmail.com>,
linux-mm@kvack.org, David Hildenbrand <david@redhat.com>
Subject: [PATCH v4 04/17] mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()
Date: Thu, 28 Apr 2022 10:34:28 +0200 [thread overview]
Message-ID: <20220428083441.37290-5-david@redhat.com> (raw)
In-Reply-To: <20220428083441.37290-1-david@redhat.com>
... and move the special check for pinned pages into
page_try_dup_anon_rmap() to prepare for tracking exclusive anonymous
pages via a new pageflag, clearing it only after making sure that there
are no GUP pins on the anonymous page.
We really only care about pins on anonymous pages, because they are
prone to getting replaced in the COW handler once mapped R/O. For !anon
pages in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really
care about that, at least not that I could come up with an example.
Let's drop the is_cow_mapping() check from page_needs_cow_for_dma(), as we
know we're dealing with anonymous pages. Also, drop the handling of
pinned pages from copy_huge_pud() and add a comment if ever supporting
anonymous pages on the PUD level.
This is a preparation for tracking exclusivity of anonymous pages in
the rmap code, and disallowing marking a page shared (-> failing to
duplicate) if there are GUP pins on a page.
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/linux/mm.h | 5 +----
include/linux/rmap.h | 49 +++++++++++++++++++++++++++++++++++++++++++-
mm/huge_memory.c | 27 ++++++++----------------
mm/hugetlb.c | 16 ++++++++-------
mm/memory.c | 17 ++++++++++-----
mm/migrate.c | 2 +-
6 files changed, 79 insertions(+), 37 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c3c862a2e533..c7b82d078969 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1575,16 +1575,13 @@ static inline bool page_maybe_dma_pinned(struct page *page)
/*
* This should most likely only be called during fork() to see whether we
- * should break the cow immediately for a page on the src mm.
+ * should break the cow immediately for an anon page on the src mm.
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
struct page *page)
{
- if (!is_cow_mapping(vma->vm_flags))
- return false;
-
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 17230c458341..9d602fc34063 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -12,6 +12,7 @@
#include <linux/memcontrol.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>
+#include <linux/memremap.h>
/*
* The anon_vma heads a list of private "related" vmas, to scan if
@@ -182,11 +183,57 @@ void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *,
void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
-static inline void page_dup_rmap(struct page *page, bool compound)
+static inline void __page_dup_rmap(struct page *page, bool compound)
{
atomic_inc(compound ? compound_mapcount_ptr(page) : &page->_mapcount);
}
+static inline void page_dup_file_rmap(struct page *page, bool compound)
+{
+ __page_dup_rmap(page, compound);
+}
+
+/**
+ * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
+ * anonymous page
+ * @page: the page to duplicate the mapping for
+ * @compound: the page is mapped as compound or as a small page
+ * @vma: the source vma
+ *
+ * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mapping can only fail if the page may be pinned; device
+ * private pages cannot get pinned and consequently this function cannot fail.
+ *
+ * If duplicating the mapping succeeds, the page has to be mapped R/O into
+ * the parent and the child. It must *not* get mapped writable after this call.
+ *
+ * Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
+ */
+static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
+ struct vm_area_struct *vma)
+{
+ VM_BUG_ON_PAGE(!PageAnon(page), page);
+
+ /*
+ * If this page may have been pinned by the parent process,
+ * don't allow to duplicate the mapping but instead require to e.g.,
+ * copy the page immediately for the child so that we'll always
+ * guarantee the pinned page won't be randomly replaced in the
+ * future on write faults.
+ */
+ if (likely(!is_device_private_page(page) &&
+ unlikely(page_needs_cow_for_dma(vma, page))))
+ return -EBUSY;
+
+ /*
+ * It's okay to share the anon page between both processes, mapping
+ * the page R/O into both processes.
+ */
+ __page_dup_rmap(page, compound);
+ return 0;
+}
+
/*
* Called from mm/vmscan.c to handle paging out
*/
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c468fee595ff..baf4ea6d8e1a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1097,23 +1097,16 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
src_page = pmd_page(pmd);
VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
- /*
- * If this page is a potentially pinned page, split and retry the fault
- * with smaller page size. Normally this should not happen because the
- * userspace should use MADV_DONTFORK upon pinned regions. This is a
- * best effort that the pinned pages won't be replaced by another
- * random page during the coming copy-on-write.
- */
- if (unlikely(page_needs_cow_for_dma(src_vma, src_page))) {
+ get_page(src_page);
+ if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) {
+ /* Page maybe pinned: split and retry the fault on PTEs. */
+ put_page(src_page);
pte_free(dst_mm, pgtable);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
__split_huge_pmd(src_vma, src_pmd, addr, false, NULL);
return -EAGAIN;
}
-
- get_page(src_page);
- page_dup_rmap(src_page, true);
add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
out_zero_page:
mm_inc_nr_ptes(dst_mm);
@@ -1217,14 +1210,10 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
/* No huge zero pud yet */
}
- /* Please refer to comments in copy_huge_pmd() */
- if (unlikely(page_needs_cow_for_dma(vma, pud_page(pud)))) {
- spin_unlock(src_ptl);
- spin_unlock(dst_ptl);
- __split_huge_pud(vma, src_pud, addr);
- return -EAGAIN;
- }
-
+ /*
+ * TODO: once we support anonymous pages, use page_try_dup_anon_rmap()
+ * and split if duplicating fails.
+ */
pudp_set_wrprotect(src_mm, addr, src_pud);
pud = pud_mkold(pud_wrprotect(pud));
set_pud_at(dst_mm, addr, dst_pud, pud);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 670117bbc9b4..1a59d40627d9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4788,15 +4788,18 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
get_page(ptepage);
/*
- * This is a rare case where we see pinned hugetlb
- * pages while they're prone to COW. We need to do the
- * COW earlier during fork.
+ * Failing to duplicate the anon rmap is a rare case
+ * where we see pinned hugetlb pages while they're
+ * prone to COW. We need to do the COW earlier during
+ * fork.
*
* When pre-allocating the page or copying data, we
* need to be without the pgtable locks since we could
* sleep during the process.
*/
- if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
+ if (!PageAnon(ptepage)) {
+ page_dup_file_rmap(ptepage, true);
+ } else if (page_try_dup_anon_rmap(ptepage, true, vma)) {
pte_t src_pte_old = entry;
struct page *new;
@@ -4843,7 +4846,6 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
entry = huge_pte_wrprotect(entry);
}
- page_dup_rmap(ptepage, true);
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(npages, dst);
}
@@ -5523,7 +5525,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
ClearHPageRestoreReserve(page);
hugepage_add_new_anon_rmap(page, vma, haddr);
} else
- page_dup_rmap(page, true);
+ page_dup_file_rmap(page, true);
new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
set_huge_pte_at(mm, haddr, ptep, new_pte);
@@ -5884,7 +5886,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
goto out_release_unlock;
if (vm_shared) {
- page_dup_rmap(page, true);
+ page_dup_file_rmap(page, true);
} else {
ClearHPageRestoreReserve(page);
hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
diff --git a/mm/memory.c b/mm/memory.c
index ca0b256d1065..2112484682a9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -825,7 +825,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
*/
get_page(page);
rss[mm_counter(page)]++;
- page_dup_rmap(page, false);
+ /* Cannot fail as these pages cannot get pinned. */
+ BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
/*
* We do not preserve soft-dirty information, because so
@@ -921,18 +922,24 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
struct page *page;
page = vm_normal_page(src_vma, addr, pte);
- if (page && unlikely(page_needs_cow_for_dma(src_vma, page))) {
+ if (page && PageAnon(page)) {
/*
* If this page may have been pinned by the parent process,
* copy the page immediately for the child so that we'll always
* guarantee the pinned page won't be randomly replaced in the
* future.
*/
- return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
- addr, rss, prealloc, page);
+ get_page(page);
+ if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ /* Page maybe pinned, we have to copy. */
+ put_page(page);
+ return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
+ addr, rss, prealloc, page);
+ }
+ rss[mm_counter(page)]++;
} else if (page) {
get_page(page);
- page_dup_rmap(page, false);
+ page_dup_file_rmap(page, false);
rss[mm_counter(page)]++;
}
diff --git a/mm/migrate.c b/mm/migrate.c
index 6c31ee1e1c9b..dd9ceb778e94 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -234,7 +234,7 @@ static bool remove_migration_pte(struct folio *folio,
if (folio_test_anon(folio))
hugepage_add_anon_rmap(new, vma, pvmw.address);
else
- page_dup_rmap(new, true);
+ page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
} else
#endif
--
2.35.1
next prev parent reply other threads:[~2022-04-28 8:35 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-28 8:34 [PATCH v4 00/17] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 01/17] mm/rmap: fix missing swap_free() in try_to_unmap() after arch_unmap_one() failed David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 02/17] mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 03/17] mm/memory: slightly simplify copy_present_pte() David Hildenbrand
2022-04-28 8:34 ` David Hildenbrand [this message]
2022-04-28 8:34 ` [PATCH v4 05/17] mm/rmap: convert RMAP flags to a proper distinct rmap_t type David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 06/17] mm/rmap: remove do_page_add_anon_rmap() David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 07/17] mm/rmap: pass rmap flags to hugepage_add_anon_rmap() David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 08/17] mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 09/17] mm/rmap: use page_move_anon_rmap() when reusing a mapped PageAnon() page exclusively David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 10/17] mm/huge_memory: remove outdated VM_WARN_ON_ONCE_PAGE from unmap_page() David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 11/17] mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 12/17] mm: remember exclusively mapped anonymous pages with PG_anon_exclusive David Hildenbrand
2022-04-29 18:26 ` Andrew Morton
2022-04-29 18:56 ` David Hildenbrand
2022-12-06 3:05 ` Miaohe Lin
2022-12-06 8:43 ` David Hildenbrand
2022-12-06 9:37 ` Miaohe Lin
2022-12-06 9:40 ` David Hildenbrand
2022-12-06 11:28 ` Miaohe Lin
2022-04-28 8:34 ` [PATCH v4 13/17] mm/rmap: fail try_to_migrate() early when setting a PMD migration entry fails David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 14/17] mm/gup: disallow follow_page(FOLL_PIN) David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 15/17] mm: support GUP-triggered unsharing of anonymous pages David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 16/17] mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page David Hildenbrand
2022-04-28 8:34 ` [PATCH v4 17/17] mm/gup: sanity-check with CONFIG_DEBUG_VM that anonymous pages are exclusive when (un)pinning David Hildenbrand
2022-04-28 8:37 ` [PATCH v4 00/17] mm: COW fixes part 2: reliable GUP pins of anonymous pages David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220428083441.37290-5-david@redhat.com \
--to=david@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=ddutile@redhat.com \
--cc=guro@fb.com \
--cc=hch@lst.de \
--cc=hughd@google.com \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mike.kravetz@oracle.com \
--cc=namit@vmware.com \
--cc=oded.gabbay@gmail.com \
--cc=oleg@redhat.com \
--cc=pedrodemargomes@gmail.com \
--cc=peterx@redhat.com \
--cc=riel@surriel.com \
--cc=rientjes@google.com \
--cc=rppt@linux.ibm.com \
--cc=shakeelb@google.com \
--cc=shy828301@gmail.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
--cc=zhangliang5@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).