From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Nadav Amit <nadav.amit@gmail.com>,
Matthew Wilcox <willy@infradead.org>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>,
Jerome Glisse <jglisse@redhat.com>,
"Kirill A . Shutemov" <kirill@shutemov.name>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Alistair Popple <apopple@nvidia.com>,
peterx@redhat.com
Subject: [PATCH v8 08/23] mm/shmem: Allow uffd wr-protect none pte for file-backed mem
Date: Mon, 4 Apr 2022 21:48:50 -0400
Message-ID: <20220405014850.14352-1-peterx@redhat.com>
In-Reply-To: <20220405014646.13522-1-peterx@redhat.com>
File-backed memory differs from anonymous memory in that even if the pte is
missing, the data could still reside either in the file or in the page/swap
cache. So when wr-protecting a pte, we need to consider none ptes too.
We do that by installing uffd-wp pte markers when necessary. Then, when
there's a future write to the pte, the fault handler will take the special
path to first fault in the page read-only, then report to the userfaultfd
server with a wr-protect message.
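As a point of reference, the userspace side of that flow could look roughly
like below (a sketch only, not part of this patch; addr/len stand for an
existing file-backed mapping, and all error handling is omitted):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    /* Create a uffd and enable the write-protect feature */
    int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    struct uffdio_api api = {
            .api = UFFD_API,
            .features = UFFD_FEATURE_PAGEFAULT_FLAG_WP,
    };
    ioctl(uffd, UFFDIO_API, &api);

    /* Track wr-protect faults on the file-backed mapping */
    struct uffdio_register reg = {
            .range = { .start = (unsigned long)addr, .len = len },
            .mode = UFFDIO_REGISTER_MODE_WP,
    };
    ioctl(uffd, UFFDIO_REGISTER, &reg);

    /* Wr-protect the range; none ptes get uffd-wp pte markers */
    struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)addr, .len = len },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);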
On the other hand, when unprotecting a page, it's also possible that the pte
got unmapped but replaced by the special uffd-wp marker. In that case we need
to be able to recover the uffd-wp pte marker into a none pte, so that the
next access to the page will fault in correctly as usual.
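Continuing the same sketch, the unprotect side is the same UFFDIO_WRITEPROTECT
ioctl without the WP mode bit set; this is the operation that needs the
marker-to-none recovery described above:

    /* Resolve the protection: a uffd-wp pte marker left behind by a
     * previous unmap is dropped back to a none pte here, so the next
     * access simply faults in as usual */
    struct uffdio_writeprotect unwp = {
            .range = { .start = (unsigned long)addr, .len = len },
            .mode = 0,    /* no UFFDIO_WRITEPROTECT_MODE_WP: unprotect */
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &unwp);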
Special care needs to be taken throughout the change_protection_range()
process. Since we now allow userspace to wr-protect a none pte, we need to be
able to pre-populate the page table entries when we see a (!anonymous &&
MM_CP_UFFD_WP) request; otherwise change_protection_range() will always skip
ranges whose pgtable entries do not exist.
For example, the pgtable can be missing for a whole 2M pmd chunk while the
page cache exists for the entire 2M range. When we want to wr-protect one 4K
page within that 2M pmd range, we need to pre-populate the pgtable and
install a pte marker showing that we want to get a message and block the
thread when the page cache of that 4K page is written to. Without
pre-populating the pmd, change_protection() would simply skip the whole pmd.
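A hypothetical sketch of that exact scenario, reusing the uffd setup from
above and assuming the mapping is registered with UFFDIO_REGISTER_MODE_WP as
before (buf is some 2M source buffer; memfd_create() needs _GNU_SOURCE and
<sys/mman.h>; error handling omitted):

    /* 2M shmem file; pwrite() fills the page cache while the mapping's
     * page table stays empty (no fault has happened through "map") */
    int fd = memfd_create("wp-test", 0);
    ftruncate(fd, 2UL << 20);
    char *map = mmap(NULL, 2UL << 20, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    pwrite(fd, buf, 2UL << 20, 0);

    /* Wr-protect a single 4K page: the kernel has to pre-populate the
     * empty pmd so that one pte marker can be installed */
    struct uffdio_writeprotect wp = {
            .range = { .start = (unsigned long)map + 4096, .len = 4096 },
            .mode = UFFDIO_WRITEPROTECT_MODE_WP,
    };
    ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

Before this change, that last ioctl would silently do nothing for the
never-faulted 4K page, because change_protection() skipped the empty pmd.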
Note that this patch only covers the small pages (pte level); it does not yet
cover transparent huge pages. That will be done later, and this patch also
serves as a preparation for it.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/mprotect.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 2 deletions(-)
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 709a6f73b764..bd62d5938c6c 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -30,6 +30,7 @@
#include <linux/mm_inline.h>
#include <linux/pgtable.h>
#include <linux/sched/sysctl.h>
+#include <linux/userfaultfd_k.h>
#include <asm/cacheflush.h>
#include <asm/mmu_context.h>
#include <asm/tlbflush.h>
@@ -188,8 +189,16 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
newpte = pte_swp_mksoft_dirty(newpte);
if (pte_swp_uffd_wp(oldpte))
newpte = pte_swp_mkuffd_wp(newpte);
- } else if (is_pte_marker_entry(entry)) {
- /* Skip it, the same as none pte */
+ } else if (pte_marker_entry_uffd_wp(entry)) {
+ /*
+ * If this is a uffd-wp pte marker and we'd like
+ * to unprotect it, drop it; the next page
+ * fault will trigger without uffd trapping.
+ */
+ if (uffd_wp_resolve) {
+ pte_clear(vma->vm_mm, addr, pte);
+ pages++;
+ }
continue;
} else {
newpte = oldpte;
@@ -204,6 +213,20 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
set_pte_at(vma->vm_mm, addr, pte, newpte);
pages++;
}
+ } else {
+ /* It must be a none pte; what else could it be? */
+ WARN_ON_ONCE(!pte_none(oldpte));
+ if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+ /*
+ * For file-backed mem, we need to be able to
+ * wr-protect a none pte, because even if the
+ * pte is none, the page/swap cache could
+ * exist. We do that by installing a marker.
+ */
+ set_pte_at(vma->vm_mm, addr, pte,
+ make_pte_marker(PTE_MARKER_UFFD_WP));
+ pages++;
+ }
}
} while (pte++, addr += PAGE_SIZE, addr != end);
arch_leave_lazy_mmu_mode();
@@ -237,6 +260,39 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
return 0;
}
+/* Return true if we're uffd wr-protecting file-backed memory */
+static inline bool
+uffd_wp_protect_file(struct vm_area_struct *vma, unsigned long cp_flags)
+{
+ return (cp_flags & MM_CP_UFFD_WP) && !vma_is_anonymous(vma);
+}
+
+/*
+ * If wr-protecting the range for file-backed memory, populate the pgtable
+ * for the case when the pgtable is empty but the page cache exists.  If
+ * {pte|pmd|...}_alloc() fails, it means no memory; we can't do better than stop.
+ */
+#define change_pmd_prepare(vma, pmd, cp_flags) \
+ do { \
+ if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \
+ if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd))) \
+ break; \
+ } \
+ } while (0)
+/*
+ * This is the general pud/p4d/pgd version of change_pmd_prepare(). We need to
+ * have a separate change_pmd_prepare() because pte_alloc() returns 0 on
+ * success, while {pmd|pud|p4d}_alloc() returns a valid pointer on success.
+ */
+#define change_prepare(vma, high, low, addr, cp_flags) \
+ do { \
+ if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \
+ low##_t *p = low##_alloc(vma->vm_mm, high, addr); \
+ if (WARN_ON_ONCE(p == NULL)) \
+ break; \
+ } \
+ } while (0)
+
static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
pud_t *pud, unsigned long addr, unsigned long end,
pgprot_t newprot, unsigned long cp_flags)
@@ -255,6 +311,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
next = pmd_addr_end(addr, end);
+ change_pmd_prepare(vma, pmd, cp_flags);
/*
* Automatic NUMA balancing walks the tables with mmap_lock
* held for read. It's possible a parallel update to occur
@@ -320,6 +377,7 @@ static inline unsigned long change_pud_range(struct vm_area_struct *vma,
pud = pud_offset(p4d, addr);
do {
next = pud_addr_end(addr, end);
+ change_prepare(vma, pud, pmd, addr, cp_flags);
if (pud_none_or_clear_bad(pud))
continue;
pages += change_pmd_range(vma, pud, addr, next, newprot,
@@ -340,6 +398,7 @@ static inline unsigned long change_p4d_range(struct vm_area_struct *vma,
p4d = p4d_offset(pgd, addr);
do {
next = p4d_addr_end(addr, end);
+ change_prepare(vma, p4d, pud, addr, cp_flags);
if (p4d_none_or_clear_bad(p4d))
continue;
pages += change_pud_range(vma, p4d, addr, next, newprot,
@@ -365,6 +424,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma,
inc_tlb_flush_pending(mm);
do {
next = pgd_addr_end(addr, end);
+ change_prepare(vma, pgd, p4d, addr, cp_flags);
if (pgd_none_or_clear_bad(pgd))
continue;
pages += change_p4d_range(vma, pgd, addr, next, newprot,
--
2.32.0