From: Peter Xu <peterx@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Axel Rasmussen" <axelrasmussen@google.com>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Alexey Dobriyan" <adobriyan@gmail.com>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Anshuman Khandual" <anshuman.khandual@arm.com>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Chinwen Chang" <chinwen.chang@mediatek.com>,
"Huang Ying" <ying.huang@intel.com>,
"Ingo Molnar" <mingo@redhat.com>, "Jann Horn" <jannh@google.com>,
"Jerome Glisse" <jglisse@redhat.com>,
"Lokesh Gidra" <lokeshgidra@google.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
"Michael Ellerman" <mpe@ellerman.id.au>,
"Michal Koutný" <mkoutny@suse.com>,
"Michel Lespinasse" <walken@google.com>,
"Mike Rapoport" <rppt@linux.vnet.ibm.com>,
"Nicholas Piggin" <npiggin@gmail.com>, "Shaohua Li" <shli@fb.com>,
"Shawn Anastasio" <shawn@anastas.io>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Steven Price" <steven.price@arm.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, "Adam Ruprecht" <ruprecht@google.com>,
"Cannon Matthews" <cannonmatthews@google.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
"David Rientjes" <rientjes@google.com>,
"Oliver Upton" <oupton@google.com>
Subject: Re: [PATCH v3 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
Date: Mon, 1 Feb 2021 18:21:55 -0500 [thread overview]
Message-ID: <20210201232155.GL260413@xz-x1> (raw)
In-Reply-To: <bdac0f96-1d6a-6450-c58a-6902d985e3e0@oracle.com>
On Mon, Feb 01, 2021 at 02:33:20PM -0800, Mike Kravetz wrote:
> On 1/28/21 2:48 PM, Axel Rasmussen wrote:
> > From: Peter Xu <peterx@redhat.com>
> >
> > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> > userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
> >
> > Walk the hugetlb range and unshare all such mappings if there is, right before
> > UFFDIO_REGISTER will succeed and return to userspace.
> >
> > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> > is completely disabled for userfaultfd-wp registered range.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
> > ---
> > fs/userfaultfd.c | 45 ++++++++++++++++++++++++++++++++++++
> > include/linux/mmu_notifier.h | 1 +
> > 2 files changed, 46 insertions(+)
> >
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 894cc28142e7..2c6706ac2504 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -15,6 +15,7 @@
> > #include <linux/sched/signal.h>
> > #include <linux/sched/mm.h>
> > #include <linux/mm.h>
> > +#include <linux/mmu_notifier.h>
> > #include <linux/poll.h>
> > #include <linux/slab.h>
> > #include <linux/seq_file.h>
> > @@ -1190,6 +1191,47 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
> > }
> > }
> >
> > +/*
> > + * This function will unconditionally remove all the shared pmd pgtable entries
> > + * within the specific vma for a hugetlbfs memory range.
> > + */
> > +static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > + struct hstate *h = hstate_vma(vma);
> > + unsigned long sz = huge_page_size(h);
> > + struct mm_struct *mm = vma->vm_mm;
> > + struct mmu_notifier_range range;
> > + unsigned long address;
> > + spinlock_t *ptl;
> > + pte_t *ptep;
> > +
>
> Perhaps we should add a quick to see if vma is sharable. Might be as
> simple as !(vma->vm_flags & VM_MAYSHARE). I see a comment/question in
> a later patch about only doing minor fault processing on shared mappings.
Yes, that comment was majorly about shmem though - I believe shared case should
still be the major one, especially for hugetlbfs.
So what I was thinking is something like: one non-uffd process use shared
mapping of the file, meanwhile the other uffd process used private mapping on
the same file. When the uffd process access page it could fault in the page
cache and continued by UFFDIO_CONTINUE, however when it writes it'll COW into
private pages. Something like that. Not sure whether it's useful, but I just
don't see why we should block that case.
>
> Code below looks fine, but it would be a wast to do all that for a vma
> that could not be shared.
Right, still better to check it.
Mike, I agree with all your comments on the initial 4 patches, thanks for the
input! To make Axel's life easier, I've modified them locally and pushed since
after all I'll do it in my series too (I also picked Mike's r-b on patch 3):
https://github.com/xzpeter/linux/commits/uffd-wp-shmem-hugetlbfs
Axel, feel free to fetch from it directly.
Thanks,
--
Peter Xu
next prev parent reply other threads:[~2021-02-01 23:23 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-28 22:48 [PATCH v3 0/9] userfaultfd: add minor fault handling Axel Rasmussen
2021-01-28 22:48 ` [PATCH v3 1/9] hugetlb: Pass vma into huge_pte_alloc() Axel Rasmussen
2021-01-28 23:42 ` [PATCH v4 " Axel Rasmussen
2021-02-01 21:38 ` Mike Kravetz
2021-02-01 21:53 ` Mike Kravetz
2021-02-01 22:16 ` Peter Xu
2021-01-28 22:48 ` [PATCH v3 2/9] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Axel Rasmussen
2021-02-01 22:01 ` Mike Kravetz
2021-01-28 22:48 ` [PATCH v3 3/9] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Axel Rasmussen
2021-02-01 22:09 ` Mike Kravetz
2021-01-28 22:48 ` [PATCH v3 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Axel Rasmussen
2021-02-01 22:33 ` Mike Kravetz
2021-02-01 23:21 ` Peter Xu [this message]
2021-01-28 22:48 ` [PATCH v3 5/9] userfaultfd: add minor fault registration mode Axel Rasmussen
2021-02-01 18:31 ` Peter Xu
2021-02-02 17:15 ` Peter Xu
2021-02-03 18:20 ` Axel Rasmussen
2021-01-28 22:48 ` [PATCH v3 6/9] userfaultfd: disable huge PMD sharing for MINOR registered VMAs Axel Rasmussen
2021-01-28 22:48 ` [PATCH v3 7/9] userfaultfd: add UFFDIO_CONTINUE ioctl Axel Rasmussen
2021-02-01 19:21 ` Peter Xu
2021-02-01 22:11 ` Axel Rasmussen
2021-02-01 22:40 ` Peter Xu
2021-02-01 23:42 ` Mike Kravetz
2021-02-01 22:41 ` Lokesh Gidra
2021-01-28 22:48 ` [PATCH v3 8/9] userfaultfd: update documentation to describe minor fault handling Axel Rasmussen
2021-02-01 20:06 ` Peter Xu
2021-02-02 23:07 ` Axel Rasmussen
2021-01-28 22:48 ` [PATCH v3 9/9] userfaultfd/selftests: add test exercising " Axel Rasmussen
2021-02-01 19:33 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210201232155.GL260413@xz-x1 \
--to=peterx@redhat.com \
--cc=aarcange@redhat.com \
--cc=adobriyan@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=anshuman.khandual@arm.com \
--cc=axelrasmussen@google.com \
--cc=cannonmatthews@google.com \
--cc=catalin.marinas@arm.com \
--cc=chinwen.chang@mediatek.com \
--cc=dgilbert@redhat.com \
--cc=jannh@google.com \
--cc=jglisse@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=mike.kravetz@oracle.com \
--cc=mingo@redhat.com \
--cc=mkoutny@suse.com \
--cc=mpe@ellerman.id.au \
--cc=npiggin@gmail.com \
--cc=oupton@google.com \
--cc=rientjes@google.com \
--cc=rostedt@goodmis.org \
--cc=rppt@linux.vnet.ibm.com \
--cc=ruprecht@google.com \
--cc=shawn@anastas.io \
--cc=shli@fb.com \
--cc=steven.price@arm.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).