From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
Miaohe Lin <linmiaohe@huawei.com>,
Andrew Morton <akpm@linux-foundation.org>,
Muchun Song <songmuchun@bytedance.com>,
Linux MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
Date: Wed, 13 Jul 2022 15:46:06 -0700 [thread overview]
Message-ID: <CAJHvVcgqb6R6ePMbgmA8LpMpMgrGWie9ZCTuR4MA77GTvi4XAw@mail.gmail.com> (raw)
In-Reply-To: <Ys7uXHdzzFezUub5@xz-m1.local>
I think there is a small mistake in this patch.
Consider the non-minor-fault case. We have this block:
/* Add shared, newly allocated pages to the page cache. */
if (vm_shared && !is_continue) {
/* ... */
}
In here, we've added the newly allocated page to the page cache, and
we've set this page_in_pagecache flag to true. But we *do not* setup
rmap for this page in this block. I think in this case, the patch will
cause us to do the wrong thing: we should hugepage_add_new_anon_rmap()
further down, but with this patch we dup instead.
On Wed, Jul 13, 2022 at 9:10 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jul 13, 2022 at 10:24:09AM -0400, Peter Xu wrote:
> > On Tue, Jul 12, 2022 at 10:39:20AM -0700, Mike Kravetz wrote:
> > > On 07/12/22 21:05, Miaohe Lin wrote:
> > > > In MCOPY_ATOMIC_CONTINUE case with a non-shared VMA, pages in the page
> > > > cache are installed in the ptes. But hugepage_add_new_anon_rmap is called
> > > > for them mistakenly because they're not vm_shared. This will corrupt the
> > > > page->mapping used by page cache code.
> > > >
> > > > Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
> > > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > > > ---
> > > > mm/hugetlb.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > This looks correct to me.
> > >
> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > >
> > > However, I am having a hard time wrapping my head around how UFFDIO_CONTINUE
> > > should work on non-anon private mappings. For example, a private mapping of
> > > a hugetlbfs file. I think we just map the page in the file/cache and do not
> > > set the write bit in the pte. So, yes we would want page_dup_file_rmap()
> > > in this case as shown below.
> > >
> > > Adding Axel and Peter on Cc: as they were more involved in adding that code
> > > and the design of UFFDIO_CONTINUE.
> >
> > Yes the change makes sense to me too. There's just one thing to check on
> > whether minor mode should support private mappings at all as it's probably
> > not in the major goal of when it's proposed.
> >
> > I don't see why it can't logically, but I think we should have failed the
> > uffdio-register already somewhere before when the vma was private and
> > registered with minor mode. It's just that I cannot quickly find it in the
> > code anywhere.. ideally it should be checked in vma_can_userfault() but it
> > seems not.
> >
> > Axel?
> >
> > PS: the minor mode man page update seems to be still missing.
>
> Oh I should have done a pull first on the man-page repo..
>
> From the man page indeed I didn't see anything mentioned on not allowing
> private mappings. There's the example given on using two mappings for
> modifying pages but logically that applies to private mappings too - we
> could have mapped the uffd region with private mappings but the other one
> shared, then we can modify page caches but later after pte installed it'll
> trigger cow for writes.
>
> So I think we need to confirm whether private mappings are supported. If
> no, we should be crystal clear in both the code and man page (we probably
> want a follow up patch to man-page to mention that too?). If yes, we'll
> need Miaohe's patch and also make sure they're enabled in the current code
> path. We'll also want to set test_uffdio_minor=1 for "hugetlb" test case
> in the userfaultfd kselftest (currently it's not there).
So, originally when I proposed minor fault handling, I was expecting
to only support VM_SHARED VMAs. Indeed, I too have a hard time
imagining how one might benefit from using it with a private mapping.
If my memory serves this restriction was relaxed due to feedback on
the original RFC proposal [1], essentially on the basis of: why make
it more restrictive than it needs to be? Since all we need for a minor
fault to happen is for the pages to be in the page cache, that's the
only restriction we should have.
I don't see why it shouldn't work in principle though. Imagine a
scenario where the VM guest's mapping is private, and the memory
manager's mapping is shared. I guess in this case, say for a write
from the guest:
1. The guest will generate a minor fault
2. The memory manager can modify the page via its shared mapping, and
the guest will see those changes
3. After UFFDIO_CONTINUE resolves the fault, the page is CoW-ed, and
the memory manager can no longer see the guest's version of the page
I'm not really sure *why* you'd want to do this, but it seems like it
should work.
[1]: https://patchwork.kernel.org/project/linux-mm/patch/20210107190453.3051110-2-axelrasmussen@google.com/
>
> --
> Peter Xu
>
next prev parent reply other threads:[~2022-07-13 22:46 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-12 13:05 [PATCH] mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte Miaohe Lin
2022-07-12 17:39 ` Mike Kravetz
2022-07-13 2:10 ` Miaohe Lin
2022-07-13 14:24 ` Peter Xu
2022-07-13 16:10 ` Peter Xu
2022-07-13 22:46 ` Axel Rasmussen [this message]
2022-07-13 23:36 ` Mike Kravetz
2022-07-14 0:20 ` Axel Rasmussen
2022-07-14 10:09 ` Miaohe Lin
2022-07-14 15:45 ` Peter Xu
2022-07-15 2:50 ` Miaohe Lin
2022-07-13 17:23 ` Andrew Morton
2022-07-14 9:59 ` Miaohe Lin
2022-07-14 15:52 ` Peter Xu
2022-07-15 3:56 ` Miaohe Lin
2022-07-15 12:35 ` Peter Xu
2022-07-15 16:45 ` Axel Rasmussen
2022-07-15 17:07 ` Peter Xu
2022-07-15 17:28 ` Axel Rasmussen
2022-07-15 17:39 ` Peter Xu
2022-07-15 17:51 ` Axel Rasmussen
2022-07-16 1:32 ` Miaohe Lin
2022-07-15 17:29 ` Mike Kravetz
2022-07-15 17:38 ` Peter Xu
2022-07-16 23:06 ` Andrew Morton
2022-07-18 2:25 ` Miaohe Lin
2022-07-18 18:07 ` Axel Rasmussen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJHvVcgqb6R6ePMbgmA8LpMpMgrGWie9ZCTuR4MA77GTvi4XAw@mail.gmail.com \
--to=axelrasmussen@google.com \
--cc=akpm@linux-foundation.org \
--cc=linmiaohe@huawei.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=peterx@redhat.com \
--cc=songmuchun@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).