From: Andrea Arcangeli <aarcange@redhat.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [PATCH] mm: Don't fault around userfaultfd-registered regions on reads
Date: Fri, 27 Nov 2020 19:33:39 -0500 [thread overview]
Message-ID: <X8Ga44uXHmzn/vB9@redhat.com> (raw)
In-Reply-To: <20201127122224.GX4327@casper.infradead.org>
Hello,
On Fri, Nov 27, 2020 at 12:22:24PM +0000, Matthew Wilcox wrote:
> On Thu, Nov 26, 2020 at 05:23:59PM -0500, Peter Xu wrote:
> > For wr-protected mode uffds, errornously fault in those pages around could lead
> > to threads accessing the pages without uffd server's awareness. For example,
> > when punching holes on uffd-wp registered shmem regions, we'll first try to
> > unmap all the pages before evicting the page cache but without locking the
> > page (please refer to shmem_fallocate(), where unmap_mapping_range() is called
> > before shmem_truncate_range()). When fault-around happens near a hole being
> > punched, we might errornously fault in the "holes" right before it will be
> > punched. Then there's a small window before the page cache was finally
> > dropped, and after the page will be writable again (NOTE: the uffd-wp protect
> > information is totally lost due to the pre-unmap in shmem_fallocate(), so the
> > page can be writable within the small window). That's severe data loss.
>
> Sounds like you have a missing page_mkwrite implementation.
If the real fault happened through the pagetable (which got dropped by
the hole punching), a "missing type" userfault would be delivered to
userland (because the pte would be none). Userland would invoke
UFFDIO_COPY with the UFFDIO_COPY_MODE_WP flag. Such flag would then
map the filled shmem page (not necessarily all zero and not
necessarily the old content before the hole punch) with _PAGE_RW not
set and _PAGE_UFFD_WP set, so the next write would also trigger a
wrprotect userfault (this is what the uffd-wp app expects).
filemap_map_pages doesn't notify userland when it fills a pte and it
will map again the page read-write.
However filemap_map_pages isn't capable to fill a hole and to undo the
hole punch, all it can do are minor faults to re-fill the ptes from a
not-yet-truncated inode page.
Now it would be ok if filemap_map_pages re-filled the ptes, after they
were zapped the first time by fallocate, and then the fallocate would
zap them again before truncating the page, but I don't see a second
pte zap... there's just the below single call of unmap_mapping_range:
if ((u64)unmap_end > (u64)unmap_start)
unmap_mapping_range(mapping, unmap_start,
1 + unmap_end - unmap_start, 0);
shmem_truncate_range(inode, offset, offset + len - 1);
So looking at filemap_map_pages in shmem, I'm really wondering if the
non-uffd case is correct or not.
Do we end up with ptes pointing to non pagecache, so then the virtual
mapping is out of sync also with read/write syscalls that will then
allocate another shmem page out of sync of the ptes?
If a real page fault happened instead of filemap_map_pages, the
shmem_fault() would block during fallocate punch hole by checking
inode->i_private, as the comment says:
* So refrain from
* faulting pages into the hole while it's being punched.
It's not immediately clear where filemap_map_pages refrains from
faulting pages into the hole while it's being punched... given it's
ignoring inode->i_private.
So I'm not exactly sure how shmem can safely faulted in through
filemap_map_pages, without going through shmem_fault... I suppose
shmem simply is unsafe to use filemap_map_pages and it'd require
a specific shmem_map_pages?
If only filemap_map_pages was refraining re-faulting ptes of a shmem
page that is about to be truncated (whose original ptes had _PAGE_RW
unset and _PAGE_UFFD_WP set) there would be no problem with the uffd
interaction. So a proper shmem_map_pages could co-exist with uffd, the
userfaultfd_armed check would be only an optimization but it wouldn't
be required to avoid userland memory corruption?
From 8c6fb1b7dde309f0c8b5666a8e13557ae46369b4 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@redhat.com>
Date: Fri, 27 Nov 2020 19:12:44 -0500
Subject: [PATCH 1/1] shmem: stop faulting in pages without checking
inode->i_private
Per shmem_fault comment shmem need to "refrain from faulting pages
into the hole while it's being punched" and to do so it must check
inode->i_private, which filemap_map_pages won't so it's unsafe to use
in shmem because it can leave ptes pointing to non-pagecache pages in
shmem backed vmas.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
mm/shmem.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 8e2b35ba93ad..f6f29b3e67c6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3942,7 +3942,6 @@ static const struct super_operations shmem_ops = {
static const struct vm_operations_struct shmem_vm_ops = {
.fault = shmem_fault,
- .map_pages = filemap_map_pages,
#ifdef CONFIG_NUMA
.set_policy = shmem_set_policy,
.get_policy = shmem_get_policy,
Thanks,
Andrea
next prev parent reply other threads:[~2020-11-28 0:33 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-26 22:23 [PATCH] mm: Don't fault around userfaultfd-registered regions on reads Peter Xu
2020-11-27 8:16 ` Mike Rapoport
2020-11-27 13:31 ` Peter Xu
2020-11-27 12:22 ` Matthew Wilcox
2020-11-27 13:51 ` Peter Xu
2020-11-28 0:33 ` Andrea Arcangeli [this message]
2020-11-28 15:29 ` Peter Xu
2020-12-01 19:08 ` Andrea Arcangeli
2020-12-01 21:31 ` Hugh Dickins
2020-12-01 23:42 ` Andrea Arcangeli
2020-11-27 17:00 ` David Hildenbrand
2020-11-27 18:28 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=X8Ga44uXHmzn/vB9@redhat.com \
--to=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.