From: Mike Rapoport <rppt@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>,
'Andrew Morton' <akpm@linux-foundation.org>,
linux-mm@kvack.org, 'Mike Kravetz' <mike.kravetz@oracle.com>,
"'Dr. David Alan Gilbert'" <dgilbert@redhat.com>,
'Shaohua Li' <shli@fb.com>,
'Pavel Emelyanov' <xemul@virtuozzo.com>
Subject: Re: [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults
Date: Sun, 20 Nov 2016 14:10:51 +0200
Message-ID: <20161120121050.GC32009@rapoport-lnx>
In-Reply-To: <20161118003734.GC10229@redhat.com>
On Fri, Nov 18, 2016 at 01:37:34AM +0100, Andrea Arcangeli wrote:
> Hello,
>
> I found a minor issue with the non-cooperative testcase: sometimes a
> userfault would trigger between UFFD_EVENT_MADVDONTNEED and
> UFFDIO_UNREGISTER:
>
> 		case UFFD_EVENT_MADVDONTNEED:
> 			uffd_reg.range.start = msg.arg.madv_dn.start;
> 			uffd_reg.range.len = msg.arg.madv_dn.end -
> 				msg.arg.madv_dn.start;
> 			if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_reg.range))
>
> It always triggered at nr == 0:
>
> 	for (nr = 0; nr < nr_pages; nr++) {
> 		if (my_bcmp(area_dst + nr * page_size, zeropage, page_size))
>
> A userfault still pending after UFFDIO_UNREGISTER returned led to
> poll() getting a UFFD_EVENT_PAGEFAULT and trying to do a UFFDIO_COPY
> into the unregistered range, which gracefully fails with -EINVAL.
>
> So this could all be handled in userland, by storing the MADV_DONTNEED
> range and calling UFFDIO_WAKE instead of UFFDIO_COPY... but I think
> it's more reliable to fix it in the kernel.
>
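For reference, the userland-only alternative described above could look
roughly like the sketch below. It only models the bookkeeping: the
monitor records every range reported through msg.arg.madv_dn, and for
each later UFFD_EVENT_PAGEFAULT decides whether to issue UFFDIO_COPY or
just UFFDIO_WAKE. The names dropped_set, dropped_add and resolve_fault
are hypothetical and are not part of the selftest or of the kernel API;
the actual ioctl calls are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical table of ranges reported by UFFD_EVENT_MADVDONTNEED. */
#define MAX_DROPPED 64

struct dropped_range {
	unsigned long start;	/* inclusive, from msg.arg.madv_dn.start */
	unsigned long end;	/* exclusive, from msg.arg.madv_dn.end */
};

struct dropped_set {
	struct dropped_range r[MAX_DROPPED];
	size_t nr;
};

enum resolve_action {
	RESOLVE_COPY,	/* range still registered: UFFDIO_COPY is fine */
	RESOLVE_WAKE,	/* range was dropped: only UFFDIO_WAKE the faulter */
};

/* Remember a dropped range; returns false if the table is full. */
static bool dropped_add(struct dropped_set *s,
			unsigned long start, unsigned long end)
{
	assert(start < end);
	if (s->nr == MAX_DROPPED)
		return false;
	s->r[s->nr].start = start;
	s->r[s->nr].end = end;
	s->nr++;
	return true;
}

/* Decide how to resolve a page fault reported at addr. */
static enum resolve_action resolve_fault(const struct dropped_set *s,
					 unsigned long addr)
{
	for (size_t i = 0; i < s->nr; i++)
		if (addr >= s->r[i].start && addr < s->r[i].end)
			return RESOLVE_WAKE;
	return RESOLVE_COPY;
}
```

The monitor would call dropped_add() when it reads the MADV_DONTNEED
event and resolve_fault() before answering each page fault. Every
monitor would have to carry this state, which is the reliability concern
raised above.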
> If a pending userfault happens before UFFDIO_UNREGISTER it'll just
> behave as if it had happened after.
>
> I also noticed the order of the uffd notification of MADV_DONTNEED and
> the pagetable zap was wrong: we have to notify userland first so it
> won't risk calling UFFDIO_COPY while the process runs zap_page_range.
>
> With the two patches appended below, the -EINVAL error from
> UFFDIO_COPY is gone.
>
> From fc27d209e566d95e8ae0eb83a703aa4e02316b4c Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Thu, 17 Nov 2016 20:15:50 +0100
> Subject: [PATCH 1/2] userfaultfd: non-cooperative: avoid MADV_DONTNEED race
> condition
>
> MADV_DONTNEED must be notified to userland before the pages are
> zapped. This allows userland to immediately stop adding pages to the
> userfaultfd ranges before the pages are actually zapped or there could
> be non-zeropage leftovers as result of concurrent UFFDIO_COPY run in
> between zap_page_range and madvise_userfault_dontneed (both
> MADV_DONTNEED and UFFDIO_COPY runs under the mmap_sem for reading, so
> they can run concurrently).
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> ---
> mm/madvise.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 7168bc6..4d4c7f8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -476,8 +476,8 @@ static long madvise_dontneed(struct vm_area_struct *vma,
>  	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>  		return -EINVAL;
>
> -	zap_page_range(vma, start, end - start, NULL);
>  	madvise_userfault_dontneed(vma, prev, start, end);
> +	zap_page_range(vma, start, end - start, NULL);
>  	return 0;
>  }
>
>
>
> From 18e7b30cf82c927af4c0323a6caac20184a03ff4 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@redhat.com>
> Date: Thu, 17 Nov 2016 20:20:40 +0100
> Subject: [PATCH 2/2] userfaultfd: non-cooperative: wake userfaults after
> UFFDIO_UNREGISTER
>
> Userfaults may still happen after the userfaultfd monitor thread
> received a UFFD_EVENT_MADVDONTNEED until UFFDIO_UNREGISTER is run.
>
> Wake any pending userfault within UFFDIO_UNREGISTER protected by the
> mmap_sem for writing, so they will not be reported to userland leading
> to UFFDIO_COPY returning -EINVAL (as the range was already
> unregistered) and they will not hang permanently either.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> ---
> fs/userfaultfd.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 2b75fab..42168d3 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1267,6 +1267,19 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			start = vma->vm_start;
>  		vma_end = min(end, vma->vm_end);
>
> +		if (userfaultfd_missing(vma)) {
> +			/*
> +			 * Wake any concurrent pending userfault while
> +			 * we unregister, so they will not hang
> +			 * permanently and it avoids userland to call
> +			 * UFFDIO_WAKE explicitly.
> +			 */
> +			struct userfaultfd_wake_range range;
> +			range.start = start;
> +			range.len = vma_end - start;
> +			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
> +		}
> +
>  		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
>  		prev = vma_merge(mm, prev, start, vma_end, new_flags,
>  				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
>
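As an illustration only, the range handed to wake_userfault() in the
hunk above is the unregister range clamped to the current VMA. A small
userspace model of that arithmetic, assuming the usual half-open
[start, end) convention, might read as follows. The names wake_range and
clamp_wake_range are hypothetical; this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the start/len pair passed to the wake call. */
struct wake_range {
	unsigned long start;
	unsigned long len;
};

/*
 * Clamp the requested range [start, end) to the VMA [vm_start, vm_end).
 * Returns false when the two do not overlap, so nothing needs waking.
 */
static bool clamp_wake_range(unsigned long start, unsigned long end,
			     unsigned long vm_start, unsigned long vm_end,
			     struct wake_range *out)
{
	unsigned long s = start > vm_start ? start : vm_start;
	unsigned long e = end < vm_end ? end : vm_end;

	assert(out != NULL);
	if (s >= e)
		return false;
	out->start = s;
	out->len = e - s;
	return true;
}
```

For a request of [0x1000, 0x5000) against a VMA covering
[0x2000, 0x4000), the model produces start 0x2000 and len 0x2000, which
corresponds to range.start = start and range.len = vma_end - start in
the patch.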