From: Mike Rapoport <rppt@kernel.org>
To: David Hildenbrand <david@redhat.com>
Cc: Nadav Amit <nadav.amit@gmail.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Xu <peterx@redhat.com>, Linux-MM <linux-mm@kvack.org>
Subject: Re: userfaultfd: usability issue due to lack of UFFD events ordering
Date: Mon, 31 Jan 2022 20:47:16 +0200
Message-ID: <YfgutA6FYwu7RyJP@kernel.org>
In-Reply-To: <a7660987-23d1-d550-5315-7f24c1b27076@redhat.com>
On Mon, Jan 31, 2022 at 03:41:05PM +0100, David Hildenbrand wrote:
> On 31.01.22 15:28, Mike Rapoport wrote:
> > On Mon, Jan 31, 2022 at 03:12:36PM +0100, David Hildenbrand wrote:
> >> On 31.01.22 15:05, Mike Rapoport wrote:
> >>> On Mon, Jan 31, 2022 at 11:48:27AM +0100, David Hildenbrand wrote:
> >>>> On 31.01.22 11:42, Mike Rapoport wrote:
> >>>>> Hi Nadav,
> >>>>>
> >>>>> On Sat, Jan 29, 2022 at 10:23:55PM -0800, Nadav Amit wrote:
> >>>>>> Using userfaultfd and looking at the kernel code, I encountered a usability
> >>>>>> issue that complicates userspace UFFD-monitor implementation. I obviously
> >>>>>> might be wrong, so I would appreciate a (polite?) feedback. I do have a
> >>>>>> userspace workaround, but I thought it is worthy to share and to hear your
> >>>>>> opinion, as well as feedback from other UFFD users.
> >>>>>>
> >>>>>> The issue I encountered regards the ordering of UFFD events that might not
> >>>>>> reflect the actual order in which events took place.
> >>>>>>
> >>>>>> In more detail, UFFD events (e.g., unmap, fork) are not ordered against
> >>>>>> themselves [*]. The mm-lock is dropped before notifying the userspace
> >>>>>> UFFD-monitor, and therefore there is no guarantee as to whether the order of
> >>>>>> the events actually reflects the order in which the events took place.
> >>>>>> This can prevent a UFFD-monitor from using the events to track which
> >>>>>> ranges are mapped. Specifically, UFFD_EVENT_FORK message and a
> >>>>>> UFFD_EVENT_UNMAP message (which reflects unmap in the parent process) can
> >>>>>> be reordered, if the events are triggered by two different threads. In
> >>>>>> this case the UFFD-monitor cannot figure from the events whether the
> >>>>>> child process has the unmapped memory range still mapped (because fork
> >>>>>> happened first) or not.
> >>>>>
> >>>>> Yeah, it seems that something like this is possible:
> >>>>>
> >>>>>
> >>>>> fork()                          munmap()
> >>>>>     mmap_write_unlock();
> >>>>>                                 mmap_write_lock_killable();
> >>>>>                                 do_things();
> >>>>>                                 mmap_{read,write}_unlock();
> >>>>>                                 userfaultfd_unmap_complete();
> >>>>>     dup_userfaultfd_complete();
> >>>>>
> >>>>
> >>>> I was thinking about other possible races, e.g., MADV_DONTNEED/MADV_FREE
> >>>> racing with UFFD_EVENT_PAGEFAULT -- where we only hold the mmap_lock in
> >>>> read mode. But not sure if they apply.
> >>>
> >>> The userspace can live with these, at least for uffd missing page faults.
> >>> If the monitor will try to resolve a page fault for a removed area, the
> >>> errno from UFFDIO_COPY/ZERO can be used to detect such races.
> >>
> >> I was wondering if the monitor could get confused if he just resolved a
> >> page fault via UFFDIO_COPY/ZERO and then receives a REMOVE event.
> >
> > And why would it be confused?
>
> My thinking was that the monitor might use REMOVE events to track which
> pages are actually populated. If you receive REMOVE after
> UFFDIO_COPY/ZERO the monitor would conclude that the page is not
> populated, just like if we'd get the MADV_DONTNEED/MADV_REMOVE
> immediately after placing a page.

I still don't follow your use case.

In CRIU we simply discard whatever content we had to fill when there is a
REMOVE event. If a page fault then occurs in that region, we use
UFFDIO_ZEROPAGE, just as would happen in "normal" page fault processing.
(Note: CRIU does not support uffd with hugetlb or shmem.)

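The policy described above can be sketched roughly as follows. This is
only an illustration, not CRIU's actual code: the names removed_ranges,
note_remove(), on_pagefault(), fill_action and MAX_REMOVED are all
hypothetical, and the sketch only decides which ioctl a monitor would
issue (UFFDIO_ZEROPAGE or UFFDIO_COPY) without touching a real
userfaultfd descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REMOVED 64	/* arbitrary cap, chosen only for this sketch */

/* What the monitor would do for a missing-page fault (hypothetical). */
enum fill_action { FILL_COPY, FILL_ZEROPAGE };

/* Ranges whose saved content was discarded after a REMOVE event. */
struct removed_ranges {
	struct { uint64_t start, end; } r[MAX_REMOVED];	/* [start, end) */
	size_t nr;
};

/* On a REMOVE event: remember that [start, end) has no content to fill. */
static bool note_remove(struct removed_ranges *rr, uint64_t start, uint64_t end)
{
	assert(start < end);
	if (rr->nr == MAX_REMOVED)
		return false;	/* table full, caller has to cope */
	rr->r[rr->nr].start = start;
	rr->r[rr->nr].end = end;
	rr->nr++;
	return true;
}

/* On a page fault: zero-fill removed areas, copy saved data otherwise. */
static enum fill_action on_pagefault(const struct removed_ranges *rr,
				     uint64_t addr)
{
	for (size_t i = 0; i < rr->nr; i++)
		if (addr >= rr->r[i].start && addr < rr->r[i].end)
			return FILL_ZEROPAGE;	/* would issue UFFDIO_ZEROPAGE */
	return FILL_COPY;			/* would issue UFFDIO_COPY */
}
```

With this shape, a REMOVE that arrives after a fault in the same range was
already resolved only adds a range to the table; the next fault there is
resolved with zeroes, which matches what the faulting task would see
without uffd.
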
> Of course, it heavily depends on the target use case in the monitor or I
> might just be wrong.
>
> --
> Thanks,
>
> David / dhildenb
>
--
Sincerely yours,
Mike.