Re: [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mike Rapoport <rppt@linux.ibm.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm <linux-mm@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Pavel Emelyanov <xemul@virtuozzo.com>,
	Andrei Vagin <avagin@virtuozzo.com>
Subject: Re: [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races
Date: Sun, 6 Dec 2020 11:37:03 +0200	[thread overview]
Message-ID: <20201206093703.GY123287@linux.ibm.com> (raw)
In-Reply-To: <31DA12CC-E9CC-497D-A2EE-B83549D95CE8@gmail.com>

Hello Nadav,

On Thu, Dec 03, 2020 at 11:57:46AM -0800, Nadav Amit wrote:
> Hello Mike,
> 
> Regarding your (old) patch:
> 
> > On May 23, 2018, at 12:42 AM, Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> > 
> > If a process monitored with userfaultfd changes it's memory mappings or
> > forks() at the same time as uffd monitor fills the process memory with
> > UFFDIO_COPY, the actual creation of page table entries and copying of the
> > data in mcopy_atomic may happen either before of after the memory mapping
> > modifications and there is no way for the uffd monitor to maintain
> > consistent view of the process memory layout.
> > 
> > For instance, let's consider fork() running in parallel with
> > userfaultfd_copy():
> > 
> > process        		         |	uffd monitor
> > ---------------------------------+------------------------------
> > fork()        		         | userfaultfd_copy()
> > ...        		         | ...
> >    dup_mmap()        	         |     down_read(mmap_sem)
> >    down_write(mmap_sem)         |     /* create PTEs, copy data */
> >        dup_uffd()               |     up_read(mmap_sem)
> >        copy_page_range()        |
> >        up_write(mmap_sem)       |
> >        dup_uffd_complete()      |
> >            /* notify monitor */ |
> > 
> > If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will be
> > present by the time copy_page_range() is called and they will appear in the
> > child's memory mappings. However, if the fork() is the first to take the
> > mmap_sem, the new pages won't be mapped in the child's address space.
> > 
> > Since userfaultfd monitor has no way to determine what was the order, let's
> > disallow userfaultfd_copy in parallel with the non-cooperative events. In
> > such case we return -EAGAIN and the uffd monitor can understand that
> > userfaultfd_copy() clashed with a non-cooperative event and take an
> > appropriate action.
> 
> I am struggling to understand this patch and would appreciate your
> assistance.
 
The tl;dr version is that without this commit we had failing fork tests
in CRIU [1] :)

> Specifically, I have two questions:
> 
> 1. How can memory corruption occur? If the page is already mapped and the
> handler “mistakenly" calls userfaultfd_copy(), wouldn't mcopy_atomic_pte()
> return -EEXIST once it sees the PTE already exists? In such case, I would
> presume that the handler should be able to recover gracefully by waking the
> faulting thread.
 
The issue we had was when fork() in a monitored process happened
concurrently with "background copy" of pages into the process address
space during a post-copy process migration.

The userspace has no way to tell who won the race for mmap_lock and
depending on that we can have two different cases:

* fork() took the mmap_lock, pages in the parent are still empty, so
they will be empty in the child

* userfaultfd_copy() won the lock, there is data in the parent and the
child's inherits these mappings

The uffd monotor should *know* what is the state of child's memory and
without this patch it could only guess.

> 2. How is memory ordering supposed to work here? IIUC, mmap_changing is not
> protected by any lock and there are no memory barriers that are associated
> with the assignment. Indeed, the code calls WRITE_ONCE()/READ_ONCE(), but
> AFAIK this does not guarantee ordering with non-volatile reads/writes.

There is also mmap_lock involved, so I don't see how copy can start in
parallel with fork processing. Fork sets mmap_chaning to true while
holding mmap_lock, so copy cannot start in parallel. When mmap_lock is
realeased, mmap_chaning remains true until fork event is pushed to
userspace and when this is done there is no issue with
userfaultfd_copy.

Maybe I am missing something...

[1] https://github.com/checkpoint-restore/criu/blob/criu-dev/test/zdtm/transition/fork.c

> Thanks,
> Nadav

-- 
Sincerely yours,
Mike.

next prev parent reply	other threads:[~2020-12-06  9:37 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-23  7:42 [PATCH] userfaultfd: prevent non-cooperative events vs mcopy_atomic races Mike Rapoport
2018-05-24 11:24 ` Pavel Emelyanov
2018-05-24 11:56   ` Mike Rapoport
2018-05-24 16:40     ` Pavel Emelyanov
2018-05-24 19:06       ` Mike Rapoport
2018-05-25 14:05         ` Pavel Emelyanov
2020-12-03 19:57 ` Nadav Amit
2020-12-06  9:37   ` Mike Rapoport [this message]
2020-12-07  4:31     ` Nadav Amit
2020-12-08  8:34       ` Mike Rapoport
2020-12-08  8:57         ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201206093703.GY123287@linux.ibm.com \
    --to=rppt@linux.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@virtuozzo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=nadav.amit@gmail.com \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=xemul@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.