Re: [PATCH v2 2/3] userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Mike Rapoport <rppt@kernel.org>
To: Lokesh Gidra <lokeshgidra@google.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	selinux@vger.kernel.org, surenb@google.com,
	kernel-team@android.com, aarcange@redhat.com, peterx@redhat.com,
	david@redhat.com, axelrasmussen@google.com, bgeffon@google.com,
	willy@infradead.org, jannh@google.com, kaleshsingh@google.com,
	ngeoffray@google.com, timmurray@google.com
Subject: Re: [PATCH v2 2/3] userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx
Date: Wed, 7 Feb 2024 17:27:23 +0200	[thread overview]
Message-ID: <ZcOhW8NR9XWhVnKS@kernel.org> (raw)
In-Reply-To: <CA+EESO7RNn0aQhLxY+NDddNNNT6586qX08=rphU1-XmyoP5mZQ@mail.gmail.com>

On Mon, Feb 05, 2024 at 12:53:33PM -0800, Lokesh Gidra wrote:
> On Sun, Feb 4, 2024 at 2:27 AM Mike Rapoport <rppt@kernel.org> wrote:
> >
> > > 3) Based on [1] I see how mmap_changing helps in eliminating duplicate
> > > work (background copy) by uffd monitor, but didn't get if there is a
> > > correctness aspect too that I'm missing? I concur with Amit's point in
> > > [1] that getting -EEXIST when setting up the pte will avoid memory
> > > corruption, no?
> >
> > In the fork case without mmap_changing the child process may be get data or
> > zeroes depending on the race for mmap_lock between the fork and
> > uffdio_copy and -EEXIST is not enough for monitor to detect what was the
> > ordering between fork and uffdio_copy.
> 
> This is extremely helpful. IIUC, there is a window after mmap_lock
> (write-mode) is released and before the uffd monitor thread is
> notified of fork. In that window, the monitor doesn't know that fork
> has already happened. So, without mmap_changing it would have done
> background copy only in the parent, thereby causing data inconsistency
> between parent and child processes.

Yes.
 
> It seems to me that the correctness argument for mmap_changing is
> there in case of FORK event and REMAP when mremap is called with
> MREMAP_DONTUNMAP. In all other cases its only benefit is by avoiding
> unnecessary background copies, right?

Yes, I think you are right, but it's possible I've forgot some nasty race
that will need mmap_changing for other events.

> > > > > > > > > @@ -783,7 +788,9 @@ bool userfaultfd_remove(struct vm_area_struct *vma,
> > > > > > > > >               return true;
> > > > > > > > >
> > > > > > > > >       userfaultfd_ctx_get(ctx);
> > > > > > > > > +     down_write(&ctx->map_changing_lock);
> > > > > > > > >       atomic_inc(&ctx->mmap_changing);
> > > > > > > > > +     up_write(&ctx->map_changing_lock);
> > > > > > > > >       mmap_read_unlock(mm);
> > > > > > > > >
> > > > > > > > >       msg_init(&ewq.msg);
> > > > > >
> > > > > > If this happens in read mode, then why are you waiting for the readers
> > > > > > to leave?  Can't you just increment the atomic?  It's fine happening in
> > > > > > read mode today, so it should be fine with this new rwsem.
> > > > >
> > > > > It's been a while and the details are blurred now, but if I remember
> > > > > correctly, having this in read mode forced non-cooperative uffd monitor to
> > > > > be single threaded. If a monitor runs, say uffdio_copy, and in parallel a
> > > > > thread in the monitored process does MADV_DONTNEED, the latter will wait
> > > > > for userfaultfd_remove notification to be processed in the monitor and drop
> > > > > the VMA contents only afterwards. If a non-cooperative monitor would
> > > > > process notification in parallel with uffdio ops, MADV_DONTNEED could
> > > > > continue and race with uffdio_copy, so read mode wouldn't be enough.
> > > > >
> > > >
> > > > Right now this function won't stop to wait for readers to exit the
> > > > critical section, but with this change there will be a pause (since the
> > > > down_write() will need to wait for the readers with the read lock).  So
> > > > this is adding a delay in this call path that isn't necessary (?) nor
> > > > existed before.  If you have non-cooperative uffd monitors, then you
> > > > will have to wait for them to finish to mark the uffd as being removed,
> > > > where as before it was a fire & forget, this is now a wait to tell.
> > > >
> > > I think a lot will be clearer once we get a response to my questions
> > > above. IMHO not only this write-lock is needed here, we need to fix
> > > userfaultfd_remove() by splitting it into userfaultfd_remove_prep()
> > > and userfaultfd_remove_complete() (like all other non-cooperative
> > > operations) as well. This patch enables us to do that as we remove
> > > mmap_changing's dependency on mmap_lock for synchronization.
> >
> > The write-lock is not a requirement here for correctness and I don't see
> > why we would need userfaultfd_remove_prep().
> >
> > As I've said earlier, having a write-lock here will let CRIU to run
> > background copy in parallel with processing of uffd events, but I don't
> > feel strongly about doing it.
> >
> Got it. Anyways, such a change needn't be part of this patch, so I'm
> going to keep it unchanged.

You mean with a read lock?

-- 
Sincerely yours,
Mike.

next prev parent reply	other threads:[~2024-02-07 15:27 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-29 19:35 [PATCH v2 0/3] per-vma locks in userfaultfd Lokesh Gidra
2024-01-29 19:35 ` [PATCH v2 1/3] userfaultfd: move userfaultfd_ctx struct to header file Lokesh Gidra
2024-01-30  7:12   ` Mike Rapoport
2024-01-29 19:35 ` [PATCH v2 2/3] userfaultfd: protect mmap_changing with rw_sem in userfaulfd_ctx Lokesh Gidra
2024-01-29 21:00   ` Liam R. Howlett
2024-01-29 22:35     ` Lokesh Gidra
2024-01-30  3:46       ` Liam R. Howlett
2024-01-30  8:55         ` Mike Rapoport
2024-01-30 17:28           ` Liam R. Howlett
2024-01-31  2:24             ` Lokesh Gidra
2024-02-04 10:27               ` Mike Rapoport
2024-02-05 20:53                 ` Lokesh Gidra
2024-02-07 15:27                   ` Mike Rapoport [this message]
2024-02-07 20:24                     ` Lokesh Gidra
2024-02-12  8:14                       ` Mike Rapoport
2024-01-30  7:21   ` Mike Rapoport
2024-01-29 19:35 ` [PATCH v2 3/3] userfaultfd: use per-vma locks in userfaultfd operations Lokesh Gidra
2024-01-29 20:36   ` Liam R. Howlett
2024-01-29 20:52     ` Suren Baghdasaryan
2024-01-29 21:18       ` Liam R. Howlett
2024-01-30  0:28       ` Lokesh Gidra
2024-01-30  2:58         ` Liam R. Howlett
2024-01-31  2:49           ` Lokesh Gidra
2024-01-31 21:41             ` Liam R. Howlett
2024-02-05 21:46               ` Suren Baghdasaryan
2024-02-05 21:54                 ` Lokesh Gidra
2024-02-05 22:00                   ` Liam R. Howlett
2024-02-05 22:24                     ` Lokesh Gidra
2024-02-06 14:35                       ` Liam R. Howlett
2024-02-06 16:26                         ` Lokesh Gidra
2024-02-06 17:07                           ` Liam R. Howlett
2024-01-31  3:03           ` Suren Baghdasaryan
2024-01-31 21:43             ` Liam R. Howlett
2024-01-29 20:39 ` [PATCH v2 0/3] per-vma locks in userfaultfd Liam R. Howlett
2024-01-29 21:58   ` Lokesh Gidra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZcOhW8NR9XWhVnKS@kernel.org \
    --to=rppt@kernel.org \
    --cc=Liam.Howlett@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=bgeffon@google.com \
    --cc=david@redhat.com \
    --cc=jannh@google.com \
    --cc=kaleshsingh@google.com \
    --cc=kernel-team@android.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lokeshgidra@google.com \
    --cc=ngeoffray@google.com \
    --cc=peterx@redhat.com \
    --cc=selinux@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).