From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Eduardo Habkost <ehabkost@redhat.com>,
Juan Quintela <quintela@redhat.com>,
qemu-devel@nongnu.org,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH v2 13/13] migration/ram: Tolerate partially changed mappings in postcopy code
Date: Tue, 25 Feb 2020 09:27:40 -0500
Message-ID: <20200225142740.GF113102@xz-x1>
In-Reply-To: <531232dc-33f9-2c37-41a6-ef3899abd11a@redhat.com>

On Tue, Feb 25, 2020 at 08:44:56AM +0100, David Hildenbrand wrote:
> On 24.02.20 23:49, Peter Xu wrote:
> > On Fri, Feb 21, 2020 at 05:42:04PM +0100, David Hildenbrand wrote:
> >> When we partially change mappings (esp., mmap over parts of an existing
> >> mmap like qemu_ram_remap() does) where we have a userfaultfd handler
> >> registered, the handler will implicitly be unregistered from the parts that
> >> changed.
> >>
> >> Trying to place pages onto mappings where there is no longer a handler
> >> registered will fail. Let's make sure that any waiter is woken up - we
> >> have to do that manually.
> >>
> >> Let's also document how UFFDIO_UNREGISTER will handle this scenario.
> >>
> >> This is mainly a preparation for RAM blocks with resizable allocations,
> >> where the mapping of the invalid RAM range will change. The source will
> >> keep sending pages that are outside of the new (shrunk) RAM size. We have
> >> to treat these pages like they would have been migrated, but can
> >> essentially simply drop the content (ignore the placement error).
> >>
> >> Keep printing a warning on EINVAL, to avoid hiding other (programming)
> >> issues. ENOENT is unique.
> >>
> >> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>
> >> Cc: Peter Xu <peterx@redhat.com>
> >> Cc: Andrea Arcangeli <aarcange@redhat.com>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >> migration/postcopy-ram.c | 37 +++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 37 insertions(+)
> >>
> >> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> >> index c68caf4e42..f023830b9a 100644
> >> --- a/migration/postcopy-ram.c
> >> +++ b/migration/postcopy-ram.c
> >> @@ -506,6 +506,12 @@ static int cleanup_range(RAMBlock *rb, void *opaque)
> >> range_struct.start = (uintptr_t)host_addr;
> >> range_struct.len = length;
> >>
> >> + /*
> >> + * In case the mapping was partially changed since we enabled userfault
> >> + * (e.g., via qemu_ram_remap()), the userfaultfd handler was already removed
> >> + * for the mappings that changed. Unregistering will, however, still work
> >> + * and ignore mappings without a registered handler.
> >> + */
> >
> > Ideally we should still only unregister what we have registered.
> > After all, we do have this information: we know what we registered,
> > and we know what has been unmapped (in your new resize() hook, when
> > postcopy_state==RUNNING).
>
> Not in the case of qemu_ram_remap(). And whatever you propose will
> require synchronization (see my other mail) and more complicated
> handling than this. uffd allows you to handle races with mmap changes in
> a very elegant way (e.g., -ENOENT, or unregister ignoring changed mappings).

All writers to the new postcopy_min_length should have BQL already.
The only one left is the last cleanup_range(), where we can take the BQL
for a while.  However...
>
> >
> > An extreme example is when we register with pages in range [A, B),
> > then shrink it to [A, C), then we map something else within [C, B)
> > (note, with virtio-mem logically B can be very big and C can be very
> > small, it means [C, B) can cover quite some address space). Then if:
> >
> > - [C, B) memory type is not compatible with uffd, or
>
> That will never happen in the near future. Without resizable allocations:
> - All memory is either anonymous or from a single fd
>
> In addition, right now, only anonymous memory can be used for resizable
> RAM. However, with resizable allocations we could have:
> - All used_length memory is either anonymous or from a single fd
> - All remaining memory is either anonymous or from a single fd
>
> Everything else does not make any sense IMHO and I don't think this is
> relevant long term. You cannot arbitrarily map things into the
> used_length part of a RAMBlock. That would contradict its page_size
> and its fd. E.g., you would break qemu_ram_remap().

... I think this persuaded me. :) You are right, they can still be
protected until max_length with PROT_NONE.  Would you mind adding some of
the above to the comment above the uffd unregister?
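To illustrate the PROT_NONE point, here is a minimal standalone sketch
(not QEMU code). reserve_block() and shrink_block() are hypothetical
helpers made up for this example, and all lengths are assumed to be
multiples of the host page size. The idea is only that the whole
max_length range stays reserved, so nothing unrelated can end up
mapped into the unused tail:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): reserve max_len bytes of
 * address space with PROT_NONE, then make only the first used_len bytes
 * readable and writable. Lengths are assumed to be page-size multiples.
 */
static void *reserve_block(size_t max_len, size_t used_len)
{
    assert(used_len <= max_len);

    void *base = mmap(NULL, max_len, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED) {
        return NULL;
    }
    if (used_len && mprotect(base, used_len, PROT_READ | PROT_WRITE) != 0) {
        munmap(base, max_len);
        return NULL;
    }
    return base;
}

/*
 * Hypothetical helper: shrink the usable part from old_used to new_used.
 * The tail is replaced by a fresh PROT_NONE mapping via MAP_FIXED, which
 * drops its contents but keeps the address range reserved.
 */
static int shrink_block(void *base, size_t old_used, size_t new_used)
{
    assert(new_used <= old_used);
    if (new_used == old_used) {
        return 0;
    }

    void *tail = (uint8_t *)base + new_used;
    void *p = mmap(tail, old_used - new_used, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == tail ? 0 : -1;
}
```

Note that the MAP_FIXED remap in shrink_block() is itself a partial
mapping change: a userfaultfd registration covering the tail would be
dropped there, which is the case the patch tolerates.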
Thanks,
--
Peter Xu