From: Peter Xu <peterx@redhat.com>
To: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
Cc: Juan Quintela <quintela@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org, Den Lunev <den@openvz.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH v13 0/5] UFFD write-tracking migration/snapshots
Date: Thu, 11 Feb 2021 12:32:15 -0500
Message-ID: <20210211173215.GC157159@xz-x1>
In-Reply-To: <d7dcfbf8-8f33-065b-47a9-802952732218@virtuozzo.com>

On Thu, Feb 11, 2021 at 07:19:47PM +0300, Andrey Gruzdev wrote:
> On 09.02.2021 22:06, David Hildenbrand wrote:
> > > > Hi,
> > > > 
> > > > just stumbled over this, quick question:
> > > > 
> > > > I recently played with UFFD_WP and noticed that write protection is
> > > > only effective on pages/ranges that already have pages populated
> > > > (IOW: !pte_none() in the kernel).
> > > > 
> > > > In case memory was never populated (or was discarded using e.g.
> > > > madvise(MADV_DONTNEED)), write-protection will be skipped silently
> > > > and you won't get WP events for the affected pages.
> > > > 
> > > > So if someone writes to a yet unpopulated page ("zero"), you won't
> > > > get WP events.
> > > > 
> > > > I notice that you do a single uffd_change_protection() on the whole
> > > > RAMBlock.
> > > > 
> > > > How are you handling that scenario, or why don't you have to handle
> > > > that scenario?
> > > > 
> > > Hi David,
> > > 
> > > I really wonder if such a problem exists.. If we are talking about a
> > 
> > I immediately ran into this issue with my simplest test cases. :)
> > 
> > > write to an unpopulated page, we should get a first page fault on the
> > > non-present page and populate it with protection bits from the
> > > respective vma.
> > > For a UFFD_WP vma the page will be populated non-writable. So we'll
> > > get another page fault on the present but read-only page and go to
> > > handle_userfault.
> > 
> > See the attached test program. Triggers for me on 5.11.0-rc6+ and
> > 5.9.13-200.fc33
> > 
> > gcc -lpthread uffdio_wp.c -o uffdio_wp
> > ./uffdio_wp
> > WP did not fire
> > 
> > Uncomment the placement of the zeropage just before registering to make
> > the WP actually trigger. If there is no PTE, there is nothing to
> > protect.
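> > 
> > For anyone reading along without the attachment, the gist is roughly
> > the sketch below (an approximation, not the exact program; it assumes
> > a kernel with anonymous UFFD-WP support, i.e. 5.7+, and omits error
> > handling):
> > 
> > #include <fcntl.h>
> > #include <linux/userfaultfd.h>
> > #include <poll.h>
> > #include <pthread.h>
> > #include <stdio.h>
> > #include <sys/ioctl.h>
> > #include <sys/mman.h>
> > #include <sys/syscall.h>
> > #include <unistd.h>
> > 
> > static void *writer(void *p)
> > {
> >     *(volatile char *)p = 1;        /* the write that should trip WP */
> >     return NULL;
> > }
> > 
> > int main(void)
> > {
> >     long psize = sysconf(_SC_PAGESIZE);
> >     int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
> >     struct uffdio_api api = { .api = UFFD_API };
> >     ioctl(uffd, UFFDIO_API, &api);
> > 
> >     char *mem = mmap(NULL, psize, PROT_READ | PROT_WRITE,
> >                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > 
> >     /* mem[0] = 0; */   /* uncomment: populate the PTE first, and the
> >                            WP event below does fire */
> > 
> >     struct uffdio_register reg = {
> >         .range = { (unsigned long)mem, psize },
> >         .mode  = UFFDIO_REGISTER_MODE_WP,
> >     };
> >     ioctl(uffd, UFFDIO_REGISTER, &reg);
> > 
> >     struct uffdio_writeprotect wp = {
> >         .range = { (unsigned long)mem, psize },
> >         .mode  = UFFDIO_WRITEPROTECT_MODE_WP,
> >     };
> >     ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
> > 
> >     pthread_t t;
> >     pthread_create(&t, NULL, writer, mem);
> > 
> >     struct pollfd pfd = { .fd = uffd, .events = POLLIN };
> >     if (poll(&pfd, 1, 1000) > 0) {
> >         struct uffd_msg msg;
> >         read(uffd, &msg, sizeof(msg));  /* consume the WP event */
> >         printf("WP fired\n");
> >         wp.mode = 0;                    /* un-protect, wake the writer */
> >         ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
> >     } else {
> >         printf("WP did not fire\n");
> >     }
> >     pthread_join(t, NULL);
> >     return 0;
> > }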
> > 
> > 
> > And it makes sense: how should the fault handler know which ranges you
> > wp-ed if there is no place to store that information (the PTEs!)? The
> > VMA cannot tell that story; it only knows that someone registered
> > UFFD_WP to selectively wp some parts.
> > 
> > You might also have to register for MISSING faults and place zero pages.
> > 
> Looked at the kernel code and agree that we miss WP events for
> unpopulated pages; the UFFD_WP softbit won't be set in this case. But it
> doesn't make the saved snapshot inconsistent or introduce security
> issues. The only side effect is that we may save an updated page instead
> of a zeroed one, just increasing the snapshot size. However, this
> guest-physical page has never been touched from the point of view of the
> saved vCPU/device state, so it is not a concern.

Oh I just remembered one thing: Linux should be zeroing pages when
allocating them, so even if a page has stale content it'll be cleared by
__GFP_ZERO allocations.  So yeah, it would be harder to hit an issue, at
least with a sensible OS.  I'm not sure about Windows or others, but it
could be the common case.  Then the only overhead is the extra pages we
keep in the live snapshot, which take some more disk space.

Or there could be firmware running without an OS at all, but it really
should not read unallocated pages assuming they must be zero.  That's not
sane behavior even for firmware.

> 
> Often (at least on desktop Windows guests) only a small part of RAM has
> ever been allocated by the guest. Migration code needs to read each
> guest-physical page, so we'd have a lot of additional UFFD events, many
> more MISSING events than WP faults.
> 
> And the main problem is that adding a MISSING handler is impossible in
> the current single-threaded snapshot code: we'd get an immediate deadlock
> on the iterative page read, since the thread that faults is the same one
> that would have to service the fault.

Right.  We'd need to rework the design just to save some snapshot image
disk space.  So now I agree with you: let's keep this in mind, but maybe
it isn't worth fixing for now, at least until we find something really
broken.
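
If we ever do rework it, the fault-servicing loop could look roughly like
the sketch below (my rough sketch, not code from this series).  It assumes
a dedicated thread reading a blocking uffd registered with both
UFFDIO_REGISTER_MODE_MISSING and UFFDIO_REGISTER_MODE_WP, and it resolves
MISSING faults by installing an already write-protected zero page via
UFFDIO_COPY with UFFDIO_COPY_MODE_WP (UFFDIO_ZEROPAGE has no WP mode):

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <unistd.h>

static void handle_faults(int uffd, unsigned long psize)
{
    static char zeros[64 * 1024];       /* zeroed source; page size <= 64K */
    struct uffd_msg msg;

    while (read(uffd, &msg, sizeof(msg)) == sizeof(msg)) {
        if (msg.event != UFFD_EVENT_PAGEFAULT)
            continue;
        unsigned long addr = msg.arg.pagefault.address & ~(psize - 1);

        if (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) {
            /* WP fault: the snapshot code would save the old page
             * contents here, then un-protect and wake the faulter. */
            struct uffdio_writeprotect wp = {
                .range = { addr, psize },
                .mode  = 0,             /* clear WP, wake */
            };
            ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
        } else {
            /* MISSING fault: install a zero page that is already
             * write-protected, so a later write still raises WP. */
            struct uffdio_copy copy = {
                .dst  = addr,
                .src  = (unsigned long)zeros,
                .len  = psize,
                .mode = UFFDIO_COPY_MODE_WP,
            };
            ioctl(uffd, UFFDIO_COPY, &copy);
        }
    }
}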

Andrey, do you think we should still mention this issue in the todo list
of the live snapshot wiki page?

Thanks,

-- 
Peter Xu



Thread overview: 50+ messages
2021-01-21 15:24 [PATCH v13 0/5] UFFD write-tracking migration/snapshots andrey.gruzdev--- via
2021-01-21 15:24 ` [PATCH v13 1/5] migration: introduce 'background-snapshot' migration capability andrey.gruzdev--- via
2021-01-21 15:24 ` [PATCH v13 2/5] migration: introduce UFFD-WP low-level interface helpers andrey.gruzdev--- via
2021-01-21 15:24 ` [PATCH v13 3/5] migration: support UFFD write fault processing in ram_save_iterate() andrey.gruzdev--- via
2021-01-21 15:24 ` [PATCH v13 4/5] migration: implementation of background snapshot thread andrey.gruzdev--- via
2021-01-28 18:29   ` Dr. David Alan Gilbert
2021-01-29  8:17     ` Andrey Gruzdev
2021-01-21 15:24 ` [PATCH v13 5/5] migration: introduce 'userfaultfd-wrlat.py' script andrey.gruzdev--- via
2021-02-09 12:37 ` [PATCH v13 0/5] UFFD write-tracking migration/snapshots David Hildenbrand
2021-02-09 18:38   ` Andrey Gruzdev
2021-02-09 19:06     ` David Hildenbrand
2021-02-09 20:09       ` Peter Xu
2021-02-09 20:31         ` Peter Xu
2021-02-11  9:21           ` Andrey Gruzdev
2021-02-11 17:18             ` Peter Xu
2021-02-11 18:15               ` Andrey Gruzdev
2021-02-11 16:19       ` Andrey Gruzdev
2021-02-11 17:32         ` Peter Xu [this message]
2021-02-11 18:28           ` Andrey Gruzdev
2021-02-11 19:01             ` David Hildenbrand
2021-02-11 20:31               ` Peter Xu
2021-02-11 20:44                 ` David Hildenbrand
2021-02-11 21:05                   ` Peter Xu
2021-02-11 21:09                     ` David Hildenbrand
2021-02-12  3:06                       ` Peter Xu
2021-02-12  8:52                         ` David Hildenbrand
2021-02-12 16:11                           ` Peter Xu
2021-02-13  9:34                             ` Andrey Gruzdev
2021-02-13 10:30                               ` David Hildenbrand
2021-02-16 23:35                               ` Peter Xu
2021-02-17 10:31                                 ` David Hildenbrand
2021-02-19  6:57                                 ` Andrey Gruzdev
2021-02-19  7:45                                   ` David Hildenbrand
2021-02-19 20:50                                   ` Peter Xu
2021-02-19 21:10                                     ` Peter Xu
2021-02-19 21:14                                       ` David Hildenbrand
2021-02-19 21:20                                         ` David Hildenbrand
2021-02-19 22:47                                           ` Peter Xu
2021-02-20  7:59                                             ` David Hildenbrand
2021-02-22 17:29                                               ` Peter Xu
2021-02-22 17:33                                                 ` David Hildenbrand
2021-02-22 17:54                                                   ` Peter Xu
2021-02-22 18:11                                                     ` David Hildenbrand
2021-02-24 16:56                                                       ` Andrey Gruzdev
2021-02-24 17:01                                                         ` David Hildenbrand
2021-02-24 17:52                                                           ` Andrey Gruzdev
2021-02-24 16:43                                     ` Andrey Gruzdev
2021-02-24 16:54                                       ` David Hildenbrand
2021-02-24 17:00                                         ` Andrey Gruzdev
2021-02-11 19:21     ` David Hildenbrand
