Re: Migration memory corruption - PV backends need to quiesce

From: David Vrabel <david.vrabel@citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>, Tim Deegan <tim@xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	Xen-devel List <xen-devel@lists.xen.org>,
	Paul Durrant <Paul.Durrant@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: Migration memory corruption - PV backends need to quiesce
Date: Mon, 30 Jun 2014 12:57:16 +0100
Message-ID: <53B1509C.3070604@citrix.com>
In-Reply-To: <1404126466.14488.9.camel@kazak.uk.xensource.com>

On 30/06/14 12:07, Ian Campbell wrote:
> On Mon, 2014-06-30 at 12:52 +0200, Tim Deegan wrote:
>> At 12:14 +0200 on 30 Jun (1404126862), Tim Deegan wrote:
>>> At 10:47 +0100 on 30 Jun (1404121679), David Vrabel wrote:
>>>> Shared ring updates are strictly ordered with respect to the writes to
>>>> data pages (either via grant map or grant copy).  This means that if the
>>>> guest sees a response in the ring it is guaranteed that all writes to
>>>> the associated pages are also present.
>>>
>>> Is the ring update also strictly ordered wrt the grant unmap operation?
>>>
>>>> The write of the response and the write of the producer index are
>>>> strictly ordered.  If the backend is in the process of writing a
>>>> response and the page is saved then the partial (corrupt) response is
>>>> not visible to the guest.  The write of the producer index is atomic so
>>>> the saver cannot see a partial producer index write.
>>>
>>> Yes.  The (suggested) problem is that live migration does not preserve
>>> that write ordering.  So we have to worry about something like this:
>>>
>>> 1. Toolstack pauses the domain for the final pass.  Reads the final
>>>    LGD bitmap, which happens to include the shared ring but not the
>>>    data pages.
>>> 2. Backend writes the data.
>>> 3. Backend unmaps the data page, marking it dirty.
>>> 4. Backend writes the ring.
>>> 5. Toolstack sends the ring page across in the last pass.
>>> 6. Guest resumes, seeing the I/O marked as complete, but without the
>>>    data.
>>
>> It occurs to me that the guest should be able to defend against this
>> by taking a local copy of the response producer before migration and
>> using _that_ for the replay logic afterwards.  That is guaranteed to
>> exclude any I/O that completed after the VM was paused, and as long as
>> the unmap is guaranteed to happen before the ring update, we're OK.
> 
> AIUI blkfront at least maintains its own shadow copy of the ring at all
> times, and the recovery process doesn't use the migrated copy of the
> ring at all (at least not the responses). I might be misunderstanding
> the code there though.

Yes, this is what it looks like to me.  netfront is also safe since it
just discards everything and doesn't replay at all.

This does rather feel like we're discovering problems that were
identified (and fixed) a long time ago.

I think there needs to be better documentation of front/backend drivers
so people know how to write them correctly.  I'll try to carve out some
time for this if no one else volunteers...

Perhaps all we need is a clear statement along these lines: the contents
of the shared ring are undefined on resume.  A frontend driver must not
use the shared ring to replay any requests.
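
To make that rule concrete, here is a rough sketch of a frontend that
replays only from its own private shadow state.  It is a toy model, not
the blkfront code: every name in it (struct frontend, fe_submit,
fe_complete, fe_resume, RING_SIZE) is made up for illustration, request
ids are assumed to double as shadow slot numbers, and the memory
barriers and event channel notifications a real ring needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8 /* hypothetical ring size, for illustration only */

struct request  { uint64_t id; uint64_t sector; };
struct response { uint64_t id; int status; };

/* Shared with the backend: contents are treated as undefined on resume. */
struct shared_ring {
    uint32_t req_prod, rsp_prod;
    struct request  req[RING_SIZE];
    struct response rsp[RING_SIZE];
};

/* Frontend-private state: lives in guest memory the backend never writes. */
struct frontend {
    struct shared_ring *ring;
    uint32_t rsp_cons;
    struct request shadow[RING_SIZE]; /* private copy of each request */
    bool in_flight[RING_SIZE];        /* submitted, response not yet consumed */
};

/* Record the request privately, then publish it on the shared ring. */
void fe_submit(struct frontend *fe, const struct request *r)
{
    assert(r->id < RING_SIZE && !fe->in_flight[r->id]);
    fe->shadow[r->id] = *r;
    fe->in_flight[r->id] = true;
    fe->ring->req[fe->ring->req_prod % RING_SIZE] = *r;
    fe->ring->req_prod++; /* a real ring needs a write barrier before this */
}

/* Called only when the frontend itself has consumed a response. */
void fe_complete(struct frontend *fe, uint64_t id)
{
    assert(id < RING_SIZE && fe->in_flight[id]);
    fe->in_flight[id] = false;
    fe->rsp_cons++;
}

/*
 * Resume: wipe the ring without reading it, then requeue every request
 * the shadow still marks as in flight.  Returns the number replayed.
 */
size_t fe_resume(struct frontend *fe, struct shared_ring *fresh)
{
    size_t replayed = 0;

    memset(fresh, 0, sizeof(*fresh));
    fe->ring = fresh;
    fe->rsp_cons = 0;

    for (size_t slot = 0; slot < RING_SIZE; slot++) {
        if (!fe->in_flight[slot])
            continue;
        fresh->req[fresh->req_prod % RING_SIZE] = fe->shadow[slot];
        fresh->req_prod++;
        replayed++;
    }
    return replayed;
}
```

The point of the sketch is that fe_resume() never reads the old ring.  A
response the backend wrote after the domain was paused cannot cause a
request to be treated as complete, because completion is only ever
recorded by fe_complete(), in private state.  The cost is that such a
request is issued a second time, which assumes requests are safe to
repeat.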

>> (That still leaves the question that Andrew raised of memcpy()
>> breaking atomicity/ordering of updates.)
> 
> That's the memcpy in the migration code vs the definitely correctly
> ordered updates done by the b.e., right?

But if the frontend doesn't use the shared ring for replay, it doesn't
matter if this memcpy is correctly ordered or not?

David