From: George Dunlap <george.dunlap@eu.citrix.com>
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: "Liu, Jinsong" <jinsong.liu@intel.com>,
	"xen-devel@lists.xensource.com" <xen-devel@lists.xensource.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>,
	"JBeulich@suse.com" <JBeulich@suse.com>
Subject: Re: [PATCH V3] X86/vMCE: handle broken page with regard to migration
Date: Tue, 20 Nov 2012 15:08:46 +0000
Message-ID: <50AB9CFE.7000003@eu.citrix.com>
In-Reply-To: <1353344256.18229.129.camel@zakaz.uk.xensource.com>

On 19/11/12 16:57, Ian Campbell wrote:
> On Mon, 2012-11-19 at 15:29 +0000, George Dunlap wrote:
>> On 19/11/12 09:55, Ian Campbell wrote:
>>> If we get to this stage then haven't we either already sent something
>>> over the wire for this page or marked it as dirty when we tried and
>>> failed to send it?
>>>
>>> In the former case we don't care that the page is now broken on the
>>> source since the target has got a good pre-breakage copy.
>>>
>>> In the latter case could we not set a flag at the same time as we mark
>>> the page dirty which means "go round at least one more time"?
>> Yeah -- on the last iteration, the VM itself has to be paused; if any
>> pages get broken after that, it doesn't really matter, does it? The real
>> thing is to have a consistent "snapshot" of behavior.
>>
>> I guess the one potentially tricky case to worry about is whether to
>> deliver an MCE to the guest on restore.  Consider the following scenario:
>>
>> - Page A is modified (and marked dirty)
>> - VM paused for last iteration
>> - Page breaks, is marked broken in the p2m
>> - Save code sends page A
>>
>> In that case, the save code would send a "broken" page, and the restore
>> code would mark a page as broken, and we *would* want to deliver an MCE
>> on the far side.  But suppose the last two steps were reversed:
>>
>> - Page A modified
>> - VM paused for last iteration
>> - Save code sends page A
>> - Page breaks, marked broken in the p2m
>>
>> In that case, when the save code sends page A, it will send a good page;
>> there's no need to mark it broken, or to send the guest an MCE.
> I guess you'd want to err on the side of stopping using a good page, as
> opposed to continuing to use a bad page? i.e. it's better to take a
> spurious vMCE than to not take an actual one.

While that's true, taking a spurious MCE means at the very least one less
page available to the guest to use (for HVM guests that haven't
ballooned down, at least), and the unnecessary loss of the data in that
page.

The problem, I guess, is that the save code at the moment has no way of
distinguishing the following cases:
1. Marked broken after the last time I sent it, but before the VM was
paused; the page hasn't been written to
2. Marked broken after the VM was paused; the page hasn't been written to
3. Marked broken after the VM was paused, and the page has been written to

In case 1, we definitely need to send a broken page, but the VM may have
already received a vMCE.  In case 2, we don't need to send a broken page
or a vMCE, while in case 3 we need to do both.
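
To make the three cases concrete, here is a rough sketch in C of the
decision the save code would make if it *could* tell them apart.  None
of this exists in the tree: the two input flags, the enum, and
classify_broken_page() are all hypothetical names invented for
illustration, and the save code currently has no way to obtain the
inputs.

```c
#include <stdbool.h>

/* Hypothetical: how much the restore side needs to deliver a vMCE. */
enum vmce_need {
    VMCE_NOT_NEEDED,          /* target already holds a good copy */
    VMCE_MAYBE_ALREADY_SENT,  /* guest may have taken one before the pause */
    VMCE_NEEDED               /* data is lost; guest must be told */
};

/* Hypothetical: what to do with one page found broken at save time. */
struct page_action {
    bool send_broken;         /* send the page marked as "broken" */
    enum vmce_need vmce;
};

/*
 * broke_after_pause: page was marked broken after the VM was paused.
 * dirty_since_sent:  page was written to since it was last sent.
 *
 * The combination "broke before pause, dirty" is not one of the three
 * listed cases; treating it like case 1 is an assumption of this sketch.
 */
static struct page_action classify_broken_page(bool broke_after_pause,
                                               bool dirty_since_sent)
{
    struct page_action act;

    if (!broke_after_pause) {
        /* Case 1: broke while the guest was still running. */
        act.send_broken = true;
        act.vmce = VMCE_MAYBE_ALREADY_SENT;
    } else if (!dirty_since_sent) {
        /* Case 2: broke after pause, last copy sent is still good. */
        act.send_broken = false;
        act.vmce = VMCE_NOT_NEEDED;
    } else {
        /* Case 3: broke after pause, and the newest data never got out. */
        act.send_broken = true;
        act.vmce = VMCE_NEEDED;
    }
    return act;
}
```

The point of writing it out is only to show that the decision needs two
pieces of information per page, neither of which is tracked right now.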

On the other hand, the whole situation is hopefully rare enough that 
maybe we can just do the simple correct thing, even if it's a tiny bit 
sub-optimal.  In that case, assuming that spurious vMCEs aren't a 
problem (e.g., #1), I think we basically just need to see if the last 
iteration contains a broken page, and if so, send the guest a vMCE on 
resume.
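
Something along these lines is what I have in mind for the simple
version; it is a sketch under assumptions, not a patch.  The mask and
"broken" type values below are stand-ins defined locally for the
example rather than taken from the Xen headers, and both function names
are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in encodings for this example only (assumed, not Xen's). */
#define PFINFO_TYPE_MASK 0xf0000000u
#define PFINFO_BROKEN    0xd0000000u

/* Return true if any page type in the batch is marked broken. */
static bool batch_has_broken_page(const uint32_t *pfn_type, size_t count)
{
    assert(pfn_type != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        if ((pfn_type[i] & PFINFO_TYPE_MASK) == PFINFO_BROKEN)
            return true;
    return false;
}

/*
 * Only the last iteration matters: earlier iterations get another pass,
 * so a broken page seen there does not by itself require a vMCE on resume.
 */
static bool need_vmce_on_resume(bool is_last_iter,
                                const uint32_t *pfn_type, size_t count)
{
    return is_last_iter && batch_has_broken_page(pfn_type, count);
}
```

This deliberately over-reports: it will ask for a vMCE in case 1 as
well, which is the spurious-but-safe behaviour described above.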

Thoughts?

> I'm not actually sure what a guest does with a vMCE, I guess it does
> some sort of memory exchange to give the bad page back to the h/v and
> get a good page in return? If the hypervisor thinks the old page is ok
> rather than bad I guess it'll just put it in the free list instead of
> the bad list?

Yes, I'm pretty sure the hypervisor's accounting of broken pages is
separate from guest p2m entries; I think if you mark a p2m entry broken,
the hypervisor will just free the RAM page that was mapped there before.

I think the guest just tries to recover gracefully when it gets a vMCE 
(e.g., by re-reading the page from disk or killing the process).  I 
don't think it asks the hypervisor for another page to replace it at 
this point.

  -George