* Stability of migration?
@ 2006-06-14  0:32 John Byrne
  2006-06-14  1:12 ` John Byrne
  2006-06-14  7:24 ` Keir Fraser
  0 siblings, 2 replies; 3+ messages in thread
From: John Byrne @ 2006-06-14  0:32 UTC
  To: xen-devel


Hi,

With xen-unstable changeset 10333:360f9dc71f51, live migration is not 
reliable. Migrating an active domain (I use a kernel build in my test) 
back and forth between two machines will result in the build or the 
domain crashing. I tweaked xc_linux_save.c to enable the verify pass 
without outputting all the debugging messages and I can see that one or 
two pages do not get a data match in the log.

I have yet to see the domain fail with non-live migration, but I 
sometimes see a data mismatch on a page during verification. That 
would indicate that either suspend doesn't mean what I think it does, or 
pages of a suspended VM are being altered when they shouldn't be.

So, I guess I'll start with the easy question: should non-live migration 
ever have a page fail to verify? If not, how can I identify the source 
of the problem?

The harder question: how to identify the source of the corruption in 
live migration?

Thanks,

John Byrne


* Re: Stability of migration?
  2006-06-14  0:32 Stability of migration? John Byrne
@ 2006-06-14  1:12 ` John Byrne
  2006-06-14  7:24 ` Keir Fraser
  1 sibling, 0 replies; 3+ messages in thread
From: John Byrne @ 2006-06-14  1:12 UTC
  To: xen-devel

I should have made clear I am testing on x86_64.

John


* Re: Stability of migration?
  2006-06-14  0:32 Stability of migration? John Byrne
  2006-06-14  1:12 ` John Byrne
@ 2006-06-14  7:24 ` Keir Fraser
  1 sibling, 0 replies; 3+ messages in thread
From: Keir Fraser @ 2006-06-14  7:24 UTC
  To: John Byrne; +Cc: xen-devel


On 14 Jun 2006, at 01:32, John Byrne wrote:

> So, I guess I'll start with the easy question: should non-live 
> migration ever have a page fail to verify? If not, how can I identify 
> the source of the problem?

They are probably pages shared with backend drivers (xenstore, blkback, 
netback, etc.). Since domain teardown is asynchronous, those backend 
drivers may still have those pages mapped and be able to update them 
while the save is in progress. It's harmless (but of course false 
positives on the verify test are rather annoying!).

  -- Keir
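
To make the distinction above concrete, here is a minimal sketch of a verify pass that separates expected mismatches (pages a backend may still be writing to) from unexpected ones. It is not the code in xc_linux_save.c. The names verify_pages, struct verify_stats and SKETCH_PAGE_SIZE are hypothetical, and the shared[] array is an assumption: the thread does not say how such pages would be identified, so the caller is assumed to supply that information.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size for this sketch (4 KiB, as on x86_64). */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical counters for one verify pass. */
struct verify_stats {
    size_t matched;         /* pages identical in both copies */
    size_t expected_diff;   /* mismatches on pages flagged as backend-shared */
    size_t unexpected_diff; /* mismatches on any other page */
};

/*
 * Compare npages pages of the saved image against current guest memory.
 * shared[i] is an assumption of this sketch: it marks pages the caller
 * believes a backend driver may still have mapped during the save.
 */
static struct verify_stats verify_pages(const unsigned char *saved,
                                        const unsigned char *current,
                                        const bool *shared, size_t npages)
{
    struct verify_stats st = {0, 0, 0};

    for (size_t i = 0; i < npages; i++) {
        const unsigned char *a = saved + i * SKETCH_PAGE_SIZE;
        const unsigned char *b = current + i * SKETCH_PAGE_SIZE;

        if (memcmp(a, b, SKETCH_PAGE_SIZE) == 0)
            st.matched++;
        else if (shared[i])
            st.expected_diff++;   /* likely a false positive */
        else
            st.unexpected_diff++; /* worth investigating */
    }
    return st;
}
```

With a split like this, only a non-zero unexpected_diff count would point at real corruption; mismatches on backend-shared pages would be counted but treated as the harmless case described above.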
