* Stability of migration?
@ 2006-06-14 0:32 John Byrne
2006-06-14 1:12 ` John Byrne
2006-06-14 7:24 ` Keir Fraser
0 siblings, 2 replies; 3+ messages in thread
From: John Byrne @ 2006-06-14 0:32 UTC
To: xen-devel
Hi,
With xen-unstable changeset 10333:360f9dc71f51, live migration is not
reliable. Migrating an active domain (I use a kernel build in my test)
back and forth between two machines will result in the build or the
domain crashing. I tweaked xc_linux_save.c to enable the verify pass
without outputting all the debugging messages and I can see that one or
two pages do not get a data match in the log.
I have yet to see a failure of the domain with non-live migration, but I
sometimes see a data mismatch on a page during the verification, which
would indicate that either suspend doesn't mean what I think it does or
pages of a suspended VM are being altered when they shouldn't be.
So, I guess I'll start with the easy question: should non-live migration
ever have a page fail to verify? If not, how can I identify the source
of the problem?
The harder question: how to identify the source of the corruption in
live migration?
Thanks,
John Byrne
* Re: Stability of migration?
From: John Byrne @ 2006-06-14 1:12 UTC
To: xen-devel
I should have made clear I am testing on x86_64.
John
John Byrne wrote:
>
> Hi,
>
> With xen-unstable changeset 10333:360f9dc71f51, live migration is not
> reliable. Migrating an active domain (I use a kernel build in my test)
> back and forth between two machines will result in the build or the
> domain crashing. I tweaked xc_linux_save.c to enable the verify pass
> without outputting all the debugging messages and I can see that one or
> two pages do not get a data match in the log.
>
> I have yet to see a failure of the domain with non-live migration, but I
> sometimes see a data mismatch on a page during the verification. Which
> would indicate that either suspend doesn't mean what I think it does or
> pages of a suspended VM are being altered when they shouldn't be.
>
> So, I guess I'll start with the easy question: should non-live migration
> ever have a page fail to verify? If not, how can I identify the source
> of the problem?
>
> The harder question: how to identify the source of the corruption in
> live migration?
>
> Thanks,
>
> John Byrne
>
* Re: Stability of migration?
From: Keir Fraser @ 2006-06-14 7:24 UTC
To: John Byrne; +Cc: xen-devel
On 14 Jun 2006, at 01:32, John Byrne wrote:
> So, I guess I'll start with the easy question: should non-live
> migration ever have a page fail to verify? If not, how can I identify
> the source of the problem?
They are probably pages shared with backend drivers (xenstore, blkback,
netback, etc.). Since domain teardown is asynchronous, those backend
drivers may still have those pages mapped and be able to update them
while the save is in progress. It's harmless (but of course false
positives on the verify test are rather annoying!).
-- Keir
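
[Editorial sketch, not part of the original message.] The verify pass
discussed in this thread amounts to comparing two copies of each guest
page and logging the ones that differ. A minimal illustration of that
idea, with an optional bitmap for pages known to be shared with backend
drivers so that they are not counted as failures, might look like the
following. Everything here is an assumption made for illustration: the
names sketch_verify_pages, sketch_test_bit and SKETCH_PAGE_SIZE are
hypothetical, this is not the code in xc_linux_save.c, and how the
shared-page bitmap would be populated is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages, as on x86_64 */

/* Nonzero if bit `pfn` is set in the bitmap; a NULL bitmap has no bits set. */
static int sketch_test_bit(const uint8_t *bitmap, size_t pfn)
{
    return bitmap != NULL && ((bitmap[pfn / 8] >> (pfn % 8)) & 1u);
}

/*
 * Compare two snapshots of the same memory, page by page.
 *
 * first, second  - buffers of nr_pages * SKETCH_PAGE_SIZE bytes each
 * shared_bitmap  - hypothetical bitmap of pages that backends may still
 *                  write to; mismatches on these pages are skipped
 *                  (may be NULL to check every page)
 * bad_pfns       - receives up to max_bad mismatching page numbers
 *                  (may be NULL if only the count is wanted)
 *
 * Returns the total number of unexpected mismatches, which can exceed
 * max_bad.
 */
static size_t sketch_verify_pages(const uint8_t *first, const uint8_t *second,
                                  size_t nr_pages, const uint8_t *shared_bitmap,
                                  size_t *bad_pfns, size_t max_bad)
{
    size_t nr_bad = 0;

    assert(first != NULL && second != NULL);

    for (size_t pfn = 0; pfn < nr_pages; pfn++) {
        const uint8_t *a = first + pfn * SKETCH_PAGE_SIZE;
        const uint8_t *b = second + pfn * SKETCH_PAGE_SIZE;

        if (memcmp(a, b, SKETCH_PAGE_SIZE) == 0)
            continue;               /* page contents match */
        if (sketch_test_bit(shared_bitmap, pfn))
            continue;               /* expected: a backend may still write here */

        if (bad_pfns != NULL && nr_bad < max_bad)
            bad_pfns[nr_bad] = pfn; /* record the page for later inspection */
        nr_bad++;
    }
    return nr_bad;
}
```

With a filter of this kind, any mismatch that remains on a suspended
domain would be on a page that is not expected to change, which makes it
a more useful starting point for tracking down real corruption.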