* Domain save/migrate issue
@ 2006-02-14 13:45 Noam Taich
0 siblings, 0 replies; 13+ messages in thread
From: Noam Taich @ 2006-02-14 13:45 UTC (permalink / raw)
To: xen-devel
As it stands now, when a domain is saved to disk it is suspended in the
process, and then destroyed. This is done by the checkpointing code in
Xend.
However, it would be a good idea to let the user choose whether the
saved domain is kept alive or suspended, since this feature may well be
used for real checkpointing of a working domain's state, simply so that
not everything is lost should something go wrong.
If you have a guest that does heavy calculations, or one that handles
many customers' connections, this may be a good idea even in migration
(which calls the checkpointing code anyway), although in the latter
case the network interfaces of the new domain will of course be taken
down, and it may be used as a hot spare ready for a quick hook-up...
So what do you think?
I'm thinking of either using pause/unpause instead of suspend and
manually saving the info the suspend record would have contained, or
recreating the vcpu after HYPERVISOR_suspend() is done with it,
remapping the necessary fns.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 13+ messages in thread
* Domain save/migrate issue
@ 2006-02-14 14:08 Noam Taich
2006-02-14 14:41 ` Daniel Veillard
2006-02-15 11:28 ` Jacob Gorm Hansen
0 siblings, 2 replies; 13+ messages in thread
From: Noam Taich @ 2006-02-14 14:08 UTC (permalink / raw)
To: xen-devel
As it stands now, when a domain is saved to disk it is suspended in the
process, and then destroyed. This is done by the checkpointing code in
Xend.
However, it would be a good idea to let the user choose whether the
saved domain is kept alive or suspended, since this feature may well be
used for real checkpointing of a working domain's state, simply so that
not everything is lost should something go wrong.
If you have a guest that does heavy calculations, or one that handles
many customers' connections, this may be a good idea even in migration
(which calls the checkpointing code anyway), although in the latter
case the network interfaces of the new domain will of course be taken
down, and it may be used as a hot spare ready for a quick hook-up...
So what do you think?
I'm thinking of either using pause/unpause instead of suspend and
manually saving the info the suspend record would have contained, or
recreating the vcpu after HYPERVISOR_suspend() is done with it,
remapping the necessary fns.
A nice addition:
As a final addition to these capabilities (though only a nice one, not
an important one), I believe it would also be a good idea to add a
utility that reads a state file (as written to the fd in xc_linux_save)
created during live migration and outputs a "fixed" file in which the
earlier copies of pages that were resent are discarded.
This would allow the aforementioned domain save option to be live as
well, without wasting too much space, which would enable live local
checkpointing.
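The "fixed file" idea can be sketched as a single pass that remembers only the most recent copy of each page. This is a toy model and not the real xc_linux_save stream format: it assumes the state is a plain sequence of (pfn, page contents) records, and the name compact_records and the record layout are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (page frame number, page contents).
Record = Tuple[int, bytes]


def compact_records(records: Iterable[Record]) -> List[Record]:
    """Keep only the last copy written for each page frame number.

    Live migration resends pages that were dirtied between rounds, so a
    page may appear several times; only the final copy matters on restore.
    """
    latest = {}
    for pfn, data in records:
        # A later record for the same pfn supersedes the earlier one.
        latest[pfn] = data
    # Emit in pfn order so the output is deterministic.
    return sorted(latest.items())


stream = [(0, b"a0"), (1, b"b0"), (0, b"a1"), (2, b"c0"), (0, b"a2")]
compacted = compact_records(stream)
```

In this sketch page 0 was sent three times, and only its last copy survives in the compacted output.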
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-14 14:08 Noam Taich
@ 2006-02-14 14:41 ` Daniel Veillard
2006-02-14 14:44 ` Steven Hand
2006-02-15 11:28 ` Jacob Gorm Hansen
1 sibling, 1 reply; 13+ messages in thread
From: Daniel Veillard @ 2006-02-14 14:41 UTC (permalink / raw)
To: Noam Taich; +Cc: xen-devel
On Tue, Feb 14, 2006 at 06:08:06AM -0800, Noam Taich wrote:
> As it stands now, when a domain is saved to disk it is suspended in the
> process, and then destroyed. This is done by the checkpointing code in
> Xend.
> However, it's a good idea to leave the option open to the user whether
> or not the saved domain should be kept alive or be suspended,
> as it is very possible this feature may be used in real checkpointing of
> a working domain state, just in order to not lose everything should
> something go wrong.
>
> if you have a guest which does some heavy calculations, or one which
> handles many customer's connection this may be a good idea even in
> migration (which calls the checkpointing code anyway.), although in the
> latter case, of course the network interfaces of the new domain will be
> taken down, and it may be used as a hot spare ready for a quick
> hook-up...
>
> So what do you think?
I tend to agree; it's about having orthogonal APIs. I think you can build
the current behaviour from a sequence of the simpler save and a destroy
(though it would not be atomic anymore). Is there any strong reason a saved
domain must not be left running?
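The composition described above can be sketched as follows. This is only an illustration with made-up names (ToyDomain, save, destroy, save_and_destroy); none of them are real xend or libxc calls.

```python
class ToyDomain:
    """Stand-in for a guest domain; not a real Xen or xend object."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = dict(memory)
        self.alive = True


def save(domain):
    """Return a copy of the domain state and leave the domain running."""
    return {"name": domain.name, "memory": dict(domain.memory)}


def destroy(domain):
    """Tear the domain down."""
    domain.alive = False


def save_and_destroy(domain):
    """Today's behaviour, rebuilt from the two simpler steps.

    Not atomic: the guest could change state between save() and destroy().
    """
    image = save(domain)
    destroy(domain)
    return image


kept = ToyDomain("kept", {0: "x"})
kept_image = save(kept)            # orthogonal save: the domain stays alive

gone = ToyDomain("gone", {0: "y"})
gone_image = save_and_destroy(gone)  # current behaviour as a composition
```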
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-14 14:41 ` Daniel Veillard
@ 2006-02-14 14:44 ` Steven Hand
2006-02-14 14:55 ` Anthony Liguori
2006-02-14 14:57 ` Daniel Veillard
0 siblings, 2 replies; 13+ messages in thread
From: Steven Hand @ 2006-02-14 14:44 UTC (permalink / raw)
To: veillard; +Cc: Noam Taich, xen-devel
> On Tue, Feb 14, 2006 at 06:08:06AM -0800, Noam Taich wrote:
> > As it stands now, when a domain is saved to disk it is suspended in the
> > process, and then destroyed. This is done by the checkpointing code in
> > Xend.
> > However, it's a good idea to leave the option open to the user whether
> > or not the saved domain should be kept alive or be suspended,
> > as it is very possible this feature may be used in real checkpointing of
> > a working domain state, just in order to not lose everything should
> > something go wrong.
> >
> > if you have a guest which does some heavy calculations, or one which
> > handles many customer's connection this may be a good idea even in
> > migration (which calls the checkpointing code anyway.), although in the
> > latter case, of course the network interfaces of the new domain will be
> > taken down, and it may be used as a hot spare ready for a quick
> > hook-up...
> >
> > So what do you think?
>
> I tend to agree, it's about having orthogonal APIs, I think you can build
> the current behaviour by a sequence of the simpler save and a destroy (though
> it would not be atomic anymore). Is there any strong reason a saved domain
> must not be left running ?
Unless you also have some way to simultaneously snapshot the file
system, it is not safe to allow the guest to continue and then later
resume the checkpointed version.
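A toy model of that hazard, under the assumption that the checkpoint covers memory only and the disk keeps changing afterwards. ToyGuest and its fields are invented for illustration and do not correspond to any Xen structure.

```python
import copy


class ToyGuest:
    """Toy guest: 'memory' records how many log entries it believes are on disk."""

    def __init__(self):
        self.memory = {"entries_written": 0}
        self.disk = []  # storage that is NOT part of the memory image

    def append_entry(self, text):
        self.disk.append(text)
        self.memory["entries_written"] += 1

    def is_consistent(self):
        return self.memory["entries_written"] == len(self.disk)


guest = ToyGuest()
guest.append_entry("first")

checkpoint = copy.deepcopy(guest.memory)  # memory-only checkpoint
guest.append_entry("second")              # guest keeps running and writes to disk

guest.memory = checkpoint                 # later: resume the checkpointed memory
stale = not guest.is_consistent()         # memory believes 1 entry, disk holds 2
```

Snapshotting the disk together with the memory image would keep the two views in step, which is the condition the message above asks for.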
S.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-14 14:44 ` Steven Hand
@ 2006-02-14 14:55 ` Anthony Liguori
2006-02-14 14:57 ` Daniel Veillard
1 sibling, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2006-02-14 14:55 UTC (permalink / raw)
To: Steven Hand; +Cc: Noam Taich, xen-devel, veillard
Steven Hand wrote:
>> I tend to agree, it's about having orthogonal APIs, I think you can build
>> the current behaviour by a sequence of the simpler save and a destroy (though
>> it would not be atomic anymore). Is there any strong reason a saved domain
>> must not be left running ?
>>
>
> Unless you also have some way to simulataneously snapshot the file
> system, it is not safe to allow the guest to continue and then later
> resume the checkpointed version.
>
There are storage devices that provide this capability. Also, Dan Smith
is working on a COW device for Xen that could be used to checkpoint a
storage device.
However, there's a fair bit of work that would be needed to allow for
light-weight checkpointing. A domain has to be suspended for the
checkpoint to finish (although presumably one could use an approach
similar to live migration to get most of the way there). Today, in Xen,
there is no way to get out of a suspended state.
If a domain could leave the suspended state, it would make checkpointing
pretty cheap. Also, presumably, it would simplify rebooting because
instead of having to recreate a domain on reboot, the hypervisor could
just reinit it. Of course, it would need some way of knowing how to
build the domain...
Regards,
Anthony Liguori
> S.
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-14 14:44 ` Steven Hand
2006-02-14 14:55 ` Anthony Liguori
@ 2006-02-14 14:57 ` Daniel Veillard
1 sibling, 0 replies; 13+ messages in thread
From: Daniel Veillard @ 2006-02-14 14:57 UTC (permalink / raw)
To: Steven Hand; +Cc: Noam Taich, xen-devel
On Tue, Feb 14, 2006 at 02:44:29PM +0000, Steven Hand wrote:
>
> > On Tue, Feb 14, 2006 at 06:08:06AM -0800, Noam Taich wrote:
> > it would not be atomic anymore). Is there any strong reason a saved domain
> > must not be left running ?
>
> Unless you also have some way to simulataneously snapshot the file
> system, it is not safe to allow the guest to continue and then later
> resume the checkpointed version.
The problem already exists if you resume a guest twice from an image.
Restraining the API doesn't fix the problem, it just limits the probability
of hitting it by mistake. I'm not sure there is any way we can guarantee
100% safe operations in all cases; most FSes don't have snapshotting
capabilities anyway.
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Domain save/migrate issue
@ 2006-02-15 8:38 Noam Taich
2006-02-15 14:16 ` Anthony Liguori
0 siblings, 1 reply; 13+ messages in thread
From: Noam Taich @ 2006-02-15 8:38 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel
> there's a fair bit of work that would be needed to allow for
> light-weight checkpointing. A domain has to be suspended for the
> checkpoint to finish (although presumably one could use a similar as
> live migration to get most of the way there).
> Today, in Xen, there is no way to get out of a suspended state.
> If a domain could leave the suspended state, it would make
> checkpointing pretty cheap. Also, presumably, it would simplify
> rebooting because instead of having to recreate a domain on reboot,
> the hypervisor could just reinit it. Of course, it would need some way
> of knowing how to build the domain...
>
> Regards,
> Anthony Liguori
> S.
>
My suggestion is NOT to bring a domain out of suspension. What I am
suggesting is bypassing the problem.
Here's the current idea:
We need the memory image of the domain to be static, so we can't allow
the domain to run. So, the first idea is to use pause/unpause instead of
suspend.
Now for the next (serious) problem:
This seems to work fine (in live or non-live settings) until the
xc_linux_save() function reaches the part where it checks the frame
number of the suspend record, which makes sense, because now we have NO
suspend record. So, the second idea is to (simply?) write all that info
ourselves to the io_fd that the function receives. Just canonicalize the
fns that suspend does, and write the appropriate info.
The restore function does not have to change at all... it sees the same
input.
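A minimal sketch of the canonicalization step, assuming that "canonicalize" here means translating machine frame numbers (mfns) into the guest's pseudo-physical frame numbers (pfns) through a machine-to-physical table. The function name, the dictionary-based table, and the sample values are hypothetical; the real save code works on Xen's own tables.

```python
def canonicalize(mfns, m2p):
    """Translate machine frame numbers into pseudo-physical frame numbers.

    m2p is a toy machine-to-physical mapping; a frame that is not in the
    table does not belong to the domain, so it is reported as an error.
    """
    pfns = []
    for mfn in mfns:
        if mfn not in m2p:
            raise ValueError(f"mfn {mfn:#x} does not belong to this domain")
        pfns.append(m2p[mfn])
    return pfns


m2p_table = {0x1A00: 0, 0x1A07: 1, 0x2B10: 2}
record_frames = canonicalize([0x2B10, 0x1A00], m2p_table)
```

Because the output refers only to guest-relative frame numbers, a restore on a different machine can map them back to whatever machine frames it allocates.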
So, what do you think, is this a good idea? Is it even possible? Will it
entail a lot of work?
One of my concerns is the shared pages.
Can Xen write to them while the guest is "only" paused? And if so, what
can it (practically) write there while the guest is paused?
Even if it CAN, is it reasonable to expect that it usually won't?
I'm not really troubled by the storage issues. This feature would be
useful in many cases even with no solution to that problem.
Sorry for the multiple messages on the original subject. It was an
unfortunate misunderstanding.
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Domain save/migrate issue
@ 2006-02-15 9:32 Noam Taich
0 siblings, 0 replies; 13+ messages in thread
From: Noam Taich @ 2006-02-15 9:32 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel
My mention of the shared pages has brought up something else:
how LIVE can live migration become?
What I mean is this:
Right before the last iteration of the main while loop in
xc_linux_save(), the domain is suspended. After the loop, the shared
info pages are written.
So interrupts would be ignored, as Xen won't send them to a shut-down
domain.
This seems obvious, but my point is that, should there be a way to get a
domain out of the suspended state, another improvement can be made:
Xen can be made to write to the event channels of suspended domains (up
to a certain limit, and that capability can be turned on or off to
finish things off...).
Wouldn't that mean far fewer interrupts are lost?
The migration would be even more LIVE.
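The "up to a certain limit, and can be turned on or off" part can be sketched as a bounded buffer. This is a toy illustration only: PendingEvents, its limit, and the port numbers are invented, and it says nothing about how Xen event channels are actually implemented.

```python
from collections import deque


class PendingEvents:
    """Toy buffer for events that arrive while a domain is suspended."""

    def __init__(self, limit, enabled=True):
        self.limit = limit
        self.enabled = enabled
        self.queue = deque()
        self.dropped = 0

    def deliver(self, port):
        """Queue an event if buffering is on and there is room, else drop it."""
        if self.enabled and len(self.queue) < self.limit:
            self.queue.append(port)
            return True
        self.dropped += 1
        return False

    def drain(self):
        """Hand the buffered events to the resumed domain, oldest first."""
        events = list(self.queue)
        self.queue.clear()
        return events


pending = PendingEvents(limit=2)
results = [pending.deliver(port) for port in (3, 5, 7)]
replayed = pending.drain()
```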
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-14 14:08 Noam Taich
2006-02-14 14:41 ` Daniel Veillard
@ 2006-02-15 11:28 ` Jacob Gorm Hansen
1 sibling, 0 replies; 13+ messages in thread
From: Jacob Gorm Hansen @ 2006-02-15 11:28 UTC (permalink / raw)
To: Noam Taich; +Cc: xen-devel
On 2/14/06, Noam Taich <noam.taich@qumranet.com> wrote:
> This would allow the aforementioned domain save option to be LIVE also, and
> without wasting too much space,
>
> Which would enable live local checkpointing.
Hi,
with my self-migration patch, I already have this functionality. In
the domain, I open a block device with O_DIRECT, and then a user-space
process takes care of the live checkpointing to disk. A small
bootloader is prepended to the checkpoint, so that it can revive
itself.
You can see the source of such a user-space process here:
http://www.distlab.dk/hg/index.cgi/xen-gfx.hg?cmd=file;filenode=bed84d57fa6fe224eede24cd5ace69df4234a8b1;file=tools/migrate/minimig.c
Jacob
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-15 8:38 Noam Taich
@ 2006-02-15 14:16 ` Anthony Liguori
0 siblings, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2006-02-15 14:16 UTC (permalink / raw)
To: Noam Taich; +Cc: xen-devel
Noam Taich wrote:
> We need the memory image of the domain to be static, So we can't allow
> the domain to run. So, the first idea is to use pause/unpause instead of
>
Suspend does more than just canonicalize the p2m; it also provides
callbacks for all of the devices so that they can canonicalize their own
page references and set themselves up to reinitialize upon resume.
While the save code can access the p2m table, it has no way of knowing
the device information, so just pausing isn't really an option (also, you
could do bad things like checkpoint before a storage operation was
committed, or something like that).
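The role of the device callbacks can be pictured with a small sketch. Everything here is hypothetical (ToyDevice, on_suspend, the dictionary-based m2p table); it only illustrates that each device rewrites its own page references and flags itself for reinitialization, which an outside save process could not do on its behalf.

```python
class ToyDevice:
    """Toy frontend device; not a real Xen driver."""

    def __init__(self, name, page_refs):
        self.name = name
        self.page_refs = list(page_refs)  # machine frames the device refers to
        self.needs_reinit = False

    def on_suspend(self, m2p):
        """Suspend callback: canonicalize page references, flag for reinit."""
        self.page_refs = [m2p[mfn] for mfn in self.page_refs]
        self.needs_reinit = True


def suspend_devices(devices, m2p):
    """Run every device's suspend callback, as a guest-side suspend would."""
    for device in devices:
        device.on_suspend(m2p)


m2p_table = {0x500: 10, 0x501: 11}
net = ToyDevice("vif0", [0x501])
blk = ToyDevice("vbd0", [0x500, 0x501])
suspend_devices([net, blk], m2p_table)
```

A domain that is only paused never runs these callbacks, so in this model its devices would still hold machine frame numbers in page_refs.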
Regards,
Anthony Liguori
> suspend.
>
> Now for the next (serious) problem:
> This seems to work fine (in live or non live settings) until the
> xc_linux_save() function reaches the part where it checks the frame
> number
> Of the suspend record, which makes sense, because now, we have NO
> suspend record. So, the second idea is to (simply?) write all that info
> on the io_fd
> The function gets ourselves. Just canonicalize the fns that suspend
> does,
> And write the appropriate info.
>
> The restore function does not have to change at all... it sees the same
> input.
>
> So, what do you think, is this a good idea? Even possible? Will it
> entail a lot?
>
> One of my concerns is this: the shared pages.
> Can Xen write to them while the guest is "only" paused? And if so, what
> Can it (practically) write there while the guest is paused?
> Even if it CAN, is it Reasonable to expect it won't do that usually?
>
>
> I'm not really troubled by the storage issues. This feature would be
> useful in many cases even with no solution to that problem.
>
> Sorry for the multiple messages on the original subject. It was an
> unfortunate misunderstanding.
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Domain save/migrate issue
@ 2006-02-15 16:22 Noam Taich
2006-02-15 16:26 ` Anthony Liguori
0 siblings, 1 reply; 13+ messages in thread
From: Noam Taich @ 2006-02-15 16:22 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel
> Suspend does more than just canonicalize the p2m, it also provides
> callbacks for all of the devices so that they can canonicalize their own
> page references and set them self up to reinitialize upon resume.
> While the save code can access the p2m table, it has no way of knowing
> the device information so just pausing isn't really an option (also, you
> could do bad things like checkpoint before a storage operation was
> committed or something like that).
>
> Regards,
> Anthony Liguori
OK. So pause ALONE is not possible. But I didn't mean I'd do only a
pause:
Suspend writes all the info you just mentioned, and the save code just
writes out the info that suspend left for it.
So, let's tackle the problem in Xen and/or the host.
Suppose a change were made to the suspend code: say, a special mode,
"pseudoSuspend", were added, in which the function writes all that same
info into the very same place it wrote to before (or any other cozy spot
we can access from the save function), but does NOT actually suspend the
domain. (Ideally, it would suspend nothing. But, if necessary, it can
resume everything instead of calling the actual shutdown code.)
The save function doesn't need to know anything now that it didn't know
before...
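The proposed mode can be sketched roughly as below. This is a toy model of the control flow only: ToyDriver, do_suspend, and the pseudo flag are made-up names, and the real suspend path in the guest kernel is considerably more involved.

```python
class ToyDriver:
    """Toy driver with suspend/resume hooks; not a real Xen driver."""

    def __init__(self, name):
        self.name = name
        self.active = True

    def suspend(self):
        self.active = False
        return {"driver": self.name, "state": "quiesced"}

    def resume(self):
        self.active = True


def do_suspend(drivers, pseudo=False):
    """Collect the suspend information, then either shut down or carry on.

    With pseudo=True the same record is produced, but the drivers are
    resumed instead of the domain being shut down.
    """
    record = [driver.suspend() for driver in drivers]
    if pseudo:
        for driver in drivers:
            driver.resume()
        return record, "running"
    return record, "suspended"


drivers = [ToyDriver("vif0"), ToyDriver("vbd0")]
record, state = do_suspend(drivers, pseudo=True)
```

In both modes the caller receives the same record, which matches the point that the save function would not need to know anything new.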
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Domain save/migrate issue
2006-02-15 16:22 Domain save/migrate issue Noam Taich
@ 2006-02-15 16:26 ` Anthony Liguori
0 siblings, 0 replies; 13+ messages in thread
From: Anthony Liguori @ 2006-02-15 16:26 UTC (permalink / raw)
To: Noam Taich; +Cc: xen-devel
Noam Taich wrote:
> Ok. So pause ALONE is not possible. But I didn't mean I'd do only a
> pause:
>
> Suspend writes all the info you just mentioned. And the save code just
> writes the info that suspend left for it.
> So, lets tackle the problem in xen and/or the host.
>
> If a change was entered into the suspend code, say, a special mode,
> "pseudoSuspend" was added, in which the function writes all that same
> info
>
Why exactly would this be better than just making domains unsuspendable?
All it would take is some hypervisor plumbing...
Regards,
Anthony Liguori
> Into that very same place it wrote to before (or any other cozy spot we
> can access from the save function), but does NOT actually suspend the
> domain
> (Ideally, it would suspend nothing. But, if necessary, it can resume
> everything instead of calling the actual shutdown code).
>
> The save function doesn't need to know anything now that it didn't know
> before...
>
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: Domain save/migrate issue
@ 2006-02-15 16:42 Noam Taich
0 siblings, 0 replies; 13+ messages in thread
From: Noam Taich @ 2006-02-15 16:42 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel
> Noam Taich wrote:
> > Ok. So pause ALONE is not possible. But I didn't mean I'd do only a
> > pause:
> >
> > Suspend writes all the info you just mentioned. And the save code just
> > writes the info that suspend left for it.
> > So, lets tackle the problem in xen and/or the host.
> >
> > If a change was entered into the suspend code, say, a special mode,
> > "pseudoSuspend" was added, in which the function writes all that same
> > info
> >
> Why exactly would this be better than just making domain's
> unsuspendable?
> All it would take is some hypervisor plumbing...
>
> Regards,
> Anthony Liguori
Better? I don't know. I reserve judgment as to whether or not domains
should be made unsuspendable; I am just offering to add an option.
If there is no other need for domain suspension, it can replace the old
suspend.
About the hypervisor plumbing issue, isn't it just a matter of going
over all the smaller suspend functions that the main suspend function
calls and making the changes there? That is, making sure nothing else
gets suspended, and changing the code that goes over the drivers and
handles the callbacks.
So far, I haven't seen anything that would indicate it would entail
changes in the far reaches of hypervisor space...
Then again... am I wrong?
^ permalink raw reply [flat|nested] 13+ messages in thread