* Domain saving and filesystem corruption
@ 2006-06-25 0:32 Tracy R Reed
2006-06-26 20:59 ` Keir Fraser
0 siblings, 1 reply; 5+ messages in thread
From: Tracy R Reed @ 2006-06-25 0:32 UTC (permalink / raw)
To: xen-devel
I have been using Xen for over a year now. For the most part I have had
very good success with it and we are now working on rolling it out
throughout my company. But I just ran across something really annoying
and dangerous.
When I first started playing with xen I read all of the docs I could
find and at that time I am pretty sure xen did not automatically save
domains when the machine was shut down. Later on I noticed that it was
trying to do so but was failing because the directory to save to did not
exist on my machine for some reason (was not created during the
install). After that I completely forgot about this behavior. A month or
two ago I upgraded to Xen 3.0 from mercurial (I don't have the sources
around anymore and I don't see how to get xen to tell me its exact
version) and it seems that domain saving on shutdown is now working.
Great. I recently had some unrelated system problems which caused me to
need to shut down, boot from a rescue disk, and mount the logical volume
normally used by my mail server and do quite a bit of work on it. Once
done I booted the system normally, xen started the mail domain, and all
kinds of weird stuff started happening related to the filesystem. I shut
down the domain, did an fsck of the mail server logical volume, and
found thousands of errors.
Then I realized what had happened. When Xen saved the domain, the saved
state included internal buffers and who knows what else that had not
been synced to disk. So I mounted a very dirty filesystem and made a
bunch of changes. Then the mail server domain came back up expecting the
fs to be in the same state it had left it in and proceeded as if
everything were normal, which caused massive corruption and many lost
emails.
Fortunately this is on a dev machine which hosts a bunch of personal
domains and other stuff and not business critical things. But it is
still highly annoying.
I recommend that whenever Xen saves a domain, the domain somehow sync
its filesystem state to disk. Ideally the fs would even be marked clean
so that anyone who needs to mount it while the domain is not running, as
I did, can do so safely. There really needs to be a way for a xen
domain, upon being started, to know that the fs is in a sane and
consistent state just as it was when it was saved. Ensuring that only
filesystems marked clean are left after a save and mounted upon restart
is one way to do that. Or is there some sort of time stamp such as a
last mount time in the fs that the domain can look at and save with the
domain state and make sure that the last mount time has not changed when
the domain is restarted? I realize that most of these things are
filesystem/OS specific. It would be really nice to have a general
solution to this. I think something needs to be done because the current
situation seems quite dangerous. For now I have disabled the
saving/restarting of domains and will do so on all of our production
systems also. It's a risk I just can't take.
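
The "last mount time" idea above can be sketched in shell for an
ext2/ext3 volume. This is only a heuristic under stated assumptions: the
guest's logical volume holds a filesystem directly, `tune2fs -l` can
read its superblock from dom0, and the helper names `fs_fingerprint`,
`record_fingerprint` and `verify_fingerprint` are hypothetical and not
part of Xen. It cannot detect writes that bypass the filesystem.

```shell
#!/bin/sh
# Keep only the superblock fields that change when an ext2/ext3
# filesystem is mounted or written. Reads `tune2fs -l` output on stdin.
fs_fingerprint() {
    grep -E '^(Last mount time|Last write time|Mount count):'
}

# Hypothetical helper: call right after the domain has been saved,
# storing the fingerprint next to the saved image.
record_fingerprint() {   # usage: record_fingerprint DEVICE STAMPFILE
    tune2fs -l "$1" | fs_fingerprint > "$2"
}

# Hypothetical helper: call before restoring the domain. Succeeds only
# if the fingerprint is identical to the recorded one.
verify_fingerprint() {   # usage: verify_fingerprint DEVICE STAMPFILE
    tune2fs -l "$1" | fs_fingerprint | cmp -s - "$2"
}
```

A restore wrapper could refuse to restore the domain, and boot it
normally instead, whenever `verify_fingerprint` fails.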
I mentioned this to someone on the IRC channel and they said "That is
documented behavior." Unfortunately that doesn't bring back my data. It
wasn't documented when I started using Xen and I can't possibly keep up
on everything written about Xen in the meantime.
--
Tracy R Reed
http://ultraviolet.org
* Re: Domain saving and filesystem corruption
From: Keir Fraser @ 2006-06-26 20:59 UTC (permalink / raw)
To: Tracy R Reed; +Cc: xen-devel
On 25 Jun 2006, at 01:32, Tracy R Reed wrote:
> I mentioned this to someone on the IRC channel and they said "That is
> documented behavior." Unfortunately that doesn't bring back my data. It
> wasn't documented when I started using Xen and I can't possibly keep up
> on everything written about Xen in the meantime.
I'm not sure if the behaviour is documented, but it certainly isn't
new. Save/restore has always behaved like that -- a filesystem should
be considered 'locked down' by a guest except when the guest OS is shut
down cleanly. No interlock is enforced or metadata maintained for this
in open source tools.
-- Keir
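
Since no interlock is enforced, an administrator could add a manual
guard in dom0 before mounting a guest's volume. The sketch below is one
possible check, not an existing Xen feature. It assumes saved images
live in /var/lib/xen/save (the path shown in the sysconfig comment
quoted later in this thread) as one file named after each domain, which
is an assumption about the layout. The function name `safe_to_mount` is
hypothetical.

```shell
#!/bin/sh
# Directory holding saved domain images. The one-file-per-domain naming
# is an assumption, not confirmed by this thread.
SAVE_DIR=${SAVE_DIR:-/var/lib/xen/save}

# Succeed only if no saved image exists for the named domain.
safe_to_mount() {   # usage: safe_to_mount DOMAIN_NAME
    if [ -e "$SAVE_DIR/$1" ]; then
        echo "refusing: saved image $SAVE_DIR/$1 exists for domain $1" >&2
        return 1
    fi
    return 0
}
```

It would be used as `safe_to_mount mail && mount ...`, where `mail` is a
placeholder domain name. A complete guard would also need to confirm
that the domain is not currently running.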
* Re: Domain saving and filesystem corruption
From: Anthony Liguori @ 2006-06-26 21:13 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel, Tracy R Reed
Keir Fraser wrote:
>
> On 25 Jun 2006, at 01:32, Tracy R Reed wrote:
>
>> I mentioned this to someone on the IRC channel and they said "That is
>> documented behavior." Unfortunately that doesn't bring back my data. It
>> wasn't documented when I started using Xen and I can't possibly keep up
>> on everything written about Xen in the meantime.
>
> I'm not sure if the behaviour is documented, but it certainly isn't
> new. Save/restore has always behaved like that -- a filesystem should
> be considered 'locked down' by a guest except when the guest OS is
> shut down cleanly. No interlock is enforced or metadata maintained for
> this in open source tools.
You really ought to avoid save/restore/migrate when not using network or
checkpointable storage. You will almost certainly eventually get some
sort of corruption.
I didn't realize xend actually tries to save domains on shutdown. Seems
like a bad idea to me. Is this correct? Is this only for domains
started with /etc/init.d/xendomains?
Regards,
Anthony Liguori
> -- Keir
* Re: Domain saving and filesystem corruption
From: Tracy R Reed @ 2006-06-26 22:05 UTC (permalink / raw)
To: Anthony Liguori; +Cc: xen-devel
Anthony Liguori wrote:
> You really ought to avoid save/restore/migrate when not using network or
> checkpointable storage. You will almost certainly eventually get some
> sort of corruption.
No doubt. Thing is, I didn't realize it was doing this. The machine so
rarely gets rebooted that I never noticed it saving out the state of the
domains to disk. I am impressed with how fast it does it though.
> I didn't realize xend actually tries to save domains on shutdown. Seems
> like a bad idea to me. Is this correct? Is this only for domains
> started with /etc/init.d/xendomains?
On RedHat (I run FC5 in my domain0 and CentOS 4.3 in my domains) you can
look in /etc/sysconfig/xendomains to see how this all works. It looks
like by default it will try to save the state of all domains unless you
set XENDOMAINS_AUTO_ONLY to true. It is set to false by default.
One odd thing I see is this:
# Directory to save running domains to when the system (dom0) is
# shut down. Will also be used to restore domains from if
# XENDOMAINS_RESTORE
# is set (see below). Leave empty to disable domain saving on shutdown
# (e.g. because you rather shut domains down).
# If domain saving does succeed, SHUTDOWN will not be executed.
#
#XENDOMAINS_SAVE=/var/lib/xen/save
So XENDOMAINS_SAVE is commented out by default. So it should be "". So
why are the domains being saved? With XENDOMAINS_SAVE undefined, it
looks like the script should have skipped saving the domains and
executed the commands in XENDOMAINS_SHUTDOWN instead. I am not in front
of my Xen console right now where I can play with this, but I will try
to look into it tonight when I am.
--
Tracy R Reed http://ultraviolet.org
A: Because we read from top to bottom, left to right
Q: Why should I start my reply below the quoted text
* Re: Domain saving and filesystem corruption
From: Eric Peterson @ 2006-06-27 2:45 UTC (permalink / raw)
To: Tracy R Reed; +Cc: xen-devel
On 6/26/06, Tracy R Reed <treed@ultraviolet.org> wrote:
> One odd thing I see is this:
>
> # Directory to save running domains to when the system (dom0) is
> # shut down. Will also be used to restore domains from if
> # XENDOMAINS_RESTORE
> # is set (see below). Leave empty to disable domain saving on shutdown
> # (e.g. because you rather shut domains down).
> # If domain saving does succeed, SHUTDOWN will not be executed.
> #
> #XENDOMAINS_SAVE=/var/lib/xen/save
>
> So XENDOMAINS_SAVE is commented out by default. So it should be "". So
I believe this is just a place holder to indicate the default value
that is used in the code. The comment block indicates that you would
need to have something like this to disable it:
XENDOMAINS_SAVE=""
That's how I interpret code such as this. I may be wrong.
-Eric
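
The distinction between a commented-out variable and an explicitly empty
one can be illustrated in plain shell. This sketch is not taken from the
real xendomains script. It only shows how a script with a built-in
default could behave if it used the `${VAR-default}` expansion, which is
an assumption. The function name `resolve_save_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical: resolve the save directory the way a script with a
# built-in default might. NOT the actual xendomains logic.
resolve_save_dir() {
    # ${VAR-default} substitutes the default only when VAR is unset,
    # so an explicit XENDOMAINS_SAVE="" stays empty.
    printf '%s\n' "${XENDOMAINS_SAVE-/var/lib/xen/save}"
}

unset XENDOMAINS_SAVE     # same effect as leaving the line commented out
resolve_save_dir          # prints /var/lib/xen/save

XENDOMAINS_SAVE=""        # explicitly empty, as suggested above
resolve_save_dir          # prints an empty line
```

Had the script used `${VAR:-default}` instead, the empty value would
also fall back to the default, so reading the init script itself is the
only way to know which behaviour applies.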