* Domain saving and filesystem corruption
@ 2006-06-25  0:32 Tracy R Reed
  2006-06-26 20:59 ` Keir Fraser
  0 siblings, 1 reply; 5+ messages in thread
From: Tracy R Reed @ 2006-06-25  0:32 UTC (permalink / raw)
  To: xen-devel


I have been using Xen for over a year now. For the most part I have had
very good success with it and we are now working on rolling it out
throughout my company. But I just ran across something really annoying
and dangerous.

When I first started playing with xen I read all of the docs I could
find and at that time I am pretty sure xen did not automatically save
domains when the machine was shut down. Later on I noticed that it was
trying to do so but was failing because the directory to save to did not
exist on my machine for some reason (was not created during the
install). After that I completely forgot about this behavior. A month or
two ago I upgraded to Xen 3.0 from mercurial (I don't have the sources
around anymore and I don't see how to get xen to tell me its exact
version) and it seems that domain saving on shutdown is now working.
Great. I recently had some unrelated system problems which caused me to
need to shut down, boot from a rescue disk, and mount the logical volume
normally used by my mail server and do quite a bit of work on it. Once
done I booted the system normally, xen started the mail domain, and all
kinds of weird stuff started happening related to the filesystem. I shut
down the domain, did an fsck of the mail server logical volume, and
found thousands of errors.

Then I realized what had happened. The xen domain was saving state to
the disk, including internal buffers and who knows what else that had
not been synced to the disk. So I mounted a very dirty filesystem, made
a bunch of changes, and then the mail server domain came back up
expecting the fs to be in the same state it was left in. It proceeded as
if everything were normal, which ended up causing massive corruption and
many lost emails.
Fortunately this is on a dev machine which hosts a bunch of personal
domains and other stuff and not business critical things. But it is
still highly annoying.

I recommend that whenever Xen saves a domain, the domain somehow
sync the filesystem state to disk. Ideally the fs would even be marked
clean so that if someone needs to mount the fs while the domain is not
running such as I did they can. There really needs to be a way for a xen
domain, upon being started, to know that the fs is in a sane and
consistent state just as it was when it was saved. Ensuring that only
filesystems marked clean are left after a save and mounted upon restart
is one way to do that. Or is there some sort of timestamp in the fs,
such as a last mount time, that could be saved with the domain state and
checked when the domain is restarted to make sure it has not changed in
the meantime? I realize that most of these things are
filesystem/OS specific. It would be really nice to have a general
solution to this. I think something needs to be done because the current
situation seems quite dangerous. For now I have disabled the
saving/restarting of domains and will do so on all of our production
systems also. It's a risk I just can't take.
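
The last-mount-time idea above can be sketched in shell for ext2/ext3
volumes. This is only an illustration, not anything Xen provides: the
helper names fs_field and fs_fingerprint are made up, and it assumes
the superblock summary is read with "tune2fs -l" while the domain is
not running. It would detect that someone mounted the volume between
save and restore; it would not make such a mount safe.

```shell
#!/bin/sh
# Sketch only: fs_field and fs_fingerprint are hypothetical helpers.

# Print the value of one "Name: value" field from `tune2fs -l` output
# read on stdin, e.g. "Last mount time" or "Filesystem state".
fs_field() {
    sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Reduce `tune2fs -l` output on stdin to a one-line fingerprint made of
# the fields that change when the filesystem is mounted or written.
fs_fingerprint() {
    grep -E '^(Filesystem state|Last mount time|Last write time|Mount count):' |
        tr -s ' ' | tr '\n' ';'
}

# At save time (device and file names are placeholders):
#   tune2fs -l /dev/vg0/mail | fs_fingerprint > /var/lib/xen/save/mail.fsprint
# Before restore:
#   now=$(tune2fs -l /dev/vg0/mail | fs_fingerprint)
#   [ "$now" = "$(cat /var/lib/xen/save/mail.fsprint)" ] ||
#       echo "volume changed since save, refusing to restore" >&2
```

A mismatch would mean the saved image no longer matches the disk, so
the safer choice is to discard the image and boot the domain fresh.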

I mentioned this to someone on the IRC channel and they said "That is
documented behavior." Unfortunately that doesn't bring back my data. It
wasn't documented when I started using Xen and I can't possibly keep up
on everything written about Xen in the meantime.

-- 
Tracy R Reed
http://ultraviolet.org

* Re: Domain saving and filesystem corruption
  2006-06-25  0:32 Domain saving and filesystem corruption Tracy R Reed
@ 2006-06-26 20:59 ` Keir Fraser
  2006-06-26 21:13   ` Anthony Liguori
  0 siblings, 1 reply; 5+ messages in thread
From: Keir Fraser @ 2006-06-26 20:59 UTC (permalink / raw)
  To: Tracy R Reed; +Cc: xen-devel


On 25 Jun 2006, at 01:32, Tracy R Reed wrote:

> I mentioned this to someone on the IRC channel and they said "That is
> documented behavior." Unfortunately that doesn't bring back my data. It
> wasn't documented when I started using Xen and I can't possibly keep up
> on everything written about Xen in the meantime.

I'm not sure if the behaviour is documented, but it certainly isn't 
new. Save/restore has always behaved like that -- a filesystem should 
be considered 'locked down' by a guest except when the guest OS is shut 
down cleanly. No interlock is enforced or metadata maintained for this 
in open source tools.
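
Since nothing enforces that interlock, an administrator can approximate
one by hand before touching a guest's volume from dom0. A minimal shell
sketch follows. The helper name has_saved_state is hypothetical, and the
layout it assumes (one save file per domain, named after the domain, in
the save directory) is an assumption that should be checked against the
local setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a save image for the named domain
# exists. Assumes one regular file per domain, named after the domain.
has_saved_state() {
    save_dir=$1
    domain=$2
    [ -n "$save_dir" ] && [ -f "$save_dir/$domain" ]
}

# Example guard before mounting a guest volume (paths are placeholders):
#   if has_saved_state /var/lib/xen/save mail; then
#       echo "mail has saved state; restore it and shut it down cleanly first" >&2
#       exit 1
#   fi
```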

  -- Keir

* Re: Domain saving and filesystem corruption
  2006-06-26 20:59 ` Keir Fraser
@ 2006-06-26 21:13   ` Anthony Liguori
  2006-06-26 22:05     ` Tracy R Reed
  0 siblings, 1 reply; 5+ messages in thread
From: Anthony Liguori @ 2006-06-26 21:13 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Tracy R Reed

Keir Fraser wrote:
>
> On 25 Jun 2006, at 01:32, Tracy R Reed wrote:
>
>> I mentioned this to someone on the IRC channel and they said "That is
>> documented behavior." Unfortunately that doesn't bring back my data. It
>> wasn't documented when I started using Xen and I can't possibly keep up
>> on everything written about Xen in the meantime.
>
> I'm not sure if the behaviour is documented, but it certainly isn't 
> new. Save/restore has always behaved like that -- a filesystem should 
> be considered 'locked down' by a guest except when the guest OS is 
> shut down cleanly. No interlock is enforced or metadata maintained for 
> this in open source tools.

You really ought to avoid save/restore/migrate when not using network or 
checkpointable storage.  You will almost certainly eventually get some 
sort of corruption.

I didn't realize xend actually tries to save domains on shutdown.  Seems 
like a bad idea to me.  Is this correct?  Is this only for domains 
started with /etc/init.d/xendomains?

Regards,

Anthony Liguori

>  -- Keir

* Re: Domain saving and filesystem corruption
  2006-06-26 21:13   ` Anthony Liguori
@ 2006-06-26 22:05     ` Tracy R Reed
  2006-06-27  2:45       ` Eric Peterson
  0 siblings, 1 reply; 5+ messages in thread
From: Tracy R Reed @ 2006-06-26 22:05 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: xen-devel

Anthony Liguori wrote:
> You really ought to avoid save/restore/migrate when not using network or 
> checkpointable storage.  You will almost certainly eventually get some 
> sort of corruption.

No doubt. Thing is, I didn't realize it was doing this. The machine so 
rarely gets rebooted that I never noticed it saving out the state of the 
domains to disk. I am impressed with how fast it does it though.

> I didn't realize xend actually tries to save domains on shutdown.  Seems 
> like a bad idea to me.  Is this correct?  Is this only for domains 
> started with /etc/init.d/xendomains?

On RedHat (I run FC5 in my domain0 and CentOS 4.3 in my domains) you can 
look in /etc/sysconfig/xendomains to see how this all works. It looks 
like by default it will try to save the state of all domains unless you 
set XENDOMAINS_AUTO_ONLY to true. It is set to false by default.

One odd thing I see is this:

# Directory to save running domains to when the system (dom0) is
# shut down. Will also be used to restore domains from if
# XENDOMAINS_RESTORE
# is set (see below). Leave empty to disable domain saving on shutdown
# (e.g. because you rather shut domains down).
# If domain saving does succeed, SHUTDOWN will not be executed.
#
#XENDOMAINS_SAVE=/var/lib/xen/save

So XENDOMAINS_SAVE is commented out by default. So it should be "". So 
why are the domains being saved? With XENDOMAINS_SAVE undefined, I would
expect the script to skip saving the domains and instead execute the
commands in XENDOMAINS_SHUTDOWN. I am not in front of my Xen console
right now, so I can't experiment with this, but I will try to look into
it tonight when I am.

-- 
Tracy R Reed                  http://ultraviolet.org
A: Because we read from top to bottom, left to right
Q: Why should I start my reply below the quoted text

* Re: Domain saving and filesystem corruption
  2006-06-26 22:05     ` Tracy R Reed
@ 2006-06-27  2:45       ` Eric Peterson
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Peterson @ 2006-06-27  2:45 UTC (permalink / raw)
  To: Tracy R Reed; +Cc: xen-devel

On 6/26/06, Tracy R Reed <treed@ultraviolet.org> wrote:
> One odd thing I see is this:
>
> # Directory to save running domains to when the system (dom0) is
> # shut down. Will also be used to restore domains from if
> # XENDOMAINS_RESTORE
> # is set (see below). Leave empty to disable domain saving on shutdown
> # (e.g. because you rather shut domains down).
> # If domain saving does succeed, SHUTDOWN will not be executed.
> #
> #XENDOMAINS_SAVE=/var/lib/xen/save
>
> So XENDOMAINS_SAVE is commented out by default. So it should be "". So

I believe this is just a placeholder to indicate the default value
that is used in the code. The comment block indicates that you would
need to have something like this to disable it:

XENDOMAINS_SAVE=""

That's how I interpret code such as this. I may be wrong.
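
If that reading is right, the behaviour would match the shell's
distinction between an unset variable and an empty one. The sketch
below shows only that distinction. The function effective_save_dir is
hypothetical, and whether the real init script has a built-in default
like this is an assumption the thread does not confirm.

```shell
#!/bin/sh
# Hypothetical illustration; not the actual xendomains init script.

# Print the directory a script would save to.
# ${VAR-default} substitutes the default only when VAR is unset, so an
# explicitly empty value survives and can be used to disable saving.
effective_save_dir() {
    echo "${XENDOMAINS_SAVE-/var/lib/xen/save}"
}

unset XENDOMAINS_SAVE
effective_save_dir      # setting commented out: the default is used

XENDOMAINS_SAVE=""
effective_save_dir      # explicitly empty: prints an empty line
```

Note that the more common form ${VAR:-default} treats empty and unset
alike, so with that form an empty setting would not disable anything.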

-Eric
