* Migration filesystem coherency?
@ 2006-06-27 19:29 John Byrne
From: John Byrne @ 2006-06-27 19:29 UTC
  To: xen-devel


Hi,

I thought I had a workaround for live migration crashing (I've been 
looking at the SLES 3.0.2 9742c code), but I found that I was getting 
filesystem errors. I'm wondering if the problem is a race in writing 
data to the backing storage.

When migrating a domain, before the domain is started on the new host, 
you have to guarantee that all the domU vbd data is out of the block 
cache and written to the backing device. (In the case of a loopback 
device, whether this is sufficient depends on the cross-host coherency 
guarantees of the backing filesystem.) I cannot see that this takes 
place synchronously with the migration process. To me it looks like the 
teardown/flush of the backing device depends on the action of the 
xenbus and the hotplug scripts and is asynchronous to the migration 
process.

So, am I right that there really is a problem here, or is there some 
other way the vbd data is getting flushed during migration?

Thanks,

John Byrne
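The requirement above (all domU vbd data out of the source host's cache before the domain starts on the destination) amounts to an ordering rule: flush first, then hand off. A minimal Python sketch of that ordering follows. The helper names `flush_backing_store` and `hand_off` are hypothetical, not xend code, and it assumes Linux semantics, where fsync() on a descriptor for a block device flushes that device's buffered data and fsync() on an image file flushes the file. As the thread notes, for a file-backed (loopback) vbd this alone is not enough if the loop device is still attached and caching.

```python
import os


def flush_backing_store(path: str) -> None:
    """Push dirty data for `path` out of the host cache to the backing device.

    Hypothetical helper (illustrative, not Xen/xend code). Assumes Linux,
    where fsync() works on a read-only descriptor for a block device or file.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # returns only after the kernel reports the data written
    finally:
        os.close(fd)


def hand_off(path: str, start_on_destination) -> None:
    """Ordering sketch: the destination may start only after the flush."""
    flush_backing_store(path)   # must complete first ...
    start_on_destination()      # ... before the domain resumes elsewhere
```

The point of the sketch is that the flush is a blocking call in the same control path as the handoff, rather than a side effect of a later, asynchronous hotplug event.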

* RE: Migration filesystem coherency?
@ 2006-06-27 20:39 Ian Pratt
From: Ian Pratt @ 2006-06-27 20:39 UTC
  To: John Byrne, xen-devel

> I thought I had a workaround for live migration crashing 
> (I've been looking at the SLES 3.0.2 9742c code), but I 
> found that I was getting filesystem errors. I'm wondering if 
> the problem is a race in writing data to the backing storage.
> 
> When migrating a domain, before the domain is started on the 
> new host, you have to guarantee that all the domU vbd data is 
> out of the block cache and written to the backing device. (In 
> the case of a loopback device, whether this is sufficient 
> depends on the cross-host coherency guarantees of the backing 
> filesystem.) I cannot see that this takes place synchronously 
> with the migration process. To me it looks like the 
> teardown/flush of the backing device depends on the action of 
> the xenbus and the hotplug scripts and is asynchronous to 
> the migration process.
> 
> So, am I right that there really is a problem here, or is 
> there some other way the vbd data is getting flushed during migration?

The loop device doesn't do direct IO, so using it for migration is
fundamentally unsafe. See Andrew and Julian's blktap patches for a way to
run file-backed VMs safely.

Ian 

* Re: Migration filesystem coherency?
@ 2006-06-27 22:08 ` John Byrne
From: John Byrne @ 2006-06-27 22:08 UTC
  To: Ian Pratt; +Cc: xen-devel

Ian Pratt wrote:
>> I thought I had a workaround for live migration crashing 
>> (I've been looking at the SLES 3.0.2 9742c code), but I 
>> found that I was getting filesystem errors. I'm wondering if 
>> the problem is a race in writing data to the backing storage.
>>
>> When migrating a domain, before the domain is started on the 
>> new host, you have to guarantee that all the domU vbd data is 
>> out of the block cache and written to the backing device. (In 
>> the case of a loopback device, whether this is sufficient 
>> depends on the cross-host coherency guarantees of the backing 
>> filesystem.) I cannot see that this takes place synchronously 
>> with the migration process. To me it looks like the 
>> teardown/flush of the backing device depends on the action of 
>> the xenbus and the hotplug scripts and is asynchronous to 
>> the migration process.
>>
>> So, am I right that there really is a problem here, or is 
>> there some other way the vbd data is getting flushed during migration?
> 
> The loop device doesn't do direct IO, so using it for migration is
> fundamentally unsafe. See Andrew and Julian's blktap patches for a way to
> run file-backed VMs safely.
> 
> Ian 
> 

Ian,

At the moment, I'm trying a shared physical disk. Should that work? If 
so, what code guarantees that the data is written to disk before the 
domain starts executing on the new host?

As to loopback, regardless of what kind of I/O it does, when the 
loopback device is torn down, all I/O should be committed to, at least, 
the VFS layer of the backing filesystem. If the backing filesystem makes 
the proper coherency guarantees, then this should be sufficient. My 
understanding is that both GFS and OCFS2 make these guarantees. So with 
these filesystems as the backing store, as long as Xen can guarantee the 
teardown before the domain starts executing on the new node, things 
should work, shouldn't they?

Thanks,

John Byrne
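What is being asked for here is a synchronous gate: the migration (or restart) path blocks until the backing device has actually been torn down, and refuses to proceed if that never happens. A minimal Python sketch of such a gate, assuming a caller-supplied `condition` that means "teardown has completed on the source host"; `wait_until` is a hypothetical name, not part of xm/xend.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.1) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout. Hypothetical
    helper: a real tool needs a condition meaning "the backing device has
    been torn down on the source host".
    """
    deadline = time.monotonic() + timeout  # monotonic: immune to clock changes
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

On a False return the caller should refuse to start the domain on the new node rather than proceed; proceeding anyway is exactly the race described in this thread.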

* Re: Migration filesystem coherency?
@ 2006-06-28 16:34   ` Charles Coffing
From: Charles Coffing @ 2006-06-28 16:34 UTC
  To: Ian Pratt, John Byrne; +Cc: xen-devel

On Tue, Jun 27, 2006 at  4:08 PM, in message <44A1AC41.3030600@hp.com>,
John Byrne <john.l.byrne@hp.com> wrote: 
>>> I thought I had a workaround for live migration crashing 
>>> (I've been looking at the SLES 3.0.2 9742c code), but I 
>>> found that I was getting filesystem errors. I'm wondering if 
>>> the problem is a race in writing data to the backing storage.
>>>
>>> When migrating a domain, before the domain is started on the 
>>> new host, you have to guarantee that all the domU vbd data is 
>>> out of the block cache and written to the backing device. (In 
>>> the case of a loopback device, whether this is sufficient 
>>> depends on the cross-host coherency guarantees of the backing 
>>> filesystem.) I cannot see that this takes place synchronously 
>>> with the migration process. To me it looks like the 
>>> teardown/flush of the backing device depends on the action of 
>>> the xenbus and the hotplug scripts and is asynchronous to
>>> the migration process.

I'm seeing this too, but in a slightly different context.

> As to loopback, regardless of what kind of I/O it does, when the 
> loopback device is torn down, all I/O should be committed to, at least, 
> the VFS layer of the backing filesystem. If the backing filesystem makes 
> the proper coherency guarantees, then this should be sufficient. My 
> understanding is that both GFS and OCFS2 make these guarantees. So with 
> these filesystems as the backing store, as long as Xen can guarantee the 
> teardown before the domain starts executing on the new node, things 
> should work, shouldn't they?

John, I haven't looked at the migration case, but the problem you're
describing does sound very similar to Novell's Bugzilla #185557. The
reproduction there is to run "xm shutdown -w" and, once it returns,
immediately start the VM on another physical node. The shutdown should
wait until the domain is completely shut down (and flushed, one would
hope) before returning. It doesn't: the udev event that tears down the
loopback device hasn't necessarily happened by the time the command
returns, so we've been seeing filesystem corruption when the VM is
brought back up on another node.

We're on OCFS2, so I, too, think that ensuring the loopback is torn
down synchronously would be sufficient to fix this problem.

-Charles
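One way to make the check after "xm shutdown -w" explicit is to ask, on the source host, whether any loop device is still bound to the VM's image file. A minimal Python sketch, assuming a Linux kernel that exposes /sys/block/loopN/loop/backing_file (later kernels do; the 2006 kernels in this thread may not). `loops_backed_by` is a hypothetical name, and the `sysfs_block` parameter exists only so the function can be pointed at a test directory.

```python
import os
from typing import List


def loops_backed_by(image: str, sysfs_block: str = "/sys/block") -> List[str]:
    """Return names of loop devices whose backing file is `image`.

    Assumes sysfs exposes <dev>/loop/backing_file for bound loop devices.
    """
    target = os.path.realpath(image)
    found = []
    for name in sorted(os.listdir(sysfs_block)):
        if not name.startswith("loop"):
            continue  # skip non-loop block devices such as sda
        attr = os.path.join(sysfs_block, name, "loop", "backing_file")
        try:
            with open(attr) as f:
                backing = f.read().strip()
        except OSError:
            continue  # unbound loop device: no backing_file attribute
        if os.path.realpath(backing) == target:
            found.append(name)
    return found
```

A caller could poll this after "xm shutdown -w" returns and start the VM on the other node only once it returns an empty list. This detects the race rather than fixing it; the fix discussed above is for the teardown itself to be synchronous.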
