vbd flushing during migration?

* vbd flushing during migration?
@ 2006-07-31 19:39 John Byrne
  2006-07-31 19:56 ` Andrew Warfield
  0 siblings, 1 reply; 5+ messages in thread
From: John Byrne @ 2006-07-31 19:39 UTC (permalink / raw)
  To: xen-devel

Hi,

I don't see any obvious flush to disk taking place for vbd's on the 
source host in XendCheckpoint.py before the domain is started on the new 
host. Is there a guarantee that all written data is on disk somewhere 
else or is something needed?

Thanks,

John Byrne


* Re: vbd flushing during migration?
  2006-07-31 19:39 vbd flushing during migration? John Byrne
@ 2006-07-31 19:56 ` Andrew Warfield
  2006-07-31 22:26   ` John Byrne
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Warfield @ 2006-07-31 19:56 UTC (permalink / raw)
  To: John Byrne; +Cc: xen-devel

It's slightly more than a flush that's required.  The migration
protocol needs to be extended so that execution on the target host
doesn't start until all of the outstanding (i.e. issued by the
backend) block requests have been either cancelled or acknowledged.
This should be pretty straightforward given that the backend driver
ref counts a blkif's state based on pending requests, and won't tear
down the backend directory in xenstore until all the outstanding
requests have cleared.  All that is likely required is to have the
migration code register watches on the backend vbd directories, and
wait for them to disappear before giving the all-clear to the new
host.
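
A minimal sketch of that wait, in Python since xend is Python. It polls instead of registering xenstore watches, and `list_dir` is a hypothetical callable standing in for whatever xenstore listing call the migration code would use. The path layout is the one used by the patch attached later in this thread; the timeout and all names here are assumptions for illustration only.

```python
import time


def wait_for_vbd_teardown(domid, list_dir, timeout=30.0, interval=0.1,
                          clock=time.monotonic, sleep=time.sleep):
    """Block until the backend vbd directory for `domid` is empty or gone.

    `list_dir(path)` is a hypothetical stand-in for a xenstore directory
    listing: it returns a list of entries, or None/[] once the path has
    no children. Returns True on teardown, False on timeout.
    """
    path = '/local/domain/0/backend/vbd/%d' % domid
    deadline = clock() + timeout
    while True:
        if not list_dir(path):
            return True      # no vbd backends left: safe to give the all-clear
        if clock() >= deadline:
            return False     # caller decides whether to abort the migration
        sleep(interval)
```

Returning False on timeout, instead of looping forever, leaves the decision to fail or retry the migration with the caller.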

We've talked about this enough to know how to fix it, but haven't had
a chance to hack it up.  (I think Julian has looked into the problem a
bit for blktap, but not yet done a general fix.) Patches would
certainly be welcome though. ;)

a.


* Re: vbd flushing during migration?
  2006-07-31 19:56 ` Andrew Warfield
@ 2006-07-31 22:26   ` John Byrne
  2006-07-31 23:03     ` Andrew Warfield
  2006-08-01 19:28     ` Charles Coffing
  0 siblings, 2 replies; 5+ messages in thread
From: John Byrne @ 2006-07-31 22:26 UTC (permalink / raw)
  To: Andrew Warfield; +Cc: xen-devel

It would be a bit ugly, but mostly straightforward, to watch for the
destruction of the vbds (or all devices) after the destroyDomain() is
done and then send an all-clear. (The last time I looked there wasn't
a waitForDomainDestroy() anywhere, so it would probably be best to write
one.) This would guarantee correctness, which is the most important thing.

The problem I see with that strategy is the effect on downtime during a 
live-move. Ideally you'd like to start the vbd cleanup when the final 
suspend is done and hope to parallelize any final device operations 
with the final pass of live-move. How to do that and play nice with 
domain destruction on the normal path and handle errors seems a lot less 
clear to me.
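
One possible shape for that overlap, as a sketch only. The three callables (`final_copy_pass`, `teardown_devices`, `send_all_clear`) are hypothetical placeholders, not xend functions, and the error policy shown (no all-clear if either step fails) is an assumption.

```python
import threading


def finish_migration(final_copy_pass, teardown_devices, send_all_clear):
    """Run device teardown concurrently with the final memory copy.

    All three arguments are hypothetical callables supplied by the caller.
    The all-clear is sent only if both steps complete without raising.
    """
    errors = []

    def run_teardown():
        try:
            teardown_devices()
        except Exception as exc:   # record the failure for the main thread
            errors.append(exc)

    worker = threading.Thread(target=run_teardown)
    worker.start()                 # teardown overlaps with the copy below
    try:
        final_copy_pass()
    finally:
        worker.join()              # never leave teardown running unobserved
    if errors:
        raise errors[0]
    send_all_clear()
```

The downtime cost then becomes the longer of the two steps and not their sum; it does not address how this interacts with the normal domain-destruction path.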

So, are you just ignoring the notion of minimizing downtime for the 
moment or is there something I'm missing?

John


* Re: vbd flushing during migration?
  2006-07-31 22:26   ` John Byrne
@ 2006-07-31 23:03     ` Andrew Warfield
  2006-08-01 19:28     ` Charles Coffing
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Warfield @ 2006-07-31 23:03 UTC (permalink / raw)
  To: John Byrne; +Cc: xen-devel

> So, are you just ignoring the notion of minimizing downtime for the
> moment or is there something I'm missing?

That's exactly what I'm suggesting.  The current risk is a (very slim)
write-after-write (WAW) error case.  Basically, you have a number of
in-flight write requests on the original machine that are somewhere
between the backend and the physical disk at the time of migration.
Currently, you migrate and the shadow request ring reissues these on
the new host -- which is the right thing to do given that requests are
idempotent.  The problem is that the original in-flight requests can
still hit the disk some time later and cause problems.  The WAW case
arises if, immediately after arriving at the new host, you write an
update to a block that still has an in-flight request on the original
host, and the update then gets overwritten by that original request.
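
The ordering problem can be shown with a toy model. This is an illustration only: the disk is a dict, the block number and data values are made up, and nothing here reflects real blkback code.

```python
def replay(events):
    """Apply (origin, block, data) writes in arrival order; return the disk."""
    disk = {}
    for origin, block, data in events:
        disk[block] = data
    return disk


# Block 42 has a write of "old" in flight on the source at migration time.
reissued = ('target', 42, 'old')    # shadow ring replays it: harmless, idempotent
update = ('target', 42, 'new')      # guest then updates the same block
straggler = ('source', 42, 'old')   # original request finally reaches the disk

safe = replay([straggler, reissued, update])     # source drained before resume
hazard = replay([reissued, update, straggler])   # source drains after resume

print(safe[42])     # new
print(hazard[42])   # old: the guest's update has been lost
```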

Note that for sane block devices this is extremely unlikely, as the
aperture that we are talking about is basically whatever is in the
disk's request queue.  It's only really a problem for things like
NFS+loopback and other instances of buffered I/O behind blkback
(which is generally a really bad idea!) where you could see a large
window of outstanding requests that haven't actually hit the disk.
These situations probably need more than just waiting for blkback to
clear pending reqs, as loopback will acknowledge requests before they
hit the disk in some cases.

So, I think the short-term correctness-preserving approach is to (a)
modify the migration process to add an interlock that waits for the
block backends on the source physical machine to go to a closed state
-- indicating that all the outstanding requests have cleared -- and (b)
not use loopback, or buffered I/O generally, behind blkback when you
intend to do migration.  The blktap code in the tree is much safer for
this sort of thing and we're happy to sort out migration problems
if/when they come up.
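
A sketch of the check behind interlock (a), under stated assumptions: `read_state` is a hypothetical callable that reads a backend's xenstore `state` node and returns an int (or None if the node is gone), and the backend paths are supplied by the caller. The value 6 is XenbusStateClosed from Xen's public xenbus header.

```python
XENBUS_STATE_CLOSED = 6   # XenbusStateClosed in Xen's public io/xenbus.h


def backends_closed(backend_paths, read_state):
    """Return True when every listed block backend reports the Closed state.

    `read_state(path)` is a hypothetical stand-in for reading a xenstore
    node. A missing node is treated as closed, on the assumption (from the
    discussion above) that the backend directory is only removed once all
    outstanding requests have cleared.
    """
    for path in backend_paths:
        state = read_state(path + '/state')
        if state is not None and state != XENBUS_STATE_CLOSED:
            return False   # at least one backend may still have requests in flight
    return True
```

The migration code would hold back the all-clear to the target until this returns True. It says nothing about writes buffered below blkback, which is why (b) is still needed.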

If this winds up adding a big overhead to migration switching time (I
don't think it should, block shutdown can be parallelized with the
stop-and-copy round of migration -- you'll be busy transferring all
the dirty pages that you've queued for DMA anyway) we can probably
speed it up.  One option would be to look into whether the Linux block
layer will let you abort submitted requests.  Another would be to
modify the block frontend driver to realize that it's just been
migrated and queue all requests to blocks that were in its shadow
ring until it receives notification that those writes have cleared
from the original host.  As you point out -- these are probably best
left as a second step. ;)
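
The second option might look roughly like the toy model below. It is not blkfront code: the class, its methods, and the `submit` callable are all hypothetical, and the notification from the original host is assumed to exist.

```python
class MigratedFrontend:
    """Toy model of a block frontend that has just been migrated.

    Requests touching blocks that were in the shadow ring at migration time
    are held back until the source host confirms its writes have cleared.
    """

    def __init__(self, shadow_blocks, submit):
        self.fenced = set(shadow_blocks)   # blocks with possibly in-flight writes
        self.submit = submit               # hypothetical: issues a request to the backend
        self.deferred = []

    def request(self, block, data):
        if block in self.fenced:
            self.deferred.append((block, data))   # wait for the source to drain
        else:
            self.submit(block, data)

    def source_cleared(self):
        """Called on notification that the source has no outstanding writes."""
        self.fenced.clear()
        pending, self.deferred = self.deferred, []
        for block, data in pending:               # preserve original ordering
            self.submit(block, data)
```

Only requests to the affected blocks stall, so the guest can resume on the target without waiting for the source to drain.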

I'd be interested to know if anyone on the list is solving this sort
of thing already using some sort of storage fencing fanciness to just
sink any pending requests on the original host after migration has
happened.

a.


* Re: vbd flushing during migration?
  2006-07-31 22:26   ` John Byrne
  2006-07-31 23:03     ` Andrew Warfield
@ 2006-08-01 19:28     ` Charles Coffing
  1 sibling, 0 replies; 5+ messages in thread
From: Charles Coffing @ 2006-08-01 19:28 UTC (permalink / raw)
  To: Andrew Warfield, John Byrne; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3173 bytes --]

I've got a patch in our tree that does (basically) what John is
describing.

The exact bug we hit was that an "xm shutdown -w vm" did not wait until
the vbds were cleared out before returning.  So now I wait until the
backend/vbd nodes go away before returning.

This could probably be done more cleanly with watches, and should be
abstracted out to be sure it applies equally to migration, and so forth.
But for the sake of discussion, the patch is attached.

-Charles



[-- Attachment #2: xen-shutdown-wait.diff --]
[-- Type: application/octet-stream, Size: 1287 bytes --]

Index: xen-unstable/tools/python/xen/xm/shutdown.py
===================================================================
--- xen-unstable.orig/tools/python/xen/xm/shutdown.py
+++ xen-unstable/tools/python/xen/xm/shutdown.py
@@ -52,6 +52,8 @@ def shutdown(opts, doms, mode, wait):
     for d in doms:
         server.xend.domain.shutdown(d, mode)
     if wait:
+        from xen.xend.xenstore.xstransact import xstransact
+        doms_to_cleanup = doms[:]
         while doms:
             alive = server.xend.domains(0)
             dead = []
@@ -62,6 +64,17 @@ def shutdown(opts, doms, mode, wait):
                 opts.info("Domain %s terminated" % d)
                 doms.remove(d)
             time.sleep(1)
+        # Now all the domains are terminated, but wait until the devices are
+        # cleaned up.
+        for d in doms_to_cleanup:
+            info = server.xend.domain(d)
+            domid = int(sxp.child_value(info, 'domid', '-1'))
+            device_class_path = '/local/domain/0/backend/vbd/%d/' % domid
+            while True:
+                devices = xstransact.List(device_class_path)
+                if len(devices) == 0:
+                    break
+                time.sleep(1)
         opts.info("All domains terminated")
 
 def shutdown_mode(opts):
