qemu-devel.nongnu.org archive mirror
* [Qemu-devel] unplug_request and migration
@ 2017-06-08 14:41 David Gibson
  2017-06-08 15:10 ` Dr. David Alan Gilbert
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Gibson @ 2017-06-08 14:41 UTC (permalink / raw)
  To: dgilber, quintela; +Cc: qemu-ppc, qemu-devel

Hi Dave & Juan,

I'm hoping one of you can answer this.

I'm currently grappling with (amongst other things) a pseries machine
racing a hot unplug operation with a migrate.  There are various issues
around what interim state we need, and which bits of it need to be
migrated, that I'm still investigating.  But there's a more general
question that I'm guessing must have already been addressed for x86.

For any "soft" unplug device - i.e. one using ->unplug_request rather
than ->unplug - giving a device_del command will just ask the guest
nicely to release the device, with the completion of the unplug
happening only if and when the guest indicates it's ready for the
device to go away.  AFAICT, the device_del command will return as soon
as the request is made, but if the guest is busy, the completion of
the hot unplug could take arbitrarily long.

So, what happens if there's a migration in between the unplug_request
and the guest completing the unplug?  How does libvirt (or whatever)
know whether to include the device on the destination machine command
line?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] unplug_request and migration
  2017-06-08 14:41 [Qemu-devel] unplug_request and migration David Gibson
@ 2017-06-08 15:10 ` Dr. David Alan Gilbert
  2017-06-08 15:44   ` Dr. David Alan Gilbert
  2017-06-08 16:07 ` Juan Quintela
  2017-06-09  9:09 ` Igor Mammedov
  2 siblings, 1 reply; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-08 15:10 UTC (permalink / raw)
  To: David Gibson, jdenemar; +Cc: quintela, qemu-ppc, qemu-devel

* David Gibson (david@gibson.dropbear.id.au) wrote:
> Hi Dave & Juan,
> 
> I'm hoping one of you can answer this.
> 
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
> 
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
> 
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?

No, I don't understand how that works. cc'ing in jdenemar for libvirt

Dave

> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] unplug_request and migration
  2017-06-08 15:10 ` Dr. David Alan Gilbert
@ 2017-06-08 15:44   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 7+ messages in thread
From: Dr. David Alan Gilbert @ 2017-06-08 15:44 UTC (permalink / raw)
  To: David Gibson, jdenemar; +Cc: quintela, qemu-ppc, qemu-devel

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > Hi Dave & Juan,
> > 
> > I'm hoping one of you can answer this.
> > 
> > I'm currently grappling with (amongst other things) a pseries machine
> > racing a hot unplug operation with a migrate.  There's various issues
> > with what interim state we need, and which bits of it need to be
> > migrated that I'm still investigating.  But, there's a more general
> > question that I'm guessing must have already been addressed for x86.
> > 
> > For any "soft" unplug device - i.e. using ->unplug_request, rather
> > than ->unplug, giving a device_del command will just ask the guest
> > nicely to release the device, with the completion of the unplug
> > happening only if and when the guest indicates it's ready for the
> > device to go away.  AFAICT, the device_del command will return as soon
> > as the request is made, but if the guest is busy, the completion of
> > the hot unplug could take arbitrarily long.
> > 
> > So, what happens if there's a migration in between the unplug_request
> > and the guest completing the unplug?  How does libvirt (or whatever)
> > know whether to include the device on the destination machine command
> > line?
> 
> No, I don't understand how that works. cc'ing in jdenemar for libvirt

I had a bit of a prod; I can see:
  a) There's a 'DEVICE_DELETED' QMP event sent by QEMU
  b) There's also an ACPI OST event that I think might be just for DIMMs

That's just poking around libvirt's src/qemu/qemu_process.c
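
One way a management layer might use that event, as a minimal sketch under
loud assumptions: MgmtDevice and the mgmt_* functions are hypothetical names
made up for illustration, not libvirt's or QEMU's code. The idea is to keep
the device in the configuration until DEVICE_DELETED actually arrives, since
device_del returning only means the request was made.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical management-side record; not libvirt's real data structure. */
typedef struct {
    const char *id;
    bool in_config;       /* would appear on the destination command line */
    bool unplug_pending;  /* device_del sent, DEVICE_DELETED not yet seen */
} MgmtDevice;

/* device_del returned: the request is only pending, so keep the device. */
void mgmt_on_device_del_sent(MgmtDevice *d)
{
    d->unplug_pending = true;
}

/* A DEVICE_DELETED event arrived for 'id': only now drop the device. */
void mgmt_on_device_deleted(MgmtDevice *d, const char *id)
{
    if (strcmp(d->id, id) == 0) {
        d->unplug_pending = false;
        d->in_config = false;
    }
}

/* Decide whether the destination should be started with this device. */
bool mgmt_include_on_destination(const MgmtDevice *d)
{
    return d->in_config;
}
```

This does not close the race by itself: the event can still arrive after the
destination has already been started with the device.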



Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


* Re: [Qemu-devel] unplug_request and migration
  2017-06-08 14:41 [Qemu-devel] unplug_request and migration David Gibson
  2017-06-08 15:10 ` Dr. David Alan Gilbert
@ 2017-06-08 16:07 ` Juan Quintela
  2017-06-09  9:09 ` Igor Mammedov
  2 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2017-06-08 16:07 UTC (permalink / raw)
  To: David Gibson; +Cc: dgilbert, qemu-ppc, qemu-devel

David Gibson <david@gibson.dropbear.id.au> wrote:
> Hi Dave & Juan,
>
> I'm hoping one of you can answer this.
>
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
>
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
>
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?

Upstream, I removed the possibility of doing a hotplug/unplug while we
are migrating.  But if device_del has already returned, I can't see how
that can be detected.

Later, Juan.


* Re: [Qemu-devel] unplug_request and migration
  2017-06-08 14:41 [Qemu-devel] unplug_request and migration David Gibson
  2017-06-08 15:10 ` Dr. David Alan Gilbert
  2017-06-08 16:07 ` Juan Quintela
@ 2017-06-09  9:09 ` Igor Mammedov
  2017-06-09 10:03   ` David Gibson
  2 siblings, 1 reply; 7+ messages in thread
From: Igor Mammedov @ 2017-06-09  9:09 UTC (permalink / raw)
  To: David Gibson
  Cc: Dr. David Alan Gilbert, quintela, qemu-ppc, qemu-devel, jdenemar

On Fri, 9 Jun 2017 00:41:06 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Hi Dave & Juan,
> 
> I'm hoping one of you can answer this.
> 
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
> 
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
> 
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?
> 

looking at qdev_unplug():
    if (!migration_is_idle()) { 
        error_setg(errp, "device_del not allowed while migrating");
        return;
    }

So an unplug request should fail if a migration is in progress; it won't reach
the guest, and the mgmt side will have to repeat the request once migration
completes.

But it's still possible to issue the unplug request first and then start
migration. That's where the race between DEVICE_DELETED and migration start
(starting DST with a device that is being unplugged) occurs.
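
A minimal sketch of that ordering problem, assuming a toy model rather than
QEMU's real types: ToyVM and the toy_* functions are hypothetical, and only
the error string is taken from the snippet above. It shows that the guard
blocks device_del during migration but does nothing about a request that was
accepted beforehand.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model, not QEMU code: every name below is hypothetical. */
typedef struct {
    bool migrating;        /* stands in for !migration_is_idle() */
    bool unplug_requested; /* device_del accepted, guest not done yet */
    bool present;          /* device still exists in the machine */
} ToyVM;

/* Mirrors the guard quoted above: refuse device_del during migration. */
bool toy_device_del(ToyVM *vm)
{
    if (vm->migrating) {
        fprintf(stderr, "device_del not allowed while migrating\n");
        return false;
    }
    vm->unplug_requested = true; /* returns at once; guest completes later */
    return true;
}

/* Nothing here checks for an unplug that is still pending. */
bool toy_migrate_start(ToyVM *vm)
{
    if (vm->migrating) {
        return false;
    }
    vm->migrating = true;
    return true;
}

/* Guest acknowledges the unplug; the device goes away (DEVICE_DELETED). */
void toy_guest_completes_unplug(ToyVM *vm)
{
    if (vm->unplug_requested && vm->present) {
        vm->present = false;
        vm->unplug_requested = false;
    }
}
```

Calling toy_device_del(), then toy_migrate_start(), then
toy_guest_completes_unplug() leaves the source without the device while the
destination was started with it.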

Two possible approaches:
 1: On unplug_request(), set a global flag saying there is a pending unplug, and
    forbid migration until it completes. But there is no guarantee that the
    unplug will ever complete, nor any way to notice that the guest has
    failed/rejected it. I'm not sure how that could be solved.
 2: Set a per-device pending_unplug flag and, if migration is in progress when
    the unplug callback is called, delay the guest's unplug event until
    migration has completed.
    Mgmt will treat the case as a usual migration, i.e. start dst with the
    device that is being unplugged, and the device will be removed on the dst
    side on migration completion.
    (It should be a generic solution, as x86 is also affected.) As the place
    to put this common logic I'd suggest hotplug_handler_unplug().
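
Option 2 could be sketched roughly as follows. This is a toy model only:
ToyMachine, ToyDevice and the toy_* functions are hypothetical names, not
QEMU APIs, and the model runs on a single machine. In a real implementation
the pending flag would presumably have to travel with the device state so
that the removal happens on the destination.

```c
#include <stdbool.h>

/* Toy model of option 2; all names are hypothetical, not QEMU APIs. */
typedef struct {
    bool present;        /* device is still part of the machine */
    bool pending_unplug; /* guest released it while migration was running */
} ToyDevice;

typedef struct {
    bool migrating;
    ToyDevice dev;
} ToyMachine;

/* Called when the guest signals that it has released the device. */
void toy_unplug_cb(ToyMachine *m)
{
    if (m->migrating) {
        m->dev.pending_unplug = true; /* defer: keep device until migration ends */
        return;
    }
    m->dev.present = false;           /* normal path: remove immediately */
}

/* Called once migration has completed; finishes any deferred unplug. */
void toy_migration_done(ToyMachine *m)
{
    m->migrating = false;
    if (m->dev.pending_unplug) {
        m->dev.pending_unplug = false;
        m->dev.present = false;
    }
}
```

With this shape the device set stays constant for the whole migration, so
mgmt never has to guess whether the device belongs on the destination
command line.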


* Re: [Qemu-devel] unplug_request and migration
  2017-06-09  9:09 ` Igor Mammedov
@ 2017-06-09 10:03   ` David Gibson
  2017-06-09 12:18     ` Juan Quintela
  0 siblings, 1 reply; 7+ messages in thread
From: David Gibson @ 2017-06-09 10:03 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Dr. David Alan Gilbert, quintela, qemu-ppc, qemu-devel, jdenemar

On Fri, Jun 09, 2017 at 11:09:10AM +0200, Igor Mammedov wrote:
> On Fri, 9 Jun 2017 00:41:06 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > Hi Dave & Juan,
> > 
> > I'm hoping one of you can answer this.
> > 
> > I'm currently grappling with (amongst other things) a pseries machine
> > racing a hot unplug operation with a migrate.  There's various issues
> > with what interim state we need, and which bits of it need to be
> > migrated that I'm still investigating.  But, there's a more general
> > question that I'm guessing must have already been addressed for x86.
> > 
> > For any "soft" unplug device - i.e. using ->unplug_request, rather
> > than ->unplug, giving a device_del command will just ask the guest
> > nicely to release the device, with the completion of the unplug
> > happening only if and when the guest indicates it's ready for the
> > device to go away.  AFAICT, the device_del command will return as soon
> > as the request is made, but if the guest is busy, the completion of
> > the hot unplug could take arbitrarily long.
> > 
> > So, what happens if there's a migration in between the unplug_request
> > and the guest completing the unplug?  How does libvirt (or whatever)
> > know whether to include the device on the destination machine command
> > line?
> > 
> 
> looking at qdev_unplug():
>     if (!migration_is_idle()) { 
>         error_setg(errp, "device_del not allowed while migrating");
>         return;
>     }
> 
> so unplug request should fail if migration is in progress , it won't reach guest
> and mgmt side will have to repeat request on migration completion.
> 
> But it's still possible to issue unplug request first and then start
> migration,

Right, that's the case I'm interested in, not the other way around.

> that's where race between DEVICE_DELETED and migration start (starting DST with
> being unplugged device) occurs.
> 
> it could be possible:
>  1: on unplug_request() set global flag that there is pending unplug and forbid
>     migration until completion. But there is no guarantee that unplug will
>     be completed nor a way to notice that it's failed/rejected by guest.
>     I'm not sure how that could be solved.
>  2: set per device pending_unplug flag and delay unplug event from guest
>     until migration is completed if migration is in progress when unplug
>     callback is called.
>     mgmt will treat the case as usual migration, i.e. start dst with being
>     unplugged device, and device will be removed on dst side on migration
>     completion.
>     (it should be generic solution as x86 is also affected), as place where
>     to put this common logic I'd suggest hotplug_handler_unplug()

So.. it seems like the short version is that racing migration and
unplug is broken already.

Which is unfortunate, but at least means I don't need to worry about
it particularly for Power.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] unplug_request and migration
  2017-06-09 10:03   ` David Gibson
@ 2017-06-09 12:18     ` Juan Quintela
  0 siblings, 0 replies; 7+ messages in thread
From: Juan Quintela @ 2017-06-09 12:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Igor Mammedov, Dr. David Alan Gilbert, qemu-ppc, qemu-devel,
	jdenemar

David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Jun 09, 2017 at 11:09:10AM +0200, Igor Mammedov wrote:
>> On Fri, 9 Jun 2017 00:41:06 +1000
>> David Gibson <david@gibson.dropbear.id.au> wrote:
>> 
>> > Hi Dave & Juan,
>> > 
>> > I'm hoping one of you can answer this.
>> > 
>> > I'm currently grappling with (amongst other things) a pseries machine
>> > racing a hot unplug operation with a migrate.  There's various issues
>> > with what interim state we need, and which bits of it need to be
>> > migrated that I'm still investigating.  But, there's a more general
>> > question that I'm guessing must have already been addressed for x86.
>> > 
>> > For any "soft" unplug device - i.e. using ->unplug_request, rather
>> > than ->unplug, giving a device_del command will just ask the guest
>> > nicely to release the device, with the completion of the unplug
>> > happening only if and when the guest indicates it's ready for the
>> > device to go away.  AFAICT, the device_del command will return as soon
>> > as the request is made, but if the guest is busy, the completion of
>> > the hot unplug could take arbitrarily long.
>> > 
>> > So, what happens if there's a migration in between the unplug_request
>> > and the guest completing the unplug?  How does libvirt (or whatever)
>> > know whether to include the device on the destination machine command
>> > line?
>> > 
>> 
>> looking at qdev_unplug():
>>     if (!migration_is_idle()) { 
>>         error_setg(errp, "device_del not allowed while migrating");
>>         return;
>>     }
>> 
>> so unplug request should fail if migration is in progress , it won't reach guest
>> and mgmt side will have to repeat request on migration completion.
>> 
>> But it's still possible to issue unplug request first and then start
>> migration,
>
> Right, that's the case I'm interested in, not the other way around.
>
>> that's where race between DEVICE_DELETED and migration start (starting DST with
>> being unplugged device) occurs.
>> 
>> it could be possible:
>>  1: on unplug_request() set global flag that there is pending unplug and forbid
>>     migration until completion. But there is no guarantee that unplug will
>>     be completed nor a way to notice that it's failed/rejected by guest.
>>     I'm not sure how that could be solved.
>>  2: set per device pending_unplug flag and delay unplug event from guest
>>     until migration is completed if migration is in progress when unplug
>>     callback is called.
>>     mgmt will treat the case as usual migration, i.e. start dst with being
>>     unplugged device, and device will be removed on dst side on migration
>>     completion.
>>     (it should be generic solution as x86 is also affected), as place where
>>     to put this common logic I'd suggest hotplug_handler_unplug()
>
> So.. it seems like the short version is that racing migration and
> unplug is broken already.
>
> Which is unfortunate, but at least means I don't need to worry about
> it particularly for Power.

Yep.  I think the patches I put in (for 2.10) to disable hot[un]plug
during migration were the first attempt to do something about it.

Later, Juan.


Thread overview: 7+ messages
2017-06-08 14:41 [Qemu-devel] unplug_request and migration David Gibson
2017-06-08 15:10 ` Dr. David Alan Gilbert
2017-06-08 15:44   ` Dr. David Alan Gilbert
2017-06-08 16:07 ` Juan Quintela
2017-06-09  9:09 ` Igor Mammedov
2017-06-09 10:03   ` David Gibson
2017-06-09 12:18     ` Juan Quintela
