* [Qemu-devel] unplug_request and migration
From: David Gibson @ 2017-06-08 14:41 UTC
  To: dgilber, quintela; +Cc: qemu-ppc, qemu-devel

Hi Dave & Juan,

I'm hoping one of you can answer this.

I'm currently grappling with (amongst other things) a pseries machine
racing a hot unplug operation with a migrate.  There's various issues
with what interim state we need, and which bits of it need to be
migrated that I'm still investigating.  But, there's a more general
question that I'm guessing must have already been addressed for x86.

For any "soft" unplug device - i.e. using ->unplug_request, rather
than ->unplug, giving a device_del command will just ask the guest
nicely to release the device, with the completion of the unplug
happening only if and when the guest indicates it's ready for the
device to go away.  AFAICT, the device_del command will return as soon
as the request is made, but if the guest is busy, the completion of
the hot unplug could take arbitrarily long.

So, what happens if there's a migration in between the unplug_request
and the guest completing the unplug?  How does libvirt (or whatever)
know whether to include the device on the destination machine command
line?

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] unplug_request and migration
From: Dr. David Alan Gilbert @ 2017-06-08 15:10 UTC
  To: David Gibson, jdenemar; +Cc: quintela, qemu-ppc, qemu-devel

* David Gibson (david@gibson.dropbear.id.au) wrote:
> Hi Dave & Juan,
>
> I'm hoping one of you can answer this.
>
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
>
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
>
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?

No, I don't understand how that works.  cc'ing in jdenemar for libvirt

Dave

> --
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] unplug_request and migration
From: Dr. David Alan Gilbert @ 2017-06-08 15:44 UTC
  To: David Gibson, jdenemar; +Cc: quintela, qemu-ppc, qemu-devel

* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> * David Gibson (david@gibson.dropbear.id.au) wrote:
> > Hi Dave & Juan,
> >
> > I'm hoping one of you can answer this.
> >
> > I'm currently grappling with (amongst other things) a pseries machine
> > racing a hot unplug operation with a migrate.  There's various issues
> > with what interim state we need, and which bits of it need to be
> > migrated that I'm still investigating.  But, there's a more general
> > question that I'm guessing must have already been addressed for x86.
> >
> > For any "soft" unplug device - i.e. using ->unplug_request, rather
> > than ->unplug, giving a device_del command will just ask the guest
> > nicely to release the device, with the completion of the unplug
> > happening only if and when the guest indicates it's ready for the
> > device to go away.  AFAICT, the device_del command will return as soon
> > as the request is made, but if the guest is busy, the completion of
> > the hot unplug could take arbitrarily long.
> >
> > So, what happens if there's a migration in between the unplug_request
> > and the guest completing the unplug?  How does libvirt (or whatever)
> > know whether to include the device on the destination machine command
> > line?
>
> No, I don't understand how that works.  cc'ing in jdenemar for libvirt

I had a bit of a prod; I can see:
  a) There's a 'DEVICE_DELETED' QMP event sent by qemu
  b) I can also see an ACPI OST event that I think might be just for
     DIMMs

That's just from poking around libvirt's src/qemu/qemu_process.c

Dave

> Dave
>
> > --
> > David Gibson                    | I'll have my music baroque, and my code
> > david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
> >                                 | _way_ _around_!
> > http://www.ozlabs.org/~dgibson
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
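
[Editorial sketch] To make the ambiguity in the original question concrete, here is a minimal sketch of how a management layer might track a soft unplug using only the two signals mentioned so far: device_del returning immediately, and a later DEVICE_DELETED event. This is not libvirt's code; the type and function names (MgmtDevState, mgmt_on_device_del_returned, mgmt_on_device_deleted, mgmt_include_on_destination) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical management-side view of one hot-pluggable device. */
typedef enum {
    MGMT_DEV_PRESENT,          /* plugged in, no unplug requested */
    MGMT_DEV_UNPLUG_REQUESTED, /* device_del returned; guest has not released it */
    MGMT_DEV_GONE              /* DEVICE_DELETED event has been seen */
} MgmtDevState;

/* device_del returned successfully: only the request has been made. */
static MgmtDevState mgmt_on_device_del_returned(MgmtDevState s)
{
    return s == MGMT_DEV_PRESENT ? MGMT_DEV_UNPLUG_REQUESTED : s;
}

/* DEVICE_DELETED arrived: the guest has completed the unplug. */
static MgmtDevState mgmt_on_device_deleted(MgmtDevState s)
{
    (void)s; /* whatever the previous state was, the device is gone now */
    return MGMT_DEV_GONE;
}

/* Should the device be put on the destination command line right now? */
static bool mgmt_include_on_destination(MgmtDevState s)
{
    /*
     * MGMT_DEV_UNPLUG_REQUESTED is the racy case: the device is still
     * present when the destination is started, but the guest may release
     * it on the source before the migration finishes.
     */
    return s != MGMT_DEV_GONE;
}
```

The sketch answers "include it" for a device in the unplug-requested state, which is exactly the window the thread is asking about: nothing here tells the management layer whether the guest will complete the unplug before, during, or after the migration.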

* Re: [Qemu-devel] unplug_request and migration
From: Juan Quintela @ 2017-06-08 16:07 UTC
  To: David Gibson; +Cc: dgilbert, qemu-ppc, qemu-devel

David Gibson <david@gibson.dropbear.id.au> wrote:
> Hi Dave & Juan,
>
> I'm hoping one of you can answer this.
>
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
>
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
>
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?

On upstream, I removed the possibility of doing a hotplug/unplug while
we are migrating.  But if device_del has already returned, I can't see
how that can be detected.

Later, Juan.

* Re: [Qemu-devel] unplug_request and migration
From: Igor Mammedov @ 2017-06-09 9:09 UTC
  To: David Gibson
  Cc: Dr. David Alan Gilbert, quintela, qemu-ppc, qemu-devel, jdenemar

On Fri, 9 Jun 2017 00:41:06 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> Hi Dave & Juan,
>
> I'm hoping one of you can answer this.
>
> I'm currently grappling with (amongst other things) a pseries machine
> racing a hot unplug operation with a migrate.  There's various issues
> with what interim state we need, and which bits of it need to be
> migrated that I'm still investigating.  But, there's a more general
> question that I'm guessing must have already been addressed for x86.
>
> For any "soft" unplug device - i.e. using ->unplug_request, rather
> than ->unplug, giving a device_del command will just ask the guest
> nicely to release the device, with the completion of the unplug
> happening only if and when the guest indicates it's ready for the
> device to go away.  AFAICT, the device_del command will return as soon
> as the request is made, but if the guest is busy, the completion of
> the hot unplug could take arbitrarily long.
>
> So, what happens if there's a migration in between the unplug_request
> and the guest completing the unplug?  How does libvirt (or whatever)
> know whether to include the device on the destination machine command
> line?
>

Looking at qdev_unplug():

    if (!migration_is_idle()) {
        error_setg(errp, "device_del not allowed while migrating");
        return;
    }

so an unplug request should fail if migration is in progress; it won't
reach the guest, and the mgmt side will have to repeat the request on
migration completion.

But it's still possible to issue an unplug request first and then start
migration;
that's where the race between DEVICE_DELETED and migration start
(starting the destination with a device that is still being unplugged)
occurs.

Two approaches could be possible:
 1: On unplug_request(), set a global flag that there is a pending
    unplug and forbid migration until completion.  But there is no
    guarantee that the unplug will be completed, nor a way to notice
    that it has failed or been rejected by the guest.  I'm not sure how
    that could be solved.
 2: Set a per-device pending_unplug flag and, if migration is in
    progress when the unplug callback is called, delay the unplug event
    from the guest until migration is completed.  Mgmt will treat the
    case as a usual migration, i.e. start the destination with the
    device that is being unplugged, and the device will be removed on
    the destination side on migration completion.
    (It should be a generic solution, as x86 is also affected.)  As the
    place to put this common logic I'd suggest hotplug_handler_unplug().
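
[Editorial sketch] Approach 2 above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions, not a QEMU patch: ToyDevice, toy_migration_running, toy_guest_released and toy_migration_completed are hypothetical stand-ins, and the single boolean replaces the real migration_is_idle() check quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real QEMU types or functions. */
typedef struct ToyDevice {
    bool present;        /* device is still plugged in */
    bool pending_unplug; /* guest released it while migration was running */
} ToyDevice;

/* Stand-in for "!migration_is_idle()". */
static bool toy_migration_running;

/* The guest signals that it has released the device (the unplug callback). */
static void toy_guest_released(ToyDevice *dev)
{
    if (toy_migration_running) {
        /* Defer the removal so the device set stays stable mid-migration. */
        dev->pending_unplug = true;
        return;
    }
    /* No migration in flight: complete the unplug immediately. */
    dev->present = false;
}

/* Migration has finished: complete every unplug that was deferred. */
static void toy_migration_completed(ToyDevice *devs, size_t n)
{
    toy_migration_running = false;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].pending_unplug) {
            devs[i].pending_unplug = false;
            devs[i].present = false;
        }
    }
}
```

The model treats source and destination as one process. In a real implementation the per-device flag would presumably have to travel with the migrated state so that the destination can finish the removal, which is the part the proposal leaves open.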

* Re: [Qemu-devel] unplug_request and migration
From: David Gibson @ 2017-06-09 10:03 UTC
  To: Igor Mammedov
  Cc: Dr. David Alan Gilbert, quintela, qemu-ppc, qemu-devel, jdenemar

On Fri, Jun 09, 2017 at 11:09:10AM +0200, Igor Mammedov wrote:
> On Fri, 9 Jun 2017 00:41:06 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
>
> > Hi Dave & Juan,
> >
> > I'm hoping one of you can answer this.
> >
> > I'm currently grappling with (amongst other things) a pseries machine
> > racing a hot unplug operation with a migrate.  There's various issues
> > with what interim state we need, and which bits of it need to be
> > migrated that I'm still investigating.  But, there's a more general
> > question that I'm guessing must have already been addressed for x86.
> >
> > For any "soft" unplug device - i.e. using ->unplug_request, rather
> > than ->unplug, giving a device_del command will just ask the guest
> > nicely to release the device, with the completion of the unplug
> > happening only if and when the guest indicates it's ready for the
> > device to go away.  AFAICT, the device_del command will return as soon
> > as the request is made, but if the guest is busy, the completion of
> > the hot unplug could take arbitrarily long.
> >
> > So, what happens if there's a migration in between the unplug_request
> > and the guest completing the unplug?  How does libvirt (or whatever)
> > know whether to include the device on the destination machine command
> > line?
> >
>
> Looking at qdev_unplug():
>
>     if (!migration_is_idle()) {
>         error_setg(errp, "device_del not allowed while migrating");
>         return;
>     }
>
> so an unplug request should fail if migration is in progress; it won't
> reach the guest, and the mgmt side will have to repeat the request on
> migration completion.
>
> But it's still possible to issue an unplug request first and then start
> migration;

Right, that's the case I'm interested in, not the other way around.

> that's where the race between DEVICE_DELETED and migration start
> (starting the destination with a device that is still being unplugged)
> occurs.
>
> Two approaches could be possible:
>  1: On unplug_request(), set a global flag that there is a pending
>     unplug and forbid migration until completion.  But there is no
>     guarantee that the unplug will be completed, nor a way to notice
>     that it has failed or been rejected by the guest.  I'm not sure how
>     that could be solved.
>  2: Set a per-device pending_unplug flag and, if migration is in
>     progress when the unplug callback is called, delay the unplug event
>     from the guest until migration is completed.  Mgmt will treat the
>     case as a usual migration, i.e. start the destination with the
>     device that is being unplugged, and the device will be removed on
>     the destination side on migration completion.
>     (It should be a generic solution, as x86 is also affected.)  As the
>     place to put this common logic I'd suggest hotplug_handler_unplug().

So.. it seems like the short version is that racing migration and
unplug is broken already.

Which is unfortunate, but at least means I don't need to worry about
it particularly for Power.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] unplug_request and migration
From: Juan Quintela @ 2017-06-09 12:18 UTC
  To: David Gibson
  Cc: Igor Mammedov, Dr. David Alan Gilbert, qemu-ppc, qemu-devel, jdenemar

David Gibson <david@gibson.dropbear.id.au> wrote:
> On Fri, Jun 09, 2017 at 11:09:10AM +0200, Igor Mammedov wrote:
>> On Fri, 9 Jun 2017 00:41:06 +1000
>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>
>> > Hi Dave & Juan,
>> >
>> > I'm hoping one of you can answer this.
>> >
>> > I'm currently grappling with (amongst other things) a pseries machine
>> > racing a hot unplug operation with a migrate.  There's various issues
>> > with what interim state we need, and which bits of it need to be
>> > migrated that I'm still investigating.  But, there's a more general
>> > question that I'm guessing must have already been addressed for x86.
>> >
>> > For any "soft" unplug device - i.e. using ->unplug_request, rather
>> > than ->unplug, giving a device_del command will just ask the guest
>> > nicely to release the device, with the completion of the unplug
>> > happening only if and when the guest indicates it's ready for the
>> > device to go away.  AFAICT, the device_del command will return as soon
>> > as the request is made, but if the guest is busy, the completion of
>> > the hot unplug could take arbitrarily long.
>> >
>> > So, what happens if there's a migration in between the unplug_request
>> > and the guest completing the unplug?  How does libvirt (or whatever)
>> > know whether to include the device on the destination machine command
>> > line?
>> >
>>
>> Looking at qdev_unplug():
>>
>>     if (!migration_is_idle()) {
>>         error_setg(errp, "device_del not allowed while migrating");
>>         return;
>>     }
>>
>> so an unplug request should fail if migration is in progress; it won't
>> reach the guest, and the mgmt side will have to repeat the request on
>> migration completion.
>>
>> But it's still possible to issue an unplug request first and then start
>> migration;
>
> Right, that's the case I'm interested in, not the other way around.
>
>> that's where the race between DEVICE_DELETED and migration start
>> (starting the destination with a device that is still being unplugged)
>> occurs.
>>
>> Two approaches could be possible:
>>  1: On unplug_request(), set a global flag that there is a pending
>>     unplug and forbid migration until completion.  But there is no
>>     guarantee that the unplug will be completed, nor a way to notice
>>     that it has failed or been rejected by the guest.  I'm not sure how
>>     that could be solved.
>>  2: Set a per-device pending_unplug flag and, if migration is in
>>     progress when the unplug callback is called, delay the unplug event
>>     from the guest until migration is completed.  Mgmt will treat the
>>     case as a usual migration, i.e. start the destination with the
>>     device that is being unplugged, and the device will be removed on
>>     the destination side on migration completion.
>>     (It should be a generic solution, as x86 is also affected.)  As the
>>     place to put this common logic I'd suggest hotplug_handler_unplug().
>
> So.. it seems like the short version is that racing migration and
> unplug is broken already.
>
> Which is unfortunate, but at least means I don't need to worry about
> it particularly for Power.

Yeap.  I think the patches I put in (for 2.10) to disable hot[un]plug
during migration were the first attempt to do something about it.

Later, Juan.

Thread overview: 7+ messages
2017-06-08 14:41 [Qemu-devel] unplug_request and migration David Gibson
2017-06-08 15:10 ` Dr. David Alan Gilbert
2017-06-08 15:44   ` Dr. David Alan Gilbert
2017-06-08 16:07 ` Juan Quintela
2017-06-09  9:09 ` Igor Mammedov
2017-06-09 10:03   ` David Gibson
2017-06-09 12:18     ` Juan Quintela