From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58424) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZUaND-00034W-Rk for qemu-devel@nongnu.org; Wed, 26 Aug 2015 09:04:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZUaN8-0001Ih-Vj for qemu-devel@nongnu.org; Wed, 26 Aug 2015 09:04:15 -0400 References: <1439470382-17540-1-git-send-email-lvivier@redhat.com> <20150814052055.GF4587@in.ibm.com> <55CD9CE9.8010109@redhat.com> <20150823190830.11834.17712@loki> From: Laurent Vivier Message-ID: <55DDB947.2080302@redhat.com> Date: Wed, 26 Aug 2015 15:04:07 +0200 MIME-Version: 1.0 In-Reply-To: <20150823190830.11834.17712@loki> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH RFC] pseries: define coldplugged devices as "configured" List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Michael Roth , bharata@linux.vnet.ibm.com Cc: dgibson@redhat.com, qemu-ppc@nongnu.org, Alexander Graf , qemu-devel@nongnu.org On 23/08/2015 21:08, Michael Roth wrote: > Quoting Laurent Vivier (2015-08-14 02:46:49) >> >> >> On 14/08/2015 07:20, Bharata B Rao wrote: >>> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote: >>>> When a device is hotplugged, attach() sets "configured" to >>>> false, waiting an action from the OS to configure it and then >>>> to call ibm,configure-connector. On ibm,configure-connector, >>>> the hypervisor sets "configured" to true. >>>> >>>> In case of coldplugged device, attach() sets "configured" to >>>> false, but firmware and OS never call the ibm,configure-connector >>>> in this case, so it remains set to false. >>>> >>>> It could be harmless, but when we unplug a device, hypervisor >>>> waits the device becomes configured because for it, a not configured >>>> device is a device being configured, so it waits the end of configuration >>>> to unplug it... and it never happens, so it is never unplugged. >>> >>> Not true for at least logical DR device like CPU. I am able to cleanly >>> unplug a cold plugged CPU in the patchset I posted at: >>> >>> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html >>> >>> And this is how the state transitions work for cold plugged CPU devices: >>> >>> - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE >>> and isolation_state=UNISOLATED. >>> - device_del results in drck->detach() that just returns by setting >>> drc->awaiting_release to true. >>> - Unplug notification is sent to guest. >>> - Guest comes back with set_indicator RTAS call for setting isolation_state >>> to ISOLATED. set_isolation_state() sets drc->configured to false. >>> - Guest comes back again with set_indicator RTAS call for setting allocation >>> state to UNUSABLE. set_allocation_state() finalizes the device removal by >>> calling drck->detach() >> >> It doesn't work for PCI, because (QEMU 2.4.0): >> >> static int set_allocation_state(sPAPRDRConnector *drc, >> sPAPRDRAllocationState state) >> ... >> if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) { >> ... >> drck->detach(drc, DEVICE(drc->dev), drc->detach_cb, >> drc->detach_cb_opaque, NULL); >> ... > > Ok, that makes sense then: > > the is_configured() checks were added due to a race specifically with > PCI devices: when we plug the device we hand control over to OS and set > state to unisolated as a result. The guest assumes 'interactive' hotplug > where it sets a slot back to isolated and waits for the user to actually > plug it in. Once plugged in, state is moved back to isolated, and guest > starts configuring device. We use a flag in guest drmgr invocation to skip > the wait, but it *still* does the change to isolated state. So there's an > extra unisolated->isolated->unisolated transition for PCI in guest code. > > Because of that check, if management does a quick device_add+device_del, > there's a race where we mark the device as awaiting_release as soon as > the device_del comes in (even though device_add event might still be > getting processed by guest). That would fine normally, but in this state > a transition to isolated state results in the device getting immediately > finalized and then disappearing while the guest is trying to configure > it, so the extra transition in the PCI case races with device_del. > > The is_configured() check removes that race window, and the check was > added in set_isolation(). 'logical' resources (lmb/cpu/phb) get > finalized via set_allocation() however, which is why they didn't appear > affected by this bug. And from what I can tell, cpu/lmb don't make extra > 'isolated'/'unallocated' transitions, just the ones at the end unplug, > so the fact that we're missing the check in set_allocation() shouldn't > be a problem. Makes sense to set the configured flag appropriately for > those case as well though for consistency. > > Reviewed-by: Michael Roth David or Alex, are you ready to take this patch to your -next branch ? > >> } >> >>> - drck->detach() now calls drc->detach_cb() that truly releases the >>> CPU resource by getting rid of vCPU thread in QEMU. >> >> Laurent >> >