From: Laurent Vivier <lvivier@redhat.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>, bharata@linux.vnet.ibm.com
Cc: dgibson@redhat.com, qemu-ppc@nongnu.org,
Alexander Graf <agraf@suse.de>,
qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH RFC] pseries: define coldplugged devices as "configured"
Date: Wed, 26 Aug 2015 15:04:07 +0200
Message-ID: <55DDB947.2080302@redhat.com>
In-Reply-To: <20150823190830.11834.17712@loki>

On 23/08/2015 21:08, Michael Roth wrote:
> Quoting Laurent Vivier (2015-08-14 02:46:49)
>>
>>
>> On 14/08/2015 07:20, Bharata B Rao wrote:
>>> On Thu, Aug 13, 2015 at 02:53:02PM +0200, Laurent Vivier wrote:
>>>> When a device is hotplugged, attach() sets "configured" to
>>>> false and waits for the OS to configure the device and then
>>>> call ibm,configure-connector. On ibm,configure-connector,
>>>> the hypervisor sets "configured" to true.
>>>>
>>>> For a coldplugged device, attach() also sets "configured" to
>>>> false, but firmware and OS never call ibm,configure-connector
>>>> in this case, so it remains set to false.
>>>>
>>>> This could be harmless, but when we unplug a device, the hypervisor
>>>> waits for the device to become configured, because it treats an
>>>> unconfigured device as one that is still being configured. It waits
>>>> for configuration to end before unplugging... which never happens,
>>>> so the device is never unplugged.
>>>
>>> Not true, at least for logical DR devices like CPUs. I am able to cleanly
>>> unplug a cold plugged CPU in the patchset I posted at:
>>>
>>> https://lists.gnu.org/archive/html/qemu-ppc/2015-08/msg00041.html
>>>
>>> And this is how the state transitions work for cold plugged CPU devices:
>>>
>>> - Cold plugged CPU DRC is explicitly set with allocation_state=USABLE
>>> and isolation_state=UNISOLATED.
>>> - device_del results in drck->detach() that just returns by setting
>>> drc->awaiting_release to true.
>>> - Unplug notification is sent to guest.
>>> - Guest comes back with set_indicator RTAS call for setting isolation_state
>>> to ISOLATED. set_isolation_state() sets drc->configured to false.
>>> - Guest comes back again with set_indicator RTAS call for setting allocation
>>> state to UNUSABLE. set_allocation_state() finalizes the device removal by
>>> calling drck->detach()
>>
>> It doesn't work for PCI, because (QEMU 2.4.0):
>>
>> static int set_allocation_state(sPAPRDRConnector *drc,
>> sPAPRDRAllocationState state)
>> ...
>> if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
>> ...
>> drck->detach(drc, DEVICE(drc->dev), drc->detach_cb,
>> drc->detach_cb_opaque, NULL);
>> ...
>
> Ok, that makes sense then:
>
> the is_configured() checks were added due to a race specifically with
> PCI devices: when we plug the device we hand control over to OS and set
> state to unisolated as a result. The guest assumes 'interactive' hotplug
> where it sets a slot back to isolated and waits for the user to actually
> plug it in. Once plugged in, state is moved back to unisolated, and guest
> starts configuring device. We use a flag in guest drmgr invocation to skip
> the wait, but it *still* does the change to isolated state. So there's an
> extra unisolated->isolated->unisolated transition for PCI in guest code.
>
> Because of that check, if management does a quick device_add+device_del,
> there's a race where we mark the device as awaiting_release as soon as
> the device_del comes in (even though device_add event might still be
> getting processed by guest). That would be fine normally, but in this state
> a transition to isolated state results in the device getting immediately
> finalized and then disappearing while the guest is trying to configure
> it, so the extra transition in the PCI case races with device_del.
>
> The is_configured() check removes that race window, and the check was
> added in set_isolation(). 'logical' resources (lmb/cpu/phb) get
> finalized via set_allocation() however, which is why they didn't appear
> affected by this bug. And from what I can tell, cpu/lmb don't make extra
> 'isolated'/'unallocated' transitions, just the ones at the end of unplug,
> so the fact that we're missing the check in set_allocation() shouldn't
> be a problem. Makes sense to set the configured flag appropriately for
> those cases as well though for consistency.
>
> Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

David or Alex, are you ready to take this patch to your -next branch?

>
>> }
>>
>>> - drck->detach() now calls drc->detach_cb() that truly releases the
>>> CPU resource by getting rid of vCPU thread in QEMU.
>>
>> Laurent
>>
>