From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:41700) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZZrwD-0000oE-Fr for qemu-devel@nongnu.org; Wed, 09 Sep 2015 22:50:14 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZZrwA-0006N0-7Y for qemu-devel@nongnu.org; Wed, 09 Sep 2015 22:50:13 -0400 Received: from e19.ny.us.ibm.com ([129.33.205.209]:42170) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZZrwA-0006Mn-3a for qemu-devel@nongnu.org; Wed, 09 Sep 2015 22:50:10 -0400 Received: from /spool/local by e19.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 9 Sep 2015 22:50:09 -0400 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable From: Michael Roth In-Reply-To: <20150910011800.GF17641@voom.redhat.com> References: <1441755895-8920-1-git-send-email-mdroth@linux.vnet.ibm.com> <20150909041041.GA17641@voom.redhat.com> <20150909171922.3885.44908@loki> <20150910011800.GF17641@voom.redhat.com> Message-ID: <20150910025001.3885.59200@loki> Date: Wed, 09 Sep 2015 21:50:01 -0500 Subject: Re: [Qemu-devel] [PATCH v2] spapr_drc: don't allow 'empty' DRCs to be unisolated List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, Bharata B Rao Quoting David Gibson (2015-09-09 20:18:01) > On Wed, Sep 09, 2015 at 12:19:22PM -0500, Michael Roth wrote: > > Quoting David Gibson (2015-09-08 23:10:41) > > > On Tue, Sep 08, 2015 at 06:44:55PM -0500, Michael Roth wrote: > > > > Logical resources start with allocation-state:UNUSABLE / > > > > isolation-state:ISOLATED. During hotplug, guests will transition > > > > them to allocate-state:USABLE, and then to isolate-state:UNISOLATED. > > > > The former transition does not seem to have any failure path for > > > > cases where a DRC does not have any resources associated with it to > > > > allocate for guest, but instead relies on the subsequent > > > > isolation-state:UNISOLATED transition to indicate failure in this > > > > situation. > > > > = > > > > Currently DRC code does not implement this logic, but instead > > > > tries to indicate failure by refusing the allocation-state:USABLE > > > > transition. Unfortunately, since that's not a documented failure > > > > path, guests continue undeterred, causing undefined behavior in > > > > QEMU and guest code. > > > > = > > > > Fix this by handling things as PAPR defines (13.7 and 13.7.3.1). > > > > = > > > > Cc: qemu-ppc@nongnu.org > > > > Cc: David Gibson > > > > Cc: Bharata B Rao > > > > Signed-off-by: Michael Roth > > > > --- > > > > v2: > > > > - actually include the full changeset in the patch > > > = > > > Several queries for clarification: > > = > > Looks like you've already picked this up, but just in case: > > = > > > = > > > * Is this intended to replace Bharata's patch "spapr_drc: > > > Return correct state for logical DR in entity_sense()" or to apply > > > on top of it? > > = > > Yes this should replace that patch. > > = > > > = > > > * If I'm understanding correctly, the problem here is that although > > > the guest is supposed to check for failures to set the allocation > > > state, it's actually not? This patch is to make qemu gracefully > > > handle the guest's failure to do this? Is that right? > > = > > The root issue is that allocation state setting doesn't actually have > > a documented failure path. Instead, the subsequent isolation state > > setting is where we surface an error if we're unable to actually > > provide the device to the guest (due to it not being attached > > to the DRC in question). > = > Um.. 13.7.3.1 (step 2) implies checking for failures on setting > allocation state, and 13.5.3.5 Table 177 says "If the transition to > the usable state is not possible the -3 (no such indicator > implemented) status is returned." > = > How is that not a documented failure path? Hmm, I think I misunderpreted this statement from 13.7: "The OS may use the get-sensor-state RTAS call with the dr-entity-sense token to deter- mine if a given drc-index refers to a connector that is currently usable for DR operations. If the connector is not currently usable the return state is =E2=80=9CDR entity unusable=E2=80=9D (= 2). A set-indicator (isolation state) RTAS call to an unusable connector or (dr-indicator) to any logical resource connector results in a =E2=80=9CNo such indicator implemented=E2=80=9D return sta- tus." To me it seemed to suggest that isolation-state is where we fail attempts to bring up an UNUSABLE resource, but the statement doesn't actually preclude us from returning error on setting allocation-state. I'm not sure how I misinterpreted 13.7.3.1 to support this assumption, it does seem to clearly indicate setting allocation-state can return error (and in fact mentions nothing about accounting for isolation-state errors in that case), and 13.5.3.5 makes it clear it should also be -3 in that case. I'll send a v3 with the set-allocation logic in place. I'll also look into returning RTAS errors directly. Thanks for the catch! > = > > We weren't surfacing that error in the isolation state setting, so the > > guest was continuing on with configuring devices that aren't actually > > present. This patch should correct that. > = > So, to be clear, I think the check in set isolation-state is a good > idea as well, to properly handle a misbehaving guest. > = > > Personally it seems like allocation state setting is where that failure > > should occur, since that's the earliest point to surface the error, but > > that's how PAPR has it documented. > = > .. but I'm not seeing how PAPR is documenting it as happening in set > isolation state rather than set allocation-state. > = > -- = > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _othe= r_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson