From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54649) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YZqXH-0005KJ-9w for qemu-devel@nongnu.org; Sun, 22 Mar 2015 20:48:11 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YZqXD-0001uC-8F for qemu-devel@nongnu.org; Sun, 22 Mar 2015 20:48:07 -0400
Date: Mon, 23 Mar 2015 11:48:35 +1100
From: David Gibson
Message-ID: <20150323004835.GC25043@voom.fritz.box>
References: <1425006675-19976-8-git-send-email-mdroth@linux.vnet.ibm.com> <20150302070246.GH29409@voom.fritz.box> <20150303044016.27171.3218@loki> <20150303053339.GN29409@voom.fritz.box> <20150304055034.27171.34590@loki> <20150304133708.27171.49509@loki> <20150305043040.GL18072@voom.fritz.box> <20150305141258.2674.35847@loki> <20150312055210.GT11973@voom.redhat.com> <20150317033129.16682.56158@loki>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="ZmUaFz6apKcXQszQ"
Content-Disposition: inline
In-Reply-To: <20150317033129.16682.56158@loki>
Subject: Re: [Qemu-devel] [PATCH v6 07/15] spapr_rtas: add ibm, configure-connector RTAS interface
To: Michael Roth
Cc: aik@ozlabs.ru, qemu-devel@nongnu.org, agraf@suse.de, ncmike@ncultra.org, qemu-ppc@nongnu.org, tyreld@linux.vnet.ibm.com, bharata.rao@gmail.com, nfont@linux.vnet.ibm.com

--ZmUaFz6apKcXQszQ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, Mar 16, 2015 at 10:31:29PM -0500, Michael Roth wrote:
> Quoting David Gibson (2015-03-12 00:52:10)
> > On Thu, Mar 05, 2015 at 08:12:58AM -0600, Michael Roth wrote:
> > > Quoting David Gibson (2015-03-04 22:30:40)
> > > > On Wed, Mar 04, 2015 at 07:37:08AM -0600, Michael Roth wrote:
> > > > > Quoting Michael Roth (2015-03-03 23:50:34)
> > > > > > Quoting David Gibson (2015-03-02 23:33:39)
> > > > > > > On Mon, Mar 02, 2015 at 10:40:16PM -0600, Michael Roth wrote:
> > > > > > > > Quoting David Gibson (2015-03-02 01:02:46)
> > > > > > > > > On Thu, Feb 26, 2015 at 09:11:07PM -0600, Michael Roth wrote:
> > > > > > > > > > This interface is used to fetch an OF device-tree node that describes a
> > > > > > > > > > newly-attached device to the guest. It is called multiple times to walk the
> > > > > > > > > > device-tree node and fetch individual properties into a 'workarea'/buffer
> > > > > > > > > > provided by the guest.
> > > > > > > > > >
> > > > > > > > > > The device-tree is generated by QEMU and passed to an sPAPRDRConnector during
> > > > > > > > > > the initial hotplug operation, and the state of these RTAS calls is tracked by
> > > > > > > > > > the sPAPRDRConnector. When the last of these properties is successfully
> > > > > > > > > > fetched, we report a special return value to the guest and transition
> > > > > > > > > > the device to a 'configured' state on the QEMU/DRC side.
> > > > > > > > > >
> > > > > > > > > > See docs/specs/ppc-spapr-hotplug.txt for a complete description of
> > > > > > > > > > this interface.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Michael Roth
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > So, actually, here's probably the best place to explain what I had in
> > > > > > > > > mind for changing the internal interface for this stuff. I was
> > > > > > > > > thinking something like this pseudocode:
> > > > > > > > >
> > > > > > > > >     struct DRCCCState {
> > > > > > > > >         void *fdt;
> > > > > > > > >         int offset;
> > > > > > > > >         int depth;
> > > > > > > > >     };
> > > > > > > > >
> > > > > > > > >     rtas_configure_connector()
> > > > > > > > >     {
> > > > > > > > >         ...
> > > > > > > > >         DRCCCState *ccstate;
> > > > > > > > >         ...
> > > > > > > > >
> > > > > > > > >         /* check parameters, retrieve drc */
> > > > > > > > >         ccstate = drc->ccstate;
> > > > > > > > >
> > > > > > > > >         if (!ccstate) {
> > > > > > > > >             /* Haven't started configuring yet */
> > > > > > > > >             ccstate = malloc(...);
> > > > > > > > >             /* Retrieve the dt fragment from the backend */
> > > > > > > > >             ccstate->fdt = drck->get_dt(...);
> > > > > > > > >             ccstate->offset = 0;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         while (get next tag from fdt) {
> > > > > > > > >             switch (tag) {
> > > > > > > > >             case FDT_PROP:
> > > > > > > > >                 /* Translate property into rtas return values */
> > > > > > > > >                 return SPAPR_DR_CC_RESPONSE_NEXT_PROPERTY;
> > > > > > > > >
> > > > > > > > >             /* other cases ... */
> > > > > > > > >             }
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         /* Fall through only if we've completed streaming out the dt */
> > > > > > > > >
> > > > > > > > >         /* Tell the back end we've finished configuring */
> > > > > > > > >         drck->cc_completed(...);
> > > > > > > > >         return SPAPR_DR_CC_RESPONSE_SUCCESS;
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > On reset, or anything else which interrupts the configuration process,
> > > > > > > > > just blow away drc->ccstate.
> > > > > > > >
> > > > > > > > Ok, that seems reasonable. I took a stab at it here:
> > > > > > > >
> > > > > > > >   https://github.com/mdroth/qemu/commit/79ce372743da1b63a6fa33e3de1f1daba8ea1fdc
> > > > > > > >   https://github.com/mdroth/qemu/commits/spapr-hotplug-pci
> > > > > > >
> > > > > > > It's looking pretty close now, thanks for the rework.
> > > > > > >
> > > > > > > > It exposes the ccstate as you suggested, via drck->get_cc_state(), and in
> > > > > > > > place of drck->cc_completed() I have drck->set_configured() which serves
> > > > > > > > roughly the same purpose I think. I opted not to let RTAS handle
> > > > > > > > allocation, since it seemed to imply RTAS owns it and not the DRC.
> > > > > > >
> > > > > > > So, that was intentional; basically RTAS *does* own the CCstate. But
> > > > > > > for convenience of indexing we need to connect it to the DRC. Think of it
> > > > > > > like an rtas_priv field in the DRC.
> > > > > > >
> > > > > > > In particular I think the CCstate should be opaque to everything
> > > > > > > except the RTAS code itself, which means initializing the offset and
> > > > > > > depth in RTAS, not in a drck callback. As far as the drck callback
> > > > > > > is concerned, it's supplying a dt fragment, but it doesn't care about
> > > > > > > the details of how the upper layer communicates that through to the
> > > > > > > guest.
> > > > > >
> > > > > > Ah ok, so it was about moving the CCState out of DRC, and not just the
> > > > > > awkward interface that wraps FDT traversal. So I went ahead and did it
> > > > > > as you suggested, but also making it actually opaque, and relying on
> > > > > > a couple of callbacks that configure-connector passes to
> > > > > > drc->begin_configure_connector to handle init/reset of the CCState
> > > > > > fields (such as the fdt, and the start offset (which isn't necessarily 0)):
> > > > > >
> > > > > >   https://github.com/mdroth/qemu/commits/spapr-hotplug-pci
> > > > > >   https://github.com/mdroth/qemu/commit/732aa10fa2e41951c396373e7df7d31861322531
> > > > > >
> > > > > > I think I have all your other comments addressed, so if that looks ok
> > > > > > I'll post v7 soon. Thanks!
> > > > >
> > > > > Yikes, just noticed a use-after-free in the new code. Fixed here:
> > > > >
> > > > >   https://github.com/mdroth/qemu/commit/3fd03f649dc5cd34aa6e2544d38855dd0f8b3708
> > > >
> > > > Ok, I'm now getting myself a bit tangled in the various revisions.
> > > > However looking at
> > > >
> > > > https://github.com/mdroth/qemu/commit/732aa10fa2e41951c396373e7df7d31861322531
> > > >
> > > > The ->begin_configure_connector stuff seems unnecessarily
> > > > complicated. Couldn't you just have begin_configure_connector()
> > > > return the fdt, then initialize ccs in rtas_ibm_configure_connector()
> > > > itself, avoiding the callback-from-a-callback?
> > >
> > > We need the fdt, as well as the fdt starting offset, to initialize the CCS.
> >
> > Do you actually have a use-case for a non-zero starting offset? Or
> > could you simplify by having the individual PCI device always create
> > its fdt fragment at offset 0?
>
> Something as simple as:
>
>   offset = fdt_add_subnode(fdt, 0, "pci@2");
>
> Results in offset = 8

Oh, right, of course. I'd forgotten that your sample node wasn't the
"root" of your fragmentary tree, but instead a subnode of that unused
root.

> I'm not sure exactly why, but I guess a subnode has an inherent offset associated
> with it.

Well, in this case it will be the FDT_BEGIN_NODE for the fragment's
root node at offset 0, the root node's name ("") at offset 4, then we
align to a 4 byte boundary, leaving the next tag, FDT_BEGIN_NODE for
the node you care about, at offset 8.

So.. it would be possible to bake the PCI tree fragments with the
actual PCI node as the root node (this sort of case is why we allow a
name on the root node, although it's usually ""). It might be a bit
awkward though - it can't be done with fdt_add_subnode(), only with
fdt_begin_node().

> I've since found that fdt_offset_ptr() can be used to bake the offset into the fdt
> pointer, so RTAS can treat the offset as 0 from that point forward.

Hrm.. I'm not sure quite what you have in mind here; it doesn't really
sound like an intended use case for fdt_offset_ptr() (which is mostly
intended as an internal interface only).

> I've implemented a drc->get_fdt() using this approach.
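
The offset arithmetic described above (tag at 0, root name at 4, pad to a
4 byte boundary, next tag at 8) can be checked with a small standalone
sketch. It deliberately does not call libfdt; align4() and
after_node_header() are hypothetical helper names, and the model assumes
the fragment's unused root node has no properties of its own, since any
root property would sit before the subnode and push it further out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes of one structure-block tag such as FDT_BEGIN_NODE. */
#define TAG_SIZE 4u

/* Round x up to the next multiple of 4, as the flattened format requires. */
size_t align4(size_t x)
{
    return (x + 3u) & ~(size_t)3u;
}

/*
 * Offset of the first tag that follows a node header starting at
 * node_offset: skip the FDT_BEGIN_NODE tag, then the NUL-terminated
 * node name, then pad up to a 4 byte boundary.
 */
size_t after_node_header(size_t node_offset, const char *name)
{
    assert(node_offset % 4 == 0);   /* tags are always 4-byte aligned */
    return align4(node_offset + TAG_SIZE + strlen(name) + 1);
}
```

Under that assumption, after_node_header(0, "") evaluates to 8, which
matches the offset reported for fdt_add_subnode(fdt, 0, "pci@2").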
>
> >
> > > I think it's a matter of taste whether those are returned separately,
> > > or through a callback passed via begin_configure_connector. The approach I
> > > took just seemed a bit more instructive about what data was needed,
> > > and why.
> >
> > > drck->get_fdt() and drck->get_fdt_starting_offset() instead of the
> > > callback seemed a bit too specific in purpose to warrant a general
> > > interface, and since we seem to need a reset_ccs anyway (see below),
> > > init_ccs seemed like a good place to contain those values.
> >
> > Um.. I'm a bit confused by this. You could return both the fdt
> > pointer and offset in one call using pointers or a structure return
> > value without needing to invoke a callback-from-a-callback.
>
> True, a get_fdt() could also take a pointer arg to store the offset, so
> that's doable. fdt_offset_ptr() is a bit cleaner though IMO.
>
> >
> > > I am fine with just initializing ccs via get_fdt()/get_fdt_starting_offset()
> > > beforehand though, but I do think we're stuck with a reset_ccs callback
> > > if we're agreed on drck->get_configure_connector_state() == NULL being
> > > the primary means to invalidate CCS state.
> >
> > Hm. I'll have to take another look. I'd really like to keep things
> > to a single set of callbacks if possible, rather than having both
> > callbacks and counter-callbacks, or whatever you want to call them.
>
> I ran it by Alex during his IBM visit, and it's seeming like this is
> turning out a bit more funky than necessary because we're trying to
> combine my approach of relying on the DRC to store the state as an
> opaque, while still keeping the state opaque to anything but RTAS.
>
> If we move the configure-connector state to a separate list as you
> originally suggested, the need for weird callbacks goes away.
>
> This does bring about my original concerns about having a way for the
> DRC to reset the state on 1) configure-connector reset, and 2) system reset.
>
> 1) can be addressed by making the observation that RTAS does know
> when to reset the configure-connector state, via rtas-set-indicator(ISOLATE).
> So if we do it that way, RTAS can zap/invalidate a CCS at the same points
> the DRC would have done it. We leak a little bit of the DRC state-machine,
> but it's fairly trivial.
>
> 2) can be addressed by registering a separate reset handler that clears the
> CCS list (which I've hung off of sPAPREnvironment, with
> spapr_ccs_{add,remove,reset_hook} to work with the list using drc_index
> as the key)

Ok. I'll have to look at the code again and see what it seems like.

> I've pushed the changes here:
>
>   https://github.com/mdroth/qemu/commits/spapr-hotplug-pci
>
>   2a6f2b2 *spapr_drc: make prop_get_fdt() standalone
>   9140fe4 *spapr_drc: add get_fdt() and set_configured()
>   d242fed *spapr_rtas: ibm,configure-connector, use get_fdt()/set_configured()
>   984ee1b *spapr_drc: drop the old stuff
>
> let me know if you'd prefer I just submit a v8.

If you could push the v8 please.

>
> >
> > > > I'm also not sure that reset_ccs is worth abstracting. I think it
> > > > would be reasonable just to say that freeing and setting to NULL the
> > > > ccs link is sufficient.
> > >
> > > But after allocation, rtas_configure_connector hands over the ccs link
> > > to DRC, and its local copy goes out of scope. The only way to retrieve
> > > it is via get_configure_connector_state(), so if the idea is to return
> > > NULL upon reset, we have no way to free the ccs structure. If we simply
> > > have DRC free it, we violate the idea that ccs state is opaque. So given
> > > the init_ccs callback above, it made sense to handle the free via a
> > > reset_ccs.
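
For readers following the separate-list idea discussed above, here is a
minimal sketch of configure-connector state kept in a list keyed by
drc_index. It is only an illustration: CCState, CCSList, ccs_find(),
ccs_add(), ccs_remove() and ccs_reset() are hypothetical names, not the
spapr_ccs_* helpers in the branch under review, and plain calloc()/free()
stand in for whatever allocator the real code uses. The point is that
RTAS can drop one entry on ISOLATE and a reset handler can drop them all,
without any callback into the DRC.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-connector configure-connector progress, private to the RTAS side. */
typedef struct CCState {
    uint32_t drc_index;     /* lookup key */
    int fdt_offset;         /* next structure-block offset to stream */
    int fdt_depth;          /* current node nesting depth */
    struct CCState *next;
} CCState;

typedef struct CCSList {
    CCState *head;
} CCSList;

/* Return the state for drc_index, or NULL if none is being tracked. */
CCState *ccs_find(CCSList *list, uint32_t drc_index)
{
    assert(list != NULL);
    for (CCState *ccs = list->head; ccs; ccs = ccs->next) {
        if (ccs->drc_index == drc_index) {
            return ccs;
        }
    }
    return NULL;
}

/* Start tracking a connector; returns the existing entry if there is one. */
CCState *ccs_add(CCSList *list, uint32_t drc_index, int start_offset)
{
    CCState *ccs = ccs_find(list, drc_index);
    if (ccs) {
        return ccs;
    }
    ccs = calloc(1, sizeof(*ccs));
    if (!ccs) {
        return NULL;
    }
    ccs->drc_index = drc_index;
    ccs->fdt_offset = start_offset;
    ccs->next = list->head;
    list->head = ccs;
    return ccs;
}

/* Drop one connector's state, e.g. when the guest isolates it. */
void ccs_remove(CCSList *list, uint32_t drc_index)
{
    for (CCState **pp = &list->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->drc_index == drc_index) {
            CCState *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Drop every entry, e.g. from a system reset handler. */
void ccs_reset(CCSList *list)
{
    while (list->head) {
        CCState *dead = list->head;
        list->head = dead->next;
        free(dead);
    }
}
```

Because the entries hold only an offset and a depth, nothing here needs to
free the fdt itself, which stays owned by the DRC for the device's lifetime.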
> > >
> > > >
> > > > That said, the current reset_ccs doesn't appear to be quite right,
> > > > since it frees the ccs structure, but not the fdt fragment it points
> > > > to. I'm not sure how awkward it would be to force them into a common
> > > > allocation to avoid that.
> > >
> > > You mean freeing the actual FDT data? In this case the FDT pointer is
> > > simply a pointer to the copy the DRC has, and the lifecycle of the FDT
> > > is tied to the device lifecycle, and spans beyond that of a CCS (since
> > > we can configure/unconfigure the same device multiple times without
> > > unplugging in between)
> >
> > Oh, ok. Why do you need a copy in ccstate then? The rtas code has
> > access to the drc structure as well.
>
> Hmm, true, we don't actually need a copy. It makes a little more
> sense when using the fdt_offset_ptr() approach to get rid of the offset,
> and I think now it makes the separation between DRC and
> rtas-configure-connector a bit more complete, but we could still just
> call drc->get_fdt() each time. Let me know if that's preferable and I'll
> work it in for the next submission.

I think it's preferable.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you. NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

--ZmUaFz6apKcXQszQ--