From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:33239) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SmmzP-0003tf-Ph for qemu-devel@nongnu.org; Thu, 05 Jul 2012 10:25:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SmmzG-0001yc-Ea for qemu-devel@nongnu.org; Thu, 05 Jul 2012 10:25:03 -0400 Received: from e34.co.us.ibm.com ([32.97.110.152]:60268) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SmmzG-0001xV-5V for qemu-devel@nongnu.org; Thu, 05 Jul 2012 10:24:54 -0400 Received: from /spool/local by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 5 Jul 2012 08:24:44 -0600 Received: from d03relay05.boulder.ibm.com (d03relay05.boulder.ibm.com [9.17.195.107]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id 43680C40018 for ; Thu, 5 Jul 2012 14:22:58 +0000 (WET) Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay05.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q65EMjxB208580 for ; Thu, 5 Jul 2012 08:22:48 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q65EMidk026761 for ; Thu, 5 Jul 2012 08:22:45 -0600 Message-ID: <4FF5A331.9030108@linux.vnet.ibm.com> Date: Thu, 05 Jul 2012 10:22:41 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1340390174-7493-1-git-send-email-coreyb@linux.vnet.ibm.com> <20120626091004.GA14451@redhat.com> <4FE9A0F0.2050809@redhat.com> <20120626175045.2c7011b3@doriath.home> <4FEA37A9.10707@linux.vnet.ibm.com> <4FEA3D9C.8080205@redhat.com> <4FF21A67.8010100@linux.vnet.ibm.com> <4FF31265.1000308@linux.vnet.ibm.com> <4FF316C9.5020100@redhat.com> <4FF31CFD.7030508@linux.vnet.ibm.com> <4FF325C8.4060401@redhat.com> <4FF3F806.7020307@redhat.com> In-Reply-To: <4FF3F806.7020307@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino , pbonzini@redhat.com, Eric Blake On 07/04/2012 04:00 AM, Kevin Wolf wrote: > Am 03.07.2012 19:03, schrieb Eric Blake: >>>>> 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the >>>>> set that has access flags matching the qemu_open action flags. >>>>> qemu_open increments refcount for this fd. >>>>> 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named >>>>> "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt >>>>> in-use flag turned on for fd. >>>>> 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first >>>>> fd from the set that has access flags matching the qemu_open action >>>>> flags. qemu_open increments refcount for this fd. >>>>> 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the >>>>> set. turns libvirt in-use flag off marking the fd ready to be closed >>>>> when qemu is done with it. >>>> >>>> If we decided to not return the individual fd numbers to libvirt, file >>>> descriptors would be uniquely identified by an fdset/flags pair here. >>>> >>> >>> Are you saying we'd pass the fdset name and flags parameters on >>> remove-fd to somehow identify the fds to remove? >> >> Passing the flag parameters is not trivial, as that would mean the QMP >> code would have to define constants mapping to all of the O_* flags that >> qemu_open supports. It's easier to support closing by fd number. > > Good point. So pass-fd or whatever it will be called needs to returnn > both fdset number and the individual fd number. > >>>>> 5. qemu_close decrements refcount for fd, and closes fd when refcount is >>>>> zero and libvirt in use flag is off. >>>> >>>> The monitor could just hold another reference, then we save the >>>> additional flag. But that's a qemu implementation detail. >>>> >>> >>> I'm not sure I understand what you mean. >> >> pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, >> with initial use count of 1 (the use is the monitor). qemu_open() >> increments the use count. A new qemu_close() wrapper would decrement >> the use count. And both calling 'remove-fd', or closing the QMP monitor >> of an fd that has not yet been passed through 'remove-fd', serves as a >> way to decrement the use count. You'd still have to track whether the >> monitor is using an fd (to avoid over-decrementing on QMP monitor >> close), but by having the monitor's use also tracked under the refcount, >> then refcount reaching 0 is sufficient to auto-close an fd. > >> I think >> that also means that re-establishing the client QMP connection would >> increment > > This is an interesting idea. I've never thought about what to do with a > reconnect. It felt a bit odd at first, but now your proposal totally > makes sense to me. > > >> For some examples: >> >> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in >> use by monitor, as member of fdset1 >> 2. client crashes, so all tracked fds are visited; fd=4 had not yet been >> passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is >> now 0 so qemu closes it > > Doesn't the refcount belong to an fdset rather than a single fd? As long > as the fdset has still a reference, it's possible to reach the other fds > in the set through a reopen. For example: > > 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member > of fdset1, in use by monitor, refcount 1 > 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a > member of fdset, in use by monitor, refcount is still 1 > 3. client calls 'device-add' with /dev/fdset/1 as the backing filename > and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 > 4. client crashes, so all tracked fdsets are visited; fdset1 gets its > refcount decreased to 1, and both member fds 4 and 5 stay open. > > If you had refcounted a single fd, fd 5 would be closed now and qemu > couldn't switch to O_RDONLY any more. > If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue? >> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in >> use by monitor, as member of fdset1 >> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, >> so qemu_open() increments the refcount to 2 >> 3. client crashes, so all tracked fds are visited; fd=4 had not yet been >> passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 >> open because it is still in use by the block device >> 4. client re-establishes QMP connection, and 'query-fds' lets client >> learn about fd=4 still being open as part of fdset1, but also informs >> client that fd is not in use by the monitor >> >> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in >> use by monitor, as member of fdset1 >> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, >> so qemu_open() increments the refcount to 2 >> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no >> longer in use by the monitor, refcount decremented to 1 but still left >> open because it is in use by the block device >> 4. client crashes, so all tracked fds are visited; but fd=4 is already >> marked as not in use by the monitor, so its refcount is unchanged >> >> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in >> use by monitor, as member of fdset1 >> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, >> but the command fails for some other reason, so the refcount is still 1 >> at the end of the command (although it may have been temporarily >> incremented then decremented during the command) >> 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or >> QMP connection is closed), so qemu marks fd=4 as no longer in use by the >> monitor, refcount is now decremented to 0 and fd=4 is closed > > Yup, apart from the comment above the examples look good to me. > >> I think that covers the idea; you need a bool in_use for tracking >> monitor state (the monitor is in use until either a remove-fd or a >> monitor connection closes), as well as a ref-count. > > Yes, makes sense to me. > >>> We also need a query-fdsets command that lists all fdsets that exist. If >>>> we add information about single fds to the return value of it, we >>>> probably don't need a separate query-fd that operates on a single fdset. >>>> >>> >>> Yes, good point. And maybe we don't need 2 commands. query-fdsets >>> could return all the sets and all the fds that are in those sets. >> >> Yes, I think a single query command is good enough here, something like: >> >> { "execute":"query-fdsets" } => >> { "return" : { "sets": [ >> { "name": "fdset1", >> "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] }, >> { "name": "fdset2", >> "fds": [ { "fd": 5, "monitor": false, "refcount": 1 }, >> { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } } > > Looks good, this is exactly what I was thinking of. > >>>> In use by whom? If it's still in use in qemu (as in "in-use flag would >>>> be set") and we have a refcount of zero, then that's a bug. >>>> >>> >>> In use by qemu. I don't think it's a bug. I think there are situations >>> where refcount gets to zero but qemu is still using the fd. >> >> I think the refcount being non-zero _is_ what defines an fd as being in >> use by qemu (including use by the monitor). Any place you have to close >> an fd before reopening it is dangerous; the safe way is always to open >> with the new permissions before closing the old permissions. > > There is one case I'm aware of where we need to be careful: Before > opening an image, qemu may probe the format. In this case, the image > gets opened twice, and the first close comes before the second open. I'm > not entirely sure how hard it would be to get rid of that behaviour. > > If we can't get rid of it, we have a small window that the refcount > doesn't really cover, and if we weren't careful it would become racy. > This is why I mentioned earlier that maybe we need to defer the refcount > decrease a bit. However, I can't see how the in-use flag would make a > difference there. If the refcount can't cover it, the in-use flag can't > either. Yeah this is a problem. Could we introduce another flag to cover this? -- Regards, Corey