From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:51614) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Sm5Ld-0004by-1n for qemu-devel@nongnu.org; Tue, 03 Jul 2012 11:49:11 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Sm5LV-0004vi-PP for qemu-devel@nongnu.org; Tue, 03 Jul 2012 11:49:04 -0400 Received: from e38.co.us.ibm.com ([32.97.110.159]:42088) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Sm5LV-0004tz-A6 for qemu-devel@nongnu.org; Tue, 03 Jul 2012 11:48:57 -0400 Received: from /spool/local by e38.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 3 Jul 2012 09:48:50 -0600 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by d03dlp02.boulder.ibm.com (Postfix) with ESMTP id DAE113E40181 for ; Tue, 3 Jul 2012 15:41:28 +0000 (WET) Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q63Fesbg136692 for ; Tue, 3 Jul 2012 09:41:12 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q63FeOpO029925 for ; Tue, 3 Jul 2012 09:40:24 -0600 Message-ID: <4FF31265.1000308@linux.vnet.ibm.com> Date: Tue, 03 Jul 2012 11:40:21 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1340390174-7493-1-git-send-email-coreyb@linux.vnet.ibm.com> <20120626091004.GA14451@redhat.com> <4FE9A0F0.2050809@redhat.com> <20120626175045.2c7011b3@doriath.home> <4FEA37A9.10707@linux.vnet.ibm.com> <4FEA3D9C.8080205@redhat.com> <4FF21A67.8010100@linux.vnet.ibm.com> In-Reply-To: <4FF21A67.8010100@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , Kevin Wolf Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino , pbonzini@redhat.com On 07/02/2012 06:02 PM, Corey Bryant wrote: > > > On 06/26/2012 06:54 PM, Eric Blake wrote: >> On 06/26/2012 04:28 PM, Corey Bryant wrote: >> >>>>>> With this proposed series, we have usage akin to: >>>>>> >>>>>> 1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing >>>>>> QEMU's >>>>>> view of the FD >>>>>> 2. drive_add file=/dev/fd/N >>>>>> 3. if failure: >>>>>> close_fd "/dev/fd/N" >>>>> >>>>> In fact, there are more steps: >>>>> >>>>> 4. use it successfully >>>>> 5. close_fd "/dev/fd/N" >>>>> >>>>> I think it would well be possible that qemu just closes the fd when >>>>> it's >>>>> not used internally any more. >>>> >>>> pass-fd could have a flag indicating qemu to do that. >>>> >>> >>> It seems like libvirt would be in a better position to understand when a >>> file is no longer in use, and then it can call close_fd. No? Of course >>> the the only fd that needs to be closed is the originally passed fd. The >>> dup'd fd's are closed by QEMU. >> >> The more we discuss this, the more I'm convinced that commands like >> 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR >> will need to have an optional argument that says the name of the file to >> reopen. That is, I've seen three proposals: >> >> Proposal One: enhance every command to take an fd:name protocol. >> PRO: Successful use of a name removes the fd from the 'getfd' list. >> CON: Lots of commands to change. >> CON: The command sequence is not atomic. >> CON: Block layer does not have a file name tied to the fd, so the reopen >> operation also needs to learn the fd:name protocol, but since we're >> already touching the command it's not much more work. >> USAGE: >> 1. getfd FDSET={M}, ties new fd to "name" >> 2. drive_add fd=name - looks up the fd by name from the list >> 3a. on success, fd is no longer in the list, drive_add consumed it >> 3b. on failure, libvirt must call 'closefd name' to avoid a leak >> 4. getfd FDSET={M2}, ties new fd to "newname" >> 5. block-commit fd=newname - looks up fd by name from the list >> 6a. on success, fd is no longer in the list, block-commit consumed it >> 6b. on failure, libvirt must call 'closefd newname' to avoid a leak >> >> Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then >> update that name to the new fd in advance of any operation that needs to >> do a reopen. >> PRO: All other commands work without impact by using qemu_open(), by >> getting a clone of the permanent name. >> CON: The permanent name must remain open as long as qemu might need to >> reopen it, and reopening needs the pass-fd force option. >> CON: The command sequence is not atomic. >> USAGE: >> 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N") >> showing QEMU's permanent name of the FD >> 2. drive_add file= means that qemu_open() will clone the >> fd, and drive_add is now using yet another FD while N remains open; >> meanwhile, the block layer knows that the drive is tied to the permanent >> name >> 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and >> still returns /dev/fd/N >> 4. block-commit - looks up the block-device name (/dev/fd/N), which maps >> back to the permanent fd, and gets a copy of the newly passed M2 >> 5. on completion (success or failure), libvirt must call 'closefd name' >> to avoid a leak >> >> Proposal Three: Use thread-local fds passed alongside any command, >> rather than passing via a separate command >> PRO: All commands can now atomically handle fd passing >> PRO: Commands that only need a one-shot open can now use fd >> CON: Commands that need to reopen still need modification to allow a >> name override during the reopen >> 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd >> is used as the block file, on failure it is atomically closed, so there >> is no leak either way. However, the block file has no permanent name. >> 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file >> override must be passed again (since we have no guarantee that the >> /dev/fdlist/1 of _this_ command matches the command-local naming used in >> the previous command), but the fd is again used atomically >> >> Under proposal 3, there is no need to dup fd's, merely only to check >> that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd >> received via FDSET. And the fact that things are made atomic means that >> libvirt no longer has to worry about calling closefd, nor does it have >> to worry about being interrupted between two monitor commands and not >> knowing what state qemu is in. But it does mean teaching every possible >> command that wants to do a reopen to learn how to use fd overrides >> rather than to reuse the permanent name that was originally in place on >> the first open. >> > > Here's another option that Kevin and I discussed today on IRC. I've > modified a few minor details since the discussion. And Kevin please > correct me if anything is wrong. > > Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds > should all refer to the same file, but may have different access flags > (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the > matching access mode flags. > pass-fds: > { 'command': 'pass-fds', > 'data': {'fdsetname': 'str', '*close': 'bool'}, > 'returns': 'string' } > close-fds: > { 'command': 'close-fds', > 'data': {'fdsetname': 'str' } > where: > @fdsetname - the file descriptor set name > @close - *if true QEMU closes the monitor fds when they expire. > if false, fds remain on the monitor until close-fds > command. > PRO: Supports reopen > PRO: All other commands work without impact by using qemu_open() > PRO: No fd leakage if close==true specified > CON: Determining when to close fds when close==true may be tricky > USAGE: > 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to > the passed set of fds. > 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the > list that has access flags matching the qemu_open() action flags. > 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first > fd from the list that has access flags matching the qemu_open() action > flags. > 4. The monitor fds are closed: > A) *If @close == true, qemu closes the set of fds when the timer > expires > B) If @close == false, libvirt must issue close-fds command to close > the set of fds > > *How to solve this has yet to be determined. Kevin mentioned > potentially using a bottom-half or a timer in qemu_close(). > Thanks again for taking time to discuss this at today's QEMU community call. Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design. Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd. 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. More functional details: -If libvirt crashes it can call "query-fd /dev/fdset/1" to determine which fds are open in the set. -If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use? -This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called. Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied. -- Regards, Corey