From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:59742) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SlohX-0007ug-FG for qemu-devel@nongnu.org; Mon, 02 Jul 2012 18:02:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1SlohV-00067Y-8K for qemu-devel@nongnu.org; Mon, 02 Jul 2012 18:02:35 -0400 Received: from e9.ny.us.ibm.com ([32.97.182.139]:39669) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SlohV-00066l-3V for qemu-devel@nongnu.org; Mon, 02 Jul 2012 18:02:33 -0400 Received: from /spool/local by e9.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 2 Jul 2012 18:02:28 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by d01dlp01.pok.ibm.com (Postfix) with ESMTP id 3E95638C805A for ; Mon, 2 Jul 2012 18:02:26 -0400 (EDT) Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q62M2PpG421264 for ; Mon, 2 Jul 2012 18:02:25 -0400 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q62M2IUY008129 for ; Mon, 2 Jul 2012 16:02:18 -0600 Message-ID: <4FF21A67.8010100@linux.vnet.ibm.com> Date: Mon, 02 Jul 2012 18:02:15 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1340390174-7493-1-git-send-email-coreyb@linux.vnet.ibm.com> <20120626091004.GA14451@redhat.com> <4FE9A0F0.2050809@redhat.com> <20120626175045.2c7011b3@doriath.home> <4FEA37A9.10707@linux.vnet.ibm.com> <4FEA3D9C.8080205@redhat.com> In-Reply-To: <4FEA3D9C.8080205@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake Cc: Kevin Wolf , aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino , pbonzini@redhat.com On 06/26/2012 06:54 PM, Eric Blake wrote: > On 06/26/2012 04:28 PM, Corey Bryant wrote: > >>>>> With this proposed series, we have usage akin to: >>>>> >>>>> 1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's >>>>> view of the FD >>>>> 2. drive_add file=/dev/fd/N >>>>> 3. if failure: >>>>> close_fd "/dev/fd/N" >>>> >>>> In fact, there are more steps: >>>> >>>> 4. use it successfully >>>> 5. close_fd "/dev/fd/N" >>>> >>>> I think it would well be possible that qemu just closes the fd when it's >>>> not used internally any more. >>> >>> pass-fd could have a flag indicating qemu to do that. >>> >> >> It seems like libvirt would be in a better position to understand when a >> file is no longer in use, and then it can call close_fd. No? Of course >> the the only fd that needs to be closed is the originally passed fd. The >> dup'd fd's are closed by QEMU. > > The more we discuss this, the more I'm convinced that commands like > 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR > will need to have an optional argument that says the name of the file to > reopen. That is, I've seen three proposals: > > Proposal One: enhance every command to take an fd:name protocol. > PRO: Successful use of a name removes the fd from the 'getfd' list. > CON: Lots of commands to change. > CON: The command sequence is not atomic. > CON: Block layer does not have a file name tied to the fd, so the reopen > operation also needs to learn the fd:name protocol, but since we're > already touching the command it's not much more work. > USAGE: > 1. getfd FDSET={M}, ties new fd to "name" > 2. drive_add fd=name - looks up the fd by name from the list > 3a. on success, fd is no longer in the list, drive_add consumed it > 3b. on failure, libvirt must call 'closefd name' to avoid a leak > 4. getfd FDSET={M2}, ties new fd to "newname" > 5. block-commit fd=newname - looks up fd by name from the list > 6a. on success, fd is no longer in the list, block-commit consumed it > 6b. on failure, libvirt must call 'closefd newname' to avoid a leak > > Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then > update that name to the new fd in advance of any operation that needs to > do a reopen. > PRO: All other commands work without impact by using qemu_open(), by > getting a clone of the permanent name. > CON: The permanent name must remain open as long as qemu might need to > reopen it, and reopening needs the pass-fd force option. > CON: The command sequence is not atomic. > USAGE: > 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N") > showing QEMU's permanent name of the FD > 2. drive_add file= means that qemu_open() will clone the > fd, and drive_add is now using yet another FD while N remains open; > meanwhile, the block layer knows that the drive is tied to the permanent > name > 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and > still returns /dev/fd/N > 4. block-commit - looks up the block-device name (/dev/fd/N), which maps > back to the permanent fd, and gets a copy of the newly passed M2 > 5. on completion (success or failure), libvirt must call 'closefd name' > to avoid a leak > > Proposal Three: Use thread-local fds passed alongside any command, > rather than passing via a separate command > PRO: All commands can now atomically handle fd passing > PRO: Commands that only need a one-shot open can now use fd > CON: Commands that need to reopen still need modification to allow a > name override during the reopen > 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd > is used as the block file, on failure it is atomically closed, so there > is no leak either way. However, the block file has no permanent name. > 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file > override must be passed again (since we have no guarantee that the > /dev/fdlist/1 of _this_ command matches the command-local naming used in > the previous command), but the fd is again used atomically > > Under proposal 3, there is no need to dup fd's, merely only to check > that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd > received via FDSET. And the fact that things are made atomic means that > libvirt no longer has to worry about calling closefd, nor does it have > to worry about being interrupted between two monitor commands and not > knowing what state qemu is in. But it does mean teaching every possible > command that wants to do a reopen to learn how to use fd overrides > rather than to reuse the permanent name that was originally in place on > the first open. > Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds *How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close(). -- Regards, Corey