From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:50088) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Smp2h-0004nZ-H6 for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:36:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Smp2a-0004U1-Ei for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:36:35 -0400 Received: from e8.ny.us.ibm.com ([32.97.182.138]:46554) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Smp2a-0004TU-9e for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:36:28 -0400 Received: from /spool/local by e8.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 5 Jul 2012 12:36:24 -0400 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by d01dlp01.pok.ibm.com (Postfix) with ESMTP id C0E8238C8087 for ; Thu, 5 Jul 2012 12:35:29 -0400 (EDT) Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q65GZTWC389728 for ; Thu, 5 Jul 2012 12:35:29 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q65M6MMl029494 for ; Thu, 5 Jul 2012 18:06:22 -0400 Message-ID: <4FF5C24E.9010008@linux.vnet.ibm.com> Date: Thu, 05 Jul 2012 12:35:26 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1340390174-7493-1-git-send-email-coreyb@linux.vnet.ibm.com> <20120626091004.GA14451@redhat.com> <4FE9A0F0.2050809@redhat.com> <20120626175045.2c7011b3@doriath.home> <4FEA37A9.10707@linux.vnet.ibm.com> <4FEA3D9C.8080205@redhat.com> <4FF21A67.8010100@linux.vnet.ibm.com> <4FF31265.1000308@linux.vnet.ibm.com> <4FF316C9.5020100@redhat.com> <4FF31CFD.7030508@linux.vnet.ibm.com> <4FF325C8.4060401@redhat.com> <4FF3F806.7020307@redhat.com> <4FF5A331.9030108@linux.vnet.ibm.com> <4FF5A9D9.7080607@redhat.com> In-Reply-To: <4FF5A9D9.7080607@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino , pbonzini@redhat.com, Eric Blake On 07/05/2012 10:51 AM, Kevin Wolf wrote: > Am 05.07.2012 16:22, schrieb Corey Bryant: >>>> For some examples: >>>> >>>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in >>>> use by monitor, as member of fdset1 >>>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet been >>>> passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is >>>> now 0 so qemu closes it >>> >>> Doesn't the refcount belong to an fdset rather than a single fd? As long >>> as the fdset has still a reference, it's possible to reach the other fds >>> in the set through a reopen. For example: >>> >>> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member >>> of fdset1, in use by monitor, refcount 1 >>> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a >>> member of fdset, in use by monitor, refcount is still 1 >>> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename >>> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 >>> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its >>> refcount decreased to 1, and both member fds 4 and 5 stay open. >>> >>> If you had refcounted a single fd, fd 5 would be closed now and qemu >>> couldn't switch to O_RDONLY any more. >>> >> >> If the O_RDONLY is for a future command, it would make more sense to add >> that fd before that future command. Or are you passing it at step 2 >> because it is needed for the probe re-open issue? > > Come on, requiring realistic examples isn't fair. ;-) Heh :) > > Swap steps 2 and 3 in the example, then step 3 is just the preparation > for a command that uses the new fd. The problem stays the same. Or a > live commit operation could be in flight so that qemu must be able to > switch on completion without libvirt interaction. Good point. I thought it would be useful to run through the examples again with each fdset having a refcount, and each fd having an in-use flag. One difference though is that refcount is only incremented/decremented when qemu_open/qemu_close are called. I think this works and covers Kevin's concerns. Actually it might be a bit simpler too. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, but fd=4 remains open because refcount of fdset1 is 1 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no longer in-use by the monitor, and fd=5 remains open because refcount of fdset1 is 1 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off > > There are enough reasons that an fd in an fdset exists and is not > opened, but I can't think of one why it should be dropped. > >>>>>> In use by whom? If it's still in use in qemu (as in "in-use flag would >>>>>> be set") and we have a refcount of zero, then that's a bug. >>>>>> >>>>> >>>>> In use by qemu. I don't think it's a bug. I think there are situations >>>>> where refcount gets to zero but qemu is still using the fd. >>>> >>>> I think the refcount being non-zero _is_ what defines an fd as being in >>>> use by qemu (including use by the monitor). Any place you have to close >>>> an fd before reopening it is dangerous; the safe way is always to open >>>> with the new permissions before closing the old permissions. >>> >>> There is one case I'm aware of where we need to be careful: Before >>> opening an image, qemu may probe the format. In this case, the image >>> gets opened twice, and the first close comes before the second open. I'm >>> not entirely sure how hard it would be to get rid of that behaviour. >>> >>> If we can't get rid of it, we have a small window that the refcount >>> doesn't really cover, and if we weren't careful it would become racy. >>> This is why I mentioned earlier that maybe we need to defer the refcount >>> decrease a bit. However, I can't see how the in-use flag would make a >>> difference there. If the refcount can't cover it, the in-use flag can't >>> either. >> >> Yeah this is a problem. Could we introduce another flag to cover this? > > Adding more refcounts or flags is not a problem, but it doesn't solve it > either. The hard question is when to set that flag. > > I believe it may be easier to just change the block layer so that it > opens files only once during bdrv_open(). > > Kevin > -- Regards, Corey