From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:50593) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Smp4s-0007hF-RU for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:38:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Smp4n-0004zr-6a for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:38:50 -0400 Received: from e33.co.us.ibm.com ([32.97.110.151]:35128) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Smp4m-0004zK-S8 for qemu-devel@nongnu.org; Thu, 05 Jul 2012 12:38:45 -0400 Received: from /spool/local by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 5 Jul 2012 10:38:39 -0600 Received: from d01relay03.pok.ibm.com (d01relay03.pok.ibm.com [9.56.227.235]) by d01dlp03.pok.ibm.com (Postfix) with ESMTP id B0352C9011F for ; Thu, 5 Jul 2012 12:37:21 -0400 (EDT) Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay03.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q65GbLU1370648 for ; Thu, 5 Jul 2012 12:37:21 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q65M8DJ3006865 for ; Thu, 5 Jul 2012 18:08:14 -0400 Message-ID: <4FF5C2BE.3070607@linux.vnet.ibm.com> Date: Thu, 05 Jul 2012 12:37:18 -0400 From: Corey Bryant MIME-Version: 1.0 References: <1340390174-7493-1-git-send-email-coreyb@linux.vnet.ibm.com> <20120626091004.GA14451@redhat.com> <4FE9A0F0.2050809@redhat.com> <20120626175045.2c7011b3@doriath.home> <4FEA37A9.10707@linux.vnet.ibm.com> <4FEA3D9C.8080205@redhat.com> <4FF21A67.8010100@linux.vnet.ibm.com> <4FF31265.1000308@linux.vnet.ibm.com> <4FF316C9.5020100@redhat.com> <4FF31CFD.7030508@linux.vnet.ibm.com> <4FF325C8.4060401@redhat.com> <4FF3F806.7020307@redhat.com> <4FF5A331.9030108@linux.vnet.ibm.com> <4FF5A9D9.7080607@redhat.com> <4FF5C24E.9010008@linux.vnet.ibm.com> In-Reply-To: <4FF5C24E.9010008@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, libvir-list@redhat.com, qemu-devel@nongnu.org, Luiz Capitulino , pbonzini@redhat.com, Eric Blake On 07/05/2012 12:35 PM, Corey Bryant wrote: > > > On 07/05/2012 10:51 AM, Kevin Wolf wrote: >> Am 05.07.2012 16:22, schrieb Corey Bryant: >>>>> For some examples: >>>>> >>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount >>>>> 1, in >>>>> use by monitor, as member of fdset1 >>>>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet >>>>> been >>>>> passed to 'remove-fd', so qemu decrements refcount; refcount of >>>>> fd=4 is >>>>> now 0 so qemu closes it >>>> >>>> Doesn't the refcount belong to an fdset rather than a single fd? As >>>> long >>>> as the fdset has still a reference, it's possible to reach the other >>>> fds >>>> in the set through a reopen. For example: >>>> >>>> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a >>>> member >>>> of fdset1, in use by monitor, refcount 1 >>>> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a >>>> member of fdset, in use by monitor, refcount is still 1 >>>> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename >>>> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 >>>> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its >>>> refcount decreased to 1, and both member fds 4 and 5 stay open. >>>> >>>> If you had refcounted a single fd, fd 5 would be closed now and qemu >>>> couldn't switch to O_RDONLY any more. >>>> >>> >>> If the O_RDONLY is for a future command, it would make more sense to add >>> that fd before that future command. Or are you passing it at step 2 >>> because it is needed for the probe re-open issue? >> >> Come on, requiring realistic examples isn't fair. ;-) > > Heh :) > >> >> Swap steps 2 and 3 in the example, then step 3 is just the preparation >> for a command that uses the new fd. The problem stays the same. Or a >> live commit operation could be in flight so that qemu must be able to >> switch on completion without libvirt interaction. > > Good point. > > I thought it would be useful to run through the examples again with each > fdset having a refcount, and each fd having an in-use flag. One > difference though is that refcount is only incremented/decremented when > qemu_open/qemu_close are called. I think this works and covers Kevin's > concerns. Actually it might be a bit simpler too. > > 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with > refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, > so qemu_open() increments the refcount of fdset1 to 1 > 3. client crashes, so all tracked fds are visited; fd=4 had not yet been > passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned > off and fd=4 is left open because it is still in use by the block device > (refcount is 1) > 4. client re-establishes QMP connection, so all tracked fds are visited, > and in-use flags are turned back on; 'query-fds' lets client learn about > fd=4 still being open as part of fdset1 > > 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with > refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, > so qemu_open() increments the refcount of fdset1 to 1 > 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no > longer in-use by the monitor, and is left open because it is in use by > the block device (refcount is 1) > 4. client crashes, so all tracked fds are visited; fd=4 is not in-use > but refcount is 1 so it is not closed > > 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with > refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, > but the command fails for some other reason, so the refcount is still 0 > at the end of the command (although it may have been temporarily > incremented then decremented during the command) > 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or > QMP connection is closed), so qemu marks fd=4 as no longer in-use by the > monitor; refcount is 0 so all fds in fdset1 are closed > > 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with > refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with > refcount still 0; fd=5's in-use flag is turned on > 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not > yet been passed to 'remove-fd', so their in-use flags are on; in-use > flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in > fdset1 > > 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with > refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with > refcount still 0; fd=5's in-use flag is turned on > 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no > longer in-use by the monitor, and fd=4 is closed since the refcount is > 0; fd=5 remains open > > 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 > with refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 > with refcount still 0; fd=5's in-use flag is turned on > 3. client calls 'device-add' with /dev/fdset/1 as the backing filename > and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 > 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no > longer in-use by the monitor, but fd=4 remains open because refcount of > fdset1 is 1 > 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no > longer in-use by the monitor, and fd=5 remains open because refcount of > fdset1 is 1 > 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both > fds are closed since refcount is 0 and their in-use flags are off > > 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 > with refcount of 0; fd=4's in-use flag is turned on > 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 > with refcount still 0; fd=5's in-use flag is turned on > 3. client calls 'device-add' with /dev/fdset/1 as the backing filename > and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 > 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have > not yet been passed to 'remove-fd', so their in-use flags are on; in-use > flags are turned off and both fds are left open because the set is still > in use by the block device (refcount is 1) > 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both > fds are closed since refcount is 0 and their in-use flags are off > >> >> There are enough reasons that an fd in an fdset exists and is not >> opened, but I can't think of one why it should be dropped. >> >>>>>>> In use by whom? If it's still in use in qemu (as in "in-use flag >>>>>>> would >>>>>>> be set") and we have a refcount of zero, then that's a bug. >>>>>>> >>>>>> >>>>>> In use by qemu. I don't think it's a bug. I think there are >>>>>> situations >>>>>> where refcount gets to zero but qemu is still using the fd. >>>>> >>>>> I think the refcount being non-zero _is_ what defines an fd as >>>>> being in >>>>> use by qemu (including use by the monitor). Any place you have to >>>>> close >>>>> an fd before reopening it is dangerous; the safe way is always to open >>>>> with the new permissions before closing the old permissions. >>>> >>>> There is one case I'm aware of where we need to be careful: Before >>>> opening an image, qemu may probe the format. In this case, the image >>>> gets opened twice, and the first close comes before the second open. >>>> I'm >>>> not entirely sure how hard it would be to get rid of that behaviour. >>>> >>>> If we can't get rid of it, we have a small window that the refcount >>>> doesn't really cover, and if we weren't careful it would become racy. >>>> This is why I mentioned earlier that maybe we need to defer the >>>> refcount >>>> decrease a bit. However, I can't see how the in-use flag would make a >>>> difference there. If the refcount can't cover it, the in-use flag can't >>>> either. >>> >>> Yeah this is a problem. Could we introduce another flag to cover this? >> >> Adding more refcounts or flags is not a problem, but it doesn't solve it >> either. The hard question is when to set that flag. >> >> I believe it may be easier to just change the block layer so that it >> opens files only once during bdrv_open(). >> Can this fix be delivered after the fd passing patch series? -- Regards, Corey