qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Corey Bryant <coreyb@linux.vnet.ibm.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com,
	libvir-list@redhat.com, qemu-devel@nongnu.org,
	Luiz Capitulino <lcapitulino@redhat.com>,
	pbonzini@redhat.com, Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor	passing	using	pass-fd
Date: Thu, 05 Jul 2012 12:37:18 -0400	[thread overview]
Message-ID: <4FF5C2BE.3070607@linux.vnet.ibm.com> (raw)
In-Reply-To: <4FF5C24E.9010008@linux.vnet.ibm.com>



On 07/05/2012 12:35 PM, Corey Bryant wrote:
>
>
> On 07/05/2012 10:51 AM, Kevin Wolf wrote:
>> Am 05.07.2012 16:22, schrieb Corey Bryant:
>>>>> For some examples:
>>>>>
>>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount
>>>>> 1, in
>>>>> use by monitor, as member of fdset1
>>>>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet
>>>>> been
>>>>> passed to 'remove-fd', so qemu decrements refcount; refcount of
>>>>> fd=4 is
>>>>> now 0 so qemu closes it
>>>>
>>>> Doesn't the refcount belong to an fdset rather than a single fd? As
>>>> long
>>>> as the fdset has still a reference, it's possible to reach the other
>>>> fds
>>>> in the set through a reopen. For example:
>>>>
>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a
>>>> member
>>>> of fdset1, in use by monitor, refcount 1
>>>> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a
>>>> member of fdset, in use by monitor, refcount is still 1
>>>> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
>>>> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2
>>>> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its
>>>> refcount decreased to 1, and both member fds 4 and 5 stay open.
>>>>
>>>> If you had refcounted a single fd, fd 5 would be closed now and qemu
>>>> couldn't switch to O_RDONLY any more.
>>>>
>>>
>>> If the O_RDONLY is for a future command, it would make more sense to add
>>> that fd before that future command.  Or are you passing it at step 2
>>> because it is needed for the probe re-open issue?
>>
>> Come on, requiring realistic examples isn't fair. ;-)
>
> Heh :)
>
>>
>> Swap steps 2 and 3 in the example, then step 3 is just the preparation
>> for a command that uses the new fd. The problem stays the same. Or a
>> live commit operation could be in flight so that qemu must be able to
>> switch on completion without libvirt interaction.
>
> Good point.
>
> I thought it would be useful to run through the examples again with each
> fdset having a refcount, and each fd having an in-use flag.  One
> difference though is that refcount is only incremented/decremented when
> qemu_open/qemu_close are called.  I think this works and covers Kevin's
> concerns.  Actually it might be a bit simpler too.
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> so qemu_open() increments the refcount of fdset1 to 1
> 3. client crashes, so all tracked fds are visited; fd=4 had not yet been
> passed to 'remove-fd', so it's in-use flag is on;  in-use flag is turned
> off and fd=4 is left open because it is still in use by the block device
> (refcount is 1)
> 4. client re-establishes QMP connection, so all tracked fds are visited,
> and in-use flags are turned back on; 'query-fds' lets client learn about
> fd=4 still being open as part of fdset1
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> so qemu_open() increments the refcount of fdset1 to 1
> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, and is left open because it is in use by
> the block device (refcount is 1)
> 4. client crashes, so all tracked fds are visited; fd=4 is not in-use
> but refcount is 1 so it is not closed
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> but the command fails for some other reason, so the refcount is still 0
> at the end of the command (although it may have been temporarily
> incremented then decremented during the command)
> 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or
> QMP connection is closed), so qemu marks fd=4 as no longer in-use by the
> monitor; refcount is 0 so all fds in fdset1 are closed
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with
> refcount still 0; fd=5's in-use flag is turned on
> 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not
> yet been passed to 'remove-fd', so their in-use flags are on; in-use
> flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in
> fdset1
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with
> refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, and fd=4 is closed since the refcount is
> 0; fd=5 remains open
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1
> with refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1
> with refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
> 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, but fd=4 remains open because refcount of
> fdset1 is 1
> 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no
> longer in-use by the monitor, and fd=5 remains open because refcount of
> fdset1 is 1
> 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both
> fds are closed since refcount is 0 and their in-use flags are off
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1
> with refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1
> with refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
> 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have
> not yet been passed to 'remove-fd', so their in-use flags are on; in-use
> flags are turned off and both fds are left open because the set is still
> in use by the block device (refcount is 1)
> 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both
> fds are closed since refcount is 0 and their in-use flags are off
>
>>
>> There are enough reasons that an fd in an fdset exists and is not
>> opened, but I can't think of one why it should be dropped.
>>
>>>>>>> In use by whom? If it's still in use in qemu (as in "in-use flag
>>>>>>> would
>>>>>>> be set") and we have a refcount of zero, then that's a bug.
>>>>>>>
>>>>>>
>>>>>> In use by qemu.  I don't think it's a bug.  I think there are
>>>>>> situations
>>>>>> where refcount gets to zero but qemu is still using the fd.
>>>>>
>>>>> I think the refcount being non-zero _is_ what defines an fd as
>>>>> being in
>>>>> use by qemu (including use by the monitor).  Any place you have to
>>>>> close
>>>>> an fd before reopening it is dangerous; the safe way is always to open
>>>>> with the new permissions before closing the old permissions.
>>>>
>>>> There is one case I'm aware of where we need to be careful: Before
>>>> opening an image, qemu may probe the format. In this case, the image
>>>> gets opened twice, and the first close comes before the second open.
>>>> I'm
>>>> not entirely sure how hard it would be to get rid of that behaviour.
>>>>
>>>> If we can't get rid of it, we have a small window that the refcount
>>>> doesn't really cover, and if we weren't careful it would become racy.
>>>> This is why I mentioned earlier that maybe we need to defer the
>>>> refcount
>>>> decrease a bit. However, I can't see how the in-use flag would make a
>>>> difference there. If the refcount can't cover it, the in-use flag can't
>>>> either.
>>>
>>> Yeah this is a problem.  Could we introduce another flag to cover this?
>>
>> Adding more refcounts or flags is not a problem, but it doesn't solve it
>> either. The hard question is when to set that flag.
>>
>> I believe it may be easier to just change the block layer so that it
>> opens files only once during bdrv_open().
>>

Can this fix be delivered after the fd passing patch series?

-- 
Regards,
Corey

  reply	other threads:[~2012-07-05 16:38 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-22 18:36 [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
2012-06-22 19:31   ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 2/7] qapi: Convert getfd and closefd Corey Bryant
2012-07-11 18:51   ` Luiz Capitulino
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 3/7] qapi: Add pass-fd QMP command Corey Bryant
2012-06-22 20:24   ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 4/7] qapi: Re-arrange monitor.c functions Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 5/7] block: Prevent /dev/fd/X filename from being detected as floppy Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 6/7] block: Convert open calls to qemu_open Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 7/7] osdep: Enable qemu_open to dup pre-opened fd Corey Bryant
2012-06-22 19:58   ` Eric Blake
     [not found] ` <20120626091004.GA14451@redhat.com>
     [not found]   ` <4FE9A0F0.2050809@redhat.com>
     [not found]     ` <20120626175045.2c7011b3@doriath.home>
     [not found]       ` <4FEA37A9.10707@linux.vnet.ibm.com>
     [not found]         ` <4FEA3D9C.8080205@redhat.com>
2012-07-02 22:02           ` [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-07-02 22:31             ` Eric Blake
2012-07-03  9:07               ` Daniel P. Berrange
2012-07-03  9:40               ` Kevin Wolf
2012-07-03 13:42               ` Corey Bryant
2012-07-03 15:40             ` Corey Bryant
2012-07-03 15:59               ` Kevin Wolf
2012-07-03 16:25                 ` Corey Bryant
2012-07-03 17:03                   ` Eric Blake
2012-07-03 17:46                     ` Corey Bryant
2012-07-03 18:00                       ` Eric Blake
2012-07-03 18:21                         ` Corey Bryant
2012-07-04  8:09                           ` Kevin Wolf
2012-07-05 15:06                             ` Corey Bryant
2012-07-09 14:05                               ` Luiz Capitulino
2012-07-09 15:05                                 ` Corey Bryant
2012-07-09 15:46                                   ` Kevin Wolf
2012-07-09 16:18                                     ` Luiz Capitulino
2012-07-09 17:59                                       ` Corey Bryant
2012-07-09 17:35                                     ` Corey Bryant
2012-07-09 17:48                                       ` Luiz Capitulino
2012-07-09 18:02                                         ` Corey Bryant
2012-07-10  7:53                                       ` Kevin Wolf
2012-07-09 18:20                                   ` Corey Bryant
2012-07-04  8:00                     ` Kevin Wolf
2012-07-05 14:22                       ` Corey Bryant
2012-07-05 14:51                         ` Kevin Wolf
2012-07-05 16:35                           ` Corey Bryant
2012-07-05 16:37                             ` Corey Bryant [this message]
2012-07-06  9:06                               ` Kevin Wolf
2012-07-05 17:00                             ` Eric Blake
2012-07-05 17:36                               ` Corey Bryant
2012-07-06  9:11                               ` Kevin Wolf
2012-07-06 17:14                                 ` Corey Bryant
2012-07-06 17:15                                   ` Corey Bryant
2012-07-06 17:40                                 ` Corey Bryant
2012-07-06 18:19                                   ` [Qemu-devel] [libvirt] " Corey Bryant
2012-07-09 14:04                                   ` [Qemu-devel] " Kevin Wolf
2012-07-09 15:23                                     ` Corey Bryant
2012-07-09 15:30                                       ` Kevin Wolf
2012-07-09 18:40   ` Anthony Liguori
2012-07-09 19:00     ` Luiz Capitulino
2012-07-10  8:54       ` Daniel P. Berrange
2012-07-10  7:58     ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF5C2BE.3070607@linux.vnet.ibm.com \
    --to=coreyb@linux.vnet.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).