From: Corey Bryant <coreyb@linux.vnet.ibm.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com,
libvir-list@redhat.com, qemu-devel@nongnu.org,
Luiz Capitulino <lcapitulino@redhat.com>,
pbonzini@redhat.com, Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd
Date: Thu, 05 Jul 2012 10:22:41 -0400 [thread overview]
Message-ID: <4FF5A331.9030108@linux.vnet.ibm.com> (raw)
In-Reply-To: <4FF3F806.7020307@redhat.com>
On 07/04/2012 04:00 AM, Kevin Wolf wrote:
> Am 03.07.2012 19:03, schrieb Eric Blake:
>>>>> 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the
>>>>> set that has access flags matching the qemu_open action flags.
>>>>> qemu_open increments refcount for this fd.
>>>>> 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named
>>>>> "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt
>>>>> in-use flag turned on for fd.
>>>>> 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first
>>>>> fd from the set that has access flags matching the qemu_open action
>>>>> flags. qemu_open increments refcount for this fd.
>>>>> 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the
>>>>> set. turns libvirt in-use flag off marking the fd ready to be closed
>>>>> when qemu is done with it.
>>>>
>>>> If we decided to not return the individual fd numbers to libvirt, file
>>>> descriptors would be uniquely identified by an fdset/flags pair here.
>>>>
>>>
>>> Are you saying we'd pass the fdset name and flags parameters on
>>> remove-fd to somehow identify the fds to remove?
>>
>> Passing the flag parameters is not trivial, as that would mean the QMP
>> code would have to define constants mapping to all of the O_* flags that
>> qemu_open supports. It's easier to support closing by fd number.
>
> Good point. So pass-fd or whatever it will be called needs to returnn
> both fdset number and the individual fd number.
>
>>>>> 5. qemu_close decrements refcount for fd, and closes fd when refcount is
>>>>> zero and libvirt in use flag is off.
>>>>
>>>> The monitor could just hold another reference, then we save the
>>>> additional flag. But that's a qemu implementation detail.
>>>>
>>>
>>> I'm not sure I understand what you mean.
>>
>> pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset,
>> with initial use count of 1 (the use is the monitor). qemu_open()
>> increments the use count. A new qemu_close() wrapper would decrement
>> the use count. And both calling 'remove-fd', or closing the QMP monitor
>> of an fd that has not yet been passed through 'remove-fd', serves as a
>> way to decrement the use count. You'd still have to track whether the
>> monitor is using an fd (to avoid over-decrementing on QMP monitor
>> close), but by having the monitor's use also tracked under the refcount,
>> then refcount reaching 0 is sufficient to auto-close an fd.
>
>> I think
>> that also means that re-establishing the client QMP connection would
>> increment
>
> This is an interesting idea. I've never thought about what to do with a
> reconnect. It felt a bit odd at first, but now your proposal totally
> makes sense to me.
>
>
>> For some examples:
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet been
>> passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is
>> now 0 so qemu closes it
>
> Doesn't the refcount belong to an fdset rather than a single fd? As long
> as the fdset has still a reference, it's possible to reach the other fds
> in the set through a reopen. For example:
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member
> of fdset1, in use by monitor, refcount 1
> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a
> member of fdset, in use by monitor, refcount is still 1
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2
> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its
> refcount decreased to 1, and both member fds 4 and 5 stay open.
>
> If you had refcounted a single fd, fd 5 would be closed now and qemu
> couldn't switch to O_RDONLY any more.
>
If the O_RDONLY is for a future command, it would make more sense to add
that fd before that future command. Or are you passing it at step 2
because it is needed for the probe re-open issue?
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> so qemu_open() increments the refcount to 2
>> 3. client crashes, so all tracked fds are visited; fd=4 had not yet been
>> passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4
>> open because it is still in use by the block device
>> 4. client re-establishes QMP connection, and 'query-fds' lets client
>> learn about fd=4 still being open as part of fdset1, but also informs
>> client that fd is not in use by the monitor
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> so qemu_open() increments the refcount to 2
>> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
>> longer in use by the monitor, refcount decremented to 1 but still left
>> open because it is in use by the block device
>> 4. client crashes, so all tracked fds are visited; but fd=4 is already
>> marked as not in use by the monitor, so its refcount is unchanged
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> but the command fails for some other reason, so the refcount is still 1
>> at the end of the command (although it may have been temporarily
>> incremented then decremented during the command)
>> 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or
>> QMP connection is closed), so qemu marks fd=4 as no longer in use by the
>> monitor, refcount is now decremented to 0 and fd=4 is closed
>
> Yup, apart from the comment above the examples look good to me.
>
>> I think that covers the idea; you need a bool in_use for tracking
>> monitor state (the monitor is in use until either a remove-fd or a
>> monitor connection closes), as well as a ref-count.
>
> Yes, makes sense to me.
>
>>> We also need a query-fdsets command that lists all fdsets that exist. If
>>>> we add information about single fds to the return value of it, we
>>>> probably don't need a separate query-fd that operates on a single fdset.
>>>>
>>>
>>> Yes, good point. And maybe we don't need 2 commands. query-fdsets
>>> could return all the sets and all the fds that are in those sets.
>>
>> Yes, I think a single query command is good enough here, something like:
>>
>> { "execute":"query-fdsets" } =>
>> { "return" : { "sets": [
>> { "name": "fdset1",
>> "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] },
>> { "name": "fdset2",
>> "fds": [ { "fd": 5, "monitor": false, "refcount": 1 },
>> { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
>
> Looks good, this is exactly what I was thinking of.
>
>>>> In use by whom? If it's still in use in qemu (as in "in-use flag would
>>>> be set") and we have a refcount of zero, then that's a bug.
>>>>
>>>
>>> In use by qemu. I don't think it's a bug. I think there are situations
>>> where refcount gets to zero but qemu is still using the fd.
>>
>> I think the refcount being non-zero _is_ what defines an fd as being in
>> use by qemu (including use by the monitor). Any place you have to close
>> an fd before reopening it is dangerous; the safe way is always to open
>> with the new permissions before closing the old permissions.
>
> There is one case I'm aware of where we need to be careful: Before
> opening an image, qemu may probe the format. In this case, the image
> gets opened twice, and the first close comes before the second open. I'm
> not entirely sure how hard it would be to get rid of that behaviour.
>
> If we can't get rid of it, we have a small window that the refcount
> doesn't really cover, and if we weren't careful it would become racy.
> This is why I mentioned earlier that maybe we need to defer the refcount
> decrease a bit. However, I can't see how the in-use flag would make a
> difference there. If the refcount can't cover it, the in-use flag can't
> either.
Yeah this is a problem. Could we introduce another flag to cover this?
--
Regards,
Corey
next prev parent reply other threads:[~2012-07-05 14:25 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-06-22 18:36 [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
2012-06-22 19:31 ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 2/7] qapi: Convert getfd and closefd Corey Bryant
2012-07-11 18:51 ` Luiz Capitulino
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 3/7] qapi: Add pass-fd QMP command Corey Bryant
2012-06-22 20:24 ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 4/7] qapi: Re-arrange monitor.c functions Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 5/7] block: Prevent /dev/fd/X filename from being detected as floppy Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 6/7] block: Convert open calls to qemu_open Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 7/7] osdep: Enable qemu_open to dup pre-opened fd Corey Bryant
2012-06-22 19:58 ` Eric Blake
[not found] ` <20120626091004.GA14451@redhat.com>
[not found] ` <4FE9A0F0.2050809@redhat.com>
[not found] ` <20120626175045.2c7011b3@doriath.home>
[not found] ` <4FEA37A9.10707@linux.vnet.ibm.com>
[not found] ` <4FEA3D9C.8080205@redhat.com>
2012-07-02 22:02 ` [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-07-02 22:31 ` Eric Blake
2012-07-03 9:07 ` Daniel P. Berrange
2012-07-03 9:40 ` Kevin Wolf
2012-07-03 13:42 ` Corey Bryant
2012-07-03 15:40 ` Corey Bryant
2012-07-03 15:59 ` Kevin Wolf
2012-07-03 16:25 ` Corey Bryant
2012-07-03 17:03 ` Eric Blake
2012-07-03 17:46 ` Corey Bryant
2012-07-03 18:00 ` Eric Blake
2012-07-03 18:21 ` Corey Bryant
2012-07-04 8:09 ` Kevin Wolf
2012-07-05 15:06 ` Corey Bryant
2012-07-09 14:05 ` Luiz Capitulino
2012-07-09 15:05 ` Corey Bryant
2012-07-09 15:46 ` Kevin Wolf
2012-07-09 16:18 ` Luiz Capitulino
2012-07-09 17:59 ` Corey Bryant
2012-07-09 17:35 ` Corey Bryant
2012-07-09 17:48 ` Luiz Capitulino
2012-07-09 18:02 ` Corey Bryant
2012-07-10 7:53 ` Kevin Wolf
2012-07-09 18:20 ` Corey Bryant
2012-07-04 8:00 ` Kevin Wolf
2012-07-05 14:22 ` Corey Bryant [this message]
2012-07-05 14:51 ` Kevin Wolf
2012-07-05 16:35 ` Corey Bryant
2012-07-05 16:37 ` Corey Bryant
2012-07-06 9:06 ` Kevin Wolf
2012-07-05 17:00 ` Eric Blake
2012-07-05 17:36 ` Corey Bryant
2012-07-06 9:11 ` Kevin Wolf
2012-07-06 17:14 ` Corey Bryant
2012-07-06 17:15 ` Corey Bryant
2012-07-06 17:40 ` Corey Bryant
2012-07-06 18:19 ` [Qemu-devel] [libvirt] " Corey Bryant
2012-07-09 14:04 ` [Qemu-devel] " Kevin Wolf
2012-07-09 15:23 ` Corey Bryant
2012-07-09 15:30 ` Kevin Wolf
2012-07-09 18:40 ` Anthony Liguori
2012-07-09 19:00 ` Luiz Capitulino
2012-07-10 8:54 ` Daniel P. Berrange
2012-07-10 7:58 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FF5A331.9030108@linux.vnet.ibm.com \
--to=coreyb@linux.vnet.ibm.com \
--cc=aliguori@us.ibm.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=lcapitulino@redhat.com \
--cc=libvir-list@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).