All of lore.kernel.org
 help / color / mirror / Atom feed
From: Corey Bryant <coreyb@linux.vnet.ibm.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com,
	libvir-list@redhat.com, qemu-devel@nongnu.org,
	Luiz Capitulino <lcapitulino@redhat.com>,
	pbonzini@redhat.com, Eric Blake <eblake@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 0/7] file descriptor	passing	using	pass-fd
Date: Thu, 05 Jul 2012 12:37:18 -0400	[thread overview]
Message-ID: <4FF5C2BE.3070607@linux.vnet.ibm.com> (raw)
In-Reply-To: <4FF5C24E.9010008@linux.vnet.ibm.com>



On 07/05/2012 12:35 PM, Corey Bryant wrote:
>
>
> On 07/05/2012 10:51 AM, Kevin Wolf wrote:
>> Am 05.07.2012 16:22, schrieb Corey Bryant:
>>>>> For some examples:
>>>>>
>>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount
>>>>> 1, in
>>>>> use by monitor, as member of fdset1
>>>>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet
>>>>> been
>>>>> passed to 'remove-fd', so qemu decrements refcount; refcount of
>>>>> fd=4 is
>>>>> now 0 so qemu closes it
>>>>
>>>> Doesn't the refcount belong to an fdset rather than a single fd? As
>>>> long
>>>> as the fdset has still a reference, it's possible to reach the other
>>>> fds
>>>> in the set through a reopen. For example:
>>>>
>>>> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a
>>>> member
>>>> of fdset1, in use by monitor, refcount 1
>>>> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a
>>>> member of fdset, in use by monitor, refcount is still 1
>>>> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
>>>> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2
>>>> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its
>>>> refcount decreased to 1, and both member fds 4 and 5 stay open.
>>>>
>>>> If you had refcounted a single fd, fd 5 would be closed now and qemu
>>>> couldn't switch to O_RDONLY any more.
>>>>
>>>
>>> If the O_RDONLY is for a future command, it would make more sense to add
>>> that fd before that future command.  Or are you passing it at step 2
>>> because it is needed for the probe re-open issue?
>>
>> Come on, requiring realistic examples isn't fair. ;-)
>
> Heh :)
>
>>
>> Swap steps 2 and 3 in the example, then step 3 is just the preparation
>> for a command that uses the new fd. The problem stays the same. Or a
>> live commit operation could be in flight so that qemu must be able to
>> switch on completion without libvirt interaction.
>
> Good point.
>
> I thought it would be useful to run through the examples again with each
> fdset having a refcount, and each fd having an in-use flag.  One
> difference though is that refcount is only incremented/decremented when
> qemu_open/qemu_close are called.  I think this works and covers Kevin's
> concerns.  Actually it might be a bit simpler too.
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> so qemu_open() increments the refcount of fdset1 to 1
> 3. client crashes, so all tracked fds are visited; fd=4 had not yet been
> passed to 'remove-fd', so it's in-use flag is on;  in-use flag is turned
> off and fd=4 is left open because it is still in use by the block device
> (refcount is 1)
> 4. client re-establishes QMP connection, so all tracked fds are visited,
> and in-use flags are turned back on; 'query-fds' lets client learn about
> fd=4 still being open as part of fdset1
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> so qemu_open() increments the refcount of fdset1 to 1
> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, and is left open because it is in use by
> the block device (refcount is 1)
> 4. client crashes, so all tracked fds are visited; fd=4 is not in-use
> but refcount is 1 so it is not closed
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
> but the command fails for some other reason, so the refcount is still 0
> at the end of the command (although it may have been temporarily
> incremented then decremented during the command)
> 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or
> QMP connection is closed), so qemu marks fd=4 as no longer in-use by the
> monitor; refcount is 0 so all fds in fdset1 are closed
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with
> refcount still 0; fd=5's in-use flag is turned on
> 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not
> yet been passed to 'remove-fd', so their in-use flags are on; in-use
> flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in
> fdset1
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with
> refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with
> refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, and fd=4 is closed since the refcount is
> 0; fd=5 remains open
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1
> with refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1
> with refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
> 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
> longer in-use by the monitor, but fd=4 remains open because refcount of
> fdset1 is 1
> 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no
> longer in-use by the monitor, and fd=5 remains open because refcount of
> fdset1 is 1
> 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both
> fds are closed since refcount is 0 and their in-use flags are off
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1
> with refcount of 0; fd=4's in-use flag is turned on
> 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1
> with refcount still 0; fd=5's in-use flag is turned on
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1
> 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have
> not yet been passed to 'remove-fd', so their in-use flags are on; in-use
> flags are turned off and both fds are left open because the set is still
> in use by the block device (refcount is 1)
> 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both
> fds are closed since refcount is 0 and their in-use flags are off
>
>>
>> There are enough reasons that an fd in an fdset exists and is not
>> opened, but I can't think of one why it should be dropped.
>>
>>>>>>> In use by whom? If it's still in use in qemu (as in "in-use flag
>>>>>>> would
>>>>>>> be set") and we have a refcount of zero, then that's a bug.
>>>>>>>
>>>>>>
>>>>>> In use by qemu.  I don't think it's a bug.  I think there are
>>>>>> situations
>>>>>> where refcount gets to zero but qemu is still using the fd.
>>>>>
>>>>> I think the refcount being non-zero _is_ what defines an fd as
>>>>> being in
>>>>> use by qemu (including use by the monitor).  Any place you have to
>>>>> close
>>>>> an fd before reopening it is dangerous; the safe way is always to open
>>>>> with the new permissions before closing the old permissions.
>>>>
>>>> There is one case I'm aware of where we need to be careful: Before
>>>> opening an image, qemu may probe the format. In this case, the image
>>>> gets opened twice, and the first close comes before the second open.
>>>> I'm
>>>> not entirely sure how hard it would be to get rid of that behaviour.
>>>>
>>>> If we can't get rid of it, we have a small window that the refcount
>>>> doesn't really cover, and if we weren't careful it would become racy.
>>>> This is why I mentioned earlier that maybe we need to defer the
>>>> refcount
>>>> decrease a bit. However, I can't see how the in-use flag would make a
>>>> difference there. If the refcount can't cover it, the in-use flag can't
>>>> either.
>>>
>>> Yeah this is a problem.  Could we introduce another flag to cover this?
>>
>> Adding more refcounts or flags is not a problem, but it doesn't solve it
>> either. The hard question is when to set that flag.
>>
>> I believe it may be easier to just change the block layer so that it
>> opens files only once during bdrv_open().
>>

Can this fix be delivered after the fd passing patch series?

-- 
Regards,
Corey

  reply	other threads:[~2012-07-05 16:38 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-06-22 18:36 [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 1/7] qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg Corey Bryant
2012-06-22 19:31   ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 2/7] qapi: Convert getfd and closefd Corey Bryant
2012-07-11 18:51   ` Luiz Capitulino
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 3/7] qapi: Add pass-fd QMP command Corey Bryant
2012-06-22 20:24   ` Eric Blake
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 4/7] qapi: Re-arrange monitor.c functions Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 5/7] block: Prevent /dev/fd/X filename from being detected as floppy Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 6/7] block: Convert open calls to qemu_open Corey Bryant
2012-06-22 18:36 ` [Qemu-devel] [PATCH v4 7/7] osdep: Enable qemu_open to dup pre-opened fd Corey Bryant
2012-06-22 19:58   ` Eric Blake
     [not found] ` <20120626091004.GA14451@redhat.com>
     [not found]   ` <4FE9A0F0.2050809@redhat.com>
     [not found]     ` <20120626175045.2c7011b3@doriath.home>
     [not found]       ` <4FEA37A9.10707@linux.vnet.ibm.com>
     [not found]         ` <4FEA3D9C.8080205@redhat.com>
2012-07-02 22:02           ` [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd Corey Bryant
2012-07-02 22:31             ` Eric Blake
2012-07-03  9:07               ` Daniel P. Berrange
2012-07-03  9:40               ` Kevin Wolf
2012-07-03 13:42               ` Corey Bryant
2012-07-03 15:40             ` Corey Bryant
2012-07-03 15:59               ` Kevin Wolf
2012-07-03 16:25                 ` Corey Bryant
2012-07-03 17:03                   ` Eric Blake
2012-07-03 17:46                     ` Corey Bryant
2012-07-03 18:00                       ` Eric Blake
2012-07-03 18:21                         ` Corey Bryant
2012-07-04  8:09                           ` Kevin Wolf
2012-07-05 15:06                             ` Corey Bryant
2012-07-09 14:05                               ` Luiz Capitulino
2012-07-09 15:05                                 ` Corey Bryant
2012-07-09 15:46                                   ` Kevin Wolf
2012-07-09 16:18                                     ` Luiz Capitulino
2012-07-09 17:59                                       ` Corey Bryant
2012-07-09 17:35                                     ` Corey Bryant
2012-07-09 17:48                                       ` Luiz Capitulino
2012-07-09 18:02                                         ` Corey Bryant
2012-07-10  7:53                                       ` Kevin Wolf
2012-07-09 18:20                                   ` Corey Bryant
2012-07-04  8:00                     ` Kevin Wolf
2012-07-05 14:22                       ` Corey Bryant
2012-07-05 14:51                         ` Kevin Wolf
2012-07-05 16:35                           ` Corey Bryant
2012-07-05 16:37                             ` Corey Bryant [this message]
2012-07-06  9:06                               ` Kevin Wolf
2012-07-05 17:00                             ` Eric Blake
2012-07-05 17:36                               ` Corey Bryant
2012-07-06  9:11                               ` Kevin Wolf
2012-07-06 17:14                                 ` Corey Bryant
2012-07-06 17:15                                   ` Corey Bryant
2012-07-06 17:40                                 ` Corey Bryant
2012-07-06 18:19                                   ` [Qemu-devel] [libvirt] " Corey Bryant
2012-07-09 14:04                                   ` [Qemu-devel] " Kevin Wolf
2012-07-09 15:23                                     ` Corey Bryant
2012-07-09 15:30                                       ` Kevin Wolf
2012-07-09 18:40   ` Anthony Liguori
2012-07-09 19:00     ` Luiz Capitulino
2012-07-10  8:54       ` Daniel P. Berrange
2012-07-10  7:58     ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FF5C2BE.3070607@linux.vnet.ibm.com \
    --to=coreyb@linux.vnet.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=libvir-list@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.