From: Anthony Liguori <anthony@codemonkey.ws>
To: Eric Blake <eblake@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	libvir-list@redhat.com, Anthony Liguori <aliguori@us.ibm.com>,
	Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [libvirt] [RFC 0/5] block: File descriptor passing using -open-hook-fd
Date: Tue, 01 May 2012 16:52:05 -0500
Message-ID: <4FA05B05.70406@codemonkey.ws>
In-Reply-To: <4FA04E12.2090206@redhat.com>

On 05/01/2012 03:56 PM, Eric Blake wrote:
> On 05/01/2012 02:25 PM, Anthony Liguori wrote:
>> Thanks for sending this out Stefan.
>
> Indeed.
>
>
>>> This series adds the -open-hook-fd command-line option.  Whenever QEMU
>>> needs to open an image file it sends a request over the given UNIX
>>> domain socket.  The response includes the file descriptor or an errno
>>> on failure.  Please see the patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor
>>> passing without changing -drive.  It also supports snapshot_blkdev and
>>> other commands that re-open image files.
>>>
>>> Anthony Liguori <aliguori@us.ibm.com> wrote most of these patches.  I
>>> added a demo -open-hook-fd server and added some small fixes.  Since
>>> Anthony is traveling right now I'm sending the RFC for discussion.
>>
>> What I like about this approach is that it's useful outside the block
>> layer and is conceptually simple from a QEMU PoV.  We simply delegate
>> open() to libvirt and let libvirt enforce whatever rules it wants.
>>
>> This is not meant to be an alternative to blockdev; even with
>> blockdev, I think we still want to use a mechanism like this.
>
> The overall series looks like it would be rather interesting.  What sort
> of timing restrictions are there?  For example, the proposed
> 'drive-reopen' command (probably now delegated to qemu 1.2) would mean
> that qemu would be calling back into libvirt in order to do the reopen.
> If libvirt takes its time in passing back an open fd, is it going to
> starve qemu from answering unrelated monitor commands in the meantime?

s/libvirt/kernel/g and your concerns are equally valid.

open() should never be called in a path that could block things.  There's
always the possibility that we're on top of NFS and the open could time out.

For something like drive_reopen, we should use an asynchronous open() that
dispatches the open() in the posix-aio thread pool.

That's part of what's nice about this approach: we could still call file_open()
in the posix-aio thread pool...
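
To make the thread-pool idea concrete, here is a minimal sketch in C. It assumes a single plain pthread as a stand-in for the posix-aio thread pool, and the names async_open_req, async_open_start() and async_open_finish() are hypothetical, not QEMU APIs. The point is only that the potentially slow open() runs off the caller's thread and the result (fd or errno) is collected later.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical request record for illustration; not a QEMU structure. */
struct async_open_req {
    const char *path;   /* file to open */
    int flags;          /* open(2) flags */
    mode_t mode;        /* only meaningful with O_CREAT */
    int fd;             /* result: fd >= 0, or -1 on failure */
    int err;            /* result: errno value when fd == -1 */
    pthread_t thread;   /* worker running the blocking open() */
};

/* Runs on the worker thread; this is the only place that may block. */
static void *async_open_worker(void *opaque)
{
    struct async_open_req *req = opaque;

    req->fd = open(req->path, req->flags, req->mode);
    req->err = (req->fd < 0) ? errno : 0;
    return NULL;
}

/* Start the open() in the background. Returns 0 or an errno value. */
int async_open_start(struct async_open_req *req)
{
    assert(req != NULL && req->path != NULL);
    req->fd = -1;
    req->err = 0;
    return pthread_create(&req->thread, NULL, async_open_worker, req);
}

/* Collect the result. Returns the fd, or a negative errno value. */
int async_open_finish(struct async_open_req *req)
{
    int rc = pthread_join(req->thread, NULL);

    if (rc != 0) {
        return -rc;
    }
    return (req->fd >= 0) ? req->fd : -req->err;
}
```

A real implementation would signal completion through a callback or event notifier rather than a blocking join, so that the main loop keeps servicing monitor commands in the meantime.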

> I definitely want to make sure we avoid deadlock where libvirt is
> waiting on a monitor command, but the monitor command is waiting on
> libvirt to pass an fd.
>
> Is this also an opportunity to request whether a particular fd must be
> seekable vs. acceptable as a one-pass read or write, perhaps by whether
> the command is 1 (seekable open) or 2 (one-pass open)?

I'm not really sure where the distinction lies...

I want the RPC to behave exactly like open().  So if we're assuming that open() 
of a /dev/ file returns something that is ioctl()'able, then that's what libvirt 
should return.
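
As a rough illustration of "behave exactly like open()", here is a sketch in C of the reply half of such an RPC over a UNIX domain socket. The wire format is an assumption made up for this example (a single int carrying 0 or an errno value, with the fd attached as SCM_RIGHTS on success); the real protocol is defined in the patches. The function names are hypothetical as well.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one fd, aligned for struct cmsghdr. */
union fd_ctl {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Server side: send 0 plus the fd on success, or the errno value alone. */
int send_open_reply(int sock, int fd, int err)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_ctl ctl;
    int payload = (fd >= 0) ? 0 : err;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &payload;
    iov.iov_len = sizeof(payload);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (fd >= 0) {
        struct cmsghdr *cmsg;

        memset(&ctl, 0, sizeof(ctl));
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof(ctl.buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    }
    return sendmsg(sock, &msg, 0) == (ssize_t)sizeof(payload) ? 0 : -1;
}

/* Server side: do the real open() (no O_CREAT here) and report the result. */
int serve_open_request(int sock, const char *path, int flags)
{
    int fd, err, rc;

    assert(path != NULL);
    fd = open(path, flags);
    err = (fd < 0) ? errno : 0;
    rc = send_open_reply(sock, fd, err);
    if (fd >= 0) {
        close(fd);  /* the receiver now holds its own duplicate */
    }
    return rc;
}

/* Client side: returns a new fd, or -1 with errno set, just like open(). */
int recv_open_reply(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_ctl ctl;
    struct cmsghdr *cmsg;
    int payload = 0;
    int fd;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    iov.iov_base = &payload;
    iov.iov_len = sizeof(payload);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != (ssize_t)sizeof(payload)) {
        errno = EIO;
        return -1;
    }
    if (payload != 0) {
        errno = payload;  /* propagate the server's errno */
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        errno = EIO;
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}
```

Because the descriptor the caller gets back refers to the same open file description the server created, anything that works on a locally opened fd (including ioctl() on a /dev/ node) works on the passed one too.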

If we want to sort of do fd-transformation where a special protocol is used for 
things like ioctl, that's fine, but it ought to be a different mechanism (that's 
probably not nearly as generic).

> For example,
> migration is one-pass (and therefore libvirt passes a pipe which is
> hooked up to a helper app that uses O_DIRECT), while block devices must
> be seekable.

But migration doesn't involve doing an open().  This is not a replacement for fd 
passing.  This is a replacement for open() to make up for the facts that (1) 
some management tools like libvirt cannot isolate guests with DAC and (2) 
SELinux cannot be used to isolate guests across all file systems.

I would really prefer that the kernel fix this problem for us, but from what I'm
told, the problem lies in the NFS standards committee, so short of forking the
NFS protocol, there isn't much that the kernel can do.

Regards,

Anthony Liguori

>
