From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>,
	Christian Brunner <chb@muc.de>, Blue Swirl <blauwirbel@gmail.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org,
	ceph-devel@vger.kernel.org
Subject: Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
Date: Tue, 25 May 2010 12:19:05 +0300
Message-ID: <4BFB9609.1070002@redhat.com>
In-Reply-To: <4BFAD090.3000203@codemonkey.ws>

On 05/24/2010 10:16 PM, Anthony Liguori wrote:
> On 05/24/2010 06:56 AM, Avi Kivity wrote:
>> On 05/24/2010 02:42 PM, MORITA Kazutaka wrote:
>>>
>>>> The server would be local and talk over a unix domain socket, perhaps
>>>> anonymous.
>>>>
>>>> nbd has other issues though, such as requiring a copy and no support
>>>> for metadata operations such as snapshot and file size extension.
>>>>
>>> Sorry, my explanation was unclear.  I'm not sure how running servers
>>> on localhost can solve the problem.
>>
>> The local server can convert from the local (nbd) protocol to the 
>> remote (sheepdog, ceph) protocol.
>>
>>> What I wanted to say was that we cannot specify the image of VM. With
>>> nbd protocol, command line arguments are as follows:
>>>
>>>   $ qemu nbd:hostname:port
>>>
>>> As this syntax shows, with nbd protocol the client cannot pass the VM
>>> image name to the server.
>>
>> We would extend it to allow it to connect to a unix domain socket:
>>
>>   qemu nbd:unix:/path/to/socket
>
> nbd is a no-go because it only supports a single, synchronous I/O 
> operation at a time and has no mechanism for extensibility.
>
> If we go this route, I think two options are worth considering.  The 
> first would be a purely socket based approach where we just accepted 
> the extra copy.
>
> The other potential approach would be shared memory based.  We export 
> all guest ram as shared memory along with a small bounce buffer pool.  
> We would then use a ring queue (potentially even using virtio-blk) and 
> an eventfd for notification.

We can't actually export guest memory unless we allocate it as a shared 
memory object, which has many disadvantages.  The only way to export 
anonymous memory now is vmsplice(), which is fairly limited.
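
To illustrate the shared-memory-object option mentioned above, here is a
minimal sketch, not QEMU code.  It assumes the guest RAM is allocated as a
named shared memory object so that a second party can map the same pages.
The names export_guest_ram and attach_block_server and the 4096-byte size
are hypothetical, and Python's multiprocessing.shared_memory stands in for
shm_open() plus mmap(MAP_SHARED).  Both mappings live in one process here
only to keep the example short.

```python
from multiprocessing import shared_memory

GUEST_RAM_SIZE = 4096  # hypothetical; real guest RAM is far larger


def export_guest_ram(size):
    """Allocate the 'guest RAM' as a named shared memory object."""
    return shared_memory.SharedMemory(create=True, size=size)


def attach_block_server(name):
    """Map the same object by name, as a separate block server would."""
    return shared_memory.SharedMemory(name=name)


def demo():
    guest = export_guest_ram(GUEST_RAM_SIZE)
    server = attach_block_server(guest.name)
    try:
        # The "server" fills a guest buffer in place, with no bounce copy.
        server.buf[0:5] = b"block"
        # The write is visible through the guest's own mapping.
        return bytes(guest.buf[0:5])
    finally:
        server.close()
        guest.close()
        guest.unlink()  # remove the named object once both sides are done
```

Anonymous memory has no name or file descriptor that another process
could map this way, which is why the allocation itself has to change.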


>
>> The server at the other end would associate the socket with a 
>> filename and forward it to the server using the remote protocol.
>>
>> However, I don't think nbd would be a good protocol.  My preference 
>> would be for a plugin API, or for a new local protocol that uses 
>> splice() to avoid copies.
>
> I think a good shared memory implementation would be preferable to 
> plugins.  I think it's worth attempting to do a plugin interface for 
> the block layer but I strongly suspect it would not be sufficient.
>
> I would not want to see plugins that interacted with BlockDriverState 
> directly, for instance.  We change it far too often.  Our main loop 
> functions are also not terribly stable so I'm not sure how we would 
> handle that (unless we forced all block plugins to be in a separate 
> thread).

If we manage to make a good long-term stable plugin API, it would be a 
good candidate for the block layer itself.

Some OSes manage to have a stable block driver ABI, so it should be 
possible, if difficult.

-- 
error compiling committee.c: too many arguments to function

