From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4BFBD88D.9090002@codemonkey.ws>
Date: Tue, 25 May 2010 09:02:53 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC PATCH 1/1] ceph/rbd block driver for qemu-kvm
References: <20100519192222.GD61706@ncolin.muc.de>
 <4BF5A9D2.5080609@codemonkey.ws> <4BF91937.2070801@redhat.com>
 <87wrutg4dk.wl%morita.kazutaka@lab.ntt.co.jp>
 <4BFA5D96.3030603@redhat.com> <4BFA696D.2060606@redhat.com>
 <4BFAD59E.2010706@codemonkey.ws> <4BFB94D9.5080904@redhat.com>
 <4BFBCDD9.4070104@codemonkey.ws> <4BFBCFB9.6020104@redhat.com>
 <4BFBD0C6.9000105@codemonkey.ws> <4BFBD261.9040908@redhat.com>
 <4BFBD693.2030108@codemonkey.ws> <4BFBD759.40601@redhat.com>
In-Reply-To: <4BFBD759.40601@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, Blue Swirl,
 ceph-devel@vger.kernel.org, Christian Brunner, MORITA Kazutaka

On 05/25/2010 08:57 AM, Avi Kivity wrote:
> On 05/25/2010 04:54 PM, Anthony Liguori wrote:
>> On 05/25/2010 08:36 AM, Avi Kivity wrote:
>>>
>>> We'd need a kernel-level generic snapshot API for this eventually.
>>>
>>>> or (2) implement BUSE to complement FUSE and CUSE to enable proper
>>>> userspace block devices.
>>>
>>> Likely slow due to lots of copying.  Also needs a snapshot API.
>>
>> The kernel could use splice.
>
> Still can't make guest memory appear in (A)BUSE process memory without
> either mmu tricks (vmsplice in reverse) or a copy.  May be workable
> for an (A)BUSE driver that talks over a network, and thus can splice()
> its way out.

splice() actually takes an offset parameter, so it may be possible to
treat that parameter as a file offset.  That would essentially allow
you to implement a splice()-based thread pool where splice() replaces
preadv/pwritev.

It's not quite linux-aio, but it should take you pretty far.  I think
the main point is that the problem of allowing block plugins for qemu
is the same as that of allowing block plugins for the kernel.  The
kernel doesn't provide a stable interface (and we probably can't
either, for the same reasons), and plugins are generally discouraged
from a code quality perspective.

That said, making an external program work well as a block backend is
identical to making userspace block devices fast.

Regards,

Anthony Liguori
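
For illustration only, here is a rough sketch of the positional use of
splice() described above: the off_in/off_out arguments stand in for the
offset that pread()/pwrite() would take, and a pipe acts as the
in-kernel staging buffer.  The helper name splice_copy_at() is
hypothetical and is not taken from qemu or the kernel.  The sketch
assumes Linux and two regular files on a filesystem that supports
splice.  A worker thread in a pool could call something like this in
place of preadv/pwritev.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to `len` bytes from `in_fd` at offset
 * `in_off` to `out_fd` at offset `out_off` without passing the data
 * through a userspace buffer.  Returns the number of bytes copied
 * (short on EOF) or -1 on error.  Neither file position is changed,
 * because explicit offsets are handed to splice().
 */
static ssize_t splice_copy_at(int in_fd, off_t in_off,
                              int out_fd, off_t out_off, size_t len)
{
    int pipefd[2];
    ssize_t total = 0;
    loff_t src = in_off;   /* advanced by splice() as data is read */
    loff_t dst = out_off;  /* advanced by splice() as data is written */

    if (pipe(pipefd) < 0)
        return -1;

    while (len > 0) {
        /* file -> pipe: `src` plays the role of pread()'s offset */
        ssize_t n = splice(in_fd, &src, pipefd[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;  /* end of input file */

        /* pipe -> file: drain exactly the n bytes just queued */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(pipefd[0], NULL, out_fd, &dst,
                               (size_t)left, SPLICE_F_MOVE);
            if (m <= 0) {
                total = -1;
                goto out;
            }
            assert(m <= left);
            left -= m;
        }

        total += n;
        len -= (size_t)n;
    }

out:
    close(pipefd[0]);
    close(pipefd[1]);
    return total;
}
```

Each request needs two splice() calls and a pipe, so a real thread pool
would probably keep one pipe per worker rather than creating one per
call as this sketch does.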