qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
@ 2009-01-22 16:59 Anthony Liguori
  2009-01-26 15:13 ` Gerd Hoffmann
  0 siblings, 1 reply; 3+ messages in thread
From: Anthony Liguori @ 2009-01-22 16:59 UTC (permalink / raw)
  To: qemu-devel

Revision: 6397
          http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=6397
Author:   aliguori
Date:     2009-01-22 16:59:24 +0000 (Thu, 22 Jan 2009)

Log Message:
-----------
Vectored block device API (Avi Kivity)

Most devices that are capable of DMA are also capable of scatter-gather.
With the memory mapping API, this means that the device code needs to be
able to access discontiguous host memory regions.

For block devices, this translates to vectored I/O.  This patch implements
an asynchronous vectored interface for the qemu block devices.  At the moment
all I/O is bounced and submitted through the non-vectored API; in the future
we will convert block devices to natively support vectored I/O wherever
possible.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Modified Paths:
--------------
    trunk/block.c
    trunk/block.h

Modified: trunk/block.c
===================================================================
--- trunk/block.c	2009-01-22 16:59:20 UTC (rev 6396)
+++ trunk/block.c	2009-01-22 16:59:24 UTC (rev 6397)
@@ -1246,6 +1246,69 @@
 /**************************************************************/
 /* async I/Os */
 
+typedef struct VectorTranslationState {
+    QEMUIOVector *iov;
+    uint8_t *bounce;
+    int is_write;
+    BlockDriverAIOCB *aiocb;
+    BlockDriverAIOCB *this_aiocb;
+} VectorTranslationState;
+
+static void bdrv_aio_rw_vector_cb(void *opaque, int ret)
+{
+    VectorTranslationState *s = opaque;
+
+    if (!s->is_write) {
+        qemu_iovec_from_buffer(s->iov, s->bounce);
+    }
+    qemu_free(s->bounce);
+    s->this_aiocb->cb(s->this_aiocb->opaque, ret);
+    qemu_aio_release(s->this_aiocb);
+}
+
+static BlockDriverAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            QEMUIOVector *iov,
+                                            int nb_sectors,
+                                            BlockDriverCompletionFunc *cb,
+                                            void *opaque,
+                                            int is_write)
+
+{
+    VectorTranslationState *s = qemu_mallocz(sizeof(*s));
+    BlockDriverAIOCB *aiocb = qemu_aio_get(bs, cb, opaque);
+
+    s->this_aiocb = aiocb;
+    s->iov = iov;
+    s->bounce = qemu_memalign(512, nb_sectors * 512);
+    s->is_write = is_write;
+    if (is_write) {
+        qemu_iovec_to_buffer(s->iov, s->bounce);
+        s->aiocb = bdrv_aio_write(bs, sector_num, s->bounce, nb_sectors,
+                                  bdrv_aio_rw_vector_cb, s);
+    } else {
+        s->aiocb = bdrv_aio_read(bs, sector_num, s->bounce, nb_sectors,
+                                 bdrv_aio_rw_vector_cb, s);
+    }
+    return aiocb;
+}
+
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+                              cb, opaque, 0);
+}
+
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+                                  QEMUIOVector *iov, int nb_sectors,
+                                  BlockDriverCompletionFunc *cb, void *opaque)
+{
+    return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+                              cb, opaque, 1);
+}
+
 BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
                                 uint8_t *buf, int nb_sectors,
                                 BlockDriverCompletionFunc *cb, void *opaque)
@@ -1294,6 +1357,11 @@
 {
     BlockDriver *drv = acb->bs->drv;
 
+    if (acb->cb == bdrv_aio_rw_vector_cb) {
+        VectorTranslationState *s = acb->opaque;
+        acb = s->aiocb;
+    }
+
     drv->bdrv_aio_cancel(acb);
 }
 

Modified: trunk/block.h
===================================================================
--- trunk/block.h	2009-01-22 16:59:20 UTC (rev 6396)
+++ trunk/block.h	2009-01-22 16:59:24 UTC (rev 6397)
@@ -2,6 +2,7 @@
 #define BLOCK_H
 
 #include "qemu-aio.h"
+#include "qemu-common.h"
 
 /* block.c */
 typedef struct BlockDriver BlockDriver;
@@ -85,6 +86,13 @@
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
 
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+                                  QEMUIOVector *iov, int nb_sectors,
+                                  BlockDriverCompletionFunc *cb, void *opaque);
+
 BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
                                 uint8_t *buf, int nb_sectors,
                                 BlockDriverCompletionFunc *cb, void *opaque);
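
The bounce-buffer translation that bdrv_aio_rw_vector() performs can be
illustrated outside of QEMU.  The following is only a minimal sketch under
stated assumptions, not the code from this revision: it uses the POSIX
struct iovec in place of QEMUIOVector, it is synchronous, and the names
iov_to_buffer(), iov_from_buffer(), flat_read() and bounced_readv() are
hypothetical helpers invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>            /* POSIX struct iovec */

#define SECTOR_SIZE 512

/* Gather: copy all segments into one contiguous buffer; returns bytes copied. */
static size_t iov_to_buffer(const struct iovec *iov, int niov, uint8_t *buf)
{
    size_t off = 0;
    int i;

    assert(niov >= 0);
    for (i = 0; i < niov; i++) {
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return off;
}

/* Scatter: copy a contiguous buffer back out into the segments. */
static size_t iov_from_buffer(const struct iovec *iov, int niov,
                              const uint8_t *buf)
{
    size_t off = 0;
    int i;

    assert(niov >= 0);
    for (i = 0; i < niov; i++) {
        memcpy(iov[i].iov_base, buf + off, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    return off;
}

/* Hypothetical non-vectored backend: whole-sector read from an in-memory disk. */
static int flat_read(const uint8_t *disk, int64_t sector_num,
                     uint8_t *buf, int nb_sectors)
{
    memcpy(buf, disk + sector_num * SECTOR_SIZE,
           (size_t)nb_sectors * SECTOR_SIZE);
    return 0;
}

/* Vectored read emulated by bouncing through one contiguous buffer. */
static int bounced_readv(const uint8_t *disk, int64_t sector_num,
                         const struct iovec *iov, int niov, int nb_sectors)
{
    size_t len = (size_t)nb_sectors * SECTOR_SIZE;
    size_t total = 0;
    uint8_t *bounce;
    int i, ret;

    /* The vector must describe exactly nb_sectors worth of memory. */
    for (i = 0; i < niov; i++) {
        total += iov[i].iov_len;
    }
    if (total != len) {
        return -1;
    }
    bounce = malloc(len);
    if (!bounce) {
        return -1;
    }
    ret = flat_read(disk, sector_num, bounce, nb_sectors);
    if (ret == 0) {
        iov_from_buffer(iov, niov, bounce);
    }
    free(bounce);
    return ret;
}
```

A write would go the other way: iov_to_buffer() into the bounce buffer
first, then one flat write.  The extra copy in each direction is the cost
that native vectored support in the drivers would remove.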


* Re: [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
  2009-01-22 16:59 [Qemu-devel] [6397] Vectored block device API (Avi Kivity) Anthony Liguori
@ 2009-01-26 15:13 ` Gerd Hoffmann
  2009-01-26 20:48   ` Anthony Liguori
  0 siblings, 1 reply; 3+ messages in thread
From: Gerd Hoffmann @ 2009-01-26 15:13 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
> For block devices, this translates to vectored I/O.  This patch implements
> an asynchronous vectored interface for the qemu block devices.  At the moment
> all I/O is bounced and submitted through the non-vectored API; in the future
> we will convert block devices to natively support vectored I/O wherever
> possible.

Any plan for this?

Current state is this:  BlockDriver provides *three* ways to do I/O.

#1 is bdrv_{read,write}, operating on sectors.
#2 is bdrv_aio_{read,write}, operating on sectors too.
#3 is bdrv_{pread,pwrite}, operating on bytes.

All block drivers implement #1.
Most block drivers implement only #1.
#2 is implemented by qcow, qcow2, raw (including host_device).
#3 is implemented by raw (+host_device) only.

We can't kill #1 for the time being.

Not sure what the motivation for #3 is (O_DIRECT ?).  Is that actually
useful for something?  Can we drop it maybe?  Block I/O is sector
oriented after all, so I'm not sure what the motivation for a
byte-oriented interface is in the first place ...

#2 is the candidate to be transformed into a vectored API.  Given the
plan is to implement aio using threads:  I think the block driver
doesn't need to know anything about aio.  It should provide read/write
methods which are (a) vectored and (b) thread-safe.

For raw this is trivial:  Use preadv(), done.
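
As an illustration of the preadv() suggestion, here is a minimal sketch,
assuming a system that provides preadv() (it is a BSD/Linux extension, not
a POSIX.1-2008 interface).  raw_readv() and raw_readv_demo() are
hypothetical names, not QEMU functions, and a real driver would also have
to cope with short reads and with platforms that lack the call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* glibc: expose preadv() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Hypothetical raw-driver read: fill the vector from the file, starting at
 * sector_num.  Returns bytes read (may be short) or -1 with errno set. */
static ssize_t raw_readv(int fd, int64_t sector_num,
                         const struct iovec *iov, int niov)
{
    assert(sector_num >= 0);
    return preadv(fd, iov, niov, (off_t)sector_num * SECTOR_SIZE);
}

/* Usage: write two sectors to a temporary file, then read sector 1 back
 * into two separate buffers.  Returns 0 on success, -1 on failure. */
static int raw_readv_demo(void)
{
    char path[] = "/tmp/raw-readv-XXXXXX";
    uint8_t data[2 * SECTOR_SIZE], head[200], tail[SECTOR_SIZE - 200];
    struct iovec iov[2];
    size_t i;
    int fd, ok;

    for (i = 0; i < sizeof(data); i++) {
        data[i] = (uint8_t)(i % 251);
    }
    fd = mkstemp(path);
    if (fd < 0) {
        return -1;
    }
    unlink(path);               /* the file goes away once fd is closed */

    iov[0].iov_base = head;
    iov[0].iov_len = sizeof(head);
    iov[1].iov_base = tail;
    iov[1].iov_len = sizeof(tail);

    ok = write(fd, data, sizeof(data)) == (ssize_t)sizeof(data)
         && raw_readv(fd, 1, iov, 2) == SECTOR_SIZE
         && memcmp(head, data + SECTOR_SIZE, sizeof(head)) == 0
         && memcmp(tail, data + SECTOR_SIZE + sizeof(head), sizeof(tail)) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

Because preadv() takes an explicit offset and does not move the file
position, concurrent calls on the same descriptor do not interfere with
each other, which is what makes it a natural fit for requirement (b).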

For qcow2 this is a bit more difficult.  Metadata updating obviously
requires some locking.  Probably lookups too (I'm not familiar with the
on-disk format and driver internals).  The actual data transfer should
be doable unlocked I think.

The interface could look like this:

    int (*bdrv_read)(BlockDriverState *bs, QEMUIOVector *qiov,
                     int64_t sector_num);
    int (*bdrv_write)(BlockDriverState *bs, QEMUIOVector *qiov,
                      int64_t sector_num);

All the aio magic can live in the block layer then, well hidden from the
block drivers.

Comments on this?  Especially from people knowing qcow2 better than I do?

cheers,
  Gerd


* Re: [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
  2009-01-26 15:13 ` Gerd Hoffmann
@ 2009-01-26 20:48   ` Anthony Liguori
  0 siblings, 0 replies; 3+ messages in thread
From: Anthony Liguori @ 2009-01-26 20:48 UTC (permalink / raw)
  To: qemu-devel

Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>   
>> For block devices, this translates to vectored I/O.  This patch implements
>> an asynchronous vectored interface for the qemu block devices.  At the moment
>> all I/O is bounced and submitted through the non-vectored API; in the future
>> we will convert block devices to natively support vectored I/O wherever
>> possible.
>>     
>
> Any plan for this?
>   

Avi is on vacation this week but my understanding is that he'll be 
submitting patches shortly after he gets back.

> Current state is this:  BlockDriver provides *three* ways to do I/O.
>
> #1 is bdrv_{read,write}, operating on sectors.
> #2 is bdrv_aio_{read,write}, operating on sectors too.
> #3 is bdrv_{pread,pwrite}, operating on bytes.
>
> All block drivers implement #1.
> Most block drivers implement only #1.
> #2 is implemented by qcow, qcow2, raw (including host_device).
> #3 is implemented by raw (+host_device) only.
>
> We can't kill #1 for the time being.
>
> Not sure what the motivation for #3 is (O_DIRECT ?).

Internally, qcow, qcow2, etc. use the block API to access the disk 
formats.  These accesses are not always sector aligned.
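
To make the point about unaligned accesses concrete, the sketch below
shows one way a byte-granular read can be emulated on top of a
sector-granular one.  It is an illustration under assumptions, not QEMU
code: MemDisk, sector_read() and byte_pread() are hypothetical names, and
the "disk" is just a buffer in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical sector-granular backend: an in-memory disk image. */
typedef struct MemDisk {
    const uint8_t *data;
    int64_t nb_sectors;
} MemDisk;

/* Read whole sectors; returns 0 on success, -1 if out of range. */
static int sector_read(const MemDisk *d, int64_t sector_num,
                       uint8_t *buf, int nb_sectors)
{
    if (sector_num < 0 || nb_sectors < 0
        || sector_num + nb_sectors > d->nb_sectors) {
        return -1;
    }
    memcpy(buf, d->data + sector_num * SECTOR_SIZE,
           (size_t)nb_sectors * SECTOR_SIZE);
    return 0;
}

/* Byte-granular read built from whole-sector reads.
 * Returns 0 on success, -1 on failure. */
static int byte_pread(const MemDisk *d, int64_t offset,
                      uint8_t *buf, size_t count)
{
    uint8_t tmp[SECTOR_SIZE];
    size_t done = 0;

    assert(offset >= 0);
    while (done < count) {
        int64_t pos = offset + (int64_t)done;
        int64_t sector = pos / SECTOR_SIZE;
        size_t in_sector = (size_t)(pos % SECTOR_SIZE);
        size_t chunk = SECTOR_SIZE - in_sector;

        if (chunk > count - done) {
            chunk = count - done;
        }
        if (in_sector == 0 && chunk == SECTOR_SIZE) {
            /* Fully aligned sector: read straight into the caller's buffer. */
            if (sector_read(d, sector, buf + done, 1) < 0) {
                return -1;
            }
        } else {
            /* Partial sector: read it whole, then copy out the wanted part. */
            if (sector_read(d, sector, tmp, 1) < 0) {
                return -1;
            }
            memcpy(buf + done, tmp + in_sector, chunk);
        }
        done += chunk;
    }
    return 0;
}
```

An unaligned write needs one more step than this: the partial head and
tail sectors have to be read, patched and written back, so a byte
interface costs extra I/O whenever the backend only speaks sectors.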

> For qcow2 this is a bit more difficult.  Metadata updating obviously
> requires some locking.  Probably lookups too (I'm not familiar with the
> on-disk format and driver internals).  The actual data transfer should
> be doable unlocked I think.
>
> The interface could look like this:
>
>     int (*bdrv_read)(BlockDriverState *bs, QEMUIOVector *qiov,
>                      int64_t sector_num);
>     int (*bdrv_write)(BlockDriverState *bs, QEMUIOVector *qiov,
>                       int64_t sector_num);
>
> All the aio magic can live in the block layer then, well hidden from the
> block drivers.
>   

I agree with this in principle, but making all of the entry points 
thread safe is a fair bit of work.  Recall that we don't use threads on 
every platform so we need to take that into account too.

Regards,

Anthony Liguori

> Comments on this?  Especially from people knowing qcow2 better than I do?
>
> cheers,
>   Gerd
>
>
>   

