* [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
@ 2009-01-22 16:59 Anthony Liguori
2009-01-26 15:13 ` Gerd Hoffmann
0 siblings, 1 reply; 3+ messages in thread
From: Anthony Liguori @ 2009-01-22 16:59 UTC (permalink / raw)
To: qemu-devel
Revision: 6397
http://svn.sv.gnu.org/viewvc/?view=rev&root=qemu&revision=6397
Author: aliguori
Date: 2009-01-22 16:59:24 +0000 (Thu, 22 Jan 2009)
Log Message:
-----------
Vectored block device API (Avi Kivity)
Most devices that are capable of DMA are also capable of scatter-gather.
With the memory mapping API, this means that the device code needs to be
able to access discontiguous host memory regions.
For block devices, this translates to vectored I/O. This patch implements
an asynchronous vectored interface for the qemu block devices. At the moment
all I/O is bounced and submitted through the non-vectored API; in the future
we will convert block devices to natively support vectored I/O wherever
possible.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Modified Paths:
--------------
trunk/block.c
trunk/block.h
Modified: trunk/block.c
===================================================================
--- trunk/block.c 2009-01-22 16:59:20 UTC (rev 6396)
+++ trunk/block.c 2009-01-22 16:59:24 UTC (rev 6397)
@@ -1246,6 +1246,69 @@
/**************************************************************/
/* async I/Os */
+typedef struct VectorTranslationState {
+ QEMUIOVector *iov;
+ uint8_t *bounce;
+ int is_write;
+ BlockDriverAIOCB *aiocb;
+ BlockDriverAIOCB *this_aiocb;
+} VectorTranslationState;
+
+static void bdrv_aio_rw_vector_cb(void *opaque, int ret)
+{
+ VectorTranslationState *s = opaque;
+
+ if (!s->is_write) {
+ qemu_iovec_from_buffer(s->iov, s->bounce);
+ }
+ qemu_free(s->bounce);
+ s->this_aiocb->cb(s->this_aiocb->opaque, ret);
+ qemu_aio_release(s->this_aiocb);
+}
+
+static BlockDriverAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *iov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque,
+ int is_write)
+
+{
+ VectorTranslationState *s = qemu_mallocz(sizeof(*s));
+ BlockDriverAIOCB *aiocb = qemu_aio_get(bs, cb, opaque);
+
+ s->this_aiocb = aiocb;
+ s->iov = iov;
+ s->bounce = qemu_memalign(512, nb_sectors * 512);
+ s->is_write = is_write;
+ if (is_write) {
+ qemu_iovec_to_buffer(s->iov, s->bounce);
+ s->aiocb = bdrv_aio_write(bs, sector_num, s->bounce, nb_sectors,
+ bdrv_aio_rw_vector_cb, s);
+ } else {
+ s->aiocb = bdrv_aio_read(bs, sector_num, s->bounce, nb_sectors,
+ bdrv_aio_rw_vector_cb, s);
+ }
+ return aiocb;
+}
+
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+ cb, opaque, 0);
+}
+
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+ cb, opaque, 1);
+}
+
BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque)
@@ -1294,6 +1357,11 @@
{
BlockDriver *drv = acb->bs->drv;
+ if (acb->cb == bdrv_aio_rw_vector_cb) {
+ VectorTranslationState *s = acb->opaque;
+ acb = s->aiocb;
+ }
+
drv->bdrv_aio_cancel(acb);
}
Modified: trunk/block.h
===================================================================
--- trunk/block.h 2009-01-22 16:59:20 UTC (rev 6396)
+++ trunk/block.h 2009-01-22 16:59:24 UTC (rev 6397)
@@ -2,6 +2,7 @@
#define BLOCK_H
#include "qemu-aio.h"
+#include "qemu-common.h"
/* block.c */
typedef struct BlockDriver BlockDriver;
@@ -85,6 +86,13 @@
typedef struct BlockDriverAIOCB BlockDriverAIOCB;
typedef void BlockDriverCompletionFunc(void *opaque, int ret);
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
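The bounce-buffer approach described in the log message can be illustrated outside of qemu. What follows is a minimal standalone sketch, not the qemu implementation: it uses the POSIX struct iovec instead of QEMUIOVector, and the helper names (iov_total_len, iov_to_buffer_sketch, iov_from_buffer_sketch, bounce_demo) are hypothetical. A write gathers the scattered elements into one contiguous buffer before submission; a read scatters the contiguous buffer back into the elements on completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes described by the vector. */
static size_t iov_total_len(const struct iovec *iov, int niov)
{
    size_t total = 0;
    for (int i = 0; i < niov; i++) {
        total += iov[i].iov_len;
    }
    return total;
}

/* Gather: copy every element, in order, into one contiguous buffer.
 * This is what a bounced write has to do before submitting the I/O. */
static void iov_to_buffer_sketch(const struct iovec *iov, int niov,
                                 unsigned char *buf)
{
    for (int i = 0; i < niov; i++) {
        memcpy(buf, iov[i].iov_base, iov[i].iov_len);
        buf += iov[i].iov_len;
    }
}

/* Scatter: copy a contiguous buffer back out into the elements.
 * This is what a bounced read has to do once the I/O has completed. */
static void iov_from_buffer_sketch(const struct iovec *iov, int niov,
                                   const unsigned char *buf)
{
    for (int i = 0; i < niov; i++) {
        memcpy(iov[i].iov_base, buf, iov[i].iov_len);
        buf += iov[i].iov_len;
    }
}

/* Round trip: gather two source chunks, then scatter them into two
 * destination chunks with a different split. Returns 0 on success. */
static int bounce_demo(void)
{
    unsigned char src_a[3] = {1, 2, 3}, src_b[5] = {4, 5, 6, 7, 8};
    unsigned char dst_a[5] = {0}, dst_b[3] = {0};
    struct iovec wr[2] = { { src_a, sizeof src_a }, { src_b, sizeof src_b } };
    struct iovec rd[2] = { { dst_a, sizeof dst_a }, { dst_b, sizeof dst_b } };
    unsigned char bounce[8];
    static const unsigned char expect_a[5] = {1, 2, 3, 4, 5};
    static const unsigned char expect_b[3] = {6, 7, 8};

    /* The bounce buffer must be exactly as large as the vector. */
    assert(iov_total_len(wr, 2) == sizeof bounce);
    assert(iov_total_len(rd, 2) == sizeof bounce);

    iov_to_buffer_sketch(wr, 2, bounce);    /* write path: gather */
    iov_from_buffer_sketch(rd, 2, bounce);  /* read path: scatter */

    return (memcmp(dst_a, expect_a, sizeof dst_a) == 0 &&
            memcmp(dst_b, expect_b, sizeof dst_b) == 0) ? 0 : -1;
}
```

The extra copy in each direction is the cost of bouncing; native vectored support in a driver would avoid it.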
* Re: [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
2009-01-22 16:59 [Qemu-devel] [6397] Vectored block device API (Avi Kivity) Anthony Liguori
@ 2009-01-26 15:13 ` Gerd Hoffmann
2009-01-26 20:48 ` Anthony Liguori
0 siblings, 1 reply; 3+ messages in thread
From: Gerd Hoffmann @ 2009-01-26 15:13 UTC (permalink / raw)
To: qemu-devel
Anthony Liguori wrote:
> For block devices, this translates to vectored I/O. This patch implements
> an asynchronous vectored interface for the qemu block devices. At the moment
> all I/O is bounced and submitted through the non-vectored API; in the future
> we will convert block devices to natively support vectored I/O wherever
> possible.
Any plan for this?
Current state is this: BlockDriver provides *three* ways to do I/O.
#1 is bdrv_{read,write}, operating on sectors.
#2 is bdrv_aio_{read,write}, operating on sectors too.
#3 is bdrv_{pread,pwrite}, operating on bytes.
All block drivers implement #1.
Most block drivers implement only #1.
#2 is implemented by qcow, qcow2, raw (including host_device).
#3 is implemented by raw (+host_device) only.
We can't kill #1 for the time being.
Not sure what the motivation for #3 is (O_DIRECT ?). Is that actually
useful for something? Can we drop it maybe? Block I/O is sector
oriented after all, so I'm not sure what the motivation for a
byte-oriented interface is in the first place ...
#2 is the candidate to be transformed into a vectored API. Given the
plan is to implement aio using threads: I think the block driver
doesn't need to know anything about aio. It should provide read/write
methods which are (a) vectored and (b) thread-safe.
For raw this is trivial: Use preadv(), done.
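As a rough illustration of the raw case, here is a minimal sketch of a vectored sector read built directly on preadv(). It is not taken from the raw driver: raw_readv_sketch and raw_readv_demo are hypothetical names, the 512-byte sector size is assumed from the patch above, and preadv() is a Linux/BSD extension rather than POSIX, so availability depends on the host.

```c
#define _DEFAULT_SOURCE          /* exposes preadv() with glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define SECTOR_SIZE 512          /* assumption: fixed 512-byte sectors */

/* Hypothetical helper: read starting at sector_num straight into the
 * caller's scattered buffers, with no bounce buffer in between. */
static ssize_t raw_readv_sketch(int fd, const struct iovec *iov, int niov,
                                long long sector_num)
{
    return preadv(fd, iov, niov, (off_t)sector_num * SECTOR_SIZE);
}

/* Demo: write two sectors to a temporary file, then read them back into
 * two separate buffers with a single call. Returns 0 on success. */
static int raw_readv_demo(void)
{
    char path[] = "/tmp/readv_sketch_XXXXXX";
    unsigned char data[2 * SECTOR_SIZE];
    unsigned char a[SECTOR_SIZE], b[SECTOR_SIZE];
    struct iovec iov[2] = { { a, sizeof a }, { b, sizeof b } };
    ssize_t n;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);                /* the file disappears once fd is closed */

    memset(data, 0xAA, SECTOR_SIZE);
    memset(data + SECTOR_SIZE, 0xBB, SECTOR_SIZE);
    if (pwrite(fd, data, sizeof data, 0) != (ssize_t)sizeof data) {
        close(fd);
        return -1;
    }

    n = raw_readv_sketch(fd, iov, 2, 0);
    close(fd);
    if (n != (ssize_t)sizeof data) {
        return -1;
    }
    return (a[0] == 0xAA && a[SECTOR_SIZE - 1] == 0xAA &&
            b[0] == 0xBB && b[SECTOR_SIZE - 1] == 0xBB) ? 0 : -1;
}
```

Because the offset is passed explicitly, concurrent calls do not race on the file position, which fits the thread-safety requirement mentioned above.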
For qcow2 this is a bit more difficult. Metadata updating obviously
requires some locking. Probably lookups too (I'm not familiar with the
on-disk format and driver internals). The actual data transfer should
be doable unlocked I think.
The interface could look like this:
int (*bdrv_read)(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num);
int (*bdrv_write)(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num);
All the aio magic can live in the block layer then, well hidden from the
block drivers.
Comments on this? Especially from people knowing qcow2 better than I do?
cheers,
Gerd
* Re: [Qemu-devel] [6397] Vectored block device API (Avi Kivity)
2009-01-26 15:13 ` Gerd Hoffmann
@ 2009-01-26 20:48 ` Anthony Liguori
0 siblings, 0 replies; 3+ messages in thread
From: Anthony Liguori @ 2009-01-26 20:48 UTC (permalink / raw)
To: qemu-devel
Gerd Hoffmann wrote:
> Anthony Liguori wrote:
>
>> For block devices, this translates to vectored I/O. This patch implements
>> an asynchronous vectored interface for the qemu block devices. At the moment
>> all I/O is bounced and submitted through the non-vectored API; in the future
>> we will convert block devices to natively support vectored I/O wherever
>> possible.
>>
>
> Any plan for this?
>
Avi is on vacation this week but my understanding is that he'll be
submitting patches shortly after he gets back.
> Current state is this: BlockDriver provides *three* ways to do I/O.
>
> #1 is bdrv_{read,write}, operating on sectors.
> #2 is bdrv_aio_{read,write}, operating on sectors too.
> #3 is bdrv_{pread,pwrite}, operating on bytes.
>
> All block drivers implement #1.
> Most block drivers implement only #1.
> #2 is implemented by qcow, qcow2, raw (including host_device).
> #3 is implemented by raw (+host_device) only.
>
> We can't kill #1 for the time being.
>
> Not sure what the motivation for #3 is (O_DIRECT ?).
Internally, qcow, qcow2, etc. use the block API to access the disk
formats. These accesses are not always sector aligned.
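To make the alignment point concrete, here is a minimal sketch of a byte-granular read layered on a sector-granular one. It is an illustration only, not qemu code: the in-memory disk array and the names read_sectors, pread_sketch and pread_demo are hypothetical, and 512-byte sectors are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512
#define DISK_SECTORS 8

/* Hypothetical backing store standing in for an image file. */
static uint8_t disk[DISK_SECTORS * SECTOR_SIZE];

/* Sector-oriented read: whole sectors only, like interface #1. */
static int read_sectors(int64_t sector_num, uint8_t *buf, int nb_sectors)
{
    if (sector_num < 0 || nb_sectors < 0 ||
        sector_num + nb_sectors > DISK_SECTORS) {
        return -1;
    }
    memcpy(buf, disk + sector_num * SECTOR_SIZE,
           (size_t)nb_sectors * SECTOR_SIZE);
    return 0;
}

/* Byte-oriented read built on top of it: fetch each enclosing sector
 * into a temporary buffer and copy out only the requested bytes. */
static int pread_sketch(int64_t offset, uint8_t *buf, int count)
{
    uint8_t sector[SECTOR_SIZE];
    int done = 0;

    if (offset < 0 || count < 0) {
        return -1;
    }
    while (done < count) {
        int64_t pos = offset + done;
        int in_sector = (int)(pos % SECTOR_SIZE);  /* offset within sector */
        int chunk = SECTOR_SIZE - in_sector;       /* bytes left in sector */

        if (chunk > count - done) {
            chunk = count - done;
        }
        if (read_sectors(pos / SECTOR_SIZE, sector, 1) < 0) {
            return -1;
        }
        memcpy(buf + done, sector + in_sector, (size_t)chunk);
        done += chunk;
    }
    return count;
}

/* Demo: read 20 bytes starting at byte 500, which straddles the boundary
 * between sectors 0 and 1. Returns 0 on success. */
static int pread_demo(void)
{
    uint8_t out[20];

    for (size_t i = 0; i < sizeof disk; i++) {
        disk[i] = (uint8_t)(i % 251);
    }
    if (pread_sketch(500, out, (int)sizeof out) != (int)sizeof out) {
        return -1;
    }
    for (size_t j = 0; j < sizeof out; j++) {
        if (out[j] != (uint8_t)((500 + j) % 251)) {
            return -1;
        }
    }
    return 0;
}
```

A format driver reading a header or table entry at an arbitrary byte offset would otherwise have to repeat this bookkeeping at every call site, which is one argument for keeping a byte-oriented entry point.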
> For qcow2 this is a bit more difficult. Metadata updating obviously
> requires some locking. Probably lookups too (I'm not familiar with the
> on-disk format and driver internals). The actual data transfer should
> be doable unlocked I think.
>
> The interface could look like this:
>
> int (*bdrv_read)(BlockDriverState *bs, QEMUIOVector *qiov,
> int64_t sector_num);
> int (*bdrv_write)(BlockDriverState *bs, QEMUIOVector *qiov,
> int64_t sector_num);
>
> All the aio magic can live in the block layer then, well hidden from the
> block drivers.
>
I agree with this in principle, but making all of the entry points
thread safe is a fair bit of work. Recall that we don't use threads on
every platform so we need to take that into account too.
Regards,
Anthony Liguori
> Comments on this? Especially from people knowing qcow2 better than I do?
>
> cheers,
> Gerd
>
>
>