* [Qemu-devel] [PATCH 2/5] Add map client retry notification
From: Avi Kivity @ 2009-01-18 19:53 UTC
To: qemu-devel, Anthony Liguori
The target memory mapping API may fail if the bounce buffer resources
are exhausted. Add a notification mechanism to allow clients to retry
the mapping operation when resources become available again.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
cpu-all.h | 3 +++
exec.c | 38 ++++++++++++++++++++++++++++++++++++++
2 files changed, 41 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index 3439999..67e795e 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -928,6 +928,9 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
int is_write);
void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
int is_write);
+void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
+void cpu_unregister_map_client(void *cookie);
+
uint32_t ldub_phys(target_phys_addr_t addr);
uint32_t lduw_phys(target_phys_addr_t addr);
uint32_t ldl_phys(target_phys_addr_t addr);
diff --git a/exec.c b/exec.c
index 7162271..62bedc0 100644
--- a/exec.c
+++ b/exec.c
@@ -3053,6 +3053,43 @@ typedef struct {
static BounceBuffer bounce;
+typedef struct MapClient {
+ void *opaque;
+ void (*callback)(void *opaque);
+ LIST_ENTRY(MapClient) link;
+} MapClient;
+
+static LIST_HEAD(map_client_list, MapClient) map_client_list
+ = LIST_HEAD_INITIALIZER(map_client_list);
+
+void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque))
+{
+ MapClient *client = qemu_malloc(sizeof(*client));
+
+ client->opaque = opaque;
+ client->callback = callback;
+ LIST_INSERT_HEAD(&map_client_list, client, link);
+ return client;
+}
+
+void cpu_unregister_map_client(void *_client)
+{
+ MapClient *client = (MapClient *)_client;
+
+ LIST_REMOVE(client, link);
+}
+
+static void cpu_notify_map_clients(void)
+{
+ MapClient *client;
+
+ while (!LIST_EMPTY(&map_client_list)) {
+ client = LIST_FIRST(&map_client_list);
+ client->callback(client->opaque);
+ LIST_REMOVE(client, link);
+ }
+}
+
void *cpu_physical_memory_map(target_phys_addr_t addr,
target_phys_addr_t *plen,
int is_write)
@@ -3137,6 +3174,7 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
}
qemu_free(bounce.buffer);
bounce.buffer = NULL;
+ cpu_notify_map_clients();
}
/* warning: addr must be aligned */
--
1.6.0.6
* [Qemu-devel] [PATCH 0/5] Direct memory access for devices (v2)
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
One of the deficiencies of the current device layer is that it can only access
guest RAM via cpu_physical_memory_rw(). This means that the device emulation
code must copy the memory to or from a temporary buffer, even though the host
offers APIs which allow direct access to memory. This reduces efficiency on
DMA capable devices, especially disks.
This patchset introduces a complement to the read/write API,
cpu_physical_memory_map() which allows device emulation code to map
guest memory directly. The API bounces memory regions which cannot be
mapped (such as mmio regions) using an internal buffer.
As an example, IDE emulation is converted to use the new API. This exposes
another deficiency: lack of scatter/gather support in the block layer. To
work around this, a vectored block API is introduced, currently emulated
by bouncing. Additional work is needed to convert all block format drivers
to use the vectored API.
Changes from v1:
- documented memory mapping API
- added access_len parameter to unmap operation, to indicate how much
memory was actually accessed
- move QEMUIOVector to cutils.c, and add flatten/unflatten operations
- change block format driver API to accept a QEMUIOVector rather than a
bare struct iovec
Avi Kivity (5):
Add target memory mapping API
Add map client retry notification
I/O vector helpers
Vectored block device API
Convert IDE to directly access guest memory
block.c | 68 +++++++++++++++++++++++++++
block.h | 8 +++
cpu-all.h | 8 +++
cutils.c | 47 +++++++++++++++++++
exec.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
hw/ide.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++------
qemu-common.h | 12 +++++
7 files changed, 402 insertions(+), 16 deletions(-)
* [Qemu-devel] [PATCH 1/5] Add target memory mapping API
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
Devices accessing large amounts of memory (as with DMA) will wish to obtain
a pointer to guest memory rather than access it indirectly via
cpu_physical_memory_rw(). Add a new API to convert target addresses to
host pointers.
In case the target address does not correspond to RAM, a bounce buffer is
allocated. To prevent the guest from causing the host to allocate unbounded
amounts of bounce buffer, this memory is limited (currently to one page).
Signed-off-by: Avi Kivity <avi@redhat.com>
---
cpu-all.h | 6 +++
exec.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 108 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index ee0a6e3..22ffaa7 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -923,6 +923,12 @@ static inline void cpu_physical_memory_write(target_phys_addr_t addr,
{
cpu_physical_memory_rw(addr, (uint8_t *)buf, len, 1);
}
+void *cpu_physical_memory_map(target_phys_addr_t addr,
+ target_phys_addr_t *plen,
+ int is_write);
+void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
+ int is_write, target_phys_addr_t access_len);
+
uint32_t ldub_phys(target_phys_addr_t addr);
uint32_t lduw_phys(target_phys_addr_t addr);
uint32_t ldl_phys(target_phys_addr_t addr);
diff --git a/exec.c b/exec.c
index faa6333..6dd88fc 100644
--- a/exec.c
+++ b/exec.c
@@ -3045,6 +3045,108 @@ void cpu_physical_memory_write_rom(target_phys_addr_t addr,
}
}
+typedef struct {
+ void *buffer;
+ target_phys_addr_t addr;
+ target_phys_addr_t len;
+} BounceBuffer;
+
+static BounceBuffer bounce;
+
+/* Map a physical memory region into a host virtual address.
+ * May map a subset of the requested range, given by and returned in *plen.
+ * May return NULL if resources needed to perform the mapping are exhausted.
+ * Use only for reads OR writes - not for read-modify-write operations.
+ */
+void *cpu_physical_memory_map(target_phys_addr_t addr,
+ target_phys_addr_t *plen,
+ int is_write)
+{
+ target_phys_addr_t len = *plen;
+ target_phys_addr_t done = 0;
+ int l;
+ uint8_t *ret = NULL;
+ uint8_t *ptr;
+ target_phys_addr_t page;
+ unsigned long pd;
+ PhysPageDesc *p;
+ unsigned long addr1;
+
+ while (len > 0) {
+ page = addr & TARGET_PAGE_MASK;
+ l = (page + TARGET_PAGE_SIZE) - addr;
+ if (l > len)
+ l = len;
+ p = phys_page_find(page >> TARGET_PAGE_BITS);
+ if (!p) {
+ pd = IO_MEM_UNASSIGNED;
+ } else {
+ pd = p->phys_offset;
+ }
+
+ if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+ if (done || bounce.buffer) {
+ break;
+ }
+ bounce.buffer = qemu_memalign(TARGET_PAGE_SIZE, TARGET_PAGE_SIZE);
+ bounce.addr = addr;
+ bounce.len = l;
+ if (!is_write) {
+ cpu_physical_memory_rw(addr, bounce.buffer, l, 0);
+ }
+ ptr = bounce.buffer;
+ } else {
+ addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+ ptr = phys_ram_base + addr1;
+ }
+ if (!done) {
+ ret = ptr;
+ } else if (ret + done != ptr) {
+ break;
+ }
+
+ len -= l;
+ addr += l;
+ done += l;
+ }
+ *plen = done;
+ return ret;
+}
+
+/* Unmaps a memory region previously mapped by cpu_physical_memory_map().
+ * Will also mark the memory as dirty if is_write == 1. access_len gives
+ * the amount of memory that was actually read or written by the caller.
+ */
+void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
+ int is_write, target_phys_addr_t access_len)
+{
+ if (buffer != bounce.buffer) {
+ if (is_write) {
+ unsigned long addr1 = (uint8_t *)buffer - phys_ram_base;
+ while (access_len) {
+ unsigned l;
+ l = TARGET_PAGE_SIZE;
+ if (l > access_len)
+ l = access_len;
+ if (!cpu_physical_memory_is_dirty(addr1)) {
+ /* invalidate code */
+ tb_invalidate_phys_page_range(addr1, addr1 + l, 0);
+ /* set dirty bit */
+ phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+ (0xff & ~CODE_DIRTY_FLAG);
+ }
+ addr1 += l;
+ access_len -= l;
+ }
+ }
+ return;
+ }
+ if (is_write) {
+ cpu_physical_memory_write(bounce.addr, bounce.buffer, access_len);
+ }
+ qemu_free(bounce.buffer);
+ bounce.buffer = NULL;
+}
/* warning: addr must be aligned */
uint32_t ldl_phys(target_phys_addr_t addr)
--
1.6.0.6
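To make the calling convention concrete, here is a minimal usage sketch
for the API above. It is illustrative only: the helper name is
hypothetical, and real device code (see the IDE conversion later in this
series) builds scatter-gather lists instead of copying.

static void dma_write_to_guest(target_phys_addr_t addr,
                               const uint8_t *data, target_phys_addr_t len)
{
    while (len) {
        target_phys_addr_t plen = len;
        void *host = cpu_physical_memory_map(addr, &plen, 1 /* is_write */);

        if (!host) {
            /* bounce buffer exhausted: fall back to a plain copy */
            cpu_physical_memory_rw(addr, (uint8_t *)data, len, 1);
            return;
        }
        /* the map may cover only a prefix of the request; plen says how much */
        memcpy(host, data, plen);
        cpu_physical_memory_unmap(host, plen, 1, plen);
        addr += plen;
        data += plen;
        len -= plen;
    }
}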
* [Qemu-devel] [PATCH 2/5] Add map client retry notification
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
The target memory mapping API may fail if the bounce buffer resources
are exhausted. Add a notification mechanism to allow clients to retry
the mapping operation when resources become available again.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
cpu-all.h | 2 ++
exec.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 42 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index 22ffaa7..e71bd06 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -928,6 +928,8 @@ void *cpu_physical_memory_map(target_phys_addr_t addr,
int is_write);
void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
int is_write, target_phys_addr_t access_len);
+void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
+void cpu_unregister_map_client(void *cookie);
uint32_t ldub_phys(target_phys_addr_t addr);
uint32_t lduw_phys(target_phys_addr_t addr);
diff --git a/exec.c b/exec.c
index 6dd88fc..56e5e48 100644
--- a/exec.c
+++ b/exec.c
@@ -3053,10 +3053,49 @@ typedef struct {
static BounceBuffer bounce;
+typedef struct MapClient {
+ void *opaque;
+ void (*callback)(void *opaque);
+ LIST_ENTRY(MapClient) link;
+} MapClient;
+
+static LIST_HEAD(map_client_list, MapClient) map_client_list
+ = LIST_HEAD_INITIALIZER(map_client_list);
+
+void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque))
+{
+ MapClient *client = qemu_malloc(sizeof(*client));
+
+ client->opaque = opaque;
+ client->callback = callback;
+ LIST_INSERT_HEAD(&map_client_list, client, link);
+ return client;
+}
+
+void cpu_unregister_map_client(void *_client)
+{
+ MapClient *client = (MapClient *)_client;
+
+ LIST_REMOVE(client, link);
+}
+
+static void cpu_notify_map_clients(void)
+{
+ MapClient *client;
+
+ while (!LIST_EMPTY(&map_client_list)) {
+ client = LIST_FIRST(&map_client_list);
+ client->callback(client->opaque);
+ LIST_REMOVE(client, link);
+ }
+}
+
/* Map a physical memory region into a host virtual address.
* May map a subset of the requested range, given by and returned in *plen.
* May return NULL if resources needed to perform the mapping are exhausted.
* Use only for reads OR writes - not for read-modify-write operations.
+ * Use cpu_register_map_client() to know when retrying the map operation is
+ * likely to succeed.
*/
void *cpu_physical_memory_map(target_phys_addr_t addr,
target_phys_addr_t *plen,
@@ -3146,6 +3185,7 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
}
qemu_free(bounce.buffer);
bounce.buffer = NULL;
+ cpu_notify_map_clients();
}
/* warning: addr must be aligned */
--
1.6.0.6
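A sketch of the retry pattern that the new comment describes, with a
hypothetical device state and hypothetical function names. Note that
cpu_notify_map_clients() removes a client after invoking its callback, so
a callback that has fired needs no cpu_unregister_map_client(); the cookie
is only needed to cancel a registration that has not fired yet.

typedef struct MyDMAState {     /* hypothetical device state */
    target_phys_addr_t addr, len;
    int is_write;
} MyDMAState;

static void my_dma_run(MyDMAState *s);

static void my_dma_retry(void *opaque)
{
    /* called from cpu_notify_map_clients() once a bounce buffer is freed */
    my_dma_run(opaque);
}

static void my_dma_run(MyDMAState *s)
{
    target_phys_addr_t plen = s->len;
    void *host = cpu_physical_memory_map(s->addr, &plen, s->is_write);

    if (!host) {
        /* resources exhausted: ask to be notified, then retry from scratch */
        cpu_register_map_client(s, my_dma_retry);
        return;
    }
    /* ... perform the I/O on host, then unmap with the actual access length */
    cpu_physical_memory_unmap(host, plen, s->is_write, plen);
}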
* [Qemu-devel] [PATCH 3/5] I/O vector helpers
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
In general, it is not possible to predict the size of an I/O vector, since
a contiguous guest region may map to a discontiguous host region. Add some
helpers to manage I/O vector growth.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
cutils.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
qemu-common.h | 12 ++++++++++++
2 files changed, 59 insertions(+), 0 deletions(-)
diff --git a/cutils.c b/cutils.c
index 9617e08..80a7a1d 100644
--- a/cutils.c
+++ b/cutils.c
@@ -101,3 +101,50 @@ int qemu_fls(int i)
{
return 32 - clz32(i);
}
+
+/* io vectors */
+
+void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint)
+{
+ qiov->iov = qemu_malloc(alloc_hint * sizeof(struct iovec));
+ qiov->niov = 0;
+ qiov->nalloc = alloc_hint;
+}
+
+void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len)
+{
+ if (qiov->niov == qiov->nalloc) {
+ qiov->nalloc = 2 * qiov->nalloc + 1;
+ qiov->iov = qemu_realloc(qiov->iov, qiov->nalloc * sizeof(struct iovec));
+ }
+ qiov->iov[qiov->niov].iov_base = base;
+ qiov->iov[qiov->niov].iov_len = len;
+ ++qiov->niov;
+}
+
+void qemu_iovec_destroy(QEMUIOVector *qiov)
+{
+ qemu_free(qiov->iov);
+}
+
+void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf)
+{
+ uint8_t *p = (uint8_t *)buf;
+ int i;
+
+ for (i = 0; i < qiov->niov; ++i) {
+ memcpy(p, qiov->iov[i].iov_base, qiov->iov[i].iov_len);
+ p += qiov->iov[i].iov_len;
+ }
+}
+
+void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf)
+{
+ const uint8_t *p = (const uint8_t *)buf;
+ int i;
+
+ for (i = 0; i < qiov->niov; ++i) {
+ memcpy(qiov->iov[i].iov_base, p, qiov->iov[i].iov_len);
+ p += qiov->iov[i].iov_len;
+ }
+}
diff --git a/qemu-common.h b/qemu-common.h
index d83e61b..ae773e0 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -191,6 +191,18 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id);
/* Force QEMU to stop what it's doing and service IO */
void qemu_service_io(void);
+typedef struct QEMUIOVector {
+ struct iovec *iov;
+ int niov;
+ int nalloc;
+} QEMUIOVector;
+
+void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
+void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
+void qemu_iovec_destroy(QEMUIOVector *qiov);
+void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf);
+void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf);
+
#endif /* dyngen-exec.h hack */
#endif
--
1.6.0.6
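A short usage sketch of the helpers (buffer names hypothetical). Note that
QEMUIOVector carries no total-length field at this point, so callers that
need the flat size must track it themselves:

static void iovec_example(void)
{
    uint8_t part_a[4096], part_b[512], flat[4096 + 512];
    QEMUIOVector qiov;

    qemu_iovec_init(&qiov, 2);            /* alloc_hint only; grows on demand */
    qemu_iovec_add(&qiov, part_a, sizeof(part_a));
    qemu_iovec_add(&qiov, part_b, sizeof(part_b));

    qemu_iovec_to_buffer(&qiov, flat);    /* gather: vector -> flat buffer */
    qemu_iovec_from_buffer(&qiov, flat);  /* scatter: flat buffer -> vector */

    qemu_iovec_destroy(&qiov);            /* frees only the iov array */
}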
* [Qemu-devel] [PATCH 4/5] Vectored block device API
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
Most devices that are capable of DMA are also capable of scatter-gather.
With the memory mapping API, this means that the device code needs to be
able to access discontiguous host memory regions.
For block devices, this translates to vectored I/O. This patch implements
an asynchronous vectored interface for the qemu block devices. At the moment
all I/O is bounced and submitted through the non-vectored API; in the future
we will convert block devices to natively support vectored I/O wherever
possible.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
block.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
block.h | 8 +++++++
2 files changed, 76 insertions(+), 0 deletions(-)
diff --git a/block.c b/block.c
index 3250327..f570afc 100644
--- a/block.c
+++ b/block.c
@@ -1246,6 +1246,69 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn)
/**************************************************************/
/* async I/Os */
+typedef struct VectorTranslationState {
+ QEMUIOVector *iov;
+ uint8_t *bounce;
+ int is_write;
+ BlockDriverAIOCB *aiocb;
+ BlockDriverAIOCB *this_aiocb;
+} VectorTranslationState;
+
+static void bdrv_aio_rw_vector_cb(void *opaque, int ret)
+{
+ VectorTranslationState *s = opaque;
+
+ if (!s->is_write) {
+ qemu_iovec_from_buffer(s->iov, s->bounce);
+ }
+ qemu_free(s->bounce);
+ s->this_aiocb->cb(s->this_aiocb->opaque, ret);
+ qemu_aio_release(s->this_aiocb);
+}
+
+static BlockDriverAIOCB *bdrv_aio_rw_vector(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *iov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque,
+ int is_write)
+
+{
+ VectorTranslationState *s = qemu_mallocz(sizeof(*s));
+ BlockDriverAIOCB *aiocb = qemu_aio_get(bs, cb, opaque);
+
+ s->this_aiocb = aiocb;
+ s->iov = iov;
+ s->bounce = qemu_memalign(512, nb_sectors * 512);
+ s->is_write = is_write;
+ if (is_write) {
+ qemu_iovec_to_buffer(s->iov, s->bounce);
+ s->aiocb = bdrv_aio_write(bs, sector_num, s->bounce, nb_sectors,
+ bdrv_aio_rw_vector_cb, s);
+ } else {
+ s->aiocb = bdrv_aio_read(bs, sector_num, s->bounce, nb_sectors,
+ bdrv_aio_rw_vector_cb, s);
+ }
+ return aiocb;
+}
+
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+ cb, opaque, 0);
+}
+
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ return bdrv_aio_rw_vector(bs, sector_num, iov, nb_sectors,
+ cb, opaque, 1);
+}
+
BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque)
@@ -1294,6 +1357,11 @@ void bdrv_aio_cancel(BlockDriverAIOCB *acb)
{
BlockDriver *drv = acb->bs->drv;
+ if (acb->cb == bdrv_aio_rw_vector_cb) {
+ VectorTranslationState *s = acb->opaque;
+ acb = s->aiocb;
+ }
+
drv->bdrv_aio_cancel(acb);
}
diff --git a/block.h b/block.h
index c3314a1..9733409 100644
--- a/block.h
+++ b/block.h
@@ -2,6 +2,7 @@
#define BLOCK_H
#include "qemu-aio.h"
+#include "qemu-common.h"
/* block.c */
typedef struct BlockDriver BlockDriver;
@@ -85,6 +86,13 @@ int bdrv_commit(BlockDriverState *bs);
typedef struct BlockDriverAIOCB BlockDriverAIOCB;
typedef void BlockDriverCompletionFunc(void *opaque, int ret);
+BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
BlockDriverAIOCB *bdrv_aio_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
--
1.6.0.6
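A sketch of how a caller drives the new vectored entry points; the
completion wiring here is hypothetical. The QEMUIOVector must describe
exactly nb_sectors * 512 bytes, since the emulation path gathers it into a
bounce buffer of that size:

static void read_done(void *opaque, int ret)
{
    if (ret < 0) {
        /* report the error to the guest */
        return;
    }
    /* on success the data has been scattered back into the caller's iovec */
}

static void submit_read(BlockDriverState *bs, QEMUIOVector *qiov,
                        int64_t sector_num, int nb_sectors)
{
    BlockDriverAIOCB *acb;

    acb = bdrv_aio_readv(bs, sector_num, qiov, nb_sectors, read_done, NULL);
    if (!acb) {
        /* submission failed */
    }
}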
* [Qemu-devel] [PATCH 5/5] Convert IDE to directly access guest memory
From: Avi Kivity @ 2009-01-22 10:36 UTC
To: Anthony Liguori, qemu-devel
Instead of copying to a temporary buffer, map guest memory for IDE DMA
transactions.
Signed-off-by: Avi Kivity <avi@redhat.com>
---
hw/ide.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
1 files changed, 117 insertions(+), 16 deletions(-)
diff --git a/hw/ide.c b/hw/ide.c
index 06ac6dc..49c785d 100644
--- a/hw/ide.c
+++ b/hw/ide.c
@@ -422,6 +422,7 @@ typedef struct IDEState {
int atapi_dma; /* true if dma is requested for the packet cmd */
/* ATA DMA state */
int io_buffer_size;
+ QEMUIOVector iovec;
/* PIO transfer handling */
int req_nb_sectors; /* number of sectors per interrupt */
EndTransferFunc *end_transfer_func;
@@ -862,6 +863,66 @@ static void ide_sector_read(IDEState *s)
}
}
+
+/* return 0 if buffer completed */
+static int dma_buf_prepare(BMDMAState *bm, int is_write)
+{
+ IDEState *s = bm->ide_if;
+ struct {
+ uint32_t addr;
+ uint32_t size;
+ } prd;
+ int l, len;
+ void *mem;
+ target_phys_addr_t l1;
+
+ qemu_iovec_init(&s->iovec, s->nsector / (TARGET_PAGE_SIZE/512) + 1);
+ s->io_buffer_size = 0;
+ for(;;) {
+ if (bm->cur_prd_len == 0) {
+ /* end of table (with a fail safe of one page) */
+ if (bm->cur_prd_last ||
+ (bm->cur_addr - bm->addr) >= 4096)
+ return s->io_buffer_size != 0;
+ cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
+ bm->cur_addr += 8;
+ prd.addr = le32_to_cpu(prd.addr);
+ prd.size = le32_to_cpu(prd.size);
+ len = prd.size & 0xfffe;
+ if (len == 0)
+ len = 0x10000;
+ bm->cur_prd_len = len;
+ bm->cur_prd_addr = prd.addr;
+ bm->cur_prd_last = (prd.size & 0x80000000);
+ }
+ l = bm->cur_prd_len;
+ if (l > 0) {
+ l1 = l;
+ mem = cpu_physical_memory_map(bm->cur_prd_addr, &l1, is_write);
+ if (!mem) {
+ break;
+ }
+ qemu_iovec_add(&s->iovec, mem, l1);
+ bm->cur_prd_addr += l1;
+ bm->cur_prd_len -= l1;
+ s->io_buffer_size += l1;
+ }
+ }
+ return 1;
+}
+
+static void dma_buf_commit(IDEState *s, int is_write)
+{
+ int i;
+
+ for (i = 0; i < s->iovec.niov; ++i) {
+ cpu_physical_memory_unmap(s->iovec.iov[i].iov_base,
+ s->iovec.iov[i].iov_len, is_write,
+ s->iovec.iov[i].iov_len);
+ }
+ qemu_iovec_destroy(&s->iovec);
+}
+
static void ide_dma_error(IDEState *s)
{
ide_transfer_stop(s);
@@ -883,10 +944,12 @@ static int ide_handle_write_error(IDEState *s, int error, int op)
s->bmdma->status |= op;
vm_stop(0);
} else {
- if (op == BM_STATUS_DMA_RETRY)
+ if (op == BM_STATUS_DMA_RETRY) {
+ dma_buf_commit(s, 0);
ide_dma_error(s);
- else
+ } else {
ide_rw_error(s);
+ }
}
return 1;
@@ -940,6 +1003,39 @@ static int dma_buf_rw(BMDMAState *bm, int is_write)
return 1;
}
+typedef struct {
+ BMDMAState *bm;
+ void (*cb)(void *opaque, int ret);
+ QEMUBH *bh;
+} MapFailureContinuation;
+
+static void reschedule_dma(void *opaque)
+{
+ MapFailureContinuation *cont = opaque;
+
+ cont->cb(cont->bm, 0);
+ qemu_bh_delete(cont->bh);
+ qemu_free(cont);
+}
+
+static void continue_after_map_failure(void *opaque)
+{
+ MapFailureContinuation *cont = opaque;
+
+ cont->bh = qemu_bh_new(reschedule_dma, opaque);
+ qemu_bh_schedule(cont->bh);
+}
+
+static void wait_for_bounce_buffer(BMDMAState *bmdma,
+ void (*cb)(void *opaque, int ret))
+{
+ MapFailureContinuation *cont = qemu_malloc(sizeof(*cont));
+
+ cont->bm = bmdma;
+ cont->cb = cb;
+ cpu_register_map_client(cont, continue_after_map_failure);
+}
+
static void ide_read_dma_cb(void *opaque, int ret)
{
BMDMAState *bm = opaque;
@@ -948,6 +1044,7 @@ static void ide_read_dma_cb(void *opaque, int ret)
int64_t sector_num;
if (ret < 0) {
+ dma_buf_commit(s, 1);
ide_dma_error(s);
return;
}
@@ -955,11 +1052,10 @@ static void ide_read_dma_cb(void *opaque, int ret)
n = s->io_buffer_size >> 9;
sector_num = ide_get_sector(s);
if (n > 0) {
+ dma_buf_commit(s, 1);
sector_num += n;
ide_set_sector(s, sector_num);
s->nsector -= n;
- if (dma_buf_rw(bm, 1) == 0)
- goto eot;
}
/* end of transfer ? */
@@ -977,15 +1073,19 @@ static void ide_read_dma_cb(void *opaque, int ret)
/* launch next transfer */
n = s->nsector;
- if (n > IDE_DMA_BUF_SECTORS)
- n = IDE_DMA_BUF_SECTORS;
s->io_buffer_index = 0;
s->io_buffer_size = n * 512;
+ if (dma_buf_prepare(bm, 1) == 0)
+ goto eot;
+ if (!s->iovec.niov) {
+ wait_for_bounce_buffer(bm, ide_read_dma_cb);
+ return;
+ }
#ifdef DEBUG_AIO
printf("aio_read: sector_num=%" PRId64 " n=%d\n", sector_num, n);
#endif
- bm->aiocb = bdrv_aio_read(s->bs, sector_num, s->io_buffer, n,
- ide_read_dma_cb, bm);
+ bm->aiocb = bdrv_aio_readv(s->bs, sector_num, &s->iovec, n,
+ ide_read_dma_cb, bm);
ide_dma_submit_check(s, ide_read_dma_cb, bm);
}
@@ -1081,6 +1181,7 @@ static void ide_write_dma_cb(void *opaque, int ret)
n = s->io_buffer_size >> 9;
sector_num = ide_get_sector(s);
if (n > 0) {
+ dma_buf_commit(s, 0);
sector_num += n;
ide_set_sector(s, sector_num);
s->nsector -= n;
@@ -1099,20 +1200,20 @@ static void ide_write_dma_cb(void *opaque, int ret)
return;
}
- /* launch next transfer */
n = s->nsector;
- if (n > IDE_DMA_BUF_SECTORS)
- n = IDE_DMA_BUF_SECTORS;
- s->io_buffer_index = 0;
s->io_buffer_size = n * 512;
-
- if (dma_buf_rw(bm, 0) == 0)
+ /* launch next transfer */
+ if (dma_buf_prepare(bm, 0) == 0)
goto eot;
+ if (!s->iovec.niov) {
+ wait_for_bounce_buffer(bm, ide_write_dma_cb);
+ return;
+ }
#ifdef DEBUG_AIO
printf("aio_write: sector_num=%" PRId64 " n=%d\n", sector_num, n);
#endif
- bm->aiocb = bdrv_aio_write(s->bs, sector_num, s->io_buffer, n,
- ide_write_dma_cb, bm);
+ bm->aiocb = bdrv_aio_writev(s->bs, sector_num, &s->iovec, n,
+ ide_write_dma_cb, bm);
ide_dma_submit_check(s, ide_write_dma_cb, bm);
}
--
1.6.0.6
* Re: [Qemu-devel] [PATCH 1/5] Add target memory mapping API
From: Ian Jackson @ 2009-01-22 12:24 UTC
To: qemu-devel
Avi Kivity writes ("[Qemu-devel] [PATCH 1/5] Add target memory mapping API"):
> Devices accessing large amounts of memory (as with DMA) will wish to obtain
> a pointer to guest memory rather than access it indirectly via
> cpu_physical_memory_rw(). Add a new API to convert target addresses to
> host pointers.
>
> In case the target address does not correspond to RAM, a bounce buffer is
> allocated. To prevent the guest from causing the host to allocate unbounded
> amounts of bounce buffer, this memory is limited (currently to one page).
>
> Signed-off-by: Avi Kivity <avi@redhat.com>
...
> +void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
> + int is_write, target_phys_addr_t access_len)
Great, thanks for adding the access_len.
Ian.
* Re: [Qemu-devel] [PATCH 2/5] Add map client retry notification
From: Ian Jackson @ 2009-01-22 12:30 UTC
To: qemu-devel
Avi Kivity writes ("[Qemu-devel] [PATCH 2/5] Add map client retry notification"):
> The target memory mapping API may fail if the bounce buffer resources
> are exhausted. Add a notification mechanism to allow clients to retry
> the mapping operation when resources become available again.
Does this API not suffer from the potential deadlock described by
Anthony?
Imagine that for some reason bounce buffers are in use. If we have a
client which wants to do a single writev on a tap device, it will even
deadlock by itself:
map(<block 0>) succeeds
map(<block 1>) fails, NULL
register_map_client
but the callback will never happen because the client is effectively
waiting for itself to release its own mapping.
Since callers cannot assume that they can map more than one range at
once (since there's only one bounce buffer), any caller which needs to
do scatter-gather (like a tap device, as Anthony points out) needs to
invent its own bounce buffers. That seems like a waste of effort.
There should be a single bounce buffer fallback mechanism, and it
should be sufficiently powerful that it can be used for tap devices,
which means that the calling device emulation must present a single
scatter-gather list to the API all in one go.
Ian.
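Condensed into code, the scenario Ian describes looks roughly like this
(addresses, callback and opaque are hypothetical; assume both ranges fall
outside RAM, so that each map needs the single bounce buffer):

static void writev_deadlock(target_phys_addr_t addr0, target_phys_addr_t addr1,
                            void (*retry)(void *opaque), void *opaque)
{
    target_phys_addr_t len0 = 512, len1 = 512;
    void *p0, *p1;

    p0 = cpu_physical_memory_map(addr0, &len0, 0); /* claims the bounce buffer */
    p1 = cpu_physical_memory_map(addr1, &len1, 0); /* NULL: buffer in use */
    if (!p1) {
        /* The callback fires only when a bounce buffer is freed, i.e. when
         * p0 is unmapped -- but this client keeps p0 mapped while waiting. */
        cpu_register_map_client(opaque, retry);
    }
}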
* [Qemu-devel] Re: [PATCH 0/5] Direct memory access for devices (v2)
From: Anthony Liguori @ 2009-01-22 16:59 UTC
To: Avi Kivity; +Cc: qemu-devel
Avi Kivity wrote:
> One of the deficiencies of the current device layer is that it can only access
> guest RAM via cpu_physical_memory_rw(). This means that the device emulation
> code must copy the memory to or from a temporary buffer, even though the host
> offers APIs which allow direct access to memory. This reduces efficiency on
> DMA capable devices, especially disks.
>
> This patchset introduces a complement to the read/write API,
> cpu_physical_memory_map() which allows device emulation code to map
> guest memory directly. The API bounces memory regions which cannot be
> mapped (such as mmio regions) using an internal buffer.
>
> As an example, IDE emulation is converted to use the new API. This exposes
> another deficiency: lack of scatter/gather support in the block layer. To
> work around this, a vectored block API is introduced, currently emulated
> by bouncing. Additional work is needed to convert all block format drivers
> to use the vectored API.
>
Applied all. Thanks.
Regards,
Anthony Liguori
> Changes from v1:
> - documented memory mapping API
> - added access_len parameter to unmap operation, to indicate how much
> memory was actually accessed
> - move QEMUIOVector to cutils.c, and add flatten/unflatten operations
> - change block format driver API to accept a QEMUIOVector rather than a
> bare struct iovec
>
> Avi Kivity (5):
> Add target memory mapping API
> Add map client retry notification
> I/O vector helpers
> Vectored block device API
> Convert IDE to directly access guest memory
>
> block.c | 68 +++++++++++++++++++++++++++
> block.h | 8 +++
> cpu-all.h | 8 +++
> cutils.c | 47 +++++++++++++++++++
> exec.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> hw/ide.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++------
> qemu-common.h | 12 +++++
> 7 files changed, 402 insertions(+), 16 deletions(-)
* Re: [Qemu-devel] [PATCH 2/5] Add map client retry notification
From: Anthony Liguori @ 2009-01-22 18:51 UTC
To: qemu-devel
Ian Jackson wrote:
> Avi Kivity writes ("[Qemu-devel] [PATCH 2/5] Add map client retry notification"):
>
>> The target memory mapping API may fail if the bounce buffer resources
>> are exhausted. Add a notification mechanism to allow clients to retry
>> the mapping operation when resources become available again.
>>
>
> Does this API not suffer from the potential deadlock described by
> Anthony ?
>
> Imagine that for some reason bounce buffers are in use. If we have a
> client which wants to do a single writev on a tap device it will even
> deadlock by itself:
>
> map(<block 0>) succeeds
> map(<block 1>) fails, NULL
> register_map_client
>
> but the callback will never happen because the client is effectively
> waiting for itself to release its own mapping.
>
Yes, a client is not allowed to do this. To put it another way (and
perhaps this needs to be documented), register_map_client can only be
used safely if a client unmaps all of its existing mappings.
> Since callers cannot assume that they can map more than one range at
> once (since there's only one bounce buffer), any caller which needs to
> do scatter-gather (like a tap device, as Anthony points out) needs to
> invent its own bounce buffers. That seems like a waste of effort.
>
It needs to be able to fall back to something like cpu_physical_memory_rw.
> There should be a single bounce buffer fallback mechanism, and it
> should be sufficiently powerful that it can be used for tap devices,
> which means that the calling device emulation must present a single
> scatter-gather list to the API all in one go.
>
You could have an API like:
try_to_map_or_bounce(list-of-phys-iovecs, buffer-to-bounce-to, callback,
opaque);
That would be a nice addition for packet IO devices. Better yet, it
should be:
try_to_map_or_bounce(map-func, unmap-func, iofunc, opaque,
list-of-phys-iovecs, buffer-to-bounce-to)
If you go back and look at my previous mails about packet helpers and
stream helpers, that's just about the signature of my proposed packet
helper. Like I mentioned earlier, I definitely think we should have
such a thing.
Regards,
Anthony Liguori
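For concreteness, one possible shape of the all-or-nothing mapping step
such a helper would need is sketched below. Everything here is assumed
rather than taken from the series: the PhysIovec type and the helper name
are invented, and on failure the caller would fall back to bouncing the
request through cpu_physical_memory_rw() at the appropriate point.

typedef struct PhysIovec {      /* hypothetical physical s/g element */
    target_phys_addr_t addr;
    target_phys_addr_t len;
} PhysIovec;

/* Returns 1 and fills *qiov if every element could be mapped; returns 0
 * with nothing left mapped otherwise, so the caller can bounce instead. */
static int try_to_map_sg(PhysIovec *sg, int sg_len,
                         QEMUIOVector *qiov, int is_write)
{
    int i, j;

    qemu_iovec_init(qiov, sg_len);
    for (i = 0; i < sg_len; ++i) {
        target_phys_addr_t addr = sg[i].addr;
        target_phys_addr_t resid = sg[i].len;

        while (resid) {
            target_phys_addr_t plen = resid;
            void *p = cpu_physical_memory_map(addr, &plen, is_write);

            if (!p)
                goto fail;
            qemu_iovec_add(qiov, p, plen);
            addr += plen;
            resid -= plen;
        }
    }
    return 1;

fail:
    /* all-or-nothing: release partial mappings (access_len 0 since
     * nothing was actually accessed) before reporting failure */
    for (j = 0; j < qiov->niov; ++j)
        cpu_physical_memory_unmap(qiov->iov[j].iov_base,
                                  qiov->iov[j].iov_len, is_write, 0);
    qemu_iovec_destroy(qiov);
    return 0;
}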