* [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support
@ 2011-08-04 17:14 Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 01/10] dma-helpers: allow including from target-independent code Paolo Bonzini
` (11 more replies)
0 siblings, 12 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Hi,
this is the version of SCSI scatter/gather based on the existing
DMA helpers infrastructure.
The infrastructure required a small update because I need to
know the residual amount of data upon short transfers. To this
end, my choice was to make QEMUSGList mutable and track the
current position in it. Other ideas are welcome; the
reasoning behind this choice is explained in patch 2.
The patches are quite self-contained, but they depend on the
changes I posted yesterday.
Patch 11 is the sample vmw_pvscsi device model that I used to
test the code.
Paolo Bonzini (11):
dma-helpers: allow including from target-independent code
dma-helpers: track position in the QEMUSGList
dma-helpers: rewrite completion/cancellation
dma-helpers: prepare for adding dma_buf_* functions
dma-helpers: add dma_buf_read and dma_buf_write
scsi: pass residual amount to command_complete
scsi: add scatter/gather functionality
scsi-disk: commonize iovec creation between reads and writes
scsi-disk: lazily allocate bounce buffer
scsi-disk: enable scatter/gather functionality
sample pvscsi driver with s/g support
Makefile.objs | 1 +
cutils.c | 8 +-
default-configs/i386-softmmu.mak | 1 +
default-configs/pci.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
dma-helpers.c | 231 +++++++--
dma.h | 27 +-
hw/esp.c | 5 +-
hw/lsi53c895a.c | 4 +-
hw/pci.h | 1 +
hw/scsi-bus.c | 38 ++-
hw/scsi-disk.c | 117 +++--
hw/scsi.h | 7 +-
hw/spapr_vscsi.c | 4 +-
hw/usb-msd.c | 4 +-
hw/vmw_pvscsi.c | 904 ++++++++++++++++++++++++++++++++++++
hw/vmw_pvscsi.h | 389 ++++++++++++++++
qemu-common.h | 1 +
trace-events | 15 +
19 files changed, 1646 insertions(+), 113 deletions(-)
create mode 100644 hw/vmw_pvscsi.c
create mode 100644 hw/vmw_pvscsi.h
--
1.7.6
^ permalink raw reply [flat|nested] 22+ messages in thread
* [Qemu-devel] [PATCH 01/10] dma-helpers: allow including from target-independent code
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 02/10] dma-helpers: track position in the QEMUSGList Paolo Bonzini
` (10 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Target-independent code cannot construct sglists, but it can take
them from the outside as a black box. Allow this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
dma.h | 12 ++++++++----
qemu-common.h | 1 +
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/dma.h b/dma.h
index 3d8324b..f7e0142 100644
--- a/dma.h
+++ b/dma.h
@@ -15,22 +15,26 @@
#include "hw/hw.h"
#include "block.h"
-typedef struct {
+typedef struct ScatterGatherEntry ScatterGatherEntry;
+
+#if defined(TARGET_PHYS_ADDR_BITS)
+struct ScatterGatherEntry {
target_phys_addr_t base;
target_phys_addr_t len;
-} ScatterGatherEntry;
+};
-typedef struct {
+struct QEMUSGList {
ScatterGatherEntry *sg;
int nsg;
int nalloc;
target_phys_addr_t size;
-} QEMUSGList;
+};
void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
target_phys_addr_t len);
void qemu_sglist_destroy(QEMUSGList *qsg);
+#endif
typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
diff --git a/qemu-common.h b/qemu-common.h
index 1e3c665..69c6595 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -18,6 +18,7 @@ typedef struct DeviceState DeviceState;
struct Monitor;
typedef struct Monitor Monitor;
+typedef struct QEMUSGList QEMUSGList;
/* we put basic includes here to avoid repeating them in device drivers */
#include <stdlib.h>
--
1.7.6

* [Qemu-devel] [PATCH 02/10] dma-helpers: track position in the QEMUSGList
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 01/10] dma-helpers: allow including from target-independent code Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 03/10] dma-helpers: rewrite completion/cancellation Paolo Bonzini
` (9 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
The DMA helpers infrastructure cannot at the moment track how many bytes
have actually been transferred, so users cannot detect short transfers.
Adding an accessor to the DMAAIOCB cannot fix this, however, because the
callback may not have access to the AIOCB at all if the transfer
completes synchronously. In that case, the operation finishes before
the caller of bdrv_aio_{read,write}v has had a chance to store the
AIOCB anywhere.
So, augment the SGList API with functions to walk the QEMUSGList and
map segments along the way. Track the number of residual bytes, and
add a function to retrieve it (so that it can be used even where
target_phys_addr_t is not available).
An alternative would be to add the AIOCB as a third parameter to the
BlockDriverCompletionFunc. This would have been a much bigger patch,
but I can do it if requested.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
dma-helpers.c | 64 +++++++++++++++++++++++++++++++++++++++++---------------
dma.h | 12 +++++++++-
2 files changed, 57 insertions(+), 19 deletions(-)
diff --git a/dma-helpers.c b/dma-helpers.c
index ba7f897..6a59f59 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -28,9 +28,51 @@ void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
qsg->sg[qsg->nsg].base = base;
qsg->sg[qsg->nsg].len = len;
qsg->size += len;
+ qsg->resid += len;
++qsg->nsg;
}
+void qemu_sglist_rewind(QEMUSGList *qsg)
+{
+ qsg->resid = qsg->size;
+ qsg->cur_index = 0;
+ qsg->cur_byte = 0;
+}
+
+int64_t qemu_sglist_get_resid(QEMUSGList *qsg)
+{
+ return qsg->resid;
+}
+
+void qemu_sglist_advance(QEMUSGList *qsg, target_phys_addr_t bytes)
+{
+ assert(qsg->cur_index < qsg->nsg);
+ assert(bytes <= qsg->sg[qsg->cur_index].len - qsg->cur_byte);
+ qsg->cur_byte += bytes;
+ qsg->resid -= bytes;
+ if (qsg->cur_byte == qsg->sg[qsg->cur_index].len) {
+ qsg->cur_index++;
+ qsg->cur_byte = 0;
+ }
+}
+
+void *qemu_sglist_map_segment(QEMUSGList *qsg, target_phys_addr_t *cur_len, bool is_write)
+{
+ target_phys_addr_t cur_addr;
+ void *mem;
+
+ if (qsg->cur_index == qsg->nsg) {
+ return NULL;
+ }
+ cur_addr = qsg->sg[qsg->cur_index].base + qsg->cur_byte;
+ *cur_len = qsg->sg[qsg->cur_index].len - qsg->cur_byte;
+ mem = cpu_physical_memory_map(cur_addr, cur_len, is_write);
+ if (mem) {
+ qemu_sglist_advance(qsg, *cur_len);
+ }
+ return mem;
+}
+
void qemu_sglist_destroy(QEMUSGList *qsg)
{
qemu_free(qsg->sg);
@@ -43,8 +85,6 @@ typedef struct {
QEMUSGList *sg;
uint64_t sector_num;
int is_write;
- int sg_cur_index;
- target_phys_addr_t sg_cur_byte;
QEMUIOVector iov;
QEMUBH *bh;
DMAIOFunc *io_func;
@@ -83,33 +123,24 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
static void dma_bdrv_cb(void *opaque, int ret)
{
DMAAIOCB *dbs = (DMAAIOCB *)opaque;
- target_phys_addr_t cur_addr, cur_len;
void *mem;
+ target_phys_addr_t cur_len;
dbs->acb = NULL;
dbs->sector_num += dbs->iov.size / 512;
dma_bdrv_unmap(dbs);
qemu_iovec_reset(&dbs->iov);
- if (dbs->sg_cur_index == dbs->sg->nsg || ret < 0) {
+ if (dbs->sg->cur_index == dbs->sg->nsg || ret < 0) {
dbs->common.cb(dbs->common.opaque, ret);
qemu_iovec_destroy(&dbs->iov);
qemu_aio_release(dbs);
return;
}
- while (dbs->sg_cur_index < dbs->sg->nsg) {
- cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
- cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
- mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
- if (!mem)
- break;
+ while ((mem = qemu_sglist_map_segment(dbs->sg, &cur_len,
+ !dbs->is_write)) != NULL) {
qemu_iovec_add(&dbs->iov, mem, cur_len);
- dbs->sg_cur_byte += cur_len;
- if (dbs->sg_cur_byte == dbs->sg->sg[dbs->sg_cur_index].len) {
- dbs->sg_cur_byte = 0;
- ++dbs->sg_cur_index;
- }
}
if (dbs->iov.size == 0) {
@@ -151,12 +182,11 @@ BlockDriverAIOCB *dma_bdrv_io(
dbs->bs = bs;
dbs->sg = sg;
dbs->sector_num = sector_num;
- dbs->sg_cur_index = 0;
- dbs->sg_cur_byte = 0;
dbs->is_write = is_write;
dbs->io_func = io_func;
dbs->bh = NULL;
qemu_iovec_init(&dbs->iov, sg->nsg);
+ qemu_sglist_rewind(sg);
dma_bdrv_cb(dbs, 0);
if (!dbs->acb) {
qemu_aio_release(dbs);
diff --git a/dma.h b/dma.h
index f7e0142..363e932 100644
--- a/dma.h
+++ b/dma.h
@@ -27,15 +27,23 @@ struct QEMUSGList {
ScatterGatherEntry *sg;
int nsg;
int nalloc;
+ int cur_index;
+ target_phys_addr_t cur_byte;
target_phys_addr_t size;
+ target_phys_addr_t resid;
};
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base,
target_phys_addr_t len);
-void qemu_sglist_destroy(QEMUSGList *qsg);
+void qemu_sglist_advance(QEMUSGList *qsg, target_phys_addr_t bytes);
+void *qemu_sglist_map_segment(QEMUSGList *qsg, target_phys_addr_t *cur_len, bool is_write);
#endif
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_rewind(QEMUSGList *qsg);
+void qemu_sglist_destroy(QEMUSGList *qsg);
+int64_t qemu_sglist_get_resid(QEMUSGList *qsg);
+
typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num,
QEMUIOVector *iov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
--
1.7.6
* [Qemu-devel] [PATCH 03/10] dma-helpers: rewrite completion/cancellation
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 01/10] dma-helpers: allow including from target-independent code Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 02/10] dma-helpers: track position in the QEMUSGList Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 04/10] dma-helpers: prepare for adding dma_buf_* functions Paolo Bonzini
` (8 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
This fixes various problems with completion/cancellation:
* If DMA encounters a bounce buffer conflict and the operation is
canceled before the bottom half fires, bad things happen.
* Memory is not unmapped after cancellation, again causing problems
when doing DMA to I/O areas.
* Cancellation could leak the iovec.
And probably more that I've missed. The patch fixes them by sharing
the cleanup code between completion and cancellation. dma_bdrv_cb
now returns a completed/not-completed indication (-EAGAIN meaning an
asynchronous operation is still in flight), and the wrapper
dma_continue takes care of the tasks to do upon completion.
Most of these are basically impossible in practice, but the resulting
code is also more suitable for introduction of dma_buf_read and
dma_buf_write.
One note: since memory-based DMA will not use dbs->acb, I'm switching
to dbs->common.cb == NULL to mark a canceled operation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
dma-helpers.c | 90 ++++++++++++++++++++++++++++++++++++++------------------
1 files changed, 61 insertions(+), 29 deletions(-)
diff --git a/dma-helpers.c b/dma-helpers.c
index 6a59f59..d716524 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -90,7 +90,49 @@ typedef struct {
DMAIOFunc *io_func;
} DMAAIOCB;
-static void dma_bdrv_cb(void *opaque, int ret);
+static int dma_bdrv_cb(DMAAIOCB *opaque, int ret);
+
+static void dma_bdrv_unmap(DMAAIOCB *dbs)
+{
+ int i;
+
+ for (i = 0; i < dbs->iov.niov; ++i) {
+ cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
+ dbs->iov.iov[i].iov_len, !dbs->is_write,
+ dbs->iov.iov[i].iov_len);
+ }
+ qemu_iovec_reset(&dbs->iov);
+}
+
+static void dma_complete(DMAAIOCB *dbs, int ret)
+{
+ dma_bdrv_unmap(dbs);
+ if (dbs->common.cb) {
+ dbs->common.cb(dbs->common.opaque, ret);
+ }
+ qemu_iovec_destroy(&dbs->iov);
+ if (dbs->bh) {
+ qemu_bh_delete(dbs->bh);
+ dbs->bh = NULL;
+ }
+ qemu_aio_release(dbs);
+}
+
+static BlockDriverAIOCB *dma_continue(DMAAIOCB *dbs, int ret)
+{
+ assert(ret != -EAGAIN);
+ if (ret == 0) {
+ /* No error so far, try doing more DMA. If dma_bdrv_cb starts an
+ asynchronous operation, it returns -EAGAIN and we will be
+ called again by either reschedule_dma or dma_bdrv_aio_cb.
+ If not, call the BlockDriverCompletionFunc. */
+ ret = dma_bdrv_cb(dbs, ret);
+ }
+ if (ret != -EAGAIN) {
+ dma_complete(dbs, ret);
+ }
+ return &dbs->common;
+}
static void reschedule_dma(void *opaque)
{
@@ -98,7 +140,7 @@ static void reschedule_dma(void *opaque)
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
- dma_bdrv_cb(opaque, 0);
+ dma_continue(dbs, 0);
}
static void continue_after_map_failure(void *opaque)
@@ -109,33 +151,23 @@ static void continue_after_map_failure(void *opaque)
qemu_bh_schedule(dbs->bh);
}
-static void dma_bdrv_unmap(DMAAIOCB *dbs)
+static void dma_bdrv_aio_cb(void *opaque, int ret)
{
- int i;
-
- for (i = 0; i < dbs->iov.niov; ++i) {
- cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
- dbs->iov.iov[i].iov_len, !dbs->is_write,
- dbs->iov.iov[i].iov_len);
- }
+ DMAAIOCB *dbs = (DMAAIOCB *)opaque;
+ dma_continue(dbs, ret);
}
-static void dma_bdrv_cb(void *opaque, int ret)
+static int dma_bdrv_cb(DMAAIOCB *dbs, int ret)
{
- DMAAIOCB *dbs = (DMAAIOCB *)opaque;
void *mem;
target_phys_addr_t cur_len;
dbs->acb = NULL;
dbs->sector_num += dbs->iov.size / 512;
dma_bdrv_unmap(dbs);
- qemu_iovec_reset(&dbs->iov);
if (dbs->sg->cur_index == dbs->sg->nsg || ret < 0) {
- dbs->common.cb(dbs->common.opaque, ret);
- qemu_iovec_destroy(&dbs->iov);
- qemu_aio_release(dbs);
- return;
+ return ret;
}
while ((mem = qemu_sglist_map_segment(dbs->sg, &cur_len,
@@ -145,16 +177,17 @@ static void dma_bdrv_cb(void *opaque, int ret)
if (dbs->iov.size == 0) {
cpu_register_map_client(dbs, continue_after_map_failure);
- return;
+ return -EAGAIN;
}
dbs->acb = dbs->io_func(dbs->bs, dbs->sector_num, &dbs->iov,
- dbs->iov.size / 512, dma_bdrv_cb, dbs);
+ dbs->iov.size / 512, dma_bdrv_aio_cb, dbs);
if (!dbs->acb) {
- dma_bdrv_unmap(dbs);
- qemu_iovec_destroy(&dbs->iov);
- return;
+ dbs->common.cb = NULL;
+ return -EIO;
}
+
+ return -EAGAIN;
}
static void dma_aio_cancel(BlockDriverAIOCB *acb)
@@ -162,8 +195,12 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
DMAAIOCB *dbs = container_of(acb, DMAAIOCB, common);
if (dbs->acb) {
- bdrv_aio_cancel(dbs->acb);
+ BlockDriverAIOCB *acb = dbs->acb;
+ dbs->acb = NULL;
+ bdrv_aio_cancel(acb);
}
+ dbs->common.cb = NULL;
+ dma_complete(dbs, 0);
}
static AIOPool dma_aio_pool = {
@@ -187,12 +224,7 @@ BlockDriverAIOCB *dma_bdrv_io(
dbs->bh = NULL;
qemu_iovec_init(&dbs->iov, sg->nsg);
qemu_sglist_rewind(sg);
- dma_bdrv_cb(dbs, 0);
- if (!dbs->acb) {
- qemu_aio_release(dbs);
- return NULL;
- }
- return &dbs->common;
+ return dma_continue(dbs, 0);
}
--
1.7.6
* [Qemu-devel] [PATCH 04/10] dma-helpers: prepare for adding dma_buf_* functions
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
` (2 preceding siblings ...)
2011-08-04 17:14 ` [Qemu-devel] [PATCH 03/10] dma-helpers: rewrite completion/cancellation Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write Paolo Bonzini
` (7 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Store in the AIOCB which callback function we are using, and abstract
the process of starting DMA.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
dma-helpers.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)
diff --git a/dma-helpers.c b/dma-helpers.c
index d716524..4469ce2 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -78,9 +78,13 @@ void qemu_sglist_destroy(QEMUSGList *qsg)
qemu_free(qsg->sg);
}
-typedef struct {
+typedef struct DMAAIOCB DMAAIOCB;
+
+struct DMAAIOCB {
BlockDriverAIOCB common;
- BlockDriverState *bs;
+ union {
+ BlockDriverState *bs;
+ } u;
BlockDriverAIOCB *acb;
QEMUSGList *sg;
uint64_t sector_num;
@@ -88,9 +92,8 @@ typedef struct {
QEMUIOVector iov;
QEMUBH *bh;
DMAIOFunc *io_func;
-} DMAAIOCB;
-
-static int dma_bdrv_cb(DMAAIOCB *opaque, int ret);
+ int (*cb)(DMAAIOCB *, int);
+};
static void dma_bdrv_unmap(DMAAIOCB *dbs)
{
@@ -122,11 +125,11 @@ static BlockDriverAIOCB *dma_continue(DMAAIOCB *dbs, int ret)
{
assert(ret != -EAGAIN);
if (ret == 0) {
- /* No error so far, try doing more DMA. If dma_bdrv_cb starts an
+ /* No error so far, try doing more DMA. If dbs->cb starts an
asynchronous operation, it returns -EAGAIN and we will be
called again by either reschedule_dma or dma_bdrv_aio_cb.
If not, call the BlockDriverCompletionFunc. */
- ret = dma_bdrv_cb(dbs, ret);
+ ret = dbs->cb(dbs, ret);
}
if (ret != -EAGAIN) {
dma_complete(dbs, ret);
@@ -180,7 +183,7 @@ static int dma_bdrv_cb(DMAAIOCB *dbs, int ret)
return -EAGAIN;
}
- dbs->acb = dbs->io_func(dbs->bs, dbs->sector_num, &dbs->iov,
+ dbs->acb = dbs->io_func(dbs->u.bs, dbs->sector_num, &dbs->iov,
dbs->iov.size / 512, dma_bdrv_aio_cb, dbs);
if (!dbs->acb) {
dbs->common.cb = NULL;
@@ -216,12 +219,13 @@ BlockDriverAIOCB *dma_bdrv_io(
DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, bs, cb, opaque);
dbs->acb = NULL;
- dbs->bs = bs;
+ dbs->u.bs = bs;
dbs->sg = sg;
dbs->sector_num = sector_num;
dbs->is_write = is_write;
dbs->io_func = io_func;
dbs->bh = NULL;
+ dbs->cb = dma_bdrv_cb;
qemu_iovec_init(&dbs->iov, sg->nsg);
qemu_sglist_rewind(sg);
return dma_continue(dbs, 0);
--
1.7.6
* [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
` (3 preceding siblings ...)
2011-08-04 17:14 ` [Qemu-devel] [PATCH 04/10] dma-helpers: prepare for adding dma_buf_* functions Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-11 7:58 ` Stefan Hajnoczi
2011-08-04 17:14 ` [Qemu-devel] [PATCH 06/10] scsi: pass residual amount to command_complete Paolo Bonzini
` (6 subsequent siblings)
11 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
These helpers do a full transfer from an in-memory buffer to
target memory, with full support for MMIO areas. They will be used to
store the reply of an emulated command into a QEMUSGList provided by
the adapter.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cutils.c | 8 +++---
dma-helpers.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
dma.h | 5 ++++
3 files changed, 72 insertions(+), 4 deletions(-)
diff --git a/cutils.c b/cutils.c
index f9a7e36..969fd2e 100644
--- a/cutils.c
+++ b/cutils.c
@@ -215,14 +215,14 @@ void qemu_iovec_concat(QEMUIOVector *dst, QEMUIOVector *src, size_t size)
void qemu_iovec_destroy(QEMUIOVector *qiov)
{
- assert(qiov->nalloc != -1);
-
- qemu_free(qiov->iov);
+ if (qiov->nalloc != -1) {
+ qemu_free(qiov->iov);
+ }
}
void qemu_iovec_reset(QEMUIOVector *qiov)
{
- assert(qiov->nalloc != -1);
+ assert(qiov->nalloc != -1 || qiov->niov == 0);
qiov->niov = 0;
qiov->size = 0;
diff --git a/dma-helpers.c b/dma-helpers.c
index 4469ce2..10dc8ca 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -84,6 +84,7 @@ struct DMAAIOCB {
BlockDriverAIOCB common;
union {
BlockDriverState *bs;
+ uint8_t *ptr;
} u;
BlockDriverAIOCB *acb;
QEMUSGList *sg;
@@ -245,3 +246,65 @@ BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
{
return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, 1);
}
+
+
+static int dma_copy_cb(DMAAIOCB *dbs, int ret)
+{
+ void *mem;
+ target_phys_addr_t cur_len;
+
+ assert(ret == 0);
+
+ /* sector_num is the residual number of bytes to copy. */
+ while (dbs->sector_num > 0 &&
+ (mem = qemu_sglist_map_segment(dbs->sg, &cur_len,
+ !dbs->is_write)) != NULL) {
+ if (dbs->is_write) {
+ memcpy(dbs->u.ptr, mem, cur_len);
+ } else {
+ memcpy(mem, dbs->u.ptr, cur_len);
+ }
+ cpu_physical_memory_unmap(mem, cur_len, !dbs->is_write, cur_len);
+ dbs->u.ptr += cur_len;
+ dbs->sector_num -= cur_len;
+ }
+
+ if (dbs->sg->resid > 0) {
+ cpu_register_map_client(dbs, continue_after_map_failure);
+ return -EAGAIN;
+ }
+
+ return 0;
+}
+
+static BlockDriverAIOCB *dma_copy(
+ uint8_t *ptr, int32_t len, QEMUSGList *sg,
+ void (*cb)(void *opaque, int ret), void *opaque, int is_write)
+{
+ DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, NULL, cb, opaque);
+
+ dbs->acb = NULL;
+ dbs->u.ptr = ptr;
+ dbs->sg = sg;
+ dbs->sector_num = MIN(len, sg->size);
+ dbs->is_write = is_write;
+ dbs->bh = NULL;
+ dbs->cb = dma_copy_cb;
+ qemu_iovec_init_external(&dbs->iov, NULL, 0);
+ qemu_sglist_rewind(sg);
+ return dma_continue(dbs, 0);
+}
+
+BlockDriverAIOCB *dma_buf_read(
+ uint8_t *ptr, int32_t len, QEMUSGList *sg,
+ void (*cb)(void *opaque, int ret), void *opaque)
+{
+ return dma_copy(ptr, len, sg, cb, opaque, 0);
+}
+
+BlockDriverAIOCB *dma_buf_write(
+ uint8_t *ptr, int32_t len, QEMUSGList *sg,
+ void (*cb)(void *opaque, int ret), void *opaque)
+{
+ return dma_copy(ptr, len, sg, cb, opaque, 1);
+}
diff --git a/dma.h b/dma.h
index 363e932..6bf51ee 100644
--- a/dma.h
+++ b/dma.h
@@ -58,4 +58,9 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
QEMUSGList *sg, uint64_t sector,
BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *dma_buf_read(uint8_t *ptr, int32_t len, QEMUSGList *sg,
+ BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *dma_buf_write(uint8_t *ptr, int32_t len, QEMUSGList *sg,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
#endif
--
1.7.6
* [Qemu-devel] [PATCH 06/10] scsi: pass residual amount to command_complete
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
` (4 preceding siblings ...)
2011-08-04 17:14 ` [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 07/10] scsi: add scatter/gather functionality Paolo Bonzini
` (5 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
With the upcoming sglist support, HBAs will not see any transfer_data
call and will not have a way to detect short transfers. So pass the
residual amount of data upon command completion.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/esp.c | 3 ++-
hw/lsi53c895a.c | 2 +-
hw/scsi-bus.c | 6 +++++-
hw/scsi.h | 3 ++-
hw/spapr_vscsi.c | 2 +-
hw/usb-msd.c | 2 +-
6 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/hw/esp.c b/hw/esp.c
index be3a35d..5d29071 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -395,7 +395,8 @@ static void esp_do_dma(ESPState *s)
esp_dma_done(s);
}
-static void esp_command_complete(SCSIRequest *req, uint32_t status)
+static void esp_command_complete(SCSIRequest *req, uint32_t status,
+ int32_t resid)
{
ESPState *s = DO_UPCAST(ESPState, busdev.qdev, req->bus->qbus.parent);
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index dac176a..a0c2419 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -706,7 +706,7 @@ static int lsi_queue_req(LSIState *s, SCSIRequest *req, uint32_t len)
}
/* Callback to indicate that the SCSI layer has completed a command. */
-static void lsi_command_complete(SCSIRequest *req, uint32_t status)
+static void lsi_command_complete(SCSIRequest *req, uint32_t status, int32_t resid)
{
LSIState *s = DO_UPCAST(LSIState, dev.qdev, req->bus->qbus.parent);
int out;
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 648b1f9..b49d02d 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -394,6 +394,8 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
}
req->cmd = cmd;
+ req->resid = req->cmd.xfer;
+
switch (buf[0]) {
case INQUIRY:
trace_scsi_inquiry(d->id, lun, tag, cmd.buf[1], cmd.buf[2]);
@@ -1043,6 +1045,8 @@ void scsi_req_continue(SCSIRequest *req)
void scsi_req_data(SCSIRequest *req, int len)
{
trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
+ assert(req->cmd.mode != SCSI_XFER_NONE);
+ req->resid -= len;
req->bus->ops->transfer_data(req, len);
}
@@ -1091,7 +1095,7 @@ void scsi_req_complete(SCSIRequest *req, int status)
scsi_req_ref(req);
scsi_req_dequeue(req);
- req->bus->ops->complete(req, req->status);
+ req->bus->ops->complete(req, req->status, req->resid);
scsi_req_unref(req);
}
diff --git a/hw/scsi.h b/hw/scsi.h
index 98fd689..76d4df2 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -47,6 +47,7 @@ struct SCSIRequest {
uint32_t tag;
uint32_t lun;
uint32_t status;
+ size_t resid;
SCSICommand cmd;
BlockDriverAIOCB *aiocb;
uint8_t sense[SCSI_SENSE_BUF_SIZE];
@@ -98,7 +99,7 @@ struct SCSIDeviceInfo {
struct SCSIBusOps {
void (*transfer_data)(SCSIRequest *req, uint32_t arg);
- void (*complete)(SCSIRequest *req, uint32_t arg);
+ void (*complete)(SCSIRequest *req, uint32_t arg, int32_t len);
void (*cancel)(SCSIRequest *req);
};
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 1f4de11..128b5a6 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -467,7 +467,7 @@ static void vscsi_transfer_data(SCSIRequest *sreq, uint32_t len)
}
/* Callback to indicate that the SCSI layer has completed a transfer. */
-static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status)
+static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, int32_t resid)
{
VSCSIState *s = DO_UPCAST(VSCSIState, vdev.qdev, sreq->bus->qbus.parent);
vscsi_req *req = sreq->hba_private;
diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 63305b8..8cddf80 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -232,7 +232,7 @@ static void usb_msd_transfer_data(SCSIRequest *req, uint32_t len)
}
}
-static void usb_msd_command_complete(SCSIRequest *req, uint32_t status)
+static void usb_msd_command_complete(SCSIRequest *req, uint32_t status, int32_t resid)
{
MSDState *s = DO_UPCAST(MSDState, dev.qdev, req->bus->qbus.parent);
USBPacket *p = s->packet;
--
1.7.6
* [Qemu-devel] [PATCH 07/10] scsi: add scatter/gather functionality
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
` (5 preceding siblings ...)
2011-08-04 17:14 ` [Qemu-devel] [PATCH 06/10] scsi: pass residual amount to command_complete Paolo Bonzini
@ 2011-08-04 17:14 ` Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 08/10] scsi-disk: commonize iovec creation between reads and writes Paolo Bonzini
` (4 subsequent siblings)
11 siblings, 0 replies; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Scatter/gather functionality uses the newly added DMA helpers. The
device can choose between doing DMA itself and calling scsi_req_data
as usual; in the latter case, the helpers map the destination
area(s) piecewise and copy to/from them.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/esp.c | 2 +-
hw/lsi53c895a.c | 2 +-
hw/scsi-bus.c | 34 +++++++++++++++++++++++++++++++---
hw/scsi.h | 4 +++-
hw/spapr_vscsi.c | 2 +-
hw/usb-msd.c | 2 +-
6 files changed, 38 insertions(+), 8 deletions(-)
diff --git a/hw/esp.c b/hw/esp.c
index 5d29071..2d098ea 100644
--- a/hw/esp.c
+++ b/hw/esp.c
@@ -245,7 +245,7 @@ static void do_busid_cmd(ESPState *s, uint8_t *buf, uint8_t busid)
DPRINTF("do_busid_cmd: busid 0x%x\n", busid);
lun = busid & 7;
s->current_req = scsi_req_new(s->current_dev, 0, lun, buf, NULL);
- datalen = scsi_req_enqueue(s->current_req);
+ datalen = scsi_req_enqueue(s->current_req, NULL);
s->ti_size = datalen;
if (datalen != 0) {
s->rregs[ESP_RSTAT] = STAT_TC;
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index a0c2419..ee7f67c 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -785,7 +785,7 @@ static void lsi_do_command(LSIState *s)
s->current->req = scsi_req_new(dev, s->current->tag, s->current_lun, buf,
s->current);
- n = scsi_req_enqueue(s->current->req);
+ n = scsi_req_enqueue(s->current->req, NULL);
if (n) {
if (n > 0) {
lsi_set_phase(s, PHASE_DI);
diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index b49d02d..5876f78 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -5,6 +5,7 @@
#include "qdev.h"
#include "blockdev.h"
#include "trace.h"
+#include "dma.h"
static char *scsibus_get_fw_dev_path(DeviceState *dev);
static int scsi_req_parse(SCSICommand *cmd, SCSIDevice *dev, uint8_t *buf);
@@ -521,7 +522,7 @@ void scsi_req_build_sense(SCSIRequest *req, SCSISense sense)
req->sense_len = 18;
}
-int32_t scsi_req_enqueue(SCSIRequest *req)
+int32_t scsi_req_enqueue(SCSIRequest *req, QEMUSGList *sg)
{
int32_t rc;
@@ -529,6 +530,7 @@ int32_t scsi_req_enqueue(SCSIRequest *req)
scsi_req_ref(req);
req->enqueued = true;
QTAILQ_INSERT_TAIL(&req->dev->requests, req, next);
+ req->sg = sg;
scsi_req_ref(req);
rc = req->ops->send_command(req, req->cmd.buf);
@@ -1039,15 +1041,41 @@ void scsi_req_continue(SCSIRequest *req)
}
}
+static void scsi_dma_cb(void *opaque, int ret)
+{
+ SCSIRequest *req = opaque;
+ assert(ret == 0);
+ assert(req->dma_started);
+ req->resid = qemu_sglist_get_resid(req->sg);
+ scsi_req_continue(req);
+}
+
/* Called by the devices when data is ready for the HBA. The HBA should
start a DMA operation to read or fill the device's data buffer.
Once it completes, calling scsi_req_continue will restart I/O. */
void scsi_req_data(SCSIRequest *req, int len)
{
+ uint8_t *buf;
trace_scsi_req_data(req->dev->id, req->lun, req->tag, len);
assert(req->cmd.mode != SCSI_XFER_NONE);
- req->resid -= len;
- req->bus->ops->transfer_data(req, len);
+ if (!req->sg) {
+ req->resid -= len;
+ req->bus->ops->transfer_data(req, len);
+ return;
+ }
+
+ /* If the device calls scsi_req_data and the HBA specified a
+ * scatter/gather list, the transfer has to happen in a single
+ * step. */
+ assert(!req->dma_started);
+ req->dma_started = true;
+
+ buf = scsi_req_get_buf(req);
+ if (req->cmd.mode == SCSI_XFER_FROM_DEV) {
+ req->aiocb = dma_buf_read(buf, len, req->sg, scsi_dma_cb, req);
+ } else {
+ req->aiocb = dma_buf_write(buf, len, req->sg, scsi_dma_cb, req);
+ }
}
void scsi_req_print(SCSIRequest *req)
diff --git a/hw/scsi.h b/hw/scsi.h
index 76d4df2..febb6fd 100644
--- a/hw/scsi.h
+++ b/hw/scsi.h
@@ -50,6 +50,8 @@ struct SCSIRequest {
size_t resid;
SCSICommand cmd;
BlockDriverAIOCB *aiocb;
+ QEMUSGList *sg;
+ bool dma_started;
uint8_t sense[SCSI_SENSE_BUF_SIZE];
uint32_t sense_len;
bool enqueued;
@@ -174,7 +176,7 @@ SCSIRequest *scsi_req_alloc(SCSIReqOps *reqops, SCSIDevice *d, uint32_t tag,
uint32_t lun, void *hba_private);
SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, uint32_t lun,
uint8_t *buf, void *hba_private);
-int32_t scsi_req_enqueue(SCSIRequest *req);
+int32_t scsi_req_enqueue(SCSIRequest *req, QEMUSGList *qsg);
void scsi_req_free(SCSIRequest *req);
SCSIRequest *scsi_req_ref(SCSIRequest *req);
void scsi_req_unref(SCSIRequest *req);
diff --git a/hw/spapr_vscsi.c b/hw/spapr_vscsi.c
index 128b5a6..3a6d35f 100644
--- a/hw/spapr_vscsi.c
+++ b/hw/spapr_vscsi.c
@@ -601,7 +601,7 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
req->lun = lun;
req->sreq = scsi_req_new(sdev, req->qtag, lun, srp->cmd.cdb, req);
- n = scsi_req_enqueue(req->sreq);
+ n = scsi_req_enqueue(req->sreq, NULL);
dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
req->qtag, srp->cmd.cdb[0], id, lun, n);
diff --git a/hw/usb-msd.c b/hw/usb-msd.c
index 8cddf80..167da8a 100644
--- a/hw/usb-msd.c
+++ b/hw/usb-msd.c
@@ -381,7 +381,7 @@ static int usb_msd_handle_data(USBDevice *dev, USBPacket *p)
s->residue = 0;
s->scsi_len = 0;
s->req = scsi_req_new(s->scsi_dev, s->tag, 0, cbw.cmd, NULL);
- scsi_req_enqueue(s->req);
+ scsi_req_enqueue(s->req, NULL);
/* ??? Should check that USB and SCSI data transfer
directions match. */
if (s->mode != USB_MSDM_CSW && s->residue == 0) {
--
1.7.6
* [Qemu-devel] [PATCH 08/10] scsi-disk: commonize iovec creation between reads and writes
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Also, consistently use qiov.size instead of iov.iov_len.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/scsi-disk.c | 40 ++++++++++++++++++----------------------
1 files changed, 18 insertions(+), 22 deletions(-)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 2cb6ff3..37dd9d6 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -107,6 +107,13 @@ static void scsi_cancel_io(SCSIRequest *req)
r->req.aiocb = NULL;
}
+static uint32_t scsi_init_iovec(SCSIDiskReq *r)
+{
+ r->iov.iov_len = MIN(r->sector_count * 512, SCSI_DMA_BUF_SIZE);
+ qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+ return r->qiov.size / 512;
+}
+
static void scsi_read_complete(void * opaque, int ret)
{
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -120,12 +127,12 @@ static void scsi_read_complete(void * opaque, int ret)
}
}
- DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->iov.iov_len);
+ DPRINTF("Data ready tag=0x%x len=%zd\n", r->req.tag, r->qiov.size);
- n = r->iov.iov_len / 512;
+ n = r->qiov.size / 512;
r->sector += n;
r->sector_count -= n;
- scsi_req_data(&r->req, r->iov.iov_len);
+ scsi_req_data(&r->req, r->qiov.size);
}
@@ -158,12 +165,7 @@ static void scsi_read_data(SCSIRequest *req)
return;
}
- n = r->sector_count;
- if (n > SCSI_DMA_BUF_SIZE / 512)
- n = SCSI_DMA_BUF_SIZE / 512;
-
- r->iov.iov_len = n * 512;
- qemu_iovec_init_external(&r->qiov, &r->iov, 1);
+ n = scsi_init_iovec(r);
r->req.aiocb = bdrv_aio_readv(s->bs, r->sector, &r->qiov, n,
scsi_read_complete, r);
if (r->req.aiocb == NULL) {
@@ -210,7 +212,6 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type)
static void scsi_write_complete(void * opaque, int ret)
{
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
- uint32_t len;
uint32_t n;
r->req.aiocb = NULL;
@@ -221,19 +222,15 @@ static void scsi_write_complete(void * opaque, int ret)
}
}
- n = r->iov.iov_len / 512;
+ n = r->qiov.size / 512;
r->sector += n;
r->sector_count -= n;
if (r->sector_count == 0) {
scsi_req_complete(&r->req, GOOD);
} else {
- len = r->sector_count * 512;
- if (len > SCSI_DMA_BUF_SIZE) {
- len = SCSI_DMA_BUF_SIZE;
- }
- r->iov.iov_len = len;
- DPRINTF("Write complete tag=0x%x more=%d\n", r->req.tag, len);
- scsi_req_data(&r->req, len);
+ scsi_init_iovec(r);
+ DPRINTF("Write complete tag=0x%x more=%d\n", r->req.tag, r->qiov.size);
+ scsi_req_data(&r->req, r->qiov.size);
}
}
@@ -252,16 +249,15 @@ static void scsi_write_data(SCSIRequest *req)
return;
}
- n = r->iov.iov_len / 512;
+ n = r->qiov.size / 512;
if (n) {
- qemu_iovec_init_external(&r->qiov, &r->iov, 1);
r->req.aiocb = bdrv_aio_writev(s->bs, r->sector, &r->qiov, n,
- scsi_write_complete, r);
+ scsi_write_complete, r);
if (r->req.aiocb == NULL) {
scsi_write_complete(r, -ENOMEM);
}
} else {
- /* Invoke completion routine to fetch data from host. */
+ /* Called for the first time. Ask the driver to send us more data. */
scsi_write_complete(r, 0);
}
}
--
1.7.6
* [Qemu-devel] [PATCH 09/10] scsi-disk: lazily allocate bounce buffer
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
It will not be needed for reads and writes if the HBA provides a scatter/gather list.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/scsi-disk.c | 33 ++++++++++++++++++++++-----------
1 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 37dd9d6..509407f 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -54,6 +54,7 @@ typedef struct SCSIDiskReq {
/* Both sector and sector_count are in terms of qemu 512 byte blocks. */
uint64_t sector;
uint32_t sector_count;
+ uint32_t buflen;
struct iovec iov;
QEMUIOVector qiov;
uint32_t status;
@@ -77,13 +78,15 @@ struct SCSIDiskState
};
static int scsi_handle_rw_error(SCSIDiskReq *r, int error, int type);
-static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf);
+static int scsi_disk_emulate_command(SCSIDiskReq *r);
static void scsi_free_request(SCSIRequest *req)
{
SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
- qemu_vfree(r->iov.iov_base);
+ if (r->iov.iov_base) {
+ qemu_vfree(r->iov.iov_base);
+ }
}
/* Helper function for command completion with sense. */
@@ -109,7 +112,13 @@ static void scsi_cancel_io(SCSIRequest *req)
static uint32_t scsi_init_iovec(SCSIDiskReq *r)
{
- r->iov.iov_len = MIN(r->sector_count * 512, SCSI_DMA_BUF_SIZE);
+ SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+ if (!r->iov.iov_base) {
+ r->buflen = SCSI_DMA_BUF_SIZE;
+ r->iov.iov_base = qemu_blockalign(s->bs, r->buflen);
+ }
+ r->iov.iov_len = MIN(r->sector_count * 512, r->buflen);
qemu_iovec_init_external(&r->qiov, &r->iov, 1);
return r->qiov.size / 512;
}
@@ -288,7 +297,7 @@ static void scsi_dma_restart_bh(void *opaque)
scsi_write_data(&r->req);
break;
case SCSI_REQ_STATUS_RETRY_FLUSH:
- ret = scsi_disk_emulate_command(r, r->iov.iov_base);
+ ret = scsi_disk_emulate_command(r);
if (ret == 0) {
scsi_req_complete(&r->req, GOOD);
}
@@ -780,14 +789,21 @@ static int scsi_disk_emulate_read_toc(SCSIRequest *req, uint8_t *outbuf)
return toclen;
}
-static int scsi_disk_emulate_command(SCSIDiskReq *r, uint8_t *outbuf)
+static int scsi_disk_emulate_command(SCSIDiskReq *r)
{
SCSIRequest *req = &r->req;
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
uint64_t nb_sectors;
+ uint8_t *outbuf;
int buflen = 0;
int ret;
+ if (!r->iov.iov_base) {
+ r->buflen = 4096;
+ r->iov.iov_base = qemu_blockalign(s->bs, 4096);
+ }
+
+ outbuf = r->iov.iov_base;
switch (req->cmd.buf[0]) {
case TEST_UNIT_READY:
if (!bdrv_is_inserted(s->bs))
@@ -950,11 +966,9 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, req->dev);
int32_t len;
uint8_t command;
- uint8_t *outbuf;
int rc;
command = buf[0];
- outbuf = (uint8_t *)r->iov.iov_base;
DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", req->lun, req->tag, buf[0]);
#ifdef DEBUG_SCSI
@@ -985,7 +999,7 @@ static int32_t scsi_send_command(SCSIRequest *req, uint8_t *buf)
case SERVICE_ACTION_IN:
case VERIFY:
case REZERO_UNIT:
- rc = scsi_disk_emulate_command(r, outbuf);
+ rc = scsi_disk_emulate_command(r);
if (rc < 0) {
return 0;
}
@@ -1206,11 +1220,8 @@ static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag,
{
SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, d);
SCSIRequest *req;
- SCSIDiskReq *r;
req = scsi_req_alloc(&scsi_disk_reqops, &s->qdev, tag, lun, hba_private);
- r = DO_UPCAST(SCSIDiskReq, req, req);
- r->iov.iov_base = qemu_blockalign(s->bs, SCSI_DMA_BUF_SIZE);
return req;
}
--
1.7.6
* [Qemu-devel] [PATCH 10/10] scsi-disk: enable scatter/gather functionality
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/scsi-disk.c | 52 +++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 41 insertions(+), 11 deletions(-)
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 509407f..81117d2 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -37,6 +37,7 @@ do { fprintf(stderr, "scsi-disk: " fmt , ## __VA_ARGS__); } while (0)
#include "scsi-defs.h"
#include "sysemu.h"
#include "blockdev.h"
+#include "dma.h"
#define SCSI_DMA_BUF_SIZE 131072
#define SCSI_MAX_INQUIRY_LEN 256
@@ -123,6 +124,25 @@ static uint32_t scsi_init_iovec(SCSIDiskReq *r)
return r->qiov.size / 512;
}
+static void scsi_dma_complete(void *opaque, int ret)
+{
+ SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+
+ r->req.resid = qemu_sglist_get_resid(r->req.sg);
+ r->req.aiocb = NULL;
+ if (ret) {
+ int is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV);
+ int retry = is_read ? SCSI_REQ_STATUS_RETRY_READ : SCSI_REQ_STATUS_RETRY_WRITE;
+ if (scsi_handle_rw_error(r, -ret, retry)) {
+ return;
+ }
+ }
+
+ r->sector += r->sector_count;
+ r->sector_count = 0;
+ scsi_req_complete(&r->req, GOOD);
+}
+
static void scsi_read_complete(void * opaque, int ret)
{
SCSIDiskReq *r = (SCSIDiskReq *)opaque;
@@ -174,9 +194,14 @@ static void scsi_read_data(SCSIRequest *req)
return;
}
- n = scsi_init_iovec(r);
- r->req.aiocb = bdrv_aio_readv(s->bs, r->sector, &r->qiov, n,
- scsi_read_complete, r);
+ if (r->req.sg) {
+ r->req.aiocb = dma_bdrv_read(s->bs, r->req.sg, r->sector,
+ scsi_dma_complete, r);
+ } else {
+ n = scsi_init_iovec(r);
+ r->req.aiocb = bdrv_aio_readv(s->bs, r->sector, &r->qiov, n,
+ scsi_read_complete, r);
+ }
if (r->req.aiocb == NULL) {
scsi_read_complete(r, -EIO);
}
@@ -258,16 +283,21 @@ static void scsi_write_data(SCSIRequest *req)
return;
}
- n = r->qiov.size / 512;
- if (n) {
+ if (r->req.sg) {
+ r->req.aiocb = dma_bdrv_write(s->bs, r->req.sg, r->sector,
+ scsi_dma_complete, r);
+ } else {
+ n = r->qiov.size / 512;
+ if (!n) {
+ /* Called for the first time. Ask the driver to send us more data. */
+ scsi_write_complete(r, 0);
+ return;
+ }
r->req.aiocb = bdrv_aio_writev(s->bs, r->sector, &r->qiov, n,
scsi_write_complete, r);
- if (r->req.aiocb == NULL) {
- scsi_write_complete(r, -ENOMEM);
- }
- } else {
- /* Called for the first time. Ask the driver to send us more data. */
- scsi_write_complete(r, 0);
+ }
+ if (r->req.aiocb == NULL) {
+ scsi_write_complete(r, -ENOMEM);
}
}
--
1.7.6
* [Qemu-devel] [PATCH 11/11] sample pvscsi driver with s/g support
From: Paolo Bonzini @ 2011-08-04 17:14 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
Makefile.objs | 1 +
default-configs/i386-softmmu.mak | 1 +
default-configs/pci.mak | 1 +
default-configs/x86_64-softmmu.mak | 1 +
hw/pci.h | 1 +
hw/vmw_pvscsi.c | 904 ++++++++++++++++++++++++++++++++++++
hw/vmw_pvscsi.h | 389 ++++++++++++++++
trace-events | 15 +
8 files changed, 1313 insertions(+), 0 deletions(-)
create mode 100644 hw/vmw_pvscsi.c
create mode 100644 hw/vmw_pvscsi.h
diff --git a/Makefile.objs b/Makefile.objs
index 6991a9f..bd9fc63 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -259,6 +259,7 @@ hw-obj-$(CONFIG_AHCI) += ide/ich.o
# SCSI layer
hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
+hw-obj-$(CONFIG_VMWARE_PVSCSI_PCI) += vmw_pvscsi.o
hw-obj-$(CONFIG_ESP) += esp.o
hw-obj-y += dma-helpers.o sysbus.o isa-bus.o
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 55589fa..a97c94a 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -21,3 +21,4 @@ CONFIG_PIIX_PCI=y
CONFIG_SOUND=y
CONFIG_HPET=y
CONFIG_APPLESMC=y
+CONFIG_VMWARE_PVSCSI_PCI=y
diff --git a/default-configs/pci.mak b/default-configs/pci.mak
index 22bd350..280101b 100644
--- a/default-configs/pci.mak
+++ b/default-configs/pci.mak
@@ -9,6 +9,7 @@ CONFIG_EEPRO100_PCI=y
CONFIG_PCNET_PCI=y
CONFIG_PCNET_COMMON=y
CONFIG_LSI_SCSI_PCI=y
+CONFIG_VMWARE_PVSCSI_PCI=y
CONFIG_RTL8139_PCI=y
CONFIG_E1000_PCI=y
CONFIG_IDE_CORE=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 8895028..5321314 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -21,3 +21,4 @@ CONFIG_PIIX_PCI=y
CONFIG_SOUND=y
CONFIG_HPET=y
CONFIG_APPLESMC=y
+CONFIG_VMWARE_PVSCSI_PCI=y
diff --git a/hw/pci.h b/hw/pci.h
index 8fd4f86..1c60f57 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -60,6 +60,7 @@
#define PCI_DEVICE_ID_VMWARE_NET 0x0720
#define PCI_DEVICE_ID_VMWARE_SCSI 0x0730
#define PCI_DEVICE_ID_VMWARE_IDE 0x1729
+#define PCI_DEVICE_ID_VMWARE_PVSCSI 0x07c0
/* Intel (0x8086) */
#define PCI_DEVICE_ID_INTEL_82551IT 0x1209
diff --git a/hw/vmw_pvscsi.c b/hw/vmw_pvscsi.c
new file mode 100644
index 0000000..19a3871
--- /dev/null
+++ b/hw/vmw_pvscsi.c
@@ -0,0 +1,904 @@
+/*
+ * VMware Paravirtualized SCSI Host Bus Adapter emulation
+ *
+ * Copyright (c) 2011 Red Hat, Inc.
+ * Written by Paolo Bonzini
+ *
+ * This code is licensed under GPLv2.
+ */
+
+#include <assert.h>
+
+#include "hw.h"
+#include "pci.h"
+#include "scsi.h"
+#include "scsi-defs.h"
+#include "vmw_pvscsi.h"
+#include "block_int.h"
+#include "host-utils.h"
+#include "dma.h"
+#include "trace.h"
+
+#define PVSCSI_MAX_DEVS 127
+#define PAGE_SIZE 4096
+#define PAGE_SHIFT 12
+
+typedef struct PVSCSISGState {
+ target_phys_addr_t elemAddr;
+ target_phys_addr_t dataAddr;
+ uint32_t resid;
+} PVSCSISGState;
+
+typedef struct PVSCSIRequest {
+ SCSIDevice *sdev;
+ SCSIRequest *sreq;
+ uint8_t sense_key;
+ uint8_t completed;
+ int lun;
+ uint64_t resid;
+ QEMUSGList sgl;
+ PVSCSISGState sg;
+ struct PVSCSIRingReqDesc req;
+ struct PVSCSIRingCmpDesc cmp;
+ QTAILQ_ENTRY(PVSCSIRequest) next;
+} PVSCSIRequest;
+
+typedef QTAILQ_HEAD(, PVSCSIRequest) PVSCSIRequestList;
+
+typedef struct {
+ PCIDevice dev;
+ SCSIBus bus;
+ QEMUBH *complete_reqs_bh;
+
+ int mmio_io_addr;
+ uint32_t use_iovec;
+
+ /* zeroed on reset */
+ uint32_t cmd_latch;
+ uint32_t cmd_buffer[sizeof(struct PVSCSICmdDescSetupRings)
+ / sizeof(uint32_t)];
+ uint32_t cmd_ptr;
+ uint32_t cmd_status;
+ uint32_t intr_status;
+ uint32_t intr_mask;
+ uint32_t intr_cmpl;
+ uint32_t intr_msg;
+ struct PVSCSICmdDescSetupRings rings;
+ struct PVSCSICmdDescSetupMsgRing msgRing;
+ uint32_t reqNumEntriesLog2;
+ uint32_t cmpNumEntriesLog2;
+ uint32_t msgNumEntriesLog2;
+
+ PVSCSIRequestList pending_queue;
+ PVSCSIRequestList complete_queue;
+} PVSCSIState;
+
+\f
+static inline int pvscsi_get_lun(uint8_t *lun)
+{
+ if (lun[0] || lun[2] || lun[3] || lun[4] || lun[5] || lun[6] || lun[7]) {
+ return -1;
+ }
+ return lun[1];
+}
+
+static inline int pvscsi_get_dev_lun(PVSCSIState *s,
+ uint8_t *lun, uint32_t target,
+ SCSIDevice **sdev)
+{
+ SCSIBus *bus = &s->bus;
+ int lunval;
+ *sdev = NULL;
+ if (target > PVSCSI_MAX_DEVS) {
+ return -1;
+ }
+ lunval = pvscsi_get_lun(lun);
+ if (lunval < 0) {
+ return -1;
+ }
+ *sdev = bus->devs[target];
+ if (!*sdev) {
+ return -1;
+ }
+ return lunval;
+}
+
+\f
+/* Add a command to the pending queue. */
+static PVSCSIRequest *pvscsi_queue_request(PVSCSIState *s, SCSIDevice **d,
+ struct PVSCSIRingReqDesc *req)
+{
+ PVSCSIRequest *p;
+ int lun;
+
+ trace_pvscsi_queue_request(req->context, req->cdb[0], req->dataLen);
+
+ p = qemu_mallocz(sizeof(*p));
+ p->req = *req;
+ p->cmp.context = p->req.context;
+ QTAILQ_INSERT_TAIL(&s->pending_queue, p, next);
+
+ lun = pvscsi_get_dev_lun(s, req->lun, req->target, d);
+ if (!*d) {
+ return p;
+ }
+
+ p->lun = lun;
+ return p;
+}
+
+static void pvscsi_free_queue(PVSCSIRequestList *q)
+{
+ PVSCSIRequest *p;
+
+ while (!QTAILQ_EMPTY(q)) {
+ p = QTAILQ_FIRST(q);
+ QTAILQ_REMOVE(q, p, next);
+ qemu_free(p);
+ }
+}
+
+static void pvscsi_soft_reset(PVSCSIState *s)
+{
+ qbus_reset_all_fn(&s->bus);
+ pvscsi_free_queue(&s->complete_queue);
+ assert(QTAILQ_EMPTY(&s->pending_queue));
+ memset(&s->cmd_latch, 0, sizeof(*s) - offsetof(PVSCSIState, cmd_latch));
+ s->intr_cmpl = PVSCSI_INTR_CMPL_0;
+ s->intr_msg = PVSCSI_INTR_MSG_0;
+ QTAILQ_INIT(&s->pending_queue);
+ QTAILQ_INIT(&s->complete_queue);
+}
+
+\f
+static void pvscsi_raise_intr(PVSCSIState *s, int mask)
+{
+ int intr_raised = mask & ~s->intr_status;
+ s->intr_status |= mask;
+ trace_pvscsi_raise_intr(intr_raised,
+ (intr_raised & s->intr_mask) == 0 ? "masked" : "");
+ if (intr_raised & s->intr_mask) {
+ qemu_set_irq(s->dev.irq[0], 1);
+ }
+}
+
+static void pvscsi_acknowledge_intr(PVSCSIState *s, int mask)
+{
+ trace_pvscsi_acknowledge_intr(mask);
+ s->intr_status &= ~mask;
+ if (mask == s->intr_cmpl) {
+ s->intr_cmpl ^= PVSCSI_INTR_CMPL_MASK;
+
+ /* Try putting more complete requests on the ring. */
+ if (!QTAILQ_EMPTY(&s->complete_queue)) {
+ qemu_bh_schedule(s->complete_reqs_bh);
+ }
+ }
+ if (mask == s->intr_msg) {
+ s->intr_msg ^= PVSCSI_INTR_MSG_MASK;
+ }
+ if ((s->intr_status & s->intr_mask) == 0) {
+ qemu_set_irq(s->dev.irq[0], 0);
+ }
+}
+
+static void pvscsi_set_intr_mask(PVSCSIState *s, int mask)
+{
+ int intr_enabled = mask & ~s->intr_mask;
+ s->intr_mask = mask;
+ if (s->intr_status & intr_enabled) {
+ qemu_set_irq(s->dev.irq[0], 1);
+ }
+ if ((s->intr_status & mask) == 0) {
+ qemu_set_irq(s->dev.irq[0], 0);
+ }
+}
+
+\f
+#define pvscsi_ld_ring_state(s, field) \
+ ldl_le_phys(s->rings.ringsStatePPN * PAGE_SIZE + offsetof(struct PVSCSIRingsState, field))
+
+#define pvscsi_st_ring_state(s, field, val) \
+ stl_le_phys(s->rings.ringsStatePPN * PAGE_SIZE + offsetof(struct PVSCSIRingsState, field), \
+ val)
+
+/* Return number of free elements in the completion ring. */
+static inline int pvscsi_cmp_free(PVSCSIState *s)
+{
+ return ((1 << s->cmpNumEntriesLog2) - 1 -
+ (pvscsi_ld_ring_state(s, cmpProdIdx) - pvscsi_ld_ring_state(s, cmpConsIdx)));
+}
+
+/* Return number of pending elements in the request ring. */
+static inline int pvscsi_req_pending(PVSCSIState *s)
+{
+ return pvscsi_ld_ring_state(s, reqProdIdx) - pvscsi_ld_ring_state(s, reqConsIdx);
+}
+
+/* Return the physical address of the idx-th element in the ring
+ * whose physical page numbers are given by ppn. Each element in
+ * the ring has size bytes. */
+static target_phys_addr_t pvscsi_get_ring_addr(PVSCSIState *s, int idx,
+ int size, uint64_t *ppn)
+{
+ uint32_t ofs = idx * size;
+ return (ppn[ofs >> PAGE_SHIFT] * PAGE_SIZE) | (ofs & (PAGE_SIZE - 1));
+}
+\f
+
+#define barrier()
+
+/* Copy cmp_desc on the completion ring, assuming there is a free entry. */
+static void pvscsi_cmp_ring_put(PVSCSIState *s,
+ struct PVSCSIRingCmpDesc *cmp_desc)
+{
+ uint32_t cmp_entries = s->cmpNumEntriesLog2;
+ uint32_t val = pvscsi_ld_ring_state(s, cmpProdIdx);
+ uint32_t idx = val & MASK(cmp_entries);
+ target_phys_addr_t addr;
+
+ trace_pvscsi_cmp_ring_put(cmp_desc->context);
+ addr = pvscsi_get_ring_addr(s, idx, sizeof(struct PVSCSIRingCmpDesc),
+ s->rings.cmpRingPPNs);
+
+ barrier();
+ cpu_physical_memory_write(addr, (void *)cmp_desc, sizeof(*cmp_desc));
+ barrier();
+ pvscsi_st_ring_state(s, cmpProdIdx, val + 1);
+}
+
+/* Put all completed requests on the completion ring. */
+static void pvscsi_complete_reqs(void *opaque)
+{
+ PVSCSIState *s = opaque;
+ PVSCSIRequest *p;
+ int n = pvscsi_cmp_free(s);
+ int done = 0;
+ while (n > 0 && !QTAILQ_EMPTY(&s->complete_queue)) {
+ p = QTAILQ_FIRST(&s->complete_queue);
+ QTAILQ_REMOVE(&s->complete_queue, p, next);
+ pvscsi_cmp_ring_put(s, &p->cmp);
+ qemu_free(p);
+ n--;
+ done++;
+ }
+ if (done) {
+ pvscsi_raise_intr(s, s->intr_cmpl);
+ }
+}
+
+/* Prepare to put r on the completion ring. */
+static void pvscsi_complete_req(PVSCSIState *s, PVSCSIRequest *p)
+{
+ assert(!p->completed);
+ trace_pvscsi_complete_req(p->cmp.context, p->cmp.dataLen, p->sense_key);
+ if (p->sreq != NULL) {
+ scsi_req_unref(p->sreq);
+ p->sreq = NULL;
+ }
+ p->completed = 1;
+ QTAILQ_REMOVE(&s->pending_queue, p, next);
+ QTAILQ_INSERT_TAIL(&s->complete_queue, p, next);
+ qemu_bh_schedule(s->complete_reqs_bh);
+}
+
+/* Write sense data for a completed request. */
+static void pvscsi_write_sense(PVSCSIRequest *p, uint8_t *buf, int len)
+{
+ p->cmp.senseLen = MIN(p->req.senseLen, len);
+ p->sense_key = buf[2];
+ cpu_physical_memory_write(p->req.senseAddr, buf, p->cmp.senseLen);
+}
+
+static void pvscsi_transfer_data_with_buffer(PVSCSIRequest *p, bool to_host,
+ uint8_t *buf, int len)
+{
+ if (len) {
+ cpu_physical_memory_rw(p->req.dataAddr, buf, len, to_host);
+ p->cmp.dataLen += len;
+ p->req.dataAddr += len;
+ p->resid -= len;
+ }
+}
+
+static void pvscsi_get_next_sg_elem(PVSCSISGState *sg)
+{
+ struct PVSCSISGElement elem;
+
+ for (;; sg->elemAddr = elem.addr) {
+ cpu_physical_memory_read(sg->elemAddr, (void *)&elem,
+ sizeof(elem));
+#if 0
+ /* PVSCSI_SGE_FLAG_CHAIN_ELEMENT not in the header file! */
+ if ((elem.flags & PVSCSI_SGE_FLAG_CHAIN_ELEMENT) == 0) {
+ break;
+ }
+#else
+ break;
+#endif
+ }
+
+ sg->elemAddr += sizeof(elem);
+ sg->dataAddr = elem.addr;
+ sg->resid = elem.length;
+}
+
+static void pvscsi_transfer_data_with_sg_list(PVSCSIRequest *p, bool to_host,
+ uint8_t *buf, int len)
+{
+ int n;
+ while (len) {
+ while (!p->sg.resid) {
+ pvscsi_get_next_sg_elem(&p->sg);
+ trace_pvscsi_sg_elem(p->req.context, p->sg.dataAddr, p->sg.resid);
+ }
+ assert(len > 0);
+ n = MIN((unsigned) len, p->sg.resid);
+ if (n) {
+ cpu_physical_memory_rw(p->sg.dataAddr, buf, n, to_host);
+ }
+
+ buf += n;
+ p->cmp.dataLen += n;
+ p->sg.dataAddr += n;
+
+ len -= n;
+ p->resid -= n;
+ p->sg.resid -= n;
+ }
+}
+
+static void pvscsi_convert_sglist(PVSCSIRequest *p)
+{
+ int n;
+ uint64_t len = p->req.dataLen;
+ PVSCSISGState sg = p->sg;
+ while (len) {
+ while (!sg.resid) {
+ pvscsi_get_next_sg_elem(&sg);
+ trace_pvscsi_sg_elem(p->req.context, sg.dataAddr, sg.resid);
+ }
+ assert(len > 0);
+ n = MIN((unsigned) len, sg.resid);
+ if (n) {
+ qemu_sglist_add(&p->sgl, sg.dataAddr, n);
+ }
+
+ sg.dataAddr += n;
+ len -= n;
+ sg.resid -= n;
+ }
+}
+
+static void pvscsi_build_sglist(PVSCSIRequest *p)
+{
+ qemu_sglist_init(&p->sgl, 1);
+ if (p->req.flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
+ pvscsi_convert_sglist(p);
+ } else {
+ qemu_sglist_add(&p->sgl, p->req.dataAddr, p->req.dataLen);
+ }
+}
+
+/* Callback to indicate that the SCSI layer has completed a transfer. */
+static void pvscsi_transfer_data(SCSIRequest *req, uint32_t len)
+{
+ PVSCSIRequest *p = req->hba_private;
+ uint8_t *buf;
+ int to_host;
+
+ if (!p) {
+ fprintf(stderr, "PVSCSI: Can't find request for tag 0x%x\n", req->tag);
+ return;
+ }
+
+ assert(!req->sg);
+ buf = scsi_req_get_buf(req);
+ to_host = (p->req.flags & PVSCSI_FLAG_CMD_DIR_TOHOST) != 0;
+
+ assert(p->resid);
+ trace_pvscsi_transfer_data(p->req.context, len);
+ if (!len) {
+ /* Short transfer. */
+ p->cmp.hostStatus = BTSTAT_DATARUN;
+ scsi_req_cancel(req);
+ return;
+ }
+
+ if (len > p->resid) {
+ /* Small buffer. */
+ p->cmp.hostStatus = BTSTAT_DATARUN;
+ scsi_req_cancel(req);
+ return;
+ }
+
+ if (p->req.flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
+ pvscsi_transfer_data_with_sg_list(p, to_host, buf, len);
+ } else {
+ pvscsi_transfer_data_with_buffer(p, to_host, buf, len);
+ }
+
+ scsi_req_continue(req);
+}
+
+/* Callback to indicate that the SCSI layer has completed a command. */
+static void pvscsi_command_complete(SCSIRequest *req, uint32_t status, int32_t resid)
+{
+ PVSCSIState *s = DO_UPCAST(PVSCSIState, dev.qdev, req->bus->qbus.parent);
+ PVSCSIRequest *p = req->hba_private;
+
+ if (!p) {
+ fprintf(stderr, "PVSCSI: Can't find request for tag 0x%x\n", req->tag);
+ return;
+ }
+
+ if (resid) {
+ /* Short transfer. */
+ p->cmp.hostStatus = BTSTAT_DATARUN;
+ }
+ p->cmp.scsiStatus = status;
+ if (p->cmp.scsiStatus == CHECK_CONDITION) {
+ uint8_t sense[96];
+ int n = scsi_req_get_sense(p->sreq, sense, sizeof(sense));
+ pvscsi_write_sense(p, sense, n);
+ }
+ qemu_sglist_destroy(&p->sgl);
+ pvscsi_complete_req(s, p);
+}
+
+static void pvscsi_request_cancelled(SCSIRequest *req)
+{
+ PVSCSIState *s = DO_UPCAST(PVSCSIState, dev.qdev, req->bus->qbus.parent);
+ PVSCSIRequest *p = req->hba_private;
+
+ if (p->cmp.hostStatus == BTSTAT_SUCCESS) {
+ p->cmp.hostStatus = BTSTAT_ABORTQUEUE;
+ }
+ pvscsi_complete_req(s, p);
+}
+\f
+
+/* Process a request from the request ring. */
+static void pvscsi_process_req(PVSCSIState *s, struct PVSCSIRingReqDesc *r)
+{
+ SCSIDevice *d;
+ PVSCSIRequest *p = pvscsi_queue_request(s, &d, r);
+ int64_t datalen, n;
+
+ if (!d) {
+ p->cmp.hostStatus = BTSTAT_SELTIMEO;
+ pvscsi_complete_req(s, p);
+ return;
+ }
+
+ if (r->flags & PVSCSI_FLAG_CMD_WITH_SG_LIST) {
+ p->sg.elemAddr = r->dataAddr;
+ }
+
+ p->sreq = scsi_req_new(d, r->context, p->lun, r->cdb, p);
+ if (p->sreq->cmd.mode == SCSI_XFER_FROM_DEV
+ && (r->flags & PVSCSI_FLAG_CMD_DIR_TODEVICE)) {
+ p->cmp.hostStatus = BTSTAT_BADMSG;
+ scsi_req_cancel(p->sreq);
+ return;
+ }
+ if (p->sreq->cmd.mode == SCSI_XFER_TO_DEV
+ && (r->flags & PVSCSI_FLAG_CMD_DIR_TOHOST)) {
+ p->cmp.hostStatus = BTSTAT_BADMSG;
+ scsi_req_cancel(p->sreq);
+ return;
+ }
+ if (!s->use_iovec) {
+ n = scsi_req_enqueue(p->sreq, NULL);
+ } else {
+ pvscsi_build_sglist(p);
+ n = scsi_req_enqueue(p->sreq, &p->sgl);
+ }
+
+ if (n) {
+ datalen = (n < 0 ? -n : n);
+ p->resid = MIN(datalen, r->dataLen);
+ scsi_req_continue(p->sreq);
+ }
+}
+
+/* Process pending requests on the request ring. */
+static void pvscsi_process_req_ring(PVSCSIState *s)
+{
+ uint32_t req_entries = s->reqNumEntriesLog2;
+
+ trace_pvscsi_kick_io();
+ while (pvscsi_req_pending(s)) {
+ uint32_t val = pvscsi_ld_ring_state(s, reqConsIdx);
+ uint32_t idx = val & MASK(req_entries);
+ target_phys_addr_t addr;
+ struct PVSCSIRingReqDesc req_desc;
+
+ addr = pvscsi_get_ring_addr(s, idx, sizeof(struct PVSCSIRingReqDesc),
+ s->rings.reqRingPPNs);
+
+ barrier();
+ cpu_physical_memory_read(addr, (void *)&req_desc, sizeof(req_desc));
+ pvscsi_process_req(s, &req_desc);
+ barrier();
+ pvscsi_st_ring_state(s, reqConsIdx, val + 1);
+ }
+}
+
+\f
+static int32_t pvscsi_cmd_bad(PVSCSIState *s)
+{
+ fprintf(stderr, "vmw_pvscsi: bad command %d\n", s->cmd_latch);
+ return -1;
+}
+
+static int32_t pvscsi_cmd_unimpl(PVSCSIState *s)
+{
+ fprintf(stderr, "vmw_pvscsi: unimplemented command %d\n", s->cmd_latch);
+ return -1;
+}
+
+static int32_t pvscsi_cmd_adapter_reset(PVSCSIState *s)
+{
+ pvscsi_soft_reset(s);
+ return 0;
+}
+
+static int floor_log2(int x)
+{
+ assert(x);
+ return 31 - clz32(x);
+}
+
+/* Setup ring buffers and initialize the ring state page. */
+static int32_t pvscsi_cmd_setup_rings(PVSCSIState *s)
+{
+ memcpy(&s->rings, s->cmd_buffer, sizeof(s->rings));
+ if (s->rings.reqRingNumPages == 0 ||
+ s->rings.cmpRingNumPages == 0) {
+ return -1;
+ }
+
+ s->reqNumEntriesLog2 = floor_log2(s->rings.reqRingNumPages * PAGE_SIZE
+ / sizeof(struct PVSCSIRingReqDesc));
+ s->cmpNumEntriesLog2 = floor_log2(s->rings.cmpRingNumPages * PAGE_SIZE
+ / sizeof(struct PVSCSIRingCmpDesc));
+
+ trace_pvscsi_setup_req_ring(s->rings.reqRingNumPages,
+ 1 << s->reqNumEntriesLog2);
+ trace_pvscsi_setup_cmp_ring(s->rings.cmpRingNumPages,
+ 1 << s->cmpNumEntriesLog2);
+
+ pvscsi_st_ring_state(s, reqNumEntriesLog2, s->reqNumEntriesLog2);
+ pvscsi_st_ring_state(s, cmpNumEntriesLog2, s->cmpNumEntriesLog2);
+ pvscsi_st_ring_state(s, cmpProdIdx, 0);
+ pvscsi_st_ring_state(s, cmpConsIdx, 0);
+ pvscsi_st_ring_state(s, reqProdIdx, 0);
+ pvscsi_st_ring_state(s, reqConsIdx, 0);
+ return 0;
+}
+
+static int32_t pvscsi_cmd_reset_bus(PVSCSIState *s)
+{
+ qbus_reset_all_fn(&s->bus);
+ return 0;
+}
+
+static int32_t pvscsi_cmd_reset_device(PVSCSIState *s)
+{
+ struct PVSCSICmdDescResetDevice *cmd =
+ (struct PVSCSICmdDescResetDevice *) &s->cmd_buffer;
+ SCSIDevice *sdev;
+
+ pvscsi_get_dev_lun(s, cmd->lun, cmd->target, &sdev);
+ if (sdev != NULL && sdev->info->qdev.reset) {
+ sdev->info->qdev.reset(&sdev->qdev);
+ }
+
+ return 0;
+}
+
+static int32_t pvscsi_cmd_abort_cmd(PVSCSIState *s)
+{
+ return 0;
+}
+
+static int32_t pvscsi_cmd_setup_msg_ring(PVSCSIState *s)
+{
+ memcpy(&s->msgRing, s->cmd_buffer, sizeof(s->msgRing));
+ if (s->msgRing.numPages == 0) {
+ return -1;
+ }
+
+ s->msgNumEntriesLog2 = floor_log2(s->msgRing.numPages * PAGE_SIZE
+ / sizeof(struct PVSCSIRingMsgDesc));
+
+ trace_pvscsi_setup_msg_ring(s->msgRing.numPages,
+ 1 << s->msgNumEntriesLog2);
+
+ pvscsi_st_ring_state(s, msgNumEntriesLog2, s->msgNumEntriesLog2);
+ pvscsi_st_ring_state(s, msgProdIdx, 0);
+ pvscsi_st_ring_state(s, msgConsIdx, 0);
+ return 0;
+}
+
+typedef struct {
+ int nargs;
+ int32_t (*fn)(PVSCSIState *);
+} PVSCSICmd;
+
+static const PVSCSICmd pvscsi_commands[PVSCSI_CMD_LAST] = {
+ [PVSCSI_CMD_FIRST] = {
+ .nargs = 0,
+ .fn = pvscsi_cmd_bad,
+ },
+ [PVSCSI_CMD_ADAPTER_RESET] = {
+ .nargs = 0,
+ .fn = pvscsi_cmd_adapter_reset
+ },
+ [PVSCSI_CMD_ISSUE_SCSI] = {
+ .nargs = 0, /* unknown */
+ .fn = pvscsi_cmd_unimpl
+ },
+ [PVSCSI_CMD_SETUP_RINGS] = {
+ .nargs = sizeof(struct PVSCSICmdDescSetupRings) / sizeof(uint32_t),
+ .fn = pvscsi_cmd_setup_rings
+ },
+ [PVSCSI_CMD_RESET_BUS] = {
+ .nargs = 0,
+ .fn = pvscsi_cmd_reset_bus
+ },
+ [PVSCSI_CMD_RESET_DEVICE] = {
+ .nargs = sizeof(struct PVSCSICmdDescResetDevice) / sizeof(uint32_t),
+ .fn = pvscsi_cmd_reset_device
+ },
+ [PVSCSI_CMD_ABORT_CMD] = {
+ .nargs = sizeof(struct PVSCSICmdDescAbortCmd) / sizeof(uint32_t),
+ .fn = pvscsi_cmd_abort_cmd
+ },
+ [PVSCSI_CMD_CONFIG] = {
+ .nargs = 0, /* unknown */
+ .fn = pvscsi_cmd_unimpl
+ },
+ [PVSCSI_CMD_SETUP_MSG_RING] = {
+ .nargs = sizeof(struct PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t),
+ .fn = pvscsi_cmd_setup_msg_ring
+ },
+ [PVSCSI_CMD_DEVICE_UNPLUG] = {
+ .nargs = 0, /* unknown */
+ .fn = pvscsi_cmd_unimpl
+ }
+};
+
+\f
+static void pvscsi_maybe_do_cmd(PVSCSIState *s)
+{
+ int cmd = s->cmd_latch >= PVSCSI_CMD_LAST ? PVSCSI_CMD_FIRST : s->cmd_latch;
+ const PVSCSICmd *cmd_info = &pvscsi_commands[cmd];
+
+ if (s->cmd_ptr >= cmd_info->nargs) {
+ s->cmd_status = cmd_info->fn(s);
+ s->cmd_latch = 0;
+ s->cmd_ptr = 0;
+ }
+}
+
+static uint32_t pvscsi_reg_readl(PVSCSIState *s, int offset)
+{
+ switch (offset) {
+ case PVSCSI_REG_OFFSET_COMMAND:
+ case PVSCSI_REG_OFFSET_COMMAND_DATA:
+ case PVSCSI_REG_OFFSET_KICK_NON_RW_IO:
+ case PVSCSI_REG_OFFSET_KICK_RW_IO:
+ fprintf(stderr, "vmw_pvscsi: read to write-only register %x\n", offset);
+ break;
+ case PVSCSI_REG_OFFSET_COMMAND_STATUS:
+ return s->cmd_status;
+ case PVSCSI_REG_OFFSET_INTR_STATUS:
+ return s->intr_status;
+ case PVSCSI_REG_OFFSET_INTR_MASK:
+ return s->intr_mask;
+ case PVSCSI_REG_OFFSET_LAST_STS_0:
+ case PVSCSI_REG_OFFSET_LAST_STS_1:
+ case PVSCSI_REG_OFFSET_LAST_STS_2:
+ case PVSCSI_REG_OFFSET_LAST_STS_3:
+ case PVSCSI_REG_OFFSET_DEBUG:
+ fprintf(stderr, "vmw_pvscsi: read from unsupported register %x\n", offset);
+ break;
+ default:
+ break;
+ }
+ return 0;
+}
+
+static void pvscsi_reg_write(PVSCSIState *s, int offset, uint32_t val, int size)
+{
+ if (size != 4) {
+ switch (offset) {
+ case PVSCSI_REG_OFFSET_COMMAND:
+ case PVSCSI_REG_OFFSET_COMMAND_DATA:
+ case PVSCSI_REG_OFFSET_COMMAND_STATUS:
+ case PVSCSI_REG_OFFSET_INTR_STATUS:
+ case PVSCSI_REG_OFFSET_INTR_MASK:
+ abort();
+ default:
+ break;
+ }
+ }
+
+ switch (offset) {
+ case PVSCSI_REG_OFFSET_COMMAND:
+ trace_pvscsi_cmd(val);
+ s->cmd_latch = val;
+ s->cmd_ptr = 0;
+ pvscsi_maybe_do_cmd(s);
+ break;
+ case PVSCSI_REG_OFFSET_COMMAND_DATA:
+ /* do not let the guest overrun the command latch buffer */
+ if (s->cmd_ptr < ARRAY_SIZE(s->cmd_buffer)) {
+ s->cmd_buffer[s->cmd_ptr++] = val;
+ }
+ pvscsi_maybe_do_cmd(s);
+ break;
+ case PVSCSI_REG_OFFSET_COMMAND_STATUS:
+ fprintf(stderr, "vmw_pvscsi: write to read-only register %x\n", offset);
+ break;
+ case PVSCSI_REG_OFFSET_INTR_STATUS:
+ pvscsi_acknowledge_intr(s, val);
+ break;
+ case PVSCSI_REG_OFFSET_INTR_MASK:
+ pvscsi_set_intr_mask(s, val);
+ break;
+ case PVSCSI_REG_OFFSET_KICK_NON_RW_IO:
+ case PVSCSI_REG_OFFSET_KICK_RW_IO:
+ pvscsi_process_req_ring(s);
+ break;
+
+ case PVSCSI_REG_OFFSET_LAST_STS_0:
+ case PVSCSI_REG_OFFSET_LAST_STS_1:
+ case PVSCSI_REG_OFFSET_LAST_STS_2:
+ case PVSCSI_REG_OFFSET_LAST_STS_3:
+ case PVSCSI_REG_OFFSET_DEBUG:
+ fprintf(stderr, "vmw_pvscsi: write to unsupported register %x\n", offset);
+ break;
+ default:
+ break;
+ }
+}
+
+static void pvscsi_mmio_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+ PVSCSIState *s = opaque;
+
+ addr &= PVSCSI_MEM_SPACE_SIZE - 1;
+ pvscsi_reg_write(s, addr, val, 1);
+}
+
+static void pvscsi_mmio_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+ PVSCSIState *s = opaque;
+
+ addr &= PVSCSI_MEM_SPACE_SIZE - 1;
+ pvscsi_reg_write(s, addr, val, 2);
+}
+
+static void pvscsi_mmio_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+ PVSCSIState *s = opaque;
+
+ addr &= PVSCSI_MEM_SPACE_SIZE - 1;
+ pvscsi_reg_write(s, addr, val, 4);
+}
+
+static uint32_t pvscsi_mmio_readb(void *opaque, target_phys_addr_t addr)
+{
+ abort();
+}
+
+static uint32_t pvscsi_mmio_readw(void *opaque, target_phys_addr_t addr)
+{
+ abort();
+}
+
+static uint32_t pvscsi_mmio_readl(void *opaque, target_phys_addr_t addr)
+{
+ PVSCSIState *s = opaque;
+
+ addr &= PVSCSI_MEM_SPACE_SIZE - 1;
+ return pvscsi_reg_readl(s, addr);
+}
+
+static CPUReadMemoryFunc * const pvscsi_mmio_readfn[3] = {
+ pvscsi_mmio_readb,
+ pvscsi_mmio_readw,
+ pvscsi_mmio_readl,
+};
+
+static CPUWriteMemoryFunc * const pvscsi_mmio_writefn[3] = {
+ pvscsi_mmio_writeb,
+ pvscsi_mmio_writew,
+ pvscsi_mmio_writel,
+};
+
+static void pvscsi_reset(DeviceState *dev)
+{
+ PVSCSIState *s = DO_UPCAST(PVSCSIState, dev.qdev, dev);
+
+ pvscsi_soft_reset(s);
+}
+
+static int pvscsi_uninit(PCIDevice *d)
+{
+ PVSCSIState *s = DO_UPCAST(PVSCSIState, dev, d);
+
+ cpu_unregister_io_memory(s->mmio_io_addr);
+
+ return 0;
+}
+
+static struct SCSIBusOps pvscsi_scsi_ops = {
+ .transfer_data = pvscsi_transfer_data,
+ .complete = pvscsi_command_complete,
+ .cancel = pvscsi_request_cancelled,
+};
+
+static int pvscsi_init(PCIDevice *dev)
+{
+ PVSCSIState *s = DO_UPCAST(PVSCSIState, dev, dev);
+ uint8_t *pci_conf;
+
+ pci_conf = s->dev.config;
+
+ pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_VMWARE);
+ pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_VMWARE_PVSCSI);
+ pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_SCSI);
+
+ /* PCI subsystem ID */
+ pci_conf[PCI_SUBSYSTEM_ID] = 0x00;
+ pci_conf[PCI_SUBSYSTEM_ID + 1] = 0x10;
+
+ /* PCI latency timer = 255 */
+ pci_conf[PCI_LATENCY_TIMER] = 0xff;
+
+ /* Interrupt pin 1 */
+ pci_conf[PCI_INTERRUPT_PIN] = 0x01;
+
+ s->mmio_io_addr = cpu_register_io_memory(pvscsi_mmio_readfn,
+ pvscsi_mmio_writefn, s,
+ DEVICE_NATIVE_ENDIAN);
+ pci_register_bar_simple(&s->dev, 0, PVSCSI_MEM_SPACE_SIZE,
+ 0, s->mmio_io_addr);
+
+#if 0
+ s->pio_io_addr = cpu_register_io_memory(pvscsi_mmio_readfn,
+ pvscsi_mmio_writefn, s,
+ DEVICE_NATIVE_ENDIAN);
+ pci_register_bar(&s->dev, 1, 256, PCI_BASE_ADDRESS_SPACE_IO,
+ pvscsi_io_mapfunc);
+#endif
+
+ s->complete_reqs_bh = qemu_bh_new(pvscsi_complete_reqs, s);
+
+ scsi_bus_new(&s->bus, &dev->qdev, 1, PVSCSI_MAX_DEVS,
+ &pvscsi_scsi_ops);
+ if (!dev->qdev.hotplugged) {
+ return scsi_bus_legacy_handle_cmdline(&s->bus);
+ }
+ return 0;
+}
+
+static PCIDeviceInfo pvscsi_info = {
+ .qdev.name = "vmw_pvscsi",
+ .qdev.size = sizeof(PVSCSIState),
+ .qdev.reset = pvscsi_reset,
+ .init = pvscsi_init,
+ .exit = pvscsi_uninit,
+ .qdev.props = (Property[]) {
+ DEFINE_PROP_BIT("sg", PVSCSIState, use_iovec, 0, true),
+ DEFINE_PROP_END_OF_LIST(),
+ },
+};
+
+static void vmw_pvscsi_register_devices(void)
+{
+ pci_qdev_register(&pvscsi_info);
+}
+
+device_init(vmw_pvscsi_register_devices);
diff --git a/hw/vmw_pvscsi.h b/hw/vmw_pvscsi.h
new file mode 100644
index 0000000..b7fa3f6
--- /dev/null
+++ b/hw/vmw_pvscsi.h
@@ -0,0 +1,389 @@
+/*
+ * VMware PVSCSI header file
+ *
+ * Copyright (C) 2008-2009, VMware, Inc. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; version 2 of the License and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
+ * NON INFRINGEMENT. See the GNU General Public License for more
+ * details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#ifndef _VMW_PVSCSI_H_
+#define _VMW_PVSCSI_H_
+
+#define PVSCSI_MAX_NUM_SG_ENTRIES_PER_SEGMENT 128
+
+#define MASK(n) ((1 << (n)) - 1) /* make an n-bit mask */
+
+#define __packed __attribute__((packed))
+
+/*
+ * host adapter status/error codes
+ */
+enum HostBusAdapterStatus {
+ BTSTAT_SUCCESS = 0x00, /* CCB complete normally with no errors */
+ BTSTAT_LINKED_COMMAND_COMPLETED = 0x0a,
+ BTSTAT_LINKED_COMMAND_COMPLETED_WITH_FLAG = 0x0b,
+ BTSTAT_DATA_UNDERRUN = 0x0c,
+ BTSTAT_SELTIMEO = 0x11, /* SCSI selection timeout */
+ BTSTAT_DATARUN = 0x12, /* data overrun/underrun */
+ BTSTAT_BUSFREE = 0x13, /* unexpected bus free */
+ BTSTAT_INVPHASE = 0x14, /* invalid bus phase or sequence requested by target */
+ BTSTAT_LUNMISMATCH = 0x17, /* linked CCB has different LUN from first CCB */
+ BTSTAT_SENSFAILED = 0x1b, /* auto request sense failed */
+ BTSTAT_TAGREJECT = 0x1c, /* SCSI II tagged queueing message rejected by target */
+ BTSTAT_BADMSG = 0x1d, /* unsupported message received by the host adapter */
+ BTSTAT_HAHARDWARE = 0x20, /* host adapter hardware failed */
+ BTSTAT_NORESPONSE = 0x21, /* target did not respond to SCSI ATN, sent a SCSI RST */
+ BTSTAT_SENTRST = 0x22, /* host adapter asserted a SCSI RST */
+ BTSTAT_RECVRST = 0x23, /* other SCSI devices asserted a SCSI RST */
+ BTSTAT_DISCONNECT = 0x24, /* target device reconnected improperly (w/o tag) */
+ BTSTAT_BUSRESET = 0x25, /* host adapter issued BUS device reset */
+ BTSTAT_ABORTQUEUE = 0x26, /* abort queue generated */
+ BTSTAT_HASOFTWARE = 0x27, /* host adapter software error */
+ BTSTAT_HATIMEOUT = 0x30, /* host adapter hardware timeout error */
+ BTSTAT_SCSIPARITY = 0x34, /* SCSI parity error detected */
+};
+
+/*
+ * Register offsets.
+ *
+ * These registers are accessible both via i/o space and mm i/o.
+ */
+
+enum PVSCSIRegOffset {
+ PVSCSI_REG_OFFSET_COMMAND = 0x0,
+ PVSCSI_REG_OFFSET_COMMAND_DATA = 0x4,
+ PVSCSI_REG_OFFSET_COMMAND_STATUS = 0x8,
+ PVSCSI_REG_OFFSET_LAST_STS_0 = 0x100,
+ PVSCSI_REG_OFFSET_LAST_STS_1 = 0x104,
+ PVSCSI_REG_OFFSET_LAST_STS_2 = 0x108,
+ PVSCSI_REG_OFFSET_LAST_STS_3 = 0x10c,
+ PVSCSI_REG_OFFSET_INTR_STATUS = 0x100c,
+ PVSCSI_REG_OFFSET_INTR_MASK = 0x2010,
+ PVSCSI_REG_OFFSET_KICK_NON_RW_IO = 0x3014,
+ PVSCSI_REG_OFFSET_DEBUG = 0x3018,
+ PVSCSI_REG_OFFSET_KICK_RW_IO = 0x4018,
+};
+
+/*
+ * Virtual h/w commands.
+ */
+
+enum PVSCSICommands {
+ PVSCSI_CMD_FIRST = 0, /* has to be first */
+
+ PVSCSI_CMD_ADAPTER_RESET = 1,
+ PVSCSI_CMD_ISSUE_SCSI = 2,
+ PVSCSI_CMD_SETUP_RINGS = 3,
+ PVSCSI_CMD_RESET_BUS = 4,
+ PVSCSI_CMD_RESET_DEVICE = 5,
+ PVSCSI_CMD_ABORT_CMD = 6,
+ PVSCSI_CMD_CONFIG = 7,
+ PVSCSI_CMD_SETUP_MSG_RING = 8,
+ PVSCSI_CMD_DEVICE_UNPLUG = 9,
+
+ PVSCSI_CMD_LAST = 10 /* has to be last */
+};
+
+/*
+ * Command descriptor for PVSCSI_CMD_RESET_DEVICE --
+ */
+
+struct PVSCSICmdDescResetDevice {
+ uint32_t target;
+ uint8_t lun[8];
+} __packed;
+
+/*
+ * Command descriptor for PVSCSI_CMD_ABORT_CMD --
+ *
+ * - currently does not support specifying the LUN.
+ * - _pad should be 0.
+ */
+
+struct PVSCSICmdDescAbortCmd {
+ uint64_t context;
+ uint32_t target;
+ uint32_t _pad;
+} __packed;
+
+/*
+ * Command descriptor for PVSCSI_CMD_SETUP_RINGS --
+ *
+ * Notes:
+ * - reqRingNumPages and cmpRingNumPages need to be powers of two,
+ * - reqRingNumPages and cmpRingNumPages need to be different from 0,
+ * - reqRingNumPages and cmpRingNumPages must not exceed
+ * PVSCSI_SETUP_RINGS_MAX_NUM_PAGES.
+ */
+
+#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32
+struct PVSCSICmdDescSetupRings {
+ uint32_t reqRingNumPages;
+ uint32_t cmpRingNumPages;
+ uint64_t ringsStatePPN;
+ uint64_t reqRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+ uint64_t cmpRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES];
+} __packed;
+
+/*
+ * Command descriptor for PVSCSI_CMD_SETUP_MSG_RING --
+ *
+ * Notes:
+ * - this command was not supported in the initial revision of the h/w
+ * interface. Before using it, you need to check that it is supported by
+ * writing PVSCSI_CMD_SETUP_MSG_RING to the 'command' register, then
+ * immediately after read the 'command status' register:
+ * * a value of -1 means that the cmd is NOT supported,
+ * * a value != -1 means that the cmd IS supported.
+ * If it's supported the 'command status' register should return:
+ * sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(uint32_t).
+ * - this command should be issued _after_ the usual SETUP_RINGS so that the
+ * RingsState page is already set up. If not, the command is a nop.
+ * - numPages needs to be a power of two,
+ * - numPages needs to be different from 0,
+ * - _pad should be zero.
+ */
+
+#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16
+
+struct PVSCSICmdDescSetupMsgRing {
+ uint32_t numPages;
+ uint32_t _pad;
+ uint64_t ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES];
+} __packed;
+
+enum PVSCSIMsgType {
+ PVSCSI_MSG_DEV_ADDED = 0,
+ PVSCSI_MSG_DEV_REMOVED = 1,
+ PVSCSI_MSG_LAST = 2,
+};
+
+/*
+ * Msg descriptor.
+ *
+ * sizeof(struct PVSCSIRingMsgDesc) == 128.
+ *
+ * - type is of type enum PVSCSIMsgType.
+ * - the content of args depends on the type of event being delivered.
+ */
+
+struct PVSCSIRingMsgDesc {
+ uint32_t type;
+ uint32_t args[31];
+} __packed;
+
+struct PVSCSIMsgDescDevStatusChanged {
+ uint32_t type; /* PVSCSI_MSG_DEV _ADDED / _REMOVED */
+ uint32_t bus;
+ uint32_t target;
+ uint8_t lun[8];
+ uint32_t pad[27];
+} __packed;
+
+/*
+ * Rings state.
+ *
+ * - the fields:
+ * . msgProdIdx,
+ * . msgConsIdx,
+ * . msgNumEntriesLog2,
+ * .. are only used once the SETUP_MSG_RING cmd has been issued.
+ * - '_pad' helps to ensure that the msg related fields are on their own
+ * cache-line.
+ */
+
+struct PVSCSIRingsState {
+ uint32_t reqProdIdx;
+ uint32_t reqConsIdx;
+ uint32_t reqNumEntriesLog2;
+
+ uint32_t cmpProdIdx;
+ uint32_t cmpConsIdx;
+ uint32_t cmpNumEntriesLog2;
+
+ uint8_t _pad[104];
+
+ uint32_t msgProdIdx;
+ uint32_t msgConsIdx;
+ uint32_t msgNumEntriesLog2;
+} __packed;
+
+/*
+ * Request descriptor.
+ *
+ * sizeof(RingReqDesc) = 128
+ *
+ * - context: is a unique identifier of a command. It could normally be any
+ * 64bit value, however we currently store it in the serialNumber variable
+ * of struct SCSI_Command, so we have the following restrictions due to the
+ * way this field is handled in the vmkernel storage stack:
+ * * this value can't be 0,
+ * * the upper 32 bits need to be 0 since serialNumber is a uint32_t.
+ * Currently tracked as PR 292060.
+ * - dataLen: contains the total number of bytes that need to be transferred.
+ * - dataAddr:
+ * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is set: dataAddr is the PA of the first
+ * s/g table segment, each s/g segment is entirely contained on a single
+ * page of physical memory,
+ * * if PVSCSI_FLAG_CMD_WITH_SG_LIST is NOT set, then dataAddr is the PA of
+ * the buffer used for the DMA transfer,
+ * - flags:
+ * * PVSCSI_FLAG_CMD_WITH_SG_LIST: see dataAddr above,
+ * * PVSCSI_FLAG_CMD_DIR_NONE: no DMA involved,
+ * * PVSCSI_FLAG_CMD_DIR_TOHOST: transfer from device to main memory,
+ * * PVSCSI_FLAG_CMD_DIR_TODEVICE: transfer from main memory to device,
+ * * PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB: reserved to handle CDBs larger than
+ * 16bytes. To be specified.
+ * - vcpuHint: vcpuId of the processor that will be most likely waiting for the
+ * completion of the i/o. For guest OSes that use lowest priority message
+ * delivery mode (such as windows), we use this "hint" to deliver the
+ * completion action to the proper vcpu. For now, we can use the vcpuId of
+ * the processor that initiated the i/o as a likely candidate for the vcpu
+ * that will be waiting for the completion.
+ * - bus should be 0: we only support bus 0 for now.
+ * - unused should be zero'd.
+ */
+
+#define PVSCSI_FLAG_CMD_WITH_SG_LIST (1 << 0)
+#define PVSCSI_FLAG_CMD_OUT_OF_BAND_CDB (1 << 1)
+#define PVSCSI_FLAG_CMD_DIR_NONE (1 << 2)
+#define PVSCSI_FLAG_CMD_DIR_TOHOST (1 << 3)
+#define PVSCSI_FLAG_CMD_DIR_TODEVICE (1 << 4)
+
+struct PVSCSIRingReqDesc {
+ uint64_t context;
+ uint64_t dataAddr;
+ uint64_t dataLen;
+ uint64_t senseAddr;
+ uint32_t senseLen;
+ uint32_t flags;
+ uint8_t cdb[16];
+ uint8_t cdbLen;
+ uint8_t lun[8];
+ uint8_t tag;
+ uint8_t bus;
+ uint8_t target;
+ uint8_t vcpuHint;
+ uint8_t unused[59];
+} __packed;
+
+/*
+ * Scatter-gather list management.
+ *
+ * As described above, when PVSCSI_FLAG_CMD_WITH_SG_LIST is set in the
+ * RingReqDesc.flags, then RingReqDesc.dataAddr is the PA of the first s/g
+ * table segment.
+ *
+ * - each segment of the s/g table contains a succession of struct
+ * PVSCSISGElement.
+ * - each segment is entirely contained on a single physical page of memory.
+ * - a "chain" s/g element has the flag PVSCSI_SGE_FLAG_CHAIN_ELEMENT set in
+ * PVSCSISGElement.flags and in this case:
+ * * addr is the PA of the next s/g segment,
+ * * length is undefined, assumed to be 0.
+ */
+
+struct PVSCSISGElement {
+ uint64_t addr;
+ uint32_t length;
+ uint32_t flags;
+} __packed;
+
+/*
+ * Completion descriptor.
+ *
+ * sizeof(RingCmpDesc) = 32
+ *
+ * - context: identifier of the command. The same thing that was specified
+ * under "context" as part of struct RingReqDesc at initiation time,
+ * - dataLen: number of bytes transferred for the actual i/o operation,
+ * - senseLen: number of bytes written into the sense buffer,
+ * - hostStatus: adapter status,
+ * - scsiStatus: device status,
+ * - _pad should be zero.
+ */
+
+struct PVSCSIRingCmpDesc {
+ uint64_t context;
+ uint64_t dataLen;
+ uint32_t senseLen;
+ uint16_t hostStatus;
+ uint16_t scsiStatus;
+ uint32_t _pad[2];
+} __packed;
+
+/*
+ * Interrupt status / IRQ bits.
+ */
+
+#define PVSCSI_INTR_CMPL_0 (1 << 0)
+#define PVSCSI_INTR_CMPL_1 (1 << 1)
+#define PVSCSI_INTR_CMPL_MASK MASK(2)
+
+#define PVSCSI_INTR_MSG_0 (1 << 2)
+#define PVSCSI_INTR_MSG_1 (1 << 3)
+#define PVSCSI_INTR_MSG_MASK (MASK(2) << 2)
+
+#define PVSCSI_INTR_ALL_SUPPORTED MASK(4)
+
+/*
+ * Number of MSI-X vectors supported.
+ */
+#define PVSCSI_MAX_INTRS 24
+
+/*
+ * Enumeration of supported MSI-X vectors
+ */
+#define PVSCSI_VECTOR_COMPLETION 0
+
+/*
+ * Misc constants for the rings.
+ */
+
+#define PVSCSI_MAX_NUM_PAGES_REQ_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
+#define PVSCSI_MAX_NUM_PAGES_CMP_RING PVSCSI_SETUP_RINGS_MAX_NUM_PAGES
+#define PVSCSI_MAX_NUM_PAGES_MSG_RING PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES
+
+#define PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE \
+ (PAGE_SIZE / sizeof(struct PVSCSIRingReqDesc))
+
+#define PVSCSI_MAX_REQ_QUEUE_DEPTH \
+ (PVSCSI_MAX_NUM_PAGES_REQ_RING * PVSCSI_MAX_NUM_REQ_ENTRIES_PER_PAGE)
+
+#define PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES 1
+#define PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES 1
+#define PVSCSI_MEM_SPACE_MISC_NUM_PAGES 2
+#define PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES 2
+#define PVSCSI_MEM_SPACE_MSIX_NUM_PAGES 2
+
+enum PVSCSIMemSpace {
+ PVSCSI_MEM_SPACE_COMMAND_PAGE = 0,
+ PVSCSI_MEM_SPACE_INTR_STATUS_PAGE = 1,
+ PVSCSI_MEM_SPACE_MISC_PAGE = 2,
+ PVSCSI_MEM_SPACE_KICK_IO_PAGE = 4,
+ PVSCSI_MEM_SPACE_MSIX_TABLE_PAGE = 6,
+ PVSCSI_MEM_SPACE_MSIX_PBA_PAGE = 7,
+};
+
+#define PVSCSI_MEM_SPACE_NUM_PAGES \
+ (PVSCSI_MEM_SPACE_COMMAND_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_INTR_STATUS_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_MISC_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_KICK_IO_NUM_PAGES + \
+ PVSCSI_MEM_SPACE_MSIX_NUM_PAGES)
+
+#define PVSCSI_MEM_SPACE_SIZE (PVSCSI_MEM_SPACE_NUM_PAGES * PAGE_SIZE)
+
+#endif /* _VMW_PVSCSI_H_ */
diff --git a/trace-events b/trace-events
index c565748..42231c1 100644
--- a/trace-events
+++ b/trace-events
@@ -254,6 +254,21 @@ disable scsi_inquiry(int target, int lun, int tag, int cdb1, int cdb2) "target %
disable scsi_test_unit_ready(int target, int lun, int tag) "target %d lun %d tag %d"
disable scsi_request_sense(int target, int lun, int tag) "target %d lun %d tag %d"
+# hw/vmw_pvscsi.c
+disable pvscsi_queue_request(uint64_t context, uint8_t command, uint64_t dataLen) "context %"PRIu64" command %d length %"PRIu64""
+disable pvscsi_sg_elem(uint64_t context, uint64_t addr, uint64_t length) "context %"PRIu64" addr %"PRIu64" length %"PRIu64""
+disable pvscsi_transfer_data(uint64_t context, uint64_t length) "context %"PRIu64" length %"PRIu64""
+disable pvscsi_request_sense(uint64_t context, int lun) "context %"PRIu64" lun %d"
+disable pvscsi_kick_io(void) "kick request ring"
+disable pvscsi_complete_req(uint64_t context, uint64_t length, uint8_t sense) "context %"PRIu64" length %"PRIu64" sense %d"
+disable pvscsi_cmp_ring_put(uint64_t context) "context %"PRIu64""
+disable pvscsi_raise_intr(uint32_t intr, const char *state) "raised intr %d %s"
+disable pvscsi_acknowledge_intr(uint32_t intr) "acknowledged intr %d"
+disable pvscsi_setup_req_ring(uint32_t pages, uint32_t entries) "req ring - %d pages %d entries"
+disable pvscsi_setup_cmp_ring(uint32_t pages, uint32_t entries) "cmp ring - %d pages %d entries"
+disable pvscsi_setup_msg_ring(uint32_t pages, uint32_t entries) "msg ring - %d pages %d entries"
+disable pvscsi_cmd(int cmd) "command %d"
+
# vl.c
disable vm_state_notify(int running, int reason) "running %d reason %d"
--
1.7.6
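The chained scatter/gather layout described in the header above (when PVSCSI_FLAG_CMD_WITH_SG_LIST is set, dataAddr points at a segment of PVSCSISGElement entries, and a chain element's addr points at the next segment) can be sketched with a minimal walker. This is an illustrative sketch only, not the device model's actual code: fetch_sg_segment and demo_total are invented for the example, guest physical addresses are modeled as host pointers, and the value of PVSCSI_SGE_FLAG_CHAIN_ELEMENT (not defined in the excerpt) is assumed to be bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag value; the real definition is not in the header excerpt. */
#define PVSCSI_SGE_FLAG_CHAIN_ELEMENT (1 << 0)

struct PVSCSISGElement {
    uint64_t addr;
    uint32_t length;
    uint32_t flags;
};

/* Stand-in for reading a segment from guest memory: here segments live
 * in host memory, so "addr" is simply a host pointer cast to uint64_t. */
static struct PVSCSISGElement *fetch_sg_segment(uint64_t addr)
{
    return (struct PVSCSISGElement *)(uintptr_t)addr;
}

/* Walk an s/g table, following chain elements, and sum the data bytes.
 * max_elems is a safety bound on the number of elements visited. */
static uint64_t pvscsi_sg_total_len(uint64_t first_seg_addr, unsigned max_elems)
{
    struct PVSCSISGElement *elem = fetch_sg_segment(first_seg_addr);
    uint64_t total = 0;

    while (max_elems-- > 0) {
        if (elem->flags & PVSCSI_SGE_FLAG_CHAIN_ELEMENT) {
            /* addr is the next segment; length is undefined for chains */
            elem = fetch_sg_segment(elem->addr);
            continue;
        }
        total += elem->length;
        elem++;
    }
    return total;
}

/* Build a tiny two-segment list in host memory and measure it:
 * two data elements, a chain element, then one more data element. */
static uint64_t demo_total(void)
{
    static struct PVSCSISGElement seg1[3], seg2[1];

    seg1[0] = (struct PVSCSISGElement){ 0x1000, 512, 0 };
    seg1[1] = (struct PVSCSISGElement){ 0x2000, 1024, 0 };
    seg2[0] = (struct PVSCSISGElement){ 0x3000, 256, 0 };
    seg1[2] = (struct PVSCSISGElement){ (uint64_t)(uintptr_t)seg2, 0,
                                        PVSCSI_SGE_FLAG_CHAIN_ELEMENT };

    return pvscsi_sg_total_len((uint64_t)(uintptr_t)seg1, 4);
}
```

The walker visits seg1's two data elements, jumps through the chain element to seg2, and picks up the final element, for 512 + 1024 + 256 bytes.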
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
` (10 preceding siblings ...)
2011-08-04 17:14 ` [Qemu-devel] [PATCH 11/11] sample pvscsi driver with s/g support Paolo Bonzini
@ 2011-08-11 7:57 ` Stefan Hajnoczi
11 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2011-08-11 7:57 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Thu, Aug 04, 2011 at 07:14:38PM +0200, Paolo Bonzini wrote:
> Hi,
>
> this is the version of SCSI scatter/gather based on the existing
> DMA helpers infrastructure.
>
> The infrastructure required a little update because I need to
> know the residual amount of data upon short transfers. To this
> end, my choice was to make QEMUSGList mutable and track the
> current position in there. Any other ideas are welcome, the
> reason for this choice is explained in patch 2.
>
> The patches are quite self-contained, but they depend on the
> changes I posted yesterday.
>
> Patch 11 is the sample vmw_pvscsi device model that I used to
> test the code.
This is a good opportunity to rename is_write in dma-helpers because it
is confusing.
The problem is that bdrv_*() is_write indicates whether the I/O request
is a read or write. But in cpu_physical_memory_map() is_write indicates
whether we are writing to target memory.
These two is_write use cases actually have opposite meanings, therefore
the confusing code in dma-helpers.c today:
mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write);
^^^^^^^^^^^^^^
Please use a DMA direction instead of is_write:
DMA-to-device means target->device memory transfer
DMA-from-device means device->target memory transfer
This patch series is a good place to do the rename because it adds more
instances of !dbs->is_write.
Stefan
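The opposite senses of is_write that Stefan describes can be made explicit with a direction enum. The sketch below is purely illustrative; the enum name, constants, and helper functions are assumptions for the example, not existing qemu API:

```c
#include <assert.h>

/* Hypothetical DMA direction type replacing the ambiguous is_write flag. */
typedef enum {
    DMA_DIRECTION_TO_DEVICE,   /* target memory -> device (e.g. a disk write) */
    DMA_DIRECTION_FROM_DEVICE, /* device -> target memory (e.g. a disk read) */
} DMADirection;

/* cpu_physical_memory_map()'s is_write asks whether target memory will
 * be written, which is true exactly for device-to-memory transfers. */
static int map_is_write(DMADirection dir)
{
    return dir == DMA_DIRECTION_FROM_DEVICE;
}

/* bdrv_*()'s is_write asks whether the block device is written,
 * which is true for the opposite direction. */
static int bdrv_is_write(DMADirection dir)
{
    return dir == DMA_DIRECTION_TO_DEVICE;
}
```

With both conversions spelled out, call sites like `cpu_physical_memory_map(cur_addr, &cur_len, !dbs->is_write)` lose their negation and read as `map_is_write(dbs->dir)`.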
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-04 17:14 ` [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write Paolo Bonzini
@ 2011-08-11 7:58 ` Stefan Hajnoczi
2011-08-11 12:10 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2011-08-11 7:58 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
On Thu, Aug 04, 2011 at 07:14:43PM +0200, Paolo Bonzini wrote:
> These helpers do a full transfer from an in-memory buffer to
> target memory, with full support for MMIO areas. It will be used to store
> the reply of an emulated command into a QEMUSGList provided by the
> adapter.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> cutils.c | 8 +++---
> dma-helpers.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> dma.h | 5 ++++
> 3 files changed, 72 insertions(+), 4 deletions(-)
I don't understand this patch. If we have a memory buffer that needs to
be transferred to target memory, then there is no need for bounce
buffers or cpu_physical_memory_map().
Can we use cpu_physical_memory_rw() on each sglist element instead? No
-EAGAIN necessary because the memory buffer already acts as the local
bounce buffer.
Stefan
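The element-by-element copy Stefan suggests can be sketched against stand-in types. Everything below is illustrative: ScatterGatherEntry is a simplified stand-in for the type in dma.h, cpu_physical_memory_rw is modeled as memcpy into a flat array standing in for guest RAM, and dma_buf_rw_sketch / demo_write_to_guest are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified s/g entry: guest physical base address plus length. */
typedef struct {
    uint64_t base;
    uint64_t len;
} ScatterGatherEntry;

static uint8_t guest_ram[4096]; /* flat stand-in for target memory */

/* Stand-in for cpu_physical_memory_rw(): is_write means the *target*
 * memory is written, i.e. device-to-memory transfer. */
static void cpu_physical_memory_rw_stub(uint64_t addr, uint8_t *buf,
                                        size_t len, int is_write)
{
    if (is_write) {
        memcpy(guest_ram + addr, buf, len);
    } else {
        memcpy(buf, guest_ram + addr, len);
    }
}

/* Copy between a linear buffer and an sglist, one element at a time,
 * as the review suggests: no mapping, no bounce buffer, no -EAGAIN.
 * Returns the bytes transferred (short if buf runs out first). */
static size_t dma_buf_rw_sketch(uint8_t *buf, size_t buflen,
                                ScatterGatherEntry *sg, int nsg, int to_guest)
{
    size_t done = 0;
    int i;

    for (i = 0; i < nsg && done < buflen; i++) {
        size_t n = sg[i].len;
        if (n > buflen - done) {
            n = buflen - done;
        }
        cpu_physical_memory_rw_stub(sg[i].base, buf + done, n, to_guest);
        done += n;
    }
    return done;
}

/* Scatter 7 bytes across two guest regions (3 bytes at 0, 4 at 10). */
static size_t demo_write_to_guest(void)
{
    ScatterGatherEntry sg[2] = { { 0, 3 }, { 10, 4 } };
    uint8_t buf[7] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G' };

    return dma_buf_rw_sketch(buf, sizeof(buf), sg, 2, 1);
}
```

The in-memory buffer plays the role of the bounce buffer, so the function can complete synchronously.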
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 7:58 ` Stefan Hajnoczi
@ 2011-08-11 12:10 ` Paolo Bonzini
2011-08-11 13:29 ` Stefan Hajnoczi
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-11 12:10 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel
On 08/11/2011 09:58 AM, Stefan Hajnoczi wrote:
> On Thu, Aug 04, 2011 at 07:14:43PM +0200, Paolo Bonzini wrote:
>> These helpers do a full transfer from an in-memory buffer to
>> target memory, with full support for MMIO areas. It will be used to store
>> the reply of an emulated command into a QEMUSGList provided by the
>> adapter.
>>
>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>> ---
>> cutils.c | 8 +++---
>> dma-helpers.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> dma.h | 5 ++++
>> 3 files changed, 72 insertions(+), 4 deletions(-)
>
> I don't understand this patch. If we have a memory buffer that needs to
> be transferred to target memory, then there is no need for bounce
> buffers or cpu_physical_memory_map().
>
> Can we use cpu_physical_memory_rw() on each sglist element instead? No
> -EAGAIN necessary because the memory buffer already acts as the local
> bounce buffer.
Doh, you're obviously right. I don't know what I was thinking. :)
What do you think about passing the residual bytes for short transfers?
Should I look into updating BlockDriverCompletionFunc, or is the
approach of patch 2 okay? If I have an excuse to learn more about
Coccinelle, that can be fun. :)
Paolo
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 12:10 ` Paolo Bonzini
@ 2011-08-11 13:29 ` Stefan Hajnoczi
2011-08-11 14:24 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2011-08-11 13:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel
On Thu, Aug 11, 2011 at 1:10 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/11/2011 09:58 AM, Stefan Hajnoczi wrote:
>>
>> On Thu, Aug 04, 2011 at 07:14:43PM +0200, Paolo Bonzini wrote:
>>>
>>> These helpers do a full transfer from an in-memory buffer to
>>> target memory, with full support for MMIO areas. It will be used to
>>> store
>>> the reply of an emulated command into a QEMUSGList provided by the
>>> adapter.
>>>
>>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>>> ---
>>> cutils.c | 8 +++---
>>> dma-helpers.c | 63
>>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> dma.h | 5 ++++
>>> 3 files changed, 72 insertions(+), 4 deletions(-)
>>
>> I don't understand this patch. If we have a memory buffer that needs to
>> be transferred to target memory, then there is no need for bounce
>> buffers or cpu_physical_memory_map().
>>
>> Can we use cpu_physical_memory_rw() on each sglist element instead? No
>> -EAGAIN necessary because the memory buffer already acts as the local
>> bounce buffer.
>
> Doh, you're obviously right. I don't know what I was thinking. :)
>
> What do you think about passing the residual bytes for short transfers?
> Should I look into updating BlockDriverCompletionFunc, or is the approach
> of patch 2 okay? If I have an excuse to learn more about Coccinelle, that
> can be fun. :)
The bdrv_aio_readv() and bdrv_aio_writev() functions don't have the
concept of residual bytes. They only work on fully completed I/O
operations. If there is an error they pass -errno. Therefore I don't
think BlockDriverCompletionFunc is the right type to add residual
bytes to.
It seems that residual bytes are a SCSI layer concept that the block
layer does not deal with.
Stefan
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 13:29 ` Stefan Hajnoczi
@ 2011-08-11 14:24 ` Paolo Bonzini
2011-08-11 14:37 ` Kevin Wolf
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-11 14:24 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel
On 08/11/2011 03:29 PM, Stefan Hajnoczi wrote:
>> >
>> > What do you think about passing the residual bytes for short transfers?
>> > Should I look into updating BlockDriverCompletionFunc, or is the approach
>> > of patch 2 okay? If I have an excuse to learn more about Coccinelle, that
>> > can be fun.:)
> The bdrv_aio_readv() and bdrv_aio_writev() functions don't have the
> concept of residual bytes. They only work on fully completed I/O
> operations. If there is an error they pass -errno.
But if a transfer was split due to failure of cpu_physical_memory_map,
and only the second part fails, you can have a short transfer and you
need to pass residual bytes back. The only way out of this is to make a
bounce buffer as big as all the unmappable parts of the S/G list, which
is undesirable of course. So the residual bytes are a general DMA
concept, not specific to SCSI.
> Therefore I don't think BlockDriverCompletionFunc is the right type
> to add residual bytes to.
Right, I would rather update BlockDriverCompletionFunc to pass the AIOCB
as a third parameter, and store the residual bytes in the DMAAIOCB (with
a getter that the completion function can use).
Paolo
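The getter idea can be sketched with a pared-down control block. Names and fields below are assumptions for illustration, not qemu's actual DMAAIOCB:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a DMA control block: total bytes described by
 * the QEMUSGList, and how many were successfully transferred. */
typedef struct {
    uint64_t sg_total_len;
    uint64_t transferred;
} DMAAIOCBSketch;

/* The getter a completion callback would call to learn the residual. */
static uint64_t dma_get_residual(const DMAAIOCBSketch *dbs)
{
    return dbs->sg_total_len - dbs->transferred;
}

/* e.g. a short transfer: 4096 bytes requested, only 3072 completed
 * because the second half of a split mapping failed. */
static uint64_t demo_residual(void)
{
    DMAAIOCBSketch dbs = { 4096, 3072 };
    return dma_get_residual(&dbs);
}
```

The adapter would then report the residual in its completion descriptor (dataLen in PVSCSIRingCmpDesc, for the pvscsi model).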
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 14:24 ` Paolo Bonzini
@ 2011-08-11 14:37 ` Kevin Wolf
2011-08-11 15:05 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2011-08-11 14:37 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel
Am 11.08.2011 16:24, schrieb Paolo Bonzini:
> On 08/11/2011 03:29 PM, Stefan Hajnoczi wrote:
>>>>
>>>> What do you think about passing the residual bytes for short transfers?
>>>> Should I look into updating BlockDriverCompletionFunc, or is the approach
>>>> of patch 2 okay? If I have an excuse to learn more about Coccinelle, that
>>>> can be fun.:)
>> The bdrv_aio_readv() and bdrv_aio_writev() functions don't have the
>> concept of residual bytes. They only work on fully completed I/O
>> operations. If there is an error they pass -errno.
>
> But if a transfer was split due to failure of cpu_physical_memory_map,
> and only the second part fails, you can have a short transfer and you
> need to pass residual bytes back. The only way out of this is to make a
> bounce buffer as big as all the unmappable parts of the S/G list, which
> is undesirable of course. So the residual bytes are a general DMA
> concept, not specific to SCSI.
>
>> Therefore I don't think BlockDriverCompletionFunc is the right type
>> to add residual bytes to.
>
> Right, I would rather update BlockDriverCompletionFunc to pass the AIOCB
> as a third parameter, and store the residual bytes in the DMAAIOCB (with
> a getter that the completion function can use).
Isn't the DMAAIOCB already passed as opaque to the callback?
Kevin
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 14:37 ` Kevin Wolf
@ 2011-08-11 15:05 ` Paolo Bonzini
2011-08-11 15:12 ` Kevin Wolf
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-11 15:05 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel
On 08/11/2011 04:37 PM, Kevin Wolf wrote:
> > Right, I would rather update BlockDriverCompletionFunc to pass the AIOCB
> > as a third parameter, and store the residual bytes in the DMAAIOCB (with
> > a getter that the completion function can use).
>
> Isn't the DMAAIOCB already passed as opaque to the callback?
It is passed to the dma_bdrv_cb, but not to the caller-provided
callback. If the operation completes before dma_bdrv_{read,write}
returns, the AIOCB is not stored anywhere and the asynchronous callback
does not have access to it. Usually it does not have anything to do
with it, but in this case it could get the residual.
Another possibility is always completing DMA in a bottom half. This
ensures that the callback can access the AIOCB, but it exposes an
implementation detail to the caller, so I don't like it.
Paolo
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 15:05 ` Paolo Bonzini
@ 2011-08-11 15:12 ` Kevin Wolf
2011-08-11 15:27 ` Paolo Bonzini
0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2011-08-11 15:12 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Stefan Hajnoczi, qemu-devel
Am 11.08.2011 17:05, schrieb Paolo Bonzini:
> On 08/11/2011 04:37 PM, Kevin Wolf wrote:
>>> Right, I would rather update BlockDriverCompletionFunc to pass the AIOCB
>>> as a third parameter, and store the residual bytes in the DMAAIOCB (with
>>> a getter that the completion function can use).
>>
>> Isn't the DMAAIOCB already passed as opaque to the callback?
>
> It is passed to the dma_bdrv_cb, but not to the caller-provided
> callback. If the operation completes before dma_bdrv_{read,write}
> returns, the AIOCB is not stored anywhere and the asynchronous callback
> does not have access to it. Usually it does not have anything to do
> with it, but in this case it could get the residual.
>
> Another possibility is always completing DMA in a bottom half. This
> ensures that the callback can access the AIOCB, but it exposes an
> implementation detail to the caller, so I don't like it.
At least in the block layer, AIO callbacks may never be called before
the submission function has returned. I think this makes the DMA helpers
provide the same behaviour.
But I'm not sure if the definition of the AIOCB struct isn't private to
the block layer.
Kevin
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 15:12 ` Kevin Wolf
@ 2011-08-11 15:27 ` Paolo Bonzini
2011-08-11 20:06 ` Stefan Hajnoczi
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2011-08-11 15:27 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel
On 08/11/2011 05:12 PM, Kevin Wolf wrote:
>> > Another possibility is always completing DMA in a bottom half. This
>> > ensures that the callback can access the AIOCB, but it exposes an
>> > implementation detail to the caller, so I don't like it.
>
> At least in the block layer, AIO callbacks may never be called before
> the submission function has returned. I think this means the DMA helpers
> should provide the same behaviour.
>
> But I'm not sure whether the definition of the AIOCB struct is private to
> the block layer.
Yes, it is; I would add a getter that is specific to the DMAAIOCB.
Paolo
* Re: [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write
2011-08-11 15:27 ` Paolo Bonzini
@ 2011-08-11 20:06 ` Stefan Hajnoczi
0 siblings, 0 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2011-08-11 20:06 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel
On Thu, Aug 11, 2011 at 4:27 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/11/2011 05:12 PM, Kevin Wolf wrote:
>>>
>>> > Another possibility is always completing DMA in a bottom half. This
>>> > ensures that the callback can access the AIOCB, but it exposes an
>>> > implementation detail to the caller, so I don't like it.
>>
>> At least in the block layer, AIO callbacks may never be called before
>> the submission function has returned. I think this means the DMA helpers
>> should provide the same behaviour.
>>
>> But I'm not sure whether the definition of the AIOCB struct is private to
>> the block layer.
>
> Yes, it is; I would add a getter that is specific to the DMAAIOCB.
You don't need to make the dma_bdrv_io() cb function a
BlockDriverCompletionFunc. Instead define a DMACompletionFunc:
typedef void DMACompletionFunc(void *opaque, int ret,
                               target_phys_addr_t residual);
The only caller of dbs->common.cb is dma-helpers.c, so you can simply
avoid using common.cb and use a DMACompletionFunc cb instead.
Perhaps the AIOCB concept can be made generic where BlockDriverAIOCB
inherits from it. I'm not even so sure that keeping a
BlockDriverState pointer around is that useful. At least
dma-helpers.c seems to think it's too much typing and it just
duplicates the bs field into DMAAIOCB directly :). The question then
becomes how to abstract the typed callback function nicely.
Stefan
end of thread, other threads:[~2011-08-11 20:07 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-04 17:14 [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 01/10] dma-helpers: allow including from target-independent code Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 02/10] dma-helpers: track position in the QEMUSGList Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 03/10] dma-helpers: rewrite completion/cancellation Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 04/10] dma-helpers: prepare for adding dma_buf_* functions Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 05/10] dma-helpers: add dma_buf_read and dma_buf_write Paolo Bonzini
2011-08-11 7:58 ` Stefan Hajnoczi
2011-08-11 12:10 ` Paolo Bonzini
2011-08-11 13:29 ` Stefan Hajnoczi
2011-08-11 14:24 ` Paolo Bonzini
2011-08-11 14:37 ` Kevin Wolf
2011-08-11 15:05 ` Paolo Bonzini
2011-08-11 15:12 ` Kevin Wolf
2011-08-11 15:27 ` Paolo Bonzini
2011-08-11 20:06 ` Stefan Hajnoczi
2011-08-04 17:14 ` [Qemu-devel] [PATCH 06/10] scsi: pass residual amount to command_complete Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 07/10] scsi: add scatter/gather functionality Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 08/10] scsi-disk: commonize iovec creation between reads and writes Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 09/10] scsi-disk: lazily allocate bounce buffer Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 10/10] scsi-disk: enable scatter/gather functionality Paolo Bonzini
2011-08-04 17:14 ` [Qemu-devel] [PATCH 11/11] sample pvscsi driver with s/g support Paolo Bonzini
2011-08-11 7:57 ` [Qemu-devel] [PATCH 00/10] SCSI scatter/gather support Stefan Hajnoczi