* [Qemu-devel] [PATCH v2 00/15] nbd improvements
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Here is v2 of the nbd improvements series. It is based on Kevin's
block branch, currently at 2168851.
Compared to v1, I reordered the patches into more logical blocks.
Patches 1-4 touch sheepdog, patches 5-8 (including Stefan's bdrv_flush
patch, rebased) touch the generic block layer, patches 9-15 touch NBD.
I also implemented multiple in-flight requests for the NBD client. The
server is still serial, so there's not much benefit yet, but I checked
that all code paths are hit. The changes are easy and actually clean up
the code compared to v1. However, this requires making discard
asynchronous, because the client has to cope with getting other replies
before the one for discard. This is patch 8.
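For context, "coping with other replies" means the client must match
each reply to its request by handle instead of assuming FIFO order. A
minimal sketch of that dispatch (hypothetical names, not the code in
this series):

    /* Sketch: dispatch NBD replies by handle, not by arrival order.
     * The 'pending' table and its layout are hypothetical. */
    #include <stdbool.h>
    #include <stdint.h>

    struct pending {
        uint64_t handle;   /* id carried by both request and reply */
        int      error;
        bool     done;
    };

    static void dispatch_reply(struct pending *reqs, int n,
                               uint64_t handle, int error)
    {
        int i;
        for (i = 0; i < n; i++) {
            if (reqs[i].handle == handle && !reqs[i].done) {
                reqs[i].error = error;
                reqs[i].done  = true;  /* wake the waiting coroutine */
                return;
            }
        }
        /* unknown handle: protocol error, drop the connection */
    }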
I didn't include the two patches already in the block branch.
v1->v2:
moved coroutine send/recv functions out of qemu-coroutine.c
block: added support for co_discard and aio_discard
nbd: added support for multiple in-flight requests
Paolo Bonzini (14):
sheepdog: add coroutine_fn markers
add socket_set_block
sheepdog: move coroutine send/recv function to generic code
coroutine-io: handle zero returns from recv
block: group together the plugging of synchronous IO emulation
block: add bdrv_co_flush support
block: add bdrv_co_discard and bdrv_aio_discard support
nbd: fix error handling in the server
nbd: add support for NBD_CMD_FLUSH
nbd: add support for NBD_CMD_FLAG_FUA
nbd: add support for NBD_CMD_TRIM
nbd: switch to asynchronous operation
nbd: split requests
nbd: allow multiple in-flight requests
Stefan Hajnoczi (1):
block: emulate .bdrv_flush() using .bdrv_aio_flush()
block.c | 228 +++++++++++++++++++++++++++++++++++++----
block.h | 3 +
block/blkdebug.c | 6 -
block/blkverify.c | 9 --
block/nbd.c | 292 ++++++++++++++++++++++++++++++++++++++++++++++-------
block/qcow.c | 6 -
block/qcow2.c | 19 ----
block/qed.c | 6 -
block/raw-posix.c | 18 ----
block/sheepdog.c | 239 ++++++--------------------------------------
block_int.h | 10 ++-
cutils.c | 183 +++++++++++++++++++++++++++++++++
nbd.c | 66 ++++++++++--
oslib-posix.c | 7 ++
oslib-win32.c | 6 +
qemu-common.h | 30 ++++++
qemu_socket.h | 1 +
trace-events | 1 +
18 files changed, 786 insertions(+), 344 deletions(-)
--
1.7.6
* [Qemu-devel] [PATCH v2 01/15] sheepdog: add coroutine_fn markers
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel; +Cc: MORITA Kazutaka
This makes the following patch easier to review.
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/sheepdog.c | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/sheepdog.c b/block/sheepdog.c
index c1f6e07..af696a5 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -396,7 +396,7 @@ static inline int free_aio_req(BDRVSheepdogState *s, AIOReq *aio_req)
return !QLIST_EMPTY(&acb->aioreq_head);
}
-static void sd_finish_aiocb(SheepdogAIOCB *acb)
+static void coroutine_fn sd_finish_aiocb(SheepdogAIOCB *acb)
{
if (!acb->canceled) {
qemu_coroutine_enter(acb->coroutine, NULL);
@@ -735,7 +735,7 @@ out:
return ret;
}
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, int create,
enum AIOCBState aiocb_type);
@@ -743,7 +743,7 @@ static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
* This function searchs pending requests to the object `oid', and
* sends them.
*/
-static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
+static void coroutine_fn send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
{
AIOReq *aio_req, *next;
SheepdogAIOCB *acb;
@@ -777,7 +777,7 @@ static void send_pending_req(BDRVSheepdogState *s, uint64_t oid, uint32_t id)
* This function is registered as a fd handler, and called from the
* main loop when s->fd is ready for reading responses.
*/
-static void aio_read_response(void *opaque)
+static void coroutine_fn aio_read_response(void *opaque)
{
SheepdogObjRsp rsp;
BDRVSheepdogState *s = opaque;
@@ -1064,7 +1064,7 @@ out:
return ret;
}
-static int add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
+static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
struct iovec *iov, int niov, int create,
enum AIOCBState aiocb_type)
{
@@ -1517,7 +1517,7 @@ static int sd_truncate(BlockDriverState *bs, int64_t offset)
* update metadata, this sends a write request to the vdi object.
* Otherwise, this switches back to sd_co_readv/writev.
*/
-static void sd_write_done(SheepdogAIOCB *acb)
+static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
{
int ret;
BDRVSheepdogState *s = acb->common.bs->opaque;
@@ -1615,7 +1615,7 @@ out:
* Returns 1 when we need to wait a response, 0 when there is no sent
* request and -errno in error cases.
*/
-static int sd_co_rw_vector(void *p)
+static int coroutine_fn sd_co_rw_vector(void *p)
{
SheepdogAIOCB *acb = p;
int ret = 0;
--
1.7.6
* [Qemu-devel] [PATCH v2 02/15] add socket_set_block
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel; +Cc: MORITA Kazutaka
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
oslib-posix.c | 7 +++++++
oslib-win32.c | 6 ++++++
qemu_socket.h | 1 +
3 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/oslib-posix.c b/oslib-posix.c
index a304fb0..dbc8ee8 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -103,6 +103,13 @@ void qemu_vfree(void *ptr)
free(ptr);
}
+void socket_set_block(int fd)
+{
+ int f;
+ f = fcntl(fd, F_GETFL);
+ fcntl(fd, F_SETFL, f & ~O_NONBLOCK);
+}
+
void socket_set_nonblock(int fd)
{
int f;
diff --git a/oslib-win32.c b/oslib-win32.c
index 5f0759f..5e3de7d 100644
--- a/oslib-win32.c
+++ b/oslib-win32.c
@@ -73,6 +73,12 @@ void qemu_vfree(void *ptr)
VirtualFree(ptr, 0, MEM_RELEASE);
}
+void socket_set_block(int fd)
+{
+ unsigned long opt = 0;
+ ioctlsocket(fd, FIONBIO, &opt);
+}
+
void socket_set_nonblock(int fd)
{
unsigned long opt = 1;
diff --git a/qemu_socket.h b/qemu_socket.h
index 180e4db..9e32fac 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -35,6 +35,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
/* misc helpers */
int qemu_socket(int domain, int type, int protocol);
int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
+void socket_set_block(int fd);
void socket_set_nonblock(int fd);
int send_all(int fd, const void *buf, int len1);
--
1.7.6
* [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Outside coroutines, avoid busy waiting on EAGAIN by temporarily
making the socket blocking.
The API of qemu_recvv/qemu_sendv is slightly different from
do_readv/do_writev because they do not handle coroutines: they return
the number of bytes transferred before encountering an EAGAIN. The
logic for yielding on EAGAIN lives entirely in the coroutine wrappers
(qemu_co_recvv/qemu_co_sendv, now in cutils.c).
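A sketch of how callers use the two layers (error handling elided,
using the helpers this patch introduces; cf. do_req in the diff):

    /* Outside a coroutine, make the socket blocking around the call
     * so a negative return means a real error rather than EAGAIN. */
    socket_set_block(sockfd);
    ret = qemu_sendv(sockfd, iov, len, 0);
    socket_set_nonblock(sockfd);

    /* Inside a coroutine, qemu_co_sendv() loops internally and
     * yields to the main loop whenever the socket would block. */
    ret = qemu_co_sendv(sockfd, iov, len, 0);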
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/sheepdog.c | 225 ++++++------------------------------------------------
cutils.c | 177 ++++++++++++++++++++++++++++++++++++++++++
qemu-common.h | 30 +++++++
3 files changed, 230 insertions(+), 202 deletions(-)
diff --git a/block/sheepdog.c b/block/sheepdog.c
index af696a5..94e62a3 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -443,129 +443,6 @@ static SheepdogAIOCB *sd_aio_setup(BlockDriverState *bs, QEMUIOVector *qiov,
return acb;
}
-#ifdef _WIN32
-
-struct msghdr {
- struct iovec *msg_iov;
- size_t msg_iovlen;
-};
-
-static ssize_t sendmsg(int s, const struct msghdr *msg, int flags)
-{
- size_t size = 0;
- char *buf, *p;
- int i, ret;
-
- /* count the msg size */
- for (i = 0; i < msg->msg_iovlen; i++) {
- size += msg->msg_iov[i].iov_len;
- }
- buf = g_malloc(size);
-
- p = buf;
- for (i = 0; i < msg->msg_iovlen; i++) {
- memcpy(p, msg->msg_iov[i].iov_base, msg->msg_iov[i].iov_len);
- p += msg->msg_iov[i].iov_len;
- }
-
- ret = send(s, buf, size, flags);
-
- g_free(buf);
- return ret;
-}
-
-static ssize_t recvmsg(int s, struct msghdr *msg, int flags)
-{
- size_t size = 0;
- char *buf, *p;
- int i, ret;
-
- /* count the msg size */
- for (i = 0; i < msg->msg_iovlen; i++) {
- size += msg->msg_iov[i].iov_len;
- }
- buf = g_malloc(size);
-
- ret = qemu_recv(s, buf, size, flags);
- if (ret < 0) {
- goto out;
- }
-
- p = buf;
- for (i = 0; i < msg->msg_iovlen; i++) {
- memcpy(msg->msg_iov[i].iov_base, p, msg->msg_iov[i].iov_len);
- p += msg->msg_iov[i].iov_len;
- }
-out:
- g_free(buf);
- return ret;
-}
-
-#endif
-
-/*
- * Send/recv data with iovec buffers
- *
- * This function send/recv data from/to the iovec buffer directly.
- * The first `offset' bytes in the iovec buffer are skipped and next
- * `len' bytes are used.
- *
- * For example,
- *
- * do_send_recv(sockfd, iov, len, offset, 1);
- *
- * is equals to
- *
- * char *buf = malloc(size);
- * iov_to_buf(iov, iovcnt, buf, offset, size);
- * send(sockfd, buf, size, 0);
- * free(buf);
- */
-static int do_send_recv(int sockfd, struct iovec *iov, int len, int offset,
- int write)
-{
- struct msghdr msg;
- int ret, diff;
-
- memset(&msg, 0, sizeof(msg));
- msg.msg_iov = iov;
- msg.msg_iovlen = 1;
-
- len += offset;
-
- while (iov->iov_len < len) {
- len -= iov->iov_len;
-
- iov++;
- msg.msg_iovlen++;
- }
-
- diff = iov->iov_len - len;
- iov->iov_len -= diff;
-
- while (msg.msg_iov->iov_len <= offset) {
- offset -= msg.msg_iov->iov_len;
-
- msg.msg_iov++;
- msg.msg_iovlen--;
- }
-
- msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base + offset;
- msg.msg_iov->iov_len -= offset;
-
- if (write) {
- ret = sendmsg(sockfd, &msg, 0);
- } else {
- ret = recvmsg(sockfd, &msg, 0);
- }
-
- msg.msg_iov->iov_base = (char *) msg.msg_iov->iov_base - offset;
- msg.msg_iov->iov_len += offset;
-
- iov->iov_len += diff;
- return ret;
-}
-
static int connect_to_sdog(const char *addr, const char *port)
{
char hbuf[NI_MAXHOST], sbuf[NI_MAXSERV];
@@ -618,65 +495,6 @@ success:
return fd;
}
-static int do_readv_writev(int sockfd, struct iovec *iov, int len,
- int iov_offset, int write)
-{
- int ret;
-again:
- ret = do_send_recv(sockfd, iov, len, iov_offset, write);
- if (ret < 0) {
- if (errno == EINTR) {
- goto again;
- }
- if (errno == EAGAIN) {
- if (qemu_in_coroutine()) {
- qemu_coroutine_yield();
- }
- goto again;
- }
- error_report("failed to recv a rsp, %s", strerror(errno));
- return 1;
- }
-
- iov_offset += ret;
- len -= ret;
- if (len) {
- goto again;
- }
-
- return 0;
-}
-
-static int do_readv(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
- return do_readv_writev(sockfd, iov, len, iov_offset, 0);
-}
-
-static int do_writev(int sockfd, struct iovec *iov, int len, int iov_offset)
-{
- return do_readv_writev(sockfd, iov, len, iov_offset, 1);
-}
-
-static int do_read_write(int sockfd, void *buf, int len, int write)
-{
- struct iovec iov;
-
- iov.iov_base = buf;
- iov.iov_len = len;
-
- return do_readv_writev(sockfd, &iov, len, 0, write);
-}
-
-static int do_read(int sockfd, void *buf, int len)
-{
- return do_read_write(sockfd, buf, len, 0);
-}
-
-static int do_write(int sockfd, void *buf, int len)
-{
- return do_read_write(sockfd, buf, len, 1);
-}
-
static int send_req(int sockfd, SheepdogReq *hdr, void *data,
unsigned int *wlen)
{
@@ -691,10 +509,9 @@ static int send_req(int sockfd, SheepdogReq *hdr, void *data,
iov[1].iov_len = *wlen;
}
- ret = do_writev(sockfd, iov, sizeof(*hdr) + *wlen, 0);
- if (ret) {
+ ret = qemu_sendv(sockfd, iov, sizeof(*hdr) + *wlen, 0);
+ if (ret < 0) {
error_report("failed to send a req, %s", strerror(errno));
- ret = -1;
}
return ret;
@@ -704,17 +521,19 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
unsigned int *wlen, unsigned int *rlen)
{
int ret;
+ struct iovec iov;
+ socket_set_block(sockfd);
ret = send_req(sockfd, hdr, data, wlen);
- if (ret) {
- ret = -1;
+ if (ret < 0) {
goto out;
}
- ret = do_read(sockfd, hdr, sizeof(*hdr));
- if (ret) {
+ iov.iov_base = hdr;
+ iov.iov_len = sizeof(*hdr);
+ ret = qemu_recvv(sockfd, &iov, sizeof(*hdr), 0);
+ if (ret < 0) {
error_report("failed to get a rsp, %s", strerror(errno));
- ret = -1;
goto out;
}
@@ -723,15 +542,17 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
}
if (*rlen) {
- ret = do_read(sockfd, data, *rlen);
- if (ret) {
+ iov.iov_base = data;
+ iov.iov_len = *rlen;
+ ret = qemu_recvv(sockfd, &iov, *rlen, 0);
+ if (ret < 0) {
error_report("failed to get the data, %s", strerror(errno));
- ret = -1;
goto out;
}
}
ret = 0;
out:
+ socket_set_nonblock(sockfd);
return ret;
}
@@ -793,8 +614,8 @@ static void coroutine_fn aio_read_response(void *opaque)
}
/* read a header */
- ret = do_read(fd, &rsp, sizeof(rsp));
- if (ret) {
+ ret = qemu_co_recv(fd, &rsp, sizeof(rsp));
+ if (ret < 0) {
error_report("failed to get the header, %s", strerror(errno));
goto out;
}
@@ -839,9 +660,9 @@ static void coroutine_fn aio_read_response(void *opaque)
}
break;
case AIOCB_READ_UDATA:
- ret = do_readv(fd, acb->qiov->iov, rsp.data_length,
- aio_req->iov_offset);
- if (ret) {
+ ret = qemu_co_recvv(fd, acb->qiov->iov, rsp.data_length,
+ aio_req->iov_offset);
+ if (ret < 0) {
error_report("failed to get the data, %s", strerror(errno));
goto out;
}
@@ -1114,15 +935,15 @@ static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
set_cork(s->fd, 1);
/* send a header */
- ret = do_write(s->fd, &hdr, sizeof(hdr));
- if (ret) {
+ ret = qemu_co_send(s->fd, &hdr, sizeof(hdr));
+ if (ret < 0) {
error_report("failed to send a req, %s", strerror(errno));
return -EIO;
}
if (wlen) {
- ret = do_writev(s->fd, iov, wlen, aio_req->iov_offset);
- if (ret) {
+ ret = qemu_co_sendv(s->fd, iov, wlen, aio_req->iov_offset);
+ if (ret < 0) {
error_report("failed to send a data, %s", strerror(errno));
return -EIO;
}
diff --git a/cutils.c b/cutils.c
index c91f887..b302020 100644
--- a/cutils.c
+++ b/cutils.c
@@ -25,6 +25,9 @@
#include "host-utils.h"
#include <math.h>
+#include "qemu_socket.h"
+#include "qemu-coroutine.h"
+
void pstrcpy(char *buf, int buf_size, const char *str)
{
int c;
@@ -415,3 +418,177 @@ int64_t strtosz(const char *nptr, char **end)
{
return strtosz_suffix(nptr, end, STRTOSZ_DEFSUFFIX_MB);
}
+
+/*
+ * Send/recv data with iovec buffers
+ *
+ * This function sends/receives data from/to the iovec buffer directly.
+ * The first `offset' bytes in the iovec buffer are skipped and the next
+ * `len' bytes are used.
+ *
+ * For example,
+ *
+ * do_sendv_recvv(sockfd, iov, len, offset, 1);
+ *
+ * is equal to
+ *
+ * char *buf = malloc(size);
+ * iov_to_buf(iov, iovcnt, buf, offset, size);
+ * send(sockfd, buf, size, 0);
+ * free(buf);
+ */
+static int do_sendv_recvv(int sockfd, struct iovec *iov, int len, int offset,
+ int do_sendv)
+{
+ int ret, diff, iovlen;
+ struct iovec *last_iov;
+
+ /* last_iov is inclusive, so count from one. */
+ iovlen = 1;
+ last_iov = iov;
+ len += offset;
+
+ while (last_iov->iov_len < len) {
+ len -= last_iov->iov_len;
+
+ last_iov++;
+ iovlen++;
+ }
+
+ diff = last_iov->iov_len - len;
+ last_iov->iov_len -= diff;
+
+ while (iov->iov_len <= offset) {
+ offset -= iov->iov_len;
+
+ iov++;
+ iovlen--;
+ }
+
+ iov->iov_base = (char *) iov->iov_base + offset;
+ iov->iov_len -= offset;
+
+ {
+#ifdef CONFIG_IOVEC
+ struct msghdr msg;
+ memset(&msg, 0, sizeof(msg));
+ msg.msg_iov = iov;
+ msg.msg_iovlen = iovlen;
+
+ do {
+ if (do_sendv) {
+ ret = sendmsg(sockfd, &msg, 0);
+ } else {
+ ret = recvmsg(sockfd, &msg, 0);
+ }
+ } while (ret == -1 && errno == EINTR);
+#else
+ struct iovec *p = iov;
+ ret = 0;
+ while (iovlen > 0) {
+ int rc;
+ if (do_sendv) {
+ rc = send(sockfd, p->iov_base, p->iov_len, 0);
+ } else {
+ rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
+ }
+ if (rc == -1) {
+ if (errno == EINTR) {
+ continue;
+ }
+ if (ret == 0) {
+ ret = -1;
+ }
+ break;
+ }
+ iovlen--, p++;
+ ret += rc;
+ }
+#endif
+ }
+
+ /* Undo the changes above */
+ iov->iov_base = (char *) iov->iov_base - offset;
+ iov->iov_len += offset;
+ last_iov->iov_len += diff;
+ return ret;
+}
+
+int qemu_recvv(int sockfd, struct iovec *iov, int len, int iov_offset)
+{
+ return do_sendv_recvv(sockfd, iov, len, iov_offset, 0);
+}
+
+int qemu_sendv(int sockfd, struct iovec *iov, int len, int iov_offset)
+{
+ return do_sendv_recvv(sockfd, iov, len, iov_offset, 1);
+}
+
+int coroutine_fn qemu_co_recvv(int sockfd, struct iovec *iov,
+ int len, int iov_offset)
+{
+ int total = 0;
+ int ret;
+ while (len) {
+ ret = qemu_recvv(sockfd, iov, len, iov_offset + total);
+ if (ret < 0) {
+ if (errno == EAGAIN) {
+ qemu_coroutine_yield();
+ continue;
+ }
+ if (total == 0) {
+ total = -1;
+ }
+ break;
+ }
+ if (ret == 0) {
+ break;
+ }
+ total += ret, len -= ret;
+ }
+
+ return total;
+}
+
+int coroutine_fn qemu_co_sendv(int sockfd, struct iovec *iov,
+ int len, int iov_offset)
+{
+ int total = 0;
+ int ret;
+ while (len) {
+ ret = qemu_sendv(sockfd, iov, len, iov_offset + total);
+ if (ret < 0) {
+ if (errno == EAGAIN) {
+ qemu_coroutine_yield();
+ continue;
+ }
+ if (total == 0) {
+ total = -1;
+ }
+ break;
+ }
+ total += ret, len -= ret;
+ }
+
+ return total;
+}
+
+int coroutine_fn qemu_co_recv(int sockfd, void *buf, int len)
+{
+ struct iovec iov;
+
+ iov.iov_base = buf;
+ iov.iov_len = len;
+
+ return qemu_co_recvv(sockfd, &iov, len, 0);
+}
+
+int coroutine_fn qemu_co_send(int sockfd, void *buf, int len)
+{
+ struct iovec iov;
+
+ iov.iov_base = buf;
+ iov.iov_len = len;
+
+ return qemu_co_sendv(sockfd, &iov, len, 0);
+}
diff --git a/qemu-common.h b/qemu-common.h
index 404c421..d7ee4a8 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -203,6 +203,9 @@ int qemu_pipe(int pipefd[2]);
#define qemu_recv(sockfd, buf, len, flags) recv(sockfd, buf, len, flags)
#endif
+int qemu_recvv(int sockfd, struct iovec *iov, int len, int iov_offset);
+int qemu_sendv(int sockfd, struct iovec *iov, int len, int iov_offset);
+
/* Error handling. */
void QEMU_NORETURN hw_error(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
@@ -302,6 +305,33 @@ struct qemu_work_item {
void qemu_init_vcpu(void *env);
#endif
+/**
+ * Sends an iovec (or optionally a part of it) down a socket, yielding
+ * when the socket is full.
+ */
+int qemu_co_sendv(int sockfd, struct iovec *iov,
+ int len, int iov_offset);
+
+/**
+ * Receives data into an iovec (or optionally into a part of it) from
+ * a socket, yielding when there is no data in the socket.
+ */
+int qemu_co_recvv(int sockfd, struct iovec *iov,
+ int len, int iov_offset);
+
+
+/**
+ * Sends a buffer down a socket, yielding when the socket is full.
+ */
+int qemu_co_send(int sockfd, void *buf, int len);
+
+/**
+ * Receives data into a buffer from a socket, yielding when there
+ * is no data in the socket.
+ */
+int qemu_co_recv(int sockfd, void *buf, int len);
+
+
typedef struct QEMUIOVector {
struct iovec *iov;
int niov;
--
1.7.6
* [Qemu-devel] [PATCH v2 04/15] coroutine-io: handle zero returns from recv
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel; +Cc: MORITA Kazutaka
When the other side is shut down, read returns zero (writes return EPIPE).
In this case, care must be taken to avoid infinite loops. This bug
was already present in the sheepdog code.
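In sketch form, the hazard: a retry loop that treats 0 like any short
read never terminates once the peer has shut down. The loop needs an
explicit break on 0, e.g.:

    #include <errno.h>
    #include <sys/socket.h>

    /* Sketch of the fix: treat a return of 0 (peer shut down) as
     * end-of-stream instead of retrying. */
    static int recv_all(int sockfd, char *buf, int len)
    {
        int total = 0;
        while (len > 0) {
            int n = recv(sockfd, buf + total, len, 0);
            if (n < 0) {
                if (errno == EINTR) {
                    continue;            /* transient, retry */
                }
                return total ? total : -1;
            }
            if (n == 0) {
                break;    /* shutdown: without this check the loop
                           * would retry forever on a dead peer */
            }
            total += n;
            len   -= n;
        }
        return total;
    }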
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
cutils.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/cutils.c b/cutils.c
index b302020..295187f 100644
--- a/cutils.c
+++ b/cutils.c
@@ -501,8 +501,11 @@ static int do_sendv_recvv(int sockfd, struct iovec *iov, int len, int offset,
}
break;
}
- iovlen--, p++;
+ if (rc == 0) {
+ break;
+ }
ret += rc;
+ iovlen--, p++;
}
#endif
}
@@ -567,6 +570,9 @@ int coroutine_fn qemu_co_sendv(int sockfd, struct iovec *iov,
}
break;
}
+ if (ret == 0) {
+ break;
+ }
total += ret, len -= ret;
}
--
1.7.6
* [Qemu-devel] [PATCH v2 05/15] block: emulate .bdrv_flush() using .bdrv_aio_flush()
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel; +Cc: Stefan Hajnoczi
From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Block drivers typically have two copies of the flush operation: a
synchronous .bdrv_flush() and an asynchronous .bdrv_aio_flush(). This
patch applies the same emulation that we already do for
.bdrv_read()/.bdrv_write() to .bdrv_flush(). Now block drivers only
need to provide either .bdrv_aio_flush() or, in the case of legacy
drivers, .bdrv_flush().
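The emulation is the familiar busy-wait bridge already used for reads
and writes; in outline, with names simplified from the bdrv_flush_em()
added below:

    /* Outline of the sync-over-aio bridge. NOT_DONE is a sentinel
     * value that no completion callback can return. */
    static void done_cb(void *opaque, int ret)
    {
        *(int *)opaque = ret;          /* completion stores the result */
    }

    static int flush_sync(BlockDriverState *bs)
    {
        int ret = NOT_DONE;
        if (!bdrv_aio_flush(bs, done_cb, &ret)) {
            return -1;                 /* could not even submit */
        }
        while (ret == NOT_DONE) {
            qemu_aio_wait();           /* pump the event loop */
        }
        return ret;
    }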
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 31 +++++++++++++++++++++++++++----
block/blkdebug.c | 6 ------
block/blkverify.c | 9 ---------
block/qcow.c | 6 ------
block/qcow2.c | 19 -------------------
block/qed.c | 6 ------
block/raw-posix.c | 18 ------------------
7 files changed, 27 insertions(+), 68 deletions(-)
diff --git a/block.c b/block.c
index e3fe97f..ce35dce 100644
--- a/block.c
+++ b/block.c
@@ -59,6 +59,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors);
static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors);
+static int bdrv_flush_em(BlockDriverState *bs);
static BlockDriverAIOCB *bdrv_co_aio_readv_em(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
@@ -205,8 +206,11 @@ void bdrv_register(BlockDriver *bdrv)
}
}
- if (!bdrv->bdrv_aio_flush)
+ if (!bdrv->bdrv_aio_flush) {
bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
+ } else if (!bdrv->bdrv_flush) {
+ bdrv->bdrv_flush = bdrv_flush_em;
+ }
QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
}
@@ -2866,7 +2870,7 @@ static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
/**************************************************************/
/* sync block device emulation */
-static void bdrv_rw_em_cb(void *opaque, int ret)
+static void bdrv_em_cb(void *opaque, int ret)
{
*(int *)opaque = ret;
}
@@ -2886,7 +2890,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
qemu_iovec_init_external(&qiov, &iov, 1);
acb = bdrv_aio_readv(bs, sector_num, &qiov, nb_sectors,
- bdrv_rw_em_cb, &async_ret);
+ bdrv_em_cb, &async_ret);
if (acb == NULL) {
async_ret = -1;
goto fail;
@@ -2914,7 +2918,26 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
iov.iov_len = nb_sectors * BDRV_SECTOR_SIZE;
qemu_iovec_init_external(&qiov, &iov, 1);
acb = bdrv_aio_writev(bs, sector_num, &qiov, nb_sectors,
- bdrv_rw_em_cb, &async_ret);
+ bdrv_em_cb, &async_ret);
+ if (acb == NULL) {
+ async_ret = -1;
+ goto fail;
+ }
+ while (async_ret == NOT_DONE) {
+ qemu_aio_wait();
+ }
+
+fail:
+ return async_ret;
+}
+
+static int bdrv_flush_em(BlockDriverState *bs)
+{
+ int async_ret;
+ BlockDriverAIOCB *acb;
+
+ async_ret = NOT_DONE;
+ acb = bdrv_aio_flush(bs, bdrv_em_cb, &async_ret);
if (acb == NULL) {
async_ret = -1;
goto fail;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index b3c5d42..9b88535 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -397,11 +397,6 @@ static void blkdebug_close(BlockDriverState *bs)
}
}
-static int blkdebug_flush(BlockDriverState *bs)
-{
- return bdrv_flush(bs->file);
-}
-
static BlockDriverAIOCB *blkdebug_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque)
{
@@ -454,7 +449,6 @@ static BlockDriver bdrv_blkdebug = {
.bdrv_file_open = blkdebug_open,
.bdrv_close = blkdebug_close,
- .bdrv_flush = blkdebug_flush,
.bdrv_aio_readv = blkdebug_aio_readv,
.bdrv_aio_writev = blkdebug_aio_writev,
diff --git a/block/blkverify.c b/block/blkverify.c
index c7522b4..483f3b3 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -116,14 +116,6 @@ static void blkverify_close(BlockDriverState *bs)
s->test_file = NULL;
}
-static int blkverify_flush(BlockDriverState *bs)
-{
- BDRVBlkverifyState *s = bs->opaque;
-
- /* Only flush test file, the raw file is not important */
- return bdrv_flush(s->test_file);
-}
-
static int64_t blkverify_getlength(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
@@ -368,7 +360,6 @@ static BlockDriver bdrv_blkverify = {
.bdrv_file_open = blkverify_open,
.bdrv_close = blkverify_close,
- .bdrv_flush = blkverify_flush,
.bdrv_aio_readv = blkverify_aio_readv,
.bdrv_aio_writev = blkverify_aio_writev,
diff --git a/block/qcow.c b/block/qcow.c
index c8bfecc..9b71116 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -781,11 +781,6 @@ static int qcow_write_compressed(BlockDriverState *bs, int64_t sector_num,
return 0;
}
-static int qcow_flush(BlockDriverState *bs)
-{
- return bdrv_flush(bs->file);
-}
-
static BlockDriverAIOCB *qcow_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque)
{
@@ -826,7 +821,6 @@ static BlockDriver bdrv_qcow = {
.bdrv_open = qcow_open,
.bdrv_close = qcow_close,
.bdrv_create = qcow_create,
- .bdrv_flush = qcow_flush,
.bdrv_is_allocated = qcow_is_allocated,
.bdrv_set_key = qcow_set_key,
.bdrv_make_empty = qcow_make_empty,
diff --git a/block/qcow2.c b/block/qcow2.c
index 510ff68..4dc980c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1092,24 +1092,6 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
return 0;
}
-static int qcow2_flush(BlockDriverState *bs)
-{
- BDRVQcowState *s = bs->opaque;
- int ret;
-
- ret = qcow2_cache_flush(bs, s->l2_table_cache);
- if (ret < 0) {
- return ret;
- }
-
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
- if (ret < 0) {
- return ret;
- }
-
- return bdrv_flush(bs->file);
-}
-
static BlockDriverAIOCB *qcow2_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb,
void *opaque)
@@ -1242,7 +1224,6 @@ static BlockDriver bdrv_qcow2 = {
.bdrv_open = qcow2_open,
.bdrv_close = qcow2_close,
.bdrv_create = qcow2_create,
- .bdrv_flush = qcow2_flush,
.bdrv_is_allocated = qcow2_is_allocated,
.bdrv_set_key = qcow2_set_key,
.bdrv_make_empty = qcow2_make_empty,
diff --git a/block/qed.c b/block/qed.c
index 624e261..5cf4e08 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -533,11 +533,6 @@ static void bdrv_qed_close(BlockDriverState *bs)
qemu_vfree(s->l1_table);
}
-static int bdrv_qed_flush(BlockDriverState *bs)
-{
- return bdrv_flush(bs->file);
-}
-
static int qed_create(const char *filename, uint32_t cluster_size,
uint64_t image_size, uint32_t table_size,
const char *backing_file, const char *backing_fmt)
@@ -1479,7 +1474,6 @@ static BlockDriver bdrv_qed = {
.bdrv_open = bdrv_qed_open,
.bdrv_close = bdrv_qed_close,
.bdrv_create = bdrv_qed_create,
- .bdrv_flush = bdrv_qed_flush,
.bdrv_is_allocated = bdrv_qed_is_allocated,
.bdrv_make_empty = bdrv_qed_make_empty,
.bdrv_aio_readv = bdrv_qed_aio_readv,
diff --git a/block/raw-posix.c b/block/raw-posix.c
index b8d0ef7..4cdede5 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -835,19 +835,6 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
return result;
}
-static int raw_flush(BlockDriverState *bs)
-{
- BDRVRawState *s = bs->opaque;
- int ret;
-
- ret = qemu_fdatasync(s->fd);
- if (ret < 0) {
- return -errno;
- }
-
- return 0;
-}
-
#ifdef CONFIG_XFS
static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
{
@@ -915,7 +902,6 @@ static BlockDriver bdrv_file = {
.bdrv_write = raw_write,
.bdrv_close = raw_close,
.bdrv_create = raw_create,
- .bdrv_flush = raw_flush,
.bdrv_discard = raw_discard,
.bdrv_aio_readv = raw_aio_readv,
@@ -1185,7 +1171,6 @@ static BlockDriver bdrv_host_device = {
.bdrv_create = hdev_create,
.create_options = raw_create_options,
.bdrv_has_zero_init = hdev_has_zero_init,
- .bdrv_flush = raw_flush,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
@@ -1306,7 +1291,6 @@ static BlockDriver bdrv_host_floppy = {
.bdrv_create = hdev_create,
.create_options = raw_create_options,
.bdrv_has_zero_init = hdev_has_zero_init,
- .bdrv_flush = raw_flush,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
@@ -1407,7 +1391,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_create = hdev_create,
.create_options = raw_create_options,
.bdrv_has_zero_init = hdev_has_zero_init,
- .bdrv_flush = raw_flush,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
@@ -1528,7 +1511,6 @@ static BlockDriver bdrv_host_cdrom = {
.bdrv_create = hdev_create,
.create_options = raw_create_options,
.bdrv_has_zero_init = hdev_has_zero_init,
- .bdrv_flush = raw_flush,
.bdrv_aio_readv = raw_aio_readv,
.bdrv_aio_writev = raw_aio_writev,
--
1.7.6
* [Qemu-devel] [PATCH v2 06/15] block: group together the plugging of synchronous IO emulation
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
This logic is duplicated for the read and write paths. Unify it, and
move it next to the code that plugs in the bdrv_flush emulation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/block.c b/block.c
index ce35dce..394ecaf 100644
--- a/block.c
+++ b/block.c
@@ -189,26 +189,26 @@ void bdrv_register(BlockDriver *bdrv)
/* Emulate AIO by coroutines, and sync by AIO */
bdrv->bdrv_aio_readv = bdrv_co_aio_readv_em;
bdrv->bdrv_aio_writev = bdrv_co_aio_writev_em;
- bdrv->bdrv_read = bdrv_read_em;
- bdrv->bdrv_write = bdrv_write_em;
} else {
bdrv->bdrv_co_readv = bdrv_co_readv_em;
bdrv->bdrv_co_writev = bdrv_co_writev_em;
-
if (!bdrv->bdrv_aio_readv) {
/* add AIO emulation layer */
bdrv->bdrv_aio_readv = bdrv_aio_readv_em;
bdrv->bdrv_aio_writev = bdrv_aio_writev_em;
- } else if (!bdrv->bdrv_read) {
- /* add synchronous IO emulation layer */
- bdrv->bdrv_read = bdrv_read_em;
- bdrv->bdrv_write = bdrv_write_em;
}
}
if (!bdrv->bdrv_aio_flush) {
bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
- } else if (!bdrv->bdrv_flush) {
+ }
+
+ /* add synchronous IO emulation layer */
+ if (!bdrv->bdrv_read) {
+ bdrv->bdrv_read = bdrv_read_em;
+ bdrv->bdrv_write = bdrv_write_em;
+ }
+ if (!bdrv->bdrv_flush) {
bdrv->bdrv_flush = bdrv_flush_em;
}
--
1.7.6
* [Qemu-devel] [PATCH v2 07/15] block: add bdrv_co_flush support
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 45 +++++++++++++++++++++++++++++++++++----------
block_int.h | 1 +
2 files changed, 36 insertions(+), 10 deletions(-)
diff --git a/block.c b/block.c
index 394ecaf..f4b9089 100644
--- a/block.c
+++ b/block.c
@@ -66,6 +66,8 @@ static BlockDriverAIOCB *bdrv_co_aio_readv_em(BlockDriverState *bs,
static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+ BlockDriverCompletionFunc *cb, void *opaque);
static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *iov);
@@ -199,7 +201,11 @@ void bdrv_register(BlockDriver *bdrv)
}
}
- if (!bdrv->bdrv_aio_flush) {
+ if (bdrv->bdrv_co_flush) {
+ bdrv->bdrv_aio_flush = bdrv_co_aio_flush_em;
+ } else if (bdrv->bdrv_aio_flush) {
+ bdrv->bdrv_co_flush = bdrv_co_flush_em;
+ } else {
bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
}
@@ -1035,11 +1041,6 @@ static inline bool bdrv_has_async_rw(BlockDriver *drv)
|| drv->bdrv_aio_readv != bdrv_aio_readv_em;
}
-static inline bool bdrv_has_async_flush(BlockDriver *drv)
-{
- return drv->bdrv_aio_flush != bdrv_aio_flush_em;
-}
-
/* return < 0 if error. See bdrv_write() for the return codes */
int bdrv_read(BlockDriverState *bs, int64_t sector_num,
uint8_t *buf, int nb_sectors)
@@ -1742,8 +1743,8 @@ int bdrv_flush(BlockDriverState *bs)
return 0;
}
- if (bs->drv && bdrv_has_async_flush(bs->drv) && qemu_in_coroutine()) {
- return bdrv_co_flush_em(bs);
+ if (bs->drv && bs->drv->bdrv_co_flush && qemu_in_coroutine()) {
+ return bs->drv->bdrv_co_flush(bs);
}
if (bs->drv && bs->drv->bdrv_flush) {
@@ -2764,7 +2765,7 @@ static AIOPool bdrv_em_co_aio_pool = {
.cancel = bdrv_aio_co_cancel_em,
};
-static void bdrv_co_rw_bh(void *opaque)
+static void bdrv_co_em_bh(void *opaque)
{
BlockDriverAIOCBCoroutine *acb = opaque;
@@ -2786,7 +2787,7 @@ static void coroutine_fn bdrv_co_rw(void *opaque)
acb->req.nb_sectors, acb->req.qiov);
}
- acb->bh = qemu_bh_new(bdrv_co_rw_bh, acb);
+ acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
qemu_bh_schedule(acb->bh);
}
@@ -2829,6 +2830,30 @@ static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
true);
}
+static void coroutine_fn bdrv_co_flush(void *opaque)
+{
+ BlockDriverAIOCBCoroutine *acb = opaque;
+ BlockDriverState *bs = acb->common.bs;
+
+ acb->req.error = bs->drv->bdrv_co_flush(bs);
+ acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ Coroutine *co;
+ BlockDriverAIOCBCoroutine *acb;
+
+ acb = qemu_aio_get(&bdrv_em_co_aio_pool, bs, cb, opaque);
+ co = qemu_coroutine_create(bdrv_co_flush);
+ qemu_coroutine_enter(co, acb);
+
+ return &acb->common;
+}
+
static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque)
{
diff --git a/block_int.h b/block_int.h
index 8c3b863..bb39b0b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -83,6 +83,7 @@ struct BlockDriver {
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
+ int coroutine_fn (*bdrv_co_flush)(BlockDriverState *bs);
int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
int num_reqs);
--
1.7.6
* [Qemu-devel] [PATCH v2 08/15] block: add bdrv_co_discard and bdrv_aio_discard support
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block.c | 140 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
block.h | 3 +
block_int.h | 9 +++-
trace-events | 1 +
4 files changed, 148 insertions(+), 5 deletions(-)
diff --git a/block.c b/block.c
index f4b9089..7853982 100644
--- a/block.c
+++ b/block.c
@@ -53,6 +53,9 @@ static BlockDriverAIOCB *bdrv_aio_writev_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+static BlockDriverAIOCB *bdrv_aio_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
@@ -60,6 +63,8 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
const uint8_t *buf, int nb_sectors);
static int bdrv_flush_em(BlockDriverState *bs);
+static int bdrv_discard_em(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors);
static BlockDriverAIOCB *bdrv_co_aio_readv_em(BlockDriverState *bs,
int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
BlockDriverCompletionFunc *cb, void *opaque);
@@ -68,6 +73,9 @@ static BlockDriverAIOCB *bdrv_co_aio_writev_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
static BlockDriverAIOCB *bdrv_co_aio_flush_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+static BlockDriverAIOCB *bdrv_co_aio_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *iov);
@@ -75,6 +83,8 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
int64_t sector_num, int nb_sectors,
QEMUIOVector *iov);
static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs);
+static int coroutine_fn bdrv_co_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors);
static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
QTAILQ_HEAD_INITIALIZER(bdrv_states);
@@ -209,6 +219,14 @@ void bdrv_register(BlockDriver *bdrv)
bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
}
+ if (bdrv->bdrv_co_discard) {
+ bdrv->bdrv_aio_discard = bdrv_co_aio_discard_em;
+ } else if (bdrv->bdrv_aio_discard) {
+ bdrv->bdrv_co_discard = bdrv_co_discard_em;
+ } else {
+ bdrv->bdrv_aio_discard = bdrv_aio_discard_em;
+ }
+
/* add synchronous IO emulation layer */
if (!bdrv->bdrv_read) {
bdrv->bdrv_read = bdrv_read_em;
@@ -217,6 +235,9 @@ void bdrv_register(BlockDriver *bdrv)
if (!bdrv->bdrv_flush) {
bdrv->bdrv_flush = bdrv_flush_em;
}
+ if (!bdrv->bdrv_discard) {
+ bdrv->bdrv_discard = bdrv_discard_em;
+ }
QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
}
@@ -1791,10 +1812,18 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
if (!bs->drv) {
return -ENOMEDIUM;
}
- if (!bs->drv->bdrv_discard) {
- return 0;
+ if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+ return -EIO;
+ }
+ if (bs->drv->bdrv_co_discard && qemu_in_coroutine()) {
+ return bs->drv->bdrv_co_discard(bs, sector_num, nb_sectors);
}
- return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+
+ if (bs->drv->bdrv_discard) {
+ return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+ }
+
+ return 0;
}
/*
@@ -2656,6 +2685,24 @@ BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
return drv->bdrv_aio_flush(bs, cb, opaque);
}
+BlockDriverAIOCB *bdrv_aio_discard(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ trace_bdrv_aio_discard(bs, sector_num, nb_sectors, opaque);
+
+ if (!bs->drv) {
+ return NULL;
+ }
+ if (bs->read_only) {
+ return NULL;
+ }
+ if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+ return NULL;
+ }
+ return bs->drv->bdrv_aio_discard(bs, sector_num, nb_sectors, cb, opaque);
+}
+
void bdrv_aio_cancel(BlockDriverAIOCB *acb)
{
acb->pool->cancel(acb);
@@ -2873,6 +2920,52 @@ static BlockDriverAIOCB *bdrv_aio_flush_em(BlockDriverState *bs,
return &acb->common;
}
+static void coroutine_fn bdrv_co_discard(void *opaque)
+{
+ BlockDriverAIOCBCoroutine *acb = opaque;
+ BlockDriverState *bs = acb->common.bs;
+
+ acb->req.error = bs->drv->bdrv_co_discard(bs, acb->req.sector,
+ acb->req.nb_sectors);
+ acb->bh = qemu_bh_new(bdrv_co_em_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *bdrv_co_aio_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ Coroutine *co;
+ BlockDriverAIOCBCoroutine *acb;
+
+ acb = qemu_aio_get(&bdrv_em_co_aio_pool, bs, cb, opaque);
+ acb->req.sector = sector_num;
+ acb->req.nb_sectors = nb_sectors;
+ co = qemu_coroutine_create(bdrv_co_discard);
+ qemu_coroutine_enter(co, acb);
+
+ return &acb->common;
+}
+
+static BlockDriverAIOCB *bdrv_aio_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque)
+{
+ BlockDriverAIOCBSync *acb;
+
+ acb = qemu_aio_get(&bdrv_em_aio_pool, bs, cb, opaque);
+ acb->is_write = 1; /* don't bounce in the completion handler */
+ acb->qiov = NULL;
+ acb->bounce = NULL;
+
+ if (!acb->bh) {
+ acb->bh = qemu_bh_new(bdrv_aio_bh_cb, acb);
+ }
+ acb->ret = bdrv_discard(bs, sector_num, nb_sectors);
+ qemu_bh_schedule(acb->bh);
+ return &acb->common;
+}
+
static BlockDriverAIOCB *bdrv_aio_noop_em(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque)
{
@@ -2975,6 +3068,30 @@ fail:
return async_ret;
}
+static int bdrv_discard_em(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors)
+
+{
+ int async_ret;
+ BlockDriverAIOCB *acb;
+
+ async_ret = NOT_DONE;
+ acb = bdrv_aio_discard(bs, sector_num, nb_sectors,
+ bdrv_em_cb, &async_ret);
+ if (acb == NULL) {
+ async_ret = -1;
+ goto fail;
+ }
+
+ while (async_ret == NOT_DONE) {
+ qemu_aio_wait();
+ }
+
+
+fail:
+ return async_ret;
+}
+
void bdrv_init(void)
{
module_call_init(MODULE_INIT_BLOCK);
@@ -3083,6 +3200,23 @@ static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs)
return co.ret;
}
+static int coroutine_fn bdrv_co_discard_em(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors)
+{
+ CoroutineIOCompletion co = {
+ .coroutine = qemu_coroutine_self(),
+ };
+ BlockDriverAIOCB *acb;
+
+ acb = bdrv_aio_discard(bs, sector_num, nb_sectors,
+ bdrv_co_io_em_complete, &co);
+ if (!acb) {
+ return -EIO;
+ }
+ qemu_coroutine_yield();
+ return co.ret;
+}
+
/**************************************************************/
/* removable device support */
diff --git a/block.h b/block.h
index 16bfa0a..94cd395 100644
--- a/block.h
+++ b/block.h
@@ -156,6 +156,9 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
+BlockDriverAIOCB *bdrv_aio_discard(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
void bdrv_aio_cancel(BlockDriverAIOCB *acb);
typedef struct BlockRequest {
diff --git a/block_int.h b/block_int.h
index bb39b0b..4222bda 100644
--- a/block_int.h
+++ b/block_int.h
@@ -63,6 +63,8 @@ struct BlockDriver {
void (*bdrv_close)(BlockDriverState *bs);
int (*bdrv_create)(const char *filename, QEMUOptionParameter *options);
int (*bdrv_flush)(BlockDriverState *bs);
+ int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors);
int (*bdrv_is_allocated)(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, int *pnum);
int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
@@ -76,14 +78,17 @@ struct BlockDriver {
BlockDriverCompletionFunc *cb, void *opaque);
BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
BlockDriverCompletionFunc *cb, void *opaque);
- int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors);
+ BlockDriverAIOCB *(*bdrv_aio_discard)(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
int coroutine_fn (*bdrv_co_readv)(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn (*bdrv_co_writev)(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn (*bdrv_co_flush)(BlockDriverState *bs);
+ int coroutine_fn (*bdrv_co_discard)(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors);
int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
int num_reqs);
diff --git a/trace-events b/trace-events
index fe64684..2dcfb9c 100644
--- a/trace-events
+++ b/trace-events
@@ -59,6 +59,7 @@ multiwrite_cb(void *mcb, int ret) "mcb %p ret %d"
bdrv_aio_multiwrite(void *mcb, int num_callbacks, int num_reqs) "mcb %p num_callbacks %d num_reqs %d"
bdrv_aio_multiwrite_earlyfail(void *mcb) "mcb %p"
bdrv_aio_multiwrite_latefail(void *mcb, int i) "mcb %p i %d"
+bdrv_aio_discard(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
bdrv_aio_flush(void *bs, void *opaque) "bs %p opaque %p"
bdrv_aio_readv(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
bdrv_aio_writev(void *bs, int64_t sector_num, int nb_sectors, void *opaque) "bs %p sector_num %"PRId64" nb_sectors %d opaque %p"
--
1.7.6
* [Qemu-devel] [PATCH v2 09/15] nbd: fix error handling in the server
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
bdrv_read and bdrv_write return negative errno values, not -1.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
nbd.c | 21 ++++++++++++---------
1 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/nbd.c b/nbd.c
index 595f4d8..a7fa507 100644
--- a/nbd.c
+++ b/nbd.c
@@ -580,6 +580,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
{
struct nbd_request request;
struct nbd_reply reply;
+ int ret;
TRACE("Reading request.");
@@ -618,12 +619,13 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
case NBD_CMD_READ:
TRACE("Request type is READ");
- if (bdrv_read(bs, (request.from + dev_offset) / 512,
- data + NBD_REPLY_SIZE,
- request.len / 512) == -1) {
+ ret = bdrv_read(bs, (request.from + dev_offset) / 512,
+ data + NBD_REPLY_SIZE,
+ request.len / 512);
+ if (ret < 0) {
LOG("reading from file failed");
- errno = EINVAL;
- return -1;
+ reply.error = -ret;
+ request.len = 0;
}
*offset += request.len;
@@ -666,11 +668,12 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
} else {
TRACE("Writing to device");
- if (bdrv_write(bs, (request.from + dev_offset) / 512,
- data, request.len / 512) == -1) {
+ ret = bdrv_write(bs, (request.from + dev_offset) / 512,
+ data, request.len / 512);
+ if (ret < 0) {
LOG("writing to file failed");
- errno = EINVAL;
- return -1;
+ reply.error = -ret;
+ request.len = 0;
}
*offset += request.len;
--
1.7.6
* [Qemu-devel] [PATCH v2 10/15] nbd: add support for NBD_CMD_FLUSH
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Note for the brace police: the brace style in this commit and the
following ones is consistent with the rest of the file. It is fixed
later, together with the introduction of coroutines.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 31 +++++++++++++++++++++++++++++++
nbd.c | 15 ++++++++++++++-
2 files changed, 45 insertions(+), 1 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 76f04d8..b6b2f2d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -238,6 +238,36 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
return 0;
}
+static int nbd_flush(BlockDriverState *bs)
+{
+ BDRVNBDState *s = bs->opaque;
+ struct nbd_request request;
+ struct nbd_reply reply;
+
+ if (!(s->nbdflags & NBD_FLAG_SEND_FLUSH)) {
+ return 0;
+ }
+
+ request.type = NBD_CMD_FLUSH;
+ request.handle = (uint64_t)(intptr_t)bs;
+ request.from = 0;
+ request.len = 0;
+
+ if (nbd_send_request(s->sock, &request) == -1)
+ return -errno;
+
+ if (nbd_receive_reply(s->sock, &reply) == -1)
+ return -errno;
+
+ if (reply.error != 0)
+ return -reply.error;
+
+ if (reply.handle != request.handle)
+ return -EIO;
+
+ return 0;
+}
+
static void nbd_close(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
@@ -261,6 +291,7 @@ static BlockDriver bdrv_nbd = {
.bdrv_read = nbd_read,
.bdrv_write = nbd_write,
.bdrv_close = nbd_close,
+ .bdrv_flush = nbd_flush,
.bdrv_getlength = nbd_getlength,
.protocol_name = "nbd",
};
diff --git a/nbd.c b/nbd.c
index a7fa507..8852ded 100644
--- a/nbd.c
+++ b/nbd.c
@@ -194,7 +194,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
memcpy(buf, "NBDMAGIC", 8);
cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
cpu_to_be64w((uint64_t*)(buf + 16), size);
- cpu_to_be32w((uint32_t*)(buf + 24), flags | NBD_FLAG_HAS_FLAGS);
+ cpu_to_be32w((uint32_t*)(buf + 24),
+ flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
memset(buf + 28, 0, 124);
if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -686,6 +687,18 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
TRACE("Request type is DISCONNECT");
errno = 0;
return 1;
+ case NBD_CMD_FLUSH:
+ TRACE("Request type is FLUSH");
+
+ ret = bdrv_flush(bs);
+ if (ret < 0) {
+ LOG("flush failed");
+ reply.error = -ret;
+ }
+
+ if (nbd_send_reply(csock, &reply) == -1)
+ return -1;
+ break;
default:
LOG("invalid request type (%u) received", request.type);
errno = EINVAL;
--
1.7.6
* [Qemu-devel] [PATCH v2 11/15] nbd: add support for NBD_CMD_FLAG_FUA
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
The server can use it to issue a flush automatically after a
write. The client can also use it to mimic a write-through
cache.
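In outline, the client-side decision (mirroring the nbd_write() hunk
below): FUA is set only when the server advertises it and the guest
expects write-through semantics. Compared with issuing a separate
NBD_CMD_FLUSH, this saves one request/reply round trip per write.

    /* Sketch of the client-side decision; flag names as in nbd.h. */
    request.type = NBD_CMD_WRITE;
    if (!bdrv_enable_write_cache(bs) &&
        (s->nbdflags & NBD_FLAG_SEND_FUA)) {
        request.type |= NBD_CMD_FLAG_FUA;  /* server flushes before
                                            * acknowledging the write */
    }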
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 8 ++++++++
nbd.c | 13 +++++++++++--
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index b6b2f2d..23f83d4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -216,6 +216,10 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
struct nbd_reply reply;
request.type = NBD_CMD_WRITE;
+ if (!bdrv_enable_write_cache(bs) && (s->nbdflags & NBD_FLAG_SEND_FUA)) {
+ request.type |= NBD_CMD_FLAG_FUA;
+ }
+
request.handle = (uint64_t)(intptr_t)bs;
request.from = sector_num * 512;;
request.len = nb_sectors * 512;
@@ -249,6 +253,10 @@ static int nbd_flush(BlockDriverState *bs)
}
request.type = NBD_CMD_FLUSH;
+ if (s->nbdflags & NBD_FLAG_SEND_FUA) {
+ request.type |= NBD_CMD_FLAG_FUA;
+ }
+
request.handle = (uint64_t)(intptr_t)bs;
request.from = 0;
request.len = 0;
diff --git a/nbd.c b/nbd.c
index 8852ded..7faa08b 100644
--- a/nbd.c
+++ b/nbd.c
@@ -195,7 +195,8 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
cpu_to_be64w((uint64_t*)(buf + 16), size);
cpu_to_be32w((uint32_t*)(buf + 24),
- flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH);
+ flags | NBD_FLAG_HAS_FLAGS |
+ NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
memset(buf + 28, 0, 124);
if (write_sync(csock, buf, sizeof(buf)) != sizeof(buf)) {
@@ -616,7 +617,7 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
reply.handle = request.handle;
reply.error = 0;
- switch (request.type) {
+ switch (request.type & NBD_CMD_MASK_COMMAND) {
case NBD_CMD_READ:
TRACE("Request type is READ");
@@ -678,6 +679,14 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
}
*offset += request.len;
+
+ if (request.type & NBD_CMD_FLAG_FUA) {
+ ret = bdrv_flush(bs);
+ if (ret < 0) {
+ LOG("flush failed");
+ reply.error = -ret;
+ }
+ }
}
if (nbd_send_reply(csock, &reply) == -1)
--
1.7.6
* [Qemu-devel] [PATCH v2 12/15] nbd: add support for NBD_CMD_TRIM
From: Paolo Bonzini @ 2011-09-16 14:25 UTC
To: qemu-devel
Map it to bdrv_discard. The server can also expose NBD_FLAG_SEND_TRIM.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 31 +++++++++++++++++++++++++++++++
nbd.c | 13 ++++++++++++-
2 files changed, 43 insertions(+), 1 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 23f83d4..35c15c8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -276,6 +276,36 @@ static int nbd_flush(BlockDriverState *bs)
return 0;
}
+static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors)
+{
+ BDRVNBDState *s = bs->opaque;
+ struct nbd_request request;
+ struct nbd_reply reply;
+
+ if (!(s->nbdflags & NBD_FLAG_SEND_TRIM)) {
+ return 0;
+ }
+ request.type = NBD_CMD_TRIM;
+ request.handle = (uint64_t)(intptr_t)bs;
+ request.from = sector_num * 512;
+ request.len = nb_sectors * 512;
+
+ if (nbd_send_request(s->sock, &request) == -1)
+ return -errno;
+
+ if (nbd_receive_reply(s->sock, &reply) == -1)
+ return -errno;
+
+ if (reply.error != 0)
+ return -reply.error;
+
+ if (reply.handle != request.handle)
+ return -EIO;
+
+ return 0;
+}
+
static void nbd_close(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
@@ -300,6 +330,7 @@ static BlockDriver bdrv_nbd = {
.bdrv_write = nbd_write,
.bdrv_close = nbd_close,
.bdrv_flush = nbd_flush,
+ .bdrv_discard = nbd_discard,
.bdrv_getlength = nbd_getlength,
.protocol_name = "nbd",
};
diff --git a/nbd.c b/nbd.c
index 7faa08b..5a618c5 100644
--- a/nbd.c
+++ b/nbd.c
@@ -195,7 +195,7 @@ int nbd_negotiate(int csock, off_t size, uint32_t flags)
cpu_to_be64w((uint64_t*)(buf + 8), 0x00420281861253LL);
cpu_to_be64w((uint64_t*)(buf + 16), size);
cpu_to_be32w((uint32_t*)(buf + 24),
- flags | NBD_FLAG_HAS_FLAGS |
+ flags | NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_TRIM |
NBD_FLAG_SEND_FLUSH | NBD_FLAG_SEND_FUA);
memset(buf + 28, 0, 124);
@@ -708,6 +708,17 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
if (nbd_send_reply(csock, &reply) == -1)
return -1;
break;
+ case NBD_CMD_TRIM:
+ TRACE("Request type is TRIM");
+ ret = bdrv_discard(bs, (request.from + dev_offset) / 512,
+ request.len / 512);
+ if (ret < 0) {
+ LOG("discard failed");
+ reply.error = -ret;
+ }
+ if (nbd_send_reply(csock, &reply) == -1)
+ return -1;
+ break;
default:
LOG("invalid request type (%u) received", request.type);
errno = EINVAL;
--
1.7.6
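The sector arithmetic in the TRIM case above is easy to check with made-up numbers (the 1 MiB dev_offset below is hypothetical):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical request: the export starts 1 MiB into the image and
     * the client trims 8 KiB at byte offset 4096 of the export. */
    uint64_t dev_offset = 1024 * 1024;
    uint64_t from = 4096;
    uint64_t len = 8192;

    uint64_t first_sector = (from + dev_offset) / 512;   /* 2056 */
    uint64_t nb_sectors = len / 512;                     /* 16 */

    printf("bdrv_discard(bs, %llu, %llu)\n",
           (unsigned long long)first_sector,
           (unsigned long long)nb_sectors);
    return 0;
}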
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v2 13/15] nbd: switch to asynchronous operation
2011-09-16 14:25 [Qemu-devel] [PATCH v2 00/15] nbd improvements Paolo Bonzini
` (11 preceding siblings ...)
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 12/15] nbd: add support for NBD_CMD_TRIM Paolo Bonzini
@ 2011-09-16 14:25 ` Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 14/15] nbd: split requests Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 15/15] nbd: allow multiple in-flight requests Paolo Bonzini
14 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2011-09-16 14:25 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 217 +++++++++++++++++++++++++++++++++++++++--------------------
nbd.c | 8 ++
2 files changed, 151 insertions(+), 74 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 35c15c8..f6efd7b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -53,6 +53,11 @@ typedef struct BDRVNBDState {
size_t blocksize;
char *export_name; /* An NBD server may export several devices */
+ CoMutex mutex;
+ Coroutine *coroutine;
+
+ struct nbd_reply reply;
+
/* If it begins with '/', this is a UNIX domain socket. Otherwise,
* it's a string of the form <hostname|ip4|\[ip6\]>:port
*/
@@ -105,6 +110,95 @@ out:
return err;
}
+static void nbd_coroutine_start(BDRVNBDState *s, struct nbd_request *request)
+{
+ qemu_co_mutex_lock(&s->mutex);
+ s->coroutine = qemu_coroutine_self();
+ request->handle = (uint64_t)(intptr_t)s;
+}
+
+static int nbd_have_request(void *opaque)
+{
+ BDRVNBDState *s = opaque;
+
+ return !!s->coroutine;
+}
+
+static void nbd_reply_ready(void *opaque)
+{
+ BDRVNBDState *s = opaque;
+
+ if (s->reply.handle == 0) {
+ /* No reply already in flight. Fetch a header. */
+ if (nbd_receive_reply(s->sock, &s->reply) < 0) {
+ s->reply.handle = 0;
+ }
+ }
+
+ /* There's no need for a mutex on the receive side, because the
+ * handler acts as a synchronization point and ensures that only
+ * one coroutine is called until the reply finishes. */
+ if (s->coroutine) {
+ qemu_coroutine_enter(s->coroutine, NULL);
+ }
+}
+
+static void nbd_restart_write(void *opaque)
+{
+ BDRVNBDState *s = opaque;
+ qemu_coroutine_enter(s->coroutine, NULL);
+}
+
+static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
+ struct iovec *iov, int offset)
+{
+ int rc, ret;
+
+ qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write,
+ nbd_have_request, NULL, s);
+ rc = nbd_send_request(s->sock, request);
+ if (rc != -1 && iov) {
+ ret = qemu_co_sendv(s->sock, iov, request->len, offset);
+ if (ret != request->len) {
+ errno = EIO;
+ rc = -1;
+ }
+ }
+ qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL,
+ nbd_have_request, NULL, s);
+ return rc;
+}
+
+static void nbd_co_receive_reply(BDRVNBDState *s, struct nbd_request *request,
+ struct nbd_reply *reply,
+ struct iovec *iov, int offset)
+{
+ int ret;
+
+ /* Wait until we're woken up by the read handler. */
+ qemu_coroutine_yield();
+ *reply = s->reply;
+ if (reply->handle != request->handle) {
+ reply->error = EIO;
+ } else {
+ if (iov && reply->error == 0) {
+ ret = qemu_co_recvv(s->sock, iov, request->len, offset);
+ if (ret != request->len) {
+ reply->error = EIO;
+ }
+ }
+
+ /* Tell the read handler to read another header. */
+ s->reply.handle = 0;
+ }
+}
+
+static void nbd_coroutine_end(BDRVNBDState *s, struct nbd_request *request)
+{
+ s->coroutine = NULL;
+ qemu_co_mutex_unlock(&s->mutex);
+}
+
static int nbd_establish_connection(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
@@ -134,8 +228,11 @@ static int nbd_establish_connection(BlockDriverState *bs)
return -errno;
}
- /* Now that we're connected, set the socket to be non-blocking */
+ /* Now that we're connected, set the socket to be non-blocking and
+ * kick the reply mechanism. */
socket_set_nonblock(sock);
+ qemu_aio_set_fd_handler(sock, nbd_reply_ready, NULL,
+ nbd_have_request, NULL, s);
s->sock = sock;
s->size = size;
@@ -151,7 +248,6 @@ static void nbd_teardown_connection(BlockDriverState *bs)
struct nbd_request request;
request.type = NBD_CMD_DISC;
- request.handle = (uint64_t)(intptr_t)bs;
request.from = 0;
request.len = 0;
nbd_send_request(s->sock, &request);
@@ -164,6 +260,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
BDRVNBDState *s = bs->opaque;
int result;
+ qemu_co_mutex_init(&s->mutex);
+
/* Pop the config into our state object. Exit if invalid. */
result = nbd_config(s, filename, flags);
if (result != 0) {
@@ -178,38 +276,30 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
return result;
}
-static int nbd_read(BlockDriverState *bs, int64_t sector_num,
- uint8_t *buf, int nb_sectors)
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
struct nbd_reply reply;
request.type = NBD_CMD_READ;
- request.handle = (uint64_t)(intptr_t)bs;
request.from = sector_num * 512;
request.len = nb_sectors * 512;
- if (nbd_send_request(s->sock, &request) == -1)
- return -errno;
-
- if (nbd_receive_reply(s->sock, &reply) == -1)
- return -errno;
-
- if (reply.error !=0)
- return -reply.error;
-
- if (reply.handle != request.handle)
- return -EIO;
-
- if (nbd_wr_sync(s->sock, buf, request.len, 1) != request.len)
- return -EIO;
+ nbd_coroutine_start(s, &request);
+ if (nbd_co_send_request(s, &request, NULL, 0) == -1) {
+ reply.error = errno;
+ } else {
+ nbd_co_receive_reply(s, &request, &reply, qiov->iov, 0);
+ }
+ nbd_coroutine_end(s, &request);
+ return -reply.error;
- return 0;
}
-static int nbd_write(BlockDriverState *bs, int64_t sector_num,
- const uint8_t *buf, int nb_sectors)
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
@@ -220,29 +310,20 @@ static int nbd_write(BlockDriverState *bs, int64_t sector_num,
request.type |= NBD_CMD_FLAG_FUA;
}
- request.handle = (uint64_t)(intptr_t)bs;
request.from = sector_num * 512;
request.len = nb_sectors * 512;
- if (nbd_send_request(s->sock, &request) == -1)
- return -errno;
-
- if (nbd_wr_sync(s->sock, (uint8_t*)buf, request.len, 0) != request.len)
- return -EIO;
-
- if (nbd_receive_reply(s->sock, &reply) == -1)
- return -errno;
-
- if (reply.error !=0)
- return -reply.error;
-
- if (reply.handle != request.handle)
- return -EIO;
-
- return 0;
+ nbd_coroutine_start(s, &request);
+ if (nbd_co_send_request(s, &request, qiov->iov, 0) == -1) {
+ reply.error = errno;
+ } else {
+ nbd_co_receive_reply(s, &request, &reply, NULL, 0);
+ }
+ nbd_coroutine_end(s, &request);
+ return -reply.error;
}
-static int nbd_flush(BlockDriverState *bs)
+static int nbd_co_flush(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
@@ -257,27 +338,21 @@ static int nbd_flush(BlockDriverState *bs)
request.type |= NBD_CMD_FLAG_FUA;
}
- request.handle = (uint64_t)(intptr_t)bs;
request.from = 0;
request.len = 0;
- if (nbd_send_request(s->sock, &request) == -1)
- return -errno;
-
- if (nbd_receive_reply(s->sock, &reply) == -1)
- return -errno;
-
- if (reply.error != 0)
- return -reply.error;
-
- if (reply.handle != request.handle)
- return -EIO;
-
- return 0;
+ nbd_coroutine_start(s, &request);
+ if (nbd_co_send_request(s, &request, NULL, 0) == -1) {
+ reply.error = errno;
+ } else {
+ nbd_co_receive_reply(s, &request, &reply, NULL, 0);
+ }
+ nbd_coroutine_end(s, &request);
+ return -reply.error;
}
-static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors)
+static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
@@ -287,23 +362,17 @@ static int nbd_discard(BlockDriverState *bs, int64_t sector_num,
return 0;
}
request.type = NBD_CMD_TRIM;
- request.handle = (uint64_t)(intptr_t)bs;
request.from = sector_num * 512;
request.len = nb_sectors * 512;
- if (nbd_send_request(s->sock, &request) == -1)
- return -errno;
-
- if (nbd_receive_reply(s->sock, &reply) == -1)
- return -errno;
-
- if (reply.error != 0)
- return -reply.error;
-
- if (reply.handle != request.handle)
- return -EIO;
-
- return 0;
+ nbd_coroutine_start(s, &request);
+ if (nbd_co_send_request(s, &request, NULL, 0) == -1) {
+ reply.error = errno;
+ } else {
+ nbd_co_receive_reply(s, &request, &reply, NULL, 0);
+ }
+ nbd_coroutine_end(s, &request);
+ return -reply.error;
}
static void nbd_close(BlockDriverState *bs)
@@ -326,11 +395,11 @@ static BlockDriver bdrv_nbd = {
.format_name = "nbd",
.instance_size = sizeof(BDRVNBDState),
.bdrv_file_open = nbd_open,
- .bdrv_read = nbd_read,
- .bdrv_write = nbd_write,
+ .bdrv_co_readv = nbd_co_readv,
+ .bdrv_co_writev = nbd_co_writev,
.bdrv_close = nbd_close,
- .bdrv_flush = nbd_flush,
- .bdrv_discard = nbd_discard,
+ .bdrv_co_flush = nbd_co_flush,
+ .bdrv_co_discard = nbd_co_discard,
.bdrv_getlength = nbd_getlength,
.protocol_name = "nbd",
};
diff --git a/nbd.c b/nbd.c
index 5a618c5..40c76d9 100644
--- a/nbd.c
+++ b/nbd.c
@@ -81,6 +81,14 @@ size_t nbd_wr_sync(int fd, void *buffer, size_t size, bool do_read)
{
size_t offset = 0;
+ if (qemu_in_coroutine()) {
+ if (do_read) {
+ return qemu_co_recv(fd, buffer, size);
+ } else {
+ return qemu_co_send(fd, buffer, size);
+ }
+ }
+
while (offset < size) {
ssize_t len;
--
1.7.6
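The nbd.c hunk above makes nbd_wr_sync context-sensitive: inside a coroutine it defers to qemu_co_recv/qemu_co_send, which yield on EAGAIN, while the synchronous server keeps the existing blocking loop. A runnable sketch of that shape, with the QEMU primitives replaced by stubs (all names below are illustrative, not QEMU APIs):

#include <stdbool.h>
#include <stdio.h>

static bool in_coroutine;   /* stub for qemu_in_coroutine() */

/* Stub for qemu_co_recv()/qemu_co_send(): would yield on EAGAIN and
 * resume when the fd handler re-enters the coroutine. */
static size_t co_transfer(void *buf, size_t size)
{
    printf("coroutine path: %zu bytes, yields instead of blocking\n", size);
    return size;
}

/* Stub for the original loop: blocks until all bytes are transferred. */
static size_t blocking_transfer(void *buf, size_t size)
{
    printf("blocking path: %zu bytes, loops in place\n", size);
    return size;
}

static size_t wr_sync(void *buf, size_t size)
{
    if (in_coroutine) {
        return co_transfer(buf, size);
    }
    return blocking_transfer(buf, size);
}

int main(void)
{
    char buf[512];

    in_coroutine = false;   /* e.g. the qemu-nbd server */
    wr_sync(buf, sizeof(buf));

    in_coroutine = true;    /* e.g. the block driver's request coroutines */
    wr_sync(buf, sizeof(buf));
    return 0;
}

The dispatch lets one helper serve both the synchronous server and the now-asynchronous client without duplicating the framing code.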
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v2 14/15] nbd: split requests
2011-09-16 14:25 [Qemu-devel] [PATCH v2 00/15] nbd improvements Paolo Bonzini
` (12 preceding siblings ...)
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 13/15] nbd: switch to asynchronous operation Paolo Bonzini
@ 2011-09-16 14:25 ` Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 15/15] nbd: allow multiple in-flight requests Paolo Bonzini
14 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2011-09-16 14:25 UTC (permalink / raw)
To: qemu-devel
qemu-nbd has a limit of slightly less than 1M per request. Work
around this in the nbd block driver.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 46 insertions(+), 6 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index f6efd7b..25abaf7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -276,8 +276,9 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
return result;
}
-static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_readv_1(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov,
+ int offset)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
@@ -291,15 +292,16 @@ static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
if (nbd_co_send_request(s, &request, NULL, 0) == -1) {
reply.error = errno;
} else {
- nbd_co_receive_reply(s, &request, &reply, qiov->iov, 0);
+ nbd_co_receive_reply(s, &request, &reply, qiov->iov, offset);
}
nbd_coroutine_end(s, &request);
return -reply.error;
}
-static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
- int nb_sectors, QEMUIOVector *qiov)
+static int nbd_co_writev_1(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov,
+ int offset)
{
BDRVNBDState *s = bs->opaque;
struct nbd_request request;
@@ -314,7 +316,7 @@ static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
request.len = nb_sectors * 512;
nbd_coroutine_start(s, &request);
- if (nbd_co_send_request(s, &request, qiov->iov, 0) == -1) {
+ if (nbd_co_send_request(s, &request, qiov->iov, offset) == -1) {
reply.error = errno;
} else {
nbd_co_receive_reply(s, &request, &reply, NULL, 0);
@@ -323,6 +325,44 @@ static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
return -reply.error;
}
+/* qemu-nbd has a limit of slightly less than 1M per request. Try to
+ * remain aligned to 4K. */
+#define NBD_MAX_SECTORS 2040
+
+static int nbd_co_readv(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ int offset = 0;
+ int ret;
+ while (nb_sectors > NBD_MAX_SECTORS) {
+ ret = nbd_co_readv_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+ if (ret < 0) {
+ return ret;
+ }
+ offset += NBD_MAX_SECTORS * 512;
+ sector_num += NBD_MAX_SECTORS;
+ nb_sectors -= NBD_MAX_SECTORS;
+ }
+ return nbd_co_readv_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
+static int nbd_co_writev(BlockDriverState *bs, int64_t sector_num,
+ int nb_sectors, QEMUIOVector *qiov)
+{
+ int offset = 0;
+ int ret;
+ while (nb_sectors > NBD_MAX_SECTORS) {
+ ret = nbd_co_writev_1(bs, sector_num, NBD_MAX_SECTORS, qiov, offset);
+ if (ret < 0) {
+ return ret;
+ }
+ offset += NBD_MAX_SECTORS * 512;
+ sector_num += NBD_MAX_SECTORS;
+ nb_sectors -= NBD_MAX_SECTORS;
+ }
+ return nbd_co_writev_1(bs, sector_num, nb_sectors, qiov, offset);
+}
+
static int nbd_co_flush(BlockDriverState *bs)
{
BDRVNBDState *s = bs->opaque;
--
1.7.6
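The value 2040 can be verified directly: 2040 * 512 = 1,044,480 bytes = 255 * 4096, so every sub-request is a whole number of 4K blocks and stays below qemu-nbd's ~1M ceiling. A trivial standalone check:

#include <assert.h>
#include <stdio.h>

int main(void)
{
    const long bytes = 2040L * 512;   /* NBD_MAX_SECTORS in bytes */

    assert(bytes == 1044480);
    assert(bytes < 1024 * 1024);      /* under the ~1M per-request limit */
    assert(bytes % 4096 == 0);        /* exactly 255 4K blocks */

    printf("%ld bytes per sub-request\n", bytes);
    return 0;
}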
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v2 15/15] nbd: allow multiple in-flight requests
2011-09-16 14:25 [Qemu-devel] [PATCH v2 00/15] nbd improvements Paolo Bonzini
` (13 preceding siblings ...)
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 14/15] nbd: split requests Paolo Bonzini
@ 2011-09-16 14:25 ` Paolo Bonzini
14 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2011-09-16 14:25 UTC (permalink / raw)
To: qemu-devel
Allow sending up to 16 requests, and drive each reply to the coroutine
that issued the request. The code is written to be exactly the same as
before this patch when MAX_NBD_REQUESTS == 1 (modulo the extra mutex
and state).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/nbd.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 56 insertions(+), 13 deletions(-)
diff --git a/block/nbd.c b/block/nbd.c
index 25abaf7..8eb946f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -46,6 +46,10 @@
#define logout(fmt, ...) ((void)0)
#endif
+#define MAX_NBD_REQUESTS 16
+#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
+#define INDEX_TO_HANDLE(bs, index) ((index) ^ ((uint64_t)(intptr_t)bs))
+
typedef struct BDRVNBDState {
int sock;
uint32_t nbdflags;
@@ -53,9 +57,12 @@ typedef struct BDRVNBDState {
size_t blocksize;
char *export_name; /* An NBD server may export several devices */
- CoMutex mutex;
- Coroutine *coroutine;
+ CoMutex send_mutex;
+ CoMutex free_sema;
+ Coroutine *send_coroutine;
+ int in_flight;
+ Coroutine *recv_coroutine[MAX_NBD_REQUESTS];
struct nbd_reply reply;
/* If it begins with '/', this is a UNIX domain socket. Otherwise,
@@ -112,41 +119,68 @@ out:
static void nbd_coroutine_start(BDRVNBDState *s, struct nbd_request *request)
{
- qemu_co_mutex_lock(&s->mutex);
- s->coroutine = qemu_coroutine_self();
- request->handle = (uint64_t)(intptr_t)s;
+ int i;
+
+ /* Poor man's semaphore. The free_sema is locked when no other request
+ * can be accepted, and unlocked after receiving one reply. */
+ if (s->in_flight >= MAX_NBD_REQUESTS - 1) {
+ qemu_co_mutex_lock(&s->free_sema);
+ assert(s->in_flight < MAX_NBD_REQUESTS);
+ }
+ s->in_flight++;
+
+ for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+ if (s->recv_coroutine[i] == NULL) {
+ s->recv_coroutine[i] = qemu_coroutine_self();
+ break;
+ }
+ }
+
+ assert(i < MAX_NBD_REQUESTS);
+ request->handle = INDEX_TO_HANDLE(s, i);
}
static int nbd_have_request(void *opaque)
{
BDRVNBDState *s = opaque;
- return !!s->coroutine;
+ return s->in_flight > 0;
}
static void nbd_reply_ready(void *opaque)
{
BDRVNBDState *s = opaque;
+ int i;
if (s->reply.handle == 0) {
/* No reply already in flight. Fetch a header. */
if (nbd_receive_reply(s->sock, &s->reply) < 0) {
s->reply.handle = 0;
+ goto fail;
}
}
/* There's no need for a mutex on the receive side, because the
* handler acts as a synchronization point and ensures that only
* one coroutine is called until the reply finishes. */
- if (s->coroutine) {
- qemu_coroutine_enter(s->coroutine, NULL);
+ i = HANDLE_TO_INDEX(s, s->reply.handle);
+ if (s->recv_coroutine[i]) {
+ qemu_coroutine_enter(s->recv_coroutine[i], NULL);
+ return;
+ }
+
+fail:
+ for (i = 0; i < MAX_NBD_REQUESTS; i++) {
+ if (s->recv_coroutine[i]) {
+ qemu_coroutine_enter(s->recv_coroutine[i], NULL);
+ }
}
}
static void nbd_restart_write(void *opaque)
{
BDRVNBDState *s = opaque;
- qemu_coroutine_enter(s->coroutine, NULL);
+ qemu_coroutine_enter(s->send_coroutine, NULL);
}
static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
@@ -154,6 +188,8 @@ static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
{
int rc, ret;
+ qemu_co_mutex_lock(&s->send_mutex);
+ s->send_coroutine = qemu_coroutine_self();
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, nbd_restart_write,
nbd_have_request, NULL, s);
rc = nbd_send_request(s->sock, request);
@@ -166,6 +202,8 @@ static int nbd_co_send_request(BDRVNBDState *s, struct nbd_request *request,
}
qemu_aio_set_fd_handler(s->sock, nbd_reply_ready, NULL,
nbd_have_request, NULL, s);
+ s->send_coroutine = NULL;
+ qemu_co_mutex_unlock(&s->send_mutex);
return rc;
}
@@ -175,7 +213,8 @@ static void nbd_co_receive_reply(BDRVNBDState *s, struct nbd_request *request,
{
int ret;
- /* Wait until we're woken up by the read handler. */
+ /* Wait until we're woken up by the read handler. TODO: perhaps
+ * peek at the next reply and avoid yielding if it's ours? */
qemu_coroutine_yield();
*reply = s->reply;
if (reply->handle != request->handle) {
@@ -195,8 +234,11 @@ static void nbd_co_receive_reply(BDRVNBDState *s, struct nbd_request *request,
static void nbd_coroutine_end(BDRVNBDState *s, struct nbd_request *request)
{
- s->coroutine = NULL;
- qemu_co_mutex_unlock(&s->mutex);
+ int i = HANDLE_TO_INDEX(s, request->handle);
+ s->recv_coroutine[i] = NULL;
+ if (s->in_flight-- == MAX_NBD_REQUESTS) {
+ qemu_co_mutex_unlock(&s->free_sema);
+ }
}
static int nbd_establish_connection(BlockDriverState *bs)
@@ -260,7 +302,8 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
BDRVNBDState *s = bs->opaque;
int result;
- qemu_co_mutex_init(&s->mutex);
+ qemu_co_mutex_init(&s->send_mutex);
+ qemu_co_mutex_init(&s->free_sema);
/* Pop the config into our state object. Exit if invalid. */
result = nbd_config(s, filename, flags);
--
1.7.6
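The HANDLE_TO_INDEX/INDEX_TO_HANDLE pair works because XOR is its own inverse: folding the state pointer into the handle namespaces the handles per device while still letting nbd_reply_ready recover the slot index exactly. A standalone round-trip check (the dummy object stands in for the BDRVNBDState):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_NBD_REQUESTS 16
#define HANDLE_TO_INDEX(bs, handle) ((handle) ^ ((uint64_t)(intptr_t)bs))
#define INDEX_TO_HANDLE(bs, index) ((index) ^ ((uint64_t)(intptr_t)bs))

int main(void)
{
    int dummy;              /* stand-in for the BDRVNBDState */
    void *bs = &dummy;
    uint64_t i;

    for (i = 0; i < MAX_NBD_REQUESTS; i++) {
        uint64_t handle = INDEX_TO_HANDLE(bs, i);
        assert(HANDLE_TO_INDEX(bs, handle) == i);   /* XOR round-trips */
    }
    printf("all %d handles round-trip\n", MAX_NBD_REQUESTS);
    return 0;
}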
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 01/15] sheepdog: add coroutine_fn markers
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 01/15] sheepdog: add coroutine_fn markers Paolo Bonzini
@ 2011-09-17 5:32 ` MORITA Kazutaka
0 siblings, 0 replies; 24+ messages in thread
From: MORITA Kazutaka @ 2011-09-17 5:32 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, MORITA Kazutaka
At Fri, 16 Sep 2011 16:25:38 +0200,
Paolo Bonzini wrote:
>
> This makes the following patch easier to review.
>
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/sheepdog.c | 14 +++++++-------
> 1 files changed, 7 insertions(+), 7 deletions(-)
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 02/15] add socket_set_block
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 02/15] add socket_set_block Paolo Bonzini
@ 2011-09-17 5:33 ` MORITA Kazutaka
0 siblings, 0 replies; 24+ messages in thread
From: MORITA Kazutaka @ 2011-09-17 5:33 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, MORITA Kazutaka
At Fri, 16 Sep 2011 16:25:39 +0200,
Paolo Bonzini wrote:
>
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> oslib-posix.c | 7 +++++++
> oslib-win32.c | 6 ++++++
> qemu_socket.h | 1 +
> 3 files changed, 14 insertions(+), 0 deletions(-)
Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code Paolo Bonzini
@ 2011-09-17 6:29 ` MORITA Kazutaka
2011-09-17 14:49 ` Paolo Bonzini
0 siblings, 1 reply; 24+ messages in thread
From: MORITA Kazutaka @ 2011-09-17 6:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel
At Fri, 16 Sep 2011 16:25:40 +0200,
Paolo Bonzini wrote:
>
> Outside coroutines, avoid busy waiting on EAGAIN by temporarily
> making the socket blocking.
>
> The API of qemu_recvv/qemu_sendv is slightly different from
> do_readv/do_writev because they do not handle coroutines. They
> return the number of bytes written before encountering an
> EAGAIN. The yield-on-EAGAIN logic is entirely in
> qemu-coroutine.c.
>
> Reviewed-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/sheepdog.c | 225 ++++++------------------------------------------------
> cutils.c | 177 ++++++++++++++++++++++++++++++++++++++++++
> qemu-common.h | 30 +++++++
> 3 files changed, 230 insertions(+), 202 deletions(-)
It seems this patch causes a compile error in qemu-ga.
Other things I noticed:
> static int send_req(int sockfd, SheepdogReq *hdr, void *data,
> unsigned int *wlen)
> {
> @@ -691,10 +509,9 @@ static int send_req(int sockfd, SheepdogReq *hdr, void *data,
> iov[1].iov_len = *wlen;
> }
>
> - ret = do_writev(sockfd, iov, sizeof(*hdr) + *wlen, 0);
> - if (ret) {
> + ret = qemu_sendv(sockfd, iov, sizeof(*hdr) + *wlen, 0);
This is wrong because qemu_sendv() may return a smaller value than
(sizeof(*hdr) + *wlen). We need to do something like qemu_write_full()
here.
> + if (ret < 0) {
> error_report("failed to send a req, %s", strerror(errno));
> - ret = -1;
> }
>
> return ret;
> @@ -704,17 +521,19 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
> unsigned int *wlen, unsigned int *rlen)
> {
> int ret;
> + struct iovec iov;
>
> + socket_set_block(sockfd);
> ret = send_req(sockfd, hdr, data, wlen);
> - if (ret) {
> - ret = -1;
> + if (ret < 0) {
> goto out;
> }
>
> - ret = do_read(sockfd, hdr, sizeof(*hdr));
> - if (ret) {
> + iov.iov_base = hdr;
> + iov.iov_len = sizeof(*hdr);
> + ret = qemu_recvv(sockfd, &iov, sizeof(*hdr), 0);
qemu_recvv() may also return a smaller value than sizeof(*hdr) here.
> + if (ret < 0) {
> error_report("failed to get a rsp, %s", strerror(errno));
> - ret = -1;
> goto out;
> }
>
> @@ -723,15 +542,17 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
> }
>
> if (*rlen) {
> - ret = do_read(sockfd, data, *rlen);
> - if (ret) {
> + iov.iov_base = data;
> + iov.iov_len = *rlen;
> + ret = qemu_recvv(sockfd, &iov, *rlen, 0);
Same here.
> + if (ret < 0) {
> error_report("failed to get the data, %s", strerror(errno));
> - ret = -1;
> goto out;
> }
> }
> ret = 0;
> out:
> + socket_set_nonblock(sockfd);
> return ret;
> }
>
[snip]
> +
> +/*
> + * Send/recv data with iovec buffers
> + *
> + * This function sends/receives data directly from/to the iovec buffer.
> + * The first `offset' bytes in the iovec buffer are skipped and next
> + * `len' bytes are used.
> + *
> + * For example,
> + *
> + * do_sendv_recvv(sockfd, iov, len, offset, 1);
> + *
> + * is equal to
> + *
> + * char *buf = malloc(len);
> + * iov_to_buf(iov, iovcnt, buf, offset, len);
> + * send(sockfd, buf, len, 0);
> + * free(buf);
> + */
> +static int do_sendv_recvv(int sockfd, struct iovec *iov, int len, int offset,
> + int do_sendv)
> +{
> + int ret, diff, iovlen;
> + struct iovec *last_iov;
> +
> + /* last_iov is inclusive, so count from one. */
> + iovlen = 1;
> + last_iov = iov;
> + len += offset;
> +
> + while (last_iov->iov_len < len) {
> + len -= last_iov->iov_len;
> +
> + last_iov++;
> + iovlen++;
> + }
> +
> + diff = last_iov->iov_len - len;
> + last_iov->iov_len -= diff;
> +
> + while (iov->iov_len <= offset) {
> + offset -= iov->iov_len;
> +
> + iov++;
> + iovlen--;
> + }
> +
> + iov->iov_base = (char *) iov->iov_base + offset;
> + iov->iov_len -= offset;
> +
> + {
> +#ifdef CONFIG_IOVEC
> + struct msghdr msg;
> + memset(&msg, 0, sizeof(msg));
> + msg.msg_iov = iov;
> + msg.msg_iovlen = iovlen;
> +
> + do {
> + if (do_sendv) {
> + ret = sendmsg(sockfd, &msg, 0);
> + } else {
> + ret = recvmsg(sockfd, &msg, 0);
> + }
> + } while (ret == -1 && errno == EINTR);
> +#else
> + struct iovec *p = iov;
> + ret = 0;
> + while (iovlen > 0) {
> + int rc;
> + if (do_sendv) {
> + rc = send(sockfd, p->iov_base, p->iov_len, 0);
> + } else {
> + rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
> + }
> + if (rc == -1) {
> + if (errno == EINTR) {
> + continue;
> + }
> + if (ret == 0) {
> + ret = -1;
> + }
> + break;
> + }
> + iovlen--, p++;
> + ret += rc;
> + }
This code can be called inside coroutines with a non-blocking fd, so
shouldn't we avoid busy-waiting here?
Thanks,
Kazutaka
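The fix the review asks for is the classic full-write loop: treat a short return as progress rather than completion, and retry on EINTR. A self-contained sketch in the spirit of qemu_write_full (an illustration, not the QEMU function itself):

#include <errno.h>
#include <unistd.h>

static ssize_t write_full(int fd, const void *buf, size_t count)
{
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, (const char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted: retry */
            }
            return -1;          /* hard error */
        }
        done += n;              /* short write: keep going */
    }
    return done;
}

int main(void)
{
    const char msg[] = "hello from write_full\n";
    return write_full(STDOUT_FILENO, msg, sizeof(msg) - 1) < 0;
}

The receive side needs the same treatment, plus a check for a zero return (peer closed) so the loop cannot spin.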
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 04/15] coroutine-io: handle zero returns from recv
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 04/15] coroutine-io: handle zero returns from recv Paolo Bonzini
@ 2011-09-17 6:50 ` MORITA Kazutaka
0 siblings, 0 replies; 24+ messages in thread
From: MORITA Kazutaka @ 2011-09-17 6:50 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, MORITA Kazutaka
At Fri, 16 Sep 2011 16:25:41 +0200,
Paolo Bonzini wrote:
>
> When the other side is shut down, read returns zero (writes fail with
> EPIPE). In this case, care must be taken to avoid infinite loops. This
> error was already present in sheepdog.
>
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> cutils.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/cutils.c b/cutils.c
> index b302020..295187f 100644
> --- a/cutils.c
> +++ b/cutils.c
> @@ -501,8 +501,11 @@ static int do_sendv_recvv(int sockfd, struct iovec *iov, int len, int offset,
> }
> break;
> }
> - iovlen--, p++;
> + if (rc == 0) {
> + break;
> + }
> ret += rc;
> + iovlen--, p++;
> }
> #endif
> }
> @@ -567,6 +570,9 @@ int coroutine_fn qemu_co_sendv(int sockfd, struct iovec *iov,
> }
> break;
> }
> + if (ret == 0) {
> + break;
> + }
> total += ret, len -= ret;
> }
When EPIPE is set, write() returns -1, doesn't it?
It looks like qemu_co_recvv() handles a zero return correctly, so I
think this patch is not needed.
Thanks,
Kazutaka
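Whatever write() does on EPIPE, the zero return on the read side is easy to reproduce: after the peer closes, a blocking recv() returns 0 immediately and forever, so a loop that only advances on positive returns would spin. A standalone demonstration using a socketpair:

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sv[2];
    char buf[16];
    size_t done = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 1;
    }
    write(sv[1], "hi", 2);
    close(sv[1]);              /* orderly shutdown by the peer */

    while (done < sizeof(buf)) {
        ssize_t n = recv(sv[0], buf + done, sizeof(buf) - done, 0);
        if (n <= 0) {
            break;             /* without this, n == 0 would loop forever */
        }
        done += n;
    }
    printf("received %zu bytes before EOF\n", done);
    return 0;
}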
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
2011-09-17 6:29 ` MORITA Kazutaka
@ 2011-09-17 14:49 ` Paolo Bonzini
2011-09-17 17:16 ` MORITA Kazutaka
2011-09-19 7:47 ` Kevin Wolf
0 siblings, 2 replies; 24+ messages in thread
From: Paolo Bonzini @ 2011-09-17 14:49 UTC (permalink / raw)
To: MORITA Kazutaka; +Cc: Kevin Wolf, qemu-devel
On 09/17/2011 08:29 AM, MORITA Kazutaka wrote:
>> > +#else
>> > + struct iovec *p = iov;
>> > + ret = 0;
>> > + while (iovlen> 0) {
>> > + int rc;
>> > + if (do_sendv) {
>> > + rc = send(sockfd, p->iov_base, p->iov_len, 0);
>> > + } else {
>> > + rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
>> > + }
>> > + if (rc == -1) {
>> > + if (errno == EINTR) {
>> > + continue;
>> > + }
>> > + if (ret == 0) {
>> > + ret = -1;
>> > + }
>> > + break;
>> > + }
>> > + iovlen--, p++;
>> > + ret += rc;
>> > + }
> This code can be called inside coroutines with a non-blocking fd, so
> should we avoid busy waiting?
It doesn't busy-wait; it exits with EAGAIN. I'll squash the first hunk
of patch 4, which is needed, in here.
qemu_co_recvv already handles reads that return zero, unlike sheepdog's
do_readv_writev. I probably moved it there inadvertently while moving
code around to cutils.c, but in order to fix qemu-ga I need to create a
new file qemu-coroutine-io.c.
Kevin, do you want me to resubmit everything, or are you going to apply
some more patches to the block branch (5 to 12 should be fine)?
Paolo
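The EAGAIN exit Paolo describes can be observed in isolation: once a non-blocking socket's send buffer fills, send() fails with EAGAIN instead of blocking, and that is the point where the coroutine yields. A small standalone demo:

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
    int sv[2];
    char buf[4096];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return 1;
    }
    fcntl(sv[0], F_SETFL, O_NONBLOCK);
    memset(buf, 'x', sizeof(buf));

    for (;;) {
        ssize_t n = send(sv[0], buf, sizeof(buf), 0);
        if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                printf("send buffer full: EAGAIN, a coroutine yields here\n");
                return 0;
            }
            perror("send");
            return 1;
        }
    }
}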
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
2011-09-17 14:49 ` Paolo Bonzini
@ 2011-09-17 17:16 ` MORITA Kazutaka
2011-09-19 7:47 ` Kevin Wolf
1 sibling, 0 replies; 24+ messages in thread
From: MORITA Kazutaka @ 2011-09-17 17:16 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, MORITA Kazutaka
At Sat, 17 Sep 2011 16:49:22 +0200,
Paolo Bonzini wrote:
>
> On 09/17/2011 08:29 AM, MORITA Kazutaka wrote:
> >> > +#else
> >> > + struct iovec *p = iov;
> >> > + ret = 0;
> >> > + while (iovlen> 0) {
> >> > + int rc;
> >> > + if (do_sendv) {
> >> > + rc = send(sockfd, p->iov_base, p->iov_len, 0);
> >> > + } else {
> >> > + rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
> >> > + }
> >> > + if (rc == -1) {
> >> > + if (errno == EINTR) {
> >> > + continue;
> >> > + }
> >> > + if (ret == 0) {
> >> > + ret = -1;
> >> > + }
> >> > + break;
> >> > + }
> >> > + iovlen--, p++;
> >> > + ret += rc;
> >> > + }
> > This code can be called inside coroutines with a non-blocking fd, so
> > should we avoid busy waiting?
>
> It doesn't busy-wait; it exits with EAGAIN. I'll squash the first
Oops, you're right. Sorry for the noise.
Thanks,
Kazutaka
> first hunk of patch 4, which is needed.
>
> qemu_co_recvv already handles reads that return zero, unlike sheepdog's
> do_readv_writev. I probably moved it there inadvertently while moving
> code around to cutils.c, but in order to fix qemu-ga I need to create a
> new file qemu-coroutine-io.c.
>
> Kevin, do you want me to resubmit everything, or are you going to apply
> some more patches to the block branch (5 to 12 should be fine)?
>
> Paolo
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
2011-09-17 14:49 ` Paolo Bonzini
2011-09-17 17:16 ` MORITA Kazutaka
@ 2011-09-19 7:47 ` Kevin Wolf
2011-09-19 9:34 ` Paolo Bonzini
1 sibling, 1 reply; 24+ messages in thread
From: Kevin Wolf @ 2011-09-19 7:47 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, MORITA Kazutaka
On 17.09.2011 16:49, Paolo Bonzini wrote:
> On 09/17/2011 08:29 AM, MORITA Kazutaka wrote:
>>>> +#else
>>>> + struct iovec *p = iov;
>>>> + ret = 0;
>>>> + while (iovlen> 0) {
>>>> + int rc;
>>>> + if (do_sendv) {
>>>> + rc = send(sockfd, p->iov_base, p->iov_len, 0);
>>>> + } else {
>>>> + rc = qemu_recv(sockfd, p->iov_base, p->iov_len, 0);
>>>> + }
>>>> + if (rc == -1) {
>>>> + if (errno == EINTR) {
>>>> + continue;
>>>> + }
>>>> + if (ret == 0) {
>>>> + ret = -1;
>>>> + }
>>>> + break;
>>>> + }
>>>> + iovlen--, p++;
>>>> + ret += rc;
>>>> + }
>> This code can be called inside coroutines with a non-blocking fd, so
>> should we avoid busy waiting?
>
> It doesn't busy-wait; it exits with EAGAIN. I'll squash the first
> hunk of patch 4, which is needed, in here.
>
> qemu_co_recvv already handles reads that return zero, unlike sheepdog's
> do_readv_writev. I probably moved it there inadvertently while moving
> code around to cutils.c, but in order to fix qemu-ga I need to create a
> new file qemu-coroutine-io.c.
>
> Kevin, do you want me to resubmit everything, or are you going to apply
> some more patches to the block branch (5 to 12 should be fine)?
As long as it's clear what the current version is, I don't mind. Do I
understand correctly that there will be a v3 for patches 3 and 4?
Kevin
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code
2011-09-19 7:47 ` Kevin Wolf
@ 2011-09-19 9:34 ` Paolo Bonzini
0 siblings, 0 replies; 24+ messages in thread
From: Paolo Bonzini @ 2011-09-19 9:34 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, MORITA Kazutaka
On 09/19/2011 09:47 AM, Kevin Wolf wrote:
> As long as it's clear what the current version is, I don't mind. Do I
> understand right that there will be a v3 for patches 3 and 4?
Yes.
Paolo
^ permalink raw reply [flat|nested] 24+ messages in thread
Thread overview: 24+ messages
2011-09-16 14:25 [Qemu-devel] [PATCH v2 00/15] nbd improvements Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 01/15] sheepdog: add coroutine_fn markers Paolo Bonzini
2011-09-17 5:32 ` MORITA Kazutaka
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 02/15] add socket_set_block Paolo Bonzini
2011-09-17 5:33 ` MORITA Kazutaka
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 03/15] sheepdog: move coroutine send/recv function to generic code Paolo Bonzini
2011-09-17 6:29 ` MORITA Kazutaka
2011-09-17 14:49 ` Paolo Bonzini
2011-09-17 17:16 ` MORITA Kazutaka
2011-09-19 7:47 ` Kevin Wolf
2011-09-19 9:34 ` Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 04/15] coroutine-io: handle zero returns from recv Paolo Bonzini
2011-09-17 6:50 ` MORITA Kazutaka
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 05/15] block: emulate .bdrv_flush() using .bdrv_aio_flush() Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 06/15] block: group together the plugging of synchronous IO emulation Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 07/15] block: add bdrv_co_flush support Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 08/15] block: add bdrv_co_discard and bdrv_aio_discard support Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 09/15] nbd: fix error handling in the server Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 10/15] nbd: add support for NBD_CMD_FLUSH Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 11/15] nbd: add support for NBD_CMD_FLAG_FUA Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 12/15] nbd: add support for NBD_CMD_TRIM Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 13/15] nbd: switch to asynchronous operation Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 14/15] nbd: split requests Paolo Bonzini
2011-09-16 14:25 ` [Qemu-devel] [PATCH v2 15/15] nbd: allow multiple in-flight requests Paolo Bonzini