All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission
@ 2014-12-01  9:04 Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 1/3] linux-aio: fix submit aio as a batch Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Ming Lei @ 2014-12-01  9:04 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Kevin Wolf

The 1st patch fixes batch submission.

The 2nd one fixes -EAGAIN for non-batch case.

The 3rd one is a cleanup.

This patchset is splitted from previous patchset(dataplane: optimization
and multi virtqueue support), as suggested by Stefan.

V7:
	- add protection for aborting in laio_attach_aio_context(), as suggested
	by Stefan, 1/3
	- patch style, return real aborting failure to caller, as suggested
	by Kevin, 1/3
	- track pending I/O and only handle -EAGAIN if there is pending I/O,
	pointed by Kevin, 2/3 

V6:
	- don't pass ioq_submit() return value to ioq_enqueue(), as suggested
	by Stefan
	- fix one build failure introduced in V5, reported by Stefan

V5:
	- in case of submission failure, return -EIO for new coming requests
	until aborting is handled
	- in patch2, follow Paolo's suggestion about ioq_enqueue() changes

V4:
	- abort reuqests in BH to abvoid potential "Co-routine re-entered recursively"
	- remove 'enqueue' parameter to ioq_submit() to simpify change
	- beautify code as suggested by Paolo

V3:
	- rebase on QEMU master
V2:
	- code style fix and commit log fix as suggested by Benoît Canet
V1:
	- rebase on latest QEMU master

 block/linux-aio.c |  139 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 116 insertions(+), 23 deletions(-)


Thanks,
Ming Lei

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH v7 1/3] linux-aio: fix submit aio as a batch
  2014-12-01  9:04 [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Ming Lei
@ 2014-12-01  9:04 ` Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2014-12-01  9:04 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Kevin Wolf; +Cc: Ming Lei

In the submit path, we can't complete request directly,
otherwise "Co-routine re-entered recursively" may be caused,
so this patch fixes the issue with below ideas:

	- for -EAGAIN or partial submission, retry the submision
	in following completion cb which is run in BH context
	- for part of submission, update the io queue too
	- for case of io queue full, submit queued requests
	immediatelly and return failure to caller
	- for other failure, abort all queued requests in BH
    context, and requests won't be allow to submit until
    aborting is handled

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |  116 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 97 insertions(+), 19 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index d92513b..53c5616 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -38,11 +38,21 @@ struct qemu_laiocb {
     QLIST_ENTRY(qemu_laiocb) node;
 };
 
+/*
+ * TODO: support to batch I/O from multiple bs in one same
+ * AIO context, one important use case is multi-lun scsi,
+ * so in future the IO queue should be per AIO context.
+ */
 typedef struct {
     struct iocb *iocbs[MAX_QUEUED_IO];
     int plugged;
     unsigned int size;
     unsigned int idx;
+
+    /* abort queued requests in BH context */
+    QEMUBH *abort_bh;
+    bool aborting;
+    int abort_ret;
 } LaioQueue;
 
 struct qemu_laio_state {
@@ -59,6 +69,8 @@ struct qemu_laio_state {
     int event_max;
 };
 
+static int ioq_submit(struct qemu_laio_state *s);
+
 static inline ssize_t io_event_ret(struct io_event *ev)
 {
     return (ssize_t)(((uint64_t)ev->res2 << 32) | ev->res);
@@ -135,6 +147,11 @@ static void qemu_laio_completion_bh(void *opaque)
 
         qemu_laio_process_completion(s, laiocb);
     }
+
+    /* Handle -EAGAIN or partial submission */
+    if (s->io_q.idx) {
+        ioq_submit(s);
+    }
 }
 
 static void qemu_laio_completion_cb(EventNotifier *e)
@@ -175,47 +192,100 @@ static void ioq_init(LaioQueue *io_q)
     io_q->size = MAX_QUEUED_IO;
     io_q->idx = 0;
     io_q->plugged = 0;
+    io_q->aborting = false;
 }
 
+/* Always return >= 0 and it means how many requests are submitted */
 static int ioq_submit(struct qemu_laio_state *s)
 {
-    int ret, i = 0;
+    int ret;
     int len = s->io_q.idx;
 
-    do {
-        ret = io_submit(s->ctx, len, s->io_q.iocbs);
-    } while (i++ < 3 && ret == -EAGAIN);
-
-    /* empty io queue */
-    s->io_q.idx = 0;
+    if (!len) {
+        return 0;
+    }
 
+    ret = io_submit(s->ctx, len, s->io_q.iocbs);
     if (ret < 0) {
-        i = 0;
-    } else {
-        i = ret;
+        /* retry in following completion cb */
+        if (ret == -EAGAIN) {
+            return 0;
+        }
+
+        /*
+         * Abort in BH context for avoiding Co-routine re-entered,
+         * and update io queue at that time
+         */
+        s->io_q.aborting = true;
+        s->io_q.abort_ret = ret;
+        qemu_bh_schedule(s->io_q.abort_bh);
+        ret = 0;
     }
 
-    for (; i < len; i++) {
-        struct qemu_laiocb *laiocb =
-            container_of(s->io_q.iocbs[i], struct qemu_laiocb, iocb);
+    /*
+     * update io queue, and retry will be started automatically
+     * in following completion cb for the remainder
+     */
+    if (ret > 0) {
+        if (ret < len) {
+            memmove(&s->io_q.iocbs[0], &s->io_q.iocbs[ret],
+                    (len - ret) * sizeof(struct iocb *));
+        }
+        s->io_q.idx -= ret;
+    }
+
+    return ret;
+}
 
-        laiocb->ret = (ret < 0) ? ret : -EIO;
+static void ioq_abort_bh(void *opaque)
+{
+    struct qemu_laio_state *s = opaque;
+    int i;
+
+    for (i = 0; i < s->io_q.idx; i++) {
+        struct qemu_laiocb *laiocb = container_of(s->io_q.iocbs[i],
+                                                  struct qemu_laiocb,
+                                                  iocb);
+        laiocb->ret = s->io_q.abort_ret;
         qemu_laio_process_completion(s, laiocb);
     }
-    return ret;
+
+    s->io_q.idx = 0;
+    s->io_q.aborting = false;
 }
 
-static void ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
+static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
 {
     unsigned int idx = s->io_q.idx;
 
+    /* Request can't be allowed to submit until aborting is handled */
+    if (unlikely(s->io_q.aborting)) {
+        return -EIO;
+    }
+
+    if (unlikely(idx == s->io_q.size)) {
+        ioq_submit(s);
+
+        if (unlikely(s->io_q.aborting)) {
+            return -EIO;
+        }
+        idx = s->io_q.idx;
+    }
+
+    /* It has to return now if queue is still full */
+    if (unlikely(idx == s->io_q.size)) {
+        return -EAGAIN;
+    }
+
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
-    /* submit immediately if queue is full */
-    if (idx == s->io_q.size) {
+    /* submit immediately if queue depth is above 2/3 */
+    if (idx > s->io_q.size * 2 / 3) {
         ioq_submit(s);
     }
+
+    return 0;
 }
 
 void laio_io_plug(BlockDriverState *bs, void *aio_ctx)
@@ -281,7 +351,9 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
             goto out_free_aiocb;
         }
     } else {
-        ioq_enqueue(s, iocbs);
+        if (ioq_enqueue(s, iocbs) < 0) {
+            goto out_free_aiocb;
+        }
     }
     return &laiocb->common;
 
@@ -296,14 +368,20 @@ void laio_detach_aio_context(void *s_, AioContext *old_context)
 
     aio_set_event_notifier(old_context, &s->e, NULL);
     qemu_bh_delete(s->completion_bh);
+    qemu_bh_delete(s->io_q.abort_bh);
 }
 
 void laio_attach_aio_context(void *s_, AioContext *new_context)
 {
     struct qemu_laio_state *s = s_;
 
+    s->io_q.abort_bh = aio_bh_new(new_context, ioq_abort_bh, s);
     s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s);
     aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb);
+
+    if (s->io_q.aborting) {
+        qemu_bh_schedule(s->io_q.abort_bh);
+    }
 }
 
 void *laio_init(void)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH v7 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case
  2014-12-01  9:04 [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 1/3] linux-aio: fix submit aio as a batch Ming Lei
@ 2014-12-01  9:04 ` Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
  2014-12-10 13:11 ` [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2014-12-01  9:04 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Kevin Wolf; +Cc: Ming Lei

Previously -EAGAIN is simply ignored for !s->io_q.plugged case,
and sometimes it is easy to cause -EIO to VM, such as NVME device.

This patch handles -EAGAIN by io queue for !s->io_q.plugged case,
and it will be retried in following aio completion cb.

Most of times, -EAGAIN only happens if there is pending I/O, but
from linux kernel AIO implementation io_submit() might return it
when kmem_cache_alloc(GFP_KERNEL) returns NULL too. So 'pending'
in 'struct qemu_laio_state' is introduced for tracking active IO,
and -EAGAIN is handled when there is pending I/O.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |   32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 53c5616..9403b17 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -56,6 +56,7 @@ typedef struct {
 } LaioQueue;
 
 struct qemu_laio_state {
+    unsigned long pending;
     io_context_t ctx;
     EventNotifier e;
 
@@ -98,6 +99,7 @@ static void qemu_laio_process_completion(struct qemu_laio_state *s,
             }
         }
     }
+    s->pending--;
     laiocb->common.cb(laiocb->common.opaque, ret);
 
     qemu_aio_unref(laiocb);
@@ -179,6 +181,7 @@ static void laio_cancel(BlockAIOCB *blockacb)
         return;
     }
 
+    laiocb->ctx->pending--;
     laiocb->common.cb(laiocb->common.opaque, laiocb->ret);
 }
 
@@ -280,8 +283,13 @@ static int ioq_enqueue(struct qemu_laio_state *s, struct iocb *iocb)
     s->io_q.iocbs[idx++] = iocb;
     s->io_q.idx = idx;
 
-    /* submit immediately if queue depth is above 2/3 */
-    if (idx > s->io_q.size * 2 / 3) {
+    /*
+     * This is reached in two cases: queue not plugged but io_submit
+     * returned -EAGAIN, or queue plugged.  In the latter case, start
+     * submitting some I/O if the queue is getting too full.  In the
+     * former case, instead, wait until an I/O operation is completed.
+     */
+    if (s->io_q.plugged && unlikely(idx > s->io_q.size * 2 / 3)) {
         ioq_submit(s);
     }
 
@@ -346,15 +354,23 @@ BlockAIOCB *laio_submit(BlockDriverState *bs, void *aio_ctx, int fd,
     }
     io_set_eventfd(&laiocb->iocb, event_notifier_get_fd(&s->e));
 
-    if (!s->io_q.plugged) {
-        if (io_submit(s->ctx, 1, &iocbs) < 0) {
-            goto out_free_aiocb;
-        }
-    } else {
-        if (ioq_enqueue(s, iocbs) < 0) {
+    /* Switch to queue mode until -EAGAIN is handled */
+    if (!s->io_q.plugged && !s->io_q.idx) {
+        int ret = io_submit(s->ctx, 1, &iocbs);
+        if (ret >= 0) {
+            return &laiocb->common;
+        } else if (ret != -EAGAIN || (ret == -EAGAIN && !s->pending)) {
             goto out_free_aiocb;
         }
+        /*
+         * In case of -EAGAIN, only queue the req if there is pending
+         * I/O and it is resubmitted in completion of pending I/O
+         */
+    }
+    if (ioq_enqueue(s, iocbs) < 0) {
+        goto out_free_aiocb;
     }
+    s->pending++;
     return &laiocb->common;
 
 out_free_aiocb:
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [Qemu-devel] [PATCH v7 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb'
  2014-12-01  9:04 [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 1/3] linux-aio: fix submit aio as a batch Ming Lei
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
@ 2014-12-01  9:04 ` Ming Lei
  2014-12-10 13:11 ` [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2014-12-01  9:04 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini, Stefan Hajnoczi, Kevin Wolf; +Cc: Ming Lei

No one uses the 'node' field any more, so remove it
from 'struct qemu_laiocb', and this can save 16byte
for the struct on 64bit arch.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/linux-aio.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/block/linux-aio.c b/block/linux-aio.c
index 9403b17..cad3848 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -35,7 +35,6 @@ struct qemu_laiocb {
     size_t nbytes;
     QEMUIOVector *qiov;
     bool is_read;
-    QLIST_ENTRY(qemu_laiocb) node;
 };
 
 /*
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission
  2014-12-01  9:04 [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Ming Lei
                   ` (2 preceding siblings ...)
  2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
@ 2014-12-10 13:11 ` Paolo Bonzini
  3 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2014-12-10 13:11 UTC (permalink / raw)
  To: Ming Lei, qemu-devel, Stefan Hajnoczi, Kevin Wolf



On 01/12/2014 10:04, Ming Lei wrote:
> The 1st patch fixes batch submission.
> 
> The 2nd one fixes -EAGAIN for non-batch case.
> 
> The 3rd one is a cleanup.
> 
> This patchset is splitted from previous patchset(dataplane: optimization
> and multi virtqueue support), as suggested by Stefan.

I think this series needs a rewrite.  Instead of removing 'node' from
'struct qemu_laiocb', each LaioQueue should actually have a list (or
SIMPLEQ) of pending 'struct qemu_laiocb'.

Every time you do an io_submit, you copy from the laiocb queue to the
iocbs array.  If it returns N, the first N elements are removed from the
list.  There is no need to do a memmove, the iocbs will be populated
again on the next io_submit.  If io_submit fails with anything but
EAGAIN, just abort().  It should never happen.

Every time you get a request, and the list was empty, you do an io_submit.

Every time you get a bdrv_unplug and the nesting level was 1, you do an
io_submit.

Every time you get a completion, if the pending queue is not empty, you
do an io_submit.

It should be _that_ simple.

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2014-12-10 13:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-01  9:04 [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Ming Lei
2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 1/3] linux-aio: fix submit aio as a batch Ming Lei
2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 2/3] linux-aio: handling -EAGAIN for !s->io_q.plugged case Ming Lei
2014-12-01  9:04 ` [Qemu-devel] [PATCH v7 3/3] linux-aio: remove 'node' from 'struct qemu_laiocb' Ming Lei
2014-12-10 13:11 ` [Qemu-devel] [PATCH v7 0/3] linux-aio: fix batch submission Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.