All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fam Zheng <famz@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, Roman Pen <roman.penyaev@profitbricks.com>
Subject: Re: [Qemu-devel] [PATCH v2 3/3] linux-aio: fix re-entrant completion processing
Date: Wed, 28 Sep 2016 10:40:07 +0800	[thread overview]
Message-ID: <20160928024007.GG1284@lemon> (raw)
In-Reply-To: <1474989516-18255-4-git-send-email-stefanha@redhat.com>

On Tue, 09/27 16:18, Stefan Hajnoczi wrote:
> Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> completions from ioq_submit()") added an optimization that processes
> completions each time ioq_submit() returns with requests in flight.
> This commit introduces a "Co-routine re-entered recursively" error which
> can be triggered with -drive format=qcow2,aio=native.
> 
> Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
> debugged the following backtrace:
> 
>   (gdb) bt
>   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
>   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
>   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
>   #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
>   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
>   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
>   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
>   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
>   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
>   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
>   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
>   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
>   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
>   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
>   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
>   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> It turned out that re-entrant ioq_submit() and completion processing
> between three requests caused this error.  The following check is not
> sufficient to prevent recursively entering coroutines:
> 
>   if (laiocb->co != qemu_coroutine_self()) {
>       qemu_coroutine_enter(laiocb->co);
>   }
> 
> As the following coroutine backtrace shows, not just the current
> coroutine (self) can be entered.  There might also be other coroutines
> that are currently entered and transferred control due to the qcow2 lock
> (CoMutex):
> 
>   (gdb) qemu coroutine 0x5555583464d0
>   #0  0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
>   #1  0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117
>   #2  0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
>   #3  0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119
>   #4  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
>   #5  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
>   #6  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383
>   #7  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402
>   #8  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
>   #9  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
>   #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133
>   #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509
>   #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
>   #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041
>   #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
>   #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783
>   #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991
>   #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> 
> Use the new qemu_coroutine_entered() function instead of comparing
> against qemu_coroutine_self().  This is correct because:
> 
> 1. If a coroutine is not entered then it must have yielded to wait for
>    I/O completion.  It is therefore safe to enter.
> 
> 2. If a coroutine is entered then it must be in
>    ioq_submit()/qemu_laio_process_completions() because otherwise it
>    would be yielded while waiting for I/O completion.  Therefore it will
>    check laio->ret and return from ioq_submit() instead of yielding,
>    i.e. it's guaranteed not to hang.
> 
> Reported-by: Fam Zheng <famz@redhat.com>
> Tested-by: Fam Zheng <famz@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/linux-aio.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/block/linux-aio.c b/block/linux-aio.c
> index d4e19d4..1685ec2 100644
> --- a/block/linux-aio.c
> +++ b/block/linux-aio.c
> @@ -94,9 +94,12 @@ static void qemu_laio_process_completion(struct qemu_laiocb *laiocb)
>  
>      laiocb->ret = ret;
>      if (laiocb->co) {
> -        /* Jump and continue completion for foreign requests, don't do
> -         * anything for current request, it will be completed shortly. */
> -        if (laiocb->co != qemu_coroutine_self()) {
> +        /* If the coroutine is already entered it must be in ioq_submit() and
> +         * will notice laio->ret has been filled in when it eventually runs
> +         * later.  Coroutines cannot be entered recursively so avoid doing
> +         * that!
> +         */
> +        if (!qemu_coroutine_entered(laiocb->co)) {
>              qemu_coroutine_enter(laiocb->co);
>          }
>      } else {
> -- 
> 2.7.4
> 

Reviewed-by: Fam Zheng <famz@redhat.com>

  reply	other threads:[~2016-09-28  2:40 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-27 15:18 [Qemu-devel] [PATCH v2 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
2016-09-27 15:18 ` [Qemu-devel] [PATCH v2 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
2016-09-28  2:19   ` Fam Zheng
2016-09-27 15:18 ` [Qemu-devel] [PATCH v2 2/3] test-coroutine: test qemu_coroutine_entered() Stefan Hajnoczi
2016-09-28  2:38   ` Fam Zheng
2016-09-27 15:18 ` [Qemu-devel] [PATCH v2 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
2016-09-28  2:40   ` Fam Zheng [this message]
2016-09-28 16:12 ` [Qemu-devel] [Qemu-block] [PATCH v2 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160928024007.GG1284@lemon \
    --to=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=roman.penyaev@profitbricks.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.