From: Stefan Hajnoczi <stefanha@redhat.com>
To: Roman Penyaev <roman.penyaev@profitbricks.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, Kevin Wolf <kwolf@redhat.com>,
qemu-block@nongnu.org, Fam Zheng <famz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 16:25:38 +0100 [thread overview]
Message-ID: <20160927152538.GC2835@stefanha-x1.localdomain> (raw)
In-Reply-To: <CAJrWOzAFraMQgptx--rZ8qaV0x++KRdQ-Gy855ye1ESvtKvEVA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4954 bytes --]
On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> > completions from ioq_submit()") added an optimization that processes
> > completions each time ioq_submit() returns with requests in flight.
> > This commit introduces a "Co-routine re-entered recursively" error which
> > can be triggered with -drive format=qcow2,aio=native.
> >
> > Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
> > debugged the following backtrace:
> >
> > (gdb) bt
> > #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> > #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> > #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
> > #3 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
> > #4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
> > #5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
> > #6 0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
> > #7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
> > #8 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
> > #9 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
> > #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
> > #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
> > #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
> > #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
> > #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
> > #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
> > #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> >
> > It turned out that re-entrant ioq_submit() and completion processing
> > between three requests caused this error. The following check is not
> > sufficient to prevent recursively entering coroutines:
> >
> > if (laiocb->co != qemu_coroutine_self()) {
> > qemu_coroutine_enter(laiocb->co);
> > }
> >
> > As the following coroutine backtrace shows, not just the current
> > coroutine (self) can be entered. There might also be other coroutines
> > that are currently entered and transferred control due to the qcow2 lock
> > (CoMutex):
>
> I doubt that that was introduced by the commit you've specified:
> 0ed93d84edab.
>
> Before my patch coroutine was unconditionally entered. The following
> is what was changed by 0ed93d84edab:
>
> if (laiocb->co) {
> - qemu_coroutine_enter(laiocb->co);
> + /* Jump and continue completion for foreign requests, don't do
> + * anything for current request, it will be completed shortly. */
> + if (laiocb->co != qemu_coroutine_self()) {
> + qemu_coroutine_enter(laiocb->co);
> + }
Unconditionally entering was safe prior to 0ed93d84edab since all
coroutines yielded and qemu_coroutine_entered() would be false all the
time. Therefore it wasn't necessary to protect against re-entering a
coroutine.
> If you have a strong reproduction, could you please verify that.
The bug is 100% deterministic. Just boot up a guest with -drive
format=qcow2,aio=native.
I noticed that I forgot to include a second backtrace in the commit
description. I am resending the patch series with the missing
backtrace. Hopefully that will make the bug clearer.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
next prev parent reply other threads:[~2016-09-27 15:25 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-27 14:06 [Qemu-devel] [PATCH 0/3] linux-aio: fix "Co-routine re-entered recursively" error Stefan Hajnoczi
2016-09-27 14:06 ` [Qemu-devel] [PATCH 1/3] coroutine: add qemu_coroutine_entered() function Stefan Hajnoczi
2016-09-27 16:20 ` Paolo Bonzini
2016-09-27 16:29 ` Stefan Hajnoczi
2016-09-27 16:55 ` Paolo Bonzini
2016-09-28 9:50 ` Kevin Wolf
2016-09-27 14:06 ` [Qemu-devel] [PATCH 2/3] test-coroutine: test qemu_coroutine_entered() Stefan Hajnoczi
2016-09-27 14:06 ` [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing Stefan Hajnoczi
2016-09-27 14:29 ` Roman Penyaev
2016-09-27 15:25 ` Stefan Hajnoczi [this message]
2016-09-27 17:55 ` Roman Penyaev
2016-09-28 3:01 ` Fam Zheng
2016-09-28 9:14 ` Roman Penyaev
2016-09-28 9:34 ` Fam Zheng
2016-09-28 9:38 ` Roman Penyaev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160927152538.GC2835@stefanha-x1.localdomain \
--to=stefanha@redhat.com \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=roman.penyaev@profitbricks.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).