From: Stefan Hajnoczi <stefanha@redhat.com>
To: Roman Penyaev <roman.penyaev@profitbricks.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, Fam Zheng <famz@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 16:25:38 +0100
Message-ID: <20160927152538.GC2835@stefanha-x1.localdomain>
In-Reply-To: <CAJrWOzAFraMQgptx--rZ8qaV0x++KRdQ-Gy855ye1ESvtKvEVA@mail.gmail.com>

On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> > completions from ioq_submit()") added an optimization that processes
> > completions each time ioq_submit() returns with requests in flight.
> > This commit introduces a "Co-routine re-entered recursively" error which
> > can be triggered with -drive format=qcow2,aio=native.
> >
> > Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
> > debugged the following backtrace:
> >
> >   (gdb) bt
> >   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> >   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> >   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
> >   #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
> >   #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
> >   #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
> >   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
> >   #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
> >   #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
> >   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
> >   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
> >   #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
> >   #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
> >   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
> >   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
> >   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
> >   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78
> >
> > It turned out that re-entrant ioq_submit() and completion processing
> > between three requests caused this error.  The following check is not
> > sufficient to prevent recursively entering coroutines:
> >
> >   if (laiocb->co != qemu_coroutine_self()) {
> >       qemu_coroutine_enter(laiocb->co);
> >   }
> >
> > As the following coroutine backtrace shows, not just the current
> > coroutine (self) can be entered.  There might also be other coroutines
> > that are currently entered and transferred control due to the qcow2 lock
> > (CoMutex):
> 
> I doubt that that was introduced by the commit you've specified:
> 0ed93d84edab.
> 
> Before my patch the coroutine was entered unconditionally.  The following
> is what was changed by 0ed93d84edab:
> 
>      if (laiocb->co) {
> -        qemu_coroutine_enter(laiocb->co);
> +        /* Jump and continue completion for foreign requests, don't do
> +         * anything for current request, it will be completed shortly. */
> +        if (laiocb->co != qemu_coroutine_self()) {
> +            qemu_coroutine_enter(laiocb->co);
> +        }

Entering unconditionally was safe prior to 0ed93d84edab because all
coroutines had yielded, so qemu_coroutine_entered() would always have
been false.  Therefore it wasn't necessary to protect against
re-entering a coroutine.
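
To illustrate why comparing against qemu_coroutine_self() is not enough,
here is a minimal toy model in C.  It is only a sketch and not QEMU code:
ToyCoroutine, toy_enter(), toy_entered() and toy_scenario() are
hypothetical names, and the model assumes that a coroutine counts as
"entered" for as long as it has a caller recorded.  Coroutine A is
entered and then transfers control to coroutine B (as can happen around
a CoMutex); B then processes a completion that belongs to A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a coroutine; NOT the real QEMU Coroutine type. */
typedef struct ToyCoroutine {
    const char *name;
    struct ToyCoroutine *caller;   /* non-NULL while this coroutine is entered */
} ToyCoroutine;

static ToyCoroutine toy_leader = { "leader", NULL };
static ToyCoroutine *toy_current = &toy_leader;   /* models "self" */

/* Assumption: "entered" means a caller is recorded for the coroutine. */
static bool toy_entered(const ToyCoroutine *co)
{
    return co->caller != NULL;
}

/* Returns false where a real implementation would abort with
 * "Co-routine re-entered recursively". */
static bool toy_enter(ToyCoroutine *co)
{
    assert(co != NULL);
    if (co->caller != NULL) {
        return false;              /* recursive re-entry detected */
    }
    co->caller = toy_current;
    toy_current = co;
    return true;
}

/* Runs the A -> B scenario and returns true if no recursive re-entry
 * happened.  use_entered_check selects which guard is applied. */
static bool toy_scenario(bool use_entered_check)
{
    ToyCoroutine a = { "A", NULL };
    ToyCoroutine b = { "B", NULL };
    bool allowed, ok;

    toy_current = &toy_leader;
    toy_enter(&a);                 /* leader enters A */
    toy_enter(&b);                 /* A enters B; A stays entered */

    /* B (now "self") sees a completion that belongs to A. */
    allowed = use_entered_check ? !toy_entered(&a)      /* entered-based guard */
                                : (&a != toy_current);  /* self-based guard */
    ok = !allowed || toy_enter(&a);

    toy_current = &toy_leader;     /* reset the model for the next run */
    return ok;
}
```

In this model toy_scenario(false) returns false, because A is not "self"
and yet is already entered, while toy_scenario(true) returns true because
the entered-based guard skips A.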

> If you have a strong reproduction, could you please verify that?

The bug is 100% deterministic.  Just boot up a guest with -drive
format=qcow2,aio=native.

I noticed that I forgot to include a second backtrace in the commit
description.  I am resending the patch series with the missing
backtrace.  Hopefully that will make the bug clearer.

Stefan
