From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46980) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1botOo-0004hX-R4 for qemu-devel@nongnu.org; Tue, 27 Sep 2016 10:30:27 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1botOj-0004U7-3p for qemu-devel@nongnu.org; Tue, 27 Sep 2016 10:30:21 -0400 Received: from mail-ua0-x236.google.com ([2607:f8b0:400c:c08::236]:33888) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1botOi-0004Tn-Uh for qemu-devel@nongnu.org; Tue, 27 Sep 2016 10:30:17 -0400 Received: by mail-ua0-x236.google.com with SMTP id u68so12259101uau.1 for ; Tue, 27 Sep 2016 07:30:16 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <1474985217-21690-4-git-send-email-stefanha@redhat.com> References: <1474985217-21690-1-git-send-email-stefanha@redhat.com> <1474985217-21690-4-git-send-email-stefanha@redhat.com> From: Roman Penyaev Date: Tue, 27 Sep 2016 16:29:55 +0200 Message-ID: Content-Type: text/plain; charset=UTF-8 Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: qemu-devel , Kevin Wolf , qemu-block@nongnu.org, Fam Zheng Hey Stefan, On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi wrote: > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process > completions from ioq_submit()") added an optimization that processes > completions each time ioq_submit() returns with requests in flight. > This commit introduces a "Co-routine re-entered recursively" error which > can be triggered with -drive format=qcow2,aio=native. > > Fam Zheng , Kevin Wolf , and I > debugged the following backtrace: > > (gdb) bt > #0 0x00007ffff0a046f5 in raise () at /lib64/libc.so.6 > #1 0x00007ffff0a062fa in abort () at /lib64/libc.so.6 > #2 0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113 > #3 0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218 > #4 0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331 > #5 0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383 > #6 0x0000555555a4bbd3 in laio_co_submit (bs=, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402 > #7 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804 > #8 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041 > #9 0x0000555555a52db8 in bdrv_co_preadv (child=, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133 > #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=) at block/qcow2.c:1509 > #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804 > #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041 > #13 0x0000555555a52db8 in bdrv_co_preadv (child=, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133 > #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783 > #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991 > #16 0x0000555555ac0cfa in coroutine_trampoline (i0=, i1=) at util/coroutine-ucontext.c:78 > > It turned out that re-entrant ioq_submit() and completion processing > between three requests caused this error. The following check is not > sufficient to prevent recursively entering coroutines: > > if (laiocb->co != qemu_coroutine_self()) { > qemu_coroutine_enter(laiocb->co); > } > > As the following coroutine backtrace shows, not just the current > coroutine (self) can be entered. There might also be other coroutines > that are currently entered and transferred control due to the qcow2 lock > (CoMutex): I doubt that that was introduced by the commit you've specified: 0ed93d84edab. Before my patch coroutine was unconditionally entered. The following is what was changed by 0ed93d84edab: if (laiocb->co) { - qemu_coroutine_enter(laiocb->co); + /* Jump and continue completion for foreign requests, don't do + * anything for current request, it will be completed shortly. */ + if (laiocb->co != qemu_coroutine_self()) { + qemu_coroutine_enter(laiocb->co); + } If you have a strong reproduction, could you please verify that. What worries me is the following: 1. Issue was introduced before and was unnoticed (ok). 2. Issue - is something else and/or was really introduced by commit 0ed93d84edab (not ok). Of course the 2. is not nice. Thanks. -- Roman