From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56634)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpBGA-00024Y-Op
	for qemu-devel@nongnu.org; Wed, 28 Sep 2016 05:34:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1bpBG9-0002Ay-QJ
	for qemu-devel@nongnu.org; Wed, 28 Sep 2016 05:34:38 -0400
Date: Wed, 28 Sep 2016 17:34:24 +0800
From: Fam Zheng
Message-ID: <20160928093424.GA11641@lemon>
References: <1474985217-21690-1-git-send-email-stefanha@redhat.com>
 <1474985217-21690-4-git-send-email-stefanha@redhat.com>
 <20160927152538.GC2835@stefanha-x1.localdomain>
 <20160928030118.GH1284@lemon>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
To: Roman Penyaev
Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block@nongnu.org

On Wed, 09/28 11:14, Roman Penyaev wrote:
> On Wed, Sep 28, 2016 at 5:01 AM, Fam Zheng wrote:
> > On Tue, 09/27 19:55, Roman Penyaev wrote:
> >> > The bug is 100% deterministic.  Just boot up a guest with -drive
> >> > format=qcow2,aio=native.
> >>
> >> It turns out to be that everything is broken.  I started all my
> >> tests with format=raw,aio=native and immediately got coroutine
> >> recursive.  That is completely weird.
> >>
> >> So, what I did is the following:
> >>
> >> 1. Took latest master (nothing works)
> >> 2. Did interactive rebase to 12c8720
> >>    12c8720 2016-06-28 | Merge remote-tracking branch
> >>    'remotes/stefanha/tags/block-pull-request' into staging [Peter
> >>    Maydell]
> >>
> >>    this merge request includes all your patches related to
> >>    virtio-blk and MQ support.
> >>
> >> 3. Applied 0ed93d84edab.  Everything works fine.
> >
> > Have you tried qcow2 at this point? raw crashes with 1a62d0accdf85 doesn't mean
> > qcow2 is fine without it.
> >
>
> That's true.  qcow2 IO path is different, and presence of the
> patch 1a62d0accdf85 does not affect - coroutine still enters
> recursively.
>
> But for me it is quite surprising that IO fragmentation (what
> was done in 1a62d0accdf85) rises the misbehavior on raw IO path.

Maybe the explanation for this mystery is that your particular I/O
pattern on the raw image changes after that commit, from ioq = 1 to
ioq > 1 (from linux-aio.c's PoV, due to fragmentation). Multiple
coroutines are then created for one big request, which triggers the
crash.

Fam

>
> But of course originally issue was introduced by me.  Stefan,
> thanks for a fix.
>
> --
> Roman
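
A toy model of the ioq = 1 vs. ioq > 1 hypothesis above, in case it helps
to see it spelled out. This is NOT the actual linux-aio.c code, and every
name in it (toy_queue, toy_run, the guard flag) is made up for
illustration. It only assumes that a completion callback can end up
polling for completions again, and it counts how many events get handled
from inside such a nested pass. The guard is just one possible way to
refuse nesting; it is not meant to describe what the patch under
discussion does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not QEMU's data structures. */
#define TOY_MAX_EVENTS 64

struct toy_queue {
    int pending;         /* completed events not yet handled */
    int depth;           /* current nesting of process_completions() */
    int handled;         /* events handled in total */
    int nested_handled;  /* events handled while depth > 1 */
    bool guard;          /* if true, nested passes return immediately */
};

struct toy_result {
    int handled;
    int nested_handled;
};

static void process_completions(struct toy_queue *q);

/* Stand-in for a completion callback: assume that resuming the request
 * leads back into completion processing before the outer pass is done. */
static void complete_one(struct toy_queue *q)
{
    process_completions(q);
}

static void process_completions(struct toy_queue *q)
{
    if (q->guard && q->depth > 0) {
        return;  /* leave the remaining events to the outer pass */
    }

    q->depth++;
    while (q->pending > 0) {
        q->pending--;
        q->handled++;
        if (q->depth > 1) {
            q->nested_handled++;  /* handled from inside a callback */
        }
        complete_one(q);
    }
    q->depth--;
}

/* Handle `events` pending completions and report what happened. */
struct toy_result toy_run(int events, bool guard)
{
    struct toy_queue q = { .pending = events, .guard = guard };
    struct toy_result r;

    assert(events >= 0 && events <= TOY_MAX_EVENTS);
    process_completions(&q);

    r.handled = q.handled;
    r.nested_handled = q.nested_handled;
    return r;
}
```

With a single pending event the nested pass finds nothing to do, so
toy_run(1, false) reports no nested handling. With several pending events
and no guard, all but the first are handled from inside a callback. That
is the kind of situation that fragmenting one big request into several
could newly expose on the raw path, if the hypothesis holds.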