Date: Tue, 8 Dec 2015 10:59:54 +0100
From: Kevin Wolf
Message-ID: <20151208095954.GD5071@noname.str.redhat.com>
References: <1449118802-12047-1-git-send-email-stefanha@redhat.com>
 <1449118802-12047-3-git-send-email-stefanha@redhat.com>
 <20151207110251.6391b306.cornelia.huck@de.ibm.com>
 <20151207174229.4edc6004.cornelia.huck@de.ibm.com>
In-Reply-To: <20151207174229.4edc6004.cornelia.huck@de.ibm.com>
Subject: Re: [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests
To: Cornelia Huck
Cc: Peter Maydell, Fam Zheng, qemu-devel@nongnu.org, Stefan Hajnoczi

On 07.12.2015 at 17:42, Cornelia Huck wrote:
> On Mon, 7 Dec 2015 11:02:51 +0100
> Cornelia Huck wrote:
>
> > On Thu, 3 Dec 2015 13:00:00 +0800
> > Stefan Hajnoczi wrote:
> >
> > > From: Fam Zheng
> > >
> > > The assertion problem was noticed in 06c3916b35a, but it wasn't
> > > completely fixed, because even though the req is not marked as
> > > serialising, it still gets serialised by wait_serialising_requests
> > > against other serialising requests, which could lead to the same
> > > assertion failure.
> > >
> > > Fix it by even more explicitly skipping the serialising for this
> > > specific case.
> > >
> > > Signed-off-by: Fam Zheng
> > > Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
> > > Signed-off-by: Stefan Hajnoczi
> > > ---
> > >  block/backup.c        |  2 +-
> > >  block/io.c            | 12 +++++++-----
> > >  include/block/block.h |  4 ++--
> > >  trace-events          |  2 +-
> > >  4 files changed, 11 insertions(+), 9 deletions(-)
> >
> > This one causes segfaults for me:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071
> > 3071        if (!drv) {
> >
> > (gdb) bt
> > #0  bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071

This looks like some kind of memory corruption that hit blk->bs. It's
most definitely not a valid pointer anyway.

> > #1  0x0000000080216974 in blk_is_inserted (blk=)
> >     at /data/git/yyy/qemu/block/block-backend.c:986
> > #2  0x00000000802169c6 in blk_is_available (blk=blk@entry=0x3ffb17e7960)
> >     at /data/git/yyy/qemu/block/block-backend.c:991
> > #3  0x0000000080216d12 in blk_check_byte_request (blk=blk@entry=0x3ffb17e7960,
> >     offset=offset@entry=4928966656, size=16384)
> >     at /data/git/yyy/qemu/block/block-backend.c:558
> > #4  0x0000000080216df2 in blk_check_request (blk=blk@entry=0x3ffb17e7960,
> >     sector_num=sector_num@entry=9626888, nb_sectors=nb_sectors@entry=32)
> >     at /data/git/yyy/qemu/block/block-backend.c:589
> > #5  0x0000000080217ee8 in blk_aio_readv (blk=0x3ffb17e7960, sector_num=
> >     9626888, iov=0x8098c658, nb_sectors=, cb=
> >     0x80081150 , opaque=0x80980620)
> >     at /data/git/yyy/qemu/block/block-backend.c:727
> > #6  0x000000008008186e in submit_requests (niov=,
> >     num_reqs=, start=, mrb=,
> >     blk=) at /data/git/yyy/qemu/hw/block/virtio-blk.c:366
> > #7  virtio_blk_submit_multireq (mrb=, blk=)
> >     at /data/git/yyy/qemu/hw/block/virtio-blk.c:444
> > #8  virtio_blk_submit_multireq (blk=0x3ffb17e7960, mrb=0x3ffffffeb58)
> >     at /data/git/yyy/qemu/hw/block/virtio-blk.c:389
> > #9  0x00000000800823ee in virtio_blk_handle_output (vdev=,
> >     vq=) at /data/git/yyy/qemu/hw/block/virtio-blk.c:615
> > #10 0x00000000801e367e in aio_dispatch (ctx=0x80918520)
> >     at /data/git/yyy/qemu/aio-posix.c:326
> > #11 0x00000000801d28b0 in aio_ctx_dispatch (source=,
> >     callback=, user_data=)
> >     at /data/git/yyy/qemu/async.c:231
> > #12 0x000003fffd36a05a in g_main_context_dispatch ()
> >     from /lib64/libglib-2.0.so.0
> > #13 0x00000000801e0ffa in glib_pollfds_poll ()
> >     at /data/git/yyy/qemu/main-loop.c:211
> > #14 os_host_main_loop_wait (timeout=)
> >     at /data/git/yyy/qemu/main-loop.c:256
> > #15 main_loop_wait (nonblocking=)
> >     at /data/git/yyy/qemu/main-loop.c:504
> > #16 0x00000000800148a6 in main_loop () at /data/git/yyy/qemu/vl.c:1923
> > #17 main (argc=, argv=, envp=)
> >     at /data/git/yyy/qemu/vl.c:4684
> >
> > Relevant part of command line:
> >
> > -drive file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none -device virtio-blk-ccw,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off
>
> I played around a bit. The main part of this change seems to be calling
> wait_serialising_requests() conditionally; reverting this makes the
> guest boot again.
>
> I then tried to find out when wait_serialising_requests() was NOT
> called and added fprintfs: well, it was _always_ called. I then added a
> fprintf for flags at the beginning of the function: this produced a
> segfault no matter whether wait_serialising_requests() was called
> conditionally or unconditionally. Weird race?
>
> Anything further I can do? I guess this patch fixes a bug for someone,
> but it means insta-death for my setup...

If it happens immediately, perhaps running under valgrind is possible
and could give some hints about what happened with blk->bs?

Kevin