qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Fam Zheng <famz@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests
Date: Tue, 8 Dec 2015 14:45:22 +0100	[thread overview]
Message-ID: <20151208134522.GE5071@noname.str.redhat.com> (raw)
In-Reply-To: <5666DB13.7020501@de.ibm.com>

Am 08.12.2015 um 14:28 hat Christian Borntraeger geschrieben:
> On 12/08/2015 01:30 PM, Christian Borntraeger wrote:
> > On 12/08/2015 01:00 PM, Cornelia Huck wrote:
> >> On Tue, 8 Dec 2015 10:59:54 +0100
> >> Kevin Wolf <kwolf@redhat.com> wrote:
> >>
> >>> Am 07.12.2015 um 17:42 hat Cornelia Huck geschrieben:
> >>>> On Mon, 7 Dec 2015 11:02:51 +0100
> >>>> Cornelia Huck <cornelia.huck@de.ibm.com> wrote:
> >>>>
> >>>>> On Thu,  3 Dec 2015 13:00:00 +0800
> >>>>> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>>>>
> >>>>>> From: Fam Zheng <famz@redhat.com>
> >>>>>>
> >>>>>> The assertion problem was noticed in 06c3916b35a, but it wasn't
> >>>>>> completely fixed, because even though the req is not marked as
> >>>>>> serialising, it still gets serialised by wait_serialising_requests
> >>>>>> against other serialising requests, which could lead to the same
> >>>>>> assertion failure.
> >>>>>>
> >>>>>> Fix it by even more explicitly skipping the serialising for this
> >>>>>> specific case.
> >>>>>>
> >>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
> >>>>>> Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
> >>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >>>>>> ---
> >>>>>>  block/backup.c        |  2 +-
> >>>>>>  block/io.c            | 12 +++++++-----
> >>>>>>  include/block/block.h |  4 ++--
> >>>>>>  trace-events          |  2 +-
> >>>>>>  4 files changed, 11 insertions(+), 9 deletions(-)
> >>>>>
> >>>>> This one causes segfaults for me:
> >>>>>
> >>>>> Program received signal SIGSEGV, Segmentation fault.
> >>>>> bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071
> >>>>> 3071	    if (!drv) {
> >>>>>
> >>>>> (gdb) bt
> >>>>> #0  bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071
> >>>
> >>> This looks like some kind of memory corruption that hit blk->bs. It's
> >>> most definitely not a valid pointer anyway.
> >>>
> >>>>> #1  0x0000000080216974 in blk_is_inserted (blk=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/block/block-backend.c:986
> >>>>> #2  0x00000000802169c6 in blk_is_available (blk=blk@entry=0x3ffb17e7960)
> >>>>>     at /data/git/yyy/qemu/block/block-backend.c:991
> >>>>> #3  0x0000000080216d12 in blk_check_byte_request (blk=blk@entry=0x3ffb17e7960, 
> >>>>>     offset=offset@entry=4928966656, size=16384)
> >>>>>     at /data/git/yyy/qemu/block/block-backend.c:558
> >>>>> #4  0x0000000080216df2 in blk_check_request (blk=blk@entry=0x3ffb17e7960, 
> >>>>>     sector_num=sector_num@entry=9626888, nb_sectors=nb_sectors@entry=32)
> >>>>>     at /data/git/yyy/qemu/block/block-backend.c:589
> >>>>> #5  0x0000000080217ee8 in blk_aio_readv (blk=0x3ffb17e7960, sector_num=
> >>>>>     9626888, iov=0x8098c658, nb_sectors=<optimized out>, cb=
> >>>>>     0x80081150 <virtio_blk_rw_complete>, opaque=0x80980620)
> >>>>>     at /data/git/yyy/qemu/block/block-backend.c:727
> >>>>> #6  0x000000008008186e in submit_requests (niov=<optimized out>, 
> >>>>>     num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>, 
> >>>>>     blk=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:366
> >>>>> #7  virtio_blk_submit_multireq (mrb=<optimized out>, blk=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/hw/block/virtio-blk.c:444
> >>>>> #8  virtio_blk_submit_multireq (blk=0x3ffb17e7960, mrb=0x3ffffffeb58)
> >>>>>     at /data/git/yyy/qemu/hw/block/virtio-blk.c:389
> >>>>> #9  0x00000000800823ee in virtio_blk_handle_output (vdev=<optimized out>, 
> >>>>>     vq=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:615
> >>>>> #10 0x00000000801e367e in aio_dispatch (ctx=0x80918520)
> >>>>>     at /data/git/yyy/qemu/aio-posix.c:326
> >>>>> #11 0x00000000801d28b0 in aio_ctx_dispatch (source=<optimized out>, 
> >>>>>     callback=<optimized out>, user_data=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/async.c:231
> >>>>> #12 0x000003fffd36a05a in g_main_context_dispatch ()
> >>>>>    from /lib64/libglib-2.0.so.0
> >>>>> #13 0x00000000801e0ffa in glib_pollfds_poll ()
> >>>>>     at /data/git/yyy/qemu/main-loop.c:211
> >>>>> #14 os_host_main_loop_wait (timeout=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/main-loop.c:256
> >>>>> #15 main_loop_wait (nonblocking=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/main-loop.c:504
> >>>>> #16 0x00000000800148a6 in main_loop () at /data/git/yyy/qemu/vl.c:1923
> >>>>> #17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
> >>>>>     at /data/git/yyy/qemu/vl.c:4684
> >>>>>
> >>>>> Relevant part of command line:
> >>>>>
> >>>>> -drive file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none -device virtio-blk-ccw,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off
> >>>>
> >>>> I played around a bit. The main part of this change seems to be calling
> >>>> wait_serialising_requests() conditionally; reverting this makes the
> >>>> guest boot again.
> >>>>
> >>>> I then tried to find out when wait_serialising_requests() was NOT
> >>>> called and added fprintfs: well, it was _always_ called. I then added a
> >>>> fprintf for flags at the beginning of the function: this produced a
> >>>> segfault no matter whether wait_serialising_requests() was called
> >>>> conditionally or unconditionally. Weird race?
> >>>>
> >>>> Anything further I can do? I guess this patch fixes a bug for someone,
> >>>> but it means insta-death for my setup...
> >>>
> >>> If it happens immediately, perhaps running under valgrind is possible
> >>> and could give some hints about what happened with blk->bs?
> >>
> >> Just a quick update: This triggers on a qemu built with a not-so-fresh
> >> gcc 4.7.2 (and it seems to depend on compiler optimizations as well).
> >> We can't trigger it with newer gccs. No hints from valgrind, sadly.
> >> Investigation ongoing.
> >>
> >>
> > 
> > Some updates after looking at Cornelias system. It seem to be related to 
> > the F18 gcc that is still on that test system.
> > 
> > Problem happens when hw/block/virtio-blk.c is compiled
> > with -O2 and goes away with -O1 and -O0 (I trimmed that down to
> > -fexpensive-optimizations) 
> > 
> > The system calls virtio_blk_submit_multireq 26 times and then it messes
> > up the blk pointer:
> > 
> > (gdb) display blk->name
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) next
> > 419	            if (niov + req->qiov.niov > IOV_MAX) {
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) 
> > 424	            if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len) {
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) 
> > 419	            if (niov + req->qiov.niov > IOV_MAX) {
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) 
> > 429	            if (sector_num + nb_sectors != req->sector_num) {
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) 
> > 434	                submit_requests(blk, mrb, start, num_reqs, niov);
> > 1: blk->name = 0x80894300 "drive-virtio-disk0"
> > (gdb) 
> > 413	    for (i = 0; i < mrb->num_reqs; i++) {
> > 1: blk->name = 0x8089ae40 ""
> > 
> > 
> > uninlining submit_request makes the problem go away, so might be a
> > gcc bug in Fedora18. I am now going to look at the disassembly to be
> > sure.
> 
> Not a compiler bug. gcc uses a floating point register 8 to spill
> the pointer of blk (which is call saved) submit_request will later
> on call  qemu_coroutine_enter and after returning from 
> qemu_coroutine_enter, the fpr8 contains junk. Not sure yet, what happened.

Coroutines don't save the FPU state, so you're not supposed to use
floating point operations inside coroutines. That the compiler spills
some integer value into a floating point register is a bit nasty...

Kevin

  reply	other threads:[~2015-12-08 13:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-03  4:59 [Qemu-devel] [PULL for-2.5 0/4] Block patches Stefan Hajnoczi
2015-12-03  4:59 ` [Qemu-devel] [PULL for-2.5 1/4] iothread: include id in thread name Stefan Hajnoczi
2015-12-07  9:58   ` Cornelia Huck
2015-12-03  5:00 ` [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests Stefan Hajnoczi
2015-12-07 10:02   ` Cornelia Huck
2015-12-07 16:42     ` Cornelia Huck
2015-12-08  2:08       ` Fam Zheng
2015-12-08  9:59       ` Kevin Wolf
2015-12-08 12:00         ` Cornelia Huck
2015-12-08 12:30           ` Christian Borntraeger
2015-12-08 13:28             ` Christian Borntraeger
2015-12-08 13:45               ` Kevin Wolf [this message]
2015-12-08 13:58                 ` Christian Borntraeger
2015-12-08 14:03                   ` Christian Borntraeger
2015-12-08 14:10                   ` Kevin Wolf
2015-12-08 14:24                     ` Christian Borntraeger
2015-12-08 14:38                       ` Cornelia Huck
2015-12-08 14:15                 ` Peter Maydell
2015-12-03  5:00 ` [Qemu-devel] [PULL for-2.5 3/4] iotests: Add "add_drive_raw" method Stefan Hajnoczi
2015-12-03  5:00 ` [Qemu-devel] [PULL for-2.5 4/4] iotests: Add regresion test case for write notifier assertion failure Stefan Hajnoczi
2015-12-03 11:53 ` [Qemu-devel] [PULL for-2.5 0/4] Block patches Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151208134522.GE5071@noname.str.redhat.com \
    --to=kwolf@redhat.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cornelia.huck@de.ibm.com \
    --cc=famz@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).