From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Cornelia Huck <cornelia.huck@de.ibm.com>, Kevin Wolf <kwolf@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
Fam Zheng <famz@redhat.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests
Date: Tue, 8 Dec 2015 13:30:00 +0100 [thread overview]
Message-ID: <5666CD48.4050600@de.ibm.com> (raw)
In-Reply-To: <20151208130008.5d0fc318.cornelia.huck@de.ibm.com>
On 12/08/2015 01:00 PM, Cornelia Huck wrote:
> On Tue, 8 Dec 2015 10:59:54 +0100
> Kevin Wolf <kwolf@redhat.com> wrote:
>
>> Am 07.12.2015 um 17:42 hat Cornelia Huck geschrieben:
>>> On Mon, 7 Dec 2015 11:02:51 +0100
>>> Cornelia Huck <cornelia.huck@de.ibm.com> wrote:
>>>
>>>> On Thu, 3 Dec 2015 13:00:00 +0800
>>>> Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>
>>>>> From: Fam Zheng <famz@redhat.com>
>>>>>
>>>>> The assertion problem was noticed in 06c3916b35a, but it wasn't
>>>>> completely fixed, because even though the req is not marked as
>>>>> serialising, it still gets serialised by wait_serialising_requests
>>>>> against other serialising requests, which could lead to the same
>>>>> assertion failure.
>>>>>
>>>>> Fix it by even more explicitly skipping the serialising for this
>>>>> specific case.
>>>>>
>>>>> Signed-off-by: Fam Zheng <famz@redhat.com>
>>>>> Message-id: 1448962590-2842-2-git-send-email-famz@redhat.com
>>>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>> ---
>>>>> block/backup.c | 2 +-
>>>>> block/io.c | 12 +++++++-----
>>>>> include/block/block.h | 4 ++--
>>>>> trace-events | 2 +-
>>>>> 4 files changed, 11 insertions(+), 9 deletions(-)
>>>>
>>>> This one causes segfaults for me:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071
>>>> 3071 if (!drv) {
>>>>
>>>> (gdb) bt
>>>> #0 bdrv_is_inserted (bs=0x800000000000) at /data/git/yyy/qemu/block.c:3071
>>
>> This looks like some kind of memory corruption that hit blk->bs. It's
>> most definitely not a valid pointer anyway.
>>
>>>> #1 0x0000000080216974 in blk_is_inserted (blk=<optimized out>)
>>>> at /data/git/yyy/qemu/block/block-backend.c:986
>>>> #2 0x00000000802169c6 in blk_is_available (blk=blk@entry=0x3ffb17e7960)
>>>> at /data/git/yyy/qemu/block/block-backend.c:991
>>>> #3 0x0000000080216d12 in blk_check_byte_request (blk=blk@entry=0x3ffb17e7960,
>>>> offset=offset@entry=4928966656, size=16384)
>>>> at /data/git/yyy/qemu/block/block-backend.c:558
>>>> #4 0x0000000080216df2 in blk_check_request (blk=blk@entry=0x3ffb17e7960,
>>>> sector_num=sector_num@entry=9626888, nb_sectors=nb_sectors@entry=32)
>>>> at /data/git/yyy/qemu/block/block-backend.c:589
>>>> #5 0x0000000080217ee8 in blk_aio_readv (blk=0x3ffb17e7960, sector_num=
>>>> 9626888, iov=0x8098c658, nb_sectors=<optimized out>, cb=
>>>> 0x80081150 <virtio_blk_rw_complete>, opaque=0x80980620)
>>>> at /data/git/yyy/qemu/block/block-backend.c:727
>>>> #6 0x000000008008186e in submit_requests (niov=<optimized out>,
>>>> num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
>>>> blk=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:366
>>>> #7 virtio_blk_submit_multireq (mrb=<optimized out>, blk=<optimized out>)
>>>> at /data/git/yyy/qemu/hw/block/virtio-blk.c:444
>>>> #8 virtio_blk_submit_multireq (blk=0x3ffb17e7960, mrb=0x3ffffffeb58)
>>>> at /data/git/yyy/qemu/hw/block/virtio-blk.c:389
>>>> #9 0x00000000800823ee in virtio_blk_handle_output (vdev=<optimized out>,
>>>> vq=<optimized out>) at /data/git/yyy/qemu/hw/block/virtio-blk.c:615
>>>> #10 0x00000000801e367e in aio_dispatch (ctx=0x80918520)
>>>> at /data/git/yyy/qemu/aio-posix.c:326
>>>> #11 0x00000000801d28b0 in aio_ctx_dispatch (source=<optimized out>,
>>>> callback=<optimized out>, user_data=<optimized out>)
>>>> at /data/git/yyy/qemu/async.c:231
>>>> #12 0x000003fffd36a05a in g_main_context_dispatch ()
>>>> from /lib64/libglib-2.0.so.0
>>>> #13 0x00000000801e0ffa in glib_pollfds_poll ()
>>>> at /data/git/yyy/qemu/main-loop.c:211
>>>> #14 os_host_main_loop_wait (timeout=<optimized out>)
>>>> at /data/git/yyy/qemu/main-loop.c:256
>>>> #15 main_loop_wait (nonblocking=<optimized out>)
>>>> at /data/git/yyy/qemu/main-loop.c:504
>>>> #16 0x00000000800148a6 in main_loop () at /data/git/yyy/qemu/vl.c:1923
>>>> #17 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
>>>> at /data/git/yyy/qemu/vl.c:4684
>>>>
>>>> Relevant part of command line:
>>>>
>>>> -drive file=/dev/sda,if=none,id=drive-virtio-disk0,format=raw,serial=ccwzfcp1,cache=none -device virtio-blk-ccw,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,scsi=off
>>>
>>> I played around a bit. The main part of this change seems to be calling
>>> wait_serialising_requests() conditionally; reverting this makes the
>>> guest boot again.
>>>
>>> I then tried to find out when wait_serialising_requests() was NOT
>>> called and added fprintfs: well, it was _always_ called. I then added a
>>> fprintf for flags at the beginning of the function: this produced a
>>> segfault no matter whether wait_serialising_requests() was called
>>> conditionally or unconditionally. Weird race?
>>>
>>> Anything further I can do? I guess this patch fixes a bug for someone,
>>> but it means insta-death for my setup...
>>
>> If it happens immediately, perhaps running under valgrind is possible
>> and could give some hints about what happened with blk->bs?
>
> Just a quick update: This triggers on a qemu built with a not-so-fresh
> gcc 4.7.2 (and it seems to depend on compiler optimizations as well).
> We can't trigger it with newer gccs. No hints from valgrind, sadly.
> Investigation ongoing.
>
>
Some updates after looking at Cornelias system. It seem to be related to
the F18 gcc that is still on that test system.
Problem happens when hw/block/virtio-blk.c is compiled
with -O2 and goes away with -O1 and -O0 (I trimmed that down to
-fexpensive-optimizations)
The system calls virtio_blk_submit_multireq 26 times and then it messes
up the blk pointer:
(gdb) display blk->name
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb) next
419 if (niov + req->qiov.niov > IOV_MAX) {
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb)
424 if (req->qiov.size / BDRV_SECTOR_SIZE + nb_sectors > max_xfer_len) {
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb)
419 if (niov + req->qiov.niov > IOV_MAX) {
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb)
429 if (sector_num + nb_sectors != req->sector_num) {
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb)
434 submit_requests(blk, mrb, start, num_reqs, niov);
1: blk->name = 0x80894300 "drive-virtio-disk0"
(gdb)
413 for (i = 0; i < mrb->num_reqs; i++) {
1: blk->name = 0x8089ae40 ""
uninlining submit_request makes the problem go away, so might be a
gcc bug in Fedora18. I am now going to look at the disassembly to be
sure.
Christian
next prev parent reply other threads:[~2015-12-08 12:30 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-03 4:59 [Qemu-devel] [PULL for-2.5 0/4] Block patches Stefan Hajnoczi
2015-12-03 4:59 ` [Qemu-devel] [PULL for-2.5 1/4] iothread: include id in thread name Stefan Hajnoczi
2015-12-07 9:58 ` Cornelia Huck
2015-12-03 5:00 ` [Qemu-devel] [PULL for-2.5 2/4] block: Don't wait serialising for non-COR read requests Stefan Hajnoczi
2015-12-07 10:02 ` Cornelia Huck
2015-12-07 16:42 ` Cornelia Huck
2015-12-08 2:08 ` Fam Zheng
2015-12-08 9:59 ` Kevin Wolf
2015-12-08 12:00 ` Cornelia Huck
2015-12-08 12:30 ` Christian Borntraeger [this message]
2015-12-08 13:28 ` Christian Borntraeger
2015-12-08 13:45 ` Kevin Wolf
2015-12-08 13:58 ` Christian Borntraeger
2015-12-08 14:03 ` Christian Borntraeger
2015-12-08 14:10 ` Kevin Wolf
2015-12-08 14:24 ` Christian Borntraeger
2015-12-08 14:38 ` Cornelia Huck
2015-12-08 14:15 ` Peter Maydell
2015-12-03 5:00 ` [Qemu-devel] [PULL for-2.5 3/4] iotests: Add "add_drive_raw" method Stefan Hajnoczi
2015-12-03 5:00 ` [Qemu-devel] [PULL for-2.5 4/4] iotests: Add regresion test case for write notifier assertion failure Stefan Hajnoczi
2015-12-03 11:53 ` [Qemu-devel] [PULL for-2.5 0/4] Block patches Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5666CD48.4050600@de.ibm.com \
--to=borntraeger@de.ibm.com \
--cc=cornelia.huck@de.ibm.com \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).