* [Qemu-devel] strange crash in tracked_request_begin
@ 2016-03-07 12:29 Christian Borntraeger
2016-03-07 13:35 ` Paolo Bonzini
2016-03-07 17:01 ` Stefan Hajnoczi
0 siblings, 2 replies; 6+ messages in thread
From: Christian Borntraeger @ 2016-03-07 12:29 UTC (permalink / raw)
To: qemu-block; +Cc: Paolo Bonzini, qemu-devel, Stefan Hajnoczi
Folks,
I had a crash of a qemu guest in tracked_request_begin.
The testcase was a guest with ramdisk/kernel that reboots in a
loop. (about 10 times per second) with a single null-co disk
attached. No idea how to reproduce this, seems to be a lucky hit.
(gdb) bt
#0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, type=type@entry=BDRV_TRACKED_READ)
at /home/cborntra/REPOS/qemu/block/io.c:390
#1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>, flags@entry=(unknown: 0))
at /home/cborntra/REPOS/qemu/block/io.c:1001
#2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>)
at /home/cborntra/REPOS/qemu/block/io.c:1024
#3 bdrv_co_do_rw (opaque=0x3ff7400e370) at /home/cborntra/REPOS/qemu/block/io.c:2173
#4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>, i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
#5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
looking at the code we are at
QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
which translates to
if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
(&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
(&bs->tracked_requests)->lh_first = (req);
(req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
(gdb) print /x bs->tracked_requests
$6 = {lh_first = 0x0}
Now looking at the code I am asking myself if this can happen in parallel
to another code that touches tracked_requests, because gcc seems to read
&bs->tracked_requests)->lh_first twice (first to check the value, then
to use it as pointer)
388 qemu_co_queue_init(&req->wait_queue);
0x00000000101db594 <+76>: la %r2,72(%r13)
0x00000000101db598 <+80>: brasl %r14,0x1022cdc0 <qemu_co_queue_init>
389
390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
0x00000000101db59e <+86>: lg %r1,12744(%r12) # r1 = (&bs->tracked_requests)->lh_first)
0x00000000101db5a4 <+92>: stg %r1,48(%r13) # (req)->list.le_next = r1
0x00000000101db5aa <+98>: cgij %r1,0,8,0x101db5c0 ---+ # if r1==0 goto
0x00000000101db5b0 <+104>: lg %r1,12744(%r12) | # r1 = (&bs->tracked_requests)->lh_first) (again!!)
0x00000000101db5b6 <+110>: la %r2,48(%r13) |
=> 0x00000000101db5ba <+114>: stg %r2,56(%r1) | # r1==0 bang
0x00000000101db5c0 <+120>: stg %r13,12744(%r12)<-----+
0x00000000101db5c6 <+126>: lay %r12,12744(%r12)
0x00000000101db5cc <+132>: stg %r12,56(%r13)
Christian
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] strange crash in tracked_request_begin
2016-03-07 12:29 [Qemu-devel] strange crash in tracked_request_begin Christian Borntraeger
@ 2016-03-07 13:35 ` Paolo Bonzini
2016-03-07 13:40 ` Christian Borntraeger
2016-03-07 17:01 ` Stefan Hajnoczi
1 sibling, 1 reply; 6+ messages in thread
From: Paolo Bonzini @ 2016-03-07 13:35 UTC (permalink / raw)
To: Christian Borntraeger, qemu-block; +Cc: qemu-devel, Stefan Hajnoczi
On 07/03/2016 13:29, Christian Borntraeger wrote:
> Folks,
>
> I had a crash of a qemu guest in tracked_request_begin.
> The testcase was a guest with ramdisk/kernel that reboots in a
> loop. (about 10 times per second) with a single null-co disk
> attached.
Does it use a separate iothread?
Paolo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] strange crash in tracked_request_begin
2016-03-07 13:35 ` Paolo Bonzini
@ 2016-03-07 13:40 ` Christian Borntraeger
0 siblings, 0 replies; 6+ messages in thread
From: Christian Borntraeger @ 2016-03-07 13:40 UTC (permalink / raw)
To: Paolo Bonzini, qemu-block; +Cc: qemu-devel, Stefan Hajnoczi
On 03/07/2016 02:35 PM, Paolo Bonzini wrote:
>
>
> On 07/03/2016 13:29, Christian Borntraeger wrote:
>> Folks,
>>
>> I had a crash of a qemu guest in tracked_request_begin.
>> The testcase was a guest with ramdisk/kernel that reboots in a
>> loop. (about 10 times per second) with a single null-co disk
>> attached.
>
> Does it use a separate iothread?
yes
libvirt kludge (why is there no null-co/null-aio in libvirt?) looks like
<qemu:commandline>
<qemu:arg value='-drive'/>
<qemu:arg value='driver=null-co,id=null8,if=none,size=100G'/>
<qemu:arg value='-device'/>
<qemu:arg value='virtio-blk-ccw,scsi=off,drive=null8,serial=null8,iothread=iothread1'/>
</qemu:commandline>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] strange crash in tracked_request_begin
2016-03-07 12:29 [Qemu-devel] strange crash in tracked_request_begin Christian Borntraeger
2016-03-07 13:35 ` Paolo Bonzini
@ 2016-03-07 17:01 ` Stefan Hajnoczi
2016-03-07 19:00 ` Christian Borntraeger
1 sibling, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2016-03-07 17:01 UTC (permalink / raw)
To: Christian Borntraeger; +Cc: Paolo Bonzini, qemu-devel, qemu-block
[-- Attachment #1: Type: text/plain, Size: 3324 bytes --]
On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
> Folks,
>
> I had a crash of a qemu guest in tracked_request_begin.
> The testcase was a guest with ramdisk/kernel that reboots in a
> loop. (about 10 times per second) with a single null-co disk
> attached. No idea how to reproduce this, seems to be a lucky hit.
>
> (gdb) bt
> #0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, type=type@entry=BDRV_TRACKED_READ)
> at /home/cborntra/REPOS/qemu/block/io.c:390
> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>, flags@entry=(unknown: 0))
> at /home/cborntra/REPOS/qemu/block/io.c:1001
> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>)
> at /home/cborntra/REPOS/qemu/block/io.c:1024
> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at /home/cborntra/REPOS/qemu/block/io.c:2173
> #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>, i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
>
> looking at the code we are at
>
> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> which translates to
>
> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
> (&bs->tracked_requests)->lh_first = (req);
> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
>
> gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
> (gdb) print /x bs->tracked_requests
> $6 = {lh_first = 0x0}
>
> Now looking at the code I am asking myself if this can happen in parallel
> to another code that touches tracked_requests, because gcc seems to read
> &bs->tracked_requests)->lh_first twice (first to check the value, then
> to use it as pointer)
tracked_requests is protected by AioContext. Perhaps something is doing
I/O without acquiring AioContext?
Luckily there is only 1 place where items are added and removed from
tracked_requests. This might make debugging somewhat easier.
>
> 388 qemu_co_queue_init(&req->wait_queue);
> 0x00000000101db594 <+76>: la %r2,72(%r13)
> 0x00000000101db598 <+80>: brasl %r14,0x1022cdc0 <qemu_co_queue_init>
>
> 389
> 390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> 0x00000000101db59e <+86>: lg %r1,12744(%r12) # r1 = (&bs->tracked_requests)->lh_first)
> 0x00000000101db5a4 <+92>: stg %r1,48(%r13) # (req)->list.le_next = r1
> 0x00000000101db5aa <+98>: cgij %r1,0,8,0x101db5c0 ---+ # if r1==0 goto
> 0x00000000101db5b0 <+104>: lg %r1,12744(%r12) | # r1 = (&bs->tracked_requests)->lh_first) (again!!)
> 0x00000000101db5b6 <+110>: la %r2,48(%r13) |
> => 0x00000000101db5ba <+114>: stg %r2,56(%r1) | # r1==0 bang
> 0x00000000101db5c0 <+120>: stg %r13,12744(%r12)<-----+
> 0x00000000101db5c6 <+126>: lay %r12,12744(%r12)
> 0x00000000101db5cc <+132>: stg %r12,56(%r13)
>
>
> Christian
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] strange crash in tracked_request_begin
2016-03-07 17:01 ` Stefan Hajnoczi
@ 2016-03-07 19:00 ` Christian Borntraeger
2016-03-08 10:06 ` Stefan Hajnoczi
0 siblings, 1 reply; 6+ messages in thread
From: Christian Borntraeger @ 2016-03-07 19:00 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Paolo Bonzini, qemu-devel, qemu-block
On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
>> Folks,
>>
>> I had a crash of a qemu guest in tracked_request_begin.
>> The testcase was a guest with ramdisk/kernel that reboots in a
>> loop. (about 10 times per second) with a single null-co disk
>> attached. No idea how to reproduce this, seems to be a lucky hit.
>>
>> (gdb) bt
>> #0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, type=type@entry=BDRV_TRACKED_READ)
>> at /home/cborntra/REPOS/qemu/block/io.c:390
>> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>, flags@entry=(unknown: 0))
>> at /home/cborntra/REPOS/qemu/block/io.c:1001
>> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>)
>> at /home/cborntra/REPOS/qemu/block/io.c:1024
>> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at /home/cborntra/REPOS/qemu/block/io.c:2173
>> #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>, i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
>> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
>>
>> looking at the code we are at
>>
>> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> which translates to
>>
>> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
>> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
>> (&bs->tracked_requests)->lh_first = (req);
>> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
>>
>> gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
>> (gdb) print /x bs->tracked_requests
>> $6 = {lh_first = 0x0}
>>
>> Now looking at the code I am asking myself if this can happen in parallel
>> to another code that touches tracked_requests, because gcc seems to read
>> &bs->tracked_requests)->lh_first twice (first to check the value, then
>> to use it as pointer)
>
> tracked_requests is protected by AioContext. Perhaps something is doing
> I/O without acquiring AioContext?
Hmm, the guest was rebooting, which resets all devices. Maybe something
in that code is still not right? I will have a look.
>
> Luckily there is only 1 place where items are added and removed from
> tracked_requests. This might make debugging somewhat easier.
I have trouble reproducing the issue, which makes it hard :-/
>>
>> 388 qemu_co_queue_init(&req->wait_queue);
>> 0x00000000101db594 <+76>: la %r2,72(%r13)
>> 0x00000000101db598 <+80>: brasl %r14,0x1022cdc0 <qemu_co_queue_init>
>>
>> 389
>> 390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> 0x00000000101db59e <+86>: lg %r1,12744(%r12) # r1 = (&bs->tracked_requests)->lh_first)
>> 0x00000000101db5a4 <+92>: stg %r1,48(%r13) # (req)->list.le_next = r1
>> 0x00000000101db5aa <+98>: cgij %r1,0,8,0x101db5c0 ---+ # if r1==0 goto
>> 0x00000000101db5b0 <+104>: lg %r1,12744(%r12) | # r1 = (&bs->tracked_requests)->lh_first) (again!!)
>> 0x00000000101db5b6 <+110>: la %r2,48(%r13) |
>> => 0x00000000101db5ba <+114>: stg %r2,56(%r1) | # r1==0 bang
>> 0x00000000101db5c0 <+120>: stg %r13,12744(%r12)<-----+
>> 0x00000000101db5c6 <+126>: lay %r12,12744(%r12)
>> 0x00000000101db5cc <+132>: stg %r12,56(%r13)
>>
>>
>> Christian
>>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] strange crash in tracked_request_begin
2016-03-07 19:00 ` Christian Borntraeger
@ 2016-03-08 10:06 ` Stefan Hajnoczi
0 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2016-03-08 10:06 UTC (permalink / raw)
To: Christian Borntraeger; +Cc: Paolo Bonzini, qemu-devel, qemu-block
[-- Attachment #1: Type: text/plain, Size: 2741 bytes --]
On Mon, Mar 07, 2016 at 08:00:49PM +0100, Christian Borntraeger wrote:
> On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> > On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
> >> Folks,
> >>
> >> I had a crash of a qemu guest in tracked_request_begin.
> >> The testcase was a guest with ramdisk/kernel that reboots in a
> >> loop. (about 10 times per second) with a single null-co disk
> >> attached. No idea how to reproduce this, seems to be a lucky hit.
> >>
> >> (gdb) bt
> >> #0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, type=type@entry=BDRV_TRACKED_READ)
> >> at /home/cborntra/REPOS/qemu/block/io.c:390
> >> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, bytes=4096, qiov=0x3ff7400cbd8, flags=<optimized out>, flags@entry=(unknown: 0))
> >> at /home/cborntra/REPOS/qemu/block/io.c:1001
> >> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), qiov=<optimized out>, nb_sectors=<optimized out>, sector_num=<optimized out>, bs=<optimized out>)
> >> at /home/cborntra/REPOS/qemu/block/io.c:1024
> >> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at /home/cborntra/REPOS/qemu/block/io.c:2173
> >> #4 0x000000001022d8f6 in coroutine_trampoline (i0=<optimized out>, i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
> >> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
> >>
> >> looking at the code we are at
> >>
> >> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> >> which translates to
> >>
> >> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
> >> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
> >> (&bs->tracked_requests)->lh_first = (req);
> >> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
> >>
> >> gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
> >> (gdb) print /x bs->tracked_requests
> >> $6 = {lh_first = 0x0}
> >>
> >> Now looking at the code I am asking myself if this can happen in parallel
> >> to another code that touches tracked_requests, because gcc seems to read
> >> &bs->tracked_requests)->lh_first twice (first to check the value, then
> >> to use it as pointer)
> >
> > tracked_requests is protected by AioContext. Perhaps something is doing
> > I/O without acquiring AioContext?
>
> Hmm, the guest was rebooting, which resets all devices. Maybe something
> in that code is still not right? I will have a look.
virtio_blk_reset() does acquire AioContext so at least that part should
be safe with running IOThreads.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-03-08 10:06 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-03-07 12:29 [Qemu-devel] strange crash in tracked_request_begin Christian Borntraeger
2016-03-07 13:35 ` Paolo Bonzini
2016-03-07 13:40 ` Christian Borntraeger
2016-03-07 17:01 ` Stefan Hajnoczi
2016-03-07 19:00 ` Christian Borntraeger
2016-03-08 10:06 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).