References: <56DD7414.9080306@de.ibm.com> <20160307170139.GB26074@stefanha-x1.localdomain>
From: Christian Borntraeger
Message-ID: <56DDCFE1.4000808@de.ibm.com>
Date: Mon, 7 Mar 2016 20:00:49 +0100
MIME-Version: 1.0
In-Reply-To: <20160307170139.GB26074@stefanha-x1.localdomain>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] strange crash in tracked_request_begin
To: Stefan Hajnoczi
Cc: Paolo Bonzini , qemu-devel , qemu-block@nongnu.org

On 03/07/2016 06:01 PM, Stefan Hajnoczi wrote:
> On Mon, Mar 07, 2016 at 01:29:08PM +0100, Christian Borntraeger wrote:
>> Folks,
>>
>> I had a crash of a qemu guest in tracked_request_begin.
>> The testcase was a guest with ramdisk/kernel that reboots in a
>> loop. (about 10 times per second) with a single null-co disk
>> attached. No idea how to reproduce this, seems to be a lucky hit.
>>
>> (gdb) bt
>> #0 0x00000000101db5ba in tracked_request_begin (req=req@entry=0x3ff90f1bdc0, bs=bs@entry=0x42a39190, offset=offset@entry=0, bytes=bytes@entry=4096, type=type@entry=BDRV_TRACKED_READ)
>> at /home/cborntra/REPOS/qemu/block/io.c:390
>> #1 0x00000000101de91e in bdrv_co_do_preadv (bs=0x42a39190, offset=0, bytes=4096, qiov=0x3ff7400cbd8, flags=, flags@entry=(unknown: 0))
>> at /home/cborntra/REPOS/qemu/block/io.c:1001
>> #2 0x00000000101dfc3e in bdrv_co_do_readv (flags=(unknown: 0), qiov=, nb_sectors=, sector_num=, bs=)
>> at /home/cborntra/REPOS/qemu/block/io.c:1024
>> #3 bdrv_co_do_rw (opaque=0x3ff7400e370) at /home/cborntra/REPOS/qemu/block/io.c:2173
>> #4 0x000000001022d8f6 in coroutine_trampoline (i0=, i1=-1946150928) at /home/cborntra/REPOS/qemu/util/coroutine-ucontext.c:79
>> #5 0x000003ff95ed150a in __makecontext_ret () from /lib64/libc.so.6
>>
>> looking at the code we are at
>>
>> QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> which translates to
>>
>> if (((req)->list.le_next = (&bs->tracked_requests)->lh_first) != NULL)
>> (&bs->tracked_requests)->lh_first->list.le_prev = &(req)->list.le_next;
>> (&bs->tracked_requests)->lh_first = (req);
>> (req)->list.le_prev = &(&bs->tracked_requests)->lh_first;
>>
>> gdb says, that (&bs->tracked_requests)->lh_first) is zero in the corefile
>> (gdb) print /x bs->tracked_requests
>> $6 = {lh_first = 0x0}
>>
>> Now looking at the code I am asking myself if this can happen in parallel
>> to another code that touches tracked_requests, because gcc seems to read
>> &bs->tracked_requests)->lh_first twice (first to check the value, then
>> to use it as pointer)
>
> tracked_requests is protected by AioContext. Perhaps something is doing
> I/O without acquiring AioContext?

Hmm, the guest was rebooting, which resets all devices. Maybe something
in that code is still not right? I will have a look.

>
> Luckily there is only 1 place where items are added and removed from
> tracked_requests.

This might make debugging somewhat easier. I have trouble reproducing
the issue, which makes it hard :-/

>>
>> 388 qemu_co_queue_init(&req->wait_queue);
>>    0x00000000101db594 <+76>:   la    %r2,72(%r13)
>>    0x00000000101db598 <+80>:   brasl %r14,0x1022cdc0
>>
>> 389
>> 390 QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>>    0x00000000101db59e <+86>:   lg    %r1,12744(%r12)          # r1 = (&bs->tracked_requests)->lh_first)
>>    0x00000000101db5a4 <+92>:   stg   %r1,48(%r13)             # (req)->list.le_next = r1
>>    0x00000000101db5aa <+98>:   cgij  %r1,0,8,0x101db5c0 ---+  # if r1==0 goto
>>    0x00000000101db5b0 <+104>:  lg    %r1,12744(%r12)       |  # r1 = (&bs->tracked_requests)->lh_first) (again!!)
>>    0x00000000101db5b6 <+110>:  la    %r2,48(%r13)          |
>> => 0x00000000101db5ba <+114>:  stg   %r2,56(%r1)           |  # r1==0 bang
>>    0x00000000101db5c0 <+120>:  stg   %r13,12744(%r12)<-----+
>>    0x00000000101db5c6 <+126>:  lay   %r12,12744(%r12)
>>    0x00000000101db5cc <+132>:  stg   %r12,56(%r13)
>>
>>
>> Christian
>>