From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <538CD167.1080100@profihost.ag>
Date: Mon, 02 Jun 2014 21:32:55 +0200
From: Stefan Priebe
MIME-Version: 1.0
References: <53863BC6.3040108@profihost.ag> <53863C9A.4040905@profihost.ag>
 <606EBA1F-638A-487D-8551-8D183D79937E@profihost.ag>
 <20140602134007.GG3049@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140602134007.GG3049@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] qemu 2.0 segfaults in event notifier
To: Stefan Hajnoczi
Cc: "famz@redhat.com", qemu-devel, "qemu-stable@nongnu.org"

On 02.06.2014 15:40, Stefan Hajnoczi wrote:
> On Fri, May 30, 2014 at 04:10:39PM +0200, Stefan Priebe wrote:
>> even with
>> +From 271c0f68b4eae72691721243a1c37f46a3232d61 Mon Sep 17 00:00:00 2001
>> +From: Fam Zheng
>> +Date: Wed, 21 May 2014 10:42:13 +0800
>> +Subject: [PATCH] aio: Fix use-after-free in cancellation path
>>
>> applied, I saw a segfault today with the following backtrace:
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007f9dd633343f in event_notifier_set (e=0x124) at util/event_notifier-posix.c:97
>> 97      util/event_notifier-posix.c: No such file or directory.
>> (gdb) bt
>> #0  0x00007f9dd633343f in event_notifier_set (e=0x124) at util/event_notifier-posix.c:97
>> #1  0x00007f9dd5f4eafc in aio_notify (ctx=0x0) at async.c:246
>> #2  0x00007f9dd5f4e697 in qemu_bh_schedule (bh=0x7f9b98eeeb30) at async.c:128
>> #3  0x00007f9dd5fa2c44 in rbd_finish_aiocb (c=0x7f9dd9069ad0, rcb=0x7f9dd85f1770) at block/rbd.c:585
>
> Hi Stefan,
> Please print the QEMUBH:
> (gdb) p *(QEMUBH*)0x7f9b98eeeb30

new trace:

(gdb) bt
#0  0x00007f69e421c43f in event_notifier_set (e=0x124) at util/event_notifier-posix.c:97
#1  0x00007f69e3e37afc in aio_notify (ctx=0x0) at async.c:246
#2  0x00007f69e3e37697 in qemu_bh_schedule (bh=0x7f5dac217f60) at async.c:128
#3  0x00007f69e3e8bc44 in rbd_finish_aiocb (c=0x7f5dac0c3f30, rcb=0x7f5dafa50610) at block/rbd.c:585
#4  0x00007f69e17bee44 in librbd::AioCompletion::complete() () from /usr/lib/librbd.so.1
#5  0x00007f69e17be832 in librbd::AioCompletion::complete_request(CephContext*, long) () from /usr/lib/librbd.so.1
#6  0x00007f69e1c946ba in Context::complete(int) () from /usr/lib/librados.so.2
#7  0x00007f69e17f1e85 in ObjectCacher::C_WaitForWrite::finish(int) () from /usr/lib/librbd.so.1
#8  0x00007f69e1c946ba in Context::complete(int) () from /usr/lib/librados.so.2
#9  0x00007f69e1d373c8 in Finisher::finisher_thread_entry() () from /usr/lib/librados.so.2
#10 0x00007f69dbd43b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#11 0x00007f69dba8e13d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#12 0x0000000000000000 in ?? ()

This is another core dump, so the addresses differ:

(gdb) p *(QEMUBH*)0x7f5dac217f60
$1 = {ctx = 0x0, cb = 0x7f69e3e8bb75 , opaque = 0x7f5dafa50610, next = 0x7f69e6b04d10,
  scheduled = false, idle = false, deleted = true}

> It would also be interesting to print out the qemu_aio_context->first_bh
> linked list of QEMUBH structs to check whether 0x7f9b98eeeb30 is on the
> list.
Do you mean just this:

(gdb) p *(QEMUBH*)qemu_aio_context->first_bh
$3 = {ctx = 0x7f69e68a4e00, cb = 0x7f69e41546a5 , opaque = 0x7f69e6b4a5e0, next = 0x7f69e6b4a570,
  scheduled = false, idle = false, deleted = false}

> The aio_bh_new() and aio_bh_schedule() APIs are supposed to be
> thread-safe. In theory the rbd.c code is fine. But maybe there is a
> race condition somewhere.

rbd.c was fine with 1.7.0.

Stefan