Message-ID: <4B181509.3030705@codemonkey.ws>
Date: Thu, 03 Dec 2009 13:44:09 -0600
From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: [STAGING]: Block migration segfaults
References: <20091203162121.67d9c120@doriath> <4B18051F.4040207@siemens.com>
In-Reply-To: <4B18051F.4040207@siemens.com>
To: Jan Kiszka
Cc: "qemu-devel@nongnu.org", "lirans@il.ibm.com", Luiz Capitulino

Jan Kiszka wrote:
> Luiz Capitulino wrote:
>
>> Hi there,
>>
>> Got this while testing block migration in staging:
>>
>> """
>> Program terminated with signal 11, Segmentation fault.
>> #0 0x0000000000410cf9 in monitor_vprintf (mon=0x0, fmt=0x5ae5e7 "Start full migration for %s\n",
>>     ap=0x7fff1f830a40) at /home/lcapitulino/src/aliguori-queue/monitor.c:192
>> 192 if (mon->mc && !mon->mc->print_enabled) {
>> """
>>
>> The problem here is that init_blk_migration() calls monitor_printf() with
>> a NULL 'mon' and the backtrace shows that this is true for the entire call
>> chain.
>>
>
> What is the backtrace? And how did you start the migration?
>
>> You probably didn't note it before because the lowest-level monitor
>> print function would just return if the 'mon' parameter was NULL.
>>
>
> I was aware that mon might be NULL, but the existing code handled this
> gracefully.
>
>> A patch from me (4a29a in staging) changes a higher level monitor
>> function to touch 'mon' before passing it down and here's the segfault.
>>
>> Now, the point is: I could give the old behavior back but I think we're
>> hiding a bug there. Why would you call monitor_printf() with a NULL 'mon'?
>>
>
> If there is no monitor associated with the current context, it can be
> NULL. This is mostly the case during early startup. One may also use
> this to suppress output (though I don't recall any real case ATM).
>

I'm a bit concerned with this explanation, as there is no reason something
should be printing to the monitor unless it's in response to a monitor
command.

I'd like to see the full call chain to see what's happening too.

>> Anyways, the following patch adds the old behavior back just in case
>> you want to see it working...
>>
>
> Yes, better restore the check. Still, your call stack would be
> interesting. Maybe there is actually another bug behind it.
>

I think we should have an assert in this path because mon==NULL is
definitely wrong.

Regards,

Anthony Liguori
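The two remedies debated in this thread, restoring the NULL check versus asserting that `mon` is non-NULL, can be contrasted in a small standalone sketch. It is not QEMU's code: `Monitor`, `MonitorControl`, `monitor_append`, `monitor_printf_lenient` and `monitor_printf_strict` are hypothetical simplifications, only the fields touched by the crashing line (`mon->mc->print_enabled`) are modelled, and output is captured in a fixed buffer instead of being sent to a real monitor.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for QEMU's monitor types.
 * Only the fields used by the crashing line are modelled. */
typedef struct MonitorControl {
    bool print_enabled;
} MonitorControl;

typedef struct Monitor {
    MonitorControl *mc;   /* non-NULL only in control mode */
    char buf[256];        /* captured output (illustration only) */
    size_t len;           /* bytes currently stored in buf */
} Monitor;

/* Shared back end: append formatted text to mon->buf, truncating on
 * overflow. Returns the number of bytes actually appended.
 * Dereferences mon unconditionally, like the line in the backtrace. */
static int monitor_append(Monitor *mon, const char *fmt, va_list ap)
{
    if (mon->mc && !mon->mc->print_enabled) {
        return 0;   /* output suppressed in control mode */
    }
    size_t avail = sizeof(mon->buf) - mon->len;  /* always >= 1 */
    int n = vsnprintf(mon->buf + mon->len, avail, fmt, ap);
    if (n < 0) {
        return 0;   /* encoding error: nothing appended */
    }
    if ((size_t)n >= avail) {
        n = (int)(avail - 1);   /* output was truncated to fit */
    }
    mon->len += (size_t)n;
    return n;
}

/* Option 1 ("restore the check"): treat a NULL monitor as "no output
 * wanted" and return quietly, mirroring the old behavior described
 * in the thread. */
static int monitor_printf_lenient(Monitor *mon, const char *fmt, ...)
{
    if (mon == NULL) {
        return 0;
    }
    va_list ap;
    va_start(ap, fmt);
    int n = monitor_append(mon, fmt, ap);
    va_end(ap);
    return n;
}

/* Option 2 (assert): a NULL monitor is a caller bug, so fail at the
 * entry point with a clear message instead of segfaulting deeper in
 * the print path. */
static int monitor_printf_strict(Monitor *mon, const char *fmt, ...)
{
    assert(mon != NULL && "monitor_printf called without a monitor");
    va_list ap;
    va_start(ap, fmt);
    int n = monitor_append(mon, fmt, ap);
    va_end(ap);
    return n;
}
```

With the lenient variant, a call such as `monitor_printf_lenient(NULL, "Start full migration for %s\n", "hda")` (the device name is a made-up example) produces no output and returns 0; with the strict variant the same call aborts with an assertion message, which is the behaviour argued for in the reply above. Note that `assert` is compiled out when `NDEBUG` is defined, so the strict variant only catches the bug in builds with assertions enabled.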