From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: BUG: unable to handle kernel NULL pointer dereference - xen_spin_lock_flags
Date: Tue, 14 Feb 2012 13:58:27 +0000
Message-ID: <4F3A6883.4010204@citrix.com>
References: <4F399181.5060801@theshore.net> <37EB35CA-3984-4C29-98F8-8258D68F9B13@theshore.net> <1329214171.31256.198.camel@zakaz.uk.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <1329214171.31256.198.camel@zakaz.uk.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Ian Campbell
Cc: xen devel
List-Id: xen-devel@lists.xenproject.org

On 14/02/12 10:09, Ian Campbell wrote:
> On Tue, 2012-02-14 at 00:54 +0000, Christopher S. Aker wrote:
>> On Feb 13, 2012, at 5:41 PM, Christopher S. Aker wrote:
>>> Network stress testing (iperf, UDP, bidirectional) the above stack reliably BUGs on both 3.0.4 and 3.2.5, and on both IGB or e1000e NICs with the following:
>>>
>>> BUG: unable to handle kernel NULL pointer dereference at 00000474
>>> IP: [] xen_spin_lock_flags+0x27/0x70
>> This happens regardless of CONFIG_PARAVIRT_SPINLOCKS enabled or disabled.
> I think that rules out the recent pv spinlock bug (fixed by
> 7a7546b377bdaa25ac77f33d9433c59f259b9688, in various stable trees
> AFAIK).
>
> What line of code does that IP correspond to within xen_spin_lock_flags?
> Likewise the one in xen_netbk_schedule_xenvif from the stack.
>
> I suspect this must be &netbk->net_schedule_list_lock but I don't see
> how that can ever be NULL nor does the offset appear to be 0x474, at
> least in my tree -- although that may depend on debug options.
>
> Are you rebooting guests or plug/unplugging vifs while this is going on?
> What about hotplugging CPUs (dom0 in particular)?
>
> Does this happen as soon as the test starts or does it work for a bit
> before failing?
>
> Ian.

I don't know if this is related, but it looks very similar to a bug a
friend of mine encountered. I tried to investigate but got nowhere.
The panic can be found at:

http://pastebin.com/ExCwhzpy

The panic looks as if it is on the same logical instruction. (There is
a 32/64-bit difference, which would likely explain why the dereferenced
address is out by one byte.)

The difference here is that this bug is from the ext4 path, indicating
that it might be a spinlock problem rather than a network problem
(assuming, of course, that this is in fact the same bug).

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com