From mboxrd@z Thu Jan 1 00:00:00 1970
From: Philippe Gerum
In-Reply-To: <469F4A98.3080307@domain.hid>
References: <469BF43D.1040704@domain.hid> <46973753.6010206@domain.hid>
 <4694ED98.6000000@domain.hid> <46937E70.10903@domain.hid>
 <469345EB.6060302@domain.hid>
 <22554361.1184054457326.JavaMail.ngmail@domain.hid>
 <2026261.1184070574283.JavaMail.ngmail@domain.hid>
 <1982070.1184078400928.JavaMail.ngmail@domain.hid>
 <4693A702.1010604@domain.hid>
 <913919.1184311634860.JavaMail.ngmail@domain.hid>
 <21969019.1184569651818.JavaMail.ngmail@domain.hid>
 <29054475.1184842736562.JavaMail.ngmail@domain.hid>
 <469F4A98.3080307@domain.hid>
Content-Type: text/plain
Date: Thu, 19 Jul 2007 14:19:09 +0200
Message-Id: <1184847549.28303.46.camel@domain.hid>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: Philippe Gerum
Subject: Re: [Xenomai-help] Sporadic PC freeze after rt_task_start
Reply-To: rpm@xenomai.org
List-Id: Help regarding installation and common use of Xenomai
To: Jan Kiszka
Cc: xenomai-help, "M. Koehrer"

On Thu, 2007-07-19 at 13:27 +0200, Jan Kiszka wrote:
> M. Koehrer wrote:
> > Hi!
> >
> > After a couple of over-night test runs, I finally got an NMI watchdog detected lockup with the sporadic freeze option.
> > I started the system with the argument nmi_watchdog=1 (also isolcpus=1).
> > See the code below. As I have not connected a serial console, I have attached a screen shot in a fairly
> > bad quality as jpg file... However, it is good enough to be able to read everything...
> > The lockup is in function rpi_pop [xeno_nucleus].
> > It is called from gatekeeper_thread and from default_wake_function.
> > See the attached jpg for details.
>
> Looks like we are stuck on rpilock, Philippe.
>

Seems likely, yes. Switching on the nucleus DEBUG option would engage
the lockup detector, and pull the brake whenever the nucleus fails to
grab the rpilock.
Mathias, I guess this test has not been run with the nucleus debug
option enabled. Any chance to get a disassembly of the rpi_pop routine
as compiled into your kernel, so that we could check if we are really
stuck on this lock, or rather on some infinite walk into a corrupted RPI
list?

> And when looking at the holders of rpilock, I think one issue could be
> that we hold that lock while calling into xnpod_renice_root [1], ie.
> doing a potential context switch. Was this checked to be safe?

xnpod_renice_root() does not reschedule immediately, on purpose; we
would never have been able to run any SMP config for more than a couple
of seconds otherwise. (See the NOSWITCH bit.)

> Furthermore, that code path reveals that we take nklock nested into
> rpilock [2]. I haven't found a spot for the other way around (and I hope
> there is none)

xnshadow_start().

> , but such nesting is already evil per se...

Well, nesting spinlocks only falls into evilness when you get a circular
graph, but since the rpilock is a rookie in the locking team, I'm going
to check this.

Ok, I'm tackling this lockup issue now. I first need to reproduce it.
More news later.

>
> Mathias, already tried your test case with our old friend "priority
> coupling" switched off? *If* this lock-up is actually due to rpilock
> brokenness, switching the feature off should make it disappear.
>

It would be nice to switch on the nucleus DEBUG feature, especially the
queue debugging one. I understand this may hide the bug due to the
alteration of timings, but still, it would be useful to know whether a
configuration without NMI but with that debug knob on would trigger the
alarm.

> Jan
>
>
> [1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#435
> [2]http://www.rts.uni-hannover.de/xenomai/lxr/source/include/nucleus/pod.h?v=SVN-trunk#308
>
-- 
Philippe.
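
PS: To make the "circular graph" remark above concrete, here is a
minimal sketch of the check, written in Python for brevity rather than
as kernel code. It treats every "inner lock taken while outer lock is
held" observation as a directed edge and looks for a cycle. The function
name find_lock_cycle and the edge lists in the usage lines are
hypothetical and purely illustrative; they are not a verified statement
about what the nucleus actually does.

```python
def find_lock_cycle(nestings):
    """Return one circular lock-order chain as a list, or None.

    nestings: iterable of (outer, inner) pairs, meaning that `inner`
    was taken while `outer` was already held.
    """
    # Build the directed lock-order graph: outer -> set of inner locks.
    graph = {}
    for outer, inner in nestings:
        graph.setdefault(outer, set()).add(inner)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, done
    state = {}
    path = []

    def visit(lock):
        state[lock] = GREY
        path.append(lock)
        for nxt in sorted(graph.get(lock, ())):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                # Back edge: nxt is already on the current path.
                return path[path.index(nxt):] + [nxt]
            if colour == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[lock] = BLACK
        return None

    for lock in sorted(graph):
        if state.get(lock, WHITE) == WHITE:
            cycle = visit(lock)
            if cycle:
                return cycle
    return None


# Illustrative input only: a single nesting direction is harmless...
one_way = [("rpilock", "nklock")]
# ...while adding the reverse nesting closes the loop.
both_ways = one_way + [("nklock", "rpilock")]
```

With one_way the function returns None; with both_ways it returns the
chain nklock, rpilock, nklock, which is the kind of ordering that can
deadlock two CPUs each holding one lock and spinning on the other.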