From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C77F5EF.3010701@domain.hid>
Date: Fri, 27 Aug 2010 19:29:19 +0200
From: Gilles Chanteperdrix
MIME-Version: 1.0
References: <4C7783C9.5090501@domain.hid> <4C7788A5.2020307@domain.hid> <4C77B054.8080609@domain.hid> <4C77B208.5090802@domain.hid> <4C77F142.4030103@domain.hid>
In-Reply-To: <4C77F142.4030103@domain.hid>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
List-Id: Xenomai life and development
To: Jan Kiszka
Cc: xenomai-core

Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> I'm hitting that bug check in __xnpod_schedule after
>>>>> xnintr_clock_handler issued a xnpod_schedule like this:
>>>>>
>>>>>     if (--sched->inesting == 0) {
>>>>>         __clrbits(sched->status, XNINIRQ);
>>>>>         xnpod_schedule();
>>>>>     }
>>>>>
>>>>> Either the assumption behind the bug check is no longer correct (no call
>>>>> to xnpod_schedule() without a real need), or we should check for
>>>>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).
>>>>>
>>>>> Comments?
>>>> You probably have a real bug. This BUG_ON means that the scheduler is
>>>> about to switch context for real, whereas the resched bit is not set,
>>>> which is wrong.
>>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>>> Debugging further...
>> You should look for something which changes the scheduler state without
>> setting the resched bit, or for something which clears the bit without
>> taking the scheduler changes into account.
>
> It looks like a generic Xenomai issue on SMP boxes, though a mostly
> harmless one:
>
> The task that was scheduled in without XNRESCHED set locally has been
> woken up by a remote CPU. The waker requeued the task and set the
> resched condition for itself and in the resched proxy mask for the
> remote CPU. But there is at least one place in the Xenomai code where we
> drop the nklock between xnsched_set_resched and xnpod_schedule:
> do_taskexit_event (I bet there are even more). Now the resched target
> CPU runs into a timer handler, issues xnpod_schedule unconditionally,
> and happens to find the woken-up task before it is actually informed via
> an IPI.
>
> I think this is a harmless race, but it ruins the debug assertion
> "need_resched != 0".

Not that harmless, since without the debugging code, we would miss the
reschedule too...

-- 
Gilles.
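
For readers following the thread, the interleaving described above can be
replayed with a small single-threaded model. This is only a sketch under
stated assumptions, not Xenomai code: every toy_* name is hypothetical, the
per-CPU state is reduced to a resched bit, a proxy mask and one ready slot,
and the early return in non-debug builds is modelled solely on the remark
that the reschedule would be missed without the debugging code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 2

/* Hypothetical, heavily simplified stand-in for the per-CPU scheduler state. */
struct toy_sched {
    bool resched;          /* local "need resched" bit */
    unsigned resched_mask; /* proxy mask: remote CPUs still to be sent an IPI */
    int curr;              /* id of the running task */
    int ready;             /* id of a runnable task waiting here, -1 if none */
};

struct toy_result {
    bool switched_at_tick;   /* did CPU 1 switch in its timer handler? */
    bool switched_after_ipi; /* did CPU 1 switch only after the IPI? */
    int assert_hits;         /* times the modelled debug check would fire */
};

static struct toy_sched toy_cpu[TOY_NR_CPUS];
static int toy_assert_hits;

/* Waker: requeue the task on the target CPU, then mark the resched
 * condition locally and in the proxy mask. No IPI is sent yet. */
static void toy_wake_remote(int self, int target, int task)
{
    toy_cpu[target].ready = task;
    toy_cpu[self].resched = true;
    toy_cpu[self].resched_mask |= 1u << target;
}

/* Waker, later: flush the proxy mask. Receiving the IPI is modelled as
 * setting the resched bit on the target CPU. */
static void toy_deliver_ipis(int self)
{
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (toy_cpu[self].resched_mask & (1u << cpu))
            toy_cpu[cpu].resched = true;
    toy_cpu[self].resched_mask = 0;
}

/* Scheduler entry point; returns true if a context switch happened.
 * Assumption: non-debug builds bail out early when the bit is clear. */
static bool toy_schedule(int cpu, bool debug)
{
    struct toy_sched *s = &toy_cpu[cpu];
    bool need_resched = s->resched;

    s->resched = false;
    if (!debug && !need_resched)
        return false;      /* reschedule postponed until the IPI arrives */
    if (s->ready < 0)
        return false;      /* nothing to switch to */
    if (!need_resched)
        toy_assert_hits++; /* where the need_resched check would trigger */
    s->curr = s->ready;
    s->ready = -1;
    return true;
}

/* Replay: CPU 0 wakes a task for CPU 1, a timer tick hits CPU 1 before
 * the IPI is sent, then the IPI is delivered and CPU 1 schedules again. */
static struct toy_result toy_run_race(bool debug)
{
    struct toy_result r;

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
        toy_cpu[cpu].resched = false;
        toy_cpu[cpu].resched_mask = 0;
        toy_cpu[cpu].curr = cpu; /* arbitrary ids for the running tasks */
        toy_cpu[cpu].ready = -1;
    }
    toy_assert_hits = 0;

    toy_wake_remote(0, 1, 42);                   /* lock dropped after this */
    r.switched_at_tick = toy_schedule(1, debug); /* timer handler on CPU 1 */
    toy_deliver_ipis(0);                         /* waker finally reschedules */
    r.switched_after_ipi = toy_schedule(1, debug);
    r.assert_hits = toy_assert_hits;
    return r;
}
```

Calling toy_run_race(true) from a main() makes CPU 1 switch at the tick with
the check firing once. With toy_run_race(false) the tick does nothing and the
switch only happens once the IPI has been delivered, which is the delayed
reschedule mentioned in the reply.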