From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C77F142.4030103@domain.hid>
Date: Fri, 27 Aug 2010 19:09:22 +0200
From: Jan Kiszka
MIME-Version: 1.0
References: <4C7783C9.5090501@domain.hid> <4C7788A5.2020307@domain.hid> <4C77B054.8080609@domain.hid> <4C77B208.5090802@domain.hid>
In-Reply-To: <4C77B208.5090802@domain.hid>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Xenomai-core] False positive XENO_BUGON(NUCLEUS, need_resched == 0)?
List-Id: Xenomai life and development
To: Gilles Chanteperdrix
Cc: xenomai-core

Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> I'm hitting that bug check in __xnpod_schedule after
>>>> xnintr_clock_handler issued a xnpod_schedule like this:
>>>>
>>>> 	if (--sched->inesting == 0) {
>>>> 		__clrbits(sched->status, XNINIRQ);
>>>> 		xnpod_schedule();
>>>> 	}
>>>>
>>>> Either the assumption behind the bug check is no longer correct (no call
>>>> to xnpod_schedule() without a real need), or we should check for
>>>> __xnpod_test_resched(sched) in xnintr_clock_handler (but under nklock then).
>>>>
>>>> Comments?
>>> You probably have a real bug. This BUG_ON means that the scheduler is
>>> about to switch context for real, whereas the resched bit is not set,
>>> which is wrong.
>> This happened over my 2.6.35 port - maybe some spurious IRQ enabling.
>> Debugging further...
>
> You should look for something which changes the scheduler state without
> setting the resched bit, or for something which clears the bit without
> taking the scheduler changes into account.

It looks like a generic Xenomai issue on SMP boxes, though a mostly
harmless one:

The task that was scheduled in without XNRESCHED set locally has been
woken up by a remote CPU. The waker requeued the task and set the
resched condition for itself and in the resched proxy mask for the
remote CPU.
But there is at least one place in the Xenomai code where we drop the
nklock between xnsched_set_resched and xnpod_schedule:
do_taskexit_event (I bet there are even more). Now the resched target
CPU runs into a timer handler, issues xnpod_schedule unconditionally,
and happens to find the woken-up task before it is actually informed
via an IPI.

I think this is a harmless race, but it ruins the debug assertion
"need_resched != 0".

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
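
For illustration, the interleaving described above can be replayed in a
small single-threaded C model. This is only a sketch under simplifying
assumptions: every toy_* name and the struct layout are hypothetical and
not taken from the Xenomai sources, and the nklock, real IPIs and task
priorities are not modelled. It merely shows that a CPU can find a newly
queued task before its own resched flag has been set.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 2

/* Toy per-CPU scheduler state (hypothetical, not Xenomai's xnsched). */
struct toy_sched {
	bool resched;        /* local "need resched" flag, stands in for XNRESCHED */
	unsigned proxy_mask; /* remote CPUs this CPU still has to notify */
	int current;         /* running task id, 0 = idle */
	int runnable;        /* queued task id, 0 = none */
};

static struct toy_sched cpus[NCPUS];
static int false_positives; /* times the debug check would have fired */

/* Waker: requeue the task on the target CPU and note the pending resched. */
static void toy_wakeup(int waker, int target, int task)
{
	assert(target >= 0 && target < NCPUS);
	cpus[target].runnable = task;
	cpus[waker].resched = true;
	cpus[waker].proxy_mask |= 1u << target;
}

/* Waker, some time later: deliver the "IPIs", which set the remote flags. */
static void toy_send_ipis(int waker)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (cpus[waker].proxy_mask & (1u << cpu))
			cpus[cpu].resched = true;
	cpus[waker].proxy_mask = 0;
	cpus[waker].resched = false;
}

/* Unconditional reschedule, as issued at the end of a timer interrupt. */
static bool toy_schedule(int cpu)
{
	struct toy_sched *s = &cpus[cpu];
	bool need_resched = s->resched;

	s->resched = false;
	if (s->runnable == 0 || s->runnable == s->current)
		return false; /* nothing to switch to */
	if (!need_resched)
		false_positives++; /* a "need_resched != 0" check would trip here */
	s->current = s->runnable;
	s->runnable = 0;
	return true;
}

/* Replay the scenario; ipi_first selects whether the IPI beats the timer tick. */
static int toy_run(bool ipi_first)
{
	for (int cpu = 0; cpu < NCPUS; cpu++)
		cpus[cpu] = (struct toy_sched){0};
	false_positives = 0;

	toy_wakeup(0, 1, 42); /* CPU 0 wakes task 42 for CPU 1 */
	if (ipi_first)
		toy_send_ipis(0);
	toy_schedule(1);      /* timer tick on CPU 1 */
	if (!ipi_first)
		toy_send_ipis(0);
	toy_schedule(1);      /* IPI handler on CPU 1 */
	return false_positives;
}
```

Calling toy_run(false) replays the racy ordering (timer tick before the
IPI) and toy_run(true) the expected one. In both cases task 42 ends up
running on CPU 1; only the racy ordering trips the modelled check.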