From mboxrd@z Thu Jan 1 00:00:00 1970 From: Philippe Gerum In-Reply-To: <450A6566.1070200@domain.hid> References: <58168.129.217.4.64.1158067468.squirrel@domain.hid> <1158091876.5020.8.camel@domain.hid> <450727D1.6060203@domain.hid> <4507C49A.5070909@domain.hid> <1158223595.5040.16.camel@domain.hid> <1158308512.5009.35.camel@domain.hid> <450A6566.1070200@domain.hid> Content-Type: text/plain Date: Fri, 15 Sep 2006 11:09:26 +0200 Message-Id: <1158311366.5009.51.camel@domain.hid> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: [Xenomai-core] Re: [Xenomai-help] Bad EIP kernel-Oops Reply-To: rpm@xenomai.org List-Id: "Xenomai life and development \(bug reports, patches, discussions\)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: xenomai@xenomai.org On Fri, 2006-09-15 at 10:33 +0200, Jan Kiszka wrote: > Philippe Gerum wrote: > > diff -uNrp 2.6.17-ipipe-1.3-10/kernel/ipipe/core.c 2.6.17-ipipe/kernel/ipipe/core.c > > --- 2.6.17-ipipe-1.3-10/kernel/ipipe/core.c 2006-09-15 10:13:15.000000000 +0200 > > +++ 2.6.17-ipipe/kernel/ipipe/core.c 2006-09-14 20:09:22.000000000 +0200 > > ... > > @@ -638,6 +642,7 @@ int ipipe_control_irq(unsigned irq, unsi > > int fastcall __ipipe_dispatch_event (unsigned event, void *data) > > { > > struct ipipe_domain *start_domain, *this_domain, *next_domain; > > + ipipe_event_handler_t evhand; > > struct list_head *pos, *npos; > > unsigned long flags; > > ipipe_declare_cpuid; > > @@ -649,8 +654,6 @@ int fastcall __ipipe_dispatch_event (uns > > > > list_for_each_safe(pos,npos,&__ipipe_pipeline) { > > > > - next_domain = list_entry(pos,struct ipipe_domain,p_link); > > - > > /* > > * Note: Domain migration may occur while running > > * event or interrupt handlers, in which case the > > @@ -659,11 +662,22 @@ int fastcall __ipipe_dispatch_event (uns > > * care for that, always tracking the current domain > > * descriptor upon return from those handlers. > > */ > > - if (next_domain->evhand[event] != NULL) { > > + next_domain = list_entry(pos,struct ipipe_domain,p_link); > > + > > + /* > > + * Keep a cached copy of the handler's address since > > + * ipipe_catch_event() may clear it under our feet. > > + */ > > + > > + evhand = next_domain->evhand[event]; > > + > > + if (evhand != NULL) { > > ipipe_percpu_domain[cpuid] = next_domain; > > + next_domain->cpudata[cpuid].evsync |= (1LL << event); > > Isn't there still a race window between grabbing evhand and setting this > bit here? If yes, can this be solved by moving set/clear flag out of the > condition? > ipipe_catch_event() that changes the handler's address must hold the critical lock to do so, and since we are running with masked IRQs in the code above, we prevent the lock from being be obtained (i.e. through the critical IPI) for the current CPU, so this is ok. > If this is not the solution, I wonder if we should really care that > much. Unregistering an event handler remains a rarely used scenario > which actually never happens with the nucleus built in. Shouldn't we > better put some grace period after it and live with the fact that on a > *very busy* system it *may* cause troubles once in a while? Reminds me > of the fact that procfs handler unregistration with private data > attached is also racy by design... > I agree, this is a corner case which is extremely unlikely to happen in production systems, since we would likely never unplug the nucleus in the first place. Still, I think that if we can fix this issue with minimum overhead, it should be done. So far, the cost involved is changing a 64bit value twice in the event dispatcher, and once in the domain switch code. I'd prefer to limit this to 32bit values, but unfortunately can't yet, until some of the x86-specific exceptions are grouped under a single event code, so that we don't need more than 32 event bits to flag them. > Jan > -- Philippe.