From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <518D15DA.3030902@xenomai.org> Date: Fri, 10 May 2013 17:44:26 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <50F96C89.8010004@siemens.com> <518D11D7.4040409@xenomai.org> In-Reply-To: <518D11D7.4040409@xenomai.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai] [CRITICAL PATCH] ipipe: Re-read domain context after IRQ processing List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Xenomai On 05/10/2013 05:27 PM, Gilles Chanteperdrix wrote: > On 01/18/2013 04:38 PM, Jan Kiszka wrote: > >> This fixes a nasty bug on SMP boxes: We may migrate to root in the >> context of an IRQ handler, and then also to a different CPU. Therefore, >> we must not use domain contexts read before the invocation but update >> them afterward or use stable information like the domain reference. >> >> Signed-off-by: Jan Kiszka >> --- >> >> We are still facing stalled, unkillable RT processes despite this fix, >> but at least the head domain status corruption (and related warnings) >> seems to be gone now. >> >> kernel/ipipe/core.c | 5 +++-- >> 1 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c >> index 68af0b3..6aa9572 100644 >> --- a/kernel/ipipe/core.c >> +++ b/kernel/ipipe/core.c >> @@ -1173,18 +1173,19 @@ static void dispatch_irq_head(unsigned int irq) /* hw interrupts off */ >> head->irqs[irq].handler(irq, head->irqs[irq].cookie); >> __ipipe_run_irqtail(irq); >> hard_local_irq_disable(); >> + p = ipipe_this_cpu_head_context(); >> __clear_bit(IPIPE_STALL_FLAG, &p->status); >> >> /* Are we still running in the head domain? */ >> if (likely(__ipipe_current_context == p)) { >> /* Did we enter this code over the head domain? */ >> - if (old == p) { >> + if (old->domain == head) { >> /* Yes, do immediate synchronization. */ >> if (__ipipe_ipending_p(p)) >> __ipipe_sync_stage(); >> return; >> } >> - __ipipe_set_current_context(old); >> + __ipipe_set_current_context(ipipe_this_cpu_root_context()); >> } >> >> /* > > > Hi Jan, > > I may have an issue reported which resembles the problem fixed by this > patch. If I understand your patch, it means that an irq handler may > migrate domain. But if I read __ipipe_sync_stage() code: > > void __ipipe_do_sync_stage(void) > { > struct ipipe_percpu_domain_data *p; > struct ipipe_domain *ipd; > int irq; > > p = __ipipe_current_context; > ipd = p->domain; > > __set_bit(IPIPE_STALL_FLAG, &p->status); > smp_wmb(); > > if (ipd == ipipe_root_domain) > trace_hardirqs_off(); > > for (;;) { > irq = __ipipe_next_irq(p); > if (irq < 0) > break; > /* > * Make sure the compiler does not reorder wrongly, so > * that all updates to maps are done before the > * handler gets called. > */ > barrier(); > > if (test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control)) > continue; > > if (ipd != ipipe_head_domain) > hard_local_irq_enable(); > > if (likely(ipd != ipipe_root_domain)) { > ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); > __ipipe_run_irqtail(irq); > hard_local_irq_disable(); > } else if (ipipe_virtual_irq_p(irq)) { > irq_enter(); > ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); > irq_exit(); > root_stall_after_handler(); > hard_local_irq_disable(); > while (__ipipe_check_root_resched()) > __ipipe_preempt_schedule_irq(); > } else { > ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); > root_stall_after_handler(); > hard_local_irq_disable(); > } > > p = __ipipe_current_context; > } > > if (ipd == ipipe_root_domain) > trace_hardirqs_on(); > > __clear_bit(IPIPE_STALL_FLAG, &p->status); > } > > It seems to me that at the end of each loop, we re-read p, but we assume > that p->domain did not change. So I would add: > > ipd = p->domain > > right after the line > p = __ipipe_current_context > > But I am not sure that we are supposed to allow irq handlers to migrate > domains. And I am not sure that this would not break the logic of __ipipe_sync_stage, so, maybe replacing: p = __ipipe_current_context with p = ipipe_this_cpu_context(ipd) would make more sense? -- Gilles.