From mboxrd@z Thu Jan 1 00:00:00 1970 Message-ID: <518D11D7.4040409@xenomai.org> Date: Fri, 10 May 2013 17:27:19 +0200 From: Gilles Chanteperdrix MIME-Version: 1.0 References: <50F96C89.8010004@siemens.com> In-Reply-To: <50F96C89.8010004@siemens.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Subject: Re: [Xenomai] [CRITICAL PATCH] ipipe: Re-read domain context after IRQ processing List-Id: Discussions about the Xenomai project List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jan Kiszka Cc: Xenomai On 01/18/2013 04:38 PM, Jan Kiszka wrote: > This fixes a nasty bug on SMP boxes: We may migrate to root in the > context of an IRQ handler, and then also to a different CPU. Therefore, > we must not use domain contexts read before the invocation but update > them afterward or use stable information like the domain reference. > > Signed-off-by: Jan Kiszka > --- > > We are still facing stalled, unkillable RT processes despite this fix, > but at least the head domain status corruption (and related warnings) > seems to be gone now. > > kernel/ipipe/core.c | 5 +++-- > 1 files changed, 3 insertions(+), 2 deletions(-) > > diff --git a/kernel/ipipe/core.c b/kernel/ipipe/core.c > index 68af0b3..6aa9572 100644 > --- a/kernel/ipipe/core.c > +++ b/kernel/ipipe/core.c > @@ -1173,18 +1173,19 @@ static void dispatch_irq_head(unsigned int irq) /* hw interrupts off */ > head->irqs[irq].handler(irq, head->irqs[irq].cookie); > __ipipe_run_irqtail(irq); > hard_local_irq_disable(); > + p = ipipe_this_cpu_head_context(); > __clear_bit(IPIPE_STALL_FLAG, &p->status); > > /* Are we still running in the head domain? */ > if (likely(__ipipe_current_context == p)) { > /* Did we enter this code over the head domain? */ > - if (old == p) { > + if (old->domain == head) { > /* Yes, do immediate synchronization. */ > if (__ipipe_ipending_p(p)) > __ipipe_sync_stage(); > return; > } > - __ipipe_set_current_context(old); > + __ipipe_set_current_context(ipipe_this_cpu_root_context()); > } > > /* Hi Jan, I may have an issue reported which resembles the problem fixed by this patch. If I understand your patch, it means that an irq handler may migrate domain. But if I read __ipipe_sync_stage() code: void __ipipe_do_sync_stage(void) { struct ipipe_percpu_domain_data *p; struct ipipe_domain *ipd; int irq; p = __ipipe_current_context; ipd = p->domain; __set_bit(IPIPE_STALL_FLAG, &p->status); smp_wmb(); if (ipd == ipipe_root_domain) trace_hardirqs_off(); for (;;) { irq = __ipipe_next_irq(p); if (irq < 0) break; /* * Make sure the compiler does not reorder wrongly, so * that all updates to maps are done before the * handler gets called. */ barrier(); if (test_bit(IPIPE_LOCK_FLAG, &ipd->irqs[irq].control)) continue; if (ipd != ipipe_head_domain) hard_local_irq_enable(); if (likely(ipd != ipipe_root_domain)) { ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); __ipipe_run_irqtail(irq); hard_local_irq_disable(); } else if (ipipe_virtual_irq_p(irq)) { irq_enter(); ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); irq_exit(); root_stall_after_handler(); hard_local_irq_disable(); while (__ipipe_check_root_resched()) __ipipe_preempt_schedule_irq(); } else { ipd->irqs[irq].handler(irq, ipd->irqs[irq].cookie); root_stall_after_handler(); hard_local_irq_disable(); } p = __ipipe_current_context; } if (ipd == ipipe_root_domain) trace_hardirqs_on(); __clear_bit(IPIPE_STALL_FLAG, &p->status); } It seems to me that at the end of each loop, we re-read p, but we assume that p->domain did not change. So I would add: ipd = p->domain right after the line p = __ipipe_current_context But I am not sure that we are supposed to allow irq handlers to migrate domains. Regards. -- Gilles.