Re: [Xenomai-help] kernel oopses when killing realtime task

From: Jan Kiszka <jan.kiszka@domain.hid>
To: Philippe Gerum <rpm@xenomai.org>
Cc: xenomai@xenomai.org
Subject: Re: [Xenomai-help] kernel oopses when killing realtime task
Date: Mon, 25 Oct 2010 23:47:18 +0200
Message-ID: <4CC5FAE6.6010305@domain.hid>
In-Reply-To: <1288042858.26618.204.camel@domain.hid>

Am 25.10.2010 23:40, Philippe Gerum wrote:
> On Mon, 2010-10-25 at 23:22 +0200, Jan Kiszka wrote:
>> Am 25.10.2010 23:12, Philippe Gerum wrote:
>>> On Mon, 2010-10-25 at 21:22 +0200, Jan Kiszka wrote:
>>>> Am 25.10.2010 21:20, Philippe Gerum wrote:
>>>>> On Mon, 2010-10-25 at 21:15 +0200, Jan Kiszka wrote:
>>>>>> Am 25.10.2010 21:08, Philippe Gerum wrote:
>>>>>>> On Mon, 2010-10-25 at 20:10 +0200, Jan Kiszka wrote:
>>>>>>>> Am 25.10.2010 18:48, Philippe Gerum wrote:
>>>>>>>>> On Wed, 2010-10-13 at 16:52 +0200, Philippe Gerum wrote: 
>>>>>>>>>>>
>>>>>>>>>>> Should we test IPIPE_STALL_FLAG on all but current CPUs?
>>>>>>>>>>
>>>>>>>>>> That would solve this particular issue, but we should drain the pipeline
>>>>>>>>>> out of any Xenomai critical section. The way it is done now may induce a
>>>>>>>>>> deadlock (e.g. CPU0 waiting for CPU1 to acknowledge critical entry in
>>>>>>>>>> ipipe_enter_critical when getting some IPI, and CPU1 waiting hw IRQs off
>>>>>>>>>> for CPU0 to release the Xenomai lock that annoys us right now).
>>>>>>>>>>
>>>>>>>>>> I'll come up with something hopefully better and tested in the next
>>>>>>>>>> days.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sorry for the lag. In case that helps, here is another approach, based
>>>>>>>>> on telling the pipeline to ignore the irq about to be detached, so that
>>>>>>>>> it passes all further occurrences down to the next domain, without
>>>>>>>>
>>>>>>>> Err, won't this irritate that next domain, ie. won't Linux dump warnings
>>>>>>>> about a spurious/unhandled IRQ? I think either the old handler shall
>>>>>>>> receive the last event or no one.
>>>>>>>
>>>>>>> Flipping the IRQ modes within an ipipe_critical_enter/exit section gives
>>>>>>> you that guarantee. You are supposed to have disabled the irq line
>>>>>>> before detaching, and critical IPIs cannot be acknowledged until all
>>>>>>> CPUs have re-enabled interrupts at some point. Therefore, there are only
>>>>>>> two scenarios:
>>>>>>>
>>>>>>> - irq was disabled before delivery, and a pending interrupt is masked by
>>>>>>> the PIC and never delivered to the CPU.
>>>>>>>
>>>>>>> - an interrupt sneaked in before disabling, it is currently processed by
>>>>>>> the pipeline in the low handler on some CPU, in which case interrupts
>>>>>>> are off, so a critical IPI could not be acked yet, and the irq mode bits
>>>>>>> still allow dispatching to the target domain on that CPU. The assumption
>>>>>>> which is happily made is that only head domains are interested in
>>>>>>> un-virtualizing irqs, so the dispatch will happen immediately, while the
>>>>>>> handler is still valid (actually, we are not allowed to un-virtualize
>>>>>>> root irqs, and intermediate Adeos domains are already considered as
>>>>>>> endangered species, so this is fine).
>>>>>>>
>>>>>>>>
>>>>>>>> Why this complex solution, why not simply draining (via critical_enter
>>>>>>>> or whatever) - but _after_ xnintr_irq_detach, ie. while the related
>>>>>>>> resources are still valid?
>>>>>>>>
>>>>>>>
>>>>>>> Because it's already too late. You have cleared the handler pointer when
>>>>>>> un-virtualizing via xnarch_release_irq, and the wired irq dispatcher or
>>>>>>> the log syncer on another CPU could then branch to eip $0.
>>>>>>
>>>>>> Just make ipipe_virtualize_irq install a nop handler instead of NULL.
>>>>>
>>>>> This does not solve the issue of the last interrupt which should be
>>>>> processed. You don't want to miss it.
>>>>
>>>> Don't understand. No interrupt is supposed to arrive anymore on
>>>> deregistration, the last user is supposed to be down by now. We just
>>>> need to catch those that slipped through.
>>>
>>> No, we are handling the case when an interrupt is currently handled on a
>>> CPU which is not the one that unregisters the IRQ, and which managed to
>>> sneak in while the irq source was about to be masked in the PIC. This is
>>> purely asynchronous for us in SMP, since we don't have irq descriptor
>>> locks for the pipeline, we only have them at Xenomai level, which is one
>>> step too far to protect our low level Adeos handler against that kind
>>> of race. Logically speaking, there is no reason why you would accept to
>>> leave that irq unhandled if it is there and known from a CPU (it was
>>> actually the first point you raised).
>>
>> If that handler is already running, the IRQ will get handled, we just
>> need to wait for the handler to finish after we returned from
>> ipipe_virtualize_irq => thus we need a barrier here.
>>
> 
> So, we let ipipe_virtualize_irq clear the handler pointer any remote CPU
> could use in parallel, and we...synchronize on the outcome? The notion
> of "handler" may explain why we don't sync yet: I'm not talking about the
> Xenomai entry for interrupts, I'm dealing with interrupts in the early
> code of ipipe_handle_irq, before you hit the wired irq dispatcher or the
> sync_stage loop. 

/me too.

> 
>> If the handler was about to run but we deregistered it a bit quicker,
>> the IRQ need not be addressed at device level anymore. Reason: we
>> already switched off any IRQ assertions in the device before we entered
>> ipipe_virtualize_irq. So no harm is caused, that IRQ line is deasserted
>> already (IOW: the IRQ became a spurious one while cleaning up).
> 
> You can attempt to disable the irq line on one CPU, and have a relevant
> irq entering the low level pipeline handler on another one at exactly
> the same time. We have _no_ sync here.

That is not the problem here. I'm not talking about the line, I'm
talking now about the device that drives it. Silencing the IRQ there is
important, but it is async wrt. other cores and needs a barrier.

Actually, playing with the IRQ line is no driver business anyway (except
for very few special cases).
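
The barrier argument above can be illustrated with a small user-space sketch. It is not I-pipe or Xenomai code: NR_CPUS, in_irq_path and the irq_* helpers are hypothetical names made up for this example, and atomic flags stand in for whatever per-CPU state the real pipeline would track. The idea is only that the unregistering side, after silencing the device, waits until every other core has been seen outside the low-level IRQ path once. Linux's synchronize_irq() serves a comparable purpose for regular kernel handlers.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* One flag per CPU: nonzero while that CPU runs the low-level IRQ path. */
static atomic_int in_irq_path[NR_CPUS];

/* Called by a CPU when it enters the low-level IRQ path. */
static void irq_path_enter(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_store(&in_irq_path[cpu], 1);
}

/* Called by a CPU when it leaves the low-level IRQ path. */
static void irq_path_exit(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    atomic_store(&in_irq_path[cpu], 0);
}

/* Count the CPUs currently inside the IRQ path. */
static int irq_paths_in_flight(void)
{
    int n = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        n += atomic_load(&in_irq_path[cpu]);
    return n;
}

/*
 * Barrier: return once each CPU has been observed outside the IRQ path
 * at least once. A handler that was already running when the barrier
 * started is therefore guaranteed to have finished.
 */
static void irq_drain_barrier(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        while (atomic_load(&in_irq_path[cpu]))
            sched_yield();
}
```

In this model the detaching side would silence the device, swap the handler, call irq_drain_barrier(), and only then free the resources the old handler used.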


> 
>>
>>>
>>> The proper way to fix this issue would have been to fix xnintr_detach in
>>> the first place, because calling ipipe_virtualize_irq() while holding a
>>> lock with irqs off is wrong. We could have then drained the pipeline
>>> from the unregistering code. I'm rather going for a decent solution
>>> which is not prone to regression for 2.x.
>>>
>>
>> Again, I see no reason for a more complex solution than avoiding that
>> NULL pointer dereference at ipipe level as I suggested and adding a very
>> simple system-wide barrier right after dropping nklock in xnintr_detach.
>>
> 
> You cannot drop that lock without rewriting the xnintr layer in rather
> touchy areas, that is the point.

I meant intrlock - we do not call xnintr_detach with nklock acquired
(your pre-detach synch would not work as well if we did).
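
For readers following the thread, here is a minimal user-space sketch of the "nop handler instead of NULL" idea. It models the handler table with C11 atomics and does not use the real ipipe_virtualize_irq interface: NR_IRQS, handlers, dev_handler and the irq_* functions are hypothetical. The point shown is that a dispatcher racing with deregistration always finds a callable pointer, so a late interrupt is swallowed instead of branching to address 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_IRQS 16 /* hypothetical table size */

typedef void (*irq_handler_t)(unsigned int irq);

static atomic_int nop_hits; /* late IRQs swallowed after detach */
static atomic_int dev_hits; /* IRQs seen by the example driver handler */

/* Installed whenever no real handler is attached. */
static void nop_handler(unsigned int irq)
{
    (void)irq;
    atomic_fetch_add(&nop_hits, 1);
}

/* Hypothetical driver handler, used only to exercise the table. */
static void dev_handler(unsigned int irq)
{
    (void)irq;
    atomic_fetch_add(&dev_hits, 1);
}

static _Atomic(irq_handler_t) handlers[NR_IRQS];

/* Every slot starts out pointing at the nop handler, never at NULL. */
static void irq_table_init(void)
{
    for (unsigned int irq = 0; irq < NR_IRQS; irq++)
        atomic_store(&handlers[irq], nop_handler);
}

static void irq_attach(unsigned int irq, irq_handler_t h)
{
    assert(irq < NR_IRQS && h != NULL);
    atomic_store(&handlers[irq], h);
}

/* Detach swaps in the nop handler instead of clearing the pointer. */
static void irq_detach(unsigned int irq)
{
    assert(irq < NR_IRQS);
    atomic_store(&handlers[irq], nop_handler);
}

/* Dispatcher: loads the pointer once, then calls it unconditionally. */
static void irq_dispatch(unsigned int irq)
{
    irq_handler_t h;

    assert(irq < NR_IRQS);
    h = atomic_load(&handlers[irq]);
    h(irq);
}
```

This only removes the NULL dereference. A dispatcher that loaded the old pointer just before the swap may still be running the old handler, which is why a barrier is needed before the handler's resources are released.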

Jan



