From: Jan Kiszka <kiszka@domain.hid>
To: Philippe Gerum <rpm@xenomai.org>
Cc: xenomai-core <xenomai@xenomai.org>
Subject: Re: [Xenomai-core] [bug] don't try this at home...
Date: Wed, 07 Dec 2005 18:44:22 +0100
Message-ID: <43971F76.4090505@domain.hid>
In-Reply-To: <4396DEC0.5060006@domain.hid>

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>
>>> Jan Kiszka wrote:
>>>
>>>> Jan Kiszka wrote:
>>>>
>>>>
>>>>> Hi Philippe,
>>>>>
>>>>> I'm afraid this one is serious: let the attached migration stress test
>>>>> run on likely any Xenomai since 2.0, preferably with
>>>>> CONFIG_XENO_OPT_DEBUG on. Will give a nice crash sooner or later (I'm
>>>>> trying to set up a serial console now).
>>>>>
>>>
>>> Confirmed here. My test box went through some nifty triple salto out of
>>> the window running this frag for 2mn or so. Actually, the semop
>>> handshake is not even needed to cause the crash. At first sight, it
>>> looks like a migration issue taking place during the critical phase when
>>> a shadow thread switches back to Linux to terminate.
>>>
>>>
>>>>
>>>> As it took some time to persuade my box to not just reboot but to
>>>> give a message, I'm posting here the kernel dump of the P-III
>>>> running nat_migration:
>>>>
>>>> [...]
>>>> Xenomai: starting native API services.
>>>> ce649fb4 ce648000 00000b17 00000202 c0139246 cdf2819c cdf28070 0b12d310
>>>>       00000037 ce648000 00000000 c02f0700 00009a28 00000000 b7e94a70 bfed63c8
>>>>       00000000 ce648000 c0102fcb b7e94a70 bfed63dc b7faf4b0 bfed63c8 00000000
>>>> Call Trace:
>>>> [<c0139246>] __ipipe_dispatch_event+0x96/0x130
>>>> [<c0102fcb>] work_resched+0x6/0x1c
>>>> Xenomai: fatal: blocked thread migration[22175] rescheduled?! (status=0x300010, sig=0, prev=watchdog/0[3])
>>>
>>> This babe is awaken by Linux while Xeno sees it in a dormant state,
>>> likely after it has terminated. No wonder why things are going wild
>>> after that... Ok, job queued. Thanks.
>>>
>>
>>
>> I think I can explain this warning now: This happens during creation of
>> a new userspace real-time thread. In the context of the newly created
>> Linux pthread that is to become a real-time thread, Xenomai first sets
>> up the real-time part and then calls xnshadow_map. The latter function
>> does further init and then signals via xnshadow_signal_completion to the
>> parent Linux thread (the caller of rt_task_create e.g.) that the thread
>> is up. This happens before xnshadow_harden, i.e. still in preemptible
>> linux context.
>>
>> The signalling should normally not cause a reschedule as the caller -
>> the to-be-mapped linux pthread - has higher prio than the woken up
>> thread.
> 
> Xeno never assumes this.
> 
>> And Xenomai implicitly assumes with this fatal-test above that
>> there is no preemption! But it can happen: the watchdog thread of linux
>> does preempt here. So, I think it's a false positive.
>>
> 
> This is wrong. This check is not related to Linux preemption at all; it
> makes sure that control over any shadow is shared in a strictly
> _mutually exclusive_ way, so that a thread blocked at Xenomai level may
> not be seen as runnable by Linux either. Disabling it only makes
> things worse since the scheduling state is obviously corrupted when it
> triggers, and that's the root bug we are chasing right now. You should
> not draw any conclusion beyond that. Additionally, keep in mind that
> Xeno has already run over some PREEMPT_RT patches, for which an infinite
> number of CPUs is assumed over a fine-grained code base, which induces
> maximum preemption probabilities.
> 

Ok, my explanation was a quick hack before a meeting here; I should
have elaborated it more thoroughly. Let's go through it step by step so
that you can tell me where I stray from the right path:

1. We enter xnshadow_map. The Linux thread is happily running, the
   shadow thread is in XNDORMANT state and not yet linked to its Linux
   mate. Any Linux preemption hitting us here, and any later
   reactivation of this particular Linux thread, will not trigger any
   activity of do_schedule_event related to this thread, because the
   expression at [1] evaluates to NULL. That's important; we will see
   later why.

2. After some init stuff, xnshadow_map links the shadow to the Linux
   thread [2] and then calls xnshadow_signal_completion. This call would
   normally wake up the sleeping parent of our Linux thread, performing
   a direct standard Linux schedule from the newborn thread to the
   parent. Again, there is nothing here that do_schedule_event could
   complain about.

3. Now let's consider preemption by a third Linux task after [2] but
   before [3]. Scheduling away the new Linux thread is no issue. But
   when it comes back again, we will see that nice xnpod_fatal. The
   reason: our shadow thread is now linked to its Linux mate, thus [1]
   evaluates to non-NULL, and later [4] will also hit, as XNDORMANT is
   part of XNTHREAD_BLOCK_BITS (and the thread is not ptraced).
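
To make the three steps concrete, here is a minimal sketch in plain C of
the check as I describe it above. It is a toy model, not the nucleus
code: all model_* names, the MODEL_* bit values and both structs are
made up for illustration, and the real tests are the ones at [1] and [4].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nucleus state bits; values are illustrative. */
#define MODEL_XNDORMANT  0x1u
#define MODEL_XNSUSP     0x2u
#define MODEL_BLOCK_BITS (MODEL_XNDORMANT | MODEL_XNSUSP)

/* Hypothetical, heavily reduced view of a shadow thread. */
struct model_shadow {
    unsigned status;
};

/* Hypothetical, heavily reduced view of a Linux task. */
struct model_task {
    struct model_shadow *shadow; /* NULL until the link in step 2 */
    bool ptraced;
};

/* Step 2: link the shadow to its Linux mate (the operation at [2]). */
static void model_link(struct model_task *task, struct model_shadow *shadow)
{
    assert(task != NULL && shadow != NULL);
    task->shadow = shadow;
}

/*
 * Models what happens when Linux switches a task back in.
 * Returns true if the "blocked thread rescheduled" fatal would fire.
 */
static bool model_switch_in_is_fatal(const struct model_task *next)
{
    const struct model_shadow *shadow = next->shadow;

    if (shadow == NULL)   /* test [1]: no shadow linked, nothing to check */
        return false;
    if (next->ptraced)    /* ptraced threads are exempt from the check */
        return false;

    /* test [4]: a shadow that is blocked for Xenomai must not run in Linux */
    return (shadow->status & MODEL_BLOCK_BITS) != 0;
}
```

With a dormant shadow, model_switch_in_is_fatal() stays false as long as
no shadow is linked (step 1) and turns true once model_link() has run
(step 3), unless the task is ptraced.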

Ok, this is how I see THIS particular issue so far. For me the questions
are now:

 a) Am I right?
 b) If yes, is this preemption uncritical, making the warning in the
    described context a false positive?
 c) If it is not, can this cause the subsequent crash?

Jan

[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1515
[2]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L765
[3]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L621
[4]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/shadow.c?v=SVN-trunk#L1555

