From: Jan Kiszka <jan.kiszka@domain.hid>
To: Philippe Gerum <rpm@xenomai.org>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>,
Tschaeche IT-Services <services@domain.hid>
Subject: Re: [Xenomai-core] [PATCH] Mayday support
Date: Tue, 06 Jul 2010 17:54:01 +0200
Message-ID: <4C335199.7000401@domain.hid>
In-Reply-To: <1278431097.1939.7.camel@domain.hid>
Philippe Gerum wrote:
> On Mon, 2010-06-28 at 16:06 +0200, Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2010-06-24 at 14:05 +0200, Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> I've toyed a bit to find a generic approach for the nucleus to regain
>>>>> complete control over a userland application running in a syscall-less
>>>>> loop.
>>>>>
>>>>> The original issue was about recovering gracefully from a runaway
>>>>> situation detected by the nucleus watchdog, where a thread would spin in
>>>>> primary mode without issuing any syscall, but this would also apply for
>>>>> real-time signals pending for such a thread. Currently, Xenomai rt
>>>>> signals cannot preempt syscall-less code running in primary mode either.
>>>>>
>>>>> The major difference between the previous approaches we discussed
>>>>> and this one is that we now force the runaway thread to run a
>>>>> piece of valid code that calls into the nucleus. We do not force the
>>>>> thread to run faulty code or at a faulty address anymore. Therefore, we
>>>>> can reuse this feature to improve the rt signal management, without
>>>>> having to forge yet-another signal stack frame for this.
>>>>>
>>>>> The code introduced so far fixes only the watchdog-related issue, but it
>>>>> also lays some groundwork for enhancing the rt signal support later. The
>>>>> implementation details can be found here:
>>>>> http://git.xenomai.org/?p=xenomai-rpm.git;a=commit;h=4cf21a2ae58354819da6475ae869b96c2defda0c
>>>>>
>>>>> Mayday support is available only for powerpc and x86 for
>>>>> now; more will come in the coming days. To have it enabled, you have to
>>>>> upgrade your I-pipe patch to 2.6.32.15-2.7-00 or 2.6.34-2.7-00 for x86,
>>>>> 2.6.33.5-2.10-01 or 2.6.34-2.10-00 for powerpc. That feature relies on a
>>>>> new interface available from those latest patches.
>>>>>
>>>>> The current implementation does not break the 2.5.x ABI on purpose, so
>>>>> we could merge it into the stable branch.
>>>>>
>>>>> We definitely need user feedback on this. Typically, does arming the
>>>>> nucleus watchdog with this patch applied properly recover from your
>>>>> favorite "get me out of here" situation? TIA,
>>>>>
>>>>> You can pull this stuff from
>>>>> git://git.xenomai.org/xenomai-rpm.git, queue/mayday branch.
>>>>>
>>>> I've retested the feature as it's now in master, and it has one
>>>> remaining problem: If you run the cpu hog under gdb control and try to
>>>> break out of the while(1) loop, this doesn't work before the watchdog
>>>> expired - of course. But if you send the break before the expiry (or hit
>>>> a breakpoint), something goes wrong. The Xenomai task continues to spin,
>>>> and there is no chance to kill its process (only gdb).
>>> I can't reproduce this easily here; it happened only once on a lite52xx,
>>> and then disappeared; no way to reproduce this once on a dual core atom
>>> in 64bit mode, or on a x86_32 single core platform either. But I still
>>> saw it once on a powerpc target, so this looks like a generic
>>> time-dependent issue.
>>>
>>> Do you have the same behavior on a single core config,
>> You cannot reproduce it on a single core as the CPU hog will occupy that
>> core and gdb cannot be operated.
>>
>>> and/or without
>>> WARNSW enabled?
>> Just tried and disabled WARNSW in the test below: no difference.
>>
>>> Also, could you post your hog test code? Maybe there is a difference
>>> from the way I'm testing.
>> #include <signal.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <native/task.h>
>> #include <sys/mman.h>
>> #include <stdlib.h>
>>
>> void sighandler(int sig, siginfo_t *si, void *context)
>> {
>> 	printf("SIGDEBUG: reason=%d\n", si->si_value.sival_int);
>> 	exit(1);
>> }
>>
>> void loop(void *arg)
>> {
>> 	RT_TASK_INFO info;
>>
>> 	/* With a non-NULL arg (--lethal), spin without issuing any syscall. */
>> 	while (1)
>> 		if (!arg)
>> 			rt_task_inquire(NULL, &info);
>> }
>>
>> int main(int argc, const char *argv[])
>> {
>> 	struct sigaction sa;
>> 	RT_TASK task;
>>
>> 	sigemptyset(&sa.sa_mask);
>> 	sa.sa_sigaction = sighandler;
>> 	sa.sa_flags = SA_SIGINFO;
>> 	sigaction(SIGDEBUG, &sa, NULL);
>>
>> 	mlockall(MCL_CURRENT|MCL_FUTURE);
>> 	rt_task_spawn(&task, "cpu-hog", 0, 99, T_JOINABLE|T_WARNSW, loop,
>> 		(void *)(long)((argc > 1) && strcmp(argv[1], "--lethal") == 0));
>> 	rt_task_join(&task);
>>
>> 	return 0;
>> }
>
> I can't reproduce this issue, leaving the watchdog threshold to the
> default value (4s).
>
>> CONFIG_XENO_OPT_WATCHDOG=y
>> CONFIG_XENO_OPT_WATCHDOG_TIMEOUT=60
>
> 60s seems way too long to have a chance of recovering from a runaway
> loop to a reasonably sane state.

That's required for debugging the kernel.

> Do you still see the issue with shorter
> timeouts?

Yes, I usually lower the timeout before triggering the issue.

OK, I will try to find some time to look closer at this.

Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Thread overview: 32+ messages
2010-06-02 17:19 [Xenomai-core] [RFC] Break out of endless user space loops Jan Kiszka
2010-06-02 17:30 ` Gilles Chanteperdrix
2010-06-03 6:55 ` Jan Kiszka
2010-06-03 8:27 ` Philippe Gerum
2010-06-03 8:47 ` Jan Kiszka
2010-06-03 9:56 ` Philippe Gerum
2010-06-03 10:18 ` Jan Kiszka
2010-06-03 10:47 ` Philippe Gerum
2010-06-03 10:52 ` Philippe Gerum
2010-06-03 10:59 ` Jan Kiszka
2010-06-02 20:58 ` Philippe Gerum
2010-06-03 6:56 ` Jan Kiszka
2010-06-09 10:41 ` [Xenomai-core] [PATCH] Mayday support (was: Re: [RFC] Break out of endless user space loops) Philippe Gerum
2010-06-09 13:38 ` [Xenomai-help] " Tschaeche IT-Services
2010-06-09 14:01 ` Philippe Gerum
2010-06-09 18:11 ` Tschaeche IT-Services
2010-06-18 23:11 ` [Xenomai-core] " Philippe Gerum
2010-06-24 9:22 ` [Xenomai-help] " Tschaeche IT-Services
2010-06-24 9:34 ` [Xenomai-core] [PATCH] Mayday support Jan Kiszka
2010-06-24 10:28 ` [Xenomai-core] [PATCH] Mayday support (was: Re: [RFC] Break out of endless user space loops) Philippe Gerum
2010-06-24 12:05 ` [Xenomai-core] [PATCH] Mayday support Jan Kiszka
2010-06-27 16:01 ` Philippe Gerum
2010-06-28 14:06 ` Jan Kiszka
2010-06-28 14:12 ` Philippe Gerum
2010-07-06 15:44 ` Philippe Gerum
2010-07-06 15:54 ` Jan Kiszka [this message]
2010-07-06 16:41 ` Philippe Gerum
2010-07-06 17:10 ` Jan Kiszka
2010-08-20 12:32 ` Jan Kiszka
2010-08-20 14:00 ` Philippe Gerum
2010-08-20 14:06 ` Jan Kiszka
2010-08-20 14:20 ` Philippe Gerum