* [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai...
@ 2006-12-22 11:32 M. Koehrer
0 siblings, 0 replies; 5+ messages in thread
From: M. Koehrer @ 2006-12-22 11:32 UTC (permalink / raw)
To: dmitry.adamushko, mathias_koehrer; +Cc: xenomai
Hi Dmitry,
>
> > As this is hard to understand, I strongly recommend that there is Xenomai
> support
> > for this! I.e. a Xenomai API that can be called with a (callback-)function
> pointer and
> > a user data pointer.
> > When a (realtime) thread calls this function, the real time thread is
> blocked.
> > The callback function is then called from a safe context and
> > after exit of the callback function the real time thread is resumed.
>
> I haven't got your idea. Did you get the cause of the problem with fork() ?
Yes, I think I got it (more or less).
However, as a typical user of an OS I do not want to take care of these
very specific things. I want to call an OS API that does the job for me.
I.e. in my example, I want to call an API that allows me to execute whatever
external application without any risk and side effects.
From the functional point of view I want to use the "system()" call.
When it is not safe to call system() directly (due to all the things that have been
discussed), I think it is important to document this (e.g. in the Wiki) _and_ to
provide an easy-to-use replacement for it, i.e. something like an rt_task_system()
that does all the required things internally for me and behaves on the functional side exactly
like the standard system() does.
The Xenomai native API is a very good example of a cleanly designed API that
can be easily (and nearly intuitively) used.
Any workaround (I call it a workaround as the intuitive, straightforward approach is not possible)
that is required to get a problem solved weakens the whole OS (Xenomai).
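A minimal sketch of the callback-style call requested above, using only POSIX threads. It is an illustration, not an existing Xenomai API: the names safe_call(), safe_call_fn and example_callback() are hypothetical, and a plain pthread stands in for the "safe context", which is an assumption about what such a context would be.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback type; not part of any Xenomai API. */
typedef int (*safe_call_fn)(void *user_data);

struct safe_call_request {
    safe_call_fn fn;   /* callback to run in the helper thread */
    void *user_data;   /* opaque pointer handed to the callback */
    int result;        /* return value of the callback */
};

/* Helper thread body: runs the callback and stores its result. */
static void *safe_call_trampoline(void *cookie)
{
    struct safe_call_request *req = cookie;

    req->result = req->fn(req->user_data);
    return NULL;
}

/*
 * Run fn(user_data) in a separate plain thread and block the caller
 * until the callback has returned. Returns 0 on success (the callback's
 * return value is stored in *result), or a pthread error code.
 */
int safe_call(safe_call_fn fn, void *user_data, int *result)
{
    struct safe_call_request req = { fn, user_data, 0 };
    pthread_t helper;
    int err;

    assert(fn != NULL);
    err = pthread_create(&helper, NULL, safe_call_trampoline, &req);
    if (err != 0)
        return err;
    err = pthread_join(helper, NULL); /* caller is blocked here */
    if (err != 0)
        return err;
    if (result != NULL)
        *result = req.result;
    return 0;
}

/* Stand-in callback for illustration: doubles the integer it is given. */
int example_callback(void *user_data)
{
    int *value = user_data;

    return *value * 2;
}
```

A caller would pass its own function and data pointer, e.g. safe_call(example_callback, &value, &result). Whether running the callback in another thread is enough to avoid the freeze discussed in this thread is not established here.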
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
* Re: [Xenomai-help] Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-21 10:21 ` M. Koehrer
@ 2006-12-21 10:45 Dmitry Adamushko
2006-12-20 14:11 ` [Xenomai-help] Aw: Re: A fairly small rtnet/Xenomai application that freezes the PC M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Adamushko @ 2006-12-21 10:45 UTC (permalink / raw)
To: M. Koehrer; +Cc: Xenomai help, Jan Kiszka
On 21/12/06, M. Koehrer <mathias_koehrer@domain.hid> wrote:
> Hi Jan, hi everybody,
>
> I have stripped down my program that is crashing Xenomai even further.
> (I have attached the complete source code).
> No rtnet is required.
> Now I have the following real time task:
>
> static void realtimetask(void *arg)
> {
>     system("ls -l");
>     rt_task_sleep(1000000000ULL);
>     printf("rt_task_sleep done...\n");
> }
Is it still true that when you place printf() right after the system()
call, it works?
What happens when you try different sleep intervals : 0, say 1000 ?
Just to be sure where we are stuck, insert exit() (or rt_task_delete(NULL)):
(0) after system() --- if what was said above about printf() is not true;
(1) after rt_task_sleep();
(2) after printf("rt_task_sleep done...\n").
At which step does the hang start occurring?
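The stepwise narrowing suggested above can be sketched in plain C as follows. This is only an illustration under assumptions: run_with_checkpoint(), step_fn and stand_in_step() are hypothetical names, the steps are stand-ins for system(), rt_task_sleep() and printf(), and the real test would call exit() or rt_task_delete(NULL) at the stop point instead of leaving the loop.

```c
#include <assert.h>
#include <stdio.h>

/* One step of the task under test (hypothetical type). */
typedef void (*step_fn)(void);

/*
 * Run steps[0..count-1] in order and stop right after the step with
 * index stop_after (pass count or more to run every step). A marker is
 * written to stderr and flushed after each step, so the last marker
 * seen identifies the last step that completed.
 * Returns the number of steps executed.
 */
int run_with_checkpoint(const step_fn *steps, int count, int stop_after)
{
    int done = 0;

    assert(steps != NULL);
    for (int i = 0; i < count; i++) {
        steps[i]();
        done++;
        fprintf(stderr, "checkpoint %d passed\n", i);
        fflush(stderr);
        if (i == stop_after)
            break; /* the real test would exit() or rt_task_delete(NULL) here */
    }
    return done;
}

/* Stand-in step for illustration: only counts how often it ran. */
static int calls;

static void stand_in_step(void)
{
    calls++;
}
```

Moving stop_after from 0 upwards until the hang appears shows which step is the first one that does not return.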
>
> This leads to a complete freeze of the PC on a 2.6.19.1 kernel using the latest
> Xenomai (from SVN) and the included adeos-patch.
I have almost the same config (only 2.6.19) at home, although it's a P3
750, so I'll try it if this isn't solved by this evening (or unless we are sure
that only high-end machines are involved).
--
Best regards,
Dmitry Adamushko
* [Xenomai-help] Aw: Re: A fairly small rtnet/Xenomai application that freezes the PC
@ 2006-12-20 14:11 ` M. Koehrer
2006-12-19 8:08 ` [Xenomai-help] NMI watchdog: Loading of xeno_native leads to reboot of PC Jan Kiszka
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-20 14:11 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
[-- Attachment #1.1: Type: text/plain, Size: 2569 bytes --]
Hi Jan,
enclosed is the ethereal dump file.
Unfortunately I do not have a slow machine... (We try to get the fastest machines running...).
Thanks for checking the application.
Mathias
> > after a long debug-reboot-try-again-session I was able to reduce the
> problem I had
> > to a very short application that leads to a complete system freeze on a
> 2.6.19.1 kernel.
> > As with my PC I am not able to run the NMI stuff within Xenomai, I forward
> the application to you
> > and the list. Perhaps somebody can try out the enclosed application with
> the latest Xenomai version
> > on 2.6.19.1.
>
> Thanks for the test case, I will give this a try ASAP.
>
> > Please adjust the IP settings in the C file and in the xenorun script to
> your setup.
> > It is sufficient to send out UDP frames, if the remote device does not
> answer this does not hurt.
> > However, the remote device should listen on a specified port to avoid ICMP
> > complaints about unopened UDP ports...
> > Using Ethereal I monitored the traffic.
> > Here is the summary of it:
> > ARP request/response (from the xenorun script)
> > ARP request/response (from the application)
> > UDP message to port 18765 (no answer)
> > 2 seconds later: UDP message to port 18765 (no answer)
> > 2+5=7 seconds later: ARP request/response (from the application)
> > 2 seconds later: UDP message to port 18765
> > Here the system freezes. I do not see the message "Step A".
>
> Could you send me the Ethereal dump for reference? Just in case the test
> does not kick immediately for me and I need to check the event flow.
>
> >
> > One important piece is the system() call out of the realtime application
> to do a rtroute.
> > When I remove this call, there is no error...
> >
> >
> > I am using Xenomai SVN #1969 and the included 2.6.19.1 patch. Pentium 4
> UP.
>
> Do you have a different (slower) execution platform at hand to check if
> the CPU speed has an influence on the lock-up? I hope it is not the case -
> makes tracking easier. But it wouldn't be the first time.
>
> >
> > Any feedback on this is highly welcome!
> >
>
> You will get it.
>
> Thanks again,
> Jan
>
>
--
Mathias Koehrer
mathias_koehrer@domain.hid
[-- Attachment #2: rtnet_crashtest.tcpdump --]
[-- Type: application/octet-stream, Size: 714 bytes --]
* Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads to reboot of PC
2006-12-19 7:54 ` M. Koehrer
@ 2006-12-19 8:08 ` Jan Kiszka
2006-12-19 7:54 ` M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2006-12-19 8:08 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
M. Koehrer wrote:
> Hi!
>
> Before digging deeper into the issue I have mailed yesterday (see below), I have to solve and
> to understand the NMI watchdog feature.
> I have enabled the NMI watchdog in the kernel configuration and set the time value to 100 now (100us).
> Also, I passed the nmi_watchdog=1 kernel parameter to GRUB.
> In dmesg's output I see the line:
> Testing NMI watchdog ... OK.
>
> The Xenomai functionality is compiled as modules (as far as it is possible).
>
> Now, I do a
> modprobe xeno_nucleus
> This looks fine.
>
> Now, I do a
> modprobe xeno_native
> to load the native skin (I need it for my application to run).
> Then the PC reboots directly.
> This means I have no chance to start my application as the PC reboots before
> I can start it...
>
> I have the impression that some important thing is missing in my test.
> The xeno timer is not started after modprobe xeno_nucleus
> The contents of /proc/timer is:
> status=off:setup=120:tickval=0:jiffies=0
>
> Could this be a reason for the behaviour?
That's normal, it is started on loading the first skin.
> Can I start the timer before loading xeno_native to avoid the NMIs?
Nope, the NMI must work as you tried to use it. I have basically the
same setup here (all modular, NMI on by default), but even with latest
kernel/ipipe everything is fine.
Could you post your .config to compare details? Also the output of
/proc/interrupts over .19 would be interesting.
Jan
* [Xenomai-help] NMI watchdog: Loading of xeno_native leads to reboot of PC
@ 2006-12-19 7:54 ` M. Koehrer
2006-12-19 8:14 ` Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads to M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-19 7:54 UTC (permalink / raw)
To: gilles.chanteperdrix, mathias_koehrer; +Cc: xenomai, jan.kiszka
Hi!
Before digging deeper into the issue I mailed about yesterday (see below), I have to solve and
to understand the NMI watchdog feature.
I have enabled the NMI watchdog in the kernel configuration and set the time value to 100 now (100us).
Also, I passed the nmi_watchdog=1 kernel parameter to GRUB.
In dmesg's output I see the line:
Testing NMI watchdog ... OK.
The Xenomai functionality is compiled as modules (as far as it is possible).
Now, I do a
modprobe xeno_nucleus
This looks fine.
Now, I do a
modprobe xeno_native
to load the native skin (I need it for my application to run).
Then the PC reboots directly.
This means I have no chance to start my application as the PC reboots before
I can start it...
I have the impression that some important thing is missing in my test.
The xeno timer is not started after modprobe xeno_nucleus
The contents of /proc/timer is:
status=off:setup=120:tickval=0:jiffies=0
Could this be a reason for the behaviour?
Can I start the timer before loading xeno_native to avoid the NMIs?
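For scripted checks, the timer status line quoted above can be split into its fields. A small sketch, assuming the line always has exactly the four key=value fields shown in this message; parse_timer_status() and timer_is_running() are hypothetical helpers, and treating every status other than "off" as "running" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a line such as "status=off:setup=120:tickval=0:jiffies=0". */
struct timer_status {
    char status[16];
    unsigned long setup;
    unsigned long tickval;
    unsigned long jiffies;
};

/* Returns 0 on success, -1 if the line does not have the expected layout. */
int parse_timer_status(const char *line, struct timer_status *out)
{
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "status=%15[^:]:setup=%lu:tickval=%lu:jiffies=%lu",
               out->status, &out->setup, &out->tickval, &out->jiffies);
    return n == 4 ? 0 : -1;
}

/* Assumption: the timer counts as running unless the status reads "off". */
int timer_is_running(const struct timer_status *ts)
{
    return strcmp(ts->status, "off") != 0;
}
```

For the line reported here, the parser yields status "off" and setup 120, so timer_is_running() returns 0.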
Thanks for all help on this topic as without this feature it seems to be impossible
to detect the bug/issue with a freezing system with the latest Xenomai/rtnet and
kernel 2.6.19.1. (see https://mail.gna.org/public/xenomai-help/2006-12/msg00109.html)
Regards
Mathias
----- Original Message -----
From: Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
To: "M. Koehrer" <mathias_koehrer@domain.hid>
Date: 18.12.2006 16:32
Subject: Re: Aw: Re: Aw: Re: [RTnet-users] [Xenomai-help] rtnet / Xenomai:
Kernel
> M. Koehrer wrote:
> > O.k,
> >
> > I tried once more with the NMI watchdog stuff.
> > However, it looks as if I do not understand the NMI watchdog correctly...
> > I passed nmi_watchdog=1 as kernel parameter.
> > Now, the NMI watchdog seems to be o.k.
> > I have set the kernel parameter "NMI watchdog latency threshold (us)" to
> 1000000 (1 second).
> > Now I do a modprobe xeno_nucleus.
> > This is o.k.
> > Then I do a modprobe xeno_native.
> > This leads to a watchdog NMI on the console after 1s.
> > "NMI watchdog detected timer latency above 100000us"
> > CPU 1
> > EIP is at mwait_idle 0x23/0x37
> >
> > When I compile the Xenomai functionality directly into the kernel (no
> modules), I never reach
> > the login prompt at my PC as the NMI watchdog from above came first...
> >
> > Well, somehow the NMI stuff seems to work. However, I am not able to start
> my application
> > as my systems gets the NMI before I have the chance to start the
> application...
> > What is wrong here? I think, I miss one piece in the puzzle...
> >
> > Thanks for any support on that strange behaviour.
>
> 1 second is probably way too much and overflows a 32-bit value when
> converted to a processor tick count. The default of 100 us is more
> reasonable.
>
> --
> Gilles Chanteperdrix
>
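The overflow concern can be checked with simple arithmetic. The sketch below assumes a CPU clock of 3 GHz, which is a guess for a fast Pentium 4 and is not stated in the thread; how Xenomai actually stores the threshold is not shown here either. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a threshold in microseconds to CPU ticks using 64-bit math. */
uint64_t usec_to_ticks(uint64_t usec, uint64_t cpu_hz)
{
    assert(cpu_hz > 0);
    return usec * cpu_hz / 1000000ULL;
}

/* Does the tick count fit into an unsigned 32-bit value? */
int fits_in_u32(uint64_t ticks)
{
    return ticks <= UINT32_MAX;
}

/* Does the tick count fit into a signed 32-bit value? */
int fits_in_s32(uint64_t ticks)
{
    return ticks <= INT32_MAX;
}
```

Under that assumption, 1000000 us corresponds to 3000000000 ticks, which exceeds the signed 32-bit maximum of 2147483647 but not the unsigned maximum of 4294967295. The default of 100 us corresponds to 300000 ticks and fits either way. Whether a real overflow occurs therefore depends on the clock frequency and on the signedness of the variable involved.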
--
Mathias Koehrer
mathias_koehrer@domain.hid
* Re: Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads to
2006-12-19 8:08 ` [Xenomai-help] NMI watchdog: Loading of xeno_native leads to reboot of PC Jan Kiszka
@ 2006-12-19 8:14 ` M. Koehrer
2006-12-19 9:26 ` [Xenomai-help] NMI watchdog: Loading of xeno_native leads M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-19 8:14 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
[-- Attachment #1.1: Type: text/plain, Size: 2431 bytes --]
Hi Jan,
here is my /proc/interrupts
CPU0
0: 331260 IO-APIC-edge timer
1: 8 IO-APIC-edge i8042
7: 0 IO-APIC-edge parport0
9: 0 IO-APIC-fasteoi acpi
14: 12 IO-APIC-edge ide0
16: 1022 IO-APIC-fasteoi eth0
19: 2358 IO-APIC-fasteoi libata
NMI: 331248
LOC: 331233
ERR: 0
MIS: 0
I have enclosed a config.gz file of my 2.6.19.1 kernel.
Mathias
> > Before digging deeper into the issue I have mailed yesterday (see below),
> I have to solve and
> > to understand the NMI watchdog feature.
> > I have enabled the NMI watchdog in the kernel configuration and set the
> time value to 100 now (100us).
> > Also, I passed the nmi_watchdog=1 kernel parameter to GRUB.
> > In dmesg's output I see the line:
> > Testing NMI watchdog ... OK.
> >
> > The Xenomai functionality is compiled as modules (as far as it is
> possible).
> >
> > Now, I do a
> > modprobe xeno_nucleus
> > This looks fine.
> >
> > Now, I do a
> > modprobe xeno_native
> > to load the native skin (I need it for my application to run).
> > Then the PC reboots directly.
> > This means I have no chance to start my application as the PC reboots
> before
> > I can start it...
> >
> > I have the impression that some important thing is missing in my test.
> > The xeno timer is not started after modprobe xeno_nucleus
> > The contents of /proc/timer is:
> > status=off:setup=120:tickval=0:jiffies=0
> >
> > Could this be a reason for the behaviour?
>
> That's normal, it is started on loading the first skin.
>
> > Can I start the timer before loading xeno_native to avoid the NMIs?
>
> Nope, the NMI must work as you tried to use it. I have basically the
> same setup here (all modular, NMI on by default), but even with latest
> kernel/ipipe everything is fine.
>
> Could you post your .config to compare details? Also the output of
> /proc/interrupts over .19 would be interesting.
>
> Jan
>
--
Mathias Koehrer
mathias_koehrer@domain.hid
[-- Attachment #2: config.gz --]
[-- Type: application/octetstream, Size: 8526 bytes --]
* Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads...
@ 2006-12-19 9:26 ` M. Koehrer
2006-12-19 12:04 ` Aw: " M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-19 9:26 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
Hi!
It was the compiler!!! Using gcc-3.3 solved the issue.
That means that, for some reason, gcc-4.1.2 must not be used with Xenomai currently...
This allowed me to catch the system hang I have been hunting since yesterday...
And I got it.
Here is the information I copied from the console (I have not tried with console redirection yet...).
CPU 0, eip c010122d
EIP 0060:[<c010122d>]
EFLAGS 00000246
EIP is at mwait_idle_with_hints+0x2c/0x2e
eax:0 ebx:c0498000 ecx:0 edx:0
Call Trace
mwait_idle+0x0/0x2e
mwait_idle+0x1d/0x2e
cpu_idle+0x44/0x86
start_kernel+0x1f1/0x1f5
unknown_bootoption+0x0/0x191
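Each entry in the trace above has the form symbol+offset/size, where the offset is the position inside the function and the size is the length of that function. A small sketch for splitting such entries; parse_trace_entry() and entry_is_idle() are hypothetical helpers, and classifying a function as "idle" purely by its name (mwait_idle*, cpu_idle) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct trace_entry {
    char symbol[64];     /* function name */
    unsigned int offset; /* offset of the address inside the function */
    unsigned int size;   /* size of the function in bytes */
};

/* Parse "symbol+0xOFF/0xSIZE". Returns 0 on success, -1 otherwise. */
int parse_trace_entry(const char *text, struct trace_entry *out)
{
    int n;

    assert(text != NULL && out != NULL);
    n = sscanf(text, " %63[^+ ]+0x%x/0x%x",
               out->symbol, &out->offset, &out->size);
    return n == 3 ? 0 : -1;
}

/* Assumption: idle-loop functions are recognised by name only. */
int entry_is_idle(const struct trace_entry *e)
{
    return strncmp(e->symbol, "mwait_idle", 10) == 0 ||
           strcmp(e->symbol, "cpu_idle") == 0;
}
```

Applied to this trace, the EIP entry and the first three call-trace entries are classified as idle functions.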
I'll try to reproduce this with console redirection; however, I hope this helps already!
Regards
Mathias
> Hi!
>
> I have modified the configuration to disable profiling and to set the NMI
> time value to 100.
> However, there is no difference.
> Could this be a compiler issue?
> I am using gcc 4.1.2 (debian etch prerelease)
> I try to recompile the kernel using an older gcc-3.3 to see if this helps.
>
> Regards
>
> Mathias
> > > here is my /proc/interrupts
> > >
> > > CPU0
> > > 0: 331260 IO-APIC-edge timer
> > > 1: 8 IO-APIC-edge i8042
> > > 7: 0 IO-APIC-edge parport0
> > > 9: 0 IO-APIC-fasteoi acpi
> > > 14: 12 IO-APIC-edge ide0
> > > 16: 1022 IO-APIC-fasteoi eth0
> > > 19: 2358 IO-APIC-fasteoi libata
> > > NMI: 331248
> > > LOC: 331233
> > > ERR: 0
> > > MIS: 0
> >
> > Nothing unusual on first sight.
> >
> > >
> > > I have enclosed a config.gz file of my 2.6.19.1 kernel.
> >
> > Two things to try:
> > - CONFIG_XENO_HW_NMI_DEBUG_LATENCY_MAX=100 (i.e. default again)
> > - CONFIG_PROFILING=y (I'm not sure ATM if it may interfere with
> > Xenomai's watchdog)
> >
> > Jan
> >
> >
>
> --
> Mathias Koehrer
> mathias_koehrer@domain.hid
>
>
>
--
Mathias Koehrer
mathias_koehrer@domain.hid
* Aw: Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads...
@ 2006-12-19 12:04 ` M. Koehrer
2006-12-20 13:25 ` [Xenomai-help] A fairly small rtnet/Xenomai application that freezes the PC M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-19 12:04 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
Hi Jan,
I tried to enable the kernel hacking parameters you proposed.
However, now I get a "NMI early shots: 0" message many times per second.
These seem to slow down everything dramatically; it is hard to work with the
system. What are the early shots messages and how can I avoid them?
Regards
Mathias
> > It was the compiler!!! Using gcc-3.3 solved the issue.
> > That means, that somehow gcc-4.1.2 must not be used with Xenomai
> currently...
>
> OK, this needs some examination then. Can anyone reproduce this issue?
> /me is currently lacking the compiler.
>
> >
> > This allowed me to catch the system hang I was hunting for since
> yesterday...
> > And I got it.
> > I write the information I found on the console (I have not tried with
> console redirection yet...).
> > CPU 0, eip c010122d
> > EIP 0060:[<c010122d>]
> > EFLAGS 00000246
> > EIP is at mwait_idle_with_hints+0x2c/0x2e
> > eax:0 ebx:c0498000 ecx:0 edx:0
> >
> > Call Trace
> > mwait_idle+0x0/0x2e
> > mwait_idle+0x1d/0x2e
> > cpu_idle+0x44/0x86
> > start_kernel+0x1f1/0x1f5
> > unknown_bootoption+0x0/0x191
>
> Well, this caught your box in the idle loop - not that informative yet.
> We must probably look beyond the last context switch. Please switch on
> the I-pipe tracer (kernel hacking -> I-pipe debugging) and configure
> back_trace_points to, hmm, say 200. On NMI alarm, we should then see a
> function call trace.
>
> >
> > I'll try to reproduce the same with console redirection, however, I hope
> this could help already!
> >
> > Regards
> >
> > Mathias
>
> Thanks for your effort,
>
> Jan
>
>
--
Mathias Koehrer
mathias_koehrer@domain.hid
* [Xenomai-help] A fairly small rtnet/Xenomai application that freezes the PC
@ 2006-12-20 13:25 ` M. Koehrer
2006-12-21 8:48 ` [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-20 13:25 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
[-- Attachment #1.1: Type: text/plain, Size: 1786 bytes --]
Hi Jan and Gilles,
after a long debug-reboot-try-again-session I was able to reduce the problem I had
to a very short application that leads to a complete system freeze on a 2.6.19.1 kernel.
As with my PC I am not able to run the NMI stuff within Xenomai, I forward the application to you
and the list. Perhaps somebody can try out the enclosed application with the latest Xenomai version
on 2.6.19.1.
Please adjust the IP settings in the C file and in the xenorun script to your setup.
It is sufficient to send out UDP frames; it does no harm if the remote device does not answer.
However, the remote device should listen on a specified port to avoid ICMP complaints about
unopened UDP ports...
Using Ethereal I monitored the traffic.
Here is the summary of it:
ARP request/response (from the xenorun script)
ARP request/response (from the application)
UDP message to port 18765 (no answer)
2 seconds later: UDP message to port 18765 (no answer)
2+5=7 seconds later: ARP request/response (from the application)
2 seconds later: UDP message to port 18765
Here the system freezes. I do not see the message "Step A".
One important piece is the system() call out of the realtime application to do a rtroute.
When I remove this call, there is no error...
I am using Xenomai SVN #1969 and the included 2.6.19.1 patch. Pentium 4 UP.
Any feedback on this is highly welcome!
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
[-- Attachment #2: rtnet_crashtest.tgz --]
[-- Type: application/octet-stream, Size: 1889 bytes --]
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-20 14:11 ` [Xenomai-help] Aw: Re: A fairly small rtnet/Xenomai application that freezes the PC M. Koehrer
@ 2006-12-21 8:48 ` M. Koehrer
2006-12-21 9:03 ` [Xenomai-help] " Jan Kiszka
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 8:48 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
Hi Jan,
meanwhile I have done a couple of additional tests.
With one of the PCs I have, I have disabled the Memory Cache to slow it down.
The effect was the very same.
I have reproduced the behaviour on another PC (also P4, but different mainboard).
The system freezes as well.
And finally, I have found out that printing a line with printf() directly after the system() call provides
a workaround. Then the system is stable.
I have replaced the system() call where I called rtroute with a simple call to "ls -l" (i.e. system("ls -l")).
Then the system freezes as well.
It looks to me as if a system() call out of the realtime task is not properly handled.
A printf() after the system() call seems to move the system back on track...
Hope that helps a little bit to identify the issue...
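For reference, on POSIX systems system() is typically built from three steps: fork() a child, exec /bin/sh -c with the command in the child, and wait for the child in the parent. The sketch below shows only those steps; run_command() is a hypothetical name, the signal handling that a real system() performs is left out, and nothing here is a claim about where the freeze in this thread originates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "command" through /bin/sh and wait for it, roughly as system()
 * does. Returns the exit status of the command, or -1 if the child
 * could not be created or did not exit normally.
 */
int run_command(const char *command)
{
    pid_t pid;
    int status;

    assert(command != NULL);
    pid = fork();                     /* duplicate the calling process */
    if (pid < 0)
        return -1;
    if (pid == 0) {                   /* child: become the shell */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                   /* only reached if execl() failed */
    }
    while (waitpid(pid, &status, 0) < 0) { /* parent: wait for the child */
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The fork() step is the part that copies the state of the calling process, which is why it is of interest when the caller is a real-time task.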
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
* [Xenomai-help] Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-21 8:48 ` [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the M. Koehrer
@ 2006-12-21 9:03 ` Jan Kiszka
2006-12-21 10:21 ` M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Jan Kiszka @ 2006-12-21 9:03 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]
M. Koehrer wrote:
> Hi Jan,
>
> meanwhile I have done a couple of additional tests.
> With one of the PCs I have, I have disabled the Memory Cache to slow it down.
> The effect was the very same.
> I have reproduced the behaviour on another PC (also P4, but different mainboard).
> The system freezes as well.
> And finally, I have found out that printing out a line with printf() directly after the system() call provides
> a workaround. Then the system is stable.
>
> I have replaced the system() call where I called rtroute with a simple call to "ls -l" (i.e system("ls -l") ).
> Then the system freezes as well.
> It looks to me as if a system() call out of the realtime task is not properly handled.
> A printf() after the system() call seems to move the system back on track...
>
> Hope that helps a little bit to identify the issue...
Unfortunately not yet. I tried with exactly the same configuration you
once mailed on a Pentium M 1.3 GHz - but all worked fine. The obvious
differences are the CPU speed (I may have access to a crash box with
more GHz next week) and the NIC (rt_eepro100 in my case).
As my hope of being able to reproduce it on my own is not that high, I
would like to ask you to try nailing down the lock-up on kernel level.
That means trying to identify (via printk e.g. - if this doesn't make
the bug jump around) which function is still executed and which point we
never reach.
TIA,
Jan
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]
* [Xenomai-help] Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-21 9:03 ` [Xenomai-help] " Jan Kiszka
@ 2006-12-21 10:21 ` M. Koehrer
2006-12-21 11:19 ` [Xenomai-help] " M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 10:21 UTC (permalink / raw)
To: jan.kiszka, mathias_koehrer; +Cc: xenomai
[-- Attachment #1.1: Type: text/plain, Size: 2547 bytes --]
Hi Jan, hi everybody,
I have stripped down my program that is crashing Xenomai even further.
(I have attached the complete source code).
No rtnet is required.
Now I have the following real time task:
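#include <stdio.h>
#include <stdlib.h>
#include <native/task.h>

static void realtimetask(void *arg)
{
    system("ls -l");                   /* spawn a child process from the real-time task */
    rt_task_sleep(1000000000ULL);      /* sleep for 1000000000 clock ticks */
    printf("rt_task_sleep done...\n");
}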
This leads to a complete freeze of the PC on a 2.6.19.1 kernel using the latest
Xenomai (from SVN) and the included adeos-patch.
I never had this issue on 2.6.17.* kernel.
Unfortunately this seems to occur only on fast Pentium 4 machines...
Is there anybody out there that can reproduce it?
Thanks for all feedback on this!
Mathias
> > meanwhile I have done a couple of additional tests.
> > With one of the PCs I have, I have disabled the Memory Cache to slow it
> down.
> > The effect was the very same.
> > I have reproduced the behaviour on another PC (also P4, but different
> mainboard).
> > The system freezes as well.
> > And finally, I have found out that printing out a line with printf()
> directly after the system() call provides
> > a workaround. Then the system is stable.
> >
> > I have replaced the system() call where I called rtroute with a simple
> call to "ls -l" (i.e system("ls -l") ).
> > Then the system freezes as well.
> > It looks to me as if a system() call out of the realtime task is not
> properly handled.
> > A printf() after the system() call seems to move the system back on
> track...
> >
> > Hope that helps a little bit to identify the issue...
>
> Unfortunately not yet. I tried with exactly the same configuration you
> once mailed on a Pentium M 1.3 GHz - but all worked fine. The obvious
> differences are the CPU speed (I may have access to a crash box with
> more GHz next week) and the NIC (rt_eepro100 in my case).
>
> As my hope of being able to reproduce it on my own is not that high, I
> would like to ask you to try nailing down the lock-up on kernel level.
> That means trying to identify (via printk e.g. - if this doesn't make
> the bug jump around) which function is still executed and which point we
> never reach.
>
> TIA,
> Jan
--
Mathias Koehrer
mathias_koehrer@domain.hid
Viel oder wenig? Schnell oder langsam? Unbegrenzt surfen + telefonieren
ohne Zeit- und Volumenbegrenzung? DAS TOP ANGEBOT JETZT bei Arcor: günstig
und schnell mit DSL - das All-Inclusive-Paket für clevere Doppel-Sparer,
nur 44,85 inkl. DSL- und ISDN-Grundgebühr!
http://www.arcor.de/rd/emf-dsl-2
[-- Attachment #2: xeno_crash.tgz --]
[-- Type: application/octet-stream, Size: 954 bytes --]
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-21 10:45 [Xenomai-help] Re: A fairly small rtnet/Xenomai application that freezes the Dmitry Adamushko
@ 2006-12-21 11:19 ` M. Koehrer
2006-12-21 11:28 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 11:19 UTC (permalink / raw)
To: dmitry.adamushko, mathias_koehrer; +Cc: xenomai, jan.kiszka
Hi Dmitry,
thanks for your response.
>
>
> Is it still true that when you place printf() right after the system()
> call, it works?
Yes, a printf() directly after the system() fixes the issue.
>
> What happens when you try different sleep intervals : 0, say 1000 ?
>
> Just to be sure where we are stuck. Insert exit() (or rt_task_delete(NULL))
> :
>
> (0) after system() --- if said above about printf() is not true;
> (1) after rt_task_sleep();
> (2) after printf("rt_task_sleep done...\n").
I have added a hacky trace mechanism to the kernel to see what is happening
(whenever the nmi_watchdog kernel oops appears).
What I can see is that rt_task_sleep() runs to completion.
If I modify my application to add a second rt_task_sleep() directly after the
existing one, I see that both rt_task_sleep() calls complete.
The freeze seems to happen in the printf() call after rt_task_sleep().
Unfortunately, I do not know what I can trace at this point...
Best regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the
2006-12-21 11:19 ` [Xenomai-help] " M. Koehrer
@ 2006-12-21 11:28 ` Philippe Gerum
2006-12-21 11:51 ` [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2006-12-21 11:28 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai, jan.kiszka
On Thu, 2006-12-21 at 12:19 +0100, M. Koehrer wrote:
> Hi Dmitry,
>
> thanks for your response.
> >
> >
> > Is it still true that when you place printf() right after the system()
> > call, it works?
> Yes, a printf() directly after the system() fixes the issue.
>
> >
> > What happens when you try different sleep intervals : 0, say 1000 ?
> >
> > Just to be sure where we are stuck. Insert exit() (or rt_task_delete(NULL))
> > :
The important issue to check is the above one. Please try calling
rt_task_delete(NULL), and in a second test, rt_task_suspend(NULL),
instead of letting the thread routine return. TIA,
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 11:28 ` Philippe Gerum
@ 2006-12-21 11:51 ` M. Koehrer
2006-12-21 13:09 ` Dmitry Adamushko
2006-12-21 13:36 ` [Xenomai-help] " M. Koehrer
0 siblings, 2 replies; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 11:51 UTC (permalink / raw)
To: rpm, mathias_koehrer; +Cc: xenomai, jan.kiszka
Hello Philippe,
> The important issue to check is the above one. Please try calling
> rt_task_delete(NULL), and in a second test, rt_task_suspend(NULL),
> instead of letting the thread routine return. TIA,
O.K, here are the results:
I have placed rt_task_delete() at all possible positions in realtimetask().
The system freezes when I place rt_task_delete() after the printf() statement.
At all other positions (after system(), after rt_task_sleep()) the system does not freeze.
I get the very same results when I use rt_task_suspend() instead of rt_task_delete().
This means that the printf() call causes the freeze.
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 11:51 ` [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai M. Koehrer
@ 2006-12-21 13:09 ` Dmitry Adamushko
2006-12-21 13:36 ` [Xenomai-help] " M. Koehrer
1 sibling, 0 replies; 5+ messages in thread
From: Dmitry Adamushko @ 2006-12-21 13:09 UTC (permalink / raw)
To: M. Koehrer; +Cc: Xenomai help, Jan Kiszka
On 21/12/06, M. Koehrer <mathias_koehrer@domain.hid> wrote:
> Hello Philippe,
>
> > The important issue to check is the above one. Please try calling
> > rt_task_delete(NULL), and in a second test, rt_task_suspend(NULL),
> > instead of letting the thread routine return. TIA,
> O.K, here are the results:
> I have placed rt_task_delete() at all possible positions in realtimetask().
> The system freezes when I place rt_task_delete() after the printf() statement.
> At all other positions (after system(), after rt_task_sleep()) the system does not freeze.
> I get the very same results when I use rt_task_suspend() instead of rt_task_delete().
>
> It means, that the printf() call causes the freeze.
just to be sure.
Could you try with other secondary domain calls (instead of printf() )
: write() (open/close as I suggested before) or even getpid() ?
(1) instead of the last code line (printf() after rt_task_sleep() ) ;
(2) put it after system(). Does it still prevent a PC from hanging?
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 13:09 ` Dmitry Adamushko
(?)
@ 2006-12-21 13:36 ` M. Koehrer
2006-12-21 14:13 ` Philippe Gerum
-1 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 13:36 UTC (permalink / raw)
To: dmitry.adamushko, mathias_koehrer; +Cc: xenomai, jan.kiszka
Hi all,
some more interesting measurements:
1) Even when I remove the printf() completely, the system freezes.
My realtimetask is then
static void realtimetask(void *arg)
{
system("ls -l");
rt_task_sleep(1000000000ULL);
}
2) When I replace the printf() with a getpid(), the behaviour is the same.
3) Jan, your application using rt_task_shadow behaves exactly the same (it freezes)!
4) A getpid() directly after system() does not help; however, a printf() at this position
prevents the freeze.
But now I have found one very interesting thing:
When I rename my /lib/tls to /lib/tls.disabled, it works!!!!
It seems to be (once more) an ugly thread-local storage problem.
ldd shows that xeno_crash depends on libc and libpthread taken from the
/lib/tls/i686/cmov directory.
Perhaps that could give a hint!
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 13:36 ` [Xenomai-help] " M. Koehrer
@ 2006-12-21 14:13 ` Philippe Gerum
2006-12-21 15:00 ` [Xenomai-help] " M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2006-12-21 14:13 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
On Thu, 2006-12-21 at 14:36 +0100, M. Koehrer wrote:
> Hi all,
>
> some more interesting measurements:
> 1) Even when I remove the printf() completely, the system freezes.
> My realtimetask is then
> static void realtimetask(void *arg)
> {
> system("ls -l");
> rt_task_sleep(1000000000ULL);
>
> }
>
> 2) When I replace the printf() by a getpid() the behaviour is the same.
>
> 3) Jan, your application using the rt_task_shadow behaves the very same (it freezes)!
>
> 4) A getpid() directly after system does not help - however, a printf at this position helps to
> prevent the freeze.
>
> But now, I found one very interesting thing:
> When I rename my /lib/tls to /lib/tls.disabled, it works!!!!
> It seems to be (once more) a ugly thread local storage stuff.
> The ldd dependency of xeno_crash show libc and libpthread that are take from
> /lib/tls/i686/cmov directory.
> Perhaps that could give a hint!
Please check the following assertions on your setup:
- does enabling the debug option for the Xenomai nucleus cause Xenomai
warnings to appear (messages about forced switches of the crashtest task
to secondary mode), even over 2.6.17 with the very same test code?
- does enabling CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP in
the kernel hacking section cause Linux warnings to appear while the test
code runs over 2.6.19, before the box crashes?
- does the bug still occur after the call to the system() routine has
been replaced by the following fragment?
if (vfork() == 0)
execlp("/bin/ls", "ls", "-l", NULL);
else
wait(NULL);
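
For anyone who wants to run this test outside the original program, a slightly more defensive version of the fragment could look like the sketch below. It is plain POSIX C with no Xenomai calls. The helper name run_program_vfork() is made up for illustration, and the _exit(127) after a failed exec and the waitpid() on the specific child are additions that the fragment above does not have.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the thread or of Xenomai):
 * run a program via vfork() + exec and return its wait status,
 * or -1 if the child could not be created or waited for.
 */
static int run_program_vfork(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: it borrows the parent's address space until exec,
         * so it must do nothing except exec or _exit. */
        execvp(path, argv);
        _exit(127);             /* exec failed */
    }

    /* Parent: resumes once the child has called exec or _exit. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling it with "/bin/ls" and an argv of { "ls", "-l", NULL } corresponds to the test requested above; the returned value can be decoded with WIFEXITED() and WEXITSTATUS().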
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 14:13 ` Philippe Gerum
@ 2006-12-21 15:00 ` M. Koehrer
2006-12-21 15:17 ` Dmitry Adamushko
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 15:00 UTC (permalink / raw)
To: rpm, mathias_koehrer; +Cc: xenomai
Hi Philippe,
here are some of the results:
> Please check the following assertions on your setup:
>
> - does enabling the debug option for the Xenomai nucleus cause Xenomai
> warnings to appear (messages about forced switches of the crashtest task
> to secondary mode), even over 2.6.17 with the very same test code?
Yes, on 2.6.17 I get the line
Xenomai: Switching crashtest to secondary mode after exception #14 from user-space at 0xb7ff2c7c (pid 1616)
On 2.6.19.1, the system freezes, i.e. I cannot see any message.
>
> - does enabling CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP in
> the kernel hacking section cause Linux warnings to appear while the test
> code runs over 2.6.19, before the box crashes?
I am about to recompile the kernel, this takes a while...
> - does the bug still occur after the call to the system() routine has
> been replaced by the following fragment?
>
> if (vfork() == 0)
> execlp("/bin/ls", "ls", "-l", NULL);
> else
> wait(NULL);
>
I have replaced the system() call with your code fragment. And yes, this works!
No freeze!
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Xenomai-help] Re: Re: Re: Re: Re: A fairly small rtnet/Xenomai
2006-12-21 15:00 ` [Xenomai-help] " M. Koehrer
@ 2006-12-21 15:17 ` Dmitry Adamushko
2006-12-21 15:36 ` [Xenomai-help] " M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Adamushko @ 2006-12-21 15:17 UTC (permalink / raw)
To: M. Koehrer; +Cc: Xenomai help
On 21/12/06, M. Koehrer <mathias_koehrer@domain.hid> wrote:
>
> > - does the bug still occur after the call to the system() routine has
> > been replaced by the following fragment?
> >
> > if (vfork() == 0)
> > execlp("/bin/ls", "ls", "-l", NULL);
> > else
> > wait(NULL);
> >
> I have replaced the system() call by your code fragment. And yes, this works!
> No freeze!
Ok, could you let me know what happens if you use fork() instead of vfork()?
system() does the same thing (fork, then exec), but it also deals with signals.
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai...
2006-12-21 15:17 ` Dmitry Adamushko
@ 2006-12-21 15:36 ` M. Koehrer
2006-12-21 15:43 ` Dmitry Adamushko
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-21 15:36 UTC (permalink / raw)
To: dmitry.adamushko, mathias_koehrer; +Cc: xenomai
Hi Dmitry,
when I use fork() instead of vfork() I have the freeze again.
I.e. using vfork() it works fine, using fork() freezes the system.
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai...
2006-12-21 15:36 ` [Xenomai-help] " M. Koehrer
@ 2006-12-21 15:43 ` Dmitry Adamushko
2006-12-22 9:06 ` M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Adamushko @ 2006-12-21 15:43 UTC (permalink / raw)
To: M. Koehrer; +Cc: Xenomai help
> when I use fork() instead of vfork() I have the freeze again.
> I.e. using vfork() it works fine, using fork() freezes the system.
Gilles: is this not the end point of the fork vs. vfork saga? :)
Anyway, it looks like your remarks about fork() make sense. But if that's
indeed true, a nicer solution would be required.
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai...
2006-12-21 18:18 ` Gilles Chanteperdrix
@ 2006-12-22 9:06 ` M. Koehrer
2006-12-22 9:24 ` [Xenomai-help] " Gilles Chanteperdrix
-1 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-22 9:06 UTC (permalink / raw)
To: gilles.chanteperdrix, dmitry.adamushko; +Cc: xenomai
Good morning everybody,
here is some more news concerning the Xenomai crash issue.
1) I have tried to enable the CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP
kernel config parameters. But I did not get any messages before the kernel freezes.
2) I have tried linking the application statically (using -lstatic). When I pass
the -L/usr/lib/nptl option to force the linker to use the NPTL libs, the effect is the same.
So I now have a static binary that freezes the system.
I can mail you the binary (a bz2 file of about 280 kByte) if you like.
I think this file is too large to be posted to the list.
One additional question comes to mind:
From everything I have learned about this issue, it seems to be fairly
critical to fork from a real-time task or to create new processes from it.
My question is now:
Is it possible to force a real-time task back to standard Linux task behaviour,
so that the critical calls can be made in that state? Afterwards, a move back to
the original real-time state would be nice. In the native skin there is the
rt_task_shadow() call, which turns a standard Linux task into a real-time task.
For moving a real-time task to a standard task (and back), something like
rt_task_movetostandard(&state), which saves the real-time state in a variable,
and a matching rt_task_movebacktorealtime(&state) would be helpful.
From the user's point of view, all of this would be something like a critical region:
I enter a region where I can do really anything, and then I move back
to the real-time domain.
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: A fairly small rtnet/Xenomai...
2006-12-22 9:06 ` M. Koehrer
@ 2006-12-22 9:24 ` Gilles Chanteperdrix
2006-12-22 9:40 ` Dmitry Adamushko
0 siblings, 1 reply; 5+ messages in thread
From: Gilles Chanteperdrix @ 2006-12-22 9:24 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
M. Koehrer wrote:
> Good morning everybody,
>
> here are some more news concerning the Xenomai crash issue.
> 1) I have tried to enable the CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP
> kernel config parameters. But I did not get any messages before the kernel freezes.
>
> 2) I have tried to link a static application (using -lstatic). When I pass
> the -L/usr/lib/nptl option to force the linker to use the nptl libs, the effect is the same.
> Well, I have now a static binary that freezes the system.
> I can mails you the binary file (about 280kByte bz2 file) if you like.
> I think this file is too large to be posted to the list.
>
> One additional question comes in my mind:
> As I learned from all that things that are related to the issue, it seems to be fairly
> critical to fork out of a real time task or to create new processes out of it.
In the absence of a better solution, I would recommend calling the "fault_vm"
function after each fork.
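
The body of fault_vm is not shown in this part of the thread, so the following is only a hypothetical stand-in that illustrates the idea on a single buffer: after a fork, write to every page once so that the copy-on-write fault is taken right away rather than later in time-critical code. The name prefault_range() and its return value are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the idea behind fault_vm (the real
 * function is not shown here): rewrite one byte in every page that
 * overlaps [addr, addr + len), forcing any pending copy-on-write
 * fault to happen now. Returns the number of pages touched.
 */
static size_t prefault_range(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    uintptr_t p = (uintptr_t)addr;
    uintptr_t end = p + len;
    size_t touched = 0;

    while (p < end) {
        volatile unsigned char *byte = (volatile unsigned char *)p;

        *byte = *byte;          /* same value back: contents unchanged */
        touched++;

        /* Advance to the first byte of the next page. */
        p = (p & ~(uintptr_t)(page - 1)) + page;
    }
    return touched;
}
```

As written this is only safe while no other thread can modify the buffer, since the read-then-write of each byte is not atomic.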
--
Gilles Chanteperdrix
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: A fairly small rtnet/Xenomai...
2006-12-22 9:24 ` [Xenomai-help] " Gilles Chanteperdrix
@ 2006-12-22 9:40 ` Dmitry Adamushko
2006-12-22 10:27 ` [Xenomai-help] " M. Koehrer
0 siblings, 1 reply; 5+ messages in thread
From: Dmitry Adamushko @ 2006-12-22 9:40 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Xenomai help
On 22/12/06, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
>
> In absence of a better solution, I would recommend to use the "fault_vm"
> function after each fork.
But it's not safe. It doesn't stop any other rt threads (if there are
a few in this app.) from touching the wp-pages (it's not only about
stacks after all) in the meantime. So fault_vm() only reduces the
probability of crashing but doesn't eliminate it completely.
So all the contexts have to be blocked from the moment
fork() is about to be called until the moment a subsequent
fault_vm() is done.
It's ugly, and that suggests fork() is not OK here at all.
And btw, vfork() should be a funny thing when called from a
multi-threaded app. It blocks the calling context and borrows its
address space while other threads continue to run, well, in
that same address space (maybe it's handled somehow, have to check).
--
Best regards,
Dmitry Adamushko
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai...
2006-12-22 10:15 ` [Xenomai-help] " Gilles Chanteperdrix
@ 2006-12-22 10:27 ` M. Koehrer
2006-12-22 11:20 ` Philippe Gerum
0 siblings, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-22 10:27 UTC (permalink / raw)
To: gilles.chanteperdrix, dmitry.adamushko; +Cc: xenomai
Hi Gilles,
> fault_vm is safe to use only if you are calling fork at a time when
> there is only one thread. So, if your application is forking at init, it
> should be OK.
Do you mean there must be only one real-time thread?
That means that when I have an application that creates multiple real-time threads,
I cannot rely on fault_vm()?
In this case I have to do it the "hard" way, using a different (non-real-time)
context to do the forks and system() calls.
As this is hard to understand, I strongly recommend that Xenomai provide support
for this, i.e. a Xenomai API that can be called with a (callback) function pointer and
a user data pointer.
When a (real-time) thread calls this function, the real-time thread is blocked.
The callback function is then called from a safe context, and
after the callback function returns, the real-time thread is resumed.
Regards
Mathias
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai...
2006-12-22 10:27 ` [Xenomai-help] " M. Koehrer
@ 2006-12-22 11:20 ` Philippe Gerum
2006-12-22 10:15 ` [Xenomai-help] " Gilles Chanteperdrix
2006-12-22 11:40 ` [Xenomai-help] Re: " M. Koehrer
0 siblings, 2 replies; 5+ messages in thread
From: Philippe Gerum @ 2006-12-22 11:20 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
On Fri, 2006-12-22 at 11:27 +0100, M. Koehrer wrote:
> Hi Gilles,
>
> > fault_vm is safe to use only if you are calling fork at a time when
> > there is only one thread. So, if your application is forking at init, it
> > should be OK.
> Do you mean there must be only one real time thread?
> That means, when I have an application that creates multiple real time threads,
> I can not rely on fault_vm() ?
> In this case I have to do the "hard" way by using a different (non real time)
> context to do the forks and system() calls.
> As this is hard to understand, I strongly recommend that there is Xenomai support
> for this! I.e. a Xenomai API that can be called with a (callback-)function pointer and
> a user data pointer.
> When a (realtime) thread calls this function, the real time thread is blocked.
> The callback function is then called from a safe context and
> after exit of the callback function the real time thread is resumed.
>
Sorry, but no, no way, I won't merge anything like this, ever. This is
the wrong way to go. The right way is to fix the COW issue at kernel
level - probably the I-pipe has to provide the required support -
because this is where those dirty details belong. This is definitely
not an API issue, because you just cannot tell application developers to
care about arch-specific VM issues when using a so-called generic API
that has to work the same way on all archs (e.g. MMU-less platforms
don't care about this, others would). What could be considered as a
bearable limitation right now must not have any impact on long-term
principles, and the API stuff belongs to that category of issues.
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: A fairly small rtnet/Xenomai...
2006-12-22 9:40 ` Dmitry Adamushko
@ 2006-12-22 10:15 ` Gilles Chanteperdrix
2006-12-21 18:18 ` Gilles Chanteperdrix
0 siblings, 1 reply; 5+ messages in thread
From: Gilles Chanteperdrix @ 2006-12-22 10:15 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Xenomai help
Dmitry Adamushko wrote:
> On 22/12/06, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:
>
>>In absence of a better solution, I would recommend to use the "fault_vm"
>> function after each fork.
>
>
> But it's not safe. It doesn't stop any other rt threads (if there are
> a few in this app.) from touching the wp-pages (it's not only about
> stacks after all) in the mean time. So fault_vm() only increases the
> probability of not-crashing but doesn't eliminate it completely.
>
> So all the contexts have to be blocked starting from the moment
> fork() is about to be called and till the moment a subsequent
> fault_vm() is done.
> It's ugly and that suggests fork() is not ok here at all.
>
> And btw, vfork() should be a funny thing being called from
> multi-threaded app. It blocks a calling context and borrows its
> address space while other threads are continuing to run, well, with
> the same context (maybe it's handled somehow, have to check).
fault_vm is safe to use only if you are calling fork at a time when
there is only one thread. So, if your application is forking at init, it
should be OK.
--
Gilles Chanteperdrix
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: A fairly small rtnet/Xenomai...
2006-12-21 15:43 ` Dmitry Adamushko
@ 2006-12-21 18:18 ` Gilles Chanteperdrix
0 siblings, 0 replies; 5+ messages in thread
From: Gilles Chanteperdrix @ 2006-12-21 18:18 UTC (permalink / raw)
To: Dmitry Adamushko; +Cc: Xenomai help
Dmitry Adamushko wrote:
>>when I use fork() instead of vfork() I have the freeze again.
>>I.e. using vfork() it works fine, using fork() freezes the system.
>
>
> Gilles: is this not the end point of the fork vs. vfork epopee ? :)
Right. I am currently using the LinuxThreads pthread implementation, which is
why I had a look at it.
>
> Anyway, it looks your remarks about fork() make sense. But if it's
> true indeed, a more nicer solution would be required.
The simple solution is to use fork only in non-time-critical parts of the
code (like initialization and cleanup) and to fault the pages after the
fork. It makes sense to limit fork use to non-time-critical code; I mean,
I would already not recommend creating threads in time-critical code, and
creating a process is even worse.
A more complex solution would be to really duplicate the mlocked pages
at fork time.
--
Gilles Chanteperdrix
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai...
2006-12-22 11:20 ` Philippe Gerum
2006-12-22 10:15 ` [Xenomai-help] " Gilles Chanteperdrix
@ 2006-12-22 11:40 ` M. Koehrer
2006-12-22 12:09 ` Philippe Gerum
1 sibling, 1 reply; 5+ messages in thread
From: M. Koehrer @ 2006-12-22 11:40 UTC (permalink / raw)
To: rpm, mathias_koehrer; +Cc: xenomai
Hi Philippe,
I agree. Fixing the root cause is really the best thing to do!
It makes life easier for users and developers.
Regards
Mathias
>
> Sorry, but no, no way, I won't merge anything like this, ever. This is
> the wrong way to go. The right way is to fix the COW issue at kernel
> level - probably the I-pipe has to provide the required support -
> because this is where those dirty details belong to. This is definitely
> not an API issue, because you just cannot tell application developers to
> care about arch-specific VM issues when using a so-called generic API
> that has to work the same way on all archs (e.g. MMU-less platforms
> don't care about this, others would). What could be considered as a
> bearable limitation right now must not have any impact on long-term
> principles, and the API stuff belongs to that category of issues.
--
Mathias Koehrer
mathias_koehrer@domain.hid
^ permalink raw reply [flat|nested] 5+ messages in thread
* [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai...
2006-12-22 11:40 ` [Xenomai-help] Re: " M. Koehrer
@ 2006-12-22 12:09 ` Philippe Gerum
0 siblings, 0 replies; 5+ messages in thread
From: Philippe Gerum @ 2006-12-22 12:09 UTC (permalink / raw)
To: M. Koehrer; +Cc: xenomai
On Fri, 2006-12-22 at 12:40 +0100, M. Koehrer wrote:
> Hi Philippe,
>
> I agree. To fix the root cause is actually the very best to do!
> This eases the life of users and developers.
Definitely, yes. The point is that once you go down the
"rt_task_system()" path, you end up being trapped into API
proliferation, which would continue with rt_task_fork, rt_task_vfork and
so on. This would be a dead end, unfortunately.
Asking people to use rt_task_system() is already asking them to
understand why they should not use system() in the first place, so this
can't solve the root issue. The problem comes entirely from the fact
that we don't expect any more faults after mlockall, and COW proved us
wrong. The fact that the machine freezes is only a side-effect due to
the co-kernel constraints; the first and foremost problem is that
gracefully handling a page fault would induce uncontrollable latencies
anyway.
IOW, the problem has to be fixed at kernel level, because it's
fundamentally an arch-dependent core issue.
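
The effect described above can be observed from plain user space, without Xenomai. The sketch below touches a buffer, optionally locks memory, forks, and then counts the minor page faults the parent takes when it writes to the same buffer again. The function names are made up for illustration; mlockall() may fail without sufficient privileges, which the sketch deliberately ignores, and the fault counts reported by getrusage() depend on the kernel and environment.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Illustrative only: number of minor faults the parent takes when it
 * rewrites npages already-resident pages after a fork(); -1 on error.
 */
static long cow_faults_after_fork(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    volatile unsigned char *buf = malloc(len);
    long before, after;
    size_t i;
    pid_t pid;

    if (buf == NULL)
        return -1;

    /* Make every page resident, then try to lock memory. */
    for (i = 0; i < len; i += page)
        buf[i] = 1;
    (void)mlockall(MCL_CURRENT);    /* may fail if unprivileged */

    pid = fork();                   /* write-protects private pages (COW) */
    if (pid < 0) {
        munlockall();
        free((void *)buf);
        return -1;
    }
    if (pid == 0)
        _exit(0);
    waitpid(pid, NULL, 0);

    /* The first write to each page after the fork is expected to fault. */
    before = minor_faults();
    for (i = 0; i < len; i += page)
        buf[i] = 2;
    after = minor_faults();

    munlockall();
    free((void *)buf);
    return after - before;
}
```

On a typical Linux system the returned count is expected to be non-zero whether or not mlockall() succeeded, which is the point made above: locking memory does not rule out further faults once fork() has been called.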
--
Philippe.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2006-12-22 12:09 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-12-22 11:32 [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai M. Koehrer
-- strict thread matches above, loose matches on Subject: below --
2006-12-21 10:45 [Xenomai-help] Re: A fairly small rtnet/Xenomai application that freezes the Dmitry Adamushko
2006-12-20 14:11 ` [Xenomai-help] Aw: Re: A fairly small rtnet/Xenomai application that freezes the PC M. Koehrer
2006-12-19 8:08 ` [Xenomai-help] NMI watchdog: Loading of xeno_native leads to reboot of PC Jan Kiszka
2006-12-19 7:54 ` M. Koehrer
2006-12-19 8:14 ` Re: [Xenomai-help] NMI watchdog: Loading of xeno_native leads to M. Koehrer
2006-12-19 9:26 ` [Xenomai-help] NMI watchdog: Loading of xeno_native leads M. Koehrer
2006-12-19 12:04 ` Aw: " M. Koehrer
2006-12-20 13:25 ` [Xenomai-help] A fairly small rtnet/Xenomai application that freezes the PC M. Koehrer
2006-12-21 8:48 ` [Xenomai-help] Re: Re: A fairly small rtnet/Xenomai application that freezes the M. Koehrer
2006-12-21 9:03 ` [Xenomai-help] " Jan Kiszka
2006-12-21 10:21 ` M. Koehrer
2006-12-21 11:19 ` [Xenomai-help] " M. Koehrer
2006-12-21 11:28 ` Philippe Gerum
2006-12-21 11:51 ` [Xenomai-help] Re: Re: Re: A fairly small rtnet/Xenomai M. Koehrer
2006-12-21 13:09 ` Dmitry Adamushko
2006-12-21 13:36 ` [Xenomai-help] " M. Koehrer
2006-12-21 14:13 ` Philippe Gerum
2006-12-21 15:00 ` [Xenomai-help] " M. Koehrer
2006-12-21 15:17 ` Dmitry Adamushko
2006-12-21 15:36 ` [Xenomai-help] " M. Koehrer
2006-12-21 15:43 ` Dmitry Adamushko
2006-12-22 9:06 ` M. Koehrer
2006-12-22 9:24 ` [Xenomai-help] " Gilles Chanteperdrix
2006-12-22 9:40 ` Dmitry Adamushko
2006-12-22 10:27 ` [Xenomai-help] " M. Koehrer
2006-12-22 11:20 ` Philippe Gerum
2006-12-22 10:15 ` [Xenomai-help] " Gilles Chanteperdrix
2006-12-21 18:18 ` Gilles Chanteperdrix
2006-12-22 11:40 ` [Xenomai-help] Re: " M. Koehrer
2006-12-22 12:09 ` Philippe Gerum