* [Xenomai-help] Debugging in Xenomai
@ 2006-11-23 15:25 Daniel Schnell
2006-11-23 15:57 ` Jan Kiszka
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Daniel Schnell @ 2006-11-23 15:25 UTC (permalink / raw)
To: xenomai
Hi,
our application deadlocks from time to time.
To isolate which of the threads are deadlocked, I look at
/proc/xenomai/stat. There I can see whether context switches are
happening, which PID each thread has, and which task state each thread
is in. One task state that is not clear to me, by the way, is "X" or
"relaxed shadow". Can anybody explain it?
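To make this check less manual, two snapshots of /proc/xenomai/stat taken a moment apart can be compared line by line. What follows is only a sketch: the helper names read_snapshot() and print_changed_lines() are made up for illustration, nothing is assumed about the column layout of the file, and the comparison is positional, so it presumes that the set of threads does not change between the two reads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a small text file into a malloc'ed, NUL-terminated buffer.
 * Returns NULL on failure; the caller frees the result. */
char *read_snapshot(const char *path)
{
    size_t cap = 4096, len = 0, n;
    char *buf;
    FILE *f = fopen(path, "r");

    if (!f)
        return NULL;
    buf = malloc(cap);
    if (!buf) {
        fclose(f);
        return NULL;
    }
    while ((n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (len + 1 == cap) {           /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) {
                free(buf);
                fclose(f);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    buf[len] = '\0';
    fclose(f);
    return buf;
}

/* Print every line of `after` that differs from the line at the same
 * position in `before`. Returns the number of lines printed. */
int print_changed_lines(const char *before, const char *after, FILE *out)
{
    int changed = 0;

    assert(before && after && out);
    while (*after) {
        size_t alen = strcspn(after, "\n");
        size_t blen = strcspn(before, "\n");

        if (alen != blen || strncmp(after, before, alen) != 0) {
            fprintf(out, "%.*s\n", (int)alen, after);
            changed++;
        }
        /* Advance both cursors past the current line and its newline. */
        after += alen + (after[alen] == '\n');
        before += blen + (before[blen] == '\n');
    }
    return changed;
}
```

A caller would read the file once, sleep for a second or so, read it again, and pass both buffers to print_changed_lines(). A thread whose line never shows up as changed is apparently not being switched in and is a candidate for closer inspection.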
After gathering that, I have a basic idea of which thread is deadlocked,
and now the question is how to continue.
One of the next steps would be to get the backtrace of the suspicious
thread. So I run gdb and attach to the appropriate process, which
works. Problem: sending Ctrl-C doesn't work, regardless of whether gdb
is run via ssh or on the serial console. So I cannot stop the program
being debugged, which renders the gdb approach useless. Sending SIGINT
to the GDB process doesn't work either; it seems to be simply ignored.
As I understand it, Ctrl-C effectively sends SIGINT, and it is sent to
GDB itself and not to the underlying application.
Of course I can always insert logger output strings. I even wrote a
logger that doesn't force the process to switch into secondary mode. I
just thought we might have something that gives a developer a more
effective way of quickly finding out which task has which backtrace.
Under VxWorks there is a very elegant way of debugging with the help of
the userLib. Of course this is possible there because VxWorks makes no
distinction between user and kernel space, and the shell itself runs as
a task with access to the symbol tables etc. You can type "i", which
gives output similar to /proc/xenomai/stat but with much more
information, such as stack size, actual stack consumption, the function
pointer currently being executed, etc. "b" sets a global breakpoint on
any function. One can get stack traces without the help of gdb via
"tt", and so on.
In Xenomai I suppose the current stack and execution pointers, stack
size, etc. of a task have to be available to the kernel as well, so in
principle it should be possible to provide more information than what
/proc/xenomai/stat or /proc/xenomai/sched provide. Those migrating from
VxWorks (and probably many others as well) would certainly like that.
How do the Xenomai veterans debug their applications in case of
deadlocks, synchronization errors, crashes, etc.?
Best regards,
Daniel.
* Re: [Xenomai-help] Debugging in Xenomai
From: Jan Kiszka @ 2006-11-23 15:57 UTC (permalink / raw)
To: Daniel Schnell; +Cc: xenomai
Daniel Schnell wrote:
> Hi,
>
> our application deadlocks from time to time.
>
> To isolate which of the threads are deadlocked, I look into
> /proc/xenomai/stat. Here I can see if there are context switches
> happening or not. Further I can see which threads have which PID and
> which task state they are into. One of the task states that is not clear
> to me btw is the state "X" or "relaxed shadow". Anybody ?
The thread is pending in secondary mode on some Linux resource.
>
> So after gathering that I have some basic idea which of the threads is
> having a deadlock and now the question is, how to continue ?
>
> One of the next steps would be finding out which actual function back
> trace the suspicious thread has. So I execute gdb and try to attach to
> the appropriate process, which works. Problem: sending Ctrl-C doesn't
> work, independent of whether gdb is executed via ssh or serial console. So I
> cannot stop the actual program being debugged, rendering the gdb
> approach useless. Also sending SIGINT to the GDB process doesn't work.
> It seems to be simply ignored. As I understand CTRL-C is effectively
> sending SIGINT and is sent to GDB itself and not to the underlying appl.
I would say there is a bug, either in gdb or in Xenomai. Does debugging
work in general on your box, i.e. without Xenomai? Does it fail with any
trivial Xenomai application? Which version? On PPC? If there is a bug
w.r.t. Xenomai, it has to be fixed.
>
> Of course I always can insert logger output strings. I even wrote a
> logger that doesn't force to switch the process into secondary mode. I
> just thought we might have something that gives a developer a more
> effective way of quickly finding out which task has which actual back
> trace.
Have you already tried strace? It may help you understand more if your
thread hangs in secondary mode.
Beyond that, we are all desperately waiting for LTTng... :)
>
> Under VxWorks there is a very elegant way of debugging with the help of
> the userLib. Of course this is possible here because VxWorks makes no
> distinction between user and kernel space and the shell itself is
> running as a task having access to symbol tables etc. You can say "i"
> which gives a similar output as /proc/xenomai/stat but with much more
> information like stack size, actual stack consumption, actual executed
> function pointer, etc. "b" gives you a global breakpoint for any
> function. One can get stack traces without the help of gdb via "tt",
> etc. etc.
>
> In Xenomai I suppose the actual stack execution pointer, stack size,
> etc. of a task has to be available to the kernel as well, so
> in principle it should be possible to provide more information than
> what /proc/xenomai/stat or /proc/xenomai/sched provide. Those migrating
> from VxWorks (and many others probably as well) would certainly like
> that.
>
> How do the Xenomai veterans debug their application in case of deadlocks
> / synchronization errors, crashes etc. ?
Log message instrumentation (we also have such an RT-safe mechanism),
meditating on the code (better: let someone else meditate on it who
looks at it from a different angle), gdb (at least on x86 we have had
no problems so far).
Jan
* Re: [Xenomai-help] Debugging in Xenomai
From: Philippe Gerum @ 2006-11-23 17:23 UTC (permalink / raw)
To: Daniel Schnell; +Cc: xenomai
On Thu, 2006-11-23 at 15:25 +0000, Daniel Schnell wrote:
> Hi,
>
> our application deadlocks from time to time.
>
> To isolate which of the threads are deadlocked, I look
> into /proc/xenomai/stat. Here I can see if there are context switches
> happening or not. Further I can see which threads have which PID and
> which task state they are into. One of the task states that is not
> clear to me btw is the state "X" or "relaxed shadow". Anybody ?
>
It's running under the control of the Linux scheduler (secondary mode).
The fact that SIGINT seems blocked for your application under GDB is not
a good sign, and this is what needs to be investigated. Is SIGINT
properly handled (i.e. causing GDB to take control back) before your
application enters this wild state? E.g. does hitting ^C in GDB work
while your application is initializing, and maybe later during routine
processing? When does this apparently stop working?
I'm trying to find out whether the fact that SIGINT is not being caught
anymore is related to the deadlocked state here.
Another thing: what does 'ps' say when the application looks deadlocked?
--
Philippe.
* Re: [Xenomai-help] Debugging in Xenomai
From: Gilles Chanteperdrix @ 2006-11-23 17:56 UTC (permalink / raw)
To: Daniel Schnell; +Cc: xenomai
Daniel Schnell wrote:
> How do the Xenomai veterans debug their application in case of deadlocks
> / synchronization errors, crashes etc. ?
You may try enabling the Xenomai debug options; they may give you some
information about why the program is crashing, for instance if you are
destroying the same object twice or making similar errors.
If you are using the POSIX skin, one source of deadlocks is thread
cancellation: you should cancel all the threads and then join them all,
rather than cancelling then joining each thread in turn.
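The ordering recommended above can be sketched with plain POSIX calls. This is a minimal illustration, not code taken from Xenomai: cancel_then_join_all() and idle_worker() are hypothetical names, and the sketch assumes the threads use the default deferred cancellation and eventually reach a cancellation point.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical worker that blocks forever; pause() is a cancellation
 * point, so a pending cancellation request is acted upon there. */
void *idle_worker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* Cancel every thread first, then join them all.
 * Returns 0 on success or the first pthread error code encountered. */
int cancel_then_join_all(pthread_t *threads, size_t count)
{
    int first_err = 0;
    size_t i;

    assert(threads != NULL || count == 0);

    /* Phase 1: request cancellation of every thread before waiting on any. */
    for (i = 0; i < count; i++) {
        int err = pthread_cancel(threads[i]);
        if (err && !first_err)
            first_err = err;
    }

    /* Phase 2: only now wait for the threads to terminate. */
    for (i = 0; i < count; i++) {
        int err = pthread_join(threads[i], NULL);
        if (err && !first_err)
            first_err = err;
    }
    return first_err;
}
```

With this ordering, no join can end up waiting on a thread that is itself waiting for another thread which has not been cancelled yet.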
--
Gilles Chanteperdrix
* Re: [Xenomai-help] Debugging in Xenomai
From: Peter Soetens @ 2006-11-24 9:20 UTC (permalink / raw)
To: xenomai
On Thursday 23 November 2006 16:25, Daniel Schnell wrote:
>
> One of the next steps would be finding out which actual function back
> trace the suspicious thread has. So I execute gdb and try to attach to
> the appropriate process, which works. Problem: sending Ctrl-C doesn't
> work, independent of whether gdb is executed via ssh or serial console. So I
> cannot stop the actual program being debugged, rendering the gdb
> approach useless. Also sending SIGINT to the GDB process doesn't work.
> It seems to be simply ignored. As I understand CTRL-C is effectively
> sending SIGINT and is sent to GDB itself and not to the underlying appl.
We had a similar issue while debugging an RTnet app (main thread + 1
Xenomai POSIX skin thread) under Xenomai. I don't recall the exact
circumstances, but the app was blocked on a socket, and Ctrl-C did not
work. A 'killall gdb' (SIGTERM) did come through and killed gdb. If you
(the Xeno/RTnet developers) are interested in this case, I'll see if I
can get more info.
Peter
--
Peter Soetens -- FMTC -- <http://www.fmtc.be>
* Re: [Xenomai-help] Debugging in Xenomai
From: Jan Kiszka @ 2006-11-24 10:41 UTC (permalink / raw)
To: Peter Soetens; +Cc: xenomai
Peter Soetens wrote:
> We had a similar issue while debugging an RTNET app (main thread + 1 xenomai
> posix skin thread) under xenomai. I don't recall exactly the circumstances,
> but the app was blocked on a socket, and a Ctrl-C did not work. A 'killall
> gdb' (SIGTERM) did come through and killed gdb. If you (the Xeno/RTNet
> developers)'re interested in this case, I'll see if I can get more info.
You're welcome.
I just checked the behaviour of examples/xenomai/posix/eth_p_all over
gdb. I can interrupt the blocking recv - so far so good - but the
syscall is unfortunately not replayed when continuing. Instead, the
program just terminates because some error (EINTR) is reported to the
application.
[Too lazy to dig:] Philippe, isn't syscall restarting after an
interruption the job of the Xenomai nucleus?
Jan
* Re: [Xenomai-help] Debugging in Xenomai
From: Philippe Gerum @ 2006-11-24 11:09 UTC (permalink / raw)
To: Jan Kiszka; +Cc: xenomai
On Fri, 2006-11-24 at 11:41 +0100, Jan Kiszka wrote:
> You're welcome.
>
> I just checked the behaviour of examples/xenomai/posix/eth_p_all over
> gdb. I can interrupt the blocking recv - so far so good - but the
> syscall is unfortunately not replayed when continuing. Instead, the
> program just terminates because some error (EINTR) is reported to the
> application.
>
> [Too lazy to dig:] Philippe, isn't syscall restarting after an
> interruption the job of the Xenomai nucleus?
Yes, it is, and normally, it does so.
syscall():
    return -EINTR
do_hi/lo_syscall_event():
    request_syscall_restart():
        if (syscall_return == -EINTR) pass back -ERESTARTSYS
Linux ret_from_syscall:
    do_notify_resume upon sig, which eventually checks for
    -ERESTARTSYS then fixes $pc to restart the interrupted call.
    This is regular Linux code, Xenomai relies on it, but does not
    change it.
Issue #1: a blocking Xenomai syscall does not detect the XNBREAK
condition properly upon return from xnpod_suspend_thread() or
xnsynch_sleep_on(), therefore does not pass back -EINTR to the nucleus
when this bit is raised for the current thread, which causes the signal
to be left over. Usually, the consequence of this is "please find the
reset button, and exercise your firmware once more". This said, weird
behaviour instead of total freeze has been experienced in this context,
too.
Issue #2: user-space switches signal handling to SysV behaviour instead
of the BSDish one, thus preventing syscall restart in favour of getting
a failure return code with errno == EINTR. This is usually not the case,
but it deserves verification.
Issue #3: Oops. We have a bug.
>
> Jan
>
--
Philippe.
* Re: [Xenomai-help] Debugging in Xenomai
From: Jan Kiszka @ 2006-11-24 12:10 UTC (permalink / raw)
To: rpm; +Cc: xenomai
Philippe Gerum wrote:
> On Fri, 2006-11-24 at 11:41 +0100, Jan Kiszka wrote:
>> You're welcome.
>>
>> I just checked the behaviour of examples/xenomai/posix/eth_p_all over
>> gdb. I can interrupt the blocking recv - so far so good - but the
>> syscall is unfortunately not replayed when continuing. Instead, the
>> program just terminates because some error (EINTR) is reported to the
>> application.
>>
>> [Too lazy to dig:] Philippe, isn't syscall restarting after an
>> interruption the job of the Xenomai nucleus?
>
> Yes, it is, and normally, it does so.
>
> syscall():
> return -EINTR
> do_hi/lo_syscall_event():
> request_syscall_restart():
> if (syscall_return == -EINTR) pass back -ERESTARTSYS
> Linux ret_from_syscall:
> do_notify_resume upon sig, which eventually checks for
> -ERESTARTSYS then fixes $pc to restart the interrupted call.
> This is regular Linux code, Xenomai relies on it, but does not
> change it.
>
> Issue #1: a blocking Xenomai syscall does not detect the XNBREAK
> condition properly upon return from xnpod_suspend_thread() or
> xnsynch_sleep_on(), therefore does not pass back -EINTR to the nucleus
> when this bit is raised for the current thread, which causes the signal
> to be left over. Usually, the consequence of this is "please find the
> reset button, and exercise your firmware once more". This said, weird
> behaviour instead of total freeze has been experienced in this context,
> too.
>
> Issue #2: user-space switches signal handling to SysV behaviour instead
> of the BSDish one, thus preventing syscall restart in favour of getting
> a failure return code with errno == EINTR. Usually not the case, but,
> this deserves a verification.
>
> Issue #3: Oops. We have a bug.
>
I take (a variant of) #1 - RTnet bug...
Jan