* [Xenomai] [xenomai-forge] psos: crash while stressing event timers
@ 2013-06-04 10:56 Ronny Meeus
From: Ronny Meeus @ 2013-06-04 10:56 UTC
  To: xenomai

Hello

We are currently running a recent version of xenomai-forge.
The issue we see is a crash while running the attached application code
(pSOS interface).

Basically, the test creates a number of chains of tasks (called batches).
After setting up the batch, the first task starts a timer, and when the
timer expires, an event is sent to the next task in the chain.
This process continues forever.

If only one chain is created, we see no issues.
The number of threads per chain matters less: increasing it typically
has little impact.

As we increase the number of batches, crashes start to appear.
For example, the test below creates 20 batches with 1 thread each.
The -o parameter specifies the timeout (in ms) used by each task's timer
before control is passed to the next task in the chain.

ulimit -s 128 ;taskset 2 ./tests -t 1 -b 20 -o 1
Xenomai test: threads 1, batches 20, timeout 1 ms, stats 0
thread_entry(0,0)
thread_entry(1,0)
thread_entry(2,0)
thread_entry(3,0)
thread_entry(4,0)
thread_entry(5,0)
thread_entry(6,0)
thread_entry(7,0)
thread_entry(8,0)
thread_entry(9,0)
thread_entry(10,0)
thread_entry(11,0)
thread_entry(12,0)
thread_entry(13,0)
thread_entry(14,0)
thread_entry(15,0)
thread_entry(16,0)
thread_entry(17,0)
thread_entry(18,0)
thread_entry(19,0)
tm_evafter(4,0) returned 75 (errno 22): tmid=4920880
tm_evafter(9,0) returned 75 (errno 22): tmid=4919912
Segmentation fault
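In the run above, -o 1 requests a 1 ms timeout. A POSIX timer is armed with a struct timespec passed to timer_settime(), and POSIX specifies that this call fails with EINVAL (errno 22) when a nanosecond value is negative or not less than 1,000,000,000 (for a non-zero it_value). For reference only, a minimal sketch of turning a millisecond timeout into a well-formed timespec, either as a relative delay or added to an absolute time. The helper names ms_to_timespec() and timespec_add_ms() are hypothetical; they are not taken from event_stress.c or from Xenomai, and this is not a diagnosis of the failure above.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: express a millisecond timeout (like the -o value)
 * as a relative struct timespec. Division keeps tv_nsec in range. */
static struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical helper: add a millisecond timeout to an absolute time
 * (e.g. "now"), carrying nanosecond overflow into the seconds field.
 * Assumes 'base' is already normalized. */
static struct timespec timespec_add_ms(struct timespec base, long ms)
{
    struct timespec r;

    assert(ms >= 0);
    assert(base.tv_nsec >= 0 && base.tv_nsec < 1000000000L);
    r.tv_sec = base.tv_sec + ms / 1000;
    r.tv_nsec = base.tv_nsec + (ms % 1000) * 1000000L;
    if (r.tv_nsec >= 1000000000L) {   /* at most one carry is needed */
        r.tv_sec += 1;
        r.tv_nsec -= 1000000000L;
    }
    return r;
}
```

Without the carry step, adding 1 ms to a time whose nanosecond part is already close to one second would yield a tv_nsec of 1,000,000,000 or more, which timer_settime() rejects.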

After some investigation, it looks like the problem is related to timer
handling in Xenomai.
The tm_evafter error indicates that the creation of the underlying POSIX
timer has failed.
Shortly afterwards, a segmentation fault is typically seen.

If we run the same test application with an older version of xenomai-forge
(about one year old), the issue is not observed.

Another useful observation is that the issue typically appears once the
CPU load gets high.
On boards with a more powerful processor, the number of batches can be much
higher (before the issue appears) than on boards with a low-end processor.

The problem is observed in various processor environments (mips/arm/ppc)
and with different versions of the C library.

Best regards,
Ronny
-------------- next part --------------
A non-text attachment was scrubbed...
Name: event_stress.c
Type: text/x-csrc
Size: 7481 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20130604/1800cdf1/attachment.c>

* Re: [Xenomai] [xenomai-forge] psos: crash while stressing event timers
@ 2013-06-04 20:26 Tom Philips
From: Tom Philips @ 2013-06-04 20:26 UTC
  To: xenomai

We captured a core dump, and the crash occurs in pvfree().
See the call sequence explanation below.

The actual problem, however, seems to be in the C library.

I'll elaborate.
The crash occurs when using the timer function tm_evafter().

This is the call sequence of tm_evafter(), as annotated pseudocode:

tm_evafter()              called from our app
  start_evtimer()
    timerobj_init()       calls timer_create() POSIX function
    timerobj_start()      calls timer_settime() POSIX function
    if (error)            we get an error from timerobj_start()
      timerobj_destroy()  destroys the POSIX timer
      pvlist_remove()
      pvfree()            ==> crashes (but not always)

So in our tests, timer_settime() sometimes returns an error code
while the timer does seem to be started.
That is, timer_settime() returns a negative value (-1) with errno
set to 22 (EINVAL), but the timer is started anyway.
We checked all of this in the debugger.
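The observation above (an error from timer_settime() while the timer is armed anyway) can also be checked with plain POSIX calls, outside Xenomai's timerobj layer: timer_gettime() reports a non-zero it_value only while a timer is armed. A minimal sketch, assuming a Linux/glibc system (older glibc versions need -lrt at link time); the helpers make_silent_timer() and settime_and_check() are hypothetical names, not part of Xenomai or of the test program. The timer uses SIGEV_NONE so that arming it has no side effect other than its armed state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Hypothetical helper: create a timer that delivers no notification
 * (SIGEV_NONE), so arming it only changes its armed state. */
static int make_silent_timer(timer_t *tid)
{
    struct sigevent sev = { .sigev_notify = SIGEV_NONE };
    return timer_create(CLOCK_MONOTONIC, &sev, tid);
}

/* Hypothetical helper: arm 'tid' with a relative expiry, then report
 * through *armed whether the timer is armed afterwards, independently of
 * what timer_settime() returned. Returns timer_settime()'s result, with
 * errno preserved from that call. */
static int settime_and_check(timer_t tid, const struct itimerspec *its,
                             int *armed)
{
    struct itimerspec cur;
    int ret, saved_errno;

    assert(its != NULL && armed != NULL);

    ret = timer_settime(tid, 0, its, NULL);   /* 0 = relative expiry */
    saved_errno = errno;

    /* it_value is the time left until expiry; zero means disarmed. */
    if (timer_gettime(tid, &cur) == 0)
        *armed = (cur.it_value.tv_sec != 0 || cur.it_value.tv_nsec != 0);
    else
        *armed = -1;                           /* state could not be queried */

    errno = saved_errno;
    return ret;
}
```

On a freshly created timer, a well-formed request is expected to return 0 with *armed == 1, and a malformed one (for instance tv_nsec = 1000000000) to return -1 with errno == EINVAL and *armed == 0. A return of -1 together with *armed == 1 would match the behaviour reported here.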

This does not seem to be correct behaviour for the timer_settime() system call.

Obviously, this leads to problems sooner or later: the code above cleans up
the timer and pv objects, and the code called when the timer elapses does
the same thing, so the pv structures end up being freed twice.

This only occurs under heavy load.
Not all spurious timer_settime() errors result in a crash;
I guess it all depends on the order in which the destruction takes place.

If I ignore the error returned by timerobj_start(), everything works fine.

I've tried this on different architectures (ARM, MIPS, x86),
different libc versions (2.10, 2.11.3, 2.12, 2.15) and
different kernel versions (2.6.29, 2.6.32, 3.2, 3.4.24).
They all exhibit the same problem.

We might need to contact the LIBC guys...

--
Tom



Thread overview: 17+ messages
2013-06-04 10:56 [Xenomai] [xenomai-forge] psos: crash while stressing event timers Ronny Meeus
2013-06-04 12:41 ` Philippe Gerum
2013-06-04 12:57   ` Ronny Meeus
2013-06-04 14:04     ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2013-06-04 20:26 Tom Philips
2013-06-05  8:02 ` Philippe Gerum
2013-06-05  8:28   ` Ronny Meeus
2013-06-05  8:36     ` Philippe Gerum
2013-06-05  9:38       ` Philippe Gerum
2013-06-05  9:51       ` Tom Philips
2013-06-05  9:59         ` Philippe Gerum
2013-06-05 10:11           ` Ronny Meeus
2013-06-05 10:25             ` Philippe Gerum
2013-06-05 20:50               ` Philippe Gerum
2013-06-07 10:39                 ` Ronny Meeus
2013-06-11 10:10                   ` Ronny Meeus
2013-06-11 10:21                     ` Philippe Gerum
