All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philippe Gerum <rpm@xenomai.org>
To: Ronny Meeus <ronny.meeus@gmail.com>
Cc: xenomai@xenomai.org
Subject: Re: [Xenomai] [xenomai-forge] psos: crash while stressing event timers
Date: Wed, 05 Jun 2013 12:25:16 +0200	[thread overview]
Message-ID: <51AF120C.6070603@xenomai.org> (raw)
In-Reply-To: <CAMJ=MEcJN=Sbtz_kw9JUKR_ckQwzMprAdYgcinZvR7SrjmetmQ@mail.gmail.com>

On 06/05/2013 12:11 PM, Ronny Meeus wrote:
>
>
> On Wed, Jun 5, 2013 at 11:59 AM, Philippe Gerum <rpm@xenomai.org
> <mailto:rpm@xenomai.org>> wrote:
>
>     On 06/05/2013 11:51 AM, Tom Philips wrote:
>
>         Philippe,
>
>         I tried the asserts you suggested and here's the output:
>
>         +++
>         xenomai: timerobj.c:308: timerobj_start: Assertion
>         `timer_getoverrun(tmobj->__timer) >= 0' failed.
>         Aborted (core dumped)
>         +++
>
>
>         So this means the call to timer_getoverrun() fails.
>         Then I did some further investigation.
>         I changed the code in timerobj_start() to this:
>
>
>         +++
>             if (__RT(timer_settime(tmobj->__timer, TIMER_ABSTIME, it,
>         NULL))) {
>               int error = errno;
>               errno = 0;
>               if (timer_getoverrun(tmobj->__timer) < 0) {
>                 __bt(-errno);           => @ line 311
>               }
>         /*  assert(timer_getoverrun(tmobj-__>timer) >= 0); */
>               assert(it->it_value.tv_sec >= 0 && it->it_value.tv_nsec <
>         1000000000);
>               assert(it->it_interval.tv_sec >= 0 &&
>         it->it_interval.tv_nsec <
>         1000000000);
>               return __bt(-error);      => @ line 316
>             }
>         +++
>
>
>         This gives following (strange) result:
>         (The 'tm_evafter(x,y) returned' traces are from my test app)
>
>
>         +++
>         ------------------------------__------------------------------__------------------
>         [ ERROR BACKTRACE: thread 0.0 ]
>
>              #0  EINVAL in timerobj_start(), timerobj.c:316
>         => #1  EINVAL in timerobj_start(), timerobj.c:311
>         ------------------------------__------------------------------__------------------
>         tm_evafter(0,0) returned 75 (errno 22): tmid=3066037456
>         ------------------------------__------------------------------__------------------
>         [ ERROR BACKTRACE: thread 4.0 ]
>
>         => #0  EINVAL in timerobj_start(), timerobj.c:316
>         ------------------------------__------------------------------__------------------
>         tm_evafter(4,0) returned 75 (errno 0): tmid=67192088
>         ------------------------------__------------------------------__------------------
>         [ ERROR BACKTRACE: thread 4.0 ]
>
>              #0  EINVAL in timerobj_start(), timerobj.c:316
>         => #1  EINVAL in timerobj_start(), timerobj.c:311
>         ------------------------------__------------------------------__------------------
>         tm_evafter(4,0) returned 75 (errno 22): tmid=67191880
>         +++
>
>
>         So timer_getoverrun() fails and sets errno to EINVAL (22)
>         This can only mean that the given timerid is (or has become)
>         invalid.
>
>         But timer_getoverrun() does not always fail when timer_settime()
>         fails.
>         My rough estimate is that 1 out of 20 times timer_getoverrun()
>         does not fail when timer_settime() fails.
>
>         Might it be that there is a race condition between the above code
>         and the code excuted at timer elapse (i.e. were the timer is
>         destroyed)?
>
>
>
>     Yes, I think there is some race in the Xenomai code, and running SMP
>     makes it more likely. I just got the backtrace below:
>
>     #0  __timer_settime_new (timerid=0x0, flags=1, value=0x7f78b2102d60,
>          ovalue=0x0) at ../nptl/sysdeps/unix/sysv/__linux/timer_settime.c:58
>     #1  0x00007f78b229c2e0 in timerobj_start (tmobj=0x7f78440008f8,
>          handler=0x7f78b24ca9c1 <post_event_once>, it=0x7f78b2102d60)
>          at /home/rpm/git/xenomai-forge/__lib/copperplate/timerobj.c:309
>     #2  0x00007f78b24cabf6 in start_evtimer (events=2, it=0x7f78b2102d60,
>          tmid_r=0x7f78b2102db0) at
>     /home/rpm/git/xenomai-forge/__lib/psos/tm.c:133
>     #3  0x00007f78b24cac86 in tm_evafter (ticks=1, events=2,
>          tmid_r=0x7f78b2102db0) at
>     /home/rpm/git/xenomai-forge/__lib/psos/tm.c:153
>     #4  0x0000000000400c64 in thread_timer (bidx=24, tidx=0) at settime.c:73
>     #5  0x000000000040101c in thread_receive (bidx=24, tidx=0) at
>     settime.c:117
>     #6  0x00000000004013a6 in thread_entry (a1=24, a2=0, a3=0, a4=0)
>          at settime.c:149
>     #7  0x00007f78b24c9835 in task_trampoline (arg=0x67efc80)
>          at /home/rpm/git/xenomai-forge/__lib/psos/task.c:200
>     #8  0x0000003d71c07d15 in start_thread (arg=0x7f78b2103700)
>          at pthread_create.c:308
>     #9  0x0000003d714f248d in clone ()
>          at ../sysdeps/unix/sysv/linux/__x86_64/clone.S:114
>
>     The timerid is definitely broken in some cases.
>
>     --
>     Philippe.
>
>
>     _________________________________________________
>     Xenomai mailing list
>     Xenomai@xenomai.org <mailto:Xenomai@xenomai.org>
>     http://www.xenomai.org/__mailman/listinfo/xenomai
>     <http://www.xenomai.org/mailman/listinfo/xenomai>
>
>
> Hello
>
> if I understand the code of the timerserver correctly, it processes all
> timers that have an expiry in the past.
> So the linux timer will wake-up the timeserver thread and it simply
> starts to process all timers for which the timeout is in the past.
>
> Suppose now that a timer is enqueued and right after that the task that
> enqueues the timer is scheduled out and does only return after the timer
> is expired. If there is another timer expired in the meantime, also the
> timer of the process scheduled out is already processed and cleaned-up.
>
> In this way you have a race condition that will typically be seen under
> heavy load and with short timeouts.

I'm not sure to understand the scenario you describe. Low-resolution 
timers in userland are all processed over a single carrier thread, 
independently from the actual timer owner. When a timer elapses, it gets 
removed out of the outstanding queue on behalf of the carrier thread 
with no risk of duplicate handling and/or cleanup.

However, I suspect a race between that carrier thread and the thread 
re-arming a just elapsed timer, causing the former to cleanup this timer 
unexpectedly. The code supposed to prevent looks fragile. But maybe this 
is the same scenario you just described, using another wording.

-- 
Philippe.


  reply	other threads:[~2013-06-05 10:25 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-04 20:26 [Xenomai] [xenomai-forge] psos: crash while stressing event timers Tom Philips
2013-06-05  8:02 ` Philippe Gerum
2013-06-05  8:28   ` Ronny Meeus
2013-06-05  8:36     ` Philippe Gerum
2013-06-05  9:38       ` Philippe Gerum
2013-06-05  9:51       ` Tom Philips
2013-06-05  9:59         ` Philippe Gerum
2013-06-05 10:11           ` Ronny Meeus
2013-06-05 10:25             ` Philippe Gerum [this message]
2013-06-05 20:50               ` Philippe Gerum
2013-06-07 10:39                 ` Ronny Meeus
2013-06-11 10:10                   ` Ronny Meeus
2013-06-11 10:21                     ` Philippe Gerum
  -- strict thread matches above, loose matches on Subject: below --
2013-06-04 10:56 Ronny Meeus
2013-06-04 12:41 ` Philippe Gerum
2013-06-04 12:57   ` Ronny Meeus
2013-06-04 14:04     ` Philippe Gerum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51AF120C.6070603@xenomai.org \
    --to=rpm@xenomai.org \
    --cc=ronny.meeus@gmail.com \
    --cc=xenomai@xenomai.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.