[Xenomai-core] [bug] zombie mutex owners

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Xenomai-core] [bug] zombie mutex owners
@ 2006-05-10  7:58 Jan Kiszka
  2006-05-10  9:16 ` Dmitry Adamushko
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2006-05-10  7:58 UTC (permalink / raw)
  To: xenomai-core


[-- Attachment #1.1: Type: text/plain, Size: 2293 bytes --]

Hi,

running the attached test case for the native skin, you will get an ugly
lock-up on probably all Xenomai versions. Granted, this code is a bit
synthetic. I originally thought I could trigger the bug also via
timeouts when waiting on mutexes, but this scenario is safe (the timeout
is cleared before being able to cause harm).

What we see here is that task1 forwards the ownership of the mutex to
task2 on its first unlock invocation. Then we interrupt task2, making it
drop its wish to acquire the lock - but it already has it! Now weird
things happen on cleanup of task2 (likely also when trying to require
the lock via task1 beforehand).

The attached fix solves at least the crash but still gives an
unsatisfying result:

: fn      -158+  __rt_mutex_lock (hisyscall_event)
: fn      -156+  __copy_from_user_ll (__rt_mutex_lock)
: fn      -154+  xnregistry_fetch (__rt_mutex_lock)
:|fn      -151+  __ipipe_restore_pipeline_head (xnregistry_fetch)
: fn      -148+  rt_mutex_lock (__rt_mutex_lock)
:|fn      -144+  xnsynch_sleep_on (rt_mutex_lock)
:|fn      -134+  xnpod_resume_thread (xnsynch_sleep_on)
:|fn      -130+  xnpod_suspend_thread (xnsynch_sleep_on)
:|fn      -125+  xnpod_schedule (xnpod_suspend_thread)
:|fn      -116!  __switch_to (xnpod_schedule)
:|fn      -103+  rt_mutex_unlock (rt_mutex_lock)
:|fn      -100+  xnsynch_wakeup_one_sleeper (rt_mutex_unlock)
:|fn       -98+  xnpod_resume_thread (xnsynch_wakeup_one_sleeper)
:|fn       -95+  xnsynch_clear_boost (xnsynch_wakeup_one_sleeper)
:|fn       -89+  xnpod_schedule (rt_mutex_unlock)
:|fn       -85+  __switch_to (xnpod_schedule)
:|fn       -79!  __ipipe_restore_pipeline_head (rt_mutex_lock)

This means that task2 needs to be woken up in order to let task1
re-acquire the mutex. What would be more efficient for task1 is to
"steal" the granted lock again (that's what the preempt-rt people do in
their rtmutex code - and this is where I stumbled over our issues).

I haven't tried to construct test cases for other skins yet, but in
theory at least POSIX and RTDM should suffer from the same issue. This
raises the question if there might be some generic solution at nucleus
level for this, also improving the re-acquire path. But I have nothing
at hand so far.

Jan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: native-mutex-relock.patch --]
[-- Type: text/x-patch; name="native-mutex-relock.patch", Size: 1198 bytes --]

Index: ksrc/skins/native/mutex.c
===================================================================
--- ksrc/skins/native/mutex.c	(Revision 1058)
+++ ksrc/skins/native/mutex.c	(Arbeitskopie)
@@ -375,12 +375,17 @@ int rt_mutex_lock(RT_MUTEX *mutex, RTIME
 
     xnsynch_sleep_on(&mutex->synch_base, timeout);
 
-    if (xnthread_test_flags(&task->thread_base, XNRMID))
-        err = -EIDRM;           /* Mutex deleted while pending. */
-    else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
-        err = -ETIMEDOUT;       /* Timeout. */
-    else if (xnthread_test_flags(&task->thread_base, XNBREAK))
-        err = -EINTR;           /* Unblocked. */
+    if (xnthread_test_flags(&task->thread_base, XNRMID | XNTIMEO | XNBREAK)) {
+        if (xnthread_test_flags(&task->thread_base, XNRMID))
+            err = -EIDRM;           /* Mutex deleted while pending. */
+        else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
+            err = -ETIMEDOUT;       /* Timeout. */
+        else
+            err = -EINTR;           /* Unblocked. */
+
+        if (mutex->owner == task)
+            rt_mutex_unlock(mutex);
+    }
 
   unlock_and_exit:
 

[-- Attachment #1.3: mutex-relock.c --]
[-- Type: text/plain, Size: 1287 bytes --]

#include <stdio.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>
#include <native/timer.h>
#include <rtdm/rtbenchmark.h>

RT_TASK task1, task2;
RT_MUTEX mtx;
int fd;

void task1_fnc(void *arg)
{
        printf("started task1\n");
        rt_mutex_lock(&mtx, 0);
        rt_task_sleep(1000000000LL);
        rt_mutex_unlock(&mtx);

        rt_task_unblock(&task2);

        rt_mutex_lock(&mtx, 0);
        rt_dev_ioctl(fd, RTBNCH_RTIOC_REFREEZE_TRACE, 0);
        rt_dev_close(fd);
        rt_mutex_unlock(&mtx);
        printf("done task1\n");
}

void task2_fnc(void *arg)
{
        printf("started task2\n");
        if (rt_mutex_lock(&mtx, 0) < 0) {
                printf("lock failed in task2\n");
                return;
        }
        rt_mutex_unlock(&mtx);

        printf("done task2\n");
}

int main()
{
        mlockall(MCL_CURRENT | MCL_FUTURE);

        fd = rt_dev_open("rtbenchmark0", 0);
        rt_mutex_create(&mtx, NULL);

        rt_task_spawn(&task1, "task1", 0, 20, T_JOINABLE, task1_fnc, 0);
        rt_task_spawn(&task2, "task2", 0, 10, T_JOINABLE, task2_fnc, 0);

        rt_task_join(&task1);
        rt_task_join(&task2);

        rt_mutex_delete(&mtx);

        return 0;
}

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10  7:58 [Xenomai-core] [bug] zombie mutex owners Jan Kiszka
@ 2006-05-10  9:16 ` Dmitry Adamushko
  2006-05-10 10:07   ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Adamushko @ 2006-05-10  9:16 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Hi Jan,

>
> running the attached test case for the native skin, you will get an ugly
> lock-up on probably all Xenomai versions. Granted, this code is a bit
> synthetic. I originally thought I could trigger the bug also via
> timeouts when waiting on mutexes, but this scenario is safe (the timeout
> is cleared before being able to cause harm).
>

just in order to educate me as probably I might have got something
wrong at the first glance :)

if we take this one:

--- mutex.c	2006-02-27 15:34:58.000000000 +0100
+++ mutex-NEW.c	2006-05-10 11:55:25.000000000 +0200
@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
 	err = -EIDRM; /* Mutex deleted while pending. */
     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
 	err = -ETIMEDOUT; /* Timeout.*/
-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) &&
mutex->owner != task)
 	err = -EINTR; /* Unblocked.*/

  unlock_and_exit:

As I understand task2 has a lower prio and that's why

[task1] rt_mutex_unlock
[task 1] rt_task_unblock(task1)

are called in a row.

ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
rt_mutex_lock()) and finds XNBREAK flag but,

[doc] -EINTR is returned if rt_task_unblock() has been called for the
waiting task (1) before the mutex has become available (2).

(1) it's true, task2 was still waiting at that time;
(2) it's wrong, task2 was already the owner.

So why just not to bail out XNBREAK and continue task2 as it has a
mutex (as shown above) ?

--
Best regards,
Dmitry Adamushko


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10  9:16 ` Dmitry Adamushko
@ 2006-05-10 10:07   ` Jan Kiszka
  2006-05-10 10:40     ` Philippe Gerum
                       ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Jan Kiszka @ 2006-05-10 10:07 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2868 bytes --]

Dmitry Adamushko wrote:
> Hi Jan,
> 
>>
>> running the attached test case for the native skin, you will get an ugly
>> lock-up on probably all Xenomai versions. Granted, this code is a bit
>> synthetic. I originally thought I could trigger the bug also via
>> timeouts when waiting on mutexes, but this scenario is safe (the timeout
>> is cleared before being able to cause harm).
>>
> 
> just in order to educate me as probably I might have got something
> wrong at the first glance :)
> 
> if we take this one:
> 
> --- mutex.c    2006-02-27 15:34:58.000000000 +0100
> +++ mutex-NEW.c    2006-05-10 11:55:25.000000000 +0200
> @@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>     err = -EIDRM; /* Mutex deleted while pending. */
>     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>     err = -ETIMEDOUT; /* Timeout.*/
> -    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
> +    else if (xnthread_test_flags(&task->thread_base,XNBREAK) &&
> mutex->owner != task)
>     err = -EINTR; /* Unblocked.*/
> 
>  unlock_and_exit:
> 
> As I understand task2 has a lower prio and that's why
> 
> [task1] rt_mutex_unlock
> [task 1] rt_task_unblock(task1)
> 
> are called in a row.
> 
> ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
> rt_mutex_lock()) and finds XNBREAK flag but,
> 
> [doc] -EINTR is returned if rt_task_unblock() has been called for the
> waiting task (1) before the mutex has become available (2).
> 
> (1) it's true, task2 was still waiting at that time;
> (2) it's wrong, task2 was already the owner.
> 
> So why just not to bail out XNBREAK and continue task2 as it has a
> mutex (as shown above) ?

Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
        printf("started task2\n");
        if (rt_mutex_lock(&mtx, 0) < 0) {
                printf("lock failed in task2\n");
                return;
        }
//        rt_mutex_unlock(&mtx);

        printf("done task2\n");
}

I'm also getting a crash. So the problem seems to be releasing a mutex
ownership on task termination. Well, this needs further examination.

Looks like the issue is limited to cleanup problems and is not that
widespread to other skins as I thought. RTDM is not involved as it does
not know EINTR for rtdm_mutex_lock. The POSIX skins runs in a loop on
interruption and should recover from this.

Besides this, we then may want to consider if introducing a pending
ownership of synch objects is worthwhile to improve efficiency of PIP
users. Not critical, but if it comes at a reasonable price... Will try
to draft something.

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10 10:07   ` Jan Kiszka
@ 2006-05-10 10:40     ` Philippe Gerum
  2006-05-10 10:52     ` Philippe Gerum
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 10:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Dmitry Adamushko wrote:
> 
>>Hi Jan,
>>
>>
>>>running the attached test case for the native skin, you will get an ugly
>>>lock-up on probably all Xenomai versions. Granted, this code is a bit
>>>synthetic. I originally thought I could trigger the bug also via
>>>timeouts when waiting on mutexes, but this scenario is safe (the timeout
>>>is cleared before being able to cause harm).
>>>
>>
>>just in order to educate me as probably I might have got something
>>wrong at the first glance :)
>>
>>if we take this one:
>>
>>--- mutex.c    2006-02-27 15:34:58.000000000 +0100
>>+++ mutex-NEW.c    2006-05-10 11:55:25.000000000 +0200
>>@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>>    err = -EIDRM; /* Mutex deleted while pending. */
>>    else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>>    err = -ETIMEDOUT; /* Timeout.*/
>>-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
>>+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) &&
>>mutex->owner != task)
>>    err = -EINTR; /* Unblocked.*/
>>
>> unlock_and_exit:
>>
>>As I understand task2 has a lower prio and that's why
>>
>>[task1] rt_mutex_unlock
>>[task 1] rt_task_unblock(task1)
>>
>>are called in a row.
>>
>>ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
>>rt_mutex_lock()) and finds XNBREAK flag but,
>>
>>[doc] -EINTR is returned if rt_task_unblock() has been called for the
>>waiting task (1) before the mutex has become available (2).
>>
>>(1) it's true, task2 was still waiting at that time;
>>(2) it's wrong, task2 was already the owner.
>>
>>So why just not to bail out XNBREAK and continue task2 as it has a
>>mutex (as shown above) ?
> 
> 
> Indeed, this solves the issue more gracefully.
> 
> Looking at this again from a different perspective and running the test
> case with your patch in a slightly different way, I think I
> misinterpreted the crash. If I modify task2 like this
> 
> void task2_fnc(void *arg)
> {
>         printf("started task2\n");
>         if (rt_mutex_lock(&mtx, 0) < 0) {
>                 printf("lock failed in task2\n");
>                 return;
>         }
> //        rt_mutex_unlock(&mtx);
> 
>         printf("done task2\n");
> }
> 
> I'm also getting a crash. So the problem seems to be releasing a mutex
> ownership on task termination. Well, this needs further examination.
> 
> Looks like the issue is limited to cleanup problems and is not that
> widespread to other skins as I thought. RTDM is not involved as it does
> not know EINTR for rtdm_mutex_lock. The POSIX skins runs in a loop on
> interruption and should recover from this.
> 
> Besides this, we then may want to consider if introducing a pending
> ownership of synch objects is worthwhile to improve efficiency of PIP
> users. Not critical, but if it comes at a reasonable price... Will try
> to draft something.
> 

I've planned to work over the simulator asap to implement the stealing 
of ownership at the nucleus level, so that this kind of issue will 
become history.

-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10 10:07   ` Jan Kiszka
  2006-05-10 10:40     ` Philippe Gerum
@ 2006-05-10 10:52     ` Philippe Gerum
  2006-05-10 11:49       ` Jan Kiszka
  2006-05-10 12:28     ` Philippe Gerum
  2006-05-10 16:55     ` [Xenomai-core] Pending ownership and resource stealing Philippe Gerum
  3 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 10:52 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Dmitry Adamushko wrote:
> 
> Indeed, this solves the issue more gracefully.
>
> Looking at this again from a different perspective and running the test
> case with your patch in a slightly different way, I think I
> misinterpreted the crash. If I modify task2 like this
> 
> void task2_fnc(void *arg)
> {
>         printf("started task2\n");
>         if (rt_mutex_lock(&mtx, 0) < 0) {
>                 printf("lock failed in task2\n");
>                 return;
>         }
> //        rt_mutex_unlock(&mtx);
> 
>         printf("done task2\n");
> }
> 
> I'm also getting a crash. So the problem seems to be releasing a mutex
> ownership on task termination. Well, this needs further examination.
>

The native skin does not implement robust mutex, indeed.

> Looks like the issue is limited to cleanup problems and is not that
> widespread to other skins as I thought. RTDM is not involved as it does
> not know EINTR for rtdm_mutex_lock. The POSIX skins runs in a loop on
> interruption and should recover from this.
>

I can confirm with the help of the simulator that there's no issue 
regarding the way the ownership is transferred back and forth between 
both tasks, this works. However, I agree that this should not preclude 
us from improving the priority inversion guard, by allowing the lock to 
be stolen if the new owner has not resumed execution yet, after it has 
been granted the mutex ownership.

> Besides this, we then may want to consider if introducing a pending
> ownership of synch objects is worthwhile to improve efficiency of PIP
> users. Not critical, but if it comes at a reasonable price... Will try
> to draft something.
> 
> Jan
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Xenomai-core mailing list
> Xenomai-core@domain.hid
> https://mail.gna.org/listinfo/xenomai-core


-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10 10:52     ` Philippe Gerum
@ 2006-05-10 11:49       ` Jan Kiszka
  2006-05-10 16:39         ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2006-05-10 11:49 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1102 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Dmitry Adamushko wrote:
>>
>> Indeed, this solves the issue more gracefully.
>>
>> Looking at this again from a different perspective and running the test
>> case with your patch in a slightly different way, I think I
>> misinterpreted the crash. If I modify task2 like this
>>
>> void task2_fnc(void *arg)
>> {
>>         printf("started task2\n");
>>         if (rt_mutex_lock(&mtx, 0) < 0) {
>>                 printf("lock failed in task2\n");
>>                 return;
>>         }
>> //        rt_mutex_unlock(&mtx);
>>
>>         printf("done task2\n");
>> }
>>
>> I'm also getting a crash. So the problem seems to be releasing a mutex
>> ownership on task termination. Well, this needs further examination.
>>
> 
> The native skin does not implement robust mutex, indeed.

Yeah, lunch opened my eyes: the skin data structure (RT_MUTEX) is not
updated appropriately on task cleanup. What about some callback hook in
xnsynch_t to invoke a per-skin cleanup handler when running
xnsynch_release_all_ownerships?

Jan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10 10:07   ` Jan Kiszka
  2006-05-10 10:40     ` Philippe Gerum
  2006-05-10 10:52     ` Philippe Gerum
@ 2006-05-10 12:28     ` Philippe Gerum
  2006-05-10 16:55     ` [Xenomai-core] Pending ownership and resource stealing Philippe Gerum
  3 siblings, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 12:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Dmitry Adamushko wrote:
> 
>>Hi Jan,
>>
>>
>>>running the attached test case for the native skin, you will get an ugly
>>>lock-up on probably all Xenomai versions. Granted, this code is a bit
>>>synthetic. I originally thought I could trigger the bug also via
>>>timeouts when waiting on mutexes, but this scenario is safe (the timeout
>>>is cleared before being able to cause harm).
>>>
>>
>>just in order to educate me as probably I might have got something
>>wrong at the first glance :)
>>
>>if we take this one:
>>
>>--- mutex.c    2006-02-27 15:34:58.000000000 +0100
>>+++ mutex-NEW.c    2006-05-10 11:55:25.000000000 +0200
>>@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>>    err = -EIDRM; /* Mutex deleted while pending. */
>>    else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>>    err = -ETIMEDOUT; /* Timeout.*/
>>-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
>>+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) &&
>>mutex->owner != task)
>>    err = -EINTR; /* Unblocked.*/
>>
>> unlock_and_exit:
>>
>>As I understand task2 has a lower prio and that's why
>>
>>[task1] rt_mutex_unlock
>>[task 1] rt_task_unblock(task1)
>>
>>are called in a row.
>>
>>ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
>>rt_mutex_lock()) and finds XNBREAK flag but,
>>
>>[doc] -EINTR is returned if rt_task_unblock() has been called for the
>>waiting task (1) before the mutex has become available (2).
>>
>>(1) it's true, task2 was still waiting at that time;
>>(2) it's wrong, task2 was already the owner.
>>
>>So why just not to bail out XNBREAK and continue task2 as it has a
>>mutex (as shown above) ?
> 
> 
> Indeed, this solves the issue more gracefully.
> 

The real issue with the XNBREAK bit is that xnpod_unblock_thread() 
should only raise it for threads which it does actually awake, and not 
for those which are already resumed, e.g. by a call to 
xnsynch_wakeup_sleeper().

This explains why task2 gets -EINTR from rt_mutex_lock() albeit the 
syscall has indeed succeeded, so that the next rt_mutex_unlock() was ok, 
but the downstream code might clearly be confused by such behaviour. It 
should get a success code instead, since it has been resumed by 
xnsynch_wakeup_sleeper() _before_ xnpod_unblock_thread() has been called 
against it. Fixed in the repository.

-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] [bug] zombie mutex owners
  2006-05-10 11:49       ` Jan Kiszka
@ 2006-05-10 16:39         ` Philippe Gerum
  0 siblings, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 16:39 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Philippe Gerum wrote:
> 
>>Jan Kiszka wrote:
>>
>>>Dmitry Adamushko wrote:
>>>
>>>Indeed, this solves the issue more gracefully.
>>>
>>>Looking at this again from a different perspective and running the test
>>>case with your patch in a slightly different way, I think I
>>>misinterpreted the crash. If I modify task2 like this
>>>
>>>void task2_fnc(void *arg)
>>>{
>>>        printf("started task2\n");
>>>        if (rt_mutex_lock(&mtx, 0) < 0) {
>>>                printf("lock failed in task2\n");
>>>                return;
>>>        }
>>>//        rt_mutex_unlock(&mtx);
>>>
>>>        printf("done task2\n");
>>>}
>>>
>>>I'm also getting a crash. So the problem seems to be releasing a mutex
>>>ownership on task termination. Well, this needs further examination.
>>>
>>
>>The native skin does not implement robust mutex, indeed.
> 
> 
> Yeah, lunch opened my eyes: the skin data structure (RT_MUTEX) is not
> updated appropriately on task cleanup. What about some callback hook in
> xnsynch_t to invoke a per-skin cleanup handler when running
> xnsynch_release_all_ownerships?
> 

Yep, we need this. Added by commit #1067 (i.e. 
xnsynch_register_cleanup()) and also backported to the maintenance 
branch, so that new code may happily rely on it without too much 
portability burden.

-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 10:07   ` Jan Kiszka
                       ` (2 preceding siblings ...)
  2006-05-10 12:28     ` Philippe Gerum
@ 2006-05-10 16:55     ` Philippe Gerum
  2006-05-10 17:34       ` Gilles Chanteperdrix
  2006-05-10 17:34       ` Jan Kiszka
  3 siblings, 2 replies; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 16:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:

> Besides this, we then may want to consider if introducing a pending
> ownership of synch objects is worthwhile to improve efficiency of PIP
> users. Not critical, but if it comes at a reasonable price... Will try
> to draft something.
> 

I've committed an implementation of this stuff to the trunk, tested on 
your testcase over the simulator. So far, it's ok. The only thing that 
should change downstream compared to the previous behaviour is that 
xnsynch_sleep_on() might unblock immediately at skin level without any 
xnsynch_wakeup_sleeper() calls being previously invoked, since the 
blocking call does the stealing during the pending ownership window.

This means that skins now _must_ fix their internal state when unblocked 
from xnsynch_sleep_on() if they happen to track their own resource owner 
field for instance, since they might become the owner of such resource 
without any unlock/release/whatever routine being called at the said 
skin level. I've fixed a couple of skins for that purpose (not checked 
RTDM btw), but it would be safer if you could double-check the impact of 
such change on the interfaces you've crafted.

This change only affects PIP-enabled synchronization objects in a 
reasonably limited manner and seems to behave properly, but please, give 
this code hell on your side too.

-- 

Philippe.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 16:55     ` [Xenomai-core] Pending ownership and resource stealing Philippe Gerum
@ 2006-05-10 17:34       ` Gilles Chanteperdrix
  2006-05-10 18:39         ` Philippe Gerum
  2006-05-10 17:34       ` Jan Kiszka
  1 sibling, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2006-05-10 17:34 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, xenomai

Philippe Gerum wrote:
 > This means that skins now _must_ fix their internal state when unblocked 
 > from xnsynch_sleep_on() if they happen to track their own resource owner 
 > field for instance, since they might become the owner of such resource 
 > without any unlock/release/whatever routine being called at the said 
 > skin level. I've fixed a couple of skins for that purpose (not checked 
 > RTDM btw), but it would be safer if you could double-check the impact of 
 > such change on the interfaces you've crafted.

When waking up, how do we know that we are in the problematic situation
? Can we do the house keeping in the callback registered with
xnsynch_register_cleanup, or are you talking of a different situation ?


-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 16:55     ` [Xenomai-core] Pending ownership and resource stealing Philippe Gerum
  2006-05-10 17:34       ` Gilles Chanteperdrix
@ 2006-05-10 17:34       ` Jan Kiszka
  2006-05-10 21:23         ` Philippe Gerum
  1 sibling, 1 reply; 18+ messages in thread
From: Jan Kiszka @ 2006-05-10 17:34 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 1882 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
> 
>> Besides this, we then may want to consider if introducing a pending
>> ownership of synch objects is worthwhile to improve efficiency of PIP
>> users. Not critical, but if it comes at a reasonable price... Will try
>> to draft something.
>>
> 
> I've committed an implementation of this stuff to the trunk, tested on
> your testcase over the simulator. So far, it's ok.

I'll give up, you are too fast for me. ;)

> The only thing that
> should change downstream compared to the previous behaviour is that
> xnsynch_sleep_on() might unblock immediately at skin level without any
> xnsynch_wakeup_sleeper() calls being previously invoked, since the
> blocking call does the stealing during the pending ownership window.
> 
> This means that skins now _must_ fix their internal state when unblocked
> from xnsynch_sleep_on() if they happen to track their own resource owner
> field for instance, since they might become the owner of such resource
> without any unlock/release/whatever routine being called at the said
> skin level. I've fixed a couple of skins for that purpose (not checked
> RTDM btw), but it would be safer if you could double-check the impact of
> such change on the interfaces you've crafted.

Well, if this means that once you have called xnsynch_wakeup_sleeper()
for some lower-prio task, you must call xnsynch_sleep_on() to steal it
for a higher-prio task, then RTDM needs fixing (it only sets a private
lock bit in this case).

> 
> This change only affects PIP-enabled synchronization objects in a
> reasonably limited manner and seems to behave properly, but please, give
> this code hell on your side too.
> 

Will do.

Jan


PS: The usage can be checked also via the cross reference:
http://www.rts.uni-hannover.de/xenomai/lxr/ident?v=SVN-trunk;i=XNSYNCH_PIP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 250 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 17:34       ` Gilles Chanteperdrix
@ 2006-05-10 18:39         ` Philippe Gerum
  2006-05-10 20:00           ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 18:39 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > This means that skins now _must_ fix their internal state when unblocked 
>  > from xnsynch_sleep_on() if they happen to track their own resource owner 
>  > field for instance, since they might become the owner of such resource 
>  > without any unlock/release/whatever routine being called at the said 
>  > skin level. I've fixed a couple of skins for that purpose (not checked 
>  > RTDM btw), but it would be safer if you could double-check the impact of 
>  > such change on the interfaces you've crafted.
> 
> When waking up, how do we know that we are in the problematic situation
> ?

The only possible issue with the new implementation is when the skin 
maintains its own specific data for tracking ownership of a resource, 
that complements the synch::owner field of the core nucleus resource 
object. The example code Jan has posted lately to raise the concern 
about the XNBREAK issue is also demonstrative of the issue of pending 
ownership, i.e.:

prio(task1) > prio(task2)

1. task1 grabs a resource
2. task1 sleeps for some time
3. task2 blocks requesting the resource
4. task1 wakes up from the sleep and releases the resource to task2
5. task1 wants the resource back immediately and calls 
xnsynch_sleep_on() since the ownership has been transferred to task2 
since step 4.
6a. old way: task1 would block and task2 would run anyway, with a PIP 
boost, blocking task1 until the resource is released
6b. new way: task1 steals the resource previously granted to task2 
directly from xnsynch_sleep_on(), but doing so, nobody downstream has 
had a chance to update any skin-specific data, such as an additional 
"owner" field.

The caller of xnsynch_sleep_on() in native/mutex.c illustrates this 
(i.e. grab_mutex label).

  Can we do the house keeping in the callback registered with
> xnsynch_register_cleanup, or are you talking of a different situation ?
> 
> 

It's different. The cleanup handler as described by Jan is only there to 
perform some housekeeping chores on synch objects which get forcibly 
released because their previous owner went away (typically xnpod_delete()).

-- 

Philippe.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 18:39         ` Philippe Gerum
@ 2006-05-10 20:00           ` Gilles Chanteperdrix
  2006-05-10 21:25             ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2006-05-10 20:00 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

Philippe Gerum wrote:
 > prio(task1) > prio(task2)
 > 
 > 1. task1 grabs a resource
 > 2. task1 sleeps for some time
 > 3. task2 blocks requesting the resource
 > 4. task1 wakes up from the sleep and releases the resource to task2
 > 5. task1 wants the resource back immediately and calls 
 > xnsynch_sleep_on() since the ownership has been transferred to task2 
 > since step 4.
 > 6a. old way: task1 would block and task2 would run anyway, with a PIP 
 > boost, blocking task1 until the resource is released
 > 6b. new way: task1 steals the resource previously granted to task2 
 > directly from xnsynch_sleep_on(), but doing so, nobody downstream has 
 > had a chance to update any skin-specific data, such as an additional 
 > "owner" field.

Posix skin mutexes work the new way without calling xnsynch_sleep_on, so
probably need fixing.

-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 17:34       ` Jan Kiszka
@ 2006-05-10 21:23         ` Philippe Gerum
  2006-05-11  7:56           ` Jan Kiszka
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 21:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai

Jan Kiszka wrote:
> Philippe Gerum wrote:
> 
>>Jan Kiszka wrote:
>>
>>
>>>Besides this, we then may want to consider if introducing a pending
>>>ownership of synch objects is worthwhile to improve efficiency of PIP
>>>users. Not critical, but if it comes at a reasonable price... Will try
>>>to draft something.
>>>
>>
>>I've committed an implementation of this stuff to the trunk, tested on
>>your testcase over the simulator. So far, it's ok.
> 
> 
> I'll give up, you are too fast for me. ;)
> 
> 
>>The only thing that
>>should change downstream compared to the previous behaviour is that
>>xnsynch_sleep_on() might unblock immediately at skin level without any
>>xnsynch_wakeup_sleeper() calls being previously invoked, since the
>>blocking call does the stealing during the pending ownership window.
>>
>>This means that skins now _must_ fix their internal state when unblocked
>>from xnsynch_sleep_on() if they happen to track their own resource owner
>>field for instance, since they might become the owner of such resource
>>without any unlock/release/whatever routine being called at the said
>>skin level. I've fixed a couple of skins for that purpose (not checked
>>RTDM btw), but it would be safer if you could double-check the impact of
>>such change on the interfaces you've crafted.
> 
> 
> Well, if this means that once you have called xnsynch_wakeup_sleeper()
> for some lower-prio task, you must call xnsynch_sleep_on() to steal it
> for a higher-prio task, then RTDM needs fixing (it only sets a private
> lock bit in this case).

No need to call xnsynch_sleep_on() more than usually done; just have a 
look at native/mutex.c in rt_mutex_lock(), and follow the code labeled 
grab_mutex, it should give your the proper illustration of the issue.

> 
> 
>>This change only affects PIP-enabled synchronization objects in a
>>reasonably limited manner and seems to behave properly, but please, give
>>this code hell on your side too.
>>
> 
> 
> Will do.
> 
> Jan
> 
> 
> PS: The usage can be checked also via the cross reference:
> http://www.rts.uni-hannover.de/xenomai/lxr/ident?v=SVN-trunk;i=XNSYNCH_PIP
> 


-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 20:00           ` Gilles Chanteperdrix
@ 2006-05-10 21:25             ` Philippe Gerum
  2006-05-11 17:17               ` Gilles Chanteperdrix
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2006-05-10 21:25 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > prio(task1) > prio(task2)
>  > 
>  > 1. task1 grabs a resource
>  > 2. task1 sleeps for some time
>  > 3. task2 blocks requesting the resource
>  > 4. task1 wakes up from the sleep and releases the resource to task2
>  > 5. task1 wants the resource back immediately and calls 
>  > xnsynch_sleep_on() since the ownership has been transferred to task2 
>  > since step 4.
>  > 6a. old way: task1 would block and task2 would run anyway, with a PIP 
>  > boost, blocking task1 until the resource is released
>  > 6b. new way: task1 steals the resource previously granted to task2 
>  > directly from xnsynch_sleep_on(), but doing so, nobody downstream has 
>  > had a chance to update any skin-specific data, such as an additional 
>  > "owner" field.
> 
> Posix skin mutexes work the new way without calling xnsynch_sleep_on, so
> probably need fixing.
> 

I don't see any additional information maintained by the skin, aside of 
the sem->count field, so that should be ok as it is. Is there anything 
else recorded for tracking the current ownership of a sem-mutex object?

-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 21:23         ` Philippe Gerum
@ 2006-05-11  7:56           ` Jan Kiszka
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kiszka @ 2006-05-11  7:56 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

[-- Attachment #1: Type: text/plain, Size: 2216 bytes --]

Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> The only thing that
>>> should change downstream compared to the previous behaviour is that
>>> xnsynch_sleep_on() might unblock immediately at skin level without any
>>> xnsynch_wakeup_sleeper() calls being previously invoked, since the
>>> blocking call does the stealing during the pending ownership window.
>>>
>>> This means that skins now _must_ fix their internal state when unblocked
>>> from xnsynch_sleep_on() if they happen to track their own resource owner
>>> field for instance, since they might become the owner of such resource
>>> without any unlock/release/whatever routine being called at the said
>>> skin level. I've fixed a couple of skins for that purpose (not checked
>>> RTDM btw), but it would be safer if you could double-check the impact of
>>> such change on the interfaces you've crafted.
>>
>>
>> Well, if this means that once you have called xnsynch_wakeup_sleeper()
>> for some lower-prio task, you must call xnsynch_sleep_on() to steal it
>> for a higher-prio task, then RTDM needs fixing (it only sets a private
>> lock bit in this case).
> 
> No need to call xnsynch_sleep_on() more than usually done; just have a
> look at native/mutex.c in rt_mutex_lock(), and follow the code labeled
> grab_mutex, it should give your the proper illustration of the issue.

I did so and discovered that prio-inheritance was broken for
RTDM-mutexes right from the beginning. In case such a mutex was entered
uncontended, the owner was not recorded. Oops...

This caused no crash due to the check "owner != NULL" in
xnsynch_sleep_on [1]. But this also made me wonder if it is a reasonable
state for a PIP synch object to be acquired without having an owner. If
no, we should rather bail out in some way here.

Anyway, RTDM seems to work fine now (trunk and 2.1.x), even with a
reduced rtdm_mutex_t data structure and shrunken code (rtdm_mutex_unlock
became trivial by only using xnsynch_t). Further tests will follow
during the day. Oh, and the look-steeling code behaved as expected so far.

Jan


[1]http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/synch.c#L179


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 252 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-10 21:25             ` Philippe Gerum
@ 2006-05-11 17:17               ` Gilles Chanteperdrix
  2006-05-11 22:39                 ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2006-05-11 17:17 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

Philippe Gerum wrote:
 > Gilles Chanteperdrix wrote:
 > > Philippe Gerum wrote:
 > >  > prio(task1) > prio(task2)
 > >  > 
 > >  > 1. task1 grabs a resource
 > >  > 2. task1 sleeps for some time
 > >  > 3. task2 blocks requesting the resource
 > >  > 4. task1 wakes up from the sleep and releases the resource to task2
 > >  > 5. task1 wants the resource back immediately and calls 
 > >  > xnsynch_sleep_on() since the ownership has been transferred to task2 
 > >  > since step 4.
 > >  > 6a. old way: task1 would block and task2 would run anyway, with a PIP 
 > >  > boost, blocking task1 until the resource is released
 > >  > 6b. new way: task1 steals the resource previously granted to task2 
 > >  > directly from xnsynch_sleep_on(), but doing so, nobody downstream has 
 > >  > had a chance to update any skin-specific data, such as an additional 
 > >  > "owner" field.
 > > 
 > > Posix skin mutexes work the new way without calling xnsynch_sleep_on, so
 > > probably need fixing.
 > > 
 > 
 > I don't see any additional information maintained by the skin, aside of 
 > the sem->count field, so that should be ok as it is. Is there anything 
 > else recorded for tracking the current ownership of a sem-mutex object?

The mutex->count field is unconditionnaly set to 0 and task2 woken up
when task1 releases the mutex, so that when task1 want the mutex again,
it will appear as free, and task1 will get it without calling
xnsynch_sleep_on, task2 will eventually wakeup spuriously or not
depending on whether the mutex is free. So, setting the count to 0 when
waking up seems wrong.

In a nut shell, the posix skin should be changed to use the new feature
instead of implementing its own behaviour.

-- 


					    Gilles Chanteperdrix.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Xenomai-core] Pending ownership and resource stealing
  2006-05-11 17:17               ` Gilles Chanteperdrix
@ 2006-05-11 22:39                 ` Philippe Gerum
  0 siblings, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2006-05-11 22:39 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>  > Gilles Chanteperdrix wrote:
>  > > Philippe Gerum wrote:
>  > >  > prio(task1) > prio(task2)
>  > >  > 
>  > >  > 1. task1 grabs a resource
>  > >  > 2. task1 sleeps for some time
>  > >  > 3. task2 blocks requesting the resource
>  > >  > 4. task1 wakes up from the sleep and releases the resource to task2
>  > >  > 5. task1 wants the resource back immediately and calls 
>  > >  > xnsynch_sleep_on() since the ownership has been transferred to task2 
>  > >  > since step 4.
>  > >  > 6a. old way: task1 would block and task2 would run anyway, with a PIP 
>  > >  > boost, blocking task1 until the resource is released
>  > >  > 6b. new way: task1 steals the resource previously granted to task2 
>  > >  > directly from xnsynch_sleep_on(), but doing so, nobody downstream has 
>  > >  > had a chance to update any skin-specific data, such as an additional 
>  > >  > "owner" field.
>  > > 
>  > > Posix skin mutexes work the new way without calling xnsynch_sleep_on, so
>  > > probably need fixing.
>  > > 
>  > 
>  > I don't see any additional information maintained by the skin, aside of 
>  > the sem->count field, so that should be ok as it is. Is there anything 
>  > else recorded for tracking the current ownership of a sem-mutex object?
> 
> The mutex->count field is unconditionnaly set to 0 and task2 woken up
> when task1 releases the mutex, so that when task1 want the mutex again,
> it will appear as free, and task1 will get it without calling
> xnsynch_sleep_on, task2 will eventually wakeup spuriously or not
> depending on whether the mutex is free. So, setting the count to 0 when
> waking up seems wrong.
>

We just need to set it to 1 instead, just like if we were returning from
mutex_trylock_internal() successfully.

> In a nut shell, the posix skin should be changed to use the new feature
> instead of implementing its own behaviour.
> 

There is no new feature per se, just an additional case upon return from
xnsynch_sleep_on() to take into account.

-- 

Philippe.


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-05-11 22:39 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-05-10  7:58 [Xenomai-core] [bug] zombie mutex owners Jan Kiszka
2006-05-10  9:16 ` Dmitry Adamushko
2006-05-10 10:07   ` Jan Kiszka
2006-05-10 10:40     ` Philippe Gerum
2006-05-10 10:52     ` Philippe Gerum
2006-05-10 11:49       ` Jan Kiszka
2006-05-10 16:39         ` Philippe Gerum
2006-05-10 12:28     ` Philippe Gerum
2006-05-10 16:55     ` [Xenomai-core] Pending ownership and resource stealing Philippe Gerum
2006-05-10 17:34       ` Gilles Chanteperdrix
2006-05-10 18:39         ` Philippe Gerum
2006-05-10 20:00           ` Gilles Chanteperdrix
2006-05-10 21:25             ` Philippe Gerum
2006-05-11 17:17               ` Gilles Chanteperdrix
2006-05-11 22:39                 ` Philippe Gerum
2006-05-10 17:34       ` Jan Kiszka
2006-05-10 21:23         ` Philippe Gerum
2006-05-11  7:56           ` Jan Kiszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.