From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <4461C318.5070207@domain.hid>
Date: Wed, 10 May 2006 12:40:24 +0200
From: Philippe Gerum
MIME-Version: 1.0
Subject: Re: [Xenomai-core] [bug] zombie mutex owners
References: <44619D0B.1080402@domain.hid> <4461BB5A.3010403@domain.hid>
In-Reply-To: <4461BB5A.3010403@domain.hid>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: "Xenomai life and development \(bug reports, patches, discussions\)"
To: Jan Kiszka
Cc: xenomai@xenomai.org

Jan Kiszka wrote:
> Dmitry Adamushko wrote:
>
>>Hi Jan,
>>
>>
>>>running the attached test case for the native skin, you will get an ugly
>>>lock-up on probably all Xenomai versions. Granted, this code is a bit
>>>synthetic. I originally thought I could trigger the bug also via
>>>timeouts when waiting on mutexes, but this scenario is safe (the timeout
>>>is cleared before being able to cause harm).
>>>
>>
>>just in order to educate me as probably I might have got something
>>wrong at the first glance :)
>>
>>if we take this one:
>>
>>--- mutex.c 2006-02-27 15:34:58.000000000 +0100
>>+++ mutex-NEW.c 2006-05-10 11:55:25.000000000 +0200
>>@@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>>         err = -EIDRM; /* Mutex deleted while pending. */
>>     else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>>         err = -ETIMEDOUT; /* Timeout.*/
>>-    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
>>+    else if (xnthread_test_flags(&task->thread_base,XNBREAK) &&
>>mutex->owner != task)
>>         err = -EINTR; /* Unblocked.*/
>>
>>  unlock_and_exit:
>>
>>As I understand task2 has a lower prio and that's why
>>
>>[task1] rt_mutex_unlock
>>[task 1] rt_task_unblock(task1)
>>
>>are called in a row.
>>
>>ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
>>rt_mutex_lock()) and finds XNBREAK flag but,
>>
>>[doc] -EINTR is returned if rt_task_unblock() has been called for the
>>waiting task (1) before the mutex has become available (2).
>>
>>(1) it's true, task2 was still waiting at that time;
>>(2) it's wrong, task2 was already the owner.
>>
>>So why just not to bail out XNBREAK and continue task2 as it has a
>>mutex (as shown above) ?
>
>
> Indeed, this solves the issue more gracefully.
>
> Looking at this again from a different perspective and running the test
> case with your patch in a slightly different way, I think I
> misinterpreted the crash. If I modify task2 like this
>
> void task2_fnc(void *arg)
> {
>     printf("started task2\n");
>     if (rt_mutex_lock(&mtx, 0) < 0) {
>         printf("lock failed in task2\n");
>         return;
>     }
> //    rt_mutex_unlock(&mtx);
>
>     printf("done task2\n");
> }
>
> I'm also getting a crash. So the problem seems to be releasing a mutex
> ownership on task termination. Well, this needs further examination.
>
> Looks like the issue is limited to cleanup problems and is not that
> widespread to other skins as I thought. RTDM is not involved as it does
> not know EINTR for rtdm_mutex_lock. The POSIX skins runs in a loop on
> interruption and should recover from this.
>
> Besides this, we then may want to consider if introducing a pending
> ownership of synch objects is worthwhile to improve efficiency of PIP
> users. Not critical, but if it comes at a reasonable price... Will try
> to draft something.
>

I've planned to work over the simulator asap to implement the stealing of
ownership at the nucleus level, so that this kind of issue will become
history.

--
Philippe.
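
PS: for anyone reading this thread in the archive, the status check in
Dmitry's patch can be modelled outside the nucleus. The following is only a
rough sketch under assumptions: the model_* types, the MODEL_* flag names and
their values are hypothetical stand-ins and not the real xnthread or RT_MUTEX
definitions. It only illustrates that a forced unblock is ignored once the
resuming task already owns the mutex.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-thread status bits tested in
 * rt_mutex_lock(); the real nucleus flags and values differ. */
enum {
    MODEL_DELETED = 1 << 0, /* mutex deleted while the task was pending */
    MODEL_TIMEO   = 1 << 1, /* the wait timed out (models XNTIMEO) */
    MODEL_BREAK   = 1 << 2  /* task was forcibly unblocked (models XNBREAK) */
};

struct model_task  { unsigned flags; };
struct model_mutex { const struct model_task *owner; };

/* Status a waiter would report when it resumes from the wait.
 * With the patched condition, a pending unblock only yields -EINTR
 * if ownership has not already been handed to the resuming task. */
int model_lock_status(const struct model_mutex *m, const struct model_task *t)
{
    if (t->flags & MODEL_DELETED)
        return -EIDRM;
    if (t->flags & MODEL_TIMEO)
        return -ETIMEDOUT;
    if ((t->flags & MODEL_BREAK) && m->owner != t)
        return -EINTR;
    return 0; /* the task holds the mutex */
}
```

With MODEL_BREAK set on a task, model_lock_status() returns 0 when that task
is already recorded as the owner and -EINTR otherwise, which is the
distinction between cases (1) and (2) in Dmitry's mail.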