From: Jan Kiszka
To: xenomai-core
Date: Wed, 10 May 2006 09:58:03 +0200
Subject: [Xenomai-core] [bug] zombie mutex owners
Message-ID: <44619D0B.1080402@domain.hid>

Hi,

if you run the attached test case for the native skin, you will get an ugly lock-up, probably on all Xenomai versions. Granted, this code is a bit synthetic. I originally thought I could also trigger the bug via timeouts when waiting on mutexes, but that scenario is safe (the timeout is cleared before it can cause harm).

What we see here is that task1 forwards ownership of the mutex to task2 on its first unlock. Then we interrupt task2 via rt_task_unblock(), making it drop its wish to acquire the lock, but it already owns it! Now weird things happen during cleanup of task2 (and likely also when task1 tries to re-acquire the lock before that).
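The sequence above can be replayed in a deliberately simplified, single-threaded model. This is a sketch under stated assumptions, not Xenomai code. The types `model_mutex` and `model_task`, the task IDs and all functions are hypothetical. Unlock is assumed to hand the mutex directly to the sleeping task, and the `broken` flag stands in for XNBREAK. With `apply_fix` set, an interrupted task releases a mutex it was granted anyway, which mirrors the check the attached patch adds to rt_mutex_lock().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the ownership handoff described
 * above. NOT Xenomai code; names are illustrative only. */

enum { NO_TASK = 0, TASK1 = 1, TASK2 = 2 };

struct model_mutex {
    int owner;    /* task currently owning the mutex, or NO_TASK */
    int sleeper;  /* task blocked in lock(), or NO_TASK */
};

struct model_task {
    int id;
    bool broken;  /* stands in for XNBREAK: the wait was interrupted */
};

/* Unlock hands the mutex straight to the sleeper; if nobody sleeps,
 * the mutex becomes free. */
static void model_unlock(struct model_mutex *m)
{
    m->owner = m->sleeper;
    m->sleeper = NO_TASK;
}

/* What the sleeper does once it runs again after its sleep returned.
 * With apply_fix, an interrupted task that was nevertheless granted
 * the mutex releases it before reporting -EINTR. */
static int model_lock_wakeup(struct model_mutex *m, struct model_task *t,
                             bool apply_fix)
{
    if (t->broken) {
        if (apply_fix && m->owner == t->id)
            model_unlock(m);
        return -EINTR;
    }
    return 0;
}

/* Replays the test case: task1 owns the mutex and task2 sleeps on it;
 * task1 unlocks (handoff to task2), then task2 is unblocked. Returns
 * the owner left behind after task2 returned from its lock call. */
int replay_relock_scenario(bool apply_fix, int *task2_ret)
{
    struct model_mutex m = { .owner = TASK1, .sleeper = TASK2 };
    struct model_task task2 = { .id = TASK2, .broken = false };

    model_unlock(&m);      /* ownership forwarded to task2 */
    task2.broken = true;   /* models rt_task_unblock(&task2) */
    *task2_ret = model_lock_wakeup(&m, &task2, apply_fix);
    return m.owner;
}
```

Without the fix, task2 returns -EINTR while still being recorded as the owner, which is the zombie owner. With the fix, the mutex ends up free.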
The attached fix at least solves the crash, but it still gives an unsatisfying result:

: fn -158+ __rt_mutex_lock (hisyscall_event)
: fn -156+ __copy_from_user_ll (__rt_mutex_lock)
: fn -154+ xnregistry_fetch (__rt_mutex_lock)
:|fn -151+ __ipipe_restore_pipeline_head (xnregistry_fetch)
: fn -148+ rt_mutex_lock (__rt_mutex_lock)
:|fn -144+ xnsynch_sleep_on (rt_mutex_lock)
:|fn -134+ xnpod_resume_thread (xnsynch_sleep_on)
:|fn -130+ xnpod_suspend_thread (xnsynch_sleep_on)
:|fn -125+ xnpod_schedule (xnpod_suspend_thread)
:|fn -116! __switch_to (xnpod_schedule)
:|fn -103+ rt_mutex_unlock (rt_mutex_lock)
:|fn -100+ xnsynch_wakeup_one_sleeper (rt_mutex_unlock)
:|fn -98+ xnpod_resume_thread (xnsynch_wakeup_one_sleeper)
:|fn -95+ xnsynch_clear_boost (xnsynch_wakeup_one_sleeper)
:|fn -89+ xnpod_schedule (rt_mutex_unlock)
:|fn -85+ __switch_to (xnpod_schedule)
:|fn -79! __ipipe_restore_pipeline_head (rt_mutex_lock)

This means that task2 has to be woken up just to let task1 re-acquire the mutex. It would be more efficient for task1 to "steal" the granted lock back (that is what the preempt-rt people do in their rtmutex code, and this is where I stumbled over our issue).

I haven't tried to construct test cases for other skins yet, but in theory at least POSIX and RTDM should suffer from the same issue. This raises the question whether there is a generic solution at nucleus level, one that would also improve the re-acquire path. But I have nothing at hand so far.
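The lock-stealing idea can be illustrated with a similarly hypothetical model. It is neither Xenomai nor preempt-rt code, and all names are made up for illustration. Unlock grants the mutex to the sleeper only as a *pending* owner. A task with a higher priority (higher value wins, as with native skin priorities) may take it back before the pending owner has run. The model does not track the wait queue, so a robbed pending owner is simply told to wait again.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sketch of lock stealing from a pending owner.
 * Illustrative only; not taken from any real implementation. */

#define NOBODY 0

struct steal_mutex {
    int owner;       /* current (possibly pending) owner, NOBODY if free */
    int owner_prio;  /* priority of that owner; higher value wins */
    bool pending;    /* granted on unlock, but the owner has not run yet */
};

/* Release by the current owner: grant the mutex to 'sleeper' as a
 * pending owner, or free it if nobody waits (sleeper == NOBODY). */
static void sm_unlock(struct steal_mutex *m, int sleeper, int sleeper_prio)
{
    m->owner = sleeper;
    m->owner_prio = sleeper_prio;
    m->pending = (sleeper != NOBODY);
}

/* Fast-path acquire: returns 0 on success, -EBUSY if the caller
 * would have to block. */
static int sm_trylock(struct steal_mutex *m, int task, int prio)
{
    if (m->owner == NOBODY || (m->pending && prio > m->owner_prio)) {
        /* Free, or stolen from a lower-priority pending owner. */
        m->owner = task;
        m->owner_prio = prio;
        m->pending = false;
        return 0;
    }
    return -EBUSY;
}

/* Called when the woken sleeper finally runs: returns 0 if it keeps
 * the lock, -EAGAIN if the lock was stolen and it must wait again. */
static int sm_pending_wakeup(struct steal_mutex *m, int task)
{
    if (m->owner != task)
        return -EAGAIN;
    m->pending = false;  /* ownership is now committed */
    return 0;
}

/* Replays the test case with stealing: task1 (prio 20) unlocks, task2
 * (prio 10) becomes pending owner, and task1 relocks right away. */
void replay_steal_scenario(int *relock_ret, int *task2_ret, int *owner)
{
    struct steal_mutex m = { .owner = 1, .owner_prio = 20, .pending = false };

    sm_unlock(&m, 2, 10);                   /* task2: pending owner */
    *relock_ret = sm_trylock(&m, 1, 20);    /* task1 steals it back */
    *task2_ret = sm_pending_wakeup(&m, 2);  /* task2 finds it gone */
    *owner = m.owner;
}
```

In this model task1 re-acquires the mutex without waiting for task2, and task2 only learns on wake-up that it has to wait again. Steal attempts by lower-priority tasks fail, so priority ordering is preserved.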
Jan

[attachment: native-mutex-relock.patch]

Index: ksrc/skins/native/mutex.c
===================================================================
--- ksrc/skins/native/mutex.c	(Revision 1058)
+++ ksrc/skins/native/mutex.c	(Arbeitskopie)
@@ -375,12 +375,17 @@ int rt_mutex_lock(RT_MUTEX *mutex, RTIME
 
 	xnsynch_sleep_on(&mutex->synch_base, timeout);
 
-	if (xnthread_test_flags(&task->thread_base, XNRMID))
-		err = -EIDRM; /* Mutex deleted while pending. */
-	else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
-		err = -ETIMEDOUT; /* Timeout. */
-	else if (xnthread_test_flags(&task->thread_base, XNBREAK))
-		err = -EINTR; /* Unblocked. */
+	if (xnthread_test_flags(&task->thread_base, XNRMID | XNTIMEO | XNBREAK)) {
+		if (xnthread_test_flags(&task->thread_base, XNRMID))
+			err = -EIDRM; /* Mutex deleted while pending. */
+		else if (xnthread_test_flags(&task->thread_base, XNTIMEO))
+			err = -ETIMEDOUT; /* Timeout. */
+		else
+			err = -EINTR; /* Unblocked. */
+
+		if (mutex->owner == task)
+			rt_mutex_unlock(mutex);
+	}
 
 unlock_and_exit:
 

[attachment: mutex-relock.c]

#include <stdio.h>
#include <sys/mman.h>
#include <native/task.h>
#include <native/mutex.h>
#include <rtdm/rtdm.h>
#include <rtdm/rtbenchmark.h>

RT_TASK task1, task2;
RT_MUTEX mtx;
int fd;

void task1_fnc(void *arg)
{
	printf("started task1\n");

	rt_mutex_lock(&mtx, 0);
	rt_task_sleep(1000000000LL);	/* let task2 block on mtx */
	rt_mutex_unlock(&mtx);		/* ownership is handed to task2 */
	rt_task_unblock(&task2);	/* interrupt task2's pending lock */
	rt_mutex_lock(&mtx, 0);		/* try to re-acquire */

	/* (re)freeze the I-pipe trace */
	rt_dev_ioctl(fd, RTBNCH_RTIOC_REFREEZE_TRACE, 0);
	rt_dev_close(fd);

	rt_mutex_unlock(&mtx);
	printf("done task1\n");
}

void task2_fnc(void *arg)
{
	printf("started task2\n");

	if (rt_mutex_lock(&mtx, 0) < 0) {
		printf("lock failed in task2\n");
		return;
	}
	rt_mutex_unlock(&mtx);

	printf("done task2\n");
}

int main()
{
	mlockall(MCL_CURRENT | MCL_FUTURE);

	fd = rt_dev_open("rtbenchmark0", 0);
	rt_mutex_create(&mtx, NULL);

	/* task1 runs at the higher priority (20 vs. 10) */
	rt_task_spawn(&task1, "task1", 0, 20, T_JOINABLE, task1_fnc, 0);
	rt_task_spawn(&task2, "task2", 0, 10, T_JOINABLE, task2_fnc, 0);

	rt_task_join(&task1);
	rt_task_join(&task2);

	rt_mutex_delete(&mtx);

	return 0;
}