From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4461BB5A.3010403@domain.hid>
Date: Wed, 10 May 2006 12:07:22 +0200
From: Jan Kiszka
MIME-Version: 1.0
Subject: Re: [Xenomai-core] [bug] zombie mutex owners
References: <44619D0B.1080402@domain.hid>
Content-Type: multipart/signed; micalg=pgp-sha1;
 protocol="application/pgp-signature";
 boundary="------------enig777070B4F7866DE56B1E0A41"
Sender: jan.kiszka@domain.hid
List-Id: "Xenomai life and development \(bug reports, patches, discussions\)"
To: Dmitry Adamushko
Cc: xenomai@xenomai.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig777070B4F7866DE56B1E0A41
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable

Dmitry Adamushko wrote:
> Hi Jan,
>
>>
>> running the attached test case for the native skin, you will get an ugly
>> lock-up on probably all Xenomai versions. Granted, this code is a bit
>> synthetic. I originally thought I could trigger the bug also via
>> timeouts when waiting on mutexes, but this scenario is safe (the timeout
>> is cleared before being able to cause harm).
>>
>
> just in order to educate me as probably I might have got something
> wrong at the first glance :)
>
> if we take this one:
>
> --- mutex.c 2006-02-27 15:34:58.000000000 +0100
> +++ mutex-NEW.c 2006-05-10 11:55:25.000000000 +0200
> @@ -391,7 +391,7 @@ int rt_mutex_lock (RT_MUTEX *mutex,
>          err = -EIDRM; /* Mutex deleted while pending. */
>      else if (xnthread_test_flags(&task->thread_base,XNTIMEO))
>          err = -ETIMEDOUT; /* Timeout.*/
> -    else if (xnthread_test_flags(&task->thread_base,XNBREAK))
> +    else if (xnthread_test_flags(&task->thread_base,XNBREAK) && mutex->owner != task)
>          err = -EINTR; /* Unblocked.*/
>
>   unlock_and_exit:
>
> As I understand task2 has a lower prio and that's why
>
> [task1] rt_mutex_unlock
> [task 1] rt_task_unblock(task1)
>
> are called in a row.
>
> ok, task2 wakes up in rt_mutex_unlock() (when task1 is blocked on
> rt_mutex_lock()) and finds XNBREAK flag but,
>
> [doc] -EINTR is returned if rt_task_unblock() has been called for the
> waiting task (1) before the mutex has become available (2).
>
> (1) it's true, task2 was still waiting at that time;
> (2) it's wrong, task2 was already the owner.
>
> So why just not to bail out XNBREAK and continue task2 as it has a
> mutex (as shown above) ?

Indeed, this solves the issue more gracefully.

Looking at this again from a different perspective and running the test
case with your patch in a slightly different way, I think I
misinterpreted the crash. If I modify task2 like this

void task2_fnc(void *arg)
{
    printf("started task2\n");
    if (rt_mutex_lock(&mtx, 0) < 0) {
        printf("lock failed in task2\n");
        return;
    }

    /* unlock disabled on purpose: task2 exits while still owning mtx */
//  rt_mutex_unlock(&mtx);

    printf("done task2\n");
}

I'm also getting a crash. So the problem seems to lie in releasing mutex
ownership on task termination. Well, this needs further examination.

Looks like the issue is limited to cleanup problems and does not extend
to the other skins as widely as I thought. RTDM is not involved, as
rtdm_mutex_lock does not know EINTR. The POSIX skin runs in a loop on
interruption and should recover from this.

Besides this, we may then want to consider whether introducing pending
ownership of synch objects is worthwhile to improve efficiency for PIP
users. Not critical, but if it comes at a reasonable price... Will try
to draft something.

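To make the condition from the quoted patch concrete, here is a small stand-alone model of how the return code would be chosen after wake-up. This is only a sketch and not the Xenomai source: the struct types, the MODEL_* flag bits and the function name model_lock_result are invented for illustration. MODEL_TIMEO and MODEL_BREAK stand in for XNTIMEO and XNBREAK, and MODEL_DELETED is an assumed placeholder for whatever check precedes the -EIDRM line in the hunk.

```c
#include <errno.h>

/* Illustrative flag bits for this model only; they are NOT the values
 * used by the Xenomai nucleus. */
#define MODEL_DELETED 0x1u /* assumed: mutex deleted while pending */
#define MODEL_TIMEO   0x2u /* stands in for XNTIMEO */
#define MODEL_BREAK   0x4u /* stands in for XNBREAK */

/* Hypothetical minimal types; the real RT_TASK/RT_MUTEX are far richer. */
struct model_task  { unsigned int flags; };
struct model_mutex { const struct model_task *owner; };

/* Hypothetical helper: selects the return code in the same order as the
 * patched tail of rt_mutex_lock() shown in the quoted diff. */
int model_lock_result(const struct model_mutex *mutex,
                      const struct model_task *task)
{
    if (task->flags & MODEL_DELETED)
        return -EIDRM;     /* Mutex deleted while pending. */
    if (task->flags & MODEL_TIMEO)
        return -ETIMEDOUT; /* Timeout. */
    if ((task->flags & MODEL_BREAK) && mutex->owner != task)
        return -EINTR;     /* Unblocked before ownership was handed over. */
    return 0;              /* Caller owns the mutex; a stale break is ignored. */
}
```

In this model, a task that carries MODEL_BREAK but is already recorded as the owner gets 0, while the same task gets -EINTR as long as the mutex still belongs to someone else.
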
Jan

--------------enig777070B4F7866DE56B1E0A41
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEYbtaniDOoMHTA+kRApjAAJ9OuW8mbBArRt6m1Uuotn8eDG+O1gCbBQsC
v4uHFfiB3O5+L82ZBDusQBY=
=nL8Z
-----END PGP SIGNATURE-----

--------------enig777070B4F7866DE56B1E0A41--