From: Jan Kiszka <jan.kiszka@domain.hid>
To: rpm@xenomai.org
Cc: Jan Kiszka <jan.kiszka@domain.hid>, xenomai-core <xenomai@xenomai.org>
Subject: Re: [Xenomai-core] [BUG] Lock stealing is borken
Date: Tue, 19 Aug 2008 22:35:06 +0200 [thread overview]
Message-ID: <48AB2E7A.4050606@domain.hid> (raw)
In-Reply-To: <48AB2948.1060209@domain.hid>
[-- Attachment #1: Type: text/plain, Size: 6501 bytes --]
Philippe Gerum wrote:
> Jan Kiszka wrote:
>> Philippe Gerum wrote:
>>> Jan Kiszka wrote:
>>>> Philippe Gerum wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> bad news, everyone :(. According to the result of some lengthy debug
>>>>>> session with a customer and several ad-hoc lttng instrumentations, we
>>>>>> have a fatal bug in the nucleus' implementation of the lock stealing
>>>>>> algorithm. Consider this scenario:
>>>>>>
>>>>>> 1. Thread A acquires Mutex X successfully, ie. it leaves the (in this
>>>>>> case) rt_mutex_acquire service, and its XNWAKEN flag is therefore
>>>>>> cleared.
>>>>>>
>>>>>> 2. Thread A blocks on some further Mutex Y (in our case it was a
>>>>>> semaphore, but that doesn't matter).
>>>>>>
>>>>>> 3. Thread B signals the availability of Mutex Y to Thread A, thus it
>>>>>> also set XNWAKEN in Thread A. But Thread A is not yet scheduled on
>>>>>> its CPU.
>>>>>>
>>>>>> 4. Thread C tries to acquire Mutex X, finds it assigned to Thread A, but
>>>>>> also notices that the XNWAKEN flag of Thread A is set. Thus it steals
>>>>>> the mutex although Thread A already entered the critical section -
>>>>>> and hell breaks loose...
>>>>>>
>>>>> See commit #3795, and change log entry from 2008-05-15. Unless I misunderstood
>>>>> your description, this bug was fixed in 2.4.4.
>>>> Oh, fatally missed that fix.
>>>>
>>>> Anyway, the patch looks a bit unclean to me. Either you are lacking
>>>> wwake = NULL in xnpod_suspend_thread, or the whole information encoded
>>>> in XNWAKEN can already be covered by wwake directly.
>>>>
>>> Clearing wwake has to be done when returning from xnsynch_sleep_on, only when
>>> the code knows that ownership is eventually granted to the caller; making such a
>>> decision in xnpod_suspend_thread() would be wrong.
>> What about
>>
>> http://www.rts.uni-hannover.de/xenomai/lxr/source/ksrc/nucleus/pod.c#1411
>>
>> then?
>>
>
> That portion of code applies _before_ the thread enters suspended state. We
> bother for the other side, i.e. when it resumes from actual suspension, until it
> has been decided whether it should be allowed to keep running, or redo the wait
> due to the resource being stolen away.
Then clearing XNWAKEN here is useless - it comes for free, but it has no
practical effect. For code clarity reasons, this should be remove IMHO.
>
>>> The awake bit has been kept mainly because the nucleus commonly uses bitmasks to
>>> get fast access to thread status & information. It's not mandatory to have this
>>> one in, it's just conforming to the rest of the implementation.
>> I see, but redundancy come with some ugliness as well. And we add more
>> code to hot paths.
>>
>
> The hot path barely involves facing a stolen resource situation. Most of the
> time, the XNWAKEN test will be false, or this would mean that the application
> exhibits a high priority thread that routinely and frequently competes with a
> low priority one for gaining access to such resource. I would rather fix the
> latter code first.
Besides that the lock-stealing path (with two instead of only one
conditional jumps) is relevant for the WCET in some cases (but that is
nitpicking), I was more referring to setting the bit + setting wwake in
the wakeup paths.
>
> Regarding the perceived ugliness, I guess this is a matter of taste here. I like
> the idea of always testing a given information the same way, like:
>
> if (test_bits(thread, foobar))
>
> and avoid things like:
>
> if (test_bits(thread, foo) || thread->field == bar)
The latter is what we see now with XNWAKEN (while plain "thread->field
== bar" would be possible in this case). We don't gain anything here by
using a bit (no combined tests with other bits e.g.).
Jan
---
include/nucleus/thread.h | 1 -
ksrc/nucleus/pod.c | 2 +-
ksrc/nucleus/synch.c | 5 +----
3 files changed, 2 insertions(+), 6 deletions(-)
Index: b/include/nucleus/thread.h
===================================================================
--- a/include/nucleus/thread.h
+++ b/include/nucleus/thread.h
@@ -112,7 +112,6 @@
#define XNRMID 0x00000002 /**< Pending on a removed resource */
#define XNBREAK 0x00000004 /**< Forcibly awaken from a wait state */
#define XNKICKED 0x00000008 /**< Kicked upon Linux signal (shadow only) */
-#define XNWAKEN 0x00000010 /**< Thread waken up upon resource availability */
#define XNROBBED 0x00000020 /**< Robbed from resource ownership */
#define XNATOMIC 0x00000040 /**< In atomic switch from secondary to primary mode */
#define XNAFFSET 0x00000080 /**< CPU affinity changed from primary mode */
Index: b/ksrc/nucleus/pod.c
===================================================================
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1406,7 +1406,7 @@ void xnpod_suspend_thread(xnthread_t *th
}
#endif /* CONFIG_XENO_OPT_PERVASIVE */
- xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK | XNWAKEN | XNROBBED);
+ xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK | XNROBBED);
}
/* Don't start the timer for a thread indefinitely delayed by
Index: b/ksrc/nucleus/synch.c
===================================================================
--- a/ksrc/nucleus/synch.c
+++ b/ksrc/nucleus/synch.c
@@ -199,7 +199,7 @@ redo:
}
if (thread->cprio > owner->cprio) {
- if (xnthread_test_info(owner, XNWAKEN) && owner->wwake == synch) {
+ if (owner->wwake == synch) {
/* Ownership is still pending, steal the resource. */
synch->owner = thread;
xnthread_clear_info(thread, XNRMID | XNTIMEO | XNBREAK);
@@ -243,7 +243,6 @@ redo:
unlock_and_exit:
thread->wwake = NULL;
- xnthread_clear_info(thread, XNWAKEN);
xnlock_put_irqrestore(&nklock, s);
}
@@ -389,7 +388,6 @@ xnthread_t *xnsynch_wakeup_one_sleeper(x
thread->wchan = NULL;
thread->wwake = synch;
synch->owner = thread;
- xnthread_set_info(thread, XNWAKEN);
trace_mark(xn_nucleus_synch_wakeup_one,
"thread %p thread_name %s synch %p",
thread, xnthread_name(thread), synch);
@@ -503,7 +501,6 @@ xnpholder_t *xnsynch_wakeup_this_sleeper
thread->wchan = NULL;
thread->wwake = synch;
synch->owner = thread;
- xnthread_set_info(thread, XNWAKEN);
trace_mark(xn_nucleus_synch_wakeup_all,
"thread %p thread_name %s synch %p",
thread, xnthread_name(thread), synch);
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]
next prev parent reply other threads:[~2008-08-19 20:35 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-08-19 16:42 [Xenomai-core] [BUG] Lock stealing is borken Jan Kiszka
2008-08-19 19:12 ` Philippe Gerum
2008-08-19 19:28 ` Jan Kiszka
2008-08-19 19:50 ` Philippe Gerum
2008-08-19 19:56 ` Jan Kiszka
2008-08-19 20:12 ` Philippe Gerum
2008-08-19 20:35 ` Jan Kiszka [this message]
2008-08-20 9:00 ` Philippe Gerum
2008-08-20 9:14 ` Jan Kiszka
2008-08-28 19:26 ` Philippe Gerum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48AB2E7A.4050606@domain.hid \
--to=jan.kiszka@domain.hid \
--cc=rpm@xenomai.org \
--cc=xenomai@xenomai.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.