All of lore.kernel.org
 help / color / mirror / Atom feed
From: Philippe Gerum <rpm@xenomai.org>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: xenomai@lists.linux.dev, yo sang <stlfatboy@hotmail.com>,
	Giulio Moro <giulio@bela.io>
Subject: Re: Issues with evl_mutex_trylock()
Date: Fri, 11 Aug 2023 15:17:52 +0200	[thread overview]
Message-ID: <87il9l90k5.fsf@xenomai.org> (raw)
In-Reply-To: <5e40a21c-2b7e-42db-88c3-5a6879f276af@siemens.com>


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 26.08.22 18:17, Philippe Gerum wrote:
>> 
>> Philippe Gerum <rpm@xenomai.org> writes:
>> 
>>> Giulio Moro <giulio@bela.io> writes:
>>>
>>>> In the toy example below I start some pthreads with SCHED_FIFO, priority 94 and setting the affinity of one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and schedule (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>>>>
>>>> Here's a summary of what I get when running the test with some
>>>> parameters (arguments are: "max iterations", "number of threads to
>>>> create", "priority of the main thread", "flags (t means don't print
>>>> anything). The error code reported alongside failures is the base 2
>>>> representation of the return code. 100 means that errors were
>>>> detected in evl_unlock_mutex() in the main thread following a
>>>> successful call to evl_trylock_mutex(). During these tests, the
>>>> output is piped to cat and then redirected to /dev/null.
>>>>
>>>> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
>>>> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
>>>> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
>>>> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
>>>> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
>>>> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
>>>> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
>>>> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
>>>> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
>>>> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
>>>> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
>>>> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
>>>> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
>>>> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
>>>> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>>>>
>>>> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>>>>
>>>
>>> Confirmed. There is something wrong.
>>>
>> 
>> Please retry with this patch in. There is clearly a race in the current
>> implementation of the mutex release code, specifically the release path.
>> 
>> @Jan: If believe this gremlin may be in Cobalt too. The key issue is
>> with the handle transitioning to NO_HANDLE before transfer_ownership()
>> switches it to the handle of the first contender in line. If userland
>> manages to slip in with a fast_acquire() during this tiny window, we may
>> be toast, with two threads actually holding the same lock. The issue was
>> made more obvious with EVL which does not serialize the trylock code
>> around cmpxchg in kernel space, triggering the issue faster. I'll have a
>> second look to Cobalt, but since you know the mutex code in depth there,
>> four eyeballs would be better.
>> 
>> diff --git a/kernel/evl/mutex.c b/kernel/evl/mutex.c
>> index e0cf1ba21d5b4..d96f8c9f04df7 100644
>> --- a/kernel/evl/mutex.c
>> +++ b/kernel/evl/mutex.c
>> @@ -899,14 +899,16 @@ void __evl_unlock_mutex(struct evl_mutex *mutex)
>>  	if (mutex->flags & EVL_MUTEX_CEILING)
>>  		clear_boost_locked(mutex, curr, EVL_MUTEX_CEILING);
>>  
>> +	/*
>> +	 * The whole logic is based on the invariant that the current
>> +	 * thread does own the mutex being released.
>> +	 * IOW: currh == atomic_read(lockp) & ~(EVL_MUTEX_FLCLAIM|EVL_MUTEX_FLCEIL).
>> +	 */
>>  	h = atomic_read(lockp);
>> -	h = atomic_cmpxchg(lockp, h, EVL_NO_HANDLE);
>
> This here was different from current Xenomai 3:
>
> 	currh = curr->handle;
> 	[...]
> 	h = atomic_cmpxchg(lockp, currh, XN_NO_HANDLE);
>
> Under cobalt, we are using the thread handle as expectation value for
> the fast lock. If FLCLAIM should have been set there, the cmpxchg will
> fail, and the fast lock will NOT contain XN_NO_HANDLE until we switch
> things in transfer_ownership(). IOW, I don't see how Xenomai 3 could be
> affected in the same way by the issue that you fixed in EVL.
>
> Am I missing something?

No, I agree, Xenomai3 is immune. Either FLCLAIM is set in which case we
would not match the current handle therefore the fast handle would not
be available for grab, or it is set and we would not attempt to transfer
the ownership.

False alarm, my bad.

-- 
Philippe.

  reply	other threads:[~2023-08-11 13:21 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-25 20:47 Issues with evl_mutex_trylock() Giulio Moro
2022-08-26 14:09 ` Philippe Gerum
2022-08-26 15:16   ` Giulio Moro
2022-08-26 15:33     ` Philippe Gerum
2022-08-26 16:17   ` Philippe Gerum
2022-08-26 18:07     ` Philippe Gerum
2022-08-27 16:12     ` Giulio Moro
2022-08-27 17:53     ` Philippe Gerum
2023-08-10  6:00       ` Jan Kiszka
2023-08-10  6:40         ` sang yo
2023-08-10  8:18         ` Philippe Gerum
2023-08-11  7:58           ` Jan Kiszka
2023-08-11 11:31       ` Jan Kiszka
2023-08-11 12:31     ` Jan Kiszka
2023-08-11 13:17       ` Philippe Gerum [this message]
2023-08-11 13:49         ` Jan Kiszka
2023-08-15  6:07           ` sang yo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87il9l90k5.fsf@xenomai.org \
    --to=rpm@xenomai.org \
    --cc=giulio@bela.io \
    --cc=jan.kiszka@siemens.com \
    --cc=stlfatboy@hotmail.com \
    --cc=xenomai@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.