From: Philippe Gerum <rpm@xenomai.org>
To: Giulio Moro <giulio@bela.io>
Cc: xenomai@lists.linux.dev
Subject: Re: Issues with evl_mutex_trylock()
Date: Fri, 26 Aug 2022 16:09:25 +0200
Message-ID: <87pmgnnkyb.fsf@xenomai.org>
In-Reply-To: <a4c1ed74-2ae9-9427-994b-54dd7773399e@bela.io>


Giulio Moro <giulio@bela.io> writes:

> In the toy example below I start some pthreads with SCHED_FIFO, priority 94, setting the affinity to one thread per core. I then attach them to the EVL core and share a lock between them and the main thread. The main thread has a settable priority and scheduling policy (SCHED_OTHER with prio 0 or SCHED_FIFO with higher priority) and its affinity is set to all available cores.
>
> Here's a summary of what I get when running the test with some parameters. The arguments are: "max iterations", "number of threads to create", "priority of the main thread", "flags" (t means don't print anything). The error code reported alongside failures is the base 2 representation of the return code: 100 means that errors were detected in evl_unlock_mutex() in the main thread following a successful call to evl_trylock_mutex(). During these tests, the output is piped to cat and then redirected to /dev/null.
>
> mutex-test 100000 0 0 t :  Success real 0m13.583s user 0m1.218s sys 0m0.000s
> mutex-test 100000 0 90 t :  Success real 0m13.350s user 0m1.177s sys 0m0.000s
> mutex-test 100000 0 99 t :  Success real 0m13.347s user 0m1.179s sys 0m0.000s
> mutex-test 100000 1 0 t :  Success real 0m20.087s user 0m3.575s sys 0m0.000s
> mutex-test 100000 1 90 t :  Success real 0m17.610s user 0m2.133s sys 0m0.000s
> mutex-test 100000 1 99 t :  Success real 0m15.466s user 0m1.333s sys 0m0.000s
> mutex-test 100000 2 0 t :  Failed 100 real 0m0.259s user 0m0.083s sys 0m0.000s
> mutex-test 100000 2 90 t :  Failed 100 real 0m0.374s user 0m0.176s sys 0m0.000s
> mutex-test 100000 2 99 t :  Failed 100 real 0m0.207s user 0m0.109s sys 0m0.000s
> mutex-test 100000 3 0 t :  Failed 100 real 0m0.253s user 0m0.124s sys 0m0.000s
> mutex-test 100000 3 90 t :  Failed 100 real 0m0.383s user 0m0.149s sys 0m0.000s
> mutex-test 100000 3 99 t :  Failed 100 real 0m0.283s user 0m0.151s sys 0m0.000s
> mutex-test 100000 4 0 t :  Failed 100 real 0m0.526s user 0m0.221s sys 0m0.000s
> mutex-test 100000 4 90 t :  Failed 100 real 0m0.438s user 0m0.178s sys 0m0.000s
> mutex-test 100000 4 99 t :  Failed 100 real 0m0.434s user 0m0.178s sys 0m0.000s
>
> So, when running with 2 or more threads, the same error keeps surfacing very quickly.
>

Confirmed. There is something wrong.

> I further observe the following unexpected behaviours:
> - in the 0 and 1 thread cases, I get increasing values in the ISW
> column for the main thread when monitoring with evl ps -st. This
> happens regardless of the priority of the main thread and of the
> amount of data printed to the screen. It doesn't happen with 2, 3 or 4
> threads (for these I have to run with 'k' to keep going after the
> errors and be able to observe the output of evl ps).

This is due to an issue in your test code: evl_attach_thread() inherits
the current POSIX scheduling params for the caller, so you need to set
them prior to calling this routine. Otherwise, you need to change them
EVL-wise after attachment using evl_set_schedattr().
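
A minimal sketch of the first option (set the POSIX parameters, then
attach), under stated assumptions: attach_fn and attach_stub are
hypothetical stand-ins for the code that would call evl_attach_thread(),
so the example builds without libevl, and sched_setscheduler() with pid 0
is used on the assumption that it targets the calling thread, as it does
on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback type: stands in for the EVL attachment step. */
typedef int (*attach_fn)(void);

/* Policy observed by the stub at "attachment" time (illustration only). */
static int policy_seen_at_attach = -1;

/*
 * Hypothetical stub: real code would attach to the EVL core here.
 * It only records which policy the caller has when attachment happens.
 */
static int attach_stub(void)
{
	policy_seen_at_attach = sched_getscheduler(0);
	return 0;
}

/*
 * Set the caller's scheduling policy and priority first, then attach,
 * so that the attachment step sees the intended parameters.
 * Returns -errno if the parameters cannot be set, else attach()'s result.
 */
static int set_sched_then_attach(int policy, int prio, attach_fn attach)
{
	struct sched_param param = { .sched_priority = prio };

	assert(attach != NULL);

	/* pid 0 means "the caller" (assumed: the calling thread on Linux). */
	if (sched_setscheduler(0, policy, &param) < 0)
		return -errno;

	return attach();
}
```

For instance, set_sched_then_attach(SCHED_FIFO, 94, attach_stub) would
request SCHED_FIFO before attaching; this normally requires privileges,
and fails with -EPERM otherwise.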

> - running `./mutex-test 100000 1 X a` (where 'a' stands for "print
> all", and for any priority X) will almost always quickly show the
> following behaviour: the main thread stops running, and process
> kworker/u8:3+stdout:3033.O runs at 100% CPU. I don't think it's
> expected that a SCHED_FIFO thread attached to the EVL core becomes
> blocked because it calls evl_printf(). Note that only the main thread
> is blocked, the other thread runs. This doesn't happen when numThreads
> == 0 or numThreads >= 2.

This is the expected behavior unless you switch the file descriptor of
the stdout/stderr proxy to non-blocking I/O. If you do, the oob caller
would not wait for the output to drain and would return immediately with
-EAGAIN, dropping the current message in the same move.
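
The non-blocking behavior described above can be illustrated with plain
POSIX calls. This is only a sketch: it works on any descriptor (an
ordinary pipe, for instance) and does not use libevl; which descriptor
backs the stdout/stderr proxy and how to obtain it is left out.
set_nonblocking() and emit_or_drop() are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Switch a file descriptor to non-blocking mode; returns 0 or -errno. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -errno;
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -errno;
	return 0;
}

/*
 * Try to emit one message without waiting. Returns 0 if it was written,
 * or -errno on failure. With a non-blocking descriptor whose buffer is
 * full, that is -EAGAIN and the message is simply dropped.
 * Partial writes are ignored for brevity.
 */
static int emit_or_drop(int fd, const char *msg)
{
	ssize_t ret;

	assert(msg != NULL);

	ret = write(fd, msg, strlen(msg));
	if (ret < 0)
		return -errno;
	return 0;
}
```

The trade-off is the one stated above: the caller never stalls on a slow
consumer, but output is lost whenever the channel is full.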

> - running with `./mutex-test 100000 1 X q`, where 'q' is for "don't print anything", gives me 90% of the time an immediate hard lock up of the board (not even a stacktrace out of the UART!) (X can be any priority). Note that if this is run with `t` (throttled printing) or `a` (print all), it works kind of OK, apart from the issue mentioned in the previous bullet point

Which is expected as well: although your main thread does sleep for 100
µs at each iteration, threads 1-n don't, thus starving the system of CPU
time on their respective processors. Enabling the watchdog in the EVL
debug settings would likely trigger a report from the core.
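
As a rough sketch of a loop that does not monopolize its CPU, each round
can end with a short sleep. run_paced_loop() is a hypothetical name, and
plain nanosleep() is used only so the example builds without libevl; a
thread attached to the EVL core would use an out-of-band sleep service
instead (evl_usleep(), if I recall the libevl API correctly).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Run 'iterations' rounds, sleeping 'sleep_ns' nanoseconds after each
 * one so that other tasks on the same CPU get a chance to run.
 * Returns the number of completed rounds.
 */
static int run_paced_loop(int iterations, long sleep_ns)
{
	struct timespec pause = { .tv_sec = 0, .tv_nsec = sleep_ns };
	int done = 0;

	assert(sleep_ns >= 0 && sleep_ns < 1000000000L);

	while (done < iterations) {
		/* ... one round of work (e.g. lock, update, unlock) ... */
		done++;

		/* Release the CPU; an early wakeup (EINTR) is ignored here. */
		nanosleep(&pause, NULL);
	}

	return done;
}
```

Calling run_paced_loop(100000, 100000L) would mirror the 100 µs pause
the main thread already takes at each iteration.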

> - occasionally I get other lockups of the board during these tests that are harder to reproduce. For instance, one happened at some point while running `./mutex-test 10000 3 0 t`

Could be the same issue as above.

> - I do get some EVL-related stacktraces relatively often when I Ctrl-C a starting program. I will make a separate post about this.
>
> As previously, this is tested on TI's SK-AM62 with this branch https://source.denx.de/Xenomai/xenomai4/linux-evl/-/tree/vendor/ti/v5.10.120 and commit 7ccc58d62905724f1b8de4823e0fe8e0129b0fb7 of libevl (r37 + 1)
>

-- 
Philippe.
