From: Philippe Gerum <rpm@xenomai.org>
To: Dave Rolenc <Dave.Rolenc@kratosdefense.com>
Cc: "xenomai@lists.linux.dev" <xenomai@lists.linux.dev>,
Russell Johnson <russell.johnson@kratosdefense.com>
Subject: Re: [PATCH 0/1] rwsem_down_write_slowpath check if oob() before skipping schedule()
Date: Wed, 31 May 2023 18:55:57 +0200
Message-ID: <877csoxvle.fsf@xenomai.org>
In-Reply-To: <eb4b2a7608bf4cb1b16c75b07d095602@PH1P110MB1666.NAMP110.PROD.OUTLOOK.COM>
Dave Rolenc <Dave.Rolenc@kratosdefense.com> writes:
>> What's even worse and the root cause of that issue is that no task on
>> the oob stage should ever run rwsem_down_write_slowpath() in the first
>> place.
>
[snip]
>
> Assuming you're correct that this code never runs out of band, that
> just means that the schedule call will never get skipped by the goto as
> the running_oob() always returns false. The important part I observed is
> that if that goto trylock_again gets hit the task never gets out of that
> loop. If we make it so the schedule() chunk never gets skipped, things
> work fine.
>
> Empirical evidence shows that never skipping that schedule() call makes
> the problem go away. My test scenario, which is way too involved to
> package up for you and involves custom hardware, will run a couple hours
> max before getting a stuck CPU. With the patch it ran over 4 days
> without issue. Assuming running_oob always returns false, the code
> should be roughly equivalent to commenting out the lines as I did in my
> first attempt. My first attempt at commenting out the lines also worked
> fine for over 24 hours. I wish I had more of a definitive answer to the
> other task involved, but the stack traces didn't really help there. Not
> having a fully working kernel debugger kind of limited what I could see,
> so I had to sample stack traces and figure out which code path it was
> taking by piecing together the information I had.
>
> I can definitely run your suggested test with the WARN_ONCE if you
> really want, but I don't think the cause is some oob context running
> this code. It was just my misunderstanding that this code could run in
> oob coupled with my desire to not delete those lines if at all possible.
>
Ok, got it. Adding running_oob(), which should always evaluate to false
there, only prevents the code from spinning, which papers over the
issue. Replacing running_oob() with dovetailing() would achieve the same
purpose. If so, my patch does not bring anything valuable. Besides, if
Dovetail debugging is compiled in, a bad (oob) context running strictly
in-band code would almost certainly have triggered some existing
assertions anyway.
> Do you have any thoughts on how to proceed?
>
The first thing would be to enable CONFIG_DEBUG_RWSEMS and re-run an
overnight test with no patch applied, hoping the native debug
infrastructure gives us some hint.
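
A minimal sketch of the config fragment this implies, assuming a tree in
which DEBUG_RWSEMS still depends on DEBUG_KERNEL (check lib/Kconfig.debug
in the kernel actually being built; any other debug options are left out
on purpose):

```
# Assumption: DEBUG_KERNEL is the only prerequisite in this tree.
CONFIG_DEBUG_KERNEL=y
# Turns on the DEBUG_RWSEMS_WARN_ON() checks in kernel/locking/rwsem.c.
CONFIG_DEBUG_RWSEMS=y
```

Any warning from those checks would show up in the kernel log, so
capturing dmesg across the overnight run is the part that matters.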
--
Philippe.