From: Gabriele Monaco <gmonaco@redhat.com>
To: Nam Cao <namcao@linutronix.de>
Cc: linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
linux-trace-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Tomas Glozar <tglozar@redhat.com>, Juri Lelli <jlelli@redhat.com>
Subject: Re: [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions
Date: Fri, 11 Apr 2025 08:09:12 +0200
Message-ID: <360fe1770511702e35081e950bf2a362a4c092a8.camel@redhat.com>
In-Reply-To: <20250411045225.gP4DqmFt@linutronix.de>
On Fri, 2025-04-11 at 06:52 +0200, Nam Cao wrote:
> On Fri, Apr 04, 2025 at 10:45:20AM +0200, Gabriele Monaco wrote:
> > A DA monitor can be accessed from multiple cores simultaneously; this
> > is likely, for instance, when dealing with per-task monitors reacting
> > to events that do not always occur on the CPU where the task is
> > running. This can cause race conditions where two events change the
> > next state and we see inconsistent values. E.g.:
> >
> > [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
> > [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
> > [63] error_srs: 27: event sched_switch_suspend not expected in the state running
> >
> > In this case the monitor fails because the event on CPU 62 wins
> > against the one on CPU 63, although the correct state should have
> > been sleepable, since the task gets suspended.
> >
> > Detect whether the current state was modified by using try_cmpxchg
> > while storing the next value. If it was, retry by reading the current
> > state again. After a maximum number of failed retries, react as if it
> > were an error with an invalid current state (we cannot determine it).
> >
> > Monitors where this type of condition can occur must be able to
> > account for racing events in any possible order, as we cannot know
> > the winner.
>
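For reference, the retry loop described in the patch can be sketched in userspace with C11 atomic_compare_exchange_strong(), which, like the kernel's try_cmpxchg(), overwrites the expected value with the current one when the exchange fails. Everything here is an assumption for illustration: the three-state model, the event names, MAX_DA_RETRIES and the da_handle_event*() helpers are hypothetical, loosely modelled on the srs trace, and are not the actual monitor or RV code. Note that the compare-exchange only detects a change in value: if a racing event stores the same state that was read, the commit still succeeds, which is one reason monitors must tolerate racing events in any order.

```c
#include <stdatomic.h>

/* Hypothetical model loosely based on the srs trace; not the real monitor. */
enum da_state { RUNNING, SLEEPABLE, SLEEPING, STATE_MAX };
enum da_event { EV_WAKEUP, EV_SET_SLEEPABLE, EV_SWITCH_SUSPEND, EVENT_MAX };

#define INVALID_STATE  (-1)
#define MAX_DA_RETRIES 3	/* hypothetical retry budget */

/* next_state[s][e] == INVALID_STATE means "event e not expected in state s". */
static const int next_state[STATE_MAX][EVENT_MAX] = {
	[RUNNING] = {
		[EV_WAKEUP] = RUNNING,
		[EV_SET_SLEEPABLE] = SLEEPABLE,
		[EV_SWITCH_SUSPEND] = INVALID_STATE,
	},
	[SLEEPABLE] = {
		[EV_WAKEUP] = RUNNING,
		[EV_SET_SLEEPABLE] = SLEEPABLE,
		[EV_SWITCH_SUSPEND] = SLEEPING,
	},
	[SLEEPING] = {
		[EV_WAKEUP] = RUNNING,
		[EV_SET_SLEEPABLE] = INVALID_STATE,
		[EV_SWITCH_SUSPEND] = INVALID_STATE,
	},
};

/*
 * Commit event ev starting from a state the caller read earlier (possibly
 * stale). Returns the new state, or INVALID_STATE if ev is not expected or
 * the state kept changing for MAX_DA_RETRIES attempts.
 */
static int da_handle_event_from(_Atomic int *cur, int old, enum da_event ev)
{
	for (int i = 0; i < MAX_DA_RETRIES; i++) {
		int next = next_state[old][ev];

		if (next == INVALID_STATE)
			return INVALID_STATE;
		/* On failure, old is refreshed with the value another CPU stored. */
		if (atomic_compare_exchange_strong(cur, &old, next))
			return next;
	}
	return INVALID_STATE;	/* cannot determine the current state */
}

/* Normal entry point: read the current state, then commit the event. */
static int da_handle_event(_Atomic int *cur, enum da_event ev)
{
	return da_handle_event_from(cur, atomic_load(cur), ev);
}
```

For example, if one CPU reads sleepable, another CPU commits sched_wakeup (sleepable -> running) first, and the first CPU then commits sched_set_state_sleepable from its stale value, the compare-exchange fails, the loop recomputes the transition from running and stores sleepable, so a following suspend is accepted.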
> Is locking not simpler? I understand raw_spin_lock() doesn't work
> because it steps on some tracepoints. But how about adding something
> like raw_spin_lock_notrace()?
It is probably simpler, but I think it would also require disabling
interrupts (some events occur in interrupt context). I'm not sure the
introduced overhead is worth it in the fast path, but that's kind of
what I wanted to learn with this RFC ;)
>
> static inline void raw_spin_lock_notrace(raw_spinlock_t *lock)
> {
> 	/* probably not required, tracepoint handlers do this already */
> 	preempt_disable_notrace();
>
> 	if (!do_raw_spin_trylock(lock))
> 		do_raw_spin_lock(lock);
> }
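The trylock-then-lock shape of the proposed helper can be tried out in userspace with a C11 atomic_flag. This is only an analogue: demo_spinlock_t and the demo_spin_*() functions are hypothetical stand-ins for raw_spinlock_t and do_raw_spin_*(), and userspace cannot model the interrupt disabling mentioned above, which in the kernel would presumably mean something like local_irq_save()/local_irq_restore() around the critical section. It also shows the unlock side, which a real raw_spin_lock_notrace() would need as a counterpart (e.g. do_raw_spin_unlock() followed by preempt_enable_notrace()).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for raw_spinlock_t. */
typedef struct {
	atomic_flag locked;
} demo_spinlock_t;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* test_and_set returns the previous value: false means we took the lock. */
static bool demo_spin_trylock(demo_spinlock_t *l)
{
	return !atomic_flag_test_and_set_explicit(&l->locked,
						  memory_order_acquire);
}

static void demo_spin_lock(demo_spinlock_t *l)
{
	/* Uncontended fast path first, as in the proposed helper. */
	if (demo_spin_trylock(l))
		return;
	/* Contended: busy-wait until the holder clears the flag. */
	while (!demo_spin_trylock(l))
		;
}

static void demo_spin_unlock(demo_spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

With the lock held, reading the current state, computing the next one and storing it cannot interleave with another writer, so no retry loop is needed; the cost moves to acquiring the lock (and, in the kernel, disabling interrupts) on every event.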
>
> My LTL series theoretically also has this problem, but I have not hit
> it during testing yet. We should use the same solution for both DA and
> LTL.
Yes, totally. In the long run we might get some common utilities for
this kind of thing that aren't too monitor-specific, but for now I
wouldn't worry too much.
>
> Also, can you please Cc me on your RV patches?
>
Right, will do!
Thanks for your feedback,
Gabriele