From: Nam Cao <namcao@linutronix.de>
To: Gabriele Monaco <gmonaco@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
linux-trace-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Tomas Glozar <tglozar@redhat.com>, Juri Lelli <jlelli@redhat.com>
Subject: Re: [RFC PATCH v2 10/12] rv: Retry when da monitor detects race conditions
Date: Wed, 21 May 2025 08:58:00 +0200
Message-ID: <20250521065800.jgtPyWd7@linutronix.de>
In-Reply-To: <cdcab46480cbfe2c3320e060ea9603adbd04956c.camel@redhat.com>
On Mon, May 19, 2025 at 01:13:01PM +0200, Gabriele Monaco wrote:
>
>
> On Mon, 2025-05-19 at 12:38 +0200, Nam Cao wrote:
> > On Mon, May 19, 2025 at 12:28:12PM +0200, Gabriele Monaco wrote:
> > > Mmh, although definitely unlikely, I'm thinking of a case in which
> > > the event starts on one CPU and at the same time we see events in
> > > IRQ context and on another CPU, let's say continuously. Nothing
> > > forbids that, between any two consecutive try_cmpxchg calls,
> > > another CPU/context changes the next state (making the local
> > > try_cmpxchg fail).
> > > In practice I've never seen it go to a second iteration, as the
> > > critical section is really tiny, but I'm not sure we can guarantee
> > > this never happens.
> > > Or am I missing something?
> >
> > I have a feeling that you missed my point. I agree that the retrying
> > is needed, because we may race with other actors.
> >
> > What I am proposing is that we drop MAX_DA_RETRY_RACING_EVENTS and
> > just keep retrying until we succeed.
> >
> > And that's safe to do, because the maximum number of retries is the
> > number of tasks contending with us to set the monitor's state. So we
> > know we won't be retrying for long.
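
To make that concrete, this is the unbounded retry I have in mind. It
is only a sketch: struct da_monitor here is a stand-in and
da_event_to_next_state() is a hypothetical helper, not the actual
generated monitor code; only try_cmpxchg() and READ_ONCE() are the real
kernel APIs.

	struct da_monitor {		/* placeholder, not the real struct */
		int curr_state;
	};

	static inline int da_handle_event(struct da_monitor *da_mon, int event)
	{
		int curr, next;

		curr = READ_ONCE(da_mon->curr_state);
		do {
			next = da_event_to_next_state(curr, event);
			if (next < 0)
				return next;	/* event invalid in this state */
			/* on failure, try_cmpxchg() reloads curr with the winner's state */
		} while (!try_cmpxchg(&da_mon->curr_state, &curr, next));

		return 0;
	}

Each failed try_cmpxchg() means some other actor made progress, which
is why I consider the loop bounded by the number of contending actors.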
>
> I get this point; what I mean is: can we really guarantee that the
> number of contending tasks (or contexts) is finite?
> In other words, try_cmpxchg guarantees that one and only one actor
> wins every time, but it cannot guarantee that all actors will
> eventually win: an actor /could/ be stuck there forever.
>
> This handler runs for each event in the monitor, and tracepoint
> handlers can be interrupted as well as run in interrupt context
> (where, of course, they cannot be interrupted). I don't think the
> number of actors is bounded by the number of CPUs.
> This situation is extremely unlikely, but in an exotic scenario where
> one CPU is sufficiently slower than the others (e.g. in a VM), I
> believe the critical section can get large enough for this to
> actually happen.
>
> I'm not so much afraid of infinite loops as of RV introducing
> unbounded latency that is very hard to track down and comes without
> any reporting.
> Chances are, since tracepoints and the actual traced events are not
> atomic, that by the time this delayed context /wins/, the RV event is
> no longer current, so we may already see an error anyway.
>
> Does this make sense to you, or am I making it more complex than it
> needs to be?
Right, I can see that being a problem. But I don't know enough about it
to comment further, so do as you think best; maybe someone else can
help.
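
For reference, a bounded variant along the lines of this patch could
report and give up once the retry budget is exhausted, trading a
possible false error for a hard latency bound. Again only a sketch: the
cap value and the helpers are illustrative; only the macro name
MAX_DA_RETRY_RACING_EVENTS is from the patch.

	#define MAX_DA_RETRY_RACING_EVENTS	3	/* illustrative cap */

	static inline int da_handle_event_bounded(struct da_monitor *da_mon,
						  int event)
	{
		int curr, next, retry;

		curr = READ_ONCE(da_mon->curr_state);
		for (retry = 0; retry < MAX_DA_RETRY_RACING_EVENTS; retry++) {
			next = da_event_to_next_state(curr, event);
			if (next < 0)
				return next;
			if (try_cmpxchg(&da_mon->curr_state, &curr, next))
				return 0;
			/* lost the race: curr now holds the winner's state */
		}
		/* retry budget exhausted: report instead of looping forever */
		pr_warn_once("rv: da monitor hit %d racing events, giving up\n",
			     MAX_DA_RETRY_RACING_EVENTS);
		return -EBUSY;
	}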
Best regards,
Nam
Thread overview: 34+ messages
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
2025-05-14 8:43 ` [RFC PATCH v2 01/12] tools/rv: Do not skip idle in trace Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 02/12] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 03/12] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
2025-05-19 8:02 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 04/12] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
2025-05-19 8:05 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 05/12] rv: Return init error when registering monitors Gabriele Monaco
2025-05-19 8:06 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 06/12] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
2025-05-19 8:29 ` Nam Cao
2025-05-19 8:41 ` Gabriele Monaco
2025-05-19 8:43 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
2025-05-19 8:42 ` Nam Cao
2025-05-19 9:04 ` Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 08/12] rv: Extend and adapt snroc model Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-06-24 7:36 ` Nam Cao
2025-06-24 14:44 ` Gabriele Monaco
2025-06-24 15:50 ` Nam Cao
2025-06-24 19:31 ` Steven Rostedt
2025-06-27 15:02 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 10/12] rv: Retry when da monitor detects race conditions Gabriele Monaco
2025-05-19 9:06 ` Nam Cao
2025-05-19 10:28 ` Gabriele Monaco
2025-05-19 10:38 ` Nam Cao
2025-05-19 11:13 ` Gabriele Monaco
2025-05-21 6:58 ` Nam Cao [this message]
2025-05-14 8:43 ` [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
2025-05-27 13:37 ` Nam Cao
2025-05-27 14:35 ` Gabriele Monaco
2025-05-27 14:50 ` Nam Cao
2025-05-28 11:27 ` Gabriele Monaco