From: Nam Cao <namcao@linutronix.de>
To: Gabriele Monaco <gmonaco@redhat.com>
Cc: linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
linux-trace-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Tomas Glozar <tglozar@redhat.com>, Juri Lelli <jlelli@redhat.com>
Subject: Re: [RFC PATCH v2 10/12] rv: Retry when da monitor detects race conditions
Date: Wed, 21 May 2025 08:58:00 +0200
Message-ID: <20250521065800.jgtPyWd7@linutronix.de>
In-Reply-To: <cdcab46480cbfe2c3320e060ea9603adbd04956c.camel@redhat.com>
On Mon, May 19, 2025 at 01:13:01PM +0200, Gabriele Monaco wrote:
>
>
> On Mon, 2025-05-19 at 12:38 +0200, Nam Cao wrote:
> > On Mon, May 19, 2025 at 12:28:12PM +0200, Gabriele Monaco wrote:
> > > Mmh, although definitely unlikely, I'm thinking of a case in which
> > > the event starts on one CPU and at the same time we see events in
> > > IRQ context and on another CPU, let's say continuously. Nothing
> > > forbids that, between any two consecutive try_cmpxchg calls,
> > > another CPU/context changes the next state (making the local
> > > try_cmpxchg fail).
> > > In practice I've never seen it go to a second iteration, as the
> > > critical section is really tiny, but I'm not sure we can guarantee
> > > this never happens.
> > > Or am I missing something?
> >
> > I have a feeling that you missed my point. I agree that the retrying
> > is needed, because we may race with other actors.
> >
> > What I am proposing is that we drop MAX_DA_RETRY_RACING_EVENTS and
> > just keep retrying until we succeed.
> >
> > And that's safe to do, because the maximum number of retries is the
> > number of tasks contending with us to set the monitor's state. So we
> > know we won't be retrying for long.
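
To make that concrete, this is the unbounded retry I have in mind. It
is only a sketch: struct da_monitor here is a stand-in and
da_event_to_next_state() is a hypothetical helper, not the actual
generated monitor code; only try_cmpxchg() and READ_ONCE() are the real
kernel APIs.

	struct da_monitor {		/* placeholder, not the real struct */
		int curr_state;
	};

	static inline int da_handle_event(struct da_monitor *da_mon, int event)
	{
		int curr, next;

		curr = READ_ONCE(da_mon->curr_state);
		do {
			next = da_event_to_next_state(curr, event);
			if (next < 0)
				return next;	/* event invalid in this state */
			/* on failure, try_cmpxchg() reloads curr with the winner's state */
		} while (!try_cmpxchg(&da_mon->curr_state, &curr, next));

		return 0;
	}

Each failed try_cmpxchg() means some other actor made progress, which
is why I consider the loop bounded by the number of contending actors.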
>
> I get this point; what I mean is: can we really guarantee that the
> number of contending tasks (or contexts) is finite?
> In other words, try_cmpxchg guarantees that one and only one actor
> wins every time, but it cannot guarantee that all actors will
> eventually win: an actor /could/ be stuck there forever.
>
> This handler runs for each event in the monitor, and tracepoint
> handlers can be interrupted as well as run in interrupt context
> (where, of course, they cannot be interrupted). I don't think the
> number of actors is bounded by the number of CPUs.
> This situation is extremely unlikely, but in an exotic scenario where
> one CPU is sufficiently slower than the others (e.g. in a VM), I
> believe the critical section can get large enough for this to
> actually happen.
>
> I'm not so much afraid of infinite loops as of RV introducing
> unbounded latency that is very hard to track down and comes without
> any reporting.
> Chances are, since tracepoints and the actual traced events are not
> atomic, that by the time this delayed context /wins/, the RV event is
> no longer current, so we may already see an error anyway.
>
> Does this make sense to you, or am I making it more complex than it
> needs to be?
Right, I can see that being a problem. But I don't know enough about it
to comment further, so do as you think best; maybe someone else can
help.
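
For reference, a bounded variant along the lines of this patch could
report and give up once the retry budget is exhausted, trading a
possible false error for a hard latency bound. Again only a sketch: the
cap value and the helpers are illustrative; only the macro name
MAX_DA_RETRY_RACING_EVENTS is from the patch.

	#define MAX_DA_RETRY_RACING_EVENTS	3	/* illustrative cap */

	static inline int da_handle_event_bounded(struct da_monitor *da_mon,
						  int event)
	{
		int curr, next, retry;

		curr = READ_ONCE(da_mon->curr_state);
		for (retry = 0; retry < MAX_DA_RETRY_RACING_EVENTS; retry++) {
			next = da_event_to_next_state(curr, event);
			if (next < 0)
				return next;
			if (try_cmpxchg(&da_mon->curr_state, &curr, next))
				return 0;
			/* lost the race: curr now holds the winner's state */
		}
		/* retry budget exhausted: report instead of looping forever */
		pr_warn_once("rv: da monitor hit %d racing events, giving up\n",
			     MAX_DA_RETRY_RACING_EVENTS);
		return -EBUSY;
	}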
Best regards,
Nam
Thread overview: 34+ messages
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
2025-05-14 8:43 ` [RFC PATCH v2 01/12] tools/rv: Do not skip idle in trace Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 02/12] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 03/12] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
2025-05-19 8:02 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 04/12] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
2025-05-19 8:05 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 05/12] rv: Return init error when registering monitors Gabriele Monaco
2025-05-19 8:06 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 06/12] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
2025-05-19 8:29 ` Nam Cao
2025-05-19 8:41 ` Gabriele Monaco
2025-05-19 8:43 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
2025-05-19 8:42 ` Nam Cao
2025-05-19 9:04 ` Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 08/12] rv: Extend and adapt snroc model Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-06-24 7:36 ` Nam Cao
2025-06-24 14:44 ` Gabriele Monaco
2025-06-24 15:50 ` Nam Cao
2025-06-24 19:31 ` Steven Rostedt
2025-06-27 15:02 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 10/12] rv: Retry when da monitor detects race conditions Gabriele Monaco
2025-05-19 9:06 ` Nam Cao
2025-05-19 10:28 ` Gabriele Monaco
2025-05-19 10:38 ` Nam Cao
2025-05-19 11:13 ` Gabriele Monaco
2025-05-21 6:58 ` Nam Cao [this message]
2025-05-14 8:43 ` [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
2025-05-27 13:37 ` Nam Cao
2025-05-27 14:35 ` Gabriele Monaco
2025-05-27 14:50 ` Nam Cao
2025-05-28 11:27 ` Gabriele Monaco