linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/17] rv: Add monitors to validate task switch
@ 2025-07-15  7:14 Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 01/17] tools/rv: Do not skip idle in trace Gabriele Monaco
                   ` (17 more replies)
  0 siblings, 18 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Nam Cao
  Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

This series adds three monitors to the sched collection, extends and
replaces previously existing monitors:

tss => sts:
Not only prove that switches occur in scheduling context but also that
each call to the scheduler implies a switch (also a vain one) and
disables interrupts doing so.

sco:
The sched_set_state tracepoint can now be called from scheduling context
in a particular condition, that is when there is a pending signal. Adapt
the monitor to cover this scenario.

snroc:
Include sched_switch_vain as transition only possible in own context,
this model is also required to keep the following two models simple.

nrp (NEW):
* preemption requires need resched which is cleared by any switch
  (includes a non optimal workaround for /nested/ preemptions)

sssw (NEW):
* suspension requires setting the task to sleepable and, after the
  switch occurs, the task requires a wakeup to come back to runnable

opid (NEW):
* waking and need-resched operations occur with interrupts and
  preemption disabled or in IRQ without explicitly disabling preemption

Also include some minor cleanup patches (1-10) tracepoints (12) and
preparatory fixes (11) covering some corner cases:

The series is currently based on the trace/for-next tree [1] as it
requires some other patches on RV.

Patch 1 fixes the behaviour of the rv tool with -s and idle tasks.

Patch 2 allows the rv tool to gracefully terminate with SIGTERM

Patch 3 adds da_handle_start_run_event_ also to per-task monitors

Patch 4 removes a trailing whitespace from the rv tracepoint string

Patch 5 returns the registration error in all DA monitor instead of 0

Patch 6 fixes an out-of-bound memory access in DA tracepoints

Patch 7 adjusts monitors to have minimised Kconfig dependencies

Patch 8 properly orders nested monitors in the RV Kconfig file

Patch 9 adjusts dot2c not to create lines over 100 columns

Patch 10 adapts monitors headers based on patch 9

Patch 11 detects race conditions when rv monitors run concurrently and
retries applying the events

Patch 12 adds the need_resched and switch_vain tracepoints and adds a
parameter to the sched_set_state tracepoint

Patch 13 adapts the sco monitor to the new version of sched_set_state

Patch 14 extends the snroc model and adapts it to the new sched_set_state

Patch 15 adds the sts monitor to replace tss

Patch 16 adds the nrp and sssw monitors

Patch 17 adds the opid monitor

NOTES

The nrp and sssw monitors include workarounds for racy conditions:

* A sleeping task requires to set the state to sleepable, but in case of
  a task sleeping on an rtlock, the set sleepable and wakeup events race
  and we don't always see the right order:

 5d..2. 107.488369: event: 639: sleepable x set_sleepable -> sleepable
 4d..5. 107.488369: event: 639: sleepable x wakeup -> running (final)
 5d..3. 107.488385: error: 639: switch_suspend not expected in the state running

    wakeup()                    set_state()
        state=RUNNING
                                    trace_set_state()
        trace_wakeup()
                                    state=SLEEPING

  I added a special event (switch_block) but there may be a better way.
  Taking a pi_lock in rtlock_slowlock_locked when setting the state to
  TASK_RTLOCK_WAIT avoids this race, although this is likely not
  something we want to do.

* I consider preemption any scheduling with preempt==true and assume
  this can happen only if need resched is set.
  In practice, however, we may see a preemption where the flag
  is not set. This can happen in one specific condition:

  need_resched
                  preempt_schedule()
                                        preempt_schedule_irq()
                                            __schedule()
  !need_resched
                      __schedule()

  We start a standard preemption (e.g. from preempt_enable when the flag
  is set), an interrupts occurs before we schedule and, on its exit path,
  it schedules, which clears the need_resched flag.
  When the preempted task runs again, we continue the standard
  preemption started earlier, although the flag is no longer set.

  I added a workaround to allow the model not to fail in this condition,
  but it makes the model weaker. In fact, we are not proving that:
   1. we don't miss any event setting need_resched
   2. we don't preempt when not required
  Future versions of the monitor may weaken assumptions only when we
  register an interrupt.

Changes since RFC2:
 * Arrange commits to prevent failed build while bisecting.
 * Avoid dot2k generated files to reach the column limit. (Nam Cao)
 * Rearrange and simplify da_monitor retry on racing events.
 * Improve nrp monitor to handle /nested/ preemption on IRQ.
 * Added minor patches (6-10).
 * Cleanup and rearrange order.
Changes since RFC [2]:
 * Remove wakeup tracepoint in try_to_block_task and use a different
   flavour of sched_set_state
 * Split the large srs monitor in two separate monitors for preemption
   and sleep. These no longer have a concept of running task, they just
   enforce the requirements for the different types of sched out.
 * Restore the snroc monitor to describe the relationship between
   generic sched out and sched in.
 * Add opid monitor.
 * Fix some build errors and cleanup.

[1] - git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
[2] - https://lore.kernel.org/lkml/20250404084512.98552-11-gmonaco@redhat.com

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>

Gabriele Monaco (16):
  tools/rv: Do not skip idle in trace
  tools/rv: Stop gracefully also on SIGTERM
  rv: Add da_handle_start_run_event_ to per-task monitors
  rv: Remove trailing whitespace from tracepoint string
  rv: Return init error when registering monitors
  rv: Use strings in da monitors tracepoints
  rv: Adjust monitor dependencies
  verification/rvgen: Organise Kconfig entries for nested monitors
  rv: Fix generated files going over 100 column limit
  rv: Retry when da monitor detects race conditions
  sched: Adapt sched tracepoints for RV task model
  rv: Adapt the sco monitor to the new set_state
  rv: Extend snroc model
  rv: Replace tss monitor with more complete sts
  rv: Add nrp and sssw per-task monitors
  rv: Add opid per-cpu monitor

Nam Cao (1):
  tools/dot2c: Fix generated files going over 100 column limit

 Documentation/trace/rv/monitor_sched.rst      | 354 +++++++++++++++---
 include/linux/rv.h                            |   3 +-
 include/linux/sched.h                         |   7 +-
 include/rv/da_monitor.h                       | 115 +++---
 include/trace/events/sched.h                  |  17 +-
 kernel/sched/core.c                           |  10 +-
 kernel/trace/rv/Kconfig                       |  10 +-
 kernel/trace/rv/Makefile                      |   5 +-
 kernel/trace/rv/monitors/{tss => nrp}/Kconfig |  10 +-
 kernel/trace/rv/monitors/nrp/nrp.c            | 146 ++++++++
 kernel/trace/rv/monitors/nrp/nrp.h            |  87 +++++
 kernel/trace/rv/monitors/nrp/nrp_trace.h      |  15 +
 kernel/trace/rv/monitors/opid/Kconfig         |  17 +
 kernel/trace/rv/monitors/opid/opid.c          | 169 +++++++++
 kernel/trace/rv/monitors/opid/opid.h          | 104 +++++
 kernel/trace/rv/monitors/opid/opid_trace.h    |  15 +
 kernel/trace/rv/monitors/sched/Kconfig        |   1 +
 kernel/trace/rv/monitors/sched/sched.c        |   3 +-
 kernel/trace/rv/monitors/sco/sco.c            |  11 +-
 kernel/trace/rv/monitors/sco/sco.h            |  16 +-
 kernel/trace/rv/monitors/scpd/Kconfig         |   2 +-
 kernel/trace/rv/monitors/scpd/scpd.c          |   3 +-
 kernel/trace/rv/monitors/sleep/sleep.c        |   3 +-
 kernel/trace/rv/monitors/sncid/Kconfig        |   2 +-
 kernel/trace/rv/monitors/sncid/sncid.c        |   3 +-
 kernel/trace/rv/monitors/snep/Kconfig         |   2 +-
 kernel/trace/rv/monitors/snep/snep.c          |   3 +-
 kernel/trace/rv/monitors/snep/snep.h          |  14 +-
 kernel/trace/rv/monitors/snroc/snroc.c        |  15 +-
 kernel/trace/rv/monitors/snroc/snroc.h        |  18 +-
 kernel/trace/rv/monitors/sssw/Kconfig         |  15 +
 kernel/trace/rv/monitors/sssw/sssw.c          | 115 ++++++
 kernel/trace/rv/monitors/sssw/sssw.h          |  97 +++++
 kernel/trace/rv/monitors/sssw/sssw_trace.h    |  15 +
 kernel/trace/rv/monitors/sts/Kconfig          |  18 +
 kernel/trace/rv/monitors/sts/sts.c            | 117 ++++++
 kernel/trace/rv/monitors/sts/sts.h            |  97 +++++
 .../{tss/tss_trace.h => sts/sts_trace.h}      |   8 +-
 kernel/trace/rv/monitors/tss/tss.c            |  91 -----
 kernel/trace/rv/monitors/tss/tss.h            |  47 ---
 kernel/trace/rv/monitors/wip/Kconfig          |   2 +-
 kernel/trace/rv/monitors/wip/wip.c            |   3 +-
 kernel/trace/rv/monitors/wwnr/wwnr.c          |   3 +-
 kernel/trace/rv/rv_trace.h                    |  89 ++---
 tools/verification/models/sched/nrp.dot       |  29 ++
 tools/verification/models/sched/opid.dot      |  35 ++
 tools/verification/models/sched/sco.dot       |   1 +
 tools/verification/models/sched/snroc.dot     |   2 +-
 tools/verification/models/sched/sssw.dot      |  24 ++
 tools/verification/models/sched/sts.dot       |  29 ++
 tools/verification/models/sched/tss.dot       |  18 -
 tools/verification/rv/src/in_kernel.c         |   2 +-
 tools/verification/rv/src/rv.c                |   1 +
 tools/verification/rvgen/rvgen/container.py   |  13 +
 tools/verification/rvgen/rvgen/dot2c.py       |  19 +-
 tools/verification/rvgen/rvgen/generator.py   |  13 +-
 56 files changed, 1730 insertions(+), 353 deletions(-)
 rename kernel/trace/rv/monitors/{tss => nrp}/Kconfig (60%)
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.c
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.h
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp_trace.h
 create mode 100644 kernel/trace/rv/monitors/opid/Kconfig
 create mode 100644 kernel/trace/rv/monitors/opid/opid.c
 create mode 100644 kernel/trace/rv/monitors/opid/opid.h
 create mode 100644 kernel/trace/rv/monitors/opid/opid_trace.h
 create mode 100644 kernel/trace/rv/monitors/sssw/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.c
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.h
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw_trace.h
 create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sts/sts.c
 create mode 100644 kernel/trace/rv/monitors/sts/sts.h
 rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
 create mode 100644 tools/verification/models/sched/nrp.dot
 create mode 100644 tools/verification/models/sched/opid.dot
 create mode 100644 tools/verification/models/sched/sssw.dot
 create mode 100644 tools/verification/models/sched/sts.dot
 delete mode 100644 tools/verification/models/sched/tss.dot


base-commit: 76604f9ce795edc8c134a6c60b76eb2999ecb739
-- 
2.50.1


^ permalink raw reply	[flat|nested] 57+ messages in thread

* [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-16 11:50   ` Peter Zijlstra
  2025-07-15  7:14 ` [PATCH v3 02/17] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Currently, the userspace RV tool skips trace events triggered by the RV
tool itself, this can be changed by passing the parameter -s, which sets
the variable config_my_pid to 0 (instead of the tool's PID).
The current condition for per-task monitors (config_has_id) does not
check that config_my_pid isn't 0 to skip. In case we pass -s, we show
events triggered by RV but don't show those triggered by idle (PID 0).

Fix the condition to account this scenario.

Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 tools/verification/rv/src/in_kernel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
index c0dcee795c0de..72b03bae021cd 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -429,7 +429,7 @@ ikm_event_handler(struct trace_seq *s, struct tep_record *record,
 
 	tep_get_common_field_val(s, trace_event, "common_pid", record, &pid, 1);
 
-	if (config_has_id && (config_my_pid == id))
+	if (config_my_pid && config_has_id && (config_my_pid == id))
 		return 0;
 	else if (config_my_pid && (config_my_pid == pid))
 		return 0;
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 02/17] tools/rv: Stop gracefully also on SIGTERM
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 01/17] tools/rv: Do not skip idle in trace Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 03/17] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Currently the userspace RV tool starts a monitor and waits for the user
to press Ctrl-C (SIGINT) to terminate and stop the monitor.
This doesn't account for a scenario where a user starts RV in background
and simply kills it (SIGTERM unless the user specifies differently).
E.g.:
 # rv mon wip &
 # kill %

Would terminate RV without stopping the monitor and next RV executions
won't start correctly.

Register the signal handler used for SIGINT also to SIGTERM.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 tools/verification/rv/src/rv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/verification/rv/src/rv.c b/tools/verification/rv/src/rv.c
index 239de054d1e06..b8fe24a87d97c 100644
--- a/tools/verification/rv/src/rv.c
+++ b/tools/verification/rv/src/rv.c
@@ -191,6 +191,7 @@ int main(int argc, char **argv)
 		 * and exit.
 		 */
 		signal(SIGINT, stop_rv);
+		signal(SIGTERM, stop_rv);
 
 		rv_mon(argc - 1, &argv[1]);
 	}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 03/17] rv: Add da_handle_start_run_event_ to per-task monitors
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 01/17] tools/rv: Do not skip idle in trace Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 02/17] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 04/17] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The RV da_monitor API allows to start monitors in two ways:
da_handle_start_event_NAME and da_handle_start_run_event_NAME.
The former is used when the event is followed by the initial state of
the module, so we ignore the event but we know the monitor is in the
initial state and can start monitoring, the latter can be used if the
event can only occur in the initial state, so we do handle the event as
if the monitor was in the initial state.
This latter API is defined for implicit monitors but not per-task ones.

Define da_handle_start_run_event_NAME macro also for per-task monitors.

Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/rv/da_monitor.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 15f9ed4e4bb69..ed3c34fe18d61 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -487,6 +487,30 @@ da_handle_start_event_##name(struct task_struct *tsk, enum events_##name event)
 	__da_handle_event_##name(da_mon, tsk, event);						\
 												\
 	return 1;										\
+}												\
+												\
+/*												\
+ * da_handle_start_run_event_##name - start monitoring and handle event				\
+ *												\
+ * This function is used to notify the monitor that the system is in the			\
+ * initial state, so the monitor can start monitoring and handling event.			\
+ */												\
+static inline bool										\
+da_handle_start_run_event_##name(struct task_struct *tsk, enum events_##name event)		\
+{												\
+	struct da_monitor *da_mon;								\
+												\
+	if (!da_monitor_enabled_##name())							\
+		return 0;									\
+												\
+	da_mon = da_get_monitor_##name(tsk);							\
+												\
+	if (unlikely(!da_monitoring_##name(da_mon)))						\
+		da_monitor_start_##name(da_mon);						\
+												\
+	__da_handle_event_##name(da_mon, tsk, event);						\
+												\
+	return 1;										\
 }
 
 /*
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 04/17] rv: Remove trailing whitespace from tracepoint string
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (2 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 03/17] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 05/17] rv: Return init error when registering monitors Gabriele Monaco
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

RV event tracepoints print a line with the format:
    "event_xyz: S0 x event -> S1 "
    "event_xyz: S1 x event -> S0 (final)"

While printing an event leading to a non-final state, the line
has a trailing white space (visible above before the closing ").

Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".

Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/rv_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index b6f3104984664..17ba07329b670 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -29,11 +29,11 @@ DECLARE_EVENT_CLASS(event_da_monitor,
 		__entry->final_state		= final_state;
 	),
 
-	TP_printk("%s x %s -> %s %s",
+	TP_printk("%s x %s -> %s%s",
 		__entry->state,
 		__entry->event,
 		__entry->next_state,
-		__entry->final_state ? "(final)" : "")
+		__entry->final_state ? " (final)" : "")
 );
 
 DECLARE_EVENT_CLASS(error_da_monitor,
@@ -90,12 +90,12 @@ DECLARE_EVENT_CLASS(event_da_monitor_id,
 		__entry->final_state		= final_state;
 	),
 
-	TP_printk("%d: %s x %s -> %s %s",
+	TP_printk("%d: %s x %s -> %s%s",
 		__entry->id,
 		__entry->state,
 		__entry->event,
 		__entry->next_state,
-		__entry->final_state ? "(final)" : "")
+		__entry->final_state ? " (final)" : "")
 );
 
 DECLARE_EVENT_CLASS(error_da_monitor_id,
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 05/17] rv: Return init error when registering monitors
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (3 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 04/17] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 06/17] rv: Use strings in da monitors tracepoints Gabriele Monaco
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Monitors generated with dot2k have their registration function (the one
called during monitor initialisation) return always 0, even if the
registration failed on RV side.
This can hide potential errors.

Return the value returned by the RV register function.

Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/monitors/sched/sched.c | 3 +--
 kernel/trace/rv/monitors/sco/sco.c     | 3 +--
 kernel/trace/rv/monitors/scpd/scpd.c   | 3 +--
 kernel/trace/rv/monitors/sncid/sncid.c | 3 +--
 kernel/trace/rv/monitors/snep/snep.c   | 3 +--
 kernel/trace/rv/monitors/snroc/snroc.c | 3 +--
 kernel/trace/rv/monitors/tss/tss.c     | 3 +--
 kernel/trace/rv/monitors/wip/wip.c     | 3 +--
 kernel/trace/rv/monitors/wwnr/wwnr.c   | 3 +--
 9 files changed, 9 insertions(+), 18 deletions(-)

diff --git a/kernel/trace/rv/monitors/sched/sched.c b/kernel/trace/rv/monitors/sched/sched.c
index 905e03c3c934d..d04db4b543f96 100644
--- a/kernel/trace/rv/monitors/sched/sched.c
+++ b/kernel/trace/rv/monitors/sched/sched.c
@@ -21,8 +21,7 @@ struct rv_monitor rv_sched = {
 
 static int __init register_sched(void)
 {
-	rv_register_monitor(&rv_sched, NULL);
-	return 0;
+	return rv_register_monitor(&rv_sched, NULL);
 }
 
 static void __exit unregister_sched(void)
diff --git a/kernel/trace/rv/monitors/sco/sco.c b/kernel/trace/rv/monitors/sco/sco.c
index 4cff59220bfc7..66f4639d46ac4 100644
--- a/kernel/trace/rv/monitors/sco/sco.c
+++ b/kernel/trace/rv/monitors/sco/sco.c
@@ -71,8 +71,7 @@ static struct rv_monitor rv_sco = {
 
 static int __init register_sco(void)
 {
-	rv_register_monitor(&rv_sco, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_sco, &rv_sched);
 }
 
 static void __exit unregister_sco(void)
diff --git a/kernel/trace/rv/monitors/scpd/scpd.c b/kernel/trace/rv/monitors/scpd/scpd.c
index cbdd6a5f8d7fd..299703cd72b06 100644
--- a/kernel/trace/rv/monitors/scpd/scpd.c
+++ b/kernel/trace/rv/monitors/scpd/scpd.c
@@ -79,8 +79,7 @@ static struct rv_monitor rv_scpd = {
 
 static int __init register_scpd(void)
 {
-	rv_register_monitor(&rv_scpd, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_scpd, &rv_sched);
 }
 
 static void __exit unregister_scpd(void)
diff --git a/kernel/trace/rv/monitors/sncid/sncid.c b/kernel/trace/rv/monitors/sncid/sncid.c
index f5037cd6214c2..3e1ee715a0fbf 100644
--- a/kernel/trace/rv/monitors/sncid/sncid.c
+++ b/kernel/trace/rv/monitors/sncid/sncid.c
@@ -79,8 +79,7 @@ static struct rv_monitor rv_sncid = {
 
 static int __init register_sncid(void)
 {
-	rv_register_monitor(&rv_sncid, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_sncid, &rv_sched);
 }
 
 static void __exit unregister_sncid(void)
diff --git a/kernel/trace/rv/monitors/snep/snep.c b/kernel/trace/rv/monitors/snep/snep.c
index 0076ba6d7ea44..2adc3108d60c9 100644
--- a/kernel/trace/rv/monitors/snep/snep.c
+++ b/kernel/trace/rv/monitors/snep/snep.c
@@ -79,8 +79,7 @@ static struct rv_monitor rv_snep = {
 
 static int __init register_snep(void)
 {
-	rv_register_monitor(&rv_snep, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_snep, &rv_sched);
 }
 
 static void __exit unregister_snep(void)
diff --git a/kernel/trace/rv/monitors/snroc/snroc.c b/kernel/trace/rv/monitors/snroc/snroc.c
index bb1f60d552960..540e686e699f4 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.c
+++ b/kernel/trace/rv/monitors/snroc/snroc.c
@@ -68,8 +68,7 @@ static struct rv_monitor rv_snroc = {
 
 static int __init register_snroc(void)
 {
-	rv_register_monitor(&rv_snroc, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_snroc, &rv_sched);
 }
 
 static void __exit unregister_snroc(void)
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
index 542787e6524fc..0452fcd9edcfe 100644
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ b/kernel/trace/rv/monitors/tss/tss.c
@@ -74,8 +74,7 @@ static struct rv_monitor rv_tss = {
 
 static int __init register_tss(void)
 {
-	rv_register_monitor(&rv_tss, &rv_sched);
-	return 0;
+	return rv_register_monitor(&rv_tss, &rv_sched);
 }
 
 static void __exit unregister_tss(void)
diff --git a/kernel/trace/rv/monitors/wip/wip.c b/kernel/trace/rv/monitors/wip/wip.c
index ed758fec8608f..4b4e99615a11f 100644
--- a/kernel/trace/rv/monitors/wip/wip.c
+++ b/kernel/trace/rv/monitors/wip/wip.c
@@ -71,8 +71,7 @@ static struct rv_monitor rv_wip = {
 
 static int __init register_wip(void)
 {
-	rv_register_monitor(&rv_wip, NULL);
-	return 0;
+	return rv_register_monitor(&rv_wip, NULL);
 }
 
 static void __exit unregister_wip(void)
diff --git a/kernel/trace/rv/monitors/wwnr/wwnr.c b/kernel/trace/rv/monitors/wwnr/wwnr.c
index 172f31c4b0f34..4145bea2729e1 100644
--- a/kernel/trace/rv/monitors/wwnr/wwnr.c
+++ b/kernel/trace/rv/monitors/wwnr/wwnr.c
@@ -70,8 +70,7 @@ static struct rv_monitor rv_wwnr = {
 
 static int __init register_wwnr(void)
 {
-	rv_register_monitor(&rv_wwnr, NULL);
-	return 0;
+	return rv_register_monitor(&rv_wwnr, NULL);
 }
 
 static void __exit unregister_wwnr(void)
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 06/17] rv: Use strings in da monitors tracepoints
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (4 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 05/17] rv: Return init error when registering monitors Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15 14:11   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 07/17] rv: Adjust monitor dependencies Gabriele Monaco
                   ` (11 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Using DA monitors tracepoints with KASAN enabled triggers the following
warning:

 BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
 Read of size 32 at addr ffffffffaada8980 by task ...
 Call Trace:
  <TASK>
 [...]
  do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
  ? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
  ? trace_event_sncid+0x83/0x200
  trace_event_sncid+0x163/0x200
 [...]
 The buggy address belongs to the variable:
  automaton_snep+0x4e0/0x5e0

This is caused by the tracepoints reading 32 bits __array instead of
__string from the automata definition. Such strings are literals and
reading 32 bytes ends up in out of bound memory accesses (e.g. the next
automaton's data in this case).
The error is harmless as, while printing the string, we stop at the null
terminator, but it should still be fixed.

Use the __string facilities while defining the tracepoints to avoid
reading out of bound memory.

Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/rv_trace.h | 76 +++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 17ba07329b670..d38e0d3abdfde 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -16,23 +16,23 @@ DECLARE_EVENT_CLASS(event_da_monitor,
 	TP_ARGS(state, event, next_state, final_state),
 
 	TP_STRUCT__entry(
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
-		__array(	char,	next_state,	MAX_DA_NAME_LEN	)
-		__field(	bool,	final_state			)
+		__string(	state,		state		)
+		__string(	event,		event		)
+		__string(	next_state,	next_state	)
+		__field(	bool,		final_state	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		memcpy(__entry->next_state,	next_state,	MAX_DA_NAME_LEN);
-		__entry->final_state		= final_state;
+		__assign_str(state);
+		__assign_str(event);
+		__assign_str(next_state);
+		__entry->final_state = final_state;
 	),
 
 	TP_printk("%s x %s -> %s%s",
-		__entry->state,
-		__entry->event,
-		__entry->next_state,
+		__get_str(state),
+		__get_str(event),
+		__get_str(next_state),
 		__entry->final_state ? " (final)" : "")
 );
 
@@ -43,18 +43,18 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 	TP_ARGS(state, event),
 
 	TP_STRUCT__entry(
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
+		__string(	state,	state	)
+		__string(	event,	event	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
+		__assign_str(state);
+		__assign_str(event);
 	),
 
 	TP_printk("event %s not expected in the state %s",
-		__entry->event,
-		__entry->state)
+		__get_str(event),
+		__get_str(state))
 );
 
 #include <monitors/wip/wip_trace.h>
@@ -75,26 +75,26 @@ DECLARE_EVENT_CLASS(event_da_monitor_id,
 	TP_ARGS(id, state, event, next_state, final_state),
 
 	TP_STRUCT__entry(
-		__field(	int,	id				)
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
-		__array(	char,	next_state,	MAX_DA_NAME_LEN	)
-		__field(	bool,	final_state			)
+		__field(	int,		id		)
+		__string(	state,		state		)
+		__string(	event,		event		)
+		__string(	next_state,	next_state	)
+		__field(	bool,		final_state	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		memcpy(__entry->next_state,	next_state,	MAX_DA_NAME_LEN);
-		__entry->id			= id;
-		__entry->final_state		= final_state;
+		__assign_str(state);
+		__assign_str(event);
+		__assign_str(next_state);
+		__entry->id		= id;
+		__entry->final_state	= final_state;
 	),
 
 	TP_printk("%d: %s x %s -> %s%s",
 		__entry->id,
-		__entry->state,
-		__entry->event,
-		__entry->next_state,
+		__get_str(state),
+		__get_str(event),
+		__get_str(next_state),
 		__entry->final_state ? " (final)" : "")
 );
 
@@ -105,21 +105,21 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
 	TP_ARGS(id, state, event),
 
 	TP_STRUCT__entry(
-		__field(	int,	id				)
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
+		__field(	int,	id	)
+		__string(	state,	state	)
+		__string(	event,	event	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		__entry->id			= id;
+		__assign_str(state);
+		__assign_str(event);
+		__entry->id	= id;
 	),
 
 	TP_printk("%d: event %s not expected in the state %s",
 		__entry->id,
-		__entry->event,
-		__entry->state)
+		__get_str(event),
+		__get_str(state))
 );
 
 #include <monitors/wwnr/wwnr_trace.h>
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 07/17] rv: Adjust monitor dependencies
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (5 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 06/17] rv: Use strings in da monitors tracepoints Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-16  8:19   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors Gabriele Monaco
                   ` (10 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu, Gabriele Monaco,
	linux-trace-kernel
  Cc: Ingo Molnar, Peter Zijlstra, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

RV monitors relying on the preemptirqs tracepoints are set as dependent
on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
enable the tracepoints but are not the minimal configurations enabling
them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
manually).

Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
monitors.

Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/monitors/scpd/Kconfig  | 2 +-
 kernel/trace/rv/monitors/sncid/Kconfig | 2 +-
 kernel/trace/rv/monitors/snep/Kconfig  | 2 +-
 kernel/trace/rv/monitors/wip/Kconfig   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/rv/monitors/scpd/Kconfig b/kernel/trace/rv/monitors/scpd/Kconfig
index b9114fbf680f9..682d0416188b3 100644
--- a/kernel/trace/rv/monitors/scpd/Kconfig
+++ b/kernel/trace/rv/monitors/scpd/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SCPD
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/sncid/Kconfig b/kernel/trace/rv/monitors/sncid/Kconfig
index 76bcfef4fd103..3a5639feaaaf6 100644
--- a/kernel/trace/rv/monitors/sncid/Kconfig
+++ b/kernel/trace/rv/monitors/sncid/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SNCID
 	depends on RV
-	depends on IRQSOFF_TRACER
+	depends on TRACE_IRQFLAGS
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/snep/Kconfig b/kernel/trace/rv/monitors/snep/Kconfig
index 77527f9712325..7dd54f434ff75 100644
--- a/kernel/trace/rv/monitors/snep/Kconfig
+++ b/kernel/trace/rv/monitors/snep/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SNEP
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/wip/Kconfig b/kernel/trace/rv/monitors/wip/Kconfig
index e464b9294865b..87a26195792b4 100644
--- a/kernel/trace/rv/monitors/wip/Kconfig
+++ b/kernel/trace/rv/monitors/wip/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_WIP
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	select DA_MON_EVENTS_IMPLICIT
 	bool "wip monitor"
 	help
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (6 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 07/17] rv: Adjust monitor dependencies Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15 14:48   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit Gabriele Monaco
                   ` (9 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The current behaviour of rvgen when running with the -a option is to
append the necessary lines at the end of the configuration for Kconfig,
Makefile and tracepoints.
This is not always the desired behaviour in case of nested monitors:
while tracepoints are not affected by nesting and the Makefile's only
requirement is that the parent monitor is built before its children, in
the Kconfig it is better to have children defined right after their
parent, otherwise the result has wrong indentation:

[*]   foo_parent monitor
[*]     foo_child1 monitor
[*]     foo_child2 monitor
[*]   bar_parent monitor
[*]     bar_child1 monitor
[*]     bar_child2 monitor
[*]   foo_child3 monitor
[*]   foo_child4 monitor

Adapt rvgen to look for a different marker for nested monitors in the
Kconfig file and append the line right after the last sibling, instead
of the last monitor.
Also add the marker when creating a new parent monitor.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/Kconfig                     |  5 +++++
 tools/verification/rvgen/rvgen/container.py | 13 +++++++++++++
 tools/verification/rvgen/rvgen/generator.py | 13 ++++++++-----
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index c11bf7e61ebf0..26017378f79b8 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -43,6 +43,7 @@ config RV_PER_TASK_MONITORS
 
 source "kernel/trace/rv/monitors/wip/Kconfig"
 source "kernel/trace/rv/monitors/wwnr/Kconfig"
+
 source "kernel/trace/rv/monitors/sched/Kconfig"
 source "kernel/trace/rv/monitors/tss/Kconfig"
 source "kernel/trace/rv/monitors/sco/Kconfig"
@@ -50,9 +51,13 @@ source "kernel/trace/rv/monitors/snroc/Kconfig"
 source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sncid/Kconfig"
+# Add new sched monitors here
+
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
 source "kernel/trace/rv/monitors/pagefault/Kconfig"
 source "kernel/trace/rv/monitors/sleep/Kconfig"
+# Add new rtapp monitors here
+
 # Add new monitors here
 
 config RV_REACTORS
diff --git a/tools/verification/rvgen/rvgen/container.py b/tools/verification/rvgen/rvgen/container.py
index 47d8ab2ad3ec4..fee493f89fde6 100644
--- a/tools/verification/rvgen/rvgen/container.py
+++ b/tools/verification/rvgen/rvgen/container.py
@@ -20,3 +20,16 @@ class Container(generator.RVGenerator):
         main_h = self.main_h
         main_h = main_h.replace("%%MODEL_NAME%%", self.name)
         return main_h
+
+    def fill_kconfig_tooltip(self):
+        """Override to produce a marker for this container in the Kconfig"""
+        container_marker = f"# Add new {self.name} monitors here\n"
+        if self.auto_patch:
+            self._patch_file("Kconfig",
+                            "# Add new monitors here", "")
+        result = super().fill_kconfig_tooltip()
+        if self.auto_patch:
+            self._patch_file("Kconfig",
+                            "# Add new monitors here", container_marker)
+            return result
+        return result + container_marker
diff --git a/tools/verification/rvgen/rvgen/generator.py b/tools/verification/rvgen/rvgen/generator.py
index 19d0078a38032..ac97c4505ff00 100644
--- a/tools/verification/rvgen/rvgen/generator.py
+++ b/tools/verification/rvgen/rvgen/generator.py
@@ -137,7 +137,8 @@ class RVGenerator:
         kconfig = kconfig.replace("%%MONITOR_DEPS%%", monitor_deps)
         return kconfig
 
-    def __patch_file(self, file, marker, line):
+    def _patch_file(self, file, marker, line):
+        assert(self.auto_patch)
         file_to_patch = os.path.join(self.rv_dir, file)
         content = self._read_file(file_to_patch)
         content = content.replace(marker, line + "\n" + marker)
@@ -146,7 +147,7 @@ class RVGenerator:
     def fill_tracepoint_tooltip(self):
         monitor_class_type = self.fill_monitor_class_type()
         if self.auto_patch:
-            self.__patch_file("rv_trace.h",
+            self._patch_file("rv_trace.h",
                             "// Add new monitors based on CONFIG_%s here" % monitor_class_type,
                             "#include <monitors/%s/%s_trace.h>" % (self.name, self.name))
             return "  - Patching %s/rv_trace.h, double check the result" % self.rv_dir
@@ -158,8 +159,10 @@ Add this line where other tracepoints are included and %s is defined:
 
     def fill_kconfig_tooltip(self):
         if self.auto_patch:
-            self.__patch_file("Kconfig",
-                            "# Add new monitors here",
+            # monitors with a container should stay together in the Kconfig
+            self._patch_file("Kconfig",
+                            "# Add new %smonitors here" %
+                              (self.parent + " " if self.parent else ""),
                             "source \"kernel/trace/rv/monitors/%s/Kconfig\"" % (self.name))
             return "  - Patching %s/Kconfig, double check the result" % self.rv_dir
 
@@ -172,7 +175,7 @@ source \"kernel/trace/rv/monitors/%s/Kconfig\"
         name = self.name
         name_up = name.upper()
         if self.auto_patch:
-            self.__patch_file("Makefile",
+            self._patch_file("Makefile",
                             "# Add new monitors here",
                             "obj-$(CONFIG_RV_MON_%s) += monitors/%s/%s.o" % (name_up, name, name))
             return "  - Patching %s/Makefile, double check the result" % self.rv_dir
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (7 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15 15:01   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 10/17] rv: " Gabriele Monaco
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Nam Cao, Gabriele Monaco, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

From: Nam Cao <namcao@linutronix.de>

The dot2c.py script generates all states in a single line. This breaks the
100 column limit when the state machines are non-trivial.

Change dot2c.py to generate the states in separate lines in case the
generated line is going to be too long.

Co-authored-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 tools/verification/rvgen/rvgen/dot2c.py | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/verification/rvgen/rvgen/dot2c.py b/tools/verification/rvgen/rvgen/dot2c.py
index 6009caf568d92..5216d98ae21c8 100644
--- a/tools/verification/rvgen/rvgen/dot2c.py
+++ b/tools/verification/rvgen/rvgen/dot2c.py
@@ -152,28 +152,29 @@ class Dot2c(Automata):
         max_state_name = max(self.states, key = len).__len__()
         return max(max_state_name, self.invalid_state_str.__len__())
 
-    def __get_state_string_length(self):
-        maxlen = self.__get_max_strlen_of_states() + self.enum_suffix.__len__()
-        return "%" + str(maxlen) + "s"
-
     def get_aut_init_function(self):
         nr_states = self.states.__len__()
         nr_events = self.events.__len__()
         buff = []
 
-        strformat = self.__get_state_string_length()
-
+        maxlen = self.__get_max_strlen_of_states() + len(self.enum_suffix)
+        # account for tabs and spaces/punctuation for each event
+        linetoolong = 16 + (maxlen + 3) * nr_events >= self.line_length
         for x in range(nr_states):
-            line = "\t\t{ "
+            line = "\t\t{\n" if linetoolong else "\t\t{ "
             for y in range(nr_events):
                 next_state = self.function[x][y]
                 if next_state != self.invalid_state_str:
                     next_state = self.function[x][y] + self.enum_suffix
 
+                if linetoolong:
+                    line += "\t\t\t%s" % next_state
+                else:
+                    line += "%*s" % (maxlen, next_state)
                 if y != nr_events-1:
-                    line = line + strformat % next_state + ", "
+                    line += ",\n" if linetoolong else ", "
                 else:
-                    line = line + strformat % next_state + " },"
+                    line += "\n\t\t}," if linetoolong else " },"
             buff.append(line)
 
         return self.__buff_to_string(buff)
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 10/17] rv: Fix generated files going over 100 column limit
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (8 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15 15:08   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 11/17] rv: Retry when da monitor detects race conditions Gabriele Monaco
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The dot2c.py script generates all states in a single line. This breaks the
100 column limit when the state machines are non-trivial.
Recent changes allow it to print states over multiple lines if the
resulting line would have been too long.

Adapt existing monitors with line length over the limit.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/monitors/sco/sco.h     | 12 ++++++++++--
 kernel/trace/rv/monitors/snep/snep.h   | 14 ++++++++++++--
 kernel/trace/rv/monitors/snroc/snroc.h | 12 ++++++++++--
 3 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/rv/monitors/sco/sco.h b/kernel/trace/rv/monitors/sco/sco.h
index 7a4c1f2d5ca1c..83ca9a03331af 100644
--- a/kernel/trace/rv/monitors/sco/sco.h
+++ b/kernel/trace/rv/monitors/sco/sco.h
@@ -39,8 +39,16 @@ static const struct automaton_sco automaton_sco = {
 		"schedule_exit"
 	},
 	.function = {
-		{     thread_context_sco, scheduling_context_sco,          INVALID_STATE },
-		{          INVALID_STATE,          INVALID_STATE,     thread_context_sco },
+		{
+			thread_context_sco,
+			scheduling_context_sco,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			INVALID_STATE,
+			thread_context_sco
+		},
 	},
 	.initial_state = thread_context_sco,
 	.final_states = { 1, 0 },
diff --git a/kernel/trace/rv/monitors/snep/snep.h b/kernel/trace/rv/monitors/snep/snep.h
index 6d16b9ad931e1..4cd9abb77b7b2 100644
--- a/kernel/trace/rv/monitors/snep/snep.h
+++ b/kernel/trace/rv/monitors/snep/snep.h
@@ -41,8 +41,18 @@ static const struct automaton_snep automaton_snep = {
 		"schedule_exit"
 	},
 	.function = {
-		{ non_scheduling_context_snep, non_scheduling_context_snep, scheduling_contex_snep,               INVALID_STATE },
-		{               INVALID_STATE,               INVALID_STATE,          INVALID_STATE, non_scheduling_context_snep },
+		{
+			non_scheduling_context_snep,
+			non_scheduling_context_snep,
+			scheduling_contex_snep,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			non_scheduling_context_snep
+		},
 	},
 	.initial_state = non_scheduling_context_snep,
 	.final_states = { 1, 0 },
diff --git a/kernel/trace/rv/monitors/snroc/snroc.h b/kernel/trace/rv/monitors/snroc/snroc.h
index c3650a2b1b107..be46f7b9ebb87 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.h
+++ b/kernel/trace/rv/monitors/snroc/snroc.h
@@ -39,8 +39,16 @@ static const struct automaton_snroc automaton_snroc = {
 		"sched_switch_out"
 	},
 	.function = {
-		{      INVALID_STATE,  own_context_snroc,       INVALID_STATE },
-		{  own_context_snroc,      INVALID_STATE, other_context_snroc },
+		{
+			INVALID_STATE,
+			own_context_snroc,
+			INVALID_STATE
+		},
+		{
+			own_context_snroc,
+			INVALID_STATE,
+			other_context_snroc
+		},
 	},
 	.initial_state = other_context_snroc,
 	.final_states = { 1, 0 },
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (9 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 10/17] rv: " Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15 15:23   ` Nam Cao
  2025-07-15  7:14 ` [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
                   ` (6 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

DA monitor can be accessed from multiple cores simultaneously, this is
likely, for instance when dealing with per-task monitors reacting on
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:

  [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
  [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
  [63] error_srs: 27: event sched_switch_suspend not expected in the state running

In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task get suspended.

Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react as if it was an error
with invalid current state (we cannot determine it).

Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.

Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/rv.h      |  3 +-
 include/rv/da_monitor.h | 91 ++++++++++++++++++++++-------------------
 2 files changed, 51 insertions(+), 43 deletions(-)

diff --git a/include/linux/rv.h b/include/linux/rv.h
index 97baf58d88b28..0250a04f4524c 100644
--- a/include/linux/rv.h
+++ b/include/linux/rv.h
@@ -7,7 +7,8 @@
 #ifndef _LINUX_RV_H
 #define _LINUX_RV_H
 
-#define MAX_DA_NAME_LEN	32
+#define MAX_DA_NAME_LEN			32
+#define MAX_DA_RETRY_RACING_EVENTS	3
 
 #ifdef CONFIG_RV
 #include <linux/bitops.h>
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index ed3c34fe18d61..bd3096a3181f8 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -54,23 +54,6 @@ static inline void da_monitor_reset_##name(struct da_monitor *da_mon)				\
 	da_mon->curr_state = model_get_initial_state_##name();					\
 }												\
 												\
-/*												\
- * da_monitor_curr_state_##name - return the current state					\
- */												\
-static inline type da_monitor_curr_state_##name(struct da_monitor *da_mon)			\
-{												\
-	return da_mon->curr_state;								\
-}												\
-												\
-/*												\
- * da_monitor_set_state_##name - set the new current state					\
- */												\
-static inline void										\
-da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name state)		\
-{												\
-	da_mon->curr_state = state;								\
-}												\
-												\
 /*												\
  * da_monitor_start_##name - start monitoring							\
  *												\
@@ -127,57 +110,72 @@ static inline bool da_monitor_handling_event_##name(struct da_monitor *da_mon)
  * Event handler for implicit monitors. Implicit monitor is the one which the
  * handler does not need to specify which da_monitor to manipulate. Examples
  * of implicit monitor are the per_cpu or the global ones.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_IMPLICIT(name, type)					\
 												\
 static inline bool										\
 da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE)						\
+			goto out_react;								\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
+			goto out_success;							\
 	}											\
+	/* Special invalid transition if we run out of retries. */				\
+	curr_state = INVALID_STATE;								\
 												\
+out_react:											\
 	cond_react_##name(curr_state, event);							\
 												\
 	trace_error_##name(model_get_state_name_##name(curr_state),				\
 			   model_get_event_name_##name(event));					\
 												\
 	return false;										\
+												\
+out_success:											\
+	trace_event_##name(model_get_state_name_##name(curr_state),				\
+			   model_get_event_name_##name(event),					\
+			   model_get_state_name_##name(next_state),				\
+			   model_is_final_state_##name(next_state));				\
+												\
+	return true;										\
 }												\
 
 /*
  * Event handler for per_task monitors.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_PER_TASK(name, type)					\
 												\
 static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct *tsk,		\
 				   enum events_##name event)					\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(tsk->pid,							\
-				   model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE)						\
+			goto out_react;								\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
+			goto out_success;							\
 	}											\
+	/* Special invalid transition if we run out of retries. */				\
+	curr_state = INVALID_STATE;								\
 												\
+out_react:											\
 	cond_react_##name(curr_state, event);							\
 												\
 	trace_error_##name(tsk->pid,								\
@@ -185,6 +183,15 @@ static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct
 			   model_get_event_name_##name(event));					\
 												\
 	return false;										\
+												\
+out_success:											\
+	trace_event_##name(tsk->pid,								\
+			   model_get_state_name_##name(curr_state),				\
+			   model_get_event_name_##name(event),					\
+			   model_get_state_name_##name(next_state),				\
+			   model_is_final_state_##name(next_state));				\
+												\
+	return true;										\
 }
 
 /*
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (10 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 11/17] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-16 12:38   ` Peter Zijlstra
  2025-07-15  7:14 ` [PATCH v3 13/17] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Masami Hiramatsu, linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

Add the following tracepoints:
* sched_set_need_resched(tsk, cpu, tif)
    Called when a task is set the need resched [lazy] flag
* sched_switch_vain(preempt, tsk, tsk_state)
    Called when a task is selected again during __schedule
    i.e. prev == next == tsk : no real context switch

Add new parameter to sched_set_state to identify whether the state
change was due to an explicit call or a signal pending while scheduling.
We now also trace from try_to_block_task in case a signal was pending
and the task is set to runnable.

Also adapt all monitors using sched_set_state to avoid breaking build.

These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/sched.h                  |  7 ++++++-
 include/trace/events/sched.h           | 17 +++++++++++++++--
 kernel/sched/core.c                    | 10 +++++++++-
 kernel/trace/rv/monitors/sco/sco.c     |  3 ++-
 kernel/trace/rv/monitors/sleep/sleep.c |  3 ++-
 kernel/trace/rv/monitors/snroc/snroc.c |  3 ++-
 6 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7bce4c7ae3b4f..19ab4597c97d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -339,9 +339,11 @@ extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
-/* wrapper function to trace from this header file */
+/* wrapper functions to trace from this header file */
 DECLARE_TRACEPOINT(sched_set_state_tp);
 extern void __trace_set_current_state(int state_value);
+DECLARE_TRACEPOINT(sched_set_need_resched_tp);
+extern void __trace_set_need_resched(struct task_struct *curr, int tif);
 
 /**
  * struct prev_cputime - snapshot of system and user cputime
@@ -2059,6 +2061,9 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
 
 static inline void set_tsk_need_resched(struct task_struct *tsk)
 {
+	if (tracepoint_enabled(sched_set_need_resched_tp) &&
+	    !test_tsk_thread_flag(tsk, TIF_NEED_RESCHED))
+		__trace_set_need_resched(tsk, TIF_NEED_RESCHED);
 	set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
 }
 
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 4e6b2910cec3f..c9dec6d38ad2d 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -889,11 +889,24 @@ DECLARE_TRACE(sched_exit,
 	TP_PROTO(bool is_switch, unsigned long ip),
 	TP_ARGS(is_switch, ip));
 
+/*
+ * Tracepoint called when setting the state of a task;
+ * this tracepoint is guaranteed to be called from the waking context of the
+ * task setting the state.
+ */
 DECLARE_TRACE_CONDITION(sched_set_state,
-	TP_PROTO(struct task_struct *tsk, int state),
-	TP_ARGS(tsk, state),
+	TP_PROTO(struct task_struct *tsk, int state, bool from_signal),
+	TP_ARGS(tsk, state, from_signal),
 	TP_CONDITION(!!(tsk->__state) != !!state));
 
+DECLARE_TRACE(sched_set_need_resched,
+	TP_PROTO(struct task_struct *tsk, int cpu, int tif),
+	TP_ARGS(tsk, cpu, tif));
+
+DECLARE_TRACE(sched_switch_vain,
+	TP_PROTO(bool preempt, struct task_struct *tsk, unsigned int prev_state),
+	TP_ARGS(preempt, tsk, prev_state));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df17..6cb70e6f7fa17 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -495,7 +495,7 @@ EXPORT_TRACEPOINT_SYMBOL(sched_set_state_tp);
 /* Call via the helper macro trace_set_current_state. */
 void __trace_set_current_state(int state_value)
 {
-	trace_sched_set_state_tp(current, state_value);
+	trace_sched_set_state_tp(current, state_value, false);
 }
 EXPORT_SYMBOL(__trace_set_current_state);
 
@@ -1110,6 +1110,7 @@ static void __resched_curr(struct rq *rq, int tif)
 
 	cpu = cpu_of(rq);
 
+	trace_sched_set_need_resched_tp(curr, cpu, tif);
 	if (cpu == smp_processor_id()) {
 		set_ti_thread_flag(cti, tif);
 		if (tif == TIF_NEED_RESCHED)
@@ -1125,6 +1126,11 @@ static void __resched_curr(struct rq *rq, int tif)
 	}
 }
 
+void __trace_set_need_resched(struct task_struct *curr, int tif)
+{
+	trace_sched_set_need_resched_tp(curr, smp_processor_id(), tif);
+}
+
 void resched_curr(struct rq *rq)
 {
 	__resched_curr(rq, TIF_NEED_RESCHED);
@@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
 	int flags = DEQUEUE_NOCLOCK;
 
 	if (signal_pending_state(task_state, p)) {
+		trace_sched_set_state_tp(p, TASK_RUNNING, true);
 		WRITE_ONCE(p->__state, TASK_RUNNING);
 		*task_state_p = TASK_RUNNING;
 		return false;
@@ -6786,6 +6793,7 @@ static void __sched notrace __schedule(int sched_mode)
 		rq = context_switch(rq, prev, next, &rf);
 	} else {
 		rq_unpin_lock(rq, &rf);
+		trace_sched_switch_vain_tp(preempt, prev, prev_state);
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
 	}
diff --git a/kernel/trace/rv/monitors/sco/sco.c b/kernel/trace/rv/monitors/sco/sco.c
index 66f4639d46ac4..c9206aa12c319 100644
--- a/kernel/trace/rv/monitors/sco/sco.c
+++ b/kernel/trace/rv/monitors/sco/sco.c
@@ -19,7 +19,8 @@
 static struct rv_monitor rv_sco;
 DECLARE_DA_MON_PER_CPU(sco, unsigned char);
 
-static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+				   int state, bool from_signal)
 {
 	da_handle_start_event_sco(sched_set_state_sco);
 }
diff --git a/kernel/trace/rv/monitors/sleep/sleep.c b/kernel/trace/rv/monitors/sleep/sleep.c
index eea447b069071..5103a98818c53 100644
--- a/kernel/trace/rv/monitors/sleep/sleep.c
+++ b/kernel/trace/rv/monitors/sleep/sleep.c
@@ -82,7 +82,8 @@ static void ltl_atoms_init(struct task_struct *task, struct ltl_monitor *mon, bo
 
 }
 
-static void handle_sched_set_state(void *data, struct task_struct *task, int state)
+static void handle_sched_set_state(void *data, struct task_struct *task,
+				   int state, bool from_signal)
 {
 	if (state & TASK_INTERRUPTIBLE)
 		ltl_atom_pulse(task, LTL_SLEEP, true);
diff --git a/kernel/trace/rv/monitors/snroc/snroc.c b/kernel/trace/rv/monitors/snroc/snroc.c
index 540e686e699f4..2651f589d1554 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.c
+++ b/kernel/trace/rv/monitors/snroc/snroc.c
@@ -19,7 +19,8 @@
 static struct rv_monitor rv_snroc;
 DECLARE_DA_MON_PER_TASK(snroc, unsigned char);
 
-static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+				   int state, bool from_signal)
 {
 	da_handle_event_snroc(tsk, sched_set_state_snroc);
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 13/17] rv: Adapt the sco monitor to the new set_state
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (11 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 14/17] rv: Extend snroc model Gabriele Monaco
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The sched_set_state tracepoint changed prototype adding a new argument,
this argument can differentiate between an explicit set_state called by
a task and a set state to runnable by the scheduler due to a pending
signal.

Expand the model to handle the new set_state flavour, the monitor was
making sure set state happens only outside of the scheduler, if the
event occurs with the new argument (from_signal) set to true, we instead
expect it to be inside the scheduler.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst | 35 +++++++++++++-----------
 kernel/trace/rv/monitors/sco/sco.c       |  5 +++-
 kernel/trace/rv/monitors/sco/sco.h       |  4 +++
 tools/verification/models/sched/sco.dot  |  1 +
 4 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 24b2c62a3bc26..6f76bba94d9fb 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -64,22 +64,25 @@ Monitor sco
 ~~~~~~~~~~~
 
 The scheduling context operations (sco) monitor ensures changes in a task state
-happen only in thread context::
-
-
-                        |
-                        |
-                        v
-    sched_set_state   +------------------+
-  +------------------ |                  |
-  |                   |  thread_context  |
-  +-----------------> |                  | <+
-                      +------------------+  |
-                        |                   |
-                        | schedule_entry    | schedule_exit
-                        v                   |
-                                            |
-                       scheduling_context  -+
+happen only in thread context, the only exception is a special kind of set
+state that occurs if a task about to sleep has a pending signal. This set state
+is not called by the thread but by the scheduler itself::
+
+                                        |
+                                        |
+                                        v
+    sched_set_state                   +------------------+
+  +---------------------------------- |                  |
+  |                                   |  thread_context  |
+  +---------------------------------> |                  | <+
+                                      +------------------+  |
+                                        |                   |
+                                        | schedule_entry    | schedule_exit
+                                        v                   |
+    sched_set_state_runnable_signal                         |
+  +----------------------------------                       |
+  |                                    scheduling_context   |
+  +--------------------------------->                      -+
 
 Monitor snroc
 ~~~~~~~~~~~~~
diff --git a/kernel/trace/rv/monitors/sco/sco.c b/kernel/trace/rv/monitors/sco/sco.c
index c9206aa12c319..6457ff2469d08 100644
--- a/kernel/trace/rv/monitors/sco/sco.c
+++ b/kernel/trace/rv/monitors/sco/sco.c
@@ -22,7 +22,10 @@ DECLARE_DA_MON_PER_CPU(sco, unsigned char);
 static void handle_sched_set_state(void *data, struct task_struct *tsk,
 				   int state, bool from_signal)
 {
-	da_handle_start_event_sco(sched_set_state_sco);
+	if (from_signal)
+		da_handle_event_sco(sched_set_state_runnable_signal_sco);
+	else
+		da_handle_start_event_sco(sched_set_state_sco);
 }
 
 static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
diff --git a/kernel/trace/rv/monitors/sco/sco.h b/kernel/trace/rv/monitors/sco/sco.h
index 83ca9a03331af..2d05ab882b2a6 100644
--- a/kernel/trace/rv/monitors/sco/sco.h
+++ b/kernel/trace/rv/monitors/sco/sco.h
@@ -15,6 +15,7 @@ enum states_sco {
 
 enum events_sco {
 	sched_set_state_sco = 0,
+	sched_set_state_runnable_signal_sco,
 	schedule_entry_sco,
 	schedule_exit_sco,
 	event_max_sco
@@ -35,17 +36,20 @@ static const struct automaton_sco automaton_sco = {
 	},
 	.event_names = {
 		"sched_set_state",
+		"sched_set_state_runnable_signal",
 		"schedule_entry",
 		"schedule_exit"
 	},
 	.function = {
 		{
 			thread_context_sco,
+			INVALID_STATE,
 			scheduling_context_sco,
 			INVALID_STATE
 		},
 		{
 			INVALID_STATE,
+			scheduling_context_sco,
 			INVALID_STATE,
 			thread_context_sco
 		},
diff --git a/tools/verification/models/sched/sco.dot b/tools/verification/models/sched/sco.dot
index 20b0e3b449a6b..4e44ed58c62a3 100644
--- a/tools/verification/models/sched/sco.dot
+++ b/tools/verification/models/sched/sco.dot
@@ -7,6 +7,7 @@ digraph state_automaton {
 	{node [shape = plaintext] "thread_context"};
 	"__init_thread_context" -> "thread_context";
 	"scheduling_context" [label = "scheduling_context"];
+	"scheduling_context" -> "scheduling_context" [ label = "sched_set_state_runnable_signal" ];
 	"scheduling_context" -> "thread_context" [ label = "schedule_exit" ];
 	"thread_context" [label = "thread_context", color = green3];
 	"thread_context" -> "scheduling_context" [ label = "schedule_entry" ];
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 14/17] rv: Extend snroc model
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (12 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 13/17] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 15/17] rv: Replace tss monitor with more complete sts Gabriele Monaco
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The snroc monitor ensures changes in a task state happens only in the
respective task's context. This is the only monitor enforcing a task
is to be switched in after being switched out, and vice-versa.
Although this is a trivial claim, it is useful to complete other claims
on when scheduling is possible while keeping models simple.

Add the sched_switch_vain event to the model, enforcing that a vain
switch doesn't change the context but can only occur while a task is
running (e.g. a call to schedule that re-selects the current task).

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst  | 32 ++++++++++++-----------
 kernel/trace/rv/monitors/snroc/snroc.c    |  9 +++++++
 kernel/trace/rv/monitors/snroc/snroc.h    |  8 ++++--
 tools/verification/models/sched/snroc.dot |  2 +-
 4 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 6f76bba94d9fb..5cb58ac441d83 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -89,21 +89,23 @@ Monitor snroc
 
 The set non runnable on its own context (snroc) monitor ensures changes in a
 task state happens only in the respective task's context. This is a per-task
-monitor::
-
-                        |
-                        |
-                        v
-                      +------------------+
-                      |  other_context   | <+
-                      +------------------+  |
-                        |                   |
-                        | sched_switch_in   | sched_switch_out
-                        v                   |
-    sched_set_state                         |
-  +------------------                       |
-  |                       own_context       |
-  +----------------->                      -+
+monitor. A task is in its own context after switching in and leaves the context
+when switched out, a vain switch maintains the context::
+
+                          |
+                          |
+                          v
+                        +------------------+
+                        |  other_context   | <+
+                        +------------------+  |
+                          |                   |
+                          | sched_switch_in   | sched_switch_out
+                          v                   |
+    sched_set_state                           |
+    sched_switch_vain                         |
+  +--------------------     own_context       |
+  |                                           |
+  +------------------->                      -+
 
 Monitor scpd
 ~~~~~~~~~~~~
diff --git a/kernel/trace/rv/monitors/snroc/snroc.c b/kernel/trace/rv/monitors/snroc/snroc.c
index 2651f589d1554..11a56b58665e7 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.c
+++ b/kernel/trace/rv/monitors/snroc/snroc.c
@@ -34,6 +34,13 @@ static void handle_sched_switch(void *data, bool preempt,
 	da_handle_event_snroc(next, sched_switch_in_snroc);
 }
 
+static void handle_sched_switch_vain(void *data, bool preempt,
+				     struct task_struct *tsk,
+				     unsigned int tsk_state)
+{
+	da_handle_event_snroc(tsk, sched_switch_vain_snroc);
+}
+
 static int enable_snroc(void)
 {
 	int retval;
@@ -44,6 +51,7 @@ static int enable_snroc(void)
 
 	rv_attach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
 	rv_attach_trace_probe("snroc", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("snroc", sched_switch_vain_tp, handle_sched_switch_vain);
 
 	return 0;
 }
@@ -54,6 +62,7 @@ static void disable_snroc(void)
 
 	rv_detach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
 	rv_detach_trace_probe("snroc", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("snroc", sched_switch_vain_tp, handle_sched_switch_vain);
 
 	da_monitor_destroy_snroc();
 }
diff --git a/kernel/trace/rv/monitors/snroc/snroc.h b/kernel/trace/rv/monitors/snroc/snroc.h
index be46f7b9ebb87..064d98fb9796d 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.h
+++ b/kernel/trace/rv/monitors/snroc/snroc.h
@@ -17,6 +17,7 @@ enum events_snroc {
 	sched_set_state_snroc = 0,
 	sched_switch_in_snroc,
 	sched_switch_out_snroc,
+	sched_switch_vain_snroc,
 	event_max_snroc
 };
 
@@ -36,18 +37,21 @@ static const struct automaton_snroc automaton_snroc = {
 	.event_names = {
 		"sched_set_state",
 		"sched_switch_in",
-		"sched_switch_out"
+		"sched_switch_out",
+		"sched_switch_vain"
 	},
 	.function = {
 		{
 			INVALID_STATE,
 			own_context_snroc,
+			INVALID_STATE,
 			INVALID_STATE
 		},
 		{
 			own_context_snroc,
 			INVALID_STATE,
-			other_context_snroc
+			other_context_snroc,
+			own_context_snroc
 		},
 	},
 	.initial_state = other_context_snroc,
diff --git a/tools/verification/models/sched/snroc.dot b/tools/verification/models/sched/snroc.dot
index 8b71c32d4dca4..5b1a787d0350b 100644
--- a/tools/verification/models/sched/snroc.dot
+++ b/tools/verification/models/sched/snroc.dot
@@ -10,7 +10,7 @@ digraph state_automaton {
 	"other_context" -> "own_context" [ label = "sched_switch_in" ];
 	"own_context" [label = "own_context"];
 	"own_context" -> "other_context" [ label = "sched_switch_out" ];
-	"own_context" -> "own_context" [ label = "sched_set_state" ];
+	"own_context" -> "own_context" [ label = "sched_set_state\nsched_switch_vain" ];
 	{ rank = min ;
 		"__init_other_context";
 		"other_context";
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 15/17] rv: Replace tss monitor with more complete sts
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (13 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 14/17] rv: Extend snroc model Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 16/17] rv: Add nrp and sssw per-task monitors Gabriele Monaco
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The tss monitor currently guarantees context switches can happen only
while scheduling, but it doesn't enforce that each scheduling call
implies a task switch. This can be implied only if we rely on the newly
introduced sched_switch_vain tracepoint, which represents a
scheduler call where the previously running task is the same that is
picked to run next, in fact no context is switched.

Replace the monitor with a more comprehensive specification which
implies tss but also ensures that:
* each scheduler call switches context (or has a vain switch)
* each context switch happens with interrupts disabled

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst      |  68 +++++++---
 kernel/trace/rv/Kconfig                       |   2 +-
 kernel/trace/rv/Makefile                      |   2 +-
 kernel/trace/rv/monitors/sts/Kconfig          |  18 +++
 kernel/trace/rv/monitors/sts/sts.c            | 117 ++++++++++++++++++
 kernel/trace/rv/monitors/sts/sts.h            |  97 +++++++++++++++
 .../{tss/tss_trace.h => sts/sts_trace.h}      |   8 +-
 kernel/trace/rv/monitors/tss/Kconfig          |  14 ---
 kernel/trace/rv/monitors/tss/tss.c            |  90 --------------
 kernel/trace/rv/monitors/tss/tss.h            |  47 -------
 kernel/trace/rv/rv_trace.h                    |   2 +-
 tools/verification/models/sched/sts.dot       |  29 +++++
 tools/verification/models/sched/tss.dot       |  18 ---
 13 files changed, 316 insertions(+), 196 deletions(-)
 create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sts/sts.c
 create mode 100644 kernel/trace/rv/monitors/sts/sts.h
 rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
 delete mode 100644 kernel/trace/rv/monitors/tss/Kconfig
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
 create mode 100644 tools/verification/models/sched/sts.dot
 delete mode 100644 tools/verification/models/sched/tss.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 5cb58ac441d83..e4171aadef1c2 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -40,26 +40,6 @@ defined in by Daniel Bristot in [1].
 
 Currently we included the following:
 
-Monitor tss
-~~~~~~~~~~~
-
-The task switch while scheduling (tss) monitor ensures a task switch happens
-only in scheduling context, that is inside a call to `__schedule`::
-
-                     |
-                     |
-                     v
-                   +-----------------+
-                   |     thread      | <+
-                   +-----------------+  |
-                     |                  |
-                     | schedule_entry   | schedule_exit
-                     v                  |
-    sched_switch                        |
-  +---------------                      |
-  |                       sched         |
-  +-------------->                     -+
-
 Monitor sco
 ~~~~~~~~~~~
 
@@ -170,6 +150,54 @@ schedule is not called with interrupt disabled::
                                        |
                         cant_sched    -+
 
+Monitor sts
+~~~~~~~~~~~
+
+The schedule implies task switch (sts) monitor ensures a task switch happens in
+every scheduling context, that is inside a call to ``__schedule``, as well as no
+task switch can happen without scheduling and before interrupts are disabled.
+This require the special type of switch called vain, which occurs when the next
+task picked for execution is the same as the previously running one, in fact no
+real task switch occurs::
+
+                    |
+                    |
+                    v
+                  #====================#   irq_disable
+                  H                    H   irq_enable
+                  H       thread       H --------------+
+                  H                    H               |
+  +-------------> H                    H <-------------+
+  |               #====================#
+  |                 |
+  |                 | schedule_entry
+  |                 v
+  |               +--------------------+
+  |               |     scheduling     | <+
+  |               +--------------------+  |
+  |                 |                     |
+  |                 | irq_disable         | irq_enable
+  |                 v                     |
+  |               +--------------------+  |
+  |               | disable_to_switch  | -+
+  | schedule_exit +--------------------+
+  |                 |
+  |                 | sched_switch
+  |                 | sched_switch_vain
+  |                 v
+  |               +--------------------+
+  |               |     switching      |
+  |               +--------------------+
+  |                 |
+  |                 | irq_enable
+  |                 v
+  |               +--------------------+   irq_disable
+  |               |                    |   irq_enable
+  |               |   enable_to_exit   | --------------+
+  |               |                    |               |
+  +-------------- |                    | <-------------+
+                  +--------------------+
+
 References
 ----------
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 26017378f79b8..da62d97d329aa 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -45,12 +45,12 @@ source "kernel/trace/rv/monitors/wip/Kconfig"
 source "kernel/trace/rv/monitors/wwnr/Kconfig"
 
 source "kernel/trace/rv/monitors/sched/Kconfig"
-source "kernel/trace/rv/monitors/tss/Kconfig"
 source "kernel/trace/rv/monitors/sco/Kconfig"
 source "kernel/trace/rv/monitors/snroc/Kconfig"
 source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sncid/Kconfig"
+source "kernel/trace/rv/monitors/sts/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 13ec2944c6650..db8653f7cedf3 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -6,7 +6,6 @@ obj-$(CONFIG_RV) += rv.o
 obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
 obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
 obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
-obj-$(CONFIG_RV_MON_TSS) += monitors/tss/tss.o
 obj-$(CONFIG_RV_MON_SCO) += monitors/sco/sco.o
 obj-$(CONFIG_RV_MON_SNROC) += monitors/snroc/snroc.o
 obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
@@ -15,6 +14,7 @@ obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
 obj-$(CONFIG_RV_MON_RTAPP) += monitors/rtapp/rtapp.o
 obj-$(CONFIG_RV_MON_PAGEFAULT) += monitors/pagefault/pagefault.o
 obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
+obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/sts/Kconfig b/kernel/trace/rv/monitors/sts/Kconfig
new file mode 100644
index 0000000000000..3fb1b5387566c
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_STS
+	depends on RV
+	depends on TRACE_IRQFLAGS
+	depends on RV_MON_SCHED
+	default y
+	select DA_MON_EVENTS_IMPLICIT
+	bool "sts monitor"
+	help
+	  Monitor to ensure relationships between scheduler and switches
+	   * each call to the scheduler implies a switch
+	   * switches only happen inside the scheduler
+	   * switches happen with interrupt disabled
+	  This monitor is part of the sched monitors collection.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sts/sts.c b/kernel/trace/rv/monitors/sts/sts.c
new file mode 100644
index 0000000000000..deb18a9d48f3b
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sts"
+
+#include <trace/events/sched.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sts.h"
+
+static struct rv_monitor rv_sts;
+DECLARE_DA_MON_PER_CPU(sts, unsigned char);
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_sts(irq_disable_sts);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_sts(irq_enable_sts);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+				struct task_struct *prev,
+				struct task_struct *next,
+				unsigned int prev_state)
+{
+	da_handle_event_sts(sched_switch_sts);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+				     struct task_struct *tsk,
+				     unsigned int tsk_state)
+{
+	da_handle_event_sts(sched_switch_vain_sts);
+}
+
+static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+{
+	da_handle_event_sts(schedule_entry_sts);
+}
+
+static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+{
+	da_handle_start_event_sts(schedule_exit_sts);
+}
+
+static int enable_sts(void)
+{
+	int retval;
+
+	retval = da_monitor_init_sts();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("sts", irq_disable, handle_irq_disable);
+	rv_attach_trace_probe("sts", irq_enable, handle_irq_enable);
+	rv_attach_trace_probe("sts", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+	rv_attach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+	rv_attach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+	return 0;
+}
+
+static void disable_sts(void)
+{
+	rv_sts.enabled = 0;
+
+	rv_detach_trace_probe("sts", irq_disable, handle_irq_disable);
+	rv_detach_trace_probe("sts", irq_enable, handle_irq_enable);
+	rv_detach_trace_probe("sts", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+	rv_detach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+	rv_detach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+	da_monitor_destroy_sts();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_sts = {
+	.name = "sts",
+	.description = "schedule implies task switch.",
+	.enable = enable_sts,
+	.disable = disable_sts,
+	.reset = da_monitor_reset_all_sts,
+	.enabled = 0,
+};
+
+static int __init register_sts(void)
+{
+	return rv_register_monitor(&rv_sts, &rv_sched);
+}
+
+static void __exit unregister_sts(void)
+{
+	rv_unregister_monitor(&rv_sts);
+}
+
+module_init(register_sts);
+module_exit(unregister_sts);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sts: schedule implies task switch.");
diff --git a/kernel/trace/rv/monitors/sts/sts.h b/kernel/trace/rv/monitors/sts/sts.h
new file mode 100644
index 0000000000000..537ea1b2c023a
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sts automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sts {
+	thread_sts = 0,
+	disable_to_switch_sts,
+	enable_to_exit_sts,
+	scheduling_sts,
+	switching_sts,
+	state_max_sts
+};
+
+#define INVALID_STATE state_max_sts
+
+enum events_sts {
+	irq_disable_sts = 0,
+	irq_enable_sts,
+	sched_switch_sts,
+	sched_switch_vain_sts,
+	schedule_entry_sts,
+	schedule_exit_sts,
+	event_max_sts
+};
+
+struct automaton_sts {
+	char *state_names[state_max_sts];
+	char *event_names[event_max_sts];
+	unsigned char function[state_max_sts][event_max_sts];
+	unsigned char initial_state;
+	bool final_states[state_max_sts];
+};
+
+static const struct automaton_sts automaton_sts = {
+	.state_names = {
+		"thread",
+		"disable_to_switch",
+		"enable_to_exit",
+		"scheduling",
+		"switching"
+	},
+	.event_names = {
+		"irq_disable",
+		"irq_enable",
+		"sched_switch",
+		"sched_switch_vain",
+		"schedule_entry",
+		"schedule_exit"
+	},
+	.function = {
+		{
+			thread_sts,
+			thread_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			scheduling_sts,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			scheduling_sts,
+			switching_sts,
+			switching_sts,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			enable_to_exit_sts,
+			enable_to_exit_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			thread_sts
+		},
+		{
+			disable_to_switch_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			enable_to_exit_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+	},
+	.initial_state = thread_sts,
+	.final_states = { 1, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h b/kernel/trace/rv/monitors/sts/sts_trace.h
similarity index 67%
rename from kernel/trace/rv/monitors/tss/tss_trace.h
rename to kernel/trace/rv/monitors/sts/sts_trace.h
index 4619dbb50cc06..d78beb58d5b3d 100644
--- a/kernel/trace/rv/monitors/tss/tss_trace.h
+++ b/kernel/trace/rv/monitors/sts/sts_trace.h
@@ -4,12 +4,12 @@
  * Snippet to be included in rv_trace.h
  */
 
-#ifdef CONFIG_RV_MON_TSS
-DEFINE_EVENT(event_da_monitor, event_tss,
+#ifdef CONFIG_RV_MON_STS
+DEFINE_EVENT(event_da_monitor, event_sts,
 	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
 	     TP_ARGS(state, event, next_state, final_state));
 
-DEFINE_EVENT(error_da_monitor, error_tss,
+DEFINE_EVENT(error_da_monitor, error_sts,
 	     TP_PROTO(char *state, char *event),
 	     TP_ARGS(state, event));
-#endif /* CONFIG_RV_MON_TSS */
+#endif /* CONFIG_RV_MON_STS */
diff --git a/kernel/trace/rv/monitors/tss/Kconfig b/kernel/trace/rv/monitors/tss/Kconfig
deleted file mode 100644
index 479f86f52e60d..0000000000000
--- a/kernel/trace/rv/monitors/tss/Kconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-#
-config RV_MON_TSS
-	depends on RV
-	depends on RV_MON_SCHED
-	default y
-	select DA_MON_EVENTS_IMPLICIT
-	bool "tss monitor"
-	help
-	  Monitor to ensure sched_switch happens only in scheduling context.
-	  This monitor is part of the sched monitors collection.
-
-	  For further information, see:
-	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
deleted file mode 100644
index 0452fcd9edcfe..0000000000000
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ /dev/null
@@ -1,90 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "tss"
-
-#include <trace/events/sched.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "tss.h"
-
-static struct rv_monitor rv_tss;
-DECLARE_DA_MON_PER_CPU(tss, unsigned char);
-
-static void handle_sched_switch(void *data, bool preempt,
-				struct task_struct *prev,
-				struct task_struct *next,
-				unsigned int prev_state)
-{
-	da_handle_event_tss(sched_switch_tss);
-}
-
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
-{
-	da_handle_event_tss(schedule_entry_tss);
-}
-
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
-{
-	da_handle_start_event_tss(schedule_exit_tss);
-}
-
-static int enable_tss(void)
-{
-	int retval;
-
-	retval = da_monitor_init_tss();
-	if (retval)
-		return retval;
-
-	rv_attach_trace_probe("tss", sched_switch, handle_sched_switch);
-	rv_attach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
-	rv_attach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
-	return 0;
-}
-
-static void disable_tss(void)
-{
-	rv_tss.enabled = 0;
-
-	rv_detach_trace_probe("tss", sched_switch, handle_sched_switch);
-	rv_detach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
-	rv_detach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
-	da_monitor_destroy_tss();
-}
-
-static struct rv_monitor rv_tss = {
-	.name = "tss",
-	.description = "task switch while scheduling.",
-	.enable = enable_tss,
-	.disable = disable_tss,
-	.reset = da_monitor_reset_all_tss,
-	.enabled = 0,
-};
-
-static int __init register_tss(void)
-{
-	return rv_register_monitor(&rv_tss, &rv_sched);
-}
-
-static void __exit unregister_tss(void)
-{
-	rv_unregister_monitor(&rv_tss);
-}
-
-module_init(register_tss);
-module_exit(unregister_tss);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("tss: task switch while scheduling.");
diff --git a/kernel/trace/rv/monitors/tss/tss.h b/kernel/trace/rv/monitors/tss/tss.h
deleted file mode 100644
index f0a36fda1b873..0000000000000
--- a/kernel/trace/rv/monitors/tss/tss.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of tss automaton
- * For further information about this format, see kernel documentation:
- *   Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_tss {
-	thread_tss = 0,
-	sched_tss,
-	state_max_tss
-};
-
-#define INVALID_STATE state_max_tss
-
-enum events_tss {
-	sched_switch_tss = 0,
-	schedule_entry_tss,
-	schedule_exit_tss,
-	event_max_tss
-};
-
-struct automaton_tss {
-	char *state_names[state_max_tss];
-	char *event_names[event_max_tss];
-	unsigned char function[state_max_tss][event_max_tss];
-	unsigned char initial_state;
-	bool final_states[state_max_tss];
-};
-
-static const struct automaton_tss automaton_tss = {
-	.state_names = {
-		"thread",
-		"sched"
-	},
-	.event_names = {
-		"sched_switch",
-		"schedule_entry",
-		"schedule_exit"
-	},
-	.function = {
-		{     INVALID_STATE,         sched_tss,     INVALID_STATE },
-		{         sched_tss,     INVALID_STATE,        thread_tss },
-	},
-	.initial_state = thread_tss,
-	.final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index d38e0d3abdfde..dbd2f2ef513a7 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -58,11 +58,11 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 );
 
 #include <monitors/wip/wip_trace.h>
-#include <monitors/tss/tss_trace.h>
 #include <monitors/sco/sco_trace.h>
 #include <monitors/scpd/scpd_trace.h>
 #include <monitors/snep/snep_trace.h>
 #include <monitors/sncid/sncid_trace.h>
+#include <monitors/sts/sts_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
 
 #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/sts.dot b/tools/verification/models/sched/sts.dot
new file mode 100644
index 0000000000000..8152675b4f528
--- /dev/null
+++ b/tools/verification/models/sched/sts.dot
@@ -0,0 +1,29 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = circle] "disable_to_switch"};
+	{node [shape = circle] "enable_to_exit"};
+	{node [shape = circle] "scheduling"};
+	{node [shape = circle] "switching"};
+	{node [shape = plaintext, style=invis, label=""] "__init_thread"};
+	{node [shape = doublecircle] "thread"};
+	{node [shape = circle] "thread"};
+	"__init_thread" -> "thread";
+	"disable_to_switch" [label = "disable_to_switch"];
+	"disable_to_switch" -> "scheduling" [ label = "irq_enable" ];
+	"disable_to_switch" -> "switching" [ label = "sched_switch\nsched_switch_vain" ];
+	"enable_to_exit" [label = "enable_to_exit"];
+	"enable_to_exit" -> "enable_to_exit" [ label = "irq_disable\nirq_enable" ];
+	"enable_to_exit" -> "thread" [ label = "schedule_exit" ];
+	"scheduling" [label = "scheduling"];
+	"scheduling" -> "disable_to_switch" [ label = "irq_disable" ];
+	"switching" [label = "switching"];
+	"switching" -> "enable_to_exit" [ label = "irq_enable" ];
+	"thread" [label = "thread", color = green3];
+	"thread" -> "scheduling" [ label = "schedule_entry" ];
+	"thread" -> "thread" [ label = "irq_disable\nirq_enable" ];
+	{ rank = min ;
+		"__init_thread";
+		"thread";
+	}
+}
diff --git a/tools/verification/models/sched/tss.dot b/tools/verification/models/sched/tss.dot
deleted file mode 100644
index 7dfa1d9121bbd..0000000000000
--- a/tools/verification/models/sched/tss.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
-	center = true;
-	size = "7,11";
-	{node [shape = plaintext] "sched"};
-	{node [shape = plaintext, style=invis, label=""] "__init_thread"};
-	{node [shape = ellipse] "thread"};
-	{node [shape = plaintext] "thread"};
-	"__init_thread" -> "thread";
-	"sched" [label = "sched"];
-	"sched" -> "sched" [ label = "sched_switch" ];
-	"sched" -> "thread" [ label = "schedule_exit" ];
-	"thread" [label = "thread", color = green3];
-	"thread" -> "sched" [ label = "schedule_entry" ];
-	{ rank = min ;
-		"__init_thread";
-		"thread";
-	}
-}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 16/17] rv: Add nrp and sssw per-task monitors
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (14 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 15/17] rv: Replace tss monitor with more complete sts Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-15  7:14 ` [PATCH v3 17/17] rv: Add opid per-cpu monitor Gabriele Monaco
  2025-07-16  9:42 ` [PATCH v3 00/17] rv: Add monitors to validate task switch Nam Cao
  17 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Add 2 per-task monitors as part of the sched model:

* nrp: need-resched preempts
    Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
    Monitor to ensure sched_set_state to sleepable leads to sleeping and
    sleeping tasks require wakeup.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst   | 168 +++++++++++++++++++++
 kernel/trace/rv/Kconfig                    |   2 +
 kernel/trace/rv/Makefile                   |   2 +
 kernel/trace/rv/monitors/nrp/Kconfig       |  14 ++
 kernel/trace/rv/monitors/nrp/nrp.c         | 146 ++++++++++++++++++
 kernel/trace/rv/monitors/nrp/nrp.h         |  87 +++++++++++
 kernel/trace/rv/monitors/nrp/nrp_trace.h   |  15 ++
 kernel/trace/rv/monitors/sched/Kconfig     |   1 +
 kernel/trace/rv/monitors/sssw/Kconfig      |  15 ++
 kernel/trace/rv/monitors/sssw/sssw.c       | 115 ++++++++++++++
 kernel/trace/rv/monitors/sssw/sssw.h       |  97 ++++++++++++
 kernel/trace/rv/monitors/sssw/sssw_trace.h |  15 ++
 kernel/trace/rv/rv_trace.h                 |   2 +
 tools/verification/models/sched/nrp.dot    |  29 ++++
 tools/verification/models/sched/sssw.dot   |  24 +++
 15 files changed, 732 insertions(+)
 create mode 100644 kernel/trace/rv/monitors/nrp/Kconfig
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.c
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.h
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp_trace.h
 create mode 100644 kernel/trace/rv/monitors/sssw/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.c
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.h
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw_trace.h
 create mode 100644 tools/verification/models/sched/nrp.dot
 create mode 100644 tools/verification/models/sched/sssw.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index e4171aadef1c2..4cb6590284786 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -198,6 +198,174 @@ real task switch occurs::
   +-------------- |                    | <-------------+
                   +--------------------+
 
+Monitor nrp
+-----------
+
+The need resched preempts (nrp) monitor ensures preemption requires need
+resched. Only kernel preemptions are considered, since preemptions while
+returning to userspace, for this monitor, are indistinguishable from
+``sched_switch_yield`` (described in the sssw monitor).
+A preemption is any of the following types of ``sched_switch``:
+
+* ``sched_switch_preempt``:
+  a task is ``preempted``, this can happen after the need for ``rescheduling``
+  has been set. This is not valid for the *lazy* variant of the flag, which
+  causes only userspace preemptions.
+* ``sched_switch_vain_preempt``:
+  a task goes through the scheduler from a preemption context but it is picked
+  as the next task to run, hence no real task switch occurs. Since the
+  scheduler runs, this clears the need to reschedule.
+
+In theory, a preemption can only occur after the ``need_resched`` flag is set. In
+practice, however, it is possible to see a preemption where the flag is not
+set. This can happen in one specific condition::
+
+  need_resched
+                   preempt_schedule()
+                                           preempt_schedule_irq()
+                                                   __schedule()
+  !need_resched
+                           __schedule()
+
+In the situation above, standard preemption starts (e.g. from preempt_enable
+when the flag is set), an interrupt occurs before scheduling and, on its exit
+path, it schedules, which clears the ``need_resched`` flag.
+When the preempted task runs again, the standard preemption started earlier
+resumes, although the flag is no longer set. The monitor considers this a
+``nested_preemption``, this allows another preemption without re-setting the
+flag. This condition relaxes the monitor constraints and may catch false
+negatives (i.e. no real ``nested_preemptions``) but makes the monitor more
+robust and able to validate other scenarios.
+For simplicity, the monitor starts in ``preempt_irq``, although no interrupt
+occurred, as the situation above is tricky to pinpoint.
+
+This monitor ignores when the task is switched in, as this complicates things
+when different types of ``sched_switch`` occur (e.g. sleeping or yielding, here
+marked as ``sched_switch_other`` or ``sched_switch_vain``). The snroc monitor
+ensures a task is switched in before it can be switched out again.
+For this reason, the ``any_thread_running`` state does not imply that the
+monitored task is not running, simply it is not set for rescheduling::
+
+    sched_switch_other
+    sched_switch_vain
+    irq_entry                 #===========================================#
+  +-------------------------- H                                           H
+  |                           H                                           H
+  +-------------------------> H             any_thread_running            H
+                              H                                           H
+  +-------------------------> H                                           H
+  |                           #===========================================#
+  |                             |                       ^
+  | sched_switch_preempt        |                       | sched_switch_preempt
+  | sched_switch_vain_preempt   |                      sched_switch_vain_preempt
+  | sched_switch_other          | sched_need_resched    | sched_switch_other
+  | sched_switch_vain           |                       | sched_switch_vain
+  |                             v                       |
+  |                           +----------------------+  |
+  |                      +--- |                      |  |
+  |   sched_need_resched |    |     rescheduling     | -+
+  |                      +--> |                      |
+  |                           +----------------------+
+  |                             |
+  |                             | irq_exit
+  |                             v
+  |                           +----------------------+
+  |                           |                      | ---+
+  |                      ---> |                      |    | sched_need_resched
+  |                           |      preempt_irq     |    | irq_exit
+  |                           |                      | <--+
+  |                           |                      | <--+
+  |                           +----------------------+    |
+  |                             |                         |
+  |                             | sched_switch_preempt    |
+  |                            sched_switch_vain_preempt  |
+  |                             | sched_switch_other      | sched_need_resched
+  |                             | sched_switch_vain       |
+  |                             v                         |
+  |                           +-----------------------+   |
+  +-------------------------- |    nested_preempt     | --+
+                              +-----------------------+
+                                ^ irq_entry         |
+                                +-------------------+
+
+Due to how the ``need_resched`` flag on the preemption count works on arm64,
+this monitor is unstable on that architecture, as it often records preemptions
+when the flag is not set, even in presence of the workaround above.
+For the time being, the monitor is disabled by default on arm64.
+
+Monitor sssw
+------------
+
+The set state sleep and wakeup (sssw) monitor ensures ``sched_set_state`` to
+sleepable leads to sleeping and sleeping tasks require wakeup.
+It includes the following types of ``sched_switch``:
+
+* ``switch_suspend``:
+  a task puts itself to sleep, this can happen only after explicitly setting
+  the task to ``sleepable``. After a task is suspended, it needs to be woken up
+  (``waking`` state) before being switched in again.
+  Setting the task's state to ``sleepable`` can be reverted before switching if it
+  is woken up or set to ``runnable``.
+* ``switch_blocked``:
+  a special case of a ``switch_suspend`` where the task is waiting on a
+  sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set
+  state events racing with each other and this leads the model to perceive this
+  type of switch when the task is not set to sleepable. This is a limitation of
+  the model in SMP system and workarounds may slow down the system.
+* ``switch_yield``:
+  a task explicitly calls the scheduler, this looks like a preemption as the
+  task is still runnable but the ``need_resched`` flag is not set. It can
+  happen after a ``yield`` system call or from the idle task. By definition,
+  a task cannot yield while ``sleepable`` as that would be a suspension.
+* ``switch_vain``:
+  a task explicitly calls the scheduler but it is picked as the next task to run,
+  hence no real task switch occurs. This can occur as a yield, which is not
+  valid when the task is sleepable. A special case of a yield is when a task in
+  ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is pending. The
+  task doesn't go through the usual blocking/waking and is set back to
+  runnable, the resulting switch looks like a yield.
+
+As for the nrp monitor, this monitor doesn't include a running state,
+``sleepable`` and ``runnable`` are only referring to the task's desired
+state, which could be scheduled out (e.g. due to preemption). However, it does
+include the event ``sched_switch_in`` to represent when a task is allowed to
+become running. This can be triggered also by preemption, but cannot occur
+after the task got to ``sleeping`` until a ``wakeup``::
+
+  sched_set_state_runnable
+  sched_wakeup
+  sched_switch_vain_preempt     |
+  sched_switch_preempt          |
+  sched_switch_yield            v
+  sched_switch_vain        #=============================================#
+       +-----------------> H                                             H
+       |                   H                                             H
+       +------------------ H                  runnable                   H
+                           H                                             H
+       +-----------------> H                                             H
+       |                   #=============================================#
+  sched_set_state_runnable   |                          |            ^
+  sched_wakeup               |               sched_switch_blocking   |
+       |          sched_set_state_sleepable             |            |
+       |                     v                          |            |
+       |                   +------------------------+   |       sched_wakeup
+       +------------------ |                        |   |            |
+                           |                        |   |            |
+       +-----------------> |        sleepable       |   |            |
+       |                   |                        |   |            |
+       +------------------ |                        |   |            |
+  sched_switch_in          +------------------------+   |            |
+  sched_switch_preempt                  |               |            |
+  sched_switch_vain_preempt    sched_switch_suspend     |            |
+  sched_set_state_sleepable    sched_switch_blocking    |            |
+                                        |               |            |
+                                        v               |            |
+                           +------------------------+   |            |
+                           |         sleeping       | <-+            |
+                           +------------------------+                |
+                                        |                            |
+                                        +----------------------------+
+
 References
 ----------
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index da62d97d329aa..2cce1f22892a0 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -51,6 +51,8 @@ source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sncid/Kconfig"
 source "kernel/trace/rv/monitors/sts/Kconfig"
+source "kernel/trace/rv/monitors/nrp/Kconfig"
+source "kernel/trace/rv/monitors/sssw/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index db8653f7cedf3..3f517fe639c5a 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -15,6 +15,8 @@ obj-$(CONFIG_RV_MON_RTAPP) += monitors/rtapp/rtapp.o
 obj-$(CONFIG_RV_MON_PAGEFAULT) += monitors/pagefault/pagefault.o
 obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
 obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
+obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
+obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/nrp/Kconfig b/kernel/trace/rv/monitors/nrp/Kconfig
new file mode 100644
index 0000000000000..f37ff70e8d204
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_NRP
+	depends on RV
+	depends on RV_MON_SCHED
+	default y if !ARCH_ARM64
+	select DA_MON_EVENTS_ID
+	bool "nrp monitor"
+	help
+	  Monitor to ensure preemption requires need resched.
+	  This monitor is part of the sched monitors collection.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/nrp/nrp.c b/kernel/trace/rv/monitors/nrp/nrp.c
new file mode 100644
index 0000000000000..90600497dc487
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "nrp"
+
+#include <trace/events/irq.h>
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "nrp.h"
+
+static struct rv_monitor rv_nrp;
+DECLARE_DA_MON_PER_TASK(nrp, unsigned char);
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk,
+				      int cpu, int tif)
+{
+	if (tif == TIF_NEED_RESCHED)
+		da_handle_start_event_nrp(tsk, sched_need_resched_nrp);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+				struct task_struct *prev,
+				struct task_struct *next,
+				unsigned int prev_state)
+{
+	if (preempt)
+		da_handle_event_nrp(prev, sched_switch_preempt_nrp);
+	else
+		da_handle_event_nrp(prev, sched_switch_other_nrp);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+				     struct task_struct *tsk,
+				     unsigned int tsk_state)
+{
+	if (preempt)
+		da_handle_event_nrp(tsk, sched_switch_vain_preempt_nrp);
+	else
+		da_handle_event_nrp(tsk, sched_switch_vain_nrp);
+}
+
+static int enable_nrp(void)
+{
+	int retval;
+
+	retval = da_monitor_init_nrp();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_attach_trace_probe("nrp", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("nrp", sched_switch_vain_tp, handle_sched_switch_vain);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_nrp(void)
+{
+	rv_nrp.enabled = 0;
+
+	rv_detach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_detach_trace_probe("nrp", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("nrp", sched_switch_vain_tp, handle_sched_switch_vain);
+	detach_vector_irq();
+
+	da_monitor_destroy_nrp();
+}
+
+static struct rv_monitor rv_nrp = {
+	.name = "nrp",
+	.description = "need resched preempts.",
+	.enable = enable_nrp,
+	.disable = disable_nrp,
+	.reset = da_monitor_reset_all_nrp,
+	.enabled = 0,
+};
+
+static int __init register_nrp(void)
+{
+	return rv_register_monitor(&rv_nrp, &rv_sched);
+}
+
+static void __exit unregister_nrp(void)
+{
+	rv_unregister_monitor(&rv_nrp);
+}
+
+module_init(register_nrp);
+module_exit(unregister_nrp);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("nrp: need resched preempts.");
diff --git a/kernel/trace/rv/monitors/nrp/nrp.h b/kernel/trace/rv/monitors/nrp/nrp.h
new file mode 100644
index 0000000000000..376c5a03dd2e9
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of nrp automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_nrp {
+	preempt_irq_nrp = 0,
+	any_thread_running_nrp,
+	nested_preempt_nrp,
+	rescheduling_nrp,
+	state_max_nrp
+};
+
+#define INVALID_STATE state_max_nrp
+
+enum events_nrp {
+	irq_entry_nrp = 0,
+	sched_need_resched_nrp,
+	sched_switch_other_nrp,
+	sched_switch_preempt_nrp,
+	sched_switch_vain_nrp,
+	sched_switch_vain_preempt_nrp,
+	event_max_nrp
+};
+
+struct automaton_nrp {
+	char *state_names[state_max_nrp];
+	char *event_names[event_max_nrp];
+	unsigned char function[state_max_nrp][event_max_nrp];
+	unsigned char initial_state;
+	bool final_states[state_max_nrp];
+};
+
+static const struct automaton_nrp automaton_nrp = {
+	.state_names = {
+		"preempt_irq",
+		"any_thread_running",
+		"nested_preempt",
+		"rescheduling"
+	},
+	.event_names = {
+		"irq_entry",
+		"sched_need_resched",
+		"sched_switch_other",
+		"sched_switch_preempt",
+		"sched_switch_vain",
+		"sched_switch_vain_preempt"
+	},
+	.function = {
+		{
+			preempt_irq_nrp,
+			preempt_irq_nrp,
+			nested_preempt_nrp,
+			nested_preempt_nrp,
+			nested_preempt_nrp,
+			nested_preempt_nrp
+		},
+		{
+			any_thread_running_nrp,
+			rescheduling_nrp,
+			any_thread_running_nrp,
+			INVALID_STATE,
+			any_thread_running_nrp,
+			INVALID_STATE
+		},
+		{
+			nested_preempt_nrp,
+			preempt_irq_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp
+		},
+		{
+			preempt_irq_nrp,
+			rescheduling_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp
+		},
+	},
+	.initial_state = preempt_irq_nrp,
+	.final_states = { 0, 1, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/nrp/nrp_trace.h b/kernel/trace/rv/monitors/nrp/nrp_trace.h
new file mode 100644
index 0000000000000..2e13497de3b6f
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_NRP
+DEFINE_EVENT(event_da_monitor_id, event_nrp,
+	     TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_nrp,
+	     TP_PROTO(int id, char *state, char *event),
+	     TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_NRP */
diff --git a/kernel/trace/rv/monitors/sched/Kconfig b/kernel/trace/rv/monitors/sched/Kconfig
index ae3eb410abd78..aa16456da8647 100644
--- a/kernel/trace/rv/monitors/sched/Kconfig
+++ b/kernel/trace/rv/monitors/sched/Kconfig
@@ -2,6 +2,7 @@
 #
 config RV_MON_SCHED
 	depends on RV
+	depends on RV_PER_TASK_MONITORS >= 3
 	bool "sched monitor"
 	help
 	  Collection of monitors to check the scheduler behaves according to specifications.
diff --git a/kernel/trace/rv/monitors/sssw/Kconfig b/kernel/trace/rv/monitors/sssw/Kconfig
new file mode 100644
index 0000000000000..23b7eeb38bbfc
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_SSSW
+	depends on RV
+	depends on RV_MON_SCHED
+	default y
+	select DA_MON_EVENTS_ID
+	bool "sssw monitor"
+	help
+	  Monitor to ensure sched_set_state to sleepable leads to sleeping and
+	  sleeping tasks require wakeup.
+	  This monitor is part of the sched monitors collection.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sssw/sssw.c b/kernel/trace/rv/monitors/sssw/sssw.c
new file mode 100644
index 0000000000000..efa695dead36a
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sssw"
+
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sssw.h"
+
+static struct rv_monitor rv_sssw;
+DECLARE_DA_MON_PER_TASK(sssw, unsigned char);
+
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+				   int state, bool from_signal)
+{
+	if (state == TASK_RUNNING)
+		da_handle_start_event_sssw(tsk, sched_set_state_runnable_sssw);
+	else
+		da_handle_event_sssw(tsk, sched_set_state_sleepable_sssw);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+				struct task_struct *prev,
+				struct task_struct *next,
+				unsigned int prev_state)
+{
+	if (preempt)
+		da_handle_event_sssw(prev, sched_switch_preempt_sssw);
+	else if (prev_state == TASK_RUNNING)
+		da_handle_start_event_sssw(prev, sched_switch_yield_sssw);
+	else if (prev_state == TASK_RTLOCK_WAIT)
+		/* special case of sleeping task with racy conditions */
+		da_handle_event_sssw(prev, sched_switch_blocking_sssw);
+	else
+		da_handle_event_sssw(prev, sched_switch_suspend_sssw);
+	da_handle_event_sssw(next, sched_switch_in_sssw);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+				     struct task_struct *tsk,
+				     unsigned int tsk_state)
+{
+	if (preempt)
+		da_handle_event_sssw(tsk, sched_switch_vain_preempt_sssw);
+	else
+		da_handle_start_event_sssw(tsk, sched_switch_vain_sssw);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+	da_handle_start_event_sssw(p, sched_wakeup_sssw);
+}
+
+static int enable_sssw(void)
+{
+	int retval;
+
+	retval = da_monitor_init_sssw();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+	rv_attach_trace_probe("sssw", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("sssw", sched_switch_vain_tp, handle_sched_switch_vain);
+	rv_attach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+
+	return 0;
+}
+
+static void disable_sssw(void)
+{
+	rv_sssw.enabled = 0;
+
+	rv_detach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+	rv_detach_trace_probe("sssw", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("sssw", sched_switch_vain_tp, handle_sched_switch_vain);
+	rv_detach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+
+	da_monitor_destroy_sssw();
+}
+
+static struct rv_monitor rv_sssw = {
+	.name = "sssw",
+	.description = "set state sleep and wakeup.",
+	.enable = enable_sssw,
+	.disable = disable_sssw,
+	.reset = da_monitor_reset_all_sssw,
+	.enabled = 0,
+};
+
+static int __init register_sssw(void)
+{
+	return rv_register_monitor(&rv_sssw, &rv_sched);
+}
+
+static void __exit unregister_sssw(void)
+{
+	rv_unregister_monitor(&rv_sssw);
+}
+
+module_init(register_sssw);
+module_exit(unregister_sssw);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sssw: set state sleep and wakeup.");
diff --git a/kernel/trace/rv/monitors/sssw/sssw.h b/kernel/trace/rv/monitors/sssw/sssw.h
new file mode 100644
index 0000000000000..2b29230f29cff
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sssw automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sssw {
+	runnanble_sssw = 0,
+	sleepable_sssw,
+	sleeping_sssw,
+	state_max_sssw
+};
+
+#define INVALID_STATE state_max_sssw
+
+enum events_sssw {
+	sched_set_state_runnable_sssw = 0,
+	sched_set_state_sleepable_sssw,
+	sched_switch_blocking_sssw,
+	sched_switch_in_sssw,
+	sched_switch_preempt_sssw,
+	sched_switch_suspend_sssw,
+	sched_switch_vain_sssw,
+	sched_switch_vain_preempt_sssw,
+	sched_switch_yield_sssw,
+	sched_wakeup_sssw,
+	event_max_sssw
+};
+
+struct automaton_sssw {
+	char *state_names[state_max_sssw];
+	char *event_names[event_max_sssw];
+	unsigned char function[state_max_sssw][event_max_sssw];
+	unsigned char initial_state;
+	bool final_states[state_max_sssw];
+};
+
+static const struct automaton_sssw automaton_sssw = {
+	.state_names = {
+		"runnanble",
+		"sleepable",
+		"sleeping"
+	},
+	.event_names = {
+		"sched_set_state_runnable",
+		"sched_set_state_sleepable",
+		"sched_switch_blocking",
+		"sched_switch_in",
+		"sched_switch_preempt",
+		"sched_switch_suspend",
+		"sched_switch_vain",
+		"sched_switch_vain_preempt",
+		"sched_switch_yield",
+		"sched_wakeup"
+	},
+	.function = {
+		{
+			runnanble_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			runnanble_sssw,
+			runnanble_sssw,
+			INVALID_STATE,
+			runnanble_sssw,
+			runnanble_sssw,
+			runnanble_sssw,
+			runnanble_sssw
+		},
+		{
+			runnanble_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			sleepable_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			INVALID_STATE,
+			sleepable_sssw,
+			INVALID_STATE,
+			runnanble_sssw
+		},
+		{
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			runnanble_sssw
+		},
+	},
+	.initial_state = runnanble_sssw,
+	.final_states = { 1, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/sssw/sssw_trace.h b/kernel/trace/rv/monitors/sssw/sssw_trace.h
new file mode 100644
index 0000000000000..6c03cfc6960bf
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_SSSW
+DEFINE_EVENT(event_da_monitor_id, event_sssw,
+	     TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_sssw,
+	     TP_PROTO(int id, char *state, char *event),
+	     TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_SSSW */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index dbd2f2ef513a7..b5cb1d1807fd2 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -124,6 +124,8 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
 
 #include <monitors/wwnr/wwnr_trace.h>
 #include <monitors/snroc/snroc_trace.h>
+#include <monitors/nrp/nrp_trace.h>
+#include <monitors/sssw/sssw_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
 
 #endif /* CONFIG_DA_MON_EVENTS_ID */
diff --git a/tools/verification/models/sched/nrp.dot b/tools/verification/models/sched/nrp.dot
new file mode 100644
index 0000000000000..337ea310f7f8a
--- /dev/null
+++ b/tools/verification/models/sched/nrp.dot
@@ -0,0 +1,29 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = doublecircle] "any_thread_running"};
+	{node [shape = circle] "any_thread_running"};
+	{node [shape = circle] "nested_preempt"};
+	{node [shape = plaintext, style=invis, label=""] "__init_preempt_irq"};
+	{node [shape = circle] "preempt_irq"};
+	{node [shape = circle] "rescheduling"};
+	"__init_preempt_irq" -> "preempt_irq";
+	"any_thread_running" [label = "any_thread_running", color = green3];
+	"any_thread_running" -> "any_thread_running" [ label = "sched_switch_other\nsched_switch_vain\nirq_entry" ];
+	"any_thread_running" -> "rescheduling" [ label = "sched_need_resched" ];
+	"nested_preempt" [label = "nested_preempt"];
+	"nested_preempt" -> "any_thread_running" [ label = "sched_switch_preempt\nsched_switch_vain_preempt\nsched_switch_other\nsched_switch_vain" ];
+	"nested_preempt" -> "nested_preempt" [ label = "irq_entry" ];
+	"nested_preempt" -> "preempt_irq" [ label = "sched_need_resched" ];
+	"preempt_irq" [label = "preempt_irq"];
+	"preempt_irq" -> "nested_preempt" [ label = "sched_switch_preempt\nsched_switch_vain_preempt\nsched_switch_other\nsched_switch_vain" ];
+	"preempt_irq" -> "preempt_irq" [ label = "irq_entry\nsched_need_resched" ];
+	"rescheduling" [label = "rescheduling"];
+	"rescheduling" -> "any_thread_running" [ label = "sched_switch_preempt\nsched_switch_vain_preempt\nsched_switch_other\nsched_switch_vain" ];
+	"rescheduling" -> "preempt_irq" [ label = "irq_entry" ];
+	"rescheduling" -> "rescheduling" [ label = "sched_need_resched" ];
+	{ rank = min ;
+		"__init_preempt_irq";
+		"preempt_irq";
+	}
+}
diff --git a/tools/verification/models/sched/sssw.dot b/tools/verification/models/sched/sssw.dot
new file mode 100644
index 0000000000000..362e783f2bd58
--- /dev/null
+++ b/tools/verification/models/sched/sssw.dot
@@ -0,0 +1,24 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_runnanble"};
+	{node [shape = doublecircle] "runnanble"};
+	{node [shape = circle] "runnanble"};
+	{node [shape = circle] "sleepable"};
+	{node [shape = circle] "sleeping"};
+	"__init_runnanble" -> "runnanble";
+	"runnanble" [label = "runnanble", color = green3];
+	"runnanble" -> "runnanble" [ label = "sched_set_state_runnable\nsched_wakeup\nsched_switch_in\nsched_switch_vain\nsched_switch_yield\nsched_switch_preempt\nsched_switch_vain_preempt" ];
+	"runnanble" -> "sleepable" [ label = "sched_set_state_sleepable" ];
+	"runnanble" -> "sleeping" [ label = "sched_switch_blocking" ];
+	"sleepable" [label = "sleepable"];
+	"sleepable" -> "runnanble" [ label = "sched_set_state_runnable\nsched_wakeup" ];
+	"sleepable" -> "sleepable" [ label = "sched_set_state_sleepable\nsched_switch_in\nsched_switch_preempt\nsched_switch_vain_preempt" ];
+	"sleepable" -> "sleeping" [ label = "sched_switch_suspend\nsched_switch_blocking" ];
+	"sleeping" [label = "sleeping"];
+	"sleeping" -> "runnanble" [ label = "sched_wakeup" ];
+	{ rank = min ;
+		"__init_runnanble";
+		"runnanble";
+	}
+}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v3 17/17] rv: Add opid per-cpu monitor
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (15 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 16/17] rv: Add nrp and sssw per-task monitors Gabriele Monaco
@ 2025-07-15  7:14 ` Gabriele Monaco
  2025-07-16  9:38   ` Nam Cao
  2025-07-16  9:42 ` [PATCH v3 00/17] rv: Add monitors to validate task switch Nam Cao
  17 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15  7:14 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
    Monitor to ensure wakeup and need_resched occur with irq and
    preemption disabled or in irq handlers.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst   |  55 +++++++
 kernel/trace/rv/Kconfig                    |   1 +
 kernel/trace/rv/Makefile                   |   1 +
 kernel/trace/rv/monitors/nrp/Kconfig       |   2 +-
 kernel/trace/rv/monitors/opid/Kconfig      |  17 +++
 kernel/trace/rv/monitors/opid/opid.c       | 169 +++++++++++++++++++++
 kernel/trace/rv/monitors/opid/opid.h       | 104 +++++++++++++
 kernel/trace/rv/monitors/opid/opid_trace.h |  15 ++
 kernel/trace/rv/rv_trace.h                 |   1 +
 tools/verification/models/sched/opid.dot   |  35 +++++
 10 files changed, 399 insertions(+), 1 deletion(-)
 create mode 100644 kernel/trace/rv/monitors/opid/Kconfig
 create mode 100644 kernel/trace/rv/monitors/opid/opid.c
 create mode 100644 kernel/trace/rv/monitors/opid/opid.h
 create mode 100644 kernel/trace/rv/monitors/opid/opid_trace.h
 create mode 100644 tools/verification/models/sched/opid.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 4cb6590284786..6f796e22dbab6 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -366,6 +366,61 @@ after the task got to ``sleeping`` until a ``wakeup``::
                                         |                            |
                                         +----------------------------+
 
+Monitor opid
+------------
+
+The operations with preemption and irq disabled (opid) monitor ensures
+operations like ``wakeup`` and ``need_resched`` occur with interrupts and
+preemption disabled or during interrupt context, in such case preemption may
+not be disabled explicitly.
+``need_resched`` can be set by some RCU internals functions, in which case it
+doesn't match a task wakeup and might occur with only interrupts disabled::
+
+                 |                     sched_need_resched
+                 |                     sched_waking
+                 |                     irq_entry
+                 |                   +--------------------+
+                 v                   v                    |
+               +------------------------------------------------------+
+  +----------- |                     disabled                         | <+
+  |            +------------------------------------------------------+  |
+  |              |                 ^                                     |
+  |              |          preempt_disable      sched_need_resched      |
+  |       preempt_enable           |           +--------------------+    |
+  |              v                 |           v                    |    |
+  |            +------------------------------------------------------+  |
+  |            |                   irq_disabled                       |  |
+  |            +------------------------------------------------------+  |
+  |                              |             |        ^                |
+  |                          irq_entry         |        |                |
+  |     sched_need_resched       v             |   irq_disable           |
+  |     sched_waking +--------------+          |        |                |
+  |           +----- |              |     irq_enable    |                |
+  |           |      |    in_irq    |          |        |                |
+  |           +----> |              |          |        |                |
+  |                  +--------------+          |        |          irq_disable
+  |                     |                      |        |                |
+  | irq_enable          | irq_enable           |        |                |
+  |                     v                      v        |                |
+  |            #======================================================#  |
+  |            H                     enabled                          H  |
+  |            #======================================================#  |
+  |              |                   ^         ^ preempt_enable     |    |
+  |       preempt_disable     preempt_enable   +--------------------+    |
+  |              v                   |                                   |
+  |            +------------------+  |                                   |
+  +----------> | preempt_disabled | -+                                   |
+               +------------------+                                      |
+                 |                                                       |
+                 +-------------------------------------------------------+
+
+This monitor is designed to work on ``PREEMPT_RT`` kernels, the special case of
+events occurring in interrupt context is a shortcut to identify valid scenarios
+where the preemption tracepoints might not be visible, during interrupts
+preemption is always disabled. On non- ``PREEMPT_RT`` kernels, the interrupts
+might invoke a softirq to set ``need_resched`` and wake up a task. This is
+another special case that is currently not supported by the monitor.
+
 References
 ----------
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 2cce1f22892a0..86d4c59ca4966 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -53,6 +53,7 @@ source "kernel/trace/rv/monitors/sncid/Kconfig"
 source "kernel/trace/rv/monitors/sts/Kconfig"
 source "kernel/trace/rv/monitors/nrp/Kconfig"
 source "kernel/trace/rv/monitors/sssw/Kconfig"
+source "kernel/trace/rv/monitors/opid/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 3f517fe639c5a..955d0947122ce 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
 obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
 obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
 obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
+obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/nrp/Kconfig b/kernel/trace/rv/monitors/nrp/Kconfig
index f37ff70e8d204..a175c430d351f 100644
--- a/kernel/trace/rv/monitors/nrp/Kconfig
+++ b/kernel/trace/rv/monitors/nrp/Kconfig
@@ -3,7 +3,7 @@
 config RV_MON_NRP
 	depends on RV
 	depends on RV_MON_SCHED
-	default y if !ARCH_ARM64
+	default y if !ARM64
 	select DA_MON_EVENTS_ID
 	bool "nrp monitor"
 	help
diff --git a/kernel/trace/rv/monitors/opid/Kconfig b/kernel/trace/rv/monitors/opid/Kconfig
new file mode 100644
index 0000000000000..23b43d2704153
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_OPID
+	depends on RV
+	depends on TRACE_IRQFLAGS
+	depends on TRACE_PREEMPT_TOGGLE
+	depends on RV_MON_SCHED
+	default y if PREEMPT_RT
+	select DA_MON_EVENTS_IMPLICIT
+	bool "opid monitor"
+	help
+	  Monitor to ensure operations like wakeup and need resched occur with
+	  interrupts and preemption disabled or during IRQs, where preemption
+	  may not be disabled explicitly.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
new file mode 100644
index 0000000000000..4f1902f24805c
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "opid"
+
+#include <trace/events/sched.h>
+#include <trace/events/irq.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "opid.h"
+
+static struct rv_monitor rv_opid;
+DECLARE_DA_MON_PER_CPU(opid, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_opid(irq_entry_opid);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(irq_disable_opid);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(irq_enable_opid);
+}
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_opid(irq_entry_opid);
+}
+
+static void handle_preempt_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(preempt_disable_opid);
+}
+
+static void handle_preempt_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(preempt_enable_opid);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk, int cpu, int tif)
+{
+	/* The monitor's intitial state is not in_irq */
+	if (this_cpu_read(hardirq_context))
+		da_handle_event_opid(sched_need_resched_opid);
+	else
+		da_handle_start_event_opid(sched_need_resched_opid);
+}
+
+static void handle_sched_waking(void *data, struct task_struct *p)
+{
+	/* The monitor's intitial state is not in_irq */
+	if (this_cpu_read(hardirq_context))
+		da_handle_event_opid(sched_waking_opid);
+	else
+		da_handle_start_event_opid(sched_waking_opid);
+}
+
+static int enable_opid(void)
+{
+	int retval;
+
+	retval = da_monitor_init_opid();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("opid", irq_disable, handle_irq_disable);
+	rv_attach_trace_probe("opid", irq_enable, handle_irq_enable);
+	rv_attach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+	rv_attach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+	rv_attach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_attach_trace_probe("opid", sched_waking, handle_sched_waking);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_opid(void)
+{
+	rv_opid.enabled = 0;
+
+	rv_detach_trace_probe("opid", irq_disable, handle_irq_disable);
+	rv_detach_trace_probe("opid", irq_enable, handle_irq_enable);
+	rv_detach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+	rv_detach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+	rv_detach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_detach_trace_probe("opid", sched_waking, handle_sched_waking);
+	detach_vector_irq();
+
+	da_monitor_destroy_opid();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_opid = {
+	.name = "opid",
+	.description = "operations with preemption and irq disabled.",
+	.enable = enable_opid,
+	.disable = disable_opid,
+	.reset = da_monitor_reset_all_opid,
+	.enabled = 0,
+};
+
+static int __init register_opid(void)
+{
+	rv_register_monitor(&rv_opid, &rv_sched);
+	return 0;
+}
+
+static void __exit unregister_opid(void)
+{
+	rv_unregister_monitor(&rv_opid);
+}
+
+module_init(register_opid);
+module_exit(unregister_opid);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("opid: operations with preemption and irq disabled.");
diff --git a/kernel/trace/rv/monitors/opid/opid.h b/kernel/trace/rv/monitors/opid/opid.h
new file mode 100644
index 0000000000000..efdbcda8fff30
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of opid automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_opid {
+	disabled_opid = 0,
+	enabled_opid,
+	in_irq_opid,
+	irq_disabled_opid,
+	preempt_disabled_opid,
+	state_max_opid
+};
+
+#define INVALID_STATE state_max_opid
+
+enum events_opid {
+	irq_disable_opid = 0,
+	irq_enable_opid,
+	irq_entry_opid,
+	preempt_disable_opid,
+	preempt_enable_opid,
+	sched_need_resched_opid,
+	sched_waking_opid,
+	event_max_opid
+};
+
+struct automaton_opid {
+	char *state_names[state_max_opid];
+	char *event_names[event_max_opid];
+	unsigned char function[state_max_opid][event_max_opid];
+	unsigned char initial_state;
+	bool final_states[state_max_opid];
+};
+
+static const struct automaton_opid automaton_opid = {
+	.state_names = {
+		"disabled",
+		"enabled",
+		"in_irq",
+		"irq_disabled",
+		"preempt_disabled"
+	},
+	.event_names = {
+		"irq_disable",
+		"irq_enable",
+		"irq_entry",
+		"preempt_disable",
+		"preempt_enable",
+		"sched_need_resched",
+		"sched_waking"
+	},
+	.function = {
+		{
+			INVALID_STATE,
+			preempt_disabled_opid,
+			disabled_opid,
+			INVALID_STATE,
+			irq_disabled_opid,
+			disabled_opid,
+			disabled_opid
+		},
+		{
+			irq_disabled_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			preempt_disabled_opid,
+			enabled_opid,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			enabled_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			in_irq_opid,
+			in_irq_opid
+		},
+		{
+			INVALID_STATE,
+			enabled_opid,
+			in_irq_opid,
+			disabled_opid,
+			INVALID_STATE,
+			irq_disabled_opid,
+			INVALID_STATE
+		},
+		{
+			disabled_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			enabled_opid,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+	},
+	.initial_state = disabled_opid,
+	.final_states = { 0, 1, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/opid/opid_trace.h b/kernel/trace/rv/monitors/opid/opid_trace.h
new file mode 100644
index 0000000000000..3df6ff955c300
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_OPID
+DEFINE_EVENT(event_da_monitor, event_opid,
+	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor, error_opid,
+	     TP_PROTO(char *state, char *event),
+	     TP_ARGS(state, event));
+#endif /* CONFIG_RV_MON_OPID */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index b5cb1d1807fd2..c2e841a080df6 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -63,6 +63,7 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 #include <monitors/snep/snep_trace.h>
 #include <monitors/sncid/sncid_trace.h>
 #include <monitors/sts/sts_trace.h>
+#include <monitors/opid/opid_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
 
 #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/opid.dot b/tools/verification/models/sched/opid.dot
new file mode 100644
index 0000000000000..2d5e1df3405f2
--- /dev/null
+++ b/tools/verification/models/sched/opid.dot
@@ -0,0 +1,35 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_disabled"};
+	{node [shape = circle] "disabled"};
+	{node [shape = doublecircle] "enabled"};
+	{node [shape = circle] "enabled"};
+	{node [shape = circle] "in_irq"};
+	{node [shape = circle] "irq_disabled"};
+	{node [shape = circle] "preempt_disabled"};
+	"__init_disabled" -> "disabled";
+	"disabled" [label = "disabled"];
+	"disabled" -> "disabled" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
+	"disabled" -> "irq_disabled" [ label = "preempt_enable" ];
+	"disabled" -> "preempt_disabled" [ label = "irq_enable" ];
+	"enabled" [label = "enabled", color = green3];
+	"enabled" -> "enabled" [ label = "preempt_enable" ];
+	"enabled" -> "irq_disabled" [ label = "irq_disable" ];
+	"enabled" -> "preempt_disabled" [ label = "preempt_disable" ];
+	"in_irq" [label = "in_irq"];
+	"in_irq" -> "enabled" [ label = "irq_enable" ];
+	"in_irq" -> "in_irq" [ label = "sched_need_resched\nsched_waking" ];
+	"irq_disabled" [label = "irq_disabled"];
+	"irq_disabled" -> "disabled" [ label = "preempt_disable" ];
+	"irq_disabled" -> "enabled" [ label = "irq_enable" ];
+	"irq_disabled" -> "in_irq" [ label = "irq_entry" ];
+	"irq_disabled" -> "irq_disabled" [ label = "sched_need_resched" ];
+	"preempt_disabled" [label = "preempt_disabled"];
+	"preempt_disabled" -> "disabled" [ label = "irq_disable" ];
+	"preempt_disabled" -> "enabled" [ label = "preempt_enable" ];
+	{ rank = min ;
+		"__init_disabled";
+		"disabled";
+	}
+}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 06/17] rv: Use strings in da monitors tracepoints
  2025-07-15  7:14 ` [PATCH v3 06/17] rv: Use strings in da monitors tracepoints Gabriele Monaco
@ 2025-07-15 14:11   ` Nam Cao
  0 siblings, 0 replies; 57+ messages in thread
From: Nam Cao @ 2025-07-15 14:11 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:23AM +0200, Gabriele Monaco wrote:
> Using DA monitors tracepoints with KASAN enabled triggers the following
> warning:
> 
>  BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
>  Read of size 32 at addr ffffffffaada8980 by task ...
>  Call Trace:
>   <TASK>
>  [...]
>   do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
>   ? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
>   ? trace_event_sncid+0x83/0x200
>   trace_event_sncid+0x163/0x200
>  [...]
>  The buggy address belongs to the variable:
>   automaton_snep+0x4e0/0x5e0
> 
> This is caused by the tracepoints reading 32 bits __array instead of
                                               ^
                                               bytes

> __string from the automata definition. Such strings are literals and
> reading 32 bytes ends up in out of bound memory accesses (e.g. the next
> automaton's data in this case).
> The error is harmless as, while printing the string, we stop at the null
> terminator, but it should still be fixed.

I'm not sure about the "harmless" part. What if this string is at the very
end of kernel image? Wouldn't unmapped memory be read?

> Use the __string facilities while defining the tracepoints to avoid
> reading out of bound memory.
> 
> Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

The change ifself looks good:

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors
  2025-07-15  7:14 ` [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors Gabriele Monaco
@ 2025-07-15 14:48   ` Nam Cao
  2025-07-16  7:40     ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-15 14:48 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:25AM +0200, Gabriele Monaco wrote:
> The current behaviour of rvgen when running with the -a option is to
> append the necessary lines at the end of the configuration for Kconfig,
> Makefile and tracepoints.
> This is not always the desired behaviour in case of nested monitors:
> while tracepoints are not affected by nesting and the Makefile's only
> requirement is that the parent monitor is built before its children, in
> the Kconfig it is better to have children defined right after their
> parent, otherwise the result has wrong indentation:
> 
> [*]   foo_parent monitor
> [*]     foo_child1 monitor
> [*]     foo_child2 monitor
> [*]   bar_parent monitor
> [*]     bar_child1 monitor
> [*]     bar_child2 monitor
> [*]   foo_child3 monitor
> [*]   foo_child4 monitor
> 
> Adapt rvgen to look for a different marker for nested monitors in the
> Kconfig file and append the line right after the last sibling, instead
> of the last monitor.
> Also add the marker when creating a new parent monitor.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  kernel/trace/rv/Kconfig                     |  5 +++++
>  tools/verification/rvgen/rvgen/container.py | 13 +++++++++++++
>  tools/verification/rvgen/rvgen/generator.py | 13 ++++++++-----
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
> index c11bf7e61ebf0..26017378f79b8 100644
> --- a/kernel/trace/rv/Kconfig
> +++ b/kernel/trace/rv/Kconfig
> @@ -43,6 +43,7 @@ config RV_PER_TASK_MONITORS
>  
>  source "kernel/trace/rv/monitors/wip/Kconfig"
>  source "kernel/trace/rv/monitors/wwnr/Kconfig"
> +
>  source "kernel/trace/rv/monitors/sched/Kconfig"
>  source "kernel/trace/rv/monitors/tss/Kconfig"
>  source "kernel/trace/rv/monitors/sco/Kconfig"
> @@ -50,9 +51,13 @@ source "kernel/trace/rv/monitors/snroc/Kconfig"
>  source "kernel/trace/rv/monitors/scpd/Kconfig"
>  source "kernel/trace/rv/monitors/snep/Kconfig"
>  source "kernel/trace/rv/monitors/sncid/Kconfig"
> +# Add new sched monitors here
> +
>  source "kernel/trace/rv/monitors/rtapp/Kconfig"
>  source "kernel/trace/rv/monitors/pagefault/Kconfig"
>  source "kernel/trace/rv/monitors/sleep/Kconfig"
> +# Add new rtapp monitors here
> +
>  # Add new monitors here
>  
>  config RV_REACTORS
> diff --git a/tools/verification/rvgen/rvgen/container.py b/tools/verification/rvgen/rvgen/container.py
> index 47d8ab2ad3ec4..fee493f89fde6 100644
> --- a/tools/verification/rvgen/rvgen/container.py
> +++ b/tools/verification/rvgen/rvgen/container.py
> @@ -20,3 +20,16 @@ class Container(generator.RVGenerator):
>          main_h = self.main_h
>          main_h = main_h.replace("%%MODEL_NAME%%", self.name)
>          return main_h
> +
> +    def fill_kconfig_tooltip(self):
> +        """Override to produce a marker for this container in the Kconfig"""
> +        container_marker = f"# Add new {self.name} monitors here\n"
> +        if self.auto_patch:
> +            self._patch_file("Kconfig",
> +                            "# Add new monitors here", "")

Isn't this one excessive? I gave it a try, but I had double blank line:

python3 tools/verification/rvgen -a container -n hello

   61 source "kernel/trace/rv/monitors/sleep/Kconfig"
   62 # Add new rtapp monitors here
   63 
+  64 
+  65 source "kernel/trace/rv/monitors/hello/Kconfig"
+  66 # Add new hello monitors here
+  67 
   68 # Add new monitors here
   69 
   70 config RV_REACTORS

> +        result = super().fill_kconfig_tooltip()
> +        if self.auto_patch:
> +            self._patch_file("Kconfig",
> +                            "# Add new monitors here", container_marker)
> +            return result
> +        return result + container_marker
> diff --git a/tools/verification/rvgen/rvgen/generator.py b/tools/verification/rvgen/rvgen/generator.py
> index 19d0078a38032..ac97c4505ff00 100644
> --- a/tools/verification/rvgen/rvgen/generator.py
> +++ b/tools/verification/rvgen/rvgen/generator.py
> @@ -137,7 +137,8 @@ class RVGenerator:
>          kconfig = kconfig.replace("%%MONITOR_DEPS%%", monitor_deps)
>          return kconfig
>  
> -    def __patch_file(self, file, marker, line):
> +    def _patch_file(self, file, marker, line):
> +        assert(self.auto_patch)
>          file_to_patch = os.path.join(self.rv_dir, file)
>          content = self._read_file(file_to_patch)
>          content = content.replace(marker, line + "\n" + marker)
> @@ -146,7 +147,7 @@ class RVGenerator:
>      def fill_tracepoint_tooltip(self):
>          monitor_class_type = self.fill_monitor_class_type()
>          if self.auto_patch:
> -            self.__patch_file("rv_trace.h",
> +            self._patch_file("rv_trace.h",
>                              "// Add new monitors based on CONFIG_%s here" % monitor_class_type,
>                              "#include <monitors/%s/%s_trace.h>" % (self.name, self.name))
>              return "  - Patching %s/rv_trace.h, double check the result" % self.rv_dir
> @@ -158,8 +159,10 @@ Add this line where other tracepoints are included and %s is defined:
>  
>      def fill_kconfig_tooltip(self):
>          if self.auto_patch:
> -            self.__patch_file("Kconfig",
> -                            "# Add new monitors here",
> +            # monitors with a container should stay together in the Kconfig
> +            self._patch_file("Kconfig",
> +                            "# Add new %smonitors here" %
> +                              (self.parent + " " if self.parent else ""),

This one must be kept in sync with container_marker in
Container.fill_kconfig_tooltip(). I think this is hard to maintain later
on.

How about we keep in centralized with a helper function, something like:

def container_marker(container: str) -> str:
    return f"# Add new {container} monitors here\n"

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit
  2025-07-15  7:14 ` [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit Gabriele Monaco
@ 2025-07-15 15:01   ` Nam Cao
       [not found]     ` <d69862275becf1d296c80a08b29b2081857a85a1.camel@redhat.com>
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-15 15:01 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Tue, Jul 15, 2025 at 09:14:26AM +0200, Gabriele Monaco wrote:
> From: Nam Cao <namcao@linutronix.de>

You seem to take a slightly different solution. Feel free to take my name
off the author field, Co-developed-by: or Suggested-by: would be fine. It's
up to you, doesn't really matter to me.

> The dot2c.py script generates all states in a single line. This breaks the
> 100 column limit when the state machines are non-trivial.
> 
> Change dot2c.py to generate the states in separate lines in case the
> generated line is going to be too long.
> 
> Co-authored-by: Gabriele Monaco <gmonaco@redhat.com>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  tools/verification/rvgen/rvgen/dot2c.py | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/verification/rvgen/rvgen/dot2c.py b/tools/verification/rvgen/rvgen/dot2c.py
> index 6009caf568d92..5216d98ae21c8 100644
> --- a/tools/verification/rvgen/rvgen/dot2c.py
> +++ b/tools/verification/rvgen/rvgen/dot2c.py
> @@ -152,28 +152,29 @@ class Dot2c(Automata):
>          max_state_name = max(self.states, key = len).__len__()
>          return max(max_state_name, self.invalid_state_str.__len__())
>  
> -    def __get_state_string_length(self):
> -        maxlen = self.__get_max_strlen_of_states() + self.enum_suffix.__len__()
> -        return "%" + str(maxlen) + "s"
> -
>      def get_aut_init_function(self):
>          nr_states = self.states.__len__()
>          nr_events = self.events.__len__()
>          buff = []
>  
> -        strformat = self.__get_state_string_length()
> -
> +        maxlen = self.__get_max_strlen_of_states() + len(self.enum_suffix)
> +        # account for tabs and spaces/punctuation for each event
> +        linetoolong = 16 + (maxlen + 3) * nr_events >= self.line_length

I managed to figure out 16 is the indentation. But I failed to understand
where is this '3' from.

Can you please add some comments for these magic numbers? Or better, assign
them to variables with self-explanatory names.

Best regards,
Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 10/17] rv: Fix generated files going over 100 column limit
  2025-07-15  7:14 ` [PATCH v3 10/17] rv: " Gabriele Monaco
@ 2025-07-15 15:08   ` Nam Cao
  2025-07-15 15:24     ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-15 15:08 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:27AM +0200, Gabriele Monaco wrote:
> The dot2c.py script generates all states in a single line. This breaks the
> 100 column limit when the state machines are non-trivial.
> Recent changes allow it to print states over multiple lines if the
> resulting line would have been too long.
> 
> Adapt existing monitors with line length over the limit.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  kernel/trace/rv/monitors/sco/sco.h     | 12 ++++++++++--
>  kernel/trace/rv/monitors/snep/snep.h   | 14 ++++++++++++--
>  kernel/trace/rv/monitors/snroc/snroc.h | 12 ++++++++++--
>  3 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/trace/rv/monitors/sco/sco.h b/kernel/trace/rv/monitors/sco/sco.h
> index 7a4c1f2d5ca1c..83ca9a03331af 100644
> --- a/kernel/trace/rv/monitors/sco/sco.h
> +++ b/kernel/trace/rv/monitors/sco/sco.h
> @@ -39,8 +39,16 @@ static const struct automaton_sco automaton_sco = {
>  		"schedule_exit"
>  	},
>  	.function = {
> -		{     thread_context_sco, scheduling_context_sco,          INVALID_STATE },
> -		{          INVALID_STATE,          INVALID_STATE,     thread_context_sco },
> +		{
> +			thread_context_sco,
> +			scheduling_context_sco,
> +			INVALID_STATE
> +		},
> +		{
> +			INVALID_STATE,
> +			INVALID_STATE,
> +			thread_context_sco
> +		},

I'm confused, these lines were not over 100 columns. Same for snroc.

From my understanding of the previous patch, the script does not break
lines which are not over the limit. Did I miss something?

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-15  7:14 ` [PATCH v3 11/17] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-07-15 15:23   ` Nam Cao
  2025-07-16  8:20     ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-15 15:23 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Tue, Jul 15, 2025 at 09:14:28AM +0200, Gabriele Monaco wrote:
>  static inline bool										\
>  da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
>  {												\
> -	type curr_state = da_monitor_curr_state_##name(da_mon);					\
> -	type next_state = model_get_next_state_##name(curr_state, event);			\
> -												\
> -	if (next_state != INVALID_STATE) {							\
> -		da_monitor_set_state_##name(da_mon, next_state);				\
> -												\
> -		trace_event_##name(model_get_state_name_##name(curr_state),			\
> -				   model_get_event_name_##name(event),				\
> -				   model_get_state_name_##name(next_state),			\
> -				   model_is_final_state_##name(next_state));			\
> -												\
> -		return true;									\
> +	enum states_##name curr_state, next_state;						\
> +												\
> +	curr_state = READ_ONCE(da_mon->curr_state);						\
> +	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
> +		next_state = model_get_next_state_##name(curr_state, event);			\
> +		if (next_state == INVALID_STATE)						\
> +			goto out_react;								\
> +		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state)))		\
> +			goto out_success;							\
>  	}											\
> +	/* Special invalid transition if we run out of retries. */				\
> +	curr_state = INVALID_STATE;								\
>  												\
> +out_react:											\
>  	cond_react_##name(curr_state, event);							\
>  												\
>  	trace_error_##name(model_get_state_name_##name(curr_state),				\
>  			   model_get_event_name_##name(event));					\

If I understand correctly, if after 3 tries and we still fail to change the
state, we will invoke the reactor and trace_error? Doesn't that cause a
false positive? Because it is not a violation of the model, it is just a
race making us fail to change the state.

Same below.

Also, I wouldn't use goto unless necessary. Perhaps it is better to put the
code at "out_react:" and "out_success:" into the loop. But that's just my
personal preference, up to you.

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 10/17] rv: Fix generated files going over 100 column limit
  2025-07-15 15:08   ` Nam Cao
@ 2025-07-15 15:24     ` Gabriele Monaco
  2025-07-16  8:13       ` Nam Cao
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-15 15:24 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur



On Tue, 2025-07-15 at 17:08 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:27AM +0200, Gabriele Monaco wrote:
> > The dot2c.py script generates all states in a single line. This
> > breaks the
> > 100 column limit when the state machines are non-trivial.
> > Recent changes allow it to print states over multiple lines if the
> > resulting line would have been too long.
> > 
> > Adapt existing monitors with line length over the limit.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  kernel/trace/rv/monitors/sco/sco.h     | 12 ++++++++++--
> >  kernel/trace/rv/monitors/snep/snep.h   | 14 ++++++++++++--
> >  kernel/trace/rv/monitors/snroc/snroc.h | 12 ++++++++++--
> >  3 files changed, 32 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/trace/rv/monitors/sco/sco.h
> > b/kernel/trace/rv/monitors/sco/sco.h
> > index 7a4c1f2d5ca1c..83ca9a03331af 100644
> > --- a/kernel/trace/rv/monitors/sco/sco.h
> > +++ b/kernel/trace/rv/monitors/sco/sco.h
> > @@ -39,8 +39,16 @@ static const struct automaton_sco automaton_sco
> > = {
> >  		"schedule_exit"
> >  	},
> >  	.function = {
> > -		{     thread_context_sco,
> > scheduling_context_sco,          INVALID_STATE },
> > -		{          INVALID_STATE,         
> > INVALID_STATE,     thread_context_sco },
> > +		{
> > +			thread_context_sco,
> > +			scheduling_context_sco,
> > +			INVALID_STATE
> > +		},
> > +		{
> > +			INVALID_STATE,
> > +			INVALID_STATE,
> > +			thread_context_sco
> > +		},
> 
> I'm confused, these lines were not over 100 columns. Same for snroc.
> 
> From my understanding of the previous patch, the script does not
> break
> lines which are not over the limit. Did I miss something?

Right, I didn't make it obvious in the commit description since I
thought it wasn't too important.
Those are the monitors whose lines are going to be longer than 100
columns later in the series.

Changing it there saves a bit of complication in the next patches,
where I only add lines for new events instead of splitting the line
/and/ adding the events.

Do you think I should mention this in the commit description?

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors
  2025-07-15 14:48   ` Nam Cao
@ 2025-07-16  7:40     ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  7:40 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur



On Tue, 2025-07-15 at 16:48 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:25AM +0200, Gabriele Monaco wrote:
> > The current behaviour of rvgen when running with the -a option is
> > to
> > append the necessary lines at the end of the configuration for
> > Kconfig,
> > Makefile and tracepoints.
> > This is not always the desired behaviour in case of nested
> > monitors:
> > while tracepoints are not affected by nesting and the Makefile's
> > only
> > requirement is that the parent monitor is built before its
> > children, in
> > the Kconfig it is better to have children defined right after their
> > parent, otherwise the result has wrong indentation:
> > 
> > [*]   foo_parent monitor
> > [*]     foo_child1 monitor
> > [*]     foo_child2 monitor
> > [*]   bar_parent monitor
> > [*]     bar_child1 monitor
> > [*]     bar_child2 monitor
> > [*]   foo_child3 monitor
> > [*]   foo_child4 monitor
> > 
> > Adapt rvgen to look for a different marker for nested monitors in
> > the
> > Kconfig file and append the line right after the last sibling,
> > instead
> > of the last monitor.
> > Also add the marker when creating a new parent monitor.
> > 
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  kernel/trace/rv/Kconfig                     |  5 +++++
> >  tools/verification/rvgen/rvgen/container.py | 13 +++++++++++++
> >  tools/verification/rvgen/rvgen/generator.py | 13 ++++++++-----
> >  3 files changed, 26 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
> > index c11bf7e61ebf0..26017378f79b8 100644
> > --- a/kernel/trace/rv/Kconfig
> > +++ b/kernel/trace/rv/Kconfig
> > @@ -43,6 +43,7 @@ config RV_PER_TASK_MONITORS
> >  
> >  source "kernel/trace/rv/monitors/wip/Kconfig"
> >  source "kernel/trace/rv/monitors/wwnr/Kconfig"
> > +
> >  source "kernel/trace/rv/monitors/sched/Kconfig"
> >  source "kernel/trace/rv/monitors/tss/Kconfig"
> >  source "kernel/trace/rv/monitors/sco/Kconfig"
> > @@ -50,9 +51,13 @@ source "kernel/trace/rv/monitors/snroc/Kconfig"
> >  source "kernel/trace/rv/monitors/scpd/Kconfig"
> >  source "kernel/trace/rv/monitors/snep/Kconfig"
> >  source "kernel/trace/rv/monitors/sncid/Kconfig"
> > +# Add new sched monitors here
> > +
> >  source "kernel/trace/rv/monitors/rtapp/Kconfig"
> >  source "kernel/trace/rv/monitors/pagefault/Kconfig"
> >  source "kernel/trace/rv/monitors/sleep/Kconfig"
> > +# Add new rtapp monitors here
> > +
> >  # Add new monitors here
> >  
> >  config RV_REACTORS
> > diff --git a/tools/verification/rvgen/rvgen/container.py
> > b/tools/verification/rvgen/rvgen/container.py
> > index 47d8ab2ad3ec4..fee493f89fde6 100644
> > --- a/tools/verification/rvgen/rvgen/container.py
> > +++ b/tools/verification/rvgen/rvgen/container.py
> > @@ -20,3 +20,16 @@ class Container(generator.RVGenerator):
> >          main_h = self.main_h
> >          main_h = main_h.replace("%%MODEL_NAME%%", self.name)
> >          return main_h
> > +
> > +    def fill_kconfig_tooltip(self):
> > +        """Override to produce a marker for this container in the
> > Kconfig"""
> > +        container_marker = f"# Add new {self.name} monitors
> > here\n"
> > +        if self.auto_patch:
> > +            self._patch_file("Kconfig",
> > +                            "# Add new monitors here", "")
> 
> Isn't this one excessive? I gave it a try, but I had double blank
> line:
> 
> python3 tools/verification/rvgen -a container -n hello
> 
>    61 source "kernel/trace/rv/monitors/sleep/Kconfig"
>    62 # Add new rtapp monitors here
>    63 
> +  64 
> +  65 source "kernel/trace/rv/monitors/hello/Kconfig"
> +  66 # Add new hello monitors here
> +  67 
>    68 # Add new monitors here
>    69 
>    70 config RV_REACTORS
> 

Yeah indeed it is, thanks for finding out, it somehow slipped my
tests..

> > +        result = super().fill_kconfig_tooltip()
> > +        if self.auto_patch:
> > +            self._patch_file("Kconfig",
> > +                            "# Add new monitors here",
> > container_marker)
> > +            return result
> > +        return result + container_marker
> > diff --git a/tools/verification/rvgen/rvgen/generator.py
> > b/tools/verification/rvgen/rvgen/generator.py
> > index 19d0078a38032..ac97c4505ff00 100644
> > --- a/tools/verification/rvgen/rvgen/generator.py
> > +++ b/tools/verification/rvgen/rvgen/generator.py
> > @@ -137,7 +137,8 @@ class RVGenerator:
> >          kconfig = kconfig.replace("%%MONITOR_DEPS%%",
> > monitor_deps)
> >          return kconfig
> >  
> > -    def __patch_file(self, file, marker, line):
> > +    def _patch_file(self, file, marker, line):
> > +        assert(self.auto_patch)
> >          file_to_patch = os.path.join(self.rv_dir, file)
> >          content = self._read_file(file_to_patch)
> >          content = content.replace(marker, line + "\n" + marker)
> > @@ -146,7 +147,7 @@ class RVGenerator:
> >      def fill_tracepoint_tooltip(self):
> >          monitor_class_type = self.fill_monitor_class_type()
> >          if self.auto_patch:
> > -            self.__patch_file("rv_trace.h",
> > +            self._patch_file("rv_trace.h",
> >                              "// Add new monitors based on
> > CONFIG_%s here" % monitor_class_type,
> >                              "#include <monitors/%s/%s_trace.h>" %
> > (self.name, self.name))
> >              return "  - Patching %s/rv_trace.h, double check the
> > result" % self.rv_dir
> > @@ -158,8 +159,10 @@ Add this line where other tracepoints are
> > included and %s is defined:
> >  
> >      def fill_kconfig_tooltip(self):
> >          if self.auto_patch:
> > -            self.__patch_file("Kconfig",
> > -                            "# Add new monitors here",
> > +            # monitors with a container should stay together in
> > the Kconfig
> > +            self._patch_file("Kconfig",
> > +                            "# Add new %smonitors here" %
> > +                              (self.parent + " " if self.parent
> > else ""),
> 
> This one must be kept in sync with container_marker in
> Container.fill_kconfig_tooltip(). I think this is hard to maintain
> later
> on.
> 
> How about we keep in centralized with a helper function, something
> like:
> 
> def container_marker(container: str) -> str:
>     return f"# Add new {container} monitors here\n"
> 

Good point, will do.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 10/17] rv: Fix generated files going over 100 column limit
  2025-07-15 15:24     ` Gabriele Monaco
@ 2025-07-16  8:13       ` Nam Cao
  2025-07-16  9:07         ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-16  8:13 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 05:24:11PM +0200, Gabriele Monaco wrote:
> Right, I didn't make it obvious in the commit description since I
> thought it wasn't too important.
> Those are the monitors whose lines are going to be longer than 100
> columns later in the series.
> 
> Changing it there saves a bit of complication in the next patches,
> where I only add lines for new events instead of splitting the line
> /and/ adding the events.

Ah, that makes sense. I thought the script went wild.

> Do you think I should mention this in the commit description?

A maintainer once told me that the rule is "one thing per patch", not "half
a thing per patch".

This does make the diff easier to read in the later patch, but I think it
is not worth it. For this case, I can easily review the later patches by
regenerate the files.

So I suggest dropping these.

Best regards,
Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 07/17] rv: Adjust monitor dependencies
  2025-07-15  7:14 ` [PATCH v3 07/17] rv: Adjust monitor dependencies Gabriele Monaco
@ 2025-07-16  8:19   ` Nam Cao
  2025-07-16  8:30     ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-16  8:19 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:24AM +0200, Gabriele Monaco wrote:
> RV monitors relying on the preemptirqs tracepoints are set as dependent
> on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
> enable the tracepoints but are not the minimal configurations enabling
> them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
> manually).
> 
> Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
> monitors.
> 
> Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

I still dislike this. TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS are not
selectable manually, making it hard to enable the monitors. You would need
to manually enable PREEMPT_TRACER and IRQSOFF_TRACER.

I prefer "select" instead.

But well, if you insist on doing it this way. It is not broken, it is just
inconvenient to configure.

Acked-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-15 15:23   ` Nam Cao
@ 2025-07-16  8:20     ` Gabriele Monaco
  2025-07-16  8:27       ` Nam Cao
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  8:20 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur



On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:28AM +0200, Gabriele Monaco wrote:
> >  static inline
> > bool										\
> >  da_event_##name(struct da_monitor *da_mon, enum events_##name
> > event)				\
> >  {								
> > 				\
> > -	type curr_state =
> > da_monitor_curr_state_##name(da_mon);					\
> > -	type next_state = model_get_next_state_##name(curr_state,
> > event);			\
> > -
> > 												\
> > -	if (next_state != INVALID_STATE)
> > {							\
> > -		da_monitor_set_state_##name(da_mon,
> > next_state);				\
> > -
> > 												\
> > -
> > 		trace_event_##name(model_get_state_name_##name(curr_state),			\
> > -				  
> > model_get_event_name_##name(event),				\
> > -				  
> > model_get_state_name_##name(next_state),			\
> > -				  
> > model_is_final_state_##name(next_state));			\
> > -
> > 												\
> > -		return
> > true;									\
> > +	enum states_##name curr_state,
> > next_state;						\
> > +								
> > 				\
> > +	curr_state = READ_ONCE(da_mon-
> > >curr_state);						\
> > +	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++)
> > {					\
> > +		next_state =
> > model_get_next_state_##name(curr_state, event);			\
> > +		if (next_state ==
> > INVALID_STATE)						\
> > +			goto
> > out_react;								\
> > +		if (likely(try_cmpxchg(&da_mon->curr_state,
> > &curr_state, next_state)))		\
> > +			goto
> > out_success;							\
> >  	}							
> > 				\
> > +	/* Special invalid transition if we run out of retries.
> > */				\
> > +	curr_state =
> > INVALID_STATE;								\
> >  								
> > 				\
> > +out_react:							
> > 				\
> >  	cond_react_##name(curr_state,
> > event);							\
> >  								
> > 				\
> >  	trace_error_##name(model_get_state_name_##name(curr_state)
> > ,				\
> >  			  
> > model_get_event_name_##name(event));					\
> 
> If I understand correctly, if after 3 tries and we still fail to
> change the
> state, we will invoke the reactor and trace_error? Doesn't that cause
> a
> false positive? Because it is not a violation of the model, it is
> just a
> race making us fail to change the state.
> 

Yes, that's correct.
My rationale was that, at that point, the monitor is likely no longer
in sync, so silently ignoring the situation is not really an option.
In this case, the reaction includes an invalid current state (because
in fact we don't know what the current state is) and tools may be able
to understand that. I know you wouldn't be able to do that in LTL..
By the way, LTL uses multiple statuses, so this lockless approach may
not really work.

I don't see this situation happening often: I only ever observed 2
events able to race, 4 happening at the same time is wild, but of
course cannot be excluded in principle for any possible monitor.
Yet, I have the feeling a monitor where this can happen is not well
designed and RV should point that out.
Do you have ideas of potential monitors where more than 3 events can
race?

Perhaps a full blown reaction is a bit aggressive in this situation, as
the /fault/ may not be necessarily in the monitor.
We could think of a special tracepoint or just printing.

> Same below.
> 
> Also, I wouldn't use goto unless necessary. Perhaps it is better to
> put the
> code at "out_react:" and "out_success:" into the loop. But that's
> just my
> personal preference, up to you.

That could be done if we do a whole different thing when retries run
out, instead of defaulting to out_react.
I liked to avoid excessive indentation with those goto as well but
yeah, it may not be quite necessary.

I'll have a deeper thought on this.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-16  8:20     ` Gabriele Monaco
@ 2025-07-16  8:27       ` Nam Cao
  2025-07-16  8:38         ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-16  8:27 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Wed, Jul 16, 2025 at 10:20:39AM +0200, Gabriele Monaco wrote:
> On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> > If I understand correctly, if after 3 tries and we still fail to
> > change the
> > state, we will invoke the reactor and trace_error? Doesn't that cause
> > a
> > false positive? Because it is not a violation of the model, it is
> > just a
> > race making us fail to change the state.
> > 
> 
> Yes, that's correct.
> My rationale was that, at that point, the monitor is likely no longer
> in sync, so silently ignoring the situation is not really an option.
> In this case, the reaction includes an invalid current state (because
> in fact we don't know what the current state is) and tools may be able
> to understand that.

Can't you bring the monitor back to the init state, and start over again?

I think "da_mon->monitoring = 0;" does the trick?

> I know you wouldn't be able to do that in LTL..  By the way, LTL uses
> multiple statuses, so this lockless approach may not really work.

Let's worry about one thing at a time ;)

> I don't see this situation happening often: I only ever observed 2
> events able to race, 4 happening at the same time is wild, but of
> course cannot be excluded in principle for any possible monitor.
> Yet, I have the feeling a monitor where this can happen is not well
> designed and RV should point that out.
> Do you have ideas of potential monitors where more than 3 events can
> race?
> 
> Perhaps a full blown reaction is a bit aggressive in this situation, as
> the /fault/ may not be necessarily in the monitor.
> We could think of a special tracepoint or just printing.
> 
> > Same below.
> > 
> > Also, I wouldn't use goto unless necessary. Perhaps it is better to
> > put the
> > code at "out_react:" and "out_success:" into the loop. But that's
> > just my
> > personal preference, up to you.
> 
> That could be done if we do a whole different thing when retries run
> out, instead of defaulting to out_react.
> I liked to avoid excessive indentation with those goto as well but
> yeah, it may not be quite necessary.

Sure, as I said before, "just my personal preference, up to you."

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 07/17] rv: Adjust monitor dependencies
  2025-07-16  8:19   ` Nam Cao
@ 2025-07-16  8:30     ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  8:30 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur



On Wed, 2025-07-16 at 10:19 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:24AM +0200, Gabriele Monaco wrote:
> > RV monitors relying on the preemptirqs tracepoints are set as
> > dependent
> > on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations
> > do
> > enable the tracepoints but are not the minimal configurations
> > enabling
> > them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not
> > selectable
> > manually).
> > 
> > Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
> > monitors.
> > 
> > Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu
> > monitors")
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> 
> I still dislike this. TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS are not
> selectable manually, making it hard to enable the monitors. You would
> need
> to manually enable PREEMPT_TRACER and IRQSOFF_TRACER.
> 

Right, they cannot but they are sufficient. For instance the debug
version of several distribution kernels has PROVE_LOCKING enabled but
not TRACE_PREEMPT_TOGGLE. Both of them set TRACE_IRQFLAGS, so the
monitor should be available on those builds.

> I prefer "select" instead.
> 
> But well, if you insist on doing it this way. It is not broken, it is
> just
> inconvenient to configure.

We had an internal discussion and my approach seems still better from a
distro maintainer/reviewer perspective.
Anyway, let's agree to disagree :)

> 
> Acked-by: Nam Cao <namcao@linutronix.de>

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-16  8:27       ` Nam Cao
@ 2025-07-16  8:38         ` Gabriele Monaco
  2025-07-16  8:45           ` Nam Cao
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  8:38 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur



On Wed, 2025-07-16 at 10:27 +0200, Nam Cao wrote:
> On Wed, Jul 16, 2025 at 10:20:39AM +0200, Gabriele Monaco wrote:
> > On Tue, 2025-07-15 at 17:23 +0200, Nam Cao wrote:
> > > If I understand correctly, if after 3 tries and we still fail to
> > > change the
> > > state, we will invoke the reactor and trace_error? Doesn't that
> > > cause
> > > a
> > > false positive? Because it is not a violation of the model, it is
> > > just a
> > > race making us fail to change the state.
> > > 
> > 
> > Yes, that's correct.
> > My rationale was that, at that point, the monitor is likely no
> > longer
> > in sync, so silently ignoring the situation is not really an
> > option.
> > In this case, the reaction includes an invalid current state
> > (because
> > in fact we don't know what the current state is) and tools may be
> > able
> > to understand that.
> 
> Can't you bring the monitor back to the init state, and start over
> again?
> 
> I think "da_mon->monitoring = 0;" does the trick?
> 

Yes you can, but I wouldn't do so silently.
I'd say the cleanest approach without reaction is to still return false
for the system to do all the cleanup but trace the event or, at the
very least, print a warning.

But you're right, this is more relevant for who develops the monitor
rather than for the user, so should probably be tracked separately.

Thanks,
Gabriele

> > I know you wouldn't be able to do that in LTL..  By the way, LTL
> > uses
> > multiple statuses, so this lockless approach may not really work.
> 
> Let's worry about one thing at a time ;)
> 
> > I don't see this situation happening often: I only ever observed 2
> > events able to race, 4 happening at the same time is wild, but of
> > course cannot be excluded in principle for any possible monitor.
> > Yet, I have the feeling a monitor where this can happen is not well
> > designed and RV should point that out.
> > Do you have ideas of potential monitors where more than 3 events
> > can
> > race?
> > 
> > Perhaps a full blown reaction is a bit aggressive in this
> > situation, as
> > the /fault/ may not be necessarily in the monitor.
> > We could think of a special tracepoint or just printing.
> > 
> > > Same below.
> > > 
> > > Also, I wouldn't use goto unless necessary. Perhaps it is better
> > > to
> > > put the
> > > code at "out_react:" and "out_success:" into the loop. But that's
> > > just my
> > > personal preference, up to you.
> > 
> > That could be done if we do a whole different thing when retries
> > run
> > out, instead of defaulting to out_react.
> > I liked to avoid excessive indentation with those goto as well but
> > yeah, it may not be quite necessary.
> 
> Sure, as I said before, "just my personal preference, up to you."
> 
> Nam


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-16  8:38         ` Gabriele Monaco
@ 2025-07-16  8:45           ` Nam Cao
  2025-07-16  8:59             ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-16  8:45 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 10:27 +0200, Nam Cao wrote:
> > Can't you bring the monitor back to the init state, and start over
> > again?
> > 
> > I think "da_mon->monitoring = 0;" does the trick?
> > 
> 
> Yes you can, but I wouldn't do so silently.

Why not? The absolute worst that we get, is the rare case where a bug
appears at the exact same time. In that case, we would get a false
negative.

And I think that is really really rare.

> I'd say the cleanest approach without reaction is to still return false
> for the system to do all the cleanup but trace the event or, at the
> very least, print a warning.
> 
> But you're right, this is more relevant for who develops the monitor
> rather than for the user, so should probably be tracked separately.

Yes, if you really want to emit some sort of warning here, it should be
absolutely clear that the monitor itself is having a hiccup, not the
monitored kernel.

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-16  8:45           ` Nam Cao
@ 2025-07-16  8:59             ` Gabriele Monaco
  2025-07-16  9:02               ` Nam Cao
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  8:59 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur



On Wed, 2025-07-16 at 10:45 +0200, Nam Cao wrote:
> On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> > Yes you can, but I wouldn't do so silently.
> 
> Why not? The absolute worst that we get, is the rare case where a bug
> appears at the exact same time. In that case, we would get a false
> negative.
> 
> And I think that is really really rare.

Well, we wouldn't even know how rare it is because we don't track it!

It may be a harmless note but might also be a design flaw in the
monitor, so it should be possible to understand when it happens.

> Yes, if you really want to emit some sort of warning here, it should
> be
> absolutely clear that the monitor itself is having a hiccup, not the
> monitored kernel.

Yeah totally, not saying that other errors reported in the wild are
going to be necessarily the kernel's fault though ;)
But let's keep these cases separate going forward nonetheless.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 11/17] rv: Retry when da monitor detects race conditions
  2025-07-16  8:59             ` Gabriele Monaco
@ 2025-07-16  9:02               ` Nam Cao
  0 siblings, 0 replies; 57+ messages in thread
From: Nam Cao @ 2025-07-16  9:02 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Wed, Jul 16, 2025 at 10:59:44AM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 10:45 +0200, Nam Cao wrote:
> > On Wed, Jul 16, 2025 at 10:38:11AM +0200, Gabriele Monaco wrote:
> > > Yes you can, but I wouldn't do so silently.
> > 
> > Why not? The absolute worst that we get, is the rare case where a bug
> > appears at the exact same time. In that case, we would get a false
> > negative.
> > 
> > And I think that is really really rare.
> 
> Well, we wouldn't even know how rare it is because we don't track it!
> 
> It may be a harmless note but might also be a design flaw in the
> monitor, so it should be possible to understand when it happens.

Oh, so you just want to track when this happens, as a RV developer. That
makes sense.

Just make sure it won't confuse other people who use the monitors to test
the kernel.

Best regards,
Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 10/17] rv: Fix generated files going over 100 column limit
  2025-07-16  8:13       ` Nam Cao
@ 2025-07-16  9:07         ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16  9:07 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur



On Wed, 2025-07-16 at 10:13 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 05:24:11PM +0200, Gabriele Monaco wrote:
> > Right, I didn't make it obvious in the commit description since I
> > thought it wasn't too important.
> > Those are the monitors whose lines are going to be longer than 100
> > columns later in the series.
> > 
> > Changing it there saves a bit of complication in the next patches,
> > where I only add lines for new events instead of splitting the line
> > /and/ adding the events.
> 
> Ah, that makes sense. I thought the script went wild.
> 
> > Do you think I should mention this in the commit description?
> 
> A maintainer once told me that the rule is "one thing per patch", not
> "half
> a thing per patch".
> 

Sounds like someone familiar, good point :)

> This does make the diff easier to read in the later patch, but I
> think it
> is not worth it. For this case, I can easily review the later patches
> by
> regenerate the files.
> 
> So I suggest dropping these.

Alright then, I'm usually finding the diffs in emails confusing so I
prefer to have as much help as possible. But here it shouldn't be a big
deal indeed.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit
       [not found]     ` <d69862275becf1d296c80a08b29b2081857a85a1.camel@redhat.com>
@ 2025-07-16  9:34       ` Nam Cao
  0 siblings, 0 replies; 57+ messages in thread
From: Nam Cao @ 2025-07-16  9:34 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Peter Zijlstra, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

On Wed, Jul 16, 2025 at 11:27:54AM +0200, Gabriele Monaco wrote:
> On Tue, 2025-07-15 at 17:01 +0200, Nam Cao wrote:
> > On Tue, Jul 15, 2025 at 09:14:26AM +0200, Gabriele Monaco wrote:
> > > From: Nam Cao <namcao@linutronix.de>
> > > -        strformat = self.__get_state_string_length()
> > > -
> > > +        maxlen = self.__get_max_strlen_of_states() +
> > > len(self.enum_suffix)
> > > +        # account for tabs and spaces/punctuation for each event
> > > +        linetoolong = 16 + (maxlen + 3) * nr_events >=
> > > self.line_length
> > 
> > I managed to figure out 16 is the indentation. But I failed to
> > understand
> > where is this '3' from.
> > 
> > Can you please add some comments for these magic numbers? Or better,
> > assign
> > them to variables with self-explanatory names.
> 
> Turns out it was wrong ;)
> I'll fix it and make it clear.

Hah! Got you.

I'm guessing that it was supposed to be 2. I wait for the reveal in v4.

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 17/17] rv: Add opid per-cpu monitor
  2025-07-15  7:14 ` [PATCH v3 17/17] rv: Add opid per-cpu monitor Gabriele Monaco
@ 2025-07-16  9:38   ` Nam Cao
  2025-07-16 10:00     ` Gabriele Monaco
  2025-07-18 10:26     ` Gabriele Monaco
  0 siblings, 2 replies; 57+ messages in thread
From: Nam Cao @ 2025-07-16  9:38 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:34AM +0200, Gabriele Monaco wrote:
> diff --git a/kernel/trace/rv/monitors/nrp/Kconfig b/kernel/trace/rv/monitors/nrp/Kconfig
> index f37ff70e8d204..a175c430d351f 100644
> --- a/kernel/trace/rv/monitors/nrp/Kconfig
> +++ b/kernel/trace/rv/monitors/nrp/Kconfig
> @@ -3,7 +3,7 @@
>  config RV_MON_NRP
>  	depends on RV
>  	depends on RV_MON_SCHED
> -	default y if !ARCH_ARM64
> +	default y if !ARM64

I think this is not supposed to be in this patch? It has nothing to do with
the opid monitor.

>  	select DA_MON_EVENTS_ID
>  	bool "nrp monitor"
>  	help
> diff --git a/kernel/trace/rv/monitors/opid/Kconfig b/kernel/trace/rv/monitors/opid/Kconfig
> new file mode 100644
> index 0000000000000..23b43d2704153
> --- /dev/null
> +++ b/kernel/trace/rv/monitors/opid/Kconfig
> @@ -0,0 +1,17 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +config RV_MON_OPID
> +	depends on RV
> +	depends on TRACE_IRQFLAGS
> +	depends on TRACE_PREEMPT_TOGGLE
> +	depends on RV_MON_SCHED
> +	default y if PREEMPT_RT
> +	select DA_MON_EVENTS_IMPLICIT

Shouldn't we add "depends on PREEMPT_RT"? I tried this monitor on
non-RT x86 kernel, and got some errors. That could confuse people.

And the monitor reports some errors on riscv64 with PREEMPT_RT=y:

root@riscv:~/rv-tests# uname -a
Linux riscv 6.16.0-rc6-00054-g7590637d9ca2 #87 SMP PREEMPT_RT Wed Jul 16 11:26:00 CEST 2025 riscv64 GNU/Linux
root@riscv:~/rv-tests# stress-ng --cpu-sched -1
stress-ng: info:  [452] defaulting to a 1 day run per stressor
stress-ng: info:  [452] dispatching hogs: 4 cpu-sched
[  614.390462] rv: monitor opid does not allow event irq_entry on state in_irq

Best regards,
Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 00/17] rv: Add monitors to validate task switch
  2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (16 preceding siblings ...)
  2025-07-15  7:14 ` [PATCH v3 17/17] rv: Add opid per-cpu monitor Gabriele Monaco
@ 2025-07-16  9:42 ` Nam Cao
  2025-07-16 12:20   ` Gabriele Monaco
  17 siblings, 1 reply; 57+ messages in thread
From: Nam Cao @ 2025-07-16  9:42 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:17AM +0200, Gabriele Monaco wrote:
> This series adds three monitors to the sched collection, extends and
> replaces previously existing monitors:

I looked at the patches that I understand. For the others, I lack the
background knowledge to review them, sorry.

I gave the series a test run, and beside the opid's errors, everything else
looks fine.

Looking forward to v4.

Nam

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 17/17] rv: Add opid per-cpu monitor
  2025-07-16  9:38   ` Nam Cao
@ 2025-07-16 10:00     ` Gabriele Monaco
  2025-07-18 10:26     ` Gabriele Monaco
  1 sibling, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 10:00 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur



On Wed, 2025-07-16 at 11:38 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:34AM +0200, Gabriele Monaco wrote:
> > diff --git a/kernel/trace/rv/monitors/nrp/Kconfig
> > b/kernel/trace/rv/monitors/nrp/Kconfig
> > index f37ff70e8d204..a175c430d351f 100644
> > --- a/kernel/trace/rv/monitors/nrp/Kconfig
> > +++ b/kernel/trace/rv/monitors/nrp/Kconfig
> > @@ -3,7 +3,7 @@
> >  config RV_MON_NRP
> >  	depends on RV
> >  	depends on RV_MON_SCHED
> > -	default y if !ARCH_ARM64
> > +	default y if !ARM64
> 
> I think this is not supposed to be in this patch? It has nothing to
> do with
> the opid monitor.

Damn, fixed up the wrong patch, will move it to the other one..

> 
> >  	select DA_MON_EVENTS_ID
> >  	bool "nrp monitor"
> >  	help
> > diff --git a/kernel/trace/rv/monitors/opid/Kconfig
> > b/kernel/trace/rv/monitors/opid/Kconfig
> > new file mode 100644
> > index 0000000000000..23b43d2704153
> > --- /dev/null
> > +++ b/kernel/trace/rv/monitors/opid/Kconfig
> > @@ -0,0 +1,17 @@
> > +# SPDX-License-Identifier: GPL-2.0-only
> > +#
> > +config RV_MON_OPID
> > +	depends on RV
> > +	depends on TRACE_IRQFLAGS
> > +	depends on TRACE_PREEMPT_TOGGLE
> > +	depends on RV_MON_SCHED
> > +	default y if PREEMPT_RT
> > +	select DA_MON_EVENTS_IMPLICIT
> 
> Shouldn't we add "depends on PREEMPT_RT"? I tried this monitor on
> non-RT x86 kernel, and got some errors. That could confuse people.

Mmh, my rationale was that it reports errors on non PREEMPT_RT, but it
does build. If someone wants to try it out there, they are free to do
it. We are just not supporting it officially.
The monitor might start working in the future also on non RT kernels,
or at least if someone wants to try whether it's the case, they can do
it easily.

Same idea for the ARM64 thing above.

But I should definitely mention this explicitly in the Kconfig entry
not to confuse people..

> 
> And the monitor reports some errors on riscv64 with PREEMPT_RT=y:
> 
> root@riscv:~/rv-tests# uname -a
> Linux riscv 6.16.0-rc6-00054-g7590637d9ca2 #87 SMP PREEMPT_RT Wed Jul
> 16 11:26:00 CEST 2025 riscv64 GNU/Linux
> root@riscv:~/rv-tests# stress-ng --cpu-sched -1
> stress-ng: info:  [452] defaulting to a 1 day run per stressor
> stress-ng: info:  [452] dispatching hogs: 4 cpu-sched
> [  614.390462] rv: monitor opid does not allow event irq_entry on
> state in_irq
> 

Mmh riscv.. I haven't tested it there, guess I need to start keeping a
VM somewhere.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-15  7:14 ` [PATCH v3 01/17] tools/rv: Do not skip idle in trace Gabriele Monaco
@ 2025-07-16 11:50   ` Peter Zijlstra
  2025-07-16 12:18     ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 11:50 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:18AM +0200, Gabriele Monaco wrote:
> Currently, the userspace RV tool skips trace events triggered by the RV
> tool itself, this can be changed by passing the parameter -s, which sets
> the variable config_my_pid to 0 (instead of the tool's PID).
> The current condition for per-task monitors (config_has_id) does not
> check that config_my_pid isn't 0 to skip. In case we pass -s, we show
> events triggered by RV but don't show those triggered by idle (PID 0).
> 
> Fix the condition to account this scenario.

The distinction between !my_pid and has_id is that you can in fact trace
pid-0 if you want?

> Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
>  tools/verification/rv/src/in_kernel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
> index c0dcee795c0de..72b03bae021cd 100644
> --- a/tools/verification/rv/src/in_kernel.c
> +++ b/tools/verification/rv/src/in_kernel.c
> @@ -429,7 +429,7 @@ ikm_event_handler(struct trace_seq *s, struct tep_record *record,
>  
>  	tep_get_common_field_val(s, trace_event, "common_pid", record, &pid, 1);
>  
> -	if (config_has_id && (config_my_pid == id))
> +	if (config_my_pid && config_has_id && (config_my_pid == id))
>  		return 0;
>  	else if (config_my_pid && (config_my_pid == pid))
>  		return 0;
> -- 
> 2.50.1
> 

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-16 11:50   ` Peter Zijlstra
@ 2025-07-16 12:18     ` Gabriele Monaco
  2025-07-16 12:41       ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 12:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur



On Wed, 2025-07-16 at 13:50 +0200, Peter Zijlstra wrote:
> On Tue, Jul 15, 2025 at 09:14:18AM +0200, Gabriele Monaco wrote:
> > Currently, the userspace RV tool skips trace events triggered by
> > the RV
> > tool itself, this can be changed by passing the parameter -s, which
> > sets
> > the variable config_my_pid to 0 (instead of the tool's PID).
> > The current condition for per-task monitors (config_has_id) does
> > not
> > check that config_my_pid isn't 0 to skip. In case we pass -s, we
> > show
> > events triggered by RV but don't show those triggered by idle (PID
> > 0).
> > 
> > Fix the condition to account this scenario.
> 
> The distinction between !my_pid and has_id is that you can in fact
> trace
> pid-0 if you want?
> 

Yes pretty much, no flag is meant to skip events from pid-0.


has_id is used to distinguish between per-cpu/global monitors (they
don't have id) and per-task monitors (the id is the pid).

The case with !has_id is correctly checking for both my_pid != 0 while
skipping events associated to my_pid (rv thread's PID).

In the other case we end up with:
* -s skipping events generated by the tool (correct)
* omitting -s skips events generated by pid-0 (undesired)


> > Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> >  tools/verification/rv/src/in_kernel.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/verification/rv/src/in_kernel.c
> > b/tools/verification/rv/src/in_kernel.c
> > index c0dcee795c0de..72b03bae021cd 100644
> > --- a/tools/verification/rv/src/in_kernel.c
> > +++ b/tools/verification/rv/src/in_kernel.c
> > @@ -429,7 +429,7 @@ ikm_event_handler(struct trace_seq *s, struct
> > tep_record *record,
> >  
> >  	tep_get_common_field_val(s, trace_event, "common_pid",
> > record, &pid, 1);
> >  
> > -	if (config_has_id && (config_my_pid == id))
> > +	if (config_my_pid && config_has_id && (config_my_pid ==
> > id))
> >  		return 0;
> >  	else if (config_my_pid && (config_my_pid == pid))
> >  		return 0;
> > -- 
> > 2.50.1
> > 


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 00/17] rv: Add monitors to validate task switch
  2025-07-16  9:42 ` [PATCH v3 00/17] rv: Add monitors to validate task switch Nam Cao
@ 2025-07-16 12:20   ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 12:20 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur



On Wed, 2025-07-16 at 11:42 +0200, Nam Cao wrote:
> On Tue, Jul 15, 2025 at 09:14:17AM +0200, Gabriele Monaco wrote:
> > This series adds three monitors to the sched collection, extends
> > and
> > replaces previously existing monitors:
> 
> I looked at the patches that I understand. For the others, I lack the
> background knowledge to review them, sorry.
> 
> I gave the series a test run, and beside the opid's errors,
> everything else
> looks fine.
> 
> Looking forward to v4.
> 

Thanks for the testing and review, was very helpful!

Cheers,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-15  7:14 ` [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
@ 2025-07-16 12:38   ` Peter Zijlstra
  2025-07-16 13:40     ` Gabriele Monaco
  2025-07-16 15:09     ` Gabriele Monaco
  0 siblings, 2 replies; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 12:38 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Tue, Jul 15, 2025 at 09:14:29AM +0200, Gabriele Monaco wrote:
> Add the following tracepoints:
> * sched_set_need_resched(tsk, cpu, tif)
>     Called when a task is set the need resched [lazy] flag
> * sched_switch_vain(preempt, tsk, tsk_state)
>     Called when a task is selected again during __schedule
>     i.e. prev == next == tsk : no real context switch

> @@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
>  	int flags = DEQUEUE_NOCLOCK;
>  
>  	if (signal_pending_state(task_state, p)) {
> +		trace_sched_set_state_tp(p, TASK_RUNNING, true);
>  		WRITE_ONCE(p->__state, TASK_RUNNING);
>  		*task_state_p = TASK_RUNNING;
>  		return false;

I'm confused on the purpose of this. How does this relate to say the
wakeup in signal_wake_up_state() ?

> @@ -6786,6 +6793,7 @@ static void __sched notrace __schedule(int sched_mode)
>  		rq = context_switch(rq, prev, next, &rf);
>  	} else {
>  		rq_unpin_lock(rq, &rf);
> +		trace_sched_switch_vain_tp(preempt, prev, prev_state);
>  		__balance_callbacks(rq);
>  		raw_spin_rq_unlock_irq(rq);
>  	}

Hurmph... don't you already have this covered by: trace_sched_exit_tp() ?

Specifically, the only case where is_switch := false, is this case.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-16 12:18     ` Gabriele Monaco
@ 2025-07-16 12:41       ` Peter Zijlstra
  2025-07-16 13:05         ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 12:41 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 02:18:28PM +0200, Gabriele Monaco wrote:
> 
> 
> On Wed, 2025-07-16 at 13:50 +0200, Peter Zijlstra wrote:
> > On Tue, Jul 15, 2025 at 09:14:18AM +0200, Gabriele Monaco wrote:
> > > Currently, the userspace RV tool skips trace events triggered by
> > > the RV
> > > tool itself, this can be changed by passing the parameter -s, which
> > > sets
> > > the variable config_my_pid to 0 (instead of the tool's PID).
> > > The current condition for per-task monitors (config_has_id) does
> > > not
> > > check that config_my_pid isn't 0 to skip. In case we pass -s, we
> > > show
> > > events triggered by RV but don't show those triggered by idle (PID
> > > 0).
> > > 
> > > Fix the condition to account this scenario.
> > 
> > The distinction between !my_pid and has_id is that you can in fact
> > trace
> > pid-0 if you want?
> > 
> 
> Yes pretty much, no flag is meant to skip events from pid-0.

> > > -	if (config_has_id && (config_my_pid == id))
> > > +	if (config_my_pid && config_has_id && (config_my_pid == id))

But should we then not write:

	if (config_has_id && (config_my_pid == id))


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-16 12:41       ` Peter Zijlstra
@ 2025-07-16 13:05         ` Gabriele Monaco
  2025-07-16 13:08           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 13:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Wed, 2025-07-16 at 14:41 +0200, Peter Zijlstra wrote:
> On Wed, Jul 16, 2025 at 02:18:28PM +0200, Gabriele Monaco wrote:
> > On Wed, 2025-07-16 at 13:50 +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 15, 2025 at 09:14:18AM +0200, Gabriele Monaco wrote:
> > > > Currently, the userspace RV tool skips trace events triggered by the RV
> > > > tool itself, this can be changed by passing the parameter -s, which
> > > > sets the variable config_my_pid to 0 (instead of the tool's PID). The
> > > > current condition for per-task monitors (config_has_id) does not check
> > > > that config_my_pid isn't 0 to skip. In case we pass -s, we show events
> > > > triggered by RV but don't show those triggered by idle (PID 0).
> > > 
> > > The distinction between !my_pid and has_id is that you can in fact trace
> > > pid-0 if you want?
> > > 
> > Yes pretty much, no flag is meant to skip events from pid-0.
> 
> > > > -	if (config_has_id && (config_my_pid == id))
> > > > +	if (config_my_pid && config_has_id && (config_my_pid == id))
> 
> But should we then not write:
> 
> 	if (config_has_id && (config_my_pid == id))

Sorry, got a bit confused, I flipped the two while describing:
* -s shows traces from RV but skips from pid-0 (unintended)
* omitting -s skips events from RV (correct)

If we are running a per-task monitor config_has_id is always true, we pass -s,
which makes config_my_pid = 0 (intended /not/ to skip RV).
Now when we are about to trace an event from idle (id=0), we skip it, although
we really shouldn't.
That's why we also needs to check for config_my_pid not being 0.

Does it make sense?

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-16 13:05         ` Gabriele Monaco
@ 2025-07-16 13:08           ` Peter Zijlstra
  2025-07-16 13:13             ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 13:08 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 03:05:50PM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 14:41 +0200, Peter Zijlstra wrote:
> > On Wed, Jul 16, 2025 at 02:18:28PM +0200, Gabriele Monaco wrote:
> > > On Wed, 2025-07-16 at 13:50 +0200, Peter Zijlstra wrote:
> > > > On Tue, Jul 15, 2025 at 09:14:18AM +0200, Gabriele Monaco wrote:
> > > > > Currently, the userspace RV tool skips trace events triggered by the RV
> > > > > tool itself, this can be changed by passing the parameter -s, which
> > > > > sets the variable config_my_pid to 0 (instead of the tool's PID). The
> > > > > current condition for per-task monitors (config_has_id) does not check
> > > > > that config_my_pid isn't 0 to skip. In case we pass -s, we show events
> > > > > triggered by RV but don't show those triggered by idle (PID 0).
> > > > 
> > > > The distinction between !my_pid and has_id is that you can in fact trace
> > > > pid-0 if you want?
> > > > 
> > > Yes pretty much, no flag is meant to skip events from pid-0.
> > 
> > > > > -	if (config_has_id && (config_my_pid == id))
> > > > > +	if (config_my_pid && config_has_id && (config_my_pid == id))
> > 
> > But should we then not write:
> > 
> > 	if (config_has_id && (config_my_pid == id))
> 
> Sorry, got a bit confused, I flipped the two while describing:
> * -s shows traces from RV but skips from pid-0 (unintended)
> * omitting -s skips events from RV (correct)
> 
> If we are running a per-task monitor config_has_id is always true, we pass -s,
> which makes config_my_pid = 0 (intended /not/ to skip RV).
> Now when we are about to trace an event from idle (id=0), we skip it, although
> we really shouldn't.
> That's why we also needs to check for config_my_pid not being 0.
> 
> Does it make sense?

Sorta, but would it not make sense to use has_pid := -1 for the invalid
case, instead of 0, which is a valid pid?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 01/17] tools/rv: Do not skip idle in trace
  2025-07-16 13:08           ` Peter Zijlstra
@ 2025-07-16 13:13             ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 13:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
	Nam Cao, Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Wed, 2025-07-16 at 15:08 +0200, Peter Zijlstra wrote:
> > > > > > -	if (config_has_id && (config_my_pid == id))
> > > > > > +	if (config_my_pid && config_has_id &&
> > > > > > (config_my_pid == id))
> > > 
> > > But should we then not write:
> > > 
> > > 	if (config_has_id && (config_my_pid == id))
> > 
> > Sorry, got a bit confused, I flipped the two while describing:
> > * -s shows traces from RV but skips from pid-0 (unintended)
> > * omitting -s skips events from RV (correct)
> > 
> > If we are running a per-task monitor config_has_id is always true,
> > we pass -s,
> > which makes config_my_pid = 0 (intended /not/ to skip RV).
> > Now when we are about to trace an event from idle (id=0), we skip
> > it, although
> > we really shouldn't.
> > That's why we also needs to check for config_my_pid not being 0.
> > 
> > Does it make sense?
> 
> Sorta, but would it not make sense to use has_pid := -1 for the
> invalid case, instead of 0, which is a valid pid?

Yeah that's another option, I reckon even cleaner since it's currently
misleading..


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 12:38   ` Peter Zijlstra
@ 2025-07-16 13:40     ` Gabriele Monaco
  2025-07-16 13:45       ` Peter Zijlstra
  2025-07-16 15:09     ` Gabriele Monaco
  1 sibling, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 13:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, 2025-07-16 at 14:38 +0200, Peter Zijlstra wrote:
> On Tue, Jul 15, 2025 at 09:14:29AM +0200, Gabriele Monaco wrote:
> > Add the following tracepoints:
> > * sched_set_need_resched(tsk, cpu, tif)
> >     Called when a task is set the need resched [lazy] flag
> > * sched_switch_vain(preempt, tsk, tsk_state)
> >     Called when a task is selected again during __schedule
> >     i.e. prev == next == tsk : no real context switch
> 
> > @@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq,
> > struct task_struct *p,
> >  	int flags = DEQUEUE_NOCLOCK;
> >  
> >  	if (signal_pending_state(task_state, p)) {
> > +		trace_sched_set_state_tp(p, TASK_RUNNING, true);
> >  		WRITE_ONCE(p->__state, TASK_RUNNING);
> >  		*task_state_p = TASK_RUNNING;
> >  		return false;
> 
> I'm confused on the purpose of this. How does this relate to say the
> wakeup in signal_wake_up_state() ?
> 
> > @@ -6786,6 +6793,7 @@ static void __sched notrace __schedule(int
> > sched_mode)
> >  		rq = context_switch(rq, prev, next, &rf);
> >  	} else {
> >  		rq_unpin_lock(rq, &rf);
> > +		trace_sched_switch_vain_tp(preempt, prev,
> > prev_state);
> >  		__balance_callbacks(rq);
> >  		raw_spin_rq_unlock_irq(rq);
> >  	}
> 
> Hurmph... don't you already have this covered by:
> trace_sched_exit_tp() ?
> 
> Specifically, the only case where is_switch := false, is this case.

Mostly, it may work in some cases, but sched_exit happens with
interrupt enabled while all types of switches (including the vain ones)
must occur with interrupt disabled.

Some assumptions don't stand without this tracepoint, but I guess I
could adapt monitors to live without this if you believe it's not worth
adding a new tracepoint there.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 13:40     ` Gabriele Monaco
@ 2025-07-16 13:45       ` Peter Zijlstra
  2025-07-16 14:07         ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 13:45 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 03:40:13PM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 14:38 +0200, Peter Zijlstra wrote:

> > > @@ -6786,6 +6793,7 @@ static void __sched notrace __schedule(int
> > > sched_mode)
> > >  		rq = context_switch(rq, prev, next, &rf);
> > >  	} else {
> > >  		rq_unpin_lock(rq, &rf);
> > > +		trace_sched_switch_vain_tp(preempt, prev,
> > > prev_state);
> > >  		__balance_callbacks(rq);
> > >  		raw_spin_rq_unlock_irq(rq);
> > >  	}
> > 
> > Hurmph... don't you already have this covered by:
> > trace_sched_exit_tp() ?
> > 
> > Specifically, the only case where is_switch := false, is this case.
> 
> Mostly, it may work in some cases, but sched_exit happens with
> interrupt enabled while all types of switches (including the vain ones)
> must occur with interrupt disabled.
> 
> Some assumptions don't stand without this tracepoint, but I guess I
> could adapt monitors to live without this if you believe it's not worth
> adding a new tracepoint there.

I'm not sure I understand the importance of IRQ state when describing
task transitions.

You know both:

 - schedule() invocations for any one task are in-order;
 - schedule() invocations for any one CPU are in-order.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 13:45       ` Peter Zijlstra
@ 2025-07-16 14:07         ` Gabriele Monaco
  2025-07-16 14:19           ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 14:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, 2025-07-16 at 15:45 +0200, Peter Zijlstra wrote:
> 
> I'm not sure I understand the importance of IRQ state when describing
> task transitions.
> 
> You know both:
> 
>  - schedule() invocations for any one task are in-order;
>  - schedule() invocations for any one CPU are in-order.
> 

It's to describe latencies, which is the original purpose of the
scheduler model: if the event supposed to wake up an RT task occurs
with interrupts disabled while scheduling, we'd need to wait for that
run to complete.
Now to be fair, it doesn't really matter whether that call to
__schedule switches or not, but always having a switch event simplifies
things while modelling.
I can rearrange models like sts (added in this series) not to expect
that.

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 14:07         ` Gabriele Monaco
@ 2025-07-16 14:19           ` Peter Zijlstra
  2025-07-16 14:38             ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 14:19 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 04:07:16PM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 15:45 +0200, Peter Zijlstra wrote:
> > 
> > I'm not sure I understand the importance of IRQ state when describing
> > task transitions.
> > 
> > You know both:
> > 
> >  - schedule() invocations for any one task are in-order;
> >  - schedule() invocations for any one CPU are in-order.
> > 
> 
> It's to describe latencies, which is the original purpose of the
> scheduler model: if the event supposed to wake up an RT task occurs
> with interrupts disabled while scheduling, we'd need to wait for that
> run to complete.
> Now to be fair, it doesn't really matter whether that call to
> __schedule switches or not, but always having a switch event simplifies
> things while modelling.

So in general I'm a minimalist; less is more etc. 

Even if you care about latencies, I don't see what that tracepoint
adds. In fact, I don't even see the point of the .is_switch argument to
trace_sched_exit_tp(). That state can be fully reconstructed from having
seen trace_sched_switch() between trace_sched_{enter,exit}_tp().

As for the IRQ state, if you see:

 trace_sched_enter_tp();
 trace_irq_disable();
 trace_irq_enable();

You already know all you need to know; there was no switch, otherwise it
would'be been:

 trace_sched_enter_tp();
 trace_irq_disable();
 trace_sched_switch();
 trace_irq_enable();

Also, can we get rid of that CALLER_ADDR0 argument as well?

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 14:19           ` Peter Zijlstra
@ 2025-07-16 14:38             ` Gabriele Monaco
  2025-07-16 15:31               ` Peter Zijlstra
  0 siblings, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 14:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, 2025-07-16 at 16:19 +0200, Peter Zijlstra wrote:
> So in general I'm a minimalist; less is more etc.
>
> Even if you care about latencies, I don't see what that tracepoint adds. In
> fact, I don't even see the point of the .is_switch argument to
> trace_sched_exit_tp(). That state can be fully reconstructed from having seen
> trace_sched_switch() between trace_sched_{enter,exit}_tp().
>
> As for the IRQ state, if you see:
>
>  trace_sched_enter_tp();
>  trace_irq_disable();
>  trace_irq_enable();
>
> You already know all you need to know; there was no switch, otherwise
> it would'be been:
>
>  trace_sched_enter_tp();
>  trace_irq_disable();
>  trace_sched_switch();
>  trace_irq_enable();
>

Or you could see:

 trace_sched_enter_tp();
 trace_irq_disable();
 trace_irq_enable();
 trace_irq_disable();
 trace_sched_switch();
 trace_irq_enable();

Where in fact there /was/ a switch, that trace was actually something
like:

 trace_sched_enter_tp();
 trace_irq_disable();
 **irq_entry();**
 **irq_exit();**
 trace_irq_enable();
 trace_irq_disable();
 trace_sched_switch();
 trace_irq_enable();

So as you said, we can still reconstruct what happened from the trace, but the
model suddenly needs more states and more events.
If we could directly tell whether interrupts were disabled manually or from an
actual interrupt, that wouldn't be necessary, for instance (as in the original
model by Daniel).

I get your point why we don't really need the additional tracepoint, but some
arguments giving more context come almost for free.

> Also, can we get rid of that CALLER_ADDR0 argument as well?

Yeah good point, this might have been be useful to understand some things
(__schedule called from preempt_irq for instance) but it's a pain to make sense
out of it.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 12:38   ` Peter Zijlstra
  2025-07-16 13:40     ` Gabriele Monaco
@ 2025-07-16 15:09     ` Gabriele Monaco
  2025-07-16 15:47       ` Peter Zijlstra
  1 sibling, 1 reply; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 15:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, 2025-07-16 at 14:38 +0200, Peter Zijlstra wrote:
> On Tue, Jul 15, 2025 at 09:14:29AM +0200, Gabriele Monaco wrote:
> > Add the following tracepoints:
> > * sched_set_need_resched(tsk, cpu, tif)
> >     Called when a task is set the need resched [lazy] flag
> > * sched_switch_vain(preempt, tsk, tsk_state)
> >     Called when a task is selected again during __schedule
> >     i.e. prev == next == tsk : no real context switch
> 
> > @@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq,
> > struct task_struct *p,
> >  	int flags = DEQUEUE_NOCLOCK;
> >  
> >  	if (signal_pending_state(task_state, p)) {
> > +		trace_sched_set_state_tp(p, TASK_RUNNING, true);
> >  		WRITE_ONCE(p->__state, TASK_RUNNING);
> >  		*task_state_p = TASK_RUNNING;
> >  		return false;
> 
> I'm confused on the purpose of this. How does this relate to say the
> wakeup in signal_wake_up_state() ?

Also this adds more context: models like sssw (in this series) expect
that, after a task is set to sleepable, it either goes to sleep or is
woken up/set to runnable.

In this specific case, the task is set to runnable without tracing it,
so the model doesn't know what happened, since it may not see a wakeup
after that (the task is already runnable).

Now I'm not sure if there are other events that we are guaranteed to
see to reconstruct this specific case (at some point we should see the
signal, I assume).
This just simplified things as that is the only state change that was
not traced.

Am I missing anything obvious here?

Thanks,
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 14:38             ` Gabriele Monaco
@ 2025-07-16 15:31               ` Peter Zijlstra
  2025-07-16 16:14                 ` Gabriele Monaco
  0 siblings, 1 reply; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 15:31 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 04:38:36PM +0200, Gabriele Monaco wrote:

> So as you said, we can still reconstruct what happened from the trace, but the
> model suddenly needs more states and more events.

So given a sequence like:

  trace_sched_enter_tp();
  { trace_irq_disable();
    **irq_entry();**
    **irq_exit();**
    trace_irq_enable(); } * Ni
  trace_irq_disable();
  { trace_sched_switch(); } * Nj
  trace_irq_enable();
  { trace_irq_disable();
    **irq_entry();**
    **irq_exit();**
    trace_irq_enable(); } * Nk
  trace_sched_exit_tp();

It becomes somewhat hard to figure out which exact IRQ disabled section
the switch did not happen in (Nj == 0).

> If we could directly tell whether interrupts were disabled manually or from an
> actual interrupt, that wouldn't be necessary, for instance (as in the original
> model by Daniel).

Hmm.. we do indeed appear to trace the IRQ state before adding
HARDIRQ_OFFSET to preempt_count(). Yes, that complicates things a
little.

So... it *might* be possible to lift lockdep_hardirq_enter() to before
we start tracing. But then you're stuck to running with lockdep
enabled -- I'm thinking that's not ideal, given those other patches you
sent.

I'm going to go on holidays soon, but I've made a note to see if we can
lift setting HARDIRQ_OFFSET before we start tracing. IIRC the current
order is because setting HARDIRQ_OFFSET is using preempt_count_add()
which can be instrumented itself.

But we could use __preempt_count_add() instead, then we loose the
tracing from setting HARDIRQ_OFFSET, but I don't think that is a
problem. We already get the latency from the IRQ tracepoints after all.

> I get your point why we don't really need the additional tracepoint, but some
> arguments giving more context come almost for free.

Right. So please always try and justify adding tracepoints.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 15:09     ` Gabriele Monaco
@ 2025-07-16 15:47       ` Peter Zijlstra
  0 siblings, 0 replies; 57+ messages in thread
From: Peter Zijlstra @ 2025-07-16 15:47 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 16, 2025 at 05:09:43PM +0200, Gabriele Monaco wrote:
> On Wed, 2025-07-16 at 14:38 +0200, Peter Zijlstra wrote:
> > On Tue, Jul 15, 2025 at 09:14:29AM +0200, Gabriele Monaco wrote:
> > > Add the following tracepoints:
> > > * sched_set_need_resched(tsk, cpu, tif)
> > >     Called when a task is set the need resched [lazy] flag
> > > * sched_switch_vain(preempt, tsk, tsk_state)
> > >     Called when a task is selected again during __schedule
> > >     i.e. prev == next == tsk : no real context switch
> > 
> > > @@ -6592,6 +6598,7 @@ static bool try_to_block_task(struct rq *rq,
> > > struct task_struct *p,
> > >  	int flags = DEQUEUE_NOCLOCK;
> > >  
> > >  	if (signal_pending_state(task_state, p)) {
> > > +		trace_sched_set_state_tp(p, TASK_RUNNING, true);
> > >  		WRITE_ONCE(p->__state, TASK_RUNNING);
> > >  		*task_state_p = TASK_RUNNING;
> > >  		return false;
> > 
> > I'm confused on the purpose of this. How does this relate to say the
> > wakeup in signal_wake_up_state() ?
> 
> Also this adds more context: models like sssw (in this series) expect
> that, after a task is set to sleepable, it either goes to sleep or is
> woken up/set to runnable.
> 
> In this specific case, the task is set to runnable without tracing it,
> so the model doesn't know what happened, since it may not see a wakeup
> after that (the task is already runnable).
> 
> Now I'm not sure if there are other events that we are guaranteed to
> see to reconstruct this specific case (at some point we should see the
> signal, I assume).
> This just simplified things as that is the only state change that was
> not traced.
> 
> Am I missing anything obvious here?


AFAICT this is just a normal wakeup race.

Given:

  for (;;) {
    set_current_state(TASK_UNINTERRUPTIBLE);
    if (COND)
      break;
    schedule();
  }
  __set_current_state(TASK_RUNNING);

vs

  COND=1;
  wake_up_state(p, TASK_UNINTERRUPTIBLE);

The race is seeing COND before or after hitting schedule(). With
interruptible this simply becomes:


  for (;;) {
    set_current_state(TASK_INTERRUPTIBLE);
    if (SIGPENDING || COND)
      break;
    schedule();
  }
  __set_current_state(TASK_RUNNING);

vs

  COND=1;
  wake_up_state(p, TASK_INTERRUPTIBLE);

vs

  set_tsk_thread_flag(p, TIF_SIGPENDING);
  wake_up_state(p, TASK_INTERRUPTIBLE);


(same with KILLABLE, but for fatal signals only)
(except the signal thing will often exit with -EINTR / -ERESTARTSYS, but
that should matter here, right?)

How is the race for seeing SIGPENDING different from seeing COND?

Both will issue a wakeup; except in one case the wakeup is superfluous
because schedule didn't end up blocking because it already saw the
condition, while in the other case it did block and the wakeup is indeed
needed.

Anyway, I don't mind tracing it, but I am confused on that new (3rd)
argument to trace_sched_set_state_tp().


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model
  2025-07-16 15:31               ` Peter Zijlstra
@ 2025-07-16 16:14                 ` Gabriele Monaco
  0 siblings, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-16 16:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, 2025-07-16 at 17:31 +0200, Peter Zijlstra wrote:
> On Wed, Jul 16, 2025 at 04:38:36PM +0200, Gabriele Monaco wrote:
> 
> > So as you said, we can still reconstruct what happened from the
> > trace, but the model suddenly needs more states and more events.
> 
> So given a sequence like:
> 
>   trace_sched_enter_tp();
>   { trace_irq_disable();
>     **irq_entry();**
>     **irq_exit();**
>     trace_irq_enable(); } * Ni
>   trace_irq_disable();
>   { trace_sched_switch(); } * Nj
>   trace_irq_enable();
>   { trace_irq_disable();
>     **irq_entry();**
>     **irq_exit();**
>     trace_irq_enable(); } * Nk
>   trace_sched_exit_tp();
> 
> It becomes somewhat hard to figure out which exact IRQ disabled
> section
> the switch did not happen in (Nj == 0).
> 
> > If we could directly tell whether interrupts were disabled manually
> > or from an actual interrupt, that wouldn't be necessary, for
> > instance (as in the original model by Daniel).
> 
> Hmm.. we do indeed appear to trace the IRQ state before adding
> HARDIRQ_OFFSET to preempt_count(). Yes, that complicates things a
> little.
> 
> So... it *might* be possible to lift lockdep_hardirq_enter() to
> before we start tracing. But then you're stuck to running with
> lockdep enabled -- I'm thinking that's not ideal, given those other
> patches you sent.
> 
> I'm going to go on holidays soon, but I've made a note to see if we
> can lift setting HARDIRQ_OFFSET before we start tracing. IIRC the
> current order is because setting HARDIRQ_OFFSET is using
> preempt_count_add() which can be instrumented itself.
> 

Yeah I wondered if that was something perhaps required by RCU or
something else (some calls are in the way). NMIs have it set during the
tracepoints, for instance.

Thanks again and enjoy your holiday!

Gabriele

> But we could use __preempt_count_add() instead, then we loose the
> tracing from setting HARDIRQ_OFFSET, but I don't think that is a
> problem. We already get the latency from the IRQ tracepoints after
> all.
> 
> > I get your point why we don't really need the additional
> > tracepoint, but some
> > arguments giving more context come almost for free.
> 
> Right. So please always try and justify adding tracepoints.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v3 17/17] rv: Add opid per-cpu monitor
  2025-07-16  9:38   ` Nam Cao
  2025-07-16 10:00     ` Gabriele Monaco
@ 2025-07-18 10:26     ` Gabriele Monaco
  1 sibling, 0 replies; 57+ messages in thread
From: Gabriele Monaco @ 2025-07-18 10:26 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Wed, 2025-07-16 at 11:38 +0200, Nam Cao wrote:
> And the monitor reports some errors on riscv64 with PREEMPT_RT=y:
> 
> root@riscv:~/rv-tests# uname -a
> Linux riscv 6.16.0-rc6-00054-g7590637d9ca2 #87 SMP PREEMPT_RT Wed Jul
> 16 11:26:00 CEST 2025 riscv64 GNU/Linux
> root@riscv:~/rv-tests# stress-ng --cpu-sched -1
> stress-ng: info:  [452] defaulting to a 1 day run per stressor
> stress-ng: info:  [452] dispatching hogs: 4 cpu-sched
> [  614.390462] rv: monitor opid does not allow event irq_entry on
> state in_irq

Finally managed to bootstrap a riscv VM, that is an error I thought I'd
fixed and I'm genuinely surprised I didn't notice on other
architectures.

Thanks again for catching it though!
Gabriele


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2025-07-18 10:27 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-15  7:14 [PATCH v3 00/17] rv: Add monitors to validate task switch Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 01/17] tools/rv: Do not skip idle in trace Gabriele Monaco
2025-07-16 11:50   ` Peter Zijlstra
2025-07-16 12:18     ` Gabriele Monaco
2025-07-16 12:41       ` Peter Zijlstra
2025-07-16 13:05         ` Gabriele Monaco
2025-07-16 13:08           ` Peter Zijlstra
2025-07-16 13:13             ` Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 02/17] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 03/17] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 04/17] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 05/17] rv: Return init error when registering monitors Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 06/17] rv: Use strings in da monitors tracepoints Gabriele Monaco
2025-07-15 14:11   ` Nam Cao
2025-07-15  7:14 ` [PATCH v3 07/17] rv: Adjust monitor dependencies Gabriele Monaco
2025-07-16  8:19   ` Nam Cao
2025-07-16  8:30     ` Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 08/17] verification/rvgen: Organise Kconfig entries for nested monitors Gabriele Monaco
2025-07-15 14:48   ` Nam Cao
2025-07-16  7:40     ` Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 09/17] tools/dot2c: Fix generated files going over 100 column limit Gabriele Monaco
2025-07-15 15:01   ` Nam Cao
     [not found]     ` <d69862275becf1d296c80a08b29b2081857a85a1.camel@redhat.com>
2025-07-16  9:34       ` Nam Cao
2025-07-15  7:14 ` [PATCH v3 10/17] rv: " Gabriele Monaco
2025-07-15 15:08   ` Nam Cao
2025-07-15 15:24     ` Gabriele Monaco
2025-07-16  8:13       ` Nam Cao
2025-07-16  9:07         ` Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 11/17] rv: Retry when da monitor detects race conditions Gabriele Monaco
2025-07-15 15:23   ` Nam Cao
2025-07-16  8:20     ` Gabriele Monaco
2025-07-16  8:27       ` Nam Cao
2025-07-16  8:38         ` Gabriele Monaco
2025-07-16  8:45           ` Nam Cao
2025-07-16  8:59             ` Gabriele Monaco
2025-07-16  9:02               ` Nam Cao
2025-07-15  7:14 ` [PATCH v3 12/17] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
2025-07-16 12:38   ` Peter Zijlstra
2025-07-16 13:40     ` Gabriele Monaco
2025-07-16 13:45       ` Peter Zijlstra
2025-07-16 14:07         ` Gabriele Monaco
2025-07-16 14:19           ` Peter Zijlstra
2025-07-16 14:38             ` Gabriele Monaco
2025-07-16 15:31               ` Peter Zijlstra
2025-07-16 16:14                 ` Gabriele Monaco
2025-07-16 15:09     ` Gabriele Monaco
2025-07-16 15:47       ` Peter Zijlstra
2025-07-15  7:14 ` [PATCH v3 13/17] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 14/17] rv: Extend snroc model Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 15/17] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 16/17] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-07-15  7:14 ` [PATCH v3 17/17] rv: Add opid per-cpu monitor Gabriele Monaco
2025-07-16  9:38   ` Nam Cao
2025-07-16 10:00     ` Gabriele Monaco
2025-07-18 10:26     ` Gabriele Monaco
2025-07-16  9:42 ` [PATCH v3 00/17] rv: Add monitors to validate task switch Nam Cao
2025-07-16 12:20   ` Gabriele Monaco

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).