linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/9] rv: Add monitors to validate task switch
@ 2025-07-28 13:50 Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 1/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Nam Cao
  Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
	John Kacur

This series adds three monitors to the sched collection, extends and
replaces previously existing monitors:

{tss,snroc} => sts:
Not only prove that switches occur in scheduling context and scheduling
needs interrupt disabled but also that each call to the scheduler
disables interrupts to (optionally) switch.

nrp (NEW):
* preemption requires need resched which is cleared by any switch
  (includes a non optimal workaround for /nested/ preemptions)

sssw (NEW):
* suspension requires setting the task to sleepable and, after the
  switch occurs, the task requires a wakeup to come back to runnable

opid (NEW):
* waking and need-resched operations occur with interrupts and
  preemption disabled or in IRQ without explicitly disabling preemption

Also include some minor cleanup patches (1-4) tracepoints (6) and
preparatory fixes (5) covering some corner cases:

The series is currently based on the tracing rv/for-next tree.

Patch 1 adds da_handle_start_run_event_ also to per-task monitors

Patch 2 removes a trailing whitespace from the rv tracepoint string

Patch 3 fixes an out-of-bound memory access in DA tracepoints

Patch 4 adjusts monitors to have minimised Kconfig dependencies

Patch 5 detects race conditions when rv monitors run concurrently and
retries applying the events

Patch 6 adds the need_resched and removes unused arguments from
schedule entry/exit tracepoints

Patch 7 adds the sts monitor to replace tss and sncid

Patch 8 adds the nrp and sssw monitors

Patch 9 adds the opid monitor

NOTES

The nrp and sssw monitors include workarounds for racy conditions:

* A sleeping task requires to set the state to sleepable, but in case of
  a task sleeping on an rtlock, the set sleepable and wakeup events race
  and we don't always see the right order:

 5d..2. 107.488369: event: 639: sleepable x set_sleepable -> sleepable
 4d..5. 107.488369: event: 639: sleepable x wakeup -> running (final)
 5d..3. 107.488385: error: 639: switch_suspend not expected in the state running

    wakeup()                    set_state()
        state=RUNNING
                                    trace_set_state()
        trace_wakeup()
                                    state=SLEEPING

  I added a special event (switch_block) but there may be a better way.
  Taking a pi_lock in rtlock_slowlock_locked when setting the state to
  TASK_RTLOCK_WAIT avoids this race, although this is likely not
  something we want to do.

* I consider preemption any scheduling with preempt==true and assume
  this can happen only if need resched is set.
  In practice, however, we may see a preemption where the flag
  is not set. This can happen in one specific condition:

  need_resched
                  preempt_schedule()
                                        preempt_schedule_irq()
                                            __schedule()
  !need_resched
                      __schedule()

  We start a standard preemption (e.g. from preempt_enable when the flag
  is set), an interrupts occurs before we schedule and, on its exit path,
  it schedules, which clears the need_resched flag.
  When the preempted task runs again, we continue the standard
  preemption started earlier, although the flag is no longer set.

  I added a workaround to allow the model not to fail in this condition,
  by allowing a preemption without need_resched if an interrupt is
  received. This might catch false negatives too.

Changes since V4:
* Drop already applied patches for the tools/ directory
* Avoid using smp_processor_id() from tracepoint context

Changes since V3 [1]:
* Fix condition for lines shorter than 100 columns in dot2c.
* Fix Kconfig tooltip for container monitors in rvgen.
* Improve condition not to skip pid-0 in userspace tool.
* Rearrange monitors to reduce needed tracepoints and arguments.
* Separately track errors when DAs run out of retries due to races.
* Fix issue in opid with multiple handlers in the same interrupt.
* Cleanup patches by removing, squashing and reordering.

Changes since RFC2:
 * Arrange commits to prevent failed build while bisecting.
 * Avoid dot2k generated files to reach the column limit. (Nam Cao)
 * Rearrange and simplify da_monitor retry on racing events.
 * Improve nrp monitor to handle /nested/ preemption on IRQ.
 * Added minor patches (6-10).
 * Cleanup and rearrange order.
Changes since RFC [2]:
 * Remove wakeup tracepoint in try_to_block_task and use a different
   flavour of sched_set_state
 * Split the large srs monitor in two separate monitors for preemption
   and sleep. These no longer have a concept of running task, they just
   enforce the requirements for the different types of sched out.
 * Restore the snroc monitor to describe the relationship between
   generic sched out and sched in.
 * Add opid monitor.
 * Fix some build errors and cleanup.

[1] - https://lore.kernel.org/lkml/20250715071434.22508-1-gmonaco@redhat.com
[2] - https://lore.kernel.org/lkml/20250404084512.98552-11-gmonaco@redhat.com

To: Ingo Molnar <mingo@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Steven Rostedt <rostedt@goodmis.org>
To: Nam Cao <namcao@linutronix.de>
Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Juri Lelli <jlelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>

Gabriele Monaco (9):
  rv: Add da_handle_start_run_event_ to per-task monitors
  rv: Remove trailing whitespace from tracepoint string
  rv: Use strings in da monitors tracepoints
  rv: Adjust monitor dependencies
  rv: Retry when da monitor detects race conditions
  sched: Adapt sched tracepoints for RV task model
  rv: Replace tss and sncid monitors with more complete sts
  rv: Add nrp and sssw per-task monitors
  rv: Add opid per-cpu monitor

 Documentation/trace/rv/monitor_sched.rst      | 307 +++++++++++++++---
 include/linux/rv.h                            |   3 +-
 include/linux/sched.h                         |   7 +-
 include/rv/da_monitor.h                       | 131 +++++---
 include/trace/events/sched.h                  |  12 +-
 kernel/sched/core.c                           |  13 +-
 kernel/trace/rv/Kconfig                       |  11 +-
 kernel/trace/rv/Makefile                      |   6 +-
 kernel/trace/rv/monitors/{tss => nrp}/Kconfig |  12 +-
 kernel/trace/rv/monitors/nrp/nrp.c            | 138 ++++++++
 kernel/trace/rv/monitors/nrp/nrp.h            |  75 +++++
 kernel/trace/rv/monitors/nrp/nrp_trace.h      |  15 +
 kernel/trace/rv/monitors/opid/Kconfig         |  19 ++
 kernel/trace/rv/monitors/opid/opid.c          | 168 ++++++++++
 kernel/trace/rv/monitors/opid/opid.h          | 104 ++++++
 .../sncid_trace.h => opid/opid_trace.h}       |   8 +-
 kernel/trace/rv/monitors/sched/Kconfig        |   1 +
 kernel/trace/rv/monitors/sco/sco.c            |   4 +-
 kernel/trace/rv/monitors/scpd/Kconfig         |   2 +-
 kernel/trace/rv/monitors/scpd/scpd.c          |   4 +-
 kernel/trace/rv/monitors/sncid/sncid.c        |  95 ------
 kernel/trace/rv/monitors/sncid/sncid.h        |  49 ---
 kernel/trace/rv/monitors/snep/Kconfig         |   2 +-
 kernel/trace/rv/monitors/snep/snep.c          |   4 +-
 .../trace/rv/monitors/{sncid => sssw}/Kconfig |  10 +-
 kernel/trace/rv/monitors/sssw/sssw.c          | 116 +++++++
 kernel/trace/rv/monitors/sssw/sssw.h          | 105 ++++++
 kernel/trace/rv/monitors/sssw/sssw_trace.h    |  15 +
 kernel/trace/rv/monitors/sts/Kconfig          |  19 ++
 kernel/trace/rv/monitors/sts/sts.c            | 156 +++++++++
 kernel/trace/rv/monitors/sts/sts.h            | 117 +++++++
 .../{tss/tss_trace.h => sts/sts_trace.h}      |   8 +-
 kernel/trace/rv/monitors/tss/tss.c            |  90 -----
 kernel/trace/rv/monitors/tss/tss.h            |  47 ---
 kernel/trace/rv/monitors/wip/Kconfig          |   2 +-
 kernel/trace/rv/rv_trace.h                    | 114 ++++---
 tools/verification/models/sched/nrp.dot       |  29 ++
 tools/verification/models/sched/opid.dot      |  35 ++
 tools/verification/models/sched/sncid.dot     |  18 -
 tools/verification/models/sched/sssw.dot      |  30 ++
 tools/verification/models/sched/sts.dot       |  38 +++
 tools/verification/models/sched/tss.dot       |  18 -
 42 files changed, 1665 insertions(+), 492 deletions(-)
 rename kernel/trace/rv/monitors/{tss => nrp}/Kconfig (51%)
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.c
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.h
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp_trace.h
 create mode 100644 kernel/trace/rv/monitors/opid/Kconfig
 create mode 100644 kernel/trace/rv/monitors/opid/opid.c
 create mode 100644 kernel/trace/rv/monitors/opid/opid.h
 rename kernel/trace/rv/monitors/{sncid/sncid_trace.h => opid/opid_trace.h} (66%)
 delete mode 100644 kernel/trace/rv/monitors/sncid/sncid.c
 delete mode 100644 kernel/trace/rv/monitors/sncid/sncid.h
 rename kernel/trace/rv/monitors/{sncid => sssw}/Kconfig (58%)
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.c
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.h
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw_trace.h
 create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sts/sts.c
 create mode 100644 kernel/trace/rv/monitors/sts/sts.h
 rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
 create mode 100644 tools/verification/models/sched/nrp.dot
 create mode 100644 tools/verification/models/sched/opid.dot
 delete mode 100644 tools/verification/models/sched/sncid.dot
 create mode 100644 tools/verification/models/sched/sssw.dot
 create mode 100644 tools/verification/models/sched/sts.dot
 delete mode 100644 tools/verification/models/sched/tss.dot


base-commit: b8a7fba39cd49eab343bfe561d85bb5dc57541af
-- 
2.50.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v5 1/9] rv: Add da_handle_start_run_event_ to per-task monitors
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 2/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The RV da_monitor API allows to start monitors in two ways:
da_handle_start_event_NAME and da_handle_start_run_event_NAME.
The former is used when the event is followed by the initial state of
the module, so we ignore the event but we know the monitor is in the
initial state and can start monitoring, the latter can be used if the
event can only occur in the initial state, so we do handle the event as
if the monitor was in the initial state.
This latter API is defined for implicit monitors but not per-task ones.

Define da_handle_start_run_event_NAME macro also for per-task monitors.

Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/rv/da_monitor.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 15f9ed4e4bb6..ed3c34fe18d6 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -487,6 +487,30 @@ da_handle_start_event_##name(struct task_struct *tsk, enum events_##name event)
 	__da_handle_event_##name(da_mon, tsk, event);						\
 												\
 	return 1;										\
+}												\
+												\
+/*												\
+ * da_handle_start_run_event_##name - start monitoring and handle event				\
+ *												\
+ * This function is used to notify the monitor that the system is in the			\
+ * initial state, so the monitor can start monitoring and handling event.			\
+ */												\
+static inline bool										\
+da_handle_start_run_event_##name(struct task_struct *tsk, enum events_##name event)		\
+{												\
+	struct da_monitor *da_mon;								\
+												\
+	if (!da_monitor_enabled_##name())							\
+		return 0;									\
+												\
+	da_mon = da_get_monitor_##name(tsk);							\
+												\
+	if (unlikely(!da_monitoring_##name(da_mon)))						\
+		da_monitor_start_##name(da_mon);						\
+												\
+	__da_handle_event_##name(da_mon, tsk, event);						\
+												\
+	return 1;										\
 }
 
 /*
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 2/9] rv: Remove trailing whitespace from tracepoint string
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 1/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 3/9] rv: Use strings in da monitors tracepoints Gabriele Monaco
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

RV event tracepoints print a line with the format:
    "event_xyz: S0 x event -> S1 "
    "event_xyz: S1 x event -> S0 (final)"

While printing an event leading to a non-final state, the line
has a trailing white space (visible above before the closing ").

Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".

Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/rv_trace.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index b6f310498466..17ba07329b67 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -29,11 +29,11 @@ DECLARE_EVENT_CLASS(event_da_monitor,
 		__entry->final_state		= final_state;
 	),
 
-	TP_printk("%s x %s -> %s %s",
+	TP_printk("%s x %s -> %s%s",
 		__entry->state,
 		__entry->event,
 		__entry->next_state,
-		__entry->final_state ? "(final)" : "")
+		__entry->final_state ? " (final)" : "")
 );
 
 DECLARE_EVENT_CLASS(error_da_monitor,
@@ -90,12 +90,12 @@ DECLARE_EVENT_CLASS(event_da_monitor_id,
 		__entry->final_state		= final_state;
 	),
 
-	TP_printk("%d: %s x %s -> %s %s",
+	TP_printk("%d: %s x %s -> %s%s",
 		__entry->id,
 		__entry->state,
 		__entry->event,
 		__entry->next_state,
-		__entry->final_state ? "(final)" : "")
+		__entry->final_state ? " (final)" : "")
 );
 
 DECLARE_EVENT_CLASS(error_da_monitor_id,
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 3/9] rv: Use strings in da monitors tracepoints
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 1/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 2/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 4/9] rv: Adjust monitor dependencies Gabriele Monaco
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Using DA monitors tracepoints with KASAN enabled triggers the following
warning:

 BUG: KASAN: global-out-of-bounds in do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
 Read of size 32 at addr ffffffffaada8980 by task ...
 Call Trace:
  <TASK>
 [...]
  do_trace_event_raw_event_event_da_monitor+0xd6/0x1a0
  ? __pfx_do_trace_event_raw_event_event_da_monitor+0x10/0x10
  ? trace_event_sncid+0x83/0x200
  trace_event_sncid+0x163/0x200
 [...]
 The buggy address belongs to the variable:
  automaton_snep+0x4e0/0x5e0

This is caused by the tracepoints reading 32 bytes __array instead of
__string from the automata definition. Such strings are literals and
reading 32 bytes ends up in out of bound memory accesses (e.g. the next
automaton's data in this case).
The error is harmless as, while printing the string, we stop at the null
terminator, but it should still be fixed.

Use the __string facilities while defining the tracepoints to avoid
reading out of bound memory.

Fixes: 792575348ff7 ("rv/include: Add deterministic automata monitor definition via C macros")
Reviewed-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/rv_trace.h | 76 +++++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 38 deletions(-)

diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 17ba07329b67..d38e0d3abdfd 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -16,23 +16,23 @@ DECLARE_EVENT_CLASS(event_da_monitor,
 	TP_ARGS(state, event, next_state, final_state),
 
 	TP_STRUCT__entry(
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
-		__array(	char,	next_state,	MAX_DA_NAME_LEN	)
-		__field(	bool,	final_state			)
+		__string(	state,		state		)
+		__string(	event,		event		)
+		__string(	next_state,	next_state	)
+		__field(	bool,		final_state	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		memcpy(__entry->next_state,	next_state,	MAX_DA_NAME_LEN);
-		__entry->final_state		= final_state;
+		__assign_str(state);
+		__assign_str(event);
+		__assign_str(next_state);
+		__entry->final_state = final_state;
 	),
 
 	TP_printk("%s x %s -> %s%s",
-		__entry->state,
-		__entry->event,
-		__entry->next_state,
+		__get_str(state),
+		__get_str(event),
+		__get_str(next_state),
 		__entry->final_state ? " (final)" : "")
 );
 
@@ -43,18 +43,18 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 	TP_ARGS(state, event),
 
 	TP_STRUCT__entry(
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
+		__string(	state,	state	)
+		__string(	event,	event	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
+		__assign_str(state);
+		__assign_str(event);
 	),
 
 	TP_printk("event %s not expected in the state %s",
-		__entry->event,
-		__entry->state)
+		__get_str(event),
+		__get_str(state))
 );
 
 #include <monitors/wip/wip_trace.h>
@@ -75,26 +75,26 @@ DECLARE_EVENT_CLASS(event_da_monitor_id,
 	TP_ARGS(id, state, event, next_state, final_state),
 
 	TP_STRUCT__entry(
-		__field(	int,	id				)
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
-		__array(	char,	next_state,	MAX_DA_NAME_LEN	)
-		__field(	bool,	final_state			)
+		__field(	int,		id		)
+		__string(	state,		state		)
+		__string(	event,		event		)
+		__string(	next_state,	next_state	)
+		__field(	bool,		final_state	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		memcpy(__entry->next_state,	next_state,	MAX_DA_NAME_LEN);
-		__entry->id			= id;
-		__entry->final_state		= final_state;
+		__assign_str(state);
+		__assign_str(event);
+		__assign_str(next_state);
+		__entry->id		= id;
+		__entry->final_state	= final_state;
 	),
 
 	TP_printk("%d: %s x %s -> %s%s",
 		__entry->id,
-		__entry->state,
-		__entry->event,
-		__entry->next_state,
+		__get_str(state),
+		__get_str(event),
+		__get_str(next_state),
 		__entry->final_state ? " (final)" : "")
 );
 
@@ -105,21 +105,21 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
 	TP_ARGS(id, state, event),
 
 	TP_STRUCT__entry(
-		__field(	int,	id				)
-		__array(	char,	state,		MAX_DA_NAME_LEN	)
-		__array(	char,	event,		MAX_DA_NAME_LEN	)
+		__field(	int,	id	)
+		__string(	state,	state	)
+		__string(	event,	event	)
 	),
 
 	TP_fast_assign(
-		memcpy(__entry->state,		state,		MAX_DA_NAME_LEN);
-		memcpy(__entry->event,		event,		MAX_DA_NAME_LEN);
-		__entry->id			= id;
+		__assign_str(state);
+		__assign_str(event);
+		__entry->id	= id;
 	),
 
 	TP_printk("%d: event %s not expected in the state %s",
 		__entry->id,
-		__entry->event,
-		__entry->state)
+		__get_str(event),
+		__get_str(state))
 );
 
 #include <monitors/wwnr/wwnr_trace.h>
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 4/9] rv: Adjust monitor dependencies
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (2 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 3/9] rv: Use strings in da monitors tracepoints Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 5/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu, Gabriele Monaco,
	linux-trace-kernel
  Cc: Nam Cao, Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

RV monitors relying on the preemptirqs tracepoints are set as dependent
on PREEMPT_TRACER and IRQSOFF_TRACER. In fact, those configurations do
enable the tracepoints but are not the minimal configurations enabling
them, which are TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS (not selectable
manually).

Set TRACE_PREEMPT_TOGGLE and TRACE_IRQFLAGS as dependencies for
monitors.

Fixes: fbe6c09b7eb4 ("rv: Add scpd, snep and sncid per-cpu monitors")
Acked-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/trace/rv/monitors/scpd/Kconfig  | 2 +-
 kernel/trace/rv/monitors/sncid/Kconfig | 2 +-
 kernel/trace/rv/monitors/snep/Kconfig  | 2 +-
 kernel/trace/rv/monitors/wip/Kconfig   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/rv/monitors/scpd/Kconfig b/kernel/trace/rv/monitors/scpd/Kconfig
index b9114fbf680f..682d0416188b 100644
--- a/kernel/trace/rv/monitors/scpd/Kconfig
+++ b/kernel/trace/rv/monitors/scpd/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SCPD
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/sncid/Kconfig b/kernel/trace/rv/monitors/sncid/Kconfig
index 76bcfef4fd10..3a5639feaaaf 100644
--- a/kernel/trace/rv/monitors/sncid/Kconfig
+++ b/kernel/trace/rv/monitors/sncid/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SNCID
 	depends on RV
-	depends on IRQSOFF_TRACER
+	depends on TRACE_IRQFLAGS
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/snep/Kconfig b/kernel/trace/rv/monitors/snep/Kconfig
index 77527f971232..7dd54f434ff7 100644
--- a/kernel/trace/rv/monitors/snep/Kconfig
+++ b/kernel/trace/rv/monitors/snep/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_SNEP
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	depends on RV_MON_SCHED
 	default y
 	select DA_MON_EVENTS_IMPLICIT
diff --git a/kernel/trace/rv/monitors/wip/Kconfig b/kernel/trace/rv/monitors/wip/Kconfig
index e464b9294865..87a26195792b 100644
--- a/kernel/trace/rv/monitors/wip/Kconfig
+++ b/kernel/trace/rv/monitors/wip/Kconfig
@@ -2,7 +2,7 @@
 #
 config RV_MON_WIP
 	depends on RV
-	depends on PREEMPT_TRACER
+	depends on TRACE_PREEMPT_TOGGLE
 	select DA_MON_EVENTS_IMPLICIT
 	bool "wip monitor"
 	help
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 5/9] rv: Retry when da monitor detects race conditions
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (3 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 4/9] rv: Adjust monitor dependencies Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 14:18   ` Nam Cao
  2025-07-28 13:50 ` [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

DA monitor can be accessed from multiple cores simultaneously, this is
likely, for instance when dealing with per-task monitors reacting on
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:

  [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
  [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
  [63] error_srs: 27: event sched_switch_suspend not expected in the state running

In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task get suspended.

Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react by calling a special
tracepoint, print on the console and reset the monitor.

Remove the functions da_monitor_curr_state() and da_monitor_set_state()
as they only hide the underlying implementation in this case.

Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/rv.h         |   3 +-
 include/rv/da_monitor.h    | 107 +++++++++++++++++++------------------
 kernel/trace/rv/Kconfig    |   5 ++
 kernel/trace/rv/rv_trace.h |  24 +++++++++
 4 files changed, 85 insertions(+), 54 deletions(-)

diff --git a/include/linux/rv.h b/include/linux/rv.h
index 80731242fe60..14410a42faef 100644
--- a/include/linux/rv.h
+++ b/include/linux/rv.h
@@ -10,7 +10,8 @@
 #include <linux/types.h>
 #include <linux/list.h>
 
-#define MAX_DA_NAME_LEN	32
+#define MAX_DA_NAME_LEN			32
+#define MAX_DA_RETRY_RACING_EVENTS	3
 
 #ifdef CONFIG_RV
 #include <linux/bitops.h>
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index ed3c34fe18d6..17fa4f6e5ea6 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -54,23 +54,6 @@ static inline void da_monitor_reset_##name(struct da_monitor *da_mon)				\
 	da_mon->curr_state = model_get_initial_state_##name();					\
 }												\
 												\
-/*												\
- * da_monitor_curr_state_##name - return the current state					\
- */												\
-static inline type da_monitor_curr_state_##name(struct da_monitor *da_mon)			\
-{												\
-	return da_mon->curr_state;								\
-}												\
-												\
-/*												\
- * da_monitor_set_state_##name - set the new current state					\
- */												\
-static inline void										\
-da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name state)		\
-{												\
-	da_mon->curr_state = state;								\
-}												\
-												\
 /*												\
  * da_monitor_start_##name - start monitoring							\
  *												\
@@ -127,63 +110,81 @@ static inline bool da_monitor_handling_event_##name(struct da_monitor *da_mon)
  * Event handler for implicit monitors. Implicit monitor is the one which the
  * handler does not need to specify which da_monitor to manipulate. Examples
  * of implicit monitor are the per_cpu or the global ones.
+ *
+ * Retry in case there is a race between getting and setting the next state,
+ * warn and reset the monitor if it runs out of retries. The monitor should be
+ * able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_IMPLICIT(name, type)					\
 												\
 static inline bool										\
 da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE) {						\
+			cond_react_##name(curr_state, event);					\
+			trace_error_##name(model_get_state_name_##name(curr_state),		\
+					   model_get_event_name_##name(event));			\
+			return false;								\
+		}										\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state))) {	\
+			trace_event_##name(model_get_state_name_##name(curr_state),		\
+					   model_get_event_name_##name(event),			\
+					   model_get_state_name_##name(next_state),		\
+					   model_is_final_state_##name(next_state));		\
+			return true;								\
+		}										\
 	}											\
 												\
-	cond_react_##name(curr_state, event);							\
-												\
-	trace_error_##name(model_get_state_name_##name(curr_state),				\
-			   model_get_event_name_##name(event));					\
-												\
+	trace_rv_retries_error(#name, model_get_event_name_##name(event));			\
+	pr_warn("rv: " __stringify(MAX_DA_RETRY_RACING_EVENTS)					\
+		" retries reached for event %s, resetting monitor %s",				\
+		model_get_event_name_##name(event), #name);					\
 	return false;										\
 }												\
 
 /*
  * Event handler for per_task monitors.
+ *
+ * Retry in case there is a race between getting and setting the next state,
+ * warn and reset the monitor if it runs out of retries. The monitor should be
+ * able to handle various orders.
  */
 #define DECLARE_DA_MON_MODEL_HANDLER_PER_TASK(name, type)					\
 												\
 static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct *tsk,		\
 				   enum events_##name event)					\
 {												\
-	type curr_state = da_monitor_curr_state_##name(da_mon);					\
-	type next_state = model_get_next_state_##name(curr_state, event);			\
-												\
-	if (next_state != INVALID_STATE) {							\
-		da_monitor_set_state_##name(da_mon, next_state);				\
-												\
-		trace_event_##name(tsk->pid,							\
-				   model_get_state_name_##name(curr_state),			\
-				   model_get_event_name_##name(event),				\
-				   model_get_state_name_##name(next_state),			\
-				   model_is_final_state_##name(next_state));			\
-												\
-		return true;									\
+	enum states_##name curr_state, next_state;						\
+												\
+	curr_state = READ_ONCE(da_mon->curr_state);						\
+	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) {					\
+		next_state = model_get_next_state_##name(curr_state, event);			\
+		if (next_state == INVALID_STATE) {						\
+			cond_react_##name(curr_state, event);					\
+			trace_error_##name(tsk->pid,						\
+					   model_get_state_name_##name(curr_state),		\
+					   model_get_event_name_##name(event));			\
+			return false;								\
+		}										\
+		if (likely(try_cmpxchg(&da_mon->curr_state, &curr_state, next_state))) {	\
+			trace_event_##name(tsk->pid,						\
+					   model_get_state_name_##name(curr_state),		\
+					   model_get_event_name_##name(event),			\
+					   model_get_state_name_##name(next_state),		\
+					   model_is_final_state_##name(next_state));		\
+			return true;								\
+		}										\
 	}											\
 												\
-	cond_react_##name(curr_state, event);							\
-												\
-	trace_error_##name(tsk->pid,								\
-			   model_get_state_name_##name(curr_state),				\
-			   model_get_event_name_##name(event));					\
-												\
+	trace_rv_retries_error(#name, model_get_event_name_##name(event));			\
+	pr_warn("rv: " __stringify(MAX_DA_RETRY_RACING_EVENTS)					\
+		" retries reached for event %s, resetting monitor %s",				\
+		model_get_event_name_##name(event), #name);					\
 	return false;										\
 }
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 26017378f79b..34164eb4ec91 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -3,12 +3,17 @@
 config RV_MON_EVENTS
 	bool
 
+config RV_MON_MAINTENANCE_EVENTS
+	bool
+
 config DA_MON_EVENTS_IMPLICIT
 	select RV_MON_EVENTS
+	select RV_MON_MAINTENANCE_EVENTS
 	bool
 
 config DA_MON_EVENTS_ID
 	select RV_MON_EVENTS
+	select RV_MON_MAINTENANCE_EVENTS
 	bool
 
 config LTL_MON_EVENTS_ID
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index d38e0d3abdfd..3af46cd185b3 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -176,6 +176,30 @@ DECLARE_EVENT_CLASS(error_ltl_monitor_id,
 #include <monitors/sleep/sleep_trace.h>
 // Add new monitors based on CONFIG_LTL_MON_EVENTS_ID here
 #endif /* CONFIG_LTL_MON_EVENTS_ID */
+
+#ifdef CONFIG_RV_MON_MAINTENANCE_EVENTS
+/* Tracepoint useful for monitors development, currenly only used in DA */
+TRACE_EVENT(rv_retries_error,
+
+	TP_PROTO(char *name, char *event),
+
+	TP_ARGS(name, event),
+
+	TP_STRUCT__entry(
+		__string(	name,	name	)
+		__string(	event,	event	)
+	),
+
+	TP_fast_assign(
+		__assign_str(name);
+		__assign_str(event);
+	),
+
+	TP_printk(__stringify(MAX_DA_RETRY_RACING_EVENTS)
+		" retries reached for event %s, resetting monitor %s",
+		__get_str(event), __get_str(name))
+);
+#endif /* CONFIG_RV_MON_MAINTENANCE_EVENTS */
 #endif /* _TRACE_RV_H */
 
 /* This part must be outside protection */
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (4 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 5/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 16:02   ` Nam Cao
  2025-07-28 13:50 ` [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts Gabriele Monaco
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Masami Hiramatsu, linux-trace-kernel
  Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

Add the following tracepoint:
* sched_set_need_resched(tsk, cpu, tif)
    Called when a task is set the need resched [lazy] flag

Remove the unused ip parameter from sched_entry and sched_exit and alter
sched_entry to have a value of preempt consistent with the one used in
sched_switch.

Also adapt all monitors using sched_{entry,exit} to avoid breaking build.

These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 include/linux/sched.h                  |  7 ++++++-
 include/trace/events/sched.h           | 12 ++++++++----
 kernel/sched/core.c                    | 13 ++++++++++---
 kernel/trace/rv/monitors/sco/sco.c     |  4 ++--
 kernel/trace/rv/monitors/scpd/scpd.c   |  4 ++--
 kernel/trace/rv/monitors/sncid/sncid.c |  4 ++--
 kernel/trace/rv/monitors/snep/snep.c   |  4 ++--
 kernel/trace/rv/monitors/tss/tss.c     |  4 ++--
 8 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index fabd7fe1a07a..91d1fdbc2f56 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -339,9 +339,11 @@ extern void io_schedule_finish(int token);
 extern long io_schedule_timeout(long timeout);
 extern void io_schedule(void);
 
-/* wrapper function to trace from this header file */
+/* wrapper functions to trace from this header file */
 DECLARE_TRACEPOINT(sched_set_state_tp);
 extern void __trace_set_current_state(int state_value);
+DECLARE_TRACEPOINT(sched_set_need_resched_tp);
+extern void __trace_set_need_resched(struct task_struct *curr, int tif);
 
 /**
  * struct prev_cputime - snapshot of system and user cputime
@@ -2063,6 +2065,9 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
 
 static inline void set_tsk_need_resched(struct task_struct *tsk)
 {
+	if (tracepoint_enabled(sched_set_need_resched_tp) &&
+	    !test_tsk_thread_flag(tsk, TIF_NEED_RESCHED))
+		__trace_set_need_resched(tsk, TIF_NEED_RESCHED);
 	set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
 }
 
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 4e6b2910cec3..c08893bde255 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -882,18 +882,22 @@ DECLARE_TRACE(sched_compute_energy,
 	TP_ARGS(p, dst_cpu, energy, max_util, busy_time));
 
 DECLARE_TRACE(sched_entry,
-	TP_PROTO(bool preempt, unsigned long ip),
-	TP_ARGS(preempt, ip));
+	TP_PROTO(bool preempt),
+	TP_ARGS(preempt));
 
 DECLARE_TRACE(sched_exit,
-	TP_PROTO(bool is_switch, unsigned long ip),
-	TP_ARGS(is_switch, ip));
+	TP_PROTO(bool is_switch),
+	TP_ARGS(is_switch));
 
 DECLARE_TRACE_CONDITION(sched_set_state,
 	TP_PROTO(struct task_struct *tsk, int state),
 	TP_ARGS(tsk, state),
 	TP_CONDITION(!!(tsk->__state) != !!state));
 
+DECLARE_TRACE(sched_set_need_resched,
+	TP_PROTO(struct task_struct *tsk, int cpu, int tif),
+	TP_ARGS(tsk, cpu, tif));
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec68fc686bd7..b485e0639616 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1110,6 +1110,7 @@ static void __resched_curr(struct rq *rq, int tif)
 
 	cpu = cpu_of(rq);
 
+	trace_sched_set_need_resched_tp(curr, cpu, tif);
 	if (cpu == smp_processor_id()) {
 		set_ti_thread_flag(cti, tif);
 		if (tif == TIF_NEED_RESCHED)
@@ -1125,6 +1126,11 @@ static void __resched_curr(struct rq *rq, int tif)
 	}
 }
 
+void __trace_set_need_resched(struct task_struct *curr, int tif)
+{
+	trace_sched_set_need_resched_tp(curr, smp_processor_id(), tif);
+}
+
 void resched_curr(struct rq *rq)
 {
 	__resched_curr(rq, TIF_NEED_RESCHED);
@@ -5329,7 +5335,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	 * switched the context for the first time. It is returning from
 	 * schedule for the first time in this path.
 	 */
-	trace_sched_exit_tp(true, CALLER_ADDR0);
+	trace_sched_exit_tp(true);
 	preempt_enable();
 
 	if (current->set_child_tid)
@@ -6678,7 +6684,8 @@ static void __sched notrace __schedule(int sched_mode)
 	struct rq *rq;
 	int cpu;
 
-	trace_sched_entry_tp(preempt, CALLER_ADDR0);
+	/* Trace preemptions consistently with task switches */
+	trace_sched_entry_tp(sched_mode == SM_PREEMPT);
 
 	cpu = smp_processor_id();
 	rq = cpu_rq(cpu);
@@ -6793,7 +6800,7 @@ static void __sched notrace __schedule(int sched_mode)
 		__balance_callbacks(rq);
 		raw_spin_rq_unlock_irq(rq);
 	}
-	trace_sched_exit_tp(is_switch, CALLER_ADDR0);
+	trace_sched_exit_tp(is_switch);
 }
 
 void __noreturn do_task_dead(void)
diff --git a/kernel/trace/rv/monitors/sco/sco.c b/kernel/trace/rv/monitors/sco/sco.c
index 66f4639d46ac..04c36405e2e3 100644
--- a/kernel/trace/rv/monitors/sco/sco.c
+++ b/kernel/trace/rv/monitors/sco/sco.c
@@ -24,12 +24,12 @@ static void handle_sched_set_state(void *data, struct task_struct *tsk, int stat
 	da_handle_start_event_sco(sched_set_state_sco);
 }
 
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+static void handle_schedule_entry(void *data, bool preempt)
 {
 	da_handle_event_sco(schedule_entry_sco);
 }
 
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+static void handle_schedule_exit(void *data, bool is_switch)
 {
 	da_handle_start_event_sco(schedule_exit_sco);
 }
diff --git a/kernel/trace/rv/monitors/scpd/scpd.c b/kernel/trace/rv/monitors/scpd/scpd.c
index 299703cd72b0..1e351ba52fee 100644
--- a/kernel/trace/rv/monitors/scpd/scpd.c
+++ b/kernel/trace/rv/monitors/scpd/scpd.c
@@ -30,12 +30,12 @@ static void handle_preempt_enable(void *data, unsigned long ip, unsigned long pa
 	da_handle_start_event_scpd(preempt_enable_scpd);
 }
 
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+static void handle_schedule_entry(void *data, bool preempt)
 {
 	da_handle_event_scpd(schedule_entry_scpd);
 }
 
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+static void handle_schedule_exit(void *data, bool is_switch)
 {
 	da_handle_event_scpd(schedule_exit_scpd);
 }
diff --git a/kernel/trace/rv/monitors/sncid/sncid.c b/kernel/trace/rv/monitors/sncid/sncid.c
index 3e1ee715a0fb..c8491f426365 100644
--- a/kernel/trace/rv/monitors/sncid/sncid.c
+++ b/kernel/trace/rv/monitors/sncid/sncid.c
@@ -30,12 +30,12 @@ static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent
 	da_handle_start_event_sncid(irq_enable_sncid);
 }
 
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+static void handle_schedule_entry(void *data, bool preempt)
 {
 	da_handle_start_event_sncid(schedule_entry_sncid);
 }
 
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+static void handle_schedule_exit(void *data, bool is_switch)
 {
 	da_handle_start_event_sncid(schedule_exit_sncid);
 }
diff --git a/kernel/trace/rv/monitors/snep/snep.c b/kernel/trace/rv/monitors/snep/snep.c
index 2adc3108d60c..558950f524a5 100644
--- a/kernel/trace/rv/monitors/snep/snep.c
+++ b/kernel/trace/rv/monitors/snep/snep.c
@@ -30,12 +30,12 @@ static void handle_preempt_enable(void *data, unsigned long ip, unsigned long pa
 	da_handle_start_event_snep(preempt_enable_snep);
 }
 
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+static void handle_schedule_entry(void *data, bool preempt)
 {
 	da_handle_event_snep(schedule_entry_snep);
 }
 
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+static void handle_schedule_exit(void *data, bool is_switch)
 {
 	da_handle_start_event_snep(schedule_exit_snep);
 }
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
index 0452fcd9edcf..95ebd15131f5 100644
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ b/kernel/trace/rv/monitors/tss/tss.c
@@ -27,12 +27,12 @@ static void handle_sched_switch(void *data, bool preempt,
 	da_handle_event_tss(sched_switch_tss);
 }
 
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+static void handle_schedule_entry(void *data, bool preempt)
 {
 	da_handle_event_tss(schedule_entry_tss);
 }
 
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+static void handle_schedule_exit(void *data, bool is_switch)
 {
 	da_handle_start_event_tss(schedule_exit_tss);
 }
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (5 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 15:53   ` Nam Cao
  2025-07-28 13:50 ` [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors Gabriele Monaco
  2025-07-28 13:50 ` [PATCH v5 9/9] rv: Add opid per-cpu monitor Gabriele Monaco
  8 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

The tss monitor currently guarantees task switches can happen only while
scheduling, whereas the sncid monitor enforces scheduling occurs with
interrupt disabled.

Replace the monitors with a more comprehensive specification which
implies both but also ensures that:
* each scheduler call disable interrupts to switch
* each task switch happens with interrupts disabled

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst      |  87 +++++-----
 kernel/trace/rv/Kconfig                       |   3 +-
 kernel/trace/rv/Makefile                      |   3 +-
 kernel/trace/rv/monitors/sncid/Kconfig        |  15 --
 kernel/trace/rv/monitors/sncid/sncid.c        |  95 -----------
 kernel/trace/rv/monitors/sncid/sncid.h        |  49 ------
 kernel/trace/rv/monitors/sncid/sncid_trace.h  |  15 --
 kernel/trace/rv/monitors/sts/Kconfig          |  19 +++
 kernel/trace/rv/monitors/sts/sts.c            | 156 ++++++++++++++++++
 kernel/trace/rv/monitors/sts/sts.h            | 117 +++++++++++++
 .../{tss/tss_trace.h => sts/sts_trace.h}      |   8 +-
 kernel/trace/rv/monitors/tss/Kconfig          |  14 --
 kernel/trace/rv/monitors/tss/tss.c            |  90 ----------
 kernel/trace/rv/monitors/tss/tss.h            |  47 ------
 kernel/trace/rv/rv_trace.h                    |   3 +-
 tools/verification/models/sched/sncid.dot     |  18 --
 tools/verification/models/sched/sts.dot       |  38 +++++
 tools/verification/models/sched/tss.dot       |  18 --
 18 files changed, 385 insertions(+), 410 deletions(-)
 delete mode 100644 kernel/trace/rv/monitors/sncid/Kconfig
 delete mode 100644 kernel/trace/rv/monitors/sncid/sncid.c
 delete mode 100644 kernel/trace/rv/monitors/sncid/sncid.h
 delete mode 100644 kernel/trace/rv/monitors/sncid/sncid_trace.h
 create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sts/sts.c
 create mode 100644 kernel/trace/rv/monitors/sts/sts.h
 rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
 delete mode 100644 kernel/trace/rv/monitors/tss/Kconfig
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
 delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
 delete mode 100644 tools/verification/models/sched/sncid.dot
 create mode 100644 tools/verification/models/sched/sts.dot
 delete mode 100644 tools/verification/models/sched/tss.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 24b2c62a3bc2..6c4c00216c07 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -40,26 +40,6 @@ defined in by Daniel Bristot in [1].
 
 Currently we included the following:
 
-Monitor tss
-~~~~~~~~~~~
-
-The task switch while scheduling (tss) monitor ensures a task switch happens
-only in scheduling context, that is inside a call to `__schedule`::
-
-                     |
-                     |
-                     v
-                   +-----------------+
-                   |     thread      | <+
-                   +-----------------+  |
-                     |                  |
-                     | schedule_entry   | schedule_exit
-                     v                  |
-    sched_switch                        |
-  +---------------                      |
-  |                       sched         |
-  +-------------->                     -+
-
 Monitor sco
 ~~~~~~~~~~~
 
@@ -144,26 +124,55 @@ does not enable preemption::
                                                   |
                           scheduling_contex      -+
 
-Monitor sncid
-~~~~~~~~~~~~~
-
-The schedule not called with interrupt disabled (sncid) monitor ensures
-schedule is not called with interrupt disabled::
+Monitor sts
+~~~~~~~~~~~
 
-                       |
-                       |
-                       v
-    schedule_entry   +--------------+
-    schedule_exit    |              |
-  +----------------- |  can_sched   |
-  |                  |              |
-  +----------------> |              | <+
-                     +--------------+  |
-                       |               |
-                       | irq_disable   | irq_enable
-                       v               |
-                                       |
-                        cant_sched    -+
+The schedule implies task switch (sts) monitor ensures a task switch happens
+only in scheduling context and up to once, as well as scheduling occurs with
+interrupts enabled but no task switch can happen before interrupts are
+disabled. When the next task picked for execution is the same as the previously
+running one, no real task switch occurs but interrupts are disabled nonetheless::
+
+    irq_entry                      |
+     +----+                        |
+     v    |                        v
+ +------------+ irq_enable    #===================#   irq_disable
+ |            | ------------> H                   H   irq_entry
+ | cant_sched | <------------ H                   H   irq_enable
+ |            | irq_disable   H     can_sched     H --------------+
+ +------------+               H                   H               |
+                              H                   H               |
+            +---------------> H                   H <-------------+
+            |                 #===================#
+            |                   |
+      schedule_exit             | schedule_entry
+            |                   v
+            |   +-------------------+     irq_enable
+            |   |    scheduling     | <---------------+
+            |   +-------------------+                 |
+            |     |                                   |
+            |     | irq_disable                    +--------+  irq_entry
+            |     v                                |        | --------+
+            |   +-------------------+  irq_entry   | in_irq |         |
+            |   |                   | -----------> |        | <-------+
+            |   | disable_to_switch |              +--------+
+            |   |                   | --+
+            |   +-------------------+   |
+            |     |                     |
+            |     | sched_switch        |
+            |     v                     |
+            |   +-------------------+   |
+            |   |     switching     |   | irq_enable
+            |   +-------------------+   |
+            |     |                     |
+            |     | irq_enable          |
+            |     v                     |
+            |   +-------------------+   |
+            +-- |  enable_to_exit   | <-+
+                +-------------------+
+                  ^               | irq_disable
+                  |               | irq_entry
+                  +---------------+ irq_enable
 
 References
 ----------
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 34164eb4ec91..b688b24081c8 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -50,12 +50,11 @@ source "kernel/trace/rv/monitors/wip/Kconfig"
 source "kernel/trace/rv/monitors/wwnr/Kconfig"
 
 source "kernel/trace/rv/monitors/sched/Kconfig"
-source "kernel/trace/rv/monitors/tss/Kconfig"
 source "kernel/trace/rv/monitors/sco/Kconfig"
 source "kernel/trace/rv/monitors/snroc/Kconfig"
 source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
-source "kernel/trace/rv/monitors/sncid/Kconfig"
+source "kernel/trace/rv/monitors/sts/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 13ec2944c665..1939d3d7621c 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -6,15 +6,14 @@ obj-$(CONFIG_RV) += rv.o
 obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
 obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
 obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
-obj-$(CONFIG_RV_MON_TSS) += monitors/tss/tss.o
 obj-$(CONFIG_RV_MON_SCO) += monitors/sco/sco.o
 obj-$(CONFIG_RV_MON_SNROC) += monitors/snroc/snroc.o
 obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
 obj-$(CONFIG_RV_MON_SNEP) += monitors/snep/snep.o
-obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
 obj-$(CONFIG_RV_MON_RTAPP) += monitors/rtapp/rtapp.o
 obj-$(CONFIG_RV_MON_PAGEFAULT) += monitors/pagefault/pagefault.o
 obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
+obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/sncid/Kconfig b/kernel/trace/rv/monitors/sncid/Kconfig
deleted file mode 100644
index 3a5639feaaaf..000000000000
--- a/kernel/trace/rv/monitors/sncid/Kconfig
+++ /dev/null
@@ -1,15 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-#
-config RV_MON_SNCID
-	depends on RV
-	depends on TRACE_IRQFLAGS
-	depends on RV_MON_SCHED
-	default y
-	select DA_MON_EVENTS_IMPLICIT
-	bool "sncid monitor"
-	help
-	  Monitor to ensure schedule is not called with interrupt disabled.
-	  This monitor is part of the sched monitors collection.
-
-	  For further information, see:
-	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sncid/sncid.c b/kernel/trace/rv/monitors/sncid/sncid.c
deleted file mode 100644
index c8491f426365..000000000000
--- a/kernel/trace/rv/monitors/sncid/sncid.c
+++ /dev/null
@@ -1,95 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "sncid"
-
-#include <trace/events/sched.h>
-#include <trace/events/preemptirq.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "sncid.h"
-
-static struct rv_monitor rv_sncid;
-DECLARE_DA_MON_PER_CPU(sncid, unsigned char);
-
-static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
-{
-	da_handle_event_sncid(irq_disable_sncid);
-}
-
-static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
-{
-	da_handle_start_event_sncid(irq_enable_sncid);
-}
-
-static void handle_schedule_entry(void *data, bool preempt)
-{
-	da_handle_start_event_sncid(schedule_entry_sncid);
-}
-
-static void handle_schedule_exit(void *data, bool is_switch)
-{
-	da_handle_start_event_sncid(schedule_exit_sncid);
-}
-
-static int enable_sncid(void)
-{
-	int retval;
-
-	retval = da_monitor_init_sncid();
-	if (retval)
-		return retval;
-
-	rv_attach_trace_probe("sncid", irq_disable, handle_irq_disable);
-	rv_attach_trace_probe("sncid", irq_enable, handle_irq_enable);
-	rv_attach_trace_probe("sncid", sched_entry_tp, handle_schedule_entry);
-	rv_attach_trace_probe("sncid", sched_exit_tp, handle_schedule_exit);
-
-	return 0;
-}
-
-static void disable_sncid(void)
-{
-	rv_sncid.enabled = 0;
-
-	rv_detach_trace_probe("sncid", irq_disable, handle_irq_disable);
-	rv_detach_trace_probe("sncid", irq_enable, handle_irq_enable);
-	rv_detach_trace_probe("sncid", sched_entry_tp, handle_schedule_entry);
-	rv_detach_trace_probe("sncid", sched_exit_tp, handle_schedule_exit);
-
-	da_monitor_destroy_sncid();
-}
-
-static struct rv_monitor rv_sncid = {
-	.name = "sncid",
-	.description = "schedule not called with interrupt disabled.",
-	.enable = enable_sncid,
-	.disable = disable_sncid,
-	.reset = da_monitor_reset_all_sncid,
-	.enabled = 0,
-};
-
-static int __init register_sncid(void)
-{
-	return rv_register_monitor(&rv_sncid, &rv_sched);
-}
-
-static void __exit unregister_sncid(void)
-{
-	rv_unregister_monitor(&rv_sncid);
-}
-
-module_init(register_sncid);
-module_exit(unregister_sncid);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("sncid: schedule not called with interrupt disabled.");
diff --git a/kernel/trace/rv/monitors/sncid/sncid.h b/kernel/trace/rv/monitors/sncid/sncid.h
deleted file mode 100644
index 21304725142b..000000000000
--- a/kernel/trace/rv/monitors/sncid/sncid.h
+++ /dev/null
@@ -1,49 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of sncid automaton
- * For further information about this format, see kernel documentation:
- *   Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_sncid {
-	can_sched_sncid = 0,
-	cant_sched_sncid,
-	state_max_sncid
-};
-
-#define INVALID_STATE state_max_sncid
-
-enum events_sncid {
-	irq_disable_sncid = 0,
-	irq_enable_sncid,
-	schedule_entry_sncid,
-	schedule_exit_sncid,
-	event_max_sncid
-};
-
-struct automaton_sncid {
-	char *state_names[state_max_sncid];
-	char *event_names[event_max_sncid];
-	unsigned char function[state_max_sncid][event_max_sncid];
-	unsigned char initial_state;
-	bool final_states[state_max_sncid];
-};
-
-static const struct automaton_sncid automaton_sncid = {
-	.state_names = {
-		"can_sched",
-		"cant_sched"
-	},
-	.event_names = {
-		"irq_disable",
-		"irq_enable",
-		"schedule_entry",
-		"schedule_exit"
-	},
-	.function = {
-		{ cant_sched_sncid,   INVALID_STATE, can_sched_sncid, can_sched_sncid },
-		{    INVALID_STATE, can_sched_sncid,   INVALID_STATE,   INVALID_STATE },
-	},
-	.initial_state = can_sched_sncid,
-	.final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/monitors/sncid/sncid_trace.h b/kernel/trace/rv/monitors/sncid/sncid_trace.h
deleted file mode 100644
index 3ce42a57671d..000000000000
--- a/kernel/trace/rv/monitors/sncid/sncid_trace.h
+++ /dev/null
@@ -1,15 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-
-/*
- * Snippet to be included in rv_trace.h
- */
-
-#ifdef CONFIG_RV_MON_SNCID
-DEFINE_EVENT(event_da_monitor, event_sncid,
-	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
-	     TP_ARGS(state, event, next_state, final_state));
-
-DEFINE_EVENT(error_da_monitor, error_sncid,
-	     TP_PROTO(char *state, char *event),
-	     TP_ARGS(state, event));
-#endif /* CONFIG_RV_MON_SNCID */
diff --git a/kernel/trace/rv/monitors/sts/Kconfig b/kernel/trace/rv/monitors/sts/Kconfig
new file mode 100644
index 000000000000..7d1ff0f6fc91
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/Kconfig
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_STS
+	depends on RV
+	depends on TRACE_IRQFLAGS
+	depends on RV_MON_SCHED
+	default y
+	select DA_MON_EVENTS_IMPLICIT
+	bool "sts monitor"
+	help
+	  Monitor to ensure relationships between scheduler and task switches
+	   * the scheduler is called and returns with interrupts disabled
+	   * each call to the scheduler has up to one switch
+	   * switches only happen inside the scheduler
+	   * each call to the scheduler disables interrupts to switch
+	  This monitor is part of the sched monitors collection.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sts/sts.c b/kernel/trace/rv/monitors/sts/sts.c
new file mode 100644
index 000000000000..c4a9cd67c1d2
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sts"
+
+#include <trace/events/sched.h>
+#include <trace/events/irq.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sts.h"
+
+static struct rv_monitor rv_sts;
+DECLARE_DA_MON_PER_CPU(sts, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_sts(irq_entry_sts);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("sts", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("sts", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("sts", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("sts", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("sts", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("sts", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("sts", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("sts", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("sts", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("sts", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_sts(irq_disable_sts);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_sts(irq_enable_sts);
+}
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_sts(irq_entry_sts);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+				struct task_struct *prev,
+				struct task_struct *next,
+				unsigned int prev_state)
+{
+	da_handle_event_sts(sched_switch_sts);
+}
+
+static void handle_schedule_entry(void *data, bool preempt)
+{
+	da_handle_event_sts(schedule_entry_sts);
+}
+
+static void handle_schedule_exit(void *data, bool is_switch)
+{
+	da_handle_start_event_sts(schedule_exit_sts);
+}
+
+static int enable_sts(void)
+{
+	int retval;
+
+	retval = da_monitor_init_sts();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("sts", irq_disable, handle_irq_disable);
+	rv_attach_trace_probe("sts", irq_enable, handle_irq_enable);
+	rv_attach_trace_probe("sts", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("sts", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+	rv_attach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_sts(void)
+{
+	rv_sts.enabled = 0;
+
+	rv_detach_trace_probe("sts", irq_disable, handle_irq_disable);
+	rv_detach_trace_probe("sts", irq_enable, handle_irq_enable);
+	rv_detach_trace_probe("sts", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("sts", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+	rv_detach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+	detach_vector_irq();
+
+	da_monitor_destroy_sts();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_sts = {
+	.name = "sts",
+	.description = "schedule implies task switch.",
+	.enable = enable_sts,
+	.disable = disable_sts,
+	.reset = da_monitor_reset_all_sts,
+	.enabled = 0,
+};
+
+static int __init register_sts(void)
+{
+	return rv_register_monitor(&rv_sts, &rv_sched);
+}
+
+static void __exit unregister_sts(void)
+{
+	rv_unregister_monitor(&rv_sts);
+}
+
+module_init(register_sts);
+module_exit(unregister_sts);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sts: schedule implies task switch.");
diff --git a/kernel/trace/rv/monitors/sts/sts.h b/kernel/trace/rv/monitors/sts/sts.h
new file mode 100644
index 000000000000..3368b6599a00
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.h
@@ -0,0 +1,117 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sts automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sts {
+	can_sched_sts = 0,
+	cant_sched_sts,
+	disable_to_switch_sts,
+	enable_to_exit_sts,
+	in_irq_sts,
+	scheduling_sts,
+	switching_sts,
+	state_max_sts
+};
+
+#define INVALID_STATE state_max_sts
+
+enum events_sts {
+	irq_disable_sts = 0,
+	irq_enable_sts,
+	irq_entry_sts,
+	sched_switch_sts,
+	schedule_entry_sts,
+	schedule_exit_sts,
+	event_max_sts
+};
+
+struct automaton_sts {
+	char *state_names[state_max_sts];
+	char *event_names[event_max_sts];
+	unsigned char function[state_max_sts][event_max_sts];
+	unsigned char initial_state;
+	bool final_states[state_max_sts];
+};
+
+static const struct automaton_sts automaton_sts = {
+	.state_names = {
+		"can_sched",
+		"cant_sched",
+		"disable_to_switch",
+		"enable_to_exit",
+		"in_irq",
+		"scheduling",
+		"switching"
+	},
+	.event_names = {
+		"irq_disable",
+		"irq_enable",
+		"irq_entry",
+		"sched_switch",
+		"schedule_entry",
+		"schedule_exit"
+	},
+	.function = {
+		{
+			cant_sched_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			scheduling_sts,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			can_sched_sts,
+			cant_sched_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			enable_to_exit_sts,
+			in_irq_sts,
+			switching_sts,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			enable_to_exit_sts,
+			enable_to_exit_sts,
+			enable_to_exit_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			can_sched_sts
+		},
+		{
+			INVALID_STATE,
+			scheduling_sts,
+			in_irq_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			disable_to_switch_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			enable_to_exit_sts,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+	},
+	.initial_state = can_sched_sts,
+	.final_states = { 1, 0, 0, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h b/kernel/trace/rv/monitors/sts/sts_trace.h
similarity index 67%
rename from kernel/trace/rv/monitors/tss/tss_trace.h
rename to kernel/trace/rv/monitors/sts/sts_trace.h
index 4619dbb50cc0..d78beb58d5b3 100644
--- a/kernel/trace/rv/monitors/tss/tss_trace.h
+++ b/kernel/trace/rv/monitors/sts/sts_trace.h
@@ -4,12 +4,12 @@
  * Snippet to be included in rv_trace.h
  */
 
-#ifdef CONFIG_RV_MON_TSS
-DEFINE_EVENT(event_da_monitor, event_tss,
+#ifdef CONFIG_RV_MON_STS
+DEFINE_EVENT(event_da_monitor, event_sts,
 	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
 	     TP_ARGS(state, event, next_state, final_state));
 
-DEFINE_EVENT(error_da_monitor, error_tss,
+DEFINE_EVENT(error_da_monitor, error_sts,
 	     TP_PROTO(char *state, char *event),
 	     TP_ARGS(state, event));
-#endif /* CONFIG_RV_MON_TSS */
+#endif /* CONFIG_RV_MON_STS */
diff --git a/kernel/trace/rv/monitors/tss/Kconfig b/kernel/trace/rv/monitors/tss/Kconfig
deleted file mode 100644
index 479f86f52e60..000000000000
--- a/kernel/trace/rv/monitors/tss/Kconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-#
-config RV_MON_TSS
-	depends on RV
-	depends on RV_MON_SCHED
-	default y
-	select DA_MON_EVENTS_IMPLICIT
-	bool "tss monitor"
-	help
-	  Monitor to ensure sched_switch happens only in scheduling context.
-	  This monitor is part of the sched monitors collection.
-
-	  For further information, see:
-	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
deleted file mode 100644
index 95ebd15131f5..000000000000
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ /dev/null
@@ -1,90 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "tss"
-
-#include <trace/events/sched.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "tss.h"
-
-static struct rv_monitor rv_tss;
-DECLARE_DA_MON_PER_CPU(tss, unsigned char);
-
-static void handle_sched_switch(void *data, bool preempt,
-				struct task_struct *prev,
-				struct task_struct *next,
-				unsigned int prev_state)
-{
-	da_handle_event_tss(sched_switch_tss);
-}
-
-static void handle_schedule_entry(void *data, bool preempt)
-{
-	da_handle_event_tss(schedule_entry_tss);
-}
-
-static void handle_schedule_exit(void *data, bool is_switch)
-{
-	da_handle_start_event_tss(schedule_exit_tss);
-}
-
-static int enable_tss(void)
-{
-	int retval;
-
-	retval = da_monitor_init_tss();
-	if (retval)
-		return retval;
-
-	rv_attach_trace_probe("tss", sched_switch, handle_sched_switch);
-	rv_attach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
-	rv_attach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
-	return 0;
-}
-
-static void disable_tss(void)
-{
-	rv_tss.enabled = 0;
-
-	rv_detach_trace_probe("tss", sched_switch, handle_sched_switch);
-	rv_detach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
-	rv_detach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
-	da_monitor_destroy_tss();
-}
-
-static struct rv_monitor rv_tss = {
-	.name = "tss",
-	.description = "task switch while scheduling.",
-	.enable = enable_tss,
-	.disable = disable_tss,
-	.reset = da_monitor_reset_all_tss,
-	.enabled = 0,
-};
-
-static int __init register_tss(void)
-{
-	return rv_register_monitor(&rv_tss, &rv_sched);
-}
-
-static void __exit unregister_tss(void)
-{
-	rv_unregister_monitor(&rv_tss);
-}
-
-module_init(register_tss);
-module_exit(unregister_tss);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("tss: task switch while scheduling.");
diff --git a/kernel/trace/rv/monitors/tss/tss.h b/kernel/trace/rv/monitors/tss/tss.h
deleted file mode 100644
index f0a36fda1b87..000000000000
--- a/kernel/trace/rv/monitors/tss/tss.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of tss automaton
- * For further information about this format, see kernel documentation:
- *   Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_tss {
-	thread_tss = 0,
-	sched_tss,
-	state_max_tss
-};
-
-#define INVALID_STATE state_max_tss
-
-enum events_tss {
-	sched_switch_tss = 0,
-	schedule_entry_tss,
-	schedule_exit_tss,
-	event_max_tss
-};
-
-struct automaton_tss {
-	char *state_names[state_max_tss];
-	char *event_names[event_max_tss];
-	unsigned char function[state_max_tss][event_max_tss];
-	unsigned char initial_state;
-	bool final_states[state_max_tss];
-};
-
-static const struct automaton_tss automaton_tss = {
-	.state_names = {
-		"thread",
-		"sched"
-	},
-	.event_names = {
-		"sched_switch",
-		"schedule_entry",
-		"schedule_exit"
-	},
-	.function = {
-		{     INVALID_STATE,         sched_tss,     INVALID_STATE },
-		{         sched_tss,     INVALID_STATE,        thread_tss },
-	},
-	.initial_state = thread_tss,
-	.final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 3af46cd185b3..97b2f7e07f27 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -58,11 +58,10 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 );
 
 #include <monitors/wip/wip_trace.h>
-#include <monitors/tss/tss_trace.h>
 #include <monitors/sco/sco_trace.h>
 #include <monitors/scpd/scpd_trace.h>
 #include <monitors/snep/snep_trace.h>
-#include <monitors/sncid/sncid_trace.h>
+#include <monitors/sts/sts_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
 
 #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/sncid.dot b/tools/verification/models/sched/sncid.dot
deleted file mode 100644
index 072851721b50..000000000000
--- a/tools/verification/models/sched/sncid.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
-	center = true;
-	size = "7,11";
-	{node [shape = plaintext, style=invis, label=""] "__init_can_sched"};
-	{node [shape = ellipse] "can_sched"};
-	{node [shape = plaintext] "can_sched"};
-	{node [shape = plaintext] "cant_sched"};
-	"__init_can_sched" -> "can_sched";
-	"can_sched" [label = "can_sched", color = green3];
-	"can_sched" -> "can_sched" [ label = "schedule_entry\nschedule_exit" ];
-	"can_sched" -> "cant_sched" [ label = "irq_disable" ];
-	"cant_sched" [label = "cant_sched"];
-	"cant_sched" -> "can_sched" [ label = "irq_enable" ];
-	{ rank = min ;
-		"__init_can_sched";
-		"can_sched";
-	}
-}
diff --git a/tools/verification/models/sched/sts.dot b/tools/verification/models/sched/sts.dot
new file mode 100644
index 000000000000..8f5f38be04d5
--- /dev/null
+++ b/tools/verification/models/sched/sts.dot
@@ -0,0 +1,38 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_can_sched"};
+	{node [shape = doublecircle] "can_sched"};
+	{node [shape = circle] "can_sched"};
+	{node [shape = circle] "cant_sched"};
+	{node [shape = circle] "disable_to_switch"};
+	{node [shape = circle] "enable_to_exit"};
+	{node [shape = circle] "in_irq"};
+	{node [shape = circle] "scheduling"};
+	{node [shape = circle] "switching"};
+	"__init_can_sched" -> "can_sched";
+	"can_sched" [label = "can_sched", color = green3];
+	"can_sched" -> "cant_sched" [ label = "irq_disable" ];
+	"can_sched" -> "scheduling" [ label = "schedule_entry" ];
+	"cant_sched" [label = "cant_sched"];
+	"cant_sched" -> "can_sched" [ label = "irq_enable" ];
+	"cant_sched" -> "cant_sched" [ label = "irq_entry" ];
+	"disable_to_switch" [label = "disable_to_switch"];
+	"disable_to_switch" -> "enable_to_exit" [ label = "irq_enable" ];
+	"disable_to_switch" -> "in_irq" [ label = "irq_entry" ];
+	"disable_to_switch" -> "switching" [ label = "sched_switch" ];
+	"enable_to_exit" [label = "enable_to_exit"];
+	"enable_to_exit" -> "can_sched" [ label = "schedule_exit" ];
+	"enable_to_exit" -> "enable_to_exit" [ label = "irq_disable\nirq_entry\nirq_enable" ];
+	"in_irq" [label = "in_irq"];
+	"in_irq" -> "in_irq" [ label = "irq_entry" ];
+	"in_irq" -> "scheduling" [ label = "irq_enable" ];
+	"scheduling" [label = "scheduling"];
+	"scheduling" -> "disable_to_switch" [ label = "irq_disable" ];
+	"switching" [label = "switching"];
+	"switching" -> "enable_to_exit" [ label = "irq_enable" ];
+	{ rank = min ;
+		"__init_can_sched";
+		"can_sched";
+	}
+}
diff --git a/tools/verification/models/sched/tss.dot b/tools/verification/models/sched/tss.dot
deleted file mode 100644
index 7dfa1d9121bb..000000000000
--- a/tools/verification/models/sched/tss.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
-	center = true;
-	size = "7,11";
-	{node [shape = plaintext] "sched"};
-	{node [shape = plaintext, style=invis, label=""] "__init_thread"};
-	{node [shape = ellipse] "thread"};
-	{node [shape = plaintext] "thread"};
-	"__init_thread" -> "thread";
-	"sched" [label = "sched"];
-	"sched" -> "sched" [ label = "sched_switch" ];
-	"sched" -> "thread" [ label = "schedule_exit" ];
-	"thread" [label = "thread", color = green3];
-	"thread" -> "sched" [ label = "schedule_entry" ];
-	{ rank = min ;
-		"__init_thread";
-		"thread";
-	}
-}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (6 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 15:56   ` Nam Cao
  2025-07-28 13:50 ` [PATCH v5 9/9] rv: Add opid per-cpu monitor Gabriele Monaco
  8 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Add 2 per-task monitors as part of the sched model:

* nrp: need-resched preempts
    Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
    Monitor to ensure sched_set_state to sleepable leads to sleeping and
    sleeping tasks require wakeup.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst   | 167 +++++++++++++++++++++
 kernel/trace/rv/Kconfig                    |   2 +
 kernel/trace/rv/Makefile                   |   2 +
 kernel/trace/rv/monitors/nrp/Kconfig       |  16 ++
 kernel/trace/rv/monitors/nrp/nrp.c         | 138 +++++++++++++++++
 kernel/trace/rv/monitors/nrp/nrp.h         |  75 +++++++++
 kernel/trace/rv/monitors/nrp/nrp_trace.h   |  15 ++
 kernel/trace/rv/monitors/sched/Kconfig     |   1 +
 kernel/trace/rv/monitors/sssw/Kconfig      |  15 ++
 kernel/trace/rv/monitors/sssw/sssw.c       | 116 ++++++++++++++
 kernel/trace/rv/monitors/sssw/sssw.h       | 105 +++++++++++++
 kernel/trace/rv/monitors/sssw/sssw_trace.h |  15 ++
 kernel/trace/rv/rv_trace.h                 |   2 +
 tools/verification/models/sched/nrp.dot    |  29 ++++
 tools/verification/models/sched/sssw.dot   |  30 ++++
 15 files changed, 728 insertions(+)
 create mode 100644 kernel/trace/rv/monitors/nrp/Kconfig
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.c
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp.h
 create mode 100644 kernel/trace/rv/monitors/nrp/nrp_trace.h
 create mode 100644 kernel/trace/rv/monitors/sssw/Kconfig
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.c
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw.h
 create mode 100644 kernel/trace/rv/monitors/sssw/sssw_trace.h
 create mode 100644 tools/verification/models/sched/nrp.dot
 create mode 100644 tools/verification/models/sched/sssw.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 6c4c00216c07..11ef963cb578 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -174,6 +174,173 @@ running one, no real task switch occurs but interrupts are disabled nonetheless:
                   |               | irq_entry
                   +---------------+ irq_enable
 
+Monitor nrp
+-----------
+
+The need resched preempts (nrp) monitor ensures preemption requires
+``need_resched``. Only kernel preemption is considered, since preemption
+while returning to userspace, for this monitor, is indistinguishable from
+``sched_switch_yield`` (described in the sssw monitor).
+A kernel preemption is whenever ``__schedule`` is called with the preemption
+flag set to true (e.g. from preempt_enable or exiting from interrupts). This
+type of preemption occurs after the need for ``rescheduling`` has been set.
+This is not valid for the *lazy* variant of the flag, which causes only
+userspace preemption.
+A ``schedule_entry_preempt`` may involve a task switch or not, in the latter
+case, a task goes through the scheduler from a preemption context but it is
+picked as the next task to run. Since the scheduler runs, this clears the need
+to reschedule. The ``any_thread_running`` state does not imply the monitored
+task is not running as this monitor does not track the outcome of scheduling.
+
+In theory, a preemption can only occur after the ``need_resched`` flag is set. In
+practice, however, it is possible to see a preemption where the flag is not
+set. This can happen in one specific condition::
+
+  need_resched
+                   preempt_schedule()
+                                           preempt_schedule_irq()
+                                                   __schedule()
+  !need_resched
+                           __schedule()
+
+In the situation above, standard preemption starts (e.g. from preempt_enable
+when the flag is set), an interrupt occurs before scheduling and, on its exit
+path, it schedules, which clears the ``need_resched`` flag.
+When the preempted task runs again, the standard preemption started earlier
+resumes, although the flag is no longer set. The monitor considers this a
+``nested_preemption``, this allows another preemption without re-setting the
+flag. This condition relaxes the monitor constraints and may catch false
+negatives (i.e. no real ``nested_preemptions``) but makes the monitor more
+robust and able to validate other scenarios.
+For simplicity, the monitor starts in ``preempt_irq``, although no interrupt
+occurred, as the situation above is hard to pinpoint::
+
+    schedule_entry
+    irq_entry                 #===========================================#
+  +-------------------------- H                                           H
+  |                           H                                           H
+  +-------------------------> H             any_thread_running            H
+                              H                                           H
+  +-------------------------> H                                           H
+  |                           #===========================================#
+  | schedule_entry              |                       ^
+  | schedule_entry_preempt      | sched_need_resched    | schedule_entry
+  |                             |                      schedule_entry_preempt
+  |                             v                       |
+  |                           +----------------------+  |
+  |                      +--- |                      |  |
+  |   sched_need_resched |    |     rescheduling     | -+
+  |                      +--> |                      |
+  |                           +----------------------+
+  |                             | irq_entry
+  |                             v
+  |                           +----------------------+
+  |                           |                      | ---+
+  |                      ---> |                      |    | sched_need_resched
+  |                           |      preempt_irq     |    | irq_entry
+  |                           |                      | <--+
+  |                           |                      | <--+
+  |                           +----------------------+    |
+  |                             | schedule_entry          | sched_need_resched
+  |                             | schedule_entry_preempt  |
+  |                             v                         |
+  |                           +-----------------------+   |
+  +-------------------------- |    nested_preempt     | --+
+                              +-----------------------+
+                                ^ irq_entry         |
+                                +-------------------+
+
+Due to how the ``need_resched`` flag on the preemption count works on arm64,
+this monitor is unstable on that architecture, as it often records preemption
+when the flag is not set, even in presence of the workaround above.
+For the time being, the monitor is disabled by default on arm64.
+
+Monitor sssw
+------------
+
+The set state sleep and wakeup (sssw) monitor ensures ``set_state`` to
+sleepable leads to sleeping and sleeping tasks require wakeup. It includes the
+following types of switch:
+
+* ``switch_suspend``:
+  a task puts itself to sleep, this can happen only after explicitly setting
+  the task to ``sleepable``. After a task is suspended, it needs to be woken up
+  (``waking`` state) before being switched in again.
+  Setting the task's state to ``sleepable`` can be reverted before switching if it
+  is woken up or set to ``runnable``.
+* ``switch_blocking``:
+  a special case of a ``switch_suspend`` where the task is waiting on a
+  sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set
+  state events racing with each other and this leads the model to perceive this
+  type of switch when the task is not set to sleepable. This is a limitation of
+  the model in SMP system and workarounds may slow down the system.
+* ``switch_preempt``:
+  a task switch as a result of kernel preemption (``schedule_entry_preempt`` in
+  the nrp model).
+* ``switch_yield``:
+  a task explicitly calls the scheduler or is preempted while returning to
+  userspace. It can happen after a ``yield`` system call, from the idle task or
+  if the ``need_resched`` flag is set. By definition, a task cannot yield while
+  ``sleepable`` as that would be a suspension. A special case of a yield occurs
+  when a task in ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is
+  pending. The task doesn't go through the usual blocking/waking and is set
+  back to runnable, the resulting switch (if there) looks like a yield to the
+  ``signal_wakeup`` state and is followed by the signal delivery. From this
+  state, the monitor expects a signal even if it sees a wakeup event, although
+  not necessary, to rule out false negatives.
+
+This monitor doesn't include a running state, ``sleepable`` and ``runnable``
+are only referring to the task's desired state, which could be scheduled out
+(e.g. due to preemption). However, it does include the event
+``sched_switch_in`` to represent when a task is allowed to become running. This
+can be triggered also by preemption, but cannot occur after the task got to
+``sleeping`` before a ``wakeup`` occurs::
+
+   +--------------------------------------------------------------------------+
+   |                                                                          |
+   |                                                                          |
+   | switch_suspend           |                                               |
+   | switch_blocking          |                                               |
+   v                          v                                               |
+ +----------+              #==========================#   set_state_runnable  |
+ |          |              H                          H   wakeup              |
+ |          |              H                          H   switch_in           |
+ |          |              H                          H   switch_yield        |
+ | sleeping |              H                          H   switch_preempt      |
+ |          |              H                          H   signal_deliver      |
+ |          |  switch_     H                          H ------+               |
+ |          |  _blocking   H         runnable         H       |               |
+ |          | <----------- H                          H <-----+               |
+ +----------+              H                          H                       |
+   |   wakeup              H                          H                       |
+   +---------------------> H                          H                       |
+                           H                          H                       |
+               +---------> H                          H                       |
+               |           #==========================#                       |
+               |             |                ^                               |
+               |             |                | set_state_runnable            |
+               |             |                | wakeup                        |
+               |    set_state_sleepable       |      +------------------------+
+               |             v                |      |
+               |           +--------------------------+  set_state_sleepable
+               |           |                          |  switch_in
+               |           |                          |  switch_preempt
+   signal_deliver          |        sleepable         |  signal_deliver
+               |           |                          | ------+
+               |           |                          |       |
+               |           |                          | <-----+
+               |           +--------------------------+
+               |             |                ^
+               |        switch_yield          | set_state_sleepable
+               |             v                |
+               |           +---------------+  |
+               +---------- | signal_wakeup | -+
+                           +---------------+
+                             ^           | switch_in
+                             |           | switch_preempt
+                             |           | switch_yield
+                             +-----------+ wakeup
+
 References
 ----------
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index b688b24081c8..59d0db898d4a 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -55,6 +55,8 @@ source "kernel/trace/rv/monitors/snroc/Kconfig"
 source "kernel/trace/rv/monitors/scpd/Kconfig"
 source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sts/Kconfig"
+source "kernel/trace/rv/monitors/nrp/Kconfig"
+source "kernel/trace/rv/monitors/sssw/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 1939d3d7621c..2afac88539d3 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -14,6 +14,8 @@ obj-$(CONFIG_RV_MON_RTAPP) += monitors/rtapp/rtapp.o
 obj-$(CONFIG_RV_MON_PAGEFAULT) += monitors/pagefault/pagefault.o
 obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
 obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
+obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
+obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/nrp/Kconfig b/kernel/trace/rv/monitors/nrp/Kconfig
new file mode 100644
index 000000000000..f5ec08f65535
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/Kconfig
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_NRP
+	depends on RV
+	depends on RV_MON_SCHED
+	default y if !ARM64
+	select DA_MON_EVENTS_ID
+	bool "nrp monitor"
+	help
+	  Monitor to ensure preemption requires need resched.
+	  This monitor is part of the sched monitors collection.
+
+	  This monitor is unstable on arm64, say N unless you are testing it.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/nrp/nrp.c b/kernel/trace/rv/monitors/nrp/nrp.c
new file mode 100644
index 000000000000..5a83b7171432
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "nrp"
+
+#include <trace/events/irq.h>
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "nrp.h"
+
+static struct rv_monitor rv_nrp;
+DECLARE_DA_MON_PER_TASK(nrp, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("nrp", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("nrp", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("nrp", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("nrp", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_nrp(current, irq_entry_nrp);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk,
+				      int cpu, int tif)
+{
+	/*
+	 * Although need_resched leads to both the rescheduling and preempt_irq
+	 * states, it is safer to start the monitor always in preempt_irq,
+	 * which may not mirror the system state but makes the monitor simpler,
+	 */
+	if (tif == TIF_NEED_RESCHED)
+		da_handle_start_event_nrp(tsk, sched_need_resched_nrp);
+}
+
+static void handle_schedule_entry(void *data, bool preempt)
+{
+	if (preempt)
+		da_handle_event_nrp(current, schedule_entry_preempt_nrp);
+	else
+		da_handle_event_nrp(current, schedule_entry_nrp);
+}
+
+static int enable_nrp(void)
+{
+	int retval;
+
+	retval = da_monitor_init_nrp();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_attach_trace_probe("nrp", sched_entry_tp, handle_schedule_entry);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_nrp(void)
+{
+	rv_nrp.enabled = 0;
+
+	rv_detach_trace_probe("nrp", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_detach_trace_probe("nrp", sched_entry_tp, handle_schedule_entry);
+	detach_vector_irq();
+
+	da_monitor_destroy_nrp();
+}
+
+static struct rv_monitor rv_nrp = {
+	.name = "nrp",
+	.description = "need resched preempts.",
+	.enable = enable_nrp,
+	.disable = disable_nrp,
+	.reset = da_monitor_reset_all_nrp,
+	.enabled = 0,
+};
+
+static int __init register_nrp(void)
+{
+	return rv_register_monitor(&rv_nrp, &rv_sched);
+}
+
+static void __exit unregister_nrp(void)
+{
+	rv_unregister_monitor(&rv_nrp);
+}
+
+module_init(register_nrp);
+module_exit(unregister_nrp);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("nrp: need resched preempts.");
diff --git a/kernel/trace/rv/monitors/nrp/nrp.h b/kernel/trace/rv/monitors/nrp/nrp.h
new file mode 100644
index 000000000000..c9f12207cbf6
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.h
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of nrp automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_nrp {
+	preempt_irq_nrp = 0,
+	any_thread_running_nrp,
+	nested_preempt_nrp,
+	rescheduling_nrp,
+	state_max_nrp
+};
+
+#define INVALID_STATE state_max_nrp
+
+enum events_nrp {
+	irq_entry_nrp = 0,
+	sched_need_resched_nrp,
+	schedule_entry_nrp,
+	schedule_entry_preempt_nrp,
+	event_max_nrp
+};
+
+struct automaton_nrp {
+	char *state_names[state_max_nrp];
+	char *event_names[event_max_nrp];
+	unsigned char function[state_max_nrp][event_max_nrp];
+	unsigned char initial_state;
+	bool final_states[state_max_nrp];
+};
+
+static const struct automaton_nrp automaton_nrp = {
+	.state_names = {
+		"preempt_irq",
+		"any_thread_running",
+		"nested_preempt",
+		"rescheduling"
+	},
+	.event_names = {
+		"irq_entry",
+		"sched_need_resched",
+		"schedule_entry",
+		"schedule_entry_preempt"
+	},
+	.function = {
+		{
+			preempt_irq_nrp,
+			preempt_irq_nrp,
+			nested_preempt_nrp,
+			nested_preempt_nrp
+		},
+		{
+			any_thread_running_nrp,
+			rescheduling_nrp,
+			any_thread_running_nrp,
+			INVALID_STATE
+		},
+		{
+			nested_preempt_nrp,
+			preempt_irq_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp
+		},
+		{
+			preempt_irq_nrp,
+			rescheduling_nrp,
+			any_thread_running_nrp,
+			any_thread_running_nrp
+		},
+	},
+	.initial_state = preempt_irq_nrp,
+	.final_states = { 0, 1, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/nrp/nrp_trace.h b/kernel/trace/rv/monitors/nrp/nrp_trace.h
new file mode 100644
index 000000000000..2e13497de3b6
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_NRP
+DEFINE_EVENT(event_da_monitor_id, event_nrp,
+	     TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_nrp,
+	     TP_PROTO(int id, char *state, char *event),
+	     TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_NRP */
diff --git a/kernel/trace/rv/monitors/sched/Kconfig b/kernel/trace/rv/monitors/sched/Kconfig
index ae3eb410abd7..aa16456da864 100644
--- a/kernel/trace/rv/monitors/sched/Kconfig
+++ b/kernel/trace/rv/monitors/sched/Kconfig
@@ -2,6 +2,7 @@
 #
 config RV_MON_SCHED
 	depends on RV
+	depends on RV_PER_TASK_MONITORS >= 3
 	bool "sched monitor"
 	help
 	  Collection of monitors to check the scheduler behaves according to specifications.
diff --git a/kernel/trace/rv/monitors/sssw/Kconfig b/kernel/trace/rv/monitors/sssw/Kconfig
new file mode 100644
index 000000000000..23b7eeb38bbf
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_SSSW
+	depends on RV
+	depends on RV_MON_SCHED
+	default y
+	select DA_MON_EVENTS_ID
+	bool "sssw monitor"
+	help
+	  Monitor to ensure sched_set_state to sleepable leads to sleeping and
+	  sleeping tasks require wakeup.
+	  This monitor is part of the sched monitors collection.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sssw/sssw.c b/kernel/trace/rv/monitors/sssw/sssw.c
new file mode 100644
index 000000000000..84b8d890d9d4
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sssw"
+
+#include <trace/events/sched.h>
+#include <trace/events/signal.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sssw.h"
+
+static struct rv_monitor rv_sssw;
+DECLARE_DA_MON_PER_TASK(sssw, unsigned char);
+
+static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+{
+	if (state == TASK_RUNNING)
+		da_handle_start_event_sssw(tsk, sched_set_state_runnable_sssw);
+	else
+		da_handle_event_sssw(tsk, sched_set_state_sleepable_sssw);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+				struct task_struct *prev,
+				struct task_struct *next,
+				unsigned int prev_state)
+{
+	if (preempt)
+		da_handle_event_sssw(prev, sched_switch_preempt_sssw);
+	else if (prev_state == TASK_RUNNING)
+		da_handle_event_sssw(prev, sched_switch_yield_sssw);
+	else if (prev_state == TASK_RTLOCK_WAIT)
+		/* special case of sleeping task with racy conditions */
+		da_handle_event_sssw(prev, sched_switch_blocking_sssw);
+	else
+		da_handle_event_sssw(prev, sched_switch_suspend_sssw);
+	da_handle_event_sssw(next, sched_switch_in_sssw);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+	/*
+	 * Wakeup can also lead to signal_wakeup although the system is
+	 * actually runnable. The monitor can safely start with this event.
+	 */
+	da_handle_start_event_sssw(p, sched_wakeup_sssw);
+}
+
+static void handle_signal_deliver(void *data, int sig,
+				   struct kernel_siginfo *info,
+				   struct k_sigaction *ka)
+{
+	da_handle_event_sssw(current, signal_deliver_sssw);
+}
+
+static int enable_sssw(void)
+{
+	int retval;
+
+	retval = da_monitor_init_sssw();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+	rv_attach_trace_probe("sssw", sched_switch, handle_sched_switch);
+	rv_attach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+	rv_attach_trace_probe("sssw", signal_deliver, handle_signal_deliver);
+
+	return 0;
+}
+
+static void disable_sssw(void)
+{
+	rv_sssw.enabled = 0;
+
+	rv_detach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+	rv_detach_trace_probe("sssw", sched_switch, handle_sched_switch);
+	rv_detach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+	rv_detach_trace_probe("sssw", signal_deliver, handle_signal_deliver);
+
+	da_monitor_destroy_sssw();
+}
+
+static struct rv_monitor rv_sssw = {
+	.name = "sssw",
+	.description = "set state sleep and wakeup.",
+	.enable = enable_sssw,
+	.disable = disable_sssw,
+	.reset = da_monitor_reset_all_sssw,
+	.enabled = 0,
+};
+
+static int __init register_sssw(void)
+{
+	return rv_register_monitor(&rv_sssw, &rv_sched);
+}
+
+static void __exit unregister_sssw(void)
+{
+	rv_unregister_monitor(&rv_sssw);
+}
+
+module_init(register_sssw);
+module_exit(unregister_sssw);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sssw: set state sleep and wakeup.");
diff --git a/kernel/trace/rv/monitors/sssw/sssw.h b/kernel/trace/rv/monitors/sssw/sssw.h
new file mode 100644
index 000000000000..243d54050c94
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sssw automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sssw {
+	runnable_sssw = 0,
+	signal_wakeup_sssw,
+	sleepable_sssw,
+	sleeping_sssw,
+	state_max_sssw
+};
+
+#define INVALID_STATE state_max_sssw
+
+enum events_sssw {
+	sched_set_state_runnable_sssw = 0,
+	sched_set_state_sleepable_sssw,
+	sched_switch_blocking_sssw,
+	sched_switch_in_sssw,
+	sched_switch_preempt_sssw,
+	sched_switch_suspend_sssw,
+	sched_switch_yield_sssw,
+	sched_wakeup_sssw,
+	signal_deliver_sssw,
+	event_max_sssw
+};
+
+struct automaton_sssw {
+	char *state_names[state_max_sssw];
+	char *event_names[event_max_sssw];
+	unsigned char function[state_max_sssw][event_max_sssw];
+	unsigned char initial_state;
+	bool final_states[state_max_sssw];
+};
+
+static const struct automaton_sssw automaton_sssw = {
+	.state_names = {
+		"runnable",
+		"signal_wakeup",
+		"sleepable",
+		"sleeping"
+	},
+	.event_names = {
+		"sched_set_state_runnable",
+		"sched_set_state_sleepable",
+		"sched_switch_blocking",
+		"sched_switch_in",
+		"sched_switch_preempt",
+		"sched_switch_suspend",
+		"sched_switch_yield",
+		"sched_wakeup",
+		"signal_deliver"
+	},
+	.function = {
+		{
+			runnable_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			runnable_sssw,
+			runnable_sssw,
+			INVALID_STATE,
+			runnable_sssw,
+			runnable_sssw,
+			runnable_sssw
+		},
+		{
+			INVALID_STATE,
+			sleepable_sssw,
+			INVALID_STATE,
+			signal_wakeup_sssw,
+			signal_wakeup_sssw,
+			INVALID_STATE,
+			signal_wakeup_sssw,
+			signal_wakeup_sssw,
+			runnable_sssw
+		},
+		{
+			runnable_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			sleepable_sssw,
+			sleepable_sssw,
+			sleeping_sssw,
+			signal_wakeup_sssw,
+			runnable_sssw,
+			sleepable_sssw
+		},
+		{
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			runnable_sssw,
+			INVALID_STATE
+		},
+	},
+	.initial_state = runnable_sssw,
+	.final_states = { 1, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/sssw/sssw_trace.h b/kernel/trace/rv/monitors/sssw/sssw_trace.h
new file mode 100644
index 000000000000..6c03cfc6960b
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_SSSW
+DEFINE_EVENT(event_da_monitor_id, event_sssw,
+	     TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_sssw,
+	     TP_PROTO(int id, char *state, char *event),
+	     TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_SSSW */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 97b2f7e07f27..e4e78d23b7b0 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -123,6 +123,8 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
 
 #include <monitors/wwnr/wwnr_trace.h>
 #include <monitors/snroc/snroc_trace.h>
+#include <monitors/nrp/nrp_trace.h>
+#include <monitors/sssw/sssw_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
 
 #endif /* CONFIG_DA_MON_EVENTS_ID */
diff --git a/tools/verification/models/sched/nrp.dot b/tools/verification/models/sched/nrp.dot
new file mode 100644
index 000000000000..77bb64669416
--- /dev/null
+++ b/tools/verification/models/sched/nrp.dot
@@ -0,0 +1,29 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = doublecircle] "any_thread_running"};
+	{node [shape = circle] "any_thread_running"};
+	{node [shape = circle] "nested_preempt"};
+	{node [shape = plaintext, style=invis, label=""] "__init_preempt_irq"};
+	{node [shape = circle] "preempt_irq"};
+	{node [shape = circle] "rescheduling"};
+	"__init_preempt_irq" -> "preempt_irq";
+	"any_thread_running" [label = "any_thread_running", color = green3];
+	"any_thread_running" -> "any_thread_running" [ label = "schedule_entry\nirq_entry" ];
+	"any_thread_running" -> "rescheduling" [ label = "sched_need_resched" ];
+	"nested_preempt" [label = "nested_preempt"];
+	"nested_preempt" -> "any_thread_running" [ label = "schedule_entry_preempt\nschedule_entry" ];
+	"nested_preempt" -> "nested_preempt" [ label = "irq_entry" ];
+	"nested_preempt" -> "preempt_irq" [ label = "sched_need_resched" ];
+	"preempt_irq" [label = "preempt_irq"];
+	"preempt_irq" -> "nested_preempt" [ label = "schedule_entry_preempt\nschedule_entry" ];
+	"preempt_irq" -> "preempt_irq" [ label = "irq_entry\nsched_need_resched" ];
+	"rescheduling" [label = "rescheduling"];
+	"rescheduling" -> "any_thread_running" [ label = "schedule_entry_preempt\nschedule_entry" ];
+	"rescheduling" -> "preempt_irq" [ label = "irq_entry" ];
+	"rescheduling" -> "rescheduling" [ label = "sched_need_resched" ];
+	{ rank = min ;
+		"__init_preempt_irq";
+		"preempt_irq";
+	}
+}
diff --git a/tools/verification/models/sched/sssw.dot b/tools/verification/models/sched/sssw.dot
new file mode 100644
index 000000000000..4994c3e876be
--- /dev/null
+++ b/tools/verification/models/sched/sssw.dot
@@ -0,0 +1,30 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_runnable"};
+	{node [shape = doublecircle] "runnable"};
+	{node [shape = circle] "runnable"};
+	{node [shape = circle] "signal_wakeup"};
+	{node [shape = circle] "sleepable"};
+	{node [shape = circle] "sleeping"};
+	"__init_runnable" -> "runnable";
+	"runnable" [label = "runnable", color = green3];
+	"runnable" -> "runnable" [ label = "sched_set_state_runnable\nsched_wakeup\nsched_switch_in\nsched_switch_yield\nsched_switch_preempt\nsignal_deliver" ];
+	"runnable" -> "sleepable" [ label = "sched_set_state_sleepable" ];
+	"runnable" -> "sleeping" [ label = "sched_switch_blocking" ];
+	"signal_wakeup" [label = "signal_wakeup"];
+	"signal_wakeup" -> "runnable" [ label = "signal_deliver" ];
+	"signal_wakeup" -> "signal_wakeup" [ label = "sched_switch_in\nsched_switch_preempt\nsched_switch_yield\nsched_wakeup" ];
+	"signal_wakeup" -> "sleepable" [ label = "sched_set_state_sleepable" ];
+	"sleepable" [label = "sleepable"];
+	"sleepable" -> "runnable" [ label = "sched_set_state_runnable\nsched_wakeup" ];
+	"sleepable" -> "signal_wakeup" [ label = "sched_switch_yield" ];
+	"sleepable" -> "sleepable" [ label = "sched_set_state_sleepable\nsched_switch_in\nsched_switch_preempt\nsignal_deliver" ];
+	"sleepable" -> "sleeping" [ label = "sched_switch_suspend\nsched_switch_blocking" ];
+	"sleeping" [label = "sleeping"];
+	"sleeping" -> "runnable" [ label = "sched_wakeup" ];
+	{ rank = min ;
+		"__init_runnable";
+		"runnable";
+	}
+}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v5 9/9] rv: Add opid per-cpu monitor
  2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
                   ` (7 preceding siblings ...)
  2025-07-28 13:50 ` [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors Gabriele Monaco
@ 2025-07-28 13:50 ` Gabriele Monaco
  2025-07-28 15:58   ` Nam Cao
  8 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-28 13:50 UTC (permalink / raw)
  To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc
  Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
    Monitor to ensure wakeup and need_resched occur with irq and
    preemption disabled or in irq handlers.

Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 Documentation/trace/rv/monitor_sched.rst   |  55 +++++++
 kernel/trace/rv/Kconfig                    |   1 +
 kernel/trace/rv/Makefile                   |   1 +
 kernel/trace/rv/monitors/opid/Kconfig      |  19 +++
 kernel/trace/rv/monitors/opid/opid.c       | 168 +++++++++++++++++++++
 kernel/trace/rv/monitors/opid/opid.h       | 104 +++++++++++++
 kernel/trace/rv/monitors/opid/opid_trace.h |  15 ++
 kernel/trace/rv/rv_trace.h                 |   1 +
 tools/verification/models/sched/opid.dot   |  35 +++++
 9 files changed, 399 insertions(+)
 create mode 100644 kernel/trace/rv/monitors/opid/Kconfig
 create mode 100644 kernel/trace/rv/monitors/opid/opid.c
 create mode 100644 kernel/trace/rv/monitors/opid/opid.h
 create mode 100644 kernel/trace/rv/monitors/opid/opid_trace.h
 create mode 100644 tools/verification/models/sched/opid.dot

diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 11ef963cb578..3f8381ad9ec7 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -341,6 +341,61 @@ can be triggered also by preemption, but cannot occur after the task got to
                              |           | switch_yield
                              +-----------+ wakeup
 
+Monitor opid
+------------
+
+The operations with preemption and irq disabled (opid) monitor ensures
+operations like ``wakeup`` and ``need_resched`` occur with interrupts and
+preemption disabled or during interrupt context, in such case preemption may
+not be disabled explicitly.
+``need_resched`` can be set by some RCU internals functions, in which case it
+doesn't match a task wakeup and might occur with only interrupts disabled::
+
+                 |                     sched_need_resched
+                 |                     sched_waking
+                 |                     irq_entry
+                 |                   +--------------------+
+                 v                   v                    |
+               +------------------------------------------------------+
+  +----------- |                     disabled                         | <+
+  |            +------------------------------------------------------+  |
+  |              |                 ^                                     |
+  |              |          preempt_disable      sched_need_resched      |
+  |       preempt_enable           |           +--------------------+    |
+  |              v                 |           v                    |    |
+  |            +------------------------------------------------------+  |
+  |            |                   irq_disabled                       |  |
+  |            +------------------------------------------------------+  |
+  |                              |             |        ^                |
+  |     irq_entry            irq_entry         |        |                |
+  |     sched_need_resched       v             |   irq_disable           |
+  |     sched_waking +--------------+          |        |                |
+  |           +----- |              |     irq_enable    |                |
+  |           |      |    in_irq    |          |        |                |
+  |           +----> |              |          |        |                |
+  |                  +--------------+          |        |          irq_disable
+  |                     |                      |        |                |
+  | irq_enable          | irq_enable           |        |                |
+  |                     v                      v        |                |
+  |            #======================================================#  |
+  |            H                     enabled                          H  |
+  |            #======================================================#  |
+  |              |                   ^         ^ preempt_enable     |    |
+  |       preempt_disable     preempt_enable   +--------------------+    |
+  |              v                   |                                   |
+  |            +------------------+  |                                   |
+  +----------> | preempt_disabled | -+                                   |
+               +------------------+                                      |
+                 |                                                       |
+                 +-------------------------------------------------------+
+
+This monitor is designed to work on ``PREEMPT_RT`` kernels, the special case of
+events occurring in interrupt context is a shortcut to identify valid scenarios
+where the preemption tracepoints might not be visible, during interrupts
+preemption is always disabled. On non- ``PREEMPT_RT`` kernels, the interrupts
+might invoke a softirq to set ``need_resched`` and wake up a task. This is
+another special case that is currently not supported by the monitor.
+
 References
 ----------
 
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 59d0db898d4a..5b4be87ba59d 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -57,6 +57,7 @@ source "kernel/trace/rv/monitors/snep/Kconfig"
 source "kernel/trace/rv/monitors/sts/Kconfig"
 source "kernel/trace/rv/monitors/nrp/Kconfig"
 source "kernel/trace/rv/monitors/sssw/Kconfig"
+source "kernel/trace/rv/monitors/opid/Kconfig"
 # Add new sched monitors here
 
 source "kernel/trace/rv/monitors/rtapp/Kconfig"
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 2afac88539d3..750e4ad6fa0f 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_RV_MON_SLEEP) += monitors/sleep/sleep.o
 obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
 obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
 obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
+obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
 # Add new monitors here
 obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
 obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/opid/Kconfig b/kernel/trace/rv/monitors/opid/Kconfig
new file mode 100644
index 000000000000..561d32da572b
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/Kconfig
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_OPID
+	depends on RV
+	depends on TRACE_IRQFLAGS
+	depends on TRACE_PREEMPT_TOGGLE
+	depends on RV_MON_SCHED
+	default y if PREEMPT_RT
+	select DA_MON_EVENTS_IMPLICIT
+	bool "opid monitor"
+	help
+	  Monitor to ensure operations like wakeup and need resched occur with
+	  interrupts and preemption disabled or during IRQs, where preemption
+	  may not be disabled explicitly.
+
+	  This monitor is unstable on !PREEMPT_RT, say N unless you are testing it.
+
+	  For further information, see:
+	    Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
new file mode 100644
index 000000000000..50d64e7fb8c4
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -0,0 +1,168 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "opid"
+
+#include <trace/events/sched.h>
+#include <trace/events/irq.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "opid.h"
+
+static struct rv_monitor rv_opid;
+DECLARE_DA_MON_PER_CPU(opid, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+	da_handle_event_opid(irq_entry_opid);
+}
+
+static void attach_vector_irq(void)
+{
+	rv_attach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_attach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_attach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
+		rv_attach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+static void detach_vector_irq(void)
+{
+	rv_detach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_IRQ_WORK))
+		rv_detach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
+	if (IS_ENABLED(CONFIG_SMP)) {
+		rv_detach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
+		rv_detach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
+	}
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq(void) { }
+static void detach_vector_irq(void) { }
+#endif
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(irq_disable_opid);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(irq_enable_opid);
+}
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+	da_handle_event_opid(irq_entry_opid);
+}
+
+static void handle_preempt_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(preempt_disable_opid);
+}
+
+static void handle_preempt_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+	da_handle_event_opid(preempt_enable_opid);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk, int cpu, int tif)
+{
+	/* The monitor's intitial state is not in_irq */
+	if (this_cpu_read(hardirq_context))
+		da_handle_event_opid(sched_need_resched_opid);
+	else
+		da_handle_start_event_opid(sched_need_resched_opid);
+}
+
+static void handle_sched_waking(void *data, struct task_struct *p)
+{
+	/* The monitor's intitial state is not in_irq */
+	if (this_cpu_read(hardirq_context))
+		da_handle_event_opid(sched_waking_opid);
+	else
+		da_handle_start_event_opid(sched_waking_opid);
+}
+
+static int enable_opid(void)
+{
+	int retval;
+
+	retval = da_monitor_init_opid();
+	if (retval)
+		return retval;
+
+	rv_attach_trace_probe("opid", irq_disable, handle_irq_disable);
+	rv_attach_trace_probe("opid", irq_enable, handle_irq_enable);
+	rv_attach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+	rv_attach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+	rv_attach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+	rv_attach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_attach_trace_probe("opid", sched_waking, handle_sched_waking);
+	attach_vector_irq();
+
+	return 0;
+}
+
+static void disable_opid(void)
+{
+	rv_opid.enabled = 0;
+
+	rv_detach_trace_probe("opid", irq_disable, handle_irq_disable);
+	rv_detach_trace_probe("opid", irq_enable, handle_irq_enable);
+	rv_detach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+	rv_detach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+	rv_detach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+	rv_detach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+	rv_detach_trace_probe("opid", sched_waking, handle_sched_waking);
+	detach_vector_irq();
+
+	da_monitor_destroy_opid();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_opid = {
+	.name = "opid",
+	.description = "operations with preemption and irq disabled.",
+	.enable = enable_opid,
+	.disable = disable_opid,
+	.reset = da_monitor_reset_all_opid,
+	.enabled = 0,
+};
+
+static int __init register_opid(void)
+{
+	return rv_register_monitor(&rv_opid, &rv_sched);
+}
+
+static void __exit unregister_opid(void)
+{
+	rv_unregister_monitor(&rv_opid);
+}
+
+module_init(register_opid);
+module_exit(unregister_opid);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("opid: operations with preemption and irq disabled.");
diff --git a/kernel/trace/rv/monitors/opid/opid.h b/kernel/trace/rv/monitors/opid/opid.h
new file mode 100644
index 000000000000..b4b8c2ff7f64
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of opid automaton
+ * For further information about this format, see kernel documentation:
+ *   Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_opid {
+	disabled_opid = 0,
+	enabled_opid,
+	in_irq_opid,
+	irq_disabled_opid,
+	preempt_disabled_opid,
+	state_max_opid
+};
+
+#define INVALID_STATE state_max_opid
+
+enum events_opid {
+	irq_disable_opid = 0,
+	irq_enable_opid,
+	irq_entry_opid,
+	preempt_disable_opid,
+	preempt_enable_opid,
+	sched_need_resched_opid,
+	sched_waking_opid,
+	event_max_opid
+};
+
+struct automaton_opid {
+	char *state_names[state_max_opid];
+	char *event_names[event_max_opid];
+	unsigned char function[state_max_opid][event_max_opid];
+	unsigned char initial_state;
+	bool final_states[state_max_opid];
+};
+
+static const struct automaton_opid automaton_opid = {
+	.state_names = {
+		"disabled",
+		"enabled",
+		"in_irq",
+		"irq_disabled",
+		"preempt_disabled"
+	},
+	.event_names = {
+		"irq_disable",
+		"irq_enable",
+		"irq_entry",
+		"preempt_disable",
+		"preempt_enable",
+		"sched_need_resched",
+		"sched_waking"
+	},
+	.function = {
+		{
+			INVALID_STATE,
+			preempt_disabled_opid,
+			disabled_opid,
+			INVALID_STATE,
+			irq_disabled_opid,
+			disabled_opid,
+			disabled_opid
+		},
+		{
+			irq_disabled_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			preempt_disabled_opid,
+			enabled_opid,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+		{
+			INVALID_STATE,
+			enabled_opid,
+			in_irq_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			in_irq_opid,
+			in_irq_opid
+		},
+		{
+			INVALID_STATE,
+			enabled_opid,
+			in_irq_opid,
+			disabled_opid,
+			INVALID_STATE,
+			irq_disabled_opid,
+			INVALID_STATE
+		},
+		{
+			disabled_opid,
+			INVALID_STATE,
+			INVALID_STATE,
+			INVALID_STATE,
+			enabled_opid,
+			INVALID_STATE,
+			INVALID_STATE
+		},
+	},
+	.initial_state = disabled_opid,
+	.final_states = { 0, 1, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/opid/opid_trace.h b/kernel/trace/rv/monitors/opid/opid_trace.h
new file mode 100644
index 000000000000..3df6ff955c30
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_OPID
+DEFINE_EVENT(event_da_monitor, event_opid,
+	     TP_PROTO(char *state, char *event, char *next_state, bool final_state),
+	     TP_ARGS(state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor, error_opid,
+	     TP_PROTO(char *state, char *event),
+	     TP_ARGS(state, event));
+#endif /* CONFIG_RV_MON_OPID */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index e4e78d23b7b0..4a6faddac614 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -62,6 +62,7 @@ DECLARE_EVENT_CLASS(error_da_monitor,
 #include <monitors/scpd/scpd_trace.h>
 #include <monitors/snep/snep_trace.h>
 #include <monitors/sts/sts_trace.h>
+#include <monitors/opid/opid_trace.h>
 // Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
 
 #endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/opid.dot b/tools/verification/models/sched/opid.dot
new file mode 100644
index 000000000000..840052f6952b
--- /dev/null
+++ b/tools/verification/models/sched/opid.dot
@@ -0,0 +1,35 @@
+digraph state_automaton {
+	center = true;
+	size = "7,11";
+	{node [shape = plaintext, style=invis, label=""] "__init_disabled"};
+	{node [shape = circle] "disabled"};
+	{node [shape = doublecircle] "enabled"};
+	{node [shape = circle] "enabled"};
+	{node [shape = circle] "in_irq"};
+	{node [shape = circle] "irq_disabled"};
+	{node [shape = circle] "preempt_disabled"};
+	"__init_disabled" -> "disabled";
+	"disabled" [label = "disabled"];
+	"disabled" -> "disabled" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
+	"disabled" -> "irq_disabled" [ label = "preempt_enable" ];
+	"disabled" -> "preempt_disabled" [ label = "irq_enable" ];
+	"enabled" [label = "enabled", color = green3];
+	"enabled" -> "enabled" [ label = "preempt_enable" ];
+	"enabled" -> "irq_disabled" [ label = "irq_disable" ];
+	"enabled" -> "preempt_disabled" [ label = "preempt_disable" ];
+	"in_irq" [label = "in_irq"];
+	"in_irq" -> "enabled" [ label = "irq_enable" ];
+	"in_irq" -> "in_irq" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
+	"irq_disabled" [label = "irq_disabled"];
+	"irq_disabled" -> "disabled" [ label = "preempt_disable" ];
+	"irq_disabled" -> "enabled" [ label = "irq_enable" ];
+	"irq_disabled" -> "in_irq" [ label = "irq_entry" ];
+	"irq_disabled" -> "irq_disabled" [ label = "sched_need_resched" ];
+	"preempt_disabled" [label = "preempt_disabled"];
+	"preempt_disabled" -> "disabled" [ label = "irq_disable" ];
+	"preempt_disabled" -> "enabled" [ label = "preempt_enable" ];
+	{ rank = min ;
+		"__init_disabled";
+		"disabled";
+	}
+}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 5/9] rv: Retry when da monitor detects race conditions
  2025-07-28 13:50 ` [PATCH v5 5/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-07-28 14:18   ` Nam Cao
  0 siblings, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-28 14:18 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Masami Hiramatsu,
	linux-trace-kernel, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
	Juri Lelli, Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 03:50:17PM +0200, Gabriele Monaco wrote:
> DA monitor can be accessed from multiple cores simultaneously, this is
> likely, for instance when dealing with per-task monitors reacting on
> events that do not always occur on the CPU where the task is running.
> This can cause race conditions where two events change the next state
> and we see inconsistent values. E.g.:
> 
>   [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
>   [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
>   [63] error_srs: 27: event sched_switch_suspend not expected in the state running
> 
> In this case the monitor fails because the event on CPU 62 wins against
> the one on CPU 63, although the correct state should have been
> sleepable, since the task get suspended.
> 
> Detect if the current state was modified by using try_cmpxchg while
> storing the next value. If it was, try again reading the current state.
> After a maximum number of failed retries, react by calling a special
> tracepoint, print on the console and reset the monitor.
> 
> Remove the functions da_monitor_curr_state() and da_monitor_set_state()
> as they only hide the underlying implementation in this case.
> 
> Monitors where this type of condition can occur must be able to account
> for racing events in any possible order, as we cannot know the winner.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

Reviewed-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-28 13:50 ` [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts Gabriele Monaco
@ 2025-07-28 15:53   ` Nam Cao
  2025-07-29  8:46     ` Gabriele Monaco
  2025-07-30 16:28     ` Nam Cao
  0 siblings, 2 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-28 15:53 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 03:50:19PM +0200, Gabriele Monaco wrote:
> The tss monitor currently guarantees task switches can happen only while
> scheduling, whereas the sncid monitor enforces scheduling occurs with
> interrupt disabled.
> 
> Replace the monitors with a more comprehensive specification which
> implies both but also ensures that:
> * each scheduler call disable interrupts to switch
> * each task switch happens with interrupts disabled
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

I gave this a try on riscv64 and observed some errors:

[  620.696055] rv: monitor sts does not allow event sched_switch on state enable_to_exit
[  621.047705] rv: monitor sts does not allow event sched_switch on state enable_to_exit
[  642.440209] rv: monitor sts does not allow event sched_switch on state enable_to_exit

I tested with two user programs:

    int main() { asm ("unimp"); }
    int main() { asm ("ebreak"); }

The two programs are repeatedly executed:

    #!/bin/bash
    ./test1 &
    ./test2 &
    # ... repeat lots of time

Any idea?

Nam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors
  2025-07-28 13:50 ` [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors Gabriele Monaco
@ 2025-07-28 15:56   ` Nam Cao
  0 siblings, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-28 15:56 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 03:50:20PM +0200, Gabriele Monaco wrote:
> Add 2 per-task monitors as part of the sched model:
> 
> * nrp: need-resched preempts
>     Monitor to ensure preemption requires need resched.
> * sssw: set state sleep and wakeup
>     Monitor to ensure sched_set_state to sleepable leads to sleeping and
>     sleeping tasks require wakeup.
> 
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

I can't comment on the model, but the monitor implementation looks okay. I
ran the two monitors on x86, powerpc64, riscv64 and didn't see anything
suspicious:

Acked-by: Nam Cao <namcao@linutronix.de>
Tested-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 9/9] rv: Add opid per-cpu monitor
  2025-07-28 13:50 ` [PATCH v5 9/9] rv: Add opid per-cpu monitor Gabriele Monaco
@ 2025-07-28 15:58   ` Nam Cao
  0 siblings, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-28 15:58 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 03:50:21PM +0200, Gabriele Monaco wrote:
> Add a per-cpu monitor as part of the sched model:
> * opid: operations with preemption and irq disabled
>     Monitor to ensure wakeup and need_resched occur with irq and
>     preemption disabled or in irq handlers.
> 
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>

The monitor implementation looks fine:
Acked-by: Nam Cao <namcao@linutronix.de>

And on x86 and riscv64:
Tested-by: Nam Cao <namcao@linutronix.de>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model
  2025-07-28 13:50 ` [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
@ 2025-07-28 16:02   ` Nam Cao
  0 siblings, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-28 16:02 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
	Masami Hiramatsu, linux-trace-kernel, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 03:50:18PM +0200, Gabriele Monaco wrote:
> Add the following tracepoint:
> * sched_set_need_resched(tsk, cpu, tif)
>     Called when a task is set the need resched [lazy] flag
> 
> Remove the unused ip parameter from sched_entry and sched_exit and alter
> sched_entry to have a value of preempt consistent with the one used in
> sched_switch.
> 
> Also adapt all monitors using sched_{entry,exit} to avoid breaking build.

Does it make sense to split this patch into two, one for
sched_set_need_resched and another for sched_{entry,exit}?

Nam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-28 15:53   ` Nam Cao
@ 2025-07-29  8:46     ` Gabriele Monaco
  2025-07-29  9:25       ` Nam Cao
  2025-07-30 16:28     ` Nam Cao
  1 sibling, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-29  8:46 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Mon, 2025-07-28 at 17:53 +0200, Nam Cao wrote:
> I gave this a try on riscv64 and observed some errors:
> 
> [  620.696055] rv: monitor sts does not allow event sched_switch on
> state enable_to_exit
> [  621.047705] rv: monitor sts does not allow event sched_switch on
> state enable_to_exit
> [  642.440209] rv: monitor sts does not allow event sched_switch on
> state enable_to_exit
> 
> I tested with two user programs:
> 
>     int main() { asm ("unimp"); }
>     int main() { asm ("ebreak"); }
> 
> The two programs are repeatedly executed:
> 
>     #!/bin/bash
>     ./test1 &
>     ./test2 &
>     # ... repeat lots of time
> 
> Any idea?

Mmh I see what you're doing here..
Those instructions are supposed to raise some sort of exception in the
CPU which apparently disables and enables interrupts without raising an
interrupt handler tracepoint (the discriminator for this monitor).
This lets the monitor believe we passed the time a switch is possible
and complain when it actually sees one.

I still couldn't reproduce it on my VM, yet I find the timing a bit
strange: it's alright we handle the illegal instruction like this, but
do we really end up doing that while scheduling although it doesn't
look like an interrupt?!

Could you share a bit more about your riscv setup? It might some
configuration/hardware specific thing.

Thanks for finding this,
Gabriele


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-29  8:46     ` Gabriele Monaco
@ 2025-07-29  9:25       ` Nam Cao
  2025-07-29  9:37         ` Nam Cao
  0 siblings, 1 reply; 24+ messages in thread
From: Nam Cao @ 2025-07-29  9:25 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

[-- Attachment #1: Type: text/plain, Size: 1922 bytes --]

On Tue, Jul 29, 2025 at 10:46:51AM +0200, Gabriele Monaco wrote:
> On Mon, 2025-07-28 at 17:53 +0200, Nam Cao wrote:
> > I gave this a try on riscv64 and observed some errors:
> > 
> > [  620.696055] rv: monitor sts does not allow event sched_switch on
> > state enable_to_exit
> > [  621.047705] rv: monitor sts does not allow event sched_switch on
> > state enable_to_exit
> > [  642.440209] rv: monitor sts does not allow event sched_switch on
> > state enable_to_exit
> > 
> > I tested with two user programs:
> > 
> >     int main() { asm ("unimp"); }
> >     int main() { asm ("ebreak"); }
> > 
> > The two programs are repeatedly executed:
> > 
> >     #!/bin/bash
> >     ./test1 &
> >     ./test2 &
> >     # ... repeat lots of time
> > 
> > Any idea?
> 
> Mmh I see what you're doing here..
> Those instructions are supposed to raise some sort of exception in the
> CPU which apparently disables and enables interrupts without raising an
> interrupt handler tracepoint (the discriminator for this monitor).
> This lets the monitor believe we passed the time a switch is possible
> and complain when it actually sees one.
> 
> I still couldn't reproduce it on my VM, yet I find the timing a bit
> strange: it's alright we handle the illegal instruction like this, but
> do we really end up doing that while scheduling although it doesn't
> look like an interrupt?!
> 
> Could you share a bit more about your riscv setup? It might some
> configuration/hardware specific thing.

Kernel:
  - base: ftrace/for-next
  - config: defconfig + mod2noconfig + PREEMPT_RT + monitors

Hardware:
	qemu-system-riscv64 -machine virt \
	-kernel ../linux/arch/riscv/boot/Image \
	-append "console=ttyS0 root=/dev/vda rw" \
	-nographic \
	-drive if=virtio,format=raw,file=riscv64.img \
	-smp 4 -m 4G

	riscv64.img is a Debian trixie image from debootstrap

Test:
	echo 0 > /proc/sys/debug/exception-trace
	./testall # see attached

[-- Attachment #2: test.sh --]
[-- Type: application/x-sh, Size: 373 bytes --]

[-- Attachment #3: testall --]
[-- Type: text/plain, Size: 480 bytes --]

#!/bin/bash
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &
./test.sh &

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-29  9:25       ` Nam Cao
@ 2025-07-29  9:37         ` Nam Cao
  2025-07-29 14:06           ` Gabriele Monaco
  0 siblings, 1 reply; 24+ messages in thread
From: Nam Cao @ 2025-07-29  9:37 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> On Tue, Jul 29, 2025 at 10:46:51AM +0200, Gabriele Monaco wrote:
> > On Mon, 2025-07-28 at 17:53 +0200, Nam Cao wrote:
> > > I gave this a try on riscv64 and observed some errors:
> > > 
> > > [  620.696055] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > > [  621.047705] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > > [  642.440209] rv: monitor sts does not allow event sched_switch on
> > > state enable_to_exit
> > > 
> > > I tested with two user programs:
> > > 
> > >     int main() { asm ("unimp"); }
> > >     int main() { asm ("ebreak"); }
> > > 
> > > The two programs are repeatedly executed:
> > > 
> > >     #!/bin/bash
> > >     ./test1 &
> > >     ./test2 &
> > >     # ... repeat lots of time
> > > 
> > > Any idea?
> > 
> > Mmh I see what you're doing here..
> > Those instructions are supposed to raise some sort of exception in the
> > CPU which apparently disables and enables interrupts without raising an
> > interrupt handler tracepoint (the discriminator for this monitor).
> > This lets the monitor believe we passed the time a switch is possible
> > and complain when it actually sees one.
> > 
> > I still couldn't reproduce it on my VM, yet I find the timing a bit
> > strange: it's alright we handle the illegal instruction like this, but
> > do we really end up doing that while scheduling although it doesn't
> > look like an interrupt?!
> > 
> > Could you share a bit more about your riscv setup? It might some
> > configuration/hardware specific thing.
> 
> Kernel:
>   - base: ftrace/for-next
>   - config: defconfig + mod2noconfig + PREEMPT_RT + monitors
> 
> Hardware:
> 	qemu-system-riscv64 -machine virt \
> 	-kernel ../linux/arch/riscv/boot/Image \
> 	-append "console=ttyS0 root=/dev/vda rw" \
> 	-nographic \
> 	-drive if=virtio,format=raw,file=riscv64.img \
> 	-smp 4 -m 4G
> 
> 	riscv64.img is a Debian trixie image from debootstrap
> 
> Test:
> 	echo 0 > /proc/sys/debug/exception-trace
> 	./testall # see attached

I should note that this takes a few tries before something shows up.

Below is the backtrace, in case it helps:

illegal    3246 [000]  1020.132675: rv:error_sts: event sched_switch not expected in the state enable_to_exit
        ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
        ffffffff8013231c __traceiter_error_sts+0x28 ([kernel.kallsyms])
        ffffffff80138aa4 da_event_sts+0x198 ([kernel.kallsyms])
        ffffffff80138cf0 handle_sched_switch+0x46 ([kernel.kallsyms])
        ffffffff80aaf222 __schedule+0x4ba ([kernel.kallsyms])
        ffffffff80aafb80 preempt_schedule_irq+0x32 ([kernel.kallsyms])
        ffffffff80aac714 irqentry_exit+0x76 ([kernel.kallsyms])
        ffffffff80aac1dc do_irq+0x38 ([kernel.kallsyms])
        ffffffff80ab7da6 __lock_text_end+0x12e ([kernel.kallsyms])
        ffffffff80a93e50 mas_find+0x0 ([kernel.kallsyms])
        ffffffff8021ea60 vms_clear_ptes+0xe8 ([kernel.kallsyms])
        ffffffff8021f81a vms_complete_munmap_vmas+0x58 ([kernel.kallsyms])
        ffffffff80220706 do_vmi_align_munmap+0x15c ([kernel.kallsyms])
        ffffffff802207d0 do_vmi_munmap+0xa6 ([kernel.kallsyms])
        ffffffff80221f3c __vm_munmap+0xa2 ([kernel.kallsyms])
        ffffffff8020be7c vm_munmap+0xe ([kernel.kallsyms])
        ffffffff802bbdbe elf_load+0x14c ([kernel.kallsyms])
        ffffffff802bc1f4 load_elf_binary+0x36e ([kernel.kallsyms])
        ffffffff80264426 bprm_execve+0x254 ([kernel.kallsyms])
        ffffffff8026570c do_execveat_common.isra.0+0x11e ([kernel.kallsyms])
        ffffffff802664de __riscv_sys_execve+0x32 ([kernel.kallsyms])
        ffffffff80aabf84 do_trap_ecall_u+0x1bc ([kernel.kallsyms])
        ffffffff80ab7dc8 __lock_text_end+0x150 ([kernel.kallsyms])

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-29  9:37         ` Nam Cao
@ 2025-07-29 14:06           ` Gabriele Monaco
  2025-07-30 12:52             ` Nam Cao
  0 siblings, 1 reply; 24+ messages in thread
From: Gabriele Monaco @ 2025-07-29 14:06 UTC (permalink / raw)
  To: Nam Cao
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Tue, 2025-07-29 at 11:37 +0200, Nam Cao wrote:
> On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> > Kernel:
> >   - base: ftrace/for-next

I assume you mean rv/for-next ? The one that includes all changes as of
yesterday.

> >   - config: defconfig + mod2noconfig + PREEMPT_RT + monitors
> > 
> > Hardware:
> > 	qemu-system-riscv64 -machine virt \
> > 	-kernel ../linux/arch/riscv/boot/Image \
> > 	-append "console=ttyS0 root=/dev/vda rw" \
> > 	-nographic \
> > 	-drive if=virtio,format=raw,file=riscv64.img \
> > 	-smp 4 -m 4G
> > 
> > 	riscv64.img is a Debian trixie image from debootstrap
> > 
> > Test:
> > 	echo 0 > /proc/sys/debug/exception-trace
> > 	./testall # see attached
> 
> I should note that this takes a few tries before something shows up.
> 

Thanks for all the details, but I still can't reproduce nor understand
what can be triggering the issue.

I tried enabling sts and setting panic as the reactor (to avoid missing
it with all the rubbish that gets printed on the dmesg) and run
testall. Still cannot see the error.

What might help would be to see the trace with irq_enable and
irq_disable around the error, something like (not tested):

  trace-cmd stream -e irq_enable -e irq_disable -e error_sts -e
irq_handler_entry -- sh testall | grep -B 10 error

The problem here is not when the error occurs, but a couple of events
earlier (where I possibly miss something that looks like an interrupt).

Thanks,
Gabriele

> Below is the backtrace, in case it helps:
> 
> illegal    3246 [000]  1020.132675: rv:error_sts: event sched_switch
> not expected in the state enable_to_exit
>         ffffffff8013231c __traceiter_error_sts+0x28
> ([kernel.kallsyms])
>         ffffffff8013231c __traceiter_error_sts+0x28
> ([kernel.kallsyms])
>         ffffffff80138aa4 da_event_sts+0x198 ([kernel.kallsyms])
>         ffffffff80138cf0 handle_sched_switch+0x46 ([kernel.kallsyms])
>         ffffffff80aaf222 __schedule+0x4ba ([kernel.kallsyms])
>         ffffffff80aafb80 preempt_schedule_irq+0x32
> ([kernel.kallsyms])
>         ffffffff80aac714 irqentry_exit+0x76 ([kernel.kallsyms])
>         ffffffff80aac1dc do_irq+0x38 ([kernel.kallsyms])
>         ffffffff80ab7da6 __lock_text_end+0x12e ([kernel.kallsyms])
>         ffffffff80a93e50 mas_find+0x0 ([kernel.kallsyms])
>         ffffffff8021ea60 vms_clear_ptes+0xe8 ([kernel.kallsyms])
>         ffffffff8021f81a vms_complete_munmap_vmas+0x58
> ([kernel.kallsyms])
>         ffffffff80220706 do_vmi_align_munmap+0x15c
> ([kernel.kallsyms])
>         ffffffff802207d0 do_vmi_munmap+0xa6 ([kernel.kallsyms])
>         ffffffff80221f3c __vm_munmap+0xa2 ([kernel.kallsyms])
>         ffffffff8020be7c vm_munmap+0xe ([kernel.kallsyms])
>         ffffffff802bbdbe elf_load+0x14c ([kernel.kallsyms])
>         ffffffff802bc1f4 load_elf_binary+0x36e ([kernel.kallsyms])
>         ffffffff80264426 bprm_execve+0x254 ([kernel.kallsyms])
>         ffffffff8026570c do_execveat_common.isra.0+0x11e
> ([kernel.kallsyms])
>         ffffffff802664de __riscv_sys_execve+0x32 ([kernel.kallsyms])
>         ffffffff80aabf84 do_trap_ecall_u+0x1bc ([kernel.kallsyms])
>         ffffffff80ab7dc8 __lock_text_end+0x150 ([kernel.kallsyms])


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-29 14:06           ` Gabriele Monaco
@ 2025-07-30 12:52             ` Nam Cao
  2025-07-30 14:16               ` Nam Cao
  0 siblings, 1 reply; 24+ messages in thread
From: Nam Cao @ 2025-07-30 12:52 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Tue, Jul 29, 2025 at 04:06:17PM +0200, Gabriele Monaco wrote:
> On Tue, 2025-07-29 at 11:37 +0200, Nam Cao wrote:
> > On Tue, Jul 29, 2025 at 11:25:12AM +0200, Nam Cao wrote:
> > > Kernel:
> > >   - base: ftrace/for-next
> 
> I assume you mean rv/for-next ? The one that includes all changes as of
> yesterday.

I meant I apply this series on top of ftrace/for-next. But that one is
close enough.

> Thanks for all the details, but I still can't reproduce nor understand
> what can be triggering the issue.
> 
> I tried enabling sts and setting panic as the reactor (to avoid missing
> it with all the rubbish that gets printed on the dmesg) and run
> testall. Still cannot see the error.

Welcome to the "but it works on my machine" camp. I was there 2 weeks ago,
it was not fun.

> What might help would be to see the trace with irq_enable and
> irq_disable around the error, something like (not tested):
> 
>   trace-cmd stream -e irq_enable -e irq_disable -e error_sts -e
> irq_handler_entry -- sh testall | grep -B 10 error

I do not have trace-cmd in the riscv image, but I do have perf. I will give
it a try.

> The problem here is not when the error occurs, but a couple of events
> earlier (where I possibly miss something that looks like an interrupt).

I just accidentally hit this error again, not on riscv but on x86, while
doing something unrelated. Let me figure out a minimal way to reproduce it.

Nam

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-30 12:52             ` Nam Cao
@ 2025-07-30 14:16               ` Nam Cao
  2025-07-30 14:44                 ` Nam Cao
  0 siblings, 1 reply; 24+ messages in thread
From: Nam Cao @ 2025-07-30 14:16 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 30, 2025 at 02:52:25PM +0200, Nam Cao wrote:
> I do not have trace-cmd in the riscv image, but I do have perf. I will give
> it a try.

Instead, I replaced with tracepoints with trace_printk() and managed to
captured the log moments before disaster below.

I'm not sure what I'm seeing, just dumping this here, maybe you have an
idea.

Quite interesting that the last "normal" line for cpu1 is:

            test-762     [001] dn...   112.407548: da_event_sts: cant_sched x irq_enable -> can_sched (final)

But in the next error line, it mentions the "enable_to_exit". Where did
this state come from??

           <...>-1621    [001] d....   119.919846: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit

Something very strange is going on here..

Nam

            test-1198    [000] d....   112.407332: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-762     [001] d....   112.407338: da_event_sts: can_sched x irq_disable -> cant_sched
            test-762     [001] d....   112.407347: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-1198    [000] d....   112.407351: da_event_sts: can_sched x irq_disable -> cant_sched
            test-762     [001] d....   112.407363: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] d.h..   112.407370: da_event_sts: cant_sched x irq_entry -> cant_sched
            test-762     [001] d.h..   112.407383: da_event_sts: cant_sched x irq_entry -> cant_sched
            test-637     [002] dn...   112.407413: da_event_sts: cant_sched x irq_enable -> can_sched (final)
         illegal-1091    [003] dns..   112.407418: da_event_sts: in_irq x irq_enable -> scheduling
            test-637     [002] .n...   112.407424: da_event_sts: can_sched x schedule_entry -> scheduling
         illegal-1091    [003] dns..   112.407431: da_event_sts: scheduling x irq_disable -> disable_to_switch
            test-637     [002] dn...   112.407433: da_event_sts: scheduling x irq_disable -> disable_to_switch
         illegal-1091    [003] dns..   112.407438: da_event_sts: disable_to_switch x irq_enable -> enable_to_exit
         illegal-1091    [003] dns..   112.407448: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
            test-1198    [000] dns..   112.407459: da_event_sts: cant_sched x irq_enable -> can_sched (final)
         illegal-1091    [003] dns..   112.407460: da_event_sts: enable_to_exit x irq_enable -> enable_to_exit
            test-637     [002] d....   112.407468: da_event_sts: disable_to_switch x sched_switch -> switching
            test-1198    [000] dns..   112.407472: da_event_sts: can_sched x irq_disable -> cant_sched
         illegal-1091    [003] dns..   112.407472: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
         illegal-1091    [003] dn...   112.407481: da_event_sts: enable_to_exit x irq_enable -> enable_to_exit
            test-1198    [000] dns..   112.407483: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-762     [001] dns..   112.407488: da_event_sts: cant_sched x irq_enable -> can_sched (final)
         illegal-1091    [003] dn...   112.407492: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
            test-1198    [000] dns..   112.407493: da_event_sts: can_sched x irq_disable -> cant_sched
            test-762     [001] dns..   112.407501: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] dns..   112.407501: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-762     [001] dns..   112.407508: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-1198    [000] dns..   112.407513: da_event_sts: can_sched x irq_disable -> cant_sched
            test-649     [002] d....   112.407517: da_event_sts: switching x irq_enable -> enable_to_exit
            test-762     [001] dns..   112.407518: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] dns..   112.407521: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-762     [001] dns..   112.407529: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-649     [002] .....   112.407530: da_event_sts: enable_to_exit x schedule_exit -> can_sched (final)
            test-1198    [000] dns..   112.407541: da_event_sts: can_sched x irq_disable -> cant_sched
            test-762     [001] dns..   112.407541: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] dns..   112.407548: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-762     [001] dn...   112.407548: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-1198    [000] dns..   112.407560: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] dns..   112.407567: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-1198    [000] dns..   112.407578: da_event_sts: can_sched x irq_disable -> cant_sched
            test-1198    [000] dns..   112.407585: da_event_sts: cant_sched x irq_enable -> can_sched (final)
            test-649     [002] d....   112.407596: da_event_sts: can_sched x irq_disable -> cant_sched
         illegal-1091    [003] d....   112.410126: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit
           <...>-1621    [001] d....   119.919846: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit
            test-642     [001] d....   121.459650: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-30 14:16               ` Nam Cao
@ 2025-07-30 14:44                 ` Nam Cao
  2025-07-30 15:09                   ` Nam Cao
  0 siblings, 1 reply; 24+ messages in thread
From: Nam Cao @ 2025-07-30 14:44 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

On Wed, Jul 30, 2025 at 04:16:46PM +0200, Nam Cao wrote:
> Quite interesting that the last "normal" line for cpu1 is:
> 
>             test-762     [001] dn...   112.407548: da_event_sts: cant_sched x irq_enable -> can_sched (final)
> 
> But in the next error line, it mentions the "enable_to_exit". Where did
> this state come from??
> 
>            <...>-1621    [001] d....   119.919846: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit

Never mind about this one, only cpu3 is accurate here, because I cut off
the trace_printk() as soon as the first error appears:

diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 17fa4f6e5ea6..927cf2cda03f 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -18,15 +18,22 @@
 
 #ifdef CONFIG_RV_REACTORS
 
+static bool nam_stop = true;
+
 #define DECLARE_RV_REACTING_HELPERS(name, type)							\
 static void cond_react_##name(type curr_state, type event)					\
 {												\
 	if (!rv_reacting_on() || !rv_##name.react)						\
 		return;										\
+	nam_stop = true;\
 	rv_##name.react("rv: monitor %s does not allow event %s on state %s\n",			\
 			#name,									\
 			model_get_event_name_##name(event),					\
 			model_get_state_name_##name(curr_state));				\
+	trace_printk("rv: monitor %s does not allow event %s on state %s\n",			\
+			#name,									\
+			model_get_event_name_##name(event),					\
+			model_get_state_name_##name(curr_state));				\
 }
 
 #else /* CONFIG_RV_REACTOR */
@@ -136,6 +143,14 @@ da_event_##name(struct da_monitor *da_mon, enum events_##name event)				\
 					   model_get_event_name_##name(event),			\
 					   model_get_state_name_##name(next_state),		\
 					   model_is_final_state_##name(next_state));		\
+			if (nam_stop)\
+				return true;\
+			trace_printk("%s x %s -> %s%s\n", \
+					model_get_state_name_##name(curr_state),		\
+					   model_get_event_name_##name(event),			\
+					   model_get_state_name_##name(next_state),		\
+					   model_is_final_state_##name(next_state)?		\
+						" (final)" : "");		\
 			return true;								\
 		}										\
 	}											\

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-30 14:44                 ` Nam Cao
@ 2025-07-30 15:09                   ` Nam Cao
  0 siblings, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-30 15:09 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
	Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli,
	Clark Williams, John Kacur

Here is a deeper log, CPU3 only:

 illegal-1091    [003] d....   112.407127: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407163: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] d....   112.407177: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407186: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] d....   112.407196: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407203: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] d....   112.407212: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407218: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] d....   112.407227: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407234: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] d....   112.407254: da_event_sts: can_sched x irq_disable -> cant_sched
 illegal-1091    [003] d....   112.407261: da_event_sts: cant_sched x irq_enable -> can_sched (final)
 illegal-1091    [003] .....   112.407271: da_event_sts: can_sched x schedule_entry -> scheduling
 illegal-1091    [003] d....   112.407312: da_event_sts: scheduling x irq_disable -> disable_to_switch
 illegal-1091    [003] d.h..   112.407332: da_event_sts: disable_to_switch x irq_entry -> in_irq
 illegal-1091    [003] dns..   112.407418: da_event_sts: in_irq x irq_enable -> scheduling
 illegal-1091    [003] dns..   112.407431: da_event_sts: scheduling x irq_disable -> disable_to_switch
 illegal-1091    [003] dns..   112.407438: da_event_sts: disable_to_switch x irq_enable -> enable_to_exit
 illegal-1091    [003] dns..   112.407448: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
 illegal-1091    [003] dns..   112.407460: da_event_sts: enable_to_exit x irq_enable -> enable_to_exit
 illegal-1091    [003] dns..   112.407472: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
 illegal-1091    [003] dn...   112.407481: da_event_sts: enable_to_exit x irq_enable -> enable_to_exit
 illegal-1091    [003] dn...   112.407492: da_event_sts: enable_to_exit x irq_disable -> enable_to_exit
 illegal-1091    [003] d....   112.410126: da_event_sts: rv: monitor sts does not allow event sched_switch on state enable_to_exit

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts
  2025-07-28 15:53   ` Nam Cao
  2025-07-29  8:46     ` Gabriele Monaco
@ 2025-07-30 16:28     ` Nam Cao
  1 sibling, 0 replies; 24+ messages in thread
From: Nam Cao @ 2025-07-30 16:28 UTC (permalink / raw)
  To: Gabriele Monaco
  Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
	linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
	Tomas Glozar, Juri Lelli, Clark Williams, John Kacur

On Mon, Jul 28, 2025 at 05:53:34PM +0200, Nam Cao wrote:
> On Mon, Jul 28, 2025 at 03:50:19PM +0200, Gabriele Monaco wrote:
> > The tss monitor currently guarantees task switches can happen only while
> > scheduling, whereas the sncid monitor enforces scheduling occurs with
> > interrupt disabled.
> > 
> > Replace the monitors with a more comprehensive specification which
> > implies both but also ensures that:
> > * each scheduler call disable interrupts to switch
> > * each task switch happens with interrupts disabled
> > 
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> 
> I gave this a try on riscv64 and observed some errors:
> 
> [  620.696055] rv: monitor sts does not allow event sched_switch on state enable_to_exit
> [  621.047705] rv: monitor sts does not allow event sched_switch on state enable_to_exit
> [  642.440209] rv: monitor sts does not allow event sched_switch on state enable_to_exit
> 
> I tested with two user programs:
> 
>     int main() { asm ("unimp"); }
>     int main() { asm ("ebreak"); }
> 
> The two programs are repeatedly executed:
> 
>     #!/bin/bash
>     ./test1 &
>     ./test2 &
>     # ... repeat lots of time

Okay, I think I know why..

It seems the monitor is in scheduling state. Then it sees a pair of
irq_disable and irq_enable, and it mistakenly thinks that this is the
is_switch==false case in __schedule. So it thinks it is at the end of
__schedule(), and does not expect a switch_switch.

However, this is wrong. The irq_disable and irq_enable pair is not from
__schedule(), it is from softirq (see below).

In short, the monitor thinks it is at the end of __schedule(), but actually
it is still at the beginning.

That's just from my limited understanding of the model, so I may be wrong.
What do you think?

Nam

             test-256     [002] dns..    63.070743: da_event_sts: scheduling x irq_disable -> disable_to_switch
             test-256     [002] dns..    63.070748: <stack trace>
  => trace_dump_stack
  => da_event_sts
  => handle_irq_disable
  => trace_hardirqs_off.part.0
  => trace_hardirqs_off
  => note_gp_changes
  => rcu_core
  => rcu_core_si
  => handle_softirqs
  => __irq_exit_rcu
  => irq_exit_rcu
  => handle_riscv_irq
  => call_on_irq_stack
             test-256     [002] dns..    63.070755: da_event_sts: disable_to_switch x irq_enable -> enable_to_exit
             test-256     [002] dns..    63.070760: <stack trace>
  => trace_dump_stack
  => da_event_sts
  => handle_irq_enable
  => trace_hardirqs_on
  => note_gp_changes
  => rcu_core
  => rcu_core_si
  => handle_softirqs
  => __irq_exit_rcu
  => irq_exit_rcu
  => handle_riscv_irq
  => call_on_irq_stack
  => call_on_irq_stack

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-07-30 16:29 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-28 13:50 [PATCH v5 0/9] rv: Add monitors to validate task switch Gabriele Monaco
2025-07-28 13:50 ` [PATCH v5 1/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
2025-07-28 13:50 ` [PATCH v5 2/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
2025-07-28 13:50 ` [PATCH v5 3/9] rv: Use strings in da monitors tracepoints Gabriele Monaco
2025-07-28 13:50 ` [PATCH v5 4/9] rv: Adjust monitor dependencies Gabriele Monaco
2025-07-28 13:50 ` [PATCH v5 5/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
2025-07-28 14:18   ` Nam Cao
2025-07-28 13:50 ` [PATCH v5 6/9] sched: Adapt sched tracepoints for RV task model Gabriele Monaco
2025-07-28 16:02   ` Nam Cao
2025-07-28 13:50 ` [PATCH v5 7/9] rv: Replace tss and sncid monitors with more complete sts Gabriele Monaco
2025-07-28 15:53   ` Nam Cao
2025-07-29  8:46     ` Gabriele Monaco
2025-07-29  9:25       ` Nam Cao
2025-07-29  9:37         ` Nam Cao
2025-07-29 14:06           ` Gabriele Monaco
2025-07-30 12:52             ` Nam Cao
2025-07-30 14:16               ` Nam Cao
2025-07-30 14:44                 ` Nam Cao
2025-07-30 15:09                   ` Nam Cao
2025-07-30 16:28     ` Nam Cao
2025-07-28 13:50 ` [PATCH v5 8/9] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-07-28 15:56   ` Nam Cao
2025-07-28 13:50 ` [PATCH v5 9/9] rv: Add opid per-cpu monitor Gabriele Monaco
2025-07-28 15:58   ` Nam Cao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).