* [RFC PATCH 1/9] tools/rv: Do not skip idle in trace
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 2/9] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
` (6 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli
Currently, the userspace RV tool skips trace events triggered by the RV
tool itself, this can be changed by passing the parameter -s, which sets
the variable config_my_pid to 0 (instead of the tool's PID).
The current condition for per-task monitors (config_has_id) does not
check that config_my_pid isn't 0 to skip. In case we pass -s, we show
events triggered by RV but don't show those triggered by idle (PID 0).
Fix the condition to account this scenario.
Fixes: 6d60f89691fc ("tools/rv: Add in-kernel monitor interface")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
tools/verification/rv/src/in_kernel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/verification/rv/src/in_kernel.c b/tools/verification/rv/src/in_kernel.c
index c0dcee795c0de..72b03bae021cd 100644
--- a/tools/verification/rv/src/in_kernel.c
+++ b/tools/verification/rv/src/in_kernel.c
@@ -429,7 +429,7 @@ ikm_event_handler(struct trace_seq *s, struct tep_record *record,
tep_get_common_field_val(s, trace_event, "common_pid", record, &pid, 1);
- if (config_has_id && (config_my_pid == id))
+ if (config_my_pid && config_has_id && (config_my_pid == id))
return 0;
else if (config_my_pid && (config_my_pid == pid))
return 0;
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 2/9] tools/rv: Stop gracefully also on SIGTERM
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
2025-04-04 8:45 ` [RFC PATCH 1/9] tools/rv: Do not skip idle in trace Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 3/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
` (5 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli
Currently the userspace RV tool starts a monitor and waits for the user
to press Ctrl-C (SIGINT) to terminate and stop the monitor.
This doesn't account for a scenario where a user starts RV in background
and simply kills it (SIGTERM unless the user specifies differently).
E.g.:
# rv mon wip &
# kill %
Would terminate RV without stopping the monitor and next RV executions
won't start correctly.
Register the signal handler used for SIGINT also to SIGTERM.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
tools/verification/rv/src/rv.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/verification/rv/src/rv.c b/tools/verification/rv/src/rv.c
index 239de054d1e06..b8fe24a87d97c 100644
--- a/tools/verification/rv/src/rv.c
+++ b/tools/verification/rv/src/rv.c
@@ -191,6 +191,7 @@ int main(int argc, char **argv)
* and exit.
*/
signal(SIGINT, stop_rv);
+ signal(SIGTERM, stop_rv);
rv_mon(argc - 1, &argv[1]);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 3/9] rv: Add da_handle_start_run_event_ to per-task monitors
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
2025-04-04 8:45 ` [RFC PATCH 1/9] tools/rv: Do not skip idle in trace Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 2/9] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 4/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
` (4 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli
The RV da_monitor API allows to start monitors in two ways:
da_handle_start_event_NAME and da_handle_start_run_event_NAME.
The former is used when the event is followed by the initial state of
the module, so we ignore the event but we know the monitor is in the
initial state and can start monitoring, the latter can be used if the
event can only occur in the initial state, so we do handle the event as
if the monitor was in the initial state.
This latter API is defined for implicit monitors but not per-task ones.
Define da_handle_start_run_event_NAME macro also for per-task monitors.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/rv/da_monitor.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 510c88bfabd43..215c3eb770ccc 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -512,6 +512,30 @@ da_handle_start_event_##name(struct task_struct *tsk, enum events_##name event)
__da_handle_event_##name(da_mon, tsk, event); \
\
return 1; \
+} \
+ \
+/* \
+ * da_handle_start_run_event_##name - start monitoring and handle event \
+ * \
+ * This function is used to notify the monitor that the system is in the \
+ * initial state, so the monitor can start monitoring and handling event. \
+ */ \
+static inline bool \
+da_handle_start_run_event_##name(struct task_struct *tsk, enum events_##name event) \
+{ \
+ struct da_monitor *da_mon; \
+ \
+ if (!da_monitor_enabled_##name()) \
+ return 0; \
+ \
+ da_mon = da_get_monitor_##name(tsk); \
+ \
+ if (unlikely(!da_monitoring_##name(da_mon))) \
+ da_monitor_start_##name(da_mon); \
+ \
+ __da_handle_event_##name(da_mon, tsk, event); \
+ \
+ return 1; \
}
/*
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 4/9] rv: Remove trailing whitespace from tracepoint string
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
` (2 preceding siblings ...)
2025-04-04 8:45 ` [RFC PATCH 3/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 5/9] sched: Add sched tracepoints for RV task model Gabriele Monaco
` (3 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Masami Hiramatsu,
linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli
RV event tracepoints print a line with the format:
"event_xyz: S0 x event -> S1 "
"event_xyz: S1 x event -> S0 (final)"
While printing an event leading to a non-final state, the line
has a trailing white space (visible above before the closing ").
Adapt the format string not to print the trailing whitespace if we are
not printing "(final)".
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
kernel/trace/rv/rv_trace.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 422b75f58891e..18fa0e358a30e 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -29,11 +29,11 @@ DECLARE_EVENT_CLASS(event_da_monitor,
__entry->final_state = final_state;
),
- TP_printk("%s x %s -> %s %s",
+ TP_printk("%s x %s -> %s%s",
__entry->state,
__entry->event,
__entry->next_state,
- __entry->final_state ? "(final)" : "")
+ __entry->final_state ? " (final)" : "")
);
DECLARE_EVENT_CLASS(error_da_monitor,
@@ -90,12 +90,12 @@ DECLARE_EVENT_CLASS(event_da_monitor_id,
__entry->final_state = final_state;
),
- TP_printk("%d: %s x %s -> %s %s",
+ TP_printk("%d: %s x %s -> %s%s",
__entry->id,
__entry->state,
__entry->event,
__entry->next_state,
- __entry->final_state ? "(final)" : "")
+ __entry->final_state ? " (final)" : "")
);
DECLARE_EVENT_CLASS(error_da_monitor_id,
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 5/9] sched: Add sched tracepoints for RV task model
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
` (3 preceding siblings ...)
2025-04-04 8:45 ` [RFC PATCH 4/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
` (2 subsequent siblings)
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Ingo Molnar, Peter Zijlstra, Steven Rostedt,
Masami Hiramatsu, linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli
Add the following tracepoints:
* sched_set_need_resched(tsk, cpu, tif)
Called when a task is set the need resched [lazy] flag
* sched_switch_vain(preempt, tsk, tsk_state)
Called when a task is selected again during __schedule
i.e. prev == next == tsk : no real context switch
These tracepoints are useful to describe the Linux task model and are
adapted from the patches by Daniel Bristot de Oliveira
(https://bristot.me/linux-task-model/).
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/linux/sched.h | 7 ++++++-
include/trace/events/sched.h | 8 ++++++++
kernel/sched/core.c | 9 +++++++++
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac19828934..7bcd37493ab07 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -340,9 +340,11 @@ extern void io_schedule_finish(int token);
extern long io_schedule_timeout(long timeout);
extern void io_schedule(void);
-/* wrapper function to trace from this header file */
+/* wrapper functions to trace from this header file */
DECLARE_TRACEPOINT(sched_set_state_tp);
extern void __trace_set_current_state(int state_value);
+DECLARE_TRACEPOINT(sched_set_need_resched_tp);
+extern void __trace_set_need_resched(struct task_struct *curr, int tif);
/**
* struct prev_cputime - snapshot of system and user cputime
@@ -2064,6 +2066,9 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
static inline void set_tsk_need_resched(struct task_struct *tsk)
{
+ if (tracepoint_enabled(sched_set_need_resched_tp) &&
+ !test_tsk_thread_flag(tsk, TIF_NEED_RESCHED))
+ __trace_set_need_resched(tsk, TIF_NEED_RESCHED);
set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
}
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 8994e97d86c13..77bbcce407cb4 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -835,6 +835,14 @@ DECLARE_TRACE_CONDITION(sched_set_state_tp,
TP_ARGS(tsk, state),
TP_CONDITION(!!(tsk->__state) != !!state));
+DECLARE_TRACE(sched_set_need_resched_tp,
+ TP_PROTO(struct task_struct *tsk, int cpu, int tif),
+ TP_ARGS(tsk, cpu, tif));
+
+DECLARE_TRACE(sched_switch_vain_tp,
+ TP_PROTO(bool preempt, struct task_struct *tsk, unsigned int prev_state),
+ TP_ARGS(preempt, tsk, prev_state));
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cfaca3040b2f8..f2f79236d5811 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1109,6 +1109,7 @@ static void __resched_curr(struct rq *rq, int tif)
cpu = cpu_of(rq);
+ trace_sched_set_need_resched_tp(curr, cpu, tif);
if (cpu == smp_processor_id()) {
set_ti_thread_flag(cti, tif);
if (tif == TIF_NEED_RESCHED)
@@ -1124,6 +1125,13 @@ static void __resched_curr(struct rq *rq, int tif)
}
}
+#ifdef CONFIG_SMP
+void __trace_set_need_resched(struct task_struct *curr, int tif)
+{
+ trace_sched_set_need_resched_tp(curr, smp_processor_id(), tif);
+}
+#endif
+
void resched_curr(struct rq *rq)
{
__resched_curr(rq, TIF_NEED_RESCHED);
@@ -6767,6 +6775,7 @@ static void __sched notrace __schedule(int sched_mode)
rq = context_switch(rq, prev, next, &rf);
} else {
rq_unpin_lock(rq, &rf);
+ trace_sched_switch_vain_tp(preempt, prev, prev_state);
__balance_callbacks(rq);
raw_spin_rq_unlock_irq(rq);
}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
` (4 preceding siblings ...)
2025-04-04 8:45 ` [RFC PATCH 5/9] sched: Add sched tracepoints for RV task model Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-11 4:52 ` Nam Cao
2025-04-04 8:45 ` [RFC PATCH 8/9] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 9/9] rv: Add srs per-task monitor Gabriele Monaco
7 siblings, 1 reply; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, linux-trace-kernel
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
Juri Lelli
DA monitor can be accessed from multiple cores simultaneously, this is
likely, for instance when dealing with per-task monitors reacting on
events that do not always occur on the CPU where the task is running.
This can cause race conditions where two events change the next state
and we see inconsistent values. E.g.:
[62] event_srs: 27: sleepable x sched_wakeup -> running (final)
[63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
[63] error_srs: 27: event sched_switch_suspend not expected in the state running
In this case the monitor fails because the event on CPU 62 wins against
the one on CPU 63, although the correct state should have been
sleepable, since the task get suspended.
Detect if the current state was modified by using try_cmpxchg while
storing the next value. If it was, try again reading the current state.
After a maximum number of failed retries, react as if it was an error
with invalid current state (we cannot determine it).
Monitors where this type of condition can occur must be able to account
for racing events in any possible order, as we cannot know the winner.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
include/linux/rv.h | 3 ++-
include/rv/da_monitor.h | 53 +++++++++++++++++++++++++++++++----------
2 files changed, 42 insertions(+), 14 deletions(-)
diff --git a/include/linux/rv.h b/include/linux/rv.h
index 3452b5e4b29e7..a83a81ac6e466 100644
--- a/include/linux/rv.h
+++ b/include/linux/rv.h
@@ -7,7 +7,8 @@
#ifndef _LINUX_RV_H
#define _LINUX_RV_H
-#define MAX_DA_NAME_LEN 32
+#define MAX_DA_NAME_LEN 32
+#define MAX_DA_RETRY_RACING_EVENTS 3
#ifdef CONFIG_RV
/*
diff --git a/include/rv/da_monitor.h b/include/rv/da_monitor.h
index 215c3eb770ccc..8b714e3085a55 100644
--- a/include/rv/da_monitor.h
+++ b/include/rv/da_monitor.h
@@ -82,16 +82,19 @@ static inline void da_monitor_reset_##name(struct da_monitor *da_mon) \
*/ \
static inline type da_monitor_curr_state_##name(struct da_monitor *da_mon) \
{ \
- return da_mon->curr_state; \
+ return READ_ONCE(da_mon->curr_state); \
} \
\
/* \
* da_monitor_set_state_##name - set the new current state \
+ * \
+ * return false without the change in case the state was modified elsewhere \
*/ \
-static inline void \
-da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name state) \
+static inline bool \
+da_monitor_set_state_##name(struct da_monitor *da_mon, enum states_##name prev_state, \
+ enum states_##name state) \
{ \
- da_mon->curr_state = state; \
+ return try_cmpxchg(&da_mon->curr_state, &prev_state, state); \
} \
\
/* \
@@ -150,17 +153,29 @@ static inline bool da_monitor_handling_event_##name(struct da_monitor *da_mon)
* Event handler for implicit monitors. Implicit monitor is the one which the
* handler does not need to specify which da_monitor to manipulate. Examples
* of implicit monitor are the per_cpu or the global ones.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
*/
#define DECLARE_DA_MON_MODEL_HANDLER_IMPLICIT(name, type) \
\
static inline bool \
da_event_##name(struct da_monitor *da_mon, enum events_##name event) \
{ \
- type curr_state = da_monitor_curr_state_##name(da_mon); \
- type next_state = model_get_next_state_##name(curr_state, event); \
+ bool changed; \
+ type curr_state, next_state; \
\
- if (next_state != INVALID_STATE) { \
- da_monitor_set_state_##name(da_mon, next_state); \
+ for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) { \
+ curr_state = da_monitor_curr_state_##name(da_mon); \
+ next_state = model_get_next_state_##name(curr_state, event); \
+ if (next_state == INVALID_STATE) \
+ break; \
+ changed = da_monitor_set_state_##name(da_mon, curr_state, next_state); \
+ if (unlikely(!changed)) { \
+ curr_state = -1; \
+ continue; \
+ } \
\
trace_event_##name(model_get_state_name_##name(curr_state), \
model_get_event_name_##name(event), \
@@ -181,17 +196,29 @@ da_event_##name(struct da_monitor *da_mon, enum events_##name event) \
/*
* Event handler for per_task monitors.
+ *
+ * Retry, in case there is a race while getting and setting the next state
+ * return an invalid current state if we run out of retries. The monitor should
+ * be able to handle various orders.
*/
#define DECLARE_DA_MON_MODEL_HANDLER_PER_TASK(name, type) \
\
static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct *tsk, \
enum events_##name event) \
{ \
- type curr_state = da_monitor_curr_state_##name(da_mon); \
- type next_state = model_get_next_state_##name(curr_state, event); \
- \
- if (next_state != INVALID_STATE) { \
- da_monitor_set_state_##name(da_mon, next_state); \
+ bool changed; \
+ type curr_state, next_state; \
+ \
+ for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) { \
+ curr_state = da_monitor_curr_state_##name(da_mon); \
+ next_state = model_get_next_state_##name(curr_state, event); \
+ if (next_state == INVALID_STATE) \
+ break; \
+ changed = da_monitor_set_state_##name(da_mon, curr_state, next_state); \
+ if (unlikely(!changed)) { \
+ curr_state = -1; \
+ continue; \
+ } \
\
trace_event_##name(tsk->pid, \
model_get_state_name_##name(curr_state), \
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 8/9] rv: Replace tss monitor with more complete sts
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
` (5 preceding siblings ...)
2025-04-04 8:45 ` [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 9/9] rv: Add srs per-task monitor Gabriele Monaco
7 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
Juri Lelli
The tss monitor currently guarantees context switches can happen only
while scheduling, but it doesn't enforce that each scheduling call
implies a task switch. This can be implied only if we rely on the newly
introduced sched_switch_vain tracepoint, which represents a
scheduler call where the previously running task is the same that is
picked to run next, in fact no context is switched.
Replace the monitor with a more comprehensive specification which
implies tss but also ensures that:
* each scheduler call switches context (or has a vain switch)
* each context switch happens with interrupts disabled
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 68 +++++++---
kernel/trace/rv/Kconfig | 2 +-
kernel/trace/rv/Makefile | 2 +-
kernel/trace/rv/monitors/sts/Kconfig | 18 +++
kernel/trace/rv/monitors/sts/sts.c | 118 ++++++++++++++++++
kernel/trace/rv/monitors/sts/sts.h | 62 +++++++++
.../{tss/tss_trace.h => sts/sts_trace.h} | 8 +-
kernel/trace/rv/monitors/tss/Kconfig | 14 ---
kernel/trace/rv/monitors/tss/tss.c | 91 --------------
kernel/trace/rv/monitors/tss/tss.h | 47 -------
kernel/trace/rv/rv_trace.h | 2 +-
tools/verification/models/sched/sts.dot | 29 +++++
tools/verification/models/sched/tss.dot | 18 ---
13 files changed, 282 insertions(+), 197 deletions(-)
create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
create mode 100644 kernel/trace/rv/monitors/sts/sts.c
create mode 100644 kernel/trace/rv/monitors/sts/sts.h
rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
delete mode 100644 kernel/trace/rv/monitors/tss/Kconfig
delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
create mode 100644 tools/verification/models/sched/sts.dot
delete mode 100644 tools/verification/models/sched/tss.dot
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 24b2c62a3bc26..641e70e359aaa 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -40,26 +40,6 @@ defined in by Daniel Bristot in [1].
Currently we included the following:
-Monitor tss
-~~~~~~~~~~~
-
-The task switch while scheduling (tss) monitor ensures a task switch happens
-only in scheduling context, that is inside a call to `__schedule`::
-
- |
- |
- v
- +-----------------+
- | thread | <+
- +-----------------+ |
- | |
- | schedule_entry | schedule_exit
- v |
- sched_switch |
- +--------------- |
- | sched |
- +--------------> -+
-
Monitor sco
~~~~~~~~~~~
@@ -165,6 +145,54 @@ schedule is not called with interrupt disabled::
|
cant_sched -+
+Monitor sts
+~~~~~~~~~~~
+
+The schedule implies task switch (sts) monitor ensures a task switch happens in
+every scheduling context, that is inside a call to ``__schedule``, as well as no
+task switch can happen without scheduling and before interrupts are disabled.
+This require the special type of switch called vain, which occurs when the next
+task picked for execution is the same as the previously running one, in fact no
+real task switch occurs::
+
+ |
+ |
+ v
+ #====================# irq_disable
+ H H irq_enable
+ H thread H --------------+
+ H H |
+ +-------------> H H <-------------+
+ | #====================#
+ | |
+ | | schedule_entry
+ | v
+ | +--------------------+
+ | | scheduling | <+
+ | +--------------------+ |
+ | | |
+ | | irq_disable | irq_enable
+ | v |
+ | +--------------------+ |
+ | | disable_to_switch | -+
+ | schedule_exit +--------------------+
+ | |
+ | | sched_switch
+ | | sched_switch_vain
+ | v
+ | +--------------------+
+ | | switching |
+ | +--------------------+
+ | |
+ | | irq_enable
+ | v
+ | +--------------------+ irq_disable
+ | | | irq_enable
+ | | enable_to_exit | --------------+
+ | | | |
+ +-------------- | | <-------------+
+ +--------------------+
+
References
----------
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index b39f36013ef23..a53a3eca9616d 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -28,12 +28,12 @@ menuconfig RV
source "kernel/trace/rv/monitors/wip/Kconfig"
source "kernel/trace/rv/monitors/wwnr/Kconfig"
source "kernel/trace/rv/monitors/sched/Kconfig"
-source "kernel/trace/rv/monitors/tss/Kconfig"
source "kernel/trace/rv/monitors/sco/Kconfig"
source "kernel/trace/rv/monitors/snroc/Kconfig"
source "kernel/trace/rv/monitors/scpd/Kconfig"
source "kernel/trace/rv/monitors/snep/Kconfig"
source "kernel/trace/rv/monitors/sncid/Kconfig"
+source "kernel/trace/rv/monitors/sts/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index f9b2cd0483c3c..c609b72275cb8 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -6,12 +6,12 @@ obj-$(CONFIG_RV) += rv.o
obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
-obj-$(CONFIG_RV_MON_TSS) += monitors/tss/tss.o
obj-$(CONFIG_RV_MON_SCO) += monitors/sco/sco.o
obj-$(CONFIG_RV_MON_SNROC) += monitors/snroc/snroc.o
obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
obj-$(CONFIG_RV_MON_SNEP) += monitors/snep/snep.o
obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
+obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/sts/Kconfig b/kernel/trace/rv/monitors/sts/Kconfig
new file mode 100644
index 0000000000000..5b486dac3f10f
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_STS
+ depends on RV
+ depends on IRQSOFF_TRACER
+ depends on RV_MON_SCHED
+ default y
+ select DA_MON_EVENTS_IMPLICIT
+ bool "sts monitor"
+ help
+ Monitor to ensure relationships between scheduler and switches
+ * each call to the scheduler implies a switch
+ * switches only happen inside the scheduler
+ * switches happen with interrupt disabled
+ This monitor is part of the sched monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sts/sts.c b/kernel/trace/rv/monitors/sts/sts.c
new file mode 100644
index 0000000000000..0a3ed35bfe894
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sts"
+
+#include <trace/events/sched.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sts.h"
+
+static struct rv_monitor rv_sts;
+DECLARE_DA_MON_PER_CPU(sts, unsigned char);
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_sts(irq_disable_sts);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_sts(irq_enable_sts);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ da_handle_event_sts(sched_switch_sts);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ da_handle_event_sts(sched_switch_vain_sts);
+}
+
+static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+{
+ da_handle_event_sts(schedule_entry_sts);
+}
+
+static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+{
+ da_handle_start_event_sts(schedule_exit_sts);
+}
+
+static int enable_sts(void)
+{
+ int retval;
+
+ retval = da_monitor_init_sts();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("sts", irq_disable, handle_irq_disable);
+ rv_attach_trace_probe("sts", irq_enable, handle_irq_enable);
+ rv_attach_trace_probe("sts", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_attach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+ rv_attach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+ return 0;
+}
+
+static void disable_sts(void)
+{
+ rv_sts.enabled = 0;
+
+ rv_detach_trace_probe("sts", irq_disable, handle_irq_disable);
+ rv_detach_trace_probe("sts", irq_enable, handle_irq_enable);
+ rv_detach_trace_probe("sts", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_detach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+ rv_detach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+ da_monitor_destroy_sts();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_sts = {
+ .name = "sts",
+ .description = "schedule implies task switch.",
+ .enable = enable_sts,
+ .disable = disable_sts,
+ .reset = da_monitor_reset_all_sts,
+ .enabled = 0,
+};
+
+static int __init register_sts(void)
+{
+ rv_register_monitor(&rv_sts, &rv_sched);
+ return 0;
+}
+
+static void __exit unregister_sts(void)
+{
+ rv_unregister_monitor(&rv_sts);
+}
+
+module_init(register_sts);
+module_exit(unregister_sts);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sts: schedule implies task switch.");
diff --git a/kernel/trace/rv/monitors/sts/sts.h b/kernel/trace/rv/monitors/sts/sts.h
new file mode 100644
index 0000000000000..6921c00422930
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sts automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sts {
+ thread_sts = 0,
+ disable_to_switch_sts,
+ enable_to_exit_sts,
+ scheduling_sts,
+ switching_sts,
+ state_max_sts
+};
+
+#define INVALID_STATE state_max_sts
+
+enum events_sts {
+ irq_disable_sts = 0,
+ irq_enable_sts,
+ sched_switch_sts,
+ sched_switch_vain_sts,
+ schedule_entry_sts,
+ schedule_exit_sts,
+ event_max_sts
+};
+
+struct automaton_sts {
+ char *state_names[state_max_sts];
+ char *event_names[event_max_sts];
+ unsigned char function[state_max_sts][event_max_sts];
+ unsigned char initial_state;
+ bool final_states[state_max_sts];
+};
+
+static const struct automaton_sts automaton_sts = {
+ .state_names = {
+ "thread",
+ "disable_to_switch",
+ "enable_to_exit",
+ "scheduling",
+ "switching"
+ },
+ .event_names = {
+ "irq_disable",
+ "irq_enable",
+ "sched_switch",
+ "sched_switch_vain",
+ "schedule_entry",
+ "schedule_exit"
+ },
+ .function = {
+ { thread_sts, thread_sts, INVALID_STATE, INVALID_STATE, scheduling_sts, INVALID_STATE },
+ { INVALID_STATE, scheduling_sts, switching_sts, switching_sts, INVALID_STATE, INVALID_STATE },
+ { enable_to_exit_sts, enable_to_exit_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, thread_sts },
+ { disable_to_switch_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ { INVALID_STATE, enable_to_exit_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ },
+ .initial_state = thread_sts,
+ .final_states = { 1, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h b/kernel/trace/rv/monitors/sts/sts_trace.h
similarity index 67%
rename from kernel/trace/rv/monitors/tss/tss_trace.h
rename to kernel/trace/rv/monitors/sts/sts_trace.h
index 4619dbb50cc06..d78beb58d5b3d 100644
--- a/kernel/trace/rv/monitors/tss/tss_trace.h
+++ b/kernel/trace/rv/monitors/sts/sts_trace.h
@@ -4,12 +4,12 @@
* Snippet to be included in rv_trace.h
*/
-#ifdef CONFIG_RV_MON_TSS
-DEFINE_EVENT(event_da_monitor, event_tss,
+#ifdef CONFIG_RV_MON_STS
+DEFINE_EVENT(event_da_monitor, event_sts,
TP_PROTO(char *state, char *event, char *next_state, bool final_state),
TP_ARGS(state, event, next_state, final_state));
-DEFINE_EVENT(error_da_monitor, error_tss,
+DEFINE_EVENT(error_da_monitor, error_sts,
TP_PROTO(char *state, char *event),
TP_ARGS(state, event));
-#endif /* CONFIG_RV_MON_TSS */
+#endif /* CONFIG_RV_MON_STS */
diff --git a/kernel/trace/rv/monitors/tss/Kconfig b/kernel/trace/rv/monitors/tss/Kconfig
deleted file mode 100644
index 479f86f52e60d..0000000000000
--- a/kernel/trace/rv/monitors/tss/Kconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-#
-config RV_MON_TSS
- depends on RV
- depends on RV_MON_SCHED
- default y
- select DA_MON_EVENTS_IMPLICIT
- bool "tss monitor"
- help
- Monitor to ensure sched_switch happens only in scheduling context.
- This monitor is part of the sched monitors collection.
-
- For further information, see:
- Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
deleted file mode 100644
index 542787e6524fc..0000000000000
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ /dev/null
@@ -1,91 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "tss"
-
-#include <trace/events/sched.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "tss.h"
-
-static struct rv_monitor rv_tss;
-DECLARE_DA_MON_PER_CPU(tss, unsigned char);
-
-static void handle_sched_switch(void *data, bool preempt,
- struct task_struct *prev,
- struct task_struct *next,
- unsigned int prev_state)
-{
- da_handle_event_tss(sched_switch_tss);
-}
-
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
-{
- da_handle_event_tss(schedule_entry_tss);
-}
-
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
-{
- da_handle_start_event_tss(schedule_exit_tss);
-}
-
-static int enable_tss(void)
-{
- int retval;
-
- retval = da_monitor_init_tss();
- if (retval)
- return retval;
-
- rv_attach_trace_probe("tss", sched_switch, handle_sched_switch);
- rv_attach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
- rv_attach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
- return 0;
-}
-
-static void disable_tss(void)
-{
- rv_tss.enabled = 0;
-
- rv_detach_trace_probe("tss", sched_switch, handle_sched_switch);
- rv_detach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
- rv_detach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
- da_monitor_destroy_tss();
-}
-
-static struct rv_monitor rv_tss = {
- .name = "tss",
- .description = "task switch while scheduling.",
- .enable = enable_tss,
- .disable = disable_tss,
- .reset = da_monitor_reset_all_tss,
- .enabled = 0,
-};
-
-static int __init register_tss(void)
-{
- rv_register_monitor(&rv_tss, &rv_sched);
- return 0;
-}
-
-static void __exit unregister_tss(void)
-{
- rv_unregister_monitor(&rv_tss);
-}
-
-module_init(register_tss);
-module_exit(unregister_tss);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("tss: task switch while scheduling.");
diff --git a/kernel/trace/rv/monitors/tss/tss.h b/kernel/trace/rv/monitors/tss/tss.h
deleted file mode 100644
index f0a36fda1b873..0000000000000
--- a/kernel/trace/rv/monitors/tss/tss.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of tss automaton
- * For further information about this format, see kernel documentation:
- * Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_tss {
- thread_tss = 0,
- sched_tss,
- state_max_tss
-};
-
-#define INVALID_STATE state_max_tss
-
-enum events_tss {
- sched_switch_tss = 0,
- schedule_entry_tss,
- schedule_exit_tss,
- event_max_tss
-};
-
-struct automaton_tss {
- char *state_names[state_max_tss];
- char *event_names[event_max_tss];
- unsigned char function[state_max_tss][event_max_tss];
- unsigned char initial_state;
- bool final_states[state_max_tss];
-};
-
-static const struct automaton_tss automaton_tss = {
- .state_names = {
- "thread",
- "sched"
- },
- .event_names = {
- "sched_switch",
- "schedule_entry",
- "schedule_exit"
- },
- .function = {
- { INVALID_STATE, sched_tss, INVALID_STATE },
- { sched_tss, INVALID_STATE, thread_tss },
- },
- .initial_state = thread_tss,
- .final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 18fa0e358a30e..a5e1c52e29926 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -58,11 +58,11 @@ DECLARE_EVENT_CLASS(error_da_monitor,
);
#include <monitors/wip/wip_trace.h>
-#include <monitors/tss/tss_trace.h>
#include <monitors/sco/sco_trace.h>
#include <monitors/scpd/scpd_trace.h>
#include <monitors/snep/snep_trace.h>
#include <monitors/sncid/sncid_trace.h>
+#include <monitors/sts/sts_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
#endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/sts.dot b/tools/verification/models/sched/sts.dot
new file mode 100644
index 0000000000000..8152675b4f528
--- /dev/null
+++ b/tools/verification/models/sched/sts.dot
@@ -0,0 +1,29 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = circle] "disable_to_switch"};
+ {node [shape = circle] "enable_to_exit"};
+ {node [shape = circle] "scheduling"};
+ {node [shape = circle] "switching"};
+ {node [shape = plaintext, style=invis, label=""] "__init_thread"};
+ {node [shape = doublecircle] "thread"};
+ {node [shape = circle] "thread"};
+ "__init_thread" -> "thread";
+ "disable_to_switch" [label = "disable_to_switch"];
+ "disable_to_switch" -> "scheduling" [ label = "irq_enable" ];
+ "disable_to_switch" -> "switching" [ label = "sched_switch\nsched_switch_vain" ];
+ "enable_to_exit" [label = "enable_to_exit"];
+ "enable_to_exit" -> "enable_to_exit" [ label = "irq_disable\nirq_enable" ];
+ "enable_to_exit" -> "thread" [ label = "schedule_exit" ];
+ "scheduling" [label = "scheduling"];
+ "scheduling" -> "disable_to_switch" [ label = "irq_disable" ];
+ "switching" [label = "switching"];
+ "switching" -> "enable_to_exit" [ label = "irq_enable" ];
+ "thread" [label = "thread", color = green3];
+ "thread" -> "scheduling" [ label = "schedule_entry" ];
+ "thread" -> "thread" [ label = "irq_disable\nirq_enable" ];
+ { rank = min ;
+ "__init_thread";
+ "thread";
+ }
+}
diff --git a/tools/verification/models/sched/tss.dot b/tools/verification/models/sched/tss.dot
deleted file mode 100644
index 7dfa1d9121bbd..0000000000000
--- a/tools/verification/models/sched/tss.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
- center = true;
- size = "7,11";
- {node [shape = plaintext] "sched"};
- {node [shape = plaintext, style=invis, label=""] "__init_thread"};
- {node [shape = ellipse] "thread"};
- {node [shape = plaintext] "thread"};
- "__init_thread" -> "thread";
- "sched" [label = "sched"];
- "sched" -> "sched" [ label = "sched_switch" ];
- "sched" -> "thread" [ label = "schedule_exit" ];
- "thread" [label = "thread", color = green3];
- "thread" -> "sched" [ label = "schedule_entry" ];
- { rank = min ;
- "__init_thread";
- "thread";
- }
-}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH 9/9] rv: Add srs per-task monitor
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
` (6 preceding siblings ...)
2025-04-04 8:45 ` [RFC PATCH 8/9] rv: Replace tss monitor with more complete sts Gabriele Monaco
@ 2025-04-04 8:45 ` Gabriele Monaco
2025-04-10 8:53 ` Juri Lelli
7 siblings, 1 reply; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-04 8:45 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Tomas Glozar,
Juri Lelli
Add a per-task monitor for task switches as part of the sched model:
* srs:
Monitor to describe conditions for different types of task switch.
This monitor enforces rules such as preempt after setting need
for reschedule and suspend after setting a task to sleepable.
This new monitor implies the previously introduced snroc (set non
runnable on its own context), so replace that monitor with srs.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 169 +++++++++++++++---
kernel/trace/rv/Kconfig | 2 +-
kernel/trace/rv/Makefile | 2 +-
kernel/trace/rv/monitors/snroc/snroc.c | 85 ---------
kernel/trace/rv/monitors/snroc/snroc.h | 47 -----
.../trace/rv/monitors/{snroc => srs}/Kconfig | 8 +-
kernel/trace/rv/monitors/srs/srs.c | 135 ++++++++++++++
kernel/trace/rv/monitors/srs/srs.h | 87 +++++++++
.../{snroc/snroc_trace.h => srs/srs_trace.h} | 8 +-
kernel/trace/rv/rv_trace.h | 2 +-
tools/verification/models/sched/snroc.dot | 18 --
tools/verification/models/sched/srs.dot | 61 +++++++
12 files changed, 443 insertions(+), 181 deletions(-)
delete mode 100644 kernel/trace/rv/monitors/snroc/snroc.c
delete mode 100644 kernel/trace/rv/monitors/snroc/snroc.h
rename kernel/trace/rv/monitors/{snroc => srs}/Kconfig (52%)
create mode 100644 kernel/trace/rv/monitors/srs/srs.c
create mode 100644 kernel/trace/rv/monitors/srs/srs.h
rename kernel/trace/rv/monitors/{snroc/snroc_trace.h => srs/srs_trace.h} (67%)
delete mode 100644 tools/verification/models/sched/snroc.dot
create mode 100644 tools/verification/models/sched/srs.dot
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 641e70e359aaa..c1516b71b8e73 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -61,27 +61,6 @@ happen only in thread context::
|
scheduling_context -+
-Monitor snroc
-~~~~~~~~~~~~~
-
-The set non runnable on its own context (snroc) monitor ensures changes in a
-task state happens only in the respective task's context. This is a per-task
-monitor::
-
- |
- |
- v
- +------------------+
- | other_context | <+
- +------------------+ |
- | |
- | sched_switch_in | sched_switch_out
- v |
- sched_set_state |
- +------------------ |
- | own_context |
- +-----------------> -+
-
Monitor scpd
~~~~~~~~~~~~
@@ -193,6 +172,154 @@ real task switch occurs::
+-------------- | | <-------------+
+--------------------+
+Monitor srs
+-----------
+
+The switch after resched or sleep (srs) monitor describes conditions for
+different types of task switch. This is a complex model, below we are going to
+explain it step by step. Unfortunately splitting this into smaller monitor is
+not trivial due to some shared events such as ``switch_in``::
+
+ set_runnable
+ | wakeup +---+
+ | switch_vain | |
+ v | v wakeup
+ #================================================# set_runnable
+ switch_in H H <----------+
+ +------------------> H running H |
+ | H H -----+ |
+ | #================================================# | |
+ | | | | ^ ^ | |
+ | | switch_yield need_resched | | | |
+ | | | need_resched_lazy | | | |
+ | set_sleepable v | | | | |
+ | | +-------------+ | | | | |
+ | +--------+----> | preempted | ---+- switch_in | | |
+ | | | +-------------+ | | | |
+ | switch_preempt | | | | | |
+ | switch_yield | need_resched | +- switch_vain | |
+ | | | v | | | |
+ | | | +-------------+ | | | |
+ | need_resched -+--------+----> | resched_out | | | | |
+ | | | | +-------------+ | | | |
+ | | | | | | | need_resched | |
+ | | | | switch_in | | wakeup | |
+ | | | | v v | set_runnable | |
+ | | | | +--------------------------+ -------+ | |
+ | | | | | | | | |
+ | | +--------+----- | rescheduling | <------+ | |
+ | | | | | | |
+ | | | +--------------------------+ -----------+ | |
+ | | | | ^ wakeup | | |
+ | | | set_sleepable set_runnable | | |
+ | | | v | | | |
+ | | +------------------+----- +---------------------------+ | | |
+ | | | | | | | | |
+ +--+--+---+------------------+----> | resched_sleepable | ---+ | | |
+ | | | | | | | | | | |
+ | | | | +-------------+----> +---------------------------+ | | | |
+ | | | | | | | ^ | | | | |
+ | | | | | | switch_preempt | need_resched | | | |
+ | | | | | | | | set_sleepable | | | |
+ | | | | | | v +------+ | | | |
+ | | | | | | +---------------------------+ --+------+---+-----+--+
+ | | | | | | | preempted_sleepable | | | | | |
+ | | | | | | +---------------------------+ --+------+---+--+ | |
+ | | | | | | | ^ | | | | | |
+ | | | | | | switch_in switch_preempt | | | | | |
+ | | | | | | v | switch_vain | | | | |
+ | | | | | | +-------------------------+ | | | | | |
+ | | | | | +------> | | <--+ | | | | |
+ | | | | | | sleepable | | | | | |
+ | | | | +- need_resched------- | | ----------+---+--+--+ |
+ | | | | need_resched_lazy +-------------------------+ | | | |
+ | | | | | ^ | switch_block | | |
+ | | | | | | set_sleepable | | | |
+ | | | | switch_block | switch_vain +----------+ | | |
+ | | | | switch_suspend +------+ | | | |
+ | | | | v v | | |
+ | | | | switch_block +-----------------------------+ switch_block | |
+ | | | +-switch_suspend--------> | sleeping | <-----------+ | |
+ | | | +-----------------------------+ | |
+ | | | | wakeup | |
+ | | | v | |
+ | | +- need_resched ------------- +-------------+ wakeup | |
+ | | | waking | <------------------------------+ |
+ | +------------------------------- +-------------+ |
+ | |
+ | +-----------------------+ |
+ +----- switch_in -------- | resched_out_sleepable | <-- sched_need_resched --------------+
+ +-----------------------+
+
+Types of switches:
+
+* ``switch_in``:
+ a non running task is scheduled in, this leads to ``running`` if the task is
+ runnable and ``sleepable`` if the task was preempted before sleeping.
+* ``switch_suspend``:
+ a task puts itself to sleep, this can happen only after explicitly setting
+ the task to ``sleepable``. After a task is suspended, it needs to be woken up
+ (``waking`` state) before being switched in again. The task can be set to
+ ``resched_sleepable`` via a ``need_resched`` but not preempted, in which case it
+ is equivalent to ``sleepable``.
+ Setting the task's state to ``sleepable`` can be reverted before switching if it
+ is woken up or set to runnable.
+* ``switch_blocked``:
+ a special case of a ``switch_suspend`` where the task is waiting on a
+ sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set
+ state events racing with each other and this leads the model to perceive this
+ type of switch when the task is not set to sleepable. This is a limitation of
+ the model in SMP system and workarounds may require to slow down the
+ scheduler.
+* ``switch_yield``:
+ a task explicitly calls the scheduler, this looks like a preemption as the
+ task is still runnable but the ``need_resched`` flag is not set. It can
+ happen after a ``yield`` system call or from the idle task.
+* ``switch_preempt``:
+ a task is ``preempted``, this can happen after the need for ``rescheduling``
+ has been set, also in its ``lazy`` flavour. ``need_resched`` can be set as a
+ flag to the task or in the per-core preemption count, either of them can
+ trigger a preemption.
+ The task was previously running and can be switched in directly, but it is
+ possible that a task is preempted after it sets itself as ``sleepable``
+ (``preempted_sleepable``), in this condition, once the task is switched back
+ in, it will not be ``running`` but continue its sleeping process in
+ ``sleepable``.
+* ``switch_vain``:
+ a task goes through the scheduler but it is picked as the next task to run,
+ hence no real task switch occurs. Since we run the scheduler, this clears the
+ need to reschedule.
+
+The ``resched_out`` state (``resched_out_sleepable`` if the task is sleepable)
+is a special case reached if the ``need_resched`` is set after picking the next
+task but before switching. In this case, the previous task is switched out
+normally but once it switches in, it can be preempted in ``rescheduling``. This
+can happen, for instance, when a task disables migration and we do a
+dequeue/enqueue before actually switching out.
+
+This monitor has several events that can race with each other, for instance if
+running from multiple CPUs (e.g. ``need_resched`` and ``wakeup`` can occur from
+a remote CPU), we need to account for them in any possible order.
+
+This monitor allows set state (runnable or sleepable) only in states
+``running``, ``rescheduling``, ``resched_sleepable``, and ``sleepable``, and
+not for states where the task is ``sleeping`` or ``preempted``. This implies
+the set state event occurs only in the task's context::
+
+ |
+ |
+ v
+ +------------------+
+ | other_context | <+
+ +------------------+ |
+ | |
+ | switch_in | switch_out
+ v |
+ sched_set_state |
+ +------------------ |
+ | own_context |
+ +-----------------> -+
+
References
----------
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index a53a3eca9616d..f296095f1df28 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -29,11 +29,11 @@ source "kernel/trace/rv/monitors/wip/Kconfig"
source "kernel/trace/rv/monitors/wwnr/Kconfig"
source "kernel/trace/rv/monitors/sched/Kconfig"
source "kernel/trace/rv/monitors/sco/Kconfig"
-source "kernel/trace/rv/monitors/snroc/Kconfig"
source "kernel/trace/rv/monitors/scpd/Kconfig"
source "kernel/trace/rv/monitors/snep/Kconfig"
source "kernel/trace/rv/monitors/sncid/Kconfig"
source "kernel/trace/rv/monitors/sts/Kconfig"
+source "kernel/trace/rv/monitors/srs/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index c609b72275cb8..5aba1d85ad662 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -7,11 +7,11 @@ obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
obj-$(CONFIG_RV_MON_SCO) += monitors/sco/sco.o
-obj-$(CONFIG_RV_MON_SNROC) += monitors/snroc/snroc.o
obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
obj-$(CONFIG_RV_MON_SNEP) += monitors/snep/snep.o
obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
+obj-$(CONFIG_RV_MON_SRS) += monitors/srs/srs.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/snroc/snroc.c b/kernel/trace/rv/monitors/snroc/snroc.c
deleted file mode 100644
index bb1f60d552960..0000000000000
--- a/kernel/trace/rv/monitors/snroc/snroc.c
+++ /dev/null
@@ -1,85 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "snroc"
-
-#include <trace/events/sched.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "snroc.h"
-
-static struct rv_monitor rv_snroc;
-DECLARE_DA_MON_PER_TASK(snroc, unsigned char);
-
-static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
-{
- da_handle_event_snroc(tsk, sched_set_state_snroc);
-}
-
-static void handle_sched_switch(void *data, bool preempt,
- struct task_struct *prev,
- struct task_struct *next,
- unsigned int prev_state)
-{
- da_handle_start_event_snroc(prev, sched_switch_out_snroc);
- da_handle_event_snroc(next, sched_switch_in_snroc);
-}
-
-static int enable_snroc(void)
-{
- int retval;
-
- retval = da_monitor_init_snroc();
- if (retval)
- return retval;
-
- rv_attach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
- rv_attach_trace_probe("snroc", sched_switch, handle_sched_switch);
-
- return 0;
-}
-
-static void disable_snroc(void)
-{
- rv_snroc.enabled = 0;
-
- rv_detach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
- rv_detach_trace_probe("snroc", sched_switch, handle_sched_switch);
-
- da_monitor_destroy_snroc();
-}
-
-static struct rv_monitor rv_snroc = {
- .name = "snroc",
- .description = "set non runnable on its own context.",
- .enable = enable_snroc,
- .disable = disable_snroc,
- .reset = da_monitor_reset_all_snroc,
- .enabled = 0,
-};
-
-static int __init register_snroc(void)
-{
- rv_register_monitor(&rv_snroc, &rv_sched);
- return 0;
-}
-
-static void __exit unregister_snroc(void)
-{
- rv_unregister_monitor(&rv_snroc);
-}
-
-module_init(register_snroc);
-module_exit(unregister_snroc);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("snroc: set non runnable on its own context.");
diff --git a/kernel/trace/rv/monitors/snroc/snroc.h b/kernel/trace/rv/monitors/snroc/snroc.h
deleted file mode 100644
index c3650a2b1b107..0000000000000
--- a/kernel/trace/rv/monitors/snroc/snroc.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of snroc automaton
- * For further information about this format, see kernel documentation:
- * Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_snroc {
- other_context_snroc = 0,
- own_context_snroc,
- state_max_snroc
-};
-
-#define INVALID_STATE state_max_snroc
-
-enum events_snroc {
- sched_set_state_snroc = 0,
- sched_switch_in_snroc,
- sched_switch_out_snroc,
- event_max_snroc
-};
-
-struct automaton_snroc {
- char *state_names[state_max_snroc];
- char *event_names[event_max_snroc];
- unsigned char function[state_max_snroc][event_max_snroc];
- unsigned char initial_state;
- bool final_states[state_max_snroc];
-};
-
-static const struct automaton_snroc automaton_snroc = {
- .state_names = {
- "other_context",
- "own_context"
- },
- .event_names = {
- "sched_set_state",
- "sched_switch_in",
- "sched_switch_out"
- },
- .function = {
- { INVALID_STATE, own_context_snroc, INVALID_STATE },
- { own_context_snroc, INVALID_STATE, other_context_snroc },
- },
- .initial_state = other_context_snroc,
- .final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/monitors/snroc/Kconfig b/kernel/trace/rv/monitors/srs/Kconfig
similarity index 52%
rename from kernel/trace/rv/monitors/snroc/Kconfig
rename to kernel/trace/rv/monitors/srs/Kconfig
index 6e4365a2fea3b..a5b3998b32328 100644
--- a/kernel/trace/rv/monitors/snroc/Kconfig
+++ b/kernel/trace/rv/monitors/srs/Kconfig
@@ -1,13 +1,15 @@
# SPDX-License-Identifier: GPL-2.0-only
#
-config RV_MON_SNROC
+config RV_MON_SRS
depends on RV
depends on RV_MON_SCHED
default y
select DA_MON_EVENTS_ID
- bool "snroc monitor"
+ bool "srs monitor"
help
- Monitor to ensure sched_set_state happens only in the respective task's context.
+ Monitor to describe conditions for different types of task switch.
+ This monitor enforces rules such as preempt after setting need
+ for reschedule and suspend after setting a task to sleepable.
This monitor is part of the sched monitors collection.
For further information, see:
diff --git a/kernel/trace/rv/monitors/srs/srs.c b/kernel/trace/rv/monitors/srs/srs.c
new file mode 100644
index 0000000000000..3ea7419581e06
--- /dev/null
+++ b/kernel/trace/rv/monitors/srs/srs.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "srs"
+
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "srs.h"
+
+static struct rv_monitor rv_srs;
+DECLARE_DA_MON_PER_TASK(srs, unsigned char);
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk, int cpu, int tif)
+{
+ if (tif == TIF_NEED_RESCHED)
+ da_handle_event_srs(tsk, sched_need_resched_srs);
+ else
+ da_handle_event_srs(tsk, sched_need_resched_lazy_srs);
+}
+
+static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+{
+ /* special case from preempt_enable */
+ if (preempt && unlikely(!tif_need_resched() && test_preempt_need_resched()))
+ da_handle_event_srs(current, sched_need_resched_srs);
+}
+
+static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+{
+ if (state == TASK_RUNNING)
+ da_handle_event_srs(tsk, sched_set_state_runnable_srs);
+ else
+ da_handle_event_srs(tsk, sched_set_state_sleepable_srs);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ if (preempt)
+ da_handle_event_srs(prev, sched_switch_preempt_srs);
+ else if (prev_state == TASK_RUNNING)
+ da_handle_start_run_event_srs(prev, sched_switch_yield_srs);
+ else if (prev_state == TASK_RTLOCK_WAIT)
+ /* special case of sleeping task with racy conditions */
+ da_handle_event_srs(prev, sched_switch_blocking_srs);
+ else
+ da_handle_event_srs(prev, sched_switch_suspend_srs);
+ /* switch in also leads to sleepable or rescheduling */
+ if (task_is_running(next) && !test_tsk_thread_flag(next, TIF_NEED_RESCHED))
+ da_handle_start_event_srs(next, sched_switch_in_srs);
+ else
+ da_handle_event_srs(next, sched_switch_in_srs);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ da_handle_event_srs(tsk, sched_switch_vain_srs);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+ da_handle_event_srs(p, sched_wakeup_srs);
+}
+
+static int enable_srs(void)
+{
+ int retval;
+
+ retval = da_monitor_init_srs();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("srs", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_attach_trace_probe("srs", sched_set_state_tp, handle_sched_set_state);
+ rv_attach_trace_probe("srs", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("srs", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_attach_trace_probe("srs", sched_wakeup, handle_sched_wakeup);
+ rv_attach_trace_probe("srs", sched_entry_tp, handle_schedule_entry);
+
+ return 0;
+}
+
+static void disable_srs(void)
+{
+ rv_srs.enabled = 0;
+
+ rv_detach_trace_probe("srs", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_detach_trace_probe("srs", sched_set_state_tp, handle_sched_set_state);
+ rv_detach_trace_probe("srs", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("srs", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_detach_trace_probe("srs", sched_wakeup, handle_sched_wakeup);
+ rv_detach_trace_probe("srs", sched_entry_tp, handle_schedule_entry);
+
+ da_monitor_destroy_srs();
+}
+
+static struct rv_monitor rv_srs = {
+ .name = "srs",
+ .description = "switch after resched or sleep.",
+ .enable = enable_srs,
+ .disable = disable_srs,
+ .reset = da_monitor_reset_all_srs,
+ .enabled = 0,
+};
+
+static int __init register_srs(void)
+{
+ rv_register_monitor(&rv_srs, &rv_sched);
+ return 0;
+}
+
+static void __exit unregister_srs(void)
+{
+ rv_unregister_monitor(&rv_srs);
+}
+
+module_init(register_srs);
+module_exit(unregister_srs);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("srs: switch after resched or sleep.");
diff --git a/kernel/trace/rv/monitors/srs/srs.h b/kernel/trace/rv/monitors/srs/srs.h
new file mode 100644
index 0000000000000..ce2005880a105
--- /dev/null
+++ b/kernel/trace/rv/monitors/srs/srs.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of srs automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_srs {
+ running_srs = 0,
+ preempted_srs,
+ preempted_sleepable_srs,
+ resched_out_srs,
+ resched_out_sleepable_srs,
+ resched_sleepable_srs,
+ rescheduling_srs,
+ sleepable_srs,
+ sleeping_srs,
+ waking_srs,
+ state_max_srs
+};
+
+#define INVALID_STATE state_max_srs
+
+enum events_srs {
+ sched_need_resched_srs = 0,
+ sched_need_resched_lazy_srs,
+ sched_set_state_runnable_srs,
+ sched_set_state_sleepable_srs,
+ sched_switch_blocking_srs,
+ sched_switch_in_srs,
+ sched_switch_preempt_srs,
+ sched_switch_suspend_srs,
+ sched_switch_vain_srs,
+ sched_switch_yield_srs,
+ sched_wakeup_srs,
+ event_max_srs
+};
+
+struct automaton_srs {
+ char *state_names[state_max_srs];
+ char *event_names[event_max_srs];
+ unsigned char function[state_max_srs][event_max_srs];
+ unsigned char initial_state;
+ bool final_states[state_max_srs];
+};
+
+static const struct automaton_srs automaton_srs = {
+ .state_names = {
+ "running",
+ "preempted",
+ "preempted_sleepable",
+ "resched_out",
+ "resched_out_sleepable",
+ "resched_sleepable",
+ "rescheduling",
+ "sleepable",
+ "sleeping",
+ "waking"
+ },
+ .event_names = {
+ "sched_need_resched",
+ "sched_need_resched_lazy",
+ "sched_set_state_runnable",
+ "sched_set_state_sleepable",
+ "sched_switch_blocking",
+ "sched_switch_in",
+ "sched_switch_preempt",
+ "sched_switch_suspend",
+ "sched_switch_vain",
+ "sched_switch_yield",
+ "sched_wakeup"
+ },
+ .function = {
+ { rescheduling_srs, rescheduling_srs, running_srs, sleepable_srs, sleeping_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, running_srs, preempted_srs, running_srs },
+ { resched_out_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, running_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ { resched_out_sleepable_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, sleepable_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, waking_srs },
+ { INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, rescheduling_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ { INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, resched_sleepable_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ { resched_sleepable_srs, INVALID_STATE, rescheduling_srs, resched_sleepable_srs, sleeping_srs, INVALID_STATE, preempted_sleepable_srs, sleeping_srs, sleepable_srs, INVALID_STATE, rescheduling_srs },
+ { rescheduling_srs, INVALID_STATE, rescheduling_srs, resched_sleepable_srs, sleeping_srs, INVALID_STATE, preempted_srs, INVALID_STATE, running_srs, preempted_srs, rescheduling_srs },
+ { resched_sleepable_srs, resched_sleepable_srs, running_srs, sleepable_srs, sleeping_srs, INVALID_STATE, preempted_sleepable_srs, sleeping_srs, sleepable_srs, INVALID_STATE, running_srs },
+ { INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, waking_srs },
+ { resched_out_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, running_srs, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ },
+ .initial_state = running_srs,
+ .final_states = { 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/snroc/snroc_trace.h b/kernel/trace/rv/monitors/srs/srs_trace.h
similarity index 67%
rename from kernel/trace/rv/monitors/snroc/snroc_trace.h
rename to kernel/trace/rv/monitors/srs/srs_trace.h
index 50114cef51229..fd033de926542 100644
--- a/kernel/trace/rv/monitors/snroc/snroc_trace.h
+++ b/kernel/trace/rv/monitors/srs/srs_trace.h
@@ -4,12 +4,12 @@
* Snippet to be included in rv_trace.h
*/
-#ifdef CONFIG_RV_MON_SNROC
-DEFINE_EVENT(event_da_monitor_id, event_snroc,
+#ifdef CONFIG_RV_MON_SRS
+DEFINE_EVENT(event_da_monitor_id, event_srs,
TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
TP_ARGS(id, state, event, next_state, final_state));
-DEFINE_EVENT(error_da_monitor_id, error_snroc,
+DEFINE_EVENT(error_da_monitor_id, error_srs,
TP_PROTO(int id, char *state, char *event),
TP_ARGS(id, state, event));
-#endif /* CONFIG_RV_MON_SNROC */
+#endif /* CONFIG_RV_MON_SRS */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index a5e1c52e29926..f6ead55484751 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -123,7 +123,7 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
);
#include <monitors/wwnr/wwnr_trace.h>
-#include <monitors/snroc/snroc_trace.h>
+#include <monitors/srs/srs_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
#endif /* CONFIG_DA_MON_EVENTS_ID */
diff --git a/tools/verification/models/sched/snroc.dot b/tools/verification/models/sched/snroc.dot
deleted file mode 100644
index 8b71c32d4dca4..0000000000000
--- a/tools/verification/models/sched/snroc.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
- center = true;
- size = "7,11";
- {node [shape = plaintext, style=invis, label=""] "__init_other_context"};
- {node [shape = ellipse] "other_context"};
- {node [shape = plaintext] "other_context"};
- {node [shape = plaintext] "own_context"};
- "__init_other_context" -> "other_context";
- "other_context" [label = "other_context", color = green3];
- "other_context" -> "own_context" [ label = "sched_switch_in" ];
- "own_context" [label = "own_context"];
- "own_context" -> "other_context" [ label = "sched_switch_out" ];
- "own_context" -> "own_context" [ label = "sched_set_state" ];
- { rank = min ;
- "__init_other_context";
- "other_context";
- }
-}
diff --git a/tools/verification/models/sched/srs.dot b/tools/verification/models/sched/srs.dot
new file mode 100644
index 0000000000000..8926e1ab59d98
--- /dev/null
+++ b/tools/verification/models/sched/srs.dot
@@ -0,0 +1,61 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = plaintext, style=invis, label=""] "__init_running"};
+ {node [shape = doublecircle] "running"};
+ {node [shape = circle] "preempted"};
+ {node [shape = circle] "preempted_sleepable"};
+ {node [shape = circle] "resched_out"};
+ {node [shape = circle] "resched_out_sleepable"};
+ {node [shape = circle] "resched_sleepable"};
+ {node [shape = circle] "rescheduling"};
+ {node [shape = circle] "running"};
+ {node [shape = circle] "sleepable"};
+ {node [shape = circle] "sleeping"};
+ {node [shape = circle] "waking"};
+ "__init_running" -> "running";
+ "preempted" [label = "preempted"];
+ "preempted" -> "resched_out" [ label = "sched_need_resched" ];
+ "preempted" -> "running" [ label = "sched_switch_in" ];
+ "preempted_sleepable" [label = "preempted_sleepable"];
+ "preempted_sleepable" -> "resched_out_sleepable" [ label = "sched_need_resched" ];
+ "preempted_sleepable" -> "sleepable" [ label = "sched_switch_in" ];
+ "preempted_sleepable" -> "waking" [ label = "sched_wakeup" ];
+ "resched_out" [label = "resched_out"];
+ "resched_out" -> "rescheduling" [ label = "sched_switch_in" ];
+ "resched_out_sleepable" [label = "resched_out_sleepable"];
+ "resched_out_sleepable" -> "resched_sleepable" [ label = "sched_switch_in" ];
+ "resched_sleepable" [label = "resched_sleepable"];
+ "resched_sleepable" -> "preempted_sleepable" [ label = "sched_switch_preempt" ];
+ "resched_sleepable" -> "resched_sleepable" [ label = "sched_need_resched\nsched_set_state_sleepable" ];
+ "resched_sleepable" -> "rescheduling" [ label = "sched_wakeup\nsched_set_state_runnable" ];
+ "resched_sleepable" -> "sleepable" [ label = "sched_switch_vain" ];
+ "resched_sleepable" -> "sleeping" [ label = "sched_switch_suspend\nsched_switch_blocking" ];
+ "rescheduling" [label = "rescheduling"];
+ "rescheduling" -> "preempted" [ label = "sched_switch_preempt\nsched_switch_yield" ];
+ "rescheduling" -> "resched_sleepable" [ label = "sched_set_state_sleepable" ];
+ "rescheduling" -> "rescheduling" [ label = "sched_need_resched\nsched_wakeup\nsched_set_state_runnable" ];
+ "rescheduling" -> "running" [ label = "sched_switch_vain" ];
+ "rescheduling" -> "sleeping" [ label = "sched_switch_blocking" ];
+ "running" [label = "running", color = green3];
+ "running" -> "preempted" [ label = "sched_switch_yield" ];
+ "running" -> "rescheduling" [ label = "sched_need_resched\nsched_need_resched_lazy" ];
+ "running" -> "running" [ label = "sched_set_state_runnable\nsched_wakeup\nsched_switch_vain" ];
+ "running" -> "sleepable" [ label = "sched_set_state_sleepable" ];
+ "running" -> "sleeping" [ label = "sched_switch_blocking" ];
+ "sleepable" [label = "sleepable"];
+ "sleepable" -> "preempted_sleepable" [ label = "sched_switch_preempt" ];
+ "sleepable" -> "resched_sleepable" [ label = "sched_need_resched\nsched_need_resched_lazy" ];
+ "sleepable" -> "running" [ label = "sched_set_state_runnable\nsched_wakeup" ];
+ "sleepable" -> "sleepable" [ label = "sched_set_state_sleepable\nsched_switch_vain" ];
+ "sleepable" -> "sleeping" [ label = "sched_switch_suspend\nsched_switch_blocking" ];
+ "sleeping" [label = "sleeping"];
+ "sleeping" -> "waking" [ label = "sched_wakeup" ];
+ "waking" [label = "waking"];
+ "waking" -> "resched_out" [ label = "sched_need_resched" ];
+ "waking" -> "running" [ label = "sched_switch_in" ];
+ { rank = min ;
+ "__init_running";
+ "running";
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 9/9] rv: Add srs per-task monitor
2025-04-04 8:45 ` [RFC PATCH 9/9] rv: Add srs per-task monitor Gabriele Monaco
@ 2025-04-10 8:53 ` Juri Lelli
2025-04-11 6:12 ` Gabriele Monaco
0 siblings, 1 reply; 12+ messages in thread
From: Juri Lelli @ 2025-04-10 8:53 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli
Hi Gabriele,
On 04/04/25 10:45, Gabriele Monaco wrote:
> Add a per-task monitor for task switches as part of the sched model:
>
> * srs:
> Monitor to describe conditions for different types of task switch.
> This monitor enforces rules such as preempt after setting need
> for reschedule and suspend after setting a task to sleepable.
>
> This new monitor implies the previously introduced snroc (set non
> runnable on its own context), so replace that monitor with srs.
>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
...
> +Monitor srs
> +-----------
> +
> +The switch after resched or sleep (srs) monitor describes conditions for
> +different types of task switch. This is a complex model, below we are going to
Quite the ASCII art indeed. :-)
> +explain it step by step. Unfortunately splitting this into smaller monitor is
> +not trivial due to some shared events such as ``switch_in``::
Not splitting, but maybe providing several separate diagrams for the key
cases and transitions might help grasping the complete picture? Not
sure, just a thought.
In the below, set_{sleepable,runnable} corresponds to set_state_
{sleepable,runnable} in the code, does it? Not a big deal, but I was
confused at first.
Thanks,
Juri
> +
> + set_runnable
> + | wakeup +---+
> + | switch_vain | |
> + v | v wakeup
> + #================================================# set_runnable
> + switch_in H H <----------+
> + +------------------> H running H |
> + | H H -----+ |
> + | #================================================# | |
> + | | | | ^ ^ | |
> + | | switch_yield need_resched | | | |
> + | | | need_resched_lazy | | | |
> + | set_sleepable v | | | | |
> + | | +-------------+ | | | | |
> + | +--------+----> | preempted | ---+- switch_in | | |
> + | | | +-------------+ | | | |
> + | switch_preempt | | | | | |
> + | switch_yield | need_resched | +- switch_vain | |
> + | | | v | | | |
> + | | | +-------------+ | | | |
> + | need_resched -+--------+----> | resched_out | | | | |
> + | | | | +-------------+ | | | |
> + | | | | | | | need_resched | |
> + | | | | switch_in | | wakeup | |
> + | | | | v v | set_runnable | |
> + | | | | +--------------------------+ -------+ | |
> + | | | | | | | | |
> + | | +--------+----- | rescheduling | <------+ | |
> + | | | | | | |
> + | | | +--------------------------+ -----------+ | |
> + | | | | ^ wakeup | | |
> + | | | set_sleepable set_runnable | | |
> + | | | v | | | |
> + | | +------------------+----- +---------------------------+ | | |
> + | | | | | | | | |
> + +--+--+---+------------------+----> | resched_sleepable | ---+ | | |
> + | | | | | | | | | | |
> + | | | | +-------------+----> +---------------------------+ | | | |
> + | | | | | | | ^ | | | | |
> + | | | | | | switch_preempt | need_resched | | | |
> + | | | | | | | | set_sleepable | | | |
> + | | | | | | v +------+ | | | |
> + | | | | | | +---------------------------+ --+------+---+-----+--+
> + | | | | | | | preempted_sleepable | | | | | |
> + | | | | | | +---------------------------+ --+------+---+--+ | |
> + | | | | | | | ^ | | | | | |
> + | | | | | | switch_in switch_preempt | | | | | |
> + | | | | | | v | switch_vain | | | | |
> + | | | | | | +-------------------------+ | | | | | |
> + | | | | | +------> | | <--+ | | | | |
> + | | | | | | sleepable | | | | | |
> + | | | | +- need_resched------- | | ----------+---+--+--+ |
> + | | | | need_resched_lazy +-------------------------+ | | | |
> + | | | | | ^ | switch_block | | |
> + | | | | | | set_sleepable | | | |
> + | | | | switch_block | switch_vain +----------+ | | |
> + | | | | switch_suspend +------+ | | | |
> + | | | | v v | | |
> + | | | | switch_block +-----------------------------+ switch_block | |
> + | | | +-switch_suspend--------> | sleeping | <-----------+ | |
> + | | | +-----------------------------+ | |
> + | | | | wakeup | |
> + | | | v | |
> + | | +- need_resched ------------- +-------------+ wakeup | |
> + | | | waking | <------------------------------+ |
> + | +------------------------------- +-------------+ |
> + | |
> + | +-----------------------+ |
> + +----- switch_in -------- | resched_out_sleepable | <-- sched_need_resched --------------+
> + +-----------------------+
> +
> +Types of switches:
> +
> +* ``switch_in``:
> + a non running task is scheduled in, this leads to ``running`` if the task is
> + runnable and ``sleepable`` if the task was preempted before sleeping.
> +* ``switch_suspend``:
> + a task puts itself to sleep, this can happen only after explicitly setting
> + the task to ``sleepable``. After a task is suspended, it needs to be woken up
> + (``waking`` state) before being switched in again. The task can be set to
> + ``resched_sleepable`` via a ``need_resched`` but not preempted, in which case it
> + is equivalent to ``sleepable``.
> + Setting the task's state to ``sleepable`` can be reverted before switching if it
> + is woken up or set to runnable.
> +* ``switch_blocked``:
> + a special case of a ``switch_suspend`` where the task is waiting on a
> + sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set
> + state events racing with each other and this leads the model to perceive this
> + type of switch when the task is not set to sleepable. This is a limitation of
> + the model in SMP system and workarounds may require to slow down the
> + scheduler.
> +* ``switch_yield``:
> + a task explicitly calls the scheduler, this looks like a preemption as the
> + task is still runnable but the ``need_resched`` flag is not set. It can
> + happen after a ``yield`` system call or from the idle task.
> +* ``switch_preempt``:
> + a task is ``preempted``, this can happen after the need for ``rescheduling``
> + has been set, also in its ``lazy`` flavour. ``need_resched`` can be set as a
> + flag to the task or in the per-core preemption count, either of them can
> + trigger a preemption.
> + The task was previously running and can be switched in directly, but it is
> + possible that a task is preempted after it sets itself as ``sleepable``
> + (``preempted_sleepable``), in this condition, once the task is switched back
> + in, it will not be ``running`` but continue its sleeping process in
> + ``sleepable``.
> +* ``switch_vain``:
> + a task goes through the scheduler but it is picked as the next task to run,
> + hence no real task switch occurs. Since we run the scheduler, this clears the
> + need to reschedule.
> +
...
> +enum events_srs {
> + sched_need_resched_srs = 0,
> + sched_need_resched_lazy_srs,
> + sched_set_state_runnable_srs,
> + sched_set_state_sleepable_srs,
> + sched_switch_blocking_srs,
> + sched_switch_in_srs,
> + sched_switch_preempt_srs,
> + sched_switch_suspend_srs,
> + sched_switch_vain_srs,
> + sched_switch_yield_srs,
> + sched_wakeup_srs,
> + event_max_srs
> +};
> +
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions
2025-04-04 8:45 ` [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
@ 2025-04-11 4:52 ` Nam Cao
2025-04-11 6:09 ` Gabriele Monaco
0 siblings, 1 reply; 12+ messages in thread
From: Nam Cao @ 2025-04-11 4:52 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
Peter Zijlstra, Tomas Glozar, Juri Lelli
On Fri, Apr 04, 2025 at 10:45:20AM +0200, Gabriele Monaco wrote:
> DA monitor can be accessed from multiple cores simultaneously, this is
> likely, for instance when dealing with per-task monitors reacting on
> events that do not always occur on the CPU where the task is running.
> This can cause race conditions where two events change the next state
> and we see inconsistent values. E.g.:
>
> [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
> [63] event_srs: 27: sleepable x sched_set_state_sleepable -> sleepable
> [63] error_srs: 27: event sched_switch_suspend not expected in the state running
>
> In this case the monitor fails because the event on CPU 62 wins against
> the one on CPU 63, although the correct state should have been
> sleepable, since the task get suspended.
>
> Detect if the current state was modified by using try_cmpxchg while
> storing the next value. If it was, try again reading the current state.
> After a maximum number of failed retries, react as if it was an error
> with invalid current state (we cannot determine it).
>
> Monitors where this type of condition can occur must be able to account
> for racing events in any possible order, as we cannot know the winner.
Is locking not simpler? I understand raw_spin_lock() doesn't work because
it steps on some tracepoints. But how about adding something like
raw_spin_lock_notrace()?
static inline bool raw_spin_lock_notrace(raw_spinlock_t *lock)
{
preempt_disable_notrace(); //probably not required, tracepoint handlers do this already
if (!do_raw_spin_trylock(lock))
do_raw_spin_lock(lock);
}
My LTL series theoretically also has this problem, but I have never got it
during testing yet. We should use the same solution for both DA and LTL.
Also, can you please Cc me in your RV patches?
Best regards,
Nam
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions
2025-04-11 4:52 ` Nam Cao
@ 2025-04-11 6:09 ` Gabriele Monaco
0 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-11 6:09 UTC (permalink / raw)
To: Nam Cao
Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, Ingo Molnar,
Peter Zijlstra, Tomas Glozar, Juri Lelli
On Fri, 2025-04-11 at 06:52 +0200, Nam Cao wrote:
> On Fri, Apr 04, 2025 at 10:45:20AM +0200, Gabriele Monaco wrote:
> > DA monitor can be accessed from multiple cores simultaneously, this
> > is
> > likely, for instance when dealing with per-task monitors reacting
> > on
> > events that do not always occur on the CPU where the task is
> > running.
> > This can cause race conditions where two events change the next
> > state
> > and we see inconsistent values. E.g.:
> >
> > [62] event_srs: 27: sleepable x sched_wakeup -> running (final)
> > [63] event_srs: 27: sleepable x sched_set_state_sleepable ->
> > sleepable
> > [63] error_srs: 27: event sched_switch_suspend not expected in
> > the state running
> >
> > In this case the monitor fails because the event on CPU 62 wins
> > against
> > the one on CPU 63, although the correct state should have been
> > sleepable, since the task get suspended.
> >
> > Detect if the current state was modified by using try_cmpxchg while
> > storing the next value. If it was, try again reading the current
> > state.
> > After a maximum number of failed retries, react as if it was an
> > error
> > with invalid current state (we cannot determine it).
> >
> > Monitors where this type of condition can occur must be able to
> > account
> > for racing events in any possible order, as we cannot know the
> > winner.
>
> Is locking not simpler? I understand raw_spin_lock() doesn't work
> because
> it steps on some tracepoints. But how about adding something like
> raw_spin_lock_notrace()?
It is probably simpler, but I think it would require also to disable
interrupts (some events occur in interrupt context), I'm not sure the
introduced overhead is going to be worth it in the fast path, but
that's kinda what I wanted to learn in this RFC ;)
>
> static inline bool raw_spin_lock_notrace(raw_spinlock_t *lock)
> {
> preempt_disable_notrace(); //probably not required,
> tracepoint handlers do this already
>
> if (!do_raw_spin_trylock(lock))
> do_raw_spin_lock(lock);
> }
>
> My LTL series theoretically also has this problem, but I have never
> got it
> during testing yet. We should use the same solution for both DA and
> LTL.
Yes totally, on the long run we might get some common utilities for
this kind of things that aren't too monitor specific. But for now I
wouldn't worry too much.
>
> Also, can you please Cc me in your RV patches?
>
Right.. will do!
Thanks for your feedback,
Gabriele
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH 9/9] rv: Add srs per-task monitor
2025-04-10 8:53 ` Juri Lelli
@ 2025-04-11 6:12 ` Gabriele Monaco
0 siblings, 0 replies; 12+ messages in thread
From: Gabriele Monaco @ 2025-04-11 6:12 UTC (permalink / raw)
To: Juri Lelli
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli
On Thu, 2025-04-10 at 10:53 +0200, Juri Lelli wrote:
> Hi Gabriele,
>
> On 04/04/25 10:45, Gabriele Monaco wrote:
> > Add a per-task monitor for task switches as part of the sched
> > model:
> >
> > * srs:
> > Monitor to describe conditions for different types of task
> > switch.
> > This monitor enforces rules such as preempt after setting need
> > for reschedule and suspend after setting a task to sleepable.
> >
> > This new monitor implies the previously introduced snroc (set non
> > runnable on its own context), so replace that monitor with srs.
> >
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
>
> ...
>
> > +Monitor srs
> > +-----------
> > +
> > +The switch after resched or sleep (srs) monitor describes
> > conditions for
> > +different types of task switch. This is a complex model, below we
> > are going to
>
> Quite the ASCII art indeed. :-)
>
> > +explain it step by step. Unfortunately splitting this into smaller
> > monitor is
> > +not trivial due to some shared events such as ``switch_in``::
>
> Not splitting, but maybe providing several separate diagrams for the
> key
> cases and transitions might help grasping the complete picture? Not
> sure, just a thought.
I tried to do that but then dropped the idea because it was hard to
maintain, not that this one is much easier though..
I might re-consider it!
>
> In the below, set_{sleepable,runnable} corresponds to set_state_
> {sleepable,runnable} in the code, does it? Not a big deal, but I was
> confused at first.
Yeah, I'm trying to keep the names short for the model not to explode
horizontally, but it seems I'm losing on clarity so probably better
finding other solutions.
Thanks,
Gabriele
>
> Thanks,
> Juri
>
> > +
> > + set_runnable
> > + | wakeup
> > +---+
> > + | switch_vain
> > | |
> > + v
> > | v wakeup
> > +
> > #================================================# set_runnable
> > + switch_in
> > H H <----------+
> > + +------------------> H
> > running H |
> > + |
> > H H -----+ |
> > + |
> > #================================================# | |
> > + | | | |
> > ^ ^ | |
> > + | | switch_yield need_resched
> > | | | |
> > + | | | need_resched_lazy
> > | | | |
> > + | set_sleepable v |
> > | | | |
> > + | | +-------------+ |
> > | | | |
> > + | +--------+----> | preempted | ---+-
> > switch_in | | |
> > + | | | +-------------+
> > | | | |
> > + | switch_preempt | |
> > | | | |
> > + | switch_yield | need_resched | +-
> > switch_vain | |
> > + | | | v |
> > | | |
> > + | | | +-------------+ |
> > | | |
> > + | need_resched -+--------+----> | resched_out | |
> > | | |
> > + | | | | +-------------+ |
> > | | |
> > + | | | | | | |
> > need_resched | |
> > + | | | | switch_in | |
> > wakeup | |
> > + | | | | v v |
> > set_runnable | |
> > + | | | | +--------------------------+
> > -------+ | |
> > + | | | | |
> > | | | |
> > + | | +--------+----- | rescheduling |
> > <------+ | |
> > + | | | |
> > | | |
> > + | | | +--------------------------+
> > -----------+ | |
> > + | | | | ^
> > wakeup | | |
> > + | | | set_sleepable
> > set_runnable | | |
> > + | | | v
> > | | | |
> > + | | +------------------+----- +---------------------------
> > + | | |
> > + | | | | |
> > | | | |
> > + +--+--+---+------------------+----> | resched_sleepable |
> > ---+ | | |
> > + | | | | | |
> > | | | | |
> > + | | | | +-------------+----> +---------------------------
> > + | | | |
> > + | | | | | | | ^
> > | | | | |
> > + | | | | | | switch_preempt |
> > need_resched | | | |
> > + | | | | | | | |
> > set_sleepable | | | |
> > + | | | | | | v +------
> > + | | | |
> > + | | | | | | +---------------------------
> > + --+------+---+-----+--+
> > + | | | | | | | preempted_sleepable
> > | | | | | |
> > + | | | | | | +---------------------------
> > + --+------+---+--+ | |
> > + | | | | | | |
> > ^ | | | | | |
> > + | | | | | | switch_in
> > switch_preempt | | | | | |
> > + | | | | | | v |
> > switch_vain | | | | |
> > + | | | | | | +-------------------------
> > + | | | | | |
> > + | | | | | +------> | |
> > <--+ | | | | |
> > + | | | | | | sleepable
> > | | | | | |
> > + | | | | +- need_resched------- | |
> > ----------+---+--+--+ |
> > + | | | | need_resched_lazy +-------------------------
> > + | | | |
> > + | | | | | ^ |
> > switch_block | | |
> > + | | | | | |
> > set_sleepable | | | |
> > + | | | | switch_block | switch_vain
> > +----------+ | | |
> > + | | | | switch_suspend +------+
> > | | | |
> > + | | | | v
> > v | | |
> > + | | | | switch_block +----------------------------
> > -+ switch_block | |
> > + | | | +-switch_suspend--------> | sleeping
> > | <-----------+ | |
> > + | | | +----------------------------
> > -+ | |
> > + | | | |
> > wakeup | |
> > + | | |
> > v | |
> > + | | +- need_resched ------------- +-------------+
> > wakeup | |
> > + | | | waking | <------------
> > ------------------+ |
> > + | +------------------------------- +-------------
> > + |
> > +
> > |
> > |
> > + | +-----------------------
> > + |
> > + +----- switch_in -------- | resched_out_sleepable | <--
> > sched_need_resched --------------+
> > + +-----------------------+
> > +
> > +Types of switches:
> > +
> > +* ``switch_in``:
> > + a non running task is scheduled in, this leads to ``running`` if
> > the task is
> > + runnable and ``sleepable`` if the task was preempted before
> > sleeping.
> > +* ``switch_suspend``:
> > + a task puts itself to sleep, this can happen only after
> > explicitly setting
> > + the task to ``sleepable``. After a task is suspended, it needs
> > to be woken up
> > + (``waking`` state) before being switched in again. The task can
> > be set to
> > + ``resched_sleepable`` via a ``need_resched`` but not preempted,
> > in which case it
> > + is equivalent to ``sleepable``.
> > + Setting the task's state to ``sleepable`` can be reverted before
> > switching if it
> > + is woken up or set to runnable.
> > +* ``switch_blocked``:
> > + a special case of a ``switch_suspend`` where the task is waiting
> > on a
> > + sleeping RT lock (``PREEMPT_RT`` only), it is common to see
> > wakeup and set
> > + state events racing with each other and this leads the model to
> > perceive this
> > + type of switch when the task is not set to sleepable. This is a
> > limitation of
> > + the model in SMP system and workarounds may require to slow down
> > the
> > + scheduler.
> > +* ``switch_yield``:
> > + a task explicitly calls the scheduler, this looks like a
> > preemption as the
> > + task is still runnable but the ``need_resched`` flag is not set.
> > It can
> > + happen after a ``yield`` system call or from the idle task.
> > +* ``switch_preempt``:
> > + a task is ``preempted``, this can happen after the need for
> > ``rescheduling``
> > + has been set, also in its ``lazy`` flavour. ``need_resched`` can
> > be set as a
> > + flag to the task or in the per-core preemption count, either of
> > them can
> > + trigger a preemption.
> > + The task was previously running and can be switched in directly,
> > but it is
> > + possible that a task is preempted after it sets itself as
> > ``sleepable``
> > + (``preempted_sleepable``), in this condition, once the task is
> > switched back
> > + in, it will not be ``running`` but continue its sleeping process
> > in
> > + ``sleepable``.
> > +* ``switch_vain``:
> > + a task goes through the scheduler but it is picked as the next
> > task to run,
> > + hence no real task switch occurs. Since we run the scheduler,
> > this clears the
> > + need to reschedule.
> > +
>
> ...
>
> > +enum events_srs {
> > + sched_need_resched_srs = 0,
> > + sched_need_resched_lazy_srs,
> > + sched_set_state_runnable_srs,
> > + sched_set_state_sleepable_srs,
> > + sched_switch_blocking_srs,
> > + sched_switch_in_srs,
> > + sched_switch_preempt_srs,
> > + sched_switch_suspend_srs,
> > + sched_switch_vain_srs,
> > + sched_switch_yield_srs,
> > + sched_wakeup_srs,
> > + event_max_srs
> > +};
> > +
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-04-11 6:12 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250404084512.98552-11-gmonaco@redhat.com>
2025-04-04 8:45 ` [RFC PATCH 1/9] tools/rv: Do not skip idle in trace Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 2/9] tools/rv: Stop gracefully also on SIGTERM Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 3/9] rv: Add da_handle_start_run_event_ to per-task monitors Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 4/9] rv: Remove trailing whitespace from tracepoint string Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 5/9] sched: Add sched tracepoints for RV task model Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 7/9] rv: Retry when da monitor detects race conditions Gabriele Monaco
2025-04-11 4:52 ` Nam Cao
2025-04-11 6:09 ` Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 8/9] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-04-04 8:45 ` [RFC PATCH 9/9] rv: Add srs per-task monitor Gabriele Monaco
2025-04-10 8:53 ` Juri Lelli
2025-04-11 6:12 ` Gabriele Monaco
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).