* [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
@ 2025-05-14 8:43 ` Gabriele Monaco
2025-05-19 8:42 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 08/12] rv: Extend and adapt snroc model Gabriele Monaco
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-14 8:43 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
Tomas Glozar, Juri Lelli
The sched_set_state tracepoint changed prototype adding a new argument,
this argument can differentiate between an explicit set_state called by
a task and a set state to runnable by the scheduler due to a pending
signal.
Adapt the handlers prototypes for the sco monitor.
Expand the model to handle the new set_state flavour, the monitor was
making sure set state happens only outside of the scheduler, if the
event occurs with the new argument (from_signal) set to true, we instead
expect it to be inside the scheduler.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 35 +++++++++++++-----------
kernel/trace/rv/monitors/sco/sco.c | 8 ++++--
kernel/trace/rv/monitors/sco/sco.h | 6 ++--
tools/verification/models/sched/sco.dot | 1 +
4 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 24b2c62a3bc2..6f76bba94d9f 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -64,22 +64,25 @@ Monitor sco
~~~~~~~~~~~
The scheduling context operations (sco) monitor ensures changes in a task state
-happen only in thread context::
-
-
- |
- |
- v
- sched_set_state +------------------+
- +------------------ | |
- | | thread_context |
- +-----------------> | | <+
- +------------------+ |
- | |
- | schedule_entry | schedule_exit
- v |
- |
- scheduling_context -+
+happen only in thread context, the only exception is a special kind of set
+state that occurs if a task about to sleep has a pending signal. This set state
+is not called by the thread but by the scheduler itself::
+
+ |
+ |
+ v
+ sched_set_state +------------------+
+ +---------------------------------- | |
+ | | thread_context |
+ +---------------------------------> | | <+
+ +------------------+ |
+ | |
+ | schedule_entry | schedule_exit
+ v |
+ sched_set_state_runnable_signal |
+ +---------------------------------- |
+ | scheduling_context |
+ +---------------------------------> -+
Monitor snroc
~~~~~~~~~~~~~
diff --git a/kernel/trace/rv/monitors/sco/sco.c b/kernel/trace/rv/monitors/sco/sco.c
index 66f4639d46ac..6457ff2469d0 100644
--- a/kernel/trace/rv/monitors/sco/sco.c
+++ b/kernel/trace/rv/monitors/sco/sco.c
@@ -19,9 +19,13 @@
static struct rv_monitor rv_sco;
DECLARE_DA_MON_PER_CPU(sco, unsigned char);
-static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+ int state, bool from_signal)
{
- da_handle_start_event_sco(sched_set_state_sco);
+ if (from_signal)
+ da_handle_event_sco(sched_set_state_runnable_signal_sco);
+ else
+ da_handle_start_event_sco(sched_set_state_sco);
}
static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
diff --git a/kernel/trace/rv/monitors/sco/sco.h b/kernel/trace/rv/monitors/sco/sco.h
index 7a4c1f2d5ca1..302750687f9c 100644
--- a/kernel/trace/rv/monitors/sco/sco.h
+++ b/kernel/trace/rv/monitors/sco/sco.h
@@ -15,6 +15,7 @@ enum states_sco {
enum events_sco {
sched_set_state_sco = 0,
+ sched_set_state_runnable_signal_sco,
schedule_entry_sco,
schedule_exit_sco,
event_max_sco
@@ -35,12 +36,13 @@ static const struct automaton_sco automaton_sco = {
},
.event_names = {
"sched_set_state",
+ "sched_set_state_runnable_signal",
"schedule_entry",
"schedule_exit"
},
.function = {
- { thread_context_sco, scheduling_context_sco, INVALID_STATE },
- { INVALID_STATE, INVALID_STATE, thread_context_sco },
+ { thread_context_sco, INVALID_STATE, scheduling_context_sco, INVALID_STATE },
+ { INVALID_STATE, scheduling_context_sco, INVALID_STATE, thread_context_sco },
},
.initial_state = thread_context_sco,
.final_states = { 1, 0 },
diff --git a/tools/verification/models/sched/sco.dot b/tools/verification/models/sched/sco.dot
index 20b0e3b449a6..4e44ed58c62a 100644
--- a/tools/verification/models/sched/sco.dot
+++ b/tools/verification/models/sched/sco.dot
@@ -7,6 +7,7 @@ digraph state_automaton {
{node [shape = plaintext] "thread_context"};
"__init_thread_context" -> "thread_context";
"scheduling_context" [label = "scheduling_context"];
+ "scheduling_context" -> "scheduling_context" [ label = "sched_set_state_runnable_signal" ];
"scheduling_context" -> "thread_context" [ label = "schedule_exit" ];
"thread_context" [label = "thread_context", color = green3];
"thread_context" -> "scheduling_context" [ label = "schedule_entry" ];
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v2 08/12] rv: Extend and adapt snroc model
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
@ 2025-05-14 8:43 ` Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
` (2 subsequent siblings)
4 siblings, 0 replies; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-14 8:43 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
Tomas Glozar, Juri Lelli
The snroc monitor ensures changes in a task state happens only in the
respective task's context. This is the only monitor enforcing a task
is to be switched in after being switched out, and vice-versa.
Although this is a trivial claim, it is useful to complete other claims
on when scheduling is possible while keeping models simple.
Add the sched_switch_vain event to the model, enforcing that a vain
switch doesn't change the context but can only occur while a task is
running (e.g. a call to schedule that re-selects the current task).
Also adapt the set_state handler prototypes to the tracepoint change.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 32 ++++++++++++-----------
kernel/trace/rv/monitors/snroc/snroc.c | 12 ++++++++-
kernel/trace/rv/monitors/snroc/snroc.h | 8 +++---
tools/verification/models/sched/snroc.dot | 2 +-
4 files changed, 34 insertions(+), 20 deletions(-)
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 6f76bba94d9f..5cb58ac441d8 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -89,21 +89,23 @@ Monitor snroc
The set non runnable on its own context (snroc) monitor ensures changes in a
task state happens only in the respective task's context. This is a per-task
-monitor::
-
- |
- |
- v
- +------------------+
- | other_context | <+
- +------------------+ |
- | |
- | sched_switch_in | sched_switch_out
- v |
- sched_set_state |
- +------------------ |
- | own_context |
- +-----------------> -+
+monitor. A task is in its own context after switching in and leaves the context
+when switched out, a vain switch maintains the context::
+
+ |
+ |
+ v
+ +------------------+
+ | other_context | <+
+ +------------------+ |
+ | |
+ | sched_switch_in | sched_switch_out
+ v |
+ sched_set_state |
+ sched_switch_vain |
+ +-------------------- own_context |
+ | |
+ +-------------------> -+
Monitor scpd
~~~~~~~~~~~~
diff --git a/kernel/trace/rv/monitors/snroc/snroc.c b/kernel/trace/rv/monitors/snroc/snroc.c
index 540e686e699f..11a56b58665e 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.c
+++ b/kernel/trace/rv/monitors/snroc/snroc.c
@@ -19,7 +19,8 @@
static struct rv_monitor rv_snroc;
DECLARE_DA_MON_PER_TASK(snroc, unsigned char);
-static void handle_sched_set_state(void *data, struct task_struct *tsk, int state)
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+ int state, bool from_signal)
{
da_handle_event_snroc(tsk, sched_set_state_snroc);
}
@@ -33,6 +34,13 @@ static void handle_sched_switch(void *data, bool preempt,
da_handle_event_snroc(next, sched_switch_in_snroc);
}
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ da_handle_event_snroc(tsk, sched_switch_vain_snroc);
+}
+
static int enable_snroc(void)
{
int retval;
@@ -43,6 +51,7 @@ static int enable_snroc(void)
rv_attach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
rv_attach_trace_probe("snroc", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("snroc", sched_switch_vain_tp, handle_sched_switch_vain);
return 0;
}
@@ -53,6 +62,7 @@ static void disable_snroc(void)
rv_detach_trace_probe("snroc", sched_set_state_tp, handle_sched_set_state);
rv_detach_trace_probe("snroc", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("snroc", sched_switch_vain_tp, handle_sched_switch_vain);
da_monitor_destroy_snroc();
}
diff --git a/kernel/trace/rv/monitors/snroc/snroc.h b/kernel/trace/rv/monitors/snroc/snroc.h
index c3650a2b1b10..d6de40e15ae1 100644
--- a/kernel/trace/rv/monitors/snroc/snroc.h
+++ b/kernel/trace/rv/monitors/snroc/snroc.h
@@ -17,6 +17,7 @@ enum events_snroc {
sched_set_state_snroc = 0,
sched_switch_in_snroc,
sched_switch_out_snroc,
+ sched_switch_vain_snroc,
event_max_snroc
};
@@ -36,11 +37,12 @@ static const struct automaton_snroc automaton_snroc = {
.event_names = {
"sched_set_state",
"sched_switch_in",
- "sched_switch_out"
+ "sched_switch_out",
+ "sched_switch_vain"
},
.function = {
- { INVALID_STATE, own_context_snroc, INVALID_STATE },
- { own_context_snroc, INVALID_STATE, other_context_snroc },
+ { INVALID_STATE, own_context_snroc, INVALID_STATE, INVALID_STATE },
+ { own_context_snroc, INVALID_STATE, other_context_snroc, own_context_snroc },
},
.initial_state = other_context_snroc,
.final_states = { 1, 0 },
diff --git a/tools/verification/models/sched/snroc.dot b/tools/verification/models/sched/snroc.dot
index 8b71c32d4dca..5b1a787d0350 100644
--- a/tools/verification/models/sched/snroc.dot
+++ b/tools/verification/models/sched/snroc.dot
@@ -10,7 +10,7 @@ digraph state_automaton {
"other_context" -> "own_context" [ label = "sched_switch_in" ];
"own_context" [label = "own_context"];
"own_context" -> "other_context" [ label = "sched_switch_out" ];
- "own_context" -> "own_context" [ label = "sched_set_state" ];
+ "own_context" -> "own_context" [ label = "sched_set_state\nsched_switch_vain" ];
{ rank = min ;
"__init_other_context";
"other_context";
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 08/12] rv: Extend and adapt snroc model Gabriele Monaco
@ 2025-05-14 8:43 ` Gabriele Monaco
2025-06-24 7:36 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
4 siblings, 1 reply; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-14 8:43 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
Tomas Glozar, Juri Lelli
The tss monitor currently guarantees context switches can happen only
while scheduling, but it doesn't enforce that each scheduling call
implies a task switch. This can be implied only if we rely on the newly
introduced sched_switch_vain tracepoint, which represents a
scheduler call where the previously running task is the same that is
picked to run next, in fact no context is switched.
Replace the monitor with a more comprehensive specification which
implies tss but also ensures that:
* each scheduler call switches context (or has a vain switch)
* each context switch happens with interrupts disabled
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 68 +++++++---
kernel/trace/rv/Kconfig | 2 +-
kernel/trace/rv/Makefile | 2 +-
kernel/trace/rv/monitors/sts/Kconfig | 18 +++
kernel/trace/rv/monitors/sts/sts.c | 117 ++++++++++++++++++
kernel/trace/rv/monitors/sts/sts.h | 62 ++++++++++
.../{tss/tss_trace.h => sts/sts_trace.h} | 8 +-
kernel/trace/rv/monitors/tss/Kconfig | 14 ---
kernel/trace/rv/monitors/tss/tss.c | 90 --------------
kernel/trace/rv/monitors/tss/tss.h | 47 -------
kernel/trace/rv/rv_trace.h | 2 +-
tools/verification/models/sched/sts.dot | 29 +++++
tools/verification/models/sched/tss.dot | 18 ---
13 files changed, 281 insertions(+), 196 deletions(-)
create mode 100644 kernel/trace/rv/monitors/sts/Kconfig
create mode 100644 kernel/trace/rv/monitors/sts/sts.c
create mode 100644 kernel/trace/rv/monitors/sts/sts.h
rename kernel/trace/rv/monitors/{tss/tss_trace.h => sts/sts_trace.h} (67%)
delete mode 100644 kernel/trace/rv/monitors/tss/Kconfig
delete mode 100644 kernel/trace/rv/monitors/tss/tss.c
delete mode 100644 kernel/trace/rv/monitors/tss/tss.h
create mode 100644 tools/verification/models/sched/sts.dot
delete mode 100644 tools/verification/models/sched/tss.dot
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 5cb58ac441d8..e4171aadef1c 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -40,26 +40,6 @@ defined in by Daniel Bristot in [1].
Currently we included the following:
-Monitor tss
-~~~~~~~~~~~
-
-The task switch while scheduling (tss) monitor ensures a task switch happens
-only in scheduling context, that is inside a call to `__schedule`::
-
- |
- |
- v
- +-----------------+
- | thread | <+
- +-----------------+ |
- | |
- | schedule_entry | schedule_exit
- v |
- sched_switch |
- +--------------- |
- | sched |
- +--------------> -+
-
Monitor sco
~~~~~~~~~~~
@@ -170,6 +150,54 @@ schedule is not called with interrupt disabled::
|
cant_sched -+
+Monitor sts
+~~~~~~~~~~~
+
+The schedule implies task switch (sts) monitor ensures a task switch happens in
+every scheduling context, that is inside a call to ``__schedule``, as well as no
+task switch can happen without scheduling and before interrupts are disabled.
+This require the special type of switch called vain, which occurs when the next
+task picked for execution is the same as the previously running one, in fact no
+real task switch occurs::
+
+ |
+ |
+ v
+ #====================# irq_disable
+ H H irq_enable
+ H thread H --------------+
+ H H |
+ +-------------> H H <-------------+
+ | #====================#
+ | |
+ | | schedule_entry
+ | v
+ | +--------------------+
+ | | scheduling | <+
+ | +--------------------+ |
+ | | |
+ | | irq_disable | irq_enable
+ | v |
+ | +--------------------+ |
+ | | disable_to_switch | -+
+ | schedule_exit +--------------------+
+ | |
+ | | sched_switch
+ | | sched_switch_vain
+ | v
+ | +--------------------+
+ | | switching |
+ | +--------------------+
+ | |
+ | | irq_enable
+ | v
+ | +--------------------+ irq_disable
+ | | | irq_enable
+ | | enable_to_exit | --------------+
+ | | | |
+ +-------------- | | <-------------+
+ +--------------------+
+
References
----------
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index b39f36013ef2..a53a3eca9616 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -28,12 +28,12 @@ menuconfig RV
source "kernel/trace/rv/monitors/wip/Kconfig"
source "kernel/trace/rv/monitors/wwnr/Kconfig"
source "kernel/trace/rv/monitors/sched/Kconfig"
-source "kernel/trace/rv/monitors/tss/Kconfig"
source "kernel/trace/rv/monitors/sco/Kconfig"
source "kernel/trace/rv/monitors/snroc/Kconfig"
source "kernel/trace/rv/monitors/scpd/Kconfig"
source "kernel/trace/rv/monitors/snep/Kconfig"
source "kernel/trace/rv/monitors/sncid/Kconfig"
+source "kernel/trace/rv/monitors/sts/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index f9b2cd0483c3..c609b72275cb 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -6,12 +6,12 @@ obj-$(CONFIG_RV) += rv.o
obj-$(CONFIG_RV_MON_WIP) += monitors/wip/wip.o
obj-$(CONFIG_RV_MON_WWNR) += monitors/wwnr/wwnr.o
obj-$(CONFIG_RV_MON_SCHED) += monitors/sched/sched.o
-obj-$(CONFIG_RV_MON_TSS) += monitors/tss/tss.o
obj-$(CONFIG_RV_MON_SCO) += monitors/sco/sco.o
obj-$(CONFIG_RV_MON_SNROC) += monitors/snroc/snroc.o
obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
obj-$(CONFIG_RV_MON_SNEP) += monitors/snep/snep.o
obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
+obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/sts/Kconfig b/kernel/trace/rv/monitors/sts/Kconfig
new file mode 100644
index 000000000000..5b486dac3f10
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/Kconfig
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_STS
+ depends on RV
+ depends on IRQSOFF_TRACER
+ depends on RV_MON_SCHED
+ default y
+ select DA_MON_EVENTS_IMPLICIT
+ bool "sts monitor"
+ help
+ Monitor to ensure relationships between scheduler and switches
+ * each call to the scheduler implies a switch
+ * switches only happen inside the scheduler
+ * switches happen with interrupt disabled
+ This monitor is part of the sched monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sts/sts.c b/kernel/trace/rv/monitors/sts/sts.c
new file mode 100644
index 000000000000..deb18a9d48f3
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sts"
+
+#include <trace/events/sched.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sts.h"
+
+static struct rv_monitor rv_sts;
+DECLARE_DA_MON_PER_CPU(sts, unsigned char);
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_sts(irq_disable_sts);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_sts(irq_enable_sts);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ da_handle_event_sts(sched_switch_sts);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ da_handle_event_sts(sched_switch_vain_sts);
+}
+
+static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+{
+ da_handle_event_sts(schedule_entry_sts);
+}
+
+static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
+{
+ da_handle_start_event_sts(schedule_exit_sts);
+}
+
+static int enable_sts(void)
+{
+ int retval;
+
+ retval = da_monitor_init_sts();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("sts", irq_disable, handle_irq_disable);
+ rv_attach_trace_probe("sts", irq_enable, handle_irq_enable);
+ rv_attach_trace_probe("sts", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_attach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+ rv_attach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+ return 0;
+}
+
+static void disable_sts(void)
+{
+ rv_sts.enabled = 0;
+
+ rv_detach_trace_probe("sts", irq_disable, handle_irq_disable);
+ rv_detach_trace_probe("sts", irq_enable, handle_irq_enable);
+ rv_detach_trace_probe("sts", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("sts", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_detach_trace_probe("sts", sched_entry_tp, handle_schedule_entry);
+ rv_detach_trace_probe("sts", sched_exit_tp, handle_schedule_exit);
+
+ da_monitor_destroy_sts();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_sts = {
+ .name = "sts",
+ .description = "schedule implies task switch.",
+ .enable = enable_sts,
+ .disable = disable_sts,
+ .reset = da_monitor_reset_all_sts,
+ .enabled = 0,
+};
+
+static int __init register_sts(void)
+{
+ return rv_register_monitor(&rv_sts, &rv_sched);
+}
+
+static void __exit unregister_sts(void)
+{
+ rv_unregister_monitor(&rv_sts);
+}
+
+module_init(register_sts);
+module_exit(unregister_sts);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sts: schedule implies task switch.");
diff --git a/kernel/trace/rv/monitors/sts/sts.h b/kernel/trace/rv/monitors/sts/sts.h
new file mode 100644
index 000000000000..6921c0042293
--- /dev/null
+++ b/kernel/trace/rv/monitors/sts/sts.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sts automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sts {
+ thread_sts = 0,
+ disable_to_switch_sts,
+ enable_to_exit_sts,
+ scheduling_sts,
+ switching_sts,
+ state_max_sts
+};
+
+#define INVALID_STATE state_max_sts
+
+enum events_sts {
+ irq_disable_sts = 0,
+ irq_enable_sts,
+ sched_switch_sts,
+ sched_switch_vain_sts,
+ schedule_entry_sts,
+ schedule_exit_sts,
+ event_max_sts
+};
+
+struct automaton_sts {
+ char *state_names[state_max_sts];
+ char *event_names[event_max_sts];
+ unsigned char function[state_max_sts][event_max_sts];
+ unsigned char initial_state;
+ bool final_states[state_max_sts];
+};
+
+static const struct automaton_sts automaton_sts = {
+ .state_names = {
+ "thread",
+ "disable_to_switch",
+ "enable_to_exit",
+ "scheduling",
+ "switching"
+ },
+ .event_names = {
+ "irq_disable",
+ "irq_enable",
+ "sched_switch",
+ "sched_switch_vain",
+ "schedule_entry",
+ "schedule_exit"
+ },
+ .function = {
+ { thread_sts, thread_sts, INVALID_STATE, INVALID_STATE, scheduling_sts, INVALID_STATE },
+ { INVALID_STATE, scheduling_sts, switching_sts, switching_sts, INVALID_STATE, INVALID_STATE },
+ { enable_to_exit_sts, enable_to_exit_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, thread_sts },
+ { disable_to_switch_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ { INVALID_STATE, enable_to_exit_sts, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE },
+ },
+ .initial_state = thread_sts,
+ .final_states = { 1, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h b/kernel/trace/rv/monitors/sts/sts_trace.h
similarity index 67%
rename from kernel/trace/rv/monitors/tss/tss_trace.h
rename to kernel/trace/rv/monitors/sts/sts_trace.h
index 4619dbb50cc0..d78beb58d5b3 100644
--- a/kernel/trace/rv/monitors/tss/tss_trace.h
+++ b/kernel/trace/rv/monitors/sts/sts_trace.h
@@ -4,12 +4,12 @@
* Snippet to be included in rv_trace.h
*/
-#ifdef CONFIG_RV_MON_TSS
-DEFINE_EVENT(event_da_monitor, event_tss,
+#ifdef CONFIG_RV_MON_STS
+DEFINE_EVENT(event_da_monitor, event_sts,
TP_PROTO(char *state, char *event, char *next_state, bool final_state),
TP_ARGS(state, event, next_state, final_state));
-DEFINE_EVENT(error_da_monitor, error_tss,
+DEFINE_EVENT(error_da_monitor, error_sts,
TP_PROTO(char *state, char *event),
TP_ARGS(state, event));
-#endif /* CONFIG_RV_MON_TSS */
+#endif /* CONFIG_RV_MON_STS */
diff --git a/kernel/trace/rv/monitors/tss/Kconfig b/kernel/trace/rv/monitors/tss/Kconfig
deleted file mode 100644
index 479f86f52e60..000000000000
--- a/kernel/trace/rv/monitors/tss/Kconfig
+++ /dev/null
@@ -1,14 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-only
-#
-config RV_MON_TSS
- depends on RV
- depends on RV_MON_SCHED
- default y
- select DA_MON_EVENTS_IMPLICIT
- bool "tss monitor"
- help
- Monitor to ensure sched_switch happens only in scheduling context.
- This monitor is part of the sched monitors collection.
-
- For further information, see:
- Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/tss/tss.c b/kernel/trace/rv/monitors/tss/tss.c
deleted file mode 100644
index 0452fcd9edcf..000000000000
--- a/kernel/trace/rv/monitors/tss/tss.c
+++ /dev/null
@@ -1,90 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-#include <linux/ftrace.h>
-#include <linux/tracepoint.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/init.h>
-#include <linux/rv.h>
-#include <rv/instrumentation.h>
-#include <rv/da_monitor.h>
-
-#define MODULE_NAME "tss"
-
-#include <trace/events/sched.h>
-#include <rv_trace.h>
-#include <monitors/sched/sched.h>
-
-#include "tss.h"
-
-static struct rv_monitor rv_tss;
-DECLARE_DA_MON_PER_CPU(tss, unsigned char);
-
-static void handle_sched_switch(void *data, bool preempt,
- struct task_struct *prev,
- struct task_struct *next,
- unsigned int prev_state)
-{
- da_handle_event_tss(sched_switch_tss);
-}
-
-static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
-{
- da_handle_event_tss(schedule_entry_tss);
-}
-
-static void handle_schedule_exit(void *data, bool is_switch, unsigned long ip)
-{
- da_handle_start_event_tss(schedule_exit_tss);
-}
-
-static int enable_tss(void)
-{
- int retval;
-
- retval = da_monitor_init_tss();
- if (retval)
- return retval;
-
- rv_attach_trace_probe("tss", sched_switch, handle_sched_switch);
- rv_attach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
- rv_attach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
- return 0;
-}
-
-static void disable_tss(void)
-{
- rv_tss.enabled = 0;
-
- rv_detach_trace_probe("tss", sched_switch, handle_sched_switch);
- rv_detach_trace_probe("tss", sched_entry_tp, handle_schedule_entry);
- rv_detach_trace_probe("tss", sched_exit_tp, handle_schedule_exit);
-
- da_monitor_destroy_tss();
-}
-
-static struct rv_monitor rv_tss = {
- .name = "tss",
- .description = "task switch while scheduling.",
- .enable = enable_tss,
- .disable = disable_tss,
- .reset = da_monitor_reset_all_tss,
- .enabled = 0,
-};
-
-static int __init register_tss(void)
-{
- return rv_register_monitor(&rv_tss, &rv_sched);
-}
-
-static void __exit unregister_tss(void)
-{
- rv_unregister_monitor(&rv_tss);
-}
-
-module_init(register_tss);
-module_exit(unregister_tss);
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
-MODULE_DESCRIPTION("tss: task switch while scheduling.");
diff --git a/kernel/trace/rv/monitors/tss/tss.h b/kernel/trace/rv/monitors/tss/tss.h
deleted file mode 100644
index f0a36fda1b87..000000000000
--- a/kernel/trace/rv/monitors/tss/tss.h
+++ /dev/null
@@ -1,47 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * Automatically generated C representation of tss automaton
- * For further information about this format, see kernel documentation:
- * Documentation/trace/rv/deterministic_automata.rst
- */
-
-enum states_tss {
- thread_tss = 0,
- sched_tss,
- state_max_tss
-};
-
-#define INVALID_STATE state_max_tss
-
-enum events_tss {
- sched_switch_tss = 0,
- schedule_entry_tss,
- schedule_exit_tss,
- event_max_tss
-};
-
-struct automaton_tss {
- char *state_names[state_max_tss];
- char *event_names[event_max_tss];
- unsigned char function[state_max_tss][event_max_tss];
- unsigned char initial_state;
- bool final_states[state_max_tss];
-};
-
-static const struct automaton_tss automaton_tss = {
- .state_names = {
- "thread",
- "sched"
- },
- .event_names = {
- "sched_switch",
- "schedule_entry",
- "schedule_exit"
- },
- .function = {
- { INVALID_STATE, sched_tss, INVALID_STATE },
- { sched_tss, INVALID_STATE, thread_tss },
- },
- .initial_state = thread_tss,
- .final_states = { 1, 0 },
-};
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 18fa0e358a30..a5e1c52e2992 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -58,11 +58,11 @@ DECLARE_EVENT_CLASS(error_da_monitor,
);
#include <monitors/wip/wip_trace.h>
-#include <monitors/tss/tss_trace.h>
#include <monitors/sco/sco_trace.h>
#include <monitors/scpd/scpd_trace.h>
#include <monitors/snep/snep_trace.h>
#include <monitors/sncid/sncid_trace.h>
+#include <monitors/sts/sts_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
#endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/sts.dot b/tools/verification/models/sched/sts.dot
new file mode 100644
index 000000000000..8152675b4f52
--- /dev/null
+++ b/tools/verification/models/sched/sts.dot
@@ -0,0 +1,29 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = circle] "disable_to_switch"};
+ {node [shape = circle] "enable_to_exit"};
+ {node [shape = circle] "scheduling"};
+ {node [shape = circle] "switching"};
+ {node [shape = plaintext, style=invis, label=""] "__init_thread"};
+ {node [shape = doublecircle] "thread"};
+ {node [shape = circle] "thread"};
+ "__init_thread" -> "thread";
+ "disable_to_switch" [label = "disable_to_switch"];
+ "disable_to_switch" -> "scheduling" [ label = "irq_enable" ];
+ "disable_to_switch" -> "switching" [ label = "sched_switch\nsched_switch_vain" ];
+ "enable_to_exit" [label = "enable_to_exit"];
+ "enable_to_exit" -> "enable_to_exit" [ label = "irq_disable\nirq_enable" ];
+ "enable_to_exit" -> "thread" [ label = "schedule_exit" ];
+ "scheduling" [label = "scheduling"];
+ "scheduling" -> "disable_to_switch" [ label = "irq_disable" ];
+ "switching" [label = "switching"];
+ "switching" -> "enable_to_exit" [ label = "irq_enable" ];
+ "thread" [label = "thread", color = green3];
+ "thread" -> "scheduling" [ label = "schedule_entry" ];
+ "thread" -> "thread" [ label = "irq_disable\nirq_enable" ];
+ { rank = min ;
+ "__init_thread";
+ "thread";
+ }
+}
diff --git a/tools/verification/models/sched/tss.dot b/tools/verification/models/sched/tss.dot
deleted file mode 100644
index 7dfa1d9121bb..000000000000
--- a/tools/verification/models/sched/tss.dot
+++ /dev/null
@@ -1,18 +0,0 @@
-digraph state_automaton {
- center = true;
- size = "7,11";
- {node [shape = plaintext] "sched"};
- {node [shape = plaintext, style=invis, label=""] "__init_thread"};
- {node [shape = ellipse] "thread"};
- {node [shape = plaintext] "thread"};
- "__init_thread" -> "thread";
- "sched" [label = "sched"];
- "sched" -> "sched" [ label = "sched_switch" ];
- "sched" -> "thread" [ label = "schedule_exit" ];
- "thread" [label = "thread", color = green3];
- "thread" -> "sched" [ label = "schedule_entry" ];
- { rank = min ;
- "__init_thread";
- "thread";
- }
-}
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
` (2 preceding siblings ...)
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
@ 2025-05-14 8:43 ` Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
4 siblings, 0 replies; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-14 8:43 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
Tomas Glozar, Juri Lelli
Add 2 per-task monitors as part of the sched model:
* nrp: need-resched preempts
Monitor to ensure preemption requires need resched.
* sssw: set state sleep and wakeup
Monitor to ensure sched_set_state to sleepable leads to sleeping and
sleeping tasks require wakeup.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 117 ++++++++++++++++++
include/linux/rv.h | 2 +-
kernel/trace/rv/Kconfig | 2 +
kernel/trace/rv/Makefile | 2 +
kernel/trace/rv/monitors/nrp/Kconfig | 14 +++
kernel/trace/rv/monitors/nrp/nrp.c | 135 +++++++++++++++++++++
kernel/trace/rv/monitors/nrp/nrp.h | 51 ++++++++
kernel/trace/rv/monitors/nrp/nrp_trace.h | 15 +++
kernel/trace/rv/monitors/sssw/Kconfig | 15 +++
kernel/trace/rv/monitors/sssw/sssw.c | 115 ++++++++++++++++++
kernel/trace/rv/monitors/sssw/sssw.h | 64 ++++++++++
kernel/trace/rv/monitors/sssw/sssw_trace.h | 15 +++
kernel/trace/rv/rv_trace.h | 2 +
tools/verification/models/sched/nrp.dot | 19 +++
tools/verification/models/sched/sssw.dot | 24 ++++
15 files changed, 591 insertions(+), 1 deletion(-)
create mode 100644 kernel/trace/rv/monitors/nrp/Kconfig
create mode 100644 kernel/trace/rv/monitors/nrp/nrp.c
create mode 100644 kernel/trace/rv/monitors/nrp/nrp.h
create mode 100644 kernel/trace/rv/monitors/nrp/nrp_trace.h
create mode 100644 kernel/trace/rv/monitors/sssw/Kconfig
create mode 100644 kernel/trace/rv/monitors/sssw/sssw.c
create mode 100644 kernel/trace/rv/monitors/sssw/sssw.h
create mode 100644 kernel/trace/rv/monitors/sssw/sssw_trace.h
create mode 100644 tools/verification/models/sched/nrp.dot
create mode 100644 tools/verification/models/sched/sssw.dot
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index e4171aadef1c..97f0f1a10f43 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -198,6 +198,123 @@ real task switch occurs::
+-------------- | | <-------------+
+--------------------+
+Monitor nrp
+-----------
+
+The need resched preempts (nrp) monitor ensures preemption requires need
+resched. A preemption is any of the following types of ``sched_switch``:
+
+* ``sched_switch_preempt``:
+ a task is ``preempted``, this can happen after the need for ``rescheduling``
+ has been set, also in its ``lazy`` flavour (which doesn't make a different in
+ this monitor). ``need_resched`` can be set as a flag to the task or in the
+ per-core preemption count, either of them can trigger a preemption.
+* ``sched_switch_vain_preempt``:
+ a task goes through the scheduler from a preemption context but it is picked
+ as the next task to run, hence no real task switch occurs. Since we run the
+ scheduler, this clears the need to reschedule.
+
+This monitor ignores when the task is switched in, as this complicates things
+when different types of ``sched_switch`` occur (e.g. sleeping or yielding, here
+marked as ``sched_switch_other`` or ``sched_switch_vain``). The snroc monitor
+ensures a task is switched in before it can be switched out again.
+For this reason, the ``any_thread_running`` state does not imply that the
+monitored task is not running, simply it is not set for rescheduling::
+
+ |
+ |
+ v
+ sched_switch_other #=====================#
+ sched_switch_vain H H
+ +--------------------- H any_thread_running H
+ | H H
+ +--------------------> H H <+
+ #=====================# |
+ | |
+ | | sched_switch_preempt
+ | | sched_switch_vain_preempt
+ | sched_need_resched | sched_switch_other
+ | | sched_switch_vain
+ v |
+ sched_need_resched +---------------------+ |
+ +--------------------- | | |
+ | | rescheduling | |
+ +--------------------> | | -+
+ +---------------------+
+
+Monitor sssw
+------------
+
+The set state sleep and wakeup (sssw) monitor ensures ``sched_set_state`` to
+sleepable leads to sleeping and sleeping tasks require wakeup.
+It includes the following types of ``sched_switch``:
+
+* ``switch_suspend``:
+ a task puts itself to sleep, this can happen only after explicitly setting
+ the task to ``sleepable``. After a task is suspended, it needs to be woken up
+ (``waking`` state) before being switched in again.
+ Setting the task's state to ``sleepable`` can be reverted before switching if it
+ is woken up or set to ``runnable``.
+* ``switch_blocked``:
+ a special case of a ``switch_suspend`` where the task is waiting on a
+ sleeping RT lock (``PREEMPT_RT`` only), it is common to see wakeup and set
+ state events racing with each other and this leads the model to perceive this
+ type of switch when the task is not set to sleepable. This is a limitation of
+ the model in SMP system and workarounds may slow down the system.
+* ``switch_yield``:
+ a task explicitly calls the scheduler, this looks like a preemption as the
+ task is still runnable but the ``need_resched`` flag is not set. It can
+ happen after a ``yield`` system call or from the idle task. By definition,
+ a task cannot yield while ``sleepable`` as that would be a suspension.
+* ``switch_vain``:
+ a task explicitly calls the scheduler but it is picked as the next task to run,
+ hence no real task switch occurs. This can occur as a yield, which is not
+ valid when the task is sleepable. A special case of a yield is when a task in
+ ``TASK_INTERRUPTIBLE`` calls the scheduler while a signal is pending. The
+ task doesn't go through the usual blocking/waking and is set back to
+ runnable, the resulting switch looks like a yield.
+
+As for the nrp monitor, this monitor doesn't include a running state,
+``sleepable`` and ``runnable`` are only referring to the task's desired
+state, which could be scheduled out (e.g. due to preemption). However, it does
+include the event ``sched_switch_in`` to represent when a task is allowed to
+become running. This can be triggered also by preemption, but cannot occur
+after the task got to ``sleeping`` until a ``wakeup``::
+
+ sched_set_state_runnable
+ sched_wakeup
+ sched_switch_vain_preempt |
+ sched_switch_preempt |
+ sched_switch_yield v
+ sched_switch_vain #=============================================#
+ +-----------------> H H
+ | H H
+ +------------------ H runnable H
+ H H
+ +-----------------> H H
+ | #=============================================#
+ sched_set_state_runnable | | ^
+ sched_wakeup | sched_switch_blocking |
+ | sched_set_state_sleepable | |
+ | v | |
+ | +------------------------+ | sched_wakeup
+ +------------------ | | | |
+ | | | |
+ +-----------------> | sleepable | | |
+ | | | | |
+ +------------------ | | | |
+ sched_switch_in +------------------------+ | |
+ sched_switch_preempt | | |
+ sched_switch_vain_preempt sched_switch_suspend | |
+ sched_set_state_sleepable sched_switch_blocking | |
+ | | |
+ v | |
+ +------------------------+ | |
+ | sleeping | <-+ |
+ +------------------------+ |
+ | |
+ +----------------------------+
+
References
----------
diff --git a/include/linux/rv.h b/include/linux/rv.h
index a83a81ac6e46..20c89f5b9a5b 100644
--- a/include/linux/rv.h
+++ b/include/linux/rv.h
@@ -25,7 +25,7 @@ struct da_monitor {
* adding more or developing a dynamic method. So far, none of
* these are justified.
*/
-#define RV_PER_TASK_MONITORS 1
+#define RV_PER_TASK_MONITORS 3
#define RV_PER_TASK_MONITOR_INIT (RV_PER_TASK_MONITORS)
/*
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index a53a3eca9616..f106cf7b2fd3 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -34,6 +34,8 @@ source "kernel/trace/rv/monitors/scpd/Kconfig"
source "kernel/trace/rv/monitors/snep/Kconfig"
source "kernel/trace/rv/monitors/sncid/Kconfig"
source "kernel/trace/rv/monitors/sts/Kconfig"
+source "kernel/trace/rv/monitors/nrp/Kconfig"
+source "kernel/trace/rv/monitors/sssw/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index c609b72275cb..c076cf48af18 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -12,6 +12,8 @@ obj-$(CONFIG_RV_MON_SCPD) += monitors/scpd/scpd.o
obj-$(CONFIG_RV_MON_SNEP) += monitors/snep/snep.o
obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
+obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
+obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/nrp/Kconfig b/kernel/trace/rv/monitors/nrp/Kconfig
new file mode 100644
index 000000000000..726f34259a6d
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/Kconfig
@@ -0,0 +1,14 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_NRP
+ depends on RV
+ depends on RV_MON_SCHED
+ default y
+ select DA_MON_EVENTS_ID
+ bool "nrp monitor"
+ help
+ Monitor to ensure preemption requires need resched.
+ This monitor is part of the sched monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/nrp/nrp.c b/kernel/trace/rv/monitors/nrp/nrp.c
new file mode 100644
index 000000000000..9349026bb60b
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "nrp"
+
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "nrp.h"
+
+static struct rv_monitor rv_nrp;
+DECLARE_DA_MON_PER_TASK(nrp, unsigned char);
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk,
+ int cpu, int tif)
+{
+ da_handle_event_nrp(tsk, sched_need_resched_nrp);
+}
+
+static void handle_schedule_entry(void *data, bool preempt, unsigned long ip)
+{
+ /*
+ * In theory, a preemption can only occur after the need_resched flag
+ * is set. In practice, however, we may see a preemption where the flag
+ * is not set. This can happen in one specific condition:
+ *
+ * need_resched
+ * preempt_schedule()
+ * preempt_schedule_irq()
+ * __schedule()
+ * !need_resched
+ * __schedule()
+ *
+ * In the situation above, we start a standard preemption (e.g. from
+ * preempt_enable when the flag is set), an interrupts occurs before we
+ * schedule and, on its exit path, it schedules, which clears the
+ * need_resched flag.
+ * When the preempted task runs again, we continue the standard
+ * preemption started earlier, although the flag is no longer set.
+ *
+ * The following workaround allows the model not to fail in this
+ * condition, but makes it weaker. In fact, we are not proving that:
+ * 1. we don't miss any event setting need_resched
+ * 2. we don't preempt when not required
+ *
+ * Ideally, we should find a way to narrow down the condition, however
+ * that's rather tricky without adding several tracepoints in
+ * undesired locations.
+ */
+ if (preempt && unlikely(!tif_need_resched()))
+ da_handle_event_nrp(current, sched_need_resched_nrp);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ if (preempt)
+ da_handle_start_event_nrp(prev, sched_switch_preempt_nrp);
+ else
+ da_handle_start_event_nrp(prev, sched_switch_other_nrp);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ if (preempt)
+ da_handle_start_event_nrp(tsk, sched_switch_vain_preempt_nrp);
+ else
+ da_handle_start_event_nrp(tsk, sched_switch_vain_nrp);
+}
+
+static int enable_nrp(void)
+{
+ int retval;
+
+ retval = da_monitor_init_nrp();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("snep", sched_entry_tp, handle_schedule_entry);
+ rv_attach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_attach_trace_probe("nrp", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("nrp", sched_switch_vain_tp, handle_sched_switch_vain);
+
+ return 0;
+}
+
+static void disable_nrp(void)
+{
+ rv_nrp.enabled = 0;
+
+ rv_detach_trace_probe("snep", sched_entry_tp, handle_schedule_entry);
+ rv_detach_trace_probe("nrp", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_detach_trace_probe("nrp", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("nrp", sched_switch_vain_tp, handle_sched_switch_vain);
+
+ da_monitor_destroy_nrp();
+}
+
+static struct rv_monitor rv_nrp = {
+ .name = "nrp",
+ .description = "need resched preempts.",
+ .enable = enable_nrp,
+ .disable = disable_nrp,
+ .reset = da_monitor_reset_all_nrp,
+ .enabled = 0,
+};
+
+static int __init register_nrp(void)
+{
+ return rv_register_monitor(&rv_nrp, &rv_sched);
+}
+
+static void __exit unregister_nrp(void)
+{
+ rv_unregister_monitor(&rv_nrp);
+}
+
+module_init(register_nrp);
+module_exit(unregister_nrp);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("nrp: need resched preempts.");
diff --git a/kernel/trace/rv/monitors/nrp/nrp.h b/kernel/trace/rv/monitors/nrp/nrp.h
new file mode 100644
index 000000000000..51c0e23da72a
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of nrp automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_nrp {
+ any_thread_running_nrp = 0,
+ rescheduling_nrp,
+ state_max_nrp
+};
+
+#define INVALID_STATE state_max_nrp
+
+enum events_nrp {
+ sched_need_resched_nrp = 0,
+ sched_switch_other_nrp,
+ sched_switch_preempt_nrp,
+ sched_switch_vain_nrp,
+ sched_switch_vain_preempt_nrp,
+ event_max_nrp
+};
+
+struct automaton_nrp {
+ char *state_names[state_max_nrp];
+ char *event_names[event_max_nrp];
+ unsigned char function[state_max_nrp][event_max_nrp];
+ unsigned char initial_state;
+ bool final_states[state_max_nrp];
+};
+
+static const struct automaton_nrp automaton_nrp = {
+ .state_names = {
+ "any_thread_running",
+ "rescheduling"
+ },
+ .event_names = {
+ "sched_need_resched",
+ "sched_switch_other",
+ "sched_switch_preempt",
+ "sched_switch_vain",
+ "sched_switch_vain_preempt"
+ },
+ .function = {
+ { rescheduling_nrp, any_thread_running_nrp, INVALID_STATE, any_thread_running_nrp, INVALID_STATE },
+ { rescheduling_nrp, any_thread_running_nrp, any_thread_running_nrp, any_thread_running_nrp, any_thread_running_nrp },
+ },
+ .initial_state = any_thread_running_nrp,
+ .final_states = { 1, 0 },
+};
diff --git a/kernel/trace/rv/monitors/nrp/nrp_trace.h b/kernel/trace/rv/monitors/nrp/nrp_trace.h
new file mode 100644
index 000000000000..2e13497de3b6
--- /dev/null
+++ b/kernel/trace/rv/monitors/nrp/nrp_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_NRP
+DEFINE_EVENT(event_da_monitor_id, event_nrp,
+ TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_nrp,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_NRP */
diff --git a/kernel/trace/rv/monitors/sssw/Kconfig b/kernel/trace/rv/monitors/sssw/Kconfig
new file mode 100644
index 000000000000..23b7eeb38bbf
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_SSSW
+ depends on RV
+ depends on RV_MON_SCHED
+ default y
+ select DA_MON_EVENTS_ID
+ bool "sssw monitor"
+ help
+ Monitor to ensure sched_set_state to sleepable leads to sleeping and
+ sleeping tasks require wakeup.
+ This monitor is part of the sched monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/sssw/sssw.c b/kernel/trace/rv/monitors/sssw/sssw.c
new file mode 100644
index 000000000000..efa695dead36
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "sssw"
+
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "sssw.h"
+
+static struct rv_monitor rv_sssw;
+DECLARE_DA_MON_PER_TASK(sssw, unsigned char);
+
+static void handle_sched_set_state(void *data, struct task_struct *tsk,
+ int state, bool from_signal)
+{
+ if (state == TASK_RUNNING)
+ da_handle_start_event_sssw(tsk, sched_set_state_runnable_sssw);
+ else
+ da_handle_event_sssw(tsk, sched_set_state_sleepable_sssw);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ if (preempt)
+ da_handle_event_sssw(prev, sched_switch_preempt_sssw);
+ else if (prev_state == TASK_RUNNING)
+ da_handle_start_event_sssw(prev, sched_switch_yield_sssw);
+ else if (prev_state == TASK_RTLOCK_WAIT)
+ /* special case of sleeping task with racy conditions */
+ da_handle_event_sssw(prev, sched_switch_blocking_sssw);
+ else
+ da_handle_event_sssw(prev, sched_switch_suspend_sssw);
+ da_handle_event_sssw(next, sched_switch_in_sssw);
+}
+
+static void handle_sched_switch_vain(void *data, bool preempt,
+ struct task_struct *tsk,
+ unsigned int tsk_state)
+{
+ if (preempt)
+ da_handle_event_sssw(tsk, sched_switch_vain_preempt_sssw);
+ else
+ da_handle_start_event_sssw(tsk, sched_switch_vain_sssw);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+ da_handle_start_event_sssw(p, sched_wakeup_sssw);
+}
+
+static int enable_sssw(void)
+{
+ int retval;
+
+ retval = da_monitor_init_sssw();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+ rv_attach_trace_probe("sssw", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("sssw", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_attach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+
+ return 0;
+}
+
+static void disable_sssw(void)
+{
+ rv_sssw.enabled = 0;
+
+ rv_detach_trace_probe("sssw", sched_set_state_tp, handle_sched_set_state);
+ rv_detach_trace_probe("sssw", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("sssw", sched_switch_vain_tp, handle_sched_switch_vain);
+ rv_detach_trace_probe("sssw", sched_wakeup, handle_sched_wakeup);
+
+ da_monitor_destroy_sssw();
+}
+
+static struct rv_monitor rv_sssw = {
+ .name = "sssw",
+ .description = "set state sleep and wakeup.",
+ .enable = enable_sssw,
+ .disable = disable_sssw,
+ .reset = da_monitor_reset_all_sssw,
+ .enabled = 0,
+};
+
+static int __init register_sssw(void)
+{
+ return rv_register_monitor(&rv_sssw, &rv_sched);
+}
+
+static void __exit unregister_sssw(void)
+{
+ rv_unregister_monitor(&rv_sssw);
+}
+
+module_init(register_sssw);
+module_exit(unregister_sssw);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("sssw: set state sleep and wakeup.");
diff --git a/kernel/trace/rv/monitors/sssw/sssw.h b/kernel/trace/rv/monitors/sssw/sssw.h
new file mode 100644
index 000000000000..ae95a03bf2f3
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of sssw automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_sssw {
+ runnanble_sssw = 0,
+ sleepable_sssw,
+ sleeping_sssw,
+ state_max_sssw
+};
+
+#define INVALID_STATE state_max_sssw
+
+enum events_sssw {
+ sched_set_state_runnable_sssw = 0,
+ sched_set_state_sleepable_sssw,
+ sched_switch_blocking_sssw,
+ sched_switch_in_sssw,
+ sched_switch_preempt_sssw,
+ sched_switch_suspend_sssw,
+ sched_switch_vain_sssw,
+ sched_switch_vain_preempt_sssw,
+ sched_switch_yield_sssw,
+ sched_wakeup_sssw,
+ event_max_sssw
+};
+
+struct automaton_sssw {
+ char *state_names[state_max_sssw];
+ char *event_names[event_max_sssw];
+ unsigned char function[state_max_sssw][event_max_sssw];
+ unsigned char initial_state;
+ bool final_states[state_max_sssw];
+};
+
+static const struct automaton_sssw automaton_sssw = {
+ .state_names = {
+ "runnanble",
+ "sleepable",
+ "sleeping"
+ },
+ .event_names = {
+ "sched_set_state_runnable",
+ "sched_set_state_sleepable",
+ "sched_switch_blocking",
+ "sched_switch_in",
+ "sched_switch_preempt",
+ "sched_switch_suspend",
+ "sched_switch_vain",
+ "sched_switch_vain_preempt",
+ "sched_switch_yield",
+ "sched_wakeup"
+ },
+ .function = {
+ { runnanble_sssw, sleepable_sssw, sleeping_sssw, runnanble_sssw, runnanble_sssw, INVALID_STATE, runnanble_sssw, runnanble_sssw, runnanble_sssw, runnanble_sssw },
+ { runnanble_sssw, sleepable_sssw, sleeping_sssw, sleepable_sssw, sleepable_sssw, sleeping_sssw, INVALID_STATE, sleepable_sssw, INVALID_STATE, runnanble_sssw },
+ { INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, INVALID_STATE, runnanble_sssw },
+ },
+ .initial_state = runnanble_sssw,
+ .final_states = { 1, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/sssw/sssw_trace.h b/kernel/trace/rv/monitors/sssw/sssw_trace.h
new file mode 100644
index 000000000000..6c03cfc6960b
--- /dev/null
+++ b/kernel/trace/rv/monitors/sssw/sssw_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_SSSW
+DEFINE_EVENT(event_da_monitor_id, event_sssw,
+ TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_sssw,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+#endif /* CONFIG_RV_MON_SSSW */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index a5e1c52e2992..d12ab74dcabc 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -124,6 +124,8 @@ DECLARE_EVENT_CLASS(error_da_monitor_id,
#include <monitors/wwnr/wwnr_trace.h>
#include <monitors/snroc/snroc_trace.h>
+#include <monitors/nrp/nrp_trace.h>
+#include <monitors/sssw/sssw_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_ID here
#endif /* CONFIG_DA_MON_EVENTS_ID */
diff --git a/tools/verification/models/sched/nrp.dot b/tools/verification/models/sched/nrp.dot
new file mode 100644
index 000000000000..6aa7604e98fc
--- /dev/null
+++ b/tools/verification/models/sched/nrp.dot
@@ -0,0 +1,19 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = plaintext, style=invis, label=""] "__init_any_thread_running"};
+ {node [shape = doublecircle] "any_thread_running"};
+ {node [shape = circle] "any_thread_running"};
+ {node [shape = circle] "rescheduling"};
+ "__init_any_thread_running" -> "any_thread_running";
+ "any_thread_running" [label = "any_thread_running", color = green3];
+ "any_thread_running" -> "any_thread_running" [ label = "sched_switch_other\nsched_switch_vain" ];
+ "any_thread_running" -> "rescheduling" [ label = "sched_need_resched" ];
+ "rescheduling" [label = "rescheduling"];
+ "rescheduling" -> "any_thread_running" [ label = "sched_switch_preempt\nsched_switch_vain_preempt\nsched_switch_other\nsched_switch_vain" ];
+ "rescheduling" -> "rescheduling" [ label = "sched_need_resched" ];
+ { rank = min ;
+ "__init_any_thread_running";
+ "any_thread_running";
+ }
+}
diff --git a/tools/verification/models/sched/sssw.dot b/tools/verification/models/sched/sssw.dot
new file mode 100644
index 000000000000..362e783f2bd5
--- /dev/null
+++ b/tools/verification/models/sched/sssw.dot
@@ -0,0 +1,24 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = plaintext, style=invis, label=""] "__init_runnanble"};
+ {node [shape = doublecircle] "runnanble"};
+ {node [shape = circle] "runnanble"};
+ {node [shape = circle] "sleepable"};
+ {node [shape = circle] "sleeping"};
+ "__init_runnanble" -> "runnanble";
+ "runnanble" [label = "runnanble", color = green3];
+ "runnanble" -> "runnanble" [ label = "sched_set_state_runnable\nsched_wakeup\nsched_switch_in\nsched_switch_vain\nsched_switch_yield\nsched_switch_preempt\nsched_switch_vain_preempt" ];
+ "runnanble" -> "sleepable" [ label = "sched_set_state_sleepable" ];
+ "runnanble" -> "sleeping" [ label = "sched_switch_blocking" ];
+ "sleepable" [label = "sleepable"];
+ "sleepable" -> "runnanble" [ label = "sched_set_state_runnable\nsched_wakeup" ];
+ "sleepable" -> "sleepable" [ label = "sched_set_state_sleepable\nsched_switch_in\nsched_switch_preempt\nsched_switch_vain_preempt" ];
+ "sleepable" -> "sleeping" [ label = "sched_switch_suspend\nsched_switch_blocking" ];
+ "sleeping" [label = "sleeping"];
+ "sleeping" -> "runnanble" [ label = "sched_wakeup" ];
+ { rank = min ;
+ "__init_runnanble";
+ "runnanble";
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
` (3 preceding siblings ...)
2025-05-14 8:43 ` [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors Gabriele Monaco
@ 2025-05-14 8:43 ` Gabriele Monaco
2025-05-27 13:37 ` Nam Cao
4 siblings, 1 reply; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-14 8:43 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Ingo Molnar, Peter Zijlstra, Nam Cao,
Tomas Glozar, Juri Lelli
Add a per-cpu monitor as part of the sched model:
* opid: operations with preemption and irq disabled
Monitor to ensure wakeup and need_resched occur with irq and
preemption disabled or in irq handlers.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 48 +++++++
kernel/trace/rv/Kconfig | 1 +
kernel/trace/rv/Makefile | 1 +
kernel/trace/rv/monitors/opid/Kconfig | 17 +++
kernel/trace/rv/monitors/opid/opid.c | 151 +++++++++++++++++++++
kernel/trace/rv/monitors/opid/opid.h | 64 +++++++++
kernel/trace/rv/monitors/opid/opid_trace.h | 15 ++
kernel/trace/rv/rv_trace.h | 1 +
tools/verification/models/sched/opid.dot | 35 +++++
9 files changed, 333 insertions(+)
create mode 100644 kernel/trace/rv/monitors/opid/Kconfig
create mode 100644 kernel/trace/rv/monitors/opid/opid.c
create mode 100644 kernel/trace/rv/monitors/opid/opid.h
create mode 100644 kernel/trace/rv/monitors/opid/opid_trace.h
create mode 100644 tools/verification/models/sched/opid.dot
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 97f0f1a10f43..f044bca7ac31 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -315,6 +315,54 @@ after the task got to ``sleeping`` until a ``wakeup``::
| |
+----------------------------+
+Monitor opid
+------------
+
+The operations with preemption and irq disabled (opid) monitor ensures
+operations like ``wakeup`` and ``need_resched`` occur with interrupts and
+preemption disabled or during IRQs, in such case preemption may not be disabled
+explicitly.
+``need_resched`` can be set by some RCU internals functions, in which case it
+doesn't match a task wakeup and might occur with only interrupts disabled::
+
+ | sched_need_resched
+ | sched_waking
+ | irq_entry
+ | +--------------------+
+ v v |
+ +------------------------------------------------------+
+ +----------- | disabled | <+
+ | +------------------------------------------------------+ |
+ | | ^ |
+ | | preempt_disable sched_need_resched |
+ | preempt_enable | +--------------------+ |
+ | v | v | |
+ | +------------------------------------------------------+ |
+ | | irq_disabled | |
+ | +------------------------------------------------------+ |
+ | | | ^ |
+ | irq_entry | | |
+ | sched_need_resched v | irq_disable |
+ | sched_waking +--------------+ | | |
+ | +----- | | irq_enable | |
+ | | | in_irq | | | |
+ | +----> | | | | |
+ | +--------------+ | | irq_disable
+ | | | | |
+ | irq_enable | irq_enable | | |
+ | v v | |
+ | #======================================================# |
+ | H enabled H |
+ | #======================================================# |
+ | | ^ ^ preempt_enable | |
+ | preempt_disable preempt_enable +--------------------+ |
+ | v | |
+ | +------------------+ | |
+ +----------> | preempt_disabled | -+ |
+ +------------------+ |
+ | |
+ +-------------------------------------------------------+
+
References
----------
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index f106cf7b2fd3..9ebb80931a9f 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -36,6 +36,7 @@ source "kernel/trace/rv/monitors/sncid/Kconfig"
source "kernel/trace/rv/monitors/sts/Kconfig"
source "kernel/trace/rv/monitors/nrp/Kconfig"
source "kernel/trace/rv/monitors/sssw/Kconfig"
+source "kernel/trace/rv/monitors/opid/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index c076cf48af18..0eca5e77d0d2 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_RV_MON_SNCID) += monitors/sncid/sncid.o
obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
+obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/opid/Kconfig b/kernel/trace/rv/monitors/opid/Kconfig
new file mode 100644
index 000000000000..c59d51654cd1
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_OPID
+ depends on RV
+ depends on IRQSOFF_TRACER
+ depends on PREEMPT_TRACER
+ depends on RV_MON_SCHED
+ default y
+ select DA_MON_EVENTS_IMPLICIT
+ bool "opid monitor"
+ help
+ Monitor to ensure operations like wakeup and need resched occur with
+ interrupts and preemption disabled or during IRQs, where preemption
+ may not be disabled explicitly.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
new file mode 100644
index 000000000000..d8732d681753
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+#include <rv/da_monitor.h>
+
+#define MODULE_NAME "opid"
+
+#include <trace/events/sched.h>
+#include <trace/events/irq.h>
+#include <trace/events/preemptirq.h>
+#include <rv_trace.h>
+#include <monitors/sched/sched.h>
+
+#include "opid.h"
+
+static struct rv_monitor rv_opid;
+DECLARE_DA_MON_PER_CPU(opid, unsigned char);
+
+#ifdef CONFIG_X86_LOCAL_APIC
+#include <asm/trace/irq_vectors.h>
+
+static void handle_vector_irq_entry(void *data, int vector)
+{
+ da_handle_event_opid(irq_entry_opid);
+}
+
+static void attach_vector_irq(void) {
+ rv_attach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+}
+
+static void detach_vector_irq(void) {
+ rv_detach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
+}
+
+#else
+/* We assume irq_entry tracepoints are sufficient on other architectures */
+static void attach_vector_irq() { }
+static void detach_vector_irq() { }
+#endif
+
+static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_opid(irq_disable_opid);
+}
+
+static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_opid(irq_enable_opid);
+}
+
+static void handle_irq_entry(void *data, int irq, struct irqaction *action)
+{
+ da_handle_event_opid(irq_entry_opid);
+}
+
+static void handle_preempt_disable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_opid(preempt_disable_opid);
+}
+
+static void handle_preempt_enable(void *data, unsigned long ip, unsigned long parent_ip)
+{
+ da_handle_event_opid(preempt_enable_opid);
+}
+
+static void handle_sched_need_resched(void *data, struct task_struct *tsk, int cpu, int tif)
+{
+ if(in_irq())
+ da_handle_event_opid(sched_need_resched_opid);
+ else
+ da_handle_start_event_opid(sched_need_resched_opid);
+}
+
+static void handle_sched_waking(void *data, struct task_struct *p)
+{
+ if(in_irq())
+ da_handle_event_opid(sched_waking_opid);
+ else
+ da_handle_start_event_opid(sched_waking_opid);
+}
+
+static int enable_opid(void)
+{
+ int retval;
+
+ retval = da_monitor_init_opid();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("opid", irq_disable, handle_irq_disable);
+ rv_attach_trace_probe("opid", irq_enable, handle_irq_enable);
+ rv_attach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+ rv_attach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+ rv_attach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+ rv_attach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_attach_trace_probe("opid", sched_waking, handle_sched_waking);
+ attach_vector_irq();
+
+ return 0;
+}
+
+static void disable_opid(void)
+{
+ rv_opid.enabled = 0;
+
+ rv_detach_trace_probe("opid", irq_disable, handle_irq_disable);
+ rv_detach_trace_probe("opid", irq_enable, handle_irq_enable);
+ rv_detach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
+ rv_detach_trace_probe("opid", preempt_disable, handle_preempt_disable);
+ rv_detach_trace_probe("opid", preempt_enable, handle_preempt_enable);
+ rv_detach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
+ rv_detach_trace_probe("opid", sched_waking, handle_sched_waking);
+ detach_vector_irq();
+
+ da_monitor_destroy_opid();
+}
+
+/*
+ * This is the monitor register section.
+ */
+static struct rv_monitor rv_opid = {
+ .name = "opid",
+ .description = "operations with preemption and irq disabled.",
+ .enable = enable_opid,
+ .disable = disable_opid,
+ .reset = da_monitor_reset_all_opid,
+ .enabled = 0,
+};
+
+static int __init register_opid(void)
+{
+ rv_register_monitor(&rv_opid, &rv_sched);
+ return 0;
+}
+
+static void __exit unregister_opid(void)
+{
+ rv_unregister_monitor(&rv_opid);
+}
+
+module_init(register_opid);
+module_exit(unregister_opid);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("opid: operations with preemption and irq disabled.");
diff --git a/kernel/trace/rv/monitors/opid/opid.h b/kernel/trace/rv/monitors/opid/opid.h
new file mode 100644
index 000000000000..4c6d4a3964c5
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of opid automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+enum states_opid {
+ disabled_opid = 0,
+ enabled_opid,
+ in_irq_opid,
+ irq_disabled_opid,
+ preempt_disabled_opid,
+ state_max_opid
+};
+
+#define INVALID_STATE state_max_opid
+
+enum events_opid {
+ irq_disable_opid = 0,
+ irq_enable_opid,
+ irq_entry_opid,
+ preempt_disable_opid,
+ preempt_enable_opid,
+ sched_need_resched_opid,
+ sched_waking_opid,
+ event_max_opid
+};
+
+struct automaton_opid {
+ char *state_names[state_max_opid];
+ char *event_names[event_max_opid];
+ unsigned char function[state_max_opid][event_max_opid];
+ unsigned char initial_state;
+ bool final_states[state_max_opid];
+};
+
+static const struct automaton_opid automaton_opid = {
+ .state_names = {
+ "disabled",
+ "enabled",
+ "in_irq",
+ "irq_disabled",
+ "preempt_disabled"
+ },
+ .event_names = {
+ "irq_disable",
+ "irq_enable",
+ "irq_entry",
+ "preempt_disable",
+ "preempt_enable",
+ "sched_need_resched",
+ "sched_waking"
+ },
+ .function = {
+ { INVALID_STATE, preempt_disabled_opid, disabled_opid, INVALID_STATE, irq_disabled_opid, disabled_opid, disabled_opid },
+ { irq_disabled_opid, INVALID_STATE, INVALID_STATE, preempt_disabled_opid, enabled_opid, INVALID_STATE, INVALID_STATE },
+ { INVALID_STATE, enabled_opid, INVALID_STATE, INVALID_STATE, INVALID_STATE, in_irq_opid, in_irq_opid },
+ { INVALID_STATE, enabled_opid, in_irq_opid, disabled_opid, INVALID_STATE, irq_disabled_opid, INVALID_STATE },
+ { disabled_opid, INVALID_STATE, INVALID_STATE, INVALID_STATE, enabled_opid, INVALID_STATE, INVALID_STATE },
+ },
+ .initial_state = disabled_opid,
+ .final_states = { 0, 1, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/opid/opid_trace.h b/kernel/trace/rv/monitors/opid/opid_trace.h
new file mode 100644
index 000000000000..3df6ff955c30
--- /dev/null
+++ b/kernel/trace/rv/monitors/opid/opid_trace.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_OPID
+DEFINE_EVENT(event_da_monitor, event_opid,
+ TP_PROTO(char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor, error_opid,
+ TP_PROTO(char *state, char *event),
+ TP_ARGS(state, event));
+#endif /* CONFIG_RV_MON_OPID */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index d12ab74dcabc..5d3c5c3f7545 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -63,6 +63,7 @@ DECLARE_EVENT_CLASS(error_da_monitor,
#include <monitors/snep/snep_trace.h>
#include <monitors/sncid/sncid_trace.h>
#include <monitors/sts/sts_trace.h>
+#include <monitors/opid/opid_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
#endif /* CONFIG_DA_MON_EVENTS_IMPLICIT */
diff --git a/tools/verification/models/sched/opid.dot b/tools/verification/models/sched/opid.dot
new file mode 100644
index 000000000000..2d5e1df3405f
--- /dev/null
+++ b/tools/verification/models/sched/opid.dot
@@ -0,0 +1,35 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = plaintext, style=invis, label=""] "__init_disabled"};
+ {node [shape = circle] "disabled"};
+ {node [shape = doublecircle] "enabled"};
+ {node [shape = circle] "enabled"};
+ {node [shape = circle] "in_irq"};
+ {node [shape = circle] "irq_disabled"};
+ {node [shape = circle] "preempt_disabled"};
+ "__init_disabled" -> "disabled";
+ "disabled" [label = "disabled"];
+ "disabled" -> "disabled" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
+ "disabled" -> "irq_disabled" [ label = "preempt_enable" ];
+ "disabled" -> "preempt_disabled" [ label = "irq_enable" ];
+ "enabled" [label = "enabled", color = green3];
+ "enabled" -> "enabled" [ label = "preempt_enable" ];
+ "enabled" -> "irq_disabled" [ label = "irq_disable" ];
+ "enabled" -> "preempt_disabled" [ label = "preempt_disable" ];
+ "in_irq" [label = "in_irq"];
+ "in_irq" -> "enabled" [ label = "irq_enable" ];
+ "in_irq" -> "in_irq" [ label = "sched_need_resched\nsched_waking" ];
+ "irq_disabled" [label = "irq_disabled"];
+ "irq_disabled" -> "disabled" [ label = "preempt_disable" ];
+ "irq_disabled" -> "enabled" [ label = "irq_enable" ];
+ "irq_disabled" -> "in_irq" [ label = "irq_entry" ];
+ "irq_disabled" -> "irq_disabled" [ label = "sched_need_resched" ];
+ "preempt_disabled" [label = "preempt_disabled"];
+ "preempt_disabled" -> "disabled" [ label = "irq_disable" ];
+ "preempt_disabled" -> "enabled" [ label = "preempt_enable" ];
+ { rank = min ;
+ "__init_disabled";
+ "disabled";
+ }
+}
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
@ 2025-05-19 8:42 ` Nam Cao
2025-05-19 9:04 ` Gabriele Monaco
0 siblings, 1 reply; 16+ messages in thread
From: Nam Cao @ 2025-05-19 8:42 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli
On Wed, May 14, 2025 at 10:43:09AM +0200, Gabriele Monaco wrote:
> .function = {
> - { thread_context_sco, scheduling_context_sco, INVALID_STATE },
> - { INVALID_STATE, INVALID_STATE, thread_context_sco },
> + { thread_context_sco, INVALID_STATE, scheduling_context_sco, INVALID_STATE },
> + { INVALID_STATE, scheduling_context_sco, INVALID_STATE, thread_context_sco },
This is over the 100 column limit.
I know it is not your fault, this is generated. Back when I was playing
with DA monitor, I made a patch to fix this. Maybe you could include it in
your series?
From b4fb648398a29a9c0d8e08bd12394978d3948a5e Mon Sep 17 00:00:00 2001
From: Nam Cao <namcao@linutronix.de>
Date: Fri, 15 Nov 2024 14:56:33 +0100
Subject: [PATCH] tools/rv/dot2c: Fix generated files going over 100 column
limit
The dot2c.py script generates all states in a single line. This breaks the
100 column limit when the state machines are non-trivial.
Change dot2c.py to generate the states in separate lines.
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
tools/verification/rvgen/rvgen/dot2c.py | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/tools/verification/rvgen/rvgen/dot2c.py b/tools/verification/rvgen/rvgen/dot2c.py
index 6009caf568d9..abc0ee569b34 100644
--- a/tools/verification/rvgen/rvgen/dot2c.py
+++ b/tools/verification/rvgen/rvgen/dot2c.py
@@ -152,29 +152,22 @@ class Dot2c(Automata):
max_state_name = max(self.states, key = len).__len__()
return max(max_state_name, self.invalid_state_str.__len__())
- def __get_state_string_length(self):
- maxlen = self.__get_max_strlen_of_states() + self.enum_suffix.__len__()
- return "%" + str(maxlen) + "s"
-
def get_aut_init_function(self):
nr_states = self.states.__len__()
nr_events = self.events.__len__()
buff = []
- strformat = self.__get_state_string_length()
-
for x in range(nr_states):
- line = "\t\t{ "
+ buff.append("\t\t{")
for y in range(nr_events):
next_state = self.function[x][y]
if next_state != self.invalid_state_str:
next_state = self.function[x][y] + self.enum_suffix
if y != nr_events-1:
- line = line + strformat % next_state + ", "
+ buff.append(''.join(("\t\t\t", next_state, ",")))
else:
- line = line + strformat % next_state + " },"
- buff.append(line)
+ buff.append(''.join(("\t\t\t", next_state, "\n\t\t},")))
return self.__buff_to_string(buff)
--
2.39.5
Best regards,
Nam
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state
2025-05-19 8:42 ` Nam Cao
@ 2025-05-19 9:04 ` Gabriele Monaco
0 siblings, 0 replies; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-19 9:04 UTC (permalink / raw)
To: Nam Cao
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli
On Mon, 2025-05-19 at 10:42 +0200, Nam Cao wrote:
> On Wed, May 14, 2025 at 10:43:09AM +0200, Gabriele Monaco wrote:
> > .function = {
> > - { thread_context_sco,
> > scheduling_context_sco, INVALID_STATE },
> > - { INVALID_STATE,
> > INVALID_STATE, thread_context_sco },
> > + { thread_context_sco, INVALID_STATE,
> > scheduling_context_sco, INVALID_STATE },
> > + { INVALID_STATE,
> > scheduling_context_sco, INVALID_STATE,
> > thread_context_sco },
>
> This is over the 100 column limit.
>
> I know it is not your fault, this is generated. Back when I was
> playing
> with DA monitor, I made a patch to fix this. Maybe you could include
> it in
> your series?
>
> From b4fb648398a29a9c0d8e08bd12394978d3948a5e Mon Sep 17 00:00:00
> 2001
> From: Nam Cao <namcao@linutronix.de>
> Date: Fri, 15 Nov 2024 14:56:33 +0100
> Subject: [PATCH] tools/rv/dot2c: Fix generated files going over 100
> column
> limit
>
> The dot2c.py script generates all states in a single line. This
> breaks the
> 100 column limit when the state machines are non-trivial.
>
> Change dot2c.py to generate the states in separate lines.
>
> Signed-off-by: Nam Cao <namcao@linutronix.de>
> ---
> tools/verification/rvgen/rvgen/dot2c.py | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/tools/verification/rvgen/rvgen/dot2c.py
> b/tools/verification/rvgen/rvgen/dot2c.py
> index 6009caf568d9..abc0ee569b34 100644
> --- a/tools/verification/rvgen/rvgen/dot2c.py
> +++ b/tools/verification/rvgen/rvgen/dot2c.py
> @@ -152,29 +152,22 @@ class Dot2c(Automata):
> max_state_name = max(self.states, key = len).__len__()
> return max(max_state_name, self.invalid_state_str.__len__())
>
> - def __get_state_string_length(self):
> - maxlen = self.__get_max_strlen_of_states() +
> self.enum_suffix.__len__()
> - return "%" + str(maxlen) + "s"
> -
> def get_aut_init_function(self):
> nr_states = self.states.__len__()
> nr_events = self.events.__len__()
> buff = []
>
> - strformat = self.__get_state_string_length()
> -
> for x in range(nr_states):
> - line = "\t\t{ "
> + buff.append("\t\t{")
> for y in range(nr_events):
> next_state = self.function[x][y]
> if next_state != self.invalid_state_str:
> next_state = self.function[x][y] +
> self.enum_suffix
>
> if y != nr_events-1:
> - line = line + strformat % next_state + ", "
> + buff.append(''.join(("\t\t\t", next_state,
> ",")))
> else:
> - line = line + strformat % next_state + " },"
> - buff.append(line)
> + buff.append(''.join(("\t\t\t", next_state,
> "\n\t\t},")))
>
> return self.__buff_to_string(buff)
>
Thanks for bringing this up, I'm a bit undecided on this one..
The nice thing of the current representation is that it shows a matrix
and it's relatively easy to see what each event does.
On the other hand it's true larger models do exceed quite a bit the
size limits and considering you aren't really supposed to touch this
file directly (as the script does it for you), perhaps cleaner C code
should be the priority.
I'll play with your patch and see if it negatively affects the workflow
in any way. If not, I'd include it and adapt the monitors (perhaps only
those with long lines, not really need to change all).
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
@ 2025-05-27 13:37 ` Nam Cao
2025-05-27 14:35 ` Gabriele Monaco
0 siblings, 1 reply; 16+ messages in thread
From: Nam Cao @ 2025-05-27 13:37 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli
On Wed, May 14, 2025 at 10:43:14AM +0200, Gabriele Monaco wrote:
> Add a per-cpu monitor as part of the sched model:
> * opid: operations with preemption and irq disabled
> Monitor to ensure wakeup and need_resched occur with irq and
> preemption disabled or in irq handlers.
This monitor reports some warnings:
$ perf record -e rv:error_opid --call-graph dwarf -a -- ./stress-epoll
(stress-epoll program from
https://github.com/rouming/test-tools/blob/master/stress-epoll.c)
$ perf script
stress-epoll 315 [003] 527.674724: rv:error_opid: event preempt_disable not expected in the state preempt_disabled
ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
ffffffff9fdfba0d handle_preempt_disable+0x3d ([kernel.kallsyms])
ffffffff9fdd32d0 __traceiter_preempt_disable+0x30 ([kernel.kallsyms])
ffffffff9fdd38fe trace_preempt_off+0x4e ([kernel.kallsyms])
ffffffff9fee6c1c vfs_write+0x12c ([kernel.kallsyms])
ffffffff9fee7128 ksys_write+0x68 ([kernel.kallsyms])
ffffffffa0bdbd92 do_syscall_64+0xb2 ([kernel.kallsyms])
ffffffff9fa00130 entry_SYSCALL_64_after_hwframe+0x77 ([kernel.kallsyms])
f833f __GI___libc_write+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
f833f __GI___libc_write+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
1937 thread_work+0x47 (/root/test-tools/stress-epoll)
891f4 start_thread+0x304 (/usr/lib/x86_64-linux-gnu/libc.so.6)
10989b clone3+0x2b (/usr/lib/x86_64-linux-gnu/libc.so.6)
stress-epoll 318 [002] 527.674759: rv:error_opid: event preempt_disable not expected in the state disabled
ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
ffffffff9fdfba0d handle_preempt_disable+0x3d ([kernel.kallsyms])
ffffffff9fdd32d0 __traceiter_preempt_disable+0x30 ([kernel.kallsyms])
ffffffff9fdd38fe trace_preempt_off+0x4e ([kernel.kallsyms])
ffffffffa0bec1aa _raw_spin_lock_irq+0x1a ([kernel.kallsyms])
ffffffff9ff4fe73 eventfd_write+0x63 ([kernel.kallsyms])
ffffffff9fee6be5 vfs_write+0xf5 ([kernel.kallsyms])
ffffffff9fee7128 ksys_write+0x68 ([kernel.kallsyms])
ffffffffa0bdbd92 do_syscall_64+0xb2 ([kernel.kallsyms])
ffffffff9fa00130 entry_SYSCALL_64_after_hwframe+0x77 ([kernel.kallsyms])
f833f __GI___libc_write+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
f833f __GI___libc_write+0x4f (/usr/lib/x86_64-linux-gnu/libc.so.6)
1937 thread_work+0x47 (/root/test-tools/stress-epoll)
891f4 start_thread+0x304 (/usr/lib/x86_64-linux-gnu/libc.so.6)
10989b clone3+0x2b (/usr/lib/x86_64-linux-gnu/libc.so.6)
I'm not sure what I'm looking at here. Do you think these are kernel bugs,
or the monitor is missing some corner cases?
Best regards,
Nam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor
2025-05-27 13:37 ` Nam Cao
@ 2025-05-27 14:35 ` Gabriele Monaco
2025-05-27 14:50 ` Nam Cao
0 siblings, 1 reply; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-27 14:35 UTC (permalink / raw)
To: Nam Cao
Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli
On Tue, 2025-05-27 at 15:37 +0200, Nam Cao wrote:
> On Wed, May 14, 2025 at 10:43:14AM +0200, Gabriele Monaco wrote:
> > Add a per-cpu monitor as part of the sched model:
> > * opid: operations with preemption and irq disabled
> > Monitor to ensure wakeup and need_resched occur with irq and
> > preemption disabled or in irq handlers.
>
> This monitor reports some warnings:
>
> $ perf record -e rv:error_opid --call-graph dwarf -a -- ./stress-
> epoll
> (stress-epoll program from
> https://github.com/rouming/test-tools/blob/master/stress-epoll.c)
>
Thanks for trying it out, and good to know about this stressor.
Unfortunately it's a bit hard to understand from this stack trace, but
that's very likely a problem in the model.
I have a few ideas where that could be but I believe it's something
visible only on a physical machine (haven't tested much on x86 bare
metal, only VM).
You're running on bare metal right?
> $ perf script
> stress-epoll 315 [003] 527.674724: rv:error_opid: event
> preempt_disable not expected in the state preempt_disabled
> ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
> ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
> ffffffff9fdfba0d handle_preempt_disable+0x3d
> ([kernel.kallsyms])
> ffffffff9fdd32d0 __traceiter_preempt_disable+0x30
> ([kernel.kallsyms])
> ffffffff9fdd38fe trace_preempt_off+0x4e ([kernel.kallsyms])
> ffffffff9fee6c1c vfs_write+0x12c ([kernel.kallsyms])
> ffffffff9fee7128 ksys_write+0x68 ([kernel.kallsyms])
> ffffffffa0bdbd92 do_syscall_64+0xb2 ([kernel.kallsyms])
> ffffffff9fa00130 entry_SYSCALL_64_after_hwframe+0x77
> ([kernel.kallsyms])
> f833f __GI___libc_write+0x4f (/usr/lib/x86_64-
> linux-gnu/libc.so.6)
> f833f __GI___libc_write+0x4f (/usr/lib/x86_64-
> linux-gnu/libc.so.6)
> 1937 thread_work+0x47 (/root/test-tools/stress-
> epoll)
> 891f4 start_thread+0x304 (/usr/lib/x86_64-linux-
> gnu/libc.so.6)
> 10989b clone3+0x2b (/usr/lib/x86_64-linux-
> gnu/libc.so.6)
>
> stress-epoll 318 [002] 527.674759: rv:error_opid: event
> preempt_disable not expected in the state disabled
> ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
> ffffffff9fdfb34f da_event_opid+0x10f ([kernel.kallsyms])
> ffffffff9fdfba0d handle_preempt_disable+0x3d
> ([kernel.kallsyms])
> ffffffff9fdd32d0 __traceiter_preempt_disable+0x30
> ([kernel.kallsyms])
> ffffffff9fdd38fe trace_preempt_off+0x4e ([kernel.kallsyms])
> ffffffffa0bec1aa _raw_spin_lock_irq+0x1a ([kernel.kallsyms])
> ffffffff9ff4fe73 eventfd_write+0x63 ([kernel.kallsyms])
> ffffffff9fee6be5 vfs_write+0xf5 ([kernel.kallsyms])
> ffffffff9fee7128 ksys_write+0x68 ([kernel.kallsyms])
> ffffffffa0bdbd92 do_syscall_64+0xb2 ([kernel.kallsyms])
> ffffffff9fa00130 entry_SYSCALL_64_after_hwframe+0x77
> ([kernel.kallsyms])
> f833f __GI___libc_write+0x4f (/usr/lib/x86_64-
> linux-gnu/libc.so.6)
> f833f __GI___libc_write+0x4f (/usr/lib/x86_64-
> linux-gnu/libc.so.6)
> 1937 thread_work+0x47 (/root/test-tools/stress-
> epoll)
> 891f4 start_thread+0x304 (/usr/lib/x86_64-linux-
> gnu/libc.so.6)
> 10989b clone3+0x2b (/usr/lib/x86_64-linux-
> gnu/libc.so.6)
>
> I'm not sure what I'm looking at here. Do you think these are kernel
> bugs,
> or the monitor is missing some corner cases?
>
As said, likely a missing corner case, I believe it has to do with IRQs
(which is what makes this monitor more complex than it could be).
Thanks for the pointers, I'll try reproduce it this way.
Gabriele
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor
2025-05-27 14:35 ` Gabriele Monaco
@ 2025-05-27 14:50 ` Nam Cao
2025-05-28 11:27 ` Gabriele Monaco
0 siblings, 1 reply; 16+ messages in thread
From: Nam Cao @ 2025-05-27 14:50 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli
On Tue, May 27, 2025 at 04:35:04PM +0200, Gabriele Monaco wrote:
> Thanks for trying it out, and good to know about this stressor.
> Unfortunately it's a bit hard to understand from this stack trace, but
> that's very likely a problem in the model. I have a few ideas where that
> could be but I believe it's something visible only on a physical machine
> (haven't tested much on x86 bare metal, only VM).
>
> You're running on bare metal right?
No, it's QEMU:
qemu-system-x86_64 -enable-kvm -m 2048 -smp 4 \
-nographic \
-drive if=virtio,format=raw,file=bookworm.img \
-kernel /srv/work/namcao/linux/arch/x86/boot/bzImage \
-append "console=ttyS0 root=/dev/vda rw" \
The kernel is just x86 defconfig + the monitors.
Nam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor
2025-05-27 14:50 ` Nam Cao
@ 2025-05-28 11:27 ` Gabriele Monaco
0 siblings, 0 replies; 16+ messages in thread
From: Gabriele Monaco @ 2025-05-28 11:27 UTC (permalink / raw)
To: Nam Cao
Cc: linux-kernel, Steven Rostedt, linux-trace-kernel, linux-doc,
Ingo Molnar, Peter Zijlstra, Tomas Glozar, Juri Lelli
On Tue, 2025-05-27 at 16:50 +0200, Nam Cao wrote:
> On Tue, May 27, 2025 at 04:35:04PM +0200, Gabriele Monaco wrote:
> > Thanks for trying it out, and good to know about this stressor.
> > Unfortunately it's a bit hard to understand from this stack trace,
> > but
> > that's very likely a problem in the model. I have a few ideas
> > where that
> > could be but I believe it's something visible only on a physical
> > machine
> > (haven't tested much on x86 bare metal, only VM).
> >
> > You're running on bare metal right?
>
> No, it's QEMU:
>
> qemu-system-x86_64 -enable-kvm -m 2048 -smp 4 \
> -nographic \
> -drive if=virtio,format=raw,file=bookworm.img \
> -kernel /srv/work/namcao/linux/arch/x86/boot/bzImage \
> -append "console=ttyS0 root=/dev/vda rw" \
>
> The kernel is just x86 defconfig + the monitors.
>
Apparently the error is visible on non-PREEMPT_RT only, the models are
designed for preempt-rt and I didn't really test them elsewhere.
Not sure if it's worth tailoring them for non RT kernels, but for now I
can just mark those monitors as RT-only via Kconfig.
Especially this type of monitors is describing very accurately the
preemptiveness of some events, I wouldn't be too surprised if some
rules don't hold in all preempt configurations.
The idea is that, as long as the models stand true, some assumptions
about latency can be made, on the long run this type of assumptions is
likely different across preemption models.
That said, it might be a stupid mistake as well, so I'd look into that
more closely ;)
Thanks again,
Gabriele
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
@ 2025-06-24 7:36 ` Nam Cao
2025-06-24 14:44 ` Gabriele Monaco
0 siblings, 1 reply; 16+ messages in thread
From: Nam Cao @ 2025-06-24 7:36 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli, john.ogness
On Wed, May 14, 2025 at 10:43:11AM +0200, Gabriele Monaco wrote:
> diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h b/kernel/trace/rv/monitors/sts/sts_trace.h
> similarity index 67%
> rename from kernel/trace/rv/monitors/tss/tss_trace.h
> rename to kernel/trace/rv/monitors/sts/sts_trace.h
> index 4619dbb50cc0..d78beb58d5b3 100644
> --- a/kernel/trace/rv/monitors/tss/tss_trace.h
> +++ b/kernel/trace/rv/monitors/sts/sts_trace.h
> @@ -4,12 +4,12 @@
> * Snippet to be included in rv_trace.h
> */
>
> -#ifdef CONFIG_RV_MON_TSS
> -DEFINE_EVENT(event_da_monitor, event_tss,
> +#ifdef CONFIG_RV_MON_STS
> +DEFINE_EVENT(event_da_monitor, event_sts,
> TP_PROTO(char *state, char *event, char *next_state, bool final_state),
> TP_ARGS(state, event, next_state, final_state));
>
> -DEFINE_EVENT(error_da_monitor, error_tss,
> +DEFINE_EVENT(error_da_monitor, error_sts,
> TP_PROTO(char *state, char *event),
> TP_ARGS(state, event));
> -#endif /* CONFIG_RV_MON_TSS */
> +#endif /* CONFIG_RV_MON_STS */
You are changing the tracepoint's name. Should we worry about breaking
userspace?
It probably doesn't matter at the moment, because I doubt anyone is really
relying on this tracepoint. But I think we should have a definite stance on
this, for future references.
I have seen tracepoints being changed (I know of [1][2][3], I was one of
them :P), so it seems to be considered okay. But adding userspace tools to
the equation and it doesn't make sense to me. For example, lttng is using
the page_fault tracepoints [4], which is broken by [3].
If this should be stable user API, then we should starting thinking about
better API which allows changes like this to happen. Otherwise, they should
be clearly documented to be unstable.
(I think I may also need to change my rtapp's tracepoint names at some point
in the future, that's why I am asking)
Best regards,
Nam
[1] commit dbb6ecb328cb ("btrfs: tracepoints: simplify raid56 events")
[2] commit 244132c4e577 ("tracing/timers: Rename the hrtimer_init event to hrtimer_setup")
[3] https://lore.kernel.org/lkml/2dda8c03-072a-43b2-af0c-bb996d64c388@cs.wisc.edu/#t
[4] https://github.com/lttng/lttng-modules/blob/master/include/instrumentation/events/arch/x86/exceptions.h#L88C48-L88C63
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
2025-06-24 7:36 ` Nam Cao
@ 2025-06-24 14:44 ` Gabriele Monaco
2025-06-24 15:50 ` Nam Cao
0 siblings, 1 reply; 16+ messages in thread
From: Gabriele Monaco @ 2025-06-24 14:44 UTC (permalink / raw)
To: Nam Cao
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli, john.ogness
On Tue, 2025-06-24 at 09:36 +0200, Nam Cao wrote:
> On Wed, May 14, 2025 at 10:43:11AM +0200, Gabriele Monaco wrote:
> > diff --git a/kernel/trace/rv/monitors/tss/tss_trace.h
> > b/kernel/trace/rv/monitors/sts/sts_trace.h
> > similarity index 67%
> > rename from kernel/trace/rv/monitors/tss/tss_trace.h
> > rename to kernel/trace/rv/monitors/sts/sts_trace.h
> > index 4619dbb50cc0..d78beb58d5b3 100644
> > --- a/kernel/trace/rv/monitors/tss/tss_trace.h
> > +++ b/kernel/trace/rv/monitors/sts/sts_trace.h
> > @@ -4,12 +4,12 @@
> > * Snippet to be included in rv_trace.h
> > */
> >
> > -#ifdef CONFIG_RV_MON_TSS
> > -DEFINE_EVENT(event_da_monitor, event_tss,
> > +#ifdef CONFIG_RV_MON_STS
> > +DEFINE_EVENT(event_da_monitor, event_sts,
> > TP_PROTO(char *state, char *event, char *next_state,
> > bool final_state),
> > TP_ARGS(state, event, next_state, final_state));
> >
> > -DEFINE_EVENT(error_da_monitor, error_tss,
> > +DEFINE_EVENT(error_da_monitor, error_sts,
> > TP_PROTO(char *state, char *event),
> > TP_ARGS(state, event));
> > -#endif /* CONFIG_RV_MON_TSS */
> > +#endif /* CONFIG_RV_MON_STS */
>
> You are changing the tracepoint's name. Should we worry about
> breaking
> userspace?
>
Well, to be extremely picky, although that's what git shows, I'm not
changing tracepoints names, I'm removing tracepoints and adding similar
ones with different names.
That said, you're bringing a very good point, I guess removing/adding
monitors is going to be something quite common in the near future.
> It probably doesn't matter at the moment, because I doubt anyone is
> really
> relying on this tracepoint. But I think we should have a definite
> stance on
> this, for future references.
>
> I have seen tracepoints being changed (I know of [1][2][3], I was one
> of
> them :P), so it seems to be considered okay. But adding userspace
> tools to
> the equation and it doesn't make sense to me. For example, lttng is
> using
> the page_fault tracepoints [4], which is broken by [3].
>
> If this should be stable user API, then we should starting thinking
> about
> better API which allows changes like this to happen. Otherwise, they
> should
> be clearly documented to be unstable.
>
> (I think I may also need to change my rtapp's tracepoint names at
> some point
> in the future, that's why I am asking)
>
As you mentioned, nobody is likely relying on those tracepoints names
at the moment, but I would rather be cautious basing userspace tools on
some monitors to exist at all.
In my opinion, RV tracepoints are useful as an alternative of
reactors/rv userspace tool but cannot be used without considering the
RV interface itself (e.g. available_monitors and friends).
We could at least stick to the following assumptions:
1. monitors can change names, be added or removed
2. tracepoints are consistent to monitor names (in available_monitors)
3. the tracepoint structure does not change (i.e. event_/error_, args.)
(can change for new monitors types where seen fit)
If in the future we allow the possibility to build RV monitors as BPF
programs, we'd probably also allow monitors without tracepoints at all,
but I'd say for now those assumptions are sensible.
What do you think?
Thanks,
Gabriele
> Best regards,
> Nam
>
> [1] commit dbb6ecb328cb ("btrfs: tracepoints: simplify raid56
> events")
> [2] commit 244132c4e577 ("tracing/timers: Rename the hrtimer_init
> event to hrtimer_setup")
> [3]
> https://lore.kernel.org/lkml/2dda8c03-072a-43b2-af0c-bb996d64c388@cs.wisc.edu/#t
> [4]
> https://github.com/lttng/lttng-modules/blob/master/include/instrumentation/events/arch/x86/exceptions.h#L88C48-L88C63
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
2025-06-24 14:44 ` Gabriele Monaco
@ 2025-06-24 15:50 ` Nam Cao
2025-06-24 19:31 ` Steven Rostedt
0 siblings, 1 reply; 16+ messages in thread
From: Nam Cao @ 2025-06-24 15:50 UTC (permalink / raw)
To: Gabriele Monaco
Cc: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli, john.ogness
On Tue, Jun 24, 2025 at 04:44:49PM +0200, Gabriele Monaco wrote:
> As you mentioned, nobody is likely relying on those tracepoints names
> at the moment, but I would rather be cautious basing userspace tools on
> some monitors to exist at all.
>
> In my opinion, RV tracepoints are useful as an alternative of
> reactors/rv userspace tool but cannot be used without considering the
> RV interface itself (e.g. available_monitors and friends).
>
> We could at least stick to the following assumptions:
> 1. monitors can change names, be added or removed
> 2. tracepoints are consistent to monitor names (in available_monitors)
> 3. the tracepoint structure does not change (i.e. event_/error_, args.)
> (can change for new monitors types where seen fit)
>
> If in the future we allow the possibility to build RV monitors as BPF
> programs, we'd probably also allow monitors without tracepoints at all,
> but I'd say for now those assumptions are sensible.
>
> What do you think?
I would like that. Ideally, the userspace tools only use tracepoints based
on available_monitors.
However, people may not do that, and just use tracepoints directly.
You could argue that those tools are not correctly designed. Therefore it
is their fault that the tools are broken after updating kernel.
On the other hand, there is this sentiment that we must never break
userspace.
I don't know enough to judge this. Maybe @Steven has something to add?
Best regards,
Nam
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
2025-06-24 15:50 ` Nam Cao
@ 2025-06-24 19:31 ` Steven Rostedt
2025-06-27 15:02 ` Nam Cao
0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2025-06-24 19:31 UTC (permalink / raw)
To: Nam Cao
Cc: Gabriele Monaco, linux-kernel, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli, john.ogness
On Tue, 24 Jun 2025 17:50:53 +0200
Nam Cao <namcao@linutronix.de> wrote:
> I would like that. Ideally, the userspace tools only use tracepoints based
> on available_monitors.
>
> However, people may not do that, and just use tracepoints directly.
>
> You could argue that those tools are not correctly designed. Therefore it
> is their fault that the tools are broken after updating kernel.
>
> On the other hand, there is this sentiment that we must never break
> userspace.
>
> I don't know enough to judge this. Maybe @Steven has something to add?
So WRT tracepoints, it's the same as a tree falling in the woods[1].
If a user space ABI "breaks" but no user space tooling notices, did it
really break?
The answer is "No".
As for tracepoints, its fine to change them until it's not ;-)
We had only one case that a tracepoint change broke user space where
Linus reverted that change [2]. That was because powertop hard coded
the addresses to the tracepoint field offsets and didn't use the format
files (what libtraceevent gives you). And I removed an unused common
field, which shifted everything and broke powertop.
But I converted powertop to use libtraceevent, waited a few year until
all the major distros provided the new powertop, and then I removed the
field. Guess what? Nobody noticed. (And old powertop would still break).
BPF taps into most tracepoints that change all the time. I'm cleaning
up unused tracepoints which included a couple that were left around to
"not break old BPF programs". I replied, if an old BPF program relies on
that tracepoint, keeping it around (but not used) is *worse* than
having that BPF program break. That's because that BPF program is
still broken (it's expecting that unused tracepoint to fire) but now
it's getting garbage for output (that being no output!). It's worse
because it's "silently failing" and the user may be relying on
something they don't know is broken.
So yeah, change the tracepoint when the code its tracing changes. That
way any tooling depending on it, knows that it can no longer depend on
it.
Anything using tracepoints are pretty much tied to the kernel anyway,
and when the kernel updates, the tooling that is relying on it also
needs to be updated otherwise it's not getting the information it is
expecting. That most definitely includes monitors.
-- Steve
[1] https://en.wikipedia.org/wiki/If_a_tree_falls_in_a_forest_and_no_one_is_around_to_hear_it,_does_it_make_a_sound%3F
[2] https://lwn.net/Articles/442113/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts
2025-06-24 19:31 ` Steven Rostedt
@ 2025-06-27 15:02 ` Nam Cao
0 siblings, 0 replies; 16+ messages in thread
From: Nam Cao @ 2025-06-27 15:02 UTC (permalink / raw)
To: Steven Rostedt
Cc: Gabriele Monaco, linux-kernel, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc, Ingo Molnar, Peter Zijlstra,
Tomas Glozar, Juri Lelli, john.ogness
On Tue, Jun 24, 2025 at 03:31:25PM -0400, Steven Rostedt wrote:
> So WRT tracepoints, it's the same as a tree falling in the woods[1].
>
> If a user space ABI "breaks" but no user space tooling notices, did it
> really break?
>
> The answer is "No".
>
> As for tracepoints, its fine to change them until it's not ;-)
>
> We had only one case that a tracepoint change broke user space where
> Linus reverted that change [2]. That was because powertop hard coded
> the addresses to the tracepoint field offsets and didn't use the format
> files (what libtraceevent gives you). And I removed an unused common
> field, which shifted everything and broke powertop.
>
> But I converted powertop to use libtraceevent, waited a few year until
> all the major distros provided the new powertop, and then I removed the
> field. Guess what? Nobody noticed. (And old powertop would still break).
>
> BPF taps into most tracepoints that change all the time. I'm cleaning
> up unused tracepoints which included a couple that were left around to
> "not break old BPF programs". I replied, if an old BPF program relies on
> that tracepoint, keeping it around (but not used) is *worse* than
> having that BPF program break. That's because that BPF program is
> still broken (it's expecting that unused tracepoint to fire) but now
> it's getting garbage for output (that being no output!). It's worse
> because it's "silently failing" and the user may be relying on
> something they don't know is broken.
>
> So yeah, change the tracepoint when the code its tracing changes. That
> way any tooling depending on it, knows that it can no longer depend on
> it.
>
> Anything using tracepoints are pretty much tied to the kernel anyway,
> and when the kernel updates, the tooling that is relying on it also
> needs to be updated otherwise it's not getting the information it is
> expecting. That most definitely includes monitors.
Alright, thanks for sharing. That was an interesting discussion you had
back then.
Let me keep an eye out for any user tools based on RV, making sure they use
our API properly.
Best regards,
Nam
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-06-27 15:02 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250514084314.57976-1-gmonaco@redhat.com>
2025-05-14 8:43 ` [RFC PATCH v2 07/12] rv: Adapt the sco monitor to the new set_state Gabriele Monaco
2025-05-19 8:42 ` Nam Cao
2025-05-19 9:04 ` Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 08/12] rv: Extend and adapt snroc model Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 09/12] rv: Replace tss monitor with more complete sts Gabriele Monaco
2025-06-24 7:36 ` Nam Cao
2025-06-24 14:44 ` Gabriele Monaco
2025-06-24 15:50 ` Nam Cao
2025-06-24 19:31 ` Steven Rostedt
2025-06-27 15:02 ` Nam Cao
2025-05-14 8:43 ` [RFC PATCH v2 11/12] rv: Add nrp and sssw per-task monitors Gabriele Monaco
2025-05-14 8:43 ` [RFC PATCH v2 12/12] rv: Add opid per-cpu monitor Gabriele Monaco
2025-05-27 13:37 ` Nam Cao
2025-05-27 14:35 ` Gabriele Monaco
2025-05-27 14:50 ` Nam Cao
2025-05-28 11:27 ` Gabriele Monaco
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).