* [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
@ 2025-09-19 14:09 ` Gabriele Monaco
2025-10-02 9:26 ` Nam Cao
2025-09-19 14:09 ` [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata Gabriele Monaco
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Gabriele Monaco @ 2025-09-19 14:09 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, linux-trace-kernel,
linux-doc
Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
Clark Williams, John Kacur
Previous changes refactored the da_monitor header file to avoid using
macros. This implies a few changes in how to import and use da_monitor
helpers:
DECLARE_DA_MON_<TYPE>(name, type) is substituted by
#define RV_MON_TYPE RV_MON_<TYPE>
da_handle_event_<name>() is substituted by
da_handle_event()
Update the documentation to reflect the changes.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_synthesis.rst | 44 ++++++++++----------
1 file changed, 21 insertions(+), 23 deletions(-)
diff --git a/Documentation/trace/rv/monitor_synthesis.rst b/Documentation/trace/rv/monitor_synthesis.rst
index ac808a7554f5..ce0c1a5104d4 100644
--- a/Documentation/trace/rv/monitor_synthesis.rst
+++ b/Documentation/trace/rv/monitor_synthesis.rst
@@ -100,54 +100,52 @@ rv/da_monitor.h
This initial implementation presents three different types of monitor instances:
-- ``#define DECLARE_DA_MON_GLOBAL(name, type)``
-- ``#define DECLARE_DA_MON_PER_CPU(name, type)``
-- ``#define DECLARE_DA_MON_PER_TASK(name, type)``
+- ``#define RV_MON_TYPE RV_MON_GLOBAL``
+- ``#define RV_MON_TYPE RV_MON_PER_CPU``
+- ``#define RV_MON_TYPE RV_MON_PER_TASK``
-The first declares the functions for a global deterministic automata monitor,
-the second for monitors with per-cpu instances, and the third with per-task
-instances.
+The first sets up functions declaration for a global deterministic automata
+monitor, the second for monitors with per-cpu instances, and the third with
+per-task instances.
-In all cases, the 'name' argument is a string that identifies the monitor, and
-the 'type' argument is the data type used by rvgen on the representation of
-the model in C.
+In all cases, the C file must include the $(MODEL_NAME).h file (generated by
+`rvgen`), for example, to define the per-cpu 'wip' monitor, the `wip.c` source
+file must include::
-For example, the wip model with two states and three events can be
-stored in an 'unsigned char' type. Considering that the preemption control
-is a per-cpu behavior, the monitor declaration in the 'wip.c' file is::
-
- DECLARE_DA_MON_PER_CPU(wip, unsigned char);
+ #define RV_MON_TYPE RV_MON_PER_CPU
+ #include "wip.h"
+ #include <rv/da_monitor.h>
The monitor is executed by sending events to be processed via the functions
presented below::
- da_handle_event_$(MONITOR_NAME)($(event from event enum));
- da_handle_start_event_$(MONITOR_NAME)($(event from event enum));
- da_handle_start_run_event_$(MONITOR_NAME)($(event from event enum));
+ da_handle_event($(event from event enum));
+ da_handle_start_event($(event from event enum));
+ da_handle_start_run_event($(event from event enum));
-The function ``da_handle_event_$(MONITOR_NAME)()`` is the regular case where
+The function ``da_handle_event()`` is the regular case where
the event will be processed if the monitor is processing events.
When a monitor is enabled, it is placed in the initial state of the automata.
However, the monitor does not know if the system is in the *initial state*.
-The ``da_handle_start_event_$(MONITOR_NAME)()`` function is used to notify the
+The ``da_handle_start_event()`` function is used to notify the
monitor that the system is returning to the initial state, so the monitor can
start monitoring the next event.
-The ``da_handle_start_run_event_$(MONITOR_NAME)()`` function is used to notify
+The ``da_handle_start_run_event()`` function is used to notify
the monitor that the system is known to be in the initial state, so the
monitor can start monitoring and monitor the current event.
Using the wip model as example, the events "preempt_disable" and
"sched_waking" should be sent to monitor, respectively, via [2]::
- da_handle_event_wip(preempt_disable_wip);
- da_handle_event_wip(sched_waking_wip);
+ da_handle_event(preempt_disable_wip);
+ da_handle_event(sched_waking_wip);
While the event "preempt_enabled" will use::
- da_handle_start_event_wip(preempt_enable_wip);
+ da_handle_start_event(preempt_enable_wip);
To notify the monitor that the system will be returning to the initial state,
so the system and the monitor should be in sync.
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
2025-09-19 14:09 ` [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring Gabriele Monaco
@ 2025-09-19 14:09 ` Gabriele Monaco
2025-10-10 13:46 ` Nam Cao
2025-09-19 14:09 ` [PATCH v2 14/20] rv: Add sample hybrid monitors stall Gabriele Monaco
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Gabriele Monaco @ 2025-09-19 14:09 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, linux-trace-kernel,
linux-doc
Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
Clark Williams, John Kacur
Describe theory and implementation of hybrid automata in the dedicated
page hybrid_automata.rst
Include a section on how to integrate a hybrid automaton in
monitor_synthesis.rst
Also remove a hanging $ in deterministic_automata.rst
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
.../trace/rv/deterministic_automata.rst | 2 +-
Documentation/trace/rv/hybrid_automata.rst | 340 ++++++++++++++++++
Documentation/trace/rv/index.rst | 1 +
Documentation/trace/rv/monitor_synthesis.rst | 107 ++++++
4 files changed, 449 insertions(+), 1 deletion(-)
create mode 100644 Documentation/trace/rv/hybrid_automata.rst
diff --git a/Documentation/trace/rv/deterministic_automata.rst b/Documentation/trace/rv/deterministic_automata.rst
index d0638f95a455..7a1c2b20ec72 100644
--- a/Documentation/trace/rv/deterministic_automata.rst
+++ b/Documentation/trace/rv/deterministic_automata.rst
@@ -11,7 +11,7 @@ where:
- *E* is the finite set of events;
- x\ :subscript:`0` is the initial state;
- X\ :subscript:`m` (subset of *X*) is the set of marked (or final) states.
-- *f* : *X* x *E* -> *X* $ is the transition function. It defines the state
+- *f* : *X* x *E* -> *X* is the transition function. It defines the state
transition in the occurrence of an event from *E* in the state *X*. In the
special case of deterministic automata, the occurrence of the event in *E*
in a state in *X* has a deterministic next state from *X*.
diff --git a/Documentation/trace/rv/hybrid_automata.rst b/Documentation/trace/rv/hybrid_automata.rst
new file mode 100644
index 000000000000..ea701114c54d
--- /dev/null
+++ b/Documentation/trace/rv/hybrid_automata.rst
@@ -0,0 +1,340 @@
+Hybrid Automata
+===============
+
+Hybrid automata are an extension of deterministic automata, there are several
+definitions of hybrid automata in the literature. The adaptation implemented
+here is formally denoted by G and defined as a 7-tuple:
+
+ *G* = { *X*, *E*, *V*, *f*, x\ :subscript:`0`, X\ :subscript:`m`, *i* }
+
+- *X* is the set of states;
+- *E* is the finite set of events;
+- *V* is the finite set of environment variables;
+- x\ :subscript:`0` is the initial state;
+- X\ :subscript:`m` (subset of *X*) is the set of marked (or final) states.
+- *f* : *X* x *E* x *C(V)* -> *X* is the transition function.
+ It defines the state transition in the occurrence of an event from *E* in the
+ state *X*. Unlike deterministic automata, the transition function also
+ includes guards from the set of all possible constraints (defined as *C(V)*).
+ Guards can be true or false with the valuation of *V* when the event occurs,
+ and the transition is possible only when constraints are true. Similarly to
+ deterministic automata, the occurrence of the event in *E* in a state in *X*
+ has a deterministic next state from *X*, if the guard is true.
+- *i* : *X* -> *C'(V)* is the invariant assignment function, this is a
+ constraint assigned to each state in *X*, every state in *X* must be left
+ before the invariant turns to false. We can omit the representation of
+ invariants whose value is true regardless of the valuation of *V*.
+
+The set of all possible constraints *C(V)* is defined according to the
+following grammar:
+
+ g = v < c | v > c | v <= c | v >= c | v == c | v != c | g && g | true
+
+With v a variable in *V* and c a numerical value.
+
+We define the special case of hybrid automata whose variables grow with uniform
+rates as timed automata. In this case, the variables are called clocks.
+As the name implies, timed automata can be used to describe real time.
+Additionally, clocks support another type of guard which always evaluates to true:
+
+ reset(v)
+
+The reset constraint is used to set the value of a clock to 0.
+
+The set of invariant constraints *C'(V)* is a subset of *C(V)* including only
+constraint of the form:
+
+ g = v < c | true
+
+This simplifies the implementation as a clock expiration is a necessary and
+sufficient condition for the violation of invariants while still allowing more
+complex constraints to be specified as guards.
+
+It is important to note that any valid hybrid automaton is a valid
+deterministic automaton with additional guards and invariants. Those can only
+further constrain what transitions are valid but it is not possible to define
+transition functions starting from the same state in *X* and the same event in
+*E* but ending up in different states in *X* based on the valuation of *V*.
+
+Examples
+--------
+
+Wip as hybrid automaton
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The 'wip' (wakeup in preemptive) example introduced as a deterministic automaton
+can also be described as:
+
+- *X* = { ``any_thread_running`` }
+- *E* = { ``sched_waking`` }
+- *V* = { ``preemptive`` }
+- x\ :subscript:`0` = ``any_thread_running``
+- X\ :subscript:`m` = {``any_thread_running``}
+- *f* =
+ - *f*\ (``any_thread_running``, ``sched_waking``, ``preemptive==0``) = ``any_thread_running``
+- *i* =
+ - *i*\ (``any_thread_running``) = ``true``
+
+Which can be represented graphically as::
+
+ |
+ |
+ v
+ #====================# sched_waking;preemptive==0
+ H H ------------------------------+
+ H any_thread_running H |
+ H H <-----------------------------+
+ #====================#
+
+In this example, by using the preemptive state of the system as an environment
+variable, we can assert this constraint on ``sched_waking`` without requiring
+preemption events (as we would in a deterministic automaton), which can be
+useful in case those events are not available or not reliable on the system.
+
+Since all the invariants in *i* are true, we can omit them from the representation.
+
+Stall model with guards (iteration 1)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a sample timed automaton we can define 'stall' as:
+
+- *X* = { ``dequeued``, ``enqueued``, ``running``}
+- *E* = { ``enqueue``, ``dequeue``, ``switch_in``}
+- *V* = { ``clk`` }
+- x\ :subscript:`0` = ``dequeue``
+- X\ :subscript:`m` = {``dequeue``}
+- *f* =
+ - *f*\ (``enqueued``, ``switch_in``, ``clk < threshold``) = ``running``
+ - *f*\ (``running``, ``dequeue``) = ``dequeued``
+ - *f*\ (``dequeued``, ``enqueue``, ``reset(clk)``) = ``enqueued``
+- *i* = *omitted as all true*
+
+Graphically represented as::
+
+ |
+ |
+ v
+ #============================#
+ H dequeued H <+
+ #============================# |
+ | |
+ | enqueue; reset(clk) |
+ v |
+ +----------------------------+ |
+ | enqueued | | dequeue
+ +----------------------------+ |
+ | |
+ | switch_in; clk < threshold |
+ v |
+ +----------------------------+ |
+ | running | -+
+ +----------------------------+
+
+This model imposes that the time between when a task is enqueued (it becomes
+runnable) and when the task gets to run must be lower than a certain threshold.
+A failure in this model means that the task is starving.
+One problem in using guards on the edges in this case is that the model will
+not report a failure until the ``switch_in`` event occurs. This means that,
+according to the model, it is valid for the task never to run.
+
+Stall model with invariants (iteration 2)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The first iteration isn't exactly what was intended, we can change the model as:
+
+- *X* = { ``dequeued``, ``enqueued``, ``running``}
+- *E* = { ``enqueue``, ``dequeue``, ``switch_in``}
+- *V* = { ``clk`` }
+- x\ :subscript:`0` = ``dequeue``
+- X\ :subscript:`m` = {``dequeue``}
+- *f* =
+ - *f*\ (``enqueued``, ``switch_in``) = ``running``
+ - *f*\ (``running``, ``dequeue``) = ``dequeued``
+ - *f*\ (``dequeued``, ``enqueue``, ``reset(clk)``) = ``enqueued``
+- *i* =
+ - *i*\ (``enqueued``) = ``clk < threshold``
+
+Graphically::
+
+ |
+ |
+ v
+ #=========================#
+ H dequeued H <+
+ #=========================# |
+ | |
+ | enqueue; reset(clk) |
+ v |
+ +-------------------------+ |
+ | enqueued | |
+ | clk < threshold | | dequeue
+ +-------------------------+ |
+ | |
+ | switch_in |
+ v |
+ +-------------------------+ |
+ | running | -+
+ +-------------------------+
+
+In this case, we moved the guard as an invariant to the ``enqueued`` state,
+this means we not only forbid the occurrence of ``switch_in`` when ``clk`` is
+past the threshold but also mark as invalid in case we are *still* in
+``enqueued`` after the threshold. This model is effectively in an invalid state
+as soon as a task is starving, rather than when the starving task finally runs.
+
+Hybrid Automaton in C
+---------------------
+
+The definition of hybrid automata in C is heavily based on the deterministic
+automata one. Specifically, we add the set of environment variables and the
+constraints (both guards on transitions and invariants on states) as follows.
+This is a combination of both iterations of the stall example::
+
+ /* enum representation of X (set of states) to be used as index */
+ enum states {
+ dequeued = 0,
+ enqueued,
+ running,
+ state_max
+ };
+
+ #define INVALID_STATE state_max
+
+ /* enum representation of E (set of events) to be used as index */
+ enum events {
+ dequeue = 0,
+ enqueue,
+ switch_in,
+ event_max
+ };
+
+ /* enum representation of V (set of environment variables) to be used as index */
+ enum envs {
+ clk = 0,
+ env_max,
+ env_max_stored = env_max
+ };
+
+ struct automaton {
+ char *state_names[state_max]; // X: the set of states
+ char *event_names[event_max]; // E: the finite set of events
+ char *env_names[env_max]; // V: the finite set of env vars
+ unsigned char function[state_max][event_max]; // f: transition function
+ unsigned char initial_state; // x_0: the initial state
+ bool final_states[state_max]; // X_m: the set of marked states
+ };
+
+ struct automaton aut = {
+ .state_names = {
+ "dequeued",
+ "enqueued",
+ "running"
+ },
+ .event_names = {
+ "dequeue",
+ "enqueue",
+ "switch_in"
+ },
+ .env_names = {
+ "clk"
+ },
+ .function = {
+ { INVALID_STATE, enqueued, INVALID_STATE },
+ { INVALID_STATE, INVALID_STATE, running },
+ { dequeued, INVALID_STATE, INVALID_STATE },
+ },
+ .initial_state = dequeued,
+ .final_states = { 1, 0, 0 },
+ };
+
+ static bool verify_constraint(enum states curr_state, enum events event,
+ enum states next_state)
+ {
+ bool res = true;
+
+ /* Validate guards as part of f */
+ if (curr_state == enqueued && event == sched_switch_in)
+ res = get_env(clk) < threshold;
+ else if (curr_state == dequeued && event == sched_wakeup)
+ reset_env(clk);
+
+ /* Validate invariants in i */
+ if (next_state == curr_state || !res)
+ return res;
+ if (next_state == enqueued)
+ ha_start_timer_jiffy(ha_mon, clk, threshold_jiffies);
+ else if (curr_state == enqueued)
+ res = !ha_cancel_timer(ha_mon);
+ return res;
+ }
+
+The function ``verify_constraint``, here reported as simplified, checks guards,
+performs resets and starts timers to validate invariants according to
+specification.
+Due to the complex nature of environment variables, the user needs to provide
+functions to get and reset environment variables, although we provide some
+helpers for common types (e.g. clocks with ns or jiffy granularity).
+Since invariants are only defined as clock expirations (e.g. *clk <
+threshold*), the callback for timers armed when entering the state is in fact a
+failure in the model and triggers a reaction. Leaving the state stops the timer
+and checks for its expiration, in case the callback was late.
+
+It is important to note that timers introduce overhead, if the monitor has
+several instances (e.g. all tasks) this can become an issue.
+If the monitor is guaranteed to *eventually* leave the state and the incurred
+delay to wait for the next event is acceptable, guards can be use to lower the
+monitor's overhead.
+For instance in the stall example, if we are only interested in reporting
+stalled tasks, rather than reacting as soon as a task is stalled, and
+``switch_in`` is eventually going to occur, we can use the first iteration.
+
+Graphviz .dot format
+--------------------
+
+Also the Graphviz representation of hybrid automata is an extension of the
+deterministic automata one. Specifically, guards can be provided in the event
+name separated by ``;``::
+
+ "state_start" -> "state_dest" [ label = "sched_waking;preemptible==0;reset(clk)" ];
+
+Invariant can be specified in the state label (not the node name!) separated by ``\n``::
+
+ "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
+
+Constraints can be specified as valid C comparisons and allow spaces, the first
+element of the comparison must be the clock while the second is a numerical or
+parametrised value. Guards allow comparisons to be combined with boolean
+operations (``&&`` and ``||``), resets must be separated from other constraints.
+
+This is the full example of the last version of the 'stall' model in DOT::
+
+ digraph state_automaton {
+ {node [shape = circle] "enqueued"};
+ {node [shape = plaintext, style=invis, label=""] "__init_dequeued"};
+ {node [shape = doublecircle] "dequeued"};
+ {node [shape = circle] "running"};
+ "__init_dequeued" -> "dequeued";
+ "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
+ "running" [label = "running"];
+ "dequeued" [label = "dequeued"];
+ "enqueued" -> "running" [ label = "switch_in" ];
+ "running" -> "dequeued" [ label = "dequeue" ];
+ "dequeued" -> "enqueued" [ label = "enqueue;reset(clk)" ];
+ { rank = min ;
+ "__init_dequeued";
+ "dequeued";
+ }
+ }
+
+References
+----------
+
+One book covering model checking and timed automata is::
+
+ Christel Baier and Joost-Pieter Katoen: Principles of Model Checking,
+ The MIT Press, 2008.
+
+Hybrid automata are described in detail in::
+
+ Thomas Henzinger: The theory of hybrid automata,
+ Proceedings 11th Annual IEEE Symposium on Logic in Computer Science, 1996.
diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index a2812ac5cfeb..ad298784bda2 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -9,6 +9,7 @@ Runtime Verification
runtime-verification.rst
deterministic_automata.rst
linear_temporal_logic.rst
+ hybrid_automata.rst
monitor_synthesis.rst
da_monitor_instrumentation.rst
monitor_wip.rst
diff --git a/Documentation/trace/rv/monitor_synthesis.rst b/Documentation/trace/rv/monitor_synthesis.rst
index ce0c1a5104d4..00d3abd8c05a 100644
--- a/Documentation/trace/rv/monitor_synthesis.rst
+++ b/Documentation/trace/rv/monitor_synthesis.rst
@@ -45,6 +45,7 @@ creating monitors. The header files are:
* rv/da_monitor.h for deterministic automaton monitor.
* rv/ltl_monitor.h for linear temporal logic monitor.
+ * rv/ha_monitor.h for hybrid automaton monitor.
rvgen
-----
@@ -252,6 +253,112 @@ the task, the monitor may need some time to start validating tasks which have
been running before the monitor is enabled. Therefore, it is recommended to
start the tasks of interest after enabling the monitor.
+rv/ha_monitor.h
++++++++++++++++
+
+The implementation of hybrid automaton monitors derives directly from the
+deterministic automaton one. Despite using a different header
+(``ha_monitor.h``) the functions to handle events are the same (e.g.
+``da_handle_event``).
+
+Additionally, the `rvgen` tool populates skeletons for the
+``ha_verify_constraint``, ``ha_get_env`` and ``ha_reset_env`` based on the
+monitor specification in the monitor source file.
+
+``ha_verify_constraint`` is typically ready as it is generated by `rvgen`:
+
+* standard constraints on edges are turned into the form::
+
+ res = ha_get_env(ha_mon, ENV) < VALUE;
+
+* reset constraints are turned into the form::
+
+ ha_reset_env(ha_mon, ENV);
+
+* constraints on the state are implemented using timers
+
+ - armed before entering the state
+
+ - cancelled while entering any other state
+
+ - untouched if the state does not change as a result of the event
+
+ - checked if the timer expired but the callback did not run
+
+ - available implementation are `HA_TIMER_HRTIMER` and `HA_TIMER_WHEEL`
+
+ - hrtimers are more precise but may have higher overhead
+
+ - select by defining `HA_TIMER_TYPE` before including the header::
+
+ #define HA_TIMER_TYPE HA_TIMER_HRTIMER
+
+Constraint values can be specified in different forms:
+
+* literal value (with optional unit). E.g.::
+
+ preemptive == 0
+ clk < 100ns
+ threshold <= 10j
+
+* constant value (uppercase string). E.g.::
+
+ clk < MAX_NS
+
+* parameter (lowercase string). E.g.::
+
+ clk <= threshold_jiffies
+
+* macro (uppercase string with parentheses). E.g.::
+
+ clk < MAX_NS()
+
+* function (lowercase string with parentheses). E.g.::
+
+ clk <= threshold_jiffies()
+
+In all cases, `rvgen` will try to understand the type of the environment
+variable from the name or unit. For instance, constants or parameters
+terminating with ``_NS`` or ``_jiffies`` are intended as clocks with ns and jiffy
+granularity, respectively. Literals with measure unit `j` are jiffies and if a
+time unit is specified (`ns` to `s`), `rvgen` will convert the value to `ns`.
+
+Constants need to be defined by the user (but unlike the name, they don't
+necessarily need to be defined as constants). Parameters get converted to
+module parameters and the user needs to provide a default value.
+Also function and macros are defined by the user, by default they get as an
+argument the ``ha_monitor``, a common usage would be to get the required value
+from the target, e.g. the task in per-task monitors, using the helper
+``ha_get_target(ha_mon)``.
+
+If `rvgen` determines that the variable is a clock, it provides the getter and
+resetter based on the unit. Otherwise, the user needs to provide an appropriate
+definition.
+Typically non-clock environment variables are not reset. In such case only the
+getter skeleton will be present in the file generated by `rvgen`.
+For instance, the getter for preemptive can be filled as::
+
+ static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs env)
+ {
+ if (env == preemptible)
+ return preempt_count() == 0;
+ return ENV_INVALID_VALUE;
+ }
+
+The function is supplied the ``ha_mon`` parameter in case some storage is
+required (as it is for clocks), but environment variables without reset do not
+require a storage and can ignore that argument.
+The number of environment variables requiring a storage is limited by
+``MAX_HA_ENV_LEN``, however such limitation doesn't stand for other variables.
+
+Finally, constraints on states are only valid for clocks and only if the
+constraint is of the form `clk < N`. This is because such constraints are
+implemented with the expiration of a timer.
+Typically the clock variables are reset just before arming the timer, but this
+doesn't have to be the case and the available functions take care of it.
+It is a responsibility of per-task monitors to make sure no timer is left
+running when the task exits.
+
Final remarks
-------------
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 14/20] rv: Add sample hybrid monitors stall
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
2025-09-19 14:09 ` [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring Gabriele Monaco
2025-09-19 14:09 ` [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata Gabriele Monaco
@ 2025-09-19 14:09 ` Gabriele Monaco
2025-10-10 14:23 ` Nam Cao
2025-09-19 14:09 ` [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton Gabriele Monaco
2025-09-19 14:09 ` [PATCH v2 20/20] rv: Add deadline monitors Gabriele Monaco
4 siblings, 1 reply; 14+ messages in thread
From: Gabriele Monaco @ 2025-09-19 14:09 UTC (permalink / raw)
To: linux-kernel, Jonathan Corbet, Steven Rostedt, Masami Hiramatsu,
linux-doc, linux-trace-kernel
Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
Clark Williams, John Kacur
Add a sample monitor to showcase hybrid/timed automata.
The stall monitor identifies tasks stalled for longer than a threshold
and reacts when that happens.
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/tools/rv/index.rst | 1 +
Documentation/tools/rv/rv-mon-stall.rst | 44 ++++++
Documentation/trace/rv/index.rst | 1 +
Documentation/trace/rv/monitor_stall.rst | 43 ++++++
kernel/trace/rv/Kconfig | 1 +
kernel/trace/rv/Makefile | 1 +
kernel/trace/rv/monitors/stall/Kconfig | 13 ++
kernel/trace/rv/monitors/stall/stall.c | 146 +++++++++++++++++++
kernel/trace/rv/monitors/stall/stall.h | 64 ++++++++
kernel/trace/rv/monitors/stall/stall_trace.h | 19 +++
kernel/trace/rv/rv_trace.h | 1 +
tools/verification/models/stall.dot | 20 +++
12 files changed, 354 insertions(+)
create mode 100644 Documentation/tools/rv/rv-mon-stall.rst
create mode 100644 Documentation/trace/rv/monitor_stall.rst
create mode 100644 kernel/trace/rv/monitors/stall/Kconfig
create mode 100644 kernel/trace/rv/monitors/stall/stall.c
create mode 100644 kernel/trace/rv/monitors/stall/stall.h
create mode 100644 kernel/trace/rv/monitors/stall/stall_trace.h
create mode 100644 tools/verification/models/stall.dot
diff --git a/Documentation/tools/rv/index.rst b/Documentation/tools/rv/index.rst
index 64ba2efe2e85..4d66e0a78e1e 100644
--- a/Documentation/tools/rv/index.rst
+++ b/Documentation/tools/rv/index.rst
@@ -16,6 +16,7 @@ Runtime verification (rv) tool
rv-mon-wip
rv-mon-wwnr
rv-mon-sched
+ rv-mon-stall
.. only:: subproject and html
diff --git a/Documentation/tools/rv/rv-mon-stall.rst b/Documentation/tools/rv/rv-mon-stall.rst
new file mode 100644
index 000000000000..c79d7c2e4dd4
--- /dev/null
+++ b/Documentation/tools/rv/rv-mon-stall.rst
@@ -0,0 +1,44 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+rv-mon-stall
+============
+--------------------
+Stalled task monitor
+--------------------
+
+:Manual section: 1
+
+SYNOPSIS
+========
+
+**rv mon stall** [*OPTIONS*]
+
+DESCRIPTION
+===========
+
+The stalled task (**stall**) monitor is a sample per-task timed monitor that
+checks if tasks are scheduled within a defined threshold after they are ready.
+
+See kernel documentation for further information about this monitor:
+<https://docs.kernel.org/trace/rv/monitor_stall.html>
+
+OPTIONS
+=======
+
+.. include:: common_ikm.rst
+
+SEE ALSO
+========
+
+**rv**\(1), **rv-mon**\(1)
+
+Linux kernel *RV* documentation:
+<https://www.kernel.org/doc/html/latest/trace/rv/index.html>
+
+AUTHOR
+======
+
+Written by Gabriele Monaco <gmonaco@redhat.com>
+
+.. include:: common_appendix.rst
diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index ad298784bda2..bf9962f49959 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -16,3 +16,4 @@ Runtime Verification
monitor_wwnr.rst
monitor_sched.rst
monitor_rtapp.rst
+ monitor_stall.rst
diff --git a/Documentation/trace/rv/monitor_stall.rst b/Documentation/trace/rv/monitor_stall.rst
new file mode 100644
index 000000000000..e4d9b050a32f
--- /dev/null
+++ b/Documentation/trace/rv/monitor_stall.rst
@@ -0,0 +1,43 @@
+Monitor stall
+=============
+
+- Name: stall - wakeup in preemptive
+- Type: per-task hybrid automaton
+- Author: Gabriele Monaco <gmonaco@redhat.com>
+
+Description
+-----------
+
+The stalled task (stall) monitor is a sample per-task timed monitor that checks
+if tasks are scheduled within a defined threshold after they are ready::
+
+ |
+ |
+ v
+ #==================================#
+ H dequeued H <+
+ #==================================# |
+ | |
+ | sched_wakeup;reset(clk) |
+ v |
+ +----------------------------------+ |
+ | enqueued | |
+ | clk < threshold_jiffies | | sched_switch_wait
+ +----------------------------------+ |
+ | |
+ | sched_switch_in |
+ sched_switch_in v |
+ sched_wakeup +----------------------------------+ |
+ +------------------ | | |
+ | | running | |
+ +-----------------> | | -+
+ +----------------------------------+
+
+
+The threshold can be configured as a parameter by either booting with the
+``stall.threshold_jiffies=<new value>`` argument or writing a new value to
+``/sys/module/stall/parameters/threshold_jiffies``.
+
+Specification
+-------------
+Grapviz Dot file in tools/verification/models/stall.dot
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 4ad392dfc57f..720fbe4935f8 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -78,6 +78,7 @@ source "kernel/trace/rv/monitors/pagefault/Kconfig"
source "kernel/trace/rv/monitors/sleep/Kconfig"
# Add new rtapp monitors here
+source "kernel/trace/rv/monitors/stall/Kconfig"
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 750e4ad6fa0f..51c95e2d2da6 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_RV_MON_STS) += monitors/sts/sts.o
obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
+obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/stall/Kconfig b/kernel/trace/rv/monitors/stall/Kconfig
new file mode 100644
index 000000000000..6f846b642544
--- /dev/null
+++ b/kernel/trace/rv/monitors/stall/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_STALL
+ depends on RV
+ select HA_MON_EVENTS_ID
+ bool "stall monitor"
+ help
+ Enable the stall sample monitor that illustrates the usage of hybrid
+ automata monitors. It can be used to identify tasks stalled for
+ longer than a threshold.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_stall.rst
diff --git a/kernel/trace/rv/monitors/stall/stall.c b/kernel/trace/rv/monitors/stall/stall.c
new file mode 100644
index 000000000000..20e1345601a4
--- /dev/null
+++ b/kernel/trace/rv/monitors/stall/stall.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+
+#define MODULE_NAME "stall"
+
+#include <trace/events/sched.h>
+#include <rv_trace.h>
+
+#define RV_MON_TYPE RV_MON_PER_TASK
+#define HA_TIMER_TYPE HA_TIMER_WHEEL
+#include "stall.h"
+#include <rv/ha_monitor.h>
+
+static u64 threshold_jiffies = 1000;
+module_param(threshold_jiffies, ullong, 0644);
+
+static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_stall env, u64 time_ns)
+{
+ if (env == clk_stall)
+ return ha_get_clk_jiffy(ha_mon, env);
+ return ENV_INVALID_VALUE;
+}
+
+static void ha_reset_env(struct ha_monitor *ha_mon, enum envs_stall env, u64 time_ns)
+{
+ if (env == clk_stall)
+ ha_reset_clk_jiffy(ha_mon, env);
+}
+
+static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (curr_state == enqueued_stall)
+ return ha_check_invariant_jiffy(ha_mon, clk_stall, time_ns);
+ return true;
+}
+
+static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ bool res = true;
+
+ if (curr_state == dequeued_stall && event == sched_wakeup_stall)
+ ha_reset_env(ha_mon, clk_stall, time_ns);
+ return res;
+}
+
+static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (next_state == curr_state)
+ return;
+ if (next_state == enqueued_stall)
+ ha_start_timer_jiffy(ha_mon, clk_stall, threshold_jiffies, time_ns);
+ else if (curr_state == enqueued_stall)
+ ha_cancel_timer(ha_mon);
+}
+
+static bool ha_verify_constraint(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (!ha_verify_invariants(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ ha_setup_invariants(ha_mon, curr_state, event, next_state, time_ns);
+
+ return true;
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ if (!preempt && prev_state != TASK_RUNNING)
+ da_handle_start_event(prev, sched_switch_wait_stall);
+ da_handle_event(next, sched_switch_in_stall);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *p)
+{
+ da_handle_event(p, sched_wakeup_stall);
+}
+
+static int enable_stall(void)
+{
+ int retval;
+
+ retval = da_monitor_init();
+ if (retval)
+ return retval;
+
+ rv_attach_trace_probe("stall", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("stall", sched_wakeup, handle_sched_wakeup);
+
+ return 0;
+}
+
+static void disable_stall(void)
+{
+ rv_this.enabled = 0;
+
+ rv_detach_trace_probe("stall", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("stall", sched_wakeup, handle_sched_wakeup);
+
+ da_monitor_destroy();
+}
+
+static struct rv_monitor rv_this = {
+ .name = "stall",
+ .description = "identify tasks stalled for longer than a threshold.",
+ .enable = enable_stall,
+ .disable = disable_stall,
+ .reset = da_monitor_reset_all,
+ .enabled = 0,
+};
+
+static int __init register_stall(void)
+{
+ return rv_register_monitor(&rv_this, NULL);
+}
+
+static void __exit unregister_stall(void)
+{
+ rv_unregister_monitor(&rv_this);
+}
+
+module_init(register_stall);
+module_exit(unregister_stall);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("stall: identify tasks stalled for longer than a threshold.");
diff --git a/kernel/trace/rv/monitors/stall/stall.h b/kernel/trace/rv/monitors/stall/stall.h
new file mode 100644
index 000000000000..434b0c623fc4
--- /dev/null
+++ b/kernel/trace/rv/monitors/stall/stall.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of stall automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+#define MONITOR_NAME stall
+
+enum states_stall {
+ dequeued_stall,
+ enqueued_stall,
+ running_stall,
+ state_max_stall,
+};
+
+#define INVALID_STATE state_max_stall
+
+enum events_stall {
+ sched_switch_in_stall,
+ sched_switch_wait_stall,
+ sched_wakeup_stall,
+ event_max_stall,
+};
+
+enum envs_stall {
+ clk_stall,
+ env_max_stall,
+ env_max_stored_stall = env_max_stall,
+};
+
+_Static_assert(env_max_stored_stall <= MAX_HA_ENV_LEN, "Not enough slots");
+
+struct automaton_stall {
+ char *state_names[state_max_stall];
+ char *event_names[event_max_stall];
+ char *env_names[env_max_stall];
+ unsigned char function[state_max_stall][event_max_stall];
+ unsigned char initial_state;
+ bool final_states[state_max_stall];
+};
+
+static const struct automaton_stall automaton_stall = {
+ .state_names = {
+ "dequeued",
+ "enqueued",
+ "running",
+ },
+ .event_names = {
+ "sched_switch_in",
+ "sched_switch_wait",
+ "sched_wakeup",
+ },
+ .env_names = {
+ "clk",
+ },
+ .function = {
+ { INVALID_STATE, INVALID_STATE, enqueued_stall },
+ { running_stall, INVALID_STATE, INVALID_STATE },
+ { running_stall, dequeued_stall, running_stall },
+ },
+ .initial_state = dequeued_stall,
+ .final_states = { 1, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/stall/stall_trace.h b/kernel/trace/rv/monitors/stall/stall_trace.h
new file mode 100644
index 000000000000..6a7cc1b1d040
--- /dev/null
+++ b/kernel/trace/rv/monitors/stall/stall_trace.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_STALL
+DEFINE_EVENT(event_da_monitor_id, event_stall,
+ TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_stall,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+
+DEFINE_EVENT(error_env_da_monitor_id, error_env_stall,
+ TP_PROTO(int id, char *state, char *event, char *env),
+ TP_ARGS(id, state, event, env));
+#endif /* CONFIG_RV_MON_STALL */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 7c598967bc0e..1661f8fe4a88 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -187,6 +187,7 @@ DECLARE_EVENT_CLASS(error_env_da_monitor_id,
__get_str(env))
);
+#include <monitors/stall/stall_trace.h>
// Add new monitors based on CONFIG_HA_MON_EVENTS_ID here
#endif
diff --git a/tools/verification/models/stall.dot b/tools/verification/models/stall.dot
new file mode 100644
index 000000000000..98e3ae47e104
--- /dev/null
+++ b/tools/verification/models/stall.dot
@@ -0,0 +1,20 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = circle] "enqueued"};
+ {node [shape = plaintext, style=invis, label=""] "__init_dequeued"};
+ {node [shape = doublecircle] "dequeued"};
+ {node [shape = circle] "running"};
+ "__init_dequeued" -> "dequeued";
+ "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
+ "running" [label = "running"];
+ "dequeued" [label = "dequeued", color = green3];
+ "running" -> "running" [ label = "sched_switch_in\nsched_wakeup" ];
+ "enqueued" -> "running" [ label = "sched_switch_in" ];
+ "running" -> "dequeued" [ label = "sched_switch_wait" ];
+ "dequeued" -> "enqueued" [ label = "sched_wakeup;reset(clk)" ];
+ { rank = min ;
+ "__init_dequeued";
+ "dequeued";
+ }
+}
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
` (2 preceding siblings ...)
2025-09-19 14:09 ` [PATCH v2 14/20] rv: Add sample hybrid monitors stall Gabriele Monaco
@ 2025-09-19 14:09 ` Gabriele Monaco
2025-10-10 14:29 ` Nam Cao
2025-09-19 14:09 ` [PATCH v2 20/20] rv: Add deadline monitors Gabriele Monaco
4 siblings, 1 reply; 14+ messages in thread
From: Gabriele Monaco @ 2025-09-19 14:09 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
Clark Williams, John Kacur
The opid monitor validates that wakeup and need_resched events only
occur with interrupts and preemption disabled by following the
preemptirq tracepoints.
As reported in [1], those tracepoints might be inaccurate in some
situations (e.g. NMIs).
Since the monitor doesn't validate other ordering properties, remove the
dependency on preemptirq tracepoints and convert the monitor to a hybrid
automaton to validate the constraint during event handling.
This makes the monitor more robust by also removing the workaround for
interrupts missing the preemption tracepoints, which was working on
PREEMPT_RT only and allows the monitor to be built on kernels without
the preemptirqs tracepoints.
[1] - https://lore.kernel.org/lkml/20250625120823.60600-1-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/monitor_sched.rst | 62 +++---------
kernel/trace/rv/monitors/opid/Kconfig | 11 +-
kernel/trace/rv/monitors/opid/opid.c | 111 +++++++--------------
kernel/trace/rv/monitors/opid/opid.h | 86 ++++------------
kernel/trace/rv/monitors/opid/opid_trace.h | 4 +
kernel/trace/rv/rv_trace.h | 2 +-
tools/verification/models/sched/opid.dot | 36 ++-----
7 files changed, 82 insertions(+), 230 deletions(-)
diff --git a/Documentation/trace/rv/monitor_sched.rst b/Documentation/trace/rv/monitor_sched.rst
index 3f8381ad9ec7..0b96d6e147c6 100644
--- a/Documentation/trace/rv/monitor_sched.rst
+++ b/Documentation/trace/rv/monitor_sched.rst
@@ -346,55 +346,21 @@ Monitor opid
The operations with preemption and irq disabled (opid) monitor ensures
operations like ``wakeup`` and ``need_resched`` occur with interrupts and
-preemption disabled or during interrupt context, in such case preemption may
-not be disabled explicitly.
+preemption disabled.
``need_resched`` can be set by some RCU internals functions, in which case it
-doesn't match a task wakeup and might occur with only interrupts disabled::
-
- | sched_need_resched
- | sched_waking
- | irq_entry
- | +--------------------+
- v v |
- +------------------------------------------------------+
- +----------- | disabled | <+
- | +------------------------------------------------------+ |
- | | ^ |
- | | preempt_disable sched_need_resched |
- | preempt_enable | +--------------------+ |
- | v | v | |
- | +------------------------------------------------------+ |
- | | irq_disabled | |
- | +------------------------------------------------------+ |
- | | | ^ |
- | irq_entry irq_entry | | |
- | sched_need_resched v | irq_disable |
- | sched_waking +--------------+ | | |
- | +----- | | irq_enable | |
- | | | in_irq | | | |
- | +----> | | | | |
- | +--------------+ | | irq_disable
- | | | | |
- | irq_enable | irq_enable | | |
- | v v | |
- | #======================================================# |
- | H enabled H |
- | #======================================================# |
- | | ^ ^ preempt_enable | |
- | preempt_disable preempt_enable +--------------------+ |
- | v | |
- | +------------------+ | |
- +----------> | preempt_disabled | -+ |
- +------------------+ |
- | |
- +-------------------------------------------------------+
-
-This monitor is designed to work on ``PREEMPT_RT`` kernels, the special case of
-events occurring in interrupt context is a shortcut to identify valid scenarios
-where the preemption tracepoints might not be visible, during interrupts
-preemption is always disabled. On non- ``PREEMPT_RT`` kernels, the interrupts
-might invoke a softirq to set ``need_resched`` and wake up a task. This is
-another special case that is currently not supported by the monitor.
+doesn't match a task wakeup and might occur with only interrupts disabled.
+The interrupt and preemption status are validated by the hybrid automaton
+constraints when processing the events::
+
+ |
+ |
+ v
+ #=========# sched_need_resched;irq_off == 1
+ H H sched_waking;irq_off == 1 && preempt_off == 1
+ H any H ------------------------------------------------+
+ H H |
+ H H <-----------------------------------------------+
+ #=========#
References
----------
diff --git a/kernel/trace/rv/monitors/opid/Kconfig b/kernel/trace/rv/monitors/opid/Kconfig
index 561d32da572b..6d02e239b684 100644
--- a/kernel/trace/rv/monitors/opid/Kconfig
+++ b/kernel/trace/rv/monitors/opid/Kconfig
@@ -2,18 +2,13 @@
#
config RV_MON_OPID
depends on RV
- depends on TRACE_IRQFLAGS
- depends on TRACE_PREEMPT_TOGGLE
depends on RV_MON_SCHED
- default y if PREEMPT_RT
- select DA_MON_EVENTS_IMPLICIT
+ default y
+ select HA_MON_EVENTS_IMPLICIT
bool "opid monitor"
help
Monitor to ensure operations like wakeup and need resched occur with
- interrupts and preemption disabled or during IRQs, where preemption
- may not be disabled explicitly.
-
- This monitor is unstable on !PREEMPT_RT, say N unless you are testing it.
+ interrupts and preemption disabled.
For further information, see:
Documentation/trace/rv/monitor_sched.rst
diff --git a/kernel/trace/rv/monitors/opid/opid.c b/kernel/trace/rv/monitors/opid/opid.c
index 25a40e90fa40..160a518ce1cb 100644
--- a/kernel/trace/rv/monitors/opid/opid.c
+++ b/kernel/trace/rv/monitors/opid/opid.c
@@ -10,94 +10,63 @@
#define MODULE_NAME "opid"
#include <trace/events/sched.h>
-#include <trace/events/irq.h>
-#include <trace/events/preemptirq.h>
#include <rv_trace.h>
#include <monitors/sched/sched.h>
#define RV_MON_TYPE RV_MON_PER_CPU
#include "opid.h"
-#include <rv/da_monitor.h>
+#include <rv/ha_monitor.h>
-#ifdef CONFIG_X86_LOCAL_APIC
-#include <asm/trace/irq_vectors.h>
-
-static void handle_vector_irq_entry(void *data, int vector)
+static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_opid env, u64 time_ns)
{
- da_handle_event(irq_entry_opid);
-}
-
-static void attach_vector_irq(void)
-{
- rv_attach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
- if (IS_ENABLED(CONFIG_IRQ_WORK))
- rv_attach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
- if (IS_ENABLED(CONFIG_SMP)) {
- rv_attach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
- rv_attach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
- rv_attach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
+ if (env == irq_off_opid)
+ return irqs_disabled();
+ else if (env == preempt_off_opid) {
+ /*
+ * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
+ * preemption (adding one to the preempt_count). Since we are
+ * interested in the preempt_count at the time the tracepoint was
+ * hit, we consider 1 as still enabled.
+ */
+ if (IS_ENABLED(CONFIG_PREEMPTION))
+ return (preempt_count() & PREEMPT_MASK) > 1;
+ return true;
}
+ return ENV_INVALID_VALUE;
}
-static void detach_vector_irq(void)
+static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
{
- rv_detach_trace_probe("opid", local_timer_entry, handle_vector_irq_entry);
- if (IS_ENABLED(CONFIG_IRQ_WORK))
- rv_detach_trace_probe("opid", irq_work_entry, handle_vector_irq_entry);
- if (IS_ENABLED(CONFIG_SMP)) {
- rv_detach_trace_probe("opid", reschedule_entry, handle_vector_irq_entry);
- rv_detach_trace_probe("opid", call_function_entry, handle_vector_irq_entry);
- rv_detach_trace_probe("opid", call_function_single_entry, handle_vector_irq_entry);
- }
+ bool res = true;
+
+ if (curr_state == any_opid && event == sched_need_resched_opid)
+ res = ha_get_env(ha_mon, irq_off_opid, time_ns) == 1ull;
+ else if (curr_state == any_opid && event == sched_waking_opid)
+ res = ha_get_env(ha_mon, irq_off_opid, time_ns) == 1ull &&
+ ha_get_env(ha_mon, preempt_off_opid, time_ns) == 1ull;
+ return res;
}
-#else
-/* We assume irq_entry tracepoints are sufficient on other architectures */
-static void attach_vector_irq(void) { }
-static void detach_vector_irq(void) { }
-#endif
-
-static void handle_irq_disable(void *data, unsigned long ip, unsigned long parent_ip)
+static bool ha_verify_constraint(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
{
- da_handle_event(irq_disable_opid);
-}
+ if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
-static void handle_irq_enable(void *data, unsigned long ip, unsigned long parent_ip)
-{
- da_handle_event(irq_enable_opid);
-}
-
-static void handle_irq_entry(void *data, int irq, struct irqaction *action)
-{
- da_handle_event(irq_entry_opid);
-}
-
-static void handle_preempt_disable(void *data, unsigned long ip, unsigned long parent_ip)
-{
- da_handle_event(preempt_disable_opid);
-}
-
-static void handle_preempt_enable(void *data, unsigned long ip, unsigned long parent_ip)
-{
- da_handle_event(preempt_enable_opid);
+ return true;
}
static void handle_sched_need_resched(void *data, struct task_struct *tsk, int cpu, int tif)
{
- /* The monitor's intitial state is not in_irq */
- if (this_cpu_read(hardirq_context))
- da_handle_event(sched_need_resched_opid);
- else
- da_handle_start_event(sched_need_resched_opid);
+ da_handle_start_event(sched_need_resched_opid);
}
static void handle_sched_waking(void *data, struct task_struct *p)
{
- /* The monitor's intitial state is not in_irq */
- if (this_cpu_read(hardirq_context))
- da_handle_event(sched_waking_opid);
- else
- da_handle_start_event(sched_waking_opid);
+ da_handle_start_event(sched_waking_opid);
}
static int enable_opid(void)
@@ -108,14 +77,8 @@ static int enable_opid(void)
if (retval)
return retval;
- rv_attach_trace_probe("opid", irq_disable, handle_irq_disable);
- rv_attach_trace_probe("opid", irq_enable, handle_irq_enable);
- rv_attach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
- rv_attach_trace_probe("opid", preempt_disable, handle_preempt_disable);
- rv_attach_trace_probe("opid", preempt_enable, handle_preempt_enable);
rv_attach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
rv_attach_trace_probe("opid", sched_waking, handle_sched_waking);
- attach_vector_irq();
return 0;
}
@@ -124,14 +87,8 @@ static void disable_opid(void)
{
rv_this.enabled = 0;
- rv_detach_trace_probe("opid", irq_disable, handle_irq_disable);
- rv_detach_trace_probe("opid", irq_enable, handle_irq_enable);
- rv_detach_trace_probe("opid", irq_handler_entry, handle_irq_entry);
- rv_detach_trace_probe("opid", preempt_disable, handle_preempt_disable);
- rv_detach_trace_probe("opid", preempt_enable, handle_preempt_enable);
rv_detach_trace_probe("opid", sched_set_need_resched_tp, handle_sched_need_resched);
rv_detach_trace_probe("opid", sched_waking, handle_sched_waking);
- detach_vector_irq();
da_monitor_destroy();
}
diff --git a/kernel/trace/rv/monitors/opid/opid.h b/kernel/trace/rv/monitors/opid/opid.h
index 092992514970..fb0aa4c28aa6 100644
--- a/kernel/trace/rv/monitors/opid/opid.h
+++ b/kernel/trace/rv/monitors/opid/opid.h
@@ -8,30 +8,31 @@
#define MONITOR_NAME opid
enum states_opid {
- disabled_opid,
- enabled_opid,
- in_irq_opid,
- irq_disabled_opid,
- preempt_disabled_opid,
+ any_opid,
state_max_opid,
};
#define INVALID_STATE state_max_opid
enum events_opid {
- irq_disable_opid,
- irq_enable_opid,
- irq_entry_opid,
- preempt_disable_opid,
- preempt_enable_opid,
sched_need_resched_opid,
sched_waking_opid,
event_max_opid,
};
+enum envs_opid {
+ irq_off_opid,
+ preempt_off_opid,
+ env_max_opid,
+ env_max_stored_opid = irq_off_opid,
+};
+
+_Static_assert(env_max_stored_opid <= MAX_HA_ENV_LEN, "Not enough slots");
+
struct automaton_opid {
char *state_names[state_max_opid];
char *event_names[event_max_opid];
+ char *env_names[env_max_opid];
unsigned char function[state_max_opid][event_max_opid];
unsigned char initial_state;
bool final_states[state_max_opid];
@@ -39,68 +40,19 @@ struct automaton_opid {
static const struct automaton_opid automaton_opid = {
.state_names = {
- "disabled",
- "enabled",
- "in_irq",
- "irq_disabled",
- "preempt_disabled",
+ "any",
},
.event_names = {
- "irq_disable",
- "irq_enable",
- "irq_entry",
- "preempt_disable",
- "preempt_enable",
"sched_need_resched",
"sched_waking",
},
+ .env_names = {
+ "irq_off",
+ "preempt_off",
+ },
.function = {
- {
- INVALID_STATE,
- preempt_disabled_opid,
- disabled_opid,
- INVALID_STATE,
- irq_disabled_opid,
- disabled_opid,
- disabled_opid,
- },
- {
- irq_disabled_opid,
- INVALID_STATE,
- INVALID_STATE,
- preempt_disabled_opid,
- enabled_opid,
- INVALID_STATE,
- INVALID_STATE,
- },
- {
- INVALID_STATE,
- enabled_opid,
- in_irq_opid,
- INVALID_STATE,
- INVALID_STATE,
- in_irq_opid,
- in_irq_opid,
- },
- {
- INVALID_STATE,
- enabled_opid,
- in_irq_opid,
- disabled_opid,
- INVALID_STATE,
- irq_disabled_opid,
- INVALID_STATE,
- },
- {
- disabled_opid,
- INVALID_STATE,
- INVALID_STATE,
- INVALID_STATE,
- enabled_opid,
- INVALID_STATE,
- INVALID_STATE,
- },
+ { any_opid, any_opid },
},
- .initial_state = disabled_opid,
- .final_states = { 0, 1, 0, 0, 0 },
+ .initial_state = any_opid,
+ .final_states = { 1 },
};
diff --git a/kernel/trace/rv/monitors/opid/opid_trace.h b/kernel/trace/rv/monitors/opid/opid_trace.h
index 3df6ff955c30..b04005b64208 100644
--- a/kernel/trace/rv/monitors/opid/opid_trace.h
+++ b/kernel/trace/rv/monitors/opid/opid_trace.h
@@ -12,4 +12,8 @@ DEFINE_EVENT(event_da_monitor, event_opid,
DEFINE_EVENT(error_da_monitor, error_opid,
TP_PROTO(char *state, char *event),
TP_ARGS(state, event));
+
+DEFINE_EVENT(error_env_da_monitor, error_env_opid,
+ TP_PROTO(char *state, char *event, char *env),
+ TP_ARGS(state, event, env));
#endif /* CONFIG_RV_MON_OPID */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 1661f8fe4a88..8ac4f315a627 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -62,7 +62,6 @@ DECLARE_EVENT_CLASS(error_da_monitor,
#include <monitors/scpd/scpd_trace.h>
#include <monitors/snep/snep_trace.h>
#include <monitors/sts/sts_trace.h>
-#include <monitors/opid/opid_trace.h>
// Add new monitors based on CONFIG_DA_MON_EVENTS_IMPLICIT here
#ifdef CONFIG_HA_MON_EVENTS_IMPLICIT
@@ -92,6 +91,7 @@ DECLARE_EVENT_CLASS(error_env_da_monitor,
);
// Add new monitors based on CONFIG_HA_MON_EVENTS_IMPLICIT here
+#include <monitors/opid/opid_trace.h>
#endif
diff --git a/tools/verification/models/sched/opid.dot b/tools/verification/models/sched/opid.dot
index 840052f6952b..511051fce430 100644
--- a/tools/verification/models/sched/opid.dot
+++ b/tools/verification/models/sched/opid.dot
@@ -1,35 +1,13 @@
digraph state_automaton {
center = true;
size = "7,11";
- {node [shape = plaintext, style=invis, label=""] "__init_disabled"};
- {node [shape = circle] "disabled"};
- {node [shape = doublecircle] "enabled"};
- {node [shape = circle] "enabled"};
- {node [shape = circle] "in_irq"};
- {node [shape = circle] "irq_disabled"};
- {node [shape = circle] "preempt_disabled"};
- "__init_disabled" -> "disabled";
- "disabled" [label = "disabled"];
- "disabled" -> "disabled" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
- "disabled" -> "irq_disabled" [ label = "preempt_enable" ];
- "disabled" -> "preempt_disabled" [ label = "irq_enable" ];
- "enabled" [label = "enabled", color = green3];
- "enabled" -> "enabled" [ label = "preempt_enable" ];
- "enabled" -> "irq_disabled" [ label = "irq_disable" ];
- "enabled" -> "preempt_disabled" [ label = "preempt_disable" ];
- "in_irq" [label = "in_irq"];
- "in_irq" -> "enabled" [ label = "irq_enable" ];
- "in_irq" -> "in_irq" [ label = "sched_need_resched\nsched_waking\nirq_entry" ];
- "irq_disabled" [label = "irq_disabled"];
- "irq_disabled" -> "disabled" [ label = "preempt_disable" ];
- "irq_disabled" -> "enabled" [ label = "irq_enable" ];
- "irq_disabled" -> "in_irq" [ label = "irq_entry" ];
- "irq_disabled" -> "irq_disabled" [ label = "sched_need_resched" ];
- "preempt_disabled" [label = "preempt_disabled"];
- "preempt_disabled" -> "disabled" [ label = "irq_disable" ];
- "preempt_disabled" -> "enabled" [ label = "preempt_enable" ];
+ {node [shape = plaintext, style=invis, label=""] "__init_any"};
+ {node [shape = doublecircle] "any"};
+ "__init_any" -> "any";
+ "any" [label = "any", color = green3];
+ "any" -> "any" [ label = "sched_need_resched;irq_off == 1\nsched_waking;irq_off == 1 && preempt_off == 1" ];
{ rank = min ;
- "__init_disabled";
- "disabled";
+ "__init_any";
+ "any";
}
}
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 20/20] rv: Add deadline monitors
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
` (3 preceding siblings ...)
2025-09-19 14:09 ` [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton Gabriele Monaco
@ 2025-09-19 14:09 ` Gabriele Monaco
2025-10-10 15:04 ` Nam Cao
4 siblings, 1 reply; 14+ messages in thread
From: Gabriele Monaco @ 2025-09-19 14:09 UTC (permalink / raw)
To: linux-kernel, Steven Rostedt, Jonathan Corbet, Masami Hiramatsu,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Nam Cao, Tomas Glozar, Juri Lelli,
Clark Williams, John Kacur
Add the deadline monitors collection to validate the deadline scheduler,
both for deadline tasks and servers.
The currently implemented monitors are:
* throttle:
validate dl entities are throttled when they use up their runtime
* nomiss:
validate dl entities run to completion before their deadiline
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
Documentation/trace/rv/index.rst | 1 +
Documentation/trace/rv/monitor_deadline.rst | 151 ++++++++++
kernel/trace/rv/Kconfig | 5 +
kernel/trace/rv/Makefile | 3 +
kernel/trace/rv/monitors/deadline/Kconfig | 10 +
kernel/trace/rv/monitors/deadline/deadline.c | 35 +++
kernel/trace/rv/monitors/deadline/deadline.h | 200 +++++++++++++
kernel/trace/rv/monitors/nomiss/Kconfig | 15 +
kernel/trace/rv/monitors/nomiss/nomiss.c | 283 ++++++++++++++++++
kernel/trace/rv/monitors/nomiss/nomiss.h | 130 ++++++++
.../trace/rv/monitors/nomiss/nomiss_trace.h | 19 ++
kernel/trace/rv/monitors/throttle/Kconfig | 15 +
kernel/trace/rv/monitors/throttle/throttle.c | 253 ++++++++++++++++
kernel/trace/rv/monitors/throttle/throttle.h | 118 ++++++++
.../rv/monitors/throttle/throttle_trace.h | 19 ++
kernel/trace/rv/rv_trace.h | 2 +
tools/verification/models/deadline/nomiss.dot | 39 +++
.../verification/models/deadline/throttle.dot | 44 +++
18 files changed, 1342 insertions(+)
create mode 100644 Documentation/trace/rv/monitor_deadline.rst
create mode 100644 kernel/trace/rv/monitors/deadline/Kconfig
create mode 100644 kernel/trace/rv/monitors/deadline/deadline.c
create mode 100644 kernel/trace/rv/monitors/deadline/deadline.h
create mode 100644 kernel/trace/rv/monitors/nomiss/Kconfig
create mode 100644 kernel/trace/rv/monitors/nomiss/nomiss.c
create mode 100644 kernel/trace/rv/monitors/nomiss/nomiss.h
create mode 100644 kernel/trace/rv/monitors/nomiss/nomiss_trace.h
create mode 100644 kernel/trace/rv/monitors/throttle/Kconfig
create mode 100644 kernel/trace/rv/monitors/throttle/throttle.c
create mode 100644 kernel/trace/rv/monitors/throttle/throttle.h
create mode 100644 kernel/trace/rv/monitors/throttle/throttle_trace.h
create mode 100644 tools/verification/models/deadline/nomiss.dot
create mode 100644 tools/verification/models/deadline/throttle.dot
diff --git a/Documentation/trace/rv/index.rst b/Documentation/trace/rv/index.rst
index bf9962f49959..29769f06bb0f 100644
--- a/Documentation/trace/rv/index.rst
+++ b/Documentation/trace/rv/index.rst
@@ -17,3 +17,4 @@ Runtime Verification
monitor_sched.rst
monitor_rtapp.rst
monitor_stall.rst
+ monitor_deadline.rst
diff --git a/Documentation/trace/rv/monitor_deadline.rst b/Documentation/trace/rv/monitor_deadline.rst
new file mode 100644
index 000000000000..941b93747204
--- /dev/null
+++ b/Documentation/trace/rv/monitor_deadline.rst
@@ -0,0 +1,151 @@
+Scheduler monitors
+==================
+
+- Name: deadline
+- Type: container for multiple monitors
+- Author: Gabriele Monaco <gmonaco@redhat.com>
+
+Description
+-----------
+
+The deadline monitor is a set of specifications to describe the deadline
+scheduler behaviour. It includes monitors per scheduling entity (deadline tasks
+and servers) that work independently to verify different specifications the
+deadline scheduler should follow.
+
+Specifications
+--------------
+
+Monitor throttle
+~~~~~~~~~~~~~~~~
+
+The throttle monitor ensures deadline entities are throttled when they use up
+their runtime. Deadline tasks can be only ``running``, ``preempted`` and
+``throttled``, the runtime is enforced only in ``running`` based on an internal
+clock and the runtime value in the deadline entity.
+
+Servers can be also in the ``armed`` state, which corresponds to when the
+server is consuming bandwidth in background (e.g. idle or normal tasks are
+running without any boost). From this state the server can be throttled but it
+can also use more runtime than available. A server is considered ``running``
+when it's actively boosting a task, only there the runtime is enforced. The
+server is preempted if the running task is not in the server's runqueue (e.g. a
+FIFO task for the fair server).
+Events like ``dl_armed`` and ``sched_switch_in`` can occur sequentially for
+servers since they are related to the current task (e.g. a 2 fair tasks can be
+switched in sequentially, that corresponds to multiple ``dl_armed``).
+
+Any task or server in the ``throttled`` state must leave it shortly, e.g.
+become ``preempted``::
+
+ |
+ |
+ dl_replenish;reset(clk) v
+ sched_switch_in #=========================# sched_switch_in;
+ +--------------- H H reset(clk)
+ | H H <----------------+
+ +--------------> H running H |
+ dl_throttle;reset(clk) H clk < runtime_left_ns() H |
+ +--------------------------- H H sched_switch_out |
+ | +------------------> H H -------------+ |
+ | dl_replenish;reset(clk) #=========================# | |
+ | | | ^ | |
+ v | dl_defer_arm | | |
+ +-------------------------+ | | | |
+ | throttled | | sched_switch_in;reset(clk) | |
+ | clk < THROTTLED_TIME_NS | v | | |
+ +-------------------------+ +----------------+ | |
+ | | | | sched_switch_out | |
+ | | +---------- | | -------------+ | |
+ | | dl_replenish | armed | | | |
+ | | dl_defer_arm | | <--------+ | | |
+ | | +---------> | | dl_defer_arm | | |
+ | | +----------------+ | | | |
+ | | | ^ | | | |
+ | | dl_throttle dl_replenish | | | |
+ | | dl_throttle;yielded==1 v | | | | |
+ | | dl_defer_arm +-------------------+ | v v |
+ | | +---------- | | +--------------+
+ | | | | | | |
+ | | +---------> | armed_throttled | | preempted |
+ | | | | | |
+ | +------------------------> | | +--------------+
+ | dl_defer_arm +-------------------+ | ^
+ | | ^ | |
+ | sched_switch_out dl_defer_arm | |
+ | v | | |
+ | sched_switch_out +-----------------------+ | |
+ | +-------------- | | dl_throttle; | |
+ | | | | is_constr_dl==1 | |
+ | +-------------> | preempted_throttled | <-----------------+ |
+ | | | |
+ +-----------------------> | | -- dl_replenish -----+
+ sched_switch_out +-----------------------+
+
+The value of ``runtime_left_ns()`` is directly read from the deadline entity
+and updated as the task runs. It is increased by 1 tick to account for the
+maximum delay to throttle (not valid if ``sched_feat(HRTICK_DL)`` is active).
+
+Monitor nomiss
+~~~~~~~~~~~~~~
+
+The nomiss monitor ensures dl entities get to run *and* run to completion
+before their deadiline, although deferrable servers may not run. An entity is
+considered done if ``throttled``, either because it yielded or used up its
+runtime, or when it voluntarily starts ``sleeping``.
+The monitor includes a user configurable deadline threshold. If the total
+utilisation of deadline tasks is larger than 1, they are only guaranteed
+bounded tardiness. See Documentation/scheduler/sched-deadline.rst for more
+details. The threshold (module parameter ``nomiss.deadline_thresh``) can be
+configured to avoid the monitor to fail based on the acceptable tardiness in
+the system. Since ``dl_throttle`` is a valid outcome for the entity to be done,
+the minimum tardiness needs be 1 tick to consider the throttle delay.
+
+Servers have also an intermediate ``idle`` state, occurring as soon as no
+runnable task is available. When a server goes to ``sleeping`` it is guaranteed
+to be done for the period (unlike tasks), hence it cannot be considered
+``ready`` before its runtime is replenished::
+
+ sched_wakeup |
+ dl_server_start |
+ dl_replenish;reset(clk) v
+ +------------ #=========================#
+ | H H dl_replenish;reset(clk)
+ +-----------> H H <------------------+
+ H H |
+ +- dl_server_stop --------- H ready H |
+ | H clk < DEADLINE_NS() H |
+ | +--------------------> H H dl_throttle; |
+ | | H H is_defer == 1 |
+ | | sched_switch_in -- H H ---------------+ |
+ | | | #=========================# | |
+ | | | | ^ | |
+ | | | dl_server_idle dl_replenish;reset(clk) | |
+ | | | v | | |
+ | | | +----------------+ | |
+ | | | +---------- | | | |
+ | | | dl_server_idle | idle | dl_throttle | |
+ | | | +---------> | | ---------------+ | |
+ | | | +----------------+ | | |
+ | | | | ^ | | |
+ | | | sched_switch_in dl_server_idle | | |
+ | | | v | | | |
+ | | | +---------- +---------------------+ | | |
+ | | | sched_switch_in | | | | |
+ | | | sched_wakeup | | dl_throttle | | |
+ | | | dl_replenish; | running | -------+ | | |
+ | | | reset(clk) | clk < DEADLINE_NS() | | | | |
+ | | | +---------> | | | | | |
+ | | +------------------> | | | | | |
+ | | +---------------------+ | | | |
+ | sched_wakeup | sched_switch_suspend | | | |
+ v dl_replenish;reset(clk) | dl_server_stop | | | |
+ +---------------+ | v v v |
+ | | <---------------+ dl_throttle +--------------+
+ | | sched_wakeup +-- | |
+ | sleeping | --+ dl_server_idle | | throttled |
+ | | dl_server_start dl_server_start +-> | |
+ | | dl_server_idle sched_switch_suspend +--------------+
+ +---------------+ <-+ ^
+ | dl_throttle;is_constr_dl == 1 |
+ +------------------------------------------------------------+
diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig
index 720fbe4935f8..719cdcfb6d41 100644
--- a/kernel/trace/rv/Kconfig
+++ b/kernel/trace/rv/Kconfig
@@ -79,6 +79,11 @@ source "kernel/trace/rv/monitors/sleep/Kconfig"
# Add new rtapp monitors here
source "kernel/trace/rv/monitors/stall/Kconfig"
+source "kernel/trace/rv/monitors/deadline/Kconfig"
+source "kernel/trace/rv/monitors/nomiss/Kconfig"
+source "kernel/trace/rv/monitors/throttle/Kconfig"
+# Add new deadline monitors here
+
# Add new monitors here
config RV_REACTORS
diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile
index 51c95e2d2da6..15a1edc8bd0f 100644
--- a/kernel/trace/rv/Makefile
+++ b/kernel/trace/rv/Makefile
@@ -18,6 +18,9 @@ obj-$(CONFIG_RV_MON_NRP) += monitors/nrp/nrp.o
obj-$(CONFIG_RV_MON_SSSW) += monitors/sssw/sssw.o
obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o
obj-$(CONFIG_RV_MON_STALL) += monitors/stall/stall.o
+obj-$(CONFIG_RV_MON_DEADLINE) += monitors/deadline/deadline.o
+obj-$(CONFIG_RV_MON_NOMISS) += monitors/nomiss/nomiss.o
+obj-$(CONFIG_RV_MON_THROTTLE) += monitors/throttle/throttle.o
# Add new monitors here
obj-$(CONFIG_RV_REACTORS) += rv_reactors.o
obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o
diff --git a/kernel/trace/rv/monitors/deadline/Kconfig b/kernel/trace/rv/monitors/deadline/Kconfig
new file mode 100644
index 000000000000..38804a6ad91d
--- /dev/null
+++ b/kernel/trace/rv/monitors/deadline/Kconfig
@@ -0,0 +1,10 @@
+config RV_MON_DEADLINE
+ depends on RV
+ bool "deadline monitor"
+ help
+ Collection of monitors to check the deadline scheduler and server
+ behave according to specifications. Enable this to enable all
+ scheduler specification supported by the current kernel.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_deadline.rst
diff --git a/kernel/trace/rv/monitors/deadline/deadline.c b/kernel/trace/rv/monitors/deadline/deadline.c
new file mode 100644
index 000000000000..45aed62c1371
--- /dev/null
+++ b/kernel/trace/rv/monitors/deadline/deadline.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+
+#define MODULE_NAME "deadline"
+
+#include "deadline.h"
+
+struct rv_monitor rv_deadline = {
+ .name = "deadline",
+ .description = "container for several deadline scheduler specifications.",
+ .enable = NULL,
+ .disable = NULL,
+ .reset = NULL,
+ .enabled = 0,
+};
+
+static int __init register_deadline(void)
+{
+ return rv_register_monitor(&rv_deadline, NULL);
+}
+
+static void __exit unregister_deadline(void)
+{
+ rv_unregister_monitor(&rv_deadline);
+}
+
+module_init(register_deadline);
+module_exit(unregister_deadline);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("deadline: container for several deadline scheduler specifications.");
diff --git a/kernel/trace/rv/monitors/deadline/deadline.h b/kernel/trace/rv/monitors/deadline/deadline.h
new file mode 100644
index 000000000000..76f2770a7246
--- /dev/null
+++ b/kernel/trace/rv/monitors/deadline/deadline.h
@@ -0,0 +1,200 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/kernel.h>
+#include <asm/syscall.h>
+#include <uapi/linux/sched/types.h>
+
+/*
+ * Dummy values if not available
+ */
+#ifndef __NR_sched_setscheduler
+#define __NR_sched_setscheduler -1
+#endif
+#ifndef __NR_sched_setattr
+#define __NR_sched_setattr -2
+#endif
+
+extern struct rv_monitor rv_deadline;
+
+static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se)
+{
+ if (WARN_ONCE(dl_se->dl_server, "Call this only on a DL task!"))
+ return NULL;
+ return container_of(dl_se, struct task_struct, dl);
+}
+
+#ifdef CONFIG_RT_MUTEXES
+static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se)
+{
+ return dl_se->pi_se;
+}
+
+static inline bool is_dl_boosted(struct sched_dl_entity *dl_se)
+{
+ return pi_of(dl_se) != dl_se;
+}
+#else /* !CONFIG_RT_MUTEXES: */
+static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se)
+{
+ return dl_se;
+}
+
+static inline bool is_dl_boosted(struct sched_dl_entity *dl_se)
+{
+ return false;
+}
+#endif /* !CONFIG_RT_MUTEXES */
+
+static inline bool dl_is_implicit(struct sched_dl_entity *dl_se)
+{
+ return dl_se->dl_deadline == dl_se->dl_period;
+}
+
+/*
+ * If both have dummy values, the syscalls are not supported and we don't even
+ * need to register the handler.
+ */
+static inline bool should_skip_syscall_handle(void)
+{
+ return __NR_sched_setattr < 0 && __NR_sched_setscheduler < 0;
+}
+
+/*
+ * Use negative numbers for the server.
+ * Currently only one fair server per CPU, may change in the future.
+ */
+#define fair_server_id(cpu) (-cpu)
+/*
+ * Get a unique id used for dl entities
+ *
+ * The cpu is not required for tasks as the pid is used there, if this function
+ * is called on a dl_se that for sure corresponds to a task, DL_TASK can be
+ * used in place of cpu.
+ * We need the cpu for servers as it is provided in the tracepoint and we
+ * cannot easily retrieve it from the dl_se (requires the struct rq definition).
+ */
+static inline int get_entity_id(struct sched_dl_entity *dl_se, int cpu)
+{
+ if (dl_se->dl_server)
+ return fair_server_id(cpu);
+ return dl_task_of(dl_se)->pid;
+}
+
+/* Expand id and target as arguments for da functions */
+#define EXPAND_ID(dl_se, cpu) get_entity_id(dl_se, cpu), dl_se
+
+/* Use this as the cpu in EXPAND_ID in case the dl_se is surely from a task */
+#define DL_TASK -1
+
+static inline int extract_params(struct pt_regs *regs, long id, struct task_struct **p)
+{
+ size_t size = offsetof(struct sched_attr, sched_nice);
+ struct sched_attr __user *uattr, attr;
+ int new_policy = -1, ret;
+ unsigned long args[6];
+ pid_t pid;
+
+ switch (id) {
+ case __NR_sched_setscheduler:
+ syscall_get_arguments(current, regs, args);
+ pid = args[0];
+ new_policy = args[1];
+ break;
+ case __NR_sched_setattr:
+ syscall_get_arguments(current, regs, args);
+ pid = args[0];
+ uattr = (void *)args[1];
+ /*
+ * Just copy up to sched_flags, we are not interested after that
+ */
+ ret = copy_struct_from_user(&attr, size, uattr, size);
+ if (ret)
+ return ret;
+ if (attr.sched_flags & SCHED_FLAG_KEEP_POLICY)
+ return -EINVAL;
+ new_policy = attr.sched_policy;
+ break;
+ default:
+ return -EINVAL;
+ }
+ if (!pid)
+ *p = current;
+ else {
+ /*
+ * Required for find_task_by_vpid, make sure the caller doesn't
+ * need to get_task_struct().
+ */
+ guard(rcu)();
+ *p = find_task_by_vpid(pid);
+ if (unlikely(!p))
+ return -EINVAL;
+ }
+
+ return new_policy;
+}
+
+/* Helper functions requiring DA/HA utilities*/
+#ifdef RV_MON_TYPE
+
+/*
+ * get_fair_server - get the fair server associated to a task
+ *
+ * If the task is a boosted task, the server is available in the task_struct,
+ * otherwise grab the dl entity saved for the CPU where the task is enqueued.
+ * This function assumes the task is enqueued somewhere.
+ */
+static inline struct sched_dl_entity *get_fair_server(struct task_struct *tsk)
+{
+ if (tsk->dl_server)
+ return tsk->dl_server;
+ return da_get_target_by_id(fair_server_id(task_cpu(tsk)));
+}
+
+/*
+ * Initialise monitors for all tasks and pre-allocate the storage for servers.
+ * This is necessary since we don't have access to the servers here and
+ * allocation can cause deadlocks from their tracepoints. We can only fill
+ * pre-initialised storage from there.
+ */
+static inline int init_storage(void)
+{
+ struct task_struct *g, *p;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (!da_create_empty_storage(fair_server_id(cpu), GFP_KERNEL))
+ goto fail;
+ }
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, p) {
+ if (p->policy == SCHED_DEADLINE) {
+ if (!da_create_storage(EXPAND_ID(&p->dl, DL_TASK), NULL,
+ GFP_KERNEL)) {
+ read_unlock(&tasklist_lock);
+ goto fail;
+ }
+ }
+ }
+ read_unlock(&tasklist_lock);
+ return 0;
+
+fail:
+ da_monitor_destroy();
+ return -ENOMEM;
+}
+
+static void handle_newtask(void *data, struct task_struct *task, unsigned long flags)
+{
+ /* Might be superfluous as tasks are not started with this policy.. */
+ if (task->policy == SCHED_DEADLINE)
+ da_create_storage(EXPAND_ID(&task->dl, DL_TASK), NULL, GFP_NOWAIT);
+}
+
+static void handle_exit(void *data, struct task_struct *p, bool group_dead)
+{
+ if (p->policy == SCHED_DEADLINE)
+ da_destroy_storage(get_entity_id(&p->dl, DL_TASK));
+}
+
+#endif
diff --git a/kernel/trace/rv/monitors/nomiss/Kconfig b/kernel/trace/rv/monitors/nomiss/Kconfig
new file mode 100644
index 000000000000..e1886c3a0dd9
--- /dev/null
+++ b/kernel/trace/rv/monitors/nomiss/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_NOMISS
+ depends on RV
+ depends on HAVE_SYSCALL_TRACEPOINTS
+ depends on RV_MON_DEADLINE
+ default y
+ select HA_MON_EVENTS_ID
+ bool "nomiss monitor"
+ help
+ Monitor to ensure dl entities run to completion before their deadiline.
+ This monitor is part of the deadline monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_deadline.rst
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.c b/kernel/trace/rv/monitors/nomiss/nomiss.c
new file mode 100644
index 000000000000..9bc8ab60d8dc
--- /dev/null
+++ b/kernel/trace/rv/monitors/nomiss/nomiss.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+
+#define MODULE_NAME "nomiss"
+
+#include <uapi/linux/sched/types.h>
+#include <trace/events/syscalls.h>
+#include <trace/events/sched.h>
+#include <trace/events/task.h>
+#include <rv_trace.h>
+
+#define RV_MON_TYPE RV_MON_PER_OBJ
+#define HA_TIMER_TYPE HA_TIMER_WHEEL
+/* The start condition is on sched_switch, it's dangerous to allocate there */
+#define DA_SKIP_AUTO_ALLOC
+typedef struct sched_dl_entity *monitor_target;
+#include "nomiss.h"
+#include <rv/ha_monitor.h>
+#include <monitors/deadline/deadline.h>
+
+/*
+ * User configurable deadline threshold. If the total utilisation of deadline
+ * tasks is larger than 1, they are only guaranteed bounded tardiness. See
+ * Documentation/scheduler/sched-deadline.rst for more details.
+ * The minimum tardiness without sched_feat(HRTICK_DL) is 1 tick to accommodate
+ * for throttle enforced on the next tick.
+ */
+static u64 deadline_thresh = TICK_NSEC;
+module_param(deadline_thresh, ullong, 0644);
+#define DEADLINE_NS(ha_mon) (pi_of(ha_get_target(ha_mon))->dl_deadline + deadline_thresh)
+
+static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_nomiss env, u64 time_ns)
+{
+ if (env == clk_nomiss)
+ return ha_get_clk_ns(ha_mon, env, time_ns);
+ else if (env == is_constr_dl_nomiss)
+ return !dl_is_implicit(ha_get_target(ha_mon));
+ else if (env == is_defer_nomiss)
+ return ha_get_target(ha_mon)->dl_defer;
+ return ENV_INVALID_VALUE;
+}
+
+static void ha_reset_env(struct ha_monitor *ha_mon, enum envs_nomiss env, u64 time_ns)
+{
+ if (env == clk_nomiss)
+ ha_reset_clk_ns(ha_mon, env, time_ns);
+}
+
+static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (curr_state == ready_nomiss)
+ return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns);
+ else if (curr_state == running_nomiss)
+ return ha_check_invariant_ns(ha_mon, clk_nomiss, time_ns);
+ return true;
+}
+
+static inline void ha_convert_inv_guard(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (curr_state == next_state)
+ return;
+ if (curr_state == ready_nomiss)
+ ha_inv_to_guard(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
+ else if (curr_state == running_nomiss)
+ ha_inv_to_guard(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
+}
+
+static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ bool res = true;
+
+ if (curr_state == idle_nomiss && event == dl_replenish_nomiss)
+ ha_reset_env(ha_mon, clk_nomiss, time_ns);
+ else if (curr_state == ready_nomiss && event == dl_replenish_nomiss)
+ ha_reset_env(ha_mon, clk_nomiss, time_ns);
+ else if (curr_state == ready_nomiss && event == dl_throttle_nomiss)
+ res = ha_get_env(ha_mon, is_defer_nomiss, time_ns) == 1ull;
+ else if (curr_state == running_nomiss && event == dl_replenish_nomiss)
+ ha_reset_env(ha_mon, clk_nomiss, time_ns);
+ else if (curr_state == sleeping_nomiss && event == dl_replenish_nomiss)
+ ha_reset_env(ha_mon, clk_nomiss, time_ns);
+ else if (curr_state == sleeping_nomiss && event == dl_throttle_nomiss)
+ res = ha_get_env(ha_mon, is_constr_dl_nomiss, time_ns) == 1ull;
+ else if (curr_state == throttled_nomiss && event == dl_replenish_nomiss)
+ ha_reset_env(ha_mon, clk_nomiss, time_ns);
+ return res;
+}
+
+static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (next_state == curr_state && event != dl_replenish_nomiss)
+ return;
+ if (next_state == ready_nomiss)
+ ha_start_timer_ns(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
+ else if (next_state == running_nomiss)
+ ha_start_timer_ns(ha_mon, clk_nomiss, DEADLINE_NS(ha_mon), time_ns);
+ else if (curr_state == ready_nomiss)
+ ha_cancel_timer(ha_mon);
+ else if (curr_state == running_nomiss)
+ ha_cancel_timer(ha_mon);
+}
+
+static bool ha_verify_constraint(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (!ha_verify_invariants(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ ha_convert_inv_guard(ha_mon, curr_state, event, next_state, time_ns);
+
+ if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ ha_setup_invariants(ha_mon, curr_state, event, next_state, time_ns);
+
+ return true;
+}
+
+static void handle_dl_replenish(void *data, struct sched_dl_entity *dl, int cpu)
+{
+ da_handle_event(EXPAND_ID(dl, cpu), dl_replenish_nomiss);
+}
+
+static void handle_dl_throttle(void *data, struct sched_dl_entity *dl, int cpu)
+{
+ da_handle_event(EXPAND_ID(dl, cpu), dl_throttle_nomiss);
+}
+
+static void handle_dl_server_start(void *data, struct sched_dl_entity *dl, int cpu)
+{
+ da_handle_event(EXPAND_ID(dl, cpu), dl_server_start_nomiss);
+}
+
+static void handle_dl_server_stop(void *data, struct sched_dl_entity *dl, int cpu, bool hard)
+{
+ /*
+ * This isn't the standard use of da_handle_start_run_event since those
+ * events cannot only occur from the initial states.
+ * It is fine to use here because they always bring to a known state
+ * and the fact we "pretend" the transition starts from the initial
+ * state has no side effect.
+ * dl_server_idle_nomiss is an even more special case as it can also be
+ * triggered by the CPU going idle, only that can lead to other states.
+ */
+ if (hard)
+ da_handle_start_run_event(EXPAND_ID(dl, cpu), dl_server_stop_nomiss);
+ else
+ da_handle_start_run_event(EXPAND_ID(dl, cpu), dl_server_idle_nomiss);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ struct sched_dl_entity *dl;
+ int cpu = task_cpu(next);
+
+ if (prev_state != TASK_RUNNING && !preempt && prev->policy == SCHED_DEADLINE)
+ da_handle_event(EXPAND_ID(&prev->dl, cpu), sched_switch_suspend_nomiss);
+ if (next->policy == SCHED_DEADLINE)
+ da_handle_start_run_event(EXPAND_ID(&next->dl, cpu), sched_switch_in_nomiss);
+
+ /*
+ * The server is available in next only if the next task is boosted,
+ * otherwise we need to retrieve it.
+ * Here the server continues in the state running/armed until actually
+ * stopped, this works since we continue expecting a throttle.
+ */
+ dl = get_fair_server(next);
+ if (!dl)
+ return;
+ if (next->dl_server)
+ da_handle_start_run_event(EXPAND_ID(next->dl_server, cpu), sched_switch_in_nomiss);
+ else if (is_idle_task(next))
+ da_handle_event(EXPAND_ID(dl, cpu), dl_server_idle_nomiss);
+}
+
+static void handle_syscall(void *data, struct pt_regs *regs, long id)
+{
+ struct task_struct *p;
+ int new_policy = -1;
+
+ new_policy = extract_params(regs, id, &p);
+ if (new_policy < 0 || new_policy == p->policy)
+ return;
+ if (p->policy == SCHED_DEADLINE)
+ da_reset(EXPAND_ID(&p->dl, DL_TASK));
+ else if (new_policy == SCHED_DEADLINE)
+ da_create_conditional(EXPAND_ID(&p->dl, DL_TASK), GFP_NOWAIT);
+}
+
+static void handle_sched_wakeup(void *data, struct task_struct *tsk)
+{
+ if (tsk->policy == SCHED_DEADLINE)
+ da_handle_event(EXPAND_ID(&tsk->dl, task_cpu(tsk)), sched_wakeup_nomiss);
+}
+
+static int enable_nomiss(void)
+{
+ int retval;
+
+ retval = da_monitor_init();
+ if (retval)
+ return retval;
+
+ retval = init_storage();
+ if (retval)
+ return retval;
+ rv_attach_trace_probe("nomiss", sched_dl_replenish_tp, handle_dl_replenish);
+ rv_attach_trace_probe("nomiss", sched_dl_throttle_tp, handle_dl_throttle);
+ rv_attach_trace_probe("nomiss", sched_dl_server_start_tp, handle_dl_server_start);
+ rv_attach_trace_probe("nomiss", sched_dl_server_stop_tp, handle_dl_server_stop);
+ rv_attach_trace_probe("nomiss", sched_switch, handle_sched_switch);
+ rv_attach_trace_probe("nomiss", sched_wakeup, handle_sched_wakeup);
+ if (!should_skip_syscall_handle())
+ rv_attach_trace_probe("nomiss", sys_enter, handle_syscall);
+ rv_attach_trace_probe("nomiss", task_newtask, handle_newtask);
+ rv_attach_trace_probe("nomiss", sched_process_exit, handle_exit);
+
+ return 0;
+}
+
+static void disable_nomiss(void)
+{
+ rv_this.enabled = 0;
+
+ /* Those are RCU writers, detach earlier hoping to close a bit faster */
+ rv_detach_trace_probe("nomiss", task_newtask, handle_newtask);
+ rv_detach_trace_probe("nomiss", sched_process_exit, handle_exit);
+ if (!should_skip_syscall_handle())
+ rv_detach_trace_probe("nomiss", sys_enter, handle_syscall);
+
+ rv_detach_trace_probe("nomiss", sched_dl_replenish_tp, handle_dl_replenish);
+ rv_detach_trace_probe("nomiss", sched_dl_throttle_tp, handle_dl_throttle);
+ rv_detach_trace_probe("nomiss", sched_dl_server_start_tp, handle_dl_server_start);
+ rv_detach_trace_probe("nomiss", sched_dl_server_stop_tp, handle_dl_server_stop);
+ rv_detach_trace_probe("nomiss", sched_switch, handle_sched_switch);
+ rv_detach_trace_probe("nomiss", sched_wakeup, handle_sched_wakeup);
+
+ da_monitor_destroy();
+}
+
+static struct rv_monitor rv_this = {
+ .name = "nomiss",
+ .description = "dl entities run to completion before their deadiline.",
+ .enable = enable_nomiss,
+ .disable = disable_nomiss,
+ .reset = da_monitor_reset_all,
+ .enabled = 0,
+};
+
+static int __init register_nomiss(void)
+{
+ return rv_register_monitor(&rv_this, &rv_deadline);
+}
+
+static void __exit unregister_nomiss(void)
+{
+ rv_unregister_monitor(&rv_this);
+}
+
+module_init(register_nomiss);
+module_exit(unregister_nomiss);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("nomiss: dl entities run to completion before their deadiline.");
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss.h b/kernel/trace/rv/monitors/nomiss/nomiss.h
new file mode 100644
index 000000000000..2cbfdbbc900f
--- /dev/null
+++ b/kernel/trace/rv/monitors/nomiss/nomiss.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of nomiss automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+#define MONITOR_NAME nomiss
+
+enum states_nomiss {
+ ready_nomiss,
+ idle_nomiss,
+ running_nomiss,
+ sleeping_nomiss,
+ throttled_nomiss,
+ state_max_nomiss,
+};
+
+#define INVALID_STATE state_max_nomiss
+
+enum events_nomiss {
+ dl_replenish_nomiss,
+ dl_server_idle_nomiss,
+ dl_server_start_nomiss,
+ dl_server_stop_nomiss,
+ dl_throttle_nomiss,
+ sched_switch_in_nomiss,
+ sched_switch_suspend_nomiss,
+ sched_wakeup_nomiss,
+ event_max_nomiss,
+};
+
+enum envs_nomiss {
+ clk_nomiss,
+ is_constr_dl_nomiss,
+ is_defer_nomiss,
+ env_max_nomiss,
+ env_max_stored_nomiss = is_constr_dl_nomiss,
+};
+
+_Static_assert(env_max_stored_nomiss <= MAX_HA_ENV_LEN, "Not enough slots");
+#define HA_CLK_NS
+
+struct automaton_nomiss {
+ char *state_names[state_max_nomiss];
+ char *event_names[event_max_nomiss];
+ char *env_names[env_max_nomiss];
+ unsigned char function[state_max_nomiss][event_max_nomiss];
+ unsigned char initial_state;
+ bool final_states[state_max_nomiss];
+};
+
+static const struct automaton_nomiss automaton_nomiss = {
+ .state_names = {
+ "ready",
+ "idle",
+ "running",
+ "sleeping",
+ "throttled",
+ },
+ .event_names = {
+ "dl_replenish",
+ "dl_server_idle",
+ "dl_server_start",
+ "dl_server_stop",
+ "dl_throttle",
+ "sched_switch_in",
+ "sched_switch_suspend",
+ "sched_wakeup",
+ },
+ .env_names = {
+ "clk",
+ "is_constr_dl",
+ "is_defer",
+ },
+ .function = {
+ {
+ ready_nomiss,
+ idle_nomiss,
+ ready_nomiss,
+ sleeping_nomiss,
+ throttled_nomiss,
+ running_nomiss,
+ INVALID_STATE,
+ ready_nomiss,
+ },
+ {
+ ready_nomiss,
+ idle_nomiss,
+ INVALID_STATE,
+ INVALID_STATE,
+ throttled_nomiss,
+ running_nomiss,
+ INVALID_STATE,
+ INVALID_STATE,
+ },
+ {
+ running_nomiss,
+ idle_nomiss,
+ INVALID_STATE,
+ sleeping_nomiss,
+ throttled_nomiss,
+ running_nomiss,
+ sleeping_nomiss,
+ running_nomiss,
+ },
+ {
+ ready_nomiss,
+ sleeping_nomiss,
+ sleeping_nomiss,
+ INVALID_STATE,
+ throttled_nomiss,
+ INVALID_STATE,
+ INVALID_STATE,
+ ready_nomiss,
+ },
+ {
+ ready_nomiss,
+ throttled_nomiss,
+ throttled_nomiss,
+ INVALID_STATE,
+ throttled_nomiss,
+ INVALID_STATE,
+ throttled_nomiss,
+ throttled_nomiss,
+ },
+ },
+ .initial_state = ready_nomiss,
+ .final_states = { 1, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/nomiss/nomiss_trace.h b/kernel/trace/rv/monitors/nomiss/nomiss_trace.h
new file mode 100644
index 000000000000..42e7efaca4e7
--- /dev/null
+++ b/kernel/trace/rv/monitors/nomiss/nomiss_trace.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_NOMISS
+DEFINE_EVENT(event_da_monitor_id, event_nomiss,
+ TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_nomiss,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+
+DEFINE_EVENT(error_env_da_monitor_id, error_env_nomiss,
+ TP_PROTO(int id, char *state, char *event, char *env),
+ TP_ARGS(id, state, event, env));
+#endif /* CONFIG_RV_MON_NOMISS */
diff --git a/kernel/trace/rv/monitors/throttle/Kconfig b/kernel/trace/rv/monitors/throttle/Kconfig
new file mode 100644
index 000000000000..d9bd2dc903cd
--- /dev/null
+++ b/kernel/trace/rv/monitors/throttle/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+config RV_MON_THROTTLE
+ depends on RV
+ depends on HAVE_SYSCALL_TRACEPOINTS
+ depends on RV_MON_DEADLINE
+ default y
+ select HA_MON_EVENTS_ID
+ bool "throttle monitor"
+ help
+ Monitor to ensure dl entities are throttled when they use up their runtime.
+ This monitor is part of the deadline monitors collection.
+
+ For further information, see:
+ Documentation/trace/rv/monitor_deadline.rst
diff --git a/kernel/trace/rv/monitors/throttle/throttle.c b/kernel/trace/rv/monitors/throttle/throttle.c
new file mode 100644
index 000000000000..2eceb11335c5
--- /dev/null
+++ b/kernel/trace/rv/monitors/throttle/throttle.c
@@ -0,0 +1,253 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/ftrace.h>
+#include <linux/tracepoint.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/rv.h>
+#include <rv/instrumentation.h>
+
+#define MODULE_NAME "throttle"
+
+#include <uapi/linux/sched/types.h>
+#include <trace/events/syscalls.h>
+#include <trace/events/sched.h>
+#include <trace/events/task.h>
+#include <rv_trace.h>
+
+#define RV_MON_TYPE RV_MON_PER_OBJ
+#define HA_TIMER_TYPE HA_TIMER_WHEEL
+/* The start condition is on sched_switch, it's dangerous to allocate there */
+#define DA_SKIP_AUTO_ALLOC
+typedef struct sched_dl_entity *monitor_target;
+#include "throttle.h"
+#include <rv/ha_monitor.h>
+#include <monitors/deadline/deadline.h>
+
+#define THROTTLED_TIME_NS TICK_NSEC
+/* with sched_feat(HRTICK_DL) the threshold can be lower */
+#define RUNTIME_THRESH TICK_NSEC
+
+static inline u64 runtime_left_ns(struct ha_monitor *ha_mon)
+{
+ return pi_of(ha_get_target(ha_mon))->runtime + RUNTIME_THRESH;
+}
+
+static u64 ha_get_env(struct ha_monitor *ha_mon, enum envs_throttle env, u64 time_ns)
+{
+ if (env == clk_throttle)
+ return ha_get_clk_ns(ha_mon, env, time_ns);
+ else if (env == is_constr_dl_throttle)
+ return !dl_is_implicit(ha_get_target(ha_mon));
+ else if (env == yielded_throttle)
+ return ha_get_target(ha_mon)->dl_yielded;
+ return ENV_INVALID_VALUE;
+}
+
+static void ha_reset_env(struct ha_monitor *ha_mon, enum envs_throttle env, u64 time_ns)
+{
+ if (env == clk_throttle)
+ ha_reset_clk_ns(ha_mon, env, time_ns);
+}
+
+static inline bool ha_verify_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (curr_state == running_throttle)
+ return ha_check_invariant_ns(ha_mon, clk_throttle, time_ns);
+ else if (curr_state == throttled_throttle)
+ return ha_check_invariant_ns(ha_mon, clk_throttle, time_ns);
+ return true;
+}
+
+static inline bool ha_verify_guards(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ bool res = true;
+
+ if (curr_state == armed_throttle && event == sched_switch_in_throttle)
+ ha_reset_env(ha_mon, clk_throttle, time_ns);
+ else if (curr_state == armed_throttled_throttle && event == dl_throttle_throttle)
+ res = ha_get_env(ha_mon, yielded_throttle, time_ns) == 1ull;
+ else if (curr_state == preempted_throttle && event == dl_throttle_throttle)
+ res = ha_get_env(ha_mon, is_constr_dl_throttle, time_ns) == 1ull;
+ else if (curr_state == preempted_throttle && event == sched_switch_in_throttle)
+ ha_reset_env(ha_mon, clk_throttle, time_ns);
+ else if (curr_state == running_throttle && event == dl_replenish_throttle)
+ ha_reset_env(ha_mon, clk_throttle, time_ns);
+ else if (curr_state == running_throttle && event == dl_throttle_throttle)
+ ha_reset_env(ha_mon, clk_throttle, time_ns);
+ else if (curr_state == throttled_throttle && event == dl_replenish_throttle)
+ ha_reset_env(ha_mon, clk_throttle, time_ns);
+ return res;
+}
+
+static inline void ha_setup_invariants(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (next_state == curr_state && event != dl_replenish_throttle)
+ return;
+ if (next_state == running_throttle)
+ ha_start_timer_ns(ha_mon, clk_throttle, runtime_left_ns(ha_mon), time_ns);
+ else if (next_state == throttled_throttle)
+ ha_start_timer_ns(ha_mon, clk_throttle, THROTTLED_TIME_NS, time_ns);
+ else if (curr_state == running_throttle)
+ ha_cancel_timer(ha_mon);
+ else if (curr_state == throttled_throttle)
+ ha_cancel_timer(ha_mon);
+}
+
+static bool ha_verify_constraint(struct ha_monitor *ha_mon,
+ enum states curr_state, enum events event,
+ enum states next_state, u64 time_ns)
+{
+ if (!ha_verify_invariants(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ if (!ha_verify_guards(ha_mon, curr_state, event, next_state, time_ns))
+ return false;
+
+ ha_setup_invariants(ha_mon, curr_state, event, next_state, time_ns);
+
+ return true;
+}
+
+static void handle_dl_replenish(void *data, struct sched_dl_entity *dl, int cpu)
+{
+ da_handle_event(EXPAND_ID(dl, cpu), dl_replenish_throttle);
+}
+
+static void handle_dl_throttle(void *data, struct sched_dl_entity *dl, int cpu)
+{
+ da_handle_event(EXPAND_ID(dl, cpu), dl_throttle_throttle);
+}
+
+static void handle_dl_server_stop(void *data, struct sched_dl_entity *dl, int cpu, bool hard)
+{
+ if (hard)
+ da_handle_event(EXPAND_ID(dl, cpu), sched_switch_out_throttle);
+}
+
+static void handle_sched_switch(void *data, bool preempt,
+ struct task_struct *prev,
+ struct task_struct *next,
+ unsigned int prev_state)
+{
+ struct sched_dl_entity *dl;
+ int cpu = task_cpu(next);
+
+ if (prev->policy == SCHED_DEADLINE)
+ da_handle_event(EXPAND_ID(&prev->dl, cpu), sched_switch_out_throttle);
+ if (next->policy == SCHED_DEADLINE)
+ da_handle_start_event(EXPAND_ID(&next->dl, cpu), sched_switch_in_throttle);
+
+ /*
+ * The server is available in next only if the next task is boosted,
+ * otherwise we need to retrieve it.
+ * Here the server continues in the state running/armed until actually
+ * stopped, this works since we continue expecting a throttle.
+ */
+ dl = get_fair_server(next);
+ if (!dl)
+ return;
+ if (next->dl_server)
+ da_handle_start_event(EXPAND_ID(next->dl_server, cpu), sched_switch_in_throttle);
+ else if (is_idle_task(next) || next->policy == SCHED_NORMAL)
+ da_handle_event(EXPAND_ID(dl, cpu), dl_defer_arm_throttle);
+}
+
+static void handle_syscall(void *data, struct pt_regs *regs, long id)
+{
+ struct task_struct *p;
+ int new_policy = -1;
+
+ new_policy = extract_params(regs, id, &p);
+ if (new_policy < 0 || new_policy == p->policy)
+ return;
+ if (p->policy == SCHED_DEADLINE) {
+ da_reset(EXPAND_ID(&p->dl, DL_TASK));
+ /*
+ * When a task changes from SCHED_DEADLINE to SCHED_NORMAL, the
+ * runtime after the change is counted in the fair server.
+ */
+ if (new_policy == SCHED_NORMAL) {
+ struct sched_dl_entity *dl = get_fair_server(p);
+
+ if (!dl || !p->on_cpu)
+ return;
+ da_handle_event(EXPAND_ID(dl, task_cpu(p)), dl_defer_arm_throttle);
+ }
+ } else if (new_policy == SCHED_DEADLINE) {
+ da_create_conditional(EXPAND_ID(&p->dl, DL_TASK), GFP_NOWAIT);
+ }
+}
+
+static int enable_throttle(void)
+{
+ int retval;
+
+ retval = da_monitor_init();
+ if (retval)
+ return retval;
+
+ retval = init_storage();
+ if (retval)
+ return retval;
+ rv_attach_trace_probe("throttle", sched_dl_replenish_tp, handle_dl_replenish);
+ rv_attach_trace_probe("throttle", sched_dl_throttle_tp, handle_dl_throttle);
+ rv_attach_trace_probe("throttle", sched_switch, handle_sched_switch);
+ if (!should_skip_syscall_handle())
+ rv_attach_trace_probe("throttle", sys_enter, handle_syscall);
+ rv_attach_trace_probe("throttle", task_newtask, handle_newtask);
+ rv_attach_trace_probe("nomiss", sched_dl_server_stop_tp, handle_dl_server_stop);
+ rv_attach_trace_probe("throttle", sched_process_exit, handle_exit);
+
+ return 0;
+}
+
+static void disable_throttle(void)
+{
+ rv_this.enabled = 0;
+
+ /* Those are RCU writers, detach earlier hoping to close a bit faster */
+ rv_detach_trace_probe("throttle", task_newtask, handle_newtask);
+ rv_detach_trace_probe("throttle", sched_process_exit, handle_exit);
+ if (!should_skip_syscall_handle())
+ rv_detach_trace_probe("throttle", sys_enter, handle_syscall);
+
+ rv_detach_trace_probe("throttle", sched_dl_replenish_tp, handle_dl_replenish);
+ rv_detach_trace_probe("throttle", sched_dl_throttle_tp, handle_dl_throttle);
+ rv_detach_trace_probe("nomiss", sched_dl_server_stop_tp, handle_dl_server_stop);
+ rv_detach_trace_probe("throttle", sched_switch, handle_sched_switch);
+
+ da_monitor_destroy();
+}
+
+static struct rv_monitor rv_this = {
+ .name = "throttle",
+ .description = "throttle dl entities when they use up their runtime.",
+ .enable = enable_throttle,
+ .disable = disable_throttle,
+ .reset = da_monitor_reset_all,
+ .enabled = 0,
+};
+
+static int __init register_throttle(void)
+{
+ return rv_register_monitor(&rv_this, &rv_deadline);
+}
+
+static void __exit unregister_throttle(void)
+{
+ rv_unregister_monitor(&rv_this);
+}
+
+module_init(register_throttle);
+module_exit(unregister_throttle);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Gabriele Monaco <gmonaco@redhat.com>");
+MODULE_DESCRIPTION("throttle: throttle dl entities when they use up their runtime.");
diff --git a/kernel/trace/rv/monitors/throttle/throttle.h b/kernel/trace/rv/monitors/throttle/throttle.h
new file mode 100644
index 000000000000..17bfa89ea26b
--- /dev/null
+++ b/kernel/trace/rv/monitors/throttle/throttle.h
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Automatically generated C representation of throttle automaton
+ * For further information about this format, see kernel documentation:
+ * Documentation/trace/rv/deterministic_automata.rst
+ */
+
+#define MONITOR_NAME throttle
+
+enum states_throttle {
+ running_throttle,
+ armed_throttle,
+ armed_throttled_throttle,
+ preempted_throttle,
+ preempted_throttled_throttle,
+ throttled_throttle,
+ state_max_throttle,
+};
+
+#define INVALID_STATE state_max_throttle
+
+enum events_throttle {
+ dl_defer_arm_throttle,
+ dl_replenish_throttle,
+ dl_throttle_throttle,
+ sched_switch_in_throttle,
+ sched_switch_out_throttle,
+ event_max_throttle,
+};
+
+enum envs_throttle {
+ clk_throttle,
+ is_constr_dl_throttle,
+ yielded_throttle,
+ env_max_throttle,
+ env_max_stored_throttle = is_constr_dl_throttle,
+};
+
+_Static_assert(env_max_stored_throttle <= MAX_HA_ENV_LEN, "Not enough slots");
+#define HA_CLK_NS
+
+struct automaton_throttle {
+ char *state_names[state_max_throttle];
+ char *event_names[event_max_throttle];
+ char *env_names[env_max_throttle];
+ unsigned char function[state_max_throttle][event_max_throttle];
+ unsigned char initial_state;
+ bool final_states[state_max_throttle];
+};
+
+static const struct automaton_throttle automaton_throttle = {
+ .state_names = {
+ "running",
+ "armed",
+ "armed_throttled",
+ "preempted",
+ "preempted_throttled",
+ "throttled",
+ },
+ .event_names = {
+ "dl_defer_arm",
+ "dl_replenish",
+ "dl_throttle",
+ "sched_switch_in",
+ "sched_switch_out",
+ },
+ .env_names = {
+ "clk",
+ "is_constr_dl",
+ "yielded",
+ },
+ .function = {
+ {
+ armed_throttle,
+ running_throttle,
+ throttled_throttle,
+ running_throttle,
+ preempted_throttle,
+ },
+ {
+ armed_throttle,
+ armed_throttle,
+ armed_throttled_throttle,
+ running_throttle,
+ preempted_throttle,
+ },
+ {
+ armed_throttled_throttle,
+ armed_throttle,
+ armed_throttled_throttle,
+ INVALID_STATE,
+ preempted_throttled_throttle,
+ },
+ {
+ armed_throttle,
+ preempted_throttle,
+ preempted_throttled_throttle,
+ running_throttle,
+ preempted_throttle,
+ },
+ {
+ armed_throttled_throttle,
+ preempted_throttle,
+ INVALID_STATE,
+ INVALID_STATE,
+ preempted_throttled_throttle,
+ },
+ {
+ armed_throttled_throttle,
+ running_throttle,
+ INVALID_STATE,
+ INVALID_STATE,
+ preempted_throttled_throttle,
+ },
+ },
+ .initial_state = running_throttle,
+ .final_states = { 1, 0, 0, 0, 0, 0 },
+};
diff --git a/kernel/trace/rv/monitors/throttle/throttle_trace.h b/kernel/trace/rv/monitors/throttle/throttle_trace.h
new file mode 100644
index 000000000000..7e376d3aec60
--- /dev/null
+++ b/kernel/trace/rv/monitors/throttle/throttle_trace.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Snippet to be included in rv_trace.h
+ */
+
+#ifdef CONFIG_RV_MON_THROTTLE
+DEFINE_EVENT(event_da_monitor_id, event_throttle,
+ TP_PROTO(int id, char *state, char *event, char *next_state, bool final_state),
+ TP_ARGS(id, state, event, next_state, final_state));
+
+DEFINE_EVENT(error_da_monitor_id, error_throttle,
+ TP_PROTO(int id, char *state, char *event),
+ TP_ARGS(id, state, event));
+
+DEFINE_EVENT(error_env_da_monitor_id, error_env_throttle,
+ TP_PROTO(int id, char *state, char *event, char *env),
+ TP_ARGS(id, state, event, env));
+#endif /* CONFIG_RV_MON_THROTTLE */
diff --git a/kernel/trace/rv/rv_trace.h b/kernel/trace/rv/rv_trace.h
index 8ac4f315a627..687f42ba2534 100644
--- a/kernel/trace/rv/rv_trace.h
+++ b/kernel/trace/rv/rv_trace.h
@@ -188,6 +188,8 @@ DECLARE_EVENT_CLASS(error_env_da_monitor_id,
);
#include <monitors/stall/stall_trace.h>
+#include <monitors/nomiss/nomiss_trace.h>
+#include <monitors/throttle/throttle_trace.h>
// Add new monitors based on CONFIG_HA_MON_EVENTS_ID here
#endif
diff --git a/tools/verification/models/deadline/nomiss.dot b/tools/verification/models/deadline/nomiss.dot
new file mode 100644
index 000000000000..c52970d6a8f7
--- /dev/null
+++ b/tools/verification/models/deadline/nomiss.dot
@@ -0,0 +1,39 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = circle] "idle"};
+ {node [shape = plaintext, style=invis, label=""] "__init_ready"};
+ {node [shape = doublecircle] "ready"};
+ {node [shape = circle] "ready"};
+ {node [shape = circle] "running"};
+ {node [shape = circle] "sleeping"};
+ {node [shape = circle] "throttled"};
+ "__init_ready" -> "ready";
+ "idle" [label = "idle"];
+ "idle" -> "idle" [ label = "dl_server_idle" ];
+ "idle" -> "ready" [ label = "dl_replenish;reset(clk)" ];
+ "idle" -> "running" [ label = "sched_switch_in" ];
+ "idle" -> "throttled" [ label = "dl_throttle" ];
+ "ready" [label = "ready\nclk < DEADLINE_NS()", color = green3];
+ "ready" -> "idle" [ label = "dl_server_idle" ];
+ "ready" -> "ready" [ label = "sched_wakeup\ndl_server_start\ndl_replenish;reset(clk)" ];
+ "ready" -> "running" [ label = "sched_switch_in" ];
+ "ready" -> "sleeping" [ label = "dl_server_stop" ];
+ "ready" -> "throttled" [ label = "dl_throttle;is_defer == 1" ];
+ "running" [label = "running\nclk < DEADLINE_NS()"];
+ "running" -> "idle" [ label = "dl_server_idle" ];
+ "running" -> "running" [ label = "dl_replenish;reset(clk)\nsched_switch_in\nsched_wakeup" ];
+ "running" -> "sleeping" [ label = "sched_switch_suspend\ndl_server_stop" ];
+ "running" -> "throttled" [ label = "dl_throttle" ];
+ "sleeping" [label = "sleeping"];
+ "sleeping" -> "ready" [ label = "sched_wakeup\ndl_replenish;reset(clk)" ];
+ "sleeping" -> "sleeping" [ label = "dl_server_idle\ndl_server_start" ];
+ "sleeping" -> "throttled" [ label = "dl_throttle;is_constr_dl == 1" ];
+ "throttled" [label = "throttled"];
+ "throttled" -> "ready" [ label = "dl_replenish;reset(clk)" ];
+ "throttled" -> "throttled" [ label = "sched_switch_suspend\nsched_wakeup\ndl_server_start\ndl_server_idle\ndl_throttle" ];
+ { rank = min ;
+ "__init_ready";
+ "ready";
+ }
+}
diff --git a/tools/verification/models/deadline/throttle.dot b/tools/verification/models/deadline/throttle.dot
new file mode 100644
index 000000000000..b68b054a635a
--- /dev/null
+++ b/tools/verification/models/deadline/throttle.dot
@@ -0,0 +1,44 @@
+digraph state_automaton {
+ center = true;
+ size = "7,11";
+ {node [shape = circle] "armed"};
+ {node [shape = circle] "armed_throttled"};
+ {node [shape = circle] "preempted"};
+ {node [shape = circle] "preempted_throttled"};
+ {node [shape = plaintext, style=invis, label=""] "__init_running"};
+ {node [shape = doublecircle] "running"};
+ {node [shape = circle] "running"};
+ {node [shape = circle] "throttled"};
+ "__init_running" -> "running";
+ "armed" [label = "armed"];
+ "armed" -> "armed" [ label = "dl_replenish\ndl_defer_arm" ];
+ "armed" -> "armed_throttled" [ label = "dl_throttle" ];
+ "armed" -> "preempted" [ label = "sched_switch_out" ];
+ "armed" -> "running" [ label = "sched_switch_in;reset(clk)" ];
+ "armed_throttled" [label = "armed_throttled"];
+ "armed_throttled" -> "armed" [ label = "dl_replenish" ];
+ "armed_throttled" -> "armed_throttled" [ label = "dl_defer_arm\ndl_throttle;yielded==1" ];
+ "armed_throttled" -> "preempted_throttled" [ label = "sched_switch_out" ];
+ "preempted" [label = "preempted"];
+ "preempted" -> "armed" [ label = "dl_defer_arm" ];
+ "preempted" -> "preempted" [ label = "dl_replenish\nsched_switch_out" ];
+ "preempted" -> "preempted_throttled" [ label = "dl_throttle;is_constr_dl == 1" ];
+ "preempted" -> "running" [ label = "sched_switch_in;reset(clk)" ];
+ "preempted_throttled" [label = "preempted_throttled"];
+ "preempted_throttled" -> "armed_throttled" [ label = "dl_defer_arm" ];
+ "preempted_throttled" -> "preempted" [ label = "dl_replenish" ];
+ "preempted_throttled" -> "preempted_throttled" [ label = "sched_switch_out" ];
+ "running" [label = "running\nclk < runtime_left_ns()", color = green3];
+ "running" -> "armed" [ label = "dl_defer_arm" ];
+ "running" -> "preempted" [ label = "sched_switch_out" ];
+ "running" -> "running" [ label = "dl_replenish;reset(clk)\nsched_switch_in" ];
+ "running" -> "throttled" [ label = "dl_throttle;reset(clk)" ];
+ "throttled" [label = "throttled\nclk < THROTTLED_TIME_NS"];
+ "throttled" -> "armed_throttled" [ label = "dl_defer_arm" ];
+ "throttled" -> "preempted_throttled" [ label = "sched_switch_out" ];
+ "throttled" -> "running" [ label = "dl_replenish;reset(clk)" ];
+ { rank = min ;
+ "__init_running";
+ "running";
+ }
+}
--
2.51.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring
2025-09-19 14:09 ` [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring Gabriele Monaco
@ 2025-10-02 9:26 ` Nam Cao
0 siblings, 0 replies; 14+ messages in thread
From: Nam Cao @ 2025-10-02 9:26 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Steven Rostedt, Jonathan Corbet,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
John Kacur
Gabriele Monaco <gmonaco@redhat.com> writes:
> Previous changes refactored the da_monitor header file to avoid using
> macros. This implies a few changes in how to import and use da_monitor
> helpers:
>
> DECLARE_DA_MON_<TYPE>(name, type) is substituted by
> #define RV_MON_TYPE RV_MON_<TYPE>
>
> da_handle_event_<name>() is substituted by
> da_handle_event()
>
> Update the documentation to reflect the changes.
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
This is probably best reviewed by someone without much RV background. As
I already know these information, I'm not sure whether the doc is
confusing to outsiders.
But from my point of view:
Reviewed-by: Nam Cao <namcao@linutronix.de>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata
2025-09-19 14:09 ` [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata Gabriele Monaco
@ 2025-10-10 13:46 ` Nam Cao
2025-10-13 8:33 ` Gabriele Monaco
0 siblings, 1 reply; 14+ messages in thread
From: Nam Cao @ 2025-10-10 13:46 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Steven Rostedt, Jonathan Corbet,
linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
John Kacur
Gabriele Monaco <gmonaco@redhat.com> writes:
> Describe theory and implementation of hybrid automata in the dedicated
> page hybrid_automata.rst
> Include a section on how to integrate a hybrid automaton in
> monitor_synthesis.rst
> Also remove a hanging $ in deterministic_automata.rst
>
> Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> ---
> .../trace/rv/deterministic_automata.rst | 2 +-
> Documentation/trace/rv/hybrid_automata.rst | 340 ++++++++++++++++++
> Documentation/trace/rv/index.rst | 1 +
> Documentation/trace/rv/monitor_synthesis.rst | 107 ++++++
> 4 files changed, 449 insertions(+), 1 deletion(-)
> create mode 100644 Documentation/trace/rv/hybrid_automata.rst
>
> diff --git a/Documentation/trace/rv/deterministic_automata.rst b/Documentation/trace/rv/deterministic_automata.rst
> index d0638f95a455..7a1c2b20ec72 100644
> --- a/Documentation/trace/rv/deterministic_automata.rst
> +++ b/Documentation/trace/rv/deterministic_automata.rst
> @@ -11,7 +11,7 @@ where:
> - *E* is the finite set of events;
> - x\ :subscript:`0` is the initial state;
> - X\ :subscript:`m` (subset of *X*) is the set of marked (or final) states.
> -- *f* : *X* x *E* -> *X* $ is the transition function. It defines the state
> +- *f* : *X* x *E* -> *X* is the transition function. It defines the state
> transition in the occurrence of an event from *E* in the state *X*. In the
> special case of deterministic automata, the occurrence of the event in *E*
> in a state in *X* has a deterministic next state from *X*.
> diff --git a/Documentation/trace/rv/hybrid_automata.rst b/Documentation/trace/rv/hybrid_automata.rst
> new file mode 100644
> index 000000000000..ea701114c54d
> --- /dev/null
> +++ b/Documentation/trace/rv/hybrid_automata.rst
> @@ -0,0 +1,340 @@
> +Hybrid Automata
> +===============
> +
> +Hybrid automata are an extension of deterministic automata, there are several
> +definitions of hybrid automata in the literature. The adaptation implemented
> +here is formally denoted by G and defined as a 7-tuple:
> +
> + *G* = { *X*, *E*, *V*, *f*, x\ :subscript:`0`, X\ :subscript:`m`, *i* }
> +
> +- *X* is the set of states;
> +- *E* is the finite set of events;
> +- *V* is the finite set of environment variables;
> +- x\ :subscript:`0` is the initial state;
> +- X\ :subscript:`m` (subset of *X*) is the set of marked (or final) states.
> +- *f* : *X* x *E* x *C(V)* -> *X* is the transition function.
> + It defines the state transition in the occurrence of an event from *E* in the
> + state *X*. Unlike deterministic automata, the transition function also
> + includes guards from the set of all possible constraints (defined as *C(V)*).
> + Guards can be true or false with the valuation of *V* when the event occurs,
> + and the transition is possible only when constraints are true. Similarly to
> + deterministic automata, the occurrence of the event in *E* in a state in *X*
> + has a deterministic next state from *X*, if the guard is true.
> +- *i* : *X* -> *C'(V)* is the invariant assignment function, this is a
> + constraint assigned to each state in *X*, every state in *X* must be left
> + before the invariant turns to false. We can omit the representation of
> + invariants whose value is true regardless of the valuation of *V*.
This brings back bad memories from university..
> +
> +The set of all possible constraints *C(V)* is defined according to the
> +following grammar:
> +
> + g = v < c | v > c | v <= c | v >= c | v == c | v != c | g && g | true
> +
> +With v a variable in *V* and c a numerical value.
> +
> +We define the special case of hybrid automata whose variables grow with uniform
> +rates as timed automata. In this case, the variables are called clocks.
> +As the name implies, timed automata can be used to describe real time.
> +Additionally, clocks support another type of guard which always evaluates to true:
> +
> + reset(v)
> +
> +The reset constraint is used to set the value of a clock to 0.
> +
> +The set of invariant constraints *C'(V)* is a subset of *C(V)* including only
> +constraint of the form:
> +
> + g = v < c | true
> +
> +This simplifies the implementation as a clock expiration is a necessary and
> +sufficient condition for the violation of invariants while still allowing more
> +complex constraints to be specified as guards.
> +
> +It is important to note that any valid hybrid automaton is a valid
> +deterministic automaton with additional guards and invariants. Those can only
> +further constrain what transitions are valid but it is not possible to define
> +transition functions starting from the same state in *X* and the same event in
> +*E* but ending up in different states in *X* based on the valuation of *V*.
Perhaps remove the double "valid". Usually people use the phrase "any
valid A is a valid B" to say that B is a superset of A, but it is
opposite here.
> +
> +Examples
> +--------
> +
> +Wip as hybrid automaton
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The 'wip' (wakeup in preemptive) example introduced as a deterministic automaton
> +can also be described as:
> +
> +- *X* = { ``any_thread_running`` }
> +- *E* = { ``sched_waking`` }
> +- *V* = { ``preemptive`` }
> +- x\ :subscript:`0` = ``any_thread_running``
> +- X\ :subscript:`m` = {``any_thread_running``}
> +- *f* =
> + - *f*\ (``any_thread_running``, ``sched_waking``, ``preemptive==0``) = ``any_thread_running``
> +- *i* =
> + - *i*\ (``any_thread_running``) = ``true``
> +
> +Which can be represented graphically as::
> +
> + |
> + |
> + v
> + #====================# sched_waking;preemptive==0
> + H H ------------------------------+
> + H any_thread_running H |
> + H H <-----------------------------+
> + #====================#
> +
> +In this example, by using the preemptive state of the system as an environment
> +variable, we can assert this constraint on ``sched_waking`` without requiring
> +preemption events (as we would in a deterministic automaton), which can be
> +useful in case those events are not available or not reliable on the system.
> +
> +Since all the invariants in *i* are true, we can omit them from the representation.
> +
> +Stall model with guards (iteration 1)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As a sample timed automaton we can define 'stall' as:
> +
> +- *X* = { ``dequeued``, ``enqueued``, ``running``}
> +- *E* = { ``enqueue``, ``dequeue``, ``switch_in``}
> +- *V* = { ``clk`` }
> +- x\ :subscript:`0` = ``dequeue``
> +- X\ :subscript:`m` = {``dequeue``}
> +- *f* =
> + - *f*\ (``enqueued``, ``switch_in``, ``clk < threshold``) = ``running``
> + - *f*\ (``running``, ``dequeue``) = ``dequeued``
> + - *f*\ (``dequeued``, ``enqueue``, ``reset(clk)``) = ``enqueued``
> +- *i* = *omitted as all true*
> +
> +Graphically represented as::
> +
> + |
> + |
> + v
> + #============================#
> + H dequeued H <+
> + #============================# |
> + | |
> + | enqueue; reset(clk) |
> + v |
> + +----------------------------+ |
> + | enqueued | | dequeue
> + +----------------------------+ |
> + | |
> + | switch_in; clk < threshold |
> + v |
> + +----------------------------+ |
> + | running | -+
> + +----------------------------+
> +
> +This model imposes that the time between when a task is enqueued (it becomes
> +runnable) and when the task gets to run must be lower than a certain threshold.
> +A failure in this model means that the task is starving.
> +One problem in using guards on the edges in this case is that the model will
> +not report a failure until the ``switch_in`` event occurs. This means that,
> +according to the model, it is valid for the task never to run.
> +
> +Stall model with invariants (iteration 2)
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The first iteration isn't exactly what was intended, we can change the model as:
> +
> +- *X* = { ``dequeued``, ``enqueued``, ``running``}
> +- *E* = { ``enqueue``, ``dequeue``, ``switch_in``}
> +- *V* = { ``clk`` }
> +- x\ :subscript:`0` = ``dequeue``
> +- X\ :subscript:`m` = {``dequeue``}
> +- *f* =
> + - *f*\ (``enqueued``, ``switch_in``) = ``running``
> + - *f*\ (``running``, ``dequeue``) = ``dequeued``
> + - *f*\ (``dequeued``, ``enqueue``, ``reset(clk)``) = ``enqueued``
> +- *i* =
> + - *i*\ (``enqueued``) = ``clk < threshold``
> +
> +Graphically::
> +
> + |
> + |
> + v
> + #=========================#
> + H dequeued H <+
> + #=========================# |
> + | |
> + | enqueue; reset(clk) |
> + v |
> + +-------------------------+ |
> + | enqueued | |
> + | clk < threshold | | dequeue
> + +-------------------------+ |
> + | |
> + | switch_in |
> + v |
> + +-------------------------+ |
> + | running | -+
> + +-------------------------+
> +
> +In this case, we moved the guard as an invariant to the ``enqueued`` state,
> +this means we not only forbid the occurrence of ``switch_in`` when ``clk`` is
> +past the threshold but also mark as invalid in case we are *still* in
> +``enqueued`` after the threshold. This model is effectively in an invalid state
> +as soon as a task is starving, rather than when the starving task finally runs.
> +
> +Hybrid Automaton in C
> +---------------------
> +
> +The definition of hybrid automata in C is heavily based on the deterministic
> +automata one. Specifically, we add the set of environment variables and the
> +constraints (both guards on transitions and invariants on states) as follows.
> +This is a combination of both iterations of the stall example::
> +
> + /* enum representation of X (set of states) to be used as index */
> + enum states {
> + dequeued = 0,
I think you already removed this " = 0" in an earlier patch?
> + enqueued,
> + running,
> + state_max
> + };
> +
> + #define INVALID_STATE state_max
> +
> + /* enum representation of E (set of events) to be used as index */
> + enum events {
> + dequeue = 0,
> + enqueue,
> + switch_in,
> + event_max
> + };
> +
> + /* enum representation of V (set of environment variables) to be used as index */
> + enum envs {
> + clk = 0,
> + env_max,
> + env_max_stored = env_max
> + };
> +
> + struct automaton {
> + char *state_names[state_max]; // X: the set of states
> + char *event_names[event_max]; // E: the finite set of events
> + char *env_names[env_max]; // V: the finite set of env vars
> + unsigned char function[state_max][event_max]; // f: transition function
> + unsigned char initial_state; // x_0: the initial state
> + bool final_states[state_max]; // X_m: the set of marked states
> + };
> +
> + struct automaton aut = {
> + .state_names = {
> + "dequeued",
> + "enqueued",
> + "running"
> + },
> + .event_names = {
> + "dequeue",
> + "enqueue",
> + "switch_in"
> + },
> + .env_names = {
> + "clk"
> + },
> + .function = {
> + { INVALID_STATE, enqueued, INVALID_STATE },
> + { INVALID_STATE, INVALID_STATE, running },
> + { dequeued, INVALID_STATE, INVALID_STATE },
> + },
> + .initial_state = dequeued,
> + .final_states = { 1, 0, 0 },
> + };
> +
> + static bool verify_constraint(enum states curr_state, enum events event,
> + enum states next_state)
> + {
> + bool res = true;
> +
> + /* Validate guards as part of f */
> + if (curr_state == enqueued && event == sched_switch_in)
> + res = get_env(clk) < threshold;
> + else if (curr_state == dequeued && event == sched_wakeup)
> + reset_env(clk);
> +
> + /* Validate invariants in i */
> + if (next_state == curr_state || !res)
^^^^
indentation error ;)
> + return res;
> + if (next_state == enqueued)
> + ha_start_timer_jiffy(ha_mon, clk, threshold_jiffies);
> + else if (curr_state == enqueued)
> + res = !ha_cancel_timer(ha_mon);
> + return res;
> + }
> +
> +The function ``verify_constraint``, here reported as simplified, checks guards,
> +performs resets and starts timers to validate invariants according to
> +specification.
> +Due to the complex nature of environment variables, the user needs to provide
> +functions to get and reset environment variables, although we provide some
> +helpers for common types (e.g. clocks with ns or jiffy granularity).
Is there theoretical reason that functions to get/set variables cannot
be generated? Or you just do not have time for it yet?
> +Since invariants are only defined as clock expirations (e.g. *clk <
> +threshold*), the callback for timers armed when entering the state is in fact a
> +failure in the model and triggers a reaction. Leaving the state stops the timer
> +and checks for its expiration, in case the callback was late.
"callback for timers armed when entering the state is in fact a failure
in the model and triggers a reaction." - I have problem parsing this
sentence. How can "callback for timers" be armed? Or do you mean arming
timers while entering a state is a failure in the model? What is it a failure?
> +It is important to note that timers introduce overhead, if the monitor has
> +several instances (e.g. all tasks) this can become an issue.
> +If the monitor is guaranteed to *eventually* leave the state and the incurred
> +delay to wait for the next event is acceptable, guards can be use to lower the
> +monitor's overhead.
How about having some sort of a "background task" which periodically
verifies the invariants?
> +For instance in the stall example, if we are only interested in reporting
> +stalled tasks, rather than reacting as soon as a task is stalled, and
> +``switch_in`` is eventually going to occur, we can use the first iteration.
> +
> +Graphviz .dot format
> +--------------------
> +
> +Also the Graphviz representation of hybrid automata is an extension of the
> +deterministic automata one. Specifically, guards can be provided in the event
> +name separated by ``;``::
> +
> + "state_start" -> "state_dest" [ label = "sched_waking;preemptible==0;reset(clk)" ];
> +
> +Invariant can be specified in the state label (not the node name!) separated by ``\n``::
> +
> + "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
> +
> +Constraints can be specified as valid C comparisons and allow spaces, the first
> +element of the comparison must be the clock while the second is a numerical or
> +parametrised value. Guards allow comparisons to be combined with boolean
> +operations (``&&`` and ``||``), resets must be separated from other constraints.
> +
> +This is the full example of the last version of the 'stall' model in DOT::
> +
> + digraph state_automaton {
> + {node [shape = circle] "enqueued"};
> + {node [shape = plaintext, style=invis, label=""] "__init_dequeued"};
> + {node [shape = doublecircle] "dequeued"};
> + {node [shape = circle] "running"};
> + "__init_dequeued" -> "dequeued";
> + "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
> + "running" [label = "running"];
> + "dequeued" [label = "dequeued"];
> + "enqueued" -> "running" [ label = "switch_in" ];
> + "running" -> "dequeued" [ label = "dequeue" ];
> + "dequeued" -> "enqueued" [ label = "enqueue;reset(clk)" ];
> + { rank = min ;
> + "__init_dequeued";
> + "dequeued";
> + }
Btw, the last block (rank = min) doesn't seem to serve any purpose. But
the last time I checked months ago, the parser explodes if it is
removed, not sure if it still does now. But this is another reason that
I would like a rewrite.
Nam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 14/20] rv: Add sample hybrid monitors stall
2025-09-19 14:09 ` [PATCH v2 14/20] rv: Add sample hybrid monitors stall Gabriele Monaco
@ 2025-10-10 14:23 ` Nam Cao
2025-10-13 9:01 ` Gabriele Monaco
0 siblings, 1 reply; 14+ messages in thread
From: Nam Cao @ 2025-10-10 14:23 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Jonathan Corbet, Steven Rostedt,
Masami Hiramatsu, linux-doc, linux-trace-kernel
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
John Kacur
Gabriele Monaco <gmonaco@redhat.com> writes:
> +- Name: stall - wakeup in preemptive
^^^^^^^^^^^^^^^^^^^^
copy-paste mistake?
> +- Type: per-task hybrid automaton
> +- Author: Gabriele Monaco <gmonaco@redhat.com>
> +
> +Description
> +-----------
> +
> +The stalled task (stall) monitor is a sample per-task timed monitor that checks
> +if tasks are scheduled within a defined threshold after they are ready::
> +
> + |
> + |
> + v
> + #==================================#
> + H dequeued H <+
> + #==================================# |
> + | |
> + | sched_wakeup;reset(clk) |
> + v |
> + +----------------------------------+ |
> + | enqueued | |
> + | clk < threshold_jiffies | | sched_switch_wait
> + +----------------------------------+ |
> + | |
> + | sched_switch_in |
> + sched_switch_in v |
> + sched_wakeup +----------------------------------+ |
> + +------------------ | | |
> + | | running | |
> + +-----------------> | | -+
> + +----------------------------------+
I think this monitor does not detect if a task get preempted, but then
never get scheduled again?
This sample monitor does not have to cover everything obviously, but I'm
curious if I understand it correct.
Nam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton
2025-09-19 14:09 ` [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton Gabriele Monaco
@ 2025-10-10 14:29 ` Nam Cao
2025-10-13 14:14 ` Gabriele Monaco
0 siblings, 1 reply; 14+ messages in thread
From: Nam Cao @ 2025-10-10 14:29 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Steven Rostedt, Jonathan Corbet,
Masami Hiramatsu, linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
John Kacur
Gabriele Monaco <gmonaco@redhat.com> writes:
> - | sched_need_resched
> - | sched_waking
> - | irq_entry
> - | +--------------------+
> - v v |
> - +------------------------------------------------------+
> - +----------- | disabled | <+
> - | +------------------------------------------------------+ |
> - | | ^ |
> - | | preempt_disable sched_need_resched |
> - | preempt_enable | +--------------------+ |
> - | v | v | |
> - | +------------------------------------------------------+ |
> - | | irq_disabled | |
> - | +------------------------------------------------------+ |
> - | | | ^ |
> - | irq_entry irq_entry | | |
> - | sched_need_resched v | irq_disable |
> - | sched_waking +--------------+ | | |
> - | +----- | | irq_enable | |
> - | | | in_irq | | | |
> - | +----> | | | | |
> - | +--------------+ | | irq_disable
> - | | | | |
> - | irq_enable | irq_enable | | |
> - | v v | |
> - | #======================================================# |
> - | H enabled H |
> - | #======================================================# |
> - | | ^ ^ preempt_enable | |
> - | preempt_disable preempt_enable +--------------------+ |
> - | v | |
> - | +------------------+ | |
> - +----------> | preempt_disabled | -+ |
> - +------------------+ |
> - | |
> - +-------------------------------------------------------+
> -
> +
> + |
> + |
> + v
> + #=========# sched_need_resched;irq_off == 1
> + H H sched_waking;irq_off == 1 && preempt_off == 1
> + H any H ------------------------------------------------+
> + H H |
> + H H <-----------------------------------------------+
> + #=========#
Nice!
> + * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
> + * preemption (adding one to the preempt_count). Since we are
> + * interested in the preempt_count at the time the tracepoint was
> + * hit, we consider 1 as still enabled.
> + */
> + if (IS_ENABLED(CONFIG_PREEMPTION))
> + return (preempt_count() & PREEMPT_MASK) > 1;
FYI, there is plan to keep preemption enabled during tracepoint
handling. So keep that in mind when this monitor breaks.
Nam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 20/20] rv: Add deadline monitors
2025-09-19 14:09 ` [PATCH v2 20/20] rv: Add deadline monitors Gabriele Monaco
@ 2025-10-10 15:04 ` Nam Cao
2025-10-13 7:30 ` Gabriele Monaco
0 siblings, 1 reply; 14+ messages in thread
From: Nam Cao @ 2025-10-10 15:04 UTC (permalink / raw)
To: Gabriele Monaco, linux-kernel, Steven Rostedt, Jonathan Corbet,
Masami Hiramatsu, linux-trace-kernel, linux-doc
Cc: Gabriele Monaco, Tomas Glozar, Juri Lelli, Clark Williams,
John Kacur
Gabriele Monaco <gmonaco@redhat.com> writes:
> +/*
> + * Dummy values if not available
> + */
> +#ifndef __NR_sched_setscheduler
> +#define __NR_sched_setscheduler -1
> +#endif
> +#ifndef __NR_sched_setattr
> +#define __NR_sched_setattr -2
> +#endif
You can use __COUNTER__ intead, would make it easier to add to or remove
from this list later on.
> +static void handle_syscall(void *data, struct pt_regs *regs, long id)
> +{
> + struct task_struct *p;
> + int new_policy = -1;
> +
> + new_policy = extract_params(regs, id, &p);
> + if (new_policy < 0 || new_policy == p->policy)
> + return;
> + if (p->policy == SCHED_DEADLINE)
> + da_reset(EXPAND_ID(&p->dl, DL_TASK));
> + else if (new_policy == SCHED_DEADLINE)
> + da_create_conditional(EXPAND_ID(&p->dl, DL_TASK), GFP_NOWAIT);
> +}
What if the syscall fails for any reason? Wouldn't the monitor stores
incorrect information?
Nam
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 20/20] rv: Add deadline monitors
2025-10-10 15:04 ` Nam Cao
@ 2025-10-13 7:30 ` Gabriele Monaco
0 siblings, 0 replies; 14+ messages in thread
From: Gabriele Monaco @ 2025-10-13 7:30 UTC (permalink / raw)
To: Nam Cao, linux-kernel, Steven Rostedt, Jonathan Corbet,
Masami Hiramatsu, linux-trace-kernel, linux-doc
Cc: Tomas Glozar, Juri Lelli, Clark Williams, John Kacur
On Fri, 2025-10-10 at 17:04 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > +/*
> > + * Dummy values if not available
> > + */
> > +#ifndef __NR_sched_setscheduler
> > +#define __NR_sched_setscheduler -1
> > +#endif
> > +#ifndef __NR_sched_setattr
> > +#define __NR_sched_setattr -2
> > +#endif
>
> You can use __COUNTER__ intead, would make it easier to add to or remove
> from this list later on.
Right, good point.
> > +static void handle_syscall(void *data, struct pt_regs *regs, long id)
> > +{
> > + struct task_struct *p;
> > + int new_policy = -1;
> > +
> > + new_policy = extract_params(regs, id, &p);
> > + if (new_policy < 0 || new_policy == p->policy)
> > + return;
> > + if (p->policy == SCHED_DEADLINE)
> > + da_reset(EXPAND_ID(&p->dl, DL_TASK));
> > + else if (new_policy == SCHED_DEADLINE)
> > + da_create_conditional(EXPAND_ID(&p->dl, DL_TASK),
> > GFP_NOWAIT);
> > +}
>
> What if the syscall fails for any reason? Wouldn't the monitor stores
> incorrect information?
Yes, that could happen, this seems an issue only in the throttle monitor when
failing a DEADLINE->NORMAL, other cases are at most resetting the monitor.
In this case, the monitor is really saying: a deadline task is no longer
deadline, so its runtime should be counted as fair server. If the syscall does
fail, we are assuming the fair server as armed (not running) for a bit more.
Here I couldn't find a nice way to handle this: if I attach only to the syscall
return, I see if it succeeded but I lost the previous policy, attaching to both
(and keep a per-cpu storage just for this process) seems a bit overcomplicated.
I see this can potentially lead to errors in the monitor, but I'm not sure the
extra complexity is worth it here, considering this failure might only happen if
a user really makes wrong calls.
I tried to avoid creating a new tracepoint on a successful policy change.
What do you think?
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata
2025-10-10 13:46 ` Nam Cao
@ 2025-10-13 8:33 ` Gabriele Monaco
0 siblings, 0 replies; 14+ messages in thread
From: Gabriele Monaco @ 2025-10-13 8:33 UTC (permalink / raw)
To: Nam Cao, linux-kernel, Steven Rostedt, Jonathan Corbet,
linux-trace-kernel, linux-doc
Cc: Tomas Glozar, Juri Lelli, Clark Williams, John Kacur
On Fri, 2025-10-10 at 15:46 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > Describe theory and implementation of hybrid automata in the dedicated
> > page hybrid_automata.rst
> > Include a section on how to integrate a hybrid automaton in
> > monitor_synthesis.rst
> > Also remove a hanging $ in deterministic_automata.rst
> >
> > Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
> > ---
> This brings back bad memories from university..
:')
> > +It is important to note that any valid hybrid automaton is a valid
> > +deterministic automaton
>
> Perhaps remove the double "valid". Usually people use the phrase "any
> valid A is a valid B" to say that B is a superset of A, but it is
> opposite here.
Alright, will do.
> > +This is a combination of both iterations of the stall example::
> > +
> > + /* enum representation of X (set of states) to be used as index */
> > + enum states {
> > + dequeued = 0,
>
> I think you already removed this " = 0" in an earlier patch?
Right, missed that.
> > + /* Validate invariants in i */
> > + if (next_state == curr_state || !res)
> ^^^^
> indentation error ;)
Good catch.
> > +Due to the complex nature of environment variables, the user needs to
> > provide
> > +functions to get and reset environment variables, although we provide some
> > +helpers for common types (e.g. clocks with ns or jiffy granularity).
>
> Is there theoretical reason that functions to get/set variables cannot
> be generated? Or you just do not have time for it yet?
Not theoretical but practical, the monitor cannot always define /what/ an
environment variable is. In case of clocks (jiffy and ns) that's easy and the
parser does in fact generate get and reset functions, the user only needs to
specify the measure unit as explained somewhere else.
It is possible to add more exotic variables that don't follow common clock rules
and need different get/reset definitions. Now, in practice, that may not happen
with clocks (I cannot think of an alternative clock definition), but can happen
for other variables. For instance if the variable describes the preempt count,
the model cannot know in advance and the user will need to supply how to read
that in the kernel (just like we do with tracepoints, although event names
/might/ hint something).
As I get it, this isn't so clear so I should probably try and reword it.
I might just assume variables without unit but with a reset are, say, jiffy
clocks and never expect manual definition of the reset function, but that might
be misleading at times: e.g. if a user wants a ns clock but forgets the unit,
the monitor would still build.
>
> > +Since invariants are only defined as clock expirations (e.g. *clk <
> > +threshold*), the callback for timers armed when entering the state is in
> > fact a
> > +failure in the model and triggers a reaction. Leaving the state stops the
> > timer
> > +and checks for its expiration, in case the callback was late.
>
> "callback for timers armed when entering the state is in fact a failure
> in the model and triggers a reaction." - I have problem parsing this
> sentence. How can "callback for timers" be armed? Or do you mean arming
> timers while entering a state is a failure in the model? What is it a failure?
Right, that sentence doesn't make sense.
We arm a timer when entering the state, expiration of such timer is a failure.
The timer is cancelled when leaving the state, so in fact leaving the state
before the timer expiration is the only valid behaviour.
> > +It is important to note that timers introduce overhead, if the monitor has
> > +several instances (e.g. all tasks) this can become an issue.
> > +If the monitor is guaranteed to *eventually* leave the state and the
> > incurred
> > +delay to wait for the next event is acceptable, guards can be use to lower
> > the
> > +monitor's overhead.
>
> How about having some sort of a "background task" which periodically
> verifies the invariants?
I didn't update this part, but now timers can work also via timer wheel, which
is cutting down costs by sacrificing some reactivity (not correctness though). I
assume the background thread would be quite similar to what the timer wheel
already does.
But I definitely need to mention this because the timer wheel is not as heavy as
the hrtimers and its overhead is usually acceptable (unless proven otherwise for
a specific monitor/workload, I'd say).
> > +This is the full example of the last version of the 'stall' model in DOT::
> > +
> > + digraph state_automaton {
> > + {node [shape = circle] "enqueued"};
> > + {node [shape = plaintext, style=invis, label=""] "__init_dequeued"};
> > + {node [shape = doublecircle] "dequeued"};
> > + {node [shape = circle] "running"};
> > + "__init_dequeued" -> "dequeued";
> > + "enqueued" [label = "enqueued\nclk < threshold_jiffies"];
> > + "running" [label = "running"];
> > + "dequeued" [label = "dequeued"];
> > + "enqueued" -> "running" [ label = "switch_in" ];
> > + "running" -> "dequeued" [ label = "dequeue" ];
> > + "dequeued" -> "enqueued" [ label = "enqueue;reset(clk)" ];
> > + { rank = min ;
> > + "__init_dequeued";
> > + "dequeued";
> > + }
>
> Btw, the last block (rank = min) doesn't seem to serve any purpose. But
> the last time I checked months ago, the parser explodes if it is
> removed, not sure if it still does now. But this is another reason that
> I would like a rewrite.
Mmh, that's automatically generated by Supremica and, I believe, in some models
it's tuning a bit the position of nodes. Quite strange that the parser exploded,
those lines should be completely ignored.. Still, we know the parser needs this
big refactor.
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 14/20] rv: Add sample hybrid monitors stall
2025-10-10 14:23 ` Nam Cao
@ 2025-10-13 9:01 ` Gabriele Monaco
0 siblings, 0 replies; 14+ messages in thread
From: Gabriele Monaco @ 2025-10-13 9:01 UTC (permalink / raw)
To: Nam Cao, linux-kernel, Jonathan Corbet, Steven Rostedt,
Masami Hiramatsu, linux-doc, linux-trace-kernel
Cc: Tomas Glozar, Juri Lelli, Clark Williams, John Kacur
On Fri, 2025-10-10 at 16:23 +0200, Nam Cao wrote:
> Gabriele Monaco <gmonaco@redhat.com> writes:
> > +- Name: stall - wakeup in preemptive
> ^^^^^^^^^^^^^^^^^^^^
> copy-paste mistake?
Damn, right..
> > +- Type: per-task hybrid automaton
> > +- Author: Gabriele Monaco <gmonaco@redhat.com>
> > +
> > +Description
> > +-----------
> > +
> > +The stalled task (stall) monitor is a sample per-task timed monitor that
> > checks
> > +if tasks are scheduled within a defined threshold after they are ready::
> > +
> > + |
> > + |
> > + v
> > + #==================================#
> > + H dequeued H <+
> > + #==================================# |
> > + | |
> > + | sched_wakeup;reset(clk) |
> > + v |
> > + +----------------------------------+ |
> > + | enqueued | |
> > + | clk < threshold_jiffies | |
> > sched_switch_wait
> > + +----------------------------------+ |
> > + | |
> > + | sched_switch_in |
> > + sched_switch_in v |
> > + sched_wakeup +----------------------------------+ |
> > + +------------------ | | |
> > + | | running | |
> > + +-----------------> | | -+
> > + +----------------------------------+
>
> I think this monitor does not detect if a task get preempted, but then
> never get scheduled again?
>
> This sample monitor does not have to cover everything obviously, but I'm
> curious if I understand it correct.
Yes, that's right. I think I could cover that scenario by just going back to
enqueued and resetting when preempted. Since it's a simple change, it's probably
something meaningful to do..
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton
2025-10-10 14:29 ` Nam Cao
@ 2025-10-13 14:14 ` Gabriele Monaco
0 siblings, 0 replies; 14+ messages in thread
From: Gabriele Monaco @ 2025-10-13 14:14 UTC (permalink / raw)
To: Nam Cao, linux-kernel, Steven Rostedt, Jonathan Corbet,
Masami Hiramatsu, linux-trace-kernel, linux-doc
Cc: Tomas Glozar, Juri Lelli, Clark Williams, John Kacur
On Fri, 2025-10-10 at 16:29 +0200, Nam Cao wrote:
> > + * If CONFIG_PREEMPTION is enabled, then the tracepoint
> > itself disables
> > + * preemption (adding one to the preempt_count). Since we
> > are
> > + * interested in the preempt_count at the time the
> > tracepoint was
> > + * hit, we consider 1 as still enabled.
> > + */
> > + if (IS_ENABLED(CONFIG_PREEMPTION))
> > + return (preempt_count() & PREEMPT_MASK) > 1;
>
> FYI, there is plan to keep preemption enabled during tracepoint
> handling. So keep that in mind when this monitor breaks.
Right, I've heard of it, thanks for the heads up. I'll adapt the monitors once
that happens.
Thanks,
Gabriele
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-10-13 14:14 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20250919140954.104920-1-gmonaco@redhat.com>
2025-09-19 14:09 ` [PATCH v2 04/20] Documentation/rv: Adapt documentation after da_monitor refactoring Gabriele Monaco
2025-10-02 9:26 ` Nam Cao
2025-09-19 14:09 ` [PATCH v2 13/20] Documentation/rv: Add documentation about hybrid automata Gabriele Monaco
2025-10-10 13:46 ` Nam Cao
2025-10-13 8:33 ` Gabriele Monaco
2025-09-19 14:09 ` [PATCH v2 14/20] rv: Add sample hybrid monitors stall Gabriele Monaco
2025-10-10 14:23 ` Nam Cao
2025-10-13 9:01 ` Gabriele Monaco
2025-09-19 14:09 ` [PATCH v2 15/20] rv: Convert the opid monitor to a hybrid automaton Gabriele Monaco
2025-10-10 14:29 ` Nam Cao
2025-10-13 14:14 ` Gabriele Monaco
2025-09-19 14:09 ` [PATCH v2 20/20] rv: Add deadline monitors Gabriele Monaco
2025-10-10 15:04 ` Nam Cao
2025-10-13 7:30 ` Gabriele Monaco
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).