From: Daniel Bristot de Oliveira <bristot@kernel.org>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>,
Wim Van Sebroeck <wim@linux-watchdog.org>,
Guenter Roeck <linux@roeck-us.net>,
Jonathan Corbet <corbet@lwn.net>, Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Will Deacon <will@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Marco Elver <elver@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Shuah Khan <skhan@linuxfoundation.org>,
Gabriele Paoloni <gpaoloni@redhat.com>,
Juri Lelli <juri.lelli@redhat.com>,
Clark Williams <williams@redhat.com>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-devel@vger.kernel.org
Subject: [PATCH V4 20/20] Documentation/rv: Add watchdog-monitor documentation
Date: Thu, 16 Jun 2022 10:45:02 +0200
Message-ID: <129a431c1a12610fa7b44f76ce73aa8058f55bc6.1655368610.git.bristot@kernel.org>
In-Reply-To: <cover.1655368610.git.bristot@kernel.org>
Add documentation about the safe_wtd and safe_wtd_nwo RV monitors,
and their usage via a safety application.
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Gabriele Paoloni <gpaoloni@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-trace-devel@vger.kernel.org
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
---
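As an illustration of the model documented in this patch, the non-nowayout
path of the safe_wtd automaton can be sketched as a plain user-space C
transition function. This is a hand transcription of the diagram, not the
code generated by dot2k: the names wtd_state, wtd_event, wtd_next() and
wtd_check() are hypothetical, the other_threads and nowayout events are
left out, and edges that are hard to read in the ASCII diagram are omitted,
so the sketch may reject sequences that the real monitor accepts.

```c
#include <stddef.h>

/* States on the non-nowayout path of the diagram (hypothetical names). */
enum wtd_state {
	WTD_INIT,
	WTD_OPENED,
	WTD_STARTED,
	WTD_SET,
	WTD_SAFE,
	WTD_STOPPED,
	WTD_CLOSED_RUNNING,
};

enum wtd_event {
	EV_OPEN,
	EV_CLOSE,
	EV_START,
	EV_STOP,
	EV_SET_SAFE_TIMEOUT,
	EV_PING,
	EV_SCHED_KEEP_ALIVE,	/* blocked: never accepted */
	EV_KEEP_ALIVE,		/* blocked: never accepted */
};

#define WTD_INVALID (-1)

/* Return the next state, or WTD_INVALID if the event is not allowed. */
static int wtd_next(int state, enum wtd_event ev)
{
	switch (state) {
	case WTD_INIT:
		if (ev == EV_OPEN)
			return WTD_OPENED;
		break;
	case WTD_OPENED:
		if (ev == EV_START)
			return WTD_STARTED;
		if (ev == EV_CLOSE)
			return WTD_INIT;
		break;
	case WTD_STARTED:
		if (ev == EV_SET_SAFE_TIMEOUT)
			return WTD_SET;
		break;
	case WTD_SET:
		if (ev == EV_PING)
			return WTD_SAFE;
		break;
	case WTD_SAFE:
		if (ev == EV_PING)
			return WTD_SAFE;
		if (ev == EV_STOP)
			return WTD_STOPPED;
		if (ev == EV_CLOSE)
			return WTD_CLOSED_RUNNING;
		break;
	case WTD_STOPPED:
		if (ev == EV_CLOSE)
			return WTD_INIT;
		break;
	default:
		/* closed_running: the reopen path is omitted in this sketch. */
		break;
	}
	return WTD_INVALID;
}

/*
 * Feed a trace to the model, starting from init. Return -1 if every event
 * is accepted, or the index of the first rejected event. The last valid
 * state is stored in *final.
 */
static int wtd_check(const enum wtd_event *trace, size_t len, int *final)
{
	int state = WTD_INIT;
	size_t i;

	for (i = 0; i < len; i++) {
		int next = wtd_next(state, trace[i]);

		if (next == WTD_INVALID) {
			*final = state;
			return (int)i;
		}
		state = next;
	}
	*final = state;
	return -1;
}
```

With this sketch, the trace open, start, set_safe_timeout, ping, ping is
accepted and ends in the safe state, while open, start, ping is rejected at
its third event because no safe timeout was set before the first ping.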
Documentation/trace/rv/watchdog-monitor.rst | 250 ++++++++++++++++++++
1 file changed, 250 insertions(+)
create mode 100644 Documentation/trace/rv/watchdog-monitor.rst
diff --git a/Documentation/trace/rv/watchdog-monitor.rst b/Documentation/trace/rv/watchdog-monitor.rst
new file mode 100644
index 000000000000..2b142fb31572
--- /dev/null
+++ b/Documentation/trace/rv/watchdog-monitor.rst
@@ -0,0 +1,250 @@
+Watchdog monitor
+----------------
+
+The watchdog is an essential building block for the usage of Linux in
+safety-critical systems because it allows the system to be monitored from
+an external element - the watchdog hardware, acting as a safety-monitor.
+
+A user-space application controls the watchdog device via the watchdog
+interface. This application, hereafter safety_app, enables the watchdog
+and periodically pings the watchdog upon correct completion of the
+safety-related processing.
+
+If the safety_app, for any reason, stops pinging the watchdog,
+the watchdog hardware can put the system into a fail-safe state,
+for example by shutting the system down.
+
+Given the importance of the safety_app / watchdog hardware couple,
+the interaction between these components also needs some
+sort of monitoring. In other words, "who monitors the monitor?"
+
+The safe watchdog (safe_wtd) RV monitor monitors the interaction between
+the safety_app and the watchdog device, enforcing the correct sequence of
+events that leads the system to a safe state.
+
+Furthermore, the safety_app can monitor the RV monitor by collecting the
+events generated by the RV monitor itself via the tracing interface, thus
+closing the monitoring loop with the safety_app.
+
+A diagram of the components and their interactions is::
+
+ user-space:
+               +--------------------------------+
+               |           safety_app           |-----------+
+               +--------------------------------+           |
+                    |                    ^                  |
+                    | Configure          | Enable and       |
+                    |                    | check data       |
+ ===================+====================+===============   |
+ kernel-space:      |                    |                  |
+                    v                    v                  |
+             +----------+   instr.    +-------------+       |
+             | watchdog | ----------->| RV Monitor  |----+  |
+             | device   |             +-------------+    |  |
+             +----------+                                |  |
+                 |                                       |  |
+                 |                                       |  |
+ ================+====================================== |  |
+ hardware:       |                                       |  |
+                 v                                       |  +-> Bring the system
+             +--------------------+                      +----> to a safe state,
+             | watchdog hardware  |---------------------------> e.g., halt.
+             +--------------------+
+
+Sample safety_app
+-----------------
+
+The user-space safety_app sample code in ``tools/verification/safety_app/``
+illustrates the usage of the RV monitors for this use case and serves as
+a starting point for the development of a user-specific safety_app.
+
+Watchdog events
+---------------
+
+The RV monitor observes the watchdog by using instrumentation to
+process the events generated by the interaction between the
+safety_app and the watchdog device layer in the kernel.
+
+The monitored events are:
+
+ - watchdog:watchdog_open: open the watchdog device;
+ - watchdog:watchdog_close: close the watchdog device;
+ - watchdog:watchdog_start: start the watchdog;
+ - watchdog:watchdog_stop: stop the watchdog;
+ - watchdog:watchdog_set_timeout: set the watchdog timeout;
+ - watchdog:watchdog_ping: reprogram the watchdog with the previously set
+   timeout;
+ - watchdog:watchdog_nowayout: prevent the watchdog from being stopped;
+ - watchdog:watchdog_set_keep_alive: set an intermediary ping to overcome
+   the limitation of a hardware watchdog maximum timeout being shorter than
+   the timeout set by the user-space tool;
+ - watchdog:watchdog_keep_alive: the execution of the function that runs the
+   intermediary keep alive ping.
+
+RV monitor events
+-----------------
+
+The RV monitor monitors the relevant events as an outside observer,
+interpreting all the components (the hardware; the watchdog device
+interface; and the safety monitor) as an integrated component.
+
+The events selected for the monitor are:
+
+ - other_threads: an event generated by any thread other than the
+   one that last set nowayout or opened the watchdog;
+ - open: a thread opens the watchdog to manipulate it;
+ - close: a thread closes the watchdog;
+ - start: starts the watchdog countdown;
+ - stop: stops the watchdog;
+ - set_safe_timeout: configures the watchdog with a given timeout;
+ - ping: resets the watchdog countdown with the previously configured timeout;
+ - nowayout: prevents the watchdog from being stopped until system shutdown;
+ - sched_keep_alive: schedules a kernel worker to ping the watchdog if the
+   timeout is longer than the watchdog hardware can handle;
+ - keep_alive: executes the previously scheduled watchdog ping.
+
+Note that events that do not appear in the automata models are
+considered blocked events, and their execution will always cause the
+RV monitor to react to an unexpected event.
+
+RV monitor specification
+------------------------
+
+The monitor's goal is to check a set of specifications that lead the
+system to a safe state.
+
+These specifications are:
+
+ - 1: Once open, only one process manipulates the watchdog;
+ - 2: Following 1, the keep-alive mechanisms will not be used;
+ - 3: If required, nowayout will be set before opening the watchdog;
+ - 4: A safe timeout must be set;
+ - 5: At least one ping must be made before entering the safe/safe_nwo states;
+ - 6: The RV monitor does not react if the watchdog is closed without stopping,
+   but the hardware watchdog is expected to react.
+
+Deterministic automata monitors
+-------------------------------
+
+Following the specifications, a monitor was developed and modeled as a
+deterministic automaton.
+
+The deterministic automata model for safe_wtd is::
+
+ #==================================# other_threads
+ H H ----------------+
+ -----------> H init H |
+ H H <---------------+
+ #==================================#
+ | | ^
+ | | | close
+ | | +----------------------------------------------------+
+ | | |
+ | | open |
+ | +------------------------------------------------------+ |
+ | | |
+ | nowayout | |
+ v | |
+ nowayout +-------------------+ | |
+ other_threads | | nowayout | |
+ +---------------- | nwo |<-------------------------------------+ | |
+ | | | | | |
+ +---------------> | | <+ | | |
+ +-------------------+ | | | |
+ | | | | |
+ | open | close | | |
+ v | | | |
+ +-------------------+ | | | |
+ | opened_nwo | -+ | | |
+ +-------------------+ | | |
+ | | | |
+ | start | | |
+ v | | |
+ +-------------------+ | | |
+ +---------------> | started_nwo | -+ | | |
+ | +-------------------+ | | | |
+ | | | | | |
+ | open | set_safe_timeout | | | |
+ | v | | | |
+ | +-------------------+ | | | |
+ | | set_nwo | | | | |
+ | +-------------------+ | | | |
+ | | | | | |
+ | +-------------+ | ping | | | |
+ | | | | | | | |
+ | | ping v v | | | |
+ | | +-------------------+ | | | |
+ | +-----------| safe_nwo | | | | |
+ | +-------------------+ | | | |
+ | | | | | |
+ | | close | close | | |
+ | v v | | |
+ | +----------------------------------+ nowayout | | |
+ | | | other_threads | | |
+ | | closed_running_nwo | ----------------+ | | |
+ | | | | | | |
+ +---------------- | | <---------------+ | | |
+ +----------------------------------+ | | |
+ | nowayout ^ | | |
+ +-----------------------------+ | | |
+ | | |
+ | | |
+ +-------------------+ +--------+ | | |
+ | | | |------+---+ |
+ | started | start | opened | | |
+ +---------------- | | <-------- | |>-----+-------+
+ | +-------------------+ +--------+ | ^
+ | | | |
+ | | set_safe_timeout +-------------+-------+
+ | v | |
+ | +-------------------+ | |
+ | | | | |
+ | | set | | |
+ +----------+---------------> | | | |
+ | | +-------------------+ | |
+ | | | | |
+ | | | ping | |
+ | | v | |
+ | | +-------------------+ ping | |
+ | | | | -------+ | |
+ | | +---- | safe | | | |
+ | | | | | <------+ | |
+ | | | +-------------------+ | |
+ | | | | | |
+ | | stop | | stop | |
+ | | | v | |
+ | | | +-------------------+ close | |
+ | +-----------+---> | stopped |-------------+ |
+ | | +-------------------+ |
+ | +---+ |
+ | | close |
+ | v |
+ | other_threads +----------------------------------------+ |
+ | +--------------> | | |
+ | | | closed_running | |
+ | +--------------- | |--------------+
+ | +----------------------------------------+
+ | | ^
+ | open | | close
+ | v |
+ | set_safe_timeout +-------------------+
+ +-------------------------> | reopened |
+ +-------------------+
+
+It is important to note that the events sched_keep_alive and keep_alive
+are not allowed in the monitor (they are said to be blocked events).
+The execution of any blocked event leads the RV monitor to react.
+
+Additional options
+------------------
+
+The RV monitor also has a set of options that can be set via the kernel
+command line or as module options. They are:
+
+ - watchdog_id: the device id to monitor (default 0);
+ - dont_stop: once enabled, do not allow the RV monitor to be stopped (default off);
+ - safe_timeout: define a maximum safe value that a user-space application can
+   set as the watchdog timeout (default unlimited);
+ - check_timeout: after every ping, check if the time left in the watchdog is less
+   than or equal to the last timeout set for the watchdog. It only works for watchdog
+   devices that provide the get_timeleft() function (default off).
--
2.35.1