public inbox for linux-doc@vger.kernel.org
* [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation
@ 2026-03-12 23:22 Mayank Rungta via B4 Relay
From: Mayank Rungta via B4 Relay @ 2026-03-12 23:22 UTC (permalink / raw)
  To: Petr Mladek, Jinchao Wang, Yunhui Cui, Stephane Eranian,
	Ian Rogers, Li Huafei, Feng Tang, Max Kellermann, Jonathan Corbet,
	Douglas Anderson, Andrew Morton, Florian Delizy, Shuah Khan
  Cc: linux-kernel, linux-doc, Mayank Rungta

This series addresses limitations in the hardlockup detector implementations
and updates the documentation to reflect actual behavior and recent changes.

The changes are structured as follows:

Refactoring (Patch 1)
=====================
Patch 1 refactors watchdog_hardlockup_check() to return early if no
lockup is detected. This reduces the indentation level of the main
logic block, serving as a clean base for the subsequent changes.
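As an illustration of the early-return shape, here is a minimal userspace sketch. It is not the patch itself: the struct, the field names, and the helpers hl_is_stuck() and hl_check() are hypothetical stand-ins for the per-CPU state and logic in kernel/watchdog.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-CPU state for this sketch. The real detector keeps
 * equivalent data in per-CPU variables in kernel/watchdog.c.
 */
struct hl_cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by the hrtimer callback */
	unsigned long hrtimer_interrupts_saved; /* snapshot from the previous check */
	bool warned;                            /* lockup already reported */
};

/* Hypothetical helper: true if no hrtimer interrupt arrived since the last check. */
static bool hl_is_stuck(struct hl_cpu_state *s)
{
	bool stuck = s->hrtimer_interrupts == s->hrtimer_interrupts_saved;

	s->hrtimer_interrupts_saved = s->hrtimer_interrupts;
	return stuck;
}

/* Returns 1 if this call reported a lockup, 0 otherwise. */
static int hl_check(struct hl_cpu_state *s, int cpu)
{
	assert(s != NULL);

	/* Early return: the common "CPU is fine" case exits immediately. */
	if (!hl_is_stuck(s)) {
		s->warned = false; /* progress seen, so re-arm reporting */
		return 0;
	}

	/* The reporting path now sits at the top indentation level. */
	if (s->warned)
		return 0;

	s->warned = true;
	printf("hard lockup detected on cpu %d\n", cpu);
	return 1;
}
```

The "no lockup" branch exits first, so the reporting logic no longer needs an extra level of nesting.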

Hardlockup Detection Improvements (Patches 2 & 4)
=================================================
The hardlockup detector decides whether a CPU is making progress by
comparing its current hrtimer interrupt count with a copy saved at the
previous check.

Patch 2 ensures that the saved interrupt count is updated unconditionally
before the "touched" flag is checked. This prevents comparisons against a
stale snapshot, which can delay detection. It is a logic fix that keeps the
detector accurate even when the watchdog is frequently touched.
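The effect of the reordering can be shown with a simplified model. The sketch below is an assumption-laden illustration, not the kernel code: struct hl_state, check_old(), check_new(), and checks_until_detect() are hypothetical names. It models a CPU that made progress, was then touched, and then locked up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified detector state for this sketch. */
struct hl_state {
	unsigned long count; /* current hrtimer interrupt count */
	unsigned long saved; /* snapshot taken at the previous check */
	bool touched;        /* set when the watchdog is touched */
};

/* Ordering before the fix (as modeled here): a touch skips the snapshot update. */
static bool check_old(struct hl_state *s)
{
	if (s->touched) {
		s->touched = false;
		return false; /* s->saved is left stale */
	}
	bool stuck = s->count == s->saved;
	s->saved = s->count;
	return stuck;
}

/* Ordering after the fix (as modeled here): always refresh the snapshot first. */
static bool check_new(struct hl_state *s)
{
	bool stuck = s->count == s->saved;
	s->saved = s->count;
	if (s->touched) {
		s->touched = false;
		return false;
	}
	return stuck;
}

/*
 * Scenario: the CPU made progress since the last snapshot, the watchdog
 * was touched, and then the CPU locks up (count stops changing).
 * Returns how many checks after the lockup it takes to flag it.
 */
static int checks_until_detect(bool (*check)(struct hl_state *))
{
	struct hl_state s = { .count = 6, .saved = 5, .touched = true };

	assert(check != NULL);
	check(&s); /* the "touched" check, just before the lockup */
	for (int n = 1; n <= 10; n++)
		if (check(&s))
			return n;
	return -1;
}
```

In this model the old ordering needs two checks after the lockup, because the first one compares against the stale snapshot and still sees "progress". The new ordering flags the lockup on the first check.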

Patch 4 improves the Buddy detector's timeliness. The Buddy detector
currently checks only on every 3rd hrtimer sample, which causes high
variability in detection time (up to 24s). This patch makes it check at
every hrtimer interval (4s) with a missed-interrupt threshold of 3,
narrowing the detection window to a consistent 8-12 second range.
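The 8-12 second figure can be reproduced with simple arithmetic under an assumed model. The model, not taken from the patch, is that the buddy checks once per period, the first check that sees no progress lands somewhere within the first period after the lockup, and the lockup is reported on the threshold-th consecutive miss. The function name buddy_detect_window() is hypothetical.

```c
#include <assert.h>

/* Detection window in seconds, measured from the moment of the lockup. */
struct detect_window {
	unsigned int min_s;
	unsigned int max_s;
};

/*
 * Simplified model (assumption, not the kernel code): the buddy checks
 * its target once every period_s seconds, the first check that sees no
 * progress lands somewhere in the first period after the lockup, and a
 * lockup is reported on the threshold-th such check in a row.
 */
static struct detect_window buddy_detect_window(unsigned int period_s,
						unsigned int threshold)
{
	assert(period_s > 0 && threshold > 0);

	struct detect_window w = {
		.min_s = (threshold - 1) * period_s, /* first miss right after the lockup */
		.max_s = threshold * period_s,       /* first miss a full period later */
	};
	return w;
}
```

With a 4s period and a threshold of 3, the model gives a window from 8s to 12s. The width of the window is always one period, regardless of the threshold.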

Documentation Updates (Patches 3 & 5)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.

Patch 3 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector. It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).

Patch 5 adds a dedicated section for the Buddy detector, which was previously
undocumented. It details the mechanism, the new timing logic, and known
limitations.
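For readers unfamiliar with the Buddy detector, the pairing can be sketched as follows. The sketch assumes that each CPU watches the next CPU in the set of watchdog-enabled CPUs and wraps around at the end. A plain 64-bit mask stands in for the kernel's cpumask, and buddy_next_cpu() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the CPU that `cpu` should watch, i.e. the
 * next CPU set in `mask` (bit n = CPU n participates), wrapping around.
 * Returns -1 when no other CPU is available to act as a buddy.
 */
static int buddy_next_cpu(uint64_t mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);

	for (int i = 1; i < 64; i++) {
		int next = (cpu + i) % 64;

		if (mask & (UINT64_C(1) << next))
			return next;
	}
	return -1;
}
```

With CPUs 0, 1, and 3 enabled, CPU 3 wraps around and watches CPU 0. A CPU that is the only one enabled has no buddy.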

Signed-off-by: Mayank Rungta <mrungta@google.com>
---
Changes in v2:
- Added Patch 1 to refactor watchdog_hardlockup_check() by returning
  early (Suggested by Douglas Anderson)
- Introduced the `watchdog_hardlockup_update_reset()` API (Suggested by
  Petr Mladek)
- Shifted original v1 patches to Patches 2-5 and rebased them on top of
  the new refactoring.
- Link to v1: https://lore.kernel.org/r/20260212-hardlockup-watchdog-fixes-v1-0-745f1dce04c3@google.com

---
Mayank Rungta (5):
      watchdog: Return early in watchdog_hardlockup_check()
      watchdog: Update saved interrupts during check
      doc: watchdog: Clarify hardlockup detection timing
      watchdog/hardlockup: improve buddy system detection timeliness
      doc: watchdog: Document buddy detector

 Documentation/admin-guide/lockup-watchdogs.rst | 132 ++++++++++++++++++----
 include/linux/nmi.h                            |   1 +
 kernel/watchdog.c                              | 148 ++++++++++++++-----------
 kernel/watchdog_buddy.c                        |   9 +-
 4 files changed, 199 insertions(+), 91 deletions(-)
---
base-commit: b4f0dd314b39ea154f62f3bd3115ed0470f9f71e
change-id: 20260211-hardlockup-watchdog-fixes-60317598ac20

Best regards,
-- 
Mayank Rungta <mrungta@google.com>





Thread overview: 13+ messages
2026-03-12 23:22 [PATCH v2 0/5] watchdog/hardlockup: Improvements to hardlockup detection and documentation Mayank Rungta via B4 Relay
2026-03-12 23:22 ` [PATCH v2 1/5] watchdog: Return early in watchdog_hardlockup_check() Mayank Rungta via B4 Relay
2026-03-13 15:27   ` Doug Anderson
2026-03-23 15:47   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 2/5] watchdog: Update saved interrupts during check Mayank Rungta via B4 Relay
2026-03-13 15:27   ` Doug Anderson
2026-03-23 15:58   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 3/5] doc: watchdog: Clarify hardlockup detection timing Mayank Rungta via B4 Relay
2026-03-12 23:22 ` [PATCH v2 4/5] watchdog/hardlockup: improve buddy system detection timeliness Mayank Rungta via B4 Relay
2026-03-23 16:26   ` Petr Mladek
2026-03-12 23:22 ` [PATCH v2 5/5] doc: watchdog: Document buddy detector Mayank Rungta via B4 Relay
2026-03-23 17:26   ` Petr Mladek
2026-03-23 22:45     ` Doug Anderson
