From: Sourabh Jain <sourabhjain@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org, maddy@linux.ibm.com, mpe@ellerman.id.au
Cc: npiggin@gmail.com, chleroy@kernel.org, ritesh.list@gmail.com,
shivangu@linux.ibm.com, hbathini@linux.ibm.com,
mahesh@linux.ibm.com, adityag@linux.ibm.com,
venkat88@linux.ibm.com, sourabhjain@linux.ibm.com,
stable@vger.kernel.org
Subject: [PATCH 0/1] powerpc/crash: protect kdump from active watchdogs
Date: Wed, 3 Jun 2026 12:32:16 +0530
Message-ID: <20260603070217.483696-1-sourabhjain@linux.ibm.com>
On pseries LPAR systems in a high-availability environment using the
SBD[1][2] service, I observed that the system abruptly rebooted before
dump capture could complete.

Further investigation showed that SBD had configured a watchdog with
a 30-second timeout. Since the kernel crashes directly into the
kdump kernel without shutting down userspace services, the watchdog
remained active during dump capture. Once the watchdog timeout
expired, PHYP reset the LPAR, causing dump capture to fail.

The issue was reproducible only when the watchdog was active. Dump
capture completed successfully after disabling the watchdog,
stopping the SBD service, or increasing the watchdog timeout value.
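The failure above boils down to a time budget: after the crash nothing pets the watchdog any more, so the kdump kernel must boot and finish writing the dump within whatever time is left on the watchdog. The helper below is a hypothetical illustration (its name and parameters are not from the patch), assuming the hypervisor resets the LPAR exactly when the remaining watchdog time reaches zero:

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if dump capture can finish before
 * the watchdog expires and the LPAR is reset.
 *
 * timeout_s:    watchdog timeout (e.g. 30 s, as configured by SBD)
 * since_ping_s: seconds between the last keepalive and the crash
 * capture_s:    seconds needed to boot the kdump kernel and save the dump
 */
static bool dump_completes_before_reset(double timeout_s, double since_ping_s,
					double capture_s)
{
	/* Nobody pets the watchdog after the crash, so this is all the time left. */
	double remaining_s = timeout_s - since_ping_s;

	return capture_s < remaining_s;
}
```

With the 30-second SBD timeout, a capture that takes, say, 45 seconds (a made-up figure) can never complete, which is consistent with a larger timeout making the problem disappear.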
This patch fixes the issue by stopping all active watchdogs on the
crash shutdown path before booting the kdump kernel.

The driver that exports the hardware watchdog device is:
drivers/watchdog/pseries-wdt.c
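Since the problem only reproduces while the watchdog is active, it can help to confirm the watchdog state before crashing the kernel. A minimal sketch, assuming the kernel is built with CONFIG_WATCHDOG_SYSFS so that each watchdog exposes a /sys/class/watchdog/watchdogN/state file reading "active" or "inactive"; the helper name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the watchdog sysfs "state" file at
 * @path reads "active", 0 if it reads "inactive", and -1 if it cannot be
 * read or contains anything else. Assumes CONFIG_WATCHDOG_SYSFS.
 */
static int watchdog_state_is_active(const char *path)
{
	char buf[16] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';	/* strip the trailing newline */
	if (strcmp(buf, "active") == 0)
		return 1;
	if (strcmp(buf, "inactive") == 0)
		return 0;
	return -1;
}
```

For example, watchdog_state_is_active("/sys/class/watchdog/watchdog0/state") should return 1 while the reproducer program is running. The watchdog0 index is an assumption; the pseries-wdt device may be registered under a different index if other watchdogs are present.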
[1] https://github.com/clusterlabs/sbd/blob/main/man/sbd.8.pod.in
[2] https://documentation.suse.com/sle-ha/15-SP4/html/SLE-HA-all/cha-ha-storage-protect.html

This issue can be reproduced using the program below:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

#define WATCHDOG_DEV	"/dev/watchdog"
#define TIMEOUT		10	/* watchdog timeout in seconds */
#define PET_INTERVAL	1	/* seconds between keepalives */

static int wdt_fd = -1;

/* Close the watchdog; if @disarm is set, stop it first so the system is not reset. */
static void watchdog_close(int disarm)
{
	int flags;

	if (wdt_fd < 0)
		return;

	if (disarm) {
		flags = WDIOS_DISABLECARD;
		if (ioctl(wdt_fd, WDIOC_SETOPTIONS, &flags) < 0)
			printf("WDIOS_DISABLECARD failed: %m (nowayout may be set)\n");
		else
			printf("Watchdog disabled via WDIOS_DISABLECARD\n");

		/* Magic close: without nowayout, writing 'V' before close stops the watchdog */
		if (write(wdt_fd, "V", 1) < 0)
			printf("Magic 'V' write failed: %m\n");
		else
			printf("Magic 'V' written\n");
	} else {
		printf("Closing WITHOUT disarming - watchdog keeps running!\n");
	}

	close(wdt_fd);
	wdt_fd = -1;
	printf("Watchdog fd closed\n");
}

/* SIGINT/SIGTERM handler: disarm the watchdog before exiting. */
static void safe_exit(int sig)
{
	printf("\nSignal %d received - disarming watchdog...\n", sig);
	watchdog_close(1);
	exit(0);
}

/* Open (and thereby arm) the watchdog, then set and report its timeout. */
static int watchdog_init(void)
{
	int flags, timeout = TIMEOUT;
	struct watchdog_info ident;

	printf("Opening %s...\n", WATCHDOG_DEV);
	wdt_fd = open(WATCHDOG_DEV, O_WRONLY);
	if (wdt_fd < 0) {
		printf("Failed to open %s: %m\n", WATCHDOG_DEV);
		return -1;
	}
	printf("Watchdog opened and ARMED\n");

	flags = WDIOS_ENABLECARD;
	if (ioctl(wdt_fd, WDIOC_SETOPTIONS, &flags) < 0)
		/* ENOTTY = driver always enabled, that's fine */
		printf("WDIOS_ENABLECARD: %m (ok if ENOTTY)\n");
	else
		printf("Watchdog enabled via WDIOS_ENABLECARD\n");

	if (ioctl(wdt_fd, WDIOC_SETTIMEOUT, &timeout) < 0)
		printf("WDIOC_SETTIMEOUT failed: %m\n");
	else
		printf("Timeout set to %d seconds\n", timeout);

	/* verify what the driver actually set */
	if (ioctl(wdt_fd, WDIOC_GETTIMEOUT, &timeout) == 0)
		printf("Actual timeout : %d seconds\n", timeout);

	if (ioctl(wdt_fd, WDIOC_GETSUPPORT, &ident) == 0)
		printf("Identity       : %s\n", (char *)ident.identity);

	return 0;
}

/* Send a keepalive and report how much time is left before the watchdog fires. */
static void watchdog_tickle(void)
{
	int timeleft = 0;

	if (ioctl(wdt_fd, WDIOC_KEEPALIVE, 0) < 0) {
		printf("WDIOC_KEEPALIVE failed: %m - falling back to write\n");
		/* any write to the watchdog device also counts as a keepalive */
		if (write(wdt_fd, "1", 1) < 0)
			printf("Keepalive write failed: %m\n");
	}

	if (ioctl(wdt_fd, WDIOC_GETTIMELEFT, &timeleft) == 0)
		printf("Petted watchdog. Timeleft: %d sec\n", timeleft);
	else
		printf("Petted watchdog.\n");
}

int main(void)
{
	signal(SIGINT, safe_exit);
	signal(SIGTERM, safe_exit);

	if (watchdog_init() < 0)
		return 1;

	printf("\nPetting every %d seconds. Ctrl+C to safely stop.\n\n",
	       PET_INTERVAL);

	while (1) {
		watchdog_tickle();
		sleep(PET_INTERVAL);
	}

	return 0;
}
Steps to reproduce the issue:
-----------------------------
1. Load the pseries-wdt driver (e.g. modprobe pseries-wdt).
2. Compile the above program and run the binary.
3. With kdump configured, crash the kernel (e.g. echo c > /proc/sysrq-trigger).
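Step 3 only exercises the failing path if a crash kernel is actually loaded. The sketch below is a hypothetical helper (not part of the patch) that checks /sys/kernel/kexec_crash_loaded, which reads "1" when a crash kernel is loaded, before writing 'c' to /proc/sysrq-trigger. Both paths are passed in as parameters, an illustrative design choice so the logic can be tried against ordinary files:

```c
#include <stdio.h>

/*
 * Hypothetical helper: crash the kernel via SysRq only if a kdump kernel
 * is loaded.
 *
 * loaded_path:  normally /sys/kernel/kexec_crash_loaded ("1" when loaded)
 * trigger_path: normally /proc/sysrq-trigger (writing 'c' crashes the kernel)
 *
 * Returns -1 without writing anything if no crash kernel is loaded or a
 * file cannot be opened. On a real system a successful trigger never returns.
 */
static int crash_if_kdump_loaded(const char *loaded_path, const char *trigger_path)
{
	int loaded = 0;
	FILE *f = fopen(loaded_path, "r");

	if (!f)
		return -1;
	if (fscanf(f, "%d", &loaded) != 1)
		loaded = 0;
	fclose(f);
	if (loaded != 1)
		return -1;	/* no crash kernel: a crash would not test kdump */

	f = fopen(trigger_path, "w");
	if (!f)
		return -1;
	fputc('c', f);
	/* fclose() flushes the 'c'; on /proc/sysrq-trigger this crashes the kernel */
	return fclose(f) == 0 ? 0 : -1;
}
```

On the test system this would be run as root, as crash_if_kdump_loaded("/sys/kernel/kexec_crash_loaded", "/proc/sysrq-trigger"), while the reproducer keeps the watchdog armed.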
Sourabh Jain (1):
powerpc/crash: stop watchdogs before booting kdump kernel
arch/powerpc/kexec/crash.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
--
2.52.0