From: David Teigland <teigland@redhat.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-watchdog@vger.kernel.org,
"Wim Van Sebroeck" <wim@iguana.be>,
linux-kernel@vger.kernel.org,
"Timo Kokkonen" <timo.kokkonen@offcode.fi>,
"Uwe Kleine-König" <u.kleine-koenig@pengutronix.de>,
linux-doc@vger.kernel.org, "Jonathan Corbet" <corbet@lwn.net>
Subject: Re: [PATCH 0/8] watchdog: Add support for keepalives triggered by infrastructure
Date: Wed, 5 Aug 2015 14:51:25 -0500
Message-ID: <20150805195125.GA20863@redhat.com>
In-Reply-To: <55C25D92.8020609@roeck-us.net>
On Wed, Aug 05, 2015 at 12:01:38PM -0700, Guenter Roeck wrote:
> I think I can understand why Wim was reluctant to accept your patch;
> I must admit I don't understand your use case either.
Very briefly, sanlock is a shared storage based lease manager, and the
expiration of a lease is tied to the expiration of the watchdog. I have
to ensure that the watchdog expires at or before the time that the lease
expires. This means that I cannot allow a watchdog heartbeat apart from a
corresponding lease renewal on the shared storage. Otherwise, the
calculation by other hosts of the time of the hard reset will be wrong,
and the data on shared storage could be corrupted.
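
To make the timing constraint concrete, here is a minimal sketch in Python. The function names and numbers are illustrative assumptions, not sanlock code. It assumes the watchdog resets the host exactly one timeout after the last heartbeat, and that other hosts treat the lease as expired a fixed duration after the last renewal.

```python
def reset_deadline(last_heartbeat: float, watchdog_timeout: float) -> float:
    """Time of the hard reset, assuming the watchdog expires exactly on time."""
    return last_heartbeat + watchdog_timeout


def lease_expiry(last_renewal: float, lease_duration: float) -> float:
    """Time at which other hosts consider the lease expired."""
    return last_renewal + lease_duration


def is_safe(last_renewal: float, last_heartbeat: float,
            watchdog_timeout: float, lease_duration: float) -> bool:
    """True if the host is reset at or before the moment its lease expires."""
    return (reset_deadline(last_heartbeat, watchdog_timeout)
            <= lease_expiry(last_renewal, lease_duration))
```

With a renewal and heartbeat both at t=100, a 60 second watchdog timeout and an 80 second lease, is_safe(100, 100, 60, 80) holds. A stray heartbeat at t=130 with no matching renewal moves the reset to t=190, past the lease expiry at t=180, so is_safe(100, 130, 60, 80) fails.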
> I wonder if you are actually mis-using the watchdog subsystem to generate
> hard resets.
I am indeed using it to generate hard resets.
> After all, you could avoid the unexpected close situation with
> an exit handler in your application. That handler could catch anything but
> SIGKILL, but anyone using SIGKILL doesn't really deserve better.
I avoid the unexpected close situation by prematurely closing the device
to generate the heartbeat from close, and then reopening if needed. That
covers the SIGKILL case. So, I have a workaround, but the patch would
still be nice.
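
The workaround can be sketched roughly as follows. This is a hedged illustration, not sanlock's actual code: the helper name is hypothetical, and it assumes, as described above, that closing the device without first stopping the watchdog causes the kernel to issue a heartbeat.

```python
import os
from typing import Optional


def heartbeat_by_close(fd: int, path: str, reopen: bool = True) -> Optional[int]:
    """Close the watchdog device so that the close itself acts as the heartbeat.

    Assumption: the device is closed without any prior "stop" request, so the
    kernel treats it as an unexpected close and pings the watchdog.
    Returns a new file descriptor if reopen is True, otherwise None.
    """
    os.close(fd)
    if reopen:
        # Reopen only when further heartbeats are expected (lease renewed).
        return os.open(path, os.O_WRONLY)
    return None
```

A caller would open the device once (conventionally /dev/watchdog), then call heartbeat_by_close(fd, path) after each successful lease renewal. Because the heartbeat has already come from an explicit close, a later SIGKILL closes the descriptor at a point the application has already accounted for.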
> If the intent is to reset the system after the application closes,
> executing "/sbin/restart -f" might be a safer approach than just killing
> the watchdog.
I need to reset the system if the application crashes, or if the
application is running but can't renew its lease. In the former case,
executing something doesn't work. In the latter case, I have done something similar
(with /proc/sysrq-trigger), but it doesn't always apply and I'd still want
the hardware reset as redundancy.
> In addition to that, I don't think it is a good idea to rely on the assumption
> that the watchdog will expire exactly after the configured timeout.
> Many watchdog drivers implement a soft timeout on top of the hardware timeout,
> and thus already implement the internal heartbeat. Most of those drivers
> will stop sending internal heartbeats if user space did not send a heartbeat
> within the configured timeout period. The actual reset will then occur later,
> after the actual hardware watchdog timed out. This can be as much as the
> hardware timeout period, which may be substantial.
OK, thanks, I'll look into this in more detail. Is there a way I can
identify which cases these are, or do you know an example I can look at?
In the worst case I'd have to extend the lease expiration time by a full
timeout period when the dubious drivers are used.
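
As a rough worked example of that worst case (names and numbers are illustrative assumptions; the padding uses the hardware timeout, following the description quoted above, and the exact bound for a given driver is not established here):

```python
def worst_case_reset_delay(configured_timeout: float, hw_timeout: float) -> float:
    """Upper bound on the time from the last user space heartbeat to the reset.

    Assumes the driver keeps pinging the hardware until the configured (soft)
    timeout elapses, after which the hardware needs up to one more full
    hardware timeout to fire.
    """
    return configured_timeout + hw_timeout


def padded_lease_duration(lease_duration: float, hw_timeout: float) -> float:
    """Lease duration other hosts would need to assume for such a driver."""
    return lease_duration + hw_timeout
```

For a configured timeout of 60 seconds on hardware with a 30 second timeout, the reset could come as late as 90 seconds after the last heartbeat, so an 80 second lease would have to be treated as 110 seconds by the other hosts.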
Dave