From: ebiederm@xmission.com (Eric W. Biederman)
To: Don Zickus <dzickus@redhat.com>
Cc: linux-watchdog@vger.kernel.org, kexec@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	wim@iguana.be, Guenter Roeck <linux@roeck-us.net>,
	dyoung@redhat.com, vgoyal@redhat.com
Subject: Re: [PATCH v3] watchdog: Add hook for kicking in kdump path
Date: Thu, 18 Apr 2013 11:09:29 -0700	[thread overview]
Message-ID: <87ehe77lt2.fsf@xmission.com> (raw)
In-Reply-To: <20130418174432.GN79013@redhat.com> (Don Zickus's message of "Thu, 18 Apr 2013 13:44:32 -0400")

Don Zickus <dzickus@redhat.com> writes:

> On Thu, Apr 18, 2013 at 09:35:05AM -0700, Eric W. Biederman wrote:
>> Don Zickus <dzickus@redhat.com> writes:
>> 
>> > A common problem with kdump is that during the boot up of the
>> > second kernel, the hardware watchdog times out and reboots the
>> > machine before a vmcore can be captured.
>> >
>> > Instead of telling customers to disable their hardware watchdog
>> > timers, I hacked up a hook to put in the kdump path that provides
>> > one last kick before jumping into the second kernel.
>> 
>> Having thought about this a little more, this patch is actively wrong.
>> 
>> The problem is you can easily be petting the watchdog in violation of
>> whatever policy is normally in place.  Which means that this extra
>> petting can result in a system that is unavailable for an unacceptably
>> long period of time.
>
> Not really, just an extra period which isn't that much.  This would only
> be noticeable if kdump is set up and enabled and then _hung_, otherwise it
> just quickly reboots and no one notices. :-)

For the folks who care, the definition of acceptable unavailability would
look like: watchdog timeout + max boot time + margin of error.  So it
is possible for an extra watchdog pet to eat up or exceed your margin
of error.
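
To make that arithmetic concrete, here is a minimal sketch with made-up
numbers.  The 60 s timeout, 120 s boot time and 30 s margin are
assumptions for illustration only, not values from this thread, and the
model is deliberately simplified: it assumes each extra pet can delay
the hardware reset by up to one full watchdog timeout.

```python
def worst_case_outage(watchdog_timeout_s, max_boot_time_s, extra_pets=0):
    """Worst-case seconds of unavailability if the kdump kernel hangs.

    Simplifying assumption: every extra pet restarts the countdown, so it
    can add up to one full timeout before the hardware reset fires.
    """
    return watchdog_timeout_s * (1 + extra_pets) + max_boot_time_s


# Hypothetical policy numbers, chosen only for illustration.
TIMEOUT_S = 60
MAX_BOOT_S = 120
MARGIN_S = 30

# Acceptable unavailability: watchdog timeout + max boot time + margin.
budget = TIMEOUT_S + MAX_BOOT_S + MARGIN_S

without_pet = worst_case_outage(TIMEOUT_S, MAX_BOOT_S)
with_pet = worst_case_outage(TIMEOUT_S, MAX_BOOT_S, extra_pets=1)

print(f"budget={budget}s without_pet={without_pet}s with_pet={with_pet}s")
print("extra pet exceeds budget:", with_pet > budget)
```

With these assumed numbers the margin of error is smaller than one
watchdog timeout, so a single extra pet is enough to push the worst
case past the budget.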

You are more likely to cause a "how in the world did that happen?"
moment than something more extreme, but even invalidating people's
mental model can be a problem sometimes.

>> I expect most watchdog policies are not that strict, but this patch
>> would preclude using those that are.
>
> I would assume most of those users would not enable kdump and would not be
> affected.

I have seen it be the case that the goal is to record what went wrong
if there is time, but to get back into service in a timely manner
regardless.

>> And as is being discussed in another subthread, it does look like
>> changing the timeout and the interval should be enough all on its own.
>
> Probably, and I will pursue that.  Thanks for the suggestion.

Eric


