From: Don Zickus <dzickus@redhat.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-watchdog@vger.kernel.org, kexec@lists.infradead.org,
LKML <linux-kernel@vger.kernel.org>,
wim@iguana.be, Dave Young <dyoung@redhat.com>,
vgoyal@redhat.com
Subject: Re: [RFC PATCH] watchdog: Add hook for kicking in kdump path
Date: Tue, 9 Apr 2013 10:44:31 -0400
Message-ID: <20130409144431.GL79013@redhat.com>
In-Reply-To: <20130408151509.GA20919@roeck-us.net>
On Mon, Apr 08, 2013 at 08:15:09AM -0700, Guenter Roeck wrote:
> On Mon, Apr 08, 2013 at 08:48:58AM -0400, Don Zickus wrote:
> > On Mon, Apr 08, 2013 at 01:46:58PM +0800, Dave Young wrote:
> > > On 04/06/2013 04:16 AM, Don Zickus wrote:
> > > > A common problem with kdump is that during the boot up of the
> > > > second kernel, the hardware watchdog times out and reboots the
> > > > machine before a vmcore can be captured.
> > > >
> > > > Instead of telling customers to disable their hardware watchdog
> > > > timers, I hacked up a hook to put in the kdump path that provides
> > > > one last kick before jumping into the second kernel.
> > > >
> > > > The assumption is the watchdog timeout is at least 10-30 seconds
> > > > long, enough to get the second kernel to userspace to kick the watchdog
> > > > again, if needed.
> > >
> > > For the kdump kernel, some devices need to be reset, which might
> > > increase the boot time, so 10-30s is not reliably enough for us to
> > > kick the watchdog.
> > >
> > > Could we have another option to disable/stop the watchdog while panic
> > > happens? Ie. add a kernel cmdline panic_stop_wd=<0|1> for 1st kernel, if
> > > it's set to 1, then just stop the watchdog or we can kick the watchdog
> > > like what you do in this patch. Of course stopping watchdog should be
> > > lockless as well..
> >
> > Hmm, I can look into that. But I am not sure all watchdogs have the
> > ability to stop once started. I was also worried about the case where
>
> Correct.
>
> > kdump hangs for some reason. Having the watchdog there to 'reboot' would
> > be a nice safety net.
> >
> Absolutely agree. After all, the reason for the kdump is most likely that
> something went really wrong, meaning there is some likelihood for the hang
> to occur. Turning off the watchdog in this condition does not seem to be
> a good idea.
>
> > Perhaps adjusting the watchdog 'timeout' to something like 3 minutes would
> > be easier?
> >
> Not all watchdogs support such large timeouts, unfortunately. Maybe it would
> make sense to implement infrastructure support for a softdog on top of the
> hardware watchdog. Several drivers implement that outside the infrastructure
> already.
Hi Guenter,
I am not familiar with a softdog. Can you give me an example of how it
works?
Cheers,
Don
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec