From: Don Zickus <dzickus@redhat.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: linux-watchdog@vger.kernel.org, kexec@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>,
	wim@iguana.be, Dave Young <dyoung@redhat.com>,
	vgoyal@redhat.com
Subject: Re: [RFC PATCH] watchdog: Add hook for kicking in kdump path
Date: Wed, 10 Apr 2013 10:20:55 -0400
Message-ID: <20130410142055.GW79013@redhat.com>
In-Reply-To: <20130410135123.GB15456@roeck-us.net>

On Wed, Apr 10, 2013 at 06:51:23AM -0700, Guenter Roeck wrote:
> On Wed, Apr 10, 2013 at 09:40:39AM -0400, Don Zickus wrote:
> > On Tue, Apr 09, 2013 at 09:07:58AM -0700, Guenter Roeck wrote:
> > > > > Just look for the use of mod_timer in the watchdog directory.
> > > > 
> > > > So looking at the mod_timer logic in various drivers, it seems regardless
> > > > if the /dev/watchdog device is opened or not, if it is running, it will
> > > > automagically kick the watchdog.
> > > > 
> > > yes
> > > 
> > > > This seems that we can avoid pulling in userspace pieces for this.  Just
> > > > load the driver and the hardware starts getting kicked.
> > > > 
> > > Only if it is already running. Also, you don't want to rely on it, because you
> > > lose protection against user space issues.
> > 
> > IOW if something goes wrong with a runaway userspace app, the kernel
> > blindly continues to kick the watchdog, which masks the problem, right?
> > 
> That would be wrong if any of the drivers does that. The kernel should stop
> kicking after the software timeout expires.
> 
> For example, if the HW needs to be kicked every second, and the high level
> timeout is set to one minute, the driver should keep kicking the hardware
> watchdog for one minute and then stop doing it if /dev/watchdog was opened
> and userspace is silent. 

Ah ok.
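
To make sure the example above is read the same way by everyone, here is a
minimal user-space sketch of that behaviour.  It is not taken from any
driver: struct wdt_model, wdt_should_kick() and wdt_count_kicks() are
hypothetical names, time is a plain counter in seconds, and the "timer" is
just a loop standing in for the mod_timer callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver that pings the hardware from a timer. */
struct wdt_model {
	unsigned int hw_interval_s;    /* how often the hardware needs a kick */
	unsigned int soft_timeout_s;   /* high-level timeout seen by user space */
	unsigned int last_user_ping_s; /* time of the last ping from user space */
	bool dev_opened;               /* has /dev/watchdog been opened? */
};

/* Decide, at time now_s, whether the timer callback kicks the hardware. */
static bool wdt_should_kick(const struct wdt_model *w, unsigned int now_s)
{
	if (!w->dev_opened)
		return true; /* nobody owns the device: keep the hardware alive */

	/* Opened: keep kicking only until the software timeout expires. */
	return now_s - w->last_user_ping_s < w->soft_timeout_s;
}

/* Count the kicks issued in [0, duration_s) while user space stays silent. */
static unsigned int wdt_count_kicks(const struct wdt_model *w,
				    unsigned int duration_s)
{
	unsigned int kicks = 0;
	unsigned int t;

	assert(w->hw_interval_s > 0);
	for (t = 0; t < duration_s; t += w->hw_interval_s)
		if (wdt_should_kick(w, t))
			kicks++;
	return kicks;
}
```

With hw_interval_s = 1 and soft_timeout_s = 60, an opened device whose user
space never pings is kicked for the first minute only; after that the kicks
stop and the hardware is left to fire.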

> 
> > > 
> > > A second use is if the hw watchdog needs to be pinged more often than user
> > > space can provide. Some of the HW watchdogs need a ping in one-second intervals
> > > or even faster.
> > > 
> > > > Is that true?  And if so, do all drivers detect if the hardware is already
> > > > running during their init?  Or is it based on the first device open?
> > > > 
> > > It is usually done in the probe function.
> > 
> > Ok.  Thanks for the understanding of how the softdog stuff works.
> > 
> > However, we still have the problem that if the machine panics and we want
> > to jump into the kdump kernel, we need to 'kick' the watchdog one more
> > time.  This provides us a sane sync point for determining how long we have
> > to load the watchdog driver in the second kernel before the hardware
> > reboots us.  Otherwise the reboots are pretty random and nothing is
> > guaranteed.
> > 
> > Hence the need for some sort of patch resembling the one I posted.
> > 
> > Soooooooo, any thoughts about that patch and what changes I should make?
> > :-)
> > 
> The FIXME is a problem, and I think the name and scope would have to be
> more generic (watchdog_kick ?). Also, it doesn't solve the problem
> of having multiple open watchdogs (my system has three, for example),
> and it doesn't check if the watchdog is running.

Ok.  I didn't know the watchdog subsystem well enough, so I just took
stabs in the dark about how things should work.  I appreciate the
feedback.

I could make the name more generic.  I wasn't sure if the watchdog
community would frown on that.  The FIXME is a problem; I am not sure how
to handle the 'fail' scenario (can't get the mutex with trylock).  And I
have no idea how to even find out if multiple watchdogs are open on the
system.  Is there a list I could walk?  And with regard to 'watchdog is
running', I thought 'watchdog_active' would do that.  But again, I could
be misreading the code.
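
To pin down the open questions, here is a rough user-space model of a more
generic kick: walk every watchdog, skip the ones that are not running, and
never block on a lock.  Everything in it is an assumption for illustration:
struct wd_model, wd_trylock(), wd_unlock() and wd_kick_all() are made-up
names, the array stands in for whatever list the core may or may not keep,
and the bool lock stands in for a mutex trylock.  It only counts the devices
it could not kick; what the panic path should do about them is exactly the
part I don't know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one watchdog; not the kernel's structure. */
struct wd_model {
	bool active;        /* models the "watchdog is running" check */
	bool locked;        /* models a mutex already held by someone else */
	unsigned int kicks; /* pings delivered to this device */
};

/* Models a trylock: take the lock only if it is free, never wait. */
static bool wd_trylock(struct wd_model *wd)
{
	if (wd->locked)
		return false;
	wd->locked = true;
	return true;
}

static void wd_unlock(struct wd_model *wd)
{
	wd->locked = false;
}

/*
 * Kick every running watchdog once.  Returns how many running devices
 * were skipped because their lock was already held.
 */
static size_t wd_kick_all(struct wd_model *wds, size_t n)
{
	size_t skipped = 0;
	size_t i;

	assert(wds != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!wds[i].active)
			continue;	/* not running: nothing to kick */
		if (!wd_trylock(&wds[i])) {
			skipped++;	/* the unresolved 'fail' scenario */
			continue;
		}
		wds[i].kicks++;
		wd_unlock(&wds[i]);
	}
	return skipped;
}
```

On a system with three watchdogs where one is stopped and one has its lock
held, only the remaining one gets kicked and the return value is 1.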

Thanks for the feedback.

Cheers,
Don

> 
> Guenter
