public inbox for linux-kernel@vger.kernel.org
From: Andi Kleen <ak@linux.intel.com>
To: Don Zickus <dzickus@redhat.com>
Cc: "Liang, Kan" <kan.liang@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"babu.moger@oracle.com" <babu.moger@oracle.com>,
	"atomlin@redhat.com" <atomlin@redhat.com>,
	"prarit@redhat.com" <prarit@redhat.com>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"eranian@google.com" <eranian@google.com>,
	"acme@redhat.com" <acme@redhat.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH V2] kernel/watchdog: fix spurious hard lockups
Date: Wed, 28 Jun 2017 13:14:04 -0700
Message-ID: <20170628201404.GM23705@tassilo.jf.intel.com>
In-Reply-To: <20170628190008.3ftqq75evhn2hozp@redhat.com>

On Wed, Jun 28, 2017 at 03:00:08PM -0400, Don Zickus wrote:
> On Tue, Jun 27, 2017 at 04:48:22PM -0700, Andi Kleen wrote:
> > > I haven't heard back any test result yet.
> > > 
> > > The above patch looks good to me.
> > 
> > This needs performance testing.  It may slow down performance or latency sensitive workloads.
> 
> More motivation to work through the issues with the proposed real fix? :-)
> 
> > 
> > > Which workaround do you prefer, the above one or the one checking timestamp?
> > 
> > I prefer the earlier patch, it has far less risk of performance issues.
> 
> But now you are slowing down the nmi_watchdog so much that the
> watchdog_thresh becomes meaningless, no? (granted the turbo-mode blows
> it out of the water too)  So now folks who depend on the 10/5/1/whatever second
> reliability lose that.  I think that might be unfair too.

What do you mean by reliability? If you need a guaranteed reset,
you always need a separate hardware watchdog (like the TCO watchdog),
because the CPU can hang badly enough that even the NMI watchdog
stops working.

So relying solely on the NMI watchdog doesn't make any sense.

It can be a useful debugging tool for a specific class of bugs: 
when kernel software is looping forever.

But if that happens, does it really matter how many iterations the
loop runs before it is stopped?

Even the current timeout is essentially eternity in CPU time, and 3x
eternity is still eternity.

> The hrtimer increase maintains that and just adds a few more
> interrupts/second.

Extra interrupts are a big deal for many people, especially those
running latency-sensitive workloads.

-Andi


Thread overview: 37+ messages
2017-06-21 14:41 [PATCH V2] kernel/watchdog: fix spurious hard lockups kan.liang
2017-06-21 15:12 ` Thomas Gleixner
2017-06-21 15:47   ` Liang, Kan
2017-06-21 17:40     ` Prarit Bhargava
2017-06-21 17:07   ` Andi Kleen
2017-06-21 19:59     ` Thomas Gleixner
2017-06-21 21:53 ` Thomas Gleixner
2017-06-22 15:33   ` Thomas Gleixner
2017-06-22 15:44   ` Don Zickus
2017-06-22 15:48     ` Liang, Kan
2017-06-23  8:01     ` Thomas Gleixner
2017-06-23 16:29       ` Don Zickus
2017-06-23 21:50         ` Thomas Gleixner
2017-06-26 20:19           ` Don Zickus
2017-06-26 20:30             ` Thomas Gleixner
2017-06-27 20:12             ` Don Zickus
2017-06-27 20:49               ` Liang, Kan
2017-06-27 21:09                 ` Don Zickus
2017-06-27 23:48                 ` Andi Kleen
2017-06-28 19:00                   ` Don Zickus
2017-06-28 20:14                     ` Andi Kleen [this message]
2017-06-29 15:44                       ` Don Zickus
2017-06-29 16:12                         ` Andi Kleen
2017-06-29 16:26                           ` Don Zickus
2017-06-29 16:36                             ` Andi Kleen
2017-07-17  1:24               ` Liang, Kan
2017-07-17  7:14                 ` Thomas Gleixner
2017-07-17 12:18                   ` Liang, Kan
2017-07-17 13:13                     ` Thomas Gleixner
2017-07-17 14:46                       ` Liang, Kan
2017-07-17 15:00                         ` Thomas Gleixner
2017-07-17 14:46                 ` Don Zickus
2017-08-15  1:16                   ` Liang, Kan
2017-08-15  1:28                     ` Linus Torvalds
2017-08-15  7:50                     ` Thomas Gleixner
2017-08-17 15:45                       ` Liang, Kan
2017-08-18 10:39                       ` [tip:core/urgent] kernel/watchdog: Prevent false positives with turbo modes tip-bot for Thomas Gleixner
