public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Frederic Weisbecker <fweisbec@gmail.com>
Cc: mingo@elte.hu, peterz@infradead.org, gorcunov@gmail.com,
	aris@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [watchdog] combine nmi_watchdog and softlockup
Date: Fri, 9 Apr 2010 09:32:52 -0400	[thread overview]
Message-ID: <20100409133252.GH15159@redhat.com> (raw)
In-Reply-To: <20100409010248.GE6672@nowhere>

On Fri, Apr 09, 2010 at 03:02:53AM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 05, 2010 at 10:11:22AM -0400, Don Zickus wrote:
> > On Tue, Mar 23, 2010 at 05:33:38PM -0400, Don Zickus wrote:
> > > The new nmi_watchdog (which uses the perf event subsystem) is very
> > > similar in structure to the softlockup detector.  Using Ingo's suggestion,
> > > I combined the two functionalities into one file, kernel/watchdog.c.
> > > 
> > > Now both the nmi_watchdog (or hardlockup detector) and softlockup detector
> > > sit on top of the perf event subsystem, which is run every 60 seconds or so
> > > to see if there are any lockups.
> > 
> > I raised some questions privately to Ingo, he asked I re-iterate them with
> > Peter Z. and Frederic W. cc'd.
> > 
> > > Ok thanks.  When you get a chance I had a couple of questions I was hoping
> > > you could answer for me.
> > >
> > > - does the hrtimer stuff look ok?
> 
> 
> IMO, only partially, as explained in my previous mail.

Yup, makes sense, thanks.

> 
> 
> > > - I wanted to merge the hung task detector code into watchdog.c.  The main
> > >   logic of the code is to walk the task list which i thought about doing
> > >   in the watchdog kthread.  I assume that is the right way to go, but i was a
> > >   little confused on how the scheduler worked.  I thought the watchdog kthread
> > >   would be scheduled very frequently (being a high priority task) but it seems
> > >   to only schedule when the code wakes it up.  Is that right?
> 
> 
> Yeah but high-prio doesn't mean that it is scheduled often.
> It means that once it is in a runnable state (TASK_RUNNING), it
> will have a higher priority to get into the cpu (lower prio tasks
> will have less time in the cpu than the higher prio until the higher prio
> get to sleep). Especially here this is a SCHED_FIFO class, so usual
> tasks (SCHED_OTHER) won't ever run until it goes to sleep.
> 
> But when it goes to sleep, it doesn't need the cpu, so other tasks
> can run.
> And it is only woken up every 30 secs, just to call
> __touch_softlockup_watchdog() and then it goes to sleep again
> until the timer wakes it up. That's why it doesn't run often.
> The high priority is just here to ensure it will do its job
> without too much latency, may be even to avoid rt-tasks to
> trigger spurious soft lockups just because the softlockup task
> couldn't run because of them taking the cpu for too long.
> If it starves because of a higher priority task running for
> too long, it can't touch the softlockup_touch_ts, and the timer
> will think there is a softlockup.

Ok.

> 
> 
> Concerning the hung task detector, I think it should be left as is in
> its own file and dedicated task. IIRC the hung task and softlockup
> detectors were in the same file before but they were split up.

I was doing that work based on a request by Ingo.  Ingo, thoughts?

> 
> We can't factorize both in the same task. The softlockup detector
> needs to be a real time task for the reasons stated above. And it's fine
> because it does very few things so it doesn't bother the other tasks
> with its high prio (unless there are strong rt requirement elsewhere).
> But the hung task detector must be a normal task, because it doesn't
> have latency requirements, it just checks if a task is blocked for too
> long, it's not like the softlockup detector that really needs to keep
> up with a timer. Also it does too much things to be an rt task (walking
> through the entire task list).

Ok.  Makes sense.

Cheers,
Don


  reply	other threads:[~2010-04-09 13:33 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-23 21:33 [watchdog] combine nmi_watchdog and softlockup Don Zickus
2010-03-28  2:46 ` Aristeu Sergio Rozanski Filho
2010-03-29 18:26   ` Don Zickus
2010-03-30 14:52     ` Aristeu Sergio Rozanski Filho
2010-04-05 20:11       ` Cyrill Gorcunov
2010-04-05 20:16         ` Don Zickus
2010-04-05 14:11 ` Don Zickus
2010-04-09  1:02   ` Frederic Weisbecker
2010-04-09 13:32     ` Don Zickus [this message]
2010-04-06 14:13 ` Frederic Weisbecker
2010-04-06 15:31   ` Cyrill Gorcunov
2010-04-08 23:52     ` Frederic Weisbecker
2010-04-09  0:00     ` Frederic Weisbecker
2010-04-09 14:56       ` Cyrill Gorcunov
2010-04-09 15:05         ` Don Zickus
2010-04-06 18:59   ` Don Zickus
2010-04-09  0:22     ` Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100409133252.GH15159@redhat.com \
    --to=dzickus@redhat.com \
    --cc=aris@redhat.com \
    --cc=fweisbec@gmail.com \
    --cc=gorcunov@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox