From: Mandeep Singh Baines <msb@chromium.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Don Zickus <dzickus@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Michal Hocko <mhocko@suse.cz>, Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mandeep Singh Baines <msb@chromium.org>
Subject: Re: [PATCH] watchdog: Make sure the watchdog thread gets CPU on loaded system
Date: Wed, 14 Mar 2012 18:45:11 -0700
Message-ID: <20120315014511.GT27051@google.com>
In-Reply-To: <20120314161906.e53359d3.akpm@linux-foundation.org>

Andrew Morton (akpm@linux-foundation.org) wrote:
> On Wed, 14 Mar 2012 16:38:45 -0400
> Don Zickus <dzickus@redhat.com> wrote:
> 
> > From: Michal Hocko <mhocko@suse.cz>
> 
> This changelog is awful.
> 
> > If the system is loaded while hotplugging a CPU we might end up with a bogus
> > hardlockup detection. This has been seen during LTP pounder test executed
> > in parallel with hotplug test.
> > 
> > The main problem is that enable_watchdog (called when CPU is brought up)
> 
> You mean watchdog_enable().
> 
> > registers perf event which periodically checks per-cpu counter
> > (hrtimer_interrupts), updated from a hrtimer callback, but the hrtimer is fired
> 
> s/fired/started/
> 
> > from the kernel thread.
> 
> "the kernel thread" being kernel/watchdog.c:watchdog()
> 
> > This means that while we already do check for the hard lockup the kernel thread
> 
> Who is "we" and where in the kernel does this check occur?
> 
> "the kernel thread" is still kernel/watchdog.c:watchdog().
> 
> > might be sitting on the runqueue with zillions of tasks
> 
> What causes these "zillions of tasks"?  Are they userspace tasks? 
> They're preventing the watchdog() function from being called in a
> timely fashion, I assume?
> 
> > so there is nobody to
> > update the value we rely on and so we KABOOM.
> 
> Who is "we" and what is "the value"?
> 
> etcetera.  It is maddeningly inaccurate, vague and handwavy for someone
> who is actually trying to understand what you're trying to tell us.
> 

My paraphrasing:

Set the priority of the watchdog thread when it is created. The current
implementation sets the priority from the context of the watchdog thread
itself, as one of its first few instructions. Until that call runs, the
watchdog is not yet at MAX_RT_PRIO - 1, so a long runqueue or a running
SCHED_FIFO process can keep it off the CPU and a false lockup can be
detected. Once the thread has changed its priority, this is no longer the
case. The fix is to set the priority to MAX_RT_PRIO - 1 at creation time
instead of at runtime.
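
For anyone who finds the ordering easier to see outside the kernel, here
is a rough user-space analogy using POSIX threads. It is not the watchdog
code and not part of the patch: fifo_attr_init() is a name made up for
this sketch. The point is only that the policy and priority are attached
to the thread before it is allowed to run, rather than being requested by
the thread itself after it has already waited its turn on the runqueue.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper (illustration only): prepare attributes so that a
 * thread created with them starts life as SCHED_FIFO at 'priority',
 * instead of starting with the creator's policy and raising itself later.
 * Returns 0 on success or a positive errno-style code.
 */
int fifo_attr_init(pthread_attr_t *attr, int priority)
{
	struct sched_param param = { .sched_priority = priority };
	int err;

	assert(attr);

	err = pthread_attr_init(attr);
	if (err)
		return err;

	/* Without this, the new thread inherits the creator's scheduling
	 * settings and the policy/priority stored in attr are ignored. */
	err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
	if (!err)
		err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
	/* Set the policy first so the priority is checked against it. */
	if (!err)
		err = pthread_attr_setschedparam(attr, &param);
	if (err)
		pthread_attr_destroy(attr);

	return err;
}
```

Passing the resulting attr to pthread_create() is the analogue of calling
sched_setscheduler() on the new task before waking it. Note that actually
creating a SCHED_FIFO thread this way needs the appropriate privilege
(CAP_SYS_NICE or a suitable RLIMIT_RTPRIO); otherwise pthread_create()
fails with EPERM.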


> > Let's fix this by boosting the watchdog thread priority before we wake it up
> > rather than when it's already running.
> > This still doesn't handle a case where we have the same amount of high prio
> > FIFO tasks but that doesn't seem to be common.
> 
> Even a single FIFO thread could starve the watchdog() thread.
> 
> > The current implementation
> > doesn't handle that case anyway so this is not worse at least.
> 
> Right.  But this isn't specific to the startup case, is it?  A spinning
> SCHED_FIFO thread could cause watchdog() to get starved of CPU for an
> arbitrarily long time, triggering a false(?) lockup detection?  Or did
> we do something to prevent that case?  I assume we did - it would be
> pretty bad if this were to happen.
> 

I don't think anything prevents a SCHED_FIFO task from causing a false
lockup.

From sched.h:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
 * values are inverted: lower p->prio value means higher priority.
 *
 * The MAX_USER_RT_PRIO value allows the actual maximum
 * RT priority to be separate from the value exported to
 * user-space.  This allows kernel threads to set their
 * priority to a value higher than any user task. Note:
 * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
 */

#define MAX_USER_RT_PRIO	100
#define MAX_RT_PRIO		MAX_USER_RT_PRIO

You could make MAX_RT_PRIO greater than MAX_USER_RT_PRIO but that might
have some impact on real-time applications. A simple one-line patch:

- #define MAX_RT_PRIO		MAX_USER_RT_PRIO
+ #define MAX_RT_PRIO		(MAX_USER_RT_PRIO + 1)

would prevent user-space from causing a false lockup detection.
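
To make the arithmetic concrete, here is a small user-space sketch (not
kernel code). It assumes, as I understand the scheduler, that an RT task's
effective prio is MAX_RT_PRIO - 1 - sched_priority, that user tasks are
capped at sched_priority MAX_USER_RT_PRIO - 1, and that kernel threads are
capped at MAX_RT_PRIO - 1. The function names are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_USER_RT_PRIO 100

/*
 * Assumed mapping from an RT sched_priority to the effective prio.
 * Values are inverted: a lower result means a higher priority.
 */
int rt_effective_prio(int max_rt_prio, int sched_priority)
{
	assert(sched_priority >= 0 && sched_priority < max_rt_prio);
	return max_rt_prio - 1 - sched_priority;
}

/* Highest sched_priority a user-space task may request (assumed cap). */
int max_user_sched_priority(void)
{
	return MAX_USER_RT_PRIO - 1;
}

/* Highest sched_priority a kernel thread may request (assumed cap). */
int max_kthread_sched_priority(int max_rt_prio)
{
	return max_rt_prio - 1;
}

/*
 * True if a kernel thread at its maximum priority strictly outranks a
 * user task at its maximum priority for the given MAX_RT_PRIO.
 */
bool watchdog_outranks_user(int max_rt_prio)
{
	int wd = rt_effective_prio(max_rt_prio,
				   max_kthread_sched_priority(max_rt_prio));
	int user = rt_effective_prio(max_rt_prio, max_user_sched_priority());

	return wd < user;
}
```

With MAX_RT_PRIO equal to MAX_USER_RT_PRIO (100), both the watchdog and a
top-priority user FIFO task end up at effective prio 0, so
watchdog_outranks_user(100) is false and the user task can still starve
the watchdog. With MAX_RT_PRIO at 101, the watchdog sits at 0 and the best
a user task can reach is 1, so watchdog_outranks_user(101) is true.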

Regards,
Mandeep

> > Unfortunately, we cannot start perf counter from the watchdog thread because we
> > could miss a real lock up and also we cannot start the hrtimer watchdog_enable
> > because we there is no way (at least I don't know any) to start a hrtimer from
> > a different CPU.
> > 
> > [fix compile issue with param -dcz]
> > 
> > Cc: Ingo Molnar <mingo@elte.hu>
> > Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Mandeep Singh Baines <msb@chromium.org>
> > Signed-off-by: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: Don Zickus <dzickus@redhat.com>
> > ---
> >  kernel/watchdog.c |    7 +++----
> >  1 files changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index d117262..6618cde 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -321,11 +321,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> >   */
> >  static int watchdog(void *unused)
> >  {
> > -	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> > +	struct sched_param param = { .sched_priority = 0 };
> >  	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
> >  
> > -	sched_setscheduler(current, SCHED_FIFO, &param);
> > -
> >  	/* initialize timestamp */
> >  	__touch_watchdog();
> >  
> > @@ -350,7 +348,6 @@ static int watchdog(void *unused)
> >  		set_current_state(TASK_INTERRUPTIBLE);
> >  	}
> >  	__set_current_state(TASK_RUNNING);
> > -	param.sched_priority = 0;
> >  	sched_setscheduler(current, SCHED_NORMAL, &param);
> >  	return 0;
> >  }
> 
> Why did watchdog() reset the scheduling policy seven instructions
> before exiting?  Seems pointless.
> 
> > @@ -439,6 +436,7 @@ static int watchdog_enable(int cpu)
> >  
> >  	/* create the watchdog thread */
> >  	if (!p) {
> > +		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
> >  		p = kthread_create_on_node(watchdog, NULL, cpu_to_node(cpu), "watchdog/%d", cpu);
> >  		if (IS_ERR(p)) {
> >  			printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
> > @@ -450,6 +448,7 @@ static int watchdog_enable(int cpu)
> >  			}
> >  			goto out;
> >  		}
> > +		sched_setscheduler(p, SCHED_FIFO, &param);
> >  		kthread_bind(p, cpu);
> >  		per_cpu(watchdog_touch_ts, cpu) = 0;
> >  		per_cpu(softlockup_watchdog, cpu) = p;
> 
> It's pretty silly that kthread_create_on_node() sets the scheduling
> policy and priority and then the caller immediately resets it.  There
> should be a version of kthread_create_on_node() which takes these as
> arguments.
> 
> Oh well, despite all that the patch looks OK to me, after using
> whiteout all over the changelog.
> 
