From: Don Zickus <dzickus@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Mandeep Singh Baines <msb@chromium.org>
Subject: Re: [PATCH] watchdog: Make sure the watchdog thread gets CPU on loaded system
Date: Thu, 15 Mar 2012 13:14:05 -0400
Message-ID: <20120315171405.GH3941@redhat.com>
In-Reply-To: <20120315161422.GC19855@tiehlicka.suse.cz>

On Thu, Mar 15, 2012 at 05:14:22PM +0100, Michal Hocko wrote:
> On Thu 15-03-12 11:54:13, Don Zickus wrote:
> > On Thu, Mar 15, 2012 at 09:02:32AM +0100, Michal Hocko wrote:
> > > On Wed 14-03-12 16:19:06, Andrew Morton wrote:
> > > > On Wed, 14 Mar 2012 16:38:45 -0400
> > > > Don Zickus <dzickus@redhat.com> wrote:
> > > >
> > > > > From: Michal Hocko <mhocko@suse.cz>
> > > >
> > > > This changelog is awful.
> >
> > My apologies too, Andrew for not being more diligent.
> >
> > Some nitpicks below (hopefully it isn't too picky :-( )
>
> Thanks! Updated

I think it looks fine. Is this OK now, Andrew? I can respin this.

Cheers,
Don

> ---
> From a8da58750ba78d737136a4df24af805cb936ee00 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Tue, 13 Mar 2012 10:34:44 +0100
> Subject: [PATCH] watchdog: make sure the watchdog thread gets CPU on loaded
> system
>
> If the system is heavily loaded while hotplugging a CPU, we might end up
> with a bogus hardlockup detection. This has been seen during the LTP
> pounder test executed in parallel with the hotplug test.
>
> The hard lockup detector consists of two parts:
>   - watchdog_overflow_callback (executed as a perf counter callback
>     from NMI), which checks whether the per-cpu hrtimer_interrupts
>     changed since the last time it ran and panics if not
>   - the watchdog kernel thread, which starts watchdog_hrtimer, which
>     periodically updates hrtimer_interrupts.
>
> The main problem is that watchdog_enable (called when a CPU is brought up)
> registers a perf event but the hrtimer is started later when the watchdog
> thread gets a chance to run.
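
The ordering problem described in the changelog above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the struct and function names (cpu_watchdog, hrtimer_tick, nmi_check) are hypothetical stand-ins, and only the counter name hrtimer_interrupts comes from the changelog. The model shows that a check which fires before the timer side has ticked even once reports a lockup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state; a model, not the kernel's data layout. */
struct cpu_watchdog {
    unsigned long hrtimer_interrupts;       /* bumped by the timer side */
    unsigned long hrtimer_interrupts_saved; /* value seen by the last check */
};

/* Stand-in for the periodic hrtimer started by the watchdog thread. */
void hrtimer_tick(struct cpu_watchdog *cpu)
{
    assert(cpu != NULL);
    cpu->hrtimer_interrupts++;
}

/*
 * Stand-in for the NMI-side check. Returns true when the counter has
 * not moved since the previous check, i.e. a lockup would be reported.
 */
bool nmi_check(struct cpu_watchdog *cpu)
{
    assert(cpu != NULL);
    if (cpu->hrtimer_interrupts == cpu->hrtimer_interrupts_saved)
        return true;
    cpu->hrtimer_interrupts_saved = cpu->hrtimer_interrupts;
    return false;
}
```

In this model, calling nmi_check() on a freshly zeroed cpu_watchdog before any hrtimer_tick() returns true, which corresponds to the perf event being armed while the watchdog thread has not yet had a chance to start the timer.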
>
> The watchdog thread currently starts with normal priority and boosts
> itself as soon as it gets to a CPU. This might, however, already be too
> late, as demonstrated by the LTP pounder test executed in parallel with
> the LTP hotplug test. There are zillions of userspace processes sitting
> in the runqueue while the number of online CPUs drops to 1. CPUs are
> onlined back in the second stage, where the issue triggers.
>
> When we online a CPU and create the watchdog kernel thread, it will take
> some time until it gets to a CPU. On the other hand, the perf counter
> callback is executed in a timely fashion, so we explode the first time
> it finds out that hrtimer_interrupts wasn't incremented.
>
> Let's fix this by boosting the watchdog thread priority before we wake it up
> rather than when it's already running.
> This still doesn't handle a case where we have the same amount of high prio
> FIFO tasks but that doesn't seem to be common. The current implementation
> doesn't handle that case anyway so this is no worse at least.
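
As a rough user-space analogy of "boost before wake-up" (not the kernel change itself, whose diff is not included in this message), POSIX threads allow the scheduling policy and priority to be fixed in the attributes before the thread is created, so the thread never runs at normal priority. The helper name watchdog_attr_init is hypothetical, and the choice of the maximum SCHED_FIFO priority is an assumption made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/*
 * Prepare attributes so that a thread created with them starts at the
 * highest SCHED_FIFO priority instead of boosting itself after it
 * first runs. Returns 0 on success or an errno-style value.
 */
int watchdog_attr_init(pthread_attr_t *attr)
{
    struct sched_param param = {
        .sched_priority = sched_get_priority_max(SCHED_FIFO),
    };
    int err;

    assert(attr != NULL);

    err = pthread_attr_init(attr);
    if (err)
        return err;

    /* Without this, the new thread inherits the creator's scheduling. */
    err = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED);
    if (!err)
        err = pthread_attr_setschedpolicy(attr, SCHED_FIFO);
    if (!err)
        err = pthread_attr_setschedparam(attr, &param);

    if (err)
        pthread_attr_destroy(attr);
    return err;
}
```

Passing such attributes to pthread_create() usually requires real-time scheduling privileges; without them the call is expected to fail with EPERM, so a caller would need to handle that case.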
>
> Unfortunately, we cannot start the perf counter from the watchdog thread
> because we could miss a real lockup, and we cannot start the
> hrtimer from watchdog_enable because there is no way (at least I
> don't know of any) to start a hrtimer from a different CPU.
> --
> Michal Hocko
> SUSE Labs
> SUSE LINUX s.r.o.
> Lihovarska 1060/12
> 190 00 Praha 9
> Czech Republic