Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly

From: Don Zickus <dzickus@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
Date: Tue, 23 Jul 2013 09:53:34 -0400
Message-ID: <20130723135334.GF126784@redhat.com>
In-Reply-To: <1374503566-2521-1-git-send-email-mhocko@suse.cz>

On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> The nmi one is disabled and then reinitialized from scratch. This
> has an unpleasant side effect that the allocation of the new event might
> theoretically fail, so the hard lockup detector would be disabled for
> such cpus. On the other hand such a memory allocation failure is very
> unlikely because the original event is deallocated right before.
> It would be much nicer if we just changed the perf event period but there
> doesn't seem to be any API to do that right now.
> It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> unconditionally so we cannot use on_each_cpu() and do the same thing
> from the per-cpu context. The update from the current CPU should be
> safe because perf_event_disable removes the event atomically before
> it clears the per-cpu watchdog_ev so it cannot change anything under
> the running handler's feet.

I guess I don't have a problem with this.  I was hoping to have more
shared code with the regular stop/start routines but with the pmu bit
locking (to share pmus with oprofile), you really need to unregister
everything to stop the lockup detector.  This makes it a little too heavy
for a restart routine like this.

The only odd thing is that I can't figure out which version this patch
is based on.  I can't find old_thresh anywhere (though I understand the
idea of it).
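
A minimal sketch of the idea behind old_thresh, assuming it is a snapshot
of watchdog_thresh taken in the sysctl handler before the write is parsed.
This is a userspace model, not kernel code: fake_proc_dowatchdog(),
model_enable_all_cpus(), model_disable_all_cpus() and the two counters are
hypothetical names invented for illustration; only watchdog_thresh,
watchdog_user_enabled, old_thresh and the sample_period_changed argument
come from the patch.

```c
#include <stdbool.h>

/* Stand-ins for the kernel globals; the names mirror the patch. */
static int watchdog_thresh = 10;
static int watchdog_user_enabled = 1;

/* Hypothetical bookkeeping so the effect of each call can be observed. */
static int timers_updated;    /* times the "period changed" path ran */
static int watchdog_disabled; /* times the disable path ran */

/* Models watchdog_enable_all_cpus(bool) when the threads already run. */
static int model_enable_all_cpus(bool sample_period_changed)
{
	if (sample_period_changed)
		timers_updated++;
	return 0;
}

static void model_disable_all_cpus(void)
{
	watchdog_disabled++;
}

/* Hypothetical stand-in for a sysctl write of a new threshold. */
static int fake_proc_dowatchdog(int new_thresh)
{
	int old_thresh = watchdog_thresh; /* snapshot before the write */
	int err = 0;

	watchdog_thresh = new_thresh;     /* stands in for parsing the write */

	if (watchdog_user_enabled && watchdog_thresh)
		err = model_enable_all_cpus(old_thresh != watchdog_thresh);
	else
		model_disable_all_cpus();

	return err;
}
```

With this shape, writing the same threshold again is a no-op for the
timers, and only a real change triggers the per-cpu update.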

Cheers,
Don

> 
> The hrtimer is simply restarted (thanks to Don Zickus who has pointed
> this out) if it is queued because we cannot rely on it to fire and adapt
> to the new sampling period before a new nmi event triggers (when the
> threshold is decreased).
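
The "restart only if queued" rule quoted above can be modelled in
userspace as follows. This is a sketch under stated assumptions, not the
kernel implementation: struct toy_timer and the toy_*() helpers are
hypothetical, and the only thing borrowed from the kernel is the return
convention of hrtimer_try_to_cancel() (1 = was queued and cancelled,
0 = not active, -1 = callback currently running).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy timer states; a real hrtimer tracks far more than this. */
enum toy_state { TOY_INACTIVE, TOY_QUEUED, TOY_RUNNING };

struct toy_timer {
	enum toy_state state;
	uint64_t period_ns; /* period the timer was last armed with */
};

/*
 * Same return convention as hrtimer_try_to_cancel():
 *   1  timer was queued and has been cancelled
 *   0  timer was not active
 *  -1  callback is running right now, nothing was cancelled
 */
static int toy_try_to_cancel(struct toy_timer *t)
{
	switch (t->state) {
	case TOY_QUEUED:
		t->state = TOY_INACTIVE;
		return 1;
	case TOY_RUNNING:
		return -1;
	default:
		return 0;
	}
}

static void toy_start(struct toy_timer *t, uint64_t period_ns)
{
	t->period_ns = period_ns;
	t->state = TOY_QUEUED;
}

/*
 * Re-arm only a queued timer. A running callback is left alone because
 * it is expected to re-arm itself with the new period when it finishes.
 * Returns true if the timer was re-armed here.
 */
static bool toy_restart_if_queued(struct toy_timer *t, uint64_t new_period_ns)
{
	if (toy_try_to_cancel(t) == 1) {
		toy_start(t, new_period_ns);
		return true;
	}
	return false;
}
```

Only the queued case needs intervention: without the restart, a timer
armed with the old, longer period could fire after the new, shorter nmi
period has already elapsed.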
> 
> Changes since v1
> - restart hrtimer to ensure that hrtimer doesn't mess up the new nmi,
>   as pointed out by Don Zickus
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
>  kernel/watchdog.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 2d64c02..eb4ebb5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -486,7 +486,52 @@ static struct smp_hotplug_thread watchdog_threads = {
>  	.unpark			= watchdog_enable,
>  };
>  
> -static int watchdog_enable_all_cpus(void)
> +static void restart_watchdog_hrtimer(void *info)
> +{
> +	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
> +	int ret;
> +
> +	/*
> +	 * No need to cancel and restart hrtimer if it is currently executing
> +	 * because it will reprogram itself with the new period now.
> +	 * We should never see it unqueued here because we are running per-cpu
> +	 * with interrupts disabled.
> +	 */
> +	ret = hrtimer_try_to_cancel(hrtimer);
> +	if (ret == 1)
> +		hrtimer_start(hrtimer, ns_to_ktime(sample_period),
> +				HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void update_timers(int cpu)
> +{
> +	struct call_single_data data = {.func = restart_watchdog_hrtimer};
> +	/*
> +	 * Make sure that perf event counter will adapt to a new
> +	 * sampling period. Updating the sampling period directly would
> +	 * be much nicer but we do not have an API for that now so
> +	 * let's use a big hammer.
> +	 * Hrtimer will adopt the new period on the next tick but this
> +	 * might be late already so we have to restart the timer as well.
> +	 */
> +	watchdog_nmi_disable(cpu);
> +	__smp_call_function_single(cpu, &data, 1);
> +	watchdog_nmi_enable(cpu);
> +}
> +
> +static void update_timers_all_cpus(void)
> +{
> +	int cpu;
> +
> +	get_online_cpus();
> +	preempt_disable();
> +	for_each_online_cpu(cpu)
> +		update_timers(cpu);
> +	preempt_enable();
> +	put_online_cpus();
> +}
> +
> +static int watchdog_enable_all_cpus(bool sample_period_changed)
>  {
>  	int err = 0;
>  
> @@ -496,6 +541,8 @@ static int watchdog_enable_all_cpus(void)
>  			pr_err("Failed to create watchdog threads, disabled\n");
>  		else
>  			watchdog_running = 1;
> +	} else if (sample_period_changed) {
> +		update_timers_all_cpus();
>  	}
>  
>  	return err;
> @@ -537,7 +584,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  	 * watchdog_*_all_cpus() function takes care of this.
>  	 */
>  	if (watchdog_user_enabled && watchdog_thresh)
> -		err = watchdog_enable_all_cpus();
> +		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
>  	else
>  		watchdog_disable_all_cpus();
>  
> @@ -565,5 +612,5 @@ void __init lockup_detector_init(void)
>  #endif
>  
>  	if (watchdog_user_enabled)
> -		watchdog_enable_all_cpus();
> +		watchdog_enable_all_cpus(false);
>  }
> -- 
> 1.8.3.2
>
