From: Don Zickus <dzickus@redhat.com>
To: Michal Hocko <mhocko@suse.cz>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>
Subject: Re: [RFC -v3 2/2] watchdog: update watchdog_tresh properly
Date: Tue, 23 Jul 2013 09:53:34 -0400
Message-ID: <20130723135334.GF126784@redhat.com>
In-Reply-To: <1374503566-2521-1-git-send-email-mhocko@suse.cz>

On Mon, Jul 22, 2013 at 04:32:46PM +0200, Michal Hocko wrote:
> The nmi one is disabled and then reinitialized from scratch. This
> has an unpleasant side effect that the allocation of the new event might
> fail theoretically so the hard lockup detector would be disabled for
> such cpus. On the other hand such a memory allocation failure is very
> unlikely because the original event is deallocated right before.
> It would be much nicer if we just changed perf event period but there
> doesn't seem to be any API to do that right now.
> It is also unfortunate that perf_event_alloc uses GFP_KERNEL allocation
> unconditionally so we cannot use on_each_cpu() and do the same thing
> from the per-cpu context. The update from the current CPU should be
> safe because perf_event_disable removes the event atomically before
> it clears the per-cpu watchdog_ev so it cannot change anything under
> the running handler's feet.

I guess I don't have a problem with this. I was hoping to have more
shared code with the regular stop/start routines but with the pmu bit
locking (to share pmus with oprofile), you really need to unregister
everything to stop the lockup detector. This makes it a little too heavy
for a restart routine like this.

The only odd thing is I can't figure out which version you were using to
apply this patch. I can't find old_thresh (though I understand the idea
of it).

Cheers,
Don

>
> The hrtimer is simply restarted (thanks to Don Zickus who has pointed
> this out) if it is queued because we cannot rely on it to fire and adapt
> to the new sampling period before a new nmi event triggers (when the
> threshold is decreased).
>
> Changes since v1
> - restart hrtimer to ensure that hrtimer doesn't mess up the new nmi as
>   pointed out by Don Zickus
>
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> ---
> kernel/watchdog.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 2d64c02..eb4ebb5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -486,7 +486,52 @@ static struct smp_hotplug_thread watchdog_threads = {
>  	.unpark = watchdog_enable,
>  };
>
> -static int watchdog_enable_all_cpus(void)
> +static void restart_watchdog_hrtimer(void *info)
> +{
> +	struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
> +	int ret;
> +
> +	/*
> +	 * No need to cancel and restart hrtimer if it is currently executing
> +	 * because it will reprogram itself with the new period now.
> +	 * We should never see it unqueued here because we are running per-cpu
> +	 * with interrupts disabled.
> +	 */
> +	ret = hrtimer_try_to_cancel(hrtimer);
> +	if (ret == 1)
> +		hrtimer_start(hrtimer, ns_to_ktime(sample_period),
> +				HRTIMER_MODE_REL_PINNED);
> +}
> +
> +static void update_timers(int cpu)
> +{
> +	struct call_single_data data = {.func = restart_watchdog_hrtimer};
> +	/*
> +	 * Make sure that perf event counter will adapt to a new
> +	 * sampling period. Updating the sampling period directly would
> +	 * be much nicer but we do not have an API for that now so
> +	 * let's use a big hammer.
> +	 * Hrtimer will adopt the new period on the next tick but this
> +	 * might be too late already so we have to restart the timer as well.
> +	 */
> +	watchdog_nmi_disable(cpu);
> +	__smp_call_function_single(cpu, &data, 1);
> +	watchdog_nmi_enable(cpu);
> +}
> +
> +static void update_timers_all_cpus(void)
> +{
> +	int cpu;
> +
> +	get_online_cpus();
> +	preempt_disable();
> +	for_each_online_cpu(cpu)
> +		update_timers(cpu);
> +	preempt_enable();
> +	put_online_cpus();
> +}
> +
> +static int watchdog_enable_all_cpus(bool sample_period_changed)
>  {
>  	int err = 0;
>
> @@ -496,6 +541,8 @@ static int watchdog_enable_all_cpus(void)
>  			pr_err("Failed to create watchdog threads, disabled\n");
>  		else
>  			watchdog_running = 1;
> +	} else if (sample_period_changed) {
> +		update_timers_all_cpus();
>  	}
>
>  	return err;
> @@ -537,7 +584,7 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>  	 * watchdog_*_all_cpus() function takes care of this.
>  	 */
>  	if (watchdog_user_enabled && watchdog_thresh)
> -		err = watchdog_enable_all_cpus();
> +		err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
>  	else
>  		watchdog_disable_all_cpus();
>
> @@ -565,5 +612,5 @@ void __init lockup_detector_init(void)
>  #endif
>
>  	if (watchdog_user_enabled)
> -		watchdog_enable_all_cpus();
> +		watchdog_enable_all_cpus(false);
>  }
> --
> 1.8.3.2
>
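
The timing concern behind the hrtimer restart (a timer armed with the old period may fire too late once the threshold is decreased) can be illustrated with a small userspace model. This is only a sketch, not kernel code. It assumes the hrtimer sample period is one fifth of the soft lockup threshold, which in turn is assumed to be twice watchdog_thresh, and it approximates the NMI period as watchdog_thresh seconds. The function names sample_period_ns() and stale_timer_can_miss_nmi() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Assumed relation: the hrtimer fires five times per soft lockup
 * threshold, and the soft lockup threshold is 2 * watchdog_thresh.
 */
uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
	uint64_t softlockup_thresh = 2ULL * watchdog_thresh;

	return softlockup_thresh * (NSEC_PER_SEC / 5);
}

/*
 * Worst case for a timer that is NOT restarted: it was armed with the
 * old period just before the threshold changed, so its next tick is a
 * full old period away. Returns 1 if that tick would come later than
 * one NMI period under the new threshold, i.e. a whole NMI interval
 * can pass without the hrtimer having run.
 */
int stale_timer_can_miss_nmi(unsigned int old_thresh, unsigned int new_thresh)
{
	/* A threshold of 0 disables the watchdog, so it is not modeled. */
	assert(old_thresh > 0 && new_thresh > 0);

	uint64_t next_tick_ns = sample_period_ns(old_thresh);
	uint64_t nmi_period_ns = (uint64_t)new_thresh * NSEC_PER_SEC;

	return next_tick_ns > nmi_period_ns;
}
```

Under these assumptions, lowering the threshold from 60 to 10 leaves a queued timer that may not fire for up to 24 seconds while the NMI check runs roughly every 10 seconds. Raising the threshold, or leaving it unchanged, does not hit this case in the model.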