From: Don Zickus <dzickus@redhat.com>
To: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
lkml <linux-kernel@vger.kernel.org>,
Ingo Molnar <mingo@kernel.org>, Michal Hocko <mhocko@suse.cz>,
Ulrich Obergfell <uobergfe@redhat.com>,
Stephane Eranian <eranian@google.com>,
Andi Kleen <ak@linux.intel.com>, Paul Bunyan <pbunyan@redhat.com>
Subject: Re: [PATCH] watchdog: Fix race on single cpu boot
Date: Fri, 31 Jul 2015 11:36:56 -0400 [thread overview]
Message-ID: <20150731153656.GB42272@redhat.com> (raw)
In-Reply-To: <1438343805-29771-1-git-send-email-jolsa@kernel.org>
On Fri, Jul 31, 2015 at 01:56:45PM +0200, Jiri Olsa wrote:
> We are experiencing following crashes under kdump
> kernel execution:
>
> [ 1.930428] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
>
> SNIP
>
> [ 2.478604] Call Trace:
> [ 2.479364] [<ffffffff81158f06>] perf_event_enable+0x16/0x40
> [ 2.481035] [<ffffffff81119a2c>] watchdog_nmi_enable+0x10c/0x170
> [ 2.482906] [<ffffffff81119c14>] watchdog_enable+0x54/0xc0
> [ 2.484583] [<ffffffff810ad209>] smpboot_thread_fn+0xb9/0x1a0
> [ 2.486302] [<ffffffff81634bb9>] ? schedule+0x29/0x70
> [ 2.487811] [<ffffffff810ad150>] ? lg_double_unlock+0x90/0x90
> [ 2.4810a484f>] kthread+0xcf/0xe0
> [ 2.991478] [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140
> [ 2.993519] [<ffffffff8163fad8>] ret_from_fork+0x58/0x90
> [ 2.995192] [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140
> [ 3.005787] RIP [<ffffffff8115744a>] perf_event_ctx_lock_nested.isra.50+0x5a/0xa0
> [ 3.008350] RSP <ffff88003199fde0>
> [ 3.009417] CR2: 0000000000000080
> [ 3.010432] ---[ end trace 892dd7be017c5cde ]---
> [ 3.011792] Kernel panic - not syncing: Fatal exception
> [ 3.013379] Rebooting in 10 seconds..
>
> It's caused by watchdog_nmi_disable_all/watchdog_nmi_enable_all calls
> executed by fixup_ht_bug (as fallback for HT bug) and watchdog_enable
> function executed by watchdog's smp_hotplug_thread setup callback.
Hi Jiri,
Uli privately has been working on a patchset that cleans up a bunch of these
race conditions. We believe it should cover this case. It uses the
proc_mutex to synchronize everything.
I think he is reaching out to you. If you could try his patchset to see if
it fixes things, it might be a cleaner approach than what you are doing.
<small time goes by>
Ok, I was poked on IRC and you have already done this and it passed your
testcase. Let me get Uli to post his patches. :-)
Cheers,
Don
>
> The watchdog_enable function calls watchdog_nmi_enable which is not serialized
> with watchdog_nmi_disable_all/watchdog_nmi_enable_all. We see following race
> during the boot.
>
> (booting with 'maxcpus=1' to emulate kdump mode and execute HT bug fallback).
>
> ---
> kernel_init
> kernel_init_freeable
> lockup_detector_init
> smpboot_register_percpu_thread
>
> -> schedules out to the new kthread
> ---
> smpboot_thread_fn
> watchdog_enable
> watchdog_nmi_enable
> perf_event_create_kernel_counter
> per_cpu(watchdog_ev, cpu) = event
> perf_event_enable
> perf_event_ctx_lock_nested
> -> schedules out back to the kernel_init
> ---
> kernel_init
> ...
> do_basic_setup
> do_initcalls
> fixup_ht_bug
> watchdog_nmi_disable_all
> -> release per_cpu(watchdog_ev, cpu) initialized by watchdog_enable
>
> watchdog_nmi_enable_all
> ...
> -> schedules out to the smpboot_thread_fn
> ---
> smpboot_thread_fn
> ...
> -> crash in perf_event_ctx_lock_nested because the event
> was released
>
> Protecting watchdog_nmi_(enable|disable) calls with new per-cpu mutex.
>
> I can easily reproduce this on SNB server with 'maxcpus=1' command line
> option, and I guess it could be triggered on any server that deals with
> the HT bug. I dig out following:
>
> case 42: /* 32nm SandyBridge */
> case 45: /* 32nm SandyBridge-E/EN/EP */
> case 58: /* 22nm IvyBridge */
> case 62: /* 22nm IvyBridge-EP/EX */
> case 60: /* 22nm Haswell Core */
> case 63: /* 22nm Haswell Server */
> case 69: /* 22nm Haswell ULT */
> case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */
>
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
> kernel/watchdog.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index a6ffa43f2993..8fcc75116277 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -82,6 +82,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
> static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
> static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct mutex, watchdog_ev_lock);
> #endif
> static unsigned long soft_lockup_nmi_warn;
>
> @@ -526,7 +527,10 @@ static unsigned long cpu0_err;
> static int watchdog_nmi_enable(unsigned int cpu)
> {
> struct perf_event_attr *wd_attr;
> - struct perf_event *event = per_cpu(watchdog_ev, cpu);
> + struct perf_event *event;
> +
> + mutex_lock(&per_cpu(watchdog_ev_lock, cpu));
> + event = per_cpu(watchdog_ev, cpu);
>
> /* nothing to do if the hard lockup detector is disabled */
> if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
> @@ -557,6 +561,8 @@ static int watchdog_nmi_enable(unsigned int cpu)
> goto out_save;
> }
>
> + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
> +
> /*
> * Disable the hard lockup detector if _any_ CPU fails to set up
> * set up the hardware perf event. The watchdog() function checks
> @@ -593,12 +599,16 @@ out_save:
> out_enable:
> perf_event_enable(per_cpu(watchdog_ev, cpu));
> out:
> + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
> return 0;
> }
>
> static void watchdog_nmi_disable(unsigned int cpu)
> {
> - struct perf_event *event = per_cpu(watchdog_ev, cpu);
> + struct perf_event *event;
> +
> + mutex_lock(&per_cpu(watchdog_ev_lock, cpu));
> + event = per_cpu(watchdog_ev, cpu);
>
> if (event) {
> perf_event_disable(event);
> @@ -611,6 +621,7 @@ static void watchdog_nmi_disable(unsigned int cpu)
> /* watchdog_nmi_enable() expects this to be zero initially. */
> cpu0_err = 0;
> }
> + mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
> }
>
> void watchdog_nmi_enable_all(void)
> @@ -928,6 +939,8 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>
> void __init lockup_detector_init(void)
> {
> + int cpu __maybe_unused;
> +
> set_sample_period();
>
> #ifdef CONFIG_NO_HZ_FULL
> @@ -942,6 +955,11 @@ void __init lockup_detector_init(void)
> cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
> #endif
>
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> + for_each_possible_cpu(cpu)
> + mutex_init(&per_cpu(watchdog_ev_lock, cpu));
> +#endif
> +
> if (watchdog_enabled)
> watchdog_enable_all_cpus();
> }
> --
> 2.4.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
next prev parent reply other threads:[~2015-07-31 15:37 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-31 11:56 [PATCH] watchdog: Fix race on single cpu boot Jiri Olsa
2015-07-31 15:36 ` Don Zickus [this message]
2015-08-01 13:05 ` Ulrich Obergfell
2015-08-01 16:25 ` Peter Zijlstra
2015-08-02 11:09 ` Ulrich Obergfell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150731153656.GB42272@redhat.com \
--to=dzickus@redhat.com \
--cc=a.p.zijlstra@chello.nl \
--cc=ak@linux.intel.com \
--cc=eranian@google.com \
--cc=jolsa@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.cz \
--cc=mingo@kernel.org \
--cc=pbunyan@redhat.com \
--cc=uobergfe@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.