Re: [PATCH] watchdog: Fix race on single cpu boot

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Don Zickus <dzickus@redhat.com>
To: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	lkml <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Michal Hocko <mhocko@suse.cz>,
	Ulrich Obergfell <uobergfe@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Andi Kleen <ak@linux.intel.com>, Paul Bunyan <pbunyan@redhat.com>
Subject: Re: [PATCH] watchdog: Fix race on single cpu boot
Date: Fri, 31 Jul 2015 11:36:56 -0400	[thread overview]
Message-ID: <20150731153656.GB42272@redhat.com> (raw)
In-Reply-To: <1438343805-29771-1-git-send-email-jolsa@kernel.org>

On Fri, Jul 31, 2015 at 01:56:45PM +0200, Jiri Olsa wrote:
> We are experiencing following crashes under kdump
> kernel execution:
> 
>   [    1.930428] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> 
>   SNIP
> 
>   [    2.478604] Call Trace:
>   [    2.479364]  [<ffffffff81158f06>] perf_event_enable+0x16/0x40
>   [    2.481035]  [<ffffffff81119a2c>] watchdog_nmi_enable+0x10c/0x170
>   [    2.482906]  [<ffffffff81119c14>] watchdog_enable+0x54/0xc0
>   [    2.484583]  [<ffffffff810ad209>] smpboot_thread_fn+0xb9/0x1a0
>   [    2.486302]  [<ffffffff81634bb9>] ? schedule+0x29/0x70
>   [    2.487811]  [<ffffffff810ad150>] ? lg_double_unlock+0x90/0x90
>   [    2.4810a484f>] kthread+0xcf/0xe0
>   [    2.991478]  [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140
>   [    2.993519]  [<ffffffff8163fad8>] ret_from_fork+0x58/0x90
>   [    2.995192]  [<ffffffff810a4780>] ? kthread_create_on_node+0x140/0x140
>   [    3.005787] RIP  [<ffffffff8115744a>] perf_event_ctx_lock_nested.isra.50+0x5a/0xa0
>   [    3.008350]  RSP <ffff88003199fde0>
>   [    3.009417] CR2: 0000000000000080
>   [    3.010432] ---[ end trace 892dd7be017c5cde ]---
>   [    3.011792] Kernel panic - not syncing: Fatal exception
>   [    3.013379] Rebooting in 10 seconds..
> 
> It's caused by watchdog_nmi_disable_all/watchdog_nmi_enable_all calls
> executed by fixup_ht_bug (as fallback for HT bug) and watchdog_enable
> function executed by watchdog's smp_hotplug_thread setup callback.

Hi Jiri,

Uli privately has been working on a patchset that cleans up a bunch of these
race conditions.  We believe it should cover this case.  It uses the
proc_mutex to synchronize everything.

I think he is reaching out to you.  If you could try his patchset to see if
it fixes things, it might be a cleaner approach than what you are doing.

<small time goes by>

Ok, I was poked on IRC and you have already done this and it passed your
testcase.  Let me get Uli to post his patches. :-)

Cheers,
Don

> 
> The watchdog_enable function calls watchdog_nmi_enable which is not serialized
> with watchdog_nmi_disable_all/watchdog_nmi_enable_all. We see following race
> during the boot.
> 
> (booting with 'maxcpus=1' to emulate kdump mode and execute HT bug fallback).
> 
>   ---
>   kernel_init
>     kernel_init_freeable
>       lockup_detector_init
>         smpboot_register_percpu_thread
> 
>         -> schedules out to the new kthread
>   ---
>   smpboot_thread_fn
>     watchdog_enable
>       watchdog_nmi_enable
>         perf_event_create_kernel_counter
>         per_cpu(watchdog_ev, cpu) = event
>         perf_event_enable
>           perf_event_ctx_lock_nested
>           -> schedules out back to the kernel_init
>   ---
>   kernel_init
>      ...
>      do_basic_setup
>        do_initcalls
>          fixup_ht_bug
>            watchdog_nmi_disable_all
>            -> release per_cpu(watchdog_ev, cpu) initialized by watchdog_enable
> 
>            watchdog_nmi_enable_all
>      ...
>      -> schedules out to the smpboot_thread_fn
>   ---
>   smpboot_thread_fn
>      ...
>      -> crash in perf_event_ctx_lock_nested because the event
>         was released
> 
> Protecting watchdog_nmi_(enable|disable) calls with new per-cpu mutex.
> 
> I can easily reproduce this on SNB server with 'maxcpus=1' command line
> option, and I guess it could be triggered on any server that deals with
> the HT bug. I dig out following:
> 
>   case 42: /* 32nm SandyBridge         */
>   case 45: /* 32nm SandyBridge-E/EN/EP */
>   case 58: /* 22nm IvyBridge       */
>   case 62: /* 22nm IvyBridge-EP/EX */
>   case 60: /* 22nm Haswell Core */
>   case 63: /* 22nm Haswell Server */
>   case 69: /* 22nm Haswell ULT */
>   case 70: /* 22nm Haswell + GT3e (Intel Iris Pro graphics) */
> 
> Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> ---
>  kernel/watchdog.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index a6ffa43f2993..8fcc75116277 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -82,6 +82,7 @@ static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct mutex, watchdog_ev_lock);
>  #endif
>  static unsigned long soft_lockup_nmi_warn;
>  
> @@ -526,7 +527,10 @@ static unsigned long cpu0_err;
>  static int watchdog_nmi_enable(unsigned int cpu)
>  {
>  	struct perf_event_attr *wd_attr;
> -	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	struct perf_event *event;
> +
> +	mutex_lock(&per_cpu(watchdog_ev_lock, cpu));
> +	event = per_cpu(watchdog_ev, cpu);
>  
>  	/* nothing to do if the hard lockup detector is disabled */
>  	if (!(watchdog_enabled & NMI_WATCHDOG_ENABLED))
> @@ -557,6 +561,8 @@ static int watchdog_nmi_enable(unsigned int cpu)
>  		goto out_save;
>  	}
>  
> +	mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
> +
>  	/*
>  	 * Disable the hard lockup detector if _any_ CPU fails to set up
>  	 * set up the hardware perf event. The watchdog() function checks
> @@ -593,12 +599,16 @@ out_save:
>  out_enable:
>  	perf_event_enable(per_cpu(watchdog_ev, cpu));
>  out:
> +	mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
>  	return 0;
>  }
>  
>  static void watchdog_nmi_disable(unsigned int cpu)
>  {
> -	struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +	struct perf_event *event;
> +
> +	mutex_lock(&per_cpu(watchdog_ev_lock, cpu));
> +	event = per_cpu(watchdog_ev, cpu);
>  
>  	if (event) {
>  		perf_event_disable(event);
> @@ -611,6 +621,7 @@ static void watchdog_nmi_disable(unsigned int cpu)
>  		/* watchdog_nmi_enable() expects this to be zero initially. */
>  		cpu0_err = 0;
>  	}
> +	mutex_unlock(&per_cpu(watchdog_ev_lock, cpu));
>  }
>  
>  void watchdog_nmi_enable_all(void)
> @@ -928,6 +939,8 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>  
>  void __init lockup_detector_init(void)
>  {
> +	int cpu __maybe_unused;
> +
>  	set_sample_period();
>  
>  #ifdef CONFIG_NO_HZ_FULL
> @@ -942,6 +955,11 @@ void __init lockup_detector_init(void)
>  	cpumask_copy(&watchdog_cpumask, cpu_possible_mask);
>  #endif
>  
> +#ifdef CONFIG_HARDLOCKUP_DETECTOR
> +	for_each_possible_cpu(cpu)
> +		mutex_init(&per_cpu(watchdog_ev_lock, cpu));
> +#endif
> +
>  	if (watchdog_enabled)
>  		watchdog_enable_all_cpus();
>  }
> -- 
> 2.4.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

next prev parent reply	other threads:[~2015-07-31 15:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-31 11:56 [PATCH] watchdog: Fix race on single cpu boot Jiri Olsa
2015-07-31 15:36 ` Don Zickus [this message]
2015-08-01 13:05   ` Ulrich Obergfell
2015-08-01 16:25     ` Peter Zijlstra
2015-08-02 11:09       ` Ulrich Obergfell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150731153656.GB42272@redhat.com \
    --to=dzickus@redhat.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=ak@linux.intel.com \
    --cc=eranian@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@suse.cz \
    --cc=mingo@kernel.org \
    --cc=pbunyan@redhat.com \
    --cc=uobergfe@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.