All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>
Cc: Chen Gong <gong.chen@linux.intel.com>,
	Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	x86@kernel.org, Thomas Gleixner <tglx@linutronix.de>,
	andi@firstfloor.org,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	tony.luck@gmail.com
Subject: Re: [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
Date: Tue, 02 Apr 2013 19:38:11 +0530	[thread overview]
Message-ID: <515AE64B.7080804@linux.vnet.ibm.com> (raw)
In-Reply-To: <514988F9.2000005@linux.vnet.ibm.com>

On 03/20/2013 03:31 PM, Srivatsa S. Bhat wrote:
[...]
> ------------------------------------------------------------>
> 
> From: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> Subject: [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
> 
> Dave Jones reports that offlining a CPU leads to this trace:
> 
> numa_remove_cpu cpu 1 node 0: mask now 0,2-3
> smpboot: CPU 1 is now offline
> BUG: using smp_processor_id() in preemptible [00000000] code:
> cpu-offline.sh/10591
> caller is cmci_rediscover+0x6a/0xe0
> Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
> Call Trace:
>  [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
>  [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
>  [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
>  [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
>  [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
>  [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
>  [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
>  [<ffffffff815ef082>] _cpu_down+0x302/0x350
>  [<ffffffff815ef106>] cpu_down+0x36/0x50
>  [<ffffffff815f1c9d>] store_online+0x8d/0xd0
>  [<ffffffff813edc48>] dev_attr_store+0x18/0x30
>  [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
>  [<ffffffff811adfb2>] vfs_write+0xa2/0x170
>  [<ffffffff811ae16c>] sys_write+0x4c/0xa0
>  [<ffffffff81613019>] system_call_fastpath+0x16/0x1b
> 
> 
> However, a look at cmci_rediscover shows that it can be simplified quite
> a bit, apart from solving the above issue. It invokes functions that
> take spin locks with interrupts disabled, and hence it can run in atomic
> context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
> is already dead and out of the cpu_online_mask. So take these points into
> account and simplify the code, and thereby also fix the above issue.
> 
> Reported-by: Dave Jones <davej@redhat.com>
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> ---
> 
>  arch/x86/include/asm/mce.h             |    4 ++--
>  arch/x86/kernel/cpu/mcheck/mce.c       |    2 +-
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |   25 +++++--------------------
>  3 files changed, 8 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index f4076af..fa5f71e 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -146,13 +146,13 @@ DECLARE_PER_CPU(struct device *, mce_device);
>  void mce_intel_feature_init(struct cpuinfo_x86 *c);
>  void cmci_clear(void);
>  void cmci_reenable(void);
> -void cmci_rediscover(int dying);
> +void cmci_rediscover(void);
>  void cmci_recheck(void);
>  #else
>  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
>  static inline void cmci_clear(void) {}
>  static inline void cmci_reenable(void) {}
> -static inline void cmci_rediscover(int dying) {}
> +static inline void cmci_rediscover(void) {}
>  static inline void cmci_recheck(void) {}
>  #endif
>  
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 7bc1263..9239504 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -2358,7 +2358,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
>  
>  	if (action == CPU_POST_DEAD) {
>  		/* intentionally ignoring frozen here */
> -		cmci_rediscover(cpu);
> +		cmci_rediscover();
>  	}
>  
>  	return NOTIFY_OK;
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index 402c454..ae1697c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -285,39 +285,24 @@ void cmci_clear(void)
>  	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
>  }
>  
> -static long cmci_rediscover_work_func(void *arg)
> +static void cmci_rediscover_work_func(void *arg)
>  {
>  	int banks;
>  
>  	/* Recheck banks in case CPUs don't all have the same */
>  	if (cmci_supported(&banks))
>  		cmci_discover(banks);
> -
> -	return 0;
>  }
>  
> -/*
> - * After a CPU went down cycle through all the others and rediscover
> - * Must run in process context.
> - */
> -void cmci_rediscover(int dying)
> +/* After a CPU went down cycle through all the others and rediscover */
> +void cmci_rediscover(void)
>  {
> -	int cpu, banks;
> +	int banks;
>  
>  	if (!cmci_supported(&banks))
>  		return;
>  
> -	for_each_online_cpu(cpu) {
> -		if (cpu == dying)
> -			continue;
> -
> -		if (cpu == smp_processor_id()) {
> -			cmci_rediscover_work_func(NULL);
> -			continue;
> -		}
> -
> -		work_on_cpu(cpu, cmci_rediscover_work_func, NULL);
> -	}
> +	on_each_cpu(cmci_rediscover_work_func, NULL, 1);
>  }
>  
>  /*
> 

Hi Boris,

Tony mentioned that this patch worked fine for him. So could you kindly
pick up this patch?

Regards,
Srivatsa S. Bhat


  parent reply	other threads:[~2013-04-02 14:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-19 22:44 cpu offline causes backtrace from cmci_rediscover Dave Jones
2013-03-20  3:16 ` Chen Gong
2013-03-20  5:20   ` Mike Galbraith
2013-03-20 10:01   ` [PATCH] x86/mce: Rework cmci_rediscover() to play well with CPU hotplug Srivatsa S. Bhat
2013-03-20 10:46     ` Borislav Petkov
2013-03-20 23:28     ` Tony Luck
2013-04-02 14:08     ` Srivatsa S. Bhat [this message]
2013-04-02 14:17       ` Borislav Petkov
2013-04-02 20:58         ` Luck, Tony

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=515AE64B.7080804@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=andi@firstfloor.org \
    --cc=bp@alien8.de \
    --cc=davej@redhat.com \
    --cc=gong.chen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.