public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Alok Kataria <akataria@vmware.com>
To: Ingo Molnar <mingo@elte.hu>, "H. Peter Anvin" <hpa@zytor.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	"kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	Haren Myneni <hbabu@us.ibm.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Daniel Hecht <dhecht@vmware.com>,
	"jeremy@xensource.com" <jeremy@xensource.com>
Subject: Re: [RFC PATCH] Bug during kexec...not all cpus are stopped
Date: Thu, 21 Oct 2010 12:09:16 -0700	[thread overview]
Message-ID: <1287688156.27008.13.camel@ank32.eng.vmware.com> (raw)
In-Reply-To: <1286929430.15658.30.camel@ank32.eng.vmware.com>


On Tue, 2010-10-12 at 17:23 -0700, Alok Kataria wrote:
> On Tue, 2010-10-12 at 15:17 -0700, Vivek Goyal wrote:
> > On Mon, Oct 11, 2010 at 03:10:11PM -0700, Eric W. Biederman wrote:
> > > Vivek Goyal <vgoyal@redhat.com> writes:
> > > 
> > > > On Mon, Oct 11, 2010 at 12:41:23PM -0700, Alok Kataria wrote:
> > > 
> > > > I don't think that kdump path uses smp_send_stop().
> > > 
> > > It doesn't.
> > > 
> > > > IIUC, on x86, we directly send NMI to other cpus.
> > > >
> > > > native_machine_crash_shutdown()
> > > >   kdump_nmi_shootdown_cpus()
> > > >         nmi_shootdown_cpus()
> > > >            smp_send_nmi_allbutself
> > > >                 apic->send_IPI_allbutself(NMI_VECTOR);
> > > >
> > > > So above description should be limited to only panic() path.
> > > 
> > > Is it actually confusing?  With respect to documenting the line
> > > of thinking it seems reasonable.
> > > 
> > 
> > No, just wanted to point out that let us modify the changelog to remove
> > keyword "kdump" from it. 
> > 
> > > > On a side note, I am wondering why panic() and kdump path can't share the
> > > > shutdown routine.
> > > 
> > > Hysterical raisins.  Andi's change to smp_send_stop says that NMIs not
> > > working on some boxes.  When someone wants to weed through all of the
> > > insanity it would probably be good to get the panic and the kdump paths
> > > sharing code.  For now simply separating panic and reboot should be
> > > enough, and it lets the code evolve where it needs to.
> > > 
> > 
> > Ok. Agreed that atleast conceptually kdump and panic() path should share
> > the code. But that's a different problem altogether and this patch can go in.
> 
> Okay now that we all agree, let me repost a patch with the updated
> changelog, this fits on top of tip/master. 

Hi Ingo, HPA

I don't think this patch was picked up for tip, now that the 2.6.37
merge window is open can you please pick this up push it upstream.
This patch fixes a legitimate regression, which was introduced during
2.6.30, by commit id 4ef702c10b5df18ab04921fc252c26421d4d6c75.

Thanks,
Alok

> 
> --
> 
> x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait"
> this allows the caller to specify if it wants to stop until all the cpus
> have processed the stop IPI. This is required specifically for the kexec case
> where we should wait for all the cpus to be stopped before starting the new
> kernel.
> We now wait for the cpus to stop in all cases except for panic where we expect
> things to be broken and we are doing our best to make things work anyway.
> 
> 
> Signed-off-by: Alok N Kataria <akataria@vmware.com>
> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
> 
> Index: linux-x86-tree.git/arch/x86/include/asm/smp.h
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/include/asm/smp.h	2010-02-07 16:37:26.000000000 -0800
> +++ linux-x86-tree.git/arch/x86/include/asm/smp.h	2010-10-12 16:37:04.000000000 -0700
> @@ -50,7 +50,7 @@ struct smp_ops {
>  	void (*smp_prepare_cpus)(unsigned max_cpus);
>  	void (*smp_cpus_done)(unsigned max_cpus);
>  
> -	void (*smp_send_stop)(void);
> +	void (*stop_other_cpus)(int wait);
>  	void (*smp_send_reschedule)(int cpu);
>  
>  	int (*cpu_up)(unsigned cpu);
> @@ -73,7 +73,12 @@ extern struct smp_ops smp_ops;
>  
>  static inline void smp_send_stop(void)
>  {
> -	smp_ops.smp_send_stop();
> +	smp_ops.stop_other_cpus(0);
> +}
> +
> +static inline void stop_other_cpus(void)
> +{
> +	smp_ops.stop_other_cpus(1);
>  }
>  
>  static inline void smp_prepare_boot_cpu(void)
> Index: linux-x86-tree.git/arch/x86/kernel/reboot.c
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/kernel/reboot.c	2010-08-17 12:09:51.000000000 -0700
> +++ linux-x86-tree.git/arch/x86/kernel/reboot.c	2010-10-12 16:37:04.000000000 -0700
> @@ -641,7 +641,7 @@ void native_machine_shutdown(void)
>  	/* O.K Now that I'm on the appropriate processor,
>  	 * stop all of the others.
>  	 */
> -	smp_send_stop();
> +	stop_other_cpus();
>  #endif
>  
>  	lapic_shutdown();
> Index: linux-x86-tree.git/arch/x86/kernel/smp.c
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/kernel/smp.c	2010-07-08 13:53:34.000000000 -0700
> +++ linux-x86-tree.git/arch/x86/kernel/smp.c	2010-10-12 16:37:04.000000000 -0700
> @@ -159,10 +159,10 @@ asmlinkage void smp_reboot_interrupt(voi
>  	irq_exit();
>  }
>  
> -static void native_smp_send_stop(void)
> +static void native_stop_other_cpus(int wait)
>  {
>  	unsigned long flags;
> -	unsigned long wait;
> +	unsigned long timeout;
>  
>  	if (reboot_force)
>  		return;
> @@ -179,9 +179,12 @@ static void native_smp_send_stop(void)
>  	if (num_online_cpus() > 1) {
>  		apic->send_IPI_allbutself(REBOOT_VECTOR);
>  
> -		/* Don't wait longer than a second */
> -		wait = USEC_PER_SEC;
> -		while (num_online_cpus() > 1 && wait--)
> +		/*
> +		 * Don't wait longer than a second if the caller
> +		 * didn't ask us to wait.
> +		 */
> +		timeout = USEC_PER_SEC;
> +		while (num_online_cpus() > 1 && (wait || timeout--))
>  			udelay(1);
>  	}
>  
> @@ -227,7 +230,7 @@ struct smp_ops smp_ops = {
>  	.smp_prepare_cpus	= native_smp_prepare_cpus,
>  	.smp_cpus_done		= native_smp_cpus_done,
>  
> -	.smp_send_stop		= native_smp_send_stop,
> +	.stop_other_cpus	= native_stop_other_cpus,
>  	.smp_send_reschedule	= native_smp_send_reschedule,
>  
>  	.cpu_up			= native_cpu_up,
> Index: linux-x86-tree.git/arch/x86/xen/enlighten.c
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/xen/enlighten.c	2010-10-12 16:36:28.000000000 -0700
> +++ linux-x86-tree.git/arch/x86/xen/enlighten.c	2010-10-12 16:37:04.000000000 -0700
> @@ -1019,7 +1019,7 @@ static void xen_reboot(int reason)
>  	struct sched_shutdown r = { .reason = reason };
>  
>  #ifdef CONFIG_SMP
> -	smp_send_stop();
> +	stop_other_cpus();
>  #endif
>  
>  	if (HYPERVISOR_sched_op(SCHEDOP_shutdown, &r))
> Index: linux-x86-tree.git/arch/x86/xen/smp.c
> ===================================================================
> --- linux-x86-tree.git.orig/arch/x86/xen/smp.c	2010-08-17 12:09:51.000000000 -0700
> +++ linux-x86-tree.git/arch/x86/xen/smp.c	2010-10-12 16:37:04.000000000 -0700
> @@ -400,9 +400,9 @@ static void stop_self(void *v)
>  	BUG();
>  }
>  
> -static void xen_smp_send_stop(void)
> +static void xen_stop_other_cpus(int wait)
>  {
> -	smp_call_function(stop_self, NULL, 0);
> +	smp_call_function(stop_self, NULL, wait);
>  }
>  
>  static void xen_smp_send_reschedule(int cpu)
> @@ -470,7 +470,7 @@ static const struct smp_ops xen_smp_ops 
>  	.cpu_disable = xen_cpu_disable,
>  	.play_dead = xen_play_dead,
>  
> -	.smp_send_stop = xen_smp_send_stop,
> +	.stop_other_cpus = xen_stop_other_cpus,
>  	.smp_send_reschedule = xen_smp_send_reschedule,
>  
>  	.send_call_func_ipi = xen_smp_send_call_function_ipi,
> 


  reply	other threads:[~2010-10-21 19:09 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-08 20:34 [RFC PATCH] Bug during kexec...not all cpus are stopped Alok Kataria
2010-10-11 17:09 ` Alok Kataria
2010-10-11 18:07   ` Eric W. Biederman
2010-10-11 19:41     ` Alok Kataria
2010-10-11 21:17       ` Eric W. Biederman
2010-10-11 21:37         ` Alok Kataria
2010-10-21 21:40           ` [tip:x86/urgent] x86, kexec: Make sure to stop all CPUs before exiting the kernel tip-bot for Alok Kataria
2010-10-11 21:39       ` [RFC PATCH] Bug during kexec...not all cpus are stopped Vivek Goyal
2010-10-11 21:47         ` Alok Kataria
2010-10-11 22:10         ` Eric W. Biederman
2010-10-12 22:17           ` Vivek Goyal
2010-10-13  0:23             ` Alok Kataria
2010-10-21 19:09               ` Alok Kataria [this message]
2010-10-21 20:26                 ` H. Peter Anvin
2010-10-21 21:10                   ` Alok Kataria
2010-10-21 21:24                     ` H. Peter Anvin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1287688156.27008.13.camel@ank32.eng.vmware.com \
    --to=akataria@vmware.com \
    --cc=dhecht@vmware.com \
    --cc=ebiederm@xmission.com \
    --cc=hbabu@us.ibm.com \
    --cc=hpa@zytor.com \
    --cc=jeremy@xensource.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=vgoyal@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox