All of lore.kernel.org
 help / color / mirror / Atom feed
From: Don Zickus <dzickus@redhat.com>
To: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>,
	x86@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Robert Richter <robert.richter@amd.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	seiji.aguchi@hds.com, vgoyal@redhat.com, mjg@redhat.com,
	tony.luck@intel.com, gong.chen@intel.com, satoru.moriya@hds.com,
	avi@redhat.com
Subject: Re: [RFC][PATCH] x86, reboot:  use NMI instead of REBOOT_VECTOR to stop cpus
Date: Mon, 10 Oct 2011 12:03:41 -0400	[thread overview]
Message-ID: <20111010160341.GA5795@redhat.com> (raw)
In-Reply-To: <20111010065333.GL32173@elte.hu>

On Mon, Oct 10, 2011 at 08:53:33AM +0200, Ingo Molnar wrote:
> 
> * Don Zickus <dzickus@redhat.com> wrote:
> 
> > --- a/arch/x86/kernel/smp.c
> > +++ b/arch/x86/kernel/smp.c
> > @@ -28,6 +28,7 @@
> >  #include <asm/mmu_context.h>
> >  #include <asm/proto.h>
> >  #include <asm/apic.h>
> > +#include <asm/nmi.h>
> >  /*
> >   *	Some notes on x86 processor bugs affecting SMP operation:
> >   *
> > @@ -147,6 +148,57 @@ void native_send_call_func_ipi(const struct cpumask *mask)
> >  	free_cpumask_var(allbutself);
> >  }
> >  
> > +static int stopping_cpu;
> 
> Is access to this variable sufficiently serialized by all callpaths 
> of stop_other_cpus()?

Doesn't seem to be.  I can change this to an atomic_cmpxchg() to do that.

> 
> > +
> > +static int smp_stop_nmi_callback(unsigned int val, struct pt_regs *regs)
> > +{
> > +	/* We are registerd on stopping cpu too, avoid spurious NMI */
> > +	if (raw_smp_processor_id() == stopping_cpu)
> > +		return NMI_HANDLED;
> > +
> > +	stop_this_cpu(NULL);
> > +
> > +	return NMI_HANDLED;
> > +}
> > +
> > +static void native_nmi_stop_other_cpus(int wait)
> > +{
> > +	unsigned long flags;
> > +	unsigned long timeout;
> > +
> > +	if (reboot_force)
> > +		return;
> > +
> > +	/*
> > +	 * Use an own vector here because smp_call_function
> > +	 * does lots of things not suitable in a panic situation.
> > +	 */
> > +	if (num_online_cpus() > 1) {
> > +		stopping_cpu = safe_smp_processor_id();
> > +
> > +		if (register_nmi_handler(NMI_LOCAL, smp_stop_nmi_callback,
> > +					 NMI_FLAG_FIRST, "smp_stop"))
> > +			return;		/* return what? */
> > +
> > +		/* sync above data before sending NMI */
> > +		wmb();
> > +
> > +		apic->send_IPI_allbutself(NMI_VECTOR);
> > +
> > +		/*
> > +		 * Don't wait longer than a second if the caller
> > +		 * didn't ask us to wait.
> > +		 */
> > +		timeout = USEC_PER_SEC;
> > +		while (num_online_cpus() > 1 && (wait || timeout--))
> > +			udelay(1);
> > +	}
> > +
> > +	local_irq_save(flags);
> > +	disable_local_APIC();
> > +	local_irq_restore(flags);
> > +}
> > +
> >  /*
> >   * this function calls the 'stop' function on all other CPUs in the system.
> >   */
> > @@ -159,7 +211,7 @@ asmlinkage void smp_reboot_interrupt(void)
> >  	irq_exit();
> >  }
> >  
> > -static void native_stop_other_cpus(int wait)
> > +static void native_irq_stop_other_cpus(int wait)
> >  {
> >  	unsigned long flags;
> >  	unsigned long timeout;
> > @@ -229,7 +281,7 @@ struct smp_ops smp_ops = {
> >  	.smp_prepare_cpus	= native_smp_prepare_cpus,
> >  	.smp_cpus_done		= native_smp_cpus_done,
> >  
> > -	.stop_other_cpus	= native_stop_other_cpus,
> > +	.stop_other_cpus	= native_nmi_stop_other_cpus,
> >  	.smp_send_reschedule	= native_smp_send_reschedule,
> >  
> >  	.cpu_up			= native_cpu_up,
> 
> I'd be fine about this if you also aded some sort of 
> CONFIG_KERNEL_DEBUG dependent test facility that did a "send NMI to 
> all CPUs and check that they truly arrive" non-destructive test.
> 
> That would at least give people an automatic way to test it without 
> waiting for the first crash of their kernel.

Ok fair enough.  I also wanted to keep the 'old' fucntion around to let
people add something reboot=irq on the command line to sort through those
issues too.

I'll work on the 'DEBUG' idea you suggested and repost.

Thanks,
Don

      reply	other threads:[~2011-10-10 16:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-28 18:55 [RFC][PATCH] x86, reboot: use NMI instead of REBOOT_VECTOR to stop cpus Don Zickus
2011-10-06 20:50 ` Don Zickus
2011-10-06 21:00   ` Rafael J. Wysocki
2011-10-06 21:08     ` Matthew Garrett
2011-10-10  6:53 ` Ingo Molnar
2011-10-10 16:03   ` Don Zickus [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111010160341.GA5795@redhat.com \
    --to=dzickus@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=avi@redhat.com \
    --cc=gong.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mjg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=robert.richter@amd.com \
    --cc=satoru.moriya@hds.com \
    --cc=seiji.aguchi@hds.com \
    --cc=tony.luck@intel.com \
    --cc=vgoyal@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.