public inbox for linux-ia64@vger.kernel.org
* Re: Sending cpu 0 back to SAL slave loop
@ 2006-10-06 20:39 Jack Steiner
  2006-10-06 20:44 ` Matthew Wilcox
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Jack Steiner @ 2006-10-06 20:39 UTC (permalink / raw)
  To: linux-ia64

On Tue, Sep 12, 2006 at 04:25:34PM -0500, Jack Steiner wrote:
> On Tue, Sep 12, 2006 at 01:23:54PM -0700, Luck, Tony wrote:
> > > Hmmm. I may have answered at least part of my question. It appears that the boot cpu
> > > cannot exit back to the SAL slave loop since it was never in the slave loop to start with.
> > >
> > > This will take some thought..... More later.
> > 
> > Yes.  cpu0 is a special case as there is no way to return it to SAL.
> > Linux hotplug code has a hack where we borrow the return details from
> > some other cpu in the case that someone wants to take cpu0 offline.
> > Will this work for Altix?  Would we have to be careful to get the
> > return details from some other cpu on the same node?
> > 
> > -Tony
> 
> Interesting idea. We might be able to make this work. It looks like we
> need to make some changes to our BIOS to make this work but it looks
> possible.
> 
> I'll investigate this some more.....
> 



(Sorry for taking so long to respond - vacation :-) & too much work)


I took another look at the SN issues involved in trying to send the boot
cpu back to the SAL slave loop during kexec. As others have pointed out,
the boot cpu was never in the slave loop so simply returning back to SAL
is not possible because the return address for cpu 0 (B0) does not point
to the SAL slave loop.

The HOTPLUG code added a hack to copy B0 from the cpu 1 area
(sal_boot_rendez_state[1].b0) to the cpu 0 area (sal_boot_rendez_state[0].b0).
This works only if the SAL slave loop is a simple assembly language
routine that does not use the RSE, SP, preserved registers, etc.
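
To make the B0 borrowing concrete, here is a minimal user-space sketch. It
is a model only, not the kernel code: the struct is a simplified stand-in
for the per-cpu SAL handoff record, and the `valid` field and the helper
name `borrow_b0_for_boot_cpu` are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of the per-cpu state SAL hands to the OS. */
struct rendez_state {
	uint64_t b0;	/* return address into the SAL slave loop */
	int valid;	/* nonzero if this cpu entered the OS from the slave loop */
};

/*
 * Copy the SAL return address from the first usable donor cpu into
 * cpu 0's slot. Only b0 is copied, which is why this is safe only when
 * the slave loop needs no RSE, SP, or preserved-register state.
 * Returns 0 on success, -1 if no donor cpu is available.
 */
int borrow_b0_for_boot_cpu(struct rendez_state *state, size_t ncpus)
{
	for (size_t cpu = 1; cpu < ncpus; cpu++) {
		if (state[cpu].valid) {
			state[0].b0 = state[cpu].b0;
			return 0;
		}
	}
	return -1;
}
```

Nothing other than b0 is carried over, which is exactly the limitation
described above.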

This is not true for the current SN BIOS. The SN SAL slave loop consists
of multiple functions written in both C & assembly. Sending cpu 0 back to
the SAL slave loop requires that the general registers & RSE area be
"fixed" as well. This is not possible for the general case since the state
could contain data such as stack pointers.

Fortunately, the changes to recode the slave loop as pure assembly appear
to be minimal & we plan to make these changes. The only hard spot is that
the slave loop address is not guaranteed to be the same if cpu0 & cpu1
are on different nodes & the nodes are running different versions of the
BIOS. Mixed BIOS versions is not a configuration that customers run so we
can ignore this problem - at least for now.  (We should try to detect
mixed BIOS versions & disable sending the boot cpu back to the slave
loop). 
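
A rough sketch of the mixed-BIOS check might look like the following. The
`node_bios_info` record and both function names are hypothetical, and how
the per-node version string would actually be obtained from the PROM is
not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-node record holding the BIOS version string. */
struct node_bios_info {
	const char *version;
};

/* True only when every node reports the same BIOS version string. */
bool bios_versions_match(const struct node_bios_info *nodes, size_t nnodes)
{
	for (size_t i = 1; i < nnodes; i++) {
		if (strcmp(nodes[0].version, nodes[i].version) != 0)
			return false;
	}
	return true;
}

/*
 * Borrowing a slave loop address from another cpu is only trusted when
 * all nodes run the same BIOS; otherwise the feature is disabled.
 */
bool may_return_boot_cpu_to_sal(const struct node_bios_info *nodes,
				size_t nnodes)
{
	return nnodes > 0 && bios_versions_match(nodes, nnodes);
}
```

A stricter variant could compare only the nodes holding cpu0 and the donor
cpu, since that is the pair whose slave loop addresses must agree.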

Long term IA64 should implement an architected method for returning the
boot cpu to the SAL slave loop. Perhaps a new SAL call could do this.


For kexec, it is ESSENTIAL that all cpus except for the one doing
the kexec be returned to the SAL slave loop. If this is not done, our
chipset will misdirect IO interrupts on the newly exec'ed kernel.



-- jack


* Re: Sending cpu 0 back to SAL slave loop
  2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
@ 2006-10-06 20:44 ` Matthew Wilcox
  2006-10-07  2:16 ` Jack Steiner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2006-10-06 20:44 UTC (permalink / raw)
  To: linux-ia64

On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> For kexec, it is ESSENTIAL that all cpus except for the one doing
> the kexec be returned to the SAL slave loop. If this is not done, our
> chipset will misdirect IO interrupts on the newly exec'ed kernel.

Could you do an IPI call to have CPU 0 do the kexec and have the CPU
that sent the IPI fall into the SAL slave loop instead?


* Re: Sending cpu 0 back to SAL slave loop
  2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
  2006-10-06 20:44 ` Matthew Wilcox
@ 2006-10-07  2:16 ` Jack Steiner
  2006-10-08  5:40 ` Zou, Nanhai
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jack Steiner @ 2006-10-07  2:16 UTC (permalink / raw)
  To: linux-ia64

On Fri, Oct 06, 2006 at 02:44:43PM -0600, Matthew Wilcox wrote:
> On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> > For kexec, it is ESSENTIAL that all cpus except for the one doing
> > the kexec be returned to the SAL slave loop. If this is not done, our
> > chipset will misdirect IO interrupts on the newly exec'ed kernel.
> 
> Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> that sent the IPI fall into the SAL slave loop instead?

Yes, that seems ok, too. One end case that must be covered is the
case where 1) cpu >0 panics, and 2) cpu 0 is looping with interrupts
disabled - perhaps waiting on a lock held by the panic'ing cpu.
In this case, the panic'ing cpu must send an NMI interrupt to cpu 0
to cause it to do the kexec.  Not sure but I think this can be made to work.

Hmmmmm. Another case that is more difficult to handle is a failure
where an MCA has occurred & cpu 0 is in the SAL rendez slave loop.
Kexec will have to bring cpu 0 out of the rendez slave loop & cause it
to kexec the new kernel. Is this possible???


I like the NMI approach for the general case where you are kexec'ing 
a new kernel - not a crashdump kernel. We have some dependencies on 
the boot cpu not changing.

-- jack


* RE: Sending cpu 0 back to SAL slave loop
  2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
  2006-10-06 20:44 ` Matthew Wilcox
  2006-10-07  2:16 ` Jack Steiner
@ 2006-10-08  5:40 ` Zou, Nanhai
  2006-10-08  7:37 ` Keith Owens
  2006-10-09 15:09 ` Jack Steiner
  4 siblings, 0 replies; 6+ messages in thread
From: Zou, Nanhai @ 2006-10-08  5:40 UTC (permalink / raw)
  To: linux-ia64

> -----Original Message-----
> From: Jack Steiner [mailto:steiner@sgi.com]
> Sent: October 7, 2006 10:17
> To: Matthew Wilcox
> Cc: Zou, Nanhai; Jay Lan; Luck, Tony; Linux-IA64; fastboot; Eric W. Biederman
> Subject: Re: Sending cpu 0 back to SAL slave loop
> 
> On Fri, Oct 06, 2006 at 02:44:43PM -0600, Matthew Wilcox wrote:
> > On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> > > For kexec, it is ESSENTIAL that all cpus except for the one doing
> > > the kexec be returned to the SAL slave loop. If this is not done, our
> > > chipset will misdirect IO interrupts on the newly exec'ed kernel.
> >
> > Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> > that sent the IPI fall into the SAL slave loop instead?
> 
> Yes, that seems ok, too. One end case that must be covered is the
> case where 1) cpu >0 panics, and 2) cpu 0 is looping with interrupts
> disabled - perhaps waiting on a lock held by the panic'ing cpu.
> In this case, the panic'ing cpu must send an NMI interrupt to cpu 0
> to cause it to do the kexec.  Not sure but I think this can be made to work.
>

  This can be done.
  Send an IPI and wait for CPU0 to kexec; if CPU0 does not respond within a certain period of time, send an OS_INIT to force kexec on CPU0. However, I wonder whether we should implement this as a kernel parameter or make it the default behavior.
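
  As a rough illustration of that escalation, here is a toy user-space model. Everything in it is hypothetical: `cpu0_model`, `send_ipi`, `send_init` and `hand_kexec_to_cpu0` are stand-ins, and a bounded poll count takes the place of a real timeout.

```c
#include <stdbool.h>

/* Toy model of cpu 0 as seen by the cpu that wants to hand over kexec. */
struct cpu0_model {
	bool irqs_enabled;	/* false: spinning with interrupts disabled */
	bool responds_to_init;	/* false: e.g. the hardware is wedged */
	bool running_kexec;	/* set once cpu 0 has taken over */
};

enum handoff_result { HANDOFF_VIA_IPI, HANDOFF_VIA_INIT, HANDOFF_FAILED };

/* Stand-ins for the real IPI and INIT delivery paths. */
static void send_ipi(struct cpu0_model *c)
{
	if (c->irqs_enabled)
		c->running_kexec = true;
}

static void send_init(struct cpu0_model *c)
{
	if (c->responds_to_init)
		c->running_kexec = true;
}

/* Poll up to max_polls times for cpu 0 to acknowledge the handoff. */
static bool wait_for_cpu0(const struct cpu0_model *c, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (c->running_kexec)
			return true;
	}
	return false;
}

/* Try the IPI first; escalate to INIT only if cpu 0 stays silent. */
enum handoff_result hand_kexec_to_cpu0(struct cpu0_model *c,
				       unsigned int max_polls)
{
	send_ipi(c);
	if (wait_for_cpu0(c, max_polls))
		return HANDOFF_VIA_IPI;

	send_init(c);
	if (wait_for_cpu0(c, max_polls))
		return HANDOFF_VIA_INIT;

	return HANDOFF_FAILED;
}
```

  Whether the INIT step is always tried or is gated by a kernel parameter would only change the second half of `hand_kexec_to_cpu0`.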
 
> Hmmmmm. Another case that is more difficult to handle is a failure
> where an MCA has occurred & cpu 0 is in the SAL rendez slave loop.
> Kexec will have to bring cpu out of the rendez slave loop & cause it
> to kexec the new kernel. Is this possible???
> 
  Not sure. I can try it on Tiger, but I guess it depends on how SAL implements the rendezvous on each platform.
> 
> I like the NMI approach for the general case where you are kexec'ing
> a new kernel - not a crashdump kernel. We have some dependencies on
> the boot cpu not changing.
> 
  For non-crashdump kexec, I have sched_setaffinity code in kexec-tools->arch_process_options, so kexec -e will only run on CPU 0.
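
  For reference, pinning a process to CPU 0 with sched_setaffinity looks roughly like this. It is a minimal sketch, not the actual kexec-tools code; the helper names `single_cpu_mask` and `pin_self_to_cpu` are made up for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Build an affinity mask that contains only the given cpu. */
void single_cpu_mask(int cpu, cpu_set_t *set)
{
	CPU_ZERO(set);
	CPU_SET(cpu, set);
}

/*
 * Restrict the calling process (pid 0 means "self") to one cpu.
 * Returns 0 on success, -1 with errno set on failure, for example
 * when the cpu is offline or excluded by a cpuset.
 */
int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	single_cpu_mask(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}
```

  Calling pin_self_to_cpu(0) before the final reboot step is what keeps the kexec on the boot cpu; the return value should be checked, since the call can fail.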

Thanks
Zou Nan hai


* Re: Sending cpu 0 back to SAL slave loop
  2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
                   ` (2 preceding siblings ...)
  2006-10-08  5:40 ` Zou, Nanhai
@ 2006-10-08  7:37 ` Keith Owens
  2006-10-09 15:09 ` Jack Steiner
  4 siblings, 0 replies; 6+ messages in thread
From: Keith Owens @ 2006-10-08  7:37 UTC (permalink / raw)
  To: linux-ia64

Matthew Wilcox (on Fri, 6 Oct 2006 14:44:43 -0600) wrote:
>On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
>> For kexec, it is ESSENTIAL that all cpus except for the one doing
>> the kexec be returned to the SAL slave loop. If this is not done, our
>> chipset will misdirect IO interrupts on the newly exec'ed kernel.
>
>Could you do an IPI call to have CPU 0 do the kexec and have the CPU
>that sent the IPI fall into the SAL slave loop instead?

An IPI call will not work for MCA or INIT.  Both of those drive all
cpus into a state which has disabled interrupts.



* Re: Sending cpu 0 back to SAL slave loop
  2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
                   ` (3 preceding siblings ...)
  2006-10-08  7:37 ` Keith Owens
@ 2006-10-09 15:09 ` Jack Steiner
  4 siblings, 0 replies; 6+ messages in thread
From: Jack Steiner @ 2006-10-09 15:09 UTC (permalink / raw)
  To: linux-ia64

On Sun, Oct 08, 2006 at 05:37:04PM +1000, Keith Owens wrote:
> Matthew Wilcox (on Fri, 6 Oct 2006 14:44:43 -0600) wrote:
> >On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> >> For kexec, it is ESSENTIAL that all cpus except for the one doing
> >> the kexec be returned to the SAL slave loop. If this is not done, our
> >> chipset will misdirect IO interrupts on the newly exec'ed kernel.
> >
> >Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> >that sent the IPI fall into the SAL slave loop instead?
> 
> An IPI call will not work for MCA or INIT.  Both of those drive all
> cpus into a state which has disabled interrupts.

We could modify mca.c so that slave cpus are brought out of the mca spin
loop and sent back to the SAL slave loop.  Changes would be required to
the places in mca.c that spin on: 

	while (monarch_cpu != -1)
		cpu_relax();
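
As a user-space sketch of what such a change could look like, the spin
could also watch a request flag. This is not a patch against mca.c: the
flag `return_to_sal_requested`, the `slave_exit` enum and `slave_wait()`
are hypothetical, and `cpu_relax()` is stubbed out here.

```c
#include <stdatomic.h>

enum slave_exit { SLAVE_RESUME, SLAVE_RETURN_TO_SAL };

/* -1 means no MCA/INIT monarch is currently active. */
static atomic_int monarch_cpu = -1;

/* Hypothetical flag that the kexec'ing cpu would set. */
static atomic_bool return_to_sal_requested;

/* Stand-in for the kernel's cpu_relax() pause hint. */
static inline void cpu_relax(void)
{
}

/*
 * Spin while a monarch is active, but leave early when a return to
 * the SAL slave loop has been requested. The caller decides what to
 * do with the result.
 */
enum slave_exit slave_wait(void)
{
	while (atomic_load(&monarch_cpu) != -1) {
		if (atomic_load(&return_to_sal_requested))
			return SLAVE_RETURN_TO_SAL;
		cpu_relax();
	}
	return SLAVE_RESUME;
}
```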


However, I don't think I like this approach. We should try to drive all
cpus (except for the one doing the kexec) back to the SAL slave loop.
Unfortunately, this will not be successful in all cases - especially ones
where the system has experienced a hardware failure.

We need to modify our PROM so that we can detect a kexec of a new kernel.
At the time of the kexec, cpus not in the slave loop cannot be the target
for future interrupts. I think this can be done. 


-- jack


