* Re: Sending cpu 0 back to SAL slave loop
2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
@ 2006-10-06 20:44 ` Matthew Wilcox
2006-10-07 2:16 ` Jack Steiner
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2006-10-06 20:44 UTC (permalink / raw)
To: linux-ia64
On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> For kexec, it is ESSENTIAL that all cpus except for the one doing
> the kexec be returned to the SAL slave loop. If this is not done, our
> chipset will misdirect IO interrupts on the newly exec'ed kernel.
Could you do an IPI call to have CPU 0 do the kexec and have the CPU
that sent the IPI fall into the SAL slave loop instead?
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Sending cpu 0 back to SAL slave loop
2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
2006-10-06 20:44 ` Matthew Wilcox
@ 2006-10-07 2:16 ` Jack Steiner
2006-10-08 5:40 ` Zou, Nanhai
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Jack Steiner @ 2006-10-07 2:16 UTC (permalink / raw)
To: linux-ia64
On Fri, Oct 06, 2006 at 02:44:43PM -0600, Matthew Wilcox wrote:
> On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> > For kexec, it is ESSENTIAL that all cpus except for the one doing
> > the kexec be returned to the SAL slave loop. If this is not done, our
> > chipset will misdirect IO interrupts on the newly exec'ed kernel.
>
> Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> that sent the IPI fall into the SAL slave loop instead?
Yes, that seems ok, too. One end case that must be covered is the
case where 1) cpu >0 panics and 2) cpu 0 is looping with interrupts
disabled - perhaps waiting on a lock held by the panic'ing cpu.
In this case, the panic'ing cpu must send an NMI to cpu 0
to cause it to do the kexec. Not sure, but I think this can be made to work.
Hmmmmm. Another case that is more difficult to handle is a failure
where an MCA has occurred & cpu 0 is in the SAL rendezvous slave loop.
Kexec will have to bring cpu 0 out of the rendezvous slave loop & cause it
to kexec the new kernel. Is this possible???
I like the NMI approach for the general case where you are kexec'ing
a new kernel - not a crashdump kernel. We have some dependencies on
the boot cpu not changing.
-- jack
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Sending cpu 0 back to SAL slave loop
2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
2006-10-06 20:44 ` Matthew Wilcox
2006-10-07 2:16 ` Jack Steiner
@ 2006-10-08 5:40 ` Zou, Nanhai
2006-10-08 7:37 ` Keith Owens
2006-10-09 15:09 ` Jack Steiner
4 siblings, 0 replies; 6+ messages in thread
From: Zou, Nanhai @ 2006-10-08 5:40 UTC (permalink / raw)
To: linux-ia64
> -----Original Message-----
> From: Jack Steiner [mailto:steiner@sgi.com]
> Sent: October 7, 2006 10:17
> To: Matthew Wilcox
> Cc: Zou, Nanhai; Jay Lan; Luck, Tony; Linux-IA64; fastboot; Eric W. Biederman
> Subject: Re: Sending cpu 0 back to SAL slave loop
>
> On Fri, Oct 06, 2006 at 02:44:43PM -0600, Matthew Wilcox wrote:
> > On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> > > For kexec, it is ESSENTIAL that all cpus except for the one doing
> > > the kexec be returned to the SAL slave loop. If this is not done, our
> > > chipset will misdirect IO interrupts on the newly exec'ed kernel.
> >
> > Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> > that sent the IPI fall into the SAL slave loop instead?
>
> Yes, that seems ok, too. One endcase that must be covered is the
> case where 1) cpu >0 panics and, 2) cpu 0 is looping with interrupts
> disabled - perhaps waiting on a lock held by the panic'ing cpu.
> In this case, the panic'ing cpu must send an NMI interrupt to cpu 0
> to cause it to do the kexec. Not sure but I think this can be made to work.
>
This can be done.
Send an IPI and wait for CPU 0 to kexec. If CPU 0 does not respond within
a certain period of time, send an OS_INIT to force the kexec on CPU 0.
However, I wonder whether we should control this with a kernel parameter
or make it the default behavior.
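The escalation described above (try the maskable request first, fall back to the non-maskable one after a timeout) could be sketched roughly as follows. This is a user-space simulation under stated assumptions, not kernel code: `hand_kexec_to_cpu0`, `struct handoff_ops`, the `sim_*` helpers and the poll-count timeout are all hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome codes for the handoff; not a kernel API. */
enum handoff_result { HANDOFF_BY_IPI, HANDOFF_BY_INIT };

/* Hypothetical platform hooks supplied by the caller. */
struct handoff_ops {
	void (*send_ipi)(int cpu);       /* maskable request */
	void (*send_init)(int cpu);      /* non-maskable request */
	bool (*kexec_started)(int cpu);  /* has the target begun the kexec? */
};

/*
 * Ask cpu 0 to do the kexec. Poll up to max_polls times for it to
 * respond to the IPI, then escalate to INIT.
 */
enum handoff_result hand_kexec_to_cpu0(const struct handoff_ops *ops,
				       unsigned long max_polls)
{
	unsigned long i;

	assert(ops && ops->send_ipi && ops->send_init && ops->kexec_started);

	ops->send_ipi(0);
	for (i = 0; i < max_polls; i++)
		if (ops->kexec_started(0))
			return HANDOFF_BY_IPI;

	ops->send_init(0);
	return HANDOFF_BY_INIT;
}

/* Simulated cpu 0: it ignores IPIs while its interrupts are off. */
static bool sim_irqs_off;
static bool sim_started;

static void sim_send_ipi(int cpu)
{
	(void)cpu;
	if (!sim_irqs_off)
		sim_started = true;
}

static void sim_send_init(int cpu)
{
	(void)cpu;
	sim_started = true;  /* INIT is delivered even with interrupts off */
}

static bool sim_kexec_started(int cpu)
{
	(void)cpu;
	return sim_started;
}

/* Run one simulated handoff and report which mechanism was needed. */
enum handoff_result simulate_handoff(bool cpu0_irqs_off)
{
	const struct handoff_ops ops = {
		sim_send_ipi, sim_send_init, sim_kexec_started
	};

	sim_irqs_off = cpu0_irqs_off;
	sim_started = false;
	return hand_kexec_to_cpu0(&ops, 1000);
}
```

In this sketch, `simulate_handoff(false)` completes via the IPI, while `simulate_handoff(true)` models cpu 0 spinning with interrupts disabled and ends up using the INIT path. Whether the fallback should be on by default or gated by a kernel parameter is the open question raised above.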
> Hmmmmm. Another case that is more difficult to handle is a failure
> where an MCA has occurred & cpu 0 is in the SAL rendez slave loop.
> Kexec will have to bring cpu out of the rendez slave loop & cause it
> to kexec the new kernel. Is this possible???
>
Not sure. I can try it on Tiger, but I guess it depends on how SAL
implements the rendezvous on each platform.
>
> I like the NMI approach for the general case where you are kexec'ing
> a new kernel - not a crashdump kernel. We have some dependencies on
> the boot cpu not changing.
>
For non-crashdump kexec, I have sched_setaffinity code in
kexec-tools->arch_process_options, so kexec -e will only run on CPU 0.
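For reference, pinning the calling process to one cpu from user space with sched_setaffinity looks roughly like this on Linux with glibc. This is a minimal sketch, not the actual kexec-tools code; the helper name `pin_to_cpu` is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* needed for cpu_set_t, CPU_SET and sched_setaffinity */
#endif
#include <assert.h>
#include <sched.h>

/*
 * Restrict the calling thread to a single cpu.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity).
 */
int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);  /* pid 0 = caller */
}
```

Calling `pin_to_cpu(0)` before starting the kexec would keep the caller on CPU 0, provided CPU 0 is online and permitted for the process; the call can fail, so the return value should be checked.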
Thanks
Zou Nan hai
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Sending cpu 0 back to SAL slave loop
2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
` (2 preceding siblings ...)
2006-10-08 5:40 ` Zou, Nanhai
@ 2006-10-08 7:37 ` Keith Owens
2006-10-09 15:09 ` Jack Steiner
4 siblings, 0 replies; 6+ messages in thread
From: Keith Owens @ 2006-10-08 7:37 UTC (permalink / raw)
To: linux-ia64
Matthew Wilcox (on Fri, 6 Oct 2006 14:44:43 -0600) wrote:
>On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
>> For kexec, it is ESSENTIAL that all cpus except for the one doing
>> the kexec be returned to the SAL slave loop. If this is not done, our
>> chipset will misdirect IO interrupts on the newly exec'ed kernel.
>
>Could you do an IPI call to have CPU 0 do the kexec and have the CPU
>that sent the IPI fall into the SAL slave loop instead?
An IPI call will not work for MCA or INIT. Both of those drive all
cpus into a state which has disabled interrupts.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Sending cpu 0 back to SAL slave loop
2006-10-06 20:39 Sending cpu 0 back to SAL slave loop Jack Steiner
` (3 preceding siblings ...)
2006-10-08 7:37 ` Keith Owens
@ 2006-10-09 15:09 ` Jack Steiner
4 siblings, 0 replies; 6+ messages in thread
From: Jack Steiner @ 2006-10-09 15:09 UTC (permalink / raw)
To: linux-ia64
On Sun, Oct 08, 2006 at 05:37:04PM +1000, Keith Owens wrote:
> Matthew Wilcox (on Fri, 6 Oct 2006 14:44:43 -0600) wrote:
> >On Fri, Oct 06, 2006 at 03:39:10PM -0500, Jack Steiner wrote:
> >> For kexec, it is ESSENTIAL that all cpus except for the one doing
> >> the kexec be returned to the SAL slave loop. If this is not done, our
> >> chipset will misdirect IO interrupts on the newly exec'ed kernel.
> >
> >Could you do an IPI call to have CPU 0 do the kexec and have the CPU
> >that sent the IPI fall into the SAL slave loop instead?
>
> An IPI call will not work for MCA or INIT. Both of those drive all
> cpus into a state which has disabled interrupts.
We could modify mca.c so that slave cpus are brought out of the mca spin
loop and sent back to the SAL slave loop. Changes would be required to
the places in mca.c that spin on:
	while (monarch_cpu != -1)
		cpu_relax();
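To illustrate the kind of change being discussed, here is a user-space sketch of such a spin loop with an extra exit condition. It is not the real mca.c code: `kexec_in_progress`, `enum slave_exit`, `mca_monarch_enter` and `mca_slave_wait` are hypothetical names, and `cpu_relax()` is a stand-in for the kernel primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Mirrors the variable named in the loop above; -1 means no monarch. */
static atomic_int monarch_cpu = -1;

/* Hypothetical flag set by the cpu that is about to kexec. */
static atomic_bool kexec_in_progress;

/* Hypothetical reasons for a slave cpu to leave the spin loop. */
enum slave_exit {
	SLAVE_RESUME,  /* monarch finished; continue normal MCA exit */
	SLAVE_TO_SAL,  /* kexec pending; return to the SAL slave loop */
};

/* Stand-in for the kernel's cpu_relax(); does nothing here. */
static inline void cpu_relax(void)
{
}

/* Record which cpu is the monarch for the current event. */
void mca_monarch_enter(int cpu)
{
	assert(cpu >= 0);
	atomic_store(&monarch_cpu, cpu);
}

/* Slave side: spin until the monarch is done or a kexec is requested. */
enum slave_exit mca_slave_wait(void)
{
	while (atomic_load(&monarch_cpu) != -1) {
		if (atomic_load(&kexec_in_progress))
			return SLAVE_TO_SAL;
		cpu_relax();
	}
	return SLAVE_RESUME;
}
```

A caller receiving `SLAVE_TO_SAL` would then hand the cpu back to SAL instead of resuming. As noted below, this cannot be relied on when the hardware itself has failed.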
However, I don't think I like this approach. We should try to drive all
cpus (except for the one doing the kexec) back to the SAL slave loop.
Unfortunately, this will not be successful in all cases - especially ones
where the system has experienced a hardware failure.
We need to modify our PROM so that we can detect a kexec of a new kernel.
At the time of the kexec, cpus not in the slave loop cannot be the target
for future interrupts. I think this can be done.
-- jack
^ permalink raw reply [flat|nested] 6+ messages in thread