linuxppc-dev.lists.ozlabs.org archive mirror
* pseries softreset on cpus in 32bit mode
@ 2006-05-22 16:41 Olaf Hering
  2006-05-22 18:46 ` Olaf Hering
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Olaf Hering @ 2006-05-22 16:41 UTC (permalink / raw)
  To: linuxppc-dev


Consider a simple app like this, which is placed as '/init' in an initrd
cpio archive:

hello32.c

#include <stdio.h>
int main(void) {
        printf("foobar\n");
        asm("li 31,0; b .\n");
        return 0;
}

It will keep one cpu busy, and in 32bit mode. If a soft-reset is
triggered, this cpu remains in 32bit mode (I think) when
system_reset_fwnmi() is invoked. Then bad_stack is called via
STD_EXCEPTION_COMMON() and EXCEPTION_PROLOG_COMMON() because the 32bit
stack pointer is > 0 and the cpu was in usermode. Finally panic is
called, which doesn't make much sense in this context.
machine_check_fwnmi likely has the same issue.
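
As a rough illustration of the check that sends this case to bad_stack:
EXCEPTION_PROLOG_COMMON does a signed 64bit compare of r1 against 0
(cmpdi cr1,r1,0; bge- cr1,bad_stack). Kernel addresses such as
c000000007a5ed40 have the top bit set and compare as negative; a 32bit
value such as ffa57ac0 does not. The C sketch below models only that final
compare. The function names are made up for illustration and are not
kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of "cmpdi cr1,r1,0; bge- cr1,bad_stack".
 * A signed 64bit value is >= 0 exactly when its top bit is clear,
 * so the branch to bad_stack is taken when bit 63 of r1 is 0.
 */
bool sketch_stack_check_fails(uint64_t r1)
{
    return (r1 >> 63) == 0;
}

/* Values taken from the oops in this thread. */
void sketch_stack_demo(void)
{
    /* 32bit user stack pointer: top bit clear, goes to bad_stack */
    assert(sketch_stack_check_fails(0x00000000ffa57ac0ULL));
    /* kernel address (REGS value from the oops): top bit set, passes */
    assert(!sketch_stack_check_fails(0xc000000007a5ed40ULL));
}
```

Calling sketch_stack_demo() simply confirms the two cases; it says nothing
about how r1 came to hold the user value in the first place.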

One bug is that something trashes regs->nip, it gets 0x3200 or similar.

I'm not really sure what is supposed to happen. Clearly a softreset
should not panic with bad stack pointer.

This is on a JS20, but a large p550 dies the same way.

Linux version 2.6.17-rc4-g353b28ba (olaf@pomegranate) (gcc version 4.1.0 (SUSE Linux)) #2 SMP Mon May 22 18:37:06 CEST 2006
[boot]0012 Setup Arch
Top of RAM: 0x1e000000, Total RAM: 0x1e000000
Memory hole size: 0MB
PPC64 nvram contains 16384 bytes
Using default idle loop
[boot]0015 Setup Done
Built 1 zonelists
Kernel command line:  root=/dev/hda2  xmon=on quiet panic=1
foobar
Bad kernel stack pointer ffa57ac0 at 3200
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=128 NUMA
Modules linked in:
NIP: 0000000000003200 LR: 0000000010000338 CTR: 0000000000032DDC
REGS: c000000007a5ed40 TRAP: c000000007a5ef10   Not tainted  (2.6.17-rc4-g353b28ba)
MSR: 0000000040001032 <ME,IR,DR>  CR: 20000042  XER: 200FFFFF
TASK = c00000001dfdb7e0[1] 'init' THREAD: c00000000ffcc000 CPU: 1
GPR00: 0000000010000338 00000000FFA57AC0 000000001009B470 0000000007ACEFF8
GPR04: 000000001002487C 0000000040000042 0000000000004000 000000001000B0E0
GPR08: 000000000000F932 0000000000000000 0000000000000000 0000000000000000
GPR12: 00000000200FFFFF C00000000052D100 C000000000442820 4000000002010000
GPR16: C000000000440ED8 0000000000000000 00000000000413DB 00000000004FA998
GPR20: 000000000250AC08 00000000004FAC08 000000000183FE00 00000000004420C0
GPR24: 000000000052CF00 0000000010000C70 0000000010000BF0 0000000000000000
GPR28: 0000000000000000 0000000010090000 00000000005123D8 0000000000000000
NIP [0000000000003200] 0x3200
LR [0000000010000338] 0x10000338
Call Trace:
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX


* Re: pseries softreset on cpus in 32bit mode
  2006-05-22 16:41 pseries softreset on cpus in 32bit mode Olaf Hering
@ 2006-05-22 18:46 ` Olaf Hering
  2006-05-23 13:07 ` [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware Olaf Hering
  2006-07-19  8:34 ` [PATCH] force 64bit mode in fwnmi handlers to workaround firmware bugs Olaf Hering
  2 siblings, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2006-05-22 18:46 UTC (permalink / raw)
  To: linuxppc-dev

 On Mon, May 22, Olaf Hering wrote:

> 
> Consider a simple app like this, which is placed as '/init' in an initrd
> cpio archive:
> 
> hello32.c
> 
> #include <stdio.h>
> int main(void) {
>         printf("foobar\n");
>         asm("li 31,0; b .\n");
>         return 0;
> }

Modified the userland app to init some registers with a fixed value, and
ran a kernel with the debug patch below. It gets into bad_stack from
decrementer_common = c000000000003400. 3200 is coming from 
c000000000003200 <system_reset_common>:
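
One possible reading of the 0x3200 value, assuming the saved address loses
its upper 32 bits while the cpu is in 32bit mode: the low half of
c000000000003200 is exactly 3200. The C sketch below only shows that
arithmetic; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64bit address (assumed 32bit-mode view). */
uint64_t sketch_truncate_to_32bit(uint64_t addr)
{
    return addr & 0xffffffffULL;
}

void sketch_nip_demo(void)
{
    /* system_reset_common, as listed above */
    assert(sketch_truncate_to_32bit(0xc000000000003200ULL) == 0x3200ULL);
    /* decrementer_common, for comparison */
    assert(sketch_truncate_to_32bit(0xc000000000003400ULL) == 0x3400ULL);
}
```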

So what does that mean? Should a softreset disable interrupts?

Linux version 2.6.17-rc4-g353b28ba-dirty (olaf@pomegranate) (gcc version 4.1.0 (SUSE Linux)) #13 SMP Mon May 22 20:38:56 CEST 2006
[boot]0012 Setup Arch
Top of RAM: 0x1e000000, Total RAM: 0x1e000000
Memory hole size: 0MB
PPC64 nvram contains 16384 bytes
Using default idle loop
[boot]0015 Setup Done
Built 1 zonelists
Kernel command line:  root=/dev/hda2  xmon=on quiet
foobar
Bad kernel stack pointer ffad3ad0 at 3200
cpu 0x1: Vector: c000000007a5ef10  at [c000000007a5ed40]
    pc: 0000000000003200
    lr: 0000000010000338
    sp: ffad3ad0
   msr: 40001032
  current = 0xc00000000ffc67e0
  paca    = 0xc00000000053a100
    pid   = 1, comm = init
enter ? for help
1:mon> r
R00 = 0000000010000338   R16 = c0000000004470e8
R01 = 00000000ffad3ad0   R17 = 0000000000000000
R02 = 000000001009c470   R18 = 00000000000413cd
R03 = 0000000007aceff8   R19 = 0000000000507ab8
R04 = 000000001002489c   R20 = 0000000000000042
R05 = 0000000040000042   R21 = 0000000000000042
R06 = 0000000000004000   R22 = 0000000000000042
R07 = 000000001000b100   R23 = 0000000000000042
R08 = 000000000000f932   R24 = 0000000000000042
R09 = 0000000000000000   R25 = 0000000000000042
R10 = 0000000000000000   R26 = 0000000000000042
R11 = 0000000000000000   R27 = 0000000000000042
R12 = 00000000200fffff   R28 = 0000000000000042
R13 = c00000000053a100   R29 = 0000000000003420
R14 = c000000000448a38   R30 = 0000000000003200
R15 = 4000000002010000   R31 = 0000000010000368
pc  = 0000000000003200
lr  = 0000000010000338
msr = 0000000040001032   cr  = 20000042
ctr = 0000000000032ddc   xer = 00000000200fffff   trap = c000000007a5ef10
1:mon> 


Index: linux-2.6/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6/arch/powerpc/kernel/head_64.S
@@ -269,7 +269,12 @@ exception_marker:
        subi    r1,r1,INT_FRAME_SIZE;   /* alloc frame on kernel stack  */ \
        beq-    1f;                                                        \
        ld      r1,PACAKSAVE(r13);      /* kernel stack to use          */ \
-1:     cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
+1:                                     \
+       cmpdi   cr1,r29,0x42;           \
+       bne     cr1,2f;                 \
+       li      r29,2f@l;               \
+2:                                     \
+       cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
        bge-    cr1,bad_stack;          /* abort if it is               */ \
        std     r9,_CCR(r1);            /* save CR in stackframe        */ \
        std     r11,_NIP(r1);           /* save SRR0 in stackframe      */ \
@@ -600,6 +605,7 @@ slb_miss_user_pseries:
 system_reset_fwnmi:
        HMT_MEDIUM
        mtspr   SPRN_SPRG1,r13          /* save r13 */
+       mfspr   r31,SPRN_SRR0
        EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common)
 
        .globl machine_check_fwnmi
@@ -842,6 +848,7 @@ bad_stack:
        std     r9,_CCR(r1)
        std     r10,GPR1(r1)
        std     r11,_NIP(r1)
+       mr      r30,r11
        std     r12,_MSR(r1)
        mfspr   r11,SPRN_DAR
        mfspr   r12,SPRN_DSISR


* [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-22 16:41 pseries softreset on cpus in 32bit mode Olaf Hering
  2006-05-22 18:46 ` Olaf Hering
@ 2006-05-23 13:07 ` Olaf Hering
  2006-05-26 11:30   ` Paul Mackerras
  2006-06-09  8:11   ` Paul Mackerras
  2006-07-19  8:34 ` [PATCH] force 64bit mode in fwnmi handlers to workaround firmware bugs Olaf Hering
  2 siblings, 2 replies; 11+ messages in thread
From: Olaf Hering @ 2006-05-23 13:07 UTC (permalink / raw)
  To: linuxppc-dev

 On Mon, May 22, Olaf Hering wrote:

> NIP [0000000000003200] 0x3200
> LR [0000000010000338] 0x10000338

The reason is that system_reset_fwnmi is called in 32bit mode. Forcing
64bit mode fixes the corrupt NIP for me. JS20 and p690 are affected,
seems to work on p550 and JS21.

According to this debug change for EXCEPTION_PROLOG_COMMON, I still get into
decrementer_common, but it's not fatal anymore because the cpu is now in
64bit mode and the stack is forced to PACAKSAVE(r13).

        subi    r1,r1,INT_FRAME_SIZE;   /* alloc frame on kernel stack  */ \
        beq-    1f;                                                        \
        ld      r1,PACAKSAVE(r13);      /* kernel stack to use          */ \
-1:     cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
+1:                                     \
+       cmpdi   cr1,r29,0x42;           \
+       bne     cr1,2f;                 \
+       li      r29,2f@l;               \
+2:                                     \
+       cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
        bge-    cr1,bad_stack;          /* abort if it is               */ \
        std     r9,_CCR(r1);            /* save CR in stackframe        */ \
        std     r11,_NIP(r1);           /* save SRR0 in stackframe      */ \




Index: linux-2.6/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6/arch/powerpc/kernel/head_64.S
@@ -211,6 +211,29 @@ exception_marker:
 	ori	reg,reg,(label)@l;	/* virt addr of handler ... */
 #endif
 
+#define EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(area, label)				\
+	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
+	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
+	std	r10,area+EX_R10(r13);					\
+	std	r11,area+EX_R11(r13);					\
+	std	r12,area+EX_R12(r13);					\
+	mfspr	r9,SPRN_SPRG1;						\
+	std	r9,area+EX_R13(r13);					\
+	mfcr	r9;							\
+	clrrdi	r12,r13,32;		/* get high part of &label */	\
+	mfmsr	r10;							\
+	li	r11,5;			/* MSR_SF_LG|MSR_ISF_LG */	\
+	rldicr	r11,r11,61,2;		/* (5 << 61) */	\
+	or	r10,r10,r11;						\
+	mfspr	r11,SPRN_SRR0;		/* save SRR0 */			\
+	LOAD_HANDLER(r12,label)						\
+	ori	r10,r10,MSR_IR|MSR_DR|MSR_RI;				\
+	mtspr	SPRN_SRR0,r12;						\
+	mfspr	r12,SPRN_SRR1;		/* and SRR1 */			\
+	mtspr	SPRN_SRR1,r10;						\
+	rfid;								\
+	b	.	/* prevent speculative execution */
+
 #define EXCEPTION_PROLOG_PSERIES(area, label)				\
 	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
 	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
@@ -600,14 +623,14 @@ slb_miss_user_pseries:
 system_reset_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common)
+	EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(PACA_EXGEN, system_reset_common)
 
 	.globl machine_check_fwnmi
       .align 7
 machine_check_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
+	EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(PACA_EXMC, machine_check_common)
 
 #ifdef CONFIG_PPC_ISERIES
 /***  ISeries-LPAR interrupt handlers ***/


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-23 13:07 ` [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware Olaf Hering
@ 2006-05-26 11:30   ` Paul Mackerras
  2006-05-26 12:33     ` Olaf Hering
  2006-05-27 11:34     ` Olaf Hering
  2006-06-09  8:11   ` Paul Mackerras
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Mackerras @ 2006-05-26 11:30 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev

Olaf Hering writes:

> According to this debug change for EXCEPTION_PROLOG_COMMON, I still get into
> decrementer_common, but it's not fatal anymore because the cpu is now in
> 64bit mode and the stack is forced to PACAKSAVE(r13).
> 
>         subi    r1,r1,INT_FRAME_SIZE;   /* alloc frame on kernel stack  */ \
>         beq-    1f;                                                        \
>         ld      r1,PACAKSAVE(r13);      /* kernel stack to use          */ \
> -1:     cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
> +1:                                     \
> +       cmpdi   cr1,r29,0x42;           \

Ummm, what's r29 supposed to have in it here?

> +       bne     cr1,2f;                 \
> +       li      r29,2f@l;               \

And why are we setting it?

Does it look like the SRR0 and SRR1 values are correct when we get
this problem occurring?  Is it just the MSR that is bogus?

Paul.


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-26 11:30   ` Paul Mackerras
@ 2006-05-26 12:33     ` Olaf Hering
  2006-05-26 12:40       ` Paul Mackerras
  2006-05-27 11:34     ` Olaf Hering
  1 sibling, 1 reply; 11+ messages in thread
From: Olaf Hering @ 2006-05-26 12:33 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

 On Fri, May 26, Paul Mackerras wrote:

> Olaf Hering writes:
> 
> > According to this debug change for EXCEPTION_PROLOG_COMMON, I still get into
> > decrementer_common, but it's not fatal anymore because the cpu is now in
> > 64bit mode and the stack is forced to PACAKSAVE(r13).
> > 
> >         subi    r1,r1,INT_FRAME_SIZE;   /* alloc frame on kernel stack  */ \
> >         beq-    1f;                                                        \
> >         ld      r1,PACAKSAVE(r13);      /* kernel stack to use          */ \
> > -1:     cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
> > +1:                                     \
> > +       cmpdi   cr1,r29,0x42;           \
> 
> Ummm, what's r29 supposed to have in it here?

0x42, from the spinning hello32 app. I filled all regs from r14 to r31
with a fixed value, and used these regs for debugging in the reset
handler.

> > +       bne     cr1,2f;                 \
> > +       li      r29,2f@l;               \
> 
> And why are we setting it?

Just a debug thing to find out how I got into bad_stack. I expected that
system_reset_common calls bad_stack, but it was decrementer_common. No
idea how that happened: MSR_EE is off and the timer interrupt has the
lowest priority, so it should not trigger.


> Does it look like the SRR0 and SRR1 values are correct when we get
> this problem occurring?  Is it just the MSR that is bogus?

The MSR indicates 32bit mode, it is 0x1002 on entry and 0x1032 before
rfid. I haven't dumped the SRR1 content; SRR0 points to the hello32 'b .'
instruction.
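
To make the two MSR values easier to read, here is a small C sketch that
names the bits involved. The bit positions are the ones I understand the
kernel's MSR_*_LG definitions to use (SF=63, ISF=61, EE=15, PR=14, ME=12,
IR=5, DR=4, RI=1); the SK_ names and functions are invented for this
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MSR_SF  (1ULL << 63)  /* 64bit mode */
#define SK_MSR_ISF (1ULL << 61)  /* 64bit mode for interrupts */
#define SK_MSR_EE  (1ULL << 15)  /* external interrupts enabled */
#define SK_MSR_PR  (1ULL << 14)  /* problem state (usermode) */
#define SK_MSR_ME  (1ULL << 12)  /* machine check enable */
#define SK_MSR_IR  (1ULL << 5)   /* instruction relocation */
#define SK_MSR_DR  (1ULL << 4)   /* data relocation */
#define SK_MSR_RI  (1ULL << 1)   /* recoverable interrupt */

struct sk_msr_bit { uint64_t mask; const char *name; };

static const struct sk_msr_bit sk_msr_bits[] = {
    { SK_MSR_SF, "SF" }, { SK_MSR_ISF, "ISF" }, { SK_MSR_EE, "EE" },
    { SK_MSR_PR, "PR" }, { SK_MSR_ME, "ME" },   { SK_MSR_IR, "IR" },
    { SK_MSR_DR, "DR" }, { SK_MSR_RI, "RI" },
};

int sketch_msr_is_64bit(uint64_t msr)
{
    return (msr & SK_MSR_SF) != 0;
}

/* Write the names of the known bits set in msr into buf, space separated. */
void sketch_msr_flags(uint64_t msr, char *buf, size_t len)
{
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(sk_msr_bits) / sizeof(sk_msr_bits[0]); i++) {
        if (!(msr & sk_msr_bits[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", sk_msr_bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* buffer full, output is truncated */
        used += (size_t)n;
    }
}

void sketch_msr_demo(void)
{
    char buf[64];

    sketch_msr_flags(0x1002ULL, buf, sizeof(buf));   /* on entry */
    assert(strcmp(buf, "ME RI") == 0);
    sketch_msr_flags(0x1032ULL, buf, sizeof(buf));   /* before rfid */
    assert(strcmp(buf, "ME IR DR RI") == 0);
    assert(!sketch_msr_is_64bit(0x1002ULL));         /* SF clear: 32bit mode */
}
```

Neither value has SF or EE set, which matches the statements above that the
cpu is in 32bit mode and that MSR_EE is off.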


It seems the JS20 and POWER4 firmware calls fwnmi on all cpus, while
JS21 (and probably POWER5) does it only on one cpu. The other cpus will
be stopped with an IPI, trap 501.
According to some firmware guys, the OS is supposed to force 64bit mode
on reset.


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-26 12:33     ` Olaf Hering
@ 2006-05-26 12:40       ` Paul Mackerras
  2006-05-26 12:48         ` Olaf Hering
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2006-05-26 12:40 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev

Olaf Hering writes:

> According to some firmware guys, the OS is supposed to force 64bit mode
> on reset.

Who?  I'll get a platform architect to drop on them from a great
height. :)

Paul.


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-26 12:40       ` Paul Mackerras
@ 2006-05-26 12:48         ` Olaf Hering
  0 siblings, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2006-05-26 12:48 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

 On Fri, May 26, Paul Mackerras wrote:

> Olaf Hering writes:
> 
> > According to some firmware guys, the OS is supposed to force 64bit mode
> > on reset.
> 
> Who?  I'll get a platform architect to drop on them from a great
> height. :)

LTC22581 for the full story.


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-26 11:30   ` Paul Mackerras
  2006-05-26 12:33     ` Olaf Hering
@ 2006-05-27 11:34     ` Olaf Hering
  1 sibling, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2006-05-27 11:34 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

 On Fri, May 26, Paul Mackerras wrote:

> Olaf Hering writes:
> 
> > According to this debug change for EXCEPTION_PROLOG_COMMON, I still get into
> > decrementer_common, but it's not fatal anymore because the cpu is now in
> > 64bit mode and the stack is forced to PACAKSAVE(r13).
> > 
> >         subi    r1,r1,INT_FRAME_SIZE;   /* alloc frame on kernel stack  */ \
> >         beq-    1f;                                                        \
> >         ld      r1,PACAKSAVE(r13);      /* kernel stack to use          */ \
> > -1:     cmpdi   cr1,r1,0;               /* check if r1 is in userspace  */ \
> > +1:                                     \
> > +       cmpdi   cr1,r29,0x42;           \
> 
> Ummm, what's r29 supposed to have in it here?
> 
> > +       bne     cr1,2f;                 \
> > +       li      r29,2f@l;               \
> 
> And why are we setting it?
> 
> Does it look like the SRR0 and SRR1 values are correct when we get
> this problem occurring?  Is it just the MSR that is bogus?

I looked into this again; my debug patch was wrong. I can't rely on
hello32 to set r29, it has to be set in the system_reset path. This is the
register dump. It shows that the cpus are in 32bit mode, and that
system_reset_fwnmi calls system_reset_common correctly.


0:mon> r
R00 = 000000001000036c   R16 = 0000000000000042
R01 = 00000000ffe96ad0   R17 = 0000000000000042
R02 = 000000001009b470   R18 = 0000000000000042
R03 = 0000000000000009   R19 = 0000000000000042
R04 = 000000001002228c   R20 = 0000000000000042
R05 = 0000000040042082   R21 = 0000000000000042
R06 = 0000000000004000   R22 = 0000000000000042
R07 = 0000000010008af0   R23 = 0000000000000042
R08 = 0000000000000000   R24 = 0000000000000042
R09 = 0000000000000000   R25 = 0000000000000042
R10 = 8000000000001032   R26 = 0000000000000042
R11 = 00000000ffe96a50   R27 = 0000000000000042
R12 = 0000000020000082   R28 = 0000000000000042
R13 = 000000001009a410   R29 = 0000000000003220
R14 = 0000000000000042   R30 = 0000000000001002
R15 = 0000000000000042   R31 = a000000000001032
pc  = 00000000100003b4
lr  = 000000001000036c
msr = 000000000000d032   cr  = 20000082
ctr = 0000000000032ddc   xer = 00000000000fffff   trap =  100
0:mon> c1
1:mon> r
R00 = 000000001000036c   R16 = 0000000000000042
R01 = 00000000ffe96ad0   R17 = 0000000000000042
R02 = 000000001009b470   R18 = 0000000000000042
R03 = 0000000000000007   R19 = 0000000000000042
R04 = 000000001002228c   R20 = 0000000000000042
R05 = 0000000040022082   R21 = 0000000000000042
R06 = 0000000000004000   R22 = 0000000000000042
R07 = 0000000010008af0   R23 = 0000000000000042
R08 = 0000000000000000   R24 = 0000000000000042
R09 = 0000000000000000   R25 = 0000000000000042
R10 = 8000000000001032   R26 = 0000000000000042
R11 = 00000000ffe96a50   R27 = 0000000000000042
R12 = 0000000020000082   R28 = 0000000000000042
R13 = 000000001009a410   R29 = 0000000000003220
R14 = 0000000000000042   R30 = 0000000000001002
R15 = 0000000000000042   R31 = a000000000001032
pc  = 00000000100003b4
lr  = 000000001000036c
msr = 000000000000d032   cr  = 20000082
ctr = 0000000000032ddc   xer = 00000000000fffff   trap =  100




Index: linux-2.6/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6/arch/powerpc/kernel/head_64.S
@@ -211,6 +211,31 @@ exception_marker:
 	ori	reg,reg,(label)@l;	/* virt addr of handler ... */
 #endif
 
+#define EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(area, label)				\
+	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
+	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
+	std	r10,area+EX_R10(r13);					\
+	std	r11,area+EX_R11(r13);					\
+	std	r12,area+EX_R12(r13);					\
+	mfspr	r9,SPRN_SPRG1;						\
+	std	r9,area+EX_R13(r13);					\
+	mfcr	r9;							\
+	clrrdi	r12,r13,32;		/* get high part of &label */	\
+	mfmsr	r10;							\
+	mr	r30,r10;						\
+	li	r11,5;			/* MSR_SF_LG|MSR_ISF_LG */	\
+	rldicr	r11,r11,61,2;		/* (5 << 61) */	\
+	or	r10,r10,r11;						\
+	mfspr	r11,SPRN_SRR0;		/* save SRR0 */			\
+	LOAD_HANDLER(r12,label)						\
+	ori	r10,r10,MSR_IR|MSR_DR|MSR_RI;				\
+	mtspr	SPRN_SRR0,r12;						\
+	mfspr	r12,SPRN_SRR1;		/* and SRR1 */			\
+	mr	r31,r10;						\
+	mtspr	SPRN_SRR1,r10;						\
+	rfid;								\
+	b	.	/* prevent speculative execution */
+
 #define EXCEPTION_PROLOG_PSERIES(area, label)				\
 	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
 	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
@@ -269,7 +294,12 @@ exception_marker:
 	subi	r1,r1,INT_FRAME_SIZE;	/* alloc frame on kernel stack	*/ \
 	beq-	1f;							   \
 	ld	r1,PACAKSAVE(r13);	/* kernel stack to use		*/ \
-1:	cmpdi	cr1,r1,0;		/* check if r1 is in userspace	*/ \
+1:				\
+	cmpdi   cr1,r29,0x42;	\
+	bne     cr1,2f;		\
+	li      r29,2f@l;	\
+2: ;				\
+	cmpdi	cr1,r1,0;		/* check if r1 is in userspace	*/ \
 	bge-	cr1,bad_stack;		/* abort if it is		*/ \
 	std	r9,_CCR(r1);		/* save CR in stackframe	*/ \
 	std	r11,_NIP(r1);		/* save SRR0 in stackframe	*/ \
@@ -600,14 +630,15 @@ slb_miss_user_pseries:
 system_reset_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common)
+	li	r29,0x42
+	EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(PACA_EXGEN, system_reset_common)
 
 	.globl machine_check_fwnmi
       .align 7
 machine_check_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
+	EXCEPTION_PROLOG_PSERIES_BROKEN_POWER4_FIRMWARE(PACA_EXMC, machine_check_common)
 
 #ifdef CONFIG_PPC_ISERIES
 /***  ISeries-LPAR interrupt handlers ***/


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-05-23 13:07 ` [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware Olaf Hering
  2006-05-26 11:30   ` Paul Mackerras
@ 2006-06-09  8:11   ` Paul Mackerras
  2006-06-09  9:04     ` Olaf Hering
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2006-06-09  8:11 UTC (permalink / raw)
  To: Olaf Hering; +Cc: linuxppc-dev

Olaf Hering writes:

> The reason is that system_reset_fwnmi is called in 32bit mode. Forcing
> 64bit mode fixes the corrupt NIP for me. JS20 and p690 are affected,
> seems to work on p550 and JS21.

What was the LTC bugzilla number for this again?

Paul.


* Re: [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware
  2006-06-09  8:11   ` Paul Mackerras
@ 2006-06-09  9:04     ` Olaf Hering
  0 siblings, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2006-06-09  9:04 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev

 On Fri, Jun 09, Paul Mackerras wrote:

> Olaf Hering writes:
> 
> > The reason is that system_reset_fwnmi is called in 32bit mode. Forcing
> > 64bit mode fixes the corrupt NIP for me. JS20 and p690 are affected,
> > seems to work on p550 and JS21.
> 
> What was the LTC bugzilla number for this again?

LTC22581, the last comments confirm that firmware leaves the cpu in
32bit mode.


* [PATCH] force 64bit mode in fwnmi handlers to workaround firmware bugs
  2006-05-22 16:41 pseries softreset on cpus in 32bit mode Olaf Hering
  2006-05-22 18:46 ` Olaf Hering
  2006-05-23 13:07 ` [PATCH] force 64bit mode in system_reset_fwnmi for broken POWER4 firmware Olaf Hering
@ 2006-07-19  8:34 ` Olaf Hering
  2 siblings, 0 replies; 11+ messages in thread
From: Olaf Hering @ 2006-07-19  8:34 UTC (permalink / raw)
  To: linuxppc-dev, Paul Mackerras


The firmware of POWER4 and JS20 systems does not switch the cpu to 64bit
mode when the registered system_reset and machine_check handlers get called.
If a 32bit process runs on that cpu at the time of the event, the cpu
remains in 32bit mode. xmon and kdump cannot deal with that; the result is
an error like 'Bad kernel stack pointer fff2aad0 at 3200'.
xmon just loses some register info, but booting the kdump kernel usually fails.

Neither handler is a hot path.  Duplicate the EXCEPTION_PROLOG_PSERIES macro
and add two instructions to switch to 64bit:

 li     r11,5; 
 rldimi r10,r11,61,0; 
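
For readers less familiar with rldimi, the following C sketch models what
the two instructions do to the MSR value in r10, assuming the usual
definition of rldimi (rotate the source left by SH, then insert it under a
mask covering IBM bits MB through 63-SH). With SH=61 and MB=0 the mask is
the top three bits, so the value 5 (binary 101) sets SF and ISF and clears
the bit between them. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64bit value left by sh bits (0..63). */
uint64_t sketch_rotl64(uint64_t x, unsigned sh)
{
    assert(sh < 64);
    return (x << sh) | (x >> ((64 - sh) & 63));
}

/*
 * Model of "rldimi ra,rs,sh,mb".
 * IBM bit 0 is the most significant bit. The mask covers bits mb..(63-sh);
 * this sketch only handles the non-wrapping case mb <= 63-sh.
 */
uint64_t sketch_rldimi(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb)
{
    assert(sh < 64 && mb <= 63 - sh);
    unsigned me = 63 - sh;
    uint64_t mask = (~0ULL >> mb) & (~0ULL << (63 - me));

    return (sketch_rotl64(rs, sh) & mask) | (ra & ~mask);
}

/* li r11,5; rldimi r10,r11,61,0 applied to an MSR value. */
uint64_t sketch_force_64bit(uint64_t msr)
{
    return sketch_rldimi(msr, 5, 61, 0);
}
```

For the entry MSR of 0x1002 reported earlier in the thread,
sketch_force_64bit() yields 0xa000000000001002. The earlier version of the
patch used rldicr plus or, which sets the same two bits but leaves the bit
between them untouched.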


Signed-off-by: Olaf Hering <olh@suse.de>
---
 arch/powerpc/kernel/head_64.S |   35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

Index: linux-2.6.18-rc2/arch/powerpc/kernel/head_64.S
===================================================================
--- linux-2.6.18-rc2.orig/arch/powerpc/kernel/head_64.S
+++ linux-2.6.18-rc2/arch/powerpc/kernel/head_64.S
@@ -191,6 +191,37 @@ exception_marker:
 	ori	reg,reg,(label)@l;	/* virt addr of handler ... */
 #endif
 
+/*
+ * Equal to EXCEPTION_PROLOG_PSERIES, except that it forces 64bit mode.
+ * The firmware calls the registered system_reset_fwnmi and
+ * machine_check_fwnmi handlers in 32bit mode if the cpu happens to run
+ * a 32bit application at the time of the event.
+ * This firmware bug is present on POWER4 and JS20.
+ */
+#define EXCEPTION_PROLOG_PSERIES_FORCE_64BIT(area, label)		\
+	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
+	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
+	std	r10,area+EX_R10(r13);					\
+	std	r11,area+EX_R11(r13);					\
+	std	r12,area+EX_R12(r13);					\
+	mfspr	r9,SPRN_SPRG1;						\
+	std	r9,area+EX_R13(r13);					\
+	mfcr	r9;							\
+	clrrdi	r12,r13,32;		/* get high part of &label */	\
+	mfmsr	r10;							\
+	/* force 64bit mode */						\
+	li	r11,5;			/* MSR_SF_LG|MSR_ISF_LG */	\
+	rldimi	r10,r11,61,0;		/* insert into top 3 bits */	\
+	/* done 64bit mode */						\
+	mfspr	r11,SPRN_SRR0;		/* save SRR0 */			\
+	LOAD_HANDLER(r12,label)						\
+	ori	r10,r10,MSR_IR|MSR_DR|MSR_RI;				\
+	mtspr	SPRN_SRR0,r12;						\
+	mfspr	r12,SPRN_SRR1;		/* and SRR1 */			\
+	mtspr	SPRN_SRR1,r10;						\
+	rfid;								\
+	b	.	/* prevent speculative execution */
+
 #define EXCEPTION_PROLOG_PSERIES(area, label)				\
 	mfspr	r13,SPRN_SPRG3;		/* get paca address into r13 */	\
 	std	r9,area+EX_R9(r13);	/* save r9 - r12 */		\
@@ -604,14 +635,14 @@ slb_miss_user_pseries:
 system_reset_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXGEN, system_reset_common)
+	EXCEPTION_PROLOG_PSERIES_FORCE_64BIT(PACA_EXGEN, system_reset_common)
 
 	.globl machine_check_fwnmi
       .align 7
 machine_check_fwnmi:
 	HMT_MEDIUM
 	mtspr	SPRN_SPRG1,r13		/* save r13 */
-	EXCEPTION_PROLOG_PSERIES(PACA_EXMC, machine_check_common)
+	EXCEPTION_PROLOG_PSERIES_FORCE_64BIT(PACA_EXMC, machine_check_common)
 
 #ifdef CONFIG_PPC_ISERIES
 /***  ISeries-LPAR interrupt handlers ***/

