linuxppc-dev.lists.ozlabs.org archive mirror
* Fix small race in 44x tlbie function
@ 2007-08-07  4:20 David Gibson
  2007-08-08 14:49 ` Josh Boyer
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: David Gibson @ 2007-08-07  4:20 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Todd Inglett, Volkmar Uhlig

The 440 family of processors don't have a tlbie instruction.  So, we
implement TLB invalidates by explicitly searching the TLB with tlbsx.,
then clobbering the relevant entry, if any.  Unfortunately the PID for
the search needs to be stored in the MMUCR register, which is also
used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
an interrupt between loading the MMUCR and the tlbsx could cause
incorrect search results, and thus a failure to invalidate TLB entries
which needed to be invalidated.

This patch fixes the problem in both arch/ppc and arch/powerpc by
inhibiting interrupts (even critical and debug interrupts) across the
relevant instructions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
Paul, this one's a bugfix, which I think should go into 2.6.23.
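
For readers who want to see the race in isolation, the following is a
minimal sketch in C, not kernel code.  It models MMUCR as a single shared
field and an interrupt as a hook that may run between the MMUCR write and
the search.  All names (cpu_model, tlbie_model, maybe_take_interrupt and so
on) are hypothetical, and the matching rule is simplified compared with the
real tlbsx. instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TLB_ENTRIES 64

struct tlb_entry {
    bool     valid;
    uint32_t epn;   /* effective page number */
    uint8_t  tid;   /* translation ID the entry belongs to */
};

struct cpu_model {
    struct tlb_entry tlb[TLB_ENTRIES];
    uint8_t mmucr_tid;     /* search TID field of MMUCR */
    bool    irqs_enabled;  /* stands in for MSR[EE|CE|ME|DE] */
    bool    irq_pending;   /* an interrupt is waiting to be taken */
    uint8_t irq_tid;       /* TID its TLB miss handler leaves in MMUCR */
};

/* Deliver a pending interrupt if unmasked; its TLB miss rewrites MMUCR. */
static void maybe_take_interrupt(struct cpu_model *cpu)
{
    if (cpu->irqs_enabled && cpu->irq_pending) {
        cpu->mmucr_tid = cpu->irq_tid;
        cpu->irq_pending = false;
    }
}

/* Simplified tlbsx.: match on page number and MMUCR TID; -1 if not found. */
static int tlb_search(const struct cpu_model *cpu, uint32_t epn)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *e = &cpu->tlb[i];
        if (e->valid && e->epn == epn && e->tid == cpu->mmucr_tid)
            return i;
    }
    return -1;
}

/* Invalidate the entry for (epn, pid); returns true if one was cleared. */
static bool tlbie_model(struct cpu_model *cpu, uint32_t epn, uint8_t pid,
                        bool mask_irqs)
{
    bool saved = cpu->irqs_enabled;

    if (mask_irqs)
        cpu->irqs_enabled = false;   /* mtmsr with EE/CE/ME/DE cleared */
    cpu->mmucr_tid = pid;            /* mtspr SPRN_MMUCR */
    maybe_take_interrupt(cpu);       /* the race window */
    int idx = tlb_search(cpu, epn);  /* tlbsx. */
    cpu->irqs_enabled = saved;       /* mtmsr restoring the old MSR */
    maybe_take_interrupt(cpu);       /* a deferred interrupt runs here */

    if (idx < 0)
        return false;
    cpu->tlb[idx].valid = false;
    return true;
}
```

With mask_irqs false and an interrupt pending, the search runs with the
wrong TID and the stale entry survives.  With mask_irqs true the interrupt
is only taken after the search has used the intended TID.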

Index: working-2.6/arch/powerpc/kernel/misc_32.S
===================================================================
--- working-2.6.orig/arch/powerpc/kernel/misc_32.S	2007-07-27 14:19:46.000000000 +1000
+++ working-2.6/arch/powerpc/kernel/misc_32.S	2007-07-27 14:30:46.000000000 +1000
@@ -301,9 +301,19 @@ _GLOBAL(_tlbie)
 	mfspr	r4,SPRN_MMUCR
 	mfspr	r5,SPRN_PID			/* Get PID */
 	rlwimi	r4,r5,0,24,31			/* Set TID */
-	mtspr	SPRN_MMUCR,r4
 
+	/* We have to run the search with interrupts disabled, even critical
+	 * and debug interrupts (in fact the only critical exceptions we have
+	 * are debug and machine check).  Otherwise an interrupt which causes
+	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
+	mfmsr	r5
+	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
+	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
+	andc	r6,r5,r6
+	mtmsr	r6
+	mtspr	SPRN_MMUCR,r4
 	tlbsx.	r3, 0, r3
+	mtmsr	r5
 	bne	10f
 	sync
 	/* There are only 64 TLB entries, so r3 < 64,
Index: working-2.6/arch/ppc/kernel/misc.S
===================================================================
--- working-2.6.orig/arch/ppc/kernel/misc.S	2007-07-27 14:19:46.000000000 +1000
+++ working-2.6/arch/ppc/kernel/misc.S	2007-07-27 14:31:31.000000000 +1000
@@ -237,9 +237,19 @@ _GLOBAL(_tlbie)
 	mfspr	r4,SPRN_MMUCR
 	mfspr	r5,SPRN_PID			/* Get PID */
 	rlwimi	r4,r5,0,24,31			/* Set TID */
-	mtspr	SPRN_MMUCR,r4
 
+	/* We have to run the search with interrupts disabled, even critical
+	 * and debug interrupts (in fact the only critical exceptions we have
+	 * are debug and machine check).  Otherwise an interrupt which causes
+	 * a TLB miss can clobber the MMUCR between the mtspr and the tlbsx. */
+	mfmsr	r5
+	lis	r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@ha
+	addi	r6,r6,(MSR_EE|MSR_CE|MSR_ME|MSR_DE)@l
+	andc	r6,r5,r6
+	mtmsr	r6
+	mtspr	SPRN_MMUCR,r4
 	tlbsx.	r3, 0, r3
+	mtmsr	r5
 	bne	10f
 	sync
 	/* There are only 64 TLB entries, so r3 < 64,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: Fix small race in 44x tlbie function
  2007-08-07  4:20 Fix small race in 44x tlbie function David Gibson
@ 2007-08-08 14:49 ` Josh Boyer
  2007-08-08 15:20 ` Kumar Gala
  2007-08-08 20:43 ` Hollis Blanchard
  2 siblings, 0 replies; 19+ messages in thread
From: Josh Boyer @ 2007-08-08 14:49 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, Todd Inglett, Volkmar Uhlig

On Tue, 7 Aug 2007 14:20:50 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> The 440 family of processors don't have a tlbie instruction.  So, we
> implement TLB invalidates by explicitly searching the TLB with tlbsx.,
> then clobbering the relevant entry, if any.  Unfortunately the PID for
> the search needs to be stored in the MMUCR register, which is also
> used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
> an interrupt between loading the MMUCR and the tlbsx could cause
> incorrect search results, and thus a failure to invalidate TLB entries
> which needed to be invalidated.
> 
> This patch fixes the problem in both arch/ppc and arch/powerpc by
> inhibiting interrupts (even critical and debug interrupts) across the
> relevant instructions.
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>

And I agree this should go into 2.6.23.

josh

* Re: Fix small race in 44x tlbie function
  2007-08-07  4:20 Fix small race in 44x tlbie function David Gibson
  2007-08-08 14:49 ` Josh Boyer
@ 2007-08-08 15:20 ` Kumar Gala
  2007-08-08 16:00   ` Josh Boyer
  2007-08-08 20:43 ` Hollis Blanchard
  2 siblings, 1 reply; 19+ messages in thread
From: Kumar Gala @ 2007-08-08 15:20 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Paul Mackerras, Todd Inglett, Volkmar Uhlig


On Aug 6, 2007, at 11:20 PM, David Gibson wrote:

> The 440 family of processors don't have a tlbie instruction.  So, we
> implement TLB invalidates by explicitly searching the TLB with tlbsx.,
> then clobbering the relevant entry, if any.  Unfortunately the PID for
> the search needs to be stored in the MMUCR register, which is also
> used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
> an interrupt between loading the MMUCR and the tlbsx could cause
> incorrect search results, and thus a failure to invalidate TLB entries
> which needed to be invalidated.
>
> This patch fixes the problem in both arch/ppc and arch/powerpc by
> inhibiting interrupts (even critical and debug interrupts) across the
> relevant instructions.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> Paul, this one's a bugfix, which I think should go into 2.6.23.

Did you actually see this happen?

- k

* RE: Fix small race in 44x tlbie function
@ 2007-08-08 15:34 Volkmar Uhlig
  0 siblings, 0 replies; 19+ messages in thread
From: Volkmar Uhlig @ 2007-08-08 15:34 UTC (permalink / raw)
  To: galak, david; +Cc: linuxppc-dev, paulus, Todd Inglett

> -----Original Message-----
> From: galak@kernel.crashing.org [mailto:galak@kernel.crashing.org] 
> Sent: Wednesday, August 08, 2007 11:21 AM
> To: david@gibson.dropbear.id.au
> Cc: paulus@samba.org; linuxppc-dev@ozlabs.org; Todd Inglett; 
> Volkmar Uhlig
> Subject: Re: Fix small race in 44x tlbie function
> 
> 
> On Aug 6, 2007, at 11:20 PM, David Gibson wrote:
> 
> > The 440 family of processors don't have a tlbie instruction.  So, we
> > implement TLB invalidates by explicitly searching the TLB with tlbsx.,
> > then clobbering the relevant entry, if any.  Unfortunately the PID for
> > the search needs to be stored in the MMUCR register, which is also
> > used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
> > an interrupt between loading the MMUCR and the tlbsx could cause
> > incorrect search results, and thus a failure to invalidate TLB entries
> > which needed to be invalidated.
> >
> > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > inhibiting interrupts (even critical and debug interrupts) across the
> > relevant instructions.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > Paul, this one's a bugfix, which I think should go into 2.6.23.
> 
> Did you actually see this happen?

Yes!  (I guess you didn't get the initial mail...)

- Volkmar

* Re: Fix small race in 44x tlbie function
  2007-08-08 15:20 ` Kumar Gala
@ 2007-08-08 16:00   ` Josh Boyer
  2007-08-09  5:28     ` Kumar Gala
  0 siblings, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2007-08-08 16:00 UTC (permalink / raw)
  To: Kumar Gala
  Cc: linuxppc-dev, Volkmar Uhlig, Paul Mackerras, Todd Inglett,
	David Gibson

On Wed, 8 Aug 2007 10:20:45 -0500
Kumar Gala <galak@kernel.crashing.org> wrote:

> 
> On Aug 6, 2007, at 11:20 PM, David Gibson wrote:
> 
> > The 440 family of processors don't have a tlbie instruction.  So, we
> > implement TLB invalidates by explicitly searching the TLB with tlbsx.,
> > then clobbering the relevant entry, if any.  Unfortunately the PID for
> > the search needs to be stored in the MMUCR register, which is also
> > used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
> > an interrupt between loading the MMUCR and the tlbsx could cause
> > incorrect search results, and thus a failure to invalidate TLB entries
> > which needed to be invalidated.
> >
> > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > inhibiting interrupts (even critical and debug interrupts) across the
> > relevant instructions.
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> > Paul, this one's a bugfix, which I think should go into 2.6.23.
> 
> Did you actually see this happen?

Yes.

josh

* Re: Fix small race in 44x tlbie function
  2007-08-07  4:20 Fix small race in 44x tlbie function David Gibson
  2007-08-08 14:49 ` Josh Boyer
  2007-08-08 15:20 ` Kumar Gala
@ 2007-08-08 20:43 ` Hollis Blanchard
  2007-08-08 21:29   ` Josh Boyer
  2 siblings, 1 reply; 19+ messages in thread
From: Hollis Blanchard @ 2007-08-08 20:43 UTC (permalink / raw)
  To: linuxppc-dev

On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> 
> This patch fixes the problem in both arch/ppc and arch/powerpc by
> inhibiting interrupts (even critical and debug interrupts) across the
> relevant instructions.

How could a critical or debug interrupt modify the contents of MMUCR?

-- 
Hollis Blanchard
IBM Linux Technology Center

* Re: Fix small race in 44x tlbie function
  2007-08-08 20:43 ` Hollis Blanchard
@ 2007-08-08 21:29   ` Josh Boyer
  2007-08-08 22:11     ` Hollis Blanchard
  2007-08-08 23:01     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 19+ messages in thread
From: Josh Boyer @ 2007-08-08 21:29 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: linuxppc-dev

On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
Hollis Blanchard <hollisb@us.ibm.com> wrote:

> On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > 
> > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > inhibiting interrupts (even critical and debug interrupts) across the
> > relevant instructions.
> 
> How could a critical or debug interrupt modify the contents of MMUCR?

Interrupts from UICs can be configured as critical.  If one of those
triggers (or any other CE triggers) and causes a TLB miss, you have a
race.  The watchdog timer interrupt is also a CE, IIRC.

The race with CE and DE is admittedly much smaller, but still possible.
Masking EE off closes the largest one.

josh
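
To make the masking concrete, here is a small C sketch of what the patch's
lis/addi/andc sequence computes.  It is an illustration only.  The MSR bit
values are assumed to match the Book E definitions in the kernel headers,
and TLBIE_MSR_MASK, lo16, ha16, load_imm32 and msr_for_search are
hypothetical names introduced here.

```c
#include <stdint.h>

/* Book E MSR bits; values assumed to match the kernel's PowerPC headers. */
#define MSR_CE 0x00020000u  /* critical interrupt enable */
#define MSR_EE 0x00008000u  /* external interrupt enable */
#define MSR_ME 0x00001000u  /* machine check enable */
#define MSR_DE 0x00000200u  /* debug interrupt enable */

/* Hypothetical name for the combined mask used by the patch. */
#define TLBIE_MSR_MASK (MSR_EE | MSR_CE | MSR_ME | MSR_DE)

/* @l: the low 16 bits, read as the signed immediate that addi adds. */
static int32_t lo16(uint32_t v)
{
    int32_t l = (int32_t)(v & 0xffffu);
    return (l >= 0x8000) ? l - 0x10000 : l;
}

/* @ha: the high 16 bits, bumped by one when the low half is negative. */
static uint32_t ha16(uint32_t v)
{
    return ((v + 0x8000u) >> 16) & 0xffffu;
}

/* lis r6,v@ha ; addi r6,r6,v@l */
static uint32_t load_imm32(uint32_t v)
{
    uint32_t r = ha16(v) << 16;     /* lis: load the adjusted high half */
    return r + (uint32_t)lo16(v);   /* addi: add the signed low half */
}

/* andc r6,r5,r6: the MSR value used while the search runs. */
static uint32_t msr_for_search(uint32_t msr)
{
    return msr & ~TLBIE_MSR_MASK;
}
```

Under these assumed values the mask is 0x29200, so the low half is negative
as a 16-bit immediate and @ha evaluates to 3 rather than 2.  That is why the
patch pairs @ha with addi instead of using @h.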

* Re: Fix small race in 44x tlbie function
  2007-08-08 21:29   ` Josh Boyer
@ 2007-08-08 22:11     ` Hollis Blanchard
  2007-08-08 23:30       ` Benjamin Herrenschmidt
  2007-08-08 23:41       ` Josh Boyer
  2007-08-08 23:01     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 19+ messages in thread
From: Hollis Blanchard @ 2007-08-08 22:11 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev

On Wed, 2007-08-08 at 16:29 -0500, Josh Boyer wrote:
> On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
> Hollis Blanchard <hollisb@us.ibm.com> wrote:
> 
> > On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > > 
> > > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > > inhibiting interrupts (even critical and debug interrupts) across the
> > > relevant instructions.
> > 
> > How could a critical or debug interrupt modify the contents of MMUCR?
> 
> Interrupts from UICs can be configured as critical.  If one of those
> triggers, (or any other CE triggers) and causes a tlb miss, you have a
> race.  The watchdog timer interrupt also is a CE IIRC.

By "causes a tlb miss", you mean the interrupt handler associated with
the critical-priority UIC interrupt performs MMIO which causes a TLB
miss? Regular code couldn't cause a TLB miss AFAICS, since the kernel is
always mapped, and an interrupt handler doesn't access userspace.

-- 
Hollis Blanchard
IBM Linux Technology Center

* Re: Fix small race in 44x tlbie function
  2007-08-08 21:29   ` Josh Boyer
  2007-08-08 22:11     ` Hollis Blanchard
@ 2007-08-08 23:01     ` Benjamin Herrenschmidt
  2007-08-09  0:06       ` Josh Boyer
  1 sibling, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2007-08-08 23:01 UTC (permalink / raw)
  To: Josh Boyer; +Cc: linuxppc-dev, Hollis Blanchard

On Wed, 2007-08-08 at 16:29 -0500, Josh Boyer wrote:
> On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
> Hollis Blanchard <hollisb@us.ibm.com> wrote:
> 
> > On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > > 
> > > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > > inhibiting interrupts (even critical and debug interrupts) across the
> > > relevant instructions.
> > 
> > How could a critical or debug interrupt modify the contents of MMUCR?
> 
> Interrupts from UICs can be configured as critical.  If one of those
> triggers, (or any other CE triggers) and causes a tlb miss, you have a
> race.  The watchdog timer interrupt also is a CE IIRC.
> 
> CE and DE are admittedly a much smaller race, but still possible.
> Masking EE off is the largest one.

There is a much bigger problem if CEs can do tlb misses though... they
can interrupt the tlb miss handler itself, either between the two halves
of a tlb write, or between the write to MMUCR and the write to the tlb,
and I suspect both cases will cause trouble.

We might want to check if we were in the TLB miss handler upon return
from the CE and MCE handlers, and in this case, restart them (just
return to the faulting instruction, that is use srr0 instead of
csrr0/mcsrr0).

Ben.
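
The check suggested above could look roughly like the following C sketch.
It is an assumption about how such a check might be structured, not an
existing kernel routine.  text_range, pc_in_range and critical_return_pc
are hypothetical names, and the sketch assumes the critical handler saved
SRR0 on entry so that the value is still the one left by the interrupted
TLB miss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of the TLB miss handler text (end is exclusive).
 * Real code would presumably take these from linker symbols. */
struct text_range {
    uint32_t start;
    uint32_t end;
};

static bool pc_in_range(uint32_t pc, struct text_range r)
{
    return pc >= r.start && pc < r.end;
}

/*
 * Choose the return address for a critical (or machine check) handler.
 *   csrr0: address interrupted by the critical interrupt
 *   srr0:  address of the instruction that took the original TLB miss
 */
static uint32_t critical_return_pc(uint32_t csrr0, uint32_t srr0,
                                   struct text_range tlb_miss_handler)
{
    if (pc_in_range(csrr0, tlb_miss_handler))
        return srr0;   /* restart: the access simply misses again */
    return csrr0;      /* normal case: resume where we were interrupted */
}
```

Returning to the faulting instruction throws away the half-finished TLB
write and lets the miss handler run again from the top.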

* Re: Fix small race in 44x tlbie function
  2007-08-08 22:11     ` Hollis Blanchard
@ 2007-08-08 23:30       ` Benjamin Herrenschmidt
  2007-08-08 23:41       ` Josh Boyer
  1 sibling, 0 replies; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2007-08-08 23:30 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: linuxppc-dev

On Wed, 2007-08-08 at 17:11 -0500, Hollis Blanchard wrote:
> On Wed, 2007-08-08 at 16:29 -0500, Josh Boyer wrote:
> > On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
> > Hollis Blanchard <hollisb@us.ibm.com> wrote:
> > 
> > > On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > > > 
> > > > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > > > inhibiting interrupts (even critical and debug interrupts) across the
> > > > relevant instructions.
> > > 
> > > How could a critical or debug interrupt modify the contents of MMUCR?
> > 
> > Interrupts from UICs can be configured as critical.  If one of those
> > triggers, (or any other CE triggers) and causes a tlb miss, you have a
> > race.  The watchdog timer interrupt also is a CE IIRC.
> 
> By "causes a tlb miss", you mean the interrupt handler associated with
> the critical-priority UIC interrupt performs MMIO which causes a TLB
> miss? Regular code couldn't cause a TLB miss AFAICS, since the kernel is
> always mapped, and an interrupt handler doesn't access userspace.

ioremap is an example, vmalloc space is another...

Ben.

* Re: Fix small race in 44x tlbie function
  2007-08-08 22:11     ` Hollis Blanchard
  2007-08-08 23:30       ` Benjamin Herrenschmidt
@ 2007-08-08 23:41       ` Josh Boyer
  1 sibling, 0 replies; 19+ messages in thread
From: Josh Boyer @ 2007-08-08 23:41 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: linuxppc-dev

On Wed, Aug 08, 2007 at 05:11:09PM -0500, Hollis Blanchard wrote:
> On Wed, 2007-08-08 at 16:29 -0500, Josh Boyer wrote:
> > On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
> > Hollis Blanchard <hollisb@us.ibm.com> wrote:
> > 
> > > On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > > > 
> > > > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > > > inhibiting interrupts (even critical and debug interrupts) across the
> > > > relevant instructions.
> > > 
> > > How could a critical or debug interrupt modify the contents of MMUCR?
> > 
> > Interrupts from UICs can be configured as critical.  If one of those
> > triggers, (or any other CE triggers) and causes a tlb miss, you have a
> > race.  The watchdog timer interrupt also is a CE IIRC.
> 
> By "causes a tlb miss", you mean the interrupt handler associated with
> the critical-priority UIC interrupt performs MMIO which causes a TLB
> miss? Regular code couldn't cause a TLB miss AFAICS, since the kernel is
> always mapped, and an interrupt handler doesn't access userspace.

Yes.

josh

* Re: Fix small race in 44x tlbie function
  2007-08-08 23:01     ` Benjamin Herrenschmidt
@ 2007-08-09  0:06       ` Josh Boyer
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Boyer @ 2007-08-09  0:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Hollis Blanchard

On Thu, Aug 09, 2007 at 09:01:29AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2007-08-08 at 16:29 -0500, Josh Boyer wrote:
> > On Wed, 8 Aug 2007 20:43:25 +0000 (UTC)
> > Hollis Blanchard <hollisb@us.ibm.com> wrote:
> > 
> > > On Tue, 07 Aug 2007 14:20:50 +1000, David Gibson wrote:
> > > > 
> > > > This patch fixes the problem in both arch/ppc and arch/powerpc by
> > > > inhibiting interrupts (even critical and debug interrupts) across the
> > > > relevant instructions.
> > > 
> > > How could a critical or debug interrupt modify the contents of MMUCR?
> > 
> > Interrupts from UICs can be configured as critical.  If one of those
> > triggers, (or any other CE triggers) and causes a tlb miss, you have a
> > race.  The watchdog timer interrupt also is a CE IIRC.
> > 
> > CE and DE are admittedly a much smaller race, but still possible.
> > Masking EE off is the largest one.
> 
> There is a much bigger problem if CEs can do tlb misses though... they
> can interrupt the tlb miss handler itself, either between the two halves
> of a tlb write, or between the write to MMUCR and the write to the tlb,
> and I suspect both cases will cause trouble.

Yes.

> We might want to check if we were in the TLB miss handler upon return
> from the CE and MCE handlers, and in this case, restart them (just
> return to the faulting instruction, that is use srr0 instead of
> csrr0/mcsrr0).

Something should be looked at, yeah.  

josh

* Re: Fix small race in 44x tlbie function
  2007-08-08 16:00   ` Josh Boyer
@ 2007-08-09  5:28     ` Kumar Gala
  2007-08-09  5:34       ` David Gibson
  2007-08-09 12:04       ` Josh Boyer
  0 siblings, 2 replies; 19+ messages in thread
From: Kumar Gala @ 2007-08-09  5:28 UTC (permalink / raw)
  To: Josh Boyer
  Cc: linuxppc-dev, Volkmar Uhlig, Paul Mackerras, Todd Inglett,
	David Gibson


On Aug 8, 2007, at 11:00 AM, Josh Boyer wrote:

> On Wed, 8 Aug 2007 10:20:45 -0500
> Kumar Gala <galak@kernel.crashing.org> wrote:
>
>>
>> On Aug 6, 2007, at 11:20 PM, David Gibson wrote:
>>
>>> The 440 family of processors don't have a tlbie instruction.  So, we
>>> implement TLB invalidates by explicitly searching the TLB with tlbsx.,
>>> then clobbering the relevant entry, if any.  Unfortunately the PID for
>>> the search needs to be stored in the MMUCR register, which is also
>>> used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
>>> an interrupt between loading the MMUCR and the tlbsx could cause
>>> incorrect search results, and thus a failure to invalidate TLB entries
>>> which needed to be invalidated.
>>>
>>> This patch fixes the problem in both arch/ppc and arch/powerpc by
>>> inhibiting interrupts (even critical and debug interrupts) across the
>>> relevant instructions.
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> ---
>>> Paul, this one's a bugfix, which I think should go into 2.6.23.
>>
>> Did you actually see this happen?
>
> Yes.

When?

We don't have critical wired to anything, I don't expect watchdog to  
cause another fault.. so just wondering.

- k

* Re: Fix small race in 44x tlbie function
  2007-08-09  5:28     ` Kumar Gala
@ 2007-08-09  5:34       ` David Gibson
  2007-08-09  6:35         ` Kumar Gala
  2007-08-09 12:04       ` Josh Boyer
  1 sibling, 1 reply; 19+ messages in thread
From: David Gibson @ 2007-08-09  5:34 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev, Volkmar Uhlig, Paul Mackerras, Todd Inglett

On Thu, Aug 09, 2007 at 12:28:20AM -0500, Kumar Gala wrote:
> 
> On Aug 8, 2007, at 11:00 AM, Josh Boyer wrote:
> 
> > On Wed, 8 Aug 2007 10:20:45 -0500
> > Kumar Gala <galak@kernel.crashing.org> wrote:
> >
> >>
> >> On Aug 6, 2007, at 11:20 PM, David Gibson wrote:
> >>
> >>> The 440 family of processors don't have a tlbie instruction.  So, we
> >>> implement TLB invalidates by explicitly searching the TLB with tlbsx.,
> >>> then clobbering the relevant entry, if any.  Unfortunately the PID for
> >>> the search needs to be stored in the MMUCR register, which is also
> >>> used by the TLB miss handler.  Interrupts were enabled in _tlbie(), so
> >>> an interrupt between loading the MMUCR and the tlbsx could cause
> >>> incorrect search results, and thus a failure to invalidate TLB entries
> >>> which needed to be invalidated.
> >>>
> >>> This patch fixes the problem in both arch/ppc and arch/powerpc by
> >>> inhibiting interrupts (even critical and debug interrupts) across the
> >>> relevant instructions.
> >>>
> >>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >>> ---
> >>> Paul, this one's a bugfix, which I think should go into 2.6.23.
> >>
> >> Did you actually see this happen?
> >
> > Yes.
> 
> When?
> 
> We don't have critical wired to anything, I don't expect watchdog to  
> cause another fault.. so just wondering.

On debug (trace) interrupts on blue gene.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: Fix small race in 44x tlbie function
  2007-08-09  5:34       ` David Gibson
@ 2007-08-09  6:35         ` Kumar Gala
  2007-08-09  7:01           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 19+ messages in thread
From: Kumar Gala @ 2007-08-09  6:35 UTC (permalink / raw)
  To: David Gibson; +Cc: linuxppc-dev, Volkmar Uhlig, Paul Mackerras, Todd Inglett

>>>> Did you actually see this happen?
>>>
>>> Yes.
>>
>> When?
>>
>> We don't have critical wired to anything, I don't expect watchdog to
>> cause another fault.. so just wondering.
>
> On debug (trace) interrupts on blue gene.

Do you know why the debug code caused a fault?

- k

* Re: Fix small race in 44x tlbie function
  2007-08-09  6:35         ` Kumar Gala
@ 2007-08-09  7:01           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2007-08-09  7:01 UTC (permalink / raw)
  To: Kumar Gala
  Cc: Todd Inglett, linuxppc-dev, Paul Mackerras, Volkmar Uhlig,
	David Gibson

On Thu, 2007-08-09 at 01:35 -0500, Kumar Gala wrote:
> >>>> Did you actually see this happen?
> >>>
> >>> Yes.
> >>
> >> When?
> >>
> >> We don't have critical wired to anything, I don't expect watchdog to
> >> cause another fault.. so just wondering.
> >
> > On debug (trace) interrupts on blue gene.
> 
> Do you know why the debug code caused a fault?

Sure, it may access vmalloc space for example, which can cause a TLB
miss...

Ben.

* Re: Fix small race in 44x tlbie function
  2007-08-09  5:28     ` Kumar Gala
  2007-08-09  5:34       ` David Gibson
@ 2007-08-09 12:04       ` Josh Boyer
  2007-08-09 13:05         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 19+ messages in thread
From: Josh Boyer @ 2007-08-09 12:04 UTC (permalink / raw)
  To: Kumar Gala
  Cc: linuxppc-dev, Volkmar Uhlig, Paul Mackerras, Todd Inglett,
	David Gibson

On Thu, Aug 09, 2007 at 12:28:20AM -0500, Kumar Gala wrote:
> >>Did you actually see this happen?
> >
> >Yes.
> 
> When?

During some bluegene debug.

> We don't have critical wired to anything, I don't expect watchdog to  
> cause another fault.. so just wondering.

We being who?  I'm slightly confused here.

josh

* Re: Fix small race in 44x tlbie function
  2007-08-09 12:04       ` Josh Boyer
@ 2007-08-09 13:05         ` Benjamin Herrenschmidt
  2007-08-09 13:26           ` Josh Boyer
  0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2007-08-09 13:05 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Volkmar Uhlig, linuxppc-dev, Paul Mackerras, Todd Inglett,
	David Gibson

On Thu, 2007-08-09 at 07:04 -0500, Josh Boyer wrote:
> 
> > We don't have critical wired to anything, I don't expect watchdog to
> > cause another fault.. so just wondering.
> 
> We being who?  I'm slightly confused here. 

I think Kumar doesn't know that we are talking about the BG kernel which
has more things "wired" to CRIT than what is upstream at the moment :-)

Ben.

* Re: Fix small race in 44x tlbie function
  2007-08-09 13:05         ` Benjamin Herrenschmidt
@ 2007-08-09 13:26           ` Josh Boyer
  0 siblings, 0 replies; 19+ messages in thread
From: Josh Boyer @ 2007-08-09 13:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Volkmar Uhlig, linuxppc-dev, Paul Mackerras, Todd Inglett,
	David Gibson

On Thu, 09 Aug 2007 23:05:36 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Thu, 2007-08-09 at 07:04 -0500, Josh Boyer wrote:
> > 
> > > We don't have critical wired to anything, I don't expect watchdog to
> > > cause another fault.. so just wondering.
> > 
> > We being who?  I'm slightly confused here. 
> 
> I think Kumar doesn't know that we are talking about the BG kernel
> which has more things "wired" to CRIT than what is upstream at the
> moment :-)

Ah, sure.  But even though we don't have much upstream that uses CE,
that doesn't mean someone can't reprogram the UICs on their boards to
use CE for some things, for example.  I know of at least one project
that has done that in the past.

josh
