7450 bugs & fixes

linuxppc-dev.lists.ozlabs.org archive mirror

* 7450 bugs & fixes
@ 2001-11-29 15:00 Benjamin Herrenschmidt
  2001-11-29 18:20 ` Gabriel Paubert
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2001-11-29 15:00 UTC
  To: linuxppc-dev; +Cc: Dan Malek, paulus

After reading the 7450 errata book, I'm now trying to figure out
what needs to be done for our kernel to work properly on these.
I'd appreciate any input from other people here, as the exact
consequences of some of the errata aren't that clear to me.

I've figured out so far that we need to handle 7450s as far back as
rev 2.0 inclusive. Apple seems to be the company that used the earliest
ones in released products and, according to the information I got from
them, they used rev 2.0 on uniprocessor desktop machines only, and
rev 2.1 on SMP machines & laptops. They also seem to have developed a HW
workaround for erratum #38, making it safe to use the HW hashtable
lookups on rev 2.1 CPUs (at least according to an Apple engineer
I contacted on the Darwin mailing list).

Here are the errata I think we are concerned about. I didn't list
errata for things I believe we don't use so far, like L3 flush assist,
but those may have impacts I didn't foresee.

 - errata 15: mismatched lwarx/stwcx. pairs can cause loss of
   atomicity. That erratum basically says we mustn't issue an
   stwcx. if we didn't have a previous lwarx. That means our
   instance of stwcx. in the return from exception path should
   have an lwarx added before it. Any other candidates spotted ?

 - errata 18: seems to imply NAP/SLEEP can't be used by us on rev 2.0.
   Well, that's weird, as it seems that Darwin has specific workarounds
   for another rev 2.0 erratum related to NAP/SLEEP (L1 coherency lost).
   I'll ask my Apple contact about this one. I didn't find the exact
   erratum for the L1 issue though....

 - errata 23: Not sure how that one can affect us. I don't think we do
   explicit cache flushes on locations subject to snooping from external
   HW, at least not on UP (and rev 2.0 isn't used on SMP setups afaik)

 - errata 28: dcbst reserving L2 cache lines. That one is bad as, afaik,
   it could be used by userland code to kill the L2 cache. We should
   probably replace uses of dcbst with dcbf in the kernel.

 - errata 29: do we ever switch MSR:IR off via an mtmsr ? If yes, we
   need to add a sync, but I don't think we do.

 - errata 31: BTIC corruption. This one affects only rev 2.0, which isn't
   used on SMP. So only the UP case matters. I'm not sure what a proper
   fix would be, maybe the recommended isync workaround. Paul ?

 - errata 38: Should be worked around in HW by Apple on SMP Macs using
   7450 2.1. Other machines may need to implement software tablewalk
   instead though (beware of other errata related to using software
   tablewalk then ;)

 - errata 39: We must stop doing any DOZE/NAP in idle.c when we have an
   L3 cache.

 - errata 47: dcbz vs. snoop hang. I need some more input on this one;
   we may have to disable store gathering when we have an L3 cache...

That's all for now (pfffeww....). I'd recommend anybody who wants some
cold fever to read that errata document, available from the Motorola web site.

Ben.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: 7450 bugs & fixes
  2001-11-29 15:00 7450 bugs & fixes Benjamin Herrenschmidt
@ 2001-11-29 18:20 ` Gabriel Paubert
  2001-11-30  8:36 ` Giuliano Pochini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Gabriel Paubert @ 2001-11-29 18:20 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Dan Malek, paulus


On Thu, 29 Nov 2001, Benjamin Herrenschmidt wrote:

>
> After reading the 7450 errata book, I'm now trying to figure out
> what need to be done for our kernel to work properly on these.
> I'd appreciate any input from other people here as some of the
> errata exact consequences aren't that clear to me.
>
> I've figured out so far that we need to handle 7450's as far as
> rev 2.0 included. Apple seem to be the company who used the earliest
> ones in released products and according to the infos I got from them,
> they used rev 2.0 on uniprocessor desktop machines only, and rev. 2.1
> on SMP machines & laptops. They also seem to have developed a HW
> workaround for errata #38 making safe the use of the HW hashtable
> lookups on rev. 2.1 CPUs. (at least according to an Apple engineer
> I contacted on the Darwin mailing list).
>
> Here are the errata I think we are concerned about. I didn't list
> errata for things I believe we don't use so far, like L3 flush assist,
> but those may have impacts I didn't foresee.
>
>  - errata 15: mismatched lwarx/stwcx. pairs can cause loss of
>    atomicity. That errata basically says we mustn't issue an
>    stwcx. if we didn't have a previous lwarx. That mean our
>    instance of stwcx. in the return from exception path should
>    be added an lwarx. Any other candidate spotted ?

The return from exception path does not have to be modified. The only
purpose of this stwcx. is to clear the reservation, to ensure that an
atomic operation interrupted between lwarx and stwcx. will never succeed.

Besides that, note that the erratum is SMP only (or applies if you do
atomic operations on DMA descriptors, for example) and the error is that
CR0 does not properly reflect the success of the write operation. But CR0
is a "don't care" after the stwcx. in the exception return path.

	Regards,
	Gabriel.
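
The role of that stwcx. can be illustrated with a toy C model of a
single CPU's reservation. This is only a sketch under stated assumptions,
not kernel code: the names toy_cpu, toy_lwarx, toy_stwcx and
toy_exception_return are hypothetical, and the model takes the strict
view that a stwcx. to a different address never stores, whereas real
hardware is allowed more latitude there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one CPU's reservation state (hypothetical, for illustration). */
struct toy_cpu {
    bool reserved;             /* set by lwarx, cleared by any stwcx. */
    const uint32_t *res_addr;  /* address covered by the reservation */
};

/* Model of lwarx: load the word and establish a reservation on it. */
static uint32_t toy_lwarx(struct toy_cpu *cpu, const uint32_t *addr)
{
    cpu->reserved = true;
    cpu->res_addr = addr;
    return *addr;
}

/* Model of stwcx.: store only if a matching reservation is held.
 * The return value plays the role of the CR0 "success" bit. */
static bool toy_stwcx(struct toy_cpu *cpu, uint32_t *addr, uint32_t val)
{
    bool ok = cpu->reserved && cpu->res_addr == addr;
    if (ok)
        *addr = val;
    cpu->reserved = false;     /* the reservation is consumed either way */
    return ok;
}

/* Model of the exception return path: a dummy stwcx. to a scratch word.
 * Its result is deliberately ignored; only the cleared reservation matters. */
static void toy_exception_return(struct toy_cpu *cpu, uint32_t *scratch)
{
    (void)toy_stwcx(cpu, scratch, 0);
}

/* An increment interrupted between lwarx and stwcx. must fail. */
static void toy_reservation_demo(void)
{
    struct toy_cpu cpu = { false, NULL };
    uint32_t counter = 5, scratch = 0;

    uint32_t old = toy_lwarx(&cpu, &counter);
    toy_exception_return(&cpu, &scratch);      /* "interrupt" lands here */
    assert(!toy_stwcx(&cpu, &counter, old + 1));
    assert(counter == 5);                      /* caller would now retry */
}
```

In the model, as in the argument above, nothing ever reads the result of
the dummy store, which is why a wrong CR0 there would be harmless.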



* RE: 7450 bugs & fixes
  2001-11-29 15:00 7450 bugs & fixes Benjamin Herrenschmidt
  2001-11-29 18:20 ` Gabriel Paubert
@ 2001-11-30  8:36 ` Giuliano Pochini
  2001-11-30 10:46   ` Holger Bettag
  2001-12-10 22:54 ` Jim
  2001-12-14 17:57 ` Gabriel Paubert
  3 siblings, 1 reply; 14+ messages in thread
From: Giuliano Pochini @ 2001-11-30  8:36 UTC
  To: linuxppc-dev


> After reading the 7450 errata book, I'm now trying to figure out
> what need to be done for our kernel to work properly on these. [...]

:-((
I've never seen so many bugs in a mot cpu before.
I hope they are not embracing the Intel's ship-things-asap-regardless-reliability style :(


Bye.



* Re: 7450 bugs & fixes
  2001-11-30  8:36 ` Giuliano Pochini
@ 2001-11-30 10:46   ` Holger Bettag
  2001-11-30 21:52     ` Timothy A. Seufert
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Bettag @ 2001-11-30 10:46 UTC
  To: Giuliano Pochini; +Cc: linuxppc-dev


Giuliano Pochini <pochini@shiny.it> writes:

>
>
> > After reading the 7450 errata book, I'm now trying to figure out
> > what need to be done for our kernel to work properly on these. [...]
>
> :-((
> I've never seen so many bugs in a mot cpu before.
> I hope they are not embracing the Intel's ship-things-asap-regardless-reliability style :(
>
To my knowledge, Apple is their only customer that actually delivered the
buggy revisions to end users. I guess it's not Motorola who's in a hurry.

  Holger



* Re: 7450 bugs & fixes
  2001-11-30 10:46   ` Holger Bettag
@ 2001-11-30 21:52     ` Timothy A. Seufert
  0 siblings, 0 replies; 14+ messages in thread
From: Timothy A. Seufert @ 2001-11-30 21:52 UTC
  To: Holger Bettag, Giuliano Pochini; +Cc: linuxppc-dev

At 11:46 AM +0100 11/30/01, Holger Bettag wrote:
>Giuliano Pochini <pochini@shiny.it> writes:
>
>>
>>
>>  > After reading the 7450 errata book, I'm now trying to figure out
>>  > what need to be done for our kernel to work properly on these. [...]
>>
>>  :-((
>>  I've never seen so many bugs in a mot cpu before.
>>  I hope they are not embracing the Intel's
>>ship-things-asap-regardless-reliability style :(
>>
>To my knowledge, Apple is their only customer that actually delivered the
>buggy revisions to end users. I guess it's not Motorola who's in a hurry.

The 7450 was probably at least a year late (if not more), going by
various Motorola announcements, roadmaps, MPF presentations, and so
forth.  I don't think Apple was supposed to get stuck using the 7400
for as long as they did.  No doubt Apple put pressure on Motorola to
deliver a shippable version (meaning one whose bugs Apple could
remedy with workarounds) as soon as possible.

--
Tim Seufert


* Re: 7450 bugs & fixes
  2001-11-29 15:00 7450 bugs & fixes Benjamin Herrenschmidt
  2001-11-29 18:20 ` Gabriel Paubert
  2001-11-30  8:36 ` Giuliano Pochini
@ 2001-12-10 22:54 ` Jim
  2001-12-14  5:17   ` Christopher Murtagh
  2001-12-14 17:57 ` Gabriel Paubert
  3 siblings, 1 reply; 14+ messages in thread
From: Jim @ 2001-12-10 22:54 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Dan Malek, paulus


I'm working on an embedded project that has two 7450s.

Is anyone else doing any 7450 SMP work (perhaps on the new dual 800
Power Macs)?
Also, do any of our more experienced people have any tips or helpful
thoughts -- particularly in the area of interrupts?  Thanks a bunch.



--
Sincerely,

Jim Potter
45th Parallel Processing
(503) 769-9138
jrp@wvi.com

  Those that would give up a necessary freedom for
  temporary safety deserve neither freedom nor safety.
    -- Ben Franklin



* Re: 7450 bugs & fixes
  2001-12-10 22:54 ` Jim
@ 2001-12-14  5:17   ` Christopher Murtagh
  2001-12-14  9:17     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Murtagh @ 2001-12-14  5:17 UTC
  To: linuxppc-dev

Once upon a time, BenH wrote:
> After reading the 7450 errata book, I'm now trying to figure out what
>need to be done for our kernel to work properly on these. I'd appreciate
>any input from other people here as some of the errata exact
>consequences aren't that clear to me.

 Could this be the reason why my Dual 800 is crashing a lot? It is running
2.4.17-pre2, PostgreSQL 7.1, Apache 1.3.20 with 1GB of RAM, 512 MB swap
and hardly gets any work. This machine has crashed hard probably about 5
times in the last 2 weeks and I can't find anything in the logs (not sure
where else to look).

 Is anyone running this machine successfully or have the same experiences
as me? Any info would be much appreciated.

Cheers,

Chris

--

Christopher Murtagh
Webmaster / Sysadmin
Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017


* Re: 7450 bugs & fixes
  2001-12-14  5:17   ` Christopher Murtagh
@ 2001-12-14  9:17     ` Benjamin Herrenschmidt
  2001-12-14 16:50       ` Christopher Murtagh
  0 siblings, 1 reply; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2001-12-14  9:17 UTC
  To: Christopher Murtagh, linuxppc-dev


>
>Once upon a time, BenH wrote:
>> After reading the 7450 errata book, I'm now trying to figure out what
>>need to be done for our kernel to work properly on these. I'd appreciate
>>any input from other people here as some of the errata exact
>>consequences aren't that clear to me.
>
> Could this be the reason why my Dual 800 is crashing a lot? It is running
>2.4.17-pre2, PostgreSQL 7.1, Apache 1.3.20 with 1GB of RAM, 512 MB swap
>and hardly gets any work. This machine has crashed hard probably about 5
>times in the last 2 weeks and I can't find anything in the logs (not sure
>where else to look).
>
> Is anyone running this machine successfully or have the same experiences
>as me? Any info would be much appreciated.

Get a more recent kernel. I've worked around a couple of these errata
in both my rsync tree and the bitkeeper ppc kernels.

And please, don't forget to tell me if it works better ;)

Ben.



* Re: 7450 bugs & fixes
  2001-12-14  9:17     ` Benjamin Herrenschmidt
@ 2001-12-14 16:50       ` Christopher Murtagh
  2001-12-14 16:53         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Murtagh @ 2001-12-14 16:50 UTC
  To: linuxppc-dev


On Fri, 14 Dec 2001, Benjamin Herrenschmidt wrote:
>Get a more recent kernel. I've worked around a couple of these errata in
>both my rsync tree and the bitkeeper ppc kernels.
>
>And please, don't forget to tell me if it works better ;)

 :-)

 Thanks Ben. I grabbed 2.4.17-pre8-ben0 and it is now running. I'll let
you know how it goes.

 Just to confirm, if I want to use more than 768 MB of RAM, I need to
compile with High Memory right?

 Thanks again.

Cheers,

Chris


--

Christopher Murtagh
Webmaster / Sysadmin
Web Communications Group
McGill University
Montreal, Quebec
Canada

Tel.: (514) 398-3122
Fax:  (514) 398-2017



* Re: 7450 bugs & fixes
  2001-12-14 16:50       ` Christopher Murtagh
@ 2001-12-14 16:53         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2001-12-14 16:53 UTC
  To: Christopher Murtagh, linuxppc-dev


> Thanks Ben. I grabbed 2.4.17-pre8-ben0 and it is now running. I'll let
>you know how it goes.
>
> Just to confirm, if I want to use more than 768 MB of RAM, I need to
>compile with High Memory right?

Yes.

Ben.



* Re: 7450 bugs & fixes
  2001-11-29 15:00 7450 bugs & fixes Benjamin Herrenschmidt
                   ` (2 preceding siblings ...)
  2001-12-10 22:54 ` Jim
@ 2001-12-14 17:57 ` Gabriel Paubert
  2001-12-14 19:19   ` Benjamin Herrenschmidt
  3 siblings, 1 reply; 14+ messages in thread
From: Gabriel Paubert @ 2001-12-14 17:57 UTC
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, Dan Malek, paulus

Sorry for the delay, it took me some time to try to understand all the
errata. Actually I've not yet finished, but I'm going to be without
regular net access from tonight until January 3rd or so.

On Thu, 29 Nov 2001, Benjamin Herrenschmidt wrote:

>
> After reading the 7450 errata book, I'm now trying to figure out
> what need to be done for our kernel to work properly on these.
> I'd appreciate any input from other people here as some of the
> errata exact consequences aren't that clear to me.
>
> I've figured out so far that we need to handle 7450's as far as
> rev 2.0 included. Apple seem to be the company who used the earliest
> ones in released products and according to the infos I got from them,
> they used rev 2.0 on uniprocessor desktop machines only, and rev. 2.1
> on SMP machines & laptops. They also seem to have developed a HW
> workaround for errata #38 making safe the use of the HW hashtable
> lookups on rev. 2.1 CPUs. (at least according to an Apple engineer
> I contacted on the Darwin mailing list).
>
> Here are the errata I think we are concerned about. I didn't list
> errata for things I believe we don't use so far, like L3 flush assist,
> but those may have impacts I didn't foresee.

- erratum 13 ?

>
>  - errata 15: mismatched lwarx/stwcx. pairs can cause loss of
>    atomicity. That errata basically says we mustn't issue an
>    stwcx. if we didn't have a previous lwarx. That mean our
>    instance of stwcx. in the return from exception path should
>    be added an lwarx. Any other candidate spotted ?

Irrelevant as already explained.

>
>  - errata 18: seem to imply NAP/SLEEP can't be used for us on rev 2.0
>    Well, that's weird as it seem that darwin has specific workarounds
>    for another rev 2.0 errata related to NAP/SLEEP (L1 coherency lost),
>    I'll ask my Apple contact about this one. I didn't find the exact
>    errata for the L1 issue though....

and

>  - errata 39: We must stop doing any DOZE/NAP in idle.c when we have an
>    L3 cache.

I am slightly confused, but:

- only desktops are affected by 39 since portables don't have L3 caches.

- the net result of 18 is that nap and sleep modes won't be entered,
which is harmless if it only affects desktops.

- however, if the processor does not actually enter nap/sleep modes, how
can it cause L3 cache corruption ? (or does it only happen if you work
around 18 by disabling interrupts and using reset/machine-check to wakeup)

I now realize that there is no explicit doze state on the 7450; whether
NAP/SLEEP are entered or not depends on a hardware handshake. With the
hardware handshake disabled (QREQ/QACK pins IIRC), it would only enter
doze mode.

>
>  - errata 23: Not sure how that one can affect us. I don't think we do
>    explicit cache flush on locations subject to snooping from external
>    HW, at least not on UP (and rev 2.0 isn't used on SMP setups afaik)

Very serious if drivers program DMA from application memory to devices
(zero copy TCP for example, raw device I/O). A malicious program could
cause a hang.

>
>  - errata 28: dcbst reserving L2 cache lines. That one is bad, as afaik,
>    it could be used by userland code to kill the L2 cache. We should
>    probably replace use of dcbst by dcbf in the kernel.

I consider that one to be much less serious than the previous one. It is
only a performance loss. I also believe that all dcbst are followed by a
sync (at least after the loop for cache flushes > 1 line).
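
The loop structure referred to here (one cache operation per line, then
a single sync once the loop is done) can be sketched in C with counters
standing in for the instructions. This is only an illustration under
stated assumptions: toy_cache_ops and toy_flush_range are hypothetical
names, the 32-byte line size is an assumption to be checked against the
processor manual, and no real dcbst, dcbf or sync is issued.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_L1_LINE_SIZE 32u   /* assumed L1 cache line size in bytes */

/* Counters standing in for the instructions a real routine would issue. */
struct toy_cache_ops {
    unsigned line_ops;   /* one per dcbst (or dcbf) */
    unsigned syncs;      /* one per sync */
};

/* Walk [start, stop) one cache line at a time, then "sync" once. */
static void toy_flush_range(struct toy_cache_ops *ops,
                            uintptr_t start, uintptr_t stop)
{
    /* Round the start address down to a line boundary. */
    uintptr_t addr = start & ~(uintptr_t)(TOY_L1_LINE_SIZE - 1);

    for (; addr < stop; addr += TOY_L1_LINE_SIZE)
        ops->line_ops++;     /* stands in for: dcbst 0,addr */

    ops->syncs++;            /* stands in for: sync, after the loop */
}

/* A range touching three lines costs three line operations and one sync. */
static void toy_flush_demo(void)
{
    struct toy_cache_ops ops = { 0, 0 };

    toy_flush_range(&ops, 0x1010, 0x1050);
    assert(ops.line_ops == 3);
    assert(ops.syncs == 1);
}
```

Swapping dcbst for dcbf, as suggested for erratum 28, would leave this
structure unchanged; only the per-line operation differs.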

>
>  - errata 29: do we ever switch MSR:IR off via an mtmsr ? If yes, we
>    need to add a sync, but I don't think we do.

No, because the kernel is not mapped 1:1 to physical memory; doing this
would cause an implicit jump, which is prohibited by the architecture.
Note that it also solves erratum 37 (different symptom and bug, same cure).

>
>  - errata 31: BTIC corruption. This one affect only rev 2.0 which isn't
>    used on SMP. So only the UP case matters. I'm not sure what a proper
>    fix would be, maybe the isync recommended workaround. Paul ?

I am not sure about that one, but I think that the isync would be
sufficient. Motorola does not detail under which conditions the processor
might hang, which makes it hard to tell whether it is possible to get a
hang with icbi only or if it only happens in the tlbie case. Or if the
hang can only be caused in kernel mode because it would require the
execution of an unwanted supervisor instruction (a spurious mtmsr for
example). Typical cache flush routines do not have 2 branches between icbi
and isync AFAICT and are not affected, so whether you can cause a hang
from applications or not is the fundamental question.

I still don't quite follow the Motorola explanation that icbi can be
used by applications and therefore the solution may be impossible to
implement: AFAIR after an icbi or string of icbi instructions, an isync
(actually a context synchronizing instruction) is compulsory to avoid
stale instructions in the (potentially infinitely long) instruction
prefetch queue.

erratum 36 ?

>
>  - errata 38: Should be worked around in HW by Apple on SMP macs using
>    7450 2.1. Other machines may need to implement software tablewalk
>    instead though (beware of other erratas related to using software
>    tablewalk then ;)
>

I don't understand how they can do a hardware workaround on that one!

>
>  - errata 47: dcbz vs. snoop hang. I need some more input on this one
>    we may have to disable store gathering when we have an L3 cache...

It looks insufficient, since I understand that it could be used by a
malicious application to cause a hang, more or less in the same way as
erratum 23.

	Gabriel.


* Re: 7450 bugs & fixes
  2001-12-14 17:57 ` Gabriel Paubert
@ 2001-12-14 19:19   ` Benjamin Herrenschmidt
  2001-12-14 20:02     ` Tom Rini
  2001-12-14 20:41     ` Timothy A. Seufert
  0 siblings, 2 replies; 14+ messages in thread
From: Benjamin Herrenschmidt @ 2001-12-14 19:19 UTC
  To: Gabriel Paubert; +Cc: linuxppc-dev, Dan Malek, paulus


>>  - errata 18: seem to imply NAP/SLEEP can't be used for us on rev 2.0
>>    Well, that's weird as it seem that darwin has specific workarounds
>>    for another rev 2.0 errata related to NAP/SLEEP (L1 coherency lost),
>>    I'll ask my Apple contact about this one. I didn't find the exact
>>    errata for the L1 issue though....
>
>and
>
>>  - errata 39: We must stop doing any DOZE/NAP in idle.c when we have an
>>    L3 cache.
>
>I am slightly confused, but:
>
>- only desktops are affected by 39 since portables don't have L3 caches.

Yes.

>- the net result of 18 is that nap and sleep modes won't be entered,
>which is harmless if it only affects desktops.

Yes. I changed the kernel to not have the DOZE/NAP capability at all
on rev 2.0 chips to avoid confusion.

>- however, if the processor does not actually enter nap/sleep modes, how
>can it cause L3 cache corruption ? (or does it only happen if you work
>around 18 by disabling interrupts and using reset/machine-check to wakeup)

Revision 2.1 can enter NAP and some desktops use 2.1 and L3. I modified
the setup_7450 function in head.S to clear the DOZE/NAP capability when
L3 has been enabled by the firmware.
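
The decision described above can be sketched as a small C predicate.
This is a minimal illustration, not the actual setup_7450 code from
head.S: toy_7450_can_nap is a hypothetical name, and the PVR values and
the L3CR enable bit are assumptions that should be checked against the
7450 documentation before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed register values; verify against the processor manual. */
#define TOY_PVR_7450_REV_2_0  0x80000200u  /* assumed PVR of 7450 rev 2.0 */
#define TOY_PVR_7450_REV_2_1  0x80000201u  /* assumed PVR of 7450 rev 2.1 */
#define TOY_L3CR_L3E          0x80000000u  /* assumed L3 enable bit in L3CR */

/* Decide whether the idle loop may use DOZE/NAP on this CPU. */
static bool toy_7450_can_nap(uint32_t pvr, uint32_t l3cr)
{
    /* Erratum 18: rev 2.0 does not enter NAP/SLEEP, so don't advertise it. */
    if (pvr == TOY_PVR_7450_REV_2_0)
        return false;

    /* Erratum 39: no DOZE/NAP once the firmware has enabled the L3 cache. */
    if (l3cr & TOY_L3CR_L3E)
        return false;

    return true;
}

/* Rev 2.1 without L3 (e.g. a laptop) is the only case allowed to nap. */
static void toy_nap_demo(void)
{
    assert(!toy_7450_can_nap(TOY_PVR_7450_REV_2_0, 0));
    assert(!toy_7450_can_nap(TOY_PVR_7450_REV_2_1, TOY_L3CR_L3E));
    assert(toy_7450_can_nap(TOY_PVR_7450_REV_2_1, 0));
}
```

Keeping both checks in one predicate mirrors the idea of clearing a
single capability flag once at setup time, so that the idle loop itself
does not need to know about individual errata.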

>I now realize that there is no explicit doze state on 7450, whether
>NAP/SLEEP are entered or not depends on hardware handshake. With disabled
>hardware handshake (QREQ/QACK pins IIRC), it would only enter doze mode.

Who knows what undocumented HW does ? :)
>>
>>  - errata 23: Not sure how that one can affect us. I don't think we do
>>    explicit cache flush on locations subject to snooping from external
>>    HW, at least not on UP (and rev 2.0 isn't used on SMP setups afaik)
>
>Very serious if drivers program DMA from application memory to devices
>(zero copy TCP for example, raw device I/O). A malicious program could
>cause a hang.

Yes
>>
>>  - errata 28: dcbst reserving L2 cache lines. That one is bad, as afaik,
>>    it could be used by userland code to kill the L2 cache. We should
>>    probably replace use of dcbst by dcbf in the kernel.
>
>I consider that one to be much less serious than the previous one. It is
>only a performance loss. I also believe that all dcbst are followed by a
>sync (at least after the loop for cache flushes > 1 line).

Ok.

>>  - errata 29: do we ever switch MSR:IR off via an mtmsr ? If yes, we
>>    need to add a sync, but I don't think we do.
>
>No, because kernel is not mapped 1:1 to physical memory, doing this would
>cause an implicit jump, which is prohibited by the architecture. Note that
>it also solves erratum 37 (different symptom and bug, same cure).

Yup. I just wanted to ask anyway ;) Maybe some early bootloaders do that.

>>  - errata 31: BTIC corruption. This one affect only rev 2.0 which isn't
>>    used on SMP. So only the UP case matters. I'm not sure what a proper
>>    fix would be, maybe the isync recommended workaround. Paul ?
>
>I am not sure about that one, but I think that the isync would be
>sufficient. Motorola does not detail under which conditions the processor
>might hang, which makes it hard to tell whether it is possible to get a
>hang with icbi only or if it only happens in the tlbie case. Or if the
>hang can only be caused in kernel mode because it would require the
>execution of an unwanted supervisor instruction (spurious mtmsr for
>example). Typical cache flush routines do not have 2 branches between icbi
>and isync AFAICT and are not affected, so whether you can cause a hang
>from applications or not is the fundamental question.
>
>
>I still don't follow very well the Motorola explanation that icbi can be
>used by applications and therefore the solution may be impossible to
>implement: AFAIR after an icbi or string of icbi instructions, an isync
>(actually a context synchronizing instruction) is compulsory to avoid
>stale instruction in the (potentially infinitely long) instruction
>prefetch queue.
>
>>  - errata 38: Should be worked around in HW by Apple on SMP macs using
>>    7450 2.1. Other machines may need to implement software tablewalk
>>    instead though (beware of other erratas related to using software
>>    tablewalk then ;)
>>
>
>I don't understand how they can do a hardware workaround on that one!

I don't either; they didn't give me any details. Could they catch
icache misses on the bus and delay incoming tlbie (freezing the emitter)
when that happens ? I don't know the bus protocol ...

>>  - errata 47: dcbz vs. snoop hang. I need some more input on this one
>>    we may have to disable store gathering when we have an L3 cache...
>
>It looks insufficient, since I understand that it could be used by
>malicious application to cause a hang, more or less in the same way as
>erratum 23.

Hrm...

Ben.



* Re: 7450 bugs & fixes
  2001-12-14 19:19   ` Benjamin Herrenschmidt
@ 2001-12-14 20:02     ` Tom Rini
  2001-12-14 20:41     ` Timothy A. Seufert
  1 sibling, 0 replies; 14+ messages in thread
From: Tom Rini @ 2001-12-14 20:02 UTC
  To: Benjamin Herrenschmidt; +Cc: Gabriel Paubert, linuxppc-dev, Dan Malek, paulus


On Fri, Dec 14, 2001 at 08:19:20PM +0100, Benjamin Herrenschmidt wrote:

> >>  - errata 29: do we ever switch MSR:IR off via an mtmsr ? If yes, we
> >>    need to add a sync, but I don't think we do.
> >
> >No, because kernel is not mapped 1:1 to physical memory, doing this would
> >cause an implicit jump, which is prohibited by the architecture. Note that
> >it also solves erratum 37 (different symptom and bug, same cure).
>
> Yup. I just wanted to ask anyway ;) Maybe some early bootloaders do that.

Actually, I _think_ there's a possibility.  In
arch/ppc/boot/common/util.S we do:
        .globl  disable_6xx_mmu
disable_6xx_mmu:
        /* Establish default MSR value, exception prefix 0xFFF.
         * If necessary, this function must fix up the LR if we
         * return to a different address space once the MMU is
         * disabled.
         */
        li      r8,MSR_IP|MSR_FP
        mtmsr   r8

So we should probably throw a 'sync' at the end of that, yes?

--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/


* Re: 7450 bugs & fixes
  2001-12-14 19:19   ` Benjamin Herrenschmidt
  2001-12-14 20:02     ` Tom Rini
@ 2001-12-14 20:41     ` Timothy A. Seufert
  1 sibling, 0 replies; 14+ messages in thread
From: Timothy A. Seufert @ 2001-12-14 20:41 UTC
  To: Benjamin Herrenschmidt, Gabriel Paubert; +Cc: linuxppc-dev, Dan Malek, paulus


At 8:19 PM +0100 12/14/01, Benjamin Herrenschmidt wrote:

>I don't either; they didn't give me any details. Could they catch
>icache misses on the bus and delay incoming tlbie (freezing the emitter)
>when that happens ? I don't know the bus protocol ...

Take this with a grain of salt (*), but I don't recall anything in
the bus protocol which could be used to identify whether a particular
burst is filling an icache or a dcache miss, much less whether it's
an icache miss which happened during a tablewalk.  It all looks the
same to the outside world.

(*) I've designed hardware, but it was a 750cx and 8260 system and
therefore plain old 60x bus, not MAXbus like the systems in question.
Also, I wasn't the one who had to get deep into the 60x bus
protocol.  That said, I don't remember anything in 60x which could
identify *why* a transfer was happening, and whenever I've skimmed
over MAXbus docs it looks a lot like a better version of 60x.
--
Tim Seufert
