* IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-10  7:28 UTC
To: linuxppc-dev

Hi all,

I was wondering if SMP is supported for the 750GX processor on the Marvell
Discovery boards. If not, what's the problem in getting it supported? An
older mail by Cort Dougan said it was because TLB invalidate information is
not broadcast, but that was a really old mail; has anyone come up with any
workarounds since?

Thanks,
Amit.

--
Amit Shah
http://amitshah.nav.to/
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-10 23:36 UTC
To: Amit Shah; +Cc: linuxppc-dev

Amit Shah writes:

> I was wondering if SMP is supported for the 750 GX processor built on the
> Marvell Discovery boards.

SMP is not supported for 750 processors.

> If not, what's the problem in getting it supported? An older mail by Cort
> Dougan said it was because of TLB invalidate information not being
> broadcasted, but it was a really old mail, has anyone come up with any
> workarounds?

The real killer is that the cache management instructions are not
broadcast.  The fact that the TLB invalidations are not broadcast is
painful but it can be worked around in the kernel.  In contrast, the
cache management instructions are used in userspace.

Paul.
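For reference, the userspace sequence Paul is referring to is the standard
PowerPC idiom for making freshly written instructions visible to instruction
fetch: write the modified data cache lines back with dcbst, then invalidate
the stale instruction cache lines with icbi. A minimal sketch in GNU C inline
assembly follows; the function name and the 32-byte line size are assumptions
here (the 750 family's L1 line size), not something given in the thread.

    #include <stdint.h>

    #define L1_LINE_SIZE 32   /* 750-family L1 cache line size (assumed) */

    /* Make instructions just written to [addr, addr+len) visible to
     * instruction fetch on the CPU executing this code.  On a 750-based
     * SMP board neither the dcbst nor the icbi is snooped by the other
     * CPU, which is exactly the problem discussed in this thread. */
    static void sync_icache_range(void *addr, unsigned long len)
    {
        uintptr_t a = (uintptr_t)addr & ~(uintptr_t)(L1_LINE_SIZE - 1);
        uintptr_t end = (uintptr_t)addr + len;
        uintptr_t p;

        for (p = a; p < end; p += L1_LINE_SIZE)    /* push data to memory */
            __asm__ __volatile__("dcbst 0,%0" : : "r"(p) : "memory");
        __asm__ __volatile__("sync");              /* wait for the writebacks */
        for (p = a; p < end; p += L1_LINE_SIZE)    /* drop stale icache lines */
            __asm__ __volatile__("icbi 0,%0" : : "r"(p) : "memory");
        __asm__ __volatile__("sync; isync");       /* discard prefetched insns */
    }

The scenarios discussed later in the thread all hinge on parts of this
sequence running on a different CPU from the one that will eventually fetch
the modified code.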
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Dan Malek @ 2004-05-11  2:09 UTC
To: Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

On May 10, 2004, at 7:36 PM, Paul Mackerras wrote:

> SMP is not supported for 750 processors.

You mean for IBM 750 processors :-)
The MPC75x processors support this.

	-- Dan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-11  3:03 UTC
To: Dan Malek; +Cc: Amit Shah, linuxppc-dev

Dan Malek writes:

> On May 10, 2004, at 7:36 PM, Paul Mackerras wrote:
>
> > SMP is not supported for 750 processors.
>
> You mean for IBM 750 processors :-)
> The MPC75x processors support this.

No, actually, I meant all 750 processors.  The MPC750 will broadcast
the cache operations if you set HID0[ABE], but that doesn't help us
because the MPC750 never snoops those operations.  According to page
2-63 of the MPC750 users manual: "Of the broadcast cache operations,
the MPC750 snoops only dcbz, regardless of the HID0[ABE] setting."

Paul.
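For context, HID0 is a supervisor-only special-purpose register (SPR 1008 on
the 7xx family), so enabling broadcast would be a per-CPU, boot-time operation
roughly like the sketch below. The ABE bit position is an assumption (it
follows the usual HID0_ABE definition), and, as Paul points out, setting it
does not make the other 750 snoop the broadcasts, so it does not solve the
problem by itself.

    #define SPRN_HID0  1008          /* HID0 special-purpose register number */
    #define HID0_ABE   (1u << 3)     /* address broadcast enable (assumed bit) */

    /* Supervisor-only: ask this CPU to broadcast its cache management
     * operations (dcbf, dcbst, icbi, ...) on the 60x bus. */
    static inline void hid0_set_abe(void)
    {
        unsigned long hid0;

        __asm__ __volatile__("mfspr %0,%1" : "=r"(hid0) : "i"(SPRN_HID0));
        hid0 |= HID0_ABE;
        __asm__ __volatile__("sync; mtspr %1,%0; isync"
                             : : "r"(hid0), "i"(SPRN_HID0));
    }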
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Dan Malek @ 2004-05-11 15:46 UTC
To: Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

On May 10, 2004, at 11:03 PM, Paul Mackerras wrote:

> ......According to page
> 2-63 of the MPC750 users manual: "Of the broadcast cache operations,
> the MPC750 snoops only dcbz, regardless of the HID0[ABE] setting."

But read the following sentence: "Any bus activity caused by other
cache instructions results directly from performing the operation on
the MPC750 cache."  A dcbz has to be broadcast; the others do not,
because their operations appear just as standard load/store ops.

The only thing we should have to do in software is the icbi, which is
no big deal to broadcast.

My experience has been that MPC750s work in an SMP environment on a
60x bus.  Maybe I was just lucky?  The way I read the manual, they
should work with a proper memory controller.

Thanks.

	-- Dan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11 17:23 UTC
To: Dan Malek, Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

> My experience has been that MPC750s work in a SMP environment on a 60x
> bus.  Maybe I was just lucky?  The way I read the manual, they should
> work with a proper memory controller.

The 750's MEI protocol does NOT support SMP well.  Sure, you are right:
it works, but not that well.

Ideally, MESI plus the MPX bus is the best combination for SMP on the
PPC architecture.  That's why you need to go to a G4 plus the GT-64360.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-11 17:31 UTC
To: huailin; +Cc: Dan Malek, Paul Mackerras, linuxppc-dev

Hi all,

It's pretty strange, then, that people would come up with boards based on
dual processors of the 7xx family (like the IBM Argan board or the Xcalibur
boards).  Any idea how they use them, or what they are intended for?

Amit.

On Tuesday 11 May 2004 22:53, Huailin Chen wrote:
> > My experience has been that MPC750s work in a SMP environment
> > on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> > they should work with a proper memory controller.
> >
> > Thanks.
>
> 750 MEI protocol does NOT support SMP well.  Sure, you
> are right.  It works, but not that good.
>
> Ideally, MESI + MPX Bus is the best one for SMP under
> PPC arch.  That's why need go to G4+ GT 64360.

--
Amit Shah
Codito Technologies Pvt. Ltd.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11 20:51 UTC
To: Amit Shah; +Cc: Dan Malek, Paul Mackerras, linuxppc-dev

When talking about a multi-processor architecture, we have to consider
the chipset support, especially the bus protocol part.  For the
particular PowerPC G3 vs. G4 issue, the thing is: if you have a G4 with
MPX support, it would not be wise to have a chipset with ONLY 60x
support.  I mean, going for SMP is about achieving high performance, and
that goal will not be reached without good system bus throughput, which
usually is the real bottleneck for an appliance.

Also, I don't even think the GT-64260 has the doorbell mechanism with
which one CPU can send external interrupts to another.

As for the bus protocol itself, some signals/pins are different in MPX
than in 60x.  Most of them are for data streaming and the like, in order
to remove dead cycles on the bus.  For more detail, look at the data
sheet for your chipset.

Huailin,

--- Amit Shah <amit.shah@codito.com> wrote:
> It's pretty strange then that people would come up with boards based
> on dual-processors of the 7xx family (like the IBM Argan board or the
> Xcalibur boards).  Any idea how they use it or what are they intended
> for?
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12  0:17 UTC
To: Amit Shah; +Cc: huailin, Dan Malek, linuxppc-dev

Amit Shah writes:

> It's pretty strange then that people would come up with boards based on
> dual-processors of the 7xx family (like the IBM Argan board or the Xcalibur
> boards).  Any idea how they use it or what are they intended for?

I think it's the usual hardware designers' attitude that any bugs in
the hardware can be worked around in software.  And they can; it's just
that addressing the problems properly is going to take effort, and
no-one has yet had the time, energy and motivation to sit down and do
that (and do it in a way which is sufficiently clean and well thought
out to be accepted into the Linux kernel).

Fortunately Apple never did a 750-based SMP system. :)

Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12  0:12 UTC
To: Dan Malek; +Cc: Amit Shah, linuxppc-dev

Dan Malek writes:

> But, read the following sentence: "Any bus activity caused by other
> cache instructions results directly from performing the operation on
> the MPC750 cache."  A dcbz has to be broadcast, others do not because
> their operations appear just as standard load/store ops.
>
> The only thing we should have to do in software is the icbi, which is
> no big deal to broadcast.

I don't think you are right, but it would be nice if you can prove me
wrong. ;)

Consider this scenario: an application is modifying some instructions
(for example, ld.so modifying a PLT entry).  It modifies the
instructions, and then just before it does its dcbst; sync; icbi;
isync sequence, it gets scheduled on the other CPU.  It goes ahead and
does the dcbst.  However, the relevant cache lines aren't in the
cache (they are in the E state in the other CPU's cache), so nothing
gets written out to memory.  After doing the sync; icbi; isync it goes
to execute the instructions and gets the old instructions, not the new
ones.

The dcbst won't cause any stores to memory in this scenario.  It will
cause a dcbst address-only broadcast, but that won't (according to my
reading of the manual) cause the other CPU to write back its copy of
the relevant cache line, since the dcbst isn't snooped.

The only workaround I can see for this is to completely flush the D
and I caches of both CPUs whenever we schedule a process on a
different CPU from that on which it last ran.  Triple yuck.

> My experience has been that MPC750s work in a SMP environment
> on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> they should work with a proper memory controller.

I think that the sorts of problems I am talking about wouldn't show up
very often.  Generally I think that these problems would just cause
the system to be a bit flaky rather than stop it from working at all.
If you didn't have L2 caches that would make the problems show up less
frequently, too.

Regards,
Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Giuliano Pochini @ 2004-05-12  7:57 UTC
To: Paul Mackerras; +Cc: linuxppc-dev, Amit Shah, Dan Malek

On 12-May-2004 Paul Mackerras wrote:
>
> The only workaround I can see for this is to completely flush the D
> and I caches of both CPUs whenever we schedule a process on a
> different CPU from that on which it last ran.  Triple yuck.

It's not very different from what Linux does with NUMA systems, AFAIK.
If some cache management instructions cause trouble and it is an
embedded system, recompiling the userspace stuff to remove the
problematic parts may be a good solution.

--
Giuliano.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12  8:00 UTC
To: Paul Mackerras; +Cc: Dan Malek, Amit Shah, linuxppc-dev

On Wed, May 12, 2004 at 10:12:47AM +1000, Paul Mackerras wrote:
>
> Dan Malek writes:
>
> > But, read the following sentence: "Any bus activity caused by other
> > cache instructions results directly from performing the operation on
> > the MPC750 cache."  A dcbz has to be broadcast, others do not because
> > their operations appear just as standard load/store ops.
> >
> > The only thing we should have to do in software is the icbi, which is
> > no big deal to broadcast.
>
> I don't think you are right, but it would be nice if you can prove me
> wrong. ;)
>
> Consider this scenario: an application is modifying some instructions
> (for example, ld.so modifying a PLT entry).  It modifies the
> instructions, and then just before it does its dcbst; sync; icbi;
> isync sequence, it gets scheduled on the other CPU.  It goes ahead and
> does the dcbst.  However, the relevant cache lines aren't in the
> cache (they are in the E state in the other CPU's cache), so nothing
> gets written out to memory.  After doing the sync; icbi; isync it goes
> to execute the instructions and gets the old instructions, not the new
> ones.

Are you sure?  Since the cache lines are in the other processor memory,
they will be flushed to RAM when they are fetched by the processor,
provided that you can force the coherence bit on instruction fetches
(this is possible IIRC).

The nastiest scenario is, I believe:
 - proceeding up to the icbi or isync on processor 1,
 - scheduling and switching the process to processor 2,
 - the instructions were already in the icache on processor 2
   for some reason (PLT entries are half a cache line long IIRC).

The only solution to this is a full icache invalidate when a process
changes processors.  Threading might however make things worse, because
threads are entitled to believe from the architecture specification
that icbi will affect other threads simultaneously running on other
processors.  And that has no clean solution AFAICS.

BTW, did I dream, or did I read somewhere that on a PPC750 icbi flushes
all the cache ways (using only 7 bits of the address)?  This would mean
that flushing a page's worth of instruction cache flushes the whole
cache, and setting HID0[ICFI] might be faster.

> The dcbst won't cause any stores to memory in this scenario.  It will
> cause a dcbst address-only broadcast but that won't (according to my
> reading of the manual) cause the other CPU to write back its copy of
> the relevant cache line, since the dcbst isn't snooped.

Yeah, but the subsequent fetch will be snooped if it's marked coherent.
dcbst is really only necessary because instruction fetches don't look
into the L1 data cache of the same processor.

> The only workaround I can see for this is to completely flush the D
> and I caches of both CPUs whenever we schedule a process on a
> different CPU from that on which it last ran.  Triple yuck.

As I said, I believe the real problem is multithreaded applications.

> > My experience has been that MPC750s work in a SMP environment
> > on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> > they should work with a proper memory controller.
>
> I think that the sorts of problems I am talking about wouldn't show up
> very often.  Generally I think that these problems would just cause
> the system to be a bit flaky rather than stop it from working at all.

I agree.

> If you didn't have L2 caches that would make the problems show up less
> frequently, too.

I'm not so sure.  Instruction fetches look into L2 caches.  The main
issues are:
 1) are the instruction fetches marked coherent?
 2) do you run multithreaded applications?

If the answers are yes and no, then I don't see any showstopper.

Regards,
Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Benjamin Herrenschmidt @ 2004-05-12 10:26 UTC
To: Gabriel Paubert; +Cc: Paul Mackerras, Dan Malek, Amit Shah, linuxppc-dev list

> Are you sure?  Since the cache lines are in the other processor memory,
> they will be flushed to RAM when they are fetched by the processor,
> provided that you can force the coherence bit on instruction fetches
> (this is possible IIRC).

Coherency of the data cache lines is one thing... getting the icbi
broadcast is another.  Normal coherency will not help if you don't get
the icache of the other CPU to snoop your icbi and invalidate the trash
it has in its icache.

> As I said, I believe the real problem is multithreaded applications.

Which isn't a simple problem...

--
Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12 11:53 UTC
To: Benjamin Herrenschmidt; Cc: Paul Mackerras, Dan Malek, Amit Shah, linuxppc-dev list

On Wed, May 12, 2004 at 08:26:19PM +1000, Benjamin Herrenschmidt wrote:
>
> > Are you sure?  Since the cache lines are in the other processor memory,
> > they will be flushed to RAM when they are fetched by the processor,
> > provided that you can force the coherence bit on instruction fetches
> > (this is possible IIRC).
>
> Coherency of the data cache lines is one thing... getting the icbi
> broadcast is another.  Normal coherency will not help if you don't get
> the icache of the other CPU to snoop your icbi and invalidate the trash
> it has in its icache.
>
> > As I said, I believe the real problem is multithreaded applications.
>
> Which isn't a simple problem...

Indeed, it is actually not solvable in a reasonable way, disabling the
icache being far too unreasonable ;-)

But my point was that Paul's example, one process being rescheduled on
another processor, is actually quite solvable (provided it is the sole
owner of the MM context).  You don't lose much by flushing the icache
on a MEI system compared with the hardware overhead of all the
invalidations and flushing that will take place because of the process
switch.

Regards,
Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12 11:46 UTC
To: Gabriel Paubert; +Cc: Dan Malek, Amit Shah, linuxppc-dev

Gabriel Paubert writes:

> Are you sure?  Since the cache lines are in the other processor memory,
> they will be flushed to RAM when they are fetched by the processor,
> provided that you can force the coherence bit on instruction fetches
> (this is possible IIRC).

The table on page 3-29 of the 750 user manual implies that GBL is
asserted if M=1 on instruction fetches.  So you're right.

> The most nasty scenario is I believe:
> - proceeding up to icbi or isync on processor 1,
> - scheduling and switching the process to processor 2
> - the instructions were already in the icache on processor 2
>   for some reasons (PLT entries are half a cache line long IIRC)

Another bad scenario would be:

- write the instructions on processor 1
- switch the process to processor 2
- it does the dcbst + sync, which do nothing
- switch the process back to processor 1
- icbi, isync, try to execute the instructions

In this scenario the instructions don't get written back to memory.
So it sounds like when we switch a process from cpu A to cpu B, we
would need to (at least) flush cpu A's data cache and cpu B's
instruction cache.

Basically you can't rely on any cache management instructions being
effective, because they could be executed on a different processor
from the one where you need to execute them.  This is true inside the
kernel as well if you have preemption enabled (you can of course
disable preemption where necessary, but you have to find and modify
all those places).  This will also affect the lazy cache flush logic
that we have, which defers doing the dcache/icache flush on a page
until the page gets mapped into a user process.

> The only solution to this is full icache invalidate when a process
> changes processors.  Threading might however make things worse
> because threads are entitled to believe from the architecture
> specification that icbi will affect other threads simultaneously
> running on other processors.  And that has no clean solution AFAICS.

Indeed, I can't see one either.  Not being able to use threads takes
some of the fun out of SMP, of course.

> BTW, did I dream or did I read somewhere that on a PPC750 icbi
> flushes all the cache ways (using only 7 bits of the address).

Page 2-64 says about icbi: "All ways of a selected set are
invalidated".  It seems that saves them having to actually translate
the effective address. :)  That means that the kernel doing the
dcache/icache flush on a page is going to invalidate the whole
icache.  Ew...

Regards,
Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12 13:45 UTC
To: Paul Mackerras; +Cc: Dan Malek, Amit Shah, linuxppc-dev

On Wed, May 12, 2004 at 09:46:19PM +1000, Paul Mackerras wrote:
>
> Another bad scenario would be:
>
> - write the instructions on processor 1
> - switch the process to processor 2
> - it does the dcbst + sync, which do nothing
> - switch the process back to processor 1
> - icbi, isync, try to execute the instructions
>
> In this scenario the instructions don't get written back to memory.
> So it sounds like when we switch a process from cpu A to cpu B, we
> would need to (at least) flush cpu A's data cache and cpu B's
> instruction cache.

Argh, I did not think of that case.  Switching twice in two
instructions is too devious for me ;-)

It is also probably much harder to hit than the example I gave (which
requires either two process switches or a multithreaded application),
but correctness indeed requires a data cache flush.  Data cache flushes
are evil!

Strictly speaking, I believe that only the L1 cache needs to be
flushed, since instruction fetches will look at L2, but I had hoped
that a simple flash invalidate of the icache would be sufficient, and
it's not.

> Basically you can't rely on any cache management instructions being
> effective, because they could be executed on a different processor
> from the one where you need to execute them.  This is true inside the
> kernel as well if you have preemption enabled (you can of course
> disable preemption where necessary, but you have to find and modify
> all those places).  This will also affect the lazy cache flush logic
> that we have that defers doing the dcache/icache flush on a page until
> the page gets mapped into a user process.

I've never looked at that logic so I can't comment.

> > The only solution to this is full icache invalidate when a process
> > changes processors.  Threading might however make things worse
> > because threads are entitled to believe from the architecture
> > specification that icbi will affect other threads simultaneously
> > running on other processors.  And that has no clean solution AFAICS.
>
> Indeed, I can't see one either.  Not being able to use threads takes
> some of the fun out of SMP, of course.

Bottom line: the 750 can't be used for SMP.

> > BTW, did I dream or did I read somewhere that on a PPC750 icbi
> > flushes all the cache ways (using only 7 bits of the address).
>
> Page 2-64 says about icbi: "All ways of a selected set are
> invalidated".  It seems that saves them having to actually translate
> the effective address. :)  That means that the kernel doing the
> dcache/icache flush on a page is going to invalidate the whole
> icache.  Ew...

Be more optimistic, consider this an optimization opportunity!  Don't
loop over the lines, simply flush the whole cache, especially if you
want to flush several pages.

For example, if I understand what you mean by lazy cache flushing: once
you have done an icache flush when mapping a page to userspace, you
don't need to perform any other until a page has been unmapped.  (This
can probably be improved upon, but it's a start.)

Regards,
Gabriel
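Gabriel's "just flush the whole cache" shortcut corresponds to the HID0[ICFI]
flash-invalidate bit. A rough sketch is below; the bit position and the
set-then-clear sequence follow common 7xx practice and are assumptions here,
and like all HID0 accesses this is supervisor-only and affects only the CPU
executing it.

    #define SPRN_HID0  1008
    #define HID0_ICFI  (1u << 11)    /* icache flash invalidate (assumed bit) */

    /* Invalidate the entire L1 instruction cache of the executing CPU.
     * Per the 750 manual quoted above, a page-sized icbi loop would end
     * up invalidating every set anyway, so this is the cheaper option. */
    static inline void icache_flash_invalidate(void)
    {
        unsigned long hid0;

        __asm__ __volatile__("mfspr %0,%1" : "=r"(hid0) : "i"(SPRN_HID0));
        __asm__ __volatile__("isync\n\t"
                             "mtspr %1,%0\n\t"    /* set ICFI: invalidate */
                             "mtspr %1,%2\n\t"    /* clear it again */
                             "isync"
                             : : "r"(hid0 | HID0_ICFI), "i"(SPRN_HID0),
                                 "r"(hid0));
    }

On a dual-750 board this still only cleans the local icache; making the other
CPU do the same would need an explicit cross-CPU call, which is the crux of
the whole thread.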
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Geert Uytterhoeven @ 2004-05-12 14:21 UTC
To: Gabriel Paubert; Cc: Paul Mackerras, Dan Malek, Amit Shah, Linux/PPC Development

On Wed, 12 May 2004, Gabriel Paubert wrote:
> Bottom line, 750 can't be used for SMP.

Solution: divide memory in pieces, run multiple instances of Linux, each
on its own CPU and memory piece, and use a piece of uncached RAM for
implementing communication channels between CPUs ;-)

Gr{oetje,eeting}s,

					Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.  But
when I'm talking to journalists I just say "programmer" or something like that.
					    -- Linus Torvalds
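A minimal sketch of what one of Geert's uncached communication channels could
look like: a single-slot mailbox polled by both kernels. The structure, names
and sizes are invented for illustration, and it assumes the region is mapped
cache-inhibited on both CPUs, so only store ordering (eieio) matters and no
cache flushing is needed. Polling also sidesteps the question raised earlier
of whether the Discovery chipset can raise inter-CPU doorbell interrupts.

    #include <stdint.h>

    /* One-way channel living in uncached RAM shared by both kernels. */
    struct channel {
        volatile uint32_t seq;        /* bumped by the producer per message */
        volatile uint32_t ack;        /* set to seq by the consumer when done */
        volatile uint32_t len;
        volatile uint8_t  data[244];  /* pads the structure to 256 bytes */
    };

    static inline void order_stores(void)
    {
        __asm__ __volatile__("eieio" ::: "memory");
    }

    /* Producer side: busy-waits until the previous message is consumed. */
    static int channel_send(struct channel *ch, const void *buf, uint32_t len)
    {
        if (len > sizeof(ch->data))
            return -1;
        while (ch->ack != ch->seq)
            ;                          /* previous message still in flight */
        for (uint32_t i = 0; i < len; i++)
            ch->data[i] = ((const uint8_t *)buf)[i];
        ch->len = len;
        order_stores();                /* payload before the seq update */
        ch->seq = ch->seq + 1;
        return 0;
    }

    /* Consumer side: returns the message length, or -1 if nothing new. */
    static int channel_recv(struct channel *ch, void *buf, uint32_t max)
    {
        if (ch->seq == ch->ack)
            return -1;
        uint32_t len = ch->len < max ? ch->len : max;
        for (uint32_t i = 0; i < len; i++)
            ((uint8_t *)buf)[i] = ch->data[i];
        order_stores();                /* finish reading before freeing the slot */
        ch->ack = ch->seq;
        return (int)len;
    }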
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-12 14:30 UTC
To: Geert Uytterhoeven; Cc: Gabriel Paubert, Paul Mackerras, Dan Malek, Linux/PPC Development

On Wednesday 12 May 2004 19:51, Geert Uytterhoeven wrote:
> On Wed, 12 May 2004, Gabriel Paubert wrote:
> > Bottom line, 750 can't be used for SMP.
>
> Solution: divide memory in pieces, run multiple instances of Linux, each
> on its own CPU and memory piece, and use a piece of uncached RAM for
> implementing communication channels between CPUs ;-)

Wow, I was thinking about the exact same thing when I read this mail.  I
still haven't grokked the PPC architecture well enough to comment on all
the mails in this thread, but I guess this would be the easiest and surest
way to make "SMP" work.

--
Amit Shah
http://amitshah.nav.to/
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Bryan Rittmeyer @ 2004-05-13  4:30 UTC
To: Geert Uytterhoeven; +Cc: Linux/PPC Development

On Wed, May 12, 2004 at 04:21:04PM +0200, Geert Uytterhoeven wrote:
> Solution: divide memory in pieces, run multiple instances of Linux, each
> on its own CPU and memory piece, and use a piece of uncached RAM for
> implementing communication channels between CPUs ;-)

Non-cacheable I/O throughput on the 60x bus is horrid; might be better to
put a 1000Mbps NIC on each CPU and cable em together ;-\

-Bryan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Geert Uytterhoeven @ 2004-05-14  8:02 UTC
To: Bryan Rittmeyer; +Cc: Linux/PPC Development

On Wed, 12 May 2004, Bryan Rittmeyer wrote:
> On Wed, May 12, 2004 at 04:21:04PM +0200, Geert Uytterhoeven wrote:
> > Solution: divide memory in pieces, run multiple instances of Linux, each
> > on its own CPU and memory piece, and use a piece of uncached RAM for
> > implementing communication channels between CPUs ;-)
>
> Non-cacheable I/O throughput on the 60x bus is horrid; might be better to
> put a 1000Mbps NIC on each CPU and cable em together ;-\

You can always put the real data in cacheable memory, and keep only some
control descriptors in uncached memory.  Needs some explicit cache
handling, but should be faster.

Gr{oetje,eeting}s,

					Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.  But
when I'm talking to journalists I just say "programmer" or something like that.
					    -- Linus Torvalds
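A sketch of how Geert's split could look on the producing side: the payload
goes into a cacheable shared buffer and is pushed to RAM with dcbf before a
small descriptor in uncached memory is marked valid. The names and sizes are
made up for illustration, and it deliberately assumes nothing about snooping
between the two partitions; a consumer on the other CPU would likewise dcbf
its own copies of the payload lines before reading them.

    #include <stdint.h>

    #define LINE_SIZE 32   /* 750-family L1 line size (assumed) */

    /* Write back and invalidate every cache line covering [p, p+len). */
    static void dcache_flush_range(const volatile void *p, unsigned long len)
    {
        uintptr_t a = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
        uintptr_t end = (uintptr_t)p + len;

        for (; a < end; a += LINE_SIZE)
            __asm__ __volatile__("dcbf 0,%0" : : "r"(a) : "memory");
        __asm__ __volatile__("sync");            /* wait until the lines hit RAM */
    }

    /* Control descriptor, placed in the uncached region. */
    struct xfer_desc {
        volatile uint32_t offset;                /* payload offset in shared buffer */
        volatile uint32_t len;
        volatile uint32_t valid;                 /* set last; cleared by consumer */
    };

    /* Producer: copy the payload into the cacheable shared buffer, flush it
     * to RAM, then publish the descriptor through uncached memory. */
    static void xfer_publish(struct xfer_desc *d, volatile uint8_t *shared_buf,
                             uint32_t off, const uint8_t *payload, uint32_t len)
    {
        for (uint32_t i = 0; i < len; i++)
            shared_buf[off + i] = payload[i];
        dcache_flush_range(shared_buf + off, len);
        d->offset = off;
        d->len = len;
        __asm__ __volatile__("eieio" ::: "memory");  /* body before valid flag */
        d->valid = 1;
    }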
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-14  9:11 UTC
To: Geert Uytterhoeven; +Cc: Bryan Rittmeyer, Linux/PPC Development

On Fri, May 14, 2004 at 10:02:10AM +0200, Geert Uytterhoeven wrote:
>
> You can always put the real data in cacheable memory, and keep only some
> control descriptors in uncached memory.  Needs some explicit cache
> handling, but should be faster.

No, the problem was the coherency of instruction and data caches.  Data
caches are just coherent; there is no shared state, so you'd rather avoid
having two processors actively reading from the same cache lines, but
that's about all.  Just map them through a non-execute segment so that
you are sure that the...

Hmm, now that I think of it, this means that one processor fetching an
instruction line will invalidate the same cache line in the L2 cache of
the other processor.  Which means that the L2 cache is actually useless
for sharing code, and you might actually force it to only cache data by
fiddling with HID0.

Well, MEI caches are actually worse than what I believed for SMP.  They
work well enough for UP with DMA.

Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11  3:08 UTC
To: Paul Mackerras, Amit Shah; +Cc: linuxppc-dev

The 750 series with the MEI protocol is a dog for a multi-processor
system.  If you are working on a high-end product, change it NOW to a
G4+; the 750 is a G3.  The GT-64260 is not good enough for MP yet,
something related to the 60x bus vs. MPX.  Anyway, for a PPC MP system
the best way is: 64360 + G4.

Also, try to read the latest errata when doing the design.

Huailin,

> > If not, what's the problem in getting it supported? An older mail by
> > Cort Dougan said it was because of TLB invalidate information not
> > being broadcasted, but it was a really old mail, has anyone come up
> > with any workarounds?
>
> The real killer is that the cache management instructions are not
> broadcast.  The fact that the TLB invalidations are not broadcast is
> painful but it can be worked around in the kernel.  In contrast, the
> cache management instructions are used in userspace.