* IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-10  7:28 UTC
To: linuxppc-dev

Hi all,

I was wondering if SMP is supported for the 750GX processor on the Marvell
Discovery boards. If not, what's the problem in getting it supported? An
older mail by Cort Dougan said it was because TLB invalidate information is
not broadcast, but that was a really old mail; has anyone come up with any
workarounds since?

Thanks,
Amit.

--
Amit Shah
http://amitshah.nav.to/
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-10 23:36 UTC
To: Amit Shah; +Cc: linuxppc-dev

Amit Shah writes:

> I was wondering if SMP is supported for the 750 GX processor built on the
> Marvell Discovery boards.

SMP is not supported for 750 processors.

> If not, what's the problem in getting it supported? An older mail by Cort
> Dougan said it was because of TLB invalidate information not being
> broadcasted, but it was a really old mail, has anyone come up with any
> workarounds?

The real killer is that the cache management instructions are not
broadcast.  The fact that the TLB invalidations are not broadcast is
painful but it can be worked around in the kernel.  In contrast, the
cache management instructions are used in userspace.

Paul.
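For reference, the userspace sequence Paul is referring to is the standard
PowerPC idiom for making freshly written instructions visible to instruction
fetch: write the modified data cache lines back with dcbst, then invalidate
the stale instruction cache lines with icbi. A minimal sketch in GNU C inline
assembly follows; the function name and the 32-byte line size are assumptions
here (the 750 family's L1 line size), not something given in the thread.

    #include <stdint.h>

    #define L1_LINE_SIZE 32   /* 750-family L1 cache line size (assumed) */

    /* Make instructions just written to [addr, addr+len) visible to
     * instruction fetch on the CPU executing this code.  On a 750-based
     * SMP board neither the dcbst nor the icbi is snooped by the other
     * CPU, which is exactly the problem discussed in this thread. */
    static void sync_icache_range(void *addr, unsigned long len)
    {
        uintptr_t a = (uintptr_t)addr & ~(uintptr_t)(L1_LINE_SIZE - 1);
        uintptr_t end = (uintptr_t)addr + len;
        uintptr_t p;

        for (p = a; p < end; p += L1_LINE_SIZE)    /* push data to memory */
            __asm__ __volatile__("dcbst 0,%0" : : "r"(p) : "memory");
        __asm__ __volatile__("sync");              /* wait for the writebacks */
        for (p = a; p < end; p += L1_LINE_SIZE)    /* drop stale icache lines */
            __asm__ __volatile__("icbi 0,%0" : : "r"(p) : "memory");
        __asm__ __volatile__("sync; isync");       /* discard prefetched insns */
    }

The scenarios discussed later in the thread all hinge on parts of this
sequence running on a different CPU from the one that will eventually fetch
the modified code.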
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Dan Malek @ 2004-05-11  2:09 UTC
To: Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

On May 10, 2004, at 7:36 PM, Paul Mackerras wrote:

> SMP is not supported for 750 processors.

You mean for IBM 750 processors :-)
The MPC75x processors support this.

	-- Dan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-11  3:03 UTC
To: Dan Malek; +Cc: Amit Shah, linuxppc-dev

Dan Malek writes:

> On May 10, 2004, at 7:36 PM, Paul Mackerras wrote:
>
> > SMP is not supported for 750 processors.
>
> You mean for IBM 750 processors :-)
> The MPC75x processors support this.

No, actually, I meant all 750 processors.  The MPC750 will broadcast
the cache operations if you set HID0[ABE], but that doesn't help us
because the MPC750 never snoops those operations.  According to page
2-63 of the MPC750 users manual: "Of the broadcast cache operations,
the MPC750 snoops only dcbz, regardless of the HID0[ABE] setting."

Paul.
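For context, HID0 is a supervisor-only special-purpose register (SPR 1008 on
the 7xx family), so enabling broadcast would be a per-CPU, boot-time operation
roughly like the sketch below. The ABE bit position is an assumption (it
follows the usual HID0_ABE definition), and, as Paul points out, setting it
does not make the other 750 snoop the broadcasts, so it does not solve the
problem by itself.

    #define SPRN_HID0  1008          /* HID0 special-purpose register number */
    #define HID0_ABE   (1u << 3)     /* address broadcast enable (assumed bit) */

    /* Supervisor-only: ask this CPU to broadcast its cache management
     * operations (dcbf, dcbst, icbi, ...) on the 60x bus. */
    static inline void hid0_set_abe(void)
    {
        unsigned long hid0;

        __asm__ __volatile__("mfspr %0,%1" : "=r"(hid0) : "i"(SPRN_HID0));
        hid0 |= HID0_ABE;
        __asm__ __volatile__("sync; mtspr %1,%0; isync"
                             : : "r"(hid0), "i"(SPRN_HID0));
    }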
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Dan Malek @ 2004-05-11 15:46 UTC
To: Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

On May 10, 2004, at 11:03 PM, Paul Mackerras wrote:

> ......According to page
> 2-63 of the MPC750 users manual: "Of the broadcast cache operations,
> the MPC750 snoops only dcbz, regardless of the HID0[ABE] setting."

But read the following sentence: "Any bus activity caused by other
cache instructions results directly from performing the operation on
the MPC750 cache."  A dcbz has to be broadcast; the others do not,
because their operations appear just as standard load/store ops.

The only thing we should have to do in software is the icbi, which is
no big deal to broadcast.

My experience has been that MPC750s work in an SMP environment on a
60x bus.  Maybe I was just lucky?  The way I read the manual, they
should work with a proper memory controller.

Thanks.

	-- Dan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11 17:23 UTC
To: Dan Malek, Paul Mackerras; +Cc: Amit Shah, linuxppc-dev

> My experience has been that MPC750s work in a SMP environment on a 60x
> bus.  Maybe I was just lucky?  The way I read the manual, they should
> work with a proper memory controller.

The 750's MEI protocol does NOT support SMP well.  Sure, you are right:
it works, but not that well.

Ideally, MESI plus the MPX bus is the best combination for SMP on the
PPC architecture.  That's why you need to go to a G4 plus the GT-64360.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-11 17:31 UTC
To: huailin; +Cc: Dan Malek, Paul Mackerras, linuxppc-dev

Hi all,

It's pretty strange, then, that people would come up with boards based on
dual processors of the 7xx family (like the IBM Argan board or the Xcalibur
boards).  Any idea how they use them, or what they are intended for?

Amit.

On Tuesday 11 May 2004 22:53, Huailin Chen wrote:
> > My experience has been that MPC750s work in a SMP environment
> > on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> > they should work with a proper memory controller.
> >
> > Thanks.
>
> 750 MEI protocol does NOT support SMP well.  Sure, you
> are right.  It works, but not that good.
>
> Ideally, MESI + MPX Bus is the best one for SMP under
> PPC arch.  That's why need go to G4+ GT 64360.

--
Amit Shah
Codito Technologies Pvt. Ltd.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11 20:51 UTC
To: Amit Shah; +Cc: Dan Malek, Paul Mackerras, linuxppc-dev

When talking about a multi-processor architecture, we have to consider
the chipset support, especially the bus protocol part.  For the
particular PowerPC G3 vs. G4 issue, the thing is: if you have a G4 with
MPX support, it would not be wise to have a chipset with ONLY 60x
support.  I mean, going for SMP is about achieving high performance, and
that goal will not be reached without good system bus throughput, which
usually is the real bottleneck for an appliance.

Also, I don't even think the GT-64260 has the doorbell mechanism with
which one CPU can send external interrupts to another.

As for the bus protocol itself, some signals/pins are different in MPX
than in 60x.  Most of them are for data streaming and the like, in order
to remove dead cycles on the bus.  For more detail, look at the data
sheet for your chipset.

Huailin,

--- Amit Shah <amit.shah@codito.com> wrote:
> It's pretty strange then that people would come up with boards based
> on dual-processors of the 7xx family (like the IBM Argan board or the
> Xcalibur boards).  Any idea how they use it or what are they intended
> for?
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12  0:17 UTC
To: Amit Shah; +Cc: huailin, Dan Malek, linuxppc-dev

Amit Shah writes:

> It's pretty strange then that people would come up with boards based on
> dual-processors of the 7xx family (like the IBM Argan board or the Xcalibur
> boards).  Any idea how they use it or what are they intended for?

I think it's the usual hardware designers' attitude that any bugs in
the hardware can be worked around in software.  And they can; it's just
that addressing the problems properly is going to take effort, and
no-one has yet had the time, energy and motivation to sit down and do
that (and do it in a way which is sufficiently clean and well thought
out to be accepted into the Linux kernel).

Fortunately Apple never did a 750-based SMP system. :)

Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12  0:12 UTC
To: Dan Malek; +Cc: Amit Shah, linuxppc-dev

Dan Malek writes:

> But, read the following sentence: "Any bus activity caused by other
> cache instructions results directly from performing the operation on
> the MPC750 cache."  A dcbz has to be broadcast, others do not because
> their operations appear just as standard load/store ops.
>
> The only thing we should have to do in software is the icbi, which is
> no big deal to broadcast.

I don't think you are right, but it would be nice if you can prove me
wrong. ;)

Consider this scenario: an application is modifying some instructions
(for example, ld.so modifying a PLT entry).  It modifies the
instructions, and then just before it does its dcbst; sync; icbi;
isync sequence, it gets scheduled on the other CPU.  It goes ahead and
does the dcbst.  However, the relevant cache lines aren't in the
cache (they are in the E state in the other CPU's cache), so nothing
gets written out to memory.  After doing the sync; icbi; isync it goes
to execute the instructions and gets the old instructions, not the new
ones.

The dcbst won't cause any stores to memory in this scenario.  It will
cause a dcbst address-only broadcast, but that won't (according to my
reading of the manual) cause the other CPU to write back its copy of
the relevant cache line, since the dcbst isn't snooped.

The only workaround I can see for this is to completely flush the D
and I caches of both CPUs whenever we schedule a process on a
different CPU from that on which it last ran.  Triple yuck.

> My experience has been that MPC750s work in a SMP environment
> on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> they should work with a proper memory controller.

I think that the sorts of problems I am talking about wouldn't show up
very often.  Generally I think that these problems would just cause
the system to be a bit flaky rather than stop it from working at all.
If you didn't have L2 caches that would make the problems show up less
frequently, too.

Regards,
Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Giuliano Pochini @ 2004-05-12  7:57 UTC
To: Paul Mackerras; +Cc: linuxppc-dev, Amit Shah, Dan Malek

On 12-May-2004 Paul Mackerras wrote:
>
> The only workaround I can see for this is to completely flush the D
> and I caches of both CPUs whenever we schedule a process on a
> different CPU from that on which it last ran.  Triple yuck.

It's not very different from what Linux does with NUMA systems, AFAIK.
If some cache management instructions cause trouble and it is an
embedded system, recompiling the userspace stuff to remove the
problematic parts may be a good solution.

--
Giuliano.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12  8:00 UTC
To: Paul Mackerras; +Cc: Dan Malek, Amit Shah, linuxppc-dev

On Wed, May 12, 2004 at 10:12:47AM +1000, Paul Mackerras wrote:
>
> Dan Malek writes:
>
> > But, read the following sentence: "Any bus activity caused by other
> > cache instructions results directly from performing the operation on
> > the MPC750 cache."  A dcbz has to be broadcast, others do not because
> > their operations appear just as standard load/store ops.
> >
> > The only thing we should have to do in software is the icbi, which is
> > no big deal to broadcast.
>
> I don't think you are right, but it would be nice if you can prove me
> wrong. ;)
>
> Consider this scenario: an application is modifying some instructions
> (for example, ld.so modifying a PLT entry).  It modifies the
> instructions, and then just before it does its dcbst; sync; icbi;
> isync sequence, it gets scheduled on the other CPU.  It goes ahead and
> does the dcbst.  However, the relevant cache lines aren't in the
> cache (they are in the E state in the other CPU's cache), so nothing
> gets written out to memory.  After doing the sync; icbi; isync it goes
> to execute the instructions and gets the old instructions, not the new
> ones.

Are you sure?  Since the cache lines are in the other processor memory,
they will be flushed to RAM when they are fetched by the processor,
provided that you can force the coherence bit on instruction fetches
(this is possible IIRC).

The nastiest scenario is, I believe:
 - proceeding up to the icbi or isync on processor 1,
 - scheduling and switching the process to processor 2,
 - the instructions were already in the icache on processor 2
   for some reason (PLT entries are half a cache line long IIRC).

The only solution to this is a full icache invalidate when a process
changes processors.  Threading might however make things worse, because
threads are entitled to believe from the architecture specification
that icbi will affect other threads simultaneously running on other
processors.  And that has no clean solution AFAICS.

BTW, did I dream, or did I read somewhere that on a PPC750 icbi flushes
all the cache ways (using only 7 bits of the address)?  This would mean
that flushing a page's worth of instruction cache flushes the whole
cache, and setting HID0[ICFI] might be faster.

> The dcbst won't cause any stores to memory in this scenario.  It will
> cause a dcbst address-only broadcast but that won't (according to my
> reading of the manual) cause the other CPU to write back its copy of
> the relevant cache line, since the dcbst isn't snooped.

Yeah, but the subsequent fetch will be snooped if it's marked coherent.
dcbst is really only necessary because instruction fetches don't look
into the L1 data cache of the same processor.

> The only workaround I can see for this is to completely flush the D
> and I caches of both CPUs whenever we schedule a process on a
> different CPU from that on which it last ran.  Triple yuck.

As I said, I believe the real problem is multithreaded applications.

> > My experience has been that MPC750s work in a SMP environment
> > on a 60x bus.  Maybe I was just lucky?  The way I read the manual,
> > they should work with a proper memory controller.
>
> I think that the sorts of problems I am talking about wouldn't show up
> very often.  Generally I think that these problems would just cause
> the system to be a bit flaky rather than stop it from working at all.

I agree.

> If you didn't have L2 caches that would make the problems show up less
> frequently, too.

I'm not so sure.  Instruction fetches look into L2 caches.  The main
issues are:
 1) are the instruction fetches marked coherent?
 2) do you run multithreaded applications?

If the answers are yes and no, then I don't see any showstopper.

Regards,
Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Benjamin Herrenschmidt @ 2004-05-12 10:26 UTC
To: Gabriel Paubert; +Cc: Paul Mackerras, Dan Malek, Amit Shah, linuxppc-dev list

> Are you sure?  Since the cache lines are in the other processor memory,
> they will be flushed to RAM when they are fetched by the processor,
> provided that you can force the coherence bit on instruction fetches
> (this is possible IIRC).

Coherency of the data cache lines is one thing... getting the icbi
broadcast is another.  Normal coherency will not help if you don't get
the icache of the other CPU to snoop your icbi and invalidate the trash
it has in its icache.

> As I said, I believe the real problem is multithreaded applications.

Which isn't a simple problem...

--
Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12 11:53 UTC
To: Benjamin Herrenschmidt; Cc: Paul Mackerras, Dan Malek, Amit Shah, linuxppc-dev list

On Wed, May 12, 2004 at 08:26:19PM +1000, Benjamin Herrenschmidt wrote:
>
> > Are you sure?  Since the cache lines are in the other processor memory,
> > they will be flushed to RAM when they are fetched by the processor,
> > provided that you can force the coherence bit on instruction fetches
> > (this is possible IIRC).
>
> Coherency of the data cache lines is one thing... getting the icbi
> broadcast is another.  Normal coherency will not help if you don't get
> the icache of the other CPU to snoop your icbi and invalidate the trash
> it has in its icache.
>
> > As I said, I believe the real problem is multithreaded applications.
>
> Which isn't a simple problem...

Indeed, it is actually not solvable in a reasonable way, disabling the
icache being far too unreasonable ;-)

But my point was that Paul's example, one process being rescheduled on
another processor, is actually quite solvable (provided it is the sole
owner of the MM context).  You don't lose much by flushing the icache
on a MEI system compared with the hardware overhead of all the
invalidations and flushing that will take place because of the process
switch.

Regards,
Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Paul Mackerras @ 2004-05-12 11:46 UTC
To: Gabriel Paubert; +Cc: Dan Malek, Amit Shah, linuxppc-dev

Gabriel Paubert writes:

> Are you sure?  Since the cache lines are in the other processor memory,
> they will be flushed to RAM when they are fetched by the processor,
> provided that you can force the coherence bit on instruction fetches
> (this is possible IIRC).

The table on page 3-29 of the 750 user manual implies that GBL is
asserted if M=1 on instruction fetches.  So you're right.

> The most nasty scenario is I believe:
> - proceeding up to icbi or isync on processor 1,
> - scheduling and switching the process to processor 2
> - the instructions were already in the icache on processor 2
>   for some reasons (PLT entries are half a cache line long IIRC)

Another bad scenario would be:

- write the instructions on processor 1
- switch the process to processor 2
- it does the dcbst + sync, which do nothing
- switch the process back to processor 1
- icbi, isync, try to execute the instructions

In this scenario the instructions don't get written back to memory.
So it sounds like when we switch a process from cpu A to cpu B, we
would need to (at least) flush cpu A's data cache and cpu B's
instruction cache.

Basically you can't rely on any cache management instructions being
effective, because they could be executed on a different processor
from the one where you need to execute them.  This is true inside the
kernel as well if you have preemption enabled (you can of course
disable preemption where necessary, but you have to find and modify
all those places).  This will also affect the lazy cache flush logic
that we have, which defers doing the dcache/icache flush on a page
until the page gets mapped into a user process.

> The only solution to this is full icache invalidate when a process
> changes processors.  Threading might however make things worse
> because threads are entitled to believe from the architecture
> specification that icbi will affect other threads simultaneously
> running on other processors.  And that has no clean solution AFAICS.

Indeed, I can't see one either.  Not being able to use threads takes
some of the fun out of SMP, of course.

> BTW, did I dream or did I read somewhere that on a PPC750 icbi
> flushes all the cache ways (using only 7 bits of the address).

Page 2-64 says about icbi: "All ways of a selected set are
invalidated".  It seems that saves them having to actually translate
the effective address. :)  That means that the kernel doing the
dcache/icache flush on a page is going to invalidate the whole
icache.  Ew...

Regards,
Paul.
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-12 13:45 UTC
To: Paul Mackerras; +Cc: Dan Malek, Amit Shah, linuxppc-dev

On Wed, May 12, 2004 at 09:46:19PM +1000, Paul Mackerras wrote:
>
> Another bad scenario would be:
>
> - write the instructions on processor 1
> - switch the process to processor 2
> - it does the dcbst + sync, which do nothing
> - switch the process back to processor 1
> - icbi, isync, try to execute the instructions
>
> In this scenario the instructions don't get written back to memory.
> So it sounds like when we switch a process from cpu A to cpu B, we
> would need to (at least) flush cpu A's data cache and cpu B's
> instruction cache.

Argh, I did not think of that case.  Switching twice in two
instructions is too devious for me ;-)

It is also probably much harder to hit than the example I gave (which
requires either two process switches or a multithreaded application),
but correctness indeed requires a data cache flush.  Data cache flushes
are evil!

Strictly speaking, I believe that only the L1 cache needs to be
flushed, since instruction fetches will look at L2, but I had hoped
that a simple flash invalidate of the icache would be sufficient, and
it's not.

> Basically you can't rely on any cache management instructions being
> effective, because they could be executed on a different processor
> from the one where you need to execute them.  This is true inside the
> kernel as well if you have preemption enabled (you can of course
> disable preemption where necessary, but you have to find and modify
> all those places).  This will also affect the lazy cache flush logic
> that we have that defers doing the dcache/icache flush on a page until
> the page gets mapped into a user process.

I've never looked at that logic so I can't comment.

> > The only solution to this is full icache invalidate when a process
> > changes processors.  Threading might however make things worse
> > because threads are entitled to believe from the architecture
> > specification that icbi will affect other threads simultaneously
> > running on other processors.  And that has no clean solution AFAICS.
>
> Indeed, I can't see one either.  Not being able to use threads takes
> some of the fun out of SMP, of course.

Bottom line: the 750 can't be used for SMP.

> > BTW, did I dream or did I read somewhere that on a PPC750 icbi
> > flushes all the cache ways (using only 7 bits of the address).
>
> Page 2-64 says about icbi: "All ways of a selected set are
> invalidated".  It seems that saves them having to actually translate
> the effective address. :)  That means that the kernel doing the
> dcache/icache flush on a page is going to invalidate the whole
> icache.  Ew...

Be more optimistic, consider this an optimization opportunity!  Don't
loop over the lines, simply flush the whole cache, especially if you
want to flush several pages.

For example, if I understand what you mean by lazy cache flushing: once
you have done an icache flush when mapping a page to userspace, you
don't need to perform any other until a page has been unmapped.  (This
can probably be improved upon, but it's a start.)

Regards,
Gabriel
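Gabriel's "just flush the whole cache" shortcut corresponds to the HID0[ICFI]
flash-invalidate bit. A rough sketch is below; the bit position and the
set-then-clear sequence follow common 7xx practice and are assumptions here,
and like all HID0 accesses this is supervisor-only and affects only the CPU
executing it.

    #define SPRN_HID0  1008
    #define HID0_ICFI  (1u << 11)    /* icache flash invalidate (assumed bit) */

    /* Invalidate the entire L1 instruction cache of the executing CPU.
     * Per the 750 manual quoted above, a page-sized icbi loop would end
     * up invalidating every set anyway, so this is the cheaper option. */
    static inline void icache_flash_invalidate(void)
    {
        unsigned long hid0;

        __asm__ __volatile__("mfspr %0,%1" : "=r"(hid0) : "i"(SPRN_HID0));
        __asm__ __volatile__("isync\n\t"
                             "mtspr %1,%0\n\t"    /* set ICFI: invalidate */
                             "mtspr %1,%2\n\t"    /* clear it again */
                             "isync"
                             : : "r"(hid0 | HID0_ICFI), "i"(SPRN_HID0),
                                 "r"(hid0));
    }

On a dual-750 board this still only cleans the local icache; making the other
CPU do the same would need an explicit cross-CPU call, which is the crux of
the whole thread.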
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Geert Uytterhoeven @ 2004-05-12 14:21 UTC
To: Gabriel Paubert; Cc: Paul Mackerras, Dan Malek, Amit Shah, Linux/PPC Development

On Wed, 12 May 2004, Gabriel Paubert wrote:
> Bottom line, 750 can't be used for SMP.

Solution: divide memory in pieces, run multiple instances of Linux, each
on its own CPU and memory piece, and use a piece of uncached RAM for
implementing communication channels between CPUs ;-)

Gr{oetje,eeting}s,

					Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.  But
when I'm talking to journalists I just say "programmer" or something like that.
					    -- Linus Torvalds
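A minimal sketch of what one of Geert's uncached communication channels could
look like: a single-slot mailbox polled by both kernels. The structure, names
and sizes are invented for illustration, and it assumes the region is mapped
cache-inhibited on both CPUs, so only store ordering (eieio) matters and no
cache flushing is needed. Polling also sidesteps the question raised earlier
of whether the Discovery chipset can raise inter-CPU doorbell interrupts.

    #include <stdint.h>

    /* One-way channel living in uncached RAM shared by both kernels. */
    struct channel {
        volatile uint32_t seq;        /* bumped by the producer per message */
        volatile uint32_t ack;        /* set to seq by the consumer when done */
        volatile uint32_t len;
        volatile uint8_t  data[244];  /* pads the structure to 256 bytes */
    };

    static inline void order_stores(void)
    {
        __asm__ __volatile__("eieio" ::: "memory");
    }

    /* Producer side: busy-waits until the previous message is consumed. */
    static int channel_send(struct channel *ch, const void *buf, uint32_t len)
    {
        if (len > sizeof(ch->data))
            return -1;
        while (ch->ack != ch->seq)
            ;                          /* previous message still in flight */
        for (uint32_t i = 0; i < len; i++)
            ch->data[i] = ((const uint8_t *)buf)[i];
        ch->len = len;
        order_stores();                /* payload before the seq update */
        ch->seq = ch->seq + 1;
        return 0;
    }

    /* Consumer side: returns the message length, or -1 if nothing new. */
    static int channel_recv(struct channel *ch, void *buf, uint32_t max)
    {
        if (ch->seq == ch->ack)
            return -1;
        uint32_t len = ch->len < max ? ch->len : max;
        for (uint32_t i = 0; i < len; i++)
            ((uint8_t *)buf)[i] = ch->data[i];
        order_stores();                /* finish reading before freeing the slot */
        ch->ack = ch->seq;
        return (int)len;
    }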
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Amit Shah @ 2004-05-12 14:30 UTC
To: Geert Uytterhoeven; Cc: Gabriel Paubert, Paul Mackerras, Dan Malek, Linux/PPC Development

On Wednesday 12 May 2004 19:51, Geert Uytterhoeven wrote:
> On Wed, 12 May 2004, Gabriel Paubert wrote:
> > Bottom line, 750 can't be used for SMP.
>
> Solution: divide memory in pieces, run multiple instances of Linux, each
> on its own CPU and memory piece, and use a piece of uncached RAM for
> implementing communication channels between CPUs ;-)

Wow, I was thinking about the exact same thing when I read this mail.  I
still haven't grokked the PPC architecture well enough to comment on all
the mails in this thread, but I guess this would be the easiest and surest
way to make "SMP" work.

--
Amit Shah
http://amitshah.nav.to/
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Bryan Rittmeyer @ 2004-05-13  4:30 UTC
To: Geert Uytterhoeven; +Cc: Linux/PPC Development

On Wed, May 12, 2004 at 04:21:04PM +0200, Geert Uytterhoeven wrote:
> Solution: divide memory in pieces, run multiple instances of Linux, each
> on its own CPU and memory piece, and use a piece of uncached RAM for
> implementing communication channels between CPUs ;-)

Non-cacheable I/O throughput on the 60x bus is horrid; might be better to
put a 1000Mbps NIC on each CPU and cable em together ;-\

-Bryan
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Geert Uytterhoeven @ 2004-05-14  8:02 UTC
To: Bryan Rittmeyer; +Cc: Linux/PPC Development

On Wed, 12 May 2004, Bryan Rittmeyer wrote:
> On Wed, May 12, 2004 at 04:21:04PM +0200, Geert Uytterhoeven wrote:
> > Solution: divide memory in pieces, run multiple instances of Linux, each
> > on its own CPU and memory piece, and use a piece of uncached RAM for
> > implementing communication channels between CPUs ;-)
>
> Non-cacheable I/O throughput on the 60x bus is horrid; might be better to
> put a 1000Mbps NIC on each CPU and cable em together ;-\

You can always put the real data in cacheable memory, and keep only some
control descriptors in uncached memory.  Needs some explicit cache
handling, but should be faster.

Gr{oetje,eeting}s,

					Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.  But
when I'm talking to journalists I just say "programmer" or something like that.
					    -- Linus Torvalds
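A sketch of how Geert's split could look on the producing side: the payload
goes into a cacheable shared buffer and is pushed to RAM with dcbf before a
small descriptor in uncached memory is marked valid. The names and sizes are
made up for illustration, and it deliberately assumes nothing about snooping
between the two partitions; a consumer on the other CPU would likewise dcbf
its own copies of the payload lines before reading them.

    #include <stdint.h>

    #define LINE_SIZE 32   /* 750-family L1 line size (assumed) */

    /* Write back and invalidate every cache line covering [p, p+len). */
    static void dcache_flush_range(const volatile void *p, unsigned long len)
    {
        uintptr_t a = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
        uintptr_t end = (uintptr_t)p + len;

        for (; a < end; a += LINE_SIZE)
            __asm__ __volatile__("dcbf 0,%0" : : "r"(a) : "memory");
        __asm__ __volatile__("sync");            /* wait until the lines hit RAM */
    }

    /* Control descriptor, placed in the uncached region. */
    struct xfer_desc {
        volatile uint32_t offset;                /* payload offset in shared buffer */
        volatile uint32_t len;
        volatile uint32_t valid;                 /* set last; cleared by consumer */
    };

    /* Producer: copy the payload into the cacheable shared buffer, flush it
     * to RAM, then publish the descriptor through uncached memory. */
    static void xfer_publish(struct xfer_desc *d, volatile uint8_t *shared_buf,
                             uint32_t off, const uint8_t *payload, uint32_t len)
    {
        for (uint32_t i = 0; i < len; i++)
            shared_buf[off + i] = payload[i];
        dcache_flush_range(shared_buf + off, len);
        d->offset = off;
        d->len = len;
        __asm__ __volatile__("eieio" ::: "memory");  /* body before valid flag */
        d->valid = 1;
    }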
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Gabriel Paubert @ 2004-05-14  9:11 UTC
To: Geert Uytterhoeven; +Cc: Bryan Rittmeyer, Linux/PPC Development

On Fri, May 14, 2004 at 10:02:10AM +0200, Geert Uytterhoeven wrote:
>
> You can always put the real data in cacheable memory, and keep only some
> control descriptors in uncached memory.  Needs some explicit cache
> handling, but should be faster.

No, the problem was the coherency of instruction and data caches.  Data
caches are just coherent; there is no shared state, so you'd rather avoid
having two processors actively reading from the same cache lines, but
that's about all.  Just map them through a non-execute segment so that
you are sure that the...

Hmm, now that I think of it, this means that one processor fetching an
instruction line will invalidate the same cache line in the L2 cache of
the other processor.  Which means that the L2 cache is actually useless
for sharing code, and you might actually force it to only cache data by
fiddling with HID0.

Well, MEI caches are actually worse than what I believed for SMP.  They
work well enough for UP with DMA.

Gabriel
* Re: IBM 750GX SMP on Marvell Discovery II or III?

From: Huailin Chen @ 2004-05-11  3:08 UTC
To: Paul Mackerras, Amit Shah; +Cc: linuxppc-dev

The 750 series with the MEI protocol is a dog for a multi-processor
system.  If you are working on a high-end product, change it NOW to a
G4+; the 750 is a G3.  The GT-64260 is not good enough for MP yet,
something related to the 60x bus vs. MPX.  Anyway, for a PPC MP system
the best way is: 64360 + G4.

Also, try to read the latest errata when doing the design.

Huailin,

> > If not, what's the problem in getting it supported? An older mail by
> > Cort Dougan said it was because of TLB invalidate information not
> > being broadcasted, but it was a really old mail, has anyone come up
> > with any workarounds?
>
> The real killer is that the cache management instructions are not
> broadcast.  The fact that the TLB invalidations are not broadcast is
> painful but it can be worked around in the kernel.  In contrast, the
> cache management instructions are used in userspace.