linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
@ 2007-11-06 22:40 Yuri Tikhonov
  2007-11-28 19:50 ` Eugene Surovegin
  0 siblings, 1 reply; 7+ messages in thread
From: Yuri Tikhonov @ 2007-11-06 22:40 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: sr, dzu


 Hello all,

 Here is a patch set that adds L2-cache synchronization routines for
the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so
the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to
someone who uses one of the boards listed below :)].

[PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
[PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR,
Katmai, Ocotea, and Taishan.

 Regards, Yuri

-- 
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2007-11-06 22:40 [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x Yuri Tikhonov
@ 2007-11-28 19:50 ` Eugene Surovegin
  2008-01-11 15:24   ` Yuri Tikhonov
  0 siblings, 1 reply; 7+ messages in thread
From: Eugene Surovegin @ 2007-11-28 19:50 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: linuxppc-dev, sr, dzu

On Wed, Nov 07, 2007 at 01:40:10AM +0300, Yuri Tikhonov wrote:
> 
>  Hello all,
> 
>  Here is a patch set that adds L2-cache synchronization routines for
> the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so
> the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to
> someone who uses one of the boards listed below :)].
> 
> [PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
> [PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR,
> Katmai, Ocotea, and Taishan.

Why is this all needed?

IIRC ibm440gx_l2c_enable() configures 64G snoop region for L2C.

Did AMCC recently make chips whose L2C is not fully coherent?

-- 
Eugene


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2007-11-28 19:50 ` Eugene Surovegin
@ 2008-01-11 15:24   ` Yuri Tikhonov
  2008-01-11 17:41     ` Eugene Surovegin
  0 siblings, 1 reply; 7+ messages in thread
From: Yuri Tikhonov @ 2008-01-11 15:24 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, sr, dzu


 Hello, Eugene,

 The h/w snooping mechanism you are talking about is limited to the Low 
Latency (LL) segment of the PLB bus in ppc440sp and ppc440spe chips (see 
section "7.2.7 L2 Cache Coherency" of the ppc440spe spec), whereas DMA and 
XOR engines use the High Bandwidth (HB) segment of PLB bus (see 
section "1.1.2 Internal Buses" of the ppc440spe spec).

 Thus, the h/w snooping mechanism cannot observe the results of 
operations performed by the DMA and XOR engines, and so cannot keep the L2 cache 
coherent with SDRAM, because that data flows through the HB PLB segment. This 
leads to, for example, incorrect results of RAID-parity calculations if one uses 
the h/w accelerated ppc440spe ADMA driver with the L2 cache enabled.
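
 As an illustration only (this is not code from the patches), the stale-data 
problem can be modelled with a toy one-line cache. Everything here is 
hypothetical: the names toy_cache, cpu_read(), engine_write() and 
toy_invalidate(), the 32-byte line size, and the tiny "SDRAM" array are 
assumptions made for the sketch, not the ppc440spe hardware interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u   /* assumed line size, for illustration only */
#define MEM_SIZE  128u  /* size of the toy "SDRAM" */

/* Toy model: a single cached line that shadows part of memory. */
struct toy_cache {
	bool     valid;
	uint32_t base;             /* line-aligned address held in the cache */
	uint8_t  data[LINE_SIZE];
};

static uint8_t sdram[MEM_SIZE];

/* CPU read: served from the cache on a hit, otherwise the line is filled. */
static uint8_t cpu_read(struct toy_cache *c, uint32_t addr)
{
	uint32_t base = addr & ~(LINE_SIZE - 1u);

	assert(addr < MEM_SIZE);
	if (!c->valid || c->base != base) {
		memcpy(c->data, &sdram[base], LINE_SIZE);
		c->base = base;
		c->valid = true;
	}
	return c->data[addr - base];
}

/* Engine write on a segment the cache does not snoop: memory only changes. */
static void engine_write(uint32_t addr, uint8_t value)
{
	assert(addr < MEM_SIZE);
	sdram[addr] = value;
}

/* Software invalidate: drop the line so the next read refills from memory. */
static void toy_invalidate(struct toy_cache *c)
{
	c->valid = false;
}
```

 In this model a cpu_read() that follows an engine_write() to an already 
cached address still returns the old byte until toy_invalidate() is called, 
which is the kind of stale read that would corrupt a parity check.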

 The s/w synchronization algorithms proposed in my patches do not have the LL 
PLB limitation of h/w snooping, but this is probably not the best way to 
implement it. Even though with these patches the h/w accelerated RAID starts 
to operate correctly (with the L2 cache enabled), a performance degradation 
(induced by the loops in the L2-cache synchronization routines) is observed in 
most cases. As a result, there is no benefit from using the L2 cache for these 
RAID cases at all.
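
 To give a feel for where the loop cost comes from, here is a minimal sketch 
of a range walk, one cache line per iteration. It is not the 
invalidate_l2cache_range() from the patches: sketch_invalidate_range(), 
invalidate_one_line() and the 32-byte L2_LINE_SIZE are assumptions for 
illustration, and the per-line hardware operation is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_LINE_SIZE 32u  /* assumed line size, for illustration only */

/* Counts how many per-line operations the sketch has issued. */
static unsigned long lines_invalidated;

/* Hypothetical stand-in for the real per-line hardware operation. */
static void invalidate_one_line(uintptr_t line_addr)
{
	(void)line_addr;
	lines_invalidated++;
}

/*
 * Walk [start, end) one cache line at a time and return the number of
 * lines touched. The start address is rounded down to a line boundary so
 * a partially covered first line is included.
 */
static size_t sketch_invalidate_range(uintptr_t start, uintptr_t end)
{
	uintptr_t addr;
	size_t n = 0;

	if (end <= start)
		return 0;

	for (addr = start & ~(uintptr_t)(L2_LINE_SIZE - 1u);
	     addr < end; addr += L2_LINE_SIZE) {
		invalidate_one_line(addr);
		n++;
	}
	return n;
}
```

 Under these assumptions a 4096-byte buffer costs 128 iterations per 
transfer, so the work grows linearly with the amount of data the engines 
move.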

 Regards, Yuri

On Wednesday 28 November 2007 22:50, Eugene Surovegin wrote:
> On Wed, Nov 07, 2007 at 01:40:10AM +0300, Yuri Tikhonov wrote:
> > 
> >  Hello all,
> > 
> >  Here is a patch set that adds L2-cache synchronization routines for
> > the ppc44x processor family. I know that the "ppc" branch is for bug fixes only, so
> > the patch set is just FYI [though an enabled but non-coherent L2 cache may look like a bug to
> > someone who uses one of the boards listed below :)].
> > 
> > [PATCH 1/2] [PPC 4xx] invalidate_l2cache_range() implementation for ppc44x;
> > [PATCH 2/2] [PPC 44x] enable L2-cache for the following ppc44x-based boards: ALPR,
> > Katmai, Ocotea, and Taishan.
> 
> Why is this all needed?
> 
> IIRC ibm440gx_l2c_enable() configures 64G snoop region for L2C.
> 
> Did AMCC recently make chips whose L2C is not fully coherent?
> 
> -- 
> Eugene
> 
> 

-- 
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2008-01-11 15:24   ` Yuri Tikhonov
@ 2008-01-11 17:41     ` Eugene Surovegin
  2008-01-11 22:05       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Eugene Surovegin @ 2008-01-11 17:41 UTC (permalink / raw)
  To: Yuri Tikhonov; +Cc: linuxppc-dev, sr, dzu

On Fri, Jan 11, 2008 at 06:24:46PM +0300, Yuri Tikhonov wrote:
> 
>  Hello, Eugene,
> 
>  The h/w snooping mechanism you are talking about is limited to the Low 
> Latency (LL) segment of the PLB bus in ppc440sp and ppc440spe chips (see 
> section "7.2.7 L2 Cache Coherency" of the ppc440spe spec), whereas DMA and 
> XOR engines use the High Bandwidth (HB) segment of PLB bus (see 
> section "1.1.2 Internal Buses" of the ppc440spe spec).
> 
>  Thus, the h/w snooping mechanism cannot observe the results of 
> operations performed by the DMA and XOR engines, and so cannot keep the L2 cache 
> coherent with SDRAM, because that data flows through the HB PLB segment. This 
> leads to, for example, incorrect results of RAID-parity calculations if one uses 
> the h/w accelerated ppc440spe ADMA driver with the L2 cache enabled.
> 
>  The s/w synchronization algorithms proposed in my patches do not have the LL 
> PLB limitation of h/w snooping, but this is probably not the best way to 
> implement it. Even though with these patches the h/w accelerated RAID starts 
> to operate correctly (with the L2 cache enabled), a performance degradation 
> (induced by the loops in the L2-cache synchronization routines) is observed in 
> most cases. As a result, there is no benefit from using the L2 cache for these 
> RAID cases at all.

Thanks a lot for the explanation, Yuri. I'd never have imagined they would be 
so stupid as to make new chips with such behaviour.

-- 
Eugene


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2008-01-11 17:41     ` Eugene Surovegin
@ 2008-01-11 22:05       ` Benjamin Herrenschmidt
  2008-01-11 22:38         ` Eugene Surovegin
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-01-11 22:05 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, sr, dzu


> >  The s/w synchronization algorithms proposed in my patches do not have the LL 
> > PLB limitation of h/w snooping, but this is probably not the best way to 
> > implement it. Even though with these patches the h/w accelerated RAID starts 
> > to operate correctly (with the L2 cache enabled), a performance degradation 
> > (induced by the loops in the L2-cache synchronization routines) is observed in 
> > most cases. As a result, there is no benefit from using the L2 cache for these 
> > RAID cases at all.
> 
> Thanks a lot for the explanation, Yuri. I'd never have imagined they would be 
> so stupid as to make new chips with such behaviour.

Indeed. Now the question is do we want to make that configurable by the
platform so it can select whether to enable snooping, or use this
mechanism (in which case we can disable snooping on the L2) ?

Another option would be to make the dma_ops smart enough to know whether
a given device is on the snooped portion of the bus, which would be
easier to do after I merge 32 and 64 bits DMA ops, so we get the ability
to change the dma-ops per bus or per device even.

What do you guys think ?

Cheers,
Ben.


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2008-01-11 22:05       ` Benjamin Herrenschmidt
@ 2008-01-11 22:38         ` Eugene Surovegin
  2008-01-12  1:52           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Eugene Surovegin @ 2008-01-11 22:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, sr, dzu

On Sat, Jan 12, 2008 at 09:05:35AM +1100, Benjamin Herrenschmidt wrote:
> 
> > >  The s/w synchronization algorithms proposed in my patches do not have the LL 
> > > PLB limitation of h/w snooping, but this is probably not the best way to 
> > > implement it. Even though with these patches the h/w accelerated RAID starts 
> > > to operate correctly (with the L2 cache enabled), a performance degradation 
> > > (induced by the loops in the L2-cache synchronization routines) is observed in 
> > > most cases. As a result, there is no benefit from using the L2 cache for these 
> > > RAID cases at all.
> > 
> > Thanks a lot for the explanation, Yuri. I'd never have imagined they would be 
> > so stupid as to make new chips with such behaviour.
> 
> Indeed. Now the question is do we want to make that configurable by the
> platform so it can select whether to enable snooping, or use this
> mechanism (in which case we can disable snooping on the L2) ?

I don't think we should punish platforms with sane L2 caches just because 
there are some brain-dead ones.

> Another option would be to make the dma_ops smart enough to know whether
> a given device is on the snooped portion of the bus, which would be
> easier to do after I merge 32 and 64 bits DMA ops, so we get the ability
> to change the dma-ops per bus or per device even.
> 
> What do you guys think ?

I like the idea of having smart DMA routines with different 
per-bus/device behaviour.

-- 
Eugene

 


* Re: [PATCH 0/2] [PPC 4xx] L2-cache synchronization for ppc44x
  2008-01-11 22:38         ` Eugene Surovegin
@ 2008-01-12  1:52           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2008-01-12  1:52 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, sr, dzu


On Fri, 2008-01-11 at 14:38 -0800, Eugene Surovegin wrote:
> On Sat, Jan 12, 2008 at 09:05:35AM +1100, Benjamin Herrenschmidt wrote:
> > 
> > > >  The s/w synchronization algorithms proposed in my patches do not have the LL 
> > > > PLB limitation of h/w snooping, but this is probably not the best way to 
> > > > implement it. Even though with these patches the h/w accelerated RAID starts 
> > > > to operate correctly (with the L2 cache enabled), a performance degradation 
> > > > (induced by the loops in the L2-cache synchronization routines) is observed in 
> > > > most cases. As a result, there is no benefit from using the L2 cache for these 
> > > > RAID cases at all.
> > > 
> > > Thanks a lot for the explanation, Yuri. I'd never have imagined they would be 
> > > so stupid as to make new chips with such behaviour.
> > 
> > Indeed. Now the question is do we want to make that configurable by the
> > platform so it can select whether to enable snooping, or use this
> > mechanism (in which case we can disable snooping on the L2) ?
> 
> I don't think we should punish platforms with sane L2 caches just because 
> there are some brain-dead ones.

I agree, which is why I'm thinking about making it some kind of explicit
thing that a given platform would call from its setup_arch() callback
to turn on manual L2 synchronization.

> > Another option would be to make the dma_ops smart enough to know whether
> > a given device is on the snooped portion of the bus, which would be
> > easier to do after I merge 32 and 64 bits DMA ops, so we get the ability
> > to change the dma-ops per bus or per device even.
> > 
> > What do you guys think ?
> 
> I like the idea of having smart DMA routines with different 
> per-bus/device behaviour.

That would be longer term. When I merge the dma ops, I'll look into a
way to provide 44x specific DMA ops that handle that case, and then a
way for devices to be tagged (maybe via the device-tree) on whether they
are on an L2 coherent or non-L2 coherent segment of the bus.
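
Purely as a sketch of the idea (none of these names exist in the kernel;
sketch_dma_ops, sketch_device and the l2_coherent tag are hypothetical),
per-device selection could look roughly like this, with the tag standing in
for whatever the device-tree or platform code would provide:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced ops table: only the sync hook is modelled. */
struct sketch_dma_ops {
	const char *name;
	void (*sync_for_cpu)(void *buf, size_t len);
};

/* Counts how often the software synchronization path was taken. */
static unsigned long sw_sync_calls;

/* Device sits on the snooped segment: hardware keeps the L2 coherent. */
static void sync_noop(void *buf, size_t len)
{
	(void)buf;
	(void)len;
}

/* Device sits on a non-snooped segment: a real version would invalidate. */
static void sync_sw_invalidate(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	sw_sync_calls++;
}

static const struct sketch_dma_ops coherent_ops = {
	"l2-coherent", sync_noop
};
static const struct sketch_dma_ops noncoherent_ops = {
	"l2-noncoherent", sync_sw_invalidate
};

struct sketch_device {
	const char *name;
	bool l2_coherent;                  /* hypothetical per-device tag */
	const struct sketch_dma_ops *ops;
};

/* Pick the ops table once, at device setup time, from the tag. */
static void sketch_device_init(struct sketch_device *dev, const char *name,
			       bool l2_coherent)
{
	assert(dev != NULL);
	dev->name = name;
	dev->l2_coherent = l2_coherent;
	dev->ops = l2_coherent ? &coherent_ops : &noncoherent_ops;
}
```

With something of this shape, devices on the snooped segment pay nothing
extra, and only devices tagged as non-coherent go through the software
synchronization path.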

Cheers,
Ben.

