linuxppc-dev.lists.ozlabs.org archive mirror
* consistent_alloc() on 82xx
@ 2005-06-08  9:29 Alex Zeffertt
  2005-06-08 16:12 ` Dan Malek
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Zeffertt @ 2005-06-08  9:29 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I need to allocate un-cached memory on an 82xx and consistent_alloc() in
arch/ppc/mm/cachemap.c appears to be the thing I need.  However, this
code only seems to be built for 8xx and 4xx platforms, and not 6xx
platforms. Specifically, it is built if CONFIG_NOT_COHERENT_CACHE is
defined, which is the case if

	CONFIG_8xx = y, or
	CONFIG_4xx = y, but not if
	CONFIG_6xx = y

Does anybody know why it isn't built for 6xx cores?

I'm working on the ATM driver and it seems that certain external memory
areas accessed by the PQII CPM by-pass the cache.  So it would seem to
me that CONFIG_NOT_COHERENT_CACHE would actually be applicable for these
processors too....

Alex


* Re: consistent_alloc() on 82xx
  2005-06-08  9:29 consistent_alloc() on 82xx Alex Zeffertt
@ 2005-06-08 16:12 ` Dan Malek
  2005-06-09  6:15   ` Pantelis Antoniou
  2005-06-09  9:04   ` Alex Zeffertt
  0 siblings, 2 replies; 8+ messages in thread
From: Dan Malek @ 2005-06-08 16:12 UTC (permalink / raw)
  To: Alex Zeffertt; +Cc: linuxppc-embedded


On Jun 8, 2005, at 5:29 AM, Alex Zeffertt wrote:

> Does anybody know why it isn't built for 6xx cores?

Because 6xx cores are cache coherent and there shouldn't
be any need for "uncached" memory regions.

> I'm working on the ATM driver and it seems that certain external memory
> areas accessed by the PQII CPM by-pass the cache.

That's news to me, and I've written lots of CPM drivers, including ATM.
Do you have a specific example?

Thanks.

	-- Dan


* Re: consistent_alloc() on 82xx
  2005-06-08 16:12 ` Dan Malek
@ 2005-06-09  6:15   ` Pantelis Antoniou
  2005-06-09  9:06     ` Alex Zeffertt
  2005-06-09 18:05     ` Dan Malek
  2005-06-09  9:04   ` Alex Zeffertt
  1 sibling, 2 replies; 8+ messages in thread
From: Pantelis Antoniou @ 2005-06-09  6:15 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded

Dan Malek wrote:
> 
> On Jun 8, 2005, at 5:29 AM, Alex Zeffertt wrote:
> 
>> Does anybody know why it isn't built for 6xx cores?
> 
> 
> Because 6xx cores are cache coherent and there shouldn't
> be any need for "uncached" memory regions.
> 
>> I'm working on the ATM driver and it seems that certain external memory
>> areas accessed by the PQII CPM by-pass the cache.
> 
> 
> That's news to me, and I've written lots of CPM drivers, including ATM.
> Do you have a specific example?
> 

I may also need consistent_alloc for some testing reasons Dan. :)

> Thanks.
> 
>     -- Dan
> 

If I build arch/ppc/mm/cachemap.c will it work for 82xx? Any reason not to?

Regards

Pantelis


* Re: consistent_alloc() on 82xx
  2005-06-08 16:12 ` Dan Malek
  2005-06-09  6:15   ` Pantelis Antoniou
@ 2005-06-09  9:04   ` Alex Zeffertt
  2005-06-09 19:05     ` Dan Malek
  1 sibling, 1 reply; 8+ messages in thread
From: Alex Zeffertt @ 2005-06-09  9:04 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded

On Wed, 8 Jun 2005 12:12:01 -0400
Dan Malek <dan@embeddededge.com> wrote:

> 
> On Jun 8, 2005, at 5:29 AM, Alex Zeffertt wrote:
> 
> > Does anybody know why it isn't built for 6xx cores?
> 
> Because 6xx cores are cache coherent and there shouldn't
> be any need for "uncached" memory regions.
> 
> > I'm working on the ATM driver and it seems that certain external
> > memory areas accessed by the PQII CPM by-pass the cache.
> 
> That's news to me, and I've written lots of CPM drivers, including
> ATM. Do you have a specific example?
> 

Hi Dan,

An example of non-cache coherency in the CPM2: In the ATM chapter of
the 8260 UM it says that the CPM will only assert GBL and "snoop"
buffers, Buffer Descriptors, and interrupt queues if you set TCT[GBL]
and RCT[GBL].  Presumably this means that it *does* by-pass the
cache if this flag is not set.  More seriously, for address compression
tables, and external connection tables there is no way of specifying
that it *should* snoop - (which I assume means "use the cache").

I have seen this causing problems.  When I map a VPI/VCI to a connection
table using the address compression table, the first few cells received
do NOT get mapped.  I thought this might be a result of the CPM
by-passing the cache so I added a "flush_dcache_range" after writing to
the address compression tables.  When I did this the problem went away.

With external connection tables the problem is more severe.  During
frame transmission, the core is constantly having to read and write to
the Transmit Connection Table in order to use Auto-VC-off.  I couldn't
get this to work until I added lots of "invalidate_dcache_range" and
"flush_dcache_range" calls.  However, doing this is (i) inefficient, and
(ii) *very* error prone.

Alex
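
The stale-table symptom described in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel or driver code: `struct toy_cache`, `toy_init`, `core_write`, `dma_read_nosnoop`, and `toy_flush` are hypothetical names, a single write-back line stands in for the 60x data cache, and the 32-byte line size is assumed. `toy_flush` plays the role that flush_dcache_range() plays in the driver; it is modelled here as a write-back that also drops the line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32  /* assumed data-cache line size in bytes */

/* Toy model (hypothetical): one write-back cache line in front of one line of RAM. */
struct toy_cache {
    uint8_t ram[LINE_SIZE];   /* what a non-snooping DMA master sees */
    uint8_t line[LINE_SIZE];  /* what the core sees while the line is valid */
    int valid;
    int dirty;
};

static void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof(*c));
}

/* Core store: allocate the line on a miss, then modify only the cached copy. */
static void core_write(struct toy_cache *c, unsigned off, uint8_t val)
{
    assert(off < LINE_SIZE);
    if (!c->valid) {
        memcpy(c->line, c->ram, LINE_SIZE);
        c->valid = 1;
    }
    c->line[off] = val;
    c->dirty = 1;
}

/* DMA read that is not snooped: goes straight to RAM, never consults the cache. */
static uint8_t dma_read_nosnoop(const struct toy_cache *c, unsigned off)
{
    assert(off < LINE_SIZE);
    return c->ram[off];
}

/* Software write-back: copy a dirty line to RAM, then drop it from the cache. */
static void toy_flush(struct toy_cache *c)
{
    if (c->valid && c->dirty)
        memcpy(c->ram, c->line, LINE_SIZE);
    c->valid = 0;
    c->dirty = 0;
}
```

In this model a table entry written with `core_write` stays invisible to `dma_read_nosnoop` until `toy_flush` runs, which matches the observation that the first few cells were not mapped until a flush was added after the table write.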


* Re: consistent_alloc() on 82xx
  2005-06-09  6:15   ` Pantelis Antoniou
@ 2005-06-09  9:06     ` Alex Zeffertt
  2005-06-09 18:05     ` Dan Malek
  1 sibling, 0 replies; 8+ messages in thread
From: Alex Zeffertt @ 2005-06-09  9:06 UTC (permalink / raw)
  To: Pantelis Antoniou; +Cc: linuxppc-embedded

On Thu, 09 Jun 2005 09:15:17 +0300
Pantelis Antoniou <panto@intracom.gr> wrote:

> Dan Malek wrote:
> > 
> > On Jun 8, 2005, at 5:29 AM, Alex Zeffertt wrote:
> > 
> >> Does anybody know why it isn't built for 6xx cores?
> > 
> > 
> > Because 6xx cores are cache coherent and there shouldn't
> > be any need for "uncached" memory regions.
> > 
> > >> I'm working on the ATM driver and it seems that certain external
> > >> memory areas accessed by the PQII CPM by-pass the cache.
> > 
> > 
> > That's news to me, and I've written lots of CPM drivers, including
> > ATM. Do you have a specific example?
> > 
> 
> I may also need consistent_alloc for some testing reasons Dan. :)
> 
> > Thanks.
> > 
> >     -- Dan
> > 
> 
> If I build arch/ppc/mm/cachemap.c will it work for 82xx? Any reason
> not to?
> 

Hi Pantelis,

I tried this in an attempt to work around my problem of the CPM
bypassing the cache but it didn't work for me....

Alex


* Re: consistent_alloc() on 82xx
  2005-06-09  6:15   ` Pantelis Antoniou
  2005-06-09  9:06     ` Alex Zeffertt
@ 2005-06-09 18:05     ` Dan Malek
  1 sibling, 0 replies; 8+ messages in thread
From: Dan Malek @ 2005-06-09 18:05 UTC (permalink / raw)
  To: Pantelis Antoniou; +Cc: linuxppc-embedded


On Jun 9, 2005, at 2:15 AM, Pantelis Antoniou wrote:

> I may also need consistent_alloc for some testing reasons Dan. :)

No, you don't :-)

Thanks.


	-- Dan


* Re: consistent_alloc() on 82xx
  2005-06-09  9:04   ` Alex Zeffertt
@ 2005-06-09 19:05     ` Dan Malek
  2005-06-10 10:11       ` Alex Zeffertt
  0 siblings, 1 reply; 8+ messages in thread
From: Dan Malek @ 2005-06-09 19:05 UTC (permalink / raw)
  To: Alex Zeffertt; +Cc: linuxppc-embedded


On Jun 9, 2005, at 5:04 AM, Alex Zeffertt wrote:

> An example of non-cache coherency in the CPM2: In the ATM chapter of
> the 8260 UM it says that the CPM will only assert GBL and "snoop"
> buffers, Buffer Descriptors, and interrupt queues if you set TCT[GBL]
> and RCT[GBL].

Right.

> ... Presumably this means that it *does* by-pass the
> cache if this flag is not set.

It means just what it says.  If this flag is set, CPM DMA will
snoop the cache.  If you choose not to do this, you have to
maintain cache coherency through software.

>  ..... More seriously, for address compression
> tables, and external connection tables there is no way of specifying
> that it *should* snoop - (which I assume means "use the cache").

You need to be more clear with your terms.  Only the 60x core
"uses" the cache.  The CPM is a DMA device that can utilize
cache coherency protocols if you enable that with the GBL flags.

> I have seen this causing problems.  When I map a VPI/VCI to a connection
> table using the address compression table,

Where are you placing the VP-level and VC-level tables?  I assume
you are properly configuring the GMODE to indicate their locations?
Does it work properly if the tables are in DP RAM?

> With external connection tables the problem is more severe.

There are some very subtle assumptions made by the ATM
controller regarding all channel data structures.  There aren't
configuration location flags for every level of table, and assumptions
are made that tables for a connection are either internal or
external.  Be careful with that.  I either entirely use DP RAM for
everything or external memory for everything, which seemed
to work for me in the past.  Maybe I was just lucky :-)

> ....  During
> frame transmission, the core is constantly having to read and write to
> the Transmit Connection Table in order to use Auto-VC-off.  I couldn't
> get this to work until I added lots of "invalidate_dcache_range" and
> "flush_dcache_range" calls.

Are you sure this is really a cache problem and not a race condition
with CPM access to the CTs?  The CPM does atomic burst read/write
of the RCT/TCT entries, and by doing cache flush operations, the
60x core does the same.


Thanks.


	-- Dan
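
The distinction drawn in this message between a snooped DMA access and software-maintained coherency can be sketched with a toy model of the DMA-write direction. Everything here is an illustrative assumption rather than CPM or kernel code: `struct snoop_model`, `core_read`, `dma_write`, and `sw_invalidate` are hypothetical names, the `gbl` argument stands in for the GBL flag, the single line is assumed clean (never dirty), and the 32-byte line size is assumed. `sw_invalidate` plays the role of invalidate_dcache_range().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32  /* assumed data-cache line size in bytes */

/* Toy model (hypothetical): one clean cache line in front of one line of RAM. */
struct snoop_model {
    uint8_t ram[LINE_SIZE];   /* memory as seen by the DMA master */
    uint8_t line[LINE_SIZE];  /* cached copy as seen by the core */
    int valid;
};

/* Core load: fill the line from RAM on a miss, then read the cached copy. */
static uint8_t core_read(struct snoop_model *m, unsigned off)
{
    assert(off < LINE_SIZE);
    if (!m->valid) {
        memcpy(m->line, m->ram, LINE_SIZE);
        m->valid = 1;
    }
    return m->line[off];
}

/*
 * DMA store. With gbl set the transaction is snooped and the cached copy
 * is dropped; with gbl clear the cache is left untouched and may go stale.
 */
static void dma_write(struct snoop_model *m, unsigned off, uint8_t val, int gbl)
{
    assert(off < LINE_SIZE);
    m->ram[off] = val;
    if (gbl)
        m->valid = 0;
}

/* Software invalidate: discard the cached copy so the next load refills it. */
static void sw_invalidate(struct snoop_model *m)
{
    m->valid = 0;
}
```

With `gbl` clear, `core_read` keeps returning the old value until `sw_invalidate` is called; with `gbl` set, the next `core_read` sees the DMA data without any software step.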


* Re: consistent_alloc() on 82xx
  2005-06-09 19:05     ` Dan Malek
@ 2005-06-10 10:11       ` Alex Zeffertt
  0 siblings, 0 replies; 8+ messages in thread
From: Alex Zeffertt @ 2005-06-10 10:11 UTC (permalink / raw)
  To: Dan Malek; +Cc: linuxppc-embedded

Hi Dan,


> 
> You need to be more clear with your terms.  Only the 60x core
> "uses" the cache.  The CPM is a DMA device that can utilize
> cache coherency protocols if you enable that with the GBL flags.
> 


Ah, I see.  If the CPM DMA is configured to "snoop" the cache then that
doesn't mean it "uses" the cache - it just tells the cache when data
needs to flush - namely before DMA writes.

> > I have seen this causing problems.  When I map a VPI/VCI to a
> > connection table using the address compression table,
> 
> Where are you placing the VP-level and VC-level tables?  I assume
> you are properly configuring the GMODE to indicate their locations?
> Does it work properly if the tables are in DP RAM?
> 


I've got the VP-level tables in DPRAM (as recommended by the UM) and the
VC-level tables in external memory (as required by UM).  The EVPT bit in
GMODE is cleared to indicate VP tables in DPRAM.

The first few cells received after writing the mapping are
definitely going to the raw cell queue.  This doesn't happen if I flush
the cache line following core writes to the VC-level table.  This
suggests to me that the CPM DMA is not employing the cache coherency
protocol when accessing the address compression tables.  This is not a
surprise since the UM only claims to be able to do "snooping" for
buffers, BDs, and interrupt tables.



> > With external connection tables the problem is more severe.
> 
> There are some very subtle assumptions made by the ATM
> controller regarding all channel data structures.  There aren't
> configuration location flags for every level of table, and assumptions
> are made that tables for a connection are either internal or
> external.  Be careful with that.  I either entirely use DP RAM for
> everything or external memory for everything, which seemed
> to work for me in the past.  Maybe I was just lucky :-)


The UM definitely claims you can have a mixture of internal channels and
external channels.  If you want a lot of simultaneous channels open you
are forced to use both since the number of internal channels is limited
by (a) the amount of DPRAM available, and (b) the design limit of 256.

> 
> > ....  During
> > frame transmission, the core is constantly having to read and write
> > to the Transmit Connection Table in order to use Auto-VC-off.  I
> > couldn't get this to work until I added lots of
> > "invalidate_dcache_range" and "flush_dcache_range" calls.
> 
> Are you sure this is really a cache problem and not a race condition
> with CPM access to the CTs?  The CPM does atomic burst read/write
> of the RCT/TCT entries, and by doing cache flush operations, the
> 60x core does the same.
> 

Well it looks to me that for address compression tables and external
connection tables the CPM DMA is not employing its cache coherency
mechanisms.  This mechanism probably just boils down to the CPM saying
to the cache "I want to write to ptr, please flush any pending writes
then invalidate the line".  By adding calls to
flush/invalidate_dcache_range in my code I am doing the cache coherency
mechanism in software instead.  It would be a lot easier though if I
could just allocate uncached memory....


Thanks for your help,

Alex
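
One reason hand-placed flush and invalidate calls are easy to get wrong is that cache maintenance acts on whole cache lines, so invalidating a table that shares a line with unrelated data can discard that data's pending writes. The helpers below are a minimal sketch of the range arithmetic involved; `line_floor`, `line_ceil`, `owns_whole_lines`, and the `L1_LINE` value of 32 bytes are assumptions made for illustration and are not taken from the driver or the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define L1_LINE 32u  /* assumed data-cache line size in bytes */

/* First byte of the cache line that contains addr. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(L1_LINE - 1);
}

/* First byte past the cache line that contains the last byte of [addr, addr + len). */
static uintptr_t line_ceil(uintptr_t addr, uintptr_t len)
{
    assert(len > 0);
    return line_floor(addr + len - 1) + L1_LINE;
}

/*
 * Returns 1 when the object starts and ends on line boundaries, so that
 * invalidating it cannot touch a neighbour's data; returns 0 otherwise.
 */
static int owns_whole_lines(uintptr_t addr, uintptr_t len)
{
    return line_floor(addr) == addr && line_ceil(addr, len) == addr + len;
}
```

A driver following this idea would pass `line_floor(addr)` and `line_ceil(addr, len)` as the start and stop of its maintenance range, and would align and pad each shared table so that `owns_whole_lines` holds before invalidating it.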
