linuxppc-dev.lists.ozlabs.org archive mirror
* Not coherent cache DMA for G3/G4 CPUs: clarification needed
@ 2006-04-20 18:57 Gerhard Pircher
  2006-04-20 20:38 ` Eugene Surovegin
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 18:57 UTC (permalink / raw)
  To: linuxppc-dev, debian-powerpc

Hi,

I am trying to implement non-coherent cache/DMA support for G3/G4
processors by reserving some physical memory for DMA operations. The memory
used for consistent allocations (taken from the top of physical memory below
896MB) is excluded from the BAT mapping, and the pages are marked as
reserved. This seems to work just fine, although I still have to mark the
pages as cache-inhibited.

Whilst working on this workaround for the AmigaOne and reading some articles
about the Linux kernel page tables and memory management, I came to the
conclusion that there may be some problems with this approach to
non-coherent DMA:

1. The AmigaOne is similar to the PReP platform, i.e. ISA devices can only
perform DMA in the first 16MB (there's only a VIA southbridge, no other
SuperI/O IC with a 32-bit capable DMA controller). I guess the first 16MB
cannot be reserved for non-coherent DMA, because this memory area is
occupied by kernel data? (Not to mention the performance loss if the kernel
data area were excluded from the BAT mapping.)

2. I'm not sure how to allocate memory for DMA operations. I think
alloc_pages() will not do the job for me, as the pages for non-coherent
DMA are reserved (SetPageReserved()) and removed from the available
lowmem. Memory fragmentation may also be a problem if a lot of DMA
operations with different buffer sizes are performed. A system could
therefore quickly run out of memory for non-coherent DMA, right?
Is there a way to minimize fragmentation?

3. How are DMA buffers used outside the kernel? Do user programs (in theory)
get a pointer to the DMA buffer from the device driver, or is the data
copied to another buffer allocated by a user program?

Thanks!

Regards,

Gerhard

-- 
--
-- AmigaOne Linux kernel project:
-- http://amigaone-linux.sourceforge.net
--


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 18:57 Not coherent cache DMA for G3/G4 CPUs: clarification needed Gerhard Pircher
@ 2006-04-20 20:38 ` Eugene Surovegin
  2006-04-20 20:56   ` Gerhard Pircher
  2006-04-20 21:06   ` Benjamin Herrenschmidt
  2006-04-20 21:03 ` Benjamin Herrenschmidt
  2006-04-20 22:07 ` Gabriel Paubert
  2 siblings, 2 replies; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 20:38 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Thu, Apr 20, 2006 at 08:57:46PM +0200, Gerhard Pircher wrote:
> Hi,
> 
> I try to implement not coherent cache/DMA support for G3/G4 processors, by
> reserving some physical memory for DMA operations. The memory used for
> consistent allocations (removed from the top of the physical memory below
> 896MB) is excluded from the BAT mapping and the pages are marked as
> reserved. This seems to work just fine, although I still have to mark the
> pages as cache inhibited.
> 
> Whilst working on this workaround for the AmigaOne and reading some articles
> about the Linux kernel page tables and memory management, I came to the
> conclusion that there may be some problems with this approach for not
> coherent DMA: 
> 
> 1. The AmigaOne is similar to the PREP platform, i.e. DMA can only be
> performed in the first 16MB for ISA devices (there's only a VIA southbridge,
> no other SuperI/O IC with 32bit capable DMA controller). I guess the first
> 16MB cannot be reserved for not coherent DMA operation, because this memory
> area is occupied by kernel data? (not to talk about the performance loss, if
> the kernel data area would be excluded from the BAT mapping).
> 
> 2. I'm not sure how to allocate memory for DMA operation. I think
> alloc_pages() will not do the job for me, as the page tables for not
> coherent DMA are reserved (SetPageReserved()) and removed from the available
> lowmem. Also memory fragmentation may be a problem, if a lot DMA operations
> with different buffer sizes are performed. Therefore a system could quickly
> run out of memory for not coherent DMA operation, right?
> Is there a way to minimize fragmentation?
> 
> 3. How are DMA buffers used outside the kernel? Do user programs get a
> pointer to the DMA buffer (in theory) from the device driver or is the data
> copied to another buffer allocated by an user program?


There are already PPC platforms with non-coherent caches (8xx, 4xx). Just
look at how all this is implemented there; don't reinvent the wheel.

Also, read Documentation/DMA-API.txt and DMA-mapping.txt.

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 20:38 ` Eugene Surovegin
@ 2006-04-20 20:56   ` Gerhard Pircher
  2006-04-20 21:02     ` Eugene Surovegin
  2006-04-20 21:06   ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 20:56 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Eugene Surovegin <ebs@ebshome.net>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Thu, 20 Apr 2006 13:38:48 -0700
> 
> There are already non-coherent cache PPC archs (8xx, 4xx) just look 
> how all this implemented there, don't reinvent the wheel.
> 
> Also, read Documentation/DMA-API.txt and DMA-mapping.txt
I know! Unfortunately this implementation does not work at all with G3/G4
PPC desktop CPUs for various reasons (for example due to the BAT mapping,
page tables with different access attributes for the same physical memory
area allocated by the consistent DMA functions, etc.).

regards,

Gerhard


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 20:56   ` Gerhard Pircher
@ 2006-04-20 21:02     ` Eugene Surovegin
  2006-04-20 21:10       ` Gerhard Pircher
  0 siblings, 1 reply; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 21:02 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Thu, Apr 20, 2006 at 10:56:33PM +0200, Gerhard Pircher wrote:
> > --- Original Message ---
> > From: Eugene Surovegin <ebs@ebshome.net>
> > To: Gerhard Pircher <gerhard_pircher@gmx.net>
> > Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> > Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> > Date: Thu, 20 Apr 2006 13:38:48 -0700
> > 
> > There are already non-coherent cache PPC archs (8xx, 4xx) just look 
> > how all this implemented there, don't reinvent the wheel.
> > 
> > Also, read Documentation/DMA-API.txt and DMA-mapping.txt
> I know! Unfortunately this implementation does not work at all with G3/G4
> PPC desktop CPUs for various reasons (for example due to the BAT mapping,
> page tables with different access attributes for the same physical memory
> area allocated by the consistent DMA functions, etc.).

We have the same situation on 44x (all kernel memory is mapped
through several large TLB entries, and the consistent-allocation functions
create additional cache-inhibited mappings for the same physical pages).

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 18:57 Not coherent cache DMA for G3/G4 CPUs: clarification needed Gerhard Pircher
  2006-04-20 20:38 ` Eugene Surovegin
@ 2006-04-20 21:03 ` Benjamin Herrenschmidt
  2006-04-20 21:33   ` Gerhard Pircher
  2006-04-20 22:07 ` Gabriel Paubert
  2 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-20 21:03 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Thu, 2006-04-20 at 20:57 +0200, Gerhard Pircher wrote:

> 1. The AmigaOne is similar to the PREP platform, i.e. DMA can only be
> performed in the first 16MB for ISA devices (there's only a VIA southbridge,
> no other SuperI/O IC with 32bit capable DMA controller). I guess the first
> 16MB cannot be reserved for not coherent DMA operation, because this memory
> area is occupied by kernel data? (not to talk about the performance loss, if
> the kernel data area would be excluded from the BAT mapping).

Yeah, that would suck. Are you sure you need ISA DMA? You can do pseudo-DMA
for the floppy like the Pegasos does. Anything else should be able to do
32-bit DMA unless you have a very broken chipset.

> 2. I'm not sure how to allocate memory for DMA operation. I think
> alloc_pages() will not do the job for me, as the page tables for not
> coherent DMA are reserved (SetPageReserved()) and removed from the available
> lowmem.

alloc_pages()? How so? No, you want to allocate from your reserved
pool, so you'll have to implement your own allocator. The easiest is a
running bitmap; look at the ppc64 iommu code for an example.
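
A minimal sketch of the bitmap idea, for illustration only. It is not the
ppc64 iommu code: it is a plain first-fit search over one bit per page, and
the names (pool_alloc, pool_free, POOL_PAGES) and the pool size are
assumptions made up for this example. It returns page indexes within the
reserved pool rather than real addresses.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_PAGES 64u                 /* hypothetical pool size, in pages */

/* One bit per page of the reserved pool: 1 = in use, 0 = free. */
static uint8_t pool_bitmap[POOL_PAGES / 8];

static int page_used(unsigned i)
{
    return (pool_bitmap[i / 8] >> (i % 8)) & 1;
}

static void page_mark(unsigned i, int used)
{
    if (used)
        pool_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        pool_bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* First fit: return the index of the first page of a free run of
 * npages pages and mark the run as used, or return -1 if none fits. */
long pool_alloc(unsigned npages)
{
    unsigned start = 0, run = 0;

    if (npages == 0 || npages > POOL_PAGES)
        return -1;
    for (unsigned i = 0; i < POOL_PAGES; i++) {
        if (page_used(i)) {
            run = 0;                   /* run broken, start over */
            continue;
        }
        if (run == 0)
            start = i;
        if (++run == npages) {
            for (unsigned j = start; j <= i; j++)
                page_mark(j, 1);
            return (long)start;
        }
    }
    return -1;
}

/* Release npages pages starting at page index first. */
void pool_free(long first, unsigned npages)
{
    assert(first >= 0 && (unsigned long)first + npages <= POOL_PAGES);
    for (unsigned j = 0; j < npages; j++)
        page_mark((unsigned)first + j, 0);
}
```

A real implementation would also need locking and would translate the page
index into the address of the cache-inhibited mapping.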

>  Also memory fragmentation may be a problem, if a lot DMA operations
> with different buffer sizes are performed. Therefore a system could quickly
> run out of memory for not coherent DMA operation, right?
> Is there a way to minimize fragmentation?

The best you can do is to have separate pools for small and big
allocations, I suppose.
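
One way to read the "separate pools" suggestion, sketched under
assumptions: small requests come from a pool of fixed-size slots, so any
free slot satisfies any small request and the small pool cannot fragment,
while larger requests would go to a separate page-granular pool. The slot
size, slot count, and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE  256u   /* hypothetical size of one small-pool slot */
#define SLOT_COUNT 32u    /* hypothetical number of slots */

static uint32_t slots_used;   /* bit i set = slot i is allocated */

/* Return the byte offset of a free slot within the small pool, or -1
 * if the request is too large for this pool or the pool is full. */
long small_alloc(size_t size)
{
    if (size == 0 || size > SLOT_SIZE)
        return -1;             /* caller falls back to the large pool */
    for (unsigned i = 0; i < SLOT_COUNT; i++) {
        if (!(slots_used & (UINT32_C(1) << i))) {
            slots_used |= UINT32_C(1) << i;
            return (long)(i * SLOT_SIZE);
        }
    }
    return -1;
}

/* Release the slot that starts at the given byte offset. */
void small_free(long offset)
{
    assert(offset >= 0 && (unsigned long)offset % SLOT_SIZE == 0);
    assert((unsigned long)offset / SLOT_SIZE < SLOT_COUNT);
    slots_used &= ~(UINT32_C(1) << ((unsigned long)offset / SLOT_SIZE));
}
```

The trade-off is internal waste: a 16-byte request still occupies a full
slot.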

> 3. How are DMA buffers used outside the kernel? Do user programs get a
> pointer to the DMA buffer (in theory) from the device driver or is the data
> copied to another buffer allocated by an user program?

There are cases where some drivers try that but you should ignore it for
the moment. Won't happen in most cases.

Ben.


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 20:38 ` Eugene Surovegin
  2006-04-20 20:56   ` Gerhard Pircher
@ 2006-04-20 21:06   ` Benjamin Herrenschmidt
  2006-04-20 21:13     ` Eugene Surovegin
  2006-04-20 21:33     ` Eugene Surovegin
  1 sibling, 2 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-20 21:06 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: debian-powerpc, linuxppc-dev


> > 3. How are DMA buffers used outside the kernel? Do user programs get a
> > pointer to the DMA buffer (in theory) from the device driver or is the data
> > copied to another buffer allocated by an user program?
> 
> 
> There are already non-coherent cache PPC archs (8xx, 4xx) just look 
> how all this implemented there, don't reinvent the wheel.

Unfortunately, he has to do things a bit differently. He can't afford to
have the kernel BAT mapping cover his non-cacheable pages, so he needs
a reserved pool. Last I looked at our coherent code, it didn't reserve
memory at all, just address space, thus assuming the CPU can handle
having both a cacheable and a non-cacheable mapping of the same pages...
(On 6xx this is deadly even if you don't access those cacheable pages,
because the CPU prefetch may do it for you.)

Ben.


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:02     ` Eugene Surovegin
@ 2006-04-20 21:10       ` Gerhard Pircher
  2006-04-20 21:55         ` Eugene Surovegin
  0 siblings, 1 reply; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 21:10 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Eugene Surovegin <ebs@ebshome.net>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Thu, 20 Apr 2006 14:02:01 -0700
> 
> On Thu, Apr 20, 2006 at 10:56:33PM +0200, Gerhard Pircher wrote:
> > > --- Original Message ---
> > > From: Eugene Surovegin <ebs@ebshome.net>
> > > To: Gerhard Pircher <gerhard_pircher@gmx.net>
> > > Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> > > Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification
> > > needed
> > > Date: Thu, 20 Apr 2006 13:38:48 -0700
> > > 
> > > There are already non-coherent cache PPC archs (8xx, 4xx) just look 
> > > how all this implemented there, don't reinvent the wheel.
> > > 
> > > Also, read Documentation/DMA-API.txt and DMA-mapping.txt
> > I know! Unfortunately this implementation does not work at all with
> > G3/G4 PPC desktop CPUs for various reasons (for example due to the BAT
> > mapping, page tables with different access attributes for the same
> > physical memory area allocated by the consistent DMA functions, etc.).
> 
> We have the same situation on 44x (all kernel memory is mapped 
> through several big TLBs and consistent functions allocate additional 
> cache-inhibited mappings for the same physical pages).

Well, Freescale's PPC programming environment manual clearly states that
this will not work on G4 CPUs (74xx). Benjamin Herrenschmidt also told me
that this implementation will not work, for the reasons I mentioned before.
The approach I'm trying to implement was his idea, so I have to trust him.

regards,

Gerhard


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:06   ` Benjamin Herrenschmidt
@ 2006-04-20 21:13     ` Eugene Surovegin
  2006-04-20 21:19       ` Eugene Surovegin
  2006-04-20 22:39       ` Benjamin Herrenschmidt
  2006-04-20 21:33     ` Eugene Surovegin
  1 sibling, 2 replies; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 21:13 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: debian-powerpc, linuxppc-dev

On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> Unfortunately, he has to do things a bit differently. He can't afford to
> have the kernel BAT mapping cover his non-cacheable pages. Thus he needs
> a reserved pool. Last I looked at our coherent code, it didn't reserve
> memory at all, just address space, thus assuming the CPU can handle
> having both a cacheable and a non-cacheable mapping of the same pages...
> (On 6xx this is deadly even if you don't access those cacheable pages
> because the CPU prefetch may do it for you).

Ben, is this a _real_ problem on 6xx or just a theory? Does the 6xx
actually prefetch beyond a page boundary?

So far, all the "prefetching" I have seen break non-coherent DMA was not
due to the CPU doing prefetching, but to _software_ prefetching being
too aggressive.

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:13     ` Eugene Surovegin
@ 2006-04-20 21:19       ` Eugene Surovegin
  2006-04-20 22:40         ` Benjamin Herrenschmidt
  2006-04-20 22:39       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 21:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Gerhard Pircher, linuxppc-dev,
	debian-powerpc

On Thu, Apr 20, 2006 at 02:13:21PM -0700, Eugene Surovegin wrote:
> On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > Unfortunately, he has to do things a bit differently. He can't afford to
> > have the kernel BAT mapping cover his non-cacheable pages. Thus he needs
> > a reserved pool. Last I looked at our coherent code, it didn't reserve
> > memory at all, just address space, thus assuming the CPU can handle
> > having both a cacheable and a non-cacheable mapping of the same pages...
> > (On 6xx this is deadly even if you don't access those cacheable pages
> > because the CPU prefetch may do it for you).
> 
> Ben, is this _real_ problem on 6xx or just a theory? Does 6xx actually 
> prefetch beyond page boundary?
> 
> So far, all "prefetching" I saw which broke non-coherent DMA was not 
> due to the CPU doing prefetching, but _software_ prefetching being 
> too aggressive.

Even if this "prefetching" problem is real, instead of implementing a
separate pool for allocations which will be quite rare at best, just
allocate guard space before your consistent memory and stop worrying
about it.

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:06   ` Benjamin Herrenschmidt
  2006-04-20 21:13     ` Eugene Surovegin
@ 2006-04-20 21:33     ` Eugene Surovegin
  2006-04-20 22:41       ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 21:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: debian-powerpc, linuxppc-dev

On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> (On 6xx this is deadly even if you don't access those cacheable pages
> because the CPU prefetch may do it for you).

Here is another thought, if this "prefetch" theory is correct.

You guys seem to focus on
dma_alloc_coherent()/pci_alloc_consistent(), but are forgetting about
so-called "streaming" mappings.

You cannot just flush/invalidate the cache any more, because the "CPU can
prefetch this data back". So, to be completely correct (if you insist
on the "6xx can prefetch" theory), you have to actually _copy_ the data to
your consistent memory on dma_map_single(). You can imagine the
performance implications. I suspect even a 440 will be faster than a G4
in this case :).
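
To make the cost concrete, here is a user-space sketch of what "copy on
map" would mean for a streaming mapping. It is not kernel code: the static
array stands in for the reserved cache-inhibited memory, and sketch_map(),
sketch_unmap(), and the direction enum are hypothetical names, not the
kernel DMA API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_dir { SKETCH_TO_DEVICE, SKETCH_FROM_DEVICE };

#define BOUNCE_SIZE 4096u
/* Stand-in for a buffer carved out of the reserved, uncached pool. */
static unsigned char bounce[BOUNCE_SIZE];

/* "Map" a driver buffer: data headed for the device is copied into the
 * bounce buffer. Returns the buffer the device would use, or NULL if
 * the request does not fit. */
void *sketch_map(const void *buf, size_t len, enum sketch_dir dir)
{
    if (len > BOUNCE_SIZE)
        return NULL;
    if (dir == SKETCH_TO_DEVICE)
        memcpy(bounce, buf, len);      /* extra copy on every map */
    return bounce;
}

/* "Unmap": data written by the device is copied back to the driver. */
void sketch_unmap(void *buf, size_t len, enum sketch_dir dir)
{
    assert(len <= BOUNCE_SIZE);
    if (dir == SKETCH_FROM_DEVICE)
        memcpy(buf, bounce, len);      /* extra copy on every unmap */
}
```

Every transfer pays one full memcpy through uncached memory, which is the
performance concern raised above.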

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:03 ` Benjamin Herrenschmidt
@ 2006-04-20 21:33   ` Gerhard Pircher
  0 siblings, 0 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 21:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Fri, 21 Apr 2006 07:03:45 +1000
> 
> On Thu, 2006-04-20 at 20:57 +0200, Gerhard Pircher wrote:
> 
> > 1. The AmigaOne is similar to the PREP platform, i.e. DMA can only be
> > performed in the first 16MB for ISA devices (there's only a VIA
> > southbridge, no other SuperI/O IC with 32bit capable DMA controller).
> > I guess the first 16MB cannot be reserved for not coherent DMA
> > operation, because this memory area is occupied by kernel data? (not to
> > talk about the performance loss, if the kernel data area would be
> > excluded from the BAT mapping).
> 
> Yeah that would suck. Are you sure you need ISA DMA ? You can do pseudo
> DMA like pegasos for the floppy. Anything else should be able to do 32
> bits DMA unless you have a very broken chipset.

I hope not. But I think the AmigaOne floppy driver is copied from the i386
architecture, and that one uses DMA IIRC. As an aside: who uses floppy
drives these days?

> > 2. I'm not sure how to allocate memory for DMA operation. I think
> > alloc_pages() will not do the job for me, as the page tables for not
> > coherent DMA are reserved (SetPageReserved()) and removed from the
> > available lowmem.
> 
> alloc_pages() ? How so ? No, you want to allocate from your reserved
> pool, you'll have to implement your own allocator. Easiest is a running
> bitmap, look at ppc64 iommu code for an example.
Thanks! I was searching a while for an example, but couldn't find one.

> >  Also memory fragmentation may be a problem, if a lot DMA operations
> > with different buffer sizes are performed. Therefore a system could
> > quickly run out of memory for not coherent DMA operation, right?
> > Is there a way to minimize fragmentation?
> 
> Best you can do is have separate pools for small and big allocations I
> suppose.
Okay. Or is there a general slab allocator implementation in the Linux
kernel?

> > 3. How are DMA buffers used outside the kernel? Do user programs get a
> > pointer to the DMA buffer (in theory) from the device driver or is the
> > data copied to another buffer allocated by an user program?
> 
> There are cases where some drivers try that but you should ignore it for
> the moment. Won't happen in most cases.
That's good to hear. Makes my life a little bit easier. ;-)
 
Thanks,

Gerhard


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:10       ` Gerhard Pircher
@ 2006-04-20 21:55         ` Eugene Surovegin
  2006-04-20 22:08           ` Gerhard Pircher
  2006-04-21  4:38           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 30+ messages in thread
From: Eugene Surovegin @ 2006-04-20 21:55 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Thu, Apr 20, 2006 at 11:10:55PM +0200, Gerhard Pircher wrote:
> Well, Freescale's PPC programming environment manual clearly states that
> this will not work on G4 CPUs (74xx). Also Benjamin Herrenschmidt told me,
> that this implementation will not work for the reasons I mentioned before. 
> The approach I'm trying to implement was his idea, so I have to trust in
> him.

Well, you aren't the first person to try running a G4 with
CONFIG_NOT_COHERENT_CACHE. This has been done before, and I don't remember
those people having to implement anything as complex as what you are
trying to do.

You can try asking on #mklinux. It's always better to ask people who
actually _did_ this :).

In fact, I just grepped 2.6 and found
#ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what
systems usually have this type of bridge? Not 4xx/8xx, that's for sure.

Good luck.

-- 
Eugene


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 18:57 Not coherent cache DMA for G3/G4 CPUs: clarification needed Gerhard Pircher
  2006-04-20 20:38 ` Eugene Surovegin
  2006-04-20 21:03 ` Benjamin Herrenschmidt
@ 2006-04-20 22:07 ` Gabriel Paubert
  2006-04-20 22:26   ` Gerhard Pircher
  2 siblings, 1 reply; 30+ messages in thread
From: Gabriel Paubert @ 2006-04-20 22:07 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Thu, Apr 20, 2006 at 08:57:46PM +0200, Gerhard Pircher wrote:
> Hi,
> 
> I try to implement not coherent cache/DMA support for G3/G4 processors, by
> reserving some physical memory for DMA operations. The memory used for
> consistent allocations (removed from the top of the physical memory below
> 896MB) is excluded from the BAT mapping and the pages are marked as
> reserved. This seems to work just fine, although I still have to mark the
> pages as cache inhibited.
> 
> Whilst working on this workaround for the AmigaOne and reading some articles
> about the Linux kernel page tables and memory management, I came to the
> conclusion that there may be some problems with this approach for not
> coherent DMA: 
> 
> 1. The AmigaOne is similar to the PREP platform, i.e. DMA can only be
> performed in the first 16MB for ISA devices (there's only a VIA southbridge,
> no other SuperI/O IC with 32bit capable DMA controller). 

More details, please: what are the exact capabilities of the south and
host bridges?

I've never needed (and therefore never used) the floppy on my PReP boards
(Motorola MVME2[467]xx series), but they have a south bridge (Winbond) that
has 32-bit DMA capability. This was specified in the PReP spec.

This may also depend on the host bridge since RAM appears at 2GB on
default PreP machines, which is an area that you can't access with
normal ISA DMA anyway. On the MVME machines, you could map PCI addresses
0-16 MB anywhere in RAM by reprogramming the host bridge.

However, I actually reprogram the chipset to look like CHRP, i.e.,
a 1:1 mapping of RAM to PCI. This caused problems a long time
ago, since DMA sometimes went to VGA video memory instead of RAM.
This was when kernels were not bloated enough to occupy at least
768kB of RAM; nowadays there is strictly no risk.

> 3. How are DMA buffers used outside the kernel? Do user programs get a
> pointer to the DMA buffer (in theory) from the device driver or is the data
> copied to another buffer allocated by an user program?

If your memory is uncacheable, you are better off copying it to
cacheable memory. At least you are sure that you only access it 
once (trying to copy with FP registers to halve the number of
accesses might be a big win, but you need to be careful).
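
The arithmetic behind "halve the number of accesses" can be sketched as
follows. This is a portable illustration, not an FP-register copy: each
memcpy of width bytes stands in for one load from the uncached buffer, and
whether a compiler or CPU really issues one bus access per chunk is an
assumption. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy len bytes from src to dst in chunks of width bytes and return
 * how many chunk reads were made from src. len must be a multiple of
 * width. */
size_t copy_in_chunks(void *dst, const void *src, size_t len, size_t width)
{
    size_t reads = 0;

    assert(width != 0 && len % width == 0);
    for (size_t off = 0; off < len; off += width) {
        memcpy((char *)dst + off, (const char *)src + off, width);
        reads++;               /* one access to the uncached source */
    }
    return reads;
}
```

Copying 64 bytes takes 16 reads with 4-byte chunks and 8 reads with 8-byte
chunks, which is where the factor of two comes from.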

	Regards,
	Gabriel


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:55         ` Eugene Surovegin
@ 2006-04-20 22:08           ` Gerhard Pircher
  2006-04-24 19:21             ` Mark A. Greer
  2006-04-21  4:38           ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 22:08 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Eugene Surovegin <ebs@ebshome.net>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Thu, 20 Apr 2006 14:55:14 -0700
> 
> Well, you aren't the first person who tries to run G4 with 
> CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember 
> that those people had to implement anything as complex as you are 
> trying to do.

Maybe these systems have cache coherent northbridges, which is not the case
for the AmigaOne and its "famous" ArticiaS northbridge.

> You can try asking on #mklinux. It always better to ask people who 
> actually _did_ this :).
> 
> In fact, I just grepped 2.6 and found 
> #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> systems usually have this type of bridge? Not 4xx/8xx, that's for sure.

Hmm, strange. AFAIK the NOT_COHERENT_CACHE config option is available only
for the 4xx and 8xx platforms. Wouldn't the config option depend on
CONFIG_6XX too, if there were non-cache-coherent systems with G4 CPUs?

At least I could not compile in the dma-mapping.c file without modifying the
Kconfig file.

> Good luck.

Thanks!

Gerhard


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 22:07 ` Gabriel Paubert
@ 2006-04-20 22:26   ` Gerhard Pircher
  0 siblings, 0 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-20 22:26 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Gabriel Paubert <paubert@iram.es>
> To: Gerhard Pircher <gerhard_pircher@gmx.net>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Fri, 21 Apr 2006 00:07:08 +0200
> 
> More details please what are the exact capabilities of the south and
> host bridges? 
The southbridge is a VIA82C686B, which supports ISA DMA in the first 16MB.
The host bridge is a MAI ArticiaS. The ArticiaS has a bug in the snoop
signal logic and therefore does not support cache coherent DMA. 

> I've never needed (and therefore) used floppy on my PreP boards (Motorola
> MVME2[467]xx series), but they have a south bridge (WinBond) that has 32
> bit DMA capability. This was specified in the PreP spec. 

Oh, I thought PReP specified only 24-bit DMA. Okay, so the AmigaOne is more
like the i386 platform, just with a PPC CPU. ;-)

> This may also depend on the host bridge since RAM appears at 2GB on
> default PreP machines, which is an area that you can't access with
> normal ISA DMA anyway. On the MVME machines, you could map PCI addresses
> 0-16 MB anywhere in RAM by reprogramming the host bridge.
This is not the case for the AmigaOne. The RAM starts at physical address 0
(similar to CHRP). AFAIK the host bridge does not allow the remapping of the
address space. Maybe the southbridge can do this for DMA operation. I have
to investigate this. Thanks for the hint!

> > 3. How are DMA buffers used outside the kernel? Do user programs get a
> > pointer to the DMA buffer (in theory) from the device driver or is the
> > data copied to another buffer allocated by an user program?
> 
> If your memory is uncacheable, you are better off copying it to
> cacheable memory. At least you are sure that you only access it 
> once (trying to copy with FP registers to halve the number of
> accesses might be a big win, but you need to be careful).
Sounds like a big performance loss. I hope this is not necessary.

Thanks,

Gerhard


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:13     ` Eugene Surovegin
  2006-04-20 21:19       ` Eugene Surovegin
@ 2006-04-20 22:39       ` Benjamin Herrenschmidt
  2006-04-20 23:46         ` Gabriel Paubert
  1 sibling, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-20 22:39 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: debian-powerpc, linuxppc-dev

On Thu, 2006-04-20 at 14:13 -0700, Eugene Surovegin wrote:
> On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > Unfortunately, he has to do things a bit differently. He can't afford to
> > have the kernel BAT mapping cover his non-cacheable pages. Thus he needs
> > a reserved pool. Last I looked at our coherent code, it didn't reserve
> > memory at all, just address space, thus assuming the CPU can handle
> > having both a cacheable and a non-cacheable mapping of the same pages...
> > (On 6xx this is deadly even if you don't access those cacheable pages
> > because the CPU prefetch may do it for you).
> 
> Ben, is this _real_ problem on 6xx or just a theory? Does 6xx actually 
> prefetch beyond page boundary?
> 
> So far, all "prefetching" I saw which broke non-coherent DMA was not 
> due to the CPU doing prefetching, but _software_ prefetching being 
> too aggressive.

Not 100% certain... we definitely have a bug with AGP on Macs currently
for that reason, though I have yet to isolate a crash caused by it ;)
(That is, we map AGP pages non-cacheable but they stay in the linear
mapping.)

On POWER4, 970 and later, the chip guys confirmed that the problem is
real though. Not only because of prefetch but also because of speculative
execution, which can cause the chip to do a load that will never actually
be executed. Imagine for example a loop walking an array: the chip might
speculatively load elements beyond the array by speculatively executing
beyond the branch that ends the loop.
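
For illustration, the kind of loop meant here looks like the sketch below
(the function name is made up). The C code itself never reads past the end
of the array; the concern is what the hardware may do before the loop-exit
branch has been resolved.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n elements of a. Architecturally only a[0] .. a[n-1] are
 * read. */
long sum_array(const int *a, size_t n)
{
    long sum = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++)   /* the branch that ends the loop */
        sum += a[i];                 /* a speculating CPU may already have
                                      * issued the load of a[n] before it
                                      * knows the loop is finished */
    return sum;
}
```

If a[n] happens to lie in a page that also has a non-cacheable mapping,
that speculative load can pull the line into the cache through the
cacheable mapping.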

Ben.


* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:19       ` Eugene Surovegin
@ 2006-04-20 22:40         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-20 22:40 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: debian-powerpc, linuxppc-dev

On Thu, 2006-04-20 at 14:19 -0700, Eugene Surovegin wrote:
> On Thu, Apr 20, 2006 at 02:13:21PM -0700, Eugene Surovegin wrote:
> > On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > > Unfortunately, he has to do things a bit differently. He can't afford to
> > > have the kernel BAT mapping cover his non-cacheable pages. Thus he needs
> > > a reserved pool. Last I looked at our coherent code, it didn't reserve
> > > memory at all, just address space, thus assuming the CPU can handle
> > > > having both a cacheable and a non-cacheable mapping of the same pages...
> > > (On 6xx this is deadly even if you don't access those cacheable pages
> > > because the CPU prefetch may do it for you).
> > 
> > Ben, is this _real_ problem on 6xx or just a theory? Does 6xx actually 
> > prefetch beyond page boundary?
> > 
> > So far, all "prefetching" I saw which broke non-coherent DMA was not 
> > due to the CPU doing prefetching, but _software_ prefetching being 
> > too aggressive.
> 
> Even if this "prefetching" problem is real, instead of implementing 
> separate pool for allocations which will be quite rare at best, just 
> allocate guard space before your consistent memory and stop worrying 
> about it.

Won't necessarily help with the speculative execution problem and in
fact, how do you do that in practice ?

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:33     ` Eugene Surovegin
@ 2006-04-20 22:41       ` Benjamin Herrenschmidt
  2006-04-21  8:21         ` Gerhard Pircher
  0 siblings, 1 reply; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-20 22:41 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: debian-powerpc, linuxppc-dev

On Thu, 2006-04-20 at 14:33 -0700, Eugene Surovegin wrote:
> On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > (On 6xx this is deadly even if you don't access those cacheable pages
> > because the CPU prefetch may do it for you).
> 
> Here is another thought if this "prefetch" theory is correct.
> 
> You guys seem to focus on 
> dma_alloc_coherent()/pci_alloc_consistent(), but are forgetting about 
> so-called "streaming" mappings.
> 
> You cannot just flush/invalidate cache any more, because "CPU can 
> prefetch this data back". So, to be completely correct (if you insist 
> on "6xx can prefetch"-theory), you have to actually _copy_ data to 
> your consistent memory on dma_map_single(). You can imagine 
> performance implications. I suspect even 440 will be faster in this 
> case than G4 :).

Yes.

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 22:39       ` Benjamin Herrenschmidt
@ 2006-04-20 23:46         ` Gabriel Paubert
  2006-04-21  0:09           ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 30+ messages in thread
From: Gabriel Paubert @ 2006-04-20 23:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: debian-powerpc, linuxppc-dev

On Fri, Apr 21, 2006 at 08:39:29AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2006-04-20 at 14:13 -0700, Eugene Surovegin wrote:
> > On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > > Unfortunately, he has to do things a bit differently. He can't afford to
> > > have the kernel BAT mapping cover his non-cacheable pages. Thus he needs
> > > a reserved pool. Last I looked at our coherent code, it didn't reserve
> > > memory at all, just address space, thus assuming the CPU can handle
> > > > having both a cacheable and a non-cacheable mapping of the same pages...
> > > (On 6xx this is deadly even if you don't access those cacheable pages
> > > because the CPU prefetch may do it for you).
> > 
> > Ben, is this _real_ problem on 6xx or just a theory? Does 6xx actually 
> > prefetch beyond page boundary?
> > 
> > So far, all "prefetching" I saw which broke non-coherent DMA was not 
> > due to the CPU doing prefetching, but _software_ prefetching being 
> > too aggressive.
> 
> Not 100% certain... we definitely have a bug with AGP on macs currently
> for that reason though I have yet to isolate a crash caused by it ;)
> (That is, we map AGP pages non-cacheable but they stay in the linear
> mapping).

In this case the problem is double mapping with inconsistent attributes
(through BAT and page tables I assume). 

> 
> On POWER4, 970 and later, the chip guys confirmed that the problem is
> real though. Not only because of prefetch but also speculative execution
> which can cause the chip to do a load that will never actually be
> executed. Imagine for example a loop walking an array, the chip might
> speculatively load elements beyond the array by speculatively executing
> beyond the branch that ends the loop.

Even if the page has the guarded bit set?

	Gabriel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 23:46         ` Gabriel Paubert
@ 2006-04-21  0:09           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-21  0:09 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: debian-powerpc, linuxppc-dev


> In this case the problem is double mapping with inconsistent attributes
> (through BAT and page tables I assume). 

Yes.
 
> > On POWER4, 970 and later, the chip guys confirmed that the problem is
> > real though. Not only because of prefetch but also speculative execution
> > which can cause the chip to do a load that will never actually be
> > executed. Imagine for example a loop walking an array, the chip might
> > speculatively load elements beyond the array by speculatively executing
> > beyond the branch that ends the loop.
> 
> Even if the page has the guarded bit set?

The BAT mapping doesn't have G set.

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 21:55         ` Eugene Surovegin
  2006-04-20 22:08           ` Gerhard Pircher
@ 2006-04-21  4:38           ` Benjamin Herrenschmidt
  2006-04-21  8:03             ` Gerhard Pircher
                               ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-21  4:38 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: debian-powerpc, linuxppc-dev

On Thu, 2006-04-20 at 14:55 -0700, Eugene Surovegin wrote:
> On Thu, Apr 20, 2006 at 11:10:55PM +0200, Gerhard Pircher wrote:
> > Well, Freescale's PPC programming environment manual clearly states that
> > this will not work on G4 CPUs (74xx). Also Benjamin Herrenschmidt told me,
> > that this implementation will not work for the reasons I mentioned before. 
> > The approach I'm trying to implement was his idea, so I have to trust in
> > him.
> 
> Well, you aren't the first person who tries to run G4 with 
> CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember 
> that those people had to implement anything as complex as you are 
> trying to do.
> 
> You can try asking on #mklinux. It's always better to ask people who 
> actually _did_ this :).
> 
> In fact, I just grepped 2.6 and found 
> #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> systems usually have this type of bridge? Not 4xx/8xx, that's for sure.

I think some folks tried ... and failed.

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-21  4:38           ` Benjamin Herrenschmidt
@ 2006-04-21  8:03             ` Gerhard Pircher
  2006-04-21 14:33             ` Brent Cook
  2006-04-27 21:31             ` Mark A. Greer
  2 siblings, 0 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-21  8:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Eugene Surovegin <ebs@ebshome.net>
> Cc: Gerhard Pircher <gerhard_pircher@gmx.net>,
> linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Fri, 21 Apr 2006 14:38:05 +1000
> 
> > In fact, I just grepped 2.6 and found 
> > #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> > systems usually have this type of bridge? Not 4xx/8xx, that's for sure.
> 
> I think some folks tried ... and failed.
That doesn't sound encouraging. ;)

Gerhard


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 22:41       ` Benjamin Herrenschmidt
@ 2006-04-21  8:21         ` Gerhard Pircher
  0 siblings, 0 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-21  8:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: Eugene Surovegin <ebs@ebshome.net>
> Cc: Gerhard Pircher <gerhard_pircher@gmx.net>,
> linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Fri, 21 Apr 2006 08:41:09 +1000
> 
> On Thu, 2006-04-20 at 14:33 -0700, Eugene Surovegin wrote:
> > On Fri, Apr 21, 2006 at 07:06:13AM +1000, Benjamin Herrenschmidt wrote:
> > > (On 6xx this is deadly even if you don't access those cacheable pages
> > > because the CPU prefetch may do it for you).
> > 
> > Here is another thought if this "prefetch" theory is correct.
> > 
> > > You guys seem to focus on 
> > > dma_alloc_coherent()/pci_alloc_consistent(), but are forgetting about 
> > > so-called "streaming" mappings.
> > 
> > You cannot just flush/invalidate cache any more, because "CPU can 
> > prefetch this data back". So, to be completely correct (if you insist 
> > on "6xx can prefetch"-theory), you have to actually _copy_ data to 
> > your consistent memory on dma_map_single(). You can imagine 
> > performance implications. I suspect even 440 will be faster in this 
> > case than G4 :).
> 
> Yes.

I'm not sure I follow you here. Would that mean I have to allocate
two consistent buffers of the same size? I guess the first buffer would be
read-only and used for the actual DMA transfer, and the second one would be
used by the CPU for further data processing (after the CPU has copied the
data from the first one to the second one)?

Hmm, that doesn't sound feasible to me.. ?)
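
The copying scheme Eugene describes for streaming mappings is commonly
called a bounce buffer. As far as that description goes, only the buffer the
device sees has to live in the consistent pool; the driver's own buffer stays
in ordinary cacheable memory. Below is a minimal userspace sketch in C that
simulates the idea. All names (pool, bounce_map(), bounce_unmap()) are
hypothetical, the pool is a plain static array standing in for the reserved
cache-inhibited region, and this is not how dma_map_single() is implemented
in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 4096

/* Stand-in for the reserved, cache-inhibited pool (hypothetical). */
static unsigned char pool[POOL_SIZE];
static size_t pool_used;

enum dma_dir { TO_DEVICE, FROM_DEVICE };

struct bounce {
    void *orig;          /* driver's ordinary cacheable buffer */
    unsigned char *dma;  /* region inside the pool given to the device */
    size_t len;
    enum dma_dir dir;
};

/* Map: carve space out of the pool; copy data in for TO_DEVICE.
 * Returns 0 on success, -1 if the pool is exhausted. */
int bounce_map(struct bounce *b, void *buf, size_t len, enum dma_dir dir)
{
    if (len > POOL_SIZE - pool_used)
        return -1;
    b->orig = buf;
    b->dma = pool + pool_used;
    b->len = len;
    b->dir = dir;
    pool_used += len;
    if (dir == TO_DEVICE)
        memcpy(b->dma, buf, len);
    return 0;
}

/* Unmap: copy the device's data back for FROM_DEVICE, then release. */
void bounce_unmap(struct bounce *b)
{
    if (b->dir == FROM_DEVICE)
        memcpy(b->orig, b->dma, b->len);
    /* Bump allocator: space is reclaimed only for the newest mapping. */
    if (b->dma + b->len == pool + pool_used)
        pool_used -= b->len;
}
```

The extra memcpy() on every map or unmap is the performance cost Eugene
refers to, and the trivial bump allocator shows why pool exhaustion and
fragmentation become a concern once many mappings of different sizes are
live at the same time.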

Gerhard


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-21  4:38           ` Benjamin Herrenschmidt
  2006-04-21  8:03             ` Gerhard Pircher
@ 2006-04-21 14:33             ` Brent Cook
  2006-04-21 21:51               ` Benjamin Herrenschmidt
  2006-04-27 21:31             ` Mark A. Greer
  2 siblings, 1 reply; 30+ messages in thread
From: Brent Cook @ 2006-04-21 14:33 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: debian-powerpc

On Thursday 20 April 2006 23:38, Benjamin Herrenschmidt wrote:
> On Thu, 2006-04-20 at 14:55 -0700, Eugene Surovegin wrote:
> > On Thu, Apr 20, 2006 at 11:10:55PM +0200, Gerhard Pircher wrote:
> > > Well, Freescale's PPC programming environment manual clearly states
> > > that this will not work on G4 CPUs (74xx). Also Benjamin Herrenschmidt
> > > told me, that this implementation will not work for the reasons I
> > > mentioned before. The approach I'm trying to implement was his idea, so
> > > I have to trust in him.
> >
> > Well, you aren't the first person who tries to run G4 with
> > CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember
> > that those people had to implement anything as complex as you are
> > trying to do.
> >
> > You can try asking on #mklinux. It's always better to ask people who
> > actually _did_ this :).
> >
> > In fact, I just grepped 2.6 and found
> > #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what
> > systems usually have this type of bridge? Not 4xx/8xx, that's for sure.
>
> I think some folks tried ... and failed.
>
> Ben.

I'm not claiming to understand all of the issues here, but I have some 
MV64460 / MPC7448-based systems, and they only boot if 
CONFIG_NOT_COHERENT_CACHE=y

 - Brent

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-21 14:33             ` Brent Cook
@ 2006-04-21 21:51               ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-21 21:51 UTC (permalink / raw)
  To: Brent Cook; +Cc: linuxppc-dev, debian-powerpc

> I'm not claiming to understand all of the issues here, but I have some 
> MV64460 / MPC7448-based systems, and they only boot if 
> CONFIG_NOT_COHERENT_CACHE=y

That is strange... Pegasos uses a Marvell bridge and it works with
coherent cache. Do you have some kernel patches in addition to what is
in mainstream to make CONFIG_NOT_COHERENT_CACHE work at all on
CONFIG_6xx ? At the moment, it doesn't do much ...

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-20 22:08           ` Gerhard Pircher
@ 2006-04-24 19:21             ` Mark A. Greer
  0 siblings, 0 replies; 30+ messages in thread
From: Mark A. Greer @ 2006-04-24 19:21 UTC (permalink / raw)
  To: Gerhard Pircher; +Cc: linuxppc-dev, debian-powerpc

On Fri, Apr 21, 2006 at 12:08:18AM +0200, Gerhard Pircher wrote:
> > --- Original Message ---
> > From: Eugene Surovegin <ebs@ebshome.net>
> > To: Gerhard Pircher <gerhard_pircher@gmx.net>
> > Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> > Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> > Date: Thu, 20 Apr 2006 14:55:14 -0700
> > 
> > Well, you aren't the first person who tries to run G4 with 
> > CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember 
> > that those people had to implement anything as complex as you are 
> > trying to do.
> 
> Maybe these systems have cache coherent northbridges, which is not the case
> for the AmigaOne and its "famous" ArticiaS northbridge.

Nope.  I believe Eugene is referring to the Marvell 64x60 line of *North*
bridges.

> > You can try asking on #mklinux. It's always better to ask people who 
> > actually _did_ this :).
> > 
> > In fact, I just grepped 2.6 and found 
> > #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> > systems usually have this type of bridge? Not 4xx/8xx, that's for sure.
> 
> Hmm, strange. AFAIK the NOT_COHERENT_CACHE config option is available only
> for the 4xx and 8xx platforms. Wouldn't the config option depend on
> CONFIG_6XX too, if there are not cache coherent systems with G4 cpus? 
> 
> At least I could not compile in the dma-mapping.c file without modifying the
> Kconfig file.

If you're looking in arch/ppc/Kconfig in either the kernel.org or paulus'
git trees, look further down.  There is a separate option where
NOT_COHERENT_CACHE can be set for 64x60 bridges.

Mark

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-21  4:38           ` Benjamin Herrenschmidt
  2006-04-21  8:03             ` Gerhard Pircher
  2006-04-21 14:33             ` Brent Cook
@ 2006-04-27 21:31             ` Mark A. Greer
  2006-04-27 21:53               ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 30+ messages in thread
From: Mark A. Greer @ 2006-04-27 21:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

On Fri, Apr 21, 2006 at 02:38:05PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2006-04-20 at 14:55 -0700, Eugene Surovegin wrote:
> > On Thu, Apr 20, 2006 at 11:10:55PM +0200, Gerhard Pircher wrote:
> > > Well, Freescale's PPC programming environment manual clearly states that
> > > this will not work on G4 CPUs (74xx). Also Benjamin Herrenschmidt told me,
> > > that this implementation will not work for the reasons I mentioned before. 
> > > The approach I'm trying to implement was his idea, so I have to trust in
> > > him.
> > 
> > Well, you aren't the first person who tries to run G4 with 
> > CONFIG_NOT_COHERENT_CACHE. This was done before and I don't remember 
> > that those people had to implement anything as complex as you are 
> > trying to do.
> > 
> > You can try asking on #mklinux. It's always better to ask people who 
> > actually _did_ this :).
> > 
> > In fact, I just grepped 2.6 and found 
> > #ifdef(CONFIG_NOT_COHERENT_CACHE) in syslib/mv64x60.c. Guess what 
> > systems usually have this type of bridge? Not 4xx/8xx, that's for sure.
> 
> I think some folks tried ... and failed.

Who has tried and failed?

There are many mv64x60 based platforms working just fine today with
CONFIG_NOT_COHERENT_CACHE defined.  The reason for turning coherency off
is that there is a bug in the bridge requiring a hardware workaround.
Unfortunately, not all of the hardware vendors have implemented that
workaround and I know of one that considers it infeasible and will
not implement it.

I expect that the pegasos has that hardware workaround implemented so
the kernel maintainers for that platform have the good fortune of being
able to run with coherency on.

What Ben says is correct, there is that issue.  However, AFAIK, I have
yet to run into it.

If that hardware workaround is not implemented, the options are:
a) 100% chance of a system hang with coherency on
or
b) < 0.0..1% chance of a system hang with coherency off (at least in my
experience so far).

The choice is simple.

Mark

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-27 21:31             ` Mark A. Greer
@ 2006-04-27 21:53               ` Benjamin Herrenschmidt
  2006-04-27 22:08                 ` Mark A. Greer
  2006-04-29 17:57                 ` Gerhard Pircher
  0 siblings, 2 replies; 30+ messages in thread
From: Benjamin Herrenschmidt @ 2006-04-27 21:53 UTC (permalink / raw)
  To: Mark A. Greer; +Cc: linuxppc-dev, debian-powerpc


> There are many mv64x60 based platforms working just fine today with
> CONFIG_NOT_COHERENT_CACHE defined.  The reason for turning coherency off
> is that there is a bug in the bridge requiring a hardware workaround.
> Unfortunately, not all of the hardware vendors have implemented that
> workaround and I know of one that considers it infeasible and will
> not implement it.

Define "working fine" ... With the current implementation, and according
to the spec, it will randomly crap out or checkstop due to the same page
being accessed via the NCU and being in the L2 unless you disabled
speculative loads and made sure it can't prefetch across page
boundaries maybe ? Or set the G bit all over the BAT mapping (ouch !).

> I expect that the pegasos has that hardware workaround implemented so
> the kernel maintainers for that platform have the good fortune of being
> able to run with coherency on.

I suppose so...

> What Ben says is correct, there is that issue.  However, AFAIK, I have
> yet to run into it.

Hrm... well, I wouldn't rely on that tho.

> If that hardware workaround is not implemented, the options are:
> a) 100% chance of a system hang with coherency on
> or
> b) < 0.0..1% chance of a system hang with coherency off (at least in my
> experience so far).
> 
> The choice is simple.

I disagree. A solution that is known to have a hole in it is no good
even if you haven't managed to trigger it so far. Now it's Gerhard's
choice.

Ben.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-27 21:53               ` Benjamin Herrenschmidt
@ 2006-04-27 22:08                 ` Mark A. Greer
  2006-04-29 17:57                 ` Gerhard Pircher
  1 sibling, 0 replies; 30+ messages in thread
From: Mark A. Greer @ 2006-04-27 22:08 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

On Fri, Apr 28, 2006 at 07:53:29AM +1000, Benjamin Herrenschmidt wrote:
> 
> > There are many mv64x60 based platforms working just fine today with
> > CONFIG_NOT_COHERENT_CACHE defined.  The reason for turning coherency off
> > is that there is a bug in the bridge requiring a hardware workaround.
> > Unfortunately, not all of the hardware vendors have implemented that
> > workaround and I know of one that considers it infeasible and will
> > not implement it.
> 
> Define "working fine" ... With the current implementation, and according
> to the spec, it will randomly crap out or checkstop due to the same page
> being accessed via the NCU and being in the L2 unless you disabled
> speculative loads and made sure it can't prefetch across page
> boundaries maybe ? Or set the G bit all over the BAT mapping (ouch !).

"working fine" == running in a production environment for
weeks/months without crapping out.

> > If that hardware workaround is not implemented, the options are:
> > a) 100% chance of a system hang with coherency on
> > or
> > b) < 0.0..1% chance of a system hang with coherency off (at least in my
> > experience so far).
> > 
> > The choice is simple.
> 
> I disagree. A solution that is known to have a hole in it is no good
> even if you haven't managed to trigger it so far. Now it's Gerhard's
> choice.

TBH, I haven't really looked at what Gerhard is doing yet so I can't
comment.  No matter what, though, it's certainly his choice.

Mark

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
  2006-04-27 21:53               ` Benjamin Herrenschmidt
  2006-04-27 22:08                 ` Mark A. Greer
@ 2006-04-29 17:57                 ` Gerhard Pircher
  1 sibling, 0 replies; 30+ messages in thread
From: Gerhard Pircher @ 2006-04-29 17:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, debian-powerpc

> --- Original Message ---
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> To: "Mark A. Greer" <mgreer@mvista.com>
> Cc: linuxppc-dev@ozlabs.org, debian-powerpc@lists.debian.org
> Subject: Re: Not coherent cache DMA for G3/G4 CPUs: clarification needed
> Date: Fri, 28 Apr 2006 07:53:29 +1000
> 
> > What Ben says is correct, there is that issue.  However, AFAIK, I have
> > yet to run into it.
> 
> Hrm... well, I wouldn't rely on that tho.
> 
> > If that hardware workaround is not implemented, the options are:
> > a) 100% chance of a system hang with coherency on
> > or
> > b) < 0.0..1% chance of a system hang with coherency off (at least in my
> > experience so far).
> > 
> > The choice is simple.
> 
> I disagree. A solution that is known to have a hole in it is no good
> even if you haven't managed to trigger it so far. Now it's Gerhard's
> choice.
The choice isn't so simple (at least for me):

Over the last few days I read some old posts by AmigaOS4 developers. It seems
they just do cache flushes at the beginning and end of a DMA transfer, and
during it (sync). The memory used for DMA is also marked as cacheable!? Only
the memory used for the PRD tables (for the IDE controller) is marked as
cache inhibited.
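
Why flushing at the start and invalidating at the end of a transfer can be
enough on cacheable memory is easier to see with a toy model. The sketch
below is plain userspace C with hypothetical names throughout: a tiny
write-back "cache" in front of an array standing in for RAM, and a "device"
that bypasses the cache the way DMA does. It models neither a real CPU nor
the OS4 code, and in particular it has no prefetch or speculation, which is
exactly the hole discussed earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_SIZE 16

struct line { int valid; int dirty; unsigned char data; };

static unsigned char ram[MEM_SIZE];   /* what the device sees */
static struct line cache[MEM_SIZE];   /* what the CPU sees first */

unsigned char cpu_read(size_t a)
{
    assert(a < MEM_SIZE);
    if (!cache[a].valid) {            /* miss: fill from RAM */
        cache[a].data = ram[a];
        cache[a].valid = 1;
        cache[a].dirty = 0;
    }
    return cache[a].data;
}

void cpu_write(size_t a, unsigned char v)
{
    assert(a < MEM_SIZE);
    cache[a].data = v;                /* write-back: RAM not updated yet */
    cache[a].valid = 1;
    cache[a].dirty = 1;
}

/* Write dirty lines back to RAM (needed before the device reads). */
void cache_flush(size_t a, size_t len)
{
    for (size_t i = a; i < a + len && i < MEM_SIZE; i++)
        if (cache[i].valid && cache[i].dirty) {
            ram[i] = cache[i].data;
            cache[i].dirty = 0;
        }
}

/* Drop cached copies (needed before the CPU reads device-written data). */
void cache_invalidate(size_t a, size_t len)
{
    for (size_t i = a; i < a + len && i < MEM_SIZE; i++)
        cache[i].valid = 0;
}

/* DMA accesses go straight to RAM and never touch the cache. */
unsigned char device_read(size_t a)  { assert(a < MEM_SIZE); return ram[a]; }
void device_write(size_t a, unsigned char v) { assert(a < MEM_SIZE); ram[a] = v; }
```

In this model the device sees stale data until cache_flush() runs, and the
CPU sees stale data after a device write until cache_invalidate() runs. If
something refilled the cache line between the invalidate and the CPU's read
while the device was still writing, the CPU would again read stale data.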

I tried to contact some OS4 developers, but I haven't received an
answer yet. :-(

So I will try out the CONFIG_NOT_COHERENT_CACHE implementation first. As far
as I understand, OS4 does not use BATs for memory mapping, so the
preconditions are not really the same, but it's worth a try. On the other
hand, I don't understand why the PRD tables have to be in non-cacheable
memory, and I don't like the idea of modifying the Linux IDE driver to do a
cache flush/invalidate for the PRD table memory area.

Thanks again for all your help!

Gerhard


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2006-04-29 17:57 UTC | newest]

Thread overview: 30+ messages
2006-04-20 18:57 Not coherent cache DMA for G3/G4 CPUs: clarification needed Gerhard Pircher
2006-04-20 20:38 ` Eugene Surovegin
2006-04-20 20:56   ` Gerhard Pircher
2006-04-20 21:02     ` Eugene Surovegin
2006-04-20 21:10       ` Gerhard Pircher
2006-04-20 21:55         ` Eugene Surovegin
2006-04-20 22:08           ` Gerhard Pircher
2006-04-24 19:21             ` Mark A. Greer
2006-04-21  4:38           ` Benjamin Herrenschmidt
2006-04-21  8:03             ` Gerhard Pircher
2006-04-21 14:33             ` Brent Cook
2006-04-21 21:51               ` Benjamin Herrenschmidt
2006-04-27 21:31             ` Mark A. Greer
2006-04-27 21:53               ` Benjamin Herrenschmidt
2006-04-27 22:08                 ` Mark A. Greer
2006-04-29 17:57                 ` Gerhard Pircher
2006-04-20 21:06   ` Benjamin Herrenschmidt
2006-04-20 21:13     ` Eugene Surovegin
2006-04-20 21:19       ` Eugene Surovegin
2006-04-20 22:40         ` Benjamin Herrenschmidt
2006-04-20 22:39       ` Benjamin Herrenschmidt
2006-04-20 23:46         ` Gabriel Paubert
2006-04-21  0:09           ` Benjamin Herrenschmidt
2006-04-20 21:33     ` Eugene Surovegin
2006-04-20 22:41       ` Benjamin Herrenschmidt
2006-04-21  8:21         ` Gerhard Pircher
2006-04-20 21:03 ` Benjamin Herrenschmidt
2006-04-20 21:33   ` Gerhard Pircher
2006-04-20 22:07 ` Gabriel Paubert
2006-04-20 22:26   ` Gerhard Pircher
