linuxppc-dev.lists.ozlabs.org archive mirror
* scatter/gather DMA and cache coherency
@ 2006-02-16  7:21 Phil Nitschke
  2006-02-16  8:03 ` Eugene Surovegin
  2006-02-16 17:46 ` Mark A. Greer
  0 siblings, 2 replies; 10+ messages in thread
From: Phil Nitschke @ 2006-02-16  7:21 UTC (permalink / raw)
  To: linuxppc-embedded

Hi,

I've been using a PCI device driver developed by a third party company.
It uses a scatter/gather DMA I/O to transfer data from the PCI device
into user memory.  When using a buffer size of about 1 MB, the driver
achieves a transfer bandwidth of about 60 MB/s, on a 66 MHz, 32-bit
bus.

The problem is that sometimes the data is corrupt (usually on the first
transfer).  We've concluded that the problem is related to cache
coherency.  The Artesyn 2.6.10 reference kernel (branched from the
kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
as Artesyn have never successfully verified operation with hardware
coherency enabled.
My understanding is that their Marvell system controller (MV64460)
supports cache snooping, but their Linux kernel support hasn't caught up
yet.

So if I understand my situation correctly, the device driver must use
software-enforced coherency to avoid data corruption.  Is this correct?

What currently happens is this:

The buffers are allocated with get_user_pages(...)

After each DMA transfer is complete, the driver invalidates the cache
using  __dma_sync_page(...)

Only on close() does the driver set the pages dirty, like this:

  /* Set each cache page dirty */
  for (ipage = 0; ipage < nr_pages; ipage++)
  {
    if (!PageReserved (pages[ipage]))
      SetPageDirty ( pages[ ipage ] );
  }

  /* Every mapped page must be released from the page cache */
  for (ipage = 0; ipage < nr_pages; ipage++)
    page_cache_release ( pages[ ipage ] );

According to my reading of "Linux Device Drivers, Third Edition" by
Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman,
SetPageDirty() should be called every time the pages are changed (not
just when the pages are released).  (OTOH, the text does not mention the
__dma_sync_page() routine at all.)

Could this be the cause of the corruption we're seeing?

If not, are there any other steps required to enforce "software"
coherency?

--
Phil


* Re: scatter/gather DMA and cache coherency
  2006-02-16  7:21 Phil Nitschke
@ 2006-02-16  8:03 ` Eugene Surovegin
  2006-02-16 13:52   ` Phil Nitschke
  2006-02-16 17:46 ` Mark A. Greer
  1 sibling, 1 reply; 10+ messages in thread
From: Eugene Surovegin @ 2006-02-16  8:03 UTC (permalink / raw)
  To: Phil Nitschke; +Cc: linuxppc-embedded

On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
> Hi,
> 
> I've been using a PCI device driver developed by a third party company.
> It uses a scatter/gather DMA I/O to transfer data from the PCI device
> into user memory.  When using a buffer size of about 1 MB, the driver
> achieves a transfer bandwidth of about 60 MB/s, on a 66 MHz, 32-bit
> bus.
> 
> The problem is, that sometimes the data is corrupt (usually on the first
> transfer).  We've concluded that the problem is related to cache
> coherency.  The Artesyn 2.6.10 reference kernel (branched from the
> kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
> as Artesyn have never successfully verified operation with hardware
> coherency enabled.
> My understanding is that their Marvell system controller (MV64460)
> supports cache snooping, but their Linux kernel support hasn't caught up
> yet.
> 
> So if I understand my situation correctly, the device driver must use
> software-enforced coherency to avoid data corruption.  Is this correct?
> 
> What currently happens is this:
> 
> The buffers are allocated with get_user_pages(...)
> 
> After each DMA transfer is complete, the driver invalidates the cache
> using  __dma_sync_page(...)

No, buffers must be invalidated _before_ DMA transfer, not after. 
Also, don't use internal PPC functions like __dma_sync_page. Please, 
read Documentation/DMA-API.txt for official API.

-- 
Eugene


* Re: scatter/gather DMA and cache coherency
  2006-02-16  8:03 ` Eugene Surovegin
@ 2006-02-16 13:52   ` Phil Nitschke
  2006-02-16 16:33     ` Eugene Surovegin
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Nitschke @ 2006-02-16 13:52 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded

>>>>> "ES" == Eugene Surovegin <ebs@ebshome.net> writes:

  ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
  >> Hi,
  >> 
  >> I've been using a PCI device driver developed by a third party
  >> company.  It uses a scatter/gather DMA I/O to transfer data from
  >> the PCI device into user memory.  When using a buffer size of
  >> about 1 MB, the driver achieves a transfer bandwidth of about 60
  >> MB/s, on a 66 MHz, 32-bit bus.
  >> 
  >> The problem is, that sometimes the data is corrupt (usually on
  >> the first transfer).  We've concluded that the problem is related
  >> to cache coherency.  The Artesyn 2.6.10 reference kernel
  >> (branched from the kernel at penguinppc.org) must be built with
  >> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
  >> verified operation with hardware coherency enabled.  My
  >> understanding is that their Marvell system controller (MV64460)
  >> supports cache snooping, but their Linux kernel support hasn't
  >> caught up yet.
  >> 
  >> So if I understand my situation correctly, the device driver must
  >> use software-enforced coherency to avoid data corruption.  Is
  >> this correct?
  >> 
  >> What currently happens is this:
  >> 
  >> The buffers are allocated with get_user_pages(...)
  >> 
  >> After each DMA transfer is complete, the driver invalidates the
  >> cache using __dma_sync_page(...)

  ES> No, buffers must be invalidated _before_ DMA transfer, not
  ES> after.  Also, don't use internal PPC functions like
  ES> __dma_sync_page. Please, read Documentation/DMA-API.txt for
  ES> official API.

Thanks for the suggestions.  I'd like to clarify a few points,
however:

1/.  I did not write the driver (see my first line above).  I'm
     reading someone else's source and trying to figure out whether it
     is right or wrong, so I can discuss with them authoritatively
     what is going on.

2/.  I'm not _sure_ I understand terms like software-enforced
     coherency, non-consistent platforms, etc.  So should I be looking
     at the API in section I or II of DMA-API.txt ?  (I think section 'Id')

3/.  I think I did not explain the DMA process clearly enough.  This
     is how the third party documentation says the driver should be
     used (my annotations in parentheses):

	- Allocate and lock buffer into physical memory
            (Call driver ioctl function to map user DMA buffer using
            get_user_pages()) 
	- Configure DMA chain
	- Start DMA transfer
            (Set ID of the DMA descriptor that the DMA controller
            shall load first.  Allow target to perform bus-mastered
            DMA into platform memory)
	- Wait for DMA transfer to complete
            (interrupt signals end of transfer from target)
	- Do Cache Invalidate
            (Call driver ioctl which calls __dma_sync_page(), to
            invalidate the cache prior to reading the buffer from the
            host CPU.  Then copy data from buffer into other user
            memory.)
	- Unlock and free buffer from physical memory
            (Call device driver ioctl function which calls
            free_user_pages()) 

     So is __dma_sync_page being called by their driver routines at
     the wrong time?

4/.  The DMA-API.txt says:
        "Memory coherency operates at a granularity called the cache
        line width.  In order for memory mapped by this API to operate
        correctly, the mapped region must begin exactly on a cache
        line boundary and end exactly on one (to prevent two
        separately mapped regions from sharing a single cache line)."

     Given that we're not relying on cache snooping, and we call
     functions to invalidate the cache, does this statement still
     apply? 

Thanks again,

-- 
Phil


* Re: scatter/gather DMA and cache coherency
  2006-02-16 13:52   ` Phil Nitschke
@ 2006-02-16 16:33     ` Eugene Surovegin
  0 siblings, 0 replies; 10+ messages in thread
From: Eugene Surovegin @ 2006-02-16 16:33 UTC (permalink / raw)
  To: Phil.Nitschke; +Cc: linuxppc-embedded

On Fri, Feb 17, 2006 at 12:22:11AM +1030, Phil Nitschke wrote:
> >>>>> "ES" == Eugene Surovegin <ebs@ebshome.net> writes:
> 
>   ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
>   >> Hi,
>   >> 
>   >> I've been using a PCI device driver developed by a third party
>   >> company.  It uses a scatter/gather DMA I/O to transfer data from
>   >> the PCI device into user memory.  When using a buffer size of
>   >> about 1 MB, the driver achieves a transfer bandwidth of about 60
>   >> MB/s, on a 66 MHz, 32-bit bus.
>   >> 
>   >> The problem is, that sometimes the data is corrupt (usually on
>   >> the first transfer).  We've concluded that the problem is related
>   >> to cache coherency.  The Artesyn 2.6.10 reference kernel
>   >> (branched from the kernel at penguinppc.org) must be built with
>   >> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
>   >> verified operation with hardware coherency enabled.  My
>   >> understanding is that their Marvell system controller (MV64460)
>   >> supports cache snooping, but their Linux kernel support hasn't
>   >> caught up yet.
>   >> 
>   >> So if I understand my situation correctly, the device driver must
>   >> use software-enforced coherency to avoid data corruption.  Is
>   >> this correct?
>   >> 
>   >> What currently happens is this:
>   >> 
>   >> The buffers are allocated with get_user_pages(...)
>   >> 
>   >> After each DMA transfer is complete, the driver invalidates the
>   >> cache using __dma_sync_page(...)
> 
>   ES> No, buffers must be invalidated _before_ DMA transfer, not
>   ES> after.  Also, don't use internal PPC functions like
>   ES> __dma_sync_page. Please, read Documentation/DMA-API.txt for
>   ES> official API.
> 

[snip]

> 2/.  I'm not _sure_ I understand terms like software-enforced
>      coherency, non-consistent platforms, etc.  So should I be looking
>      at the API in section I or II of DMA-API.txt ?  (I think section 'Id')

Non-consistent means without cache snooping. On such platforms you 
have to use software enforced cache coherency or non-cached memory for 
DMA.

> 
> 3/.  I think I did not explain the DMA process clearly enough.  This
>      is how the third party documentation says the driver should be
>      used (my annotations in parentheses):
> 
> 	- Allocate and lock buffer into physical memory
>             (Call driver ioctl function to map user DMA buffer using
>             get_user_pages()) 
> 	- Configure DMA chain
> 	- Start DMA transfer
>             (Set ID of the DMA descriptor that the DMA controller
>             shall load first.  Allow target to perform bus-mastered
>             DMA into platform memory)
> 	- Wait for DMA transfer to complete
>             (interrupt signals end of transfer from target)
> 	- Do Cache Invalidate
>             (Call driver ioctl which calls __dma_sync_page(), to
>             invalidate the cache prior to reading the buffer from the
>             host CPU.  Then copy data from buffer into other user
>             memory.)
> 	- Unlock and free buffer from physical memory
>             (Call device driver ioctl function which calls
>             free_user_pages()) 
> 
>      So is __dma_sync_page being called by their driver routines at
>      the wrong time?

As I said before, invalidate must be done _before_ initiating DMA 
transfer. If that "third party documentation" states otherwise, that 
means people who wrote it didn't understand how caches work.

Consider the following scenario: you allocate a page from the kernel
page allocator. Some parts of that page are in the L1 cache and are
dirty (e.g. because they were recently used); I'm assuming the cache is
write-back. You start the DMA transfer and go on with some other tasks.
For some reason, those dirty lines are forced out of the cache, e.g.
because L1 needs cache lines for some other data. That write-back
overwrites data the device has already DMAed into memory, and you end
up with memory corruption.

> 
> 4/.  The DMA-API.txt says:
>         "Memory coherency operates at a granularity called the cache
>         line width.  In order for memory mapped by this API to operate
>         correctly, the mapped region must begin exactly on a cache
>         line boundary and end exactly on one (to prevent two
>         separately mapped regions from sharing a single cache line)."
> 
>      Given that we're not relying on cache snooping, and we call
>      functions to invalidate the cache, does this statement still
>      apply? 

Yes. Cache line granularity is very important for software enforced 
cache coherency.

I'd recommend you look at any driver which works on non-coherent cache 
platform like 4xx or 8xx for good examples on how to manage cache 
coherency.

-- 
Eugene


* Re: scatter/gather DMA and cache coherency
  2006-02-16  7:21 Phil Nitschke
  2006-02-16  8:03 ` Eugene Surovegin
@ 2006-02-16 17:46 ` Mark A. Greer
  2006-02-17  1:22   ` Phil Nitschke
  1 sibling, 1 reply; 10+ messages in thread
From: Mark A. Greer @ 2006-02-16 17:46 UTC (permalink / raw)
  To: Phil Nitschke; +Cc: linuxppc-embedded

On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:

> The problem is, that sometimes the data is corrupt (usually on the first
> transfer).  We've concluded that the problem is related to cache
> coherency.  The Artesyn 2.6.10 reference kernel (branched from the
> kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
> as Artesyn have never successfully verified operation with hardware
> coherency enabled.
> My understanding is that their Marvell system controller (MV64460)
> supports cache snooping, but their Linux kernel support hasn't caught up
> yet.

It would have been useful if you had given the actual hardware you're
using.  It sure sounds like you're using a katana or a very similar
board.  Coherency can't work on the katana b/c there is a hw
erratum in the bridge whose workaround is not implemented on that board, so
"CONFIG_NOT_COHERENT_CACHE=y" is the only option.  Fix the hardware
and the kernel will work with coherency enabled with a flip of a
switch (on the latest kernel).

For the record, don't assume that this is Artesyn's fault.  Artesyn says
that the erratum workaround is impractical and they may be right.
I don't know, I just write software...

> So if I understand my situation correctly, the device driver must use
> software-enforced coherency to avoid data corruption.  Is this correct?

It looks like Eugene is guiding you on this.  Listen to him.  I will add
that you should align your buffers on cacheline boundaries and make the
allocation sizes multiples of the cacheline size otherwise you could
have other data sharing the first and/or last cacheline of your buffers
and mess up your software cache mgmt.

Mark


* RE: scatter/gather DMA and cache coherency
@ 2006-02-16 18:23 Buhler, Greg
  2006-02-16 22:19 ` Phil Nitschke
  0 siblings, 1 reply; 10+ messages in thread
From: Buhler, Greg @ 2006-02-16 18:23 UTC (permalink / raw)
  To: Phil.Nitschke, linuxppc-embedded

Phil,
If the third party DMA driver is not proprietary send it over and I'd be
happy to take a look at it for you. I have been working with an
(unfortunately proprietary) scatter/gather DMA driver which uses all 4
of the DMA channels on a PPC405gp and have had to fix several cache
coherency problems to get SGDMA working properly.

I have this driver working properly on a branch of linux-2.4.21, and am
currently porting it to linux-2.6.15.4.

Make sure to post any findings you have to the list.

______________________
Greg Buhler
760.476.2699


* Re: scatter/gather DMA and cache coherency
  2006-02-16 18:23 scatter/gather DMA and cache coherency Buhler, Greg
@ 2006-02-16 22:19 ` Phil Nitschke
  2006-02-16 22:52   ` Eugene Surovegin
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Nitschke @ 2006-02-16 22:19 UTC (permalink / raw)
  To: Buhler, Greg; +Cc: linuxppc-embedded

>>>>> "GB" == Buhler, Greg <greg.buhler@viasat.com> writes:

  GB> Phil, If the third party DMA driver is not proprietary send it
  GB> over and I'd be happy to take a look at it for you. 

I don't think I can, due to this in the code:

========================================================================
/*
Copyright Notice:
  This computer software is proprietary to VMETRO. The use of this software
  is governed by a licensing agreement. VMETRO retains all rights under
  the copyright laws of the United States of America and other countries.
  This software may not be furnished or disclosed to any third party and
  may not be copied or reproduced by any means, electronic, mechanical, or
  otherwise, in whole or in part, without specific authorization in writing
  from VMETRO.
 
    Copyright (c) 1996-2005 by VMETRO, ASA.  All Rights Reserved.
*/

[snip]

/* Set the right GPL license to avoid warrnings then loading the driver */
MODULE_LICENSE("GPL");
========================================================================

Can you have a GPL driver where the source is copyright?

Thanks for the offer, Greg.

-- 
Phil


* Re: scatter/gather DMA and cache coherency
  2006-02-16 22:19 ` Phil Nitschke
@ 2006-02-16 22:52   ` Eugene Surovegin
  0 siblings, 0 replies; 10+ messages in thread
From: Eugene Surovegin @ 2006-02-16 22:52 UTC (permalink / raw)
  To: Phil Nitschke; +Cc: linuxppc-embedded

On Fri, Feb 17, 2006 at 08:49:50AM +1030, Phil Nitschke wrote:
> >>>>> "GB" == Buhler, Greg <greg.buhler@viasat.com> writes:
> 
>   GB> Phil, If the third party DMA driver is not proprietary send it
>   GB> over and I'd be happy to take a look at it for you. 
> 
> I don't think I can, due to this in the code:
> 
> ========================================================================
> /*
> Copyright Notice:
>   This computer software is proprietary to VMETRO. The use of this software
>   is governed by a licensing agreement. VMETRO retains all rights under
>   the copyright laws of the United States of America and other countries.
>   This software may not be furnished or disclosed to any third party and
>   may not be copied or reproduced by any means, electronic, mechanical, or
>   otherwise, in whole or in part, without specific authorization in writing
>   from VMETRO.
>  
>     Copyright (c) 1996-2005 by VMETRO, ASA.  All Rights Reserved.
> */
> 
> [snip]
> 
> /* Set the right GPL license to avoid warrnings then loading the driver */
> MODULE_LICENSE("GPL");
> ========================================================================
> 

I'm not a lawyer, but what they are doing is of questionable legality 
at least: they circumvent Linux protection by claiming that the module 
is GPL, but that copyright notice isn't GPL compatible.

If you are going to sell systems with this module, you may have 
trouble with your customers, because you'll clearly be violating GPL.

My experience with such vendors: their code isn't worth the trouble 
(I have yet to see a good Linux driver written by a hw vendor) and I'd 
rather avoid them completely.

-- 
Eugene


* Re: scatter/gather DMA and cache coherency
  2006-02-16 17:46 ` Mark A. Greer
@ 2006-02-17  1:22   ` Phil Nitschke
  2006-02-17 18:12     ` Mark A. Greer
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Nitschke @ 2006-02-17  1:22 UTC (permalink / raw)
  To: Mark A. Greer; +Cc: linuxppc-embedded

>>>>> "MAG" == Mark A Greer <mgreer@mvista.com> writes:

  MAG> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
  >> The problem is, that sometimes the data is corrupt (usually on the
  >> first transfer).  We've concluded that the problem is related to
  >> cache coherency.  The Artesyn 2.6.10 reference kernel (branched
  >> from the kernel at penguinppc.org) must be built with
  >> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
  >> verified operation with hardware coherency enabled.  My
  >> understanding is that their Marvell system controller (MV64460)
  >> supports cache snooping, but their Linux kernel support hasn't
  >> caught up yet.

  MAG> It would have been useful if you had given the actual hardware
  MAG> you're using.

Processor: http://www.artesyncp.com/products/PmPPC7448.html

  MAG> For the record, don't assume that this is Artesyn's fault.
  MAG> Artesyn says that the erratum workaround is impractical and they
  MAG> may be right.  I don't know, I just write software...

I don't know either.  I don't have a problem with Artesyn; they've
always been nice to me ;-)  Here's what one of their engineers had to
say on the topic:

  Artesyn> I stated in a previous email that our boards must have the
  Artesyn> CONFIG_NOT_COHERENT_CACHE option turned on.  This is because
  Artesyn> of our history with the Discovery family of bridges.
  Artesyn> Initially it was reported that the hardware cache coherency
  Artesyn> (snooping) was known to be not functional.  Then at a later
  Artesyn> date when it was supposed to be fixed, we found that it was
  Artesyn> not completely dependable so Artesyn has taken a stance to
  Artesyn> not trust snooping on the Discovery chips and to always use
  Artesyn> software cache coherency methods.

  >> So if I understand my situation correctly, the device driver must
  >> use software-enforced coherency to avoid data corruption.  Is this
  >> correct?

  MAG> It looks like Eugene is guiding you on this.  Listen to him.  I
  MAG> will add that you should align your buffers on cacheline
  MAG> boundaries and make the allocation sizes multiples of the
  MAG> cacheline size otherwise you could have other data sharing the
  MAG> first and/or last cacheline of your buffers and mess up your
  MAG> software cache mgmt.

It might well be that the third party driver isn't enforcing the
cacheline boundary alignment.  Artesyn tell me that "it is stated in the
MV64460 Users Manual that when interfacing cache coherent DRAM or
integrated SRAM, the maximum write burst size must be set to 32 bytes".
So I guess this is the cacheline size?  Anyway, we don't see any
corruption when the DMA buffer size is 32 bytes, but we do see it for 24
bytes, 36 bytes, etc.

I'll discuss this with the H/W vendors that wrote the driver.

--
Phil


* Re: scatter/gather DMA and cache coherency
  2006-02-17  1:22   ` Phil Nitschke
@ 2006-02-17 18:12     ` Mark A. Greer
  0 siblings, 0 replies; 10+ messages in thread
From: Mark A. Greer @ 2006-02-17 18:12 UTC (permalink / raw)
  To: Phil Nitschke; +Cc: linuxppc-embedded

Hi Phil,

On Fri, Feb 17, 2006 at 11:52:31AM +1030, Phil Nitschke wrote:
> >>>>> "MAG" == Mark A Greer <mgreer@mvista.com> writes:

<snip>

>   MAG> It would have been useful if you had given the actual hardware
>   MAG> you're using.
> 
> Processor: http://www.artesyncp.com/products/PmPPC7448.html

Okay, but since it's a ppmc module, the motherboard it's installed on would
be useful info too.  Don't worry about it now, more for future reference.

>   MAG> For the record, don't assume that this is Artesyn's fault.
>   MAG> Artesyn says that the erratum workaround is impractical and they
>   MAG> may be right.  I don't know, I just write software...
> 
> I don't know either.  I don't have a problem with Artesyn; they've
> always been nice to me ;-)  Here's what one of their engineers had to
> say on the topic:
> 
>   Artesyn> I stated in a previous email that our boards must have the
>   Artesyn> CONFIG_NOT_COHERENT_CACHE option turned on.  This is because
>   Artesyn> of our history with the Discovery family of bridges.
>   Artesyn> Initially it was reported that the hardware cache coherency
>   Artesyn> (snooping) was known to be not functional.  Then at a later
>   Artesyn> date when it was supposed to be fixed, we found that it was
>   Artesyn> not completely dependable so Artesyn has taken a stance to
>   Artesyn> not trust snooping on the Discovery chips and to always use
>   Artesyn> software cache coherency methods.

Yep.  I didn't mean to implicate Artesyn.  Marvell bridges [so far] have
all had problems with coherency so I definitely believe what's written
above.

>   >> So if I understand my situation correctly, the device driver must
>   >> use software-enforced coherency to avoid data corruption.  Is this
>   >> correct?
> 
>   MAG> It looks like Eugene is guiding you on this.  Listen to him.  I
>   MAG> will add that you should align your buffers on cacheline
>   MAG> boundaries and make the allocation sizes multiples of the
>   MAG> cacheline size otherwise you could have other data sharing the
>   MAG> first and/or last cacheline of your buffers and mess up your
>   MAG> software cache mgmt.
> 
> It might well be that the third party driver isn't enforcing the
> cacheline boundary alignment.

If it isn't, then you have a bug and it will bite you.

> Artesyn tell me that "it is stated in the
> MV64460 Users Manual that when interfacing cache coherent DRAM or
> integrated SRAM, the maximum write burst size must be set to 32 bytes".

Yes, but you [should] have coherency off so this isn't an issue for you.

> So I guess this is the cacheline size?

Correct, the cacheline size of the 7448 is 32 bytes.

> Anyway, we don't see any
> corruption when the DMA buffer size is 32 bytes, but we do see it for 24
> bytes, 36 bytes, etc.

This sounds like what I was referring to.  Do you see the problem?

If you have some other data in the same cacheline as your buffers
(or buffer descriptors) then whenever that other data is read/written
you have the potential for it to screw up the manual cache mgmt you
*thought* you did for your buffers/buf desc's.

Mark
