* Re: scatter/gather DMA and cache coherency
From: Phil Nitschke @ 2006-02-17 1:22 UTC (permalink / raw)
To: Mark A. Greer; +Cc: linuxppc-embedded
In-Reply-To: <20060216174655.GC16848@mag.az.mvista.com>
>>>>> "MAG" == Mark A Greer <mgreer@mvista.com> writes:
MAG> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
>> The problem is, that sometimes the data is corrupt (usually on the
>> first transfer). We've concluded that the problem is related to
>> cache coherency. The Artesyn 2.6.10 reference kernel (branched
>> from the kernel at penguinppc.org) must be built with
>> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
>> verified operation with hardware coherency enabled. My
>> understanding is that their Marvell system controller (MV64460)
>> supports cache snooping, but their Linux kernel support hasn't
>> caught up yet.
MAG> It would have been useful if you had given the actual hardware
MAG> you're using.
Processor: http://www.artesyncp.com/products/PmPPC7448.html
MAG> For the record, don't assume that this is Artesyn's fault.
MAG> Artesyn says that the erratum workaround is impractical and they
MAG> may be right. I don't know, I just write software...
I don't know either. I don't have a problem with Artesyn; they've
always been nice to me ;-) Here's what one of their engineers had to
say on the topic:
Artesyn> I stated in a previous email that our boards must have the
Artesyn> CONFIG_NOT_COHERENT_CACHE option turned on. This is because
Artesyn> of our history with the Discovery family of bridges.
Artesyn> Initially it was reported that the hardware cache coherency
Artesyn> (snooping) was known to be not functional. Then at a later
Artesyn> date when it was supposed to be fixed, we found that it was
Artesyn> not completely dependable so Artesyn has taken a stance to
Artesyn> not trust snooping on the Discovery chips and to always use
Artesyn> software cache coherency methods.
>> So if I understand my situation correctly, the device driver must
>> use software-enforced coherency to avoid data corruption. Is this
>> correct?
MAG> It looks like Eugene is guiding you on this. Listen to him. I
MAG> will add that you should align your buffers on cacheline
MAG> boundaries and make the allocation sizes multiples of the
MAG> cacheline size otherwise you could have other data sharing the
MAG> first and/or last cacheline of your buffers and mess up your
MAG> software cache mgmt.
It might well be that the third party driver isn't enforcing the
cacheline boundary alignment. Artesyn tell me that "it is stated in the
MV64460 Users Manual that when interfacing cache coherent DRAM or
integrated SRAM, the maximum write burst size must be set to 32 bytes".
So I guess this is the cacheline size? Anyway, we don't see any
corruption when the DMA buffer size is 32 bytes, but we do see it for 24
bytes, 36 bytes, etc.
I'll discuss this with the H/W vendors that wrote the driver.
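A minimal sketch of the alignment rule Mark describes, assuming a 32-byte cache line (the value implied by the 32-byte burst size quoted above). The helper names are hypothetical, for illustration only; they come from neither the kernel nor the third-party driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines, as suggested by the burst size above. */
#define CACHE_LINE_SIZE 32u

/* Round a length up to the next multiple of the cache line size. */
static inline size_t cacheline_round_up(size_t len)
{
    /* Guard against wrap-around for absurdly large lengths. */
    assert(len <= SIZE_MAX - (CACHE_LINE_SIZE - 1));
    return (len + CACHE_LINE_SIZE - 1) & ~(size_t)(CACHE_LINE_SIZE - 1);
}

/* True if [addr, addr + len) starts and ends exactly on line boundaries. */
static inline int dma_buffer_is_line_exact(uintptr_t addr, size_t len)
{
    return len != 0 &&
           (addr % CACHE_LINE_SIZE) == 0 &&
           (len % CACHE_LINE_SIZE) == 0;
}
```

With these helpers a 24-byte request rounds up to 32 bytes and a 36-byte request to 64 bytes, so a line-aligned buffer of the rounded size does not share its first or last cache line with unrelated data.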
--
Phil
* Re: How to access uboot environment variables from Linux?
From: Wolfgang Denk @ 2006-02-16 23:53 UTC (permalink / raw)
To: Bizhan Gholikhamseh (bgholikh); +Cc: linuxppc-dev
In-Reply-To: <F795765B112E7344AF36AA9112796415019ECCF1@xmb-sjc-212.amer.cisco.com>
In message <F795765B112E7344AF36AA9112796415019ECCF1@xmb-sjc-212.amer.cisco.com> you wrote:
>
> How could I access the uboot environment variables from Linux? For
This question is off topic here; you should ask such stuff on the
u-boot-users mailing list instead.
> example I would like to access the "serverip"
> and change that to a different ip address during run time.
You should also read the FAQs before posting. See here:
http://www.denx.de/wiki/view/DULG/HowCanIAccessUBootEnvironmentVariablesInLinux
Best regards,
Wolfgang Denk
--
Software Engineering: Embedded and Realtime Systems, Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Our business is run on trust. We trust you will pay in advance.
* SMP support for PPC405 on Virtex-II Pro
From: Eric L @ 2006-02-16 22:56 UTC (permalink / raw)
To: linuxppc-embedded
Hi all,
I know MontaVista has commercial kernels for this feature.
I also know PPC405 doesn't implement cache coherency.
Anyhow, is there an open kernel that has SMP support for
PPC405 cores? Does anyone have any suggestions on getting
dual PPC405 cores to work under Linux other than having two
copies of kernels running? Thanks plenty!
-Eric
* Re: scatter/gather DMA and cache coherency
From: Eugene Surovegin @ 2006-02-16 22:52 UTC (permalink / raw)
To: Phil Nitschke; +Cc: linuxppc-embedded
In-Reply-To: <kw3bijdjzt.fsf@lamorak.int.avalon.com.au>
On Fri, Feb 17, 2006 at 08:49:50AM +1030, Phil Nitschke wrote:
> >>>>> "GB" == Buhler, Greg <greg.buhler@viasat.com> writes:
>
> GB> Phil, If the third party DMA driver is not proprietary send it
> GB> over and I'd be happy to take a look at it for you.
>
> I don't think I can, due to this in the code:
>
> ========================================================================
> /*
> Copyright Notice:
> This computer software is proprietary to VMETRO. The use of this software
> is governed by a licensing agreement. VMETRO retains all rights under
> the copyright laws of the United States of America and other countries.
> This software may not be furnished or disclosed to any third party and
> may not be copied or reproduced by any means, electronic, mechanical, or
> otherwise, in whole or in part, without specific authorization in writing
> from VMETRO.
>
> Copyright (c) 1996-2005 by VMETRO, ASA. All Rights Reserved.
> */
>
> [snip]
>
> /* Set the right GPL license to avoid warrnings then loading the driver */
> MODULE_LICENSE("GPL");
> ========================================================================
>
I'm not a lawyer, but what they are doing is of questionable legality
at best: they circumvent Linux's license protection by claiming that the
module is GPL, even though that copyright notice isn't GPL compatible.
If you are going to sell systems with this module, you may have
trouble with your customers, because you'll clearly be violating the GPL.
In my experience with such vendors, their code isn't worth the trouble
(I have yet to see a good Linux driver written by a hw vendor), and I'd
rather avoid them completely.
--
Eugene
* Re: scatter/gather DMA and cache coherency
From: Phil Nitschke @ 2006-02-16 22:19 UTC (permalink / raw)
To: Buhler, Greg; +Cc: linuxppc-embedded
In-Reply-To: <68997D3094017740BB875EB2A425A6EA02E46197@VCAEXCH01.hq.corp.viasat.com>
>>>>> "GB" == Buhler, Greg <greg.buhler@viasat.com> writes:
GB> Phil, If the third party DMA driver is not proprietary send it
GB> over and I'd be happy to take a look at it for you.
I don't think I can, due to this in the code:
========================================================================
/*
Copyright Notice:
This computer software is proprietary to VMETRO. The use of this software
is governed by a licensing agreement. VMETRO retains all rights under
the copyright laws of the United States of America and other countries.
This software may not be furnished or disclosed to any third party and
may not be copied or reproduced by any means, electronic, mechanical, or
otherwise, in whole or in part, without specific authorization in writing
from VMETRO.
Copyright (c) 1996-2005 by VMETRO, ASA. All Rights Reserved.
*/
[snip]
/* Set the right GPL license to avoid warrnings then loading the driver */
MODULE_LICENSE("GPL");
========================================================================
Can you have a GPL driver where the source is copyright?
Thanks for the offer, Greg.
--
Phil
* [PATCH 2.6.16rc2] EST8260 has bogus bd_info
From: Paul Gortmaker @ 2006-02-16 19:14 UTC (permalink / raw)
To: linuxppc-embedded; +Cc: p_gortmaker
I managed to rescue an old EST8260 board from a life as a doorstop, and
after sticking u-boot on it, I was getting nothing but a silent death.
I eventually discovered it wouldn't boot because est8260.h had its own
personal copy of an ancient bd_info struct that doesn't match any U-boot
from this century.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
--- linux-2.6.16rc2-orig/arch/ppc/platforms/est8260.h 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.16rc2/arch/ppc/platforms/est8260.h 2006-02-16 12:04:02.000000000 -0500
@@ -6,30 +6,15 @@
#ifndef __EST8260_PLATFORM
#define __EST8260_PLATFORM
+#include <linux/config.h>
+#include <asm/ppcboot.h>
+
#define CPM_MAP_ADDR ((uint)0xf0000000)
#define BOOTROM_RESTART_ADDR ((uint)0xff000104)
/* For our show_cpuinfo hooks. */
-#define CPUINFO_VENDOR "EST Corporation"
-#define CPUINFO_MACHINE "SBC8260 PowerPC"
-
-/* A Board Information structure that is given to a program when
- * prom starts it up.
- */
-typedef struct bd_info {
- unsigned int bi_memstart; /* Memory start address */
- unsigned int bi_memsize; /* Memory (end) size in bytes */
- unsigned int bi_intfreq; /* Internal Freq, in Hz */
- unsigned int bi_busfreq; /* Bus Freq, in MHz */
- unsigned int bi_cpmfreq; /* CPM Freq, in MHz */
- unsigned int bi_brgfreq; /* BRG Freq, in MHz */
- unsigned int bi_vco; /* VCO Out from PLL */
- unsigned int bi_baudrate; /* Default console baud rate */
- unsigned int bi_immr; /* IMMR when called from boot rom */
- unsigned char bi_enetaddr[6];
-} bd_t;
-
-extern bd_t m8xx_board_info;
+#define CPUINFO_VENDOR "Wind River"
+#define CPUINFO_MACHINE "EST SBC8260 PowerPC"
#endif /* __EST8260_PLATFORM */
* PEMICRO CABLEPPC and GDB
From: dibacco @ 2006-02-16 19:22 UTC (permalink / raw)
To: linuxppc-embedded
I have bought a P&E MICRO CABLEPPC (parallel BDM wiggler). Has anyone used it
successfully with Linux? Is there a Linux driver for it?
Bye,
Antonio.
* module_init macro seems not to work
From: dibacco @ 2006-02-16 19:20 UTC (permalink / raw)
To: linuxppc-embedded
If I use something like:
module_init(cpm_timer_init);
cpm_timer_init is not called when I load the module.
I have to call cpm_timer_init inside init_module() .
Bye,
Antonio.
* RE: scatter/gather DMA and cache coherency
From: Buhler, Greg @ 2006-02-16 18:23 UTC (permalink / raw)
To: Phil.Nitschke, linuxppc-embedded
Phil,
If the third party DMA driver is not proprietary send it over and I'd be
happy to take a look at it for you. I have been working with an
(unfortunately proprietary) scatter/gather DMA driver which uses all 4
of the DMA channels on a PPC405gp and have had to fix several cache
coherency problems to get SGDMA working properly.
I have this driver working properly on a branch of linux-2.4.21, and am
currently porting it to linux-2.6.15.4.
Make sure to post any findings you have to the list.
______________________
Greg Buhler
760.476.2699
-----Original Message-----
From: linuxppc-embedded-bounces+greg.buhler=viasat.com@ozlabs.org
[mailto:linuxppc-embedded-bounces+greg.buhler=viasat.com@ozlabs.org] On
Behalf Of Phil Nitschke
Sent: Wednesday, February 15, 2006 11:21 PM
To: linuxppc-embedded@ozlabs.org
Subject: scatter/gather DMA and cache coherency
Hi,
I've been using a PCI device driver developed by a third party company.
It uses a scatter/gather DMA I/O to transfer data from the PCI device
into user memory. When using a buffer size of about 1 MB, the driver
achieves a transfer bandwidth of about 60 MB/s, on a 66 MHz, 32-bit
bus.
The problem is, that sometimes the data is corrupt (usually on the first
transfer). We've concluded that the problem is related to cache
coherency. The Artesyn 2.6.10 reference kernel (branched from the
kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
as Artesyn have never successfully verified operation with hardware
coherency enabled.
My understanding is that their Marvell system controller (MV64460)
supports cache snooping, but their Linux kernel support hasn't caught up
yet.
So if I understand my situation correctly, the device driver must use
software-enforced coherency to avoid data corruption. Is this correct?
What currently happens is this:
The buffers are allocated with get_user_pages(...)
After each DMA transfer is complete, the driver invalidates the cache
using __dma_sync_page(...)
Only on close() does the driver set the pages dirty, like this:
/* Set each cache page dirty */
for (ipage = 0; ipage < nr_pages; ipage++)
{
    if (!PageReserved (pages[ipage]))
        SetPageDirty ( pages[ ipage ] );
}
/* Every mapped page must be released from the page cache */
for (ipage = 0; ipage < nr_pages; ipage++)
    page_cache_release ( pages[ ipage ] );
According to my reading of "Linux Device Drivers, Third Edition" by
Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman,
SetPageDirty() should be called every time the pages are changed (not
just when the pages are released). (OTOH, the text does not mention the
__dma_sync_page() routine at all.)
Could this be the cause of the corruption we're seeing?
If not, are there any other steps required to enforce "software"
coherency?
--
Phil
* Re: scatter/gather DMA and cache coherency
From: Mark A. Greer @ 2006-02-16 17:46 UTC (permalink / raw)
To: Phil Nitschke; +Cc: linuxppc-embedded
In-Reply-To: <kw4q2zg45r.fsf@lamorak.int.avalon.com.au>
On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
> The problem is, that sometimes the data is corrupt (usually on the first
> transfer). We've concluded that the problem is related to cache
> coherency. The Artesyn 2.6.10 reference kernel (branched from the
> kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
> as Artesyn have never successfully verified operation with hardware
> coherency enabled.
> My understanding is that their Marvell system controller (MV64460)
> supports cache snooping, but their Linux kernel support hasn't caught up
> yet.
It would have been useful if you had given the actual hardware you're
using. It sure sounds like you're using a katana or a very similar
board. Coherency can't work on the katana b/c there is a hw
erratum in the bridge whose workaround is not implemented on that board, so
"CONFIG_NOT_COHERENT_CACHE=y" is the only option. Fix the hardware
and the kernel will work with coherency enabled with a flip of a
switch (on the latest kernel).
For the record, don't assume that this is Artesyn's fault. Artesyn says
that the erratum workaround is impractical and they may be right.
I don't know, I just write software...
> So if I understand my situation correctly, the device driver must use
> software-enforced coherency to avoid data corruption. Is this correct?
It looks like Eugene is guiding you on this. Listen to him. I will add
that you should align your buffers on cacheline boundaries and make the
allocation sizes multiples of the cacheline size otherwise you could
have other data sharing the first and/or last cacheline of your buffers
and mess up your software cache mgmt.
Mark
* Re: scatter/gather DMA and cache coherency
From: Eugene Surovegin @ 2006-02-16 16:33 UTC (permalink / raw)
To: Phil.Nitschke; +Cc: linuxppc-embedded
In-Reply-To: <wqfymjif78.fsf@toby.int.avalon.com.au>
On Fri, Feb 17, 2006 at 12:22:11AM +1030, Phil Nitschke wrote:
> >>>>> "ES" == Eugene Surovegin <ebs@ebshome.net> writes:
>
> ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
> >> Hi,
> >>
> >> I've been using a PCI device driver developed by a third party
> >> company. It uses a scatter/gather DMA I/O to transfer data from
> >> the PCI device into user memory. When using a buffer size of
> >> about 1 MB, the driver achieves a transfer bandwidth of about 60
> >> MB/s, on a 66 MHz, 32-bit bus.
> >>
> >> The problem is, that sometimes the data is corrupt (usually on
> >> the first transfer). We've concluded that the problem is related
> >> to cache coherency. The Artesyn 2.6.10 reference kernel
> >> (branched from the kernel at penguinppc.org) must be built with
> >> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
> >> verified operation with hardware coherency enabled. My
> >> understanding is that their Marvell system controller (MV64460)
> >> supports cache snooping, but their Linux kernel support hasn't
> >> caught up yet.
> >>
> >> So if I understand my situation correctly, the device driver must
> >> use software-enforced coherency to avoid data corruption. Is
> >> this correct?
> >>
> >> What currently happens is this:
> >>
> >> The buffers are allocated with get_user_pages(...)
> >>
> >> After each DMA transfer is complete, the driver invalidates the
> >> cache using __dma_sync_page(...)
>
> ES> No, buffers must be invalidated _before_ DMA transfer, not
> ES> after. Also, don't use internal PPC functions like
> ES> __dma_sync_page. Please, read Documentation/DMA-API.txt for
> ES> official API.
>
[snip]
> 2/. I'm not _sure_ I understand terms like software-enforced
> coherency, non-consistent platforms, etc. So should I be looking
> at the API in section I or II of DMA-API.txt ? (I think section 'Id')
Non-consistent means without cache snooping. On such platforms you
have to use software enforced cache coherency or non-cached memory for
DMA.
>
> 3/. I think I did not explain the DMA process clearly enough. This
> is how the third party documentation says the driver should be
> used (my annotations in parenthesis):
>
> - Allocate and lock buffer into physical memory
> (Call driver ioctl function to map user DMA buffer using
> get_user_pages())
> - Configure DMA chain
> - Start DMA transfer
> (Set ID of the DMA descriptor that the DMA controller
> shall load first. Allow target to perform bus-mastered
> DMA into platform memory)
> - Wait for DMA transfer to complete
> (interrupt signals end of transfer from target)
> - Do Cache Invalidate
> (Call driver ioctl which calls __dma_sync_page(), to
> invalidate the cache prior to reading the buffer from the
> host CPU. Then copy data from buffer into other user
> memory.)
> - Unlock and free buffer from physical memory
> (Call device driver ioctl function which calls
> free_user_pages())
>
> So is __dma_sync_page being called by their driver routines at
> the wrong time?
As I said before, invalidate must be done _before_ initiating DMA
transfer. If that "third party documentation" states otherwise, that
means people who wrote it didn't understand how caches work.
Consider the following scenario: you allocate a page from the kernel page
allocator. Some parts of that page are in the L1 cache and are dirty
(e.g. because they were recently used); I'm assuming the cache is
write-back. You start the DMA transfer and go on with some other tasks.
For some reason, those dirty lines are forced out of the cache, e.g.
because L1 needs cache lines for some other data. This write-back
overwrites data the device has already DMAed into memory, and you end up
with memory corruption.
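The scenario above can be modelled in user space with a toy write-back cache. This is only an illustrative sketch, not kernel code: the structure and function names are invented for the example, the "cache" holds a single 32-byte line, and real hardware behaviour such as speculative prefetch is not modelled.

```c
#include <string.h>

#define LINE_SIZE 32

/* Toy model: one line of "RAM" and one write-back cache line covering it. */
struct toy_system {
    unsigned char ram[LINE_SIZE];
    unsigned char cache[LINE_SIZE];
    int valid;
    int dirty;
};

static void cpu_write(struct toy_system *s, int off, unsigned char v)
{
    if (!s->valid) {                 /* line fill on a miss */
        memcpy(s->cache, s->ram, LINE_SIZE);
        s->valid = 1;
    }
    s->cache[off] = v;               /* write-back: RAM is not updated yet */
    s->dirty = 1;
}

static unsigned char cpu_read(struct toy_system *s, int off)
{
    if (!s->valid) {                 /* miss: fetch the line from RAM */
        memcpy(s->cache, s->ram, LINE_SIZE);
        s->valid = 1;
        s->dirty = 0;
    }
    return s->cache[off];
}

/* Device writes go straight to RAM; nothing snoops the cache. */
static void dma_write(struct toy_system *s, int off, unsigned char v)
{
    s->ram[off] = v;
}

/* Cache pressure: a dirty line is written back before being dropped. */
static void cache_evict(struct toy_system *s)
{
    if (s->valid && s->dirty)
        memcpy(s->ram, s->cache, LINE_SIZE);
    s->valid = s->dirty = 0;
}

/* Invalidate: drop the line without writing it back. */
static void cache_invalidate(struct toy_system *s)
{
    s->valid = s->dirty = 0;
}

/* Returns the byte the CPU sees after the device has DMAed 0x55 into RAM. */
unsigned char simulate(int invalidate_before, int evict_during_dma,
                       int invalidate_after)
{
    struct toy_system s;

    memset(&s, 0, sizeof(s));
    cpu_write(&s, 0, 0xAA);          /* buffer recently used: line is dirty */
    if (invalidate_before)
        cache_invalidate(&s);
    dma_write(&s, 0, 0x55);          /* device data lands in RAM */
    if (evict_during_dma)
        cache_evict(&s);             /* a stale dirty line overwrites RAM */
    if (invalidate_after)
        cache_invalidate(&s);
    return cpu_read(&s, 0);
}
```

In this model simulate(0, 1, 1) returns the stale 0xAA, because the eviction lands on top of the DMAed data before the late invalidate runs. simulate(0, 0, 1) happens to return 0x55, which may be why the corruption shows up only sometimes. With the invalidate done before the transfer, simulate(1, 1, 0) and simulate(1, 0, 0) both return 0x55.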
>
> 4/. The DMA-API.txt says:
> "Memory coherency operates at a granularity called the cache
> line width. In order for memory mapped by this API to operate
> correctly, the mapped region must begin exactly on a cache
> line boundary and end exactly on one (to prevent two
> separately mapped regions from sharing a single cache line)."
>
> Given that we're not relying on cache snooping, and we call
> functions to invalidate the cache, does this statement still
> apply?
Yes. Cache line granularity is very important for software enforced
cache coherency.
I'd recommend you look at any driver which works on non-coherent cache
platform like 4xx or 8xx for good examples on how to manage cache
coherency.
--
Eugene
* Re: scatter/gather DMA and cache coherency
From: Phil Nitschke @ 2006-02-16 13:52 UTC (permalink / raw)
To: Eugene Surovegin; +Cc: linuxppc-embedded
In-Reply-To: <20060216080303.GD23150@gate.ebshome.net>
>>>>> "ES" == Eugene Surovegin <ebs@ebshome.net> writes:
ES> On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
>> Hi,
>>
>> I've been using a PCI device driver developed by a third party
>> company. It uses a scatter/gather DMA I/O to transfer data from
>> the PCI device into user memory. When using a buffer size of
>> about 1 MB, the driver achieves a transfer bandwidth of about 60
>> MB/s, on a 66 MHz, 32-bit bus.
>>
>> The problem is, that sometimes the data is corrupt (usually on
>> the first transfer). We've concluded that the problem is related
>> to cache coherency. The Artesyn 2.6.10 reference kernel
>> (branched from the kernel at penguinppc.org) must be built with
>> CONFIG_NOT_COHERENT_CACHE=y, as Artesyn have never successfully
>> verified operation with hardware coherency enabled. My
>> understanding is that their Marvell system controller (MV64460)
>> supports cache snooping, but their Linux kernel support hasn't
>> caught up yet.
>>
>> So if I understand my situation correctly, the device driver must
>> use software-enforced coherency to avoid data corruption. Is
>> this correct?
>>
>> What currently happens is this:
>>
>> The buffers are allocated with get_user_pages(...)
>>
>> After each DMA transfer is complete, the driver invalidates the
>> cache using __dma_sync_page(...)
ES> No, buffers must be invalidated _before_ DMA transfer, not
ES> after. Also, don't use internal PPC functions like
ES> __dma_sync_page. Please, read Documentation/DMA-API.txt for
ES> official API.
Thanks for the suggestions. I'd like to point out, however, a few
points:
1/. I did not write the driver (see my first line above). I'm
reading someone else's source and trying to figure out whether it
is right or wrong, so I can discuss with them authoritatively
what is going on.
2/. I'm not _sure_ I understand terms like software-enforced
coherency, non-consistent platforms, etc. So should I be looking
at the API in section I or II of DMA-API.txt ? (I think section 'Id')
3/. I think I did not explain the DMA process clearly enough. This
is how the third party documentation says the driver should be
used (my annotations in parenthesis):
- Allocate and lock buffer into physical memory
(Call driver ioctl function to map user DMA buffer using
get_user_pages())
- Configure DMA chain
- Start DMA transfer
(Set ID of the DMA descriptor that the DMA controller
shall load first. Allow target to perform bus-mastered
DMA into platform memory)
- Wait for DMA transfer to complete
(interrupt signals end of transfer from target)
- Do Cache Invalidate
(Call driver ioctl which calls __dma_sync_page(), to
invalidate the cache prior to reading the buffer from the
host CPU. Then copy data from buffer into other user
memory.)
- Unlock and free buffer from physical memory
(Call device driver ioctl function which calls
free_user_pages())
So is __dma_sync_page being called by their driver routines at
the wrong time?
4/. The DMA-API.txt says:
"Memory coherency operates at a granularity called the cache
line width. In order for memory mapped by this API to operate
correctly, the mapped region must begin exactly on a cache
line boundary and end exactly on one (to prevent two
separately mapped regions from sharing a single cache line)."
Given that we're not relying on cache snooping, and we call
functions to invalidate the cache, does this statement still
apply?
Thanks again,
--
Phil
* Re: scatter/gather DMA and cache coherency
From: Eugene Surovegin @ 2006-02-16 8:03 UTC (permalink / raw)
To: Phil Nitschke; +Cc: linuxppc-embedded
In-Reply-To: <kw4q2zg45r.fsf@lamorak.int.avalon.com.au>
On Thu, Feb 16, 2006 at 05:51:20PM +1030, Phil Nitschke wrote:
> Hi,
>
> I've been using a PCI device driver developed by a third party company.
> It uses a scatter/gather DMA I/O to transfer data from the PCI device
> into user memory. When using a buffer size of about 1 MB, the driver
> achieves a transfer bandwidth of about 60 MB/s, on a 66 MHz, 32-bit
> bus.
>
> The problem is, that sometimes the data is corrupt (usually on the first
> transfer). We've concluded that the problem is related to cache
> coherency. The Artesyn 2.6.10 reference kernel (branched from the
> kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
> as Artesyn have never successfully verified operation with hardware
> coherency enabled.
> My understanding is that their Marvell system controller (MV64460)
> supports cache snooping, but their Linux kernel support hasn't caught up
> yet.
>
> So if I understand my situation correctly, the device driver must use
> software-enforced coherency to avoid data corruption. Is this correct?
>
> What currently happens is this:
>
> The buffers are allocated with get_user_pages(...)
>
> After each DMA transfer is complete, the driver invalidates the cache
> using __dma_sync_page(...)
No, buffers must be invalidated _before_ DMA transfer, not after.
Also, don't use internal PPC functions like __dma_sync_page. Please,
read Documentation/DMA-API.txt for official API.
--
Eugene
* scatter/gather DMA and cache coherency
From: Phil Nitschke @ 2006-02-16 7:21 UTC (permalink / raw)
To: linuxppc-embedded
Hi,
I've been using a PCI device driver developed by a third party company.
It uses a scatter/gather DMA I/O to transfer data from the PCI device
into user memory. When using a buffer size of about 1 MB, the driver
achieves a transfer bandwidth of about 60 MB/s, on a 66 MHz, 32-bit
bus.
The problem is, that sometimes the data is corrupt (usually on the first
transfer). We've concluded that the problem is related to cache
coherency. The Artesyn 2.6.10 reference kernel (branched from the
kernel at penguinppc.org) must be built with CONFIG_NOT_COHERENT_CACHE=y,
as Artesyn have never successfully verified operation with hardware
coherency enabled.
My understanding is that their Marvell system controller (MV64460)
supports cache snooping, but their Linux kernel support hasn't caught up
yet.
So if I understand my situation correctly, the device driver must use
software-enforced coherency to avoid data corruption. Is this correct?
What currently happens is this:
The buffers are allocated with get_user_pages(...)
After each DMA transfer is complete, the driver invalidates the cache
using __dma_sync_page(...)
Only on close() does the driver set the pages dirty, like this:
/* Set each cache page dirty */
for (ipage = 0; ipage < nr_pages; ipage++)
{
    if (!PageReserved (pages[ipage]))
        SetPageDirty ( pages[ ipage ] );
}
/* Every mapped page must be released from the page cache */
for (ipage = 0; ipage < nr_pages; ipage++)
    page_cache_release ( pages[ ipage ] );
According to my reading of "Linux Device Drivers, Third Edition" by
Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman,
SetPageDirty() should be called every time the pages are changed (not
just when the pages are released). (OTOH, the text does not mention the
__dma_sync_page() routine at all.)
Could this be the cause of the corruption we're seeing?
If not, are there any other steps required to enforce "software"
coherency?
--
Phil
* Re: Re: Gigabit ethernet support of ppc440gx in 2.6 and 2.4
From: 廖荣生 @ 2006-02-16 4:45 UTC (permalink / raw)
To: Eugene Surovegin; +Cc: linuxppc-embedded
Hi Eugene:
Is the TCP/IP acceleration hardware of the 440GX supported in the official 2.6 kernel?
What was the CPU utilization when you reached 900+ Mb/s? We want to run something like a simple data codec at the same time.
Regards,
Lonsn
>
>On Wed, Feb 15, 2006 at 02:08:52PM +0800, 廖荣生 wrote:
>> We want to get a data rate of 600Mbits/s over gigabit ethernet of ppc440gx.
>> How about the status of support to ppc440gx GigE in Linux kernel?
>> Which kernel version should we select? 2.6 or 2.4?
>
>GigE support for 440GX is in official 2.6. Patch for 2.4 is available
>at http://kernel.ebshome.net. If you don't feel comfortable dealing
>with kernel patches, I'd recommend 2.6
>
>Effective Ethernet throughput highly depends on packet size. For some
>small packet sizes 600Mb/s is theoretically impossible over GigE.
>
>I achieved 900+ Mb/s TCP throughput with my driver (packets around 4K
>long) and using sendfile(2) based test application.
>
>--
>Eugene
>
>
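A rough back-of-the-envelope sketch of why small packets cap throughput, assuming standard Ethernet framing without VLAN tags (8 bytes of preamble and SFD, 14 bytes of MAC header, 4 bytes of FCS, 12 bytes of inter-frame gap). The function name is made up for this example:

```c
/* Per-frame overhead on the wire, in bytes (assumes no VLAN tag):
 * preamble + SFD (8) + MAC header (14) + FCS (4) + inter-frame gap (12). */
#define ETH_OVERHEAD_BYTES (8 + 14 + 4 + 12)
/* Shorter payloads are padded up to this size on the wire. */
#define ETH_MIN_PAYLOAD    46

/* Upper bound on payload throughput (Mb/s) for a given payload size. */
double eth_payload_mbps(unsigned payload_bytes, double line_rate_mbps)
{
    unsigned padded = payload_bytes < ETH_MIN_PAYLOAD ? ETH_MIN_PAYLOAD
                                                      : payload_bytes;
    unsigned on_wire = padded + ETH_OVERHEAD_BYTES;

    return line_rate_mbps * payload_bytes / on_wire;
}
```

With the minimum 46-byte payload this gives roughly 548 Mb/s of payload on a 1000 Mb/s link, below the 600 Mb/s target, while 1500-byte payloads give roughly 975 Mb/s. IP and TCP headers come out of that payload, so application-level throughput is lower still.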
* Re: Re: Gigabit ethernet support of ppc440gx in 2.6 and 2.4
From: Eugene Surovegin @ 2006-02-16 4:49 UTC (permalink / raw)
To: 廖荣生; +Cc: linuxppc-embedded
In-Reply-To: <43F402AC.026BD0.10674>
On Thu, Feb 16, 2006 at 12:45:15PM +0800, 廖荣生 wrote:
> Is the TCP/IP acceleration hardware of the 440GX supported in the
> official 2.6 kernel?
> What was the CPU utilization when you reached 900+ Mb/s? We want
> to run something like a simple data codec at the same time.
My driver (both 2.4 and 2.6) supports TCP/UDP checksum offload. No TSO
yet.
I don't remember exact CPU load numbers, but it was less than 20% for
TX case (Ocotea was transmitting data).
--
Eugene
* Re: PowerQUICC II Pro MPC8349E-MDS Linux 2.6 support
From: David Hawkins @ 2006-02-16 0:01 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-embedded
In-Reply-To: <Pine.LNX.4.44.0602151743030.450-100000@gate.crashing.org>
>> ... for the MPC8349E-MDS board?
>
> Yes, Freescale named the board four different things during its
> development (ADS, SYS, MDS, ok maybe only three :)
Ok, great.
>>In the Denx source, there are modified versions for the
>>TQM board, but I wasn't sure what platform the original
>>files were targeted for.
>
> Not sure I follow. The MPC834x MDS was the first 834x system supported.
Sorry, my fault, I wasn't clear.
I was just referring to the fact that in addition to your
source, the Denx tree also has the work they are doing on
the TQM board that contains an 8349E.
> However, if you are looking at 834x, I recommend you grab my u-boot tree
> from kernel.org since it has support for booting the kernel with a flat
> device tree. From 2.6.16, all future 83xx work will be done in
> arch/powerpc which requires a flat dev tree to boot.
Ok.
The reason I was asking, was that I want to benchmark an 8349E
based system. Wolfgang Denk mentioned the TQM system, so I took
a look in the Denx tree, and saw that their work was based
on yours. I assumed your work was for a Freescale reference board,
but didn't know which.
Thanks for the clarification.
Dave
* Re: PowerQUICC II Pro MPC8349E-MDS Linux 2.6 support
From: Kumar Gala @ 2006-02-15 23:45 UTC (permalink / raw)
To: David Hawkins; +Cc: linuxppc-embedded
In-Reply-To: <43F3BE4A.7070603@ovro.caltech.edu>
On Wed, 15 Feb 2006, David Hawkins wrote:
>
> Hi Kumar,
>
> I saw your email earlier, so figured you were listening in
> on the PPC group at the moment.
>
> In the recent 2.6 kernel source, you authored the file:
>
> arch/ppc/platforms/83xx/mpc834x_sys.h and .c
>
> are these files for the MPC8349E-MDS board?
Yes, Freescale named the board four different things during its
development (ADS, SYS, MDS, ok maybe only three :)
> In the Denx source, there are modified versions for the
> TQM board, but I wasn't sure what platform the original
> files were targeted for.
Not sure I follow. The MPC834x MDS was the first 834x system supported.
However, if you are looking at 834x, I recommend you grab my u-boot tree
from kernel.org since it has support for booting the kernel with a flat
device tree. From 2.6.16, all future 83xx work will be done in
arch/powerpc which requires a flat dev tree to boot.
- kumar
* PowerQUICC II Pro MPC8349E-MDS Linux 2.6 support
From: David Hawkins @ 2006-02-15 23:50 UTC (permalink / raw)
To: Kumar Gala; +Cc: linuxppc-embedded
In-Reply-To: <Pine.LNX.4.44.0602151722050.30048-100000@gate.crashing.org>
Hi Kumar,
I saw your email earlier, so figured you were listening in
on the PPC group at the moment.
In the recent 2.6 kernel source, you authored the file:
arch/ppc/platforms/83xx/mpc834x_sys.h and .c
are these files for the MPC8349E-MDS board?
In the Denx source, there are modified versions for the
TQM board, but I wasn't sure what platform the original
files were targeted for.
Thanks,
Dave Hawkins,
Caltech.
^ permalink raw reply
* Re: Cache-inhibited region for certain exception handler(e500 chips, 2.4 kernel)?
From: Kumar Gala @ 2006-02-15 23:25 UTC (permalink / raw)
To: risc10; +Cc: linuxppc-embedded
In-Reply-To: <000201c6327c$1a2b80f0$b07c520a@fsl.freescale.net>
On Wed, 15 Feb 2006, Xianghua Xiao wrote:
> Is there a way to put a certain exception handler (e.g. machine check) on e500
> into a cache-inhibited region?
>
> 1. The e500 kernel puts exception handlers at the start of physical
> memory.
> 2. All of physical memory is covered by a few TLB1 entries that do the
> 0xc0000000-0x00000000 translation.
> 3. We cannot add a new TLB1 entry to map a small piece of memory, because it has
> a boundary limitation (4K...256M). We cannot overlap two TLB1 entries since that
> will cause a program error.
> 4. When we tried to move a handler (e.g. machine check) to a different
> location, the kernel wouldn't boot.
> 5. We don't want to map all the exception handlers as cache-inhibited,
> say, the first 1MB; the performance would be horrible if we did.
>
> Is there a way at all to tweak things like this, i.e., put an exception
> handler into a piece of memory that is cache-inhibited?
>
> I also thought about using mlock/mmap on /dev/mem, or moving the specific exception
> handler to a high address and then using a separate TLB1 entry to cover it (needs a
> linker script change?), etc.
>
Why exactly do you want the mcheck handler to be cache inhibited? One simple
way would be to set up a temp mapping in kernel virtual address space
somewhere and have the current handler jump to that location as the
first thing it does.
- kumar
^ permalink raw reply
* Cache-inhibited region for certain exception handler(e500 chips, 2.4 kernel)?
From: Xianghua Xiao @ 2006-02-15 22:06 UTC (permalink / raw)
To: linuxppc-embedded
Is there a way to put a certain exception handler (e.g. machine check) on e500
into a cache-inhibited region?
1. The e500 kernel puts exception handlers at the start of physical
memory.
2. All of physical memory is covered by a few TLB1 entries that do the
0xc0000000-0x00000000 translation.
3. We cannot add a new TLB1 entry to map a small piece of memory, because it has
a boundary limitation (4K...256M). We cannot overlap two TLB1 entries since that
will cause a program error.
4. When we tried to move a handler (e.g. machine check) to a different
location, the kernel wouldn't boot.
5. We don't want to map all the exception handlers as cache-inhibited,
say, the first 1MB; the performance would be horrible if we did.
Is there a way at all to tweak things like this, i.e., put an exception
handler into a piece of memory that is cache-inhibited?
I also thought about using mlock/mmap on /dev/mem, or moving the specific exception
handler to a high address and then using a separate TLB1 entry to cover it (needs a
linker script change?), etc.
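The TLB1 limitation in point 3 can be illustrated with a small check. This is only a sketch under stated assumptions: the 4K...256M range comes from the message, while the "sizes are powers of 4" and "base must be aligned to the entry size" rules are assumptions about the e500 TLB1 that should be checked against the core reference manual. The names tlb1_size_ok() and tlb1_can_map() are hypothetical helpers, not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB1_MIN_SIZE 0x00001000UL   /* 4K, lower bound quoted in the message */
#define TLB1_MAX_SIZE 0x10000000UL   /* 256M, upper bound quoted in the message */

/* Assumption: valid entry sizes are 4K * 4^n, i.e. 4K, 16K, 64K, ... 256M. */
static bool tlb1_size_ok(uint32_t size)
{
    uint32_t s;

    /* Walk the candidate sizes; the loop stops at 256M, so s never overflows. */
    for (s = TLB1_MIN_SIZE; s <= TLB1_MAX_SIZE; s <<= 2) {
        if (s == size)
            return true;
    }
    return false;
}

/* Assumption: an entry's base address must be a multiple of its size. */
bool tlb1_can_map(uint32_t base, uint32_t size)
{
    return tlb1_size_ok(size) && (base & (size - 1)) == 0;
}
```

Under these assumptions a 4K handler page at a 4K-aligned address fits a single entry by itself, for example tlb1_can_map(0x00001000, 0x1000) is true, but carving it out of a region that a larger entry already covers runs into the overlap restriction described in point 3.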
Any suggestion is greatly appreciated.
xianghua
^ permalink raw reply
* Re: MMU is enabled in u-boot?
From: Eugene Surovegin @ 2006-02-15 22:00 UTC (permalink / raw)
To: dibacco@inwind.it; +Cc: linuxppc-embedded
In-Reply-To: <IUR10W$F435AD432BF7E29E360C74B2CBBCC42B@libero.it>
On Wed, Feb 15, 2006 at 10:58:08PM +0100, dibacco@inwind.it wrote:
> Does u-boot work with the MMU enabled? If so, why is it needed?
Depends on CPU. Some CPUs just don't work without MMU.
--
Eugene
^ permalink raw reply
* MMU is enabled in u-boot?
From: dibacco @ 2006-02-15 21:58 UTC (permalink / raw)
To: linuxppc-embedded
Does u-boot work with the MMU enabled? If so, why is it needed?
^ permalink raw reply
* Re: IMAP_ADDR is virtual or physical address?
From: Kumar Gala @ 2006-02-15 21:36 UTC (permalink / raw)
To: dibacco@inwind.it; +Cc: linuxppc-embedded
In-Reply-To: <IUR08Q$EA50D33FB87244406C6EB74722F8E71C@libero.it>
On Wed, 15 Feb 2006, dibacco@inwind.it wrote:
> I'm wondering if IMAP_ADDR is a virtual address or a physical one. Normally I see things like this in drivers:
>
> static volatile immap_t *immr = (immap_t *) IMAP_ADDR;
>
> It seems therefore a virtual address.
IMAP_ADDR tends to be a physical address; however, it's mapped 1:1 in the
kernel via io_block_mapping() on a number of systems, so the same value
also works as a virtual address there.
This is somewhat frowned upon, and drivers should really do their own
ioremap() with a physical address for IMAP_ADDR.
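A minimal sketch of what "do their own ioremap()" could look like, written so it compiles outside the kernel. Everything marked as a stand-in is an assumption for illustration: in a real driver ioremap() and iounmap() come from <asm/io.h>, immap_t and IMAP_ADDR come from the platform headers, and the failure path would normally return -ENOMEM. The immr_map() and immr_unmap() names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions) so the sketch builds in user space. */
typedef struct { volatile uint32_t regs[4]; } immap_t;  /* placeholder layout */
#define IMAP_ADDR 0xff000000UL                          /* placeholder physical base */

static immap_t fake_immr_window;   /* simulates the memory-mapped register block */

/* Stand-in for the kernel's ioremap(): returns a usable pointer for the
 * one physical range this sketch knows about, NULL otherwise. */
static void *ioremap(unsigned long phys, unsigned long size)
{
    (void)size;
    return phys == IMAP_ADDR ? (void *)&fake_immr_window : NULL;
}

/* Stand-in for the kernel's iounmap(); nothing to release here. */
static void iounmap(void *virt)
{
    (void)virt;
}

/* Instead of: static volatile immap_t *immr = (immap_t *) IMAP_ADDR; */
static volatile immap_t *immr;

/* Map the register block explicitly; call once from driver init. */
int immr_map(void)
{
    immr = (volatile immap_t *)ioremap(IMAP_ADDR, sizeof(immap_t));
    return immr ? 0 : -1;   /* a kernel driver would return -ENOMEM on failure */
}

/* Undo the mapping; call from driver exit. */
void immr_unmap(void)
{
    if (immr) {
        iounmap((void *)immr);
        immr = NULL;
    }
}
```

The point of the pattern is that the driver no longer depends on the board code having set up a 1:1 io_block_mapping(); it treats IMAP_ADDR purely as a physical address and only dereferences the pointer that the mapping call returned.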
>
> But I can see the same also in some u-boot code where I imagine we are accessing physical addresses.
>
> Clear my doubt please!!
In u-boot everything is mapped 1:1 so there is no difference between virt
and phys addrs.
- kumar
^ permalink raw reply
* IMAP_ADDR is virtual or physical address?
From: dibacco @ 2006-02-15 21:41 UTC (permalink / raw)
To: linuxppc-embedded
I'm wondering if IMAP_ADDR is a virtual address or a physical one. Normally I see things like this in drivers:
static volatile immap_t *immr = (immap_t *) IMAP_ADDR;
It seems therefore a virtual address.
But I can see the same also in some u-boot code where I imagine we are accessing physical addresses.
Clear my doubt please!!
Bye,
^ permalink raw reply