* Re: PXA270 SSP DMA Corruption
@ 2008-11-12 23:14 Ned Forrester
0 siblings, 0 replies; 8+ messages in thread
From: Ned Forrester @ 2008-11-12 23:14 UTC (permalink / raw)
To: spi-devel; +Cc: J. Scott Merritt
missed spi-devel on distribution...
-------- Original Message --------
Subject: Re: PXA270 SSP DMA Corruption
Date: Wed, 12 Nov 2008 18:10:01 -0500
From: Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org>
Organization: Woods Hole Oceanographic Institution
To: J. Scott Merritt <merrij3-IL7dBOYR4Vg@public.gmane.org>
CC: linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW@public.gmane.org,
stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR@public.gmane.org, David Brownell <david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>
References: <20080211174339.73ca7ed5.merrij3-IL7dBOYR4Vg@public.gmane.org>
<47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> <20081107184819.54baa679.merrij3-IL7dBOYR4Vg@public.gmane.org>
J. Scott Merritt wrote:
> Long ago, I posted a message on linux-arm-kernel describing data
> corruption that I was seeing on PXA270 SSP transfers using DMA:
> http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2
>
> I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL
> interface provided by spidev ... and unfortunately am still seeing
> data corruption with DMA transfers.
>
> I have managed to reproduce the problem using the internal loopback
> facility of the PXA270 SSP hardware so it should be possible to
> test this problem in other environments. On my system, the program
> below reports an error within 30-50 attempts. From the data reported,
> it would appear the the DMA controller (or memory cache) is transferring
> the contents of the previous buffer rather than the current buffer.
>
> I have also included the platform data that initializes the SSP
> device drivers. Perhaps I have misconfigured or misued this in some
> way ??? Note that simply disabling DMA in the platform data allows
> the test to run indefinitely without errors.
[snip code]
Sorry for the delayed reply.
I have looked over your code, and tried to familiarize myself with
spidev. I cannot easily test your code because I am still using 2.6.20,
and spidev is was not introduced until 2.6.22. I don't see any obvious
problems with either your test code or spidev. Certainly, spidev is of
similar complexity to pxa2xx_spi, so I would say that they are similarly
likely to have issues. I don't know how much use spidev has gotten, but
it is probably more than pxa2xx_spi, because it works across platforms.
There are a couple of things to consider:
1. spidev ought to allocate "buffer" using the __GFP_DMA flag. I think
you already showed that this is not causing your problem. I think that
on ARM, all memory can be used for DMA; that is not true on all
architectures and some apparently will silently allocate bounce buffers
for non-dma-accessible memory.
2. spidev uses the same buffer for tx and rx. That is supposed to be
allowed, but I don't think pxa2xx_spi handles this case correctly.
pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it
uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx
buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers
are the same. Thus if they are the same, the memory is mapped twice.
"Linux Device Drivers", Corbet, et al. does not address this
possibility, but I bet it is not a good thing to do.
If you are willing, it looks simple to modify spidev to use two buffers,
and test whether that works better. I agree that spidev should work
either way, but this would be a quick test. If that works, I will
submit a patch for pxa2xx_spi to fix the case of same buffers.
Alternatively, you might prefer to try fixing map_dma_buffers() and
unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works,
that would be the proper fix, anyway. Basically it needs to trap the
case of same address (or even overlapping ranges) and do one
dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use
the same flag. I think I would choose to make it fail on overlapped but
unequal ranges, and perform correctly for the cases of separate or equal.
--
A couple of things that might improve your performance, but probably are
not the cause of the problem:
I notice that you don't set the bits_per_word in struct pxa2xx_spi_chip.
The default is 8. Presumably that is what you want.
There is a transfer timeout that is built into pxa2xx_spi.c and that is
loaded to the timeout register in the SSP device. It defaults to "1000"
arbitrary units (actually counts of an unspecified-by-Intel clock that
is measured to be 99.5MHz on the PXA255 NSSP port, but I don't know what
it is on the PXA270). This is a very short time of 10usec, and the
result is extra interrupts to service during a transfer. You might try
setting this value to 1,000,000 (10msec), or whatever makes sense in
your application. You do this by changing the value of the
struct pxa2xx_spi_chip.timeout parameter. You have the default value of
1000 in that location.
The values of 12 and 4 for the thresholds are sub-optimal. They should
probably be 8 and 8 for DMA. A patch has already been submitted by
Vernon Sauder (it will appear in 2.6.28) to fix the documentation, and
variously improve the defaults for unspecified values in struct
pxa2xx_spi_chip.
--
Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab 508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079
--
Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org
Oceanographic Systems Lab 508-289-2226
Applied Ocean Physics and Engineering Dept.
Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA
http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212
http://www.whoi.edu/hpb/Site.do?id=1532
http://www.whoi.edu/page.do?pid=10079
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
^ permalink raw reply [flat|nested] 8+ messages in thread* PXA270 SSPSFRM gates chip select ? @ 2008-02-11 22:43 J. Scott Merritt 2008-02-11 23:26 ` Ned Forrester 0 siblings, 1 reply; 8+ messages in thread From: J. Scott Merritt @ 2008-02-11 22:43 UTC (permalink / raw) To: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Reposted from linux-arm-kernel mailing list ... at the suggestion of David Brownell. Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as external chip selects. On the very first message after boot, I am receiving an extra clock pulse at the slave device (causing the whole message to be "shifted"). It appears that on the first message, the Chip Select is activated before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, resulting in an extra, positive-going transition of SSPSCLK as it attempts to establish the proper (high) clock level for the SPH=1 setting. Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 also performs the chip select call before updating SSCR0 & SSCR1. Slave device is NXP LPC2366. I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; but still have the clock riding low before the first chip select. ... David Brownell responds: Actually, the SPI_CPOL mode bit controls the clock polarity before the chip select coes active: CPOL=0 should mean it's forced low (if it isn't already low). So if that driver is doing chipselect *then* clock setup, it's doing the wrong thing. A patch would be appreciated... ... Thanks, Scott. ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: PXA270 SSPSFRM gates chip select ? @ 2008-02-11 23:26 ` Ned Forrester [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> [not found] ` <20081107184819.54baa679.merrij3@rpi.edu> 0 siblings, 2 replies; 8+ messages in thread From: Ned Forrester @ 2008-02-11 23:26 UTC (permalink / raw) To: J. Scott Merritt Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR J. Scott Merritt wrote: > Reposted from linux-arm-kernel mailing list ... at the suggestion > of David Brownell. > > Using pxa2xx_spi from Kernel 2.6.21. PXA270 is SSP Master in > SPI_MODE_3 (SPO=1, SPH=1) with internal clock and GPIO's used as > external chip selects. > > On the very first message after boot, I am receiving an extra clock > pulse at the slave device (causing the whole message to be "shifted"). > > It appears that on the first message, the Chip Select is activated > before the SPO=1/SPH=1 is fully established in the PXA's SSP hardware, > resulting in an extra, positive-going transition of SSPSCLK as it > attempts to establish the proper (high) clock level for the SPH=1 > setting. > > Testing was performed on Kernal 2.6.21, but it appears that 2.6.24 > also performs the chip select call before updating SSCR0 & SSCR1. > Slave device is NXP LPC2366. > > I have tried setting the "default" in pxa2xx_spi.c to SPO=1, SPH=1; > but still have the clock riding low before the first chip select. I have worked with this driver extensively, so I took a look. It seems that you are right. The these bits are set at the end of pump_transfers(), based on internal values that are configured in setup(). The SSCR1 register is initialized with default data in probe(), but this default value is not influenced from any other configuration data. SSCR1 contains the interrupt enables and dma service request enables. As such, it is set as the very last task in pump_transfers(), after all other setup, including the call to cs_control(). If I recall correctly, in some modes of operation, it is important that this register not be written when activity is already in progress, and so it is only set if changed at the end of pump_transfers(). It appears that there needs to be a check for changed clock mode in pump_transfers(), prior to setting chip select. It is important to do this carefully, so as not to interfere with on-going operations. I will take a closer look at this. I know that normally each of these SPI transfers is independent, but I worked with Stephen Street (the maintainer) over a year ago to prep this driver for some external master data steaming that I needed, and I know we worked in this area of the driver. It is possible that we messed this up, as I don't use chip select. > ... David Brownell responds: > > Actually, the SPI_CPOL mode bit controls the clock polarity > before the chip select coes active: CPOL=0 should mean it's > forced low (if it isn't already low). > > So if that driver is doing chipselect *then* clock setup, > it's doing the wrong thing. A patch would be appreciated... > > ... > Thanks, Scott. > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Microsoft > Defy all challenges. Microsoft(R) Visual Studio 2008. > http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ > _______________________________________________ > spi-devel-general mailing list > spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org > https://lists.sourceforge.net/lists/listinfo/spi-devel-general > > -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2008. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org>]
* PXA270 SSP DMA Corruption [not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org> @ 2008-11-07 16:43 ` J. Scott Merritt [not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: J. Scott Merritt @ 2008-11-07 16:43 UTC (permalink / raw) To: Ned Forrester Cc: David Brownell, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Long ago, I posted a message on linux-arm-kernel describing data corruption that I was seeing on PXA270 SSP transfers using DMA: http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL interface provided by spidev ... and unfortunately am still seeing data corruption with DMA transfers. I have a user program that is sending single blocks (1-255 words) of fabricated, non-zero data to an outbound processor (at 300 kHz). I find that after a small number of blocks (10-20), the outboard processor will receive a packet that begins with multiple zeros, rather than the intended data. I have put printk statements into spidev that show that the correct, non-zero data is always present in the DMA bounce buffer before and after the transfer - even on the packets that fail to arrive. If I display the DMA flag in the platform data, then the transfers continue indefinitely without any problems. More interestingly, while using DMA, if I simply insert a sched_yield (on an otherwise unoccupied processor), before each transfer from user land, the problem disappears ! In my original posting last year, I suspected a cache coherency problem. However, based on the latest symptoms, I'm wondering if the SSP transfer might be initiated before the DMA controller is ready/able to provide source data .... Thanks, Scott. ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-07 18:59 ` Ned Forrester 0 siblings, 0 replies; 8+ messages in thread From: Ned Forrester @ 2008-11-07 18:59 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f J. Scott Merritt wrote: > Long ago, I posted a message on linux-arm-kernel describing data corruption > that I was seeing on PXA270 SSP transfers using DMA: > http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 > > I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL > interface provided by spidev ... and unfortunately am still seeing > data corruption with DMA transfers. pxa2xx_spi.c has been pretty well wrung out by various people. I worked with Stephen preparing fixes that I think were introduced in 2.6.20. While my task is receive-only, and it looks like your task is transmit-only, I did a lot of loopback testing at that time, without the sort of trouble you report. Since then Stephen seems to have moved on to other interests. > I have a user program that is sending single blocks (1-255 words) of > fabricated, non-zero data to an outbound processor (at 300 kHz). I > find that after a small number of blocks (10-20), the outboard processor > will receive a packet that begins with multiple zeros, rather than the > intended data. > > I have put printk statements into spidev that show that the correct, > non-zero data is always present in the DMA bounce buffer before and > after the transfer - even on the packets that fail to arrive. I assume you are still talking about spi_char as the protocol driver. Right? You may be having trouble with the way the buffers are allocated. I think, but am not sure, that PXA wants the buffers aligned on 32-bit boundaries. I note that spi_char statically allocates the buffers in the middle of the struct spichar_driver. You might want to allocate the buffers separately, and change struct spichar_driver to have only pointers to the separately allocated buffers. Then.... In my protocol driver, I allocate my buffers with a few more flags: d_buf = (char *)kzalloc(count, GFP_KERNEL | __GFP_DMA | __GFP_COLD); I think the important one is probably __GFP_DMA. I'm not sure how important that is on PXA, because I think you can DMA from any memory, but it may enforce needed alignment. At least alignment checks never fail when I do this, but perhaps they would not anyway. Also note that if pxa2xx_spi discovers that the allocated buffers are not aligned on a 32-bit boundary or if dma_map_single fails, it will silently perform the transfer in PIO mode. You can also try doing the dma mapping in the protocol driver, and passing the mapped address in struct spi_transfer->tx_buf. David Brownell pointed out to me once that this is more efficient, because it allows the kernel to free the cache line associated with the buffer sooner than if it is left to pxa2xx_spi. If you do map in the protocol driver, don't forget to unmap after the transfer is given back from pxa2xx_spi. > If I display the DMA flag in the platform data, then the transfers > continue indefinitely without any problems. More interestingly, while > using DMA, if I simply insert a sched_yield (on an otherwise unoccupied > processor), before each transfer from user land, the problem > disappears ! > > In my original posting last year, I suspected a cache coherency problem. > However, based on the latest symptoms, I'm wondering if the SSP > transfer might be initiated before the DMA controller is ready/able > to provide source data .... Anything is possible, but I doubt that there is any such problem in pxa2xx_spi. Let us know what happens after changing the allocation of the buffers. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20081107184819.54baa679.merrij3@rpi.edu>]
[parent not found: <491B6249.7070407@whoi.edu>]
[parent not found: <491B6249.7070407-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org> @ 2008-11-13 1:24 ` J. Scott Merritt [not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: J. Scott Merritt @ 2008-11-13 1:24 UTC (permalink / raw) To: Ned Forrester Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR On Wed, 12 Nov 2008 18:10:01 -0500 Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > J. Scott Merritt wrote: > > Long ago, I posted a message on linux-arm-kernel describing data > > corruption that I was seeing on PXA270 SSP transfers using DMA: > > http://marc.info/?l=linux-arm-kernel&m=117780219128682&w=2 > > > > I have recently upgraded to Kernel 2.6.27.4 and am now using the IOCTL > > interface provided by spidev ... and unfortunately am still seeing > > data corruption with DMA transfers. > > > > I have managed to reproduce the problem using the internal loopback > > facility of the PXA270 SSP hardware so it should be possible to > > test this problem in other environments. On my system, the program > > below reports an error within 30-50 attempts. From the data reported, > > it would appear the the DMA controller (or memory cache) is transferring > > the contents of the previous buffer rather than the current buffer. > > > > I have also included the platform data that initializes the SSP > > device drivers. Perhaps I have misconfigured or misued this in some > > way ??? Note that simply disabling DMA in the platform data allows > > the test to run indefinitely without errors. > > [snip code] > > Sorry for the delayed reply. > > I have looked over your code, and tried to familiarize myself with > spidev. I cannot easily test your code because I am still using 2.6.20, > and spidev is was not introduced until 2.6.22. I don't see any obvious > problems with either your test code or spidev. Certainly, spidev is of > similar complexity to pxa2xx_spi, so I would say that they are similarly > likely to have issues. I don't know how much use spidev has gotten, but > it is probably more than pxa2xx_spi, because it works across platforms. > > There are a couple of things to consider: > > 1. spidev ought to allocate "buffer" using the __GFP_DMA flag. I think > you already showed that this is not causing your problem. I think that > on ARM, all memory can be used for DMA; that is not true on all > architectures and some apparently will silently allocate bounce buffers > for non-dma-accessible memory. > > 2. spidev uses the same buffer for tx and rx. That is supposed to be > allowed, but I don't think pxa2xx_spi handles this case correctly. > pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it > uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx > buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers > are the same. Thus if they are the same, the memory is mapped twice. > "Linux Device Drivers", Corbet, et al. does not address this > possibility, but I bet it is not a good thing to do. > > If you are willing, it looks simple to modify spidev to use two buffers, > and test whether that works better. I agree that spidev should work > either way, but this would be a quick test. If that works, I will > submit a patch for pxa2xx_spi to fix the case of same buffers. > > Alternatively, you might prefer to try fixing map_dma_buffers() and > unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, > that would be the proper fix, anyway. Basically it needs to trap the > case of same address (or even overlapping ranges) and do one > dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use > the same flag. I think I would choose to make it fail on overlapped but > unequal ranges, and perform correctly for the cases of separate or equal. > [snip performance suggestions] Ned, Thank for your thorough and thoughtful review. It looks like your concerns regarding the duplicate mapping of the overlapping buffers were correct. I tried both suggestions - namely, using two buffers in spidev, as well as cutting back to a single buffer mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes (by themselves) seemed to eliminate the data corruption in my simple test program ! However, I have some lingering concerns regarding the latter solution. >From what little I have read, it appears the DMA-BIDIRECTIONAL is intended for situations where the transfer direction is not know ... it is not immediately clear (to me) that it also handles our case, which might be better described as DMA_SIMULTANEOUS. It is possible that using DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding of the buffering and caching that is going on between the two independent DMA channel it is difficult for me to speculate. Many thanks, Scott. ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 1:48 ` Ned Forrester [not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org> 0 siblings, 1 reply; 8+ messages in thread From: Ned Forrester @ 2008-11-13 1:48 UTC (permalink / raw) To: J. Scott Merritt Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR J. Scott Merritt wrote: > On Wed, 12 Nov 2008 18:10:01 -0500 > Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > >> 2. spidev uses the same buffer for tx and rx. That is supposed to be >> allowed, but I don't think pxa2xx_spi handles this case correctly. >> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it >> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx >> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers >> are the same. Thus if they are the same, the memory is mapped twice. >> "Linux Device Drivers", Corbet, et al. does not address this >> possibility, but I bet it is not a good thing to do. >> >> If you are willing, it looks simple to modify spidev to use two buffers, >> and test whether that works better. I agree that spidev should work >> either way, but this would be a quick test. If that works, I will >> submit a patch for pxa2xx_spi to fix the case of same buffers. >> >> Alternatively, you might prefer to try fixing map_dma_buffers() and >> unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, >> that would be the proper fix, anyway. Basically it needs to trap the >> case of same address (or even overlapping ranges) and do one >> dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use >> the same flag. I think I would choose to make it fail on overlapped but >> unequal ranges, and perform correctly for the cases of separate or equal. >> > [snip performance suggestions] > > Ned, Thank for your thorough and thoughtful review. > > It looks like your concerns regarding the duplicate mapping of the > overlapping buffers were correct. I tried both suggestions - namely, > using two buffers in spidev, as well as cutting back to a single buffer > mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes > (by themselves) seemed to eliminate the data corruption in my simple > test program ! That's good. I will write a patch for pxa2xx_spi, unless you would rather do it. > However, I have some lingering concerns regarding the latter solution. >>From what little I have read, it appears the DMA-BIDIRECTIONAL is intended > for situations where the transfer direction is not know ... it is not > immediately clear (to me) that it also handles our case, which might be > better described as DMA_SIMULTANEOUS. It is possible that using > DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding > of the buffering and caching that is going on between the two independent > DMA channel it is difficult for me to speculate. According to "Linux Device Drivers": "If data can move in either direction, use DMA_BIDIRECTIONAL." and "Incidentally, bounce buffers are one reason why it is important to get the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both before and after the operation, which is often an unnecessary waste of CPU cycles." >From the general discussion in the book, it is clear that dma mapping takes care of issues like flushing cache (for DMA_TO_DEVICE), invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory mapping and scatter/gather, etc. Once stream mapped (which dma_map_single() does), the CPU is not allowed to touch the memory, and once unmapped, the device cannot touch the memory. I think it is clear that DMA_BIDIRECTIONAL is for exactly the case where the device will do some DMA in each direction before giving the memory back to the CPU. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <491B8783.9050800-/d+BM93fTQY@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org> @ 2008-11-13 1:59 ` Ned Forrester 0 siblings, 0 replies; 8+ messages in thread From: Ned Forrester @ 2008-11-13 1:59 UTC (permalink / raw) To: J. Scott Merritt Cc: david-b-yBeKhBN/0LDR7s880joybQ, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, linux-arm-kernel-xIg/pKzrS19vn6HldHNs0ANdhmdF6hFW, stephen-nl6u4wocdmy51APUEpUfAkEOCMrvLtNR Ned Forrester wrote: > J. Scott Merritt wrote: >> On Wed, 12 Nov 2008 18:10:01 -0500 >> Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: >> >>> 2. spidev uses the same buffer for tx and rx. That is supposed to be >>> allowed, but I don't think pxa2xx_spi handles this case correctly. >>> pxa2xx_spi handles NULL buffers (for uni-directional transfers), and it >>> uses dma_map_single to map the rx buffers as DMA_FROM_DEVICE, and the tx >>> buffer as DMA_TO_DEVICE, without checking whether the rx and tx buffers >>> are the same. Thus if they are the same, the memory is mapped twice. >>> "Linux Device Drivers", Corbet, et al. does not address this >>> possibility, but I bet it is not a good thing to do. >>> >>> If you are willing, it looks simple to modify spidev to use two buffers, >>> and test whether that works better. I agree that spidev should work >>> either way, but this would be a quick test. If that works, I will >>> submit a patch for pxa2xx_spi to fix the case of same buffers. >>> >>> Alternatively, you might prefer to try fixing map_dma_buffers() and >>> unmap_dma_buffers() in pxa2xx_spi.c to test this theory. If it works, >>> that would be the proper fix, anyway. Basically it needs to trap the >>> case of same address (or even overlapping ranges) and do one >>> dma_map_single using the DMA_BIDIRECTIONAL flag. Unmapping has to use >>> the same flag. I think I would choose to make it fail on overlapped but >>> unequal ranges, and perform correctly for the cases of separate or equal. >>> >> [snip performance suggestions] >> >> Ned, Thank for your thorough and thoughtful review. >> >> It looks like your concerns regarding the duplicate mapping of the >> overlapping buffers were correct. I tried both suggestions - namely, >> using two buffers in spidev, as well as cutting back to a single buffer >> mapped with DMA_BIDIRECTIONAL in pxa2xx_spi.c. Each of these changes >> (by themselves) seemed to eliminate the data corruption in my simple >> test program ! > > That's good. I will write a patch for pxa2xx_spi, unless you would > rather do it. > >> However, I have some lingering concerns regarding the latter solution. >> >From what little I have read, it appears the DMA-BIDIRECTIONAL is intended >> for situations where the transfer direction is not know ... it is not >> immediately clear (to me) that it also handles our case, which might be >> better described as DMA_SIMULTANEOUS. It is possible that using >> DMA_BIDIRECTIONAL is sufficient, but without a much deeper understanding >> of the buffering and caching that is going on between the two independent >> DMA channel it is difficult for me to speculate. > > According to "Linux Device Drivers": > > "If data can move in either direction, use DMA_BIDIRECTIONAL." > > and > > "Incidentally, bounce buffers are one reason why it is important to get > the direction right. DMA_BIDIRECTIONAL bounce buffers are copied both > before and after the operation, which is often an unnecessary waste of > CPU cycles." > >>From the general discussion in the book, it is clear that dma mapping > takes care of issues like flushing cache (for DMA_TO_DEVICE), > invalidating cache (for DMA_FROM_DEVICE), setting up any I/O memory > mapping and scatter/gather, etc. Once stream mapped (which > dma_map_single() does), the CPU is not allowed to touch the memory, and > once unmapped, the device cannot touch the memory. I think it is clear > that DMA_BIDIRECTIONAL is for exactly the case where the device will do > some DMA in each direction before giving the memory back to the CPU. I should have added that the logic of SPI is that for every byte written DMA tx), there is a byte read (DMA rx). Because the clocks that read are generated by the clocks that transmit, the tx word has to be in the FIFO before the rx word is returned to the FIFO. Thus the tx and rx DMA channels will never access the same word at the same time, if the tx and rx buffers start at the same address. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20081112213403.402948b9.merrij3@rpi.edu>]
[parent not found: <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 3:19 ` Ned Forrester [not found] ` <20081113120134.70d533c8.merrij3@rpi.edu> 0 siblings, 1 reply; 8+ messages in thread From: Ned Forrester @ 2008-11-13 3:19 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel J. Scott Merritt wrote: > On Wed, 12 Nov 2008 18:10:01 -0500 > It appears that the timeout is computed based upon the "Peripheral Clock > frequency" on the PXA270 - which would appear to be 312Mhz, (or something > divided down from there). If it is 312Mhz, then for my SSP speed of > 300K, I guess I need something around 10,000. It says "peripheral clock frequency" without ever defining that phrase (at least in the 1/2006 version of the developer's manual). Oddly on PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz machine. I bet it is not as high as 312MHz, but you can measure it by setting very long times, forcing the timeout (no tx data in pio mode, I think), and perhaps using GPIO probes to trigger a scope. I did something like that on my system. It would sure be nice to know the answer to that, as no one using PXA270 has ever answered the question. > Table 8-4 in my PXA manual is titled "TFT and RFT values with possible > DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7 > "Do not use DMA" ... Am I misinterpreting this ? I -think- that means > that 8/8 is not a good choice ... The table you refer to lists the values loaded into the TFT and RFT bit fields, which are the desired threshold-1. The values passed to pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8. > With respect to the pxa2xx_spi patch, please proceed - I will certainly > not attempt to generate one myself. I'll get something out in the next couple of days. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <20081113120134.70d533c8.merrij3@rpi.edu>]
[parent not found: <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org>]
* Re: PXA270 SSP DMA Corruption [not found] ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org> @ 2008-11-13 18:54 ` Ned Forrester 0 siblings, 0 replies; 8+ messages in thread From: Ned Forrester @ 2008-11-13 18:54 UTC (permalink / raw) To: J. Scott Merritt; +Cc: spi-devel J. Scott Merritt wrote: > On Wed, 12 Nov 2008 22:19:49 -0500 > Ned Forrester <nforrester-/d+BM93fTQY@public.gmane.org> wrote: > >> It says "peripheral clock frequency" without ever defining that phrase >> (at least in the 1/2006 version of the developer's manual). Oddly on >> PXA255 this frequency appears to be runclock/4 = 99.5MHz for a 400MHz >> machine. I bet it is not as high as 312MHz, but you can measure it by >> setting very long times, forcing the timeout (no tx data in pio mode, I >> think), and perhaps using GPIO probes to trigger a scope. I did >> something like that on my system. It would sure be nice to know the >> answer to that, as no one using PXA270 has ever answered the question. >> >>> Table 8-4 in my PXA manual is titled "TFT and RFT values with possible >>> DMA Burst Sizes" ... for 8 bit data, it says ... for TFT > 7 >>> "Do not use DMA" ... Am I misinterpreting this ? I -think- that means >>> that 8/8 is not a good choice ... > > My manual seemed to define "peripheral clock" in multiple places with > multiple (conflicting) frequencies (or options) ;) That sounds about like what I found. > I regret that I am probably not able to attempt the clock measurement > effort at this time - too many other pressures .... If you are really > anxious to resolve this, I might be able to loan you a PXA270 system > for a couple of weeks. Alternatively, I might be able to find time > for a couple of Kernel builds/loads if you wanted to build all of the > test programs and patches ... Excessive busyness affects all of us. I must decline. >> The table you refer to lists the values loaded into the TFT and RFT bit >> fields, which are the desired threshold-1. The values passed to >> pxa2xx_spi are the desired thresholds, so TFT>7 is the same as threshold>8. > > You are correct ... of course ... :) > >>>> With respect to the pxa2xx_spi patch, please proceed - I will certainly >>> not attempt to generate one myself. >> I'll get something out in the next couple of days. > > Any chance we could get it accepted as a repair into 2.6.27 ? Yes, I plan to submit the patch to the stable tree, also. It usually takes a few weeks to propagate. -- Ned Forrester nforrester-/d+BM93fTQY@public.gmane.org Oceanographic Systems Lab 508-289-2226 Applied Ocean Physics and Engineering Dept. Woods Hole Oceanographic Institution Woods Hole, MA 02543, USA http://www.whoi.edu/sbl/liteSite.do?litesiteid=7212 http://www.whoi.edu/hpb/Site.do?id=1532 http://www.whoi.edu/page.do?pid=10079 ------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2008-11-13 18:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-12 23:14 PXA270 SSP DMA Corruption Ned Forrester
-- strict thread matches above, loose matches on Subject: below --
2008-02-11 22:43 PXA270 SSPSFRM gates chip select ? J. Scott Merritt
2008-02-11 23:26 ` Ned Forrester
[not found] ` <47B0D9A4.6080104-/d+BM93fTQY@public.gmane.org>
2008-11-07 16:43 ` PXA270 SSP DMA Corruption J. Scott Merritt
[not found] ` <20081107114312.2f34b389.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-07 18:59 ` Ned Forrester
[not found] ` <20081107184819.54baa679.merrij3@rpi.edu>
[not found] ` <491B6249.7070407@whoi.edu>
[not found] ` <491B6249.7070407-/d+BM93fTQY@public.gmane.org>
2008-11-13 1:24 ` J. Scott Merritt
[not found] ` <20081112202438.61c28cf4.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13 1:48 ` Ned Forrester
[not found] ` <491B8783.9050800-/d+BM93fTQY@public.gmane.org>
2008-11-13 1:59 ` Ned Forrester
[not found] ` <20081112213403.402948b9.merrij3@rpi.edu>
[not found] ` <20081112213403.402948b9.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13 3:19 ` Ned Forrester
[not found] ` <20081113120134.70d533c8.merrij3@rpi.edu>
[not found] ` <20081113120134.70d533c8.merrij3-IL7dBOYR4Vg@public.gmane.org>
2008-11-13 18:54 ` Ned Forrester
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.