* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  6:45 UTC
  To: linux-arm-kernel

Hi,

(This is an email originally addressed to the linux-kernel mailing-list.)

On our board we've got an MV78200 and a network device between which we
xfer memory chunks via the ddram with an external dma controller.
To handle these xfers we're using the dma API.
To tx a chunk of data from the SoC => network device, we:
- prepare a buffer with a leading header embedding a pattern,
- trigger the xfer and wait for an irq
  // The device updates the pattern and then triggers an irq
- upon irq we check the pattern for the xfer completion

I was expecting the following to work:
	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	dev_send(buffer);
	// wait for irq (don't peek in the buffer) ... got irq
	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	if (!xfer_done(buffer)) // not RAM value
		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	[...]

But this does not work (the buffer pattern does not reflect the ddram
value).

On the other hand, the following works:
	[...]
	// wait for irq (don't peek in the buffer) ... got irq
	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
	if (!xfer_done(buffer)) // RAM value
	[...]

Looking at dma-mapping.c:__dma_page_cpu_to_{dev,cpu}() and
proc-feroceon.S:feroceon_dma_{,un}map_area this behavior is not
surprising. The sync_for_cpu calls the unmap which just invalidates the
outer cache while the sync_for_device invalidates both inner and outer.

It seems that:
- we need to invalidate after the RAM has been updated
- we need to invalidate with sync_single_for_device rather than
  sync_single_for_cpu to check the value
Is it correct?

Maybe the following comment in dma-mapping.c explains the situation:
	/*
	 * The DMA API is built upon the notion of "buffer ownership".  A buffer
	 * is either exclusively owned by the CPU (and therefore may be accessed
	 * by it) or exclusively owned by the DMA device.  These helper functions
	 * represent the transitions between these two ownership states.
	 *
	 * Note, however, that on later ARMs, this notion does not work due to
	 * speculative prefetches.  We model our approach on the assumption that
	 * the CPU does do speculative prefetches, which means we clean caches
	 * before transfers and delay cache invalidation until transfer completion.
	 */

Thanks for your input,

Regards,
Karl
* About dma_sync_single_for_{cpu,device}
  From: Clemens Ladisch @ 2012-07-31  6:59 UTC
  To: linux-arm-kernel

Karl Beldan wrote:
> To tx a chunk of data from the SoC => network device, we:
> - prepare a buffer with a leading header embedding a pattern,
> - trigger the xfer and wait for an irq
>   // The device updates the pattern and then triggers an irq
> - upon irq we check the pattern for the xfer completion
>
> I was expecting the following to work:
> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);

If both the CPU and the device write to the buffer, you must use
DMA_BIDIRECTIONAL.

Regards,
Clemens
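(For readers following the thread: a minimal sketch of what this suggestion
amounts to, reusing the dev/buffer/size names from the original post. The
wrapper name is arbitrary, the send/poll logic is elided, and the caller is
assumed to check the returned handle with dma_mapping_error().)

	#include <linux/dma-mapping.h>

	/*
	 * Sketch only: the CPU fills the header and payload before the
	 * transfer, and the device later writes the completion pattern back
	 * into the same buffer, so the mapping has to cover both directions.
	 */
	static dma_addr_t map_tx_buffer(struct device *dev, void *buffer,
					size_t size)
	{
		return dma_map_single(dev, buffer, size, DMA_BIDIRECTIONAL);
	}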
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  7:27 UTC
  To: linux-arm-kernel

On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
> Karl Beldan wrote:
>> To tx a chunk of data from the SoC => network device, we:
>> - prepare a buffer with a leading header embedding a pattern,
>> - trigger the xfer and wait for an irq
>>   // The device updates the pattern and then triggers an irq
>> - upon irq we check the pattern for the xfer completion
>>
>> I was expecting the following to work:
>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>
> If both the CPU and the device write to the buffer, you must use
> DMA_BIDIRECTIONAL.
>
Hi Clemens,

This does not work (tested): it seems to me BIDIRECTIONAL would just add
an invalidate, and invalidating before the RAM has been updated, as
stated, does not work.
In fact the immediately following sync_for_device is intended to cater
for what DMA_BIDIRECTIONAL would provide (though it is not
implementation agnostic), only invalidating a smaller address range.

Regards,
Karl
* About dma_sync_single_for_{cpu,device}
  From: Clemens Ladisch @ 2012-07-31  7:34 UTC
  To: linux-arm-kernel

Karl Beldan wrote:
> On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
>> Karl Beldan wrote:
>>> To tx a chunk of data from the SoC => network device, we:
>>> - prepare a buffer with a leading header embedding a pattern,
>>> - trigger the xfer and wait for an irq
>>>   // The device updates the pattern and then triggers an irq
>>> - upon irq we check the pattern for the xfer completion
>>>
>>> I was expecting the following to work:
>>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>>
>> If both the CPU and the device write to the buffer, you must use
>> DMA_BIDIRECTIONAL.
>
> This does not work (tested): it seems to me BIDIRECTIONAL would just add
> an invalidate, and invalidating before the RAM has been updated, as
> stated, does not work.

Please show the exact sequence of dma_* calls, and also show when and
how the CPU and the device access the buffer.

Regards,
Clemens
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31  8:30 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 09:34:01AM +0200, Clemens Ladisch wrote:
> Karl Beldan wrote:
> > On 7/31/12, Clemens Ladisch <clemens@ladisch.de> wrote:
> >> Karl Beldan wrote:
> >>> To tx a chunk of data from the SoC => network device, we:
> >>> - prepare a buffer with a leading header embedding a pattern,
> >>> - trigger the xfer and wait for an irq
> >>>   // The device updates the pattern and then triggers an irq
> >>> - upon irq we check the pattern for the xfer completion
> >>>
> >>> I was expecting the following to work:
> >>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> >>
> >> If both the CPU and the device write to the buffer, you must use
> >> DMA_BIDIRECTIONAL.
> >
> > This does not work (tested): it seems to me BIDIRECTIONAL would just add
> > an invalidate, and invalidating before the RAM has been updated, as
> > stated, does not work.
>
> Please show the exact sequence of dma_* calls, and also show when and
> how the CPU and the device access the buffer.
>
Hmm, so I just spotted a line where we peek in the buffer after
invalidating .. cannot believe I missed it .. so sorry for the noise ..
now it's working.
I felt I would find the culprit right after posting ;)

Thanks, Clemens!
* About dma_sync_single_for_{cpu,device}
  From: Russell King - ARM Linux @ 2012-07-31  9:09 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
> I was expecting the following to work:
> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	dev_send(buffer);
> 	// wait for irq (don't peek in the buffer) ... got irq
> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	if (!xfer_done(buffer)) // not RAM value
> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> 	[...]

First point is that you clearly do not understand the DMA API at all.
The DMA API has the idea of buffer ownership.  Only the owner may access
the buffer:

	*CPU OWNS THE BUFFER*
dma_map_single()
	*DEVICE OWNS THE BUFFER*
dma_sync_single_for_cpu()
	*CPU OWNS THE BUFFER*
dma_sync_single_for_device()
	*DEVICE OWNS THE BUFFER*
dma_unmap_single()
	*CPU OWNS THE BUFFER*

So, there is absolutely no need whatsoever to follow dma_map_single()
with dma_sync_single_for_device().

Second point is that you should not change the 'direction' argument
while a buffer is mapped.

Thirdly, enable DMA API debugging (DMA_API_DEBUG) to make sure you're
using the DMA API correctly.

Fourthly, remember that the CPU deals with cache lines, and dirty cache
lines may be written back in their _entirety_ - which means that data
DMA'd from a device into the same cache line that you've modified via
the CPU will not work (either the data in the cache line has to be
invalidated and therefore the CPU update discarded, or the cache line
has to be flushed back to RAM and the DMA'd data is overwritten).
Hence why the buffer ownership rules are extremely important.

The solution for this fourth point is to use coherent DMA memory for
things like ring buffers and the like, which does not suffer from this.

> Maybe the following comment in dma-mapping.c explains the situation:
> 	/*
> 	 * The DMA API is built upon the notion of "buffer ownership".  A buffer
> 	 * is either exclusively owned by the CPU (and therefore may be accessed
> 	 * by it) or exclusively owned by the DMA device.  These helper functions
> 	 * represent the transitions between these two ownership states.
> 	 *
> 	 * Note, however, that on later ARMs, this notion does not work due to
> 	 * speculative prefetches.  We model our approach on the assumption that
> 	 * the CPU does do speculative prefetches, which means we clean caches
> 	 * before transfers and delay cache invalidation until transfer completion.
> 	 */

Even with that comment, the idea of buffer ownership must be preserved
by drivers, and they must follow that rule.  The fact that some ARM CPUs
do not respect the ownership entirely is worked around inside the DMA
API and is of no interest to drivers.  Feroceon is not one of the CPUs
which does this, though.
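(To make the ownership transitions concrete, below is a minimal sketch of
the sequence described above applied to the original scenario, assuming a
single bidirectional streaming mapping over the whole buffer.
prepare_buffer(), dev_send() and xfer_done() are hypothetical stand-ins
for the poster's driver-specific helpers, not a real API.)

	#include <linux/dma-mapping.h>
	#include <linux/errno.h>
	#include <linux/types.h>

	/* Hypothetical driver helpers standing in for the poster's code. */
	extern void prepare_buffer(void *buffer);
	extern void dev_send(void *buffer);
	extern bool xfer_done(void *buffer);

	static int send_and_check(struct device *dev, void *buffer, size_t size)
	{
		dma_addr_t handle;

		/* CPU owns the buffer: fill in the payload and the
		 * completion pattern before mapping. */
		prepare_buffer(buffer);

		handle = dma_map_single(dev, buffer, size, DMA_BIDIRECTIONAL);
		if (dma_mapping_error(dev, handle))
			return -ENOMEM;

		/* Device owns the buffer from here on: don't touch it. */
		dev_send(buffer);

		/* ... wait for the completion irq ... */

		/* Transfer ownership back to the CPU before reading. */
		dma_sync_single_for_cpu(dev, handle, size, DMA_BIDIRECTIONAL);

		if (!xfer_done(buffer)) {
			/* Not done yet: hand the buffer back to the device
			 * and wait for the next irq. */
			dma_sync_single_for_device(dev, handle, size,
						   DMA_BIDIRECTIONAL);
			return -EINPROGRESS;
		}

		dma_unmap_single(dev, handle, size, DMA_BIDIRECTIONAL);
		return 0;
	}

With CONFIG_DMA_API_DEBUG=y the core should also warn at runtime about
things such as syncing a region that was never mapped or using a direction
that does not match the mapping, which is what the third point above refers
to.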
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-07-31 19:31 UTC
  To: linux-arm-kernel

On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
>> I was expecting the following to work:
>> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	dev_send(buffer);
>> 	// wait for irq (don't peek in the buffer) ... got irq
>> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	if (!xfer_done(buffer)) // not RAM value
>> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> 	[...]
>
Hi Russell,

> First point is that you clearly do not understand the DMA API at all.
> The DMA API has the idea of buffer ownership.  Only the owner may access
> the buffer:
>
Are you saying that this scenario does not work?
We are taking some liberties with the DMA API, we're more using some
of its funcs rather than _using_ it ;).
The question was not whether this was a proper usage of the API, but
why that scenario would not lead to the expected results .. and now
I've found the culprit peek I am happy.

[...]
> So, there is absolutely no need whatsoever to follow dma_map_single()
> with dma_sync_single_for_device().
>
Cleaning a wide address range while invalidating a small one?

[...]
> Fourthly, remember that the CPU deals with cache lines, and dirty cache
> lines may be written back in their _entirety_ - which means that data
> DMA'd from a device into the same cache line that you've modified via
> the CPU will not work (either the data in the cache line has to be
> invalidated and therefore the CPU update discarded, or the cache line
> has to be flushed back to RAM and the DMA'd data is overwritten).
> Hence why the buffer ownership rules are extremely important.
>
Of course.

> The solution for this fourth point is to use coherent DMA memory for
> things like ring buffers and the like, which does not suffer from this.
>
I might use something different in a not so distant future, yet, for
the time being, this way of doing has its advantages.

[...]
> Even with that comment, the idea of buffer ownership must be preserved
> by drivers, and they must follow that rule.  The fact that some ARM CPUs
> do not respect the ownership entirely is worked around inside the DMA
> API and is of no interest to drivers.  Feroceon is not one of the CPUs
> which does this, though.
>
It was kind of a last resort explanation for a cache line filled out of
the blue .. before I spotted the culprit peek this morning.

Thanks for your input,
Karl
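(As a reference for the coherent-memory alternative mentioned above, here
is a minimal sketch in which only the completion pattern lives in
dma_alloc_coherent() memory, so no dma_sync_* calls are needed to observe
the device's update; the struct and function names are hypothetical.)

	#include <linux/dma-mapping.h>
	#include <linux/errno.h>
	#include <linux/gfp.h>
	#include <linux/types.h>

	/* Hypothetical layout: the completion word the device updates lives
	 * in coherent memory; the bulk data can stay in a streaming mapping. */
	struct xfer_desc {
		u32 pattern;
	};

	static struct xfer_desc *desc;
	static dma_addr_t desc_dma;	/* programmed into the device so it
					 * knows where to write the pattern */

	static int alloc_desc(struct device *dev)
	{
		desc = dma_alloc_coherent(dev, sizeof(*desc), &desc_dma,
					  GFP_KERNEL);
		return desc ? 0 : -ENOMEM;
	}

	static void free_desc(struct device *dev)
	{
		dma_free_coherent(dev, sizeof(*desc), desc, desc_dma);
	}

Reading desc->pattern from the irq handler then needs no explicit cache
maintenance, at the cost of the coherent mapping typically being uncached
on this class of hardware and therefore slower for bulk accesses.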
* About dma_sync_single_for_{cpu,device}
  From: Russell King - ARM Linux @ 2012-07-31 20:08 UTC
  To: linux-arm-kernel

On Tue, Jul 31, 2012 at 09:31:13PM +0200, Karl Beldan wrote:
> On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> > On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
> >> I was expecting the following to work:
> >> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
> >> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	dev_send(buffer);
> >> 	// wait for irq (don't peek in the buffer) ... got irq
> >> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	if (!xfer_done(buffer)) // not RAM value
> >> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
> >> 	[...]
> >
> Hi Russell,
>
> > First point is that you clearly do not understand the DMA API at all.
> > The DMA API has the idea of buffer ownership.  Only the owner may access
> > the buffer:
> >
> Are you saying that this scenario does not work?
> We are taking some liberties with the DMA API, we're more using some
> of its funcs rather than _using_ it ;).
> The question was not whether this was a proper usage of the API, but
> why that scenario would not lead to the expected results .. and now
> I've found the culprit peek I am happy.

If you abuse the API don't expect your stuff to work in future kernel
versions.

It seems that the overall tone of your reply is "what we have now works,
we don't care if it's correct, sod you."

Fine, I won't spend any more time on this.  Just don't ever think about
merging it into mainline, thanks.
* About dma_sync_single_for_{cpu,device}
  From: Karl Beldan @ 2012-08-01  6:50 UTC
  To: linux-arm-kernel

On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
> On Tue, Jul 31, 2012 at 09:31:13PM +0200, Karl Beldan wrote:
>> On 7/31/12, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote:
>> > On Tue, Jul 31, 2012 at 08:45:57AM +0200, Karl Beldan wrote:
>> >> I was expecting the following to work:
>> >> 	addr = dma_map_single(dev, buffer, size, DMA_TO_DEVICE);
>> >> 	dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	dev_send(buffer);
>> >> 	// wait for irq (don't peek in the buffer) ... got irq
>> >> 	dma_sync_single_for_cpu(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	if (!xfer_done(buffer)) // not RAM value
>> >> 		dma_sync_single_for_device(dev, buffer, pattern_size, DMA_FROM_DEVICE);
>> >> 	[...]
>> >
>> Hi Russell,
>>
>> > First point is that you clearly do not understand the DMA API at all.
>> > The DMA API has the idea of buffer ownership.  Only the owner may
>> > access the buffer:
>> >
>> Are you saying that this scenario does not work?
>> We are taking some liberties with the DMA API, we're more using some
>> of its funcs rather than _using_ it ;).
>> The question was not whether this was a proper usage of the API, but
>> why that scenario would not lead to the expected results .. and now
>> I've found the culprit peek I am happy.
>
> If you abuse the API don't expect your stuff to work in future kernel
> versions.
>
Of course.

> It seems that the overall tone of your reply is "what we have now works,
> we don't care if it's correct, sod you."
>
Not at all:
{
On 7/31/12, Karl Beldan <karl.beldan@gmail.com> wrote:
> I might use something different in a not so distant future, yet, for
> the time being, this way of doing has its advantages.
}
and during this time I might stick to the API.
I am not at ease telling people how they should take things, especially
if I asked for their help, all the more when they have put effort into
the exchange while being experts on the matter. Yet please do not assume
I did not care for your advice, which I deem most valuable, as, needless
to say, does the community.

> Fine, I won't spend any more time on this.  Just don't ever think about
> merging it into mainline, thanks.
>
Merge submission while taking such liberties .. that I would not dare ;)
This really was a down-to-earth technical question, not the start of a
disguised merge request.
I am sure that taking such liberties and feeling out their limits before
sticking to a super safe API is not a bad thing, e.g. it might trigger
nasty hidden bugs more easily; it is often beneficial, to me at least.

Thanks for your feedback,
Karl
Thread overview: 9+ messages
  [not found] <20120730202401.GA4947@gobelin>
  2012-07-31  6:45 ` About dma_sync_single_for_{cpu,device} Karl Beldan
  2012-07-31  6:59   ` Clemens Ladisch
  2012-07-31  7:27     ` Karl Beldan
  2012-07-31  7:34       ` Clemens Ladisch
  2012-07-31  8:30         ` Karl Beldan
  2012-07-31  9:09   ` Russell King - ARM Linux
  2012-07-31 19:31     ` Karl Beldan
  2012-07-31 20:08       ` Russell King - ARM Linux
  2012-08-01  6:50         ` Karl Beldan