* How is the Linux kernel API dma_alloc_coherent() typically implemented for the ARM Architecture?
@ 2018-10-03 16:55 Casey Leedom
From: Casey Leedom @ 2018-10-03 16:55 UTC (permalink / raw)
  To: linux-arm-kernel

  I have a question about ARM CPU versus PCIe DMA I/O Coherence that I'm
trying to understand. In general, I thought that ARM is I/O Incoherent and
that setting up Device DMA READs from Coherent Memory and Device DMA WRITEs
to Coherent Memory requires that the Device Driver/OS coordinate to
FLUSH/INVALIDATE Caches, etc.  In Linux this is all handled automatically
via the dma_map*()/dma_unmap*() APIs.  But what does the Linux kernel API
dma_alloc_coherent() do on an architecture like ARM?  Return an UNCACHED
mapping? I've tried ferreting my way down through the layers and layers of
abstraction and implementation differences for various ARM platforms but
it's pretty opaque ...
 
  We use the Linux dma_alloc_coherent() API in order to allocate our TX and
RX "Rings".  All TX and RX "Buffers" are managed with the dma_map*(*READ*
and *WRITE*) APIs in order to Flush Caches to Memory / Invalidate Caches, etc.

  But these "Rings" serve as "message" rings between the Host and the Device
and we don't do Cache Flushes/Invalidates on them.  Messages sent from the
Host to the Device include Work Requests and lists of Free List Buffer
Pointers.  Messages sent from the Device to the Host include Ingress Packet
Delivery Notifications, Link Status, etc.  For the Ingress Queues which the
Device uses to send messages to the Host, we use a Generation Bit scheme
where the Generation Bit flips back and forth between 0 and 1 every time the
Device's Write Index in the Ingress Queue wraps back around to the start of
the Ingress Queue.  The Host software uses the Generation Bit value to
determine when there are new Device Messages available in the Ingress Queue.
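
  To make the Generation Bit scheme above concrete, here is a minimal
user-space sketch in C.  It is an illustration under stated assumptions,
not our actual driver code: the names (iq_entry, producer, consumer,
produce, consume), the entry layout and the ring size are all
hypothetical, the "Device" is just another function in the same program,
and flow control (stopping the producer from overrunning unconsumed
entries) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 4u          /* hypothetical ring size */

/* Hypothetical message layout: a payload plus a Generation Bit. */
struct iq_entry {
    uint32_t payload;
    uint8_t  gen;                /* 0 or 1; the producer writes this last */
};

/* Stands in for the Device side of the Ingress Queue. */
struct producer {
    unsigned int widx;           /* Write Index */
    uint8_t      gen;            /* generation currently being written */
};

/* Stands in for the Host side of the Ingress Queue. */
struct consumer {
    unsigned int ridx;           /* Read Index */
    uint8_t      gen;            /* generation the Host expects next */
};

/*
 * Assumed starting state: the ring is zero-filled, and both sides start
 * with gen = 1, so untouched entries (gen == 0) never look valid.
 */

/* Write one message; flip the Generation Bit when the index wraps. */
static void produce(struct iq_entry *ring, struct producer *p,
                    uint32_t payload)
{
    ring[p->widx].payload = payload;
    ring[p->widx].gen = p->gen;
    if (++p->widx == RING_ENTRIES) {
        p->widx = 0;
        p->gen ^= 1;
    }
}

/*
 * Return true and copy out the payload if the entry at the Read Index
 * carries the expected Generation Bit; return false if nothing new.
 * In a real driver a read barrier (dma_rmb() in Linux) would sit
 * between checking the Generation Bit and reading the payload.
 */
static bool consume(const struct iq_entry *ring, struct consumer *c,
                    uint32_t *payload)
{
    if (ring[c->ridx].gen != c->gen)
        return false;
    *payload = ring[c->ridx].payload;
    if (++c->ridx == RING_ENTRIES) {
        c->ridx = 0;
        c->gen ^= 1;
    }
    return true;
}
```

  The point of the sketch is that the Host never needs to read the
Device's Write Index: once both sides have wrapped, entries left over
from the previous pass carry the old Generation Bit and are ignored
until the producer overwrites them.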

  So, as I was grinding my way down through the layers of implementation on
the Linux dma_alloc_coherent() I was trying to see how the above
dma_alloc_coherent() semantic was being implemented on the ARM architecture
which [I thought] doesn't generally support I/O Coherency.  Setting up a
completely UNCACHED mapping would of course work but at a significant cost
in terms of access.  It's conceivable that the TX Rings could be mapped with
a WRITE-COMBINING UNCACHED mapping I suppose (though the Linux API doesn't
include any information on the DIRECTION of a dma_alloc_coherent() call).  So
I'm curious about how that all fits together.

Casey
