From mboxrd@z Thu Jan 1 00:00:00 1970
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
Date: Thu, 24 Apr 2014 20:12:20 +0100
Subject: [PATCH] ARM: mm: dma: Update coherent streaming apis with missing memory barrier
In-Reply-To: <5359236A.4000707@ti.com>
References: <20140423171727.GK5649@arm.com>
 <20140423183742.GK24070@n2100.arm.linux.org.uk>
 <6414220.SShvCHLvZQ@wuerfel>
 <20140423190448.GB26756@n2100.arm.linux.org.uk>
 <20140424104737.GE8521@arm.com>
 <20140424111547.GP26756@n2100.arm.linux.org.uk>
 <20140424112152.GF19564@arm.com>
 <535913D4.6020401@ti.com>
 <20140424140913.GB14110@arm.com>
 <5359236A.4000707@ti.com>
Message-ID: <20140424191220.GA26756@n2100.arm.linux.org.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Thu, Apr 24, 2014 at 10:44:58AM -0400, Santosh Shilimkar wrote:
> DMA_TO_DEVICE: CPU->producer and DMA->consumer
> 1. CPU fills a descriptor/buffer in memory for DMA to pick it up.
> 2. Performs necessary dma_op() which on coherent case is NOP...
> ** Here I agree the ordering from all CPUs within the cluster is guaranteed
> as per as the descriptor memory view is concerned.
> But what is produced by CPU is not visible to DMA yet. So completion
> isn't guaranteed.
> 3. If DMA kicks the transfer assuming the producer(CPU) completion then
> that doesn't work.

Step 3 should be done via a writel(), which is a dsb() followed by an
outer_sync() followed by the actual write to the register.  The dsb and
outer_sync are there to ensure that the previous writes to things like
DMA coherent memory are visible to the device before the device sees the
write to its register.
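
To illustrate the ordering described above, here is a minimal userspace
sketch.  It is not kernel code: model_dsb(), model_outer_sync(),
model_writel(), kick_dma(), the descriptor array and doorbell_reg are all
hypothetical stand-ins, and the "barriers" only record events in a trace
so the sequence can be inspected.

```c
#include <assert.h>
#include <stdint.h>

/* Events recorded in program order; a model only, not real hardware. */
enum event { EV_DESC_WRITE, EV_DSB, EV_OUTER_SYNC, EV_REG_WRITE };

#define TRACE_MAX 8
static enum event trace[TRACE_MAX];
static int trace_len;

static void record(enum event e)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = e;
}

/* Hypothetical stand-ins for DMA coherent memory and a device register. */
static uint32_t descriptor[2];
static volatile uint32_t doorbell_reg;

/* Stand-in for dsb(): here it only notes that the barrier was issued. */
static void model_dsb(void)
{
	record(EV_DSB);
}

/* Stand-in for outer_sync(): likewise only recorded. */
static void model_outer_sync(void)
{
	record(EV_OUTER_SYNC);
}

/*
 * Model of the writel() sequence described in the mail:
 * dsb, then outer_sync, then the actual register write.
 */
static void model_writel(uint32_t val, volatile uint32_t *reg)
{
	model_dsb();
	model_outer_sync();
	*reg = val;
	record(EV_REG_WRITE);
}

/* Producer side: fill the descriptor, then kick the device. */
static void kick_dma(uint32_t buf_addr, uint32_t len)
{
	descriptor[0] = buf_addr;
	descriptor[1] = len;
	record(EV_DESC_WRITE);

	/* Both barriers sit between the descriptor writes and the kick. */
	model_writel(1, &doorbell_reg);
}
```

Calling kick_dma(0x1000, 64) leaves the trace as descriptor write, dsb,
outer_sync, register write, which is the order the device needs to
observe.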

Moreover, if there are descriptors in DMA coherent memory, and there is
a bit in them which must be set to hand ownership over to the device
(eg, as in an ethernet driver) then _additionally_ the driver already
has to add a barrier between the remainder of the descriptor update and
handing the descriptor over, and that barrier should ensure that *any*
effects prior to the barrier are seen before the effects of the accesses
after the barrier.

That said, in __dma_page_cpu_to_dev() we do the L1 cache maintenance
followed by the L2.  The effects of cleaning out the L1 cache must be
seen by the L2 cache before the effects of cleaning the L2 cache.  So we
_do_ have an ordering requirement there which is purely down to the
implementation, and not down to any other requirements.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.