From mboxrd@z Thu Jan  1 00:00:00 1970
From: catalin.marinas@arm.com (Catalin Marinas)
Date: Wed, 23 Apr 2014 17:02:16 +0100
Subject: [PATCH] ARM: mm: dma: Update coherent streaming apis with
 missing memory barrier
In-Reply-To: <20140423090251.GA5281@arm.com>
References: <1398103390-31968-1-git-send-email-santosh.shilimkar@ti.com>
 <201404222153.41786.arnd@arndb.de> <5356C9D1.2060001@ti.com>
 <5895821.PvurJ8TWz2@wuerfel> <5356D163.1070304@ti.com>
 <20140423090251.GA5281@arm.com>
Message-ID: <20140423160216.GC2208@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wed, Apr 23, 2014 at 10:02:51AM +0100, Will Deacon wrote:
> On Tue, Apr 22, 2014 at 09:30:27PM +0100, Santosh Shilimkar wrote:
> > writel() or an explicit barrier in the driver will do the job. I was
> > just thinking that we are trying to work around the shortcomings
> > of the streaming API by adding barriers in the driver. For example,
> > on a non-coherent system, I don't need that barrier because
> > dma_ops does take care of that.
> 
> I wonder whether we can remove those barriers altogether then (from the DMA
> cache operations). For the coherent case, the driver must provide the
> barrier (probably via writel) so the non-coherent case shouldn't be any
> different.

For the DMA_TO_DEVICE case the effect should be the same, since wmb()
implies dsb (and outer_sync() for writes). But the reason we have
barriers in the DMA ops is slightly different - the completion of the
cache maintenance operation rather than ordering with any previous
writes to the DMA buffer.
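
To make the DMA_TO_DEVICE sequence concrete, here is a userspace mock of
a non-coherent map followed by a doorbell write. All names are
hypothetical stand-ins, not the kernel code; it only records the order
of the steps. The clean and its completion barrier come from the DMA op,
while the wmb() before the register store is what a writel()-style
accessor would contribute.

```c
#include <assert.h>
#include <string.h>

/* Userspace mock only: hypothetical names, not the kernel implementation.
 * Each step appends a tag to a trace so the ordering can be inspected. */
#define TX_TRACE_MAX 8
static const char *tx_trace[TX_TRACE_MAX];
static int tx_trace_len;

static void tx_record(const char *step)
{
	assert(tx_trace_len < TX_TRACE_MAX);
	tx_trace[tx_trace_len++] = step;
}

/* Stand-in for dma_map_single(..., DMA_TO_DEVICE) on a non-coherent system. */
static void mock_dma_map_to_device(void)
{
	tx_record("clean");	/* write dirty cache lines back to memory */
	tx_record("dsb");	/* wait for the cache maintenance to complete */
}

/* Stand-in for a writel() that kicks the device. */
static void mock_writel_doorbell(void)
{
	tx_record("wmb");	/* orders earlier buffer writes before the MMIO store */
	tx_record("mmio_store");
}

/* Stand-in for a driver transmit path. */
static void mock_start_tx(void)
{
	tx_record("cpu_store");	/* CPU fills the DMA buffer */
	mock_dma_map_to_device();
	mock_writel_doorbell();
}
```

Calling mock_start_tx() once leaves five entries in tx_trace, in the
order the steps were issued.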

In the DMA_FROM_DEVICE scenario for example, the CPU gets an interrupt
for a finished DMA transfer and executes dma_unmap_single() prior to
accessing the page. However the CPU access after unmapping is done using
normal LDR/STR which do not imply any barrier. So we need to ensure the
completion of the cache invalidation in the dma operation.
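
A rough sketch of that DMA_FROM_DEVICE sequence, again as a userspace
mock with made-up names rather than the real dma_ops. It only shows
where the completion barrier sits relative to the CPU's first plain
load from the buffer.

```c
#include <assert.h>
#include <string.h>

/* Userspace mock only: hypothetical names, not the kernel implementation.
 * Each step appends a tag to a trace so the ordering can be inspected. */
#define RX_TRACE_MAX 8
static const char *rx_trace[RX_TRACE_MAX];
static int rx_trace_len;

static void rx_record(const char *step)
{
	assert(rx_trace_len < RX_TRACE_MAX);
	rx_trace[rx_trace_len++] = step;
}

/* Stand-in for dma_unmap_single(..., DMA_FROM_DEVICE) on a non-coherent system. */
static void mock_dma_unmap_from_device(void)
{
	rx_record("invalidate");	/* drop stale cache lines covering the buffer */
	rx_record("dsb");		/* completion barrier lives in the DMA op */
}

/* Stand-in for the driver's DMA-complete interrupt handler. */
static void mock_dma_irq_handler(void)
{
	mock_dma_unmap_from_device();
	rx_record("cpu_load");		/* plain LDR, implies no barrier by itself */
}
```

Without the "dsb" step inside the unmap, nothing in this sequence would
guarantee that the invalidation has completed before the load.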

In the I/O coherency case, I would say it is the responsibility of the
device/hardware to ensure that the data is visible to all observers
(CPUs) prior to issuing an interrupt for DMA-ready. Looking at the mvebu
code, I think it covers this for the from-device or bidirectional
scenarios.

Maybe Santosh still has a point ;) but I don't know what the right
barrier would be here. And I really *hate* per-SoC/snoop unit barriers
(I still hope a dsb would do the trick on newer/ARMv8 systems).

> I need some more coffee and a serious look at the code, but we may be able
> to use dmb instructions to order the cache maintenance and avoid a final
> dsb for completion.

Is the dmb enough (assuming no outer cache)? We need to ensure the
flushed cache lines reach the memory for device access.

-- 
Catalin