From mboxrd@z Thu Jan  1 00:00:00 1970
From: will.deacon@arm.com (Will Deacon)
Date: Wed, 23 Apr 2014 10:02:51 +0100
Subject: [PATCH] ARM: mm: dma: Update coherent streaming apis with
 missing memory barrier
In-Reply-To: <5356D163.1070304@ti.com>
References: <1398103390-31968-1-git-send-email-santosh.shilimkar@ti.com>
 <201404222153.41786.arnd@arndb.de> <5356C9D1.2060001@ti.com>
 <5895821.PvurJ8TWz2@wuerfel> <5356D163.1070304@ti.com>
Message-ID: <20140423090251.GA5281@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tue, Apr 22, 2014 at 09:30:27PM +0100, Santosh Shilimkar wrote:
> On Tuesday 22 April 2014 04:23 PM, Arnd Bergmann wrote:
> > On Tuesday 22 April 2014 15:58:09 Santosh Shilimkar wrote:
> >> On Tuesday 22 April 2014 03:53 PM, Arnd Bergmann wrote:
> >>> On Tuesday 22 April 2014, Santosh Shilimkar wrote:
> >>>> On Tuesday 22 April 2014 10:08 AM, Arnd Bergmann wrote:
> >>>>> It's not the nicest API ever, but that's what it is and has been, mostly
> >>>>> for compatibility with x86, where the 'mov' instruction performing the
> >>>>> store to MMIO registers implies that all writes to DMA memory are
> >>>>> visible to the device.
> >>>>>
> >>>> This is not about writel() and  writel_relaxed(). The driver don't
> >>>> need that barrier. For example if the actual start of the DMA
> >>>> happens bit later, that doesn't matter for the driver.
> >>>>
> >>>> DMA APIs already do barriers today for non-coherent case. We
> >>>> are not talking anything new here. Sorry but I don't see the
> >>>> connection here.
> >>>
> >>> I don't think they do, nor should they. Can you tell me where
> >>> you see a barrier in dma_sync_single_for_cpu() or 
> >>> arm_dma_sync_single_for_device()? For all I can tell, they
> >>> only deal with L1 and L2 cache maintainance in arm_dma_ops.
> >>>
> >> The cache APIs used by dma_ops do have the necessary barriers
> >> at end of the of the cache operations. Thats what I meant. So for
> >> end user(Device driver), its transparent.
> > 
> > Ok, I see it now for the noncoherent operations, and I see
> > the same thing on PowerPC and MIPS, which also have both coherent
> > and noncoherent versions of their dma_map_ops.
> > 
> > However, I also see that neither of those does a wmb() for the
> > coherent version. I don't see why ARM should be different from
> > the others here, so if there is a reason to do a barrier there,
> > we should change all architectures. I still don't see a reason
> > why the barrier is needed though.
> > 
> Thats fair.
> 
> > Can you be more specific in your driver example why you think
> > the barrier in the writel() is not sufficient?
> > 
> writel() or an explcit barrier in the driver will do the job. I was
> just thinking that we are trying to work around the short comings
> of streaming API by adding barriers in the driver. For example
> on a non-coherent system, i don't need that barrier because
> dma_ops does take care of that.

I wonder whether we can remove those barriers altogether then (from the DMA
cache operations). For the coherent case, the driver must provide the
barrier (probably via writel) so the non-coherent case shouldn't be any
different.

I need some more coffee and a serious look at the code, but we may be able
to use dmb instructions to order the cache maintenance and avoid a final
dsb for completion.

Will