From mboxrd@z Thu Jan 1 00:00:00 1970
From: maxime.ripard@free-electrons.com (Maxime Ripard)
Date: Wed, 9 Mar 2016 11:58:07 +0100
Subject: [linux-sunxi] Re: [PATCH] dma: sun4i: expose block size and wait
cycle configuration to DMA users
In-Reply-To: <20160308100538.GO11154@localhost>
References: <1457344771-12946-1-git-send-email-boris.brezillon@free-electrons.com>
<20160307145429.GG11154@localhost>
<20160307160857.577bb04d@bbrezillon>
<20160307203024.GD8418@lukather> <20160308025547.GI11154@localhost>
<20160308075131.GE8418@lukather> <56DE9077.3020905@redhat.com>
<20160308100538.GO11154@localhost>
Message-ID: <20160309105807.GO8418@lukather>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org
On Tue, Mar 08, 2016 at 03:35:38PM +0530, Vinod Koul wrote:
> On Tue, Mar 08, 2016 at 09:42:31AM +0100, Hans de Goede wrote:
> >
> >
> > I see two possible reasons why waiting before checking for DRQ can help:
> >
> > 1) A lot of devices have an internal FIFO hooked up to a single MMIO data
> > register, which is read by the general-purpose DMA engine. Waiting allows
> > this FIFO to fill, so the engine can do burst transfers.
> > (We've seen similar issues with the display scanout engine, which
> > has its own DMA engine, and doing larger transfers helps a lot.)
> >
> > 2) Physical memory on the sunxi SoCs is (often) divided into banks
> > with a shared data/address bus, so doing bank switches is expensive.
> > These wait cycles may introduce latency which allows a user of another
> > bank to complete its RAM accesses before the DMA engine forces a
> > bank switch. This avoids a lot of (interleaved) bank switches while
> > both try to access a different bank, so waiting makes things
> > (much) faster in the end (again a known problem with the display
> > scanout engine).
> >
> > Note that the difference these kinds of tweaks make can be quite dramatic:
> > when using a 1920x1080p60 HDMI output on the A10 SoC with a 16-bit
> > memory bus (a real-world worst-case scenario), the memory bandwidth
> > left for userspace processes (measured with memset) almost doubles,
> > from 48 MB/s to 85 MB/s. Source:
> > http://ssvb.github.io/2014/11/11/revisiting-fullhd-x11-desktop-performance-of-the-allwinner-a10.html
> >
> > TL;DR: Waiting before starting DMA allows larger burst transfers,
> > which ends up making things more efficient.
> >
> > Given this, I really expect there to be other DMA engines which
> > have some option to wait a bit before starting/unpausing a transfer
> > instead of starting it as soon as (more) data is available, so I think
> > this would make a good addition to dma_slave_config.
>
> I tend to agree, but before we do that I would like this hypothesis to be
> confirmed :)
We can't confirm it: we don't have access to any documentation that
might explain what this is about.
Maxime
--
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com