From mboxrd@z Thu Jan  1 00:00:00 1970
From: Christian König <deathsimple@vodafone.de>
Subject: Re: [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT
Date: Wed, 23 Jul 2014 09:32:04 +0200
Message-ID: <53CF64F4.5030809@vodafone.de>
References: <1405591275-14461-1-git-send-email-michel@daenzer.net>
 <53C7A0D0.6080202@vodafone.de> <53C88F8C.40907@daenzer.net>
 <53C941A1.60001@vodafone.de> <53C9C6CD.80204@daenzer.net>
 <53CCCA32.70507@vodafone.de> <53CF31FB.7070508@daenzer.net>
 <53CF5961.8010609@vodafone.de> <53CF626B.4030201@daenzer.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <mesa-dev-bounces@lists.freedesktop.org>
In-Reply-To: <53CF626B.4030201@daenzer.net>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/mesa-dev>,
 <mailto:mesa-dev-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/mesa-dev>
List-Post: <mailto:mesa-dev@lists.freedesktop.org>
List-Help: <mailto:mesa-dev-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/mesa-dev>,
 <mailto:mesa-dev-request@lists.freedesktop.org?subject=subscribe>
Errors-To: mesa-dev-bounces@lists.freedesktop.org
Sender: "mesa-dev" <mesa-dev-bounces@lists.freedesktop.org>
To: Michel Dänzer <michel@daenzer.net>, Alex Deucher <alexdeucher@gmail.com>
Cc: mesa-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 23.07.2014 09:21, Michel Dänzer wrote:
> On 23.07.2014 15:42, Christian König wrote:
>> On 23.07.2014 05:54, Michel Dänzer wrote:
>>> On 21.07.2014 17:07, Christian König wrote:
>>>> On 19.07.2014 03:15, Michel Dänzer wrote:
>>>>> On 19.07.2014 00:47, Christian König wrote:
>>>>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>>>>> I'm still not very keen on this change since I still don't
>>>>>>>> understand why it's faster than with GTT. It definitely needs
>>>>>>>> more testing on a wider range of systems.
>>>>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>>>>> measure any performance difference, good or bad, that would be
>>>>>>> interesting.
>>>>>>>
>>>>>>>> Maybe limit it to APUs for now?
>>>>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>>>>> even bigger win with dedicated GPUs than with the Kaveri built-in GPU
>>>>>>> on my system. I suspect it may depend on the bandwidth available for
>>>>>>> PCIe vs. system memory though.
>>>>>> I've run a few tests today with the kernel part of the patches,
>>>>>> running Xonotic on Ultra at 1920x1080.
>>>>>>
>>>>>> Without any patches I get ~47.0fps on average with my dedicated
>>>>>> HD7870.
>>>>>>
>>>>>> With only "drm/radeon: Use write-combined CPU mappings of rings and
>>>>>> IBs on >= SI" applied, that goes down to ~45.3fps.
>>>>>>
>>>>>> Adding "drm/radeon: Use VRAM for indirect buffers on >= SI" on top
>>>>>> of that, the frame rate goes down to ~27.74fps.
>>>>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as well.
>>> I haven't been able to consistently[0] measure any significant
>>> difference between all placements of the rings and IBs with Xonotic or
>>> Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
>>> memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
>>> that kernel patch alone is very surprising. Are you sure it wasn't just
>>> the same kind of variation as described below?
>> Yes, I've measured that multiple times and the results were quite
>> consistent.
>>
>> But I didn't measure it on a Bonaire, where the bottleneck probably
>> isn't the CPU load. I measured it on a fast Pitcairn
> Ahem, my Bonaire is cranking out ~90fps of Xonotic Ultra at 1920x1080.
> :) (And AFAIK there are even faster Bonaire variants)

My Bonaire only manages around 17fps with Xonotic Ultra at 1920x1080;
it might be a good idea to figure out why at some point.

>
>> and there Xonotic was clearly affected by the patches.
> Okay, I hadn't realized we're not doing any command stream checking as
> of CIK, that probably explains the difference.

Good point. I should probably test the patch putting IBs in VRAM with my
Bonaire as well.

>>>> My tests clearly show that we can still use USWC for the ring buffer on
>>>> SI and probably earlier chips as well.
>>> Yeah, that might be the safest approach for now.
>> How about using USWC for the rings on all chips since R600
> Any particular reason against doing it for older chips which support
> unsnooped access as well?

Not really, I just didn't notice that older chips can do this as well.

Christian.

>
>> and for the IB only on CIK? As far as I can see that should do the trick
>> quite well.
> Yeah, sounds good.
>
>