From: Michel Dänzer <michel@daenzer.net>
Subject: Re: [PATCH 0/5] radeon: Write-combined CPU mappings of
	BOs in GTT
Date: Wed, 23 Jul 2014 12:54:35 +0900
Message-ID: <53CF31FB.7070508@daenzer.net>
References: <1405591275-14461-1-git-send-email-michel@daenzer.net>
 <53C7A0D0.6080202@vodafone.de> <53C88F8C.40907@daenzer.net>
 <53C941A1.60001@vodafone.de> <53C9C6CD.80204@daenzer.net>
 <53CCCA32.70507@vodafone.de>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: quoted-printable
Return-path: <mesa-dev-bounces@lists.freedesktop.org>
In-Reply-To: <53CCCA32.70507@vodafone.de>
To: Christian König <deathsimple@vodafone.de>, Alex Deucher <alexdeucher@gmail.com>
Cc: mesa-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 21.07.2014 17:07, Christian König wrote:
> On 19.07.2014 03:15, Michel Dänzer wrote:
>> On 19.07.2014 00:47, Christian König wrote:
>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>> I'm still not very keen on this change since I still don't understand
>>>>> the reason why it's faster than with GTT. It definitely needs more
>>>>> testing on a wider range of systems.
>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>> measure any performance difference, good or bad, that would be
>>>> interesting.
>>>>
>>>>> Maybe limit it to APUs for now?
>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>> even bigger win with dedicated GPUs than with the Kaveri built-in GPU
>>>> on my system. I suspect it may depend on the bandwidth available for
>>>> PCIe vs. system memory though.
>>> I ran a few tests today with the kernel part of the patches, running
>>> Xonotic on Ultra at 1920x1080.
>>>
>>> Without any patches, I get ~47.0 fps on average with my dedicated
>>> HD7870.
>>>
>>> With only "drm/radeon: Use write-combined CPU mappings of rings and
>>> IBs on >= SI" added, that goes down to ~45.3 fps.
>>>
>>> Adding "drm/radeon: Use VRAM for indirect buffers on >= SI" on top of
>>> that, the frame rate goes down to ~27.74 fps.
>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>> well.

I haven't been able to consistently[0] measure any significant
difference between any of the ring and IB placements with Xonotic or
Reaction Quake on my Bonaire. I'd expect Xonotic to be shader / GPU
memory bandwidth bound rather than command submission (CS) bound anyway,
so a ~40% hit from that kernel patch alone is very surprising. Are you
sure it wasn't just the same kind of variation as described below?

[0] Results were sometimes slightly different, but the next time I
tried the same setup, it was back to the same as always. So it seemed
to depend more on the particular system boot / test run / moon phase /
... than on the kernel patches themselves.


>> Alex, given those numbers, it's probably best if you remove the "Use
>> write-combined CPU mappings of rings and IBs on >= SI" change from your
>> tree as well for now.
>
> I wouldn't go as far as reverting the patch. It just needs a bit more
> fine-tuning, and that can happen in the 3.17-rc cycle.

There's no need to revert it; just drop it from your tree. I'd still
prefer that for now.


> My tests clearly show that we still can use USWC for the ring buffer on
> SI and probably earlier chips as well.

Yeah, that might be the safest approach for now.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer