From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michel Dänzer
Subject: Re: [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT
Date: Wed, 23 Jul 2014 12:54:35 +0900
Message-ID: <53CF31FB.7070508@daenzer.net>
References: <1405591275-14461-1-git-send-email-michel@daenzer.net>
 <53C7A0D0.6080202@vodafone.de> <53C88F8C.40907@daenzer.net>
 <53C941A1.60001@vodafone.de> <53C9C6CD.80204@daenzer.net>
 <53CCCA32.70507@vodafone.de>
In-Reply-To: <53CCCA32.70507@vodafone.de>
Errors-To: mesa-dev-bounces@lists.freedesktop.org
Sender: "mesa-dev"
To: Christian König, Alex Deucher
Cc: mesa-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 21.07.2014 17:07, Christian König wrote:
> Am 19.07.2014 03:15, schrieb Michel Dänzer:
>> On 19.07.2014 00:47, Christian König wrote:
>>> Am 18.07.2014 05:07, schrieb Michel Dänzer:
>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>> I'm still not very keen with this change since I still don't
>>>>> understand the reason why it's faster than with GTT. Definitely
>>>>> needs more testing on a wider range of systems.
>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>> measure any performance difference, good or bad, that would be
>>>> interesting.
>>>>
>>>>> Maybe limit it to APUs for now?
>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
>>>> even bigger win with dedicated GPUs than with the Kaveri built-in
>>>> GPU on my system. I suspect it may depend on the bandwidth available
>>>> for PCIe vs. system memory though.
>>> I've made a few tests today with the kernel part of the patches running
>>> Xonotic on Ultra in 1920 x 1080.
>>>
>>> Without any patches I get around ~47.0fps on average with my dedicated
>>> HD7870.
>>>
>>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>>> IBs on >= SI" and that goes down to ~45.3fps.
>>>
>>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>>> SI" and the frame rate goes down to ~27.74fps.
>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>> well.

I haven't been able to consistently[0] measure any significant
difference between all placements of the rings and IBs with Xonotic or
Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
that kernel patch alone is very surprising. Are you sure it wasn't just
the same kind of variation as described below?

[0] There were slightly different results sometimes, but next time I
tried the same setup again, it was back to the same as always. So it
seemed to depend more on the particular system boot / test run / moon
phase / ... than the kernel patches themselves.

>> Alex, given those numbers, it's probably best if you remove the "Use
>> write-combined CPU mappings of rings and IBs on >= SI" change from your
>> tree as well for now.
>
> I wouldn't go as far as reverting the patch. It just needs a bit more
> fine tuning and that can happen in the 3.17rc cycle.

There's no need to revert it, just drop it from the tree. I'd still
prefer that for now.

> My tests clearly show that we still can use USWC for the ring buffer on
> SI and probably earlier chips as well.

Yeah, that might be the safest approach for now.

-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer