From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christian König
Subject: Re: [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT
Date: Mon, 21 Jul 2014 10:07:14 +0200
Message-ID: <53CCCA32.70507@vodafone.de>
References: <1405591275-14461-1-git-send-email-michel@daenzer.net> <53C7A0D0.6080202@vodafone.de> <53C88F8C.40907@daenzer.net> <53C941A1.60001@vodafone.de> <53C9C6CD.80204@daenzer.net>
In-Reply-To: <53C9C6CD.80204@daenzer.net>
To: Michel Dänzer, Alex Deucher
Cc: mesa-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 19.07.2014 03:15, Michel Dänzer wrote:
> On 19.07.2014 00:47, Christian König wrote:
>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>> I'm still not very keen on this change since I still don't understand
>>>> the reason why it's faster than with GTT. Definitely needs more testing
>>>> on a wider range of systems.
>>> Sure. If anyone wants to give this patch a spin and see if they can
>>> measure any performance difference, good or bad, that would be
>>> interesting.
>>>
>>>> Maybe limit it to APUs for now?
>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
>>> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
>>> system. I suspect it may depend on the bandwidth available for PCIe vs.
>>> system memory though.
>> I've made a few tests today with the kernel part of the patches running
>> Xonotic on Ultra in 1920 x 1080.
>>
>> Without any patches I get around ~47.0fps on average with my dedicated
>> HD7870.
>>
>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>> IBs on >= SI" and that goes down to ~45.3fps.
>>
>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>> SI" and the frame rate goes down to ~27.74fps.
> Hmm, looks like I'll need to do more benchmarking of 3D workloads as well.
>
> Alex, given those numbers, it's probably best if you remove the "Use
> write-combined CPU mappings of rings and IBs on >= SI" change from your
> tree as well for now.

I wouldn't go as far as reverting the patch. It just needs a bit more
fine-tuning, and that can happen in the 3.17-rc cycle.

My tests clearly show that we can still use USWC for the ring buffer on
SI and probably earlier chips as well. The performance drop comes from
reading the IB content for command stream validation on SI.

Putting the IB into VRAM still doesn't seem to be a good idea on
dedicated cards, but it might actually make sense on APUs.

Regards,
Christian.