From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michel Dänzer
Subject: Re: [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT
Date: Sat, 19 Jul 2014 10:15:57 +0900
Message-ID: <53C9C6CD.80204@daenzer.net>
References: <1405591275-14461-1-git-send-email-michel@daenzer.net> <53C7A0D0.6080202@vodafone.de> <53C88F8C.40907@daenzer.net> <53C941A1.60001@vodafone.de>
In-Reply-To: <53C941A1.60001@vodafone.de>
Errors-To: mesa-dev-bounces@lists.freedesktop.org
Sender: "mesa-dev"
To: Christian König, Alex Deucher
Cc: mesa-dev@lists.freedesktop.org, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 19.07.2014 00:47, Christian König wrote:
> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>> I'm still not very keen with this change since I still don't understand
>>> the reason why it's faster than with GTT. Definitely needs more testing
>>> on a wider range of systems.
>> Sure. If anyone wants to give this patch a spin and see if they can
>> measure any performance difference, good or bad, that would be
>> interesting.
>>
>>> Maybe limit it to APUs for now?
>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
>> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
>> system. I suspect it may depend on the bandwidth available for PCIe vs.
>> system memory though.
>
> I've made a few tests today with the kernel part of the patches running
> Xonotic on Ultra in 1920 x 1080.
>
> Without any patches I get around ~47.0fps on average with my dedicated
> HD7870.
>
> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
> IBs on >= SI" and that goes down to ~45.3fps.
>
> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
> SI" and the frame rate goes down to ~27.74fps.

Hmm, looks like I'll need to do more benchmarking of 3D workloads as
well.

Alex, given those numbers, it's probably best if you remove the "Use
write-combined CPU mappings of rings and IBs on >= SI" change from your
tree as well for now.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer