From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roland Scheidegger
Subject: Re: drm/radeon/kms: improve performance of blit-copy
Date: Thu, 13 Oct 2011 21:00:02 +0200
Message-ID: <4E973532.9040503@hispeed.ch>
References: <1318476582-8365-1-git-send-email-ihadzic@research.bell-labs.com>
In-Reply-To: <1318476582-8365-1-git-send-email-ihadzic@research.bell-labs.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: Ilija Hadzic
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 13.10.2011 05:29, Ilija Hadzic wrote:
>
> The following set of patches will improve the performance of
> blit-copy functions for Radeon GPUs based on R600, R700, Evergreen
> and NI ASICs.
>
> The foundation for improvement is the use of tiled mode access (which
> for copying bo's can be used regardless of whether the content is
> tiled or not), and segmenting the memory block being copied into
> rectangles whose edge ratio is between 1:1 and 1:2. This maximizes
> the number of PCIe transactions that use maximum payload size
> (typically 128 bytes) and also creates a memory access pattern that
> is more favorable for both VRAM and host DRAM than what's currently
> in the kernel.
>
> To come up with the new blit-copy code, I did a lot of PCIe traffic
> analysis with the bus analyzer and also had many discussions with
> Alex, trying to explain what's going on (thanks to Alex for his
> time).
>
> Below (at the end of this note) are the results of some benchmarks
> that I did with various GPUs (all in the same host: Intel i7 CPU, X58
> chipset, three DRAM channels). To run the tests on your machine load
> the radeon module with 'benchmark=1 pcie_gen2=1' parameters. The most
> significant improvement is in the upstream (VRAM to GART) direction
> because that's where the PCIe transactions were fragmented and also
> where the memory access pattern was such that it created a lot of
> backpressure from the host.
>
> It is also interesting that high-end devices (e.g. Cayman) exhibit
> the least improvement and were the worst to begin with. This is
> because high-end devices copy more tiles in parallel which in turn
> can create bank conflicts on host memory and cause the host to do
> lots of bank-close/precharge/bank-open cycles.

Interesting stuff! Nice results showing the low-end devices completely
blowing away the high-end ones for VRAM->GTT blits :-).
I guess it isn't possible to temporarily disable some RBEs or otherwise
reconfigure the chip so that you could get the same performance for the
high-end chips? Granted, the high-end chips are only much slower for
VRAM->GTT according to these results, but even the other way it's still
~20% or so.
Anyway, I can't comment much on the patches, though the idea certainly
seems to make sense.

Roland

> As an added "bonus", I also did some code cleanup and consolidated
> the repeated code into a common function, so r600 and evergreen/NI
> parts now share the blit-copy code. I also expanded on the benchmark
> coverage, so the module now takes a benchmark parameter value between
> 1 and 8 and each results in running a different benchmark.
>
> For details, see the commit log messages and the code.
> I have been running with these patches for a few months (and I kept
> rebasing them to drm-core-next as the public git progressed) and I
> used them in a system setup that does *many* copies of this kind (and
> does them frequently); I have not seen instabilities introduced by
> these patches. I also verified the correctness of the copy using the
> test=1 parameter for each GPU that I had and the test passed.
>
> I would welcome some feedback and if you run the benchmarks with the
> new blit code, I would very much like to hear what kind of
> improvement you are seeing.
>
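
For readers following along, the segmentation idea from the quoted
cover letter (rectangles whose edge ratio stays between 1:1 and 1:2)
can be illustrated with a rough sketch. This is not the code from the
patches: the function name pick_rect, the notion of an abstract "copy
unit", and the max_dim limit are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the radeon driver's function): pick a
 * w x h rectangle covering at most n copy units, with h <= w <= 2*h
 * and both edges limited to max_dim. Returns the units covered.
 */
static unsigned pick_rect(unsigned n, unsigned max_dim,
                          unsigned *w, unsigned *h)
{
    unsigned height = 1, width;

    assert(max_dim >= 1);
    if (n == 0) {
        *w = *h = 0;
        return 0;
    }

    /* Grow the height in powers of two while a square still fits. */
    while (2 * height <= max_dim &&
           (uint64_t)(2 * height) * (2 * height) <= n)
        height *= 2;

    /* Widen up to twice the height, but never past n or max_dim. */
    width = n / height;
    if (width > 2 * height)
        width = 2 * height;
    if (width > max_dim)
        width = max_dim;

    *w = width;
    *h = height;
    return width * height;
}
```

A caller would invoke this in a loop, subtracting the returned count
from the remaining units until nothing is left; the tail of a copy then
degenerates into a few small rectangles. How the real patches map pages
to rectangle dimensions is described in their commit logs, not here.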