From mboxrd@z Thu Jan  1 00:00:00 1970
From: Roland Scheidegger <rscheidegger_lists@hispeed.ch>
Subject: Re: drm/radeon/kms: improve performance of blit-copy
Date: Thu, 13 Oct 2011 21:00:02 +0200
Message-ID: <4E973532.9040503@hispeed.ch>
References: <1318476582-8365-1-git-send-email-ihadzic@research.bell-labs.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from fep32.mx.upcmail.net (fep32.mx.upcmail.net [62.179.121.50])
	by gabe.freedesktop.org (Postfix) with ESMTP id D6BAD9E87A
	for <dri-devel@lists.freedesktop.org>;
	Thu, 13 Oct 2011 12:34:26 -0700 (PDT)
In-Reply-To: <1318476582-8365-1-git-send-email-ihadzic@research.bell-labs.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

Am 13.10.2011 05:29, schrieb Ilija Hadzic:
> 
> The following set of patches will improve the performance of
> blit-copy functions for Radeon GPUs based on R600, R700, Evergreen
> and NI ASICs.
> 
> The foundation for improvement is the use of tiled mode access (which
> for copying bo's can be used regardless of whether the content is
> tiled or not), and segmenting the memory block being copied into
> rectangles whose edge ratio is between 1:1 and 1:2. This maximizes
> the number of PCIe transactions that use maximum payload size
> (typically 128 bytes) and also creates a memory access pattern that
> is more favorable for both VRAM and host DRAM than what's currently
> in the kernel.
> 
> To come up with the new blit-copy code, I did a lot of PCIe traffic
> analysis with the bus analyzer and also had many discussions with
> Alex, trying to explain what's going on (thanks to Alex for his
> time).
> 
> Below (at the end of this note) are the results of some benchmarks 
> that I did with various GPUs (all in the same host: Intel i7 CPU, X58
> chipset, three DRAM channels). To run the tests on your machine load
> the radeon module with 'benchmark=1 pcie_gen2=1' parameters. Most
> significant improvement is in the upstream (VRAM to GART) direction
> because that's where the PCIe transactions were fragmented and also
> where memory access pattern was such that it created a lot of 
> backpressure from the host.
> 
> It is also interesting that high-end devices (e.g. Cayman) exhibit 
> the least improvement and were the worst to begin with. This is 
> because high-end devices copy more tiles in parallel which in turn
> can create bank conflicts on host memory and cause the host to do
> lots of bank-close/precharge/bank-open cycles.
Interesting stuff! Nice results showing the low-end devices completely
blowing away the high-end ones for VRAM->GTT blits :-).
I guess it isn't possible to temporarily disable some RBEs or otherwise
reconfigure the chip that you could get the same performance for the
high-end chips? Granted the high-end chips are only much slower for
VRAM->GTT according to these results but even the other way it's still
~20% or so.
Anyway, can't comment much on the patches, though the idea certainly
seems to make sense.

Roland


> As an added "bonus", I also did some code cleanup and consolidated 
> the repeated code into common function, so r600 and evergreen/NI 
> parts now share the blit-copy code. I also expanded on the benchmark
> coverage, so the module now takes benckmark parameter value between 1
> and 8 and each results in running a different benchmark.
> 
> For details, see the commit log messages and the code. I have been
> running with these patches for a few months (and I kept rebasing them
> to drm-core-next as the public git progressed) and I used them in a
> system setup that does *many* copying of this kind (and does them
> frequently); I have not seen instabilities introduced by these
> patches. I also verified the correctness of the copy using test=1
> parameter for each GPU that I had and the test passed.
> 
> I would welcome some feedback and if you run the benchmarks with the
> new blit code, I would very much like to hear what kind of
> improvement you are seeing.
>