* drm/radeon/kms: improve performance of blit-copy
From: Ilija Hadzic @ 2011-10-13  3:29 UTC
  To: airlied, dri-devel


The following set of patches will improve the performance
of blit-copy functions for Radeon GPUs based on 
R600, R700, Evergreen and NI ASICs.

The foundation for improvement is the use of tiled mode access
(which for copying bo's can be used regardless of whether the
content is tiled or not), and segmenting the memory block
being copied into rectangles whose edge ratio is between 1:1
and 1:2. This maximizes the number of PCIe transactions that
use maximum payload size (typically 128 bytes) and also 
creates a memory access pattern that is more favorable for
both VRAM and host DRAM than what's currently in the kernel.
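
As a rough illustration of the segmentation idea only (this is not the
code from the patches), the sketch below greedily splits a block into
rectangles whose edge ratio stays between 1:1 and 1:2. The function
name split_into_rectangles and the abstract "units" are hypothetical;
the real code works in GPU pages and hardware tile dimensions.

```python
def split_into_rectangles(total_units):
    """Greedily split total_units into (width, height) rectangles.

    Hypothetical sketch: each rectangle has a power-of-two height h
    and a width w with h <= w <= 2 * h, so the edge ratio stays
    between 1:1 and 1:2. The areas add up to total_units.
    """
    if total_units < 0:
        raise ValueError("total_units must be non-negative")

    rects = []
    remaining = total_units
    while remaining > 0:
        # Largest power-of-two height whose square still fits.
        h = 1
        while (2 * h) * (2 * h) <= remaining:
            h *= 2
        # remaining >= h * h, so the width is at least h; cap it at 2 * h.
        w = min(remaining // h, 2 * h)
        rects.append((w, h))
        remaining -= w * h
    return rects


print(split_into_rectangles(1000))  # [(32, 16), (30, 16), (4, 2)]
```

The greedy choice takes the largest rectangle first, so most of the
block is covered by a few large, nearly square pieces and only a small
remainder is left for smaller ones.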

To come up with the new blit-copy code, I did a lot of 
PCIe traffic analysis with the bus analyzer and also 
had many discussions with Alex, trying to explain what's 
going on (thanks to Alex for his time).

Below (at the end of this note) are the results of some benchmarks
that I did with various GPUs (all in the same host: Intel i7 CPU,
X58 chipset, three DRAM channels). To run the tests on your machine,
load the radeon module with the 'benchmark=1 pcie_gen2=1' parameters.
The most significant improvement is in the upstream (VRAM to GART)
direction because that's where the PCIe transactions were fragmented
and also where the memory access pattern created a lot of
backpressure from the host.

It is also interesting that high-end devices (e.g. Cayman) exhibit
the least improvement and were the worst to begin with. This is
because high-end devices copy more tiles in parallel which 
in turn can create bank conflicts on host memory and cause the
host to do lots of bank-close/precharge/bank-open cycles. 

As an added "bonus", I also did some code cleanup and consolidated
the repeated code into a common function, so the r600 and evergreen/NI
parts now share the blit-copy code. I also expanded the
benchmark coverage, so the module now takes a benchmark parameter
value between 1 and 8, and each value runs a different
benchmark.

For details, see the commit log messages and the code.
I have been running with these patches for a few months
(and I kept rebasing them to drm-core-next as the public
git progressed) and I used them in a system setup that does
*many* copies of this kind (and does them frequently); I
have not seen instabilities introduced by these patches. I also
verified the correctness of the copy using the test=1 parameter
for each GPU that I had and the test passed.

I would welcome some feedback and if you run the benchmarks
with the new blit code, I would very much like to hear
what kind of improvement you are seeing.


BENCHMARK RESULTS:
==================

1) VRAM to GTT 
==============

Card (ASIC)	VRAM		Before	After
---------------------------------------------
5570 (Redwood)	DDR3 1600MHz	 454	3912
6450 (Caicos)	DDR5 3200MHz	3718	5090
6570 (Turks)	DDR3 1800MHz	 484	4144
5450 (Cedar)	DDR3 1600MHz	3679	5090
5450 (Cedar)	DDR2  800MHz	2695	4639
E4690 (RV730)	DDR3 1400MHz	 485	4969
E6760 (Turks)	DDR5 3200MHz	 474	4177
V5700 (RV730)	DDR3 ????MHz	 488	4297
2260 (RV620)	DDR2 ????MHz	 494	3093
6870 (Barts)	DDR5 4200MHz	 475	1113
6970 (Cayman)	DDR5 4200MHz	 473	 710

2) GTT to VRAM
==============

Card (ASIC)	VRAM		Before	After
---------------------------------------------
5570 (Redwood)	DDR3 1600MHz	3158	3360
6450 (Caicos)	DDR5 3200MHz	2995	3393
6570 (Turks)	DDR3 1800MHz	3039	3339
5450 (Cedar)	DDR3 1600MHz	3246	3404
5450 (Cedar)	DDR2  800MHz	2614	3371
E4690 (RV730)	DDR3 1400MHz	3084	3426
E6760 (Turks)	DDR5 3200MHz	2443	2570
V5700 (RV730)	DDR3 ????MHz	3187	3506
2260 (RV620)	DDR2 ????MHz	 584	3246
6870 (Barts)	DDR5 4200MHz	2472	2601
6970 (Cayman)	DDR5 4200MHz	2460	2737
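
To make the relative improvement easier to read, the sketch below
computes the After/Before ratio for each row of the two tables above.
The numbers are copied from the tables; the names VRAM_TO_GTT,
GTT_TO_VRAM and speedups are hypothetical and exist only for this
illustration.

```python
# (card, VRAM type, before, after), copied from the tables above.
VRAM_TO_GTT = [
    ("5570 (Redwood)", "DDR3 1600MHz", 454, 3912),
    ("6450 (Caicos)", "DDR5 3200MHz", 3718, 5090),
    ("6570 (Turks)", "DDR3 1800MHz", 484, 4144),
    ("5450 (Cedar)", "DDR3 1600MHz", 3679, 5090),
    ("5450 (Cedar)", "DDR2  800MHz", 2695, 4639),
    ("E4690 (RV730)", "DDR3 1400MHz", 485, 4969),
    ("E6760 (Turks)", "DDR5 3200MHz", 474, 4177),
    ("V5700 (RV730)", "DDR3 ????MHz", 488, 4297),
    ("2260 (RV620)", "DDR2 ????MHz", 494, 3093),
    ("6870 (Barts)", "DDR5 4200MHz", 475, 1113),
    ("6970 (Cayman)", "DDR5 4200MHz", 473, 710),
]

GTT_TO_VRAM = [
    ("5570 (Redwood)", "DDR3 1600MHz", 3158, 3360),
    ("6450 (Caicos)", "DDR5 3200MHz", 2995, 3393),
    ("6570 (Turks)", "DDR3 1800MHz", 3039, 3339),
    ("5450 (Cedar)", "DDR3 1600MHz", 3246, 3404),
    ("5450 (Cedar)", "DDR2  800MHz", 2614, 3371),
    ("E4690 (RV730)", "DDR3 1400MHz", 3084, 3426),
    ("E6760 (Turks)", "DDR5 3200MHz", 2443, 2570),
    ("V5700 (RV730)", "DDR3 ????MHz", 3187, 3506),
    ("2260 (RV620)", "DDR2 ????MHz", 584, 3246),
    ("6870 (Barts)", "DDR5 4200MHz", 2472, 2601),
    ("6970 (Cayman)", "DDR5 4200MHz", 2460, 2737),
]


def speedups(rows):
    """Return (card, vram, after / before) for each table row."""
    return [(card, vram, after / before) for card, vram, before, after in rows]


for title, rows in (("VRAM to GTT", VRAM_TO_GTT), ("GTT to VRAM", GTT_TO_VRAM)):
    print(title)
    for card, vram, ratio in speedups(rows):
        print(f"  {card:16s} {vram:14s} {ratio:5.2f}x")
```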


Thread overview: 13+ messages
2011-10-13  3:29 drm/radeon/kms: improve performance of blit-copy Ilija Hadzic
2011-10-13  3:29 ` [PATCH 1/9] drm/radeon/kms: improve evergreen blit code Ilija Hadzic
2011-10-13  3:29 ` [PATCH 2/9] drm/radeon/kms: improve r6xx " Ilija Hadzic
2011-10-13  3:29 ` [PATCH 3/9] drm/radeon/kms: demystify evergreen " Ilija Hadzic
2011-10-13  3:29 ` [PATCH 4/9] drm/radeon/kms: demystify r600 " Ilija Hadzic
2011-10-13  3:29 ` [PATCH 5/9] drm/radeon/kms: cleanup benchmark code Ilija Hadzic
2011-10-13  3:29 ` [PATCH 6/9] drm/radeon/kms: add more elaborate benchmarks Ilija Hadzic
2011-10-13  3:29 ` [PATCH 7/9] drm/radeon/kms: cleanup r600 blit code Ilija Hadzic
2011-10-13  3:29 ` [PATCH 8/9] drm/radeon/kms: blit code commoning Ilija Hadzic
2011-10-13  3:29 ` [PATCH 9/9] drm/radeon/kms: rename a variable for consistency Ilija Hadzic
2011-10-13 18:21 ` drm/radeon/kms: improve performance of blit-copy Ilija Hadzic
2011-10-13 19:00 ` Roland Scheidegger
2011-10-13 19:18   ` Ilija Hadzic