From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 103804] igt/benchmark/gem_exec_nop does not permit to select
execution ring
Date: Fri, 17 Nov 2017 22:30:37 +0000
Message-ID:
Bug ID: 103804
Summary: igt/benchmark/gem_exec_nop does not permit to select execution ring
Product: DRI
Version: unspecified
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: IGT
Assignee: dri-devel@lists.freedesktop.org
Reporter: dmitry.v.rogozhkin@intel.com

Judging from the code, igt/benchmark/gem_exec_nop should allow selecting the
ring (engine) to load. However, this feature does not work. For example,
assuming that the i915 PMU patches https://patchwork.freedesktop.org/series/29735/
are applied to the kernel, try:
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ \
    -a ./gem_exec_nop -e rcs
4.433
Performance counter stats for 'system wide':
2,002,891,967 ns i915/rcs0-busy/
280,244 ns i915/vcs0-busy/
118,222 ns i915/vcs1-busy/
361,440 ns i915/vecs0-busy/
365,253 ns i915/bcs0-busy/
3.033127723 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ \
    -a ./gem_exec_nop -e vcs
4.531
Performance counter stats for 'system wide':
2,005,028,005 ns i915/rcs0-busy/
304,735 ns i915/vcs0-busy/
100,476 ns i915/vcs1-busy/
348,364 ns i915/vecs0-busy/
383,365 ns i915/bcs0-busy/
3.048972240 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ \
    -a ./gem_exec_nop -e bcs
4.548
Performance counter stats for 'system wide':
2,003,302,067 ns i915/rcs0-busy/
229,991 ns i915/vcs0-busy/
50,410 ns i915/vcs1-busy/
249,257 ns i915/vecs0-busy/
267,072 ns i915/bcs0-busy/
3.050740036 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ \
    -a ./gem_exec_nop -e vecs
4.547
Performance counter stats for 'system wide':
2,002,918,507 ns i915/rcs0-busy/
251,940 ns i915/vcs0-busy/
134,314 ns i915/vcs1-busy/
345,163 ns i915/vecs0-busy/
366,121 ns i915/bcs0-busy/
3.054508956 seconds time elapsed
# perf stat -e i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-busy/ \
    -a ./gem_exec_nop -e all
4.488
Performance counter stats for 'system wide':
2,004,461,103 ns i915/rcs0-busy/
194,267 ns i915/vcs0-busy/
104,581 ns i915/vcs1-busy/
306,019 ns i915/vecs0-busy/
291,113 ns i915/bcs0-busy/
3.061850018 seconds time elapsed
As the counters show, the load always goes to rcs0, whichever engine is
requested. The cause appears to be this commit:
commit 05ca171aa9a6902614241f9685de2f62f30126d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Fri Jun 3 10:43:09 2016 +0100
benchmarks/gem_exec_nop: Extend submission to check write inter-engine sync
Currently, we look at the throughput for submitting a read batch to a
single engine or any. The kernel optimises for this by allowing multiple
engine to read at the same time, but writes are exclusive to a single
engine. So lets try to measure the impact of inserting the barriers
between writes on different engines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
which clobbers the ring parameter of the loop function by reusing it as the
loop variable while probing for available engines:
static int loop(unsigned ring, int reps, int ncpus, unsigned flags) {
    all_nengine = 0;
    for (ring = 1; ring < 16; ring++) {
        execbuf.flags &= ~ENGINE_FLAGS;
        execbuf.flags |= ring;
        if (__gem_execbuf(fd, &execbuf) == 0)
            all_engines[all_nengine++] = ring;
    }
    if (ring == -1) {
        nengine = all_nengine;
        memcpy(engines, all_engines, all_nengine*sizeof(engines[0]));
    } else {
        nengine = 1;
        engines[0] = ring;
    }
}
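
One possible way to avoid the problem is to probe with a separate loop variable, so that the caller's ring value survives until the selection step. The following is only a minimal standalone sketch of that idea, not the actual IGT fix: fake_execbuf(), select_engines(), MAX_ENGINES and ALL_ENGINES are hypothetical names introduced for illustration, and the stub simply pretends that engine ids 1 to 4 exist instead of calling __gem_execbuf().

```c
#include <assert.h>
#include <string.h>

#define MAX_ENGINES 16
#define ALL_ENGINES ((unsigned)-1) /* hypothetical sentinel meaning "all engines" */

/*
 * Hypothetical stand-in for __gem_execbuf(): pretends that engine ids
 * 1..4 exist. Returns 0 on success and -1 otherwise.
 */
static int fake_execbuf(unsigned engine)
{
    return (engine >= 1 && engine <= 4) ? 0 : -1;
}

/*
 * Fills engines[] according to the requested ring and returns the
 * number of entries written. The probe loop uses its own variable,
 * so 'ring' still holds the caller's request afterwards.
 */
static unsigned select_engines(unsigned ring, unsigned engines[MAX_ENGINES])
{
    unsigned all_engines[MAX_ENGINES];
    unsigned all_nengine = 0;
    unsigned probe; /* separate loop variable, 'ring' is left untouched */

    for (probe = 1; probe < MAX_ENGINES; probe++) {
        if (fake_execbuf(probe) == 0)
            all_engines[all_nengine++] = probe;
    }
    assert(all_nengine <= MAX_ENGINES);

    if (ring == ALL_ENGINES) {
        /* Every engine that accepted the probe is used. */
        memcpy(engines, all_engines, all_nengine * sizeof(engines[0]));
        return all_nengine;
    }

    /* Only the explicitly requested engine is used. */
    engines[0] = ring;
    return 1;
}
```

With this stub, select_engines(2, engines) stores only engine 2, while select_engines(ALL_ENGINES, engines) stores the four engines that the stub reports as present.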