From mboxrd@z Thu Jan  1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 103804] igt/benchmark/gem_exec_nop does not permit to select
 execution ring
Date: Fri, 17 Nov 2017 22:30:37 +0000
Message-ID: <bug-103804-502@http.bugs.freedesktop.org/>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0529881261=="
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
 [131.252.210.165])
 by gabe.freedesktop.org (Postfix) with ESMTP id 6D5A16E306
 for <dri-devel@lists.freedesktop.org>; Fri, 17 Nov 2017 22:30:37 +0000 (UTC)
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org


--===============0529881261==
Content-Type: multipart/alternative; boundary="15109578370.28EBA8eb.23295";
 charset="UTF-8"


--15109578370.28EBA8eb.23295
Date: Fri, 17 Nov 2017 22:30:37 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

https://bugs.freedesktop.org/show_bug.cgi?id=3D103804

            Bug ID: 103804
           Summary: igt/benchmark/gem_exec_nop does not permit to select
                    execution ring
           Product: DRI
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: IGT
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: dmitry.v.rogozhkin@intel.com

Judging by its code, igt/benchmark/gem_exec_nop should let the user select
the ring to load. However, this option has no effect. For example,
assuming the i915 PMU patches
https://patchwork.freedesktop.org/series/29735/ are applied to the kernel,
try:

# perf stat -e
i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-=
busy/
-a ./gem_exec_nop -e rcs
  4.433

 Performance counter stats for 'system wide':

     2,002,891,967 ns   i915/rcs0-busy/
           280,244 ns   i915/vcs0-busy/
           118,222 ns   i915/vcs1-busy/
           361,440 ns   i915/vecs0-busy/
           365,253 ns   i915/bcs0-busy/

       3.033127723 seconds time elapsed

# perf stat -e
i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-=
busy/
-a ./gem_exec_nop -e vcs
  4.531

 Performance counter stats for 'system wide':

     2,005,028,005 ns   i915/rcs0-busy/
           304,735 ns   i915/vcs0-busy/
           100,476 ns   i915/vcs1-busy/
           348,364 ns   i915/vecs0-busy/
           383,365 ns   i915/bcs0-busy/

       3.048972240 seconds time elapsed

# perf stat -e
i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-=
busy/
-a ./gem_exec_nop -e bcs
  4.548

 Performance counter stats for 'system wide':

     2,003,302,067 ns   i915/rcs0-busy/
           229,991 ns   i915/vcs0-busy/
            50,410 ns   i915/vcs1-busy/
           249,257 ns   i915/vecs0-busy/
           267,072 ns   i915/bcs0-busy/

       3.050740036 seconds time elapsed

# perf stat -e
i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-=
busy/
-a ./gem_exec_nop -e vecs
  4.547

 Performance counter stats for 'system wide':

     2,002,918,507 ns   i915/rcs0-busy/
           251,940 ns   i915/vcs0-busy/
           134,314 ns   i915/vcs1-busy/
           345,163 ns   i915/vecs0-busy/
           366,121 ns   i915/bcs0-busy/

       3.054508956 seconds time elapsed

# perf stat -e
i915/rcs0-busy/,i915/vcs0-busy/,i915/vcs1-busy/,i915/vecs0-busy/,i915/bcs0-=
busy/
-a ./gem_exec_nop -e all
  4.488

 Performance counter stats for 'system wide':

     2,004,461,103 ns   i915/rcs0-busy/
           194,267 ns   i915/vcs0-busy/
           104,581 ns   i915/vcs1-busy/
           306,019 ns   i915/vecs0-busy/
           291,113 ns   i915/bcs0-busy/

       3.061850018 seconds time elapsed

As you can see, the load always goes to rcs0, whichever ring is requested.
The cause seems to be this commit:

commit 05ca171aa9a6902614241f9685de2f62f30126d8
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jun 3 10:43:09 2016 +0100

    benchmarks/gem_exec_nop: Extend submission to check write inter-engine =
sync

    Currently, we look at the throughput for submitting a read batch to a
    single engine or any. The kernel optimises for this by allowing multiple
    engine to read at the same time, but writes are exclusive to a single
    engine. So lets try to measure the impact of inserting the barriers
    between writes on different engines.

    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

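which overwrites the ring parameter of the loop() function. The engine
probe reuses ring as its counter, so the requested value is lost before the
check against -1 below: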

static int loop(unsigned ring, int reps, int ncpus, unsigned flags) {

        all_nengine =3D 0;
        for (ring =3D 1; ring < 16; ring++) {
                execbuf.flags &=3D ~ENGINE_FLAGS;
                execbuf.flags |=3D ring;
                if (__gem_execbuf(fd, &execbuf) =3D=3D 0)
                        all_engines[all_nengine++] =3D ring;
        }

        if (ring =3D=3D -1) {
                nengine =3D all_nengine;
                memcpy(engines, all_engines, all_nengine*sizeof(engines[0])=
);
        } else {
                nengine =3D 1;
                engines[0] =3D ring;
        }

}
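
One possible fix, sketched here under stated assumptions, is to give the
probe loop its own counter so that the ring argument is still intact when
it is tested afterwards. probe_engine(), select_engines(), ALL_ENGINES and
MAX_ENGINES are hypothetical names used only for illustration, and
probe_engine() merely stands in for the __gem_execbuf() call. This is not
the actual IGT code.

```c
#include <assert.h>
#include <string.h>

#define ALL_ENGINES ((unsigned)-1)  /* stands in for ring == -1 */
#define MAX_ENGINES 16

/* Hypothetical stand-in for __gem_execbuf(): pretend that engine
 * ids 1..4 exist and every other id is rejected. */
static int probe_engine(unsigned id)
{
	return (id >= 1 && id <= 4) ? 0 : -1;
}

/* Fill engines[] (room for MAX_ENGINES entries), return the count. */
static unsigned select_engines(unsigned ring, unsigned *engines)
{
	unsigned all_engines[MAX_ENGINES];
	unsigned all_nengine = 0;
	unsigned id;  /* separate counter: 'ring' is left untouched */

	assert(engines != NULL);

	for (id = 1; id < MAX_ENGINES; id++) {
		if (probe_engine(id) == 0)
			all_engines[all_nengine++] = id;
	}

	if (ring == ALL_ENGINES) {
		memcpy(engines, all_engines,
		       all_nengine * sizeof(engines[0]));
		return all_nengine;
	}

	engines[0] = ring;  /* the caller's choice survives the probe */
	return 1;
}
```

In this sketch, select_engines(2, engines) returns 1 with engines[0] set
to 2, while passing ALL_ENGINES returns every id the probe accepted.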


--15109578370.28EBA8eb.23295--

--===============0529881261==--