From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Harrison <John.C.Harrison@Intel.com>
Subject: Re: [PATCH 2/2] igt/gem_exec_nop: clarify & extend
 output from parallel execution test
Date: Mon, 22 Aug 2016 15:39:17 +0100
Message-ID: <f78c2fac-ec2e-e3ef-19b0-bbae864aa5f2@Intel.com>
References: <1470238607-34415-1-git-send-email-david.s.gordon@intel.com>
 <1470238607-34415-3-git-send-email-david.s.gordon@intel.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1478897141=="
Return-path: <intel-gfx-bounces@lists.freedesktop.org>
Received: from mga01.intel.com (mga01.intel.com [192.55.52.88])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 570426E4EC
 for <intel-gfx@lists.freedesktop.org>; Mon, 22 Aug 2016 14:39:20 +0000 (UTC)
In-Reply-To: <1470238607-34415-3-git-send-email-david.s.gordon@intel.com>
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/intel-gfx>
List-Post: <mailto:intel-gfx@lists.freedesktop.org>
List-Help: <mailto:intel-gfx-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/intel-gfx>,
 <mailto:intel-gfx-request@lists.freedesktop.org?subject=subscribe>
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
To: intel-gfx@lists.freedesktop.org
List-Id: intel-gfx@lists.freedesktop.org

This is a multi-part message in MIME format.
--===============1478897141==
Content-Type: multipart/alternative;
 boundary="------------24351E2E9204752AF132A9B9"

This is a multi-part message in MIME format.
--------------24351E2E9204752AF132A9B9
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

On 03/08/2016 16:36, Dave Gordon wrote:
> To make sense of the output of the parallel execution test (preferably
> without reading the source!), we need to see the various measurements
> that it makes, specifically: time/batch on each engine separately, total
> time across all engines sequentially, and the time/batch when the work
> is distributed over all engines in parallel.
>
> Since we know the per-batch time on the slowest engine (which will
> determine the minimum possible execution time of any equal-split
> parallel test), we can also calculate a new figure representing the
> degree to which work on the faster engines is overlapped with that on
> the slowest engine, and therefore does not contribute to the total time.
> Here we choose to present it as a percentage, with parallel-time==serial
> time giving 0% overlap, up to parallel-time==slowest-engine-
> time/n_engines being 100%. Note that negative values are possible;
> values greater than 100% may also be possible, although less likely.
>
> Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
> ---
>   tests/gem_exec_nop.c | 15 ++++++++++-----
>   1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
> index c2bd472..05aa383 100644
> --- a/tests/gem_exec_nop.c
> +++ b/tests/gem_exec_nop.c
> @@ -137,7 +137,9 @@ static void all(int fd, uint32_t handle, int timeout)
>   		if (ignore_engine(fd, engine))
>   			continue;
>   
> -		time = nop_on_ring(fd, handle, engine, 1, &count) / count;
> +		time = nop_on_ring(fd, handle, engine, 2, &count) / count;
> +		igt_info("%s: %'lu cycles: %.3fus/batch\n",
> +			 e__->name, count, time*1e6);
>   		if (time > max) {
>   			name = e__->name;
>   			max = time;
> @@ -148,8 +150,9 @@ static void all(int fd, uint32_t handle, int timeout)
>   		engines[nengine++] = engine;
>   	}
>   	igt_require(nengine);
> -	igt_info("Maximum execution latency on %s, %.3fus, total %.3fus per cycle\n",
> -		 name, max*1e6, sum*1e6);
> +	igt_info("Slowest engine was %s, %.3fus/batch\n", name, max*1e6);
> +	igt_info("Total for all %d engines is %.3fus per cycle, average %.3fus/batch\n",
> +		 nengine, sum*1e6, sum*1e6/nengine);
>   
>   	memset(&obj, 0, sizeof(obj));
>   	obj.handle = handle;
> @@ -187,8 +190,10 @@ static void all(int fd, uint32_t handle, int timeout)
>   	igt_assert_eq(intel_detect_and_clear_missed_interrupts(fd), 0);
>   
>   	time = elapsed(&start, &now) / count;
> -	igt_info("All (%d engines): %'lu cycles, average %.3fus per cycle\n",
> -		 nengine, count, 1e6*time);
> +	igt_info("All %d engines (parallel/%d): %'lu cycles, "
> +		 "average %.3fus/batch, overlap %.1f%\n",
> +		 nengine, BURST, count,
> +		 1e6*time, 100*(sum-time)/(sum-(max/nengine)));
>   
>   	/* The rate limiting step is how fast the slowest engine can
>   	 * its queue of requests, if we wait upon a full ring all dispatch

I'm not entirely convinced about the overlap calculation. The other info 
is definitely useful though.

Reviewed-by: John Harrison <john.c.harrison@intel.com>


--------------24351E2E9204752AF132A9B9
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 7bit

<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    On 03/08/2016 16:36, Dave Gordon wrote:<br>
    <blockquote
      cite="mid:1470238607-34415-3-git-send-email-david.s.gordon@intel.com"
      type="cite">
      <pre wrap="">To make sense of the output of the parallel execution test (preferably
without reading the source!), we need to see the various measurements
that it makes, specifically: time/batch on each engine separately, total
time across all engines sequentially, and the time/batch when the work
is distributed over all engines in parallel.

Since we know the per-batch time on the slowest engine (which will
determine the minimum possible execution time of any equal-split
parallel test), we can also calculate a new figure representing the
degree to which work on the faster engines is overlapped with that on
the slowest engine, and therefore does not contribute to the total time.
Here we choose to present it as a percentage, with parallel-time==serial
time giving 0% overlap, up to parallel-time==slowest-engine-
time/n_engines being 100%. Note that negative values are possible;
values greater than 100% may also be possible, although less likely.

Signed-off-by: Dave Gordon <a class="moz-txt-link-rfc2396E" href="mailto:david.s.gordon@intel.com">&lt;david.s.gordon@intel.com&gt;</a>
---
 tests/gem_exec_nop.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tests/gem_exec_nop.c b/tests/gem_exec_nop.c
index c2bd472..05aa383 100644
--- a/tests/gem_exec_nop.c
+++ b/tests/gem_exec_nop.c
@@ -137,7 +137,9 @@ static void all(int fd, uint32_t handle, int timeout)
 		if (ignore_engine(fd, engine))
 			continue;
 
-		time = nop_on_ring(fd, handle, engine, 1, &amp;count) / count;
+		time = nop_on_ring(fd, handle, engine, 2, &amp;count) / count;
+		igt_info("%s: %'lu cycles: %.3fus/batch\n",
+			 e__-&gt;name, count, time*1e6);
 		if (time &gt; max) {
 			name = e__-&gt;name;
 			max = time;
@@ -148,8 +150,9 @@ static void all(int fd, uint32_t handle, int timeout)
 		engines[nengine++] = engine;
 	}
 	igt_require(nengine);
-	igt_info("Maximum execution latency on %s, %.3fus, total %.3fus per cycle\n",
-		 name, max*1e6, sum*1e6);
+	igt_info("Slowest engine was %s, %.3fus/batch\n", name, max*1e6);
+	igt_info("Total for all %d engines is %.3fus per cycle, average %.3fus/batch\n",
+		 nengine, sum*1e6, sum*1e6/nengine);
 
 	memset(&amp;obj, 0, sizeof(obj));
 	obj.handle = handle;
@@ -187,8 +190,10 @@ static void all(int fd, uint32_t handle, int timeout)
 	igt_assert_eq(intel_detect_and_clear_missed_interrupts(fd), 0);
 
 	time = elapsed(&amp;start, &amp;now) / count;
-	igt_info("All (%d engines): %'lu cycles, average %.3fus per cycle\n",
-		 nengine, count, 1e6*time);
+	igt_info("All %d engines (parallel/%d): %'lu cycles, "
+		 "average %.3fus/batch, overlap %.1f%\n",
+		 nengine, BURST, count,
+		 1e6*time, 100*(sum-time)/(sum-(max/nengine)));
 
 	/* The rate limiting step is how fast the slowest engine can
 	 * its queue of requests, if we wait upon a full ring all dispatch
</pre>
    </blockquote>
    <br>
    I'm not entirely convinced about the overlap calculation. The other
    info is definitely useful though.<br>
    <br>
    Reviewed-by: John Harrison <a class="moz-txt-link-rfc2396E"
      href="mailto:john.c.harrison@intel.com">&lt;john.c.harrison@intel.com&gt;</a><br>
    <br>
  </body>
</html>

--------------24351E2E9204752AF132A9B9--

--===============1478897141==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4
IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vaW50ZWwtZ2Z4Cg==

--===============1478897141==--