[igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements

Igt-dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements
@ 2018-02-01 12:47 Tvrtko Ursulin
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement Tvrtko Ursulin
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 12:47 UTC (permalink / raw)
  To: igt-dev; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some reworks to hopefully improve the reliability of tests on small core
machines.

Mostly ensuring measured sleep approach is always used in tests which look at
time, and that time spent setting up the test is ignored as much as possible.

Tvrtko Ursulin (3):
  tests/perf_pmu: Tighten busy measurement
  tests/perf_pmu: More busy measurement tightening
  tests/perf_pmu: Use measured sleep in all time based tests

 tests/perf_pmu.c | 102 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 60 insertions(+), 42 deletions(-)

-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
@ 2018-02-01 12:47 ` Tvrtko Ursulin
  2018-02-01 12:57   ` Chris Wilson
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening Tvrtko Ursulin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 12:47 UTC (permalink / raw)
  To: igt-dev; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

In cases where we manually terminate the busy batch, we always want to
sample busyness while the batch is running, just before we will
terminate it, and not the other way around. This way we make the window
for unwated idleness getting sampled smaller.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index 2f7d33414a53..bf16e5e8b1f9 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
 		spin = NULL;
 
 	slept = measured_usleep(batch_duration_ns / 1000);
-	igt_spin_batch_end(spin);
-
 	val = pmu_read_single(fd);
 
+	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd);
 
@@ -256,7 +255,7 @@ busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
 	gem_quiescent_gpu(gem_fd);
 }
 
-static void log_busy(int fd, unsigned int num_engines, uint64_t *val)
+static void log_busy(unsigned int num_engines, uint64_t *val)
 {
 	char buf[1024];
 	int rem = sizeof(buf);
@@ -303,14 +302,14 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 
 	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
 	slept = measured_usleep(batch_duration_ns / 1000);
-	igt_spin_batch_end(spin);
-
 	pmu_read_multi(fd[0], num_engines, val);
-	log_busy(fd[0], num_engines, val);
 
+	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	log_busy(num_engines, val);
+
 	assert_within_epsilon(val[busy_idx], slept, tolerance);
 	for (i = 0; i < num_engines; i++) {
 		if (i == busy_idx)
@@ -364,14 +363,14 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 		fd[i] = open_group(val[i], fd[0]);
 
 	slept = measured_usleep(batch_duration_ns / 1000);
-	igt_spin_batch_end(spin);
-
 	pmu_read_multi(fd[0], num_engines, val);
-	log_busy(fd[0], num_engines, val);
 
+	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	log_busy(num_engines, val);
+
 	for (i = 0; i < num_engines; i++) {
 		if (i == idle_idx)
 			assert_within_epsilon(val[i], 0.0f, tolerance);
@@ -420,14 +419,14 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
 		fd[i] = open_group(val[i], fd[0]);
 
 	slept = measured_usleep(batch_duration_ns / 1000);
-	igt_spin_batch_end(spin);
-
 	pmu_read_multi(fd[0], num_engines, val);
-	log_busy(fd[0], num_engines, val);
 
+	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	log_busy(num_engines, val);
+
 	for (i = 0; i < num_engines; i++)
 		assert_within_epsilon(val[i], slept, tolerance);
 	gem_quiescent_gpu(gem_fd);
@@ -903,12 +902,11 @@ static void cpu_hotplug(int gem_fd)
 
 	igt_waitchildren();
 
-	igt_spin_batch_end(spin);
-	gem_sync(gem_fd, spin->handle);
-
 	ref = igt_nsec_elapsed(&start);
 	val = pmu_read_single(fd);
 
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd);
 
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening
  2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement Tvrtko Ursulin
@ 2018-02-01 12:47 ` Tvrtko Ursulin
  2018-02-01 12:59   ` Chris Wilson
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests Tvrtko Ursulin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 12:47 UTC (permalink / raw)
  To: igt-dev; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Where we use measured sleeps, take PMU samples immediately before and
after and look at their delta in order to minimize the effect of any
test setup delays.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index bf16e5e8b1f9..bdf452c8347c 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -145,8 +145,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
 	else
 		spin = NULL;
 
-	slept = measured_usleep(batch_duration_ns / 1000);
 	val = pmu_read_single(fd);
+	slept = measured_usleep(batch_duration_ns / 1000);
+	val = pmu_read_single(fd) - val;
 
 	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
@@ -180,8 +181,9 @@ busy_start(int gem_fd, const struct intel_execution_engine2 *e)
 
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
-	slept = measured_usleep(batch_duration_ns / 1000);
 	val = pmu_read_single(fd);
+	slept = measured_usleep(batch_duration_ns / 1000);
+	val = pmu_read_single(fd) - val;
 
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd);
@@ -227,8 +229,9 @@ busy_double_start(int gem_fd, const struct intel_execution_engine2 *e)
 	 */
 	fd = open_pmu(I915_PMU_ENGINE_BUSY(e->class, e->instance));
 
-	slept = measured_usleep(batch_duration_ns / 1000);
 	val = pmu_read_single(fd);
+	slept = measured_usleep(batch_duration_ns / 1000);
+	val = pmu_read_single(fd) - val;
 
 	igt_spin_batch_end(spin[0]);
 	igt_spin_batch_end(spin[1]);
@@ -279,6 +282,7 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	       const unsigned int num_engines)
 {
 	const struct intel_execution_engine2 *e_;
+	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
 	int fd[num_engines];
 	unsigned long slept;
@@ -301,13 +305,17 @@ busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	igt_assert_eq(i, num_engines);
 
 	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
+	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
-	pmu_read_multi(fd[0], num_engines, val);
+	pmu_read_multi(fd[0], num_engines, tval[1]);
 
 	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	for (i = 0; i < num_engines; i++)
+		val[i] = tval[1][i] - tval[0][i];
+
 	log_busy(num_engines, val);
 
 	assert_within_epsilon(val[busy_idx], slept, tolerance);
@@ -324,6 +332,7 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 		    const unsigned int num_engines)
 {
 	const struct intel_execution_engine2 *e_;
+	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
 	int fd[num_engines];
 	unsigned long slept;
@@ -362,13 +371,17 @@ most_busy_check_all(int gem_fd, const struct intel_execution_engine2 *e,
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
-	pmu_read_multi(fd[0], num_engines, val);
+	pmu_read_multi(fd[0], num_engines, tval[1]);
 
 	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	for (i = 0; i < num_engines; i++)
+		val[i] = tval[1][i] - tval[0][i];
+
 	log_busy(num_engines, val);
 
 	for (i = 0; i < num_engines; i++) {
@@ -384,6 +397,7 @@ static void
 all_busy_check_all(int gem_fd, const unsigned int num_engines)
 {
 	const struct intel_execution_engine2 *e;
+	uint64_t tval[2][num_engines];
 	uint64_t val[num_engines];
 	int fd[num_engines];
 	unsigned long slept;
@@ -418,13 +432,17 @@ all_busy_check_all(int gem_fd, const unsigned int num_engines)
 	for (i = 0; i < num_engines; i++)
 		fd[i] = open_group(val[i], fd[0]);
 
+	pmu_read_multi(fd[0], num_engines, tval[0]);
 	slept = measured_usleep(batch_duration_ns / 1000);
-	pmu_read_multi(fd[0], num_engines, val);
+	pmu_read_multi(fd[0], num_engines, tval[1]);
 
 	igt_spin_batch_end(spin);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
+	for (i = 0; i < num_engines; i++)
+		val[i] = tval[1][i] - tval[0][i];
+
 	log_busy(num_engines, val);
 
 	for (i = 0; i < num_engines; i++)
@@ -765,13 +783,15 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
 	igt_spin_batch_set_timeout(spin, 2 * batch_duration_ns);
 
-	slept = measured_usleep(batch_duration_ns / 1000);
+	val[0] = pmu_read_single(fd[0]);
 	val[1] = pmu_read_single(fd[1]);
+	slept = measured_usleep(batch_duration_ns / 1000);
+	val[1] = pmu_read_single(fd[1]) - val[1];
 	close(fd[1]);
 
 	gem_sync(gem_fd, spin->handle);
 
-	val[0] = pmu_read_single(fd[0]);
+	val[0] = pmu_read_single(fd[0]) - val[0];
 
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests
  2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement Tvrtko Ursulin
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening Tvrtko Ursulin
@ 2018-02-01 12:47 ` Tvrtko Ursulin
  2018-02-01 17:37   ` Chris Wilson
  2018-02-01 13:22 ` [igt-dev] ✓ Fi.CI.BAT: success for perf_pmu reliability improvements Patchwork
  2018-02-01 16:38 ` [igt-dev] ✗ Fi.CI.IGT: warning " Patchwork
  4 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 12:47 UTC (permalink / raw)
  To: igt-dev; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Stop relying on timers to end spin batches but use measured sleep, which
was established to work better, in all time based tests.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 tests/perf_pmu.c | 42 +++++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
index bdf452c8347c..e27e74abe686 100644
--- a/tests/perf_pmu.c
+++ b/tests/perf_pmu.c
@@ -454,30 +454,31 @@ static void
 no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
 {
 	igt_spin_t *spin;
-	uint64_t val[2];
+	uint64_t val[2][2];
 	int fd;
 
 	fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
 	open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
 
-	if (busy) {
+	if (busy)
 		spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-		igt_spin_batch_set_timeout(spin, batch_duration_ns);
-	} else {
-		usleep(batch_duration_ns / 1000);
-	}
 
-	if (busy)
-		gem_sync(gem_fd, spin->handle);
+	pmu_read_multi(fd, 2, val[0]);
+	measured_usleep(batch_duration_ns / 1000);
+	pmu_read_multi(fd, 2, val[1]);
 
-	pmu_read_multi(fd, 2, val);
+	val[0][0] = val[1][0] - val[0][0];
+	val[0][1] = val[1][1] - val[0][1];
 
-	if (busy)
+	if (busy) {
+		igt_spin_batch_end(spin);
+		gem_sync(gem_fd, spin->handle);
 		igt_spin_batch_free(gem_fd, spin);
+	}
 	close(fd);
 
-	assert_within_epsilon(val[0], 0.0f, tolerance);
-	assert_within_epsilon(val[1], 0.0f, tolerance);
+	assert_within_epsilon(val[0][0], 0.0f, tolerance);
+	assert_within_epsilon(val[0][1], 0.0f, tolerance);
 }
 
 #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
@@ -766,7 +767,7 @@ static void
 multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 {
 	uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
-	unsigned int slept;
+	unsigned int slept[2];
 	igt_spin_t *spin;
 	uint64_t val[2];
 	int fd[2];
@@ -781,23 +782,22 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
 	fd[1] = open_pmu(config);
 
 	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
-	igt_spin_batch_set_timeout(spin, 2 * batch_duration_ns);
 
-	val[0] = pmu_read_single(fd[0]);
-	val[1] = pmu_read_single(fd[1]);
-	slept = measured_usleep(batch_duration_ns / 1000);
+	val[0] = val[1] = pmu_read_single(fd[0]);
+	slept[1] = measured_usleep(batch_duration_ns / 1000);
 	val[1] = pmu_read_single(fd[1]) - val[1];
 	close(fd[1]);
 
-	gem_sync(gem_fd, spin->handle);
-
+	slept[0] = measured_usleep(batch_duration_ns / 1000) + slept[1];
 	val[0] = pmu_read_single(fd[0]) - val[0];
 
+	igt_spin_batch_end(spin);
+	gem_sync(gem_fd, spin->handle);
 	igt_spin_batch_free(gem_fd, spin);
 	close(fd[0]);
 
-	assert_within_epsilon(val[0], 2 * batch_duration_ns, tolerance);
-	assert_within_epsilon(val[1], slept, tolerance);
+	assert_within_epsilon(val[0], slept[0], tolerance);
+	assert_within_epsilon(val[1], slept[1], tolerance);
 }
 
 /**
-- 
2.14.1

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement Tvrtko Ursulin
@ 2018-02-01 12:57   ` Chris Wilson
  2018-02-01 16:26     ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 12:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> In cases where we manually terminate the busy batch, we always want to
> sample busyness while the batch is running, just before we will
> terminate it, and not the other way around. This way we make the window
> for unwated idleness getting sampled smaller.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index 2f7d33414a53..bf16e5e8b1f9 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>                 spin = NULL;
>  
>         slept = measured_usleep(batch_duration_ns / 1000);
> -       igt_spin_batch_end(spin);
> -
>         val = pmu_read_single(fd);
>  
> +       igt_spin_batch_end(spin);

But that's the wrong way round as we are measuring busyness, and the
sampling should terminate as soon as the spin-batch ends, before we even
read the PMU sample? For the timer sampler, it's lost in the noise.

So the idea was to cancel the busyness asap so that the sampler stops
updating before we even have to cross into the kernel for the PMU read.
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening Tvrtko Ursulin
@ 2018-02-01 12:59   ` Chris Wilson
  2018-02-01 16:37     ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 12:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 12:47:45)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Where we use measured sleeps, take PMU samples immediately before and
> after and look at their delta in order to minimize the effect of any
> test setup delays.

The system and pmu were meant to be idle at the start of the test,
right? So val should always be zero?
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [igt-dev] ✓ Fi.CI.BAT: success for perf_pmu reliability improvements
  2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests Tvrtko Ursulin
@ 2018-02-01 13:22 ` Patchwork
  2018-02-01 16:38 ` [igt-dev] ✗ Fi.CI.IGT: warning " Patchwork
  4 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-02-01 13:22 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: perf_pmu reliability improvements
URL   : https://patchwork.freedesktop.org/series/37481/
State : success

== Summary ==

IGT patchset tested on top of latest successful build
689ea8ddcaf769c14b26df8c0fcd8fcc74e06dd2 igt/gem_exec_params: Drop drm master privileges only on drm master fds

with latest DRM-Tip kernel build CI_DRM_3711
4244f989341c drm-tip: 2018y-02m-01d-12h-39m-55s UTC integration manifest

No testlist changes.

fi-bdw-5557u     total:288  pass:267  dwarn:0   dfail:0   fail:0   skip:21  time:426s
fi-bdw-gvtdvm    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:425s
fi-blb-e6850     total:288  pass:223  dwarn:1   dfail:0   fail:0   skip:64  time:372s
fi-bsw-n3050     total:288  pass:242  dwarn:0   dfail:0   fail:0   skip:46  time:489s
fi-bwr-2160      total:288  pass:183  dwarn:0   dfail:0   fail:0   skip:105 time:282s
fi-bxt-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:483s
fi-bxt-j4205     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:485s
fi-byt-j1900     total:288  pass:253  dwarn:0   dfail:0   fail:0   skip:35  time:468s
fi-byt-n2820     total:288  pass:249  dwarn:0   dfail:0   fail:0   skip:39  time:456s
fi-cfl-s2        total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:566s
fi-elk-e7500     total:288  pass:229  dwarn:0   dfail:0   fail:0   skip:59  time:418s
fi-gdg-551       total:288  pass:179  dwarn:0   dfail:0   fail:1   skip:108 time:283s
fi-glk-1         total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:509s
fi-hsw-4770      total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:393s
fi-hsw-4770r     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:401s
fi-ilk-650       total:288  pass:228  dwarn:0   dfail:0   fail:0   skip:60  time:410s
fi-ivb-3520m     total:288  pass:259  dwarn:0   dfail:0   fail:0   skip:29  time:449s
fi-ivb-3770      total:288  pass:255  dwarn:0   dfail:0   fail:0   skip:33  time:412s
fi-kbl-7500u     total:288  pass:263  dwarn:1   dfail:0   fail:0   skip:24  time:459s
fi-kbl-7560u     total:288  pass:269  dwarn:0   dfail:0   fail:0   skip:19  time:496s
fi-kbl-7567u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:454s
fi-kbl-r         total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:503s
fi-pnv-d510      total:288  pass:222  dwarn:1   dfail:0   fail:0   skip:65  time:578s
fi-skl-6260u     total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:426s
fi-skl-6600u     total:288  pass:261  dwarn:0   dfail:0   fail:0   skip:27  time:507s
fi-skl-6700hq    total:288  pass:262  dwarn:0   dfail:0   fail:0   skip:26  time:526s
fi-skl-6700k2    total:288  pass:264  dwarn:0   dfail:0   fail:0   skip:24  time:486s
fi-skl-6770hq    total:288  pass:268  dwarn:0   dfail:0   fail:0   skip:20  time:485s
fi-skl-guc       total:288  pass:260  dwarn:0   dfail:0   fail:0   skip:28  time:417s
fi-skl-gvtdvm    total:288  pass:265  dwarn:0   dfail:0   fail:0   skip:23  time:429s
fi-snb-2520m     total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:528s
fi-snb-2600      total:288  pass:248  dwarn:0   dfail:0   fail:0   skip:40  time:399s
Blacklisted hosts:
fi-glk-dsi       total:288  pass:258  dwarn:0   dfail:0   fail:0   skip:30  time:468s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_848/issues.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 12:57   ` Chris Wilson
@ 2018-02-01 16:26     ` Tvrtko Ursulin
  2018-02-01 16:39       ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 16:26 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

On 01/02/2018 12:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> In cases where we manually terminate the busy batch, we always want to
>> sample busyness while the batch is running, just before we will
>> terminate it, and not the other way around. This way we make the window
>> for unwated idleness getting sampled smaller.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   tests/perf_pmu.c | 28 +++++++++++++---------------
>>   1 file changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>> index 2f7d33414a53..bf16e5e8b1f9 100644
>> --- a/tests/perf_pmu.c
>> +++ b/tests/perf_pmu.c
>> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>>                  spin = NULL;
>>   
>>          slept = measured_usleep(batch_duration_ns / 1000);
>> -       igt_spin_batch_end(spin);
>> -
>>          val = pmu_read_single(fd);
>>   
>> +       igt_spin_batch_end(spin);
> 
> But that's the wrong way round as we are measuring busyness, and the
> sampling should terminate as soon as the spin-batch ends, before we even
> read the PMU sample? For the timer sampler, it's lost in the noise.
> 
> So the idea was to cancel the busyness asap so that the sampler stops
> updating before we even have to cross into the kernel for the PMU read.

I don't follow on the problem statement. This is how code used to looks 
in many places:

  	slept = measured_usleep(batch_duration_ns / 1000);
	igt_spin_batch_end(spin);

  	pmu_read_multi(fd[0], num_engines, val);

Problem here is there is indeterministic time gap, depending on test 
execution speed, between requesting the batch to end to reading the 
counter. This can add a random amount of idleness to the read value.

And another indeterminism in how long it takes for the batch end request 
to get picked up by the GPU. If that is slower in some cases, the 
counter will drift from the expected value toward other direction, 
overestimating busyness relative to sleep duration.

Attempted improvement was simply to reverse the last two lines, so we 
read the counter when we know it is busy (after the sleep), and then 
request batch termination.

This only leaves the scheduling delay between end of sleep and counter 
read, which is smaller than end of sleep - batch end - counter read.

These tests are not testing for edge conditions, just that the busy 
engines are reported as busy, in various combinations, so that sounded 
like a reasonable change.

I hope I did not get confused here, it wouldn't be the first time in 
these tests...

Regards,

Tvrtko

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening
  2018-02-01 12:59   ` Chris Wilson
@ 2018-02-01 16:37     ` Tvrtko Ursulin
  2018-02-01 16:48       ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 16:37 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

On 01/02/2018 12:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 12:47:45)
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Where we use measured sleeps, take PMU samples immediately before and
>> after and look at their delta in order to minimize the effect of any
>> test setup delays.
> 
> The system and pmu were meant to be idle at the start of the test,
> right? So val should always be zero?

Yes, but there is a time delay between starting the counters and 
applying busyness. For instance, busy-check-all, current version:

	... pmu open somewhere before ...

  	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
  	slept = measured_usleep(batch_duration_ns / 1000);
	pmu_read_multi(fd[0], num_engines, val);

In this case the slept value vs the read busyness will miss a tiny bit 
between igt_spin_batch_new to measured_usleep. Probably minimal indeed, 
but I thought just for extra safety to take explicit initial read just 
before the sleep, so:

  	spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
	pmu_read_multi(fd[0], num_engines, tval[0]);
  	slept = measured_usleep(batch_duration_ns / 1000);
	pmu_read_multi(fd[0], num_engines, tval[1]);

More importantly, it is a potentially larger time delta in tests which 
open multiple counters after starting the spinner. Like 
most_busy_check_all for instance:

	... start spin batch...

  	for (i = 0; i < num_engines; i++)
  		fd[i] = open_group(val[i], fd[0]);

  	slept = measured_usleep(batch_duration_ns / 1000);
	pmu_read_multi(fd[0], num_engines, val);

So the counter value relative to slept value will include time spent 
opening num_engines event. Once again change to take an explicit initial 
value just before the sleep looked reasonable to me.

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [igt-dev] ✗ Fi.CI.IGT: warning for perf_pmu reliability improvements
  2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2018-02-01 13:22 ` [igt-dev] ✓ Fi.CI.BAT: success for perf_pmu reliability improvements Patchwork
@ 2018-02-01 16:38 ` Patchwork
  4 siblings, 0 replies; 19+ messages in thread
From: Patchwork @ 2018-02-01 16:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: igt-dev

== Series Details ==

Series: perf_pmu reliability improvements
URL   : https://patchwork.freedesktop.org/series/37481/
State : warning

== Summary ==

Test kms_flip:
        Subgroup plain-flip-ts-check:
                fail       -> PASS       (shard-hsw) fdo#100368
        Subgroup 2x-flip-vs-expired-vblank:
                pass       -> FAIL       (shard-hsw) fdo#102887
Test kms_pipe_crc_basic:
        Subgroup suspend-read-crc-pipe-a:
                skip       -> PASS       (shard-snb) fdo#103375
Test kms_vblank:
        Subgroup pipe-a-wait-idle-hang:
                pass       -> SKIP       (shard-snb)
        Subgroup pipe-a-accuracy-idle:
                fail       -> PASS       (shard-apl) fdo#102583
Test gem_eio:
        Subgroup in-flight-contexts:
                fail       -> PASS       (shard-hsw) fdo#104676
Test kms_setmode:
        Subgroup basic:
                pass       -> FAIL       (shard-hsw) fdo#99912
Test kms_sysfs_edid_timing:
                warn       -> PASS       (shard-apl) fdo#100047
Test perf:
        Subgroup enable-disable:
                pass       -> FAIL       (shard-apl) fdo#103715

fdo#100368 https://bugs.freedesktop.org/show_bug.cgi?id=100368
fdo#102887 https://bugs.freedesktop.org/show_bug.cgi?id=102887
fdo#103375 https://bugs.freedesktop.org/show_bug.cgi?id=103375
fdo#102583 https://bugs.freedesktop.org/show_bug.cgi?id=102583
fdo#104676 https://bugs.freedesktop.org/show_bug.cgi?id=104676
fdo#99912 https://bugs.freedesktop.org/show_bug.cgi?id=99912
fdo#100047 https://bugs.freedesktop.org/show_bug.cgi?id=100047
fdo#103715 https://bugs.freedesktop.org/show_bug.cgi?id=103715

shard-apl        total:2838 pass:1751 dwarn:1   dfail:0   fail:22  skip:1064 time:12530s
shard-hsw        total:2838 pass:1735 dwarn:1   dfail:0   fail:11  skip:1090 time:11737s
shard-snb        total:2838 pass:1329 dwarn:1   dfail:0   fail:10  skip:1498 time:6602s
Blacklisted hosts:
shard-kbl        total:2827 pass:1865 dwarn:1   dfail:0   fail:22  skip:938 time:9349s

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_848/shards.html
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 16:26     ` Tvrtko Ursulin
@ 2018-02-01 16:39       ` Chris Wilson
  2018-02-01 16:58         ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 16:39 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 16:26:58)
> 
> On 01/02/2018 12:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> In cases where we manually terminate the busy batch, we always want to
> >> sample busyness while the batch is running, just before we will
> >> terminate it, and not the other way around. This way we make the window
> >> for unwated idleness getting sampled smaller.
> >>
> >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   tests/perf_pmu.c | 28 +++++++++++++---------------
> >>   1 file changed, 13 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> >> index 2f7d33414a53..bf16e5e8b1f9 100644
> >> --- a/tests/perf_pmu.c
> >> +++ b/tests/perf_pmu.c
> >> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
> >>                  spin = NULL;
> >>   
> >>          slept = measured_usleep(batch_duration_ns / 1000);
> >> -       igt_spin_batch_end(spin);
> >> -
> >>          val = pmu_read_single(fd);
> >>   
> >> +       igt_spin_batch_end(spin);
> > 
> > But that's the wrong way round as we are measuring busyness, and the
> > sampling should terminate as soon as the spin-batch ends, before we even
> > read the PMU sample? For the timer sampler, it's lost in the noise.
> > 
> > So the idea was to cancel the busyness asap so that the sampler stops
> > updating before we even have to cross into the kernel for the PMU read.
> 
> I don't follow on the problem statement. This is how code used to looks 
> in many places:
> 
>         slept = measured_usleep(batch_duration_ns / 1000);
>         igt_spin_batch_end(spin);
> 
>         pmu_read_multi(fd[0], num_engines, val);
> 
> Problem here is there is indeterministic time gap, depending on test 
> execution speed, between requesting the batch to end to reading the 
> counter. This can add a random amount of idleness to the read value.

But we are not measuring idleness, but busyness. The batch will end a
few tens of nanoseconds after the write hits memory. The desire is that
time is zero so that the sleep exactly corresponds with the interval the
batch was spinning (busy).

> And another indeterminism in how long it takes for the batch end request 
> to get picked up by the GPU. If that is slower in some cases, the 
> counter will drift from the expected value toward other direction, 
> overestimating busyness relative to sleep duration.
> 
> Attempted improvement was simply to reverse the last two lines, so we 
> read the counter when we know it is busy (after the sleep), and then 
> request batch termination.
> 
> This only leaves the scheduling delay between end of sleep and counter 
> read, which is smaller than end of sleep - batch end - counter read.
> 
> These tests are not testing for edge conditions, just that the busy 
> engines are reported as busy, in various combinations, so that sounded 
> like a reasonable change.
> 
> I hope I did not get confused here, it wouldn't be the first time in 
> these tests...

Aiui, the PMU measure busyness which is the duration of the spin-batch
(or first enabling of the PMU event). The choice is between a context
switch into the kernel to stop the counter, or a single write to memory
to stop the batch).

My bet is the write to memory will turn off the counter within 100ns,
worst case including the interrupt processing. The PMU event stopping
I estimate 100ns best case (including the kernel context switch, and
don't ask how KPTI perturbs that switch as I don't know off hand how
much worse it will get).

Now, the write to memory is asynchronous to the PMU event stopping. So
if you write first, whichever is processed first (the kernel context
switch + event stop, or the CS interrupt) causes the busy counter to
cease. So best of both worlds?

I too may be wrong...
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening
  2018-02-01 16:37     ` Tvrtko Ursulin
@ 2018-02-01 16:48       ` Chris Wilson
  2018-02-01 17:02         ` Tvrtko Ursulin
  0 siblings, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 16:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 16:37:29)
> 
> On 01/02/2018 12:59, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-01 12:47:45)
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Where we use measured sleeps, take PMU samples immediately before and
> >> after and look at their delta in order to minimize the effect of any
> >> test setup delays.
> > 
> > The system and pmu were meant to be idle at the start of the test,
> > right? So val should always be zero?
> 
> Yes, but there is a time delay between starting the counters and 
> applying busyness. For instance, busy-check-all, current version:
> 
>         ... pmu open somewhere before ...
> 
>         spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>         slept = measured_usleep(batch_duration_ns / 1000);
>         pmu_read_multi(fd[0], num_engines, val);
> 
> In this case the slept value vs the read busyness will miss a tiny bit 
> between igt_spin_batch_new to measured_usleep. Probably minimal indeed, 
> but I thought just for extra safety to take explicit initial read just 
> before the sleep, so:
> 
>         spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>         pmu_read_multi(fd[0], num_engines, tval[0]);

Time gap here as well. How do we know this is better than before?

>         slept = measured_usleep(batch_duration_ns / 1000);
>         pmu_read_multi(fd[0], num_engines, tval[1]);
> 
> More importantly, it is a potentially larger time delta in tests which 
> open multiple counters after starting the spinner. Like 
> most_busy_check_all for instance:
> 
>         ... start spin batch...
> 
>         for (i = 0; i < num_engines; i++)
>                 fd[i] = open_group(val[i], fd[0]);
> 
>         slept = measured_usleep(batch_duration_ns / 1000);
>         pmu_read_multi(fd[0], num_engines, val);
> 
> So the counter value relative to slept value will include time spent 
> opening num_engines event. Once again change to take an explicit initial 
> value just before the sleep looked reasonable to me.

I was working on open being minimal delay and insignificant. I have no
idea what the relative costs are. That would be nice to know.

The issue I have is that the scheduler can preempt us at time (so
other than the argument one is quicker and so gives less systematic
error in the ideal case), we are at the mercy of the scheduler which can
inject unknown sleeps between any point. I fear we may need RT, mlocking
and more?, but would much rather avoid it.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

(in exchange for a small test to benchmark open_(single|group),
read_(single|group), pretty please :)
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 16:39       ` Chris Wilson
@ 2018-02-01 16:58         ` Tvrtko Ursulin
  2018-02-01 17:08           ` Chris Wilson
  0 siblings, 1 reply; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 16:58 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin


On 01/02/2018 16:39, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 16:26:58)
>>
>> On 01/02/2018 12:57, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-02-01 12:47:44)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> In cases where we manually terminate the busy batch, we always want to
>>>> sample busyness while the batch is running, just before we will
>>>> terminate it, and not the other way around. This way we make the window
>>>> for unwated idleness getting sampled smaller.
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    tests/perf_pmu.c | 28 +++++++++++++---------------
>>>>    1 file changed, 13 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
>>>> index 2f7d33414a53..bf16e5e8b1f9 100644
>>>> --- a/tests/perf_pmu.c
>>>> +++ b/tests/perf_pmu.c
>>>> @@ -146,10 +146,9 @@ single(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>>>>                   spin = NULL;
>>>>    
>>>>           slept = measured_usleep(batch_duration_ns / 1000);
>>>> -       igt_spin_batch_end(spin);
>>>> -
>>>>           val = pmu_read_single(fd);
>>>>    
>>>> +       igt_spin_batch_end(spin);
>>>
>>> But that's the wrong way round as we are measuring busyness, and the
>>> sampling should terminate as soon as the spin-batch ends, before we even
>>> read the PMU sample? For the timer sampler, it's lost in the noise.
>>>
>>> So the idea was to cancel the busyness asap so that the sampler stops
>>> updating before we even have to cross into the kernel for the PMU read.
>>
>> I don't follow on the problem statement. This is how code used to looks
>> in many places:
>>
>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          igt_spin_batch_end(spin);
>>
>>          pmu_read_multi(fd[0], num_engines, val);
>>
>> Problem here is there is indeterministic time gap, depending on test
>> execution speed, between requesting the batch to end to reading the
>> counter. This can add a random amount of idleness to the read value.
> 
> But we are not measuring idleness, but busyness. The batch will end a

Yep, and that's why I removed some idleness from the busyness! :) Plus 
removed the time required to signal batch end from it as well.

> few tens of nanoseconds after the write hits memory. The desire is that
> time is zero so that the sleep exactly corresponds with the interval the
> batch was spinning (busy).
> 
>> And another indeterminism in how long it takes for the batch end request
>> to get picked up by the GPU. If that is slower in some cases, the
>> counter will drift from the expected value toward other direction,
>> overestimating busyness relative to sleep duration.
>>
>> Attempted improvement was simply to reverse the last two lines, so we
>> read the counter when we know it is busy (after the sleep), and then
>> request batch termination.
>>
>> This only leaves the scheduling delay between end of sleep and counter
>> read, which is smaller than end of sleep - batch end - counter read.
>>
>> These tests are not testing for edge conditions, just that the busy
>> engines are reported as busy, in various combinations, so that sounded
>> like a reasonable change.
>>
>> I hope I did not get confused here, it wouldn't be the first time in
>> these tests...
> 
> Aiui, the PMU measure busyness which is the duration of the spin-batch
> (or first enabling of the PMU event). The choice is between a context
> switch into the kernel to stop the counter, or a single write to memory
> to stop the batch).
> 
> My bet is the write to memory will turn off the counter within 100ns,
> worst case including the interrupt processing. The PMU event stopping
> I estimate 100ns best case (including the kernel context switch, and
> don't ask how KPTI perturbs that switch as I don't know off hand how
> much worse it will get).

There is not PMU event stopping in the changed bits, so not being under 
measure either before or after.

> Now, the write to memory is asynchronous to the PMU event stopping. So
> if you write first, whichever is processed first (the kernel context
> switch + event stop, or the CS interrupt) causes the busy counter to
> cease. So best of both worlds?

I am just moving the two sample read-outs closer together in time. Two 
samples are measure of how long we slept, and measure of PMU busyness. 
There should be as little time as possible between the two readouts. 
Ending the spinner I don't see how it belongs in this sandwich, or 
anything else really.

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening
  2018-02-01 16:48       ` Chris Wilson
@ 2018-02-01 17:02         ` Tvrtko Ursulin
  0 siblings, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 17:02 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin


On 01/02/2018 16:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 16:37:29)
>>
>> On 01/02/2018 12:59, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-02-01 12:47:45)
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> Where we use measured sleeps, take PMU samples immediately before and
>>>> after and look at their delta in order to minimize the effect of any
>>>> test setup delays.
>>>
>>> The system and pmu were meant to be idle at the start of the test,
>>> right? So val should always be zero?
>>
>> Yes, but there is a time delay between starting the counters and
>> applying busyness. For instance, busy-check-all, current version:
>>
>>          ... pmu open somewhere before ...
>>
>>          spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, val);
>>
>> In this case the slept value vs the read busyness will miss a tiny bit
>> between igt_spin_batch_new to measured_usleep. Probably minimal indeed,
>> but I thought just for extra safety to take explicit initial read just
>> before the sleep, so:
>>
>>          spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
>>          pmu_read_multi(fd[0], num_engines, tval[0]);
> 
> Time gap here as well. How do we know this is better than before?

It is theoretically marginally better on the leading edge in this 
example. Is it practically significant I don't know. At least it isn't 
worse. I can compare the numbers before and after once the results get 
in. I suspect Apollo Lake will the interesting machine to look at.

>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, tval[1]);
>>
>> More importantly, it is a potentially larger time delta in tests which
>> open multiple counters after starting the spinner. Like
>> most_busy_check_all for instance:
>>
>>          ... start spin batch...
>>
>>          for (i = 0; i < num_engines; i++)
>>                  fd[i] = open_group(val[i], fd[0]);
>>
>>          slept = measured_usleep(batch_duration_ns / 1000);
>>          pmu_read_multi(fd[0], num_engines, val);
>>
>> So the counter value relative to slept value will include time spent
>> opening num_engines event. Once again change to take an explicit initial
>> value just before the sleep looked reasonable to me.
> 
> I was working on open being minimal delay and insignificant. I have no
> idea what the relative costs are. That would be nice to know.
> 
> The issue I have is that the scheduler can preempt us at time (so
> other than the argument one is quicker and so gives less systematic
> error in the ideal case), we are at the mercy of the scheduler which can
> inject unknown sleeps between any point. I fear we may need RT, mlocking
> and more?, but would much rather avoid it.

Yep, that was exactly my thinking. Lets avoid using big ugly hammers 
(RT) unless there is no other way. With this series applied, if there 
are still random unexplained discrepancies, I think there will be 
nothing else to try than to go RT.

> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> (in exchange for a small test to benchmark open_(single|group),
> read_(single|group), pretty please :)

Okay, marking the email as TODO.

Regards,

Tvrtko


_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 16:58         ` Tvrtko Ursulin
@ 2018-02-01 17:08           ` Chris Wilson
  2018-02-01 17:16             ` Chris Wilson
  2018-02-01 17:20             ` Tvrtko Ursulin
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 17:08 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 16:58:21)
> 
> I am just moving the two sample read-outs closer together in time. Two 
> samples are measure of how long we slept, and measure of PMU busyness. 
> There should be as little time as possible between the two readouts. 
> Ending the spinner I don't see how it belongs in this sandwich, or 
> anything else really.

Then we disagree on what we think we are measuring. The usleep should
equal the busy value defined by the spin-batch.
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 17:08           ` Chris Wilson
@ 2018-02-01 17:16             ` Chris Wilson
  2018-02-01 17:34               ` Chris Wilson
  2018-02-01 17:20             ` Tvrtko Ursulin
  1 sibling, 1 reply; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 17:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Chris Wilson (2018-02-01 17:08:09)
> Quoting Tvrtko Ursulin (2018-02-01 16:58:21)
> > 
> > I am just moving the two sample read-outs closer together in time. Two 
> > samples are measure of how long we slept, and measure of PMU busyness. 
> > There should be as little time as possible between the two readouts. 
> > Ending the spinner I don't see how it belongs in this sandwich, or 
> > anything else really.
> 
> Then we disagree on what we think we are measuring. The usleep should
> equal the busy value defined by the spin-batch.

How about if it was phrased:

	create spin-batch per-engine queued on a unsignal vgem
	open_group

	signal vgem // submit all queued spinners simultaneously (well, that's our goal)
	duration = measured_usleep
	end spinners

	busy = read_group

That's what I want to measure. The passage of time outside of the
spinner phase is then irrelevant, and the error inside is something that
we are trying to minimise as part of the driver overhead.
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 17:08           ` Chris Wilson
  2018-02-01 17:16             ` Chris Wilson
@ 2018-02-01 17:20             ` Tvrtko Ursulin
  1 sibling, 0 replies; 19+ messages in thread
From: Tvrtko Ursulin @ 2018-02-01 17:20 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin


On 01/02/2018 17:08, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-01 16:58:21)
>>
>> I am just moving the two sample read-outs closer together in time. Two
>> samples are measure of how long we slept, and measure of PMU busyness.
>> There should be as little time as possible between the two readouts.
>> Ending the spinner I don't see how it belongs in this sandwich, or
>> anything else really.
> 
> Then we disagree on what we think we are measuring. The usleep should
> equal the busy value defined by the spin-batch.

I agree, and that is why there should be no extra work between the sleep 
ending and PMU read-out.

Sleep is defining the spin-batch duration and that is the most accurate 
value for it we got.

Regards,

Tvrtko
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement
  2018-02-01 17:16             ` Chris Wilson
@ 2018-02-01 17:34               ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 17:34 UTC (permalink / raw)
  To: Tvrtko Ursulin, Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Chris Wilson (2018-02-01 17:16:28)
> Quoting Chris Wilson (2018-02-01 17:08:09)
> > Quoting Tvrtko Ursulin (2018-02-01 16:58:21)
> > > 
> > > I am just moving the two sample read-outs closer together in time. Two 
> > > samples are measure of how long we slept, and measure of PMU busyness. 
> > > There should be as little time as possible between the two readouts. 
> > > Ending the spinner I don't see how it belongs in this sandwich, or 
> > > anything else really.
> > 
> > Then we disagree on what we think we are measuring. The usleep should
> > equal the busy value defined by the spin-batch.
> 
> How about if it was phrased:
> 
>         create spin-batch per-engine queued on a unsignal vgem
>         open_group
> 
>         signal vgem // submit all queued spinners simultaneously (well, that's our goal)
>         duration = measured_usleep
>         end spinners
> 
>         busy = read_group
> 
> That's what I want to measure. The passage of time outside of the
> spinner phase is then irrelevant, and the error inside is something that
> we are trying to minimise as part of the driver overhead.

Ok, so we are looking at slightly different angles here. The patch
applies your idea of excluding the vagaries in controlling the spinning
batch from measuring the accuracy of the PMU event itself,

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

But I think there's some merit in my pov still :)
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests
  2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests Tvrtko Ursulin
@ 2018-02-01 17:37   ` Chris Wilson
  0 siblings, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2018-02-01 17:37 UTC (permalink / raw)
  To: Tvrtko Ursulin, igt-dev; +Cc: Tvrtko Ursulin

Quoting Tvrtko Ursulin (2018-02-01 12:47:46)
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Stop relying on timers to end spin batches but use measured sleep, which
> was established to work better, in all time based tests.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  tests/perf_pmu.c | 42 +++++++++++++++++++++---------------------
>  1 file changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/tests/perf_pmu.c b/tests/perf_pmu.c
> index bdf452c8347c..e27e74abe686 100644
> --- a/tests/perf_pmu.c
> +++ b/tests/perf_pmu.c
> @@ -454,30 +454,31 @@ static void
>  no_sema(int gem_fd, const struct intel_execution_engine2 *e, bool busy)
>  {
>         igt_spin_t *spin;
> -       uint64_t val[2];
> +       uint64_t val[2][2];
>         int fd;
>  
>         fd = open_group(I915_PMU_ENGINE_SEMA(e->class, e->instance), -1);
>         open_group(I915_PMU_ENGINE_WAIT(e->class, e->instance), fd);
>  
> -       if (busy) {
> +       if (busy)
>                 spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
> -               igt_spin_batch_set_timeout(spin, batch_duration_ns);
> -       } else {
> -               usleep(batch_duration_ns / 1000);
> -       }
>  
> -       if (busy)
> -               gem_sync(gem_fd, spin->handle);
> +       pmu_read_multi(fd, 2, val[0]);
> +       measured_usleep(batch_duration_ns / 1000);
> +       pmu_read_multi(fd, 2, val[1]);
>  
> -       pmu_read_multi(fd, 2, val);
> +       val[0][0] = val[1][0] - val[0][0];
> +       val[0][1] = val[1][1] - val[0][1];
>  
> -       if (busy)
> +       if (busy) {
> +               igt_spin_batch_end(spin);
> +               gem_sync(gem_fd, spin->handle);
>                 igt_spin_batch_free(gem_fd, spin);
> +       }
>         close(fd);
>  
> -       assert_within_epsilon(val[0], 0.0f, tolerance);
> -       assert_within_epsilon(val[1], 0.0f, tolerance);
> +       assert_within_epsilon(val[0][0], 0.0f, tolerance);
> +       assert_within_epsilon(val[0][1], 0.0f, tolerance);
>  }

This one looks innocent, but I don't see anything gained from being
inconsistent here. (Most bug discovery is serendipitous, so I welcome
inconsistency between tests :)

>  #define MI_INSTR(opcode, flags) (((opcode) << 23) | (flags))
> @@ -766,7 +767,7 @@ static void
>  multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>  {
>         uint64_t config = I915_PMU_ENGINE_BUSY(e->class, e->instance);
> -       unsigned int slept;
> +       unsigned int slept[2];
>         igt_spin_t *spin;
>         uint64_t val[2];
>         int fd[2];
> @@ -781,23 +782,22 @@ multi_client(int gem_fd, const struct intel_execution_engine2 *e)
>         fd[1] = open_pmu(config);
>  
>         spin = igt_spin_batch_new(gem_fd, 0, e2ring(gem_fd, e), 0);
> -       igt_spin_batch_set_timeout(spin, 2 * batch_duration_ns);
>  
> -       val[0] = pmu_read_single(fd[0]);
> -       val[1] = pmu_read_single(fd[1]);
> -       slept = measured_usleep(batch_duration_ns / 1000);
> +       val[0] = val[1] = pmu_read_single(fd[0]);
> +       slept[1] = measured_usleep(batch_duration_ns / 1000);
>         val[1] = pmu_read_single(fd[1]) - val[1];
>         close(fd[1]);
>  
> -       gem_sync(gem_fd, spin->handle);
> -
> +       slept[0] = measured_usleep(batch_duration_ns / 1000) + slept[1];
>         val[0] = pmu_read_single(fd[0]) - val[0];
>  
> +       igt_spin_batch_end(spin);
> +       gem_sync(gem_fd, spin->handle);
>         igt_spin_batch_free(gem_fd, spin);
>         close(fd[0]);
>  
> -       assert_within_epsilon(val[0], 2 * batch_duration_ns, tolerance);
> -       assert_within_epsilon(val[1], slept, tolerance);
> +       assert_within_epsilon(val[0], slept[0], tolerance);
> +       assert_within_epsilon(val[1], slept[1], tolerance);

Whereas this should, given our experience elsewhere with unknown sleeps,
make a difference.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2018-02-01 17:37 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-02-01 12:47 [igt-dev] [PATCH i-g-t 0/3] perf_pmu reliability improvements Tvrtko Ursulin
2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 1/3] tests/perf_pmu: Tighten busy measurement Tvrtko Ursulin
2018-02-01 12:57   ` Chris Wilson
2018-02-01 16:26     ` Tvrtko Ursulin
2018-02-01 16:39       ` Chris Wilson
2018-02-01 16:58         ` Tvrtko Ursulin
2018-02-01 17:08           ` Chris Wilson
2018-02-01 17:16             ` Chris Wilson
2018-02-01 17:34               ` Chris Wilson
2018-02-01 17:20             ` Tvrtko Ursulin
2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 2/3] tests/perf_pmu: More busy measurement tightening Tvrtko Ursulin
2018-02-01 12:59   ` Chris Wilson
2018-02-01 16:37     ` Tvrtko Ursulin
2018-02-01 16:48       ` Chris Wilson
2018-02-01 17:02         ` Tvrtko Ursulin
2018-02-01 12:47 ` [igt-dev] [PATCH i-g-t 3/3] tests/perf_pmu: Use measured sleep in all time based tests Tvrtko Ursulin
2018-02-01 17:37   ` Chris Wilson
2018-02-01 13:22 ` [igt-dev] ✓ Fi.CI.BAT: success for perf_pmu reliability improvements Patchwork
2018-02-01 16:38 ` [igt-dev] ✗ Fi.CI.IGT: warning " Patchwork

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox