[PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around

Intel-GFX Archive on lore.kernel.org

* [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around
@ 2018-05-22 11:00 Chris Wilson
  2018-05-22 11:00 ` [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog Chris Wilson
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Chris Wilson @ 2018-05-22 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

Extend the i915 load to (optionally) pass a write hazard between
engines, causing us to wait on the interrupt between engines. This
adds MI_USER_INTERRUPT irq handling to our list of sins.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 benchmarks/gem_syslatency.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
index de59eaf82..9160e2199 100644
--- a/benchmarks/gem_syslatency.c
+++ b/benchmarks/gem_syslatency.c
@@ -53,6 +53,7 @@ struct gem_busyspin {
 	pthread_t thread;
 	unsigned long count;
 	bool leak;
+	bool interrupts;
 };
 
 struct sys_wait {
@@ -94,7 +95,7 @@ static void *gem_busyspin(void *arg)
 	const uint32_t bbe = MI_BATCH_BUFFER_END;
 	struct gem_busyspin *bs = arg;
 	struct drm_i915_gem_execbuffer2 execbuf;
-	struct drm_i915_gem_exec_object2 obj;
+	struct drm_i915_gem_exec_object2 obj[2];
 	const unsigned sz = bs->leak ? 16 << 20 : 4 << 10;
 	unsigned engines[16];
 	unsigned nengine;
@@ -107,13 +108,15 @@ static void *gem_busyspin(void *arg)
 	for_each_engine(fd, engine)
 		if (!ignore_engine(fd, engine)) engines[nengine++] = engine;
 
-	memset(&obj, 0, sizeof(obj));
-	obj.handle = gem_create(fd, sz);
-	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
+	memset(obj, 0, sizeof(obj));
+	obj[0].handle = gem_create(fd, 4096);
+	obj[0].flags = EXEC_OBJECT_WRITE;
+	obj[1].handle = gem_create(fd, sz);
+	gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
 
 	memset(&execbuf, 0, sizeof(execbuf));
-	execbuf.buffers_ptr = (uintptr_t)&obj;
-	execbuf.buffer_count = 1;
+	execbuf.buffers_ptr = (uintptr_t)(obj + !bs->interrupts);
+	execbuf.buffer_count = 1 + !!bs->interrupts;
 	execbuf.flags |= LOCAL_I915_EXEC_HANDLE_LUT;
 	execbuf.flags |= LOCAL_I915_EXEC_NO_RELOC;
 	if (__gem_execbuf(fd, &execbuf)) {
@@ -129,9 +132,9 @@ static void *gem_busyspin(void *arg)
 		}
 		bs->count += nengine;
 		if (bs->leak) {
-			gem_madvise(fd, obj.handle, I915_MADV_DONTNEED);
-			obj.handle = gem_create(fd, sz);
-			gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
+			gem_madvise(fd, obj[1].handle, I915_MADV_DONTNEED);
+			obj[1].handle = gem_create(fd, sz);
+			gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
 		}
 	}
 
@@ -305,13 +308,17 @@ int main(int argc, char **argv)
 	int field = -1;
 	int enable_gem_sysbusy = 1;
 	bool leak = false;
+	bool interrupts = false;
 	int n, c;
 
-	while ((c = getopt(argc, argv, "t:f:bmn")) != -1) {
+	while ((c = getopt(argc, argv, "t:f:bmni")) != -1) {
 		switch (c) {
 		case 'n': /* dry run, measure baseline system latency */
 			enable_gem_sysbusy = 0;
 			break;
+		case 'i': /* interrupts ahoy! */
+			interrupts = true;
+			break;
 		case 't':
 			/* How long to run the benchmark for (seconds) */
 			time = atoi(optarg);
@@ -346,6 +353,7 @@ int main(int argc, char **argv)
 		for (n = 0; n < ncpus; n++) {
 			bind_cpu(&attr, n);
 			busy[n].leak = leak;
+			busy[n].interrupts = interrupts;
 			pthread_create(&busy[n].thread, &attr,
 				       gem_busyspin, &busy[n]);
 		}
-- 
2.17.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog
  2018-05-22 11:00 [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Chris Wilson
@ 2018-05-22 11:00 ` Chris Wilson
  2018-05-22 11:38   ` Tvrtko Ursulin
  2018-05-22 11:00 ` [PATCH i-g-t 3/3] benchmarks/gem_syslatency: Specify batch duration Chris Wilson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2018-05-22 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

Normally we use a hog per CPU to ensure that the system is fully
loaded to see how much latency we cause. For simple sanity checking, allow
ourselves to limit it to just one CPU hog.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 benchmarks/gem_syslatency.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
index 9160e2199..d1056773a 100644
--- a/benchmarks/gem_syslatency.c
+++ b/benchmarks/gem_syslatency.c
@@ -311,8 +311,11 @@ int main(int argc, char **argv)
 	bool interrupts = false;
 	int n, c;
 
-	while ((c = getopt(argc, argv, "t:f:bmni")) != -1) {
+	while ((c = getopt(argc, argv, "t:f:bmni1")) != -1) {
 		switch (c) {
+		case '1':
+			ncpus = 1;
+			break;
 		case 'n': /* dry run, measure baseline system latency */
 			enable_gem_sysbusy = 0;
 			break;
-- 
2.17.0

* Re: [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog
  2018-05-22 11:00 ` [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog Chris Wilson
@ 2018-05-22 11:38   ` Tvrtko Ursulin
  0 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2018-05-22 11:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 22/05/2018 12:00, Chris Wilson wrote:
> Normally we use a hog per CPU to ensure that the system is fully
> loaded to see how much latency we cause. For simple sanity checking, allow
> ourselves to limit it to just one CPU hog.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   benchmarks/gem_syslatency.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
> index 9160e2199..d1056773a 100644
> --- a/benchmarks/gem_syslatency.c
> +++ b/benchmarks/gem_syslatency.c
> @@ -311,8 +311,11 @@ int main(int argc, char **argv)
>   	bool interrupts = false;
>   	int n, c;
>   
> -	while ((c = getopt(argc, argv, "t:f:bmni")) != -1) {
> +	while ((c = getopt(argc, argv, "t:f:bmni1")) != -1) {
>   		switch (c) {
> +		case '1':
> +			ncpus = 1;
> +			break;
>   		case 'n': /* dry run, measure baseline system latency */
>   			enable_gem_sysbusy = 0;
>   			break;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* [PATCH i-g-t 3/3] benchmarks/gem_syslatency: Specify batch duration
  2018-05-22 11:00 [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Chris Wilson
  2018-05-22 11:00 ` [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog Chris Wilson
@ 2018-05-22 11:00 ` Chris Wilson
  2018-05-22 11:49   ` Tvrtko Ursulin
  2018-05-22 11:24 ` [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Mika Kuoppala
  2018-05-22 11:37 ` Tvrtko Ursulin
  3 siblings, 1 reply; 8+ messages in thread
From: Chris Wilson @ 2018-05-22 11:00 UTC (permalink / raw)
  To: intel-gfx; +Cc: igt-dev

While for stressing the system we want to submit as many batches as we
can, as that shows us the worst-case impact on system latency, it is not
a very realistic case. To introduce a bit more realism, allow the batches
to run for a user-defined duration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 benchmarks/gem_syslatency.c | 71 ++++++++++++++++++++++++++++++++++---
 1 file changed, 67 insertions(+), 4 deletions(-)

diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
index d1056773a..45cabe86c 100644
--- a/benchmarks/gem_syslatency.c
+++ b/benchmarks/gem_syslatency.c
@@ -51,6 +51,7 @@ static volatile int done;
 
 struct gem_busyspin {
 	pthread_t thread;
+	unsigned long sz;
 	unsigned long count;
 	bool leak;
 	bool interrupts;
@@ -96,7 +97,8 @@ static void *gem_busyspin(void *arg)
 	struct gem_busyspin *bs = arg;
 	struct drm_i915_gem_execbuffer2 execbuf;
 	struct drm_i915_gem_exec_object2 obj[2];
-	const unsigned sz = bs->leak ? 16 << 20 : 4 << 10;
+	const unsigned sz =
+		bs->sz ? bs->sz + sizeof(bbe) : bs->leak ? 16 << 20 : 4 << 10;
 	unsigned engines[16];
 	unsigned nengine;
 	unsigned engine;
@@ -112,7 +114,7 @@ static void *gem_busyspin(void *arg)
 	obj[0].handle = gem_create(fd, 4096);
 	obj[0].flags = EXEC_OBJECT_WRITE;
 	obj[1].handle = gem_create(fd, sz);
-	gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
+	gem_write(fd, obj[1].handle, bs->sz, &bbe, sizeof(bbe));
 
 	memset(&execbuf, 0, sizeof(execbuf));
 	execbuf.buffers_ptr = (uintptr_t)(obj + !bs->interrupts);
@@ -125,6 +127,12 @@ static void *gem_busyspin(void *arg)
 	}
 
 	while (!done) {
+		for (int n = 0; n < nengine; n++) {
+			const int m = rand() % nengine;
+			unsigned int tmp = engines[n];
+			engines[n] = engines[m];
+			engines[m] = tmp;
+		}
 		for (int n = 0; n < nengine; n++) {
 			execbuf.flags &= ~ENGINE_FLAGS;
 			execbuf.flags |= engines[n];
@@ -134,7 +142,7 @@ static void *gem_busyspin(void *arg)
 		if (bs->leak) {
 			gem_madvise(fd, obj[1].handle, I915_MADV_DONTNEED);
 			obj[1].handle = gem_create(fd, sz);
-			gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
+			gem_write(fd, obj[1].handle, bs->sz, &bbe, sizeof(bbe));
 		}
 	}
 
@@ -294,6 +302,50 @@ static void *background_fs(void *path)
 	return NULL;
 }
 
+static unsigned long calibrate_nop(unsigned int target_us,
+				   unsigned int tolerance_pct)
+{
+	const uint32_t bbe = MI_BATCH_BUFFER_END;
+	const unsigned int loops = 100;
+	struct drm_i915_gem_exec_object2 obj = {};
+	struct drm_i915_gem_execbuffer2 eb =
+		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
+	struct timespec t_0, t_end;
+	long sz, prev;
+	int fd;
+
+	fd = drm_open_driver(DRIVER_INTEL);
+
+	clock_gettime(CLOCK_MONOTONIC, &t_0);
+
+	sz = 256 * 1024;
+	do {
+		struct timespec t_start;
+
+		obj.handle = gem_create(fd, sz + sizeof(bbe));
+		gem_write(fd, obj.handle, sz, &bbe, sizeof(bbe));
+		gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+
+		clock_gettime(CLOCK_MONOTONIC, &t_start);
+		for (int loop = 0; loop < loops; loop++)
+			gem_execbuf(fd, &eb);
+		gem_sync(fd, obj.handle);
+		clock_gettime(CLOCK_MONOTONIC, &t_end);
+
+		gem_close(fd, obj.handle);
+
+		prev = sz;
+		sz = loops * sz / elapsed(&t_start, &t_end) * 1e3 * target_us;
+		sz = ALIGN(sz, sizeof(uint32_t));
+	} while (elapsed(&t_0, &t_end) < 5 ||
+		 abs(sz - prev) > (sz * tolerance_pct / 100));
+
+	close(fd);
+
+	return sz;
+}
+
 int main(int argc, char **argv)
 {
 	struct gem_busyspin *busy;
@@ -309,9 +361,10 @@ int main(int argc, char **argv)
 	int enable_gem_sysbusy = 1;
 	bool leak = false;
 	bool interrupts = false;
+	long batch = 0;
 	int n, c;
 
-	while ((c = getopt(argc, argv, "t:f:bmni1")) != -1) {
+	while ((c = getopt(argc, argv, "r:t:f:bmni1")) != -1) {
 		switch (c) {
 		case '1':
 			ncpus = 1;
@@ -328,6 +381,10 @@ int main(int argc, char **argv)
 			if (time < 0)
 				time = INT_MAX;
 			break;
+		case 'r':
+			/* Duration of each batch (microseconds) */
+			batch = atoi(optarg);
+			break;
 		case 'f':
 			/* Select an output field */
 			field = atoi(optarg);
@@ -350,11 +407,17 @@ int main(int argc, char **argv)
 	force_low_latency();
 	min = min_measurement_error();
 
+	if (batch > 0)
+		batch = calibrate_nop(batch, 2);
+	else
+		batch = -batch;
+
 	busy = calloc(ncpus, sizeof(*busy));
 	pthread_attr_init(&attr);
 	if (enable_gem_sysbusy) {
 		for (n = 0; n < ncpus; n++) {
 			bind_cpu(&attr, n);
+			busy[n].sz = batch;
 			busy[n].leak = leak;
 			busy[n].interrupts = interrupts;
 			pthread_create(&busy[n].thread, &attr,
-- 
2.17.0

* Re: [PATCH i-g-t 3/3] benchmarks/gem_syslatency: Specify batch duration
  2018-05-22 11:00 ` [PATCH i-g-t 3/3] benchmarks/gem_syslatency: Specify batch duration Chris Wilson
@ 2018-05-22 11:49   ` Tvrtko Ursulin
  0 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2018-05-22 11:49 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 22/05/2018 12:00, Chris Wilson wrote:
> While for stressing the system we want to submit as many batches as we
> can, as that shows us the worst-case impact on system latency, it is not
> a very realistic case. To introduce a bit more realism, allow the batches
> to run for a user-defined duration.
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   benchmarks/gem_syslatency.c | 71 ++++++++++++++++++++++++++++++++++---
>   1 file changed, 67 insertions(+), 4 deletions(-)
> 
> diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
> index d1056773a..45cabe86c 100644
> --- a/benchmarks/gem_syslatency.c
> +++ b/benchmarks/gem_syslatency.c
> @@ -51,6 +51,7 @@ static volatile int done;
>   
>   struct gem_busyspin {
>   	pthread_t thread;
> +	unsigned long sz;
>   	unsigned long count;
>   	bool leak;
>   	bool interrupts;
> @@ -96,7 +97,8 @@ static void *gem_busyspin(void *arg)
>   	struct gem_busyspin *bs = arg;
>   	struct drm_i915_gem_execbuffer2 execbuf;
>   	struct drm_i915_gem_exec_object2 obj[2];
> -	const unsigned sz = bs->leak ? 16 << 20 : 4 << 10;
> +	const unsigned sz =
> +		bs->sz ? bs->sz + sizeof(bbe) : bs->leak ? 16 << 20 : 4 << 10;
>   	unsigned engines[16];
>   	unsigned nengine;
>   	unsigned engine;
> @@ -112,7 +114,7 @@ static void *gem_busyspin(void *arg)
>   	obj[0].handle = gem_create(fd, 4096);
>   	obj[0].flags = EXEC_OBJECT_WRITE;
>   	obj[1].handle = gem_create(fd, sz);
> -	gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
> +	gem_write(fd, obj[1].handle, bs->sz, &bbe, sizeof(bbe));

Hm, what was the point of creating large batches here if the batch buffer
end was always first?

>   
>   	memset(&execbuf, 0, sizeof(execbuf));
>   	execbuf.buffers_ptr = (uintptr_t)(obj + !bs->interrupts);
> @@ -125,6 +127,12 @@ static void *gem_busyspin(void *arg)
>   	}
>   
>   	while (!done) {
> +		for (int n = 0; n < nengine; n++) {
> +			const int m = rand() % nengine;
> +			unsigned int tmp = engines[n];
> +			engines[n] = engines[m];
> +			engines[m] = tmp;

igt_exchange_int? The problem with frameworks getting more featureful is
that it is easier to forget what is there. :) Or even igt_permute_array?

But what does it have to do with batch duration?

> +		}
>   		for (int n = 0; n < nengine; n++) {
>   			execbuf.flags &= ~ENGINE_FLAGS;
>   			execbuf.flags |= engines[n];
> @@ -134,7 +142,7 @@ static void *gem_busyspin(void *arg)
>   		if (bs->leak) {
>   			gem_madvise(fd, obj[1].handle, I915_MADV_DONTNEED);
>   			obj[1].handle = gem_create(fd, sz);
> -			gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
> +			gem_write(fd, obj[1].handle, bs->sz, &bbe, sizeof(bbe));
>   		}
>   	}
>   
> @@ -294,6 +302,50 @@ static void *background_fs(void *path)
>   	return NULL;
>   }
>   
> +static unsigned long calibrate_nop(unsigned int target_us,
> +				   unsigned int tolerance_pct)
> +{
> +	const uint32_t bbe = MI_BATCH_BUFFER_END;
> +	const unsigned int loops = 100;
> +	struct drm_i915_gem_exec_object2 obj = {};
> +	struct drm_i915_gem_execbuffer2 eb =
> +		{ .buffer_count = 1, .buffers_ptr = (uintptr_t)&obj};
> +	struct timespec t_0, t_end;
> +	long sz, prev;
> +	int fd;
> +
> +	fd = drm_open_driver(DRIVER_INTEL);
> +
> +	clock_gettime(CLOCK_MONOTONIC, &t_0);
> +
> +	sz = 256 * 1024;
> +	do {
> +		struct timespec t_start;
> +
> +		obj.handle = gem_create(fd, sz + sizeof(bbe));
> +		gem_write(fd, obj.handle, sz, &bbe, sizeof(bbe));
> +		gem_execbuf(fd, &eb);
> +		gem_sync(fd, obj.handle);
> +
> +		clock_gettime(CLOCK_MONOTONIC, &t_start);
> +		for (int loop = 0; loop < loops; loop++)
> +			gem_execbuf(fd, &eb);
> +		gem_sync(fd, obj.handle);
> +		clock_gettime(CLOCK_MONOTONIC, &t_end);
> +
> +		gem_close(fd, obj.handle);
> +
> +		prev = sz;
> +		sz = loops * sz / elapsed(&t_start, &t_end) * 1e3 * target_us;
> +		sz = ALIGN(sz, sizeof(uint32_t));
> +	} while (elapsed(&t_0, &t_end) < 5 ||
> +		 abs(sz - prev) > (sz * tolerance_pct / 100));
> +
> +	close(fd);
> +
> +	return sz;
> +}

I presume this is a copy & paste, so I don't have to look into it in detail.

> +
>   int main(int argc, char **argv)
>   {
>   	struct gem_busyspin *busy;
> @@ -309,9 +361,10 @@ int main(int argc, char **argv)
>   	int enable_gem_sysbusy = 1;
>   	bool leak = false;
>   	bool interrupts = false;
> +	long batch = 0;
>   	int n, c;
>   
> -	while ((c = getopt(argc, argv, "t:f:bmni1")) != -1) {
> +	while ((c = getopt(argc, argv, "r:t:f:bmni1")) != -1) {
>   		switch (c) {
>   		case '1':
>   			ncpus = 1;
> @@ -328,6 +381,10 @@ int main(int argc, char **argv)
>   			if (time < 0)
>   				time = INT_MAX;
>   			break;
> +		case 'r':
> +			/* Duration of each batch (microseconds) */
> +			batch = atoi(optarg);
> +			break;
>   		case 'f':
>   			/* Select an output field */
>   			field = atoi(optarg);
> @@ -350,11 +407,17 @@ int main(int argc, char **argv)
>   	force_low_latency();
>   	min = min_measurement_error();
>   
> +	if (batch > 0)
> +		batch = calibrate_nop(batch, 2);
> +	else
> +		batch = -batch;
> +

I have no idea of the purpose of this. Does the user pass a negative value on
the command line? But then no calibration is done.

>   	busy = calloc(ncpus, sizeof(*busy));
>   	pthread_attr_init(&attr);
>   	if (enable_gem_sysbusy) {
>   		for (n = 0; n < ncpus; n++) {
>   			bind_cpu(&attr, n);
> +			busy[n].sz = batch;
>   			busy[n].leak = leak;
>   			busy[n].interrupts = interrupts;
>   			pthread_create(&busy[n].thread, &attr,
> 

Regards,

Tvrtko

* Re: [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around
  2018-05-22 11:00 [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Chris Wilson
  2018-05-22 11:00 ` [PATCH i-g-t 2/3] benchmarks/gem_syslatency: Allow limiting to just 1 CPU hog Chris Wilson
  2018-05-22 11:00 ` [PATCH i-g-t 3/3] benchmarks/gem_syslatency: Specify batch duration Chris Wilson
@ 2018-05-22 11:24 ` Mika Kuoppala
  2018-05-22 11:28   ` Chris Wilson
  2018-05-22 11:37 ` Tvrtko Ursulin
  3 siblings, 1 reply; 8+ messages in thread
From: Mika Kuoppala @ 2018-05-22 11:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Extend the i915 load to (optionally) pass a write hazard between
> engines, causing us to wait on the interrupt between engines. This
> adds MI_USER_INTERRUPT irq handling to our list of sins.


Is it eb_move_to_gpu then waiting on the object
due to the write?

...and is this then arming the interrupts further down the
chain?

-Mika

>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  benchmarks/gem_syslatency.c | 28 ++++++++++++++++++----------
>  1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
> index de59eaf82..9160e2199 100644
> --- a/benchmarks/gem_syslatency.c
> +++ b/benchmarks/gem_syslatency.c
> @@ -53,6 +53,7 @@ struct gem_busyspin {
>  	pthread_t thread;
>  	unsigned long count;
>  	bool leak;
> +	bool interrupts;
>  };
>  
>  struct sys_wait {
> @@ -94,7 +95,7 @@ static void *gem_busyspin(void *arg)
>  	const uint32_t bbe = MI_BATCH_BUFFER_END;
>  	struct gem_busyspin *bs = arg;
>  	struct drm_i915_gem_execbuffer2 execbuf;
> -	struct drm_i915_gem_exec_object2 obj;
> +	struct drm_i915_gem_exec_object2 obj[2];
>  	const unsigned sz = bs->leak ? 16 << 20 : 4 << 10;
>  	unsigned engines[16];
>  	unsigned nengine;
> @@ -107,13 +108,15 @@ static void *gem_busyspin(void *arg)
>  	for_each_engine(fd, engine)
>  		if (!ignore_engine(fd, engine)) engines[nengine++] = engine;
>  
> -	memset(&obj, 0, sizeof(obj));
> -	obj.handle = gem_create(fd, sz);
> -	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
> +	memset(obj, 0, sizeof(obj));
> +	obj[0].handle = gem_create(fd, 4096);
> +	obj[0].flags = EXEC_OBJECT_WRITE;
> +	obj[1].handle = gem_create(fd, sz);
> +	gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
>  
>  	memset(&execbuf, 0, sizeof(execbuf));
> -	execbuf.buffers_ptr = (uintptr_t)&obj;
> -	execbuf.buffer_count = 1;
> +	execbuf.buffers_ptr = (uintptr_t)(obj + !bs->interrupts);
> +	execbuf.buffer_count = 1 + !!bs->interrupts;
>  	execbuf.flags |= LOCAL_I915_EXEC_HANDLE_LUT;
>  	execbuf.flags |= LOCAL_I915_EXEC_NO_RELOC;
>  	if (__gem_execbuf(fd, &execbuf)) {
> @@ -129,9 +132,9 @@ static void *gem_busyspin(void *arg)
>  		}
>  		bs->count += nengine;
>  		if (bs->leak) {
> -			gem_madvise(fd, obj.handle, I915_MADV_DONTNEED);
> -			obj.handle = gem_create(fd, sz);
> -			gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
> +			gem_madvise(fd, obj[1].handle, I915_MADV_DONTNEED);
> +			obj[1].handle = gem_create(fd, sz);
> +			gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
>  		}
>  	}
>  
> @@ -305,13 +308,17 @@ int main(int argc, char **argv)
>  	int field = -1;
>  	int enable_gem_sysbusy = 1;
>  	bool leak = false;
> +	bool interrupts = false;
>  	int n, c;
>  
> -	while ((c = getopt(argc, argv, "t:f:bmn")) != -1) {
> +	while ((c = getopt(argc, argv, "t:f:bmni")) != -1) {
>  		switch (c) {
>  		case 'n': /* dry run, measure baseline system latency */
>  			enable_gem_sysbusy = 0;
>  			break;
> +		case 'i': /* interrupts ahoy! */
> +			interrupts = true;
> +			break;
>  		case 't':
>  			/* How long to run the benchmark for (seconds) */
>  			time = atoi(optarg);
> @@ -346,6 +353,7 @@ int main(int argc, char **argv)
>  		for (n = 0; n < ncpus; n++) {
>  			bind_cpu(&attr, n);
>  			busy[n].leak = leak;
> +			busy[n].interrupts = interrupts;
>  			pthread_create(&busy[n].thread, &attr,
>  				       gem_busyspin, &busy[n]);
>  		}
> -- 
> 2.17.0

* Re: [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around
  2018-05-22 11:24 ` [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Mika Kuoppala
@ 2018-05-22 11:28   ` Chris Wilson
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Wilson @ 2018-05-22 11:28 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx; +Cc: igt-dev

Quoting Mika Kuoppala (2018-05-22 12:24:59)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > Extend the i915 load to (optionally) pass a write hazard between
> > engines, causing us to wait on the interrupt between engines. This
> > adds MI_USER_INTERRUPT irq handling to our list of sins.
> 
> 
> Is it eb_move_to_gpu then waiting on the object
> due to the write?

Don't be silly! That was like 3 years ago :-p
 
> ...and is this then arming the interrupts further down the
> chain?

i915_gem_request_await_object adds the callback for the request to be
submitted when its dependencies are complete.
-Chris

* Re: [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around
  2018-05-22 11:00 [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Chris Wilson
                   ` (2 preceding siblings ...)
  2018-05-22 11:24 ` [PATCH i-g-t 1/3] benchmarks/gem_syslatency: Pass a write hazard around Mika Kuoppala
@ 2018-05-22 11:37 ` Tvrtko Ursulin
  3 siblings, 0 replies; 8+ messages in thread
From: Tvrtko Ursulin @ 2018-05-22 11:37 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: igt-dev


On 22/05/2018 12:00, Chris Wilson wrote:
> Extend the i915 load to (optionally) pass a write hazard between
> engines, causing us to wait on the interrupt between engines. This
> adds MI_USER_INTERRUPT irq handling to our list of sins.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   benchmarks/gem_syslatency.c | 28 ++++++++++++++++++----------
>   1 file changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/benchmarks/gem_syslatency.c b/benchmarks/gem_syslatency.c
> index de59eaf82..9160e2199 100644
> --- a/benchmarks/gem_syslatency.c
> +++ b/benchmarks/gem_syslatency.c
> @@ -53,6 +53,7 @@ struct gem_busyspin {
>   	pthread_t thread;
>   	unsigned long count;
>   	bool leak;
> +	bool interrupts;
>   };
>   
>   struct sys_wait {
> @@ -94,7 +95,7 @@ static void *gem_busyspin(void *arg)
>   	const uint32_t bbe = MI_BATCH_BUFFER_END;
>   	struct gem_busyspin *bs = arg;
>   	struct drm_i915_gem_execbuffer2 execbuf;
> -	struct drm_i915_gem_exec_object2 obj;
> +	struct drm_i915_gem_exec_object2 obj[2];
>   	const unsigned sz = bs->leak ? 16 << 20 : 4 << 10;
>   	unsigned engines[16];
>   	unsigned nengine;
> @@ -107,13 +108,15 @@ static void *gem_busyspin(void *arg)
>   	for_each_engine(fd, engine)
>   		if (!ignore_engine(fd, engine)) engines[nengine++] = engine;
>   
> -	memset(&obj, 0, sizeof(obj));
> -	obj.handle = gem_create(fd, sz);
> -	gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
> +	memset(obj, 0, sizeof(obj));
> +	obj[0].handle = gem_create(fd, 4096);
> +	obj[0].flags = EXEC_OBJECT_WRITE;
> +	obj[1].handle = gem_create(fd, sz);
> +	gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
>   
>   	memset(&execbuf, 0, sizeof(execbuf));
> -	execbuf.buffers_ptr = (uintptr_t)&obj;
> -	execbuf.buffer_count = 1;
> +	execbuf.buffers_ptr = (uintptr_t)(obj + !bs->interrupts);
> +	execbuf.buffer_count = 1 + !!bs->interrupts;

The above two lines are too hacky. :/ I suggest a more pedestrian approach
with a ternary or something.

>   	execbuf.flags |= LOCAL_I915_EXEC_HANDLE_LUT;
>   	execbuf.flags |= LOCAL_I915_EXEC_NO_RELOC;
>   	if (__gem_execbuf(fd, &execbuf)) {
> @@ -129,9 +132,9 @@ static void *gem_busyspin(void *arg)
>   		}
>   		bs->count += nengine;
>   		if (bs->leak) {
> -			gem_madvise(fd, obj.handle, I915_MADV_DONTNEED);
> -			obj.handle = gem_create(fd, sz);
> -			gem_write(fd, obj.handle, 0, &bbe, sizeof(bbe));
> +			gem_madvise(fd, obj[1].handle, I915_MADV_DONTNEED);
> +			obj[1].handle = gem_create(fd, sz);
> +			gem_write(fd, obj[1].handle, 0, &bbe, sizeof(bbe));
>   		}
>   	}
>   
> @@ -305,13 +308,17 @@ int main(int argc, char **argv)
>   	int field = -1;
>   	int enable_gem_sysbusy = 1;
>   	bool leak = false;
> +	bool interrupts = false;
>   	int n, c;
>   
> -	while ((c = getopt(argc, argv, "t:f:bmn")) != -1) {
> +	while ((c = getopt(argc, argv, "t:f:bmni")) != -1) {
>   		switch (c) {
>   		case 'n': /* dry run, measure baseline system latency */
>   			enable_gem_sysbusy = 0;
>   			break;
> +		case 'i': /* interrupts ahoy! */
> +			interrupts = true;
> +			break;
>   		case 't':
>   			/* How long to run the benchmark for (seconds) */
>   			time = atoi(optarg);
> @@ -346,6 +353,7 @@ int main(int argc, char **argv)
>   		for (n = 0; n < ncpus; n++) {
>   			bind_cpu(&attr, n);
>   			busy[n].leak = leak;
> +			busy[n].interrupts = interrupts;
>   			pthread_create(&busy[n].thread, &attr,
>   				       gem_busyspin, &busy[n]);
>   		}
> 

With the hackery eliminated:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
