Re: [i-g-t PATCH v7 2/5] lib: add igt_dummyload

From: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
To: "Chris Wilson" <chris@chris-wilson.co.uk>,
	intel-gfx@lists.freedesktop.org,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Ville Syrjälä" <ville.syrjala@linux.intel.com>,
	tomeu@tomeuvizoso.net
Subject: Re: [i-g-t PATCH v7 2/5] lib: add igt_dummyload
Date: Wed, 16 Nov 2016 16:07:33 +0200
Message-ID: <686aed2f-7fc5-a7bc-fc76-9221ff681fda@linux.intel.com>
In-Reply-To: <20161116135659.GE22232@nuc-i3427.alporthouse.com>

On 16.11.2016 15:56, Chris Wilson wrote:
> On Wed, Nov 16, 2016 at 11:18:01PM +0200, Abdiel Janulgue wrote:
>> A lot of igt testcases need some GPU workload to make sure a race
>> window is big enough. Unfortunately having a fixed amount of
>> workload leads to spurious test failures or overly long runtimes
>> on some fast/slow platforms. This library contains functionality
>> to submit GPU workloads that should consume exactly a specific
>> amount of time.
>>
>> v2 : Add recursive batch feature from Chris
>> v3 : Drop auto-tuned stuff. Add bo dependency to recursive batch
>>      by adding a dummy reloc to the bo as suggested by Ville.
>> v4:  Fix dependency reloc as write instead of read (Ville).
>>      Fix wrong handling of batchbuffer start on ILK causing
>>      test failure
>> v5:  Convert kms_busy to use this api
>> v6:  Add this library to docs
>> v7:  Document global use of batch, reuse defines
>>      Minor code cleanups.
>>      Rename igt_spin_batch and igt_post_spin_batch to
>>      igt_spin_batch_new and igt_spin_batch_free
>>      respectively (Tomeu Vizoso).
>>      Fix error in dependency relocation handling in HSW causing
>>      tests to fail.
>>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: tomeu@tomeuvizoso.net
>> Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>> ---
>>  .../intel-gpu-tools/intel-gpu-tools-docs.xml       |   1 +
>>  lib/Makefile.sources                               |   2 +
>>  lib/igt.h                                          |   1 +
>>  lib/igt_dummyload.c                                | 281 +++++++++++++++++++++
>>  lib/igt_dummyload.h                                |  43 ++++
>>  5 files changed, 328 insertions(+)
>>  create mode 100644 lib/igt_dummyload.c
>>  create mode 100644 lib/igt_dummyload.h
>>
>> diff --git a/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml b/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
>> index c862f2a..55902ab 100644
>> --- a/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
>> +++ b/docs/reference/intel-gpu-tools/intel-gpu-tools-docs.xml
>> @@ -32,6 +32,7 @@
>>      <xi:include href="xml/intel_io.xml"/>
>>      <xi:include href="xml/igt_vc4.xml"/>
>>      <xi:include href="xml/igt_vgem.xml"/>
>> +    <xi:include href="xml/igt_dummyload.xml"/>
>>    </chapter>
>>    <xi:include href="xml/igt_test_programs.xml"/>
>>  
>> diff --git a/lib/Makefile.sources b/lib/Makefile.sources
>> index e8e277b..7fc5ec2 100644
>> --- a/lib/Makefile.sources
>> +++ b/lib/Makefile.sources
>> @@ -75,6 +75,8 @@ lib_source_list =	 	\
>>  	igt_draw.h		\
>>  	igt_pm.c		\
>>  	igt_pm.h		\
>> +	igt_dummyload.c		\
>> +	igt_dummyload.h		\
>>  	uwildmat/uwildmat.h	\
>>  	uwildmat/uwildmat.c	\
>>  	$(NULL)
>> diff --git a/lib/igt.h b/lib/igt.h
>> index d751f24..a0028d5 100644
>> --- a/lib/igt.h
>> +++ b/lib/igt.h
>> @@ -32,6 +32,7 @@
>>  #include "igt_core.h"
>>  #include "igt_debugfs.h"
>>  #include "igt_draw.h"
>> +#include "igt_dummyload.h"
>>  #include "igt_fb.h"
>>  #include "igt_gt.h"
>>  #include "igt_kms.h"
>> diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c
>> new file mode 100644
>> index 0000000..d266195
>> --- /dev/null
>> +++ b/lib/igt_dummyload.c
>> @@ -0,0 +1,281 @@
>> +/*
>> + * Copyright © 2016 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the next
>> + * paragraph) shall be included in all copies or substantial portions of the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
>> + * IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include "igt.h"
>> +#include "igt_dummyload.h"
>> +#include <time.h>
>> +#include <signal.h>
>> +#include <sys/syscall.h>
>> +
>> +/**
>> + * SECTION:igt_dummyload
>> + * @short_description: Library for submitting GPU workloads
>> + * @title: Dummyload
>> + * @include: igt.h
>> + *
>> + * A lot of igt testcases need some GPU workload to make sure a race window is
>> + * big enough. Unfortunately having a fixed amount of workload leads to
>> + * spurious test failures or overly long runtimes on some fast/slow platforms.
>> + * This library contains functionality to submit GPU workloads that should
>> + * consume exactly a specific amount of time.
>> + */
>> +
>> +#define LOCAL_I915_EXEC_BSD_SHIFT      (13)
>> +#define LOCAL_I915_EXEC_BSD_MASK       (3 << LOCAL_I915_EXEC_BSD_SHIFT)
>> +
>> +#define ENGINE_MASK  (I915_EXEC_RING_MASK | LOCAL_I915_EXEC_BSD_MASK)
>> +
>> +static const int bo_size = 4096;
>> +
>> +static void
>> +fill_object(struct drm_i915_gem_exec_object2 *obj, uint32_t gem_handle,
>> +	    struct drm_i915_gem_relocation_entry *relocs, uint32_t count)
>> +{
>> +	memset(obj, 0, sizeof(*obj));
>> +	obj->handle = gem_handle;
>> +	obj->relocation_count = count;
>> +	obj->relocs_ptr = (uintptr_t)relocs;
>> +}
>> +
>> +static void
>> +fill_reloc(struct drm_i915_gem_relocation_entry *reloc,
>> +	   uint32_t gem_handle, uint32_t offset,
>> +	   uint32_t read_domains, uint32_t write_domains)
>> +{
>> +	reloc->target_handle = gem_handle;
>> +	reloc->delta = 0;
>> +	reloc->offset = offset * sizeof(uint32_t);
>> +	reloc->presumed_offset = 0;
>> +	reloc->read_domains = read_domains;
>> +	reloc->write_domain = write_domains;
>> +}
>> +
>> +/*
>> + * Needs to be global. Signal handlers don't accept arguments
>> + */
>> +static uint32_t *batch;
>> +
>> +static uint32_t emit_recursive_batch(int fd, int engine, unsigned dep_handle)
>> +{
>> +	const int gen = intel_gen(intel_get_drm_devid(fd));
>> +	struct drm_i915_gem_exec_object2 obj[2];
>> +	struct drm_i915_gem_relocation_entry relocs[2];
>> +	struct drm_i915_gem_execbuffer2 execbuf;
>> +	unsigned engines[16];
>> +	unsigned nengine, handle;
>> +	int i = 0, reloc_count = 0, buf_count = 0;
>> +
>> +	buf_count = 0;
>> +	nengine = 0;
>> +	if (engine < 0) {
>> +		for_each_engine(fd, engine)
>> +			if (engine)
>> +				engines[nengine++] = engine;
>> +	} else {
>> +		gem_require_ring(fd, engine);
>> +		engines[nengine++] = engine;
>> +	}
>> +	igt_require(nengine);
>> +
>> +	memset(&execbuf, 0, sizeof(execbuf));
>> +	memset(obj, 0, sizeof(obj));
>> +	memset(relocs, 0, sizeof(relocs));
>> +
>> +	execbuf.buffers_ptr = (uintptr_t) obj;
>> +	handle = gem_create(fd, bo_size);
>> +	batch = gem_mmap__gtt(fd, handle, bo_size, PROT_WRITE);
>> +	igt_assert(batch);
>> +	gem_set_domain(fd, handle,
>> +			I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
>> +
>> +	if (gen >= 8) {
>> +		batch[i++] = MI_BATCH_BUFFER_START | 1 << 8 | 1;
>> +		/* recurse */
>> +		fill_reloc(&relocs[reloc_count], handle, i,
>> +			   I915_GEM_DOMAIN_COMMAND, 0);
>> +		batch[i++] = 0;
>> +		batch[i++] = 0;
>> +	} else if (gen >= 6) {
>> +		batch[i++] = MI_BATCH_BUFFER_START | 1 << 8;
>> +		/* recurse */
>> +		fill_reloc(&relocs[reloc_count], handle, i,
>> +			   I915_GEM_DOMAIN_COMMAND, 0);
>> +		batch[i++] = 0;
>> +	} else {
>> +		batch[i++] = MI_BATCH_BUFFER_START | 2 << 6 |
>> +			((gen < 4) ? 1 : 0);
>> +		/* recurse */
>> +		fill_reloc(&relocs[reloc_count], handle, i,
>> +			   I915_GEM_DOMAIN_COMMAND, 0);
>> +		batch[i++] = 0;
>> +		if (gen < 4)
>> +			relocs[reloc_count].delta = 1;
>> +	}
>> +	reloc_count++;
>> +
>> +	if (dep_handle > 0) {
>> +		igt_assert(nengine == 1);
>> +		/* dummy write to dependency */
>> +		fill_object(&obj[buf_count], dep_handle, NULL, 0);
>> +		buf_count++;
>> +
>> +		fill_reloc(&relocs[reloc_count], dep_handle, 256,
>> +			   I915_GEM_DOMAIN_RENDER,
>> +			   I915_GEM_DOMAIN_RENDER);
>> +		reloc_count++;
>> +	
> 
> This is not the dummy load you were looking for. This is an infinite
> walk with no termination condition.

This is based on the recursive batch setup in kms_busy's previous
make_fb_busy(). Why does that one work for making the bo busy? Or did I
miss something?
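
For reference, a minimal sketch of the termination step under discussion.
A batch whose first command jumps back to its own start keeps the bo busy
until something overwrites that first dword with MI_BATCH_BUFFER_END. The
comment on the global batch pointer suggests a signal handler is meant to
do this, but that part of the patch is not quoted here, so treat the
following as an assumption about the general technique rather than the
patch's code. The buffer is plain memory instead of a GTT mapping, and
spin_emit(), spin_end() and spin_simulate() are hypothetical names; the
last one is only a toy model of the command streamer.

```c
#include <assert.h>
#include <stdint.h>

/* MI command opcodes live in bits 28:23 of the first dword. */
#define MI_OPCODE_MASK        (0x3fu << 23)
#define MI_BATCH_BUFFER_END   (0x0au << 23)
#define MI_BATCH_BUFFER_START (0x31u << 23)

/* Hypothetical helper: emit a batch that jumps back to its own start. */
static void spin_emit(uint32_t *batch)
{
	/* gen6/gen7 encoding, as in the quoted patch */
	batch[0] = MI_BATCH_BUFFER_START | 1 << 8;
	/* address of batch[0]; the kernel would patch this via the reloc */
	batch[1] = 0;
}

/* Hypothetical helper: replace the jump so the next pass terminates. */
static void spin_end(uint32_t *batch)
{
	batch[0] = MI_BATCH_BUFFER_END;
	/* make sure the write is visible before anyone waits on the bo */
	__sync_synchronize();
}

/*
 * Toy model of the command streamer, NOT real hardware behaviour:
 * re-reads batch[0] once per pass and stops on MI_BATCH_BUFFER_END.
 * end_at is the pass on which the "CPU" calls spin_end(); use a
 * negative value to never end the batch. Returns the pass on which
 * the batch terminated, or -1 if it was still spinning after
 * max_passes.
 */
static int spin_simulate(uint32_t *batch, int max_passes, int end_at)
{
	for (int pass = 0; pass < max_passes; pass++) {
		if (pass == end_at)
			spin_end(batch);
		if ((batch[0] & MI_OPCODE_MASK) == MI_BATCH_BUFFER_END)
			return pass;
	}
	return -1;
}
```

Without a call to spin_end() the model never leaves the loop on its own,
which is the "no termination condition" case; with it, the batch retires
on the pass where the first dword is rewritten.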

> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx