Re: [igt-dev] [PATCH i-g-t] i915/gem_exec_balancer: Check for scheduling bonded-pairs on the same engine

From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: igt-dev@lists.freedesktop.org, Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Subject: Re: [igt-dev] [PATCH i-g-t] i915/gem_exec_balancer: Check for scheduling bonded-pairs on the same engine
Date: Mon, 23 Sep 2019 17:21:50 +0100	[thread overview]
Message-ID: <c05c43c9-f0a3-c304-aa3e-a4ce08a093c3@linux.intel.com> (raw)
In-Reply-To: <156925342367.29243.9296989859129945270@skylake-alporthouse-com>

On 23/09/2019 16:43, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-23 15:29:11)
>>
>> On 20/09/2019 23:26, Chris Wilson wrote:
>>> The expectation for bonded submission is that they are run concurrently,
>>> in parallel on multiple engines. However, given a lack of constraints in
>>> the scheduler's selection combined with timeslicing could mean that the
>>> bonded requests could be run in opposite order on the same engine. With
>>> just the right pair of requests, this can cause a GPU hang (or at least
>>> trigger hangchecker), best (worst) case would be execution running
>>> several times slower than ideal.
>>
>> I don't see any bonding being setup?
>>
>> [comes back later]
>>
>> Oh you used only the submit fence and not actually bonds. But you also
>> don't use the virtual engine at all?
> 
> A is using either of the 2 real engines, B is using the virtual engine
> to select the other available engine. Bonding in this case is just that
> the requests are bonded together to run in parallel.
> 
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    tests/i915/gem_exec_balancer.c | 151 +++++++++++++++++++++++++++++++++
>>>    1 file changed, 151 insertions(+)
>>>
>>> diff --git a/tests/i915/gem_exec_balancer.c b/tests/i915/gem_exec_balancer.c
>>> index 407dc0eca..e4fe75747 100644
>>> --- a/tests/i915/gem_exec_balancer.c
>>> +++ b/tests/i915/gem_exec_balancer.c
>>> @@ -30,6 +30,15 @@
>>>    
>>>    IGT_TEST_DESCRIPTION("Exercise in-kernel load-balancing");
>>>    
>>> +#define MI_SEMAPHORE_WAIT            (0x1c << 23)
>>> +#define   MI_SEMAPHORE_POLL             (1 << 15)
>>> +#define   MI_SEMAPHORE_SAD_GT_SDD       (0 << 12)
>>> +#define   MI_SEMAPHORE_SAD_GTE_SDD      (1 << 12)
>>> +#define   MI_SEMAPHORE_SAD_LT_SDD       (2 << 12)
>>> +#define   MI_SEMAPHORE_SAD_LTE_SDD      (3 << 12)
>>> +#define   MI_SEMAPHORE_SAD_EQ_SDD       (4 << 12)
>>> +#define   MI_SEMAPHORE_SAD_NEQ_SDD      (5 << 12)
>>> +
>>>    #define INSTANCE_COUNT (1 << I915_PMU_SAMPLE_INSTANCE_BITS)
>>>    
>>>    static size_t sizeof_load_balance(int count)
>>> @@ -694,6 +703,145 @@ static void bonded(int i915, unsigned int flags)
>>>        gem_context_destroy(i915, master);
>>>    }
>>>    
>>> +static unsigned int offset_in_page(void *addr)
>>> +{
>>> +     return (uintptr_t)addr & 4095;
>>> +}
>>> +
>>> +static uint32_t create_semaphore_to_spinner(int i915, igt_spin_t *spin)
>>> +{
>>> +     uint32_t *cs, *map;
>>> +     uint32_t handle;
>>> +
>>> +     handle = gem_create(i915, 4096);
>>> +     cs = map = gem_mmap__cpu(i915, handle, 0, 4096, PROT_WRITE);
>>> +
>>> +     /* Wait until the spinner is running */
>>> +     *cs++ = MI_SEMAPHORE_WAIT |
>>> +             MI_SEMAPHORE_POLL |
>>> +             MI_SEMAPHORE_SAD_NEQ_SDD |
>>> +             (4 - 2);
>>> +     *cs++ = 0;
>>> +     *cs++ = spin->obj[0].offset + 4 * SPIN_POLL_START_IDX;
>>> +     *cs++ = 0;
>>> +
>>> +     /* Then cancel the spinner */
>>> +     *cs++ = MI_STORE_DWORD_IMM;
>>> +     *cs++ = spin->obj[IGT_SPIN_BATCH].offset +
>>> +             offset_in_page(spin->condition);
>>> +     *cs++ = 0;
>>> +     *cs++ = MI_BATCH_BUFFER_END;
>>> +
>>> +     *cs++ = MI_BATCH_BUFFER_END;
>>> +     munmap(map, 4096);
>>> +
>>> +     return handle;
>>> +}
>>> +
>>> +static void bonded_slice(int i915)
>>> +{
>>> +     uint32_t ctx;
>>> +     int *stop;
>>> +
>>> +     igt_require(gem_scheduler_has_semaphores(i915));
>>> +
>>> +     stop = mmap(0, 4096, PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0);
>>> +     igt_assert(stop != MAP_FAILED);
>>> +
>>> +     ctx = gem_context_create(i915);
>>> +
>>> +     for (int class = 0; class < 32; class++) {
>>> +             struct i915_engine_class_instance *siblings;
>>> +             struct drm_i915_gem_exec_object2 obj[3] = {};
>>> +             struct drm_i915_gem_execbuffer2 eb = {};
>>> +             unsigned int count;
>>> +             igt_spin_t *spin;
>>> +
>>> +             siblings = list_engines(i915, 1u << class, &count);
>>> +             if (!siblings)
>>> +                     continue;
>>> +
>>> +             if (count < 2) {
>>> +                     free(siblings);
>>> +                     continue;
>>> +             }
>>> +
>>> +             /*
>>> +              * A: semaphore wait on spinner; cancel spinner
>>> +              * B: unpreemptable spinner
>>> +              *
>>> +              * A waits for running ack from B, if scheduled on the same
>>> +              * engine -> hang.
>>> +              *
>>> +              * C+: background load across engines
>>> +              */
>>> +
>>> +             set_load_balancer(i915, ctx, siblings, count, NULL);
>>> +
>>> +             spin = __igt_spin_new(i915,
>>> +                                   .ctx = ctx,
>>> +                                   .flags = (IGT_SPIN_NO_PREEMPTION |
>>> +                                             IGT_SPIN_POLL_RUN));
>>> +             igt_spin_end(spin); /* we just want its address for later */
>>> +             gem_sync(i915, spin->handle);
>>> +             igt_spin_reset(spin);
>>> +
>>> +             obj[0] = spin->obj[0];
>>> +             obj[1] = spin->obj[1];

igt_assert_eq(IGT_SPIN_BATCH, 1);

?

>>> +             obj[2].handle = create_semaphore_to_spinner(i915, spin);
>>> +
>>> +             eb.buffers_ptr = to_user_pointer(obj);
>>> +             eb.rsvd1 = ctx;
>>> +
>>> +             *stop = 0;
>>> +             igt_fork(child, count + 1) {
>>> +                     igt_list_del(&spin->link);
>>> +
>>> +                     ctx = gem_context_clone(i915, ctx,
>>> +                                             I915_CONTEXT_CLONE_ENGINES, 0);
>>> +
>>> +                     while (!READ_ONCE(*stop)) {
>>> +                             spin = igt_spin_new(i915,
>>> +                                                 .ctx = ctx,
>>> +                                                 .engine = (1 + rand() % count),
>>
>> With "count + 1" children and rand load my end up uneven across engines
>> - are you happy with that?
> 
> It's using rand, it's going to be uneven. count + 1 isn't significant in
> any way. I stopped as soon as I had the test reliably hitting the issue.
> 
>>> +                                                 .flags = IGT_SPIN_POLL_RUN);
>>> +                             igt_spin_busywait_until_started(spin);
>>> +                             usleep(50000);
>>
>> 50ms, hm, ideally there should be a pipe signal before parent starts the
>> test to know children have started. Otherwise parent can finish before
>> they even start, no?
> 
> The children are just to provide noise. The requirement is that we have
> enough load across the system to cause timeslicing to kick in.
>   
>>> +                             igt_spin_free(i915, spin);
>>> +                     }
>>> +
>>> +                     gem_context_destroy(i915, ctx);
>>> +             }
>>> +
>>> +             igt_until_timeout(5) {
>>> +                     igt_spin_reset(spin);
>>
>> What is the reset for?
> 
> We are reusing the spinner inside the loop.
> 
>>> +
>>> +                     /* A: Submit the semaphore wait */
>>> +                     eb.buffer_count = 3;
>>> +                     eb.flags = (1 + rand() % count) | I915_EXEC_FENCE_OUT;
>>> +                     gem_execbuf_wr(i915, &eb);
>>> +
>>> +                     /* B: Submit the spinner (in parallel) */
>>
>> How in parallel when it is the same context so they are implicitly in order?
> 
> Different engines with different timelines, using the submit to request
> parallel execution.

Yeah I kind of missed a few things. Looks good. For completeness you 
should also add a flavour which actually sets up the bond so the "if 
(bond)" path in virtual_bond_execute is also exercised. But this looks good.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

_______________________________________________
igt-dev mailing list
igt-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/igt-dev