Re: [PATCH 12/13] drm/i915: Consolidate legacy semaphore initialization

From: Dave Gordon <david.s.gordon@intel.com>
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	Chris Wilson <chris@chris-wilson.co.uk>,
	Intel-gfx@lists.freedesktop.org
Subject: Re: [PATCH 12/13] drm/i915: Consolidate legacy semaphore initialization
Date: Wed, 20 Jul 2016 13:50:37 +0100
Message-ID: <578F739D.3040302@intel.com>
In-Reply-To: <578F4A3D.4020707@linux.intel.com>

On 20/07/16 10:54, Tvrtko Ursulin wrote:
>
> On 19/07/16 19:38, Dave Gordon wrote:
>> On 15/07/16 14:13, Tvrtko Ursulin wrote:
>>>
>>> On 29/06/16 17:00, Chris Wilson wrote:
>>>> On Wed, Jun 29, 2016 at 04:41:58PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 29/06/16 16:34, Chris Wilson wrote:
>>>>>> On Wed, Jun 29, 2016 at 04:09:31PM +0100, Tvrtko Ursulin wrote:
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>
>>>>>>> Replace per-engine initialization with a common half-programmatic,
>>>>>>> half-data driven code for ease of maintenance and compactness.
>>>>>>>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> This is the biggest pill to swallow (since our 5x5 table is only
>>>>>> sparsely populated), but it looks correct, and more importantly
>>>>>> easier to read.
>>>>>
>>>>> Yeah I was out of ideas on how to improve it. Fresh mind needed to
>>>>> try and spot a pattern in how MI_SEMAPHORE_SYNC_* and GEN6_*SYNC map
>>>>> to bits and registers respectively, and write it as a function.
>>>>
>>>> It's actually a very simple cyclic function based on register
>>>> offset = base + (signaler hw_id - waiter hw_id - 1) % num_rings.
>>>>
>>>> (The only real challenge is picking the direction.)
>>>>
>>>> commit c8c99b0f0dea1ced5d0e10cdb9143356cc16b484
>>>> Author: Ben Widawsky <ben@bwidawsk.net>
>>>> Date:   Wed Sep 14 20:32:47 2011 -0700
>>>>
>>>>      drm/i915: Dumb down the semaphore logic
>>>>
>>>>      While I think the previous code is correct, it was hard to follow and
>>>>      hard to debug. Since we already have a ring abstraction, might as well
>>>>      use it to handle the semaphore updates and compares.
>>>
>>> Doesn't seem to fit, or I just can't figure it out. Needs two functions
>>> to get rid of the table:
>>>
>>> f1(0, 1) = 2
>>> f1(0, 2) = 0
>>> f1(0, 3) = 2
>>> f1(1, 0) = 0
>>> f1(1, 2) = 2
>>> f1(1, 3) = 1
>>> f1(2, 0) = 2
>>> f1(2, 1) = 0
>>> f1(2, 3) = 0
>>> f1(3, 0) = 1
>>> f1(3, 1) = 1
>>> f1(3, 2) = 1
>>>
>>> and:
>>>
>>> f2(0, 1) = 1
>>> f2(0, 2) = 0
>>> f2(0, 3) = 1
>>> f2(1, 0) = 0
>>> f2(1, 2) = 1
>>> f2(1, 3) = 2
>>> f2(2, 0) = 1
>>> f2(2, 1) = 0
>>> f2(2, 3) = 0
>>> f2(3, 0) = 2
>>> f2(3, 1) = 2
>>> f2(3, 2) = 2
>>>
>>> A weekend math puzzle for someone? :)
>>>
>>> Regards,
>>> Tvrtko
>>
>> Here's the APL expression for (the transpose of) f2, with -1's filled in
>> along the leading diagonal (you need ⎕io←0 so the ⍳-vectors are in
>> origin 0)
>>
>>        {¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>
>> ┌────────┬────────┬────────┬────────┐
>> │¯1 0 1 2│1 ¯1 0 2│0 1 ¯1 2│1 2 0 ¯1│
>> └────────┴────────┴────────┴────────┘
>>
>> or transposed back so that the first argument is the row index and the
>> second is the column index:
>>
>>        ⍉↑{¯1+(⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))1+⍳3}¨⍳4
>>
>>  ¯1  1  0  1
>>   0 ¯1  1  2
>>   1  0 ¯1  0
>>   2  2  2 ¯1
>>
>> http://tryapl.org/?a=%u2349%u2191%7B%AF1+%28%u2375%u2260%u23734%29%u2340%282%7C%u2375%29%u233D%28%u233D%u2363%281%3D%u2375%29%291+%u23733%7D%A8%u23734&run
>>
>
>   :-C ! How to convert that to C ? :)
>
>> f1 is trivially derived from this by the observation that f1 is just f2
>> with the 1's and 2's interchanged.
>
> Ah yes, nicely spotted.
>
> Regards,
> Tvrtko

Assuming you don't care about the leading diagonal (x == y), then

  (⍵≠⍳4)⍀(2|⍵)⌽(⌽⍣(1=⍵))

translates into:

int f2(unsigned int x, unsigned int y)
{
     x -= x >= y;
     if (y == 1)
         x = 3 - x;
     x += y & 1;
     return x % 3;
}

This gives (rows are y, columns are x; the x == y entries are
don't-cares):

y:x 0 1 2 3
0:  0 0 1 2
1:  1 1 0 2
2:  0 1 1 2
3:  1 2 0 0
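
As a sanity check, the function can be compared against the f1 and f2
values posted earlier in the thread. This is only a sketch: f1() (built
on the quoted observation that f1 is f2 with the 1's and 2's
interchanged, i.e. (3 - v) % 3 for v in 0-2), the two tables and
count_mismatches() are names made up here for illustration, not anything
in the driver.

```c
/* f2() as given above, repeated so this sketch stands alone. */
static int f2(unsigned int x, unsigned int y)
{
    x -= x >= y;
    if (y == 1)
        x = 3 - x;
    x += y & 1;
    return x % 3;
}

/* Hypothetical companion: f1 is f2 with 1 and 2 swapped, 0 unchanged. */
static int f1(unsigned int x, unsigned int y)
{
    return (3 - f2(x, y)) % 3;
}

/*
 * Values copied from the f1/f2 lists earlier in the thread, indexed
 * [first argument][second argument]; -1 marks the unused diagonal.
 */
static const int f1_posted[4][4] = {
    { -1,  2,  0,  2 },
    {  0, -1,  2,  1 },
    {  2,  0, -1,  0 },
    {  1,  1,  1, -1 },
};

static const int f2_posted[4][4] = {
    { -1,  1,  0,  1 },
    {  0, -1,  1,  2 },
    {  1,  0, -1,  0 },
    {  2,  2,  2, -1 },
};

/* Count off-diagonal entries where f1()/f2() disagree with the lists. */
static int count_mismatches(void)
{
    unsigned int x, y;
    int bad = 0;

    for (x = 0; x < 4; x++) {
        for (y = 0; y < 4; y++) {
            if (x == y)
                continue;
            bad += f1(x, y) != f1_posted[x][y];
            bad += f2(x, y) != f2_posted[x][y];
        }
    }
    return bad;
}
```

count_mismatches() returning 0 means both functions agree with all
twelve posted values each.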

Each line of the C f2() corresponds quite closely to one operation in
the APL :) In APL, though, we tend to leave the data unchanged while
shuffling it around into new shapes, whereas the C f2() does the
equivalent things by changing the data (noting that it's all modulo-3
arithmetic).

  (⍵≠⍳4)⍀  inserts the leading diagonal; in C, subtracting (x >= y)
           does the inverse, skipping over the diagonal so that the
           four values of x map onto the three-element sequence.

  ⌽⍣(1=⍵)  reverses the sequence if y == 1; in C, that's the 3 - x.

  (2|⍵)⌽   rotates the sequence by 1 if y is odd; that's the += (y & 1).

and the final % 3 ensures that the result is in the range 0-2.
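
For comparison, here is a sketch of the APL approach written out
literally in C, shuffling a fixed sequence rather than adjusting the
index. f2_column() is a hypothetical helper, not part of the patch; it
fills out[x] with f2(x, y) for one value of y, with -1 on the diagonal
as in the APL output earlier in the thread.

```c
/*
 * Build one column of the f2 table the "APL way": keep the data and
 * move it around. out[x] receives f2(x, y); out[y] is set to -1.
 */
static void f2_column(unsigned int y, int out[4])
{
    int seq[3] = { 1, 2, 3 };           /* 1+⍳3 */
    int rot[3];
    unsigned int i, j;

    if (y == 1) {                       /* ⌽⍣(1=⍵): reverse if y == 1 */
        int t = seq[0];

        seq[0] = seq[2];
        seq[2] = t;
    }

    for (i = 0; i < 3; i++)             /* (2|⍵)⌽: rotate left by y mod 2 */
        rot[i] = seq[(i + (y & 1)) % 3];

    for (i = 0, j = 0; i < 4; i++)      /* (⍵≠⍳4)⍀: expand, 0 where x == y */
        out[i] = (i == y) ? 0 : rot[j++];

    for (i = 0; i < 4; i++)             /* ¯1+: shift down to 0-2 (and -1) */
        out[i] -= 1;
}
```

Calling it for y = 0..3 should reproduce the four boxed vectors from
the first APL expression.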

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx