All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
To: Chris Wilson <chris@chris-wilson.co.uk>, intel-gfx@lists.freedesktop.org
Cc: Matthew Auld <matthew.auld@intel.com>
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait
Date: Thu, 16 Jul 2020 09:41:17 +0100	[thread overview]
Message-ID: <1019ccaf-383e-2c45-e82f-823669607c62@linux.intel.com> (raw)
In-Reply-To: <159482447053.13728.16851584647996091210@build.alporthouse.com>


On 15/07/2020 15:47, Chris Wilson wrote:
> Quoting Chris Wilson (2020-07-15 15:47:15)
>> Quoting Tvrtko Ursulin (2020-07-15 13:26:23)
>>>
>>> On 15/07/2020 13:06, Tvrtko Ursulin wrote:
>>>>
>>>> On 15/07/2020 11:50, Chris Wilson wrote:
>>>>> Currently, we use i915_request_completed() directly in
>>>>> i915_request_wait() and follow up with a manual invocation of
>>>>> dma_fence_signal(). This appears to cause a large number of contentions
>>>>> on i915_request.lock as when the process is woken up after the fence is
>>>>> signaled by an interrupt, we will then try and call dma_fence_signal()
>>>>> ourselves while the signaler is still holding the lock.
>>>>> dma_fence_is_signaled() has the benefit of checking the
>>>>> DMA_FENCE_FLAG_SIGNALED_BIT prior to calling dma_fence_signal() and so
>>>>> avoids most of that contention.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Matthew Auld <matthew.auld@intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_request.c | 12 ++++--------
>>>>>    1 file changed, 4 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_request.c
>>>>> b/drivers/gpu/drm/i915/i915_request.c
>>>>> index 0b2fe55e6194..bb4eb1a8780e 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>>>> @@ -1640,7 +1640,7 @@ static bool busywait_stop(unsigned long timeout,
>>>>> unsigned int cpu)
>>>>>        return this_cpu != cpu;
>>>>>    }
>>>>> -static bool __i915_spin_request(const struct i915_request * const rq,
>>>>> int state)
>>>>> +static bool __i915_spin_request(struct i915_request * const rq, int
>>>>> state)
>>>>>    {
>>>>>        unsigned long timeout_ns;
>>>>>        unsigned int cpu;
>>>>> @@ -1673,7 +1673,7 @@ static bool __i915_spin_request(const struct
>>>>> i915_request * const rq, int state)
>>>>>        timeout_ns = READ_ONCE(rq->engine->props.max_busywait_duration_ns);
>>>>>        timeout_ns += local_clock_ns(&cpu);
>>>>>        do {
>>>>> -        if (i915_request_completed(rq))
>>>>> +        if (dma_fence_is_signaled(&rq->fence))
>>>>>                return true;
>>>>>            if (signal_pending_state(state, current))
>>>>> @@ -1766,10 +1766,8 @@ long i915_request_wait(struct i915_request *rq,
>>>>>         * duration, which we currently lack.
>>>>>         */
>>>>>        if (IS_ACTIVE(CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT) &&
>>>>> -        __i915_spin_request(rq, state)) {
>>>>> -        dma_fence_signal(&rq->fence);
>>>>> +        __i915_spin_request(rq, state))
>>>>>            goto out;
>>>>> -    }
>>>>>        /*
>>>>>         * This client is about to stall waiting for the GPU. In many cases
>>>>> @@ -1796,10 +1794,8 @@ long i915_request_wait(struct i915_request *rq,
>>>>>        for (;;) {
>>>>>            set_current_state(state);
>>>>> -        if (i915_request_completed(rq)) {
>>>>> -            dma_fence_signal(&rq->fence);
>>>>> +        if (dma_fence_is_signaled(&rq->fence))
>>>>>                break;
>>>>> -        }
>>>>>            intel_engine_flush_submission(rq->engine);
>>>>>
>>>>
>>>> In other words putting some latency back into the waiters, which is
>>>> probably okay, since sync waits is not our primary model.
>>>>
>>>> I have a slight concern about the remaining value of busy spinning if
>>>> i915_request_completed check is removed from there as well. Of course it
>>>> doesn't make sense to have different completion criteria between the
>>>> two.. We could wait a bit longer if real check in busyspin said request
>>>> is actually completed, just not signal it but wait for the breadcrumbs
>>>> to do it.
>>>
>>> What a load of nonsense.. :)
>>>
>>> Okay, I think the only real question is i915_request_completed vs
>>> dma_fence_signaled in __i915_spin_request. Do we want to burn CPU cycles
>>> waiting on GPU and breadcrumb irq work, or just the GPU.
>>
>> dma_fence_is_signaled() {
>>          if (test_bit(SIGNALED_BIT))
>>                  return true;
>>          
>>          if (i915_request_completed()) {
>>                  dma_fence_signal();
>>                  return true;
>>          }
>>          
>>          return false;
>> }
>>
>> with the indirection. So the question is whether the indirection is
>> worth the extra test bit. Just purely looking at the i915_request.lock
>> contention suggests that it probably is. For the spinner, burning a few
>> extra CPU cycles for *vfunc is not an issue, it's the wakeup latency,
>> and since we are calling dma_fence_signal() upon wakeup we do take the
>> spinlock without checking for an early return from the SIGNALED_BIT.
>> So I think it's a net positive. The alternative was to write
>>
>>          if (i915_request_completed()) {
>>                  if (!i915_request_is_signaled())
>>                          dma_fence_signal();
>>                  break;
>>          }
>>
>> but
>>
>>          if (dma_fence_is_signaled())
>>                  break;
>>
>> does appear simpler, if only by virtue of hiding the details in an
>> inline.
> 
> Or a patch to add test_bit() to dma_fence_signal, but this looked
> better.

Right I missed dma_fence_is_signaled calls i915_request_completed.

In this case the remaining question is do we care about wait ioctl 
potentially returning before the hypothetical sync fence for the same 
request is signaled? This seems like a slight change in user observable 
behaviour.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2020-07-16  8:41 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15 10:50 [Intel-gfx] [PATCH] drm/i915: Reduce i915_request.lock contention for i915_request_wait Chris Wilson
2020-07-15 12:06 ` Tvrtko Ursulin
2020-07-15 12:26   ` Tvrtko Ursulin
2020-07-15 14:47     ` Chris Wilson
2020-07-15 14:47       ` Chris Wilson
2020-07-16  8:41         ` Tvrtko Ursulin [this message]
2020-07-16  8:47           ` Chris Wilson
2020-07-16  9:02             ` Tvrtko Ursulin
2020-07-15 12:12 ` [Intel-gfx] ✗ Fi.CI.SPARSE: warning for " Patchwork
2020-07-15 12:34 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-07-15 16:11 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1019ccaf-383e-2c45-e82f-823669607c62@linux.intel.com \
    --to=tvrtko.ursulin@linux.intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.