All of lore.kernel.org
 help / color / mirror / Atom feed
* [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement
@ 2020-03-03 13:07 Chris Wilson
  2020-03-03 13:40 ` Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2020-03-03 13:07 UTC (permalink / raw)
  To: intel-gfx

As we have pinned the timeline (using tl->active_count), we can safely
drop the tl->mutex as we wait for what we believe to be the final
request on that timeline. This is useful for ensuring that we do not
block the engine heartbeat by hogging the kernel_context's timeline on a
dead GPU.

References: https://gitlab.freedesktop.org/drm/intel/issues/1364
Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 8a5054f21bf8..436412d07689 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -147,24 +147,31 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
 
 			fence = i915_active_fence_get(&tl->last_request);
 			if (fence) {
+				mutex_unlock(&tl->mutex);
+
 				timeout = dma_fence_wait_timeout(fence,
 								 interruptible,
 								 timeout);
 				dma_fence_put(fence);
+
+				if (!mutex_trylock(&tl->mutex)) {
+					active_count++;
+					goto out_active;
+				}
 			}
 		}
 
 		if (!retire_requests(tl) || flush_submission(gt))
 			active_count++;
+		mutex_unlock(&tl->mutex);
 
-		spin_lock(&timelines->lock);
+out_active:	spin_lock(&timelines->lock);
 
 		/* Resume iteration after dropping lock */
 		list_safe_reset_next(tl, tn, link);
 		if (atomic_dec_and_test(&tl->active_count))
 			list_del(&tl->link);
 
-		mutex_unlock(&tl->mutex);
 
 		/* Defer the final release to after the spinlock */
 		if (refcount_dec_and_test(&tl->kref.refcount)) {
-- 
2.25.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement
  2020-03-03 13:07 [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement Chris Wilson
@ 2020-03-03 13:40 ` Mika Kuoppala
  2020-03-03 14:04   ` Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Mika Kuoppala @ 2020-03-03 13:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> As we have pinned the timeline (using tl->active_count), we can safely
> drop the tl->mutex as we wait for what we believe to be the final
> request on that timeline. This is useful for ensuring that we do not
> block the engine heartbeat by hogging the kernel_context's timeline on a
> dead GPU.
>
> References: https://gitlab.freedesktop.org/drm/intel/issues/1364
> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 8a5054f21bf8..436412d07689 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -147,24 +147,31 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>  
>  			fence = i915_active_fence_get(&tl->last_request);
>  			if (fence) {
> +				mutex_unlock(&tl->mutex);
> +
>  				timeout = dma_fence_wait_timeout(fence,
>  								 interruptible,
>  								 timeout);
>  				dma_fence_put(fence);
> +
> +				if (!mutex_trylock(&tl->mutex)) {

If you can't take it, it must be active and for the retirement
advancement we can bail out early.

Or is there something else with a sampled try?

> +					active_count++;
> +					goto out_active;
> +				}
>  			}
>  		}
>  
>  		if (!retire_requests(tl) || flush_submission(gt))
>  			active_count++;
> +		mutex_unlock(&tl->mutex);
>  
> -		spin_lock(&timelines->lock);
> +out_active:	spin_lock(&timelines->lock);
>  
>  		/* Resume iteration after dropping lock */

You either fixed this comment with this patch.
Or that the comment remains a highly confusing.

>  		list_safe_reset_next(tl, tn, link);
>  		if (atomic_dec_and_test(&tl->active_count))
>  			list_del(&tl->link);

We have the timelines lock and the above seems safe
wtithout the actual mutex.

But the comment is still hauting me.
-Mika

>  
> -		mutex_unlock(&tl->mutex);
>  
>  		/* Defer the final release to after the spinlock */
>  		if (refcount_dec_and_test(&tl->kref.refcount)) {
> -- 
> 2.25.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement
@ 2020-03-03 14:00 Chris Wilson
  2020-03-03 14:06 ` Mika Kuoppala
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Wilson @ 2020-03-03 14:00 UTC (permalink / raw)
  To: intel-gfx

As we have pinned the timeline (using tl->active_count), we can safely
drop the tl->mutex as we wait for what we believe to be the final
request on that timeline. This is useful for ensuring that we do not
block the engine heartbeat by hogging the kernel_context's timeline on a
dead GPU.

References: https://gitlab.freedesktop.org/drm/intel/issues/1364
Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_gt_requests.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
index 8a5054f21bf8..24c99d0838af 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
@@ -147,24 +147,32 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
 
 			fence = i915_active_fence_get(&tl->last_request);
 			if (fence) {
+				mutex_unlock(&tl->mutex);
+
 				timeout = dma_fence_wait_timeout(fence,
 								 interruptible,
 								 timeout);
 				dma_fence_put(fence);
+
+				/* Retirement is best effort */
+				if (!mutex_trylock(&tl->mutex)) {
+					active_count++;
+					goto out_active;
+				}
 			}
 		}
 
 		if (!retire_requests(tl) || flush_submission(gt))
 			active_count++;
+		mutex_unlock(&tl->mutex);
 
-		spin_lock(&timelines->lock);
+out_active:	spin_lock(&timelines->lock);
 
-		/* Resume iteration after dropping lock */
+		/* Resume list iteration after reacquiring spinlock */
 		list_safe_reset_next(tl, tn, link);
 		if (atomic_dec_and_test(&tl->active_count))
 			list_del(&tl->link);
 
-		mutex_unlock(&tl->mutex);
 
 		/* Defer the final release to after the spinlock */
 		if (refcount_dec_and_test(&tl->kref.refcount)) {
-- 
2.25.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement
  2020-03-03 13:40 ` Mika Kuoppala
@ 2020-03-03 14:04   ` Mika Kuoppala
  0 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2020-03-03 14:04 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Mika Kuoppala <mika.kuoppala@linux.intel.com> writes:

> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
>> As we have pinned the timeline (using tl->active_count), we can safely
>> drop the tl->mutex as we wait for what we believe to be the final
>> request on that timeline. This is useful for ensuring that we do not
>> block the engine heartbeat by hogging the kernel_context's timeline on a
>> dead GPU.
>>
>> References: https://gitlab.freedesktop.org/drm/intel/issues/1364
>> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> index 8a5054f21bf8..436412d07689 100644
>> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
>> @@ -147,24 +147,31 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>>  
>>  			fence = i915_active_fence_get(&tl->last_request);
>>  			if (fence) {
>> +				mutex_unlock(&tl->mutex);
>> +
>>  				timeout = dma_fence_wait_timeout(fence,
>>  								 interruptible,
>>  								 timeout);
>>  				dma_fence_put(fence);
>> +
>> +				if (!mutex_trylock(&tl->mutex)) {
>
> If you can't take it, it must be active and for the retirement
> advancement we can bail out early.
>
> Or is there something else with a sampled try?

Got some answers in irc: this is best effort so yes,
if it is active it is used and we can bail out and
retire later.

>
>> +					active_count++;
>> +					goto out_active;
>> +				}
>>  			}
>>  		}
>>  
>>  		if (!retire_requests(tl) || flush_submission(gt))
>>  			active_count++;
>> +		mutex_unlock(&tl->mutex);
>>  
>> -		spin_lock(&timelines->lock);
>> +out_active:	spin_lock(&timelines->lock);
>>  
>>  		/* Resume iteration after dropping lock */
>
> You either fixed this comment with this patch.
> Or that the comment remains a highly confusing.
>

This is for the timeline spinlock.
Still, it is very confusing as we only drop it for the wait,
rather deep in iteration.

After discussing with Chris on irc, we agreed that
massaging it to a proper form is warranted.

With the comment bended to match the lock acquisition,

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>>  		list_safe_reset_next(tl, tn, link);
>>  		if (atomic_dec_and_test(&tl->active_count))
>>  			list_del(&tl->link);
>
> We have the timelines lock and the above seems safe
> wtithout the actual mutex.
>
> But the comment is still hauting me.
> -Mika
>
>>  
>> -		mutex_unlock(&tl->mutex);
>>  
>>  		/* Defer the final release to after the spinlock */
>>  		if (refcount_dec_and_test(&tl->kref.refcount)) {
>> -- 
>> 2.25.1
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement
  2020-03-03 14:00 Chris Wilson
@ 2020-03-03 14:06 ` Mika Kuoppala
  0 siblings, 0 replies; 5+ messages in thread
From: Mika Kuoppala @ 2020-03-03 14:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> As we have pinned the timeline (using tl->active_count), we can safely
> drop the tl->mutex as we wait for what we believe to be the final
> request on that timeline. This is useful for ensuring that we do not
> block the engine heartbeat by hogging the kernel_context's timeline on a
> dead GPU.
>
> References: https://gitlab.freedesktop.org/drm/intel/issues/1364
> Fixes: 058179e72e09 ("drm/i915/gt: Replace hangcheck by heartbeats")
> Fixes: f33a8a51602c ("drm/i915: Merge wait_for_timelines with retire_request")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_gt_requests.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> index 8a5054f21bf8..24c99d0838af 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> @@ -147,24 +147,32 @@ long intel_gt_retire_requests_timeout(struct intel_gt *gt, long timeout)
>  
>  			fence = i915_active_fence_get(&tl->last_request);
>  			if (fence) {
> +				mutex_unlock(&tl->mutex);
> +
>  				timeout = dma_fence_wait_timeout(fence,
>  								 interruptible,
>  								 timeout);
>  				dma_fence_put(fence);
> +
> +				/* Retirement is best effort */
> +				if (!mutex_trylock(&tl->mutex)) {
> +					active_count++;
> +					goto out_active;
> +				}
>  			}
>  		}
>  
>  		if (!retire_requests(tl) || flush_submission(gt))
>  			active_count++;
> +		mutex_unlock(&tl->mutex);
>  
> -		spin_lock(&timelines->lock);
> +out_active:	spin_lock(&timelines->lock);
>  
> -		/* Resume iteration after dropping lock */
> +		/* Resume list iteration after reacquiring spinlock */

Now it is in mint.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  		list_safe_reset_next(tl, tn, link);
>  		if (atomic_dec_and_test(&tl->active_count))
>  			list_del(&tl->link);
>  
> -		mutex_unlock(&tl->mutex);
>  
>  		/* Defer the final release to after the spinlock */
>  		if (refcount_dec_and_test(&tl->kref.refcount)) {
> -- 
> 2.25.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-03-03 14:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-03-03 13:07 [Intel-gfx] [PATCH] drm/i915/gt: Drop the timeline->mutex as we wait for retirement Chris Wilson
2020-03-03 13:40 ` Mika Kuoppala
2020-03-03 14:04   ` Mika Kuoppala
  -- strict thread matches above, loose matches on Subject: below --
2020-03-03 14:00 Chris Wilson
2020-03-03 14:06 ` Mika Kuoppala

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.