All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.auld@intel.com>
To: Matthew Brost <matthew.brost@intel.com>
Cc: intel-xe@lists.freedesktop.org, Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH 1/2] drm/xe: fix unbalanced rpm put() with fence_fini()
Date: Wed, 9 Oct 2024 08:51:16 +0100	[thread overview]
Message-ID: <a9ec9180-754e-495c-aecc-759469610694@intel.com> (raw)
In-Reply-To: <ZwVoNDSywONjEye0@DUT025-TGLU.fm.intel.com>

On 08/10/2024 18:13, Matthew Brost wrote:
> On Tue, Oct 08, 2024 at 11:47:24AM +0100, Matthew Auld wrote:
>> Currently we can call fence_fini() twice if something goes wrong when
>> sending the GuC CT for the tlb request, since we signal the fence and
>> return an error, leading to the caller also calling fini() on the error
>> path in the case of stack version of the flow, which leads to an extra
>> rpm put() which might later cause device to enter suspend when it
>> shouldn't. It looks like we can just drop the fini() call since the
>> fence signaller side will already call this for us.
>>
>> There are known mysterious splats with device going to sleep even with
>> an rpm ref, and this could be one candidate.
>>
>> Fixes: 0a382f9bc5dc ("drm/xe: Hold a PM ref when GT TLB invalidations are inflight")
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Matthew Brost <matthew.brost@intel.com>
>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 26 ++++++++-------------
>>   drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h |  1 -
>>   drivers/gpu/drm/xe/xe_vm.c                  |  8 ++-----
>>   3 files changed, 12 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> index 98616de0c5bb..3eca8d680533 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> @@ -37,6 +37,12 @@ static long tlb_timeout_jiffies(struct xe_gt *gt)
>>   	return hw_tlb_timeout + 2 * delay;
>>   }
>>   
>> +static void xe_gt_tlb_invalidation_fence_fini(struct xe_gt_tlb_invalidation_fence *fence)
>> +{
> 
> To catch a double call of this:
> 
> if (WARN_ON_ONCE(!fence->gt))
> 	return;

It should hit an NPD here, if called twice. But I guess it's easy enough 
not to crash here and have a warn instead. Will add this.

> 
> Everything else LGTM. With that:
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>
> 
>> +	xe_pm_runtime_put(gt_to_xe(fence->gt));
>> +	fence->gt = NULL; /* fini() should be called once */
>> +}
>> +
>>   static void
>>   __invalidation_fence_signal(struct xe_device *xe, struct xe_gt_tlb_invalidation_fence *fence)
>>   {
>> @@ -204,7 +210,7 @@ static int send_tlb_invalidation(struct xe_guc *guc,
>>   						   tlb_timeout_jiffies(gt));
>>   		}
>>   		spin_unlock_irq(&gt->tlb_invalidation.pending_lock);
>> -	} else if (ret < 0) {
>> +	} else {
>>   		__invalidation_fence_signal(xe, fence);
>>   	}
>>   	if (!ret) {
>> @@ -267,10 +273,8 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>   
>>   		xe_gt_tlb_invalidation_fence_init(gt, &fence, true);
>>   		ret = xe_gt_tlb_invalidation_guc(gt, &fence);
>> -		if (ret < 0) {
>> -			xe_gt_tlb_invalidation_fence_fini(&fence);
>> +		if (ret)
>>   			return ret;
>> -		}
>>   
>>   		xe_gt_tlb_invalidation_fence_wait(&fence);
>>   	} else if (xe_device_uc_enabled(xe) && !xe_device_wedged(xe)) {
>> @@ -498,7 +502,8 @@ static const struct dma_fence_ops invalidation_fence_ops = {
>>    * @stack: fence is stack variable
>>    *
>>    * Initialize TLB invalidation fence for use. xe_gt_tlb_invalidation_fence_fini
>> - * must be called if fence is not signaled.
>> + * will be automatically called when fence is signalled (all fences must signal),
>> + * even on error.
>>    */
>>   void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt,
>>   				       struct xe_gt_tlb_invalidation_fence *fence,
>> @@ -518,14 +523,3 @@ void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt,
>>   		dma_fence_get(&fence->base);
>>   	fence->gt = gt;
>>   }
>> -
>> -/**
>> - * xe_gt_tlb_invalidation_fence_fini - Finalize TLB invalidation fence
>> - * @fence: TLB invalidation fence to finalize
>> - *
>> - * Drop PM ref which fence took durinig init.
>> - */
>> -void xe_gt_tlb_invalidation_fence_fini(struct xe_gt_tlb_invalidation_fence *fence)
>> -{
>> -	xe_pm_runtime_put(gt_to_xe(fence->gt));
>> -}
>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
>> index a84065fa324c..f430d5797af7 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.h
>> @@ -28,7 +28,6 @@ int xe_guc_tlb_invalidation_done_handler(struct xe_guc *guc, u32 *msg, u32 len);
>>   void xe_gt_tlb_invalidation_fence_init(struct xe_gt *gt,
>>   				       struct xe_gt_tlb_invalidation_fence *fence,
>>   				       bool stack);
>> -void xe_gt_tlb_invalidation_fence_fini(struct xe_gt_tlb_invalidation_fence *fence);
>>   
>>   static inline void
>>   xe_gt_tlb_invalidation_fence_wait(struct xe_gt_tlb_invalidation_fence *fence)
>> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
>> index ce9dca4d4e87..c99380271de6 100644
>> --- a/drivers/gpu/drm/xe/xe_vm.c
>> +++ b/drivers/gpu/drm/xe/xe_vm.c
>> @@ -3199,10 +3199,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
>>   
>>   			ret = xe_gt_tlb_invalidation_vma(tile->primary_gt,
>>   							 &fence[fence_id], vma);
>> -			if (ret < 0) {
>> -				xe_gt_tlb_invalidation_fence_fini(&fence[fence_id]);
>> +			if (ret)
>>   				goto wait;
>> -			}
>>   			++fence_id;
>>   
>>   			if (!tile->media_gt)
>> @@ -3214,10 +3212,8 @@ int xe_vm_invalidate_vma(struct xe_vma *vma)
>>   
>>   			ret = xe_gt_tlb_invalidation_vma(tile->media_gt,
>>   							 &fence[fence_id], vma);
>> -			if (ret < 0) {
>> -				xe_gt_tlb_invalidation_fence_fini(&fence[fence_id]);
>> +			if (ret)
>>   				goto wait;
>> -			}
>>   			++fence_id;
>>   		}
>>   	}
>> -- 
>> 2.46.2
>>

  reply	other threads:[~2024-10-09  7:51 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-08 10:47 [PATCH 1/2] drm/xe: fix unbalanced rpm put() with fence_fini() Matthew Auld
2024-10-08 10:47 ` [PATCH 2/2] drm/xe: fix unbalanced rpm put() with declare_wedged() Matthew Auld
2024-10-08 17:14   ` Matthew Brost
2024-10-08 10:53 ` ✓ CI.Patch_applied: success for series starting with [1/2] drm/xe: fix unbalanced rpm put() with fence_fini() Patchwork
2024-10-08 10:53 ` ✓ CI.checkpatch: " Patchwork
2024-10-08 10:54 ` ✓ CI.KUnit: " Patchwork
2024-10-08 11:06 ` ✓ CI.Build: " Patchwork
2024-10-08 11:08 ` ✓ CI.Hooks: " Patchwork
2024-10-08 11:10 ` ✓ CI.checksparse: " Patchwork
2024-10-08 11:33 ` ✓ CI.BAT: " Patchwork
2024-10-08 17:13 ` [PATCH 1/2] " Matthew Brost
2024-10-09  7:51   ` Matthew Auld [this message]
2024-10-08 18:11 ` ✗ CI.FULL: failure for series starting with [1/2] " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a9ec9180-754e-495c-aecc-759469610694@intel.com \
    --to=matthew.auld@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=nirmoy.das@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.