Intel-XE Archive on lore.kernel.org
From: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	Matthew Brost <matthew.brost@intel.com>,
	 Lucas De Marchi <lucas.demarchi@intel.com>
Subject: Re: [RFC 7/9] drm/xe/gt_tlb_invalidation_ggtt: Call xe_force_wake_put if xe_force_wake_get succeds
Date: Mon, 9 Sep 2024 14:59:56 +0530
Message-ID: <43116f22-0495-44ec-9895-aad9dcd5165d@intel.com>
In-Reply-To: <Ztst1yxHrJyHQMl6@intel.com>



On 06-09-2024 21:59, Rodrigo Vivi wrote:
> On Fri, Sep 06, 2024 at 01:21:41AM +0530, Ghimiray, Himal Prasad wrote:
>>
>>
>> On 06-09-2024 01:07, Rodrigo Vivi wrote:
>>> On Fri, Aug 30, 2024 at 10:53:24AM +0530, Himal Prasad Ghimiray wrote:
>>>> A failure in xe_force_wake_get() no longer increments the domain's
>>>> refcount, so xe_force_wake_put() should not be called in such cases
>>>>
>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 ++++++---
>>>>    1 file changed, 6 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> index cca9cf536f76..3f86ab704c4f 100644
>>>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>> @@ -259,11 +259,11 @@ static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt,
>>>>    int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>    {
>>>>    	struct xe_device *xe = gt_to_xe(gt);
>>>> +	int ret;
>>>>    	if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
>>>>    	    gt->uc.guc.submission_state.enabled) {
>>>>    		struct xe_gt_tlb_invalidation_fence fence;
>>>> -		int ret;
>>>>    		xe_gt_tlb_invalidation_fence_init(gt, &fence, true);
>>>>    		ret = xe_gt_tlb_invalidation_guc(gt, &fence);
>>>> @@ -277,7 +277,9 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>    		if (IS_SRIOV_VF(xe))
>>>>    			return 0;
>>>> -		xe_gt_WARN_ON(gt, xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
>>>> +		ret =  xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>> +		xe_gt_WARN_ON(gt, ret);
>>>> +
>>>>    		if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
>>>>    			xe_mmio_write32(gt, PVC_GUC_TLB_INV_DESC1,
>>>>    					PVC_GUC_TLB_INV_DESC1_INVALIDATE);
>>>> @@ -287,7 +289,8 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>    			xe_mmio_write32(gt, GUC_TLB_INV_CR,
>>>>    					GUC_TLB_INV_CR_INVALIDATE);
>>>>    		}
>>>> -		xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>> +		if (!ret)
>>>> +			xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>
>>> looking all these cases now I honestly prefer the other way around.
>>>
>>> If we called the get, we call the put.
>>> get always increase the reference and put does the clean-up.
>>>
>>> fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>
>>> xe_force_wake_put(gt_to_fw(gt), fw_ref);
>>>
>>> so, the fw_ref is a mask of the woken up cases which require
>>> the ref drop and sleep call.
>>
>> Hi Rodrigo,
>>
>> Thanks for the input. AFAIU using this approach creates issue in the
>> subsequent force_wake_get/put in callee function. Which I have tried to
>> explain in cover letter.
>>
>> [1] subsequent forcewake call by callee function assumes domains are
>> already awake, which might not be true. This shows perfectly balanced
>> xe_force_wake_get/_put can also cause problem.
>>
>> [1] func_a() {
>> 	XE_WARN(xe_force_wake_get()) <---> fails but increments refcount
>>
>> 	func_b();
>>
>> 	XE_WARN(xe_force_wake_put());<---> decrements refcounts
>>   }
>>
>>      func_b() {
>> 	if(xe_force_wake_get()) <---> succeeds due to refcount of caller
>> 		return;
>>
>> 	does mmio_operations(); <---> Domain might not be awake
>>
>> 	xe_force_wake_put(); <---> decrement refcount
>>   }
> 
> Well, to be honest, this is what bugs me in this whole series.
> 
> If func_a failed, why would func_b succeed? If that's the
> case, should we include more redundancy and retries so that
> func_a would succeed the way func_b is expected to in your
> scenario?


Hi Rodrigo,

This is the current behavior, which patch [1] resolves. I misunderstood your
comment as a suggestion to drop that patch and simply balance every _get
with a matching _put.


> 
> But other than that, I'm afraid that you didn't fully understand
> my idea. Sorry for not being clear.
> 
> My thought is, you do what you are doing in this series.
> If the get doesn't succeed you drop the ref count and call the
> disable.


OK. IMO, for a failing domain it is better to just drop the refcount and
not disable the domain explicitly.


> 
> The return of the get is just for the domains that have succeeded.
> then the put returns only the ones that had succeeded.
> The function B will then try to wake-up whatever had failed in
> func_a.

I assume that with this, xe_force_wake_get() will return the mask, so the
caller will need to check whether the returned mask matches the requested
domains or indicates a failure.


> 
> Something like:
> 
> 
> func_a() {
> 	fw_ref = xe_force_wake_get(ALL_DOMAINS) <---> fails GT-domain but return a mask with all the domains except GT.
> 
> 	XE_WARN(!fw_ref);


XE_WARN(!fw_ref); will work for any individual domain but not for
ALL_DOMAINS, where a partial failure still returns a non-zero mask:

XE_WARN(fw_ref != ALL_DOMAINS); <-- if the user wants to continue -->

if (fw_ref != ALL_DOMAINS) { <-- if the user wants to return on failure -->
	xe_force_wake_put(fw_ref); <-- make sure to put the awake domains -->
	return;
}
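
To make the ALL_DOMAINS check concrete, here is a minimal sketch of the
caller-side test. fw_ref_covers() and the FW_* bits are hypothetical
names used only for illustration; they are not existing driver API.

```c
#include <stdbool.h>

/* Hypothetical domain bits, for illustration only. */
#define FW_GT      (1u << 0)
#define FW_RENDER  (1u << 1)
#define FW_ALL     (FW_GT | FW_RENDER)

/*
 * True only if every requested domain is present in the mask returned
 * by the get. A plain !fw_ref test would miss a partial failure, since
 * the mask is still non-zero when only some domains woke up.
 */
static bool fw_ref_covers(unsigned int fw_ref, unsigned int requested)
{
	return (fw_ref & requested) == requested;
}
```

A caller that asked for FW_ALL and got back only FW_RENDER would see
fw_ref_covers() return false, and could then either warn and continue
or put the returned mask and bail out.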


> 
> 	func_b();
> 
> 	XE_WARN(xe_force_wake_put(fw_ref));<---> decrements refcounts of the domains which were actually woken up.

Makes sense.

> }
> 
>     func_b() {
>          fw_ref = xe_force_wake_get(GT_DOMAIN);
> 	if(fw_ref & GT_DOMAIN) <---> likely fail anyway since func_a has failed, but it at least tries it out because you have handled it in your series...
> 		return;
> 
> 	does mmio_operations(); <---> Domain might not be awake
> 
> 	xe_force_wake_put(fw_ref); <---> decrement refcount of the domains you woke up.
> }
> 
> does it make sense now?


Yes, this is indeed a much better approach for FORCEWAKE_ALL. Thank you
for the suggestion. To summarize: rather than disabling the successfully
awakened domains when another domain fails, we will leave disabling them
to forcewake_put, and the user will decide when to call it.
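
As a sanity check of the semantics summarized above, here is a small
user-space model. It is only a sketch under my own assumptions: struct
fw_model, fw_get(), fw_put(), fw_hw_wake() and the FW_* bits are
hypothetical stand-ins, not the driver's types or functions, and the
"broken" field is a test knob that forces a domain to fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical domain bits; the real driver has its own values. */
enum {
	FW_GT     = 1u << 0,
	FW_RENDER = 1u << 1,
	FW_MEDIA  = 1u << 2,
	FW_ALL    = FW_GT | FW_RENDER | FW_MEDIA,
};
#define FW_DOMAIN_COUNT 3

struct fw_model {
	unsigned int refcount[FW_DOMAIN_COUNT];
	unsigned int awake;	/* domains currently awake */
	unsigned int broken;	/* domains whose wake request fails */
};

/* Stand-in for the hardware wake handshake. */
static bool fw_hw_wake(struct fw_model *fw, unsigned int bit)
{
	if (fw->broken & bit)
		return false;
	fw->awake |= bit;
	return true;
}

/*
 * Returns the mask of requested domains that now hold a reference.
 * A domain that fails to wake takes no reference and is left out of
 * the returned mask.
 */
static unsigned int fw_get(struct fw_model *fw, unsigned int domains)
{
	unsigned int ref = 0;

	for (int i = 0; i < FW_DOMAIN_COUNT; i++) {
		unsigned int bit = 1u << i;

		if (!(domains & bit))
			continue;
		if (fw->refcount[i] == 0 && !fw_hw_wake(fw, bit))
			continue;	/* failed: refcount untouched */
		fw->refcount[i]++;
		ref |= bit;
	}
	return ref;
}

/* Drops exactly the references named in fw_ref; sleeps a domain at zero. */
static void fw_put(struct fw_model *fw, unsigned int fw_ref)
{
	for (int i = 0; i < FW_DOMAIN_COUNT; i++) {
		unsigned int bit = 1u << i;

		if (!(fw_ref & bit))
			continue;
		assert(fw->refcount[i] > 0);
		if (--fw->refcount[i] == 0)
			fw->awake &= ~bit;
	}
}
```

With GT marked as broken, fw_get(FW_ALL) in the model returns only the
render and media bits, a nested fw_get(FW_GT) retries the wake and
returns 0 without touching the GT refcount, and putting each returned
mask leaves every refcount at zero.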


> 
>>
>> BR
>> Himal
>>
>>>
>>>>    	}
>>>>    	return 0;
>>>> -- 
>>>> 2.34.1
>>>>

Thread overview: 34+ messages
2024-08-30  5:23 [RFC 0/9] Fix xe_force_wake_get() failure handling Himal Prasad Ghimiray
2024-08-30  5:18 ` ✓ CI.Patch_applied: success for " Patchwork
2024-08-30  5:18 ` ✓ CI.checkpatch: " Patchwork
2024-08-30  5:19 ` ✓ CI.KUnit: " Patchwork
2024-08-30  5:23 ` [RFC 1/9] drm/xe: Error handling in xe_force_wake_get() Himal Prasad Ghimiray
2024-08-30  6:37   ` Jani Nikula
2024-08-30  6:45     ` Ghimiray, Himal Prasad
2024-09-05 19:29   ` Rodrigo Vivi
2024-09-05 20:02     ` Ghimiray, Himal Prasad
2024-09-06 16:18       ` Rodrigo Vivi
2024-09-10 18:27         ` Nilawar, Badal
2024-09-11  6:51           ` Ghimiray, Himal Prasad
2024-09-11  6:40       ` Upadhyay, Tejas
2024-08-30  5:23 ` [RFC 2/9] drm/xe: Ensure __must_check for xe_force_wake_get() return Himal Prasad Ghimiray
2024-09-05 19:30   ` Rodrigo Vivi
2024-08-30  5:23 ` [RFC 3/9] drm/xe/gsc: call xe_force_wake_put() only if xe_force_wake_get() succeeds Himal Prasad Ghimiray
2024-08-30  5:23 ` [RFC 4/9] drm/xe/gt: " Himal Prasad Ghimiray
2024-08-30  5:23 ` [RFC 5/9] drm/xe/guc: " Himal Prasad Ghimiray
2024-08-30  5:23 ` [RFC 6/9] drm/xe/oa: Handle force_wake_get failure in xe_oa_stream_init() Himal Prasad Ghimiray
2024-08-30  5:23 ` [RFC 7/9] drm/xe/gt_tlb_invalidation_ggtt: Call xe_force_wake_put if xe_force_wake_get succeds Himal Prasad Ghimiray
2024-09-05 19:37   ` Rodrigo Vivi
2024-09-05 19:51     ` Ghimiray, Himal Prasad
2024-09-06 16:29       ` Rodrigo Vivi
2024-09-09  9:29         ` Ghimiray, Himal Prasad [this message]
2024-09-10 14:37           ` Nilawar, Badal
2024-09-10 17:39             ` Rodrigo Vivi
2024-09-10 17:53               ` Nilawar, Badal
2024-08-30  5:23 ` [RFC 8/9] drm/xe: Change return type to void for xe_force_wake_put Himal Prasad Ghimiray
2024-08-30  5:23 ` [RFC 9/9] drm/xe: forcewake debugfs open fails on xe_forcewake_get failure Himal Prasad Ghimiray
2024-08-30  5:32 ` ✓ CI.Build: success for Fix xe_force_wake_get() failure handling Patchwork
2024-08-30  5:37 ` ✓ CI.Hooks: " Patchwork
2024-08-30  5:42 ` ✓ CI.checksparse: " Patchwork
2024-08-30  6:05 ` ✓ CI.BAT: " Patchwork
2024-08-30 17:41 ` ✓ CI.FULL: " Patchwork
