Intel-XE Archive on lore.kernel.org
From: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
To: Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Badal Nilawar <badal.nilawar@intel.com>,
	Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Lucas De Marchi <lucas.demarchi@intel.com>,
	"Nirmoy Das" <nirmoy.das@intel.com>
Subject: Re: [PATCH v5 01/23] drm/xe: Error handling in xe_force_wake_get()
Date: Wed, 25 Sep 2024 22:06:11 +0530
Message-ID: <337515e5-83f8-4969-bc79-f1d380a31330@intel.com>
In-Reply-To: <f10c3f76-aac8-441f-80c1-c7e2632136f6@intel.com>



On 25-09-2024 17:31, Michal Wajdeczko wrote:
> 
> 
> On 24.09.2024 14:16, Himal Prasad Ghimiray wrote:
>> If an acknowledgment timeout occurs for a domain awake request, do not
>> increment the reference count for the domain. This ensures that
>> subsequent _get calls do not incorrectly assume the domain is awake. The
>> return value is a mask of domains whose reference counts were
>> incremented, and these domains need to be released using
>> xe_force_wake_put.
>>
>> The caller needs to compare the return value with the input domains to
>> determine the success or failure of the operation and decide whether to
>> continue or return accordingly.
>>
>> While at it, add simple kernel-doc for xe_force_wake_get()
>>
>> v3
>> - Use explicit type for mask (Michal/Badal)
>> - Improve kernel-doc (Michal)
>> - Use unsigned int instead of abusing enum (Michal)
>>
>> v5
>> - Use unsigned int for return (MattB/Badal/Rodrigo)
>> - use xe_gt_WARN for domain awake ack failure (Badal/Rodrigo)
>>
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_force_wake.c | 37 +++++++++++++++++++++++-------
>>   drivers/gpu/drm/xe/xe_force_wake.h |  4 ++--
>>   2 files changed, 31 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
>> index a64c14757c84..d190aa93be90 100644
>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>> @@ -150,28 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>   					 (ffs(tmp__) - 1))) && \
>>   					 domain__->reg_ctl.addr)
>>   
>> -int xe_force_wake_get(struct xe_force_wake *fw,
>> -		      enum xe_force_wake_domains domains)
>> +/**
>> + * xe_force_wake_get() : Increase the domain refcount
>> + * @fw: struct xe_force_wake
>> + * @domains: forcewake domains to get refcount on
>> + *
>> + * This function takes references for the input @domains and wakes them if
>> + * they are asleep.
> 
> nit: likely we should also add the note that one shall call the
> xe_force_wake_put() function to decrease refcounts

Sure.

> 
>> + *
>> + * Return: mask of refcount increased domains.
> 
> do we really need to expose implementation detail to the caller?
> 
> can't we just treat the returned value as opaque data that just needs to
> be passed to the matching xe_force_wake_put(fw, ref) call ?
> 
>> If the return value is
>> + * equal to the input parameter @domains, the operation is considered
>> + * successful.
> 
> sorry, but I'm still little uncomfortable with such approach, due to a
> mismatch between input parameter type that is enum xe_force_wake_domains
> and return type which is unsigned int, as IMO forcing the user to
> compare two different types seems wrong

Sure, I understand the thought process behind it. Would a helper that
does the comparison be OK? I justify the helper below.

> 
> can't we just say that if returned value is zero then no domains were
> waken (which would provide definite answer if single domain was requested) ?

I am concerned about the scenario where the user ends up treating a
nonzero return for FORCEWAKE_ALL as success.

Having different success or failure conditions for the same API,
depending on the input parameter, looks like bad design to me.

ref = xe_force_wake_get(fw, GT)
if (ref) -> means success

whereas

ref = xe_force_wake_get(fw, ALL)
if (ref) -> doesn't necessarily mean all domains were awakened.

The user can very easily get this wrong:

ref = xe_force_wake_get(fw, ALL)
if (!ref)
	error_handling;

/* Code reaches here irrespective of failure or success of xe_force_wake_get() */
do_multiple_operations(); /* Needs all fw domains supported on the GT to be awake */

xe_force_wake_put(fw, ref);

We have use cases throughout the driver where the user relies on
success/failure for the FORCEWAKE_ALL domains.
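
To make the concern concrete, here is a minimal standalone sketch (plain
C, not driver code). The FW_* bit values and the model_get(),
nonzero_ok() and all_awake() names are made up for illustration;
model_get() only mimics the proposed return value, i.e. the requested
mask minus the domains whose ack timed out.

```c
#include <stdbool.h>

/* Hypothetical domain bits, for illustration only (not the driver's values). */
#define FW_GT     (1u << 0)
#define FW_RENDER (1u << 1)
#define FW_MEDIA  (1u << 2)
#define FW_ALL    (FW_GT | FW_RENDER | FW_MEDIA)

/* Models the proposed return value: requested domains minus those that timed out. */
static unsigned int model_get(unsigned int requested, unsigned int failed)
{
	return requested & ~failed;
}

/* "Nonzero means success": only conclusive when a single domain was requested. */
static bool nonzero_ok(unsigned int ref)
{
	return ref != 0;
}

/* Strict check: every requested domain must be present in the returned mask. */
static bool all_awake(unsigned int requested, unsigned int ref)
{
	return ref == requested;
}
```

With FW_ALL requested and only FW_MEDIA timing out, nonzero_ok() still
reports success while all_awake() does not, which is the
misinterpretation described above.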

> 
> and for the xe_force_wake_get(fw, ALL_DOMAINS) case, we can provide
> helper function that will check if specified domain is really awake:
> 
> 	ref = xe_force_wake_put(fw, ALL_DOMAIN)
> 
> 	if (ref) {
> 		xe_force_wake_is_awake(fw, SINGLE_DOMAIN)


I assume this API checks whether ref has the given domain in it. It
would be a good helper to have for such scenarios, which currently do
not exist in the driver.


> 
> 		xe_force_wake_put(fw, ref)
> 	}
> 

I believe a helper that determines the success or failure of
force_wake_get, instead of the caller doing it, will streamline all the
domains and use cases.

int xe_force_wake_get_status(enum xe_force_wake_domains domains,
			     unsigned int fw_ref)
{
	/* Success only if every requested domain is in the returned mask */
	return (fw_ref == domains) ? 0 : -ETIMEDOUT;
}

Use cases:

A) No error check, just continue in case of failure too.

ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
do_operations();
xe_force_wake_put(fw, ref);

B) Error check, abort in case of failure.

ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
int err = xe_force_wake_get_status(domain, ref);
if (err) {
	xe_force_wake_put(fw, ref);
	return err;
}
do_operations();
xe_force_wake_put(fw, ref);

C) Get all domains, but check a specific domain:

ref = xe_force_wake_get(fw, ALL_DOMAIN);
if (xe_force_wake_get_status(ALL_DOMAIN, ref))
	dmesg_warn("unable to awake all requested domains\n");

if (xe_fwref_has_domain(ref, SINGLE_DOMAIN))
	do_operations();

xe_force_wake_put(fw, ref);
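
The two helpers can also be tried out in a standalone model. This is
only a sketch under assumptions: fw_get_status() and fwref_has_domain()
stand in for the proposed xe_force_wake_get_status() and
xe_fwref_has_domain() helpers, can_use_domain() is a hypothetical
wrapper for the last use case, the FW_* bits are arbitrary, and none of
it is the actual driver implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Arbitrary illustrative domain bits, not the driver's real values. */
#define FW_GT     (1u << 0)
#define FW_RENDER (1u << 1)
#define FW_ALL    (FW_GT | FW_RENDER)

/* Stand-in for the proposed status helper: 0 only if fw_ref matches the request. */
static int fw_get_status(unsigned int domains, unsigned int fw_ref)
{
	return fw_ref == domains ? 0 : -ETIMEDOUT;
}

/* Stand-in for the proposed per-domain check on the returned reference mask. */
static bool fwref_has_domain(unsigned int fw_ref, unsigned int domain)
{
	return (fw_ref & domain) == domain;
}

/*
 * Last use case: all domains were requested, but the work needs only one.
 * Returns true if the work may proceed; *warn is set when the wake was partial.
 */
static bool can_use_domain(unsigned int fw_ref, unsigned int needed, bool *warn)
{
	*warn = fw_get_status(FW_ALL, fw_ref) != 0;
	return fwref_has_domain(fw_ref, needed);
}
```

The point of keeping the two checks separate is that a partial wake can
be reported (warn) without blocking work that only needs a domain which
did wake up.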

>> Otherwise, the operation is considered a failure, and
>> + * the caller should handle the failure case, potentially returning
>> + * -ETIMEDOUT.
>> + */
>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>> +			       enum xe_force_wake_domains domains)
>>   {
>>   	struct xe_gt *gt = fw->gt;
>>   	struct xe_force_wake_domain *domain;
>> -	enum xe_force_wake_domains tmp, woken = 0;
>> +	unsigned int tmp, ret, awake_rqst = 0, awake_failed = 0;
>>   	unsigned long flags;
>> -	int ret = 0;
>>   
>>   	spin_lock_irqsave(&fw->lock, flags);
>>   	for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>   		if (!domain->ref++) {
>> -			woken |= BIT(domain->id);
>> +			awake_rqst |= BIT(domain->id);
>>   			domain_wake(gt, domain);
>>   		}
>>   	}
>> -	for_each_fw_domain_masked(domain, woken, fw, tmp) {
>> -		ret |= domain_wake_wait(gt, domain);
>> +	for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>> +		if (domain_wake_wait(gt, domain) == 0) {
>> +			fw->awake_domains |= BIT(domain->id);
>> +		} else {
>> +			awake_failed |= BIT(domain->id);
>> +			--domain->ref;
>> +		}
>>   	}
>> -	fw->awake_domains |= woken;
>> +	ret = (domains & ~awake_failed);
>>   	spin_unlock_irqrestore(&fw->lock, flags);
>>   
>> +	xe_gt_WARN(gt, awake_failed, "domain%s %#x failed to acknowledgment awake\n",
>> +		   str_plural(hweight_long(awake_failed)), awake_failed);
>> +
>>   	return ret;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.h b/drivers/gpu/drm/xe/xe_force_wake.h
>> index a2577672f4e3..6c1ade39139b 100644
>> --- a/drivers/gpu/drm/xe/xe_force_wake.h
>> +++ b/drivers/gpu/drm/xe/xe_force_wake.h
>> @@ -15,8 +15,8 @@ void xe_force_wake_init_gt(struct xe_gt *gt,
>>   			   struct xe_force_wake *fw);
>>   void xe_force_wake_init_engines(struct xe_gt *gt,
>>   				struct xe_force_wake *fw);
>> -int xe_force_wake_get(struct xe_force_wake *fw,
>> -		      enum xe_force_wake_domains domains);
>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>> +			       enum xe_force_wake_domains domains);
>>   int xe_force_wake_put(struct xe_force_wake *fw,
>>   		      enum xe_force_wake_domains domains);
>>   
> 


