From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
intel-xe@lists.freedesktop.org
Cc: Badal Nilawar <badal.nilawar@intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Lucas De Marchi <lucas.demarchi@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH v5 01/23] drm/xe: Error handling in xe_force_wake_get()
Date: Wed, 25 Sep 2024 19:20:51 +0200
Message-ID: <8b045c35-95bc-4977-840e-067c2abbcd10@intel.com>
In-Reply-To: <337515e5-83f8-4969-bc79-f1d380a31330@intel.com>
On 25.09.2024 18:36, Ghimiray, Himal Prasad wrote:
>
>
> On 25-09-2024 17:31, Michal Wajdeczko wrote:
>>
>>
>> On 24.09.2024 14:16, Himal Prasad Ghimiray wrote:
>>> If an acknowledgment timeout occurs for a domain awake request, do not
>>> increment the reference count for the domain. This ensures that
>>> subsequent _get calls do not incorrectly assume the domain is awake. The
>>> return value is a mask of domains whose reference counts were
>>> incremented, and these domains need to be released using
>>> xe_force_wake_put.
>>>
>>> The caller needs to compare the return value with the input domains to
>>> determine the success or failure of the operation and decide whether to
>>> continue or return accordingly.
>>>
>>> While at it, add simple kernel-doc for xe_force_wake_get()
>>>
>>> v3
>>> - Use explicit type for mask (Michal/Badal)
>>> - Improve kernel-doc (Michal)
>>> - Use unsigned int instead of abusing enum (Michal)
>>>
>>> v5
>>> - Use unsigned int for return (MattB/Badal/Rodrigo)
>>> - use xe_gt_WARN for domain awake ack failure (Badal/Rodrigo)
>>>
>>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>> ---
>>> drivers/gpu/drm/xe/xe_force_wake.c | 37 +++++++++++++++++++++++-------
>>> drivers/gpu/drm/xe/xe_force_wake.h | 4 ++--
>>> 2 files changed, 31 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/
>>> xe_force_wake.c
>>> index a64c14757c84..d190aa93be90 100644
>>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>>> @@ -150,28 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>> (ffs(tmp__) - 1))) && \
>>> domain__->reg_ctl.addr)
>>> -int xe_force_wake_get(struct xe_force_wake *fw,
>>> - enum xe_force_wake_domains domains)
>>> +/**
>>> + * xe_force_wake_get() : Increase the domain refcount
>>> + * @fw: struct xe_force_wake
>>> + * @domains: forcewake domains to get refcount on
>>> + *
>>> + * This function takes references for the input @domains and wakes
>>> them if
>>> + * they are asleep.
>>
>> nit: likely we should also add the note that one shall call the
>> xe_force_wake_put() function to decrease refcounts
>
> Sure.
>
>>
>>> + *
>>> + * Return: mask of refcount increased domains.
>>
>> do we really need to expose implementation detail to the caller?
>>
>> can't we just treat the returned value as opaque data that just needs to
>> be passed to the matching xe_force_wake_put(fw, ref) call ?
>>
>>> If the return value is
>>> + * equal to the input parameter @domains, the operation is considered
>>> + * successful.
>>
>> sorry, but I'm still a little uncomfortable with such an approach, due to a
>> mismatch between input parameter type that is enum xe_force_wake_domains
>> and return type which is unsigned int, as IMO forcing the user to
>> compare two different types seems wrong
>
> Sure, I understand the thought process behind it. Would a helper to do it
> be OK? The justification for the helper is below.
>
>>
>> can't we just say that if the returned value is zero then no domains were
>> woken (which would provide a definite answer if a single domain was
>> requested) ?
>
> I am concerned about the scenario where the user ends up treating a
> non-zero return for FORCEWAKE_ALL as success.
while a user can always do something wrong, note that the statement above
does not say that _all_ requested domains are awake if ref != 0
>
> Having multiple success or failure conditions for the same API, based on
> the input parameter, looks like bad design to me.
it's not multiple, in fact it is unified:

1) ALL_DOMAINS

    ref = xe_force_wake_get(fw, ALL_DOMAINS)
    if (!ref)
        return -EIO

    // explicit check
    if (xe_force_wake_is_awake(fw, SINGLE_DOMAIN))
        ...

    xe_force_wake_put(fw, ref)

2a) SINGLE_DOMAIN

    ref = xe_force_wake_get(fw, SINGLE_DOMAIN)
    if (!ref)
        return -EIO

    // explicit check
    if (xe_force_wake_is_awake(fw, SINGLE_DOMAIN))
        ...

    xe_force_wake_put(fw, ref)

and what we did is just an optimization for SINGLE_DOMAIN, where a non-zero
ref already implies that the domain is awake:

2b) SINGLE_DOMAIN

    ref = xe_force_wake_get(fw, SINGLE_DOMAIN)
    if (!ref)
        return -EIO

    BUG_ON(!xe_force_wake_is_awake(fw, SINGLE_DOMAIN))

    xe_force_wake_put(fw, ref)
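
To make the three patterns above a bit more concrete, here is a minimal userspace sketch in C. Everything in it (struct fw_sim, sim_get(), sim_put(), sim_is_awake(), the DOM_* masks and use_domain()) is a hypothetical stand-in and not the xe API; the "dead" field is just a test knob to simulate a domain that never acknowledges. It only tries to show that the caller code keeps the same shape whether ALL or a single domain was requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical two-domain model; none of these names are the xe API. */
#define DOM_GT      (1u << 0)
#define DOM_RENDER  (1u << 1)
#define DOM_ALL     (DOM_GT | DOM_RENDER)

struct fw_sim {
    unsigned int refs[2];   /* per-domain reference counts */
    unsigned int awake;     /* mask of domains currently awake */
    unsigned int dead;      /* test knob: domains that never ack a wake */
};

/* Returns a reference mask; 0 means no domain could be referenced. */
static unsigned int sim_get(struct fw_sim *fw, unsigned int domains)
{
    unsigned int ref = 0;

    for (unsigned int i = 0; i < 2; i++) {
        unsigned int bit = 1u << i;

        if (!(domains & bit) || (fw->dead & bit))
            continue;
        fw->refs[i]++;
        fw->awake |= bit;
        ref |= bit;
    }
    return ref;
}

/* Drops exactly the references recorded in 'ref'. */
static void sim_put(struct fw_sim *fw, unsigned int ref)
{
    for (unsigned int i = 0; i < 2; i++) {
        unsigned int bit = 1u << i;

        if (!(ref & bit))
            continue;
        assert(fw->refs[i] > 0);
        if (--fw->refs[i] == 0)
            fw->awake &= ~bit;
    }
}

static bool sim_is_awake(const struct fw_sim *fw, unsigned int domains)
{
    return (fw->awake & domains) == domains;
}

/* Same call shape for DOM_ALL and for a single domain. */
int use_domain(struct fw_sim *fw, unsigned int request, unsigned int needed)
{
    unsigned int ref = sim_get(fw, request);
    int err = 0;

    if (!ref)
        return -EIO;        /* nothing was woken, nothing to put */
    if (!sim_is_awake(fw, needed))
        err = -ETIMEDOUT;   /* explicit check for the domain we rely on */
    /* ... otherwise the registers behind 'needed' would be used here ... */
    sim_put(fw, ref);
    return err;
}
```

The error codes in use_domain() are arbitrary picks for the sketch; the point is only that the caller hands back whatever sim_get() returned, without interpreting it.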
>
> ref = xe_force_wake_get(fw, GT)
> if (ref) -> means success condn
>
> whereas
>
> ref = xe_force_wake_get(fw, ALL)
> if (ref) -> doesn't necessarily mean all domains were successfully awakened.
>
> User can very easily interpret it wrongly:
> ref = xe_force_wake_get(fw, ALL)
> if(!ref)
> error_handling;
>
> /* Irrespective of failure or success of xe_force_wake_get code reaches
> here */
> do_multiple_operations(); /* Needed all fw domains supported on GT to be
> awake */
>
> xe_force_wake_put(fw, ref);
>
> We have use-cases throughout the driver where user relies on success/
> failure for FORCEWAKE_ALL domains.
maybe for a more consistent API we should have two functions:

a) pass only if _all_ domains are awake

    ref = xe_force_wake_get(fw, ALL_DOMAINS)
    if (!ref)
        return -EIO

    // no need for explicit checks
    BUG_ON(!xe_force_wake_is_awake(fw, FOO_DOMAIN))
    BUG_ON(!xe_force_wake_is_awake(fw, BAR_DOMAIN))
    ...

    xe_force_wake_put(fw, ref)

b) fail only if _none_ of the domains are awake

    ref = xe_force_wake_get(fw, ALL_DOMAINS)
    if (!ref)
        return -EIO

    // must do explicit checks
    if (xe_force_wake_is_awake(fw, SINGLE_DOMAIN))
        ...

    xe_force_wake_put(fw, ref)
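
A rough sketch of how the two variants could differ, again as a userspace model in C. The names sim_get_any(), sim_get_all(), sim_put() and struct fw_sim are made up for illustration (the two pseudocode snippets above deliberately use the same function name), and the "dead" mask simulates domains whose wake request times out. The assumption here is that the strict variant rolls back any partial references before reporting failure, so the caller has nothing to put when it gets 0.

```c
#include <assert.h>

#define NUM_DOMAINS 2u

/* Hypothetical model; not the xe driver structures. */
struct fw_sim {
    unsigned int refs[NUM_DOMAINS]; /* per-domain reference counts */
    unsigned int dead;              /* test knob: domains that never ack */
};

/* b) best effort: mask of domains actually referenced, 0 only if none. */
unsigned int sim_get_any(struct fw_sim *fw, unsigned int domains)
{
    unsigned int ref = 0;

    for (unsigned int i = 0; i < NUM_DOMAINS; i++) {
        unsigned int bit = 1u << i;

        if ((domains & bit) && !(fw->dead & bit)) {
            fw->refs[i]++;
            ref |= bit;
        }
    }
    return ref;
}

/* Drops the references recorded in 'ref'. */
void sim_put(struct fw_sim *fw, unsigned int ref)
{
    for (unsigned int i = 0; i < NUM_DOMAINS; i++) {
        if (ref & (1u << i)) {
            assert(fw->refs[i] > 0);
            fw->refs[i]--;
        }
    }
}

/* a) strict: all requested domains or nothing; partial refs are undone. */
unsigned int sim_get_all(struct fw_sim *fw, unsigned int domains)
{
    unsigned int ref = sim_get_any(fw, domains);

    if (ref != domains) {
        sim_put(fw, ref);
        return 0;
    }
    return ref;
}
```

With the strict variant a non-zero ref means every requested domain is held, so no per-domain check is needed afterwards; with the best-effort variant the caller still has to check the domains it depends on.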
>
>>
>> and for the xe_force_wake_get(fw, ALL_DOMAINS) case, we can provide
>> helper function that will check if specified domain is really awake:
>>
>> ref = xe_force_wake_get(fw, ALL_DOMAINS)
>>
>> if (ref) {
>> xe_force_wake_is_awake(fw, SINGLE_DOMAIN)
>
>
> I assume this API is to check whether ref has the domain in it or not. It
> would be a good helper to have for such scenarios, which are currently not
> present in the driver.
there could be two variants, one that checks fw

    xe_force_wake_is_awake(fw, SINGLE_DOMAIN)

and another that checks ref

    xe_force_wake_ref_has_domain(ref, SINGLE_DOMAIN)

but the only benefit of the latter is that it could be lightweight, as
otherwise the two must always agree:

    BUG_ON(xe_force_wake_is_awake(fw, SINGLE_DOMAIN) ^
           xe_force_wake_ref_has_domain(ref, SINGLE_DOMAIN))
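
As an illustration of why the ref-based variant could be lightweight: assuming the returned ref really is a plain bitmask of held domains (an implementation detail the caller would not normally rely on), the check reduces to a bit test with no lock and no access to fw. The helper name and signature below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if every bit of 'domain' is present in 'ref'.
 * Pure bit test; it needs neither the fw structure nor its lock.
 */
static inline bool fw_ref_has_domain(unsigned int ref, unsigned int domain)
{
    assert(domain != 0);    /* an empty domain mask is a caller bug */
    return (ref & domain) == domain;
}
```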
>
>
>>
>> xe_force_wake_put(fw, ref)
>> }
>>
>
> I believe a helper to determine success or failure of force_wake_get,
> instead of caller doing it will streamline all the domains and usecases.
>
> int xe_force_wake_get_status(enum domain, unsigned int fw_ref)
> {
> return (fw_ref == domain) ? 0 : -ETIMEDOUT;
> }
I'm not sure we should provide an error code; IMO a simple bool function
would be better, and the caller can decide what to do next
>
> Usecases:
>
> A) No error check, just continue in case of failure too.
>
> ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
> do_operations();
> xe_force_wake_put(fw, ref);
yep
and inside do_operations() we can always use xe_force_wake_is_awake to
check every domain
>
> B) error check, abort in case of failure.
>
> ref = xe_force_wake_get(fw, domain /* can be ALL_DOMAIN */);
> int err = xe_force_wake_get_status(domain, ref);
> if(err) {
> xe_force_wake_put(fw, ref);
> return err;
> }
> do_operations();
> xe_force_wake_put(fw, ref);
nay, too complicated, better:

    ref = xe_force_wake_get(fw, ALL);
    err = xe_force_wake_is_awake(fw, ALL) ? do_operations() : -EFATAL;
    xe_force_wake_put(fw, ref);
>
> c) get all domain, but check specific domain:
>
> ref = xe_force_wake_get(fw, ALL_domain);
> if (xe_force_wake_get_status(domain, ref))
> dmesg_warn( "unable to awake all requested domain \n");
IIRC in xe_force_wake_get() there is one WARN already
>
> if (xe_fwref_has_domain(fw, SINGLE_DOMAIN))
> do_operations()
>
> xe_force_wake_put(fw, ref);
>
so maybe simpler:

    ref = xe_force_wake_get(fw, ALL);
    err = xe_force_wake_is_awake(fw, FOO) ? do_operations() : -EMINOR;
    xe_force_wake_put(fw, ref);
>>> Otherwise, the operation is considered a failure, and
>>> + * the caller should handle the failure case, potentially returning
>>> + * -ETIMEDOUT.
>>> + */
>>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>>> + enum xe_force_wake_domains domains)
>>> {
>>> struct xe_gt *gt = fw->gt;
>>> struct xe_force_wake_domain *domain;
>>> - enum xe_force_wake_domains tmp, woken = 0;
>>> + unsigned int tmp, ret, awake_rqst = 0, awake_failed = 0;
>>> unsigned long flags;
>>> - int ret = 0;
>>> spin_lock_irqsave(&fw->lock, flags);
>>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>> if (!domain->ref++) {
>>> - woken |= BIT(domain->id);
>>> + awake_rqst |= BIT(domain->id);
>>> domain_wake(gt, domain);
>>> }
>>> }
>>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>>> - ret |= domain_wake_wait(gt, domain);
>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>> + if (domain_wake_wait(gt, domain) == 0) {
>>> + fw->awake_domains |= BIT(domain->id);
>>> + } else {
>>> + awake_failed |= BIT(domain->id);
>>> + --domain->ref;
>>> + }
>>> }
>>> - fw->awake_domains |= woken;
>>> + ret = (domains & ~awake_failed);
>>> spin_unlock_irqrestore(&fw->lock, flags);
>>> + xe_gt_WARN(gt, awake_failed, "domain%s %#x failed to
>>> acknowledgment awake\n",
>>> + str_plural(hweight_long(awake_failed)), awake_failed);
>>> +
>>> return ret;
>>> }
>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.h b/drivers/gpu/drm/
>>> xe/xe_force_wake.h
>>> index a2577672f4e3..6c1ade39139b 100644
>>> --- a/drivers/gpu/drm/xe/xe_force_wake.h
>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.h
>>> @@ -15,8 +15,8 @@ void xe_force_wake_init_gt(struct xe_gt *gt,
>>> struct xe_force_wake *fw);
>>> void xe_force_wake_init_engines(struct xe_gt *gt,
>>> struct xe_force_wake *fw);
>>> -int xe_force_wake_get(struct xe_force_wake *fw,
>>> - enum xe_force_wake_domains domains);
>>> +unsigned int xe_force_wake_get(struct xe_force_wake *fw,
>>> + enum xe_force_wake_domains domains);
>>> int xe_force_wake_put(struct xe_force_wake *fw,
>>> enum xe_force_wake_domains domains);
>>>
>>
>
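
For reference, the refcount handling in the quoted patch (take references first, then drop the reference again for every domain whose wake acknowledgment times out) can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct fw_model, fw_model_get(), fw_model_put() and the ack_fails test knob are hypothetical, there is no locking, and the ack wait is replaced by a mask lookup.

```c
#include <assert.h>

#define NUM_DOMAINS 3u

/* Hypothetical userspace model; field names only loosely follow the patch. */
struct fw_model {
    unsigned int ref[NUM_DOMAINS];  /* per-domain reference counts */
    unsigned int awake_domains;     /* mask of domains known to be awake */
    unsigned int ack_fails;         /* test knob: domains whose ack times out */
};

/* Returns the mask of requested domains on which a reference is now held. */
unsigned int fw_model_get(struct fw_model *fw, unsigned int domains)
{
    unsigned int awake_rqst = 0, awake_failed = 0;

    /* Pass 1: take references, remember which domains need waking. */
    for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
        if ((domains & (1u << id)) && !fw->ref[id]++)
            awake_rqst |= 1u << id;
    }

    /* Pass 2: "wait" for acks; on timeout drop the reference again. */
    for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
        unsigned int bit = 1u << id;

        if (!(awake_rqst & bit))
            continue;
        if (fw->ack_fails & bit) {
            awake_failed |= bit;
            --fw->ref[id];
        } else {
            fw->awake_domains |= bit;
        }
    }

    return domains & ~awake_failed;
}

/* Releases exactly the domains recorded in the mask returned by get. */
void fw_model_put(struct fw_model *fw, unsigned int ref_mask)
{
    for (unsigned int id = 0; id < NUM_DOMAINS; id++) {
        unsigned int bit = 1u << id;

        if (!(ref_mask & bit))
            continue;
        assert(fw->ref[id] > 0);
        if (!--fw->ref[id])
            fw->awake_domains &= ~bit;
    }
}
```

Because a failed domain never keeps its reference, a later get on that domain starts from a zero count and issues a fresh wake request instead of assuming the domain is already awake.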