From: "Nilawar, Badal" <badal.nilawar@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>,
"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Lucas De Marchi <lucas.demarchi@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [RFC 1/9] drm/xe: Error handling in xe_force_wake_get()
Date: Tue, 10 Sep 2024 23:57:56 +0530
Message-ID: <5d29a6d1-80ea-4b21-b14c-619d719a7d23@intel.com>
In-Reply-To: <ZtsrTvFHbMuNvXvk@intel.com>
On 06-09-2024 21:48, Rodrigo Vivi wrote:
> On Fri, Sep 06, 2024 at 01:32:38AM +0530, Ghimiray, Himal Prasad wrote:
>>
>>
>> On 06-09-2024 00:59, Rodrigo Vivi wrote:
>>> On Fri, Aug 30, 2024 at 10:53:18AM +0530, Himal Prasad Ghimiray wrote:
>>>> If an acknowledgment timeout occurs for a domain awake request, put to
>>>> sleep all domains awakened by the caller and decrease the reference
>>>> count for all requested domains. This prevents xe_force_wake_get() from
>>>> leaving an unhandled reference count in case of failure.
>>>> While at it, add simple kernel-doc for xe_force_wake_get() and
>>>> xe_force_wake_put() functions.
>>>>
>>>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>> ---
>>>> drivers/gpu/drm/xe/xe_force_wake.c | 52 +++++++++++++++++++++++++++---
>>>> 1 file changed, 47 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
>>>> index b263fff15273..8aa8d9b41052 100644
>>>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>>>> @@ -150,31 +150,73 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>>> (ffs(tmp__) - 1))) && \
>>>> domain__->reg_ctl.addr)
>>>> +/**
>>>> + * xe_force_wake_get : Increase the domain refcount; if it was 0 initially, wake the domain
>>>> + * @fw: struct xe_force_wake
>>>> + * @domains: forcewake domains to get refcount on
>>>> + *
>>>> + * Increment refcount for the force-wake domain. If the domain is
>>>> + * asleep, awaken it and wait for acknowledgment within the specified
>>>> + * timeout. If a timeout occurs, decrement the refcount and put the
>>>> + * caller awaken domains to sleep.
>>>> + *
>>>> + * Return: 0 on success or 1 on ack timeout from domains.
>>>
>>> * Returns 0 for success, negative error code otherwise.
>>
>> Hi Rodrigo,
>>
>> Sure. Will fix in next version.
>>
>>>
>>>> + */
>>>> int xe_force_wake_get(struct xe_force_wake *fw,
>>>> enum xe_force_wake_domains domains)
>>>> {
>>>> struct xe_gt *gt = fw->gt;
>>>> struct xe_force_wake_domain *domain;
>>>> - enum xe_force_wake_domains tmp, woken = 0;
>>>> + enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
>>>> unsigned long flags;
>>>> int ret = 0;
>>>> spin_lock_irqsave(&fw->lock, flags);
>>>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>>> if (!domain->ref++) {
>>>> - woken |= BIT(domain->id);
>>>> + awake_rqst |= BIT(domain->id);
>>>> domain_wake(gt, domain);
>>>> }
>>>> }
>>>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>>>> - ret |= domain_wake_wait(gt, domain);
>>>
>>> now you suppress the mmio error code...
>>> should be better to find a way to propagate that.
>>
>>
>> AFAIU the only possible error code from domain_wake_wait is -ETIMEDOUT, was
>> planning to assign same to ret below, which I missed in the RFC.
>>
>>
>>>
>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>> + if (domain_wake_wait(gt, domain) == 0)
>>>> + awake_ack |= BIT(domain->id);
>>>> + }
>>>> +
>>>> + ret = (awake_ack == awake_rqst) ? 0 : 1;
>>>
>>> s/1/-EIO/ ?
>>
>> How about -ETIMEDOUT ? Since this is same error which will be propagated in
>> case of domain_wake_wait failure ?
>
> hmm, I guess it makes more sense indeed.
In the patch 9 discussion we are converging on returning a mask of the
awake domains. Whenever _get does need to return an error code, please
make sure it stays -ETIMEDOUT. Maybe document this as a guideline.
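
To illustrate that guideline, here is a minimal sketch, assuming _get ends
up returning a mask of the awake domains. fw_mask_to_errno() is a
hypothetical helper, not an existing xe function; it only shows how a
caller that needs an error code could keep -ETIMEDOUT:

```c
#include <errno.h>

/*
 * Hypothetical helper (not part of the xe driver): convert a
 * "mask of awake domains" result into the error-code convention.
 * @requested: domains the caller asked for
 * @awake:     domains that actually acked the wake request
 */
static int fw_mask_to_errno(unsigned int requested, unsigned int awake)
{
	/* Every requested domain acked the wake request: success. */
	if ((awake & requested) == requested)
		return 0;

	/* At least one requested domain did not ack in time. */
	return -ETIMEDOUT;
}
```
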
>
>>
>>>
>>>> +
>>>> + /*
>>>> + * If @domains is XE_FORCEWAKE_ALL and an acknowledgment times out
>>>> + * for any domain, decrease the reference count and put the awake
>>>> + * domains to sleep. For individual domains, just decrement the
>>>> + * reference count.
>>>> + */
>>>> + if (ret) {
>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>> + if (!--domain->ref && (awake_ack & BIT(domain->id)))
>>>> + domain_sleep(gt, domain);
>>>
>>> wonder if it would help to extract this in a separate function to be
>>> used here and in the -put function.
>>
>> Let me think around that.
>>
>>>
>>> But more than that, I have a question here...
>>> Do we really need to sleep other domains if we are not getting ack from certain domain?
>>> Doesn't it generally mean that we are busted anyway?
>>
>> I have no strong opinion on this, main thing is refcount shouldn't be
>> incremented.
>>
>>>
>>> But also, if we really need to sleep, then perhaps shouldn't we also
>>> call the sleep function even from the guys who didn't ack? perhaps the ack
>>> timedout, but it really woke-up? how sure we are that this is not possible?
>>
>> I didn't want to change the hw state by calling sleep for the "ack failed"
>> domain, so if necessary, Debug tools (PythonSV) can help us pinpoint the
>> exact failure state of the HW registers.
Agreed, let’s avoid putting a failed domain to sleep, since leaving it
untouched will aid debugging. It’s possible that the acknowledgment timed
out but the domain still woke up. As discussed in patch 9, subsequent
forcewake get/put calls will put the domain to sleep. The only concern is
that if the device is idle and forcewake is triggered via a sysfs/debugfs
entry, the domain may remain awake until the next forcewake get/put call.
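
To make the intended behaviour concrete, below is a small user-space model,
a sketch only and not driver code. struct fw_model, model_wake_wait() and
model_force_wake_get() are hypothetical stand-ins; ack_fail_mask is a test
knob that simulates an ack timeout. One deliberate choice: the rollback
walks all requested domains (as the commit message describes), so a domain
that was already awake also gets its reference dropped on failure.

```c
#include <errno.h>

#define NUM_DOMAINS 4
#define BIT(n) (1u << (n))

/* Simplified stand-in for the forcewake state (assumption, not xe code). */
struct fw_model {
	unsigned int ref[NUM_DOMAINS];	/* per-domain reference count */
	unsigned int hw_awake_req;	/* wake requests written to "hardware" */
	unsigned int awake_domains;	/* domains software considers awake */
	unsigned int ack_fail_mask;	/* knob: domains whose ack times out */
};

/* Simulated ack wait: fails with -ETIMEDOUT for domains in ack_fail_mask. */
static int model_wake_wait(const struct fw_model *fw, int id)
{
	return (fw->ack_fail_mask & BIT(id)) ? -ETIMEDOUT : 0;
}

static int model_force_wake_get(struct fw_model *fw, unsigned int domains)
{
	unsigned int awake_rqst = 0, awake_ack = 0;
	int id;

	/* Take a reference; request a wake only on the 0 -> 1 transition. */
	for (id = 0; id < NUM_DOMAINS; id++) {
		if (!(domains & BIT(id)))
			continue;
		if (!fw->ref[id]++) {
			awake_rqst |= BIT(id);
			fw->hw_awake_req |= BIT(id);
		}
	}

	/* Record which of the freshly requested domains acked. */
	for (id = 0; id < NUM_DOMAINS; id++) {
		if ((awake_rqst & BIT(id)) && model_wake_wait(fw, id) == 0)
			awake_ack |= BIT(id);
	}

	if (awake_ack == awake_rqst) {
		fw->awake_domains |= awake_ack;
		return 0;
	}

	/*
	 * Failure: drop every reference taken above. Only domains that
	 * acked are put back to sleep; a domain whose ack timed out is
	 * left untouched so its hardware state can still be inspected.
	 */
	for (id = 0; id < NUM_DOMAINS; id++) {
		if (!(domains & BIT(id)))
			continue;
		if (!--fw->ref[id] && (awake_ack & BIT(id)))
			fw->hw_awake_req &= ~BIT(id);
	}

	return -ETIMEDOUT;
}
```

In this model a failed get leaves all refcounts as they were before the
call, and the wake request of the non-acking domain stays set, which is the
case where the domain may remain awake until a later get/put pair.
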
Regards,
Badal
>>
>>
>>>
>>>> + }
>>>> + awake_ack = 0;
>>>> }
>>>> - fw->awake_domains |= woken;
>>>> +
>>>> + fw->awake_domains |= awake_ack;
>>>> spin_unlock_irqrestore(&fw->lock, flags);
>>>> return ret;
>>>> }
>>>> +/**
>>>> + * xe_force_wake_put - Decrement the refcount and put domain to sleep if refcount becomes 0
>>>> + * @fw: Pointer to the force wake structure
>>>> + * @domains: forcewake domains to put reference
>>>> + *
>>>> + * This function reduces the reference counts for specified domains. If
>>>> + * refcount for any of the specified domain reaches 0, it puts the domain to sleep
>>>> + * and waits for acknowledgment for domain to sleep within specified timeout.
>>>> + * Ensure this function is called only in case of successful xe_force_wake_get().
>>>> + *
>>>> + * Returns 0 in case of success or non-zero in case of timeout of ack
>>>> + */
>>>> int xe_force_wake_put(struct xe_force_wake *fw,
>>>> enum xe_force_wake_domains domains)
>>>> {
>>>> --
>>>> 2.34.1
>>>>