From: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
To: "Nilawar, Badal" <badal.nilawar@intel.com>,
Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
Lucas De Marchi <lucas.demarchi@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [RFC 1/9] drm/xe: Error handling in xe_force_wake_get()
Date: Wed, 11 Sep 2024 12:21:05 +0530
Message-ID: <eb8b96e2-d593-4c15-9116-f4dca3976dd7@intel.com>
In-Reply-To: <5d29a6d1-80ea-4b21-b14c-619d719a7d23@intel.com>
On 10-09-2024 23:57, Nilawar, Badal wrote:
>
>
> On 06-09-2024 21:48, Rodrigo Vivi wrote:
>> On Fri, Sep 06, 2024 at 01:32:38AM +0530, Ghimiray, Himal Prasad wrote:
>>>
>>>
>>> On 06-09-2024 00:59, Rodrigo Vivi wrote:
>>>> On Fri, Aug 30, 2024 at 10:53:18AM +0530, Himal Prasad Ghimiray wrote:
>>>>> If an acknowledgment timeout occurs for a domain awake request, put to
>>>>> sleep all domains awakened by the caller and decrease the reference
>>>>> count for all requested domains. This prevents xe_force_wake_get()
>>>>> from
>>>>> leaving an unhandled reference count in case of failure.
>>>>> While at it, add simple kernel-doc for xe_force_wake_get() and
>>>>> xe_force_wake_put() functions.
>>>>>
>>>>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/xe/xe_force_wake.c | 52
>>>>> +++++++++++++++++++++++++++---
>>>>> 1 file changed, 47 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> index b263fff15273..8aa8d9b41052 100644
>>>>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>> @@ -150,31 +150,73 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>>>> (ffs(tmp__) - 1))) && \
>>>>> domain__->reg_ctl.addr)
>>>>> +/**
>>>>> + * xe_force_wake_get : Increase the domain refcount; if it was 0
>>>>> initially, wake the domain
>>>>> + * @fw: struct xe_force_wake
>>>>> + * @domains: forcewake domains to get refcount on
>>>>> + *
>>>>> + * Increment refcount for the force-wake domain. If the domain is
>>>>> + * asleep, awaken it and wait for acknowledgment within the specified
>>>>> + * timeout. If a timeout occurs, decrement the refcount and put the
>>>>> + * caller awaken domains to sleep.
>>>>> + *
>>>>> + * Return: 0 on success or 1 on ack timeout from domains.
>>>>
>>>> * Returns 0 for success, negative error code otherwise.
>>>
>>> Hi Rodrigo,
>>>
>>> Sure. Will fix in next version.
>>>
>>>>
>>>>> + */
>>>>> int xe_force_wake_get(struct xe_force_wake *fw,
>>>>> enum xe_force_wake_domains domains)
>>>>> {
>>>>> struct xe_gt *gt = fw->gt;
>>>>> struct xe_force_wake_domain *domain;
>>>>> - enum xe_force_wake_domains tmp, woken = 0;
>>>>> + enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
>>>>> unsigned long flags;
>>>>> int ret = 0;
>>>>> spin_lock_irqsave(&fw->lock, flags);
>>>>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>>>> if (!domain->ref++) {
>>>>> - woken |= BIT(domain->id);
>>>>> + awake_rqst |= BIT(domain->id);
>>>>> domain_wake(gt, domain);
>>>>> }
>>>>> }
>>>>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>>>>> - ret |= domain_wake_wait(gt, domain);
>>>>
>>>> now you suppress the mmio error code...
>>>> should be better to find a way to propagate that.
>>>
>>>
>>> AFAIU the only possible error code from domain_wake_wait() is
>>> -ETIMEDOUT. I was planning to assign the same to ret below, which I
>>> missed in the RFC.
>>>
>>>
>>>>
>>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>>> + if (domain_wake_wait(gt, domain) == 0)
>>>>> + awake_ack |= BIT(domain->id);
>>>>> + }
>>>>> +
>>>>> + ret = (awake_ack == awake_rqst) ? 0 : 1;
>>>>
>>>> s/1/-EIO/ ?
>>>
>>> How about -ETIMEDOUT? It is the same error that would be propagated in
>>> case of a domain_wake_wait() failure.
>>
>> hmm, I guess it makes more sense indeed.
> In the patch 9 discussion we are aligning on returning a mask of awake
> domains. Make sure that whenever _get needs to return an error code,
> -ETIMEDOUT is maintained. Maybe document this as a guideline.
Thanks for the input, very valid point. Will try to document it.
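
To make the guideline concrete, here is a minimal userspace sketch of the
return convention discussed above. It is not driver code: fw_get_result()
is a hypothetical helper, and the plain unsigned masks stand in for
enum xe_force_wake_domains.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper, not part of the xe driver: maps the outcome of a
 * wake request to the return convention discussed in this thread.
 */
static int fw_get_result(unsigned int requested, unsigned int acked)
{
	/* Only domains that were actually requested can have acked. */
	assert((acked & ~requested) == 0);

	/* Any requested domain without an ack is reported as a timeout. */
	return acked == requested ? 0 : -ETIMEDOUT;
}
```

The point is only that every failure path of _get reports the same
-ETIMEDOUT that domain_wake_wait() would have returned, never a bare 1.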
>
>>
>>>
>>>>
>>>>> +
>>>>> + /*
>>>>> + * If @domains is XE_FORCEWAKE_ALL and an acknowledgment times
>>>>> out
>>>>> + * for any domain, decrease the reference count and put the awake
>>>>> + * domains to sleep. For individual domains, just decrement the
>>>>> + * reference count.
>>>>> + */
>>>>> + if (ret) {
>>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>>> + if (!--domain->ref && (awake_ack & BIT(domain->id)))
>>>>> + domain_sleep(gt, domain);
>>>>
>>>> wonder if it would help to extract this in a separate function to be
>>>> used here and in the -put function.
>>>
>>> Let me think around that.
>>>
>>>>
>>>> But more than that, I have a question here...
>>>> Do we really need to sleep other domains if we are not getting ack
>>>> from a certain domain?
>>>> Doesn't it generally mean that we are busted anyway?
>>>
>>> I have no strong opinion on this; the main thing is that the refcount
>>> shouldn't be left incremented.
>>>
>>>>
>>>> But also, if we really need to sleep, then shouldn't we also call the
>>>> sleep function even for the domains that didn't ack? Perhaps the ack
>>>> timed out, but the domain really woke up? How sure are we that this is
>>>> not possible?
>>>
>>> I didn't want to change the hw state by calling sleep for the "ack
>>> failed" domain, so that, if necessary, debug tools (PythonSV) can help
>>> us pinpoint the exact failure state of the HW registers.
> Agreed, let’s avoid putting a failed domain to sleep as it will aid in
> debugging. It’s possible that the acknowledgment timed out but the
> domain still woke up. As discussed in patch 9, subsequent forcewake
> get/put calls will put the domain to sleep. The only concern is that if
> the device is idle and forcewake is triggered via a sysfs/debugfs entry,
> the domain may remain awake until a forcewake get/put call is made.
That is true. I think this is something we will need to live with, in
terms of keeping the hardware state the same until the next get/put.
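
For reference, a rough userspace model of the rollback behaviour we are
converging on. This is a sketch under assumptions, not the driver code:
struct fw_model, fw_model_get() and ack_fail_mask are made-up names, and
the "hardware" is just an array of booleans. It follows the commit message
in dropping the reference for every requested domain, which differs from
the RFC loop that only walks awake_rqst.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_DOMAINS 4

/* Simplified stand-in for struct xe_force_wake; all names are illustrative. */
struct fw_model {
	unsigned int ref[NUM_DOMAINS];  /* per-domain reference count */
	bool hw_wake_req[NUM_DOMAINS];  /* wake request asserted in "hardware" */
	unsigned int ack_fail_mask;     /* domains whose wake ack times out */
	unsigned int awake_domains;     /* domains known to be awake */
};

static int fw_model_get(struct fw_model *fw, unsigned int domains)
{
	unsigned int awake_rqst = 0, awake_ack;
	int i;

	/* Take a reference on each requested domain; wake it on 0 -> 1. */
	for (i = 0; i < NUM_DOMAINS; i++) {
		if (!(domains & (1u << i)))
			continue;
		if (!fw->ref[i]++) {
			awake_rqst |= 1u << i;
			fw->hw_wake_req[i] = true;
		}
	}

	/* Model the ack wait: domains in ack_fail_mask never acknowledge. */
	awake_ack = awake_rqst & ~fw->ack_fail_mask;

	if (awake_ack == awake_rqst) {
		fw->awake_domains |= awake_ack;
		return 0;
	}

	/*
	 * Rollback: drop the reference taken above for every requested
	 * domain. Only domains that acked are put back to sleep; a domain
	 * whose ack timed out is left untouched so its state can be debugged.
	 */
	for (i = 0; i < NUM_DOMAINS; i++) {
		if (!(domains & (1u << i)))
			continue;
		if (!--fw->ref[i] && (awake_ack & (1u << i)))
			fw->hw_wake_req[i] = false;
	}

	return -ETIMEDOUT;
}
```

With domain 1 failing to ack, a get on domains 0 and 1 returns -ETIMEDOUT,
leaves both refcounts at 0, puts domain 0 back to sleep, and keeps the wake
request for domain 1 asserted until the next get/put.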
>
> Regards,
> Badal
>>>
>>>
>>>>
>>>>> + }
>>>>> + awake_ack = 0;
>>>>> }
>>>>> - fw->awake_domains |= woken;
>>>>> +
>>>>> + fw->awake_domains |= awake_ack;
>>>>> spin_unlock_irqrestore(&fw->lock, flags);
>>>>> return ret;
>>>>> }
>>>>> +/**
>>>>> + * xe_force_wake_put - Decrement the refcount and put domain to
>>>>> sleep if refcount becomes 0
>>>>> + * @fw: Pointer to the force wake structure
>>>>> + * @domains: forcewake domains to put reference
>>>>> + *
>>>>> + * This function reduces the reference counts for specified
>>>>> domains. If
>>>>> + * refcount for any of the specified domain reaches 0, it puts the
>>>>> domain to sleep
>>>>> + * and waits for acknowledgment for domain to sleep within
>>>>> specified timeout.
>>>>> + * Ensure this function is called only in case of successful
>>>>> xe_force_wake_get().
>>>>> + *
>>>>> + * Returns 0 in case of success or non-zero in case of timeout of ack
>>>>> + */
>>>>> int xe_force_wake_put(struct xe_force_wake *fw,
>>>>> enum xe_force_wake_domains domains)
>>>>> {
>>>>> --
>>>>> 2.34.1
>>>>>