From: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
To: Matthew Brost <matthew.brost@intel.com>,
"Nilawar, Badal" <badal.nilawar@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>,
<intel-xe@lists.freedesktop.org>,
Rodrigo Vivi <rodrigo.vivi@intel.com>,
Lucas De Marchi <lucas.demarchi@intel.com>,
Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [PATCH v2 01/23] drm/xe: Error handling in xe_force_wake_get()
Date: Wed, 18 Sep 2024 12:02:14 +0530 [thread overview]
Message-ID: <271c25c3-401c-43b3-8d9c-dc13027a6ea7@intel.com> (raw)
In-Reply-To: <ZunPgOAndPtrl74H@DUT025-TGLU.fm.intel.com>
On 18-09-2024 00:20, Matthew Brost wrote:
> On Tue, Sep 17, 2024 at 11:18:47AM +0530, Nilawar, Badal wrote:
>>
>>
>> On 13-09-2024 18:47, Ghimiray, Himal Prasad wrote:
>>>
>>>
>>> On 13-09-2024 16:56, Michal Wajdeczko wrote:
>>>>
>>>>
>>>> On 13.09.2024 05:59, Ghimiray, Himal Prasad wrote:
>>>>>
>>>>>
>>>>> On 13-09-2024 03:01, Michal Wajdeczko wrote:
>>>>>>
>>>>>>
>>>>>> On 12.09.2024 21:15, Himal Prasad Ghimiray wrote:
>>>>>>> If an acknowledgment timeout occurs for a domain awake request, do not
>>>>>>> increment the reference count for the domain. This ensures that
>>>>>>> subsequent _get calls do not incorrectly assume the
>>>>>>> domain is awake. The
>>>>>>> return value is a mask of domains whose reference counts were
>>>>>>> incremented, and these domains need to be released using
>>>>>>> xe_force_wake_put.
>>>>>>>
>>>>>>> The caller needs to compare the return value with the input domains to
>>>>>>> determine the success or failure of the operation and
>>>>>>> decide whether to
>>>>>>> continue or return accordingly.
>>>>>>>
>>>>>>> While at it, add simple kernel-doc for xe_force_wake_get()
>>>>>>>
>>>>>>> Cc: Badal Nilawar <badal.nilawar@intel.com>
>>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>>>> Cc: Nirmoy Das <nirmoy.das@intel.com>
>>>>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/xe/xe_force_wake.c | 35
>>>>>>> ++++++++++++++++++++++++ +-----
>>>>>>> 1 file changed, 29 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_force_wake.c
>>>>>>> b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>>>> index a64c14757c84..fa42d652d23f 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_force_wake.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_force_wake.c
>>>>>>> @@ -150,26 +150,49 @@ static int domain_sleep_wait(struct xe_gt *gt,
>>>>>>> (ffs(tmp__) - 1))) && \
>>>>>>> domain__->reg_ctl.addr)
>>>>>>> +/**
>>>>>>> + * xe_force_wake_get : Increase the domain refcount; if it was 0
>>>>>>> initially, wake the domain
>>>>>>
>>>>>> while likely this is still recognized by the kernel-doc tool, this is
>>>>>> not correct notation for the function() documentation
>>>>>
>>>>>
>>>>> I assume you are suggesting %s/xe_force_wake_get/xe_force_wake_get()
>>>>> will fix it.
>>>>>
>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://docs.kernel.org/doc-guide/kernel-doc.html#function-
>>>>>> documentation
>>>>>>
>>>>>>> + * @fw: struct xe_force_wake
>>>>>>> + * @domains: forcewake domains to get refcount on
>>>>>>> + *
>>>>>>> + * Increment refcount for the force-wake domain. If the domain is
>>>>>>> + * asleep, awaken it and wait for acknowledgment within the specified
>>>>>>> + * timeout. If a timeout occurs, decrement the refcount.
>>>>>>
>>>>>> not sure if doc shall be 1:1 of low level implementation details
>>>>>
>>>>> Does this sound okay ?
>>>>> This function takes references for the input @domains and wakes them if
>>>>> they are asleep.
>>>>>
>>>>>>
>>>>>>> + * The caller should compare the return value with the @domains to
>>>>>>> + * determine the success or failure of the operation.
>>>>>>> + *
>>>>>>> + * Return: mask of refcount increased domains.
>>>>>>
>>>>>> if we return a 'mask' then maybe it should be of 'unsigned int' type?
>>>>>
>>>>> Agreed. Will fix in next version.
>>>>>
>>>>>>
>>>>>>> If the return value is
>>>>>>> + * equal to the input parameter @domains, the operation is considered
>>>>>>> + * successful. Otherwise, the operation is considered a failure, and
>>>>>>> + * the caller should handle the failure case, potentially returning
>>>>>>> + * -ETIMEDOUT.
>>>>>>
>>>>>> it looks that all problems with the nice API is due to the
>>>>>> XE_FORCEWAKE_ALL that is not a single domain ID and requires extra care
>>>>>>
>>>>>> maybe there should be different pair of functions:
>>>>>
>>>>> I am not convinced with different pair of functions:
>>>>>
>>>>> In current implementation:
>>>>>
>>>>> int mask = xe_force_wake_get(fw, domains)
>>>>> if (mask != domains) {
>>>>> Non critical path continue with warning;
>>>>> or
>>>>> critical path:
>>>>> xe_force_wake_put(fw, mask);
>>>>> return -ETIMEDOUT;
>>>>> }
>>>>>
>>>>> do_ops;
>>>>> xe_force_wake_put(fw, mask);
>>>>> return err;
>>>>>
>>>>> Above flow remains intact irrespective of individual domains or
>>>>> FORCEWAKE_ALL.
>>>>>
>>>>> In case of individual domains if (mask != domains) can be replaced with
>>>>> (!mask) and user can avoid xe_force_wake_put(fw, mask) in failure path
>>>>> since mask is 0;
>>>>
>>>> so maybe we should have (by reinventing i915?):
>>>>
>>>> // opaque, but zero means failure/no domains are awake
>>>> typedef unsigned long xe_wakeref_t;
>>>>
>>>>
>>>> // caller should test for ref != 0
>>>> // but shall call put if ref != 0
>>>> xe_wakeref_t xe_force_wake_get(fw, enum xe_force_wake_domains d)
>>>>
>>>> // safe to call with ref == 0
>>>> void xe_force_wake_put(fw, xe_wakeref_t ref)
>>>>
>>>>
>>>> // helpers for critical work that must be sure about domain
>>>>
>>>> // compares opaque ref with explicit domain != ALL
>>>> // can be used by the code that obtained the ref
>>>> bool xe_wakeref_has_domain(xe_wakeref_t, enum xe_force_wake_domains d)
>>>>
>>>> // compares fw with explicit domain != ALL
>>>> // can be used by the code that does not have direct access to the ref
>>>> bool xe_force_wake_is_awake(fw, enum xe_force_wake_domains d)
>>>>
>>>>
>>>> // helpers for checking correctness
>>>> void xe_force_wake_assert_held(fw, enum xe_force_wake_domains d)
>>>>
>>>>
>>>> then usage would be:
>>>>
>>>> xe_wakeref_t ref;
>>>>
>>>> ref = xe_force_wake_get(fw, d);
>>>> if (ref) {
>>>> // ...
>>>> xe_force_wake_put(fw, ref);
>>>> }
>>>>
>>>> or:
>>>>
>>>> xe_wakeref_t ref;
>>>>
>>>> ref = xe_force_wake_get(fw, ALL);
>>>> if (xe_wakeref_has_domain(ref, d1))
>>>> // ... critical work1
>>>> if (xe_wakeref_has_domain(ref, d2))
>>>> // ... critical work2
>>>> xe_force_wake_put(fw, ref);
>>>>
>>>>
>>>> so above will be very similar to what you have but by having explicit
>>>> types IMO it will help connect all functions into proper use-case flow
>>>
>>>
>>> Agreed implementation/usage will be same, will use explicit type for
>>> clarity.
>>> IMO typedef unsigned int xe_wakeref_t is sufficient instead of
>>> typedef unsigned long xe_wakeref_t;
>>
>> I agree with this.
>>
>
> What? Really? I thought it was pretty clear rule in kernel programing
> not use typedefs [1]. Reading through conditions acceptable and I don't
> use anything applies to this series, maybe a) applies but not really
> convinced. The example in a) is a pte_t which can likely change based on
> platform target whereas here we only have one target and see no reason
> this needs to be opaque.
>
> Matt
>
> [1] https://www.kernel.org/doc/html/v4.14/process/coding-style.html#typedefs
While running checkpatch on my changes, patchwork had also issued a
WARNING: NEW_TYPEDEFS: do not add new typedefs. I reviewed the usage in
the Linux kernel tree and found it used in many places, which led me to
assume it was safe. I now realize that I should have been more careful
in understanding the context of its usage and referred to the kernel
coding guidelines. This was an oversight on my part.
Since this doesn’t impact the CI or runtime, I will avoid reverting to
unsigned int immediately and will hold off until I receive the other
review comments. I will incorporate the changes to revert it in
subsequent versions while also addressing the other review comments.
Thank you for bringing this to the attention.
BR
Himal
>
>> Regards,
>> Badal
>>>
>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> // for single domain where ret=0 is success, ret<0 is error
>>>>>
>>>>> This leads to caller only calling xe_force_wake_put incase of get
>>>>> success. so in case of caller continuing with failure, he will need to
>>>>> ensure the put is not called.
>>>>>
>>>>> for example:
>>>>> int ret;
>>>>>
>>>>> ret = xe_force_wake_get(fw, DOMAIN_GT);
>>>>> XE_WARN_ON(ret)
>>>>> if(!ret)
>>>>> xe_force_wake_put(fw, DOMAIN_GT);
>>>>>
>>>>>> int xe_force_wake_get(fw, enum xe_force_wake_domain_id id);
>>>>>> void xe_force_wake_put(fw, enum xe_force_wake_domain_id id);
>>>>>>
>>>>>> and
>>>>>>
>>>>>> // for all domain where ret=0 is success, ret<0 is error
>>>>>> int int xe_force_wake_get_all(fw);
>>>>>> void xe_force_wake_put_all(fw);
>>>>>
>>>>> In case of xe_force_wake_get_all(fw) failure, how the caller will know
>>>>> which domains got awake and which failed ?
>>>>>
>>>>> ret = xe_force_wake_get_all(fw);
>>>>> if(!ret)
>>>>> No way to put awake domains to sleep
>>>>
>>>> in case of failure, it would be the responsibility of the
>>>> xe_force_wake_get_all() to put all partial awakes immediately, since it
>>>> failed to awake all requested domains (same as in single domain case)
>>>>
>>>> but let's drop this idea
>>>>
>>>>>
>>>>>>
>>>>>> and
>>>>>>
>>>>>> // input: mask of domains, return: mask of domain
>>>>>> unsigned int xe_force_wake_get_mask(fw, mask);
>>>>>> void xe_force_wake_put_mask(fw, mask);
>>>>>>
>>>>>> this last one can be just main implementation (static or public if we
>>>>>> really want to continue with random set of enabled domains)
>>>>>>
>>>>>>> + */
>>>>>>> int xe_force_wake_get(struct xe_force_wake *fw,
>>>>>>> enum xe_force_wake_domains domains)
>>>>>>> {
>>>>>>> struct xe_gt *gt = fw->gt;
>>>>>>> struct xe_force_wake_domain *domain;
>>>>>>> - enum xe_force_wake_domains tmp, woken = 0;
>>>>>>> + enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
>>>>>>
>>>>>> it looks that you're abusing even more all enum variables by treating
>>>>>> them as plain integers
>>>>>
>>>>> Miss at my end. Will address them in next version.
>>>>>
>>>>>>
>>>>>>> unsigned long flags;
>>>>>>> - int ret = 0;
>>>>>>> + int ret = domains;
>>>>>>> spin_lock_irqsave(&fw->lock, flags);
>>>>>>> for_each_fw_domain_masked(domain, domains, fw, tmp) {
>>>>>>> if (!domain->ref++) {
>>>>>>> - woken |= BIT(domain->id);
>>>>>>> + awake_rqst |= BIT(domain->id);
>>>>>>> domain_wake(gt, domain);
>>>>>>> }
>>>>>>> }
>>>>>>> - for_each_fw_domain_masked(domain, woken, fw, tmp) {
>>>>>>> - ret |= domain_wake_wait(gt, domain);
>>>>>>> + for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
>>>>>>> + if (domain_wake_wait(gt, domain) == 0) {
>>>>>>> + awake_ack |= BIT(domain->id);
>>>>>>> + } else {
>>>>>>> + ret &= ~BIT(domain->id);
>>>>>>> + --domain->ref;
>>>>>>> + }
>>>>>>> }
>>>>>>> - fw->awake_domains |= woken;
>>>>>>> +
>>>>>>> + fw->awake_domains |= awake_ack;
>>>>>>> spin_unlock_irqrestore(&fw->lock, flags);
>>>>>>> return ret;
>>
next prev parent reply other threads:[~2024-09-18 6:34 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-12 19:15 [PATCH v2 00/23] Fix xe_force_wake_get() failure handling Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 01/23] drm/xe: Error handling in xe_force_wake_get() Himal Prasad Ghimiray
2024-09-12 21:31 ` Michal Wajdeczko
2024-09-13 3:59 ` Ghimiray, Himal Prasad
2024-09-13 11:26 ` Michal Wajdeczko
2024-09-13 13:17 ` Ghimiray, Himal Prasad
2024-09-17 5:48 ` Nilawar, Badal
2024-09-17 18:50 ` Matthew Brost
2024-09-18 6:32 ` Ghimiray, Himal Prasad [this message]
2024-09-18 7:19 ` Jani Nikula
2024-09-18 14:50 ` Jani Nikula
2024-09-19 11:07 ` Nilawar, Badal
2024-09-19 11:36 ` Jani Nikula
2024-09-19 12:32 ` Nilawar, Badal
2024-09-23 12:36 ` Ghimiray, Himal Prasad
2024-09-23 16:15 ` Rodrigo Vivi
2024-09-12 19:15 ` [PATCH v2 02/23] drm/xe: Modify xe_force_wake_put to handle _get returned mask Himal Prasad Ghimiray
2024-09-12 21:34 ` Michal Wajdeczko
2024-09-13 4:05 ` Ghimiray, Himal Prasad
2024-09-12 19:15 ` [PATCH v2 03/23] drm/xe/device: Update handling of xe_force_wake_get return Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 04/23] drm/xe/hdcp: " Himal Prasad Ghimiray
2024-09-13 4:23 ` Kandpal, Suraj
2024-09-12 19:15 ` [PATCH v2 05/23] drm/xe/gsc: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 06/23] drm/xe/gt: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 07/23] drm/xe/xe_gt_idle: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 08/23] drm/xe/devcoredump: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 09/23] drm/xe/tests/mocs: Update xe_force_wake_get() return handling Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 10/23] drm/xe/mocs: Update handling of xe_force_wake_get return Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 11/23] drm/xe/xe_drm_client: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 12/23] drm/xe/xe_gt_debugfs: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 13/23] drm/xe/guc: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 14/23] drm/xe/huc: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 15/23] drm/xe/oa: Handle force_wake_get failure in xe_oa_stream_init() Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 16/23] drm/xe/pat: Update handling of xe_force_wake_get return Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 17/23] drm/xe/gt_tlb_invalidation_ggtt: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 18/23] drm/xe/xe_reg_sr: " Himal Prasad Ghimiray
2024-09-12 19:15 ` [PATCH v2 19/23] drm/xe/query: " Himal Prasad Ghimiray
2024-09-12 19:16 ` [PATCH v2 20/23] drm/xe/vram: " Himal Prasad Ghimiray
2024-09-12 19:16 ` [PATCH v2 21/23] drm/xe: forcewake debugfs open fails on xe_forcewake_get failure Himal Prasad Ghimiray
2024-09-12 19:16 ` [PATCH v2 22/23] drm/xe: Ensure __must_check for xe_force_wake_get() return Himal Prasad Ghimiray
2024-09-12 19:16 ` [PATCH v2 23/23] drm/xe: Change return type to void for xe_force_wake_put Himal Prasad Ghimiray
2024-09-13 4:09 ` Ghimiray, Himal Prasad
2024-09-13 10:24 ` Michal Wajdeczko
2024-09-13 13:26 ` Ghimiray, Himal Prasad
2024-09-13 13:31 ` Ghimiray, Himal Prasad
2024-09-16 18:42 ` Nilawar, Badal
2024-09-17 4:48 ` Ghimiray, Himal Prasad
2024-09-17 4:52 ` Nilawar, Badal
2024-09-17 5:21 ` Nilawar, Badal
2024-09-17 5:24 ` Ghimiray, Himal Prasad
2024-09-12 19:24 ` ✓ CI.Patch_applied: success for Fix xe_force_wake_get() failure handling (rev2) Patchwork
2024-09-12 19:24 ` ✓ CI.checkpatch: " Patchwork
2024-09-12 19:25 ` ✓ CI.KUnit: " Patchwork
2024-09-12 19:37 ` ✓ CI.Build: " Patchwork
2024-09-12 19:39 ` ✓ CI.Hooks: " Patchwork
2024-09-12 19:41 ` ✓ CI.checksparse: " Patchwork
2024-09-12 19:58 ` ✗ CI.BAT: failure " Patchwork
2024-09-13 12:01 ` ✗ CI.FULL: " Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=271c25c3-401c-43b3-8d9c-dc13027a6ea7@intel.com \
--to=himal.prasad.ghimiray@intel.com \
--cc=badal.nilawar@intel.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=lucas.demarchi@intel.com \
--cc=matthew.brost@intel.com \
--cc=michal.wajdeczko@intel.com \
--cc=nirmoy.das@intel.com \
--cc=rodrigo.vivi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox