From: "Nilawar, Badal" <badal.nilawar@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
<intel-xe@lists.freedesktop.org>,
Matthew Brost <matthew.brost@intel.com>,
Lucas De Marchi <lucas.demarchi@intel.com>
Subject: Re: [RFC 7/9] drm/xe/gt_tlb_invalidation_ggtt: Call xe_force_wake_put if xe_force_wake_get succeds
Date: Tue, 10 Sep 2024 23:23:06 +0530
Message-ID: <a6ce3220-3522-4489-a875-3731bd85f9ed@intel.com>
In-Reply-To: <ZuCEO9o3aqy4fdU0@intel.com>
On 10-09-2024 23:09, Rodrigo Vivi wrote:
> On Tue, Sep 10, 2024 at 08:07:01PM +0530, Nilawar, Badal wrote:
>>
>>
>> On 09-09-2024 14:59, Ghimiray, Himal Prasad wrote:
>>>
>>>
>>> On 06-09-2024 21:59, Rodrigo Vivi wrote:
>>>> On Fri, Sep 06, 2024 at 01:21:41AM +0530, Ghimiray, Himal Prasad wrote:
>>>>>
>>>>>
>>>>> On 06-09-2024 01:07, Rodrigo Vivi wrote:
>>>>>> On Fri, Aug 30, 2024 at 10:53:24AM +0530, Himal Prasad Ghimiray wrote:
>>>>>>> A failure in xe_force_wake_get() no longer increments the domain's
>>>>>>> refcount, so xe_force_wake_put() should not be called in such cases.
>>>>>>>
>>>>>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>>>>> Cc: Lucas De Marchi <lucas.demarchi@intel.com>
>>>>>>> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 ++++++---
>>>>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>>>>> index cca9cf536f76..3f86ab704c4f 100644
>>>>>>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>>>>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>>>>>>> @@ -259,11 +259,11 @@ static int xe_gt_tlb_invalidation_guc(struct xe_gt *gt,
>>>>>>> int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>>>> {
>>>>>>> struct xe_device *xe = gt_to_xe(gt);
>>>>>>> + int ret;
>>>>>>> if (xe_guc_ct_enabled(&gt->uc.guc.ct) &&
>>>>>>> gt->uc.guc.submission_state.enabled) {
>>>>>>> struct xe_gt_tlb_invalidation_fence fence;
>>>>>>> - int ret;
>>>>>>> xe_gt_tlb_invalidation_fence_init(gt, &fence, true);
>>>>>>> ret = xe_gt_tlb_invalidation_guc(gt, &fence);
>>>>>>> @@ -277,7 +277,9 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>>>> if (IS_SRIOV_VF(xe))
>>>>>>> return 0;
>>>>>>> - xe_gt_WARN_ON(gt, xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
>>>>>>> + ret = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>>>>> + xe_gt_WARN_ON(gt, ret);
>>>>>>> +
>>>>>>> if (xe->info.platform == XE_PVC ||
>>>>>>> GRAPHICS_VER(xe) >= 20) {
>>>>>>> xe_mmio_write32(gt, PVC_GUC_TLB_INV_DESC1,
>>>>>>> PVC_GUC_TLB_INV_DESC1_INVALIDATE);
>>>>>>> @@ -287,7 +289,8 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>>>>>> xe_mmio_write32(gt, GUC_TLB_INV_CR,
>>>>>>> GUC_TLB_INV_CR_INVALIDATE);
>>>>>>> }
>>>>>>> - xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>>>>> + if (!ret)
>>>>>>> + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT);
>>>>>>
>>>>>> Looking at all these cases now, I honestly prefer the other way around.
>>>>>>
>>>>>> If we called the get, we call the put.
>>>>>> The get always increases the reference and the put does the clean-up.
>>>>>>
>>>>>> fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>>>>>>
>>>>>> xe_force_wake_put(gt_to_fw(gt), fw_ref);
>>>>>>
>>>>>> So fw_ref is a mask of the domains that were actually woken up, which
>>>>>> require the ref drop and sleep call.
>>>>>
>>>>> Hi Rodrigo,
>>>>>
>>>>> Thanks for the input. AFAIU, this approach creates an issue with the
>>>>> subsequent force_wake_get/put in the callee function, which I have tried
>>>>> to explain in the cover letter.
>>>>>
>>>>> [1] A subsequent forcewake call by the callee function assumes the domains
>>>>> are already awake, which might not be true. This shows that even perfectly
>>>>> balanced xe_force_wake_get/_put calls can cause problems.
>>>>>
>>>>> [1] func_a() {
>>>>> XE_WARN(xe_force_wake_get()) <---> fails but increments refcount
>>>>>
>>>>> func_b();
>>>>>
>>>>> XE_WARN(xe_force_wake_put());<---> decrements refcounts
>>>>> }
>>>>>
>>>>> func_b() {
>>>>> if(xe_force_wake_get()) <---> succeeds due to refcount of caller
>>>>> return;
>>>>>
>>>>> does mmio_operations(); <---> Domain might not be awake
>>>>>
>>>>> xe_force_wake_put(); <---> decrement refcount
>>>>> }
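To make scenario [1] concrete, here is a small user-space model of the pre-series behavior, in which a get whose ack times out still takes the reference. All names (sim_domain, old_fw_get(), old_fw_put(), old_func_a(), old_func_b()) are hypothetical and this is not the xe driver code. Under that assumption, the nested get in func_b() succeeds purely because of the caller's refcount, so its MMIO would hit a sleeping domain even though every get is balanced by a put.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical user-space model of one forcewake domain under the
 * pre-series semantics. None of these names exist in the xe driver.
 */
struct sim_domain {
	unsigned int ref; /* number of users currently holding the domain */
	bool awake;       /* whether the hardware actually acked the wake */
	bool ack_fails;   /* test knob: the wake request times out */
};

/* Old semantics: the refcount is taken even if the ack times out. */
static int old_fw_get(struct sim_domain *d)
{
	if (d->ref++ == 0) {
		/* Only the first user writes the wake request and waits. */
		d->awake = !d->ack_fails;
		if (!d->awake)
			return -1; /* error reported, but ref stays elevated */
	}
	return 0; /* nested users never look at the ack again */
}

static void old_fw_put(struct sim_domain *d)
{
	if (--d->ref == 0)
		d->awake = false; /* last user lets the domain sleep */
}

/* Returns true if MMIO would have been issued to a sleeping domain. */
static bool old_func_b(struct sim_domain *d)
{
	bool bad_access;

	if (old_fw_get(d))
		return false;
	bad_access = !d->awake; /* mmio_operations() would run here */
	old_fw_put(d);
	return bad_access;
}

static bool old_func_a(struct sim_domain *d)
{
	bool bad_access;

	if (old_fw_get(d))
		fprintf(stderr, "func_a: forcewake ack timed out\n");
	bad_access = old_func_b(d);
	old_fw_put(d); /* balanced with the get above */
	return bad_access;
}
```

For a domain with ack_fails set, old_func_a() returns true while the refcount still ends at 0: balanced get/put calls alone do not prevent the bad access.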
>>>>
>>>> Well, to be honest, this is what bugs me in this whole series.
>>>>
>>>> If func_a failed, why would func_b succeed? If that's the
>>>> case, should we add more redundancy and retries so that
>>>> func_a would succeed, as func_b is expected to in your
>>>> scenario?
>>>
>>>
>>> Hi Rodrigo,
>>>
>>> This is the current behavior, which patch [1] resolves. I misunderstood your
>>> comment as asking to drop that patch and simply balance all _gets with their
>>> respective _puts.
>>>
>>>
>>>>
>>>> But other than that, I'm afraid that you didn't fully understand
>>>> my idea. Sorry for not being clear.
>>>>
>>>> My thought is, you do what you are doing in this series.
>>>> If the get doesn't succeed, you drop the ref count and call the
>>>> disable.
>>>
>>>
>>> OK. IMO, for a failing domain it is better to just reduce the refcount
>>> and not disable it explicitly.
>>>
>>>
>>>>
>>>> The return of the get covers only the domains that have succeeded.
>>>> Then the put releases only the ones that had succeeded.
>>>> func_b will then try to wake up whatever had failed in
>>>> func_a.
>>>
>>> I assume that with this, xe_force_wake_get will return the mask, so the
>>> caller will need to verify whether the returned mask is complete or
>>> indicates a failure.
>>>
>>>
>>>>
>>>> Something like:
>>>>
>>>>
>>>> func_a() {
>>>> fw_ref = xe_force_wake_get(ALL_DOMAINS) <---> fails GT-domain
>>>> but return a mask with all the domains except GT.
>>>>
>>>> XE_WARN(!fw_ref);
>>>
>>>
>>> XE_WARN(!fw_ref); will work for individual domains but not for ALL_DOMAINS
>>>
>>> XE_WARN(fw_ref != ALL_DOMAINS); <-- If user wants to continue -->
>>>
>>> if (fw_ref != ALL_DOMAINS) <--If user wants to return on failure -->
>>> xe_force_wake_put(fw_ref); <-- ensure to put awake domain -->
>>>
>>> return;
>>> }
>>>
>>>
>>>>
>>>> func_b();
>>>>
>>>> XE_WARN(xe_force_wake_put(fw_ref));<---> decrements refcounts of
>>>> the domains which were actually woken up.
>>>
>>> Makes sense.
>>>
>>>> }
>>>>
>>>> func_b() {
>>>> fw_ref = xe_force_wake_get(GT_DOMAIN);
>>>> if (!(fw_ref & GT_DOMAIN)) <---> likely fails anyway since func_a has
>>>> failed, but it at least tries because you have handled it in
>>>> your series...
>>>> return;
>>>>
>>>> does mmio_operations(); <---> Domain is awake here
>>>>
>>>> xe_force_wake_put(fw_ref); <---> decrement refcount of the
>>>> domains you woke up.
>>>> }
>>>>
>>>> does it make sense now?
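A minimal sketch of the mask-based scheme discussed above, again as a user-space model with hypothetical names (sim_fw, sim_fw_get(), sim_fw_put(), SIM_FW_*), not the driver API. It assumes that a get takes a reference only on domains that actually acked and returns them as a mask, and that the put releases exactly that mask. A failed domain is never referenced, so the nested get in func_b() retries the wake instead of trusting a stale refcount, and func_b() bails out when the GT bit is missing from its own mask.

```c
#include <stdbool.h>

/* Hypothetical domain bits for this model; not the xe values. */
#define SIM_FW_GT	(1u << 0)
#define SIM_FW_RENDER	(1u << 1)
#define SIM_FW_MEDIA	(1u << 2)
#define SIM_FW_ALL	(SIM_FW_GT | SIM_FW_RENDER | SIM_FW_MEDIA)
#define SIM_NUM_DOMAINS	3

struct sim_fw {
	unsigned int ref[SIM_NUM_DOMAINS]; /* per-domain user count */
	unsigned int ack_fail_mask;        /* test knob: acks that time out */
	unsigned int awake_mask;           /* domains the hw reports awake */
};

/*
 * Take a reference only on domains that are (or become) awake and report
 * them in the returned mask. A domain whose ack timed out gets no
 * reference, so the next caller will try to wake it again.
 */
static unsigned int sim_fw_get(struct sim_fw *fw, unsigned int domains)
{
	unsigned int woken = 0;

	for (int i = 0; i < SIM_NUM_DOMAINS; i++) {
		unsigned int bit = 1u << i;

		if (!(domains & bit))
			continue;
		if (fw->ref[i] == 0) {
			if (fw->ack_fail_mask & bit)
				continue; /* no ack: leave refcount untouched */
			fw->awake_mask |= bit;
		}
		fw->ref[i]++;
		woken |= bit;
	}
	return woken;
}

/* Drop exactly the references recorded in a mask from sim_fw_get(). */
static void sim_fw_put(struct sim_fw *fw, unsigned int fw_ref)
{
	for (int i = 0; i < SIM_NUM_DOMAINS; i++) {
		unsigned int bit = 1u << i;

		if ((fw_ref & bit) && --fw->ref[i] == 0)
			fw->awake_mask &= ~bit; /* last user: let it sleep */
	}
}

/* Returns true if MMIO would have been issued to a sleeping GT domain. */
static bool new_func_b(struct sim_fw *fw)
{
	unsigned int fw_ref = sim_fw_get(fw, SIM_FW_GT);
	bool bad_access;

	if (!(fw_ref & SIM_FW_GT))
		return false; /* retried the wake and it failed: bail out */
	bad_access = !(fw->awake_mask & SIM_FW_GT); /* mmio_operations() */
	sim_fw_put(fw, fw_ref);
	return bad_access;
}

/* *got receives the mask func_a actually obtained. */
static bool new_func_a(struct sim_fw *fw, unsigned int *got)
{
	unsigned int fw_ref = sim_fw_get(fw, SIM_FW_ALL);
	bool bad_access;

	*got = fw_ref; /* on a GT ack failure: everything except GT */
	bad_access = new_func_b(fw);
	sim_fw_put(fw, fw_ref); /* releases only what was woken */
	return bad_access;
}
```

With the GT ack forced to fail, new_func_a() gets back RENDER | MEDIA (so a check like fw_ref != ALL_DOMAINS catches the partial wake), new_func_b() returns without touching MMIO, and all refcounts drop back to zero.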
>>>
>>>
>>> Yes, this is indeed a much better approach for FORCEWAKE_ALL. Thank you
>>> for the suggestion. To summarize: rather than disabling the successfully
>>> awakened domains in the event of a failure, we will rely on forcewake_put
>>> to disable them, and the user will decide when to call it.
>>
>> This implementation approach looks OK to me. My only concern is: what if
>> func_b() calls xe_force_wake_assert_held()? It will raise the assert, as it
>> will not find the expected domain awake. This doesn't align with the idea of
>> continuing in case of an ack failure. IMO, a user who decides to continue
>> after a set-ack failure is assuming the domain woke up but the ack didn't
>> arrive in time.
>
> Yep, and then we fix that case.
> If the assert is hit, it is because the _get wasn't properly handled.
Ok. Should we just use xe_force_wake_get/put, or let the user decide?
We should also document guidelines on when to use each option.
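One possible shape for such a guideline, sketched with hypothetical helpers that are not part of this series: the caller checks that every requested domain is present in the returned mask, then applies either a strict policy (bail out, after putting whatever was woken) or a lenient one (warn and continue). The SIM_FW_* bits are placeholders, not the xe domain values.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical domain bits; not the xe values. */
#define SIM_FW_GT	(1u << 0)
#define SIM_FW_RENDER	(1u << 1)
#define SIM_FW_ALL	(SIM_FW_GT | SIM_FW_RENDER)

/*
 * True only if every requested domain was woken. Testing !fw_ref is
 * enough for a single domain, but for SIM_FW_ALL a partial wake leaves
 * fw_ref non-zero, so the whole mask has to be compared.
 */
static bool sim_fw_ref_has(unsigned int fw_ref, unsigned int requested)
{
	return (fw_ref & requested) == requested;
}

/*
 * Caller policy sketch. Returns true if the caller should go ahead with
 * MMIO. A strict caller that gets false must still put fw_ref itself,
 * since the domains that did wake hold a reference.
 */
static bool sim_fw_should_proceed(unsigned int fw_ref, unsigned int requested,
				  bool strict)
{
	if (sim_fw_ref_has(fw_ref, requested))
		return true;
	fprintf(stderr, "forcewake: wanted 0x%x, got 0x%x\n", requested, fw_ref);
	return !strict;
}
```

A lenient caller would still trip an assert_held-style check on the missing domain, which is the case raised above; the strict path avoids that at the cost of skipping the operation.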
Regards,
Badal
>
>>
>> Regards,
>> Badal
>>>
>>>
>>>>
>>>>>
>>>>> BR
>>>>> Himal
>>>>>
>>>>>>
>>>>>>> }
>>>>>>> return 0;
>>>>>>> --
>>>>>>> 2.34.1
>>>>>>>