Intel-XE Archive on lore.kernel.org
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
To: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>
Cc: <intel-xe@lists.freedesktop.org>,
	Badal Nilawar <badal.nilawar@intel.com>,
	 Lucas De Marchi <lucas.demarchi@intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>
Subject: Re: [RFC 1/9] drm/xe: Error handling in xe_force_wake_get()
Date: Fri, 6 Sep 2024 12:18:22 -0400
Message-ID: <ZtsrTvFHbMuNvXvk@intel.com>
In-Reply-To: <82a78b3a-14e6-472a-8e45-2cb4af2ff3a2@intel.com>

On Fri, Sep 06, 2024 at 01:32:38AM +0530, Ghimiray, Himal Prasad wrote:
> 
> 
> On 06-09-2024 00:59, Rodrigo Vivi wrote:
> > On Fri, Aug 30, 2024 at 10:53:18AM +0530, Himal Prasad Ghimiray wrote:
> > > If an acknowledgment timeout occurs for a domain awake request, put to
> > > sleep all domains awakened by the caller and decrease the reference
> > > count for all requested domains. This prevents xe_force_wake_get() from
> > > leaving an unhandled reference count in case of failure.
> > > While at it, add simple kernel-doc for xe_force_wake_get() and
> > > xe_force_wake_put() functions.
> > > 
> > > Cc: Badal Nilawar <badal.nilawar@intel.com>
> > > Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
> > > Cc: Lucas De Marchi <lucas.demarchi@intel.com>
> > > Cc: Nirmoy Das <nirmoy.das@intel.com>
> > > Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> > > ---
> > >   drivers/gpu/drm/xe/xe_force_wake.c | 52 +++++++++++++++++++++++++++---
> > >   1 file changed, 47 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_force_wake.c b/drivers/gpu/drm/xe/xe_force_wake.c
> > > index b263fff15273..8aa8d9b41052 100644
> > > --- a/drivers/gpu/drm/xe/xe_force_wake.c
> > > +++ b/drivers/gpu/drm/xe/xe_force_wake.c
> > > @@ -150,31 +150,73 @@ static int domain_sleep_wait(struct xe_gt *gt,
> > >   					 (ffs(tmp__) - 1))) && \
> > >   					 domain__->reg_ctl.addr)
> > > +/**
> > > + * xe_force_wake_get : Increase the domain refcount; if it was 0 initially, wake the domain
> > > + * @fw: struct xe_force_wake
> > > + * @domains: forcewake domains to get refcount on
> > > + *
> > > + * Increment refcount for the force-wake domain. If the domain is
> > > + * asleep, awaken it and wait for acknowledgment within the specified
> > > + * timeout. If a timeout occurs, decrement the refcount and put the
> > > + * caller awaken domains to sleep.
> > > + *
> > > + * Return: 0 on success or 1 on ack timeout from domains.
> > 
> > * Returns 0 for success, negative error code otherwise.
> 
> Hi Rodrigo,
> 
> Sure. Will fix in next version.
> 
> > 
> > > + */
> > >   int xe_force_wake_get(struct xe_force_wake *fw,
> > >   		      enum xe_force_wake_domains domains)
> > >   {
> > >   	struct xe_gt *gt = fw->gt;
> > >   	struct xe_force_wake_domain *domain;
> > > -	enum xe_force_wake_domains tmp, woken = 0;
> > > +	enum xe_force_wake_domains tmp, awake_rqst = 0, awake_ack = 0;
> > >   	unsigned long flags;
> > >   	int ret = 0;
> > >   	spin_lock_irqsave(&fw->lock, flags);
> > >   	for_each_fw_domain_masked(domain, domains, fw, tmp) {
> > >   		if (!domain->ref++) {
> > > -			woken |= BIT(domain->id);
> > > +			awake_rqst |= BIT(domain->id);
> > >   			domain_wake(gt, domain);
> > >   		}
> > >   	}
> > > -	for_each_fw_domain_masked(domain, woken, fw, tmp) {
> > > -		ret |= domain_wake_wait(gt, domain);
> > 
> > now you suppress the mmio error code...
> > it would be better to find a way to propagate that.
> 
> 
> AFAIU the only possible error code from domain_wake_wait() is -ETIMEDOUT. I was
> planning to assign the same to ret below, but missed it in the RFC.
> 
> 
> > 
> > > +	for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
> > > +		if (domain_wake_wait(gt, domain) == 0)
> > > +			awake_ack |= BIT(domain->id);
> > > +	}
> > > +
> > > +	ret = (awake_ack == awake_rqst) ? 0 : 1;
> > 
> > s/1/-EIO/ ?
> 
> How about -ETIMEDOUT? It is the same error that would be propagated if
> domain_wake_wait() fails.

hmm, I guess it makes more sense indeed.
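
To make the agreed direction concrete, here is a rough user-space sketch, not the driver code: it keeps the first error returned by the wait step (so -ETIMEDOUT reaches the caller instead of 1) and unwinds the refcounts on failure. Everything named here (struct fw_model, model_wake_wait(), model_force_wake_get(), ack_fail_mask, slept_mask) is hypothetical and only models the logic. It also assumes the rollback covers every requested domain, as the commit message describes.

```c
#include <errno.h>

#define NUM_DOMAINS 4
#define BIT(n) (1u << (n))

/* Hypothetical stand-in for struct xe_force_wake; not the driver's layout. */
struct fw_model {
	unsigned int ref[NUM_DOMAINS];	/* per-domain reference count */
	unsigned int awake_domains;	/* domains recorded as awake */
	unsigned int ack_fail_mask;	/* test knob: domains whose ack times out */
	unsigned int slept_mask;	/* records which domains were put to sleep */
};

/* Simulated wait for the wake ack: 0 on ack, -ETIMEDOUT otherwise. */
static int model_wake_wait(const struct fw_model *fw, int id)
{
	return (fw->ack_fail_mask & BIT(id)) ? -ETIMEDOUT : 0;
}

static int model_force_wake_get(struct fw_model *fw, unsigned int domains)
{
	unsigned int awake_rqst = 0, awake_ack = 0;
	int id, err, ret = 0;

	/* Take a reference; a 0 -> 1 transition means a wake request. */
	for (id = 0; id < NUM_DOMAINS; id++) {
		if ((domains & BIT(id)) && !fw->ref[id]++)
			awake_rqst |= BIT(id);
	}

	/* Wait for each requested domain and keep the first error seen. */
	for (id = 0; id < NUM_DOMAINS; id++) {
		if (!(awake_rqst & BIT(id)))
			continue;
		err = model_wake_wait(fw, id);
		if (!err)
			awake_ack |= BIT(id);
		else if (!ret)
			ret = err;
	}

	if (ret) {
		/*
		 * Assumption: drop the reference on every requested domain.
		 * Only domains that acked and reached refcount 0 are put
		 * back to sleep; "ack failed" domains are left untouched.
		 */
		for (id = 0; id < NUM_DOMAINS; id++) {
			if (!(domains & BIT(id)))
				continue;
			if (!--fw->ref[id] && (awake_ack & BIT(id)))
				fw->slept_mask |= BIT(id);
		}
		awake_ack = 0;
	}

	fw->awake_domains |= awake_ack;
	return ret;
}
```

With this shape the caller sees a negative errno on failure and no reference is left behind, which also matches the "0 for success, negative error code otherwise" wording for the kernel-doc.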

> 
> > 
> > > +
> > > +	/*
> > > +	 * If @domains is XE_FORCEWAKE_ALL and an acknowledgment times out
> > > +	 * for any domain, decrease the reference count and put the awake
> > > +	 * domains to sleep. For individual domains, just decrement the
> > > +	 * reference count.
> > > +	 */
> > > +	if (ret) {
> > > +		for_each_fw_domain_masked(domain, awake_rqst, fw, tmp) {
> > > +			if (!--domain->ref && (awake_ack & BIT(domain->id)))
> > > +				domain_sleep(gt, domain);
> > 
> > I wonder if it would help to extract this into a separate function to be
> > used here and in the -put function.
> 
> Let me think about that.
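
Just to illustrate what I mean by a shared helper, a minimal user-space sketch follows. It is not the driver code: struct fw_refs, fw_drop_refs(), fw_put() and fw_get_unwind() are made-up names, and the "sleep" is only reported as a returned mask. The model also skips domains whose refcount is already 0, which is an assumption of the sketch rather than anything the driver does.

```c
#define NUM_DOMAINS 4
#define BIT(n) (1u << (n))

/* Hypothetical refcount-only model of the force-wake state. */
struct fw_refs {
	unsigned int ref[NUM_DOMAINS];
};

/*
 * Drop one reference on each domain in @domains. A domain whose count
 * reaches 0 is reported as "put to sleep" only if it is also set in
 * @may_sleep. Returns the mask of domains put to sleep.
 */
static unsigned int fw_drop_refs(struct fw_refs *fw, unsigned int domains,
				 unsigned int may_sleep)
{
	unsigned int slept = 0;
	int id;

	for (id = 0; id < NUM_DOMAINS; id++) {
		if (!(domains & BIT(id)) || !fw->ref[id])
			continue;	/* not requested, or nothing to drop */
		if (!--fw->ref[id] && (may_sleep & BIT(id)))
			slept |= BIT(id);
	}
	return slept;
}

/* Regular put: every domain that reaches refcount 0 may sleep. */
static unsigned int fw_put(struct fw_refs *fw, unsigned int domains)
{
	return fw_drop_refs(fw, domains, domains);
}

/* Unwind after a failed get: only the domains that acked may sleep. */
static unsigned int fw_get_unwind(struct fw_refs *fw, unsigned int requested,
				  unsigned int acked)
{
	return fw_drop_refs(fw, requested, acked);
}
```

The only difference between the two paths is the mask of domains allowed to sleep, so the policy question about the "ack failed" domains stays in one place.
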
> 
> > 
> > But more than that, I have a question here...
> > Do we really need to put the other domains to sleep if we are not getting an ack from a certain domain?
> > Doesn't it generally mean that we are busted anyway?
> 
> I have no strong opinion on this; the main thing is that the refcount
> shouldn't stay incremented.
> 
> > 
> > But also, if we really need to sleep, then shouldn't we also
> > call the sleep function for the domains that didn't ack? Perhaps the ack
> > timed out, but the domain really woke up? How sure are we that this is not possible?
> 
> I didn't want to change the hw state by calling sleep for the "ack failed"
> domain, so that, if necessary, debug tools (PythonSV) can help us pinpoint the
> exact failure state of the HW registers.
> 
> 
> > 
> > > +		}
> > > +		awake_ack = 0;
> > >   	}
> > > -	fw->awake_domains |= woken;
> > > +
> > > +	fw->awake_domains |= awake_ack;
> > >   	spin_unlock_irqrestore(&fw->lock, flags);
> > >   	return ret;
> > >   }
> > > +/**
> > > + * xe_force_wake_put - Decrement the refcount and put domain to sleep if refcount becomes 0
> > > + * @fw: Pointer to the force wake structure
> > > + * @domains: forcewake domains to put reference
> > > + *
> > > + * This function reduces the reference counts for specified domains. If
> > > + * refcount for any of the specified domain reaches 0, it puts the domain to sleep
> > > + * and waits for acknowledgment for domain to sleep within specified timeout.
> > > + * Ensure this function is called only in case of successful xe_force_wake_get().
> > > + *
> > > + * Returns 0 in case of success or non-zero in case of timeout of ack
> > > + */
> > >   int xe_force_wake_put(struct xe_force_wake *fw,
> > >   		      enum xe_force_wake_domains domains)
> > >   {
> > > -- 
> > > 2.34.1
> > > 
