From: Matthew Auld <matthew.auld@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>, intel-xe@lists.freedesktop.org
Subject: Re: [RFC 15/20] drm/xe: Allow GuC CT fast path and worker regardless of runtime_pm
Date: Tue, 9 Jan 2024 12:09:45 +0000
Message-ID: <40019771-542e-442a-823a-071fd8a5aab5@intel.com>
In-Reply-To: <20231228021232.2366249-16-rodrigo.vivi@intel.com>
On 28/12/2023 02:12, Rodrigo Vivi wrote:
> First of all, the !ongoing && !from_runtime_functions case seems like
> one that should not happen and would be bad anyway. So let's at least
> stop doing the workaround; if we hit the case again we need to figure
> out which outer bound should protect this access, or identify the real
> condition behind it.
>
> On top of that, we are now protecting wider outer bounds instead of
> more granular memory accesses, so we should be fine. Or maybe we need
> to ensure that we really shut off the GuC under these conditions.
>
> Anyway, let's proceed with killing off the memory_access callers
> for now.
IIRC the main concern was that you could somehow get unsolicited CT
messages from the GuC, so we figured it was best to ensure the device is
awake (and stays awake) before proceeding with accessing it. Say the
device enters SUSPENDING just as we receive and process an unsolicited
CT message. Or a CT request times out waiting for its response, the
caller bails out, dropping the rpm ref and maybe triggering an async gt
reset, but it turns out the GuC was just being slow and we do eventually
get that response and process it. Is that type of thing not possible?
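
If it turns out we do still need a guard in these paths, maybe something
rpm-based along these lines instead (sketch only, not a concrete
proposal; I'm assuming a helper in the spirit of
xe_pm_runtime_get_if_active(), which may not match what you have
planned):

/*
 * Sketch only: keep the fast path from touching the HW unless the
 * device is already awake, using an rpm-based check instead of the
 * mem_access one being removed. xe_pm_runtime_get_if_active() here is
 * an assumed helper, not something this series adds.
 */
void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
{
	struct xe_device *xe = ct_to_xe(ct);
	int len;

	/* Only proceed if the device is active; take a ref so it stays so */
	if (!xe_pm_runtime_get_if_active(xe))
		return;

	spin_lock(&ct->fast_lock);
	do {
		len = g2h_read(ct, ct->fast_msg, true);
		if (len > 0)
			g2h_fast_path(ct, ct->fast_msg, len);
	} while (len > 0);
	spin_unlock(&ct->fast_lock);

	xe_pm_runtime_put(xe);
}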
>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
>   drivers/gpu/drm/xe/xe_guc_ct.c | 40 ----------------------------------
>   1 file changed, 40 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 4cde93c18a2d4..7e68ef69ca8d5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -1096,14 +1096,8 @@ static void g2h_fast_path(struct xe_guc_ct *ct, u32 *msg, u32 len)
>   */
>  void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>  {
> -	struct xe_device *xe = ct_to_xe(ct);
> -	bool ongoing;
>  	int len;
>
> -	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> -	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> -		return;
> -
>  	spin_lock(&ct->fast_lock);
>  	do {
>  		len = g2h_read(ct, ct->fast_msg, true);
> @@ -1111,9 +1105,6 @@ void xe_guc_ct_fast_path(struct xe_guc_ct *ct)
>  			g2h_fast_path(ct, ct->fast_msg, len);
>  	} while (len > 0);
>  	spin_unlock(&ct->fast_lock);
> -
> -	if (ongoing)
> -		xe_device_mem_access_put(xe);
>  }
>
>  /* Returns less than zero on error, 0 on done, 1 on more available */
> @@ -1144,36 +1135,8 @@ static int dequeue_one_g2h(struct xe_guc_ct *ct)
>  static void g2h_worker_func(struct work_struct *w)
>  {
>  	struct xe_guc_ct *ct = container_of(w, struct xe_guc_ct, g2h_worker);
> -	bool ongoing;
>  	int ret;
>
> -	/*
> -	 * Normal users must always hold mem_access.ref around CT calls. However
> -	 * during the runtime pm callbacks we rely on CT to talk to the GuC, but
> -	 * at this stage we can't rely on mem_access.ref and even the
> -	 * callback_task will be different than current. For such cases we just
> -	 * need to ensure we always process the responses from any blocking
> -	 * ct_send requests or where we otherwise expect some response when
> -	 * initiated from those callbacks (which will need to wait for the below
> -	 * dequeue_one_g2h()). The dequeue_one_g2h() will gracefully fail if
> -	 * the device has suspended to the point that the CT communication has
> -	 * been disabled.
> -	 *
> -	 * If we are inside the runtime pm callback, we can be the only task
> -	 * still issuing CT requests (since that requires having the
> -	 * mem_access.ref). It seems like it might in theory be possible to
> -	 * receive unsolicited events from the GuC just as we are
> -	 * suspending-resuming, but those will currently anyway be lost when
> -	 * eventually exiting from suspend, hence no need to wake up the device
> -	 * here. If we ever need something stronger than get_if_ongoing() then
> -	 * we need to be careful with blocking the pm callbacks from getting CT
> -	 * responses, if the worker here is blocked on those callbacks
> -	 * completing, creating a deadlock.
> -	 */
> -	ongoing = xe_device_mem_access_get_if_ongoing(ct_to_xe(ct));
> -	if (!ongoing && xe_pm_read_callback_task(ct_to_xe(ct)) == NULL)
> -		return;
> -
>  	do {
>  		mutex_lock(&ct->lock);
>  		ret = dequeue_one_g2h(ct);
> @@ -1187,9 +1150,6 @@ static void g2h_worker_func(struct work_struct *w)
>  			kick_reset(ct);
>  		}
>  	} while (ret == 1);
> -
> -	if (ongoing)
> -		xe_device_mem_access_put(ct_to_xe(ct));
>  }
>
>  static void guc_ctb_snapshot_capture(struct xe_device *xe, struct guc_ctb *ctb,
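
FWIW, my reading of the "outer bound" model from the commit message is
something like the below from the caller side (again just a sketch;
xe_some_ioctl() and do_work_involving_ct() are made-up placeholders,
with xe_pm_runtime_get()/put() as per the earlier patches in the
series):

/*
 * Sketch of the outer-bound idea: the entry point (ioctl, exec, sysfs,
 * debugfs, ...) wakes the device and holds the rpm reference across the
 * whole operation, so the inner CT paths no longer need their own
 * protection. Function names below are placeholders, not real code.
 */
static int xe_some_ioctl(struct drm_device *dev, void *data,
			 struct drm_file *file)
{
	struct xe_device *xe = to_xe_device(dev);
	int ret;

	xe_pm_runtime_get(xe);		/* outer bound: device stays awake */
	ret = do_work_involving_ct(xe);	/* placeholder for the inner work */
	xe_pm_runtime_put(xe);

	return ret;
}

If that holds everywhere, the remaining question is really just whether
the fast path and worker can ever run outside such an outer bound.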