From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: "Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH] drm/xe: Always check force_wake_get return code
Date: Thu, 14 Mar 2024 06:53:41 -0700
Message-ID: <10c117bd-3a02-4049-b6a2-35227f7edb6b@intel.com>
In-Reply-To: <SJ1PR11MB62042FB7B7B7570831A8D36B81292@SJ1PR11MB6204.namprd11.prod.outlook.com>
On 3/14/2024 3:58 AM, Upadhyay, Tejas wrote:
>
>> -----Original Message-----
>> From: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>
>> Sent: Wednesday, March 13, 2024 1:13 AM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Ceraolo Spurio, Daniele <daniele.ceraolospurio@intel.com>; Upadhyay,
>> Tejas <tejas.upadhyay@intel.com>
>> Subject: [PATCH] drm/xe: Always check force_wake_get return code
>>
>> A force_wake_get failure means that the HW might not be awake for the
>> access we're doing; this can lead to an immediate error or to a more
>> subtle problem (e.g. a register read might return an incorrect value that is
>> still valid, leading the driver to make a wrong choice instead of flagging an
>> error). We don't escalate the error inside the force_wake function itself
>> because callers might be able to handle or tolerate it, but that only works
>> if every caller checks the return code. The majority already do, but a few
>> do not. The offenders mainly fall into 3 categories, which are each handled
>> differently:
>>
>> 1) error capture: in this case we want to continue the capture, but we
>> log an info message in dmesg to notify the user that the capture
>> might have incorrect data.
>>
>> 2) ioctl: in this case we return a -EIO error to userspace.
>>
>> 3) unabortable actions: these are scenarios where we can't simply abort
>> and retry, so it's better to proceed anyway because there is a chance
>> the HW is awake even with the failure. In this case we emit a warning,
>> so we know there was a forcewake problem if something fails down the
>> line.
>>
>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_devcoredump.c         | 9 +++++++--
>>  drivers/gpu/drm/xe/xe_gsc.c                 | 2 +-
>>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 2 +-
>>  drivers/gpu/drm/xe/xe_guc.c                 | 2 +-
>>  drivers/gpu/drm/xe/xe_guc_pc.c              | 2 +-
>>  drivers/gpu/drm/xe/xe_guc_submit.c          | 4 +++-
>>  drivers/gpu/drm/xe/xe_query.c               | 3 ++-
>>  7 files changed, 16 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>> index 0fcd30680323..7d3aa6bd3524 100644
>> --- a/drivers/gpu/drm/xe/xe_devcoredump.c
>> +++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>> @@ -13,6 +13,7 @@
>> #include "xe_exec_queue.h"
>> #include "xe_force_wake.h"
>> #include "xe_gt.h"
>> +#include "xe_gt_printk.h"
>> #include "xe_guc_ct.h"
>> #include "xe_guc_submit.h"
>> #include "xe_hw_engine.h"
>> @@ -64,7 +65,9 @@ static void xe_devcoredump_deferred_snap_work(struct
>> work_struct *work) {
>> struct xe_devcoredump_snapshot *ss = container_of(work,
>> typeof(*ss), work);
>>
>> -	xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
>> +	/* keep going if fw fails as we still want to save the memory and SW data */
>> +	if (xe_force_wake_get(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL))
>> +		xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n");
>>  	xe_vm_snapshot_capture_delayed(ss->vm);
>>  	xe_guc_exec_queue_snapshot_capture_delayed(ss->ge);
>>  	xe_force_wake_put(gt_to_fw(ss->gt), XE_FORCEWAKE_ALL);
>> @@ -180,7 +183,9 @@ static void devcoredump_snapshot(struct xe_devcoredump *coredump,
>>  		}
>>  	}
>>
>> -	xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL);
>> +	/* keep going if fw fails as we still want to save the memory and SW data */
>> +	if (xe_force_wake_get(gt_to_fw(q->gt), XE_FORCEWAKE_ALL))
>> +		xe_gt_info(ss->gt, "failed to get forcewake for coredump capture\n");
>>
>>  	coredump->snapshot.ct = xe_guc_ct_snapshot_capture(&guc->ct, true);
>>  	coredump->snapshot.ge = xe_guc_exec_queue_snapshot_capture(job);
>> diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
>> index d9aa815a5bc2..902c52d95a8a 100644
>> --- a/drivers/gpu/drm/xe/xe_gsc.c
>> +++ b/drivers/gpu/drm/xe/xe_gsc.c
>> @@ -287,7 +287,7 @@ static void gsc_work(struct work_struct *work)
>>  	spin_unlock_irq(&gsc->lock);
>>
>>  	xe_pm_runtime_get(xe);
>> -	xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC);
>> +	XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GSC));
>>
>>  	if (actions & GSC_ACTION_FW_LOAD) {
>>  		ret = gsc_upload_and_init(gsc);
>> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> index a3c4ffba679d..7f1ac02b96c6 100644
>> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
>> @@ -247,7 +247,7 @@ int xe_gt_tlb_invalidation_ggtt(struct xe_gt *gt)
>>
>> xe_gt_tlb_invalidation_wait(gt, seqno);
>> } else if (xe_device_uc_enabled(xe)) {
>> - xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
>> + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
>> if (xe->info.platform == XE_PVC || GRAPHICS_VER(xe) >= 20) {
>> xe_mmio_write32(gt, PVC_GUC_TLB_INV_DESC1,
>>
>> PVC_GUC_TLB_INV_DESC1_INVALIDATE);
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index caa86ccbe9e7..32b6da891f72 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -251,7 +251,7 @@ static void guc_fini(struct drm_device *drm, void *arg)
>>  {
>>  	struct xe_guc *guc = arg;
>>
>> -	xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL);
>> +	XE_WARN_ON(xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL));
>>  	xe_uc_fini_hw(&guc_to_gt(guc)->uc);
>>  	xe_force_wake_put(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL);
>>  }
>> diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c
>> index f4b031b8d9de..b4b39bfcb5d5 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_pc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_pc.c
>> @@ -927,7 +927,7 @@ static void xe_guc_pc_fini(struct drm_device *drm,
>> void *arg)
>> return;
>> }
>>
>> - xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL);
>> + XE_WARN_ON(xe_force_wake_get(gt_to_fw(pc_to_gt(pc)),
>> +XE_FORCEWAKE_ALL));
>> XE_WARN_ON(xe_guc_pc_gucrc_disable(pc));
>> XE_WARN_ON(xe_guc_pc_stop(pc));
>> xe_force_wake_put(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL);
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 19efdb2f881f..d7462969af1e 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -833,7 +833,9 @@ static void simple_error_capture(struct xe_exec_queue *q)
>>  		}
>>  	}
>>
>> -	xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL);
>> +	if (xe_force_wake_get(gt_to_fw(guc_to_gt(guc)), XE_FORCEWAKE_ALL))
>> +		xe_gt_info(guc_to_gt(guc),
>> +			   "failed to get forcewake for error capture");
>>  	xe_guc_ct_print(&guc->ct, &p, true);
>>  	guc_exec_queue_print(q, &p);
>>  	for_each_hw_engine(hwe, guc_to_gt(guc), id) {
>> diff --git a/drivers/gpu/drm/xe/xe_query.c b/drivers/gpu/drm/xe/xe_query.c
>> index a6a20a6dd360..3975f99fdca5 100644
>> --- a/drivers/gpu/drm/xe/xe_query.c
>> +++ b/drivers/gpu/drm/xe/xe_query.c
>> @@ -147,7 +147,8 @@ query_engine_cycles(struct xe_device *xe,
>> if (!hwe)
>> return -EINVAL;
>>
>> - xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
>> + if (xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL))
>> + return -EIO;
> There are many more such unhandled forcewake returns, but I guess each file and functionality will have a different impact from forcewake failure, like you explained here, so for gsc,
Are there? I locally added a __must_check annotation to
xe_force_wake_get() and didn't get any warnings apart from the ones I'm
addressing in this patch (which are not just the GSC-related calls). I
haven't re-checked whether any new call was merged after I posted this
patch, but there shouldn't be many (if any). Can you point me to what I
missed?
Thanks,
Daniele
>
> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
>
>> __read_timestamps(gt,
>> RING_TIMESTAMP(hwe->mmio_base),
>> --
>> 2.43.0