From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: John Harrison <john.c.harrison@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Subject: Re: [Intel-xe] [PATCH 01/12] drm/xe: implement driver initiated function-reset
Date: Wed, 8 Nov 2023 10:14:34 -0800
Message-ID: <9c107d75-bcbf-47a1-bf35-19dbb2fa2ee2@intel.com>
In-Reply-To: <c645d629-0d1d-4f3a-9199-6edcb3745739@intel.com>



On 11/7/2023 3:46 PM, John Harrison wrote:
> On 10/27/2023 15:29, Daniele Ceraolo Spurio wrote:
>> From: Andrzej Hajda <andrzej.hajda@intel.com>
>>
>> Driver initiated function-reset (FLR) is the highest level of reset
>> that we can trigger from within the driver. In contrast to PCI FLR it
>> doesn't require re-enumeration of PCI BAR. It can be useful in case
>> GT fails to reset. It is also the only way to trigger GSC reset from
>> the driver and can be used in future addition of GSC support.
>>
>> v2:
>>    - use regs from xe_regs.h
>>    - move the flag to xe.mmio
>>    - call flr only on root gt
>>    - use BIOS protection check
>>    - copy/paste comments from i915
>> v3:
>>    - flr code moved to xe_device.c
>> v4:
>>    - needs_flr_on_fini moved to xe_device
>>
>> Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
>> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
>> ---
>>   drivers/gpu/drm/xe/regs/xe_regs.h    |  7 +++
>>   drivers/gpu/drm/xe/xe_device.c       | 78 ++++++++++++++++++++++++++++
>>   drivers/gpu/drm/xe/xe_device_types.h |  3 ++
>>   drivers/gpu/drm/xe/xe_gt.c           |  2 +
>>   4 files changed, 90 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/regs/xe_regs.h b/drivers/gpu/drm/xe/regs/xe_regs.h
>> index 2240cd157603..a646d13af03a 100644
>> --- a/drivers/gpu/drm/xe/regs/xe_regs.h
>> +++ b/drivers/gpu/drm/xe/regs/xe_regs.h
>> @@ -57,8 +57,15 @@
>>     #define SOFTWARE_FLAGS_SPR33            XE_REG(0x4f084)
>>   +#define GU_CNTL_PROTECTED            XE_REG(0x10100C)
>> +#define   DRIVERINT_FLR_DIS            REG_BIT(31)
>> +
>>   #define GU_CNTL                    XE_REG(0x101010)
>>   #define   LMEM_INIT                REG_BIT(7)
>> +#define   DRIVERFLR                REG_BIT(31)
>> +
>> +#define GU_DEBUG                XE_REG(0x101018)
>> +#define   DRIVERFLR_STATUS            REG_BIT(31)
>>     #define XEHP_CLOCK_GATE_DIS            XE_REG(0x101014)
>>   #define   SGSI_SIDECLK_DIS            REG_BIT(17)
>> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> index 8341acf66e5f..515cdf599fab 100644
>> --- a/drivers/gpu/drm/xe/xe_device.c
>> +++ b/drivers/gpu/drm/xe/xe_device.c
>> @@ -5,6 +5,8 @@
>>     #include "xe_device.h"
>>   +#include <linux/units.h>
>> +
>>   #include <drm/drm_aperture.h>
>>   #include <drm/drm_atomic_helper.h>
>>   #include <drm/drm_gem_ttm_helper.h>
>> @@ -260,6 +262,78 @@ struct xe_device *xe_device_create(struct pci_dev *pdev,
>>       return ERR_PTR(err);
>>   }
>>   +/*
>> + * The driver-initiated FLR is the highest level of reset that we can trigger
>> + * from within the driver. It is different from the PCI FLR in that it doesn't
>> + * fully reset the SGUnit and doesn't modify the PCI config space and therefore
>> + * it doesn't require a re-enumeration of the PCI BARs. However, the
>> + * driver-initiated FLR does still cause a reset of both GT and display and a
>> + * memory wipe of local and stolen memory, so recovery would require a full HW
>> + * re-init and saving/restoring (or re-populating) the wiped memory. Since we
>> + * perform the FLR as the very last action before releasing access to the HW
>> + * during the driver release flow, we don't attempt recovery at all, because
>> + * if/when a new instance of i915 is bound to the device it will do a full
>> + * re-init anyway.
>> + */
>> +static void xe_driver_flr(struct xe_device *xe)
>> +{
>> +    const unsigned int flr_timeout = 3 * MICRO; /* specs recommend a 3s wait */
> 3s or 3us?

MICRO (from <linux/units.h>) is the number of usecs in a sec, so
3 * MICRO is 3 seconds expressed as a usec value.
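
For anyone who wants to check the arithmetic outside the kernel, here is
a minimal userspace sketch. It assumes MICRO is 1000000UL, mirroring the
definition in <linux/units.h>; secs_to_usecs() is a hypothetical helper
used only for this illustration and is not part of the patch.

```c
#include <stdint.h>

/* Assumption: mirrors MICRO from <linux/units.h> so this builds in userspace. */
#define MICRO 1000000UL

/* Hypothetical helper (not in the patch): whole seconds to microseconds. */
static inline uint64_t secs_to_usecs(uint64_t secs)
{
	return secs * MICRO;
}
```

With that definition, secs_to_usecs(3) gives the same value as the
3 * MICRO used for flr_timeout.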

>
> And is it supposed to be the same timeout for both registers?

The spec says that the whole process can take up to 3 secs, but there is 
no indication of how this time is split across the 2 waits. We could 
subtract the time spent in the first wait from the timeout of the 
second one, but that has very little benefit: if the FLR fails the HW is 
dead anyway, so it doesn't really matter if we notice a bit later. Given 
that, it's just easier code-wise to use the full 3 secs as the timeout 
both times.
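
To make the trade-off concrete, here is a rough userspace sketch of the
two options. Nothing in it is driver code: sim_wait(), flr_wait_full()
and flr_wait_shared() are hypothetical names, and the hardware is
modelled simply as "how long each phase takes".

```c
#include <stdint.h>

#define FLR_TIMEOUT_US 3000000u /* 3 sec budget, expressed in usecs */

/*
 * Hypothetical stand-in for one polling wait. hw_delay_us models how long
 * the hardware takes to reach the expected state. Returns the time spent
 * waiting and sets *timed_out when the delay exceeds the timeout.
 */
static uint32_t sim_wait(uint32_t hw_delay_us, uint32_t timeout_us, int *timed_out)
{
	*timed_out = hw_delay_us > timeout_us;
	return *timed_out ? timeout_us : hw_delay_us;
}

/* Option A (what the patch does): each wait gets the full 3 sec timeout. */
static uint32_t flr_wait_full(uint32_t teardown_us, uint32_t reinit_us, int *failed)
{
	uint32_t spent = sim_wait(teardown_us, FLR_TIMEOUT_US, failed);

	if (*failed)
		return spent;
	return spent + sim_wait(reinit_us, FLR_TIMEOUT_US, failed);
}

/* Option B: the second wait only gets what is left of the 3 sec budget. */
static uint32_t flr_wait_shared(uint32_t teardown_us, uint32_t reinit_us, int *failed)
{
	uint32_t spent = sim_wait(teardown_us, FLR_TIMEOUT_US, failed);

	if (*failed)
		return spent;
	return spent + sim_wait(reinit_us, FLR_TIMEOUT_US - spent, failed);
}
```

As long as the two phases together fit in 3 secs, both variants behave
identically. They only differ when the second phase never completes, in
which case option A reports the failure up to 3 secs later than option B.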

Daniele

>
> John.
>
>> +    struct xe_gt *gt = xe_root_mmio_gt(xe);
>> +    int ret;
>> +
>> +    if (xe_mmio_read32(gt, GU_CNTL_PROTECTED) & DRIVERINT_FLR_DIS) {
>> +        drm_info_once(&xe->drm, "BIOS Disabled Driver-FLR\n");
>> +        return;
>> +    }
>> +
>> +    drm_dbg(&xe->drm, "Triggering Driver-FLR\n");
>> +
>> +    /*
>> +     * Make sure any pending FLR requests have cleared by waiting for the
>> +     * FLR trigger bit to go to zero. Also clear GU_DEBUG's DRIVERFLR_STATUS
>> +     * to make sure it's not still set from a prior attempt (it's a write to
>> +     * clear bit).
>> +     * Note that we should never be in a situation where a previous attempt
>> +     * is still pending (unless the HW is totally dead), but better to be
>> +     * safe in case something unexpected happens
>> +     */
>> +    ret = xe_mmio_wait32(gt, GU_CNTL, DRIVERFLR, 0, flr_timeout, NULL, false);
>> +    if (ret) {
>> +        drm_err(&xe->drm, "Driver-FLR-prepare wait for ready failed! %d\n", ret);
>> +        return;
>> +    }
>> +    xe_mmio_write32(gt, GU_DEBUG, DRIVERFLR_STATUS);
>> +
>> +    /* Trigger the actual Driver-FLR */
>> +    xe_mmio_rmw32(gt, GU_CNTL, 0, DRIVERFLR);
>> +
>> +    /* Wait for hardware teardown to complete */
>> +    ret = xe_mmio_wait32(gt, GU_CNTL, DRIVERFLR, 0, flr_timeout, NULL, false);
>> +    if (ret) {
>> +        drm_err(&xe->drm, "Driver-FLR-teardown wait completion failed! %d\n", ret);
>> +        return;
>> +    }
>> +
>> +    /* Wait for hardware/firmware re-init to complete */
>> +    ret = xe_mmio_wait32(gt, GU_DEBUG, DRIVERFLR_STATUS, DRIVERFLR_STATUS,
>> +                 flr_timeout, NULL, false);
>> +    if (ret) {
>> +        drm_err(&xe->drm, "Driver-FLR-reinit wait completion failed! %d\n", ret);
>> +        return;
>> +    }
>> +
>> +    /* Clear sticky completion status */
>> +    xe_mmio_write32(gt, GU_DEBUG, DRIVERFLR_STATUS);
>> +}
>> +
>> +static void xe_driver_flr_fini(struct drm_device *drm, void *arg)
>> +{
>> +    struct xe_device *xe = arg;
>> +
>> +    if (xe->needs_flr_on_fini)
>> +        xe_driver_flr(xe);
>> +}
>> +
>>   static void xe_device_sanitize(struct drm_device *drm, void *arg)
>>   {
>>       struct xe_device *xe = arg;
>> @@ -294,6 +368,10 @@ int xe_device_probe(struct xe_device *xe)
>>       if (err)
>>           return err;
>>   +    err = drmm_add_action_or_reset(&xe->drm, xe_driver_flr_fini, xe);
>> +    if (err)
>> +        return err;
>> +
>>       for_each_gt(gt, xe, id) {
>>           err = xe_pcode_probe(gt);
>>           if (err)
>> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> index 44d622d4cc3a..9a0b0ccc1018 100644
>> --- a/drivers/gpu/drm/xe/xe_device_types.h
>> +++ b/drivers/gpu/drm/xe/xe_device_types.h
>> @@ -405,6 +405,9 @@ struct xe_device {
>>       /** @heci_gsc: graphics security controller */
>>       struct xe_heci_gsc heci_gsc;
>>   +    /** @needs_flr_on_fini: requests function-reset on fini */
>> +    bool needs_flr_on_fini;
>> +
>>       /* private: */
>>     #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
>> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>> index d380f67b3365..73c090762771 100644
>> --- a/drivers/gpu/drm/xe/xe_gt.c
>> +++ b/drivers/gpu/drm/xe/xe_gt.c
>> @@ -626,6 +626,8 @@ static int gt_reset(struct xe_gt *gt)
>> xe_uevent_gt_reset_failure(to_pci_dev(gt_to_xe(gt)->drm.dev),
>>                      gt_to_tile(gt)->id, gt->info.id);
>>   +    gt_to_xe(gt)->needs_flr_on_fini = true;
>> +
>>       return err;
>>   }
>

