Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Gustavo Sousa <gustavo.sousa@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>,
	Michal Wajdeczko <michal.wajdeczko@intel.com>,
	<intel-xe@lists.freedesktop.org>
Subject: Re: [PATCH v3 07/23] drm/xe: Read VF GMD_ID with a specifically-allocated dummy GT
Date: Thu, 2 Oct 2025 09:43:32 -0300	[thread overview]
Message-ID: <175940901218.1952.3042709661416045697@intel.com> (raw)
In-Reply-To: <0ce62511-3c46-4d1a-aef9-8c5f5a4f4d0a@intel.com>

Quoting Michal Wajdeczko (2025-10-01 07:07:04-03:00)
>
>
>On 10/1/2025 12:56 AM, Matt Roper wrote:
>> SRIOV VF initialization has a bit of a chicken and egg design problem.
>> Determining the IP version of the graphics and media IPs can't be done
>> via direct register reads as it is on PF or native and instead requires
>> querying the GuC.  However initialization of the GT, including its GuC,
>> needs to wait until after we know the IP versions so that the proper
>> initialization steps for the platform/IP are followed.
>> 
>> Currently the (somewhat hacky) solution is to manually fill out just
>> enough fields in tile 0's primary GT structure to make it look as if the
>> GT has been initialized so that the GuC can be partially initialized and
>> queried to obtain the GMD_ID values.  When the GT gets properly
>> initialized during the regular flows, the hacked-up values will get
>> overwritten as part of the general initialization flows.
>> 
>> Rather than using tile 0's primary GT structure to hold the hacked up
>> values for querying every GT on every tile, instead allocate a dedicated
>> dummy structure.  This will allow us to move the tile->primary_gt's
>> allocation to a more consistent place later in the initialization flow
>> in future patches (i.e., we shouldn't even allocate this GT structure if
>> the GT is disabled/unavailable).  It also helps ensure there can't be
>> any accidental leakage of initialization or state between the dummy
>> initialization for GMD_ID and the real driver initialization of the GT.
>> 
>> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
>> ---
>>  drivers/gpu/drm/xe/tests/xe_pci.c |  6 ++-
>>  drivers/gpu/drm/xe/xe_pci.c       | 61 +++++++++++++++++--------------
>>  2 files changed, 38 insertions(+), 29 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/xe/tests/xe_pci.c b/drivers/gpu/drm/xe/tests/xe_pci.c
>> index 0f136bc85b76..969f1dacade8 100644
>> --- a/drivers/gpu/drm/xe/tests/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/tests/xe_pci.c
>> @@ -307,8 +307,8 @@ const void *xe_pci_id_gen_param(const void *prev, char *desc)
>>  }
>>  EXPORT_SYMBOL_IF_KUNIT(xe_pci_id_gen_param);
>>  
>> -static void fake_read_gmdid(struct xe_device *xe, enum xe_gmdid_type type,
>> -                            u32 *ver, u32 *revid)
>> +static int fake_read_gmdid(struct xe_device *xe, enum xe_gmdid_type type,
>> +                           u32 *ver, u32 *revid)
>>  {
>>          struct kunit *test = kunit_get_current_test();
>>          struct xe_pci_fake_data *data = test->priv;
>> @@ -320,6 +320,8 @@ static void fake_read_gmdid(struct xe_device *xe, enum xe_gmdid_type type,
>>                  *ver = data->graphics_verx100;
>>                  *revid = xe_step_to_gmdid(data->step.graphics);
>>          }
>> +
>> +        return 0;
>>  }
>>  
>>  static void fake_xe_info_probe_tile_count(struct xe_device *xe)
>> diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
>> index 37ae49f4b648..9fb5df10844d 100644
>> --- a/drivers/gpu/drm/xe/xe_pci.c
>> +++ b/drivers/gpu/drm/xe/xe_pci.c
>> @@ -464,7 +464,7 @@ enum xe_gmdid_type {
>>          GMDID_MEDIA
>>  };
>>  
>> -static void read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver, u32 *revid)
>> +static int read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver, u32 *revid)
>>  {
>>          struct xe_mmio *mmio = xe_root_tile_mmio(xe);
>>          struct xe_reg gmdid_reg = GMD_ID;
>> @@ -473,21 +473,19 @@ static void read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver,
>>          KUNIT_STATIC_STUB_REDIRECT(read_gmdid, xe, type, ver, revid);
>>  
>>          if (IS_SRIOV_VF(xe)) {
>> -                struct xe_gt *gt = xe_root_mmio_gt(xe);
>> -
>>                  /*
>>                   * To get the value of the GMDID register, VFs must obtain it
>>                   * from the GuC using MMIO communication.
>>                   *
>> -                 * Note that at this point the xe_gt is not fully uninitialized
>> -                 * and only basic access to MMIO registers is possible. To use
>> -                 * our existing GuC communication functions we must perform at
>> -                 * least basic xe_gt and xe_guc initialization.
>> -                 *
>> -                 * Since to obtain the value of GMDID_MEDIA we need to use the
>> -                 * media GuC, temporarily tweak the gt type.
>> +                 * Note that at this point the GTs are not initialized and only
>> +                 * tile-level access to MMIO registers is possible. To use our
>> +                 * existing GuC communication functions we must create a dummy
>> +                 * GT structure and perform at least basic xe_gt and xe_guc
>> +                 * initialization.
>>                   */
>> -                xe_gt_assert(gt, gt->info.type == XE_GT_TYPE_UNINITIALIZED);
>> +                struct xe_gt *gt = kzalloc(sizeof(*gt), GFP_KERNEL);
>
>nit: we can use __free(kfree) here ...

Taking a step back, do we really need to allocate this? I get the
impression that keeping the dummy gt on the stack should be fine since
we are keeping its lifetime constrained within this function.

--
Gustavo Sousa

>
>> +                if (!gt)
>> +                        return -ENOMEM;
>>  
>>                  if (type == GMDID_MEDIA) {
>>                          gt->info.id = 1;
>
>it looks that it is still crashing in xe_gt_mmio_init() called from below
>
><1> [146.543510] BUG: kernel NULL pointer dereference, address: 0000000000000028
><1> [146.543528] #PF: supervisor read access in kernel mode
><1> [146.543538] #PF: error_code(0x0000) - not-present page
><4> [146.543611] RIP: 0010:xe_gt_mmio_init+0x27/0x1b0 [xe]
><4> [146.544175] Call Trace:
><4> [146.544182]  <TASK>
><4> [146.544196]  read_gmdid+0x12b/0x2a0 [xe]
><4> [146.544601]  xe_info_init+0x50e/0xc90 [xe]
><4> [146.544948]  ? drmm_kmalloc+0x87/0x100
><4> [146.544968]  ? xe_device_probe_early+0xdf/0x230 [xe]
><4> [146.545253]  ? pci_write_config_word+0x27/0x50
><4> [146.545274]  xe_pci_probe+0x163/0x600 [xe]
>
>likely due to NULL gt->tile
>
>> @@ -503,12 +501,7 @@ static void read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver,
>>                  /* Don't bother with GMDID if failed to negotiate the GuC ABI */
>>                  val = xe_gt_sriov_vf_bootstrap(gt) ? 0 : xe_gt_sriov_vf_gmdid(gt);
>
>... so in case of bootstrap error we can easily return actual error,
>instead of returning plain 0.00 version
>
>>  
>> -                /*
>> -                 * Only undo xe_gt.info here, the remaining changes made above
>> -                 * will be overwritten as part of the regular initialization.
>> -                 */
>> -                gt->info.id = 0;
>> -                gt->info.type = XE_GT_TYPE_UNINITIALIZED;
>> +                kfree(gt);
>>          } else {
>>                  /*
>>                   * GMD_ID is a GT register, but at this point in the driver
>> @@ -526,6 +519,8 @@ static void read_gmdid(struct xe_device *xe, enum xe_gmdid_type type, u32 *ver,
>>  
>>          *ver = REG_FIELD_GET(GMD_ID_ARCH_MASK, val) * 100 + REG_FIELD_GET(GMD_ID_RELEASE_MASK, val);
>>          *revid = REG_FIELD_GET(GMD_ID_REVID, val);
>> +
>> +        return 0;
>>  }
>>  
>>  static const struct xe_ip *find_graphics_ip(unsigned int verx100)
>> @@ -552,18 +547,21 @@ static const struct xe_ip *find_media_ip(unsigned int verx100)
>>   * Read IP version from hardware and select graphics/media IP descriptors
>>   * based on the result.
>>   */
>> -static void handle_gmdid(struct xe_device *xe,
>> -                         const struct xe_ip **graphics_ip,
>> -                         const struct xe_ip **media_ip,
>> -                         u32 *graphics_revid,
>> -                         u32 *media_revid)
>> +static int handle_gmdid(struct xe_device *xe,
>> +                        const struct xe_ip **graphics_ip,
>> +                        const struct xe_ip **media_ip,
>> +                        u32 *graphics_revid,
>> +                        u32 *media_revid)
>>  {
>>          u32 ver;
>> +        int ret;
>>  
>>          *graphics_ip = NULL;
>>          *media_ip = NULL;
>>  
>> -        read_gmdid(xe, GMDID_GRAPHICS, &ver, graphics_revid);
>> +        ret = read_gmdid(xe, GMDID_GRAPHICS, &ver, graphics_revid);
>> +        if (ret)
>> +                return ret;
>>  
>>          *graphics_ip = find_graphics_ip(ver);
>>          if (!*graphics_ip) {
>> @@ -571,16 +569,21 @@ static void handle_gmdid(struct xe_device *xe,
>>                          ver / 100, ver % 100);
>>          }
>>  
>> -        read_gmdid(xe, GMDID_MEDIA, &ver, media_revid);
>> +        ret = read_gmdid(xe, GMDID_MEDIA, &ver, media_revid);
>> +        if (ret)
>> +                return ret;
>> +
>>          /* Media may legitimately be fused off / not present */
>>          if (ver == 0)
>> -                return;
>> +                return 0;
>>  
>>          *media_ip = find_media_ip(ver);
>>          if (!*media_ip) {
>>                  drm_err(&xe->drm, "Hardware reports unknown media version %u.%02u\n",
>>                          ver / 100, ver % 100);
>>          }
>> +
>> +        return 0;
>>  }
>>  
>>  /*
>> @@ -690,6 +693,7 @@ static int xe_info_init(struct xe_device *xe,
>>          const struct xe_media_desc *media_desc;
>>          struct xe_tile *tile;
>>          struct xe_gt *gt;
>> +        int ret;
>>          u8 id;
>>  
>>          /*
>> @@ -705,8 +709,11 @@ static int xe_info_init(struct xe_device *xe,
>>                  xe->info.step = xe_step_pre_gmdid_get(xe);
>>          } else {
>>                  xe_assert(xe, !desc->pre_gmdid_media_ip);
>> -                handle_gmdid(xe, &graphics_ip, &media_ip,
>> -                             &graphics_gmdid_revid, &media_gmdid_revid);
>> +                ret = handle_gmdid(xe, &graphics_ip, &media_ip,
>> +                                   &graphics_gmdid_revid, &media_gmdid_revid);
>> +                if (ret)
>> +                        return ret;
>> +
>>                  xe->info.step = xe_step_gmdid_get(xe,
>>                                                    graphics_gmdid_revid,
>>                                                    media_gmdid_revid);
>

  reply	other threads:[~2025-10-02 12:43 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-30 22:56 [PATCH v3 00/23] Allow configfs to disable specific GT type(s) Matt Roper
2025-09-30 22:56 ` [PATCH v3 01/23] drm/xe/huc: Adjust HuC check on primary GT Matt Roper
2025-09-30 22:56 ` [PATCH v3 02/23] drm/xe: Drop GT parameter to xe_display_irq_postinstall() Matt Roper
2025-09-30 22:56 ` [PATCH v3 03/23] drm/xe: Move 'va_bits' flag back to platform descriptor Matt Roper
2025-10-01  9:44   ` Michal Wajdeczko
2025-09-30 22:56 ` [PATCH v3 04/23] drm/xe: Move 'vm_max_level' " Matt Roper
2025-10-01 21:51   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 05/23] drm/xe: Move 'vram_flags' " Matt Roper
2025-09-30 22:56 ` [PATCH v3 06/23] drm/xe: Move 'has_flatccs' " Matt Roper
2025-09-30 22:56 ` [PATCH v3 07/23] drm/xe: Read VF GMD_ID with a specifically-allocated dummy GT Matt Roper
2025-10-01 10:07   ` Michal Wajdeczko
2025-10-02 12:43     ` Gustavo Sousa [this message]
2025-10-07 17:07       ` Matt Roper
2025-10-07 17:14         ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 08/23] drm/xe: Move primary GT allocation from xe_tile_init_early to xe_tile_init Matt Roper
2025-09-30 22:56 ` [PATCH v3 09/23] drm/xe: Skip L2 / TDF cache flushes if primary GT is disabled Matt Roper
2025-10-01  6:39   ` Upadhyay, Tejas
2025-09-30 22:56 ` [PATCH v3 10/23] drm/xe/query: Report hwconfig size as 0 " Matt Roper
2025-10-01  6:42   ` Upadhyay, Tejas
2025-09-30 22:56 ` [PATCH v3 11/23] drm/xe/pmu: Initialize PMU event types based on first available GT Matt Roper
2025-10-01 20:59   ` Lucas De Marchi
2025-09-30 22:56 ` [PATCH v3 12/23] drm/xe: Check for primary GT before looking up Wa_22019338487 Matt Roper
2025-10-01 21:10   ` Lucas De Marchi
2025-10-02 13:46   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 13/23] drm/xe: Make display part of Wa_22019338487 a device workaround Matt Roper
2025-10-02 14:26   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 14/23] drm/xe/irq: Don't try to lookup engine masks for non-existent primary GT Matt Roper
2025-09-30 22:56 ` [PATCH v3 15/23] drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds Matt Roper
2025-10-02 17:49   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 16/23] drm/xe/rtp: Pass xe_device parameter to FUNC matches Matt Roper
2025-10-02 18:24   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 17/23] drm/xe: Bypass Wa_14018094691 when primary GT is disabled Matt Roper
2025-10-03 12:44   ` Gustavo Sousa
2025-10-07 17:39     ` Matt Roper
2025-09-30 22:56 ` [PATCH v3 18/23] drm/xe: Correct lineage for Wa_22014953428 and only check with valid GT Matt Roper
2025-10-03 12:52   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 19/23] drm/xe: Check that GT is not NULL before testing Wa_16023588340 Matt Roper
2025-09-30 22:56 ` [PATCH v3 20/23] drm/xe: Don't check BIOS-disabled FlatCCS if primary GT is disabled Matt Roper
2025-10-03 13:17   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 21/23] drm/xe: Break GT setup out of xe_info_init() Matt Roper
2025-10-03 13:47   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 22/23] drm/xe/configfs: Add attribute to disable GT types Matt Roper
2025-10-03 18:05   ` Gustavo Sousa
2025-09-30 22:56 ` [PATCH v3 23/23] drm/xe/sriov: Disable SR-IOV if primary GT is disabled via configfs Matt Roper
2025-10-01 11:51   ` Michal Wajdeczko
2025-09-30 23:19 ` ✗ CI.checkpatch: warning for Allow configfs to disable specific GT type(s) (rev3) Patchwork
2025-09-30 23:21 ` ✓ CI.KUnit: success " Patchwork
2025-10-01  0:07 ` ✗ Xe.CI.BAT: failure " Patchwork
2025-10-01  2:30 ` ✗ Xe.CI.Full: " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=175940901218.1952.3042709661416045697@intel.com \
    --to=gustavo.sousa@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.d.roper@intel.com \
    --cc=michal.wajdeczko@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox