Intel-XE Archive on lore.kernel.org
From: Michal Wajdeczko <michal.wajdeczko@intel.com>
To: Arun R Murthy <arun.r.murthy@intel.com>,
	<intel-xe@lists.freedesktop.org>
Cc: <rodrigo.vivi@intel.com>, <lucas.demarchi@intel.com>,
	Matt Roper <matthew.d.roper@intel.com>
Subject: Re: [PATCH] drm/xe: Check for dead config space on reading all 1s
Date: Tue, 12 May 2026 13:55:33 +0200
Message-ID: <37b6a12c-2d30-43ef-a4d8-2e73a9adac48@intel.com>
In-Reply-To: <20260512060128.209698-1-arun.r.murthy@intel.com>



On 5/12/2026 8:01 AM, Arun R Murthy wrote:
> Reading the VF_CAP returns all 1s when the config space is dead leading
> to missdetection of VF and confusing the GuC.
> Check for all 1s and pci device detect can act as a sanity.
> 
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7941
> Signed-off-by: Arun R Murthy <arun.r.murthy@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_sriov.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_sriov.c b/drivers/gpu/drm/xe/xe_sriov.c
> index f3835867fce5..40e142d78924 100644
> --- a/drivers/gpu/drm/xe/xe_sriov.c
> +++ b/drivers/gpu/drm/xe/xe_sriov.c
> @@ -4,6 +4,7 @@
>   */
>  
>  #include <linux/fault-inject.h>
> +#include <linux/pci.h>
>  
>  #include <drm/drm_managed.h>
>  
> @@ -39,8 +40,21 @@ const char *xe_sriov_mode_to_string(enum xe_sriov_mode mode)
>  
>  static bool test_is_vf(struct xe_device *xe)
>  {
> +	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
>  	u32 value = xe_mmio_read32(xe_root_tile_mmio(xe), VF_CAP_REG);
>  
> +	/*
> +	 * If the device is inaccessible (e.g. parent bridge stuck in D3cold,
> +	 * fatal AER Errors) MMIO reads returns all-ones, 0xffff.
> +	 * VF_CAP would appear set and would misdetect as VF mode.
> +	 * Sanity check PCI presence before trusting the read.
> +	 */

can we have this sanity check elsewhere please?
this is not an SR-IOV specific problem, and we will not even get here when info.has_sriov is not set

some attempts to report this problem were already made in both i915 and xe, see:

[1] https://elixir.bootlin.com/linux/v7.1-rc1/source/drivers/gpu/drm/i915/intel_uncore.c#L559
[2] https://elixir.bootlin.com/linux/v7.1-rc1/source/drivers/gpu/drm/xe/xe_force_wake.c#L121

so maybe we should just extend our xe_mmio_probe_early() and do some sanity checks there?
then any error would be properly propagated and abort the probe sooner than it does today.

also, similar sanity checks should likely be done on the resume paths
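
A rough userspace sketch of the early-probe idea above, not actual xe code. Everything prefixed with mock_ is a made-up name for illustration: the mock device returns all-ones on every read when it is "dead", and the early probe step turns that into an error instead of letting later code interpret the value. It assumes the probed register can never legitimately read back as all-ones.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the device. When 'alive' is zero every
 * read returns all-ones, which is what a dead PCI link looks like
 * from the CPU side.
 */
struct mock_dev {
	int alive;
	uint32_t regs[4];
};

static uint32_t mock_mmio_read32(const struct mock_dev *dev, unsigned int idx)
{
	return dev->alive ? dev->regs[idx] : UINT32_MAX;
}

/*
 * Hypothetical sanity check: assumes register 'idx' never holds
 * all-ones on a working device, so reading all-ones means the
 * device is not reachable.
 */
static int mock_mmio_sanity_check(const struct mock_dev *dev, unsigned int idx)
{
	if (mock_mmio_read32(dev, idx) == UINT32_MAX)
		return -ENODEV;

	return 0;
}

/*
 * Hypothetical early probe step: run the check once and propagate
 * the error, so the caller can abort before any feature-specific
 * code (such as VF detection) looks at register values.
 */
static int mock_mmio_probe_early(const struct mock_dev *dev)
{
	int err = mock_mmio_sanity_check(dev, 0);

	if (err) {
		fprintf(stderr, "MMIO reads return all-ones, device looks inaccessible (%d)\n", err);
		return err;
	}

	return 0;
}
```

With a check like this in the common early path, the same helper could presumably be called from the resume paths as well, and nothing SR-IOV specific would be needed.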

Michal


> +	if (value == U32_MAX && !pci_device_is_present(pdev)) {
> +		drm_err(&xe->drm,

nit: we can use xe_err(xe, ...)

> +			"VF_CAP_REG returned all 1s, config space looks to be dead, skipping SR-IOV mode detection\n");

nit: this is still misleading, as we will continue as PF/native
and likely just see a different set of random errors
without realizing that PCI/MMIO is dead ...
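
To illustrate the point with a userspace mock-up (all mock_ names and the MOCK_VF_CAP bit value are hypothetical, chosen only for this sketch): a bool return has no way to say "the read is not trustworthy", so the caller silently continues as PF/native, while an int return can carry a negative errno that the caller may use to abort.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_VF_CAP 0x1u /* hypothetical bit, for illustration only */

/*
 * bool variant, mirroring the shape of the patch: a dead read is
 * reported as "not a VF", so the caller carries on as PF/native.
 */
static bool mock_test_is_vf_bool(uint32_t value, bool present)
{
	if (value == UINT32_MAX && !present)
		return false;

	return value & MOCK_VF_CAP;
}

/*
 * int variant: negative errno when the device is gone, otherwise
 * 0 (not a VF) or 1 (VF), so the failure is visible to the caller.
 */
static int mock_test_is_vf(uint32_t value, bool present)
{
	if (value == UINT32_MAX && !present)
		return -ENODEV;

	return !!(value & MOCK_VF_CAP);
}
```

For an all-ones read on an absent device the bool variant returns false, which is indistinguishable from a healthy PF/native device, whereas the int variant returns -ENODEV.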


> +		return false;
> +	}
> +
>  	return value & VF_CAP;
>  }
>  


