linux-pci.vger.kernel.org archive mirror
* [PATCH 0/2] drm/xe: Fix some rebar issues
@ 2025-09-18 20:58 Lucas De Marchi
  2025-09-18 20:58 ` [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize Lucas De Marchi
  2025-09-18 20:58 ` [PATCH 2/2] drm/xe: Move rebar to be done earlier Lucas De Marchi
  0 siblings, 2 replies; 10+ messages in thread
From: Lucas De Marchi @ 2025-09-18 20:58 UTC (permalink / raw)
  To: intel-xe, linux-pci, dri-devel
  Cc: Lucas De Marchi, Ilpo Järvinen, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	linux-kernel, stable

Our implementation for BAR2 (lmembar) resize works at the xe_vram layer
and only releases that BAR before resizing. That is not always
sufficient. If the parent bridge needs to move, BAR0 also needs to be
released, otherwise the resize fails. This happens when not enough
space was allocated from the beginning.

Also, there's a BAR0 in the upstream port of the PCIe switch in BMG that
prevents the resize from propagating to the bridge, as previously discussed
in https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
and https://lore.kernel.org/intel-xe/wqukxnxni2dbpdhri3cbvlrzsefgdanesgskzmxi5sauvsirsl@xor663jw2cdw
I'm bringing that commit from Ilpo here so this can be tested with the
xe changes and propagate to stable. Note that the use of a pci fixup is
not ideal, but without intrusive changes on resource fitting it's
possibly the best alternative. I also have confirmation from HW folks
that the BAR in the upstream port has no production use.

I have more cleanups on top on the xe side, but those conflict with some
refactors Ilpo is working on as prep for the resource fitting, so I will
wait for things to settle before submitting again.

I propose to take this through the drm tree.

With this I could resize the lmembar on some problematic hosts and after
doing an SBR, with one caveat: the audio device also prevents the BAR
from moving and it needs to be manually removed before resizing. With
the PCI refactors and BAR fitting logic that Ilpo is working on, that
manual step is not expected to be needed for long.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
Ilpo Järvinen (1):
      PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize

Lucas De Marchi (1):
      drm/xe: Move rebar to be done earlier

 drivers/gpu/drm/xe/xe_pci.c  |  2 ++
 drivers/gpu/drm/xe/xe_vram.c | 34 ++++++++++++++++++++++++++--------
 drivers/gpu/drm/xe/xe_vram.h |  1 +
 drivers/pci/quirks.c         | 23 +++++++++++++++++++++++
 4 files changed, 52 insertions(+), 8 deletions(-)

base-commit: 8031d70dbb4201841897de480cec1f9750d4a5dc
change-id: 20250917-xe-pci-rebar-2-c0fe2f04c879

Lucas De Marchi


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
  2025-09-18 20:58 [PATCH 0/2] drm/xe: Fix some rebar issues Lucas De Marchi
@ 2025-09-18 20:58 ` Lucas De Marchi
  2025-10-02 16:09   ` Lucas De Marchi
  2025-10-24 22:44   ` Bjorn Helgaas
  2025-09-18 20:58 ` [PATCH 2/2] drm/xe: Move rebar to be done earlier Lucas De Marchi
  1 sibling, 2 replies; 10+ messages in thread
From: Lucas De Marchi @ 2025-09-18 20:58 UTC (permalink / raw)
  To: intel-xe, linux-pci, dri-devel
  Cc: Lucas De Marchi, Ilpo Järvinen, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	linux-kernel, stable

From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Resizing BAR to a larger size has to release upstream bridge windows in
order to make the bridge windows larger as well (and to potentially relocate
them into a larger free block within iomem space). Some GPUs have an
integrated PCI switch that has BAR0. The resource allocation assigns
space for that BAR0 as it does for any resource.

An extra resource on a bridge will pin its upstream bridge window in
place which prevents BAR resize for anything beneath that bridge.
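
The pinning described above can be pictured with a toy userspace model. This is only a sketch, not kernel code: `struct toy_res` and `toy_window_can_move()` are hypothetical names, and the assumption is simply that a window may be released and re-fitted only once nothing allocated from it is still assigned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a resource below a bridge window. */
struct toy_res {
	const char *name;
	bool assigned;	/* currently occupies space inside the window */
};

/*
 * A window can be released (and then grown or moved) only when none of
 * the resources allocated from it is still assigned.
 */
static bool toy_window_can_move(const struct toy_res *children, size_t n)
{
	assert(children || n == 0);

	for (size_t i = 0; i < n; i++)
		if (children[i].assigned)
			return false;
	return true;
}
```

In this model a single assigned entry, such as the switch's own BAR0, is enough to keep the whole window in place even after the GPU BARs below it have been released.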

Nothing in the pcieport driver provided by PCI core, which typically is
the driver bound to these bridges, requires that BAR0. Because of that,
releasing the extra BAR does not seem to have notable downsides but
comes with a clear upside.

Therefore, release BAR0 of such switches using a quirk and clear its
flags to prevent any new invocation of the resource assignment
algorithm from assigning the resource again.
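
A minimal sketch of why clearing the flags matters, again as a toy userspace model rather than the actual resource assignment code. The names `toy_resource`, `toy_needs_assignment()` and `toy_release_and_disable()` are made up for illustration, and the model assumes an assignment pass only considers resources that have a type (non-zero flags) and are not yet assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified resource: flags == 0 means "no type at all". */
struct toy_resource {
	unsigned long flags;
	bool assigned;
};

/* Models an assignment pass: only typed, unassigned resources qualify. */
static bool toy_needs_assignment(const struct toy_resource *res)
{
	assert(res);
	return res->flags != 0 && !res->assigned;
}

/* Models the quirk: release, then clear flags so later passes skip it. */
static void toy_release_and_disable(struct toy_resource *res)
{
	assert(res);
	res->assigned = false;
	res->flags = 0;
}
```

Releasing alone would leave the resource as a candidate for the next assignment pass. Clearing the flags as well is what keeps it out in this model.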

Due to other siblings within the PCI hierarchy of all the devices
integrated into the GPU, some other devices may still have to be
manually removed before the resize is free of any bridge window pins.
Such siblings can be released through sysfs to unpin windows while
leaving access to GPU's sysfs entries required for initiating the
resize operation, whereas removing the topmost bridge this quirk
targets would result in removing the GPU device as well so no manual
workaround for this problem exists.

Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---

Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
better solution which is to consider Resizable BAR maximum size in the
resource fitting algorithm. But then, I don't expect the better solution
to be something we want to push into stable due to extremely invasive
dependencies. So maybe consider this an interim/legacy solution to the
resizing problem and remove it once the algorithmic approach works (or
more precisely retain it only in the old kernel versions).
---
 drivers/pci/quirks.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index d97335a401930..9b1c08de3aa89 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
 #endif
+
+/*
+ * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
+ * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
+ * bridge window it's under in place. Nothing in pcieport requires that
+ * BAR0.
+ *
+ * Release and disable BAR0 permanently by clearing its flags to prevent
+ * anything from assigning it again.
+ */
+static void pci_release_bar0(struct pci_dev *pdev)
+{
+	struct resource *res = pci_resource_n(pdev, 0);
+
+	if (!res->parent)
+		return;
+
+	pci_release_resource(pdev, 0);
+	res->flags = 0;
+}
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);

-- 
2.50.1



* [PATCH 2/2] drm/xe: Move rebar to be done earlier
  2025-09-18 20:58 [PATCH 0/2] drm/xe: Fix some rebar issues Lucas De Marchi
  2025-09-18 20:58 ` [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize Lucas De Marchi
@ 2025-09-18 20:58 ` Lucas De Marchi
  2025-09-29 13:37   ` Lucas De Marchi
  1 sibling, 1 reply; 10+ messages in thread
From: Lucas De Marchi @ 2025-09-18 20:58 UTC (permalink / raw)
  To: intel-xe, linux-pci, dri-devel
  Cc: Lucas De Marchi, Ilpo Järvinen, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	linux-kernel, stable

There may be cases in which the BAR0 also needs to move to accommodate
the bigger BAR2. However if it's not released, the BAR2 resize fails.
During the vram probe it can't be released as it's already in use by
xe_mmio for early register access.

Add a new function in xe_vram and let xe_pci call it directly, even
before early device probe. This allows BAR2 to be resized in cases where
BAR0 also needs to move:

	[] xe 0000:03:00.0: vgaarb: deactivate vga console
	[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB
	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
	[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
	[] pcieport 0000:00:01.0:   bridge window [mem 0x83000000-0x840fffff]
	[] pcieport 0000:00:01.0:   bridge window [mem 0x4000000000-0x44007fffff 64bit pref]
	[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
	[] pcieport 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
	[] pcieport 0000:01:00.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
	[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
	[] pcieport 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
	[] pcieport 0000:02:01.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
	[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
	[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE  e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ...

As shown above, it happens even before we try to read any register for
platform identification.
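
As a sanity check on the ranges in the log, the sizes can be recomputed in userspace. The helpers below are illustrative only (`range_size()` and `rebar_size_encoding()` are names made up here). The second one assumes the usual Resizable BAR encoding where a value of 0 means 1 MiB and each increment doubles the size.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of an inclusive [start, end] range as printed in the log. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/*
 * Resizable BAR size encoding: 0 means 1 MiB, 1 means 2 MiB, and so on.
 * Assumes bytes is a power of two of at least 1 MiB.
 */
static int rebar_size_encoding(uint64_t bytes)
{
	int order = 0;

	assert(bytes >= (1ULL << 20) && (bytes & (bytes - 1)) == 0);
	while (bytes >>= 1)
		order++;
	return order - 20;
}
```

With these, the old BAR 2 range 0x4000000000-0x41ffffffff works out to 8 GiB (encoding 13) and the new range 0x4000000000-0x43ffffffff to 16 GiB (encoding 14), matching the "8192MiB -> 16384MiB" message.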

All the rebar logic is more pci-specific than xe-specific and can be
done very early in the probe sequence. In the future it would be good to
move it out of xe_vram.c, but this refactor is left for later.

Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # 6.12+
Link: https://lore.kernel.org/intel-xe/fafda2a3-fc63-ce97-d22b-803f771a4d19@linux.intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
---
v2:
- Use res->parent to test resource assignment and avoid resetting
  resources unnecessarily (Ilpo)
- Use pci_dev_for_each_resource() to loop through the resources and
  release resource with same type as lmembar (Ilpo)
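
The selection rule from the v2 notes can be sketched as a toy userspace model. This is not the driver code: `TOY_MEM_64`, `struct toy_bar` and `toy_release_bars()` are hypothetical stand-ins, with `assigned` playing the role of a non-NULL res->parent and `TOY_MEM_64` standing in for IORESOURCE_MEM_64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit standing in for IORESOURCE_MEM_64. */
#define TOY_MEM_64 0x1UL

struct toy_bar {
	unsigned long flags;
	bool assigned;	/* models res->parent != NULL */
};

/* Release every assigned 64-bit memory BAR; return how many were released. */
static size_t toy_release_bars(struct toy_bar *bars, size_t n)
{
	size_t released = 0;

	assert(bars || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Already un-assigned: leave it untouched */
		if (!bars[i].assigned)
			continue;
		/* Unrelated type: cannot block the LMEMBAR resize */
		if (!(bars[i].flags & TOY_MEM_64))
			continue;
		bars[i].assigned = false;
		released++;
	}
	return released;
}
```

Calling it a second time on the same array releases nothing, which mirrors the point of checking the assignment state before touching a resource.
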
---
 drivers/gpu/drm/xe/xe_pci.c  |  2 ++
 drivers/gpu/drm/xe/xe_vram.c | 34 ++++++++++++++++++++++++++--------
 drivers/gpu/drm/xe/xe_vram.h |  1 +
 3 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
index 77bee811a1501..95c8aafc0810e 100644
--- a/drivers/gpu/drm/xe/xe_pci.c
+++ b/drivers/gpu/drm/xe/xe_pci.c
@@ -868,6 +868,8 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (err)
 		return err;
 
+	xe_vram_resize_bar(xe);
+
 	err = xe_device_probe_early(xe);
 	/*
 	 * In Boot Survivability mode, no drm card is exposed and driver
diff --git a/drivers/gpu/drm/xe/xe_vram.c b/drivers/gpu/drm/xe/xe_vram.c
index b44ebf50fedbb..652df7a5f4f65 100644
--- a/drivers/gpu/drm/xe/xe_vram.c
+++ b/drivers/gpu/drm/xe/xe_vram.c
@@ -26,15 +26,35 @@
 
 #define BAR_SIZE_SHIFT 20
 
-static void
-_resize_bar(struct xe_device *xe, int resno, resource_size_t size)
+/*
+ * Release all the BARs that could influence/block LMEMBAR resizing, i.e.
+ * assigned IORESOURCE_MEM_64 BARs
+ */
+static void release_bars(struct pci_dev *pdev)
+{
+	struct resource *res;
+	int i;
+
+	pci_dev_for_each_resource(pdev, res, i) {
+		/* Resource already un-assigned, do not reset it */
+		if (!res->parent)
+			continue;
+
+		/* No need to release unrelated BARs */
+		if (!(res->flags & IORESOURCE_MEM_64))
+			continue;
+
+		pci_release_resource(pdev, i);
+	}
+}
+
+static void resize_bar(struct xe_device *xe, int resno, resource_size_t size)
 {
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
 	int bar_size = pci_rebar_bytes_to_size(size);
 	int ret;
 
-	if (pci_resource_len(pdev, resno))
-		pci_release_resource(pdev, resno);
+	release_bars(pdev);
 
 	ret = pci_resize_resource(pdev, resno, bar_size);
 	if (ret) {
@@ -50,7 +70,7 @@ _resize_bar(struct xe_device *xe, int resno, resource_size_t size)
  * if force_vram_bar_size is set, attempt to set to the requested size
  * else set to maximum possible size
  */
-static void resize_vram_bar(struct xe_device *xe)
+void xe_vram_resize_bar(struct xe_device *xe)
 {
 	int force_vram_bar_size = xe_modparam.force_vram_bar_size;
 	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
@@ -119,7 +139,7 @@ static void resize_vram_bar(struct xe_device *xe)
 	pci_read_config_dword(pdev, PCI_COMMAND, &pci_cmd);
 	pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd & ~PCI_COMMAND_MEMORY);
 
-	_resize_bar(xe, LMEM_BAR, rebar_size);
+	resize_bar(xe, LMEM_BAR, rebar_size);
 
 	pci_assign_unassigned_bus_resources(pdev->bus);
 	pci_write_config_dword(pdev, PCI_COMMAND, pci_cmd);
@@ -148,8 +168,6 @@ static int determine_lmem_bar_size(struct xe_device *xe, struct xe_vram_region *
 		return -ENXIO;
 	}
 
-	resize_vram_bar(xe);
-
 	lmem_bar->io_start = pci_resource_start(pdev, LMEM_BAR);
 	lmem_bar->io_size = pci_resource_len(pdev, LMEM_BAR);
 	if (!lmem_bar->io_size)
diff --git a/drivers/gpu/drm/xe/xe_vram.h b/drivers/gpu/drm/xe/xe_vram.h
index 72860f714fc66..13505cfb184dc 100644
--- a/drivers/gpu/drm/xe/xe_vram.h
+++ b/drivers/gpu/drm/xe/xe_vram.h
@@ -11,6 +11,7 @@
 struct xe_device;
 struct xe_vram_region;
 
+void xe_vram_resize_bar(struct xe_device *xe);
 int xe_vram_probe(struct xe_device *xe);
 
 struct xe_vram_region *xe_vram_region_alloc(struct xe_device *xe, u8 id, u32 placement);

-- 
2.50.1



* Re: [PATCH 2/2] drm/xe: Move rebar to be done earlier
  2025-09-18 20:58 ` [PATCH 2/2] drm/xe: Move rebar to be done earlier Lucas De Marchi
@ 2025-09-29 13:37   ` Lucas De Marchi
  2025-09-29 13:56     ` Ilpo Järvinen
  0 siblings, 1 reply; 10+ messages in thread
From: Lucas De Marchi @ 2025-09-29 13:37 UTC (permalink / raw)
  To: intel-xe, linux-pci, dri-devel
  Cc: Ilpo Järvinen, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	linux-kernel, stable

Hi,

On Thu, Sep 18, 2025 at 01:58:57PM -0700, Lucas De Marchi wrote:
>There may be cases in which the BAR0 also needs to move to accommodate
>the bigger BAR2. However if it's not released, the BAR2 resize fails.
>During the vram probe it can't be released as it's already in use by
>xe_mmio for early register access.
>
>Add a new function in xe_vram and let xe_pci call it directly before
>even early device probe. This allows the BAR2 to resize in cases BAR0
>also needs to move:
>
>	[] xe 0000:03:00.0: vgaarb: deactivate vga console
>	[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB -> 16384MiB
>	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
>	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
>	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
>	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff 64bit pref]: releasing
>	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
>	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
>	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]: assigned
>	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
>	[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
>	[] pcieport 0000:00:01.0:   bridge window [mem 0x83000000-0x840fffff]
>	[] pcieport 0000:00:01.0:   bridge window [mem 0x4000000000-0x44007fffff 64bit pref]
>	[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
>	[] pcieport 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
>	[] pcieport 0000:01:00.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
>	[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
>	[] pcieport 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
>	[] pcieport 0000:02:01.0:   bridge window [mem 0x4000000000-0x43ffffffff 64bit pref]
>	[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
>	[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE  e221:0000 dgfx:1 gfx:Xe2_HPG (20.02) ...
>
>As shown above, it happens even before we try to read any register for
>platform identification.
>
>All the rebar logic is more pci-specific than xe-specific and can be
>done very early in the probe sequence. In future it would be good to
>move it out of xe_vram.c, but this refactor is left for later.

Ilpo, can you take a look at this patch? It fixed the issue that I had
with BMG. It needs the first patch for the full fix, but the fixes are
more or less orthogonal.

thanks
Lucas De Marchi


* Re: [PATCH 2/2] drm/xe: Move rebar to be done earlier
  2025-09-29 13:37   ` Lucas De Marchi
@ 2025-09-29 13:56     ` Ilpo Järvinen
  2025-10-10  5:18       ` Lucas De Marchi
  0 siblings, 1 reply; 10+ messages in thread
From: Ilpo Järvinen @ 2025-09-29 13:56 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-xe, linux-pci, dri-devel, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	LKML, stable


On Mon, 29 Sep 2025, Lucas De Marchi wrote:

> Hi,
> 
> On Thu, Sep 18, 2025 at 01:58:57PM -0700, Lucas De Marchi wrote:
> > There may be cases in which the BAR0 also needs to move to accommodate
> > the bigger BAR2. However if it's not released, the BAR2 resize fails.
> > During the vram probe it can't be released as it's already in use by
> > xe_mmio for early register access.
> > 
> > Add a new function in xe_vram and let xe_pci call it directly before
> > even early device probe. This allows the BAR2 to resize in cases BAR0
> > also needs to move:
> > 
> > 	[] xe 0000:03:00.0: vgaarb: deactivate vga console
> > 	[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB ->
> > 16384MiB
> > 	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
> > 	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]:
> > releasing
> > 	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff
> > 64bit pref]: releasing
> > 	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff
> > 64bit pref]: releasing
> > 	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff
> > 64bit pref]: assigned
> > 	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff
> > 64bit pref]: assigned
> > 	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]:
> > assigned
> > 	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
> > 	[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
> > 	[] pcieport 0000:00:01.0:   bridge window [mem 0x83000000-0x840fffff]
> > 	[] pcieport 0000:00:01.0:   bridge window [mem
> > 0x4000000000-0x44007fffff 64bit pref]
> > 	[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
> > 	[] pcieport 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
> > 	[] pcieport 0000:01:00.0:   bridge window [mem
> > 0x4000000000-0x43ffffffff 64bit pref]
> > 	[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
> > 	[] pcieport 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
> > 	[] pcieport 0000:02:01.0:   bridge window [mem
> > 0x4000000000-0x43ffffffff 64bit pref]
> > 	[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
> > 	[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE  e221:0000
> > dgfx:1 gfx:Xe2_HPG (20.02) ...
> > 
> > As shown above, it happens even before we try to read any register for
> > platform identification.
> > 
> > All the rebar logic is more pci-specific than xe-specific and can be
> > done very early in the probe sequence. In future it would be good to
> > move it out of xe_vram.c, but this refactor is left for later.
> 
> Ilpo, can you take a look at this patch? It fixed the issue that I had
> with BMG. It needs the first patch for the full fix, but the fixes are
> more or less orthogonal.

FWIW, it looks okay to me from PCI perspective,

Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

-- 
 i.


* Re: [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
  2025-09-18 20:58 ` [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize Lucas De Marchi
@ 2025-10-02 16:09   ` Lucas De Marchi
  2025-10-24 22:44   ` Bjorn Helgaas
  1 sibling, 0 replies; 10+ messages in thread
From: Lucas De Marchi @ 2025-10-02 16:09 UTC (permalink / raw)
  To: intel-xe, linux-pci, dri-devel, Bjorn Helgaas
  Cc: Ilpo Järvinen, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Simon Richter, linux-kernel,
	stable

On Thu, Sep 18, 2025 at 01:58:56PM -0700, Lucas De Marchi wrote:
>From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>
>Resizing BAR to a larger size has to release upstream bridge windows in
>order to make the bridge windows larger as well (and to potentially relocate
>them into a larger free block within iomem space). Some GPUs have an
>integrated PCI switch that has BAR0. The resource allocation assigns
>space for that BAR0 as it does for any resource.
>
>An extra resource on a bridge will pin its upstream bridge window in
>place which prevents BAR resize for anything beneath that bridge.
>
>Nothing in the pcieport driver provided by PCI core, which typically is
>the driver bound to these bridges, requires that BAR0. Because of that,
>releasing the extra BAR does not seem to have notable downsides but
>comes with a clear upside.
>
>Therefore, release BAR0 of such switches using a quirk and clear its
>flags to prevent any new invocation of the resource assignment
>algorithm from assigning the resource again.
>
>Due to other siblings within the PCI hierarchy of all the devices
>integrated into the GPU, some other devices may still have to be
>manually removed before the resize is free of any bridge window pins.
>Such siblings can be released through sysfs to unpin windows while
>leaving access to GPU's sysfs entries required for initiating the
>resize operation, whereas removing the topmost bridge this quirk
>targets would result in removing the GPU device as well so no manual
>workaround for this problem exists.
>
>Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
>Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
>Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
>Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>Cc: <stable@vger.kernel.org> # v6.12+
>Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>---
>
>Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
>better solution which is to consider Resizable BAR maximum size in the
>resource fitting algorithm. But then, I don't expect the better solution
>to be something we want to push into stable due to extremely invasive
>dependencies. So maybe consider this an interim/legacy solution to the
>resizing problem and remove it once the algorithmic approach works (or
>more precisely retain it only in the old kernel versions).

Bjorn, would that be acceptable? If so, please let me know if I can take
it through the drm tree together with the second patch to have this BAR
resize fixed for BMG.

thanks
Lucas De Marchi


>---
> drivers/pci/quirks.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
>diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>index d97335a401930..9b1c08de3aa89 100644
>--- a/drivers/pci/quirks.c
>+++ b/drivers/pci/quirks.c
>@@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
> #endif
>+
>+/*
>+ * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
>+ * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
>+ * bridge window it's under in place. Nothing in pcieport requires that
>+ * BAR0.
>+ *
>+ * Release and disable BAR0 permanently by clearing its flags to prevent
>+ * anything from assigning it again.
>+ */
>+static void pci_release_bar0(struct pci_dev *pdev)
>+{
>+	struct resource *res = pci_resource_n(pdev, 0);
>+
>+	if (!res->parent)
>+		return;
>+
>+	pci_release_resource(pdev, 0);
>+	res->flags = 0;
>+}
>+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
>+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
>+DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);
>
>-- 
>2.50.1
>


* Re: [PATCH 2/2] drm/xe: Move rebar to be done earlier
  2025-09-29 13:56     ` Ilpo Järvinen
@ 2025-10-10  5:18       ` Lucas De Marchi
  0 siblings, 0 replies; 10+ messages in thread
From: Lucas De Marchi @ 2025-10-10  5:18 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: intel-xe, linux-pci, dri-devel, Icenowy Zheng, Vivian Wang,
	Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas, Simon Richter,
	LKML, stable

On Mon, Sep 29, 2025 at 04:56:03PM +0300, Ilpo Järvinen wrote:
>On Mon, 29 Sep 2025, Lucas De Marchi wrote:
>
>> Hi,
>>
>> On Thu, Sep 18, 2025 at 01:58:57PM -0700, Lucas De Marchi wrote:
>> > There may be cases in which the BAR0 also needs to move to accommodate
>> > the bigger BAR2. However if it's not released, the BAR2 resize fails.
>> > During the vram probe it can't be released as it's already in use by
>> > xe_mmio for early register access.
>> >
>> > Add a new function in xe_vram and let xe_pci call it directly before
>> > even early device probe. This allows the BAR2 to resize in cases BAR0
>> > also needs to move:
>> >
>> > 	[] xe 0000:03:00.0: vgaarb: deactivate vga console
>> > 	[] xe 0000:03:00.0: [drm] Attempting to resize bar from 8192MiB ->
>> > 16384MiB
>> > 	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: releasing
>> > 	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x41ffffffff 64bit pref]:
>> > releasing
>> > 	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x41ffffffff
>> > 64bit pref]: releasing
>> > 	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x41ffffffff
>> > 64bit pref]: releasing
>> > 	[] pcieport 0000:01:00.0: bridge window [mem 0x4000000000-0x43ffffffff
>> > 64bit pref]: assigned
>> > 	[] pcieport 0000:02:01.0: bridge window [mem 0x4000000000-0x43ffffffff
>> > 64bit pref]: assigned
>> > 	[] xe 0000:03:00.0: BAR 2 [mem 0x4000000000-0x43ffffffff 64bit pref]:
>> > assigned
>> > 	[] xe 0000:03:00.0: BAR 0 [mem 0x83000000-0x83ffffff 64bit]: assigned
>> > 	[] pcieport 0000:00:01.0: PCI bridge to [bus 01-04]
>> > 	[] pcieport 0000:00:01.0:   bridge window [mem 0x83000000-0x840fffff]
>> > 	[] pcieport 0000:00:01.0:   bridge window [mem
>> > 0x4000000000-0x44007fffff 64bit pref]
>> > 	[] pcieport 0000:01:00.0: PCI bridge to [bus 02-04]
>> > 	[] pcieport 0000:01:00.0:   bridge window [mem 0x83000000-0x840fffff]
>> > 	[] pcieport 0000:01:00.0:   bridge window [mem
>> > 0x4000000000-0x43ffffffff 64bit pref]
>> > 	[] pcieport 0000:02:01.0: PCI bridge to [bus 03]
>> > 	[] pcieport 0000:02:01.0:   bridge window [mem 0x83000000-0x83ffffff]
>> > 	[] pcieport 0000:02:01.0:   bridge window [mem
>> > 0x4000000000-0x43ffffffff 64bit pref]
>> > 	[] xe 0000:03:00.0: [drm] BAR2 resized to 16384M
>> > 	[] xe 0000:03:00.0: [drm:xe_pci_probe [xe]] BATTLEMAGE  e221:0000
>> > dgfx:1 gfx:Xe2_HPG (20.02) ...
>> >
>> > As shown above, it happens even before we try to read any register for
>> > platform identification.
>> >
>> > All the rebar logic is more pci-specific than xe-specific and can be
>> > done very early in the probe sequence. In future it would be good to
>> > move it out of xe_vram.c, but this refactor is left for later.
>>
>> Ilpo, can you take a look at this patch? It fixed the issue that I had
>> with BMG. It needs the first patch for the full fix, but the fixes are
>> more or less orthogonal.
>
>FWIW, it looks okay to me from PCI perspective,
>
>Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

I'm pushing this to drm-xe-next. The first one may go through pci or drm
tree when it's reviewed.

Merged to drm-xe-next, thanks!

[2/2] drm/xe: Move rebar to be done earlier
       commit: 45e33f220fd625492c11e15733d8e9b4f9db82a4

thanks
Lucas De Marchi


* Re: [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
  2025-09-18 20:58 ` [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize Lucas De Marchi
  2025-10-02 16:09   ` Lucas De Marchi
@ 2025-10-24 22:44   ` Bjorn Helgaas
  2025-10-24 23:00     ` Lucas De Marchi
  2025-10-27 15:04     ` Ilpo Järvinen
  1 sibling, 2 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2025-10-24 22:44 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: intel-xe, linux-pci, dri-devel, Ilpo Järvinen, Icenowy Zheng,
	Vivian Wang, Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas,
	Simon Richter, linux-kernel, stable

On Thu, Sep 18, 2025 at 01:58:56PM -0700, Lucas De Marchi wrote:
> From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> 
> Resizing BAR to a larger size has to release upstream bridge windows in
> order to make the bridge windows larger as well (and to potentially relocate
> them into a larger free block within iomem space). Some GPUs have an
> integrated PCI switch that has BAR0. The resource allocation assigns
> space for that BAR0 as it does for any resource.
> 
> An extra resource on a bridge will pin its upstream bridge window in
> place which prevents BAR resize for anything beneath that bridge.
> 
> Nothing in the pcieport driver provided by PCI core, which typically is
> the driver bound to these bridges, requires that BAR0. Because of that,
> releasing the extra BAR does not seem to have notable downsides but
> comes with a clear upside.
> 
> Therefore, release BAR0 of such switches using a quirk and clear its
> flags to prevent any new invocation of the resource assignment
> algorithm from assigning the resource again.
> 
> Due to other siblings within the PCI hierarchy of all the devices
> integrated into the GPU, some other devices may still have to be
> manually removed before the resize is free of any bridge window pins.
> Such siblings can be released through sysfs to unpin windows while
> leaving access to GPU's sysfs entries required for initiating the
> resize operation, whereas removing the topmost bridge this quirk
> targets would result in removing the GPU device as well so no manual
> workaround for this problem exists.
> 
> Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
> Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
> Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.12+
> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> ---
> 
> Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
> better solution which is to consider Resizable BAR maximum size in the
> resource fitting algorithm. But then, I don't expect the better solution
> to be something we want to push into stable due to extremely invasive
> dependencies. So maybe consider this an interim/legacy solution to the
> resizing problem and remove it once the algorithmic approach works (or
> more precisely retain it only in the old kernel versions).
> ---
>  drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index d97335a401930..9b1c08de3aa89 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>  #endif
> +
> +/*
> + * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
> + * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
> + * bridge window it's under in place. Nothing in pcieport requires that
> + * BAR0.
> + *
> + * Release and disable BAR0 permanently by clearing its flags to prevent
> + * anything from assigning it again.

Does "disabling BAR0" actually work?  This quirk keeps the PCI core
from assigning resources to the BAR, but I don't think we have a way
to actually disable an individual BAR, do we?

I think the only control is PCI_COMMAND_MEMORY, and the bridge must
have PCI_COMMAND_MEMORY enabled so memory accesses to downstream
devices work.

No matter what we do to the struct resource, the hardware BAR still
contains some address, and the bridge will decode any accesses that
match the address in the BAR.

Maybe we could effectively disable the BAR by setting it to some
impossible address, i.e., something outside both the upstream and
downstream bridge windows so memory accesses could never be routed to
it?
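
To make the decode concern above concrete, here is a minimal user-space
sketch (not kernel code) of the condition under which a function claims
a memory access through BAR0. It only looks at the raw config header
bytes, which is the point: nothing in struct resource enters into it.
The names bar0_state, bar0_read_state() and bar0_claims() are
hypothetical, and the BAR size is assumed to be supplied by the caller
(normally it would come from BAR sizing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bits of the standard PCI configuration header. */
#define CFG_COMMAND        0x04
#define CFG_COMMAND_MEMORY 0x0002u     /* Memory Space Enable */
#define CFG_BAR0           0x10
#define BAR_SPACE_IO       0x1u        /* bit 0: 1 = I/O BAR */
#define BAR_MEM_TYPE_MASK  0x6u        /* bits 2:1: memory BAR type */
#define BAR_MEM_TYPE_64    0x4u        /* 64-bit BAR, upper half in next dword */
#define BAR_MEM_ADDR_MASK  0xfffffff0u

/* Hypothetical model of what the hardware sees; not a kernel structure. */
struct bar0_state {
	bool mem_enabled;	/* Memory Space Enable set in the command register */
	bool is_mem;		/* BAR0 is a memory BAR */
	uint64_t addr;		/* bus address currently programmed in BAR0 */
};

static inline uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	/* Config space is little-endian. */
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Extract the decode-relevant state from the first 64 config bytes. */
struct bar0_state bar0_read_state(const uint8_t cfg[64])
{
	struct bar0_state s = { 0 };
	uint32_t cmd = cfg_read32(cfg, CFG_COMMAND) & 0xffffu;
	uint32_t lo = cfg_read32(cfg, CFG_BAR0);

	s.mem_enabled = (cmd & CFG_COMMAND_MEMORY) != 0;
	s.is_mem = !(lo & BAR_SPACE_IO);
	if (!s.is_mem)
		return s;

	s.addr = lo & BAR_MEM_ADDR_MASK;
	if ((lo & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64)
		s.addr |= (uint64_t)cfg_read32(cfg, CFG_BAR0 + 4) << 32;
	return s;
}

/*
 * Would an access to @addr be claimed via BAR0? @size is the BAR size,
 * assumed known to the caller. Software bookkeeping plays no role here.
 */
bool bar0_claims(const struct bar0_state *s, uint64_t size, uint64_t addr)
{
	assert(size != 0);
	return s->mem_enabled && s->is_mem &&
	       addr >= s->addr && addr - s->addr < size;
}
```

Since a bridge has to keep Memory Space Enable set for its downstream
devices, bar0_claims() stays true for whatever address is left in the
register, which is why only moving that address out of the way would
change the outcome.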

> + */
> +static void pci_release_bar0(struct pci_dev *pdev)
> +{
> +	struct resource *res = pci_resource_n(pdev, 0);
> +
> +	if (!res->parent)
> +		return;
> +
> +	pci_release_resource(pdev, 0);
> +	res->flags = 0;
> +}
> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);
> 
> -- 
> 2.50.1
> 


* Re: [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
  2025-10-24 22:44   ` Bjorn Helgaas
@ 2025-10-24 23:00     ` Lucas De Marchi
  2025-10-27 15:04     ` Ilpo Järvinen
  1 sibling, 0 replies; 10+ messages in thread
From: Lucas De Marchi @ 2025-10-24 23:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: intel-xe, linux-pci, dri-devel, Ilpo Järvinen, Icenowy Zheng,
	Vivian Wang, Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas,
	Simon Richter, linux-kernel, stable

On Fri, Oct 24, 2025 at 05:44:01PM -0500, Bjorn Helgaas wrote:
>On Thu, Sep 18, 2025 at 01:58:56PM -0700, Lucas De Marchi wrote:
>> From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>>
>> Resizing BAR to a larger size has to release upstream bridge windows in
>> order to make the bridge windows larger as well (and to potentially relocate
>> them into a larger free block within iomem space). Some GPUs have an
>> integrated PCI switch that has BAR0. The resource allocation assigns
>> space for that BAR0 as it does for any resource.
>>
>> An extra resource on a bridge will pin its upstream bridge window in
>> place which prevents BAR resize for anything beneath that bridge.
>>
>> Nothing in the pcieport driver provided by PCI core, which typically is
>> the driver bound to these bridges, requires that BAR0. Because of that,
>> releasing the extra BAR does not seem to have notable downsides but
>> comes with a clear upside.
>>
>> Therefore, release BAR0 of such switches using a quirk and clear its
>> flags to prevent any new invocation of the resource assignment
>> algorithm from assigning the resource again.
>>
>> Due to other siblings within the PCI hierarchy of all the devices
>> integrated into the GPU, some other devices may still have to be
>> manually removed before the resize is free of any bridge window pins.
>> Such siblings can be released through sysfs to unpin windows while
>> leaving access to GPU's sysfs entries required for initiating the
>> resize operation, whereas removing the topmost bridge this quirk
>> targets would result in removing the GPU device as well so no manual
>> workaround for this problem exists.
>>
>> Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
>> Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>> Cc: <stable@vger.kernel.org> # v6.12+
>> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
>> ---
>>
>> Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
>> better solution which is to consider Resizable BAR maximum size in the
>> resource fitting algorithm. But then, I don't expect the better solution
>> to be something we want to push into stable due to extremely invasive
>> dependencies. So maybe consider this an interim/legacy solution to the
>> resizing problem and remove it once the algorithmic approach works (or
>> more precisely retain it only in the old kernel versions).
>> ---
>>  drivers/pci/quirks.c | 23 +++++++++++++++++++++++
>>  1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index d97335a401930..9b1c08de3aa89 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
>>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
>>  #endif
>> +
>> +/*
>> + * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
>> + * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
>> + * bridge window it's under in place. Nothing in pcieport requires that
>> + * BAR0.
>> + *
>> + * Release and disable BAR0 permanently by clearing its flags to prevent
>> + * anything from assigning it again.
>
>Does "disabling BAR0" actually work?  This quirk keeps the PCI core
>from assigning resources to the BAR, but I don't think we have a way
>to actually disable an individual BAR, do we?
>
>I think the only control is PCI_COMMAND_MEMORY, and the bridge must
>have PCI_COMMAND_MEMORY enabled so memory accesses to downstream
>devices work.
>
>No matter what we do to the struct resource, the hardware BAR still
>contains some address, and the bridge will decode any accesses that
>match the address in the BAR.

There's no real use for that BAR, so I don't think it matters if the hw
still decodes accesses to it; in this case I don't think it's important
to actually disable it. As long as the PCI core can manage the resources
and the BAR doesn't block moving the endpoint's BARs, it should work.

These 2 patches definitely fixed the rebar for me on a system with a
Battlemage GPU. I don't have access to it right now, but I can dig up
more info about it if needed.

>
>Maybe we could effectively disable the BAR by setting it to some
>impossible address, i.e., something outside both the upstream and
>downstream bridge windows so memory accesses could never be routed to
>it?

Yeah, I guess that would be possible, but given the above, do you see
any advantage in that?

thanks
Lucas De Marchi

>
>> + */
>> +static void pci_release_bar0(struct pci_dev *pdev)
>> +{
>> +	struct resource *res = pci_resource_n(pdev, 0);
>> +
>> +	if (!res->parent)
>> +		return;
>> +
>> +	pci_release_resource(pdev, 0);
>> +	res->flags = 0;
>> +}
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
>> +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);
>>
>> --
>> 2.50.1
>>


* Re: [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize
  2025-10-24 22:44   ` Bjorn Helgaas
  2025-10-24 23:00     ` Lucas De Marchi
@ 2025-10-27 15:04     ` Ilpo Järvinen
  1 sibling, 0 replies; 10+ messages in thread
From: Ilpo Järvinen @ 2025-10-27 15:04 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Lucas De Marchi, intel-xe, linux-pci, dri-devel, Icenowy Zheng,
	Vivian Wang, Thomas Hellström, Rodrigo Vivi, Bjorn Helgaas,
	Simon Richter, LKML, stable


On Fri, 24 Oct 2025, Bjorn Helgaas wrote:
> On Thu, Sep 18, 2025 at 01:58:56PM -0700, Lucas De Marchi wrote:
> > From: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > 
> > Resizing BAR to a larger size has to release upstream bridge windows in
> > order to make the bridge windows larger as well (and to potentially relocate
> > them into a larger free block within iomem space). Some GPUs have an
> > integrated PCI switch that has BAR0. The resource allocation assigns
> > space for that BAR0 as it does for any resource.
> > 
> > An extra resource on a bridge will pin its upstream bridge window in
> > place which prevents BAR resize for anything beneath that bridge.
> > 
> > Nothing in the pcieport driver provided by PCI core, which typically is
> > the driver bound to these bridges, requires that BAR0. Because of that,
> > releasing the extra BAR does not seem to have notable downsides but
> > comes with a clear upside.
> > 
> > Therefore, release BAR0 of such switches using a quirk and clear its
> > flags to prevent any new invocation of the resource assignment
> > algorithm from assigning the resource again.
> > 
> > Due to other siblings within the PCI hierarchy of all the devices
> > integrated into the GPU, some other devices may still have to be
> > manually removed before the resize is free of any bridge window pins.
> > Such siblings can be released through sysfs to unpin windows while
> > leaving access to GPU's sysfs entries required for initiating the
> > resize operation, whereas removing the topmost bridge this quirk
> > targets would result in removing the GPU device as well so no manual
> > workaround for this problem exists.
> > 
> > Reported-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > Link: https://lore.kernel.org/linux-pci/fl6tx5ztvttg7txmz2ps7oyd745wg3lwcp3h7esmvnyg26n44y@owo2ojiu2mov/
> > Link: https://lore.kernel.org/intel-xe/20250721173057.867829-1-uwu@icenowy.me/
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> > Cc: <stable@vger.kernel.org> # v6.12+
> > Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
> > ---
> > 
> > Remarks from Ilpo: this feels quite hacky to me and I'm working towards a
> > better solution which is to consider Resizable BAR maximum size in the
> > resource fitting algorithm. But then, I don't expect the better solution
> > to be something we want to push into stable due to extremely invasive
> > dependencies. So maybe consider this an interim/legacy solution to the
> > resizing problem and remove it once the algorithmic approach works (or
> > more precisely retain it only in the old kernel versions).
> > ---
> >  drivers/pci/quirks.c | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index d97335a401930..9b1c08de3aa89 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -6338,3 +6338,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
> >  #endif
> > +
> > +/*
> > + * PCI switches integrated into Intel Arc GPUs have BAR0 that prevents
> > + * resizing the BARs of the GPU device due to that bridge BAR0 pinning the
> > + * bridge window it's under in place. Nothing in pcieport requires that
> > + * BAR0.
> > + *
> > + * Release and disable BAR0 permanently by clearing its flags to prevent
> > + * anything from assigning it again.
> 
> Does "disabling BAR0" actually work?  This quirk keeps the PCI core
> from assigning resources to the BAR, but I don't think we have a way
> to actually disable an individual BAR, do we?

No, we don't, and that was just sloppy wording from me. The same problem
applies to any other non-assigned BAR resource: they too are there with
a dangling address that could conflict.

> I think the only control is PCI_COMMAND_MEMORY, and the bridge must
> have PCI_COMMAND_MEMORY enabled so memory accesses to downstream
> devices work.
> 
> No matter what we do to the struct resource, the hardware BAR still
> contains some address, and the bridge will decode any accesses that
> match the address in the BAR.
> 
> Maybe we could effectively disable the BAR by setting it to some
> impossible address, i.e., something outside both the upstream and
> downstream bridge windows so memory accesses could never be routed to
> it?

I'm not entirely sure how one should acquire an address outside of the
valid address ranges. Is the resource-to-bus mapping even valid outside
a window?

Perhaps find either min(start address) or max(end address) over all
windows, as those boundary addresses should still be mappable, and place
the BAR right below or above either of those (by subtracting the resource
size or adding 1). How does that approach sound?

(There could be cases where a simple approach like that fails when both 
ends of the range are in use but then I wouldn't want to over-engineer the 
approach at this point unless we know there are such problematic cases
in practice.)
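
The min/max idea above can be sketched in user space as follows. This is
only a model under stated assumptions, not a proposed kernel patch:
struct window and park_bar_outside() are hypothetical names, windows are
inclusive [start, end] ranges, the BAR is assumed to be 64-bit capable,
and the result is rounded to the BAR's natural (size) alignment, which
the simple "subtract the size or add 1" wording does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a bridge window; start and end are inclusive. */
struct window {
	uint64_t start;
	uint64_t end;
};

/*
 * Pick an address for an unused BAR of @size bytes that lies outside all
 * @n windows: first try right below the lowest window start, then right
 * above the highest window end. Returns false when both ends of the
 * address space are in use, the case the simple approach does not handle.
 */
bool park_bar_outside(const struct window *w, size_t n, uint64_t size,
		      uint64_t *out)
{
	uint64_t lo = UINT64_MAX, hi = 0;
	uint64_t mask;
	size_t i;

	/* BAR sizes are powers of two. */
	assert(n > 0 && size > 0 && (size & (size - 1)) == 0);
	mask = size - 1;

	for (i = 0; i < n; i++) {
		if (w[i].start < lo)
			lo = w[i].start;
		if (w[i].end > hi)
			hi = w[i].end;
	}

	/* Below the lowest window: round down so the BAR ends before lo. */
	if (lo >= size) {
		*out = (lo - size) & ~mask;
		return true;
	}

	/* Above the highest window: round up, guarding against overflow. */
	if (hi < UINT64_MAX && hi + 1 <= UINT64_MAX - mask) {
		uint64_t base = (hi + 1 + mask) & ~mask;

		if (base <= UINT64_MAX - mask) {
			*out = base;
			return true;
		}
	}

	return false;
}
```

A caller would pass every window it knows about and fall back to leaving
the BAR untouched when the function returns false.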

It would be nice to do it eventually for any non-assigned BAR, but it
requires preserving those res->flags (for non-window resources too) in
order to know which of them are even usable as BARs.

-- 
 i.

> > + */
> > +static void pci_release_bar0(struct pci_dev *pdev)
> > +{
> > +	struct resource *res = pci_resource_n(pdev, 0);
> > +
> > +	if (!res->parent)
> > +		return;
> > +
> > +	pci_release_resource(pdev, 0);
> > +	res->flags = 0;
> > +}
> > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa0, pci_release_bar0);
> > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0x4fa1, pci_release_bar0);
> > +DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_INTEL, 0xe2ff, pci_release_bar0);
> > 
> > -- 
> > 2.50.1
> > 
> 


Thread overview: 10+ messages
2025-09-18 20:58 [PATCH 0/2] drm/xe: Fix some rebar issues Lucas De Marchi
2025-09-18 20:58 ` [PATCH 1/2] PCI: Release BAR0 of an integrated bridge to allow GPU BAR resize Lucas De Marchi
2025-10-02 16:09   ` Lucas De Marchi
2025-10-24 22:44   ` Bjorn Helgaas
2025-10-24 23:00     ` Lucas De Marchi
2025-10-27 15:04     ` Ilpo Järvinen
2025-09-18 20:58 ` [PATCH 2/2] drm/xe: Move rebar to be done earlier Lucas De Marchi
2025-09-29 13:37   ` Lucas De Marchi
2025-09-29 13:56     ` Ilpo Järvinen
2025-10-10  5:18       ` Lucas De Marchi
