linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] PCI: Unify domain emulation and misc documentation update
@ 2025-07-16 16:08 Dan Williams
  2025-07-16 16:08 ` [PATCH 1/3] PCI: Establish document for PCI host bridge sysfs attributes Dan Williams
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Dan Williams @ 2025-07-16 16:08 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, lukas, linux-kernel, Jonathan.Cameron, Dexuan Cui,
	Haiyang Zhang, K. Y. Srinivasan, Lorenzo Pieralisi,
	Manivannan Sadhasivam, Nirmal Patel, Rob Herring,
	Suzuki K Poulose, Wei Liu

Bjorn,

This is a small collection of miscellaneous updates that originated in
the PCI/TSM work, but are suitable to go ahead in v6.17. It is a
documentation update and a new pci_bus_find_emul_domain_nr() helper.

First, the PCI/TSM (Trusted Execution Environment Security Manager) work,
which enables PCI device assignment for confidential guests, wants to add
some additional PCI host bridge sysfs attributes. In preparation for
that, document what is already there.

Next, the PCI/TSM effort proposes samples/devsec/ as a reference and
test implementation of all the TSM infrastructure. It is implemented via
host bridge emulation and aims to be cross-architecture compatible. It
stumbled over the current state of PCI domain number emulation being
arch- and driver-specific. Remove some of that differentiation and unify
the existing x86 host bridge emulators (hyper-v and vmd) on a common
pci_bus_find_emul_domain_nr() helper.

Dan Williams (3):
  PCI: Establish document for PCI host bridge sysfs attributes
  PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  PCI: vmd: Switch to pci_bus_find_emul_domain_nr()

 .../ABI/testing/sysfs-devices-pci-host-bridge | 19 +++++++
 MAINTAINERS                                   |  1 +
 drivers/pci/controller/pci-hyperv.c           | 53 ++-----------------
 drivers/pci/controller/vmd.c                  | 33 ++++--------
 drivers/pci/pci.c                             | 43 ++++++++++++++-
 drivers/pci/probe.c                           |  8 ++-
 include/linux/pci.h                           |  4 ++
 7 files changed, 86 insertions(+), 75 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-pci-host-bridge


base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
-- 
2.50.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] PCI: Establish document for PCI host bridge sysfs attributes
  2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
@ 2025-07-16 16:08 ` Dan Williams
  2025-07-16 16:08 ` [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms Dan Williams
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2025-07-16 16:08 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, lukas, linux-kernel, Jonathan.Cameron

In preparation for adding more host bridge sysfs attributes, document the
existing naming format and 'firmware_node' attribute.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 .../ABI/testing/sysfs-devices-pci-host-bridge | 19 +++++++++++++++++++
 MAINTAINERS                                   |  1 +
 2 files changed, 20 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-devices-pci-host-bridge

diff --git a/Documentation/ABI/testing/sysfs-devices-pci-host-bridge b/Documentation/ABI/testing/sysfs-devices-pci-host-bridge
new file mode 100644
index 000000000000..8c3a652799f1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-pci-host-bridge
@@ -0,0 +1,19 @@
+What:		/sys/devices/pciDDDD:BB
+		/sys/devices/.../pciDDDD:BB
+Contact:	linux-pci@vger.kernel.org
+Description:
+		A PCI host bridge device parents a PCI bus device topology. PCI
+		controllers may also parent host bridges. The DDDD:BB format
+		conveys the PCI domain (ACPI segment) number and root bus number
+		(in hexadecimal) of the host bridge. Note that the domain number
+		may be larger than the 16 bits that the "DDDD" format implies
+		for emulated host bridges.
+
+What:		pciDDDD:BB/firmware_node
+Contact:	linux-pci@vger.kernel.org
+Description:
+		(RO) Symlink to the platform firmware device object "companion"
+		of the host bridge. For example, an ACPI device with an _HID of
+		PNP0A08 (/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00). See
+		/sys/devices/pciDDDD:BB entry for details about the DDDD:BB
+		format.
diff --git a/MAINTAINERS b/MAINTAINERS
index 0c1d245bf7b8..368ca0bfdb9f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19198,6 +19198,7 @@ Q:	https://patchwork.kernel.org/project/linux-pci/list/
 B:	https://bugzilla.kernel.org
 C:	irc://irc.oftc.net/linux-pci
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
+F:	Documentation/ABI/testing/sysfs-devices-pci-host-bridge
 F:	Documentation/PCI/
 F:	Documentation/devicetree/bindings/pci/
 F:	arch/x86/kernel/early-quirks.c
-- 
2.50.1



* [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
  2025-07-16 16:08 ` [PATCH 1/3] PCI: Establish document for PCI host bridge sysfs attributes Dan Williams
@ 2025-07-16 16:08 ` Dan Williams
  2025-07-17 17:25   ` Michael Kelley
  2025-07-16 16:08 ` [PATCH 3/3] PCI: vmd: Switch to pci_bus_find_emul_domain_nr() Dan Williams
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2025-07-16 16:08 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, lukas, linux-kernel, Jonathan.Cameron,
	Suzuki K Poulose, Lorenzo Pieralisi, Manivannan Sadhasivam,
	Rob Herring, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui

The ability to emulate a host bridge is useful not only for hardware PCI
controllers like CONFIG_VMD, or virtual PCI controllers like
CONFIG_PCI_HYPERV, but also for test and development scenarios like
CONFIG_SAMPLES_DEVSEC [1].

One stumbling block for defining CONFIG_SAMPLES_DEVSEC, a sample
implementation of a platform TSM for PCI Device Security, is the need to
accommodate PCI_DOMAINS_GENERIC architectures alongside x86 [2].

In support of supplementing the existing CONFIG_PCI_BRIDGE_EMUL
infrastructure for host bridges:

* Introduce pci_bus_find_emul_domain_nr() as a common way to find a free
  PCI domain number whether that is to reuse the existing dynamic
  allocation code in the !ACPI case, or to assign an unused domain above
  the last ACPI segment.

* Convert pci-hyperv to the new allocator so that the PCI core can
  unconditionally assume that bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET
  is the dynamically allocated case.

A follow-on patch can also convert vmd to the new scheme. Currently vmd
is limited to CONFIG_PCI_DOMAINS_GENERIC=n (x86) so, unlike pci-hyperv,
it does not immediately conflict with this new
pci_bus_find_emul_domain_nr() mechanism.
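As an aside, the allocation policy described above can be approximated in
standalone C. This is a hypothetical userspace model only: the function
name find_emul_domain_nr(), the fixed-size bitmap, and the
acpi_disabled_flag variable stand in for the kernel's IDA and
acpi_disabled, and are not the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model of the allocator's policy; the kernel
 * uses an IDA (sparse ID allocator), not a fixed bitmap. */
#define MAX_DOMAINS (1 << 17)
static bool used[MAX_DOMAINS];
static bool acpi_disabled_flag;

/* Claim the first free number in [lo, hi], or return -1 (the kernel
 * helper returns a negative errno from ida_alloc_range()). */
static int alloc_range(int lo, int hi)
{
	for (int i = lo; i <= hi && i < MAX_DOMAINS; i++) {
		if (!used[i]) {
			used[i] = true;
			return i;
		}
	}
	return -1;
}

static int find_emul_domain_nr(int hint)
{
	/* 1. Try to honor the caller's requested domain number. */
	if (hint >= 0 && alloc_range(hint, hint) == hint)
		return hint;

	/* 2. Without ACPI, any free number will do. */
	if (acpi_disabled_flag)
		return alloc_range(0, MAX_DOMAINS - 1);

	/* 3. With ACPI, stay above the 16-bit _SEG space (0x10000+),
	 *    so emulated domains never clash with firmware segments. */
	return alloc_range(0x10000, MAX_DOMAINS - 1);
}
```

Under this model, a taken hint is returned as-is, and a colliding hint
falls back to the first free number above the ACPI segment range.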

Link: http://lore.kernel.org/174107249038.1288555.12362100502109498455.stgit@dwillia2-xfh.jf.intel.com [1]
Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Closes: http://lore.kernel.org/20250311144601.145736-3-suzuki.poulose@arm.com
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/pci/controller/pci-hyperv.c | 53 ++---------------------------
 drivers/pci/pci.c                   | 43 ++++++++++++++++++++++-
 drivers/pci/probe.c                 |  8 ++++-
 include/linux/pci.h                 |  4 +++
 4 files changed, 56 insertions(+), 52 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ef5d655a0052..cfe9806bdbe4 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3630,48 +3630,6 @@ static int hv_send_resources_released(struct hv_device *hdev)
 	return 0;
 }
 
-#define HVPCI_DOM_MAP_SIZE (64 * 1024)
-static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
-
-/*
- * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
- * as invalid for passthrough PCI devices of this driver.
- */
-#define HVPCI_DOM_INVALID 0
-
-/**
- * hv_get_dom_num() - Get a valid PCI domain number
- * Check if the PCI domain number is in use, and return another number if
- * it is in use.
- *
- * @dom: Requested domain number
- *
- * return: domain number on success, HVPCI_DOM_INVALID on failure
- */
-static u16 hv_get_dom_num(u16 dom)
-{
-	unsigned int i;
-
-	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
-		return dom;
-
-	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
-		if (test_and_set_bit(i, hvpci_dom_map) == 0)
-			return i;
-	}
-
-	return HVPCI_DOM_INVALID;
-}
-
-/**
- * hv_put_dom_num() - Mark the PCI domain number as free
- * @dom: Domain number to be freed
- */
-static void hv_put_dom_num(u16 dom)
-{
-	clear_bit(dom, hvpci_dom_map);
-}
-
 /**
  * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
  * @hdev:	VMBus's tracking struct for this root PCI bus
@@ -3715,9 +3673,9 @@ static int hv_pci_probe(struct hv_device *hdev,
 	 * collisions) in the same VM.
 	 */
 	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
-	dom = hv_get_dom_num(dom_req);
+	dom = pci_bus_find_emul_domain_nr(dom_req);
 
-	if (dom == HVPCI_DOM_INVALID) {
+	if (dom < 0) {
 		dev_err(&hdev->device,
 			"Unable to use dom# 0x%x or other numbers", dom_req);
 		ret = -EINVAL;
@@ -3851,7 +3809,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 destroy_wq:
 	destroy_workqueue(hbus->wq);
 free_dom:
-	hv_put_dom_num(hbus->bridge->domain_nr);
+	pci_bus_release_emul_domain_nr(hbus->bridge->domain_nr);
 free_bus:
 	kfree(hbus);
 	return ret;
@@ -3976,8 +3934,6 @@ static void hv_pci_remove(struct hv_device *hdev)
 	irq_domain_remove(hbus->irq_domain);
 	irq_domain_free_fwnode(hbus->fwnode);
 
-	hv_put_dom_num(hbus->bridge->domain_nr);
-
 	kfree(hbus);
 }
 
@@ -4148,9 +4104,6 @@ static int __init init_hv_pci_drv(void)
 	if (ret)
 		return ret;
 
-	/* Set the invalid domain number's bit, so it will not be used */
-	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
-
 	/* Initialize PCI block r/w interface */
 	hvpci_block_ops.read_block = hv_read_config_block;
 	hvpci_block_ops.write_block = hv_write_config_block;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e9448d55113b..833ebf2d5213 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
 #endif
 }
 
+#ifdef CONFIG_PCI_DOMAINS
+static DEFINE_IDA(pci_domain_nr_dynamic_ida);
+
+/*
+ * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
+ * fallback to the first free domain number above the last ACPI segment number.
+ * Caller may have a specific domain number in mind, in which case try to
+ * reserve it.
+ *
+ * Note that this allocation is freed by pci_release_host_bridge_dev().
+ */
+int pci_bus_find_emul_domain_nr(int hint)
+{
+	if (hint >= 0) {
+		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
+				       GFP_KERNEL);
+
+		if (hint >= 0)
+			return hint;
+	}
+
+	if (acpi_disabled)
+		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
+
+	/*
+	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
+	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
+	 * which the lower 16 bits are the PCI Segment Group (domain) number.
+	 * Other bits are currently reserved.
+	 */
+	return ida_alloc_range(&pci_domain_nr_dynamic_ida, 0x10000, INT_MAX,
+			       GFP_KERNEL);
+}
+EXPORT_SYMBOL_GPL(pci_bus_find_emul_domain_nr);
+
+void pci_bus_release_emul_domain_nr(int domain_nr)
+{
+	ida_free(&pci_domain_nr_dynamic_ida, domain_nr);
+}
+EXPORT_SYMBOL_GPL(pci_bus_release_emul_domain_nr);
+#endif
+
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 static DEFINE_IDA(pci_domain_nr_static_ida);
-static DEFINE_IDA(pci_domain_nr_dynamic_ida);
 
 static void of_pci_reserve_static_domain_nr(void)
 {
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..e94978c3be3d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -632,6 +632,11 @@ static void pci_release_host_bridge_dev(struct device *dev)
 
 	pci_free_resource_list(&bridge->windows);
 	pci_free_resource_list(&bridge->dma_ranges);
+
+	/* Host bridges only have domain_nr set in the emulation case */
+	if (bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET)
+		pci_bus_release_emul_domain_nr(bridge->domain_nr);
+
 	kfree(bridge);
 }
 
@@ -1112,7 +1117,8 @@ static int pci_register_host_bridge(struct pci_host_bridge *bridge)
 	device_del(&bridge->dev);
 free:
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
-	pci_bus_release_domain_nr(parent, bus->domain_nr);
+	if (bridge->domain_nr == PCI_DOMAIN_NR_NOT_SET)
+		pci_bus_release_domain_nr(parent, bus->domain_nr);
 #endif
 	if (bus_registered)
 		put_device(&bus->dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e68f35f392..f6a713da5c49 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1934,10 +1934,14 @@ DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
  */
 #ifdef CONFIG_PCI_DOMAINS
 extern int pci_domains_supported;
+int pci_bus_find_emul_domain_nr(int hint);
+void pci_bus_release_emul_domain_nr(int domain_nr);
 #else
 enum { pci_domains_supported = 0 };
 static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
 static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
+static inline int pci_bus_find_emul_domain_nr(int hint) { return 0; }
+static inline void pci_bus_release_emul_domain_nr(int domain_nr) { }
 #endif /* CONFIG_PCI_DOMAINS */
 
 /*
-- 
2.50.1



* [PATCH 3/3] PCI: vmd: Switch to pci_bus_find_emul_domain_nr()
  2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
  2025-07-16 16:08 ` [PATCH 1/3] PCI: Establish document for PCI host bridge sysfs attributes Dan Williams
  2025-07-16 16:08 ` [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms Dan Williams
@ 2025-07-16 16:08 ` Dan Williams
  2025-07-17 22:13 ` [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Bjorn Helgaas
  2025-07-18  0:26 ` dan.j.williams
  4 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2025-07-16 16:08 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, lukas, linux-kernel, Jonathan.Cameron, Nirmal Patel

The new common domain number allocator can replace the custom allocator
in VMD.

Beyond some code-reuse benefits, it also closes a potential race whereby
vmd_find_free_domain() collides with new PCI buses coming online with a
conflicting domain number. Such a race has not been observed in
practice, hence this change is not tagged as a fix.

As VMD uses pci_create_root_bus() rather than pci_alloc_host_bridge() +
pci_scan_root_bus_bridge(), it has no chance to set ->domain_nr in the
bridge, so it needs to manage freeing the domain number on its own.

Cc: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/pci/controller/vmd.c | 33 ++++++++++-----------------------
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 8df064b62a2f..f60244ff9ef8 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -565,22 +565,6 @@ static void vmd_detach_resources(struct vmd_dev *vmd)
 	vmd->dev->resource[VMD_MEMBAR2].child = NULL;
 }
 
-/*
- * VMD domains start at 0x10000 to not clash with ACPI _SEG domains.
- * Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of which the lower
- * 16 bits are the PCI Segment Group (domain) number.  Other bits are
- * currently reserved.
- */
-static int vmd_find_free_domain(void)
-{
-	int domain = 0xffff;
-	struct pci_bus *bus = NULL;
-
-	while ((bus = pci_find_next_bus(bus)) != NULL)
-		domain = max_t(int, domain, pci_domain_nr(bus));
-	return domain + 1;
-}
-
 static int vmd_get_phys_offsets(struct vmd_dev *vmd, bool native_hint,
 				resource_size_t *offset1,
 				resource_size_t *offset2)
@@ -865,13 +849,6 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 		.parent = res,
 	};
 
-	sd->vmd_dev = vmd->dev;
-	sd->domain = vmd_find_free_domain();
-	if (sd->domain < 0)
-		return sd->domain;
-
-	sd->node = pcibus_to_node(vmd->dev->bus);
-
 	/*
 	 * Currently MSI remapping must be enabled in guest passthrough mode
 	 * due to some missing interrupt remapping plumbing. This is probably
@@ -903,9 +880,17 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
 
+	sd->vmd_dev = vmd->dev;
+	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
+	if (sd->domain < 0)
+		return sd->domain;
+
+	sd->node = pcibus_to_node(vmd->dev->bus);
+
 	vmd->bus = pci_create_root_bus(&vmd->dev->dev, vmd->busn_start,
 				       &vmd_ops, sd, &resources);
 	if (!vmd->bus) {
+		pci_bus_release_emul_domain_nr(sd->domain);
 		pci_free_resource_list(&resources);
 		vmd_remove_irq_domain(vmd);
 		return -ENODEV;
@@ -998,6 +983,7 @@ static int vmd_probe(struct pci_dev *dev, const struct pci_device_id *id)
 		return -ENOMEM;
 
 	vmd->dev = dev;
+	vmd->sysdata.domain = PCI_DOMAIN_NR_NOT_SET;
 	vmd->instance = ida_alloc(&vmd_instance_ida, GFP_KERNEL);
 	if (vmd->instance < 0)
 		return vmd->instance;
@@ -1063,6 +1049,7 @@ static void vmd_remove(struct pci_dev *dev)
 	vmd_detach_resources(vmd);
 	vmd_remove_irq_domain(vmd);
 	ida_free(&vmd_instance_ida, vmd->instance);
+	pci_bus_release_emul_domain_nr(vmd->sysdata.domain);
 }
 
 static void vmd_shutdown(struct pci_dev *dev)
-- 
2.50.1



* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-16 16:08 ` [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms Dan Williams
@ 2025-07-17 17:25   ` Michael Kelley
  2025-07-17 19:59     ` dan.j.williams
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Kelley @ 2025-07-17 17:25 UTC (permalink / raw)
  To: Dan Williams, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Manivannan Sadhasivam,
	Rob Herring, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM
> 
[..]
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e9448d55113b..833ebf2d5213 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
>  #endif
>  }
> 
> +#ifdef CONFIG_PCI_DOMAINS
> +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> +
> +/*
> + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> + * fallback to the first free domain number above the last ACPI segment number.
> + * Caller may have a specific domain number in mind, in which case try to
> + * reserve it.
> + *
> + * Note that this allocation is freed by pci_release_host_bridge_dev().
> + */
> +int pci_bus_find_emul_domain_nr(int hint)
> +{
> +	if (hint >= 0) {
> +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> +				       GFP_KERNEL);

This almost preserves the existing functionality in pci-hyperv.c. But if the
"hint" passed in is zero, current code in pci-hyperv.c treats that as a
collision and allocates some other value. The special treatment of zero is
necessary per the comment with the definition of HVPCI_DOM_INVALID.

I don't have an opinion on whether the code here should treat a "hint"
of zero as invalid, or whether that should be handled in pci-hyperv.c.

> +
> +		if (hint >= 0)
> +			return hint;
> +	}
> +
> +	if (acpi_disabled)
> +		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> +
> +	/*
> +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> +	 * Other bits are currently reserved.
> +	 */

Back in 2018 and 2019, the Microsoft Linux team encountered problems with
PCI domain IDs that exceeded 0xFFFF. User space code, such as the Xorg X server,
assumed PCI domain IDs were at most 16 bits, and retained only the low 16 bits
if the value was larger. My memory of the details is vague, but I believe some
or all of this behavior was tied to libpciaccess. As a result of these user space
limitations, the pci-hyperv.c code made sure to not create any domain IDs
larger than 0xFFFF. The problem was not just theoretical -- Microsoft had
customers reporting issues due to the "32-bit domain ID problem" and the
pci-hyperv.c code was updated to avoid it.

I don't have information on whether user space code has been fixed, or
the extent to which such a fix has propagated into distro versions. At the
least, a VM with old user space code might break if the kernel is upgraded
to one with this patch. How do people see the risks now that it is 6 years
later? I don't have enough data to make an assessment.

Michael



* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-17 17:25   ` Michael Kelley
@ 2025-07-17 19:59     ` dan.j.williams
  2025-07-17 23:06       ` Michael Kelley
  0 siblings, 1 reply; 12+ messages in thread
From: dan.j.williams @ 2025-07-17 19:59 UTC (permalink / raw)
  To: Michael Kelley, Dan Williams, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Manivannan Sadhasivam,
	Rob Herring, K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

Michael Kelley wrote:
> From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM

Thanks for taking a look, Michael!

[..]
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e9448d55113b..833ebf2d5213 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> >  #endif
> >  }
> > 
> > +#ifdef CONFIG_PCI_DOMAINS
> > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > +
> > +/*
> > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > + * fallback to the first free domain number above the last ACPI segment number.
> > + * Caller may have a specific domain number in mind, in which case try to
> > + * reserve it.
> > + *
> > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > + */
> > +int pci_bus_find_emul_domain_nr(int hint)
> > +{
> > +	if (hint >= 0) {
> > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > +				       GFP_KERNEL);
> 
> This almost preserves the existing functionality in pci-hyperv.c. But if the
> "hint" passed in is zero, current code in pci-hyperv.c treats that as a
> collision and allocates some other value. The special treatment of zero is
> necessary per the comment with the definition of HVPCI_DOM_INVALID.
> 
> I don't have an opinion on whether the code here should treat a "hint"
> of zero as invalid, or whether that should be handled in pci-hyperv.c.

Oh, I see what you are saying. I made the "hint == 0" case start working
where previously it should have failed. I feel like that's probably best
handled in pci-hyperv.c with something like the following, which also
fixes up a regression I caused with @dom being unsigned:

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index cfe9806bdbe4..813757db98d1 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
 {
 	struct pci_host_bridge *bridge;
 	struct hv_pcibus_device *hbus;
-	u16 dom_req, dom;
+	int ret, dom = -EINVAL;
+	u16 dom_req;
 	char *name;
-	int ret;
 
 	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
 	if (!bridge)
@@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	 * collisions) in the same VM.
 	 */
 	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
-	dom = pci_bus_find_emul_domain_nr(dom_req);
+	if (dom_req)
+		dom = pci_bus_find_emul_domain_nr(dom_req);
 
 	if (dom < 0) {
 		dev_err(&hdev->device,

> > +
> > +		if (hint >= 0)
> > +			return hint;
> > +	}
> > +
> > +	if (acpi_disabled)
> > +		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > +
> > +	/*
> > +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> > +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> > +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> > +	 * Other bits are currently reserved.
> > +	 */
> 
> Back in 2018 and 2019, the Microsoft Linux team encountered problems with
> PCI domain IDs that exceeded 0xFFFF. User space code, such as the Xorg X server,
> assumed PCI domain IDs were at most 16 bits, and retained only the low 16 bits
> if the value was larger. My memory of the details is vague, but I believe some
> or all of this behavior was tied to libpciaccess. As a result of these user space
> limitations, the pci-hyperv.c code made sure to not create any domain IDs
> larger than 0xFFFF. The problem was not just theoretical -- Microsoft had
> customers reporting issues due to the "32-bit domain ID problem" and the
> pci-hyperv.c code was updated to avoid it.
> 
> I don't have information on whether user space code has been fixed, or
> the extent to which such a fix has propagated into distro versions. At the
> least, a VM with old user space code might break if the kernel is upgraded
> to one with this patch. How do people see the risks now that it is 6 years
> later? I don't have enough data to make an assessment.

A couple observations:

- I think it would be reasonable to not fall back in the hint case with
  something like this:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 833ebf2d5213..0bd2053dbe8a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
  */
 int pci_bus_find_emul_domain_nr(int hint)
 {
-	if (hint >= 0) {
-		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
+	if (hint >= 0)
+		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
 				       GFP_KERNEL);
 
-		if (hint >= 0)
-			return hint;
-	}
-
 	if (acpi_disabled)
 		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
 
- The VMD driver has been allocating 32-bit PCI domain numbers since
  v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
  Device (VMD)"). At a minimum, if it is still a problem, it is a shared
  problem, but the significant deployment of VMD in the intervening time
  likely indicates it is ok. If not, the above change at least makes the
  hyper-v case avoid 32-bit domain numbers.

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] PCI: Unify domain emulation and misc documentation update
  2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
                   ` (2 preceding siblings ...)
  2025-07-16 16:08 ` [PATCH 3/3] PCI: vmd: Switch to pci_bus_find_emul_domain_nr() Dan Williams
@ 2025-07-17 22:13 ` Bjorn Helgaas
  2025-07-18  0:26 ` dan.j.williams
  4 siblings, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2025-07-17 22:13 UTC (permalink / raw)
  To: Dan Williams
  Cc: bhelgaas, linux-pci, lukas, linux-kernel, Jonathan.Cameron,
	Dexuan Cui, Haiyang Zhang, K. Y. Srinivasan, Lorenzo Pieralisi,
	Manivannan Sadhasivam, Nirmal Patel, Rob Herring,
	Suzuki K Poulose, Wei Liu

On Wed, Jul 16, 2025 at 09:08:32AM -0700, Dan Williams wrote:
> Bjorn,
> 
> This is a small collection of miscellaneous updates that originated in
> the PCI/TSM work, but are suitable to go ahead in v6.17. It is a
> documentation update and a new pci_bus_find_emul_domain_nr() helper.
> 
> First, the PCI/TSM work (Trusted Execution Environment Security Manager
> (PCI device assignment for confidential guests)) wants to add some
> additional PCI host bridge sysfs attributes. In preparation for that,
> document what is already there.
> 
> Next, the PCI/TSM effort proposes samples/devsec/ as a reference and
> test implementation of all the TSM infrastructure. It is implemented via
> host bridge emulation and aims to be cross-architecture compatible. It
> stumbled over the current state of PCI domain number emulation being
> arch and driver specific. Remove some of that differentiation and unify
> the existing x86 host bridge emulators (hyper-v and vmd) on a common
> pci_bus_find_emul_domain_nr() helper.
> 
> Dan Williams (3):
>   PCI: Establish document for PCI host bridge sysfs attributes
>   PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
>   PCI: vmd: Switch to pci_bus_find_emul_domain_nr()
> 
>  .../ABI/testing/sysfs-devices-pci-host-bridge | 19 +++++++
>  MAINTAINERS                                   |  1 +
>  drivers/pci/controller/pci-hyperv.c           | 53 ++-----------------
>  drivers/pci/controller/vmd.c                  | 33 ++++--------
>  drivers/pci/pci.c                             | 43 ++++++++++++++-
>  drivers/pci/probe.c                           |  8 ++-
>  include/linux/pci.h                           |  4 ++
>  7 files changed, 86 insertions(+), 75 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-devices-pci-host-bridge

Seems OK to me, modulo the conversation with Michael.  Would like a
Reviewed-by from the owners of pci-hyperv.c and vmd.c, of course.

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-17 19:59     ` dan.j.williams
@ 2025-07-17 23:06       ` Michael Kelley
  2025-07-18  0:22         ` dan.j.williams
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Kelley @ 2025-07-17 23:06 UTC (permalink / raw)
  To: dan.j.williams@intel.com, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Rob Herring,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

From: dan.j.williams@intel.com <dan.j.williams@intel.com> Sent: Thursday, July 17, 2025 12:59 PM
> 
> Michael Kelley wrote:
> > From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM
> 
> Thanks for taking a look Michael!
> 
> [..]
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index e9448d55113b..833ebf2d5213 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> > >  #endif
> > >  }
> > >
> > > +#ifdef CONFIG_PCI_DOMAINS
> > > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > > +
> > > +/*
> > > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > > + * fallback to the first free domain number above the last ACPI segment number.
> > > + * Caller may have a specific domain number in mind, in which case try to
> > > + * reserve it.
> > > + *
> > > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > > + */
> > > +int pci_bus_find_emul_domain_nr(int hint)
> > > +{
> > > +	if (hint >= 0) {
> > > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > +				       GFP_KERNEL);
> >
> > This almost preserves the existing functionality in pci-hyperv.c. But if the
> > "hint" passed in is zero, current code in pci-hyperv.c treats that as a
> > collision and allocates some other value. The special treatment of zero is
> > necessary per the comment with the definition of HVPCI_DOM_INVALID.
> >
> > I don't have an opinion on whether the code here should treat a "hint"
> > of zero as invalid, or whether that should be handled in pci-hyperv.c.
> 
> Oh, I see what you are saying. I made the "hint == 0" case start working
> where previously it should have failed. I feel like that's probably best
> handled in pci-hyperv.c with something like the following which also
> fixes up a regression I caused with @dom being unsigned:
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfe9806bdbe4..813757db98d1 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
>  {
>  	struct pci_host_bridge *bridge;
>  	struct hv_pcibus_device *hbus;
> -	u16 dom_req, dom;
> +	int ret, dom = -EINVAL;
> +	u16 dom_req;
>  	char *name;
> -	int ret;
> 
>  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
>  	if (!bridge)
> @@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	 * collisions) in the same VM.
>  	 */
>  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> -	dom = pci_bus_find_emul_domain_nr(dom_req);
> +	if (dom_req)
> +		dom = pci_bus_find_emul_domain_nr(dom_req);

No, I don't think this is right either. If dom_req is 0, we don't want
hv_pci_probe() to fail. We want the "collision" path to be taken so that
some other unused PCI domain ID is assigned. That could be done by
passing -1 as the hint to pci_bus_find_emul_domain_nr(). Or PCI
domain ID 0 could be pre-reserved in init_hv_pci_drv() like is done
with HVPCI_DOM_INVALID in current code.

> 
>  	if (dom < 0) {
>  		dev_err(&hdev->device,
> 
> > > +
> > > +		if (hint >= 0)
> > > +			return hint;
> > > +	}
> > > +
> > > +	if (acpi_disabled)
> > > +		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > > +
> > > +	/*
> > > +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> > > +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> > > +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> > > +	 * Other bits are currently reserved.
> > > +	 */
> >
> > Back in 2018 and 2019, the Microsoft Linux team encountered problems with
> > PCI domain IDs that exceeded 0xFFFF. User space code, such as the Xorg X server,
> > assumed PCI domain IDs were at most 16 bits, and retained only the low 16 bits
> > if the value was larger. My memory of the details is vague, but I believe some
> > or all of this behavior was tied to libpciaccess. As a result of these user space
> > limitations, the pci-hyperv.c code made sure to not create any domain IDs
> > larger than 0xFFFF. The problem was not just theoretical -- Microsoft had
> > customers reporting issues due to the "32-bit domain ID problem" and the
> > pci-hyperv.c code was updated to avoid it.
> >
> > I don't have information on whether user space code has been fixed, or
> > the extent to which such a fix has propagated into distro versions. At the
> > least, a VM with old user space code might break if the kernel is upgraded
> > to one with this patch. How do people see the risks now that it is 6 years
> > later? I don't have enough data to make an assessment.
> 
> A couple observations:
> 
> - I think it would be reasonable to not fallback in the hint case with
>   something like this:

We *do* need the fallback in the hint case. If the hint causes a collision
(i.e., another device is already using the hinted PCI domain ID), then we
need to choose some other PCI domain ID. Again, we don't want hv_pci_probe()
to fail for the device because the value of bytes 4 and 5 chosen from device's
GUID (as assigned by Hyper-V) accidentally matches bytes 4 and 5 of some other
device's GUID. Hyper-V guarantees the GUIDs are unique, but not bytes 4 and
5 standing alone. Current code behaves like the acpi_disabled case in your
patch, and picks some other unused PCI domain ID in the 1 to 0xFFFF range.

> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 833ebf2d5213..0bd2053dbe8a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
>   */
>  int pci_bus_find_emul_domain_nr(int hint)
>  {
> -	if (hint >= 0) {
> -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> +	if (hint >= 0)
> +		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
>  				       GFP_KERNEL);
> 
> -		if (hint >= 0)
> -			return hint;
> -	}
> -
>  	if (acpi_disabled)
>  		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> 
> - The VMD driver has been allocating 32-bit PCI domain numbers since
>   v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
>   Device (VMD)"). At a minimum if it is still a problem, it is a shared
>   problem, but the significant deployment of VMD in the time likely
>   indicates it is ok. If not, the above change at least makes the
>   hyper-v case avoid 32-bit domain numbers.

The problem we encountered in 2018/2019 was with graphics devices
and the Xorg X Server, specifically with the PCI domain ID stored in
xorg.conf to identify the graphics device that the X Server was to run
against. I don't recall ever seeing a similar problem with storage or NIC
devices, but my memory could be incomplete. It's plausible that user
space code accessing the VMD device correctly handled 32-bit domain
IDs, but that's not necessarily an indicator for user space graphics
software. The Xorg X Server issues would have started somewhere after
commit 4a9b0933bdfc in the 4.11 kernel, and were finally fixed in the 5.4
kernel with commits be700103efd10 and f73f8a504e279.

All that said, I'm not personally averse to trying again in assigning a
domain ID > 0xFFFF. I do see a commit [1] to fix libpciaccess that was
made 7 years ago in response to the issues we were seeing on Hyper-V.
Assuming those fixes have propagated into using packages like X Server,
then we're good. But someone from Microsoft should probably sign off
on taking this risk. I retired from Microsoft nearly two years ago, and
meddle in things from time to time without the burden of dealing
with customer support issues. ;-)

[1] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/a167bd6474522a709ff3cbb00476c0e4309cb66f

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-17 23:06       ` Michael Kelley
@ 2025-07-18  0:22         ` dan.j.williams
  2025-07-18  3:03           ` Michael Kelley
  0 siblings, 1 reply; 12+ messages in thread
From: dan.j.williams @ 2025-07-18  0:22 UTC (permalink / raw)
  To: Michael Kelley, dan.j.williams@intel.com, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Rob Herring,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

Michael Kelley wrote:
> From: dan.j.williams@intel.com <dan.j.williams@intel.com> Sent: Thursday, July 17, 2025 12:59 PM
> > 
> > Michael Kelley wrote:
> > > From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM
> > 
> > Thanks for taking a look Michael!
> > 
> > [..]
> > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > index e9448d55113b..833ebf2d5213 100644
> > > > --- a/drivers/pci/pci.c
> > > > +++ b/drivers/pci/pci.c
> > > > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> > > >  #endif
> > > >  }
> > > >
> > > > +#ifdef CONFIG_PCI_DOMAINS
> > > > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > > > +
> > > > +/*
> > > > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > > > + * fallback to the first free domain number above the last ACPI segment number.
> > > > + * Caller may have a specific domain number in mind, in which case try to
> > > > + * reserve it.
> > > > + *
> > > > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > > > + */
> > > > +int pci_bus_find_emul_domain_nr(int hint)
> > > > +{
> > > > +	if (hint >= 0) {
> > > > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > > +				       GFP_KERNEL);
> > >
> > > This almost preserves the existing functionality in pci-hyperv.c. But if the
> > > "hint" passed in is zero, current code in pci-hyperv.c treats that as a
> > > collision and allocates some other value. The special treatment of zero is
> > > necessary per the comment with the definition of HVPCI_DOM_INVALID.
> > >
> > > I don't have an opinion on whether the code here should treat a "hint"
> > > of zero as invalid, or whether that should be handled in pci-hyperv.c.
> > 
> > Oh, I see what you are saying. I made the "hint == 0" case start working
> > where previously it should have failed. I feel like that's probably best
> > handled in pci-hyperv.c with something like the following which also
> > fixes up a regression I caused with @dom being unsigned:
> > 
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index cfe9806bdbe4..813757db98d1 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  {
> >  	struct pci_host_bridge *bridge;
> >  	struct hv_pcibus_device *hbus;
> > -	u16 dom_req, dom;
> > +	int ret, dom = -EINVAL;
> > +	u16 dom_req;
> >  	char *name;
> > -	int ret;
> > 
> >  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
> >  	if (!bridge)
> > @@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  	 * collisions) in the same VM.
> >  	 */
> >  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > -	dom = pci_bus_find_emul_domain_nr(dom_req);
> > +	if (dom_req)
> > +		dom = pci_bus_find_emul_domain_nr(dom_req);
> 
> No, I don't think this is right either. If dom_req is 0, we don't want
> hv_pci_probe() to fail. We want the "collision" path to be taken so that
> some other unused PCI domain ID is assigned. That could be done by
> passing -1 as the hint to pci_bus_find_emul_domain_nr(). Or PCI
> domain ID 0 could be pre-reserved in init_hv_pci_drv() like is done
> with HVPCI_DOM_INVALID in current code.

Yeah, I realized that shortly after sending. I will slow down.

> > 
> > A couple observations:
> > 
> > - I think it would be reasonable to not fallback in the hint case with
> >   something like this:
> 
> We *do* need the fallback in the hint case. If the hint causes a collision
> (i.e., another device is already using the hinted PCI domain ID), then we
> need to choose some other PCI domain ID. Again, we don't want hv_pci_probe()
> to fail for the device because the value of bytes 4 and 5 chosen from device's
> GUID (as assigned by Hyper-V) accidentally matches bytes 4 and 5 of some other
> device's GUID. Hyper-V guarantees the GUIDs are unique, but not bytes 4 and
> 5 standing alone. Current code behaves like the acpi_disabled case in your
> patch, and picks some other unused PCI domain ID in the 1 to 0xFFFF range.

Ok, that feels like "let the caller set the range in addition to the
hint".

> 
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 833ebf2d5213..0bd2053dbe8a 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> >   */
> >  int pci_bus_find_emul_domain_nr(int hint)
> >  {
> > -	if (hint >= 0) {
> > -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > +	if (hint >= 0)
> > +		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> >  				       GFP_KERNEL);
> > 
> > -		if (hint >= 0)
> > -			return hint;
> > -	}
> > -
> >  	if (acpi_disabled)
> >  		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > 
> > - The VMD driver has been allocating 32-bit PCI domain numbers since
> >   v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
> >   Device (VMD)"). At a minimum, if it is still a problem, it is a shared
> >   problem, but the significant deployment of VMD in the intervening time
> >   likely indicates it is ok. If not, the above change at least makes the
> >   hyper-v case avoid 32-bit domain numbers.
> 
> The problem we encountered in 2018/2019 was with graphics devices
> and the Xorg X Server, specifically with the PCI domain ID stored in
> xorg.conf to identify the graphics device that the X Server was to run
> against. I don't recall ever seeing a similar problem with storage or NIC
> devices, but my memory could be incomplete. It's plausible that user
> space code accessing the VMD device correctly handled 32-bit domain
> IDs, but that's not necessarily an indicator for user space graphics
> software. The Xorg X Server issues would have started somewhere after
> commit 4a9b0933bdfc in the 4.11 kernel, and were finally fixed in the 5.4
> kernel with commits be700103efd10 and f73f8a504e279.
> 
> All that said, I'm not personally averse to trying again in assigning a
> domain ID > 0xFFFF. I do see a commit [1] to fix libpciaccess that was
> made 7 years ago in response to the issues we were seeing on Hyper-V.
> Assuming those fixes have propagated into using packages like X Server,
> then we're good. But someone from Microsoft should probably sign off
> on taking this risk. I retired from Microsoft nearly two years ago, and
> meddle in things from time-to-time without the burden of dealing
> with customer support issues. ;-)

Living the dream! Extra thanks for taking a look.

> [1] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/a167bd6474522a709ff3cbb00476c0e4309cb66f

Thanks for this.

I would rather do the equivalent conversion for now because a 7-year-old
fix is right on the cusp of "someone might still be running that with new
kernels".

Here is the replacement fixup that I will fold in if it looks good to
you:

-- 8< --
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index cfe9806bdbe4..f1079a438bff 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
 {
 	struct pci_host_bridge *bridge;
 	struct hv_pcibus_device *hbus;
-	u16 dom_req, dom;
+	int ret, dom;
+	u16 dom_req;
 	char *name;
-	int ret;
 
 	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
 	if (!bridge)
@@ -3673,8 +3673,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 	 * collisions) in the same VM.
 	 */
 	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
-	dom = pci_bus_find_emul_domain_nr(dom_req);
-
+	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);
 	if (dom < 0) {
 		dev_err(&hdev->device,
 			"Unable to use dom# 0x%x or other numbers", dom_req);
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index f60244ff9ef8..30935fe85af9 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -881,7 +881,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
 
 	sd->vmd_dev = vmd->dev;
-	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
+
+	/*
+	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
+	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
+	 * which the lower 16 bits are the PCI Segment Group (domain) number.
+	 * Other bits are currently reserved.
+	 */
+	sd->domain = pci_bus_find_emul_domain_nr(0, 0x10000, INT_MAX);
 	if (sd->domain < 0)
 		return sd->domain;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 833ebf2d5213..de42e53f07d0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6695,34 +6695,15 @@ static void pci_no_domains(void)
 #ifdef CONFIG_PCI_DOMAINS
 static DEFINE_IDA(pci_domain_nr_dynamic_ida);
 
-/*
- * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
- * fallback to the first free domain number above the last ACPI segment number.
- * Caller may have a specific domain number in mind, in which case try to
- * reserve it.
- *
- * Note that this allocation is freed by pci_release_host_bridge_dev().
+/**
+ * pci_bus_find_emul_domain_nr() - allocate a PCI domain number per constraints
+ * @hint: desired domain, 0 if any id in the range of @min to @max is acceptable
+ * @min: minimum allowable domain
+ * @max: maximum allowable domain, no ids higher than INT_MAX will be returned
  */
-int pci_bus_find_emul_domain_nr(int hint)
+u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
 {
-	if (hint >= 0) {
-		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
-				       GFP_KERNEL);
-
-		if (hint >= 0)
-			return hint;
-	}
-
-	if (acpi_disabled)
-		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
-
-	/*
-	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
-	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
-	 * which the lower 16 bits are the PCI Segment Group (domain) number.
-	 * Other bits are currently reserved.
-	 */
-	return ida_alloc_range(&pci_domain_nr_dynamic_ida, 0x10000, INT_MAX,
+	return ida_alloc_range(&pci_domain_nr_dynamic_ida, max(hint, min), max,
 			       GFP_KERNEL);
 }
 EXPORT_SYMBOL_GPL(pci_bus_find_emul_domain_nr);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f6a713da5c49..4aeabe8e2f1f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1934,13 +1934,16 @@ DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
  */
 #ifdef CONFIG_PCI_DOMAINS
 extern int pci_domains_supported;
-int pci_bus_find_emul_domain_nr(int hint);
+u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max);
 void pci_bus_release_emul_domain_nr(int domain_nr);
 #else
 enum { pci_domains_supported = 0 };
 static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
 static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
-static inline int pci_bus_find_emul_domain_nr(int hint) { return 0; }
+static inline u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
+{
+	return 0;
+}
 static inline void pci_bus_release_emul_domain_nr(int domain_nr) { }
 #endif /* CONFIG_PCI_DOMAINS */
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] PCI: Unify domain emulation and misc documentation update
  2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
                   ` (3 preceding siblings ...)
  2025-07-17 22:13 ` [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Bjorn Helgaas
@ 2025-07-18  0:26 ` dan.j.williams
  4 siblings, 0 replies; 12+ messages in thread
From: dan.j.williams @ 2025-07-18  0:26 UTC (permalink / raw)
  To: Dan Williams, bhelgaas
  Cc: linux-pci, lukas, linux-kernel, Jonathan.Cameron, Dexuan Cui,
	Haiyang Zhang, K. Y. Srinivasan, Lorenzo Pieralisi,
	Manivannan Sadhasivam, Nirmal Patel, Rob Herring,
	Suzuki K Poulose, Wei Liu, szymon.durawa

[ add Szymon for extra eyes on the VMD change ]

Szymon, see the proposed fixup in the discussion with Michael:

http://lore.kernel.org/687993e03bb4c_14c3561008c@dwillia2-xfh.jf.intel.com.notmuch

Dan Williams wrote:
> Bjorn,
> 
> This is a small collection of miscellaneous updates that originated in
> the PCI/TSM work, but are suitable to go ahead in v6.17. It is a
> documentation update and a new pci_bus_find_emul_domain_nr() helper.
> 
> First, the PCI/TSM work (Trusted Execution Environment Security Manager
> (PCI device assignment for confidential guests)) wants to add some
> additional PCI host bridge sysfs attributes. In preparation for that,
> document what is already there.
> 
> Next, the PCI/TSM effort proposes samples/devsec/ as a reference and
> test implementation of all the TSM infrastructure. It is implemented via
> host bridge emulation and aims to be cross-architecture compatible. It
> stumbled over the current state of PCI domain number emulation being
> arch and driver specific. Remove some of that differentiation and unify
> the existing x86 host bridge emulators (hyper-v and vmd) on a common
> pci_bus_find_emul_domain_nr() helper.
> 
> Dan Williams (3):
>   PCI: Establish document for PCI host bridge sysfs attributes
>   PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
>   PCI: vmd: Switch to pci_bus_find_emul_domain_nr()
> 
>  .../ABI/testing/sysfs-devices-pci-host-bridge | 19 +++++++
>  MAINTAINERS                                   |  1 +
>  drivers/pci/controller/pci-hyperv.c           | 53 ++-----------------
>  drivers/pci/controller/vmd.c                  | 33 ++++--------
>  drivers/pci/pci.c                             | 43 ++++++++++++++-
>  drivers/pci/probe.c                           |  8 ++-
>  include/linux/pci.h                           |  4 ++
>  7 files changed, 86 insertions(+), 75 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-devices-pci-host-bridge
> 
> 
> base-commit: e04c78d86a9699d136910cfc0bdcf01087e3267e
> -- 
> 2.50.1
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-18  0:22         ` dan.j.williams
@ 2025-07-18  3:03           ` Michael Kelley
  2025-07-18 19:17             ` dan.j.williams
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Kelley @ 2025-07-18  3:03 UTC (permalink / raw)
  To: dan.j.williams@intel.com, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Rob Herring,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

From: dan.j.williams@intel.com <dan.j.williams@intel.com> Sent: Thursday, July 17, 2025 5:23 PM
> 
> Michael Kelley wrote:
> > From: dan.j.williams@intel.com <dan.j.williams@intel.com> Sent: Thursday, July 17, 2025 12:59 PM
> > >
> > > Michael Kelley wrote:
> > > > From: Dan Williams <dan.j.williams@intel.com> Sent: Wednesday, July 16, 2025 9:09 AM
> > >
> > > Thanks for taking a look Michael!
> > >
> > > [..]
> > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > > > index e9448d55113b..833ebf2d5213 100644
> > > > > --- a/drivers/pci/pci.c
> > > > > +++ b/drivers/pci/pci.c
> > > > > @@ -6692,9 +6692,50 @@ static void pci_no_domains(void)
> > > > >  #endif
> > > > >  }
> > > > >
> > > > > +#ifdef CONFIG_PCI_DOMAINS
> > > > > +static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > > > > +
> > > > > +/*
> > > > > + * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > > > > + * fallback to the first free domain number above the last ACPI segment number.
> > > > > + * Caller may have a specific domain number in mind, in which case try to
> > > > > + * reserve it.
> > > > > + *
> > > > > + * Note that this allocation is freed by pci_release_host_bridge_dev().
> > > > > + */
> > > > > +int pci_bus_find_emul_domain_nr(int hint)
> > > > > +{
> > > > > +	if (hint >= 0) {
> > > > > +		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > > > +				       GFP_KERNEL);
> > > >
> > > > This almost preserves the existing functionality in pci-hyperv.c. But if the
> > > > "hint" passed in is zero, current code in pci-hyperv.c treats that as a
> > > > collision and allocates some other value. The special treatment of zero is
> > > > necessary per the comment with the definition of HVPCI_DOM_INVALID.
> > > >
> > > > I don't have an opinion on whether the code here should treat a "hint"
> > > > of zero as invalid, or whether that should be handled in pci-hyperv.c.
> > >
> > > Oh, I see what you are saying. I made the "hint == 0" case start working
> > > where previously it should have failed. I feel like that's probably best
> > > handled in pci-hyperv.c with something like the following which also
> > > fixes up a regression I caused with @dom being unsigned:
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > > index cfe9806bdbe4..813757db98d1 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  {
> > >  	struct pci_host_bridge *bridge;
> > >  	struct hv_pcibus_device *hbus;
> > > -	u16 dom_req, dom;
> > > +	int ret, dom = -EINVAL;
> > > +	u16 dom_req;
> > >  	char *name;
> > > -	int ret;
> > >
> > >  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
> > >  	if (!bridge)
> > > @@ -3673,7 +3673,8 @@ static int hv_pci_probe(struct hv_device *hdev,
> > >  	 * collisions) in the same VM.
> > >  	 */
> > >  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > > -	dom = pci_bus_find_emul_domain_nr(dom_req);
> > > +	if (dom_req)
> > > +		dom = pci_bus_find_emul_domain_nr(dom_req);
> >
> > No, I don't think this is right either. If dom_req is 0, we don't want to
> > hv_pci_probe() to fail. We want the "collision" path to be taken so that
> > some other unused PCI domain ID is assigned. That could be done by
> > passing -1 as the hint to pci_bus_find_emul_domain_nr(). Or PCI
> > domain ID 0 could be pre-reserved in init_hv_pci_drv() like is done
> > with HVPCI_DOM_INVALID in current code.
> 
> Yeah, I realized that shortly after sending. I will slow down.
> 
> > >
> > > A couple observations:
> > >
> > > - I think it would be reasonable to not fallback in the hint case with
> > >   something like this:
> >
> > We *do* need the fallback in the hint case. If the hint causes a collision
> > (i.e., another device is already using the hinted PCI domain ID), then we
> > need to choose some other PCI domain ID. Again, we don't want hv_pci_probe()
> > to fail for the device because the value of bytes 4 and 5 chosen from device's
> > GUID (as assigned by Hyper-V) accidentally matches bytes 4 and 5 of some other
> > device's GUID. Hyper-V guarantees the GUIDs are unique, but not bytes 4 and
> > 5 standing alone. Current code behaves like the acpi_disabled case in your
> > patch, and picks some other unused PCI domain ID in the 1 to 0xFFFF range.
> 
> Ok, that feels like "let the caller set the range in addition to the
> hint".
> 
> >
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 833ebf2d5213..0bd2053dbe8a 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -6705,14 +6705,10 @@ static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > >   */
> > >  int pci_bus_find_emul_domain_nr(int hint)
> > >  {
> > > -	if (hint >= 0) {
> > > -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > > +	if (hint >= 0)
> > > +		return ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> > >  				       GFP_KERNEL);
> > >
> > > -		if (hint >= 0)
> > > -			return hint;
> > > -	}
> > > -
> > >  	if (acpi_disabled)
> > >  		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> > >
> > > - The VMD driver has been allocating 32-bit PCI domain numbers since
> > >   v4.5 185a383ada2e ("x86/PCI: Add driver for Intel Volume Management
> > >   Device (VMD)"). At a minimum, if it is still a problem, it is a shared
> > >   problem, but the significant deployment of VMD in the intervening time
> > >   likely indicates it is ok. If not, the above change at least makes the
> > >   hyper-v case avoid 32-bit domain numbers.
> >
> > The problem we encountered in 2018/2019 was with graphics devices
> > and the Xorg X Server, specifically with the PCI domain ID stored in
> > xorg.conf to identify the graphics device that the X Server was to run
> > against. I don't recall ever seeing a similar problem with storage or NIC
> > devices, but my memory could be incomplete. It's plausible that user
> > space code accessing the VMD device correctly handled 32-bit domain
> > IDs, but that's not necessarily an indicator for user space graphics
> > software. The Xorg X Server issues would have started somewhere after
> > commit 4a9b0933bdfc in the 4.11 kernel, and were finally fixed in the 5.4
> > kernel with commits be700103efd10 and f73f8a504e279.
> >
> > All that said, I'm not personally averse to trying again in assigning a
> > domain ID > 0xFFFF. I do see a commit [1] to fix libpciaccess that was
> > made 7 years ago in response to the issues we were seeing on Hyper-V.
> > Assuming those fixes have propagated into using packages like X Server,
> > then we're good. But someone from Microsoft should probably sign off
> > on taking this risk. I retired from Microsoft nearly two years ago, and
> > meddle in things from time to time without the burden of dealing
> > with customer support issues. ;-)
> 
> Living the dream! Extra thanks for taking a look.
> 
> > [1] https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/commit/a167bd6474522a709ff3cbb00476c0e4309cb66f
> 
> Thanks for this.
> 
> I would rather do the equivalent conversion for now because 7 years old
> is right on the cusp of "someone might still be running that with new
> kernels".

Works for me, and is a bit less risky.

> 
> Here is the replacement fixup that I will fold in if it looks good to
> you:
> 
> -- 8< --
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index cfe9806bdbe4..f1079a438bff 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
>  {
>  	struct pci_host_bridge *bridge;
>  	struct hv_pcibus_device *hbus;
> -	u16 dom_req, dom;
> +	int ret, dom;
> +	u16 dom_req;
>  	char *name;
> -	int ret;
> 
>  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
>  	if (!bridge)
> @@ -3673,8 +3673,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	 * collisions) in the same VM.
>  	 */
>  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> -	dom = pci_bus_find_emul_domain_nr(dom_req);
> -

As an additional paragraph in the larger comment block above, let's include a
massaged version of the comment associated with HVPCI_DOM_INVALID.
Perhaps:

 *
 * Because Gen1 VMs use domain 0, don't allow picking domain 0 here, even
 * if bytes 4 and 5 of the instance GUID are both zero.
 */

> +	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);
>  	if (dom < 0) {
>  		dev_err(&hdev->device,
>  			"Unable to use dom# 0x%x or other numbers", dom_req);
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index f60244ff9ef8..30935fe85af9 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -881,7 +881,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
> 
>  	sd->vmd_dev = vmd->dev;
> -	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
> +
> +	/*
> +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> +	 * Other bits are currently reserved.
> +	 */
> +	sd->domain = pci_bus_find_emul_domain_nr(0, 0x10000, INT_MAX);
>  	if (sd->domain < 0)
>  		return sd->domain;
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 833ebf2d5213..de42e53f07d0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -6695,34 +6695,15 @@ static void pci_no_domains(void)
>  #ifdef CONFIG_PCI_DOMAINS
>  static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> 
> -/*
> - * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> - * fallback to the first free domain number above the last ACPI segment number.
> - * Caller may have a specific domain number in mind, in which case try to
> - * reserve it.
> - *
> - * Note that this allocation is freed by pci_release_host_bridge_dev().
> +/**
> + * pci_bus_find_emul_domain_nr() - allocate a PCI domain number per constraints
> + * @hint: desired domain, 0 if any id in the range of @min to @max is acceptable
> + * @min: minimum allowable domain
> + * @max: maximum allowable domain, no ids higher than INT_MAX will be returned
>   */
> -int pci_bus_find_emul_domain_nr(int hint)
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)

Shouldn't the return type here still be "int"?  ida_alloc_range() can return a negative
errno if it fails. And the call sites in hv_pci_probe() and vmd_enable_domain()
store the return value into an "int".

Other than that, and my suggested added comment, this looks good.

Michael

>  {
> -	if (hint >= 0) {
> -		hint = ida_alloc_range(&pci_domain_nr_dynamic_ida, hint, hint,
> -				       GFP_KERNEL);
> -
> -		if (hint >= 0)
> -			return hint;
> -	}
> -
> -	if (acpi_disabled)
> -		return ida_alloc(&pci_domain_nr_dynamic_ida, GFP_KERNEL);
> -
> -	/*
> -	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> -	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> -	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> -	 * Other bits are currently reserved.
> -	 */
> -	return ida_alloc_range(&pci_domain_nr_dynamic_ida, 0x10000, INT_MAX,
> +	return ida_alloc_range(&pci_domain_nr_dynamic_ida, max(hint, min), max,
>  			       GFP_KERNEL);
>  }
>  EXPORT_SYMBOL_GPL(pci_bus_find_emul_domain_nr);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f6a713da5c49..4aeabe8e2f1f 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1934,13 +1934,16 @@ DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
>   */
>  #ifdef CONFIG_PCI_DOMAINS
>  extern int pci_domains_supported;
> -int pci_bus_find_emul_domain_nr(int hint);
> +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max);
>  void pci_bus_release_emul_domain_nr(int domain_nr);
>  #else
>  enum { pci_domains_supported = 0 };
>  static inline int pci_domain_nr(struct pci_bus *bus) { return 0; }
>  static inline int pci_proc_domain(struct pci_bus *bus) { return 0; }
> -static inline int pci_bus_find_emul_domain_nr(int hint) { return 0; }
> +static inline u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
> +{
> +	return 0;
> +}
>  static inline void pci_bus_release_emul_domain_nr(int domain_nr) { }
>  #endif /* CONFIG_PCI_DOMAINS */
> 


* RE: [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
  2025-07-18  3:03           ` Michael Kelley
@ 2025-07-18 19:17             ` dan.j.williams
  0 siblings, 0 replies; 12+ messages in thread
From: dan.j.williams @ 2025-07-18 19:17 UTC (permalink / raw)
  To: Michael Kelley, dan.j.williams@intel.com, bhelgaas@google.com
  Cc: linux-pci@vger.kernel.org, lukas@wunner.de,
	linux-kernel@vger.kernel.org, Jonathan.Cameron@huawei.com,
	Suzuki K Poulose, Lorenzo Pieralisi, Rob Herring,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui,
	open list:Hyper-V/Azure CORE AND DRIVERS

Michael Kelley wrote:
[..]
> > Here is the replacement fixup that I will fold in if it looks good to
> > you:
> > 
> > -- 8< --
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index cfe9806bdbe4..f1079a438bff 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -3642,9 +3642,9 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  {
> >  	struct pci_host_bridge *bridge;
> >  	struct hv_pcibus_device *hbus;
> > -	u16 dom_req, dom;
> > +	int ret, dom;
> > +	u16 dom_req;
> >  	char *name;
> > -	int ret;
> > 
> >  	bridge = devm_pci_alloc_host_bridge(&hdev->device, 0);
> >  	if (!bridge)
> > @@ -3673,8 +3673,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  	 * collisions) in the same VM.
> >  	 */
> >  	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > -	dom = pci_bus_find_emul_domain_nr(dom_req);
> > -
> 
> As an additional paragraph in the larger comment block above, let's include a
> massaged version of the comment associated with HVPCI_DOM_INVALID.
> Perhaps:
> 
>  *
>  * Because Gen1 VMs use domain 0, don't allow picking domain 0 here, even
>  * if bytes 4 and 5 of the instance GUID are both zero.
>  */

That looks good to me.

> 
> > +	dom = pci_bus_find_emul_domain_nr(dom_req, 1, U16_MAX);
> >  	if (dom < 0) {
> >  		dev_err(&hdev->device,
> >  			"Unable to use dom# 0x%x or other numbers", dom_req);
> > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > index f60244ff9ef8..30935fe85af9 100644
> > --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -881,7 +881,14 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
> >  	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
> > 
> >  	sd->vmd_dev = vmd->dev;
> > -	sd->domain = pci_bus_find_emul_domain_nr(PCI_DOMAIN_NR_NOT_SET);
> > +
> > +	/*
> > +	 * Emulated domains start at 0x10000 to not clash with ACPI _SEG
> > +	 * domains.  Per ACPI r6.0, sec 6.5.6,  _SEG returns an integer, of
> > +	 * which the lower 16 bits are the PCI Segment Group (domain) number.
> > +	 * Other bits are currently reserved.
> > +	 */
> > +	sd->domain = pci_bus_find_emul_domain_nr(0, 0x10000, INT_MAX);
> >  	if (sd->domain < 0)
> >  		return sd->domain;
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 833ebf2d5213..de42e53f07d0 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -6695,34 +6695,15 @@ static void pci_no_domains(void)
> >  #ifdef CONFIG_PCI_DOMAINS
> >  static DEFINE_IDA(pci_domain_nr_dynamic_ida);
> > 
> > -/*
> > - * Find a free domain_nr either allocated by pci_domain_nr_dynamic_ida or
> > - * fallback to the first free domain number above the last ACPI segment number.
> > - * Caller may have a specific domain number in mind, in which case try to
> > - * reserve it.
> > - *
> > - * Note that this allocation is freed by pci_release_host_bridge_dev().
> > +/**
> > + * pci_bus_find_emul_domain_nr() - allocate a PCI domain number per constraints
> > + * @hint: desired domain, 0 if any id in the range of @min to @max is acceptable
> > + * @min: minimum allowable domain
> > + * @max: maximum allowable domain, no ids higher than INT_MAX will be returned
> >   */
> > -int pci_bus_find_emul_domain_nr(int hint)
> > +u32 pci_bus_find_emul_domain_nr(u32 hint, u32 min, u32 max)
> 
> Shouldn't the return type here still be "int"?  ida_alloc_range() can return a negative
> errno if it fails. And the call sites in hv_pci_probe() and vmd_enable_domain()
> store the return value into an "int".

Yup, good catch.

> Other than that, and my suggested added comment, this looks good.

Thanks!


end of thread, other threads:[~2025-07-18 19:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed;
links below jump to the message on this page)
2025-07-16 16:08 [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Dan Williams
2025-07-16 16:08 ` [PATCH 1/3] PCI: Establish document for PCI host bridge sysfs attributes Dan Williams
2025-07-16 16:08 ` [PATCH 2/3] PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms Dan Williams
2025-07-17 17:25   ` Michael Kelley
2025-07-17 19:59     ` dan.j.williams
2025-07-17 23:06       ` Michael Kelley
2025-07-18  0:22         ` dan.j.williams
2025-07-18  3:03           ` Michael Kelley
2025-07-18 19:17             ` dan.j.williams
2025-07-16 16:08 ` [PATCH 3/3] PCI: vmd: Switch to pci_bus_find_emul_domain_nr() Dan Williams
2025-07-17 22:13 ` [PATCH 0/3] PCI: Unify domain emulation and misc documentation update Bjorn Helgaas
2025-07-18  0:26 ` dan.j.williams
