Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: David Miller @ 2019-08-09  1:13 UTC (permalink / raw)
  To: decui
  Cc: netdev, haiyangz, sthemmin, sashal, kys, mikelley, linux-hyperv,
	linux-kernel, olaf, apw, jasowang, vkuznets, marcelo.cerri
In-Reply-To: <PU1P153MB0169AECABF6094A3E7BEE381BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Tue, 6 Aug 2019 05:17:44 +0000

> 
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Please resend with appropriate fixes tag.

^ permalink raw reply

* RE: [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-08-08 20:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <20190808191913.GI151852@google.com>

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, August 8, 2019 12:19 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Thu, Aug 08, 2019 at 06:46:51PM +0000, Dexuan Cui wrote:
> >
> > In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> > In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> > but the current code misses the pci_legacy_resume_early() path, so the
> > state remains in PCI_UNKNOWN in that path, and during hiberantion this
> > causes an error for the Mellanox VF driver, which fails to enable
> > MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:
> 
> s/hiberantion/hibernation/

Thanks for spoting this typo! :-)
 
> Actually, it sounds more like "during *resume*, this causes an error",
> so maybe you want s/hiberantion/resume/ instead?

Yes, it's during "resume", and to be more accurate, it happens during the
"resume" of the "thaw" phase: when we run "echo disk > /sys/power/state",
first the kernel "freezes" all the devices and create a hibernation image, then
the kernel "thaws" the devices including the disk/NIC, and writes the memory
to the disk and powers down. This patch fixes the error message for the
Mellanox VF driver in this phase. 

When the system starts again, a fresh kernel starts to run, and when the kernel
detects that a hibernation image was saved, the kernel "quiesces" the devices,
and "restores" the devices from the saved image. Here device_resume_noirq()
-> ... -> pci_pm_restore_noirq() -> pci_pm_default_resume_early() ->
pci_power_up() moves the device states back to PCI_D0. This path is not broken
and doesn't need my patch. 

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Bjorn Helgaas @ 2019-08-08 19:19 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <PU1P153MB01695867B01987A8C239A8CCBFD70@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Aug 08, 2019 at 06:46:51PM +0000, Dexuan Cui wrote:
> 
> In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> but the current code misses the pci_legacy_resume_early() path, so the
> state remains in PCI_UNKNOWN in that path, and during hiberantion this
> causes an error for the Mellanox VF driver, which fails to enable
> MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:

s/hiberantion/hibernation/

Actually, it sounds more like "during *resume*, this causes an error",
so maybe you want s/hiberantion/resume/ instead?

> mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> mlx4_core a6d1:00:02.0: Sending reset
> mlx4_core a6d1:00:02.0: Sending vhcr0
> mlx4_core a6d1:00:02.0: HCA minimum page size:512
> mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/pci/pci-driver.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 36dbe960306b..27dfc68db9e7 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
>  
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
>  	 * pci_restore_state() requires the device to be in D0 (because of MSI
>  	 * restoration among other things), so force it into D0 in case the
>  	 * driver's "freeze" callbacks put it into a low-power state directly.
>  	 */
>  	pci_set_power_state(pci_dev, PCI_D0);
> +
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	pci_restore_state(pci_dev);
>  
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> -- 
> 2.19.1
> 

^ permalink raw reply

* [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-08-08 18:46 UTC (permalink / raw)
  To: lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org
  Cc: Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com, Dexuan Cui


In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
but the current code misses the pci_legacy_resume_early() path, so the
state remains in PCI_UNKNOWN in that path, and during hiberantion this
causes an error for the Mellanox VF driver, which fails to enable
MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:

mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
mlx4_core a6d1:00:02.0: Sending reset
mlx4_core a6d1:00:02.0: Sending vhcr0
mlx4_core a6d1:00:02.0: HCA minimum page size:512
mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
PM: Device a6d1:00:02.0 failed to thaw: error -95

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/pci/pci-driver.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 36dbe960306b..27dfc68db9e7 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
 			return error;
 	}
 
-	if (pci_has_legacy_pm_support(pci_dev))
-		return pci_legacy_resume_early(dev);
-
 	/*
 	 * pci_restore_state() requires the device to be in D0 (because of MSI
 	 * restoration among other things), so force it into D0 in case the
 	 * driver's "freeze" callbacks put it into a low-power state directly.
 	 */
 	pci_set_power_state(pci_dev, PCI_D0);
+
+	if (pci_has_legacy_pm_support(pci_dev))
+		return pci_legacy_resume_early(dev);
+
 	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)
-- 
2.19.1


^ permalink raw reply related

* RE: [PATCH v2 6/7] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
From: Dexuan Cui @ 2019-08-07 23:09 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-7-git-send-email-decui@microsoft.com>

> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, July 31, 2019 10:52 AM
> To: linux-hyperv@vger.kernel.org; gregkh@linuxfoundation.org; Stephen
> @@ -2050,6 +2095,10 @@ static int vmbus_acpi_add(struct acpi_device
> *device)
>  };
>  MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);
> 
> +static const struct dev_pm_ops vmbus_bus_pm = {
> +	SET_SYSTEM_SLEEP_PM_OPS(vmbus_bus_suspend, vmbus_bus_resume)
> +};

It turns out this " SET_SYSTEM_SLEEP_PM_OPS" should be changed to
"SET_NOIRQ_SYSTEM_SLEEP_PM_OPS" to make NIC SR-IOV work with
hibernation.

This is because the "pci_dev_pm_ops" uses the "noirq" callbacks. 
In the resume path, the "noirq" restore callback runs before the non-noirq
callbacks, meaning the vmbus_bus_resume() and the pci-hyperv's .resume()
must also run via the "noirq" callbacks.

Similarly, I also need to make the same change in this patch:
[PATCH v2 7/7] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation

BTW, I'd like to change the print_hex_dump_debug() to print_hex_dump() in this patch:
[PATCH v2 5/7] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation

print_hex_dump_debug() outputs nothing unless we enable the dyndbg. 
This is not good as we do want see the error messages here, if there is an error.

I'll do more tests and post a v3 of the patchset.

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v2 6/7] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
From: Michael Kelley @ 2019-08-07 15:22 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-7-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Wednesday, July 31, 2019 10:52 AM
> 
> Before Linux enters hibernation, it sends the CHANNELMSG_UNLOAD message to
> the host so all the offers are gone. After hibernation, Linux needs to
> re-negotiate with the host using the same vmbus protocol version (which
> was in use before hibernation), and ask the host to re-offer the vmbus
> devices.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/connection.c   |  3 +--
>  drivers/hv/hyperv_vmbus.h |  2 ++
>  drivers/hv/vmbus_drv.c    | 50
> +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+), 2 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2 5/7] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
From: Michael Kelley @ 2019-08-07 15:22 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-6-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com>  Sent: Wednesday, July 31, 2019 10:52 AM
> 
> When the VM resumes, the host re-sends the offers. We should not add the
> offers to the global vmbus_connection.chn_list again.
> 
> Added some debug code, in case the host screws up the exact info related to
> the offers.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2 4/7] Drivers: hv: vmbus: Suspend/resume the synic for hibernation
From: Michael Kelley @ 2019-08-07 15:21 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-5-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Wednesday, July 31, 2019 10:52 AM
> 
> This is needed when we resume the old kernel from the "current" kernel.
> 
> Note: when hv_synic_suspend() and hv_synic_resume() run, all the
> non-boot CPUs have been offlined, and interrupts are disabled on CPU0.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-07  6:56 UTC (permalink / raw)
  To: Jakub Kicinski, Stephen Hemminger
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com
In-Reply-To: <20190806121242.141c2324@cakuba.netronome.com>

> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Tuesday, August 6, 2019 12:13 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> > This fixes a warning of "suspicious rcu_dereference_check() usage"
> > when nload runs.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> Minor change in behaviour would perhaps be worth acknowledging in the
> commit message (since you check ndev for NULL later now), and a Fixes
> tag would be good.
> 
> But the looks pretty straightforward and correct!

Hi,
Yeah, it looks the minor behavior change doesn't matter, because IMO the 
'nvdev' can only be NULL when the NIC is being removed, or the MTU is
being changed, etc.

The Fixes tag is:
Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")

If I should send a v2, please let me know.

Thanks,
-- Dexuan


^ permalink raw reply

* [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-06 23:52 UTC (permalink / raw)
  To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, linux-kernel@vger.kernel.org

Currently in Azure cloud, for passthrough devices including GPU, the
host sets the device instance ID's bytes 8 - 15 to a value derived from
the host HWID, which is the same on all devices in a VM. So, the device
instance ID's bytes 8 and 9 provided by the host are no longer unique.

This can cause device passthrough to VMs to fail because the bytes 8 and
9 is used as PCI domain number. So, as recommended by Azure host team,
we now use the bytes 4 and 5 which usually contain unique numbers as PCI
domain. The chance of collision is greatly reduced. In the rare cases of
collision, we will detect and find another number that is not in use.

Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
 1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..4f3d97e 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
 		complete(&hbus->remove_event);
 }
 
+#define HVPCI_DOM_MAP_SIZE (64 * 1024)
+static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
+
+/*
+ * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
+ * as invalid for passthrough PCI devices of this driver.
+ */
+#define HVPCI_DOM_INVALID 0
+
+/**
+ * hv_get_dom_num() - Get a valid PCI domain number
+ * Check if the PCI domain number is in use, and return another number if
+ * it is in use.
+ *
+ * @dom: Requested domain number
+ *
+ * return: domain number on success, HVPCI_DOM_INVALID on failure
+ */
+static u16 hv_get_dom_num(u16 dom)
+{
+	unsigned int i;
+
+	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
+		return dom;
+
+	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
+		if (test_and_set_bit(i, hvpci_dom_map) == 0)
+			return i;
+	}
+
+	return HVPCI_DOM_INVALID;
+}
+
+/**
+ * hv_put_dom_num() - Mark the PCI domain number as free
+ * @dom: Domain number to be freed
+ */
+static void hv_put_dom_num(u16 dom)
+{
+	clear_bit(dom, hvpci_dom_map);
+}
+
 /**
  * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
  * @hdev:	VMBus's tracking struct for this root PCI bus
@@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 			const struct hv_vmbus_device_id *dev_id)
 {
 	struct hv_pcibus_device *hbus;
+	u16 dom_req, dom;
 	int ret;
 
 	/*
@@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus->state = hv_pcibus_init;
 
 	/*
-	 * The PCI bus "domain" is what is called "segment" in ACPI and
-	 * other specs.  Pull it from the instance ID, to get something
-	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
-	 * do the same thing for consistency.  Note that, since this code
-	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
-	 * that (1) the only domain in use for something that looks like
-	 * a physical PCI bus (which is actually emulated by the
-	 * hypervisor) is domain 0 and (2) there will be no overlap
-	 * between domains derived from these instance IDs in the same
-	 * VM.
+	 * The PCI bus "domain" is what is called "segment" in ACPI and other
+	 * specs. Pull it from the instance ID, to get something usually
+	 * unique. In rare cases of collision, we will find out another number
+	 * not in use.
+	 *
+	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
+	 * together with this guest driver can guarantee that (1) The only
+	 * domain used by Gen1 VMs for something that looks like a physical
+	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
+	 * (2) There will be no overlap between domains (after fixing possible
+	 * collisions) in the same VM.
 	 */
-	hbus->sysdata.domain = hdev->dev_instance.b[9] |
-			       hdev->dev_instance.b[8] << 8;
+	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
+	dom = hv_get_dom_num(dom_req);
+
+	if (dom == HVPCI_DOM_INVALID) {
+		dev_err(&hdev->device,
+			"Unable to use dom# 0x%hx or other numbers", dom_req);
+		ret = -EINVAL;
+		goto free_bus;
+	}
+
+	if (dom != dom_req)
+		dev_info(&hdev->device,
+			 "PCI dom# 0x%hx has collision, using 0x%hx",
+			 dom_req, dom);
+
+	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
 	refcount_set(&hbus->remove_lock, 1);
@@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
 		ret = -ENOMEM;
-		goto free_bus;
+		goto free_dom;
 	}
 
 	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
@@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	vmbus_close(hdev->channel);
 destroy_wq:
 	destroy_workqueue(hbus->wq);
+free_dom:
+	hv_put_dom_num(hbus->sysdata.domain);
 free_bus:
 	free_page((unsigned long)hbus);
 	return ret;
@@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
 	destroy_workqueue(hbus->wq);
+
+	hv_put_dom_num(hbus->sysdata.domain);
+
 	free_page((unsigned long)hbus);
 	return 0;
 }
@@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
 
 static int __init init_hv_pci_drv(void)
 {
+	/* Set the invalid domain number's bit, so it will not be used */
+	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
+
 	return vmbus_driver_register(&hv_pci_drv);
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-06 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, Stephen Hemminger, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <20190806201611.GT151852@google.com>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Bjorn Helgaas
> Sent: Tuesday, August 6, 2019 1:16 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> Thanks for updating this.  But you didn't update the subject line,
> which is really still a little too low-level.  Maybe Lorenzo will fix
> this.  Something like this, maybe?
> 
>   PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it

This is better. Thanks!

I hope Lorenzo can help to fix this so I could avoid a v3. :-)

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Bjorn Helgaas @ 2019-08-06 20:16 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, Stephen Hemminger, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <PU1P153MB01693F32F6BB02F9655CC84EBFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Thanks for updating this.  But you didn't update the subject line,
which is really still a little too low-level.  Maybe Lorenzo will fix
this.  Something like this, maybe?

  PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it

On Fri, Aug 02, 2019 at 10:50:20PM +0000, Dexuan Cui wrote:
> 
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.
> 
> Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
> 
> Changes in v2:
>   Improved the changelog accordign to the discussion with Bjorn Helgaas:
> 	  https://lkml.org/lkml/2019/8/1/1173
> 	  https://lkml.org/lkml/2019/8/2/1559
> 
>  drivers/pci/controller/pci-hyperv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6b9cc6e60a..68c611d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> -		pci_remove_root_bus(hbus->pci_bus);
>  		hv_pci_remove_slots(hbus);
> +		pci_remove_root_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-06 19:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: sashal@kernel.org, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190806185515.GR151852@google.com>



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, August 6, 2019 2:55 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; lorenzo.pieralisi@arm.com; linux-
> hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
> 
> On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> > Due to Azure host agent settings, the device instance ID's bytes 8 and
> > 9 are no longer unique. This causes some of the PCI devices not
> > showing up in VMs with multiple passthrough devices, such as GPUs. So,
> > as recommended by Azure host team, we now use the bytes 4 and 5 which
> > usually provide unique numbers.
> 
> What does "Azure host agent settings" mean?  Would it be useful to say
> something more specific, so users could ready this and say "oh, I'm using the
> Azure host agent settings mentioned here, so I need this patch"?  Is this
> related to a specific Azure host agent commit or release?
> 
> "This causes some of the PCI devices ..." is not a sentence.  I think I
> understand what you're saying -- "This sometimes causes device passthrough
> to VMs to fail." Is there something about GPUs that makes them more
> susceptible to this problem?
> 
> I think there are really two changes in this patch:
> 
>   1) Start with a domain number from bytes 4-5 instead of bytes 8-9.
> 
>   2) If the domain number is not unique, allocate another one using
>   the bitmap.
> 
> It sounds like part 2) by itself would be enough to solve the problem, and
> including part 1) just reduces the likelihood of having to allocate another
> domain number.
> 
> > In the rare cases of collision, we will detect and find another number
> > that is not in use.
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this
> idea.
> 
> This looks like two paragraphs and should have a blank line between them.
> 
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 91
> > +++++++++++++++++++++++++++++++------
> >  1 file changed, 78 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 82acd61..6b9cc6e60a 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -37,6 +37,8 @@
> >   * the PCI back-end driver in Hyper-V.
> >   */
> >
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > @@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct
> hv_pcibus_device *hbus)
> >  		complete(&hbus->remove_event);
> >  }
> >
> > +#define HVPCI_DOM_MAP_SIZE (64 * 1024) static
> > +DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> > +
> > +/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so
> > +define 0
> > + * as invalid for passthrough PCI devices of this driver.
> > + */
> 
> Please use the usual multi-line comment style:
> 
>   /*
>    * PCI domain number ...
>    */
> 
> > +#define HVPCI_DOM_INVALID 0
> > +
> > +/**
> > + * hv_get_dom_num() - Get a valid PCI domain number
> > + * Check if the PCI domain number is in use, and return another
> > +number if
> > + * it is in use.
> > + *
> > + * @dom: Requested domain number
> > + *
> > + * return: domain number on success, HVPCI_DOM_INVALID on failure  */
> > +static u16 hv_get_dom_num(u16 dom) {
> > +	unsigned int i;
> > +
> > +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> > +		return dom;
> > +
> > +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> > +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> > +			return i;
> > +	}
> > +
> > +	return HVPCI_DOM_INVALID;
> > +}
> > +
> > +/**
> > + * hv_put_dom_num() - Mark the PCI domain number as free
> > + * @dom: Domain number to be freed
> > + */
> > +static void hv_put_dom_num(u16 dom)
> > +{
> > +	clear_bit(dom, hvpci_dom_map);
> > +}
> > +
> >  /**
> >   * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
> >   * @hdev:	VMBus's tracking struct for this root PCI bus
> > @@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  			const struct hv_vmbus_device_id *dev_id)  {
> >  	struct hv_pcibus_device *hbus;
> > +	u16 dom_req, dom;
> >  	int ret;
> >
> >  	/*
> > @@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device
> *hdev,
> >  	hbus->state = hv_pcibus_init;
> >
> >  	/*
> > -	 * The PCI bus "domain" is what is called "segment" in ACPI and
> > -	 * other specs.  Pull it from the instance ID, to get something
> > -	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
> > -	 * do the same thing for consistency.  Note that, since this code
> > -	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> > -	 * that (1) the only domain in use for something that looks like
> > -	 * a physical PCI bus (which is actually emulated by the
> > -	 * hypervisor) is domain 0 and (2) there will be no overlap
> > -	 * between domains derived from these instance IDs in the same
> > -	 * VM.
> > +	 * The PCI bus "domain" is what is called "segment" in ACPI and other
> > +	 * specs. Pull it from the instance ID, to get something usually
> > +	 * unique. In rare cases of collision, we will find out another number
> > +	 * not in use.
> > +	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> > +	 * together with this guest driver can guarantee that (1) The only
> > +	 * domain used by Gen1 VMs for something that looks like a physical
> > +	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> > +	 * (2) There will be no overlap between domains (after fixing possible
> > +	 * collisions) in the same VM.
> 
> Please use blank lines between paragraphs.
> 
> >  	 */
> > -	hbus->sysdata.domain = hdev->dev_instance.b[9] |
> > -			       hdev->dev_instance.b[8] << 8;
> > +	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > +	dom = hv_get_dom_num(dom_req);
> > +
> > +	if (dom == HVPCI_DOM_INVALID) {
> > +		pr_err("Unable to use dom# 0x%hx or other numbers",
> > +		       dom_req);
> > +		ret = -EINVAL;
> > +		goto free_bus;
> > +	}
> > +
> > +	if (dom != dom_req)
> > +		pr_info("PCI dom# 0x%hx has collision, using 0x%hx",
> > +			dom_req, dom);
> 
> Can these use "dev_err/info(&hdev->device, ...)" like the other message in
> this function?  It's always nicer to have a specific device reference when one
> is available.  Probably don't need the new pr_fmt() definition if you do this.
> 
> > +
> > +	hbus->sysdata.domain = dom;
> >
> >  	hbus->hdev = hdev;
> >  	refcount_set(&hbus->remove_lock, 1); @@ -2559,7 +2616,7 @@
> static
> > int hv_pci_probe(struct hv_device *hdev,
> >  					   hbus->sysdata.domain);
> >  	if (!hbus->wq) {
> >  		ret = -ENOMEM;
> > -		goto free_bus;
> > +		goto free_dom;
> >  	}
> >
> >  	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL,
> > 0, @@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device
> *hdev,
> >  	vmbus_close(hdev->channel);
> >  destroy_wq:
> >  	destroy_workqueue(hbus->wq);
> > +free_dom:
> > +	hv_put_dom_num(hbus->sysdata.domain);
> >  free_bus:
> >  	free_page((unsigned long)hbus);
> >  	return ret;
> > @@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  	put_hvpcibus(hbus);
> >  	wait_for_completion(&hbus->remove_event);
> >  	destroy_workqueue(hbus->wq);
> > +
> > +	hv_put_dom_num(hbus->sysdata.domain);
> > +
> >  	free_page((unsigned long)hbus);
> >  	return 0;
> >  }
> > @@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void)
> >
> >  static int __init init_hv_pci_drv(void)  {
> > +	/* Set the invalid domain number's bit, so it will not be used */
> > +	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> > +
> >  	return vmbus_driver_register(&hv_pci_drv);
> >  }
> >
> > --
> > 1.8.3.1

Thanks for the comments. I will make the recommended changes.

- Haiyang

^ permalink raw reply

* Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Jakub Kicinski @ 2019-08-06 19:12 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org, KY Srinivasan,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB0169AECABF6094A3E7BEE381BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Minor change in behaviour would perhaps be worth acknowledging in the
commit message (since you check ndev for NULL later now), and a Fixes
tag would be good.

But the looks pretty straightforward and correct!

^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Bjorn Helgaas @ 2019-08-06 18:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Haiyang Zhang, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190805183657.GD17747@sasha-vm>

On Mon, Aug 05, 2019 at 02:36:57PM -0400, Sasha Levin wrote:
> On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> > Due to Azure host agent settings, the device instance ID's bytes 8 and 9
> > are no longer unique. This causes some of the PCI devices not showing up
> > in VMs with multiple passthrough devices, such as GPUs. So, as recommended
> > by Azure host team, we now use the bytes 4 and 5 which usually provide
> > unique numbers.
> > 
> > In the rare cases of collision, we will detect and find another number
> > that is not in use.
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Acked-by: Sasha Levin <sashal@kernel.org>
> 
> Bjorn, will you take it through the PCI tree or do you want me to take
> it through hyper-v?

Lorenzo usually applies patches to drivers/pci/controller/*, so that
would be my preference.

Bjorn

^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Bjorn Helgaas @ 2019-08-06 18:55 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1564771954-9181-1-git-send-email-haiyangz@microsoft.com>

On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> Due to Azure host agent settings, the device instance ID's bytes 8 and 9
> are no longer unique. This causes some of the PCI devices not showing up
> in VMs with multiple passthrough devices, such as GPUs. So, as recommended
> by Azure host team, we now use the bytes 4 and 5 which usually provide
> unique numbers.

What does "Azure host agent settings" mean?  Would it be useful to say
something more specific, so users could ready this and say "oh, I'm
using the Azure host agent settings mentioned here, so I need this
patch"?  Is this related to a specific Azure host agent commit or
release?

"This causes some of the PCI devices ..." is not a sentence.  I think
I understand what you're saying -- "This sometimes causes device
passthrough to VMs to fail." Is there something about GPUs that makes
them more susceptible to this problem?

I think there are really two changes in this patch:

  1) Start with a domain number from bytes 4-5 instead of bytes 8-9.

  2) If the domain number is not unique, allocate another one using
  the bitmap.

It sounds like part 2) by itself would be enough to solve the problem,
and including part 1) just reduces the likelihood of having to
allocate another domain number.

> In the rare cases of collision, we will detect and find another number
> that is not in use.
> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

This looks like two paragraphs and should have a blank line between
them.

> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 91 +++++++++++++++++++++++++++++++------
>  1 file changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 82acd61..6b9cc6e60a 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -37,6 +37,8 @@
>   * the PCI back-end driver in Hyper-V.
>   */
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> @@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
>  		complete(&hbus->remove_event);
>  }
>  
> +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> +
> +/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> + * as invalid for passthrough PCI devices of this driver.
> + */

Please use the usual multi-line comment style:

  /*
   * PCI domain number ...
   */

> +#define HVPCI_DOM_INVALID 0
> +
> +/**
> + * hv_get_dom_num() - Get a valid PCI domain number
> + * Check if the PCI domain number is in use, and return another number if
> + * it is in use.
> + *
> + * @dom: Requested domain number
> + *
> + * return: domain number on success, HVPCI_DOM_INVALID on failure
> + */
> +static u16 hv_get_dom_num(u16 dom)
> +{
> +	unsigned int i;
> +
> +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> +		return dom;
> +
> +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> +			return i;
> +	}
> +
> +	return HVPCI_DOM_INVALID;
> +}
> +
> +/**
> + * hv_put_dom_num() - Mark the PCI domain number as free
> + * @dom: Domain number to be freed
> + */
> +static void hv_put_dom_num(u16 dom)
> +{
> +	clear_bit(dom, hvpci_dom_map);
> +}
> +
>  /**
>   * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
>   * @hdev:	VMBus's tracking struct for this root PCI bus
> @@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  			const struct hv_vmbus_device_id *dev_id)
>  {
>  	struct hv_pcibus_device *hbus;
> +	u16 dom_req, dom;
>  	int ret;
>  
>  	/*
> @@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	hbus->state = hv_pcibus_init;
>  
>  	/*
> -	 * The PCI bus "domain" is what is called "segment" in ACPI and
> -	 * other specs.  Pull it from the instance ID, to get something
> -	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
> -	 * do the same thing for consistency.  Note that, since this code
> -	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> -	 * that (1) the only domain in use for something that looks like
> -	 * a physical PCI bus (which is actually emulated by the
> -	 * hypervisor) is domain 0 and (2) there will be no overlap
> -	 * between domains derived from these instance IDs in the same
> -	 * VM.
> +	 * The PCI bus "domain" is what is called "segment" in ACPI and other
> +	 * specs. Pull it from the instance ID, to get something usually
> +	 * unique. In rare cases of collision, we will find out another number
> +	 * not in use.
> +	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> +	 * together with this guest driver can guarantee that (1) The only
> +	 * domain used by Gen1 VMs for something that looks like a physical
> +	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> +	 * (2) There will be no overlap between domains (after fixing possible
> +	 * collisions) in the same VM.

Please use blank lines between paragraphs.

>  	 */
> -	hbus->sysdata.domain = hdev->dev_instance.b[9] |
> -			       hdev->dev_instance.b[8] << 8;
> +	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> +	dom = hv_get_dom_num(dom_req);
> +
> +	if (dom == HVPCI_DOM_INVALID) {
> +		pr_err("Unable to use dom# 0x%hx or other numbers",
> +		       dom_req);
> +		ret = -EINVAL;
> +		goto free_bus;
> +	}
> +
> +	if (dom != dom_req)
> +		pr_info("PCI dom# 0x%hx has collision, using 0x%hx",
> +			dom_req, dom);

Can these use "dev_err/info(&hdev->device, ...)" like the other
message in this function?  It's always nicer to have a specific device
reference when one is available.  Probably don't need the new pr_fmt()
definition if you do this.

> +
> +	hbus->sysdata.domain = dom;
>  
>  	hbus->hdev = hdev;
>  	refcount_set(&hbus->remove_lock, 1);
> @@ -2559,7 +2616,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  					   hbus->sysdata.domain);
>  	if (!hbus->wq) {
>  		ret = -ENOMEM;
> -		goto free_bus;
> +		goto free_dom;
>  	}
>  
>  	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> @@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	vmbus_close(hdev->channel);
>  destroy_wq:
>  	destroy_workqueue(hbus->wq);
> +free_dom:
> +	hv_put_dom_num(hbus->sysdata.domain);
>  free_bus:
>  	free_page((unsigned long)hbus);
>  	return ret;
> @@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev)
>  	put_hvpcibus(hbus);
>  	wait_for_completion(&hbus->remove_event);
>  	destroy_workqueue(hbus->wq);
> +
> +	hv_put_dom_num(hbus->sysdata.domain);
> +
>  	free_page((unsigned long)hbus);
>  	return 0;
>  }
> @@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void)
>  
>  static int __init init_hv_pci_drv(void)
>  {
> +	/* Set the invalid domain number's bit, so it will not be used */
> +	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> +
>  	return vmbus_driver_register(&hv_pci_drv);
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: hv_netvsc: WARNING: suspicious RCU usage?
From: Dexuan Cui @ 2019-08-06  5:25 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Yidong Ren, Haiyang Zhang,
	Stephen Hemminger, David S. Miller
  Cc: linux-hyperv@vger.kernel.org
In-Reply-To: <PU1P153MB0169B7073A4865D50AED9EE5BFDA0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Monday, August 5, 2019 4:56 PM
> The warning is caused by the rcu_dereference_rtnl() :
> 
> 1239 static void netvsc_get_stats64(struct net_device *net,
> 1240                                struct rtnl_link_stats64 *t)
> 1241 {
> 1242         struct net_device_context *ndev_ctx = netdev_priv(net);
> 1243         struct netvsc_device *nvdev =
> rcu_dereference_rtnl(ndev_ctx->nvdev);
> 
> I think here netvsc_get_stats64() neither holds rcu_read_lock() nor RTNL
> 
> IMO it should call rcu_read_lock()/unlock(), or get RTNL to fix the warning?

I just posted a patch on behalf of Stephen:
[PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
 
Thanks,
-- Dexuan


^ permalink raw reply

* [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-06  5:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger
  Cc: sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com


This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 44 +++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f9209594624b5..25502d335b94f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1236,25 +1236,10 @@ static void netvsc_get_pcpu_stats(struct net_device *net,
 	}
 }
 
-static void netvsc_get_stats64(struct net_device *net,
-			       struct rtnl_link_stats64 *t)
+static void netvsc_get_per_chan_stats(struct netvsc_device *nvdev,
+				      struct rtnl_link_stats64 *t)
 {
-	struct net_device_context *ndev_ctx = netdev_priv(net);
-	struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
-	struct netvsc_vf_pcpu_stats vf_tot;
-	int i;
-
-	if (!nvdev)
-		return;
-
-	netdev_stats_to_stats64(t, &net->stats);
-
-	netvsc_get_vf_stats(net, &vf_tot);
-	t->rx_packets += vf_tot.rx_packets;
-	t->tx_packets += vf_tot.tx_packets;
-	t->rx_bytes   += vf_tot.rx_bytes;
-	t->tx_bytes   += vf_tot.tx_bytes;
-	t->tx_dropped += vf_tot.tx_dropped;
+	u32 i;
 
 	for (i = 0; i < nvdev->num_chn; i++) {
 		const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
@@ -1286,6 +1271,29 @@ static void netvsc_get_stats64(struct net_device *net,
 	}
 }
 
+static void netvsc_get_stats64(struct net_device *net,
+			       struct rtnl_link_stats64 *t)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(net);
+	struct netvsc_device *nvdev;
+	struct netvsc_vf_pcpu_stats vf_tot;
+
+	netdev_stats_to_stats64(t, &net->stats);
+
+	netvsc_get_vf_stats(net, &vf_tot);
+	t->rx_packets += vf_tot.rx_packets;
+	t->tx_packets += vf_tot.tx_packets;
+	t->rx_bytes   += vf_tot.rx_bytes;
+	t->tx_bytes   += vf_tot.tx_bytes;
+	t->tx_dropped += vf_tot.tx_dropped;
+
+	rcu_read_lock();
+	nvdev = rcu_dereference(ndev_ctx->nvdev);
+	if (nvdev)
+		netvsc_get_per_chan_stats(nvdev, t);
+	rcu_read_unlock();
+}
+
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
 	struct net_device_context *ndc = netdev_priv(ndev);

^ permalink raw reply related

* hv_netvsc: WARNING: suspicious RCU usage?
From: Dexuan Cui @ 2019-08-05 23:55 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Yidong Ren, Haiyang Zhang,
	Stephen Hemminger, David S. Miller
  Cc: linux-hyperv@vger.kernel.org

Hi,
After the VM boots up, I always get the below call-trace when I run "nload" for the first time:

[  113.910911] WARNING: suspicious RCU usage
[  113.913244] 5.2.0+ #19 Not tainted
[  113.915216] -----------------------------
[  113.917521] drivers/net/hyperv/netvsc_drv.c:1243 suspicious rcu_dereference_check() usage!
[  113.922191]
[  113.922191] other info that might help us debug this:
[  113.926573]
[  113.926573] rcu_scheduler_active = 2, debug_locks = 1
[  113.930052] 4 locks held by nload/1977:
[  113.932251]  #0: 0000000080b71e86 (&p->lock){+.+.}, at: seq_read+0x41/0x3d0
[  113.936115]  #1: 00000000cacff770 (&of->mutex){+.+.}, at: kernfs_seq_start+0x2a/0x90
[  113.940115]  #2: 00000000287c988f (kn->count#134){.+.+}, at: kernfs_seq_start+0x32/0x90
[  113.944292]  #3: 00000000996fa9cc (dev_base_lock){++.+}, at: netstat_show.isra.25+0x4a/0xb0
[  113.958076]
[  113.958076] stack backtrace:
[  113.958081] CPU: 3 PID: 1977 Comm: nload Not tainted 5.2.0+ #19
[  113.958083] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
[  113.958084] Call Trace:
[  113.958091]  dump_stack+0x67/0x90
[  113.973663]  netvsc_get_stats64+0x159/0x170 [hv_netvsc]
[  113.973663]  dev_get_stats+0x55/0xb0
[  113.973663]  netstat_show.isra.25+0x5b/0xb0
[  113.973663]  dev_attr_show+0x15/0x40
[  113.981661]  sysfs_kf_seq_show+0xad/0xf0
[  113.981661]  seq_read+0x146/0x3d0
[  113.981661]  vfs_read+0x9c/0x160
[  113.989025]  ksys_read+0x5c/0xd0
[  113.989025]  do_syscall_64+0x5e/0x220
[  113.989025]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  113.989025] RIP: 0033:0x7f4485daaf31

nload is a console application which monitors network traffic and bandwidth usage in real time.

The warning is caused by the rcu_dereference_rtnl() :

1239 static void netvsc_get_stats64(struct net_device *net,
1240                                struct rtnl_link_stats64 *t)
1241 {
1242         struct net_device_context *ndev_ctx = netdev_priv(net);
1243         struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);

I think here netvsc_get_stats64() neither holds rcu_read_lock() nor RTNL

IMO it should call rcu_read_lock()/unlock(), or get RTNL to fix the warning?

Thanks,
-- Dexuan


^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Sasha Levin @ 2019-08-05 18:36 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1564771954-9181-1-git-send-email-haiyangz@microsoft.com>

On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
>Due to Azure host agent settings, the device instance ID's bytes 8 and 9
>are no longer unique. This causes some of the PCI devices not showing up
>in VMs with multiple passthrough devices, such as GPUs. So, as recommended
>by Azure host team, we now use the bytes 4 and 5 which usually provide
>unique numbers.
>
>In the rare cases of collision, we will detect and find another number
>that is not in use.
>Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.
>
>Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Acked-by: Sasha Levin <sashal@kernel.org>

Bjorn, will you take it through the PCI tree or do you want me to take
it through hyper-v?

--
Thanks,
Sasha

^ permalink raw reply

* RE: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-08-03  0:49 UTC (permalink / raw)
  To: David Miller
  Cc: Sunil Muthuswamy, netdev@vger.kernel.org, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, sashal@kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <20190802.172729.1656276508211556851.davem@davemloft.net>

> From: linux-hyperv-owner@vger.kernel.org
> Sent: Friday, August 2, 2019 5:27 PM
> ...
> Applied and queued up for -stable.
> 
> Do not ever CC: stable for networking patches, we submit to -stable manually.
 
Thanks, David!
I'll remember to not add the stable tag for network patches.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: David Miller @ 2019-08-03  0:27 UTC (permalink / raw)
  To: decui
  Cc: sunilmut, netdev, kys, haiyangz, sthemmin, sashal, mikelley,
	linux-hyperv, linux-kernel, olaf, apw, jasowang, vkuznets,
	marcelo.cerri
In-Reply-To: <PU1P153MB01696DDD3A3F601370701DD2BFDF0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 31 Jul 2019 01:25:45 +0000

> 
> There is a race condition for an established connection that is being closed
> by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
> 'remove_sock' is false):
> 
> 1 for the initial value;
> 1 for the sk being in the bound list;
> 1 for the sk being in the connected list;
> 1 for the delayed close_work.
> 
> After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
> decrease the refcnt to 3.
> 
> Concurrently, hvs_close_connection() runs in another thread:
>   calls vsock_remove_sock() to decrease the refcnt by 2;
>   call sock_put() to decrease the refcnt to 0, and free the sk;
>   next, the "release_sock(sk)" may hang due to use-after-free.
> 
> In the above, after hvs_release() finishes, if hvs_close_connection() runs
> faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
> because at the beginning of hvs_close_connection(), the refcnt is still 4.
> 
> The issue can be resolved if an extra reference is taken when the
> connection is established.
> 
> Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Applied and queued up for -stable.

Do not ever CC: stable for networking patches, we submit to -stable manually.

Thank you.

^ permalink raw reply

* [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-02 22:50 UTC (permalink / raw)
  To: lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, Michael Kelley, Stephen Hemminger
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com, Dexuan Cui


The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---

Changes in v2:
  Improved the changelog accordign to the discussion with Bjorn Helgaas:
	  https://lkml.org/lkml/2019/8/1/1173
	  https://lkml.org/lkml/2019/8/2/1559

 drivers/pci/controller/pci-hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
-		pci_remove_root_bus(hbus->pci_bus);
 		hv_pci_remove_slots(hbus);
+		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Bjorn Helgaas @ 2019-08-02 22:15 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Stephen Hemminger, lorenzo.pieralisi@arm.com,
	linux-pci@vger.kernel.org, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <PU1P153MB01698F51FE22C39086CC8353BFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Fri, Aug 02, 2019 at 08:31:26PM +0000, Dexuan Cui wrote:
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, August 2, 2019 12:41 PM
> > The subject line only describes the mechanical code change, which is
> > obvious from the patch.  It would be better if we could say something
> > about *why* we need this.
> 
> Hi Bjorn,
> Sorry. I'll try to write a better changelog in v2. :-)
>  
> > On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> > >
> > > When a slot is removed, the pci_dev must still exist.
> > >
> > > pci_remove_root_bus() removes and free all the pci_devs, so
> > > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > > otherwise a general protection fault can happen, if the kernel is built
> > 
> > "general protection fault" is an x86 term that doesn't really say what
> > the issue is.  I suspect this would be a "use-after-free" problem.
> 
> Yes, it's use-after-free. I'll fix the the wording.
>  
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > -		pci_remove_root_bus(hbus->pci_bus);
> > >  		hv_pci_remove_slots(hbus);
> > > +		pci_remove_root_bus(hbus->pci_bus);
> > 
> > I'm curious about why we need hv_pci_remove_slots() at all.  None of
> > the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> > anything similar to hv_pci_remove_slots().
> > 
> > Surely some of those callers also support slots, so there must be some
> > other path that calls pci_destroy_slot() in those cases.  Can we use a
> > similar strategy here?
> 
> Originally Stephen Heminger added the slot code for pci-hyperv.c:
> a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
> So he may know this better. My understanding is: we can not use the similar
> stragegy used in the 2 other users of pci_create_slot():
> 
> drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
> It looks drivers/pci/hotplug/ is quite different from pci-hyperv.c because
> pci-hyper-v uses a simple *private* hot-plug protocol, making it impossible
> to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().
> 
> drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
> the static "slot_list" list in the same file. Again, since pci-hyper-v uses a private
> PCI-device-discovery protocol (which is based on VMBus rather the emulated
> ACPI and PCI), acpi_pci_slot_enumerate() can not find the PCI devices that are
> discovered by pci-hyperv, so we can not use the standard register_slot() ->
> pci_create_slot() to create the slots and hence acpi_pci_slot_remove() -> 
> pci_destroy_slot() can not work for pci-hyperv.

Hmm, ok.  This still doesn't seem right to me, but I think the bottom
line will be that the current slot registration interfaces just don't
work quite right for all the cases we want them to.

Maybe it would be a good project for somebody to rethink them, but it
doesn't seem practical for *this* patch.  Thanks for looking into it
this far!

> I think I can use this as the v2 changelog:
> 
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.

Sounds good.

Bjorn

^ permalink raw reply

* RE: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-02 20:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Stephen Hemminger
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <20190802194053.GL151852@google.com>

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, August 2, 2019 12:41 PM
> The subject line only describes the mechanical code change, which is
> obvious from the patch.  It would be better if we could say something
> about *why* we need this.

Hi Bjorn,
Sorry. I'll try to write a better changelog in v2. :-)
 
> On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> >
> > When a slot is removed, the pci_dev must still exist.
> >
> > pci_remove_root_bus() removes and free all the pci_devs, so
> > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > otherwise a general protection fault can happen, if the kernel is built
> 
> "general protection fault" is an x86 term that doesn't really say what
> the issue is.  I suspect this would be a "use-after-free" problem.

Yes, it's use-after-free. I'll fix the the wording.
 
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > -		pci_remove_root_bus(hbus->pci_bus);
> >  		hv_pci_remove_slots(hbus);
> > +		pci_remove_root_bus(hbus->pci_bus);
> 
> I'm curious about why we need hv_pci_remove_slots() at all.  None of
> the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> anything similar to hv_pci_remove_slots().
> 
> Surely some of those callers also support slots, so there must be some
> other path that calls pci_destroy_slot() in those cases.  Can we use a
> similar strategy here?

Originally Stephen Heminger added the slot code for pci-hyperv.c:
a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
So he may know this better. My understanding is: we can not use the similar
stragegy used in the 2 other users of pci_create_slot():

drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
It looks drivers/pci/hotplug/ is quite different from pci-hyperv.c because
pci-hyper-v uses a simple *private* hot-plug protocol, making it impossible
to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().

drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
the static "slot_list" list in the same file. Again, since pci-hyper-v uses a private
PCI-device-discovery protocol (which is based on VMBus rather the emulated
ACPI and PCI), acpi_pci_slot_enumerate() can not find the PCI devices that are
discovered by pci-hyperv, so we can not use the standard register_slot() ->
pci_create_slot() to create the slots and hence acpi_pci_slot_remove() -> 
pci_destroy_slot() can not work for pci-hyperv.

I think I can use this as the v2 changelog:

The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Thanks,
Dexuan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox