Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* RE: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-07  6:56 UTC (permalink / raw)
  To: Jakub Kicinski, Stephen Hemminger
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com
In-Reply-To: <20190806121242.141c2324@cakuba.netronome.com>

> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Tuesday, August 6, 2019 12:13 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> > This fixes a warning of "suspicious rcu_dereference_check() usage"
> > when nload runs.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> Minor change in behaviour would perhaps be worth acknowledging in the
> commit message (since you check ndev for NULL later now), and a Fixes
> tag would be good.
> 
> But the looks pretty straightforward and correct!

Hi,
Yeah, it looks the minor behavior change doesn't matter, because IMO the 
'nvdev' can only be NULL when the NIC is being removed, or the MTU is
being changed, etc.

The Fixes tag is:
Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")

If I should send a v2, please let me know.

Thanks,
-- Dexuan


^ permalink raw reply

* [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-06 23:52 UTC (permalink / raw)
  To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, linux-kernel@vger.kernel.org

Currently in Azure cloud, for passthrough devices including GPU, the
host sets the device instance ID's bytes 8 - 15 to a value derived from
the host HWID, which is the same on all devices in a VM. So, the device
instance ID's bytes 8 and 9 provided by the host are no longer unique.

This can cause device passthrough to VMs to fail because the bytes 8 and
9 is used as PCI domain number. So, as recommended by Azure host team,
we now use the bytes 4 and 5 which usually contain unique numbers as PCI
domain. The chance of collision is greatly reduced. In the rare cases of
collision, we will detect and find another number that is not in use.

Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
 1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..4f3d97e 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
 		complete(&hbus->remove_event);
 }
 
+#define HVPCI_DOM_MAP_SIZE (64 * 1024)
+static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
+
+/*
+ * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
+ * as invalid for passthrough PCI devices of this driver.
+ */
+#define HVPCI_DOM_INVALID 0
+
+/**
+ * hv_get_dom_num() - Get a valid PCI domain number
+ * Check if the PCI domain number is in use, and return another number if
+ * it is in use.
+ *
+ * @dom: Requested domain number
+ *
+ * return: domain number on success, HVPCI_DOM_INVALID on failure
+ */
+static u16 hv_get_dom_num(u16 dom)
+{
+	unsigned int i;
+
+	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
+		return dom;
+
+	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
+		if (test_and_set_bit(i, hvpci_dom_map) == 0)
+			return i;
+	}
+
+	return HVPCI_DOM_INVALID;
+}
+
+/**
+ * hv_put_dom_num() - Mark the PCI domain number as free
+ * @dom: Domain number to be freed
+ */
+static void hv_put_dom_num(u16 dom)
+{
+	clear_bit(dom, hvpci_dom_map);
+}
+
 /**
  * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
  * @hdev:	VMBus's tracking struct for this root PCI bus
@@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 			const struct hv_vmbus_device_id *dev_id)
 {
 	struct hv_pcibus_device *hbus;
+	u16 dom_req, dom;
 	int ret;
 
 	/*
@@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus->state = hv_pcibus_init;
 
 	/*
-	 * The PCI bus "domain" is what is called "segment" in ACPI and
-	 * other specs.  Pull it from the instance ID, to get something
-	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
-	 * do the same thing for consistency.  Note that, since this code
-	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
-	 * that (1) the only domain in use for something that looks like
-	 * a physical PCI bus (which is actually emulated by the
-	 * hypervisor) is domain 0 and (2) there will be no overlap
-	 * between domains derived from these instance IDs in the same
-	 * VM.
+	 * The PCI bus "domain" is what is called "segment" in ACPI and other
+	 * specs. Pull it from the instance ID, to get something usually
+	 * unique. In rare cases of collision, we will find out another number
+	 * not in use.
+	 *
+	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
+	 * together with this guest driver can guarantee that (1) The only
+	 * domain used by Gen1 VMs for something that looks like a physical
+	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
+	 * (2) There will be no overlap between domains (after fixing possible
+	 * collisions) in the same VM.
 	 */
-	hbus->sysdata.domain = hdev->dev_instance.b[9] |
-			       hdev->dev_instance.b[8] << 8;
+	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
+	dom = hv_get_dom_num(dom_req);
+
+	if (dom == HVPCI_DOM_INVALID) {
+		dev_err(&hdev->device,
+			"Unable to use dom# 0x%hx or other numbers", dom_req);
+		ret = -EINVAL;
+		goto free_bus;
+	}
+
+	if (dom != dom_req)
+		dev_info(&hdev->device,
+			 "PCI dom# 0x%hx has collision, using 0x%hx",
+			 dom_req, dom);
+
+	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
 	refcount_set(&hbus->remove_lock, 1);
@@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
 		ret = -ENOMEM;
-		goto free_bus;
+		goto free_dom;
 	}
 
 	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
@@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	vmbus_close(hdev->channel);
 destroy_wq:
 	destroy_workqueue(hbus->wq);
+free_dom:
+	hv_put_dom_num(hbus->sysdata.domain);
 free_bus:
 	free_page((unsigned long)hbus);
 	return ret;
@@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
 	destroy_workqueue(hbus->wq);
+
+	hv_put_dom_num(hbus->sysdata.domain);
+
 	free_page((unsigned long)hbus);
 	return 0;
 }
@@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
 
 static int __init init_hv_pci_drv(void)
 {
+	/* Set the invalid domain number's bit, so it will not be used */
+	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
+
 	return vmbus_driver_register(&hv_pci_drv);
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-06 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, Stephen Hemminger, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <20190806201611.GT151852@google.com>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Bjorn Helgaas
> Sent: Tuesday, August 6, 2019 1:16 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> Thanks for updating this.  But you didn't update the subject line,
> which is really still a little too low-level.  Maybe Lorenzo will fix
> this.  Something like this, maybe?
> 
>   PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it

This is better. Thanks!

I hope Lorenzo can help to fix this so I could avoid a v3. :-)

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Bjorn Helgaas @ 2019-08-06 20:16 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, Stephen Hemminger, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <PU1P153MB01693F32F6BB02F9655CC84EBFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Thanks for updating this.  But you didn't update the subject line,
which is really still a little too low-level.  Maybe Lorenzo will fix
this.  Something like this, maybe?

  PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it

On Fri, Aug 02, 2019 at 10:50:20PM +0000, Dexuan Cui wrote:
> 
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.
> 
> Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
> 
> Changes in v2:
>   Improved the changelog accordign to the discussion with Bjorn Helgaas:
> 	  https://lkml.org/lkml/2019/8/1/1173
> 	  https://lkml.org/lkml/2019/8/2/1559
> 
>  drivers/pci/controller/pci-hyperv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6b9cc6e60a..68c611d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> -		pci_remove_root_bus(hbus->pci_bus);
>  		hv_pci_remove_slots(hbus);
> +		pci_remove_root_bus(hbus->pci_bus);
>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-06 19:45 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: sashal@kernel.org, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190806185515.GR151852@google.com>



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Tuesday, August 6, 2019 2:55 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; lorenzo.pieralisi@arm.com; linux-
> hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
> 
> On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> > Due to Azure host agent settings, the device instance ID's bytes 8 and
> > 9 are no longer unique. This causes some of the PCI devices not
> > showing up in VMs with multiple passthrough devices, such as GPUs. So,
> > as recommended by Azure host team, we now use the bytes 4 and 5 which
> > usually provide unique numbers.
> 
> What does "Azure host agent settings" mean?  Would it be useful to say
> something more specific, so users could ready this and say "oh, I'm using the
> Azure host agent settings mentioned here, so I need this patch"?  Is this
> related to a specific Azure host agent commit or release?
> 
> "This causes some of the PCI devices ..." is not a sentence.  I think I
> understand what you're saying -- "This sometimes causes device passthrough
> to VMs to fail." Is there something about GPUs that makes them more
> susceptible to this problem?
> 
> I think there are really two changes in this patch:
> 
>   1) Start with a domain number from bytes 4-5 instead of bytes 8-9.
> 
>   2) If the domain number is not unique, allocate another one using
>   the bitmap.
> 
> It sounds like part 2) by itself would be enough to solve the problem, and
> including part 1) just reduces the likelihood of having to allocate another
> domain number.
> 
> > In the rare cases of collision, we will detect and find another number
> > that is not in use.
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this
> idea.
> 
> This looks like two paragraphs and should have a blank line between them.
> 
> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> > ---
> >  drivers/pci/controller/pci-hyperv.c | 91
> > +++++++++++++++++++++++++++++++------
> >  1 file changed, 78 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c
> > b/drivers/pci/controller/pci-hyperv.c
> > index 82acd61..6b9cc6e60a 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -37,6 +37,8 @@
> >   * the PCI back-end driver in Hyper-V.
> >   */
> >
> > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > +
> >  #include <linux/kernel.h>
> >  #include <linux/module.h>
> >  #include <linux/pci.h>
> > @@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct
> hv_pcibus_device *hbus)
> >  		complete(&hbus->remove_event);
> >  }
> >
> > +#define HVPCI_DOM_MAP_SIZE (64 * 1024) static
> > +DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> > +
> > +/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so
> > +define 0
> > + * as invalid for passthrough PCI devices of this driver.
> > + */
> 
> Please use the usual multi-line comment style:
> 
>   /*
>    * PCI domain number ...
>    */
> 
> > +#define HVPCI_DOM_INVALID 0
> > +
> > +/**
> > + * hv_get_dom_num() - Get a valid PCI domain number
> > + * Check if the PCI domain number is in use, and return another
> > +number if
> > + * it is in use.
> > + *
> > + * @dom: Requested domain number
> > + *
> > + * return: domain number on success, HVPCI_DOM_INVALID on failure  */
> > +static u16 hv_get_dom_num(u16 dom) {
> > +	unsigned int i;
> > +
> > +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> > +		return dom;
> > +
> > +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> > +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> > +			return i;
> > +	}
> > +
> > +	return HVPCI_DOM_INVALID;
> > +}
> > +
> > +/**
> > + * hv_put_dom_num() - Mark the PCI domain number as free
> > + * @dom: Domain number to be freed
> > + */
> > +static void hv_put_dom_num(u16 dom)
> > +{
> > +	clear_bit(dom, hvpci_dom_map);
> > +}
> > +
> >  /**
> >   * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
> >   * @hdev:	VMBus's tracking struct for this root PCI bus
> > @@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev,
> >  			const struct hv_vmbus_device_id *dev_id)  {
> >  	struct hv_pcibus_device *hbus;
> > +	u16 dom_req, dom;
> >  	int ret;
> >
> >  	/*
> > @@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device
> *hdev,
> >  	hbus->state = hv_pcibus_init;
> >
> >  	/*
> > -	 * The PCI bus "domain" is what is called "segment" in ACPI and
> > -	 * other specs.  Pull it from the instance ID, to get something
> > -	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
> > -	 * do the same thing for consistency.  Note that, since this code
> > -	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> > -	 * that (1) the only domain in use for something that looks like
> > -	 * a physical PCI bus (which is actually emulated by the
> > -	 * hypervisor) is domain 0 and (2) there will be no overlap
> > -	 * between domains derived from these instance IDs in the same
> > -	 * VM.
> > +	 * The PCI bus "domain" is what is called "segment" in ACPI and other
> > +	 * specs. Pull it from the instance ID, to get something usually
> > +	 * unique. In rare cases of collision, we will find out another number
> > +	 * not in use.
> > +	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> > +	 * together with this guest driver can guarantee that (1) The only
> > +	 * domain used by Gen1 VMs for something that looks like a physical
> > +	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> > +	 * (2) There will be no overlap between domains (after fixing possible
> > +	 * collisions) in the same VM.
> 
> Please use blank lines between paragraphs.
> 
> >  	 */
> > -	hbus->sysdata.domain = hdev->dev_instance.b[9] |
> > -			       hdev->dev_instance.b[8] << 8;
> > +	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> > +	dom = hv_get_dom_num(dom_req);
> > +
> > +	if (dom == HVPCI_DOM_INVALID) {
> > +		pr_err("Unable to use dom# 0x%hx or other numbers",
> > +		       dom_req);
> > +		ret = -EINVAL;
> > +		goto free_bus;
> > +	}
> > +
> > +	if (dom != dom_req)
> > +		pr_info("PCI dom# 0x%hx has collision, using 0x%hx",
> > +			dom_req, dom);
> 
> Can these use "dev_err/info(&hdev->device, ...)" like the other message in
> this function?  It's always nicer to have a specific device reference when one
> is available.  Probably don't need the new pr_fmt() definition if you do this.
> 
> > +
> > +	hbus->sysdata.domain = dom;
> >
> >  	hbus->hdev = hdev;
> >  	refcount_set(&hbus->remove_lock, 1); @@ -2559,7 +2616,7 @@
> static
> > int hv_pci_probe(struct hv_device *hdev,
> >  					   hbus->sysdata.domain);
> >  	if (!hbus->wq) {
> >  		ret = -ENOMEM;
> > -		goto free_bus;
> > +		goto free_dom;
> >  	}
> >
> >  	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL,
> > 0, @@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device
> *hdev,
> >  	vmbus_close(hdev->channel);
> >  destroy_wq:
> >  	destroy_workqueue(hbus->wq);
> > +free_dom:
> > +	hv_put_dom_num(hbus->sysdata.domain);
> >  free_bus:
> >  	free_page((unsigned long)hbus);
> >  	return ret;
> > @@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  	put_hvpcibus(hbus);
> >  	wait_for_completion(&hbus->remove_event);
> >  	destroy_workqueue(hbus->wq);
> > +
> > +	hv_put_dom_num(hbus->sysdata.domain);
> > +
> >  	free_page((unsigned long)hbus);
> >  	return 0;
> >  }
> > @@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void)
> >
> >  static int __init init_hv_pci_drv(void)  {
> > +	/* Set the invalid domain number's bit, so it will not be used */
> > +	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> > +
> >  	return vmbus_driver_register(&hv_pci_drv);
> >  }
> >
> > --
> > 1.8.3.1

Thanks for the comments. I will make the recommended changes.

- Haiyang

^ permalink raw reply

* Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Jakub Kicinski @ 2019-08-06 19:12 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org, KY Srinivasan,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB0169AECABF6094A3E7BEE381BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Minor change in behaviour would perhaps be worth acknowledging in the
commit message (since you check ndev for NULL later now), and a Fixes
tag would be good.

But the looks pretty straightforward and correct!

^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Bjorn Helgaas @ 2019-08-06 18:55 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Haiyang Zhang, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190805183657.GD17747@sasha-vm>

On Mon, Aug 05, 2019 at 02:36:57PM -0400, Sasha Levin wrote:
> On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> > Due to Azure host agent settings, the device instance ID's bytes 8 and 9
> > are no longer unique. This causes some of the PCI devices not showing up
> > in VMs with multiple passthrough devices, such as GPUs. So, as recommended
> > by Azure host team, we now use the bytes 4 and 5 which usually provide
> > unique numbers.
> > 
> > In the rare cases of collision, we will detect and find another number
> > that is not in use.
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

> > Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Acked-by: Sasha Levin <sashal@kernel.org>
> 
> Bjorn, will you take it through the PCI tree or do you want me to take
> it through hyper-v?

Lorenzo usually applies patches to drivers/pci/controller/*, so that
would be my preference.

Bjorn

^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Bjorn Helgaas @ 2019-08-06 18:55 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1564771954-9181-1-git-send-email-haiyangz@microsoft.com>

On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
> Due to Azure host agent settings, the device instance ID's bytes 8 and 9
> are no longer unique. This causes some of the PCI devices not showing up
> in VMs with multiple passthrough devices, such as GPUs. So, as recommended
> by Azure host team, we now use the bytes 4 and 5 which usually provide
> unique numbers.

What does "Azure host agent settings" mean?  Would it be useful to say
something more specific, so users could ready this and say "oh, I'm
using the Azure host agent settings mentioned here, so I need this
patch"?  Is this related to a specific Azure host agent commit or
release?

"This causes some of the PCI devices ..." is not a sentence.  I think
I understand what you're saying -- "This sometimes causes device
passthrough to VMs to fail." Is there something about GPUs that makes
them more susceptible to this problem?

I think there are really two changes in this patch:

  1) Start with a domain number from bytes 4-5 instead of bytes 8-9.

  2) If the domain number is not unique, allocate another one using
  the bitmap.

It sounds like part 2) by itself would be enough to solve the problem,
and including part 1) just reduces the likelihood of having to
allocate another domain number.

> In the rare cases of collision, we will detect and find another number
> that is not in use.
> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

This looks like two paragraphs and should have a blank line between
them.

> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 91 +++++++++++++++++++++++++++++++------
>  1 file changed, 78 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 82acd61..6b9cc6e60a 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -37,6 +37,8 @@
>   * the PCI back-end driver in Hyper-V.
>   */
>  
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
>  #include <linux/kernel.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> @@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
>  		complete(&hbus->remove_event);
>  }
>  
> +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> +
> +/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> + * as invalid for passthrough PCI devices of this driver.
> + */

Please use the usual multi-line comment style:

  /*
   * PCI domain number ...
   */

> +#define HVPCI_DOM_INVALID 0
> +
> +/**
> + * hv_get_dom_num() - Get a valid PCI domain number
> + * Check if the PCI domain number is in use, and return another number if
> + * it is in use.
> + *
> + * @dom: Requested domain number
> + *
> + * return: domain number on success, HVPCI_DOM_INVALID on failure
> + */
> +static u16 hv_get_dom_num(u16 dom)
> +{
> +	unsigned int i;
> +
> +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> +		return dom;
> +
> +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> +			return i;
> +	}
> +
> +	return HVPCI_DOM_INVALID;
> +}
> +
> +/**
> + * hv_put_dom_num() - Mark the PCI domain number as free
> + * @dom: Domain number to be freed
> + */
> +static void hv_put_dom_num(u16 dom)
> +{
> +	clear_bit(dom, hvpci_dom_map);
> +}
> +
>  /**
>   * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
>   * @hdev:	VMBus's tracking struct for this root PCI bus
> @@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  			const struct hv_vmbus_device_id *dev_id)
>  {
>  	struct hv_pcibus_device *hbus;
> +	u16 dom_req, dom;
>  	int ret;
>  
>  	/*
> @@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	hbus->state = hv_pcibus_init;
>  
>  	/*
> -	 * The PCI bus "domain" is what is called "segment" in ACPI and
> -	 * other specs.  Pull it from the instance ID, to get something
> -	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
> -	 * do the same thing for consistency.  Note that, since this code
> -	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> -	 * that (1) the only domain in use for something that looks like
> -	 * a physical PCI bus (which is actually emulated by the
> -	 * hypervisor) is domain 0 and (2) there will be no overlap
> -	 * between domains derived from these instance IDs in the same
> -	 * VM.
> +	 * The PCI bus "domain" is what is called "segment" in ACPI and other
> +	 * specs. Pull it from the instance ID, to get something usually
> +	 * unique. In rare cases of collision, we will find out another number
> +	 * not in use.
> +	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> +	 * together with this guest driver can guarantee that (1) The only
> +	 * domain used by Gen1 VMs for something that looks like a physical
> +	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> +	 * (2) There will be no overlap between domains (after fixing possible
> +	 * collisions) in the same VM.

Please use blank lines between paragraphs.

>  	 */
> -	hbus->sysdata.domain = hdev->dev_instance.b[9] |
> -			       hdev->dev_instance.b[8] << 8;
> +	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> +	dom = hv_get_dom_num(dom_req);
> +
> +	if (dom == HVPCI_DOM_INVALID) {
> +		pr_err("Unable to use dom# 0x%hx or other numbers",
> +		       dom_req);
> +		ret = -EINVAL;
> +		goto free_bus;
> +	}
> +
> +	if (dom != dom_req)
> +		pr_info("PCI dom# 0x%hx has collision, using 0x%hx",
> +			dom_req, dom);

Can these use "dev_err/info(&hdev->device, ...)" like the other
message in this function?  It's always nicer to have a specific device
reference when one is available.  Probably don't need the new pr_fmt()
definition if you do this.

> +
> +	hbus->sysdata.domain = dom;
>  
>  	hbus->hdev = hdev;
>  	refcount_set(&hbus->remove_lock, 1);
> @@ -2559,7 +2616,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  					   hbus->sysdata.domain);
>  	if (!hbus->wq) {
>  		ret = -ENOMEM;
> -		goto free_bus;
> +		goto free_dom;
>  	}
>  
>  	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> @@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	vmbus_close(hdev->channel);
>  destroy_wq:
>  	destroy_workqueue(hbus->wq);
> +free_dom:
> +	hv_put_dom_num(hbus->sysdata.domain);
>  free_bus:
>  	free_page((unsigned long)hbus);
>  	return ret;
> @@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev)
>  	put_hvpcibus(hbus);
>  	wait_for_completion(&hbus->remove_event);
>  	destroy_workqueue(hbus->wq);
> +
> +	hv_put_dom_num(hbus->sysdata.domain);
> +
>  	free_page((unsigned long)hbus);
>  	return 0;
>  }
> @@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void)
>  
>  static int __init init_hv_pci_drv(void)
>  {
> +	/* Set the invalid domain number's bit, so it will not be used */
> +	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> +
>  	return vmbus_driver_register(&hv_pci_drv);
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: hv_netvsc: WARNING: suspicious RCU usage?
From: Dexuan Cui @ 2019-08-06  5:25 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Yidong Ren, Haiyang Zhang,
	Stephen Hemminger, David S. Miller
  Cc: linux-hyperv@vger.kernel.org
In-Reply-To: <PU1P153MB0169B7073A4865D50AED9EE5BFDA0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Monday, August 5, 2019 4:56 PM
> The warning is caused by the rcu_dereference_rtnl() :
> 
> 1239 static void netvsc_get_stats64(struct net_device *net,
> 1240                                struct rtnl_link_stats64 *t)
> 1241 {
> 1242         struct net_device_context *ndev_ctx = netdev_priv(net);
> 1243         struct netvsc_device *nvdev =
> rcu_dereference_rtnl(ndev_ctx->nvdev);
> 
> I think here netvsc_get_stats64() neither holds rcu_read_lock() nor RTNL
> 
> IMO it should call rcu_read_lock()/unlock(), or get RTNL to fix the warning?

I just posted a patch on behalf of Stephen:
[PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
 
Thanks,
-- Dexuan


^ permalink raw reply

* [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-06  5:17 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger
  Cc: sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com


This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 44 +++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f9209594624b5..25502d335b94f 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1236,25 +1236,10 @@ static void netvsc_get_pcpu_stats(struct net_device *net,
 	}
 }
 
-static void netvsc_get_stats64(struct net_device *net,
-			       struct rtnl_link_stats64 *t)
+static void netvsc_get_per_chan_stats(struct netvsc_device *nvdev,
+				      struct rtnl_link_stats64 *t)
 {
-	struct net_device_context *ndev_ctx = netdev_priv(net);
-	struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
-	struct netvsc_vf_pcpu_stats vf_tot;
-	int i;
-
-	if (!nvdev)
-		return;
-
-	netdev_stats_to_stats64(t, &net->stats);
-
-	netvsc_get_vf_stats(net, &vf_tot);
-	t->rx_packets += vf_tot.rx_packets;
-	t->tx_packets += vf_tot.tx_packets;
-	t->rx_bytes   += vf_tot.rx_bytes;
-	t->tx_bytes   += vf_tot.tx_bytes;
-	t->tx_dropped += vf_tot.tx_dropped;
+	u32 i;
 
 	for (i = 0; i < nvdev->num_chn; i++) {
 		const struct netvsc_channel *nvchan = &nvdev->chan_table[i];
@@ -1286,6 +1271,29 @@ static void netvsc_get_stats64(struct net_device *net,
 	}
 }
 
+static void netvsc_get_stats64(struct net_device *net,
+			       struct rtnl_link_stats64 *t)
+{
+	struct net_device_context *ndev_ctx = netdev_priv(net);
+	struct netvsc_device *nvdev;
+	struct netvsc_vf_pcpu_stats vf_tot;
+
+	netdev_stats_to_stats64(t, &net->stats);
+
+	netvsc_get_vf_stats(net, &vf_tot);
+	t->rx_packets += vf_tot.rx_packets;
+	t->tx_packets += vf_tot.tx_packets;
+	t->rx_bytes   += vf_tot.rx_bytes;
+	t->tx_bytes   += vf_tot.tx_bytes;
+	t->tx_dropped += vf_tot.tx_dropped;
+
+	rcu_read_lock();
+	nvdev = rcu_dereference(ndev_ctx->nvdev);
+	if (nvdev)
+		netvsc_get_per_chan_stats(nvdev, t);
+	rcu_read_unlock();
+}
+
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
 {
 	struct net_device_context *ndc = netdev_priv(ndev);

^ permalink raw reply related

* hv_netvsc: WARNING: suspicious RCU usage?
From: Dexuan Cui @ 2019-08-05 23:55 UTC (permalink / raw)
  To: netdev@vger.kernel.org, Yidong Ren, Haiyang Zhang,
	Stephen Hemminger, David S. Miller
  Cc: linux-hyperv@vger.kernel.org

Hi,
After the VM boots up, I always get the below call-trace when I run "nload" for the first time:

[  113.910911] WARNING: suspicious RCU usage
[  113.913244] 5.2.0+ #19 Not tainted
[  113.915216] -----------------------------
[  113.917521] drivers/net/hyperv/netvsc_drv.c:1243 suspicious rcu_dereference_check() usage!
[  113.922191]
[  113.922191] other info that might help us debug this:
[  113.926573]
[  113.926573] rcu_scheduler_active = 2, debug_locks = 1
[  113.930052] 4 locks held by nload/1977:
[  113.932251]  #0: 0000000080b71e86 (&p->lock){+.+.}, at: seq_read+0x41/0x3d0
[  113.936115]  #1: 00000000cacff770 (&of->mutex){+.+.}, at: kernfs_seq_start+0x2a/0x90
[  113.940115]  #2: 00000000287c988f (kn->count#134){.+.+}, at: kernfs_seq_start+0x32/0x90
[  113.944292]  #3: 00000000996fa9cc (dev_base_lock){++.+}, at: netstat_show.isra.25+0x4a/0xb0
[  113.958076]
[  113.958076] stack backtrace:
[  113.958081] CPU: 3 PID: 1977 Comm: nload Not tainted 5.2.0+ #19
[  113.958083] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
[  113.958084] Call Trace:
[  113.958091]  dump_stack+0x67/0x90
[  113.973663]  netvsc_get_stats64+0x159/0x170 [hv_netvsc]
[  113.973663]  dev_get_stats+0x55/0xb0
[  113.973663]  netstat_show.isra.25+0x5b/0xb0
[  113.973663]  dev_attr_show+0x15/0x40
[  113.981661]  sysfs_kf_seq_show+0xad/0xf0
[  113.981661]  seq_read+0x146/0x3d0
[  113.981661]  vfs_read+0x9c/0x160
[  113.989025]  ksys_read+0x5c/0xd0
[  113.989025]  do_syscall_64+0x5e/0x220
[  113.989025]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  113.989025] RIP: 0033:0x7f4485daaf31

nload is a console application which monitors network traffic and bandwidth usage in real time.

The warning is caused by the rcu_dereference_rtnl() :

1239 static void netvsc_get_stats64(struct net_device *net,
1240                                struct rtnl_link_stats64 *t)
1241 {
1242         struct net_device_context *ndev_ctx = netdev_priv(net);
1243         struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);

I think here netvsc_get_stats64() neither holds rcu_read_lock() nor RTNL

IMO it should call rcu_read_lock()/unlock(), or get RTNL to fix the warning?

Thanks,
-- Dexuan


^ permalink raw reply

* Re: [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Sasha Levin @ 2019-08-05 18:36 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1564771954-9181-1-git-send-email-haiyangz@microsoft.com>

On Fri, Aug 02, 2019 at 06:52:56PM +0000, Haiyang Zhang wrote:
>Due to Azure host agent settings, the device instance ID's bytes 8 and 9
>are no longer unique. This causes some of the PCI devices not showing up
>in VMs with multiple passthrough devices, such as GPUs. So, as recommended
>by Azure host team, we now use the bytes 4 and 5 which usually provide
>unique numbers.
>
>In the rare cases of collision, we will detect and find another number
>that is not in use.
>Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.
>
>Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Acked-by: Sasha Levin <sashal@kernel.org>

Bjorn, will you take it through the PCI tree or do you want me to take
it through hyper-v?

--
Thanks,
Sasha

^ permalink raw reply

* RE: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-08-03  0:49 UTC (permalink / raw)
  To: David Miller
  Cc: Sunil Muthuswamy, netdev@vger.kernel.org, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, sashal@kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <20190802.172729.1656276508211556851.davem@davemloft.net>

> From: linux-hyperv-owner@vger.kernel.org
> Sent: Friday, August 2, 2019 5:27 PM
> ...
> Applied and queued up for -stable.
> 
> Do not ever CC: stable for networking patches, we submit to -stable manually.
 
Thanks, David!
I'll remember to not add the stable tag for network patches.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v2 net] hv_sock: Fix hang when a connection is closed
From: David Miller @ 2019-08-03  0:27 UTC (permalink / raw)
  To: decui
  Cc: sunilmut, netdev, kys, haiyangz, sthemmin, sashal, mikelley,
	linux-hyperv, linux-kernel, olaf, apw, jasowang, vkuznets,
	marcelo.cerri
In-Reply-To: <PU1P153MB01696DDD3A3F601370701DD2BFDF0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 31 Jul 2019 01:25:45 +0000

> 
> There is a race condition for an established connection that is being closed
> by the guest: the refcnt is 4 at the end of hvs_release() (Note: here the
> 'remove_sock' is false):
> 
> 1 for the initial value;
> 1 for the sk being in the bound list;
> 1 for the sk being in the connected list;
> 1 for the delayed close_work.
> 
> After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may*
> decrease the refcnt to 3.
> 
> Concurrently, hvs_close_connection() runs in another thread:
>   calls vsock_remove_sock() to decrease the refcnt by 2;
>   call sock_put() to decrease the refcnt to 0, and free the sk;
>   next, the "release_sock(sk)" may hang due to use-after-free.
> 
> In the above, after hvs_release() finishes, if hvs_close_connection() runs
> faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
> because at the beginning of hvs_close_connection(), the refcnt is still 4.
> 
> The issue can be resolved if an extra reference is taken when the
> connection is established.
> 
> Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Applied and queued up for -stable.

Do not ever CC: stable for networking patches, we submit to -stable manually.

Thank you.

^ permalink raw reply

* [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-02 22:50 UTC (permalink / raw)
  To: lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org, Michael Kelley, Stephen Hemminger
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com, Dexuan Cui


The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
---

Changes in v2:
  Improved the changelog accordign to the discussion with Bjorn Helgaas:
	  https://lkml.org/lkml/2019/8/1/1173
	  https://lkml.org/lkml/2019/8/2/1559

 drivers/pci/controller/pci-hyperv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 6b9cc6e60a..68c611d 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
 		/* Remove the bus from PCI's point of view. */
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
-		pci_remove_root_bus(hbus->pci_bus);
 		hv_pci_remove_slots(hbus);
+		pci_remove_root_bus(hbus->pci_bus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Bjorn Helgaas @ 2019-08-02 22:15 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Stephen Hemminger, lorenzo.pieralisi@arm.com,
	linux-pci@vger.kernel.org, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <PU1P153MB01698F51FE22C39086CC8353BFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Fri, Aug 02, 2019 at 08:31:26PM +0000, Dexuan Cui wrote:
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, August 2, 2019 12:41 PM
> > The subject line only describes the mechanical code change, which is
> > obvious from the patch.  It would be better if we could say something
> > about *why* we need this.
> 
> Hi Bjorn,
> Sorry. I'll try to write a better changelog in v2. :-)
>  
> > On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> > >
> > > When a slot is removed, the pci_dev must still exist.
> > >
> > > pci_remove_root_bus() removes and free all the pci_devs, so
> > > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > > otherwise a general protection fault can happen, if the kernel is built
> > 
> > "general protection fault" is an x86 term that doesn't really say what
> > the issue is.  I suspect this would be a "use-after-free" problem.
> 
> Yes, it's use-after-free. I'll fix the the wording.
>  
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > -		pci_remove_root_bus(hbus->pci_bus);
> > >  		hv_pci_remove_slots(hbus);
> > > +		pci_remove_root_bus(hbus->pci_bus);
> > 
> > I'm curious about why we need hv_pci_remove_slots() at all.  None of
> > the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> > anything similar to hv_pci_remove_slots().
> > 
> > Surely some of those callers also support slots, so there must be some
> > other path that calls pci_destroy_slot() in those cases.  Can we use a
> > similar strategy here?
> 
> Originally Stephen Heminger added the slot code for pci-hyperv.c:
> a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
> So he may know this better. My understanding is: we can not use the similar
> stragegy used in the 2 other users of pci_create_slot():
> 
> drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
> It looks drivers/pci/hotplug/ is quite different from pci-hyperv.c because
> pci-hyper-v uses a simple *private* hot-plug protocol, making it impossible
> to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().
> 
> drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
> the static "slot_list" list in the same file. Again, since pci-hyper-v uses a private
> PCI-device-discovery protocol (which is based on VMBus rather the emulated
> ACPI and PCI), acpi_pci_slot_enumerate() can not find the PCI devices that are
> discovered by pci-hyperv, so we can not use the standard register_slot() ->
> pci_create_slot() to create the slots and hence acpi_pci_slot_remove() -> 
> pci_destroy_slot() can not work for pci-hyperv.

Hmm, ok.  This still doesn't seem right to me, but I think the bottom
line will be that the current slot registration interfaces just don't
work quite right for all the cases we want them to.

Maybe it would be a good project for somebody to rethink them, but it
doesn't seem practical for *this* patch.  Thanks for looking into it
this far!

> I think I can use this as the v2 changelog:
> 
> The slot must be removed before the pci_dev is removed, otherwise a panic
> can happen due to use-after-free.

Sounds good.

Bjorn

^ permalink raw reply

* RE: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-02 20:31 UTC (permalink / raw)
  To: Bjorn Helgaas, Stephen Hemminger
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <20190802194053.GL151852@google.com>

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Friday, August 2, 2019 12:41 PM
> The subject line only describes the mechanical code change, which is
> obvious from the patch.  It would be better if we could say something
> about *why* we need this.

Hi Bjorn,
Sorry. I'll try to write a better changelog in v2. :-)

> On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> >
> > When a slot is removed, the pci_dev must still exist.
> >
> > pci_remove_root_bus() removes and free all the pci_devs, so
> > hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> > otherwise a general protection fault can happen, if the kernel is built
> 
> "general protection fault" is an x86 term that doesn't really say what
> the issue is.  I suspect this would be a "use-after-free" problem.

Yes, it's use-after-free. I'll fix the the wording.

> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
> >  		/* Remove the bus from PCI's point of view. */
> >  		pci_lock_rescan_remove();
> >  		pci_stop_root_bus(hbus->pci_bus);
> > -		pci_remove_root_bus(hbus->pci_bus);
> >  		hv_pci_remove_slots(hbus);
> > +		pci_remove_root_bus(hbus->pci_bus);
> 
> I'm curious about why we need hv_pci_remove_slots() at all.  None of
> the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
> anything similar to hv_pci_remove_slots().
> 
> Surely some of those callers also support slots, so there must be some
> other path that calls pci_destroy_slot() in those cases.  Can we use a
> similar strategy here?

Originally Stephen Heminger added the slot code for pci-hyperv.c:
a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
So he may know this better. My understanding is: we can not use the similar
stragegy used in the 2 other users of pci_create_slot():

drivers/pci/hotplug/pci_hotplug_core.c calls pci_create_slot().
It looks drivers/pci/hotplug/ is quite different from pci-hyperv.c because
pci-hyper-v uses a simple *private* hot-plug protocol, making it impossible
to use the API pci_hp_register() and pci_hp_destroy() -> pci_destroy_slot().

drivers/acpi/pci_slot.c calls pci_create_slot(), and saves the created slots in
the static "slot_list" list in the same file. Again, since pci-hyper-v uses a private
PCI-device-discovery protocol (which is based on VMBus rather the emulated
ACPI and PCI), acpi_pci_slot_enumerate() can not find the PCI devices that are
discovered by pci-hyperv, so we can not use the standard register_slot() ->
pci_create_slot() to create the slots and hence acpi_pci_slot_remove() -> 
pci_destroy_slot() can not work for pci-hyperv.

I think I can use this as the v2 changelog:

The slot must be removed before the pci_dev is removed, otherwise a panic
can happen due to use-after-free.

Thanks,
Dexuan

^ permalink raw reply

* Re: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Bjorn Helgaas @ 2019-08-02 19:40 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com, jackm@mellanox.com
In-Reply-To: <PU1P153MB0169DBCFEE7257F5BB93580ABFD90@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Hi Dexuan,

The subject line only describes the mechanical code change, which is
obvious from the patch.  It would be better if we could say something
about *why* we need this.

On Fri, Aug 02, 2019 at 01:32:28AM +0000, Dexuan Cui wrote:
> 
> When a slot is removed, the pci_dev must still exist.
> 
> pci_remove_root_bus() removes and free all the pci_devs, so
> hv_pci_remove_slots() must be called before pci_remove_root_bus(),
> otherwise a general protection fault can happen, if the kernel is built

"general protection fault" is an x86 term that doesn't really say what
the issue is.  I suspect this would be a "use-after-free" problem.

> with the memory debugging options.
> 
> Fixes: 15becc2b56c6 ("PCI: hv: Add hv_pci_remove_slots() when we unload the driver")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> 
> ---
> 
> When pci-hyperv is unloaded, this panic can happen:
> 
>  general protection fault:
>  CPU: 2 PID: 1091 Comm: rmmod Not tainted 5.2.0+
>  RIP: 0010:pci_slot_release+0x30/0xd0
>  Call Trace:
>   kobject_release+0x65/0x190
>   pci_destroy_slot+0x25/0x60
>   hv_pci_remove+0xec/0x110 [pci_hyperv]
>   vmbus_remove+0x20/0x30 [hv_vmbus]
>   device_release_driver_internal+0xd5/0x1b0
>   driver_detach+0x44/0x7c
>   bus_remove_driver+0x75/0xc7
>   vmbus_driver_unregister+0x50/0xbd [hv_vmbus]
>   __x64_sys_delete_module+0x136/0x200
>   do_syscall_64+0x5e/0x220
> 
>  drivers/pci/controller/pci-hyperv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 6b9cc6e60a..68c611d 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2757,8 +2757,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>  		/* Remove the bus from PCI's point of view. */
>  		pci_lock_rescan_remove();
>  		pci_stop_root_bus(hbus->pci_bus);
> -		pci_remove_root_bus(hbus->pci_bus);
>  		hv_pci_remove_slots(hbus);
> +		pci_remove_root_bus(hbus->pci_bus);

I'm curious about why we need hv_pci_remove_slots() at all.  None of
the other callers of pci_stop_root_bus() and pci_remove_root_bus() do
anything similar to hv_pci_remove_slots().

Surely some of those callers also support slots, so there must be some
other path that calls pci_destroy_slot() in those cases.  Can we use a
similar strategy here?

>  		pci_unlock_rescan_remove();
>  		hbus->state = hv_pcibus_removed;
>  	}
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH 2/3] drivers: hv: vmbus: add fuzz test attributes to sysfs
From: Branden Bonaby @ 2019-08-02 19:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, linux-kernel, kys, haiyangz, sthemmin, sashal
In-Reply-To: <87a7csggvj.fsf@vitty.brq.redhat.com>

On Fri, Aug 02, 2019 at 09:34:40AM +0200, Vitaly Kuznetsov wrote:
> Branden Bonaby <brandonbonaby94@gmail.com> writes:
> 
> > Expose the test parameters as part of the sysfs channel attributes.
> > We will control the testing state via these attributes.
> >
> > Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> > ---
> >  Documentation/ABI/stable/sysfs-bus-vmbus | 22 ++++++
> >  drivers/hv/vmbus_drv.c                   | 97 +++++++++++++++++++++++-
> >  2 files changed, 118 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
> > index 8e8d167eca31..239fcb6fdc75 100644
> > --- a/Documentation/ABI/stable/sysfs-bus-vmbus
> > +++ b/Documentation/ABI/stable/sysfs-bus-vmbus
> > @@ -185,3 +185,25 @@ Contact:        Michael Kelley <mikelley@microsoft.com>
> >  Description:    Total number of write operations that encountered an outbound
> >  		ring buffer full condition
> >  Users:          Debugging tools
> > +
> > +What:           /sys/bus/vmbus/devices/<UUID>/fuzz_test_state
> 
> I would prefer this to go under /sys/kernel/debug/ as this is clearly a
> debug/test feature.
> -- 
> Vitaly

Alright, it is testing so I see what you mean and why the code in 
this patch should go in debugfs. Will fix that and resend.

Branden Bonaby 

^ permalink raw reply

* Re: [PATCH 1/3] drivers: hv: vmbus: Introduce latency testing
From: Branden Bonaby @ 2019-08-02 19:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kys, haiyangz, sthemmin, sashal, linux-hyperv, linux-kernel
In-Reply-To: <87d0hoggyc.fsf@vitty.brq.redhat.com>

On Fri, Aug 02, 2019 at 09:32:59AM +0200, Vitaly Kuznetsov wrote:
> Branden Bonaby <brandonbonaby94@gmail.com> writes:
> 
> > Introduce user specified latency in the packet reception path.
> >
> > Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> > ---
> >  drivers/hv/connection.c  |  5 +++++
> >  drivers/hv/ring_buffer.c | 10 ++++++++++
> >  include/linux/hyperv.h   | 14 ++++++++++++++
> >  3 files changed, 29 insertions(+)
> >
> > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> > index 09829e15d4a0..2a2c22f5570e 100644
> > --- a/drivers/hv/connection.c
> > +++ b/drivers/hv/connection.c
> > @@ -354,9 +354,14 @@ void vmbus_on_event(unsigned long data)
> >  {
> >  	struct vmbus_channel *channel = (void *) data;
> >  	unsigned long time_limit = jiffies + 2;
> > +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> > +						channel :
> > +						channel->primary_channel;
> >  
> >  	trace_vmbus_on_event(channel);
> >  
> > +	if (unlikely(test_channel->fuzz_testing_buffer_delay > 0))
> > +		udelay(test_channel->fuzz_testing_buffer_delay);
> >  	do {
> >  		void (*callback_fn)(void *);
> >  
> > diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> > index 9a03b163cbbd..d7627c9023d6 100644
> > --- a/drivers/hv/ring_buffer.c
> > +++ b/drivers/hv/ring_buffer.c
> > @@ -395,7 +395,12 @@ struct vmpacket_descriptor *hv_pkt_iter_first(struct vmbus_channel *channel)
> >  {
> >  	struct hv_ring_buffer_info *rbi = &channel->inbound;
> >  	struct vmpacket_descriptor *desc;
> > +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> > +						channel :
> > +						channel->primary_channel;
> >  
> > +	if (unlikely(test_channel->fuzz_testing_message_delay > 0))
> > +		udelay(test_channel->fuzz_testing_message_delay);
> >  	if (hv_pkt_iter_avail(rbi) < sizeof(struct vmpacket_descriptor))
> >  		return NULL;
> >  
> > @@ -420,7 +425,12 @@ __hv_pkt_iter_next(struct vmbus_channel *channel,
> >  	struct hv_ring_buffer_info *rbi = &channel->inbound;
> >  	u32 packetlen = desc->len8 << 3;
> >  	u32 dsize = rbi->ring_datasize;
> > +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> > +						channel :
> > +						channel->primary_channel;
> 
> This pattern is repeated 3 times so a define is justified. I would also
> reversed the logic:
> 
>    test_channel = channel->primary_channel ? channel->primary_channel : channel;
> 
> >  
> > +	if (unlikely(test_channel->fuzz_testing_message_delay > 0))
> > +		udelay(test_channel->fuzz_testing_message_delay);
> 
> unlikely() is good but if it was under #ifdef it would've been even better.
> 
> >  	/* bump offset to next potential packet */
> -- 
> Vitaly

Makes sense, I'll address the repeated code and will change the way I
handled that if statement. Using an ifdef CONFIG_HYPERV_TESTING
seems like a good thing to add in here like you suggested.

^ permalink raw reply

* Re: [PATCH 0/3] hv: vmbus: add fuzz testing to hv devices
From: Branden Bonaby @ 2019-08-02 19:02 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, linux-kernel, kys, haiyangz, sthemmin, sashal
In-Reply-To: <87ftmkgh2t.fsf@vitty.brq.redhat.com>

On Fri, Aug 02, 2019 at 09:30:18AM +0200, Vitaly Kuznetsov wrote:
> Branden Bonaby <brandonbonaby94@gmail.com> writes:
> 
> > This patchset introduces a testing framework for Hyper-V drivers.
> > This framework allows us to introduce delays in the packet receive
> > path on a per-device basis. While the current code only supports 
> > introducing arbitrary delays in the host/guest communication path,
> > we intend to expand this to support error injection in the future.
> >
> > Branden Bonaby (3):
> >   drivers: hv: vmbus: Introduce latency testing
> >   drivers: hv: vmbus: add fuzz test attributes to sysfs
> >   tools: hv: add vmbus testing tool
> >
> >  Documentation/ABI/stable/sysfs-bus-vmbus |  22 ++
> >  drivers/hv/connection.c                  |   5 +
> >  drivers/hv/ring_buffer.c                 |  10 +
> >  drivers/hv/vmbus_drv.c                   |  97 ++++++-
> 
> Can we have something like CONFIG_HYPERV_TESTING and put this new 
> code under #ifdef?
> 
> >  include/linux/hyperv.h                   |  14 +
> >  tools/hv/vmbus_testing                   | 326 +++++++++++++++++++++++
> >  6 files changed, 473 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/hv/vmbus_testing
> 
> -- 
> Vitaly

You're right, it would be better to do it that way with ifdef's.
Will edit my patches and resend.

Branden Bonaby

^ permalink raw reply

* [PATCH] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-02 18:52 UTC (permalink / raw)
  To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, linux-kernel@vger.kernel.org

Due to Azure host agent settings, the device instance ID's bytes 8 and 9
are no longer unique. This causes some of the PCI devices not showing up
in VMs with multiple passthrough devices, such as GPUs. So, as recommended
by Azure host team, we now use the bytes 4 and 5 which usually provide
unique numbers.

In the rare cases of collision, we will detect and find another number
that is not in use.
Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/pci/controller/pci-hyperv.c | 91 +++++++++++++++++++++++++++++++------
 1 file changed, 78 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 82acd61..6b9cc6e60a 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -37,6 +37,8 @@
  * the PCI back-end driver in Hyper-V.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <linux/pci.h>
@@ -2507,6 +2509,47 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
 		complete(&hbus->remove_event);
 }
 
+#define HVPCI_DOM_MAP_SIZE (64 * 1024)
+static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
+
+/* PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
+ * as invalid for passthrough PCI devices of this driver.
+ */
+#define HVPCI_DOM_INVALID 0
+
+/**
+ * hv_get_dom_num() - Get a valid PCI domain number
+ * Check if the PCI domain number is in use, and return another number if
+ * it is in use.
+ *
+ * @dom: Requested domain number
+ *
+ * return: domain number on success, HVPCI_DOM_INVALID on failure
+ */
+static u16 hv_get_dom_num(u16 dom)
+{
+	unsigned int i;
+
+	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
+		return dom;
+
+	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
+		if (test_and_set_bit(i, hvpci_dom_map) == 0)
+			return i;
+	}
+
+	return HVPCI_DOM_INVALID;
+}
+
+/**
+ * hv_put_dom_num() - Mark the PCI domain number as free
+ * @dom: Domain number to be freed
+ */
+static void hv_put_dom_num(u16 dom)
+{
+	clear_bit(dom, hvpci_dom_map);
+}
+
 /**
  * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
  * @hdev:	VMBus's tracking struct for this root PCI bus
@@ -2518,6 +2561,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 			const struct hv_vmbus_device_id *dev_id)
 {
 	struct hv_pcibus_device *hbus;
+	u16 dom_req, dom;
 	int ret;
 
 	/*
@@ -2532,19 +2576,32 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus->state = hv_pcibus_init;
 
 	/*
-	 * The PCI bus "domain" is what is called "segment" in ACPI and
-	 * other specs.  Pull it from the instance ID, to get something
-	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
-	 * do the same thing for consistency.  Note that, since this code
-	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
-	 * that (1) the only domain in use for something that looks like
-	 * a physical PCI bus (which is actually emulated by the
-	 * hypervisor) is domain 0 and (2) there will be no overlap
-	 * between domains derived from these instance IDs in the same
-	 * VM.
+	 * The PCI bus "domain" is what is called "segment" in ACPI and other
+	 * specs. Pull it from the instance ID, to get something usually
+	 * unique. In rare cases of collision, we will find out another number
+	 * not in use.
+	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
+	 * together with this guest driver can guarantee that (1) The only
+	 * domain used by Gen1 VMs for something that looks like a physical
+	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
+	 * (2) There will be no overlap between domains (after fixing possible
+	 * collisions) in the same VM.
 	 */
-	hbus->sysdata.domain = hdev->dev_instance.b[9] |
-			       hdev->dev_instance.b[8] << 8;
+	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
+	dom = hv_get_dom_num(dom_req);
+
+	if (dom == HVPCI_DOM_INVALID) {
+		pr_err("Unable to use dom# 0x%hx or other numbers",
+		       dom_req);
+		ret = -EINVAL;
+		goto free_bus;
+	}
+
+	if (dom != dom_req)
+		pr_info("PCI dom# 0x%hx has collision, using 0x%hx",
+			dom_req, dom);
+
+	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
 	refcount_set(&hbus->remove_lock, 1);
@@ -2559,7 +2616,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
 		ret = -ENOMEM;
-		goto free_bus;
+		goto free_dom;
 	}
 
 	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
@@ -2636,6 +2693,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	vmbus_close(hdev->channel);
 destroy_wq:
 	destroy_workqueue(hbus->wq);
+free_dom:
+	hv_put_dom_num(hbus->sysdata.domain);
 free_bus:
 	free_page((unsigned long)hbus);
 	return ret;
@@ -2717,6 +2776,9 @@ static int hv_pci_remove(struct hv_device *hdev)
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
 	destroy_workqueue(hbus->wq);
+
+	hv_put_dom_num(hbus->sysdata.domain);
+
 	free_page((unsigned long)hbus);
 	return 0;
 }
@@ -2744,6 +2806,9 @@ static void __exit exit_hv_pci_drv(void)
 
 static int __init init_hv_pci_drv(void)
 {
+	/* Set the invalid domain number's bit, so it will not be used */
+	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
+
 	return vmbus_driver_register(&hv_pci_drv);
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Dexuan Cui @ 2019-08-02 18:13 UTC (permalink / raw)
  To: Sasha Levin, lorenzo.pieralisi@arm.com
  Cc: linux-hyperv@vger.kernel.org, stable@vger.kernel.org,
	stable@vger.kernel.org
In-Reply-To: <20190802180817.A206520578@mail.kernel.org>

> From: Sasha Levin <sashal@kernel.org>
> Sent: Friday, August 2, 2019 11:08 AM
> 
> Hi,
> 
> [This is an automated email]
> 
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 15becc2b56c6 PCI: hv: Add hv_pci_remove_slots() when we
> unload the driver.
> 
> The bot has tested the following trees: v5.2.5, v4.19.63, v4.14.135.
> 
> v5.2.5: Build OK!
> v4.19.63: Build OK!
> v4.14.135: Failed to apply! Possible dependencies:
>     Unable to calculate
> 
> NOTE: The patch will not be queued to stable trees until it is upstream.
> 
> How should we proceed with this patch?
> Sasha

I'm waiting the patch to be accepted into the pci tree first.
We do not need to do anything at this moment.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH 2/3] drivers: hv: vmbus: add fuzz test attributes to sysfs
From: Vitaly Kuznetsov @ 2019-08-02  7:34 UTC (permalink / raw)
  To: Branden Bonaby
  Cc: linux-hyperv, linux-kernel, kys, haiyangz, sthemmin, sashal
In-Reply-To: <20f96dba927eaa42fceeebfc7a6a37f3b1a9ee65.1564527684.git.brandonbonaby94@gmail.com>

Branden Bonaby <brandonbonaby94@gmail.com> writes:

> Expose the test parameters as part of the sysfs channel attributes.
> We will control the testing state via these attributes.
>
> Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> ---
>  Documentation/ABI/stable/sysfs-bus-vmbus | 22 ++++++
>  drivers/hv/vmbus_drv.c                   | 97 +++++++++++++++++++++++-
>  2 files changed, 118 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
> index 8e8d167eca31..239fcb6fdc75 100644
> --- a/Documentation/ABI/stable/sysfs-bus-vmbus
> +++ b/Documentation/ABI/stable/sysfs-bus-vmbus
> @@ -185,3 +185,25 @@ Contact:        Michael Kelley <mikelley@microsoft.com>
>  Description:    Total number of write operations that encountered an outbound
>  		ring buffer full condition
>  Users:          Debugging tools
> +
> +What:           /sys/bus/vmbus/devices/<UUID>/fuzz_test_state

I would prefer this to go under /sys/kernel/debug/ as this is clearly a
debug/test feature.


> +Date:           July 2019
> +KernelVersion:  5.2
> +Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
> +Description:    Fuzz testing status of a vmbus device, whether its in an ON
> +		 state or a OFF state
> +Users:          Debugging tools
> +
> +What:           /sys/bus/vmbus/devices/<UUID>/fuzz_test_buffer_delay
> +Date:           July 2019
> +KernelVersion:  5.2
> +Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
> +Description:    Fuzz testing buffer delay value between 0 - 1000
> +Users:          Debugging tools
> +
> +What:           /sys/bus/vmbus/devices/<UUID>/fuzz_test_message_delay
> +Date:           July 2019
> +KernelVersion:  5.2
> +Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
> +Description:    Fuzz testing message delay value between 0 - 1000
> +Users:          Debugging tools
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 92b1874b3eb3..0c71fd66ef81 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -22,7 +22,7 @@
>  #include <linux/clockchips.h>
>  #include <linux/cpu.h>
>  #include <linux/sched/task_stack.h>
> -
> +#include <linux/kernel.h>
>  #include <asm/mshyperv.h>
>  #include <linux/notifier.h>
>  #include <linux/ptrace.h>
> @@ -584,6 +584,98 @@ static ssize_t driver_override_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RW(driver_override);
>  
> +static ssize_t fuzz_test_state_store(struct device *dev,
> +				     struct device_attribute *attr,
> +				     const char *buf, size_t count)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +	int state;
> +	int delay = kstrtoint(buf, 0, &state);
> +
> +	if (delay)
> +		return count;
> +	if (state)
> +		channel->fuzz_testing_state = 1;
> +	else
> +		channel->fuzz_testing_state = 0;
> +	return count;
> +}
> +
> +static ssize_t fuzz_test_state_show(struct device *dev,
> +				    struct device_attribute *dev_attr,
> +				    char *buf)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +
> +	return sprintf(buf, "%u\n", channel->fuzz_testing_state);
> +}
> +static DEVICE_ATTR_RW(fuzz_test_state);
> +
> +static ssize_t fuzz_test_buffer_delay_store(struct device *dev,
> +					    struct device_attribute *attr,
> +					    const char *buf, size_t count)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +	int val;
> +	int delay = kstrtoint(buf, 0, &val);
> +
> +	if (delay)
> +		return count;
> +	if (val >= 1 && val <= 1000)
> +		channel->fuzz_testing_buffer_delay = val;
> +	/*Best to not use else statement here since we want
> +	 *the buffer delay to remain the same if val > 1000
> +	 */
> +	else if (val <= 0)
> +		channel->fuzz_testing_buffer_delay = 0;
> +	return count;
> +}
> +
> +static ssize_t fuzz_test_buffer_delay_show(struct device *dev,
> +					   struct device_attribute *dev_attr,
> +					   char *buf)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +
> +	return sprintf(buf, "%u\n", channel->fuzz_testing_buffer_delay);
> +}
> +static DEVICE_ATTR_RW(fuzz_test_buffer_delay);
> +
> +static ssize_t fuzz_test_message_delay_store(struct device *dev,
> +					     struct device_attribute *attr,
> +					     const char *buf, size_t count)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +	int val;
> +	int delay = kstrtoint(buf, 0, &val);
> +
> +	if (delay)
> +		return count;
> +	if (val >= 1 && val <= 1000)
> +		channel->fuzz_testing_message_delay = val;
> +	/*Best to not use else statement here since we want
> +	 *the message delay to remain the same if val > 1000
> +	 */
> +	else if (val <= 0)
> +		channel->fuzz_testing_message_delay = 0;
> +	return count;
> +}
> +
> +static ssize_t fuzz_test_message_delay_show(struct device *dev,
> +					    struct device_attribute *dev_attr,
> +					    char *buf)
> +{
> +	struct hv_device *hv_dev = device_to_hv_device(dev);
> +	struct vmbus_channel *channel = hv_dev->channel;
> +
> +	return sprintf(buf, "%u\n", channel->fuzz_testing_message_delay);
> +}
> +static DEVICE_ATTR_RW(fuzz_test_message_delay);
>  /* Set up per device attributes in /sys/bus/vmbus/devices/<bus device> */
>  static struct attribute *vmbus_dev_attrs[] = {
>  	&dev_attr_id.attr,
> @@ -615,6 +707,9 @@ static struct attribute *vmbus_dev_attrs[] = {
>  	&dev_attr_vendor.attr,
>  	&dev_attr_device.attr,
>  	&dev_attr_driver_override.attr,
> +	&dev_attr_fuzz_test_state.attr,
> +	&dev_attr_fuzz_test_buffer_delay.attr,
> +	&dev_attr_fuzz_test_message_delay.attr,
>  	NULL,
>  };

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH 1/3] drivers: hv: vmbus: Introduce latency testing
From: Vitaly Kuznetsov @ 2019-08-02  7:32 UTC (permalink / raw)
  To: Branden Bonaby, kys, haiyangz, sthemmin, sashal
  Cc: Branden Bonaby, linux-hyperv, linux-kernel
In-Reply-To: <18193677a879c402d00955c445ae7ce461b4198f.1564527684.git.brandonbonaby94@gmail.com>

Branden Bonaby <brandonbonaby94@gmail.com> writes:

> Introduce user specified latency in the packet reception path.
>
> Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> ---
>  drivers/hv/connection.c  |  5 +++++
>  drivers/hv/ring_buffer.c | 10 ++++++++++
>  include/linux/hyperv.h   | 14 ++++++++++++++
>  3 files changed, 29 insertions(+)
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 09829e15d4a0..2a2c22f5570e 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -354,9 +354,14 @@ void vmbus_on_event(unsigned long data)
>  {
>  	struct vmbus_channel *channel = (void *) data;
>  	unsigned long time_limit = jiffies + 2;
> +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> +						channel :
> +						channel->primary_channel;
>  
>  	trace_vmbus_on_event(channel);
>  
> +	if (unlikely(test_channel->fuzz_testing_buffer_delay > 0))
> +		udelay(test_channel->fuzz_testing_buffer_delay);
>  	do {
>  		void (*callback_fn)(void *);
>  
> diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
> index 9a03b163cbbd..d7627c9023d6 100644
> --- a/drivers/hv/ring_buffer.c
> +++ b/drivers/hv/ring_buffer.c
> @@ -395,7 +395,12 @@ struct vmpacket_descriptor *hv_pkt_iter_first(struct vmbus_channel *channel)
>  {
>  	struct hv_ring_buffer_info *rbi = &channel->inbound;
>  	struct vmpacket_descriptor *desc;
> +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> +						channel :
> +						channel->primary_channel;
>  
> +	if (unlikely(test_channel->fuzz_testing_message_delay > 0))
> +		udelay(test_channel->fuzz_testing_message_delay);
>  	if (hv_pkt_iter_avail(rbi) < sizeof(struct vmpacket_descriptor))
>  		return NULL;
>  
> @@ -420,7 +425,12 @@ __hv_pkt_iter_next(struct vmbus_channel *channel,
>  	struct hv_ring_buffer_info *rbi = &channel->inbound;
>  	u32 packetlen = desc->len8 << 3;
>  	u32 dsize = rbi->ring_datasize;
> +	struct vmbus_channel *test_channel = !channel->primary_channel ?
> +						channel :
> +						channel->primary_channel;

This pattern is repeated 3 times so a define is justified. I would also
reversed the logic:

   test_channel = channel->primary_channel ? channel->primary_channel : channel;

>  
> +	if (unlikely(test_channel->fuzz_testing_message_delay > 0))
> +		udelay(test_channel->fuzz_testing_message_delay);

unlikely() is good but if it was under #ifdef it would've been even better.

>  	/* bump offset to next potential packet */
>  	rbi->priv_read_index += packetlen + VMBUS_PKT_TRAILER;
>  	if (rbi->priv_read_index >= dsize)
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index 6256cc34c4a6..8d068956dd67 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -23,6 +23,7 @@
>  #include <linux/mod_devicetable.h>
>  #include <linux/interrupt.h>
>  #include <linux/reciprocal_div.h>
> +#include <linux/delay.h>
>  
>  #define MAX_PAGE_BUFFER_COUNT				32
>  #define MAX_MULTIPAGE_BUFFER_COUNT			32 /* 128K */
> @@ -926,6 +927,19 @@ struct vmbus_channel {
>  	 * full outbound ring buffer.
>  	 */
>  	u64 out_full_first;
> +
> +	/* enabling/disabling fuzz testing on the channel (default is false)*/
> +	bool fuzz_testing_state;
> +
> +	/* Buffer delay will delay the guest from emptying the ring buffer
> +	 * for a specific amount of time. The delay is in microseconds and will
> +	 * be between 1 to a maximum of 1000, its default is 0 (no delay).
> +	 * The  Message delay will delay guest reading on a per message basis
> +	 * in microseconds between 1 to 1000 with the default being 0
> +	 * (no delay).
> +	 */
> +	u32 fuzz_testing_buffer_delay;
> +	u32 fuzz_testing_message_delay;
>  };
>  
>  static inline bool is_hvsock_channel(const struct vmbus_channel *c)

-- 
Vitaly

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox