Linux-HyperV List
 help / color / mirror / Atom feed
* RE: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-10  1:04 UTC (permalink / raw)
  To: David Miller
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, linux-kernel@vger.kernel.org
In-Reply-To: <20190709.172936.1666884223446806217.davem@davemloft.net>



> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-
> owner@vger.kernel.org> On Behalf Of David Miller
> Sent: Tuesday, July 9, 2019 8:30 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next] Name NICs based on vmbus offer and enable
> async probe by default
> 
> 
> The net-next tree, if you are reading netdev today, has been closed.
I will re-submit when the tree re-opened. 
Thanks,
- Haiyang

^ permalink raw reply

* RE: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-10  1:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190709164714.70774c92@hermes.lan>



> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, July 9, 2019 7:47 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stephen
> Hemminger <sthemmin@microsoft.com>; olaf@aepfle.de; vkuznets
> <vkuznets@redhat.com>; davem@davemloft.net; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH net-next] Name NICs based on vmbus offer and enable
> async probe by default
> 
> On Tue, 9 Jul 2019 22:56:30 +0000
> Haiyang Zhang <haiyangz@microsoft.com> wrote:
> 
> > -				VRSS_CHANNEL_MAX);
> > +	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
> > +		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
> 
> What about PCI passthrough or VF devices that are also being probed and
> consuming the ethN names.  Won't there be a collision?

VF usually shows up a few seconds later than the synthetic NIC. Faster probing
will reduce the probability of collision.
Even if a collision happens, the code below will re-register the NIC with "eth%d":
+	if (ret == -EEXIST) {
+		pr_info("NIC name %s exists, request another name.\n",
+			net->name);
+		strlcpy(net->name, "eth%d", IFNAMSIZ);
+		ret = register_netdevice(net);
+	}

Thanks,
- Haiyang

^ permalink raw reply

* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: David Miller @ 2019-07-10  0:29 UTC (permalink / raw)
  To: haiyangz
  Cc: sashal, linux-hyperv, netdev, kys, sthemmin, olaf, vkuznets,
	linux-kernel
In-Reply-To: <1562712932-79936-1-git-send-email-haiyangz@microsoft.com>


The net-next tree, if you are reading netdev today, has been closed.

^ permalink raw reply

* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Stephen Hemminger @ 2019-07-09 23:47 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	olaf@aepfle.de, vkuznets, davem@davemloft.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <1562712932-79936-1-git-send-email-haiyangz@microsoft.com>

On Tue, 9 Jul 2019 22:56:30 +0000
Haiyang Zhang <haiyangz@microsoft.com> wrote:

> -				VRSS_CHANNEL_MAX);
> +	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
> +		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);

What about PCI passthrough or VF devices that are also being probed and
consuming the ethN names.  Won't there be a collision?

^ permalink raw reply

* [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: Haiyang Zhang @ 2019-07-09 22:56 UTC (permalink / raw)
  To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org

Previously the async probing caused NIC naming in random order.

The patch adds a dev_num field in vmbus channel structure. It’s assigned
to the first available number when the channel is offered. So netvsc can
use it for NIC naming based on channel offer sequence. Now we re-enable
the async probing mode by default for faster probing.

Also added a modules parameter, probe_type, to set sync probing mode if
a user wants to.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/hv/channel_mgmt.c       | 46 +++++++++++++++++++++++++++++++++++++++--
 drivers/net/hyperv/netvsc_drv.c | 33 ++++++++++++++++++++++++++---
 include/linux/hyperv.h          |  4 ++++
 3 files changed, 78 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index addcef5..ab7c05b 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -304,6 +304,8 @@ bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
 
 EXPORT_SYMBOL_GPL(vmbus_prep_negotiate_resp);
 
+#define HV_DEV_NUM_INVALID (-1)
+
 /*
  * alloc_channel - Allocate and initialize a vmbus channel object
  */
@@ -315,6 +317,8 @@ static struct vmbus_channel *alloc_channel(void)
 	if (!channel)
 		return NULL;
 
+	channel->dev_num = HV_DEV_NUM_INVALID;
+
 	spin_lock_init(&channel->lock);
 	init_completion(&channel->rescind_event);
 
@@ -533,6 +537,42 @@ static void vmbus_add_channel_work(struct work_struct *work)
 }
 
 /*
+ * Get the first available device number of its type, then
+ * record it in the channel structure.
+ */
+static void hv_set_devnum(struct vmbus_channel *newchannel)
+{
+	struct vmbus_channel *channel;
+	unsigned int i = 0;
+	bool found;
+
+	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
+
+	/* Only HV_NIC uses this number for now */
+	if (hv_get_dev_type(newchannel) != HV_NIC)
+		return;
+
+next:
+	found = false;
+
+	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
+		if (i == channel->dev_num &&
+		    guid_equal(&channel->offermsg.offer.if_type,
+			       &newchannel->offermsg.offer.if_type)) {
+			found = true;
+			break;
+		}
+	}
+
+	if (found) {
+		i++;
+		goto next;
+	}
+
+	newchannel->dev_num = i;
+}
+
+/*
  * vmbus_process_offer - Process the offer by creating a channel/device
  * associated with this offer
  */
@@ -561,10 +601,12 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		}
 	}
 
-	if (fnew)
+	if (fnew) {
+		hv_set_devnum(newchannel);
+
 		list_add_tail(&newchannel->listentry,
 			      &vmbus_connection.chn_list);
-	else {
+	} else {
 		/*
 		 * Check to see if this is a valid sub-channel.
 		 */
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index afdcc56..af53690 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -57,6 +57,10 @@
 module_param(debug, int, 0444);
 MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
 
+static unsigned int probe_type __ro_after_init = PROBE_PREFER_ASYNCHRONOUS;
+module_param(probe_type, uint, 0444);
+MODULE_PARM_DESC(probe_type, "Probe type: 1=async(default), 2=sync");
+
 static LIST_HEAD(netvsc_dev_list);
 
 static void netvsc_change_rx_flags(struct net_device *net, int change)
@@ -2233,10 +2237,19 @@ static int netvsc_probe(struct hv_device *dev,
 	struct net_device_context *net_device_ctx;
 	struct netvsc_device_info *device_info = NULL;
 	struct netvsc_device *nvdev;
+	char name[IFNAMSIZ];
 	int ret = -ENOMEM;
 
-	net = alloc_etherdev_mq(sizeof(struct net_device_context),
-				VRSS_CHANNEL_MAX);
+	if (probe_type == PROBE_PREFER_ASYNCHRONOUS) {
+		snprintf(name, IFNAMSIZ, "eth%d", dev->channel->dev_num);
+		net = alloc_netdev_mqs(sizeof(struct net_device_context), name,
+				       NET_NAME_ENUM, ether_setup,
+				       VRSS_CHANNEL_MAX, VRSS_CHANNEL_MAX);
+	} else {
+		net = alloc_etherdev_mq(sizeof(struct net_device_context),
+					VRSS_CHANNEL_MAX);
+	}
+
 	if (!net)
 		goto no_net;
 
@@ -2323,6 +2336,14 @@ static int netvsc_probe(struct hv_device *dev,
 		net->max_mtu = ETH_DATA_LEN;
 
 	ret = register_netdevice(net);
+
+	if (ret == -EEXIST) {
+		pr_info("NIC name %s exists, request another name.\n",
+			net->name);
+		strlcpy(net->name, "eth%d", IFNAMSIZ);
+		ret = register_netdevice(net);
+	}
+
 	if (ret != 0) {
 		pr_err("Unable to register netdev.\n");
 		goto register_failed;
@@ -2407,7 +2428,7 @@ static int netvsc_remove(struct hv_device *dev)
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
 	.driver = {
-		.probe_type = PROBE_FORCE_SYNCHRONOUS,
+		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
 	},
 };
 
@@ -2473,6 +2494,12 @@ static int __init netvsc_drv_init(void)
 	}
 	netvsc_ring_bytes = ring_size * PAGE_SIZE;
 
+	if (probe_type != PROBE_PREFER_ASYNCHRONOUS)
+		probe_type = PROBE_FORCE_SYNCHRONOUS;
+
+	netvsc_drv.driver.probe_type = probe_type;
+	pr_info("probe_type: %u\n", probe_type);
+
 	ret = vmbus_driver_register(&netvsc_drv);
 	if (ret)
 		return ret;
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..12fc5ea 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -841,6 +841,10 @@ struct vmbus_channel {
 	 */
 	struct vmbus_channel *primary_channel;
 	/*
+	 * Used for device naming based on channel offer sequence.
+	 */
+	int dev_num;
+	/*
 	 * Support per-channel state for use by vmbus drivers.
 	 */
 	void *per_channel_state;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 3/7] Drivers: hv: vmbus: Split hv_synic_init/cleanup into regs and timer settings
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

There is only one functional change: the unnecessary check
"if (sctrl.enable != 1) return -EFAULT;" is removed, because when we're in
hv_synic_cleanup(), we're absolutely sure sctrl.enable must be 1.

The new functions hv_synic_disable/enable_regs() will be used by a later patch
to support hibernation.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/hv.c           | 66 ++++++++++++++++++++++++++---------------------
 drivers/hv/hyperv_vmbus.h |  2 ++
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 6188fb7..fcc5279 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -154,7 +154,7 @@ void hv_synic_free(void)
  * retrieve the initialized message and event pages.  Otherwise, we create and
  * initialize the message and event pages.
  */
-int hv_synic_init(unsigned int cpu)
+void hv_synic_enable_regs(unsigned int cpu)
 {
 	struct hv_per_cpu_context *hv_cpu
 		= per_cpu_ptr(hv_context.cpu_context, cpu);
@@ -196,6 +196,11 @@ int hv_synic_init(unsigned int cpu)
 	sctrl.enable = 1;
 
 	hv_set_synic_state(sctrl.as_uint64);
+}
+
+int hv_synic_init(unsigned int cpu)
+{
+	hv_synic_enable_regs(cpu);
 
 	hv_stimer_init(cpu);
 
@@ -205,20 +210,45 @@ int hv_synic_init(unsigned int cpu)
 /*
  * hv_synic_cleanup - Cleanup routine for hv_synic_init().
  */
-int hv_synic_cleanup(unsigned int cpu)
+void hv_synic_disable_regs(unsigned int cpu)
 {
 	union hv_synic_sint shared_sint;
 	union hv_synic_simp simp;
 	union hv_synic_siefp siefp;
 	union hv_synic_scontrol sctrl;
+
+	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+	shared_sint.masked = 1;
+
+	/* Need to correctly cleanup in the case of SMP!!! */
+	/* Disable the interrupt */
+	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
+
+	hv_get_simp(simp.as_uint64);
+	simp.simp_enabled = 0;
+	simp.base_simp_gpa = 0;
+
+	hv_set_simp(simp.as_uint64);
+
+	hv_get_siefp(siefp.as_uint64);
+	siefp.siefp_enabled = 0;
+	siefp.base_siefp_gpa = 0;
+
+	hv_set_siefp(siefp.as_uint64);
+
+	/* Disable the global synic bit */
+	hv_get_synic_state(sctrl.as_uint64);
+	sctrl.enable = 0;
+	hv_set_synic_state(sctrl.as_uint64);
+}
+
+int hv_synic_cleanup(unsigned int cpu)
+{
 	struct vmbus_channel *channel, *sc;
 	bool channel_found = false;
 	unsigned long flags;
 
-	hv_get_synic_state(sctrl.as_uint64);
-	if (sctrl.enable != 1)
-		return -EFAULT;
-
 	/*
 	 * Search for channels which are bound to the CPU we're about to
 	 * cleanup. In case we find one and vmbus is still connected we need to
@@ -249,29 +279,7 @@ int hv_synic_cleanup(unsigned int cpu)
 
 	hv_stimer_cleanup(cpu);
 
-	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	shared_sint.masked = 1;
-
-	/* Need to correctly cleanup in the case of SMP!!! */
-	/* Disable the interrupt */
-	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
-
-	hv_get_simp(simp.as_uint64);
-	simp.simp_enabled = 0;
-	simp.base_simp_gpa = 0;
-
-	hv_set_simp(simp.as_uint64);
-
-	hv_get_siefp(siefp.as_uint64);
-	siefp.siefp_enabled = 0;
-	siefp.base_siefp_gpa = 0;
-
-	hv_set_siefp(siefp.as_uint64);
-
-	/* Disable the global synic bit */
-	sctrl.enable = 0;
-	hv_set_synic_state(sctrl.as_uint64);
+	hv_synic_disable_regs(cpu);
 
 	return 0;
 }
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 362e70e..26ea161 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -171,8 +171,10 @@ extern int hv_post_message(union hv_connection_id connection_id,
 
 extern void hv_synic_free(void);
 
+extern void hv_synic_enable_regs(unsigned int cpu);
 extern int hv_synic_init(unsigned int cpu);
 
+extern void hv_synic_disable_regs(unsigned int cpu);
 extern int hv_synic_cleanup(unsigned int cpu);
 
 /* Interface */
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 4/7] Drivers: hv: vmbus: Suspend/resume the synic for hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

This is needed when we resume the old kernel from the "current" kernel.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 72d5a7c..1c2d935 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -30,6 +30,7 @@
 #include <linux/kdebug.h>
 #include <linux/efi.h>
 #include <linux/random.h>
+#include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 #include "hyperv_vmbus.h"
 
@@ -2088,6 +2089,41 @@ static void hv_crash_handler(struct pt_regs *regs)
 	hyperv_cleanup();
 };
 
+static int hv_synic_suspend(void)
+{
+	/*
+	 * Here we only need to care about CPU0: when the other CPUs are
+	 * offlined, hv_synic_cleanup() has been called for them, and the
+	 * timers on them have been automatically disabled and deleted in
+	 * tick_cleanup_dead_cpu().
+	 */
+	hv_stimer_cleanup(0);
+
+	hv_synic_disable_regs(0);
+
+	return 0;
+}
+
+static void hv_synic_resume(void)
+{
+	/*
+	 * Here we only need to care about CPU0: when the other CPUs are
+	 * onlined, hv_synic_init() has been called, and the timers are
+	 * added there.
+	 *
+	 * Note: we don't need to call hv_stimer_init() for stimer0, because
+	 * it is not deleted before hibernation and it's resumed in
+	 * timekeeping_resume().
+	 */
+	hv_synic_enable_regs(0);
+}
+
+/* The callbacks run only on CPU0, with irqs_disabled. */
+static struct syscore_ops hv_synic_syscore_ops = {
+	.suspend = hv_synic_suspend,
+	.resume = hv_synic_resume,
+};
+
 static int __init hv_acpi_init(void)
 {
 	int ret, t;
@@ -2118,6 +2154,8 @@ static int __init hv_acpi_init(void)
 	hv_setup_kexec_handler(hv_kexec_handler);
 	hv_setup_crash_handler(hv_crash_handler);
 
+	register_syscore_ops(&hv_synic_syscore_ops);
+
 	return 0;
 
 cleanup:
@@ -2130,6 +2168,8 @@ static void __exit vmbus_exit(void)
 {
 	int cpu;
 
+	unregister_syscore_ops(&hv_synic_syscore_ops);
+
 	hv_remove_kexec_handler();
 	hv_remove_crash_handler();
 	vmbus_connection.conn_state = DISCONNECTED;
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 5/7] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

When the VM resumes, the host re-sends the offers. We should not add the
offers to the global vmbus_connection.chn_list again.

Added some debug code, in case the host screws up the exact info related to
the offers.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/channel_mgmt.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index addcef5..a9aeeab 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -854,12 +854,38 @@ void vmbus_initiate_unload(bool crash)
 static void vmbus_onoffer(struct vmbus_channel_message_header *hdr)
 {
 	struct vmbus_channel_offer_channel *offer;
-	struct vmbus_channel *newchannel;
+	struct vmbus_channel *oldchannel, *newchannel;
+	size_t offer_sz;
 
 	offer = (struct vmbus_channel_offer_channel *)hdr;
 
 	trace_vmbus_onoffer(offer);
 
+	mutex_lock(&vmbus_connection.channel_mutex);
+	oldchannel = relid2channel(offer->child_relid);
+	mutex_unlock(&vmbus_connection.channel_mutex);
+
+	if (oldchannel != NULL) {
+		atomic_dec(&vmbus_connection.offer_in_progress);
+
+		/*
+		 * We're resuming from hibernation: we expect the host to send
+		 * exactly the same offers that we had before the hibernation.
+		 */
+		offer_sz = sizeof(*offer);
+		if (memcmp(offer, &oldchannel->offermsg, offer_sz) == 0)
+			return;
+
+		pr_err("Mismatched offer from the host (relid=%d)!\n",
+		       offer->child_relid);
+
+		print_hex_dump_debug("Old vmbus offer: ", DUMP_PREFIX_OFFSET, 4,
+				     4, &oldchannel->offermsg, offer_sz, false);
+		print_hex_dump_debug("New vmbus offer: ", DUMP_PREFIX_OFFSET, 4,
+				     4, offer, offer_sz, false);
+		return;
+	}
+
 	/* Allocate the channel object and save this offer. */
 	newchannel = alloc_channel();
 	if (!newchannel) {
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 7/7] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

The high-level VSC drivers will implement device-specific callbacks.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/vmbus_drv.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/hyperv.h |  3 +++
 2 files changed, 45 insertions(+)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 1730e7b..e29e2171 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -911,6 +911,43 @@ static void vmbus_shutdown(struct device *child_device)
 		drv->shutdown(dev);
 }
 
+/*
+ * vmbus_suspend - Suspend a vmbus device
+ */
+static int vmbus_suspend(struct device *child_device)
+{
+	struct hv_driver *drv;
+	struct hv_device *dev = device_to_hv_device(child_device);
+
+	/* The device may not be attached yet */
+	if (!child_device->driver)
+		return 0;
+
+	drv = drv_to_hv_drv(child_device->driver);
+	if (!drv->suspend)
+		return -EOPNOTSUPP;
+
+	return drv->suspend(dev);
+}
+
+/*
+ * vmbus_resume - Resume a vmbus device
+ */
+static int vmbus_resume(struct device *child_device)
+{
+	struct hv_driver *drv;
+	struct hv_device *dev = device_to_hv_device(child_device);
+
+	/* The device may not be attached yet */
+	if (!child_device->driver)
+		return 0;
+
+	drv = drv_to_hv_drv(child_device->driver);
+	if (!drv->resume)
+		return -EOPNOTSUPP;
+
+	return drv->resume(dev);
+}
 
 /*
  * vmbus_device_release - Final callback release of the vmbus child device
@@ -926,6 +963,10 @@ static void vmbus_device_release(struct device *device)
 	kfree(hv_dev);
 }
 
+static const struct dev_pm_ops vmbus_pm = {
+	SET_SYSTEM_SLEEP_PM_OPS(vmbus_suspend, vmbus_resume)
+};
+
 /* The one and only one */
 static struct bus_type  hv_bus = {
 	.name =		"vmbus",
@@ -936,6 +977,7 @@ static void vmbus_device_release(struct device *device)
 	.uevent =		vmbus_uevent,
 	.dev_groups =		vmbus_dev_groups,
 	.drv_groups =		vmbus_drv_groups,
+	.pm =			&vmbus_pm,
 };
 
 struct onmessage_work_context {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 6256cc3..94443c4 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1149,6 +1149,9 @@ struct hv_driver {
 	int (*remove)(struct hv_device *);
 	void (*shutdown)(struct hv_device *);
 
+	int (*suspend)(struct hv_device *);
+	int (*resume)(struct hv_device *);
+
 };
 
 /* Base device object */
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 6/7] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

This is needed when we resume the old kernel from the "current" kernel.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/hv/connection.c   |  3 +--
 drivers/hv/hyperv_vmbus.h |  2 ++
 drivers/hv/vmbus_drv.c    | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 09829e1..806319c 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -59,8 +59,7 @@ static __u32 vmbus_get_next_version(__u32 current_version)
 	}
 }
 
-static int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo,
-					__u32 version)
+int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
 	int ret = 0;
 	unsigned int cur_cpu;
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 26ea161..e657197 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -274,6 +274,8 @@ struct vmbus_msginfo {
 
 extern struct vmbus_connection vmbus_connection;
 
+int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version);
+
 static inline void vmbus_send_interrupt(u32 relid)
 {
 	sync_set_bit(relid, vmbus_connection.send_int_page);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 1c2d935..1730e7b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2045,6 +2045,41 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
+static int vmbus_bus_suspend(struct device *dev)
+{
+	vmbus_initiate_unload(false);
+
+	vmbus_connection.conn_state = DISCONNECTED;
+
+	return 0;
+}
+
+static int vmbus_bus_resume(struct device *dev)
+{
+	struct vmbus_channel_msginfo *msginfo;
+	size_t msgsize;
+	int ret;
+
+	msgsize = sizeof(*msginfo) +
+		  sizeof(struct vmbus_channel_initiate_contact);
+
+	msginfo = kzalloc(msgsize, GFP_KERNEL);
+
+	if (msginfo == NULL)
+		return -ENOMEM;
+
+	ret = vmbus_negotiate_version(msginfo, vmbus_proto_version);
+
+	kfree(msginfo);
+
+	if (ret != 0)
+		return ret;
+
+	vmbus_request_offers();
+
+	return 0;
+}
+
 static const struct acpi_device_id vmbus_acpi_device_ids[] = {
 	{"VMBUS", 0},
 	{"VMBus", 0},
@@ -2052,6 +2087,10 @@ static int vmbus_acpi_add(struct acpi_device *device)
 };
 MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);
 
+static const struct dev_pm_ops vmbus_bus_pm = {
+	SET_SYSTEM_SLEEP_PM_OPS(vmbus_bus_suspend, vmbus_bus_resume)
+};
+
 static struct acpi_driver vmbus_acpi_driver = {
 	.name = "vmbus",
 	.ids = vmbus_acpi_device_ids,
@@ -2059,6 +2098,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 		.add = vmbus_acpi_add,
 		.remove = vmbus_acpi_remove,
 	},
+	.drv.pm = &vmbus_bus_pm,
 };
 
 static void hv_kexec_handler(void)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 2/7] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

This is needed for hibernation, e.g. when we resume the old kernel, we need
to disable the "current" kernel's TSC page and then resume the old kernel's.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/clocksource/hyperv_timer.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e6..41c31a7 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -237,12 +237,37 @@ static u64 read_hv_clock_tsc(struct clocksource *arg)
 	return read_hv_sched_clock_tsc();
 }
 
+static void suspend_hv_clock_tsc(struct clocksource *arg)
+{
+	u64 tsc_msr;
+
+	/* Disable the TSC page */
+	hv_get_reference_tsc(tsc_msr);
+	tsc_msr &= ~BIT_ULL(0);
+	hv_set_reference_tsc(tsc_msr);
+}
+
+
+static void resume_hv_clock_tsc(struct clocksource *arg)
+{
+	phys_addr_t phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+	u64 tsc_msr;
+
+	/* Re-enable the TSC page */
+	hv_get_reference_tsc(tsc_msr);
+	tsc_msr &= GENMASK_ULL(11, 0);
+	tsc_msr |= BIT_ULL(0) | (u64)phys_addr;
+	hv_set_reference_tsc(tsc_msr);
+}
+
 static struct clocksource hyperv_cs_tsc = {
 	.name	= "hyperv_clocksource_tsc_page",
 	.rating	= 400,
 	.read	= read_hv_clock_tsc,
 	.mask	= CLOCKSOURCE_MASK(64),
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
+	.suspend= suspend_hv_clock_tsc,
+	.resume	= resume_hv_clock_tsc,
 };
 #endif
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 1/7] x86/hyper-v: Suspend/resume the hypercall page for hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui
In-Reply-To: <1562650084-99874-1-git-send-email-decui@microsoft.com>

This is needed for hibernation, e.g. when we resume the old kernel, we need
to disable the "current" kernel's hypercall page and then resume the old
kernel's.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0e033ef..3005871 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include <linux/hyperv.h>
 #include <linux/slab.h>
 #include <linux/cpuhotplug.h>
+#include <linux/syscore_ops.h>
 #include <clocksource/hyperv_timer.h>
 
 void *hv_hypercall_pg;
@@ -214,6 +215,34 @@ static int __init hv_pci_init(void)
 	return 1;
 }
 
+static int hv_suspend(void)
+{
+	union hv_x64_msr_hypercall_contents hypercall_msr;
+
+	/* Reset the hypercall page */
+	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+	hypercall_msr.enable = 0;
+	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+
+	return 0;
+}
+
+static void hv_resume(void)
+{
+	union hv_x64_msr_hypercall_contents hypercall_msr;
+
+	/* Re-enable the hypercall page */
+	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+	hypercall_msr.enable = 1;
+	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
+	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
+}
+
+static struct syscore_ops hv_syscore_ops = {
+	.suspend = hv_suspend,
+	.resume = hv_resume,
+};
+
 /*
  * This function is to be invoked early in the boot sequence after the
  * hypervisor has been detected.
@@ -294,6 +323,9 @@ void __init hyperv_init(void)
 
 	/* Register Hyper-V specific clocksource */
 	hv_init_clocksource();
+
+	register_syscore_ops(&hv_syscore_ops);
+
 	return;
 
 remove_cpuhp_state:
@@ -313,6 +345,8 @@ void hyperv_cleanup(void)
 {
 	union hv_x64_msr_hypercall_contents hypercall_msr;
 
+	unregister_syscore_ops(&hv_syscore_ops);
+
 	/* Reset our OS id */
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 0/7] Enhance the hv_vmbus driver to support hibernation
From: Dexuan Cui @ 2019-07-09  5:29 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui

Hi,
This is the first patchset to enable hibernation when Linux VM runs on Hyper-V.

A second patchset to enhance the high-level VSC drivers (hv_netvsc,
hv_storvsc, etc.) for hibernation will be posted later. The second patchset
depends on this first patchset, so I hope this pathset can be accepted soon.

The changes in this patchset are mostly straightforward new code that only
runs when the VM enters hibernation, so IMHO it's pretty safe to have this
patchset, because the hibernation feature never worked for Linux VM running
on Hyper-V.

The patchset is rebased on Linus's tree today.

Please review.

Thanks,
Dexuan

Dexuan Cui (7):
  x86/hyper-v: Suspend/resume the hypercall page for hibernation
  clocksource/drivers: Suspend/resume Hyper-V clocksource for
    hibernation
  Drivers: hv: vmbus: Split hv_synic_init/cleanup into regs and timer
    settings
  Drivers: hv: vmbus: Suspend/resume the synic for hibernation
  Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
  Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
  Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for
    hibernation

 arch/x86/hyperv/hv_init.c          |  34 +++++++++++
 drivers/clocksource/hyperv_timer.c |  25 ++++++++
 drivers/hv/channel_mgmt.c          |  28 ++++++++-
 drivers/hv/connection.c            |   3 +-
 drivers/hv/hv.c                    |  66 +++++++++++---------
 drivers/hv/hyperv_vmbus.h          |   4 ++
 drivers/hv/vmbus_drv.c             | 122 +++++++++++++++++++++++++++++++++++++
 include/linux/hyperv.h             |   3 +
 8 files changed, 253 insertions(+), 32 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
From: Randy Dunlap @ 2019-07-09  0:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Haiyang Zhang, LKML, linux-pci, Stephen Hemminger, Matthew Wilcox,
	Jake Oshins, KY Srinivasan, Sasha Levin, Bjorn Helgaas,
	linux-hyperv@vger.kernel.org, Dexuan Cui, Yuehaibing
In-Reply-To: <20190708080527.138e18e9@hermes.lan>

On 7/8/19 8:05 AM, Stephen Hemminger wrote:
> On Sun, 7 Jul 2019 10:46:22 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
> 
>> On 7/3/19 11:06 AM, Haiyang Zhang wrote:
>>>
>>>   
>>>> -----Original Message-----
>>>> From: Randy Dunlap <rdunlap@infradead.org>
>>>> Sent: Wednesday, July 3, 2019 12:59 PM
>>>> To: LKML <linux-kernel@vger.kernel.org>; linux-pci <linux-  
>>>> pci@vger.kernel.org>  
>>>> Cc: Matthew Wilcox <willy@infradead.org>; Jake Oshins
>>>> <jakeo@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang
>>>> Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>>>> <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>; Bjorn
>>>> Helgaas <bhelgaas@google.com>; linux-hyperv@vger.kernel.org; Dexuan
>>>> Cui <decui@microsoft.com>; Yuehaibing <yuehaibing@huawei.com>
>>>> Subject: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
>>>>
>>>> From: Randy Dunlap <rdunlap@infradead.org>
>>>>
>>>> Fix build of drivers/pci/controller/pci-hyperv.o when
>>>> CONFIG_SYSFS is not set/enabled by adding stubs for
>>>> pci_create_slot() and pci_destroy_slot().
>>>>
>>>> Fixes these build errors:
>>>>
>>>> ERROR: "pci_destroy_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
>>>> ERROR: "pci_create_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
>>>>
>>>> Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot
>>>> information")
>>>>
>>>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
>>>> Cc: Matthew Wilcox <willy@infradead.org>
>>>> Cc: Jake Oshins <jakeo@microsoft.com>
>>>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>>>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>>>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>>>> Cc: Sasha Levin <sashal@kernel.org>
>>>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>>>> Cc: linux-pci@vger.kernel.org
>>>> Cc: linux-hyperv@vger.kernel.org
>>>> Cc: Dexuan Cui <decui@microsoft.com>
>>>> Cc: Yuehaibing <yuehaibing@huawei.com>
>>>> ---
>>>> v2:
>>>> - provide non-CONFIG_SYSFS stubs for pci_create_slot() and
>>>>   pci_destroy_slot() [suggested by Matthew Wilcox <willy@infradead.org>]
>>>> - use the correct Fixes: tag [Dexuan Cui <decui@microsoft.com>]
>>>>
>>>>  include/linux/pci.h |   12 ++++++++++--
>>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>>
>>>> --- lnx-52-rc7.orig/include/linux/pci.h
>>>> +++ lnx-52-rc7/include/linux/pci.h
>>>> @@ -25,6 +25,7 @@
>>>>  #include <linux/ioport.h>
>>>>  #include <linux/list.h>
>>>>  #include <linux/compiler.h>
>>>> +#include <linux/err.h>
>>>>  #include <linux/errno.h>
>>>>  #include <linux/kobject.h>
>>>>  #include <linux/atomic.h>
>>>> @@ -947,14 +948,21 @@ int pci_scan_root_bus_bridge(struct pci_
>>>>  struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev
>>>> *dev,
>>>>  				int busnr);
>>>>  void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
>>>> +#ifdef CONFIG_SYSFS
>>>> +void pci_dev_assign_slot(struct pci_dev *dev);
>>>>  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>>>>  				 const char *name,
>>>>  				 struct hotplug_slot *hotplug);
>>>>  void pci_destroy_slot(struct pci_slot *slot);
>>>> -#ifdef CONFIG_SYSFS
>>>> -void pci_dev_assign_slot(struct pci_dev *dev);
>>>>  #else
>>>>  static inline void pci_dev_assign_slot(struct pci_dev *dev) { }
>>>> +static inline struct pci_slot *pci_create_slot(struct pci_bus *parent,
>>>> +					       int slot_nr,
>>>> +					       const char *name,
>>>> +					       struct hotplug_slot *hotplug) {
>>>> +	return ERR_PTR(-EINVAL);
>>>> +}
>>>> +static inline void pci_destroy_slot(struct pci_slot *slot) { }
>>>>  #endif
>>>>  int pci_scan_slot(struct pci_bus *bus, int devfn);
>>>>  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>>>>  
>>>
>>> The serial number in slot info is used to match VF NIC with Synthetic NIC.
>>> Without selecting SYSFS, the SRIOV feature will fail on VM on Hyper-V and
>>> Azure. The first version of this patch should be used.
>>>
>>> @Stephen Hemminger how do you think?
> 
> Haiyang is right, accelerated networking won't work if slot is not recorded.
> 
> So the original patch (to depend on SYSFS) or using "select SYSFS" is
> are necessary.

Thanks, I'll resend that one with the corrected Fixes: tag.

> The whole thing is a bit of "angels dancing on the head of a pin" because
> there is no good reason to build kernel without SYSFS in real world.
> It would just be looking for trouble. As far as I can tell it is all
> about getting "make randconfig" to work in more cases.

You could submit a patch to remove the SYSFS kconfig entry.  :)

cheers.

-- 
~Randy

^ permalink raw reply

* Re: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
From: Stephen Hemminger @ 2019-07-08 15:05 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Haiyang Zhang, LKML, linux-pci, Stephen Hemminger, Matthew Wilcox,
	Jake Oshins, KY Srinivasan, Sasha Levin, Bjorn Helgaas,
	linux-hyperv@vger.kernel.org, Dexuan Cui, Yuehaibing
In-Reply-To: <c1261e6d-84c3-4874-5c32-a3988c5a85d6@infradead.org>

On Sun, 7 Jul 2019 10:46:22 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 7/3/19 11:06 AM, Haiyang Zhang wrote:
> > 
> >   
> >> -----Original Message-----
> >> From: Randy Dunlap <rdunlap@infradead.org>
> >> Sent: Wednesday, July 3, 2019 12:59 PM
> >> To: LKML <linux-kernel@vger.kernel.org>; linux-pci <linux-  
> >> pci@vger.kernel.org>  
> >> Cc: Matthew Wilcox <willy@infradead.org>; Jake Oshins
> >> <jakeo@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang
> >> Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> >> <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>; Bjorn
> >> Helgaas <bhelgaas@google.com>; linux-hyperv@vger.kernel.org; Dexuan
> >> Cui <decui@microsoft.com>; Yuehaibing <yuehaibing@huawei.com>
> >> Subject: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
> >>
> >> From: Randy Dunlap <rdunlap@infradead.org>
> >>
> >> Fix build of drivers/pci/controller/pci-hyperv.o when
> >> CONFIG_SYSFS is not set/enabled by adding stubs for
> >> pci_create_slot() and pci_destroy_slot().
> >>
> >> Fixes these build errors:
> >>
> >> ERROR: "pci_destroy_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
> >> ERROR: "pci_create_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
> >>
> >> Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot
> >> information")
> >>
> >> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
> >> Cc: Matthew Wilcox <willy@infradead.org>
> >> Cc: Jake Oshins <jakeo@microsoft.com>
> >> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> >> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> >> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> >> Cc: Sasha Levin <sashal@kernel.org>
> >> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >> Cc: linux-pci@vger.kernel.org
> >> Cc: linux-hyperv@vger.kernel.org
> >> Cc: Dexuan Cui <decui@microsoft.com>
> >> Cc: Yuehaibing <yuehaibing@huawei.com>
> >> ---
> >> v2:
> >> - provide non-CONFIG_SYSFS stubs for pci_create_slot() and
> >>   pci_destroy_slot() [suggested by Matthew Wilcox <willy@infradead.org>]
> >> - use the correct Fixes: tag [Dexuan Cui <decui@microsoft.com>]
> >>
> >>  include/linux/pci.h |   12 ++++++++++--
> >>  1 file changed, 10 insertions(+), 2 deletions(-)
> >>
> >> --- lnx-52-rc7.orig/include/linux/pci.h
> >> +++ lnx-52-rc7/include/linux/pci.h
> >> @@ -25,6 +25,7 @@
> >>  #include <linux/ioport.h>
> >>  #include <linux/list.h>
> >>  #include <linux/compiler.h>
> >> +#include <linux/err.h>
> >>  #include <linux/errno.h>
> >>  #include <linux/kobject.h>
> >>  #include <linux/atomic.h>
> >> @@ -947,14 +948,21 @@ int pci_scan_root_bus_bridge(struct pci_
> >>  struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev
> >> *dev,
> >>  				int busnr);
> >>  void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
> >> +#ifdef CONFIG_SYSFS
> >> +void pci_dev_assign_slot(struct pci_dev *dev);
> >>  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
> >>  				 const char *name,
> >>  				 struct hotplug_slot *hotplug);
> >>  void pci_destroy_slot(struct pci_slot *slot);
> >> -#ifdef CONFIG_SYSFS
> >> -void pci_dev_assign_slot(struct pci_dev *dev);
> >>  #else
> >>  static inline void pci_dev_assign_slot(struct pci_dev *dev) { }
> >> +static inline struct pci_slot *pci_create_slot(struct pci_bus *parent,
> >> +					       int slot_nr,
> >> +					       const char *name,
> >> +					       struct hotplug_slot *hotplug) {
> >> +	return ERR_PTR(-EINVAL);
> >> +}
> >> +static inline void pci_destroy_slot(struct pci_slot *slot) { }
> >>  #endif
> >>  int pci_scan_slot(struct pci_bus *bus, int devfn);
> >>  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
> >>  
> > 
> > The serial number in slot info is used to match VF NIC with Synthetic NIC.
> > Without selecting SYSFS, the SRIOV feature will fail on VM on Hyper-V and
> > Azure. The first version of this patch should be used.
> > 
> > @Stephen Hemminger how do you think?

Haiyang is right, accelerated networking won't work if slot is not recorded.

So the original patch (to depend on SYSFS) or using "select SYSFS" is
are necessary.

The whole thing is a bit of "angels dancing on the head of a pin" because
there is no good reason to build kernel without SYSFS in real world.
It would just be looking for trouble. As far as I can tell it is all
about getting "make randconfig" to work in more cases.

 

^ permalink raw reply

* RE: [PATCH] x86/hyper-v: Zero out the VP assist page to fix CPU offlining
From: Michael Kelley @ 2019-07-08  1:41 UTC (permalink / raw)
  To: Dexuan Cui, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	Sasha Levin, linux-hyperv@vger.kernel.org, Long Li, vkuznets,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	x86@kernel.org
  Cc: linux-kernel@vger.kernel.org, marcelo.cerri@canonical.com,
	driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
	apw@canonical.com, jasowang@redhat.com
In-Reply-To: <PU1P153MB01697CBE66649B4BA91D8B48BFFA0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com> Sent: Wednesday, July 3, 2019 6:46 PM
> 
> When a CPU is being offlined, the CPU usually still receives a few
> interrupts (e.g. reschedule IPIs), after hv_cpu_die() disables the
> HV_X64_MSR_VP_ASSIST_PAGE, so hv_apic_eoi_write() may not write the EOI
> MSR, if the apic_assist field's bit0 happens to be 1; as a result, Hyper-V
> may not be able to deliver all the interrupts to the CPU, and the CPU may
> not be stopped, and the kernel will hang soon.
> 
> The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
> 5.2.1 "GPA Overlay Pages"), so with this fix we're sure the apic_assist
> field is still zero, after the VP ASSIST PAGE is disabled.
> 
> Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0e033ef11a9f..db51a301f759 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -60,8 +60,14 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> +	/*
> +	 * The ZERO flag is necessary, because in the case of CPU offlining
> +	 * the page can still be used by hv_apic_eoi_write() for a while,
> +	 * after the VP ASSIST PAGE is disabled in hv_cpu_die().
> +	 */
>  	if (!*hvp)
> -		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> +		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
> +				 PAGE_KERNEL);
> 
>  	if (*hvp) {
>  		u64 val;
> --
> 2.19.1

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>


^ permalink raw reply

* Re: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
From: Randy Dunlap @ 2019-07-07 17:46 UTC (permalink / raw)
  To: Haiyang Zhang, LKML, linux-pci, Stephen Hemminger
  Cc: Matthew Wilcox, Jake Oshins, KY Srinivasan, Sasha Levin,
	Bjorn Helgaas, linux-hyperv@vger.kernel.org, Dexuan Cui,
	Yuehaibing, Stephen Hemminger
In-Reply-To: <DM6PR21MB13373F2B76558930CC368E17CAFB0@DM6PR21MB1337.namprd21.prod.outlook.com>

On 7/3/19 11:06 AM, Haiyang Zhang wrote:
> 
> 
>> -----Original Message-----
>> From: Randy Dunlap <rdunlap@infradead.org>
>> Sent: Wednesday, July 3, 2019 12:59 PM
>> To: LKML <linux-kernel@vger.kernel.org>; linux-pci <linux-
>> pci@vger.kernel.org>
>> Cc: Matthew Wilcox <willy@infradead.org>; Jake Oshins
>> <jakeo@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang
>> Zhang <haiyangz@microsoft.com>; Stephen Hemminger
>> <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>; Bjorn
>> Helgaas <bhelgaas@google.com>; linux-hyperv@vger.kernel.org; Dexuan
>> Cui <decui@microsoft.com>; Yuehaibing <yuehaibing@huawei.com>
>> Subject: [PATCH v2] PCI: hv: fix pci-hyperv build when SYSFS not enabled
>>
>> From: Randy Dunlap <rdunlap@infradead.org>
>>
>> Fix build of drivers/pci/controller/pci-hyperv.o when
>> CONFIG_SYSFS is not set/enabled by adding stubs for
>> pci_create_slot() and pci_destroy_slot().
>>
>> Fixes these build errors:
>>
>> ERROR: "pci_destroy_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
>> ERROR: "pci_create_slot" [drivers/pci/controller/pci-hyperv.ko] undefined!
>>
>> Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot
>> information")
>>
>> Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Jake Oshins <jakeo@microsoft.com>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: linux-pci@vger.kernel.org
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: Dexuan Cui <decui@microsoft.com>
>> Cc: Yuehaibing <yuehaibing@huawei.com>
>> ---
>> v2:
>> - provide non-CONFIG_SYSFS stubs for pci_create_slot() and
>>   pci_destroy_slot() [suggested by Matthew Wilcox <willy@infradead.org>]
>> - use the correct Fixes: tag [Dexuan Cui <decui@microsoft.com>]
>>
>>  include/linux/pci.h |   12 ++++++++++--
>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> --- lnx-52-rc7.orig/include/linux/pci.h
>> +++ lnx-52-rc7/include/linux/pci.h
>> @@ -25,6 +25,7 @@
>>  #include <linux/ioport.h>
>>  #include <linux/list.h>
>>  #include <linux/compiler.h>
>> +#include <linux/err.h>
>>  #include <linux/errno.h>
>>  #include <linux/kobject.h>
>>  #include <linux/atomic.h>
>> @@ -947,14 +948,21 @@ int pci_scan_root_bus_bridge(struct pci_
>>  struct pci_bus *pci_add_new_bus(struct pci_bus *parent, struct pci_dev
>> *dev,
>>  				int busnr);
>>  void pcie_update_link_speed(struct pci_bus *bus, u16 link_status);
>> +#ifdef CONFIG_SYSFS
>> +void pci_dev_assign_slot(struct pci_dev *dev);
>>  struct pci_slot *pci_create_slot(struct pci_bus *parent, int slot_nr,
>>  				 const char *name,
>>  				 struct hotplug_slot *hotplug);
>>  void pci_destroy_slot(struct pci_slot *slot);
>> -#ifdef CONFIG_SYSFS
>> -void pci_dev_assign_slot(struct pci_dev *dev);
>>  #else
>>  static inline void pci_dev_assign_slot(struct pci_dev *dev) { }
>> +static inline struct pci_slot *pci_create_slot(struct pci_bus *parent,
>> +					       int slot_nr,
>> +					       const char *name,
>> +					       struct hotplug_slot *hotplug) {
>> +	return ERR_PTR(-EINVAL);
>> +}
>> +static inline void pci_destroy_slot(struct pci_slot *slot) { }
>>  #endif
>>  int pci_scan_slot(struct pci_bus *bus, int devfn);
>>  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>>
> 
> The serial number in slot info is used to match VF NIC with Synthetic NIC.
> Without selecting SYSFS, the SRIOV feature will fail on VM on Hyper-V and
> Azure. The first version of this patch should be used.
> 
> @Stephen Hemminger how do you think?
> 
> Thanks,
> - Haiyang


Hi Stephen,

Please comment on this patch or v1.
v1:  https://lore.kernel.org/lkml/69c25bc3-da00-2758-92ee-13c82b51fc45@infradead.org/


thanks.
-- 
~Randy

^ permalink raw reply

* Re: [PATCH] ACPI: PM: Fix "multiple definition of acpi_sleep_state_supported" for ARM64
From: Rafael J. Wysocki @ 2019-07-06  7:53 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Pavel Machek, Michael Kelley, linux-acpi@vger.kernel.org,
	rjw@rjwysocki.net, lenb@kernel.org, robert.moore@intel.com,
	erik.schmauss@intel.com, Russell King, Russ Dill,
	Sebastian Capella, Lorenzo Pieralisi,
	Russell King - ARM Linux admin, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	Haiyang Zhang, Sasha Levin, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB0169731042EFE4D6B08F04A5BFF50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Fri, Jul 5, 2019 at 10:18 PM Dexuan Cui <decui@microsoft.com> wrote:
>
>
> If CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT is not set, the dummy version of
> the function should be static.
>
> Fixes: 1e2c3f0f1e93 ("ACPI: PM: Make acpi_sleep_state_supported() non-static")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> ---
>
> Sorry for not doing it right in the previous patch!
>
> The patch fixes the build errors on ARM64:
>
>    drivers/net/ethernet/qualcomm/emac/emac-phy.o: In function `acpi_sleep_state_supported':
> >> emac-phy.c:(.text+0x1d8): multiple definition of `acpi_sleep_state_supported'
>    drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here
>    drivers/net/ethernet/qualcomm/emac/emac-sgmii.o: In function `acpi_sleep_state_supported':
>    emac-sgmii.c:(.text+0x548): multiple definition of `acpi_sleep_state_supported'
>    drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here
>
>
>  include/acpi/acpi_bus.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 4ce59bdc852e..8ffc4acf2b56 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -657,7 +657,7 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
>  #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
>  bool acpi_sleep_state_supported(u8 sleep_state);
>  #else
> -bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
> +static bool acpi_sleep_state_supported(u8 sleep_state) { return false; }

This should be static inline even.

I've reapplied the original patch with this change folded in.

Thanks!

^ permalink raw reply

* [PATCH] ACPI: PM: Fix "multiple definition of acpi_sleep_state_supported" for ARM64
From: Dexuan Cui @ 2019-07-05 20:18 UTC (permalink / raw)
  To: Pavel Machek, Michael Kelley, linux-acpi@vger.kernel.org,
	rjw@rjwysocki.net, lenb@kernel.org, robert.moore@intel.com,
	erik.schmauss@intel.com, Russell King, Russ Dill,
	Sebastian Capella, Lorenzo Pieralisi,
	Russell King - ARM Linux admin
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com


If CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT is not set, the dummy version of
the function should be static.

Fixes: 1e2c3f0f1e93 ("ACPI: PM: Make acpi_sleep_state_supported() non-static")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reported-by: kbuild test robot <lkp@intel.com>
---

Sorry for not doing it right in the previous patch!

The patch fixes the build errors on ARM64:

   drivers/net/ethernet/qualcomm/emac/emac-phy.o: In function `acpi_sleep_state_supported':
>> emac-phy.c:(.text+0x1d8): multiple definition of `acpi_sleep_state_supported'
   drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here
   drivers/net/ethernet/qualcomm/emac/emac-sgmii.o: In function `acpi_sleep_state_supported':
   emac-sgmii.c:(.text+0x548): multiple definition of `acpi_sleep_state_supported'
   drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here


 include/acpi/acpi_bus.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 4ce59bdc852e..8ffc4acf2b56 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -657,7 +657,7 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
 #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
 bool acpi_sleep_state_supported(u8 sleep_state);
 #else
-bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
+static bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
 #endif
 
 #ifdef CONFIG_ACPI_SLEEP
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH v2] PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
From: Lorenzo Pieralisi @ 2019-07-05 13:41 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-pci@vger.kernel.org, bhelgaas@google.com, Haiyang Zhang,
	KY Srinivasan, Stephen Hemminger, Sasha Levin,
	linux-hyperv@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Michael Kelley, Lili Deng (Wicresoft North America Ltd),
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org
In-Reply-To: <PU1P153MB0169D420EAB61757DF4B337FBFE70@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Fri, Jun 21, 2019 at 11:45:23PM +0000, Dexuan Cui wrote:
> 
> The commit 05f151a73ec2 itself is correct, but it exposes this
> use-after-free bug, which is caught by some memory debug options.
> 
> Add a Fixes tag to indicate the dependency.
> 
> Fixes: 05f151a73ec2 ("PCI: hv: Fix a memory leak in hv_eject_device_work()")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
> 
> In v2:
> Replaced "hpdev->hbus" with "hbus", since we have the new "hbus" variable. [Michael Kelley]
> 
>  drivers/pci/controller/pci-hyperv.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)

Applied to pci/hv for v5.3, thanks.

Lorenzo

> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 808a182830e5..5dadc964ad3b 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1880,6 +1880,7 @@ static void hv_pci_devices_present(struct hv_pcibus_device *hbus,
>  static void hv_eject_device_work(struct work_struct *work)
>  {
>  	struct pci_eject_response *ejct_pkt;
> +	struct hv_pcibus_device *hbus;
>  	struct hv_pci_dev *hpdev;
>  	struct pci_dev *pdev;
>  	unsigned long flags;
> @@ -1890,6 +1891,7 @@ static void hv_eject_device_work(struct work_struct *work)
>  	} ctxt;
>  
>  	hpdev = container_of(work, struct hv_pci_dev, wrk);
> +	hbus = hpdev->hbus;
>  
>  	WARN_ON(hpdev->state != hv_pcichild_ejecting);
>  
> @@ -1900,8 +1902,7 @@ static void hv_eject_device_work(struct work_struct *work)
>  	 * because hbus->pci_bus may not exist yet.
>  	 */
>  	wslot = wslot_to_devfn(hpdev->desc.win_slot.slot);
> -	pdev = pci_get_domain_bus_and_slot(hpdev->hbus->sysdata.domain, 0,
> -					   wslot);
> +	pdev = pci_get_domain_bus_and_slot(hbus->sysdata.domain, 0, wslot);
>  	if (pdev) {
>  		pci_lock_rescan_remove();
>  		pci_stop_and_remove_bus_device(pdev);
> @@ -1909,9 +1910,9 @@ static void hv_eject_device_work(struct work_struct *work)
>  		pci_unlock_rescan_remove();
>  	}
>  
> -	spin_lock_irqsave(&hpdev->hbus->device_list_lock, flags);
> +	spin_lock_irqsave(&hbus->device_list_lock, flags);
>  	list_del(&hpdev->list_entry);
> -	spin_unlock_irqrestore(&hpdev->hbus->device_list_lock, flags);
> +	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
>  
>  	if (hpdev->pci_slot)
>  		pci_destroy_slot(hpdev->pci_slot);
> @@ -1920,7 +1921,7 @@ static void hv_eject_device_work(struct work_struct *work)
>  	ejct_pkt = (struct pci_eject_response *)&ctxt.pkt.message;
>  	ejct_pkt->message_type.type = PCI_EJECTION_COMPLETE;
>  	ejct_pkt->wslot.slot = hpdev->desc.win_slot.slot;
> -	vmbus_sendpacket(hpdev->hbus->hdev->channel, ejct_pkt,
> +	vmbus_sendpacket(hbus->hdev->channel, ejct_pkt,
>  			 sizeof(*ejct_pkt), (unsigned long)&ctxt.pkt,
>  			 VM_PKT_DATA_INBAND, 0);
>  
> @@ -1929,7 +1930,9 @@ static void hv_eject_device_work(struct work_struct *work)
>  	/* For the two refs got in new_pcichild_device() */
>  	put_pcichild(hpdev);
>  	put_pcichild(hpdev);
> -	put_hvpcibus(hpdev->hbus);
> +	/* hpdev has been freed. Do not use it any more. */
> +
> +	put_hvpcibus(hbus);
>  }
>  
>  /**
> -- 
> 2.17.1
> 

^ permalink raw reply

* Re: [PATCH] ACPI: PM: Make acpi_sleep_state_supported() non-static
From: Rafael J. Wysocki @ 2019-07-05  9:40 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Pavel Machek, Michael Kelley, linux-acpi@vger.kernel.org,
	lenb@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
	Russell King, Russ Dill, Sebastian Capella, Lorenzo Pieralisi,
	Russell King - ARM Linux admin, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, KY Srinivasan, Stephen Hemminger,
	Haiyang Zhang, Sasha Levin, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB0169A260911AACDA861F0029BFFA0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thursday, July 4, 2019 4:43:32 AM CEST Dexuan Cui wrote:
> 
> With some upcoming patches to save/restore the Hyper-V drivers related
> states, a Linux VM running on Hyper-V will be able to hibernate. When
> a Linux VM hibernates, unluckily we must disable the memory hot-add/remove
> and balloon up/down capabilities in the hv_balloon driver
> (drivers/hv/hv_balloon.c), because these can not really work according to
> the design of the related back-end driver on the host.
> 
> By default, Hyper-V does not enable the virtual ACPI S4 state for a VM;
> on recent Hyper-V hosts, the administrator is able to enable the virtual
> ACPI S4 state for a VM, so we hope to use the presence of the virtual ACPI
> S4 state as a hint for hv_balloon to disable the aforementioned
> capabilities. In this way, hibernation will work more reliably, from the
> user's perspective.
> 
> By marking acpi_sleep_state_supported() non-static, we'll be able to
> implement a hv_is_hibernation_supported() API in the always-built-in
> module arch/x86/hyperv/hv_init.c, and the API will be called by hv_balloon.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> Previously I posted a version that tries to export the function:
> https://lkml.org/lkml/2019/6/14/1077, which may be an overkill.
> 
> So I proposed a second patch (which covers this patch and shows how this
> patch will be used): https://lkml.org/lkml/2019/6/19/861
> 
> I explained the situation in detail here: https://lkml.org/lkml/2019/6/21/63
> (a correction: old Hyper-V hosts can support guest hibernation, but some
> important functionalities in the host's management tool stack are missing).
> 
> There is no further reply in that discussion, so I'm sending this patch to
> draw people's attention again. :-)
> 
>  drivers/acpi/sleep.c    | 2 +-
>  include/acpi/acpi_bus.h | 6 ++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
> index 8ff08e531443..d1ff303a857a 100644
> --- a/drivers/acpi/sleep.c
> +++ b/drivers/acpi/sleep.c
> @@ -77,7 +77,7 @@ static int acpi_sleep_prepare(u32 acpi_state)
>  	return 0;
>  }
>  
> -static bool acpi_sleep_state_supported(u8 sleep_state)
> +bool acpi_sleep_state_supported(u8 sleep_state)
>  {
>  	acpi_status status;
>  	u8 type_a, type_b;
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 31b6c87d6240..3e6563e1a2c0 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -651,6 +651,12 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
>  }
>  #endif
>  
> +#ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
> +bool acpi_sleep_state_supported(u8 sleep_state);
> +#else
> +bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
> +#endif
> +
>  #ifdef CONFIG_ACPI_SLEEP
>  u32 acpi_target_system_state(void);
>  #else
> 

Applied, thanks!




^ permalink raw reply

* [PATCH] ACPI: PM: Make acpi_sleep_state_supported() non-static
From: Dexuan Cui @ 2019-07-04  2:43 UTC (permalink / raw)
  To: Pavel Machek, Michael Kelley, linux-acpi@vger.kernel.org,
	rjw@rjwysocki.net, lenb@kernel.org, robert.moore@intel.com,
	erik.schmauss@intel.com, Russell King, Russ Dill,
	Sebastian Capella, Lorenzo Pieralisi,
	Russell King - ARM Linux admin
  Cc: linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, Haiyang Zhang, Sasha Levin,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com


With some upcoming patches to save/restore the Hyper-V drivers related
states, a Linux VM running on Hyper-V will be able to hibernate. When
a Linux VM hibernates, unluckily we must disable the memory hot-add/remove
and balloon up/down capabilities in the hv_balloon driver
(drivers/hv/hv_balloon.c), because these can not really work according to
the design of the related back-end driver on the host.

By default, Hyper-V does not enable the virtual ACPI S4 state for a VM;
on recent Hyper-V hosts, the administrator is able to enable the virtual
ACPI S4 state for a VM, so we hope to use the presence of the virtual ACPI
S4 state as a hint for hv_balloon to disable the aforementioned
capabilities. In this way, hibernation will work more reliably, from the
user's perspective.

By marking acpi_sleep_state_supported() non-static, we'll be able to
implement a hv_is_hibernation_supported() API in the always-built-in
module arch/x86/hyperv/hv_init.c, and the API will be called by hv_balloon.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Previously I posted a version that tries to export the function:
https://lkml.org/lkml/2019/6/14/1077, which may be an overkill.

So I proposed a second patch (which covers this patch and shows how this
patch will be used): https://lkml.org/lkml/2019/6/19/861

I explained the situation in detail here: https://lkml.org/lkml/2019/6/21/63
(a correction: old Hyper-V hosts can support guest hibernation, but some
important functionalities in the host's management tool stack are missing).

There is no further reply in that discussion, so I'm sending this patch to
draw people's attention again. :-)

 drivers/acpi/sleep.c    | 2 +-
 include/acpi/acpi_bus.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/sleep.c b/drivers/acpi/sleep.c
index 8ff08e531443..d1ff303a857a 100644
--- a/drivers/acpi/sleep.c
+++ b/drivers/acpi/sleep.c
@@ -77,7 +77,7 @@ static int acpi_sleep_prepare(u32 acpi_state)
 	return 0;
 }
 
-static bool acpi_sleep_state_supported(u8 sleep_state)
+bool acpi_sleep_state_supported(u8 sleep_state)
 {
 	acpi_status status;
 	u8 type_a, type_b;
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 31b6c87d6240..3e6563e1a2c0 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -651,6 +651,12 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
 }
 #endif
 
+#ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
+bool acpi_sleep_state_supported(u8 sleep_state);
+#else
+bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
+#endif
+
 #ifdef CONFIG_ACPI_SLEEP
 u32 acpi_target_system_state(void);
 #else
-- 
2.19.1


^ permalink raw reply related

* [PATCH v3] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-07-03  2:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Zhenzhong Duan, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Juergen Gross, Boris Ostrovsky,
	Peter Zijlstra, Waiman Long, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, linux-hyperv

With the boot parameter "hv_nopvspin" specified a Hyperv guest should
not make use of paravirt spinlocks, but behave as if running on bare
metal. This is not true, however, as the qspinlock code will fall back
to a test-and-set scheme when it is detecting a hypervisor.

In order to avoid this disable the virt_spin_lock_key.

Same change for XEN is already in Commit e6fd28eb3522
("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-hyperv@vger.kernel.org
---
v3: remove unlikely() as suggested by Sasha

 arch/x86/hyperv/hv_spinlock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..210495b 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (!hv_pvspin)
+		static_branch_disable(&virt_spin_lock_key);
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] x86/hyper-v: Zero out the VP assist page to fix CPU offlining
From: Dexuan Cui @ 2019-07-04  1:45 UTC (permalink / raw)
  To: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, Sasha Levin,
	linux-hyperv@vger.kernel.org, Michael Kelley, Long Li, vkuznets,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	x86@kernel.org
  Cc: linux-kernel@vger.kernel.org, marcelo.cerri@canonical.com,
	driverdev-devel@linuxdriverproject.org, olaf@aepfle.de,
	apw@canonical.com, jasowang@redhat.com

When a CPU is being offlined, the CPU usually still receives a few
interrupts (e.g. reschedule IPIs), after hv_cpu_die() disables the
HV_X64_MSR_VP_ASSIST_PAGE, so hv_apic_eoi_write() may not write the EOI
MSR, if the apic_assist field's bit0 happens to be 1; as a result, Hyper-V
may not be able to deliver all the interrupts to the CPU, and the CPU may
not be stopped, and the kernel will hang soon.

The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
5.2.1 "GPA Overlay Pages"), so with this fix we're sure the apic_assist
field is still zero, after the VP ASSIST PAGE is disabled.

Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 arch/x86/hyperv/hv_init.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0e033ef11a9f..db51a301f759 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -60,8 +60,14 @@ static int hv_cpu_init(unsigned int cpu)
 	if (!hv_vp_assist_page)
 		return 0;
 
+	/*
+	 * The ZERO flag is necessary, because in the case of CPU offlining
+	 * the page can still be used by hv_apic_eoi_write() for a while,
+	 * after the VP ASSIST PAGE is disabled in hv_cpu_die().
+	 */
 	if (!*hvp)
-		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
+				 PAGE_KERNEL);
 
 	if (*hvp) {
 		u64 val;
-- 
2.19.1


^ permalink raw reply related

* Re: [Xen-devel] [PATCH v2 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-07-03 18:09 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Sasha Levin, linux-hyperv@vger.kernel.org,
	the arch/x86 maintainers, Stephen Hemminger, kvm list,
	Peter Zijlstra, Haiyang Zhang, Dave Hansen,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org, Ingo Molnar,
	Borislav Petkov, Andy Lutomirski, Paolo Bonzini, xen-devel,
	Thomas Gleixner, K. Y. Srinivasan, Boris Ostrovsky
In-Reply-To: <6038042c-917f-d361-5d79-f0205152fe00@citrix.com>

> On Jul 3, 2019, at 10:43 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> 
> On 03/07/2019 18:02, Nadav Amit wrote:
>>> On Jul 3, 2019, at 7:04 AM, Juergen Gross <jgross@suse.com> wrote:
>>> 
>>> On 03.07.19 01:51, Nadav Amit wrote:
>>>> To improve TLB shootdown performance, flush the remote and local TLBs
>>>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>>>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>>>> and hyper-v are only compile-tested).
>>>> While the updated smp infrastructure is capable of running a function on
>>>> a single local core, it is not optimized for this case. The multiple
>>>> function calls and the indirect branch introduce some overhead, and
>>>> might make local TLB flushes slower than they were before the recent
>>>> changes.
>>>> Before calling the SMP infrastructure, check if only a local TLB flush
>>>> is needed to restore the lost performance in this common case. This
>>>> requires to check mm_cpumask() one more time, but unless this mask is
>>>> updated very frequently, this should impact performance negatively.
>>>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>>>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>>>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>>>> Cc: Sasha Levin <sashal@kernel.org>
>>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>>> Cc: Ingo Molnar <mingo@redhat.com>
>>>> Cc: Borislav Petkov <bp@alien8.de>
>>>> Cc: x86@kernel.org
>>>> Cc: Juergen Gross <jgross@suse.com>
>>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>>> Cc: Andy Lutomirski <luto@kernel.org>
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>>>> Cc: linux-hyperv@vger.kernel.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> Cc: virtualization@lists.linux-foundation.org
>>>> Cc: kvm@vger.kernel.org
>>>> Cc: xen-devel@lists.xenproject.org
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> arch/x86/hyperv/mmu.c                 | 13 +++---
>>>> arch/x86/include/asm/paravirt.h       |  6 +--
>>>> arch/x86/include/asm/paravirt_types.h |  4 +-
>>>> arch/x86/include/asm/tlbflush.h       |  9 ++--
>>>> arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>>> arch/x86/kernel/kvm.c                 | 11 +++--
>>>> arch/x86/kernel/paravirt.c            |  2 +-
>>>> arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
>>>> arch/x86/xen/mmu_pv.c                 | 20 ++++++---
>>>> include/trace/events/xen.h            |  2 +-
>>>> 10 files changed, 91 insertions(+), 43 deletions(-)
>>> ...
>>> 
>>>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>>>> index beb44e22afdf..19e481e6e904 100644
>>>> --- a/arch/x86/xen/mmu_pv.c
>>>> +++ b/arch/x86/xen/mmu_pv.c
>>>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>>> 	preempt_enable();
>>>> }
>>>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>> -				 const struct flush_tlb_info *info)
>>>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>>>> +				const struct flush_tlb_info *info)
>>>> {
>>>> 	struct {
>>>> 		struct mmuext_op op;
>>>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>> 	const size_t mc_entry_size = sizeof(args->op) +
>>>> 		sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>>>> -	trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>>>> +	trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>>>   	if (cpumask_empty(cpus))
>>>> 		return;		/* nothing to do */
>>>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>>> 	args = mcs.args;
>>>> 	args->op.arg2.vcpumask = to_cpumask(args->mask);
>>>> -	/* Remove us, and any offline CPUS. */
>>>> +	/* Flush locally if needed and remove us */
>>>> +	if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>>>> +		local_irq_disable();
>>>> +		flush_tlb_func_local(info);
>>> I think this isn't the correct function for PV guests.
>>> 
>>> In fact it should be much easier: just don't clear the own cpu from the
>>> mask, that's all what's needed. The hypervisor is just fine having the
>>> current cpu in the mask and it will do the right thing.
>> Thanks. I will do so in v3. I don’t think Hyper-V people would want to do
>> the same, unfortunately, since it would induce VM-exit on TLB flushes.
> 
> Why do you believe the vmexit matters?  You're talking one anyway for
> the IPI.
> 
> Intel only have virtualised self-IPI, and while AMD do have working
> non-self IPIs, you still take a vmexit anyway if any destination vcpu
> isn't currently running in non-root mode (IIRC).
> 
> At that point, you might as well have the hypervisor do all the hard
> work via a multi-cpu shootdown/flush hypercall, rather than trying to
> arrange it locally.

I forgot that xen_flush_tlb_multi() should actually only be called when
there are some remote CPUs (as I optimized the case in which there is only a
single local CPU that needs to be flushed), so you are right.


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox