Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* RE: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-07  6:56 UTC (permalink / raw)
  To: Jakub Kicinski, Stephen Hemminger
  Cc: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com
In-Reply-To: <20190806121242.141c2324@cakuba.netronome.com>

> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> Sent: Tuesday, August 6, 2019 12:13 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Tue, 6 Aug 2019 05:17:44 +0000, Dexuan Cui wrote:
> > This fixes a warning of "suspicious rcu_dereference_check() usage"
> > when nload runs.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> Minor change in behaviour would perhaps be worth acknowledging in the
> commit message (since you check ndev for NULL later now), and a Fixes
> tag would be good.
> 
> But the looks pretty straightforward and correct!

Hi,
Yeah, it looks the minor behavior change doesn't matter, because IMO the 
'nvdev' can only be NULL when the NIC is being removed, or the MTU is
being changed, etc.

The Fixes tag is:
Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")

If I should send a v2, please let me know.

Thanks,
-- Dexuan


^ permalink raw reply

* RE: [PATCH v2 4/7] Drivers: hv: vmbus: Suspend/resume the synic for hibernation
From: Michael Kelley @ 2019-08-07 15:21 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-5-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Wednesday, July 31, 2019 10:52 AM
> 
> This is needed when we resume the old kernel from the "current" kernel.
> 
> Note: when hv_synic_suspend() and hv_synic_resume() run, all the
> non-boot CPUs have been offlined, and interrupts are disabled on CPU0.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/vmbus_drv.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2 5/7] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation
From: Michael Kelley @ 2019-08-07 15:22 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-6-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com>  Sent: Wednesday, July 31, 2019 10:52 AM
> 
> When the VM resumes, the host re-sends the offers. We should not add the
> offers to the global vmbus_connection.chn_list again.
> 
> Added some debug code, in case the host screws up the exact info related to
> the offers.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/channel_mgmt.c | 29 ++++++++++++++++++++++++++++-
>  1 file changed, 28 insertions(+), 1 deletion(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2 6/7] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
From: Michael Kelley @ 2019-08-07 15:22 UTC (permalink / raw)
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-7-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Wednesday, July 31, 2019 10:52 AM
> 
> Before Linux enters hibernation, it sends the CHANNELMSG_UNLOAD message to
> the host so all the offers are gone. After hibernation, Linux needs to
> re-negotiate with the host using the same vmbus protocol version (which
> was in use before hibernation), and ask the host to re-offer the vmbus
> devices.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/connection.c   |  3 +--
>  drivers/hv/hyperv_vmbus.h |  2 ++
>  drivers/hv/vmbus_drv.c    | 50
> +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 53 insertions(+), 2 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH v2 6/7] Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation
From: Dexuan Cui @ 2019-08-07 23:09 UTC (permalink / raw)
  To: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan, Michael Kelley, tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1564595464-56520-7-git-send-email-decui@microsoft.com>

> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, July 31, 2019 10:52 AM
> To: linux-hyperv@vger.kernel.org; gregkh@linuxfoundation.org; Stephen
> @@ -2050,6 +2095,10 @@ static int vmbus_acpi_add(struct acpi_device
> *device)
>  };
>  MODULE_DEVICE_TABLE(acpi, vmbus_acpi_device_ids);
> 
> +static const struct dev_pm_ops vmbus_bus_pm = {
> +	SET_SYSTEM_SLEEP_PM_OPS(vmbus_bus_suspend, vmbus_bus_resume)
> +};

It turns out this " SET_SYSTEM_SLEEP_PM_OPS" should be changed to
"SET_NOIRQ_SYSTEM_SLEEP_PM_OPS" to make NIC SR-IOV work with
hibernation.

This is because the "pci_dev_pm_ops" uses the "noirq" callbacks. 
In the resume path, the "noirq" restore callback runs before the non-noirq
callbacks, meaning the vmbus_bus_resume() and the pci-hyperv's .resume()
must also run via the "noirq" callbacks.

Similarly, I also need to make the same change in this patch:
[PATCH v2 7/7] Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation

BTW, I'd like to change the print_hex_dump_debug() to print_hex_dump() in this patch:
[PATCH v2 5/7] Drivers: hv: vmbus: Ignore the offers when resuming from hibernation

print_hex_dump_debug() outputs nothing unless we enable the dyndbg. 
This is not good as we do want see the error messages here, if there is an error.

I'll do more tests and post a v3 of the patchset.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-08-08 18:46 UTC (permalink / raw)
  To: lorenzo.pieralisi@arm.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org
  Cc: Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com, Dexuan Cui


In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
but the current code misses the pci_legacy_resume_early() path, so the
state remains in PCI_UNKNOWN in that path, and during hiberantion this
causes an error for the Mellanox VF driver, which fails to enable
MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:

mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
mlx4_core a6d1:00:02.0: Sending reset
mlx4_core a6d1:00:02.0: Sending vhcr0
mlx4_core a6d1:00:02.0: HCA minimum page size:512
mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
PM: Device a6d1:00:02.0 failed to thaw: error -95

Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/pci/pci-driver.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 36dbe960306b..27dfc68db9e7 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
 			return error;
 	}
 
-	if (pci_has_legacy_pm_support(pci_dev))
-		return pci_legacy_resume_early(dev);
-
 	/*
 	 * pci_restore_state() requires the device to be in D0 (because of MSI
 	 * restoration among other things), so force it into D0 in case the
 	 * driver's "freeze" callbacks put it into a low-power state directly.
 	 */
 	pci_set_power_state(pci_dev, PCI_D0);
+
+	if (pci_has_legacy_pm_support(pci_dev))
+		return pci_legacy_resume_early(dev);
+
 	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Bjorn Helgaas @ 2019-08-08 19:19 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <PU1P153MB01695867B01987A8C239A8CCBFD70@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Aug 08, 2019 at 06:46:51PM +0000, Dexuan Cui wrote:
> 
> In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> but the current code misses the pci_legacy_resume_early() path, so the
> state remains in PCI_UNKNOWN in that path, and during hiberantion this
> causes an error for the Mellanox VF driver, which fails to enable
> MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:

s/hiberantion/hibernation/

Actually, it sounds more like "during *resume*, this causes an error",
so maybe you want s/hiberantion/resume/ instead?

> mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> mlx4_core a6d1:00:02.0: Sending reset
> mlx4_core a6d1:00:02.0: Sending vhcr0
> mlx4_core a6d1:00:02.0: HCA minimum page size:512
> mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/pci/pci-driver.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 36dbe960306b..27dfc68db9e7 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
>  
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
>  	 * pci_restore_state() requires the device to be in D0 (because of MSI
>  	 * restoration among other things), so force it into D0 in case the
>  	 * driver's "freeze" callbacks put it into a low-power state directly.
>  	 */
>  	pci_set_power_state(pci_dev, PCI_D0);
> +
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	pci_restore_state(pci_dev);
>  
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> -- 
> 2.19.1
> 

^ permalink raw reply

* RE: [PATCH] PCI: PM: Also move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-08-08 20:02 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <20190808191913.GI151852@google.com>

> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Thursday, August 8, 2019 12:19 PM
> To: Dexuan Cui <decui@microsoft.com>
> 
> On Thu, Aug 08, 2019 at 06:46:51PM +0000, Dexuan Cui wrote:
> >
> > In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> > In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> > but the current code misses the pci_legacy_resume_early() path, so the
> > state remains in PCI_UNKNOWN in that path, and during hiberantion this
> > causes an error for the Mellanox VF driver, which fails to enable
> > MSI-X: pci_msi_supported() is false due to dev->current_state != PCI_D0:
> 
> s/hiberantion/hibernation/

Thanks for spoting this typo! :-)
 
> Actually, it sounds more like "during *resume*, this causes an error",
> so maybe you want s/hiberantion/resume/ instead?

Yes, it's during "resume", and to be more accurate, it happens during the
"resume" of the "thaw" phase: when we run "echo disk > /sys/power/state",
first the kernel "freezes" all the devices and create a hibernation image, then
the kernel "thaws" the devices including the disk/NIC, and writes the memory
to the disk and powers down. This patch fixes the error message for the
Mellanox VF driver in this phase. 

When the system starts again, a fresh kernel starts to run, and when the kernel
detects that a hibernation image was saved, the kernel "quiesces" the devices,
and "restores" the devices from the saved image. Here device_resume_noirq()
-> ... -> pci_pm_restore_noirq() -> pci_pm_default_resume_early() ->
pci_power_up() moves the device states back to PCI_D0. This path is not broken
and doesn't need my patch. 

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: David Miller @ 2019-08-09  1:13 UTC (permalink / raw)
  To: decui
  Cc: netdev, haiyangz, sthemmin, sashal, kys, mikelley, linux-hyperv,
	linux-kernel, olaf, apw, jasowang, vkuznets, marcelo.cerri
In-Reply-To: <PU1P153MB0169AECABF6094A3E7BEE381BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Tue, 6 Aug 2019 05:17:44 +0000

> 
> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Please resend with appropriate fixes tag.

^ permalink raw reply

* RE: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-09  1:16 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com
In-Reply-To: <20190808.181350.1331633709956960086.davem@davemloft.net>

> From: David Miller <davem@davemloft.net>
> Sent: Thursday, August 8, 2019 6:14 PM
> To: Dexuan Cui <decui@microsoft.com>
> Cc: netdev@vger.kernel.org; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; KY
> Srinivasan <kys@microsoft.com>; Michael Kelley <mikelley@microsoft.com>;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; olaf@aepfle.de;
> apw@canonical.com; jasowang@redhat.com; vkuznets
> <vkuznets@redhat.com>; marcelo.cerri@canonical.com
> Subject: Re: [PATCH net] hv_netvsc: Fix a warning of suspicious RCU usage
> 
> From: Dexuan Cui <decui@microsoft.com>
> Date: Tue, 6 Aug 2019 05:17:44 +0000
> 
> >
> > This fixes a warning of "suspicious rcu_dereference_check() usage"
> > when nload runs.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> Please resend with appropriate fixes tag.

Will do shortly.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH net v2] hv_netvsc: Fix a warning of suspicious RCU usage
From: Dexuan Cui @ 2019-08-09  1:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org, David S. Miller, Haiyang Zhang,
	Stephen Hemminger, Jakub Kicinski
  Cc: sashal@kernel.org, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com


This fixes a warning of "suspicious rcu_dereference_check() usage"
when nload runs.

Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---

Changes in v2:
    Made the minimal required change.
	Added a Fixes tag.
	Removed Stephen H.'s Signed-off-by since this is somewhat different from the 
		v1 from him; if there is any bug in v2, it's all my fault. :-)

 drivers/net/hyperv/netvsc_drv.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index f9209594624b..b6357a75712c 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -1240,12 +1240,15 @@ static void netvsc_get_stats64(struct net_device *net,
 			       struct rtnl_link_stats64 *t)
 {
 	struct net_device_context *ndev_ctx = netdev_priv(net);
-	struct netvsc_device *nvdev = rcu_dereference_rtnl(ndev_ctx->nvdev);
+	struct netvsc_device *nvdev;
 	struct netvsc_vf_pcpu_stats vf_tot;
 	int i;
 
+	rcu_read_lock();
+
+	nvdev = rcu_dereference(ndev_ctx->nvdev);
 	if (!nvdev)
-		return;
+		goto out;
 
 	netdev_stats_to_stats64(t, &net->stats);
 
@@ -1284,6 +1287,8 @@ static void netvsc_get_stats64(struct net_device *net,
 		t->rx_packets	+= packets;
 		t->multicast	+= multicast;
 	}
+out:
+	rcu_read_unlock();
 }
 
 static int netvsc_set_mac_addr(struct net_device *ndev, void *p)
-- 
2.19.1


^ permalink raw reply related

* [PATCH 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support
From: lantianyu1986 @ 2019-08-09  9:49 UTC (permalink / raw)
  To: pbonzini, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx,
	mingo, bp, hpa, x86, michael.h.kelley
  Cc: Tianyu Lan, kvm, linux-doc, linux-hyperv, linux-kernel, vkuznets

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
in L0 can delegate L1 hypervisor to handle tlb flush request from
L2 guest when direct tlb flush is enabled in L1.

Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
feature from user space. User space should enable this feature only
when Hyper-V hypervisor capability is exposed to guest and KVM profile
is hided. There is a parameter conflict between KVM and Hyper-V hypercall.
We hope L2 guest doesn't use KVM hypercall when the feature is
enabled. Detail please see comment of new API "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"

Tianyu Lan (2):
  x86/Hyper-V: Fix definition of struct hv_vp_assist_page
  KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH

Vitaly Kuznetsov (1):
  KVM/Hyper-V/VMX: Add direct tlb flush support

 Documentation/virtual/kvm/api.txt  | 10 ++++++++++
 arch/x86/include/asm/hyperv-tlfs.h | 24 +++++++++++++++++++-----
 arch/x86/include/asm/kvm_host.h    |  2 ++
 arch/x86/kvm/vmx/evmcs.h           |  2 ++
 arch/x86/kvm/vmx/vmx.c             | 38 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                 |  8 ++++++++
 include/linux/kvm_host.h           |  1 +
 include/uapi/linux/kvm.h           |  1 +
 8 files changed, 81 insertions(+), 5 deletions(-)

-- 
2.14.2

^ permalink raw reply

* [PATCH 1/3] x86/Hyper-V: Fix definition of struct hv_vp_assist_page
From: lantianyu1986 @ 2019-08-09  9:49 UTC (permalink / raw)
  To: pbonzini, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx,
	mingo, bp, hpa, x86, michael.h.kelley
  Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, vkuznets
In-Reply-To: <20190809094939.76093-1-Tianyu.Lan@microsoft.com>

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

The struct hv_vp_assist_page was defined incorrectly.
The "vtl_control" should be u64[3], "nested_enlightenments_control"
should be a u64 and there is 7 reserved bytes following "enlighten_vmentry".
This patch is to fix it.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index af78cd72b8f3..a79703c56ebe 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -517,11 +517,11 @@ struct hv_timer_message_payload {
 /* Define virtual processor assist page structure. */
 struct hv_vp_assist_page {
 	__u32 apic_assist;
-	__u32 reserved;
-	__u64 vtl_control[2];
+	__u32 reserved1;
+	__u64 vtl_control[3];
 	__u64 nested_enlightenments_control[2];
-	__u32 enlighten_vmentry;
-	__u32 padding;
+	__u8 enlighten_vmentry;
+	__u8 reserved2[7];
 	__u64 current_nested_vmcs;
 } __packed;
 
-- 
2.14.2


^ permalink raw reply related

* [PATCH 2/3] KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH
From: lantianyu1986 @ 2019-08-09  9:49 UTC (permalink / raw)
  To: pbonzini, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx,
	mingo, bp, hpa, x86, michael.h.kelley
  Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, vkuznets
In-Reply-To: <20190809094939.76093-1-Tianyu.Lan@microsoft.com>

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

This patch adds new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH and let
user space to enable direct tlb flush function when only Hyper-V
hypervsior capability is exposed to VM. This patch also adds
enable_direct_tlbflush callback in the struct kvm_x86_ops and
platforms may use it to implement direct tlb flush support.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 Documentation/virtual/kvm/api.txt | 10 ++++++++++
 arch/x86/include/asm/kvm_host.h   |  2 ++
 arch/x86/kvm/x86.c                |  8 ++++++++
 include/uapi/linux/kvm.h          |  1 +
 4 files changed, 21 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2cd6250b2896..45308ed6dd75 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -5289,3 +5289,13 @@ Architectures: x86
 This capability indicates that KVM supports paravirtualized Hyper-V IPI send
 hypercalls:
 HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
+8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
+
+Architecture: x86
+
+This capability indicates that KVM supports Hyper-V direct tlb flush function.
+User space should enable this feature only when Hyper-V hypervisor capability
+is exposed to guest and KVM profile is hided. Both Hyper-V and KVM hypercalls
+use RAX and RCX registers to pass parameters. If KVM hypercall is exposed
+to L2 guest with direct tlbflush enabled, Hyper-V may mistake KVM hypercall
+for Hyper-V tlb flush Hypercall due to paremeter register overlap.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cc5b611a113..667d154e89d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1205,6 +1205,8 @@ struct kvm_x86_ops {
 	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
 
 	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
+
+	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9d7b9e6a0939..a9d8ee7f7bf0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3183,6 +3183,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = kvm_x86_ops->get_nested_state ?
 			kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0;
 		break;
+	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+		r = kvm_x86_ops->enable_direct_tlbflush ? 1 : 0;
+		break;
 	default:
 		break;
 	}
@@ -3953,6 +3956,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 				r = -EFAULT;
 		}
 		return r;
+	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
+		if (!kvm_x86_ops->enable_direct_tlbflush)
+			return -ENOTTY;
+
+		return kvm_x86_ops->enable_direct_tlbflush(vcpu);
 
 	default:
 		return -EINVAL;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a7c19540ce21..cb959bc925b1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 174
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.14.2


^ permalink raw reply related

* [PATCH 3/3] KVM/Hyper-V/VMX: Add direct tlb flush support
From: lantianyu1986 @ 2019-08-09  9:49 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
	pbonzini, rkrcmar, michael.h.kelley
  Cc: Vitaly Kuznetsov, linux-hyperv, linux-kernel, kvm, Tianyu Lan
In-Reply-To: <20190809094939.76093-1-Tianyu.Lan@microsoft.com>

From: Vitaly Kuznetsov <vkuznets@redhat.com>

This patch is to enable Hyper-V direct tlb flush function
for VMX.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 16 +++++++++++++++-
 arch/x86/kvm/vmx/evmcs.h           |  2 ++
 arch/x86/kvm/vmx/vmx.c             | 38 ++++++++++++++++++++++++++++++++++++++
 include/linux/kvm_host.h           |  1 +
 4 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index a79703c56ebe..d53d6e4a6210 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -171,6 +171,7 @@
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
 /* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
+#define HV_X64_NESTED_DIRECT_FLUSH			BIT(17)
 #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
 #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
 
@@ -514,12 +515,22 @@ struct hv_timer_message_payload {
 	__u64 delivery_time;	/* When the message was delivered */
 } __packed;
 
+struct hv_nested_enlightenments_control {
+	struct {
+		__u32 directhypercall:1;
+		__u32 reserved:31;
+	} features;
+	struct {
+		__u32 reserved;
+	} hypercallControls;
+} __packed;
+
 /* Define virtual processor assist page structure. */
 struct hv_vp_assist_page {
 	__u32 apic_assist;
 	__u32 reserved1;
 	__u64 vtl_control[3];
-	__u64 nested_enlightenments_control[2];
+	struct hv_nested_enlightenments_control nested_control;
 	__u8 enlighten_vmentry;
 	__u8 reserved2[7];
 	__u64 current_nested_vmcs;
@@ -872,4 +883,7 @@ struct hv_tlb_flush_ex {
 	u64 gva_list[];
 } __packed;
 
+struct hv_partition_assist_pg {
+	u32 tlb_lock_count;
+};
 #endif
diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
index 39a24eec8884..07ebf6882a45 100644
--- a/arch/x86/kvm/vmx/evmcs.h
+++ b/arch/x86/kvm/vmx/evmcs.h
@@ -178,6 +178,8 @@ static inline void evmcs_load(u64 phys_addr)
 	struct hv_vp_assist_page *vp_ap =
 		hv_get_vp_assist_page(smp_processor_id());
 
+	if (current_evmcs->hv_enlightenments_control.nested_flush_hypercall)
+		vp_ap->nested_control.features.directhypercall = 1;
 	vp_ap->current_nested_vmcs = phys_addr;
 	vp_ap->enlighten_vmentry = 1;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 84f8d49a2fd2..a49be029864e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -486,6 +486,34 @@ static int hv_remote_flush_tlb(struct kvm *kvm)
 	return hv_remote_flush_tlb_with_range(kvm, NULL);
 }
 
+static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
+{
+	struct hv_enlightened_vmcs *evmcs;
+
+	/*
+	 * Synthetic VM-Exit is not enabled in current code and so All
+	 * evmcs in singe VM shares same assist page.
+	 */
+	if (!vcpu->kvm->hv_pa_pg) {
+		vcpu->kvm->hv_pa_pg = kzalloc(PAGE_SIZE, GFP_KERNEL);
+		if (!vcpu->kvm->hv_pa_pg)
+			return -ENOMEM;
+		pr_debug("KVM: Hyper-V: allocated PA_PG for %llx\n",
+		       (u64)&vcpu->kvm);
+	}
+
+	evmcs = (struct hv_enlightened_vmcs *)to_vmx(vcpu)->loaded_vmcs->vmcs;
+
+	evmcs->partition_assist_page =
+		__pa(vcpu->kvm->hv_pa_pg);
+	evmcs->hv_vm_id = (u64)vcpu->kvm;
+	evmcs->hv_enlightenments_control.nested_flush_hypercall = 1;
+
+	pr_debug("KVM: Hyper-V: enabled DIRECT flush for %llx\n",
+		 (u64)vcpu->kvm);
+	return 0;
+}
+
 #endif /* IS_ENABLED(CONFIG_HYPERV) */
 
 /*
@@ -6516,6 +6544,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 		current_evmcs->hv_clean_fields |=
 			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
 
+	if (static_branch_unlikely(&enable_evmcs))
+		current_evmcs->hv_vp_id = vcpu->arch.hyperv.vp_index;
+
 	/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
 	if (vmx->host_debugctlmsr)
 		update_debugctlmsr(vmx->host_debugctlmsr);
@@ -6583,6 +6614,7 @@ static struct kvm *vmx_vm_alloc(void)
 
 static void vmx_vm_free(struct kvm *kvm)
 {
+	kfree(kvm->hv_pa_pg);
 	vfree(to_kvm_vmx(kvm));
 }
 
@@ -7815,6 +7847,7 @@ static void vmx_exit(void)
 			if (!vp_ap)
 				continue;
 
+			vp_ap->nested_control.features.directhypercall = 0;
 			vp_ap->current_nested_vmcs = 0;
 			vp_ap->enlighten_vmentry = 0;
 		}
@@ -7854,6 +7887,11 @@ static int __init vmx_init(void)
 			pr_info("KVM: vmx: using Hyper-V Enlightened VMCS\n");
 			static_branch_enable(&enable_evmcs);
 		}
+
+		if (ms_hyperv.nested_features & HV_X64_NESTED_DIRECT_FLUSH)
+			vmx_x86_ops.enable_direct_tlbflush
+				= hv_enable_direct_tlbflush;
+
 	} else {
 		enlightened_vmcs = false;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c5da875f19e3..479ad76661e6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -500,6 +500,7 @@ struct kvm {
 	struct srcu_struct srcu;
 	struct srcu_struct irq_srcu;
 	pid_t userspace_pid;
+	struct hv_partition_assist_pg *hv_pa_pg;
 };
 
 #define kvm_err(fmt, ...) \
-- 
2.14.2


^ permalink raw reply related

* Re: [PATCH 1/3] x86/Hyper-V: Fix definition of struct hv_vp_assist_page
From: Vitaly Kuznetsov @ 2019-08-09 10:25 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, pbonzini,
	rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp,
	hpa, x86, michael.h.kelley
In-Reply-To: <20190809094939.76093-2-Tianyu.Lan@microsoft.com>

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> The struct hv_vp_assist_page was defined incorrectly.
> The "vtl_control" should be u64[3], "nested_enlightenments_control"
> should be a u64 and there is 7 reserved bytes following "enlighten_vmentry".
> This patch is to fix it.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index af78cd72b8f3..a79703c56ebe 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -517,11 +517,11 @@ struct hv_timer_message_payload {
>  /* Define virtual processor assist page structure. */
>  struct hv_vp_assist_page {
>  	__u32 apic_assist;
> -	__u32 reserved;
> -	__u64 vtl_control[2];
> +	__u32 reserved1;
> +	__u64 vtl_control[3];
>  	__u64 nested_enlightenments_control[2];

In PATCH3 you define 'struct hv_nested_enlightenments_control' and it is
64bit long, not 128. We should change it here too as ...

> -	__u32 enlighten_vmentry;

enlighten_vmentry filed will get a very different offset breaking
Enlightened VMCS.

> -	__u32 padding;
> +	__u8 enlighten_vmentry;
> +	__u8 reserved2[7];
>  	__u64 current_nested_vmcs;
>  } __packed;

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH 2/3] KVM/Hyper-V: Add new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH
From: Vitaly Kuznetsov @ 2019-08-09 10:44 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Tianyu Lan, kvm, linux-doc, linux-kernel, linux-hyperv, pbonzini,
	rkrcmar, corbet, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp,
	hpa, x86, michael.h.kelley
In-Reply-To: <20190809094939.76093-3-Tianyu.Lan@microsoft.com>

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> This patch adds new KVM cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH and let
> user space to enable direct tlb flush function when only Hyper-V
> hypervsior capability is exposed to VM. This patch also adds
> enable_direct_tlbflush callback in the struct kvm_x86_ops and
> platforms may use it to implement direct tlb flush support.
>
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  Documentation/virtual/kvm/api.txt | 10 ++++++++++
>  arch/x86/include/asm/kvm_host.h   |  2 ++
>  arch/x86/kvm/x86.c                |  8 ++++++++
>  include/uapi/linux/kvm.h          |  1 +
>  4 files changed, 21 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 2cd6250b2896..45308ed6dd75 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -5289,3 +5289,13 @@ Architectures: x86
>  This capability indicates that KVM supports paravirtualized Hyper-V IPI send
>  hypercalls:
>  HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx.
> +8.21 KVM_CAP_HYPERV_DIRECT_TLBFLUSH
> +
> +Architecture: x86
> +
> +This capability indicates that KVM supports Hyper-V direct tlb flush function.
> +User space should enable this feature only when Hyper-V hypervisor capability
> +is exposed to guest and KVM profile is hided. Both Hyper-V and KVM hypercalls
> +use RAX and RCX registers to pass parameters. If KVM hypercall is exposed
> +to L2 guest with direct tlbflush enabled, Hyper-V may mistake KVM hypercall
> +for Hyper-V tlb flush Hypercall due to paremeter register overlap.

First, we need to explicitly state that this is for KVM on Hyper-V and
second, that this disables normal hypercall handling by KVM.

My take:

This capability indicates that KVM running on top of Hyper-V hypervisor
enables Direct TLB flush for its guests meaning that TLB flush
hypercalls are handled by Level 1 hypervisor (Hyper-V) bypassing KVM. 
Due to the different ABI for hypercall parameters between Hyper-V and
KVM, enabling this capability effectively disables all hypercall
handling by KVM (as some KVM hypercall may be mistakenly treated as TLB
flush hypercalls by Hyper-C) so userspace should disable KVM
identification in CPUID.

I think we should also enforce this somehow leaving only Hyper-V style
hypercalls handling (for Windows guests) in place.

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0cc5b611a113..667d154e89d4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1205,6 +1205,8 @@ struct kvm_x86_ops {
>  	uint16_t (*nested_get_evmcs_version)(struct kvm_vcpu *vcpu);
>  
>  	bool (*need_emulation_on_page_fault)(struct kvm_vcpu *vcpu);
> +
> +	int (*enable_direct_tlbflush)(struct kvm_vcpu *vcpu);
>  };
>  
>  struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9d7b9e6a0939..a9d8ee7f7bf0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3183,6 +3183,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = kvm_x86_ops->get_nested_state ?
>  			kvm_x86_ops->get_nested_state(NULL, NULL, 0) : 0;
>  		break;
> +	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> +		r = kvm_x86_ops->enable_direct_tlbflush ? 1 : 0;
> +		break;
>  	default:
>  		break;
>  	}
> @@ -3953,6 +3956,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
>  				r = -EFAULT;
>  		}
>  		return r;
> +	case KVM_CAP_HYPERV_DIRECT_TLBFLUSH:
> +		if (!kvm_x86_ops->enable_direct_tlbflush)
> +			return -ENOTTY;
> +
> +		return kvm_x86_ops->enable_direct_tlbflush(vcpu);
>  
>  	default:
>  		return -EINVAL;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index a7c19540ce21..cb959bc925b1 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -996,6 +996,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
>  #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
>  #define KVM_CAP_PMU_EVENT_FILTER 173
> +#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 174
>  
>  #ifdef KVM_CAP_IRQ_ROUTING

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH net v2] hv_netvsc: Fix a warning of suspicious RCU usage
From: David Miller @ 2019-08-09 20:42 UTC (permalink / raw)
  To: decui
  Cc: netdev, haiyangz, sthemmin, jakub.kicinski, sashal, kys, mikelley,
	linux-hyperv, linux-kernel, olaf, apw, jasowang, vkuznets,
	marcelo.cerri
In-Reply-To: <PU1P153MB0169A6492DCBB490FE7FE52CBFD60@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com>
Date: Fri, 9 Aug 2019 01:58:08 +0000

> This fixes a warning of "suspicious rcu_dereference_check() usage"
> when nload runs.
> 
> Fixes: 776e726bfb34 ("netvsc: fix RCU warning in get_stats")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2] PCI: hv: Fix panic by calling hv_pci_remove_slots() earlier
From: Lorenzo Pieralisi @ 2019-08-12 13:06 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org, Michael Kelley,
	Stephen Hemminger, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	jackm@mellanox.com
In-Reply-To: <PU1P153MB0169F9EDD707FFE1517F8D56BFD50@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Tue, Aug 06, 2019 at 08:41:17PM +0000, Dexuan Cui wrote:
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Bjorn Helgaas
> > Sent: Tuesday, August 6, 2019 1:16 PM
> > To: Dexuan Cui <decui@microsoft.com>
> > 
> > Thanks for updating this.  But you didn't update the subject line,
> > which is really still a little too low-level.  Maybe Lorenzo will fix
> > this.  Something like this, maybe?
> > 
> >   PCI: hv: Avoid use of hv_pci_dev->pci_slot after freeing it
> 
> This is better. Thanks!
> 
> I hope Lorenzo can help to fix this so I could avoid a v3. :-)

You should have fixed it yourself, this time I will.

Thanks,
Lorenzo

^ permalink raw reply

* RE: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 13:50 UTC (permalink / raw)
  To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
  Cc: KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1565135484-31351-1-git-send-email-haiyangz@microsoft.com>



> -----Original Message-----
> From: LKML haiyangz <lkmlhyz@microsoft.com> On Behalf Of Haiyang
> Zhang
> Sent: Tuesday, August 6, 2019 7:52 PM
> To: sashal@kernel.org; bhelgaas@google.com; lorenzo.pieralisi@arm.com;
> linux-hyperv@vger.kernel.org; linux-pci@vger.kernel.org
> Cc: Haiyang Zhang <haiyangz@microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
> 
> Currently in Azure cloud, for passthrough devices including GPU, the
> host sets the device instance ID's bytes 8 - 15 to a value derived from
> the host HWID, which is the same on all devices in a VM. So, the device
> instance ID's bytes 8 and 9 provided by the host are no longer unique.
> 
> This can cause device passthrough to VMs to fail because the bytes 8 and
> 9 is used as PCI domain number. So, as recommended by Azure host team,
> we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> domain. The chance of collision is greatly reduced. In the rare cases of
> collision, we will detect and find another number that is not in use.
> 
> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Acked-by: Sasha Levin <sashal@kernel.org>
> ---
>  drivers/pci/controller/pci-hyperv.c | 92

Hi Lorenzo,

This patch has been updated based on Bjorn's comments. Do you have any further
comments? Could you take it from your tree?

Thanks,
- Haiyang

^ permalink raw reply

* Re: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Lorenzo Pieralisi @ 2019-08-12 15:38 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sashal@kernel.org, bhelgaas@google.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <1565135484-31351-1-git-send-email-haiyangz@microsoft.com>

On Tue, Aug 06, 2019 at 11:52:11PM +0000, Haiyang Zhang wrote:
> Currently in Azure cloud, for passthrough devices including GPU, the
> host sets the device instance ID's bytes 8 - 15 to a value derived from
> the host HWID, which is the same on all devices in a VM. So, the device
> instance ID's bytes 8 and 9 provided by the host are no longer unique.
> 
> This can cause device passthrough to VMs to fail because the bytes 8 and
> 9 is used as PCI domain number. So, as recommended by Azure host team,
> we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> domain. The chance of collision is greatly reduced. In the rare cases of
> collision, we will detect and find another number that is not in use.

This is not clear at all. Why "finding another number" is fine with
this patch while it is not with current kernel code ? Also does this
have backward compatibility issues ?

I do not understand if a collision is a problem or not from the
log above.

> Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this idea.

Add it as Suggested-by: tag.

Thanks,
Lorenzo

> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> Acked-by: Sasha Levin <sashal@kernel.org>
> ---
>  drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
>  1 file changed, 79 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 40b6254..4f3d97e 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
>  		complete(&hbus->remove_event);
>  }
>  
> +#define HVPCI_DOM_MAP_SIZE (64 * 1024)
> +static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
> +
> +/*
> + * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
> + * as invalid for passthrough PCI devices of this driver.
> + */
> +#define HVPCI_DOM_INVALID 0
> +
> +/**
> + * hv_get_dom_num() - Get a valid PCI domain number
> + * Check if the PCI domain number is in use, and return another number if
> + * it is in use.
> + *
> + * @dom: Requested domain number
> + *
> + * return: domain number on success, HVPCI_DOM_INVALID on failure
> + */
> +static u16 hv_get_dom_num(u16 dom)
> +{
> +	unsigned int i;
> +
> +	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
> +		return dom;
> +
> +	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
> +		if (test_and_set_bit(i, hvpci_dom_map) == 0)
> +			return i;
> +	}
> +
> +	return HVPCI_DOM_INVALID;
> +}
> +
> +/**
> + * hv_put_dom_num() - Mark the PCI domain number as free
> + * @dom: Domain number to be freed
> + */
> +static void hv_put_dom_num(u16 dom)
> +{
> +	clear_bit(dom, hvpci_dom_map);
> +}
> +
>  /**
>   * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
>   * @hdev:	VMBus's tracking struct for this root PCI bus
> @@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  			const struct hv_vmbus_device_id *dev_id)
>  {
>  	struct hv_pcibus_device *hbus;
> +	u16 dom_req, dom;
>  	int ret;
>  
>  	/*
> @@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	hbus->state = hv_pcibus_init;
>  
>  	/*
> -	 * The PCI bus "domain" is what is called "segment" in ACPI and
> -	 * other specs.  Pull it from the instance ID, to get something
> -	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
> -	 * do the same thing for consistency.  Note that, since this code
> -	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
> -	 * that (1) the only domain in use for something that looks like
> -	 * a physical PCI bus (which is actually emulated by the
> -	 * hypervisor) is domain 0 and (2) there will be no overlap
> -	 * between domains derived from these instance IDs in the same
> -	 * VM.
> +	 * The PCI bus "domain" is what is called "segment" in ACPI and other
> +	 * specs. Pull it from the instance ID, to get something usually
> +	 * unique. In rare cases of collision, we will find out another number
> +	 * not in use.
> +	 *
> +	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
> +	 * together with this guest driver can guarantee that (1) The only
> +	 * domain used by Gen1 VMs for something that looks like a physical
> +	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
> +	 * (2) There will be no overlap between domains (after fixing possible
> +	 * collisions) in the same VM.
>  	 */
> -	hbus->sysdata.domain = hdev->dev_instance.b[9] |
> -			       hdev->dev_instance.b[8] << 8;
> +	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
> +	dom = hv_get_dom_num(dom_req);
> +
> +	if (dom == HVPCI_DOM_INVALID) {
> +		dev_err(&hdev->device,
> +			"Unable to use dom# 0x%hx or other numbers", dom_req);
> +		ret = -EINVAL;
> +		goto free_bus;
> +	}
> +
> +	if (dom != dom_req)
> +		dev_info(&hdev->device,
> +			 "PCI dom# 0x%hx has collision, using 0x%hx",
> +			 dom_req, dom);
> +
> +	hbus->sysdata.domain = dom;
>  
>  	hbus->hdev = hdev;
>  	refcount_set(&hbus->remove_lock, 1);
> @@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
>  					   hbus->sysdata.domain);
>  	if (!hbus->wq) {
>  		ret = -ENOMEM;
> -		goto free_bus;
> +		goto free_dom;
>  	}
>  
>  	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
> @@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
>  	vmbus_close(hdev->channel);
>  destroy_wq:
>  	destroy_workqueue(hbus->wq);
> +free_dom:
> +	hv_put_dom_num(hbus->sysdata.domain);
>  free_bus:
>  	free_page((unsigned long)hbus);
>  	return ret;
> @@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
>  	put_hvpcibus(hbus);
>  	wait_for_completion(&hbus->remove_event);
>  	destroy_workqueue(hbus->wq);
> +
> +	hv_put_dom_num(hbus->sysdata.domain);
> +
>  	free_page((unsigned long)hbus);
>  	return 0;
>  }
> @@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
>  
>  static int __init init_hv_pci_drv(void)
>  {
> +	/* Set the invalid domain number's bit, so it will not be used */
> +	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
> +
>  	return vmbus_driver_register(&hv_pci_drv);
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 15:56 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: sashal@kernel.org, bhelgaas@google.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
	KY Srinivasan, Stephen Hemminger, olaf@aepfle.de, vkuznets,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190812153833.GA30794@e121166-lin.cambridge.arm.com>



> -----Original Message-----
> From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Sent: Monday, August 12, 2019 11:39 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: sashal@kernel.org; bhelgaas@google.com; linux-
> hyperv@vger.kernel.org; linux-pci@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>;
> olaf@aepfle.de; vkuznets <vkuznets@redhat.com>; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH v2] PCI: hv: Detect and fix Hyper-V PCI domain number
> collision
> 
> On Tue, Aug 06, 2019 at 11:52:11PM +0000, Haiyang Zhang wrote:
> > Currently in Azure cloud, for passthrough devices including GPU, the
> > host sets the device instance ID's bytes 8 - 15 to a value derived from
> > the host HWID, which is the same on all devices in a VM. So, the device
> > instance ID's bytes 8 and 9 provided by the host are no longer unique.
> >
> > This can cause device passthrough to VMs to fail because the bytes 8 and
> > 9 is used as PCI domain number. So, as recommended by Azure host team,
> > we now use the bytes 4 and 5 which usually contain unique numbers as PCI
> > domain. The chance of collision is greatly reduced. In the rare cases of
> > collision, we will detect and find another number that is not in use.
> 
> This is not clear at all. Why "finding another number" is fine with
> this patch while it is not with current kernel code ? Also does this
> have backward compatibility issues ?
The bytes 4, 5 have more uniqueness (info entropy) than bytes 8, 9, so we use
bytes 4, 5. On older hosts, bytes 4, 5 can also be used -- so it has no backward
compatibility issues.
 
> I do not understand if a collision is a problem or not from the
> log above.
Collision will cause the second device with the same domain number fails to load.
I will include these info into the patch description.

> 
> > Thanks to Michael Kelley <mikelley@microsoft.com> for proposing this
> idea.
> 
> Add it as Suggested-by: tag.
I will add this line.

Thanks,
- Haiyang

^ permalink raw reply

* [PATCH v3] PCI: hv: Detect and fix Hyper-V PCI domain number collision
From: Haiyang Zhang @ 2019-08-12 18:20 UTC (permalink / raw)
  To: sashal@kernel.org, bhelgaas@google.com, lorenzo.pieralisi@arm.com,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, linux-kernel@vger.kernel.org

Currently in Azure cloud, for passthrough devices including GPU, the host
sets the device instance ID's bytes 8 - 15 to a value derived from the host
HWID, which is the same on all devices in a VM. So, the device instance
ID's bytes 8 and 9 provided by the host are no longer unique. This can
cause device passthrough to VMs to fail because the bytes 8 and 9 are used
as PCI domain number. Collision of domain numbers will cause the second
device with the same domain number fail to load.

As recommended by Azure host team, the bytes 4, 5 have more uniqueness
(info entropy) than bytes 8, 9. So now we use bytes 4, 5 as the PCI domain
numbers. On older hosts, bytes 4, 5 can also be used -- no backward
compatibility issues here. The chance of collision is greatly reduced. In
the rare cases of collision, we will detect and find another number that is
not in use.

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Acked-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pci/controller/pci-hyperv.c | 92 +++++++++++++++++++++++++++++++------
 1 file changed, 79 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b6254..4f3d97e 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2510,6 +2510,48 @@ static void put_hvpcibus(struct hv_pcibus_device *hbus)
 		complete(&hbus->remove_event);
 }
 
+#define HVPCI_DOM_MAP_SIZE (64 * 1024)
+static DECLARE_BITMAP(hvpci_dom_map, HVPCI_DOM_MAP_SIZE);
+
+/*
+ * PCI domain number 0 is used by emulated devices on Gen1 VMs, so define 0
+ * as invalid for passthrough PCI devices of this driver.
+ */
+#define HVPCI_DOM_INVALID 0
+
+/**
+ * hv_get_dom_num() - Get a valid PCI domain number
+ * Check if the PCI domain number is in use, and return another number if
+ * it is in use.
+ *
+ * @dom: Requested domain number
+ *
+ * return: domain number on success, HVPCI_DOM_INVALID on failure
+ */
+static u16 hv_get_dom_num(u16 dom)
+{
+	unsigned int i;
+
+	if (test_and_set_bit(dom, hvpci_dom_map) == 0)
+		return dom;
+
+	for_each_clear_bit(i, hvpci_dom_map, HVPCI_DOM_MAP_SIZE) {
+		if (test_and_set_bit(i, hvpci_dom_map) == 0)
+			return i;
+	}
+
+	return HVPCI_DOM_INVALID;
+}
+
+/**
+ * hv_put_dom_num() - Mark the PCI domain number as free
+ * @dom: Domain number to be freed
+ */
+static void hv_put_dom_num(u16 dom)
+{
+	clear_bit(dom, hvpci_dom_map);
+}
+
 /**
  * hv_pci_probe() - New VMBus channel probe, for a root PCI bus
  * @hdev:	VMBus's tracking struct for this root PCI bus
@@ -2521,6 +2563,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 			const struct hv_vmbus_device_id *dev_id)
 {
 	struct hv_pcibus_device *hbus;
+	u16 dom_req, dom;
 	int ret;
 
 	/*
@@ -2535,19 +2578,34 @@ static int hv_pci_probe(struct hv_device *hdev,
 	hbus->state = hv_pcibus_init;
 
 	/*
-	 * The PCI bus "domain" is what is called "segment" in ACPI and
-	 * other specs.  Pull it from the instance ID, to get something
-	 * unique.  Bytes 8 and 9 are what is used in Windows guests, so
-	 * do the same thing for consistency.  Note that, since this code
-	 * only runs in a Hyper-V VM, Hyper-V can (and does) guarantee
-	 * that (1) the only domain in use for something that looks like
-	 * a physical PCI bus (which is actually emulated by the
-	 * hypervisor) is domain 0 and (2) there will be no overlap
-	 * between domains derived from these instance IDs in the same
-	 * VM.
+	 * The PCI bus "domain" is what is called "segment" in ACPI and other
+	 * specs. Pull it from the instance ID, to get something usually
+	 * unique. In rare cases of collision, we will find out another number
+	 * not in use.
+	 *
+	 * Note that, since this code only runs in a Hyper-V VM, Hyper-V
+	 * together with this guest driver can guarantee that (1) The only
+	 * domain used by Gen1 VMs for something that looks like a physical
+	 * PCI bus (which is actually emulated by the hypervisor) is domain 0.
+	 * (2) There will be no overlap between domains (after fixing possible
+	 * collisions) in the same VM.
 	 */
-	hbus->sysdata.domain = hdev->dev_instance.b[9] |
-			       hdev->dev_instance.b[8] << 8;
+	dom_req = hdev->dev_instance.b[5] << 8 | hdev->dev_instance.b[4];
+	dom = hv_get_dom_num(dom_req);
+
+	if (dom == HVPCI_DOM_INVALID) {
+		dev_err(&hdev->device,
+			"Unable to use dom# 0x%hx or other numbers", dom_req);
+		ret = -EINVAL;
+		goto free_bus;
+	}
+
+	if (dom != dom_req)
+		dev_info(&hdev->device,
+			 "PCI dom# 0x%hx has collision, using 0x%hx",
+			 dom_req, dom);
+
+	hbus->sysdata.domain = dom;
 
 	hbus->hdev = hdev;
 	refcount_set(&hbus->remove_lock, 1);
@@ -2562,7 +2620,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 					   hbus->sysdata.domain);
 	if (!hbus->wq) {
 		ret = -ENOMEM;
-		goto free_bus;
+		goto free_dom;
 	}
 
 	ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
@@ -2639,6 +2697,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 	vmbus_close(hdev->channel);
 destroy_wq:
 	destroy_workqueue(hbus->wq);
+free_dom:
+	hv_put_dom_num(hbus->sysdata.domain);
 free_bus:
 	free_page((unsigned long)hbus);
 	return ret;
@@ -2720,6 +2780,9 @@ static int hv_pci_remove(struct hv_device *hdev)
 	put_hvpcibus(hbus);
 	wait_for_completion(&hbus->remove_event);
 	destroy_workqueue(hbus->wq);
+
+	hv_put_dom_num(hbus->sysdata.domain);
+
 	free_page((unsigned long)hbus);
 	return 0;
 }
@@ -2747,6 +2810,9 @@ static void __exit exit_hv_pci_drv(void)
 
 static int __init init_hv_pci_drv(void)
 {
+	/* Set the invalid domain number's bit, so it will not be used */
+	set_bit(HVPCI_DOM_INVALID, hvpci_dom_map);
+
 	return vmbus_driver_register(&hv_pci_drv);
 }
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v1] Tools: hv: move to tools buildsystem
From: Andy Shevchenko @ 2019-08-12 18:27 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, linux-hyperv,
	Sasha Levin
In-Reply-To: <20190628170542.28481-1-andriy.shevchenko@linux.intel.com>

On Fri, Jun 28, 2019 at 08:05:42PM +0300, Andy Shevchenko wrote:
> There is a nice buildsystem dedicated for userspace tools in Linux kernel tree.
> Switch gpio target to be built by it.
> 

Any comments on this?

> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  tools/hv/Build    |  3 +++
>  tools/hv/Makefile | 51 +++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 44 insertions(+), 10 deletions(-)
>  create mode 100644 tools/hv/Build
> 
> diff --git a/tools/hv/Build b/tools/hv/Build
> new file mode 100644
> index 000000000000..6cf51fa4b306
> --- /dev/null
> +++ b/tools/hv/Build
> @@ -0,0 +1,3 @@
> +hv_kvp_daemon-y += hv_kvp_daemon.o
> +hv_vss_daemon-y += hv_vss_daemon.o
> +hv_fcopy_daemon-y += hv_fcopy_daemon.o
> diff --git a/tools/hv/Makefile b/tools/hv/Makefile
> index 5db5e62cebda..b57143d9459c 100644
> --- a/tools/hv/Makefile
> +++ b/tools/hv/Makefile
> @@ -1,28 +1,55 @@
>  # SPDX-License-Identifier: GPL-2.0
>  # Makefile for Hyper-V tools
> -
> -WARNINGS = -Wall -Wextra
> -CFLAGS = $(WARNINGS) -g $(shell getconf LFS_CFLAGS)
> -
> -CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
> +include ../scripts/Makefile.include
>  
>  sbindir ?= /usr/sbin
>  libexecdir ?= /usr/libexec
>  sharedstatedir ?= /var/lib
>  
> -ALL_PROGRAMS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +ifeq ($(srctree),)
> +srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> +srctree := $(patsubst %/,%,$(dir $(srctree)))
> +endif
> +
> +# Do not use make's built-in rules
> +# (this improves performance and avoids hard-to-debug behaviour);
> +MAKEFLAGS += -r
> +
> +override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
> +
> +ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
>  
>  ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
>  
>  all: $(ALL_PROGRAMS)
>  
> -%: %.c
> -	$(CC) $(CFLAGS) -o $@ $^
> +export srctree OUTPUT CC LD CFLAGS
> +include $(srctree)/tools/build/Makefile.include
> +
> +HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
> +$(HV_KVP_DAEMON_IN): FORCE
> +	$(Q)$(MAKE) $(build)=hv_kvp_daemon
> +$(OUTPUT)hv_kvp_daemon: $(HV_KVP_DAEMON_IN)
> +	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> +
> +HV_VSS_DAEMON_IN := $(OUTPUT)hv_vss_daemon-in.o
> +$(HV_VSS_DAEMON_IN): FORCE
> +	$(Q)$(MAKE) $(build)=hv_vss_daemon
> +$(OUTPUT)hv_vss_daemon: $(HV_VSS_DAEMON_IN)
> +	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> +
> +HV_FCOPY_DAEMON_IN := $(OUTPUT)hv_fcopy_daemon-in.o
> +$(HV_FCOPY_DAEMON_IN): FORCE
> +	$(Q)$(MAKE) $(build)=hv_fcopy_daemon
> +$(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
> +	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
>  
>  clean:
> -	$(RM) hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +	rm -f $(ALL_PROGRAMS)
> +	find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
>  
> -install: all
> +install: $(ALL_PROGRAMS)
>  	install -d -m 755 $(DESTDIR)$(sbindir); \
>  	install -d -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd; \
>  	install -d -m 755 $(DESTDIR)$(sharedstatedir); \
> @@ -33,3 +60,7 @@ install: all
>  	for script in $(ALL_SCRIPTS); do \
>  		install $$script -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd/$${script%.sh}; \
>  	done
> +
> +FORCE:
> +
> +.PHONY: all install clean FORCE prepare
> -- 
> 2.20.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply

* RE: [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
From: Michael Kelley @ 2019-08-12 18:39 UTC (permalink / raw)
  To: lantianyu1986@gmail.com, luto@kernel.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, hpa@zytor.com, x86@kernel.org,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, daniel.lezcano@linaro.org, arnd@arndb.de,
	ashal@kernel.org
  Cc: Tianyu Lan, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-arch@vger.kernel.org
In-Reply-To: <20190729075243.22745-2-Tianyu.Lan@microsoft.com>

From: lantianyu1986@gmail.com <lantianyu1986@gmail.com> Sent: Monday, July 29, 2019 12:53 AM
> 
> This is to prepare to add Hyper-V sched clock callback and move
> Hyper-V reference TSC initialization much earlier in the boot
> process when timestamp is 0. So no discontinuity is observed
> when pv_ops.time.sched_clock to calculate its offset. This earlier
> initialization requires that the Hyper-V TSC page be allocated
> statically instead of with vmalloc(), so fixup the references
> to the TSC page and the method of getting its physical address.

I'd suggest tweaking the commit message wording a bit:

Prepare to add Hyper-V sched clock callback and move Hyper-V
Reference TSC initialization much earlier in the boot process.  Earlier
initialization is needed so that it happens while the timestamp value
is still 0 and no discontinuity in the timestamp will occur when 
pv_ops.time.sched_clock calculates its offset.  The earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fixup the references
to the TSC page and the method of getting its physical address.

> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
>  arch/x86/entry/vdso/vma.c          |  2 +-
>  drivers/clocksource/hyperv_timer.c | 12 ++++--------
>  2 files changed, 5 insertions(+), 9 deletions(-)
> 
> @@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
>  	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>  		return false;
> 
> -	tsc_pg = vmalloc(PAGE_SIZE);
> -	if (!tsc_pg)
> -		return false;
> -
>  	hyperv_cs = &hyperv_cs_tsc;
> -	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> +	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;

The and'ing with PAGE_MASK isn't needed.  You've set up tsc_pg
to ensure it is page aligned, so there's no need to mask out any
low order bits.  That's why the previous code didn't do the masking
either.

> 
>  	/*
>  	 * The Hyper-V TLFS specifies to preserve the value of reserved
> --
> 2.14.5


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox