Linux-HyperV List
 help / color / mirror / Atom feed
* RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Dexuan Cui @ 2019-09-16 21:46 UTC (permalink / raw)
  To: Dexuan Cui, Wei Hu, Michael Kelley, rdunlap@infradead.org,
	shc_work@mail.ru, gregkh@linuxfoundation.org,
	lee.jones@linaro.org, alexandre.belloni@bootlin.com,
	baijiaju1990@gmail.com, fthain@telegraphics.com.au,
	info@metux.net, linux-hyperv@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-fbdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, sashal@kernel.org,
	Stephen Hemminger, Haiyang Zhang, KY Srinivasan
In-Reply-To: <PU1P153MB0169E5E73D258A034B4869DCBFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>



Thanks,
-- Dexuan

> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Thursday, September 12, 2019 11:38 PM
> To: Wei Hu <weh@microsoft.com>; Michael Kelley <mikelley@microsoft.com>;
> rdunlap@infradead.org; shc_work@mail.ru; gregkh@linuxfoundation.org;
> lee.jones@linaro.org; alexandre.belloni@bootlin.com;
> baijiaju1990@gmail.com; fthain@telegraphics.com.au; info@metux.net;
> linux-hyperv@vger.kernel.org; dri-devel@lists.freedesktop.org;
> linux-fbdev@vger.kernel.org; linux-kernel@vger.kernel.org; sashal@kernel.org;
> Stephen Hemminger <sthemmin@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> Subject: RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for
> Hyper-V frame buffer driver
> 
> > From: Wei Hu <weh@microsoft.com>
> > Sent: Thursday, September 12, 2019 11:03 PM
> >
> > Without deferred IO support, hyperv_fb driver informs the host to refresh
> > the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> > is screen update or not. This patch supports deferred IO for screens in
> > graphics mode and also enables the frame buffer on-demand refresh. The
> > highest refresh rate is still set at 20Hz.
> >
> > Currently Hyper-V only takes a physical address from guest as the starting
> > address of frame buffer. This implies the guest must allocate contiguous
> > physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> > accept address from MMIO region as frame buffer address. Due to these
> > limitations on Hyper-V host, we keep a shadow copy of frame buffer
> > in the guest. This means one more copy of the dirty rectangle inside
> > guest when doing the on-demand refresh. This can be optimized in the
> > future with help from host. For now the host performance gain from deferred
> > IO outweighs the shadow copy impact in the guest.
> >
> > Signed-off-by: Wei Hu <weh@microsoft.com>
> > ---
> >     v2: Incorporated review comments from Michael Kelley
> >     - Increased dirty rectangle by one row in deferred IO case when sending
> >     to Hyper-V.
> >     - Corrected the dirty rectangle size in the text mode.
> >     - Added more comments.
> >     - Other minor code cleanups.
> >
> >     v3: Incorporated more review comments
> >     - Removed a few unnecessary variable tests
> >
> >     v4: Incorporated test and review feedback from Dexuan Cui
> >     - Not disable interrupt while acquiring docopy_lock in
> >       hvfb_update_work(). This avoids significant bootup delay in
> >       large vCPU count VMs.
> >
> >     v5: Completely remove the unnecessary docopy_lock after discussing
> >     with Dexuan Cui.
> 
> Thanks! Looks good to me.
> 
> Reviewed-by: Dexuan Cui <decui@microsoft.com>


Hi Wei,
It turns out we need to make a further fix. :-)

The patch forgets to take par->update into consideration.

When the VM Connection window is closed (or minimized?),
the host sends a message to the guest, and the guest sets
par->update to false in synthvid_recv_sub().

If par->update is false, the guest doesn't need to call
synthvid_update().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
From: Dexuan Cui @ 2019-09-16 21:45 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, b.zolnierkie@samsung.com,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stephen Hemminger, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan
  Cc: Iouri Tarassov
In-Reply-To: <PU1P153MB0169656B3EC48BFCF4D8C134BFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Thursday, September 12, 2019 11:39 PM
> To: Michael Kelley <mikelley@microsoft.com>; Wei Hu <weh@microsoft.com>;
> b.zolnierkie@samsung.com; linux-hyperv@vger.kernel.org;
> dri-devel@lists.freedesktop.org; linux-fbdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> Cc: Iouri Tarassov <iourit@microsoft.com>
> Subject: RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution
> from Hyper-V host
> 
> > From: Michael Kelley <mikelley@microsoft.com>
> > Sent: Thursday, September 5, 2019 7:06 AM
> >
> > From: Wei Hu <weh@microsoft.com> Sent: Thursday, September 5, 2019
> 2:12
> > AM
> > >
> > > Beginning from Windows 10 RS5+, VM screen resolution is obtained from
> > host.
> > > The "video=hyperv_fb" boot time option is not needed, but still can be
> > > used to overwrite what the host specifies. The VM resolution on the host
> > > could be set by executing the powershell "set-vmvideo" command.
> > >
> > > Signed-off-by: Iouri Tarassov <iourit@microsoft.com>
> > > Signed-off-by: Wei Hu <weh@microsoft.com>
> > > ---
> > >     v2:
> > >     - Implemented fallback when version negotiation failed.
> > >     - Defined full size for supported_resolution array.
> > >
> > >     v3:
> > >     - Corrected the synthvid major and minor version comparison
> problem.
> > >
> > >     v4:
> > >     - Changed function name to synthvid_ver_ge().
> > >
> > >  drivers/video/fbdev/hyperv_fb.c | 159
> > +++++++++++++++++++++++++++++---
> > >  1 file changed, 147 insertions(+), 12 deletions(-)
> > >
> >
> > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> 
> Looks good to me.
> 
> Reviewed-by: Dexuan Cui <decui@microsoft.com>

Hi Wei,
It turns out we need to make a further fix. :-)

The patch forgets to take par->update into consideration.

When the VM Connection window is closed (or minimized?),
the host sends a message to the guest, and the guest sets
par->update to false in synthvid_recv_sub().

If par->update is false, the guest doesn't need to call
synthvid_update().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
From: Haiyang Zhang @ 2019-09-16 21:19 UTC (permalink / raw)
  To: Denis Efremov, Bjorn Helgaas
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Andrew Murray, linux-hyperv@vger.kernel.org, KY Srinivasan,
	Stephen Hemminger, Sasha Levin
In-Reply-To: <20190916204158.6889-3-efremov@linux.com>



> -----Original Message-----
> From: Denis Efremov <efremov@linux.com>
> Sent: Monday, September 16, 2019 4:42 PM
> To: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Denis Efremov <efremov@linux.com>; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org; Andrew Murray <andrew.murray@arm.com>;
> linux-hyperv@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Haiyang
> Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>
> Subject: [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
> 
> Replace the magic constant (6) with define PCI_STD_NUM_BARS
> representing the number of PCI BARs.
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-
> hyperv.c
> index 40b625458afa..1665c23b649f 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -307,7 +307,7 @@ struct pci_bus_relations {  struct
> pci_q_res_req_response {
>  	struct vmpacket_descriptor hdr;
>  	s32 status;			/* negative values are failures */
> -	u32 probed_bar[6];
> +	u32 probed_bar[PCI_STD_NUM_BARS];
>  } __packed;
> 
>  struct pci_set_power {
> @@ -503,7 +503,7 @@ struct hv_pci_dev {
>  	 * What would be observed if one wrote 0xFFFFFFFF to a BAR and
> then
>  	 * read it back, for each of the BAR offsets within config space.
>  	 */
> -	u32 probed_bar[6];
> +	u32 probed_bar[PCI_STD_NUM_BARS];
>  };
> 
>  struct hv_pci_compl {
> @@ -1327,7 +1327,7 @@ static void survey_child_resources(struct
> hv_pcibus_device *hbus)
>  	 * so it's sufficient to just add them up without tracking alignment.
>  	 */
>  	list_for_each_entry(hpdev, &hbus->children, list_entry) {
> -		for (i = 0; i < 6; i++) {
> +		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  			if (hpdev->probed_bar[i] &
> PCI_BASE_ADDRESS_SPACE_IO)
>  				dev_err(&hbus->hdev->device,
>  					"There's an I/O BAR in this list!\n");
> @@ -1401,7 +1401,7 @@ static void prepopulate_bars(struct
> hv_pcibus_device *hbus)
>  	/* Pick addresses for the BARs. */
>  	do {
>  		list_for_each_entry(hpdev, &hbus->children, list_entry) {
> -			for (i = 0; i < 6; i++) {
> +			for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  				bar_val = hpdev->probed_bar[i];
>  				if (bar_val == 0)
>  					continue;
> @@ -1558,7 +1558,7 @@ static void q_resource_requirements(void
> *context, struct pci_response *resp,
>  			"query resource requirements failed: %x\n",
>  			resp->status);
>  	} else {
> -		for (i = 0; i < 6; i++) {
> +		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  			completion->hpdev->probed_bar[i] =
>  				q_res_req->probed_bar[i];
>  		}

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thanks.

^ permalink raw reply

* [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
From: Denis Efremov @ 2019-09-16 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, linux-kernel, linux-pci, Andrew Murray,
	linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin
In-Reply-To: <20190916204158.6889-1-efremov@linux.com>

Replace the magic constant (6) with define PCI_STD_NUM_BARS representing
the number of PCI BARs.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/pci/controller/pci-hyperv.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b625458afa..1665c23b649f 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -307,7 +307,7 @@ struct pci_bus_relations {
 struct pci_q_res_req_response {
 	struct vmpacket_descriptor hdr;
 	s32 status;			/* negative values are failures */
-	u32 probed_bar[6];
+	u32 probed_bar[PCI_STD_NUM_BARS];
 } __packed;
 
 struct pci_set_power {
@@ -503,7 +503,7 @@ struct hv_pci_dev {
 	 * What would be observed if one wrote 0xFFFFFFFF to a BAR and then
 	 * read it back, for each of the BAR offsets within config space.
 	 */
-	u32 probed_bar[6];
+	u32 probed_bar[PCI_STD_NUM_BARS];
 };
 
 struct hv_pci_compl {
@@ -1327,7 +1327,7 @@ static void survey_child_resources(struct hv_pcibus_device *hbus)
 	 * so it's sufficient to just add them up without tracking alignment.
 	 */
 	list_for_each_entry(hpdev, &hbus->children, list_entry) {
-		for (i = 0; i < 6; i++) {
+		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 			if (hpdev->probed_bar[i] & PCI_BASE_ADDRESS_SPACE_IO)
 				dev_err(&hbus->hdev->device,
 					"There's an I/O BAR in this list!\n");
@@ -1401,7 +1401,7 @@ static void prepopulate_bars(struct hv_pcibus_device *hbus)
 	/* Pick addresses for the BARs. */
 	do {
 		list_for_each_entry(hpdev, &hbus->children, list_entry) {
-			for (i = 0; i < 6; i++) {
+			for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 				bar_val = hpdev->probed_bar[i];
 				if (bar_val == 0)
 					continue;
@@ -1558,7 +1558,7 @@ static void q_resource_requirements(void *context, struct pci_response *resp,
 			"query resource requirements failed: %x\n",
 			resp->status);
 	} else {
-		for (i = 0; i < 6; i++) {
+		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 			completion->hpdev->probed_bar[i] =
 				q_res_req->probed_bar[i];
 		}
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Jim Mattson @ 2019-09-16 17:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-2-vkuznets@redhat.com>

On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> KVM needs to know if SMT is theoretically possible, this means it is
> supported and not forcefully disabled ('nosmt=force'). Create and
> export cpu_smt_possible() answering this question.

It seems to me that KVM really just wants to know if the scheduler can
be trusted to avoid violating the invariant expressed by the Hyper-V
enlightenment, NoNonArchitecturalCoreSharing. It is possible to do
that even when SMT is enabled, if the scheduler is core-aware.
Wouldn't it be better to implement a scheduler API that told you
exactly what you wanted to know, rather than trying to infer the
answer from various breadcrumbs?

^ permalink raw reply

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Jim Mattson @ 2019-09-16 16:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-3-vkuznets@redhat.com>

On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
> guarantee that two virtual processors won't end up running on sibling SMT
> threads without knowing about it. This is done as an optimization as in
> this case there is nothing the guest can do to protect itself against MDS
> and issuing additional flush requests is just pointless. On bare metal the
> topology is known, however, when Hyper-V is running nested (e.g. on top of
> KVM) it needs an additional piece of information: a confirmation that the
> exposed topology (wrt vCPU placement on different SMT threads) is
> trustworthy.
>
> NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
> TLFS as follows: "Indicates that a virtual processor will never share a
> physical core with another virtual processor, except for virtual processors
> that are reported as sibling SMT threads." From KVM we can give such
> guarantee in two cases:
> - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
>  as it can become re-enabled during the lifetime of the guest).
> - vCPUs are properly pinned so the scheduler won't put them on sibling
> SMT threads (when they're not reported as such).

That's a nice bit of information. Have you considered a mechanism for
communicating this information to kvm guests in a way that doesn't
require Hyper-V enlightenments?

> This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
> first case. The second case is outside of KVM's domain of responsibility
> (as vCPU pinning is actually done by someone who manages KVM's userspace -
> e.g. libvirt pinning QEMU threads).
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 7 +++++++
>  arch/x86/kvm/hyperv.c              | 4 +++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index af78cd72b8f3..989a1efe7f5e 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -170,6 +170,13 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED            BIT(14)
>
> +/*
> + * Virtual processor will never share a physical core with another virtual
> + * processor, except for virtual processors that are reported as sibling SMT
> + * threads.
> + */
> +#define HV_X64_NO_NONARCH_CORESHARING                  BIT(18)
> +
>  /* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
>  #define HV_X64_NESTED_GUEST_MAPPING_FLUSH              BIT(18)
>  #define HV_X64_NESTED_MSR_BITMAP                       BIT(19)
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index fff790a3f4ee..9c187d16a9cd 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -23,6 +23,7 @@
>  #include "ioapic.h"
>  #include "hyperv.h"
>
> +#include <linux/cpu.h>
>  #include <linux/kvm_host.h>
>  #include <linux/highmem.h>
>  #include <linux/sched/cputime.h>
> @@ -1864,7 +1865,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>                         ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
>                         if (evmcs_ver)
>                                 ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> -
> +                       if (!cpu_smt_possible())
> +                               ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
>                         /*
>                          * Default number of spinlock retry attempts, matches
>                          * HyperV 2016.
> --
> 2.20.1
>

^ permalink raw reply

* [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

KVM needs to know if SMT is theoretically possible, this means it is
supported and not forcefully disabled ('nosmt=force'). Create and
export cpu_smt_possible() answering this question.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/linux/cpu.h |  2 ++
 kernel/cpu.c        | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index fcb1386bb0d4..6d48fc456d58 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -201,12 +201,14 @@ enum cpuhp_smt_control {
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
 extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control		(CPU_SMT_NOT_IMPLEMENTED)
 static inline void cpu_smt_disable(bool force) { }
 static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
 #endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e84c0873559e..2f8c2631e641 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -389,8 +389,7 @@ enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
 {
-	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
-		cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+	if (!cpu_smt_possible())
 		return;
 
 	if (force) {
@@ -435,6 +434,14 @@ static inline bool cpu_smt_allowed(unsigned int cpu)
 	 */
 	return !per_cpu(cpuhp_state, cpu).booted_once;
 }
+
+/* Returns true if SMT is not supported of forcefully (irreversibly) disabled */
+bool cpu_smt_possible(void)
+{
+	return cpu_smt_control != CPU_SMT_FORCE_DISABLED &&
+		cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(cpu_smt_possible);
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
 #endif
-- 
2.20.1


^ permalink raw reply related

* [PATCH 3/3] KVM: selftests: hyperv_cpuid: add check for NoNonArchitecturalCoreSharing bit
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

The bit is supposed to be '1' when SMT is not supported or forcefully
disabled and '0' otherwise.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/x86_64/hyperv_cpuid.c       | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
index ee59831fbc98..443a2b54645b 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
@@ -26,6 +26,25 @@ static void guest_code(void)
 {
 }
 
+static int smt_possible(void)
+{
+	char buf[16];
+	FILE *f;
+	bool res = 1;
+
+	f = fopen("/sys/devices/system/cpu/smt/control", "r");
+	if (f) {
+		if (fread(buf, sizeof(*buf), sizeof(buf), f) > 0) {
+			if (!strncmp(buf, "forceoff", 8) ||
+			    !strncmp(buf, "notsupported", 12))
+				res = 0;
+		}
+		fclose(f);
+	}
+
+	return res;
+}
+
 static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries,
 			  int evmcs_enabled)
 {
@@ -59,6 +78,14 @@ static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries,
 		TEST_ASSERT(!entry->padding[0] && !entry->padding[1] &&
 			    !entry->padding[2], "padding should be zero");
 
+		if (entry->function == 0x40000004) {
+			int nononarchcs = !!(entry->eax & (1UL << 18));
+
+			TEST_ASSERT(nononarchcs == !smt_possible(),
+				    "NoNonArchitecturalCoreSharing bit"
+				    " doesn't reflect SMT setting");
+		}
+
 		/*
 		 * If needed for debug:
 		 * fprintf(stdout,
-- 
2.20.1


^ permalink raw reply related

* [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
guarantee that two virtual processors won't end up running on sibling SMT
threads without knowing about it. This is done as an optimization as in
this case there is nothing the guest can do to protect itself against MDS
and issuing additional flush requests is just pointless. On bare metal the
topology is known, however, when Hyper-V is running nested (e.g. on top of
KVM) it needs an additional piece of information: a confirmation that the
exposed topology (wrt vCPU placement on different SMT threads) is
trustworthy.

NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
TLFS as follows: "Indicates that a virtual processor will never share a
physical core with another virtual processor, except for virtual processors
that are reported as sibling SMT threads." From KVM we can give such
guarantee in two cases:
- SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
 as it can become re-enabled during the lifetime of the guest).
- vCPUs are properly pinned so the scheduler won't put them on sibling
SMT threads (when they're not reported as such).

This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
first case. The second case is outside of KVM's domain of responsibility
(as vCPU pinning is actually done by someone who manages KVM's userspace -
e.g. libvirt pinning QEMU threads).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 7 +++++++
 arch/x86/kvm/hyperv.c              | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index af78cd72b8f3..989a1efe7f5e 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -170,6 +170,13 @@
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
+/*
+ * Virtual processor will never share a physical core with another virtual
+ * processor, except for virtual processors that are reported as sibling SMT
+ * threads.
+ */
+#define HV_X64_NO_NONARCH_CORESHARING                  BIT(18)
+
 /* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
 #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
 #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fff790a3f4ee..9c187d16a9cd 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,6 +23,7 @@
 #include "ioapic.h"
 #include "hyperv.h"
 
+#include <linux/cpu.h>
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
@@ -1864,7 +1865,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 			ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
 			if (evmcs_ver)
 				ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-
+			if (!cpu_smt_possible())
+				ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
 			/*
 			 * Default number of spinlock retry attempts, matches
 			 * HyperV 2016.
-- 
2.20.1


^ permalink raw reply related

* [PATCH 0/3] KVM: x86: hyper-v: make L2 Hyper-V 2019 on KVM guests see MD_CLEAR
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan

[The series is KVM specific but the first patch of the series likely requires
 someone else's ACK. hyperv-tlfs.h gets a small addition too.]

It was discovered that L2 guests on Hyper-V 2019 on KVM don't see MD_CLEAR
bit (and thus think they're MDS vulnerable) even when it is present on the
host. Turns out, Hyper-V is filtering it out because it is not sure the
topology L0 is exposing is trustworthy and generally it is not. In some
specific cases (e.g. when SMT is unsupported or forcesully disabled) it is
and we can tell this to userspace hoping that it'll pass this info to L1.
See PATCH2 of the series for additional details.

The series can be tested with QEMU-4.1+ and 'hv-passthrough' CPU flag.

Vitaly Kuznetsov (3):
  cpu/SMT: create and export cpu_smt_possible()
  KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when
    SMT is impossible
  KVM: selftests: hyperv_cpuid: add check for
    NoNonArchitecturalCoreSharing bit

 arch/x86/include/asm/hyperv-tlfs.h            |  7 +++++
 arch/x86/kvm/hyperv.c                         |  4 ++-
 include/linux/cpu.h                           |  2 ++
 kernel/cpu.c                                  | 11 ++++++--
 .../selftests/kvm/x86_64/hyperv_cpuid.c       | 27 +++++++++++++++++++
 5 files changed, 48 insertions(+), 3 deletions(-)

-- 
2.20.1


^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Vitaly Kuznetsov @ 2019-09-16  8:45 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB0169C6B4A78787930CC9ED4FBFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Dexuan Cui <decui@microsoft.com> writes:

>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Sent: Thursday, September 12, 2019 9:37 AM
>
>> > +static int util_suspend(struct hv_device *dev)
>> > +{
>> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
>> > +
>> > +	if (srv->util_cancel_work)
>> > +		srv->util_cancel_work();
>> > +
>> > +	vmbus_close(dev->channel);
>> 
>> And what happens if you receive a real reply from userspace after you
>> close the channel? You've only cancelled timeout work so the driver
>> will not try to reply by itself but this doesn't mean we won't try to
>> write to the channel on legitimate replies from daemons.
>> 
>> I think you need to block all userspace requests (hang in kernel until
>> util_resume()).
>> 
>> While I'm not sure we can't get away without it but I'd suggest we add a
>> new HVUTIL_SUSPENDED state to the hvutil state machine.
>> Vitaly
>
> When we reach util_suspend(), all the userspace processes have been
> frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
> so here we can not receive a reply from the userspace daemon.
>

Let's add a WARN() or something then as if this ever happens this is
going to be realy tricky to catch.

> However, it looks there is indeed some tricky corner cases we need to deal
> with: in util_resume(), before we call vmbus_open(), all the userspace
> processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon) 
> can be in any of these states:
>
> 1. the driver has not buffered any message for the daemon. This is good.
>
> 2. the driver has buffered a message for the daemon, and
> kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
> writes the response to the driver, and in kvp_on_msg() 
> kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
> cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
> been cancelled by util_suspend()), the driver reports nothing to the host,
> which is good as the VM has undergone a hibernation event and IMO the
> response may be stale and I believe the host is not expecting this 
> response from the VM at all (the host side application that requested the
> KVP must have failed or timed out), but the bad thing is: the "state"
> remains in HVUTIL_USERSPACE_RECV, preventing
> hv_kvp_onchannelcallback() from reading from the channel ringbuffer.
>
> I suggest we work around this race condition by the below patch:
>
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
>          */
>         if (cancel_delayed_work_sync(&kvp_timeout_work)) {
>                 kvp_respond_to_host(message, error);
> -               hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>         }
> +       hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>
>         return 0;
>  }
>
> How do you like this?
>

Is it safe to call hv_poll_channel() simultaneously on several CPUs? It
seems it is as we're doing 

smp_call_function_single(channel->target_cpu, cb, channel, true);

(I'm asking because if it's not, than doing what you suggest will open
the following window: timeout function kvp_timeout_func() is already
running but the daemon is trying to reply at the same moment).

> I can imagine there is still a small chance that the state machine can run
> out of order, and the kvp daemon may exit due to the return values of
> read()/write() being -EINVAL, but the chance should be small enough in
> practice, and IMO the issue even exists in the current code when
> hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg() 
> can run concurrently; if kvp_on_msg() runs slowly due to some reason 
> (e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func()
> fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
> to run and returns -EINVAL, and the kvp daemon will exit().
>
> IMHO here it looks extremely difficult to make things flawless (if that's
> even possible), so it's necessary to ask the daemons to auto-restart once
> they exit() unexpectedly. This can be achieved by configuring systemd
> properly for the kvp/vss/fcopy services. 

I think we can also teach daemons to ignore -EINVAL or switch to
something like -EAGAIN in non-fatal cases.
 
>
> I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
> of the corner cases, but I'm sure that would further complicate the
> current code, at least to me. :-)
>

Maybe, if we don't need it than we don't. Basically, I see the only
advantage in having such state: it makes our tricks to support
hibernation more visible in the code.

-- 
Vitaly


^ permalink raw reply

* RE: [PATCH] hv_balloon: Add the support of hibernation
From: Dexuan Cui @ 2019-09-14  0:26 UTC (permalink / raw)
  To: David Hildenbrand, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <ef6f8554-8324-a4d8-4549-759495e482b7@redhat.com>

> From: David Hildenbrand <david@redhat.com>
> Sent: Friday, September 13, 2019 2:44 PM
>
> > On recent Windows Server 2019+ hosts, the toolstacks on the hosts
> > guarantees that Dynamic Memory and Memory Resizing can not be enabled
> > if the virtual ACPI S4 state is enabled, and vice versa. Please refer to the
> > long write-up I made here.
>
> Hah, so the patch here is not actually relevant for modern Hyper-V

Correct.

> installations. (I would have loved to read that in the patch description
> - but maybe I missed that)

I'll add the related description into the changelog of v2 of this patch.

> > And, to make the hibernation functionality automated, the host is able to
> > send a "please hibernate" message to the VM via the Hyper-V shutdown
> > device upon the user's request (e.g. via GUI or scripting): see [...]
> > When the host sends the message,
> > it checks if the virtual ACPI S4 state is enabled for the VM: if not, the host
> > refuses to send the message. This means that the user does want to make
> > sure the virtual ACPI S4 state is enabled for the VM, if the user of the VM
> > wants to use the hibernation feature, and this means Dynamic Memory
> > and Memory Resizing can not be active due to the restrictions from the
> > host toolstack.
>
> Okay, *but* this is a current limitation. Just saying. If you could at
> least support balloon inflate/deflate, that would be a clear win for
> users. And less configuration knobs.

For Hyper-V (on recently hosts), Dynamic Memory (and Memory Resizing)
and hibernation are mutually exclusive and as I mentioned the host toolstack
guarantees they can not be both enabled. This is a host limitation and the VM
(i.e. we the Linux team) can do nothing about this. Note: here "enable
hibernation for a VM" means "enable the virtual ACPI S4 state for the VM".

By default a VM running on Hyper-V doesn't have the S4 state enabled, and
balloon inflate/deflate are indeed supported.

The knob (I think you mean the virtual ACPI S4 state) is introduced in the
host side design of the VM hibernation feature, and is enforced in the
host toolstack (as I described about the host-to-VM "please hibernate"
message). No knob or module parameter is introduced by the VM here.

> > And the hibernation functionality won't be officially supported on old
> > Windows Server hosts.
> >
> > So, IMHO we can't be bother to implement the idea you described in
> > detail. Sorry. :-)
>
> No worries, I neither develop for, use or work with Hyper-V. I was just
> reading along and wondering why you basically make the hv_balloon
> unusable in these environments. (initially I thought, "why don't you
> just disallow probing the device completely")

The Hyper-V team told me that: when hibernation is enabled & used for
a VM the only purpose of loading hv_balloon is that the driver can
still report the VM's memory pressure to the host, and it looks due to
some (non-technical?) reason the Hyper-V team thinks this info can be
useful.

> I am aware of the (hypervisor) issues of hibernation/suspend when it
> comes to balloon drivers / memory hot(un)plug. (currently working on
> virtio-mem myself and initially decided to block any
> hibernation/suspension attempts in case the driver is loaded and memory
> was plugged/unplugged)
>
> >
> > And, while I agree your idea is good, technically speaking I suspect it may
> > not be really useful, because once hv_balloon allows balloon-up/down,
> > hv_balloon effectively loses control of memory pages: after the host
> > takes some memory away, the VM never knows when exactly the
> > host will give it back -- actually the host never guarantees how soon
> > it will give the memory back. Consequently, the VM almost immediately
> > ends up in an un-hibernatable state...
> If you go via the host, you might be able to make sure to request to
> deflate the balloon before you try to hibernate, and inflate again when
> back up. You might even ask the user for permissions. Of course, once
> you deflated the balloon, it might not be guaranteed to inflate the
> balloon to the original size. But after all, it's "dynamic memory", so
> it might even be what the name suggests. It could be very well
> controlled from the host.
>
> If you go via the guest, you would first have to tell your hypervisor
> "please allow me to deflate so I can hibernate", or something like that.
> After hibernation (or some time X), the host might then decide to
> inflate again.
>
> E.g., take a look at virtio-balloon. When suspending, it simply deflates
> (without asking ...), to inflate again when resuming. Not saying that's
> the best approach (it's not :) ), but one approach to at least make it work.

Yes, I noticed this a few months ago. I think a major difference in Hyper-V
ballooning mechanism is that: all the deflate/inflate requests are from
the host and the VM can never proactively ask the host to deflate/inflate
the VM's memory. All that the VM can do is report its memory pressure
to the host and hope the host will soon give back the memory that was
taken away by the host.

I personally like the approach used in virtio-balloon. :-)

> Anyhow, just some comments from my side :) I can see how Windows Server
> worked around that issue right now by just XOR'ing both features.
>
> David / dhildenb

Thanks for sharing your thoughts!

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Michael Kelley @ 2019-09-13 23:31 UTC (permalink / raw)
  To: Wei Hu, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan, Dexuan Cui
In-Reply-To: <20190913060209.3604-1-weh@microsoft.com>

From: Wei Hu <weh@microsoft.com> Sent: Thursday, September 12, 2019 11:03 PM
> 
> Without deferred IO support, hyperv_fb driver informs the host to refresh
> the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> is screen update or not. This patch supports deferred IO for screens in
> graphics mode and also enables the frame buffer on-demand refresh. The
> highest refresh rate is still set at 20Hz.
> 
> Currently Hyper-V only takes a physical address from guest as the starting
> address of frame buffer. This implies the guest must allocate contiguous
> physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> accept address from MMIO region as frame buffer address. Due to these
> limitations on Hyper-V host, we keep a shadow copy of frame buffer
> in the guest. This means one more copy of the dirty rectangle inside
> guest when doing the on-demand refresh. This can be optimized in the
> future with help from host. For now the host performance gain from deferred
> IO outweighs the shadow copy impact in the guest.
> 
> Signed-off-by: Wei Hu <weh@microsoft.com>
> ---
>     v2: Incorporated review comments from Michael Kelley
>     - Increased dirty rectangle by one row in deferred IO case when sending
>     to Hyper-V.
>     - Corrected the dirty rectangle size in the text mode.
>     - Added more comments.
>     - Other minor code cleanups.
> 
>     v3: Incorporated more review comments
>     - Removed a few unnecessary variable tests
> 
>     v4: Incorporated test and review feedback from Dexuan Cui
>     - Not disable interrupt while acquiring docopy_lock in
>       hvfb_update_work(). This avoids significant bootup delay in
>       large vCPU count VMs.
> 
>     v5: Completely remove the unnecessary docopy_lock after discussing
>     with Dexuan Cui.
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* Re: [PATCH] scsi: storvsc: Add the support of hibernation
From: Martin K. Petersen @ 2019-09-13 22:47 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	linux-hyperv@vger.kernel.org, linux-scsi@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <1568244905-66625-1-git-send-email-decui@microsoft.com>


Dexuan,

> When we're in storvsc_suspend(), we're sure the SCSI layer has
> quiesced the scsi device by scsi_bus_suspend() -> ... ->
> scsi_device_quiesce(), so the low level SCSI adapter driver only needs
> to suspend/resume its own state.

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH] hv_balloon: Add the support of hibernation
From: David Hildenbrand @ 2019-09-13 21:44 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB01691EC455AAF37BC6AF26DDBFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On 13.09.19 22:54, Dexuan Cui wrote:
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Friday, September 13, 2019 12:46 AM
>>
>> On 12.09.19 21:18, Dexuan Cui wrote:
>>> 3. Hibernation can be especially useful when we pass through a PCIe device,
>>> e.g. a NIC, a NVMe controller or a GPU, to the VM, as usually save/restore
>>> and live migration can not work with this kind of configuration, because
>>> usually the host doesn't know how to save/restore the state of the PCIe
>>> device.
>>
>> Interesting. Under QEMU/KVM (especially for migration), the discussed
>> solutions I am aware of rather wanted to temporarily unplug the PCI
>> devices or replace them with some kind of "standby" device temporarily.
> 
> For the complex devices like a modern GPU, there may not be an 
> equivalent "standby" software-emulated device for it, and unplugging the
> PCI device temporarily is not good, as it may not be transparent to the
> userspace applications. Hibernation here is especially useful, e.g. to Virtual
> Desktop Infrastructure users whose VMs can own physical GPUs, because
> all the userspace applications are frozen when the VM is hibernated, and
> when the VM resumes back, the applications are automatically resumed 
> and continue to run seamlessly, at least in theory. A hibernated VM saves
> compute resources and cost for the users.

Yes, I can see how GPUs might be problematic, especially for desktop
infrastructures (and maybe especially when running specific guest
operating systems :) ). Thanks for the explanation.

[...]

> On recent Windows Server 2019+ hosts, the toolstacks on the hosts
> guarantees that Dynamic Memory and Memory Resizing can not be enabled
> if the virtual ACPI S4 state is enabled, and vice versa. Please refer to the
> long write-up I made here: https://lkml.org/lkml/2019/9/5/1160 .

Hah, so the patch here is not actually relevant for modern Hyper-V
installations. (I would have loved to read that in the patch description
- but maybe I missed that)

> 
> And, to make the hibernation functionality automated, the host is able to
> send a "please hibernate" message to the VM via the Hyper-V shutdown
> device upon the user's request (e.g. via GUI or scripting): see 
> https://lkml.org/lkml/2019/9/13/811 . When the host sends the message,
> it checks if the virtual ACPI S4 state is enabled for the VM: if not, the host
> refuses to send the message. This means that the user does want to make
> sure the virtual ACPI S4 state is enabled for the VM, if the user of the VM
> wants to use the hibernation feature, and this means Dynamic Memory
> and Memory Resizing can not be active due to the restrictions from the 
> host toolstack.

Okay, *but* this is a current limitation. Just saying. If you could at
least support balloon inflate/deflate, that would be a clear win for
users. And less configuration knobs.

> 
> And the hibernation functionality won't be officially supported on old
> Windows Server hosts.
> 
> So, IMHO we can't be bother to implement the idea you described in
> detail. Sorry. :-)

No worries, I neither develop for, use or work with Hyper-V. I was just
reading along and wondering why you basically make the hv_balloon
unusable in these environments. (initially I thought, "why don't you
just disallow probing the device completely")

I am aware of the (hypervisor) issues of hibernation/suspend when it
comes to balloon drivers / memory hot(un)plug. (currently working on
virtio-mem myself and initially decided to block any
hibernation/suspension attempts in case the driver is loaded and memory
was plugged/unplugged)

> 
> And, while I agree your idea is good, technically speaking I suspect it may
> not be really useful, because once hv_balloon allows balloon-up/down,
> hv_balloon effectively loses control of memory pages: after the host
> takes some memory away, the VM never knows when exactly the
> host will give it back -- actually the host never guarantees how soon
> it will give the memory back. Consequently, the VM almost immediately
> ends up in an un-hibernatable state...
If you go via the host, you might be able to make sure to request to
deflate the balloon before you try to hibernate, and inflate again when
back up. You might even ask the user for permissions. Of course, once
you deflated the balloon, it might not be guaranteed to inflate the
balloon to the original size. But after all, it's "dynamic memory", so
it might even be what the name suggests. It could be very well
controlled from the host.

If you go via the guest, you would first have to tell your hypervisor
"please allow me to deflate so I can hibernate", or something like that.
After hibernation (or some time X), the host might then decide to
inflate again.

E.g., take a look at virtio-balloon. When suspending, it simply deflates
(without asking ...), to inflate again when resuming. Not saying that's
the best approach (it's not :) ), but one approach to at least make it work.

Anyhow, just some comments from my side :) I can see how Windows Server
worked around that issue right now by just XOR'ing both features.

> 
> Thanks,
> -- Dexuan
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply

* RE: [PATCH] hv_balloon: Add the support of hibernation
From: Dexuan Cui @ 2019-09-13 20:54 UTC (permalink / raw)
  To: David Hildenbrand, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <7d218fd5-76d9-f5fa-548a-76fe5dfab230@redhat.com>

> From: David Hildenbrand <david@redhat.com>
> Sent: Friday, September 13, 2019 12:46 AM
> 
> On 12.09.19 21:18, Dexuan Cui wrote:
> > 3. Hibernation can be especially useful when we pass through a PCIe device,
> > e.g. a NIC, a NVMe controller or a GPU, to the VM, as usually save/restore
> > and live migration can not work with this kind of configuration, because
> > usually the host doesn't know how to save/restore the state of the PCIe
> > device.
> 
> Interesting. Under QEMU/KVM (especially for migration), the discussed
> solutions I am aware of rather wanted to temporarily unplug the PCI
> devices or replace them with some kind of "standby" device temporarily.

For the complex devices like a modern GPU, there may not be an 
equivalent "standby" software-emulated device for it, and unplugging the
PCI device temporarily is not good, as it may not be transparent to the
userspace applications. Hibernation here is especially useful, e.g. to Virtual
Desktop Infrastructure users whose VMs can own physical GPUs, because
all the userspace applications are frozen when the VM is hibernated, and
when the VM resumes back, the applications are automatically resumed 
and continue to run seamlessly, at least in theory. A hibernated VM saves
compute resources and cost for the users.
 
> Anyhow, would it also be an option for you instead of making the balloon
> basically useless in case the virtual ACPI S4 state is enabled to
> 
> a) Remember if there was a harmful requests that was processed (memory
> add, balloon up, balloon down) - or if the device is *currently* in an
> un-hibernatable state. E.g., if somebody inflated the balloon, you can't
> hibernate. But if the balloon was deflated again, you can again hibernate.
> 
> b) Block hibernation in balloon_suspend() in case the device is in such
> an un-hibernatable state.
> 
> 
> Then you don't need hv_is_hibernation_supported(). The VM is able to
> hibernate as long as Dynamic Memory and Memory Resizing was not used.
> This is something that can be documented perfectly well.
> 
> David / dhildenb

On recent Windows Server 2019+ hosts, the toolstacks on the hosts
guarantees that Dynamic Memory and Memory Resizing can not be enabled
if the virtual ACPI S4 state is enabled, and vice versa. Please refer to the
long write-up I made here: https://lkml.org/lkml/2019/9/5/1160 .

And, to make the hibernation functionality automated, the host is able to
send a "please hibernate" message to the VM via the Hyper-V shutdown
device upon the user's request (e.g. via GUI or scripting): see 
https://lkml.org/lkml/2019/9/13/811 . When the host sends the message,
it checks if the virtual ACPI S4 state is enabled for the VM: if not, the host
refuses to send the message. This means that the user does want to make
sure the virtual ACPI S4 state is enabled for the VM, if the user of the VM
wants to use the hibernation feature, and this means Dynamic Memory
and Memory Resizing can not be active due to the restrictions from the 
host toolstack.

And the hibernation functionality won't be officially supported on old
Windows Server hosts.

So, IMHO we can't be bother to implement the idea you described in
detail. Sorry. :-)

And, while I agree your idea is good, technically speaking I suspect it may
not be really useful, because once hv_balloon allows balloon-up/down,
hv_balloon effectively loses control of memory pages: after the host
takes some memory away, the VM never knows when exactly the
host will give it back -- actually the host never guarantees how soon
it will give the memory back. Consequently, the VM almost immediately
ends up in an un-hibernatable state...

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH][PATCH net-next] hv_sock: Add the support of hibernation
From: Dexuan Cui @ 2019-09-13 20:13 UTC (permalink / raw)
  To: David Miller, sashal@kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <20190913.210343.724088723062134961.davem@davemloft.net>

> From: David Miller <davem@davemloft.net>
> Sent: Friday, September 13, 2019 1:04 PM
> 
> From: Dexuan Cui <decui@microsoft.com>
> Date: Wed, 11 Sep 2019 23:37:27 +0000
> > I request this patch should go through Sasha's tree rather than the
> > net-next tree.
> 
> That's fine:
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Thanks, David!

@Sasha: I found a few typos in my comment below. I'll post a v2.

> > +/* hv_sock connections can not persist across hibernation, and all the hv_sock
> >  + * channels are forceed to be rescinded before hibernation: see

forceed -> forced

> >  + * are only needed because hibernation requires that every device's driver

every device's driver -> every vmbus device's driver

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH][PATCH net-next] hv_sock: Add the support of hibernation
From: David Miller @ 2019-09-13 20:03 UTC (permalink / raw)
  To: decui
  Cc: kys, haiyangz, sthemmin, sashal, linux-hyperv, netdev,
	linux-kernel, mikelley
In-Reply-To: <1568245042-66967-1-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com>
Date: Wed, 11 Sep 2019 23:37:27 +0000

> Add the necessary dummy callbacks for hibernation.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> This patch is basically a pure Hyper-V specific change and it has a
> build dependency on the commit 271b2224d42f ("Drivers: hv: vmbus: Implement
> suspend/resume for VSC drivers for hibernation"), which is on Sasha Levin's
> Hyper-V tree's hyperv-next branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-next
> 
> I request this patch should go through Sasha's tree rather than the
> net-next tree.

That's fine:

Acked-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Dexuan Cui @ 2019-09-13 19:15 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <877e6dcvzj.fsf@vitty.brq.redhat.com>

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Thursday, September 12, 2019 9:37 AM

> > +static int util_suspend(struct hv_device *dev)
> > +{
> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
> > +
> > +	if (srv->util_cancel_work)
> > +		srv->util_cancel_work();
> > +
> > +	vmbus_close(dev->channel);
> 
> And what happens if you receive a real reply from userspace after you
> close the channel? You've only cancelled timeout work so the driver
> will not try to reply by itself but this doesn't mean we won't try to
> write to the channel on legitimate replies from daemons.
> 
> I think you need to block all userspace requests (hang in kernel until
> util_resume()).
> 
> While I'm not sure we can't get away without it but I'd suggest we add a
> new HVUTIL_SUSPENDED state to the hvutil state machine.
> Vitaly

When we reach util_suspend(), all the userspace processes have been
frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
so here we can not receive a reply from the userspace daemon.

However, it looks there is indeed some tricky corner cases we need to deal
with: in util_resume(), before we call vmbus_open(), all the userspace
processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon) 
can be in any of these states:

1. the driver has not buffered any message for the daemon. This is good.

2. the driver has buffered a message for the daemon, and
kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
writes the response to the driver, and in kvp_on_msg() 
kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
been cancelled by util_suspend()), the driver reports nothing to the host,
which is good as the VM has undergone a hibernation event and IMO the
response may be stale and I believe the host is not expecting this 
response from the VM at all (the host side application that requested the
KVP must have failed or timed out), but the bad thing is: the "state"
remains in HVUTIL_USERSPACE_RECV, preventing
hv_kvp_onchannelcallback() from reading from the channel ringbuffer.

I suggest we work around this race condition by the below patch:

--- a/drivers/hv/hv_kvp.c
+++ b/drivers/hv/hv_kvp.c
@@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
         */
        if (cancel_delayed_work_sync(&kvp_timeout_work)) {
                kvp_respond_to_host(message, error);
-               hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
        }
+       hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);

        return 0;
 }

How do you like this?

I can imagine there is still a small chance that the state machine can run
out of order, and the kvp daemon may exit due to the return values of
read()/write() being -EINVAL, but the chance should be small enough in
practice, and IMO the issue even exists in the current code when
hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg() 
can run concurrently; if kvp_on_msg() runs slowly due to some reason 
(e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func()
fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
to run and returns -EINVAL, and the kvp daemon will exit().

IMHO here it looks extremely difficult to make things flawless (if that's
even possible), so it's necessary to ask the daemons to auto-restart once
they exit() unexpectedly. This can be achieved by configuring systemd
properly for the kvp/vss/fcopy services. 

I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
of the corner cases, but I'm sure that would further complicate the
current code, at least to me. :-)

Please share your thoughts. I believe you know all of these much
better than me, as you're the author of the state machine. :-)

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH 2/3] hv_utils: Support host-initiated hibernation request
From: Dexuan Cui @ 2019-09-13 16:42 UTC (permalink / raw)
  To: vkuznets
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <87a7b9cwfj.fsf@vitty.brq.redhat.com>

> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Thursday, September 12, 2019 9:27 AM
> 
> Dexuan Cui <decui@microsoft.com> writes:
> 
> > +static void perform_hibernation(struct work_struct *dummy)
> > +{
> > +	/*
> > +	 * The user is expected to create the program, which can be a simple
> > +	 * script containing two lines:
> > +	 * #!/bin/bash
> > +	 * echo disk > /sys/power/state
> 
> 'systemctl hibernate' is what people do nowadays :-)

Thanks for sharing this command! 
 
> > +	 */
> > +	static char hibernate_cmd[PATH_MAX] = "/sbin/hyperv-hibernate";
> > +
> 
> Let's not do that (I remember when we were triggering network restart
> from netvsc and it was a lot of pain).
> 
> Receiving hybernation request from the host is similar to pushing power
> button on your desktop: an ACPI event is going to be generated and your
> userspace will somehow react to it. I see two options:
> 1) We try to hook up some existing userspace (udev?)
> 2) We write a new hyperv-daemon handling the request (with a config file
> instead of hardcoding please).
> 
> Vitaly

Thanks for the suggestions! 
I prefer the udev method, e.g. something like this:

	char *uevent_env[2] = { "EVENT=hibernate", NULL };
	kobject_uevent_env(&ctx->dev->device.kobj, KOBJ_CHANGE, uevent_env);

Then the user is expected to create the below udev rule file, which is applied
upon the host-initiated hibernation request:

root@localhost:~# cat /usr/lib/udev/rules.d/40-vm-hibernation.rules
SUBSYSTEM=="vmbus", ACTION=="change", DRIVER=="hv_utils", ENV{EVENT}=="hibernate", RUN+="/usr/bin/systemctl hibernate"

The full patch is here:
https://github.com/dcui/linux/commit/0d92b53f48a8dca92bbd3493ea9c5bd098c99623

I'll post it as v2.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH] hv_balloon: Add the support of hibernation
From: David Hildenbrand @ 2019-09-13  7:46 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB0169E922DF7A5A43C7026D82BFB00@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On 12.09.19 21:18, Dexuan Cui wrote:
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Thursday, September 12, 2019 3:09 AM
>> On 12.09.19 01:36, Dexuan Cui wrote:
>>> When hibernation is enabled, we must ignore the balloon up/down and
>>> hot-add requests from the host, if any.
>>
>> Why do you even care about supporting hibernation? Can't you just pause
>> the VM in the hypervisor and continue to live a happy life? :)
>>
>> (to be more precise, most QEMU/KVM distributions I am aware of don't
>> support suspend/hibernation of guests for said reason, so I wonder why
>> Hyper-V needs it)
> 
> In some scenarios, hibernation can be better than pause/unpause,
> save/restore and live migration:
> 
> 1. Compared to pause/unpause, the VM can power off completely with
> hibernation, and all the states are saved inside the VM image, then the
> image can be copied to another host to start the VM again, as long as
> the new host uses exactly the same configuration for the VM.

Okay, under QEMU that also works just fine via pause/unpause (e.g.,
simply migration).

> 
> 2. Compared to pause/unpause, hibernation may be more reliable, since it's
> performed by the VM kernel rather than the host, so the VM kernel may
> better tackle some clock-source/event-sensitive issues.

Not sure I agree, but maybe that's a Hyper-V special issue.

> 
> 3. Hibernation can be especially useful when we pass through a PCIe device,
> e.g. a NIC, a NVMe controller or a GPU, to the VM, as usually save/restore
> and live migration can not work with this kind of configuration, because
> usually the host doesn't know how to save/restore the state of the PCIe
> device.

Interesting. Under QEMU/KVM (especially for migration), the discussed
solutions I am aware of rather wanted to temporarily unplug the PCI
devices or replace them with some kind of "standby" device temporarily.


Anyhow, would it also be an option for you instead of making the balloon
basically useless in case the virtual ACPI S4 state is enabled to

a) Remember if there was a harmful requests that was processed (memory
add, balloon up, balloon down) - or if the device is *currently* in an
un-hibernatable state. E.g., if somebody inflated the balloon, you can't
hibernate. But if the balloon was deflated again, you can again hibernate.

b) Block hibernation in balloon_suspend() in case the device is in such
an un-hibernatable state.


Then you don't need hv_is_hibernation_supported(). The VM is able to
hibernate as long as Dynamic Memory and Memory Resizing was not used.
This is something that can be documented perfectly well.

-- 

Thanks,

David / dhildenb

^ permalink raw reply

* RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
From: Dexuan Cui @ 2019-09-13  6:38 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, b.zolnierkie@samsung.com,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stephen Hemminger, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan
  Cc: Iouri Tarassov
In-Reply-To: <DM5PR21MB0137D40DF705CDB372497266D7BB0@DM5PR21MB0137.namprd21.prod.outlook.com>

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Thursday, September 5, 2019 7:06 AM
> 
> From: Wei Hu <weh@microsoft.com> Sent: Thursday, September 5, 2019 2:12
> AM
> >
> > Beginning from Windows 10 RS5+, VM screen resolution is obtained from
> host.
> > The "video=hyperv_fb" boot time option is not needed, but still can be
> > used to overwrite what the host specifies. The VM resolution on the host
> > could be set by executing the powershell "set-vmvideo" command.
> >
> > Signed-off-by: Iouri Tarassov <iourit@microsoft.com>
> > Signed-off-by: Wei Hu <weh@microsoft.com>
> > ---
> >     v2:
> >     - Implemented fallback when version negotiation failed.
> >     - Defined full size for supported_resolution array.
> >
> >     v3:
> >     - Corrected the synthvid major and minor version comparison problem.
> >
> >     v4:
> >     - Changed function name to synthvid_ver_ge().
> >
> >  drivers/video/fbdev/hyperv_fb.c | 159
> +++++++++++++++++++++++++++++---
> >  1 file changed, 147 insertions(+), 12 deletions(-)
> >
> 
> Reviewed-by: Michael Kelley <mikelley@microsoft.com>

Looks good to me.

Reviewed-by: Dexuan Cui <decui@microsoft.com>

^ permalink raw reply

* RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Dexuan Cui @ 2019-09-13  6:37 UTC (permalink / raw)
  To: Wei Hu, Michael Kelley, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan
In-Reply-To: <20190913060209.3604-1-weh@microsoft.com>

> From: Wei Hu <weh@microsoft.com>
> Sent: Thursday, September 12, 2019 11:03 PM
> 
> Without deferred IO support, hyperv_fb driver informs the host to refresh
> the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> is screen update or not. This patch supports deferred IO for screens in
> graphics mode and also enables the frame buffer on-demand refresh. The
> highest refresh rate is still set at 20Hz.
> 
> Currently Hyper-V only takes a physical address from guest as the starting
> address of frame buffer. This implies the guest must allocate contiguous
> physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> accept address from MMIO region as frame buffer address. Due to these
> limitations on Hyper-V host, we keep a shadow copy of frame buffer
> in the guest. This means one more copy of the dirty rectangle inside
> guest when doing the on-demand refresh. This can be optimized in the
> future with help from host. For now the host performance gain from deferred
> IO outweighs the shadow copy impact in the guest.
> 
> Signed-off-by: Wei Hu <weh@microsoft.com>
> ---
>     v2: Incorporated review comments from Michael Kelley
>     - Increased dirty rectangle by one row in deferred IO case when sending
>     to Hyper-V.
>     - Corrected the dirty rectangle size in the text mode.
>     - Added more comments.
>     - Other minor code cleanups.
> 
>     v3: Incorporated more review comments
>     - Removed a few unnecessary variable tests
> 
>     v4: Incorporated test and review feedback from Dexuan Cui
>     - Not disable interrupt while acquiring docopy_lock in
>       hvfb_update_work(). This avoids significant bootup delay in
>       large vCPU count VMs.
> 
>     v5: Completely remove the unnecessary docopy_lock after discussing
>     with Dexuan Cui.

Thanks! Looks good to me.

Reviewed-by: Dexuan Cui <decui@microsoft.com>

^ permalink raw reply

* [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Wei Hu @ 2019-09-13  6:02 UTC (permalink / raw)
  To: Michael Kelley, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan, Dexuan Cui
  Cc: Wei Hu

Without deferred IO support, hyperv_fb driver informs the host to refresh
the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
is screen update or not. This patch supports deferred IO for screens in
graphics mode and also enables the frame buffer on-demand refresh. The
highest refresh rate is still set at 20Hz.

Currently Hyper-V only takes a physical address from guest as the starting
address of frame buffer. This implies the guest must allocate contiguous
physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
accept address from MMIO region as frame buffer address. Due to these
limitations on Hyper-V host, we keep a shadow copy of frame buffer
in the guest. This means one more copy of the dirty rectangle inside
guest when doing the on-demand refresh. This can be optimized in the
future with help from host. For now the host performance gain from deferred
IO outweighs the shadow copy impact in the guest.

Signed-off-by: Wei Hu <weh@microsoft.com>
---
    v2: Incorporated review comments from Michael Kelley
    - Increased dirty rectangle by one row in deferred IO case when sending
    to Hyper-V.
    - Corrected the dirty rectangle size in the text mode.
    - Added more comments.
    - Other minor code cleanups.

    v3: Incorporated more review comments
    - Removed a few unnecessary variable tests

    v4: Incorporated test and review feedback from Dexuan Cui
    - Not disable interrupt while acquiring docopy_lock in
      hvfb_update_work(). This avoids significant bootup delay in
      large vCPU count VMs.

    v5: Completely remove the unnecessary docopy_lock after discussing
    with Dexuan Cui.

 drivers/video/fbdev/Kconfig     |   1 +
 drivers/video/fbdev/hyperv_fb.c | 208 +++++++++++++++++++++++++++++---
 2 files changed, 189 insertions(+), 20 deletions(-)

diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 1b2f5f31fb6f..e781f89a1824 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -2241,6 +2241,7 @@ config FB_HYPERV
 	select FB_CFB_FILLRECT
 	select FB_CFB_COPYAREA
 	select FB_CFB_IMAGEBLIT
+	select FB_DEFERRED_IO
 	help
 	  This framebuffer driver supports Microsoft Hyper-V Synthetic Video.
 
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index fe319fc39bec..89a5ec3741b9 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -237,6 +237,7 @@ struct synthvid_msg {
 #define RING_BUFSIZE (256 * 1024)
 #define VSP_TIMEOUT (10 * HZ)
 #define HVFB_UPDATE_DELAY (HZ / 20)
+#define HVFB_ONDEMAND_THROTTLE (HZ / 20)
 
 struct hvfb_par {
 	struct fb_info *info;
@@ -256,6 +257,16 @@ struct hvfb_par {
 	bool synchronous_fb;
 
 	struct notifier_block hvfb_panic_nb;
+
+	/* Memory for deferred IO and frame buffer itself */
+	unsigned char *dio_vp;
+	unsigned char *mmio_vp;
+	unsigned long mmio_pp;
+
+	/* Dirty rectangle, protected by delayed_refresh_lock */
+	int x1, y1, x2, y2;
+	bool delayed_refresh;
+	spinlock_t delayed_refresh_lock;
 };
 
 static uint screen_width = HVFB_WIDTH;
@@ -264,6 +275,7 @@ static uint screen_width_max = HVFB_WIDTH;
 static uint screen_height_max = HVFB_HEIGHT;
 static uint screen_depth;
 static uint screen_fb_size;
+static uint dio_fb_size; /* FB size for deferred IO */
 
 /* Send message to Hyper-V host */
 static inline int synthvid_send(struct hv_device *hdev,
@@ -350,28 +362,88 @@ static int synthvid_send_ptr(struct hv_device *hdev)
 }
 
 /* Send updated screen area (dirty rectangle) location to host */
-static int synthvid_update(struct fb_info *info)
+static int
+synthvid_update(struct fb_info *info, int x1, int y1, int x2, int y2)
 {
 	struct hv_device *hdev = device_to_hv_device(info->device);
 	struct synthvid_msg msg;
 
 	memset(&msg, 0, sizeof(struct synthvid_msg));
+	if (x2 == INT_MAX)
+		x2 = info->var.xres;
+	if (y2 == INT_MAX)
+		y2 = info->var.yres;
 
 	msg.vid_hdr.type = SYNTHVID_DIRT;
 	msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
 		sizeof(struct synthvid_dirt);
 	msg.dirt.video_output = 0;
 	msg.dirt.dirt_count = 1;
-	msg.dirt.rect[0].x1 = 0;
-	msg.dirt.rect[0].y1 = 0;
-	msg.dirt.rect[0].x2 = info->var.xres;
-	msg.dirt.rect[0].y2 = info->var.yres;
+	msg.dirt.rect[0].x1 = (x1 > x2) ? 0 : x1;
+	msg.dirt.rect[0].y1 = (y1 > y2) ? 0 : y1;
+	msg.dirt.rect[0].x2 =
+		(x2 < x1 || x2 > info->var.xres) ? info->var.xres : x2;
+	msg.dirt.rect[0].y2 =
+		(y2 < y1 || y2 > info->var.yres) ? info->var.yres : y2;
 
 	synthvid_send(hdev, &msg);
 
 	return 0;
 }
 
+static void hvfb_docopy(struct hvfb_par *par,
+			unsigned long offset,
+			unsigned long size)
+{
+	if (!par || !par->mmio_vp || !par->dio_vp || !par->fb_ready ||
+	    size == 0 || offset >= dio_fb_size)
+		return;
+
+	if (offset + size > dio_fb_size)
+		size = dio_fb_size - offset;
+
+	memcpy(par->mmio_vp + offset, par->dio_vp + offset, size);
+}
+
+/* Deferred IO callback */
+static void synthvid_deferred_io(struct fb_info *p,
+				 struct list_head *pagelist)
+{
+	struct hvfb_par *par = p->par;
+	struct page *page;
+	unsigned long start, end;
+	int y1, y2, miny, maxy;
+
+	miny = INT_MAX;
+	maxy = 0;
+
+	/*
+	 * Merge dirty pages. It is possible that last page cross
+	 * over the end of frame buffer row yres. This is taken care of
+	 * in synthvid_update function by clamping the y2
+	 * value to yres.
+	 */
+	list_for_each_entry(page, pagelist, lru) {
+		start = page->index << PAGE_SHIFT;
+		end = start + PAGE_SIZE - 1;
+		y1 = start / p->fix.line_length;
+		y2 = end / p->fix.line_length;
+		miny = min_t(int, miny, y1);
+		maxy = max_t(int, maxy, y2);
+
+		/* Copy from dio space to mmio address */
+		if (par->fb_ready)
+			hvfb_docopy(par, start, PAGE_SIZE);
+	}
+
+	if (par->fb_ready)
+		synthvid_update(p, 0, miny, p->var.xres, maxy + 1);
+}
+
+static struct fb_deferred_io synthvid_defio = {
+	.delay		= HZ / 20,
+	.deferred_io	= synthvid_deferred_io,
+};
 
 /*
  * Actions on received messages from host:
@@ -618,7 +690,7 @@ static int synthvid_send_config(struct hv_device *hdev)
 	msg->vid_hdr.type = SYNTHVID_VRAM_LOCATION;
 	msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
 		sizeof(struct synthvid_vram_location);
-	msg->vram.user_ctx = msg->vram.vram_gpa = info->fix.smem_start;
+	msg->vram.user_ctx = msg->vram.vram_gpa = par->mmio_pp;
 	msg->vram.is_vram_gpa_specified = 1;
 	synthvid_send(hdev, msg);
 
@@ -628,7 +700,7 @@ static int synthvid_send_config(struct hv_device *hdev)
 		ret = -ETIMEDOUT;
 		goto out;
 	}
-	if (msg->vram_ack.user_ctx != info->fix.smem_start) {
+	if (msg->vram_ack.user_ctx != par->mmio_pp) {
 		pr_err("Unable to set VRAM location\n");
 		ret = -ENODEV;
 		goto out;
@@ -645,19 +717,77 @@ static int synthvid_send_config(struct hv_device *hdev)
 
 /*
  * Delayed work callback:
- * It is called at HVFB_UPDATE_DELAY or longer time interval to process
- * screen updates. It is re-scheduled if further update is necessary.
+ * It is scheduled to call whenever update request is received and it has
+ * not been called in last HVFB_ONDEMAND_THROTTLE time interval.
  */
 static void hvfb_update_work(struct work_struct *w)
 {
 	struct hvfb_par *par = container_of(w, struct hvfb_par, dwork.work);
 	struct fb_info *info = par->info;
+	unsigned long flags;
+	int x1, x2, y1, y2;
+	int j;
 
+	spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+	/* Reset the request flag */
+	par->delayed_refresh = false;
+
+	/* Store the dirty rectangle to local variables */
+	x1 = par->x1;
+	x2 = par->x2;
+	y1 = par->y1;
+	y2 = par->y2;
+
+	/* Clear dirty rectangle */
+	par->x1 = par->y1 = INT_MAX;
+	par->x2 = par->y2 = 0;
+
+	spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
+
+	if (x1 > info->var.xres || x2 > info->var.xres ||
+	    y1 > info->var.yres || y2 > info->var.yres || x2 <= x1)
+		return;
+
+	/* Copy the dirty rectangle to frame buffer memory */
+	for (j = y1; j < y2; j++) {
+		hvfb_docopy(par,
+			    j * info->fix.line_length +
+			    (x1 * screen_depth / 8),
+			    (x2 - x1) * screen_depth / 8);
+	}
+
+	/* Refresh */
 	if (par->fb_ready)
-		synthvid_update(info);
+		synthvid_update(info, x1, y1, x2, y2);
+}
+
+/*
+ * Control the on-demand refresh frequency. It schedules a delayed
+ * screen update if it has not yet.
+ */
+static void hvfb_ondemand_refresh_throttle(struct hvfb_par *par,
+					   int x1, int y1, int w, int h)
+{
+	unsigned long flags;
+	int x2 = x1 + w;
+	int y2 = y1 + h;
+
+	spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+
+	/* Merge dirty rectangle */
+	par->x1 = min_t(int, par->x1, x1);
+	par->y1 = min_t(int, par->y1, y1);
+	par->x2 = max_t(int, par->x2, x2);
+	par->y2 = max_t(int, par->y2, y2);
+
+	/* Schedule a delayed screen update if not yet */
+	if (par->delayed_refresh == false) {
+		schedule_delayed_work(&par->dwork,
+				      HVFB_ONDEMAND_THROTTLE);
+		par->delayed_refresh = true;
+	}
 
-	if (par->update)
-		schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
+	spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
 }
 
 static int hvfb_on_panic(struct notifier_block *nb,
@@ -669,7 +799,8 @@ static int hvfb_on_panic(struct notifier_block *nb,
 	par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
 	par->synchronous_fb = true;
 	info = par->info;
-	synthvid_update(info);
+	hvfb_docopy(par, 0, dio_fb_size);
+	synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
 
 	return NOTIFY_DONE;
 }
@@ -730,7 +861,10 @@ static void hvfb_cfb_fillrect(struct fb_info *p,
 
 	cfb_fillrect(p, rect);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, rect->dx, rect->dy,
+					       rect->width, rect->height);
 }
 
 static void hvfb_cfb_copyarea(struct fb_info *p,
@@ -740,7 +874,10 @@ static void hvfb_cfb_copyarea(struct fb_info *p,
 
 	cfb_copyarea(p, area);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, area->dx, area->dy,
+					       area->width, area->height);
 }
 
 static void hvfb_cfb_imageblit(struct fb_info *p,
@@ -750,7 +887,10 @@ static void hvfb_cfb_imageblit(struct fb_info *p,
 
 	cfb_imageblit(p, image);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, image->dx, image->dy,
+					       image->width, image->height);
 }
 
 static struct fb_ops hvfb_ops = {
@@ -809,6 +949,9 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 	resource_size_t pot_start, pot_end;
 	int ret;
 
+	dio_fb_size =
+		screen_width * screen_height * screen_depth / 8;
+
 	if (gen2vm) {
 		pot_start = 0;
 		pot_end = -1;
@@ -843,9 +986,14 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 	if (!fb_virt)
 		goto err2;
 
+	/* Allocate memory for deferred IO */
+	par->dio_vp = vzalloc(round_up(dio_fb_size, PAGE_SIZE));
+	if (par->dio_vp == NULL)
+		goto err3;
+
 	info->apertures = alloc_apertures(1);
 	if (!info->apertures)
-		goto err3;
+		goto err4;
 
 	if (gen2vm) {
 		info->apertures->ranges[0].base = screen_info.lfb_base;
@@ -857,16 +1005,23 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 		info->apertures->ranges[0].size = pci_resource_len(pdev, 0);
 	}
 
+	/* Physical address of FB device */
+	par->mmio_pp = par->mem->start;
+	/* Virtual address of FB device */
+	par->mmio_vp = (unsigned char *) fb_virt;
+
 	info->fix.smem_start = par->mem->start;
-	info->fix.smem_len = screen_fb_size;
-	info->screen_base = fb_virt;
-	info->screen_size = screen_fb_size;
+	info->fix.smem_len = dio_fb_size;
+	info->screen_base = par->dio_vp;
+	info->screen_size = dio_fb_size;
 
 	if (!gen2vm)
 		pci_dev_put(pdev);
 
 	return 0;
 
+err4:
+	vfree(par->dio_vp);
 err3:
 	iounmap(fb_virt);
 err2:
@@ -884,6 +1039,7 @@ static void hvfb_putmem(struct fb_info *info)
 {
 	struct hvfb_par *par = info->par;
 
+	vfree(par->dio_vp);
 	iounmap(info->screen_base);
 	vmbus_free_mmio(par->mem->start, screen_fb_size);
 	par->mem = NULL;
@@ -909,6 +1065,11 @@ static int hvfb_probe(struct hv_device *hdev,
 	init_completion(&par->wait);
 	INIT_DELAYED_WORK(&par->dwork, hvfb_update_work);
 
+	par->delayed_refresh = false;
+	spin_lock_init(&par->delayed_refresh_lock);
+	par->x1 = par->y1 = INT_MAX;
+	par->x2 = par->y2 = 0;
+
 	/* Connect to VSP */
 	hv_set_drvdata(hdev, info);
 	ret = synthvid_connect_vsp(hdev);
@@ -960,6 +1121,10 @@ static int hvfb_probe(struct hv_device *hdev,
 	info->fbops = &hvfb_ops;
 	info->pseudo_palette = par->pseudo_palette;
 
+	/* Initialize deferred IO */
+	info->fbdefio = &synthvid_defio;
+	fb_deferred_io_init(info);
+
 	/* Send config to host */
 	ret = synthvid_send_config(hdev);
 	if (ret)
@@ -981,6 +1146,7 @@ static int hvfb_probe(struct hv_device *hdev,
 	return 0;
 
 error:
+	fb_deferred_io_cleanup(info);
 	hvfb_putmem(info);
 error2:
 	vmbus_close(hdev->channel);
@@ -1003,6 +1169,8 @@ static int hvfb_remove(struct hv_device *hdev)
 	par->update = false;
 	par->fb_ready = false;
 
+	fb_deferred_io_cleanup(info);
+
 	unregister_framebuffer(info);
 	cancel_delayed_work_sync(&par->dwork);
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH v5 2/2] tools: hv: add vmbus testing tool
From: Branden Bonaby @ 2019-09-13  2:32 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal
  Cc: Branden Bonaby, linux-hyperv, linux-kernel
In-Reply-To: <cover.1568320416.git.brandonbonaby94@gmail.com>

This is a userspace tool to drive the testing. Currently it supports
introducing user specified delay in the host to guest communication
path on a per-channel basis.

Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
---
Changes in v4:
- Based on Harrys comments, made the tool more
  user friendly and added more error checking.

Changes in v3:
- Align python tool to match Linux coding style.

Changes in v2:
 - Move testing location to new location in debugfs.

 tools/hv/vmbus_testing | 376 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 376 insertions(+)
 create mode 100644 tools/hv/vmbus_testing

diff --git a/tools/hv/vmbus_testing b/tools/hv/vmbus_testing
new file mode 100644
index 000000000000..e7212903dd1d
--- /dev/null
+++ b/tools/hv/vmbus_testing
@@ -0,0 +1,376 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Program to allow users to fuzz test Hyper-V drivers
+# by interfacing with Hyper-V debugfs attributes.
+# Current test methods available:
+#       1. delay testing
+#
+# Current file/directory structure of hyper-V debugfs:
+#       /sys/kernel/debug/hyperv/UUID
+#       /sys/kernel/debug/hyperv/UUID/<test-state filename>
+#       /sys/kernel/debug/hyperv/UUID/<test-method sub-directory>
+#
+# author: Branden Bonaby <brandonbonaby94@gmail.com>
+
+import os
+import cmd
+import argparse
+import glob
+from argparse import RawDescriptionHelpFormatter
+from argparse import RawTextHelpFormatter
+from enum import Enum
+
+# Do not change unless, you change the debugfs attributes
+# in /drivers/hv/debugfs.c. All fuzz testing
+# attributes will start with "fuzz_test".
+
+# debugfs path for hyperv must exist before proceeding
+debugfs_hyperv_path = "/sys/kernel/debug/hyperv"
+if not os.path.isdir(debugfs_hyperv_path):
+        print("{} doesn't exist/check permissions".format(debugfs_hyperv_path))
+        exit(-1)
+
+class dev_state(Enum):
+        off = 0
+        on = 1
+
+# File names, that correspond to the files created in
+# /drivers/hv/debugfs.c
+class f_names(Enum):
+        state_f = "fuzz_test_state"
+        buff_f =  "fuzz_test_buffer_interrupt_delay"
+        mess_f =  "fuzz_test_message_delay"
+
+# Both single_actions and all_actions are used
+# for error checking and to allow for some subparser
+# names to be abbreviated. Do not abbreviate the
+# test method names, as it will become less intuitive
+# as to what the user can do. If you do decide to
+# abbreviate the test method name, make sure the main
+# function reflects this change.
+
+all_actions = [
+        "disable_all",
+        "D",
+        "enable_all",
+        "view_all",
+        "V"
+]
+
+single_actions = [
+        "disable_single",
+        "d",
+        "enable_single",
+        "view_single",
+        "v"
+]
+
+def main():
+
+        file_map = recursive_file_lookup(debugfs_hyperv_path, dict())
+        args = parse_args()
+        if (not args.action):
+                print ("Error, no options selected...exiting")
+                exit(-1)
+        arg_set = { k for (k,v) in vars(args).items() if v and k != "action" }
+        arg_set.add(args.action)
+        path = args.path if "path" in arg_set else None
+        if (path and path[-1] == "/"):
+                path = path[:-1]
+        validate_args_path(path, arg_set, file_map)
+        if (path and "enable_single" in arg_set):
+            state_path = locate_state(path, file_map)
+            set_test_state(state_path, dev_state.on.value, args.quiet)
+
+        # Use subparsers as the key for different actions
+        if ("delay" in arg_set):
+                validate_delay_values(args.delay_time)
+                if (args.enable_all):
+                        set_delay_all_devices(file_map, args.delay_time,
+                                              args.quiet)
+                else:
+                        set_delay_values(path, file_map, args.delay_time,
+                                         args.quiet)
+        elif ("disable_all" in arg_set or "D" in arg_set):
+                disable_all_testing(file_map)
+        elif ("disable_single" in arg_set or "d" in arg_set):
+                disable_testing_single_device(path, file_map)
+        elif ("view_all" in arg_set or "V" in arg_set):
+                get_all_devices_test_status(file_map)
+        elif ("view_single" in arg_set or  "v" in arg_set):
+                get_device_test_values(path, file_map)
+
+# Get the state location
+def locate_state(device, file_map):
+        return file_map[device][f_names.state_f.value]
+
+# Validate delay values to make sure they are acceptable to
+# enable delays on a device
+def validate_delay_values(delay):
+
+        if (delay[0]  == -1 and delay[1] == -1):
+                print("\nError, At least 1 value must be greater than 0")
+                exit(-1)
+        for i in delay:
+                if (i < -1 or i == 0 or i > 1000):
+                        print("\nError, Values must be  equal to -1 "
+                              "or be > 0 and <= 1000")
+                        exit(-1)
+
+# Validate argument path
+def validate_args_path(path, arg_set, file_map):
+
+        if (not path and any(element in arg_set for element in single_actions)):
+                print("Error, path (-p) REQUIRED for the specified option. "
+                      "Use (-h) to check usage.")
+                exit(-1)
+        elif (path and any(item in arg_set for item in all_actions)):
+                print("Error, path (-p) NOT REQUIRED for the specified option. "
+                      "Use (-h) to check usage." )
+                exit(-1)
+        elif (path not in file_map and any(item in arg_set
+                                           for item in single_actions)):
+                print("Error, path '{}' not a valid vmbus device".format(path))
+                exit(-1)
+
+# display Testing status of single device
+def get_device_test_values(path, file_map):
+
+        for name in file_map[path]:
+                file_location = file_map[path][name]
+                print( name + " = " + str(read_test_files(file_location)))
+
+# Create a map of the vmbus devices and their associated files
+# [key=device, value = [key = filename, value = file path]]
+def recursive_file_lookup(path, file_map):
+
+        for f_path in glob.iglob(path + '**/*'):
+                if (os.path.isfile(f_path)):
+                        if (f_path.rsplit("/",2)[0] == debugfs_hyperv_path):
+                                directory = f_path.rsplit("/",1)[0]
+                        else:
+                                directory = f_path.rsplit("/",2)[0]
+                        f_name = f_path.split("/")[-1]
+                        if (file_map.get(directory)):
+                                file_map[directory].update({f_name:f_path})
+                        else:
+                                file_map[directory] = {f_name:f_path}
+                elif (os.path.isdir(f_path)):
+                        recursive_file_lookup(f_path,file_map)
+        return file_map
+
+# display Testing state of devices
+def get_all_devices_test_status(file_map):
+
+        for device in file_map:
+                if (get_test_state(locate_state(device, file_map)) is 1):
+                        print("Testing = ON for: {}"
+                              .format(device.split("/")[5]))
+                else:
+                        print("Testing = OFF for: {}"
+                              .format(device.split("/")[5]))
+
+# read the vmbus device files, path must be absolute path before calling
+def read_test_files(path):
+        try:
+                with open(path,"r") as f:
+                        file_value = f.readline().strip()
+                return int(file_value)
+
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on file {2}"
+                      .format(errno, strerror, path))
+                exit(-1)
+        except ValueError:
+                print ("Element to int conversion error in: \n{}".format(path))
+                exit(-1)
+
+# writing to vmbus device files, path must be absolute path before calling
+def write_test_files(path, value):
+
+        try:
+                with open(path,"w") as f:
+                        f.write("{}".format(value))
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on file {2}"
+                      .format(errno, strerror, path))
+                exit(-1)
+
+# set testing state of device
+def set_test_state(state_path, state_value, quiet):
+
+        write_test_files(state_path, state_value)
+        if (get_test_state(state_path) is 1):
+                if (not quiet):
+                        print("Testing = ON for device: {}"
+                              .format(state_path.split("/")[5]))
+        else:
+                if (not quiet):
+                        print("Testing = OFF for device: {}"
+                              .format(state_path.split("/")[5]))
+
+# get testing state of device
+def get_test_state(state_path):
+        #state == 1 - test = ON
+        #state == 0 - test = OFF
+        return  read_test_files(state_path)
+
+# write 1 - 1000 microseconds, into a single device using the
+# fuzz_test_buffer_interrupt_delay and fuzz_test_message_delay
+# debugfs attributes
+def set_delay_values(device, file_map, delay_length, quiet):
+
+        try:
+                interrupt = file_map[device][f_names.buff_f.value]
+                message = file_map[device][f_names.mess_f.value]
+
+                # delay[0]- buffer interrupt delay, delay[1]- message delay
+                if (delay_length[0] >= 0 and delay_length[0] <= 1000):
+                        write_test_files(interrupt, delay_length[0])
+                if (delay_length[1] >= 0 and delay_length[1] <= 1000):
+                        write_test_files(message, delay_length[1])
+                if (not quiet):
+                        print("Buffer delay testing = {} for: {}"
+                              .format(read_test_files(interrupt),
+                                      interrupt.split("/")[5]))
+                        print("Message delay testing = {} for: {}"
+                              .format(read_test_files(message),
+                                      message.split("/")[5]))
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on files {2}{3}"
+                      .format(errno, strerror, interrupt, message))
+                exit(-1)
+
+# enabling delay testing on all devices
+def set_delay_all_devices(file_map, delay, quiet):
+
+        for device in (file_map):
+                set_test_state(locate_state(device, file_map),
+                               dev_state.on.value,
+                               quiet)
+                set_delay_values(device, file_map, delay, quiet)
+
+# disable all testing on a SINGLE device.
+def disable_testing_single_device(device, file_map):
+
+        for name in file_map[device]:
+                file_location = file_map[device][name]
+                write_test_files(file_location, dev_state.off.value)
+        print("ALL testing now OFF for {}".format(device.split("/")[-1]))
+
+# disable all testing on ALL devices
+def disable_all_testing(file_map):
+
+        for device in file_map:
+                disable_testing_single_device(device, file_map)
+
+def parse_args():
+        parser = argparse.ArgumentParser(prog = "vmbus_testing",usage ="\n"
+                "%(prog)s [delay]   [-h] [-e|-E] -t [-p]\n"
+                "%(prog)s [view_all       | V]      [-h]\n"
+                "%(prog)s [disable_all    | D]      [-h]\n"
+                "%(prog)s [disable_single | d]      [-h|-p]\n"
+                "%(prog)s [view_single    | v]      [-h|-p]\n"
+                "%(prog)s --version\n",
+                description = "\nUse lsvmbus to get vmbus device type "
+                "information.\n" "\nThe debugfs root path is "
+                "/sys/kernel/debug/hyperv",
+                formatter_class = RawDescriptionHelpFormatter)
+        subparsers = parser.add_subparsers(dest = "action")
+        parser.add_argument("--version", action = "version",
+                version = '%(prog)s 0.1.0')
+        parser.add_argument("-q","--quiet", action = "store_true",
+                help = "silence none important test messages."
+                       " This will only work when enabling testing"
+                       " on a device.")
+        # Use the path parser to hold the --path attribute so it can
+        # be shared between subparsers. Also do the same for the state
+        # parser, as all testing methods will use --enable_all and
+        # enable_single.
+        path_parser = argparse.ArgumentParser(add_help=False)
+        path_parser.add_argument("-p","--path", metavar = "",
+                help = "Debugfs path to a vmbus device. The path "
+                "must be the absolute path to the device.")
+        state_parser = argparse.ArgumentParser(add_help=False)
+        state_group = state_parser.add_mutually_exclusive_group(required = True)
+        state_group.add_argument("-E", "--enable_all", action = "store_const",
+                                 const = "enable_all",
+                                 help = "Enable the specified test type "
+                                 "on ALL vmbus devices.")
+        state_group.add_argument("-e", "--enable_single",
+                                 action = "store_const",
+                                 const = "enable_single",
+                                 help = "Enable the specified test type on a "
+                                 "SINGLE vmbus device.")
+        parser_delay = subparsers.add_parser("delay",
+                        parents = [state_parser, path_parser],
+                        help = "Delay the ring buffer interrupt or the "
+                        "ring buffer message reads in microseconds.",
+                        prog = "vmbus_testing",
+                        usage = "%(prog)s [-h]\n"
+                        "%(prog)s -E -t [value] [value]\n"
+                        "%(prog)s -e -t [value] [value] -p",
+                        description = "Delay the ring buffer interrupt for "
+                        "vmbus devices, or delay the ring buffer message "
+                        "reads for vmbus devices (both in microseconds). This "
+                        "is only on the host to guest channel.")
+        parser_delay.add_argument("-t", "--delay_time", metavar = "", nargs = 2,
+                        type = check_range, default =[0,0], required = (True),
+                        help = "Set [buffer] & [message] delay time. "
+                        "Value constraints: -1 == value "
+                        "or 0 < value <= 1000.\n"
+                        "Use -1 to keep the previous value for that delay "
+                        "type, or a value > 0 <= 1000 to change the delay "
+                        "time.")
+        parser_dis_all = subparsers.add_parser("disable_all",
+                        aliases = ['D'], prog = "vmbus_testing",
+                        usage = "%(prog)s [disable_all | D] -h\n"
+                        "%(prog)s [disable_all | D]\n",
+                        help = "Disable ALL testing on ALL vmbus devices.",
+                        description = "Disable ALL testing on ALL vmbus "
+                        "devices.")
+        parser_dis_single = subparsers.add_parser("disable_single",
+                        aliases = ['d'],
+                        parents = [path_parser], prog = "vmbus_testing",
+                        usage = "%(prog)s [disable_single | d] -h\n"
+                        "%(prog)s [disable_single | d] -p\n",
+                        help = "Disable ALL testing on a SINGLE vmbus device.",
+                        description = "Disable ALL testing on a SINGLE vmbus "
+                        "device.")
+        parser_view_all = subparsers.add_parser("view_all", aliases = ['V'],
+                        help = "View the test state for ALL vmbus devices.",
+                        prog = "vmbus_testing",
+                        usage = "%(prog)s [view_all | V] -h\n"
+                        "%(prog)s [view_all | V]\n",
+                        description = "This shows the test state for ALL the "
+                        "vmbus devices.")
+        parser_view_single = subparsers.add_parser("view_single",
+                        aliases = ['v'],parents = [path_parser],
+                        help = "View the test values for a SINGLE vmbus "
+                        "device.",
+                        description = "This shows the test values for a SINGLE "
+                        "vmbus device.", prog = "vmbus_testing",
+                        usage = "%(prog)s [view_single | v] -h\n"
+                        "%(prog)s [view_single | v] -p")
+
+        return  parser.parse_args()
+
+# value checking for range checking input in parser
+def check_range(arg1):
+
+        try:
+                val = int(arg1)
+        except ValueError as err:
+                raise argparse.ArgumentTypeError(str(err))
+        if val < -1 or val > 1000:
+                message = ("\n\nvalue must be -1 or  0 < value <= 1000. "
+                           "Value program received: {}\n").format(val)
+                raise argparse.ArgumentTypeError(message)
+        return val
+
+if __name__ == "__main__":
+        main()
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox