Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* RE: [PATCH] hv_balloon: Add the support of hibernation
From: Dexuan Cui @ 2019-09-14  0:26 UTC (permalink / raw)
  To: David Hildenbrand, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <ef6f8554-8324-a4d8-4549-759495e482b7@redhat.com>

> From: David Hildenbrand <david@redhat.com>
> Sent: Friday, September 13, 2019 2:44 PM
>
> > On recent Windows Server 2019+ hosts, the toolstacks on the hosts
> > guarantees that Dynamic Memory and Memory Resizing can not be enabled
> > if the virtual ACPI S4 state is enabled, and vice versa. Please refer to the
> > long write-up I made here.
>
> Hah, so the patch here is not actually relevant for modern Hyper-V

Correct.

> installations. (I would have loved to read that in the patch description
> - but maybe I missed that)

I'll add the related description into the changelog of v2 of this patch.

> > And, to make the hibernation functionality automated, the host is able to
> > send a "please hibernate" message to the VM via the Hyper-V shutdown
> > device upon the user's request (e.g. via GUI or scripting): see [...]
> > When the host sends the message,
> > it checks if the virtual ACPI S4 state is enabled for the VM: if not, the host
> > refuses to send the message. This means that the user does want to make
> > sure the virtual ACPI S4 state is enabled for the VM, if the user of the VM
> > wants to use the hibernation feature, and this means Dynamic Memory
> > and Memory Resizing can not be active due to the restrictions from the
> > host toolstack.
>
> Okay, *but* this is a current limitation. Just saying. If you could at
> least support balloon inflate/deflate, that would be a clear win for
> users. And less configuration knobs.

For Hyper-V (on recently hosts), Dynamic Memory (and Memory Resizing)
and hibernation are mutually exclusive and as I mentioned the host toolstack
guarantees they can not be both enabled. This is a host limitation and the VM
(i.e. we the Linux team) can do nothing about this. Note: here "enable
hibernation for a VM" means "enable the virtual ACPI S4 state for the VM".

By default a VM running on Hyper-V doesn't have the S4 state enabled, and
balloon inflate/deflate are indeed supported.

The knob (I think you mean the virtual ACPI S4 state) is introduced in the
host side design of the VM hibernation feature, and is enforced in the
host toolstack (as I described about the host-to-VM "please hibernate"
message). No knob or module parameter is introduced by the VM here.

> > And the hibernation functionality won't be officially supported on old
> > Windows Server hosts.
> >
> > So, IMHO we can't be bother to implement the idea you described in
> > detail. Sorry. :-)
>
> No worries, I neither develop for, use or work with Hyper-V. I was just
> reading along and wondering why you basically make the hv_balloon
> unusable in these environments. (initially I thought, "why don't you
> just disallow probing the device completely")

The Hyper-V team told me that: when hibernation is enabled & used for
a VM the only purpose of loading hv_balloon is that the driver can
still report the VM's memory pressure to the host, and it looks due to
some (non-technical?) reason the Hyper-V team thinks this info can be
useful.

> I am aware of the (hypervisor) issues of hibernation/suspend when it
> comes to balloon drivers / memory hot(un)plug. (currently working on
> virtio-mem myself and initially decided to block any
> hibernation/suspension attempts in case the driver is loaded and memory
> was plugged/unplugged)
>
> >
> > And, while I agree your idea is good, technically speaking I suspect it may
> > not be really useful, because once hv_balloon allows balloon-up/down,
> > hv_balloon effectively loses control of memory pages: after the host
> > takes some memory away, the VM never knows when exactly the
> > host will give it back -- actually the host never guarantees how soon
> > it will give the memory back. Consequently, the VM almost immediately
> > ends up in an un-hibernatable state...
> If you go via the host, you might be able to make sure to request to
> deflate the balloon before you try to hibernate, and inflate again when
> back up. You might even ask the user for permissions. Of course, once
> you deflated the balloon, it might not be guaranteed to inflate the
> balloon to the original size. But after all, it's "dynamic memory", so
> it might even be what the name suggests. It could be very well
> controlled from the host.
>
> If you go via the guest, you would first have to tell your hypervisor
> "please allow me to deflate so I can hibernate", or something like that.
> After hibernation (or some time X), the host might then decide to
> inflate again.
>
> E.g., take a look at virtio-balloon. When suspending, it simply deflates
> (without asking ...), to inflate again when resuming. Not saying that's
> the best approach (it's not :) ), but one approach to at least make it work.

Yes, I noticed this a few months ago. I think a major difference in Hyper-V
ballooning mechanism is that: all the deflate/inflate requests are from
the host and the VM can never proactively ask the host to deflate/inflate
the VM's memory. All that the VM can do is report its memory pressure
to the host and hope the host will soon give back the memory that was
taken away by the host.

I personally like the approach used in virtio-balloon. :-)

> Anyhow, just some comments from my side :) I can see how Windows Server
> worked around that issue right now by just XOR'ing both features.
>
> David / dhildenb

Thanks for sharing your thoughts!

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH 1/3] hv_utils: Add the support of hibernation
From: Vitaly Kuznetsov @ 2019-09-16  8:45 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, Michael Kelley
In-Reply-To: <PU1P153MB0169C6B4A78787930CC9ED4FBFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

Dexuan Cui <decui@microsoft.com> writes:

>> From: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Sent: Thursday, September 12, 2019 9:37 AM
>
>> > +static int util_suspend(struct hv_device *dev)
>> > +{
>> > +	struct hv_util_service *srv = hv_get_drvdata(dev);
>> > +
>> > +	if (srv->util_cancel_work)
>> > +		srv->util_cancel_work();
>> > +
>> > +	vmbus_close(dev->channel);
>> 
>> And what happens if you receive a real reply from userspace after you
>> close the channel? You've only cancelled timeout work so the driver
>> will not try to reply by itself but this doesn't mean we won't try to
>> write to the channel on legitimate replies from daemons.
>> 
>> I think you need to block all userspace requests (hang in kernel until
>> util_resume()).
>> 
>> While I'm not sure we can't get away without it but I'd suggest we add a
>> new HVUTIL_SUSPENDED state to the hvutil state machine.
>> Vitaly
>
> When we reach util_suspend(), all the userspace processes have been
> frozen: see kernel/power/hibernate.c: hibernate() -> freeze_processes(),
> so here we can not receive a reply from the userspace daemon.
>

Let's add a WARN() or something then as if this ever happens this is
going to be realy tricky to catch.

> However, it looks there is indeed some tricky corner cases we need to deal
> with: in util_resume(), before we call vmbus_open(), all the userspace
> processes are still frozen, and the userspace daemon (e.g. hv_kvp_daemon) 
> can be in any of these states:
>
> 1. the driver has not buffered any message for the daemon. This is good.
>
> 2. the driver has buffered a message for the daemon, and
> kvp_transaction.state is HVUTIL_USERSPACE_REQ. Later, the kvp daemon
> writes the response to the driver, and in kvp_on_msg() 
> kvp_transaction.state is moved to HVUTIL_USERSPACE_RECV, but since
> cancel_delayed_work_sync(&kvp_timeout_work) is false (the work has
> been cancelled by util_suspend()), the driver reports nothing to the host,
> which is good as the VM has undergone a hibernation event and IMO the
> response may be stale and I believe the host is not expecting this 
> response from the VM at all (the host side application that requested the
> KVP must have failed or timed out), but the bad thing is: the "state"
> remains in HVUTIL_USERSPACE_RECV, preventing
> hv_kvp_onchannelcallback() from reading from the channel ringbuffer.
>
> I suggest we work around this race condition by the below patch:
>
> --- a/drivers/hv/hv_kvp.c
> +++ b/drivers/hv/hv_kvp.c
> @@ -250,8 +250,8 @@ static int kvp_on_msg(void *msg, int len)
>          */
>         if (cancel_delayed_work_sync(&kvp_timeout_work)) {
>                 kvp_respond_to_host(message, error);
> -               hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>         }
> +       hv_poll_channel(kvp_transaction.recv_channel, kvp_poll_wrapper);
>
>         return 0;
>  }
>
> How do you like this?
>

Is it safe to call hv_poll_channel() simultaneously on several CPUs? It
seems it is as we're doing 

smp_call_function_single(channel->target_cpu, cb, channel, true);

(I'm asking because if it's not, than doing what you suggest will open
the following window: timeout function kvp_timeout_func() is already
running but the daemon is trying to reply at the same moment).

> I can imagine there is still a small chance that the state machine can run
> out of order, and the kvp daemon may exit due to the return values of
> read()/write() being -EINVAL, but the chance should be small enough in
> practice, and IMO the issue even exists in the current code when
> hibernation is not involved, e.g. kvp_timeout_func() and kvp_on_msg() 
> can run concurrently; if kvp_on_msg() runs slowly due to some reason 
> (e.g. the kvp daemon is stopped as I'm gdb'ing it), kvp_timeout_func()
> fires and moves the state to HVUTIL_READY; later kvp_on_msg() starts
> to run and returns -EINVAL, and the kvp daemon will exit().
>
> IMHO here it looks extremely difficult to make things flawless (if that's
> even possible), so it's necessary to ask the daemons to auto-restart once
> they exit() unexpectedly. This can be achieved by configuring systemd
> properly for the kvp/vss/fcopy services. 

I think we can also teach daemons to ignore -EINVAL or switch to
something like -EAGAIN in non-fatal cases.
 
>
> I'm not sure introducing a HVUTIL_SUSPENDED state would solve all
> of the corner cases, but I'm sure that would further complicate the
> current code, at least to me. :-)
>

Maybe, if we don't need it than we don't. Basically, I see the only
advantage in having such state: it makes our tricks to support
hibernation more visible in the code.

-- 
Vitaly


^ permalink raw reply

* [PATCH 0/3] KVM: x86: hyper-v: make L2 Hyper-V 2019 on KVM guests see MD_CLEAR
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan

[The series is KVM specific but the first patch of the series likely requires
 someone else's ACK. hyperv-tlfs.h gets a small addition too.]

It was discovered that L2 guests on Hyper-V 2019 on KVM don't see MD_CLEAR
bit (and thus think they're MDS vulnerable) even when it is present on the
host. Turns out, Hyper-V is filtering it out because it is not sure the
topology L0 is exposing is trustworthy and generally it is not. In some
specific cases (e.g. when SMT is unsupported or forcesully disabled) it is
and we can tell this to userspace hoping that it'll pass this info to L1.
See PATCH2 of the series for additional details.

The series can be tested with QEMU-4.1+ and 'hv-passthrough' CPU flag.

Vitaly Kuznetsov (3):
  cpu/SMT: create and export cpu_smt_possible()
  KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when
    SMT is impossible
  KVM: selftests: hyperv_cpuid: add check for
    NoNonArchitecturalCoreSharing bit

 arch/x86/include/asm/hyperv-tlfs.h            |  7 +++++
 arch/x86/kvm/hyperv.c                         |  4 ++-
 include/linux/cpu.h                           |  2 ++
 kernel/cpu.c                                  | 11 ++++++--
 .../selftests/kvm/x86_64/hyperv_cpuid.c       | 27 +++++++++++++++++++
 5 files changed, 48 insertions(+), 3 deletions(-)

-- 
2.20.1

^ permalink raw reply

* [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
guarantee that two virtual processors won't end up running on sibling SMT
threads without knowing about it. This is done as an optimization as in
this case there is nothing the guest can do to protect itself against MDS
and issuing additional flush requests is just pointless. On bare metal the
topology is known, however, when Hyper-V is running nested (e.g. on top of
KVM) it needs an additional piece of information: a confirmation that the
exposed topology (wrt vCPU placement on different SMT threads) is
trustworthy.

NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
TLFS as follows: "Indicates that a virtual processor will never share a
physical core with another virtual processor, except for virtual processors
that are reported as sibling SMT threads." From KVM we can give such
guarantee in two cases:
- SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
 as it can become re-enabled during the lifetime of the guest).
- vCPUs are properly pinned so the scheduler won't put them on sibling
SMT threads (when they're not reported as such).

This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
first case. The second case is outside of KVM's domain of responsibility
(as vCPU pinning is actually done by someone who manages KVM's userspace -
e.g. libvirt pinning QEMU threads).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 7 +++++++
 arch/x86/kvm/hyperv.c              | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index af78cd72b8f3..989a1efe7f5e 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -170,6 +170,13 @@
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
+/*
+ * Virtual processor will never share a physical core with another virtual
+ * processor, except for virtual processors that are reported as sibling SMT
+ * threads.
+ */
+#define HV_X64_NO_NONARCH_CORESHARING                  BIT(18)
+
 /* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
 #define HV_X64_NESTED_GUEST_MAPPING_FLUSH		BIT(18)
 #define HV_X64_NESTED_MSR_BITMAP			BIT(19)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fff790a3f4ee..9c187d16a9cd 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -23,6 +23,7 @@
 #include "ioapic.h"
 #include "hyperv.h"
 
+#include <linux/cpu.h>
 #include <linux/kvm_host.h>
 #include <linux/highmem.h>
 #include <linux/sched/cputime.h>
@@ -1864,7 +1865,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
 			ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
 			if (evmcs_ver)
 				ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
-
+			if (!cpu_smt_possible())
+				ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
 			/*
 			 * Default number of spinlock retry attempts, matches
 			 * HyperV 2016.
-- 
2.20.1


^ permalink raw reply related

* [PATCH 3/3] KVM: selftests: hyperv_cpuid: add check for NoNonArchitecturalCoreSharing bit
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

The bit is supposed to be '1' when SMT is not supported or forcefully
disabled and '0' otherwise.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 .../selftests/kvm/x86_64/hyperv_cpuid.c       | 27 +++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
index ee59831fbc98..443a2b54645b 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c
@@ -26,6 +26,25 @@ static void guest_code(void)
 {
 }
 
+static int smt_possible(void)
+{
+	char buf[16];
+	FILE *f;
+	bool res = 1;
+
+	f = fopen("/sys/devices/system/cpu/smt/control", "r");
+	if (f) {
+		if (fread(buf, sizeof(*buf), sizeof(buf), f) > 0) {
+			if (!strncmp(buf, "forceoff", 8) ||
+			    !strncmp(buf, "notsupported", 12))
+				res = 0;
+		}
+		fclose(f);
+	}
+
+	return res;
+}
+
 static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries,
 			  int evmcs_enabled)
 {
@@ -59,6 +78,14 @@ static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries,
 		TEST_ASSERT(!entry->padding[0] && !entry->padding[1] &&
 			    !entry->padding[2], "padding should be zero");
 
+		if (entry->function == 0x40000004) {
+			int nononarchcs = !!(entry->eax & (1UL << 18));
+
+			TEST_ASSERT(nononarchcs == !smt_possible(),
+				    "NoNonArchitecturalCoreSharing bit"
+				    " doesn't reflect SMT setting");
+		}
+
 		/*
 		 * If needed for debug:
 		 * fprintf(stdout,
-- 
2.20.1


^ permalink raw reply related

* [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Vitaly Kuznetsov @ 2019-09-16 16:22 UTC (permalink / raw)
  To: kvm
  Cc: linux-kernel, linux-hyperv, x86, Paolo Bonzini,
	Radim Krčmář, Sean Christopherson, Jim Mattson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-1-vkuznets@redhat.com>

KVM needs to know if SMT is theoretically possible, this means it is
supported and not forcefully disabled ('nosmt=force'). Create and
export cpu_smt_possible() answering this question.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 include/linux/cpu.h |  2 ++
 kernel/cpu.c        | 11 +++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index fcb1386bb0d4..6d48fc456d58 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -201,12 +201,14 @@ enum cpuhp_smt_control {
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
 extern void cpu_smt_check_topology(void);
+extern bool cpu_smt_possible(void);
 extern int cpuhp_smt_enable(void);
 extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control		(CPU_SMT_NOT_IMPLEMENTED)
 static inline void cpu_smt_disable(bool force) { }
 static inline void cpu_smt_check_topology(void) { }
+static inline bool cpu_smt_possible(void) { return false; }
 static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
 #endif
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e84c0873559e..2f8c2631e641 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -389,8 +389,7 @@ enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED;
 
 void __init cpu_smt_disable(bool force)
 {
-	if (cpu_smt_control == CPU_SMT_FORCE_DISABLED ||
-		cpu_smt_control == CPU_SMT_NOT_SUPPORTED)
+	if (!cpu_smt_possible())
 		return;
 
 	if (force) {
@@ -435,6 +434,14 @@ static inline bool cpu_smt_allowed(unsigned int cpu)
 	 */
 	return !per_cpu(cpuhp_state, cpu).booted_once;
 }
+
+/* Returns true if SMT is not supported of forcefully (irreversibly) disabled */
+bool cpu_smt_possible(void)
+{
+	return cpu_smt_control != CPU_SMT_FORCE_DISABLED &&
+		cpu_smt_control != CPU_SMT_NOT_SUPPORTED;
+}
+EXPORT_SYMBOL_GPL(cpu_smt_possible);
 #else
 static inline bool cpu_smt_allowed(unsigned int cpu) { return true; }
 #endif
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Jim Mattson @ 2019-09-16 16:34 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-3-vkuznets@redhat.com>

On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
> guarantee that two virtual processors won't end up running on sibling SMT
> threads without knowing about it. This is done as an optimization as in
> this case there is nothing the guest can do to protect itself against MDS
> and issuing additional flush requests is just pointless. On bare metal the
> topology is known, however, when Hyper-V is running nested (e.g. on top of
> KVM) it needs an additional piece of information: a confirmation that the
> exposed topology (wrt vCPU placement on different SMT threads) is
> trustworthy.
>
> NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
> TLFS as follows: "Indicates that a virtual processor will never share a
> physical core with another virtual processor, except for virtual processors
> that are reported as sibling SMT threads." From KVM we can give such
> guarantee in two cases:
> - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
>  as it can become re-enabled during the lifetime of the guest).
> - vCPUs are properly pinned so the scheduler won't put them on sibling
> SMT threads (when they're not reported as such).

That's a nice bit of information. Have you considered a mechanism for
communicating this information to kvm guests in a way that doesn't
require Hyper-V enlightenments?

> This patch reports NoNonArchitecturalCoreSharing bit in to userspace in the
> first case. The second case is outside of KVM's domain of responsibility
> (as vCPU pinning is actually done by someone who manages KVM's userspace -
> e.g. libvirt pinning QEMU threads).
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 7 +++++++
>  arch/x86/kvm/hyperv.c              | 4 +++-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index af78cd72b8f3..989a1efe7f5e 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -170,6 +170,13 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED            BIT(14)
>
> +/*
> + * Virtual processor will never share a physical core with another virtual
> + * processor, except for virtual processors that are reported as sibling SMT
> + * threads.
> + */
> +#define HV_X64_NO_NONARCH_CORESHARING                  BIT(18)
> +
>  /* Nested features. These are HYPERV_CPUID_NESTED_FEATURES.EAX bits. */
>  #define HV_X64_NESTED_GUEST_MAPPING_FLUSH              BIT(18)
>  #define HV_X64_NESTED_MSR_BITMAP                       BIT(19)
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index fff790a3f4ee..9c187d16a9cd 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -23,6 +23,7 @@
>  #include "ioapic.h"
>  #include "hyperv.h"
>
> +#include <linux/cpu.h>
>  #include <linux/kvm_host.h>
>  #include <linux/highmem.h>
>  #include <linux/sched/cputime.h>
> @@ -1864,7 +1865,8 @@ int kvm_vcpu_ioctl_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
>                         ent->eax |= HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED;
>                         if (evmcs_ver)
>                                 ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> -
> +                       if (!cpu_smt_possible())
> +                               ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
>                         /*
>                          * Default number of spinlock retry attempts, matches
>                          * HyperV 2016.
> --
> 2.20.1
>

^ permalink raw reply

* Re: [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Jim Mattson @ 2019-09-16 17:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <20190916162258.6528-2-vkuznets@redhat.com>

On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> KVM needs to know if SMT is theoretically possible, this means it is
> supported and not forcefully disabled ('nosmt=force'). Create and
> export cpu_smt_possible() answering this question.

It seems to me that KVM really just wants to know if the scheduler can
be trusted to avoid violating the invariant expressed by the Hyper-V
enlightenment, NoNonArchitecturalCoreSharing. It is possible to do
that even when SMT is enabled, if the scheduler is core-aware.
Wouldn't it be better to implement a scheduler API that told you
exactly what you wanted to know, rather than trying to infer the
answer from various breadcrumbs?

^ permalink raw reply

* [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
From: Denis Efremov @ 2019-09-16 20:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Denis Efremov, linux-kernel, linux-pci, Andrew Murray,
	linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin
In-Reply-To: <20190916204158.6889-1-efremov@linux.com>

Replace the magic constant (6) with define PCI_STD_NUM_BARS representing
the number of PCI BARs.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Signed-off-by: Denis Efremov <efremov@linux.com>
---
 drivers/pci/controller/pci-hyperv.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 40b625458afa..1665c23b649f 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -307,7 +307,7 @@ struct pci_bus_relations {
 struct pci_q_res_req_response {
 	struct vmpacket_descriptor hdr;
 	s32 status;			/* negative values are failures */
-	u32 probed_bar[6];
+	u32 probed_bar[PCI_STD_NUM_BARS];
 } __packed;
 
 struct pci_set_power {
@@ -503,7 +503,7 @@ struct hv_pci_dev {
 	 * What would be observed if one wrote 0xFFFFFFFF to a BAR and then
 	 * read it back, for each of the BAR offsets within config space.
 	 */
-	u32 probed_bar[6];
+	u32 probed_bar[PCI_STD_NUM_BARS];
 };
 
 struct hv_pci_compl {
@@ -1327,7 +1327,7 @@ static void survey_child_resources(struct hv_pcibus_device *hbus)
 	 * so it's sufficient to just add them up without tracking alignment.
 	 */
 	list_for_each_entry(hpdev, &hbus->children, list_entry) {
-		for (i = 0; i < 6; i++) {
+		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 			if (hpdev->probed_bar[i] & PCI_BASE_ADDRESS_SPACE_IO)
 				dev_err(&hbus->hdev->device,
 					"There's an I/O BAR in this list!\n");
@@ -1401,7 +1401,7 @@ static void prepopulate_bars(struct hv_pcibus_device *hbus)
 	/* Pick addresses for the BARs. */
 	do {
 		list_for_each_entry(hpdev, &hbus->children, list_entry) {
-			for (i = 0; i < 6; i++) {
+			for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 				bar_val = hpdev->probed_bar[i];
 				if (bar_val == 0)
 					continue;
@@ -1558,7 +1558,7 @@ static void q_resource_requirements(void *context, struct pci_response *resp,
 			"query resource requirements failed: %x\n",
 			resp->status);
 	} else {
-		for (i = 0; i < 6; i++) {
+		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
 			completion->hpdev->probed_bar[i] =
 				q_res_req->probed_bar[i];
 		}
-- 
2.21.0


^ permalink raw reply related

* RE: [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
From: Haiyang Zhang @ 2019-09-16 21:19 UTC (permalink / raw)
  To: Denis Efremov, Bjorn Helgaas
  Cc: linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	Andrew Murray, linux-hyperv@vger.kernel.org, KY Srinivasan,
	Stephen Hemminger, Sasha Levin
In-Reply-To: <20190916204158.6889-3-efremov@linux.com>



> -----Original Message-----
> From: Denis Efremov <efremov@linux.com>
> Sent: Monday, September 16, 2019 4:42 PM
> To: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Denis Efremov <efremov@linux.com>; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org; Andrew Murray <andrew.murray@arm.com>;
> linux-hyperv@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Haiyang
> Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>
> Subject: [PATCH v3 02/26] PCI: hv: Use PCI_STD_NUM_BARS
> 
> Replace the magic constant (6) with define PCI_STD_NUM_BARS
> representing the number of PCI BARs.
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Signed-off-by: Denis Efremov <efremov@linux.com>
> ---
>  drivers/pci/controller/pci-hyperv.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-
> hyperv.c
> index 40b625458afa..1665c23b649f 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -307,7 +307,7 @@ struct pci_bus_relations {  struct
> pci_q_res_req_response {
>  	struct vmpacket_descriptor hdr;
>  	s32 status;			/* negative values are failures */
> -	u32 probed_bar[6];
> +	u32 probed_bar[PCI_STD_NUM_BARS];
>  } __packed;
> 
>  struct pci_set_power {
> @@ -503,7 +503,7 @@ struct hv_pci_dev {
>  	 * What would be observed if one wrote 0xFFFFFFFF to a BAR and
> then
>  	 * read it back, for each of the BAR offsets within config space.
>  	 */
> -	u32 probed_bar[6];
> +	u32 probed_bar[PCI_STD_NUM_BARS];
>  };
> 
>  struct hv_pci_compl {
> @@ -1327,7 +1327,7 @@ static void survey_child_resources(struct
> hv_pcibus_device *hbus)
>  	 * so it's sufficient to just add them up without tracking alignment.
>  	 */
>  	list_for_each_entry(hpdev, &hbus->children, list_entry) {
> -		for (i = 0; i < 6; i++) {
> +		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  			if (hpdev->probed_bar[i] &
> PCI_BASE_ADDRESS_SPACE_IO)
>  				dev_err(&hbus->hdev->device,
>  					"There's an I/O BAR in this list!\n");
> @@ -1401,7 +1401,7 @@ static void prepopulate_bars(struct
> hv_pcibus_device *hbus)
>  	/* Pick addresses for the BARs. */
>  	do {
>  		list_for_each_entry(hpdev, &hbus->children, list_entry) {
> -			for (i = 0; i < 6; i++) {
> +			for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  				bar_val = hpdev->probed_bar[i];
>  				if (bar_val == 0)
>  					continue;
> @@ -1558,7 +1558,7 @@ static void q_resource_requirements(void
> *context, struct pci_response *resp,
>  			"query resource requirements failed: %x\n",
>  			resp->status);
>  	} else {
> -		for (i = 0; i < 6; i++) {
> +		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
>  			completion->hpdev->probed_bar[i] =
>  				q_res_req->probed_bar[i];
>  		}

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thanks.

^ permalink raw reply

* RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
From: Dexuan Cui @ 2019-09-16 21:45 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, b.zolnierkie@samsung.com,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stephen Hemminger, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan
  Cc: Iouri Tarassov
In-Reply-To: <PU1P153MB0169656B3EC48BFCF4D8C134BFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Thursday, September 12, 2019 11:39 PM
> To: Michael Kelley <mikelley@microsoft.com>; Wei Hu <weh@microsoft.com>;
> b.zolnierkie@samsung.com; linux-hyperv@vger.kernel.org;
> dri-devel@lists.freedesktop.org; linux-fbdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> Cc: Iouri Tarassov <iourit@microsoft.com>
> Subject: RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution
> from Hyper-V host
> 
> > From: Michael Kelley <mikelley@microsoft.com>
> > Sent: Thursday, September 5, 2019 7:06 AM
> >
> > From: Wei Hu <weh@microsoft.com> Sent: Thursday, September 5, 2019
> 2:12
> > AM
> > >
> > > Beginning from Windows 10 RS5+, VM screen resolution is obtained from
> > host.
> > > The "video=hyperv_fb" boot time option is not needed, but still can be
> > > used to overwrite what the host specifies. The VM resolution on the host
> > > could be set by executing the powershell "set-vmvideo" command.
> > >
> > > Signed-off-by: Iouri Tarassov <iourit@microsoft.com>
> > > Signed-off-by: Wei Hu <weh@microsoft.com>
> > > ---
> > >     v2:
> > >     - Implemented fallback when version negotiation failed.
> > >     - Defined full size for supported_resolution array.
> > >
> > >     v3:
> > >     - Corrected the synthvid major and minor version comparison
> problem.
> > >
> > >     v4:
> > >     - Changed function name to synthvid_ver_ge().
> > >
> > >  drivers/video/fbdev/hyperv_fb.c | 159
> > +++++++++++++++++++++++++++++---
> > >  1 file changed, 147 insertions(+), 12 deletions(-)
> > >
> >
> > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> 
> Looks good to me.
> 
> Reviewed-by: Dexuan Cui <decui@microsoft.com>

Hi Wei,
It turns out we need to make a further fix. :-)

The patch forgets to take par->update into consideration.

When the VM Connection window is closed (or minimized?),
the host sends a message to the guest, and the guest sets
par->update to false in synthvid_recv_sub().

If par->update is false, the guest doesn't need to call
synthvid_update().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Dexuan Cui @ 2019-09-16 21:46 UTC (permalink / raw)
  To: Dexuan Cui, Wei Hu, Michael Kelley, rdunlap@infradead.org,
	shc_work@mail.ru, gregkh@linuxfoundation.org,
	lee.jones@linaro.org, alexandre.belloni@bootlin.com,
	baijiaju1990@gmail.com, fthain@telegraphics.com.au,
	info@metux.net, linux-hyperv@vger.kernel.org,
	dri-devel@lists.freedesktop.org, linux-fbdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, sashal@kernel.org,
	Stephen Hemminger, Haiyang Zhang, KY Srinivasan
In-Reply-To: <PU1P153MB0169E5E73D258A034B4869DCBFB30@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>



Thanks,
-- Dexuan

> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> Sent: Thursday, September 12, 2019 11:38 PM
> To: Wei Hu <weh@microsoft.com>; Michael Kelley <mikelley@microsoft.com>;
> rdunlap@infradead.org; shc_work@mail.ru; gregkh@linuxfoundation.org;
> lee.jones@linaro.org; alexandre.belloni@bootlin.com;
> baijiaju1990@gmail.com; fthain@telegraphics.com.au; info@metux.net;
> linux-hyperv@vger.kernel.org; dri-devel@lists.freedesktop.org;
> linux-fbdev@vger.kernel.org; linux-kernel@vger.kernel.org; sashal@kernel.org;
> Stephen Hemminger <sthemmin@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> Subject: RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for
> Hyper-V frame buffer driver
> 
> > From: Wei Hu <weh@microsoft.com>
> > Sent: Thursday, September 12, 2019 11:03 PM
> >
> > Without deferred IO support, hyperv_fb driver informs the host to refresh
> > the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> > is screen update or not. This patch supports deferred IO for screens in
> > graphics mode and also enables the frame buffer on-demand refresh. The
> > highest refresh rate is still set at 20Hz.
> >
> > Currently Hyper-V only takes a physical address from guest as the starting
> > address of frame buffer. This implies the guest must allocate contiguous
> > physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> > accept address from MMIO region as frame buffer address. Due to these
> > limitations on Hyper-V host, we keep a shadow copy of frame buffer
> > in the guest. This means one more copy of the dirty rectangle inside
> > guest when doing the on-demand refresh. This can be optimized in the
> > future with help from host. For now the host performance gain from deferred
> > IO outweighs the shadow copy impact in the guest.
> >
> > Signed-off-by: Wei Hu <weh@microsoft.com>
> > ---
> >     v2: Incorporated review comments from Michael Kelley
> >     - Increased dirty rectangle by one row in deferred IO case when sending
> >     to Hyper-V.
> >     - Corrected the dirty rectangle size in the text mode.
> >     - Added more comments.
> >     - Other minor code cleanups.
> >
> >     v3: Incorporated more review comments
> >     - Removed a few unnecessary variable tests
> >
> >     v4: Incorporated test and review feedback from Dexuan Cui
> >     - Not disable interrupt while acquiring docopy_lock in
> >       hvfb_update_work(). This avoids significant bootup delay in
> >       large vCPU count VMs.
> >
> >     v5: Completely remove the unnecessary docopy_lock after discussing
> >     with Dexuan Cui.
> 
> Thanks! Looks good to me.
> 
> Reviewed-by: Dexuan Cui <decui@microsoft.com>


Hi Wei,
It turns out we need to make a further fix. :-)

The patch forgets to take par->update into consideration.

When the VM Connection window is closed (or minimized?),
the host sends a message to the guest, and the guest sets
par->update to false in synthvid_recv_sub().

If par->update is false, the guest doesn't need to call
synthvid_update().

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution from Hyper-V host
From: Dexuan Cui @ 2019-09-16 21:48 UTC (permalink / raw)
  To: Michael Kelley, Wei Hu, b.zolnierkie@samsung.com,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stephen Hemminger, sashal@kernel.org, Haiyang Zhang,
	KY Srinivasan
  Cc: Iouri Tarassov
In-Reply-To: <PU1P153MB0169E5FA3D359C6BDD50EC34BF8C0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: Dexuan Cui
> Sent: Monday, September 16, 2019 2:46 PM
> To: Michael Kelley <mikelley@microsoft.com>; Wei Hu <weh@microsoft.com>;
> b.zolnierkie@samsung.com; linux-hyperv@vger.kernel.org;
> dri-devel@lists.freedesktop.org; linux-fbdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; Stephen Hemminger
> <sthemmin@microsoft.com>; sashal@kernel.org; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> Cc: Iouri Tarassov <iourit@microsoft.com>
> Subject: RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution
> from Hyper-V host
> 
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Dexuan Cui
> > Sent: Thursday, September 12, 2019 11:39 PM
> > To: Michael Kelley <mikelley@microsoft.com>; Wei Hu
> <weh@microsoft.com>;
> > b.zolnierkie@samsung.com; linux-hyperv@vger.kernel.org;
> > dri-devel@lists.freedesktop.org; linux-fbdev@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Stephen Hemminger
> > <sthemmin@microsoft.com>; sashal@kernel.org; Haiyang Zhang
> > <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>
> > Cc: Iouri Tarassov <iourit@microsoft.com>
> > Subject: RE: [PATCH v4] video: hyperv: hyperv_fb: Obtain screen resolution
> > from Hyper-V host
> >
> > > From: Michael Kelley <mikelley@microsoft.com>
> > > Sent: Thursday, September 5, 2019 7:06 AM
> > >
> > > From: Wei Hu <weh@microsoft.com> Sent: Thursday, September 5, 2019
> > 2:12
> > > AM
> > > >
> > > > Beginning from Windows 10 RS5+, VM screen resolution is obtained from
> > > host.
> > > > The "video=hyperv_fb" boot time option is not needed, but still can be
> > > > used to overwrite what the host specifies. The VM resolution on the host
> > > > could be set by executing the powershell "set-vmvideo" command.
> > > >
> > > > Signed-off-by: Iouri Tarassov <iourit@microsoft.com>
> > > > Signed-off-by: Wei Hu <weh@microsoft.com>
> > > > ---
> > > >     v2:
> > > >     - Implemented fallback when version negotiation failed.
> > > >     - Defined full size for supported_resolution array.
> > > >
> > > >     v3:
> > > >     - Corrected the synthvid major and minor version comparison
> > problem.
> > > >
> > > >     v4:
> > > >     - Changed function name to synthvid_ver_ge().
> > > >
> > > >  drivers/video/fbdev/hyperv_fb.c | 159
> > > +++++++++++++++++++++++++++++---
> > > >  1 file changed, 147 insertions(+), 12 deletions(-)
> > > >
> > >
> > > Reviewed-by: Michael Kelley <mikelley@microsoft.com>
> >
> > Looks good to me.
> >
> > Reviewed-by: Dexuan Cui <decui@microsoft.com>
> 
> Hi Wei,
> It turns out we need to make a further fix. :-)
> 
> The patch forgets to take par->update into consideration.
> 
> When the VM Connection window is closed (or minimized?),
> the host sends a message to the guest, and the guest sets
> par->update to false in synthvid_recv_sub().
> 
> If par->update is false, the guest doesn't need to call
> synthvid_update().
> 
> Thanks,
> -- Dexuan

Please ignore the last reply from me. 

It was meant to reply another mail:
RE: [PATCH v5] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver

Sorry for the confusion.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Vitaly Kuznetsov @ 2019-09-17  9:33 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <CALMp9eRa0-HO+JWGDoAFO1zOtNjrutfT7d4pLxjsxn-XiAJwwQ@mail.gmail.com>

Jim Mattson <jmattson@google.com> writes:

> On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
>> guarantee that two virtual processors won't end up running on sibling SMT
>> threads without knowing about it. This is done as an optimization as in
>> this case there is nothing the guest can do to protect itself against MDS
>> and issuing additional flush requests is just pointless. On bare metal the
>> topology is known, however, when Hyper-V is running nested (e.g. on top of
>> KVM) it needs an additional piece of information: a confirmation that the
>> exposed topology (wrt vCPU placement on different SMT threads) is
>> trustworthy.
>>
>> NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
>> TLFS as follows: "Indicates that a virtual processor will never share a
>> physical core with another virtual processor, except for virtual processors
>> that are reported as sibling SMT threads." From KVM we can give such
>> guarantee in two cases:
>> - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
>>  as it can become re-enabled during the lifetime of the guest).
>> - vCPUs are properly pinned so the scheduler won't put them on sibling
>> SMT threads (when they're not reported as such).
>
> That's a nice bit of information. Have you considered a mechanism for
> communicating this information to kvm guests in a way that doesn't
> require Hyper-V enlightenments?
>

(I haven't put much thought in this) but can we re-use MD_CLEAR CPUID
bit for that? Like if the hypervisor can't guarantee usefulness
(e.g. when two random vCPUs can be put on sibling SMT threads) of
flushing, is there any reason to still make the guest think the feature
is there?

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH V4 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support
From: Paolo Bonzini @ 2019-09-17 13:27 UTC (permalink / raw)
  To: lantianyu1986, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, hpa, x86, michael.h.kelley
  Cc: Tianyu Lan, kvm, linux-doc, linux-hyperv, linux-kernel, vkuznets
In-Reply-To: <20190822143021.7518-1-Tianyu.Lan@microsoft.com>

On 22/08/19 16:30, lantianyu1986@gmail.com wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
> in L0 can delegate L1 hypervisor to handle tlb flush request from
> L2 guest when direct tlb flush is enabled in L1.
> 
> Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
> feature from user space. User space should enable this feature only
> when Hyper-V hypervisor capability is exposed to guest and KVM profile
> is hided. There is a parameter conflict between KVM and Hyper-V hypercall.
> We hope L2 guest doesn't use KVM hypercall when the feature is
> enabled. Detail please see comment of new API "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"
> 
> Change since v3:
>        - Update changelog in each patches. 
> 
> Change since v2:
>        - Move hv assist page(hv_pa_pg) from struct kvm  to struct kvm_hv.
> 
> Change since v1:
>        - Fix offset issue in the patch 1.
>        - Update description of KVM KVM_CAP_HYPERV_DIRECT_TLBFLUSH.
> 
> Tianyu Lan (2):
>   x86/Hyper-V: Fix definition of struct hv_vp_assist_page
>   KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH
> 
> Vitaly Kuznetsov (1):
>   KVM/Hyper-V/VMX: Add direct tlb flush support
> 
>  Documentation/virtual/kvm/api.txt  | 13 +++++++++++++
>  arch/x86/include/asm/hyperv-tlfs.h | 24 ++++++++++++++++++-----
>  arch/x86/include/asm/kvm_host.h    |  4 ++++
>  arch/x86/kvm/vmx/evmcs.h           |  2 ++
>  arch/x86/kvm/vmx/vmx.c             | 39 ++++++++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c                 |  8 ++++++++
>  include/uapi/linux/kvm.h           |  1 +
>  7 files changed, 86 insertions(+), 5 deletions(-)
> 

Queued, thanks.

Paolo

^ permalink raw reply

* Re: [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Paolo Bonzini @ 2019-09-17 14:07 UTC (permalink / raw)
  To: Jim Mattson, Vitaly Kuznetsov
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Radim Krčmář, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <CALMp9eQP7Dup+mMuAiShNtH754R_Wwuvf63hezygh3TGR=a9rg@mail.gmail.com>

On 16/09/19 19:16, Jim Mattson wrote:
>> KVM needs to know if SMT is theoretically possible, this means it is
>> supported and not forcefully disabled ('nosmt=force'). Create and
>> export cpu_smt_possible() answering this question.
> It seems to me that KVM really just wants to know if the scheduler can
> be trusted to avoid violating the invariant expressed by the Hyper-V
> enlightenment, NoNonArchitecturalCoreSharing. It is possible to do
> that even when SMT is enabled, if the scheduler is core-aware.
> Wouldn't it be better to implement a scheduler API that told you
> exactly what you wanted to know, rather than trying to infer the
> answer from various breadcrumbs?

Yes, however that scheduler API could also rely on something like
cpu_smt_possible(), at least in the case where core scheduling is not
active, so this is still a step in the right direction.

Paolo

^ permalink raw reply

* Re: [PATCH 2/3] KVM: x86: hyper-v: set NoNonArchitecturalCoreSharing CPUID bit when SMT is impossible
From: Paolo Bonzini @ 2019-09-17 14:08 UTC (permalink / raw)
  To: Vitaly Kuznetsov, Jim Mattson
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Radim Krčmář, Sean Christopherson, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <87ef0fb72x.fsf@vitty.brq.redhat.com>

On 17/09/19 11:33, Vitaly Kuznetsov wrote:
> Jim Mattson <jmattson@google.com> writes:
> 
>> On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>>
>>> Hyper-V 2019 doesn't expose MD_CLEAR CPUID bit to guests when it cannot
>>> guarantee that two virtual processors won't end up running on sibling SMT
>>> threads without knowing about it. This is done as an optimization as in
>>> this case there is nothing the guest can do to protect itself against MDS
>>> and issuing additional flush requests is just pointless. On bare metal the
>>> topology is known, however, when Hyper-V is running nested (e.g. on top of
>>> KVM) it needs an additional piece of information: a confirmation that the
>>> exposed topology (wrt vCPU placement on different SMT threads) is
>>> trustworthy.
>>>
>>> NoNonArchitecturalCoreSharing (CPUID 0x40000004 EAX bit 18) is described in
>>> TLFS as follows: "Indicates that a virtual processor will never share a
>>> physical core with another virtual processor, except for virtual processors
>>> that are reported as sibling SMT threads." From KVM we can give such
>>> guarantee in two cases:
>>> - SMT is unsupported or forcefully disabled (just 'disabled' doesn't work
>>>  as it can become re-enabled during the lifetime of the guest).
>>> - vCPUs are properly pinned so the scheduler won't put them on sibling
>>> SMT threads (when they're not reported as such).
>>
>> That's a nice bit of information. Have you considered a mechanism for
>> communicating this information to kvm guests in a way that doesn't
>> require Hyper-V enlightenments?
>>
> 
> (I haven't put much thought in this) but can we re-use MD_CLEAR CPUID
> bit for that? Like if the hypervisor can't guarantee usefulness
> (e.g. when two random vCPUs can be put on sibling SMT threads) of
> flushing, is there any reason to still make the guest think the feature
> is there?

Yes, that's a good idea.

Paolo

^ permalink raw reply

* Re: [PATCH 1/3] cpu/SMT: create and export cpu_smt_possible()
From: Vitaly Kuznetsov @ 2019-09-17 15:11 UTC (permalink / raw)
  To: Jim Mattson
  Cc: kvm list, LKML, linux-hyperv, the arch/x86 maintainers,
	Paolo Bonzini, Radim Krčmář, Sean Christopherson,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Peter Zijlstra (Intel), Michael Kelley, Roman Kagan
In-Reply-To: <CALMp9eQP7Dup+mMuAiShNtH754R_Wwuvf63hezygh3TGR=a9rg@mail.gmail.com>

Jim Mattson <jmattson@google.com> writes:

> On Mon, Sep 16, 2019 at 9:23 AM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>>
>> KVM needs to know if SMT is theoretically possible, this means it is
>> supported and not forcefully disabled ('nosmt=force'). Create and
>> export cpu_smt_possible() answering this question.
>
> It seems to me that KVM really just wants to know if the scheduler can
> be trusted to avoid violating the invariant expressed by the Hyper-V
> enlightenment, NoNonArchitecturalCoreSharing. It is possible to do
> that even when SMT is enabled, if the scheduler is core-aware.
> Wouldn't it be better to implement a scheduler API that told you
> exactly what you wanted to know, rather than trying to infer the
> answer from various breadcrumbs?

(I know not that much about scheduler so please bear with me)

Having a trustworthy scheduler not placing unrelated (not exposed as
sibling SMT threads to a guest or vCPUs from different guests) on
sibling SMT threads when it's not limited with affinity is definitely a
good thing. We, however, also need to know if vCPU pinning is planned
for the guest: like when QEMU vCPU threads are created they're not
pinned, however, libvirt pins them if needed before launching the
guest. So 'NoNonArchitecturalCoreSharing' can also be set in two cases:
- No vCPU pinning is planned but the scheduler is aware of the problem
(I'm not sure it is nowadays)
- The upper layer promises to do the right pinning.

This patch series, however, doesn't go that deep, it only covers the
simplest case: SMT is unavailable or forcefully disabled. I'll try to
learn more about scheduler though.

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH V4 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support
From: Vitaly Kuznetsov @ 2019-09-17 15:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Tianyu Lan, kvm, linux-doc, linux-hyperv, linux-kernel,
	lantianyu1986, rkrcmar, corbet, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, hpa, x86, michael.h.kelley
In-Reply-To: <7ea7fa06-f100-1507-8507-1c701877c8ab@redhat.com>

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 22/08/19 16:30, lantianyu1986@gmail.com wrote:
>> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> 
>> This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
>> in L0 can delegate L1 hypervisor to handle tlb flush request from
>> L2 guest when direct tlb flush is enabled in L1.
>> 
>> Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
>> feature from user space. User space should enable this feature only
>> when Hyper-V hypervisor capability is exposed to guest and KVM profile
>> is hided. There is a parameter conflict between KVM and Hyper-V hypercall.
>> We hope L2 guest doesn't use KVM hypercall when the feature is
>> enabled. Detail please see comment of new API "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"
>> 
>> Change since v3:
>>        - Update changelog in each patches. 
>> 
>> Change since v2:
>>        - Move hv assist page(hv_pa_pg) from struct kvm  to struct kvm_hv.
>> 
>> Change since v1:
>>        - Fix offset issue in the patch 1.
>>        - Update description of KVM KVM_CAP_HYPERV_DIRECT_TLBFLUSH.
>> 
>> Tianyu Lan (2):
>>   x86/Hyper-V: Fix definition of struct hv_vp_assist_page
>>   KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH
>> 
>> Vitaly Kuznetsov (1):
>>   KVM/Hyper-V/VMX: Add direct tlb flush support
>> 
>>  Documentation/virtual/kvm/api.txt  | 13 +++++++++++++
>>  arch/x86/include/asm/hyperv-tlfs.h | 24 ++++++++++++++++++-----
>>  arch/x86/include/asm/kvm_host.h    |  4 ++++
>>  arch/x86/kvm/vmx/evmcs.h           |  2 ++
>>  arch/x86/kvm/vmx/vmx.c             | 39 ++++++++++++++++++++++++++++++++++++++
>>  arch/x86/kvm/x86.c                 |  8 ++++++++
>>  include/uapi/linux/kvm.h           |  1 +
>>  7 files changed, 86 insertions(+), 5 deletions(-)
>> 
>
> Queued, thanks.
>

I had a suggestion how we can get away without the new capability (like
direct tlb flush gets automatically enabled when Hyper-V hypercall page
is activated and we know we can't handle KVM hypercalls any more)
but this can probably be done as a follow-up.

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH V4 0/3] KVM/Hyper-V: Add Hyper-V direct tlb flush support
From: Tianyu Lan @ 2019-09-18  1:55 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Paolo Bonzini, Tianyu Lan, kvm, linux-doc, linux-hyperv,
	linux-kernel@vger kernel org, Radim Krcmar, corbet, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, michael.h.kelley
In-Reply-To: <874l1baqnf.fsf@vitty.brq.redhat.com>

On Tue, Sep 17, 2019 at 11:28 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Paolo Bonzini <pbonzini@redhat.com> writes:
>
> > On 22/08/19 16:30, lantianyu1986@gmail.com wrote:
> >> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >>
> >> This patchset is to add Hyper-V direct tlb support in KVM. Hyper-V
> >> in L0 can delegate L1 hypervisor to handle tlb flush request from
> >> L2 guest when direct tlb flush is enabled in L1.
> >>
> >> Patch 2 introduces new cap KVM_CAP_HYPERV_DIRECT_TLBFLUSH to enable
> >> feature from user space. User space should enable this feature only
> >> when Hyper-V hypervisor capability is exposed to guest and KVM profile
> >> is hided. There is a parameter conflict between KVM and Hyper-V hypercall.
> >> We hope L2 guest doesn't use KVM hypercall when the feature is
> >> enabled. Detail please see comment of new API "KVM_CAP_HYPERV_DIRECT_TLBFLUSH"
> >>
> >> Change since v3:
> >>        - Update changelog in each patches.
> >>
> >> Change since v2:
> >>        - Move hv assist page(hv_pa_pg) from struct kvm  to struct kvm_hv.
> >>
> >> Change since v1:
> >>        - Fix offset issue in the patch 1.
> >>        - Update description of KVM KVM_CAP_HYPERV_DIRECT_TLBFLUSH.
> >>
> >> Tianyu Lan (2):
> >>   x86/Hyper-V: Fix definition of struct hv_vp_assist_page
> >>   KVM/Hyper-V: Add new KVM capability KVM_CAP_HYPERV_DIRECT_TLBFLUSH
> >>
> >> Vitaly Kuznetsov (1):
> >>   KVM/Hyper-V/VMX: Add direct tlb flush support
> >>
> >>  Documentation/virtual/kvm/api.txt  | 13 +++++++++++++
> >>  arch/x86/include/asm/hyperv-tlfs.h | 24 ++++++++++++++++++-----
> >>  arch/x86/include/asm/kvm_host.h    |  4 ++++
> >>  arch/x86/kvm/vmx/evmcs.h           |  2 ++
> >>  arch/x86/kvm/vmx/vmx.c             | 39 ++++++++++++++++++++++++++++++++++++++
> >>  arch/x86/kvm/x86.c                 |  8 ++++++++
> >>  include/uapi/linux/kvm.h           |  1 +
> >>  7 files changed, 86 insertions(+), 5 deletions(-)
> >>
> >
> > Queued, thanks.
> >
>
> I had a suggestion how we can get away without the new capability (like
> direct tlb flush gets automatically enabled when Hyper-V hypercall page
> is activated and we know we can't handle KVM hypercalls any more)
> but this can probably be done as a follow-up.
>

Hi Vital'y：
              Actually, I have tried your proposal but it turns out
KVM in L1 fails to
enable direct tlb flush most time after nested VM starts.
"hv_enlightenments_control.
nested_flush_hypercall" flag in evmcs is cleared by Hyper-V after run
nested VM. I still
wait answer from Hyper-V team. So far, it looks like enabling direct
tlb flush before start
nested VM is a safe way.Once get more infomration from Hyper-V team and we may
have a look to how to enable your proposal.
-- 
Best regards
Tianyu Lan

^ permalink raw reply

* [PATHC v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Wei Hu @ 2019-09-18  6:03 UTC (permalink / raw)
  To: Michael Kelley, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan, Dexuan Cui
  Cc: Wei Hu

Without deferred IO support, hyperv_fb driver informs the host to refresh
the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
is screen update or not. This patch supports deferred IO for screens in
graphics mode and also enables the frame buffer on-demand refresh. The
highest refresh rate is still set at 20Hz.

Currently Hyper-V only takes a physical address from guest as the starting
address of frame buffer. This implies the guest must allocate contiguous
physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
accept address from MMIO region as frame buffer address. Due to these
limitations on Hyper-V host, we keep a shadow copy of frame buffer
in the guest. This means one more copy of the dirty rectangle inside
guest when doing the on-demand refresh. This can be optimized in the
future with help from host. For now the host performance gain from deferred
IO outweighs the shadow copy impact in the guest.

Signed-off-by: Wei Hu <weh@microsoft.com>
---
    v2: Incorporated review comments from Michael Kelley
    - Increased dirty rectangle by one row in deferred IO case when sending
    to Hyper-V.
    - Corrected the dirty rectangle size in the text mode.
    - Added more comments.
    - Other minor code cleanups.

    v3: Incorporated more review comments
    - Removed a few unnecessary variable tests

    v4: Incorporated test and review feedback from Dexuan Cui
    - Not disable interrupt while acquiring docopy_lock in
      hvfb_update_work(). This avoids significant bootup delay in
      large vCPU count VMs.

    v5: Completely remove the unnecessary docopy_lock after discussing
    with Dexuan Cui.

    v6: Do not request host refresh when the VM guest screen is
    closed or minimized.

 drivers/video/fbdev/Kconfig     |   1 +
 drivers/video/fbdev/hyperv_fb.c | 210 ++++++++++++++++++++++++++++----
 2 files changed, 190 insertions(+), 21 deletions(-)

diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
index 1b2f5f31fb6f..e781f89a1824 100644
--- a/drivers/video/fbdev/Kconfig
+++ b/drivers/video/fbdev/Kconfig
@@ -2241,6 +2241,7 @@ config FB_HYPERV
 	select FB_CFB_FILLRECT
 	select FB_CFB_COPYAREA
 	select FB_CFB_IMAGEBLIT
+	select FB_DEFERRED_IO
 	help
 	  This framebuffer driver supports Microsoft Hyper-V Synthetic Video.
 
diff --git a/drivers/video/fbdev/hyperv_fb.c b/drivers/video/fbdev/hyperv_fb.c
index fe319fc39bec..0c57445f8357 100644
--- a/drivers/video/fbdev/hyperv_fb.c
+++ b/drivers/video/fbdev/hyperv_fb.c
@@ -237,6 +237,7 @@ struct synthvid_msg {
 #define RING_BUFSIZE (256 * 1024)
 #define VSP_TIMEOUT (10 * HZ)
 #define HVFB_UPDATE_DELAY (HZ / 20)
+#define HVFB_ONDEMAND_THROTTLE (HZ / 20)
 
 struct hvfb_par {
 	struct fb_info *info;
@@ -256,6 +257,16 @@ struct hvfb_par {
 	bool synchronous_fb;
 
 	struct notifier_block hvfb_panic_nb;
+
+	/* Memory for deferred IO and frame buffer itself */
+	unsigned char *dio_vp;
+	unsigned char *mmio_vp;
+	unsigned long mmio_pp;
+
+	/* Dirty rectangle, protected by delayed_refresh_lock */
+	int x1, y1, x2, y2;
+	bool delayed_refresh;
+	spinlock_t delayed_refresh_lock;
 };
 
 static uint screen_width = HVFB_WIDTH;
@@ -264,6 +275,7 @@ static uint screen_width_max = HVFB_WIDTH;
 static uint screen_height_max = HVFB_HEIGHT;
 static uint screen_depth;
 static uint screen_fb_size;
+static uint dio_fb_size; /* FB size for deferred IO */
 
 /* Send message to Hyper-V host */
 static inline int synthvid_send(struct hv_device *hdev,
@@ -350,28 +362,88 @@ static int synthvid_send_ptr(struct hv_device *hdev)
 }
 
 /* Send updated screen area (dirty rectangle) location to host */
-static int synthvid_update(struct fb_info *info)
+static int
+synthvid_update(struct fb_info *info, int x1, int y1, int x2, int y2)
 {
 	struct hv_device *hdev = device_to_hv_device(info->device);
 	struct synthvid_msg msg;
 
 	memset(&msg, 0, sizeof(struct synthvid_msg));
+	if (x2 == INT_MAX)
+		x2 = info->var.xres;
+	if (y2 == INT_MAX)
+		y2 = info->var.yres;
 
 	msg.vid_hdr.type = SYNTHVID_DIRT;
 	msg.vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
 		sizeof(struct synthvid_dirt);
 	msg.dirt.video_output = 0;
 	msg.dirt.dirt_count = 1;
-	msg.dirt.rect[0].x1 = 0;
-	msg.dirt.rect[0].y1 = 0;
-	msg.dirt.rect[0].x2 = info->var.xres;
-	msg.dirt.rect[0].y2 = info->var.yres;
+	msg.dirt.rect[0].x1 = (x1 > x2) ? 0 : x1;
+	msg.dirt.rect[0].y1 = (y1 > y2) ? 0 : y1;
+	msg.dirt.rect[0].x2 =
+		(x2 < x1 || x2 > info->var.xres) ? info->var.xres : x2;
+	msg.dirt.rect[0].y2 =
+		(y2 < y1 || y2 > info->var.yres) ? info->var.yres : y2;
 
 	synthvid_send(hdev, &msg);
 
 	return 0;
 }
 
+static void hvfb_docopy(struct hvfb_par *par,
+			unsigned long offset,
+			unsigned long size)
+{
+	if (!par || !par->mmio_vp || !par->dio_vp || !par->fb_ready ||
+	    size == 0 || offset >= dio_fb_size)
+		return;
+
+	if (offset + size > dio_fb_size)
+		size = dio_fb_size - offset;
+
+	memcpy(par->mmio_vp + offset, par->dio_vp + offset, size);
+}
+
+/* Deferred IO callback */
+static void synthvid_deferred_io(struct fb_info *p,
+				 struct list_head *pagelist)
+{
+	struct hvfb_par *par = p->par;
+	struct page *page;
+	unsigned long start, end;
+	int y1, y2, miny, maxy;
+
+	miny = INT_MAX;
+	maxy = 0;
+
+	/*
+	 * Merge dirty pages. It is possible that last page cross
+	 * over the end of frame buffer row yres. This is taken care of
+	 * in synthvid_update function by clamping the y2
+	 * value to yres.
+	 */
+	list_for_each_entry(page, pagelist, lru) {
+		start = page->index << PAGE_SHIFT;
+		end = start + PAGE_SIZE - 1;
+		y1 = start / p->fix.line_length;
+		y2 = end / p->fix.line_length;
+		miny = min_t(int, miny, y1);
+		maxy = max_t(int, maxy, y2);
+
+		/* Copy from dio space to mmio address */
+		if (par->fb_ready)
+			hvfb_docopy(par, start, PAGE_SIZE);
+	}
+
+	if (par->fb_ready && par->update)
+		synthvid_update(p, 0, miny, p->var.xres, maxy + 1);
+}
+
+static struct fb_deferred_io synthvid_defio = {
+	.delay		= HZ / 20,
+	.deferred_io	= synthvid_deferred_io,
+};
 
 /*
  * Actions on received messages from host:
@@ -618,7 +690,7 @@ static int synthvid_send_config(struct hv_device *hdev)
 	msg->vid_hdr.type = SYNTHVID_VRAM_LOCATION;
 	msg->vid_hdr.size = sizeof(struct synthvid_msg_hdr) +
 		sizeof(struct synthvid_vram_location);
-	msg->vram.user_ctx = msg->vram.vram_gpa = info->fix.smem_start;
+	msg->vram.user_ctx = msg->vram.vram_gpa = par->mmio_pp;
 	msg->vram.is_vram_gpa_specified = 1;
 	synthvid_send(hdev, msg);
 
@@ -628,7 +700,7 @@ static int synthvid_send_config(struct hv_device *hdev)
 		ret = -ETIMEDOUT;
 		goto out;
 	}
-	if (msg->vram_ack.user_ctx != info->fix.smem_start) {
+	if (msg->vram_ack.user_ctx != par->mmio_pp) {
 		pr_err("Unable to set VRAM location\n");
 		ret = -ENODEV;
 		goto out;
@@ -645,19 +717,77 @@ static int synthvid_send_config(struct hv_device *hdev)
 
 /*
  * Delayed work callback:
- * It is called at HVFB_UPDATE_DELAY or longer time interval to process
- * screen updates. It is re-scheduled if further update is necessary.
+ * It is scheduled to call whenever update request is received and it has
+ * not been called in last HVFB_ONDEMAND_THROTTLE time interval.
  */
 static void hvfb_update_work(struct work_struct *w)
 {
 	struct hvfb_par *par = container_of(w, struct hvfb_par, dwork.work);
 	struct fb_info *info = par->info;
+	unsigned long flags;
+	int x1, x2, y1, y2;
+	int j;
+
+	spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+	/* Reset the request flag */
+	par->delayed_refresh = false;
+
+	/* Store the dirty rectangle to local variables */
+	x1 = par->x1;
+	x2 = par->x2;
+	y1 = par->y1;
+	y2 = par->y2;
+
+	/* Clear dirty rectangle */
+	par->x1 = par->y1 = INT_MAX;
+	par->x2 = par->y2 = 0;
+
+	spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
+
+	if (x1 > info->var.xres || x2 > info->var.xres ||
+	    y1 > info->var.yres || y2 > info->var.yres || x2 <= x1)
+		return;
+
+	/* Copy the dirty rectangle to frame buffer memory */
+	for (j = y1; j < y2; j++) {
+		hvfb_docopy(par,
+			    j * info->fix.line_length +
+			    (x1 * screen_depth / 8),
+			    (x2 - x1) * screen_depth / 8);
+	}
+
+	/* Refresh */
+	if (par->fb_ready && par->update)
+		synthvid_update(info, x1, y1, x2, y2);
+}
 
-	if (par->fb_ready)
-		synthvid_update(info);
+/*
+ * Control the on-demand refresh frequency. It schedules a delayed
+ * screen update if it has not yet.
+ */
+static void hvfb_ondemand_refresh_throttle(struct hvfb_par *par,
+					   int x1, int y1, int w, int h)
+{
+	unsigned long flags;
+	int x2 = x1 + w;
+	int y2 = y1 + h;
+
+	spin_lock_irqsave(&par->delayed_refresh_lock, flags);
+
+	/* Merge dirty rectangle */
+	par->x1 = min_t(int, par->x1, x1);
+	par->y1 = min_t(int, par->y1, y1);
+	par->x2 = max_t(int, par->x2, x2);
+	par->y2 = max_t(int, par->y2, y2);
+
+	/* Schedule a delayed screen update if not yet */
+	if (par->delayed_refresh == false) {
+		schedule_delayed_work(&par->dwork,
+				      HVFB_ONDEMAND_THROTTLE);
+		par->delayed_refresh = true;
+	}
 
-	if (par->update)
-		schedule_delayed_work(&par->dwork, HVFB_UPDATE_DELAY);
+	spin_unlock_irqrestore(&par->delayed_refresh_lock, flags);
 }
 
 static int hvfb_on_panic(struct notifier_block *nb,
@@ -669,7 +799,8 @@ static int hvfb_on_panic(struct notifier_block *nb,
 	par = container_of(nb, struct hvfb_par, hvfb_panic_nb);
 	par->synchronous_fb = true;
 	info = par->info;
-	synthvid_update(info);
+	hvfb_docopy(par, 0, dio_fb_size);
+	synthvid_update(info, 0, 0, INT_MAX, INT_MAX);
 
 	return NOTIFY_DONE;
 }
@@ -730,7 +861,10 @@ static void hvfb_cfb_fillrect(struct fb_info *p,
 
 	cfb_fillrect(p, rect);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, rect->dx, rect->dy,
+					       rect->width, rect->height);
 }
 
 static void hvfb_cfb_copyarea(struct fb_info *p,
@@ -740,7 +874,10 @@ static void hvfb_cfb_copyarea(struct fb_info *p,
 
 	cfb_copyarea(p, area);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, area->dx, area->dy,
+					       area->width, area->height);
 }
 
 static void hvfb_cfb_imageblit(struct fb_info *p,
@@ -750,7 +887,10 @@ static void hvfb_cfb_imageblit(struct fb_info *p,
 
 	cfb_imageblit(p, image);
 	if (par->synchronous_fb)
-		synthvid_update(p);
+		synthvid_update(p, 0, 0, INT_MAX, INT_MAX);
+	else
+		hvfb_ondemand_refresh_throttle(par, image->dx, image->dy,
+					       image->width, image->height);
 }
 
 static struct fb_ops hvfb_ops = {
@@ -809,6 +949,9 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 	resource_size_t pot_start, pot_end;
 	int ret;
 
+	dio_fb_size =
+		screen_width * screen_height * screen_depth / 8;
+
 	if (gen2vm) {
 		pot_start = 0;
 		pot_end = -1;
@@ -843,9 +986,14 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 	if (!fb_virt)
 		goto err2;
 
+	/* Allocate memory for deferred IO */
+	par->dio_vp = vzalloc(round_up(dio_fb_size, PAGE_SIZE));
+	if (par->dio_vp == NULL)
+		goto err3;
+
 	info->apertures = alloc_apertures(1);
 	if (!info->apertures)
-		goto err3;
+		goto err4;
 
 	if (gen2vm) {
 		info->apertures->ranges[0].base = screen_info.lfb_base;
@@ -857,16 +1005,23 @@ static int hvfb_getmem(struct hv_device *hdev, struct fb_info *info)
 		info->apertures->ranges[0].size = pci_resource_len(pdev, 0);
 	}
 
+	/* Physical address of FB device */
+	par->mmio_pp = par->mem->start;
+	/* Virtual address of FB device */
+	par->mmio_vp = (unsigned char *) fb_virt;
+
 	info->fix.smem_start = par->mem->start;
-	info->fix.smem_len = screen_fb_size;
-	info->screen_base = fb_virt;
-	info->screen_size = screen_fb_size;
+	info->fix.smem_len = dio_fb_size;
+	info->screen_base = par->dio_vp;
+	info->screen_size = dio_fb_size;
 
 	if (!gen2vm)
 		pci_dev_put(pdev);
 
 	return 0;
 
+err4:
+	vfree(par->dio_vp);
 err3:
 	iounmap(fb_virt);
 err2:
@@ -884,6 +1039,7 @@ static void hvfb_putmem(struct fb_info *info)
 {
 	struct hvfb_par *par = info->par;
 
+	vfree(par->dio_vp);
 	iounmap(info->screen_base);
 	vmbus_free_mmio(par->mem->start, screen_fb_size);
 	par->mem = NULL;
@@ -909,6 +1065,11 @@ static int hvfb_probe(struct hv_device *hdev,
 	init_completion(&par->wait);
 	INIT_DELAYED_WORK(&par->dwork, hvfb_update_work);
 
+	par->delayed_refresh = false;
+	spin_lock_init(&par->delayed_refresh_lock);
+	par->x1 = par->y1 = INT_MAX;
+	par->x2 = par->y2 = 0;
+
 	/* Connect to VSP */
 	hv_set_drvdata(hdev, info);
 	ret = synthvid_connect_vsp(hdev);
@@ -960,6 +1121,10 @@ static int hvfb_probe(struct hv_device *hdev,
 	info->fbops = &hvfb_ops;
 	info->pseudo_palette = par->pseudo_palette;
 
+	/* Initialize deferred IO */
+	info->fbdefio = &synthvid_defio;
+	fb_deferred_io_init(info);
+
 	/* Send config to host */
 	ret = synthvid_send_config(hdev);
 	if (ret)
@@ -981,6 +1146,7 @@ static int hvfb_probe(struct hv_device *hdev,
 	return 0;
 
 error:
+	fb_deferred_io_cleanup(info);
 	hvfb_putmem(info);
 error2:
 	vmbus_close(hdev->channel);
@@ -1003,6 +1169,8 @@ static int hvfb_remove(struct hv_device *hdev)
 	par->update = false;
 	par->fb_ready = false;
 
+	fb_deferred_io_cleanup(info);
+
 	unregister_framebuffer(info);
 	cancel_delayed_work_sync(&par->dwork);
 
-- 
2.20.1


^ permalink raw reply related

* RE: [PATHC v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Dexuan Cui @ 2019-09-18 16:45 UTC (permalink / raw)
  To: Wei Hu, Michael Kelley, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan
In-Reply-To: <20190918060227.6834-1-weh@microsoft.com>

> From: Wei Hu <weh@microsoft.com>
> Sent: Tuesday, September 17, 2019 11:03 PM
> [...]
> ---
>     v6: Do not request host refresh when the VM guest screen is
>     closed or minimized.

Looks good to me. Thanks!

Reviewed-by: Dexuan Cui <decui@microsoft.com>

^ permalink raw reply

* [PATCH] hv: vmbus: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2019-09-18 20:00 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin
  Cc: Arnd Bergmann, Dexuan Cui, Michael Kelley, linux-hyperv,
	linux-kernel

When CONFIG_PM is disabled, we get a couple of harmless warnings:

drivers/hv/vmbus_drv.c:918:12: error: unused function 'vmbus_suspend' [-Werror,-Wunused-function]
drivers/hv/vmbus_drv.c:937:12: error: unused function 'vmbus_resume' [-Werror,-Wunused-function]
drivers/hv/vmbus_drv.c:2128:12: error: unused function 'vmbus_bus_suspend' [-Werror,-Wunused-function]
drivers/hv/vmbus_drv.c:2208:12: error: unused function 'vmbus_bus_resume' [-Werror,-Wunused-function]

Mark these functions __maybe_unused to let gcc drop them silently.

Fixes: f53335e3289f ("Drivers: hv: vmbus: Suspend/resume the vmbus itself for hibernation")
Fixes: 271b2224d42f ("Drivers: hv: vmbus: Implement suspend/resume for VSC drivers for hibernation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/hv/vmbus_drv.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 391f0b225c9a..2542eb1f872b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -915,7 +915,7 @@ static void vmbus_shutdown(struct device *child_device)
 /*
  * vmbus_suspend - Suspend a vmbus device
  */
-static int vmbus_suspend(struct device *child_device)
+static int __maybe_unused vmbus_suspend(struct device *child_device)
 {
 	struct hv_driver *drv;
 	struct hv_device *dev = device_to_hv_device(child_device);
@@ -934,7 +934,7 @@ static int vmbus_suspend(struct device *child_device)
 /*
  * vmbus_resume - Resume a vmbus device
  */
-static int vmbus_resume(struct device *child_device)
+static int __maybe_unused vmbus_resume(struct device *child_device)
 {
 	struct hv_driver *drv;
 	struct hv_device *dev = device_to_hv_device(child_device);
@@ -2125,7 +2125,7 @@ static int vmbus_acpi_add(struct acpi_device *device)
 	return ret_val;
 }
 
-static int vmbus_bus_suspend(struct device *dev)
+static int __maybe_unused vmbus_bus_suspend(struct device *dev)
 {
 	struct vmbus_channel *channel, *sc;
 	unsigned long flags;
@@ -2205,7 +2205,7 @@ static int vmbus_bus_suspend(struct device *dev)
 	return 0;
 }
 
-static int vmbus_bus_resume(struct device *dev)
+static int __maybe_unused vmbus_bus_resume(struct device *dev)
 {
 	struct vmbus_channel_msginfo *msginfo;
 	size_t msgsize;
-- 
2.20.0


^ permalink raw reply related

* RE: [PATHC v6] video: hyperv: hyperv_fb: Support deferred IO for Hyper-V frame buffer driver
From: Michael Kelley @ 2019-09-18 21:48 UTC (permalink / raw)
  To: Wei Hu, rdunlap@infradead.org, shc_work@mail.ru,
	gregkh@linuxfoundation.org, lee.jones@linaro.org,
	alexandre.belloni@bootlin.com, baijiaju1990@gmail.com,
	fthain@telegraphics.com.au, info@metux.net,
	linux-hyperv@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linux-fbdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	sashal@kernel.org, Stephen Hemminger, Haiyang Zhang,
	KY Srinivasan, Dexuan Cui
In-Reply-To: <20190918060227.6834-1-weh@microsoft.com>

From: Wei Hu <weh@microsoft.com> Sent: Tuesday, September 17, 2019 11:03 PM

> 
> Without deferred IO support, hyperv_fb driver informs the host to refresh
> the entire guest frame buffer at fixed rate, e.g. at 20Hz, no matter there
> is screen update or not. This patch supports deferred IO for screens in
> graphics mode and also enables the frame buffer on-demand refresh. The
> highest refresh rate is still set at 20Hz.
> 
> Currently Hyper-V only takes a physical address from guest as the starting
> address of frame buffer. This implies the guest must allocate contiguous
> physical memory for frame buffer. In addition, Hyper-V Gen 2 VMs only
> accept address from MMIO region as frame buffer address. Due to these
> limitations on Hyper-V host, we keep a shadow copy of frame buffer
> in the guest. This means one more copy of the dirty rectangle inside
> guest when doing the on-demand refresh. This can be optimized in the
> future with help from host. For now the host performance gain from deferred
> IO outweighs the shadow copy impact in the guest.
> 
> Signed-off-by: Wei Hu <weh@microsoft.com>
> ---
>     v2: Incorporated review comments from Michael Kelley
>     - Increased dirty rectangle by one row in deferred IO case when sending
>     to Hyper-V.
>     - Corrected the dirty rectangle size in the text mode.
>     - Added more comments.
>     - Other minor code cleanups.
> 
>     v3: Incorporated more review comments
>     - Removed a few unnecessary variable tests
> 
>     v4: Incorporated test and review feedback from Dexuan Cui
>     - Not disable interrupt while acquiring docopy_lock in
>       hvfb_update_work(). This avoids significant bootup delay in
>       large vCPU count VMs.
> 
>     v5: Completely remove the unnecessary docopy_lock after discussing
>     with Dexuan Cui.
> 
>     v6: Do not request host refresh when the VM guest screen is
>     closed or minimized.
> 
>  drivers/video/fbdev/Kconfig     |   1 +
>  drivers/video/fbdev/hyperv_fb.c | 210 ++++++++++++++++++++++++++++----
>  2 files changed, 190 insertions(+), 21 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH] hv: vmbus: mark PM functions as __maybe_unused
From: Dexuan Cui @ 2019-09-19  5:15 UTC (permalink / raw)
  To: Arnd Bergmann, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin
  Cc: Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20190918200052.2261769-1-arnd@arndb.de>

> From: Arnd Bergmann <arnd@arndb.de>
> Sent: Wednesday, September 18, 2019 1:01 PM
> 
> When CONFIG_PM is disabled, we get a couple of harmless warnings:
> 
> drivers/hv/vmbus_drv.c:918:12: error: unused function 'vmbus_suspend'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:937:12: error: unused function 'vmbus_resume'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:2128:12: error: unused function 'vmbus_bus_suspend'
> [-Werror,-Wunused-function]
> drivers/hv/vmbus_drv.c:2208:12: error: unused function 'vmbus_bus_resume'
> [-Werror,-Wunused-function]
> 
> Mark these functions __maybe_unused to let gcc drop them silently.

Hi Arnd,
Thanks for reporting the issue!

If CONFIG_PM is not set, IMO it's better to comment out these functions. :-)

I'll post a patch for this with you Cc'd.

Thanks,
-- Dexuan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox