Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [PATCH v3 2/4] x86/kvm: Change print code to use pr_*() format
From: Sean Christopherson @ 2019-10-02 17:15 UTC (permalink / raw)
  To: Zhenzhong Duan
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Paolo Bonzini,
	Radim Krcmar, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <1569847479-13201-3-git-send-email-zhenzhong.duan@oracle.com>

On Mon, Sep 30, 2019 at 08:44:37PM +0800, Zhenzhong Duan wrote:
> pr_*() is preferred than printk(KERN_* ...), after change all the print
> in arch/x86/kernel/kvm.c will have "KVM: xxx" style.
> 
> No functional change.
> 
> Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>

This wasn't really suggested by Vitaly, he just requested it be done in a
separate patch.

> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krcmar <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/kernel/kvm.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index a4f108d..ce4f578 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -7,6 +7,8 @@
>   *   Authors: Anthony Liguori <aliguori@us.ibm.com>
>   */
>  
> +#define pr_fmt(fmt) "KVM: " fmt

Not a fan of "KVM" as the prefix as it's easily confused with KVM the
hypervisor.  Maybe "kvm_guest"?

> +
>  #include <linux/context_tracking.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> @@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
>  		return;
>  
>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
> -	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
> -		cpu, (unsigned long long) slow_virt_to_phys(st));
> +	pr_info("stealtime: cpu %d, msr %llx\n", cpu,
> +		(unsigned long long) slow_virt_to_phys(st));
>  }
>  
>  static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> @@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
>  
>  		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
>  		__this_cpu_write(apf_reason.enabled, 1);
> -		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
> -		       smp_processor_id());
> +		pr_info("setup async PF for cpu %d\n", smp_processor_id());
>  	}
>  
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
> @@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
>  	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
>  	__this_cpu_write(apf_reason.enabled, 0);
>  
> -	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
> -	       smp_processor_id());
> +	pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
>  }
>  
>  static void kvm_pv_guest_cpu_reboot(void *unused)
> @@ -509,7 +509,7 @@ static void kvm_setup_pv_ipi(void)
>  {
>  	apic->send_IPI_mask = kvm_send_ipi_mask;
>  	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> -	pr_info("KVM setup pv IPIs\n");
> +	pr_info("setup pv IPIs\n");
>  }
>  
>  static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
> @@ -639,11 +639,11 @@ static void __init kvm_guest_init(void)
>  	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
>  	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
>  		smp_ops.send_call_func_ipi = kvm_smp_send_call_func_ipi;
> -		pr_info("KVM setup pv sched yield\n");
> +		pr_info("setup pv sched yield\n");
>  	}
>  	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
>  				      kvm_cpu_online, kvm_cpu_down_prepare) < 0)
> -		pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
> +		pr_err("failed to install cpu hotplug callbacks\n");
>  #else
>  	sev_map_percpu_data();
>  	kvm_guest_cpu_init();
> @@ -746,7 +746,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
>  			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
>  				GFP_KERNEL, cpu_to_node(cpu));
>  		}
> -		pr_info("KVM setup pv remote TLB flush\n");
> +		pr_info("setup pv remote TLB flush\n");
>  	}
>  
>  	return 0;
> @@ -879,8 +879,8 @@ static void kvm_enable_host_haltpoll(void *i)
>  void arch_haltpoll_enable(unsigned int cpu)
>  {
>  	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
> -		pr_err_once("kvm: host does not support poll control\n");
> -		pr_err_once("kvm: host upgrade recommended\n");
> +		pr_err_once("host does not support poll control\n");
> +		pr_err_once("host upgrade recommended\n");
>  		return;
>  	}
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH v3 3/4] xen: Mark "xen_nopvspin" parameter obsolete and map it to "nopvspin"
From: Sean Christopherson @ 2019-10-02 17:18 UTC (permalink / raw)
  To: Zhenzhong Duan
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Jonathan Corbet,
	Boris Ostrovsky, Juergen Gross, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <1569847479-13201-4-git-send-email-zhenzhong.duan@oracle.com>

On Mon, Sep 30, 2019 at 08:44:38PM +0800, Zhenzhong Duan wrote:
> Fix stale description of "xen_nopvspin" as we use qspinlock now.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  7 ++++---
>  arch/x86/xen/spinlock.c                         | 13 +++++++------
>  2 files changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 4b956d8..1f0a62f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5303,8 +5303,9 @@
>  			never -- do not unplug even if version check succeeds
>  
>  	xen_nopvspin	[X86,XEN]
> -			Disables the ticketlock slowpath using Xen PV
> -			optimizations.
> +			Disables the qspinlock slowpath using Xen PV optimizations.
> +			This parameter is obsoleted by "nopvspin" parameter, which
> +			has equivalent effect for XEN platform.
>  
>  	xen_nopv	[X86]
>  			Disables the PV optimizations forcing the HVM guest to
> @@ -5330,7 +5331,7 @@
>  			as generic guest with no PV drivers. Currently support
>  			XEN HVM, KVM, HYPER_V and VMWARE guest.
>  
> -	nopvspin	[X86,KVM] Disables the qspinlock slow path
> +	nopvspin	[X86,XEN,KVM] Disables the qspinlock slow path
>  			using PV optimizations which allow the hypervisor to
>  			'idle' the guest on lock contention.
>  
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 6deb490..092a53f 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -18,7 +18,6 @@
>  static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
>  static DEFINE_PER_CPU(char *, irq_name);
>  static DEFINE_PER_CPU(atomic_t, xen_qlock_wait_nest);
> -static bool xen_pvspin = true;
>  
>  static void xen_qlock_kick(int cpu)
>  {
> @@ -68,7 +67,7 @@ void xen_init_lock_cpu(int cpu)
>  	int irq;
>  	char *name;
>  
> -	if (!xen_pvspin)
> +	if (!pvspin)
>  		return;
>  
>  	WARN(per_cpu(lock_kicker_irq, cpu) >= 0, "spinlock on CPU%d exists on IRQ%d!\n",
> @@ -93,7 +92,7 @@ void xen_init_lock_cpu(int cpu)
>  
>  void xen_uninit_lock_cpu(int cpu)
>  {
> -	if (!xen_pvspin)
> +	if (!pvspin)
>  		return;
>  
>  	unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
> @@ -117,9 +116,9 @@ void __init xen_init_spinlocks(void)
>  
>  	/*  Don't need to use pvqspinlock code if there is only 1 vCPU. */
>  	if (num_possible_cpus() == 1)
> -		xen_pvspin = false;
> +		pvspin = false;

As suggested in the other patch, if you incorporate pvspin (or nopvspin)
into xen_pvspin then the param can be __initdata and the diff for this
patch will be smaller, e.g. you wouldn't need the xen_domain() shenanigans
in xen_parse_nopvspin().

> -	if (!xen_pvspin) {
> +	if (!pvspin) {
>  		printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
>  		static_branch_disable(&virt_spin_lock_key);
>  		return;
> @@ -137,7 +136,9 @@ void __init xen_init_spinlocks(void)
>  
>  static __init int xen_parse_nopvspin(char *arg)
>  {
> -	xen_pvspin = false;
> +	pr_notice("\"xen_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
> +	if (xen_domain())
> +		pvspin = false;
>  	return 0;
>  }
>  early_param("xen_nopvspin", xen_parse_nopvspin);
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH v3 4/4] x86/hyperv: Mark "hv_nopvspin" parameter obsolete and map it to "nopvspin"
From: Sean Christopherson @ 2019-10-02 17:19 UTC (permalink / raw)
  To: Zhenzhong Duan
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Jonathan Corbet,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <1569847479-13201-5-git-send-email-zhenzhong.duan@oracle.com>

On Mon, Sep 30, 2019 at 08:44:39PM +0800, Zhenzhong Duan wrote:
> Includes asm/hypervisor.h in order to reference x86_hyper_type.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
>  arch/x86/hyperv/hv_spinlock.c                   | 9 +++++----
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 1f0a62f..43f922c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1436,6 +1436,10 @@
>  	hv_nopvspin	[X86,HYPER_V] Disables the paravirt spinlock optimizations
>  				      which allow the hypervisor to 'idle' the
>  				      guest on lock contention.
> +				      This parameter is obsoleted by "nopvspin"
> +				      parameter, which has equivalent effect for
> +				      HYPER_V platform.
> +
>  
>  	keep_bootcon	[KNL]
>  			Do not unregister boot console at start. This is only
> @@ -5331,7 +5335,7 @@
>  			as generic guest with no PV drivers. Currently support
>  			XEN HVM, KVM, HYPER_V and VMWARE guest.
>  
> -	nopvspin	[X86,XEN,KVM] Disables the qspinlock slow path
> +	nopvspin	[X86,XEN,KVM,HYPER_V] Disables the qspinlock slow path
>  			using PV optimizations which allow the hypervisor to
>  			'idle' the guest on lock contention.
>  
> diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
> index 07f21a0..e00e319 100644
> --- a/arch/x86/hyperv/hv_spinlock.c
> +++ b/arch/x86/hyperv/hv_spinlock.c
> @@ -12,12 +12,11 @@
>  
>  #include <linux/spinlock.h>
>  
> +#include <asm/hypervisor.h>
>  #include <asm/mshyperv.h>
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
>  
> -static bool __initdata hv_pvspin = true;
> -
>  static void hv_qlock_kick(int cpu)
>  {
>  	apic->send_IPI(cpu, X86_PLATFORM_IPI_VECTOR);
> @@ -64,7 +63,7 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
>  
>  void __init hv_init_spinlocks(void)
>  {
> -	if (!hv_pvspin || !apic ||
> +	if (!pvspin || !apic ||
>  	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
>  	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
>  		pr_info("PV spinlocks disabled\n");
> @@ -82,7 +81,9 @@ void __init hv_init_spinlocks(void)
>  
>  static __init int hv_parse_nopvspin(char *arg)
>  {
> -	hv_pvspin = false;
> +	pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
> +	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
> +		pvspin = false;

Personal preference would be to keep the hv_pvspin variable and add the
extra check in hv_init_spinlocks().

>  	return 0;
>  }
>  early_param("hv_nopvspin", hv_parse_nopvspin);
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-10-03  5:35 UTC (permalink / raw)
  To: dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <20190930230652.GW237523@dtor-ws>

> From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> Sent: Monday, September 30, 2019 4:07 PM
> 
> On Mon, Sep 30, 2019 at 10:09:27PM +0000, Dexuan Cui wrote:
> > > From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> > > Sent: Friday, September 27, 2019 5:32 PM
> > > > ...
> > > > pm_wakeup_pending() is tested in a lot of places in the suspend
> > > > process and eventually an unintentional keystroke (or mouse movement,
> > > > when it comes to the Hyper-V mouse driver drivers/hid/hid-hyperv.c)
> > > > causes the whole hibernation process to be aborted. Usually this
> > > > behavior is not expected by the user, I think.
> > >
> > > Why not? If a device is configured as wakeup source, then it activity
> > > should wake up the system, unless you disable it.
> >
> > Generally speaking, I agree, but compared to a physical machine, IMO
> > the scenario is a little different when it comes to a VM running on Hyper-V:
> > on the host there is a window that represents the VM, and the user can
> > unintentionally switch the keyboard input focus to the window (or move
> > the mouse/cursor over the window) and then the host automatically
> > sends some special keystrokes (and mouse events) , and this aborts the
> > hibernation process.
> >
> > And, when it comes to the Hyper-V mouse device, IMO it's easy for the
> > user to unintentionally move the mouse after the "hibernation" button
> > is clicked. I suppose a physical machine would have the same issue, though.
> 
> If waking the machine up by mouse/keyboard activity is not desired in
> Hyper-V environment, then simply disable them as wakeup sources.

Sorry for the late reply! I have been sidetracked by something else...

Several years ago, we marked Hyper-V mouse/keyboard devices as wake 
sources to fix such a bug: the VM can not be woken up after we run
"echo freeze > /sys/power/state". IMO we should keep the mouse/keyboard
as wakeup sources.
 
> >
> > > > So, I use the notifier to set the flag variable and with it the driver can
> > > > know when it should not call pm_wakeup_hard_event().
> > >
> > > No, please implement hibernation support properly, as notifier + flag is
> > > a hack.
> >
> > The keyboard/mouse driver can avoid the flag by disabling the
> > keyboard/mouse event handling, but the problem is that they don't know
> > when exactly they should disable the event handling. I think the PM
> > notifier is the only way to tell the drivers a hibernation process is ongoing.
> 
> Whatever initiates hibernation (in userspace) can adjust wakeup sources
> as needed if you want them disabled completely.

Good to know this! I just found the userspace is able to disable the Hyper-V
mouse/keyboard as wakeup sources by something like:

echo disabled >  /sys/bus/vmbus/devices/XXX/power/wakeup
(XXX is the device GUID).
 
> >
> > Do you think this idea (notifier + disabling event handling) is acceptable?
> 
> No, I believe this a hack, that is why I am pushing back on this.

Ok, I think we can get rid of the notifier completely, and tell the users to disable
the 2 wakeup sources, if they think the wakeup behavior is undesired.
 
> >
> > If not, then I'll have to remove the notifier completely, and document this as
> > a known issue to the user: when a hibernation process is started, be careful
> > to not switch input focus and not touch the keyboard/mouse until the
> > hibernation process is finished. :-)
> >
> > > In this particular case you do not want to have your
> > > hv_kbd_resume() to be called in place of pm_ops->thaw() as that is what
> > > reenables the keyboard vmbus channel and causes the undesired wakeup
> > > events.
> >
> > This is only part of the issues. Another example: before the
> > pm_ops()->freeze()'s of all the devices are called, pm_wakeup_pending()
> > is already tested in a lot of places (e.g. in try_to_freeze_tasks ()) in the
> > suspend process, and can abort the whole suspend process upon the user's
> > unintentional input focus switch, keystroke and mouse movement.
> 
> How long is the prepare() phase on your systems? 

I have no specific data, but I know it's fast.

> User may wiggle mouse at any time really, even before the notifier fires up.

This doesn't matter, because the counter "pm_abort_suspend" is cleared at
a later place. The code path is:

hibernate() ->
  __pm_notifier_call_chain(PM_HIBERNATION_PREPARE, -1, &nr_calls)
  freeze_processes() ->
    pm_wakeup_clear() -> 
      atomic_set(&pm_abort_suspend, 0);

This patch sets the flag in the PM_HIBERNATION_PREPARE notifier, so
there is no race.

Since I'm going to get rid of the notifier, we don't care at all about this now.
 
> >
> > > Your vmbus implementation should allow individual drivers to
> > > control the set of PM operations that they wish to use, instead of
> > > forcing everything through suspend/resume.
> > >
> > > Dmitry
> >
> > Since the devices are pure software-emulated devices, no PM operation was
> > supported in the past, and now suspend/resume are the only two PM
> operations
> > we're going to support. If the idea (notifier + disabling event handling) is not
> > good enough, we'll have to document the issue to the user, as I described
> above.
> 
> ¯\_(ツ)_/¯ If you do not want to implement hibernation properly in vmbus
> code that is totally up to you (have you read in pm.h how freeze() is
> different from suspend()?).
> Dmitry

I understand freeze() is different from suspend(). Here I treat suspend() as a
heavyweight freeze() for simplicity and IMHO the extra cost of time is
neglectable considering the long hibernation process, which can take 
5~10+ seconds.

Even if I implement all the pm ops, IMO the issue we're talking about
(i.e. the hibernation process can be aborted by user's keyboard/mouse
activities) still exists. Actually I think a physical Linux machine should have
the same issue.

In practice, IMO the issue is not a big concern, as the VM usually runs in
a remote data center, and the user has no access to the VM's 
keyboard/mouse. :-)

I hope I understood your comments. I'll post a v2 without the notifier. 
Please Ack the v2 if it looks good to you.

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-10-03  6:44 UTC (permalink / raw)
  To: dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <PU1P153MB01696258D9983DF59D68E748BF9F0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

> From: Dexuan Cui
> Sent: Wednesday, October 2, 2019 10:35 PM
> > ... 
> >
> > ¯\_(ツ)_/¯ If you do not want to implement hibernation properly in vmbus
> > code that is totally up to you (have you read in pm.h how freeze() is
> > different from suspend()?).
> > Dmitry
> 
> I understand freeze() is different from suspend(). Here I treat suspend() as a
> heavyweight freeze() for simplicity and IMHO the extra cost of time is
> neglectable considering the long hibernation process, which can take
> 5~10+ seconds.
> 
> Even if I implement all the pm ops, IMO the issue we're talking about
> (i.e. the hibernation process can be aborted by user's keyboard/mouse
> activities) still exists. Actually I think a physical Linux machine should have
> the same issue.
> 
> In practice, IMO the issue is not a big concern, as the VM usually runs in
> a remote data center, and the user has no access to the VM's
> keyboard/mouse. :-)
> 
> I hope I understood your comments. I'll post a v2 without the notifier.
> Please Ack the v2 if it looks good to you.
> 
> -- Dexuan

I think I understood now: it looks the vmbus driver should implement
a prepare() or freeze(), which calls the hyperv_keyboard driver's
prepare() or freeze(), which can set the flag or disable the keyboard
event handling. This way we don't need the notifier.

Please let me know if I still don't get it right.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v3 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-03 10:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Jonathan Corbet,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Paolo Bonzini, Radim Krcmar, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Peter Zijlstra, Will Deacon
In-Reply-To: <20191002171006.GB9615@linux.intel.com>

On 2019/10/3 1:10, Sean Christopherson wrote:

> On Mon, Sep 30, 2019 at 08:44:36PM +0800, Zhenzhong Duan wrote:
>> There are cases where a guest tries to switch spinlocks to bare metal
>> behavior (e.g. by setting "xen_nopvspin" on XEN platform and
>> "hv_nopvspin" on HYPER_V).
>>
>> That feature is missed on KVM, add a new parameter "nopvspin" to disable
>> PV spinlocks for KVM guest.
>>
>> This new parameter is also used to replace "xen_nopvspin" and
>> "hv_nopvspin".
> This is confusing as there are no Xen or Hyper-V changes in this patch.
> Please make it clear that you're talking about future patches, e.g.:
>
>    The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
>    parameters in future patches.

Will fix

>
>> The global variable pvspin isn't defined as __initdata as it's used at
>> runtime by XEN guest.
> Same comment as above regarding what this patch is doing versus what will
> be done in the future.  Arguably you should even mark it __initdata in
> this patch and deal with conflict in the Xen patch, e.g. use it only to
> set the existing xen_pvspin variable.

Will fix

>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>

......snip

>>   /**
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index e820568..a4f108d 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -842,6 +842,13 @@ void __init kvm_spinlock_init(void)
>>   	if (num_possible_cpus() == 1)
>>   		return;
>>   
>> +	if (!pvspin) {
>> +		pr_info("PV spinlocks disabled\n");
>> +		static_branch_disable(&virt_spin_lock_key);
>> +		return;
>> +	}
>> +	pr_info("PV spinlocks enabled\n");
> These prints could be confusing as KVM also disables PV spinlocks when it
> sees KVM_HINTS_REALTIME.

What about below:

pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");

Or you prefer separate print for each disabling like below?

         /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
         if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
                 pr_info("PV spinlocks disabled, KVM_FEATURE_PV_UNHALT feature needed.\n");
                 return;
         }

         if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
                 pr_info("PV spinlocks disabled, having non-preemption hints.\n");
                 return;
         }

         /* Don't use the pvqspinlock code if there is only 1 vCPU. */
         if (num_possible_cpus() == 1) {
                 pr_info("PV spinlocks disabled on UP.\n");
                 return;
         }
	if (!pvspin) {
		pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
		static_branch_disable(&virt_spin_lock_key);
		return;
	}
	pr_info("PV spinlocks enabled\n");

>
>> +
>>   	__pv_init_lock_hash();
>>   	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
>>   	pv_ops.lock.queued_spin_unlock =
>> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
>> index 2473f10..945b510 100644
>> --- a/kernel/locking/qspinlock.c
>> +++ b/kernel/locking/qspinlock.c
>> @@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>>   #include "qspinlock_paravirt.h"
>>   #include "qspinlock.c"
>>   
>> +bool pvspin = true;
> This can be __ro_after_init, or probably better __initdata and have Xen
> snapshot the value for its use case.

I will use __initdata

>
> Personal preference: I'd invert the bool and name it nopvspin to make it
> easier to connect the variable to the kernel param.

OK, will do that.  Thanks for review for all the patches.

Zhenzhong


^ permalink raw reply

* Re: [PATCH v3 2/4] x86/kvm: Change print code to use pr_*() format
From: Zhenzhong Duan @ 2019-10-03 10:32 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Paolo Bonzini,
	Radim Krcmar, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <20191002171551.GC9615@linux.intel.com>

On 2019/10/3 1:15, Sean Christopherson wrote:

> On Mon, Sep 30, 2019 at 08:44:37PM +0800, Zhenzhong Duan wrote:
>> pr_*() is preferred than printk(KERN_* ...), after change all the print
>> in arch/x86/kernel/kvm.c will have "KVM: xxx" style.
>>
>> No functional change.
>>
>> Suggested-by: Vitaly Kuznetsov<vkuznets@redhat.com>
> This wasn't really suggested by Vitaly, he just requested it be done in a
> separate patch.

Will fix.

>
>> Signed-off-by: Zhenzhong Duan<zhenzhong.duan@oracle.com>
>> Cc: Paolo Bonzini<pbonzini@redhat.com>
>> Cc: Radim Krcmar<rkrcmar@redhat.com>
>> Cc: Sean Christopherson<sean.j.christopherson@intel.com>
>> Cc: Vitaly Kuznetsov<vkuznets@redhat.com>
>> Cc: Wanpeng Li<wanpengli@tencent.com>
>> Cc: Jim Mattson<jmattson@google.com>
>> Cc: Joerg Roedel<joro@8bytes.org>
>> Cc: Thomas Gleixner<tglx@linutronix.de>
>> Cc: Ingo Molnar<mingo@redhat.com>
>> Cc: Borislav Petkov<bp@alien8.de>
>> Cc: "H. Peter Anvin"<hpa@zytor.com>
>> ---
>>   arch/x86/kernel/kvm.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index a4f108d..ce4f578 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -7,6 +7,8 @@
>>    *   Authors: Anthony Liguori<aliguori@us.ibm.com>
>>    */
>>   
>> +#define pr_fmt(fmt) "KVM: " fmt
> Not a fan of "KVM" as the prefix as it's easily confused with KVM the
> hypervisor.  Maybe "kvm_guest"?

Yeah, looks better, will change to"kvm_guest". Thanks

Zhenzhong


^ permalink raw reply

* Re: [PATCH v2] x86/hyperv: make vapic support x2apic mode
From: Vitaly Kuznetsov @ 2019-10-03 10:54 UTC (permalink / raw)
  To: Roman Kagan
  Cc: kvm@vger.kernel.org, Michael Kelley, Lan Tianyu, Joerg Roedel,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20191002101923.4981-1-rkagan@virtuozzo.com>

Roman Kagan <rkagan@virtuozzo.com> writes:

> Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
> when supported by the vcpus.
>
> However, the apic access functions for Hyper-V enlightened apic assume
> xapic mode only.
>
> As a result, Linux fails to bring up secondary cpus when run as a guest
> in QEMU/KVM with both hv_apic and x2apic enabled.
>
> I didn't manage to make my instance of Hyper-V expose x2apic to the
> guest; nor does Hyper-V spec document the expected behavior.  However,
> a Windows guest running in QEMU/KVM with hv_apic and x2apic and a big
> number of vcpus (so that it turns on x2apic mode) does use enlightened
> apic MSRs passing unshifted 32bit destination id and falls back to the
> regular x2apic MSRs for less frequently used apic fields.
>
> So implement the same behavior, by replacing enlightened apic access
> functions (only those where it makes a difference) with their
> x2apic-aware versions when x2apic is in use.
>
> Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
> Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
> Cc: stable@vger.kernel.org
> Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> ---
> v1 -> v2:
> - add ifdefs to handle !CONFIG_X86_X2APIC
>
>  arch/x86/hyperv/hv_apic.c | 54 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 51 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> index 5c056b8aebef..eb1434ae9e46 100644
> --- a/arch/x86/hyperv/hv_apic.c
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -84,6 +84,44 @@ static void hv_apic_write(u32 reg, u32 val)
>  	}
>  }
>  
> +#ifdef CONFIG_X86_X2APIC
> +static void hv_x2apic_icr_write(u32 low, u32 id)
> +{
> +	wrmsr(HV_X64_MSR_ICR, low, id);
> +}

AFAIU you're trying to mirror native_x2apic_icr_write() here but this is
different from what hv_apic_icr_write() does
(SET_APIC_DEST_FIELD(id)). Is it actually correct? (I think you've
tested this and it is but) Michael, could you please shed some light
here?

> +
> +static u32 hv_x2apic_read(u32 reg)
> +{
> +	u32 reg_val, hi;
> +
> +	switch (reg) {
> +	case APIC_EOI:
> +		rdmsr(HV_X64_MSR_EOI, reg_val, hi);
> +		return reg_val;
> +	case APIC_TASKPRI:
> +		rdmsr(HV_X64_MSR_TPR, reg_val, hi);
> +		return reg_val;
> +
> +	default:
> +		return native_apic_msr_read(reg);
> +	}
> +}
> +
> +static void hv_x2apic_write(u32 reg, u32 val)
> +{
> +	switch (reg) {
> +	case APIC_EOI:
> +		wrmsr(HV_X64_MSR_EOI, val, 0);
> +		break;
> +	case APIC_TASKPRI:
> +		wrmsr(HV_X64_MSR_TPR, val, 0);
> +		break;
> +	default:
> +		native_apic_msr_write(reg, val);
> +	}
> +}
> +#endif /* CONFIG_X86_X2APIC */
> +
>  static void hv_apic_eoi_write(u32 reg, u32 val)
>  {
>  	struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()];
> @@ -262,9 +300,19 @@ void __init hv_apic_init(void)
>  	if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
>  		pr_info("Hyper-V: Using MSR based APIC access\n");
>  		apic_set_eoi_write(hv_apic_eoi_write);
> -		apic->read      = hv_apic_read;
> -		apic->write     = hv_apic_write;
> -		apic->icr_write = hv_apic_icr_write;
> +#ifdef CONFIG_X86_X2APIC
> +		if (x2apic_enabled()) {
> +			apic->read      = hv_x2apic_read;
> +			apic->write     = hv_x2apic_write;
> +			apic->icr_write = hv_x2apic_icr_write;
> +		} else {
> +#endif
> +			apic->read      = hv_apic_read;
> +			apic->write     = hv_apic_write;
> +			apic->icr_write = hv_apic_icr_write;

(just wondering): Is it always safe to assume that we cannot switch
between apic_flat/x2apic in runtime? Moreover, the only difference
between hv_apic_read/hv_apic_write and hv_x2apic_read/hv_x2apic_write is
native_apic_mem_{read,write} -> native_apic_msr_{read,write}. Would it
make sense to move if (x2apic_enabled()) and merge these functions?

> +#ifdef CONFIG_X86_X2APIC
> +		}
> +#endif
>  		apic->icr_read  = hv_apic_icr_read;
>  	}
>  }

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v3 3/4] xen: Mark "xen_nopvspin" parameter obsolete and map it to "nopvspin"
From: Zhenzhong Duan @ 2019-10-03 11:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Jonathan Corbet,
	Boris Ostrovsky, Juergen Gross, Stefano Stabellini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <20191002171805.GD9615@linux.intel.com>


On 2019/10/3 1:18, Sean Christopherson wrote:
> On Mon, Sep 30, 2019 at 08:44:38PM +0800, Zhenzhong Duan wrote:
>> Fix stale description of "xen_nopvspin" as we use qspinlock now.
>>
>> Signed-off-by: Zhenzhong Duan<zhenzhong.duan@oracle.com>
>> Reviewed-by: Juergen Gross<jgross@suse.com>
>> Cc: Jonathan Corbet<corbet@lwn.net>
>> Cc: Boris Ostrovsky<boris.ostrovsky@oracle.com>
>> Cc: Juergen Gross<jgross@suse.com>
>> Cc: Stefano Stabellini<sstabellini@kernel.org>
>> Cc: Thomas Gleixner<tglx@linutronix.de>
>> Cc: Ingo Molnar<mingo@redhat.com>
>> Cc: Borislav Petkov<bp@alien8.de>
>> Cc: "H. Peter Anvin"<hpa@zytor.com>
>> ---
...snip
>> @@ -93,7 +92,7 @@ void xen_init_lock_cpu(int cpu)
>>   
>>   void xen_uninit_lock_cpu(int cpu)
>>   {
>> -	if (!xen_pvspin)
>> +	if (!pvspin)
>>   		return;
>>   
>>   	unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
>> @@ -117,9 +116,9 @@ void __init xen_init_spinlocks(void)
>>   
>>   	/*  Don't need to use pvqspinlock code if there is only 1 vCPU. */
>>   	if (num_possible_cpus() == 1)
>> -		xen_pvspin = false;
>> +		pvspin = false;
> As suggested in the other patch, if you incorporate pvspin (or nopvspin)
> into xen_pvspin then the param can be __initdata and the diff for this
> patch will be smaller, e.g. you wouldn't need the xen_domain() shenanigans
> in xen_parse_nopvspin().

Ok, will fix. Thanks

Zhenzhong


^ permalink raw reply

* Re: [PATCH v3 4/4] x86/hyperv: Mark "hv_nopvspin" parameter obsolete and map it to "nopvspin"
From: Zhenzhong Duan @ 2019-10-03 11:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, vkuznets, linux-hyperv, kvm, Jonathan Corbet,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin
In-Reply-To: <20191002171952.GE9615@linux.intel.com>


On 2019/10/3 1:19, Sean Christopherson wrote:
> On Mon, Sep 30, 2019 at 08:44:39PM +0800, Zhenzhong Duan wrote:
>> Includes asm/hypervisor.h in order to reference x86_hyper_type.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>> Cc: Jonathan Corbet <corbet@lwn.net>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> ---
...snip
>> @@ -64,7 +63,7 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
>>   
>>   void __init hv_init_spinlocks(void)
>>   {
>> -	if (!hv_pvspin || !apic ||
>> +	if (!pvspin || !apic ||
>>   	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
>>   	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
>>   		pr_info("PV spinlocks disabled\n");
>> @@ -82,7 +81,9 @@ void __init hv_init_spinlocks(void)
>>   
>>   static __init int hv_parse_nopvspin(char *arg)
>>   {
>> -	hv_pvspin = false;
>> +	pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
>> +	if (x86_hyper_type == X86_HYPER_MS_HYPERV)
>> +		pvspin = false;
> Personal preference would be to keep the hv_pvspin variable and add the
> extra check in hv_init_spinlocks().

OK, will do that way. Thanks

Zhenzhong


^ permalink raw reply

* Re: [PATCH v2] x86/hyperv: make vapic support x2apic mode
From: Roman Kagan @ 2019-10-03 12:52 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: kvm@vger.kernel.org, Michael Kelley, Lan Tianyu, Joerg Roedel,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <87muei14ms.fsf@vitty.brq.redhat.com>

On Thu, Oct 03, 2019 at 12:54:03PM +0200, Vitaly Kuznetsov wrote:
> Roman Kagan <rkagan@virtuozzo.com> writes:
> 
> > Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode
> > when supported by the vcpus.
> >
> > However, the apic access functions for Hyper-V enlightened apic assume
> > xapic mode only.
> >
> > As a result, Linux fails to bring up secondary cpus when run as a guest
> > in QEMU/KVM with both hv_apic and x2apic enabled.
> >
> > I didn't manage to make my instance of Hyper-V expose x2apic to the
> > guest; nor does Hyper-V spec document the expected behavior.  However,
> > a Windows guest running in QEMU/KVM with hv_apic and x2apic and a big
> > number of vcpus (so that it turns on x2apic mode) does use enlightened
> > apic MSRs passing unshifted 32bit destination id and falls back to the
> > regular x2apic MSRs for less frequently used apic fields.
> >
> > So implement the same behavior, by replacing enlightened apic access
> > functions (only those where it makes a difference) with their
> > x2apic-aware versions when x2apic is in use.
> >
> > Fixes: 29217a474683 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver")
> > Fixes: 6b48cb5f8347 ("X86/Hyper-V: Enlighten APIC access")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
> > ---
> > v1 -> v2:
> > - add ifdefs to handle !CONFIG_X86_X2APIC
> >
> >  arch/x86/hyperv/hv_apic.c | 54 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 51 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> > index 5c056b8aebef..eb1434ae9e46 100644
> > --- a/arch/x86/hyperv/hv_apic.c
> > +++ b/arch/x86/hyperv/hv_apic.c
> > @@ -84,6 +84,44 @@ static void hv_apic_write(u32 reg, u32 val)
> >  	}
> >  }
> >  
> > +#ifdef CONFIG_X86_X2APIC
> > +static void hv_x2apic_icr_write(u32 low, u32 id)
> > +{
> > +	wrmsr(HV_X64_MSR_ICR, low, id);
> > +}
> 
> AFAIU you're trying to mirror native_x2apic_icr_write() here but this is
> different from what hv_apic_icr_write() does
> (SET_APIC_DEST_FIELD(id)).

Right.  In xapic mode the ICR2 aka the high 4 bytes of ICR is programmed
with the destination id in the highest byte; in x2apic mode the whole
ICR2 is set to the 32bit destination id.

> Is it actually correct? (I think you've tested this and it is but)

As I wrote in the commit log, I haven't tested it in the sense that I
ran a Linux guest in a Hyper-V VM exposing x2apic to the guest, because
I didn't manage to configure it to do so.  OTOH I did run a Windows
guest in QEMU/KVM with hv_apic and x2apic enabled and saw it write
destination ids unshifted to the ICR2 part of ICR, so I assume it's
correct.

> Michael, could you please shed some light here?

Would be appreciated, indeed.

> > +static u32 hv_x2apic_read(u32 reg)
> > +{
> > +	u32 reg_val, hi;
> > +
> > +	switch (reg) {
> > +	case APIC_EOI:
> > +		rdmsr(HV_X64_MSR_EOI, reg_val, hi);
> > +		return reg_val;
> > +	case APIC_TASKPRI:
> > +		rdmsr(HV_X64_MSR_TPR, reg_val, hi);
> > +		return reg_val;
> > +
> > +	default:
> > +		return native_apic_msr_read(reg);
> > +	}
> > +}
> > +
> > +static void hv_x2apic_write(u32 reg, u32 val)
> > +{
> > +	switch (reg) {
> > +	case APIC_EOI:
> > +		wrmsr(HV_X64_MSR_EOI, val, 0);
> > +		break;
> > +	case APIC_TASKPRI:
> > +		wrmsr(HV_X64_MSR_TPR, val, 0);
> > +		break;
> > +	default:
> > +		native_apic_msr_write(reg, val);
> > +	}
> > +}
> > +#endif /* CONFIG_X86_X2APIC */
> > +
> >  static void hv_apic_eoi_write(u32 reg, u32 val)
> >  {
> >  	struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()];
> > @@ -262,9 +300,19 @@ void __init hv_apic_init(void)
> >  	if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
> >  		pr_info("Hyper-V: Using MSR based APIC access\n");
> >  		apic_set_eoi_write(hv_apic_eoi_write);
> > -		apic->read      = hv_apic_read;
> > -		apic->write     = hv_apic_write;
> > -		apic->icr_write = hv_apic_icr_write;
> > +#ifdef CONFIG_X86_X2APIC
> > +		if (x2apic_enabled()) {
> > +			apic->read      = hv_x2apic_read;
> > +			apic->write     = hv_x2apic_write;
> > +			apic->icr_write = hv_x2apic_icr_write;
> > +		} else {
> > +#endif
> > +			apic->read      = hv_apic_read;
> > +			apic->write     = hv_apic_write;
> > +			apic->icr_write = hv_apic_icr_write;
> 
> (just wondering): Is it always safe to assume that we cannot switch
> between apic_flat/x2apic in runtime?

I guess so.  All apic choices are made early at __init, obviously before
the secondary CPUs are brought up, and aren't reconsidered afterwards.

> Moreover, the only difference
> between hv_apic_read/hv_apic_write and hv_x2apic_read/hv_x2apic_write is
> native_apic_mem_{read,write} -> native_apic_msr_{read,write}. Would it
> make sense to move if (x2apic_enabled()) and merge these functions?

x2apic_enabled() is too heavy for that: it reads MSR_IA32_APICBASE.  One
could probably use a read-mostly global variable instead but I'm not
sure it would make the code better.

Thanks,
Roman.

^ permalink raw reply

* [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Andrea Parri @ 2019-10-03 15:52 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv, x86
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Michael Kelley, Andrea Parri

If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
configuration version.  Bit 15 corresponds to the
AccessTscInvariantControls privilege.  If this privilege bit is set,
guests can access the HvSyntheticInvariantTscControl MSR: guests can
set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
After setting the synthetic MSR, CPUID will enumerate support for
InvariantTSC.

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 5 +++++
 arch/x86/kernel/cpu/mshyperv.c     | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 7741e211f7f51..5f10f7f2098db 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -86,6 +86,8 @@
 #define HV_X64_ACCESS_FREQUENCY_MSRS		BIT(11)
 /* AccessReenlightenmentControls privilege */
 #define HV_X64_ACCESS_REENLIGHTENMENT		BIT(13)
+/* AccessTscInvariantControls privilege */
+#define HV_X64_ACCESS_TSC_INVARIANT		BIT(15)
 
 /*
  * Feature identification: indicates which flags were specified at partition
@@ -278,6 +280,9 @@
 #define HV_X64_MSR_TSC_EMULATION_CONTROL	0x40000107
 #define HV_X64_MSR_TSC_EMULATION_STATUS		0x40000108
 
+/* TSC invariant control */
+#define HV_X64_MSR_TSC_INVARIANT_CONTROL	0x40000118
+
 /*
  * Declare the MSR used to setup pages used to communicate with the hypervisor.
  */
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 267daad8c0360..105844d542e5c 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -286,7 +286,12 @@ static void __init ms_hyperv_init_platform(void)
 	machine_ops.shutdown = hv_machine_shutdown;
 	machine_ops.crash_shutdown = hv_machine_crash_shutdown;
 #endif
-	mark_tsc_unstable("running on Hyper-V");
+	if (ms_hyperv.features & HV_X64_ACCESS_TSC_INVARIANT) {
+		wrmsrl(HV_X64_MSR_TSC_INVARIANT_CONTROL, 0x1);
+		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+	} else {
+		mark_tsc_unstable("running on Hyper-V");
+	}
 
 	/*
 	 * Generation 2 instances don't support reading the NMI status from
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Andrea Parri @ 2019-10-03 16:03 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv, x86
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Michael Kelley
In-Reply-To: <20191003155200.22022-1-parri.andrea@gmail.com>


On Thu, Oct 03, 2019 at 05:52:00PM +0200, Andrea Parri wrote:
> If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
> HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
> configuration version.  Bit 15 corresponds to the
> AccessTscInvariantControls privilege.  If this privilege bit is set,
> guests can access the HvSyntheticInvariantTscControl MSR: guests can
> set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
> After setting the synthetic MSR, CPUID will enumerate support for
> InvariantTSC.
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>

Subject should have been "[PATCH] ...", i.e., there is no 2/2 planned
(not for this patchset at least).  Please let me know if I should re-
submit with the subject fixed.

Thanks,
  Andrea

^ permalink raw reply

* Re: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: dmitry.torokhov @ 2019-10-03 17:45 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <PU1P153MB0169CC57749BF297F2581B02BF9F0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Oct 03, 2019 at 06:44:04AM +0000, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Wednesday, October 2, 2019 10:35 PM
> > > ... 
> > >
> > > ¯\_(ツ)_/¯ If you do not want to implement hibernation properly in vmbus
> > > code that is totally up to you (have you read in pm.h how freeze() is
> > > different from suspend()?).
> > > Dmitry
> > 
> > I understand freeze() is different from suspend(). Here I treat suspend() as a
> > heavyweight freeze() for simplicity and IMHO the extra cost of time is
> > neglectable considering the long hibernation process, which can take
> > 5~10+ seconds.
> > 
> > Even if I implement all the pm ops, IMO the issue we're talking about
> > (i.e. the hibernation process can be aborted by user's keyboard/mouse
> > activities) still exists. Actually I think a physical Linux machine should have
> > the same issue.
> > 
> > In practice, IMO the issue is not a big concern, as the VM usually runs in
> > a remote data center, and the user has no access to the VM's
> > keyboard/mouse. :-)
> > 
> > I hope I understood your comments. I'll post a v2 without the notifier.
> > Please Ack the v2 if it looks good to you.
> > 
> > -- Dexuan
> 
> I think I understood now: it looks the vmbus driver should implement
> a prepare() or freeze(), which calls the hyperv_keyboard driver's
> prepare() or freeze(), which can set the flag or disable the keyboard
> event handling. This way we don't need the notifier.

Right. I think in practice the current suspend implementation can work
as freeze() for the HV keyboard, because in suspend you shut off vmbus
channel, so there should not be wakeup signals anymore. What you do not
want is to have the current resume to be used in place of thaw(), as
there you re-enable the vmbus channel and resume sending wakeup requests
as you are writing out the hibernation image to storage.

I think if vmbus allowed HV keyboard driver to supply empty thaw() and
poweroff() implementations, while using suspend() as freeze() and
resume() as restore(), it would solve the issue for you.

Thanks.

-- 
Dmitry

^ permalink raw reply

* RE: [PATCH] Input: hyperv-keyboard: Add the support of hibernation
From: Dexuan Cui @ 2019-10-03 18:18 UTC (permalink / raw)
  To: dmitry.torokhov@gmail.com
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-input@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Kelley
In-Reply-To: <20191003174530.GB22365@dtor-ws>

> From: dmitry.torokhov@gmail.com <dmitry.torokhov@gmail.com>
> Sent: Thursday, October 3, 2019 10:46 AM
> >
> > I think I understood now: it looks the vmbus driver should implement
> > a prepare() or freeze(), which calls the hyperv_keyboard driver's
> > prepare() or freeze(), which can set the flag or disable the keyboard
> > event handling. This way we don't need the notifier.
> 
> Right. I think in practice the current suspend implementation can work
> as freeze() for the HV keyboard, because in suspend you shut off vmbus
> channel, so there should not be wakeup signals anymore. What you do not
> want is to have the current resume to be used in place of thaw(), as
> there you re-enable the vmbus channel and resume sending wakeup requests
> as you are writing out the hibernation image to storage.
> 
> I think if vmbus allowed HV keyboard driver to supply empty thaw() and
> poweroff() implementations, while using suspend() as freeze() and
> resume() as restore(), it would solve the issue for you.
> 
> Dmitry

Exactly. I'll have to fix vmbus first, then post a v2 for this patch.

Thanks,
-- Dexuan

^ permalink raw reply

* Re: [PATCH v5 1/2] drivers: hv: vmbus: Introduce latency testing
From: Branden Bonaby @ 2019-10-03 18:22 UTC (permalink / raw)
  To: Michael Kelley
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <DM5PR21MB01373C2DB4DE6A4B6079C2BAD7890@DM5PR21MB0137.namprd21.prod.outlook.com>

On Thu, Sep 19, 2019 at 10:52:41PM +0000, Michael Kelley wrote:
> From: Branden Bonaby <brandonbonaby94@gmail.com> Sent: Thursday, September 12, 2019 7:32 PM
> > 
> > +
> > +static int hv_debugfs_delay_set(void *data, u64 val)
> > +{
> > +	int ret = 0;
> > +
> > +	if (val >= 0 && val <= 1000)
> > +		*(u32 *)data = val;
> > +	else
> > +		ret = -EINVAL;
> > +
> > +	return ret;
> > +}
> 
> I should probably quit picking at your code, but I'm going to
> do it one more time. :-)
> 
> The above test for val >=0 is redundant as 'val' is declared
> as 'u64'.  As an unsigned value, it will always be >= 0.  More
> broadly, the above function could be written as follows
> with no loss of clarity.  This accomplishes the same thing in
> only 4 lines of code instead of 6, and the main execution path
> is in the sequential execution flow, not in an 'if' statement.
> 
> {
> 	if (val > 1000)
> 		return -EINVAL;
> 	*(u32 *)data = val;
> 	return 0;
> }
> 
> Your code is correct as written, so this is arguably more a
> matter of style, but Linux generally likes to do things clearly
> and compactly with no extra motion.
> 

Yea the less than 0 comparison isnt needed, so I'll update that

> +/* Delay buffer/message reads on a vmbus channel */
> > +void hv_debug_delay_test(struct vmbus_channel *channel, enum delay delay_type)
> > +{
> > +	struct vmbus_channel *test_channel =    channel->primary_channel ?
> > +						channel->primary_channel :
> > +						channel;
> > +	bool state = test_channel->fuzz_testing_state;
> > +
> > +	if (state) {
> > +		if (delay_type == 0)
> > +			udelay(test_channel->fuzz_testing_interrupt_delay);
> > +		else
> > +			udelay(test_channel->fuzz_testing_message_delay);
> 
> This 'if/else' statement got me thinking.  You have an enum declared below
> that lists the two options -- INTERRUPT_DELAY or MESSAGE_DELAY.  The
> implication is that we might add more options in the future.  But the
> above 'if/else' statement isn't really set up to easily add more options, and
> the individual fields for fuzz_testing_interrupt_delay and
> fuzz_testing_message_delay mean adding more branches to the 'if/else'
> statement whenever a new DELAY type is added to the enum.   And the
> same is true when adding the entries into debugfs.  A more general
> solution might use arrays and loops, and treat the enum value as an
> index into an array of delay values.  Extending to add another delay type
> could be as easy as adding another entry to the enum declaration.
> 
> The current code is for the case where n=2 (i.e., two different delay
> types), and as such probably doesn't warrant the full index/looping
> treatment.  But in the future, if we add additional delay types, we'll
> probably revise the code to do the index/looping approach.
> 
> So to be clear, at this point I'm not asking you to change the existing
> code.  My comments are more of an observation and something to
> think about in the future.
> 

I do see your point, thanks for the input. I think since its just two
it might be better to leave it but it definitely makes sense.

> > 
> > +enum delay {
> > +	INTERRUPT_DELAY = 0,
> > +	MESSAGE_DELAY   = 1,
> > +};
> > +
> 
> Michael

^ permalink raw reply

* RE: [RFC PATCH 09/13] hv_sock: set VMADDR_CID_HOST in the hvs_remote_addr_init()
From: Dexuan Cui @ 2019-10-03 19:36 UTC (permalink / raw)
  To: Stefano Garzarella, netdev@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Stefan Hajnoczi,
	Sasha Levin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	David S. Miller, virtualization@lists.linux-foundation.org,
	Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
	Jorgen Hansen
In-Reply-To: <20190927112703.17745-10-sgarzare@redhat.com>

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Friday, September 27, 2019 4:27 AM
> To: netdev@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org; KY Srinivasan <kys@microsoft.com>; Stefan
> Hajnoczi <stefanha@redhat.com>; Sasha Levin <sashal@kernel.org>;
> linux-kernel@vger.kernel.org; kvm@vger.kernel.org; David S. Miller
> <davem@davemloft.net>; virtualization@lists.linux-foundation.org; Stephen
> Hemminger <sthemmin@microsoft.com>; Jason Wang
> <jasowang@redhat.com>; Michael S. Tsirkin <mst@redhat.com>; Haiyang
> Zhang <haiyangz@microsoft.com>; Dexuan Cui <decui@microsoft.com>;
> Jorgen Hansen <jhansen@vmware.com>
> Subject: [RFC PATCH 09/13] hv_sock: set VMADDR_CID_HOST in the
> hvs_remote_addr_init()
> 
> Remote peer is always the host, so we set VMADDR_CID_HOST as
> remote CID instead of VMADDR_CID_ANY.
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/hyperv_transport.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/vmw_vsock/hyperv_transport.c
> b/net/vmw_vsock/hyperv_transport.c
> index 4f47af2054dd..306310794522 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -186,7 +186,8 @@ static void hvs_remote_addr_init(struct sockaddr_vm
> *remote,
>  	static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
>  	struct sock *sk;
> 
> -	vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
> +	/* Remote peer is always the host */
> +	vsock_addr_init(remote, VMADDR_CID_HOST, VMADDR_PORT_ANY);
> 
>  	while (1) {
>  		/* Wrap around ? */
> --

Looks good to me, since hv_sock doesn't really use the CID in the 
transport layer.

Reviewed-by: Dexuan Cui <decui@microsoft.com>

^ permalink raw reply

* RE: [RFC PATCH 07/13] vsock: handle buffer_size sockopts in the core
From: Dexuan Cui @ 2019-10-03 20:11 UTC (permalink / raw)
  To: Stefano Garzarella, netdev@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Stefan Hajnoczi,
	Sasha Levin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	David S. Miller, virtualization@lists.linux-foundation.org,
	Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
	Jorgen Hansen
In-Reply-To: <20190927112703.17745-8-sgarzare@redhat.com>

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Friday, September 27, 2019 4:27 AM
> To: netdev@vger.kernel.org
> 
> virtio_transport and vmci_transport handle the buffer_size
> sockopts in a very similar way.
> 
> In order to support multiple transports, this patch moves this
> handling in the core to allow the user to change the options
> also if the socket is not yet assigned to any transport.
> 
> This patch also adds the '.notify_buffer_size' callback in the
> 'struct virtio_transport' in order to inform the transport,
> when the buffer_size is changed by the user. It is also useful
> to limit the 'buffer_size' requested (e.g. virtio transports).
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  drivers/vhost/vsock.c                   |  7 +-
>  include/linux/virtio_vsock.h            | 15 +----
>  include/net/af_vsock.h                  | 14 ++--
>  net/vmw_vsock/af_vsock.c                | 43 ++++++++++---
>  net/vmw_vsock/hyperv_transport.c        | 36 -----------
>  net/vmw_vsock/virtio_transport.c        |  8 +--
>  net/vmw_vsock/virtio_transport_common.c | 78 ++++------------------
>  net/vmw_vsock/vmci_transport.c          | 86 +++----------------------
>  net/vmw_vsock/vmci_transport.h          |  3 -
>  9 files changed, 64 insertions(+), 226 deletions(-)
> 

The hv_sock part (hyperv_transport.c) looks good to me.

Acked-by: Dexuan Cui <decui@microsoft.com>


^ permalink raw reply

* RE: [RFC PATCH 08/13] vsock: move vsock_insert_unbound() in the vsock_create()
From: Dexuan Cui @ 2019-10-03 20:17 UTC (permalink / raw)
  To: Stefano Garzarella, netdev@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Stefan Hajnoczi,
	Sasha Levin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	David S. Miller, virtualization@lists.linux-foundation.org,
	Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
	Jorgen Hansen
In-Reply-To: <20190927112703.17745-9-sgarzare@redhat.com>

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Friday, September 27, 2019 4:27 AM
> 
> vsock_insert_unbound() was called only when 'sock' parameter of
> __vsock_create() was not null. This only happened when
> __vsock_create() was called by vsock_create().
> 
> In order to simplify the multi-transports support, this patch
> moves vsock_insert_unbound() at the end of vsock_create().
> 
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
> ---
>  net/vmw_vsock/af_vsock.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 

Reviewed-by: Dexuan Cui <decui@microsoft.com>

^ permalink raw reply

* [PATCH v6 0/2] hv: vmbus: add fuzz testing to hv device
From: Branden Bonaby @ 2019-10-03 21:01 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal
  Cc: Branden Bonaby, linux-hyperv, linux-kernel

This patchset introduces a testing framework for Hyper-V drivers.
This framework allows us to introduce delays in the packet receive
path on a per-device basis. While the current code only supports
introducing arbitrary delays in the host/guest communication path,
we intend to expand this to support error injection in the future.

changes in v6:
  patch 1:
	changed kernel version in 
	Documentation/ABI/testing/debugfs-hyperv to 5.5
	
	removed less than 0 if statement when dealing with
	u64 datatype, as suggested by michael.

changes in v5:
  patch 1:
        As per Stephen's suggestion, Moved CONFIG_HYPERV_TESTING
        to lib/Kconfig.debug.

        Fixed build issue reported by Kbuild, with Michael's
        suggestion to make hv_debugfs part of the hv_vmbus
        module.

changes in v4:
  patch 1:
        Combined previous v3 patches 1 and 2, into a single patch
        which is now patch 1. This was done so that calls to
        the new debugfs functions are in the same patch as
        the definitions for these functions.

        Moved debugfs code from "vmbus_drv.c" that was in
        previous v3 patch 2, into a new file "debugfs.c" in
        drivers/hv.

        Updated the Makefile to compile "debugfs.c" if
        CONFIG_HYPERV_TESTING is enabled

        As per Michael's comments, added empty implementations
        of the new functions, so the compiler will not generate
        code when CONFIG_HYPERV_TESTING is not enabled.

  patch 2 (was previously v3 patch 3):
        Based on Harrys comments, made the tool more
        user friendly and added more error checking.

changes in v3:
  patch 2: change call to IS_ERR_OR_NULL, to IS_ERR.

  patch 3: Align python tool to match Linux coding style.

Changes in v2:
  Patch 1: As per Vitaly's suggestion, wrapped the test code under an
           #ifdef and updated the Kconfig file, so that the test code
           will only be used when the config option is set to true.
           (default is false).

           Updated hyperv_vmbus header to contain new #ifdef with new
           new functions for the test code.

  Patch 2: Moved code from under sysfs to debugfs and wrapped it under
           the new ifdef.

           Updated MAINTAINERS file with new debugfs-hyperv file under
           the section for hyperv.

  Patch 3: Updated testing tool with new debugfs location.

Branden Bonaby (2):
  drivers: hv: vmbus: Introduce latency testing
  tools: hv: add vmbus testing tool

 Documentation/ABI/testing/debugfs-hyperv |  23 ++
 MAINTAINERS                              |   1 +
 drivers/hv/Makefile                      |   1 +
 drivers/hv/connection.c                  |   1 +
 drivers/hv/hv_debugfs.c                  | 178 +++++++++++
 drivers/hv/hyperv_vmbus.h                |  31 ++
 drivers/hv/ring_buffer.c                 |   2 +
 drivers/hv/vmbus_drv.c                   |   6 +
 include/linux/hyperv.h                   |  19 ++
 lib/Kconfig.debug                        |   7 +
 tools/hv/vmbus_testing                   | 376 +++++++++++++++++++++++
 11 files changed, 645 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-hyperv
 create mode 100644 drivers/hv/hv_debugfs.c
 create mode 100755 tools/hv/vmbus_testing

-- 
2.17.1


^ permalink raw reply

* [PATCH v6 1/2] drivers: hv: vmbus: Introduce latency testing
From: Branden Bonaby @ 2019-10-03 21:01 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal
  Cc: Branden Bonaby, linux-hyperv, linux-kernel
In-Reply-To: <cover.1570130325.git.brandonbonaby94@gmail.com>

Introduce user specified latency in the packet reception path
By exposing the test parameters as part of the debugfs channel
attributes. We will control the testing state via these attributes.

Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
---
changes in v6:
 - changed kernel version in
   Documentation/ABI/testing/debugfs-hyperv to 5.5

 - removed less than 0 if statement when dealing with
   u64 datatype, as suggested by Michael.

changes in v5:
 - As per Stephen's suggestion, Moved CONFIG_HYPERV_TESTING
   to lib/Kconfig.debug.

 - Fixed build issue reported by Kbuild, with Michael's
   suggestion to make hv_debugfs part of the hv_vmbus
   module.

 - updated debugfs-hyperv to show kernel version 5.4

changes in v4:
 - Combined v3 patch 2 into this patch, and changed the
   commit description to reflect this.

 - Moved debugfs code from "vmbus_drv.c" that was in
   previous v3 patch 2, into a new file "debugfs.c" in
   drivers/hv.

 - Updated the Makefile to compile "debugfs.c" if
   CONFIG_HYPERV_TESTING is enabled

 - As per Michael's comments, added empty implementations
   of the new functions, so the compiler will not generate
   code when CONFIG_HYPERV_TESTING is not enabled.

 - Added microseconds into description for files in
   Documentation/ABI/testing/debugfs-hyperv.

Changes in v2:
 - Add #ifdef in Kconfig file so test code will not interfere
   with non-test code.
 - Move test code functions for delay to hyperv_vmbus header
   file.
 - Wrap test code under #ifdef statement.

 Documentation/ABI/testing/debugfs-hyperv |  23 +++
 MAINTAINERS                              |   1 +
 drivers/hv/Makefile                      |   1 +
 drivers/hv/connection.c                  |   1 +
 drivers/hv/hv_debugfs.c                  | 178 +++++++++++++++++++++++
 drivers/hv/hyperv_vmbus.h                |  31 ++++
 drivers/hv/ring_buffer.c                 |   2 +
 drivers/hv/vmbus_drv.c                   |   6 +
 include/linux/hyperv.h                   |  19 +++
 lib/Kconfig.debug                        |   7 +
 10 files changed, 269 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-hyperv
 create mode 100644 drivers/hv/hv_debugfs.c

diff --git a/Documentation/ABI/testing/debugfs-hyperv b/Documentation/ABI/testing/debugfs-hyperv
new file mode 100644
index 000000000000..9185e1b06bba
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-hyperv
@@ -0,0 +1,23 @@
+What:           /sys/kernel/debug/hyperv/<UUID>/fuzz_test_state
+Date:           October 2019
+KernelVersion:  5.5
+Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
+Description:    Fuzz testing status of a vmbus device, whether its in an ON
+                state or a OFF state
+Users:          Debugging tools
+
+What:           /sys/kernel/debug/hyperv/<UUID>/delay/fuzz_test_buffer_interrupt_delay
+Date:           October 2019
+KernelVersion:  5.5
+Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
+Description:    Fuzz testing buffer interrupt delay value between 0 - 1000
+		 microseconds (inclusive).
+Users:          Debugging tools
+
+What:           /sys/kernel/debug/hyperv/<UUID>/delay/fuzz_test_message_delay
+Date:           October 2019
+KernelVersion:  5.5
+Contact:        Branden Bonaby <brandonbonaby94@gmail.com>
+Description:    Fuzz testing message delay value between 0 - 1000 microseconds
+		 (inclusive).
+Users:          Debugging tools
diff --git a/MAINTAINERS b/MAINTAINERS
index 55199ef7fa74..9801b1924213 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7581,6 +7581,7 @@ F:	include/uapi/linux/hyperv.h
 F:	include/asm-generic/mshyperv.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus
+F:	Documentation/ABI/testing/debugfs-hyperv
 
 HYPERBUS SUPPORT
 M:	Vignesh Raghavendra <vigneshr@ti.com>
diff --git a/drivers/hv/Makefile b/drivers/hv/Makefile
index a1eec7177c2d..94daf8240c95 100644
--- a/drivers/hv/Makefile
+++ b/drivers/hv/Makefile
@@ -9,4 +9,5 @@ CFLAGS_hv_balloon.o = -I$(src)
 hv_vmbus-y := vmbus_drv.o \
 		 hv.o connection.o channel.o \
 		 channel_mgmt.o ring_buffer.o hv_trace.o
+hv_vmbus-$(CONFIG_HYPERV_TESTING)	+= hv_debugfs.o
 hv_utils-y := hv_util.o hv_kvp.o hv_snapshot.o hv_fcopy.o hv_utils_transport.o
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6e4c015783ff..7001b1ab4cdd 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -361,6 +361,7 @@ void vmbus_on_event(unsigned long data)
 
 	trace_vmbus_on_event(channel);
 
+	hv_debug_delay_test(channel, INTERRUPT_DELAY);
 	do {
 		void (*callback_fn)(void *);
 
diff --git a/drivers/hv/hv_debugfs.c b/drivers/hv/hv_debugfs.c
new file mode 100644
index 000000000000..8a2878573582
--- /dev/null
+++ b/drivers/hv/hv_debugfs.c
@@ -0,0 +1,178 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Authors:
+ *   Branden Bonaby <brandonbonaby94@gmail.com>
+ */
+
+#include <linux/hyperv.h>
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+
+#include "hyperv_vmbus.h"
+
+struct dentry *hv_debug_root;
+
+static int hv_debugfs_delay_get(void *data, u64 *val)
+{
+	*val = *(u32 *)data;
+	return 0;
+}
+
+static int hv_debugfs_delay_set(void *data, u64 val)
+{
+	if (val > 1000)
+		return -EINVAL;
+	*(u32 *)data = val;
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(hv_debugfs_delay_fops, hv_debugfs_delay_get,
+			 hv_debugfs_delay_set, "%llu\n");
+
+static int hv_debugfs_state_get(void *data, u64 *val)
+{
+	*val = *(bool *)data;
+	return 0;
+}
+
+static int hv_debugfs_state_set(void *data, u64 val)
+{
+	if (val == 1)
+		*(bool *)data = true;
+	else if (val == 0)
+		*(bool *)data = false;
+	else
+		return -EINVAL;
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(hv_debugfs_state_fops, hv_debugfs_state_get,
+			 hv_debugfs_state_set, "%llu\n");
+
+/* Setup delay files to store test values */
+static int hv_debug_delay_files(struct hv_device *dev, struct dentry *root)
+{
+	struct vmbus_channel *channel = dev->channel;
+	char *buffer = "fuzz_test_buffer_interrupt_delay";
+	char *message = "fuzz_test_message_delay";
+	int *buffer_val = &channel->fuzz_testing_interrupt_delay;
+	int *message_val = &channel->fuzz_testing_message_delay;
+	struct dentry *buffer_file, *message_file;
+
+	buffer_file = debugfs_create_file(buffer, 0644, root,
+					  buffer_val,
+					  &hv_debugfs_delay_fops);
+	if (IS_ERR(buffer_file)) {
+		pr_debug("debugfs_hyperv: file %s not created\n", buffer);
+		return PTR_ERR(buffer_file);
+	}
+
+	message_file = debugfs_create_file(message, 0644, root,
+					   message_val,
+					   &hv_debugfs_delay_fops);
+	if (IS_ERR(message_file)) {
+		pr_debug("debugfs_hyperv: file %s not created\n", message);
+		return PTR_ERR(message_file);
+	}
+
+	return 0;
+}
+
+/* Setup test state value for vmbus device */
+static int hv_debug_set_test_state(struct hv_device *dev, struct dentry *root)
+{
+	struct vmbus_channel *channel = dev->channel;
+	bool *state = &channel->fuzz_testing_state;
+	char *status = "fuzz_test_state";
+	struct dentry *test_state;
+
+	test_state = debugfs_create_file(status, 0644, root,
+					 state,
+					 &hv_debugfs_state_fops);
+	if (IS_ERR(test_state)) {
+		pr_debug("debugfs_hyperv: file %s not created\n", status);
+		return PTR_ERR(test_state);
+	}
+
+	return 0;
+}
+
+/* Bind hv device to a dentry for debugfs */
+static void hv_debug_set_dir_dentry(struct hv_device *dev, struct dentry *root)
+{
+	if (hv_debug_root)
+		dev->debug_dir = root;
+}
+
+/* Create all test dentry's and names for fuzz testing */
+int hv_debug_add_dev_dir(struct hv_device *dev)
+{
+	const char *device = dev_name(&dev->device);
+	char *delay_name = "delay";
+	struct dentry *delay, *dev_root;
+	int ret;
+
+	if (!IS_ERR(hv_debug_root)) {
+		dev_root = debugfs_create_dir(device, hv_debug_root);
+		if (IS_ERR(dev_root)) {
+			pr_debug("debugfs_hyperv: hyperv/%s/ not created\n",
+				 device);
+			return PTR_ERR(dev_root);
+		}
+		hv_debug_set_test_state(dev, dev_root);
+		hv_debug_set_dir_dentry(dev, dev_root);
+		delay = debugfs_create_dir(delay_name, dev_root);
+
+		if (IS_ERR(delay)) {
+			pr_debug("debugfs_hyperv: hyperv/%s/%s/ not created\n",
+				 device, delay_name);
+			return PTR_ERR(delay);
+		}
+		ret = hv_debug_delay_files(dev, delay);
+
+		return ret;
+	}
+	pr_debug("debugfs_hyperv: hyperv/ not in root debugfs path\n");
+	return PTR_ERR(hv_debug_root);
+}
+
+/* Remove dentry associated with released hv device */
+void hv_debug_rm_dev_dir(struct hv_device *dev)
+{
+	if (!IS_ERR(hv_debug_root))
+		debugfs_remove_recursive(dev->debug_dir);
+}
+
+/* Remove all dentrys associated with vmbus testing */
+void hv_debug_rm_all_dir(void)
+{
+	debugfs_remove_recursive(hv_debug_root);
+}
+
+/* Delay buffer/message reads on a vmbus channel */
+void hv_debug_delay_test(struct vmbus_channel *channel, enum delay delay_type)
+{
+	struct vmbus_channel *test_channel =    channel->primary_channel ?
+						channel->primary_channel :
+						channel;
+	bool state = test_channel->fuzz_testing_state;
+
+	if (state) {
+		if (delay_type == 0)
+			udelay(test_channel->fuzz_testing_interrupt_delay);
+		else
+			udelay(test_channel->fuzz_testing_message_delay);
+	}
+}
+
+/* Initialize top dentry for vmbus testing */
+int hv_debug_init(void)
+{
+	hv_debug_root = debugfs_create_dir("hyperv", NULL);
+	if (IS_ERR(hv_debug_root)) {
+		pr_debug("debugfs_hyperv: hyperv/ not created\n");
+		return PTR_ERR(hv_debug_root);
+	}
+	return 0;
+}
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index af9379a3bf89..20edcfd3b96c 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -385,4 +385,35 @@ enum hvutil_device_state {
 	HVUTIL_DEVICE_DYING,     /* driver unload is in progress */
 };
 
+enum delay {
+	INTERRUPT_DELAY = 0,
+	MESSAGE_DELAY   = 1,
+};
+
+#ifdef CONFIG_HYPERV_TESTING
+
+int hv_debug_add_dev_dir(struct hv_device *dev);
+void hv_debug_rm_dev_dir(struct hv_device *dev);
+void hv_debug_rm_all_dir(void);
+int hv_debug_init(void);
+void hv_debug_delay_test(struct vmbus_channel *channel, enum delay delay_type);
+
+#else /* CONFIG_HYPERV_TESTING */
+
+static inline void hv_debug_rm_dev_dir(struct hv_device *dev) {};
+static inline void hv_debug_rm_all_dir(void) {};
+static inline void hv_debug_delay_test(struct vmbus_channel *channel,
+				       enum delay delay_type) {};
+static inline int hv_debug_init(void)
+{
+	return -1;
+}
+
+static inline int hv_debug_add_dev_dir(struct hv_device *dev)
+{
+	return -1;
+}
+
+#endif /* CONFIG_HYPERV_TESTING */
+
 #endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 9a03b163cbbd..356e22159e83 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -396,6 +396,7 @@ struct vmpacket_descriptor *hv_pkt_iter_first(struct vmbus_channel *channel)
 	struct hv_ring_buffer_info *rbi = &channel->inbound;
 	struct vmpacket_descriptor *desc;
 
+	hv_debug_delay_test(channel, MESSAGE_DELAY);
 	if (hv_pkt_iter_avail(rbi) < sizeof(struct vmpacket_descriptor))
 		return NULL;
 
@@ -421,6 +422,7 @@ __hv_pkt_iter_next(struct vmbus_channel *channel,
 	u32 packetlen = desc->len8 << 3;
 	u32 dsize = rbi->ring_datasize;
 
+	hv_debug_delay_test(channel, MESSAGE_DELAY);
 	/* bump offset to next potential packet */
 	rbi->priv_read_index += packetlen + VMBUS_PKT_TRAILER;
 	if (rbi->priv_read_index >= dsize)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 391f0b225c9a..e785dd485b9b 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -958,6 +958,8 @@ static void vmbus_device_release(struct device *device)
 	struct hv_device *hv_dev = device_to_hv_device(device);
 	struct vmbus_channel *channel = hv_dev->channel;
 
+	hv_debug_rm_dev_dir(hv_dev);
+
 	mutex_lock(&vmbus_connection.channel_mutex);
 	hv_process_channel_removal(channel);
 	mutex_unlock(&vmbus_connection.channel_mutex);
@@ -1810,6 +1812,7 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 		pr_err("Unable to register primary channeln");
 		goto err_kset_unregister;
 	}
+	hv_debug_add_dev_dir(child_device_obj);
 
 	return 0;
 
@@ -2369,6 +2372,7 @@ static int __init hv_acpi_init(void)
 		ret = -ETIMEDOUT;
 		goto cleanup;
 	}
+	hv_debug_init();
 
 	ret = vmbus_bus_init();
 	if (ret)
@@ -2405,6 +2409,8 @@ static void __exit vmbus_exit(void)
 
 		tasklet_kill(&hv_cpu->msg_dpc);
 	}
+	hv_debug_rm_all_dir();
+
 	vmbus_free_channels();
 
 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b4a017093b69..ac66577852d1 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -932,6 +932,21 @@ struct vmbus_channel {
 	 * full outbound ring buffer.
 	 */
 	u64 out_full_first;
+
+	/* enabling/disabling fuzz testing on the channel (default is false)*/
+	bool fuzz_testing_state;
+
+	/*
+	 * Interrupt delay will delay the guest from emptying the ring buffer
+	 * for a specific amount of time. The delay is in microseconds and will
+	 * be between 1 to a maximum of 1000, its default is 0 (no delay).
+	 * The  Message delay will delay guest reading on a per message basis
+	 * in microseconds between 1 to 1000 with the default being 0
+	 * (no delay).
+	 */
+	u32 fuzz_testing_interrupt_delay;
+	u32 fuzz_testing_message_delay;
+
 };
 
 static inline bool is_hvsock_channel(const struct vmbus_channel *c)
@@ -1180,6 +1195,10 @@ struct hv_device {
 
 	struct vmbus_channel *channel;
 	struct kset	     *channels_kset;
+
+	/* place holder to keep track of the dir for hv device in debugfs */
+	struct dentry *debug_dir;
+
 };
 
 
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 93d97f9b0157..55eebbc0b0fb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2127,4 +2127,11 @@ config IO_STRICT_DEVMEM
 
 source "arch/$(SRCARCH)/Kconfig.debug"
 
+config HYPERV_TESTING
+	bool "Microsoft Hyper-V driver testing"
+	default n
+	depends on HYPERV && DEBUG_FS
+	help
+	  Select this option to enable Hyper-V vmbus testing.
+
 endmenu # Kernel hacking
-- 
2.17.1


^ permalink raw reply related

* [PATCH v6 2/2] tools: hv: add vmbus testing tool
From: Branden Bonaby @ 2019-10-03 21:02 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal
  Cc: Branden Bonaby, linux-hyperv, linux-kernel
In-Reply-To: <cover.1570130325.git.brandonbonaby94@gmail.com>

This is a userspace tool to drive the testing. Currently it supports
introducing user specified delay in the host to guest communication
path on a per-channel basis.

Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
---
Changes in v4:
- Based on Harrys comments, made the tool more
  user friendly and added more error checking.

Changes in v3:
- Align python tool to match Linux coding style.

Changes in v2:
 - Move testing location to new location in debugfs.

 tools/hv/vmbus_testing | 376 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 376 insertions(+)
 create mode 100755 tools/hv/vmbus_testing

diff --git a/tools/hv/vmbus_testing b/tools/hv/vmbus_testing
new file mode 100755
index 000000000000..e7212903dd1d
--- /dev/null
+++ b/tools/hv/vmbus_testing
@@ -0,0 +1,376 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+#
+# Program to allow users to fuzz test Hyper-V drivers
+# by interfacing with Hyper-V debugfs attributes.
+# Current test methods available:
+#       1. delay testing
+#
+# Current file/directory structure of hyper-V debugfs:
+#       /sys/kernel/debug/hyperv/UUID
+#       /sys/kernel/debug/hyperv/UUID/<test-state filename>
+#       /sys/kernel/debug/hyperv/UUID/<test-method sub-directory>
+#
+# author: Branden Bonaby <brandonbonaby94@gmail.com>
+
+import os
+import cmd
+import argparse
+import glob
+from argparse import RawDescriptionHelpFormatter
+from argparse import RawTextHelpFormatter
+from enum import Enum
+
+# Do not change unless, you change the debugfs attributes
+# in /drivers/hv/debugfs.c. All fuzz testing
+# attributes will start with "fuzz_test".
+
+# debugfs path for hyperv must exist before proceeding
+debugfs_hyperv_path = "/sys/kernel/debug/hyperv"
+if not os.path.isdir(debugfs_hyperv_path):
+        print("{} doesn't exist/check permissions".format(debugfs_hyperv_path))
+        exit(-1)
+
+class dev_state(Enum):
+        off = 0
+        on = 1
+
+# File names, that correspond to the files created in
+# /drivers/hv/debugfs.c
+class f_names(Enum):
+        state_f = "fuzz_test_state"
+        buff_f =  "fuzz_test_buffer_interrupt_delay"
+        mess_f =  "fuzz_test_message_delay"
+
+# Both single_actions and all_actions are used
+# for error checking and to allow for some subparser
+# names to be abbreviated. Do not abbreviate the
+# test method names, as it will become less intuitive
+# as to what the user can do. If you do decide to
+# abbreviate the test method name, make sure the main
+# function reflects this change.
+
+all_actions = [
+        "disable_all",
+        "D",
+        "enable_all",
+        "view_all",
+        "V"
+]
+
+single_actions = [
+        "disable_single",
+        "d",
+        "enable_single",
+        "view_single",
+        "v"
+]
+
+def main():
+
+        file_map = recursive_file_lookup(debugfs_hyperv_path, dict())
+        args = parse_args()
+        if (not args.action):
+                print ("Error, no options selected...exiting")
+                exit(-1)
+        arg_set = { k for (k,v) in vars(args).items() if v and k != "action" }
+        arg_set.add(args.action)
+        path = args.path if "path" in arg_set else None
+        if (path and path[-1] == "/"):
+                path = path[:-1]
+        validate_args_path(path, arg_set, file_map)
+        if (path and "enable_single" in arg_set):
+            state_path = locate_state(path, file_map)
+            set_test_state(state_path, dev_state.on.value, args.quiet)
+
+        # Use subparsers as the key for different actions
+        if ("delay" in arg_set):
+                validate_delay_values(args.delay_time)
+                if (args.enable_all):
+                        set_delay_all_devices(file_map, args.delay_time,
+                                              args.quiet)
+                else:
+                        set_delay_values(path, file_map, args.delay_time,
+                                         args.quiet)
+        elif ("disable_all" in arg_set or "D" in arg_set):
+                disable_all_testing(file_map)
+        elif ("disable_single" in arg_set or "d" in arg_set):
+                disable_testing_single_device(path, file_map)
+        elif ("view_all" in arg_set or "V" in arg_set):
+                get_all_devices_test_status(file_map)
+        elif ("view_single" in arg_set or  "v" in arg_set):
+                get_device_test_values(path, file_map)
+
+# Get the state location
+def locate_state(device, file_map):
+        return file_map[device][f_names.state_f.value]
+
+# Validate delay values to make sure they are acceptable to
+# enable delays on a device
+def validate_delay_values(delay):
+
+        if (delay[0]  == -1 and delay[1] == -1):
+                print("\nError, At least 1 value must be greater than 0")
+                exit(-1)
+        for i in delay:
+                if (i < -1 or i == 0 or i > 1000):
+                        print("\nError, Values must be  equal to -1 "
+                              "or be > 0 and <= 1000")
+                        exit(-1)
+
+# Validate argument path
+def validate_args_path(path, arg_set, file_map):
+
+        if (not path and any(element in arg_set for element in single_actions)):
+                print("Error, path (-p) REQUIRED for the specified option. "
+                      "Use (-h) to check usage.")
+                exit(-1)
+        elif (path and any(item in arg_set for item in all_actions)):
+                print("Error, path (-p) NOT REQUIRED for the specified option. "
+                      "Use (-h) to check usage." )
+                exit(-1)
+        elif (path not in file_map and any(item in arg_set
+                                           for item in single_actions)):
+                print("Error, path '{}' not a valid vmbus device".format(path))
+                exit(-1)
+
+# display Testing status of single device
+def get_device_test_values(path, file_map):
+
+        for name in file_map[path]:
+                file_location = file_map[path][name]
+                print( name + " = " + str(read_test_files(file_location)))
+
+# Create a map of the vmbus devices and their associated files
+# [key=device, value = [key = filename, value = file path]]
+def recursive_file_lookup(path, file_map):
+
+        for f_path in glob.iglob(path + '**/*'):
+                if (os.path.isfile(f_path)):
+                        if (f_path.rsplit("/",2)[0] == debugfs_hyperv_path):
+                                directory = f_path.rsplit("/",1)[0]
+                        else:
+                                directory = f_path.rsplit("/",2)[0]
+                        f_name = f_path.split("/")[-1]
+                        if (file_map.get(directory)):
+                                file_map[directory].update({f_name:f_path})
+                        else:
+                                file_map[directory] = {f_name:f_path}
+                elif (os.path.isdir(f_path)):
+                        recursive_file_lookup(f_path,file_map)
+        return file_map
+
+# display Testing state of devices
+def get_all_devices_test_status(file_map):
+
+        for device in file_map:
+                if (get_test_state(locate_state(device, file_map)) is 1):
+                        print("Testing = ON for: {}"
+                              .format(device.split("/")[5]))
+                else:
+                        print("Testing = OFF for: {}"
+                              .format(device.split("/")[5]))
+
+# read the vmbus device files, path must be absolute path before calling
+def read_test_files(path):
+        try:
+                with open(path,"r") as f:
+                        file_value = f.readline().strip()
+                return int(file_value)
+
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on file {2}"
+                      .format(errno, strerror, path))
+                exit(-1)
+        except ValueError:
+                print ("Element to int conversion error in: \n{}".format(path))
+                exit(-1)
+
+# writing to vmbus device files, path must be absolute path before calling
+def write_test_files(path, value):
+
+        try:
+                with open(path,"w") as f:
+                        f.write("{}".format(value))
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on file {2}"
+                      .format(errno, strerror, path))
+                exit(-1)
+
+# set testing state of device
+def set_test_state(state_path, state_value, quiet):
+
+        write_test_files(state_path, state_value)
+        if (get_test_state(state_path) is 1):
+                if (not quiet):
+                        print("Testing = ON for device: {}"
+                              .format(state_path.split("/")[5]))
+        else:
+                if (not quiet):
+                        print("Testing = OFF for device: {}"
+                              .format(state_path.split("/")[5]))
+
+# get testing state of device
+def get_test_state(state_path):
+        #state == 1 - test = ON
+        #state == 0 - test = OFF
+        return  read_test_files(state_path)
+
+# write 1 - 1000 microseconds, into a single device using the
+# fuzz_test_buffer_interrupt_delay and fuzz_test_message_delay
+# debugfs attributes
+def set_delay_values(device, file_map, delay_length, quiet):
+
+        try:
+                interrupt = file_map[device][f_names.buff_f.value]
+                message = file_map[device][f_names.mess_f.value]
+
+                # delay[0]- buffer interrupt delay, delay[1]- message delay
+                if (delay_length[0] >= 0 and delay_length[0] <= 1000):
+                        write_test_files(interrupt, delay_length[0])
+                if (delay_length[1] >= 0 and delay_length[1] <= 1000):
+                        write_test_files(message, delay_length[1])
+                if (not quiet):
+                        print("Buffer delay testing = {} for: {}"
+                              .format(read_test_files(interrupt),
+                                      interrupt.split("/")[5]))
+                        print("Message delay testing = {} for: {}"
+                              .format(read_test_files(message),
+                                      message.split("/")[5]))
+        except IOError as e:
+                errno, strerror = e.args
+                print("I/O error({0}): {1} on files {2}{3}"
+                      .format(errno, strerror, interrupt, message))
+                exit(-1)
+
+# enabling delay testing on all devices
+def set_delay_all_devices(file_map, delay, quiet):
+
+        for device in (file_map):
+                set_test_state(locate_state(device, file_map),
+                               dev_state.on.value,
+                               quiet)
+                set_delay_values(device, file_map, delay, quiet)
+
+# disable all testing on a SINGLE device.
+def disable_testing_single_device(device, file_map):
+
+        for name in file_map[device]:
+                file_location = file_map[device][name]
+                write_test_files(file_location, dev_state.off.value)
+        print("ALL testing now OFF for {}".format(device.split("/")[-1]))
+
+# disable all testing on ALL devices
+def disable_all_testing(file_map):
+
+        for device in file_map:
+                disable_testing_single_device(device, file_map)
+
+def parse_args():
+        parser = argparse.ArgumentParser(prog = "vmbus_testing",usage ="\n"
+                "%(prog)s [delay]   [-h] [-e|-E] -t [-p]\n"
+                "%(prog)s [view_all       | V]      [-h]\n"
+                "%(prog)s [disable_all    | D]      [-h]\n"
+                "%(prog)s [disable_single | d]      [-h|-p]\n"
+                "%(prog)s [view_single    | v]      [-h|-p]\n"
+                "%(prog)s --version\n",
+                description = "\nUse lsvmbus to get vmbus device type "
+                "information.\n" "\nThe debugfs root path is "
+                "/sys/kernel/debug/hyperv",
+                formatter_class = RawDescriptionHelpFormatter)
+        subparsers = parser.add_subparsers(dest = "action")
+        parser.add_argument("--version", action = "version",
+                version = '%(prog)s 0.1.0')
+        parser.add_argument("-q","--quiet", action = "store_true",
+                help = "silence none important test messages."
+                       " This will only work when enabling testing"
+                       " on a device.")
+        # Use the path parser to hold the --path attribute so it can
+        # be shared between subparsers. Also do the same for the state
+        # parser, as all testing methods will use --enable_all and
+        # enable_single.
+        path_parser = argparse.ArgumentParser(add_help=False)
+        path_parser.add_argument("-p","--path", metavar = "",
+                help = "Debugfs path to a vmbus device. The path "
+                "must be the absolute path to the device.")
+        state_parser = argparse.ArgumentParser(add_help=False)
+        state_group = state_parser.add_mutually_exclusive_group(required = True)
+        state_group.add_argument("-E", "--enable_all", action = "store_const",
+                                 const = "enable_all",
+                                 help = "Enable the specified test type "
+                                 "on ALL vmbus devices.")
+        state_group.add_argument("-e", "--enable_single",
+                                 action = "store_const",
+                                 const = "enable_single",
+                                 help = "Enable the specified test type on a "
+                                 "SINGLE vmbus device.")
+        parser_delay = subparsers.add_parser("delay",
+                        parents = [state_parser, path_parser],
+                        help = "Delay the ring buffer interrupt or the "
+                        "ring buffer message reads in microseconds.",
+                        prog = "vmbus_testing",
+                        usage = "%(prog)s [-h]\n"
+                        "%(prog)s -E -t [value] [value]\n"
+                        "%(prog)s -e -t [value] [value] -p",
+                        description = "Delay the ring buffer interrupt for "
+                        "vmbus devices, or delay the ring buffer message "
+                        "reads for vmbus devices (both in microseconds). This "
+                        "is only on the host to guest channel.")
+        parser_delay.add_argument("-t", "--delay_time", metavar = "", nargs = 2,
+                        type = check_range, default =[0,0], required = (True),
+                        help = "Set [buffer] & [message] delay time. "
+                        "Value constraints: -1 == value "
+                        "or 0 < value <= 1000.\n"
+                        "Use -1 to keep the previous value for that delay "
+                        "type, or a value > 0 <= 1000 to change the delay "
+                        "time.")
+        parser_dis_all = subparsers.add_parser("disable_all",
+                        aliases = ['D'], prog = "vmbus_testing",
+                        usage = "%(prog)s [disable_all | D] -h\n"
+                        "%(prog)s [disable_all | D]\n",
+                        help = "Disable ALL testing on ALL vmbus devices.",
+                        description = "Disable ALL testing on ALL vmbus "
+                        "devices.")
+        parser_dis_single = subparsers.add_parser("disable_single",
+                        aliases = ['d'],
+                        parents = [path_parser], prog = "vmbus_testing",
+                        usage = "%(prog)s [disable_single | d] -h\n"
+                        "%(prog)s [disable_single | d] -p\n",
+                        help = "Disable ALL testing on a SINGLE vmbus device.",
+                        description = "Disable ALL testing on a SINGLE vmbus "
+                        "device.")
+        parser_view_all = subparsers.add_parser("view_all", aliases = ['V'],
+                        help = "View the test state for ALL vmbus devices.",
+                        prog = "vmbus_testing",
+                        usage = "%(prog)s [view_all | V] -h\n"
+                        "%(prog)s [view_all | V]\n",
+                        description = "This shows the test state for ALL the "
+                        "vmbus devices.")
+        parser_view_single = subparsers.add_parser("view_single",
+                        aliases = ['v'],parents = [path_parser],
+                        help = "View the test values for a SINGLE vmbus "
+                        "device.",
+                        description = "This shows the test values for a SINGLE "
+                        "vmbus device.", prog = "vmbus_testing",
+                        usage = "%(prog)s [view_single | v] -h\n"
+                        "%(prog)s [view_single | v] -p")
+
+        return  parser.parse_args()
+
+# value checking for range checking input in parser
+def check_range(arg1):
+
+        try:
+                val = int(arg1)
+        except ValueError as err:
+                raise argparse.ArgumentTypeError(str(err))
+        if val < -1 or val > 1000:
+                message = ("\n\nvalue must be -1 or  0 < value <= 1000. "
+                           "Value program received: {}\n").format(val)
+                raise argparse.ArgumentTypeError(message)
+        return val
+
+if __name__ == "__main__":
+        main()
-- 
2.17.1


^ permalink raw reply related

* RE: [PATCH v6 1/2] drivers: hv: vmbus: Introduce latency testing
From: Michael Kelley @ 2019-10-03 23:05 UTC (permalink / raw)
  To: brandonbonaby94, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org
  Cc: brandonbonaby94, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <d3e32c4995c8e4992fab91c3e43c2b0d6a3ef0f2.1570130325.git.brandonbonaby94@gmail.com>

From: Branden Bonaby <brandonbonaby94@gmail.com> Sent: Thursday, October 3, 2019 2:02 PM
> 
> Introduce user specified latency in the packet reception path
> By exposing the test parameters as part of the debugfs channel
> attributes. We will control the testing state via these attributes.
> 
> Signed-off-by: Branden Bonaby <brandonbonaby94@gmail.com>
> ---
> changes in v6:
>  - changed kernel version in
>    Documentation/ABI/testing/debugfs-hyperv to 5.5
> 
>  - removed less than 0 if statement when dealing with
>    u64 datatype, as suggested by Michael.
> 
> changes in v5:
>  - As per Stephen's suggestion, Moved CONFIG_HYPERV_TESTING
>    to lib/Kconfig.debug.
> 
>  - Fixed build issue reported by Kbuild, with Michael's
>    suggestion to make hv_debugfs part of the hv_vmbus
>    module.
> 
>  - updated debugfs-hyperv to show kernel version 5.4
> 
> changes in v4:
>  - Combined v3 patch 2 into this patch, and changed the
>    commit description to reflect this.
> 
>  - Moved debugfs code from "vmbus_drv.c" that was in
>    previous v3 patch 2, into a new file "debugfs.c" in
>    drivers/hv.
> 
>  - Updated the Makefile to compile "debugfs.c" if
>    CONFIG_HYPERV_TESTING is enabled
> 
>  - As per Michael's comments, added empty implementations
>    of the new functions, so the compiler will not generate
>    code when CONFIG_HYPERV_TESTING is not enabled.
> 
>  - Added microseconds into description for files in
>    Documentation/ABI/testing/debugfs-hyperv.
> 
> Changes in v2:
>  - Add #ifdef in Kconfig file so test code will not interfere
>    with non-test code.
>  - Move test code functions for delay to hyperv_vmbus header
>    file.
>  - Wrap test code under #ifdef statement.
> 
>  Documentation/ABI/testing/debugfs-hyperv |  23 +++
>  MAINTAINERS                              |   1 +
>  drivers/hv/Makefile                      |   1 +
>  drivers/hv/connection.c                  |   1 +
>  drivers/hv/hv_debugfs.c                  | 178 +++++++++++++++++++++++
>  drivers/hv/hyperv_vmbus.h                |  31 ++++
>  drivers/hv/ring_buffer.c                 |   2 +
>  drivers/hv/vmbus_drv.c                   |   6 +
>  include/linux/hyperv.h                   |  19 +++
>  lib/Kconfig.debug                        |   7 +
>  10 files changed, 269 insertions(+)
>  create mode 100644 Documentation/ABI/testing/debugfs-hyperv
>  create mode 100644 drivers/hv/hv_debugfs.c
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Michael Kelley @ 2019-10-03 23:15 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, x86@kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin
In-Reply-To: <20191003155200.22022-1-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Thursday, October 3, 2019 8:52 AM
> 
> If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
> HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
> configuration version.  Bit 15 corresponds to the
> AccessTscInvariantControls privilege.  If this privilege bit is set,
> guests can access the HvSyntheticInvariantTscControl MSR: guests can
> set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
> After setting the synthetic MSR, CPUID will enumerate support for
> InvariantTSC.
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 5 +++++
>  arch/x86/kernel/cpu/mshyperv.c     | 7 ++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 

As noted in a separate email, this patch is standalone, not 1 of 2 as
indicated in the subject line.  Modulo that,

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply

* RE: [RFC PATCH 00/13] vsock: add multi-transports support
From: Dexuan Cui @ 2019-10-04  0:04 UTC (permalink / raw)
  To: Stefano Garzarella, netdev@vger.kernel.org
  Cc: linux-hyperv@vger.kernel.org, KY Srinivasan, Stefan Hajnoczi,
	Sasha Levin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	David S. Miller, virtualization@lists.linux-foundation.org,
	Stephen Hemminger, Jason Wang, Michael S. Tsirkin, Haiyang Zhang,
	Jorgen Hansen
In-Reply-To: <20190927112703.17745-1-sgarzare@redhat.com>

> From: Stefano Garzarella <sgarzare@redhat.com>
> Sent: Friday, September 27, 2019 4:27 AM
>  ...
> Patch 9 changes the hvs_remote_addr_init(). setting the
> VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make
> the choice of transport to be used work properly.
> @Dexuan Could this change break anything?

This patch looks good to me.

> @Dexuan please can you test on HyperV that I didn't break anything
> even without nested VMs?

I did some quick tests with the 13 patches in a Linux VM (this is not
a nested VM) on Hyper-V and it looks nothing is broken. :-)

> I'll try to setup a Windows host where to test the nested VMs

I suppose you're going to run a Linux VM on a Hyper-V host,
and the Linux VM itself runs KVM/VmWare so it can create its own child 
VMs. IMO this is similar to the test "nested KVM ( ..., virtio-transport[L1,L2]"
you have done.
.
Thanks!
Dexuan

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox