Linux-HyperV List
 help / color / mirror / Atom feed
* Re: [GIT PULL] Hyper-V commits for v5.4-rc
From: pr-tracker-bot @ 2019-10-11 17:50 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linus Torvalds, linux-kernel, linux-hyperv, kys, sthemmin,
	linux-kernel
In-Reply-To: <20191011150106.44699206CD@mail.kernel.org>

The pull request you sent on Fri, 11 Oct 2019 11:01:05 -0400:

> git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/56c642e2aa1c3be3e51e136eace6502aca8116ab

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply

* Re: [PATCH v5 0/5] Add a unified parameter "nopvspin"
From: Zhenzhong Duan @ 2019-10-12  8:40 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

The last two patches are reviewed, will any KVM expert be willing to 
review the first three patches?

They are all KVM related changes. Thanks

Zhenzhong

On 2019/10/7 17:04, Zhenzhong Duan wrote:
> There are cases folks want to disable spinlock optimization for
> debug/test purpose. Xen and hyperv already have parameters "xen_nopvspin"
> and "hv_nopvspin" to support that, but kvm doesn't.
>
> The first patch adds that feature to KVM guest with "nopvspin".
>
> For compatibility reason original parameters "xen_nopvspin" and
> "hv_nopvspin" are retained and marked obsolete.
>
> v5:
> PATCH1: new patch to revert a currently unnecessory commit,
>          code is simpler a bit after that change.         [Boris Ostrovsky]
> PATCH3: fold 'if' statement,add comments on virt_spin_lock_key,
>          reorder with PATCH2 to better reflect dependency
> PATCH4: fold 'if' statement, add Reviewed-by             [Boris Ostrovsky]
> PATCH5: add Reviewed-by                                  [Michael Kelley]
>
> v4:
> PATCH1: use variable name nopvspin instead of pvspin and
>          defined it as __initdata, changed print message,
>          updated patch description                     [Sean Christopherson]
> PATCH2: remove Suggested-by, use "kvm-guest:" prefix  [Sean Christopherson]
> PATCH3: make variable nopvsin and xen_pvspin coexist
>          remove Reviewed-by due to code change         [Sean Christopherson]
> PATCH4: make variable nopvsin and hv_pvspin coexist   [Sean Christopherson]
>
> v3:
> PATCH2: Fix indentation
>
> v2:
> PATCH1: pick the print code change into separate PATCH2,
>          updated patch description             [Vitaly Kuznetsov]
> PATCH2: new patch with print code change      [Vitaly Kuznetsov]
> PATCH3: add Reviewed-by                       [Juergen Gross]
>
> Zhenzhong Duan (5):
>    Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key
>      get initialized"
>    x86/kvm: Change print code to use pr_*() format
>    x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
>    xen: Mark "xen_nopvspin" parameter obsolete
>    x86/hyperv: Mark "hv_nopvspin" parameter obsolete
>
>   Documentation/admin-guide/kernel-parameters.txt | 14 +++++-
>   arch/x86/hyperv/hv_spinlock.c                   |  4 ++
>   arch/x86/include/asm/qspinlock.h                |  1 +
>   arch/x86/kernel/kvm.c                           | 63 ++++++++++++++-----------
>   arch/x86/xen/spinlock.c                         |  4 +-
>   kernel/locking/qspinlock.c                      |  7 +++
>   6 files changed, 62 insertions(+), 31 deletions(-)
>

^ permalink raw reply

* RE: [PATCH v2 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Michael Kelley @ 2019-10-12 13:34 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	David S . Miller, vkuznets, Dexuan Cui
In-Reply-To: <20191010154600.23875-3-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Thursday, October 10, 2019 8:46 AM
> 
> Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
> versions.  Allow Linux guests to negotiate these new protocol versions
> on versions of Hyper-V that support them.  While on this, also allow
> guests to negotiate the VMBus protocol version 4.1 (which was missing).
> 
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c          | 15 +++++++++------
>  drivers/net/hyperv/netvsc.c      |  6 +++---
>  include/linux/hyperv.h           |  8 +++++++-
>  net/vmw_vsock/hyperv_transport.c |  4 ++--
>  4 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index c08b62dbd151f..a4f80e30b0207 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -182,15 +182,21 @@ static inline u32 hv_get_avail_to_write_percent(
>   * 2 . 4  (Windows 8)
>   * 3 . 0  (Windows 8 R2)
>   * 4 . 0  (Windows 10)
> + * 4 . 1  (Windows 10 RS3)
>   * 5 . 0  (Newer Windows 10)
> + * 5 . 1  (Windows 10 RS4)
> + * 5 . 2  (Windows Server 2019, RS5)
>   */
> 
>  #define VERSION_WS2008  ((0 << 16) | (13))
>  #define VERSION_WIN7    ((1 << 16) | (1))
>  #define VERSION_WIN8    ((2 << 16) | (4))
>  #define VERSION_WIN8_1    ((3 << 16) | (0))
> -#define VERSION_WIN10	((4 << 16) | (0))
> +#define VERSION_WIN10_V4 ((4 << 16) | (0))

I would recommend not changing the symbol name for version 4.0.
The change makes it more consistent with the later VERSION_WIN10_*
symbols, but it doesn't fundamentally add any clarity and I'm not sure
it's worth the churn in the other files that have to be touched. It's a
judgment call, and that's just my input.

> +#define VERSION_WIN10_V4_1 ((4 << 16) | (1))
>  #define VERSION_WIN10_V5 ((5 << 16) | (0))
> +#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
> +#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
> 

Michael

^ permalink raw reply

* RE: [PATCH v2 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Michael Kelley @ 2019-10-12 13:47 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	David S . Miller, vkuznets, Dexuan Cui
In-Reply-To: <20191010154600.23875-2-parri.andrea@gmail.com>

From: Andrea Parri <parri.andrea@gmail.com> Sent: Thursday, October 10, 2019 8:46 AM
> 
> The technique used to get the next VMBus version seems increasisly
> clumsy as the number of VMBus versions increases.  Performance is
> not a concern since this is only done once during system boot; it's
> just that we'll end up with more lines of code than is really needed.
> 
> As an alternative, introduce a table with the version numbers listed
> in order (from the most recent to the oldest).  vmbus_connect() loops
> through the versions listed in the table until it gets an accepted
> connection or gets to the end of the table (invalid version).
> 
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c | 46 ++++++++++++++---------------------------
>  drivers/hv/vmbus_drv.c  |  3 +--
>  include/linux/hyperv.h  |  4 ----
>  3 files changed, 17 insertions(+), 36 deletions(-)
> 
> @@ -244,20 +232,18 @@ int vmbus_connect(void)
>  	 * version.
>  	 */
> 
> -	version = VERSION_CURRENT;
> +	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
> +		version = vmbus_versions[i];
> 
> -	do {
>  		ret = vmbus_negotiate_version(msginfo, version);
>  		if (ret == -ETIMEDOUT)
>  			goto cleanup;
> 
>  		if (vmbus_connection.conn_state == CONNECTED)
>  			break;
> +	}
> 
> -		version = vmbus_get_next_version(version);
> -	} while (version != VERSION_INVAL);
> -
> -	if (version == VERSION_INVAL)
> +	if (vmbus_connection.conn_state != CONNECTED)
>  		goto cleanup;
> 

This is a nit, but the loop exit path bugs me.  When a connection
is established, the loop is exited by the "break", and then
conn_state has to be tested again to decide whether the loop
exited due to getting a connection vs. hitting the end of the list.
Slightly cleaner in my mind would be:

	for (i=0; ; i++) {
		if (i == ARRAY_SIZE(vmbus_versions))
			goto cleanup;

		version  = vmbus_versions[i];
		ret = vmbus_negotiate_version(msginfo, version);
		if (ret == -ETIMEDOUT)
			goto cleanup;

		if (vmbus_connection.conn_state == CONNECTED)
			break;
	}

Michael

^ permalink raw reply

* [PATCH net-next v2] hv_sock: use HV_HYP_PAGE_SIZE for Hyper-V communication
From: Michael Kelley @ 2019-10-13  0:30 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, davem@davemloft.net,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
  Cc: himadrispandya@gmail.com

From: Himadri Pandya <himadrispandya@gmail.com>

Current code assumes PAGE_SIZE (the guest page size) is equal
to the page size used to communicate with Hyper-V (which is
always 4K). While this assumption is true on x86, it may not
be true for Hyper-V on other architectures. For example,
Linux on ARM64 may have PAGE_SIZE of 16K or 64K. A new symbol,
HV_HYP_PAGE_SIZE, has been previously introduced to use when
the Hyper-V page size is intended instead of the guest page size.

Make this code work on non-x86 architectures by using the new
HV_HYP_PAGE_SIZE symbol instead of PAGE_SIZE, where appropriate.
Also replace the now redundant PAGE_SIZE_4K with HV_HYP_PAGE_SIZE.
The change has no effect on x86, but lays the groundwork to run
on ARM64 and others.

Signed-off-by: Himadri Pandya <himadrispandya@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
---

Changes in v2:
* Revised commit message and subject [Jakub Kicinski]

---
 net/vmw_vsock/hyperv_transport.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 261521d..d2929ea 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -13,15 +13,16 @@
 #include <linux/hyperv.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
+#include <asm/hyperv-tlfs.h>
 
 /* Older (VMBUS version 'VERSION_WIN10' or before) Windows hosts have some
- * stricter requirements on the hv_sock ring buffer size of six 4K pages. Newer
- * hosts don't have this limitation; but, keep the defaults the same for compat.
+ * stricter requirements on the hv_sock ring buffer size of six 4K pages.
+ * hyperv-tlfs defines HV_HYP_PAGE_SIZE as 4K. Newer hosts don't have this
+ * limitation; but, keep the defaults the same for compat.
  */
-#define PAGE_SIZE_4K		4096
-#define RINGBUFFER_HVS_RCV_SIZE (PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_SND_SIZE (PAGE_SIZE_4K * 6)
-#define RINGBUFFER_HVS_MAX_SIZE (PAGE_SIZE_4K * 64)
+#define RINGBUFFER_HVS_RCV_SIZE (HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_SND_SIZE (HV_HYP_PAGE_SIZE * 6)
+#define RINGBUFFER_HVS_MAX_SIZE (HV_HYP_PAGE_SIZE * 64)
 
 /* The MTU is 16KB per the host side's design */
 #define HVS_MTU_SIZE		(1024 * 16)
@@ -54,7 +55,8 @@ struct hvs_recv_buf {
  * ringbuffer APIs that allow us to directly copy data from userspace buffer
  * to VMBus ringbuffer.
  */
-#define HVS_SEND_BUF_SIZE (PAGE_SIZE_4K - sizeof(struct vmpipe_proto_header))
+#define HVS_SEND_BUF_SIZE \
+		(HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
 
 struct hvs_send_buf {
 	/* The header before the payload data */
@@ -393,10 +395,10 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	} else {
 		sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
 		sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
-		sndbuf = ALIGN(sndbuf, PAGE_SIZE);
+		sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
 		rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
 		rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
-		rcvbuf = ALIGN(rcvbuf, PAGE_SIZE);
+		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
 	}
 
 	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
@@ -670,7 +672,7 @@ static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
 	ssize_t ret = 0;
 	ssize_t bytes_written = 0;
 
-	BUILD_BUG_ON(sizeof(*send_buf) != PAGE_SIZE_4K);
+	BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
 
 	send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
 	if (!send_buf)
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v5 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Vitaly Kuznetsov @ 2019-10-13  9:02 UTC (permalink / raw)
  To: Zhenzhong Duan, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <1570439071-9814-4-git-send-email-zhenzhong.duan@oracle.com>

Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:

> There are cases where a guest tries to switch spinlocks to bare metal
> behavior (e.g. by setting "xen_nopvspin" on XEN platform and
> "hv_nopvspin" on HYPER_V).
>
> That feature is missed on KVM, add a new parameter "nopvspin" to disable
> PV spinlocks for KVM guest.
>
> The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
> parameters in future patches.
>
> Define variable nopvsin as global because it will be used in future
> patches as above.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krcmar <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  5 +++++
>  arch/x86/include/asm/qspinlock.h                |  1 +
>  arch/x86/kernel/kvm.c                           | 21 +++++++++++++++++----
>  kernel/locking/qspinlock.c                      |  7 +++++++
>  4 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c7ac2f3..89d77ea 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -5330,6 +5330,11 @@
>  			as generic guest with no PV drivers. Currently support
>  			XEN HVM, KVM, HYPER_V and VMWARE guest.
>  
> +	nopvspin	[X86,KVM]
> +			Disables the qspinlock slow path using PV optimizations
> +			which allow the hypervisor to 'idle' the guest on lock
> +			contention.
> +
>  	xirc2ps_cs=	[NET,PCMCIA]
>  			Format:
>  			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
> diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
> index 444d6fd..d86ab94 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
>  extern void __pv_init_lock_hash(void);
>  extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
>  extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
> +extern bool nopvspin;
>  
>  #define	queued_spin_unlock queued_spin_unlock
>  /**
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index ef836d6..6e14bd4 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -825,18 +825,31 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
>   */
>  void __init kvm_spinlock_init(void)
>  {
> -	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> -	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> +	/*
> +	 * Disable PV qspinlocks if host kernel doesn't support
> +	 * KVM_FEATURE_PV_UNHALT feature or there is only 1 vCPU.
> +	 * virt_spin_lock_key is enabled to avoid lock holder
> +	 * preemption issue.
> +	 */
> +	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
> +	    num_possible_cpus() == 1) {
> +		pr_info("PV spinlocks disabled\n");

Why don't we need static_branch_disable(&virt_spin_lock_key) here?

Also, as you're printing the exact reason for PV spinlocks disablement
in other cases, I'd suggest separating "no host support" and "single
CPU" cases.

>  		return;
> +	}
>  
>  	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
> +		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
>  		static_branch_disable(&virt_spin_lock_key);
>  		return;
>  	}
>  
> -	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
> -	if (num_possible_cpus() == 1)
> +	if (nopvspin) {
> +		pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");

Nit: to make it sound better a comma is missing between 'disabled' and
'forced', or

"PV spinlocks forcefully disabled by ..." if you prefer.

> +		static_branch_disable(&virt_spin_lock_key);
>  		return;
> +	}
> +
> +	pr_info("PV spinlocks enabled\n");
>  
>  	__pv_init_lock_hash();
>  	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
> diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
> index 2473f10..75193d6 100644
> --- a/kernel/locking/qspinlock.c
> +++ b/kernel/locking/qspinlock.c
> @@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
>  #include "qspinlock_paravirt.h"
>  #include "qspinlock.c"
>  
> +bool nopvspin __initdata;
> +static __init int parse_nopvspin(char *arg)
> +{
> +	nopvspin = true;
> +	return 0;
> +}
> +early_param("nopvspin", parse_nopvspin);
>  #endif

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v5 2/5] x86/kvm: Change print code to use pr_*() format
From: Vitaly Kuznetsov @ 2019-10-13  9:06 UTC (permalink / raw)
  To: Zhenzhong Duan, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1570439071-9814-3-git-send-email-zhenzhong.duan@oracle.com>

Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:

> pr_*() is preferred than printk(KERN_* ...), after change all the print
> in arch/x86/kernel/kvm.c will have "kvm_guest: xxx" style.
>
> No functional change.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krcmar <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/kernel/kvm.c | 30 ++++++++++++++++--------------
>  1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 3bc6a266..ef836d6 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -7,6 +7,8 @@
>   *   Authors: Anthony Liguori <aliguori@us.ibm.com>
>   */
>  
> +#define pr_fmt(fmt) "kvm_guest: " fmt
> +
>  #include <linux/context_tracking.h>
>  #include <linux/init.h>
>  #include <linux/kernel.h>
> @@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
>  		return;
>  
>  	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
> -	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
> -		cpu, (unsigned long long) slow_virt_to_phys(st));
> +	pr_info("stealtime: cpu %d, msr %llx\n", cpu,
> +		(unsigned long long) slow_virt_to_phys(st));
>  }
>  
>  static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
> @@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
>  
>  		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
>  		__this_cpu_write(apf_reason.enabled, 1);
> -		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
> -		       smp_processor_id());
> +		pr_info("setup async PF for cpu %d\n", smp_processor_id());
>  	}
>  
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
> @@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
>  	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
>  	__this_cpu_write(apf_reason.enabled, 0);
>  
> -	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
> -	       smp_processor_id());
> +	pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
>  }
>  
>  static void kvm_pv_guest_cpu_reboot(void *unused)
> @@ -469,7 +469,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>  		} else {
>  			ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
>  				(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
> -			WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
> +			WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
> +				  ret);
>  			min = max = apic_id;
>  			ipi_bitmap = 0;
>  		}
> @@ -479,7 +480,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>  	if (ipi_bitmap) {
>  		ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
>  			(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
> -		WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
> +		WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
> +			  ret);
>  	}
>  
>  	local_irq_restore(flags);
> @@ -509,7 +511,7 @@ static void kvm_setup_pv_ipi(void)
>  {
>  	apic->send_IPI_mask = kvm_send_ipi_mask;
>  	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
> -	pr_info("KVM setup pv IPIs\n");
> +	pr_info("setup pv IPIs\n");

Not your fault but in WARN_ONCE() above we use 'PV' capitalized so I'd
suggest we converge on something: either capitalize them all or make
them all lowercase.

>  }
>  
>  static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
> @@ -631,11 +633,11 @@ static void __init kvm_guest_init(void)
>  	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
>  	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
>  		smp_ops.send_call_func_ipi = kvm_smp_send_call_func_ipi;
> -		pr_info("KVM setup pv sched yield\n");
> +		pr_info("setup pv sched yield\n");

here

>  	}
>  	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
>  				      kvm_cpu_online, kvm_cpu_down_prepare) < 0)
> -		pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
> +		pr_err("failed to install cpu hotplug callbacks\n");
>  #else
>  	sev_map_percpu_data();
>  	kvm_guest_cpu_init();
> @@ -738,7 +740,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
>  			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
>  				GFP_KERNEL, cpu_to_node(cpu));
>  		}
> -		pr_info("KVM setup pv remote TLB flush\n");
> +		pr_info("setup pv remote TLB flush\n");

and here too.

>  	}
>  
>  	return 0;
> @@ -866,8 +868,8 @@ static void kvm_enable_host_haltpoll(void *i)
>  void arch_haltpoll_enable(unsigned int cpu)
>  {
>  	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
> -		pr_err_once("kvm: host does not support poll control\n");
> -		pr_err_once("kvm: host upgrade recommended\n");
> +		pr_err_once("host does not support poll control\n");
> +		pr_err_once("host upgrade recommended\n");
>  		return;
>  	}

Other than the above,

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v5 1/5] Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized"
From: Vitaly Kuznetsov @ 2019-10-13  9:08 UTC (permalink / raw)
  To: Zhenzhong Duan, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1570439071-9814-2-git-send-email-zhenzhong.duan@oracle.com>

Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:

> This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff.
>
> Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching
> early") adds jump_label_init() call in setup_arch() to make static
> keys initialized early, so we could use the original simpler code
> again.
>
> The similar change for XEN is in commit 090d54bcbc54 ("Revert
> "x86/paravirt: Set up the virt_spin_lock_key after static keys get
> initialized"")
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krcmar <rkrcmar@redhat.com>
> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> Cc: Wanpeng Li <wanpengli@tencent.com>
> Cc: Jim Mattson <jmattson@google.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> ---
>  arch/x86/kernel/kvm.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index e820568..3bc6a266 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -527,13 +527,6 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
>  	}
>  }
>  
> -static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
> -{
> -	native_smp_prepare_cpus(max_cpus);
> -	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
> -		static_branch_disable(&virt_spin_lock_key);
> -}
> -
>  static void __init kvm_smp_prepare_boot_cpu(void)
>  {
>  	/*
> @@ -633,7 +626,6 @@ static void __init kvm_guest_init(void)
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>  
>  #ifdef CONFIG_SMP
> -	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
>  	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
>  	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
> @@ -835,8 +827,10 @@ void __init kvm_spinlock_init(void)
>  	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>  		return;
>  
> -	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
> +	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
> +		static_branch_disable(&virt_spin_lock_key);
>  		return;
> +	}
>  
>  	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
>  	if (num_possible_cpus() == 1)

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v5 2/5] x86/kvm: Change print code to use pr_*() format
From: Zhenzhong Duan @ 2019-10-14  1:38 UTC (permalink / raw)
  To: Vitaly Kuznetsov, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	H. Peter Anvin
In-Reply-To: <87lftp5819.fsf@vitty.brq.redhat.com>


On 2019/10/13 17:06, Vitaly Kuznetsov wrote:
> Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:
>
>> pr_*() is preferred than printk(KERN_* ...), after change all the print
>> in arch/x86/kernel/kvm.c will have "kvm_guest: xxx" style.
>>
>> No functional change.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krcmar <rkrcmar@redhat.com>
>> Cc: Sean Christopherson <sean.j.christopherson@intel.com>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> Cc: Wanpeng Li <wanpengli@tencent.com>
>> Cc: Jim Mattson <jmattson@google.com>
>> Cc: Joerg Roedel <joro@8bytes.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> ---
>>   arch/x86/kernel/kvm.c | 30 ++++++++++++++++--------------
>>   1 file changed, 16 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 3bc6a266..ef836d6 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -7,6 +7,8 @@
>>    *   Authors: Anthony Liguori <aliguori@us.ibm.com>
>>    */
>>   
>> +#define pr_fmt(fmt) "kvm_guest: " fmt
>> +
>>   #include <linux/context_tracking.h>
>>   #include <linux/init.h>
>>   #include <linux/kernel.h>
>> @@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
>>   		return;
>>   
>>   	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
>> -	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
>> -		cpu, (unsigned long long) slow_virt_to_phys(st));
>> +	pr_info("stealtime: cpu %d, msr %llx\n", cpu,
>> +		(unsigned long long) slow_virt_to_phys(st));
>>   }
>>   
>>   static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
>> @@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
>>   
>>   		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
>>   		__this_cpu_write(apf_reason.enabled, 1);
>> -		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
>> -		       smp_processor_id());
>> +		pr_info("setup async PF for cpu %d\n", smp_processor_id());
>>   	}
>>   
>>   	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
>> @@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
>>   	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
>>   	__this_cpu_write(apf_reason.enabled, 0);
>>   
>> -	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
>> -	       smp_processor_id());
>> +	pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
>>   }
>>   
>>   static void kvm_pv_guest_cpu_reboot(void *unused)
>> @@ -469,7 +469,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>>   		} else {
>>   			ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
>>   				(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
>> -			WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
>> +			WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
>> +				  ret);
>>   			min = max = apic_id;
>>   			ipi_bitmap = 0;
>>   		}
>> @@ -479,7 +480,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
>>   	if (ipi_bitmap) {
>>   		ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
>>   			(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
>> -		WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
>> +		WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
>> +			  ret);
>>   	}
>>   
>>   	local_irq_restore(flags);
>> @@ -509,7 +511,7 @@ static void kvm_setup_pv_ipi(void)
>>   {
>>   	apic->send_IPI_mask = kvm_send_ipi_mask;
>>   	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
>> -	pr_info("KVM setup pv IPIs\n");
>> +	pr_info("setup pv IPIs\n");
> Not your fault but in WARN_ONCE() above we use 'PV' capitalized so I'd
> suggest we converge on something: either capitalize them all or make
> them all lowercase.

Thanks for catching, will do with 'PV' for all print.

Zhenzhong


^ permalink raw reply

* Re: [PATCH v5 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-14  1:52 UTC (permalink / raw)
  To: Vitaly Kuznetsov, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <87o8yl587f.fsf@vitty.brq.redhat.com>


On 2019/10/13 17:02, Vitaly Kuznetsov wrote:
> Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:
...snip
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index ef836d6..6e14bd4 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -825,18 +825,31 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
>    */
>   void __init kvm_spinlock_init(void)
>   {
> -	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
> -	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
> +	/*
> +	 * Disable PV qspinlocks if host kernel doesn't support
> +	 * KVM_FEATURE_PV_UNHALT feature or there is only 1 vCPU.
> +	 * virt_spin_lock_key is enabled to avoid lock holder
> +	 * preemption issue.
> +	 */
> +	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
> +	    num_possible_cpus() == 1) {
> +		pr_info("PV spinlocks disabled\n");
> Why don't we need static_branch_disable(&virt_spin_lock_key) here?

Thanks for review.

I have a brief explanation in above comment area.

Boris also raised the same question in v4 and see my detailed explanation

in https://lkml.org/lkml/2019/10/6/39

>
> Also, as you're printing the exact reason for PV spinlocks disablement
> in other cases, I'd suggest separating "no host support" and "single
> CPU" cases.

Will do after reaching a consensus on your first question.

>
>>   		return;
>> +	}
>>   
>>   	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
>> +		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
>>   		static_branch_disable(&virt_spin_lock_key);
>>   		return;
>>   	}
>>   
>> -	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
>> -	if (num_possible_cpus() == 1)
>> +	if (nopvspin) {
>> +		pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
> Nit: to make it sound better a comma is missing between 'disabled' and
> 'forced', or
>
> "PV spinlocks forcefully disabled by ..." if you prefer.

Will do.

Zhenzhong



^ permalink raw reply

* Re: [PATCH v5 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Vitaly Kuznetsov @ 2019-10-14  9:18 UTC (permalink / raw)
  To: Zhenzhong Duan, linux-kernel
  Cc: linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal, tglx, mingo,
	bp, pbonzini, rkrcmar, sean.j.christopherson, wanpengli, jmattson,
	joro, boris.ostrovsky, jgross, sstabellini, peterz,
	Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <4e1ef1d3-527b-bb70-5536-d9daeb50b7c7@oracle.com>

Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:

> On 2019/10/13 17:02, Vitaly Kuznetsov wrote:
>> Zhenzhong Duan <zhenzhong.duan@oracle.com> writes:
> ...snip
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index ef836d6..6e14bd4 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -825,18 +825,31 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
>>    */
>>   void __init kvm_spinlock_init(void)
>>   {
>> -	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>> -	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>> +	/*
>> +	 * Disable PV qspinlocks if host kernel doesn't support
>> +	 * KVM_FEATURE_PV_UNHALT feature or there is only 1 vCPU.
>> +	 * virt_spin_lock_key is enabled to avoid lock holder
>> +	 * preemption issue.
>> +	 */
>> +	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
>> +	    num_possible_cpus() == 1) {
>> +		pr_info("PV spinlocks disabled\n");
>> Why don't we need static_branch_disable(&virt_spin_lock_key) here?
>
> Thanks for review.
>
> I have a brief explanation in above comment area.
>
> Boris also raised the same question in v4 and see my detailed explanation
>
> in https://lkml.org/lkml/2019/10/6/39
>
>>
>> Also, as you're printing the exact reason for PV spinlocks disablement
>> in other cases, I'd suggest separating "no host support" and "single
>> CPU" cases.
>
> Will do after reaching a consensus on your first question.

Oh, sorry I missed v4 discussion. As I'm not the first to ask why we
don't do static_branch_disable(&virt_spin_lock_key) here I suggest we do
the followin:

- Split !kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) and
  num_possible_cpus() == 1 cases
- Do static_branch_disable(&virt_spin_lock_key) for UP case (just for
  consistency).
- Add a comment why we don't do that for
  !kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) case (basically, what you
  replied to Boris)

This will also allow us to print the exact reason.

>
>>
>>>   		return;
>>> +	}
>>>   
>>>   	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
>>> +		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
>>>   		static_branch_disable(&virt_spin_lock_key);
>>>   		return;
>>>   	}
>>>   
>>> -	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
>>> -	if (num_possible_cpus() == 1)
>>> +	if (nopvspin) {
>>> +		pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
>> Nit: to make it sound better a comma is missing between 'disabled' and
>> 'forced', or
>>
>> "PV spinlocks forcefully disabled by ..." if you prefer.
>
> Will do.
>
> Zhenzhong
>
>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v2 1/3] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-14  9:52 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, David S . Miller, vkuznets,
	Dexuan Cui
In-Reply-To: <DM5PR21MB013798776480FFA5DCD22442D7960@DM5PR21MB0137.namprd21.prod.outlook.com>

> > @@ -244,20 +232,18 @@ int vmbus_connect(void)
> >  	 * version.
> >  	 */
> > 
> > -	version = VERSION_CURRENT;
> > +	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
> > +		version = vmbus_versions[i];
> > 
> > -	do {
> >  		ret = vmbus_negotiate_version(msginfo, version);
> >  		if (ret == -ETIMEDOUT)
> >  			goto cleanup;
> > 
> >  		if (vmbus_connection.conn_state == CONNECTED)
> >  			break;
> > +	}
> > 
> > -		version = vmbus_get_next_version(version);
> > -	} while (version != VERSION_INVAL);
> > -
> > -	if (version == VERSION_INVAL)
> > +	if (vmbus_connection.conn_state != CONNECTED)
> >  		goto cleanup;
> > 
> 
> This is a nit, but the loop exit path bugs me.  When a connection
> is established, the loop is exited by the "break", and then
> conn_state has to be tested again to decide whether the loop
> exited due to getting a connection vs. hitting the end of the list.
> Slightly cleaner in my mind would be:
> 
> 	for (i=0; ; i++) {
> 		if (i == ARRAY_SIZE(vmbus_versions))
> 			goto cleanup;
> 
> 		version  = vmbus_versions[i];
> 		ret = vmbus_negotiate_version(msginfo, version);
> 		if (ret == -ETIMEDOUT)
> 			goto cleanup;
> 
> 		if (vmbus_connection.conn_state == CONNECTED)
> 			break;
> 	}

Indeed.  I applied this locally, for the next iteration.  Thank you for
the review, Michael.

  Andrea

^ permalink raw reply

* Re: [PATCH v2 2/3] Drivers: hv: vmbus: Enable VMBus protocol versions 4.1, 5.1 and 5.2
From: Andrea Parri @ 2019-10-14 10:09 UTC (permalink / raw)
  To: Michael Kelley
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, David S . Miller, vkuznets,
	Dexuan Cui
In-Reply-To: <DM5PR21MB0137ACE07DCD2A6BBEC83665D7960@DM5PR21MB0137.namprd21.prod.outlook.com>

> > @@ -182,15 +182,21 @@ static inline u32 hv_get_avail_to_write_percent(
> >   * 2 . 4  (Windows 8)
> >   * 3 . 0  (Windows 8 R2)
> >   * 4 . 0  (Windows 10)
> > + * 4 . 1  (Windows 10 RS3)
> >   * 5 . 0  (Newer Windows 10)
> > + * 5 . 1  (Windows 10 RS4)
> > + * 5 . 2  (Windows Server 2019, RS5)
> >   */
> > 
> >  #define VERSION_WS2008  ((0 << 16) | (13))
> >  #define VERSION_WIN7    ((1 << 16) | (1))
> >  #define VERSION_WIN8    ((2 << 16) | (4))
> >  #define VERSION_WIN8_1    ((3 << 16) | (0))
> > -#define VERSION_WIN10	((4 << 16) | (0))
> > +#define VERSION_WIN10_V4 ((4 << 16) | (0))
> 
> I would recommend not changing the symbol name for version 4.0.
> The change makes it more consistent with the later VERSION_WIN10_*
> symbols, but it doesn't fundamentally add any clarity and I'm not sure
> it's worth the churn in the other files that have to be touched. It's a
> judgment call, and that's just my input.

My fingers were itching:  ;-) I've reverted this change, following
your recommendation.

Thanks,
  Andrea

^ permalink raw reply

* [PATCH v3 0/7] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas

From: Bjorn Helgaas <bhelgaas@google.com>

Dexuan, the important thing here is the first patch, which is your [1],
which I modified by doing pci_restore_state() as well as setting to D0:

  pci_set_power_state(pci_dev, PCI_D0);
  pci_restore_state(pci_dev);

I'm proposing some more patches on top.  None are relevant to the problem
you're solving; they're just minor doc and other updates in the same area.

Rafael, if you have a chance to look at these, I'd appreciate it.  I tried
to make the doc match the code, but I'm no PM expert.

[1] https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM


Dexuan Cui (1):
  PCI/PM: Always return devices to D0 when thawing

Bjorn Helgaas (6):
  PCI/PM: Correct pci_pm_thaw_noirq() documentation
  PCI/PM: Clear PCIe PME Status even for legacy power management
  PCI/PM: Run resume fixups before disabling wakeup events
  PCI/PM: Make power management op coding style consistent
  PCI/PM: Wrap long lines in documentation
  PCI/MSI: Move power state check out of pci_msi_supported()

 Documentation/power/pci.rst | 38 +++++++-------
 drivers/pci/msi.c           |  6 +--
 drivers/pci/pci-driver.c    | 99 ++++++++++++++++++-------------------
 3 files changed, 71 insertions(+), 72 deletions(-)

-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply

* [PATCH 1/7] PCI/PM: Always return devices to D0 when thawing
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas,
	stable
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Dexuan Cui <decui@microsoft.com>

pci_pm_thaw_noirq() is supposed to return the device to D0 and restore its
configuration registers, but previously it only did that for devices whose
drivers implemented the new power management ops.

Hibernation, e.g., via "echo disk > /sys/power/state", involves freezing
devices, creating a hibernation image, thawing devices, writing the image,
and powering off.  The fact that thawing did not return devices with legacy
power management to D0 caused errors, e.g., in this path:

  pci_pm_thaw_noirq
    if (pci_has_legacy_pm_support(pci_dev)) # true for Mellanox VF driver
      return pci_legacy_resume_early(dev)   # ... legacy PM skips the rest
    pci_set_power_state(pci_dev, PCI_D0)
    pci_restore_state(pci_dev)
  pci_pm_thaw
    if (pci_has_legacy_pm_support(pci_dev))
      pci_legacy_resume
	drv->resume
	  mlx4_resume
	    ...
	      pci_enable_msix_range
	        ...
		  if (dev->current_state != PCI_D0)  # <---
		    return -EINVAL;

which caused these warnings:

  mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
  PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
  PM: Device a6d1:00:02.0 failed to thaw: error -95

Return devices to D0 and restore config registers for all devices, not just
those whose drivers support new power management.

[bhelgaas: also call pci_restore_state() before pci_legacy_resume_early(),
update comment, add stable tag, commit log]
Link: https://lore.kernel.org/r/KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org	# v4.13+
---
 drivers/pci/pci-driver.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..d4ac8ce8c1f9 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1076,17 +1076,22 @@ static int pci_pm_thaw_noirq(struct device *dev)
 			return error;
 	}
 
-	if (pci_has_legacy_pm_support(pci_dev))
-		return pci_legacy_resume_early(dev);
-
 	/*
-	 * pci_restore_state() requires the device to be in D0 (because of MSI
-	 * restoration among other things), so force it into D0 in case the
-	 * driver's "freeze" callbacks put it into a low-power state directly.
+	 * Both the legacy ->resume_early() and the new pm->thaw_noirq()
+	 * callbacks assume the device has been returned to D0 and its
+	 * config state has been restored.
+	 *
+	 * In addition, pci_restore_state() restores MSI-X state in MMIO
+	 * space, which requires the device to be in D0, so return it to D0
+	 * in case the driver's "freeze" callbacks put it into a low-power
+	 * state.
 	 */
 	pci_set_power_state(pci_dev, PCI_D0);
 	pci_restore_state(pci_dev);
 
+	if (pci_has_legacy_pm_support(pci_dev))
+		return pci_legacy_resume_early(dev);
+
 	if (drv && drv->pm && drv->pm->thaw_noirq)
 		error = drv->pm->thaw_noirq(dev);
 
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 2/7] PCI/PM: Correct pci_pm_thaw_noirq() documentation
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

According to the documentation, pci_pm_thaw_noirq() did not put the device
into the full-power state and restore its standard configuration registers.
This is incorrect, so update the documentation to match the code.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/power/pci.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index 0e2ef7429304..1525c594d631 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -600,17 +600,17 @@ using the following PCI bus type's callbacks::
 
 respectively.
 
-The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(),
-but it doesn't put the device into the full power state and doesn't attempt to
-restore its standard configuration registers.  It also executes the device
-driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq().
+The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq().
+It puts the device into the full power state and restores its standard
+configuration registers.  It also executes the device driver's pm->thaw_noirq()
+callback, if defined, instead of pm->resume_noirq().
 
 The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the device
 driver's pm->thaw() callback instead of pm->resume().  It is executed
 asynchronously for different PCI devices that don't depend on each other in a
 known way.
 
-The complete phase it the same as for system resume.
+The complete phase is the same as for system resume.
 
 After saving the image, devices need to be powered down before the system can
 enter the target sleep state (ACPI S4 for ACPI-based systems).  This is done in
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 3/7] PCI/PM: Clear PCIe PME Status even for legacy power management
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

Previously, pci_pm_resume_noirq() cleared the PME Status bit in the Root
Status register only if the device had no driver or the driver did not
implement legacy power management.  It should clear PME Status regardless
of what sort of power management the driver supports, so do this before
checking for legacy power management.

This affects Root Ports and Root Complex Event Collectors, for which the
usual driver is the PCIe portdrv, which implements new power management, so
this change is just on principle, not to fix any actual defects.

Fixes: a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci-driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d4ac8ce8c1f9..0c3086793e4e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -941,12 +941,11 @@ static int pci_pm_resume_noirq(struct device *dev)
 		pci_pm_default_resume_early(pci_dev);
 
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
+	pcie_pme_root_status_cleanup(pci_dev);
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	pcie_pme_root_status_cleanup(pci_dev);
-
 	if (drv && drv->pm && drv->pm->resume_noirq)
 		error = drv->pm->resume_noirq(dev);
 
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 4/7] PCI/PM: Run resume fixups before disabling wakeup events
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

pci_pm_resume() and pci_pm_restore() call pci_pm_default_resume(), which
runs resume fixups before disabling wakeup events:

  static void pci_pm_default_resume(struct pci_dev *pci_dev)
  {
    pci_fixup_device(pci_fixup_resume, pci_dev);
    pci_enable_wake(pci_dev, PCI_D0, false);
  }

pci_pm_runtime_resume() does both of these, but in the opposite order:

  pci_enable_wake(pci_dev, PCI_D0, false);
  pci_fixup_device(pci_fixup_resume, pci_dev);

We should always use the same ordering unless there's a reason to do
otherwise.  Change pci_pm_runtime_resume() to call pci_pm_default_resume()
instead of open-coding this, so the fixups are always done before disabling
wakeup events.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci-driver.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 0c3086793e4e..55acb658273f 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1345,8 +1345,7 @@ static int pci_pm_runtime_resume(struct device *dev)
 		return 0;
 
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
-	pci_enable_wake(pci_dev, PCI_D0, false);
-	pci_fixup_device(pci_fixup_resume, pci_dev);
+	pci_pm_default_resume(pci_dev);
 
 	if (pm && pm->runtime_resume)
 		rc = pm->runtime_resume(dev);
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 7/7] PCI/MSI: Move power state check out of pci_msi_supported()
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

27e20603c54b ("PCI/MSI: Move D0 check into pci_msi_check_device()")
moved the power state check into pci_msi_check_device(), which was
subsequently renamed to pci_msi_supported().  This didn't change the
behavior, since both callers checked the power state.

However, it doesn't fit the current "pci_msi_supported()" name, which
should return what the device is capable of, independent of the power
state.

Move the power state check back into the callers for readability.  No
functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/msi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 0884bedcfc7a..20e9c729617c 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -861,7 +861,7 @@ static int pci_msi_supported(struct pci_dev *dev, int nvec)
 	if (!pci_msi_enable)
 		return 0;
 
-	if (!dev || dev->no_msi || dev->current_state != PCI_D0)
+	if (!dev || dev->no_msi)
 		return 0;
 
 	/*
@@ -972,7 +972,7 @@ static int __pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
 	int nr_entries;
 	int i, j;
 
-	if (!pci_msi_supported(dev, nvec))
+	if (!pci_msi_supported(dev, nvec) || dev->current_state != PCI_D0)
 		return -EINVAL;
 
 	nr_entries = pci_msix_vec_count(dev);
@@ -1058,7 +1058,7 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	int nvec;
 	int rc;
 
-	if (!pci_msi_supported(dev, minvec))
+	if (!pci_msi_supported(dev, minvec) || dev->current_state != PCI_D0)
 		return -EINVAL;
 
 	/* Check whether driver already requested MSI-X IRQs */
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 6/7] PCI/PM: Wrap long lines in documentation
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

Documentation/power/pci.rst is wrapped to fit in 80 columns, but directory
structure changes made a few lines longer.  Wrap them so they all fit in 80
columns again.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 Documentation/power/pci.rst | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/Documentation/power/pci.rst b/Documentation/power/pci.rst
index 1525c594d631..db41a770a2f5 100644
--- a/Documentation/power/pci.rst
+++ b/Documentation/power/pci.rst
@@ -426,12 +426,12 @@ pm->runtime_idle() callback.
 2.4. System-Wide Power Transitions
 ----------------------------------
 There are a few different types of system-wide power transitions, described in
-Documentation/driver-api/pm/devices.rst.  Each of them requires devices to be handled
-in a specific way and the PM core executes subsystem-level power management
-callbacks for this purpose.  They are executed in phases such that each phase
-involves executing the same subsystem-level callback for every device belonging
-to the given subsystem before the next phase begins.  These phases always run
-after tasks have been frozen.
+Documentation/driver-api/pm/devices.rst.  Each of them requires devices to be
+handled in a specific way and the PM core executes subsystem-level power
+management callbacks for this purpose.  They are executed in phases such that
+each phase involves executing the same subsystem-level callback for every device
+belonging to the given subsystem before the next phase begins.  These phases
+always run after tasks have been frozen.
 
 2.4.1. System Suspend
 ^^^^^^^^^^^^^^^^^^^^^
@@ -636,12 +636,12 @@ System restore requires a hibernation image to be loaded into memory and the
 pre-hibernation memory contents to be restored before the pre-hibernation system
 activity can be resumed.
 
-As described in Documentation/driver-api/pm/devices.rst, the hibernation image is loaded
-into memory by a fresh instance of the kernel, called the boot kernel, which in
-turn is loaded and run by a boot loader in the usual way.  After the boot kernel
-has loaded the image, it needs to replace its own code and data with the code
-and data of the "hibernated" kernel stored within the image, called the image
-kernel.  For this purpose all devices are frozen just like before creating
+As described in Documentation/driver-api/pm/devices.rst, the hibernation image
+is loaded into memory by a fresh instance of the kernel, called the boot kernel,
+which in turn is loaded and run by a boot loader in the usual way.  After the
+boot kernel has loaded the image, it needs to replace its own code and data with
+the code and data of the "hibernated" kernel stored within the image, called the
+image kernel.  For this purpose all devices are frozen just like before creating
 the image during hibernation, in the
 
 	prepare, freeze, freeze_noirq
@@ -691,8 +691,8 @@ controlling the runtime power management of their devices.
 
 At the time of this writing there are two ways to define power management
 callbacks for a PCI device driver, the recommended one, based on using a
-dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and the
-"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
+dev_pm_ops structure described in Documentation/driver-api/pm/devices.rst, and
+the "legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
 .resume() callbacks from struct pci_driver are used.  The legacy approach,
 however, doesn't allow one to define runtime power management callbacks and is
 not really suitable for any new drivers.  Therefore it is not covered by this
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH 5/7] PCI/PM: Make power management op coding style consistent
From: Bjorn Helgaas @ 2019-10-14 23:00 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: Rafael J . Wysocki, Lorenzo Pieralisi, Michael Kelley,
	Sasha Levin, Haiyang Zhang, KY Srinivasan, Stephen Hemminger,
	olaf, apw, jasowang, vkuznets, marcelo.cerri, jackm, linux-pci,
	linux-hyperv, linux-kernel, driverdev-devel, Bjorn Helgaas
In-Reply-To: <20191014230016.240912-1-helgaas@kernel.org>

From: Bjorn Helgaas <bhelgaas@google.com>

Some of the power management ops use this style:

  struct device_driver *drv = dev->driver;
  if (drv && drv->pm && drv->pm->prepare(dev))
    drv->pm->prepare(dev);

while others use this:

  const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
  if (pm && pm->runtime_resume)
    pm->runtime_resume(dev);

Convert the first style to the second so they're all consistent.  Remove
local "error" variables when unnecessary.  No functional change intended.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pci-driver.c | 76 +++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 40 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 55acb658273f..abbf5c39cb9c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -679,11 +679,11 @@ static bool pci_has_legacy_pm_support(struct pci_dev *pci_dev)
 
 static int pci_pm_prepare(struct device *dev)
 {
-	struct device_driver *drv = dev->driver;
 	struct pci_dev *pci_dev = to_pci_dev(dev);
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
-	if (drv && drv->pm && drv->pm->prepare) {
-		int error = drv->pm->prepare(dev);
+	if (pm && pm->prepare) {
+		int error = pm->prepare(dev);
 		if (error < 0)
 			return error;
 
@@ -917,8 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
 static int pci_pm_resume_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (dev_pm_may_skip_resume(dev))
 		return 0;
@@ -946,17 +945,16 @@ static int pci_pm_resume_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	if (drv && drv->pm && drv->pm->resume_noirq)
-		error = drv->pm->resume_noirq(dev);
+	if (pm && pm->resume_noirq)
+		return pm->resume_noirq(dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_resume(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	/*
 	 * This is necessary for the suspend error path in which resume is
@@ -972,12 +970,12 @@ static int pci_pm_resume(struct device *dev)
 
 	if (pm) {
 		if (pm->resume)
-			error = pm->resume(dev);
+			return pm->resume(dev);
 	} else {
 		pci_pm_reenable_device(pci_dev);
 	}
 
-	return error;
+	return 0;
 }
 
 #else /* !CONFIG_SUSPEND */
@@ -1038,16 +1036,16 @@ static int pci_pm_freeze(struct device *dev)
 static int pci_pm_freeze_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_FREEZE);
 
-	if (drv && drv->pm && drv->pm->freeze_noirq) {
+	if (pm && pm->freeze_noirq) {
 		int error;
 
-		error = drv->pm->freeze_noirq(dev);
-		suspend_report_result(drv->pm->freeze_noirq, error);
+		error = pm->freeze_noirq(dev);
+		suspend_report_result(pm->freeze_noirq, error);
 		if (error)
 			return error;
 	}
@@ -1066,8 +1064,8 @@ static int pci_pm_freeze_noirq(struct device *dev)
 static int pci_pm_thaw_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error;
 
 	if (pcibios_pm_ops.thaw_noirq) {
 		error = pcibios_pm_ops.thaw_noirq(dev);
@@ -1091,10 +1089,10 @@ static int pci_pm_thaw_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	if (drv && drv->pm && drv->pm->thaw_noirq)
-		error = drv->pm->thaw_noirq(dev);
+	if (pm && pm->thaw_noirq)
+		return pm->thaw_noirq(dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_thaw(struct device *dev)
@@ -1165,24 +1163,24 @@ static int pci_pm_poweroff_late(struct device *dev)
 static int pci_pm_poweroff_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
 
 	if (dev_pm_smart_suspend_and_suspended(dev))
 		return 0;
 
-	if (pci_has_legacy_pm_support(to_pci_dev(dev)))
+	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_suspend_late(dev, PMSG_HIBERNATE);
 
-	if (!drv || !drv->pm) {
+	if (!pm) {
 		pci_fixup_device(pci_fixup_suspend_late, pci_dev);
 		return 0;
 	}
 
-	if (drv->pm->poweroff_noirq) {
+	if (pm->poweroff_noirq) {
 		int error;
 
-		error = drv->pm->poweroff_noirq(dev);
-		suspend_report_result(drv->pm->poweroff_noirq, error);
+		error = pm->poweroff_noirq(dev);
+		suspend_report_result(pm->poweroff_noirq, error);
 		if (error)
 			return error;
 	}
@@ -1208,8 +1206,8 @@ static int pci_pm_poweroff_noirq(struct device *dev)
 static int pci_pm_restore_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
-	struct device_driver *drv = dev->driver;
-	int error = 0;
+	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error;
 
 	if (pcibios_pm_ops.restore_noirq) {
 		error = pcibios_pm_ops.restore_noirq(dev);
@@ -1223,17 +1221,16 @@ static int pci_pm_restore_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
-	if (drv && drv->pm && drv->pm->restore_noirq)
-		error = drv->pm->restore_noirq(dev);
+	if (pm && pm->restore_noirq)
+		return pm->restore_noirq(dev);
 
-	return error;
+	return 0;
 }
 
 static int pci_pm_restore(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int error = 0;
 
 	/*
 	 * This is necessary for the hibernation error path in which restore is
@@ -1249,12 +1246,12 @@ static int pci_pm_restore(struct device *dev)
 
 	if (pm) {
 		if (pm->restore)
-			error = pm->restore(dev);
+			return pm->restore(dev);
 	} else {
 		pci_pm_reenable_device(pci_dev);
 	}
 
-	return error;
+	return 0;
 }
 
 #else /* !CONFIG_HIBERNATE_CALLBACKS */
@@ -1330,9 +1327,9 @@ static int pci_pm_runtime_suspend(struct device *dev)
 
 static int pci_pm_runtime_resume(struct device *dev)
 {
-	int rc = 0;
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	int error = 0;
 
 	/*
 	 * Restoring config space is necessary even if the device is not bound
@@ -1348,18 +1345,17 @@ static int pci_pm_runtime_resume(struct device *dev)
 	pci_pm_default_resume(pci_dev);
 
 	if (pm && pm->runtime_resume)
-		rc = pm->runtime_resume(dev);
+		error = pm->runtime_resume(dev);
 
 	pci_dev->runtime_d3cold = false;
 
-	return rc;
+	return error;
 }
 
 static int pci_pm_runtime_idle(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
-	int ret = 0;
 
 	/*
 	 * If pci_dev->driver is not set (unbound), the device should
@@ -1372,9 +1368,9 @@ static int pci_pm_runtime_idle(struct device *dev)
 		return -ENOSYS;
 
 	if (pm->runtime_idle)
-		ret = pm->runtime_idle(dev);
+		return pm->runtime_idle(dev);
 
-	return ret;
+	return 0;
 }
 
 static const struct dev_pm_ops pci_dev_pm_ops = {
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related

* [PATCH v6 0/5] Add a unified parameter "nopvspin"
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan

There are cases folks want to disable spinlock optimization for
debug/test purpose. Xen and hyperv already have parameters "xen_nopvspin"
and "hv_nopvspin" to support that, but kvm doesn't.

The first patch adds that feature to KVM guest with "nopvspin".

For compatibility reason original parameters "xen_nopvspin" and
"hv_nopvspin" are retained and marked obsolete.

v6:
PATCH1: add Reviewed-by                                  [Vitaly Kuznetsov]
PATCH2: change 'pv' to 'PV', add Reviewed-by             [Vitaly Kuznetsov]
PATCH3: refactor 'if' branch in kvm_spinlock_init()      [Vitaly Kuznetsov]

v5:
PATCH1: new patch to revert a currently unnecessory commit,
        code is simpler a bit after that change.         [Boris Ostrovsky]
PATCH3: fold 'if' statement,add comments on virt_spin_lock_key,
        reorder with PATCH2 to better reflect dependency                               
PATCH4: fold 'if' statement, add Reviewed-by             [Boris Ostrovsky]
PATCH5: add Reviewed-by                                  [Michael Kelley]

v4:
PATCH1: use variable name nopvspin instead of pvspin and
        defined it as __initdata, changed print message,
        updated patch description                     [Sean Christopherson]
PATCH2: remove Suggested-by, use "kvm-guest:" prefix  [Sean Christopherson]
PATCH3: make variable nopvsin and xen_pvspin coexist
        remove Reviewed-by due to code change         [Sean Christopherson]
PATCH4: make variable nopvsin and hv_pvspin coexist   [Sean Christopherson]

v3:
PATCH2: Fix indentation

v2:
PATCH1: pick the print code change into separate PATCH2,
        updated patch description             [Vitaly Kuznetsov]
PATCH2: new patch with print code change      [Vitaly Kuznetsov]
PATCH3: add Reviewed-by                       [Juergen Gross]

Zhenzhong Duan (5):
  Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key
    get initialized"
  x86/kvm: Change print code to use pr_*() format
  x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
  xen: Mark "xen_nopvspin" parameter obsolete
  x86/hyperv: Mark "hv_nopvspin" parameter obsolete

 Documentation/admin-guide/kernel-parameters.txt | 14 ++++-
 arch/x86/hyperv/hv_spinlock.c                   |  4 ++
 arch/x86/include/asm/qspinlock.h                |  1 +
 arch/x86/kernel/kvm.c                           | 76 ++++++++++++++++---------
 arch/x86/xen/spinlock.c                         |  4 +-
 kernel/locking/qspinlock.c                      |  7 +++
 6 files changed, 75 insertions(+), 31 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* [PATCH v6 2/5] x86/kvm: Change print code to use pr_*() format
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

pr_*() is preferred than printk(KERN_* ...), after change all the print
in arch/x86/kernel/kvm.c will have "kvm_guest: xxx" style.

No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/kvm.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3bc6a266..249f14a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -7,6 +7,8 @@
  *   Authors: Anthony Liguori <aliguori@us.ibm.com>
  */
 
+#define pr_fmt(fmt) "kvm_guest: " fmt
+
 #include <linux/context_tracking.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
 		return;
 
 	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
-	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
-		cpu, (unsigned long long) slow_virt_to_phys(st));
+	pr_info("stealtime: cpu %d, msr %llx\n", cpu,
+		(unsigned long long) slow_virt_to_phys(st));
 }
 
 static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
@@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
 
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
-		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
-		       smp_processor_id());
+		pr_info("setup async PF for cpu %d\n", smp_processor_id());
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
@@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
 	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
 	__this_cpu_write(apf_reason.enabled, 0);
 
-	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
-	       smp_processor_id());
+	pr_info("Unregister PV shared memory for cpu %d\n", smp_processor_id());
 }
 
 static void kvm_pv_guest_cpu_reboot(void *unused)
@@ -469,7 +469,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 		} else {
 			ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
 				(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
-			WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+			WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+				  ret);
 			min = max = apic_id;
 			ipi_bitmap = 0;
 		}
@@ -479,7 +480,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 	if (ipi_bitmap) {
 		ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
 			(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
-		WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+		WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+			  ret);
 	}
 
 	local_irq_restore(flags);
@@ -509,7 +511,7 @@ static void kvm_setup_pv_ipi(void)
 {
 	apic->send_IPI_mask = kvm_send_ipi_mask;
 	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
-	pr_info("KVM setup pv IPIs\n");
+	pr_info("setup PV IPIs\n");
 }
 
 static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
@@ -631,11 +633,11 @@ static void __init kvm_guest_init(void)
 	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
 	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		smp_ops.send_call_func_ipi = kvm_smp_send_call_func_ipi;
-		pr_info("KVM setup pv sched yield\n");
+		pr_info("setup PV sched yield\n");
 	}
 	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
 				      kvm_cpu_online, kvm_cpu_down_prepare) < 0)
-		pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
+		pr_err("failed to install cpu hotplug callbacks\n");
 #else
 	sev_map_percpu_data();
 	kvm_guest_cpu_init();
@@ -738,7 +740,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
 			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
 				GFP_KERNEL, cpu_to_node(cpu));
 		}
-		pr_info("KVM setup pv remote TLB flush\n");
+		pr_info("setup PV remote TLB flush\n");
 	}
 
 	return 0;
@@ -866,8 +868,8 @@ static void kvm_enable_host_haltpoll(void *i)
 void arch_haltpoll_enable(unsigned int cpu)
 {
 	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
-		pr_err_once("kvm: host does not support poll control\n");
-		pr_err_once("kvm: host upgrade recommended\n");
+		pr_err_once("host does not support poll control\n");
+		pr_err_once("host upgrade recommended\n");
 		return;
 	}
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v6 4/5] xen: Mark "xen_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
as we use qspinlock now.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 ++++---
 arch/x86/xen/spinlock.c                         | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bd49ed2..85059dd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5307,8 +5307,9 @@
 			panic() code such as dumping handler.
 
 	xen_nopvspin	[X86,XEN]
-			Disables the ticketlock slowpath using Xen PV
-			optimizations.
+			Disables the qspinlock slowpath using Xen PV optimizations.
+			This parameter is obsoleted by "nopvspin" parameter, which
+			has equivalent effect for XEN platform.
 
 	xen_nopv	[X86]
 			Disables the PV optimizations forcing the HVM guest to
@@ -5334,7 +5335,7 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
-	nopvspin	[X86,KVM]
+	nopvspin	[X86,XEN,KVM]
 			Disables the qspinlock slow path using PV optimizations
 			which allow the hypervisor to 'idle' the guest on lock
 			contention.
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 6deb490..799f4eb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,9 +114,8 @@ void xen_uninit_lock_cpu(int cpu)
  */
 void __init xen_init_spinlocks(void)
 {
-
 	/*  Don't need to use pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
+	if (num_possible_cpus() == 1 || nopvspin)
 		xen_pvspin = false;
 
 	if (!xen_pvspin) {
@@ -137,6 +136,7 @@ void __init xen_init_spinlocks(void)
 
 static __init int xen_parse_nopvspin(char *arg)
 {
+	pr_notice("\"xen_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
 	xen_pvspin = false;
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v6 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-15  1:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin,
	Will Deacon
In-Reply-To: <1571102367-31595-1-git-send-email-zhenzhong.duan@oracle.com>

There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" on XEN platform and
"hv_nopvspin" on HYPER_V).

That feature is missed on KVM, add a new parameter "nopvspin" to disable
PV spinlocks for KVM guest.

The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
parameters in future patches.

Define variable nopvsin as global because it will be used in future
patches as above.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  5 ++++
 arch/x86/include/asm/qspinlock.h                |  1 +
 arch/x86/kernel/kvm.c                           | 34 ++++++++++++++++++++++---
 kernel/locking/qspinlock.c                      |  7 +++++
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f..bd49ed2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5334,6 +5334,11 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
+	nopvspin	[X86,KVM]
+			Disables the qspinlock slow path using PV optimizations
+			which allow the hypervisor to 'idle' the guest on lock
+			contention.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd..d86ab94 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
 extern void __pv_init_lock_hash(void);
 extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
 
 #define	queued_spin_unlock queued_spin_unlock
 /**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 249f14a..e9c76d8 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -825,18 +825,44 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
  */
 void __init kvm_spinlock_init(void)
 {
-	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
-	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+	/*
+	 * PV spinlocks is disabled if no host side support, then native
+	 * qspinlock will be used. As native qspinlock is a fair lock, there is
+	 * lock holder preemption issue using it in a guest, imaging one pCPU
+	 * running 10 vCPUs of same guest contending same lock.
+	 *
+	 * virt_spin_lock() is introduced as an optimization for that scenario
+	 * which is enabled by virt_spin_lock_key key. To use that optimization,
+	 * virt_spin_lock_key isn't disabled here.
+	 */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT)) {
+		pr_info("PV spinlocks disabled, no host support.\n");
 		return;
+	}
 
+	/*
+	 * Disable PV qspinlock and use native qspinlock when dedicated pCPUs
+	 * are available.
+	 */
 	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
+		static_branch_disable(&virt_spin_lock_key);
+		return;
+	}
+
+	if (num_possible_cpus() == 1) {
+		pr_info("PV spinlocks disabled, single CPU.\n");
 		static_branch_disable(&virt_spin_lock_key);
 		return;
 	}
 
-	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
+	if (nopvspin) {
+		pr_info("PV spinlocks disabled, forced by \"nopvspin\" parameter.\n");
+		static_branch_disable(&virt_spin_lock_key);
 		return;
+	}
+
+	pr_info("PV spinlocks enabled\n");
 
 	__pv_init_lock_hash();
 	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10..75193d6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 #include "qspinlock_paravirt.h"
 #include "qspinlock.c"
 
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
 #endif
-- 
1.8.3.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox