Linux-HyperV List
 help / color / mirror / Atom feed
* RE: [PATCH 1/2] x86/hyperv: Allow guests to enable InvariantTSC
From: Vitaly Kuznetsov @ 2019-10-08 15:12 UTC (permalink / raw)
  To: Michael Kelley, Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org, x86@kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H . Peter Anvin,
	Andrea Parri
In-Reply-To: <DM5PR21MB01371F96CD845743D9777DC5D79E0@DM5PR21MB0137.namprd21.prod.outlook.com>

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Friday, October 4, 2019 9:57 AM
>> 
>> Andrea Parri <parri.andrea@gmail.com> writes:
>> 
>> > If the hardware supports TSC scaling, Hyper-V will set bit 15 of the
>> > HV_PARTITION_PRIVILEGE_MASK in guest VMs with a compatible Hyper-V
>> > configuration version.  Bit 15 corresponds to the
>> > AccessTscInvariantControls privilege.  If this privilege bit is set,
>> > guests can access the HvSyntheticInvariantTscControl MSR: guests can
>> > set bit 0 of this synthetic MSR to enable the InvariantTSC feature.
>> > After setting the synthetic MSR, CPUID will enumerate support for
>> > InvariantTSC.
>> 
>> I tried getting more information from TLFS but as of 5.0C this feature
>> is not described there. I'm really interested in why this additional
>> interface is needed, e.g. why can't Hyper-V just set InvariantTSC
>> unconditionally when TSC scaling is supported?
>> 
>
> Yes, this is very new functionality that is not yet available in a released
> version of Hyper-V.  And as you know, the Hyper-V TLFS has gotten
> woefully out-of-date. :-(
>
> Your question is the same question I asked.   The reason given by
> Hyper-V is to take the more cautious approach of not "automatically"
> giving VMs an InvariantTSC due to updating the underlying Hyper-V
> version.  Instead, guest VMs must have been explicitly coded to take
> advantage of the new InvariantTSC feature.  It's not clear to me how
> much of this caution is driven by Windows guests vs. Linux or FreeBSD
> guests, but it is what it is.
>
> Having to explicitly enable the InvariantTSC does give the Linux code
> the opportunity to be a bit cleaner by doing things like not marking
> the TSC as unstable when the InvariantTSC feature is present, and to
> mark the TSC as reliable so we don't try to do TSC synchronization
> (which Hyper-V does not want guests to try to do).

Thank you for these additional details Michael,

we'll probably have to add support for this bit to KVM and I'd like to
know the background. From Linux perspective, no matter what's the
interface we'd like to get InvariantTSC.

Feel free to add

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH 0/2] Drivers: hv: vmbus: Miscellaneous improvements
From: Andrea Parri @ 2019-10-08 15:08 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, vkuznets
In-Reply-To: <PU1P153MB01691E9B0DD83E521B1E12C0BF9B0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Mon, Oct 07, 2019 at 05:41:10PM +0000, Dexuan Cui wrote:
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andrea Parri
> > Sent: Monday, October 7, 2019 9:31 AM
> > 
> > Hi all,
> > 
> > The patchset:
> > 
> > - simplifies/refactors the VMBus negotiation code by introducing
> >   the table of VMBus protocol versions (patch 1/2),
> > 
> > - enables VMBus protocol versions 5.1 and 5.2 (patch 2/2).
> > 
> > Thanks,
> >   Andrea
> > 
> > Andrea Parri (2):
> >   Drivers: hv: vmbus: Introduce table of VMBus protocol versions
> >   Drivers: hv: vmbus: Enable VMBus protocol versions 5.1 and 5.2
> 
> Should we add a module parameter to allow the user to specify a lower
> protocol version, when the VM runs on the latest host? 
> 
> This can be useful for testing and debugging purpose: the variable
> "vmbus_proto_version" is referenced by the vmbus driver itself and
> some VSC drivers: if we always use the latest available proto version,
> some code paths can not be tested on the latest hosts. 

The idea is appealing to me (altough I made no attempt to implement/test
it yet).  What do others think about this?  Maybe can be considered as a
follow-up patch/work to this series?

Thanks,
  Andrea

^ permalink raw reply

* Re: [RFC PATCH] mm: set memory section offline when all its pages are offline.
From: David Hildenbrand @ 2019-10-08 14:51 UTC (permalink / raw)
  To: lantianyu1986, pbonzini, rkrcmar, tglx, mingo, bp, hpa, x86,
	dave.hansen, luto, peterz, kys, haiyangz, sthemmin, sashal, akpm,
	rppt, jgross, mhocko, paul.burton, m.mizuma, huang.zijiang,
	karahmed, dan.j.williams, bhelgaas, osalvador, rdunlap,
	richardw.yang, pavel.tatashin, cai, arunks, vbabka, mgorman,
	alexander.h.duyck, glider, logang, bsingharora, bhe, Tianyu.Lan,
	michael.h.kelley
  Cc: kvm, linux-kernel, linux-hyperv, linux-mm, vkuznets
In-Reply-To: <20191008143648.11882-1-Tianyu.Lan@microsoft.com>

On 08.10.19 16:36, lantianyu1986@gmail.com wrote:
> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> 
> If size of offline memory region passed to offline_pages() is
> not aligned with PAGES_PER_SECTION, memory section will be set
> to offline in the offline_mem_sections() with some pages of
> memory section online. Fix it, Update memory section status after
> marking offline pages as "reserved" in __offline_isolated_pages()
> and check all pages in memory are reserved or not before setting
> memory section offline.
> 
> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> ---
> This patch is to prepare for hot remove memory function in Hyper-V
> balloon driver. It requests to offline memory with random size.

I proposed roughly the same a while ago. See

https://lkml.org/lkml/2018/4/30/207

Memory onlining/offlining works in memory block granularity only.
Sub-sections, you have to emulate it on top, similar to how hyper-v
balloon handles it already. E.g., have a look how virtio-mem handles it
using alloc_contig_range/free_contig_range and PG_offline extensions.

https://lkml.org/lkml/2018/4/30/207

So a clear NACK from my side.


> ---
>  mm/page_alloc.c |  3 ++-
>  mm/sparse.c     | 10 ++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dbd0d5cbbcbb..cc02866924ae 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8540,7 +8540,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	if (pfn == end_pfn)
>  		return offlined_pages;
>  
> -	offline_mem_sections(pfn, end_pfn);
>  	zone = page_zone(pfn_to_page(pfn));
>  	spin_lock_irqsave(&zone->lock, flags);
>  	pfn = start_pfn;
> @@ -8576,6 +8575,8 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	}
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
> +	offline_mem_sections(pfn, end_pfn);
> +
>  	return offlined_pages;
>  }
>  #endif
> diff --git a/mm/sparse.c b/mm/sparse.c
> index fd13166949b5..eb5860487b84 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -571,6 +571,7 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  {
>  	unsigned long pfn;
> +	int i;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>  		unsigned long section_nr = pfn_to_section_nr(pfn);
> @@ -583,6 +584,15 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
>  		if (WARN_ON(!valid_section_nr(section_nr)))
>  			continue;
>  
> +		/*
> +		 * Check whether all pages in the section are reserverd before
> +		 * setting setction offline.
> +		 */
> +		for (i = 0; i < PAGES_PER_SECTION; i++)
> +			if (!PageReserved(pfn_to_page(
> +			    SECTION_ALIGN_DOWN(pfn + i))))
> +				continue;
> +
>  		ms = __nr_to_section(section_nr);
>  		ms->section_mem_map &= ~SECTION_IS_ONLINE;
>  	}
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply

* [RFC PATCH] mm: set memory section offline when all its pages are offline.
From: lantianyu1986 @ 2019-10-08 14:36 UTC (permalink / raw)
  To: pbonzini, rkrcmar, tglx, mingo, bp, hpa, x86, dave.hansen, luto,
	peterz, kys, haiyangz, sthemmin, sashal, akpm, rppt, jgross,
	mhocko, paul.burton, m.mizuma, huang.zijiang, karahmed,
	dan.j.williams, bhelgaas, osalvador, rdunlap, richardw.yang,
	david, pavel.tatashin, cai, arunks, vbabka, mgorman,
	alexander.h.duyck, glider, logang, bsingharora, bhe, Tianyu.Lan,
	michael.h.kelley
  Cc: kvm, linux-kernel, linux-hyperv, linux-mm, vkuznets

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

If size of offline memory region passed to offline_pages() is
not aligned with PAGES_PER_SECTION, memory section will be set
to offline in the offline_mem_sections() with some pages of
memory section online. Fix it, Update memory section status after
marking offline pages as "reserved" in __offline_isolated_pages()
and check all pages in memory are reserved or not before setting
memory section offline.

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
This patch is to prepare for hot remove memory function in Hyper-V
balloon driver. It requests to offline memory with random size.
---
 mm/page_alloc.c |  3 ++-
 mm/sparse.c     | 10 ++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dbd0d5cbbcbb..cc02866924ae 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8540,7 +8540,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 	if (pfn == end_pfn)
 		return offlined_pages;
 
-	offline_mem_sections(pfn, end_pfn);
 	zone = page_zone(pfn_to_page(pfn));
 	spin_lock_irqsave(&zone->lock, flags);
 	pfn = start_pfn;
@@ -8576,6 +8575,8 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 
+	offline_mem_sections(pfn, end_pfn);
+
 	return offlined_pages;
 }
 #endif
diff --git a/mm/sparse.c b/mm/sparse.c
index fd13166949b5..eb5860487b84 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -571,6 +571,7 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 {
 	unsigned long pfn;
+	int i;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
 		unsigned long section_nr = pfn_to_section_nr(pfn);
@@ -583,6 +584,15 @@ void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
 		if (WARN_ON(!valid_section_nr(section_nr)))
 			continue;
 
+		/*
+		 * Check whether all pages in the section are reserverd before
+		 * setting setction offline.
+		 */
+		for (i = 0; i < PAGES_PER_SECTION; i++)
+			if (!PageReserved(pfn_to_page(
+			    SECTION_ALIGN_DOWN(pfn + i))))
+				continue;
+
 		ms = __nr_to_section(section_nr);
 		ms->section_mem_map &= ~SECTION_IS_ONLINE;
 	}
-- 
2.14.5


^ permalink raw reply related

* [PATCH RFC] kconfig: add hvconfig for Linux on Hyper-V
From: Wei Liu @ 2019-10-08 13:15 UTC (permalink / raw)
  To: Linux on Hyper-V List; +Cc: Linux Kernel List, Linux Kconfig List, Wei Liu

From: Wei Liu <liuwe@microsoft.com>

Add an config file snippet which enalbes additional options useful for
running the kernel in a Hyper-V guest.

The expected use case is a user provides an existing config file then
executes `make hvconfig`. It will merge those options with the
provided config file.

Based on similar concept for Xen and KVM.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
RFC: I only tested this on x86.  Although the config options included in
hv_guest.config don't seem to be arch-specific, we should probably
move the ones not yet implemented on Arm to an x86 specific config
file.
---
 Documentation/admin-guide/README.rst |  3 +++
 kernel/configs/hv_guest.config       | 33 ++++++++++++++++++++++++++++
 scripts/kconfig/Makefile             |  5 +++++
 3 files changed, 41 insertions(+)
 create mode 100644 kernel/configs/hv_guest.config

diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst
index cc6151fc0845..d5f4389a7a2f 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -224,6 +224,9 @@ Configuring the kernel
      "make xenconfig"   Enable additional options for xen dom0 guest kernel
                         support.
 
+     "make hvconfig"    Enable additional options for Hyper-V guest kernel
+                        support.
+
      "make tinyconfig"  Configure the tiniest possible kernel.
 
    You can find more information on using the Linux kernel config tools
diff --git a/kernel/configs/hv_guest.config b/kernel/configs/hv_guest.config
new file mode 100644
index 000000000000..0e71e34a2d4d
--- /dev/null
+++ b/kernel/configs/hv_guest.config
@@ -0,0 +1,33 @@
+CONFIG_NET=y
+CONFIG_NET_CORE=y
+CONFIG_NETDEVICES=y
+CONFIG_BLOCK=y
+CONFIG_BLK_DEV=y
+CONFIG_NETWORK_FILESYSTEMS=y
+CONFIG_INET=y
+CONFIG_TTY=y
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_BINFMT_ELF=y
+CONFIG_PCI=y
+CONFIG_PCI_MSI=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_VIRTUALIZATION=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_HYPERV=y
+CONFIG_HYPERV_VSOCKETS=y
+CONFIG_PCI_HYPERV=y
+CONFIG_PCI_HYPERV_INTERFACE=y
+CONFIG_HYPERV_STORAGE=y
+CONFIG_HYPERV_NET=y
+CONFIG_HYPERV_KEYBOARD=y
+CONFIG_FB_HYPERV=y
+CONFIG_HID_HYPERV_MOUSE=y
+CONFIG_HYPERV=y
+CONFIG_HYPERV_TIMER=y
+CONFIG_HYPERV_UTILS=y
+CONFIG_HYPERV_BALLOON=y
+CONFIG_HYPERV_IOMMU=y
diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index ef2f2336c469..2ee46301b22e 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -104,6 +104,10 @@ PHONY += xenconfig
 xenconfig: xen.config
 	@:
 
+PHONY += hvconfig
+hvconfig: hv_guest.config
+	@:
+
 PHONY += tinyconfig
 tinyconfig:
 	$(Q)$(MAKE) -f $(srctree)/Makefile allnoconfig tiny.config
@@ -138,6 +142,7 @@ help:
 	@echo  '                    default value without prompting'
 	@echo  '  kvmconfig	  - Enable additional options for kvm guest kernel support'
 	@echo  '  xenconfig       - Enable additional options for xen dom0 and guest kernel support'
+	@echo  '  hvconfig        - Enable additional options for Hyper-V guest kernel support'
 	@echo  '  tinyconfig	  - Configure the tiniest possible kernel'
 	@echo  '  testconfig	  - Run Kconfig unit tests (requires python3 and pytest)'
 
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Vitaly Kuznetsov @ 2019-10-08 13:00 UTC (permalink / raw)
  To: Andrea Parri
  Cc: linux-kernel, linux-hyperv, K . Y . Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Michael Kelley, Dexuan Cui
In-Reply-To: <20191008124052.GA11245@andrea.guest.corp.microsoft.com>

Andrea Parri <parri.andrea@gmail.com> writes:

>> > @@ -244,21 +234,18 @@ int vmbus_connect(void)
>> >  	 * version.
>> >  	 */
>> >  
>> > -	version = VERSION_CURRENT;
>> > +	for (i = 0; ; i++) {
>> > +		version = vmbus_versions[i];
>> > +		if (version == VERSION_INVAL)
>> > +			goto cleanup;
>> 
>> If you use e.g. ARRAY_SIZE() you can get rid of VERSION_INVAL - and make
>> this code look more natural.
>
> Thank you for pointing this out, Vitaly.
>
> IIUC, you're suggesting that I do:
>
> 	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
> 		version = vmbus_versions[i];
>
> 		ret = vmbus_negotiate_version(msginfo, version);
> 		if (ret == -ETIMEDOUT)
> 			goto cleanup;
>
> 		if (vmbus_connection.conn_state == CONNECTED)
> 			break;
> 	}
>
> 	if (vmbus_connection.conn_state != CONNECTED)
> 		break;
>
> and that I remove VERSION_INVAL from vmbus_versions, yes?
>

Yes, something like that.

> Looking at the uses of VERSION_INVAL, I find one remaining occurrence
> of this macro in vmbus_bus_resume(), which does:
>
> 	if (vmbus_proto_version == VERSION_INVAL ||
> 	    vmbus_proto_version == 0) {
> 		...
> 	}
>
> TBH I'm looking at vmbus_bus_resume() and vmbus_bus_suspend() for the
> first time and I'm not sure about how to change such check yet... any
> suggestions?

Hm, I don't think vmbus_proto_version can ever become == VERSION_INVAL
if we rewrite the code the way you suggest, right? So you'll reduce this
to 'if (!vmbus_proto_version)' meaning we haven't negotiated any version
(yet).

>
> Mmh, I see that vmbus_bus_resume() and vmbus_bus_suspend() can access
> vmbus_connection.conn_state: can such accesses race with a concurrent
> vmbus_connect()?

Let's summon Dexuan who's the author! :-) 

>
> Thanks,
>   Andrea
>
>
>> >  
>> > -	do {
>> >  		ret = vmbus_negotiate_version(msginfo, version);
>> >  		if (ret == -ETIMEDOUT)
>> >  			goto cleanup;
>> >  
>> >  		if (vmbus_connection.conn_state == CONNECTED)
>> >  			break;
>> > -
>> > -		version = vmbus_get_next_version(version);
>> > -	} while (version != VERSION_INVAL);
>> > -
>> > -	if (version == VERSION_INVAL)
>> > -		goto cleanup;
>> > +	}
>> >  
>> >  	vmbus_proto_version = version;
>> >  	pr_info("Vmbus version:%d.%d\n",

-- 
Vitaly


^ permalink raw reply

* Re: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-08 12:44 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, linux-hyperv, K . Y . Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Michael Kelley, Dexuan Cui
In-Reply-To: <20191008124052.GA11245@andrea.guest.corp.microsoft.com>

> IIUC, you're suggesting that I do:
> 
> 	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
> 		version = vmbus_versions[i];
> 
> 		ret = vmbus_negotiate_version(msginfo, version);
> 		if (ret == -ETIMEDOUT)
> 			goto cleanup;
> 
> 		if (vmbus_connection.conn_state == CONNECTED)
> 			break;
> 	}
> 
> 	if (vmbus_connection.conn_state != CONNECTED)
> 		break;

This "break" should have been "goto cleanup".  Sorry.

  Andrea

^ permalink raw reply

* Re: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-08 12:42 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org,
	KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, vkuznets
In-Reply-To: <PU1P153MB0169CAC756996A623CFDFBA7BF9B0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Mon, Oct 07, 2019 at 05:25:18PM +0000, Dexuan Cui wrote:
> > From: linux-hyperv-owner@vger.kernel.org
> > <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andrea Parri
> > Sent: Monday, October 7, 2019 9:31 AM
> > ....
> > +/*
> > + * Table of VMBus versions listed from newest to oldest; the table
> > + * must terminate with VERSION_INVAL.
> > + */
> > +__u32 vmbus_versions[] = {
> > +	VERSION_WIN10_V5,
> 
> This should be "static"?

I think so, will add in v2.  Thank you for pointing this out, Dexuan.

  Andrea

^ permalink raw reply

* Re: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-08 12:41 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-kernel, linux-hyperv, K . Y . Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Michael Kelley, Dexuan Cui
In-Reply-To: <87eezo1nrr.fsf@vitty.brq.redhat.com>

> > @@ -244,21 +234,18 @@ int vmbus_connect(void)
> >  	 * version.
> >  	 */
> >  
> > -	version = VERSION_CURRENT;
> > +	for (i = 0; ; i++) {
> > +		version = vmbus_versions[i];
> > +		if (version == VERSION_INVAL)
> > +			goto cleanup;
> 
> If you use e.g. ARRAY_SIZE() you can get rid of VERSION_INVAL - and make
> this code look more natural.

Thank you for pointing this out, Vitaly.

IIUC, you're suggesting that I do:

	for (i = 0; i < ARRAY_SIZE(vmbus_versions); i++) {
		version = vmbus_versions[i];

		ret = vmbus_negotiate_version(msginfo, version);
		if (ret == -ETIMEDOUT)
			goto cleanup;

		if (vmbus_connection.conn_state == CONNECTED)
			break;
	}

	if (vmbus_connection.conn_state != CONNECTED)
		break;

and that I remove VERSION_INVAL from vmbus_versions, yes?

Looking at the uses of VERSION_INVAL, I find one remaining occurrence
of this macro in vmbus_bus_resume(), which does:

	if (vmbus_proto_version == VERSION_INVAL ||
	    vmbus_proto_version == 0) {
		...
	}

TBH I'm looking at vmbus_bus_resume() and vmbus_bus_suspend() for the
first time and I'm not sure about how to change such check yet... any
suggestions?

Mmh, I see that vmbus_bus_resume() and vmbus_bus_suspend() can access
vmbus_connection.conn_state: can such accesses race with a concurrent
vmbus_connect()?

Thanks,
  Andrea


> >  
> > -	do {
> >  		ret = vmbus_negotiate_version(msginfo, version);
> >  		if (ret == -ETIMEDOUT)
> >  			goto cleanup;
> >  
> >  		if (vmbus_connection.conn_state == CONNECTED)
> >  			break;
> > -
> > -		version = vmbus_get_next_version(version);
> > -	} while (version != VERSION_INVAL);
> > -
> > -	if (version == VERSION_INVAL)
> > -		goto cleanup;
> > +	}
> >  
> >  	vmbus_proto_version = version;
> >  	pr_info("Vmbus version:%d.%d\n",

^ permalink raw reply

* [PATCH v5 1/5] Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized"
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

This reverts commit 34226b6b70980a8f81fff3c09a2c889f77edeeff.

Commit 8990cac6e5ea ("x86/jump_label: Initialize static branching
early") adds jump_label_init() call in setup_arch() to make static
keys initialized early, so we could use the original simpler code
again.

The similar change for XEN is in commit 090d54bcbc54 ("Revert
"x86/paravirt: Set up the virt_spin_lock_key after static keys get
initialized"")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/kvm.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index e820568..3bc6a266 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -527,13 +527,6 @@ static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
 	}
 }
 
-static void __init kvm_smp_prepare_cpus(unsigned int max_cpus)
-{
-	native_smp_prepare_cpus(max_cpus);
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
-		static_branch_disable(&virt_spin_lock_key);
-}
-
 static void __init kvm_smp_prepare_boot_cpu(void)
 {
 	/*
@@ -633,7 +626,6 @@ static void __init kvm_guest_init(void)
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
 
 #ifdef CONFIG_SMP
-	smp_ops.smp_prepare_cpus = kvm_smp_prepare_cpus;
 	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 	if (kvm_para_has_feature(KVM_FEATURE_PV_SCHED_YIELD) &&
 	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
@@ -835,8 +827,10 @@ void __init kvm_spinlock_init(void)
 	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
 		return;
 
-	if (kvm_para_has_hint(KVM_HINTS_REALTIME))
+	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		static_branch_disable(&virt_spin_lock_key);
 		return;
+	}
 
 	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
 	if (num_possible_cpus() == 1)
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v5 3/5] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin,
	Will Deacon
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting "xen_nopvspin" on XEN platform and
"hv_nopvspin" on HYPER_V).

That feature is missed on KVM, add a new parameter "nopvspin" to disable
PV spinlocks for KVM guest.

The new 'nopvspin' parameter will also replace Xen and Hyper-V specific
parameters in future patches.

Define variable nopvsin as global because it will be used in future
patches as above.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  5 +++++
 arch/x86/include/asm/qspinlock.h                |  1 +
 arch/x86/kernel/kvm.c                           | 21 +++++++++++++++++----
 kernel/locking/qspinlock.c                      |  7 +++++++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7ac2f3..89d77ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5330,6 +5330,11 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
+	nopvspin	[X86,KVM]
+			Disables the qspinlock slow path using PV optimizations
+			which allow the hypervisor to 'idle' the guest on lock
+			contention.
+
 	xirc2ps_cs=	[NET,PCMCIA]
 			Format:
 			<irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 444d6fd..d86ab94 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -32,6 +32,7 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo
 extern void __pv_init_lock_hash(void);
 extern void __pv_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __raw_callee_save___pv_queued_spin_unlock(struct qspinlock *lock);
+extern bool nopvspin;
 
 #define	queued_spin_unlock queued_spin_unlock
 /**
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index ef836d6..6e14bd4 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -825,18 +825,31 @@ __visible bool __kvm_vcpu_is_preempted(long cpu)
  */
 void __init kvm_spinlock_init(void)
 {
-	/* Does host kernel support KVM_FEATURE_PV_UNHALT? */
-	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
+	/*
+	 * Disable PV qspinlocks if host kernel doesn't support
+	 * KVM_FEATURE_PV_UNHALT feature or there is only 1 vCPU.
+	 * virt_spin_lock_key is enabled to avoid lock holder
+	 * preemption issue.
+	 */
+	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
+	    num_possible_cpus() == 1) {
+		pr_info("PV spinlocks disabled\n");
 		return;
+	}
 
 	if (kvm_para_has_hint(KVM_HINTS_REALTIME)) {
+		pr_info("PV spinlocks disabled with KVM_HINTS_REALTIME hints.\n");
 		static_branch_disable(&virt_spin_lock_key);
 		return;
 	}
 
-	/* Don't use the pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
+	if (nopvspin) {
+		pr_info("PV spinlocks disabled forced by \"nopvspin\" parameter.\n");
+		static_branch_disable(&virt_spin_lock_key);
 		return;
+	}
+
+	pr_info("PV spinlocks enabled\n");
 
 	__pv_init_lock_hash();
 	pv_ops.lock.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 2473f10..75193d6 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -580,4 +580,11 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 #include "qspinlock_paravirt.h"
 #include "qspinlock.c"
 
+bool nopvspin __initdata;
+static __init int parse_nopvspin(char *arg)
+{
+	nopvspin = true;
+	return 0;
+}
+early_param("nopvspin", parse_nopvspin);
 #endif
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v5 5/5] x86/hyperv: Mark "hv_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

Map "hv_nopvspin" to "nopvspin".

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 6 +++++-
 arch/x86/hyperv/hv_spinlock.c                   | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index df1eacc..08c6d34 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1436,6 +1436,10 @@
 	hv_nopvspin	[X86,HYPER_V] Disables the paravirt spinlock optimizations
 				      which allow the hypervisor to 'idle' the
 				      guest on lock contention.
+				      This parameter is obsoleted by "nopvspin"
+				      parameter, which has equivalent effect for
+				      HYPER_V platform.
+
 
 	keep_bootcon	[KNL]
 			Do not unregister boot console at start. This is only
@@ -5331,7 +5335,7 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
-	nopvspin	[X86,XEN,KVM]
+	nopvspin	[X86,XEN,KVM,HYPER_V]
 			Disables the qspinlock slow path using PV optimizations
 			which allow the hypervisor to 'idle' the guest on lock
 			contention.
diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..47c7d6c 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (nopvspin)
+		hv_pvspin = false;
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
@@ -82,6 +85,7 @@ void __init hv_init_spinlocks(void)
 
 static __init int hv_parse_nopvspin(char *arg)
 {
+	pr_notice("\"hv_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
 	hv_pvspin = false;
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v5 4/5] xen: Mark "xen_nopvspin" parameter obsolete
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, Jonathan Corbet, H. Peter Anvin
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

Map "xen_nopvspin" to "nopvspin", fix stale description of "xen_nopvspin"
as we use qspinlock now.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 7 ++++---
 arch/x86/xen/spinlock.c                         | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 89d77ea..df1eacc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5303,8 +5303,9 @@
 			never -- do not unplug even if version check succeeds
 
 	xen_nopvspin	[X86,XEN]
-			Disables the ticketlock slowpath using Xen PV
-			optimizations.
+			Disables the qspinlock slowpath using Xen PV optimizations.
+			This parameter is obsoleted by "nopvspin" parameter, which
+			has equivalent effect for XEN platform.
 
 	xen_nopv	[X86]
 			Disables the PV optimizations forcing the HVM guest to
@@ -5330,7 +5331,7 @@
 			as generic guest with no PV drivers. Currently support
 			XEN HVM, KVM, HYPER_V and VMWARE guest.
 
-	nopvspin	[X86,KVM]
+	nopvspin	[X86,XEN,KVM]
 			Disables the qspinlock slow path using PV optimizations
 			which allow the hypervisor to 'idle' the guest on lock
 			contention.
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 6deb490..799f4eb 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -114,9 +114,8 @@ void xen_uninit_lock_cpu(int cpu)
  */
 void __init xen_init_spinlocks(void)
 {
-
 	/*  Don't need to use pvqspinlock code if there is only 1 vCPU. */
-	if (num_possible_cpus() == 1)
+	if (num_possible_cpus() == 1 || nopvspin)
 		xen_pvspin = false;
 
 	if (!xen_pvspin) {
@@ -137,6 +136,7 @@ void __init xen_init_spinlocks(void)
 
 static __init int xen_parse_nopvspin(char *arg)
 {
+	pr_notice("\"xen_nopvspin\" is deprecated, please use \"nopvspin\" instead\n");
 	xen_pvspin = false;
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v5 2/5] x86/kvm: Change print code to use pr_*() format
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan, H. Peter Anvin
In-Reply-To: <1570439071-9814-1-git-send-email-zhenzhong.duan@oracle.com>

pr_*() is preferred than printk(KERN_* ...), after change all the print
in arch/x86/kernel/kvm.c will have "kvm_guest: xxx" style.

No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
---
 arch/x86/kernel/kvm.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3bc6a266..ef836d6 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -7,6 +7,8 @@
  *   Authors: Anthony Liguori <aliguori@us.ibm.com>
  */
 
+#define pr_fmt(fmt) "kvm_guest: " fmt
+
 #include <linux/context_tracking.h>
 #include <linux/init.h>
 #include <linux/kernel.h>
@@ -286,8 +288,8 @@ static void kvm_register_steal_time(void)
 		return;
 
 	wrmsrl(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
-	pr_info("kvm-stealtime: cpu %d, msr %llx\n",
-		cpu, (unsigned long long) slow_virt_to_phys(st));
+	pr_info("stealtime: cpu %d, msr %llx\n", cpu,
+		(unsigned long long) slow_virt_to_phys(st));
 }
 
 static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
@@ -321,8 +323,7 @@ static void kvm_guest_cpu_init(void)
 
 		wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
 		__this_cpu_write(apf_reason.enabled, 1);
-		printk(KERN_INFO"KVM setup async PF for cpu %d\n",
-		       smp_processor_id());
+		pr_info("setup async PF for cpu %d\n", smp_processor_id());
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI)) {
@@ -347,8 +348,7 @@ static void kvm_pv_disable_apf(void)
 	wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
 	__this_cpu_write(apf_reason.enabled, 0);
 
-	printk(KERN_INFO"Unregister pv shared memory for cpu %d\n",
-	       smp_processor_id());
+	pr_info("Unregister pv shared memory for cpu %d\n", smp_processor_id());
 }
 
 static void kvm_pv_guest_cpu_reboot(void *unused)
@@ -469,7 +469,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 		} else {
 			ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
 				(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
-			WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+			WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+				  ret);
 			min = max = apic_id;
 			ipi_bitmap = 0;
 		}
@@ -479,7 +480,8 @@ static void __send_ipi_mask(const struct cpumask *mask, int vector)
 	if (ipi_bitmap) {
 		ret = kvm_hypercall4(KVM_HC_SEND_IPI, (unsigned long)ipi_bitmap,
 			(unsigned long)(ipi_bitmap >> BITS_PER_LONG), min, icr);
-		WARN_ONCE(ret < 0, "KVM: failed to send PV IPI: %ld", ret);
+		WARN_ONCE(ret < 0, "kvm_guest: failed to send PV IPI: %ld",
+			  ret);
 	}
 
 	local_irq_restore(flags);
@@ -509,7 +511,7 @@ static void kvm_setup_pv_ipi(void)
 {
 	apic->send_IPI_mask = kvm_send_ipi_mask;
 	apic->send_IPI_mask_allbutself = kvm_send_ipi_mask_allbutself;
-	pr_info("KVM setup pv IPIs\n");
+	pr_info("setup pv IPIs\n");
 }
 
 static void kvm_smp_send_call_func_ipi(const struct cpumask *mask)
@@ -631,11 +633,11 @@ static void __init kvm_guest_init(void)
 	    !kvm_para_has_hint(KVM_HINTS_REALTIME) &&
 	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		smp_ops.send_call_func_ipi = kvm_smp_send_call_func_ipi;
-		pr_info("KVM setup pv sched yield\n");
+		pr_info("setup pv sched yield\n");
 	}
 	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/kvm:online",
 				      kvm_cpu_online, kvm_cpu_down_prepare) < 0)
-		pr_err("kvm_guest: Failed to install cpu hotplug callbacks\n");
+		pr_err("failed to install cpu hotplug callbacks\n");
 #else
 	sev_map_percpu_data();
 	kvm_guest_cpu_init();
@@ -738,7 +740,7 @@ static __init int kvm_setup_pv_tlb_flush(void)
 			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
 				GFP_KERNEL, cpu_to_node(cpu));
 		}
-		pr_info("KVM setup pv remote TLB flush\n");
+		pr_info("setup pv remote TLB flush\n");
 	}
 
 	return 0;
@@ -866,8 +868,8 @@ static void kvm_enable_host_haltpoll(void *i)
 void arch_haltpoll_enable(unsigned int cpu)
 {
 	if (!kvm_para_has_feature(KVM_FEATURE_POLL_CONTROL)) {
-		pr_err_once("kvm: host does not support poll control\n");
-		pr_err_once("kvm: host upgrade recommended\n");
+		pr_err_once("host does not support poll control\n");
+		pr_err_once("host upgrade recommended\n");
 		return;
 	}
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH v5 0/5] Add a unified parameter "nopvspin"
From: Zhenzhong Duan @ 2019-10-07  9:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, boris.ostrovsky, jgross, sstabellini,
	peterz, Zhenzhong Duan

There are cases folks want to disable spinlock optimization for
debug/test purpose. Xen and hyperv already have parameters "xen_nopvspin"
and "hv_nopvspin" to support that, but kvm doesn't.

The first patch adds that feature to KVM guest with "nopvspin".

For compatibility reason original parameters "xen_nopvspin" and
"hv_nopvspin" are retained and marked obsolete.

v5:
PATCH1: new patch to revert a currently unnecessory commit,
        code is simpler a bit after that change.         [Boris Ostrovsky]
PATCH3: fold 'if' statement,add comments on virt_spin_lock_key,
        reorder with PATCH2 to better reflect dependency                               
PATCH4: fold 'if' statement, add Reviewed-by             [Boris Ostrovsky]
PATCH5: add Reviewed-by                                  [Michael Kelley]

v4:
PATCH1: use variable name nopvspin instead of pvspin and
        defined it as __initdata, changed print message,
        updated patch description                     [Sean Christopherson]
PATCH2: remove Suggested-by, use "kvm-guest:" prefix  [Sean Christopherson]
PATCH3: make variable nopvsin and xen_pvspin coexist
        remove Reviewed-by due to code change         [Sean Christopherson]
PATCH4: make variable nopvsin and hv_pvspin coexist   [Sean Christopherson]

v3:
PATCH2: Fix indentation

v2:
PATCH1: pick the print code change into separate PATCH2,
        updated patch description             [Vitaly Kuznetsov]
PATCH2: new patch with print code change      [Vitaly Kuznetsov]
PATCH3: add Reviewed-by                       [Juergen Gross]

Zhenzhong Duan (5):
  Revert "KVM: X86: Fix setup the virt_spin_lock_key before static key
    get initialized"
  x86/kvm: Change print code to use pr_*() format
  x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
  xen: Mark "xen_nopvspin" parameter obsolete
  x86/hyperv: Mark "hv_nopvspin" parameter obsolete

 Documentation/admin-guide/kernel-parameters.txt | 14 +++++-
 arch/x86/hyperv/hv_spinlock.c                   |  4 ++
 arch/x86/include/asm/qspinlock.h                |  1 +
 arch/x86/kernel/kvm.c                           | 63 ++++++++++++++-----------
 arch/x86/xen/spinlock.c                         |  4 +-
 kernel/locking/qspinlock.c                      |  7 +++
 6 files changed, 62 insertions(+), 31 deletions(-)

-- 
1.8.3.1


^ permalink raw reply

* Re: [PATCH v4 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Zhenzhong Duan @ 2019-10-08  2:17 UTC (permalink / raw)
  To: Boris Ostrovsky, linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, jgross, sstabellini, peterz,
	Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <ee3feb72-e587-9ac8-d5ba-e5c00b2a89c7@oracle.com>


On 2019/10/7 22:46, Boris Ostrovsky wrote:
> On 10/6/19 3:49 AM, Zhenzhong Duan wrote:
>> On 2019/10/4 22:52, Boris Ostrovsky wrote:
>>
>>> On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
>>>>    void __init kvm_spinlock_init(void)
>>>>    {
>>>> -    /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>>>> -    if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>>>> -        return;
>>>> -
>>>> -    if (kvm_para_has_hint(KVM_HINTS_REALTIME))
>>>> +    /*
>>>> +     * Don't use the pvqspinlock code if no KVM_FEATURE_PV_UNHALT
>>>> feature
>>>> +     * support, or there is REALTIME hints or only 1 vCPU.
>>>> +     */
>>>> +    if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
>>>> +        kvm_para_has_hint(KVM_HINTS_REALTIME) ||
>>>> +        num_possible_cpus() == 1) {
>>>> +        pr_info("PV spinlocks disabled\n");
>>>>            return;
>>>> +    }
>>>>    -    /* Don't use the pvqspinlock code if there is only 1 vCPU. */
>>>> -    if (num_possible_cpus() == 1)
>>>> +    if (nopvspin) {
>>>> +        pr_info("PV spinlocks disabled forced by \"nopvspin\"
>>>> parameter.\n");
>>>> +        static_branch_disable(&virt_spin_lock_key);
>>> Would it make sense to bring here the other site where the key is
>>> disabled (in kvm_smp_prepare_cpus())?
>> Thanks for point out, I'll do it. Just not clear if I should do that
>> in a separate patch,
>> there is a history about that code:
>>
>> Its original place was here and then moved to kvm_smp_prepare_cpus()
>> by below commit:
>> 34226b6b ("KVM: X86: Fix setup the virt_spin_lock_key before static
>> key get initialized")
>> which fixed jump_label_init() calling late issue.
>>
>> Then 8990cac6 ("x86/jump_label: Initialize static branching early")
>> move jump_label_init()
>> early, so commit 34226b6b could be reverted.
> Which is similar to what you did earlier for Xen.

You remember that, ok, I'll do the same for KVM.

Thanks

Zhenzhong


^ permalink raw reply

* RE: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Dexuan Cui @ 2019-10-07 18:57 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <20191007132414.GA19294@google.com>

> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: Monday, October 7, 2019 6:24 AM
> To: Dexuan Cui <decui@microsoft.com>
> Cc: lorenzo.pieralisi@arm.com; linux-pci@vger.kernel.org; Michael Kelley
> <mikelley@microsoft.com>; linux-hyperv@vger.kernel.org;
> linux-kernel@vger.kernel.org; driverdev-devel@linuxdriverproject.org; Sasha
> Levin <Alexander.Levin@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; KY Srinivasan <kys@microsoft.com>;
> olaf@aepfle.de; apw@canonical.com; jasowang@redhat.com; vkuznets
> <vkuznets@redhat.com>; marcelo.cerri@canonical.com; Stephen Hemminger
> <sthemmin@microsoft.com>; jackm@mellanox.com
> Subject: Re: [PATCH v2] PCI: PM: Move to D0 before calling
> pci_legacy_resume_early()
> 
> On Wed, Aug 14, 2019 at 01:06:55AM +0000, Dexuan Cui wrote:
> >
> > In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> >
> > In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> > but the current code misses the pci_legacy_resume_early() path, so the
> > state remains in PCI_UNKNOWN in that path. As a result, in the resume
> > phase of hibernation, this causes an error for the Mellanox VF driver,
> > which fails to enable MSI-X because pci_msi_supported() is false due
> > to dev->current_state != PCI_D0:
> >
> > mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> > mlx4_core a6d1:00:02.0: Sending reset
> > mlx4_core a6d1:00:02.0: Sending vhcr0
> > mlx4_core a6d1:00:02.0: HCA minimum page size:512
> > mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> > mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode,
> aborting
> > PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> > PM: Device a6d1:00:02.0 failed to thaw: error -95
> >
> > To be more accurate, the "resume" phase means the "thaw" callbacks which
> > run before the system enters hibernation: when the user runs the command
> > "echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
> > all the devices and creates a hibernation image, then the kernel "thaws"
> > the devices including the disk/NIC, writes the memory to the disk, and
> > powers down. This patch fixes the error message for the Mellanox VF driver
> > in this phase.
> >
> > When the system starts again, a fresh kernel starts to run, and when the
> > kernel detects that a hibernation image was saved, the kernel "quiesces"
> > the devices, and then "restores" the devices from the saved image. In this
> > path:
> > device_resume_noirq() -> ... ->
> >   pci_pm_restore_noirq() ->
> >     pci_pm_default_resume_early() ->
> >       pci_power_up() moves the device states back to PCI_D0. This path is
> > not broken and doesn't need my patch.
> >
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> 
> This looks like a bugfix for 5839ee7389e8 ("PCI / PM: Force devices to
> D0 in pci_pm_thaw_noirq()") so maybe it should be marked for stable as
> 5839ee7389e8 was?
> 
> Rafael, could you confirm?
> 
> > ---
> >
> > changes in v2:
> > 	Updated the changelog with more details.
> >
> >  drivers/pci/pci-driver.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index 36dbe960306b..27dfc68db9e7 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device
> *dev)
> >  			return error;
> >  	}
> >
> > -	if (pci_has_legacy_pm_support(pci_dev))
> > -		return pci_legacy_resume_early(dev);
> > -
> >  	/*
> >  	 * pci_restore_state() requires the device to be in D0 (because of MSI
> >  	 * restoration among other things), so force it into D0 in case the
> >  	 * driver's "freeze" callbacks put it into a low-power state directly.
> >  	 */
> >  	pci_set_power_state(pci_dev, PCI_D0);
> > +
> > +	if (pci_has_legacy_pm_support(pci_dev))
> > +		return pci_legacy_resume_early(dev);
> > +
> >  	pci_restore_state(pci_dev);
> >
> >  	if (drv && drv->pm && drv->pm->thaw_noirq)
> > --
> > 2.19.1
> >

Added Rafael to "To".

Thanks,
-- Dexuan


^ permalink raw reply

* RE: [PATCH 0/2] Drivers: hv: vmbus: Miscellaneous improvements
From: Dexuan Cui @ 2019-10-07 17:41 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, vkuznets
In-Reply-To: <20191007163115.26197-1-parri.andrea@gmail.com>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andrea Parri
> Sent: Monday, October 7, 2019 9:31 AM
> 
> Hi all,
> 
> The patchset:
> 
> - simplifies/refactors the VMBus negotiation code by introducing
>   the table of VMBus protocol versions (patch 1/2),
> 
> - enables VMBus protocol versions 5.1 and 5.2 (patch 2/2).
> 
> Thanks,
>   Andrea
> 
> Andrea Parri (2):
>   Drivers: hv: vmbus: Introduce table of VMBus protocol versions
>   Drivers: hv: vmbus: Enable VMBus protocol versions 5.1 and 5.2

Should we add a module parameter to allow the user to specify a lower
protocol version, when the VM runs on the latest host? 

This can be useful for testing and debugging purpose: the variable
"vmbus_proto_version" is referenced by the vmbus driver itself and
some VSC drivers: if we always use the latest available proto version,
some code paths can not be tested on the latest hosts. 

Thanks,
-- Dexuan

^ permalink raw reply

* RE: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Dexuan Cui @ 2019-10-07 17:25 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel@vger.kernel.org,
	linux-hyperv@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, vkuznets
In-Reply-To: <20191007163115.26197-2-parri.andrea@gmail.com>

> From: linux-hyperv-owner@vger.kernel.org
> <linux-hyperv-owner@vger.kernel.org> On Behalf Of Andrea Parri
> Sent: Monday, October 7, 2019 9:31 AM
> ....
> +/*
> + * Table of VMBus versions listed from newest to oldest; the table
> + * must terminate with VERSION_INVAL.
> + */
> +__u32 vmbus_versions[] = {
> +	VERSION_WIN10_V5,

This should be "static"?
 
Thanks,
Dexuan

^ permalink raw reply

* Re: [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Vitaly Kuznetsov @ 2019-10-07 17:14 UTC (permalink / raw)
  To: Andrea Parri, linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Andrea Parri
In-Reply-To: <20191007163115.26197-2-parri.andrea@gmail.com>

Andrea Parri <parri.andrea@gmail.com> writes:

> The technique used to get the next VMBus version seems increasisly
> clumsy as the number of VMBus versions increases.  Performance is
> not a concern since this is only done once during system boot; it's
> just that we'll end up with more lines of code than is really needed.
>
> As an alternative, introduce a table with the version numbers listed
> in order (from the most recent to the oldest).  vmbus_connect() loops
> through the versions listed in the table until it gets an accepted
> connection or gets to the end of the table (invalid version).
>
> Suggested-by: Michael Kelley <mikelley@microsoft.com>
> Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
> ---
>  drivers/hv/connection.c | 51 +++++++++++++++--------------------------
>  include/linux/hyperv.h  |  2 --
>  2 files changed, 19 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index 6e4c015783ffc..90a32c9d79403 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -40,29 +40,19 @@ EXPORT_SYMBOL_GPL(vmbus_connection);
>  __u32 vmbus_proto_version;
>  EXPORT_SYMBOL_GPL(vmbus_proto_version);
>  
> -static __u32 vmbus_get_next_version(__u32 current_version)
> -{
> -	switch (current_version) {
> -	case (VERSION_WIN7):
> -		return VERSION_WS2008;
> -
> -	case (VERSION_WIN8):
> -		return VERSION_WIN7;
> -
> -	case (VERSION_WIN8_1):
> -		return VERSION_WIN8;
> -
> -	case (VERSION_WIN10):
> -		return VERSION_WIN8_1;
> -
> -	case (VERSION_WIN10_V5):
> -		return VERSION_WIN10;
> -
> -	case (VERSION_WS2008):
> -	default:
> -		return VERSION_INVAL;
> -	}
> -}
> +/*
> + * Table of VMBus versions listed from newest to oldest; the table
> + * must terminate with VERSION_INVAL.
> + */
> +__u32 vmbus_versions[] = {
> +	VERSION_WIN10_V5,
> +	VERSION_WIN10,
> +	VERSION_WIN8_1,
> +	VERSION_WIN8,
> +	VERSION_WIN7,
> +	VERSION_WS2008,
> +	VERSION_INVAL
> +};
>  
>  int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>  {
> @@ -169,8 +159,8 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
>   */
>  int vmbus_connect(void)
>  {
> -	int ret = 0;
>  	struct vmbus_channel_msginfo *msginfo = NULL;
> +	int i, ret = 0;
>  	__u32 version;
>  
>  	/* Initialize the vmbus connection */
> @@ -244,21 +234,18 @@ int vmbus_connect(void)
>  	 * version.
>  	 */
>  
> -	version = VERSION_CURRENT;
> +	for (i = 0; ; i++) {
> +		version = vmbus_versions[i];
> +		if (version == VERSION_INVAL)
> +			goto cleanup;

If you use e.g. ARRAY_SIZE() you can get rid of VERSION_INVAL - and make
this code look more natural.
>  
> -	do {
>  		ret = vmbus_negotiate_version(msginfo, version);
>  		if (ret == -ETIMEDOUT)
>  			goto cleanup;
>  
>  		if (vmbus_connection.conn_state == CONNECTED)
>  			break;
> -
> -		version = vmbus_get_next_version(version);
> -	} while (version != VERSION_INVAL);
> -
> -	if (version == VERSION_INVAL)
> -		goto cleanup;
> +	}
>  
>  	vmbus_proto_version = version;
>  	pr_info("Vmbus version:%d.%d\n",
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index b4a017093b697..7073f1eb3618c 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -194,8 +194,6 @@ static inline u32 hv_get_avail_to_write_percent(
>  
>  #define VERSION_INVAL -1
>  
> -#define VERSION_CURRENT VERSION_WIN10_V5
> -
>  /* Make maximum size of pipe payload of 16K */
>  #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)

-- 
Vitaly


^ permalink raw reply

* [PATCH 2/2] Drivers: hv: vmbus: Enable VMBus protocol versions 5.1 and 5.2
From: Andrea Parri @ 2019-10-07 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Andrea Parri
In-Reply-To: <20191007163115.26197-1-parri.andrea@gmail.com>

Hyper-V has added VMBus protocol versions 5.1 and 5.2 in recent release
versions.  Allow Linux guests to negotiate these new protocol versions
on versions of Hyper-V that support them.

Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 drivers/hv/connection.c | 12 +++++++-----
 include/linux/hyperv.h  |  4 ++++
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 90a32c9d79403..d05fef3e09080 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -45,6 +45,8 @@ EXPORT_SYMBOL_GPL(vmbus_proto_version);
  * must terminate with VERSION_INVAL.
  */
 __u32 vmbus_versions[] = {
+	VERSION_WIN10_V5_2,
+	VERSION_WIN10_V5_1,
 	VERSION_WIN10_V5,
 	VERSION_WIN10,
 	VERSION_WIN8_1,
@@ -70,12 +72,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 	msg->vmbus_version_requested = version;
 
 	/*
-	 * VMBus protocol 5.0 (VERSION_WIN10_V5) requires that we must use
-	 * VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
+	 * VMBus protocol 5.0 (VERSION_WIN10_V5) and higher require that we must
+	 * use VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate Contact Message,
 	 * and for subsequent messages, we must use the Message Connection ID
 	 * field in the host-returned Version Response Message. And, with
-	 * VERSION_WIN10_V5, we don't use msg->interrupt_page, but we tell
-	 * the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
+	 * VERSION_WIN10_V5 and higher, we don't use msg->interrupt_page, but we
+	 * tell the host explicitly that we still use VMBUS_MESSAGE_SINT(2) for
 	 * compatibility.
 	 *
 	 * On old hosts, we should always use VMBUS_MESSAGE_CONNECTION_ID (1).
@@ -400,7 +402,7 @@ int vmbus_post_msg(void *buffer, size_t buflen, bool can_sleep)
 		case HV_STATUS_INVALID_CONNECTION_ID:
 			/*
 			 * See vmbus_negotiate_version(): VMBus protocol 5.0
-			 * requires that we must use
+			 * and higher require that we must use
 			 * VMBUS_MESSAGE_CONNECTION_ID_4 for the Initiate
 			 * Contact message, but on old hosts that only
 			 * support VMBus protocol 4.0 or lower, here we get
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7073f1eb3618c..5ecb2ff7cc25d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -183,6 +183,8 @@ static inline u32 hv_get_avail_to_write_percent(
  * 3 . 0  (Windows 8 R2)
  * 4 . 0  (Windows 10)
  * 5 . 0  (Newer Windows 10)
+ * 5 . 1  (Windows 10 RS4)
+ * 5 . 2  (Windows Server 2019, RS5)
  */
 
 #define VERSION_WS2008  ((0 << 16) | (13))
@@ -191,6 +193,8 @@ static inline u32 hv_get_avail_to_write_percent(
 #define VERSION_WIN8_1    ((3 << 16) | (0))
 #define VERSION_WIN10	((4 << 16) | (0))
 #define VERSION_WIN10_V5 ((5 << 16) | (0))
+#define VERSION_WIN10_V5_1 ((5 << 16) | (1))
+#define VERSION_WIN10_V5_2 ((5 << 16) | (2))
 
 #define VERSION_INVAL -1
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH 1/2] Drivers: hv: vmbus: Introduce table of VMBus protocol versions
From: Andrea Parri @ 2019-10-07 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Andrea Parri
In-Reply-To: <20191007163115.26197-1-parri.andrea@gmail.com>

The technique used to get the next VMBus version seems increasisly
clumsy as the number of VMBus versions increases.  Performance is
not a concern since this is only done once during system boot; it's
just that we'll end up with more lines of code than is really needed.

As an alternative, introduce a table with the version numbers listed
in order (from the most recent to the oldest).  vmbus_connect() loops
through the versions listed in the table until it gets an accepted
connection or gets to the end of the table (invalid version).

Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
---
 drivers/hv/connection.c | 51 +++++++++++++++--------------------------
 include/linux/hyperv.h  |  2 --
 2 files changed, 19 insertions(+), 34 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6e4c015783ffc..90a32c9d79403 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -40,29 +40,19 @@ EXPORT_SYMBOL_GPL(vmbus_connection);
 __u32 vmbus_proto_version;
 EXPORT_SYMBOL_GPL(vmbus_proto_version);
 
-static __u32 vmbus_get_next_version(__u32 current_version)
-{
-	switch (current_version) {
-	case (VERSION_WIN7):
-		return VERSION_WS2008;
-
-	case (VERSION_WIN8):
-		return VERSION_WIN7;
-
-	case (VERSION_WIN8_1):
-		return VERSION_WIN8;
-
-	case (VERSION_WIN10):
-		return VERSION_WIN8_1;
-
-	case (VERSION_WIN10_V5):
-		return VERSION_WIN10;
-
-	case (VERSION_WS2008):
-	default:
-		return VERSION_INVAL;
-	}
-}
+/*
+ * Table of VMBus versions listed from newest to oldest; the table
+ * must terminate with VERSION_INVAL.
+ */
+__u32 vmbus_versions[] = {
+	VERSION_WIN10_V5,
+	VERSION_WIN10,
+	VERSION_WIN8_1,
+	VERSION_WIN8,
+	VERSION_WIN7,
+	VERSION_WS2008,
+	VERSION_INVAL
+};
 
 int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
 {
@@ -169,8 +159,8 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo *msginfo, u32 version)
  */
 int vmbus_connect(void)
 {
-	int ret = 0;
 	struct vmbus_channel_msginfo *msginfo = NULL;
+	int i, ret = 0;
 	__u32 version;
 
 	/* Initialize the vmbus connection */
@@ -244,21 +234,18 @@ int vmbus_connect(void)
 	 * version.
 	 */
 
-	version = VERSION_CURRENT;
+	for (i = 0; ; i++) {
+		version = vmbus_versions[i];
+		if (version == VERSION_INVAL)
+			goto cleanup;
 
-	do {
 		ret = vmbus_negotiate_version(msginfo, version);
 		if (ret == -ETIMEDOUT)
 			goto cleanup;
 
 		if (vmbus_connection.conn_state == CONNECTED)
 			break;
-
-		version = vmbus_get_next_version(version);
-	} while (version != VERSION_INVAL);
-
-	if (version == VERSION_INVAL)
-		goto cleanup;
+	}
 
 	vmbus_proto_version = version;
 	pr_info("Vmbus version:%d.%d\n",
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index b4a017093b697..7073f1eb3618c 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -194,8 +194,6 @@ static inline u32 hv_get_avail_to_write_percent(
 
 #define VERSION_INVAL -1
 
-#define VERSION_CURRENT VERSION_WIN10_V5
-
 /* Make maximum size of pipe payload of 16K */
 #define MAX_PIPE_DATA_PAYLOAD		(sizeof(u8) * 16384)
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH 0/2] Drivers: hv: vmbus: Miscellaneous improvements
From: Andrea Parri @ 2019-10-07 16:31 UTC (permalink / raw)
  To: linux-kernel, linux-hyperv
  Cc: K . Y . Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Michael Kelley, Vitaly Kuznetsov, Andrea Parri

Hi all,

The patchset:

- simplifies/refactors the VMBus negotiation code by introducing
  the table of VMBus protocol versions (patch 1/2),

- enables VMBus protocol versions 5.1 and 5.2 (patch 2/2).

Thanks,
  Andrea

Andrea Parri (2):
  Drivers: hv: vmbus: Introduce table of VMBus protocol versions
  Drivers: hv: vmbus: Enable VMBus protocol versions 5.1 and 5.2

 drivers/hv/connection.c | 63 +++++++++++++++++------------------------
 include/linux/hyperv.h  |  6 ++--
 2 files changed, 30 insertions(+), 39 deletions(-)

-- 
2.23.0


^ permalink raw reply

* Re: [PATCH v4 1/4] x86/kvm: Add "nopvspin" parameter to disable PV spinlocks
From: Boris Ostrovsky @ 2019-10-07 14:46 UTC (permalink / raw)
  To: Zhenzhong Duan, linux-kernel
  Cc: vkuznets, linux-hyperv, kvm, kys, haiyangz, sthemmin, sashal,
	tglx, mingo, bp, pbonzini, rkrcmar, sean.j.christopherson,
	wanpengli, jmattson, joro, jgross, sstabellini, peterz,
	Jonathan Corbet, H. Peter Anvin, Will Deacon
In-Reply-To: <b0d6df7b-00ff-cdd8-f9f2-26af73256f5b@oracle.com>

On 10/6/19 3:49 AM, Zhenzhong Duan wrote:
> On 2019/10/4 22:52, Boris Ostrovsky wrote:
>
>> On 10/3/19 10:02 AM, Zhenzhong Duan wrote:
>>>   void __init kvm_spinlock_init(void)
>>>   {
>>> -    /* Does host kernel support KVM_FEATURE_PV_UNHALT? */
>>> -    if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
>>> -        return;
>>> -
>>> -    if (kvm_para_has_hint(KVM_HINTS_REALTIME))
>>> +    /*
>>> +     * Don't use the pvqspinlock code if no KVM_FEATURE_PV_UNHALT
>>> feature
>>> +     * support, or there is REALTIME hints or only 1 vCPU.
>>> +     */
>>> +    if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) ||
>>> +        kvm_para_has_hint(KVM_HINTS_REALTIME) ||
>>> +        num_possible_cpus() == 1) {
>>> +        pr_info("PV spinlocks disabled\n");
>>>           return;
>>> +    }
>>>   -    /* Don't use the pvqspinlock code if there is only 1 vCPU. */
>>> -    if (num_possible_cpus() == 1)
>>> +    if (nopvspin) {
>>> +        pr_info("PV spinlocks disabled forced by \"nopvspin\"
>>> parameter.\n");
>>> +        static_branch_disable(&virt_spin_lock_key);
>> Would it make sense to bring here the other site where the key is
>> disabled (in kvm_smp_prepare_cpus())?
>
> Thanks for point out, I'll do it. Just not clear if I should do that
> in a separate patch,
> there is a history about that code:
>
> Its original place was here and then moved to kvm_smp_prepare_cpus()
> by below commit:
> 34226b6b ("KVM: X86: Fix setup the virt_spin_lock_key before static
> key get initialized")
> which fixed jump_label_init() calling late issue.
>
> Then 8990cac6 ("x86/jump_label: Initialize static branching early")
> move jump_label_init()
> early, so commit 34226b6b could be reverted.


Which is similar to what you did earlier for Xen.


>
>>
>> (and, in fact, shouldn't all of the checks that result in early return
>> above disable the key?)
>
> I think we should enable he key for
> !kvm_para_has_feature(KVM_FEATURE_PV_UNHALT) case,
> there is lock holder preemption issue as qspinlock is fair lock,
> virt_spin_lock()
> is an optimization to that, imaging one pcpu running 10 vcpus of same
> guest
> contending a same lock.

Right. I conflated pv lock and virt_spin_lock_key, and that is wrong.

-boris


>
> For kvm_para_has_hint(KVM_HINTS_REALTIME) case, hypervisor hints there is
> no preemption and we should disable virt_spin_lock_key to use native
> qspinlock.
>
> For the UP case, we don't care virt_spin_lock_key value.
>
> For nopvspin case, we intentionally check native qspinlock code
> performance,
> compare it with PV qspinlock, etc. So virt_spin_lock() optimization
> should be disabled.
>
> Let me know if anything wrong with above understanding. Thanks
>
> Zhenzhong
>


^ permalink raw reply

* Re: [PATCH v2] PCI: PM: Move to D0 before calling pci_legacy_resume_early()
From: Bjorn Helgaas @ 2019-10-07 13:24 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: lorenzo.pieralisi@arm.com, linux-pci@vger.kernel.org,
	Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	driverdev-devel@linuxdriverproject.org, Sasha Levin,
	Haiyang Zhang, KY Srinivasan, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com,
	Stephen Hemminger, jackm@mellanox.com
In-Reply-To: <KU1P153MB016637CAEAD346F0AA8E3801BFAD0@KU1P153MB0166.APCP153.PROD.OUTLOOK.COM>

On Wed, Aug 14, 2019 at 01:06:55AM +0000, Dexuan Cui wrote:
> 
> In pci_legacy_suspend_late(), the device state is moved to PCI_UNKNOWN.
> 
> In pci_pm_thaw_noirq(), the state is supposed to be moved back to PCI_D0,
> but the current code misses the pci_legacy_resume_early() path, so the
> state remains in PCI_UNKNOWN in that path. As a result, in the resume
> phase of hibernation, this causes an error for the Mellanox VF driver,
> which fails to enable MSI-X because pci_msi_supported() is false due
> to dev->current_state != PCI_D0:
> 
> mlx4_core a6d1:00:02.0: Detected virtual function - running in slave mode
> mlx4_core a6d1:00:02.0: Sending reset
> mlx4_core a6d1:00:02.0: Sending vhcr0
> mlx4_core a6d1:00:02.0: HCA minimum page size:512
> mlx4_core a6d1:00:02.0: Timestamping is not supported in slave mode
> mlx4_core a6d1:00:02.0: INTx is not supported in multi-function mode, aborting
> PM: dpm_run_callback(): pci_pm_thaw+0x0/0xd7 returns -95
> PM: Device a6d1:00:02.0 failed to thaw: error -95
> 
> To be more accurate, the "resume" phase means the "thaw" callbacks which
> run before the system enters hibernation: when the user runs the command
> "echo disk > /sys/power/state" for hibernation, first the kernel "freezes"
> all the devices and creates a hibernation image, then the kernel "thaws"
> the devices including the disk/NIC, writes the memory to the disk, and
> powers down. This patch fixes the error message for the Mellanox VF driver
> in this phase.
> 
> When the system starts again, a fresh kernel starts to run, and when the
> kernel detects that a hibernation image was saved, the kernel "quiesces"
> the devices, and then "restores" the devices from the saved image. In this
> path:
> device_resume_noirq() -> ... ->
>   pci_pm_restore_noirq() ->
>     pci_pm_default_resume_early() ->
>       pci_power_up() moves the device states back to PCI_D0. This path is
> not broken and doesn't need my patch.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>

This looks like a bugfix for 5839ee7389e8 ("PCI / PM: Force devices to
D0 in pci_pm_thaw_noirq()") so maybe it should be marked for stable as
5839ee7389e8 was?

Rafael, could you confirm?

> ---
> 
> changes in v2:
> 	Updated the changelog with more details.
> 
>  drivers/pci/pci-driver.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 36dbe960306b..27dfc68db9e7 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1074,15 +1074,16 @@ static int pci_pm_thaw_noirq(struct device *dev)
>  			return error;
>  	}
>  
> -	if (pci_has_legacy_pm_support(pci_dev))
> -		return pci_legacy_resume_early(dev);
> -
>  	/*
>  	 * pci_restore_state() requires the device to be in D0 (because of MSI
>  	 * restoration among other things), so force it into D0 in case the
>  	 * driver's "freeze" callbacks put it into a low-power state directly.
>  	 */
>  	pci_set_power_state(pci_dev, PCI_D0);
> +
> +	if (pci_has_legacy_pm_support(pci_dev))
> +		return pci_legacy_resume_early(dev);
> +
>  	pci_restore_state(pci_dev);
>  
>  	if (drv && drv->pm && drv->pm->thaw_noirq)
> -- 
> 2.19.1
> 

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox