Linux-HyperV List
* [GIT PULL] Hyper-V commits for 5.1
From: Sasha Levin @ 2019-04-17  1:34 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, linux-hyperv, kys, haiyangz, sthemmin, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

The following changes since commit 9e98c678c2d6ae3a17cb2de55d17f69dddaa231b:

  Linux 5.1-rc1 (2019-03-17 14:22:26 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git tags/hyperv-fixes-signed

for you to fetch changes up to a0033bd1eae4650b69be07c17cb87393da584563:

  Drivers: hv: vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup() (2019-04-13 09:36:35 -0400)

- ----------------------------------------------------------------
Three fixes:

 1. Fix for a race condition in the hyper-v ringbuffer code by Kimberly
Brown.
 2. Fix to show monitor data only when monitor pages are actually
allocated, also by Kimberly Brown.
 3. Fix cpu reference counting in the vmbus code by Dexuan Cui.

- ----------------------------------------------------------------
Dexuan Cui (1):
      Drivers: hv: vmbus: Remove the undesired put_cpu_ptr() in hv_synic_cleanup()

Kimberly Brown (4):
      Drivers: hv: vmbus: Expose monitor data only when monitor pages are used
      Drivers: hv: vmbus: Refactor chan->state if statement
      Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
      Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutex

 Documentation/ABI/stable/sysfs-bus-vmbus |  12 ++-
 drivers/hv/channel_mgmt.c                |   3 +
 drivers/hv/hv.c                          |   1 -
 drivers/hv/hyperv_vmbus.h                |   3 +
 drivers/hv/ring_buffer.c                 |  22 +++-
 drivers/hv/vmbus_drv.c                   | 166 +++++++++++++++++++++++++------
 include/linux/hyperv.h                   |   7 +-
 7 files changed, 175 insertions(+), 39 deletions(-)
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAly2gjwACgkQ3qZv95d3
LNwtLg//UPVpBuilk7pzyumaEYNNeVcq781h/cghAOxiTDPlCI/qAkshdPj9ePAI
UWmBl/lUTXWcN26Xb2NLF9VulXfTFCiphQdgYMQEVslnOvbYLMLkIOOkrkovT6ie
OE5x1j+2S2uF+nwZvLq+FYuS5OiCBx8De1HaCUJQjh1dnrcdjCBTk+idE12tdqa6
pXO32CBKLUK0AEo9yHfeyg9RRsXV89u0wDEOhS160sqx5B6o1TevGOxkPDDQKrKG
mo80aY17MV3ljcODt4pqCNz1rITV6wwZ2oTz1sEv+5nMsqGZ9dcpXYQiQp0huaVl
/j4v/xY9gUDhGYvixoT8Mn87zc7y0y6o+VMcoM7YO7NJkbDA3XN6UaGzXdon061U
w25JBpWdHmot+5PxNafp708V7OBcWpbCoEFhSKvzKcYZlnGo4OA0om966v1WsW2K
ssXEPS4qdwtRM1dyK9c/Lt9HgaZWXzpIzj2bMxVjYPMeAmOd6W0QlfnuZuxYHmRd
aoZswMwFiC2lyOoHtbo50EzeKLwxVju+ToHFTf2jRXnjsLvDXkE6FmX9DSYnwvnE
Vs5qCvW8ynOPhTWAn6eKl0Ysb12+m36w+MEol7JTUlw4XPw7ahHsuxsN4TJQnshD
mZYG2Mrc38B/YOytT0asT5Z9U5ob9U7446I2Rn3XbdjVs9EEBAA=
=gXIY
-----END PGP SIGNATURE-----


* RE: [PATCH] x86/hyper-v: implement EOI assist
From: Long Li @ 2019-04-16  0:21 UTC (permalink / raw)
  To: vkuznets, Simon Xiao
  Cc: x86@kernel.org, linux-kernel@vger.kernel.org, KY Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Sasha Levin, Michael Kelley,
	linux-hyperv@vger.kernel.org
In-Reply-To: <87v9zfh2rr.fsf@vitty.brq.redhat.com>

>>>Subject: Re: [PATCH] x86/hyper-v: implement EOI assist
>>>
>>>Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>>>
>>>> Hyper-V TLFS suggests an optimization to avoid imminent VMExit on EOI:
>>>> "The OS performs an EOI by atomically writing zero to the EOI Assist
>>>> field of the virtual VP assist page and checking whether the "No EOI
>>>required"
>>>> field was previously zero. If it was, the OS must write to the
>>>> HV_X64_APIC_EOI MSR thereby triggering an intercept into the
>>>hypervisor."
>>>>
>>>> Implement the optimization in Linux.
>>>>
>>>
>>>Simon, Long,
>>>
>>>did you get a chance to run some tests with this?

I have run some tests on Azure L80s_v2.

With 10 NVMe disks on raid0 and formatted to EXT4, I'm getting 2.6m max IOPS with the patch, compared to 2.55m IOPS before.

The VM has been running stable. Thank you!

Tested-by: Long Li <longli@microsoft.com>

>>>
>>>--
>>>Vitaly


* Re: [PATCH] x86/hyper-v: implement EOI assist
From: Vitaly Kuznetsov @ 2019-04-15 12:27 UTC (permalink / raw)
  To: Long Li, Simon Xiao
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Michael Kelley (EOSG),
	linux-hyperv
In-Reply-To: <20190403170309.4107-1-vkuznets@redhat.com>

Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Hyper-V TLFS suggests an optimization to avoid imminent VMExit on EOI:
> "The OS performs an EOI by atomically writing zero to the EOI Assist field
> of the virtual VP assist page and checking whether the "No EOI required"
> field was previously zero. If it was, the OS must write to the
> HV_X64_APIC_EOI MSR thereby triggering an intercept into the hypervisor."
>
> Implement the optimization in Linux.
>
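
For readers following the thread, the TLFS text quoted above can be
sketched in userspace C as below. This is only an illustration, not the
patch itself: the struct, the function name and the assumption that bit 0
of the assist word is the "No EOI required" flag are all hypothetical
stand-ins, and C11 atomic_exchange() takes the place of the kernel's
atomic exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the EOI assist word of the VP assist page. */
struct sketch_vp_assist {
	_Atomic uint32_t apic_assist;
};

/* Assumption: bit 0 is the "No EOI required" flag set by the hypervisor. */
#define SKETCH_NO_EOI_REQUIRED 0x1u

/*
 * Atomically write zero to the assist word and look at the old value.
 * Returns true when the flag was previously zero, i.e. when the guest
 * still has to write the EOI MSR (and take the intercept).
 */
static bool sketch_eoi_needs_msr_write(struct sketch_vp_assist *page)
{
	uint32_t old;

	assert(page != NULL);
	old = atomic_exchange(&page->apic_assist, 0);
	return (old & SKETCH_NO_EOI_REQUIRED) == 0;
}
```

A caller would skip the MSR write whenever this helper returns false;
that skipped write is the VMExit the optimization avoids.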

Simon, Long,

did you get a chance to run some tests with this?

-- 
Vitaly


* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
From: Vitaly Kuznetsov @ 2019-04-12  7:52 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel
In-Reply-To: <20190412072401.GA69620@maya190131.isni1t2eisqetojrdim5hhf1se.xx.internal.cloudapp.net>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
>> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
>> 
>> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>> >  u32 hv_max_vp_index;
>> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >  
>> > +struct kmem_cache *cachep;
>> > +EXPORT_SYMBOL_GPL(cachep);
>> > +
>> >  static int hv_cpu_init(unsigned int cpu)
>> >  {
>> >  	u64 msr_vp_index;
>> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> >  	void **input_arg;
>> > -	struct page *pg;
>> >  
>> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>> > -	pg = alloc_page(GFP_KERNEL);
>> > -	if (unlikely(!pg))
>> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
>> 
>> I'm not sure use of kmem_cache is justified here: pages we allocate are
>> not cache-line and all these allocations are supposed to persist for the
>> lifetime of the guest. In case you think that even on x86 it will be
>> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
>> instead.
>> 
> Thank you for your feedback, Vitaly!
>
> Will you please tell me how cache-line relates to kmem_cache?
>
> I understand that alloc_pages() would work when PAGE_SIZE <=
> HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
> HV_HYP_PAGE_SIZE.

Sorry, my bad: I meant to say "not cache-like" (these allocations are
not 'cache') but the typo made it completely incomprehensible. 

>
>> Also, in case the idea is to generalize stuff, what will happen if
>> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
>> 
>> I think we can leave hypercall arguments, vp_assist and similar pages
>> alone for now: the code is not going to be shared among architectures
>> anyways.
>> 
> About the alignment, kmem_cache_create() aligns memory with its third
> parameter, align.

Yes, I know, I was trying to think about a (hypothetical) situation when
page sizes differ: what would be the memory alignment requirements from
the hypervisor for e.g. hypercall arguments? In case it's always
HV_HYP_PAGE_SIZE we're good but could it be PAGE_SIZE (for e.g. TLB
flush hypercall)? I don't know. For x86 this discussion probably makes
no sense. I'm, however, struggling to understand what benefit we will
get from the change. Maybe just leave it as-is for now and fix
arch-independent code only? And later, if we decide to generalize this
code, take another approach? (Not insisting, just a suggestion)
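
To make the alignment question concrete, here is a userspace analogy
only. It does not use kmem_cache or any other kernel API; it relies on
C11 aligned_alloc(), and the SKETCH_* constant and function names are
hypothetical. The point it illustrates is that the 4096-byte alignment is
requested explicitly instead of being inherited from the guest PAGE_SIZE.
It says nothing about what alignment the hypervisor actually requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: the hypervisor page size is 4096 bytes. */
#define SKETCH_HV_HYP_PAGE_SIZE 4096u

/* The mask test below is only valid for a power-of-two page size. */
static_assert((SKETCH_HV_HYP_PAGE_SIZE & (SKETCH_HV_HYP_PAGE_SIZE - 1)) == 0,
	      "hypervisor page size must be a power of two");

/* Allocate one zeroed buffer that is sized and aligned to the HV page. */
static void *sketch_alloc_hv_page(void)
{
	void *p = aligned_alloc(SKETCH_HV_HYP_PAGE_SIZE,
				SKETCH_HV_HYP_PAGE_SIZE);

	if (p)
		memset(p, 0, SKETCH_HV_HYP_PAGE_SIZE);
	return p;
}

/* True when the pointer sits on a hypervisor page boundary. */
static bool sketch_is_hv_page_aligned(const void *p)
{
	return ((uintptr_t)p & (SKETCH_HV_HYP_PAGE_SIZE - 1)) == 0;
}
```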

>
>> > @@ -338,7 +349,10 @@ void __init hyperv_init(void)
>> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>> >  
>> > -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
>> > +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
>> > +	if (hv_hypercall_pg)
>> > +		set_memory_x((unsigned long)hv_hypercall_pg, 1);
>> 
>> _RX is not writeable, right?
>> 
> Yes, you are correct. I should use set_memory_ro() in addition to
> set_memory_x().
>
>> > @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
>> >  	 * let hypercall operations fail safely rather than
>> >  	 * panic the kernel for using invalid hypercall page
>> >  	 */
>> > +	kmem_cache_free(cachep, hv_hypercall_pg);
>> 
>> Please don't do that: hyperv_cleanup() is called on kexec/kdump and
>> we're trying to do the bare minimum to allow next kernel to boot. Doing
>> excessive work here will likely lead to subsequent problems (we're
>> already crashing in the kdump case!).
>> 
> Thank you for the explanation! I will remove that.
>

-- 
Vitaly


* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
From: Maya Nakamura @ 2019-04-12  7:24 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel
In-Reply-To: <87wok8it8p.fsf@vitty.brq.redhat.com>

On Fri, Apr 05, 2019 at 01:31:02PM +0200, Vitaly Kuznetsov wrote:
> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
> 
> > @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> >  u32 hv_max_vp_index;
> >  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  
> > +struct kmem_cache *cachep;
> > +EXPORT_SYMBOL_GPL(cachep);
> > +
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> >  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> >  	void **input_arg;
> > -	struct page *pg;
> >  
> >  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> > -	pg = alloc_page(GFP_KERNEL);
> > -	if (unlikely(!pg))
> > +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
> 
> I'm not sure use of kmem_cache is justified here: pages we allocate are
> not cache-line and all these allocations are supposed to persist for the
> lifetime of the guest. In case you think that even on x86 it will be
> possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
> instead.
> 
Thank you for your feedback, Vitaly!

Will you please tell me how cache-line relates to kmem_cache?

I understand that alloc_pages() would work when PAGE_SIZE <=
HV_HYP_PAGE_SIZE, but I think that it would not work if PAGE_SIZE >
HV_HYP_PAGE_SIZE.

> Also, in case the idea is to generalize stuff, what will happen if
> PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?
> 
> I think we can leave hypercall arguments, vp_assist and similar pages
> alone for now: the code is not going to be shared among architectures
> anyways.
> 
About the alignment, kmem_cache_create() aligns memory with its third
parameter, align.

> > @@ -338,7 +349,10 @@ void __init hyperv_init(void)
> >  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
> >  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
> >  
> > -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
> > +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
> > +	if (hv_hypercall_pg)
> > +		set_memory_x((unsigned long)hv_hypercall_pg, 1);
> 
> _RX is not writeable, right?
> 
Yes, you are correct. I should use set_memory_ro() in addition to
set_memory_x().

> > @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
> >  	 * let hypercall operations fail safely rather than
> >  	 * panic the kernel for using invalid hypercall page
> >  	 */
> > +	kmem_cache_free(cachep, hv_hypercall_pg);
> 
> Please don't do that: hyperv_cleanup() is called on kexec/kdump and
> we're trying to do the bare minimum to allow next kernel to boot. Doing
> excessive work here will likely lead to subsequent problems (we're
> already crashing in the kdump case!).
> 
Thank you for the explanation! I will remove that.



* RE: [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
From: Vitaly Kuznetsov @ 2019-04-12  6:58 UTC (permalink / raw)
  To: Michael Kelley, m.maya.nakamura
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org
In-Reply-To: <DM5PR2101MB091843B6DD7A11C2C27917F1D7280@DM5PR2101MB0918.namprd21.prod.outlook.com>

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com>  Sent: Friday, April 5, 2019 4:11 AM
>> >
>> > @@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
>> >  	do {
>> >  		diff = end > cur ? end - cur : 0;
>> >
>> > -		gva_list[gva_n] = cur & PAGE_MASK;
>> > +		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;
>> 
>> I'm not sure this is correct: here we're expressing guest virtual
>> addresses in need of flushing, this should be unrelated to the
>> hypervisor page size.
>> 
>
> I talked to the Hyper-V guys about this.  They have not implemented
> the HvFlushVirtualAddressList hypercalls for ARM64 yet.  But when they
> do, they expect the GVA list will need to be in terms of the Hyper-V
> page size.  They don't want to have to figure out the guest page size
> and adjust their arithmetic on a per-guest basis.

Understood, thanks! However, what I wanted to say is that this code is
x86-specific (arch/x86/hyperv/mmu.c) and the implementation is
hard-wired to the spec: we use the lower 12 bits (there's even a
comment in the code about this, which the patch doesn't change) to encode
the number of pages to flush. We can speculate that when these
hypercalls are implemented on ARM they'll have similar requirements, but
I'd suggest we leave it as-is for now and fix this when (and, actually,
if) we decide to move this to the arch-independent part. We may need to
make additional adjustments, and we don't know about them yet because
there's no spec published.
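
A standalone sketch of that encoding, for reference. It mirrors the loop
body visible in the quoted hunk, but the SKETCH_* constants, the function
name and the loop terminator (`while (cur < end)`) are assumptions made so
that it compiles in userspace; the 12-bit shift is hard-coded exactly
because the count lives in the lower 12 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so the lower 12 bits are free for a count. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))
/* Each entry covers the base page plus up to 4095 additional pages. */
#define SKETCH_FLUSH_UNIT (4096 * SKETCH_PAGE_SIZE)

/* Fill gva_list[] from 'offset' on; returns the number of entries used. */
static int sketch_fill_gva_list(uint64_t gva_list[], int offset,
				uint64_t start, uint64_t end)
{
	int gva_n = offset;
	uint64_t cur = start, diff;

	assert(gva_list != NULL);
	do {
		diff = end > cur ? end - cur : 0;

		/* Upper bits: page-aligned address of the first page. */
		gva_list[gva_n] = cur & SKETCH_PAGE_MASK;
		/* Lower 12 bits: number of additional pages to flush. */
		if (diff >= SKETCH_FLUSH_UNIT)
			gva_list[gva_n] |= ~SKETCH_PAGE_MASK;
		else if (diff)
			gva_list[gva_n] |= (diff - 1) >> SKETCH_PAGE_SHIFT;

		cur += SKETCH_FLUSH_UNIT;
		gva_n++;
	} while (cur < end);

	return gva_n - offset;
}
```

For the range 0x1000..0x4000 this yields a single entry, 0x1002: base
page 0x1000 plus two additional pages.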

-- 
Vitaly


* Re: [PATCH v3 0/3] Drivers: hv: vmbus: Fix a race condition in "_show" functions
From: Sasha Levin @ 2019-04-10 22:59 UTC (permalink / raw)
  To: Kimberly Brown
  Cc: Michael Kelley, Long Li, Sasha Levin, Stephen Hemminger,
	Dexuan Cui, K. Y. Srinivasan, Haiyang Zhang, linux-hyperv,
	linux-kernel
In-Reply-To: <cover.1552592620.git.kimbrownkd@gmail.com>

On Thu, Mar 14, 2019 at 04:04:52PM -0400, Kimberly Brown wrote:
>This patchset fixes a race condition in the "_show" functions that
>access the channel ring buffers.
>
>Changes in v3:
>Patch 1: Drivers: hv: vmbus: Refactor chan->state if statement
> - Added the “reviewed-by” line from v2.
>
>Patch 2: Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
> - This patch is new. This change allows the new mutex locks in patch 3
>   to be initialized when the channel is initialized.
>
>Patch 3: Drivers: hv: vmbus: Fix race condition with new
>         ring_buffer_info mutex
> - Added two ring buffer info mutex locks instead of the single channel
>   mutex lock that was proposed in v2.
> - Changed the mutex acquire/release calls as needed for the new ring
>   buffer info mutex locks.
> - Updated the commit message.
>
>
>Changes in v2:
> - In v1, I proposed using “vmbus_connection.channel_mutex” in the
>   “_show” functions to prevent the race condition. However, using this
>   mutex could result in a deadlock, so a new approach is proposed in
>   this patchset.
> - Patch 1 is new and consists of refactoring an if statement.
> - Patch 2 introduces a new mutex lock in the “vmbus_channel” struct,
>   which is used to eliminate the race condition.
>
>Kimberly Brown (3):
>  Drivers: hv: vmbus: Refactor chan->state if statement
>  Drivers: hv: vmbus: Set ring_info field to 0 and remove memset
>  Drivers: hv: vmbus: Fix race condition with new ring_buffer_info mutex
>
> drivers/hv/channel_mgmt.c |  2 +
> drivers/hv/hyperv_vmbus.h |  1 +
> drivers/hv/ring_buffer.c  | 22 ++++++++--
> drivers/hv/vmbus_drv.c    | 89 +++++++++++++++++++++++++++------------
> include/linux/hyperv.h    |  7 ++-
> 5 files changed, 88 insertions(+), 33 deletions(-)

Queued up, thanks Kimberly!
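
For anyone reading the archive later, the locking pattern described in
the cover letter can be sketched in userspace roughly as follows. This is
an assumption-laden illustration, not the queued code: pthread mutexes
stand in for kernel mutexes, and every sketch_* name is hypothetical. The
idea shown is that the "show" path and the teardown path take the same
per-ring lock, and the "show" path checks that the buffer still exists
before reading it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a ring buffer info structure. */
struct sketch_ring_info {
	pthread_mutex_t lock;
	uint32_t *ring_buffer;	/* NULL once the ring has been torn down */
};

static int sketch_ring_init(struct sketch_ring_info *ri, uint32_t value)
{
	pthread_mutex_init(&ri->lock, NULL);
	ri->ring_buffer = malloc(sizeof(*ri->ring_buffer));
	if (!ri->ring_buffer)
		return -ENOMEM;
	*ri->ring_buffer = value;
	return 0;
}

/* Teardown: free and clear the pointer while holding the lock. */
static void sketch_ring_cleanup(struct sketch_ring_info *ri)
{
	pthread_mutex_lock(&ri->lock);
	free(ri->ring_buffer);
	ri->ring_buffer = NULL;
	pthread_mutex_unlock(&ri->lock);
}

/* "show" path: read only if the buffer is still there. */
static int sketch_ring_show(struct sketch_ring_info *ri, uint32_t *out)
{
	int ret = -EINVAL;

	assert(ri != NULL && out != NULL);
	pthread_mutex_lock(&ri->lock);
	if (ri->ring_buffer) {
		*out = *ri->ring_buffer;
		ret = 0;
	}
	pthread_mutex_unlock(&ri->lock);
	return ret;
}
```

Without the shared lock, a reader could pass the NULL check and then
dereference the buffer after the teardown path has freed it.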

--
Thanks,
Sasha


* Re: [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
From: Vitaly Kuznetsov @ 2019-04-05 11:31 UTC (permalink / raw)
  To: Maya Nakamura, mikelley, kys, haiyangz, sthemmin, sashal
  Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <bdbacc872e369762a877af4415ad1b07054826db.1554426040.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Switch from the function that allocates a single Linux guest page to a
> different one to use a Hyper-V page because the guest page size and
> hypervisor page size concepts are different, even though they happen to
> be the same value on x86.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/hyperv/hv_init.c | 38 +++++++++++++++++++++++++++-----------
>  1 file changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e4ba467a9fc6..5f946135aa18 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -31,6 +31,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/slab.h>
>  #include <linux/cpuhotplug.h>
> +#include <asm/set_memory.h>
>  
>  #ifdef CONFIG_HYPERV_TSCPAGE
>  
> @@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  
> +struct kmem_cache *cachep;
> +EXPORT_SYMBOL_GPL(cachep);
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
>  	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>  	void **input_arg;
> -	struct page *pg;
>  
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> -	pg = alloc_page(GFP_KERNEL);
> -	if (unlikely(!pg))
> +	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);

I'm not sure use of kmem_cache is justified here: pages we allocate are
not cache-line and all these allocations are supposed to persist for the
lifetime of the guest. In case you think that even on x86 it will be
possible to see PAGE_SIZE != HV_HYP_PAGE_SIZE you can use alloc_pages()
instead.

Also, in case the idea is to generalize stuff, what will happen if
PAGE_SIZE > HV_HYP_PAGE_SIZE? Who will guarantee proper alignment?

I think we can leave hypercall arguments, vp_assist and similar pages
alone for now: the code is not going to be shared among architectures
anyways.

> +
> +	if (unlikely(!*input_arg))
>  		return -ENOMEM;
> -	*input_arg = page_address(pg);
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> @@ -122,14 +125,12 @@ static int hv_cpu_init(unsigned int cpu)
>  		return 0;
>  
>  	if (!*hvp)
> -		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> +		*hvp = kmem_cache_alloc(cachep, GFP_KERNEL);
>  
>  	if (*hvp) {
>  		u64 val;
>  
> -		val = vmalloc_to_pfn(*hvp);
> -		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
> -			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
> +		val = virt_to_phys(*hvp) | HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
>  
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
>  	}
> @@ -233,17 +234,22 @@ static int hv_cpu_die(unsigned int cpu)
>  	unsigned long flags;
>  	void **input_arg;
>  	void *input_pg = NULL;
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  
>  	local_irq_save(flags);
>  	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>  	input_pg = *input_arg;
>  	*input_arg = NULL;
>  	local_irq_restore(flags);
> -	free_page((unsigned long)input_pg);
> +	kmem_cache_free(cachep, input_pg);
> +	input_pg = NULL;
>  
>  	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>  		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
>  
> +	kmem_cache_free(cachep, *hvp);
> +	*hvp = NULL;
> +
>  	if (hv_reenlightenment_cb == NULL)
>  		return 0;
>  
> @@ -325,6 +331,11 @@ void __init hyperv_init(void)
>  		goto free_vp_index;
>  	}
>  
> +	cachep = kmem_cache_create("hyperv_pages", HV_HYP_PAGE_SIZE,
> +				   HV_HYP_PAGE_SIZE, 0, NULL);
> +	if (!cachep)
> +		goto free_vp_assist_page;
> +
>  	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>  				  hv_cpu_init, hv_cpu_die);
>  	if (cpuhp < 0)
> @@ -338,7 +349,10 @@ void __init hyperv_init(void)
>  	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
>  
> -	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
> +	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
> +	if (hv_hypercall_pg)
> +		set_memory_x((unsigned long)hv_hypercall_pg, 1);

_RX is not writeable, right?

> +
>  	if (hv_hypercall_pg == NULL) {
>  		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
>  		goto remove_cpuhp_state;
> @@ -346,7 +360,8 @@ void __init hyperv_init(void)
>  
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	hypercall_msr.enable = 1;
> -	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> +	hypercall_msr.guest_physical_address = virt_to_phys(hv_hypercall_pg) >>
> +		HV_HYP_PAGE_SHIFT;
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  
>  	hv_apic_init();
> @@ -416,6 +431,7 @@ void hyperv_cleanup(void)
>  	 * let hypercall operations fail safely rather than
>  	 * panic the kernel for using invalid hypercall page
>  	 */
> +	kmem_cache_free(cachep, hv_hypercall_pg);

Please don't do that: hyperv_cleanup() is called on kexec/kdump and
we're trying to do the bare minimum to allow next kernel to boot. Doing
excessive work here will likely lead to subsequent problems (we're
already crashing in the kdump case!).

>  	hv_hypercall_pg = NULL;
>  
>  	/* Reset the hypercall page */

-- 
Vitaly


* Re: [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
From: Vitaly Kuznetsov @ 2019-04-05 11:10 UTC (permalink / raw)
  To: Maya Nakamura, mikelley, kys, haiyangz, sthemmin, sashal
  Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <3bc5d60092473815fbd90422875233fb6075285b.1554426040.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Replace PAGE_SHIFT, PAGE_SIZE, and PAGE_MASK with HV_HYP_PAGE_SHIFT,
> HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK, respectively, because the guest
> page size and hypervisor page size concepts are different, even though
> they happen to be the same value on x86.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/hyperv/mmu.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e65d7fe6489f..175f6dcc7362 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -15,7 +15,7 @@
>  #include <asm/trace/hyperv.h>
>  
>  /* Each gva in gva_list encodes up to 4096 pages to flush */
> -#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
> +#define HV_TLB_FLUSH_UNIT (4096 * HV_HYP_PAGE_SIZE)
>  
>  static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>  				      const struct flush_tlb_info *info);
> @@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
>  	do {
>  		diff = end > cur ? end - cur : 0;
>  
> -		gva_list[gva_n] = cur & PAGE_MASK;
> +		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;

I'm not sure this is correct: here we're expressing guest virtual
addresses in need of flushing, this should be unrelated to the
hypervisor page size.

>  		/*
>  		 * Lower 12 bits encode the number of additional
>  		 * pages to flush (in addition to the 'cur' page).
>  		 */
>  		if (diff >= HV_TLB_FLUSH_UNIT)
> -			gva_list[gva_n] |= ~PAGE_MASK;
> +			gva_list[gva_n] |= ~HV_HYP_PAGE_MASK;
>  		else if (diff)
> -			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
> +			gva_list[gva_n] |= (diff - 1) >> HV_HYP_PAGE_SHIFT;
>  
>  		cur += HV_TLB_FLUSH_UNIT;
>  		gva_n++;
> @@ -129,7 +129,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
>  	 * We can flush not more than max_gvas with one hypercall. Flush the
>  	 * whole address space if we were asked to do more.
>  	 */
> -	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
> +	max_gvas = (HV_HYP_PAGE_SIZE - sizeof(*flush)) /
> +		    sizeof(flush->gva_list[0]);
>  
>  	if (info->end == TLB_FLUSH_ALL) {
>  		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> @@ -200,9 +201,9 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
>  	 * whole address space if we were asked to do more.
>  	 */
>  	max_gvas =
> -		(PAGE_SIZE - sizeof(*flush) - nr_bank *
> +		(HV_HYP_PAGE_SIZE - sizeof(*flush) - nr_bank *
>  		 sizeof(flush->hv_vp_set.bank_contents[0])) /
> -		sizeof(flush->gva_list[0]);
> +		 sizeof(flush->gva_list[0]);
>  
>  	if (info->end == TLB_FLUSH_ALL) {
>  		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;

-- 
Vitaly


* [PATCH 6/6] Input: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-04-05  1:20 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/input/serio/hyperv-keyboard.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/input/serio/hyperv-keyboard.c b/drivers/input/serio/hyperv-keyboard.c
index a8b9be3e28db..e1b0feeaeb91 100644
--- a/drivers/input/serio/hyperv-keyboard.c
+++ b/drivers/input/serio/hyperv-keyboard.c
@@ -83,8 +83,8 @@ struct synth_kbd_keystroke {
 
 #define HK_MAXIMUM_MESSAGE_SIZE 256
 
-#define KBD_VSC_SEND_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
-#define KBD_VSC_RECV_RING_BUFFER_SIZE		(10 * PAGE_SIZE)
+#define KBD_VSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define KBD_VSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 #define XTKBD_EMUL0     0xe0
 #define XTKBD_EMUL1     0xe1
-- 
2.17.1
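
A quick arithmetic check of the size change, as a sketch. The SKETCH_*
names and the helper are hypothetical; the 64 KiB guest page size is just
an example of a configuration where the old expression would give a
different ring size.

```c
#include <assert.h>
#include <stddef.h>

/* New definition from the patch: independent of the guest page size. */
#define SKETCH_RING_BUFFER_SIZE (40 * 1024)

/* With 4 KiB guest pages the old and new definitions agree. */
static_assert(SKETCH_RING_BUFFER_SIZE == 10 * 4096,
	      "40 KiB equals ten 4 KiB pages");

/* Old definition: ten guest pages, whatever the guest page size is. */
static size_t sketch_old_ring_size(size_t guest_page_size)
{
	return 10 * guest_page_size;
}
```

With a 65536-byte guest page, sketch_old_ring_size() gives 655360 bytes
where the constant stays at 40960.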



* [PATCH 5/6] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Maya Nakamura @ 2019-04-05  1:19 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Define the ring buffer size as a constant expression because it should
not depend on the guest page size.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hid/hid-hyperv.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
index 704049e62d58..4d0b63bf8b33 100644
--- a/drivers/hid/hid-hyperv.c
+++ b/drivers/hid/hid-hyperv.c
@@ -112,8 +112,8 @@ struct synthhid_input_report {
 
 #pragma pack(pop)
 
-#define INPUTVSC_SEND_RING_BUFFER_SIZE		(10*PAGE_SIZE)
-#define INPUTVSC_RECV_RING_BUFFER_SIZE		(10*PAGE_SIZE)
+#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
+#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
 
 
 enum pipe_prot_msg_type {
-- 
2.17.1



* [PATCH 4/6] x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
From: Maya Nakamura @ 2019-04-05  1:17 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Replace PAGE_SHIFT, PAGE_SIZE, and PAGE_MASK with HV_HYP_PAGE_SHIFT,
HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK, respectively, because the guest
page size and hypervisor page size concepts are different, even though
they happen to be the same value on x86.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/hyperv/mmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e65d7fe6489f..175f6dcc7362 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -15,7 +15,7 @@
 #include <asm/trace/hyperv.h>
 
 /* Each gva in gva_list encodes up to 4096 pages to flush */
-#define HV_TLB_FLUSH_UNIT (4096 * PAGE_SIZE)
+#define HV_TLB_FLUSH_UNIT (4096 * HV_HYP_PAGE_SIZE)
 
 static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 				      const struct flush_tlb_info *info);
@@ -32,15 +32,15 @@ static inline int fill_gva_list(u64 gva_list[], int offset,
 	do {
 		diff = end > cur ? end - cur : 0;
 
-		gva_list[gva_n] = cur & PAGE_MASK;
+		gva_list[gva_n] = cur & HV_HYP_PAGE_MASK;
 		/*
 		 * Lower 12 bits encode the number of additional
 		 * pages to flush (in addition to the 'cur' page).
 		 */
 		if (diff >= HV_TLB_FLUSH_UNIT)
-			gva_list[gva_n] |= ~PAGE_MASK;
+			gva_list[gva_n] |= ~HV_HYP_PAGE_MASK;
 		else if (diff)
-			gva_list[gva_n] |= (diff - 1) >> PAGE_SHIFT;
+			gva_list[gva_n] |= (diff - 1) >> HV_HYP_PAGE_SHIFT;
 
 		cur += HV_TLB_FLUSH_UNIT;
 		gva_n++;
@@ -129,7 +129,8 @@ static void hyperv_flush_tlb_others(const struct cpumask *cpus,
 	 * We can flush not more than max_gvas with one hypercall. Flush the
 	 * whole address space if we were asked to do more.
 	 */
-	max_gvas = (PAGE_SIZE - sizeof(*flush)) / sizeof(flush->gva_list[0]);
+	max_gvas = (HV_HYP_PAGE_SIZE - sizeof(*flush)) /
+		    sizeof(flush->gva_list[0]);
 
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
@@ -200,9 +201,9 @@ static u64 hyperv_flush_tlb_others_ex(const struct cpumask *cpus,
 	 * whole address space if we were asked to do more.
 	 */
 	max_gvas =
-		(PAGE_SIZE - sizeof(*flush) - nr_bank *
+		(HV_HYP_PAGE_SIZE - sizeof(*flush) - nr_bank *
 		 sizeof(flush->hv_vp_set.bank_contents[0])) /
-		sizeof(flush->gva_list[0]);
+		 sizeof(flush->gva_list[0]);
 
 	if (info->end == TLB_FLUSH_ALL) {
 		flush->flags |= HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
-- 
2.17.1



* [PATCH 3/6] hv: vmbus: Replace page definition with Hyper-V specific one
From: Maya Nakamura @ 2019-04-05  1:16 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
not be 4096 on all architectures and Hyper-V always runs with a page
size of 4096.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 drivers/hv/hyperv_vmbus.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index a94aab94e0b5..00ad573870e9 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -207,11 +207,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
 		       u64 *requestid, bool raw);
 
 /*
- * Maximum channels is determined by the size of the interrupt page
- * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
- * and the other is receive endpoint interrupt
+ * Maximum channels, 16384, is determined by the size of the interrupt page,
+ * which is HV_HYP_PAGE_SIZE. Half of HV_HYP_PAGE_SIZE is for the send endpoint
+ * interrupts, and the other half is for the receive endpoint interrupts.
  */
-#define MAX_NUM_CHANNELS	((PAGE_SIZE >> 1) << 3)	/* 16348 channels */
+#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)
 
 /* The value here must be in multiple of 32 */
 /* TODO: Need to make this configurable */
-- 
2.17.1
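
The channel count follows from the macro arithmetic: half of the
interrupt page, in bytes, times eight bits per byte. A small sketch that
checks this at compile time, assuming a 4096-byte hypervisor page (the
SKETCH_* names and the helper are hypothetical):

```c
#include <assert.h>

/* Assumption: the hypervisor page size is 4096 bytes. */
#define SKETCH_HV_HYP_PAGE_SIZE 4096
/* Half the page (>> 1), one channel per bit (<< 3 is times eight). */
#define SKETCH_MAX_NUM_CHANNELS ((SKETCH_HV_HYP_PAGE_SIZE >> 1) << 3)

static_assert(SKETCH_MAX_NUM_CHANNELS == 16384,
	      "2048 bytes * 8 bits per byte");

/* Same computation for an arbitrary interrupt page size in bytes. */
static int sketch_max_channels(int interrupt_page_bytes)
{
	return (interrupt_page_bytes >> 1) << 3;
}
```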



* [PATCH 2/6] x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
From: Maya Nakamura @ 2019-04-05  1:14 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Switch from the function that allocates a single Linux guest page to a
different one to use a Hyper-V page because the guest page size and
hypervisor page size concepts are different, even though they happen to
be the same value on x86.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/hyperv/hv_init.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index e4ba467a9fc6..5f946135aa18 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -31,6 +31,7 @@
 #include <linux/hyperv.h>
 #include <linux/slab.h>
 #include <linux/cpuhotplug.h>
+#include <asm/set_memory.h>
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
@@ -98,18 +99,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
 u32 hv_max_vp_index;
 EXPORT_SYMBOL_GPL(hv_max_vp_index);
 
+struct kmem_cache *cachep;
+EXPORT_SYMBOL_GPL(cachep);
+
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
 	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
 	void **input_arg;
-	struct page *pg;
 
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
-	pg = alloc_page(GFP_KERNEL);
-	if (unlikely(!pg))
+	*input_arg = kmem_cache_alloc(cachep, GFP_KERNEL);
+
+	if (unlikely(!*input_arg))
 		return -ENOMEM;
-	*input_arg = page_address(pg);
 
 	hv_get_vp_index(msr_vp_index);
 
@@ -122,14 +125,12 @@ static int hv_cpu_init(unsigned int cpu)
 		return 0;
 
 	if (!*hvp)
-		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+		*hvp = kmem_cache_alloc(cachep, GFP_KERNEL);
 
 	if (*hvp) {
 		u64 val;
 
-		val = vmalloc_to_pfn(*hvp);
-		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
-			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+		val = virt_to_phys(*hvp) | HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
 
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
 	}
@@ -233,17 +234,22 @@ static int hv_cpu_die(unsigned int cpu)
 	unsigned long flags;
 	void **input_arg;
 	void *input_pg = NULL;
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 
 	local_irq_save(flags);
 	input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
 	input_pg = *input_arg;
 	*input_arg = NULL;
 	local_irq_restore(flags);
-	free_page((unsigned long)input_pg);
+	kmem_cache_free(cachep, input_pg);
+	input_pg = NULL;
 
 	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
 		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
 
+	kmem_cache_free(cachep, *hvp);
+	*hvp = NULL;
+
 	if (hv_reenlightenment_cb == NULL)
 		return 0;
 
@@ -325,6 +331,11 @@ void __init hyperv_init(void)
 		goto free_vp_index;
 	}
 
+	cachep = kmem_cache_create("hyperv_pages", HV_HYP_PAGE_SIZE,
+				   HV_HYP_PAGE_SIZE, 0, NULL);
+	if (!cachep)
+		goto free_vp_assist_page;
+
 	cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 				  hv_cpu_init, hv_cpu_die);
 	if (cpuhp < 0)
@@ -338,7 +349,10 @@ void __init hyperv_init(void)
 	guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
 	wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-	hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+	hv_hypercall_pg = kmem_cache_alloc(cachep, GFP_KERNEL);
+	if (hv_hypercall_pg)
+		set_memory_x((unsigned long)hv_hypercall_pg, 1);
+
 	if (hv_hypercall_pg == NULL) {
 		wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
 		goto remove_cpuhp_state;
@@ -346,7 +360,8 @@ void __init hyperv_init(void)
 
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	hypercall_msr.enable = 1;
-	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
+	hypercall_msr.guest_physical_address = virt_to_phys(hv_hypercall_pg) >>
+		HV_HYP_PAGE_SHIFT;
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
 	hv_apic_init();
@@ -416,6 +431,7 @@ void hyperv_cleanup(void)
 	 * let hypercall operations fail safely rather than
 	 * panic the kernel for using invalid hypercall page
 	 */
+	kmem_cache_free(cachep, hv_hypercall_pg);
 	hv_hypercall_pg = NULL;
 
 	/* Reset the hypercall page */
-- 
2.17.1
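
The move from vmalloc_to_pfn() to virt_to_phys() in hv_cpu_init() relies
on the allocation being aligned to HV_HYP_PAGE_SIZE, which the patch
requests through the align argument of kmem_cache_create(). A user-space
sketch of why the two MSR encodings then agree; the two helper functions
are hypothetical, and the constant values are assumed to match
hyperv-tlfs.h:

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT			12
#define HV_HYP_PAGE_SIZE			(1ULL << HV_HYP_PAGE_SHIFT)
/* Values assumed to match arch/x86/include/asm/hyperv-tlfs.h. */
#define HV_X64_MSR_VP_ASSIST_PAGE_ENABLE	0x00000001ULL
#define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT	12

/* Old form: page frame number shifted into place, then the enable bit. */
static uint64_t assist_msr_from_pfn(uint64_t pfn)
{
	return (pfn << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
		HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
}

/* New form: physical address with the enable bit ORed into bit 0. */
static uint64_t assist_msr_from_phys(uint64_t phys)
{
	/* Only valid when phys is aligned to HV_HYP_PAGE_SIZE. */
	assert((phys & (HV_HYP_PAGE_SIZE - 1)) == 0);
	return phys | HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
}
```

For any page-aligned address the low 12 bits are zero, so ORing in the
enable bit cannot disturb the address field and both helpers return the
same value.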


* [PATCH 1/6] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
From: Maya Nakamura @ 2019-04-05  1:13 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel
In-Reply-To: <cover.1554426039.git.m.maya.nakamura@gmail.com>

Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
the Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86.

Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.

Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
---
 arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index cdf44aa9a501..44bd68aefd00 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -12,6 +12,16 @@
 #include <linux/types.h>
 #include <asm/page.h>
 
+/*
+ * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
+ * of 4096. These definitions are used when communicating with Hyper-V using
+ * guest physical pages and guest physical page addresses, since the guest page
+ * size may not be 4096 on all architectures.
+ */
+#define HV_HYP_PAGE_SHIFT	12
+#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
+#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))
+
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
  * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
@@ -841,7 +851,7 @@ union hv_gpa_page_range {
  * count is equal with how many entries of union hv_gpa_page_range can
  * be populated into the input parameter page.
  */
-#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /	\
+#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) /	\
 				sizeof(union hv_gpa_page_range))
 
 struct hv_guest_mapping_flush_list {
-- 
2.17.1
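
To make the role of the three definitions concrete, here is a small
user-space sketch. It assumes the values from the patch, writes the size
as a plain shift because BIT() is kernel-only, and uses hypothetical
helper names that do not exist in the kernel:

```c
#include <assert.h>
#include <stdint.h>

#define HV_HYP_PAGE_SHIFT	12
#define HV_HYP_PAGE_SIZE	(1ULL << HV_HYP_PAGE_SHIFT)
#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))

/* Hypothetical helper: Hyper-V page frame number of a guest physical address. */
static uint64_t hv_gpa_to_pfn(uint64_t gpa)
{
	return gpa >> HV_HYP_PAGE_SHIFT;
}

/* Hypothetical helper: round a guest physical address down to 4 KiB. */
static uint64_t hv_gpa_page_base(uint64_t gpa)
{
	return gpa & HV_HYP_PAGE_MASK;
}

/* Hypothetical helper: number of Hyper-V pages inside one guest page. */
static uint64_t hv_pages_per_guest_page(uint64_t guest_page_size)
{
	return guest_page_size >> HV_HYP_PAGE_SHIFT;
}
```

On x86 one guest page is exactly one Hyper-V page. With a 64 KiB guest
page, hv_pages_per_guest_page() returns 16, so code that hands page
frame numbers to the hypervisor has to use the Hyper-V shift rather
than PAGE_SHIFT.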


* [PATCH 0/6] hv: Remove dependencies on guest page size
From: Maya Nakamura @ 2019-04-05  1:11 UTC (permalink / raw)
  To: mikelley, kys, haiyangz, sthemmin, sashal; +Cc: x86, linux-hyperv, linux-kernel

The Linux guest page size and hypervisor page size concepts are
different, even though they happen to be the same value on x86. Hyper-V
code mixes up the two, so this patchset begins to address that by
creating and using a set of Hyper-V specific page definitions.

A major benefit of those new definitions is that they support non-x86
architectures, such as ARM64, that use different page sizes. On ARM64,
the guest page size may not be 4096, whereas Hyper-V always runs with a
page size of 4096.

In this patchset, the first two patches lay the foundation for the
others, creating definitions and preparing for memory allocations of
Hyper-V pages. Subsequent patches apply the definitions where the guest
VM and Hyper-V communicate, and where the code intends to use the
Hyper-V page size. The last two patches set the ring buffer size to a
fixed value, removing the dependencies on the guest page size.

This is the initial set of changes to the Hyper-V code. Future patches
will make additional changes using the same foundation, for example,
replacing __vmalloc() and related functions where Hyper-V pages are
intended.

Maya Nakamura (6):
  x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
  x86: hv: hv_init.c: Replace alloc_page() with kmem_cache_alloc()
  hv: vmbus: Replace page definition with Hyper-V specific one
  x86: hv: mmu.c: Replace page definitions with Hyper-V specific ones
  HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
  Input: hv: Remove dependencies on PAGE_SIZE for ring buffer

 arch/x86/hyperv/hv_init.c             | 38 +++++++++++++++++++--------
 arch/x86/hyperv/mmu.c                 | 15 ++++++-----
 arch/x86/include/asm/hyperv-tlfs.h    | 12 ++++++++-
 drivers/hid/hid-hyperv.c              |  4 +--
 drivers/hv/hyperv_vmbus.h             |  8 +++---
 drivers/input/serio/hyperv-keyboard.c |  4 +--
 6 files changed, 54 insertions(+), 27 deletions(-)

-- 
2.17.1


* Re: linux-5.1-rc3: nvme hv_pci: request for interrupt failed
From: Solio Sarabia @ 2019-04-04  4:37 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, linux-nvme@lists.infradead.org,
	Haiyang Zhang, KY Srinivasan, Michael Kelley, Shiny Sebastian
In-Reply-To: <PU1P153MB01691F731370D45D13C3615CBF500@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

On Thu, Apr 04, 2019 at 02:42:56AM +0000, Dexuan Cui wrote:
> > From: Solio Sarabia <solio.sarabia@intel.com>
> > Sent: Wednesday, April 3, 2019 5:38 PM
> > To: linux-hyperv@vger.kernel.org; linux-nvme@lists.infradead.org
> > 
> > When two nvme devices are discrete-assigned [1] to a linuxvm on
> > hyper-v rs5 host, it fails to initialize both.  It worked a couple of
> > times and after some reboots it failed. `dmesg` shows:
> > 
> > [   14.099310] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: Request for
> > interrupt failed: 0xc000009a
> > 
> > Thanks,
> > -Solio
> 
> 0xc000009a is STATUS_INSUFFICIENT_RESOURCES.
> 
> This is a known host resource leakage bug of the RS5 host. After the issue
> happens, rebooting the VM can not help, and rebooting the host may hang
> and we may have to power cycle the host by force.
> 
> The bug has been fixed in 19H1, which is in the Insider Preview phase, though:
> https://docs.microsoft.com/en-us/windows-insider/at-home/whats-new-wip-at-home-19h1
> https://www.microsoft.com/en-us/software-download/windowsinsiderpreviewadvanced
> 
> The fix is being backported to RS5, but I don't have an ETA yet. 
> I'll try to get more info today and keep you updated.
> 
> Thanks,
> -- Dexuan

Great to know you're on top of things.  The guest hung and I had to reboot the
host; in all cases the guest was in a consistent state upon reboot.

It's not a blocking issue at the moment; I can work on one device at a time.

Thanks,
-Solio

* Re: [PATCH 1/1] scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
From: Martin K. Petersen @ 2019-04-04  3:31 UTC (permalink / raw)
  To: Michael Kelley
  Cc: KY Srinivasan, martin.petersen@oracle.com, Long Li,
	James.Bottomley@hansenpartnership.com, emilne@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org
In-Reply-To: <1554154871-10305-1-git-send-email-mikelley@microsoft.com>


Michael,

> Reduce the default VMbus channel ring buffer size for storvsc SCSI
> devices from 1 Mbyte to 128 Kbytes. Measurements show that ring buffer
> sizes above 128 Kbytes do not increase performance even at very high
> IOPS rates, so don't waste the memory. Also remove the dependence on
> PAGE_SIZE, since the ring buffer size should not change on
> architectures where PAGE_SIZE is not 4 Kbytes.

Applied to 5.1/scsi-fixes, thanks.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: [PATCH v2 1/1] scsi: storvsc: Fix calculation of sub-channel count
From: Martin K. Petersen @ 2019-04-04  3:30 UTC (permalink / raw)
  To: Michael Kelley
  Cc: KY Srinivasan, martin.petersen@oracle.com, Long Li,
	James.Bottomley@hansenpartnership.com, emilne@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org
In-Reply-To: <1554134985-8671-1-git-send-email-mikelley@microsoft.com>


Michael,

> When the number of sub-channels offered by Hyper-V is >= the number of
> CPUs in the VM, calculate the correct number of sub-channels.  The
> current code produces one too many.

Applied to 5.1/scsi-fixes, thanks.

-- 
Martin K. Petersen	Oracle Linux Engineering

* RE: linux-5.1-rc3: nvme hv_pci: request for interrupt failed
From: Dexuan Cui @ 2019-04-04  2:42 UTC (permalink / raw)
  To: Solio Sarabia, linux-hyperv@vger.kernel.org,
	linux-nvme@lists.infradead.org
  Cc: Haiyang Zhang, KY Srinivasan, Michael Kelley, Shiny Sebastian
In-Reply-To: <FCE231C24CDC4243982F7030CC75E92F644C034C@FMSMSX102.amr.corp.intel.com>

> From: Solio Sarabia <solio.sarabia@intel.com>
> Sent: Wednesday, April 3, 2019 5:38 PM
> To: linux-hyperv@vger.kernel.org; linux-nvme@lists.infradead.org
> 
> When two nvme devices are discrete-assigned [1] to a linuxvm on
> hyper-v rs5 host, it fails to initialize both.  It worked a couple of
> times and after some reboots it failed. `dmesg` shows:
> 
> [   14.099310] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: Request for
> interrupt failed: 0xc000009a
> 
> Thanks,
> -Solio

0xc000009a is STATUS_INSUFFICIENT_RESOURCES.

This is a known host resource leakage bug of the RS5 host. After the issue
happens, rebooting the VM can not help, and rebooting the host may hang
and we may have to power cycle the host by force.

The bug has been fixed in 19H1, which is in the Insider Preview phase, though:
https://docs.microsoft.com/en-us/windows-insider/at-home/whats-new-wip-at-home-19h1
https://www.microsoft.com/en-us/software-download/windowsinsiderpreviewadvanced

The fix is being backported to RS5, but I don't have an ETA yet. 
I'll try to get more info today and keep you updated.

Thanks,
-- Dexuan
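
For readers unfamiliar with the host-side code, 0xc000009a is a 32-bit
NTSTATUS value, and its fields can be pulled apart as in the sketch
below. The struct and function names are hypothetical; the bit layout is
the standard NTSTATUS one (2-bit severity, customer bit, reserved bit,
12-bit facility, 16-bit code):

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of a 32-bit NTSTATUS value. */
struct ntstatus_fields {
	unsigned int severity;	/* bits 31-30: 0 success, 1 info, 2 warning, 3 error */
	unsigned int customer;	/* bit 29 */
	unsigned int facility;	/* bits 27-16 */
	unsigned int code;	/* bits 15-0 */
};

/* Hypothetical helper: split a status value into its fields. */
static struct ntstatus_fields ntstatus_decode(uint32_t status)
{
	struct ntstatus_fields f;

	f.severity = status >> 30;
	f.customer = (status >> 29) & 0x1;
	f.facility = (status >> 16) & 0xfff;
	f.code = status & 0xffff;
	return f;
}
```

Decoding 0xc000009a this way gives severity 3 (error), facility 0 and
code 0x9a, consistent with a generic resource-exhaustion failure
reported by the host.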

* linux-5.1-rc3: nvme hv_pci: request for interrupt failed
From: Solio Sarabia @ 2019-04-04  0:38 UTC (permalink / raw)
  To: linux-hyperv, linux-nvme; +Cc: haiyangz, kys, decui, mikelley, shiny.sebastian

When two nvme devices are discrete-assigned [1] to a linuxvm on
hyper-v rs5 host, it fails to initialize both.  It worked a couple of
times and after some reboots it failed. `dmesg` shows:

[   13.941971] nvme nvme0: pci function 82c6:00:00.0
[   13.942802] nvme 82c6:00:00.0: can't derive routing for PCI INT A
[   13.942803] nvme 82c6:00:00.0: PCI INT A: no GSI
[   13.942844] nvme nvme1: pci function 8f8d:00:00.0
[   13.943397] nvme 8f8d:00:00.0: can't derive routing for PCI INT A
[   13.943399] nvme 8f8d:00:00.0: PCI INT A: no GSI
[   14.099310] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: Request for interrupt failed: 0xc000009a
[   14.099353] hv_pci 092472da-23bf-434f-8f8d-cc7546cf6cc1: Request for interrupt failed: 0xc000009a
[   14.119391] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: hv_irq_unmask() failed: 0x5
[   14.124416] hv_pci 092472da-23bf-434f-8f8d-cc7546cf6cc1: hv_irq_unmask() failed: 0x5
[   74.932888] nvme nvme1: I/O 7 QID 0 timeout, completion polled
[   74.932893] nvme nvme0: I/O 3 QID 0 timeout, completion polled
[  136.372890] nvme nvme1: I/O 4 QID 0 timeout, completion polled
[  136.372892] nvme nvme0: I/O 20 QID 0 timeout, completion polled
[  136.373280] hv_pci 092472da-23bf-434f-8f8d-cc7546cf6cc1: Request for interrupt failed: 0xc000009a
[  136.373432] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: Request for interrupt failed: 0xc000009a
[  136.376262] hv_pci 092472da-23bf-434f-8f8d-cc7546cf6cc1: hv_irq_unmask() failed: 0x5
[  136.376906] hv_pci 96a07283-8dac-417a-82c6-111eb8b9a4c0: hv_irq_unmask() failed: 0x5
loop of 'interrupt failed' and 'hv_irq_unmask' calls
...

The device is an Intel SSD P4608 PCI NVMe, which consists of two NVMe devices
as seen by Linux (5.0.1-rc3).  Some info from `lspci -v`:

82c6:00:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500/P4600 (prog-if 02 [NVM Express])
8f8d:00:00.0 Non-Volatile memory controller: Intel Corporation Express Flash NVMe P4500/P4600 (prog-if 02 [NVM Express])

Let me know if other info/logs are needed.

[1] https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/deploy/deploying-storage-devices-using-dda

Thanks,
-Solio

* [PATCH] x86/hyper-v: implement EOI assist
From: Vitaly Kuznetsov @ 2019-04-03 17:03 UTC (permalink / raw)
  To: linux-hyperv
  Cc: x86, linux-kernel, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Michael Kelley (EOSG), Long Li,
	Simon Xiao

Hyper-V TLFS suggests an optimization to avoid imminent VMExit on EOI:
"The OS performs an EOI by atomically writing zero to the EOI Assist field
of the virtual VP assist page and checking whether the "No EOI required"
field was previously zero. If it was, the OS must write to the
HV_X64_APIC_EOI MSR thereby triggering an intercept into the hypervisor."

Implement the optimization in Linux.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/hyperv/hv_apic.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
index 8eb6fbee8e13..5c056b8aebef 100644
--- a/arch/x86/hyperv/hv_apic.c
+++ b/arch/x86/hyperv/hv_apic.c
@@ -86,6 +86,11 @@ static void hv_apic_write(u32 reg, u32 val)
 
 static void hv_apic_eoi_write(u32 reg, u32 val)
 {
+	struct hv_vp_assist_page *hvp = hv_vp_assist_page[smp_processor_id()];
+
+	if (hvp && (xchg(&hvp->apic_assist, 0) & 0x1))
+		return;
+
 	wrmsr(HV_X64_MSR_EOI, val, 0);
 }
 
-- 
2.20.1
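
The exchange-and-test step can be tried outside the kernel. The sketch
below is a user-space model only: struct fake_vp_assist_page and
eoi_can_skip_msr_write() are hypothetical names, and C11
atomic_exchange() stands in for the kernel's xchg():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the apic_assist field of the VP assist page. */
struct fake_vp_assist_page {
	_Atomic uint32_t apic_assist;
};

/*
 * Returns true when the EOI MSR write can be skipped. atomic_exchange()
 * stores 0 and returns the previous value in one atomic step, so the
 * "No EOI required" bit is read and cleared together.
 */
static bool eoi_can_skip_msr_write(struct fake_vp_assist_page *hvp)
{
	if (!hvp)
		return false;	/* no assist page: always write the MSR */

	return (atomic_exchange(&hvp->apic_assist, 0) & 0x1) != 0;
}
```

Reading and clearing in a single atomic operation matters because the
hypervisor may update the field concurrently; a separate load and store
could lose such an update.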


* RE: [PATCH 1/1] scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
From: Haiyang Zhang @ 2019-04-02 15:58 UTC (permalink / raw)
  To: Michael Kelley, KY Srinivasan, martin.petersen@oracle.com,
	Long Li, James.Bottomley@hansenpartnership.com, emilne@redhat.com,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org
  Cc: Michael Kelley
In-Reply-To: <1554154871-10305-1-git-send-email-mikelley@microsoft.com>



> -----Original Message-----
> From: linux-hyperv-owner@vger.kernel.org <linux-hyperv-
> owner@vger.kernel.org> On Behalf Of Michael Kelley
> Sent: Monday, April 1, 2019 5:42 PM
> To: KY Srinivasan <kys@microsoft.com>; martin.petersen@oracle.com; Long Li
> <longli@microsoft.com>; James.Bottomley@hansenpartnership.com;
> emilne@redhat.com; linux-hyperv@vger.kernel.org; linux-
> kernel@vger.kernel.org; linux-scsi@vger.kernel.org
> Cc: Michael Kelley <mikelley@microsoft.com>
> Subject: [PATCH 1/1] scsi: storvsc: Reduce default ring buffer size to 128 Kbytes
> 
> Reduce the default VMbus channel ring buffer size for storvsc SCSI devices
> from 1 Mbyte to 128 Kbytes. Measurements show that ring buffer sizes above
> 128 Kbytes do not increase performance even at very high IOPS rates, so don't
> waste the memory. Also remove the dependence on PAGE_SIZE, since the ring
> buffer size should not change on architectures where PAGE_SIZE is not 4 Kbytes.
> 
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thank you.

* [PATCH] Do not overwrite CFLAG passed into makefile
From: Zhongcheng Lao @ 2019-04-02 14:11 UTC (permalink / raw)
  To: haiyangz
  Cc: Zhongcheng Lao, K. Y. Srinivasan, Stephen Hemminger, Sasha Levin,
	linux-hyperv, linux-kernel

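It is necessary to pass a custom value for KVP_SCRIPTS_PATH
because the directory structure differs between Linux distros.
The current Makefile does not allow this: it assigns CFLAGS with '=',
which discards any value inherited from the environment.

Append to CFLAGS with '+=' instead so that the caller's flags are kept.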

Signed-off-by: Zhongcheng Lao <Zhongcheng.Lao@microsoft.com>
---
 tools/hv/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index 5db5e62cebda..058c3dbb7824 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -2,7 +2,7 @@
 # Makefile for Hyper-V tools
 
 WARNINGS = -Wall -Wextra
-CFLAGS = $(WARNINGS) -g $(shell getconf LFS_CFLAGS)
+CFLAGS += $(WARNINGS) -g $(shell getconf LFS_CFLAGS)
 
 CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
 
-- 
2.21.0
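
On the C side, a compile-time path such as KVP_SCRIPTS_PATH is usually
consumed through a guarded default, roughly as sketched here. The
default value shown and the kvp_script_path() helper are assumptions
made for illustration, not a copy of the daemon source:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed default; a build can override it with -DKVP_SCRIPTS_PATH=... */
#ifndef KVP_SCRIPTS_PATH
#define KVP_SCRIPTS_PATH "/usr/libexec/hypervkvpd/"
#endif

/* Hypothetical helper: build the full path of a helper script. */
static int kvp_script_path(char *buf, size_t len, const char *script)
{
	int n = snprintf(buf, len, "%s%s", KVP_SCRIPTS_PATH, script);

	/* Report truncation or an encoding error as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the '+=' form, a CFLAGS value exported in the environment, such as
one carrying a -DKVP_SCRIPTS_PATH=... definition, reaches the compiler
and replaces the default above.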


* [PATCH v2] Remove SPDX "WITH Linux-syscall-note" from kernel-space headers
From: Masahiro Yamada @ 2019-04-02  9:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Karthikeyan Ramasubramanian, Radim Krčmář,
	Thomas Gleixner, Andy Gross, Vitaly Kuznetsov, Girish Mahadevan,
	Sagar Dharia, Masahiro Yamada, linux-arch, H. Peter Anvin,
	Arnd Bergmann, Haiyang Zhang, K. Y. Srinivasan, Sasha Levin,
	Borislav Petkov, x86, linux-hyperv, linux-kernel,
	Stephen Hemminger, Ingo Molnar

The "WITH Linux-syscall-note" exception should be added only to headers
exported to user space.

Some kernel-space headers carry "WITH Linux-syscall-note", which seems
to be a mistake.

[1] arch/x86/include/asm/hyperv-tlfs.h

5a4858032217 ("x86/hyper-v: move hyperv.h out of uapi") moved this file
out of uapi, but did not update the SPDX License tag.

[2] include/asm-generic/shmparam.h

76ce2a80a28e ("Rename include/{uapi => }/asm-generic/shmparam.h really")
moved this file out of uapi, but did not update the SPDX License tag.

[3] include/linux/qcom-geni-se.h

eddac5af0654 ("soc: qcom: Add GENI based QUP Wrapper driver") added this
file, but I do not see a good reason why its license tag must include
"WITH Linux-syscall-note".

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

Changes in v2:
  - Fix a typo in commit log

 arch/x86/include/asm/hyperv-tlfs.h | 2 +-
 include/asm-generic/shmparam.h     | 2 +-
 include/linux/qcom-geni-se.h       | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 2bdbbbc..cdf44aa 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0 */
 
 /*
  * This file contains definitions from Hyper-V Hypervisor Top-Level Functional
diff --git a/include/asm-generic/shmparam.h b/include/asm-generic/shmparam.h
index 8b78c0b..b8f9035 100644
--- a/include/asm-generic/shmparam.h
+++ b/include/asm-generic/shmparam.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __ASM_GENERIC_SHMPARAM_H
 #define __ASM_GENERIC_SHMPARAM_H
 
diff --git a/include/linux/qcom-geni-se.h b/include/linux/qcom-geni-se.h
index 3bcd67f..dd46494 100644
--- a/include/linux/qcom-geni-se.h
+++ b/include/linux/qcom-geni-se.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (c) 2017-2018, The Linux Foundation. All rights reserved.
  */
-- 
2.7.4

