Linux-HyperV List
* Re: [PATCH 1/2] hv_balloon: Use a static page for the balloon_up send buffer
From: Sasha Levin @ 2019-07-30 22:36 UTC
  To: Dexuan Cui
  Cc: linux-hyperv@vger.kernel.org, gregkh@linuxfoundation.org,
	Stephen Hemminger, Sasha Levin, Haiyang Zhang, KY Srinivasan,
	linux-kernel@vger.kernel.org, Michael Kelley, Tianyu Lan,
	olaf@aepfle.de, apw@canonical.com, jasowang@redhat.com, vkuznets,
	marcelo.cerri@canonical.com
In-Reply-To: <1560537692-37400-1-git-send-email-decui@microsoft.com>

On Fri, Jun 14, 2019 at 06:42:17PM +0000, Dexuan Cui wrote:
>It's unnecessary to dynamically allocate the buffer.
>
>Signed-off-by: Dexuan Cui <decui@microsoft.com>

I've queued these up for hyperv-next, thanks!

--
Thanks,
Sasha
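A minimal userspace sketch of the idea in this patch: one page-sized, page-aligned buffer is reserved at build time and reused for every request, instead of being allocated and freed per request. It assumes, as any shared static buffer does, that only one balloon_up message is built at a time. `SKETCH_PAGE_SIZE`, `sketch_send_buffer` and `build_balloon_msg()` are hypothetical names, not the hv_balloon code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* assumed 4 KiB page */

/* One page reserved at build time instead of a per-request allocation.
 * Only safe if callers are serialized (e.g. a single work item). */
static alignas(SKETCH_PAGE_SIZE) unsigned char sketch_send_buffer[SKETCH_PAGE_SIZE];

/* Hypothetical helper: clear the shared page, fill the payload, return it. */
static unsigned char *build_balloon_msg(size_t payload_len, unsigned char fill)
{
	if (payload_len > sizeof(sketch_send_buffer))
		return NULL;                      /* would not fit in one page */
	memset(sketch_send_buffer, 0, sizeof(sketch_send_buffer));
	memset(sketch_send_buffer, fill, payload_len);
	return sketch_send_buffer;                /* no alloc/free pair needed */
}
```

Every call returns the same page and overwrites its previous contents, which is exactly why the static buffer is only correct when messages are built one at a time.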


* RE: [PATCH 3/7] Drivers: hv: vmbus: Split hv_synic_init/cleanup into regs and timer settings
From: Michael Kelley @ 2019-07-30 22:35 UTC
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1562650084-99874-4-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com>  Sent: Monday, July 8, 2019 10:29 PM
> 
> There is only one functional change: the unnecessary check
> "if (sctrl.enable != 1) return -EFAULT;" is removed, because when we're in
> hv_synic_cleanup(), we're absolutely sure sctrl.enable must be 1.
> 
> The new functions hv_synic_disable/enable_regs() will be used by a later patch
> to support hibernation.

Seems like this commit message doesn't really describe the main change.
How about:

Break out synic enable and disable operations into separate
hv_synic_disable_regs() and hv_synic_enable_regs() functions for use by a
later patch to support hibernation.

There is no functional change except that the unnecessary check
"if (sctrl.enable != 1) return -EFAULT;" is removed, because by the time
hv_synic_cleanup() runs, sctrl.enable must already be 1.

Otherwise,

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>

> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/hv/hv.c           | 66 ++++++++++++++++++++++++++---------------------
>  drivers/hv/hyperv_vmbus.h |  2 ++
>  2 files changed, 39 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
> index 6188fb7..fcc5279 100644
> --- a/drivers/hv/hv.c
> +++ b/drivers/hv/hv.c
> @@ -154,7 +154,7 @@ void hv_synic_free(void)
>   * retrieve the initialized message and event pages.  Otherwise, we create and
>   * initialize the message and event pages.
>   */
> -int hv_synic_init(unsigned int cpu)
> +void hv_synic_enable_regs(unsigned int cpu)
>  {
>  	struct hv_per_cpu_context *hv_cpu
>  		= per_cpu_ptr(hv_context.cpu_context, cpu);
> @@ -196,6 +196,11 @@ int hv_synic_init(unsigned int cpu)
>  	sctrl.enable = 1;
> 
>  	hv_set_synic_state(sctrl.as_uint64);
> +}
> +
> +int hv_synic_init(unsigned int cpu)
> +{
> +	hv_synic_enable_regs(cpu);
> 
>  	hv_stimer_init(cpu);
> 
> @@ -205,20 +210,45 @@ int hv_synic_init(unsigned int cpu)
>  /*
>   * hv_synic_cleanup - Cleanup routine for hv_synic_init().
>   */
> -int hv_synic_cleanup(unsigned int cpu)
> +void hv_synic_disable_regs(unsigned int cpu)
>  {
>  	union hv_synic_sint shared_sint;
>  	union hv_synic_simp simp;
>  	union hv_synic_siefp siefp;
>  	union hv_synic_scontrol sctrl;
> +
> +	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> +	shared_sint.masked = 1;
> +
> +	/* Need to correctly cleanup in the case of SMP!!! */
> +	/* Disable the interrupt */
> +	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> +
> +	hv_get_simp(simp.as_uint64);
> +	simp.simp_enabled = 0;
> +	simp.base_simp_gpa = 0;
> +
> +	hv_set_simp(simp.as_uint64);
> +
> +	hv_get_siefp(siefp.as_uint64);
> +	siefp.siefp_enabled = 0;
> +	siefp.base_siefp_gpa = 0;
> +
> +	hv_set_siefp(siefp.as_uint64);
> +
> +	/* Disable the global synic bit */
> +	hv_get_synic_state(sctrl.as_uint64);
> +	sctrl.enable = 0;
> +	hv_set_synic_state(sctrl.as_uint64);
> +}
> +
> +int hv_synic_cleanup(unsigned int cpu)
> +{
>  	struct vmbus_channel *channel, *sc;
>  	bool channel_found = false;
>  	unsigned long flags;
> 
> -	hv_get_synic_state(sctrl.as_uint64);
> -	if (sctrl.enable != 1)
> -		return -EFAULT;
> -
>  	/*
>  	 * Search for channels which are bound to the CPU we're about to
>  	 * cleanup. In case we find one and vmbus is still connected we need to
> @@ -249,29 +279,7 @@ int hv_synic_cleanup(unsigned int cpu)
> 
>  	hv_stimer_cleanup(cpu);
> 
> -	hv_get_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> -
> -	shared_sint.masked = 1;
> -
> -	/* Need to correctly cleanup in the case of SMP!!! */
> -	/* Disable the interrupt */
> -	hv_set_synint_state(VMBUS_MESSAGE_SINT, shared_sint.as_uint64);
> -
> -	hv_get_simp(simp.as_uint64);
> -	simp.simp_enabled = 0;
> -	simp.base_simp_gpa = 0;
> -
> -	hv_set_simp(simp.as_uint64);
> -
> -	hv_get_siefp(siefp.as_uint64);
> -	siefp.siefp_enabled = 0;
> -	siefp.base_siefp_gpa = 0;
> -
> -	hv_set_siefp(siefp.as_uint64);
> -
> -	/* Disable the global synic bit */
> -	sctrl.enable = 0;
> -	hv_set_synic_state(sctrl.as_uint64);
> +	hv_synic_disable_regs(cpu);
> 
>  	return 0;
>  }
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index 362e70e..26ea161 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -171,8 +171,10 @@ extern int hv_post_message(union hv_connection_id connection_id,
> 
>  extern void hv_synic_free(void);
> 
> +extern void hv_synic_enable_regs(unsigned int cpu);
>  extern int hv_synic_init(unsigned int cpu);
> 
> +extern void hv_synic_disable_regs(unsigned int cpu);
>  extern int hv_synic_cleanup(unsigned int cpu);
> 
>  /* Interface */
> --
> 1.8.3.1
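To make the split concrete, here is a userspace model in which plain booleans stand in for the SINT, SIMP, SIEFP and SCONTROL MSRs. The `fake_synic` struct, the `synic_*` names and the suspend/resume wrappers are hypothetical; the wrappers only assume, as the changelog says, that a later hibernation patch reuses the register half, and they do not reproduce that patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU SynIC state; the real driver reads/writes MSRs. */
struct fake_synic {
	bool sint_masked;
	bool simp_enabled;
	bool siefp_enabled;
	bool scontrol_enable;
	bool stimer_running;
};

/* Register half, analogous to hv_synic_enable_regs(). */
static void synic_enable_regs(struct fake_synic *s)
{
	s->simp_enabled = true;
	s->siefp_enabled = true;
	s->sint_masked = false;
	s->scontrol_enable = true;
}

/* Register half, analogous to hv_synic_disable_regs(). */
static void synic_disable_regs(struct fake_synic *s)
{
	s->sint_masked = true;
	s->simp_enabled = false;
	s->siefp_enabled = false;
	s->scontrol_enable = false;
}

/* CPU-hotplug path: registers plus the per-CPU stimer. */
static int synic_init(struct fake_synic *s)
{
	synic_enable_regs(s);
	s->stimer_running = true;
	return 0;
}

/* No "scontrol_enable != 1" check: cleanup only ever follows init. */
static int synic_cleanup(struct fake_synic *s)
{
	s->stimer_running = false;
	synic_disable_regs(s);
	return 0;
}

/* Hypothetical hibernation hooks reusing only the register half. */
static void synic_suspend(struct fake_synic *s) { synic_disable_regs(s); }
static void synic_resume(struct fake_synic *s) { synic_enable_regs(s); }
```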



* Re: [PATCH] hv: Use the correct style for SPDX License Identifier
From: Sasha Levin @ 2019-07-30 22:31 UTC
  To: Greg Kroah-Hartman
  Cc: Nishad Kamdar, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Joe Perches, Uwe Kleine-König, linux-hyperv, linux-kernel
In-Reply-To: <20190722140809.GA29862@kroah.com>

On Mon, Jul 22, 2019 at 04:08:09PM +0200, Greg Kroah-Hartman wrote:
>On Mon, Jul 22, 2019 at 07:01:17PM +0530, Nishad Kamdar wrote:
>> This patch corrects the SPDX License Identifier style
>> in the trace header file related to Microsoft Hyper-V
>> client drivers.
>> For C header files Documentation/process/license-rules.rst
>> mandates C-like comments (opposed to C source files where
>> C++ style should be used)
>>
>> Changes made by using a script provided by Joe Perches here:
>> https://lkml.org/lkml/2019/2/7/46
>>
>> Suggested-by: Joe Perches <joe@perches.com>
>> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
>
>Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Queued up for hyperv-fixes, thanks!

--
Thanks,
Sasha
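For illustration, the two comment styles that Documentation/process/license-rules.rst distinguishes are shown below: the first line is the form required on the first line of a C header (.h), the second is the form used on the first line of a C source file (.c). `GPL-2.0` is only an example identifier here, not necessarily the license of the trace header touched by this patch. The rules document explains that the block-comment form in headers exists because some tools that consume headers (for example ld with generated .lds files, and some older assemblers) did not accept `//` comments.

```c
/* SPDX-License-Identifier: GPL-2.0 */
// SPDX-License-Identifier: GPL-2.0
```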


* RE: [PATCH 2/7] clocksource/drivers: Suspend/resume Hyper-V clocksource for hibernation
From: Michael Kelley @ 2019-07-30 22:23 UTC
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1562650084-99874-3-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, July 8, 2019 10:29 PM
> 
> This is needed for hibernation, e.g. when we resume the old kernel, we need
> to disable the "current" kernel's TSC page and then resume the old kernel's.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/clocksource/hyperv_timer.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
> index ba2c79e6..41c31a7 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -237,12 +237,37 @@ static u64 read_hv_clock_tsc(struct clocksource *arg)
>  	return read_hv_sched_clock_tsc();
>  }
> 
> +static void suspend_hv_clock_tsc(struct clocksource *arg)
> +{
> +	u64 tsc_msr;
> +
> +	/* Disable the TSC page */
> +	hv_get_reference_tsc(tsc_msr);
> +	tsc_msr &= ~BIT_ULL(0);
> +	hv_set_reference_tsc(tsc_msr);
> +}
> +
> +
> +static void resume_hv_clock_tsc(struct clocksource *arg)
> +{
> +	phys_addr_t phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
> +	u64 tsc_msr;
> +
> +	/* Re-enable the TSC page */
> +	hv_get_reference_tsc(tsc_msr);
> +	tsc_msr &= GENMASK_ULL(11, 0);
> +	tsc_msr |= BIT_ULL(0) | (u64)phys_addr;
> +	hv_set_reference_tsc(tsc_msr);
> +}
> +
>  static struct clocksource hyperv_cs_tsc = {
>  	.name	= "hyperv_clocksource_tsc_page",
>  	.rating	= 400,
>  	.read	= read_hv_clock_tsc,
>  	.mask	= CLOCKSOURCE_MASK(64),
>  	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
> +	.suspend= suspend_hv_clock_tsc,
> +	.resume	= resume_hv_clock_tsc,
>  };
>  #endif
> 
> --
> 1.8.3.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
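The bit manipulation in suspend_hv_clock_tsc() and resume_hv_clock_tsc() can be checked in isolation. In this sketch a plain variable stands in for the reference-TSC MSR, and the `SK_*` macros are local copies of BIT_ULL()/GENMASK_ULL(). Bit 0 is the enable bit, and the TSC page address is assumed to be 4 KiB aligned, so it only occupies bits 12 and up.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT_ULL()/GENMASK_ULL() helpers. */
#define SK_BIT_ULL(n)        (1ULL << (n))
#define SK_GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Simulated reference-TSC MSR (the driver uses hv_get/set_reference_tsc()). */
static uint64_t fake_ref_tsc_msr;

static void sketch_suspend_tsc(void)
{
	uint64_t msr = fake_ref_tsc_msr;

	msr &= ~SK_BIT_ULL(0);               /* clear the enable bit only */
	fake_ref_tsc_msr = msr;
}

/* tsc_page_pa: physical address of this kernel's TSC page (page aligned). */
static void sketch_resume_tsc(uint64_t tsc_page_pa)
{
	uint64_t msr = fake_ref_tsc_msr;

	msr &= SK_GENMASK_ULL(11, 0);        /* keep bits 11:0, drop the old address */
	msr |= SK_BIT_ULL(0) | tsc_page_pa;  /* re-enable with this kernel's page */
	fake_ref_tsc_msr = msr;
}
```

After a hibernation resume, the page address written back is the resumed kernel's, replacing whatever the boot kernel had programmed.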



* RE: [PATCH 1/7] x86/hyper-v: Suspend/resume the hypercall page for hibernation
From: Michael Kelley @ 2019-07-30 22:18 UTC
  To: Dexuan Cui, linux-hyperv@vger.kernel.org,
	gregkh@linuxfoundation.org, Stephen Hemminger, Sasha Levin,
	sashal@kernel.org, Haiyang Zhang, KY Srinivasan,
	tglx@linutronix.de
  Cc: linux-kernel@vger.kernel.org
In-Reply-To: <1562650084-99874-2-git-send-email-decui@microsoft.com>

From: Dexuan Cui <decui@microsoft.com> Sent: Monday, July 8, 2019 10:29 PM
> 
> This is needed for hibernation, e.g. when we resume the old kernel, we need
> to disable the "current" kernel's hypercall page and then resume the old
> kernel's.
> 
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0e033ef..3005871 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -20,6 +20,7 @@
>  #include <linux/hyperv.h>
>  #include <linux/slab.h>
>  #include <linux/cpuhotplug.h>
> +#include <linux/syscore_ops.h>
>  #include <clocksource/hyperv_timer.h>
> 
>  void *hv_hypercall_pg;
> @@ -214,6 +215,34 @@ static int __init hv_pci_init(void)
>  	return 1;
>  }
> 
> +static int hv_suspend(void)
> +{
> +	union hv_x64_msr_hypercall_contents hypercall_msr;
> +
> +	/* Reset the hypercall page */
> +	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +	hypercall_msr.enable = 0;
> +	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> +	return 0;
> +}
> +
> +static void hv_resume(void)
> +{
> +	union hv_x64_msr_hypercall_contents hypercall_msr;
> +
> +	/* Re-enable the hypercall page */
> +	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +	hypercall_msr.enable = 1;
> +	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> +	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +}
> +
> +static struct syscore_ops hv_syscore_ops = {
> +	.suspend = hv_suspend,
> +	.resume = hv_resume,
> +};
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -294,6 +323,9 @@ void __init hyperv_init(void)
> 
>  	/* Register Hyper-V specific clocksource */
>  	hv_init_clocksource();
> +
> +	register_syscore_ops(&hv_syscore_ops);
> +
>  	return;
> 
>  remove_cpuhp_state:
> @@ -313,6 +345,8 @@ void hyperv_cleanup(void)
>  {
>  	union hv_x64_msr_hypercall_contents hypercall_msr;
> 
> +	unregister_syscore_ops(&hv_syscore_ops);
> +
>  	/* Reset our OS id */
>  	wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
> 
> --
> 1.8.3.1

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
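A userspace model of hv_suspend()/hv_resume() and their registration as a pair of syscore callbacks. The union layout (bit 0 enable, bits 11:1 reserved, bits 63:12 guest page frame number) is assumed to match hv_x64_msr_hypercall_contents and relies on GCC/Clang laying out `uint64_t` bitfields LSB first. The MSR, the page frame number and `struct sketch_syscore_ops` are hypothetical stand-ins for HV_X64_MSR_HYPERCALL, vmalloc_to_pfn(hv_hypercall_pg) and struct syscore_ops.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the hypercall MSR contents. */
union sketch_hypercall_msr {
	uint64_t as_uint64;
	struct {
		uint64_t enable:1;
		uint64_t reserved:11;
		uint64_t guest_physical_address:52;  /* page frame number */
	};
};

static uint64_t fake_hypercall_msr;       /* stands in for the real MSR */
static uint64_t sketch_hypercall_pg_pfn;  /* stands in for vmalloc_to_pfn() */

static int sketch_hv_suspend(void)
{
	union sketch_hypercall_msr m = { .as_uint64 = fake_hypercall_msr };

	m.enable = 0;                         /* reset the hypercall page */
	fake_hypercall_msr = m.as_uint64;
	return 0;
}

static void sketch_hv_resume(void)
{
	union sketch_hypercall_msr m = { .as_uint64 = fake_hypercall_msr };

	m.enable = 1;                         /* re-enable at this kernel's page */
	m.guest_physical_address = sketch_hypercall_pg_pfn;
	fake_hypercall_msr = m.as_uint64;
}

/* Same shape as struct syscore_ops: int suspend(void), void resume(void).
 * The kernel runs syscore callbacks on one CPU with interrupts disabled. */
struct sketch_syscore_ops {
	int (*suspend)(void);
	void (*resume)(void);
};

static struct sketch_syscore_ops sketch_syscore = {
	.suspend = sketch_hv_suspend,
	.resume = sketch_hv_resume,
};
```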



* RE: [PATCH v2] x86/hyper-v: Zero out the VP ASSIST PAGE to fix CPU offlining
From: Michael Kelley @ 2019-07-30 20:22 UTC
  To: Dexuan Cui, Thomas Gleixner, vkuznets, Haiyang Zhang,
	KY Srinivasan, Stephen Hemminger, Sasha Levin,
	linux-hyperv@vger.kernel.org, Long Li, x86@kernel.org
  Cc: Ingo Molnar, Borislav Petkov, H. Peter Anvin, jasowang@redhat.com,
	driverdev-devel@linuxdriverproject.org,
	linux-kernel@vger.kernel.org, apw@canonical.com,
	marcelo.cerri@canonical.com, olaf@aepfle.de
In-Reply-To: <PU1P153MB0169B716A637FABF07433C04BFCB0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>

From: Dexuan Cui <decui@microsoft.com> Sent: Thursday, July 18, 2019 8:23 PM
> 
> The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
> 5.2.1 "GPA Overlay Pages" for the details) and here is an excerpt:
> 
> "
> The hypervisor defines several special pages that "overlay" the guest's
> Guest Physical Addresses (GPA) space. Overlays are addressed GPA but are
> not included in the normal GPA map maintained internally by the hypervisor.
> Conceptually, they exist in a separate map that overlays the GPA map.
> 
> If a page within the GPA space is overlaid, any SPA page mapped to the
> GPA page is effectively "obscured" and generally unreachable by the
> virtual processor through processor memory accesses.
> 
> If an overlay page is disabled, the underlying GPA page is "uncovered",
> and an existing mapping becomes accessible to the guest.
> "
> 
> SPA = System Physical Address = the final real physical address.
> 
> When a CPU (e.g. CPU1) is being onlined, in hv_cpu_init(), we allocate the
> VP ASSIST PAGE and enable the EOI optimization for this CPU by writing the
> MSR HV_X64_MSR_VP_ASSIST_PAGE. From now on, hvp->apic_assist belongs to the
> special SPA page, and this CPU *always* uses hvp->apic_assist (which is
> shared with the hypervisor) to decide if it needs to write the EOI MSR.
> 
> When a CPU (e.g. CPU1) is being offlined, on this CPU, we do:
> 1. in hv_cpu_die(), we disable the EOI optimization for this CPU, and from
>    now on hvp->apic_assist belongs to the original "normal" SPA page;
> 2. we finish the remaining work of stopping this CPU;
> 3. this CPU is completely stopped.
> 
> Between 1 and 3, this CPU can still receive interrupts (e.g. reschedule
> IPIs from CPU0, and Local APIC timer interrupts), and this CPU *must* write
> the EOI MSR for every interrupt received, otherwise the hypervisor may not
> deliver further interrupts, which may be needed to completely stop the CPU.
> 
> So, after we disable the EOI optimization in hv_cpu_die(), we need to make
> sure hvp->apic_assist's bit0 is zero. The easiest way is to zero out
> the page when it's allocated in hv_cpu_init().
> 
> Note 1: after the "normal" SPA page is allocated and zeroed out, neither the
> hypervisor nor the guest writes into the page, so the page remains
> zeroed.
> 
> Note 2: see Section 10.3.5 "EOI Assist" for the details of the EOI
> optimization. When the optimization is enabled, the guest can still write
> the EOI MSR register irrespective of the "No EOI required" value, though
> by doing so we can't benefit from the optimization.
> 
> Fixes: ba696429d290 ("x86/hyper-v: Implement EOI assist")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
> 
> v2: there is no code change. I just improved the comment and the changelog
> according to the discussion with tglx:
> 
> https://lkml.org/lkml/2019/7/17/781
> https://lkml.org/lkml/2019/7/18/91 
> 
>  arch/x86/hyperv/hv_init.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0e033ef11a9f..d26832cb38bb 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -60,8 +60,16 @@ static int hv_cpu_init(unsigned int cpu)
>  	if (!hv_vp_assist_page)
>  		return 0;
> 
> +	/*
> +	 * The VP ASSIST PAGE is an "overlay" page (see Hyper-V TLFS's Section
> +	 * 5.2.1 "GPA Overlay Pages"). Here it must be zeroed out to make sure
> +	 * we always write the EOI MSR in hv_apic_eoi_write() *after* the
> +	 * EOI optimization is disabled in hv_cpu_die(), otherwise a CPU may
> +	 * not be stopped in the case of CPU offlining and the VM will hang.
> +	 */
>  	if (!*hvp)
> -		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
> +		*hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
> +				 PAGE_KERNEL);
> 
>  	if (*hvp) {
>  		u64 val;
> --
> 2.19.1

Reviewed-by:  Michael Kelley <mikelley@microsoft.com>
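Why a stale bit 0 matters can be shown with a small model of the EOI-assist check. The function below is modeled on the check hv_apic_eoi_write() performs (atomically swap apic_assist with 0; if bit 0 was set, skip the EOI MSR write); `struct sketch_vp_assist_page` is a simplified stand-in holding only the apic_assist field, and a counter replaces the real wrmsr. The "dirty" page mimics a non-zeroed allocation that becomes visible once the overlay is disabled in hv_cpu_die().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the VP assist page: only apic_assist. */
struct sketch_vp_assist_page {
	uint32_t apic_assist;            /* bit 0 = "no EOI required" */
};

static unsigned int eoi_msr_writes;  /* counts simulated EOI MSR writes */

/* If bit 0 was set, clear the field and skip the EOI; otherwise "write"
 * the EOI MSR. __atomic_exchange_n mirrors the kernel's xchg(). */
static void sketch_eoi_write(struct sketch_vp_assist_page *hvp)
{
	uint32_t old = __atomic_exchange_n(&hvp->apic_assist, 0, __ATOMIC_SEQ_CST);

	if (old & 0x1)
		return;                      /* EOI skipped */
	eoi_msr_writes++;                /* stands in for wrmsr(HV_X64_MSR_EOI) */
}
```

With a page allocated with `__GFP_ZERO` (modeled by calloc()), bit 0 is never set once the overlay is gone, so every interrupt gets its EOI.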



* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Tianyu Lan @ 2019-07-30 13:41 UTC
  To: Vitaly Kuznetsov
  Cc: Peter Zijlstra, Tianyu Lan, linux-arch, linux-hyperv,
	linux-kernel@vger.kernel.org, Andy Lutomirski, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	the arch/x86 maintainers, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Daniel Lezcano, Arnd Bergmann,
	michael.h.kelley, ashal
In-Reply-To: <87wog1kpib.fsf@vitty.brq.redhat.com>

Hi Vitaly & Peter:
    Thanks for your review.

On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov <vkuznets@redhat.com> wrote:
>
> Peter Zijlstra <peterz@infradead.org> writes:
>
> > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> >> lantianyu1986@gmail.com writes:
> >>
> >> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >> >
> >> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> >> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> >> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> >> > to set the sched clock function appropriately.  On x86, this sets
> >> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> >> > scaled and adjusted to be continuous.
> >>
> >> Hypervisor can, in theory, disable TSC page and then we're forced to use
> >> MSR-based clocksource but using it as sched_clock() can be very slow,
> >> I'm afraid.
> >>
> >> On the other hand, what we have now is probably worse: TSC can,
> >> actually, jump backwards (e.g. on migration) and we're breaking the
> >> requirements for sched_clock().
> >
> > That (obviously) also breaks the requirements for using TSC as
> > clocksource.
> >
> > IOW, it breaks the entire purpose of having TSC in the first place.
>
> Currently, we mark raw TSC as unstable when running on Hyper-V (see
> 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
> instead. The problem is that 'TSC page' can be disabled by the
> hypervisor and in that case the only remaining clocksource is MSR-based
> (slow).
>

Yes, that will be slow if Hyper-V doesn't expose the TSC page and the
kernel falls back to the MSR-based clocksource: each MSR read triggers a
VM exit. The same happens on other hypervisors (e.g. when KVM doesn't
expose the KVM clock). The hypervisor should take this into account when
deciding which clocksource to expose.

-- 
Best regards
Tianyu Lan
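For reference, the Hyper-V TLFS defines the TSC-page reference time, in 100 ns units, as ((TSC * TscScale) >> 64) + TscOffset; this is the "TSC * scale + offset" mentioned above, with the scale read as a 64.64 fixed-point fraction. The sketch computes it with the GCC/Clang `unsigned __int128` extension and shows how publishing a new offset can keep the result continuous when the raw TSC jumps, e.g. after migration. The function names and the numbers in the test are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Reference time = ((tsc * scale) >> 64) + offset, in 100 ns units. */
static uint64_t sketch_ref_time(uint64_t tsc, uint64_t scale, int64_t offset)
{
	unsigned __int128 prod = (unsigned __int128)tsc * scale;

	return (uint64_t)(prod >> 64) + (uint64_t)offset;
}

/* Offset a hypervisor could publish after the raw TSC changes (e.g. on
 * migration) so that reference time resumes from now_ref. */
static int64_t sketch_offset_for_continuity(uint64_t new_tsc, uint64_t new_scale,
					    uint64_t now_ref)
{
	return (int64_t)(now_ref - sketch_ref_time(new_tsc, new_scale, 0));
}
```

The raw TSC, which native_sched_clock() would read, goes from 2000 to 10 in the test below, while the scaled and offset value keeps advancing from 1000.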


* [PATCH 2/2] Drivers: hv: vmbus: Remove dependencies on guest page size
From: Himadri Pandya @ 2019-07-30  9:49 UTC
  To: mikelley, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa
  Cc: x86, linux-hyperv, linux-kernel, Himadri Pandya
In-Reply-To: <20190730094944.96007-1-himadri18.07@gmail.com>

Hyper-V assumes a page size of 4K, which might not match the guest page
size on ARM64. Hence, use the Hyper-V page size and page allocation
functions to avoid conflicts between different host and guest page sizes
on ARM64.

Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
---
 drivers/hv/connection.c | 14 +++++++-------
 drivers/hv/vmbus_drv.c  |  6 +++---
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 09829e15d4a0..dcb8f6a8c08c 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -202,7 +202,7 @@ int vmbus_connect(void)
 	 * abstraction stuff
 	 */
 	vmbus_connection.int_page =
-	(void *)__get_free_pages(GFP_KERNEL|__GFP_ZERO, 0);
+	(void *)hv_alloc_hyperv_zeroed_page();
 	if (vmbus_connection.int_page == NULL) {
 		ret = -ENOMEM;
 		goto cleanup;
@@ -211,14 +211,14 @@ int vmbus_connect(void)
 	vmbus_connection.recv_int_page = vmbus_connection.int_page;
 	vmbus_connection.send_int_page =
 		(void *)((unsigned long)vmbus_connection.int_page +
-			(PAGE_SIZE >> 1));
+			(HV_HYP_PAGE_SIZE >> 1));
 
 	/*
 	 * Setup the monitor notification facility. The 1st page for
 	 * parent->child and the 2nd page for child->parent
 	 */
-	vmbus_connection.monitor_pages[0] = (void *)__get_free_pages((GFP_KERNEL|__GFP_ZERO), 0);
-	vmbus_connection.monitor_pages[1] = (void *)__get_free_pages((GFP_KERNEL|__GFP_ZERO), 0);
+	vmbus_connection.monitor_pages[0] = (void *)hv_alloc_hyperv_zeroed_page();
+	vmbus_connection.monitor_pages[1] = (void *)hv_alloc_hyperv_zeroed_page();
 	if ((vmbus_connection.monitor_pages[0] == NULL) ||
 	    (vmbus_connection.monitor_pages[1] == NULL)) {
 		ret = -ENOMEM;
@@ -291,12 +291,12 @@ void vmbus_disconnect(void)
 		destroy_workqueue(vmbus_connection.work_queue);
 
 	if (vmbus_connection.int_page) {
-		free_pages((unsigned long)vmbus_connection.int_page, 0);
+		hv_free_hyperv_page((unsigned long)vmbus_connection.int_page);
 		vmbus_connection.int_page = NULL;
 	}
 
-	free_pages((unsigned long)vmbus_connection.monitor_pages[0], 0);
-	free_pages((unsigned long)vmbus_connection.monitor_pages[1], 0);
+	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[0]);
+	hv_free_hyperv_page((unsigned long)vmbus_connection.monitor_pages[1]);
 	vmbus_connection.monitor_pages[0] = NULL;
 	vmbus_connection.monitor_pages[1] = NULL;
 }
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index ebd35fc35290..2ee388a23c8f 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1186,7 +1186,7 @@ static void hv_kmsg_dump(struct kmsg_dumper *dumper,
 	 * Write dump contents to the page. No need to synchronize; panic should
 	 * be single-threaded.
 	 */
-	kmsg_dump_get_buffer(dumper, true, hv_panic_page, PAGE_SIZE,
+	kmsg_dump_get_buffer(dumper, true, hv_panic_page, HV_HYP_PAGE_SIZE,
 			     &bytes_written);
 	if (bytes_written)
 		hyperv_report_panic_msg(panic_pa, bytes_written);
@@ -1290,7 +1290,7 @@ static int vmbus_bus_init(void)
 		 */
 		hv_get_crash_ctl(hyperv_crash_ctl);
 		if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG) {
-			hv_panic_page = (void *)get_zeroed_page(GFP_KERNEL);
+			hv_panic_page = (void *)hv_alloc_hyperv_zeroed_page();
 			if (hv_panic_page) {
 				ret = kmsg_dump_register(&hv_kmsg_dumper);
 				if (ret)
@@ -1319,7 +1319,7 @@ static int vmbus_bus_init(void)
 	hv_remove_vmbus_irq();
 
 	bus_unregister(&hv_bus);
-	free_page((unsigned long)hv_panic_page);
+	hv_free_hyperv_page((unsigned long)hv_panic_page);
 	unregister_sysctl_table(hv_ctl_table_hdr);
 	hv_ctl_table_hdr = NULL;
 	return ret;
-- 
2.17.1
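The interrupt-page arithmetic in vmbus_connect() can be modeled in userspace: one zeroed, 4 KiB aligned Hyper-V page whose first half is the receive page and whose second half is the send page. Splitting at HV_HYP_PAGE_SIZE >> 1 instead of PAGE_SIZE >> 1 keeps the split at 2 KiB even if the guest's PAGE_SIZE were larger, which is the intent the changelog describes. `struct sketch_connection` and the `sketch_*` functions are hypothetical stand-ins for vmbus_connection, vmbus_connect() and vmbus_disconnect().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_HV_HYP_PAGE_SIZE 4096u   /* Hyper-V's fixed 4 KiB page */

struct sketch_connection {
	void *int_page;
	void *recv_int_page;        /* first half of int_page */
	void *send_int_page;        /* second half of int_page */
};

static int sketch_connect(struct sketch_connection *c)
{
	/* Zeroed, Hyper-V-page-aligned allocation. */
	c->int_page = aligned_alloc(SK_HV_HYP_PAGE_SIZE, SK_HV_HYP_PAGE_SIZE);
	if (!c->int_page)
		return -1;
	memset(c->int_page, 0, SK_HV_HYP_PAGE_SIZE);

	c->recv_int_page = c->int_page;
	c->send_int_page = (void *)((uintptr_t)c->int_page +
				    (SK_HV_HYP_PAGE_SIZE >> 1));
	return 0;
}

static void sketch_disconnect(struct sketch_connection *c)
{
	free(c->int_page);
	c->int_page = c->recv_int_page = c->send_int_page = NULL;
}
```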



* [PATCH 1/2] x86: hv: Add function to allocate zeroed page for Hyper-V
From: Himadri Pandya @ 2019-07-30  9:49 UTC
  To: mikelley, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa
  Cc: x86, linux-hyperv, linux-kernel, Himadri Pandya
In-Reply-To: <20190730094944.96007-1-himadri18.07@gmail.com>

Hyper-V assumes a page size of 4K. While this assumption holds on x86, it
might not be true on ARM64. Hence, define a Hyper-V-specific function to
allocate a zeroed page, which can have a different implementation on ARM64
to handle the conflict between Hyper-V's assumed page size and the actual
guest page size.

Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
---
 arch/x86/hyperv/hv_init.c       | 8 ++++++++
 arch/x86/include/asm/mshyperv.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index d314cf1e15fd..2d0b9b2bddf7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -45,6 +45,14 @@ void *hv_alloc_hyperv_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
 
+void *hv_alloc_hyperv_zeroed_page(void)
+{
+        BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE);
+
+        return (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
+}
+EXPORT_SYMBOL_GPL(hv_alloc_hyperv_zeroed_page);
+
 void hv_free_hyperv_page(unsigned long addr)
 {
 	free_page(addr);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index f4138aeb4280..6b79515abb82 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -219,6 +219,7 @@ static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
 void __init hyperv_init(void);
 void hyperv_setup_mmu_ops(void);
 void *hv_alloc_hyperv_page(void);
+void *hv_alloc_hyperv_zeroed_page(void);
 void hv_free_hyperv_page(unsigned long addr);
 void hyperv_reenlightenment_intr(struct pt_regs *regs);
 void set_hv_tscchange_cb(void (*cb)(void));
-- 
2.17.1
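A userspace analogue of the new x86 helper, for illustration: `static_assert` plays the role of BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE), so the build fails if the sizes ever diverge, and aligned_alloc() plus memset() stands in for `__get_free_page(GFP_KERNEL | __GFP_ZERO)`. The `SK_*` constants and `sketch_*` names are hypothetical; an ARM64 version with a larger PAGE_SIZE would need a different body, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE        4096u  /* assumed guest page size (x86) */
#define SK_HV_HYP_PAGE_SIZE 4096u  /* Hyper-V page size */

/* Compile-time guard, analogous to BUILD_BUG_ON() in the patch. */
static_assert(SK_PAGE_SIZE == SK_HV_HYP_PAGE_SIZE,
	      "this allocator assumes guest and Hyper-V page sizes match");

static void *sketch_alloc_hyperv_zeroed_page(void)
{
	void *p = aligned_alloc(SK_HV_HYP_PAGE_SIZE, SK_HV_HYP_PAGE_SIZE);

	if (p)
		memset(p, 0, SK_HV_HYP_PAGE_SIZE);
	return p;
}

static void sketch_free_hyperv_page(void *p)
{
	free(p);
}
```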



* [PATCH 0/2] Drivers: hv: Remove dependencies on guest page size
From: Himadri Pandya @ 2019-07-30  9:49 UTC
  To: mikelley, kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa
  Cc: x86, linux-hyperv, linux-kernel, Himadri Pandya

Hyper-V assumes a page size of 4KB. This might not be the case on ARM64.
The first patch in this patchset introduces a Hyper-V-specific function
for allocating a zeroed page, which can have a different implementation
on ARM64 to address the issue of different guest and host page sizes. The
second patch removes dependencies on the guest page size in vmbus by
using the Hyper-V-specific page symbol and functions.

Himadri Pandya (2):
  x86: hv: Add function to allocate zeroed page for Hyper-V
  Drivers: hv: vmbus: Remove dependencies on guest page size

 arch/x86/hyperv/hv_init.c       |  8 ++++++++
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c         | 14 +++++++-------
 drivers/hv/vmbus_drv.c          |  6 +++---
 4 files changed, 19 insertions(+), 10 deletions(-)

-- 
2.17.1



* RE: [PATCH net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-07-29 20:23 UTC
  To: Sunil Muthuswamy, David Miller, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <MW2PR2101MB1116DC8461F1B02C232019E2C0DD0@MW2PR2101MB1116.namprd21.prod.outlook.com>

> From: Sunil Muthuswamy <sunilmut@microsoft.com>
> Sent: Monday, July 29, 2019 10:21 AM
> > --- a/net/vmw_vsock/hyperv_transport.c
> > +++ b/net/vmw_vsock/hyperv_transport.c
> > @@ -309,9 +309,16 @@ static void hvs_close_connection(struct vmbus_channel *chan)
> >  {
> >  	struct sock *sk = get_per_channel_state(chan);
> >
> > +	/* Grab an extra reference since hvs_do_close_lock_held() may decrease
> > +	 * the reference count to 0 by calling sock_put(sk).
> > +	 */
> > +	sock_hold(sk);
> > +
> 
> To me, it seems like when 'hvs_close_connection' is called, there should always
> be an outstanding reference to the socket. 

I agree. There *should* be, but it turns out there is a race condition:

For an established connection that is being closed by the guest, the refcnt
is 4 at the end of hvs_release() (Note: here 'remove_sock' is false):

1 for the initial value;
1 for the sk being in the bound list;
1 for the sk being in the connected list;
1 for the delayed close_work.

After hvs_release() finishes, __vsock_release() -> sock_put(sk) *may* decrease
the refcnt to 3.

Concurrently, hvs_close_connection() runs in another thread:
  calls vsock_remove_sock() to decrease the refcnt by 2;
  calls sock_put() to decrease the refcnt to 0, which frees the sk;
  next, release_sock(sk) may hang due to the use-after-free.

In the above, after hvs_release() finishes, if hvs_close_connection() runs
faster than "__vsock_release() -> sock_put(sk)", then there is not any issue,
because at the beginning of hvs_close_connection(), the refcnt is still 4.
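The refcount arithmetic above can be replayed with a toy model: an integer refcount, a flag set when it reaches zero, and a flag recording whether release_sock() touched the socket after that. The `sketch_*`/`sk_*` names are hypothetical stand-ins for sock_hold(), sock_put() and release_sock(); `with_fix` adds the reference taken in hvs_open_connection() and dropped at the end of hvs_close_connection(), as in the patch below.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket: a refcount plus flags recording free and use-after-free. */
struct sketch_sock {
	int refcnt;
	bool freed;
	bool used_after_free;
};

static void sk_hold(struct sketch_sock *sk) { sk->refcnt++; }

static void sk_put(struct sketch_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;        /* stands in for __sk_destruct() */
}

/* release_sock() takes the socket lock, so it needs a live socket. */
static void sk_release(struct sketch_sock *sk)
{
	if (sk->freed)
		sk->used_after_free = true;
}

/* Replays the bad interleaving: __vsock_release()'s put runs first,
 * then hvs_close_connection() runs to completion. */
static struct sketch_sock replay(bool with_fix)
{
	/* initial + bound list + connected list + delayed close_work */
	struct sketch_sock sk = { .refcnt = 4 };

	if (with_fix)
		sk_hold(&sk);            /* taken in hvs_open_connection() */

	sk_put(&sk);                 /* __vsock_release() -> sock_put() */

	/* hvs_close_connection(): */
	sk_put(&sk);                 /* vsock_remove_sock(): bound list */
	sk_put(&sk);                 /* vsock_remove_sock(): connected list */
	sk_put(&sk);                 /* sock_put() in hvs_do_close_lock_held() */
	sk_release(&sk);             /* release_sock(sk) */
	if (with_fix)
		sk_put(&sk);             /* the fix's final sock_put() */
	return sk;
}
```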

So, this patch can work, but it's not the right fix. 
Your suggestion is correct and here is the patch. 
I'll give it more tests and send a v2.

--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -312,6 +312,11 @@ static void hvs_close_connection(struct vmbus_channel *chan)
        lock_sock(sk);
        hvs_do_close_lock_held(vsock_sk(sk), true);
        release_sock(sk);
+
+       /* Release the refcnt for the channel that's opened in
+        * hvs_open_connection().
+        */
+       sock_put(sk);
 }

 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -407,6 +412,9 @@ static void hvs_open_connection(struct vmbus_channel *chan)
        }

        set_per_channel_state(chan, conn_from_host ? new : sk);
+
+       /* This reference will be dropped by hvs_close_connection(). */
+       sock_hold(conn_from_host ? new: sk);
        vmbus_set_chn_rescind_callback(chan, hvs_close_connection);

        /* Set the pending send size to max packet size to always get


> The reference that is dropped by
> ' hvs_do_close_lock_held' is a legitimate reference that was taken by
> 'hvs_close_lock_held'.

Correct.

> Or, in other words, I think the right solution is to always maintain a reference to
> the socket until this routine is called and drop that here. That can be done by
> taking the reference to the socket prior to 'vmbus_set_chn_rescind_callback(chan,
> hvs_close_connection)' and dropping that reference at the end of 'hvs_close_connection'.
> 
> >  	lock_sock(sk);
> >  	hvs_do_close_lock_held(vsock_sk(sk), true);
> >  	release_sock(sk);
> > +
> > +	sock_put(sk);
> 
> Thanks for taking a look at this. We should queue this fix and the other
> hvsocket fixes for the stable branch.

I added a "Cc: stable@vger.kernel.org" tag so this patch will go to the
stable kernels automatically.

Your previous two fixes are in the v5.2.4 stable kernel, but not in the other
longterm stable kernels 4.19 and 4.14:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=v5.2.4&qt=author&q=Muthuswamy

I'll request them to be backported for 4.19 and 4.14.
I'll also request the patch "vsock: correct removal of socket from the list"
to be backported.

The other two "hv_sock: perf" patches are features rather than fixes.
Usually the stable kernel maintainers don't backport feature patches.

Thanks,
-- Dexuan


* RE: [PATCH net] hv_sock: Fix hang when a connection is closed
From: Sunil Muthuswamy @ 2019-07-29 17:21 UTC
  To: Dexuan Cui, David Miller, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com
In-Reply-To: <PU1P153MB01690A7767ECDF420FF78D66BFC20@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>



> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Sunday, July 28, 2019 11:32 AM
> To: Sunil Muthuswamy <sunilmut@microsoft.com>; David Miller <davem@davemloft.net>; netdev@vger.kernel.org
> Cc: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; sashal@kernel.org; Michael Kelley <mikelley@microsoft.com>;
> linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org; olaf@aepfle.de; apw@canonical.com;
> jasowang@redhat.com; vkuznets <vkuznets@redhat.com>; marcelo.cerri@canonical.com
> Subject: [PATCH net] hv_sock: Fix hang when a connection is closed
> 
> 
> hvs_do_close_lock_held() may decrease the reference count to 0 and free the
> sk struct completely, and then the following release_sock(sk) may hang.
> 
> Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Cc: stable@vger.kernel.org
> 
> ---
> With the proper kernel debugging options enabled, first a warning can
> appear:
> 
> kworker/1:0/4467 is freeing memory ..., with a lock still held there!
> stack backtrace:
> Workqueue: events vmbus_onmessage_work [hv_vmbus]
> Call Trace:
>  dump_stack+0x67/0x90
>  debug_check_no_locks_freed.cold.52+0x78/0x7d
>  slab_free_freelist_hook+0x85/0x140
>  kmem_cache_free+0xa5/0x380
>  __sk_destruct+0x150/0x260
>  hvs_close_connection+0x24/0x30 [hv_sock]
>  vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
>  process_one_work+0x241/0x600
>  worker_thread+0x3c/0x390
>  kthread+0x11b/0x140
>  ret_from_fork+0x24/0x30
> 
> and then the following release_sock(sk) can hang:
> 
> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:4467]
> ...
> irq event stamp: 62890
> CPU: 1 PID: 4467 Comm: kworker/1:0 Tainted: G        W         5.2.0+ #39
> Workqueue: events vmbus_onmessage_work [hv_vmbus]
> RIP: 0010:queued_spin_lock_slowpath+0x2b/0x1e0
> ...
> Call Trace:
>  do_raw_spin_lock+0xab/0xb0
>  release_sock+0x19/0xb0
>  vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
>  process_one_work+0x241/0x600
>  worker_thread+0x3c/0x390
>  kthread+0x11b/0x140
>  ret_from_fork+0x24/0x30
> 
>  net/vmw_vsock/hyperv_transport.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
> index f2084e3f7aa4..efbda8ef1eff 100644
> --- a/net/vmw_vsock/hyperv_transport.c
> +++ b/net/vmw_vsock/hyperv_transport.c
> @@ -309,9 +309,16 @@ static void hvs_close_connection(struct vmbus_channel *chan)
>  {
>  	struct sock *sk = get_per_channel_state(chan);
> 
> +	/* Grab an extra reference since hvs_do_close_lock_held() may decrease
> +	 * the reference count to 0 by calling sock_put(sk).
> +	 */
> +	sock_hold(sk);
> +

To me, it seems like when 'hvs_close_connection' is called, there should always be
an outstanding reference to the socket. The reference that is dropped by
'hvs_do_close_lock_held' is a legitimate reference that was taken by 'hvs_close_lock_held'.
In other words, I think the right solution is to always hold a reference to the socket
until this routine is called and drop it here. That can be done by taking the reference to
the socket prior to 'vmbus_set_chn_rescind_callback(chan, hvs_close_connection)' and
dropping that reference at the end of 'hvs_close_connection'.
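The alternative suggested here moves ownership of the extra reference to the callback registration itself. A minimal userspace sketch of that idea, assuming a plain int reference count; fake_sock, fake_channel and the fake_* helpers are hypothetical stand-ins for struct sock, struct vmbus_channel, sock_hold()/sock_put() and vmbus_set_chn_rescind_callback(), not the kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct sock or vmbus_channel. */
struct fake_sock {
	int refcnt;
};

struct fake_channel {
	struct fake_sock *sk;
	void (*rescind_cb)(struct fake_channel *chan);
};

/* Number of sockets freed so far, so the behaviour can be observed. */
static int fake_freed;

static void fake_sock_hold(struct fake_sock *sk)
{
	sk->refcnt++;
}

static void fake_sock_put(struct fake_sock *sk)
{
	if (--sk->refcnt == 0) {
		fake_freed++;
		free(sk);
	}
}

/* The pending callback owns a reference from the moment it is registered. */
static void fake_set_rescind_callback(struct fake_channel *chan,
				      struct fake_sock *sk,
				      void (*cb)(struct fake_channel *chan))
{
	fake_sock_hold(sk);
	chan->sk = sk;
	chan->rescind_cb = cb;
}

/* Models hvs_do_close_lock_held() dropping some other legitimate reference. */
static void fake_do_close(struct fake_sock *sk)
{
	fake_sock_put(sk);
}

static void fake_close_connection(struct fake_channel *chan)
{
	struct fake_sock *sk = chan->sk;

	fake_do_close(sk);
	assert(sk->refcnt >= 1); /* still alive: the callback's reference remains */
	chan->sk = NULL;
	fake_sock_put(sk);       /* drop the reference taken at registration */
}
```

With this ownership model the callback never needs to take a reference of its own; the trade-off is that every path that unregisters the callback without running it must also drop that reference.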

>  	lock_sock(sk);
>  	hvs_do_close_lock_held(vsock_sk(sk), true);
>  	release_sock(sk);
> +
> +	sock_put(sk);
>  }
> 
>  static void hvs_open_connection(struct vmbus_channel *chan)
> --
> 2.19.1

Thanks for taking a look at this. We should queue this fix and the other hvsocket fixes
for the stable branch.

^ permalink raw reply

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Vitaly Kuznetsov @ 2019-07-29 12:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv, linux-kernel,
	luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
In-Reply-To: <20190729110927.GC31398@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
>> lantianyu1986@gmail.com writes:
>> 
>> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>> >
>> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
>> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
>> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
>> > to set the sched clock function appropriately.  On x86, this sets
>> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
>> > scaled and adjusted to be continuous.
>> 
>> The hypervisor can, in theory, disable the TSC page, and then we're forced
>> to use the MSR-based clocksource, but using it as sched_clock() can be very
>> slow, I'm afraid.
>> 
>> On the other hand, what we have now is probably worse: TSC can,
>> actually, jump backwards (e.g. on migration) and we're breaking the
>> requirements for sched_clock().
>
> That (obviously) also breaks the requirements for using TSC as
> clocksource.
>
> IOW, it breaks the entire purpose of having TSC in the first place.

Currently, we mark the raw TSC as unstable when running on Hyper-V (see
88c9281a9fba6) and use the 'TSC page' (which is TSC * scale + offset)
instead. The problem is that the 'TSC page' can be disabled by the
hypervisor, and in that case the only remaining clocksource is MSR-based
(slow).
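To make "TSC * scale + offset" concrete: the Hyper-V TLFS defines the scale as a 64.64 fixed-point value, so the reference time is the upper 64 bits of the 128-bit product plus the offset, and a sequence number of 0 marks the page as unusable. A minimal single-threaded sketch, assuming a GCC/Clang `unsigned __int128`; fake_tsc_page and the fake_* functions are hypothetical, and the fallback models the slow MSR read described above:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the TSC page header fields (layout assumed). */
struct fake_tsc_page {
	uint32_t tsc_sequence; /* 0: page invalid, callers must fall back */
	uint32_t reserved1;
	uint64_t tsc_scale;    /* 64.64 fixed-point multiplier */
	int64_t  tsc_offset;   /* added after scaling, in 100 ns units */
};

/* Hypothetical stand-in for the MSR-based reference counter. */
static uint64_t fake_msr_ref_count;

static uint64_t fake_read_msr_ref_count(void)
{
	return fake_msr_ref_count;
}

/*
 * Reference time = ((tsc * scale) >> 64) + offset, in 100 ns units.
 * The kernel's reader also re-reads tsc_sequence after sampling and retries
 * if it changed; that loop is omitted because this sketch is single-threaded.
 */
static uint64_t fake_reference_time(const struct fake_tsc_page *pg,
				    uint64_t tsc)
{
	unsigned __int128 prod;

	if (pg->tsc_sequence == 0)
		return fake_read_msr_ref_count(); /* TSC page disabled: slow path */

	prod = (unsigned __int128)tsc * pg->tsc_scale;
	return (uint64_t)(prod >> 64) + (uint64_t)pg->tsc_offset;
}
```

With a scale of 2^63 (0.5) and an offset of 1000, a raw TSC of 2000 maps to 2000; clearing tsc_sequence switches every read to the fallback path.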

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Peter Zijlstra @ 2019-07-29 11:09 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: lantianyu1986, Tianyu Lan, linux-arch, linux-hyperv, linux-kernel,
	luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
In-Reply-To: <87zhkxksxd.fsf@vitty.brq.redhat.com>

On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> lantianyu1986@gmail.com writes:
> 
> > From: Tianyu Lan <Tianyu.Lan@microsoft.com>
> >
> > Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> > on x86.  But native_sched_clock() directly uses the raw TSC value, which
> > can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> > to set the sched clock function appropriately.  On x86, this sets
> > pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> > scaled and adjusted to be continuous.
> 
> The hypervisor can, in theory, disable the TSC page, and then we're forced
> to use the MSR-based clocksource, but using it as sched_clock() can be very
> slow, I'm afraid.
> 
> On the other hand, what we have now is probably worse: TSC can,
> actually, jump backwards (e.g. on migration) and we're breaking the
> requirements for sched_clock().

That (obviously) also breaks the requirements for using TSC as
clocksource.

IOW, it breaks the entire purpose of having TSC in the first place.

^ permalink raw reply

* Re: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Vitaly Kuznetsov @ 2019-07-29 10:59 UTC (permalink / raw)
  To: lantianyu1986
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel, luto, tglx,
	mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
In-Reply-To: <20190729075243.22745-1-Tianyu.Lan@microsoft.com>

lantianyu1986@gmail.com writes:

> From: Tianyu Lan <Tianyu.Lan@microsoft.com>
>
> Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
> on x86.  But native_sched_clock() directly uses the raw TSC value, which
> can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
> to set the sched clock function appropriately.  On x86, this sets
> pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
> scaled and adjusted to be continuous.

The hypervisor can, in theory, disable the TSC page, and then we're forced
to use the MSR-based clocksource, but using it as sched_clock() can be very
slow, I'm afraid.

On the other hand, what we have now is probably worse: TSC can,
actually, jump backwards (e.g. on migration) and we're breaking the
requirements for sched_clock().

-- 
Vitaly

^ permalink raw reply

* [PATCH 2/2] clocksource/Hyper-V:  Add Hyper-V specific sched clock function
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: kys, haiyangz, sthemmin, sashal, tglx, mingo, bp, hpa, x86,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-hyperv, linux-kernel, linux-arch
In-Reply-To: <20190729075243.22745-1-Tianyu.Lan@microsoft.com>

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM. Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately. On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.
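The offset handling can be pictured as follows: the counter value is captured once when the clock is installed, and every later read is relative to it, so sched_clock starts near zero rather than at the hypervisor's reference time. A minimal userspace sketch with hypothetical fake_* names standing in for hv_sched_clock_offset, read_hv_sched_clock_tsc(), hv_setup_sched_clock() and pv_ops.time.sched_clock. One deliberate difference from the diff: x86 sched_clock() is expected to return nanoseconds, while the reference counter ticks in 100 ns units (the clocksource is registered at NSEC_PER_SEC/100), so this sketch multiplies by 100, whereas read_hv_sched_clock_tsc() in the diff only subtracts the offset.

```c
#include <assert.h>
#include <stdint.h>

/* The reference counter ticks at 10 MHz (NSEC_PER_SEC/100), i.e. 100 ns. */
#define FAKE_NS_PER_TICK 100ULL

/* Hypothetical settable stand-in for read_hv_clock_tsc(). */
static uint64_t fake_ref_counter;

static uint64_t fake_read_ref_counter(void)
{
	return fake_ref_counter;
}

/* Stand-in for pv_ops.time.sched_clock. */
static uint64_t (*fake_pv_sched_clock)(void);

/* Captured once at init, like hv_sched_clock_offset. */
static uint64_t fake_sched_clock_offset;

static uint64_t fake_read_sched_clock(void)
{
	/* Time since init, converted from 100 ns ticks to nanoseconds. */
	return (fake_read_ref_counter() - fake_sched_clock_offset) *
	       FAKE_NS_PER_TICK;
}

static void fake_setup_sched_clock(uint64_t (*sched_clock)(void))
{
	fake_pv_sched_clock = sched_clock;
}

static void fake_init_clocksource(void)
{
	fake_sched_clock_offset = fake_read_ref_counter();
	fake_setup_sched_clock(fake_read_sched_clock);
}
```

Capturing the offset as early as possible is what the patch aims for by moving hv_init_clocksource() into ms_hyperv_init_platform(): the earlier the offset is taken, the smaller the jump when the sched clock function is switched.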

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 22 ++++++++++++----------
 include/asm-generic/mshyperv.h     |  1 +
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0d258688c8cf..866dfb3dca48 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -301,8 +301,6 @@ void __init hyperv_init(void)
 
 	x86_init.pci.arch_init = hv_pci_init;
 
-	/* Register Hyper-V specific clocksource */
-	hv_init_clocksource();
 	return;
 
 remove_cpuhp_state:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 062f77279ce3..53afd33990eb 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -29,6 +29,7 @@
 #include <asm/timer.h>
 #include <asm/reboot.h>
 #include <asm/nmi.h>
+#include <clocksource/hyperv_timer.h>
 
 struct ms_hyperv_info ms_hyperv;
 EXPORT_SYMBOL_GPL(ms_hyperv);
@@ -338,9 +339,16 @@ static void __init ms_hyperv_init_platform(void)
 		x2apic_phys = 1;
 # endif
 
+	/* Register Hyper-V specific clocksource */
+	hv_init_clocksource();
 #endif
 }
 
+void hv_setup_sched_clock(void *sched_clock)
+{
+	pv_ops.time.sched_clock = sched_clock;
+}
+
 const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = {
 	.name			= "Microsoft Hyper-V",
 	.detect			= ms_hyperv_platform,
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 86764ec9a854..eafca89b44d7 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -215,6 +215,7 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 #ifdef CONFIG_HYPERV_TSCPAGE
 
 static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
+static u64 hv_sched_clock_offset __ro_after_init;
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
@@ -222,7 +223,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 notrace read_hv_clock_tsc(struct clocksource *arg)
 {
 	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
@@ -232,9 +233,9 @@ static u64 notrace read_hv_sched_clock_tsc(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_sched_clock_tsc(void)
 {
-	return read_hv_sched_clock_tsc();
+	return read_hv_clock_tsc(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_tsc = {
@@ -246,7 +247,7 @@ static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 notrace read_hv_clock_msr(struct clocksource *arg)
 {
 	u64 current_tick;
 	/*
@@ -258,9 +259,9 @@ static u64 notrace read_hv_sched_clock_msr(void)
 	return current_tick;
 }
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_sched_clock_msr(void)
 {
-	return read_hv_sched_clock_msr();
+	return read_hv_clock_msr(NULL) - hv_sched_clock_offset;
 }
 
 static struct clocksource hyperv_cs_msr = {
@@ -298,8 +299,9 @@ static bool __init hv_init_tsc_clocksource(void)
 	hv_set_clocksource_vdso(hyperv_cs_tsc);
 	clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_tsc, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_tsc);
+
 	return true;
 }
 #else
@@ -329,7 +331,7 @@ void __init hv_init_clocksource(void)
 	hyperv_cs = &hyperv_cs_msr;
 	clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
 
-	/* sched_clock_register is needed on ARM64 but is a no-op on x86 */
-	sched_clock_register(read_hv_sched_clock_msr, 64, HV_CLOCK_HZ);
+	hv_sched_clock_offset = hyperv_cs->read(hyperv_cs);
+	hv_setup_sched_clock(read_hv_sched_clock_msr);
 }
 EXPORT_SYMBOL_GPL(hv_init_clocksource);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 0becb7d9704d..18d8e2d8210f 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -167,6 +167,7 @@ void hyperv_report_panic(struct pt_regs *regs, long err);
 void hyperv_report_panic_msg(phys_addr_t pa, size_t size);
 bool hv_is_hyperv_initialized(void);
 void hyperv_cleanup(void);
+void hv_setup_sched_clock(void *sched_clock);
 #else /* CONFIG_HYPERV */
 static inline bool hv_is_hyperv_initialized(void) { return false; }
 static inline void hyperv_cleanup(void) {}
-- 
2.14.5


^ permalink raw reply related

* [PATCH 1/2] clocksource/Hyper-v: Allocate Hyper-V tsc page statically
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-kernel, linux-hyperv, linux-arch
In-Reply-To: <20190729075243.22745-1-Tianyu.Lan@microsoft.com>

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

This prepares for adding a Hyper-V sched clock callback and for moving
the Hyper-V reference TSC initialization much earlier in the boot
process, when the timestamp is still 0, so that no discontinuity is
observed when pv_ops.time.sched_clock calculates its offset. This earlier
initialization requires that the Hyper-V TSC page be allocated
statically instead of with vmalloc(), so fix up the references
to the TSC page and the method of getting its physical address.
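The address arithmetic the patch relies on can be sketched in userspace, with the caveat that virtual addresses stand in for physical ones here: a page-aligned static object starts exactly on a page boundary, so masking with PAGE_MASK leaves its address unchanged and shifting by PAGE_SHIFT yields its frame number. The FAKE_PAGE_* macros and fake_* names are hypothetical; the 4 KiB page size and `__attribute__((aligned))` (a GCC/Clang extension) are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_PAGE_SHIFT 12
#define FAKE_PAGE_SIZE  (1UL << FAKE_PAGE_SHIFT)
#define FAKE_PAGE_MASK  (~(FAKE_PAGE_SIZE - 1))

/* Hypothetical mirror of a one-page TSC page (field layout assumed). */
struct fake_tsc_page {
	uint32_t tsc_sequence;
	uint32_t reserved1;
	uint64_t tsc_scale;
	int64_t  tsc_offset;
	uint64_t reserved2[509]; /* pads the structure to exactly 4 KiB */
};

_Static_assert(sizeof(struct fake_tsc_page) == FAKE_PAGE_SIZE,
	       "the TSC page must occupy exactly one page");

/* Static and page-aligned: exists before any allocator is available. */
static struct fake_tsc_page tsc_pg __attribute__((aligned(4096)));

/* Frame number of the page containing p (cf. virt_to_phys(...) >> PAGE_SHIFT). */
static uintptr_t fake_page_frame(const void *p)
{
	return (uintptr_t)p >> FAKE_PAGE_SHIFT;
}

/* Start of the page containing p (cf. virt_to_phys(...) & PAGE_MASK). */
static uintptr_t fake_page_base(const void *p)
{
	return (uintptr_t)p & FAKE_PAGE_MASK;
}
```

In the kernel, the same arithmetic yields the right physical frame only because virt_to_phys() can translate the address of a static object in the kernel image (on x86-64); it cannot do so for vmalloc() memory, which is why the old code needed vmalloc_to_page() and vmalloc_to_pfn().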

Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
---
 arch/x86/entry/vdso/vma.c          |  2 +-
 drivers/clocksource/hyperv_timer.c | 12 ++++--------
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 349a61d8bf34..f5937742b290 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -122,7 +122,7 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
 
 		if (tsc_pg && vclock_was_used(VCLOCK_HVCLOCK))
 			return vmf_insert_pfn(vma, vmf->address,
-					vmalloc_to_pfn(tsc_pg));
+					virt_to_phys(tsc_pg) >> PAGE_SHIFT);
 	}
 
 	return VM_FAULT_SIGBUS;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index ba2c79e6a0ee..86764ec9a854 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -214,17 +214,17 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 
 #ifdef CONFIG_HYPERV_TSCPAGE
 
-static struct ms_hyperv_tsc_page *tsc_pg;
+static struct ms_hyperv_tsc_page tsc_pg __aligned(PAGE_SIZE);
 
 struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 {
-	return tsc_pg;
+	return &tsc_pg;
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
 static u64 notrace read_hv_sched_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(tsc_pg);
+	u64 current_tick = hv_read_tsc_page(&tsc_pg);
 
 	if (current_tick == U64_MAX)
 		hv_get_time_ref_count(current_tick);
@@ -280,12 +280,8 @@ static bool __init hv_init_tsc_clocksource(void)
 	if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
 		return false;
 
-	tsc_pg = vmalloc(PAGE_SIZE);
-	if (!tsc_pg)
-		return false;
-
 	hyperv_cs = &hyperv_cs_tsc;
-	phys_addr = page_to_phys(vmalloc_to_page(tsc_pg));
+	phys_addr = virt_to_phys(&tsc_pg) & PAGE_MASK;
 
 	/*
 	 * The Hyper-V TLFS specifies to preserve the value of reserved
-- 
2.14.5


^ permalink raw reply related

* [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: lantianyu1986 @ 2019-07-29  7:52 UTC (permalink / raw)
  To: luto, tglx, mingo, bp, hpa, x86, kys, haiyangz, sthemmin, sashal,
	daniel.lezcano, arnd, michael.h.kelley, ashal
  Cc: Tianyu Lan, linux-arch, linux-hyperv, linux-kernel

From: Tianyu Lan <Tianyu.Lan@microsoft.com>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately.  On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.

Also move the Hyper-V reference TSC initialization much earlier in the boot
process so no discontinuity is observed when pv_ops.time.sched_clock
calculates its offset.  This earlier initialization requires that the Hyper-V TSC
page be allocated statically instead of with vmalloc(), so fix up the references
to the TSC page and the method of getting its physical address.

Tianyu Lan (2):
  clocksource/Hyper-v: Allocate Hyper-V tsc page statically
  clocksource/Hyper-V: Add Hyper-V specific sched clock function

 arch/x86/entry/vdso/vma.c          |  2 +-
 arch/x86/hyperv/hv_init.c          |  2 --
 arch/x86/kernel/cpu/mshyperv.c     |  8 ++++++++
 drivers/clocksource/hyperv_timer.c | 34 ++++++++++++++++------------------
 include/asm-generic/mshyperv.h     |  1 +
 5 files changed, 26 insertions(+), 21 deletions(-)

-- 
2.14.5


^ permalink raw reply

* [PATCH net] hv_sock: Fix hang when a connection is closed
From: Dexuan Cui @ 2019-07-28 18:32 UTC (permalink / raw)
  To: Sunil Muthuswamy, David Miller, netdev@vger.kernel.org
  Cc: KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal@kernel.org, Michael Kelley, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, vkuznets, marcelo.cerri@canonical.com


hvs_do_close_lock_held() may decrease the reference count to 0 and free the
sk struct completely, and then the following release_sock(sk) may hang.
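The fix follows a common reference-counting rule: a callback that may drop the last reference to an object must pin the object itself before doing so. A minimal userspace model, assuming a plain int reference count and a boolean lock flag; fake_sock and the fake_* helpers are hypothetical stand-ins for struct sock, sock_hold(), sock_put(), lock_sock() and release_sock(), not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock: a reference count and a lock flag. */
struct fake_sock {
	int refcnt;
	bool locked;
};

/* Number of objects freed so far, so the behaviour can be observed. */
static int fake_freed;

static struct fake_sock *fake_sock_alloc(void)
{
	struct fake_sock *sk = calloc(1, sizeof(*sk));

	if (sk)
		sk->refcnt = 1; /* the creator's reference */
	return sk;
}

static void fake_sock_hold(struct fake_sock *sk)
{
	sk->refcnt++;
}

static void fake_sock_put(struct fake_sock *sk)
{
	if (--sk->refcnt == 0) {
		assert(!sk->locked); /* freeing a locked object is the reported bug */
		fake_freed++;
		free(sk);
	}
}

static void fake_lock_sock(struct fake_sock *sk)
{
	sk->locked = true;
}

static void fake_release_sock(struct fake_sock *sk)
{
	assert(sk->refcnt > 0); /* sanity check: the caller still holds a reference */
	sk->locked = false;
}

/* Models hvs_do_close_lock_held(): it may drop what was the last reference. */
static void fake_do_close_lock_held(struct fake_sock *sk)
{
	fake_sock_put(sk);
}

/* The hold/put pattern used by the fix. */
static void fake_close_connection(struct fake_sock *sk)
{
	fake_sock_hold(sk);          /* pin sk for the whole callback */
	fake_lock_sock(sk);
	fake_do_close_lock_held(sk); /* cannot reach zero: we hold a reference */
	fake_release_sock(sk);       /* sk is guaranteed to still be valid */
	fake_sock_put(sk);           /* the final free, if any, happens unlocked */
}
```

Removing the fake_sock_hold()/fake_sock_put() pair from fake_close_connection() makes the assertion in fake_sock_put() fire for a socket with a single reference, mirroring the "freeing memory ... with a lock still held" warning.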

Fixes: a9eeb998c28d ("hv_sock: Add support for delayed close")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org

---
With the proper kernel debugging options enabled, first a warning can
appear:

kworker/1:0/4467 is freeing memory ..., with a lock still held there!
stack backtrace:
Workqueue: events vmbus_onmessage_work [hv_vmbus]
Call Trace:
 dump_stack+0x67/0x90
 debug_check_no_locks_freed.cold.52+0x78/0x7d
 slab_free_freelist_hook+0x85/0x140
 kmem_cache_free+0xa5/0x380
 __sk_destruct+0x150/0x260
 hvs_close_connection+0x24/0x30 [hv_sock]
 vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
 process_one_work+0x241/0x600
 worker_thread+0x3c/0x390
 kthread+0x11b/0x140
 ret_from_fork+0x24/0x30

and then the following release_sock(sk) can hang:

watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:4467]
...
irq event stamp: 62890
CPU: 1 PID: 4467 Comm: kworker/1:0 Tainted: G        W         5.2.0+ #39
Workqueue: events vmbus_onmessage_work [hv_vmbus]
RIP: 0010:queued_spin_lock_slowpath+0x2b/0x1e0
...
Call Trace:
 do_raw_spin_lock+0xab/0xb0
 release_sock+0x19/0xb0
 vmbus_onmessage_work+0x1d/0x30 [hv_vmbus]
 process_one_work+0x241/0x600
 worker_thread+0x3c/0x390
 kthread+0x11b/0x140
 ret_from_fork+0x24/0x30

 net/vmw_vsock/hyperv_transport.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index f2084e3f7aa4..efbda8ef1eff 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -309,9 +309,16 @@ static void hvs_close_connection(struct vmbus_channel *chan)
 {
 	struct sock *sk = get_per_channel_state(chan);
 
+	/* Grab an extra reference since hvs_do_close_lock_held() may decrease
+	 * the reference count to 0 by calling sock_put(sk).
+	 */
+	sock_hold(sk);
+
 	lock_sock(sk);
 	hvs_do_close_lock_held(vsock_sk(sk), true);
 	release_sock(sk);
+
+	sock_put(sk);
 }
 
 static void hvs_open_connection(struct vmbus_channel *chan)
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-28  4:06 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
	linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

Hi Himadri,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-rc1-7-g2b96cd8-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   include/linux/sched.h:609:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:609:73: sparse: sparse: invalid named zero-width bitfield `value'
   include/linux/sched.h:610:43: sparse: sparse: bad integer constant expression
   include/linux/sched.h:610:67: sparse: sparse: invalid named zero-width bitfield `bucket_id'
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    right side has type int
   net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse: sparse: incompatible types for operation (-)
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    left side has type bad type
>> net/vmw_vsock/hyperv_transport.c:214:39: sparse:    right side has type int
   net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:65:17: sparse: sparse: bad constant expression type
   net/vmw_vsock/hyperv_transport.c:387:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:388:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
>> net/vmw_vsock/hyperv_transport.c:390:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:391:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:392:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:393:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:394:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:395:26: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:465:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:466:25: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:666:9: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: undefined identifier 'HV_HYP_PAGE_SIZE'
   net/vmw_vsock/hyperv_transport.c:681:28: sparse: sparse: cast from unknown type

vim +214 net/vmw_vsock/hyperv_transport.c

ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   59  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   60  struct hvs_send_buf {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   61  	/* The header before the payload data */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   62  	struct vmpipe_proto_header hdr;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   63  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   64  	/* The payload */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  @65  	u8 data[HVS_SEND_BUF_SIZE];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   66  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   67  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   68  #define HVS_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   69  			 sizeof(struct vmpipe_proto_header))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   70  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   71  /* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write(), and
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   72   * __hv_pkt_iter_next().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   73   */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   74  #define VMBUS_PKT_TRAILER_SIZE	(sizeof(u64))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   75  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   76  #define HVS_PKT_LEN(payload_len)	(HVS_HEADER_LEN + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   77  					 ALIGN((payload_len), 8) + \
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   78  					 VMBUS_PKT_TRAILER_SIZE)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   79  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   80  union hvs_service_id {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   81  	uuid_le	srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   82  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   83  	struct {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   84  		unsigned int svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   85  		unsigned char b[sizeof(uuid_le) - sizeof(unsigned int)];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   86  	};
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   87  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   88  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   89  /* Per-socket state (accessed via vsk->trans) */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   90  struct hvsock {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   91  	struct vsock_sock *vsk;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   92  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   93  	uuid_le vm_srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   94  	uuid_le host_srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   95  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   96  	struct vmbus_channel *chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   97  	struct vmpacket_descriptor *recv_desc;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   98  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26   99  	/* The length of the payload not delivered to userland yet */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  100  	u32 recv_data_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  101  	/* The offset of the payload */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  102  	u32 recv_data_off;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  103  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  104  	/* Have we sent the zero-length packet (FIN)? */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  105  	bool fin_sent;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  106  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  107  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  108  /* In the VM, we support Hyper-V Sockets with AF_VSOCK, and the endpoint is
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  109   * <cid, port> (see struct sockaddr_vm). Note: cid is not really used here:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  110   * when we write apps to connect to the host, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  111   * or VMADDR_CID_HOST (both are equivalent) as the remote cid, and when we
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  112   * write apps to bind() & listen() in the VM, we can only use VMADDR_CID_ANY
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  113   * as the local cid.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  114   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  115   * On the host, Hyper-V Sockets are supported by Winsock AF_HYPERV:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  116   * https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  117   * guide/make-integration-service, and the endpoint is <VmID, ServiceId> with
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  118   * the below sockaddr:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  119   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  120   * struct SOCKADDR_HV
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  121   * {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  122   *    ADDRESS_FAMILY Family;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  123   *    USHORT Reserved;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  124   *    GUID VmId;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  125   *    GUID ServiceId;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  126   * };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  127   * Note: VmID is not used by Linux VM and actually it isn't transmitted via
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  128   * VMBus, because here it's obvious the host and the VM can easily identify
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  129   * each other. Though the VmID is useful on the host, especially in the case
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  130   * of Windows container, Linux VM doesn't need it at all.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  131   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  132   * To make use of the AF_VSOCK infrastructure in Linux VM, we have to limit
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  133   * the available GUID space of SOCKADDR_HV so that we can create a mapping
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  134   * between AF_VSOCK port and SOCKADDR_HV Service GUID. The rule of writing
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  135   * Hyper-V Sockets apps on the host and in Linux VM is:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  136   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  137   ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  138   * The only valid Service GUIDs, from the perspectives of both the host and *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  139   * Linux VM, that can be connected by the other end, must conform to this   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  140   * format: <port>-facb-11e6-bd58-64006a7986d3, and the "port" must be in    *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  141   * this range [0, 0x7FFFFFFF].                                              *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  142   ****************************************************************************
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  143   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  144   * When we write apps on the host to connect(), the GUID ServiceID is used.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  145   * When we write apps in Linux VM to connect(), we only need to specify the
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  146   * port and the driver will form the GUID and use that to request the host.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  147   *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  148   * From the perspective of Linux VM:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  149   * 1. the local ephemeral port (i.e. the local auto-bound port when we call
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  150   * connect() without explicit bind()) is generated by __vsock_bind_stream(),
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  151   * and the range is [1024, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  152   * 2. the remote ephemeral port (i.e. the auto-generated remote port for
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  153   * a connect request initiated by the host's connect()) is generated by
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  154   * hvs_remote_addr_init() and the range is [0x80000000, 0xFFFFFFFF).
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  155   */
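The port/GUID mapping rule above can be illustrated with a small user-space sketch. It assumes a 16-byte stand-in for the kernel's `uuid_le` whose first 32-bit field is stored little-endian, which is how `UUID_LE()` lays it out. All `sketch_*` names are hypothetical and not part of the driver. The driver itself reads the port with a host-endian cast in `get_port_by_srv_id()`, which agrees with the explicit decoding below on little-endian guests.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uuid_le: 16 raw bytes */
typedef struct {
	uint8_t b[16];
} sketch_uuid_le;

/* 00000000-facb-11e6-bd58-64006a7986d3 in UUID_LE() byte order */
static const sketch_uuid_le sketch_srv_id_template = { {
	0x00, 0x00, 0x00, 0x00,	/* Data1: carries the port, little-endian */
	0xcb, 0xfa,		/* Data2 = 0xfacb, little-endian */
	0xe6, 0x11,		/* Data3 = 0x11e6, little-endian */
	0xbd, 0x58, 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3
} };

#define SKETCH_MAX_LISTEN_PORT 0x7FFFFFFFu

/* Build the Service GUID for a port by patching Data1 of the template */
sketch_uuid_le sketch_port_to_srv_id(uint32_t port)
{
	sketch_uuid_le id = sketch_srv_id_template;

	id.b[0] = (uint8_t)(port & 0xff);
	id.b[1] = (uint8_t)((port >> 8) & 0xff);
	id.b[2] = (uint8_t)((port >> 16) & 0xff);
	id.b[3] = (uint8_t)((port >> 24) & 0xff);
	return id;
}

/* Decode Data1 back into a port, independent of host endianness */
uint32_t sketch_srv_id_to_port(const sketch_uuid_le *id)
{
	return (uint32_t)id->b[0] | ((uint32_t)id->b[1] << 8) |
	       ((uint32_t)id->b[2] << 16) | ((uint32_t)id->b[3] << 24);
}

/* Same test as is_valid_srv_id(): the last 12 bytes must match the template */
bool sketch_is_valid_srv_id(const sketch_uuid_le *id)
{
	return memcmp(&id->b[4], &sketch_srv_id_template.b[4], 12) == 0;
}

/* Connectable only if valid and the port is in [0, 0x7FFFFFFF] */
bool sketch_is_listen_srv_id(const sketch_uuid_le *id)
{
	return sketch_is_valid_srv_id(id) &&
	       sketch_srv_id_to_port(id) <= SKETCH_MAX_LISTEN_PORT;
}
```

For example, port 1025 (0x401) maps to 00000401-facb-11e6-bd58-64006a7986d3, while any GUID whose port is 0x80000000 or higher is a valid template match but can never be a listening endpoint.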
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  156  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  157  #define MAX_LISTEN_PORT			((u32)0x7FFFFFFF)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  158  #define MAX_VM_LISTEN_PORT		MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  159  #define MAX_HOST_LISTEN_PORT		MAX_LISTEN_PORT
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  160  #define MIN_HOST_EPHEMERAL_PORT		(MAX_HOST_LISTEN_PORT + 1)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  161  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  162  /* 00000000-facb-11e6-bd58-64006a7986d3 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  163  static const uuid_le srv_id_template =
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  164  	UUID_LE(0x00000000, 0xfacb, 0x11e6, 0xbd, 0x58,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  165  		0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  166  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  167  static bool is_valid_srv_id(const uuid_le *id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  168  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  169  	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  170  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  171  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  172  static unsigned int get_port_by_srv_id(const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  173  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  174  	return *((unsigned int *)svr_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  175  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  176  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  177  static void hvs_addr_init(struct sockaddr_vm *addr, const uuid_le *svr_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  178  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  179  	unsigned int port = get_port_by_srv_id(svr_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  180  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  181  	vsock_addr_init(addr, VMADDR_CID_ANY, port);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  182  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  183  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  184  static void hvs_remote_addr_init(struct sockaddr_vm *remote,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  185  				 struct sockaddr_vm *local)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  186  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  187  	static u32 host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  188  	struct sock *sk;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  189  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  190  	vsock_addr_init(remote, VMADDR_CID_ANY, VMADDR_PORT_ANY);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  191  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  192  	while (1) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  193  		/* Wrap around ? */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  194  		if (host_ephemeral_port < MIN_HOST_EPHEMERAL_PORT ||
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  195  		    host_ephemeral_port == VMADDR_PORT_ANY)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  196  			host_ephemeral_port = MIN_HOST_EPHEMERAL_PORT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  197  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  198  		remote->svm_port = host_ephemeral_port++;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  199  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  200  		sk = vsock_find_connected_socket(remote, local);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  201  		if (!sk) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  202  			/* Found an available ephemeral port */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  203  			return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  204  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  205  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  206  		/* Release the refcnt obtained in vsock_find_connected_socket() */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  207  		sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  208  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  209  }
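The wrap-around logic in `hvs_remote_addr_init()` can be modeled in user space as below. The `in_use` callback is a hypothetical stand-in for the `vsock_find_connected_socket()` lookup (which, in the driver, also requires the `sock_put()` shown above), and the cursor is passed explicitly instead of living in a function-local `static`. Like the original loop, this sketch never terminates if every port in [0x80000000, 0xFFFFFFFF) is taken.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIN_HOST_EPHEMERAL_PORT	0x80000000u
#define SKETCH_PORT_ANY			0xFFFFFFFFu	/* mirrors VMADDR_PORT_ANY (-1U) */

/* Hypothetical predicate: true if this remote port is already connected */
typedef bool (*sketch_in_use_fn)(uint32_t port, void *ctx);

/* Hand out the next free port in [MIN_HOST_EPHEMERAL_PORT, PORT_ANY) */
uint32_t sketch_next_host_ephemeral_port(uint32_t *cursor,
					 sketch_in_use_fn in_use, void *ctx)
{
	for (;;) {
		uint32_t port;

		/* Wrap around: never hand out a listen-range port or PORT_ANY */
		if (*cursor < SKETCH_MIN_HOST_EPHEMERAL_PORT ||
		    *cursor == SKETCH_PORT_ANY)
			*cursor = SKETCH_MIN_HOST_EPHEMERAL_PORT;

		port = (*cursor)++;
		if (!in_use(port, ctx))
			return port;	/* found an available ephemeral port */
	}
}

/* Example predicate over a small fixed list of occupied ports */
struct sketch_port_set {
	const uint32_t *ports;
	size_t n;
};

bool sketch_port_in_set(uint32_t port, void *ctx)
{
	const struct sketch_port_set *set = ctx;
	size_t i;

	for (i = 0; i < set->n; i++)
		if (set->ports[i] == port)
			return true;
	return false;
}
```

Starting from 0xFFFFFFFE, the first call returns that port and leaves the cursor at `VMADDR_PORT_ANY`, so the following call wraps back to the bottom of the host ephemeral range.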
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  210  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  211  static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  212  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  213  	set_channel_pending_send_size(chan,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26 @214  				      HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  215  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  216  	virt_mb();
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  217  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  218  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  219  static bool hvs_channel_readable(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  220  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  221  	u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  222  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  223  	/* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  224  	return readable >= HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  225  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  226  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  227  static int hvs_channel_readable_payload(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  228  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  229  	u32 readable = hv_get_bytes_to_read(&chan->inbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  230  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  231  	if (readable > HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  232  		/* At least we have 1 byte to read. We don't need to return
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  233  		 * the exact readable bytes: see vsock_stream_recvmsg() ->
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  234  		 * vsock_stream_has_data().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  235  		 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  236  		return 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  237  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  238  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  239  	if (readable == HVS_PKT_LEN(0)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  240  		/* 0-size payload means FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  241  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  242  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  243  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  244  	/* No payload or FIN */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  245  	return -1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  246  }
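The three-way result of `hvs_channel_readable_payload()` depends only on how many bytes are queued in the inbound ring compared with the size of an empty packet. The sketch below assumes `HVS_PKT_LEN()` adds a 16-byte `vmpacket_descriptor`, an 8-byte `vmpipe_proto_header`, the payload padded to 8 bytes and an 8-byte trailer, so an empty packet would be 32 bytes. Those sizes are assumptions, not taken from this listing.

```c
#include <stdint.h>

/* Assumed framing, mirroring HVS_PKT_LEN(): descriptor + pipe header +
 * payload padded to 8 bytes + trailer */
#define SKETCH_PKT_LEN(len) (16u + 8u + (((uint32_t)(len) + 7u) & ~7u) + 8u)

/* Mirrors hvs_channel_readable_payload(): 1 = data, 0 = FIN, -1 = nothing */
int sketch_readable_payload(uint32_t readable)
{
	if (readable > SKETCH_PKT_LEN(0))
		return 1;	/* at least one payload byte is queued */
	if (readable == SKETCH_PKT_LEN(0))
		return 0;	/* exactly one zero-length packet: the FIN */
	return -1;		/* no complete packet queued */
}
```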
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  247  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  248  static size_t hvs_channel_writable_bytes(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  249  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  250  	u32 writeable = hv_get_bytes_to_write(&chan->outbound);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  251  	size_t ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  252  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  253  	/* The ringbuffer mustn't be 100% full, and we should reserve a
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  254  	 * zero-length-payload packet for the FIN: see hv_ringbuffer_write()
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  255  	 * and hvs_shutdown().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  256  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  257  	if (writeable <= HVS_PKT_LEN(1) + HVS_PKT_LEN(0))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  258  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  259  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  260  	ret = writeable - HVS_PKT_LEN(1) - HVS_PKT_LEN(0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  261  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  262  	return round_down(ret, 8);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  263  }
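The reservation in `hvs_channel_writable_bytes()` is easier to see with numbers. Under the same assumed framing (an empty packet of 32 bytes and a 1-byte packet of 40 bytes, both hypothetical here), 72 bytes are held back and the remainder is rounded down to a multiple of 8:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed framing, mirroring HVS_PKT_LEN() */
#define SKETCH_PKT_LEN(len) (16u + 8u + (((uint32_t)(len) + 7u) & ~7u) + 8u)

/* Mirrors hvs_channel_writable_bytes() */
size_t sketch_writable_bytes(uint32_t writeable)
{
	/* PKT_LEN(1) covers the data packet's own 32-byte overhead plus
	 * 8 bytes of slack, so the ring never becomes 100% full;
	 * PKT_LEN(0) is held back for the FIN. */
	const uint32_t reserve = SKETCH_PKT_LEN(1) + SKETCH_PKT_LEN(0);

	if (writeable <= reserve)
		return 0;

	/* round_down(ret, 8) */
	return (size_t)(writeable - reserve) & ~(size_t)7;
}
```

For example, with 4096 free bytes the sketch allows a 4024-byte payload. That packet occupies 32 + 4024 = 4056 bytes, leaving 40, which still fits the 32-byte FIN without filling the ring completely.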
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  264  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  265  static int hvs_send_data(struct vmbus_channel *chan,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  266  			 struct hvs_send_buf *send_buf, size_t to_write)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  267  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  268  	send_buf->hdr.pkt_type = 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  269  	send_buf->hdr.data_size = to_write;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  270  	return vmbus_sendpacket(chan, &send_buf->hdr,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  271  				sizeof(send_buf->hdr) + to_write,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  272  				0, VM_PKT_DATA_INBAND, 0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  273  }
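`hvs_send_data()` prefixes the payload with a `vmpipe_proto_header` whose `pkt_type` is 1 and whose `data_size` is the payload length, and `hvs_shutdown_lock_held()` reuses it with a length of 0 to send the FIN. A user-space sketch of that framing follows. It assumes the header is two 32-bit fields (the real `struct vmpipe_proto_header` should be checked) and leaves out `vmbus_sendpacket()` and its VMBus descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of vmpipe_proto_header */
struct sketch_pipe_hdr {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Frame `len` payload bytes behind a pipe header into `out`.
 * Returns the bytes written, or 0 if `out` is too small. */
size_t sketch_frame(uint8_t *out, size_t out_len, const void *payload,
		    uint32_t len)
{
	struct sketch_pipe_hdr hdr = { 1, len };	/* pkt_type 1 = data */

	if (out_len < sizeof(hdr) + len)
		return 0;

	memcpy(out, &hdr, sizeof(hdr));
	if (len)	/* a zero-length payload is the FIN */
		memcpy(out + sizeof(hdr), payload, len);
	return sizeof(hdr) + len;
}
```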
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  274  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  275  static void hvs_channel_cb(void *ctx)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  276  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  277  	struct sock *sk = (struct sock *)ctx;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  278  	struct vsock_sock *vsk = vsock_sk(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  279  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  280  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  281  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  282  	if (hvs_channel_readable(chan))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  283  		sk->sk_data_ready(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  284  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  285  	if (hv_get_bytes_to_write(&chan->outbound) > 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  286  		sk->sk_write_space(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  287  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  288  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  289  static void hvs_do_close_lock_held(struct vsock_sock *vsk,
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  290  				   bool cancel_timeout)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  291  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  292  	struct sock *sk = sk_vsock(vsk);
b4562ca7925a3be Dexuan Cui       2017-10-19  293  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  294  	sock_set_flag(sk, SOCK_DONE);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  295  	vsk->peer_shutdown = SHUTDOWN_MASK;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  296  	if (vsock_stream_has_data(vsk) <= 0)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  297  		sk->sk_state = TCP_CLOSING;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  298  	sk->sk_state_change(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  299  	if (vsk->close_work_scheduled &&
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  300  	    (!cancel_timeout || cancel_delayed_work(&vsk->close_work))) {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  301  		vsk->close_work_scheduled = false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  302  		vsock_remove_sock(vsk);
b4562ca7925a3be Dexuan Cui       2017-10-19  303  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  304  		/* Release the reference taken while scheduling the timeout */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  305  		sock_put(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  306  	}
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  307  }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  308  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  309  static void hvs_close_connection(struct vmbus_channel *chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  310  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  311  	struct sock *sk = get_per_channel_state(chan);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  312  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  313  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  314  	hvs_do_close_lock_held(vsock_sk(sk), true);
b4562ca7925a3be Dexuan Cui       2017-10-19  315  	release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  316  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  317  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  318  static void hvs_open_connection(struct vmbus_channel *chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  319  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  320  	uuid_le *if_instance, *if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  321  	unsigned char conn_from_host;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  322  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  323  	struct sockaddr_vm addr;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  324  	struct sock *sk, *new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  325  	struct vsock_sock *vnew = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  326  	struct hvsock *hvs = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  327  	struct hvsock *hvs_new = NULL;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  328  	int rcvbuf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  329  	int ret;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  330  	int sndbuf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  331  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  332  	if_type = &chan->offermsg.offer.if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  333  	if_instance = &chan->offermsg.offer.if_instance;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  334  	conn_from_host = chan->offermsg.offer.u.pipe.user_def[0];
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  335  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  336  	/* The host or the VM should only listen on a port in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  337  	 * [0, MAX_LISTEN_PORT]
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  338  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  339  	if (!is_valid_srv_id(if_type) ||
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  340  	    get_port_by_srv_id(if_type) > MAX_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  341  		return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  342  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  343  	hvs_addr_init(&addr, conn_from_host ? if_type : if_instance);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  344  	sk = vsock_find_bound_socket(&addr);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  345  	if (!sk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  346  		return;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  347  
b4562ca7925a3be Dexuan Cui       2017-10-19  348  	lock_sock(sk);
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  349  	if ((conn_from_host && sk->sk_state != TCP_LISTEN) ||
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  350  	    (!conn_from_host && sk->sk_state != TCP_SYN_SENT))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  351  		goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  352  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  353  	if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  354  		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  355  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  356  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  357  		new = __vsock_create(sock_net(sk), NULL, sk, GFP_KERNEL,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  358  				     sk->sk_type, 0);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  359  		if (!new)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  360  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  361  
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  362  		new->sk_state = TCP_SYN_SENT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  363  		vnew = vsock_sk(new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  364  		hvs_new = vnew->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  365  		hvs_new->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  366  	} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  367  		hvs = vsock_sk(sk)->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  368  		hvs->chan = chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  369  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  370  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  371  	set_channel_read_mode(chan, HV_CALL_DIRECT);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  372  
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  373  	/* Use the socket buffer sizes as hints for the VMBUS ring size. For
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  374  	 * server side sockets, 'sk' is the parent socket and thus, this will
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  375  	 * allow the child sockets to inherit the size from the parent. Keep
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  376  	 * the mins to the default value and align to page size as per VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  377  	 * requirements.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  378  	 * For the max, the socket core library will limit the socket buffer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  379  	 * size that can be set by the user, but, since currently, the hv_sock
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  380  	 * VMBUS ring buffer is physically contiguous allocation, restrict it
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  381  	 * further.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  382  	 * Older versions of hv_sock host side code cannot handle bigger VMBUS
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  383  	 * ring buffer size. Use the version number to limit the change to newer
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  384  	 * versions.
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  385  	 */
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  386  	if (vmbus_proto_version < VERSION_WIN10_V5) {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  387  		sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  388  		rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  389  	} else {
ac383f58f3c98de Sunil Muthuswamy 2019-05-22 @390  		sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  391  		sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya   2019-07-25  392  		sndbuf = ALIGN(sndbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  393  		rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  394  		rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
31113cc83e30924 Himadri Pandya   2019-07-25  395  		rcvbuf = ALIGN(rcvbuf, HV_HYP_PAGE_SIZE);
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  396  	}
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  397  
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  398  	ret = vmbus_open(chan, sndbuf, rcvbuf, NULL, 0, hvs_channel_cb,
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  399  			 conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  400  	if (ret != 0) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  401  		if (conn_from_host) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  402  			hvs_new->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  403  			sock_put(new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  404  		} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  405  			hvs->chan = NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  406  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  407  		goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  408  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  409  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  410  	set_per_channel_state(chan, conn_from_host ? new : sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  411  	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  412  
cb359b60416701c Sunil Muthuswamy 2019-06-17  413  	/* Set the pending send size to max packet size to always get
cb359b60416701c Sunil Muthuswamy 2019-06-17  414  	 * notifications from the host when there is enough writable space.
cb359b60416701c Sunil Muthuswamy 2019-06-17  415  	 * The host is optimized to send notifications only when the pending
cb359b60416701c Sunil Muthuswamy 2019-06-17  416  	 * size boundary is crossed, and not always.
cb359b60416701c Sunil Muthuswamy 2019-06-17  417  	 */
cb359b60416701c Sunil Muthuswamy 2019-06-17  418  	hvs_set_channel_pending_send_size(chan);
cb359b60416701c Sunil Muthuswamy 2019-06-17  419  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  420  	if (conn_from_host) {
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  421  		new->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  422  		sk->sk_ack_backlog++;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  423  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  424  		hvs_addr_init(&vnew->local_addr, if_type);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  425  		hvs_remote_addr_init(&vnew->remote_addr, &vnew->local_addr);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  426  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  427  		hvs_new->vm_srv_id = *if_type;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  428  		hvs_new->host_srv_id = *if_instance;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  429  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  430  		vsock_insert_connected(vnew);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  431  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  432  		vsock_enqueue_accept(sk, new);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  433  	} else {
3b4477d2dcf2709 Stefan Hajnoczi  2017-10-05  434  		sk->sk_state = TCP_ESTABLISHED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  435  		sk->sk_socket->state = SS_CONNECTED;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  436  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  437  		vsock_insert_connected(vsock_sk(sk));
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  438  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  439  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  440  	sk->sk_state_change(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  441  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  442  out:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  443  	/* Release refcnt obtained when we called vsock_find_bound_socket() */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  444  	sock_put(sk);
b4562ca7925a3be Dexuan Cui       2017-10-19  445  
b4562ca7925a3be Dexuan Cui       2017-10-19  446  	release_sock(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  447  }
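The admission checks at the top of `hvs_open_connection()` reduce to a small decision table: a host-initiated offer needs a listening socket with backlog room and produces a new child socket, while a guest-initiated offer completes a socket that is in `TCP_SYN_SENT`. The sketch below models only that decision. The `sketch_*` enums are hypothetical stand-ins for the `TCP_*` states and do not touch real sockets, locking or VMBus.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TCP_* states the driver checks */
enum sketch_state { SKETCH_LISTEN, SKETCH_SYN_SENT, SKETCH_OTHER };

enum sketch_action {
	SKETCH_DROP,		/* "goto out": ignore the offer */
	SKETCH_ACCEPT_CHILD,	/* host connect(): create and enqueue a new socket */
	SKETCH_COMPLETE_CONNECT	/* guest connect(): socket becomes ESTABLISHED */
};

/* Port passed to vsock_find_bound_socket(): the code uses if_type for
 * host-initiated offers and if_instance otherwise. */
uint32_t sketch_lookup_port(bool conn_from_host, uint32_t if_type_port,
			    uint32_t if_instance_port)
{
	return conn_from_host ? if_type_port : if_instance_port;
}

enum sketch_action sketch_classify_offer(bool conn_from_host,
					 enum sketch_state state,
					 unsigned int backlog,
					 unsigned int max_backlog)
{
	if (conn_from_host) {
		if (state != SKETCH_LISTEN || backlog >= max_backlog)
			return SKETCH_DROP;
		return SKETCH_ACCEPT_CHILD;
	}
	return state == SKETCH_SYN_SENT ? SKETCH_COMPLETE_CONNECT : SKETCH_DROP;
}
```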
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  448  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  449  static u32 hvs_get_local_cid(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  450  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  451  	return VMADDR_CID_ANY;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  452  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  453  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  454  static int hvs_sock_init(struct vsock_sock *vsk, struct vsock_sock *psk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  455  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  456  	struct hvsock *hvs;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  457  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  458  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  459  	hvs = kzalloc(sizeof(*hvs), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  460  	if (!hvs)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  461  		return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  462  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  463  	vsk->trans = hvs;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  464  	hvs->vsk = vsk;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  465  	sk->sk_sndbuf = RINGBUFFER_HVS_SND_SIZE;
ac383f58f3c98de Sunil Muthuswamy 2019-05-22  466  	sk->sk_rcvbuf = RINGBUFFER_HVS_RCV_SIZE;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  467  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  468  }
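`hvs_sock_init()` seeds `sk_sndbuf` and `sk_rcvbuf` with the default ring sizes, and `hvs_open_connection()` later treats the (possibly user-tuned) values as hints for the VMBus ring size. That clamping can be sketched as a pure function. The constants are assumptions for illustration only (a 4 KiB page, a 5-page default and a 512-page cap); the real `RINGBUFFER_HVS_*` values should be checked in the source.

```c
#include <stdbool.h>

/* Assumed values, for illustration only */
#define SKETCH_PAGE_SIZE	4096
#define SKETCH_DEFAULT_RING	(5 * SKETCH_PAGE_SIZE)
#define SKETCH_MAX_RING		(512 * SKETCH_PAGE_SIZE)

/* Mirrors the sndbuf/rcvbuf computation in hvs_open_connection() */
int sketch_ring_size(int sockbuf_hint, bool host_supports_big_rings)
{
	int size;

	/* Hosts older than VERSION_WIN10_V5 always get the default size */
	if (!host_supports_big_rings)
		return SKETCH_DEFAULT_RING;

	/* max_t(): never go below the default */
	size = sockbuf_hint > SKETCH_DEFAULT_RING ? sockbuf_hint : SKETCH_DEFAULT_RING;
	/* min_t(): the ring is a physically contiguous allocation, so cap it */
	size = size < SKETCH_MAX_RING ? size : SKETCH_MAX_RING;

	/* ALIGN(): round up to a page multiple, as VMBus requires */
	return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

Because the cap is itself page aligned, aligning after the `min_t()` step cannot push the result above the cap.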
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  469  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  470  static int hvs_connect(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  471  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  472  	union hvs_service_id vm, host;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  473  	struct hvsock *h = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  474  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  475  	vm.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  476  	vm.svm_port = vsk->local_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  477  	h->vm_srv_id = vm.srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  478  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  479  	host.srv_id = srv_id_template;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  480  	host.svm_port = vsk->remote_addr.svm_port;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  481  	h->host_srv_id = host.srv_id;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  482  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  483  	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  484  }
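`hvs_connect()` builds both Service GUIDs by copying the template into `union hvs_service_id` and then overwriting the leading 32 bits through the `svm_port` member. A user-space sketch of that overlay follows. The union and template bytes are hypothetical re-creations (using a C11 anonymous struct), and because `svm_port` is written in host byte order, the resulting bytes match the little-endian `UUID_LE()` layout only on little-endian guests.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical re-creation of union hvs_service_id */
union sketch_service_id {
	uint8_t b[16];			/* raw GUID bytes (uuid_le layout) */
	struct {
		uint32_t svm_port;	/* overlays the first 4 bytes */
		uint8_t rest[12];
	};
};

/* 00000000-facb-11e6-bd58-64006a7986d3 in UUID_LE() byte order */
static const uint8_t sketch_template[16] = {
	0x00, 0x00, 0x00, 0x00, 0xcb, 0xfa, 0xe6, 0x11,
	0xbd, 0x58, 0x64, 0x00, 0x6a, 0x79, 0x86, 0xd3
};

/* Same two steps as hvs_connect(): copy the template, then patch the port */
union sketch_service_id sketch_make_srv_id(uint32_t port)
{
	union sketch_service_id id;

	memcpy(id.b, sketch_template, sizeof(id.b));
	id.svm_port = port;	/* host byte order */
	return id;
}
```

In the driver, the VM-side ID takes `vsk->local_addr.svm_port`, the host-side ID takes `vsk->remote_addr.svm_port`, and the pair is passed to `vmbus_send_tl_connect_request()`.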
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  485  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  486  static void hvs_shutdown_lock_held(struct hvsock *hvs, int mode)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  487  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  488  	struct vmpipe_proto_header hdr;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  489  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  490  	if (hvs->fin_sent || !hvs->chan)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  491  		return;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  492  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  493  	/* It can't fail: see hvs_channel_writable_bytes(). */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  494  	(void)hvs_send_data(hvs->chan, (struct hvs_send_buf *)&hdr, 0);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  495  	hvs->fin_sent = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  496  }
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  497  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  498  static int hvs_shutdown(struct vsock_sock *vsk, int mode)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  499  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  500  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  501  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  502  	if (!(mode & SEND_SHUTDOWN))
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  503  		return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  504  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  505  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  506  	hvs_shutdown_lock_held(vsk->trans, mode);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  507  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  508  	return 0;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  509  }
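`hvs_shutdown()` only acts on `SEND_SHUTDOWN`, and `hvs_shutdown_lock_held()` guarantees at most one FIN per socket and none before a channel exists. A state-only sketch, with hypothetical field names and a counter in place of the real `hvs_send_data()` call; the two mode bits mirror `RCV_SHUTDOWN` and `SEND_SHUTDOWN` from `include/net/sock.h`:

```c
#include <stdbool.h>

#define SKETCH_RCV_SHUTDOWN	1	/* mirrors RCV_SHUTDOWN */
#define SKETCH_SEND_SHUTDOWN	2	/* mirrors SEND_SHUTDOWN */

/* Hypothetical model of the hvsock fields consulted on shutdown */
struct sketch_hvsock {
	bool has_chan;		/* stands in for hvs->chan != NULL */
	bool fin_sent;
	int fins_on_wire;	/* counts zero-length packets "sent" */
};

/* In the driver this runs with the socket lock held */
static void sketch_shutdown_lock_held(struct sketch_hvsock *hvs)
{
	if (hvs->fin_sent || !hvs->has_chan)
		return;
	hvs->fins_on_wire++;	/* stands in for hvs_send_data(chan, hdr, 0) */
	hvs->fin_sent = true;
}

int sketch_shutdown(struct sketch_hvsock *hvs, int mode)
{
	if (!(mode & SKETCH_SEND_SHUTDOWN))
		return 0;	/* receive-only shutdown sends nothing */
	sketch_shutdown_lock_held(hvs);
	return 0;
}
```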
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  510  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  511  static void hvs_close_timeout(struct work_struct *work)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  512  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  513  	struct vsock_sock *vsk =
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  514  		container_of(work, struct vsock_sock, close_work.work);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  515  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  516  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  517  	sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  518  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  519  	if (!sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  520  		hvs_do_close_lock_held(vsk, false);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  521  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  522  	vsk->close_work_scheduled = false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  523  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  524  	sock_put(sk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  525  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  526  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  527  /* Returns true, if it is safe to remove socket; false otherwise */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  528  static bool hvs_close_lock_held(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  529  {
b4562ca7925a3be Dexuan Cui       2017-10-19  530  	struct sock *sk = sk_vsock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  531  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  532  	if (!(sk->sk_state == TCP_ESTABLISHED ||
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  533  	      sk->sk_state == TCP_CLOSING))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  534  		return true;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  535  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  536  	if ((sk->sk_shutdown & SHUTDOWN_MASK) != SHUTDOWN_MASK)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  537  		hvs_shutdown_lock_held(vsk->trans, SHUTDOWN_MASK);
b4562ca7925a3be Dexuan Cui       2017-10-19  538  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  539  	if (sock_flag(sk, SOCK_DONE))
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  540  		return true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  541  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  542  	/* This reference will be dropped by the delayed close routine */
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  543  	sock_hold(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  544  	INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  545  	vsk->close_work_scheduled = true;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  546  	schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  547  	return false;
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  548  }
b4562ca7925a3be Dexuan Cui       2017-10-19  549  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  550  static void hvs_release(struct vsock_sock *vsk)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  551  {
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  552  	struct sock *sk = sk_vsock(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  553  	bool remove_sock;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  554  
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  555  	lock_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  556  	remove_sock = hvs_close_lock_held(vsk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  557  	release_sock(sk);
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  558  	if (remove_sock)
a9eeb998c28d550 Sunil Muthuswamy 2019-05-15  559  		vsock_remove_sock(vsk);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  560  }
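The decision hvs_close_lock_held() makes on release can be modeled in user space. The sketch below is a simplified, hypothetical model: all model_* names are invented for illustration, and hvs_shutdown_lock_held() is reduced to setting a fin_sent flag. It shows the order of the checks. A socket that is neither established nor closing is removed at once. A FIN is sent if the socket is not fully shut down. Only a socket the peer has not yet closed (SOCK_DONE clear) gets a delayed close, plus an extra reference for the timeout handler to drop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket state the kernel consults. */
enum model_state { MODEL_ESTABLISHED, MODEL_CLOSING, MODEL_CLOSE };

#define MODEL_RCV_SHUTDOWN  1
#define MODEL_SEND_SHUTDOWN 2
#define MODEL_SHUTDOWN_MASK (MODEL_RCV_SHUTDOWN | MODEL_SEND_SHUTDOWN)

struct model_sock {
	enum model_state state;
	int shutdown;              /* bits from MODEL_SHUTDOWN_MASK */
	bool done;                 /* models sock_flag(sk, SOCK_DONE) */
	bool fin_sent;             /* models hvs->fin_sent */
	int refs;                  /* models the sock_hold()/sock_put() count */
	bool close_work_scheduled; /* models vsk->close_work_scheduled */
};

/* Follows the check order of hvs_close_lock_held(): returns true if the
 * socket can be removed now, false if a delayed close was armed. */
static bool model_close(struct model_sock *s)
{
	if (!(s->state == MODEL_ESTABLISHED || s->state == MODEL_CLOSING))
		return true;

	/* Simplified stand-in for hvs_shutdown_lock_held(): send FIN once. */
	if ((s->shutdown & MODEL_SHUTDOWN_MASK) != MODEL_SHUTDOWN_MASK)
		s->fin_sent = true;

	if (s->done)
		return true;

	/* Extra reference, dropped later by the (unmodeled) timeout path. */
	s->refs++;
	s->close_work_scheduled = true;
	return false;
}
```

Deferring the removal this way lets the guest wait for the host's own FIN instead of tearing the channel down while the host may still be sending.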
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  561  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  562  static void hvs_destruct(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  563  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  564  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  565  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  566  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  567  	if (chan)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  568  		vmbus_hvsock_device_unregister(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  569  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  570  	kfree(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  571  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  572  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  573  static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  574  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  575  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  576  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  577  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  578  static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  579  			     size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  580  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  581  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  582  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  583  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  584  static int hvs_dgram_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  585  			     struct sockaddr_vm *remote, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  586  			     size_t dgram_len)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  587  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  588  	return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  589  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  590  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  591  static bool hvs_dgram_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  592  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  593  	return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  594  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  595  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  596  static int hvs_update_recv_data(struct hvsock *hvs)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  597  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  598  	struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  599  	u32 payload_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  600  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  601  	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  602  	payload_len = recv_buf->hdr.data_size;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  603  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  604  	if (payload_len > HVS_MTU_SIZE)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  605  		return -EIO;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  606  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  607  	if (payload_len == 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  608  		hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  609  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  610  	hvs->recv_data_len = payload_len;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  611  	hvs->recv_data_off = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  612  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  613  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  614  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  615  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  616  static ssize_t hvs_stream_dequeue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  617  				  size_t len, int flags)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  618  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  619  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  620  	bool need_refill = !hvs->recv_desc;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  621  	struct hvs_recv_buf *recv_buf;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  622  	u32 to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  623  	int ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  624  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  625  	if (flags & MSG_PEEK)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  626  		return -EOPNOTSUPP;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  627  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  628  	if (need_refill) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  629  		hvs->recv_desc = hv_pkt_iter_first(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  630  		ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  631  		if (ret)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  632  			return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  633  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  634  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  635  	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  636  	to_read = min_t(u32, len, hvs->recv_data_len);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  637  	ret = memcpy_to_msg(msg, recv_buf->data + hvs->recv_data_off, to_read);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  638  	if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  639  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  640  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  641  	hvs->recv_data_len -= to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  642  	if (hvs->recv_data_len == 0) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  643  		hvs->recv_desc = hv_pkt_iter_next(hvs->chan, hvs->recv_desc);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  644  		if (hvs->recv_desc) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  645  			ret = hvs_update_recv_data(hvs);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  646  			if (ret)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  647  				return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  648  		}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  649  	} else {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  650  		hvs->recv_data_off += to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  651  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  652  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  653  	return to_read;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  654  }
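hvs_stream_dequeue() never copies across a packet boundary: each call returns at most the bytes left in the current VMBus packet, and it moves to the next packet only once the current one is drained. Below is a hypothetical user-space model of that cursor logic. An array of packets stands in for the ring buffer, and the model_* names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet queue standing in for the VMBus ring buffer. */
struct model_pkt {
	const char *data;
	size_t len;
};

struct model_reader {
	const struct model_pkt *pkts;
	size_t npkts;
	size_t cur;      /* index of the current packet (models recv_desc) */
	size_t data_len; /* bytes left in the current packet (recv_data_len) */
	size_t data_off; /* read offset in the current packet (recv_data_off) */
};

static void model_reader_init(struct model_reader *r,
			      const struct model_pkt *pkts, size_t n)
{
	assert(n == 0 || pkts != NULL);
	r->pkts = pkts;
	r->npkts = n;
	r->cur = 0;
	r->data_len = n ? pkts[0].len : 0;
	r->data_off = 0;
}

/* Copies at most len bytes, and only from the current packet. */
static size_t model_dequeue(struct model_reader *r, char *out, size_t len)
{
	size_t to_read;

	if (r->cur >= r->npkts)
		return 0; /* nothing queued */

	to_read = len < r->data_len ? len : r->data_len;
	memcpy(out, r->pkts[r->cur].data + r->data_off, to_read);

	r->data_len -= to_read;
	if (r->data_len == 0) {
		r->cur++; /* models hv_pkt_iter_next() */
		r->data_len = r->cur < r->npkts ? r->pkts[r->cur].len : 0;
		r->data_off = 0;
	} else {
		r->data_off += to_read;
	}
	return to_read;
}
```

A short read such as the second call in the test below is expected: the vsock core calls the dequeue hook again for the remainder.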
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  655  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  656  static ssize_t hvs_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  657  				  size_t len)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  658  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  659  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  660  	struct vmbus_channel *chan = hvs->chan;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  661  	struct hvs_send_buf *send_buf;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  662  	ssize_t to_write, max_writable;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  663  	ssize_t ret = 0;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  664  	ssize_t bytes_written = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  665  
31113cc83e30924 Himadri Pandya   2019-07-25  666  	BUILD_BUG_ON(sizeof(*send_buf) != HV_HYP_PAGE_SIZE);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  667  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  668  	send_buf = kmalloc(sizeof(*send_buf), GFP_KERNEL);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  669  	if (!send_buf)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  670  		return -ENOMEM;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  671  
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  672  	/* Reader(s) could be draining data from the channel as we write.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  673  	 * Maximize bandwidth, by iterating until the channel is found to be
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  674  	 * full.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  675  	 */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  676  	while (len) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  677  		max_writable = hvs_channel_writable_bytes(chan);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  678  		if (!max_writable)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  679  			break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  680  		to_write = min_t(ssize_t, len, max_writable);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  681  		to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  682  		/* memcpy_from_msg is safe for loop as it advances the offsets
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  683  		 * within the message iterator.
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  684  		 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  685  		ret = memcpy_from_msg(send_buf->data, msg, to_write);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  686  		if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  687  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  688  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  689  		ret = hvs_send_data(hvs->chan, send_buf, to_write);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  690  		if (ret < 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  691  			goto out;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  692  
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  693  		bytes_written += to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  694  		len -= to_write;
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  695  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  696  out:
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  697  	/* If any data has been sent, return that */
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  698  	if (bytes_written)
14a1eaa8820e8f3 Sunil Muthuswamy 2019-05-22  699  		ret = bytes_written;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  700  	kfree(send_buf);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  701  	return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  702  }
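The send loop in hvs_stream_enqueue() writes in chunks bounded by both the free ring space and the send buffer size, stops when the ring is full, and reports partial progress in preference to a later error. The sketch below is a hypothetical model of that loop. MODEL_SEND_BUF_SIZE is an arbitrary small value chosen so the chunking is visible, not the kernel's HVS_SEND_BUF_SIZE, and the mock channel and its error code are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Hypothetical; the kernel bounds each chunk by HVS_SEND_BUF_SIZE. */
#define MODEL_SEND_BUF_SIZE 4

struct model_chan {
	size_t writable; /* bytes the ring can still accept */
	int fail_after;  /* successful sends before an error; -1 = never */
	size_t sent;     /* total payload bytes accepted */
};

/* Mock of hvs_send_data(): returns 0 on success, negative on error. */
static ssize_t model_send(struct model_chan *c, size_t n)
{
	if (c->fail_after == 0)
		return -5; /* arbitrary negative error code */
	if (c->fail_after > 0)
		c->fail_after--;
	c->writable -= n;
	c->sent += n;
	return 0;
}

static ssize_t model_enqueue(struct model_chan *c, size_t len)
{
	ssize_t ret = 0, bytes_written = 0;

	while (len) {
		size_t to_write = c->writable;

		if (!to_write)
			break; /* ring full: return what was written so far */
		if (to_write > len)
			to_write = len;
		if (to_write > MODEL_SEND_BUF_SIZE)
			to_write = MODEL_SEND_BUF_SIZE;

		ret = model_send(c, to_write);
		if (ret < 0)
			break;

		bytes_written += (ssize_t)to_write;
		len -= to_write;
	}
	/* If any data has been sent, report that instead of an error. */
	if (bytes_written)
		ret = bytes_written;
	return ret;
}
```

Returning the partial count matches ordinary stream-socket semantics: the caller learns how much was accepted, and an error that is still pending would surface on the next write.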
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  703  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  704  static s64 hvs_stream_has_data(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  705  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  706  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  707  	s64 ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  708  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  709  	if (hvs->recv_data_len > 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  710  		return 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  711  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  712  	switch (hvs_channel_readable_payload(hvs->chan)) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  713  	case 1:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  714  		ret = 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  715  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  716  	case 0:
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  717  		vsk->peer_shutdown |= SEND_SHUTDOWN;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  718  		ret = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  719  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  720  	default: /* -1 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  721  		ret = 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  722  		break;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  723  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  724  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  725  	return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  726  }
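hvs_stream_has_data() folds the tri-state result of hvs_channel_readable_payload() into the vsock layer's view. A result of 1 means payload is available. A result of 0 means a zero-length packet (the peer's FIN) is queued, which also marks the peer as shut down for sending. A result of -1 means nothing is queued. A minimal model of that mapping follows; model_has_data() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* recv_data_len models hvs->recv_data_len; readable_payload models the
 * value returned by hvs_channel_readable_payload(). */
static int64_t model_has_data(uint32_t recv_data_len, int readable_payload,
			      bool *peer_send_shutdown)
{
	if (recv_data_len > 0)
		return 1; /* a partly consumed packet is still pending */

	switch (readable_payload) {
	case 1:
		return 1; /* at least one payload byte is queued */
	case 0:
		*peer_send_shutdown = true; /* zero-length packet: FIN */
		return 0;
	default: /* -1: no payload and no FIN */
		return 0;
	}
}
```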
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  727  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  728  static s64 hvs_stream_has_space(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  729  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  730  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  731  
cb359b60416701c Sunil Muthuswamy 2019-06-17  732  	return hvs_channel_writable_bytes(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  733  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  734  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  735  static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  736  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  737  	return HVS_MTU_SIZE + 1;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  738  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  739  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  740  static bool hvs_stream_is_active(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  741  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  742  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  743  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  744  	return hvs->chan != NULL;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  745  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  746  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  747  static bool hvs_stream_allow(u32 cid, u32 port)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  748  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  749  	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  750  	 * reserved as ephemeral ports, which are used as the host's ports
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  751  	 * when the host initiates connections.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  752  	 *
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  753  	 * Perform this check in the guest so an immediate error is produced
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  754  	 * instead of a timeout.
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  755  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  756  	if (port > MAX_HOST_LISTEN_PORT)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  757  		return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  758  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  759  	if (cid == VMADDR_CID_HOST)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  760  		return true;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  761  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  762  	return false;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  763  }
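hvs_stream_allow() admits only connections to the host CID on ports below the host's ephemeral range, so a guest gets an immediate error instead of a timeout. A minimal model follows. VMADDR_CID_HOST is 2 in <linux/vm_sockets.h>, but the 0x7FFFFFFF value used here for MAX_HOST_LISTEN_PORT is an assumption for illustration, not taken from this listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CID_HOST 2u /* VMADDR_CID_HOST */
/* Assumed value of MAX_HOST_LISTEN_PORT, for illustration only. */
#define MODEL_MAX_HOST_LISTEN_PORT 0x7FFFFFFFu

static bool model_stream_allow(uint32_t cid, uint32_t port)
{
	/* Ports above the listen range are the host's ephemeral ports. */
	if (port > MODEL_MAX_HOST_LISTEN_PORT)
		return false;
	return cid == MODEL_CID_HOST;
}
```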
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  764  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  765  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  766  int hvs_notify_poll_in(struct vsock_sock *vsk, size_t target, bool *readable)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  767  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  768  	struct hvsock *hvs = vsk->trans;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  769  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  770  	*readable = hvs_channel_readable(hvs->chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  771  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  772  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  773  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  774  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  775  int hvs_notify_poll_out(struct vsock_sock *vsk, size_t target, bool *writable)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  776  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  777  	*writable = hvs_stream_has_space(vsk) > 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  778  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  779  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  780  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  781  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  782  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  783  int hvs_notify_recv_init(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  784  			 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  785  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  786  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  787  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  788  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  789  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  790  int hvs_notify_recv_pre_block(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  791  			      struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  792  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  793  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  794  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  795  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  796  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  797  int hvs_notify_recv_pre_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  798  				struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  799  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  800  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  801  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  802  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  803  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  804  int hvs_notify_recv_post_dequeue(struct vsock_sock *vsk, size_t target,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  805  				 ssize_t copied, bool data_read,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  806  				 struct vsock_transport_recv_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  807  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  808  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  809  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  810  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  811  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  812  int hvs_notify_send_init(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  813  			 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  814  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  815  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  816  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  817  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  818  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  819  int hvs_notify_send_pre_block(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  820  			      struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  821  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  822  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  823  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  824  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  825  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  826  int hvs_notify_send_pre_enqueue(struct vsock_sock *vsk,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  827  				struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  828  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  829  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  830  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  831  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  832  static
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  833  int hvs_notify_send_post_enqueue(struct vsock_sock *vsk, ssize_t written,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  834  				 struct vsock_transport_send_notify_data *d)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  835  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  836  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  837  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  838  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  839  static void hvs_set_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  840  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  841  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  842  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  843  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  844  static void hvs_set_min_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  845  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  846  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  847  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  848  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  849  static void hvs_set_max_buffer_size(struct vsock_sock *vsk, u64 val)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  850  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  851  	/* Ignored. */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  852  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  853  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  854  static u64 hvs_get_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  855  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  856  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  857  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  858  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  859  static u64 hvs_get_min_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  860  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  861  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  862  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  863  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  864  static u64 hvs_get_max_buffer_size(struct vsock_sock *vsk)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  865  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  866  	return -ENOPROTOOPT;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  867  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  868  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  869  static struct vsock_transport hvs_transport = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  870  	.get_local_cid            = hvs_get_local_cid,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  871  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  872  	.init                     = hvs_sock_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  873  	.destruct                 = hvs_destruct,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  874  	.release                  = hvs_release,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  875  	.connect                  = hvs_connect,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  876  	.shutdown                 = hvs_shutdown,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  877  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  878  	.dgram_bind               = hvs_dgram_bind,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  879  	.dgram_dequeue            = hvs_dgram_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  880  	.dgram_enqueue            = hvs_dgram_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  881  	.dgram_allow              = hvs_dgram_allow,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  882  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  883  	.stream_dequeue           = hvs_stream_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  884  	.stream_enqueue           = hvs_stream_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  885  	.stream_has_data          = hvs_stream_has_data,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  886  	.stream_has_space         = hvs_stream_has_space,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  887  	.stream_rcvhiwat          = hvs_stream_rcvhiwat,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  888  	.stream_is_active         = hvs_stream_is_active,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  889  	.stream_allow             = hvs_stream_allow,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  890  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  891  	.notify_poll_in           = hvs_notify_poll_in,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  892  	.notify_poll_out          = hvs_notify_poll_out,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  893  	.notify_recv_init         = hvs_notify_recv_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  894  	.notify_recv_pre_block    = hvs_notify_recv_pre_block,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  895  	.notify_recv_pre_dequeue  = hvs_notify_recv_pre_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  896  	.notify_recv_post_dequeue = hvs_notify_recv_post_dequeue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  897  	.notify_send_init         = hvs_notify_send_init,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  898  	.notify_send_pre_block    = hvs_notify_send_pre_block,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  899  	.notify_send_pre_enqueue  = hvs_notify_send_pre_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  900  	.notify_send_post_enqueue = hvs_notify_send_post_enqueue,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  901  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  902  	.set_buffer_size          = hvs_set_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  903  	.set_min_buffer_size      = hvs_set_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  904  	.set_max_buffer_size      = hvs_set_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  905  	.get_buffer_size          = hvs_get_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  906  	.get_min_buffer_size      = hvs_get_min_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  907  	.get_max_buffer_size      = hvs_get_max_buffer_size,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  908  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  909  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  910  static int hvs_probe(struct hv_device *hdev,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  911  		     const struct hv_vmbus_device_id *dev_id)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  912  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  913  	struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  914  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  915  	hvs_open_connection(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  916  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  917  	/* Always return success to suppress the unnecessary error message
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  918  	 * in vmbus_probe(): on error the host will rescind the device in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  919  	 * 30 seconds and we can do cleanup at that time in
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  920  	 * vmbus_onoffer_rescind().
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  921  	 */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  922  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  923  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  924  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  925  static int hvs_remove(struct hv_device *hdev)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  926  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  927  	struct vmbus_channel *chan = hdev->channel;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  928  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  929  	vmbus_close(chan);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  930  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  931  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  932  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  933  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  934  /* This isn't really used. See vmbus_match() and vmbus_probe() */
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  935  static const struct hv_vmbus_device_id id_table[] = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  936  	{},
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  937  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  938  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  939  static struct hv_driver hvs_drv = {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  940  	.name		= "hv_sock",
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  941  	.hvsock		= true,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  942  	.id_table	= id_table,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  943  	.probe		= hvs_probe,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  944  	.remove		= hvs_remove,
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  945  };
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  946  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  947  static int __init hvs_init(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  948  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  949  	int ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  950  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  951  	if (vmbus_proto_version < VERSION_WIN10)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  952  		return -ENODEV;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  953  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  954  	ret = vmbus_driver_register(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  955  	if (ret != 0)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  956  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  957  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  958  	ret = vsock_core_init(&hvs_transport);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  959  	if (ret) {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  960  		vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  961  		return ret;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  962  	}
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  963  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  964  	return 0;
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  965  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  966  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  967  static void __exit hvs_exit(void)
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  968  {
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  969  	vsock_core_exit();
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  970  	vmbus_driver_unregister(&hvs_drv);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  971  }
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  972  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  973  module_init(hvs_init);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  974  module_exit(hvs_exit);
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  975  
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  976  MODULE_DESCRIPTION("Hyper-V Sockets");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  977  MODULE_VERSION("1.0.0");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  978  MODULE_LICENSE("GPL");
ae0078fcf0a5eb3 Dexuan Cui       2017-08-26  979  MODULE_ALIAS_NETPROTO(PF_VSOCK);

:::::: The code at line 214 was first introduced by commit
:::::: ae0078fcf0a5eb3a8623bfb5f988262e0911fdb9 hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

:::::: TO: Dexuan Cui <decui@microsoft.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: Himadri Pandya @ 2019-07-27 11:50 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
	linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <201907271302.tDRkl9uU%lkp@intel.com>


On 7/27/2019 10:50 AM, kbuild test robot wrote:
> Hi Himadri,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc1 next-20190726]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

This patch should be applied to the linux-next git tree.

Thank you.

- Himadri

^ permalink raw reply

* Re: [PATCH] hv_sock: use HV_HYP_PAGE_SIZE instead of PAGE_SIZE_4K
From: kbuild test robot @ 2019-07-27  5:20 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: kbuild-all, mikelley, kys, haiyangz, sthemmin, sashal, davem,
	linux-hyperv, netdev, linux-kernel, Himadri Pandya
In-Reply-To: <20190725051125.10605-1-himadri18.07@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4160 bytes --]

Hi Himadri,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.3-rc1 next-20190726]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Himadri-Pandya/hv_sock-use-HV_HYP_PAGE_SIZE-instead-of-PAGE_SIZE_4K/20190726-085229
config: x86_64-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> net/vmw_vsock/hyperv_transport.c:58:28: error: 'HV_HYP_PAGE_SIZE' undeclared here (not in a function); did you mean 'HV_MESSAGE_SIZE'?
    #define HVS_SEND_BUF_SIZE (HV_HYP_PAGE_SIZE - sizeof(struct vmpipe_proto_header))
                               ^
>> net/vmw_vsock/hyperv_transport.c:65:10: note: in expansion of macro 'HVS_SEND_BUF_SIZE'
     u8 data[HVS_SEND_BUF_SIZE];
             ^~~~~~~~~~~~~~~~~
   In file included from include/linux/list.h:9:0,
                    from include/linux/module.h:9,
                    from net/vmw_vsock/hyperv_transport.c:11:
   net/vmw_vsock/hyperv_transport.c: In function 'hvs_open_connection':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
    #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
                              ^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:390:12: note: in expansion of macro 'max_t'
      sndbuf = max_t(int, sk->sk_sndbuf, RINGBUFFER_HVS_SND_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
>> net/vmw_vsock/hyperv_transport.c:391:12: note: in expansion of macro 'min_t'
      sndbuf = min_t(int, sndbuf, RINGBUFFER_HVS_MAX_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:921:27: note: in expansion of macro '__careful_cmp'
    #define max_t(type, x, y) __careful_cmp((type)(x), (type)(y), >)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:393:12: note: in expansion of macro 'max_t'
      rcvbuf = max_t(int, sk->sk_rcvbuf, RINGBUFFER_HVS_RCV_SIZE);
               ^~~~~
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:394:12: note: in expansion of macro 'min_t'
      rcvbuf = min_t(int, rcvbuf, RINGBUFFER_HVS_MAX_SIZE);
               ^~~~~
   net/vmw_vsock/hyperv_transport.c: In function 'hvs_stream_enqueue':
>> include/linux/kernel.h:845:2: error: first argument to '__builtin_choose_expr' not a constant
     __builtin_choose_expr(__safe_cmp(x, y), \
     ^
   include/linux/kernel.h:913:27: note: in expansion of macro '__careful_cmp'
    #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
                              ^~~~~~~~~~~~~
   net/vmw_vsock/hyperv_transport.c:681:14: note: in expansion of macro 'min_t'
      to_write = min_t(ssize_t, to_write, HVS_SEND_BUF_SIZE);
                 ^~~~~

vim +58 net/vmw_vsock/hyperv_transport.c

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 69531 bytes --]

^ permalink raw reply

* [PATCH] clocksource/drivers: hyperv_timer: Fix CPU offlining by unbinding the timer
From: Dexuan Cui @ 2019-07-27  5:07 UTC (permalink / raw)
  To: tglx@linutronix.de, daniel.lezcano@linaro.org,
	gregkh@linuxfoundation.org, sashal@kernel.org, Stephen Hemminger,
	Haiyang Zhang, KY Srinivasan, Michael Kelley,
	linux-hyperv@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, Dexuan Cui

Commit fd1fea6834d0 says "No behavior is changed", but it actually
removes the clockevents_unbind_device() call from hv_synic_cleanup().

In the discussion earlier this month, I thought the unbind call was
unnecessary (see https://www.spinics.net/lists/arm-kernel/msg739888.html).
However, after more investigation, it turns out that when a VM runs on
Hyper-V the unbind call must be kept; otherwise CPU offlining may not work,
because a per-cpu timer device is still needed after hv_synic_cleanup()
disables the per-cpu Hyper-V timer device.

The issue was found during hibernation testing. These are the details:

1. CPU0 hangs in wait_for_ap_thread() when trying to offline CPU1:

hibernation_snapshot
  create_image
    suspend_disable_secondary_cpus
      freeze_secondary_cpus
        _cpu_down(1, 1, CPUHP_OFFLINE)
          cpuhp_kick_ap_work
            cpuhp_kick_ap
              __cpuhp_kick_ap
                wait_for_ap_thread()

2. CPU0 hangs because CPU1 hangs: after CPU1 disables the per-cpu
Hyper-V timer device in hv_synic_cleanup(), CPU1 sets a timer that never
fires. The following steps show how this can happen.

2.1 Via _cpu_down(1, 1, CPUHP_OFFLINE), CPU0 first tries to move CPU1 to
the CPUHP_TEARDOWN_CPU state, which wakes up the cpuhp/1 thread on CPU1;
the thread is basically a loop that executes the various callbacks defined
in the global array cpuhp_hp_states[] (see smpboot_thread_fn()).

2.2 This is how a callback is called on CPU1:
  smpboot_thread_fn
    ht->thread_fn(td->cpu), i.e. cpuhp_thread_fun
      cpuhp_invoke_callback
        state = st->state
        st->state--
        cpuhp_get_step(state)->teardown.single()

2.3 Initially, CPU1 is in the CPUHP_ONLINE state, whose .teardown.single
is NULL, so execution returns to the loop in smpboot_thread_fn(), which
then calls cpuhp_invoke_callback() again with a smaller st->state.

2.4 The .teardown.single callbacks of the states between CPUHP_ONLINE and
CPUHP_TEARDOWN_CPU run one by one.

2.5 When the CPUHP_AP_ONLINE_DYN range is reached, hv_synic_cleanup()
runs (see vmbus_bus_init()). It calls hv_stimer_cleanup() ->
hv_ce_shutdown() to disable the per-cpu timer device, so timer interrupts
no longer occur on CPU1.

2.6 Later, the .teardown.single of CPUHP_AP_SMPBOOT_THREADS, i.e.
smpboot_park_threads(), runs and tries to park all the other
hotplug_threads, e.g. ksoftirqd/1 and rcuc/1. Here a timer can be set up
through the call chain below, and it never fires because CPU1 no longer
has an active timer device, so CPU1 hangs and cannot be offlined:
  smpboot_park_threads
    smpboot_park_thread
      kthread_park
        wait_task_inactive
          schedule_hrtimeout(&to, HRTIMER_MODE_REL)

With this patch, when the per-cpu Hyper-V timer device is disabled, the
system switches to the Local APIC timer, so the hang cannot happen.

Fixes: fd1fea6834d0 ("clocksource/drivers: Make Hyper-V clocksource ISA agnostic")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 drivers/clocksource/hyperv_timer.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index 41c31a7ac0e4..8f3422c66cbb 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -139,6 +139,7 @@ void hv_stimer_cleanup(unsigned int cpu)
 	/* Turn off clockevent device */
 	if (ms_hyperv.features & HV_MSR_SYNTIMER_AVAILABLE) {
 		ce = per_cpu_ptr(hv_clock_event, cpu);
+		clockevents_unbind_device(ce, cpu);
 		hv_ce_shutdown(ce);
 	}
 }
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH 1/2] Drivers: hv: Specify receive buffer size using Hyper-V page size
From: Stephen Hemminger @ 2019-07-26 16:07 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, linux-hyperv, linux-kernel, himadri18.07
In-Reply-To: <20190725050315.6935-2-himadri18.07@gmail.com>

On Wed, 24 Jul 2019 22:03:14 -0700
"Himadri Pandya" <himadrispandya@gmail.com> wrote:

> The recv_buffer is used to retrieve data from the VMbus ring buffer.
> VMbus ring buffers are sized based on the guest page size which
> Hyper-V assumes to be 4KB. But it may be different on some
> architectures. So use the Hyper-V page size to allocate the
> recv_buffer and set the maximum size to receive.
> 
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>

If pagesize is 64K, then doing it this way will waste lots of
memory.

^ permalink raw reply

* Re: [PATCH 2/2] Drivers: hv: util: Specify ring buffer size using Hyper-V page size
From: Stephen Hemminger @ 2019-07-26 16:06 UTC (permalink / raw)
  To: Himadri Pandya
  Cc: Michael Kelley, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	sashal, linux-hyperv, linux-kernel, himadri18.07
In-Reply-To: <20190725050315.6935-3-himadri18.07@gmail.com>

On Wed, 24 Jul 2019 22:03:15 -0700
"Himadri Pandya" <himadrispandya@gmail.com> wrote:

> VMbus ring buffers are sized based on the 4K page size used by
> Hyper-V. The Linux guest page size may not be 4K on all architectures
> so use the Hyper-V page size to specify the ring buffer size.
> 
> Signed-off-by: Himadri Pandya <himadri18.07@gmail.com>
> ---
>  drivers/hv/hv_util.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
> index c2c08f26bd5f..766bd8457346 100644
> --- a/drivers/hv/hv_util.c
> +++ b/drivers/hv/hv_util.c
> @@ -413,8 +413,9 @@ static int util_probe(struct hv_device *dev,
>  
>  	hv_set_drvdata(dev, srv);
>  
> -	ret = vmbus_open(dev->channel, 4 * PAGE_SIZE, 4 * PAGE_SIZE, NULL, 0,
> -			srv->util_cb, dev->channel);
> +	ret = vmbus_open(dev->channel, 4 * HV_HYP_PAGE_SIZE,
> +			 4 * HV_HYP_PAGE_SIZE, NULL, 0, srv->util_cb,
> +			 dev->channel);
>  	if (ret)
>  		goto error;
>  

hv_util doesn't need lots of buffering. Why not define a fixed
value across all architectures, perhaps rounded up to HV_HYP_PAGE_SIZE?

^ permalink raw reply

