Linux-HyperV List
 help / color / mirror / Atom feed
* Re: properly communicate queue limits to the DMA layer
From: Jens Axboe @ 2019-06-07  5:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sebastian Ott, Sagi Grimberg, Max Gurtovoy, Bart Van Assche,
	Ulf Hansson, Alan Stern, Oliver Neukum, linux-block, linux-rdma,
	linux-mmc, linux-nvme, linux-scsi, megaraidlinux.pdl,
	MPT-FusionLinux.pdl, linux-hyperv, linux-usb, usb-storage,
	linux-kernel
In-Reply-To: <20190605192405.GA18243@lst.de>

On 6/5/19 1:24 PM, Christoph Hellwig wrote:
> On Wed, Jun 05, 2019 at 01:17:15PM -0600, Jens Axboe wrote:
>> Since I'm heading out shortly and since I think this should make
>> the next -rc, I'll tentatively queue this up.
> 
> The SCSI bits will need a bit more review, and possibly tweaking
> fo megaraid and mpt3sas.  But they are really independent of the
> other patches, so maybe skip them for now and leave them for Martin
> to deal with.

I dropped the SCSI bits.

-- 
Jens Axboe


^ permalink raw reply

* Re: properly communicate queue limits to the DMA layer
From: Martin K. Petersen @ 2019-06-07 17:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <f07d0abf-b3eb-f530-37b9-e66454740b3f@kernel.dk>


Jens,

>> The SCSI bits will need a bit more review, and possibly tweaking
>> fo megaraid and mpt3sas.  But they are really independent of the
>> other patches, so maybe skip them for now and leave them for Martin
>> to deal with.
>
> I dropped the SCSI bits.

I'll monitor and merge them.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: properly communicate queue limits to the DMA layer
From: Jens Axboe @ 2019-06-08  8:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <yq1o939i9qh.fsf@oracle.com>

On 6/7/19 11:30 AM, Martin K. Petersen wrote:
> 
> Jens,
> 
>>> The SCSI bits will need a bit more review, and possibly tweaking
>>> fo megaraid and mpt3sas.  But they are really independent of the
>>> other patches, so maybe skip them for now and leave them for Martin
>>> to deal with.
>>
>> I dropped the SCSI bits.
> 
> I'll monitor and merge them.

Great, thanks Martin.

-- 
Jens Axboe


^ permalink raw reply

* Re: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-08  8:14 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, PDL,MEGARAIDLINUX, PDL-MPT-FUSIONLINUX, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <cd713506efb9579d1f69a719d831c28d@mail.gmail.com>

On Thu, Jun 06, 2019 at 09:07:27PM +0530, Kashyap Desai wrote:
> Hi Christoph, Changes for <megaraid_sas> and <mpt3sas> looks good. We want
> to confirm few sanity before ACK. BTW, what benefit we will see moving
> virt_boundry setting to SCSI mid layer ? Is it just modular approach OR any
> functional fix ?

The big difference is that virt_boundary now also changes the
max_segment_size, and this ensures that this limit is also communicated
to the DMA mapping layer.

^ permalink raw reply

* [PATCH hyperv-fixes] hv_netvsc: Set probe mode to sync
From: Haiyang Zhang @ 2019-06-10 20:49 UTC (permalink / raw)
  To: sashal@kernel.org, linux-hyperv@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger

For better consistency of synthetic NIC names, we set the probe mode to
PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
channel offer sequence.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 03ea5a7..afdcc56 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2407,7 +2407,7 @@ static int netvsc_remove(struct hv_device *dev)
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
 	.driver = {
-		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
+		.probe_type = PROBE_FORCE_SYNCHRONOUS,
 	},
 };
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Dmitry Safonov @ 2019-06-11 21:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	Vitaly Kuznetsov, devel, kvm, linux-hyperv, x86

KVM support may be compiled as dynamic module, which triggers the
following splat on modprobe:

 KVM: vmx: using Hyper-V Enlightened VMCS
 BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
 CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
 Call Trace:
  dump_stack+0x61/0x7e
  check_preemption_disabled+0xd4/0xe6
  debug_smp_processor_id+0x17/0x19
  set_hv_tscchange_cb+0x1b/0x89
  kvm_arch_init+0x14a/0x163 [kvm]
  kvm_init+0x30/0x259 [kvm]
  vmx_init+0xed/0x3db [kvm_intel]
  do_one_initcall+0x89/0x1bc
  do_init_module+0x5f/0x207
  load_module+0x1b34/0x209b
  __ia32_sys_init_module+0x17/0x19
  do_fast_syscall_32+0x121/0x1fa
  entry_SYSENTER_compat+0x7f/0x91

The easiest solution seems to be disabling preemption while setting up
reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.

Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
support")

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>

Cc: devel@linuxdriverproject.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/hyperv/hv_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..0bdd79ecbff8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	void **input_arg;
 	struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
 	hv_get_vp_index(msr_vp_index);
 
-	hv_vp_index[smp_processor_id()] = msr_vp_index;
+	hv_vp_index[cpu] = msr_vp_index;
 
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
@@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	struct hv_reenlightenment_control re_ctrl = {
 		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
 		.enabled = 1,
-		.target_vp = hv_vp_index[smp_processor_id()]
 	};
 	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
 
@@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	/* Make sure callback is registered before we write to MSRs */
 	wmb();
 
+	preempt_disable();
+	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
 	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	preempt_enable();
+
 	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
 }
 EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Peter Zijlstra @ 2019-06-12  9:35 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	Vitaly Kuznetsov, devel, kvm, linux-hyperv, x86
In-Reply-To: <20190611212003.26382-1-dima@arista.com>

On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
> 
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91
> 
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
> 
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;
> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +
>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

This looks bogus, MSRs are a per-cpu resource, you had better know what
CPUs you're on and be stuck to it when you do wrmsr. This just fudges
the code to make the warning go away and doesn't fix the actual problem
afaict.

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-12 10:17 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel
  Cc: Prasanna Panchamukhi, Andy Lutomirski, Borislav Petkov,
	Cathy Avery, Haiyang Zhang, H. Peter Anvin, Ingo Molnar,
	K. Y. Srinivasan, Michael Kelley (EOSG), Mohammed Gamal,
	Paolo Bonzini, Radim Krčmář, Roman Kagan,
	Sasha Levin, Stephen Hemminger, Thomas Gleixner, devel, kvm,
	linux-hyperv, x86
In-Reply-To: <20190611212003.26382-1-dima@arista.com>

Dmitry Safonov <dima@arista.com> writes:

> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
>
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91

Hm, I never noticed this one, you probably need something like
CONFIG_PREEMPT enabled so see it.

>
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;

The above is unrelated cleanup (as cpu == smp_processor_id() for
CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
preempted.

> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +

My personal preference would be to do something like
   int cpu = get_cpu();

   ... set things up ...

   put_cpu();

instead, there are no long-running things in the whole function. But
what you've done should work too, so

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-12 10:25 UTC (permalink / raw)
  To: Peter Zijlstra, Dmitry Safonov
  Cc: linux-kernel, Prasanna Panchamukhi, Andy Lutomirski,
	Borislav Petkov, Cathy Avery, Haiyang Zhang, H. Peter Anvin,
	Ingo Molnar, K. Y. Srinivasan, Michael Kelley (EOSG),
	Mohammed Gamal, Paolo Bonzini, Radim Krčmář,
	Roman Kagan, Sasha Levin, Stephen Hemminger, Thomas Gleixner,
	devel, kvm, linux-hyperv, x86
In-Reply-To: <20190612093506.GH3436@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
>> KVM support may be compiled as dynamic module, which triggers the
>> following splat on modprobe:
>> 
>>  KVM: vmx: using Hyper-V Enlightened VMCS
>>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>>  Call Trace:
>>   dump_stack+0x61/0x7e
>>   check_preemption_disabled+0xd4/0xe6
>>   debug_smp_processor_id+0x17/0x19
>>   set_hv_tscchange_cb+0x1b/0x89
>>   kvm_arch_init+0x14a/0x163 [kvm]
>>   kvm_init+0x30/0x259 [kvm]
>>   vmx_init+0xed/0x3db [kvm_intel]
>>   do_one_initcall+0x89/0x1bc
>>   do_init_module+0x5f/0x207
>>   load_module+0x1b34/0x209b
>>   __ia32_sys_init_module+0x17/0x19
>>   do_fast_syscall_32+0x121/0x1fa
>>   entry_SYSENTER_compat+0x7f/0x91
>> 
>> The easiest solution seems to be disabling preemption while setting up
>> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>> 
>> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
>> support")
>> 
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Cathy Avery <cavery@redhat.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
>> Cc: Mohammed Gamal <mmorsy@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Roman Kagan <rkagan@virtuozzo.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> Cc: devel@linuxdriverproject.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: x86@kernel.org
>> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 1608050e9df9..0bdd79ecbff8 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>  	u64 msr_vp_index;
>> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>  	void **input_arg;
>>  	struct page *pg;
>>  
>> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>>  
>>  	hv_get_vp_index(msr_vp_index);
>>  
>> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
>> +	hv_vp_index[cpu] = msr_vp_index;
>>  
>>  	if (msr_vp_index > hv_max_vp_index)
>>  		hv_max_vp_index = msr_vp_index;
>> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	struct hv_reenlightenment_control re_ctrl = {
>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>  		.enabled = 1,
>> -		.target_vp = hv_vp_index[smp_processor_id()]
>>  	};
>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>  
>> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	/* Make sure callback is registered before we write to MSRs */
>>  	wmb();
>>  
>> +	preempt_disable();
>> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>> +	preempt_enable();
>> +
>>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>>  }
>>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
>
> This looks bogus, MSRs are a per-cpu resource, you had better know what
> CPUs you're on and be stuck to it when you do wrmsr. This just fudges
> the code to make the warning go away and doesn't fix the actual problem
> afaict.

Actually, we don't care which CPU will receive the reenlightenment
notification and TSC Emulation in Hyper-V is, of course, global. We have
code which re-assignes the notification to some other CPU in case the
one it's currently assigned to goes away (see hv_cpu_die()).

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v2 1/5] x86: hv: hyperv-tlfs.h: Create and use Hyper-V page definitions
From: Vitaly Kuznetsov @ 2019-06-12 10:36 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: x86, linux-hyperv, linux-kernel, mikelley, kys, haiyangz,
	sthemmin, sashal
In-Reply-To: <67be3e283c0f28326f9c31a64f399fe659ad5690.1559807514.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Define HV_HYP_PAGE_SHIFT, HV_HYP_PAGE_SIZE, and HV_HYP_PAGE_MASK because
> the Linux guest page size and hypervisor page size concepts are
> different, even though they happen to be the same value on x86.
>
> Also, replace PAGE_SIZE with HV_HYP_PAGE_SIZE.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
> index cdf44aa9a501..44bd68aefd00 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -12,6 +12,16 @@
>  #include <linux/types.h>
>  #include <asm/page.h>
>  
> +/*
> + * While not explicitly listed in the TLFS, Hyper-V always runs with a page size
> + * of 4096. These definitions are used when communicating with Hyper-V using
> + * guest physical pages and guest physical page addresses, since the guest page
> + * size may not be 4096 on all architectures.
> + */
> +#define HV_HYP_PAGE_SHIFT	12
> +#define HV_HYP_PAGE_SIZE	BIT(HV_HYP_PAGE_SHIFT)
> +#define HV_HYP_PAGE_MASK	(~(HV_HYP_PAGE_SIZE - 1))
> +
>  /*
>   * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
>   * is set by CPUID(HvCpuIdFunctionVersionAndFeatures).
> @@ -841,7 +851,7 @@ union hv_gpa_page_range {
>   * count is equal with how many entries of union hv_gpa_page_range can
>   * be populated into the input parameter page.
>   */
> -#define HV_MAX_FLUSH_REP_COUNT ((PAGE_SIZE - 2 * sizeof(u64)) /	\
> +#define HV_MAX_FLUSH_REP_COUNT ((HV_HYP_PAGE_SIZE - 2 * sizeof(u64)) /	\
>  				sizeof(union hv_gpa_page_range))
>  
>  struct hv_guest_mapping_flush_list {

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v2 2/5] x86: hv: hv_init.c: Add functions to allocate/deallocate page for Hyper-V
From: Vitaly Kuznetsov @ 2019-06-12 10:36 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: x86, linux-hyperv, linux-kernel, mikelley, kys, haiyangz,
	sthemmin, sashal
In-Reply-To: <5cf4ad6f3fae8dec33e364b367b99cbb5b0f2ba4.1559807514.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Introduce two new functions, hv_alloc_hyperv_page() and
> hv_free_hyperv_page(), to allocate/deallocate memory with the size and
> alignment that Hyper-V expects as a page. Although currently they are
> not used, they are ready to be used to allocate/deallocate memory on x86
> when their ARM64 counterparts are implemented, keeping symmetry between
> architectures with potentially different guest page sizes.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  arch/x86/hyperv/hv_init.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e4ba467a9fc6..84baf0e9a2d4 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -98,6 +98,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  
> +void *hv_alloc_hyperv_page(void)
> +{
> +	BUILD_BUG_ON(!(PAGE_SIZE == HV_HYP_PAGE_SIZE));

(nit)

PAGE_SIZE != HV_HYP_PAGE_SIZE ?

> +
> +	return (void *)__get_free_page(GFP_KERNEL);
> +}
> +EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
> +
> +void hv_free_hyperv_page(unsigned long addr)
> +{
> +	free_page(addr);
> +}
> +EXPORT_SYMBOL_GPL(hv_free_hyperv_page);
> +
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v2 3/5] hv: vmbus: Replace page definition with Hyper-V specific one
From: Vitaly Kuznetsov @ 2019-06-12 10:38 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: x86, linux-hyperv, linux-kernel, mikelley, kys, haiyangz,
	sthemmin, sashal
In-Reply-To: <210c56ddb1dafc20ba289e6be9165efe8a5e818c.1559807514.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Replace PAGE_SIZE with HV_HYP_PAGE_SIZE because the guest page size may
> not be 4096 on all architectures and Hyper-V always runs with a page
> size of 4096.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  drivers/hv/hyperv_vmbus.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> index e5467b821f41..5489b061d261 100644
> --- a/drivers/hv/hyperv_vmbus.h
> +++ b/drivers/hv/hyperv_vmbus.h
> @@ -208,11 +208,11 @@ int hv_ringbuffer_read(struct vmbus_channel *channel,
>  		       u64 *requestid, bool raw);
>  
>  /*
> - * Maximum channels is determined by the size of the interrupt page
> - * which is PAGE_SIZE. 1/2 of PAGE_SIZE is for send endpoint interrupt
> - * and the other is receive endpoint interrupt
> + * Maximum channels, 16348, is determined by the size of the interrupt page,
> + * which is HV_HYP_PAGE_SIZE. 1/2 of HV_HYP_PAGE_SIZE is to send endpoint
> + * interrupt, and the other is to receive endpoint interrupt.
>   */
> -#define MAX_NUM_CHANNELS	((PAGE_SIZE >> 1) << 3)	/* 16348 channels */
> +#define MAX_NUM_CHANNELS	((HV_HYP_PAGE_SIZE >> 1) << 3)
>  
>  /* The value here must be in multiple of 32 */
>  /* TODO: Need to make this configurable */

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH v2 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Vitaly Kuznetsov @ 2019-06-12 10:40 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: x86, linux-hyperv, linux-kernel, mikelley, kys, haiyangz,
	sthemmin, sashal
In-Reply-To: <0e9385a241dc7c26445eb7e104d08e2e2c5d30de.1559807514.git.m.maya.nakamura@gmail.com>

Maya Nakamura <m.maya.nakamura@gmail.com> writes:

> Define the ring buffer size as a constant expression because it should
> not depend on the guest page size.
>
> Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
> ---
>  drivers/hid/hid-hyperv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
> index d3311d714d35..e8b154fa38e2 100644
> --- a/drivers/hid/hid-hyperv.c
> +++ b/drivers/hid/hid-hyperv.c
> @@ -112,8 +112,8 @@ struct synthhid_input_report {
>  
>  #pragma pack(pop)
>  
> -#define INPUTVSC_SEND_RING_BUFFER_SIZE		(10*PAGE_SIZE)
> -#define INPUTVSC_RECV_RING_BUFFER_SIZE		(10*PAGE_SIZE)
> +#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
> +#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
>  

My understanding is that this size is pretty arbitrary and as I see you
use it for hyperv-keyboard.c as well. It may make sense to have a
define, something like HYPERV_STD_RINGBUFFER_SIZE.

>  
>  enum pipe_prot_msg_type {

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Christoph Hellwig @ 2019-06-13  8:44 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Christoph Hellwig, Jens Axboe, Sebastian Ott, Sagi Grimberg,
	Max Gurtovoy, Bart Van Assche, Ulf Hansson, Alan Stern,
	Oliver Neukum, linux-block, linux-rdma, linux-mmc, linux-nvme,
	linux-scsi, PDL,MEGARAIDLINUX, PDL-MPT-FUSIONLINUX, linux-hyperv,
	linux-usb, usb-storage, linux-kernel
In-Reply-To: <cd713506efb9579d1f69a719d831c28d@mail.gmail.com>

So before I respin this series, can you help with a way to figure out
for mpt3sas and megaraid if a given controller supports NVMe devices
at all, so that we don't have to set the virt boundary if not?

^ permalink raw reply

* [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-06-13  6:48 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: linux-kernel, Ingo Molnar, Borislav Petkov, x86, Thomas Gleixner,
	Dave Hansen, Nadav Amit, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Juergen Gross, Paolo Bonzini,
	Boris Ostrovsky, linux-hyperv, virtualization, kvm, xen-devel
In-Reply-To: <20190613064813.8102-1-namit@vmware.com>

To improve TLB shootdown performance, flush the remote and local TLBs
concurrently. Introduce flush_tlb_multi() that does so. The current
flush_tlb_others() interface is kept, since paravirtual interfaces need
to be adapted first before it can be removed. This is left for future
work. In such PV environments, TLB flushes are not performed, at this
time, concurrently.

Add a static key to tell whether this new interface is supported.

Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: Juergen Gross <jgross@suse.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: linux-hyperv@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: kvm@vger.kernel.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/hyperv/mmu.c                 |  2 +
 arch/x86/include/asm/paravirt.h       |  8 +++
 arch/x86/include/asm/paravirt_types.h |  6 +++
 arch/x86/include/asm/tlbflush.h       |  6 +++
 arch/x86/kernel/kvm.c                 |  1 +
 arch/x86/kernel/paravirt.c            |  3 ++
 arch/x86/mm/tlb.c                     | 71 ++++++++++++++++++++++-----
 arch/x86/xen/mmu_pv.c                 |  2 +
 8 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
index e65d7fe6489f..ca28b400c87c 100644
--- a/arch/x86/hyperv/mmu.c
+++ b/arch/x86/hyperv/mmu.c
@@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void)
 	pr_info("Using hypercall for remote TLB flush\n");
 	pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
 	pv_ops.mmu.tlb_remove_table = tlb_remove_table;
+
+	static_key_disable(&flush_tlb_multi_enabled.key);
 }
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25c38a05c1c..192be7254457 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -47,6 +47,8 @@ static inline void slow_down_io(void)
 #endif
 }
 
+DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
+
 static inline void __flush_tlb(void)
 {
 	PVOP_VCALL0(mmu.flush_tlb_user);
@@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr)
 	PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
 }
 
+static inline void flush_tlb_multi(const struct cpumask *cpumask,
+				   const struct flush_tlb_info *info)
+{
+	PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
+}
+
 static inline void flush_tlb_others(const struct cpumask *cpumask,
 				    const struct flush_tlb_info *info)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 946f8f1f1efc..b93b3d90729a 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -211,6 +211,12 @@ struct pv_mmu_ops {
 	void (*flush_tlb_user)(void);
 	void (*flush_tlb_kernel)(void);
 	void (*flush_tlb_one_user)(unsigned long addr);
+	/*
+	 * flush_tlb_multi() is the preferred interface, which is capable to
+	 * flush both local and remote CPUs.
+	 */
+	void (*flush_tlb_multi)(const struct cpumask *cpus,
+				const struct flush_tlb_info *info);
 	void (*flush_tlb_others)(const struct cpumask *cpus,
 				 const struct flush_tlb_info *info);
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index dee375831962..79272938cf79 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
 	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
 }
 
+void native_flush_tlb_multi(const struct cpumask *cpumask,
+			     const struct flush_tlb_info *info);
+
 void native_flush_tlb_others(const struct cpumask *cpumask,
 			     const struct flush_tlb_info *info);
 
@@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
 extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
 
 #ifndef CONFIG_PARAVIRT
+#define flush_tlb_multi(mask, info)	\
+	native_flush_tlb_multi(mask, info)
+
 #define flush_tlb_others(mask, info)	\
 	native_flush_tlb_others(mask, info)
 
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5169b8cc35bb..00d81e898717 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -630,6 +630,7 @@ static void __init kvm_guest_init(void)
 	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
 		pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
 		pv_ops.mmu.tlb_remove_table = tlb_remove_table;
+		static_key_disable(&flush_tlb_multi_enabled.key);
 	}
 
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 98039d7fb998..ac00afed5570 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -159,6 +159,8 @@ unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
 	return insn_len;
 }
 
+DEFINE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
+
 static void native_flush_tlb(void)
 {
 	__native_flush_tlb();
@@ -363,6 +365,7 @@ struct paravirt_patch_template pv_ops = {
 	.mmu.flush_tlb_user	= native_flush_tlb,
 	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
 	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
+	.mmu.flush_tlb_multi	= native_flush_tlb_multi,
 	.mmu.flush_tlb_others	= native_flush_tlb_others,
 	.mmu.tlb_remove_table	=
 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c34bcf03f06f..db73d5f1dd43 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -551,7 +551,7 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 		 * garbage into our TLB.  Since switching to init_mm is barely
 		 * slower than a minimal flush, just switch to init_mm.
 		 *
-		 * This should be rare, with native_flush_tlb_others skipping
+		 * This should be rare, with native_flush_tlb_multi skipping
 		 * IPIs to lazy TLB mode CPUs.
 		 */
 		switch_mm_irqs_off(NULL, &init_mm, NULL);
@@ -635,9 +635,12 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
 	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
 }
 
-static void flush_tlb_func_local(const void *info, enum tlb_flush_reason reason)
+static void flush_tlb_func_local(void *info)
 {
 	const struct flush_tlb_info *f = info;
+	enum tlb_flush_reason reason;
+
+	reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
 
 	flush_tlb_func_common(f, true, reason);
 }
@@ -655,14 +658,21 @@ static void flush_tlb_func_remote(void *info)
 	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
 }
 
-static bool tlb_is_not_lazy(int cpu, void *data)
+static inline bool tlb_is_not_lazy(int cpu)
 {
 	return !per_cpu(cpu_tlbstate.is_lazy, cpu);
 }
 
-void native_flush_tlb_others(const struct cpumask *cpumask,
-			     const struct flush_tlb_info *info)
+static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
+
+void native_flush_tlb_multi(const struct cpumask *cpumask,
+			    const struct flush_tlb_info *info)
 {
+	/*
+	 * Do accounting and tracing. Note that there are (and have always been)
+	 * cases in which a remote TLB flush will be traced, but eventually
+	 * would not happen.
+	 */
 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
 	if (info->end == TLB_FLUSH_ALL)
 		trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
@@ -682,10 +692,14 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 		 * means that the percpu tlb_gen variables won't be updated
 		 * and we'll do pointless flushes on future context switches.
 		 *
-		 * Rather than hooking native_flush_tlb_others() here, I think
+		 * Rather than hooking native_flush_tlb_multi() here, I think
 		 * that UV should be updated so that smp_call_function_many(),
 		 * etc, are optimal on UV.
 		 */
+		local_irq_disable();
+		flush_tlb_func_local((__force void *)info);
+		local_irq_enable();
+
 		cpumask = uv_flush_tlb_others(cpumask, info);
 		if (cpumask)
 			smp_call_function_many(cpumask, flush_tlb_func_remote,
@@ -704,11 +718,39 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
 	 * doing a speculative memory access.
 	 */
 	if (info->freed_tables)
-		smp_call_function_many(cpumask, flush_tlb_func_remote,
-			       (void *)info, 1);
-	else
-		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func_remote,
-				(void *)info, 1, GFP_ATOMIC, cpumask);
+		__smp_call_function_many(cpumask, flush_tlb_func_remote,
+					 flush_tlb_func_local, (void *)info, 1);
+	else {
+		/*
+		 * Although we could have used on_each_cpu_cond_mask(),
+		 * open-coding it has several performance advantages: (1) we can
+		 * use specialized functions for remote and local flushes; (2)
+		 * no need for indirect branch to test if TLB is lazy; (3) we
+		 * can use a designated cpumask for evaluating the condition
+		 * instead of allocating a new one.
+		 *
+		 * This works under the assumption that there are no nested TLB
+		 * flushes, an assumption that is already made in
+		 * flush_tlb_mm_range().
+		 */
+		struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);
+		int cpu;
+
+		cpumask_clear(cond_cpumask);
+
+		for_each_cpu(cpu, cpumask) {
+			if (tlb_is_not_lazy(cpu))
+				__cpumask_set_cpu(cpu, cond_cpumask);
+		}
+		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
+					 flush_tlb_func_local, (void *)info, 1);
+	}
+}
+
+void native_flush_tlb_others(const struct cpumask *cpumask,
+			     const struct flush_tlb_info *info)
+{
+	native_flush_tlb_multi(cpumask, info);
 }
 
 /*
@@ -774,10 +816,15 @@ static void flush_tlb_on_cpus(const cpumask_t *cpumask,
 {
 	int this_cpu = smp_processor_id();
 
+	if (static_branch_likely(&flush_tlb_multi_enabled)) {
+		flush_tlb_multi(cpumask, info);
+		return;
+	}
+
 	if (cpumask_test_cpu(this_cpu, cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func_local(info, TLB_LOCAL_MM_SHOOTDOWN);
+		flush_tlb_func_local((__force void *)info);
 		local_irq_enable();
 	}
 
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index beb44e22afdf..0cb277848cb4 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2474,6 +2474,8 @@ void __init xen_init_mmu_ops(void)
 
 	pv_ops.mmu = xen_mmu_ops;
 
+	static_key_disable(&flush_tlb_multi_enabled.key);
+
 	memset(dummy_mapping, 0xff, PAGE_SIZE);
 }
 
-- 
2.20.1


^ permalink raw reply related

* [PATCH net] hvsock: fix epollout hang from race condition
From: Sunil Muthuswamy @ 2019-06-12 21:19 UTC (permalink / raw)
  To: KY Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	David S. Miller, Dexuan Cui, Michael Kelley
  Cc: netdev@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org

Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
not return even when the hvsock socket is writable, under some race
condition. This can happen under the following sequence:
- fd = socket(hvsocket)
- fd_out = dup(fd)
- fd_in = dup(fd)
- start a writer thread that writes data to fd_out with a combination of
  epoll_wait(fd_out, EPOLLOUT) and
- start a reader thread that reads data from fd_in with a combination of
  epoll_wait(fd_in, EPOLLIN)
- On the host, there are two threads that are reading/writing data to the
  hvsocket

stack:
hvs_stream_has_space
hvs_notify_poll_out
vsock_poll
sock_poll
ep_poll

Race condition:
check for epollout from ep_poll():
	assume no writable space in the socket
	hvs_stream_has_space() returns 0
check for epollin from ep_poll():
	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
	hvs_stream_has_space() will clear the channel pending send size
	host will not notify the guest because the pending send size has
		been cleared and so the hvsocket will never mark the
		socket writable

Now, the EPOLLOUT will never return even if the socket write buffer is
empty.

The fix is to set the pending size to the default size and never change it.
This way the host will always notify the guest whenever the writable space
is bigger than the pending size. The host is already optimized to *only*
notify the guest when the pending size threshold boundary is crossed and
not everytime.

This change also reduces the cpu usage somewhat since hv_stream_has_space()
is in the hotpath of send:
vsock_stream_sendmsg()->hv_stream_has_space()
Earlier hv_stream_has_space was setting/clearing the pending size on every
call.

Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
---
 net/vmw_vsock/hyperv_transport.c | 39 ++++++++-------------------------------
 1 file changed, 8 insertions(+), 31 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 982a8dc..8d1ea9e 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -220,18 +220,6 @@ static void hvs_set_channel_pending_send_size(struct vmbus_channel *chan)
 	set_channel_pending_send_size(chan,
 				      HVS_PKT_LEN(HVS_SEND_BUF_SIZE));
 
-	/* See hvs_stream_has_space(): we must make sure the host has seen
-	 * the new pending send size, before we can re-check the writable
-	 * bytes.
-	 */
-	virt_mb();
-}
-
-static void hvs_clear_channel_pending_send_size(struct vmbus_channel *chan)
-{
-	set_channel_pending_send_size(chan, 0);
-
-	/* Ditto */
 	virt_mb();
 }
 
@@ -301,9 +289,6 @@ static void hvs_channel_cb(void *ctx)
 	if (hvs_channel_readable(chan))
 		sk->sk_data_ready(sk);
 
-	/* See hvs_stream_has_space(): when we reach here, the writable bytes
-	 * may be already less than HVS_PKT_LEN(HVS_SEND_BUF_SIZE).
-	 */
 	if (hv_get_bytes_to_write(&chan->outbound) > 0)
 		sk->sk_write_space(sk);
 }
@@ -404,6 +389,13 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	set_per_channel_state(chan, conn_from_host ? new : sk);
 	vmbus_set_chn_rescind_callback(chan, hvs_close_connection);
 
+	/* Set the pending send size to max packet size to always get
+	 * notifications from the host when there is enough writable space.
+	 * The host is optimized to send notifications only when the pending
+	 * size boundary is crossed, and not always.
+	 */
+	hvs_set_channel_pending_send_size(chan);
+
 	if (conn_from_host) {
 		new->sk_state = TCP_ESTABLISHED;
 		sk->sk_ack_backlog++;
@@ -697,23 +689,8 @@ static s64 hvs_stream_has_data(struct vsock_sock *vsk)
 static s64 hvs_stream_has_space(struct vsock_sock *vsk)
 {
 	struct hvsock *hvs = vsk->trans;
-	struct vmbus_channel *chan = hvs->chan;
-	s64 ret;
-
-	ret = hvs_channel_writable_bytes(chan);
-	if (ret > 0)  {
-		hvs_clear_channel_pending_send_size(chan);
-	} else {
-		/* See hvs_channel_cb() */
-		hvs_set_channel_pending_send_size(chan);
-
-		/* Re-check the writable bytes to avoid race */
-		ret = hvs_channel_writable_bytes(chan);
-		if (ret > 0)
-			hvs_clear_channel_pending_send_size(chan);
-	}
 
-	return ret;
+	return hvs_channel_writable_bytes(hvs->chan);
 }
 
 static u64 hvs_stream_rcvhiwat(struct vsock_sock *vsk)
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Thomas Gleixner @ 2019-06-13 19:00 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dmitry Safonov, linux-kernel, Prasanna Panchamukhi,
	Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
	H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
	Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
	Radim Krčmář, Roman Kagan, Sasha Levin,
	Stephen Hemminger, devel, kvm, linux-hyperv, x86
In-Reply-To: <8736kff6q3.fsf@vitty.brq.redhat.com>

On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 1608050e9df9..0bdd79ecbff8 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >  	void **input_arg;
> >  	struct page *pg;
> >  
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >  
> >  	hv_get_vp_index(msr_vp_index);
> >  
> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +	hv_vp_index[cpu] = msr_vp_index;
> >  
> >  	if (msr_vp_index > hv_max_vp_index)
> >  		hv_max_vp_index = msr_vp_index;
> 
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
> preempted.

They can be preempted, but they are guaranteed to run on the upcoming CPU,
i.e. smp_processor_id() is allowed even in preemptible context as the task
cannot migrate.

Thanks,

	tglx

^ permalink raw reply

* RE: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Kashyap Desai @ 2019-06-13 19:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	PDL,MEGARAIDLINUX, PDL-MPT-FUSIONLINUX, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <20190608081400.GA19573@lst.de>

>
> On Thu, Jun 06, 2019 at 09:07:27PM +0530, Kashyap Desai wrote:
> > Hi Christoph, Changes for <megaraid_sas> and <mpt3sas> looks good. We
> > want to confirm few sanity before ACK. BTW, what benefit we will see
> > moving virt_boundry setting to SCSI mid layer ? Is it just modular
> > approach OR any functional fix ?
>
> The big difference is that virt_boundary now also changes the
> max_segment_size, and this ensures that this limit is also communicated
to
> the DMA mapping layer.
Is there any changes in API  blk_queue_virt_boundary? I could not find
relevant code which account for this. Can you help ?
Which git repo shall I use for testing ? That way I can confirm, I didn't
miss relevant changes.

From your above explanation, it means (after this patch) max segment size
of the MR controller will be set to 4K.
Earlier it is possible to receive single SGE of 64K datalength (Since max
seg size was 64K), but now the same buffer will reach the driver having 16
SGEs (Each SGE will contain 4K length).
Right ?

Kashyap

^ permalink raw reply

* RE: [PATCH 10/13] megaraid_sas: set virt_boundary_mask in the scsi host
From: Kashyap Desai @ 2019-06-13 20:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sebastian Ott, Sagi Grimberg, Max Gurtovoy,
	Bart Van Assche, Ulf Hansson, Alan Stern, Oliver Neukum,
	linux-block, linux-rdma, linux-mmc, linux-nvme, linux-scsi,
	PDL,MEGARAIDLINUX, PDL-MPT-FUSIONLINUX, linux-hyperv, linux-usb,
	usb-storage, linux-kernel
In-Reply-To: <20190613084458.GB13221@lst.de>

>
> So before I respin this series, can you help with a way to figure out
for
> mpt3sas and megaraid if a given controller supports NVMe devices at all,
so
> that we don't have to set the virt boundary if not?


In MegaRaid we have below enum -        VENTURA_SERIES and AERO_SERIES
supports NVME

enum MR_ADAPTER_TYPE {
        MFI_SERIES = 1,
        THUNDERBOLT_SERIES = 2,
        INVADER_SERIES = 3,
        VENTURA_SERIES = 4,
        AERO_SERIES = 5,
};

In mpt3sas driver we have below method - If IOC FACT reports NVME Device
support in Protocol Flags, we can consider it as HBA with NVME drive
support.

ioc->facts.ProtocolFlags & MPI2_IOCFACTS_PROTOCOL_NVME_DEVICES

Kashyap

^ permalink raw reply

* [PATCH net] hv_netvsc: Set probe mode to sync
From: Haiyang Zhang @ 2019-06-13 21:06 UTC (permalink / raw)
  To: sashal@kernel.org, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org
  Cc: Haiyang Zhang, KY Srinivasan, Stephen Hemminger, olaf@aepfle.de,
	vkuznets, davem@davemloft.net, linux-kernel@vger.kernel.org

For better consistency of synthetic NIC names, we set the probe mode to
PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
channel offer sequence.

Fixes: af0a5646cb8d ("use the new async probing feature for the hyperv drivers")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 03ea5a7..afdcc56 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2407,7 +2407,7 @@ static int netvsc_remove(struct hv_device *dev)
 	.probe = netvsc_probe,
 	.remove = netvsc_remove,
 	.driver = {
-		.probe_type = PROBE_PREFER_ASYNCHRONOUS,
+		.probe_type = PROBE_FORCE_SYNCHRONOUS,
 	},
 };
 
-- 
1.8.3.1


^ permalink raw reply related

* RE: [PATCH net] hvsock: fix epollout hang from race condition
From: Dexuan Cui @ 2019-06-13 23:35 UTC (permalink / raw)
  To: Sunil Muthuswamy, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, David S. Miller, Michael Kelley
  Cc: netdev@vger.kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <MW2PR2101MB11164C6EEAA5C511B395EF3AC0EC0@MW2PR2101MB1116.namprd21.prod.outlook.com>

> From: Sunil Muthuswamy <sunilmut@microsoft.com>
> Sent: Wednesday, June 12, 2019 2:19 PM
>  ...
> The fix is to set the pending size to the default size and never change it.
> This way the host will always notify the guest whenever the writable space
> is bigger than the pending size. The host is already optimized to *only*
> notify the guest when the pending size threshold boundary is crossed and
> not everytime.
> 
> This change also reduces the cpu usage somewhat since
> hv_stream_has_space()
> is in the hotpath of send:
> vsock_stream_sendmsg()->hv_stream_has_space()
> Earlier hv_stream_has_space was setting/clearing the pending size on every
> call.
> 
> Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>

Hi Sunil, thanks for the fix! It looks good.

Reviewed-by: Dexuan Cui <decui@microsoft.com>


^ permalink raw reply

* RE: [PATCH v2 4/5] HID: hv: Remove dependencies on PAGE_SIZE for ring buffer
From: Vitaly Kuznetsov @ 2019-06-14  7:53 UTC (permalink / raw)
  To: Michael Kelley, m.maya.nakamura
  Cc: x86@kernel.org, linux-hyperv@vger.kernel.org,
	linux-kernel@vger.kernel.org, KY Srinivasan, Haiyang Zhang,
	Stephen Hemminger, sashal@kernel.org
In-Reply-To: <BL0PR2101MB134877ED5DCB9F23033C92D9D7EF0@BL0PR2101MB1348.namprd21.prod.outlook.com>

Michael Kelley <mikelley@microsoft.com> writes:

> From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Wednesday, June 12, 2019 3:40 AM
>> Maya Nakamura <m.maya.nakamura@gmail.com> writes:
>> 
>> > Define the ring buffer size as a constant expression because it should
>> > not depend on the guest page size.
>> >
>> > Signed-off-by: Maya Nakamura <m.maya.nakamura@gmail.com>
>> > ---
>> >  drivers/hid/hid-hyperv.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/hid/hid-hyperv.c b/drivers/hid/hid-hyperv.c
>> > index d3311d714d35..e8b154fa38e2 100644
>> > --- a/drivers/hid/hid-hyperv.c
>> > +++ b/drivers/hid/hid-hyperv.c
>> > @@ -112,8 +112,8 @@ struct synthhid_input_report {
>> >
>> >  #pragma pack(pop)
>> >
>> > -#define INPUTVSC_SEND_RING_BUFFER_SIZE		(10*PAGE_SIZE)
>> > -#define INPUTVSC_RECV_RING_BUFFER_SIZE		(10*PAGE_SIZE)
>> > +#define INPUTVSC_SEND_RING_BUFFER_SIZE		(40 * 1024)
>> > +#define INPUTVSC_RECV_RING_BUFFER_SIZE		(40 * 1024)
>> >
>> 
>> My understanding is that this size is pretty arbitrary and as I see you
>> use it for hyperv-keyboard.c as well. It may make sense to have a
>> define, something like HYPERV_STD_RINGBUFFER_SIZE.
>
> Yes, the size is pretty arbitrary because it hasn't been important enough
> from a memory consumption or performance standpoint to run experiments
> to see if a smaller value could be used.  That said, I would not want to
> link these two devices (keyboard and mouse) by using a shared ring buffer
> size definition.  Logically, the ring buffer sizes are independent of each other,
> and using a common #define implies that they are somehow linked.

Ok, makes sense, let's keep them separate.

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-14  8:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Dmitry Safonov, linux-kernel, Prasanna Panchamukhi,
	Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
	H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
	Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
	Radim Krčmář, Roman Kagan, Sasha Levin,
	Stephen Hemminger, devel, kvm, linux-hyperv, x86
In-Reply-To: <alpine.DEB.2.21.1906132059020.1791@nanos.tec.linutronix.de>

Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
>> Dmitry Safonov <dima@arista.com> writes:
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 1608050e9df9..0bdd79ecbff8 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >  static int hv_cpu_init(unsigned int cpu)
>> >  {
>> >  	u64 msr_vp_index;
>> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>> >  	void **input_arg;
>> >  	struct page *pg;
>> >  
>> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>> >  
>> >  	hv_get_vp_index(msr_vp_index);
>> >  
>> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
>> > +	hv_vp_index[cpu] = msr_vp_index;
>> >  
>> >  	if (msr_vp_index > hv_max_vp_index)
>> >  		hv_max_vp_index = msr_vp_index;
>> 
>> The above is unrelated cleanup (as cpu == smp_processor_id() for
>> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
>> preempted.
>
> They can be preempted, but they are guaranteed to run on the upcoming CPU,
> i.e. smp_processor_id() is allowed even in preemptible context as the task
> cannot migrate.
>

Ah, right, thanks! The guarantee that they don't migrate should be enough.

-- 
Vitaly

^ permalink raw reply

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Peter Zijlstra @ 2019-06-14  8:28 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: Dmitry Safonov, linux-kernel, Prasanna Panchamukhi,
	Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
	H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
	Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
	Radim Krčmář, Roman Kagan, Sasha Levin,
	Stephen Hemminger, Thomas Gleixner, devel, kvm, linux-hyperv, x86
In-Reply-To: <8736kff6q3.fsf@vitty.brq.redhat.com>

On Wed, Jun 12, 2019 at 12:17:24PM +0200, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
> 
> > KVM support may be compiled as dynamic module, which triggers the
> > following splat on modprobe:
> >
> >  KVM: vmx: using Hyper-V Enlightened VMCS
> >  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
> >  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
> >  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
> >  Call Trace:
> >   dump_stack+0x61/0x7e
> >   check_preemption_disabled+0xd4/0xe6
> >   debug_smp_processor_id+0x17/0x19
> >   set_hv_tscchange_cb+0x1b/0x89
> >   kvm_arch_init+0x14a/0x163 [kvm]
> >   kvm_init+0x30/0x259 [kvm]
> >   vmx_init+0xed/0x3db [kvm_intel]
> >   do_one_initcall+0x89/0x1bc
> >   do_init_module+0x5f/0x207
> >   load_module+0x1b34/0x209b
> >   __ia32_sys_init_module+0x17/0x19
> >   do_fast_syscall_32+0x121/0x1fa
> >   entry_SYSENTER_compat+0x7f/0x91
> 
> Hm, I never noticed this one, you probably need something like
> CONFIG_PREEMPT enabled so see it.

CONFIG_DEBUG_PREEMPT

> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >  	void **input_arg;
> >  	struct page *pg;
> >  
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >  
> >  	hv_get_vp_index(msr_vp_index);
> >  
> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +	hv_vp_index[cpu] = msr_vp_index;
> >  
> >  	if (msr_vp_index > hv_max_vp_index)
> >  		hv_max_vp_index = msr_vp_index;
> 
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
> preempted.

Yeah, makes sense though.

> > @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >  	struct hv_reenlightenment_control re_ctrl = {
> >  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
> >  		.enabled = 1,
> > -		.target_vp = hv_vp_index[smp_processor_id()]
> >  	};
> >  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
> >  
> > @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >  	/* Make sure callback is registered before we write to MSRs */
> >  	wmb();
> >  
> > +	preempt_disable();
> > +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
> >  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> > +	preempt_enable();
> > +
> 
> My personal preference would be to do something like
>    int cpu = get_cpu();
> 
>    ... set things up ...
> 
>    put_cpu();

If it doesn't matter, how about this then?

---
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..e58c693a9fce 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	void **input_arg;
 	struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
 	hv_get_vp_index(msr_vp_index);
 
-	hv_vp_index[smp_processor_id()] = msr_vp_index;
+	hv_vp_index[cpu] = msr_vp_index;
 
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
@@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	struct hv_reenlightenment_control re_ctrl = {
 		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
 		.enabled = 1,
-		.target_vp = hv_vp_index[smp_processor_id()]
+		.target_vp = hv_vp_index[raw_smp_processor_id()]
 	};
 	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
 

^ permalink raw reply related

* Re: [PATCH] x86/hyperv: Disable preemption while setting reenlightenment vector
From: Vitaly Kuznetsov @ 2019-06-14 10:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dmitry Safonov, linux-kernel, Prasanna Panchamukhi,
	Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
	H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
	Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
	Radim Krčmář, Roman Kagan, Sasha Levin,
	Stephen Hemminger, Thomas Gleixner, devel, kvm, linux-hyperv, x86
In-Reply-To: <20190614082807.GV3436@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  

Yes, this should do, thanks! I'd also suggest to leave a comment like
	/* 
         * This function can get preemted and migrate to a different CPU
	 * but this doesn't matter. We just need to assign
	 * reenlightenment notification to some online CPU. In case this
         * CPU goes offline, hv_cpu_die() will re-assign it to some
 	 * other online CPU.
	 */
  
-- 
Vitaly

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox