Linux-HyperV List

Linux-HyperV List
 help / color / mirror / Atom feed

* [PATCH v1] Tools: hv: move to tools buildsystem
From: Andy Shevchenko @ 2019-06-28 17:05 UTC (permalink / raw)
  To: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, linux-hyperv,
	Sasha Levin
  Cc: Andy Shevchenko

There is a nice buildsystem dedicated for userspace tools in Linux kernel tree.
Switch gpio target to be built by it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 tools/hv/Build    |  3 +++
 tools/hv/Makefile | 51 +++++++++++++++++++++++++++++++++++++----------
 2 files changed, 44 insertions(+), 10 deletions(-)
 create mode 100644 tools/hv/Build

diff --git a/tools/hv/Build b/tools/hv/Build
new file mode 100644
index 000000000000..6cf51fa4b306
--- /dev/null
+++ b/tools/hv/Build
@@ -0,0 +1,3 @@
+hv_kvp_daemon-y += hv_kvp_daemon.o
+hv_vss_daemon-y += hv_vss_daemon.o
+hv_fcopy_daemon-y += hv_fcopy_daemon.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index 5db5e62cebda..b57143d9459c 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -1,28 +1,55 @@
 # SPDX-License-Identifier: GPL-2.0
 # Makefile for Hyper-V tools
-
-WARNINGS = -Wall -Wextra
-CFLAGS = $(WARNINGS) -g $(shell getconf LFS_CFLAGS)
-
-CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
+include ../scripts/Makefile.include
 
 sbindir ?= /usr/sbin
 libexecdir ?= /usr/libexec
 sharedstatedir ?= /var/lib
 
-ALL_PROGRAMS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(CURDIR)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+endif
+
+# Do not use make's built-in rules
+# (this improves performance and avoids hard-to-debug behaviour);
+MAKEFLAGS += -r
+
+override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
+
+ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
 
 ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
 
 all: $(ALL_PROGRAMS)
 
-%: %.c
-	$(CC) $(CFLAGS) -o $@ $^
+export srctree OUTPUT CC LD CFLAGS
+include $(srctree)/tools/build/Makefile.include
+
+HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
+$(HV_KVP_DAEMON_IN): FORCE
+	$(Q)$(MAKE) $(build)=hv_kvp_daemon
+$(OUTPUT)hv_kvp_daemon: $(HV_KVP_DAEMON_IN)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+
+HV_VSS_DAEMON_IN := $(OUTPUT)hv_vss_daemon-in.o
+$(HV_VSS_DAEMON_IN): FORCE
+	$(Q)$(MAKE) $(build)=hv_vss_daemon
+$(OUTPUT)hv_vss_daemon: $(HV_VSS_DAEMON_IN)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
+
+HV_FCOPY_DAEMON_IN := $(OUTPUT)hv_fcopy_daemon-in.o
+$(HV_FCOPY_DAEMON_IN): FORCE
+	$(Q)$(MAKE) $(build)=hv_fcopy_daemon
+$(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
+	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
 
 clean:
-	$(RM) hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+	rm -f $(ALL_PROGRAMS)
+	find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
 
-install: all
+install: $(ALL_PROGRAMS)
 	install -d -m 755 $(DESTDIR)$(sbindir); \
 	install -d -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd; \
 	install -d -m 755 $(DESTDIR)$(sharedstatedir); \
@@ -33,3 +60,7 @@ install: all
 	for script in $(ALL_SCRIPTS); do \
 		install $$script -m 755 $(DESTDIR)$(libexecdir)/hypervkvpd/$${script%.sh}; \
 	done
+
+FORCE:
+
+.PHONY: all install clean FORCE prepare
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH v2 6/7] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-06-28  0:53 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, tglx, mingo, bp, hpa, boris.ostrovsky, jgross,
	sstabellini, peterz, srinivas.eeda, Waiman Long, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Ingo Molnar, linux-hyperv
In-Reply-To: <20190627222851.GC11506@sasha-vm>


On 2019/6/28 6:28, Sasha Levin wrote:
> On Mon, Jun 24, 2019 at 08:02:58PM +0800, Zhenzhong Duan wrote:
>> With the boot parameter "hv_nopvspin" specified a Hyperv guest should
>> not make use of paravirt spinlocks, but behave as if running on bare
>> metal. This is not true, however, as the qspinlock code will fall back
>> to a test-and-set scheme when it is detecting a hypervisor.
>>
>> In order to avoid this disable the virt_spin_lock_key.
>>
>> Same change for XEN is already in Commit e6fd28eb3522
>> ("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>> Cc: Waiman Long <longman@redhat.com>
>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: linux-hyperv@vger.kernel.org
>> ---
>> arch/x86/hyperv/hv_spinlock.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/arch/x86/hyperv/hv_spinlock.c 
>> b/arch/x86/hyperv/hv_spinlock.c
>> index 07f21a0..d90b4b0 100644
>> --- a/arch/x86/hyperv/hv_spinlock.c
>> +++ b/arch/x86/hyperv/hv_spinlock.c
>> @@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
>>
>> void __init hv_init_spinlocks(void)
>> {
>> +    if (unlikely(!hv_pvspin))
>> +        static_branch_disable(&virt_spin_lock_key);
>
> This should be combined in the conditional under it, which already
> attempts to disable PV spinlocks, note how hv_pvspin is checked there.
> hc_pvspin isn't the only reason we would disable PV spinlocks on hyperv.

In virt_spin_lock() there is a comment as below. The test-and-set spinlock

is an optimization to hypervisor platform when PV spinlock is unsupported.

         /*
          * On hypervisors without PARAVIRT_SPINLOCKS support we fall
          * back to a Test-and-Set spinlock, because fair locks have
          * horrible lock 'holder' preemption issues.
          */


So my understanding is:

If hv_pvspin=0 by command line, we want to behave as if running on bare 
metal(the fair locks path).

Though there is performance regression, but it's not that important when 
we use hv_pvspin=0.

If PV spinlock is disabled by other reasons, we prefer the optimization 
path.

>
> Also, there's no need for the unlikely() here, it's only getting called
> once...

Ok, I'll removed it.


Thanks

Zhenzhong


^ permalink raw reply

* Re: [PATCHv2] x86/hyperv: Hold cpus_read_lock() on assigning reenlightenment vector
From: Thomas Gleixner @ 2019-06-27 22:39 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Dmitry Safonov, Prasanna Panchamukhi,
	Andy Lutomirski, Borislav Petkov, Cathy Avery, Haiyang Zhang,
	H. Peter Anvin, Ingo Molnar, K. Y. Srinivasan,
	Michael Kelley (EOSG), Mohammed Gamal, Paolo Bonzini,
	Peter Zijlstra, Radim Krčmář, Roman Kagan,
	Sasha Levin, Stephen Hemminger, Vitaly Kuznetsov, devel, kvm,
	linux-hyperv, x86
In-Reply-To: <20190617163955.25659-1-dima@arista.com>

On Mon, 17 Jun 2019, Dmitry Safonov wrote:
> @@ -196,7 +196,16 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	/*
> +	 * As reenlightenment vector is global, there is no difference which
> +	 * CPU will register MSR, though it should be an online CPU.
> +	 * hv_cpu_die() callback guarantees that on CPU teardown
> +	 * another CPU will re-register MSR back.
> +	 */
> +	cpus_read_lock();
> +	re_ctrl.target_vp = hv_vp_index[raw_smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	cpus_read_unlock();

Should work

>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
> @@ -239,6 +248,7 @@ static int hv_cpu_die(unsigned int cpu)
>  
>  	rdmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>  	if (re_ctrl.target_vp == hv_vp_index[cpu]) {
> +		lockdep_assert_cpus_held();

So you're not trusting the hotplug core code to hold the lock when it
brings a CPU down and invokes that callback? Come on

>  		/* Reassign to some other online CPU */
>  		new_cpu = cpumask_any_but(cpu_online_mask, cpu);

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v2 6/7] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Sasha Levin @ 2019-06-27 22:28 UTC (permalink / raw)
  To: Zhenzhong Duan
  Cc: linux-kernel, tglx, mingo, bp, hpa, boris.ostrovsky, jgross,
	sstabellini, peterz, srinivas.eeda, Waiman Long, K. Y. Srinivasan,
	Haiyang Zhang, Stephen Hemminger, Ingo Molnar, linux-hyperv
In-Reply-To: <1561377779-28036-7-git-send-email-zhenzhong.duan@oracle.com>

On Mon, Jun 24, 2019 at 08:02:58PM +0800, Zhenzhong Duan wrote:
>With the boot parameter "hv_nopvspin" specified a Hyperv guest should
>not make use of paravirt spinlocks, but behave as if running on bare
>metal. This is not true, however, as the qspinlock code will fall back
>to a test-and-set scheme when it is detecting a hypervisor.
>
>In order to avoid this disable the virt_spin_lock_key.
>
>Same change for XEN is already in Commit e6fd28eb3522
>("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")
>
>Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
>Cc: Waiman Long <longman@redhat.com>
>Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>Cc: Haiyang Zhang <haiyangz@microsoft.com>
>Cc: Stephen Hemminger <sthemmin@microsoft.com>
>Cc: Sasha Levin <sashal@kernel.org>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: Borislav Petkov <bp@alien8.de>
>Cc: linux-hyperv@vger.kernel.org
>---
> arch/x86/hyperv/hv_spinlock.c | 3 +++
> 1 file changed, 3 insertions(+)
>
>diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
>index 07f21a0..d90b4b0 100644
>--- a/arch/x86/hyperv/hv_spinlock.c
>+++ b/arch/x86/hyperv/hv_spinlock.c
>@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
>
> void __init hv_init_spinlocks(void)
> {
>+	if (unlikely(!hv_pvspin))
>+		static_branch_disable(&virt_spin_lock_key);

This should be combined in the conditional under it, which already
attempts to disable PV spinlocks, note how hv_pvspin is checked there.
hc_pvspin isn't the only reason we would disable PV spinlocks on hyperv.

Also, there's no need for the unlikely() here, it's only getting called
once...

--
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v3 2/5] x86: hv: hv_init.c: Add functions to allocate/deallocate page for Hyper-V
From: Thomas Gleixner @ 2019-06-27 21:38 UTC (permalink / raw)
  To: Maya Nakamura
  Cc: mikelley, kys, haiyangz, sthemmin, sashal, x86, linux-hyperv,
	linux-kernel
In-Reply-To: <d19c28cda88bf1706baff883380dfd321da30a68.1560837096.git.m.maya.nakamura@gmail.com>

Maya,

On Tue, 18 Jun 2019, Maya Nakamura wrote:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0e033ef11a9f..e8960a83add7 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,6 +37,20 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  
> +void *hv_alloc_hyperv_page(void)
> +{
> +	BUILD_BUG_ON(PAGE_SIZE != HV_HYP_PAGE_SIZE);
> +
> +	return (void *)__get_free_page(GFP_KERNEL);
> +}
> +EXPORT_SYMBOL_GPL(hv_alloc_hyperv_page);
> +
> +void hv_free_hyperv_page(unsigned long addr)
> +{
> +	free_page(addr);
> +}
> +EXPORT_SYMBOL_GPL(hv_free_hyperv_page);

These functions need to be declared in a header file.

Thanks,

	tglx



^ permalink raw reply

* [PATCH RESEND] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-06-26  8:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Zhenzhong Duan, Waiman Long, Peter Zijlstra (Intel),
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, linux-hyperv

With the boot parameter "hv_nopvspin" specified a Hyperv guest should
not make use of paravirt spinlocks, but behave as if running on bare
metal. This is not true, however, as the qspinlock code will fall back
to a test-and-set scheme when it is detecting a hypervisor.

In order to avoid this disable the virt_spin_lock_key.

Same change for XEN is already in Commit e6fd28eb3522
("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-hyperv@vger.kernel.org
---
 arch/x86/hyperv/hv_spinlock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..d90b4b0 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (unlikely(!hv_pvspin))
+		static_branch_disable(&virt_spin_lock_key);
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Andy Lutomirski @ 2019-06-26  3:51 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Andy Lutomirski, Peter Zijlstra, LKML, Ingo Molnar,
	Borislav Petkov, X86 ML, Thomas Gleixner, Dave Hansen,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Juergen Gross, Paolo Bonzini, Boris Ostrovsky,
	linux-hyperv@vger.kernel.org, Linux Virtualization, kvm list,
	xen-devel
In-Reply-To: <28C3D489-54E4-4670-B726-21B09FA469EE@vmware.com>

On Tue, Jun 25, 2019 at 8:48 PM Nadav Amit <namit@vmware.com> wrote:
>
> > On Jun 25, 2019, at 8:36 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit <namit@vmware.com> wrote:
> >> To improve TLB shootdown performance, flush the remote and local TLBs
> >> concurrently. Introduce flush_tlb_multi() that does so. The current
> >> flush_tlb_others() interface is kept, since paravirtual interfaces need
> >> to be adapted first before it can be removed. This is left for future
> >> work. In such PV environments, TLB flushes are not performed, at this
> >> time, concurrently.
> >
> > Would it be straightforward to have a default PV flush_tlb_multi()
> > that uses flush_tlb_others() under the hood?
>
> I prefer not to have a default PV implementation that should anyhow go away.
>
> I can create unoptimized untested versions for Xen and Hyper-V, if you want.
>

I think I prefer that approach.  We should be able to get the
maintainers to test it.  I don't love having legacy paths in there,
ahem, UV.

^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-06-26  3:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Borislav Petkov, X86 ML,
	Thomas Gleixner, Dave Hansen, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, Juergen Gross, Paolo Bonzini,
	Boris Ostrovsky, linux-hyperv@vger.kernel.org,
	Linux Virtualization, kvm list, xen-devel
In-Reply-To: <CALCETrXyJ8y7PSqf+RmGKjM4VSLXmNEGi6K=Jzw4jmckRQECTg@mail.gmail.com>

> On Jun 25, 2019, at 8:36 PM, Andy Lutomirski <luto@kernel.org> wrote:
> 
> On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit <namit@vmware.com> wrote:
>> To improve TLB shootdown performance, flush the remote and local TLBs
>> concurrently. Introduce flush_tlb_multi() that does so. The current
>> flush_tlb_others() interface is kept, since paravirtual interfaces need
>> to be adapted first before it can be removed. This is left for future
>> work. In such PV environments, TLB flushes are not performed, at this
>> time, concurrently.
> 
> Would it be straightforward to have a default PV flush_tlb_multi()
> that uses flush_tlb_others() under the hood?

I prefer not to have a default PV implementation that should anyhow go away.

I can create unoptimized untested versions for Xen and Hyper-V, if you want.


^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Andy Lutomirski @ 2019-06-26  3:36 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Andy Lutomirski, LKML, Ingo Molnar,
	Borislav Petkov, X86 ML, Thomas Gleixner, Dave Hansen,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Juergen Gross, Paolo Bonzini, Boris Ostrovsky, linux-hyperv,
	Linux Virtualization, kvm list, xen-devel
In-Reply-To: <20190613064813.8102-5-namit@vmware.com>

On Wed, Jun 12, 2019 at 11:49 PM Nadav Amit <namit@vmware.com> wrote:
>
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. The current
> flush_tlb_others() interface is kept, since paravirtual interfaces need
> to be adapted first before it can be removed. This is left for future
> work. In such PV environments, TLB flushes are not performed, at this
> time, concurrently.

Would it be straightforward to have a default PV flush_tlb_multi()
that uses flush_tlb_others() under the hood?

^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-06-26  3:32 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Peter Zijlstra, Andy Lutomirski, LKML, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, Thomas Gleixner,
	Dave Hansen, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Juergen Gross, Paolo Bonzini, Boris Ostrovsky,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	xen-devel@lists.xenproject.org
In-Reply-To: <88a76cb8-2484-818a-2be6-d06a4ffef107@intel.com>

> On Jun 25, 2019, at 8:00 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 6/25/19 7:35 PM, Nadav Amit wrote:
>>>> const struct flush_tlb_info *f = info;
>>>> +	enum tlb_flush_reason reason;
>>>> +
>>>> +	reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
>>> 
>>> Should we just add the "reason" to flush_tlb_info?  It's OK-ish to imply
>>> it like this, but seems like it would be nicer and easier to track down
>>> the origins of these things if we did this at the caller.
>> 
>> I prefer not to. I want later to inline flush_tlb_info into the same
>> cacheline that holds call_function_data. Increasing the size of
>> flush_tlb_info for no good reason will not help…
> 
> Well, flush_tlb_info is at 6/8ths of a cacheline at the moment.
> call_function_data is 3/8ths.  To me, that means we have some slack in
> the size.

I do not understand your math.. :(

6 + 3 > 8 so putting both flush_tlb_info and call_function_data does not
leave us any slack (we can save one qword, so we can actually put them
at the same cacheline).

You can see my current implementation here:

https://lore.kernel.org/lkml/20190531063645.4697-4-namit@vmware.com/T/#m0ab5fe0799ba9ff0d41197f1095679fe26aebd57
https://lore.kernel.org/lkml/20190531063645.4697-4-namit@vmware.com/T/#m7b35a93dffd23fbb7ca813c795a0777d4cdcb51b


^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Dave Hansen @ 2019-06-26  3:00 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Peter Zijlstra, Andy Lutomirski, LKML, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, Thomas Gleixner,
	Dave Hansen, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Juergen Gross, Paolo Bonzini, Boris Ostrovsky,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	xen-devel@lists.xenproject.org
In-Reply-To: <1545B936-7CEC-4A1C-B776-74004F774218@vmware.com>

On 6/25/19 7:35 PM, Nadav Amit wrote:
>>> 	const struct flush_tlb_info *f = info;
>>> +	enum tlb_flush_reason reason;
>>> +
>>> +	reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
>>
>> Should we just add the "reason" to flush_tlb_info?  It's OK-ish to imply
>> it like this, but seems like it would be nicer and easier to track down
>> the origins of these things if we did this at the caller.
> 
> I prefer not to. I want later to inline flush_tlb_info into the same
> cacheline that holds call_function_data. Increasing the size of
> flush_tlb_info for no good reason will not help…

Well, flush_tlb_info is at 6/8ths of a cacheline at the moment.
call_function_data is 3/8ths.  To me, that means we have some slack in
the size.


^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Nadav Amit @ 2019-06-26  2:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Peter Zijlstra, Andy Lutomirski, LKML, Ingo Molnar,
	Borislav Petkov, the arch/x86 maintainers, Thomas Gleixner,
	Dave Hansen, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Juergen Gross, Paolo Bonzini, Boris Ostrovsky,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	xen-devel@lists.xenproject.org
In-Reply-To: <723d63ee-c8cb-14a1-0eb9-265e580360f4@intel.com>

> On Jun 25, 2019, at 2:29 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 6/12/19 11:48 PM, Nadav Amit wrote:
>> To improve TLB shootdown performance, flush the remote and local TLBs
>> concurrently. Introduce flush_tlb_multi() that does so. The current
>> flush_tlb_others() interface is kept, since paravirtual interfaces need
>> to be adapted first before it can be removed. This is left for future
>> work. In such PV environments, TLB flushes are not performed, at this
>> time, concurrently.
>> 
>> Add a static key to tell whether this new interface is supported.
>> 
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: x86@kernel.org
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: virtualization@lists.linux-foundation.org
>> Cc: kvm@vger.kernel.org
>> Cc: xen-devel@lists.xenproject.org
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>> arch/x86/hyperv/mmu.c                 |  2 +
>> arch/x86/include/asm/paravirt.h       |  8 +++
>> arch/x86/include/asm/paravirt_types.h |  6 +++
>> arch/x86/include/asm/tlbflush.h       |  6 +++
>> arch/x86/kernel/kvm.c                 |  1 +
>> arch/x86/kernel/paravirt.c            |  3 ++
>> arch/x86/mm/tlb.c                     | 71 ++++++++++++++++++++++-----
>> arch/x86/xen/mmu_pv.c                 |  2 +
>> 8 files changed, 87 insertions(+), 12 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
>> index e65d7fe6489f..ca28b400c87c 100644
>> --- a/arch/x86/hyperv/mmu.c
>> +++ b/arch/x86/hyperv/mmu.c
>> @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void)
>> 	pr_info("Using hypercall for remote TLB flush\n");
>> 	pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
>> 	pv_ops.mmu.tlb_remove_table = tlb_remove_table;
>> +
>> +	static_key_disable(&flush_tlb_multi_enabled.key);
>> }
>> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
>> index c25c38a05c1c..192be7254457 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -47,6 +47,8 @@ static inline void slow_down_io(void)
>> #endif
>> }
>> 
>> +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
>> +
>> static inline void __flush_tlb(void)
>> {
>> 	PVOP_VCALL0(mmu.flush_tlb_user);
>> @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr)
>> 	PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
>> }
>> 
>> +static inline void flush_tlb_multi(const struct cpumask *cpumask,
>> +				   const struct flush_tlb_info *info)
>> +{
>> +	PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
>> +}
>> +
>> static inline void flush_tlb_others(const struct cpumask *cpumask,
>> 				    const struct flush_tlb_info *info)
>> {
>> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
>> index 946f8f1f1efc..b93b3d90729a 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -211,6 +211,12 @@ struct pv_mmu_ops {
>> 	void (*flush_tlb_user)(void);
>> 	void (*flush_tlb_kernel)(void);
>> 	void (*flush_tlb_one_user)(unsigned long addr);
>> +	/*
>> +	 * flush_tlb_multi() is the preferred interface, which is capable to
>> +	 * flush both local and remote CPUs.
>> +	 */
>> +	void (*flush_tlb_multi)(const struct cpumask *cpus,
>> +				const struct flush_tlb_info *info);
>> 	void (*flush_tlb_others)(const struct cpumask *cpus,
>> 				 const struct flush_tlb_info *info);
>> 
>> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>> index dee375831962..79272938cf79 100644
>> --- a/arch/x86/include/asm/tlbflush.h
>> +++ b/arch/x86/include/asm/tlbflush.h
>> @@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
>> 	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
>> }
>> 
>> +void native_flush_tlb_multi(const struct cpumask *cpumask,
>> +			     const struct flush_tlb_info *info);
>> +
>> void native_flush_tlb_others(const struct cpumask *cpumask,
>> 			     const struct flush_tlb_info *info);
>> 
>> @@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>> extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>> 
>> #ifndef CONFIG_PARAVIRT
>> +#define flush_tlb_multi(mask, info)	\
>> +	native_flush_tlb_multi(mask, info)
>> +
>> #define flush_tlb_others(mask, info)	\
>> 	native_flush_tlb_others(mask, info)
>> 
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 5169b8cc35bb..00d81e898717 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -630,6 +630,7 @@ static void __init kvm_guest_init(void)
>> 	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
>> 		pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
>> 		pv_ops.mmu.tlb_remove_table = tlb_remove_table;
>> +		static_key_disable(&flush_tlb_multi_enabled.key);
>> 	}
>> 
>> 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
>> index 98039d7fb998..ac00afed5570 100644
>> --- a/arch/x86/kernel/paravirt.c
>> +++ b/arch/x86/kernel/paravirt.c
>> @@ -159,6 +159,8 @@ unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
>> 	return insn_len;
>> }
>> 
>> +DEFINE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
>> +
>> static void native_flush_tlb(void)
>> {
>> 	__native_flush_tlb();
>> @@ -363,6 +365,7 @@ struct paravirt_patch_template pv_ops = {
>> 	.mmu.flush_tlb_user	= native_flush_tlb,
>> 	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
>> 	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
>> +	.mmu.flush_tlb_multi	= native_flush_tlb_multi,
>> 	.mmu.flush_tlb_others	= native_flush_tlb_others,
>> 	.mmu.tlb_remove_table	=
>> 			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
>> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
>> index c34bcf03f06f..db73d5f1dd43 100644
>> --- a/arch/x86/mm/tlb.c
>> +++ b/arch/x86/mm/tlb.c
>> @@ -551,7 +551,7 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>> 		 * garbage into our TLB.  Since switching to init_mm is barely
>> 		 * slower than a minimal flush, just switch to init_mm.
>> 		 *
>> -		 * This should be rare, with native_flush_tlb_others skipping
>> +		 * This should be rare, with native_flush_tlb_multi skipping
>> 		 * IPIs to lazy TLB mode CPUs.
>> 		 */
> 
> Nit, since we're messing with this, it can now be
> "native_flush_tlb_multi()" since it is a function.

Sure.

> 
>> switch_mm_irqs_off(NULL, &init_mm, NULL);
>> @@ -635,9 +635,12 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>> 	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
>> }
>> 
>> -static void flush_tlb_func_local(const void *info, enum tlb_flush_reason reason)
>> +static void flush_tlb_func_local(void *info)
>> {
>> 	const struct flush_tlb_info *f = info;
>> +	enum tlb_flush_reason reason;
>> +
>> +	reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
> 
> Should we just add the "reason" to flush_tlb_info?  It's OK-ish to imply
> it like this, but seems like it would be nicer and easier to track down
> the origins of these things if we did this at the caller.

I prefer not to. I want later to inline flush_tlb_info into the same
cacheline that holds call_function_data. Increasing the size of
flush_tlb_info for no good reason will not help…

>> flush_tlb_func_common(f, true, reason);
>> }
>> @@ -655,14 +658,21 @@ static void flush_tlb_func_remote(void *info)
>> 	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
>> }
>> 
>> -static bool tlb_is_not_lazy(int cpu, void *data)
>> +static inline bool tlb_is_not_lazy(int cpu)
>> {
>> 	return !per_cpu(cpu_tlbstate.is_lazy, cpu);
>> }
> 
> Nit: the compiler will probably inline this sucker anyway.  So, for
> these kinds of patches, I'd resist the urge to do these kinds of tweaks,
> especially since it starts to hide the important change on the line.

Of course.

> 
>> -void native_flush_tlb_others(const struct cpumask *cpumask,
>> -			     const struct flush_tlb_info *info)
>> +static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
>> +
>> +void native_flush_tlb_multi(const struct cpumask *cpumask,
>> +			    const struct flush_tlb_info *info)
>> {
>> +	/*
>> +	 * Do accounting and tracing. Note that there are (and have always been)
>> +	 * cases in which a remote TLB flush will be traced, but eventually
>> +	 * would not happen.
>> +	 */
>> 	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
>> 	if (info->end == TLB_FLUSH_ALL)
>> 		trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
>> @@ -682,10 +692,14 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>> 		 * means that the percpu tlb_gen variables won't be updated
>> 		 * and we'll do pointless flushes on future context switches.
>> 		 *
>> -		 * Rather than hooking native_flush_tlb_others() here, I think
>> +		 * Rather than hooking native_flush_tlb_multi() here, I think
>> 		 * that UV should be updated so that smp_call_function_many(),
>> 		 * etc, are optimal on UV.
>> 		 */
>> +		local_irq_disable();
>> +		flush_tlb_func_local((__force void *)info);
>> +		local_irq_enable();
>> +
>> 		cpumask = uv_flush_tlb_others(cpumask, info);
>> 		if (cpumask)
>> 			smp_call_function_many(cpumask, flush_tlb_func_remote,
>> @@ -704,11 +718,39 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>> 	 * doing a speculative memory access.
>> 	 */
>> 	if (info->freed_tables)
>> -		smp_call_function_many(cpumask, flush_tlb_func_remote,
>> -			       (void *)info, 1);
>> -	else
>> -		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func_remote,
>> -				(void *)info, 1, GFP_ATOMIC, cpumask);
>> +		__smp_call_function_many(cpumask, flush_tlb_func_remote,
>> +					 flush_tlb_func_local, (void *)info, 1);
>> +	else {
> 
> I prefer brackets be added for 'if' blocks like this since it doesn't
> take up any meaningful space and makes it less prone to compile errors.

If you say so.

> 
>> +		/*
>> +		 * Although we could have used on_each_cpu_cond_mask(),
>> +		 * open-coding it has several performance advantages: (1) we can
>> +		 * use specialized functions for remote and local flushes; (2)
>> +		 * no need for indirect branch to test if TLB is lazy; (3) we
>> +		 * can use a designated cpumask for evaluating the condition
>> +		 * instead of allocating a new one.
>> +		 *
>> +		 * This works under the assumption that there are no nested TLB
>> +		 * flushes, an assumption that is already made in
>> +		 * flush_tlb_mm_range().
>> +		 */
>> +		struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);
> 
> This is logically a stack-local variable, right?  But, since we've got
> preempt off and cpumasks can be huge, we don't want to allocate it on
> the stack.  That might be worth a comment somewhere.

I will add a comment here.

> 
>> +		int cpu;
>> +
>> +		cpumask_clear(cond_cpumask);
>> +
>> +		for_each_cpu(cpu, cpumask) {
>> +			if (tlb_is_not_lazy(cpu))
>> +				__cpumask_set_cpu(cpu, cond_cpumask);
>> +		}
> 
> FWIW, it's probably worth calling out in the changelog that this loop
> exists in on_each_cpu_cond_mask() too.  It looks bad here, but it's no
> worse than what it replaces.

Added.

> 
>> +		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
>> +					 flush_tlb_func_local, (void *)info, 1);
>> +	}
>> +}
> 
> There was a __force on an earlier 'info' cast.  Could you talk about
> that for a minute an explain why that one is needed?

I have no idea where the __force came from. I’ll remove it.

> 
>> +void native_flush_tlb_others(const struct cpumask *cpumask,
>> +			     const struct flush_tlb_info *info)
>> +{
>> +	native_flush_tlb_multi(cpumask, info);
>> }
>> 
>> /*
>> @@ -774,10 +816,15 @@ static void flush_tlb_on_cpus(const cpumask_t *cpumask,
>> {
>> 	int this_cpu = smp_processor_id();
>> 
>> +	if (static_branch_likely(&flush_tlb_multi_enabled)) {
>> +		flush_tlb_multi(cpumask, info);
>> +		return;
>> +	}
> 
> Probably needs a comment for posterity above the if()^^:
> 
> 	/* Use the optimized flush_tlb_multi() where we can. */

Right.

> 
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -2474,6 +2474,8 @@ void __init xen_init_mmu_ops(void)
>> 
>> 	pv_ops.mmu = xen_mmu_ops;
>> 
>> +	static_key_disable(&flush_tlb_multi_enabled.key);
>> +
>> 	memset(dummy_mapping, 0xff, PAGE_SIZE);
>> }
> 
> More comments, please.  Perhaps:
> 
> 	Existing paravirt TLB flushes are incompatible with
> 	flush_tlb_multi() because....  Disable it when they are
> 	in use.

There is no inherent reason for them to be incompatible. Someone needs to
adapt them. I will use my affiliation as an excuse for the question “why
don’t you do it?” ;-)

Anyhow, I will add a comment.


^ permalink raw reply

* Re: [PATCH 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
From: Dave Hansen @ 2019-06-25 21:29 UTC (permalink / raw)
  To: Nadav Amit, Peter Zijlstra, Andy Lutomirski
  Cc: linux-kernel, Ingo Molnar, Borislav Petkov, x86, Thomas Gleixner,
	Dave Hansen, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Sasha Levin, Juergen Gross, Paolo Bonzini, Boris Ostrovsky,
	linux-hyperv, virtualization, kvm, xen-devel
In-Reply-To: <20190613064813.8102-5-namit@vmware.com>

On 6/12/19 11:48 PM, Nadav Amit wrote:
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. The current
> flush_tlb_others() interface is kept, since paravirtual interfaces need
> to be adapted first before it can be removed. This is left for future
> work. In such PV environments, TLB flushes are not performed, at this
> time, concurrently.
> 
> Add a static key to tell whether this new interface is supported.
> 
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: x86@kernel.org
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: linux-hyperv@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: kvm@vger.kernel.org
> Cc: xen-devel@lists.xenproject.org
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  arch/x86/hyperv/mmu.c                 |  2 +
>  arch/x86/include/asm/paravirt.h       |  8 +++
>  arch/x86/include/asm/paravirt_types.h |  6 +++
>  arch/x86/include/asm/tlbflush.h       |  6 +++
>  arch/x86/kernel/kvm.c                 |  1 +
>  arch/x86/kernel/paravirt.c            |  3 ++
>  arch/x86/mm/tlb.c                     | 71 ++++++++++++++++++++++-----
>  arch/x86/xen/mmu_pv.c                 |  2 +
>  8 files changed, 87 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> index e65d7fe6489f..ca28b400c87c 100644
> --- a/arch/x86/hyperv/mmu.c
> +++ b/arch/x86/hyperv/mmu.c
> @@ -233,4 +233,6 @@ void hyperv_setup_mmu_ops(void)
>  	pr_info("Using hypercall for remote TLB flush\n");
>  	pv_ops.mmu.flush_tlb_others = hyperv_flush_tlb_others;
>  	pv_ops.mmu.tlb_remove_table = tlb_remove_table;
> +
> +	static_key_disable(&flush_tlb_multi_enabled.key);
>  }
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index c25c38a05c1c..192be7254457 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -47,6 +47,8 @@ static inline void slow_down_io(void)
>  #endif
>  }
>  
> +DECLARE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
> +
>  static inline void __flush_tlb(void)
>  {
>  	PVOP_VCALL0(mmu.flush_tlb_user);
> @@ -62,6 +64,12 @@ static inline void __flush_tlb_one_user(unsigned long addr)
>  	PVOP_VCALL1(mmu.flush_tlb_one_user, addr);
>  }
>  
> +static inline void flush_tlb_multi(const struct cpumask *cpumask,
> +				   const struct flush_tlb_info *info)
> +{
> +	PVOP_VCALL2(mmu.flush_tlb_multi, cpumask, info);
> +}
> +
>  static inline void flush_tlb_others(const struct cpumask *cpumask,
>  				    const struct flush_tlb_info *info)
>  {
> diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
> index 946f8f1f1efc..b93b3d90729a 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -211,6 +211,12 @@ struct pv_mmu_ops {
>  	void (*flush_tlb_user)(void);
>  	void (*flush_tlb_kernel)(void);
>  	void (*flush_tlb_one_user)(unsigned long addr);
> +	/*
> +	 * flush_tlb_multi() is the preferred interface, which is capable to
> +	 * flush both local and remote CPUs.
> +	 */
> +	void (*flush_tlb_multi)(const struct cpumask *cpus,
> +				const struct flush_tlb_info *info);
>  	void (*flush_tlb_others)(const struct cpumask *cpus,
>  				 const struct flush_tlb_info *info);
>  
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index dee375831962..79272938cf79 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -569,6 +569,9 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a)
>  	flush_tlb_mm_range(vma->vm_mm, a, a + PAGE_SIZE, PAGE_SHIFT, false);
>  }
>  
> +void native_flush_tlb_multi(const struct cpumask *cpumask,
> +			     const struct flush_tlb_info *info);
> +
>  void native_flush_tlb_others(const struct cpumask *cpumask,
>  			     const struct flush_tlb_info *info);
>  
> @@ -593,6 +596,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>  extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>  
>  #ifndef CONFIG_PARAVIRT
> +#define flush_tlb_multi(mask, info)	\
> +	native_flush_tlb_multi(mask, info)
> +
>  #define flush_tlb_others(mask, info)	\
>  	native_flush_tlb_others(mask, info)
>  
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 5169b8cc35bb..00d81e898717 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -630,6 +630,7 @@ static void __init kvm_guest_init(void)
>  	    kvm_para_has_feature(KVM_FEATURE_STEAL_TIME)) {
>  		pv_ops.mmu.flush_tlb_others = kvm_flush_tlb_others;
>  		pv_ops.mmu.tlb_remove_table = tlb_remove_table;
> +		static_key_disable(&flush_tlb_multi_enabled.key);
>  	}
>  
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
> diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
> index 98039d7fb998..ac00afed5570 100644
> --- a/arch/x86/kernel/paravirt.c
> +++ b/arch/x86/kernel/paravirt.c
> @@ -159,6 +159,8 @@ unsigned paravirt_patch_insns(void *insn_buff, unsigned len,
>  	return insn_len;
>  }
>  
> +DEFINE_STATIC_KEY_TRUE(flush_tlb_multi_enabled);
> +
>  static void native_flush_tlb(void)
>  {
>  	__native_flush_tlb();
> @@ -363,6 +365,7 @@ struct paravirt_patch_template pv_ops = {
>  	.mmu.flush_tlb_user	= native_flush_tlb,
>  	.mmu.flush_tlb_kernel	= native_flush_tlb_global,
>  	.mmu.flush_tlb_one_user	= native_flush_tlb_one_user,
> +	.mmu.flush_tlb_multi	= native_flush_tlb_multi,
>  	.mmu.flush_tlb_others	= native_flush_tlb_others,
>  	.mmu.tlb_remove_table	=
>  			(void (*)(struct mmu_gather *, void *))tlb_remove_page,
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index c34bcf03f06f..db73d5f1dd43 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -551,7 +551,7 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>  		 * garbage into our TLB.  Since switching to init_mm is barely
>  		 * slower than a minimal flush, just switch to init_mm.
>  		 *
> -		 * This should be rare, with native_flush_tlb_others skipping
> +		 * This should be rare, with native_flush_tlb_multi skipping
>  		 * IPIs to lazy TLB mode CPUs.
>  		 */

Nit, since we're messing with this, it can now be
"native_flush_tlb_multi()" since it is a function.

>  		switch_mm_irqs_off(NULL, &init_mm, NULL);
> @@ -635,9 +635,12 @@ static void flush_tlb_func_common(const struct flush_tlb_info *f,
>  	this_cpu_write(cpu_tlbstate.ctxs[loaded_mm_asid].tlb_gen, mm_tlb_gen);
>  }
>  
> -static void flush_tlb_func_local(const void *info, enum tlb_flush_reason reason)
> +static void flush_tlb_func_local(void *info)
>  {
>  	const struct flush_tlb_info *f = info;
> +	enum tlb_flush_reason reason;
> +
> +	reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;

Should we just add the "reason" to flush_tlb_info?  It's OK-ish to imply
it like this, but seems like it would be nicer and easier to track down
the origins of these things if we did this at the caller.

>  	flush_tlb_func_common(f, true, reason);
>  }
> @@ -655,14 +658,21 @@ static void flush_tlb_func_remote(void *info)
>  	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
>  }
>  
> -static bool tlb_is_not_lazy(int cpu, void *data)
> +static inline bool tlb_is_not_lazy(int cpu)
>  {
>  	return !per_cpu(cpu_tlbstate.is_lazy, cpu);
>  }

Nit: the compiler will probably inline this sucker anyway.  So, for
these kinds of patches, I'd resist the urge to do these kinds of tweaks,
especially since it starts to hide the important change on the line.

> -void native_flush_tlb_others(const struct cpumask *cpumask,
> -			     const struct flush_tlb_info *info)
> +static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask);
> +
> +void native_flush_tlb_multi(const struct cpumask *cpumask,
> +			    const struct flush_tlb_info *info)
>  {
> +	/*
> +	 * Do accounting and tracing. Note that there are (and have always been)
> +	 * cases in which a remote TLB flush will be traced, but eventually
> +	 * would not happen.
> +	 */
>  	count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
>  	if (info->end == TLB_FLUSH_ALL)
>  		trace_tlb_flush(TLB_REMOTE_SEND_IPI, TLB_FLUSH_ALL);
> @@ -682,10 +692,14 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  		 * means that the percpu tlb_gen variables won't be updated
>  		 * and we'll do pointless flushes on future context switches.
>  		 *
> -		 * Rather than hooking native_flush_tlb_others() here, I think
> +		 * Rather than hooking native_flush_tlb_multi() here, I think
>  		 * that UV should be updated so that smp_call_function_many(),
>  		 * etc, are optimal on UV.
>  		 */
> +		local_irq_disable();
> +		flush_tlb_func_local((__force void *)info);
> +		local_irq_enable();
> +
>  		cpumask = uv_flush_tlb_others(cpumask, info);
>  		if (cpumask)
>  			smp_call_function_many(cpumask, flush_tlb_func_remote,
> @@ -704,11 +718,39 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  	 * doing a speculative memory access.
>  	 */
>  	if (info->freed_tables)
> -		smp_call_function_many(cpumask, flush_tlb_func_remote,
> -			       (void *)info, 1);
> -	else
> -		on_each_cpu_cond_mask(tlb_is_not_lazy, flush_tlb_func_remote,
> -				(void *)info, 1, GFP_ATOMIC, cpumask);
> +		__smp_call_function_many(cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local, (void *)info, 1);
> +	else {

I prefer brackets be added for 'if' blocks like this since it doesn't
take up any meaningful space and makes it less prone to compile errors.

> +		/*
> +		 * Although we could have used on_each_cpu_cond_mask(),
> +		 * open-coding it has several performance advantages: (1) we can
> +		 * use specialized functions for remote and local flushes; (2)
> +		 * no need for indirect branch to test if TLB is lazy; (3) we
> +		 * can use a designated cpumask for evaluating the condition
> +		 * instead of allocating a new one.
> +		 *
> +		 * This works under the assumption that there are no nested TLB
> +		 * flushes, an assumption that is already made in
> +		 * flush_tlb_mm_range().
> +		 */
> +		struct cpumask *cond_cpumask = this_cpu_ptr(&flush_tlb_mask);

This is logically a stack-local variable, right?  But, since we've got
preempt off and cpumasks can be huge, we don't want to allocate it on
the stack.  That might be worth a comment somewhere.

> +		int cpu;
> +
> +		cpumask_clear(cond_cpumask);
> +
> +		for_each_cpu(cpu, cpumask) {
> +			if (tlb_is_not_lazy(cpu))
> +				__cpumask_set_cpu(cpu, cond_cpumask);
> +		}

FWIW, it's probably worth calling out in the changelog that this loop
exists in on_each_cpu_cond_mask() too.  It looks bad here, but it's no
worse than what it replaces.

> +		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local, (void *)info, 1);
> +	}
> +}

There was a __force on an earlier 'info' cast.  Could you talk about
that for a minute an explain why that one is needed?

> +void native_flush_tlb_others(const struct cpumask *cpumask,
> +			     const struct flush_tlb_info *info)
> +{
> +	native_flush_tlb_multi(cpumask, info);
>  }
>  
>  /*
> @@ -774,10 +816,15 @@ static void flush_tlb_on_cpus(const cpumask_t *cpumask,
>  {
>  	int this_cpu = smp_processor_id();
>  
> +	if (static_branch_likely(&flush_tlb_multi_enabled)) {
> +		flush_tlb_multi(cpumask, info);
> +		return;
> +	}

Probably needs a comment for posterity above the if()^^:

	/* Use the optimized flush_tlb_multi() where we can. */

> --- a/arch/x86/xen/mmu_pv.c
> +++ b/arch/x86/xen/mmu_pv.c
> @@ -2474,6 +2474,8 @@ void __init xen_init_mmu_ops(void)
>  
>  	pv_ops.mmu = xen_mmu_ops;
>  
> +	static_key_disable(&flush_tlb_multi_enabled.key);
> +
>  	memset(dummy_mapping, 0xff, PAGE_SIZE);
>  }

More comments, please.  Perhaps:

	Existing paravirt TLB flushes are incompatible with
	flush_tlb_multi() because....  Disable it when they are
	in use.

^ permalink raw reply

* [PATCH v2 6/7] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-06-24 12:02 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, boris.ostrovsky, jgross, sstabellini,
	peterz, srinivas.eeda, Zhenzhong Duan, Waiman Long,
	K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Sasha Levin,
	Ingo Molnar, linux-hyperv
In-Reply-To: <1561377779-28036-1-git-send-email-zhenzhong.duan@oracle.com>

With the boot parameter "hv_nopvspin" specified a Hyperv guest should
not make use of paravirt spinlocks, but behave as if running on bare
metal. This is not true, however, as the qspinlock code will fall back
to a test-and-set scheme when it is detecting a hypervisor.

In order to avoid this disable the virt_spin_lock_key.

Same change for XEN is already in Commit e6fd28eb3522
("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: linux-hyperv@vger.kernel.org
---
 arch/x86/hyperv/hv_spinlock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..d90b4b0 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (unlikely(!hv_pvspin))
+		static_branch_disable(&virt_spin_lock_key);
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH 6/8] IB/srp: set virt_boundary_mask in the scsi host
From: Max Gurtovoy @ 2019-06-24 22:33 UTC (permalink / raw)
  To: Christoph Hellwig, Martin K . Petersen
  Cc: Sagi Grimberg, Bart Van Assche, linux-rdma, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
	linux-kernel
In-Reply-To: <20190617122000.22181-7-hch@lst.de>


On 6/17/2019 3:19 PM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>


^ permalink raw reply

* Re: [PATCH 5/8] IB/iser: set virt_boundary_mask in the scsi host
From: Max Gurtovoy @ 2019-06-24 22:32 UTC (permalink / raw)
  To: Christoph Hellwig, Martin K . Petersen
  Cc: Sagi Grimberg, Bart Van Assche, linux-rdma, linux-scsi,
	megaraidlinux.pdl, MPT-FusionLinux.pdl, linux-hyperv,
	linux-kernel
In-Reply-To: <20190617122000.22181-6-hch@lst.de>

On 6/17/2019 3:19 PM, Christoph Hellwig wrote:
> This ensures all proper DMA layer handling is taken care of by the
> SCSI midlayer.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>


Looks good,

Reviewed-by: Max Gurtovoy <maxg@mellanox.com>


^ permalink raw reply

* [PATCH 5/6] locking/spinlocks, paravirt, hyperv: Correct the hv_nopvspin case
From: Zhenzhong Duan @ 2019-06-23 13:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: tglx, mingo, bp, hpa, boris.ostrovsky, jgross, sstabellini,
	Zhenzhong Duan, K. Y. Srinivasan, Haiyang Zhang,
	Stephen Hemminger, Sasha Levin, linux-hyperv
In-Reply-To: <1561294903-6166-1-git-send-email-zhenzhong.duan@oracle.com>

With the boot parameter "hv_nopvspin" specified a Hyperv guest should
not make use of paravirt spinlocks, but behave as if running on bare
metal. This is not true, however, as the qspinlock code will fall back
to a test-and-set scheme when it is detecting a hypervisor.

In order to avoid this disable the virt_spin_lock_key.

Same change for XEN is already in Commit e6fd28eb3522
("locking/spinlocks, paravirt, xen: Correct the xen_nopvspin case")

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: linux-hyperv@vger.kernel.org
---
 arch/x86/hyperv/hv_spinlock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/hyperv/hv_spinlock.c b/arch/x86/hyperv/hv_spinlock.c
index 07f21a0..210495b 100644
--- a/arch/x86/hyperv/hv_spinlock.c
+++ b/arch/x86/hyperv/hv_spinlock.c
@@ -64,6 +64,9 @@ __visible bool hv_vcpu_is_preempted(int vcpu)
 
 void __init hv_init_spinlocks(void)
 {
+	if (!hv_pvspin)
+		static_branch_disable(&virt_spin_lock_key);
+
 	if (!hv_pvspin || !apic ||
 	    !(ms_hyperv.hints & HV_X64_CLUSTER_IPI_RECOMMENDED) ||
 	    !(ms_hyperv.features & HV_X64_MSR_GUEST_IDLE_AVAILABLE)) {
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Thomas Gleixner @ 2019-06-23 22:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Michael Kelley, Vincenzo Frascino, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH, Stephen Rothwell
In-Reply-To: <20190623190929.GL2226@sasha-vm>

Sasha,

On Sun, 23 Jun 2019, Sasha Levin wrote:
> On Sat, Jun 22, 2019 at 04:46:28PM +0200, Thomas Gleixner wrote:
> > Folks, please stop chosing Cc lists as you like. We have well established
> > rules for that. And please stop queueing random unreviewed patches in
> > next. Next is not a playground for not ready and unreviewed stuff. No, the
> > hyper-v inbreed Reviewed-by is not sufficient for anything x86 and
> > clocksource related.
> 
> I'm sorry for this, you were supposed to be Cc'ed on these patches and I
> see that you were not.

All good. I've vented steam and am back to normal pressure :)

> > After chasing and looking at those patches, which have horrible subject
> > lines and changelogs btw, I was not able to judge quickly whether that
> > stuff is self contained or not. So no, I fixed up the fallout and rebased
> > Vincenzos VDSO stuff on mainline w/o those hyperv changes simply because if
> > they are not self contained they will break bisection badly.
> > 
> > I'm going to push out the VDSO series later today. That will nicely break

Not yet, but soon :)

> > in combination with the hyper-next branch. Stephen, please drop that and do
> > not try to handle the fallout. That stuff needs to go through the proper
> > channels or at least be acked/reviewed by the relevant maintainers. So the
> > hyper-v folks can rebase themself and post it proper.
> 
> Okay, thank you. We'll rebase and resend.

I have no objections if you collect hyper-v stuff, quite the contrary, but
changes which touch other subsystems need to be coordinated upfront. That's
all I'm asking for.

Btw, that clocksource stuff looks good code wise, just the change logs need
some care and after the VDSO stuff hits next we need to sort out the
logistics. I hope these changes are completely self contained. If not we'll
find a solution.

Thanks,

	tglx

^ permalink raw reply

* RE: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Thomas Gleixner @ 2019-06-24  0:25 UTC (permalink / raw)
  To: Michael Kelley
  Cc: Sasha Levin, Vincenzo Frascino, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies,
	linux-hyperv@vger.kernel.org, Greg KH, Stephen Rothwell
In-Reply-To: <BYAPR21MB135202F46C4B023B51EBBFD0D7E00@BYAPR21MB1352.namprd21.prod.outlook.com>

On Mon, 24 Jun 2019, Michael Kelley wrote:
> From: Thomas Gleixner <tglx@linutronix.de> Sent: Sunday, June 23, 2019 3:13 PM
> > 
> > I have no objections if you collect hyper-v stuff, quite the contrary, but
> > changes which touch other subsystems need to be coordinated upfront. That's
> > all I'm asking for.
> > 
> > Btw, that clocksource stuff looks good code wise, just the change logs need
> > some care and after the VDSO stuff hits next we need to sort out the
> > logistics. I hope these changes are completely self contained. If not we'll
> > find a solution.
> >
> 
> In my view, the only thing that potentially needs a solution is where the
> Hyper-V clock code used by VDSO ends up in the code tree.  I think the
> right long term place is include/clocksource/hyperv_timer.h.   That location
> is architecture neutral, and the same Hyper-V clock code will be shared by
> the Hyper-V on ARM64 support that's in process.
> 
> Vincenzo's patch set creates a new file arch/x86/include/asm/mshyperv-tsc.h,
> which I will want to move when creating the separate Hyper-V clocksource
> driver.   If you're OK with that file existing for a release and then going away,
> that's fine.  Alternatively, put the code in include/clocksource/hyperv_timer.h
> now as part of the VDSO patch set so it's in the right place from the start.  My
> subsequent patch set will add a few additional tweaks to remove x86-isms
> and fully integrate with the separate Hyper-V clocksource driver.

I don't care whether this goes into 5.3 or later. If you can provide me
rebased self contained patches on top of

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso

I'm happy to pull them in on top.

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Stephen Rothwell @ 2019-06-23 21:58 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Michael Kelley, Vincenzo Frascino,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH
In-Reply-To: <20190623190929.GL2226@sasha-vm>

[-- Attachment #1: Type: text/plain, Size: 359 bytes --]

Hi Sasha,

On Sun, 23 Jun 2019 15:09:29 -0400 Sasha Levin <sashal@kernel.org> wrote:
>
> Appologies about this. I ended up with way more travel than I would have
> liked (writing this from an airport). I've reset our hyperv-next branch
> to remove these 3 commits until we figure this out.

But not pushed out, yet?

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Stephen Rothwell @ 2019-06-24  1:20 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Thomas Gleixner, Michael Kelley, Vincenzo Frascino,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH
In-Reply-To: <20190624002430.GN2226@sasha-vm>

[-- Attachment #1: Type: text/plain, Size: 204 bytes --]

Hi Sasha,

On Sun, 23 Jun 2019 20:24:30 -0400 Sasha Levin <sashal@kernel.org> wrote:
>
> Pushed now. For some reason the airport wifi was blocking ssh :/

Thanks.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* RE: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Michael Kelley @ 2019-06-24  0:04 UTC (permalink / raw)
  To: Thomas Gleixner, Sasha Levin
  Cc: Vincenzo Frascino, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies,
	linux-hyperv@vger.kernel.org, Greg KH, Stephen Rothwell
In-Reply-To: <alpine.DEB.2.21.1906240006090.32342@nanos.tec.linutronix.de>

From: Thomas Gleixner <tglx@linutronix.de> Sent: Sunday, June 23, 2019 3:13 PM
> 
> I have no objections if you collect hyper-v stuff, quite the contrary, but
> changes which touch other subsystems need to be coordinated upfront. That's
> all I'm asking for.
> 
> Btw, that clocksource stuff looks good code wise, just the change logs need
> some care and after the VDSO stuff hits next we need to sort out the
> logistics. I hope these changes are completely self contained. If not we'll
> find a solution.
>

In my view, the only thing that potentially needs a solution is where the
Hyper-V clock code used by VDSO ends up in the code tree.  I think the
right long term place is include/clocksource/hyperv_timer.h.   That location
is architecture neutral, and the same Hyper-V clock code will be shared by
the Hyper-V on ARM64 support that's in process.

Vincenzo's patch set creates a new file arch/x86/include/asm/mshyperv-tsc.h,
which I will want to move when creating the separate Hyper-V clocksource
driver.   If you're OK with that file existing for a release and then going away,
that's fine.  Alternatively, put the code in include/clocksource/hyperv_timer.h
now as part of the VDSO patch set so it's in the right place from the start.  My
subsequent patch set will add a few additional tweaks to remove x86-isms
and fully integrate with the separate Hyper-V clocksource driver.

Michael 

^ permalink raw reply

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Sasha Levin @ 2019-06-24  0:24 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Thomas Gleixner, Michael Kelley, Vincenzo Frascino,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH
In-Reply-To: <20190624075834.2491a61a@canb.auug.org.au>

On Mon, Jun 24, 2019 at 07:58:34AM +1000, Stephen Rothwell wrote:
>Hi Sasha,
>
>On Sun, 23 Jun 2019 15:09:29 -0400 Sasha Levin <sashal@kernel.org> wrote:
>>
>> Appologies about this. I ended up with way more travel than I would have
>> liked (writing this from an airport). I've reset our hyperv-next branch
>> to remove these 3 commits until we figure this out.
>
>But not pushed out, yet?

Pushed now. For some reason the airport wifi was blocking ssh :/

--
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Sasha Levin @ 2019-06-23 19:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Michael Kelley, Vincenzo Frascino, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH, Stephen Rothwell
In-Reply-To: <alpine.DEB.2.21.1906221542270.5503@nanos.tec.linutronix.de>

On Sat, Jun 22, 2019 at 04:46:28PM +0200, Thomas Gleixner wrote:
>On Fri, 14 Jun 2019, Sasha Levin wrote:
>> On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
>> > On Thu, 30 May 2019, Michael Kelley wrote:
>> > > Vincenzo -- these changes for Hyper-V are a subset of a larger patch set
>> > > I have that moves all of the Hyper-V clock/timer code into a separate
>> > > clocksource driver in drivers/clocksource, with an include file in
>> > > includes/clocksource.  That new include file should be able to work
>> > > instead of your new mshyperv-tsc.h.  It also has the benefit of being
>> > > ISA neutral, so it will work with my in-progress patch set to support
>> > > Linux on Hyper-V on ARM64.  See https://lkml.org/lkml/2019/5/27/231
>> > > for the new clocksource driver patch set.
>> >
>> > Grrr. That's queued in hyperv-next for whatever reasons.
>>
>> I queue up our future pull requests there to give them some soaking in
>> -next.
>
>What? You queue completely unreviewed stuff which touches two other
>subsystems to let it soak in next?

It was out on LKML for 2+ weeks before I've pulled it in. As it mostly
touches hyperv bits I felt comfortable to give it time in -next (but not
actually to try and merge it until it gets a few acks).

>> > Sasha, can you please provide me the branch to pull from so I can have a
>> > common base for all the various changes floating around?
>>
>> I'll send you a unified pull request for these changes.
>
>Which has not materialized yet.

Appologies about this. I ended up with way more travel than I would have
liked (writing this from an airport). I've reset our hyperv-next branch
to remove these 3 commits until we figure this out.

>TBH, I'm pretty grumpy about those clocksource changes. Here is the
>diffstat:
>
> MAINTAINERS                          |    2
> arch/x86/entry/vdso/vclock_gettime.c |    1
> arch/x86/entry/vdso/vma.c            |    2
> arch/x86/hyperv/hv_init.c            |   91 ---------
> arch/x86/include/asm/hyperv-tlfs.h   |    6
> arch/x86/include/asm/mshyperv.h      |   81 +-------
> arch/x86/kernel/cpu/mshyperv.c       |    2
> arch/x86/kvm/x86.c                   |    1
> drivers/clocksource/Makefile         |    1
> drivers/clocksource/hyperv_timer.c   |  322 +++++++++++++++++++++++++++++++++++
> drivers/hv/Kconfig                   |    3
> drivers/hv/hv.c                      |  156 ----------------
> drivers/hv/hv_util.c                 |    1
> drivers/hv/hyperv_vmbus.h            |    3
> drivers/hv/vmbus_drv.c               |   42 ++--
> include/clocksource/hyperv_timer.h   |  105 +++++++++++
>
>While the world and some more people have been CC'ed on those patches,
>neither the clocksource nor the x86 maintainer have been.
>
>When I gave Vincenzo the advise to base his code on that hyper-v branch, I
>expected that I find the related patches in my mail backlog. No, they have
>not been there because I was not on CC.
>
>Folks, please stop chosing Cc lists as you like. We have well established
>rules for that. And please stop queueing random unreviewed patches in
>next. Next is not a playground for not ready and unreviewed stuff. No, the
>hyper-v inbreed Reviewed-by is not sufficient for anything x86 and
>clocksource related.

I'm sorry for this, you were supposed to be Cc'ed on these patches and I
see that you were not.

>After chasing and looking at those patches, which have horrible subject
>lines and changelogs btw, I was not able to judge quickly whether that
>stuff is self contained or not. So no, I fixed up the fallout and rebased
>Vincenzos VDSO stuff on mainline w/o those hyperv changes simply because if
>they are not self contained they will break bisection badly.
>
>I'm going to push out the VDSO series later today. That will nicely break
>in combination with the hyper-next branch. Stephen, please drop that and do
>not try to handle the fallout. That stuff needs to go through the proper
>channels or at least be acked/reviewed by the relevant maintainers. So the
>hyper-v folks can rebase themself and post it proper.

Okay, thank you. We'll rebase and resend.

--
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v6 18/19] x86: Add support for generic vDSO
From: Thomas Gleixner @ 2019-06-22 14:46 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Michael Kelley, Vincenzo Frascino, linux-arch@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
	Arnd Bergmann, Russell King, Ralf Baechle, Paul Burton,
	Daniel Lezcano, Mark Salyzyn, Peter Collingbourne, Shuah Khan,
	Dmitry Safonov, Rasmus Villemoes, Huw Davies, linux-hyperv,
	Greg KH, Stephen Rothwell
In-Reply-To: <20190614211710.GQ1513@sasha-vm>

On Fri, 14 Jun 2019, Sasha Levin wrote:
> On Fri, Jun 14, 2019 at 01:15:23PM +0200, Thomas Gleixner wrote:
> > On Thu, 30 May 2019, Michael Kelley wrote:
> > > Vincenzo -- these changes for Hyper-V are a subset of a larger patch set
> > > I have that moves all of the Hyper-V clock/timer code into a separate
> > > clocksource driver in drivers/clocksource, with an include file in
> > > includes/clocksource.  That new include file should be able to work
> > > instead of your new mshyperv-tsc.h.  It also has the benefit of being
> > > ISA neutral, so it will work with my in-progress patch set to support
> > > Linux on Hyper-V on ARM64.  See https://lkml.org/lkml/2019/5/27/231
> > > for the new clocksource driver patch set.
> > 
> > Grrr. That's queued in hyperv-next for whatever reasons.
> 
> I queue up our future pull requests there to give them some soaking in
> -next.

What? You queue completely unreviewed stuff which touches two other
subsystems to let it soak in next?

> > Sasha, can you please provide me the branch to pull from so I can have a
> > common base for all the various changes floating around?
> 
> I'll send you a unified pull request for these changes.

Which has not materialized yet.

TBH, I'm pretty grumpy about those clocksource changes. Here is the
diffstat:

 MAINTAINERS                          |    2 
 arch/x86/entry/vdso/vclock_gettime.c |    1 
 arch/x86/entry/vdso/vma.c            |    2 
 arch/x86/hyperv/hv_init.c            |   91 ---------
 arch/x86/include/asm/hyperv-tlfs.h   |    6 
 arch/x86/include/asm/mshyperv.h      |   81 +-------
 arch/x86/kernel/cpu/mshyperv.c       |    2 
 arch/x86/kvm/x86.c                   |    1 
 drivers/clocksource/Makefile         |    1 
 drivers/clocksource/hyperv_timer.c   |  322 +++++++++++++++++++++++++++++++++++
 drivers/hv/Kconfig                   |    3 
 drivers/hv/hv.c                      |  156 ----------------
 drivers/hv/hv_util.c                 |    1 
 drivers/hv/hyperv_vmbus.h            |    3 
 drivers/hv/vmbus_drv.c               |   42 ++--
 include/clocksource/hyperv_timer.h   |  105 +++++++++++

While the world and some more people have been CC'ed on those patches,
neither the clocksource nor the x86 maintainer have been.

When I gave Vincenzo the advise to base his code on that hyper-v branch, I
expected that I find the related patches in my mail backlog. No, they have
not been there because I was not on CC.

Folks, please stop chosing Cc lists as you like. We have well established
rules for that. And please stop queueing random unreviewed patches in
next. Next is not a playground for not ready and unreviewed stuff. No, the
hyper-v inbreed Reviewed-by is not sufficient for anything x86 and
clocksource related.

After chasing and looking at those patches, which have horrible subject
lines and changelogs btw, I was not able to judge quickly whether that
stuff is self contained or not. So no, I fixed up the fallout and rebased
Vincenzos VDSO stuff on mainline w/o those hyperv changes simply because if
they are not self contained they will break bisection badly.

I'm going to push out the VDSO series later today. That will nicely break
in combination with the hyper-next branch. Stephen, please drop that and do
not try to handle the fallout. That stuff needs to go through the proper
channels or at least be acked/reviewed by the relevant maintainers. So the
hyper-v folks can rebase themself and post it proper.

Yours grumpy,

	tglx

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox