All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Nadav Amit <namit@vmware.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	Sasha Levin <sashal@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Juergen Gross <jgross@suse.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	xen-devel@lists.xenproject.org
Subject: Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
Date: Mon, 22 Jul 2019 21:14:33 +0200	[thread overview]
Message-ID: <20190722191433.GD6698@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20190719005837.4150-5-namit@vmware.com>

On Thu, Jul 18, 2019 at 05:58:32PM -0700, Nadav Amit wrote:
> @@ -709,8 +716,9 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  	 * doing a speculative memory access.
>  	 */
>  	if (info->freed_tables) {
> -		smp_call_function_many(cpumask, flush_tlb_func_remote,
> -			       (void *)info, 1);
> +		__smp_call_function_many(cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
> +					 (void *)info, 1);
>  	} else {
>  		/*
>  		 * Although we could have used on_each_cpu_cond_mask(),
> @@ -737,7 +745,8 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  			if (tlb_is_not_lazy(cpu))
>  				__cpumask_set_cpu(cpu, cond_cpumask);
>  		}
> -		smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
>  					 (void *)info, 1);
>  	}
>  }

Do we really need that _local/_remote distinction? ISTR you had a patch
that frobbed flush_tlb_info into the csd and that gave space
constraints, but I'm not seeing that here (probably a wise, get stuff
merged etc..).

struct __call_single_data {
        struct llist_node          llist;                /*     0     8 */
        smp_call_func_t            func;                 /*     8     8 */
        void *                     info;                 /*    16     8 */
        unsigned int               flags;                /*    24     4 */

        /* size: 32, cachelines: 1, members: 4 */
        /* padding: 4 */
        /* last cacheline: 32 bytes */
};

struct flush_tlb_info {
        struct mm_struct *         mm;                   /*     0     8 */
        long unsigned int          start;                /*     8     8 */
        long unsigned int          end;                  /*    16     8 */
        u64                        new_tlb_gen;          /*    24     8 */
        unsigned int               stride_shift;         /*    32     4 */
        bool                       freed_tables;         /*    36     1 */

        /* size: 40, cachelines: 1, members: 6 */
        /* padding: 3 */
        /* last cacheline: 40 bytes */
};

IIRC what you did was make void *__call_single_data::info the last
member and a union until the full cacheline size (64). Given the above
that would get us 24 bytes for csd, leaving us 40 for that
flush_tlb_info.

But then we can still do something like the below, which doesn't change
things and still gets rid of that dual function crud, simplifying
smp_call_function_many again.

Index: linux-2.6/arch/x86/include/asm/tlbflush.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/tlbflush.h
+++ linux-2.6/arch/x86/include/asm/tlbflush.h
@@ -546,8 +546,9 @@ struct flush_tlb_info {
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
-	unsigned int		stride_shift;
-	bool			freed_tables;
+	unsigned int		cpu;
+	unsigned short		stride_shift;
+	unsigned char		freed_tables;
 };
 
 #define local_flush_tlb() __flush_tlb()
Index: linux-2.6/arch/x86/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/tlb.c
+++ linux-2.6/arch/x86/mm/tlb.c
@@ -659,6 +659,27 @@ static void flush_tlb_func_remote(void *
 	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
 }
 
+static void flush_tlb_func(void *info)
+{
+	const struct flush_tlb_info *f = info;
+	enum tlb_flush_reason reason = TLB_REMOTE_SHOOTDOWN;
+	bool local = false;
+
+	if (f->cpu == smp_processor_id()) {
+		local = true;
+		reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
+	} else {
+		inc_irq_stat(irq_tlb_count);
+
+		if (f->mm && f->mm != this_cpu_read(cpu_tlbstate.loaded_mm))
+			return;
+
+		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+	}
+
+	flush_tlb_func_common(f, local, reason);
+}
+
 static bool tlb_is_not_lazy(int cpu)
 {
 	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);


WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Nadav Amit <namit@vmware.com>
Cc: Sasha Levin <sashal@kernel.org>, Juergen Gross <jgross@suse.com>,
	linux-hyperv@vger.kernel.org, x86@kernel.org,
	Stephen Hemminger <sthemmin@microsoft.com>,
	kvm@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	xen-devel@lists.xenproject.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
Date: Mon, 22 Jul 2019 21:14:33 +0200	[thread overview]
Message-ID: <20190722191433.GD6698@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20190719005837.4150-5-namit@vmware.com>

On Thu, Jul 18, 2019 at 05:58:32PM -0700, Nadav Amit wrote:
> @@ -709,8 +716,9 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  	 * doing a speculative memory access.
>  	 */
>  	if (info->freed_tables) {
> -		smp_call_function_many(cpumask, flush_tlb_func_remote,
> -			       (void *)info, 1);
> +		__smp_call_function_many(cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
> +					 (void *)info, 1);
>  	} else {
>  		/*
>  		 * Although we could have used on_each_cpu_cond_mask(),
> @@ -737,7 +745,8 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  			if (tlb_is_not_lazy(cpu))
>  				__cpumask_set_cpu(cpu, cond_cpumask);
>  		}
> -		smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
>  					 (void *)info, 1);
>  	}
>  }

Do we really need that _local/_remote distinction? ISTR you had a patch
that frobbed flush_tlb_info into the csd and that gave space
constraints, but I'm not seeing that here (probably a wise, get stuff
merged etc..).

struct __call_single_data {
        struct llist_node          llist;                /*     0     8 */
        smp_call_func_t            func;                 /*     8     8 */
        void *                     info;                 /*    16     8 */
        unsigned int               flags;                /*    24     4 */

        /* size: 32, cachelines: 1, members: 4 */
        /* padding: 4 */
        /* last cacheline: 32 bytes */
};

struct flush_tlb_info {
        struct mm_struct *         mm;                   /*     0     8 */
        long unsigned int          start;                /*     8     8 */
        long unsigned int          end;                  /*    16     8 */
        u64                        new_tlb_gen;          /*    24     8 */
        unsigned int               stride_shift;         /*    32     4 */
        bool                       freed_tables;         /*    36     1 */

        /* size: 40, cachelines: 1, members: 6 */
        /* padding: 3 */
        /* last cacheline: 40 bytes */
};

IIRC what you did was make void *__call_single_data::info the last
member and a union until the full cacheline size (64). Given the above
that would get us 24 bytes for csd, leaving us 40 for that
flush_tlb_info.

But then we can still do something like the below, which doesn't change
things and still gets rid of that dual function crud, simplifying
smp_call_function_many again.

Index: linux-2.6/arch/x86/include/asm/tlbflush.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/tlbflush.h
+++ linux-2.6/arch/x86/include/asm/tlbflush.h
@@ -546,8 +546,9 @@ struct flush_tlb_info {
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
-	unsigned int		stride_shift;
-	bool			freed_tables;
+	unsigned int		cpu;
+	unsigned short		stride_shift;
+	unsigned char		freed_tables;
 };
 
 #define local_flush_tlb() __flush_tlb()
Index: linux-2.6/arch/x86/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/tlb.c
+++ linux-2.6/arch/x86/mm/tlb.c
@@ -659,6 +659,27 @@ static void flush_tlb_func_remote(void *
 	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
 }
 
+static void flush_tlb_func(void *info)
+{
+	const struct flush_tlb_info *f = info;
+	enum tlb_flush_reason reason = TLB_REMOTE_SHOOTDOWN;
+	bool local = false;
+
+	if (f->cpu == smp_processor_id()) {
+		local = true;
+		reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
+	} else {
+		inc_irq_stat(irq_tlb_count);
+
+		if (f->mm && f->mm != this_cpu_read(cpu_tlbstate.loaded_mm))
+			return;
+
+		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+	}
+
+	flush_tlb_func_common(f, local, reason);
+}
+
 static bool tlb_is_not_lazy(int cpu)
 {
 	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Nadav Amit <namit@vmware.com>
Cc: Sasha Levin <sashal@kernel.org>, Juergen Gross <jgross@suse.com>,
	linux-hyperv@vger.kernel.org, x86@kernel.org,
	Stephen Hemminger <sthemmin@microsoft.com>,
	kvm@vger.kernel.org, Haiyang Zhang <haiyangz@microsoft.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	xen-devel@lists.xenproject.org,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"K. Y. Srinivasan" <kys@microsoft.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [Xen-devel] [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently
Date: Mon, 22 Jul 2019 21:14:33 +0200	[thread overview]
Message-ID: <20190722191433.GD6698@worktop.programming.kicks-ass.net> (raw)
In-Reply-To: <20190719005837.4150-5-namit@vmware.com>

On Thu, Jul 18, 2019 at 05:58:32PM -0700, Nadav Amit wrote:
> @@ -709,8 +716,9 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  	 * doing a speculative memory access.
>  	 */
>  	if (info->freed_tables) {
> -		smp_call_function_many(cpumask, flush_tlb_func_remote,
> -			       (void *)info, 1);
> +		__smp_call_function_many(cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
> +					 (void *)info, 1);
>  	} else {
>  		/*
>  		 * Although we could have used on_each_cpu_cond_mask(),
> @@ -737,7 +745,8 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>  			if (tlb_is_not_lazy(cpu))
>  				__cpumask_set_cpu(cpu, cond_cpumask);
>  		}
> -		smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +		__smp_call_function_many(cond_cpumask, flush_tlb_func_remote,
> +					 flush_tlb_func_local,
>  					 (void *)info, 1);
>  	}
>  }

Do we really need that _local/_remote distinction? ISTR you had a patch
that frobbed flush_tlb_info into the csd and that gave space
constraints, but I'm not seeing that here (probably a wise, get stuff
merged etc..).

struct __call_single_data {
        struct llist_node          llist;                /*     0     8 */
        smp_call_func_t            func;                 /*     8     8 */
        void *                     info;                 /*    16     8 */
        unsigned int               flags;                /*    24     4 */

        /* size: 32, cachelines: 1, members: 4 */
        /* padding: 4 */
        /* last cacheline: 32 bytes */
};

struct flush_tlb_info {
        struct mm_struct *         mm;                   /*     0     8 */
        long unsigned int          start;                /*     8     8 */
        long unsigned int          end;                  /*    16     8 */
        u64                        new_tlb_gen;          /*    24     8 */
        unsigned int               stride_shift;         /*    32     4 */
        bool                       freed_tables;         /*    36     1 */

        /* size: 40, cachelines: 1, members: 6 */
        /* padding: 3 */
        /* last cacheline: 40 bytes */
};

IIRC what you did was make void *__call_single_data::info the last
member and a union until the full cacheline size (64). Given the above
that would get us 24 bytes for csd, leaving us 40 for that
flush_tlb_info.

But then we can still do something like the below, which doesn't change
things and still gets rid of that dual function crud, simplifying
smp_call_function_many again.

Index: linux-2.6/arch/x86/include/asm/tlbflush.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/tlbflush.h
+++ linux-2.6/arch/x86/include/asm/tlbflush.h
@@ -546,8 +546,9 @@ struct flush_tlb_info {
 	unsigned long		start;
 	unsigned long		end;
 	u64			new_tlb_gen;
-	unsigned int		stride_shift;
-	bool			freed_tables;
+	unsigned int		cpu;
+	unsigned short		stride_shift;
+	unsigned char		freed_tables;
 };
 
 #define local_flush_tlb() __flush_tlb()
Index: linux-2.6/arch/x86/mm/tlb.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/tlb.c
+++ linux-2.6/arch/x86/mm/tlb.c
@@ -659,6 +659,27 @@ static void flush_tlb_func_remote(void *
 	flush_tlb_func_common(f, false, TLB_REMOTE_SHOOTDOWN);
 }
 
+static void flush_tlb_func(void *info)
+{
+	const struct flush_tlb_info *f = info;
+	enum tlb_flush_reason reason = TLB_REMOTE_SHOOTDOWN;
+	bool local = false;
+
+	if (f->cpu == smp_processor_id()) {
+		local = true;
+		reason = (f->mm == NULL) ? TLB_LOCAL_SHOOTDOWN : TLB_LOCAL_MM_SHOOTDOWN;
+	} else {
+		inc_irq_stat(irq_tlb_count);
+
+		if (f->mm && f->mm != this_cpu_read(cpu_tlbstate.loaded_mm))
+			return;
+
+		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED);
+	}
+
+	flush_tlb_func_common(f, local, reason);
+}
+
 static bool tlb_is_not_lazy(int cpu)
 {
 	return !per_cpu(cpu_tlbstate_shared.is_lazy, cpu);


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  reply	other threads:[~2019-07-22 19:14 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-19  0:58 [PATCH v3 0/9] x86: Concurrent TLB flushes Nadav Amit
2019-07-19  0:58 ` [Xen-devel] " Nadav Amit
2019-07-19  0:58 ` [PATCH v3 1/9] smp: Run functions concurrently in smp_call_function_many() Nadav Amit
2019-07-19 18:23   ` Dave Hansen
2019-07-22 18:16     ` Peter Zijlstra
2019-07-22 18:41       ` Nadav Amit
2019-07-22 19:34         ` Peter Zijlstra
2019-07-22 18:21   ` Peter Zijlstra
2019-07-22 18:34     ` Nadav Amit
2019-07-22 19:19       ` Peter Zijlstra
2019-07-22 18:37     ` Thomas Gleixner
2019-07-22 18:40       ` Nadav Amit
2019-07-22 18:51         ` Thomas Gleixner
2019-07-22 19:02           ` Nadav Amit
2019-07-25 12:36             ` Thomas Gleixner
2019-07-25 19:10               ` Nadav Amit
2019-07-19  0:58 ` [PATCH v3 2/9] x86/mm/tlb: Remove reason as argument for flush_tlb_func_local() Nadav Amit
2019-07-19  0:58 ` [PATCH v3 3/9] x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy() Nadav Amit
2019-07-19 18:36   ` Dave Hansen
2019-07-19 18:41     ` Nadav Amit
2019-07-19 22:44       ` Joe Perches
2019-07-19 23:02         ` Nadav Amit
2019-07-22 18:27   ` Peter Zijlstra
2019-07-22 19:47   ` Rasmus Villemoes
2019-07-22 19:51     ` Nadav Amit
2019-07-19  0:58 ` [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently Nadav Amit via Virtualization
2019-07-19  0:58 ` Nadav Amit
2019-07-19  0:58   ` [Xen-devel] " Nadav Amit
2019-07-22 19:14   ` Peter Zijlstra [this message]
2019-07-22 19:14     ` Peter Zijlstra
2019-07-22 19:14     ` Peter Zijlstra
2019-07-22 19:27     ` Nadav Amit
2019-07-22 19:27       ` [Xen-devel] " Nadav Amit
2019-07-22 19:32       ` Peter Zijlstra
2019-07-22 19:32         ` [Xen-devel] " Peter Zijlstra
2019-07-22 19:32         ` Peter Zijlstra
2019-07-22 19:27     ` Nadav Amit via Virtualization
2019-07-26  7:28   ` Juergen Gross
2019-07-26  7:28   ` Juergen Gross
2019-07-26  7:28     ` [Xen-devel] " Juergen Gross
2019-07-31  0:13   ` Michael Kelley via Virtualization
2019-07-31  0:13   ` Michael Kelley
2019-07-31  0:13     ` [Xen-devel] " Michael Kelley
2019-07-19  0:58 ` [PATCH v3 5/9] x86/mm/tlb: Privatize cpu_tlbstate Nadav Amit
2019-07-19 18:38   ` Dave Hansen
2019-07-19 18:43     ` Nadav Amit
2019-07-19 18:48       ` Dave Hansen
2019-07-19 18:54         ` Nadav Amit
2019-07-20 13:58           ` Andy Lutomirski
2019-07-21 20:21     ` Nadav Amit
2019-07-19  0:58 ` [PATCH v3 6/9] x86/mm/tlb: Do not make is_lazy dirty for no reason Nadav Amit
2019-07-19  0:58 ` [PATCH v3 7/9] cpumask: Mark functions as pure Nadav Amit
2019-07-19  0:58 ` [PATCH v3 8/9] x86/mm/tlb: Remove UV special case Nadav Amit
2019-07-19  2:25   ` Mike Travis
2019-07-19  4:58     ` Nadav Amit
2019-07-31  3:11     ` Nadav Amit
2019-07-19  0:58 ` [PATCH v3 9/9] x86/mm/tlb: Remove unnecessary uses of the inline keyword Nadav Amit
2019-07-19 21:36 ` [PATCH v3 0/9] x86: Concurrent TLB flushes Dave Hansen
2019-07-19 21:36 ` Dave Hansen
2019-07-19 21:36   ` [Xen-devel] " Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190722191433.GD6698@worktop.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=haiyangz@microsoft.com \
    --cc=jgross@suse.com \
    --cc=kvm@vger.kernel.org \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namit@vmware.com \
    --cc=pbonzini@redhat.com \
    --cc=sashal@kernel.org \
    --cc=sthemmin@microsoft.com \
    --cc=tglx@linutronix.de \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.